Web Document Classification based on SVM
With the rapid growth of web information, web document classification has become an important research field for the management of Internet information. Most existing methods are based on traditional statistics and are effective only as the sample size tends to infinity. With the limited samples available in practice, they may not work well and are prone to over-fitting. To classify web pages effectively, the paper studies web document classification in the Vector Space Model together with feature extraction, and analyzes the selection of kernel functions. A web document classification model and algorithm based on the Support Vector Machine (SVM) are proposed. The experiment shows that the approach not only improves training efficiency but also achieves good precision.
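As a rough illustration of the ideas named in the abstract, the sketch below maps documents to term-frequency vectors in a Vector Space Model and compares them with two common kernel functions. It is not the paper's model or algorithm: the toy documents, the raw term-frequency weighting, the function names, and the `gamma` value are all assumptions made for the example.

```python
import math
from collections import Counter

def tf_vector(text, vocabulary):
    """Map a document to raw term-frequency counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[term]) for term in vocabulary]

def linear_kernel(x, y):
    """Inner product of two document vectors."""
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an assumed, untuned value."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Hypothetical toy corpus: two sports pages and one finance page.
docs = ["football match score", "football league match", "stock market price"]
vocabulary = sorted({word for doc in docs for word in doc.split()})
vectors = [tf_vector(doc, vocabulary) for doc in docs]

# Documents on the same topic share terms, so both kernels rate them as more similar.
same_topic = linear_kernel(vectors[0], vectors[1])
cross_topic = linear_kernel(vectors[0], vectors[2])
```

An SVM would use such kernel values in place of raw inner products when fitting its decision boundary, which is why the choice of kernel affects both training cost and precision.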
Web document classification; Support vector machines; Kernel function; Feature selection; Statistical learning theory
Qiang Niu, Zhixiao Wang, Dai Chen
School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu, 221008, China
International conference
Hangzhou
English
619-622
2006-10-12 (date first made available online on the Wanfang platform; not the paper's publication date)