Web Document Classification based on SVM

Abstract:

With the rapid growth of web information, web document classification has become an important research field for managing Internet information. Most existing methods are based on traditional statistics and are effective only as the sample size tends to infinity. With the limited samples available in practice, they may perform poorly and are prone to over-fitting. To classify web pages effectively, the paper studies web document classification in the Vector Space Model together with feature extraction, and analyzes the selection of kernel functions. Based on the Support Vector Machine (SVM), a web document classification model and algorithm are proposed. Experiments show that the approach improves training efficiency while also achieving good precision.

Keywords: Web document classification; Support vector machines; Kernel function; Feature selection; Statistical learning theory

Authors: Qiang Niu, Zhixiao Wang, Dai Chen

Affiliation: School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221008, China

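Conference type: International conference

Conference name: 2006 International Symposium on Distributed Computing and Applications to Business, Engineering and Science

Conference location: Hangzhou

Conference language: English

Pages: 619-622

Online publication date: 2006-10-12 (date first posted on the Wanfang platform; not the paper's publication date)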
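
As a rough illustration of two ingredients the abstract names, the Vector Space Model and kernel functions, the sketch below maps documents to TF-IDF vectors and evaluates a linear and an RBF kernel between them. The function names, the smoothed TF-IDF weighting, the whitespace tokenizer, and the `gamma` default are assumptions made for illustration; this record does not state the paper's weighting scheme, kernel choice, or training procedure, so this is not the authors' algorithm.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace, keeping alphabetic tokens only (illustrative)."""
    return [tok for tok in text.lower().split() if tok.isalpha()]


def tfidf_vectors(docs):
    """Map each document to a sparse, L2-normalized TF-IDF vector (dict: term -> weight).

    Uses a smoothed IDF, log((1 + n) / (1 + df)) + 1, which is an assumption here.
    """
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        term_freq = Counter(toks)
        vec = {
            term: (count / len(toks))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in term_freq.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        if norm > 0:
            vec = {term: w / norm for term, w in vec.items()}
        vectors.append(vec)
    return vectors


def linear_kernel(u, v):
    """Dot product of two sparse vectors."""
    return sum(w * v.get(term, 0.0) for term, w in u.items())


def rbf_kernel(u, v, gamma=1.0):
    """Gaussian (RBF) kernel exp(-gamma * ||u - v||^2) on sparse vectors."""
    sq_dist = linear_kernel(u, u) + linear_kernel(v, v) - 2.0 * linear_kernel(u, v)
    return math.exp(-gamma * max(0.0, sq_dist))  # clamp tiny negative rounding error
```

An SVM trained on such vectors only needs the kernel values between pairs of documents, which is why the choice of kernel function directly shapes the decision boundary. Documents that share terms yield larger kernel values than documents with disjoint vocabularies.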
