Research on Chinese Text Automatic Categorization Based on VSM
Automatic text classification is an important application of information processing technology. This paper introduces the key techniques of Chinese text categorization, including text preprocessing, feature selection, feature representation, and the training and classification algorithms, with particular emphasis on analyzing several of the most important current feature selection methods. A Chinese text classifier based on the KNN algorithm was developed. The system performs automatic Chinese text categorization well and achieves good classification quality. We also use this classifier to compare several feature selection methods. Finally, we use the experimental results to demonstrate the important role of feature selection in text categorization.
text categorization; vector space model; KNN algorithm
TONG Xiao-Jun; CUI Ming-Gen; SONG Guo-Long
School of Computer Science & Technology, Harbin Institute of Technology, Weihai 264209, China; College of Science, Harbin Institute of Technology, Weihai 264209, China; School of Information Science & Engineering, Northeastern University, Shenyang 110004, China
International conference
Shanghai
English
2007-09-21 (date first posted online on the Wanfang platform; not the paper's publication date)