A Text Categorization Method Based on Features Clustering

Abstract:

Feature selection is an important part of text categorization, and its result affects both the quality and the efficiency of the text categorizer. Because a text usually has thousands of features, the dimension of the feature space generally needs to be reduced. Considering the semantic relationships among words, this paper proposes a new text categorization method based on feature clustering. The method first uses word segmentation to split texts into words, then removes stop words and words with low information content, and then calculates the distribution of words across the texts to construct a word co-occurrence matrix. Clustering algorithms are then applied to reduce the dimension of the feature space. Finally, experiments are carried out on two corpora using several text categorization algorithms. The results show that the new method improves not only the precision and recall of text categorization but also its efficiency.
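The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the segmenter, the stop list, the similarity measure, or the clustering algorithm, so everything here is an assumption. Whitespace splitting stands in for word segmentation, `STOP_WORDS` is a hypothetical stop list, co-occurrence is counted per document, and a greedy single-pass clustering over cosine similarity (with an arbitrary `threshold`) stands in for whichever clustering algorithms the paper evaluates.

```python
from collections import Counter
from math import sqrt

# Hypothetical stop list; the paper's actual list is not given.
STOP_WORDS = {"the", "a", "of", "is", "and"}


def tokenize(text):
    # Stand-in for word segmentation: lowercase whitespace split,
    # followed by stop-word removal.
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def cooccurrence_rows(docs):
    """Return (vocab, rows), where rows[w][v] is the number of
    documents that contain both word w and word v."""
    token_sets = [set(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*token_sets))
    rows = {w: Counter() for w in vocab}
    for tokens in token_sets:
        for w in tokens:
            for v in tokens:
                rows[w][v] += 1
    return vocab, rows


def cosine(a, b):
    # Cosine similarity between two sparse co-occurrence rows.
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(x * x for x in a.values()))
    norm_b = sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_words(vocab, rows, threshold=0.6):
    """Greedy single-pass clustering (an assumed stand-in): each word
    joins the first cluster whose seed word is similar enough,
    otherwise it starts a new cluster."""
    clusters = []  # each cluster is a list; element 0 is its seed word
    for w in vocab:
        for cluster in clusters:
            if cosine(rows[w], rows[cluster[0]]) >= threshold:
                cluster.append(w)
                break
        else:
            clusters.append([w])
    return clusters


def to_cluster_vector(text, clusters):
    # Reduced representation: one feature per word cluster,
    # valued by the summed term counts of the cluster's words.
    counts = Counter(tokenize(text))
    return [sum(counts[w] for w in cluster) for cluster in clusters]


docs = [
    "the cat chased a mouse",
    "a cat and a mouse",
    "stock prices fell",
    "stock prices and bond prices",
]
vocab, rows = cooccurrence_rows(docs)
clusters = cluster_words(vocab, rows)
print(len(vocab), "word features reduced to", len(clusters), "cluster features")
print(to_cluster_vector("the cat chased the mouse", clusters))
```

The resulting cluster vectors would then be passed to an ordinary text classifier in place of the full word-level vectors, which is where the dimension reduction pays off.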

Keywords: text categorization; feature selection; features clustering; matrix of co-occurrence words

Authors: Zhibin Feng, Pingjian Zhang, Juanjuan Zhao

Affiliations: School of Computer Science & Engineering, South China University of Technology, China; School of Computer Software, South China University of Technology, China

Conference type: International conference

Conference name: 2012 2nd International Conference on Materials Science and Information Technology (MSIT2012)

Conference location: Xi'an

Conference language: English

Pages: 1090-1094

Online publication date: 2012-08-24 (date first posted on the Wanfang platform; not the publication date of the paper)
