A Text Categorization Method Based on Features Clustering
Feature selection is an important part of text categorization, and its result affects both the quality and the efficiency of the text categorizer. Since a text usually has thousands of features, the dimension of the feature space generally needs to be reduced. Considering the semantic relationships among words, this paper proposes a new text categorization method based on feature clustering. The method first uses word segmentation to split texts into words, then removes stop words and words that carry little information, and then calculates the distribution of words across the texts to construct a word co-occurrence matrix. After that, clustering algorithms are employed to reduce the dimension of the feature space. Finally, experiments are carried out on two corpora using several text categorization algorithms. The results demonstrate that the new method not only improves the precision and recall of text categorization but also increases its efficiency.
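The pipeline outlined in the abstract can be illustrated with a minimal sketch. The abstract does not name the word segmenter, the low-information filter, or the clustering algorithm, so everything concrete here is an assumption: whitespace splitting stands in for word segmentation, `STOP_WORDS` is a hypothetical stop list, co-occurrence is counted at document level, and a greedy single-pass clustering on cosine similarity (with an arbitrary threshold of 0.8) stands in for whatever clustering algorithms the authors used. All function names are illustrative only.

```python
from math import sqrt

# Hypothetical stop list; a real system would use a language-specific one.
STOP_WORDS = {"the", "a", "of", "and", "is", "in"}


def tokenize(text):
    """Whitespace split stands in for a real word segmenter (assumption)."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def cooccurrence_matrix(docs):
    """Return (vocab, matrix); matrix[i][j] counts the documents in which
    word i and word j both appear (document-level co-occurrence)."""
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    matrix = [[0] * n for _ in range(n)]
    for doc in docs:
        present = {index[w] for w in doc}
        for i in present:
            for j in present:
                matrix[i][j] += 1
    return vocab, matrix


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_words(vocab, matrix, threshold=0.8):
    """Greedy single-pass clustering: a word joins the first cluster whose
    seed row is similar enough, otherwise it seeds a new cluster."""
    seeds = []        # row index of each cluster's seed word
    assignment = {}   # word -> cluster id
    for i, word in enumerate(vocab):
        for cid, seed in enumerate(seeds):
            if cosine(matrix[i], matrix[seed]) >= threshold:
                assignment[word] = cid
                break
        else:
            seeds.append(i)
            assignment[word] = len(seeds) - 1
    return assignment, len(seeds)


def to_cluster_vector(doc, assignment, n_clusters):
    """Represent a document by how many of its words fall in each cluster."""
    vec = [0] * n_clusters
    for w in doc:
        if w in assignment:
            vec[assignment[w]] += 1
    return vec


# Toy corpus, invented purely for illustration.
texts = [
    "the stock market and bond market",
    "stock and bond prices",
    "football match in the stadium",
    "the stadium hosted a football match",
]
docs = [tokenize(t) for t in texts]
vocab, matrix = cooccurrence_matrix(docs)
assignment, k = cluster_words(vocab, matrix)
vectors = [to_cluster_vector(d, assignment, k) for d in docs]
```

In this toy run the eight distinct words collapse into a much smaller number of clusters, and each document is represented by a vector with one entry per cluster instead of one entry per word. Those reduced vectors are what a downstream text categorizer would be trained on.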
text categorization; feature selection; features clustering; matrix of co-occurrence words
Zhibin Feng, Pingjian Zhang, Juanjuan Zhao
School of Computer Science & Engineering, South China University of Technology, China; School of Computer Software, South China University of Technology, China
International conference
Xi'an
English
1090-1094
2012-08-24 (date first posted online on the Wanfang platform; not the publication date of the paper)