Conference Topic

A Text Categorization Method Based on Features Clustering

  Choosing the features of a text is an important part of text categorization; the result affects both the quality and the efficiency of the text categorizer. Since a text usually has thousands of features, the dimension of the feature space generally needs to be reduced. Considering the semantic relationships among words, this paper proposes a new text categorization method based on features clustering. The method first uses word segmentation to split texts into words, then removes stop words and words with low information, and then calculates the distribution of words across the texts to construct a matrix of co-occurrence words. After that, clustering algorithms are employed to reduce the dimension of the feature space. Finally, experiments are carried out on two corpora using several text categorization algorithms. The results demonstrate that the new method not only improves the precision and recall of text categorization but also increases its efficiency.
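The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record gives no algorithmic details, so the whitespace tokenizer (standing in for word segmentation), the tiny `STOP_WORDS` set, the `min_df` document-frequency cutoff (standing in for "words with low information"), and the greedy cosine-threshold clustering are all assumptions chosen for brevity.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical stop-word list; a real system would use a full one.
STOP_WORDS = {"the", "a", "of", "and", "is", "in", "to"}

def tokenize(text):
    # Whitespace split as a stand-in for a real word segmenter (assumption).
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def build_vocab(docs_tokens, min_df=2):
    # Keep words that occur in at least min_df documents (assumed filter).
    df = Counter(w for toks in docs_tokens for w in set(toks))
    return sorted(w for w, n in df.items() if n >= min_df)

def cooccurrence_matrix(docs_tokens, vocab):
    # m[i][j] = number of documents containing both word i and word j;
    # the diagonal holds each word's own document frequency.
    index = {w: i for i, w in enumerate(vocab)}
    m = [[0] * len(vocab) for _ in vocab]
    for toks in docs_tokens:
        present = sorted({index[w] for w in toks if w in index})
        for i in present:
            m[i][i] += 1
        for i, j in combinations(present, 2):
            m[i][j] += 1
            m[j][i] += 1
    return m

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def cluster_words(vocab, matrix, threshold=0.8):
    # Greedy single-pass clustering (assumption): a word joins the first
    # cluster whose seed row is similar enough, else it starts a new one.
    clusters = []
    for i in range(len(vocab)):
        for c in clusters:
            if cosine(matrix[i], matrix[c[0]]) >= threshold:
                c.append(i)
                break
        else:
            clusters.append([i])
    return [[vocab[i] for i in c] for c in clusters]

def doc_features(tokens, clusters):
    # One feature per word cluster: total count of the cluster's words.
    counts = Counter(tokens)
    return [sum(counts[w] for w in c) for c in clusters]

docs = [
    "stock market price rises",
    "stock market price falls",
    "football match score",
    "football match result",
]
docs_tokens = [tokenize(d) for d in docs]
vocab = build_vocab(docs_tokens)
clusters = cluster_words(vocab, cooccurrence_matrix(docs_tokens, vocab))
print(clusters)
print(doc_features(tokenize("stock price falls"), clusters))
```

Each document is then represented by one count per word cluster rather than one per word, and that reduced vector can be passed to any standard classifier.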

text categorization; feature selection; features clustering; matrix of co-occurrence words

Zhibin Feng, Pingjian Zhang, Juanjuan Zhao

School of Computer Science & Engineering, South China University of Technology, China; School of Computer Software, South China University of Technology, China

International conference

2012 2nd International Conference on Materials Science and Information Technology (MSIT2012)

Xi'an

English

1090-1094

2012-08-24 (date first made available online on the Wanfang platform; not the paper's publication date)