Conference topic

A Novel Feature Weight Algorithm for Text Categorization

With the growth of the web, large numbers of documents are being published on the Internet, and more and more digital libraries, news sources, and internal company data are becoming available. Automatic text categorization is therefore increasingly important for handling massive data. However, text preprocessing remains the bottleneck of text categorization based on the Vector Space Model (VSM): its output directly affects the performance and precision of categorization. Within text preprocessing, feature selection and feature weighting are the major obstacles. This paper focuses on feature weighting. We present a novel feature weight algorithm, TF-Gini, that can improve categorization performance significantly. Experimental results verify the effectiveness of this algorithm.

Wenqian SHANG, Hongbin DONG, Haibin ZHU, Yongbin WANG

School of Computer, Communication University of China, 100024, China; National Science Park of Harbin Engineering University, Harbin, 150001, China; Senior Member, IEEE, Dept. of Computer Science, Nipissing University, North Bay, ON P1B 8L7, Canada

International conference

The 2008 IEEE International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE 2008)

Beijing

English

2008-10-19 (date first made available online on the Wanfang platform; not the paper's publication date)