Conference topic

Research and Improvement of Feature Word Weights Based on the TFIDF Algorithm

  With the development of cloud computing, big data has attracted growing attention, and more and more applications involve large-scale data, which makes methods for analyzing such data particularly important. This paper analyzes and studies the weighting of feature words used in classifying unstructured big data. First, it reviews traditional methods for calculating feature-word weights and analyzes a shortcoming of the traditional TF-IDF algorithm: it does not take the distribution of feature words into account, so some feature words with weak discriminating power can receive excessively high weights. To address this shortcoming, and in light of its practical effect on text classification, the paper modifies the traditional TF-IDF formula, excluding internal factors that interfere with the features and introducing the concept of intra-class dispersion, to obtain an improved TF-IDF algorithm. In the experiments, the data are People news articles from four categories (finance, military, entertainment, and sports), and test values are calculated with both the traditional and the improved TF-IDF algorithm. The results show that the improved TF-IDF algorithm achieves higher accuracy than the traditional TF-IDF algorithm.
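The abstract contrasts traditional TF-IDF with a variant that adds intra-class dispersion, but it does not give the modified formula. The sketch below implements classic TF-IDF (term frequency as count divided by document length, idf = log(N / df), one common convention) and, purely for illustration, a hypothetical dispersion-damped variant. Both the dispersion measure (the coefficient of variation of a term's relative frequency across the documents of its class) and the damping rule (dividing the weight by 1 + dispersion) are assumptions, as are the function names `tf_idf`, `intra_class_dispersion`, and `dispersion_weighted_tf_idf`; this is not the authors' method.

```python
import math
import statistics
from collections import Counter


def tf_idf(docs):
    """Classic TF-IDF: tf = count / doc length, idf = log(N / df).

    docs: list of non-empty token lists. Returns one {term: weight} dict per doc.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (c / len(doc)) * math.log(n_docs / df[term])
            for term, c in counts.items()
        })
    return weights


def intra_class_dispersion(term, class_docs):
    """ASSUMED measure: coefficient of variation (population std / mean) of the
    term's relative frequency across the documents of one class."""
    freqs = [Counter(doc)[term] / len(doc) for doc in class_docs]
    mean = sum(freqs) / len(freqs)
    if mean == 0:
        return float("inf")  # term never appears in this class
    return statistics.pstdev(freqs) / mean


def dispersion_weighted_tf_idf(docs, labels):
    """HYPOTHETICAL variant: damp the TF-IDF weight of terms that are unevenly
    spread within their own class, using w / (1 + dispersion)."""
    by_class = {}
    for doc, label in zip(docs, labels):
        by_class.setdefault(label, []).append(doc)
    result = []
    for weights, label in zip(tf_idf(docs), labels):
        class_docs = by_class[label]
        result.append({
            term: w / (1.0 + intra_class_dispersion(term, class_docs))
            for term, w in weights.items()
        })
    return result
```

For example, with `docs = [["a", "b"], ["a", "c"]]` both in one class, the term `b` gets a classic weight of 0.5 · log 2; it appears in only half of the class's documents, so its assumed dispersion is 1 and the damped weight is halved. A term that occurs with the same relative frequency in every document of its class has dispersion 0 and keeps its classic weight, which matches the abstract's intuition that evenly distributed terms are more representative of a class.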

Keywords: TF-IDF algorithm; text classification; feature selection; feature weighting

Aizhang Guo, Tao Yang

Qilu University of Technology, Jinan 250353, China

International conference

2016 IEEE 2nd Information Technology, Networking, Electronic and Automation Control Conference

Chongqing

English

415-419

2016-03-20 (date first posted online on the Wanfang platform; not the paper's publication date)