Research and Improvement of Feature Words Weight Based on TFIDF Algorithm
With the development of the cloud era, big data has attracted more and more attention, and more and more applications involve large data, so methods for analyzing large data are particularly important. This paper analyzes and studies the feature word weights used in classifying the unstructured data of big data. First, we review the traditional feature word weight calculation method and analyze the shortcoming of the traditional TFIDF algorithm: it does not consider how feature words are distributed, which can lead to feature words without strong discriminating power receiving heavier weights. To address this shortcoming, and taking into account the practical effect on text classification, this paper modifies the traditional TFIDF formula, excluding the within-class interference that disturbs the features and adding the concept of intra-class dispersion, and presents a new TFIDF algorithm. In the experiment, the data come from People news in four categories (finance, military, entertainment, and sports), and test values are calculated with both the traditional and the improved TFIDF algorithms. The results show that the improved TFIDF algorithm achieves higher accuracy than the traditional TFIDF algorithm.
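The shortcoming described in the abstract can be illustrated with a minimal sketch. It assumes the common textbook form of the weight, tf × log(N / df), which may differ in detail from the formula used in the paper; the toy corpus, the class labels, and the helper names `tf_idf` and `class_doc_freq` are hypothetical and chosen only for illustration. The sketch does not implement the paper's improved formula, which the abstract does not state.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights

def class_doc_freq(docs, labels, term):
    """Count, per class, how many documents contain the term."""
    return dict(Counter(lab for doc, lab in zip(docs, labels) if term in doc))

# Toy corpus, made up for illustration (not the paper's data).
docs = [
    ["goal", "match", "report"],
    ["goal", "team", "win"],
    ["stock", "market", "report"],
    ["stock", "bank", "win"],
]
labels = ["sports", "sports", "finance", "finance"]

weights = tf_idf(docs)

# "goal" and "report" each occur in two of the four documents, so the
# traditional weight treats them identically in the first document ...
print(weights[0]["goal"], weights[0]["report"])
# ... although "goal" is confined to one class and "report" is spread over both.
print(class_doc_freq(docs, labels, "goal"))
print(class_doc_freq(docs, labels, "report"))
```

Both terms receive the same weight because the traditional formula only sees the overall document frequency, not how the documents containing a term are distributed across or within classes. That missing class information is what a distribution-aware factor such as intra-class dispersion is meant to supply.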
TFIDF algorithm; Text classification; Feature selection; Feature weighting
Aizhang Guo, Tao Yang
Qilu University of Technology, Jinan 250353, China
International conference
Chongqing
English
415-419
2016-03-20 (date first posted online on the Wanfang platform; does not represent the paper's publication date)