Conference Topics

Improved term selection algorithm based on variance in text categorization

This article improves the term weighting algorithm used in automated text classification. The traditional TFIDF algorithm is a common method for measuring term weights in text classification. However, the algorithm does not take into account the distribution of terms across classes. To solve this problem, variance, which describes the inter-class and intra-class distribution of terms, is used to revise the TFIDF algorithm. This article mainly studies the construction of LFHW term sets and new approaches to term weighting. These new approaches are also applied to a hierarchical classification system. Simulation experiments demonstrate that the improved TFIDF algorithm achieves better classification results than the traditional TFIDF algorithm.

variance; text classification; term selection

Ran Li, Xianjiu Guo

Information Engineering College, Dalian Ocean University, Dalian, China

International conference

2013 2nd International Conference on Systems Engineering and Modeling (ICSEM-13)

Beijing

English

753-756

2013-04-19 (date first posted online on the Wanfang platform; not the publication date of the paper)