A Feature Weight Algorithm for Text Classification Based on Class Information
The TFIDF algorithm has been used for feature weighting in text classification, but the classification results were not very good because the feature weights lacked class information. The known class information in the training set was used to improve the traditional TFIDF feature weight algorithm. Class distinction ability and class description ability were introduced, expressed respectively by inverse class frequency and by term frequency in class together with document frequency in class. A new feature weight algorithm based on class information, TF_IDT, was proposed. A Naive Bayes classifier was used to test the algorithm. Precision, recall and the F1 measure increased significantly; the macro F1 measure rose by 6.46%. This showed that using class information in feature weighting is useful for improving text classification. In addition, the computational complexity of the proposed algorithm was lower, making it more suitable for use in fields with limited computing capability.
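The three class-level quantities named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formula, so everything below is an assumption made for illustration: the function names (`class_statistics`, `inverse_class_frequency`, `class_weight`), the smoothed logarithm log(1 + C/cf) used for inverse class frequency, and the plain product used to combine the three factors. It should not be read as the TF_IDT weight itself.

```python
import math
from collections import Counter, defaultdict


def class_statistics(docs, labels):
    """Collect per-class counts from tokenized documents and their labels."""
    term_count = defaultdict(Counter)  # class -> term -> occurrences in class
    doc_count = defaultdict(Counter)   # class -> term -> documents containing term
    docs_per_class = Counter(labels)   # class -> number of documents
    for tokens, label in zip(docs, labels):
        term_count[label].update(tokens)
        doc_count[label].update(set(tokens))
    return term_count, doc_count, docs_per_class


def inverse_class_frequency(term, term_count):
    """Class distinction ability: larger when the term occurs in fewer classes.

    The smoothed form log(1 + C / cf) is an assumption of this sketch.
    """
    n_classes = len(term_count)
    cf = sum(1 for counts in term_count.values() if counts[term] > 0)
    return math.log(1 + n_classes / cf) if cf else 0.0


def class_weight(term, label, stats):
    """Hypothetical combination: (term frequency in class)
    * (document frequency in class) * (inverse class frequency)."""
    term_count, doc_count, docs_per_class = stats
    in_class_terms = term_count.get(label, Counter())
    in_class_docs = doc_count.get(label, Counter())
    total_terms = sum(in_class_terms.values())
    n_docs = docs_per_class.get(label, 0)
    tf_c = in_class_terms[term] / total_terms if total_terms else 0.0
    df_c = in_class_docs[term] / n_docs if n_docs else 0.0
    return tf_c * df_c * inverse_class_frequency(term, term_count)


# Toy training set, invented purely for illustration.
docs = [
    ["goal", "match", "team"],
    ["team", "goal", "goal"],
    ["stock", "market", "team"],
]
labels = ["sport", "sport", "finance"]
stats = class_statistics(docs, labels)

for term in ("goal", "team"):
    print(term, round(class_weight(term, "sport", stats), 4))
```

In this toy set, "goal" appears only in the sport class while "team" appears in both classes, so "goal" receives the larger weight for sport. That is the behaviour class distinction ability is meant to capture.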
text classification; feature weight; inverse class frequency; term frequency in class; document frequency in class
LI Yong-fei
Department of Computer, North China Institute of Science and Technology, Beijing, China
International conference
Taiyuan
English
930-932
2012-12-08 (date first posted online on the Wanfang platform; not the paper's publication date)