An Improved Information Gain Algorithm Based on Relative Document Frequency Distribution
Feature selection plays an important role in text categorization. To address drawbacks of the traditional information gain (IG) approach and of recently improved variants, an improved IG feature selection method based on relative document frequency distribution is proposed. The method combines three considerations: reducing the impact of unbalanced data sets and low-frequency characteristics, the frequency distribution of features within a category, and the relative document frequency distribution of features among different categories. Experimental results on the NLPCC-ICCPOL 2016 task of stance detection in Chinese microblogs show that the improved method performs better in feature selection than the traditional IG approach and another improved method.
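For orientation, the traditional IG baseline that the abstract compares against can be sketched as follows. This is a minimal illustration of the textbook definition (category entropy minus the entropy conditioned on term presence), not the improved method proposed in the paper. The function names `entropy` and `information_gain` and the toy documents are assumptions made for this example.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a list of category counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Textbook IG of a term: H(C) - H(C | term present or absent).

    docs   -- list of token sets, one per document (hypothetical input format)
    labels -- list of category labels, aligned with docs
    """
    n = len(docs)
    all_counts = Counter(labels)
    # Category counts among documents that contain / do not contain the term.
    with_term = Counter(lab for doc, lab in zip(docs, labels) if term in doc)
    without_term = Counter(lab for doc, lab in zip(docs, labels) if term not in doc)
    n_with = sum(with_term.values())
    n_without = n - n_with
    h_class = entropy(list(all_counts.values()))
    h_cond = (n_with / n) * entropy(list(with_term.values())) \
        + (n_without / n) * entropy(list(without_term.values()))
    return h_class - h_cond


# Toy corpus (invented for illustration): two categories, four documents.
docs = [{"good", "film"}, {"good", "plot"}, {"bad", "film"}, {"bad", "plot"}]
labels = ["pos", "pos", "neg", "neg"]
ig_good = information_gain(docs, labels, "good")
ig_film = information_gain(docs, labels, "film")
```

In this toy corpus "good" separates the two categories perfectly while "film" appears equally in both, so the first term scores higher. Because this baseline uses only presence or absence of a term across the whole corpus, it ignores within-category and between-category frequency distributions, which are the aspects the abstract says the improved method takes into account.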
Feature selection; Information gain; Relative document frequency distribution; Low-frequency characteristic
Jian Peng, Xiao-Hua Yang, Chun-Ping Ouyang, Yong-Bin Liu
School of Computer Science and Technology, University of South China, Hengyang 421001, China
International conference
The 5th Conference on Natural Language Processing and Chinese Computing (NLPCC-ICCPOL 2016)
Kunming
English
1-8
2016-12-02 (date first made available online on the Wanfang platform; this is not the paper's publication date)