Conference Topic

Complete Gini-Index Text (GIT) Feature-Selection Algorithm for Text Classification

The recently introduced Gini-Index Text (GIT) feature-selection algorithm for text classification, which incorporates an improved Gini Index for better feature-selection performance, has some drawbacks. Specifically, under real-world experimental conditions, the algorithm concentrates feature values at one point and is inadequate for selecting representative features. As a result, good representative features cannot be estimated, and good performance cannot be achieved in unbalanced text classification. Therefore, we propose a new, complete GIT feature-selection algorithm for text classification. According to experimental results, the new algorithm obtained unbiased feature values and eliminated many irrelevant and redundant features from feature subsets while retaining many representative features. Furthermore, compared with the original version, the new algorithm demonstrated notably improved overall classification performance.

Gini Index; feature selection; text classification

Heum Park, Soonho Kwon, Hyuk-Chul Kwon

AI Lab., Dept. of Computer Science, Pusan National University, Busan, Korea

International conference

The 2nd International Conference on Software Engineering and Data Mining (IEEE SEDM 2010)

Chengdu

English

301-306

2010-06-23 (date first posted online on the Wanfang platform; not the paper's publication date)