Conference Topic

Minority Split and Gain Ratio for a Class Imbalance

A decision tree is one of the most popular classifiers and classifies a balanced data set effectively. For an imbalanced data set, a standard decision tree tends to misclassify instances of a class that has only a tiny number of samples. In this paper, we modify the decision tree induction algorithm by performing a ternary split on continuous-valued attributes, focusing on the distribution of minority class instances. The algorithm uses the minority variance to rank the candidates with high gain ratio, then chooses the candidate with the minimum minority entropy. In our experiments with data sets from the UCI and Statlog repositories, this method achieves better performance on imbalanced data sets than C4.5, which uses only gain ratio.
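For readers unfamiliar with the gain ratio criterion mentioned in the abstract, the following is a minimal sketch of the standard C4.5-style gain ratio evaluated for a split of a continuous attribute at given thresholds (two thresholds give a ternary split). The function names `entropy` and `gain_ratio` and the `thresholds` parameter are illustrative assumptions. The abstract does not define the minority variance or minority entropy, nor how candidate thresholds are generated, so those steps are not shown, and this is not the authors' implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, thresholds):
    """Standard gain ratio for splitting a continuous attribute.

    `thresholds` is a hypothetical parameter: one cut point gives a binary
    split, two cut points give a ternary split. How the paper picks the
    cut points is not stated in the abstract.
    """
    cuts = sorted(thresholds)
    parts = [[] for _ in range(len(cuts) + 1)]
    for v, y in zip(values, labels):
        # Index of the interval that v falls into.
        parts[sum(v > t for t in cuts)].append(y)

    n = len(labels)
    parts = [p for p in parts if p]  # ignore empty intervals
    remainder = sum(len(p) / n * entropy(p) for p in parts)
    gain = entropy(labels) - remainder
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in parts)
    # A split that leaves all instances in one interval carries no information.
    return gain / split_info if split_info > 0 else 0.0
```

For example, `gain_ratio([1, 2, 3, 4, 5, 6], ["a", "a", "b", "b", "a", "a"], [2.5, 4.5])` evaluates a ternary split that isolates the two minority instances in the middle interval.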

Gain ratio; Minority split; Decision tree; Class imbalance; Classification

Kesinee Boonchuay; Krung Sinapiromsaran; Chidchanok Lursinsap

Department of Mathematics, Faculty of Science, Chulalongkorn University, Bangkok, Thailand

International conference

2011 Eighth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2011)

Shanghai

English

2114-2118

2011-07-26 (date first posted online on the Wanfang platform; not the paper's publication date)