Conference Topic

Cluster-based Majority Under-Sampling Approaches for Class Imbalance Learning

Class imbalance arises frequently in real applications: the number of training examples in one class may be far smaller than in another. Under-sampling is a popular way to deal with this problem. It is efficient because it uses only a subset of the majority class, but its drawback is that it discards many potentially useful majority class examples. To overcome this drawback, we adopt an unsupervised learning technique for supervised learning. We propose cluster-based majority under-sampling approaches for selecting a representative subset from the majority class. Compared to random under-sampling, cluster-based under-sampling can effectively avoid the loss of important majority class information. We adopt two methods to select a representative subset from k clusters in certain proportions, and then use the representative subset together with all minority class samples as training data to improve accuracy on both the minority and majority classes. In the paper, we compare the behavior of our approaches with the traditional random under-sampling approach on ten UCI repository datasets using two classifiers: k-nearest neighbor and Naive Bayes. Recall, Precision, F-measure, G-mean and BACC (balanced accuracy) are used to evaluate classifier performance. Experimental results show that our cluster-based majority under-sampling approaches outperform the random under-sampling approach. Our approaches attain better overall performance with the k-nearest neighbor classifier than with the Naive Bayes classifier.
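The abstract does not spell out the two selection methods, so the following is only a minimal sketch of the general idea: cluster the majority class, draw samples from each cluster, and combine them with all minority samples. It assumes a plain k-means clustering, a quota per cluster proportional to cluster size, and a target subset size equal to the minority class size. None of these choices is confirmed by the record, and the function names `kmeans` and `cluster_undersample` are hypothetical.

```python
import random
from math import dist


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a cluster index for every point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by Euclidean distance.
        labels = [min(range(k), key=lambda c: dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(x) / len(members)
                                   for x in zip(*members))
    return labels


def cluster_undersample(majority, minority, k=3, seed=0):
    """Keep roughly len(minority) majority samples, spread over k clusters.

    Assumption: each cluster contributes in proportion to its size,
    with at least one sample from every non-empty cluster.
    """
    rng = random.Random(seed)
    labels = kmeans(majority, k, seed=seed)
    target = min(len(minority), len(majority))
    subset = []
    for c in range(k):
        members = [p for p, l in zip(majority, labels) if l == c]
        if not members:
            continue
        quota = round(target * len(members) / len(majority))
        quota = min(len(members), max(1, quota))
        subset.extend(rng.sample(members, quota))
    return subset


# Synthetic illustration data (not from the paper): 90 majority points
# in three blobs and 10 minority points.
gen = random.Random(42)
majority = [(gen.gauss(cx, 0.5), gen.gauss(cy, 0.5))
            for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(30)]
minority = [(gen.gauss(2.5, 0.5), gen.gauss(2.5, 0.5)) for _ in range(10)]

subset = cluster_undersample(majority, minority, k=3)
training = subset + minority  # representative majority subset + all minority
print(len(majority), "->", len(subset), "majority samples kept")
```

Because the quotas are rounded per cluster, the subset size can differ from the minority class size by a few samples. A classifier such as k-nearest neighbor would then be trained on `training` instead of the full imbalanced set.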

classification; clustering; under-sampling; class imbalance learning

Yan-Ping Zhang, Li-Na Zhang, Yong-Cheng Wang

School of Computer Science and Technology, Anhui University, Hefei, China

International conference

2010 2nd IEEE International Conference on Information and Financial Engineering (ICIFE 2010)

Chongqing

English

400-404

2010-09-17 (date first made available online on the Wanfang platform; not the paper's publication date)