Cluster-based Majority Under-Sampling Approaches for Class Imbalance Learning
The class imbalance problem often occurs in real applications. Class imbalance means that one class may have far fewer examples than another in the training set. Under-sampling is a very popular approach to this problem. It is very efficient because it uses only a subset of the majority class. Its drawback is that it throws away many potentially useful majority class examples. To overcome this drawback, we apply an unsupervised learning technique to supervised learning: we propose cluster-based majority under-sampling approaches that select a representative subset from the majority class. Compared with random under-sampling, cluster-based under-sampling can effectively avoid losing important information from the majority class. We adopt two methods that select a representative subset from k clusters in certain proportions. We then use this representative subset together with all minority class samples as training data, to improve accuracy on both the minority and the majority classes. In the paper, we compare the behavior of our approaches with the traditional random under-sampling approach on ten UCI repository datasets, using two classifiers: k-nearest neighbor and Naïve Bayes. Recall, Precision, F-measure, G-mean and BACC (balanced accuracy) are used to evaluate classifier performance. Experimental results show that our cluster-based majority under-sampling approaches outperform the random under-sampling approach. Our approaches attain better overall performance with the k-nearest neighbor classifier than with the Naïve Bayes classifier.
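The selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract names neither the clustering algorithm nor the details of its two selection methods. The sketch therefore assumes plain k-means (Lloyd's algorithm) on numeric feature vectors, plus one hypothetical proportional rule: each cluster contributes to a majority subset roughly the size of the minority class, in proportion to the cluster's own size. The names `kmeans` and `cluster_undersample` are hypothetical.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means on a list of numeric tuples; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # initial centers: k random points
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center (squared Euclidean).
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_undersample(majority, minority, k, seed=0):
    """Hypothetical proportional rule (an assumption, not the paper's exact method):
    build a majority subset of about len(minority) points by sampling from each
    k-means cluster in proportion to that cluster's size."""
    rng = random.Random(seed)
    labels = kmeans(majority, k, seed=seed)
    target = len(minority)
    selected = []
    for c in range(k):
        members = [p for p, lab in zip(majority, labels) if lab == c]
        # Per-cluster quota; rounding can make the total differ from target slightly.
        quota = min(len(members), round(target * len(members) / len(majority)))
        selected.extend(rng.sample(members, quota))
    # Training set: the representative majority subset plus ALL minority examples.
    X = selected + list(minority)
    y = [0] * len(selected) + [1] * len(minority)  # 0 = majority, 1 = minority
    return X, y


if __name__ == "__main__":
    majority = [(i % 7, i % 5) for i in range(60)] + [(10 + i % 3, 10 + i % 4) for i in range(40)]
    minority = [(5.0, 5.0 + 0.1 * i) for i in range(10)]
    X, y = cluster_undersample(majority, minority, k=2)
    print(len(X), y.count(0), y.count(1))
```

Unlike random under-sampling, every cluster contributes examples here. Regions of the majority class that a single random draw might miss therefore remain represented, which is the intuition the abstract gives for reduced information loss.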
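The five evaluation measures listed above can all be computed from a binary confusion matrix. The abstract names the measures but does not give their formulas, so the sketch below assumes common definitions. The minority class is treated as positive. F-measure weights precision and recall equally (F1). G-mean is the geometric mean of minority-class recall and majority-class recall (specificity), and BACC is their arithmetic mean. `imbalance_metrics` is a hypothetical helper name.

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Recall, precision, F-measure, G-mean and BACC, with `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    recall = tp / (tp + fn) if tp + fn else 0.0        # minority-class recall (TPR)
    specificity = tn / (tn + fp) if tn + fp else 0.0   # majority-class recall (TNR)
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)       # F1: equal weight on P and R
    return {
        "recall": recall,
        "precision": precision,
        "f_measure": f_measure,
        "g_mean": math.sqrt(recall * specificity),
        "bacc": (recall + specificity) / 2,
    }


if __name__ == "__main__":
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
    print(imbalance_metrics(y_true, y_pred))
```

For the example, TP = 3, FN = 1, FP = 2 and TN = 4. This gives recall 0.75, precision 0.6, F-measure 2/3, G-mean ≈ 0.707 and BACC ≈ 0.708. Unlike plain accuracy, G-mean and BACC drop sharply when the minority class is poorly recognized, which is why they suit imbalanced data.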
classification; clustering; under-sampling; class imbalance learning
Yan-Ping Zhang, Li-Na Zhang, Yong-Cheng Wang
School of Computer Science and Technology, Anhui University, Hefei, China
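International conference
Chongqing
English
pp. 400-404
2010-09-17 (date the record first went online on the Wanfang platform; not the paper's publication date)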