Comparison of Classification Methods on Imbalanced Protein-Protein Interaction Text Set
Protein-protein interaction (PPI) networks are essential to understanding the fundamental processes governing cell biology. It is well known that document classification systems have the potential to accelerate the curation process by retrieving PPI-related documents. However, text classification often involves an imbalanced two-class data set, and learning from imbalanced data sets is an important challenge for the machine learning community. In this paper, we compare the performance of several document classifiers on one PPI document set, varying the number of positives and the ratio of positives to negatives in the training and test sets. Through these experiments, we try to determine which kind of classification algorithm is suitable for imbalanced PPI document classification.
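The experimental design described in the abstract, varying both the number of positives and the positive-to-negative ratio, can be illustrated with a minimal sketch. The function name `subsample`, its parameters, the toy document identifiers, and the particular sizes and ratios are all hypothetical and are not taken from the paper; the sketch only shows one plausible way to build such subsets before training a classifier.

```python
import random

def subsample(positives, negatives, n_pos, neg_per_pos, seed=0):
    """Draw n_pos positives and n_pos * neg_per_pos negatives without replacement.

    Hypothetical helper: returns a shuffled list of (document, label) pairs,
    with label 1 for PPI-related documents and 0 otherwise.
    """
    n_neg = n_pos * neg_per_pos
    if n_pos > len(positives) or n_neg > len(negatives):
        raise ValueError("not enough documents for the requested size and ratio")
    rng = random.Random(seed)
    sample = [(doc, 1) for doc in rng.sample(positives, n_pos)]
    sample += [(doc, 0) for doc in rng.sample(negatives, n_neg)]
    rng.shuffle(sample)
    return sample

# Toy corpus of placeholder document IDs (assumed, for illustration only).
positives = [f"ppi_doc_{i}" for i in range(50)]
negatives = [f"other_doc_{i}" for i in range(500)]

# Sweep over the number of positives and the negatives-per-positive ratio.
for n_pos in (10, 20):
    for neg_per_pos in (1, 5, 10):
        train = subsample(positives, negatives, n_pos, neg_per_pos)
        n_positive = sum(label for _, label in train)
        print(n_pos, neg_per_pos, len(train), n_positive)
```

Each subset produced this way could then be passed to the classifiers under comparison, so that differences in performance can be attributed to the set size and class ratio rather than to the choice of documents alone.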
Protein-Protein Interaction; Imbalanced text classification; Machine Learning
Guixian Xu, Xu Gao
College of Information Engineering, Minzu University of China, Beijing, China 100081 Minority Langu North China Grid Company Limited, Beijing, China 100053
International conference
Haikou
English
105-109
2011-02-22 (date first posted online on the Wanfang platform; not the paper's publication date)