Comparison of Classification Methods on Imbalanced Protein-Protein Interaction Text Set

Abstract:

Protein-protein interaction (PPI) networks are essential to understanding the fundamental processes governing cell biology. It is well known that document classification systems have the potential to accelerate the curation process by retrieving PPI-related documents. However, text classification data sets are usually imbalanced between the two classes. Learning from imbalanced data sets is an important challenge for the machine learning community. In this paper, we compare the performance of several document classifiers on one PPI document set, varying the number of positives and the ratio of positives to negatives in the training and testing sets. Through these experiments, we try to determine which kind of classification algorithm is suitable for imbalanced PPI document classification.

Keywords: Protein-Protein Interaction; Imbalanced text classification; Machine Learning

Authors: Guixian Xu, Xu Gao

Affiliations: College of Information Engineering, Minzu University of China, Beijing, China 100081 Minority Langu North China Grid Company Limited, Beijing, China 100053

Conference type: International conference

Conference name: 2011 International Conference on Bioinformatics and Computational Biology (ICBCB 2011)

Conference location: Haikou

Conference language: English

Pages: 105-109

Online publication date: 2011-02-22 (date first posted on the Wanfang platform; does not represent the paper's publication date)
