An Effective Data Mining Technique for the Multi-Class Protein Sequence Classification

Abstract:

One way to understand the molecular mechanisms of a cell is to understand the function of each protein encoded in its genome. A protein's function depends largely on the three-dimensional structure it assumes after folding. Because determining three-dimensional structure experimentally is difficult and expensive, an easier and cheaper approach is to examine a protein's primary sequence and infer its function by classifying the sequence into the corresponding functional family. In this paper, we propose an effective data mining technique for multi-class protein sequence classification. We evaluated the proposed technique on several different sets of protein sequences. Experimental results show that it outperforms other existing protein sequence classifiers and can effectively classify proteins into their corresponding functional families.

Keywords: protein sequence classification; bioinformatics; data mining

Authors: Patrick C.H. Ma, Keith C.C. Chan

Affiliation: Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, China

Conference type: International conference

Conference: The 2nd International Conference on Bioinformatics and Biomedical Engineering (iCBBE 2008)

Location: Shanghai

Language: English

Pages: 486-489

Online publication date: 2008-05-16 (the date the record was first posted on the Wanfang platform, not the paper's publication date)
