A data mining approach to predict protein secondary structure

Abstract:

In bioinformatics, proteins are encoded as strings called primary structures. Biologists have long gathered these primary structures in large databases. Numerous experiments and analyses have shown that a protein's primary structure correlates closely with its secondary structure. In this paper, we present a data mining approach based on machine learning techniques to predict protein secondary structure. Using a majority voting mechanism, the approach combines the predictions of a homology analysis classifier, a support vector machine (SVM) classifier, and a modified Knowledge Discovery in Databases (KDD*) process. The approach is validated on 2 different datasets. Its predictive accuracy exceeds that of the best secondary structure predictors by 2.00% on average.
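The majority voting step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe how the three predictors are implemented or how ties are resolved, so everything here is an assumption made for illustration: the function name `majority_vote`, the three-state label alphabet (H = helix, E = strand, C = coil), the example prediction strings, and the tie-breaking rule that falls back to one designated classifier.

```python
from collections import Counter


def majority_vote(predictions, fallback_index=0):
    """Combine per-residue label strings from several classifiers.

    predictions: list of equal-length strings, one per classifier,
    where each character is a secondary structure label (assumed H/E/C).
    fallback_index: which classifier to trust when no label has a strict
    majority (a hypothetical tie-breaking rule, not taken from the paper).
    """
    lengths = {len(p) for p in predictions}
    if len(lengths) != 1:
        raise ValueError("all predictions must cover the same residues")

    combined = []
    # zip(*predictions) yields, for each residue, the tuple of votes.
    for labels in zip(*predictions):
        label, count = Counter(labels).most_common(1)[0]
        if count * 2 > len(labels):
            # A strict majority of classifiers agree on this label.
            combined.append(label)
        else:
            # No strict majority (e.g. a three-way split): use the fallback.
            combined.append(labels[fallback_index])
    return "".join(combined)


# Made-up outputs standing in for the homology, SVM, and KDD* predictors.
homology = "HHHECC"
svm = "HHEECC"
kdd = "HCEEHC"
print(majority_vote([homology, svm, kdd]))  # HHEECC
```

With three voters and three possible labels, a three-way split can occur at any residue, so some tie-breaking rule is needed; the fallback to a single classifier used here is only one possible choice.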

Keywords: protein structure prediction; protein secondary structure; data mining

Authors: Bingru Yang, Haifeng Sui, QuWu, Lijun Wang

Affiliation: School of Information Engineering, University of Science and Technology Beijing, Beijing, China

Conference type: International conference

Conference name: The 2010 International Conference on Computer Application and System Modeling (ICCASM 2010)

Conference location: Taiyuan

Conference language: English

Pages: 589-593

Online publication date: 2010-10-22 (date first posted on the Wanfang platform; not the paper's publication date)
