Using PLS Variable Selection Method to Build the Model between the Number of Tria-coupled Amino Acid and the Number of Protein Secondary Structure
The relation between protein sequence and protein secondary structure is important and has been studied by building models. Building on the models between pair-coupled amino acids and protein secondary structure reported in the literature, models between the number of tria-coupled amino acids in a protein sequence and the number of protein secondary structures have been built. These models reflect the relation between protein sequence and secondary structure more accurately and are better suited to data in which the length of the protein sequence varies widely. Compared with the pair-coupled models, they contain more information about the coupling effects among the various kinds of amino acids and therefore achieve higher fitting accuracy. The data set in the study is very large: there are 4200 kinds of tria-coupled amino acids in the protein sequences, and 11600 samples are taken from the DSSP database. The results indicate that the PLS variable selection method is effective for this large-scale modeling problem, with 4200 variables and 11600 samples.
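As a rough illustration of the kind of predictor variables described above, the following sketch counts three-residue windows in a sequence. It rests on assumptions that the abstract does not confirm: that "tria-coupled" means three contiguous residues, that order within the window matters, and that windows containing non-standard residues are skipped. Under these assumptions there are 20^3 = 8000 possible kinds, not the 4200 reported, so the authors' definition evidently differs in a way the abstract does not specify. The names `AMINO_ACIDS`, `triplet_counts`, and `feature_vector` are hypothetical.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: every ordered triplet of standard residues is one variable.
ALL_TRIPLETS = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]


def triplet_counts(sequence):
    """Count contiguous three-residue windows in a protein sequence.

    Windows containing a non-standard residue (e.g. 'X') are skipped;
    this is an illustrative choice, not taken from the paper.
    """
    counts = Counter()
    for i in range(len(sequence) - 2):
        window = sequence[i:i + 3]
        if all(res in AMINO_ACIDS for res in window):
            counts[window] += 1
    return counts


def feature_vector(sequence):
    """Return the counts as a fixed-length list, one entry per triplet kind."""
    counts = triplet_counts(sequence)
    return [counts[t] for t in ALL_TRIPLETS]


if __name__ == "__main__":
    example = "ACDACD"  # toy sequence, not real data
    print(triplet_counts(example))
    print(len(feature_vector(example)))
```

Each sequence then yields one row of a count matrix, whatever its length, which is the form a PLS regression with variable selection would take as input.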
prediction of protein secondary structure; PLS variable selection; huge data modeling; tria-coupled amino acid
Zhu Eryi
Key Laboratory of Analytical Sciences and Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
International conference
The 6th International Conference on Partial Least Squares and Related Methods
Beijing
English
355-359
2009-09-04 (date first posted online on the Wanfang platform; not the paper's publication date)