Identification of DNA-binding residues of a protein from its primary sequence
Identifying DNA-binding residues in proteins is important in several areas, such as posttranscriptional regulation and the study of protein function. In this work, we propose a method that combines a novel hybrid feature with the random forest (RF) algorithm to predict DNA-binding residues from protein sequences. The hybrid feature comprises a secondary structure feature, predicted solvent accessibility, and a novel feature that combines evolutionary information with physicochemical properties. A comparison of the individual features indicates that the novel feature contributes most to the improvement in prediction. Our model achieves a Matthews correlation coefficient (MCC) of 0.7238 and an overall accuracy (ACC) of 92.67%, with a sensitivity (SE) of 78.96% and a specificity (SP) of 94.56%. These results indicate that the model achieves substantially improved prediction of DNA-binding sites in proteins.
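The four figures reported above (MCC, ACC, SE, SP) are residue-level metrics derived from a binary confusion matrix, where a "positive" is a DNA-binding residue. The abstract does not spell out their formulas, so the following is a minimal Python sketch assuming the standard definitions. The function name `binary_metrics` is hypothetical, and the counts in the usage line are illustrative, not the paper's data.

```python
import math


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Standard metrics for a binding / non-binding residue classifier.

    Assumes the usual confusion-matrix definitions; tp = binding residues
    predicted as binding, tn = non-binding residues predicted as non-binding.
    """
    total = tp + tn + fp + fn
    se = tp / (tp + fn) if (tp + fn) else 0.0   # sensitivity: binding residues recovered
    sp = tn / (tn + fp) if (tn + fp) else 0.0   # specificity: non-binding residues rejected
    acc = (tp + tn) / total if total else 0.0   # overall accuracy
    # Matthews correlation coefficient; defined as 0 when any margin is empty
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SE": se, "SP": sp, "ACC": acc, "MCC": mcc}


if __name__ == "__main__":
    # Hypothetical counts for illustration only
    print(binary_metrics(tp=80, tn=850, fp=50, fn=20))
```

MCC is reported alongside ACC because binding residues are typically a small minority of all residues: a predictor that labels everything as non-binding can still reach high accuracy, whereas its MCC would be 0.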
Random forest; DNA-binding residues; Position-specific scoring matrix
Xin Ma, Lefu Hu
Golden Audit College, Nanjing Audit University, Nanjing 210029, P. R. China; Physical Education Department, Nanjing Audit University, Nanjing 210029, P. R. China
International conference
Hangzhou
English
pp. 290-293
2012-10-28 (date first posted online on the Wanfang platform; not the paper's publication date)