Identification of DNA-binding residues of a protein from its primary sequence

Abstract:

Identifying DNA-binding residues in proteins is important in several areas, such as posttranscriptional regulation and the study of protein function. In this work, we propose a method that combines a novel hybrid feature with the random forest (RF) algorithm to predict DNA-binding residues from protein sequences. The hybrid feature comprises a secondary structure feature, predicted solvent accessibility, and a novel feature that combines evolutionary information with physicochemical properties. Furthermore, a performance comparison of the individual features indicates that the novel feature contributes most to the improvement in prediction. The results show that our model achieves a Matthews correlation coefficient (MCC) of 0.7238 and an overall accuracy (ACC) of 92.67%, with a sensitivity (SE) of 78.96% and a specificity (SP) of 94.56%. The model clearly delivers significantly better prediction performance for DNA-binding sites in proteins.
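
The abstract reports four evaluation measures (SE, SP, ACC, MCC). As a minimal sketch, assuming the standard confusion-matrix definitions of these measures (the record itself does not give the formulas), they can be computed as follows. The function name `binary_metrics` and the counts used are hypothetical and for illustration only; they are not taken from the paper.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Return SE, SP, ACC, and MCC from binary confusion-matrix counts.

    Assumes the standard definitions; binding residues are the positive class.
    """
    se = tp / (tp + fn)                      # sensitivity: recall on binding residues
    sp = tn / (tn + fp)                      # specificity: recall on non-binding residues
    acc = (tp + tn) / (tp + tn + fp + fn)    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; return 0.0 in that case by convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SE": se, "SP": sp, "ACC": acc, "MCC": mcc}


# Hypothetical counts for illustration only; not results from the paper.
metrics = binary_metrics(tp=80, tn=900, fp=50, fn=20)
```

Because non-binding residues typically far outnumber binding residues, ACC can stay high even when SE is modest, which is one reason MCC is commonly reported alongside it.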

Keywords: Random forest; RNA-binding residues; Position specific scoring matrix

Authors: Xin Ma; Lefu Hu

Affiliations: Golden Audit College, Nanjing Audit University, Nanjing 210029, P. R. China; Physical Education Department, Nanjing Audit University, Nanjing 210029, P. R. China

Conference type: International conference

Conference name: 2012 Fifth International Symposium on Computational Intelligence and Design (ISCID 2012)

Conference location: Hangzhou

Conference language: English

Pages: 290-293

Online publication date: 2012-10-28 (date first posted on the Wanfang platform; not the publication date of the paper)
