Chaos game representation for discriminating thermophilic from mesophilic protein sequences
Can sequence analysis tell us about the function of a protein? A basic question in protein science is which kinds of proteins exhibit thermostability. Chaos game representation (CGR) can reveal patterns hidden in protein sequences, visually exposing previously unknown structure. In this paper, we convert every protein sequence into a 20-dimensional vector using the CGR algorithm and, based on these vectors, discriminate thermophilic from mesophilic protein sequences using a support vector machine (SVM). The overall accuracy reaches 100% in the resubstitution test and 87.12% in the jackknife test. Moreover, the Matthews correlation coefficient (MCC) is 0.745.
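To make the two quantitative ideas in the abstract concrete, the following is a minimal sketch in Python, not the authors' implementation. It shows a generic chaos game iteration over a protein sequence and the standard MCC formula computed from confusion-matrix counts. The abstract does not say how the 20-dimensional vector is derived from the CGR, so that step is not reproduced here. The vertex layout (a regular 20-gon on the unit circle), the alphabet order, the contraction ratio of 0.5, and the function names `cgr_points` and `matthews_cc` are all assumptions made for illustration.

```python
import math

# Assumed alphabet order; the source does not state one.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: each residue type sits on a vertex of a regular 20-gon
# inscribed in the unit circle.
VERTICES = {
    aa: (math.cos(2 * math.pi * i / 20), math.sin(2 * math.pi * i / 20))
    for i, aa in enumerate(AMINO_ACIDS)
}


def cgr_points(sequence, ratio=0.5):
    """Return the chaos game trajectory of a protein sequence.

    Starting at the origin, each step moves the current point a fraction
    `ratio` of the way toward the vertex of the next residue.
    Characters outside AMINO_ACIDS are skipped.
    """
    x, y = 0.0, 0.0
    points = []
    for aa in sequence.upper():
        if aa not in VERTICES:
            continue
        vx, vy = VERTICES[aa]
        x += ratio * (vx - x)
        y += ratio * (vy - y)
        points.append((x, y))
    return points


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column sum is zero
    return (tp * tn - fp * fn) / denom
```

In a jackknife (leave-one-out) test, each sequence is held out in turn, the classifier is trained on the rest, and the held-out prediction is tallied into the counts passed to `matthews_cc`.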
Key Words and Phrases: CGR, thermophilic, mesophilic, protein sequence, support vector machine
Xue-Hai Hu, Jing-Bo Xia, Xiao-Hui Niu, Xuan Ma, Chao-Hong Song, Feng Shi
College of Science, Huazhong Agricultural University, Wuhan, Hubei 430070
International conference
Beijing
English
1-4
2009-06-11 (date first posted online on the Wanfang platform; this does not represent the paper's publication date)