Conference Topic

MINING SPECIES-SPECIFIC SUBSEQUENCES IN BACTERIA TRANSCRIPTION TERMINATORS

Transcription Terminators (TT) play an important role in bacterial RNA transcription. Some bacteria are known to have Species-Specific Subsequences (SSS) in their TTs, which might contain useful clues to bacterial evolution. Given the DNA sequences of known TTs, how to find the SSS effectively is an interesting yet not well-studied computational problem. In this paper, we present a three-step method that identifies the SSS using Support Vector Machines (SVM). First, we find all frequent subsequences using generalized suffix trees. Second, we use these subsequences as features to vectorize the DNA sequences and train an SVM classifier. Finally, we extract the SSS from the SVM classifier based on a defined measure of subsequence specificity. Our experiments show that the SSS found by the method are very close to the known SSS. We conclude that our method demonstrates a novel application of classification to biology and is applicable to similar problems.
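The three steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: frequent subsequences are taken to be contiguous substrings and are counted by brute-force enumeration rather than with a generalized suffix tree; the classifier is a small linear SVM trained by hinge-loss subgradient descent; and the learned feature weight stands in for the specificity measure, which the abstract does not define. All function names, parameters, and toy sequences are hypothetical.

```python
from collections import Counter


def frequent_substrings(seqs, min_support, min_len=4, max_len=4):
    """Step 1 (stand-in): substrings present in at least min_support sequences.

    Assumption: brute-force enumeration replaces the generalized suffix tree.
    """
    support = Counter()
    for s in seqs:
        seen = set()
        for k in range(min_len, max_len + 1):
            for i in range(len(s) - k + 1):
                seen.add(s[i:i + k])
        support.update(seen)  # each sequence counts once per substring
    return sorted(sub for sub, c in support.items() if c >= min_support)


def vectorize(seq, features):
    """Step 2a: binary presence/absence vector over the feature substrings."""
    return [1.0 if f in seq else 0.0 for f in features]


def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=50):
    """Step 2b: linear SVM via subgradient descent on the regularized hinge loss.

    Labels y must be +1 (target species) or -1 (other species).
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 shrinkage
            if margin < 1.0:  # hinge loss is active for this example
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b


def rank_by_weight(features, w):
    """Step 3 (assumed measure): rank features by SVM weight, highest first."""
    return sorted(zip(features, w), key=lambda fw: fw[1], reverse=True)


# Toy data, invented for illustration only.
target = ["ATGGCCTT", "CAGGCCAT", "TTGGCCGA"]
others = ["ATTTACGA", "CATTACAT", "TTTACGGA"]

features = frequent_substrings(target + others, min_support=3)
X = [vectorize(s, features) for s in target + others]
y = [1] * len(target) + [-1] * len(others)
w, b = train_linear_svm(X, y)
ranking = rank_by_weight(features, w)
```

In this toy run, the substring ranked first is the candidate most specific to the target species under the assumed weight-based measure. A real study would replace the brute-force step with a suffix-tree index and the hand-written trainer with an established SVM library.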

sequence mining; transcription terminators; species-specific sequences; suffix tree; support vector machine

Baohua Gu, Yi Sun

School of Computing Science, Simon Fraser University, 8888 University Drive, Burnaby, BC, Canada

International conference

China-Ireland International Conference on Information and Communications Technologies 2008 (CIICT 2008)

Beijing

English

1-5

2008-09-26 (date first made available online on the Wanfang platform; not the paper's publication date)