Conference topic

Prokaryote Gene Data Classifier Design Based on SVM

Gene recognition is one of the important problems in bioinformatics and spans many classic experiments as well as theoretical and algorithmic research. The E. coli K12 whole-genome sequence and gene annotation files from GenBank were analyzed in preparation for gene prediction. First, the four types of gene distribution were analyzed. Second, non-coding samples were generated from the intervals between discrete genes, and the training set was constructed from all gene samples and non-gene fragments. Third, the probability densities of the GC ratio and length features of the training samples were plotted using the Parzen window method. The average GC ratios of the gene and non-coding samples are 0.51 and 0.45, respectively, and their average lengths are 954 and 164 nucleotides, respectively. Finally, a Fisher linear classifier and a support vector machine (SVM) were used to classify the gene and non-gene patterns. The results show that the error rate of the least squares support vector machine is 14.8%, which is 1.3% lower than that of the Fisher classifier.
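The feature step described in the abstract (GC ratio followed by a Parzen window density estimate) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `gc_ratio` and `parzen_density`, the Gaussian kernel, the bandwidth `h`, and the toy sequences are all assumptions made for illustration, since the abstract does not state the kernel or window width used.

```python
import math


def gc_ratio(seq):
    """Return the fraction of G and C nucleotides in a sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def parzen_density(x, samples, h):
    """Parzen window density estimate at x.

    Assumes a Gaussian kernel with window width h; the abstract does not
    say which kernel the paper used.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if h <= 0:
        raise ValueError("window width h must be positive")
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)


# Made-up toy sequences, not data from the paper.
gene_like = ["ATGGCGCTGACC", "ATGCCGGGCTAA"]
noncoding_like = ["ATTATAAATTGA", "TTAAATCGATAT"]

gene_gc = [gc_ratio(s) for s in gene_like]
noncoding_gc = [gc_ratio(s) for s in noncoding_like]

# Evaluate both class-conditional densities on a coarse grid of GC ratios.
for x in (0.2, 0.4, 0.6):
    p_gene = parzen_density(x, gene_gc, h=0.1)
    p_non = parzen_density(x, noncoding_gc, h=0.1)
    print(f"GC={x:.1f}  p(x|gene)={p_gene:.3f}  p(x|non-coding)={p_non:.3f}")
```

Plotting the two estimated curves over a finer grid would give the kind of per-class density plot the abstract mentions; the same estimator could be applied to the length feature.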

Gene recognition; Fisher classifier; least squares support vector machines; GC ratio component

LI Xiao-xia, SUN Bo, HAN Xue-mei, ZHANG Ji-hong

School of Information Engineering, Southwest University of Science and Technology, Mianyang 621010, China

International conference

The 3rd International Conference on Bioinformatics and Biomedical Engineering (iCBBE 2009)

Beijing

English

1-4

2009-06-11 (date first posted online on the Wanfang platform; not the publication date of the paper)