Algorithm for finding coding signal using homogeneous Markov chains independently for three codon positions
Many currently used algorithms for predicting protein-coding sequences require large learning sets of true genes to estimate reasonable parameter values. They also fail to recognize short genes, which usually contain a weak coding signal. To avoid these problems, we developed a new algorithm for finding protein-coding potential in prokaryotic genomes. The algorithm uses homogeneous Markov chains to model nucleotide transitions between fixed positions in codons, which reduces the order of the Markov chain while retaining information on dependence between nucleotides over relatively long distances in the sequence. We tested how the performance of the algorithm, measured by true and false positive rates, depends on the size of the learning set for different model orders. We also compared our algorithm with the commonly used GeneMark. The presented algorithm works better, especially for smaller learning sets.
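The abstract does not give implementation details, so the following is only a minimal sketch of one possible reading of the idea: for each of the three codon positions, the nucleotides found at that position in consecutive codons are treated as a separate sequence, and a homogeneous Markov chain is fitted to it. The function names, the pseudocount smoothing, the uniform fallback for unseen contexts, and the log-odds score against a non-coding model are all assumptions made for illustration, not the authors' method. Sequences are assumed to be in frame and to contain only the uppercase letters A, C, G, T.

```python
from collections import defaultdict
from math import log

ALPHABET = "ACGT"  # assumption: input contains only these uppercase letters


def position_subsequences(seq):
    """Split an in-frame sequence into three strings, one per codon position."""
    return [seq[k::3] for k in range(3)]


def train(seqs, order=1, pseudocount=1.0):
    """Fit one transition table per codon position (hypothetical helper).

    The context of a nucleotide is the `order` nucleotides found at the same
    codon position in the preceding codons, so an order-1 chain already spans
    three bases of the original sequence.
    """
    counts = [
        defaultdict(lambda: dict.fromkeys(ALPHABET, pseudocount))
        for _ in range(3)
    ]
    for seq in seqs:
        for k, sub in enumerate(position_subsequences(seq)):
            for i in range(order, len(sub)):
                counts[k][sub[i - order:i]][sub[i]] += 1

    models = []
    for k in range(3):
        table = {}
        for context, row in counts[k].items():
            total = sum(row.values())
            table[context] = {nuc: c / total for nuc, c in row.items()}
        models.append(table)
    return models


def log_likelihood(seq, models, order=1):
    """Sum log transition probabilities over the three position-specific chains.

    Contexts never seen in training fall back to a uniform distribution
    (an assumption of this sketch).
    """
    uniform = 1.0 / len(ALPHABET)
    total = 0.0
    for k, sub in enumerate(position_subsequences(seq)):
        for i in range(order, len(sub)):
            row = models[k].get(sub[i - order:i])
            total += log(row[sub[i]] if row else uniform)
    return total


def coding_score(seq, coding_models, noncoding_models, order=1):
    """Log-odds of the coding model against a non-coding model (assumed score)."""
    return (log_likelihood(seq, coding_models, order)
            - log_likelihood(seq, noncoding_models, order))
```

In this sketch a chain of order m at each codon position has 4^m contexts but covers 3m bases of the original sequence, which is one way to read the abstract's remark about reducing the order while keeping longer-range dependence.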
ORF; gene finding; Markov chains
Pawel Blazej, Pawel Mackiewicz, Stanislaw Cebrat
Department of Genomics, Faculty of Biotechnology, University of Wroclaw, ul. Przybyszewskiego 63/77, 51-148 Wroclaw, Poland
International conference
Haikou
English
20-24
2011-02-22 (date first posted online on the Wanfang platform; not the publication date of the paper)