Classification of Splice Junction DNA Sequence through Data Mining Techniques
Data mining on DNA sequences is gaining importance in current research as researchers and clinicians place greater emphasis on detecting genetic markers for disease prediction and on developing new drugs for therapeutic purposes. This paper highlights the role of machine learning algorithms in classifying a given splice gene sequence into three classes (Intron-Exon, Exon-Intron, Neither). These classes mark the boundaries between the DNA needed for protein creation and the superfluous DNA that is spliced out during protein synthesis. Nine classification algorithms were applied to a splice-junction dataset of 3190 DNA sequences from the KEEL data repository, each 60 nucleotides long, to detect the boundaries between introns and exons; this will further aid the analysis of genetic markers and the understanding of how proteins are synthesized. Quinlan's C4.5 algorithm and the Random Tree algorithm both reach a classification accuracy of 99.97% on this dataset. The validity of the results was verified by classifying test datasets with the designed classification framework. This work will enable accurate prediction of splice junctions in DNA sequences whose class label is unknown.
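The abstract names Quinlan's C4.5 as one of the best-performing classifiers. As a minimal sketch of the split criterion C4.5 uses (gain ratio), assuming each nucleotide position is treated as a categorical attribute with values A, C, G and T, the Python below scores every position and picks the most informative one. The toy sequences, the class codes EI/IE/N and the function names are hypothetical illustrations, not the authors' framework or the KEEL data. It is also simplified: a full C4.5 tree applies this choice recursively, restricts candidates to attributes with at least average information gain, and prunes the result.

```python
import math
from collections import Counter, defaultdict


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(seqs, labels, pos):
    """C4.5-style gain ratio for splitting on the nucleotide at index `pos`."""
    # Partition the class labels by the nucleotide found at this position.
    groups = defaultdict(list)
    for seq, label in zip(seqs, labels):
        groups[seq[pos]].append(label)
    n = len(labels)
    # Expected class entropy remaining after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalises positions that fragment the data into many branches.
    split_info = entropy([seq[pos] for seq in seqs])
    return info_gain / split_info if split_info > 0 else 0.0


def best_position(seqs, labels):
    """Position with the highest gain ratio (simplified: no average-gain filter; first wins ties)."""
    return max(range(len(seqs[0])), key=lambda p: gain_ratio(seqs, labels, p))


# Hypothetical toy data (NOT from the KEEL dataset): 4-nucleotide windows stand in
# for the 60-nucleotide sequences; EI = exon-intron, IE = intron-exon, N = neither.
seqs = ["AGTC", "CGAT", "TAGC", "GACA", "ACTG", "TCGA"]
labels = ["EI", "EI", "IE", "IE", "N", "N"]

print(best_position(seqs, labels))            # 1
print(round(gain_ratio(seqs, labels, 1), 3))  # 1.0
print(round(gain_ratio(seqs, labels, 0), 3))  # 0.479
```

In this toy set, position 1 separates the three classes perfectly, so its gain ratio is 1.0. The other positions only partly separate the classes and score lower.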
Data mining; Classification; Splice gene sequence; Clinical data
Shomona Gracia Jacob, R. Geetha Ramani, P. Nancy
Research Scholar, Dept. of Computer Science and Engineering, Rajalakshmi Engineering College, Thandalam
Professor & Head, Dept. of Computer Science and Engineering, Rajalakshmi Engineering College, Thandalam
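International conference
Harbin
English
pp. 143-148
2012-05-19 (date first posted online on the Wanfang platform; this is not the publication date of the paper)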