Conference topic

Classification of Splice Junction DNA sequence through Data mining techniques

Data mining on DNA sequences is gaining importance in state-of-the-art research as researchers and clinicians place more emphasis on detecting genetic markers for disease prediction and on inventing new drugs for therapeutic purposes. This paper highlights the role played by machine learning algorithms in classifying a given splice gene sequence into three classes (Intron-Exon, Exon-Intron, neither) that differentiate between the DNA needed for protein creation and the superfluous DNA removed during protein generation. This research work involves the execution of nine classification algorithms on the splice junctions of 3190 DNA sequences taken from the Keel data repository, each having 60 nucleotides, to detect the boundaries between introns and exons; this will further aid in analyzing genetic markers and understanding the mechanism of protein synthesis. Quinlan's C4.5 algorithm and the Random Tree classification algorithm reveal 99.97% classifier accuracy on this dataset. The validity of the results has been verified by classifying test data sets using the crafted classification framework. This work will enable accurate prediction of splice junctions in a DNA sequence whose class label is unknown.

Data mining; Classification; Splice gene sequence; Clinical data

Shomona Gracia Jacob; R. Geetha Ramani; P. Nancy

Research Scholar, Dept. of Computer Science and Engineering, Rajalakshmi Engineering College, Thanda; Professor & Head, Dept. of Computer Science and Engineering, Rajalakshmi Engineering College, Thandala

International conference

2012 International Conference on Future Communication and Computer Technology (ICFCCT 2012)

Harbin

English

143-148

2012-05-19 (date first made available online on the Wanfang platform; not the paper's publication date)