Classification of Splice Junction DNA sequence through Data mining techniques

Abstract:

Data mining on DNA sequences is gaining importance in current research as researchers and clinicians place more emphasis on detecting genetic markers for disease prediction and on developing new drugs for therapeutic purposes. This paper highlights the role played by machine learning algorithms in classifying a given splice gene sequence into three classes (Intron-Exon, Exon-Intron, Neither) that differentiate between the DNA needed for protein creation and the superfluous DNA removed during protein generation. The work involves running nine classification algorithms on 3190 splice-junction DNA sequences taken from the KEEL data repository, each having 60 nucleotides, to detect the boundaries between introns and exons; this in turn aids the analysis of genetic markers and the understanding of the mechanism of protein synthesis. Quinlan's C4.5 algorithm and the Random Tree classification algorithm achieve 99.97% classifier accuracy on this dataset. The validity of the results has been verified by classifying test data sets using the crafted classification framework. This work will enable accurate prediction of splice junctions in a DNA sequence whose class label is unknown.
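As a rough illustration of the task the abstract describes, the sketch below shows one common way to turn a 60-nucleotide sequence into numeric features and to compute classifier accuracy. It is not taken from the paper: the names `one_hot` and `accuracy`, the class codes `IE`/`EI`/`N`, the all-zero treatment of characters outside A/C/G/T, and the toy sequence are all assumptions made for illustration.

```python
# Hypothetical sketch: names, class codes, and toy data are illustrative
# assumptions, not the paper's actual framework.
NUCLEOTIDES = "ACGT"
SEQ_LEN = 60                  # each sequence in the dataset has 60 nucleotides
CLASSES = ("IE", "EI", "N")   # Intron-Exon, Exon-Intron, Neither


def one_hot(seq):
    """Encode a 60-nucleotide string as a flat list of 240 binary features."""
    if len(seq) != SEQ_LEN:
        raise ValueError(f"expected {SEQ_LEN} nucleotides, got {len(seq)}")
    features = []
    for base in seq.upper():
        # Assumption: any character outside A/C/G/T maps to four zeros.
        features.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return features


def accuracy(predicted, actual):
    """Fraction of predicted class labels that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


toy_seq = "ACGT" * 15          # made-up 60-nucleotide sequence
encoded = one_hot(toy_seq)     # 240 features, exactly one 1 per position
```

A feature vector of this kind could be passed to any of the classifiers the abstract mentions, and `accuracy` corresponds to the classifier-accuracy figure reported there.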

Keywords: Data mining; Classification; Splice gene sequence; Clinical data

Authors: Shomona Gracia Jacob, R. Geetha Ramani, P. Nancy

Affiliations: Research Scholar, Dept. of Computer Science and Engineering, Rajalakshmi Engineering College, Thanda; Professor & Head, Dept. of Computer Science and Engineering, Rajalakshmi Engineering College, Thandala

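Conference type: International conference

Conference name: 2012 International Conference on Future Communication and Computer Technology (ICFCCT 2012)

Conference location: Harbin

Conference language: English

Pages: 143-148

Online publication date: 2012-05-19 (date first posted on the Wanfang platform; not the paper's publication date)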
