Clustering and Annotating Multiple Gene Sequences

Abstract:

DNA and protein sequence data from diverse species are steadily being discovered and deposited in public archives in established formats. Sequence alignment can be used to study the relationships among two or more sequences, and it is especially useful for comparing similar gene products expressed by different organisms. A suffix tree, which can be built incrementally in linear time, is a data structure well suited to handling large amounts of genomic data. In this study, a new method for clustering and annotating gene sequences is proposed. Although existing suffix tree algorithms find common subsequences, they do not detect clusters of similar sequences, and parallel EST clustering does not identify similarity among gene sequences. The proposed CLustering & Annotating Gene sequences procedure generates clusters of similar gene sequences and annotates each cluster by running a BLAST search against DNA and protein databases, using the longest common subsequences of each cluster as queries. The performance of the proposed technique was examined with 42 gene sequences from the TCA cycle (citrate cycle) of bacteria.
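The clustering step described in the abstract can be illustrated with a small sketch. This is not the authors' procedure: it assumes that two sequences belong together when they share a contiguous common substring of at least `min_shared` characters (a hypothetical threshold), links them by single-linkage clustering, and derives a per-cluster query string that would then be submitted to BLAST (not performed here). For brevity it swaps the paper's suffix tree for a quadratic dynamic-programming longest-common-substring routine; the function names are hypothetical.

```python
from itertools import combinations


def longest_common_substring(a: str, b: str) -> str:
    """Return one longest contiguous substring shared by a and b.

    O(len(a) * len(b)) dynamic programming; a stand-in for the
    linear-time suffix tree approach mentioned in the abstract.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run ending at a[i-1], b[j-1].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def cluster_sequences(seqs: dict[str, str], min_shared: int) -> list[list[str]]:
    """Single-linkage clustering (assumed criterion): link two sequences
    if their longest common substring has length >= min_shared."""
    names = list(seqs)
    parent = {n: n for n in names}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if len(longest_common_substring(seqs[a], seqs[b])) >= min_shared:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


def cluster_query(seqs: dict[str, str], members: list[str]) -> str:
    """Fold pairwise LCS over the cluster members.

    The result is a substring common to every member (a greedy
    approximation, not guaranteed to be the longest such substring);
    it plays the role of the BLAST query for the cluster.
    """
    query = seqs[members[0]]
    for m in members[1:]:
        query = longest_common_substring(query, seqs[m])
    return query
```

For example, with toy sequences `s1 = "ATGCGTACGTTAGC"`, `s2 = "CCATGCGTACGTAA"`, `s3 = "GGGTTTCCCAAAGG"`, `s4 = "TTTCCCAAAGGTTT"` and `min_shared=6`, the sketch groups `s1` with `s2` and `s3` with `s4`, and the query for the first cluster is `"ATGCGTACGT"`.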

Keywords: Multiple sequence alignment; Clustering; BLAST; Suffix tree

Authors: Kyu Suk Hwang, Seung Bae Jung, Chang Won Park, Young Han Kim

Affiliations: Department of Chemical Engineering, Pusan National University, Pusan 609-735, Korea; Department of Chemical Engineering, Dong-A University, Pusan 604-714, Korea

Conference type: International conference

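Conference: 4th Asian Conference on Process Systems Engineering and 2007 China International Annual Conference on Systems Engineering (The 4th International Symposium on Design, Operation & Control of Chemical Processes) (PSE ASIA 2007)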

Location: Xi'an, China

Language: English

Online publication date: 2007-08-15 (the date the paper first went online on the Wanfang platform, not the date the paper was published)

