Conference Topic

Clustering and Annotating Multiple Gene Sequences

New DNA and protein data from diverse species are steadily being discovered and deposited in public archives in established formats. Sequence alignment can be used to study the relationships among two or more sequences, and it is especially useful for studying similar types of gene products expressed by different organisms. A suffix tree is a data structure that can be built incrementally in linear time and is useful for handling large amounts of genomic data. In this study, a new method of clustering and annotating gene sequences is proposed. Although the existing suffix tree algorithm finds common subsequences, it does not detect clusters of similar sequences. Likewise, parallel EST clustering does not identify similarity among gene sequences. The proposed CLustering & Annotating Gene sequences procedure generates clusters of similar gene sequences and runs a BLAST search to annotate the clusters against DNA and protein databases, using the longest common subsequences of each cluster as the query. The performance of the proposed technique was examined with 42 gene sequences from the TCA cycle (citrate cycle) of bacteria.
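The two steps named in the abstract (grouping similar sequences, then extracting one shared string per cluster to use as a BLAST query) can be illustrated with a minimal Python sketch. This is not the authors' procedure: it swaps the suffix tree for k-mer Jaccard similarity with single-linkage grouping and a brute-force search for the longest common contiguous substring. The function names, the k-mer size, the similarity threshold, and the toy sequences are all assumptions made for illustration, and no BLAST call is included.

```python
from itertools import combinations


def kmers(seq, k=4):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_sequences(seqs, k=4, threshold=0.5):
    """Single-linkage grouping: two sequences are linked when their k-mer
    Jaccard similarity reaches the threshold (k and threshold are assumed)."""
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    profiles = [kmers(s, k) for s in seqs]
    for i, j in combinations(range(len(seqs)), 2):
        if jaccard(profiles[i], profiles[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


def longest_common_substring(seqs):
    """Brute-force longest contiguous substring shared by every sequence.
    A generalized suffix tree would do this far more efficiently."""
    shortest = min(seqs, key=len)
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in s for s in seqs):
                return candidate
    return ""


# Hypothetical toy sequences, not taken from the paper's data set.
toy = [
    "ATGGCGTACGTTAGC",
    "ATGGCGTACGTTAGG",
    "TTTTCCCCAAAAGGGG",
    "TTTTCCCCAAAAGGGA",
]
for members in cluster_sequences(toy):
    query = longest_common_substring([toy[i] for i in members])
    print(members, query)
```

In a pipeline of the kind the abstract describes, each printed string would serve as the query submitted to BLAST for its cluster. The brute-force search is only practical for short toy inputs; real genomic data would call for the linear-time suffix tree approach.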

Multiple sequence alignment; Clustering; BLAST; Suffix tree

Kyu Suk Hwang, Seung Bae Jung, Chang Won Park, Young Han Kim

Department of Chemical Engineering, Pusan National University, Pusan 609-735, Korea; Department of Chemical Engineering, Dong-A University, Pusan 604-714, Korea

International conference

The 4th Asian Process Systems Engineering Conference and 2007 China International Systems Engineering Annual Meeting (The 4th International Symposium on Design, Operation & Control of Chemical Processes) (PSE ASIA 2007)

Xi'an

English

2007-08-15 (date first made available online on the Wanfang platform; not the paper's publication date)