ADAPTIVE WEIGHTING DISTANCE FOR FEATURE VECTORS OF BIOLOGICAL SEQUENCES

Abstract:

Similarity search in biological sequences has received substantial attention in the past decade. Sequence alignment is the essential task for similar-sequence search in bioinformatics. Because biological sequence databases have grown larger over the past decade, finding sequences similar to a query sequence is a time-consuming task. By transforming sequences into numeric feature vectors, we can quickly filter out sequences whose feature vectors are distant from the feature vector of the query sequence. We propose an adaptive weighting distance based on a feature vector that contains three groups of features of a DNA sequence: Count, Extended Relative Position Dispersion (XRPD), and Extended Absolute Position Dispersion (XAPD) [5]. Each group has four dimensions, one each for A, C, T, and G. Euclidean distance and L1 distance are commonly used to compute the distance between two feature vectors. In this paper, we use a weighted L1 distance. We derive weights for the four nucleotides from the Count group and apply them to both the XRPD and XAPD groups. In other words, if a certain nucleotide appears much more frequently than the others, its weight should also be large in the XRPD and XAPD groups. Experiments show that this feature-vector distance helps reflect the distance between sequences.
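The weighting idea in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' formula: the abstract does not say how the weights are derived from the Count group, so the hypothetical `count_weights` helper uses each nucleotide's share of the pooled counts of the two sequences. It is also assumed that the Count group itself is compared with a plain L1 distance, and the XRPD and XAPD values are taken as precomputed inputs because the abstract does not define them.

```python
from typing import List, Sequence

# Dimension order within each feature group, as listed in the abstract.
NUCLEOTIDES = ("A", "C", "T", "G")
GROUP_SIZE = len(NUCLEOTIDES)


def count_weights(count_a: Sequence[float], count_b: Sequence[float]) -> List[float]:
    """Hypothetical weighting: each nucleotide's share of the pooled counts.

    The abstract only states that weights come from the Count group;
    this particular normalization is an assumption.
    """
    pooled = [a + b for a, b in zip(count_a, count_b)]
    total = sum(pooled)
    if total == 0:
        # No counts at all: fall back to uniform weights.
        return [1.0 / GROUP_SIZE] * GROUP_SIZE
    return [p / total for p in pooled]


def adaptive_weighted_l1(vec_a: Sequence[float], vec_b: Sequence[float]) -> float:
    """Weighted L1 distance between two 12-dimensional feature vectors.

    Assumed layout: Count[0:4], XRPD[4:8], XAPD[8:12], each ordered A, C, T, G.
    """
    if len(vec_a) != 3 * GROUP_SIZE or len(vec_b) != 3 * GROUP_SIZE:
        raise ValueError("expected 12-dimensional feature vectors")

    count_a, count_b = vec_a[:GROUP_SIZE], vec_b[:GROUP_SIZE]
    weights = count_weights(count_a, count_b)

    # Count group: plain L1 distance (an assumption of this sketch).
    distance = sum(abs(a - b) for a, b in zip(count_a, count_b))

    # XRPD and XAPD groups: L1 terms scaled by the count-derived weights.
    for start in (GROUP_SIZE, 2 * GROUP_SIZE):
        for k in range(GROUP_SIZE):
            distance += weights[k] * abs(vec_a[start + k] - vec_b[start + k])
    return distance
```

With this choice of weights, a difference in the XRPD or XAPD value of a frequent nucleotide contributes more to the distance than the same difference for a rare one, which is the behavior the abstract describes.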

Keywords: DNA Sequence; Weight Assignment; Feature Vector

Authors: HUANG-CHENG KUO, PEI-YUAN JOU, JEN-PENG HUANG

Affiliations: Department of Computer Science and Information Engineering, National Chiayi University, Chiayi City; Department of Information Management, Southern Taiwan University of Technology, Tainan 710, Taiwan

Conference type: International conference

Conference name: 2007 International Conference on Machine Learning and Cybernetics (IEEE Sixth International Conference on Machine Learning and Cybernetics)

Conference location: Hong Kong

Conference language: English

Pages: 2269-2273

Online publication date: 2007-08-19 (date first posted on the Wanfang platform; not the paper's publication date)
