Conference topic

Towards comprehensive structural motif mining for better fold annotation in the twilight zone of sequence dissimilarity

Background: Automatic identification of structure fingerprints from a group of diverse protein structures is challenging, especially for proteins whose divergent amino acid sequences fall into the twilight or midnight zones, where pairwise sequence identities to known sequences fall below 25% and sequence-based functional annotations often fail. Results: Here we report a novel graph database mining method and demonstrate its application to protein structure pattern identification and structure classification. The biological motivation of our study is to recognize common structure patterns in immunoevasins, proteins that mediate virus evasion of host immune defense. Our experimental study, using both viral and non-viral proteins, demonstrates the efficiency and efficacy of the proposed method. Conclusions: We present a theoretical framework, offer a practical software implementation for incorporating prior domain knowledge, such as the substitution matrices studied here, and devise an efficient algorithm to identify approximately matched frequent subgraphs. By doing so, we significantly expand the analytical power of sophisticated data mining algorithms in dealing with large volumes of complicated and noisy protein structure data. Without loss of generality, the choice of appropriate compatibility matrices allows our method to be easily employed in domains where subgraph labels have some uncertainty.

Yi Jia, Jun Huan, Vincent Buhr, Jintao Zhang, Leonidas N. Carayannopoulos

Department of Electrical Engineering & Computer Science, University of Kansas, Lawrence, KS 66045; Department of Molecular Biosciences, The University of Kansas, Lawrence, KS 66046, USA; School of Medicine, Washington University in St. Louis, St. Louis, MO 63130, USA

International conference

The 7th Asia-Pacific Bioinformatics Conference

Beijing

English

523-536

2009-01-01 (date first posted online on the Wanfang platform; not the paper's publication date)