Conference Topic

Prediction of the Protein O-glycosylation by Machine Learning and Statistical Characters around the Glycosylation Sites

O-glycosylation of mammalian proteins is investigated. It is specific to serine or threonine residues, although no consensus sequence is yet known. We have applied support vector machines (SVMs) to predict O-glycosylation sites from various kinds of protein information, aiming to identify the conditions for glycosylation and elucidate its mechanisms. In the present study, we first focus on the distribution of the glycosylation sites. Many O-glycosylated sites are observed in clusters of closely spaced glycosylated sites, whereas the other sites occur sparsely or in isolation. These two types, crowded and isolated sites, might have different glycosylation mechanisms. Therefore, we divide all O-glycosylation sites into a crowded group and an isolated group, and train a separate SVM to predict the O-glycosylation sites of each group. The prediction results of the two SVMs depend differently on the input information. The results indicate that some motifs are expected for the isolated group, whereas for the crowded group, glycosylation is affected by the interaction between glycosylated sites and by the relative proportions of the surrounding amino acids. We then compare the statistics of the amino acid sequences around the glycosylation sites of both groups. As a result, some amino acids (proline, valine, alanine, etc.) show high existence probabilities at specific positions relative to a glycosylation site, especially for isolated glycosylation. Moreover, independent component analysis of the amino acid sequences reveals position-specific occurrences of these amino acids, including the well-known prolines at positions -1 and +3, which appear as different independent components. Finally, we investigate the relation between O-glycosylation and the domain structure or the disordered regions of the protein. O-glycosylation is clearly observed more frequently in disordered regions and less frequently within domains. This could be a key feature for understanding the non-conserved nature of O-glycosylation and its roles in functional diversity and structural stability.
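The grouping step and the position-specific statistics described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact criterion used to call a site "crowded", so the rule here (another glycosylated site within `max_gap` residues, default 4) and the function names `split_sites` and `position_probabilities` are hypothetical. The sketch does not reproduce the SVM training or the independent component analysis; it only shows how sites might be partitioned and how per-offset amino acid frequencies around them could be counted.

```python
from collections import Counter


def split_sites(sites, max_gap=4):
    """Split 0-based glycosylation site positions into crowded and isolated groups.

    Hypothetical rule (not taken from the paper): a site is 'crowded' if any
    other glycosylated site lies within max_gap residues of it; otherwise it
    is 'isolated'.
    """
    ordered = sorted(set(sites))
    crowded, isolated = [], []
    for i, pos in enumerate(ordered):
        near_prev = i > 0 and pos - ordered[i - 1] <= max_gap
        near_next = i + 1 < len(ordered) and ordered[i + 1] - pos <= max_gap
        (crowded if near_prev or near_next else isolated).append(pos)
    return crowded, isolated


def position_probabilities(sequence, sites, offsets=range(-4, 5)):
    """Estimate P(amino acid | offset) over windows centred on the given sites.

    Offsets are relative to the site (e.g. -1 is the residue just before it).
    Offsets that fall outside the sequence contribute no counts; an offset
    with no counts at all maps to an empty dict.
    """
    table = {}
    for off in offsets:
        counts = Counter(
            sequence[s + off] for s in sites if 0 <= s + off < len(sequence)
        )
        total = sum(counts.values())
        table[off] = {aa: n / total for aa, n in counts.items()} if total else {}
    return table


if __name__ == "__main__":
    # Toy sequence and site list, invented for illustration only.
    seq = "MAPTSTPAVSGGKPPTPEAT"
    crowded, isolated = split_sites([3, 4, 5, 15])
    print(crowded, isolated)  # [3, 4, 5] [15]
    print(position_probabilities(seq, isolated, offsets=(-1, 1, 3)))
```

In a real analysis the per-offset frequencies for each group would be compared against background amino acid frequencies, and the windows themselves could be encoded as feature vectors for a classifier; both steps are outside this sketch.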

Keywords: Bioinformatics; Protein glycosylation; Support vector machine; Intrinsically disordered

Ikuko Nishikawa, Yukiko Nakajima, Kazutoshi Sakakibara, Masahiro Ito

College of Information Science and Engineering, Ritsumeikan University, Kusatsu, Shiga 525-8577, Japan; College of Life Science, Ritsumeikan University, Kusatsu, Shiga 525-8577, Japan

International Conference

The 2nd International Conference on Software Engineering and Data Mining (IEEE SEDM 2010)

Chengdu

English

pp. 639-642

2010-06-23 (date first posted online on the Wanfang platform; not the paper's publication date)