Prediction of Mucin-type O-glycosylation by Support Vector Machines
Mucin-type O-glycosylation is one of the main types of mammalian protein glycosylation. It is specific to serine (Ser) or threonine (Thr) residues, though no consensus sequence is known. In this report, support vector machines (SVMs) are used to predict O-glycosylation at each Ser or Thr site in protein sequences. Ninety-nine mammalian protein sequences are selected from UniProt 8.0. A protein subsequence of a given length, with a Ser or Thr site at its center, is used as the input to the SVM after being encoded in one of three ways: sparse encoding, 5-letter encoding, or multiple encoding, which uses both sparse and 5-letter encodings. The prediction experiments show that multiple encoding is the most effective. Effective prediction requires detailed information on the amino acid residues nearest to the target site, together with relatively coarse information on the biochemical characteristics of the residues out to approximately the 15th nearest neighbor of the target site. In addition, the ratio of positive to negative data used for training is observed to affect the performance.
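The encoding step described in the abstract can be illustrated with a minimal Python sketch. Several details here are assumptions, not taken from the paper: the five residue groups in GROUPS are a hypothetical partition by rough biochemical character (the abstract does not list the grouping used), the padding symbol for positions beyond the sequence ends is invented, and the way multiple_encode combines a narrow sparse window with a wide 5-letter window is one plausible reading of "uses both sparse and 5-letter encodings". All function names are illustrative.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"  # assumed placeholder for positions beyond the sequence ends

# Hypothetical 5-group partition; the abstract does not specify the groups.
GROUPS = {
    "aliphatic": "AVLIMC",
    "aromatic": "FWYH",
    "polar": "STNQ",
    "charged": "DEKR",
    "special": "GP",
}


def candidate_sites(seq):
    """Return the 0-based positions of every Ser or Thr residue."""
    return [i for i, aa in enumerate(seq) if aa in "ST"]


def window(seq, center, half_width):
    """Return the 2*half_width+1 residues centered on `center`, padded at the ends."""
    return "".join(
        seq[i] if 0 <= i < len(seq) else PAD
        for i in range(center - half_width, center + half_width + 1)
    )


def sparse_encode(win):
    """Sparse encoding: 20 bits per residue, one bit set (none for padding)."""
    vec = []
    for aa in win:
        vec.extend(1 if aa == ref else 0 for ref in AMINO_ACIDS)
    return vec


def five_letter_encode(win):
    """5-letter encoding: 5 bits per residue, one per group (none for padding)."""
    vec = []
    for aa in win:
        vec.extend(1 if aa in members else 0 for members in GROUPS.values())
    return vec


def multiple_encode(seq, center, near=1, far=15):
    """Assumed combination: sparse bits for a narrow window, 5-letter bits for a wide one."""
    return sparse_encode(window(seq, center, near)) + five_letter_encode(
        window(seq, center, far)
    )


if __name__ == "__main__":
    seq = "MASTPK"  # toy sequence, not from the data set
    for pos in candidate_sites(seq):
        features = multiple_encode(seq, pos)
        print(pos, seq[pos], len(features))
```

Each resulting feature vector, paired with a glycosylated or non-glycosylated label, could then be passed to any SVM implementation. The defaults near=1 and far=15 only mirror the qualitative finding in the abstract and are not the paper's tuned values.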
Ikuko Nishikawa Hirotaka Sakamoto Ikue Nouno Kazutoshi Sakakibara Masahiro Ito
Ritsumeikan University 1-1-1 Noji-higashi, Kusatsu, 525-8577 Japan
International conference
Beijing
English
1901-1905
2007-05-23 (date first posted online on the Wanfang platform; not the paper's publication date)