REFERENCE
Prediction, conservation analysis and structural characterization of mammalian mucin-type O-glycosylation sites.
K. Julenius, A. Mølgaard, R. Gupta and S. Brunak.
Glycobiology, 15:153-164, 2005.
Center for Biological Sequence Analysis, BioCentrum-DTU, Technical University of Denmark, DK-2800 Lyngby, Denmark
O-GalNAc-glycosylation is one of the main types of glycosylation in mammalian cells. No consensus recognition sequence for the O-glycosyltransferases is known, making prediction methods necessary to bridge the gap between the large number of known protein sequences and the small number of proteins experimentally investigated with regard to glycosylation status. From O-GLYCBASE, a total of 86 mammalian proteins experimentally investigated for in vivo O-GalNAc sites were extracted. Mammalian protein homologue comparisons showed, surprisingly, that a glycosylated serine or threonine is less likely to be precisely conserved than a non-glycosylated one. The Protein Data Bank was analyzed for structural information, and 12 glycosylated structures were obtained. All positive sites were found in coil or turn regions. A method for predicting the location of mucin-type glycosylation sites was trained using a neural network approach. The best overall network used as input amino acid composition and averaged surface accessibility predictions, together with substitution matrix profile encoding of the sequence. To improve prediction on isolated (single) sites, networks were trained on isolated sites only. The final method combines predictions from the best overall network and the best isolated site network; this prediction method correctly predicted 76% of the glycosylated residues and 93% of the non-glycosylated residues. NetOGlyc 3.1 can predict sites for completely new proteins without losing its performance. The fact that the sites could be predicted from averaged properties, together with the fact that glycosylation sites are not precisely conserved, indicates that mucin-type glycosylation in most cases is a bulk property and not a very site-specific one.
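The two performance figures in the abstract are per-residue rates: the fraction of glycosylated Ser/Thr residues predicted as glycosylated, and the fraction of non-glycosylated Ser/Thr residues predicted as non-glycosylated. The following is a minimal sketch of how such rates can be computed from paired labels. The function name and the toy labels are illustrative assumptions; they are not taken from the article or from the NetOGlyc software, and the numbers produced are not the published results.

```python
def per_class_accuracy(true_labels, predicted_labels):
    """Return (fraction of positives recovered, fraction of negatives recovered).

    Labels are 1 for a glycosylated Ser/Thr residue and 0 for a
    non-glycosylated one. Both sequences must have the same length.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")

    true_pos = false_neg = true_neg = false_pos = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == 1 and pred == 1:
            true_pos += 1
        elif truth == 1 and pred == 0:
            false_neg += 1
        elif truth == 0 and pred == 0:
            true_neg += 1
        else:
            false_pos += 1

    positives = true_pos + false_neg
    negatives = true_neg + false_pos
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative site")

    return true_pos / positives, true_neg / negatives


# Toy labels, invented purely for illustration (not data from the study).
toy_truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
pos_rate, neg_rate = per_class_accuracy(toy_truth, toy_pred)
print(f"glycosylated sites recovered: {pos_rate:.2f}")
print(f"non-glycosylated sites recovered: {neg_rate:.2f}")
```

Reporting the two rates separately matters here because non-glycosylated Ser/Thr residues typically far outnumber glycosylated ones, so a single overall accuracy would be dominated by the negative class.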