Article abstract


Prediction, conservation analysis and structural characterization of mammalian mucin-type O-glycosylation sites.
K. Julenius, A. Mølgaard, R. Gupta and S. Brunak.
Submitted to Glycobiology

Center for Biological Sequence Analysis, BioCentrum-DTU, Technical University of Denmark, DK-2800 Lyngby, Denmark


O-GalNAc glycosylation, also called mucin-type glycosylation, is one of the main types of glycosylation in mammalian cells. No consensus recognition sequence for the O-glycosyltransferases is known. Prediction methods are therefore needed to bridge the gap between the large number of known protein sequences and the small number of proteins whose glycosylation sites have been investigated experimentally. From O-GLYCBASE, a total of 85 mammalian proteins experimentally investigated for in vivo O-GalNAc sites were extracted, giving 421 positive and 2063 negative sites. The Protein Data Bank was analyzed for structural information, and 14 glycosylated structures were obtained. All positive sites were found in coil or turn regions. Comparisons of mammalian protein homologues showed that a glycosylated serine or threonine is less likely to be precisely conserved than a non-glycosylated one. A neural network was trained to predict the location of mucin-type glycosylation sites. Different ways of encoding the sequence were tried in combination with various sequence-derived features as input to the network. The best network used amino acid composition and averaged surface accessibility predictions, together with a small window encoded with BLOSUM62 profiles. This prediction method, NetOGlyc 3.0, correctly predicted 70% of the glycosylated residues and 93% of the non-glycosylated residues. It can also predict sites in completely new proteins without losing performance. The fact that the sites could be predicted from averaged properties, together with the fact that glycosylation sites are not precisely conserved, indicates that mucin-type glycosylation is in most cases a bulk property rather than a highly site-specific one.
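To make the idea of combining an averaged property with a small local window more concrete, here is a minimal sketch of how an input vector for one candidate serine or threonine might be assembled. It is not the NetOGlyc 3.0 encoding. The functions `candidate_sites`, `composition`, `one_hot_window` and `encode_site`, and both window sizes, are hypothetical choices made for illustration. A plain one-hot encoding stands in for the BLOSUM62 profile encoding, and the surface accessibility predictions are omitted because they would require a separate predictor.

```python
from collections import Counter

# The 20 standard amino acids in a fixed order (defines the feature layout).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def candidate_sites(seq):
    """Return 0-based indices of every serine (S) or threonine (T)."""
    return [i for i, aa in enumerate(seq) if aa in "ST"]


def composition(seq, center, half_width):
    """Fraction of each of the 20 amino acids in a window around `center`.

    This is an averaged property of the region. The window is clipped at the
    sequence ends, and non-standard residues are ignored.
    """
    start = max(0, center - half_width)
    end = min(len(seq), center + half_width + 1)
    window = [aa for aa in seq[start:end] if aa in AMINO_ACIDS]
    counts = Counter(window)
    total = len(window) or 1  # avoid division by zero for empty windows
    return [counts[aa] / total for aa in AMINO_ACIDS]


def one_hot_window(seq, center, half_width):
    """One-hot encode a small window around `center`.

    Hypothetical stand-in for a BLOSUM62 profile encoding. Positions past
    the sequence ends (or non-standard residues) become all-zero vectors.
    """
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        vec = [0.0] * len(AMINO_ACIDS)
        if 0 <= pos < len(seq) and seq[pos] in AMINO_ACIDS:
            vec[AMINO_ACIDS.index(seq[pos])] = 1.0
        features.extend(vec)
    return features


def encode_site(seq, center, comp_half_width=25, local_half_width=2):
    """Concatenate wide-window composition and a small local window.

    Window sizes are illustrative assumptions, not values from the paper.
    """
    return (composition(seq, center, comp_half_width)
            + one_hot_window(seq, center, local_half_width))


if __name__ == "__main__":
    seq = "MAPTTSAK"
    for i in candidate_sites(seq):
        print(i, seq[i], len(encode_site(seq, i)))
```

In this sketch, the wide composition window captures the kind of bulk, averaged signal the abstract refers to, while the small window keeps some local sequence context. Each vector would then be fed to a classifier such as a neural network.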


Karin Julenius,