O-GalNAc-glycosylation is one of the
main types of glycosylation in mammalian cells. No consensus
recognition sequence for the O-glycosyltransferases is known, making
prediction methods necessary to bridge the gap between the large
number of known protein sequences and the small number
of proteins experimentally investigated with regard to glycosylation
status. From O-GLYCBASE, a total of 86 mammalian proteins experimentally
investigated for in vivo O-GalNAc sites were extracted. Mammalian protein
homologue comparisons showed, surprisingly, that a glycosylated
serine or threonine is less likely to be precisely conserved than a
non-glycosylated one. The Protein Data Bank was analyzed
for structural information and 12 glycosylated structures were obtained.
All positive sites were found in coil or turn regions. A method for
predicting the locations of
mucin-type glycosylation sites was trained using a neural network
approach. The best overall network used as input amino acid composition
and averaged surface accessibility predictions, together with a substitution
matrix profile encoding of the sequence. To improve prediction on
isolated (single) sites,
networks were trained on isolated sites only. The final method combines
predictions from the best overall network and the best isolated-site
network; it correctly predicted 76\% of
the glycosylated residues and 93\% of the non-glycosylated residues.
NetOGlyc 3.1 can predict sites for completely new proteins without
loss of performance. The fact that the sites could be predicted
from averaged properties, together with the fact that glycosylation
sites are not precisely conserved, indicates that mucin-type
glycosylation in most cases is a bulk property and not a very
site-specific one.
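
As a rough illustration of what an "averaged property" input might look like, the sketch below computes the amino acid composition in a sequence window around each serine or threonine. It is not the NetOGlyc implementation: the function names, the window half-width of 10 residues, and the example sequence are all assumptions chosen for illustration only.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def candidate_sites(sequence):
    """Return 0-based positions of serine and threonine residues."""
    return [i for i, aa in enumerate(sequence) if aa in "ST"]


def window_composition(sequence, position, half_width=10):
    """Return amino acid frequencies in a window centred on `position`.

    `half_width` is a hypothetical parameter; the window is truncated
    at the sequence ends rather than padded.
    """
    start = max(0, position - half_width)
    end = min(len(sequence), position + half_width + 1)
    window = sequence[start:end]
    counts = Counter(window)
    return {aa: counts.get(aa, 0) / len(window) for aa in AMINO_ACIDS}


if __name__ == "__main__":
    seq = "AAPTSAPPT"  # made-up example sequence
    for pos in candidate_sites(seq):
        comp = window_composition(seq, pos, half_width=2)
        print(pos, seq[pos], round(comp["P"], 2))
```

Because each feature summarises a whole window rather than the exact residue at each position, it captures the kind of bulk signal that the abstract contrasts with a strictly site-specific one.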