The specificities of the UDP-GalNAc:polypeptide
N-acetylgalactosaminyltransferase family, which links the carbohydrate
GalNAc to the side chain of certain serine and threonine residues in
mucin-type glycoproteins, are presently unknown. The specificity seems to be
modulated by sequence context, secondary structure and surface
accessibility. The sequence context of glycosylated threonines was found
to differ from that of glycosylated serines, and the sites were found to cluster.
Non-clustered sites had a sequence context different from that of
clustered sites. Charged residues were disfavoured at positions -1 and
+3 relative to the glycosylated residue. A jury of artificial neural
networks was trained to recognize the
sequence context and surface accessibility of 299 known and verified
mucin-type O-glycosylation sites extracted from O-GLYCBASE. The
cross-validated NetOglyc network system correctly found 83% of the
glycosylated and 90% of the non-glycosylated serine and threonine residues
in independent test sets, thus proving more accurate than matrix statistics
and vector projection methods.
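
The ideas above can be illustrated with a minimal sketch. It is not the
NetOglyc implementation: the function names, the choice of D, E, K and R as
the charged residues, the averaging rule for the jury and the 0.5 cutoff are
all assumptions made for illustration. It shows how candidate serine and
threonine sites might be located, how positions -1 and +3 could be checked
for charged residues, how several network scores could be combined, and how
the fractions of correctly found glycosylated and non-glycosylated residues
are computed.

```python
from statistics import mean

# Assumption: Asp, Glu, Lys and Arg are treated as the charged residues.
CHARGED = frozenset("DEKR")


def candidate_sites(seq):
    """Return 0-based indices of all serine and threonine residues."""
    return [i for i, aa in enumerate(seq) if aa in "ST"]


def charged_flanks(seq, site, offsets=(-1, 3)):
    """Map each offset to True if a charged residue sits at site + offset."""
    result = {}
    for off in offsets:
        j = site + off
        # Positions outside the sequence count as "not charged".
        result[off] = 0 <= j < len(seq) and seq[j] in CHARGED
    return result


def jury_vote(scores, threshold=0.5):
    """Average per-network scores and compare with a cutoff.

    Both the averaging rule and the cutoff are hypothetical.
    """
    return mean(scores) > threshold


def sensitivity_specificity(predicted, actual):
    """Return (fraction of true sites found, fraction of non-sites found).

    Assumes both classes occur at least once in `actual`.
    """
    pairs = list(zip(predicted, actual))
    true_pos = sum(p and a for p, a in pairs)
    true_neg = sum(not p and not a for p, a in pairs)
    n_pos = sum(actual)
    n_neg = len(actual) - n_pos
    return true_pos / n_pos, true_neg / n_neg


seq = "APTSEKPT"  # made-up peptide, not taken from any database
print(candidate_sites(seq))        # [2, 3, 7]
print(charged_flanks(seq, 2))      # {-1: False, 3: True}
print(jury_vote([0.7, 0.6, 0.4]))  # True
```

In this sketch the reported 83% and 90% correspond to the two values
returned by sensitivity_specificity when it is applied to an independent
test set.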

The network will be updated, so predictions may change between versions.
The network is balanced to give optimal predictions whether or not the
submitted sequences are homologous to known O-glycosylated proteins.
If, however, the submitted sequence is very similar or identical to a
sequence in our training data set, the accuracy can be expected to be higher
than reported above. Even though the method is trained on mucin-type GalNAc
sites from mammalian proteins, it finds 51% of mannose O-glycosylated sites in
fungal glycoproteins and 85% of the non-O-glycosylated sites in fungal
glycoproteins.

We would appreciate any confirmation or refutation of our predictions. Since
an expanded data set with additional O-glycosylated sequences would
increase the performance of the network, we are very interested in receiving
such material. If you have knowledge of experimentally determined
O-glycosylation sites in glycoproteins not already in the data set O-GlycBase V2.0, we
would like to include them. User feedback is the only way we will learn to
enhance the performance of the method.