
Abstract

The specificities of the UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferase family, which links the carbohydrate GalNAc to the side chain of certain serine and threonine residues in mucin-type glycoproteins, are presently unknown. The specificity seems to be modulated by sequence context, secondary structure and surface accessibility. The sequence context of glycosylated threonines was found to differ from that of glycosylated serines, and the sites were found to cluster. Non-clustered sites had a sequence context different from that of clustered sites. Charged residues were disfavoured at positions -1 and +3. A jury of artificial neural networks was trained to recognize the sequence context and surface accessibility of 299 known and verified mucin-type O-glycosylation sites extracted from O-GLYCBASE. The cross-validated NetOglyc network system correctly found 83% of the glycosylated and 90% of the non-glycosylated serine and threonine residues in independent test sets, thus proving more accurate than matrix statistics and vector projection methods.
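
To make the notion of "sequence context" concrete, the following minimal Python sketch extracts a fixed window around every serine and threonine in a sequence and reports whether a charged residue sits at position -1 or +3. It is an illustration only, not the NetOglyc method: the function names, the window size of four flanking residues, and the choice of D, E, K and R as the charged residues are all assumptions made for this example.

```python
# Illustrative sketch only; not the NetOglyc implementation.
# Assumption: D, E, K and R are treated as the charged residues.
CHARGED = set("DEKR")


def st_windows(seq, flank=4):
    """Yield (1-based position, residue, window) for each Ser/Thr in seq.

    The window holds `flank` residues on either side of the site;
    positions beyond the sequence ends are padded with '-'.
    """
    padded = "-" * flank + seq + "-" * flank
    for i, aa in enumerate(seq):
        if aa in "ST":
            yield i + 1, aa, padded[i:i + 2 * flank + 1]


def charged_at(window, flank=4, offsets=(-1, 3)):
    """Return the offsets (relative to the central site) holding a charged residue."""
    return [off for off in offsets if window[flank + off] in CHARGED]


if __name__ == "__main__":
    # Hypothetical peptide, chosen only to exercise the functions.
    for pos, aa, win in st_windows("AKTPAES"):
        print(pos, aa, win, charged_at(win))
```

A site flagged by `charged_at` is not thereby predicted to be unglycosylated; the abstract reports only a statistical disfavouring at these positions.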



CURRENT NETWORK

The network will be updated, so predictions may change between versions. The network is balanced to give optimal predictions whether or not the submitted sequences are homologous to known O-glycosylated proteins. If, however, the submitted sequence is very similar or identical to a sequence in our training data set, the accuracy can be expected to be higher than reported above. Although the method is trained on mucin-type GalNAc sites from mammalian proteins, it finds 51% of the mannose O-glycosylated sites in fungal glycoproteins and 85% of the non-O-glycosylated sites in fungal proteins.
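
The percentages quoted on this page are per-residue rates: the fraction of glycosylated sites that are found (sensitivity) and the fraction of non-glycosylated serine and threonine residues that are correctly rejected (specificity). A minimal sketch of how a user might compute the same two rates when comparing predictions with their own experimental data is given below. The function name and the boolean-list input format are assumptions made for this example, not part of the server.

```python
# Illustrative sketch only; names and input format are hypothetical.
def site_rates(predicted, observed):
    """Return (sensitivity, specificity) over Ser/Thr residues.

    predicted, observed: equal-length sequences of booleans, one entry per
    Ser/Thr residue, True meaning "O-glycosylated".
    A rate is NaN when its denominator is zero.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")

    tp = fn = tn = fp = 0
    for p, o in zip(predicted, observed):
        if o and p:
            tp += 1      # glycosylated site, found
        elif o:
            fn += 1      # glycosylated site, missed
        elif p:
            fp += 1      # non-glycosylated residue, wrongly predicted
        else:
            tn += 1      # non-glycosylated residue, correctly rejected

    nan = float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else nan
    specificity = tn / (tn + fp) if (tn + fp) else nan
    return sensitivity, specificity


if __name__ == "__main__":
    # Made-up values for demonstration; not real prediction data.
    pred = [True, True, False, False, False]
    obs = [True, False, True, False, False]
    print(site_rates(pred, obs))
```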



We would appreciate any confirmation or refutation of our predictions. Since an expanded data set with additional O-glycosylated sequences would increase the performance of the network, we are very interested in receiving such material. If you know of experimentally determined O-glycosylation sites in glycoproteins that are not already in the data set O-GlycBase V2.0, we would like to include them. User feedback is the only way we can learn how to improve the performance of the method.


CORRESPONDENCE

Søren Brunak,