YinOYang abstract

Intracellular O-glycosylation is characterised by the addition of N-acetylglucosamine, in a beta anomeric linkage, to serine and threonine residues in a protein. The acceptor site does not display a definite consensus sequence. However, a fuzzy motif is marked by the close vicinity of proline residues (positions -4, -3, -2), valines (-1, +2, +4, +5) and a downstream tract of serines (+1, +4, +7), whereas leucines and glutamines are disfavoured. Secondary structure predictions indicate that the 21-mer window around the site is sheet or coil. We trained a jury of neural networks on 40 experimentally determined O-(beta)-GlcNAc acceptor sites to recognise the sequence context and surface accessibility. The set of non-acceptor serines and threonines was pruned from 1251 to 626. In a cross-validation, 72.5% of the glycosylated sites and 79.5% of the non-glycosylated sites in the test set were correctly identified, giving a Matthews correlation coefficient of 0.22 on the original data and 0.84 on the augmented data set.
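For readers unfamiliar with the performance measures quoted above, the following minimal sketch shows how sensitivity, specificity and the Matthews correlation coefficient are conventionally computed from the counts in a confusion matrix. The function names and the counts in the usage example are purely illustrative assumptions; they are not taken from the study and are not meant to reproduce its figures.

```python
import math


def sensitivity(tp, fn):
    """Fraction of true acceptor sites that were predicted as acceptors."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of non-acceptor sites that were predicted as non-acceptors."""
    return tn / (tn + fp)


def matthews_cc(tp, tn, fp, fn):
    """Standard Matthews correlation coefficient for a binary classifier.

    Returns 0.0 when any marginal total is zero (a common convention,
    adopted here as an assumption), since the coefficient is then undefined.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator


if __name__ == "__main__":
    # Illustrative counts only (hypothetical, not the study's data).
    tp, fn, tn, fp = 8, 2, 80, 20
    print("sensitivity:", sensitivity(tp, fn))
    print("specificity:", specificity(tn, fp))
    print("MCC:", round(matthews_cc(tp, tn, fp, fn), 3))
```

Because the coefficient uses all four cells of the confusion matrix, it is sensitive to the ratio of positive to negative examples, which is one reason the same sensitivity and specificity can correspond to quite different coefficients on data sets of different composition.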

The method was used to scan all human protein sequences available in SwissProt for potential O-(beta)-GlcNAc acceptors. Since this modification is known to be reciprocal with phosphorylation, we also scanned the same sequences for phosphorylation sites and identified 'Yin-Yang' sites, that is, residues predicted to accept both modifications. The distribution of O-(beta)-GlcNAcylation sites, PEST regions and phosphorylation sites was studied across cellular role categories, enzyme classes and subcellular compartments. Predicted O-(beta)-GlcNAc sites were found in over half of all SwissProt human sequences, 65% of which were nuclear or cytoplasmic.
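The cross-scan described above can be pictured as an intersection of two sets of predicted positions. The sketch below assumes that each predictor has already produced a collection of 1-based residue positions for a given sequence; the function name, the input format and the example sequence are hypothetical and do not describe the actual YinOYang implementation.

```python
def yin_yang_sites(sequence, glcnac_positions, phospho_positions):
    """Return sorted 1-based positions predicted as both O-GlcNAc and
    phosphorylation sites, restricted to serine (S) or threonine (T).

    The two position collections are assumed to come from separate,
    hypothetical predictors run on the same sequence.
    """
    shared = set(glcnac_positions) & set(phospho_positions)
    return sorted(
        pos
        for pos in shared
        if 1 <= pos <= len(sequence) and sequence[pos - 1] in "ST"
    )


if __name__ == "__main__":
    # Toy sequence and toy predictions (illustrative only).
    seq = "MSTPSA"
    glcnac = {2, 3, 5}
    phospho = {3, 5, 6}
    print(yin_yang_sites(seq, glcnac, phospho))
```

Restricting the result to serine and threonine is a defensive check in this sketch; a phosphorylation predictor may also report tyrosines, which cannot be O-GlcNAc acceptors in the sense used here.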


Ramneek Gupta,