|
YinOYang abstract
Intracellular O-glycosylation is characterised by the addition of
N-acetylglucosamine, in a beta anomeric linkage, to Serine and
Threonine residues in a protein. The acceptor site does not display
a definite consensus sequence. However, a fuzzy motif is marked by
nearby Proline residues (positions -4, -3, -2), Valines
(-1, +2, +4, +5) and a downstream tract of Serines (+1, +4, +7),
while Leucines and Glutamines are disfavoured. Secondary structure
predictions indicate the 21-mer window to be sheet or coil. We trained
a jury of neural networks on 40 experimentally determined
O-(beta)-GlcNAc acceptor sites to recognise the sequence context
and surface accessibility. The set of non-acceptor Serine/Threonine
residues was pruned from 1251 to 626. In a cross-validation, 72.5%
of the glycosylated sites and 79.5% of the non-glycosylated sites
were correctly identified in the test set, giving a Matthews
correlation coefficient of 0.22 on the original data and 0.84 on
the augmented data set.
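
The Matthews correlation coefficient quoted above can be illustrated with a minimal sketch. The formula is the standard one; the counts are an assumption of this example, obtained by applying the reported rates (72.5% and 79.5%) uniformly to the 40 acceptor sites and the 1251 unpruned non-acceptor sites. The abstract does not state how the evaluation counts were actually derived, and the augmented data set is not reconstructed here.

```python
import math


def matthews_cc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention: return 0 when any marginal total is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Assumed counts (not given in the abstract): the reported rates are
# applied directly to the stated numbers of positive and negative sites.
n_pos, n_neg = 40, 1251
sensitivity, specificity = 0.725, 0.795

tp = sensitivity * n_pos   # glycosylated sites correctly identified
fn = n_pos - tp            # glycosylated sites missed
tn = specificity * n_neg   # non-glycosylated sites correctly identified
fp = n_neg - tn            # non-glycosylated sites wrongly flagged

mcc_original = matthews_cc(tp, fp, tn, fn)
print(f"MCC under the stated assumptions: {mcc_original:.2f}")
```

Under these assumptions the value lands close to the 0.22 reported for the original data, which shows how a large excess of negative sites keeps the coefficient low even when both per-class rates are above 70%.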
The method was used to scan all human protein sequences available in
SwissProt for potential O-(beta)-GlcNAc acceptors. Since this
modification is known to be reciprocal with phosphorylation, we
cross-scanned for phosphorylation sites and identified sites
predicted for both modifications ('Yin-Yang' sites). The spread of
O-(beta)-GlcNAcylation, PEST regions
and phosphorylation sites was studied across cellular role
categories, enzyme classes and subcellular compartments. Predicted
O-(beta)-GlcNAc sites were found in over half of all SwissProt human
sequences, 65% of which were nuclear or cytoplasmic.
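
The cross-scan described above amounts to intersecting two sets of predicted positions per protein. The sketch below is only an illustration of that idea: the function name `yin_yang_sites`, the dictionary layout and the toy identifiers and positions are all hypothetical, and none of them come from the method or its data.

```python
def yin_yang_sites(glcnac_sites, phospho_sites):
    """Return, per protein, the positions predicted for both modifications.

    Both arguments map a protein identifier to an iterable of residue
    positions (hypothetical input format chosen for this sketch).
    """
    shared = {}
    for protein, positions in glcnac_sites.items():
        both = set(positions) & set(phospho_sites.get(protein, ()))
        if both:
            shared[protein] = sorted(both)
    return shared


# Toy predictions, invented purely for illustration.
glcnac = {"PROT_A": [12, 45, 78], "PROT_B": [5]}
phospho = {"PROT_A": [45, 78, 90], "PROT_C": [5]}

example = yin_yang_sites(glcnac, phospho)
print(example)
```

Proteins that appear in only one of the two prediction sets contribute no Yin-Yang sites, so the result contains only proteins with at least one shared position.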
CORRESPONDENCE
Ramneek Gupta,
|