GlycateBase ver. 1.0
Generation of data set
This webpage describes the generation of the data set used to develop the artificial neural network based glycation predictor NetGlycate-1.0. NetGlycate-1.0 predicts glycation of ε amino groups of lysines in mammalian proteins, and the data set therefore contains only such data. Glycation data were obtained from the literature. The resulting data set consists of 20 proteins with 89 glycated lysines and 126 non-glycated lysines and can be downloaded below.
Only experimentally verified glycation sites were used, and all sequences were extracted from the UniProt database (Bairoch et al., 2005). Lysines in pro- and signal peptides were masked out, since these parts of the proteins are cleaved off during maturation and are thus not available for glycation. The references from which the glycation data were taken are shown in Table 1.
To avoid confusing the prediction algorithm, unvalidated glycation sites were masked out. However, some of the studies listed in Table 1 report having validated some of the sites that were masked out as unvalidated in the data set. The reasons for masking out these sites are described below.
Baldwin et al., 1995
Watkins et al., 1985; Cotham et al., 2003
Niemann et al., 1991
Miyata et al., 1994
Shapiro et al., 1980; Zhang et al., 2001
Shapiro et al., 1980; Zhang et al., 2001
Abraham et al., 1994
Abraham et al., 1994
Zhao et al., 1996
Smith et al., 1996
Calvo et al., 1993
Shuvaev et al., 1999
Garlick & Mazer, 1983; Shaklai et al., 1984; Iberg & Flückiger, 1986; Lapolla et al., 2004
Swamy-Mruthinti & Schey, 1997
Fujita et al., 1998
Adachi et al., 1992
Nacharaju et al., 1997
Beranek et al., 2001
Acosta et al., 2000
Takahashi et al., 1995
Table 1. References from which the glycation data were taken.
For RNP_BOVIN, Cotham et al., 2003 state that all ten lysines are glycated but find the same four major sites as Watkins et al., 1985. It was therefore decided to use the four glycation sites from Watkins et al., 1985 and mask out the remaining six lysines. Of these six lysines, the predictor predicts only K-92 to be glycated, supporting the notion that the remaining six lysines are mainly minor sites. In fact, Ames, 2005 finds K-92 to be glycated, confirming the prediction made by our predictor.
There has been some controversy about the glycation sites of CRGB_BOVIN, in particular K-163. According to the most recent article, Smith et al., 1996, only the N-terminus and K-2 are glycated, not K-163. It was therefore decided to mask out K-163. The predictor predicts K-163 to be non-glycated, in agreement with Smith et al., 1996.
The protein APE_HUMAN contains suspiciously few glycation sites compared to other proteins of similar length. Furthermore, since K-93 accounts for only 20% of the detected Amadori products (Shuvaev et al., 1999), the other lysines in the protein are masked out as unvalidated. The same problem arises for the protein CFAB_HUMAN, and it was therefore decided to mask out the non-glycated lysines in this protein as well. For both APE_HUMAN and CFAB_HUMAN, the predictor predicts several glycation sites among the masked-out lysines, suggesting that there is more than one glycation site in each of these two proteins.
The neural networks were trained using three-fold cross-validation. The division into cross-validation groups was made at the site level, meaning that each sequence in the cross-validation groups contained only one site. The other sites were masked out, and the sequence was repeated once for each site.
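The site-level expansion can be illustrated with a minimal Python sketch. It assumes that each protein comes with an annotation string that has one code per residue, using the G and K codes from the legend of the data set further down this page. The function name `expand_per_site` and the choice of `U` as the masking code are assumptions made for this illustration, not details taken from the original pipeline.

```python
TRAINABLE = {"G", "K"}  # positive and negative sites, as in the data set legend
MASK = "U"              # assumption: reuse the "unvalidated" code to hide other sites


def expand_per_site(annotation: str) -> list[str]:
    """Return one copy of the annotation per trainable site.

    In each copy exactly one G or K is kept; all other G/K positions
    are replaced by MASK so that they are ignored during training.
    """
    copies = []
    for i, code in enumerate(annotation):
        if code not in TRAINABLE:
            continue
        masked = [
            MASK if (c in TRAINABLE and j != i) else c
            for j, c in enumerate(annotation)
        ]
        copies.append("".join(masked))
    return copies


# A toy annotation with one positive and one negative site yields two copies.
copies = expand_per_site("-G--K-U")
```

Each copy can then be assigned to a cross-validation group independently of the other sites in the same protein.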
The positive and negative sites were extracted as windows of 21 amino acid residues, and a phylogenetic tree was constructed. The tree was then inspected visually, and related sites were placed in the same cross-validation group. This was done to prevent the network from learning the sites in the test set beforehand from the learning set. This situation would occur if related sites were placed in both the test and the learning set, and it could lead to an overestimation of the performance of the network if the related sites belonged to the same category (glycated or non-glycated lysine). If the related sites belonged to different categories, it could make it harder for the network to learn to classify the sites correctly. The remaining positive and negative sites were then added randomly to the three cross-validation groups in such a way that the groups contained almost the same number of positive and negative sites (see Table 2) and that all sites in the cross-validation groups were placed in random order.
Table 2. Number of positive and negative sites in each cross-validation group.
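The window extraction and the random, balanced assignment described above can be sketched as follows. This is only an illustration under stated assumptions: the manual step of placing related sites together based on the phylogenetic tree is not reproduced, the padding character `X` for windows that run past the ends of a sequence is an assumption, and the function names are hypothetical.

```python
import random


def extract_window(sequence: str, position: int, size: int = 21, pad: str = "X") -> str:
    """Return a window of `size` residues centred on the 0-based `position`.

    The sequence is padded on both sides (assumed padding character `pad`)
    so that sites near the termini still give full-length windows.
    """
    if size % 2 == 0:
        raise ValueError("window size must be odd to have a central residue")
    half = size // 2
    padded = pad * half + sequence + pad * half
    return padded[position:position + size]


def assign_groups(positives, negatives, n_groups: int = 3, seed: int = 0):
    """Distribute sites randomly over `n_groups` with near-equal class counts."""
    rng = random.Random(seed)
    groups = [[] for _ in range(n_groups)]
    for sites in (positives, negatives):
        shuffled = list(sites)
        rng.shuffle(shuffled)
        # Deal the shuffled sites out one at a time, like cards.
        for i, site in enumerate(shuffled):
            groups[i % n_groups].append(site)
    for group in groups:
        rng.shuffle(group)  # random order of sites within each group
    return groups


# Toy run with the class sizes of the data set (89 positive, 126 negative).
pos = [("pos", i) for i in range(89)]
neg = [("neg", i) for i in range(126)]
groups = assign_groups(pos, neg)
```

Dealing each class out separately keeps the number of positive and negative sites per group within one of each other, which is one simple way to obtain nearly balanced groups.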
Both in vitro and in vivo data were used to make the data set as large as possible. It was, however, decided to include only in vitro data obtained under conditions that resemble physiological conditions. Note that the glycated proteins used in this study are of mammalian origin.
G: positive site
K: negative site
S: signal peptide (not used for training)
P: propeptide (not used for training)
U: unvalidated site (not used for training)
-: non-lysine residue (not used for training)
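As an illustration of how the codes above might be read, here is a minimal Python sketch. It assumes that every protein is stored as an amino acid sequence together with an annotation string of the same length, with one code per residue. That layout and the function name `training_sites` are assumptions for this example, so the actual download should be checked before relying on it.

```python
def training_sites(sequence: str, annotation: str) -> list[tuple[int, bool]]:
    """Return (1-based position, is_glycated) for every site used in training.

    Only G (positive) and K (negative) codes are kept; S, P, U and '-'
    are skipped because they are not used for training.
    """
    if len(sequence) != len(annotation):
        raise ValueError("sequence and annotation must have the same length")
    sites = []
    for index, (residue, code) in enumerate(zip(sequence, annotation)):
        if code not in ("G", "K"):
            continue
        if residue != "K":
            raise ValueError(f"code {code!r} at position {index + 1} is not on a lysine")
        sites.append((index + 1, code == "G"))
    return sites


# Toy example: lysines at positions 2, 4 and 5; the last one is unvalidated.
sites = training_sites("MKAKK", "-G-KU")
```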
For the complete dataset click
- Abraham et al., 1994
Abraham,E.C., Cherian,M. and Smith,J.B. (1994).
Site selectivity in the glycation of alpha A- and alpha
B-crystallins by glucose.
Biochem Biophys Res Commun, 201, 1451-1456.
- Acosta et al., 2000
Acosta,J., Hettinga,J., Flückiger,R., Krumrei,N., Goldfine,A., Angarita,L.
and Halperin,J. (2000).
Molecular basis for a link between complement and the vascular
complications of diabetes.
Proc Natl Acad Sci U S A, 97, 5450-5455.
- Adachi et al., 1992
Adachi,T., Ohta,H., Hayashi,K., Hirano,K. and Marklund,S.L. (1992).
The site of nonenzymic glycation of human extracellular-superoxide
dismutase in vitro.
Free Radic Biol Med, 13, 205-210.
- Ames, 2005
Ames,J.M. (2005).
Application of semiquantitative proteomics techniques to the Maillard reaction.
Ann N Y Acad Sci, 1043, 225-235.
- Bairoch et al., 2005
Bairoch,A., Apweiler,R., Wu,C.H., Barker,W.C., Boeckmann,B., Ferro,S.,
Gasteiger,E., Huang,H., Lopez,R., Magrane,M., Martin,M.J., Natale,D.A.,
O'Donovan,C., Redaschi,N. and Yeh,L.S.L. (2005).
The Universal Protein Resource (UniProt).
Nucleic Acids Res, 33, 154-159.
- Baldwin et al., 1995
Baldwin,J.S., Lee,L., Leung,T.K., Muruganandam,A. and Mutus,B. (1995).
Identification of the site of non-enzymatic glycation of glutathione
peroxidase: rationalization of the glycation-related catalytic alterations on
the basis of three-dimensional protein structure.
Biochim Biophys Acta, 1247, 60-64.
- Beranek et al., 2001
Beranek,M., Drsata,J. and Palicka,V. (2001).
Inhibitory effect of glycation on catalytic activity of alanine
Mol Cell Biochem, 218, 35-39.
- Calvo et al., 1993
Calvo,C., Ulloa,N., Campos,M., Verdugo,C. and Ayrault-Jarrier,M. (1993).
The preferential site of non-enzymatic glycation of human
apolipoprotein A-I in vivo.
Clin Chim Acta, 217, 193-198.
- Cotham et al., 2003
Cotham,W.E., Hinton,D.J.S., Metz,T.O., Brock,J.W.C., Thorpe,S.R., Baynes,J.W.
and Ames,J.M. (2003).
Mass spectrometric analysis of glucose-modified ribonuclease.
Biochem Soc Trans, 31, 1426-1427.
- Fujita et al., 1998
Fujita,T., Suzuki,K., Tada,T., Yoshihara,Y., Hamaoka,R., Uchida,K., Matuo,Y.,
Sasaki,T., Hanafusa,T. and Taniguchi,N. (1998).
Human erythrocyte bisphosphoglycerate mutase: inactivation by
glycation in vivo and in vitro.
J Biochem (Tokyo), 124, 1237-1244.
- Garlick & Mazer, 1983
Garlick,R.L. and Mazer,J.S. (1983).
The principal site of nonenzymatic glycosylation of human serum
albumin in vivo.
J Biol Chem, 258, 6142-6146.
- Iberg & Flückiger, 1986
Iberg,N. and Flückiger,R. (1986).
Nonenzymatic glycosylation of albumin in vivo. Identification of
multiple glycosylated sites.
J Biol Chem, 261, 13542-13545.
- Lapolla et al., 2004
Lapolla,A., Fedele,D., Reitano,R., Arico,N.C., Seraglia,R., Traldi,P.,
Marotta,E. and Tonani,R. (2004).
Enzymatic digestion and mass spectrometry in the study of advanced
glycation end products/peptides.
J Am Soc Mass Spectrom, 15, 496-509.
- Miyata et al., 1994
Miyata,T., Inagi,R., Wada,Y., Ueda,Y., Iida,Y., Takahashi,M., Taniguchi,N. and
Glycation of human beta 2-microglobulin in patients with
hemodialysis-associated amyloidosis: identification of the glycated sites.
Biochemistry, 33, 12215-12221.
- Nacharaju et al., 1997
Nacharaju,P., Ko,L. and Yen,S.H. (1997).
Characterization of in vitro glycation sites of tau.
J Neurochem, 69, 1709-1719.
- Niemann et al., 1991
Niemann,M.A., Bhown,A.S. and Miller,E.J. (1991).
The principal site of glycation of human complement factor B.
Biochem J, 274 (Pt 2), 473-480.
- Shaklai et al., 1984
Shaklai,N., Garlick,R.L. and Bunn,H.F. (1984).
Nonenzymatic glycosylation of human serum albumin alters its
conformation and function.
J Biol Chem, 259, 3812-3817.
- Shapiro et al., 1980
Shapiro,R., McManus,M.J., Zalut,C. and Bunn,H.F. (1980).
Sites of nonenzymatic glycosylation of human hemoglobin A.
J Biol Chem, 255, 3120-3127.
- Shuvaev et al., 1999
Shuvaev,V.V., Fujii,J., Kawasaki,Y., Itoh,H., Hamaoka,R., Barbier,A.,
Ziegler,O., Siest,G. and Taniguchi,N. (1999).
Glycation of apolipoprotein E impairs its binding to heparin:
identification of the major glycation site.
Biochim Biophys Acta, 1454, 296-308.
- Smith et al., 1996
Smith,J.B., Hanson,S.R., Cerny,R.L., Zhao,H.R. and Abraham,E.C. (1996).
Identification of the glycation site of lens gamma B-crystallin by
fast atom bombardment tandem mass spectrometry.
Anal Biochem, 243, 186-189.
- Swamy-Mruthinti & Schey, 1997
Swamy-Mruthinti,S. and Schey,K.L. (1997).
Mass spectroscopic identification of in vitro glycated sites of
Curr Eye Res, 16, 936-941.
- Takahashi et al., 1995
Takahashi,M., Lu,Y.B., Myint,T., Fujii,J., Wada,Y. and Taniguchi,N. (1995).
In vivo glycation of aldehyde reductase, a major 3-deoxyglucosone
reducing enzyme: identification of glycation sites.
Biochemistry, 34, 1433-1438.
- Watkins et al., 1985
Watkins,N.G., Thorpe,S.R. and Baynes,J.W. (1985).
Glycation of amino groups in protein. Studies on the specificity of
modification of RNase by glucose.
J Biol Chem, 260, 10629-10636.
- Zhang et al., 2001
Zhang,X., Medzihradszky,K.F., Cunningham,J., Lee,P.D., Rognerud,C.L., Ou,C.N.,
Harmatz,P. and Witkowska,H.E. (2001).
Characterization of glycated hemoglobin in diabetic patients:
usefulness of electrospray mass spectrometry in monitoring the extent and
distribution of glycation.
J Chromatogr B Biomed Sci Appl, 759, 1-15.
- Zhao et al., 1996
Zhao,H.R., Smith,J.B., Jiang,X.Y. and Abraham,E.C. (1996).
Sites of glycation of beta B2-crystallin by glucose and fructose.
Biochem Biophys Res Commun, 229, 128-133.
NEW DATA, COMMENTS AND SUGGESTIONS
New data, comments and suggestions may be sent to Morten Bo Johansen.
If the use of this database contributes significantly to your results, please cite:
Analysis and prediction of mammalian protein glycation.
Morten Bo Johansen, Lars Kiemer and Søren Brunak
Glycobiology, 16:844-853, 2006