O-GlycBase v6.00
Release notes
Note! The database itself has recently been updated to v6.00. Some of the information on this
page may not yet have been changed accordingly.
O-GLYCBASE is a revised database of O- and C-glycosylated proteins.
Version 6.00 has 242 glycoprotein entries. The criterion for
inclusion is at least one experimentally verified O- or C-glycosylation
site. The terminal sugar linked to serine or threonine is cited when known.
The database is non-redundant in the sense that it contains no identical
sequences, unless there is conflicting glycosylation data.
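The non-redundancy rule can be illustrated with a minimal Python sketch. It assumes that "conflicting glycosylation data" means two entries with the same sequence but different assignment lines; the record layout (a tuple of entry ID, sequence and assignment) and the function name are hypothetical simplifications, not the database's own format or tooling.

```python
from collections import defaultdict


def remove_identical_sequences(entries):
    """Keep one entry per identical sequence, unless the entries disagree
    about glycosylation, in which case each distinct assignment is kept.

    `entries` is a list of (entry_id, sequence, assignment) tuples; this
    layout is a hypothetical simplification of an O-GLYCBASE entry.
    """
    by_sequence = defaultdict(list)
    for entry in entries:
        by_sequence[entry[1]].append(entry)

    kept = []
    for group in by_sequence.values():
        seen_assignments = set()
        for entry_id, sequence, assignment in group:
            # Same sequence and same assignment: redundant, so skip it.
            if assignment in seen_assignments:
                continue
            # Assumption: a different assignment counts as conflicting data.
            seen_assignments.add(assignment)
            kept.append((entry_id, sequence, assignment))
    return kept


entries = [("A", "STPST", "S...."), ("B", "STPST", "S...."), ("C", "STPST", "...S.")]
print([e[0] for e in remove_identical_sequences(entries)])  # ['A', 'C']
```

Entry B is dropped because it repeats entry A exactly, while entry C is kept because its glycosylation data differ.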
Mucins contain O-glycosylated tandem repeat sequences. This results in
some redundancy among the O-glycosylation sites.
For prediction purposes we have also included a version of the database,
called O-Unique.seq, which contains no identical O-glycosylation sites
(window=9). This data set has been used as the training set of the
netOglyc prediction server (Hansen et al. 1995).
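The window=9 criterion can be sketched in Python as follows. This sketch assumes that two sites count as identical when the 9-residue windows centered on them are identical, and that only experimentally verified serines and threonines are considered. The '-' padding at sequence ends and the function name are assumptions of this sketch, not a description of how O-Unique.seq was actually built.

```python
def unique_site_windows(records, window=9):
    """Return the distinct sequence windows centered on verified O-sites.

    `records` is an iterable of (sequence, assignment) pairs in the
    O-GLYCBASE notation (uppercase = experimentally verified site).
    Windows that run past either end of a sequence are padded with '-'
    (an assumption made for this sketch).
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a center")
    half = window // 2
    seen = set()
    unique = []
    for sequence, assignment in records:
        padded = "-" * half + sequence + "-" * half
        for i, mark in enumerate(assignment):
            if mark not in "ST":
                continue  # keep only verified O-glycosylated Ser/Thr
            # Residue i of the sequence sits at index i + half in `padded`,
            # so the centered window is padded[i : i + window].
            context = padded[i:i + window]
            if context not in seen:
                seen.add(context)
                unique.append(context)
    return unique


# A toy tandem repeat: both verified threonines share the same window.
print(unique_site_windows([("GGGGTGGGGGGGGTGGGG", "....T........T....")]))
# ['GGGGTGGGG']
```

Repeated windows, such as those produced by mucin tandem repeats, collapse to a single training example.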
Databases
Format of O-GLYCBASE
Field         Description
>             Entry accession number and entry date
GLYCPROT:     Glycoprotein name and alternative names
SPECIES:      Species
DB_REF:       Cross-references to PIR, SWISS-PROT, PDB and PROSITE
OGLYCAN:      Type of carbohydrate linked to serine or threonine
SER:          Residue numbers of the O-linked serines
THR:          Residue numbers of the O-linked threonines
ASN:          Residue numbers of the N-linked asparagines
TRP:          Residue numbers of the C-linked tryptophans
REFERENCES:   References for the O-glycan assignment
SEQ:          Sequence length, including the signal peptide
SEQUENCE      Sequence in one-letter code, followed by its assignment, e.g.
                STPSTPNASKLPGHSTNGT
                ...ST.N.......stn..
              where uppercase T, S and N denote experimentally verified
              glycosylation sites of threonine, serine and asparagine,
              respectively, lowercase t, s and n denote predicted sites,
              and dots (.) denote residues that are not glycosylated.
COMMENTS:     Any additional comments
END           End of entry
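Because the assignment line is aligned residue by residue with the sequence, it can be decoded mechanically. A minimal Python sketch, using the example from the field list above (the function name is hypothetical):

```python
def decode_assignment(sequence, assignment):
    """Split an assignment line into verified and predicted sites.

    Returns two lists of (position, residue) tuples with 1-based positions:
    uppercase marks are experimentally verified sites, lowercase marks are
    predicted sites, and '.' means the residue is not glycosylated.
    """
    if len(sequence) != len(assignment):
        raise ValueError("sequence and assignment differ in length")
    verified, predicted = [], []
    for pos, (residue, mark) in enumerate(zip(sequence, assignment), start=1):
        if mark == ".":
            continue
        # Each mark should name the residue it sits under (ignoring case).
        if mark.upper() != residue.upper():
            raise ValueError(f"mark {mark!r} at {pos} does not match {residue!r}")
        (verified if mark.isupper() else predicted).append((pos, residue))
    return verified, predicted


verified, predicted = decode_assignment("STPSTPNASKLPGHSTNGT", "...ST.N.......stn..")
print(verified)   # [(4, 'S'), (5, 'T'), (7, 'N')]
print(predicted)  # [(15, 'S'), (16, 'T'), (17, 'N')]
```

The residue check catches misaligned assignment lines, which would otherwise silently shift every site.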
Format of O-Unique
This non-redundant data set contains 53 entries, all of them mammalian
mucin-type glycoproteins, with a total of 265 O-glycosylation sites.
The first line gives the sequence length (excluding the signal peptide), the
database entry name, the numbers of experimentally verified and predicted
glycosylation sites, e.g. ( 17, 0), and the glycoprotein name.
The sequence starts on the second line, in uppercase one-letter code.
Below it is the assignment, using the same notation as in O-GLYCBASE.
Example:
50 A29789 (pir) ( 17, 0) mucin - sheep (fragment)
SSVPGESATPQQPGALSESTTQLPGVTGTSAVTGSEPGLPSTGVSGLPGT
SS....S.T.......S.STT.....T.TS..T.S.....ST..S....T
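A record like the one above could be read with the following Python sketch. The header layout is inferred from this single example, and the sketch assumes each record has exactly one sequence line and one assignment line and that the length in the header equals the length of the sequence shown; longer entries may be laid out differently. The function and dictionary key names are hypothetical.

```python
import re

# Header layout inferred from the single O-Unique example (an assumption):
#   <length> <accession> (<database>) ( <verified>, <predicted>) <name>
HEADER = re.compile(r"^(\d+)\s+(\S+)\s+\((\w+)\)\s+\(\s*(\d+),\s*(\d+)\)\s+(.+)$")


def parse_o_unique_record(lines):
    """Parse one record given as [header, sequence, assignment] lines."""
    header, sequence, assignment = (line.strip() for line in lines)
    match = HEADER.match(header)
    if match is None:
        raise ValueError(f"unrecognized header: {header!r}")
    length, accession, database, n_verified, n_predicted, name = match.groups()
    record = {
        "length": int(length),
        "accession": accession,
        "database": database,
        "n_verified": int(n_verified),
        "n_predicted": int(n_predicted),
        "name": name,
        "sequence": sequence,
        "assignment": assignment,
    }
    # Cross-check the header against the sequence and assignment lines.
    if not (len(sequence) == len(assignment) == record["length"]):
        raise ValueError("length in header does not match the sequence")
    if sum(c.isupper() for c in assignment) != record["n_verified"]:
        raise ValueError("number of verified sites does not match the header")
    if sum(c.islower() for c in assignment) != record["n_predicted"]:
        raise ValueError("number of predicted sites does not match the header")
    return record


example = [
    "50 A29789 (pir) ( 17, 0) mucin - sheep (fragment)",
    "SSVPGESATPQQPGALSESTTQLPGVTGTSAVTGSEPGLPSTGVSGLPGT",
    "SS....S.T.......S.STT.....T.TS..T.S.....ST..S....T",
]
record = parse_o_unique_record(example)
print(record["accession"], record["n_verified"])  # A29789 17
```

Checking the header counts against the assignment line is a cheap way to detect records that were split or wrapped differently than this sketch expects.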
The leukosialins are cut into peptides (marked p1-p4), as these are the only
regions where the assignment can be performed. Including the rest of the
sequences would introduce false-negative sites (see the comments in O-GLYCBASE).
This data set can be used for benchmark studies. It is identical to the data set
used to train the neural networks of the netOglyc prediction server
(Hansen et al. 1995).
Data can no longer be retrieved by anonymous FTP; only HTTP is supported.
NEW DATA, COMMENTS AND SUGGESTIONS
New data, comments and suggestions may be sent to Karin Julenius
(e-mail: kj@cbs.dtu.dk).
PAPER TO CITE WHEN REPORTING RESULTS:
O-GLYCBASE version 4.0: a revised database of O-glycosylated proteins.
Ramneek Gupta, Hanne Birch, Kristoffer Rapacki, Søren Brunak and Jan E. Hansen
Nucleic Acids Research, 27: 370-372, 1999.
Last change: Sep 18, 2001, Ramneek Gupta
GETTING HELP
Technical problems: