O-GLYCBASE is a revised database of O- and C-glycosylated proteins.
Version 6.00 has 242 glycoprotein entries. The criterion for inclusion is at least one experimentally verified O- or C-glycosylation site. The terminal sugar linked to serine or threonine is cited when known. The database is non-redundant in the sense that it contains no identical sequences, unless there is conflicting glycosylation data. Mucins have tandem repeat sequences, which are O-glycosylated; this results in some redundancy of the O-glycosylation sites. For prediction purposes we have therefore also included a version of the database, called O-Unique.seq, which contains no identical O-glycosylation sites (window=9). This data set has been used as the training set of the netOglyc prediction server (Hansen et al. 1995).
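
As a rough illustration of what "no identical O-glycosylation sites (window=9)" could mean in practice, the sketch below keeps only sites whose 9-residue sequence context has not been seen before. The function names, the centring of the window on the site and the "X" padding at the sequence ends are assumptions made for this example; the procedure actually used to build O-Unique.seq is not described here.

```python
def site_windows(sequence, sites, window=9, pad="X"):
    """Return the sequence context around each 1-based site position.

    Assumption: the window is centred on the site and padded with `pad`
    where it runs past either end of the sequence.
    """
    half = window // 2
    padded = pad * half + sequence + pad * half
    # After padding, the window for 1-based site p starts at index p - 1.
    return [padded[p - 1 : p - 1 + window] for p in sites]


def unique_sites(entries, window=9):
    """Keep the first occurrence of every distinct site context.

    `entries` is a list of (name, sequence, site_positions) tuples.
    """
    seen = set()
    kept = []
    for name, sequence, sites in entries:
        for pos, context in zip(sites, site_windows(sequence, sites, window)):
            if context not in seen:
                seen.add(context)
                kept.append((name, pos, context))
    return kept
```

Two entries sharing a tandem repeat would then contribute each repeated site only once.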


Format of O-GLYCBASE

Fields:		Description
>		Entry accession number and entry date
GLYCPROT:	Glycoprotein name, and alternative names
SPECIES:	Species
DB_REF:         Cross-references to PIR, SWISS-PROT, PDB and PROSITE.
OGLYCAN:	Type of carbohydrate linked to serine or threonine
SER:		Residue numbers of the O-linked serines 
THR:		Residue numbers of the O-linked threonines
ASN: 		Residue numbers of the N-linked asparagines   
TRP: 		Residue numbers of the C-linked tryptophans 
REFERENCES:     References of O-glycan assignment. 
SEQ:		Sequence length, including signal peptide.
SEQUENCE        in one letter code. ex:	 STPSTPNASKLPGHSTNGT
Assignment                               ...ST.N.......stn..

where uppercase T, S, N denote experimentally verified glycosylation sites of threonine, serine and asparagine, respectively, and lowercase t, s, n denote predicted sites. Dots (.) denote no glycosylation.

COMMENTS: 	Any comments on the entry
END		End of entry
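
The SEQUENCE/Assignment pairing described above can be read mechanically. The following is a minimal sketch, not part of the database distribution: the function name `glycosylation_sites` is hypothetical, and it assumes the sequence and its assignment line have already been extracted as two strings of equal length.

```python
def glycosylation_sites(sequence, assignment):
    """Split an assignment line into verified and predicted sites.

    Returns two lists of (position, residue) tuples with 1-based positions:
    uppercase marks are experimentally verified, lowercase marks are
    predicted, and dots mean no glycosylation.
    """
    if len(sequence) != len(assignment):
        raise ValueError("sequence and assignment differ in length")
    verified, predicted = [], []
    for position, (residue, mark) in enumerate(zip(sequence, assignment), start=1):
        if mark == ".":
            continue
        if mark.upper() != residue.upper():
            raise ValueError(f"mark {mark!r} does not match residue {residue!r}")
        (verified if mark.isupper() else predicted).append((position, residue))
    return verified, predicted


verified, predicted = glycosylation_sites(
    "STPSTPNASKLPGHSTNGT",
    "...ST.N.......stn..",
)
```

For the example entry this gives verified sites at positions 4 (S), 5 (T) and 7 (N), and predicted sites at positions 15 (S), 16 (T) and 17 (N).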

Format of O-Unique

This non-redundant data set contains 53 entries, all of them mammalian mucin-type glycoproteins, with a total of 265 O-glycosylation sites.
The first line of each entry contains the sequence length excluding the signal peptide, the database identifier and database name, the numbers of experimentally verified and predicted glycosylation sites, e.g. ( 17,  0), and the glycoprotein name.
The second line starts the sequence in one-letter uppercase code. Below the sequence, the assignment is given in the same notation as in O-GLYCBASE.

   50 A29789  (pir) ( 17,  0)    mucin - sheep (fragment)
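
A header line like the one above could be split into its fields along these lines. This is only a sketch: the regular expression is inferred from this single example, the field names are made up for illustration, and real files may contain variants it does not cover.

```python
import re

# Pattern inferred from the example header; the group names are illustrative.
HEADER = re.compile(
    r"^\s*(?P<length>\d+)\s+(?P<identifier>\S+)\s+\((?P<database>[^)]+)\)"
    r"\s+\(\s*(?P<verified>\d+)\s*,\s*(?P<predicted>\d+)\s*\)"
    r"\s+(?P<name>.+?)\s*$"
)


def parse_header(line):
    """Parse one O-Unique header line into a dictionary of fields."""
    match = HEADER.match(line)
    if match is None:
        raise ValueError(f"unrecognised header line: {line!r}")
    return {
        "length": int(match["length"]),        # length without signal peptide
        "identifier": match["identifier"],
        "database": match["database"],
        "verified": int(match["verified"]),    # experimentally verified sites
        "predicted": int(match["predicted"]),  # predicted sites
        "name": match["name"],
    }


header = parse_header("   50 A29789  (pir) ( 17,  0)    mucin - sheep (fragment)")
```

Here `header["verified"]` is 17 and `header["name"]` is "mucin - sheep (fragment)".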

The leukosialins are cut into peptides marked (p1-4), as these are the only regions where the assignment can be made. Including the rest of the sequences would introduce false negative sites (see the comments in O-GLYCBASE). This data set can be used for benchmark studies. It is identical to the data set used to train the neural networks behind the netOglyc prediction server (Hansen et al. 1995).

Data can no longer be retrieved by anonymous ftp. Only http is supported:


New data, comments and suggestions may be sent to Karin Julenius
E-mail address: kj@cbs.dtu.dk


O-GLYCBASE version 4.0: a revised database of O-glycosylated proteins.
Ramneek Gupta, Hanne Birch, Kristoffer Rapacki, Søren Brunak and Jan E. Hansen
Nucleic Acids Research, 27: 370-372, 1999.

Last change: Sep 18, 2001,
Ramneek Gupta