OGLYC-BASE is a revised database of O-glycosylated proteins. Version 4.0
contains 179 glycoprotein entries with 991 verified O-glycosylated sites.
The criterion for inclusion is at least one experimentally verified
O-glycosylation site. The terminal sugar linked to serine or threonine is
cited when known.

The database is non-redundant in the sense that it contains no identical
sequences. Mucins, however, have tandem repeat sequences which are
O-glycosylated, and this results in some redundancy of the O-glycosylation
sites. For prediction purposes we have therefore also included a version of
the database which contains no identical O-glycosylation sites (window=9),
called O-Unique.seq. This data set has been used as the training set of the
NetOGlyc prediction server (Hansen et al. 1995).

Format of OGLYC-BASE
-------------------------------------------------------------------------------
Fields:       Description
-------------------------------------------------------------------------------
>             Entry accession number and entry date
GLYCPROT:     Glycoprotein name and alternative names
SPECIES:      Species
DB_REF:       Cross-references to PIR, SWISS-PROT, PDB and PROSITE
OGLYCAN:      Type of carbohydrate linked to serine or threonine
SER:          Residue numbers of the O-linked serines
THR:          Residue numbers of the O-linked threonines
ASN:          Residue numbers of the N-linked asparagines
REFERENCES:   References for the glycan assignment
SEQ:          Sequence length, including signal peptide.
              Sequence in one-letter code.
              ex:         STPSTPNASKLPGHSTNGT
              Assignment  ...ST.N.......stn..
              (where uppercase T, S, N denote experimentally verified
              glycosylation sites of threonine, serine and asparagine
              respectively, and lowercase t, s, n denote predicted sites.
              Dots (.) indicate 'no glycosylation'.)
COMMENTS:     Additional information/cautionary notes
-------------------------------------------------------------------------------

Format of O-Unique.seq

This non-redundant database contains 53 entries, including only mammalian
mucin-type glycoproteins.
It contains 265 O-glycosylation sites.

The first line of each entry contains the sequence length (excluding the
signal peptide), the database name, the number of experimentally verified and
predicted glycosylation sites, e.g. ( 17, 0), and the glycoprotein name.
The second line starts the sequence in one-letter uppercase code.
Below the sequence is the assignment, in the same notation as in OGLYC-BASE.

Ex:
50 A29789 (pir) ( 17, 0) ( mucin - sheep (fragment)
SSVPGESATPQQPGALSESTTQLPGVTGTSAVTGSEPGLPSTGVSGLPGT
SS....S.T.......S.STT.....T.TS..T.S.....ST..S....T

The leukosialins are cut into peptides marked (p1-4), as these are the only
regions where the assignment can be performed. Including the rest of the
sequences would introduce false negative sites (see comments in OGLYC-BASE).

This data set can be used for benchmark studies. It is identical to the data
set used to train the neural networks used in the NetOGlyc prediction server
(Hansen et al. 1995).
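
The assignment notation described above can be read mechanically. The
following is a minimal sketch in Python, not part of the OGLYC-BASE
distribution. It assumes that positions are counted from 1 along the sequence
as printed, and that every assignment character lines up with the residue
directly above it. The function name glycosylation_sites is hypothetical and
used for illustration only.

```python
def glycosylation_sites(sequence, assignment):
    """Split an assignment line into verified and predicted sites.

    Returns two lists of (position, residue) tuples. Positions are
    1-based along the sequence as given (an assumption of this sketch).
    Uppercase S/T/N marks a verified site, lowercase s/t/n a predicted
    site, and '.' means no glycosylation.
    """
    if len(sequence) != len(assignment):
        raise ValueError("sequence and assignment differ in length")

    verified, predicted = [], []
    for pos, (residue, mark) in enumerate(zip(sequence, assignment), start=1):
        if mark == ".":
            continue
        # A mark must be S, T or N (either case) and agree with the residue.
        if mark.upper() not in ("S", "T", "N") or mark.upper() != residue:
            raise ValueError(f"unexpected mark {mark!r} at position {pos}")
        target = verified if mark.isupper() else predicted
        target.append((pos, residue))
    return verified, predicted


# The example from the SEQ field of the OGLYC-BASE format description.
verified, predicted = glycosylation_sites(
    "STPSTPNASKLPGHSTNGT",
    "...ST.N.......stn..",
)
print(verified)   # [(4, 'S'), (5, 'T'), (7, 'N')]
print(predicted)  # [(15, 'S'), (16, 'T'), (17, 'N')]
```

Applied to the sheep mucin fragment shown in the O-Unique.seq example, the
same function yields 17 verified and 0 predicted sites, which agrees with the
( 17, 0) field on that entry's first line.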