
This site contains data sets from the paper

An overabundance of phase 0 introns immediately after the start codon in eukaryotic genes
Henrik Nielsen and Rasmus Wernersson
Submitted, 2006.


Data sets

Please refer to the publication for details about the generation of the data sets.

Genomic data

Homo sapiens proteins with intron phase annotation (13890 proteins) in TAB format (format details: Virtual Ribosome).

Uncompressed: human_build35.cleaned.tab (22 MB)
Gzip compressed: human_build35.cleaned.tab.gz (4.8 MB)

Same data (without the long comments) in HOW format (HOW example):
Uncompressed: human_build35.cleaned.how (13 MB)
Gzip compressed: human_build35.cleaned.how.gz (3.5 MB)
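The compressed files do not need to be unpacked before use. The following is a minimal Python sketch. It assumes only that a TAB file holds one tab-separated record per line. The meaning of each column is described on the Virtual Ribosome page and is not assumed here. The function name `read_tab_records` is hypothetical:

```python
import gzip
from typing import Iterator, List


def read_tab_records(path: str) -> Iterator[List[str]]:
    """Yield the tab-separated fields of each non-empty line of a TAB file.

    Plain files and gzip-compressed files (detected by the ".gz" suffix) are
    both accepted, so the .gz downloads can be streamed without unpacking.
    Assumption: one record per line; column meanings are not interpreted.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if line:  # skip blank lines
                yield line.split("\t")
```

For example, `sum(1 for _ in read_tab_records("human_build35.cleaned.tab.gz"))` counts the records in the human file. If each protein occupies one line, the result should match the 13890 proteins listed above.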

Mus musculus proteins with intron phase annotation (13928 proteins) in TAB format (format details: Virtual Ribosome).

Uncompressed: mouse_build35.cleaned.tab (22 MB)
Gzip compressed: mouse_build35.cleaned.tab.gz (5.7 MB)

Same data (without the long comments) in HOW format (HOW example):
Uncompressed: mouse_build35.cleaned.how (14 MB)
Gzip compressed: mouse_build35.cleaned.how.gz (4.3 MB)

Drosophila melanogaster proteins with intron phase annotation (19250 proteins) in TAB format (format details: Virtual Ribosome).

Uncompressed: drosophila_april05.cleaned.tab (38 MB)
Gzip compressed: drosophila_april05.cleaned.tab.gz (5.5 MB)

Same data (without the long comments) in HOW format (HOW example):
Uncompressed: drosophila_april05.cleaned.how (21 MB)
Gzip compressed: drosophila_april05.cleaned.how.gz (4.8 MB)
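Intron phase describes where an intron sits relative to the reading frame:
* Phase 0 introns lie between two codons.
* Phase 1 introns follow the first nucleotide of a codon.
* Phase 2 introns follow the second nucleotide of a codon.

An intron immediately after the start codon has exactly three coding nucleotides upstream and is therefore phase 0.

The sketch below computes phases from the coding lengths of consecutive exons. It is an illustration of the standard definition, not the procedure used to build these data sets. The function names are hypothetical, and it assumes the lengths count coding nucleotides only (UTRs excluded), listed 5' to 3':

```python
from typing import List, Sequence


def intron_phase(coding_nt_upstream: int) -> int:
    """Phase of an intron given the number of coding nucleotides before it.

    0: between two codons; 1: after the 1st codon position; 2: after the 2nd.
    """
    if coding_nt_upstream < 0:
        raise ValueError("coding_nt_upstream must be non-negative")
    return coding_nt_upstream % 3


def intron_phases(coding_exon_lengths: Sequence[int]) -> List[int]:
    """Phases of the introns separating consecutive coding exon segments.

    Assumption: lengths exclude UTR sequence and are ordered 5' to 3'.
    """
    phases: List[int] = []
    upstream = 0
    for length in coding_exon_lengths[:-1]:  # the last exon has no intron after it
        upstream += length
        phases.append(intron_phase(upstream))
    return phases


def first_intron_after_start_codon(coding_exon_lengths: Sequence[int]) -> bool:
    """True if the first intron lies directly after the ATG (3 coding nt upstream)."""
    return len(coding_exon_lengths) > 1 and coding_exon_lengths[0] == 3
```

For example, a gene whose coding exons are 3, 120 and 50 nucleotides long has two phase 0 introns (`intron_phases([3, 120, 50])` gives `[0, 0]`), and its first intron sits immediately after the start codon.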

GO-to-name mapping list:

Uncompressed: drosophila.go_categories.txt (4.8 MB)
Gzip compressed: drosophila.go_categories.txt.gz (519 KB)

GenBank data

Homology-reduced data sets

The homology-reduced data sets (proteins with intron phase annotation) are available in both HOW and TAB format.

Gzip-compressed tar archives of all data are available here:
homology_reduc_all.how.tgz (10 MB)
homology_reduc_all.tab.tgz (9 MB)
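The archives can be inspected with Python's standard `tarfile` module without extracting them to disk. The directory layout inside the archives is not described on this page, so this sketch lists the member names first instead of guessing paths. `list_archive` and `read_member_text` are hypothetical helper names:

```python
import tarfile
from typing import List


def list_archive(path: str) -> List[str]:
    """Return the names of the regular files stored in a .tgz archive."""
    with tarfile.open(path, "r:gz") as archive:
        return [member.name for member in archive.getmembers() if member.isfile()]


def read_member_text(path: str, member_name: str) -> str:
    """Read one file from a .tgz archive into memory, without writing to disk."""
    with tarfile.open(path, "r:gz") as archive:
        handle = archive.extractfile(member_name)  # raises KeyError if name is absent
        if handle is None:  # directories and special files have no readable content
            raise ValueError(f"{member_name!r} is not a regular file")
        with handle:
            return handle.read().decode("utf-8", errors="replace")
```

For example, call `list_archive("homology_reduc_all.tab.tgz")` to find the stored path of a subset such as `Fungi.reduc.tab`. Then pass that exact name to `read_member_text`.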

Browse individual subsets

Vertebrata
Protein count (the same numbers apply to the corresponding HOW files):

          2552 Vertebrata.reduc.nosignal.tab
           755 Vertebrata.reduc.signal.tab
          3542 Vertebrata.reduc.tab

Arthropoda
Protein count (the same numbers apply to the corresponding HOW files):

          3202 Arthropoda.reduc.nosignal.tab
           769 Arthropoda.reduc.signal.tab
          4179 Arthropoda.reduc.tab

Fungi
Protein count (the same numbers apply to the corresponding HOW files):

          3814 Fungi.reduc.nosignal.tab
           431 Fungi.reduc.signal.tab
          4525 Fungi.reduc.tab

Magnoliophyta
Protein count (the same numbers apply to the corresponding HOW files):

         10370 Magnoliophyta.reduc.nosignal.tab
          1051 Magnoliophyta.reduc.signal.tab
         12751 Magnoliophyta.reduc.tab
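The counts above are laid out like the output of `wc -l`. This suggests, though it is an assumption, that each protein occupies one line of its TAB file. Under that assumption, a short Python sketch can check a directory of downloaded files against the listed numbers. `EXPECTED_COUNTS`, `count_records` and `check_counts` are hypothetical names; the counts themselves are copied from this page:

```python
import os
from typing import Dict, Tuple

# Protein counts as listed on this page for the homology-reduced TAB files.
EXPECTED_COUNTS: Dict[str, int] = {
    "Vertebrata.reduc.nosignal.tab": 2552,
    "Vertebrata.reduc.signal.tab": 755,
    "Vertebrata.reduc.tab": 3542,
    "Arthropoda.reduc.nosignal.tab": 3202,
    "Arthropoda.reduc.signal.tab": 769,
    "Arthropoda.reduc.tab": 4179,
    "Fungi.reduc.nosignal.tab": 3814,
    "Fungi.reduc.signal.tab": 431,
    "Fungi.reduc.tab": 4525,
    "Magnoliophyta.reduc.nosignal.tab": 10370,
    "Magnoliophyta.reduc.signal.tab": 1051,
    "Magnoliophyta.reduc.tab": 12751,
}


def count_records(path: str) -> int:
    """Count non-empty lines (assumption: one protein per line in TAB format)."""
    with open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return sum(1 for line in handle if line.strip())


def check_counts(directory: str) -> Dict[str, Tuple[int, int]]:
    """Return {filename: (expected, observed)} for listed files whose count differs.

    Listed files that are not present in `directory` are skipped.
    """
    mismatches: Dict[str, Tuple[int, int]] = {}
    for name, expected in EXPECTED_COUNTS.items():
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        observed = count_records(path)
        if observed != expected:
            mismatches[name] = (expected, observed)
    return mismatches
```

In every group, the `.reduc.tab` total is larger than the nosignal and signal subsets combined. For Vertebrata, 2552 + 755 = 3307, against 3542 listed. The corresponding figures are 3971 against 4179 for Arthropoda, 4245 against 4525 for Fungi, and 11421 against 12751 for Magnoliophyta. The total files are therefore not simply the concatenation of the two subsets.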
	



CORRESPONDENCE

Henrik Nielsen,