Tables illustrating the data format:
File containing 270 intron-containing genes from yeast:
yeast_genome.with_introns.tab [1470 kb]
View the TAB file using a text editor (e.g. UltraEdit on Windows, BBedit on Mac or NEdit on Unix),
or import the file into a spreadsheet like Excel or a database like MySQL or Access.
The output data format uses a scheme with one
entry per line in the following format (tab separated):
name seq ann com
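As an illustration, a TAB file in this layout can be read with a short Python sketch. It assumes that each non-empty line holds exactly four tab-separated fields in the order above, and that com contains no tabs of its own (any extra tabs are kept in com because of the split limit). The names `Entry`, `parse_line` and `read_entries` are hypothetical and are not part of the program. The length check follows from the position-for-position rule described for the ann field.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Entry:
    name: str  # sequence name
    seq: str   # UPPERCASE main sequence, lowercase flanks
    ann: str   # one annotation letter per position of seq
    com: str   # free-text comments


def parse_line(line: str) -> Entry:
    """Parse one tab-separated entry into its four fields (hypothetical helper)."""
    fields = line.rstrip("\r\n").split("\t", 3)  # at most 4 fields
    if len(fields) != 4:
        raise ValueError(f"expected 4 tab-separated fields, got {len(fields)}")
    name, seq, ann, com = fields
    # ann describes seq position for position, so the lengths must match.
    if len(seq) != len(ann):
        raise ValueError(f"{name}: seq and ann lengths differ")
    return Entry(name, seq, ann, com)


def read_entries(path: str) -> Iterator[Entry]:
    """Yield one Entry per non-empty line of a TAB file."""
    with open(path) as handle:
        for line in handle:
            if line.strip():
                yield parse_line(line)
```

For example, `for entry in read_entries("yeast_genome.with_introns.tab"): print(entry.name)` would list the sequence names, assuming the file follows the layout described here.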
name: The sequence name, as determined by the "Naming preference"
seq: The DNA sequence itself. UPPERCASE is used for the
main sequence; lowercase is used for the flanks (if any).
ann: Single-letter sequence annotation. Position for position,
the annotation describes the DNA sequence: the first
letter of the annotation describes the first position
of the DNA sequence, and so forth.
The annotation code is defined as follows:
FEATURE BLOCKS (A.K.A. "EXON BLOCKS")
( First position
T tRNA exonic region
R rRNA / generic RNA exonic region
X Unknown feature type
) Last position
? Ambiguous first or last position
[ First UTR region position
] Last UTR region position
NOTICE: Custom feature blocks can be defined using
the "Custom defined annotation" option.
INTRONS and FRAMESHIFTS
D First intron position (donor site)
I Intron position
A Last intron position (acceptor site)
< Start of frameshift
> End of frameshift
REGIONS WITHOUT FEATURES
. NULL annotation (no annotation).
ONLY IN FLANKING REGIONS:
+ Other feature defined on the SAME STRAND
as the current entry.
- Other feature defined on the OPPOSITE STRAND
relative to the current entry.
# Multiple or overlapping features.
A..Z: Feature on the SAME STRAND as the current entry.
a..z: Feature on the OPPOSITE STRAND relative to the current entry.
Notice: The types of features annotated in the flanking
regions are determined by the following option:
"Feature types to annotate in flanking regions"
com: Comments (free text). All text, extra information, etc.
defined in the GenBank files is concatenated into a single line.
The following extra information is added by this program:
*) Strand ("+" or "-").
*) GenBank entry ID ("LOCUS").
*) Feature type (e.g. "CDS" or "rRNA")
*) Spliced DNA sequence: simply the DNA sequence defined
by the JOIN statement.
This is provided for two reasons: 1) to overcome negative
frameshifts, and 2) as an easy way of extracting the
sequence of the spliced product.
*) Spliced DNA annotation.
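The seq and ann rules above can be combined in a small Python sketch that locates the main sequence, finds introns and naively splices them out. It assumes that the main sequence is the single contiguous UPPERCASE run in seq, that each intron in the main region appears in ann as D, zero or more I, then A, and that no custom annotation letter collides with D, I or A. Only the main region is searched, since in the flanks capital letters mean "feature on the same strand". The helper names and the example entry are hypothetical and made up for illustration (X, the unknown-feature code, stands in for exonic positions). The naive `splice()` ignores frameshifts, so for entries with frameshifts the spliced sequence reported in com is the one to rely on.

```python
import re


def main_region(seq: str) -> tuple[int, int]:
    """Return (start, end) of the UPPERCASE main sequence, end exclusive.

    Assumes one contiguous uppercase run; flanks are lowercase.
    """
    match = re.search(r"[A-Z]+", seq)
    if match is None:
        return (0, 0)
    return (match.start(), match.end())


def find_introns(seq: str, ann: str) -> list[tuple[int, int, str]]:
    """Return (start, end, sequence) for each intron in the main region.

    Coordinates are 0-based, end-exclusive and relative to the full seq.
    """
    if len(seq) != len(ann):
        raise ValueError("seq and ann must have the same length")
    start, end = main_region(seq)
    introns = []
    # D = donor (first intron position), I = intron, A = acceptor (last).
    for m in re.finditer(r"DI*A", ann[start:end]):
        s, e = start + m.start(), start + m.end()
        introns.append((s, e, seq[s:e]))
    return introns


def splice(seq: str, ann: str) -> str:
    """Join main-region positions not annotated as intron (D, I or A).

    Frameshift markers are ignored, so the result may differ from the
    spliced DNA sequence given in the com field.
    """
    start, end = main_region(seq)
    return "".join(
        base
        for base, code in zip(seq[start:end], ann[start:end])
        if code not in "DIA"
    )


if __name__ == "__main__":
    seq = "acATGGTTTAGCTAAgt"   # made-up entry: 2-base flanks, one intron
    ann = "..(XXDIIIIAXXX).+"
    print(find_introns(seq, ann))  # [(5, 11, 'GTTTAG')]
    print(splice(seq, ann))        # ATGCTAA
```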
Scientific and technical problems: