NAME
dna2pep - full featured computational translation of DNA to peptide.
(The program behind the "Virtual Ribosome" webserver.)
SYNOPSIS
dna2pep [options] [input files] [-F outfile]
DESCRIPTION
TRANSLATION: The translation engine of dna2pep has full support for handling
degenerate nucleotides (IUPAC definition, e.g. W = A or T, S = G or C).
All translation tables defined by the NCBI taxonomy group are included,
and a number of options determining the behaviour of STOP and START
codons are available.
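As an illustration of how degenerate nucleotides can be translated, the
following is a minimal Python sketch and not the actual dna2pep code. It
assumes a degenerate codon is expanded to every concrete codon it can
represent and is translated only if all expansions agree. The function name
translate_codon and the use of "X" for unresolved codons are assumptions made
for this example.

```python
from itertools import product

# IUPAC nucleotide codes mapped to the concrete bases they can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard Genetic Code (NCBI table 1), listed in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    a + b + c: aa
    for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_codon(codon, table=STANDARD_CODE, unknown="X"):
    """Translate one codon, resolving IUPAC degenerate bases when possible."""
    codon = codon.upper()
    if len(codon) != 3 or any(base not in IUPAC for base in codon):
        return unknown
    # Expand the codon to every concrete codon it could represent.
    expansions = product(*(IUPAC[base] for base in codon))
    candidates = {table["".join(bases)] for bases in expansions}
    # The codon is unambiguous only if all expansions give the same result.
    return candidates.pop() if len(candidates) == 1 else unknown
```

With this sketch "CTN" resolves to leucine because all four expansions encode
L, while "AAN" stays unresolved because it covers both N and K.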
INTRONS and EXONS: dna2pep natively understands TAB files containing
Intron/Exon annotation (gb2tab / FeatureExtract). When translating
files containing Intron/Exon structure, dna2pep will annotate the
underlying gene-structure in the annotation of the translated
sequence.
Input files can be in FASTA (no Intron/Exon annotation), RAW (single
sequence with no header - all non-letters are discarded) or TAB
(including annotation) format. The output format will by default be FASTA
for files without annotation and TAB for files including annotation.
The file format is autodetected by investigating the first line
of the input.
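A first-line check of this kind could look like the Python sketch below. The
exact rules dna2pep applies are not described here, so the tests used (a
leading ">" for FASTA, a tab character for TAB, anything else RAW) and the
names detect_format and clean_raw are assumptions for illustration only.

```python
def detect_format(first_line):
    """Guess the input format from the first line (heuristic sketch)."""
    line = first_line.rstrip("\r\n")
    if line.startswith(">"):
        # FASTA records begin with a ">" header line.
        return "FASTA"
    if "\t" in line:
        # TAB records hold tab-separated fields: name, seq, ann, comment.
        return "TAB"
    # Anything else is treated as a headerless RAW sequence.
    return "RAW"


def clean_raw(text):
    """Discard all non-letters from RAW input, as described above."""
    return "".join(ch for ch in text if ch.isalpha())
```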
If no input files are specified, dna2pep will read from STDIN.
OPTIONS
-F, --outfile
Optional - specify an output file. If no output file is
specified the output will go to STDOUT.
-O, --outformat
Specify output format (see also the --fasta, --tab,
--report options below):
FASTA: Fasta format (plain DNA, no sequence annotation)
TAB: Tab format. Each line contains the following four
fields, separated by tabs:
name, seq, ann, comment
See gb2tab (FeatureExtract) for details.
REPORT: A nice visualization of the results.
AUTO: [Default] Generate both a report and sequence output
(using the same format as the one detected for the
input files).
--fasta filename
Write output sequences in FASTA format to the specified file.
Use '-' to indicate STDOUT.
--tab filename
Write output sequences in TAB format to the specified file.
Use '-' to indicate STDOUT.
--report filename
Write report to the specified file.
Use '-' to indicate STDOUT.
-m, --matrix tablename/file
Use alternative translation matrix instead of the built-in
Standard Genetic Code for translation.
If "tablename" is 1-6, 9-16 or 21-23, one of the alternative
translation tables defined by the NCBI taxonomy group will be
used.
Briefly, the following tables are defined:
-----------------------------------------
1: The Standard Code
2: The Vertebrate Mitochondrial Code
3: The Yeast Mitochondrial Code
4: The Mold, Protozoan, and Coelenterate Mitochondrial Code
and the Mycoplasma/Spiroplasma Code
5: The Invertebrate Mitochondrial Code
6: The Ciliate, Dasycladacean and Hexamita Nuclear Code
9: The Echinoderm and Flatworm Mitochondrial Code
10: The Euplotid Nuclear Code
11: The Bacterial and Plant Plastid Code
12: The Alternative Yeast Nuclear Code
13: The Ascidian Mitochondrial Code
14: The Alternative Flatworm Mitochondrial Code
15: Blepharisma Nuclear Code
16: Chlorophycean Mitochondrial Code
21: Trematode Mitochondrial Code
22: Scenedesmus obliquus mitochondrial Code
23: Thraustochytrium Mitochondrial Code
See http://www.ncbi.nlm.nih.gov/Taxonomy [Genetic Codes]
for a detailed description. Please notice that the table
of start codons is also used (see the --allinternal option
below for details).
If a filename is supplied the translation table is read from
file instead.
The file should contain one line per codon in the format:
codon[whitespace]aa-single letter code
All 64 codons must be included. Stop codons are specified
by "*". T and U are interchangeable. Blank lines and lines
starting with "#" are ignored.
See the "gcMitVertebrate.mtx" file in the dna2pep source
distribution for a well documented example.
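The file format described above can be read with a few lines of Python. This
is a sketch of a parser written from the description only, not the code used
by dna2pep. The function name parse_translation_table and the choice to raise
ValueError on malformed input are assumptions.

```python
def parse_translation_table(text):
    """Parse 'codon[whitespace]aa' lines into a dict keyed by DNA codon."""
    table = {}
    for line in text.splitlines():
        # Blank lines and lines starting with "#" are ignored.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            raise ValueError("expected 'codon aa', got: %r" % line)
        # T and U are interchangeable, so store every codon with T.
        codon = fields[0].upper().replace("U", "T")
        amino_acid = fields[1]
        if len(codon) != 3 or set(codon) - set("ACGT"):
            raise ValueError("invalid codon: %r" % fields[0])
        if len(amino_acid) != 1:
            raise ValueError("expected a single-letter code: %r" % fields[1])
        table[codon] = amino_acid
    # All 64 codons must be included.
    if len(table) != 64:
        raise ValueError("expected 64 codons, found %d" % len(table))
    return table
```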
-r x, --readingframe=x
Specify the reading frame. For input files in TAB format this
option is ignored, and the reading frame is built from the
annotated Intron/Exon structure.
1: Reading frame 1 (e.g. ATGxxxxxx). DEFAULT.
2: Reading frame 2 (e.g. xATGxxxxx).
3: Reading frame 3 (e.g. xxATGxxxx).
-1: Reading frame 1 on the minus strand.
-2: Reading frame 2 on the minus strand.
-3: Reading frame 3 on the minus strand.
all: Try all reading frames.
This option also implies the -x option.
plus: All positive reading frames.
This option also implies the -x option.
minus: All negative reading frames.
This option also implies the -x option.
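The reading frames listed for the -r option can be illustrated with the Python
sketch below. It is not taken from the dna2pep source. It assumes that the
minus-strand frames are counted from the 5' end of the reverse complement, and
the names frame_codons, reverse_complement and FRAME_SETS are hypothetical.

```python
# Complement table covering the IUPAC nucleotide codes (upper case only).
COMPLEMENT = str.maketrans("ACGTURYKMBVDHSWN", "TGCAAYRMKVBHDSWN")

# Frames selected by the keyword values of the -r option.
FRAME_SETS = {
    "all": [1, 2, 3, -1, -2, -3],
    "plus": [1, 2, 3],
    "minus": [-1, -2, -3],
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def frame_codons(seq, frame):
    """Split seq into complete codons for frame 1, 2, 3, -1, -2 or -3."""
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("invalid reading frame: %r" % (frame,))
    if frame < 0:
        # Assumption: minus frames are read on the reverse complement.
        seq = reverse_complement(seq)
    offset = abs(frame) - 1
    # Trailing bases that do not fill a complete codon are dropped.
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
```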
-o mode, --orf mode
Report longest ORF in the reading frame(s) specified with the
-r option.
Mode governs which criteria are used to allow the opening of
an ORF. "Strict start codons" => codons _always_ coding for
methionine (e.g. ATG in the standard code), "Minor start codons"
=> codons only coding for methionine at the start position
(e.g. TTG in the standard genetic code).
Mode can be:
------------
strict: Open an ORF at "strict start codons" only.
any: Open an ORF at any start codon.
none: Do not use start codons - look for the longest
fragment before a STOP codon.
The DNA fragment used for encoding the ORF will be added to the
comment field (TAB format only).
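The three modes can be sketched in Python as follows. This is an illustration
of the behaviour described above, not the dna2pep implementation. The function
name longest_orf is hypothetical, the default start codon sets contain only
the examples mentioned in the text (ATG and TTG), and the sketch assumes that
the STOP codon is excluded from the ORF and that an ORF may run to the end of
the sequence without a STOP codon.

```python
def longest_orf(codons, mode="strict", strict_starts=("ATG",),
                minor_starts=("TTG",), stops=("TAA", "TAG", "TGA")):
    """Return the longest ORF (as a list of codons) in one reading frame."""
    if mode == "strict":
        starts = set(strict_starts)
    elif mode == "any":
        starts = set(strict_starts) | set(minor_starts)
    elif mode == "none":
        starts = None  # no start codon needed: every fragment is open
    else:
        raise ValueError("unknown mode: %r" % (mode,))

    best = []
    # "current" is None while no ORF is open.
    current = [] if starts is None else None
    for codon in codons:
        if codon in stops:
            # A STOP codon closes the open ORF, if there is one.
            if current is not None and len(current) > len(best):
                best = current
            current = [] if starts is None else None
        elif current is not None:
            current.append(codon)
        elif codon in starts:
            current = [codon]
    # An ORF still open at the end of the sequence also counts here.
    if current is not None and len(current) > len(best):
        best = current
    return best
```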
-a, --allinternal
By default the very first codon in each sequence is assumed
to be the initial codon on the transcript. This means certain
non-methionine codons actually code for methionine at this
position. For example "TTG" in the standard genetic code (see
above).
Selecting this option treats all codons as internal codons.
-x, --readthroughstop
Allow the translation to continue after a stop codon is reached.
The stop codon will be marked as "*".
-p, --plain, --ignoreannotation
Ignore annotation for TAB files. If this option is selected
TAB files will be treated in the same way as FASTA files.
-c, --comment
Preserve the comment field in TAB files. Normally the comment
field is silently dropped, since it makes no sense for FASTA
files.
-C, --processcomment
Works as the -c option described above, except a bit of intelligent
parsing is done on the comment field: If a "/spliced_product"
sub-field is found (from TAB files created by FeatureExtract / gb2tab)
only the part of the comment field before the DNA specific information
is kept in the comment field.
-e, --exonstructure
Default for TAB files. Annotate the underlying exon structure
of the translated sequence in the following way: Positions that
are fully or partially encoded within the first exon get the
annotation character "1", positions in the second exon get the
character "2" etc.
The hexadecimal system is used, which means up to 15 exons can
be uniquely annotated, before the numbering wraps around to "0".
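The numbering scheme can be illustrated with a short Python sketch. It is
written from the description above and is not the dna2pep code. It assumes
that a codon split across two exons is labelled with the exon holding its
first base and that upper-case hexadecimal digits are used. The function name
exon_annotation and its input (a list of coding lengths per exon) are
hypothetical.

```python
HEX_DIGITS = "0123456789ABCDEF"


def exon_annotation(exon_lengths):
    """Return one annotation character per complete codon."""
    # Exon number (counting from 1) of every coding base, in order.
    per_base = []
    for number, length in enumerate(exon_lengths, start=1):
        per_base.extend([number] * length)
    # Label each complete codon by the exon holding its first base;
    # exon 16 wraps around to "0".
    return "".join(
        HEX_DIGITS[per_base[i] % 16]
        for i in range(0, len(per_base) - 2, 3)
    )
```

For example, two exons with 4 and 5 coding bases give "112": the second codon
starts in exon 1 and finishes in exon 2.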
-i, --intronphase
Annotate where an intron interrupted the DNA sequence, and how
the intron cut the reading frame.
0 : phase-0 intron (in between the previous and current position).
1 : phase-1 intron.
2 : phase-2 intron.
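Intron phase follows from the number of coding bases that precede the intron.
The Python sketch below uses the common definition (phase = preceding coding
bases modulo 3) and is not taken from the dna2pep source. The function name
intron_phases and the zero-based peptide index it returns are assumptions for
this example.

```python
def intron_phases(exon_lengths):
    """Return (peptide_index, phase) for each intron between exons."""
    result = []
    coding_bases = 0
    # Every exon except the last one is followed by an intron.
    for length in exon_lengths[:-1]:
        coding_bases += length
        # Phase 0: the intron lies between two codons.
        # Phase 1 or 2: the intron falls after base 1 or 2 of a codon.
        result.append((coding_bases // 3, coding_bases % 3))
    return result
```

For exons with 6, 4 and 5 coding bases the first intron is phase 0 and sits
before the third residue, and the second intron is phase 1 and interrupts the
codon of the fourth residue.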
AUTHOR
Rasmus Wernersson, raz@cbs.dtu.dk
Feb-Mar 2006
FILES
dna2pep.py, mod_translate.py, ncbi_genetic_codes.py
WEB PAGE
http://www.cbs.dtu.dk/services/VirtualRibosome/
REFERENCE
Rasmus Wernersson
Virtual Ribosome - Comprehensive DNA translation tool.
Submitted to Nucleic Acids Research, 2006