Quick guide
Start by pasting in (or uploading) your DNA sequences in FASTA (sequence only),
GenBank
(CDS elements will be extracted) or
TAB
(sequence + annotation) format. Free format: if you have a single sequence you want to translate, you
can simply paste it in. In this case all non-alphabetic characters (such as numbers) are
ignored, making it easy to just copy and paste the sequence from most other data formats.
Hit "Submit query" to run the translation using the Standard Genetic Code and default
parameters.
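The free-format clean-up described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the server's actual code; the function name `clean_free_format` is hypothetical, and upper-casing the result is an assumption.

```python
def clean_free_format(raw: str) -> str:
    """Drop every non-alphabetic character (digits, spaces, line breaks).

    Upper-casing the result is an assumption made for this sketch.
    """
    return "".join(ch for ch in raw if ch.isascii() and ch.isalpha()).upper()


# A fragment copied from a GenBank ORIGIN block, with position numbers and spaces.
raw = """      721 atggtgctgt ctgccgccga
      741 caagtccaat"""
print(clean_free_format(raw))  # ATGGTGCTGTCTGCCGCCGACAAGTCCAAT
```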
Options
Translation table
This is the single most important option in the Virtual Ribosome, since it selects
the translation table used. All translation tables defined by the NCBI taxonomy group
can be selected (see details here:
The Genetic Codes [NCBI]). Please notice that the alternative start codons defined
in each translation table are also used. For example, in the Standard Genetic Code (see table
below), the codons TTG and CTG are allowed as methionine-coding
start codons.
Standard Genetic code
AAs = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Starts = ---M---------------M---------------M----------------------------
Base1 = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2 = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3 = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
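The five-line table above can be read column by column: each column gives one codon (Base1, Base2, Base3), its amino acid (AAs) and whether it may act as a start codon (Starts). A minimal Python sketch of that reading is shown here; the variable names are illustrative and not taken from the Virtual Ribosome source.

```python
AAS    = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STARTS = "---M---------------M---------------M----------------------------"
BASE1  = "TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG"
BASE2  = "TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG"
BASE3  = "TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG"

# Each column of the table is one codon.
codons = [b1 + b2 + b3 for b1, b2, b3 in zip(BASE1, BASE2, BASE3)]

# Codon -> amino acid when the codon is used internally ("*" marks a stop).
CODON_TO_AA = dict(zip(codons, AAS))

# Codons flagged with "M" in the Starts line may initiate translation.
START_CODONS = {codon for codon, flag in zip(codons, STARTS) if flag == "M"}

print(sorted(START_CODONS))  # ['ATG', 'CTG', 'TTG']
print(CODON_TO_AA["TTG"])    # L (leucine when used internally)
```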
Start codons
This option is closely related to the alternative start codons mentioned above.
By default the very first codon in the DNA sequence is considered to be the start codon,
which means the special rules for alternative start codons apply and (for example) the codon
TTG will code for methionine and not leucine, as it does when the codon
is used internally.
By selecting the "All codons are internal" option all of the sequence is considered internal
and start-codon rules are not applied (useful for working with sequence fragments).
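The difference between the two settings can be sketched as follows for the Standard Genetic Code. The function `translate` and its `first_codon_is_start` parameter are hypothetical names for this illustration, and writing "X" for an unrecognised codon is an assumption.

```python
from itertools import product

# Standard Genetic Code in NCBI order (Base1 varies slowest, Base3 fastest).
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AAS)}
START_CODONS = {"TTG", "CTG", "ATG"}


def translate(dna: str, first_codon_is_start: bool = True) -> str:
    """Translate frame 1 of dna; optionally apply start-codon rules to codon 1."""
    peptide = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[i:i + 3]
        if i == 0 and first_codon_is_start and codon in START_CODONS:
            peptide.append("M")  # alternative start codons also give methionine
        else:
            peptide.append(CODON_TO_AA.get(codon, "X"))  # "X" is an assumption
    return "".join(peptide)


print(translate("TTGGTGCTG"))                              # MVL
print(translate("TTGGTGCTG", first_codon_is_start=False))  # LVL
```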
Stop codons
This option determines whether the translation should be terminated at the first stop codon
encountered. The default is to read through the entire sequence, marking stop codons with "*".
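As a rough illustration, the two behaviours can be expressed as a post-processing step on a translated peptide. The function name is hypothetical, and keeping the terminating "*" in the truncated peptide is an assumption; the server may format this differently.

```python
def apply_stop_option(peptide: str, read_through: bool = True) -> str:
    """Return the peptide as-is, or cut it at the first stop ("*")."""
    if read_through:
        return peptide  # default: stops stay in place, marked with "*"
    stop = peptide.find("*")
    # Keeping the "*" itself is an assumption of this sketch.
    return peptide if stop == -1 else peptide[:stop + 1]


print(apply_stop_option("MVL*SA*K"))                      # MVL*SA*K
print(apply_stop_option("MVL*SA*K", read_through=False))  # MVL*
```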
Reading frame
This option governs the reading frame to use for translation. It's possible to select
either a single reading frame (1, 2, 3 on the plus strand and -1, -2, -3 on the minus strand),
or a set of multiple reading frames ("all" = all 6; "plus" = 1, 2, 3; "minus" = -1, -2, -3).
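The frame selection can be sketched like this. The helper names are hypothetical, and the numbering convention for the minus strand (frame -1 starts at the first base of the reverse complement) is a common convention assumed here, which may differ from the server's exact numbering.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# The named frame sets from the text above.
FRAME_SETS = {
    "all": [1, 2, 3, -1, -2, -3],
    "plus": [1, 2, 3],
    "minus": [-1, -2, -3],
}


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def frame_sequence(dna: str, frame: int) -> str:
    """Return the codon-aligned DNA read in the given frame (1..3 or -1..-3)."""
    strand = dna if frame > 0 else reverse_complement(dna)
    sub = strand[abs(frame) - 1:]
    return sub[:len(sub) - len(sub) % 3]  # drop trailing partial codon


dna = "ATGGTGCTGT"
for frame in FRAME_SETS["all"]:
    print(frame, frame_sequence(dna, frame))
```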
When a single reading frame is selected, the output is shown for
this frame only. For example:
VIRTUAL RIBOSOME
----------------
Translation table: Standard SGC0
>Seq1
Reading frame: 1
M V L S A A D K G N V K A A W G K V G G H A A E Y G A E A L
5' ATGGTGCTGTCTGCCGCCGACAAGGGCAATGTCAAGGCCGCCTGGGGCAAGGTTGGCGGCCACGCTGCAGAGTATGGCGCAGAGGCCCTG 90
>>>...)))..............................................................................)))
E R M F L S F P T T K T Y F P H F D L S H G S A Q V K G H G
5' GAGAGGATGTTCCTGAGCTTCCCCACCACCAAGACCTACTTCCCCCACTTCGACCTGAGCCACGGCTCCGCGCAGGTCAAGGGCCACGGC 180
......>>>...))).......................................))).................................
A K V A A A L T K A V E H L D D L P G A L S E L S D L H A H
5' GCGAAGGTGGCCGCCGCGCTGACCAAAGCGGTGGAACACCTGGACGACCTGCCCGGTGCCCTGTCTGAACTGAGTGACCTGCACGCTCAC 270
..................)))..................)))......))).........)))......)))......))).........
K L R V D P V N F K L L S H S L L V T L A S H L P S D F T P
5' AAGCTGCGTGTGGACCCGGTCAACTTCAAGCTTCTGAGCCACTCCCTGCTGGTGACCCTGGCCTCCCACCTCCCCAGTGATTTCACCCCC 360
...)))...........................))).........))))))......)))..............................
A V H A S L D K F L A N V S T V L T S K Y R *
5' GCGGTCCACGCCTCCCTGGACAAGTTCTTGGCCAACGTGAGCACCGTGCTGACCTCCAAATACCGTTAA 429
...............))).........)))..................)))...............***
Annotation key:
>>> : START codon (strict)
))) : START codon (alternative)
*** : STOP
When multiple reading frames are selected the peptides are "stacked" in the visualization,
as seen in the example below. Notice how the START codon "arrows" are reversed on the minus strand
to indicate the direction of translation.
VIRTUAL RIBOSOME
----------------
Translation table: Standard SGC0
>Seq1 - reading frame(s): all
G A V C R R Q G Q C Q G R L G Q G W R P R C R V W R R G P
W C C L P P T R A M S R P P G A R L A A T L Q S M A Q R P W
M V L S A A D K G N V K A A W G K V G G H A A E Y G A E A L
5' ATGGTGCTGTCTGCCGCCGACAAGGGCAATGTCAAGGCCGCCTGGGGCAAGGTTGGCGGCCACGCTGCAGAGTATGGCGCAGAGGCCCTG 90
>>>...))).)))...............>>>..........)))........))).........)))......>>>...........)))
....................(((...(((..***(............(((.................(((.........(((........
3' TACCACGACAGACGGCGGCTGTTCCCGTTACAGTTCCGGCGGACCCCGTTCCAACCGCCGGTGCGACGTCTCATACCGCGTCTCCGGGAC 90
H H Q R G G V L A I D L G G P A L N A A V S C L I A C L G Q
T S D A A S L P L T L A A Q P L T P P W A A S Y P A S A R
P A T Q R R C P C H * P R R P C P Q R G R Q L T H R L P G P
Annotation key:
PLUS strand
-----------
>>> : START codon (strict)
))) : START codon (alternative)
*** : STOP
MINUS strand
------------
<<< : START codon (strict)
((( : START codon (alternative)
*** : STOP
ORF finder
The Virtual Ribosome has the option of scanning the input DNA sequence for ORFs
(Open Reading Frames).
For each sequence the longest ORF is reported. The corresponding DNA fragment is
also included for download (embedded in the comment field of the TAB file).
For each sequence the specified reading frames are scanned for ORFs. The criteria
for opening an ORF can be adjusted as follows:
- Start codon: Strict: Only open ORFs at strict START codons (those always coding for methionine, e.g. ATG)
- Start codon: Any: Open ORFs at any START codon.
- Start codon: None: Open ORFs at any codon except STOP.
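The three opening criteria can be illustrated with a small single-frame scan for the Standard Genetic Code. This is a sketch under stated assumptions, not the server's algorithm: the names `can_open` and `longest_orf` are hypothetical, the stop codon is included in the reported fragment, and an ORF that runs off the end of the sequence is still counted.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
STRICT_STARTS = {"ATG"}             # always codes for methionine
ANY_STARTS = {"ATG", "TTG", "CTG"}  # strict + alternative start codons


def can_open(codon: str, start_mode: str) -> bool:
    if start_mode == "strict":
        return codon in STRICT_STARTS
    if start_mode == "any":
        return codon in ANY_STARTS
    if start_mode == "none":
        return codon not in STOP_CODONS
    raise ValueError(f"unknown start mode: {start_mode}")


def longest_orf(dna: str, start_mode: str = "strict") -> str:
    """Return the DNA of the longest ORF in frame 1 of dna."""
    codons = [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]
    best, open_at = "", None
    for idx, codon in enumerate(codons):
        if open_at is None and can_open(codon, start_mode):
            open_at = idx
        if open_at is not None and codon in STOP_CODONS:
            candidate = "".join(codons[open_at:idx + 1])  # stop included
            if len(candidate) > len(best):
                best = candidate
            open_at = None
    if open_at is not None:  # ORF still open at the end of the sequence
        candidate = "".join(codons[open_at:])
        if len(candidate) > len(best):
            best = candidate
    return best


dna = "CCCATGAAATTGTAACTGGGG"
print(longest_orf(dna, "strict"))  # ATGAAATTGTAA
print(longest_orf(dna, "none"))    # CCCATGAAATTGTAA
```

To scan several reading frames, the same function could be applied to each frame's sequence in turn and the longest result kept.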
Advanced options
The advanced options relate to the behavior of the translation when TAB format sequences containing
annotation of Intron/Exon structure have been used as the input (see the detailed description of TAB files below).
Derived sequence annotation
When the Intron/Exon structure is known, the Virtual Ribosome automatically extracts the
exonic parts of the sequence and performs the translation only on these. Following
translation, an analysis of the underlying Intron/Exon structure is performed and used
to add annotation to the protein sequence, in the form of a TAB file.
Two types of analysis are available:
1) Exon numbering: Each amino acid is annotated with the number
of the exon which encoded it (or at least hosted the first nucleotide in the codon).
VIRTUAL RIBOSOME
----------------
Translation table: Yeast Mitochondrial SGC2
>Q0045 - translation and annotation of the exonic structure
pep: MVQRWLYSTNAKDIAVLYFMLAIFSGMAGTAMSLIIRLELAAPGSQYLHGNSQLFNVLVVGHAVLMIFFLVMPALIGGFGNYLLPLMIGA 90
ann: 111111111111111111111111111111111111111111111111111111111222222222222333333333333444444444 90
pep: TDTAFPRINNIAFWVLPMGLVCLVTSTLVESGAGTGWTVYPPLSSIQAHSGPSVDLAIFALHLTSISSLLGAINFIVTTLNMRTNGMTMH 180
ann: 444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444 180
pep: KLPLFVWSIFITAFLLLLSLPVLSAGITMLLLDRNFNTSFFEVSGGGDPILYEHLFWFFGHPEVYILIIPGFGIISHVVSTYSKKPVFGE 270
ann: 444444444444444444444444444444444444444444444444444444444444555555555555555555555555555555 270
pep: ISMVYAMASIGLLGFLVWSHHMYIVGLDADTRAYFTSATMIIAIPTGIKIFSWLATIHGGSIRLATPMLYAIAFLFLFTMGGLTGVALAN 360
ann: 555555555555555555555555555555555555555555555555555555666666666666666666666666666666666666 360
pep: ASLDVAFHDTYYVVGHFHYVLSMGAIFSLFAGYYYWSPQILGLNYNEKLAQIQFWLIFIGANVIFFPMHFLGINGMPRRIPDYPDAFAGW 450
ann: 666666666777777777888888888888888888888888888888888888888888888888888888888888888888888888 450
pep: NYVASIGSFIATLSLFLFIYILYDQLVNGLNNKVNNKSVIYNKAPDFVESNTIFNLNTVKSSSIEFLLTSPPAVHSFNTPAVQS* 535
ann: 8888888888888888888888888888888888888888888888888888888888888888888888888888888888888 535
TAB files containing exon-number annotation can be used directly in the
FeatureMap3D server, for mapping the underlying exon-structure onto protein 3D structures.
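The exon-numbering rule (each amino acid gets the number of the exon hosting the first nucleotide of its codon) can be sketched from a DNA annotation string as described in the annotation section of this page. The function name is hypothetical, and the sketch returns a list of integers rather than guessing how the server prints exon numbers above 9.

```python
def exon_numbers_per_codon(annotation: str) -> list[int]:
    """For each codon of the spliced sequence, give the exon of its first base.

    "(" opens an exon, "E" is an exonic base, ")" closes an exon; all three
    occupy a sequence position. Everything else (D, I, A, ...) is skipped.
    """
    exon_of_base = []
    exon = 0
    for ch in annotation:
        if ch == "(":
            exon += 1
        if ch in "(E)":
            exon_of_base.append(exon)
    n = len(exon_of_base) - len(exon_of_base) % 3  # whole codons only
    return [exon_of_base[i] for i in range(0, n, 3)]


# Exon 1 has 5 bases, exon 2 has 6 bases: codon 2 starts in exon 1
# and codon 3 starts in exon 2.
print(exon_numbers_per_codon("(EEE)DIIA(EEEE)"))  # [1, 1, 2]
```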
2) Intron pos vs. reading frame: The underlying reading frame is determined, and an annotation
of intron positions and intron phase is generated.
Phase 0 - an intron exists right before the codon encoding the amino-acid.
Phase 1 - an intron exists in between positions 1 and 2 of the codon.
Phase 2 - an intron exists in between positions 2 and 3 of the codon.
VIRTUAL RIBOSOME
----------------
Translation table: Yeast Mitochondrial SGC2
>Q0045 - translation and annotation of the position and phase of the introns
pep: MVQRWLYSTNAKDIAVLYFMLAIFSGMAGTAMSLIIRLELAAPGSQYLHGNSQLFNVLVVGHAVLMIFFLVMPALIGGFGNYLLPLMIGA 90
ann: ........................................................1...........1............0........ 90
pep: TDTAFPRINNIAFWVLPMGLVCLVTSTLVESGAGTGWTVYPPLSSIQAHSGPSVDLAIFALHLTSISSLLGAINFIVTTLNMRTNGMTMH 180
ann: .......................................................................................... 180
pep: KLPLFVWSIFITAFLLLLSLPVLSAGITMLLLDRNFNTSFFEVSGGGDPILYEHLFWFFGHPEVYILIIPGFGIISHVVSTYSKKPVFGE 270
ann: ............................................................0............................. 270
pep: ISMVYAMASIGLLGFLVWSHHMYIVGLDADTRAYFTSATMIIAIPTGIKIFSWLATIHGGSIRLATPMLYAIAFLFLFTMGGLTGVALAN 360
ann: ......................................................0................................... 360
pep: ASLDVAFHDTYYVVGHFHYVLSMGAIFSLFAGYYYWSPQILGLNYNEKLAQIQFWLIFIGANVIFFPMHFLGINGMPRRIPDYPDAFAGW 450
ann: .........0.......1........................................................................ 450
pep: NYVASIGSFIATLSLFLFIYILYDQLVNGLNNKVNNKSVIYNKAPDFVESNTIFNLNTVKSSSIEFLLTSPPAVHSFNTPAVQS* 535
ann: ..................................................................................... 535
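The phase definitions above amount to counting how many exonic bases precede each intron: the count divided by 3 gives the codon, and the remainder gives the phase. A minimal sketch, assuming the DNA annotation string described in the annotation section of this page; the function name is hypothetical, and if two introns fall in the same codon only the last one is kept in this simplified version.

```python
def intron_phase_annotation(annotation: str) -> str:
    """Build a per-codon string: "." for no intron, else the intron phase."""
    exonic = 0           # exonic bases seen so far
    introns = {}         # codon index -> phase
    seen_exon = False
    for ch in annotation:
        if ch == "(":
            if seen_exon:  # an intron lies just before this exon
                introns[exonic // 3] = exonic % 3
            seen_exon = True
        if ch in "(E)":
            exonic += 1
    n_codons = exonic // 3
    return "".join(
        str(introns[i]) if i in introns else "." for i in range(n_codons)
    )


print(intron_phase_annotation("(EEE)DIIA(EEEE)"))  # .2.
print(intron_phase_annotation("(E)III(EEEE)"))     # .0.
```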
Working with sequence annotation
The annotation string concept
The idea is simply to have a string in addition to the DNA/peptide sequence which
describes the properties of the sequence. This is done using a simple one-letter code.
All this is described in great detail for DNA sequences on the
FeatureExtract server and the publication describing the server
[FeatureExtract - extraction of sequence annotation made easy,
Wernersson, 2005]. Here it is best illustrated by an example:
Sequence: ATGTCTACATATGAAGGTATGTAA
Annotation: (EEEEEEEEEEEEEE)DIIIIIII
E: Exon
I: Intron
(: Start of exon
): End of exon
D: Donor site
A: Acceptor site
The Virtual Ribosome looks for the regions annotated as (EEEEEE{many E's}EEEEE)
to find the exonic part of a sequence.
TAB format file
The TAB format is very simple. Each line holds the information for exactly one sequence in four
fields separated by the TAB character:
Name Seq Ann Com
Name : Name of the sequence.
Seq : The DNA sequence itself
Ann : An annotation string of the same length as the sequence.
Com : A comment field. May be empty.
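Reading one line of this format takes little code. The sketch below uses hypothetical names (`TabRecord`, `parse_tab_line`); accepting a line where the empty comment field has been dropped entirely is an assumption, not documented behavior.

```python
from typing import NamedTuple


class TabRecord(NamedTuple):
    name: str
    seq: str
    ann: str
    com: str


def parse_tab_line(line: str) -> TabRecord:
    """Split one TAB-format line into its four fields and sanity-check it."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 3:
        fields.append("")  # assumption: tolerate a missing (empty) comment
    if len(fields) != 4:
        raise ValueError(f"expected 4 TAB-separated fields, got {len(fields)}")
    name, seq, ann, com = fields
    if len(seq) != len(ann):
        raise ValueError("annotation must have the same length as the sequence")
    return TabRecord(name, seq, ann, com)


record = parse_tab_line("demo\tATGTCTACATATGAAGGTATGTAA\t(EEEEEEEEEEEEEE)DIIIIIII\t\n")
print(record.name, len(record.seq))  # demo 24
```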
TAB format files containing information about the Intron/Exon structure can be generated
from GenBank files using the
FeatureExtract server.
GenBank files
The Virtual Ribosome has the option of working directly with GenBank files.
When a GenBank file is supplied, all CDS elements are extracted to a TAB file (see above)
containing Intron/Exon annotation prior to the translation. This is done by processing the
GenBank entry with the
FeatureExtract software using default parameters.
For greater control of the GenBank parsing process, please use the
FeatureExtract server directly and submit the resultant TAB files to the Virtual Ribosome.
Example input files
Alpha-globins in FASTA format
>AB001981_alpha-A_Pigeon
ATGGTGCTGTCTGCCAACGACAAGAGCAACGTGAAGGCCGTCTTCGGCAAAATCGGCGGC
CAGGCCGGTGACTTGGGTGGTGAAGCCCTGGAGAGGTTGTTCATCACCTACCCCCAGACC
AAGACCTACTTCCCCCACTTCGACCTGTCACATGGCTCCGCTCAGATCAAGGGGCACGGC
AAGAAGGTGGCGGAGGCACTGGTTGAGGCTGCCAACCACATCGATGACATCGCTGGTGCC
CTCTCCAAGCTGAGCGACCTCCACGCCCAAAAGCTCCGTGTGGACCCCGTCAACTTCAAA
CTGCTGGGTCACTGCTTCCTGGTGGTCGTGGCCGTCCACTTCCCCTCTCTCCTGACCCCG
GAGGTCCATGCTTCCCTGGACAAGTTCGTGTGTGCCGTGGGCACCGTCCTTACTGCCAAG
TACCGTTAA
>J00043_Alpha-i_Goat
ATGGTGCTGTCTGCCGCCGACAAGTCCAATGTCAAGGCCGCCTGGGGCAAGGTTGGCGGC
AACGCTGGAGCTTATGGCGCAGAGGCTCTGGAGAGGATGTTCCTGAGCTTCCCCACCACC
AAGACCTACTTCCCCCACTTCGACCTGAGCCACGGCTCGGCCCAGGTCAAGGGCCACGGC
GAGAAGGTGGCCGCCGCGCTGACCAAAGCGGTGGGCCACCTGGACGACCTGCCCGGTACT
CTGTCTGATCTGAGTGACCTGCACGCCCACAAGCTGCGTGTGGACCCGGTCAACTTTAAG
CTTCTGAGCCACTCCCTGCTGGTGACCCTGGCCTGCCACCTCCCCAATGATTTCACCCCC
GCGGTCCACGCCTCCCTGGACAAGTTCTTGGCCAACGTGAGCACCGTGCTGACCTCCAAA
TACCGTTAA
>AF098919_Embryonic_Alpha-pi_Chicken
ATGGCACTGACCCAAGCTGAGAAGGCTGCCGTGACCACCATCTGGGCAAAGGTGGCTACC
CAGATTGAGTCCATTGGGCTGGAATCACTGGAGAGGCTTTTTGCCAGCTATCCTCAGACG
AAAACCTACTTCCCTCACTTTGATGTCAGCCAAGGCTCAGTTCAGCTTCGTGGTCACGGC
TCCAAGGTCCTGAATGCCATTGGGGAAGCTGTGAAGAACATCGATGACATTAGAGGTGCT
TTGGCCAAACTCAGCGAGCTGCATGCTTACATCCTCAGGGTGGACCCAGTGAACTTCAAG
CTGCTTTCCCACTGTATCCTGTGCTCTGTGGCTGCCCGCTATCCCAGTGATTTCACCCCA
GAAGTTCATGCTGCGTGGGACAAGTTCCTGTCCAGCATTTCCTCTGTTCTGACTGAGAAA
TACAGATAA
The example above can be pasted directly into the text field in the main window of the Virtual Ribosome.
Here is a file with 11 Alpha-globins in FASTA format:
alpha-globins.fsa.
Alpha-globins in GenBank format
LOCUS GOTHBAII 1691 bp DNA linear MAM 27-APR-1993
DEFINITION Goat adult alpha-ii-globin gene, complete sequence.
ACCESSION J00044
VERSION J00044.1 GI:164125
KEYWORDS alpha-globin; globin.
SOURCE Capra hircus (goat)
ORGANISM Capra hircus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia;
Pecora; Bovidae; Caprinae; Capra.
REFERENCE 1 (bases 1 to 1691)
AUTHORS Schon,E.A., Wernke,S.M. and Lingrel,J.B.
TITLE Gene conversion of two functional goat alpha-globin genes preserves
only minimal flanking sequences
JOURNAL J. Biol. Chem. 257 (12), 6825-6835 (1982)
PUBMED 6282825
FEATURES Location/Qualifiers
source 1..1691
/organism="Capra hircus"
/mol_type="genomic DNA"
/db_xref="taxon:9925"
CDS join(745..839,941..1145,1250..1378)
/note="alpha-ii globin"
/codon_start=1
/protein_id="AAA30910.1"
/db_xref="GI:164126"
/translation="MVLSAADKSNVKAAWGKVGSNAGAYGAEALERMFLSFPTTKTYF
PHFDLSHGSAQVKGHGEKVAAALTKAVGHLDDLPGTLSDLSDLHAHKLRVDPVNFKLL
SHSLLVTLACHHPSDFTPAVHASLDKFLANVSTVLTSKYR"
exon <745..839
/note="alpha-ii globin"
exon 941..1145
exon 1250..>1378
/note="alpha-ii globin"
ORIGIN
1 ctgcaggaac cagcacctgg gagaagagac ttgaacccgg acttgaactc cttgcaaatt
61 gctgtaaccc gctctcagta tctgttcctt ccaagactgc cactcagttg cacccaaaaa
121 ctctctgcgg aaagaaagga agctcgaagc gccaaggctg aagaggaaca ggagggttgg
181 acgggggtgg ggaggaattc gcgattacat gtgaacggtg agccaagtgt gttgcgtcgg
241 gctgcctctg gcatggacta ggcgcactca gtcgcccgtt ccttcactga tactgcccaa
301 gtttaaaatg cccagagtgt gccaagctta ggtccggggt gggtagacgg gctgacttac
361 tcccttccgt tctcaagaca gctggggaac tcctgcagga tgcaggagcg ggcatctacc
421 cagctccaca atcccgcccc tgccacctgg cgcgaggcta ccacgtccgg ggaaggtgga
481 cgcagcgggc gggaagcaga cggtggaagc aagaaccccc ggtcagagtc caggtctggg
541 tgggtgaggg aagcacccat cgcccggccg ggcgcaggtc ggactccgcg cgccccctgc
601 ggtcctggtc cggccgcgca tgccgcgtgc cagccaatga gcgcagcgcg ggcgggcgtg
661 cacctggagc cgggcgcata aaggctcgcg cactcgcagc cccgcactct tctggttctg
721 acccagactc agagagaatc caccatggtg ctgtctgccg ccgacaagtc caatgtcaag
781 gccgcctggg gcaaggttgg cagcaacgct ggagcttatg gcgcagaggc tctggagagg
841 tgagcaccgc acccgccccg aggggaccgg gccgctcgcc gggcgcgtcc ttgtaccggg
901 cctctcggcc tgagcccggc tttcccgcct cttcacccag gatgttcctg agcttcccca
961 ccaccaagac ctacttcccc cacttcgacc tgagccacgg ctcggcccag gtcaagggcc
1021 acggcgagaa ggtggccgcc gcgctgacca aagcggtggg ccacctggac gacctgcccg
1081 gtactctgtc tgatctgagt gacctgcacg cccacaagct gcgtgtggac ccggtcaact
1141 ttaaggtgag ctcgcgggcc gggccgggac agacctgggc tagcggggca gagaatgccg
1201 cggcggcccc acccagcccc cgccccactg acgtcccctc tctcggcagc ttctgagcca
1261 ctccctgctg gtgaccctgg cctgccacca ccccagtgat ttcacccccg cggtccacgc
1321 ctccctggac aagttcttgg ccaacgtgag caccgtgctg acctccaaat accgttaagc
1381 tggagcctcg gccaccccta ccctggcctg gagcgccctt gcgctctgcg cactctcacc
1441 tcctgatctt tgaataaagt ctgagtgggc tgcagtgtct gtctgtagcc tcgggtctct
1501 gtgtccgcga accggcccag gttctcattg cctcggacca aggagctctc aggcagctag
1561 agagagaagg ggaaaactgg acggaggggt gggggtgcag cctgccccac tgccactacc
1621 tgggattctc tgggcagccc tcaccctcag cctggagtga tttctgagta tcttggccct
1681 tccctgaatt c
//
Here is a file with 11 Alpha-globins (9 GenBank entries) in a multi-GenBank file:
alpha-globins.gbk.
Alpha-globins in TAB format
Here is a file with 11 Alpha-globins in TAB format:
alpha-globins.tab. The file contains both DNA sequence
and annotation of the Intron/Exon structure. It was generated by parsing the GenBank entries listed below
using the
FeatureExtract server. In this file the naming of each entry has been selected to indicate both
the type of alpha-globin and the organism.
AB001981
X01831
J00923
J00043
J00044
X01086
X07053
AF098919
A reformatted human-readable view of the first entry in the TAB file looks like this:
Name: 'AB001981_alpha-D_Pigeon'
ATGCTGACCGACTCTGACAAGAAGCTGGTCCTGCAGGTGTGGGAGAAGGTGATCCGCCAC 59
(EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE 59
CCAGACTGTGGAGCCGAGGCCCTGGAGAGGTGCGGGCTGAGCTTGGGGAAACCATGGGCA 119
EEEEEEEEEEEEEEEEEEEEEEEEEEEE)DIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 119
AGGGGGGCGACTGGGTGGGAGCCCTACAGGGCTGCTGGGGGTTGTTCGGCTGGGGGTCAG 179
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 179
CACTGACCATCCCGCTCCCGCAGCTGTTCACCACCTACCCCCAGACCAAGACCTACTTCC 239
IIIIIIIIIIIIIIIIIIIIIA(EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE 239
CCCACTTCGACTTGCACCATGGCTCCGACCAGGTCCGCAACCACGGCAAGAAGGTGTTGG 299
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE 299
CCGCCTTGGGCAACGCTGTCAAGAGCCTGGGCAACCTCAGCCAAGCCCTGTCTGACCTCA 359
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE 359
GCGACCTGCATGCCTACAACCTGCGTGTCGACCCTGTCAACTTCAAGGCAGGCGGGGGAC 419
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE)DIIIIIIIIIIII 419
GGGGGTCAGGGGCCGGGGAGTTGGGGGCCAGGGACCTGGTTGGGGATCCGGGGCCATGCC 479
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 479
GGCGGTACTGAGCCCTGTTTTGCCTTGCAGCTGCTGGCGCAGTGCTTCCACGTGGTGCTG 539
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIA(EEEEEEEEEEEEEEEEEEEEEEEEEEEEE 539
GCCACACACCTGGGCAACGACTACACCCCGGAGGCACATGCTGCCTTCGACAAGTTCCTG 599
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE 599
TCGGCTGTGTGCACCGTGCTGGCCGAGAAGTACAGATAA 638
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE) 638
//
Notice: the sequence above CANNOT be pasted into the Virtual Ribosome, since it's NOT in TAB format
and only serves the purpose of illustrating the content of a TAB file.
Since the command-line programs behind both the FeatureExtract server and the
Virtual Ribosome are available as Open-Source packages for download, they can be combined
to form a strong tool-chain, as the following example describes.
gb2tab : The program behind the FeatureExtract server.
dna2pep : The program behind the Virtual Ribosome server.
Example: Translating the Yeast genome
All files for the Yeast genome have been downloaded in GenBank format:
genome[raz]:/home/people/raz/projects/genomes/YeastGenomeNov2005> ll *gbf
-rw------- 1 raz user 479K Nov 30 13:59 chr01.gbf
-rw------- 1 raz user 1.8M Nov 30 13:59 chr02.gbf
-rw------- 1 raz user 701K Nov 30 13:59 chr03.gbf
-rw------- 1 raz user 3.3M Nov 30 13:59 chr04.gbf
-rw------- 1 raz user 1.2M Nov 30 13:59 chr05.gbf
-rw------- 1 raz user 592K Nov 30 13:59 chr06.gbf
-rw------- 1 raz user 2.3M Nov 30 13:59 chr07.gbf
-rw------- 1 raz user 1.2M Nov 30 13:59 chr08.gbf
-rw------- 1 raz user 948K Nov 30 13:59 chr09.gbf
-rw------- 1 raz user 1.6M Nov 30 13:59 chr10.gbf
-rw------- 1 raz user 1.4M Nov 30 13:59 chr11.gbf
-rw------- 1 raz user 2.3M Nov 30 13:59 chr12.gbf
-rw------- 1 raz user 2.0M Nov 30 13:59 chr13.gbf
-rw------- 1 raz user 1.7M Nov 30 13:59 chr14.gbf
-rw------- 1 raz user 2.3M Nov 30 13:59 chr15.gbf
-rw------- 1 raz user 2.0M Nov 30 13:59 chr16.gbf
-rw------- 1 raz user 160K Nov 30 13:59 chrmt.gbf
Extract and translate the nuclear genes:
gb2tab chr{0,1}*gbf | dna2pep > yeast_nuc.tab
Extract and translate the mitochondrial genes:
gb2tab chrmt.gbf | dna2pep -m 3 > yeast_mit.tab
Count number of lines = number of genes:
genome[raz]:/home/people/raz/projects/genomes/YeastGenomeNov2005> wc -l yeast_*.tab
19 yeast_mit.tab
5854 yeast_nuc.tab
5873 total
Find the mitochondrial proteins that originate from genes without introns:
genome[raz]:/home/people/raz/projects/genomes/YeastGenomeNov2005> egrep -v "2222+" yeast_mit.tab | cut -f 1
AI1
AAP1
ATP6
OLI1
VAR1
SCEI
COX2
COX3
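The same filter can be written in Python by looking at the annotation field only, rather than at the whole line. This is a sketch that assumes the third TAB field of the translated output holds the exon-number annotation shown earlier on this page; the function name is hypothetical.

```python
def intronless_names(tab_lines):
    """Return the names of entries whose annotation never reaches exon 2."""
    names = []
    for line in tab_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _pep, ann = fields[:3]
        # Any gene with at least one intron has residues annotated as exon 2.
        if "2" not in ann:
            names.append(name)
    return names


# Small made-up example; real input would be the lines of yeast_mit.tab.
example = ["geneA\tMK*\t111\t", "geneB\tMKV*\t1122\t"]
print(intronless_names(example))  # ['geneA']
```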