
RevTrans - Background

It is always preferable to align coding DNA in translated form


Why is it problematic to align the DNA sequences of protein-coding genes directly? First, if you align coding DNA at the DNA level, you are in effect ignoring your prior knowledge of the structure of the genetic code. Second, you are also ignoring the known evolutionary tendency of amino acids to be substituted by other amino acids with similar physico-chemical properties. An example should make this clear:

               Codon-aligned:                 DNA-aligned:

                M   L   L   I   G
               ATG CTG TTA ATA GGG            ATGCT-GTTAATAGGG
               ATG CTC GTT AAT GGG            ATGCTCGTTAAT-GGG
                M   L   V   N   G

In the context of the genetic code, it makes perfect sense to align CTG with CTC, since both encode the amino acid leucine. From a "DNA point of view", however, it looks better to insert a gap in the first sequence so that the terminal G of CTG aligns with the first G of the next codon (GTT) in the second sequence. That gap shifts the following nucleotides out of frame relative to each other until a second gap restores the frame. It is also reasonable to align the codons TTA (encoding leucine) and GTT (encoding valine), since the encoded amino acids have similar properties (both are hydrophobic).
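The example above can be checked with a short script. This is only a minimal sketch, not part of RevTrans: the helper names (translate, identical_columns, frame_consistent_columns) are made up for illustration, and the standard genetic code is assumed. It translates both sequences and then compares the two alignments column by column.

```python
from itertools import product

# Standard genetic code (assumed here), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an ungapped coding sequence, starting at its first base."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def identical_columns(row_a, row_b):
    """Count alignment columns in which both rows carry the same nucleotide."""
    return sum(1 for a, b in zip(row_a, row_b) if a == b and a != "-")


def codon_positions(gapped):
    """Per column: (codon index, position within codon), or None for a gap."""
    positions, n = [], 0
    for ch in gapped:
        if ch == "-":
            positions.append(None)
        else:
            positions.append(divmod(n, 3))
            n += 1
    return positions


def frame_consistent_columns(row_a, row_b):
    """Count columns that pair the same codon position of the same codon."""
    pairs = zip(codon_positions(row_a), codon_positions(row_b))
    return sum(1 for a, b in pairs if a is not None and a == b)


# The two sequences from the example, without gaps (= the codon alignment).
seq1 = "ATGCTGTTAATAGGG"
seq2 = "ATGCTCGTTAATGGG"

# The same sequences as aligned at the DNA level.
dna_row1 = "ATGCT-GTTAATAGGG"
dna_row2 = "ATGCTCGTTAAT-GGG"

print(translate(seq1), translate(seq2))             # MLLIG MLVNG
print(identical_columns(seq1, seq2))                 # 10 identical nucleotides
print(identical_columns(dna_row1, dna_row2))         # 14 identical nucleotides
print(frame_consistent_columns(seq1, seq2))          # 15 of 15 columns in frame
print(frame_consistent_columns(dna_row1, dna_row2))  # 8 of 16 columns in frame
```

The DNA-level alignment scores more identical nucleotides (14 versus 10), but only half of its columns pair nucleotides from the same codon position, which is the loss of information described above.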

Note: these observations also hold true for database searches. Always use a translated version of your coding sequence to search for similar genes!



