Biological definitions


Reading frame:
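It takes 3 consecutive bases in a DNA string to code for one amino acid. Each group of 3 bases is called a triplet or codon. It follows that a DNA strand can be read in 3 different reading frames, depending on whether reading starts at the first, second or third base. The example below shows one sequence split into codons in each of the 3 frames.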
AGGATGCGAGATCAAGACGACTACGACTCACACACGACTTACTAGAAATGCGC
1. AGG ATG CGA GAT CAA GAC GAC TAC GAC TCA CAC ACG ACT TAC TAG AAA TGC GC
2. A GGA TGC GAG ATC AAG ACG ACT ACG ACT CAC ACA CGA CTT ACT AGA AAT GCG C
3. AG GAT GCG AGA TCA AGA CGA CTA CGA CTC ACA CAC GAC TTA CTA GAA ATG CGC
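
The splitting shown above can be sketched in Python. This is only an illustration; the function name reading_frames is hypothetical and not part of any library. Each frame keeps the skipped leading bases as a separate first chunk, so the output is laid out like the listing above.

```python
def reading_frames(dna):
    """Return the 3 forward reading frames of dna as lists of chunks.

    For an offset of 1 or 2, the skipped leading bases form the first
    chunk. The last chunk may hold fewer than 3 bases.
    (Hypothetical helper, written for illustration only.)
    """
    frames = []
    for offset in range(3):
        # Leading bases that come before the first full codon, if any.
        chunks = [dna[:offset]] if offset else []
        # Consecutive triplets starting at the offset.
        chunks += [dna[i:i + 3] for i in range(offset, len(dna), 3)]
        frames.append(chunks)
    return frames


seq = "AGGATGCGAGATCAAGACGACTACGACTCACACACGACTTACTAGAAATGCGC"
for number, chunks in enumerate(reading_frames(seq), start=1):
    print(f"{number}. {' '.join(chunks)}")
```

The function only covers the given strand. Reading the complementary strand would need a reverse complement step first, which is left out here.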

Official documentation of SwissProt format:
http://arep.med.harvard.edu/labgc/jong/Fetch/SwissProtAll.html

Description of FASTA file format:
Every sequence starts with a header line, where the very first character is a > followed immediately by a sequence id that is unique, at least within the file. Optionally the id can be followed by whitespace and some descriptive text, but all of that text has to be on the header line. The lines following the header line hold the sequence, which can be a nucleotide or an amino acid sequence. A sequence line usually contains 60 units (or fewer if it is the last line), but there is no fixed limit on the line length. Whitespace in the sequence is allowed but ignored. See the example below:

>SequenceID One line of text describing the sequence
MFLRRAAVAPQRAPILRPAFVPHVLQRADSALSSAAAGPRPMALRPPHQALVGPPLPGPP
GPPMMLPPMARAPGPPLGSMAALRPPLEEPAAPRELGLGLGLGLKEKEEAVVAAAAGLEE
ASAAVAVGAGGAPAGPAVIGPSLPLALAMPLPEPEPLPLPLEVVRGLLPPLRIPELLSLR
PRPRPPRPEPPPGLMALEVPEPLGEDKKKGKPEKLKRCIRTAAG
>NewSequenceID One line of text describing the sequence
MAELKYISGFGNECSSEDPRCPGSLPEGQNNPQVCPYNLYAEQLSGSAFTCPRSTNKRSW
LYRILPSVSHKPFESIDEGHVTHNWDEVDPDPNQLRWKPFEIPKASQKKVDFVSGLHTLC
GAGDIKSNNGLAIHIFLCNTSMENRCFYNSDGDFLIVPQKGNLLIYTEFGKMLVQPNEIC
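
The rules described above can be sketched as a small reader in Python. This is a minimal illustration, not a reference implementation; the function name parse_fasta is hypothetical, and the sample input uses shortened versions of the two sequences above, with a space added to show that whitespace is ignored.

```python
def parse_fasta(lines):
    """Yield (seq_id, description, sequence) for each FASTA record.

    Hypothetical helper for illustration. Lines before the first
    header are skipped.
    """
    seq_id, description, parts = None, "", []
    for line in lines:
        line = line.rstrip()
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if seq_id is not None:
                yield seq_id, description, "".join(parts)
            # The id runs up to the first whitespace; the rest is free text.
            header = line[1:].split(None, 1)
            seq_id = header[0] if header else ""
            description = header[1] if len(header) > 1 else ""
            parts = []
        elif seq_id is not None:
            # Whitespace inside the sequence is allowed but ignored.
            parts.append("".join(line.split()))
    if seq_id is not None:
        yield seq_id, description, "".join(parts)


example = """>SequenceID One line of text describing the sequence
MFLRRAAVAP QRAPILRPAF
VPHVLQRADS
>NewSequenceID One line of text describing the sequence
MAELKYISGF
"""

records = list(parse_fasta(example.splitlines()))
for seq_id, description, sequence in records:
    print(seq_id, len(sequence), sequence)
```

The reader does not check that ids are unique or that the sequence contains only valid letters; those checks would have to be added separately.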

This page was last updated by Peter Wad Sackett, pws@cbs.dtu.dk