Exercise 3, Translation & UniProt
---------------------------------
Answers by: Francisco Roque and Nils Weinhold
(Some editing by Rasmus Wernersson)

VIRTUAL RIBOSOME SECTION.
------------------------

POINT 5.
How is a STOP codon displayed?
***

How is a START codon displayed?
>>> (strict, e.g. "ATG")
))) (alternative, e.g. "TTG")

Does a start codon always code for Methionine (M)?
Yes, but only if it is actually used as a start codon. ATG always codes for Methionine. TTG and CTG code for Met when used as a start codon; otherwise they code for Leucine. The strict start codon (ATG) is used in >98% of human transcripts.

POINT 6.
How did the translation succeed? Nothing is wrong with the DNA sequence. Can you come up with some good reasons for the result?
The translation did not succeed very well, as there are many STOPs in the protein sequence. The reason is that mitochondria use a different genetic code.

POINT 9.
If you have chosen the right translation table, the DNA sequence can be translated without any problems. Compare the two results and answer the following questions:

What is the difference in the use of STOP codons?
Yeast mitochondria do not use TGA as a STOP codon (there it encodes Trp); TAA (and TAG) remain STOP codons.

What is the difference in the use of START codons?
Standard code: ATG, TTG, CTG. Yeast mitochondrial code: ATG, ATA.

Are codons coding for completely new amino acids?
No, but a few codons have changed their meaning. For example, CTN (CTA, CTC, CTG, CTT) now encodes Thr instead of Leu.

By following the link to the explanation of the genetic codes (linked both in the exercise and on the VirtualRibosome homepage), the following summary of the differences can be found ("Code 3" is the Yeast Mitochondrial code):

Codon   Code 3      Standard
AUA     Met  M      Ile  I
CUU     Thr  T      Leu  L
CUC     Thr  T      Leu  L
CUA     Thr  T      Leu  L
CUG     Thr  T      Leu  L
UGA     Trp  W      Ter  *
CGA     absent      Arg  R
CGC     absent      Arg  R

POINT 11.
We have up to now assumed that the reading frame for the DNA sequence was known and that it always started at the first nucleotide.
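The comparison in POINTs 6 and 9 (same DNA, two translation tables, reading frame starting at the first nucleotide) can be sketched in a few lines of Python. This is a minimal illustration, not how VirtualRibosome is implemented: the 64-letter string is the standard code in TCAG codon order, and the yeast mitochondrial table is derived from it by applying only the reassignments listed in the POINT 9 summary (the "absent" CGA/CGC codons are simply left as Arg here, which is an assumption of the sketch). The names `translate`, `STANDARD` and `YEAST_MITO` are hypothetical.

```python
from itertools import product

# Codons ordered TTT, TTC, TTA, TTG, TCT, ... (first base varies slowest).
BASES = "TCAG"
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(codon): aa
            for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)}

# Yeast mitochondrial code ("Code 3"): the standard table plus the reassignments
# from the POINT 9 summary. CGA/CGC ("absent") are left as Arg in this sketch.
YEAST_MITO = dict(STANDARD)
YEAST_MITO.update({"ATA": "M", "TGA": "W",
                   "CTT": "T", "CTC": "T", "CTA": "T", "CTG": "T"})


def translate(dna, table, frame=1):
    """Translate in forward frame 1, 2 or 3. '*' = STOP, 'X' = codon not in table."""
    seq = dna.upper().replace("U", "T")  # accept RNA input as well
    return "".join(table.get(seq[i:i + 3], "X")
                   for i in range(frame - 1, len(seq) - 2, 3))


example = "ATGCTATGATAA"
print(translate(example, STANDARD))    # ML**
print(translate(example, YEAST_MITO))  # MTW*
```

The same DNA gives an early STOP under the standard table but reads through under Code 3, which is the effect seen in POINT 6.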
In the following, we shall examine how it is often possible to identify the most likely reading frame using computational translation tools. We shall use the sequence below, which is the complete mRNA sequence for a yeast gene (profilin). Use your biological knowledge to answer the following questions:

Yeast has introns in some genes; could this be a major problem in this case?
In this case it is not a problem, because we are already dealing with processed (spliced) mRNA.

Can an mRNA molecule contain more sequence than the gene in question?
Yes, it can contain untranslated regions at both ends, as well as the cap and the poly(A) tail.

POINT 12.
Six reading frames exist: 1, 2, 3 (on the positive strand, i.e. the sequence as you read it), and -1, -2, -3 (on the negative strand, i.e. the complementary DNA strand). Since we are working with an mRNA sequence, we do not need to consider the reading frames on the complementary strand. Why is this?
Because the mRNA is already a copy of one of the DNA strands, and it is translated in one direction only (5' to 3').

POINT 13.
Translate the mRNA sequence in the three positive reading frames (1, 2, 3). The easiest way to do this is to use a separate window for each translation, so that the different results can be compared.

What reading frame is most likely the right one?
3

Note also that the DNA sequence is shown identically in all three reading frames, whereas the protein sequence is shifted. Why is this?
Because each frame groups the nucleotides into different triplets (codons), so the residues are produced from different triplets.

POINT 15.
For the sake of illustration, we shall try to translate the sequence on the negative strand. Select reading frame -1, and redo the translation.

How does the DNA sequence look? In what direction shall it be read?
The displayed sequence is complementary to the input. It is read bottom-to-top.

In what direction shall the protein sequence be read? Try to compare to the protein sequence in FASTA format.
The protein sequence should be read from bottom right to top left.

POINT 16.
Now, let's try to do it all in one go.
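Doing it "all in one go" amounts to translating the three frames of the input and the three frames of its reverse complement, and the ORF finder used in POINT 18 then looks for stretches that start at ATG (shown as M) and run to the next STOP. The Python below is a minimal sketch of both steps under stated assumptions: the standard genetic code, frame -1 taken to start at the first base of the reverse complement (frame-numbering conventions differ between tools), and an ORF that runs off the end without a STOP still reported, since it is open. The function names and the `min_len` parameter are hypothetical, not VirtualRibosome options.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Complement each base, then reverse, giving the negative strand 5' -> 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna, offset):
    """Translate from a 0-based offset; 'X' marks codons not in the table."""
    return "".join(CODE.get(dna[i:i + 3], "X") for i in range(offset, len(dna) - 2, 3))


def six_frames(dna):
    """Frames 1..3 read the input; -1..-3 read its reverse complement."""
    fwd = dna.upper().replace("U", "T")
    rev = reverse_complement(fwd)
    frames = {f: translate(fwd, f - 1) for f in (1, 2, 3)}
    frames.update({-f: translate(rev, f - 1) for f in (1, 2, 3)})
    return frames


def strict_orfs(protein, min_len=1):
    """Strict ORFs in one frame: from the first M up to (not including) a STOP.
    The last segment may lack a STOP; it is still reported as an open frame."""
    orfs = []
    for segment in protein.split("*"):
        start = segment.find("M")
        if start != -1 and len(segment) - start >= min_len:
            orfs.append(segment[start:])
    return orfs


for frame, protein in six_frames("CCATGAAATTTTAGCC").items():
    print(frame, protein, strict_orfs(protein))
```

On a real mRNA such as the profilin sequence, one would pass a realistic `min_len` and compare frames; the frame carrying the longest M-initiated ORF is the natural candidate for the coding frame.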
Select All (6 reading frames) and translate the sequence again. How many DNA strings are displayed? Why is this?
2 DNA strings: one is the input DNA sequence in the 5' -> 3' direction, the other is the computed reverse-complementary DNA sequence (shown in the 3' -> 5' direction).

POINT 18.
We shall now use the built-in ORF finder with the most stringent criteria. Under ORF finder, select Start codon: strict (this forces the ORF to start at ATG), select "All (6 reading frames)" and translate the sequence again.

Does the result fit with what you found earlier?
Yes, frame 3.

Would it make any difference to the result if we had only a partial sequence where the last part of the sequence, with the STOP codon, is missing?
No. By definition an ORF is an OPEN reading frame, simply a reading frame that is not interrupted.

What would happen if the first 50 nucleotides (with the START codon) were missing?
One would find a different ORF, in frame -2. If we relaxed the criteria for the ORF finder to NOT require a start codon, we could still find the correct ORF.

UNIPROT SECTION.
---------------

POINT 20.
735. These are all the insulin hits present in both TrEMBL (the unreviewed database, annotated using computer-generated annotation and large-scale analysis) and the manually curated, non-redundant Swiss-Prot.

POINT 21.
478. These are the hits that were reviewed by an expert team of biologists. For each protein, the experimental or computer-predicted data go through a review process.

POINT 22.
INS_HUMAN (accession P01308), the 2nd hit. It is cleaved into the two insulin chains, A and B.

POINT 23.
By introducing further restrictions in the search query, we can narrow down the number of results. The syntax is a bit different from the exercise last week, but the same logic applies (AND, NOT, OR).
a) 358
b) 52

POINT 24.
26
"organism:human AND name:insulin AND reviewed:yes NOT name:insulin-like NOT name:insulin-receptor"

POINT 25.
18
"organism:human AND name:insulin AND reviewed:yes NOT name:insulin-like NOT name:insulin-receptor NOT name:substrate"

POINT 27.
31. Inspecting the literature for each entry might give us an idea of the quality of the underlying data.

POINT 28.
Outside the cell membrane (insulin is secreted), where it increases the permeability of the cell to monosaccharides, amino acids and fatty acids. This field gives the keywords assigned to the protein according to the specific cellular or extracellular component in which it is found.

POINT 33.
27,597. By combining the different search filters, we can easily locate any protein of our choice. The search box has an autocomplete feature that can help with selecting terms.

POINT 34.
1,976

POINT 35.
5,943

POINT 36.
700

POINT 37.
373

POINT 38.
6
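The queries in POINTs 24 and 25 follow one pattern: a set of required terms joined with AND, followed by terms to exclude with NOT. A tiny Python helper can assemble such strings for pasting into the search box. This is only a sketch: `build_query` is a hypothetical name, and the field names (`organism:`, `name:`, `reviewed:`) are the ones used in this exercise; other versions of the UniProt search interface may expect different field names.

```python
def build_query(required, excluded=()):
    """Join the required terms with AND and append each excluded term with NOT.

    Each extra AND or NOT term can only keep or shrink the set of hits, which is
    why the count drops from 26 (POINT 24) to 18 (POINT 25).
    """
    query = " AND ".join(required)
    for term in excluded:
        query += " NOT " + term
    return query


base = ["organism:human", "name:insulin", "reviewed:yes"]
point_24 = build_query(base, ["name:insulin-like", "name:insulin-receptor"])
point_25 = build_query(base, ["name:insulin-like", "name:insulin-receptor",
                              "name:substrate"])
print(point_25)
```

Keeping the exclusions in a list makes it easy to add one more NOT term and watch the number of hits shrink, as was done from POINT 24 to POINT 25.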