Some information about the exam, Tuesday, December 19
We noticed that some students spend more time thinking (worrying) about the exam
than on the actual course. This is clearly not an ideal situation, so let's
try to sort that out.
The exam will be a 4-hour written exam, without aids, graded on the 0-13 scale
of marks, and not unlike the exams held in previous years. To get an
impression of what it's like, take a look at the exam from 2004: part
I, part II
As for what you need to learn/read for the exam, the golden
rule is "if you are unsure, ask the lecturer; if you are still unsure, ask
me". As there is no textbook for
this course, the official "material" consists of the lectures
and the exercises. All other handouts, like slides and articles, are
additional bonuses to make it easier for you (which means you need to pay attention to the handouts and the articles
that the lecturer tells you about). To make it even easier, I
will ask the lecturers what is expected of you and collect it on this page.
Collected so far
- Alignments, Phylogeny and Database Searching, Anders Gorm Pedersen
Pairwise alignments:
==========================
The student must be able to compute the score of a given pairwise alignment (of DNA or protein sequences) if he/she is provided with a substitution matrix and a gap penalty scoring system. This implies understanding the following concepts:
(1) alignment
(2) substitution matrix
(3) affine gap penalties
Furthermore, the student should be able to explain:
(1) how the BLOSUM family of substitution matrices is constructed
(2) the biological meaning of gaps in an alignment
(3) what is meant by "best" alignment
Understanding the dynamic programming algorithm in detail is not required, but the student should have an intuitive grasp of its role in finding the best alignment (and why it is necessary).

Database searching:
==========================
Based on the output of a BLAST database search, the student should be able to identify which database hits are likely to be homologous to the query sequence, and to suggest a likely function of the query. This implies understanding:
(1) the role of pairwise alignment in database searching
(2) what an "E-value" is
(3) how database searching can be used to infer the function of uncharacterized proteins

Multiple alignment:
==========================
The student should be able to construct a multiple alignment using the ClustalX program. The student should be able to explain how progressive alignment works.

Phylogenetic reconstruction, distance-based methods:
====================================================
Based on a multiple alignment, the student should be able to construct the corresponding distance matrix. If the above-mentioned alignment is sufficiently small (say, fewer than 6 taxa) and there have been no cases of superimposed substitutions, then the student should furthermore be able to construct a phylogenetic tree based on the distance matrix.
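As a rough illustration of building a distance matrix from a multiple alignment, here is a minimal Python sketch. It assumes that "distance" means the observed proportion of differing sites between two aligned sequences, and that columns with a gap in either sequence are skipped. Both are assumptions made for this example, so check the lecture notes for the definition actually used in the course. The function names and the toy alignment are hypothetical and made up for illustration.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Observed proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # assumption: skip columns where either sequence has a gap
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def distance_matrix(alignment):
    """Return {(name_i, name_j): distance} for every pair of taxa."""
    names = list(alignment)
    matrix = {(name, name): 0.0 for name in names}
    for a, b in combinations(names, 2):
        d = p_distance(alignment[a], alignment[b])
        matrix[(a, b)] = d
        matrix[(b, a)] = d  # the matrix is symmetric
    return matrix


# Hypothetical toy alignment (made up for illustration, not course data).
toy_alignment = {
    "taxonA": "ACGTACGTAC",
    "taxonB": "ACGTACGTTC",
    "taxonC": "ACGAACGTTC",
    "taxonD": "TCGAACG-TC",
}

dist = distance_matrix(toy_alignment)
for row in toy_alignment:
    print(row, " ".join(f"{dist[(row, col)]:.3f}" for col in toy_alignment))
```

These are observed distances only. They underestimate the real distance whenever superimposed substitutions have occurred, which is why the requirement above is restricted to alignments without such cases.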
Furthermore, the student should be able to explain:
(1) the meaning of the words "node", "branch", "polytomy", and "monophyletic"
(2) the difference between rooted and unrooted trees
(3) how an outgroup can be used to root a tree
(4) how the "sum of squared differences" (Q) is computed for a given tree and corresponding distance matrix
(5) how the "least squares optimality criterion" is used to construct distance-based trees
(6) the influence of superimposed (multiple) substitutions on observed vs. real sequence distances

Phylogenetic reconstruction, maximum likelihood:
=================================================
Given an alignment, a tree topology, a set of nucleotide frequencies, and a set of nucleotide substitution probabilities (for each branch length occurring on the tree), the student should be able to compute the probability of one column in the alignment (the "likelihood" of the given parameter values). Given the log-likelihood and the number of free parameters for two models that have been fitted to the same data set (alignment), the student should be able to use a likelihood ratio test to select the best model. The student should be able to explain:
(1) the relationship between a scientific hypothesis and a mathematical model
(2) which parameters typically occur in a phylogenetic model
(3) the meaning of the word "likelihood" when used about a given set of parameter values
(4) how maximum likelihood is used to estimate the parameter values of a given model
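To make the likelihood ratio test concrete, here is a minimal Python sketch. It assumes the two models are nested (the simpler one is a special case of the more complex one), so that twice the difference in log-likelihood can be compared with a chi-square distribution whose degrees of freedom equal the difference in the number of free parameters. The 5% significance level, the function name and the example numbers are assumptions chosen for illustration, not values from the course.

```python
# Upper 5% critical values of the chi-square distribution, indexed by
# degrees of freedom (standard table values).
CHI2_CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def likelihood_ratio_test(lnl_simple, k_simple, lnl_complex, k_complex):
    """Compare two nested models fitted to the same alignment.

    Returns (statistic, degrees_of_freedom, prefer_complex), where
    prefer_complex is True if the simpler model is rejected at the 5% level.
    """
    df = k_complex - k_simple
    if df <= 0:
        raise ValueError("the complex model must have more free parameters")
    if df not in CHI2_CRITICAL_05:
        raise ValueError("no critical value tabulated for this many degrees of freedom")
    # Test statistic: twice the gain in log-likelihood from the extra parameters.
    statistic = 2.0 * (lnl_complex - lnl_simple)
    prefer_complex = statistic > CHI2_CRITICAL_05[df]
    return statistic, df, prefer_complex


# Hypothetical numbers: one extra parameter improves lnL from -1234.5 to -1230.2.
stat, df, prefer_complex = likelihood_ratio_test(-1234.5, 1, -1230.2, 2)
print(f"statistic = {stat:.2f}, df = {df}, prefer complex model: {prefer_complex}")
```

With these made-up numbers the statistic is about 8.6 with 1 degree of freedom, which exceeds 3.841, so the simpler model would be rejected in favour of the more complex one.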
- Gene Expression and DNA array technology, Henrik Bjoern Nielsen
The students are expected to be able to explain the main steps in the microarray analysis pipeline. The questionnaire given during "Discussion" on Friday, September 29, can be used as a guideline for the level of the exam questions.
- Protein Modelling, Drug Discovery and Fold recognition, Ole Lund
The students are expected to know about:
- Primary, secondary, tertiary & quaternary protein structure
- Criteria used to define secondary structure
- Prediction of secondary structure
- Usage of secondary structure
- Steps involved in protein homology modeling
- Which methods are used to define secondary structures
- Levels of fold classes
- Validation of homology models
- Limitations & pros/cons in Protein Modelling, Drug Discovery and Fold recognition
- Datadriven Predictions and gene finding with Hidden Markov Models and Neural Networks
Eukaryotic gene finding: Nikolaj Blom
The students are expected to know about:
- The general structure of eukaryotic genes and approximate sizes of exons and introns
- At least two features used by gene prediction programs
- Why Hidden Markov Models (HMMs) are well suited for gene prediction
- How to use basic functions of the HMMgene and Genscan software and interpret the output
Neural Networks: S. Brunak
Lectures and slides
- Systems Biology and Comparative Genomics
Systems Biology: S. Brunak
Lectures and slides
Comparative Genomics: Dave Ussery
Despite rumors about the "Ten years of bacterial genome sequencing" paper, this article is not on the reading list and will not be a requirement for the exam.

Lecture #1 "20 Methods to Compare Microbial Genomes". By the end of the lecture, the student should be able to describe various approaches to comparing microbial genomes, and to discuss the relative strengths and weaknesses of the methods. Also, the student should be able to answer the following questions:
- Why is it important to use more than one method to compare genomes?
- What is "pangenomics" and why is this important?
- What is "metagenomics"?
- What are some of the best methods to compare two bacterial genomes of closely related species?
- What methods would be useful for comparing about 20 bacterial genomes?
- What methods would be best for comparing a hundred sequenced genomes?
- Which methods are best suited for comparing a thousand bacterial genomes?
- How many bacterial genomes (roughly) are currently publicly available?
- How many are there likely to be by 2010?
- Why is there this explosion in genomic sequences?

Lecture #2 "Global Regulation of Gene Expression in Microbial Genomes".
- There are (at least) four different levels of regulation. What are they, and what is the relationship between the number of genes regulated, their specificity of regulation, and the number of molecules of regulators involved?
- What roles could non-coding RNA play in regulation, and what mechanisms might be involved?
- What is it that is being regulated in bacterial genomes, and how is it being regulated?