Events News Research CBS CBS Publications Bioinformatics
Staff Contact About Internal CBS CBS Other

Algorithms in Bioinformatics

Information for participants

GENERAL SCHEDULE
The course will in a two weeks period from Monday 28th of February - Friday 11th of March, each day from 10-17. Lectures will be in the morning from 10.00 - 12.00, and practical exercises in the afternoon from 13.00 - 17.00.

Mornings will consist of lectures and small practical exercises introducing the different algorithms, and afternoons will consist of programming exercises where the algorithms will be implemented.

The main programming language will be C, and all program templates provided in the course will be written in C. Prior knowledge of C programming is NOT required. However, basic programming skills are required to follow the course.

LITERATURE: Compendium of review papers and selected chapters from Immunological Bioinformatics, Lund et al., MIT Press, 2005. The compendium will be handed out on the first day of the course. Other material will be made available online during the course.


Course Programme
Please note that the programme is updated on a regular basis - click the 'refresh' button once in a while to make sure that you have the most updated information


O Monday 28 of February: Course Introduction.

Introduction to course, Immune system overview, UNIX crash course 101
The over all structure of the course is described, and a short overview of the immune system is given. Next an introduction to the UNIX operating system including an exercise covering the most essential UNIX commands is given, as well as a short exercise introducing the c-code template of the course.
Morten Nielsen


  • BACKGROUND TEXTS
    • C notes by Tom Macke. This PDF document is password protected, you get the password in the first course lecture.
    10.00 - 10.20
    Introduction to course
    Introduction to course. [PDF] .
    10.20 - 11.20
    Immune system overview
    Immune system overview. [PDF] .
    11.20 - 12.30
    Unix crash course
    A UNIX/Linux crash course
    12.30 - 13.30
    Lunch break
    13.30 - 17.00
    C-programming crash course 101
    The development of the first c-program working using command line parsing and linked lists.
    C routine, linked listed and commandline parsing
    Details on C routines, linked lists
    Some notes on command line parsing. [PDF].
    Doing it on your local machine
    Introduction to exercise. Sequence alignment refresher
    Some notes on Sequence alignment. [PDF].
    The exercise
    C-programming crash course
    Answers to C-programming exercise


    O Tuesday 1 March: Weight matrix construction.
    Weight matrix (PSSM) construction, Gibbs sampling, and Blosum matrices
    The concept of sequence information and motif are introduced, and the theoretical background including sequence weighting and pseudo counts for weight matrix construction are described. The theoretical background for Blosum matrices is described.
    Morten Nielsen

  • BACKGROUND TEXTS
    • Immunological Bioinformatics. MIT Press. Chapter 4.
    10.00 - 11.15
    Weight matrix construction. [PDF].
    Handout. Estimation of pseudo counts. PLEASE PRINT AND BRING TO THE LECTURE
    NOTE. PLEASE BRING A CALCULATOR OR CELL PHONE WITH SOME SIMPLE MATH FUNCTIONS TO THE LECTURE SINCE YOU WILL BE ASKED TO DO SOME SIMPLE MATH CALCULATIONS.
    Answer
    11.15 - 11.35
    Gibbs sampling. [PDF] .
    11.35 - 12.00
    Blosum scoring matrices [PDF] .
    12.00 - 13.00
    Lunch break
    13.00 - 17.00
    Implementation of PSSM construction from pre-aligned sequences including pseudo count correction for low counts and sequence clustering
    PSSM construction and evaluation
    PSSM answers


    O Wednesday 2 March. Sequence alignment and data redundancy reduction algorithms..
    Sequence alignment and Dynamic programming
    A detailed introduction to dynamic programming and sequence alignment is given. In particular it is demonstrated how and why the use of sequence profiles can improve the alignment accuracy significantly. Next, two commonly use algorithms for sequence redundancy reduction, Hobohm 1 and Hobohm 2, are described.
    Morten Nielsen

    BACKGROUND TEXTS
    • Immunological Bioinformatics. MIT Press. Chapter 3.
    • A General method applicable to search for similarities in the amino acid sequences of two proteins. S. B. Needleman and C. D Wunsch J.Mol. Biol (1970), 48.
    • Identification of common molecular subsequences. T. F. Smith and M. S. Waterman. J. Mol. Biol. (1981), 147.
    • An improved algorithm for matching biological sequences. O. Gotoh. J. Mol. Biol. (1982), 162
    • Selection of representative protein data sets. U. Hobohm et al. Protein Science. 1992.
    • Protein distance constraints predicted by neural networks and probability density functions. O. Lund et al. Protein Engineering. (1997).
    10.00 - 11.00
    Sequence alignment [PDF] .
    Handout 1. PLEASE PRINT AND BRING TO CLASS
    Handout 2. PLEASE PRINT AND BRING TO CLASS
    Handout answers
    A few words on sequence profiles [PDF] .
    11.15- 12.00
    Hobohm algorithms
    Data redundancy reduction algorithms (Hobohm1 and Hobohm2). [ PDF].
    12.00 - 13.00
    Lunch break
    13.00 - 17.00
    Implementation of the Smith-Waterman Dynamic programming and the two hobohm algortihms
    Matrix dumps from alignment programs (to be used for debugging)
    Answers to sequence alignment exercise
    Hobohm redundancy reduction algorithms
    Answers to Hobohm programming exercise


    O Thursday 3 March. Hidden Markov Models.
    Hidden Markov Models
    The concepts of hidden Markov Models are introduced. It is described how to construct a hidden Markov model from data, and the Viterbi and posterior decoding algorithms are described in detail. Next it is demonstrated how HMM can be used to predict features of un-characterized sequences.
    Morten Nielsen

  • BACKGROUND TEXTS
    • Immunological Bioinformatics. MIT Press. Chapter 4.
    • Notes by Anders Krog in Compendium.
    10.00 - 10.45
    Hidden Markov models introduction
    HMM slides [ PDF].
    10.45 - 11.15
    Viterbi decoding
    Viterbi Handout PLEASE PRINT AND BRING TO CLASS
    Answers
    NOTE. PLEASE BRING A CALCULATOR OR CELL PHONE WITH SOME SIMPLE MATH FUNCTIONS TO THE LECTURE SINCE YOU WILL BE ASKED TO DO SOME SIMPLE MATH CALCULATIONS.
    11.15 - 12.00
    Forward/Backward algorithm, Posterior decoding, Baum-Welsh learning
    Forward Handout PLEASE PRINT AND BRING TO CLASS
    Answers
    12.00 - 12.15
    Profile Hidden Markov Models.
    12.15 - 13.15
    Lunch break
    13.15 - 17.00
    Develop program implementing the Viterbi and posterior decoding algorithms for HMM's.
    Hidden Markov exercises
    Answers to hidden Markov model exercise

    O Friday 4 March. Artificial neural networks..
    Artificial neural networks
    Artificial neural networks (ANN) are introduced. It is described how biological data can be encoded to optimally train ANN, and the feed-forward algorithm is introduced demonstrating how ANN can be used to predict properties for biological data. Next, the back-propagation algorithm used for ANN training is introduced, and the problem of method over-fitting is illustrated. It is shown how cross-validation can deal with this problem, and strategies for neural network training are shown.
    Morten Nielsen

  • BACKGROUND TEXTS
    • Immunological Bioinformatics. MIT Press. Chapter 4.
    10.00 - 11.45
    Artificial neural networks. [PDF] .
    • Background
    • Feed forward algorithm
    • Network training, back-propagation, gradient decent
    • Sequence encoding
    • Overfitting and cross-validation
    Handout (forward) PLEASE PRINT AND BRING TO CLASS
    Handout (forward) answers
    Handout (back-propagation) PLEASE PRINT AND BRING TO CLASS
    Handout (back-propagation) answers
    NOTE. PLEASE BRING A CALCULATOR OR CELL PHONE WITH SOME SIMPLE MATH FUNCTIONS TO THE LECTURE SINCE YOU WILL BE ASKED TO DO SOME SIMPLE MATH CALCULATIONS.
    11.45 - 12.00
    Description of project format, project suggestions and formation of groups
    Project description. [PDF] .
    12.00 - 13.00
    Lunch Break
    13.00 - 17.00
    Implementation of sequence encoding and feed forward algorithm
    Network part I answers
    Implementation of Implementation of back-propagation and neural network training
    Network part 2 answers


    O Monday 7 March - Thursday 10 March. Project work
    Project work
    In the project period, the student will apply one or more of the algorithms introduced in the course to gain knowledge about a biological problem of interest. This can be in form of development of prediction methods, or application of all ready developed prediction software to gain novel biological insights. The project should be documented as a report written in form of a research paper.

    The project must be submitted by email to me by Thursday 10 of March at 13.00.


    O Friday 11 March

    Exam. 10.00 - 17.00

    Remember that the exam consists first of a 15 minutes presentation of the project by both members of each group followed by 10-15 minutes individual examination in the course curriculum. In the examination I will not ask specific question to the c-code, but ask questions on the algorithms (how they work, what they do etc). I will strongly recommend that you take a careful look at all the handouts we have used in the course in order to prepare yourself for the exam.
    Program for exam in Algorithms in Bioinformatics
    10.00 - 10.50 Group 1, Baum- Welsh,
    Pablo Gauna
    Paula Beati
    Eugenio Costa
    Ezequiel Bos
    10.55 - 11.25 Group 2, MHC binding using PSSM
    Adrian Tamburri
    Cecilia Sanchez
    11.30 - 12.10 Group 3, Fake versus True CV
    Paula Magarinos
    Santiago Carmona
    Ricci Alejandro
    12.15 - 12.55 Group 4 Baum-Welsh
    Federico Pousa
    Facundo Carrillo
    Felipe Schargorosdky
    12.55 - 13.45 Lunch break
    13.45 - 14.35 Group 5, MHC binding using PSSM
    Esteban Lanzarotti
    Daniel Alvarez
    Omar Arganaras
    14.40 - 15.10 Group 6, Fake versus True CV
    Christina Cossio Mercado
    Miguel Martinez Soler
    15.15 - 16.15 Group 7, PSSM versus ANN
    Mariano Semelman
    Thomas Fischer
    Kevin Allekotte
    16.20 - 17.00 Group 8, PSSM - DNA
    Gonzalo Parra
    Franco Simonetti
    Agustin Izquierdo
    17.00 - 17.20 Group 9, NN
    Michael Jenik



    Go to