Each student will do one of the projects below during this course. The projects are done on an individual basis and count for 25% of the final grade. The project must be handed in through CampusNet (Assignments for the course) no later than 15.00 on the last day of the course (lesson 13). Students may consult the teacher or each other about problems in their projects; however, actual cooperation between students with the same project is not allowed.
The teacher aims to spread the students evenly over the projects.
If a student has an idea for a project, perhaps something the student must do in connection with another course, the student should talk to the teacher. Perhaps something can be worked out.
The time allotted for a project is approximately half-time for the rest of the course, which translates into roughly 50 hours.

Every project consists of the following parts.

  1. The program code itself. The code should be well commented so that it is possible to follow your thinking. The major data structures should also be explained (structure and purpose). The code must be handed in as a plain text file.
  2. A document that describes the algorithm you implemented, with its strengths and weaknesses if any, the expected input data format, and an analysis of the runtime of the algorithm. Runtime and Big O have been explained in the course. The report should preferably be a PDF or Word document. It is considered unlikely that a report will be shorter than 4 pages, but there is no set limit. The report is evaluated by quality, not by length. Some projects are naturally heavy in theory, others have a more practical approach, and the report is expected to reflect that to some extent.
  3. Any data files if relevant.
  4. Last but not least, a signed version of this statement must be included. If you have no other option, you can print it, sign it, and scan it at CBS.

The projects

  1. Random sequence generator
  2. Text mining MEDLINE abstracts
  3. Shortest path in graph
  4. Perl beautifier
  5. K-means clustering
  6. Analysis of sorting
  7. Data mining in NCBI databases
  8. Score sequence data with a PSSM
  9. Searching for signals/motifs in sequences
  10. Data analysis
  11. Sudoku
  12. k-nearest neighbor (k-NN) continuous variable estimation
  13. Read trimmer for Next-Generation-Sequencing data
  14. QT clustering
  15. Artificial Neural Network
  16. Smith-Waterman alignment
  17. Gibbs sampling
Own projects:

Not yet assigned: