Q1. Illumina
Q2. Around 84
Q3. N = (M*L)/(L-K+1) = (84*99)/(99-15+1) = 97.84
    Genome_size = T/N = (213595090+212449993)/97.84 = 4.35 Mb
Q4. Mean = 259; SD = 11
Q5. It is lower. This suggests that the actual k-mer peak is higher than the one we found (unless you found one higher than 84), since a higher peak gives a lower genome size estimate.
Q6. Only 10 of 195 contigs were joined into scaffolds. This is quite few; normally the number is much higher. One reason could be that our insert size is quite low (~250 bp) and the repeats in the genome are larger than this.
Q7. Repeat regions
Q8. Contamination
Q9. Properly paired reads: 3814977
Q10. The properly paired flag is 2
Q11. Not a very large improvement, probably because we are already at the limit in terms of repeats. To improve it we need longer reads (454, Ion Torrent, PacBio) or paired-end/mate-pair libraries with larger insert sizes.
Q12. This is of course just a visual assessment, but most of the reference genome appears to be covered by our assembly, so yes.
Q13. Yes, a couple of the small contigs do not map at all, and C1097 only maps partially. This could be sequence that is present in our strain but not in the reference genome.
Q14. This is a region with a lot of repeats, which is also why we cannot really assemble it. It is used by V. cholerae to integrate new genes into its genome.
Q15. The 454 assembly was best.
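
The calculation in Q3 can be reproduced with a short script. This is only a sketch: the function names are made up for illustration, and it assumes that T is the total number of sequenced bases (here the sum of the two numbers given in Q3), M is the k-mer coverage peak, L is the read length and K is the k-mer size.

```python
def base_coverage(kmer_peak, read_len, k):
    """Convert the k-mer coverage peak M into base coverage N = (M*L)/(L-K+1)."""
    return kmer_peak * read_len / (read_len - k + 1)


def estimate_genome_size(total_bases, kmer_peak, read_len, k):
    """Estimate genome size as T/N, with N computed from the k-mer peak."""
    return total_bases / base_coverage(kmer_peak, read_len, k)


# Values taken from Q2 and Q3 (T is assumed to be the sum of both totals)
total_bases = 213595090 + 212449993
coverage = base_coverage(84, 99, 15)
genome_size = estimate_genome_size(total_bases, 84, 99, 15)

print(f"N = {coverage:.2f}")                        # N = 97.84
print(f"Genome size = {genome_size / 1e6:.2f} Mb")  # Genome size = 4.35 Mb
```

Calling estimate_genome_size with a higher peak (for example 90 instead of 84) returns a smaller value, which is the relationship used in the answer to Q5.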