Ludovic Orlando - Lorlando@snm.ku.dk - +45 353 21 231 AurŽlien Ginolhac - AGinolhac@snm.ku.dk - +45 353 21 499 H‡kon J—nsson - jonsson.hakon@gmail.com Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen - http://geogenetics.ku.dk/research/research_groups/palaeomix_group/ References. - Bos, KI et al. A draft genome of Yersinia pestis from victims of the Black Death. 2011. Nature 478:506-510. - Schuenemann, VJ et al. Targeted enrichment of ancient pathogens yielding the chr plasmid of Yersinia pestis from victims of the Black Death. 2011. PNAS 108:E746-752. - Schubert, M et al. Improving ancient DNA read mapping against modern reference genomes. 2012. BMC genomics 13:178. - J—nsson, H et al. mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters. 2013. Bioinformatics doi: 10.1093/bioinformatics/btt193. - Li, H, and Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. 2009. Bioinformatics 26:589-595. - Li, H et al. The sequence alignment/map format and SAMtools. 2009. Bioinformatics 25:2078-2079. Outline. 1. Extract data from SRA 2. Check data quality 3. Adapter trimming 4. Check data quality 5. Read mapping/filtering 6. Determine ancient DNA damage patterns 7. Visualizing alignment 8. Recovering sequence variants ###### COMMAND LINES TO PASTE START HERE ############################################### echo "setup shell, MANDATORY STEP!" bash export PATH='/home/local1/27626/bin:'$PATH # Run the SRA conversion tool (see options for SE/PE reads) # extract reads from single-end run fastq-dump --gzip /home/people/ludovic/HandsOnDTUaDNA/SRR341963.sra # Extract both reads from a pair and rename files for practical purpose fastq-dump --gzip --split-3 /home/people/ludovic/HandsOnDTUaDNA/SRR069216.sra rename SRR069216 modern SRR069216* # Modify one of the fastQ file (due to wrong annotation in SRA) seqtk trimfq -e 75 SRR341963.fastq.gz | gzip > ancient_1.fastq.gz seqtk trimfq -b 75 SRR341963.fastq.gz | gzip > ancient_2.fastq.gz # Check the validity of the fastQ file # Display first read (4 lines: name, sequence,+, quality scores) for each file for file in *fastq.gz; do echo $file;gzip -cd $file | head -4; echo ""; done zgrep -c # Count how many reads per file zgrep -c "^@SRR" *.fastq.gz # Run FastQC on the different fastQ files fastqc ancient_1.fastq.gz fastqc ancient_2.fastq.gz fastqc modern_1.fastq.gz fastqc modern_2.fastq.gz # Visualize the output (or open a web-browser, File, Open, Select File) firefox ancient_1_fastqc/fastqc_report.html firefox ancient_2_fastqc/fastqc_report.html firefox modern_1_fastqc/fastqc_report.html firefox modern_2_fastqc/fastqc_report.html # Display options for AdapterRemoval /home/people/ludovic/AdapterRemoval-1.5/AdapterRemoval --help # Trim! /home/people/ludovic/AdapterRemoval-1.5/AdapterRemoval \ --file1 <(gzip -cd ancient_1.fastq.gz) \ --file2 <(gzip -cd ancient_2.fastq.gz) \ --trimns \ --trimqualities \ --minlength 25 \ --qualitybase 33 \ --collapse \ --mm 3 \ --settings ancient.settings \ --discarded >(gzip > ancient.discard.fastq.gz) \ --outputcollapsed >(gzip > ancient.collapsed.fastq.gz) \ --outputcollapsedtruncated >(gzip > ancient.collapsed.trunc.fastq.gz) \ --singleton >(gzip > ancient.singleton.trunc.fastq.gz) \ --output1 >(gzip > ancient.pair1.fastq.gz) \ --output2 >(gzip > ancient.pair2.fastq.gz) # Display one case of trimming /home/people/ludovic/HandsOnDTUaDNA/seaview /home/people/ludovic/HandsOnDTUaDNA/adapter_align.fasta # Display summary for the trimming cat ancient.settings # AdapterRemoval --minquality for modern data # must adapt the minimal quality to streches of Phred score 4 /home/people/ludovic/AdapterRemoval-1.5/AdapterRemoval \ --file1 <(gzip -cd modern_1.fastq.gz) \ --file2 <(gzip -cd modern_2.fastq.gz) \ --trimns \ --trimqualities \ --minlength 25 \ --qualitybase 33 \ --collapse \ --mm 3 \ --settings modern.settings \ --discarded >(gzip > modern.discard.fastq.gz) \ --outputcollapsed >(gzip > modern.collapsed.fastq.gz) \ --outputcollapsedtruncated >(gzip > modern.collapsed.trunc.fastq.gz) \ --singleton >(gzip > modern.singleton.trunc.fastq.gz) \ --output1 >(gzip > modern.pair1.fastq.gz) \ --output2 >(gzip > modern.pair2.fastq.gz) \ --minquality 4 cat modern.settings # Checking trimmed reads quality fastqc ancient.collapsed.fastq.gz fastqc modern.pair1.fastq.gz fastqc modern.pair2.fastq.gz # display results firefox ancient.collapsed_fastqc/fastqc_report.html firefox modern.pair1_fastqc/fastqc_report.html firefox modern.pair2_fastqc/fastqc_report.html # Read mapping against a reference genome # Copy the genome to your current directory cp /home/people/ludovic/HandsOnDTUaDNA/Yersinia_pestisCO92chr.fasta . bwa index Yersinia_pestisCO92chr.fasta -a is # Map, convert to genomic coordinates, remove unmapped reads and sort by reference coordinates bwa aln -l 1024 Yersinia_pestisCO92chr.fasta ancient.collapsed.fastq.gz | \ bwa samse Yersinia_pestisCO92chr.fasta - ancient.collapsed.fastq.gz | \ samtools view -bSh -q 30 -F0x4 - | \ samtools sort - ancient.collapsed.fastq.chr.sort # Display the bam file, reads aligned to the reference, observe PCR duplicates samtools view ancient.collapsed.fastq.chr.sort.bam | less -S # Remove duplicates java -jar /home/local1/27626/bin/MarkDuplicates.jar I=ancient.collapsed.fastq.chr.sort.bam O=ancient.collapsed.fastq.chr.Mkdup.bam AS=TRUE M=/dev/null REMOVE_DUPLICATES=TRUE samtools view ancient.collapsed.fastq.chr.Mkdup.bam | less -S # Rough estimate of coverage: number of bases of aligned reads divided by length of the reference samtools view ancient.collapsed.fastq.chr.Mkdup.bam | \ awk '{sum+=length($10)}END{print sum" bases, "NR" reads"}' seqtk comp Yersinia_pestisCO92chr.fasta # Repeat for the modern data bwa aln Yersinia_pestisCO92chr.fasta modern.pair1.fastq.gz > modern.pair1.sai bwa aln Yersinia_pestisCO92chr.fasta modern.pair2.fastq.gz > modern.pair2.sai bwa sampe Yersinia_pestisCO92chr.fasta \ modern.pair1.sai modern.pair2.sai \ modern.pair1.fastq.gz modern.pair2.fastq.gz |\ samtools view -bSh -q 30 -F0x4 - | \ samtools sort - modern.pairs.fastq.chr.sort # Display the bam file, reads aligned to the reference, observe PCR duplicates samtools view modern.pairs.fastq.chr.sort.bam | less -S # Remove duplicates java -jar /home/local1/27626/bin/MarkDuplicates.jar I=modern.pairs.fastq.chr.sort.bam O=modern.pairs.fastq.chr.Mkdup.bam AS=TRUE M=/dev/null REMOVE_DUPLICATES=TRUE # Rough estimate of coverage: number of bases of aligned reads divided by length of the reference samtools view modern.pairs.fastq.chr.Mkdup.bam | \ awk '{sum+=length($10)}END{print sum" bases, "NR" reads"}' seqtk comp Yersinia_pestisCO92chr.fasta # Run mapDamage # Compare fragmentation and misincorporation patterns for the ancient and modern samples mapDamage -i modern.pairs.fastq.chr.Mkdup.bam \ -r Yersinia_pestisCO92chr.fasta --no-stats mapDamage -i ancient.collapsed.fastq.chr.Mkdup.bam \ -r Yersinia_pestisCO92chr.fasta --no-stats # Estimates of DNA damage patterns using the Neandertal genome mapDamage -i /home/people/ludovic/HandsOnDTUaDNA/SLMez1.hg18_chrM.bam \ -r /home/people/ludovic/HandsOnDTUaDNA/chrM.fa -v # display results okular results_SLMez1.hg18_chrM/Fragmisincorporation_plot.pdf okular results_SLMez1.hg18_chrM/Length_plot.pdf okular results_SLMez1.hg18_chrM/Stats_out_MCMC_hist.pdf less results_SLMez1.hg18_chrM/Stats_out_MCMC_iter_summ_stat.csv okular results_SLMez1.hg18_chrM/Stats_out_MCMC_post_pred.pdf okular results_SLMez1.hg18_chrM/Stats_out_MCMC_trace.pdf # Recovering sequence variants for Yersinia pestis Black plague # Index bam file samtools index ancient.collapsed.fastq.chr.Mkdup.bam # SNP calling: samtools mpileup samtools mpileup -f Yersinia_pestisCO92chr.fasta \ -B ancient.collapsed.fastq.chr.Mkdup.bam \ -u | \ bcftools view -cvg - | \ vcfutils.pl varFilter -d8 -Q30 -D 100 | \ awk '{if($6>=30){print $0}}' | \ egrep ",0:[0-9]+" | \ grep -v INDEL # Display SNPs samtools tview ancient.collapsed.fastq.chr.Mkdup.bam Yersinia_pestisCO92chr.fasta # and then, press g Goto: appears, enter chromosome and position such as: # NC_003143.1chromosome:74539 # NC_003143.1chromosome:286528 # NC_003143.1chromosome:1178178 # to be compared with false-positive calls if not all filters are on # NC_003143.1chromosome:3392897