Setup exercise

  1. Set environment and copy files required for the exercise
    umask 022
    cp -rL ~www/pub/CBS/courses/thaiworkshop08/m18 ~/
    cd ~/m18

BLASTatlas for Burkholderia

The previous BLAST atlas you constructed shows the homology between your own bacterium and those of the other groups, all of which belong to different bacterial genera. Now we will use the same method to compare multiple Burkholderia species.

We have prepared a set of files containing the proteomes of all the currently sequenced Burkholderia genomes. Burkholderia genomes consist of several chromosomes (two or three, depending on the species), and all replicons, including plasmids, have been merged into a single file for each strain.

  1. Examine the file list
    less ./Burkholderia_blastatlas.conf
  2. Download genome sequence and annotated proteins using web services
    perl BX571965 > BX571965.fsa
    perl BX571965 > BX571965.proteins.fsa
  3. Get genbank record and extract annotations
    getgene BX571965 | saco_convert -I genbank -O annotation  > BX571965.ann
    less BX571965.ann
  4. Running BLASTatlas web service
    perl -ref BX571965.fsa -t "B. pseudomallei K96243, chr. I" \
     -dnap "Intrinsic Curvature,Stacking Energy,Position Preference,Percent AT" \
     -proteins BX571965.proteins.fsa -ann BX571965.ann -blastcfg Burkholderia_blastatlas.conf > Burkholderia_blastatlas.pdf
    Open the PDF file: m18/Burkholderia_blastatlas.pdf
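
Before running the atlas step, it can be worth confirming that the downloads above actually produced sequence records. The following is a minimal sketch, assuming the .fsa files are plain FASTA (one ">" header line per record). The helper name count_fasta and the toy file are hypothetical and are not part of the course scripts.

```shell
#!/bin/sh
# Hypothetical helper: count FASTA records by counting header lines,
# i.e. lines that start with ">".
count_fasta() {
    grep -c '^>' "$1"
}

# Toy file standing in for a downloaded proteome such as
# BX571965.proteins.fsa (the sequences are made up for illustration).
cat > toy.proteins.fsa <<'EOF'
>protein_1
MKTAYIAKQR
>protein_2
MSLLTEVETY
EOF

# Print the number of records in the toy file.
count_fasta toy.proteins.fsa
```

In the exercise directory, the same helper could be pointed at BX571965.fsa or BX571965.proteins.fsa. A count of 0 would suggest that the download step did not return any sequence.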

P. marinus BLAST atlas

  1. Download genome sequence and annotated proteins using web services
    perl CP000111 > CP000111.fsa
    perl CP000111 > CP000111.proteins.fsa
  2. Get genbank record and extract annotations
    getgene CP000111 | saco_convert -I genbank -O annotation  > CP000111.ann
  3. Running BLASTatlas web service
    perl -ref CP000111.fsa -t "P. marinus str. MIT 9312" \
     -dnap "Intrinsic Curvature,Stacking Energy,Position Preference,Percent AT" \
     -proteins CP000111.proteins.fsa -ann CP000111.ann -blastcfg Pmarinus_blastatlas.conf > Pmarinus_blastatlas.pdf
    Open the PDF file: m18/Pmarinus_blastatlas.pdf

Core- and pan-genome for Burkholderia

We have prepared a script that accepts a list of proteomes (pr1, pr2, .. prN) and performs a series of BLAST searches. Each proteome in the input is searched against all the proteomes that precede it in the list. After these searches, the program derives the number of core and pan proteins at each step, in the order of the input list. The output is then passed to an R script, which plots the core and pan values as a function of the proteome number.

Just like the BLAST matrix script you tried yesterday, this script caches all the BLAST results. If you change the order of the input proteomes, all BLAST searches must be carried out again. Therefore, we have prepared two runs for you:
  1. less burkholderia.listA
    perl coregenome-1.2 < burkholderia.listA > data.dat
    less data.dat
    R --vanilla < coreplot.R
    gmake coreplotA.pdf
    Open the PDF file: m18/coreplotA.pdf
  2. less burkholderia.listB
    perl coregenome-1.2 < burkholderia.listB > data.dat
    less data.dat
    R --vanilla < coreplot.R
    gmake coreplotB.pdf
    Open the PDF file: m18/coreplotB.pdf
    QUESTION: What is the difference between the two input lists, and what is the difference between the two output plots?
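
The core/pan bookkeeping described in this section can be illustrated with a toy example. This is only a sketch under strong assumptions: it is not the coregenome-1.2 script, and gene family membership is read from a hand-made table instead of being derived from BLAST hits. The function name core_pan, the file toy_families.txt and the family names are all hypothetical.

```shell
#!/bin/sh
# Toy input: each line is "<proteome> <gene family IDs...>".
# In the real exercise, shared families come from BLAST searches,
# not from a table like this one.
cat > toy_families.txt <<'EOF'
pr1 famA famB famC
pr2 famA famB famD
pr3 famA famC famD famE
EOF

# For each proteome, in input order, print tab-separated:
#   index, proteome name, pan size, core size
core_pan() {
    awk '
    {
        # Families present in the current proteome (duplicates ignored).
        split("", seen)
        for (i = 2; i <= NF; i++) {
            if (!($i in seen)) { seen[$i] = 1; count[$i]++ }
        }
        # Pan: every family seen so far.
        # Core: families found in all NR proteomes read so far.
        npan = 0; ncore = 0
        for (f in count) {
            npan++
            if (count[f] == NR) ncore++
        }
        printf "%d\t%s\t%d\t%d\n", NR, $1, npan, ncore
    }' "$1"
}

core_pan toy_families.txt
```

For the toy table, the pan count rises 3, 4, 5 while the core count falls 3, 2, 1 as proteomes are added. Reordering the lines of toy_families.txt and rerunning core_pan is a cheap way to experiment before looking at the two prepared runs.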