Comparative Microbial Genomics - #27644

Computer Exercises - BLAST Atlases

Creating a BLAST Atlas (again)
Maintained by Peter F. Hallin, Tim T. Binnewies, and Christoph Champ

BACKGROUND: This week we will repeat the exercise from 2-Nov-2005, with a few changes. The purpose of this exercise is to help you create your own BLAST Atlas for your Final Report.

Please follow each of the steps listed below carefully and exactly as they are listed.

To make things easier, we are going to use five sample organisms in this exercise. Feel free to replace them with your own favourite organisms.

STEP#1a: Listed below are the five sample organisms we will use (we are listing the GenBank files here):

(1) Paeruginosa_PAO1_Main.gbk

(2) Pfluorescens_Pf-5_Main.gbk

(3) Pputida_KT2440_Main.gbk

(4) Psyringae_DC3000_Main.gbk

(5) Psyringae_phaseolicola1448A_Main.gbk


STEP#1b: Make sure you have the GenBank file of each organism in your current working directory (that is, today's exercise directory '23Nov2005'; type 'ls' or 'lt' to check). The commands below create the directory and link the files into it (NOTE: Do not replace 'mic00' with your number!):

> mkdir 23Nov2005
> cd 23Nov2005
> ln -s ../../mic00/23Nov2005/*.gbk .

STEP#2a: Now, we must decide which organism to use as our "reference genome". When creating a BLAST Atlas, you must use the largest genome as the reference in order to obtain meaningful results.


To find out which genome is the largest (i.e. has the most nucleotides), we must check the header of each GenBank file. To do this, issue the following command:


> head -1 *.gbk


(note: The command above tells the computer to display the first line of each GenBank file).


The largest genome will be the one with the most "bp" (basepairs). This is simply the number preceding the 'bp' in the output.
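
(note: With many GenBank files, comparing the numbers by eye gets tedious. The sketch below is one possible way to sort them automatically. It assumes the first line of each file is a standard LOCUS line in which the sequence length is the field immediately before 'bp'; the function name 'genome_sizes' is our own invention and is not part of the course scripts.)

```shell
# Print "<length> <filename>" for each GenBank file, largest genome first.
# Assumption: the length is the field right before "bp" on the first line.
genome_sizes() {
    for f in "$@"; do
        head -n 1 "$f" | awk -v name="$f" '{
            for (i = 2; i <= NF; i++)
                if ($i == "bp") { print $(i - 1), name; break }
        }'
    done | sort -k1,1nr
}
```

Typing 'genome_sizes *.gbk' lists all genomes by size; the first line of the output is the candidate reference genome.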


STEP#2b: From step #2a you should have found that P. fluorescens has the largest genome among the five organisms we are working with today (i.e. 7 074 893 bp). This will be our "reference genome". The remaining four organisms will be compared against the P. fluorescens genome.


STEP#3: Now, we must extract the sequenced nucleotides from each of our five GenBank files and convert them to amino acid sequences. We must also extract the gene annotation ('ann') coordinates from our reference genome.


This is accomplished by simply typing each of the following commands (note: You could also run all of them with one command by using 'gmake -j' instead):


> gmake Paeruginosa_PAO1_Main.proteins.fsa

> gmake Pfluorescens_Pf-5_Main.proteins.fsa

> gmake Pputida_KT2440_Main.proteins.fsa

> gmake Psyringae_DC3000_Main.proteins.fsa

> gmake Psyringae_phaseolicola1448A_Main.proteins.fsa


> gmake Pfluorescens_Pf-5_Main.ann
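
(note: If you work with your own organisms, you may prefer not to type every target by hand. The sketch below only derives the '.proteins.fsa' target names from the GenBank filenames; it assumes the same naming scheme as the commands above. The function name 'protein_targets' is hypothetical, and the make rules themselves are those provided for the course.)

```shell
# Print the .proteins.fsa make target for each GenBank file given.
protein_targets() {
    for f in "$@"; do
        # Strip the .gbk extension and append the target suffix.
        printf '%s\n' "${f%.gbk}.proteins.fsa"
    done
}
```

You could then build all protein files with 'gmake $(protein_targets *.gbk)'. The '.ann' file for the reference genome still has to be made separately.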


STEP#4: Next, we must establish a 'link' between our organisms (reference with comparisons). This 'link' is used by the make rules to figure out which sequence to BLAST against which. You must give the full path to your reference genome, including the filename but without the extension (note: You can find your current full path by typing 'pwd').


Since our reference genome is P. fluorescens, we will echo the full path to it (without the extension) by issuing the following commands (replace 'mic01' with your account):


> echo /home/people/mic01/23Nov2005/Pfluorescens_Pf-5_Main > Paeruginosa_PAO1_Main.proteins.queryblast
> echo /home/people/mic01/23Nov2005/Pfluorescens_Pf-5_Main > Pputida_KT2440_Main.proteins.queryblast
> echo /home/people/mic01/23Nov2005/Pfluorescens_Pf-5_Main > Psyringae_DC3000_Main.proteins.queryblast
> echo /home/people/mic01/23Nov2005/Pfluorescens_Pf-5_Main > Psyringae_phaseolicola1448A_Main.proteins.queryblast
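
(note: The four commands above differ only in the output filename, so they can be written as a loop. The sketch below is one way to do this, assuming you run it from the directory that holds the GenBank files. It uses 'pwd' to fill in the full path, and the function name 'write_queryblast' is hypothetical.)

```shell
# Write one .proteins.queryblast file per comparison genome.
# $1 is the reference GenBank file; the remaining arguments are all
# GenBank files (the reference itself is skipped).
write_queryblast() {
    ref=$1
    shift
    # Full path to the reference genome, without the .gbk extension.
    ref_path="$(pwd)/${ref%.gbk}"
    for f in "$@"; do
        [ "$f" = "$ref" ] && continue
        echo "$ref_path" > "${f%.gbk}.proteins.queryblast"
    done
}
```

For this exercise you would type 'write_queryblast Pfluorescens_Pf-5_Main.gbk *.gbk'. Check the result with 'cat *.queryblast'; every file should contain the same path.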


STEP#5a: Now, we must construct a template GeneWiz configuration file (*.cf) for our reference organism. First, log in to 'life' using your own password (remember to switch back to today's working directory: 23Nov2005).


> ssh -X life

> cd 23Nov2005

> gmake Pfluorescens_Pf-5_Main.genomeatlas.cf


STEP#5b: We must also modify this configuration file by issuing the following command (this will open a separate window):


> nedit Pfluorescens_Pf-5_Main.genomeatlas.cf


STEP#5c: Insert references to genome/proteome homology in this configuration file.


In the 'dat' lane specification, insert each of the following before all of the other 'dat' specifications (note that you can label each circle by changing the information contained in quotes):

> dat Paeruginosa_PAO1_Main.proteins.blastatlas.genomemap0.gz 1 0.0 0.0 0.0 "Paeruginosa_PAO1" boxfilter 10;

> dat Pputida_KT2440_Main.proteins.blastatlas.genomemap0.gz 1 0.0 0.0 0.0 "Pputida_KT2440" boxfilter 10;

> dat Psyringae_DC3000_Main.proteins.blastatlas.genomemap0.gz 1 0.0 0.0 0.0 "Psyringae_DC3000" boxfilter 10;

> dat Psyringae_phaseolicola1448A_Main.proteins.blastatlas.genomemap0.gz 1 0.0 0.0 0.0 "Psyringae_p.1448A" boxfilter 10;


In the 'circle' lane specification, insert each of the following:

> circle Paeruginosa_PAO1_Main.proteins.blastatlas.genomemap0.gz 1 "101010_000010.cm2" by 0.0 40.0;

> circle Pputida_KT2440_Main.proteins.blastatlas.genomemap0.gz 1 "101010_000010.cm2" by 0.0 40.0;

> circle Psyringae_DC3000_Main.proteins.blastatlas.genomemap0.gz 1 "101010_000010.cm2" by 0.0 40.0;

> circle Psyringae_phaseolicola1448A_Main.proteins.blastatlas.genomemap0.gz 1 "101010_000010.cm2" by 0.0 40.0;


In the 'file' lane specification, insert each of the following:

> file Paeruginosa_PAO1_Main.proteins.blastatlas.genomemap0.gz dat;

> file Pputida_KT2440_Main.proteins.blastatlas.genomemap0.gz dat;

> file Psyringae_DC3000_Main.proteins.blastatlas.genomemap0.gz dat;

> file Psyringae_phaseolicola1448A_Main.proteins.blastatlas.genomemap0.gz dat;
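
(note: The 'dat', 'circle' and 'file' lines follow a fixed pattern, so you can print them instead of typing them and then paste the output into the configuration file. The sketch below simply reproduces the lines shown above for any list of GenBank files. The function name 'cf_lines' is hypothetical, and the label is assumed to be the filename without '_Main.gbk'; you can still edit the labels by hand afterwards.)

```shell
# Print configuration lines of one kind (dat, circle or file)
# for each comparison genome given as a GenBank filename.
cf_lines() {
    kind=$1
    shift
    for f in "$@"; do
        base=${f%.gbk}
        map="$base.proteins.blastatlas.genomemap0.gz"
        case $kind in
            dat)    printf 'dat %s 1 0.0 0.0 0.0 "%s" boxfilter 10;\n' "$map" "${base%_Main}" ;;
            circle) printf 'circle %s 1 "101010_000010.cm2" by 0.0 40.0;\n' "$map" ;;
            file)   printf 'file %s dat;\n' "$map" ;;
        esac
    done
}
```

For example, 'cf_lines dat Pputida_KT2440_Main.gbk Psyringae_DC3000_Main.gbk' prints the two matching 'dat' lines. Remember to leave the reference genome out of the list.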


STEP#6: Now run the following script:


> ~pfh/scripts/genewiz/cf_data_create.pl Pfluorescens_Pf-5_Main.genomeatlas.cf


This script reads the .cf file and generates all files needed to create a BlastAtlas in GeneWiz.


STEP#7: As a last step, we need to create a .ps (PostScript) file by using the configuration file (.cf):


> ~pfh/scripts/genewiz/cf2pdf.pl Pfluorescens_Pf-5_Main.genomeatlas.cf



STEP#8: Finally, we can view our BlastAtlas plot (our results) by issuing the following command:


> ghostview Pfluorescens_Pf-5_Main.genomeatlas.nfp.ps



You can create your own BlastAtlases simply by replacing the above five organisms with your own. Just remember that all of the above steps are necessary for the process to work.







Course Organiser: David W. Ussery  Software questions: Christoph Champ