Creating a BLAST Atlas (again)
Maintained by Peter F. Hallin, Tim T. Binnewies, and Christoph Champ
BACKGROUND: This week we will repeat the exercise from 2-Nov-2005, doing things slightly differently. The purpose of this exercise is to help you create your own BLAST Atlas for your Final Report.
Please follow each of the steps listed
below carefully and exactly as they are listed.
To make things easier, we are going to
use five sample organisms in this exercise. Feel free to replace them
with your own favourite organisms.
STEP#1a: Listed below are the five sample organisms we will use (we are listing the GenBank files here):
(1) Paeruginosa_PAO1_Main.gbk
(2) Pfluorescens_Pf-5_Main.gbk
(3) Pputida_KT2440_Main.gbk
(4) Psyringae_DC3000_Main.gbk
(5) Psyringae_phaseolicola1448A_Main.gbk
STEP#1b: Make sure you have the GenBank file of each organism in your current working directory (that is, today's exercise directory '23Nov2005'; type 'ls' or 'lt' to check). The commands below create that directory and link the files into it (NOTE: Do not replace 'mic00' with your number!):
> mkdir 23Nov2005
> cd 23Nov2005
> ln -s ../../mic00/23Nov2005/*.gbk .
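A quick way to confirm that all five links resolve is sketched below. The helper name check_gbk_files is hypothetical (it is not one of the course scripts); it only tests that each named file exists and reports any that do not.

```shell
# Sketch: report which of the expected GenBank files are missing.
# check_gbk_files is a hypothetical helper, not part of the course scripts.
check_gbk_files() {
    missing=0
    for f in "$@"; do
        # -e is false for a dangling symbolic link, so broken links are caught too
        if [ ! -e "$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}

# Example call (run inside 23Nov2005):
# check_gbk_files Paeruginosa_PAO1_Main.gbk Pfluorescens_Pf-5_Main.gbk \
#     Pputida_KT2440_Main.gbk Psyringae_DC3000_Main.gbk \
#     Psyringae_phaseolicola1448A_Main.gbk
```

The function prints nothing and returns 0 when every file is present.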
STEP#2a: Now we must decide which organism to use as our "reference genome". When creating a BLAST Atlas, you must use the largest genome as the reference in order to obtain meaningful results.
To find out which genome is the largest (i.e. has the most nucleotides), we must check the header of each GenBank file. To do this, issue the following command:
> head -1 *.gbk
(note: The command above tells the computer to display the first line of each GenBank file).
The largest genome will be the one with the most "bp" (basepairs). This is simply the number preceding the 'bp' in the output.
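If you prefer not to compare the numbers by eye, the comparison can be sketched as a small shell function. This is only an illustration: list_genome_sizes is a hypothetical name, and it assumes the first line of each file is a LOCUS line in which the genome length is the field directly before the token 'bp'.

```shell
# Sketch: print "<bp> <file>" for each GenBank file, largest genome first.
# Assumption: line 1 is a LOCUS line with the length right before the token "bp".
list_genome_sizes() {
    for f in "$@"; do
        head -1 "$f" | awk -v file="$f" '{
            for (i = 2; i <= NF; i++)
                if ($i == "bp") { print $(i - 1), file; break }
        }'
    done | sort -k1,1nr
}

# Example call (run inside 23Nov2005):
# list_genome_sizes *.gbk
```

The first line of the output then names the candidate reference genome.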
STEP#2b: From step #2a you should have found that P. fluorescens has the largest genome among the five organisms we are working with today (i.e. 7 074 893 bp). This will be our "reference genome". The remaining four organisms will be compared against the P. fluorescens genome.
STEP#3: Now we must extract the nucleotide sequences from each of our five GenBank files and convert them to amino acid sequences. We must also extract the gene annotation ('ann') coordinates from our reference genome.
This is accomplished by typing each of the following commands (note: You could also run all of them with one command by using 'gmake -j' instead):
> gmake Paeruginosa_PAO1_Main.proteins.fsa
> gmake Pfluorescens_Pf-5_Main.proteins.fsa
> gmake Pputida_KT2440_Main.proteins.fsa
> gmake Psyringae_DC3000_Main.proteins.fsa
> gmake Psyringae_phaseolicola1448A_Main.proteins.fsa
> gmake Pfluorescens_Pf-5_Main.ann
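When you switch to your own organisms, the target names follow directly from the GenBank filenames. A minimal sketch of that mapping is shown here; protein_targets is a hypothetical helper, and the commented example assumes the course make rules accept several targets in one invocation, as make normally does.

```shell
# Sketch: print the protein target name for each GenBank file given.
# protein_targets is a hypothetical helper, not part of the course scripts.
protein_targets() {
    for gbk in "$@"; do
        # strip the .gbk extension and append the target suffix
        echo "${gbk%.gbk}.proteins.fsa"
    done
}

# Example (assumption: several targets may be passed to one gmake call):
# gmake $(protein_targets *.gbk) Pfluorescens_Pf-5_Main.ann
```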
STEP#4: Next, we must establish a 'link' between our organisms (the reference and the comparison genomes). This 'link' is used by the make rules to figure out which sequence to BLAST against which. You must give the full path to your reference genome and the reference genome filename without the extension (note: You can find your current full path by typing 'pwd').
Since our reference genome is P. fluorescens, we will echo the full path to it (without the extension) by issuing the following commands (replace 'mic01' with your account):
> echo /home/people/mic01/23Nov2005/Pfluorescens_Pf-5_Main > Paeruginosa_PAO1_Main.proteins.queryblast
> echo /home/people/mic01/23Nov2005/Pfluorescens_Pf-5_Main > Pputida_KT2440_Main.proteins.queryblast
> echo /home/people/mic01/23Nov2005/Pfluorescens_Pf-5_Main > Psyringae_DC3000_Main.proteins.queryblast
> echo /home/people/mic01/23Nov2005/Pfluorescens_Pf-5_Main > Psyringae_phaseolicola1448A_Main.proteins.queryblast
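The four commands above follow one pattern, which can be sketched as a loop. The function name write_queryblast is hypothetical; it takes the reference name (without extension) followed by the GenBank filenames, and uses 'pwd' so that your own account path is filled in automatically.

```shell
# Sketch: write one .queryblast file per comparison genome.
# $1 = reference name without extension; remaining arguments = .gbk filenames.
# write_queryblast is a hypothetical helper, not part of the course scripts.
write_queryblast() {
    ref=$1
    shift
    for gbk in "$@"; do
        name=${gbk%.gbk}
        # the reference genome is not compared against itself
        if [ "$name" != "$ref" ]; then
            echo "$(pwd)/$ref" > "$name.proteins.queryblast"
        fi
    done
}

# Example call (run inside 23Nov2005):
# write_queryblast Pfluorescens_Pf-5_Main *.gbk
```

Check one of the resulting files with 'cat' to confirm that it contains the full path you expect.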
STEP#5a: Now we must construct a template GeneWiz configuration file (*.cf) for our reference organism. First, log in to 'life' using your own password (remember to switch back to today's working directory: 23Nov2005).
> ssh -X life
> cd 23Nov2005
> gmake Pfluorescens_Pf-5_Main.genomeatlas.cf
STEP#5b: We must also modify this configuration file by issuing the following command (this will open a separate window):
> nedit Pfluorescens_Pf-5_Main.genomeatlas.cf
STEP#5c: Insert references to genome/proteome homology in this configuration file.
In the 'dat' lane specification, insert each of the following before all of the other 'dat' specifications (note that you can label each circle by changing the information contained in quotes):
> dat Paeruginosa_PAO1_Main.proteins.blastatlas.genomemap0.gz 1 0.0 0.0 0.0 "Paeruginosa_PAO1" boxfilter 10;
> dat Pputida_KT2440_Main.proteins.blastatlas.genomemap0.gz 1 0.0 0.0 0.0 "Pputida_KT2440" boxfilter 10;
> dat Psyringae_DC3000_Main.proteins.blastatlas.genomemap0.gz 1 0.0 0.0 0.0 "Psyringae_DC3000" boxfilter 10;
> dat Psyringae_phaseolicola1448A_Main.proteins.blastatlas.genomemap0.gz 1 0.0 0.0 0.0 "Psyringae_p.1448A" boxfilter 10;
In the 'circle' lane specification, insert each of the following:
> circle Paeruginosa_PAO1_Main.proteins.blastatlas.genomemap0.gz 1 "101010_000010.cm2" by 0.0 40.0;
> circle Pputida_KT2440_Main.proteins.blastatlas.genomemap0.gz 1 "101010_000010.cm2" by 0.0 40.0;
> circle Psyringae_DC3000_Main.proteins.blastatlas.genomemap0.gz 1 "101010_000010.cm2" by 0.0 40.0;
> circle Psyringae_phaseolicola1448A_Main.proteins.blastatlas.genomemap0.gz 1 "101010_000010.cm2" by 0.0 40.0;
In the 'file' lane specification, insert each of the following:
> file Paeruginosa_PAO1_Main.proteins.blastatlas.genomemap0.gz dat;
> file Pputida_KT2440_Main.proteins.blastatlas.genomemap0.gz dat;
> file Psyringae_DC3000_Main.proteins.blastatlas.genomemap0.gz dat;
> file Psyringae_phaseolicola1448A_Main.proteins.blastatlas.genomemap0.gz dat;
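Because the three groups of lines differ only in the organism name, they can be generated rather than typed. The sketch below is one possible way to do this; blastatlas_cf_lines is a hypothetical helper, it assumes each specification may be written on a single line, and it uses the name without '_Main' as the default label (edit the quoted label afterwards if you want something shorter). The output must still be pasted into the matching sections of the .cf file by hand.

```shell
# Sketch: print the 'dat', 'circle' and 'file' lines for each comparison genome.
# Arguments are names without extension, e.g. Pputida_KT2440_Main.
# blastatlas_cf_lines is a hypothetical helper, not part of the course scripts.
blastatlas_cf_lines() {
    for kind in dat circle file; do
        for name in "$@"; do
            map="$name.proteins.blastatlas.genomemap0.gz"
            case $kind in
                dat)    echo "dat $map 1 0.0 0.0 0.0 \"${name%_Main}\" boxfilter 10;" ;;
                circle) echo "circle $map 1 \"101010_000010.cm2\" by 0.0 40.0;" ;;
                file)   echo "file $map dat;" ;;
            esac
        done
    done
}

# Example call:
# blastatlas_cf_lines Paeruginosa_PAO1_Main Pputida_KT2440_Main \
#     Psyringae_DC3000_Main Psyringae_phaseolicola1448A_Main
```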
STEP#6: Now run the following script:
> ~pfh/scripts/genewiz/cf_data_create.pl Pfluorescens_Pf-5_Main.genomeatlas.cf
This script reads the .cf file and generates all files needed to create a BlastAtlas in GeneWiz.
STEP#7: As a last step, we need to create a .ps (PostScript) file by using the configuration file (.cf):
> ~pfh/scripts/genewiz/cf2pdf.pl Pfluorescens_Pf-5_Main.genomeatlas.cf
STEP#8: Finally, we can view our BlastAtlas plot (our results) by issuing the following command:
> ghostview Pfluorescens_Pf-5_Main.genomeatlas.nfp.ps
You can create your own BlastAtlases simply by replacing the above five organisms with your own. Just remember that all of the above steps are necessary for the process to work.