Day 5: Blast atlases
Blast atlases are similar to the genome atlases that you looked at
during Day 2, but in addition to showing genomic properties they also
show blast hits to the target genome.
A blast atlas is always made with a reference organism in the
'middle'. All genomic properties that are shown in the atlas relate to
this one organism. Next, the other organisms that you wish to compare to
the reference organism are searched for genes that are similar to those
found in the reference organism. These hits are then shown in the atlas
as lines at the regions of the reference organism that have a match in
the searched organism. There is one lane per searched organism.
Note: the genes in the reference organism are, naturally enough, shown
in the order they are found in that organism. The hits to a gene are
drawn at the position of the reference gene. This means that no
inference can be made about the location of the matching gene in the
searched genomes.
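Every hit is drawn at the position of the reference gene. A gap in a lane is therefore just a run of consecutive reference genes that have no hit in that organism. The small Python sketch below illustrates this idea. `find_gaps`, the gene list and the hit set are hypothetical and are not part of BLASTatlas, which runs its own BLAST searches and does its own drawing.

```python
def find_gaps(ref_genes, hit_ids, min_genes=2):
    """Hypothetical illustration, not part of BLASTatlas.

    ref_genes: list of (gene_id, (start, stop)) in reference genome order.
    hit_ids:   set of reference gene ids that got a hit in ONE searched organism.
    Returns (start, stop) spans covering runs of at least `min_genes`
    consecutive reference genes without a hit, i.e. gaps in that lane.
    """
    gaps = []
    run = []  # reference genes in the current stretch without a hit

    def close_run():
        # Record the current run as a gap if it is long enough.
        if len(run) >= min_genes:
            gaps.append((min(s for s, _ in run), max(e for _, e in run)))

    for gene_id, (start, stop) in ref_genes:
        if gene_id in hit_ids:
            close_run()
            run.clear()
        else:
            run.append((start, stop))
    close_run()  # a gap may extend to the last reference gene
    return gaps
```

For example, with five reference genes where only the first, fourth and fifth have hits, the second and third genes form one gap spanning from the start of gene 2 to the end of gene 3.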
There are two ways for you to make atlases. If you have fewer than 7
organisms, you can use the zoomable web version; if you have more, you
need to use the script version.
Zoomable web version
This version can be found here: Zoomable
In this version, you find and choose your reference organism first, and
then add 'BLAST LANES', one for each of the other organisms you wish to
display. Then you press 'submit' and wait a bit. Note: this will only
work if you have the latest Java version installed.
Script version
This is the version you need to use if you want to display more than 7
organisms, including your reference organism.
First, you need to select which organism is going to be in the
'middle'. In this case it may be useful to choose the organism that has
the highest number of genes. This is something you found out during
Exercise 2, when you predicted genes with Prodigal.
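If you no longer have your gene counts from Exercise 2, a minimal Python sketch like the following could recount them. It assumes each *.proteins.fsa file holds one FASTA record (a line starting with '>') per predicted gene. The helper names are hypothetical.

```python
from pathlib import Path


def count_records(path):
    """Count FASTA records (header lines starting with '>') in one file."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def most_genes(directory):
    """Hypothetical helper: return (file with most records, {file: count})."""
    counts = {p.name: count_records(p) for p in Path(directory).glob("*.proteins.fsa")}
    if not counts:
        raise FileNotFoundError(f"no *.proteins.fsa files in {directory}")
    best = max(counts, key=counts.get)  # first file wins on ties
    return best, counts
```

For example, `most_genes("../data/prodigal")` would return the name of the file with the most records, together with a dictionary of all counts. The path is an assumption based on the copy commands used later in this exercise.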
Log into the CBS computers
# log in to the computers again, then
ssh -Y sbiology
setenv MAKEFILES /home/people/pfh/bin/Makefile
# Create atlas directory
Copy the genbank file, fasta file and protein file into the atlas directory
REMEMBER: anything shown in red (here written in angle brackets, like
<organism>) needs to be replaced. In this case, it should be replaced
by the name given to the files belonging to the organism you have
chosen as your reference organism. If the genbank file for that
organism is named Escherichia_coli_MG1655.gbk, you would replace
<organism> with Escherichia_coli_MG1655.
# copy files into the atlas directory
cp ../data/genbank/<organism>.gbk .
cp ../data/fsafiles/<organism>.fasta .
cp ../data/prodigal/<organism>.proteins.fsa .
Create an annotation file for the reference organism.
An annotation file is a file containing the start, stop and direction
of the genes, among other things.
# create annotation file
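The course command that builds the .ann file is not reproduced here, and its exact format is not described. As a hedged illustration of where start, stop and direction information can come from, the sketch below parses Prodigal-style protein headers (`>ID # start # stop # strand # attributes`, with strand 1 or -1). It assumes your proteins.fsa file keeps Prodigal's default headers. It does not write the .ann format, and the helper names are hypothetical.

```python
def parse_prodigal_header(header):
    """Return (gene_id, start, stop, direction) for one Prodigal-style header.

    Assumed layout: ">ID # start # stop # strand # attributes".
    """
    fields = [f.strip() for f in header.lstrip(">").split("#")]
    if len(fields) < 4:
        raise ValueError(f"not a Prodigal-style header: {header!r}")
    gene_id, start, stop, strand = fields[0], int(fields[1]), int(fields[2]), int(fields[3])
    direction = "+" if strand > 0 else "-"  # Prodigal writes 1 or -1
    return gene_id, start, stop, direction


def read_genes(path):
    """Parse every header line in a proteins.fsa file."""
    with open(path) as handle:
        return [parse_prodigal_header(line) for line in handle if line.startswith(">")]
```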
Create the blast configuration file
<organism> should in this case be replaced with, for instance,
Escherichia_coli_MG1655, if that organism is your reference organism.
# Ensure you are in the right place
~karinl/scripts/blastcfg/makeblastcfg.sh ../data/prodigal <organism> > blast.cfg
When you have done these steps, you should have five different files in
your directory: a .fasta, a .gbk, a .proteins.fsa, an .ann and a
blast.cfg file. IMPORTANT: all of the files except blast.cfg should
have the same beginning, namely what you replaced <organism> with in
this exercise.
Check that all of these are present before you continue (use ls to
list the files in the directory).
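The same check can be scripted. This hypothetical Python helper lists which of the five expected files are missing for a given <organism> prefix. An empty list means you are ready to continue.

```python
from pathlib import Path

# Organism-specific files expected next to blast.cfg (from the list above).
SUFFIXES = [".fasta", ".gbk", ".proteins.fsa", ".ann"]


def missing_files(organism, directory="."):
    """Hypothetical helper: return expected file names not found in `directory`."""
    expected = [organism + suffix for suffix in SUFFIXES] + ["blast.cfg"]
    return [name for name in expected if not (Path(directory) / name).is_file()]
```

For example, `missing_files("Escherichia_coli_MG1655")` run inside the atlas directory should return `[]` once every step has been completed.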
Create blast atlas
You will now create your blast atlas. The first atlas you make will
show the whole reference genome and the blast hits to it. In it you
will most likely see some gaps, i.e. regions where one or more of the
other organisms have no hits. Next, you will make a map where you can
zoom in on these regions and find out more precisely where these gaps
are, and in which genomes they occur.
# Create blast atlas
perl ~karinl/scripts/blastcfg/BLASTatlas -modus circle -ref <organism>.fasta -proteins <organism>.proteins.fsa -ann <organism>.ann -blastcfg blast.cfg \
--dnap='Percent AT,Intrinsic Curvature,Stacking Energy,Position Preference' -title "<organism>" > <organism>.blastatlas.ps
To look at the atlas, you need X activated. See Exercise
3 for how to do this.
# Look at atlas
Based on the atlas above, you will probably want to zoom in a bit on
regions with gaps in them. Use the inner circle (the one with numbers
on it) to decide where you want to start and stop. Replace <start> and
<stop> (including the <>) in the command below with these numbers. To
separate this plot from the others, replace <tag> with something that
will let you remember which atlas this is.
# Make zoomable atlas
perl ~karinl/scripts/blastcfg/BLASTatlas -modus circle -ref <organism>.fasta -proteins <organism>.proteins.fsa -ann <organism>.ann -blastcfg blast.cfg \
--dnap='Percent AT,Intrinsic Curvature,Stacking Energy,Position Preference' -title "<organism>" -begin <start> -end <stop> > <tag>.blastatlas.ps
Find out which genes the other genomes are missing
You now have an idea of which regions are missing. You can now go
to your proteins.fsa file (in this directory) and find which genes
the other organisms are missing. Open the proteins.fsa file with less.
Remember the commands from Exercise 1 for working with this program. To
find genes in this region, type / followed by the number you are
looking for. For instance, if you are looking for a gene that begins
at 2100, type /2100 and press Enter. To go to the next match, type n.
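As an alternative to searching with / in less, a hypothetical Python helper can list every protein whose gene overlaps a region. It assumes the headers in proteins.fsa follow Prodigal's default `>ID # start # stop # strand # ...` layout. Check a few headers with less before relying on it.

```python
def genes_in_region(lines, begin, end):
    """Hypothetical helper: return (gene_id, start, stop) for every
    Prodigal-style header whose gene overlaps the region [begin, end]."""
    hits = []
    for line in lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        fields = [f.strip() for f in line[1:].split("#")]
        gene_id, start, stop = fields[0], int(fields[1]), int(fields[2])
        if start <= end and stop >= begin:  # the two intervals overlap
            hits.append((gene_id, start, stop))
    return hits
```

For example, `with open("Escherichia_coli_MG1655.proteins.fsa") as fh: print(genes_in_region(fh, 2000, 5000))` would print the genes in that stretch of the reference genome.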
Getting your files to your computer
You now have several PostScript files in your directory that you
might want to copy to your computer.
If you have a Mac, you can use the ps files directly. If you have a Windows computer,
you need to do a bit of conversion first. Here is what you do:
Use a command called ps2epsi, like this:
ps2epsi <filename>.ps <filename>.epsi
You then have a <filename>.epsi file in your directory.
This file needs to be renamed to <filename>.eps:
mv <filename>.epsi <filename>.eps
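If you converted several atlases, renaming each file by hand gets tedious. This hypothetical Python sketch does the same as running mv on every .epsi file in a directory, giving each one an .eps extension.

```python
from pathlib import Path


def rename_epsi(directory="."):
    """Hypothetical helper: rename every *.epsi file in `directory` to *.eps
    and return the new file names."""
    renamed = []
    for path in sorted(Path(directory).glob("*.epsi")):
        target = path.with_suffix(".eps")  # replaces only the last suffix
        path.rename(target)
        renamed.append(target.name)
    return renamed
```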
You can then transfer this file to your computer.
If you have a Mac, use something like Fugu.
If you have Windows, use something like WinSCP.
Both of these are graphical secure copy programs. Install one of them
and connect to login.cbs.dtu.dk with your stud-account.
You can then copy the ps or eps files to your computer and insert
them into your documents.