
Day 5: Blast atlases



Blast atlases

Blast atlases are similar to the genome atlases that you looked at during Day 2, but in addition to showing genomic properties, they also show blast hits to the reference genome.

A blast atlas is always made with a reference organism in the 'middle'. All genomic properties shown in the atlas relate to this one organism. The other organisms that you wish to compare with the reference organism are then searched for genes similar to those found in the reference organism. Each hit is shown in the atlas as a line at the position of the reference region that has a match in the searched organism. One lane is shown per searched organism.

Note: the genes of the reference organism are, naturally enough, shown in the order in which they occur in its genome. A hit is drawn at the position of the reference gene, so no inference can be made about the location of the matching gene in the searched genomes.

There are two ways to make atlases. If you have fewer than 7 organisms, you can use the zoomable web version; if you have more, you need to use the script version.

Zoomable web version

This version can be found here: Zoomable atlases

In this version, you first find and choose your reference organism, and then add 'BLAST LANES', one for each of the other organisms you wish to display. Then press 'submit' and wait a bit. Note: this will only work if you have the latest Java version installed.


Script version

This is the version you need to use if you want to display more than 7 blast lanes.

Preprocessing

  1. Select your reference organism.

    First, you need to select which organism is going to be in the 'middle'. It may be useful to choose the organism with the highest number of genes. You found this out in Exercise 2, when you predicted genes with Prodigal.

  2. Log into the CBS computers

    # log in to the computers again, then
    ssh -Y sbiology
    umask 022
    setenv MAKEFILES /home/people/pfh/bin/Makefile


  3. Create atlas directory

    # Create atlas directory
    mkdir blastatlas
    cd blastatlas


  4. Copy genbank file, fasta file and protein file into atlas directory

    REMEMBER: anything written in angle brackets, such as <organism>, must be replaced (including the brackets). Here, replace <organism> with the file name prefix of the organism you chose as your reference organism. For example, if its GenBank file is named Escherichia_coli_MG1655.gbk, replace <organism> with Escherichia_coli_MG1655.
    # copy files into the atlas directory
    cp ../data/genbank/<organism>.gbk .
    cp ../data/fsafiles/<organism>.fasta .
    cp ../data/prodigal/<organism>.proteins.fsa .


  5. Create annotation file for reference organism.

    An annotation file is a file containing the start, stop, directions and the 
    # create annotation file
    make <organism>.ann


  6. Create blast configuration file

    Again, replace <organism> with the name of your reference organism, for instance Escherichia_coli_MG1655.
    # Run this from inside the blastatlas directory
    ~karinl/scripts/blastcfg/makeblastcfg.sh ../data/prodigal <organism> > blast.cfg
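Step 1 suggests using the organism with the most genes as the reference. The function below is a minimal sketch for finding it, assuming that every organism has a ../data/prodigal/<organism>.proteins.fsa file in FASTA format with exactly one '>' header line per predicted protein. It uses Bourne shell (sh/bash) syntax; the setenv command in step 2 indicates a csh-family login shell, so type bash first (or save the function in a file and run it with sh). The name count_proteins is hypothetical and not one of the course scripts.

```shell
# Hypothetical helper: print "<protein count><TAB><organism>" for every
# *.proteins.fsa file in a directory, largest count first.
# ASSUMPTION: one '>' header line per predicted protein (FASTA format).
count_proteins() {
    dir=${1:-../data/prodigal}
    for f in "$dir"/*.proteins.fsa; do
        [ -e "$f" ] || continue              # the glob matched no files
        n=$(grep -c '^>' "$f")               # count FASTA header lines
        printf '%s\t%s\n' "$n" "$(basename "$f" .proteins.fsa)"
    done | sort -k1,1nr                      # numeric sort, descending
}
```

For example, count_proteins ../data/prodigal | head -n 1 prints the organism with the most predicted genes, which is a reasonable candidate for <organism>.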



When you have completed these steps, you should have five files in your directory: a .fasta, a .gbk, a .proteins.fsa, an .ann and a blast.cfg file. IMPORTANT: all of these files except blast.cfg must start with the same name, namely whatever you replaced <organism> with.

Check that all of these are present before you continue (use ls to check).
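If you prefer a check that names exactly which file is missing, here is a minimal sketch in Bourne shell (sh/bash) syntax, so start bash first if your login shell is tcsh. The function name check_atlas_files is hypothetical. Note one design choice: it uses test -s, so an empty file (for example from a make step that failed) is also reported as missing, which is stricter than what ls shows.

```shell
# Hypothetical helper: check that the five atlas input files exist and are
# non-empty in the current directory. $1 is the <organism> prefix.
check_atlas_files() {
    org=$1
    missing=0
    for f in "$org.fasta" "$org.gbk" "$org.proteins.fsa" "$org.ann" blast.cfg; do
        if [ -s "$f" ]; then
            echo "ok       $f"
        else
            echo "MISSING  $f"    # absent or empty
            missing=1
        fi
    done
    return "$missing"             # 0 only if every file is present
}
```

For example, run check_atlas_files Escherichia_coli_MG1655 inside the blastatlas directory; it returns a non-zero status if any file is missing.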


Create blast atlas

You will now create your blast atlas. The first atlas you make will show the whole reference genome and the blast hits to it. In it you will most likely see some gaps in the lanes of the other organisms. Next, you will make a zoomed atlas of these regions to find out more precisely where the gaps are and in which genomes they occur.

  1. Create full atlas

    # Create blast atlas
    perl ~karinl/scripts/blastcfg/BLASTatlas -modus circle -ref <organism>.fasta -proteins <organism>.proteins.fsa -ann <organism>.ann -blastcfg blast.cfg \
    --dnap='Percent AT,Intrinsic Curvature,Stacking Energy,Position Preference' -title "<organism>" > <organism>.blastatlas.ps


  2. Look at atlas

    To look at the atlas, you need X (X11) running. See Exercise 3 for how to set this up.
    # Look at atlas
    ghostview <organism>.blastatlas.ps


  3. Create zoomed atlas

    Based on the atlas above, you will probably want to zoom in on regions with gaps. Use the inner circle (the one with numbers on it) to decide where the zoomed region should start and stop, and replace <start> and <stop> below (including the <>) with these numbers. To keep this plot apart from the others, replace <tag> with a name that will help you remember which region this atlas shows.

    # Make zoomed atlas
    perl ~karinl/scripts/blastcfg/BLASTatlas -modus circle -ref <organism>.fasta -proteins <organism>.proteins.fsa -ann <organism>.ann -blastcfg blast.cfg \
    --dnap='Percent AT,Intrinsic Curvature,Stacking Energy,Position Preference' -title "<organism>" -begin <start> -end <stop> > <tag>.blastatlas.ps

  4. Find out which genes the other genomes are missing


    You now have an idea of which regions are missing. Next, find out which genes the other organisms are missing by looking in your <organism>.proteins.fsa file (in this directory). Open it with less, using the commands from Exercise 1 to move around. To search for a position, type / followed by the number and press Enter. For instance, if you are looking for a gene that begins at 2100, type /2100 and press Enter. Press n to jump to the next match.
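As an alternative to searching with less, you can list every protein that overlaps a region in one go. This is a minimal sketch that rests on an assumption you should verify with head <organism>.proteins.fsa first: that the headers follow Prodigal's default protein output layout, with fields separated by " # " and start, stop and strand in fields 2 to 4 (for example ">contig_12 # 2100 # 2950 # 1 # ID=1_12;..."). If your headers look different, the field numbers must be adjusted. The function name genes_in_region is hypothetical, and the syntax is Bourne shell (sh/bash), so start bash first if your login shell is tcsh.

```shell
# Hypothetical helper: print name, start, stop and strand of every protein
# whose coordinates overlap the region $2..$3 in the FASTA file $1.
# ASSUMPTION: Prodigal-style headers, ">name # start # stop # strand # ...".
genes_in_region() {
    awk -v start="$2" -v stop="$3" '
        /^>/ {
            n = split($0, f, " # ")          # split header on " # "
            # overlap test: gene starts before region end and ends after region start
            if (n >= 4 && f[2] + 0 <= stop + 0 && f[3] + 0 >= start + 0)
                print substr(f[1], 2), f[2], f[3], f[4]   # drop the leading ">"
        }' "$1"
}
```

For example, genes_in_region Escherichia_coli_MG1655.proteins.fsa 2000 15000 lists the proteins between positions 2000 and 15000, which you can then compare with the gaps in the zoomed atlas.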



Getting your files to your computer


You now have several PostScript files in your directories that you may want to copy to your computer.

If you have a Mac, you can use the .ps files directly. If you have a Windows computer, you need to convert them first. Here is what you do:

Use a command called ps2epsi like this:

ps2epsi <filename>.ps

You then have a <filename>.epsi file in your directory.

This file needs to be renamed to <filename>.eps:

mv <filename>.epsi <filename>.eps



You can then transfer this file to your computer.
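If you have many atlases, the two steps above can be repeated for every .ps file in the current directory. This is a minimal sketch that performs exactly the ps2epsi and mv commands shown above for each file; it assumes ps2epsi writes <filename>.epsi next to the input, as described above. The function name ps_to_eps is hypothetical, and the syntax is Bourne shell (sh/bash), so start bash first if your login shell is tcsh.

```shell
# Hypothetical helper: convert every .ps file in the current directory to
# .eps using the ps2epsi + rename steps described above.
ps_to_eps() {
    for ps in *.ps; do
        [ -e "$ps" ] || continue        # no .ps files in this directory
        base=${ps%.ps}                  # strip the .ps extension
        # ps2epsi creates <base>.epsi; rename it to <base>.eps on success
        ps2epsi "$ps" && mv "$base.epsi" "$base.eps"
    done
}
```

Run ps_to_eps inside the blastatlas directory, then transfer the resulting .eps files.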

Transfer


If you have a Mac, use something like Fugu.

If you have Windows, use something like WinSCP.


Both of these are graphical secure copy programs. Install the one for your system and connect to login.cbs.dtu.dk with your stud-account.
You can then copy the .ps or .eps files to your computer and insert them into your documents.