Exercise 2
OBJECTIVES
The purpose of this exercise is to become familiar with constructing genome atlases, analysing them, and using them as a visualization tool.
We will make three different kinds of atlas for each genome (a base atlas, a DNA structure atlas and a genome atlas) and see what we can learn from them.
Key tools used in this exercise are:
GNU Make: we will use a system of makefiles. Make was designed for compiling programs, but at CBS it is used to run pipelines of different tools and format conversions.
GeneWiz: the program used to construct the atlases. An online Java version is soon to be released.
MySQL: we will use a MySQL database to access results.
LOG IN
Open a session on 'organism.cbs.dtu.dk', log in from there to ibiology, and create a directory to work in:
ssh -X ibiology
cd projects
mkdir Ex2
cd Ex2
Link data from last week
Remember that we are working on the following genomes:
+-----------+-----------------------------------------------------------------+
| accession | organism                                                        |
+-----------+-----------------------------------------------------------------+
| AE016879  | Bacillus anthracis str. Ames, complete genome.                  |
| AE017042  | Yersinia pestis biovar Microtus str. 91001, complete genome.    |
| AL111168  | Campylobacter jejuni subsp. jejuni NCTC 11168 complete genome.  |
| AL645882  | Streptomyces coelicolor A3(2) complete genome.                  |
| AP008232  | Sodalis glossinidius str. 'morsitans' DNA, complete genome.     |
| AP009048  | Escherichia coli W3110 DNA, complete genome.                    |
| BA000021  | Wigglesworthia glossinidia endosymbiont of Glossina brevipalpis |
| CP000034  | Shigella dysenteriae Sd197, complete genome.                    |
+-----------+-----------------------------------------------------------------+
Today we will mainly use the GenBank files and the data from the database, so we start by linking in the GenBank files from the directory we used last week.
foreach genome (AE017042 AE016879 AL111168 AL645882 AP008232 AP009048 BA000021 CP000034)
mkdir $genome
ln -sf ~/Ex1/source/$genome.gbk $genome/
end
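Before moving on, it can be worth checking that every link actually points at an existing file, because ln -s creates a link without complaint even when its target does not exist. The Python sketch below assumes it is run from the Ex2 directory and that each genome directory holds a link named <accession>.gbk, as created by the loop above; the function name broken_links is hypothetical.

```python
import os

GENOMES = ["AE017042", "AE016879", "AL111168", "AL645882",
           "AP008232", "AP009048", "BA000021", "CP000034"]

def broken_links(workdir, genomes=GENOMES):
    """Return the accessions whose <acc>/<acc>.gbk is missing or a dangling link."""
    broken = []
    for acc in genomes:
        path = os.path.join(workdir, acc, acc + ".gbk")
        # os.path.exists follows symlinks, so it is False for a dangling link.
        if not os.path.exists(path):
            broken.append(acc)
    return broken

if __name__ == "__main__":
    bad = broken_links(".")
    print("all links OK" if not bad else "broken: " + " ".join(bad))
```

An empty result means all eight links resolve; any accession listed points at a file that is not in ~/Ex1/source.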
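Making a base atlas
The Makefile system at CBS makes many common tasks easy by running them as an ordered pipeline of commands: you ask gmake for the file you want, and it works out which commands must be run, and in which order, to produce it. We will rely on this throughout the exercise.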
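To make the idea of an ordered pipeline concrete, here is a toy Python model of how make chains rules: asking for a target first builds its prerequisites, recursively, and each file is built at most once. The suffix rules in RULES are assumptions for illustration, inferred only from the file names used in this exercise; the real CBS makefiles are not shown here and may differ. Unlike real make, the sketch ignores timestamps, so it lists every step rather than only the out-of-date ones.

```python
# Hypothetical suffix rules (target suffix -> prerequisite suffixes), inferred
# only from the file names used in this exercise; the real CBS rules may differ.
RULES = {
    ".gbk": [],
    ".baseatlas.cf": [".gbk"],
    ".baseatlas.ps": [".baseatlas.cf"],
    ".curvature.gz": [".gbk"],
    ".structureatlas.cf": [".gbk"],
    ".structureatlas.ps": [".structureatlas.cf", ".curvature.gz"],
}

def split_target(target):
    """Split 'AE016879.baseatlas.ps' into ('AE016879', '.baseatlas.ps')."""
    for suffix in sorted(RULES, key=len, reverse=True):  # longest suffix wins
        if target.endswith(suffix):
            return target[:-len(suffix)], suffix
    raise ValueError(f"no rule to make target {target!r}")

def build_order(target, order=None):
    """Return the files needed for target, prerequisites first, each listed once.
    Unlike make, this ignores timestamps and lists every step."""
    if order is None:
        order = []
    if target in order:
        return order
    stem, suffix = split_target(target)
    for prereq_suffix in RULES[suffix]:
        build_order(stem + prereq_suffix, order)
    order.append(target)
    return order

if __name__ == "__main__":
    print(build_order("AE016879.structureatlas.ps"))
```

The depth-first walk is what turns "give me this file" into an ordered list of commands; a shared prerequisite such as the .gbk file appears only once.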
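First we make the configuration files for the base atlases using GNU Make (gmake). Each gmake job is started in the background with &, so the genomes are processed in parallel, and wait pauses until all of them have finished.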
foreach genome (AE017042 AE016879 AL111168 AL645882 AP008232 AP009048 BA000021 CP000034)
cd $genome
gmake $genome.baseatlas.cf &
cd ..
end
wait
When the jobs have finished, inspect the files in your favourite editor. Notice that the config file for Streptomyces coelicolor describes a circular atlas, although S. coelicolor has a linear chromosome. This is easily remedied by replacing every occurrence of circle with linear, which can be done quickly with:
sed 's/circle/linear/g' AL645882/AL645882.baseatlas.cf \
> AL645882/AL645882.baseatlas.lin.cf
mv AL645882/AL645882.baseatlas.lin.cf AL645882/AL645882.baseatlas.cf
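If you prefer to make the same edit from Python, for instance to reuse it for the other atlas types, the sketch below mirrors the sed and mv pair. The function name linearize is hypothetical, and its default strings are simply the two words from the sed command. Like sed's g flag, it also rewrites circle inside longer words, so it is worth checking the result in an editor.

```python
import os
import shutil
import tempfile

def linearize(path, old="circle", new="linear"):
    """Replace every occurrence of old with new in the file at path, like
    sed 's/circle/linear/g' followed by mv. Returns the number of replacements."""
    with open(path) as fh:
        text = fh.read()
    count = text.count(old)
    # Write the new text to a temporary file in the same directory, copy the
    # original permissions, then rename it over the original in one step.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as fh:
        fh.write(text.replace(old, new))
    shutil.copymode(path, tmp)
    os.replace(tmp, path)
    return count
```

Run from the Ex2 directory, linearize("AL645882/AL645882.baseatlas.cf") would perform the fix above; a second call returns 0 because nothing is left to replace.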
Now we only need to make the actual atlases.
foreach genome (AE017042 AE016879 AL111168 AL645882 AP008232 AP009048 BA000021 CP000034)
cd $genome
gmake $genome.baseatlas.ps &
cd ..
end
wait
This creates a PostScript file of the base atlas for each genome. These files can be viewed with ghostview.
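Because the atlas jobs run in the background, a job that fails may leave its output file missing or empty without you noticing. This Python sketch checks, for one atlas kind, that each expected file exists and starts with %!, the conventional PostScript header. The function name atlas_status is hypothetical, and since it only inspects the first two bytes, a truncated file will still be reported as ok.

```python
import os

def atlas_status(workdir, genomes, kind="baseatlas"):
    """Map each accession to 'ok', 'missing' or 'not postscript' for the file
    <acc>/<acc>.<kind>.ps under workdir."""
    status = {}
    for acc in genomes:
        path = os.path.join(workdir, acc, f"{acc}.{kind}.ps")
        if not os.path.isfile(path):
            status[acc] = "missing"
            continue
        with open(path, "rb") as fh:
            # PostScript files conventionally start with the two bytes "%!".
            status[acc] = "ok" if fh.read(2) == b"%!" else "not postscript"
    return status
```

The same function can be reused later with kind="structureatlas" or kind="genomeatlas".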
Making a structure atlas
This step is almost the same as the previous one, except that this time we make a DNA structure atlas.
Again we start by making the configuration files for the atlases.
foreach genome (AE017042 AE016879 AL111168 AL645882 AP008232 AP009048 BA000021 CP000034)
cd $genome
gmake $genome.structureatlas.cf &
cd ..
end
wait
Inspecting the files, you should start to see a pattern in the format. Again, the config file for S. coelicolor describes a circular atlas, and we fix it in the same way:
sed 's/circle/linear/g' AL645882/AL645882.structureatlas.cf > AL645882/AL645882.structureatlas.lin.cf
mv AL645882/AL645882.structureatlas.lin.cf AL645882/AL645882.structureatlas.cf
Now we only need to make the actual atlases.
foreach genome (AE017042 AE016879 AL111168 AL645882 AP008232 AP009048 BA000021 CP000034)
cd $genome
gmake $genome.curvature.gz
gmake $genome.structureatlas.ps &
cd ..
end
wait
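This creates a PostScript file of the structure atlas for each genome. Note that for each genome the loop first builds the curvature file ($genome.curvature.gz) in the foreground, and only then starts the structure atlas job in the background.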
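The loop above mixes a foreground step (the curvature file) with a background step (the atlas), followed by wait. Here is a rough Python equivalent of that control flow, sketched with concurrent.futures. The function names run_pipeline and gmake are hypothetical, and actually calling gmake assumes it is on your PATH and that you run from the Ex2 directory. One difference from tcsh: max_workers can cap how many atlas jobs run at the same time, which may be kinder to a shared server.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_pipeline(genomes, prepare, build, max_workers=None):
    """For each genome, run prepare(genome) in the foreground and then start
    build(genome) in the background; finally wait for every background job.
    Returns {genome: value returned by build(genome)}."""
    futures = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for genome in genomes:
            prepare(genome)                               # like: gmake $genome.curvature.gz
            futures[genome] = pool.submit(build, genome)  # like: gmake $genome.structureatlas.ps &
        # Leaving the 'with' block waits for all jobs, like tcsh 'wait'.
    return {genome: fut.result() for genome, fut in futures.items()}

def gmake(genome, suffix):
    """Hypothetical helper: run 'gmake <genome><suffix>' inside the directory
    <genome> and return gmake's exit status."""
    return subprocess.run(["gmake", genome + suffix], cwd=genome).returncode
```

For the structure atlas this would be called as run_pipeline(genomes, lambda g: gmake(g, ".curvature.gz"), lambda g: gmake(g, ".structureatlas.ps")), where genomes is the list of eight accessions. A nonzero value in the result marks a genome whose atlas job failed.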
Making a genome atlas
The procedure for the genome atlas is the same as for the two previous atlases, so we do it in one go: make the config files, switch S. coelicolor to a linear atlas, and build the atlases.
foreach genome (AE017042 AE016879 AL111168 AL645882 AP008232 AP009048 BA000021 CP000034)
cd $genome
gmake $genome.genomeatlas.cf &
cd ..
end
wait
sed 's/circle/linear/g' AL645882/AL645882.genomeatlas.cf > AL645882/AL645882.genomeatlas.lin.cf
mv AL645882/AL645882.genomeatlas.lin.cf AL645882/AL645882.genomeatlas.cf
foreach genome (AE017042 AE016879 AL111168 AL645882 AP008232 AP009048 BA000021 CP000034)
cd $genome
gmake $genome.genomeatlas.ps &
cd ..
end
wait
Once this is done, we have a PostScript file of the genome atlas for each genome.
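The three atlas sections differ only in the atlas name, so the whole procedure can be written once. Below is a Python sketch of the one-step genome-atlas procedure, parameterized by the atlas kind. The names make_atlas, gmake, GENOMES and LINEAR are hypothetical. Running it for real assumes gmake is on the PATH and that you start from the Ex2 directory. It does not include the extra curvature step used for the structure atlas.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

GENOMES = ["AE017042", "AE016879", "AL111168", "AL645882",
           "AP008232", "AP009048", "BA000021", "CP000034"]
LINEAR = {"AL645882"}  # S. coelicolor: its atlas config must say linear, not circle

def gmake(acc_dir, target):
    """Run 'gmake <target>' inside acc_dir and return gmake's exit status."""
    return subprocess.run(["gmake", target], cwd=acc_dir).returncode

def make_atlas(kind, genomes=GENOMES, linear=LINEAR, run=gmake, workdir="."):
    """Build <acc>.<kind>.ps for every genome in three stages, mirroring the
    tcsh commands: config files in parallel, circle -> linear fix for linear
    genomes, then the atlases in parallel. Returns two {acc: exit status} dicts."""
    def stage(suffix):
        # One job per genome (like '&'); collecting every result acts like 'wait'.
        with ThreadPoolExecutor() as pool:
            jobs = {acc: pool.submit(run, os.path.join(workdir, acc), f"{acc}.{kind}.{suffix}")
                    for acc in genomes}
            return {acc: job.result() for acc, job in jobs.items()}

    cf_status = stage("cf")
    for acc in genomes:
        # Only patch config files that gmake reported as built successfully.
        if acc in linear and cf_status[acc] == 0:
            path = os.path.join(workdir, acc, f"{acc}.{kind}.cf")
            with open(path) as fh:
                text = fh.read()
            with open(path, "w") as fh:  # same effect as the sed + mv pair
                fh.write(text.replace("circle", "linear"))
    ps_status = stage("ps")
    return cf_status, ps_status
```

For example, make_atlas("genomeatlas") would repeat the commands above and return the gmake exit status of every job, so a failed genome is easy to spot.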