Task
- Locate on the WWW and download the proteome of Mycobacterium tuberculosis.
- Identify the secretome part of the proteome.
- Predict the presence of CTL epitopes in the proteins of the secretome.
Identify the 50 potential MHC ligands with the highest prediction scores.
The task is to be performed using the traditional paste-and-click services
on the WWW. In addition, some simple reformatting and selection of data will
need to be done on the local host. For this you will need typical UNIX tools
such as grep, sort and gawk.
Therefore, we recommend that you log in to your CBS account and perform
the necessary actions on 'sbiology'. It also makes sense to run a WWW browser
on 'sbiology' and not on the local host, to avoid file transfers.
Step 1: downloading the proteome
A bacterial proteome can be located on the WWW in many ways; if you do not
have a favourite download site you may consider the FTP server at NCBI or the
SRS server at EBI. It is not important which strain you choose. You should
expect around 4,000 proteins, depending on the strain. Make sure to save the
proteome in FASTA format, as the next step of the exercise requires FASTA
format as input.
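A quick check that the download is valid FASTA and has roughly the expected number of entries can be sketched as follows. This is a minimal illustration in Python, not part of the exercise tools; the function name read_fasta and the two example entries are made up for the demonstration.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous entry, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence lines may be wrapped
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical two-entry example; a real proteome file would be read
# with open("proteome.fasta") instead (file name is an assumption).
example = """>protA hypothetical protein
MKTLLV
AAGH
>protB another protein
MSSQ
""".splitlines()

records = list(read_fasta(example))
print(len(records), "entries")
```

For a real proteome the printed count should be in the region of 4,000. The same count can be obtained on the command line by counting the lines that start with '>'.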
Step 2: identifying the secretome
The subcellular location has not been verified experimentally for all the
Mycobacterium tuberculosis proteins. Therefore, you will employ a prediction
method to identify the subset of proteins predicted to be secreted. The
prediction method to be used is SignalP.
Load the SignalP page and investigate the operation of the service. You might
like to submit a few proteins and observe the server behaviour. Specifically,
consider the following:
- What are the suitable parameter settings (organism type, method etc.)?
- What are the input limits compared to the size of your data?
- What seems to be the output format suitable for further investigation?
You will discover a few problems:
When you are ready, submit the proteins to the SignalP server and save the
results in a file.
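If the proteome exceeds the number of sequences the server accepts per submission, one way around it is to split the FASTA file into smaller batches and submit them one at a time. The sketch below assumes the limit is expressed as a maximum number of entries; the function name split_fasta and the value of max_entries are hypothetical, and the real limit has to be read from the server page.

```python
from typing import Iterable, Iterator, List


def split_fasta(lines: Iterable[str], max_entries: int) -> Iterator[List[str]]:
    """Yield lists of FASTA lines, each holding at most max_entries entries."""
    if max_entries < 1:
        raise ValueError("max_entries must be at least 1")
    batch: List[str] = []
    count = 0  # number of entries in the current batch
    for line in lines:
        if line.startswith(">"):
            if count == max_entries:
                # Current batch is full: emit it and start a new one.
                yield batch
                batch, count = [], 0
            count += 1
        if count:  # skip anything that precedes the first header
            batch.append(line)
    if batch:
        yield batch


# Hypothetical three-entry input split into batches of two entries.
lines = [">a", "MK", ">b", "MS", ">c", "MT"]
batches = list(split_fasta(lines, max_entries=2))
for number, batch in enumerate(batches, start=1):
    print("batch", number, "has", sum(l.startswith(">") for l in batch), "entries")
```

Each batch can then be written to its own file and the server results concatenated afterwards.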
Step 3: predicting MHC ligands
You will employ the prediction server NetCTL.
As before, load the NetCTL page and investigate the operation of the service.
You might like to submit a few proteins and observe the server behaviour.
Specifically, consider the following:
- What are the suitable parameter settings (HLA supertype, various
thresholds etc.)?
- What are the input limits compared to the size of your data?
- What are the output format options suitable for the final presentation
of the results?
Remember that only the secretome proteins should be submitted to NetCTL.
This means that you need to prepare the input file containing only those.
From the SignalP output, extract the names of the entries predicted as
secreted and use the program getfrag (on the command line) to generate a
secretome FASTA file. getfrag is a CBS-made script for extracting entries or
fragments of entries from FASTA files.
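The logic of this selection step can be illustrated with a short Python sketch. It is not getfrag and does not reproduce its interface. It also assumes a simplified result table: whitespace-separated columns, comment lines starting with '#', the entry name in the first column and a Y/N call in the last. The actual SignalP output layout must be checked and the column indices adjusted accordingly; all names and data below are hypothetical.

```python
from typing import Iterable, List, Set


def secreted_names(table_lines: Iterable[str], name_col: int = 0,
                   call_col: int = -1, positive: str = "Y") -> Set[str]:
    """Collect entry names whose call column equals the positive label."""
    names = set()
    for line in table_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split()
        if fields[call_col] == positive:
            names.add(fields[name_col])
    return names


def filter_fasta(fasta_lines: Iterable[str], keep: Set[str]) -> List[str]:
    """Return only the FASTA lines of entries whose name is in keep."""
    selected = []
    keeping = False
    for line in fasta_lines:
        if line.startswith(">"):
            words = line[1:].split()
            # The entry name is taken to be the first word of the header.
            keeping = bool(words) and words[0] in keep
        if keeping:
            selected.append(line)
    return selected


# Hypothetical prediction table (assumed layout, not real SignalP output).
table = ["# name score call", "p1 0.91 Y", "p2 0.12 N", "p3 0.77 Y"]
fasta = [">p1 desc", "MKK", ">p2", "MSS", ">p3", "MTT"]

names = secreted_names(table)
secretome = filter_fasta(fasta, names)
print("\n".join(secretome))
```

In the exercise itself the list of names would come from the saved SignalP results, and getfrag would produce the secretome file.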
Submit the secretome file to NetCTL and save the results in a file.
In the final output you can get the original entry names back using the
script 'restore.sed' that 'goodname' created for you. Sort the output
to identify the 50 top-scoring MHC ligand predictions.
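The final ranking amounts to a numeric sort in descending order on the score column, keeping the first 50 lines. A minimal Python sketch follows, assuming the NetCTL results have already been reduced to tab-separated lines of entry name, peptide and score. That three-column layout, the function name top_predictions and the example values are assumptions for illustration only.

```python
import heapq
from typing import Iterable, List, Tuple


def top_predictions(rows: Iterable[str], n: int = 50) -> List[Tuple[float, str, str]]:
    """Return the n highest-scoring (score, entry, peptide) tuples."""
    parsed = []
    for row in rows:
        entry, peptide, score = row.rstrip("\n").split("\t")
        parsed.append((float(score), entry, peptide))
    # nlargest returns the results sorted from highest to lowest score.
    return heapq.nlargest(n, parsed, key=lambda record: record[0])


# Hypothetical reduced output: entry name, 9-mer peptide, combined score.
rows = [
    "e1\tAAAAAAAAA\t0.5",
    "e2\tCCCCCCCCC\t1.2",
    "e1\tDDDDDDDDD\t0.9",
]

top = top_predictions(rows, n=2)
for score, entry, peptide in top:
    print(entry, peptide, score)
```

With n left at its default of 50 the function returns the list asked for in the task.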