Task
- Locate on the WWW and download the proteome of
Mycobacterium tuberculosis.
- Identify the secretome part of the proteome.
- Predict the presence of CTL epitopes in the proteins of the secretome.
Identify the 50 potential MHC ligands with the highest prediction scores.
This time the task is to be performed using SOAP-based Web Services
from a Perl script. Before performing the task itself you will need to get
acquainted with the Web Services technology and the way to call
Web Services from your own scripts or programs.
Note: the instructions below should be read in the context of the lecture
"Accessing SOAP based Web Services using Perl" given earlier in the day
(Wed, Jan 25, 11:00am).
Web Services: study two examples in SoapUI
SoapUI is a desktop application for inspecting, invoking, developing
and functional/load/compliance testing of Web Services over HTTP.
We have found it very useful in our work with Web Services;
it lets you investigate a new Web Service quickly and
understand the operations it contains, the input it expects
and the output it produces.
SoapUI can be downloaded free of charge from
http://www.soapui.org/. The installation
is very simple; at this point you might like to download and install it
on your own computer (it needs at least Java 1.5).
Alternatively, log in to 'sbiology' and issue the command 'soapui'.
The program takes some time to start. Start a new project, call it
"GenomeAtlas" and load the URL:
http://www.cbs.dtu.dk/ws/GenomeAtlas/GenomeAtlas_3_0.wsdl
This file, written in Web Services Description Language (WSDL), contains
all the information you need to use the Web Service in question. WSDL
files can look rather frightening, although they sometimes contain
human-readable documentation. Therefore, do not try to read and understand
the file at first - SoapUI will do that for you, generating example
queries for all the operations of the Web Service.
GenomeAtlas returns genomes and proteomes of prokaryotic organisms;
the input is GenBank accession numbers. In this case human-readable
documentation has been put at the bottom of the file for convenience,
but this is a local CBS practice, not a standard.
Activate the operation 'getSeq', insert 'L43967' as input and run
the service. The complete genome of Mycoplasma genitalium G37 (580076 bp)
will appear in the output window. Examine the output; locate the actual
sequence.
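For orientation, the kind of request SoapUI generates for an operation such
as 'getSeq' can be sketched in Perl as a plain string. This is only a sketch:
the envelope namespace is the standard SOAP 1.1 one, but the service
namespace 'urn:example:GenomeAtlas' and the parameter element name
'accession' are placeholders that are not taken from the WSDL. The request
shown in SoapUI is the authoritative version.

```perl
use strict;
use warnings;

# Build a SOAP 1.1 request envelope as a string.
# $service_ns and $param are illustrative placeholders; the real names
# come from the WSDL (SoapUI shows them in its generated request).
sub soap_envelope {
    my ($operation, $service_ns, $param, $value) = @_;

    # Escape the characters that are special in XML text content.
    my %esc = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');
    $value =~ s/([&<>])/$esc{$1}/g;

    return join "\n",
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"',
        qq{                  xmlns:ws="$service_ns">},
        '  <soapenv:Body>',
        "    <ws:$operation>",
        "      <$param>$value</$param>",
        "    </ws:$operation>",
        '  </soapenv:Body>',
        '</soapenv:Envelope>',
        '';
}

print soap_envelope('getSeq', 'urn:example:GenomeAtlas', 'accession', 'L43967');
```

Every SOAP call has this shape: an Envelope, a Body, and inside the Body one
element named after the operation that carries the input parameters.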
In the same SoapUI session start another project, call it "RNAmmer"
and load the URL:
http://www.cbs.dtu.dk/ws/RNAmmer/RNAmmer_1_1a.wsdl
Examine the service as before. RNAmmer predicts the location of
5s/8s, 16s/18s, and 23s/28s ribosomal RNA in full genome sequences.
You will notice that this service is slightly more complex than
GenomeAtlas: it is asynchronous and involves polling a queue before
results can be fetched. Test the service using the output genome
sequence from GenomeAtlas as input. Study the output as before.
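The "submit, poll the queue, fetch" pattern can be sketched in Perl without
touching the network. In the sketch below the remote polling operation is
replaced by a local code reference, and the status strings 'QUEUED', 'ACTIVE'
and 'FINISHED' are assumptions made for illustration; check the responses in
SoapUI for the values the service really returns.

```perl
use strict;
use warnings;

# Poll an asynchronous job until it finishes or we give up.
# $poll is a code reference that returns the current job status; in a real
# script it would wrap the service's polling operation. The status strings
# used here are assumptions, not values taken from the RNAmmer WSDL.
sub wait_for_job {
    my ($poll, $jobid, $max_tries, $delay) = @_;
    for my $try (1 .. $max_tries) {
        my $status = $poll->($jobid);
        return $try if $status eq 'FINISHED';
        die "job $jobid failed with status $status\n"
            unless $status eq 'QUEUED' || $status eq 'ACTIVE';
        sleep $delay if $delay;    # be polite: wait between polls
    }
    die "job $jobid did not finish after $max_tries polls\n";
}

# Stand-in for the remote service: reports FINISHED on the third poll.
my @replies   = ('QUEUED', 'ACTIVE', 'FINISHED');
my $fake_poll = sub { return shift @replies };

my $polls = wait_for_job($fake_poll, 'job-1', 10, 0);
print "finished after $polls polls\n";
```

Limiting the number of polls and pausing between them keeps a script from
looping forever, or hammering the server, when a job is stuck in the queue.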
By now you should understand the two services; please note that
SoapUI is not a suitable tool for real work but is ideal for familiarizing
yourself with new Web Services. You will now proceed to calling Web Services
from a Perl script.
Web Services: study an example of calling from Perl
Download the Perl script simple.pl to your CBS account and study it.
Try to figure out what it does. Run the script from the command line
to confirm your conclusion.
The script RNAmmer.pl calls the GenomeAtlas and RNAmmer Web Services.
It links them by using the output of the former as input to the latter.
Study the script and try to understand it; download it to your
CBS account and try to run it.
When you understand the RNAmmer.pl script you are ready to perform
the task from Exercise 1 in the new way. The Web Services you will
need will be GenomeAtlas, SignalP and NetCTL.
Investigate GenomeAtlas and SignalP in SoapUI
Return to the GenomeAtlas project in SoapUI. This time activate the
operation 'getProt'; insert 'L43967' as input and run the service.
The proteome of Mycoplasma genitalium G37 will appear in the output
window. Examine the output; locate the actual sequences.
Start yet another project, call it "SignalP" and load the URL:
http://www.cbs.dtu.dk/ws/SignalP/SignalP_3_0.wsdl
Examine the service as before. As you know already, SignalP predicts
the presence and location of signal peptide cleavage sites in amino
acid sequences from different organisms. Try to run that service using
a number of sequences from the Mycoplasma genitalium proteome
as input. Remember that the results are obtained in two steps, as in
RNAmmer. Study the output as before.
Run SignalP again, this time using signalP-request.xml as input.
It contains a number of protein sequences in a ready-made
SOAP envelope for SignalP. Look at the results - what is wrong?
(Hint - notice the sequence identifiers).
Calling GenomeAtlas and SignalP from a Perl script
Now you are ready to write a script of your own. In your home directory
copy the RNAmmer.pl script to SignalP.pl. Modify the new script in three
steps:
1. Use the GenomeAtlas 'getProt' method and pass the results to
the SignalP Web Service. As input use the entry AL123456 (Mycobacterium
tuberculosis).
2. Produce output only for the sequences predicted as secreted
(source: 'signalp-3.0-nn', comment: 'Y').
3. Make sure it takes care of the truncated sequence names.
There is a solution in signalP-solution.pl.
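Steps 2 and 3 can be sketched independently of the Web Service calls. The
sketch below is one possible approach, not necessarily the one used in
signalP-solution.pl: each sequence gets a short unique identifier before
submission, and a hash maps that identifier back to the original name.
The sequence names, the mock prediction records and their field names
('id', 'source', 'comment') are all invented for illustration.

```perl
use strict;
use warnings;

# Original sequence names, as they might come back from 'getProt'
# (these names are invented for illustration).
my @names = ('hypothetical_protein_with_a_very_long_name_1',
             'hypothetical_protein_with_a_very_long_name_2',
             'short_name');

# Replace each name with a short unique identifier before submission,
# remembering the mapping so the original name can be restored later.
my (%original_of, @short_ids);
my $n = 0;
for my $name (@names) {
    my $id = sprintf 'seq%05d', ++$n;
    $original_of{$id} = $name;
    push @short_ids, $id;
}

# Mock prediction records; the field names are assumptions about how the
# parsed SignalP response might be stored, not the service's actual layout.
my @records = (
    { id => 'seq00001', source => 'signalp-3.0-nn',  comment => 'Y' },
    { id => 'seq00001', source => 'signalp-3.0-hmm', comment => 'Y' },
    { id => 'seq00002', source => 'signalp-3.0-nn',  comment => 'N' },
    { id => 'seq00003', source => 'signalp-3.0-nn',  comment => 'Y' },
);

# Keep only the neural-network predictions flagged as secreted, then
# translate the short identifiers back to the original names.
my @secreted = map  { $original_of{ $_->{id} } }
               grep { $_->{source} eq 'signalp-3.0-nn' && $_->{comment} eq 'Y' }
               @records;

print "$_\n" for @secreted;
```

Because the short identifiers are generated by the script, they stay unique
however the service shortens them, which is not guaranteed for the first few
characters of the original names.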
You now have a script that outputs the predicted secretome for a given
organism.
Please note that you can use this script in future as a template when
developing your own software intended for calling Web Services. Thus, you
will be able to convert any Web Service available on the WWW into a local
resource. We feel that this ability is the most important point to take
home from this series of exercises.
It is now time for the final integration of services to achieve
the functionality from Exercise 1.
Finally: all-in-one-script
You need the final link in the workflow; in SoapUI create a new project
and load the URL:
http://www.cbs.dtu.dk/ws/NetCTL/NetCTL_1_1.wsdl.
This service has the same functionality as the NetCTL server used in
Exercise 1. Run the service using netCTL-request.xml as input.
It contains a number of protein sequences in a ready-made
SOAP envelope for NetCTL.
In your home directory copy the script SignalP.pl to NetCTL.pl. Modify
the new script in two steps:
1. Extend it to call NetCTL only on the sequences that were predicted as
secreted by SignalP.
2. Print the correct (original) sequence names in the NetCTL output.
There is a solution in netCTL-solution.pl.
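The last step of the task, picking the 50 potential MHC ligands with the
highest prediction scores, amounts to a sort followed by a slice. A minimal
sketch, assuming the NetCTL predictions have already been collected into a
list of hash references; the field names and the example values are invented
and are not real NetCTL output.

```perl
use strict;
use warnings;

# Return the $n highest-scoring predictions, best first.
# Each prediction is a hash reference; the field names are illustrative.
sub top_n {
    my ($n, @predictions) = @_;
    my @sorted = sort { $b->{score} <=> $a->{score} } @predictions;
    $n = @sorted if $n > @sorted;    # fewer predictions than requested
    return @sorted[0 .. $n - 1];
}

# Invented example data, not real NetCTL output.
my @predictions = (
    { protein => 'protA', peptide => 'AAAAAAAAL', score => 0.41 },
    { protein => 'protB', peptide => 'KLAAAAAAV', score => 0.93 },
    { protein => 'protA', peptide => 'YLAAAAAAL', score => 1.27 },
    { protein => 'protC', peptide => 'GAAAAAAAK', score => 0.08 },
);

# For the exercise itself the first argument would be 50.
my @best = top_n(2, @predictions);
printf "%s\t%s\t%.2f\n", @{$_}{qw(protein peptide score)} for @best;
```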
The final version of your script performs the same task as in Exercise 1.
If time allows, compare the two ways to solve the task. Is there a reason
to bother about SOAP?