
NOTE: This exercise was held on Jan 25, 2007 at 13:00.

Exercise 2


Task

  1. Locate the proteome of Mycobacterium tuberculosis on the WWW and download it.
  2. Identify the secretome part of the proteome.
  3. Predict the presence of CTL epitopes in the proteins of the secretome. Identify the 50 potential MHC ligands with the highest prediction scores.
This time, the task is to be performed using SOAP-based Web Services from a Perl script. Before performing the task itself, you need to get acquainted with Web Services technology and with how to call Web Services from your own scripts or programs.

Note: the instructions below should be read in the context of the lecture "Accessing SOAP based Web Services using Perl" given earlier the same day (Wed, Jan 25, 11:00 am).

Web Services: study two examples in SoapUI

SoapUI is a desktop application for inspecting, invoking and developing Web Services over HTTP, and for their functional, load and compliance testing. We have found it very useful in our work with Web Services: it lets you investigate a new Web Service quickly and understand the operations it contains, the input it expects and the output it produces.

SoapUI can be downloaded free of charge from http://www.soapui.org/. Installation is very simple; you might like to download and install it on your own computer now (it requires Java 1.5 or later).

Alternatively, log in to 'sbiology' and issue the command 'soapui'. The program takes some time to start. Start a new project, call it "GenomeAtlas" and load the URL:

http://www.cbs.dtu.dk/ws/GenomeAtlas/GenomeAtlas_3_0.wsdl

This file, written in the Web Services Description Language (WSDL), contains all the information you need to use the Web Service in question. WSDL files can look rather intimidating, even though they sometimes contain human-readable documentation. Therefore, do not try to read and understand the file at first; SoapUI will do that for you, generating example queries for all the operations of the Web Service. GenomeAtlas returns the genomes and proteomes of prokaryotic organisms; the input is a GenBank accession number. In this case, human-readable documentation has been placed at the bottom of the file for convenience, but this is a local CBS practice, not a standard.
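A WSDL 1.1 file declares the operations of a service inside its portType element(s), and that is where a tool like SoapUI can find names such as 'getSeq'. The sketch below, written in Python for brevity (the Perl scripts in this exercise would do the equivalent), lists those operation names. The WSDL fragment and its port type name are hypothetical, trimmed stand-ins, not the real GenomeAtlas_3_0.wsdl; only the WSDL 1.1 namespace is standard.

```python
import xml.etree.ElementTree as ET

# Standard WSDL 1.1 namespace.
WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"

# Hypothetical, heavily trimmed WSDL; the real GenomeAtlas_3_0.wsdl is much
# longer and also defines messages, types, bindings and the service endpoint.
SAMPLE_WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             name="GenomeAtlas">
  <portType name="GenomeAtlasPortType">
    <operation name="getSeq"/>
    <operation name="getProt"/>
  </portType>
</definitions>"""


def list_operations(wsdl_text):
    """Return the names of the operations declared in every portType."""
    root = ET.fromstring(wsdl_text)
    names = []
    for port_type in root.iter(f"{{{WSDL_NS}}}portType"):
        for op in port_type.findall(f"{{{WSDL_NS}}}operation"):
            names.append(op.get("name"))
    return names


if __name__ == "__main__":
    print(list_operations(SAMPLE_WSDL))  # ['getSeq', 'getProt']
```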

Activate the operation 'getSeq', insert 'L43967' as input and run the service. The complete 580,076 bp genome of Mycoplasma genitalium G37 will appear in the output window. Examine the output and locate the actual sequence.
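When you run an operation, SoapUI sends an XML request wrapped in a SOAP envelope. A minimal Python sketch of building such a request for 'getSeq' follows, assuming SOAP 1.1. The service namespace urn:example:genomeatlas, the prefix ga and the parameter element name accession are hypothetical placeholders; take the real ones from the request SoapUI generates.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical service namespace: copy the real one from the request
# that SoapUI generates for 'getSeq'.
SERVICE_NS = "urn:example:genomeatlas"

ET.register_namespace("soapenv", SOAP_ENV)
ET.register_namespace("ga", SERVICE_NS)


def build_request(operation, accession):
    """Wrap one accession number in a SOAP 1.1 envelope for `operation`."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    # 'accession' is a hypothetical parameter name.
    ET.SubElement(call, f"{{{SERVICE_NS}}}accession").text = accession
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    print(build_request("getSeq", "L43967"))
```

The resulting string would then be sent in an HTTP POST to the endpoint named in the WSDL; that transport step is left out of the sketch.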

In the same SoapUI session start another project, call it "RNAmmer" and load the URL:

http://www.cbs.dtu.dk/ws/RNAmmer/RNAmmer_1_1a.wsdl

Examine the service as before. RNAmmer predicts the location of 5S/8S, 16S/18S and 23S/28S ribosomal RNA in full genome sequences. You will notice that this service is slightly more complex than GenomeAtlas: it is asynchronous, so you must poll a queue before the results can be fetched. Test the service using the genome sequence returned by GenomeAtlas as input. Study the output as before.
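The asynchronous pattern (submit a job, poll until it is finished, then fetch the result) can be sketched in Python as below. The class FakeAsyncService, its methods submit, poll and fetch, and the status strings QUEUED and FINISHED are hypothetical stand-ins for the real RNAmmer operations listed in its WSDL; the fake lets the polling loop run without a network connection.

```python
import time


class FakeAsyncService:
    """Hypothetical stand-in for an asynchronous service such as RNAmmer."""

    def __init__(self, polls_until_done=3):
        self._remaining = polls_until_done

    def submit(self, sequence):
        return "job-1"  # job identifier used for polling and fetching

    def poll(self, job_id):
        self._remaining -= 1
        return "FINISHED" if self._remaining <= 0 else "QUEUED"

    def fetch(self, job_id):
        return f"results for {job_id}"


def run_and_wait(service, sequence, interval=0.0, max_polls=100):
    """Submit a job, poll until it reports FINISHED, then fetch the result."""
    job_id = service.submit(sequence)
    for _ in range(max_polls):
        if service.poll(job_id) == "FINISHED":
            return service.fetch(job_id)
        time.sleep(interval)  # wait between polls to spare the server
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

In real use, interval would be a few seconds, and max_polls acts as a safety net against a job that never finishes.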

By now you should understand the two services. Note that SoapUI is not suited to real work, but it is ideal for familiarizing yourself with new Web Services. You will now proceed to calling Web Services from a Perl script.

Web Services: study an example of calling from Perl

Download the Perl script simple.pl to your CBS account and study it. Try to figure out what it does, then run it from the command line to confirm your conclusion.

The script RNAmmer.pl calls the GenomeAtlas and RNAmmer Web Services, linking them by using the output of the former as input to the latter. Study the script and try to understand it; download it to your CBS account and try to run it.
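Linking two services mostly means pulling the sequence out of one response and handing it to the next call. A minimal Python sketch follows, assuming a hypothetical response layout (entry, ident and seq elements in the placeholder namespace urn:example:genomeatlas) and a made-up 10-base sequence; the real element names are visible in the SoapUI output window.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed getSeq-style response with a made-up 10-base sequence;
# the real element names are visible in the SoapUI output window.
SAMPLE_RESPONSE = """<Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/">
  <Body>
    <getSeqResponse xmlns="urn:example:genomeatlas">
      <entry><ident>L43967</ident><seq>ACGTACGTAC</seq></entry>
    </getSeqResponse>
  </Body>
</Envelope>"""

NS = {"ga": "urn:example:genomeatlas"}


def extract_entries(response_xml):
    """Return (identifier, sequence) pairs found in the response."""
    root = ET.fromstring(response_xml)
    entries = []
    for entry in root.iterfind(".//ga:entry", NS):
        ident = entry.findtext("ga:ident", default="", namespaces=NS)
        seq = entry.findtext("ga:seq", default="", namespaces=NS)
        # Long sequences may be wrapped over several lines; drop whitespace.
        entries.append((ident, "".join(seq.split())))
    return entries


def chain(response_xml, next_service):
    """Pass every sequence from the first service to `next_service`."""
    return [next_service(ident, seq) for ident, seq in extract_entries(response_xml)]
```

In a real script, next_service would be a function that submits the sequence to RNAmmer and returns its predictions.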

Once you understand the RNAmmer.pl script, you are ready to perform the task from Exercise 1 in the new way. You will need the GenomeAtlas, SignalP and NetCTL Web Services.

Investigate GenomeAtlas and SignalP in SoapUI

Return to the GenomeAtlas project in SoapUI. This time, activate the operation 'getProt', insert 'L43967' as input and run the service. The proteome of Mycoplasma genitalium G37 will appear in the output window. Examine the output and locate the actual sequences.

Start yet another project, call it "SignalP" and load the URL:

http://www.cbs.dtu.dk/ws/SignalP/SignalP_3_0.wsdl

Examine the service as before. As you already know, SignalP predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms. Try running the service with several sequences from the Mycoplasma genitalium proteome as input. Remember that, as with RNAmmer, the results are obtained in two steps. Study the output as before.

Run SignalP again, this time using signalP-request.xml as input. This file contains a number of protein sequences in a ready-made SOAP envelope for SignalP. Look at the results: what is wrong? (Hint: look at the sequence identifiers.)
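One way to cope with identifiers that a service shortens is to replace them with short surrogate names before submission and translate them back afterwards. The Python sketch below assumes nothing about the exact truncation length; the helper names shorten_ids and restore_ids and the 's1', 's2' naming scheme are illustrative choices, not taken from the exercise material.

```python
def shorten_ids(entries, prefix="s"):
    """Replace each identifier by a short surrogate and remember the mapping.

    Surrogates such as 's1', 's2' are short enough to survive truncation;
    the returned dict maps each surrogate back to its original name.
    """
    renamed, lookup = [], {}
    for i, (ident, seq) in enumerate(entries, start=1):
        short = f"{prefix}{i}"
        lookup[short] = ident
        renamed.append((short, seq))
    return renamed, lookup


def restore_ids(results, lookup):
    """Map the first field of each result tuple back to its original name."""
    return [(lookup[ident], *rest) for ident, *rest in results]
```

Because the surrogates are generated by the script itself, they are also guaranteed to be unique, which truncated long names are not.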

Calling GenomeAtlas and SignalP from a Perl script

Now you are ready to write a script of your own. In your home directory copy the RNAmmer.pl script to SignalP.pl. Modify the new script in three steps:

1. Use the GenomeAtlas 'getProt' method and pass the results to the SignalP Web Service. As input, use the entry AL123456 (Mycobacterium tuberculosis).
2. Produce output only for the sequences predicted as secreted (source: 'signalp-3.0-nn', comment: 'Y').
3. Make sure the script handles the truncated sequence names.
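Step 2 is a simple filter on the parsed predictions. A minimal Python sketch, assuming each SignalP prediction has already been parsed into a dict with 'id', 'source' and 'comment' keys (a hypothetical representation; the values 'signalp-3.0-nn' and 'Y' are the ones named in step 2). Each identifier is kept once, in input order.

```python
def secreted_ids(predictions, method="signalp-3.0-nn"):
    """Return identifiers predicted as secreted by the given SignalP method.

    Each prediction is assumed to be a dict with at least the keys
    'id', 'source' and 'comment' (hypothetical parsed representation).
    """
    seen, keep = set(), []
    for p in predictions:
        if p["source"] == method and p["comment"] == "Y" and p["id"] not in seen:
            seen.add(p["id"])
            keep.append(p["id"])
    return keep
```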

There is a solution in signalP-solution.pl.

You now have a script that outputs the predicted secretome for a given organism.

Note that you can use this script in the future as a template for your own software that calls Web Services. In this way, you will be able to turn any Web Service available on the WWW into a local resource. We feel that this ability is the most important point to take home from this series of exercises.

It is now time for the final integration of services to achieve the functionality from Exercise 1.

Finally: all-in-one-script

You need the final link in the workflow: in SoapUI, create a new project and load the URL:

http://www.cbs.dtu.dk/ws/NetCTL/NetCTL_1_1.wsdl

This service has the same functionality as the NetCTL server used in Exercise 1. Run the service using netCTL-request.xml as input. It contains a number of protein sequences in a ready-made SOAP envelope for NetCTL.

In your home directory copy the script SignalP.pl to NetCTL.pl. Modify the new script in two steps:

1. Extend it to call NetCTL only on the sequences that were predicted as secreted by SignalP.
2. Print the correct (original) sequence names in the NetCTL output.
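The remaining glue can be sketched in Python as two small functions: one keeps only the sequences SignalP predicted as secreted before they are submitted to NetCTL, and one restores the original names and keeps the 50 highest-scoring predictions (task 3 at the top of this page). The tuple layout (submitted id, position, peptide, score) for parsed NetCTL output is a hypothetical assumption, and lookup is assumed to map each submitted, possibly shortened, identifier to its original name.

```python
import heapq


def select_for_netctl(entries, secreted):
    """Keep only the (id, sequence) pairs that SignalP predicted as secreted."""
    wanted = set(secreted)
    return [(ident, seq) for ident, seq in entries if ident in wanted]


def top_ligands(predictions, lookup, n=50):
    """Return the n highest-scoring predictions with original names restored.

    predictions: iterable of (submitted_id, position, peptide, score) tuples
    (hypothetical layout for parsed NetCTL output).
    """
    renamed = ((lookup[sid], pos, pep, score) for sid, pos, pep, score in predictions)
    # nlargest returns the records sorted from highest to lowest score.
    return heapq.nlargest(n, renamed, key=lambda rec: rec[3])
```

Using heapq.nlargest avoids sorting the full prediction list, which can be large for a whole secretome.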

There is a solution in netCTL-solution.pl.

The final version of your script performs the same task as in Exercise 1. If time allows, compare the two ways of solving the task. Is SOAP worth the bother?




CONTACT

Ole Lund,