
NOTE: This exercise was held on Feb 6, 2008 at 3:45pm.

Exercise 1: Using a Web Service from a local script


Task

  1. Retrieve the sequence of the ERp44 Human chaperone using the eFetch Web Service.
  2. Find if the protein is predicted as secreted using the SignalP Web Service.
  3. Integrate the two workflows.

The task is to be performed using your language of choice. We recommend using either Perl or Python:

  • Perl 5.8.7 or better, SOAP::Lite 0.69 or better
  • Python 2.4.2 or better, SOAPpy 0.12.0

NOTE: For Python > 2.5 please download SOAPpy from here

SoapUI is also strongly recommended for invoking and inspecting the Web Services.


Getting acquainted with the eFetch Web Service

  1. Create the working directory

    Log in to your Unix shell account on 'sbiology' and change to today's working directory:

    cd ex1
    ls
    

    Check that you have the following files in the directory:

    RNAmmer.pl	eFetch.pl	eFetch_signalP.pl	signalP.pl
    RNAmmer.py 	eFetch.py	eFetch_signalP.py	signalP.py
    
  2. Inspect the Web Service using SoapUI

    SoapUI can be downloaded to your computer free of charge from http://www.soapui.org/. It needs Java 1.5 to run, and you can simply press the Webstart button to run it. The application will be stored in your browser's cache.

    Alternatively, you can log in to 'sbiology' and run the command 'soapui'. Note that this will be slow and is not recommended.

    Create a new WSDL project, name it 'Exercise1' and load the eFetch WSDL:

    http://www.ncbi.nlm.nih.gov/entrez/eutils/soap/efetch.wsdl

    Examine the services it provides. You should see two services under 'eUtilsServiceSoap'. Let's focus on the 'run_eFetch' service.

    Examine the request operation for this service by double-clicking it. You should see a default SOAP message as input. Replace the '?' in the 'db' field with 'protein'. This will be the database to use when querying the eFetch service. Fill in the 'id' field with 'Q9BS26'. This is the accession number for the ERp44 Human chaperone. Remove all the other fields containing '?', and then submit the request to the 'run_eFetch' service.

    You will get a SOAP response from the server containing the ERp44 chaperone information. Scroll all the way down to find the protein sequence. Notice which tags enclose it.
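
The request that SoapUI sends after this editing can be sketched in a few lines of Python, using only the standard library. This is a minimal illustration and not the code in eFetch.py. The SOAP 1.1 envelope namespace is standard, but the eFetch namespace and the request element name used here (EFETCH_NS, REQUEST_ELEMENT) are assumptions that should be checked against the WSDL loaded in SoapUI.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace

# Assumptions for illustration: confirm both values against the eFetch WSDL.
EFETCH_NS = "http://www.ncbi.nlm.nih.gov/soap/eutils/efetch"
REQUEST_ELEMENT = "eFetchRequest"


def build_efetch_request(db, record_id):
    """Return a SOAP envelope (as text) that carries only the 'db' and 'id' fields."""
    envelope = ET.Element("{%s}Envelope" % SOAP_ENV)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_ENV)
    request = ET.SubElement(body, "{%s}%s" % (EFETCH_NS, REQUEST_ELEMENT))
    ET.SubElement(request, "{%s}db" % EFETCH_NS).text = db
    ET.SubElement(request, "{%s}id" % EFETCH_NS).text = record_id
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    # Same values as typed into SoapUI above.
    print(build_efetch_request("protein", "Q9BS26"))
```

Leaving out every field other than 'db' and 'id' mirrors the step of deleting the remaining '?' placeholders in SoapUI.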

Exploring services with local scripts

Now we are going to implement these steps in a script running locally. Depending on your language of choice the syntax will differ, but the steps to follow will be the same (unless your language generates stubs for the Web Service):

  • Import the WSDL parsing module.
  • Load the WSDL file.
  • Define the input data.
  • Run the service.
  • Retrieve the results.
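
The last of these steps, retrieving the results, usually comes down to picking one value out of a namespaced XML reply. The sketch below shows one way to do that with the Python standard library when no SOAP module unpacks the reply for you. The SAMPLE_REPLY document and its tag names ('reply', 'accession', 'sequence') are made up for illustration; the real tag names are the ones you saw in the SoapUI response.

```python
import xml.etree.ElementTree as ET

# Made-up reply for illustration only; a real eFetch reply uses different tags.
SAMPLE_REPLY = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <reply xmlns="urn:example">
      <accession>Q9BS26</accession>
      <sequence>mhpavflslp</sequence>
    </reply>
  </soap:Body>
</soap:Envelope>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def find_text(xml_text, wanted):
    """Return the text of the first element whose local name is 'wanted', or None."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if local_name(element.tag) == wanted:
            return (element.text or "").strip()
    return None


if __name__ == "__main__":
    print(find_text(SAMPLE_REPLY, "sequence"))
```

Matching on the local name avoids hard-coding the service namespace, at the cost of returning the first match if the same name occurs in several places.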
  1. Retrieve a protein sequence

    We will be using the NCBI eFetch Web Service. You can read more about it at http://www.ncbi.nlm.nih.gov/entrez/query/static/esoap_help.html.
    For your convenience the source code is available in your working directory. You can view it here: eFetch.pl | eFetch.py.
    You can try running the script on the CBS servers:
    perl eFetch.pl Q9BS26
    
    or:
    python eFetch.py Q9BS26
    
    There should be no output.
    Try to examine the general syntax of the script and identify the steps enumerated above. Notice how the input data for the service is built.
    Now try uncommenting the last lines of the script, and run it again.
    Examine the script again and try to understand how the output data is accessed. You can even change how you want to print it. Perhaps saving the sequence header and amino acid sequence to a file?
    Try modifying the output so it looks something like this (FASTA format):
    >Q9BS26  Thioredoxin domain-containing protein 4 precursor (Endoplasmic reticulum resident protein ERp44)
    MHPAVFLSLPDLRCSLLLLVTWVFTPVTTEITSLDTENIDEILNNADVALVNFYADWCRFSQMLHPIFEEASDVIKEEF
    PNENQVVFARVDCDQHSDIAQRYRISKYPTLKLFRNGMMMKREYRGQRSVKALADYIRQQKSDPIQEIRDLAEITTLDR
    SKRNIIGYFEQKDSDNYRVFERVANILHDDCAFLSAFGDVSKPERYSGDNIIYKPPGHSAPDMVYLGAMTNFDVTYNWI
    QDKCVPLVREITFENGEELTEEGLPFLILFHMKEDTESLEIFQNEVARQLISEKGTINFLHADCDKFRHPLLHIQKTPA
    DCPVIAIDSFRHMYVFGDFKDVLIPGKLKQFVFDLHSGKLHREFHHGPDPTDTAPGEQAQDVASSPPESSFQKLAPSEY
    RYTLLRDRDEL
    
    Congratulations! You have a running "local" copy of eFetch!
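
If you want to compare notes on the FASTA step, here is a minimal Python sketch of a formatter. It is independent of the eFetch scripts: the function name to_fasta and its parameters are hypothetical, and the default width of 79 is simply read off the example above. It also upper-cases the residues, on the assumption that the service may return the sequence in lowercase.

```python
import textwrap


def to_fasta(accession, description, sequence, width=79):
    """Format one record as FASTA: a '>' header line, then wrapped sequence lines."""
    # Drop any whitespace or line breaks embedded in the sequence, then upper-case it.
    residues = "".join(sequence.split()).upper()
    header = ">%s  %s" % (accession, description)
    # textwrap.wrap splits a single long "word" into chunks of at most 'width' characters.
    return "\n".join([header] + textwrap.wrap(residues, width))


if __name__ == "__main__":
    print(to_fasta("Q9BS26", "example description", "mhpavflslpdlrcsllllv" * 5))
```

Writing the returned string to a file instead of printing it covers the "saving to a file" variant suggested above.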


  2. Identifying signal peptides

    Now for something a bit more complex.
    Load the SignalP WSDL in SoapUI to inspect it. It is available at http://www.cbs.dtu.dk/ws/ws.php?entry=SignalP. You can read a little about this service on that web page.

    After loading the WSDL you can see that this service has 3 different operations: 'runService', 'pollQueue', and 'fetchResult'. This is because we are now using an asynchronous Web Service:
    • runService will submit the job parameters to a job queue and return a job id.
    • pollQueue will poll the queue for the job status.
    • fetchResult can be used to return the job results as soon as it is finished.
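
The submit, poll, fetch pattern described above can be sketched as a small loop. This is a generic illustration and not the code in signalP.py: the two callables stand in for the real 'pollQueue' and 'fetchResult' calls, and the status strings ('QUEUED', 'ACTIVE', 'FINISHED', 'REJECTED', 'UNKNOWN JOBID') are assumptions that should be checked against the service documentation.

```python
import time


def wait_for_result(poll_queue, fetch_result, job_id, interval=2.0, max_polls=30,
                    done="FINISHED", failed=("REJECTED", "UNKNOWN JOBID")):
    """Poll until the job is done, then fetch and return its result.

    poll_queue(job_id) must return a status string; fetch_result(job_id) must
    return the result. The status names are assumptions, not documented values.
    """
    for _ in range(max_polls):
        status = poll_queue(job_id)
        if status == done:
            return fetch_result(job_id)
        if status in failed:
            raise RuntimeError("job %s failed with status %r" % (job_id, status))
        time.sleep(interval)  # be polite to the server between polls
    raise RuntimeError("job %s did not finish after %d polls" % (job_id, max_polls))


if __name__ == "__main__":
    # Stand-in for a real queue: the job is reported finished on the third poll.
    statuses = iter(["QUEUED", "ACTIVE", "FINISHED"])
    result = wait_for_result(lambda job: next(statuses),
                             lambda job: "result for " + job,
                             "job-1", interval=0)
    print(result)
```

Capping the number of polls keeps a script from hanging forever if the queue never reports the job as finished.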

    Now connect the two services using SoapUI. Query eFetch for the protein id 'Q9BS26'. Get the sequence out of the results and paste into SignalP's sequences:entry:seq field. Fill the header as well. Make sure you input 'euk' as the organism. Delete the other default values with '?'. Inspect the results and get acquainted with the output format.

    Examine the source code: signalP.pl | signalP.py.
    These scripts expect a FASTA sequence from standard input, and will predict whether it contains a signal peptide.
    Try hooking up the previous script (after modifying it to produce FASTA output) with this one:
    perl eFetch.pl Q9BS26 | perl signalP.pl
    python eFetch.py Q9BS26 | python signalP.py
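
For the pipe to work, the second script has to turn the FASTA text arriving on standard input back into a header and a sequence. The sketch below shows one way to do this in Python; the function name read_fasta is hypothetical and the code is not taken from signalP.py. It reads only the first record and ignores blank lines.

```python
import io


def read_fasta(handle):
    """Read the first FASTA record from a file-like object.

    Returns (header, sequence); header is None if no '>' line was found.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; stop after the first
            header = line[1:].strip()
        else:
            chunks.append(line)
    return header, "".join(chunks)


if __name__ == "__main__":
    # An in-memory stand-in for piped input, so the example runs on its own.
    demo = io.StringIO(">Q9BS26  demo header\nMHPAVFLSLP\nDLRCSLLLLV\n")
    header, sequence = read_fasta(demo)
    print(header)
    print(sequence)
```

In a real pipeline you would pass sys.stdin to read_fasta instead of the in-memory demo handle.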
    
    The output is pretty simple, and could use some more information. Try fiddling with the script to produce additional print statements, e.g. the exact location of the cleavage site.

    After these simple examples you already have 2 powerful tools at your disposal, running from your own computer. As long as you are connected to the internet you can fetch a protein sequence given its accession number and predict whether it has a signal peptide. You can even share the scripts with your co-workers! All this without the hassle of building or compiling any tools other than the required modules. As simple as that.


  3. Adding the two together

    Now that you are familiar with running a single Web Service call, try combining the two. Build a script that fetches a sequence using eFetch and predicts whether it has a signal peptide.

    The full solution can be found here:  eFetch_signalP.pl | eFetch_signalP.py


  4. Are you done? Want some more?

    Try calling and connecting the following Web Services: Genome Atlas getSeq and RNAmmer.
    Read their documentation, inspect them using SoapUI and build a script that retrieves the full genomic sequence from one organism and predicts RNA genes.

    The full solution can be found here:  RNAmmer.pl | RNAmmer.py



CONTACT

Francisco Roque,