
SIDDbase 1.0a.ws1

Calculation of Stress-induced DNA duplex destabilization


WSDL                SIDDbase/SIDDbase_1_0a_ws1.wsdl
Schema definitions  ../common/ws_common_1_0b.xsd
                    ws_siddbase_1_0a_ws1.xsd

We recommend that first-time users load the WSDL file above into SoapUI and explore the Web Service operations in that environment. SoapUI is a desktop application for inspecting, invoking, and developing Web Services over HTTP, and for functional, load, and compliance testing of them. It can be downloaded free of charge from http://www.soapui.org/.

Other versions and implementations

Ver.      Last updated
1.0a.ws1  2012-04-19 (this version, most recent)
1.0a.ws0  2009-05-04

Examples of client side scripts using the service

Filename (size)          Type  Compatibility         Author
    Description

getSIDD.pl (1.6 KB)      Perl  1.0a ws0              Peter Fischer Hallin
    Extract pre-calculated SIDD values of a region within a prokaryotic genome sequence
test.pl (11.2 KB)        Perl  1.0a                  Peter Fischer Hallin
    Standalone test script - all prerequisites embedded
getseq.pl (973 B)        Perl  Genome Atlas 3.0.ws2  Peter Fischer Hallin
    Download the genome sequence for a GenBank accession no.
xml-compile.pl (3.2 KB)  Perl  NA                    Peter Fischer Hallin
    Helper script used to initialize XML::Compile's proxies (WSDL+XSD)
fasta.inc.pl (877 B)     Perl  NA                    Peter Fischer Hallin
    Helper script to parse an input fasta file
runSIDD.pl (2.2 KB)      Perl  1.0a ws0              Peter Fischer Hallin
    Calculate SIDD of an input sequence (fasta)

Documentation

     SIDDbase-WS is a SOAP based Web Service created in a collaboration between the
    Comparative Microbial Genomics Group at CBS, The Technical University of Denmark and
    Prof. Craig Benham's research group at the UC Davis Genome Center.  It provides
    interoperable access to the SIDD software, and access to the repository of stored
    results from calculations previously performed on complete bacterial genomes.

    SIDD (Stress-induced DNA Duplex Destabilization) is the propensity for the DNA duplex to
    be destabilized within genomic regions that are experiencing superhelical stress. This
    is a complex, interactive attribute of genomic DNA that has been implicated in a wide
    variety of regulatory processes. Different strategies are used to calculate SIDD
    properties of short (i.e. < ~10 kb) regions (Bi C, Benham CJ., WebSIDD: A server for
    predicting stress-induced duplex destabilized (SIDD) sites in superhelical DNA,
    Bioinformatics, 2004 Jun 12;20(9):1477-9), and of long genomic sequences, up to complete
    chromosomes (Benham CJ and Bi C, The Analysis of Stress-Induced Duplex Destabilization
    in Long Genomic DNA Sequences, J. Comp. Biol. 11: 519-543, 2004).  The level of
    superhelical stress is given by sigma, the superhelix density.  For each base pair and
    each sigma value two results are reported: the probability p(x) of base pair x
    being open (i.e. separated) under the given conditions, and the incremental free
    energy G(x) needed to ensure that base pair x is always open.

    This service consists of two parts:

    1. REPOSITORY A repository of pre-calculated SIDD values (for five different sigma
    values: -0.025, -0.035, -0.045, -0.055, -0.065). The repository is synchronized daily
    with the prokaryotic genome sequences in NCBI Entrez Genome Projects
    (http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi), so new or changed genomes are
    updated automatically. The DNA geometry (i.e., whether the chromosome is
    circular or linear) is assumed to be circular unless the GenBank record indicates
    a linear chromosome. This part of the service runs synchronously, which means that
    the user receives the output directly as the response to a request message.

    2. CALCULATION SERVICE A SOAP-compliant interface for making SIDD predictions. This
    part of the service runs asynchronously, and is therefore split into three
    operations: submit, poll, and fetch.

  REPOSITORY
  1. getSIDD
    Get pre-calculated SIDD values for a genome sequence by specifying its GenBank accession number
    Input: 
          * 'accession'    - The GenBank accession number (e.g. AL111168) 
                             of the genome sequence you want to fetch from the
                             repository.
          * 'sigma'        - The sigma value of the prediction (-0.025,-0.035,-0.045,-0.055,-0.065)
          * 'energetics'   - Energetics used in the calculations. Only copolymeric is supported
                             by the repository (c)
          * 'weight'       - The overlap/weighting scheme. Only 10 is supported by the repository (10)
          * 'from'         - (optional) From (and including) this position
          * 'to'           - (optional) To (and including) this position
          * 'Gformat'      - e.g. '%.2f'
          * 'Pformat'      - e.g. '%.2e'
          * 'format'       - Either 'string' or 'element' - determines whether output is
                             provided as XML elements or as comma-separated strings. The latter
                             is much faster.
    Output:
          * 'accession'    - The GenBank accession of the record used.
          * 'digest'       - The MD5 checksum of the entire genomic DNA sequence.
          * 'from'         - From (and including) this position.
          * 'to'           - To (and including) this position.
          * 'total_length' - Total length of genome sequence.
          * 'version'      - SIDD back-end program version. 
          * 'method'       - The energetics method used.
          * 'weight'       - The overlap/weighting scheme.
          * 'sigma'        - The sigma value of the prediction.

             The following two elements constitute a choice (depending on the
             'format' specified in the request)
          * 'element'      
           *  'x'          - Absolute chromosomal position
           *  'nt'         - The nucleotide at position 'x'
           *  'P'          - The helix opening probability.
           *  'G'          - Free energy, G.
          * 'string'      
           *  'P'          - Comma separated list of probabilities
           *  'G'          - Comma separated list of free energies
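  The getSIDD parameters and the 'string' output format above can be sketched in
  client code as follows. This is a minimal, hypothetical illustration in Python,
  not part of the service or of the Perl scripts listed earlier: the function names
  are invented, the request is represented as a plain dictionary rather than an
  actual SOAP message, and the data types (e.g. 'weight' as an integer) are
  assumptions not confirmed by the schema.

```python
# Hypothetical helpers; names and data types are assumptions for illustration.
ALLOWED_SIGMAS = (-0.025, -0.035, -0.045, -0.055, -0.065)


def build_getsidd_request(accession, sigma, start=None, end=None,
                          g_format="%.2f", p_format="%.2e", fmt="string"):
    """Assemble the getSIDD input fields as a plain dict."""
    if sigma not in ALLOWED_SIGMAS:
        raise ValueError("sigma must be one of %r" % (ALLOWED_SIGMAS,))
    if fmt not in ("string", "element"):
        raise ValueError("format must be 'string' or 'element'")
    request = {
        "accession": accession,
        "sigma": sigma,
        "energetics": "c",   # only copolymeric is supported by the repository
        "weight": 10,        # only 10 is supported by the repository
        "Gformat": g_format,
        "Pformat": p_format,
        "format": fmt,
    }
    # 'from' and 'to' are optional and inclusive.
    if start is not None:
        request["from"] = start
    if end is not None:
        request["to"] = end
    return request


def parse_string_result(p_csv, g_csv, first_position=1):
    """Turn the comma-separated 'P' and 'G' strings into (x, P, G) tuples."""
    p_values = [float(v) for v in p_csv.split(",")]
    g_values = [float(v) for v in g_csv.split(",")]
    if len(p_values) != len(g_values):
        raise ValueError("P and G lists differ in length")
    return [(first_position + i, p, g)
            for i, (p, g) in enumerate(zip(p_values, g_values))]
```

  When a region is requested, the 'from' value returned in the output would be the
  natural choice for first_position, so that each tuple carries the absolute
  chromosomal position.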


  CALCULATION
  2a. runService
    Submit a custom DNA sequence to the SIDD program
    Input:
          * 'sequencedata'
            * 'sequence'
              * 'seq'      - The raw DNA sequence
              * 'id'       - Any sequence identifier
          * 'sigma'        - The sigma value of the prediction (e.g. -0.055)
          * 'energetics'   - Either copolymeric (c) or nearest-neighbor energetics (n)
          * 'weight'       - The overlap/weighting scheme. Only 10 is supported by the repository (10)
          * 'Gformat'      - e.g. '%.2f'
          * 'Pformat'      - e.g. '%.2e'
          * 'format'       - Either 'string' or 'element' - determines whether output is
                             provided as XML elements or as comma-separated strings. The latter
                             is much faster.
    Output:
          * 'jobid'        - The 32 byte identification string of the job
          * 'datetime'     - The last timepoint at which the status of the job has changed
          * 'status'       - Possible values are QUEUED, ACTIVE, FINISHED, WAITING, REJECTED, 
                             UNKNOWN JOBID or QUEUE DOWN
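
  Preparing a runService request from a FASTA file could look roughly like the
  sketch below. It is a hypothetical Python illustration only: the function names
  are invented, the request is a plain dictionary rather than a SOAP message, and
  the nesting of 'seq' and 'id' inside 'sequence' inside 'sequencedata' is an
  assumption based on the field list above, not on the schema itself.

```python
# Hypothetical helpers; the request layout is an assumption for illustration.
def read_fasta(text):
    """Parse FASTA text into a list of (id, sequence) tuples."""
    records = []
    seq_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if seq_id is not None:
                records.append((seq_id, "".join(chunks)))
            header = line[1:].split()
            seq_id = header[0] if header else ""
            chunks = []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        records.append((seq_id, "".join(chunks)))
    return records


def build_runservice_request(seq_id, seq, sigma=-0.055, energetics="c",
                             weight=10, fmt="string"):
    """Assemble the runService input fields as a plain dict."""
    if energetics not in ("c", "n"):
        raise ValueError("energetics must be 'c' (copolymeric) or "
                         "'n' (nearest-neighbor)")
    if fmt not in ("string", "element"):
        raise ValueError("format must be 'string' or 'element'")
    return {
        "sequencedata": {"sequence": [{"id": seq_id, "seq": seq}]},
        "sigma": sigma,
        "energetics": energetics,
        "weight": weight,
        "Gformat": "%.2f",
        "Pformat": "%.2e",
        "format": fmt,
    }
```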

  2b. pollQueue
  Once obtained from 'runService', a job identification can be used to poll the
  status to see if the result is ready for download.

    Input:
          * 'jobid'        - The 32 byte identification string of the job
    Output:
          * 'jobid'        - The 32 byte identification string of the job
          * 'datetime'     - The last timepoint at which the status of the job has changed
          * 'status'       - Possible values are QUEUED, ACTIVE, FINISHED, WAITING, REJECTED, 
                             UNKNOWN JOBID or QUEUE DOWN

  2c. fetchResult
  Once the status is 'FINISHED', the results generated by the Web Service can be retrieved by
  specifying the jobid.
  
    Input:  
          * 'jobid'        - The 32 byte identification string of the job
    Output:
          * 'total_length' - Total length of genome sequence.
          * 'version'      - SIDD back-end program version. 
          * 'method'       - The energetics method used.
          * 'weight'       - The overlap/weighting scheme.
          * 'sigma'        - The sigma value of the prediction.

             The following two elements constitute a choice (depending on the
             'format' specified in the request)
          * 'element'      
           *  'x'          - Absolute chromosomal position
           *  'nt'         - The nucleotide at position 'x'
           *  'P'          - The helix opening probability.
           *  'G'          - Free energy, G.
          * 'string'      
           *  'P'          - Comma separated list of probabilities
           *  'G'          - Comma separated list of free energies
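
  The three calculation operations are meant to be chained: submit with runService,
  poll with pollQueue until the status is FINISHED, then call fetchResult. A minimal
  sketch of that loop in Python follows. It is hypothetical: the three operations
  are passed in as ordinary callables (for example, thin wrappers around whatever
  SOAP client is used), the polling interval and limit are arbitrary, and treating
  REJECTED, UNKNOWN JOBID and QUEUE DOWN as fatal is an assumption, not documented
  behaviour.

```python
import time

# Assumption: these statuses mean the job will never finish.
FAILURE_STATUSES = {"REJECTED", "UNKNOWN JOBID", "QUEUE DOWN"}


def run_and_fetch(run_service, poll_queue, fetch_result, request,
                  interval=5.0, max_polls=120, sleep=time.sleep):
    """Submit a job, poll until FINISHED, and return the fetched result.

    run_service(request) and poll_queue(jobid) must return a mapping with
    'jobid' and 'status'; fetch_result(jobid) returns the result.
    """
    job = run_service(request)
    jobid, status = job["jobid"], job["status"]
    polls = 0
    while status != "FINISHED":
        if status in FAILURE_STATUSES:
            raise RuntimeError("job %s failed with status %s" % (jobid, status))
        if polls >= max_polls:
            raise TimeoutError("job %s not finished after %d polls"
                               % (jobid, max_polls))
        sleep(interval)  # wait before asking again (QUEUED, ACTIVE, WAITING)
        status = poll_queue(jobid)["status"]
        polls += 1
    return fetch_result(jobid)
```

  Passing the operations in as callables keeps the loop independent of any
  particular SOAP library and makes it easy to exercise with stand-in functions.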

  For more information, please contact Peter F. Hallin: pfh@cbs.dtu.dk
      
  CONTACT
  Technical questions concerning the Web Service should go to Karunakar Bayyapu, karun@cbs.dtu.dk or
  Kristoffer Rapacki, rapacki@cbs.dtu.dk.