
Examples of common data types


The data types described below have been designed to be as general as possible. They are intended to fit the needs of many different Web Services, simplifying both implementation and usage. They are formally defined at the top of the schema file http://www.cbs.dtu.dk/ws/common/ws_common_1_0b.xsd and are already in use in some of the CBS Web Services, e.g. NetPhos, NetGlycate, NetNGlyc and NetOGlyc.


'sequencedata'

The data type 'sequencedata' corresponds roughly to the FASTA format. It is often suitable as input, or as part of the input, to many services, and as output of database search tools.

Synopsis

sequencedata → [sequence ...]
sequence → id [comment] [seq]

See the formal definition

Comments

  • A 'sequencedata' object may be empty, e.g. when returned by a sequence database search that resulted in no hits.
  • In the 'sequence' object the 'comment' tag is optional, as in FASTA.
  • The 'seq' tag in 'sequence' is also optional, which makes 'sequencedata' suitable for holding lists of sequence identifiers as well as sequences.

Example

Two protein sequences:
<sequencedata>
    <sequence>
        <id>RNP_BOVIN</id>
        <comment>PDB:3rn3</comment>
        <seq>
        MALKSLVLLSLLVLVLLLVRVQPSLGKETAAAKFERQHMDSSTSAASSSNYCNQMMKSRN
        LTKDRCKPVNTFVHESLADVQAVCSQKNVACKNGQTNCYQSYSTMSITDCRETGSSKYPN
        CAYKTTQANKHIIVACEGNPYVPVHFDASV
        </seq>
    </sequence>
    <sequence>
        <id>HBA_HUMAN</id>
        <seq>
        VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGK
        KVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPA
        VHASLDKFLASVSTVLTSKYR
        </seq>
    </sequence>
</sequencedata>
The same two sequences in FASTA:
>RNP_BOVIN      (PDB:3rn3)
MALKSLVLLSLLVLVLLLVRVQPSLGKETAAAKFERQHMDSSTSAASSSNYCNQMMKSRN
LTKDRCKPVNTFVHESLADVQAVCSQKNVACKNGQTNCYQSYSTMSITDCRETGSSKYPN
CAYKTTQANKHIIVACEGNPYVPVHFDASV
>HBA_HUMAN
VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGK
KVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPA
VHASLDKFLASVSTVLTSKYR
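
The correspondence between the two representations can be sketched in Python. This is only an illustration, not part of the CBS services: the function name `sequencedata_to_fasta` and the `width` parameter are hypothetical, and the sketch assumes the elements carry no XML namespace, as in the example above. A real service response may be namespace-qualified and would need the matching prefixes.

```python
import xml.etree.ElementTree as ET


def sequencedata_to_fasta(xml_text, width=60):
    """Render a 'sequencedata' document as FASTA text (illustrative sketch)."""
    root = ET.fromstring(xml_text)
    lines = []
    for seq in root.findall("sequence"):
        # 'id' is the only mandatory child of 'sequence'.
        header = ">" + seq.findtext("id", "").strip()
        # 'comment' is optional, as in FASTA.
        comment = (seq.findtext("comment") or "").strip()
        if comment:
            header += " (" + comment + ")"
        lines.append(header)
        # 'seq' is optional too; whitespace inside it is layout only.
        residues = "".join((seq.findtext("seq") or "").split())
        for i in range(0, len(residues), width):
            lines.append(residues[i:i + width])
    return "\n".join(lines)
```

An empty 'sequencedata' object yields an empty string, and a 'sequence' without 'seq' yields a header line only, which matches the use of the type as a plain list of identifiers.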


'anndata'

The data type 'anndata' (annotation data) corresponds roughly to the GFF format. It is often suitable as output from Web Services that generate sequence annotation.

Synopsis

anndata → annsource ann
annsource → method version
ann → sequence annrecords
annrecords → [annrecord ...]
annrecord → feature [global|pos|range] [score ...] [strand] [frame] [comment]
range → begin end
score → [key] value

See the formal definition

Comments

  • An 'annrecords' object may be empty, e.g. when no annotation is included for a given input sequence.
  • 'annrecord' corresponds to one line in GFF. It has been designed with economy of space in mind: the source and the sequence name are not repeated, and the position in the sequence is stated as a single number where possible; 'strand', 'frame' and 'comment' are all optional. By contrast, multiple scores are allowed.

Example

The output of the SignalP prediction server for the two sequences above:
<anndata>
    <annsource>
        <method>signalp</method>
        <version>3.2</version>
    </annsource>
    <ann>
        <sequence>
            <id>RNP_BOVIN</id>
        </sequence>
        <annrecords>
            <annrecord>
                <feature>signal</feature>
                <range>
                    <begin>1</begin>
                    <end>27</end>
                </range>
                <score>
                    <key>cutoff</key>
                    <value>0.43</value>
                </score>
                <score>
                    <key>D-score</key>
                    <value>0.901</value>
                </score>
                <comment>Y</comment>
            </annrecord>
        </annrecords>
    </ann>
    <ann>
        <sequence>
            <id>HBA_HUMAN</id>
        </sequence>
        <annrecords>
            <annrecord>
                <feature>signal</feature>
                <range>
                    <begin>1</begin>
                    <end>22</end>
                </range>
                <score>
                    <key>cutoff</key>
                    <value>0.43</value>
                </score>
                <score>
                    <key>D-score</key>
                    <value>0.038</value>
                </score>
            </annrecord>
        </annrecords>
    </ann>
</anndata>
The same data in GFF:
##gff-version 2
##source-version signalp-3.2
##date 2007-10-27
##Type Protein 
# seqname            source        feature      start   end   score  N/A   ?
# ---------------------------------------------------------------------------
RNP_BOVIN            signalp-3.2   signal           1    27   0.901  . .   Y
HBA_HUMAN            signalp-3.2   signal           1    22   0.038  . .
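
The mapping from 'anndata' to GFF-like rows can be sketched as follows. This is a hedged illustration only: the function name `anndata_to_gff_rows` and the `score_key` parameter are hypothetical, the elements are assumed to carry no XML namespace, 'pos' is assumed to hold a single position, and the choice of which score goes into the GFF score column (here the one keyed `D-score`, as in the example above) is an assumption, since 'anndata' allows several scores per record.

```python
import xml.etree.ElementTree as ET


def anndata_to_gff_rows(xml_text, score_key="D-score"):
    """Flatten an 'anndata' document into tab-separated GFF-like rows (sketch)."""
    root = ET.fromstring(xml_text)
    # The source is stated once in 'annsource' and repeated on every row.
    method = root.findtext("annsource/method", "").strip()
    version = root.findtext("annsource/version", "").strip()
    source = method + "-" + version
    rows = []
    for ann in root.findall("ann"):
        # The sequence name is stated once per 'ann'.
        seqname = ann.findtext("sequence/id", "").strip()
        for rec in ann.findall("annrecords/annrecord"):
            if rec.find("range") is not None:
                start = rec.findtext("range/begin", "").strip()
                end = rec.findtext("range/end", "").strip()
            elif rec.find("pos") is not None:
                # Assumption: a single position maps to start == end.
                start = end = rec.findtext("pos", "").strip()
            else:
                # 'global' or no position at all: no coordinates to report.
                start = end = "."
            scores = {
                s.findtext("key", "").strip(): s.findtext("value", "").strip()
                for s in rec.findall("score")
            }
            rows.append("\t".join([
                seqname,
                source,
                rec.findtext("feature", "").strip(),
                start,
                end,
                scores.get(score_key, "."),
                rec.findtext("strand", ".").strip(),
                rec.findtext("frame", ".").strip(),
                rec.findtext("comment", "").strip(),
            ]))
    return rows
```

An 'ann' whose 'annrecords' is empty simply contributes no rows. Scores other than the selected one are dropped in this sketch, which is one place where GFF is less expressive than 'anndata'.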


'EMBRACEimage'

The data type 'EMBRACEimage' represents binary image content that is difficult or unnecessary to type in XSD. Typically the format is PNG, JPEG or GIF. Data of this type typically represents the endpoints of workflows, e.g. visualization outputs, 3D renderings of protein structures, graphs etc.

Synopsis

EMBRACEimage → [comment] [encoding] [MIMEtype] [content]

See the formal definition

Comments

  • 'comment' describes the content of the object in words. Since the binary content is not strictly typed, this comment is valuable.
  • 'encoding' can be either none or base64.
  • 'MIMEtype' currently supports image/png, image/bmp, image/gif, image/tiff and image/jpeg.
  • 'content' holds the raw representation of the image, which is likely always to be base64 encoded.

Example



<image>
    <comment>3D structure of RNP_BOVIN</comment>
    <encoding>base64</encoding>
    <MIMEtype>image/png</MIMEtype>
    <!-- This is one continuous string, representing the encoded content -->
    <content>RU1CUkFDRSBOZXR3b3JrIG9mIEV4Y2VsbGVuY2UK</content>
</image>
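
A client receiving such an object has to decode 'content' according to 'encoding' before using it. The following Python sketch shows one way to do that. The function name `decode_embrace_object` is hypothetical, the elements are assumed to carry no XML namespace, and the handling of encoding none (returning the element text as UTF-8 bytes) is an assumption, since this page does not specify how unencoded content is represented.

```python
import base64
import xml.etree.ElementTree as ET


def decode_embrace_object(xml_text):
    """Return (mime_type, data_bytes) for an EMBRACEimage-style object (sketch)."""
    root = ET.fromstring(xml_text)
    mime_type = (root.findtext("MIMEtype") or "").strip()
    encoding = (root.findtext("encoding") or "none").strip()
    content = root.findtext("content") or ""
    if encoding == "base64":
        # Drop line breaks or indentation that may surround the encoded string.
        data = base64.b64decode("".join(content.split()))
    elif encoding == "none":
        # Assumption: unencoded content is taken verbatim as UTF-8 text.
        data = content.encode("utf-8")
    else:
        raise ValueError("unsupported encoding: " + encoding)
    return mime_type, data
```

Applied to the example above, the placeholder 'content' string decodes to the text "EMBRACE Network of Excellence" followed by a newline, so it stands in for image data without being an actual PNG.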


'EMBRACEdocument'

This object is identical to 'EMBRACEimage', except that its MIMEtype enumeration lists basic document types (application/pdf, application/postscript, application/msword, application/rtf).

Example



<image>
    <comment>Report of analysis (RNP_BOVIN)</comment>
    <encoding>base64</encoding>
    <MIMEtype>application/pdf</MIMEtype>
    <!-- This is one continuous string, representing the encoded content -->
    <content>RU1CUkFDRSBOZXR3b3JrIG9mIEV4Y2VsbGVuY2UK</content>
</image>
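
Producing such an object on the service side can be sketched in Python as well. This is an illustration under stated assumptions: the function name `build_document` and its parameters are hypothetical, the root element name defaults to `image` only because the example above uses it, and the set of accepted MIME types is copied from the list given on this page rather than read from the schema.

```python
import base64
import xml.etree.ElementTree as ET

# Document MIME types as listed on this page (not read from the XSD).
DOCUMENT_MIME_TYPES = {
    "application/pdf",
    "application/postscript",
    "application/msword",
    "application/rtf",
}


def build_document(data, mime_type, comment, root_tag="image"):
    """Wrap raw bytes in an EMBRACEdocument-style XML object (sketch)."""
    if mime_type not in DOCUMENT_MIME_TYPES:
        raise ValueError("unsupported document MIME type: " + mime_type)
    root = ET.Element(root_tag)
    # Child order follows the synopsis: comment, encoding, MIMEtype, content.
    ET.SubElement(root, "comment").text = comment
    ET.SubElement(root, "encoding").text = "base64"
    ET.SubElement(root, "MIMEtype").text = mime_type
    # b64encode returns bytes; the XML text node needs a str.
    ET.SubElement(root, "content").text = base64.b64encode(data).decode("ascii")
    return ET.tostring(root, encoding="unicode")
```

Because `base64.b64encode` emits no line breaks, 'content' ends up as one continuous string, as the comment in the example requires.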



CONTACT

Kristoffer Rapacki,