Examples of common data types
The data types described below have been designed to be as general
as possible. They are intended to fit the needs of many different
Web Services, simplifying both implementation and usage. They are
defined formally in the schema definition in
http://www.cbs.dtu.dk/ws/common/ws_common_1_0b.xsd,
at the top of the file. They are already in use in some of the CBS
Web Services, e.g.
NetPhos,
NetGlycate,
NetNGlyc and
NetOGlyc.
'sequencedata'
The data type 'sequencedata' corresponds roughly to the
FASTA
format. It is often suitable as input, or as part of the input, to many services,
as well as the output of database search tools.
Synopsis
sequencedata → [sequence ...]
sequence → id [comment] [seq]
See the formal definition
Comments
- A 'sequencedata' object may be empty, e.g. when returned by
a sequence database search that resulted in no hits;
- In the 'sequence' object the 'comment' tag is optional, as in FASTA;
- The 'seq' tag in 'sequence' is also optional, which makes 'sequencedata'
suitable for holding lists of sequence identifiers as well as sequences.
Example
Two protein sequences:
<sequencedata>
<sequence>
<id>RNP_BOVIN</id>
<comment>PDB:3rn3</comment>
<seq>
MALKSLVLLSLLVLVLLLVRVQPSLGKETAAAKFERQHMDSSTSAASSSNYCNQMMKSRN
LTKDRCKPVNTFVHESLADVQAVCSQKNVACKNGQTNCYQSYSTMSITDCRETGSSKYPN
CAYKTTQANKHIIVACEGNPYVPVHFDASV
</seq>
</sequence>
<sequence>
<id>HBA_HUMAN</id>
<seq>
VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGK
KVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPA
VHASLDKFLASVSTVLTSKYR
</seq>
</sequence>
</sequencedata>
The same two sequences in
FASTA:
>RNP_BOVIN (PDB:3rn3)
MALKSLVLLSLLVLVLLLVRVQPSLGKETAAAKFERQHMDSSTSAASSSNYCNQMMKSRN
LTKDRCKPVNTFVHESLADVQAVCSQKNVACKNGQTNCYQSYSTMSITDCRETGSSKYPN
CAYKTTQANKHIIVACEGNPYVPVHFDASV
>HBA_HUMAN
VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGK
KVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPA
VHASLDKFLASVSTVLTSKYR
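The correspondence between the two representations above can be sketched in Python. This is a minimal illustration, not part of the schema or of any CBS client library: the function name `sequencedata_to_fasta` and the `width` parameter are hypothetical, and the sketch assumes the elements are not namespace-qualified (a real service response may well use an XML namespace, in which case the tag lookups would need adjusting).

```python
import xml.etree.ElementTree as ET


def sequencedata_to_fasta(xml_text, width=60):
    """Render a <sequencedata> document as FASTA text (hypothetical helper).

    Assumes un-namespaced element names, as in the example above.
    """
    root = ET.fromstring(xml_text)
    lines = []
    for seq in root.findall("sequence"):
        seq_id = (seq.findtext("id") or "").strip()
        comment = (seq.findtext("comment") or "").strip()
        # 'comment' is optional; when present it follows the id in parentheses
        header = ">" + seq_id
        if comment:
            header += " (" + comment + ")"
        lines.append(header)
        # 'seq' is optional too; strip all whitespace and re-wrap the residues
        residues = "".join((seq.findtext("seq") or "").split())
        for i in range(0, len(residues), width):
            lines.append(residues[i:i + width])
    return "\n".join(lines)
```

An empty 'sequencedata' object yields an empty string, and a 'sequence' without 'seq' yields a header line only, which matches the use of the type for plain lists of identifiers.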
'anndata'
The data type 'anndata' (annotation data) corresponds roughly to the
GFF
format. It is often suitable as output from Web Services that generate
sequence annotation.
Synopsis
anndata → annsource ann
annsource → method version
ann → sequence annrecords
annrecords → [annrecord ...]
annrecord → feature [global|pos|range] [score ...]
[strand] [frame] [comment]
range → begin end
score → [key] value
See the formal definition
Comments
- An 'annrecords' object may be empty, e.g. when no annotation
is included for a given input sequence;
- 'annrecord' corresponds to one line in GFF. It has been designed with
economy of space in mind: the source and the sequence name are not
repeated, and the position in the sequence is stated as a single number
where possible; 'strand', 'frame' and 'comment' are all optional.
By contrast, multiple scores are allowed.
Example
The output of the SignalP
prediction server for the two sequences above:
<anndata>
<annsource>
<method>signalp</method>
<version>3.2</version>
</annsource>
<ann>
<sequence>
<id>RNP_BOVIN</id>
</sequence>
<annrecords>
<annrecord>
<feature>signal</feature>
<range>
<begin>1</begin>
<end>27</end>
</range>
<score>
<key>cutoff</key>
<value>0.43</value>
</score>
<score>
<key>D-score</key>
<value>0.901</value>
</score>
<comment>Y</comment>
</annrecord>
</annrecords>
</ann>
<ann>
<sequence>
<id>HBA_HUMAN</id>
</sequence>
<annrecords>
<annrecord>
<feature>signal</feature>
<range>
<begin>1</begin>
<end>22</end>
</range>
<score>
<key>cutoff</key>
<value>0.43</value>
</score>
<score>
<key>D-score</key>
<value>0.038</value>
</score>
</annrecord>
</annrecords>
</ann>
</anndata>
The same data in
GFF:
##gff-version 2
##source-version signalp-3.2
##date 2007-10-27
##Type Protein
# seqname source feature start end score N/A ?
# ---------------------------------------------------------------------------
RNP_BOVIN signalp-3.2 signal 1 27 0.901 . . Y
HBA_HUMAN signalp-3.2 signal 1 22 0.038 . .
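The mapping from 'anndata' to GFF lines shown above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the conversion used by any CBS service: the helper names `text_of` and `anndata_to_gff_rows` are hypothetical, element names are assumed to be un-namespaced, 'pos' is assumed to hold a single number, and a 'global' record is written with "." for start and end. Since GFF has a single score column while 'annrecord' allows several scores, the sketch picks one score by key (here "D-score", as in the example).

```python
import xml.etree.ElementTree as ET


def text_of(node, path, default=""):
    """Return the stripped text at `path` under `node`, or `default`."""
    value = node.findtext(path)
    return value.strip() if value and value.strip() else default


def anndata_to_gff_rows(xml_text, score_key="D-score"):
    """Return tab-separated GFF-like rows for an <anndata> document."""
    root = ET.fromstring(xml_text)
    # The source is stated once in 'annsource', not repeated per record
    source = text_of(root, "annsource/method") + "-" + text_of(root, "annsource/version")
    rows = []
    for ann in root.findall("ann"):
        seqname = text_of(ann, "sequence/id")
        for rec in ann.findall("annrecords/annrecord"):
            if rec.find("range") is not None:
                start, end = text_of(rec, "range/begin"), text_of(rec, "range/end")
            elif rec.find("pos") is not None:
                start = end = text_of(rec, "pos")  # assumption: one number
            else:
                start = end = "."  # assumption: 'global' or no position given
            score = "."
            for sc in rec.findall("score"):
                if text_of(sc, "key") == score_key:
                    score = text_of(sc, "value", ".")
            fields = [seqname, source, text_of(rec, "feature"), start, end, score,
                      text_of(rec, "strand", "."), text_of(rec, "frame", ".")]
            comment = text_of(rec, "comment")
            if comment:
                fields.append(comment)
            rows.append("\t".join(fields))
    return rows
```

An 'ann' element with empty 'annrecords' simply contributes no rows.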
'EMBRACEimage'
The data type 'EMBRACEimage' represents binary image data whose content
is difficult or unnecessary to type in XSD, typically PNG, JPEG or GIF images.
Data of this type typically represents the endpoints of workflows, e.g. visualization outputs,
3D renderings of protein structures, graphs etc.
Synopsis
EMBRACEimage → [comment] [encoding] [MIMEtype] [content]
See the formal definition
Comments
- 'comment' describes the content of the object in words. Since the binary
content is not strictly typed, this comment is valuable;
- 'encoding' can be either none or base64;
- 'MIMEtype' currently supports image/png, image/bmp, image/gif,
image/tiff and image/jpeg;
- 'content' contains the raw representation of the image, which is likely
always to be base64 encoded.
Example
<image>
<comment>3D structure of RNP_BOVIN</comment>
<encoding>base64</encoding>
<MIMEtype>image/png</MIMEtype>
<!-- This is one continuous string, representing the encoded content -->
<content>RU1CUkFDRSBOZXR3b3JrIG9mIEV4Y2VsbGVuY2UK</content>
</image>
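A client receiving such an object needs to decode 'content' according to 'encoding'. The following Python sketch shows one way to do this; the function name `decode_content` is hypothetical, the element names are assumed to be un-namespaced, and the handling of encoding none (returning the text as UTF-8 bytes) is an assumption, since the description above does not say how unencoded content is carried.

```python
import base64
import xml.etree.ElementTree as ET


def decode_content(xml_text):
    """Return (mime_type, raw_bytes) for an EMBRACEimage-style element."""
    root = ET.fromstring(xml_text)
    encoding = (root.findtext("encoding") or "none").strip()
    mime_type = (root.findtext("MIMEtype") or "").strip()
    content = root.findtext("content") or ""
    if encoding == "base64":
        # Drop any whitespace or line breaks before decoding
        data = base64.b64decode("".join(content.split()))
    elif encoding == "none":
        # Assumption: unencoded content is taken as text and returned as UTF-8
        data = content.encode("utf-8")
    else:
        raise ValueError("unsupported encoding: " + encoding)
    return mime_type, data
```

The returned bytes can then be written to a file in binary mode. Note that the 'content' string in the example above is placeholder text rather than a real PNG image, so it is useful for testing the decoding step only.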
'EMBRACEdocument'
This object is identical to 'EMBRACEimage', except that its MIMEtype
enumeration covers basic document types (application/pdf, application/postscript, application/msword, application/rtf).
Example
<image>
<comment>Report of analysis (RNP_BOVIN)</comment>
<encoding>base64</encoding>
<MIMEtype>application/pdf</MIMEtype>
<!-- This is one continuous string, representing the encoded content -->
<content>RU1CUkFDRSBOZXR3b3JrIG9mIEV4Y2VsbGVuY2UK</content>
</image>
CONTACT
Kristoffer Rapacki