Chemical Biology output (Annotation vs Suggestion): Chemical structures can be submitted to
the server in SMILES or SDF format, or sketched in
the JME applet. A generic name or chemical name can also be used
to retrieve the chemical structure associated with the
name. The structures are then converted to 2D SDF format with the Molconvert
program from ChemAxon. Molecules are stripped of their
solvent, if present, and fingerprints are computed for comparison with our
The results are presented in a table where biological and chemical
information is described through the following features:
- The table first shows chemical annotations
("annotated"), if present in our databases. These are followed by chemical
suggestions ("suggested"), which are expected to share similar biological activity based on their chemical similarity. The bioactivity is
categorized by species.
- Hovering the cursor over the compound ID
displays the 2D structure of the molecule, along with the molecule name and some
physicochemical properties such as molecular weight and logP.
- For each compound in the table, the protein target name, the
biological activity measurement and its value, and the database the information comes
from are shown. The Tanimoto coefficient (Tc) is also added for predicted
- In the "Type" column, the biological activity
measurement is given when known (e.g. Ki, IC50, %inhib). For STITCH, no biological
activity values are associated with a protein; instead, a score between "0" and
"1000" is given, where "1000" indicates high confidence in a chemical-protein interaction. For PubChem assays, only
confirmed assays were considered, and compounds defined as "active" in these assays are gathered.
In the "value" column, the activity values correspond directly to the type of assay, except for the Wombat,
Wombat-PK and Ki databases, where activity values have been normalized to -log (g/mol).
- The target name is also linked to the "uniprot"
and "ensembl ID" entries, where you can get more information about the protein.
- If the protein is part of a disease complex,
clicking on "Disease" redirects you to our disease complexes predictive
server. The protein can be involved in several complexes, in which case the user has
to select a disease complex by clicking on one of the complexes.
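The similarity-based suggestion step described in the list above can be sketched as follows. This is only an illustration, not the server's implementation: it assumes fingerprints are represented as sets of "on" bit indices, and the function names `tanimoto` and `suggest`, as well as the `threshold` value, are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient (Tc) between two fingerprints.

    Fingerprints are assumed to be iterables of "on" bit indices.
    Tc = |A & B| / |A | B|, ranging from 0 (no shared bits) to 1 (identical).
    """
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def suggest(query_fp, annotated, threshold=0.85):
    """Return (compound_id, Tc) pairs whose Tc to the query is >= threshold.

    `annotated` maps compound IDs to fingerprints. The threshold is a
    hypothetical cutoff; the source does not state the value used.
    """
    hits = [(cid, tanimoto(query_fp, fp)) for cid, fp in annotated.items()]
    hits = [(cid, tc) for cid, tc in hits if tc >= threshold]
    # Most similar compounds first.
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

Compounds returned by such a search would correspond to the "suggested" rows, each reported with its Tc to the query.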
EXAMPLE OUTPUT FOR ASPIRIN
output: The complex disease server is dedicated to the analysis of proteins involved
in a particular disease. The seed protein of the complex is identified by its HUGO name and
Ensembl ID. The "size" number gives the number of proteins directly involved in
the protein complex. "BioAlma Terms" refers to the number of disease terms
associated with the complex. Proteins directly involved in the biological network are
shown in the figure. Our server holds an updated collection of more than 428
000 human protein-protein interactions (Lage K. et al. Nat. Biotechnol. 2007). A protein-protein interaction is defined when a physical interaction
between the two proteins has been determined experimentally. Then, 2227 disease-associated protein
complexes have been generated and analyzed through 5 different sources of information:
OMIM is an open source database focusing on the relationship between genotype and phenotype. In
our server, the p-value associated with each disease represents the enrichment of proteins from this
disease in the particular complex. Proteins associated with the specific diseases are in yellow in the
In BioAlma, the relevance scores of disease terms related to complexes are based on the
co-occurrences of disease terms and the genes in the complex in Medline documents. The more
co-occurrences are observed for a gene/disease-term pair, the higher the "weights" value.
- GO Biological Process and GO Cellular Component:
To ensure that the complexes were biologically relevant entities, the enrichment of gene ontology
(GO) terms (biological process and cellular component) was compared to randomly generated
The enrichment of proteins co-occurring in the same tissues was determined using high-quality,
manually curated immunohistochemistry data downloaded from the Human Protein Atlas. The enrichment was
compared to randomly generated complexes.
- mRNA expression:
Complexes were mapped to tissues using the expression data from 73 non-diseased tissues from the
Novartis Research Foundation Gene Expression Database (GNF atlas). The higher the z-score, the
more strongly the tissue is expected to be affected by the protein complex.
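The comparison against randomly generated complexes mentioned above can be sketched as follows. The source does not specify the exact statistic, so this is a minimal illustration under stated assumptions: random complexes are drawn uniformly from a background protein list with the same size as the real complex, and the names `random_complex_enrichment`, `background`, and `n_random` are hypothetical.

```python
import random
from statistics import mean, pstdev


def count_annotated(complex_proteins, annotated_proteins):
    """Number of complex members carrying the annotation of interest."""
    return len(set(complex_proteins) & set(annotated_proteins))


def random_complex_enrichment(complex_proteins, annotated_proteins,
                              background, n_random=1000, seed=0):
    """Compare a complex to size-matched random complexes (assumed null model).

    Returns (observed count, z-score, empirical p-value).
    """
    rng = random.Random(seed)
    background = sorted(set(background))
    size = len(set(complex_proteins))
    observed = count_annotated(complex_proteins, annotated_proteins)

    # Null distribution: annotation counts in randomly drawn complexes.
    null = [count_annotated(rng.sample(background, size), annotated_proteins)
            for _ in range(n_random)]

    mu, sigma = mean(null), pstdev(null)
    z = (observed - mu) / sigma if sigma > 0 else float("nan")
    # Empirical p-value with a +1 correction so it is never exactly zero.
    p = (1 + sum(c >= observed for c in null)) / (n_random + 1)
    return observed, z, p
```

Here `annotated_proteins` could stand for proteins sharing a GO term or detected in a given tissue; a high z-score or a low p-value would indicate that the complex carries the annotation more often than expected by chance.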
EXAMPLE OUTPUT FOR ASPIRIN