Chemical Biology output (Annotation vs Suggestion): Chemical structures can be submitted to
the server in SMILES or SDF format, or sketched in
the JME applet. A generic name or chemical name can also be used to
retrieve the associated chemical structure.
The structures are then converted to canonical SMILES and SDF format using the OpenBabel
program. Solvents and ions, if present, are removed from the structures, and fingerprints are computed for comparison with our
collection of chemicals.
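The fingerprint comparison can be illustrated with the Tanimoto coefficient, the similarity score that appears in parentheses next to each compound in the chemical-protein heatmap. The following is a minimal sketch, assuming a fingerprint is stored as the set of bit positions switched on. The fingerprint type ChemProt actually uses is not specified here, and the compound identifiers and bit sets are hypothetical.

```python
"""Sketch: Tanimoto similarity between binary fingerprints.

Assumption: a fingerprint is a set of the bit positions set to 1.
The fingerprint type used by the server is not specified here.
"""

from typing import Dict, List, Set, Tuple


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Return |A & B| / |A | B|: 0.0 = no shared bits, 1.0 = identical."""
    union = len(fp_a | fp_b)
    if union == 0:
        # Two empty fingerprints: treat as no evidence of similarity.
        return 0.0
    return len(fp_a & fp_b) / union


def rank_by_similarity(
    query: Set[int], collection: Dict[str, Set[int]], threshold: float = 0.0
) -> List[Tuple[str, float]]:
    """Score every compound against the query, keep scores >= threshold,
    and return them sorted from most to least similar."""
    scored = [(cid, tanimoto(query, fp)) for cid, fp in collection.items()]
    hits = [(cid, score) for cid, score in scored if score >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical identifiers and bit sets, for illustration only.
    query_fp = {1, 4, 7, 9, 12}
    library = {
        "CHEM_A": {1, 4, 7, 9, 12},
        "CHEM_B": {1, 4, 7, 20},
        "CHEM_C": {30, 31},
    }
    for cid, score in rank_by_similarity(query_fp, library, threshold=0.3):
        print(f"{cid} ({score:.2f})")  # same "ID (score)" layout as the heatmap
```

The similarity threshold is a hypothetical parameter. It shows how weakly related compounds could be dropped before display.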
Once the query chemical has been submitted, the user is redirected to a page where protein, disease and chemical
information is presented through several features (see Figure below):
- 1. At the top, the user can modify the display settings and refine the query. It is
possible to select the data sources and filter the query results. For example, under
"Display Settings", the user can choose how the heatmap cells are drawn:
"Circle", "Fill" or "Rectangles". If several bioactivities were measured with the same activity
type (Ki, IC50, ...) for the same target, the user can choose to display the lowest value (Min), the
highest value (Max), the median value (Median) or all of them (All). In the last case, the circle is
filled according to the spectrum of activities. The heatmap color scale runs from red (strong
binders) to blue (weak binders). Under "Data Sources", the number of compounds with bioactivities
from each database is shown. Finally, in the "Activity values" fields, the user can
filter the bioactivity information and display a specific activity type only.
For example, clicking the button to the left of Ki highlights the protein-compound
cells for which Ki values are available. You can also specify a minimum and maximum value for each activity type,
and the heatmap will then display only the protein-compound interactions whose affinity measures fall between these values.
- 2. On the left, a heatmap representing the chemical-protein interactions is shown.
Compounds on the X axis are annotated to proteins on the Y axis. The query compound or protein is
shown in blue, whereas the similar entries found by ChemProt are highlighted in pink. Each compound is labelled with a
ChemProt identifier, and the number in parentheses is its Tanimoto coefficient with respect to the query chemical
(from 0 = no similarity to 1 = highest similarity). Pointing the cursor at a compound identifier displays the
2D structure, SMILES and some physicochemical parameters of the compound. In the pie charts, each color corresponds to a
database. For example, black slices correspond to data from the STITCH database and green slices to
data from ChEMBL. Pointing the cursor at a pie chart highlights the compound-protein interaction
and shows the activity types and values. For proteins, the UniProt ID,
Ensembl ID and protein name are shown, and further information about the function of the protein
can be obtained by clicking on them.
Note that, for clarity, the heatmap shows information on only
100 proteins per page. If there are more chemical-protein interactions, the user
has to click the arrow at the end of the protein axis to access the
next 100 proteins. The chemical axis is paged in the same way.
The user can get the full data related to the query by clicking "Table view",
or download the results by clicking the "Download results" button.
- 3. On the right, a heatmap displaying the protein-disease associations is shown. The
diseases have been categorized according to the human disease network described by Goh et al. (Goh
K.I. et al., PNAS, 2007, 104:8685-8690). This classification comprises 22 categories and 1400 sub-categories.
When a protein is associated with a disease in a given category, the cell is marked with a red dot.
Pointing the cursor at the dot highlights the sub-category.
- 4. If a protein is part of a disease complex, a link is displayed next to the "Disease categories" heatmap.
Clicking this link ("Diseases") redirects you to our disease complexes prediction server (see Figure below).
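The "Min", "Max", "Median" and "All" display options and the "Activity values" range filter described in point 1 come down to three steps. Measurements are grouped by compound, target and activity type. Each group is then summarized, and only values inside the user's range are kept. The sketch below makes several assumptions. The record layout and the nM units are hypothetical. Applying the range filter to individual measurements before summarizing is one possible design choice, not necessarily the server's behavior.

```python
"""Sketch: summarizing repeated bioactivity measurements per heatmap cell.

Assumptions (hypothetical, not the server's actual schema): each measurement
is a (compound_id, target_id, activity_type, value_nM) tuple, and the range
filter is applied to individual measurements before summarizing.
"""

from collections import defaultdict
from statistics import median
from typing import Dict, List, Optional, Tuple

Measurement = Tuple[str, str, str, float]
CellKey = Tuple[str, str, str]  # (compound, target, activity type)


def summarise(
    measurements: List[Measurement],
    mode: str = "Min",
    value_range: Optional[Tuple[float, float]] = None,
) -> Dict[CellKey, object]:
    """Group values by cell, optionally keep only those within
    [low, high], then reduce each group according to `mode`."""
    reducers = {
        "Min": min,        # lowest value; for Ki or IC50, the strongest binding
        "Max": max,        # highest value
        "Median": median,  # middle value of the group
        "All": sorted,     # keep every value, e.g. for a multi-color fill
    }
    if mode not in reducers:
        raise ValueError(f"unknown mode: {mode!r}")

    groups: Dict[CellKey, List[float]] = defaultdict(list)
    for compound, target, act_type, value in measurements:
        if value_range is not None:
            low, high = value_range
            if not (low <= value <= high):
                continue  # outside the user's range: not displayed
        groups[(compound, target, act_type)].append(value)

    reduce_fn = reducers[mode]
    return {key: reduce_fn(values) for key, values in groups.items()}


if __name__ == "__main__":
    # Invented example values, for illustration only.
    data = [
        ("CHEM_A", "P1", "Ki", 5.0),
        ("CHEM_A", "P1", "Ki", 50.0),
        ("CHEM_A", "P1", "Ki", 20.0),
        ("CHEM_A", "P1", "IC50", 900.0),
    ]
    print(summarise(data, mode="Median"))
    print(summarise(data, mode="All", value_range=(1.0, 100.0)))
```

In the second call, the IC50 cell disappears entirely because its only measurement lies outside the requested range. This mirrors how filtered interactions drop out of the heatmap.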
The complex disease server is dedicated to the analysis of proteins involved
in a particular disease. The seed protein of the complex is identified by its HUGO name and
Ensembl ID. The "size" value gives the number of proteins directly involved in
the protein complex. Our server includes an updated collection of more than 507,000 human protein-protein interactions (Lage K. et al. Nat. Biotechnol. 2007).
Two proteins are considered to interact when a physical interaction
between them has been determined experimentally. Disease-associated protein
complexes were then generated and analyzed using 6 different sources of information:
  - OMIM:
OMIM (Online Mendelian Inheritance in Man) is a public database focusing on the relationship between genotype and phenotype. In
our server, the p-value associated with each disease represents the enrichment of proteins linked to that
disease in the particular complex. Proteins associated with a given disease are shown in red in the
protein complex.
  - GeneCards:
For GeneCards, the relevance of a disease term to a protein complex is scored from the
co-occurrences of the disease term and the genes of the complex in Medline documents. The more
co-occurrences observed for a gene/disease-term pair, the lower the p-value.
  - GO Biological Process and GO Cellular Component:
To ensure that the complexes were biologically relevant entities, the enrichment of Gene Ontology
(GO) terms (biological process and cellular component) was compared with that of randomly generated
complexes.
  - KEGG pathway and Reactome:
The enrichment of proteins involved in the same pathway was determined using KEGG and Reactome.
Pointing the cursor at a biological term highlights in red the proteins associated with that term.
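The comparison with randomly generated complexes can be sketched as an empirical (permutation) p-value. First, count how many complex members carry a term. Then measure how often random protein sets of the same size contain at least as many annotated proteins. This minimal sketch assumes random complexes are drawn uniformly from a protein universe. The original analysis may use a different randomization, for example one that preserves network degree. All protein names and annotations are hypothetical.

```python
"""Sketch: empirical enrichment p-value of an annotation term in a complex.

Assumption (hypothetical): random complexes are uniform samples of the same
size from the protein universe; the server's randomization may differ.
"""

import random
from typing import Iterable, Set


def count_annotated(proteins: Iterable[str], term_members: Set[str]) -> int:
    """Number of proteins in the set that are annotated with the term."""
    return sum(1 for p in proteins if p in term_members)


def empirical_p_value(
    complex_members: Set[str],
    term_members: Set[str],
    universe: Set[str],
    n_random: int = 1000,
    seed: int = 0,
) -> float:
    """Fraction of random complexes with at least as many annotated
    proteins as the observed complex, with a +1 pseudocount."""
    rng = random.Random(seed)
    observed = count_annotated(complex_members, term_members)
    pool = sorted(universe)  # sorted so sampling is reproducible for a seed
    size = len(complex_members)
    at_least_as_extreme = 0
    for _ in range(n_random):
        random_complex = rng.sample(pool, size)
        if count_annotated(random_complex, term_members) >= observed:
            at_least_as_extreme += 1
    # The +1 terms avoid reporting p = 0 from a finite number of draws.
    return (at_least_as_extreme + 1) / (n_random + 1)


if __name__ == "__main__":
    universe = {f"P{i}" for i in range(100)}
    term = {"P0", "P1", "P2", "P3", "P4"}  # proteins carrying one GO term
    complex_members = {"P0", "P1", "P2", "P3", "P50"}
    print(empirical_p_value(complex_members, term, universe))
```

When most members of a complex carry the term, almost no random complex matches it. In that case, the p-value approaches its floor of 1/(n_random + 1). Under this uniform-sampling model, a hypergeometric test gives the equivalent analytical answer without simulation.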
GETTING HELP
Scientific problems:
Technical problems: