Output format



Chemical Biology output (Annotation vs Suggestion): Chemical structures can be submitted to the server in SMILES or SDF format, or sketched in the JME applet. A generic or chemical name can also be used to retrieve the associated chemical structure. The structures are then converted to canonical SMILES and SDF format using the OpenBabel program. Solvent and ions, if present, are removed from the structures, and fingerprints are computed for comparison with our collection of chemicals.
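Fingerprint comparison of this kind is commonly scored with the Tanimoto coefficient, which is also the similarity score reported next to each compound in the heatmap. The following is a minimal sketch only: the fingerprint type used by the server is not stated here, so fingerprints are assumed to be sets of "on" bit positions, and the names `tanimoto` and `rank_by_similarity` are hypothetical, not part of ChemProt.

```python
from __future__ import annotations


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient between two fingerprints.

    Each fingerprint is assumed to be a set of 'on' bit positions.
    The result lies between 0 (no shared bits) and 1 (identical bit sets).
    """
    if not fp_a and not fp_b:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def rank_by_similarity(query: set[int],
                       collection: dict[str, set[int]],
                       top_n: int = 5) -> list[tuple[str, float]]:
    """Return the top_n entries of a (hypothetical) collection,
    most similar to the query first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in collection.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    # Toy fingerprints, invented for illustration only.
    library = {"cmpd_A": {1, 2, 3, 4}, "cmpd_B": {2, 3, 9}, "cmpd_C": {7, 8}}
    for name, score in rank_by_similarity({1, 2, 3}, library):
        print(f"{name} ({score:.2f})")
```

Sets of bit positions keep the arithmetic explicit; a production system would more likely use packed bit vectors from a cheminformatics toolkit.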

Once the query chemical has been submitted, the user is redirected to a page where protein, disease and chemical information is presented through several features (see Figure below):

    - 1. At the top, the user can modify the display settings and refine the query. It is possible to select the data sources and filter the results of the query. For example, under "Display Settings", the user can choose how the output is drawn in the heatmap by selecting "Circle", "Fill" or "Rectangles". If several bioactivities were measured with the same activity type (Ki, IC50, ...) for the same target, the user can choose to display the lowest value (Min), the highest (Max), the median (Median) or all of them (All). In the last case, the circle is filled according to the spectrum of activities. The color spectrum in the heatmap runs from red (strong binders) to blue (weak binders). Under "Data Sources", the number of compounds with bioactivities from each database is shown. Finally, in the "Activity values" fields, the user can filter the bioactivity information. A specific activity type can be selected and shown: for example, clicking the button to the left of Ki highlights the protein-compound cells for which Ki information is present. A minimum and maximum value can be specified for each activity type, and the heatmap will then display only the protein-compound interactions with affinity measures between these values.

    - 2. On the left, a heatmap representing the chemical-protein interactions is shown. Compounds on the X axis are annotated to proteins on the Y axis. The query compound or protein is shown in blue, whereas the similar entries found by ChemProt are highlighted in pink. Compounds are identified by a ChemProt identifier, and the number in parentheses is the Tanimoto coefficient with respect to the query chemical (from 0 = no similarity to 1 = high similarity). Pointing the cursor at a compound identifier displays the 2D structure, the SMILES and some physicochemical parameters of the compound. In the pie charts, each color corresponds to a database. For example, the black slice corresponds to data from the STITCH database and the green slice to data from ChEMBL. Pointing the cursor at a pie chart highlights the compound-protein interaction and shows the activity types and values. For proteins, the UniProt ID, Ensembl ID and protein name are shown, and further information about the function of the protein can be obtained by clicking on them.

Note that, for clarity, the heatmap shows information on only 100 proteins per page. If there are more chemical-protein interactions, the user has to click on the arrow at the end of the protein axis to access the next 100 chemical-protein interactions. A similar option is available for chemicals. The user can get the full data related to the query by clicking on "Table view", or download the results by clicking on the "Download results" button.

    - 3. On the right, a heatmap displaying the protein-disease interactions is shown. The diseases have been categorized according to the human disease network described by Goh et al. (Goh K.I. et al., PNAS, 2007, 104:8685-8690), which comprises 22 categories and 1400 sub-categories. When a protein-disease interaction is present in a category, it is marked with a red dot. Pointing the cursor at the dot highlights the sub-category.


    - 4. If a protein is part of a disease complex, a link will be visible next to the "Disease categories" heatmap. Clicking on this link ("Diseases") redirects you to our predictive server for disease complexes (see Figure below).
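The Min / Max / Median / All display option and the minimum-maximum filter on activity values described in item 1 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the server's code: the data layout (a dictionary keyed by compound-protein pairs), the function names, and the choice to apply the range filter after summarising the values are all assumptions.

```python
from statistics import median


def summarise(values, mode="Min"):
    """Reduce several measurements of one activity type for one target.

    mode follows the display options named in the text:
    "Min", "Max", "Median" or "All".
    """
    if mode == "Min":
        return [min(values)]
    if mode == "Max":
        return [max(values)]
    if mode == "Median":
        return [median(values)]
    if mode == "All":
        return sorted(values)
    raise ValueError(f"unknown display mode: {mode!r}")


def filter_cells(cells, activity_type, lower=None, upper=None, mode="Min"):
    """Keep the compound-protein cells that have the chosen activity type
    and at least one displayed value inside [lower, upper].

    cells: {(compound_id, protein_id): {activity_type: [values, ...]}}
    (hypothetical layout used only for this sketch).
    """
    kept = {}
    for pair, activities in cells.items():
        values = activities.get(activity_type)
        if not values:
            continue  # this cell has no data for the chosen activity type
        shown = [v for v in summarise(values, mode)
                 if (lower is None or v >= lower)
                 and (upper is None or v <= upper)]
        if shown:
            kept[pair] = shown
    return kept


if __name__ == "__main__":
    # Invented example values, for illustration only.
    cells = {
        ("C1", "P1"): {"Ki": [10.0, 200.0, 50.0]},
        ("C2", "P1"): {"IC50": [5.0]},
        ("C3", "P2"): {"Ki": [5000.0]},
    }
    print(filter_cells(cells, "Ki", lower=1, upper=100, mode="Median"))
```

Summarising before filtering means a cell whose displayed value falls outside the range disappears entirely; filtering the raw values first would be an equally plausible reading of the interface.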


The disease complex server is dedicated to the analysis of proteins involved in a particular disease. The seed protein of the complex is identified by its HUGO name and Ensembl ID. The "size" number gives the number of proteins directly involved in the protein complex. Our server holds an updated collection of more than 507 000 human protein-protein interactions (Lage K. et al. Nat. Biotechnol. 2007). A protein-protein interaction is defined as a physical interaction between two proteins that has been determined experimentally. Disease-associated protein complexes were then generated and analyzed using 6 different sources of information:

        - OMIM:
OMIM is an open source database focusing on the relationship between genotype and phenotype. In our server, the p-value associated with each disease represents the enrichment of proteins from this disease in the particular complex. Proteins associated with the specific disease are shown in red in the protein complex.

        - GeneCards:
In GeneCards, the relevance scores of disease terms related to protein complexes are based on the co-occurrence of disease terms and the genes of the complex in Medline documents. The more co-occurrences observed for a gene/disease-term pair, the lower the p-value.

        - GO Biological Process and GO Cellular Component:
To ensure that the complexes were biologically relevant entities, the enrichment of Gene Ontology (GO) terms (biological process and cellular component) was compared with that of randomly generated complexes.

        - KEGG pathway and Reactome:
The enrichment of proteins involved in the same pathway was determined using KEGG and Reactome.

Pointing the cursor at a biological term highlights in red the proteins associated with that term.
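To make the notion of an enrichment p-value concrete, the sketch below computes an upper-tail hypergeometric probability: the chance of seeing at least the observed number of annotated proteins in a complex of a given size if its members were drawn at random. The statistical test actually used by the server is not stated in this section, so the hypergeometric model, the function name and the parameter names are assumptions made for illustration.

```python
from math import comb


def enrichment_p_value(population, annotated, complex_size, hits):
    """Upper-tail hypergeometric probability P(X >= hits).

    population    total number of proteins considered (assumed background)
    annotated     proteins in the background carrying the term or disease
    complex_size  number of proteins in the complex
    hits          proteins in the complex carrying the term or disease
    """
    hits = max(hits, 0)
    max_hits = min(annotated, complex_size)
    total = comb(population, complex_size)
    # Sum the probability of every outcome at least as extreme as observed.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, complex_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Invented numbers, for illustration only.
    p = enrichment_p_value(population=20000, annotated=150,
                           complex_size=12, hits=4)
    print(f"p = {p:.3e}")
```

A smaller p-value means the term is over-represented in the complex relative to random draws from the background.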

