Exercise on drug design

Irene Kouskoumvekaki

The purpose of this exercise is to familiarize you with web servers relevant to drug design.

Part I: Similarity and biological information


SMILES (Simplified Molecular Input Line Entry System): a specification for unambiguously describing the structure of chemical molecules using short ASCII strings.

The Tanimoto coefficient (TC) is a simple structural similarity measure based on binary bit strings:

TC = C / (A + B - C)

A: number of bits set in structure A

B: number of bits set in structure B

C: number of bits set in both structure A and structure B

TC ranges from 0 (no bits in common) to 1 (identical bit strings).

Example of binary bit string:

0: absence of a specific fragment

1: presence of a specific fragment
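To make the formula concrete, here is a minimal Python sketch that computes TC for two bit strings. The function name `tanimoto` and the two 10-bit fingerprints are made up for illustration; PubChem's own fingerprints are much longer, and returning 0 for two empty fingerprints is only a convention chosen here.

```python
def tanimoto(bits_a: str, bits_b: str) -> float:
    """Tanimoto coefficient of two equal-length strings of '0' and '1'."""
    if len(bits_a) != len(bits_b):
        raise ValueError("fingerprints must have the same length")
    a = bits_a.count("1")  # A: number of bits set in structure A
    b = bits_b.count("1")  # B: number of bits set in structure B
    # C: number of positions where both structures have the bit set
    c = sum(x == "1" and y == "1" for x, y in zip(bits_a, bits_b))
    if a + b - c == 0:
        return 0.0  # both fingerprints empty (convention assumed here)
    return c / (a + b - c)


# Hypothetical fingerprints, each with 5 bits set and 4 bits in common.
fp_a = "1101000110"
fp_b = "1001001110"
print(round(tanimoto(fp_a, fp_b), 3))  # 4 / (5 + 5 - 4) = 0.667
```

Two identical fingerprints give TC = 1, and two fingerprints with no common bits give TC = 0.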

Here is a list of 5 drug compounds:

  1. Chlorguanide
  2. Sildenafil
  3. Citalopram
  4. Erythromycin
  5. Ketoconazole

Go to PubChem. For each of the compounds in the list above, go through the following steps and fill in the information in the table below.

  • Write the name of the molecule in the "Search PubChem Compound for" field and click on Go.
  • Click on the first 2D structure.
  • Note down the CID, the canonical SMILES and the pharmacological action of the compound.
  • Click on the first icon on the top right (the bi-cycle) and select "Bioactivity, this compound".
  • Note down how many bioassays this molecule has been tested in.
  • Click on "Add Similar Compounds" and note how many molecules are similar to the query.
  • Click on "Structure-Activity". You will see a dendrogram (Bioassay cluster) with the bioassays in which this set of molecules has been tested. Note the bioassays for which your query compound is shown to be active.
  • On the left side there is a second dendrogram (Compound cluster) based on Tanimoto similarity. Locate the CID of your query. Choose the compound with the highest similarity to your query (or one of them, if there is more than one option). You may need to collapse the dendrogram by clicking on the closest blue circle and selecting "Compounds in Structure Clustering".
  • Note its Tanimoto coefficient to the query, its pharmacological action (if available) and the bioassays in which it is active.

Compound | Pharm. action | CID | Tested bioassays | Similar compounds (#) | Active bioassays | Most similar compound (Tanimoto coef., pharm. action, active bioassays)

  • Which of the compounds from the list are toxic at high concentrations (active on the 1195 bioassay)?
  • Click on 1195 in either of the dendrograms to see how many compounds in PubChem have been tested in this bioassay.
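The CID and SMILES lookup from the steps above can also be scripted, which is handy for checking what you copied by hand. The sketch below assumes PubChem's PUG REST interface and its `CanonicalSMILES` property name. The helper names `property_url` and `fetch_properties` are hypothetical, and the key under which the SMILES is returned may differ between PubChem releases, so the whole record is returned rather than a single field.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base address of PubChem's PUG REST service (assumed to be current).
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(name: str, properties=("CanonicalSMILES",)) -> str:
    """Build the request URL for looking up properties by compound name."""
    return (
        f"{PUG_REST}/compound/name/{quote(name)}"
        f"/property/{','.join(properties)}/JSON"
    )


def fetch_properties(name: str, properties=("CanonicalSMILES",)) -> dict:
    """Return the first matching record (includes the CID) as a dict."""
    with urlopen(property_url(name, properties), timeout=30) as response:
        data = json.load(response)
    return data["PropertyTable"]["Properties"][0]


# Usage (needs network access), for the five compounds of this exercise:
# for drug in ["Chlorguanide", "Sildenafil", "Citalopram",
#              "Erythromycin", "Ketoconazole"]:
#     print(drug, fetch_properties(drug))
```

The bioassay counts and the dendrograms still have to be read from the web pages as described in the steps.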

Part II: Prediction of pH-dependent solubility

Now go to pHSol. pHSol is a model based on artificial neural networks (ANN) that predicts the pH-dependent solubility of a given drug-like small molecule.

Copy and paste the SMILES from Part I into the window on the left. Then tick the box "Generate graphics" and click on Submit. Fill in the table below with the model's output.

Compound | Intrinsic logS | Ionised at pH 7? | Ionised at basic or acidic pH? | Ionisable groups
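To help interpret the columns of this table, the sketch below shows how ionisation and intrinsic solubility are related for a compound with a single ionisable group, using the Henderson-Hasselbalch relation. This is a textbook approximation and not the neural network used by pHSol. The function names and the pKa and logS values are hypothetical.

```python
import math


def fraction_ionised(pka: float, ph: float, kind: str) -> float:
    """Fraction of molecules in the ionised form at a given pH."""
    if kind == "acid":  # acids ionise above their pKa
        return 1.0 / (1.0 + 10 ** (pka - ph))
    if kind == "base":  # bases ionise below their pKa
        return 1.0 / (1.0 + 10 ** (ph - pka))
    raise ValueError("kind must be 'acid' or 'base'")


def log_s_at_ph(intrinsic_log_s: float, pka: float, ph: float, kind: str) -> float:
    """Approximate logS at a given pH from the intrinsic (neutral-form) logS."""
    exponent = (ph - pka) if kind == "acid" else (pka - ph)
    if kind not in ("acid", "base"):
        raise ValueError("kind must be 'acid' or 'base'")
    return intrinsic_log_s + math.log10(1.0 + 10 ** exponent)


# Hypothetical base with pKa 9.0 and intrinsic logS of -4.0.
for ph in (2.0, 7.0, 12.0):
    print(ph, round(fraction_ionised(9.0, ph, "base"), 3),
          round(log_s_at_ph(-4.0, 9.0, ph, "base"), 2))
```

In this approximation a base is mostly ionised, and therefore more soluble, at acidic pH, while an acid behaves the opposite way.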


If you want to find more drug-related information about these 5 compounds, you can explore the following web tools:

Prediction of druglikeness.

Prediction of drug-protein interactions.