Exercise on data integration
This exercise focuses on the software Cytoscape and its application to systems biology. Cytoscape is a Java-based, open-source application for visualizing molecular interaction networks and for integrating them with other data such as gene expression values. The exercise starts with the most basic features of Cytoscape and gradually moves towards using the tool to draw biological conclusions. Throughout, we will use data on the galactose utilization pathway in yeast.
(Note: In this exercise the term "node" refers to genes and/or proteins, whereas the term "edge" describes an interaction between nodes in a network, regardless of its type. This convention is used in most publications on biological interaction networks and in the Cytoscape documentation, where the terms are used interchangeably.)
Background: The Galactose Utilization Pathway
Figure and text from Ideker et al. Science 2001 (1)
As shown in the Figure above, galactose utilization consists of a biochemical pathway that converts galactose into glucose-6-phosphate and a regulatory mechanism that controls whether the pathway is on or off. This process has been reviewed extensively (2, 3) and involves at least three types of proteins. A transporter gene (GAL2) encodes a permease that transports galactose into the cell; several other hexose transporters (HXTs) may also have this ability (4). A group of enzymatic genes encodes the proteins required for conversion of intracellular galactose, including galactokinase (GAL1), uridylyltransferase (GAL7), epimerase (GAL10), and phosphoglucomutase (GAL5/PGM2). The regulatory genes GAL3, GAL4, and GAL80 exert tight transcriptional control over the transporter, the enzymes, and to a certain extent, each other. GAL4p is a DNA-binding factor that can strongly activate transcription, but in the absence of galactose, GAL80p binds GAL4p and inhibits its activity. When galactose is present in the cell, it causes GAL3p to associate with GAL80p. This association causes GAL80p to release its repression of GAL4p, so that the transporter and enzymes are expressed at a high level.
Although these genes and interactions form the core of the GAL pathway, the complete regulatory mechanism is more complex (5-8) and involves genes whose roles in galactose utilization are not entirely clear (9, 10). For instance, the gene GAL6 (LAP3) functions predominantly in a drug-resistance pathway, but can suppress transcription of the GAL transporter and enzymes under certain conditions and may itself be transcriptionally controlled by GAL4 (11).
In this exercise we will integrate gene expression data from gene deletion studies with a protein-protein interaction network. In the study by Ideker et al. Science 2001, the galactokinase Gal1p and the transcriptional regulators Gal4p and Gal80p were analyzed for their importance in the galactose utilization pathway. In a gene deletion study, a gene is deleted and the expression of each gene in the mutant is compared with its expression in the wild type. This is reported as the log10 expression ratio (mutant/wildtype), so positive values indicate higher expression in the mutant.
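To make the reporting convention concrete, here is a minimal Python sketch of the log10 expression ratio. The intensity values are invented for illustration and are not taken from the Ideker et al. data; the function name is hypothetical.

```python
import math

def log10_ratio(mutant: float, wildtype: float) -> float:
    """Return log10(mutant / wildtype) for two positive expression levels."""
    if mutant <= 0 or wildtype <= 0:
        raise ValueError("expression levels must be positive")
    return math.log10(mutant / wildtype)

# Hypothetical intensities, chosen only to show the sign convention.
up = log10_ratio(1000.0, 100.0)    # 10-fold higher in the mutant: positive value
down = log10_ratio(50.0, 500.0)    # 10-fold lower in the mutant: negative value
same = log10_ratio(200.0, 200.0)   # unchanged: zero
print(up, down, same)
```

A value near zero therefore means the deletion had little effect on that gene, while the sign tells you the direction of the change.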
Part I. Loading network and expression data
STEP 0: Download the files you will need for this exercise.
STEP 1: Start Cytoscape and import the network "galFiltered.sif". If you had trouble with the file extension, your Cytoscape installation's sampleData directory will already have this file.
Your network will contain a combination of protein-protein (pp) and protein-DNA (pd) interactions.
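For readers curious about what a SIF file contains, the following Python sketch parses SIF-style text and counts the interaction types. It assumes whitespace-delimited lines of the form "source interaction target [target ...]" with no spaces inside names; the node names in the sample are placeholders, not entries from "galFiltered.sif".

```python
from collections import Counter

def parse_sif(text):
    """Parse SIF-style text into (source, interaction, target) tuples.

    A line with fewer than three fields (for example an isolated node)
    produces no edge.
    """
    edges = []
    for line in text.splitlines():
        fields = line.split()  # assumption: whitespace-delimited, no spaces in names
        if len(fields) < 3:
            continue
        source, interaction = fields[0], fields[1]
        for target in fields[2:]:  # one line may list several targets
            edges.append((source, interaction, target))
    return edges

# Placeholder content; real files use systematic ORF names.
sample = """geneA pp geneB
geneA pd geneC geneD
geneE
"""
edges = parse_sif(sample)
counts = Counter(kind for _, kind, _ in edges)
print(counts)
```

Running the same function on the text of your downloaded file would show how many pp and pd edges the network holds.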
STEP 2: Import expression data table: File -> Import -> Attribute from Table (Text/MS Excel)… and select the "galExpData.txt" file you downloaded in STEP 0.
In the "Import Attribute from Table" dialog, select "Show Text File Import Options" and then select "Transfer first line as attribute names".
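Conceptually, this import matches each row of the table to a node through a key column and attaches the remaining columns as node attributes, with the first line supplying the attribute names. The Python sketch below illustrates that idea only; it is not how Cytoscape implements the import. The key column "GENE", the column "exp_gal80", and all values are hypothetical.

```python
import csv
import io

def read_attribute_table(handle, key_column):
    """Read a tab-delimited table whose first line holds the column names.

    Returns {key: {column: value}}. Numeric cells are converted to float;
    all other cells are kept as strings.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    table = {}
    for row in reader:
        key = row.pop(key_column)
        attrs = {}
        for name, value in row.items():
            try:
                attrs[name] = float(value)
            except ValueError:
                attrs[name] = value
        table[key] = attrs
    return table

# Hypothetical file content; names and numbers are invented.
sample = "GENE\texp_gal80\tpval_gal80\ngeneA\t0.52\t0.001\ngeneB\t-0.10\t0.40\n"
attributes = read_attribute_table(io.StringIO(sample), key_column="GENE")

# Nodes without a matching row simply receive no attributes.
network_nodes = ["geneA", "geneB", "geneC"]
unmatched = [n for n in network_nodes if n not in attributes]
print(attributes["geneA"], unmatched)
```

Checking for unmatched nodes is a useful habit: a mismatch between the identifiers in the network and in the table is a common reason for an import that appears to do nothing.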
Part II. Coloring nodes
It is common to use expression data in Cytoscape to set the visual attributes of the nodes in a network. Such a visualization can portray functional relationships and experimental response at the same time. The steps for doing this are as follows:
STEP 4: To set visual properties: select the "VizMapper" in the Control Panel.
STEP 5: In the VizMapper manager window, click the button to create a new visual style (see figure) and give it a name such as "GalStyle". This duplicates the default style.
STEP 6: Set the "Node Color" attribute as follows:
Q2: Use this visualization to identify the gene that is the most up-regulated in the "gal80" knock-out experiment.
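As a rough illustration of how expression values can drive node color, the Python sketch below maps a log10 ratio onto a color gradient and also picks out the largest value, which is a way to cross-check a visual answer. The blue-white-red scheme, the range of -1 to 1, and the gene names and values are all assumptions made for this sketch; they are not the settings prescribed by the exercise.

```python
def lerp_color(low_rgb, high_rgb, t):
    """Linearly interpolate between two RGB tuples for 0 <= t <= 1."""
    return tuple(round(a + (b - a) * t) for a, b in zip(low_rgb, high_rgb))

def expression_to_color(value, vmin=-1.0, vmax=1.0):
    """Map a log10 ratio to a blue-white-red gradient (assumed color scheme).

    Requires vmin < 0 < vmax; values outside the range are clamped.
    """
    blue, white, red = (0, 0, 255), (255, 255, 255), (255, 0, 0)
    value = max(vmin, min(vmax, value))
    if value < 0:
        return lerp_color(blue, white, (value - vmin) / (0 - vmin))
    return lerp_color(white, red, value / vmax)

# Hypothetical log10 ratios for three placeholder genes.
ratios = {"geneA": 0.8, "geneB": -0.5, "geneC": 0.0}
colors = {gene: expression_to_color(v) for gene, v in ratios.items()}

# The most up-regulated gene is the one with the largest ratio.
most_up = max(ratios, key=ratios.get)
print(colors, most_up)
```

With a gradient of this kind, unchanged genes appear white, and the most strongly colored node at the positive end corresponds to the maximum found by `max`.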
Part III. Using p-values
Now we will use the p-values when setting visual properties, mapping them to node size.
STEP 7: The p-value measures how likely it is that an expression change at least this large would arise by chance alone. Hence, p-values (e.g. "pval_gal80") range from 0 to 1, and the log10(p-values) (e.g. "logp_gal80") range from -infinity to 0; the more negative the value, the more significant the change. Select some nodes and look at their relative expression and p-values in the Data Panel. You can sort the list (ascending or descending) by clicking on the column headings.
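The relationship between a p-value and its log10 can be checked with a few lines of Python. This is a minimal sketch with a hypothetical helper name; the p-values shown are arbitrary examples, not values from the data set.

```python
import math

def log10_p(p):
    """Return log10 of a p-value; p = 0 maps to -infinity."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    return float("-inf") if p == 0.0 else math.log10(p)

# Smaller p-values give more negative log10 values.
for p in (1.0, 0.05, 0.001):
    print(p, log10_p(p))
```

The log scale spreads out the very small p-values, which would otherwise be crowded together near zero and be hard to tell apart in a visual mapping.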
STEP 8: Now, we will explore setting node size according to log10(p-values). Bigger nodes will then reflect more significant changes in expression for the attribute you select.
Note: Do not click on the "Add" button in this step. Clicking "Add" repeatedly, which is easy to do by accident, will cause problems.
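Because log10(p-values) are zero or negative, a mapping in which bigger nodes mean more significant changes has to run "backwards": the most negative values get the largest size. The Python sketch below shows one such linear mapping. The lower bound of -5 and the sizes 20 and 80 are hypothetical choices for illustration, not values prescribed by the exercise.

```python
def logp_to_size(logp, logp_min=-5.0, min_size=20.0, max_size=80.0):
    """Map log10(p) in [logp_min, 0] linearly to a node size.

    log10(p) = 0 (p = 1) gives min_size; logp_min or below gives max_size.
    """
    clamped = max(logp_min, min(0.0, logp))
    fraction = clamped / logp_min  # 0 at p = 1, 1 at the most significant end
    return min_size + fraction * (max_size - min_size)

for logp in (0.0, -2.5, -5.0, float("-inf")):
    print(logp, logp_to_size(logp))
```

Clamping keeps extreme values, including -infinity for p = 0, from producing nodes of unbounded size.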
Q3: What color are the smallest nodes in your network?
Part IV. Biological analysis scenario
This section presents one scenario on how expression data can be combined with network data to tell a biological story. But first we need to load more relevant gene names.
STEP 9: Load the "ORF2name.na" node attribute file to get common gene names associated with our systematic ORF names used to define the network.
STEP 10: In the VizMapper, find the Node Label attribute and set it to "GeneName" (this is the attribute you just loaded) and select passthrough mapping.
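A passthrough mapping copies an attribute value directly onto a visual property, here the node label. The Python sketch below illustrates the idea. It assumes the node attribute file has a first line naming the attribute, followed by lines of the form "identifier = value"; the identifiers and names in the sample are placeholders rather than entries from "ORF2name.na". Falling back to the identifier when no name is available is a choice made for this sketch, not documented Cytoscape behavior.

```python
def parse_node_attributes(text):
    """Parse attribute text: a header line, then 'identifier = value' lines."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    attribute_name = lines[0].split()[0]
    values = {}
    for line in lines[1:]:
        identifier, _, value = line.partition("=")
        values[identifier.strip()] = value.strip()
    return attribute_name, values

def passthrough_label(node_id, values):
    """Use the attribute value as the label; fall back to the node ID."""
    return values.get(node_id, node_id)

# Placeholder content in the assumed layout.
sample = "GeneName\nORF001 = GENE1\nORF002 = GENE2\n"
attribute_name, gene_names = parse_node_attributes(sample)
labels = [passthrough_label(n, gene_names) for n in ("ORF001", "ORF999")]
print(attribute_name, labels)
```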
STEP 11: Now select the neighborhood of GAL4 and create a new sub-network.
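Selecting a neighborhood and creating a sub-network amounts to collecting a node's first neighbors and keeping the edges among the selected nodes. The Python sketch below shows this on a small edge list; the node names and edges are invented, and "hub" stands in for the gene of interest.

```python
def neighborhood_subnetwork(edges, center):
    """Return (nodes, edges) for the center node plus its first neighbors.

    Edges are (source, interaction, target) tuples. The returned edge list
    keeps every edge whose two endpoints are both in the selected node set.
    """
    nodes = {center}
    for source, _, target in edges:
        if source == center:
            nodes.add(target)
        elif target == center:
            nodes.add(source)
    sub_edges = [e for e in edges if e[0] in nodes and e[2] in nodes]
    return nodes, sub_edges

# Invented example network.
edges = [
    ("hub", "pd", "geneA"),
    ("hub", "pd", "geneB"),
    ("geneC", "pp", "hub"),
    ("geneA", "pp", "geneB"),
    ("geneB", "pp", "geneD"),
]
nodes, sub_edges = neighborhood_subnetwork(edges, "hub")
print(sorted(nodes), len(sub_edges))
```

Note that the edge between geneA and geneB is kept even though it does not touch the hub, while geneD, which is two steps away, is left out.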
Q4: If GAL80 levels are low (or absent) but most of the other genes linked to GAL4 show significant levels of induction, what does this say about the role of Gal80p? Is Gal80p activating or inhibiting the activity of Gal4p?