Exercise on data integration
This exercise focuses on the software Cytoscape and its application to systems biology. Cytoscape is a Java-based, open-source application for visualizing molecular interaction networks and for integrating them with other data, such as gene expression values. The exercise starts with the most basic features of Cytoscape and gradually moves towards using the tool to draw biological conclusions. Throughout, we will use data on the galactose utilization pathway in yeast.
(Note on vocabulary: In this exercise the term "node" refers to genes and/or proteins, whereas the term "edge" describes an interaction between nodes in a network, regardless of its type. This matches the usage in many publications on interaction networks and in the Cytoscape documentation, where genes and proteins are referred to interchangeably as nodes.)
Background: The Galactose Utilization Pathway
Figure and text from Ideker et al. (1)
As shown in the Figure above, galactose utilization consists of a biochemical pathway that converts galactose into glucose-6-phosphate and a regulatory mechanism that controls whether the pathway is on or off. This process has been reviewed extensively (2, 3) and involves at least three types of proteins. A transporter gene (GAL2) encodes a permease that transports galactose into the cell; several other hexose transporters (HXTs) may also have this ability (4). A group of enzymatic genes encodes the proteins required for conversion of intracellular galactose, including galactokinase (GAL1), uridylyltransferase (GAL7), epimerase (GAL10), and phosphoglucomutase (GAL5/PGM2). The regulatory genes GAL3, GAL4, and GAL80 exert tight transcriptional control over the transporter, the enzymes, and to a certain extent, each other. GAL4p is a DNA-binding factor that can strongly activate transcription, but in the absence of galactose, GAL80p binds GAL4p and inhibits its activity. When galactose is present in the cell, it causes GAL3p to associate with GAL80p. This association causes GAL80p to release its repression of GAL4p, so that the transporter and enzymes are expressed at a high level.
Although these genes and interactions form the core of the GAL pathway, the complete regulatory mechanism is more complex (5-8) and involves genes whose roles in galactose utilization are not entirely clear (9, 10). For instance, the gene GAL6 (LAP3) functions predominantly in a drug-resistance pathway, but can suppress transcription of the GAL transporter and enzymes under certain conditions and may itself be transcriptionally controlled by GAL4 (11).
In this exercise we will integrate gene expression data from gene deletion studies with a protein-protein interaction network. In the study by Ideker et al. (Science, 2001), the yeast transcription factors Gal1p, Gal4p, and Gal80p were analyzed for their importance in the galactose utilization pathway. In a gene deletion study, a gene is deleted and the expression of every other gene in the mutant is compared with its expression in the wild type. The result is reported as a log10 expression ratio (mutant/wild type).
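As a small worked example of this measure, the sketch below computes a log10 expression ratio from two expression levels. The function name and the numbers are invented for illustration and are not taken from the study.

```python
import math


def log10_ratio(mutant: float, wildtype: float) -> float:
    """Return log10(mutant / wildtype) for two positive expression levels."""
    if mutant <= 0 or wildtype <= 0:
        raise ValueError("expression levels must be positive")
    return math.log10(mutant / wildtype)


# Hypothetical expression levels, chosen only to show the sign convention.
print(log10_ratio(1000.0, 100.0))  # higher in the mutant: positive ratio
print(log10_ratio(100.0, 100.0))   # unchanged: ratio of zero
print(log10_ratio(10.0, 100.0))    # lower in the mutant: negative ratio
```

A positive value therefore means the gene is up-regulated in the deletion strain, and a negative value means it is down-regulated.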
Part I. Loading network and expression data
STEP 1: Start Cytoscape and import the network “galFiltered.sif” from your Cytoscape installation’s sampleData directory (it should be).
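For readers curious about what a SIF file holds, it is plain text with one interaction per line: a source node, an interaction type, and one or more target nodes. The sketch below parses a few such lines and counts edges by type. The sample rows are invented from the pathway description above (the real file identifies nodes by systematic ORF names), and the helper name `parse_sif` is hypothetical.

```python
from collections import Counter

# Illustrative SIF lines: source, interaction type, target.
# These rows are made up for the example and are not copied from
# galFiltered.sif.
SAMPLE_SIF = """\
GAL4 pd GAL1
GAL4 pd GAL7
GAL80 pp GAL4
GAL80 pp GAL3
"""


def parse_sif(text):
    """Return a list of (source, interaction, target) edges."""
    edges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # blank line, or a node listed without any edge
        source, interaction, *targets = fields
        # One line may list several targets for the same source.
        edges.extend((source, interaction, target) for target in targets)
    return edges


edges = parse_sif(SAMPLE_SIF)
counts = Counter(interaction for _, interaction, _ in edges)
print(counts["pp"], "protein-protein and", counts["pd"], "protein-DNA edges")
```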
Your network will contain a combination of protein-protein (pp) and protein-DNA (pd) interactions.
STEP 2: Import expression data table: File -> Import -> Attribute/Expression Matrix… and select the “galExpData.pvals” file from your sampleData directory. This file contains gene expression measurements for three knock-out perturbation experiments. In each experiment, the expression for a different transcription factor knock-out strain was measured.
After a brief load, a status window will appear, indicating how many experimental conditions were found (three) and what type of significance values were included.
STEP 3: Now we will use the ‘Data Panel’ to browse through the expression data (node attributes), as follows.
Part II: Coloring nodes
It is common to use expression data in Cytoscape to set the visual attributes of the nodes in a network. Such a visualization can portray functional relationships and experimental response at the same time. The steps for doing this are as follows:
STEP 4: To set visual properties: select the “VizMapper” in the Control Panel.
STEP 5: In the VizMapper manager window, click the button to create a new visual style (see figure) and give it a name such as “Gal80”. This duplicates the default style.
STEP 6: Set the “Node Color” attribute as follows:
Use this session file if you are having trouble getting step 6 to work. You may need to repeat Step 2.
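To build intuition for what a color mapping on expression values does, the sketch below interpolates linearly between colors. It assumes a continuous mapping over the log-ratio range of -3 to +3; the blue-white-red gradient and the function names are assumptions made for this example and are not the settings prescribed in STEP 6.

```python
def lerp_color(low, high, t):
    """Blend two RGB tuples; t=0 gives low, t=1 gives high."""
    return tuple(round(a + (b - a) * t) for a, b in zip(low, high))


def expression_to_color(value, vmin=-3.0, vmax=3.0):
    """Map a log10 ratio to RGB on a blue-white-red gradient.

    Assumes vmin < 0 < vmax. Colors and breakpoints are hypothetical.
    """
    blue, white, red = (0, 0, 255), (255, 255, 255), (255, 0, 0)
    value = max(vmin, min(vmax, value))  # clamp to the gradient range
    if value < 0:
        # Negative ratios fade from blue (at vmin) to white (at 0).
        return lerp_color(blue, white, (value - vmin) / (0 - vmin))
    # Non-negative ratios fade from white (at 0) to red (at vmax).
    return lerp_color(white, red, value / vmax)


for ratio in (-3.0, 0.0, 3.0):
    print(ratio, expression_to_color(ratio))
```

With such a gradient, strongly up-regulated genes stand out in one color and strongly down-regulated genes in the other, while unchanged genes stay neutral.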
Q2: Use this visualization to identify the gene that is the most up-regulated in the ‘gal80’ knock-out experiment.
Part III. Using p-values
Here, we will use expression values and p-values together in setting visual properties.
STEP 7: Expression log-ratios range from about -3 to +3 in this study (log10, so 1/1000X to 1000X fold-change). The p-value is a measure of how likely it is that a given expression change arose by chance, so p-values range from 0 to 1. Select some nodes and look at their expression and p-values in the Data Panel. You can sort the list (ascending or descending) by clicking on the column heading.
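The relationship between log-ratios, fold-changes, and p-values can be sketched in a few lines. The rows below are invented placeholders rather than values from galExpData.pvals, and sorting by p-value mimics clicking the column heading in the Data Panel.

```python
# Hypothetical rows: (gene, log10 expression ratio, p-value).
rows = [
    ("geneA", 0.2, 0.40),
    ("geneB", -1.0, 0.001),
    ("geneC", 2.0, 0.00001),
]


def fold_change(log_ratio):
    """Convert a log10 ratio back to a fold-change (mutant/wild type)."""
    return 10 ** log_ratio


# Most significant change first, like an ascending sort on the p-value column.
by_pvalue = sorted(rows, key=lambda row: row[2])
for gene, log_ratio, pvalue in by_pvalue:
    print(f"{gene}\t{fold_change(log_ratio):g}x\tp={pvalue:g}")
```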
STEP 8: Now, we will explore setting node size according to p-values. Bigger nodes will then reflect more significant changes in expression (for the attribute you select).
Note: Click the "Add" button only once in this step. Clicking "Add" repeatedly, as tends to happen, is certain to cause trouble.
Use this session file if you are having trouble getting step 8 to work. You may need to repeat Step 2.
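The idea behind STEP 8, that a smaller p-value gives a bigger node, can be sketched as a simple linear mapping. The size limits of 20 and 60 and the function name are assumptions for this example; the values you set in the VizMapper may differ.

```python
def pvalue_to_size(pvalue, small=20.0, large=60.0):
    """Map a p-value in [0, 1] to a node size; smaller p gives a larger node.

    The size limits are hypothetical defaults chosen for illustration.
    """
    if not 0.0 <= pvalue <= 1.0:
        raise ValueError("p-value must lie between 0 and 1")
    # p = 0 maps to the largest size, p = 1 to the smallest.
    return large - (large - small) * pvalue


for p in (0.0, 0.5, 1.0):
    print(p, pvalue_to_size(p))
```

Because the mapping is inverted, nodes with the least significant changes are the smallest in the network.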
Q3: What color are the smallest nodes in your network?
Part IV. Biological analysis scenario
This section presents one scenario of how expression data can be combined with network data to tell a biological story. But first we need to load more recognizable gene names.
STEP 9: Load the "ORF2name.na" node attribute file to get common gene names associated with our systematic ORF names used to define the network.
STEP 10: In the VizMapper, find the Node Label attribute and set it to “GeneName” (this is the attribute you just loaded).
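What STEPS 9 and 10 accomplish is a lookup from systematic ORF name to common gene name. The sketch below assumes the older Cytoscape node attribute layout, a header line naming the attribute followed by `key = value` lines; the sample rows and helper names are illustrative and not copied from ORF2name.na.

```python
# Illustrative attribute file content (assumed layout, see lead-in).
SAMPLE_NA = """\
GeneName
YPL248C = GAL4
YML051W = GAL80
"""


def parse_node_attributes(text):
    """Return (attribute_name, {node_id: value}) from attribute-file text."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    attribute_name = lines[0]
    values = {}
    for line in lines[1:]:
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return attribute_name, values


def label_for(orf, names):
    """Use the common name when one is known, else fall back to the ORF."""
    return names.get(orf, orf)


attribute_name, names = parse_node_attributes(SAMPLE_NA)
print(attribute_name, label_for("YPL248C", names), label_for("YXX000W", names))
```

The fallback in `label_for` reflects the fact that not every ORF has a common name, so some nodes may still be labeled with their systematic name.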
STEP 11: Now select the neighborhood of GAL4 and create a new sub-network.
Use this session file if you are having trouble getting step 11 to work. You may need to repeat Step 2.
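Selecting a neighborhood and creating a sub-network, as in STEP 11, amounts to keeping a node, its direct neighbors, and the edges among them. The sketch below shows this on an invented edge list using common gene names; it assumes that edges between two neighbors are kept as well, and the function name is hypothetical.

```python
# Illustrative edges (source, interaction type, target), not taken from
# the exercise network.
EDGES = [
    ("GAL4", "pd", "GAL1"),
    ("GAL4", "pd", "GAL7"),
    ("GAL80", "pp", "GAL4"),
    ("GAL80", "pp", "GAL3"),
]


def neighborhood_subnetwork(edges, center):
    """Return (nodes, edges) for a node and its first neighbors."""
    nodes = {center}
    for source, _, target in edges:
        if source == center:
            nodes.add(target)
        elif target == center:
            nodes.add(source)
    # Keep every edge whose two endpoints are both inside the neighborhood.
    sub_edges = [e for e in edges if e[0] in nodes and e[2] in nodes]
    return nodes, sub_edges


nodes, sub_edges = neighborhood_subnetwork(EDGES, "GAL4")
print(sorted(nodes))
print(sub_edges)
```

In this toy example GAL3 drops out of the sub-network because it is connected to GAL4 only indirectly, through GAL80.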
Q4: If GAL80 levels are low (or absent) but most of the other genes linked to GAL4 show significant levels of induction, what does this say about the role of Gal80p? Is Gal80p activating or inhibiting the activity of Gal4p?