Exercise 2: Data Integration



In this exercise we will integrate gene expression data from gene deletion studies with a protein-protein and protein-DNA interaction network. In the study by Ideker et al. (Science, 2001), the yeast transcription factors Gal1p, Gal4p, and Gal80p were analyzed for their importance in galactose utilization pathways.

Part I. Loading network and expression data

STEP 1: Start Cytoscape and load the network “galFiltered.sif” that you have downloaded from here. The file can also be found in the sampleData directory of your Cytoscape installation.


Your network will contain a combination of protein-protein (pp) and protein-DNA (pd) interactions.
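To see what the two interaction types look like in the file itself, here is a minimal Python sketch that reads SIF-style text and counts edges by type. It assumes the usual SIF layout (source node, interaction type, then one or more target nodes, separated by whitespace). The node names in the excerpt are made-up placeholders, not lines from galFiltered.sif.

```python
from collections import Counter

# Hypothetical excerpt in SIF layout: source, interaction type, target(s).
SIF_TEXT = """\
nodeA pd nodeB
nodeC pd nodeD
nodeE pp nodeA nodeF
"""

def parse_sif(text):
    """Return a list of (source, interaction, target) edges from SIF text."""
    edges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # blank line or isolated node: no edge to record
        source, interaction, targets = fields[0], fields[1], fields[2:]
        for target in targets:
            # A line with several targets encodes one edge per target.
            edges.append((source, interaction, target))
    return edges

edges = parse_sif(SIF_TEXT)
counts = Counter(interaction for _, interaction, _ in edges)
print(counts["pp"], "protein-protein edges,", counts["pd"], "protein-DNA edges")
```

To try it on the real network, read the contents of galFiltered.sif into a string and pass it to parse_sif in place of the placeholder text.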


STEP 2: Import the expression data table: File -> Import -> Attribute/Expression Matrix…, and select the “galExpData.pvals” file, which you have also downloaded from here or can find in your sampleData directory. This file contains gene expression measurements for three knock-out perturbation experiments. In each experiment, expression was measured in a strain with a different transcription factor knocked out.


The “XXXexp” attributes are the average log-ratios while the “XXXsig” are the corresponding p-values for the replicated expression study.


After a brief load, a status window will appear, indicating how many experimental conditions were found (three) and what type of significance values were included.
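The pairing of “XXXexp” and “XXXsig” attributes can be illustrated outside Cytoscape with a short Python sketch. The layout below is an assumption (whitespace-delimited, one header row, a gene ID column and a common-name column before the numeric columns), and the two data rows are made-up values for illustration only.

```python
# Assumed layout and made-up values; not copied from galExpData.pvals.
TABLE = """\
GENE COMMON gal1RGexp gal4RGexp gal80Rexp gal1RGsig gal4RGsig gal80Rsig
geneA nameA -0.1 0.2 1.5 0.4 0.3 0.0001
geneB nameB 0.0 -0.3 0.1 0.9 0.2 0.6
"""

def parse_expression(text):
    """Map each gene ID to its numeric exp (log-ratio) and sig (p-value) columns."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    table = {}
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        table[row["GENE"]] = {
            name: float(value)
            for name, value in row.items()
            if name.endswith("exp") or name.endswith("sig")
        }
    return table

table = parse_expression(TABLE)

# Each condition contributes one "exp" column and one matching "sig" column.
conditions = sorted(name[:-3] for name in table["geneA"] if name.endswith("exp"))
print(len(conditions), "conditions:", conditions)
```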


STEP 3: Now we will use the ‘Data Panel’ to browse through the expression data (node attributes), as follows.

                    i.            Select some node in the Cytoscape canvas.

                  ii.            In Data Panel, click the Select Attributes button (top left table icon of Panel), and select the attributes ‘gal1RGexp’, ‘gal4RGexp’, and ‘gal80Rexp’.

Part II. Coloring nodes

It is common to use expression data in Cytoscape to set the visual attributes of the nodes in a network. The steps for doing this are as follows:


STEP 4: To set visual properties, select “VizMapper” in the Control Panel.



STEP 5: In the VizMapper manager window, click the button to create a new visual style (see figure) and name it something like “Gal80”. This duplicates the default style.


STEP 6: Set the “Node Color” attribute as follows:


                                i.            In the pull-down next to “Node Color”, select “gal80Rexp” (see figure).


                              ii.            Under the associated “Mapping Type”, select a “Continuous Mapping”.

                            iii.            Click on the “Graph View” field to bring up the mapping editor (see figure).



                             iv.            Add 3 break points and move them to -1, 0 and 1 respectively. You can do this with the help of the “Range Setting”.

                               v.            By double-clicking on the range handles (small triangles), set the colors (you should only need 2 colors), then close the window.


                             vi.            Note that the default node color, pink, may fall within this spectrum. A useful trick is to choose a color outside the spectrum to distinguish nodes that have no expression value defined. Under Defaults, click anywhere on the image to open the default editor. Then set the “NODE_FILL_COLOR” default to grey and click “Apply”.
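The continuous mapping set up in STEP 6 can be pictured as linear interpolation between colors at the break points. The Python sketch below is an illustration of that idea, not Cytoscape's own code. The break points -1, 0 and 1 and the grey default come from the steps above; the particular colors (blue, white, red) and the clamping of values outside the range are assumptions.

```python
# Break points from the exercise; the three RGB colors are assumed choices.
BREAKPOINTS = [
    (-1.0, (0, 0, 255)),      # repressed: blue (assumption)
    (0.0, (255, 255, 255)),   # unchanged: white (assumption)
    (1.0, (255, 0, 0)),       # induced: red
]
DEFAULT_COLOR = (128, 128, 128)  # grey for nodes without an expression value

def node_color(value):
    """Return an RGB tuple for a log-ratio, or the default when it is missing."""
    if value is None:
        return DEFAULT_COLOR
    if value <= BREAKPOINTS[0][0]:
        return BREAKPOINTS[0][1]   # clamp below the lowest break point
    if value >= BREAKPOINTS[-1][0]:
        return BREAKPOINTS[-1][1]  # clamp above the highest break point
    for (x0, c0), (x1, c1) in zip(BREAKPOINTS, BREAKPOINTS[1:]):
        if x0 <= value <= x1:
            t = (value - x0) / (x1 - x0)  # position between the two break points
            return tuple(round(a + t * (b - a)) for a, b in zip(c0, c1))

print(node_color(0.2))   # light red
print(node_color(None))  # grey default
```

Because grey never appears between blue, white and red, a grey node can only mean a missing value, which is the point of the trick in the last sub-step.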


Part III. Using p-values

Here, we will use expression values and p-values together in setting visual properties.

STEP 7: Expression log-ratios range from about -3 to +3 in this study (log10, so 1/1000X to 1000X fold-change), and the p-values range from 0 to 1, as they should. Select some nodes and look at their expression and p-values in the Data Panel. You can sort the list (up or down) by clicking on the column heading.
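As a quick check of the fold-change arithmetic, a log10 ratio r corresponds to a fold-change of 10 to the power r. The helper name below is hypothetical; the sketch only restates the conversion given in the step.

```python
def fold_change(log10_ratio):
    """Convert a log10 expression ratio into a fold-change."""
    return 10 ** log10_ratio

for ratio in (-3, 0, 3):
    print(ratio, "->", fold_change(ratio))
```

A log-ratio of 0 therefore means no change, and each unit step is a tenfold change in either direction.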


STEP 8: Now, we will explore setting node sizes according to p-values.

                       i.            Double-click the Node Size tab in the VizMapper setting window.

                     ii.            In the Map Attribute pull-down menu, select “gal80Rsig”.

                    iii.            In the pull-down menu under Mapping, select Continuous Mapping.

                   iv.            Click anywhere on the “Graphic view” row to bring up the mapping editor.

The y-axis represents the node size while the x-axis is the range of the attribute being mapped (0-1 in this case).

                      v.            First click “Add” twice to create 2 break points. Double-click on the lower bound handle (solid red square) and set this size to 55. Set the upper bound size to 20. Slide the lower break point to 0.001 using the black triangle or by using the “Range Setting”, and set the upper break point to 0.1. Set the lower break point size to 50 by double-clicking on the open square, and set the upper break point size to 20. You should see something like the following figure.

Close the mapping editor dialog.
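The size mapping from STEP 8 can be written out as a small piecewise function. The sketch below uses the values from the step (size 55 below the lower break point, 50 at p = 0.001, 20 at p = 0.1 and above). It assumes that sizes are interpolated linearly in the p-value between the two break points, which is an illustration of the idea rather than Cytoscape's implementation.

```python
LOW_P, LOW_SIZE = 0.001, 50.0   # lower break point and its size
HIGH_P, HIGH_SIZE = 0.1, 20.0   # upper break point and its size
BELOW_SIZE = 55.0               # size for p-values under the lower break point
ABOVE_SIZE = 20.0               # size for p-values over the upper break point

def node_size(p_value):
    """Map a p-value to a node size: smaller p-values give larger nodes."""
    if p_value < LOW_P:
        return BELOW_SIZE
    if p_value > HIGH_P:
        return ABOVE_SIZE
    # Assumed linear interpolation between the two break points.
    t = (p_value - LOW_P) / (HIGH_P - LOW_P)
    return LOW_SIZE + t * (HIGH_SIZE - LOW_SIZE)

for p in (0.0001, 0.001, 0.05, 0.1, 0.5):
    print(p, "->", round(node_size(p), 1))
```

With this mapping, the most significant genes stand out as the largest nodes, while anything with a p-value above 0.1 stays at the minimum size.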


Part IV. Biological analysis scenario

This section presents one scenario on how expression data can be combined with network data to tell a biological story. But first we need to load more relevant gene names.


STEP 9: Load the “ORF2name.na” from the data files table here:

File -> Import -> Node Attributes… and then Open (is the file on your Desktop?)


STEP 10: In the VizMapper, find the Node Label attribute and set it to “GeneName” using a "Passthrough Mapper" (this is the attribute you just loaded).


STEP 11: Now select the neighborhood of GAL4 and create a new sub-network.

                                i.            In the Control Panel, select the Filter and create a new filter (“NodeName” for example). Select the Attribute “node.GeneName” and Add the filter. Type GAL4 in the new text box and then click Apply.

                              ii.            To focus the view on the selected node, click the “Zoom Selected Region” in the menu bar.



Then zoom out with the ‘-’ magnifying glass (see figure).


                            iii.            While the GAL4 node is selected: Select -> Nodes -> ‘First neighbors of selected nodes’

                            iv.            Create a child sub-network: File -> New -> Network -> ‘From selected nodes, all edges’

                              v.            In the new sub-network, apply a graph layout algorithm using the yFiles Hierarchic layout.

                            vi.            Use the VizMapper to change the Edge Color attribute with a Discrete Mapping on the “interaction” attribute. This will distinguish regulatory interactions, “pd”, from protein-protein interactions, “pp”.



Now set the Edge Target Arrow Shape in the VizMapper with a Discrete Mapping on the “interaction” attribute again.
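The selection and sub-network steps in STEP 11 amount to two simple graph operations: collect the first neighbors of a node, then keep every edge whose two endpoints are both in the selection. The Python sketch below shows this on a toy edge list. Apart from GAL4, the node names and all interaction types are hypothetical placeholders.

```python
# Toy edges as (source, interaction, target); everything except the GAL4
# label is a made-up placeholder.
EDGES = [
    ("GAL4", "pd", "n1"),
    ("GAL4", "pd", "n2"),
    ("n3", "pp", "GAL4"),
    ("n1", "pp", "n2"),
    ("n2", "pp", "far"),
    ("far", "pp", "other"),
]

def first_neighbors(edges, node):
    """Nodes that share an edge with `node`, ignoring edge direction."""
    neighbors = set()
    for source, _, target in edges:
        if source == node:
            neighbors.add(target)
        elif target == node:
            neighbors.add(source)
    return neighbors

def induced_subnetwork(edges, nodes):
    """Edges whose endpoints are both selected ('From selected nodes, all edges')."""
    return [edge for edge in edges if edge[0] in nodes and edge[2] in nodes]

selected = first_neighbors(EDGES, "GAL4") | {"GAL4"}
sub_edges = induced_subnetwork(EDGES, selected)
print(sorted(selected))
print(sub_edges)
```

Note that the edge between n1 and n2 is kept even though neither endpoint is GAL4, because the sub-network includes all edges among the selected nodes.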




Notice that all three dark red nodes (highly induced genes) are in the same region of the graph. With a little exploration in the node attribute browser, you should see the following:

                     i.            The two genes that interact with all three highly induced genes are GAL11 (a general transcription cofactor with many interactions) and GAL4.

                   ii.            Both GAL4 and GAL11 show small changes in expression, and neither change is statistically significant, suggesting that the critical change affecting GAL1, GAL7, and GAL10 might be somewhere else in the network.

                  iii.            GAL4 interacts with GAL80, which shows a significantly lower expression level (GAL80 was deleted, after all).

Q6: If GAL80 levels are low (or absent) but most of the other genes linked to GAL4 show significant levels of induction, what does this say about the role of Gal80p? Is Gal80p activating or inhibiting the activity of Gal4p?


Exercise 3: Active Modules

A method for finding “active modules” in interaction networks using gene expression data was published in Ideker T, et al. “Discovering regulatory and signaling circuits in molecular interaction networks.” Bioinformatics. 2002. jActiveModules is the plugin implementation of this module search and scoring method. It requires that node p-value attributes have been imported (as we have done in the previous exercise).
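To give a feel for why p-values are required, here is a simplified Python sketch of the single-condition scoring idea described in that paper: each p-value is converted to a z-score through the inverse normal distribution, and the z-scores of the k nodes in a candidate module are summed and divided by the square root of k. This is not the plugin's code. It leaves out the calibration against random gene sets, the handling of multiple conditions, and the search itself, and the function names and the clamping constant are assumptions.

```python
from math import sqrt
from statistics import NormalDist

def z_score(p_value, eps=1e-12):
    """Convert a p-value to a z-score; small p-values give large positive z."""
    # Clamp away from 0 and 1, where the inverse normal CDF is undefined.
    p = min(max(p_value, eps), 1.0 - eps)
    return NormalDist().inv_cdf(1.0 - p)

def aggregate_z(p_values):
    """Aggregate score of a candidate module from its nodes' p-values."""
    z_values = [z_score(p) for p in p_values]
    return sum(z_values) / sqrt(len(z_values))

# Made-up p-values: a module of significant genes versus an unremarkable one.
print(aggregate_z([0.001, 0.002, 0.0005, 0.01]))
print(aggregate_z([0.4, 0.6, 0.5, 0.3]))
```

Under this scoring, a connected group of nodes that are each moderately significant can outscore a single highly significant node, which is what lets the search pick out whole modules.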


STEP 1: Set the parameters that jActiveModules will use to score modules. Choose Plugins -> jActiveModules to create the tab in the Control Panel. Select all three available p-value attributes under Expression Attributes For Analysis (this can be tricky; try ctrl-click starting from the bottom).


STEP 2: Making sure the main galFiltered network has been selected (this can be done in the Network tab of the Control Panel), run the search algorithm by clicking Find Modules in the jActiveModules panel. A results window should appear when the search is finished.




STEP 3: Select a module result by selecting a network row in the jActiveModules results window. This will select the corresponding nodes in the larger graph.


STEP 4: Select the second-ranking module (with 14 nodes) and create a new sub-network: File -> New -> Network -> From selected nodes, all edges.


STEP 5: Layout this sub-network with the method of your choice.



Q7: What can we guess about the activity of Rap1p in the gal80 deletion data?