Exercise 1: Visualizing interaction networks
This exercise introduces Cytoscape, a program for visualizing networks (www.cytoscape.org). You will first familiarize yourself with Cytoscape and its visualization features, and then use a few plugins to carry out a statistical analysis of an interaction network.
Part 0. Before we get started…
You will need to install the correct Cytoscape version and download the data and plugins from http://www.cbs.dtu.dk/phdcourse/cookbooks/Cytoscape/.
In this exercise we will use a subset of the human protein interaction dataset published by Rual et al. (Nature. 2005 Oct 20;437(7062):1173-8).
Part I. Getting started:
STEP 1: Install and launch Cytoscape. You should see the main Cytoscape window.
STEP 2: Load the network “RUAL.subset.sif” into Cytoscape by selecting File -> Import -> Network (multiple file types) from the menu, clicking ‘Select’ to locate the file (for example, on your Desktop), and then clicking ‘Import’.
This network consists of 1089 interactions observed between 419 human proteins and is a small subset of a larger human interaction dataset. This subset consists of proteins that interact with the transcription factor TP53. Note that Cytoscape only creates a network view automatically if the network has fewer than 500 nodes.
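To make the node and edge counts concrete, here is a minimal Python sketch that parses a SIF file and counts nodes and unique undirected edges. It assumes the usual SIF layout of one record per line (source, interaction type, one or more targets), where a record with a single token declares an isolated node. The function names and toy records (including the interaction type pp) are hypothetical. Because this sketch merges duplicate pairs and skips self-interactions, its counts could differ from Cytoscape's if the file contains either.

```python
def read_sif(lines):
    """Parse SIF records into an undirected adjacency map: node -> set of neighbours.

    Each record is 'source interaction target [target ...]'; a record with a
    single token declares an isolated node. Tabs separate fields when present,
    otherwise any whitespace does.
    """
    adj = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split("\t") if "\t" in line else line.split()
        source = parts[0]
        adj.setdefault(source, set())
        for target in parts[2:]:
            if not target or target == source:
                continue  # this sketch ignores empty fields and self-interactions
            adj[source].add(target)
            adj.setdefault(target, set()).add(source)
    return adj


def count_nodes_and_edges(adj):
    """Return (number of nodes, number of unique undirected edges)."""
    n_edges = sum(len(neighbours) for neighbours in adj.values()) // 2
    return len(adj), n_edges


if __name__ == "__main__":
    # Hypothetical toy records; for the real file use: read_sif(open("RUAL.subset.sif"))
    toy = ["7157\tpp\t4193", "7157\tpp\t1026", "4193\tpp\t1026", "999"]
    print(count_nodes_and_edges(read_sif(toy)))  # (4, 3)
```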
Close the Loading Network window when it has finished.
Part II. Network layout & Selecting nodes:
STEP 3: Try some of the different layouts (circular, organic, hierarchical and random) by selecting them from the Layout -> yFiles menu.
By default, Cytoscape generates a grid layout, which is not very informative. One of the most useful layouts for network biology is the spring layout (similar to the organic layout). Try the spring embedded layout:
Layout -> Cytoscape Layouts -> Spring Embedded.
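To show the general idea behind spring-style layouts, here is a sketch of the classic Fruchterman-Reingold force-directed algorithm. This is not necessarily the algorithm behind Cytoscape's Spring Embedded layout. Every pair of nodes repels, every edge pulls its endpoints together, and the maximum step shrinks each iteration so the layout settles. The function name and parameters (iterations, cooling, seed) are illustrative choices. The pairwise repulsion loop is quadratic in the number of nodes, which is acceptable for a few hundred nodes.

```python
import math
import random


def spring_layout(adj, iterations=200, cooling=0.95, seed=0):
    """Fruchterman-Reingold force-directed layout in the unit square (sketch).

    adj must be a symmetric adjacency map: node -> set of neighbours.
    Returns node -> (x, y).
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    pos = {v: [rng.random(), rng.random()] for v in nodes}
    k = 1.0 / math.sqrt(max(len(nodes), 1))  # ideal distance between nodes
    temperature = 0.1  # maximum distance a node may move in one iteration

    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}

        # Repulsive force k^2 / d between every pair of nodes.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f

        # Attractive force d^2 / k along each undirected edge (visited once).
        for u in nodes:
            for v in adj[u]:
                if u < v:
                    dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                    d = max(math.hypot(dx, dy), 1e-9)
                    f = d * d / k
                    disp[u][0] -= dx / d * f
                    disp[u][1] -= dy / d * f
                    disp[v][0] += dx / d * f
                    disp[v][1] += dy / d * f

        # Move each node along its net displacement, capped by the temperature.
        for v in nodes:
            dx, dy = disp[v]
            length = math.hypot(dx, dy)
            if length > 0:
                step = min(length, temperature)
                pos[v][0] += dx / length * step
                pos[v][1] += dy / length * step
        temperature *= cooling  # cool down so the layout settles

    return {v: (x, y) for v, (x, y) in pos.items()}
```

Fixing the seed makes the result reproducible. Cytoscape's layouts likewise start from the current positions or from random ones, which is why repeated runs can look different.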
STEP 4: In the Cytoscape canvas (the blue window with the network view) you can select and move nodes by clicking and dragging with the left mouse button. Select a few nodes, and move them around the screen.
STEP 5: The nodes in this network are labeled by numeric Entrez IDs, which are the IDs used by NCBI (www.ncbi.nlm.nih.gov). Add gene symbols to this network by importing the node attributes in RUAL.subset.names.tab.
Do this in Cytoscape via File -> Import -> Attribute from Table -> Select File. Then select “Show Text File Import Options” and “Transfer first line as attribute names”, and click “Import”.
Select TP53 using Filters. Start by creating a new filter under “Options”.
After you provide a name for your filter, such as “NodeName”, select the attribute you want to filter on. In this case it is “node.Symbol”; then click “Add” to add the filter.
Type “TP53” in the text box and click “Apply” (TP53 will now appear yellow in the network).
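Outside Cytoscape, importing the names table and applying the “node.Symbol” filter comes down to a join on the node ID followed by an equality test. The sketch below assumes RUAL.subset.names.tab is tab-separated and has a header line, with the Entrez ID in the first column and the gene symbol in a column named Symbol. The header name EntrezID and the function names are hypothetical.

```python
import csv
import io


def read_node_attributes(handle, key_column=0):
    """Read a tab-separated attribute table whose first line holds the attribute names.

    Returns {node_id: {attribute_name: value}}, keyed on key_column.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    key_name = header[key_column]
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        record = dict(zip(header, row))
        table[record.pop(key_name)] = record
    return table


def select_by_attribute(table, attribute, value):
    """Return the IDs of nodes whose attribute equals value, like a simple filter."""
    return {node for node, attrs in table.items() if attrs.get(attribute) == value}


if __name__ == "__main__":
    # Hypothetical two-row table; for the real file use open("RUAL.subset.names.tab", newline="")
    text = "EntrezID\tSymbol\n7157\tTP53\n4193\tMDM2\n"
    names = read_node_attributes(io.StringIO(text))
    print(select_by_attribute(names, "Symbol", "TP53"))  # {'7157'}
```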
You can clear any selection by clicking on an empty part of the canvas. You can also select the nodes that interact with a specific node (e.g. TP53), as described in STEP 6.
STEP 6: With TP53 selected, select the first neighbors of this node: Select -> Nodes -> First neighbors of selected nodes. You should see a network with several yellow nodes in the center.
Q1: How many proteins interact with TP53?
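Select -> Nodes -> First neighbors of selected nodes extends the current selection with every node adjacent to it. Here is a minimal sketch of that operation on an adjacency map (node -> set of neighbours). The function name and toy network are hypothetical. For Q1, only the newly added nodes count as interaction partners, so the selected node itself must be subtracted from the total selection.

```python
def expand_to_first_neighbors(adj, selected):
    """Return the selection extended with all first neighbours of the selected nodes."""
    expanded = set(selected)
    for node in selected:
        expanded |= adj.get(node, set())
    return expanded


if __name__ == "__main__":
    # Hypothetical toy network: a hub with three partners and one more distant node.
    adj = {
        "hub": {"a", "b", "c"},
        "a": {"hub", "d"},
        "b": {"hub"},
        "c": {"hub"},
        "d": {"a"},
    }
    selection = expand_to_first_neighbors(adj, {"hub"})
    print(sorted(selection))         # ['a', 'b', 'c', 'hub']
    print(len(selection - {"hub"}))  # 3 interaction partners
```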
Part III. Network statistics:
In this part of the exercise, you will work with the NetworkAnalyzer plugin.
One of Cytoscape’s strengths is that it can be extended with plugins, and a large community of developers contributes such plugins.
STEP 7: First, deselect all nodes in the network (click anywhere not over a node or edge in the network panel). Then apply the NetworkAnalyzer plugin to the network by selecting Network Analysis -> Analyze Network from the Plugins menu. Choose the default undirected interpretation of the interactions. This opens a results window with many tabs.
As you can see, the NetworkAnalyzer plugin calculates various network parameters. Browse through the various network statistics/parameters and try to answer the following questions:
Q2: What is the average degree (connectivity) of the network?
Q3: What is the most likely degree of a randomly selected node in the network? And where does TP53 fall in the node degree distribution?
Q4: Use the node degree distribution and the distribution of the average clustering coefficient (C(k)) to determine whether the network structure is random, scale-free or hierarchical.
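The degree-based quantities behind Q2 and Q3 can be reproduced from an adjacency map (node -> set of neighbours). For an undirected network with N nodes and E edges, the average degree is 2E/N, since every edge adds to the degree of both of its endpoints. The most likely degree of a randomly selected node is the mode of the degree distribution. The helpers below are hypothetical, not NetworkAnalyzer's code, and they assume duplicate edges and self-loops have already been removed.

```python
from collections import Counter


def degree_statistics(adj):
    """Return (degrees, average degree, degree distribution, most common degree)."""
    degrees = {node: len(neighbours) for node, neighbours in adj.items()}
    n = len(degrees)
    average = sum(degrees.values()) / n if n else 0.0  # equals 2E / N
    distribution = Counter(degrees.values())  # k -> number of nodes with degree k
    # Most common degree; ties are broken towards the smaller degree.
    mode = min(distribution, key=lambda k: (-distribution[k], k)) if n else None
    return degrees, average, distribution, mode


def fraction_with_lower_degree(degrees, node):
    """Fraction of all nodes whose degree is strictly lower than that of node."""
    k = degrees[node]
    return sum(1 for d in degrees.values() if d < k) / len(degrees)


if __name__ == "__main__":
    # Hypothetical toy network: triangle A-B-C with D attached to A.
    adj = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
    degrees, average, distribution, mode = degree_statistics(adj)
    print(average, mode)                             # 2.0 2
    print(fraction_with_lower_degree(degrees, "A"))  # 0.75
```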
Part IV. Identification of complexes:
As the average clustering coefficient is relatively high, we can expect some clusters (complexes) to exist in the network. Next, we will try to identify them using the MCODE algorithm.
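The clustering coefficient of a node with k neighbours is the fraction of the k(k-1)/2 possible links among those neighbours that actually exist. C(k) averages this value over all nodes of degree k. The sketch below computes C(k) and the slope of a least-squares line on log-log axes, which gives a rough way to approach Q4. A degree distribution that looks roughly straight on log-log axes suggests a scale-free network. A C(k) that falls off roughly as 1/k (slope near -1) is the usual signature of a hierarchical network. In a random network, C(k) does not depend on k. Function names are hypothetical. Nodes with fewer than two neighbours are skipped here because their coefficient is undefined.

```python
import math
from collections import defaultdict
from itertools import combinations


def clustering_coefficient(adj, node):
    """Fraction of possible links among the neighbours of node that are present."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return None  # undefined for fewer than two neighbours
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def clustering_by_degree(adj):
    """C(k): average clustering coefficient of the nodes with degree k (k >= 2)."""
    groups = defaultdict(list)
    for node in adj:
        c = clustering_coefficient(adj, node)
        if c is not None:
            groups[len(adj[node])].append(c)
    return {k: sum(cs) / len(cs) for k, cs in sorted(groups.items())}


def loglog_slope(points):
    """Least-squares slope of log(y) versus log(x); non-positive values are skipped."""
    pairs = [(math.log(x), math.log(y)) for x, y in points.items() if x > 0 and y > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two positive points")
    mean_x = sum(x for x, _ in pairs) / len(pairs)
    mean_y = sum(y for _, y in pairs) / len(pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical toy network: triangle A-B-C with D attached to A.
    adj = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
    print(clustering_by_degree(adj))  # {2: 1.0, 3: 0.3333333333333333}
    print(loglog_slope({1: 1.0, 2: 0.5, 4: 0.25}))  # about -1.0
```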
STEP 8: Start MCODE: Plugins -> MCODE -> Start MCODE. At the bottom of the MCODE panel (bottom left), click Analyze. This will identify several complexes.
STEP 9: Click on a complex in the "Cluster Browser" tab of the MCODE "Results Panel" (on the right side). This highlights the complex (yellow nodes) in the large network. Browse through all complexes with a score above 1.
· How many of these complexes would you have found by manual inspection of the large network?
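The MCODE publication (Bader and Hogue) scores a complex as its density multiplied by its number of nodes. Density is the number of edges divided by the maximum possible number of edges. Under this definition, a complex in which every node is connected to every other node (a clique) scores its own size. The sketch below assumes the loop-free maximum of |V|(|V|-1)/2 edges. Whether the plugin uses exactly this variant is an assumption, and the function names are hypothetical.

```python
def subgraph_density(adj, nodes):
    """Edges inside nodes divided by the loop-free maximum |V|(|V|-1)/2."""
    members = set(nodes)
    n = len(members)
    if n < 2:
        return 0.0
    inside = sum(1 for u in members for v in adj[u] if v in members) // 2
    return inside / (n * (n - 1) / 2)


def mcode_style_score(adj, nodes):
    """Density times number of nodes, the complex score described for MCODE."""
    return subgraph_density(adj, nodes) * len(set(nodes))


if __name__ == "__main__":
    # Hypothetical toy network: triangle A-B-C with D attached to A.
    adj = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
    print(mcode_style_score(adj, {"A", "B", "C"}))       # 3.0
    print(mcode_style_score(adj, {"A", "B", "C", "D"}))  # 4/6 * 4, about 2.67
```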
Part V. Extra:
STEP 10: Have a look at the shortest path length distribution for the entire network using the NetworkAnalyzer plugin.
Q5: What is the highest number of edges needed to connect any two nodes in the network by a shortest path?
This phenomenon is known as the ‘small-world’ phenomenon and can be found in many real-life networks, e.g. the network that connects actors who have appeared in the same movie.
STEP 11: You can connect any two actors on http://oracleofbacon.org/. Just for fun, try a few actors and see how many edges (movies) are required to connect them.
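The shortest path length distribution from STEP 10 can be computed by running a breadth-first search from every node, which finds all shortest path lengths in an unweighted network. The largest length found is the network diameter asked about in Q5, and the same search would count the movies separating two actors. In the sketch below, the function names are hypothetical. Unreachable pairs (nodes in different connected components) are ignored, and each pair is counted once from each end, so the counts refer to ordered pairs.

```python
from collections import Counter, deque


def bfs_distances(adj, source):
    """Shortest path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def shortest_path_distribution(adj):
    """Counter: path length -> number of ordered, connected node pairs at that length."""
    counts = Counter()
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                counts[d] += 1
    return counts


def diameter(adj):
    """Longest shortest path between any two connected nodes."""
    counts = shortest_path_distribution(adj)
    return max(counts) if counts else 0


if __name__ == "__main__":
    # Hypothetical toy network: path A-B-C-D plus an isolated node E.
    adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}, "E": set()}
    print(sorted(shortest_path_distribution(adj).items()))  # [(1, 6), (2, 4), (3, 2)]
    print(diameter(adj))                                     # 3
```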