Wednesday, July 15, 2015

Network Viz in Cytoscape

A few disclaimers before getting started: First, there are many tools for network visualization in R, Matlab, Python ... you name it. Cytoscape may just be the best, however. Second, Cytoscape can do far more than visualize your network: it can analyze it, too. This ability to analyze the network (identify connected components, compute node degree, etc.) expands the kinds of visualizations you can make. In this introductory tutorial, our focus is simply on turning your network into a beautiful figure, but don't leave with the impression that Cytoscape is limited to making pretty pictures.

Which UVA Professors Have Published Together? A Co-Publication Network

Visualizing a community of people as a network is a popular exercise, often done with data from social networking sites like Facebook. For this tutorial, I chose to visualize the connections between a group of 35 researchers at the University of Virginia.

While I won't go into the details of collecting the data, the code can be found here on the Bitbucket repository. The gist is that a Python script downloads the list of publications for each researcher from PubMed. The coauthors on each publication are compiled into a list, which is then searched to see whether any of the researchers authored papers with each other. We also went one step further and checked whether any two researchers at UVA wrote papers with a shared coauthor. Two shortcomings, neither of which matters much here: PubMed doesn't have every paper these authors wrote (or, in some cases, any of their papers), so this doesn't give the full picture; and because the method depends on name matching, two people with the same name or the same initials could produce false positives. But we just want to visualize it, errors and all. You start by formatting the data correctly.
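The matching step can be sketched roughly as follows. This is not the script from the repository, just a minimal illustration. The `papers_by_author` layout (each researcher mapped to a list of papers, each paper a set of author names), the function name `build_edges`, and the choice to represent a shared coauthor as its own node linked to each researcher by a -1 edge are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def build_edges(papers_by_author):
    """Return (source, interaction, target) tuples.

    papers_by_author is a hypothetical layout: each main author maps to a
    list of papers, and each paper is a set of author names.
    """
    main_authors = set(papers_by_author)
    edges = []

    # Direct edges: count the papers on a's list that also name b.
    for a, b in combinations(sorted(main_authors), 2):
        n_joint = sum(1 for paper in papers_by_author[a] if b in paper)
        if n_joint:
            edges.append((a, n_joint, b))

    # Indirect edges: an outside coauthor who published with two or more
    # main authors becomes a node linked to each of them with value -1.
    linked = defaultdict(set)
    for author, papers in papers_by_author.items():
        for paper in papers:
            for name in set(paper) - main_authors:
                linked[name].add(author)
    for name in sorted(linked):
        if len(linked[name]) >= 2:
            for author in sorted(linked[name]):
                edges.append((author, -1, name))
    return edges
```

Note that the direct count here relies on the first author's publication list only, and that exact string matching on names carries the same false-positive risk described above.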

Cytoscape accepts a few formats, and one of the simpler ways is to have a text file with a single line for each edge in the network. The format is <Source Node><delimiter><Edge Type><delimiter><Target Node>. In this network, it's <Author1><Tab><Number of Papers Written Together><Tab><Author2>.
The "sca1_2" stands for "shared co-author between authors 1 and 2". An edge type of -1 distinguishes indirect edges from direct edges (whose values are positive).
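As a rough illustration of that format, here is a small Python sketch that writes one tab-separated line per edge. The author names and counts are made up, and the helper names `format_edge_lines` and `write_edge_file` are assumptions for the example, not part of the original script.

```python
def format_edge_lines(edges):
    """Format (source, interaction, target) tuples as tab-separated lines."""
    return "".join(
        f"{source}\t{interaction}\t{target}\n"
        for source, interaction, target in edges
    )


def write_edge_file(edges, path="coauthorshipNetwork.tsv"):
    """Write the edge list to a file that Cytoscape can import."""
    # newline="" keeps line endings as "\n" on every platform.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(format_edge_lines(edges))


# Hypothetical edges: one direct edge (3 joint papers) and two indirect
# edges through the shared co-author node "sca1_2".
example_edges = [
    ("Author1", 3, "Author2"),
    ("Author1", -1, "sca1_2"),
    ("sca1_2", -1, "Author2"),
]
print(format_edge_lines(example_edges), end="")
```

Calling `write_edge_file(example_edges)` would save the same three lines to "coauthorshipNetwork.tsv".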

Similarly, a node attributes file can be created which adds additional information about each node. The format is <Node Name><delimiter><Attribute>, like so:
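A node attributes file in this shape could be generated along these lines. This is a sketch under a few assumptions: the header row ("name" and "Type") is my guess at a convenient layout, the function name is hypothetical, and every node not in the list of main authors is labeled "CommonCoAuthor".

```python
def node_attribute_lines(edges, main_authors):
    """Build a tab-separated node table from (source, interaction, target) edges."""
    nodes = set()
    for source, _interaction, target in edges:
        nodes.update((source, target))

    # Assumed header row; the second column holds the node type.
    lines = ["name\tType"]
    for node in sorted(nodes):
        node_type = "MainAuthor" if node in main_authors else "CommonCoAuthor"
        lines.append(f"{node}\t{node_type}")
    return "\n".join(lines) + "\n"


# Hypothetical example with two main authors and one shared co-author.
example_edges = [("Author1", -1, "sca1_2"), ("sca1_2", -1, "Author2")]
print(node_attribute_lines(example_edges, {"Author1", "Author2"}), end="")
```

Saving the returned text as "author_attributes.tsv" gives a table with one row per node.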

Visualization in Cytoscape

First, you'll need to download and install Cytoscape. It requires up-to-date Java, so be prepared to install a couple of things before you're ready to roll.

Open Cytoscape

First things first, import your network:
Open From Network File "coauthorshipNetwork.tsv".
Show Text File Import Options and deselect "space" as a delimiter while keeping "tab".
Assign Source as column 1, Interaction as column 2, and Target as column 3.
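If the import produces odd-looking nodes, it can help to sanity-check the file first. Author names usually contain spaces, which is why "space" has to be deselected as a delimiter; the hypothetical helper below (not part of Cytoscape) simply confirms that every non-empty line splits into exactly three tab-separated fields.

```python
def check_edge_lines(text):
    """Return (line_number, message) for each line that is not 3 tab-separated fields."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            problems.append(
                (number, f"expected 3 tab-separated fields, found {len(fields)}")
            )
    return problems


# Hypothetical content: the second line uses spaces instead of tabs.
sample = "Smith J\t2\tDoe A\nSmith J 2 Doe A\n"
for number, message in check_edge_lines(sample):
    print(f"line {number}: {message}")
```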

Notice that the network is in a default style and layout. Change those immediately (defaults are not the way to impress):
In the Control Panel select the "Style" tab and change from default to "Solid".
Layout -> yFiles Layout -> Organic

Cytoscape allows a lot of flexibility in formatting the network appearance based on network attributes. For example, to make the node appearance dependent on whether the author is in our list, versus a common co-author that we're less interested in, we first need to add that information to the network:
File -> Import Table -> File "author_attributes.tsv".

Now, on to customizing your network appearance:
In the control panel, select "Node" style attributes.
For "Fill Color", set column to "Type", "Mapping Type" to "Discrete Mapping", "CommonCoAuthor" to grey and "MainAuthor" to yellow.
For "Size", set "CommonCoAuthor" to 10 and "MainAuthor" to 60.
For "Label Font Size", set "CommonCoAuthor" to 3 and "MainAuthor" to 30.

In the control panel, select "Edge" style attributes.
Set the "Stroke Color" to correspond to the interaction value, with a discrete mapping such that the indirect edges (value = -1) are light grey, and the direct edges (value > 0) are dark grey.
Set the edge "Width" to correspond to the interaction value, with a discrete mapping such that the indirect edges are width 1, and the direct edges are width 7.
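Because a discrete mapping assigns a style to each distinct value, every positive paper count that occurs in the network (1, 2, 3, ...) needs its own entry. The sketch below is plain Python, not a Cytoscape API: it lists the rows you would fill in by hand in the Style panel. The function name and the hex colors are assumptions chosen for illustration.

```python
def discrete_edge_styles(interaction_values):
    """Map each distinct interaction value to the stroke color and width described above."""
    styles = {}
    for value in sorted(set(interaction_values)):
        if value == -1:
            # Indirect edge through a shared co-author.
            styles[value] = {"stroke_color": "#D3D3D3", "width": 1}
        elif value > 0:
            # Direct edge: the value is the number of joint papers.
            styles[value] = {"stroke_color": "#555555", "width": 7}
    return styles


# Hypothetical interaction column from an edge file.
for value, style in discrete_edge_styles([3, -1, 1, 3]).items():
    print(value, style["stroke_color"], style["width"])
```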

Right-click to add nodes. Select nodes and right-click again to edit the node names. Using the "table panel", you can edit the attributes of the nodes and edges so that the styles you just set will apply. For example, you can set the new nodes to be "MainAuthor" so that they are displayed big and yellow.

And you're off to a good start. Remember, Cytoscape is powerful and allows for a lot of analysis and infinite customization. Few things can get your audience interested as quickly as a beautiful network.
