Monday, November 9, 2015

GraPhlAn: Sorta Like Circos for Python

The right tool can save you a lot of time and effort. If you need an easy-to-use Python tool for plotting data in a circular format, GraPhlAn is the way to go. That may sound strangely specific ("data in a circular format"?), so think of it as Circos for Python. Like Circos, GraPhlAn makes it easy to plot on a circle, and like Circos, it was developed to make biological information easier to display.

The authors went to great lengths to produce some excellent tutorials, and the package is easy to install and apply. The tool's intended use is to plot and annotate phylogenetic trees. While it's great at its job, it's capable of much more. Drawing another parallel (yes, a graphing pun) with Circos, while the original use was for genomics, circular plots are useful in many other contexts (see the Circos archive for examples of non-biological Circos applications). The same is true of GraPhlAn. All that's needed is to get rid of that phylogenetic tree in the middle, and you're ready to go.


Getting Rid of the Phylogenetic Tree


First, let's go back to how a GraPhlAn figure is made.
  1. Start with a phylogenetic tree (many formats are accepted).
  2. Annotate the tree (using an annotation file with a GraPhlAn-specific format).
You may notice that there's not much room for "getting rid" of the phylogenetic tree. So, the next best thing is to hide it. Suppose you were going to plot some data on the 2016 Presidential Candidates or NBA player projections ... you still need to start with a phylogenetic tree. What you'll do is create a tree with a branch for every element of your data (every presidential candidate or every NBA player).
The tree file is literally just a list of the branches I want. "Branch1" could easily be Tyrone Corbin and "Branch2" could be Ron Harper.
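A flat tree file like that can be written by hand, but a few lines of Python will do it too. This is only a sketch: it assumes GraPhlAn's plain-text tree input takes one branch name per line, and the "Branch1" ... "Branch5" labels and the write_flat_tree helper are placeholders of mine, not part of GraPhlAn.

```python
from pathlib import Path


def write_flat_tree(labels, path="custom_tree.txt"):
    """Write one branch name per line and return the text that was written."""
    text = "\n".join(labels) + "\n"
    Path(path).write_text(text)
    return text


# Placeholder labels; swap in candidate or player names as needed.
branches = [f"Branch{i}" for i in range(1, 6)]
tree_text = write_flat_tree(branches)
print(tree_text)
```

Every label ends up as its own top-level branch, which is what leaves the middle of the circle empty once the branches are hidden.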

Now, to make those branches invisible. At the top of "annotations.txt", you just set the branches and markers to have thickness/size 0. It's easy as pie (aaaaaaand, a circle pun).
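As a rough sketch of that header, the snippet below builds the two global settings as tab-separated lines. The option names branch_thickness and clade_marker_size are what I recall from the GraPhlAn documentation, so treat them as an assumption and check them against your installed version; hidden_tree_header is a hypothetical helper name.

```python
def hidden_tree_header():
    """Return global annotation lines that shrink branches and markers to nothing.

    Option names are assumed from the GraPhlAn docs; verify before relying on them.
    """
    return [
        "branch_thickness\t0",   # branch lines drawn with zero thickness
        "clade_marker_size\t0",  # node markers drawn with zero size
    ]


header = hidden_tree_header()
print("\n".join(header))
```

These lines would go first in the annotation file, ahead of any per-branch ring settings.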

And the final result is a plot with no phylogenetic tree in the middle, but with lots of outer rings for plotting anything you want.

Plotting Anything You Want


The data is embedded in the annotation file. In this example, there are two ways to display numerical data (the GraPhlAn tutorials cover more ways, if you're interested): Heatmaps and Bars.

The "heatmap" rings are set up by giving the entire ring the same color and modulating the transparency of each element around the ring (less transparent = darker color).
Each branch gets two annotation lines per ring. The first sets the ring transparency ("ring_alpha") for ring #2 (counting from inside to outside, where ring #1 is hidden) and branch "Branch51" to 0.005396837. The second sets the color for ring #2 to #AAAA00 (that greenish-gold color). This is repeated for every branch.
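Here is a hedged sketch of how those heatmap lines could be generated from raw numbers. It assumes the annotation format is tab-separated as "branch, option, ring number, value" and that the color option is called ring_color (only "ring_alpha" is named above). The min-max scaling to the 0 to 1 range is my choice, and the sample values are made up.

```python
def heatmap_lines(values, ring, color):
    """Build ring_alpha / ring_color annotation lines for one heatmap ring.

    values: dict mapping branch name -> raw number (hypothetical data).
    Alpha is min-max scaled so the largest value is fully opaque.
    """
    lo, hi = min(values.values()), max(values.values())
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values match
    lines = []
    for branch, value in values.items():
        alpha = (value - lo) / span
        lines.append(f"{branch}\tring_alpha\t{ring}\t{alpha:.6f}")
        lines.append(f"{branch}\tring_color\t{ring}\t{color}")
    return lines


sample = {"Branch1": 2.0, "Branch2": 4.0, "Branch3": 6.0}
heat = heatmap_lines(sample, ring=2, color="#AAAA00")
print("\n".join(heat))
```

With this scaling, the smallest value in a ring becomes fully transparent, so you may prefer a small floor on alpha if every element should stay visible.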

The tall black bars around the outer edge are done in a different way. Here, the height of each ring element is modulated.
For example, the height for the element of ring #5 that corresponds to "Branch51" is set to 2.3946... The default color is black. There's no reason you couldn't combine the heatmap and barchart methods to double up the information content of each ring.
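The bar lines can be sketched the same way. I'm assuming the option is named ring_height and follows the same tab-separated "branch, option, ring number, value" layout; the heights below are invented for illustration.

```python
def bar_lines(heights, ring):
    """Build one ring_height annotation line per branch (option name assumed)."""
    return [
        f"{branch}\tring_height\t{ring}\t{height:.4f}"
        for branch, height in heights.items()
    ]


sample_heights = {"Branch1": 1.5, "Branch2": 0.25}  # hypothetical data
bars = bar_lines(sample_heights, ring=5)
print("\n".join(bars))
```

Since no color is set here, the bars would fall back to the default black described above.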

Having created an input "phylogenetic tree" file and an accompanying annotations file, you're ready to plot. As per the GraPhlAn tutorial (and having installed GraPhlAn), you would then run the following commands:
$ graphlan_annotate.py --annot annotations.txt custom_tree.txt no_tree.xml
$ graphlan.py no_tree.xml no_tree.png --dpi 300 --size 3.5
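If you'd rather drive both steps from Python (say, at the end of the script that writes the tree and annotation files), something like the following should work. It's a sketch that simply shells out to the same two commands with the same arguments; run_pipeline is a name I made up, and it assumes both scripts are on your PATH.

```python
import shutil
import subprocess

# The same two commands as above, split into argument lists.
annotate_cmd = ["graphlan_annotate.py", "--annot", "annotations.txt",
                "custom_tree.txt", "no_tree.xml"]
plot_cmd = ["graphlan.py", "no_tree.xml", "no_tree.png",
            "--dpi", "300", "--size", "3.5"]


def run_pipeline():
    """Run the annotate step, then the plot step, stopping on the first failure."""
    for cmd in (annotate_cmd, plot_cmd):
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found on PATH")
        subprocess.run(cmd, check=True)
```

Calling run_pipeline() after the input files are written would then produce no_tree.png.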

Resources

The code for this example is available at the Eats, Graphs and Leaves code repository.

For more GraPhlAn information, see the associated publication, and the Segata lab website.
