Monday, March 30, 2015

Simple Maps with ggplot2

As part of a case competition I recently participated in, our team was struggling to put together a convincing argument for replacing NIH review sessions (a peer-review system for dispensing NIH research funds) with randomly assigned grants. With only hours left before the deadline, we needed a quick way to display geographical data demonstrating uneven (*cough* biased) grant distribution under the current system. I won't make the case for that idea here (we did not win the case competition :), but along the way I came across a simple method in R that uses the ggplot2 and maps packages. I was able to go from zero to map in under thirty minutes. I was impressed, and you may find this useful the next time you try to stick it to the man.

The R script and data can be downloaded from our Bitbucket repository.

Step one, read in your data:

I pulled some data on recent NIH grants (all sizes), state populations, and the number of universities per state. I compiled this information into a file, "nih_funding.txt", with two added columns: the amount of NIH funding per individual in each state, and the amount of NIH funding per university in each state.
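
If you are building a similar file yourself, the two derived columns are just divisions. Here is a minimal sketch with made-up placeholder numbers (not real NIH or census figures). The raw column names "funding", "population", and "institutions" are my own inventions for illustration, and I am assuming the dotted column names used later in this post come from R converting spaces in the header to dots when the file is read:

```r
# Placeholder values only: NOT real NIH, population, or university figures.
raw <- data.frame(
  LOCATION     = c("Alpha", "Beta", "Gamma"),  # stand-in names, not real states
  funding      = c(1200, 300, 4500),           # hypothetical raw column
  population   = c(400, 150, 900),             # hypothetical raw column
  institutions = c(6, 2, 15),                  # hypothetical raw column
  check.names  = FALSE
)

# Derived columns, named with spaces the way a spreadsheet export might be.
raw[["NIH Funding per person"]]      <- raw$funding / raw$population
raw[["NIH Funding per institution"]] <- raw$funding / raw$institutions

# Write a tab-separated file, then read it back the same way the script does.
path <- tempfile(fileext = ".txt")
write.table(raw, path, sep = "\t", row.names = FALSE, quote = FALSE)
toy <- read.table(path, header = TRUE, sep = "\t")

# read.table() runs the header through make.names(), so spaces become dots.
print(names(toy))
```

If your own header already uses dots or underscores, the names will come through unchanged.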

 
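 library(ggplot2)  # map_data() below also requires the 'maps' package to be installed
 # Tab-separated file with a header row
 nih_data = read.table('nih_funding.txt', header=TRUE, sep='\t')
 # The state map uses lower-case state names, so match that here
 nih_data$LOCATION = tolower(nih_data$LOCATION)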
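
The "tolower()" call matters because states are matched to map regions by name, and a name that does not match simply goes undrawn without any error. A quick check along these lines can catch that early. This is only a sketch: "unmatched_states" is a hypothetical helper of mine, and I use R's built-in "state.name" as a stand-in for the map's region names so the snippet runs on its own. In the real script you would pass the "region" column of the map data loaded in step two instead, since that map covers the lower 48 states plus the District of Columbia (so Alaska and Hawaii will not appear).

```r
# Hypothetical helper: which of my state names have no matching map region?
unmatched_states <- function(locations, regions) {
  setdiff(unique(tolower(locations)), unique(regions))
}

# Stand-in for unique(states_map$region); in the real script pass that instead.
regions <- tolower(state.name)

# Anything printed here would be missing from the map.
print(unmatched_states(c("Ohio", "New York", "Puerto Rico"), regions))
```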


Step two, plot your data:

First, plot NIH funding per university:

 
 # NIH.Funding.per.institution  
 states_map <- map_data("state")  
 m = ggplot(nih_data, aes(map_id = LOCATION)) +   
   geom_map(aes(fill = NIH.Funding.per.institution ), map = states_map) +   
   expand_limits(x = states_map$long, y = states_map$lat) +  
   theme_bw() +   
   theme(axis.title = element_blank(), axis.text=element_blank()) +  
   ggtitle("NIH Funding per Institution by State")  
 print(m)   
 ggsave("NIH_funding_by_institution.jpg", plot=m, width=8, height=4)  
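
Because we're using ggplot2, the image is constructed layer by layer. First, a ggplot object is created; then a "geom_map()" layer is added, which draws each state and fills it according to the funding column. The map here is the United States state map returned by map_data("state"), which ggplot2 pulls from the maps package. "expand_limits()" sets the axis ranges to span the map's longitude and latitude, which "geom_map()" does not do on its own. The "theme_bw()" function removes the gray background. The "theme()" function removes the axis titles and tick labels. "ggtitle()" (this may come as a surprise) adds a title to your image.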

Run this, and your map should come out looking like this:



Then plot NIH funding per person:

 
 # NIH.Funding.per.person  
 states_map <- map_data("state")  
 m = ggplot(nih_data, aes(map_id = LOCATION)) +   
   geom_map(aes(fill = NIH.Funding.per.person ), map = states_map) +   
   expand_limits(x = states_map$long, y = states_map$lat) +  
   theme_bw() +   
   theme(axis.title = element_blank(), axis.text=element_blank()) +  
   ggtitle("NIH Funding per Person by State")  
 print(m)   
 ggsave("NIH_funding_by_population.jpg", plot=m, width=8, height=4)  
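
Since the two plots differ only in the fill column and the title, the shared layers could be wrapped in a small function. The sketch below is one way to do that, not what the downloadable script does: "plot_state_map" is a hypothetical helper, the toy data frame holds placeholder values rather than real funding figures, and the ".data[[...]]" lookup assumes ggplot2 3.0.0 or later.

```r
library(ggplot2)  # map_data() also needs the 'maps' package installed

# Hypothetical helper: same layers as the plots above, with the fill column
# and title passed in as arguments.
plot_state_map <- function(data, fill_col, title,
                           states_map = map_data("state")) {
  ggplot(data, aes(map_id = LOCATION)) +
    geom_map(aes(fill = .data[[fill_col]]), map = states_map) +
    expand_limits(x = states_map$long, y = states_map$lat) +
    theme_bw() +
    theme(axis.title = element_blank(), axis.text = element_blank()) +
    ggtitle(title)
}

# Placeholder values only, NOT real funding figures.
toy <- data.frame(
  LOCATION = c("ohio", "texas", "maine"),
  NIH.Funding.per.person = c(1, 2, 3)
)

p <- plot_state_map(toy, "NIH.Funding.per.person", "Toy example")
```

With the real data you would call it once per column, for example plot_state_map(nih_data, "NIH.Funding.per.institution", "NIH Funding per Institution by State"), and pass the result to "print()" and "ggsave()" as before.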

Notice how some states seem to receive more federal research money than average relative to their population and number of universities. This is not an in-depth analysis, so there may be good reasons for this apparent discrepancy. The real takeaway here is that in just two steps you find yourself staring at a beautiful map. Not bad for a day's work.
