Dots and Dynamite

For this post we’ll be looking at the PlantGrowth data that ships with R, which compares the growth of thirty plants: ten in a control group and ten in each of two experimental groups. The final weight of each plant and the group it was in are recorded in the dataframe.

head(PlantGrowth)
  weight group
1   4.17  ctrl
2   5.58  ctrl
3   5.18  ctrl
4   6.11  ctrl
5   4.50  ctrl
6   4.61  ctrl

Here’s a shockingly contentious question: What is the best way to show the mean weights of the various groups?

There are a lot of answers to this. In many fields a bar plot is used to show the mean, with whisker-like error bars to show either the standard error or the standard deviation. The standard deviation describes how spread out the individual weights are, while the standard error estimates how far the sample mean is likely to be from the true mean.
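To make the distinction concrete, here is a minimal sketch for the control group only. The variable names (ctrl, ctrl_sd, ctrl_se) are just illustrative; the only assumption is the usual formula, standard error = standard deviation / sqrt(n).

```r
# PlantGrowth ships with R, so no extra packages are needed.
ctrl <- PlantGrowth$weight[PlantGrowth$group == "ctrl"]

n       <- length(ctrl)        # number of plants in the control group
ctrl_sd <- sd(ctrl)            # spread of the individual plant weights
ctrl_se <- ctrl_sd / sqrt(n)   # uncertainty in the group mean

c(mean = mean(ctrl), sd = ctrl_sd, se = ctrl_se)
```

The standard error is always smaller than the standard deviation by a factor of sqrt(n), which is why it matters which of the two the whiskers show.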

Let’s take a look at this method, showing the standard error.
[Figure: meanbar, a bar plot of the group means with standard error bars]

Here’s the problem with this. Think about what the bar in this graphic is supposed to represent. Have you figured it out? It doesn’t represent anything. Frankly, that’s a problem. In a traditional barplot or histogram the bar is there to represent what we think of as mass. The graphic builds upward from zero, and the bars are like towers where the top is the highest point. The center of the data is a pure abstraction; it is not at all like the highest point of a tower.

People who aren’t fans of this method like to call this kind of image a dynamite plot because it looks like an old-fashioned detonator box.

A commonly proposed alternative is to use dotplots.

Here’s what a dot version of the same information looks like, again with the standard error.

[Figure: meandot, a dot plot of the group means with standard error bars]

Certainly this has the advantage of lacking the meaningless, and potentially confusing, bar. The criticism most frequently leveled at this technique is that it magnifies the difference between groups. In fact, ggplot2 (which we are using here) fits the y axis to the range of the data by default, so the differences will be shown as large as possible. While it is admittedly true that this can be a problem, it isn’t clear that anchoring the plot at zero is any better. After all, a barplot will shrink the apparent differences between means. Besides, the whole reason the standard error exists is to help us make appropriate comparisons.
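If you want to see the two choices side by side, here is a rough sketch. It rebuilds the summary table so it runs on its own, and it uses expand_limits() from ggplot2 to pull zero into the y axis; the object names (dots, dots_zero) are just placeholders.

```r
library(ggplot2)

# Rebuild the group means and standard errors so this block is self-contained.
m  <- with(PlantGrowth, tapply(weight, group, mean))
se <- with(PlantGrowth, tapply(weight, group, sd)) / sqrt(10)
plants <- data.frame(group = names(m), weight = as.vector(m), se = as.vector(se))

# Default behaviour: the y axis is fitted to the data, so gaps look large.
dots <- ggplot(plants, aes(x = group, y = weight,
                           ymin = weight - se, ymax = weight + se)) +
  geom_errorbar(width = .3) +
  geom_point(size = 6)

# Anchored at zero: same data, but the gaps between means look much smaller.
dots_zero <- dots + expand_limits(y = 0)

dots
dots_zero
```

Neither view is the "true" one. The data are identical, and the error bars are what tell you whether a gap is worth taking seriously.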

Making the PlantGrowth data work for us takes a little bit of effort. First we need the mean and standard error of each group. We’ll also extract the levels() of the group variable.

se <- with(PlantGrowth,tapply(weight,group,sd))/sqrt(10)
     ctrl      trt1      trt2 
0.1843897 0.2509823 0.1399540

m <- with(PlantGrowth,tapply(weight,group,mean))
 ctrl  trt1  trt2 
5.032 4.661 5.526

group <- levels(PlantGrowth$group)
"ctrl" "trt1" "trt2"
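The sqrt(10) above works because every group happens to have ten plants. As a hedged alternative, here is a sketch that takes the group size from the data instead. The std_err() helper and the summary_tbl name are made up for this example and are not part of base R.

```r
# Hypothetical helper: standard error of the mean for one group.
std_err <- function(x) sd(x) / sqrt(length(x))

# tapply() returns results in factor-level order, which matches levels().
summary_tbl <- data.frame(
  group  = levels(PlantGrowth$group),
  weight = as.vector(tapply(PlantGrowth$weight, PlantGrowth$group, mean)),
  se     = as.vector(tapply(PlantGrowth$weight, PlantGrowth$group, std_err))
)

summary_tbl
```

This gives the same numbers for PlantGrowth, but it would keep working if the groups were different sizes.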

Now we put them all together into a dataframe that ggplot2 can read. As a new trick, let’s also store all of the shared styling in a ggplot object called base. This makes it very easy to reuse later: we just add a different geom to it for each plot.

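library(ggplot2)

# m holds the group means; name the column weight to match the aes() mapping
plants <- data.frame(weight = m, se, group)

# on ggplot2 3.4.0 and later, line widths are set with linewidth rather than size
base <- ggplot(plants,aes(x=group,y=weight,ymax=weight+se,ymin=weight-se)) +
	geom_errorbar(width=.3,size=.8) + 
	theme_bw() +
	theme(panel.grid.major = element_line(size = 1))

# stat='identity' draws bars at the supplied means instead of counting rows
base + geom_bar(stat='identity',fill='black')

base + geom_point(size=6)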
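For what it’s worth, ggplot2 can also do the summarising itself. The sketch below is an alternative to the approach above, not a replacement for it: it feeds the raw PlantGrowth data to stat_summary() and uses ggplot2’s mean_se() to get the mean plus and minus one standard error. The raw_dots name is just a placeholder.

```r
library(ggplot2)

# mean_se() returns y (the mean) plus ymin and ymax (mean -/+ one standard error).
# The "pointrange" geom draws a dot with a vertical line, without whisker caps.
raw_dots <- ggplot(PlantGrowth, aes(x = group, y = weight)) +
  stat_summary(fun.data = mean_se, geom = "pointrange") +
  theme_bw()

raw_dots
```

The trade-off is that the summary numbers never appear in a dataframe you can inspect, so building the table by hand is still handy when you want to check them.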

In our next post we’ll take a look at boxplots and violin plots again. How do they stack up with dotplots and barplots?
