Reporting Mean and SE

Did you know you can put ggplot2 code inside of a function? Even though the grammar is different it is still R code

Before you try it out make a script with the mean.se() function we made last time and save it with the rest of your R code under the name “meanse.R” so we can reference it whenever we want.

Let’s make a function that creates the kind of graphics we’ve been looking at this week. A violin plot and a shaded boxplot are good. I think a dotplot is also worth including, we’ll tell ggplot2 to zoom out to the range of the data so we can see the variability in a reasonable way. As great as graphics are it always responsible to report the exact numbers as well so we want the function to output the summary stats that we’re using as well.

Since this post is all about the function we’ll go through it in pieces then gather it all together at the end.

Our first line defines the function. It will be named report() and show the use where to put the variable column, the grouping column, and to define the type of image. Finally the use can decide what interval size they want.

report <- function(variable, groups, type="violin", int=1) {

To save a lot of typing through the rest of the function I rename variable and group to a and b.

a <- variable
b <- groups

The require() function is like the library() function but it checks if ggplot2 is already loaded. Saves a bit of time when running the function. Meanwhile the source() function will read the script we saved elsewhere.

require(ggplot2)
source("meanse.R")

Now we make two dataframes for ggplot2 to use later on. First we load the variable and the groups into their own dataframe then we use mean.se to extract the mean and standard error for us.

dat <- data.frame(a, b)
datse <- mean.se(a, b, int = int)

This is just a polite line of code that reminds the user they’re showing something other than one SE.

if(int != 1) {
	warning(paste("Showing and reporting",int,"standard errors"), .call = FALSE)
}

Like we’ve done in previous posts we’ll build a ggplot object that contains the base information we want. This way we don’t have to rewrite thd information.

p <- ggplot(data = dat, aes(x=b, y=a)) + 
	theme_bw() + 
	theme(panel.grid.major = element_line(size = 1)) +
	labs(x="Group", y="Variable")

Here we have a very ugly block of code that creates, but doesn’t print, a violin plot. We only want the object created so we can extract the range of the plot. Later on this will be used to help make the dotplot look much neater.

q <- p + geom_violin()
ran <- ggplot_build(q)$panel$ranges[[1]]$y.range

These three if() statements will run depending on what the user sets the ‘type’ argument to. Both the violin plot and boxplot should be familiar. For the dot plot we use the datse dataframe we made to define errorbars and a large dot at the mean.

if(type=="violin") {
	p <- p + geom_violin(size=1) +
		geom_pointrange(data = datse,
			aes(x=group, y=m, ymax=m+se, ymin=m-se),
			size=1)
}
	if(type=="box") {
	p <- p + geom_boxplot(size=1) +
		geom_crossbar(data = datse, 
			aes(x=group, y=m, ymin=m-se, ymax=m+se), 
			color=NA, fill="grey", alpha=.5, width=.75)
}
if(type=="dot") {
	p <- p + geom_errorbar(data = datse,
			aes(x=group, y=m, ymax=m+se, ymin=m-se),
			size=1, width=.3) +
		geom_point(data = datse,
			aes(x=group, y=m),
			size=4) +
		coord_cartesian(ylim=range(ran))
	}

Finally we get to print our graphics. Hooray! We used print() once before and recall that it forces the function to output something without ending the function.

print(p)

The last coding line of he function will be returned as the official output.

datse

Don’t forget to close the function or everything will go wrong.

}

Here’s the whole thing:

report &lt;- function(variable, groups, type=&quot;dot&quot;, int=1) {
	a &lt;- variable
	b &lt;- groups

	require(ggplot2)
	source(&quot;meanse.R&quot;)

	dat &lt;- data.frame(a,b)
	datse &lt;- mean.se(a,b,int=int)

	if(int != 1) {
		warning(paste(&quot;Showing and reporting&quot;,int,&quot;standard errors&quot;), .call=FALSE)
	}
	
	p &lt;- ggplot(data = dat, aes(x=b,y=a)) + 
			theme_bw() + 
			theme(panel.grid.major = element_line(size = 1)) +
			labs(x=&quot;Group&quot;,y=&quot;Variable&quot;)

	q &lt;- p + geom_violin()
	ran &lt;- ggplot_build(q)$panel$ranges[[1]]$y.range

	if(type==&quot;violin&quot;) {
		p &lt;- p + 
			geom_violin(size=1) +
			geom_pointrange(data = datse,
				aes(x=group,y=m,ymax=m+se,ymin=m-se),
				size=1)
	}

	if(type==&quot;box&quot;) {
	p &lt;- p + 
		geom_boxplot(size=1) +
		geom_crossbar(data = datse, 
			aes(x=group,y=m,ymin=m-se,ymax=m+se), 
			color=NA, fill=&quot;grey&quot;, alpha=.5, width=.75)
	}

	if(type==&quot;dot&quot;) {
		p &lt;- p + 
			geom_errorbar(data = datse,
				aes(x=group,y=m,ymax=m+se,ymin=m-se),
				size=1, width=.3) +
			geom_point(data = datse,
				aes(x=group,y=m),
				size=2.5) +
			coord_cartesian(ylim=range(ran))
	}
print(p)
datse
}

You can save this as its own script so you can source() it later.

Let's try it out for ourselves with the <em>iris</em> data. Let's make boxplot of the Sepal Widths and show the 95% confidence interval of the mean (two standard errors).


with(iris,report(Sepal.Width,Species,type='box',int=2))

irisbox

Leave a comment