Introduction to R – Functions

We’re going to make a function of our own today, get some more practice with vectorization, and learn about making detailed plots.

Most of the code used in John Kruschke’s book “Doing Bayesian Data Analysis” is publicly available online, and this program adapts his RunningProportion example.
http://www.indiana.edu/~kruschke/DoingBayesianDataAnalysis/Programs/RunningProportion.R

Our goal is to generalize his work and make it into a convenient function that we can call when we like.

Creating our own function is only slightly different from using the functions we’ve already seen. The first line:

coinAvg <- function(N=500, p=.5, se=FALSE) {

The name of our new function will be coinAvg and it is a function with three arguments. Unlike in previous sections we get to make up our own arguments. N will be the number of times we flip the coin, p will be the probability of heads, and se will tell the program whether to show the standard error of the mean.

The curly bracket at the end of that line of code opens up the new function. All of the things done by coinAvg are described inside these brackets. Fortunately, we get to use exactly the same R code we’ve already learned.
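If the function() syntax itself is unfamiliar, here is a throwaway example of the same pattern before we dive in: a name, arguments with default values, and a body wrapped in curly brackets. (addTwo is just a made-up name for illustration.)

addTwo <- function(a=1, b=2) {
	a + b
}

addTwo()      # returns 3, using both defaults
addTwo(a=10)  # returns 12, overriding only a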

Our first line of code within the function introduces a new function called sample(). We tell it to take a sample of 0s and 1s. The probability of picking 0 is 1-p and the probability of picking 1 is p (by tradition Heads is represented by 1). The size of the sample is N, the argument we defined before, and we sample with replacement.

By default this will result in picking 0 and 1 with equal probability 500 times.

flipsequence <- sample(x=c(0,1), prob=c(1-p, p), size=N, replace=TRUE)
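If you want to see what sample() produces on its own, try a tiny version at the console. The exact string of 0s and 1s will differ every time because the draws are random:

sample(x=c(0,1), prob=c(.5,.5), size=10, replace=TRUE)
# e.g. 0 1 1 0 1 0 0 1 1 0 (your ten flips will vary)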

Next we’re going to calculate the proportion of Heads that we’ve flipped. First we define a vector that goes from 1 to N. Second we define a vector that uses cumsum() to add up the flips (notice that the running sum increases by 1 whenever we get Heads) and divides by n. This means we calculate the proportion of Heads after each flip of the coin.

n <- 1:N
runprop <- cumsum(flipsequence)/n
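A tiny worked example makes the running proportion concrete. Suppose the first four flips came up Heads, Tails, Heads, Heads:

cumsum(c(1, 0, 1, 1))          # 1 1 2 3, the running count of Heads
cumsum(c(1, 0, 1, 1)) / (1:4)  # 1.0 0.5 0.667 0.75 (rounded), the running proportion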

This is the end of the math! In just three lines we’ve done all the calculations. What remains is creating an image.

We’re going to use a lot of new arguments to the plot() function here, but they’re mostly self-explanatory.

plot(n, runprop, type="o", log="x",
	  xlim=c(1,N), ylim=c(0.0,1.0),
	  xlab="Number of Flips", ylab="Proportion of Heads", cex.lab=1.2,
	  main="Running Proportion of Heads", cex.main=1.2)

The x-axis is n and the y-axis is runprop; they’re not named because, as we’ve seen before, those are the default positions for x and y.
The type of plot is “o” which tells R to draw each point and connect it to the next with a line.
The argument log="x" tells R that the x-axis should be drawn on a logarithmic scale, which lets us see the first few flips individually while still showing the whole sequence.
The arguments xlim and ylim place limits on the axes so the plot stays on target.
The arguments xlab, ylab, and main give names to the various parts of the plot.
Finally, cex.lab and cex.main set the size of the axis labels and the title text.

We still have things to add!

A horizontal line that shows us where the proportion of Heads should end up. Setting the lty argument to 3 creates a dotted line.

abline(h=p, lty=3)

Some text that tells us exactly what proportion of heads we ended up with and exactly what the true probability was.

text(x=N, y=.9, paste("Final Proportion =", runprop[N]), adj=c(1,0), cex=1.2)
text(x=N, y=.8, paste("True Probability =", p), adj=c(1,0), cex=1.2)
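The paste() function simply glues its arguments together into a single string, which is what text() then writes onto the plot. For example:

paste("True Probability =", .5)
# [1] "True Probability = 0.5"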

Now remember how we defined an argument called se? Good, because we did. Here we write an if() statement that only runs when se is set to TRUE. It calculates the standard error after each flip and then draws the upper and lower standard error lines.

if (se==TRUE) {
	err <- sqrt((p*(1-p))/n)
	herr <- runprop + err
	lerr <- runprop - err
	lines(herr,col='red')
	lines(lerr,col='red')
	}

Notice that the if() statement uses curly brackets just like the function itself. The equation for the standard error here, sqrt(p*(1-p)/n), is specific to experiments with exactly two outcomes, like coin flips, known as Bernoulli trials.

Finally we make a dataframe that holds the information we’ve been using. The invisible() function will prevent it from printing out unless we ask for it.

invisible(data.frame(flip=flipsequence, runprop=runprop))
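Because the result is returned invisibly, calling coinAvg() by itself won’t print the data frame, but you can capture it by assigning the output to a name, e.g.:

flips <- coinAvg(N=100)
head(flips)   # first few rows: the individual flips and the running proportion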

Oh and finally we need a curly bracket to end the function!

}

So the whole thing looks like this:

coinAvg <- function(N=500, p=.5, se=FALSE) {

flipsequence <- sample(x=c(0,1), prob=c(1-p, p), size=N, replace=TRUE)

n <- 1:N
runprop <- cumsum(flipsequence)/n

plot(n, runprop, type="o", log="x",
	  xlim=c(1,N), ylim=c(0.0,1.0),
	  xlab="Number of Flips", ylab="Proportion of Heads", cex.lab=1.2,
	  main="Running Proportion of Heads", cex.main=1.2)

abline(h=p, lty=3)

text(x=N, y=.9, paste("Final Proportion =", runprop[N]), adj=c(1,0), cex=1.2)
text(x=N, y=.8, paste("True Probability =", p), adj=c(1,0), cex=1.2)

if (se==TRUE) {
	err <- sqrt((p*(1-p))/n)
	herr <- runprop + err
	lerr <- runprop - err
	lines(herr,col='red')
	lines(lerr,col='red')
	}

invisible(data.frame(flip=flipsequence, runprop=runprop))
}

Let’s try it out!

coinAvg()

[Plot: running proportion of heads for the default call]

And now after changing the arguments:

coinAvg(N=1000, p=.65, se=TRUE)

[Plot: running proportion of heads for 1,000 flips with p = .65 and standard error bands]

Running the function only takes a moment, so you can do it a bunch of times to see the law of large numbers in action. The final proportion sometimes misses where it “should” be, but after hundreds of flips it tends to be pretty close. At the same time the standard error, which measures the uncertainty of our estimate, shrinks. By playing around with the number of flips you can see that enormous numbers of flips add a relatively small amount of precision.
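You can check that last point directly with the same Bernoulli formula the function uses (these numbers assume p = .5):

sqrt(.5*.5/100)    # about 0.050 after 100 flips
sqrt(.5*.5/1000)   # about 0.016 after 1,000 flips
sqrt(.5*.5/10000)  # about 0.005 after 10,000 flips

Each factor of ten cuts the standard error by the same proportion, but the absolute gain keeps getting smaller.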

Completing this walkthrough of creating a simple program brings you to the end of the Absolute Introduction to R! Future posts will use R to talk about concepts in statistics and data science.