Tables are probably the world’s oldest statistical technique. There are, however, a lot of different kinds of tables. A dataframe is a sort of table itself, of course. A contingency table (also known as a cross tabulation, and closely related to a pivot table) shows the relationship between two categorical variables by counting how many observations fall into each combination of their levels.
Let’s get some data to work with by using the read.table() function from base R to read it from an online source. The first six variables are traits of the cars and the seventh is how each car was classified (how acceptable it was). For our own convenience, let’s also put the factor levels in a meaningful order rather than leave them in alphabetical order.
url <- "http://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data"
carstuff <- read.table(url, sep=",")
names(carstuff) <- c("buying", "maint", "doors", "persons", "lug.boot", "safety", "class")
carstuff$buying <- factor(carstuff$buying, levels=c("vhigh","high","med","low"), ordered=TRUE)
carstuff$maint <- factor(carstuff$maint, levels=c("vhigh","high","med","low"), ordered=TRUE)
carstuff$lug.boot <- factor(carstuff$lug.boot, levels=c("small","med","big"), ordered=TRUE)
carstuff$safety <- factor(carstuff$safety, levels=c("high","med","low"), ordered=TRUE)
carstuff$class <- factor(carstuff$class, levels=c("unacc","acc","good","vgood"), ordered=TRUE)
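(The actual page starts with https. Older versions of R could not read that properly, which is why the http address is used here; recent versions generally handle https URLs.)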
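To see what the explicit levels buy us, here is a minimal sketch on a made-up vector (the values in `safety_raw` are invented for illustration and are not taken from the car data). It compares the default alphabetical levels with an explicit, ordered set of levels like the one used above.

```r
# Toy vector standing in for a column such as carstuff$safety (values are made up).
safety_raw <- c("low", "high", "med", "high", "low", "med", "high")

# Default factor(): the levels come out sorted alphabetically.
safety_alpha <- factor(safety_raw)
levels(safety_alpha)

# Explicit levels plus ordered=TRUE, in the same style as the code above.
safety_ord <- factor(safety_raw, levels = c("high", "med", "low"), ordered = TRUE)
levels(safety_ord)

# table() lays out its counts in level order, so the two differ in layout only.
table(safety_alpha)
table(safety_ord)

# Ordered factors also support comparisons. "<" follows the level order,
# so with these levels "high" counts as the smallest value.
safety_ord < "low"
```

The counts are the same either way. Only the order of the rows and columns in later tables changes, which is the whole point of setting the levels by hand.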
Here’s a random sample of the dataframe that R made.
carstuff[sample(nrow(carstuff),5),]
     buying maint doors persons lug.boot safety class
500    high vhigh     4       4      med    med unacc
1439    low  high     3       2      big    med unacc
1215    med   low     2    more      big   high vgood
866     med vhigh     2       2    small    med unacc
549    high  high     2       2      big   high unacc
Here is a contingency table showing the relationship between safety and class. This is one of the best ways to report categorical data. You can see immediately that if a car has “low” safety it is always classified as “unacc” (unacceptable). You can also see that the only way to get a “vgood” classification is to have a high safety rating.
t <- with(carstuff,table(safety,class))
t
      class
safety unacc acc good vgood
  high   277 204   30    65
  med    357 180   39     0
  low    576   0    0     0
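A table object can also be queried directly, which is handy for checking by code what you spotted by eye. The sketch below uses a tiny invented dataframe (`toy` is hypothetical and is not the car data) to show table(), the formula-based xtabs() alternative, and indexing by level name.

```r
# Made-up miniature data for illustration only, not the real car data.
toy <- data.frame(
  safety = factor(c("high", "high", "med", "low", "low", "med", "high", "low"),
                  levels = c("high", "med", "low")),
  class  = factor(c("acc", "vgood", "acc", "unacc", "unacc", "unacc", "unacc", "unacc"),
                  levels = c("unacc", "acc", "vgood"))
)

# Same pattern as above: count every combination of the two factors.
tab <- with(toy, table(safety, class))
tab

# xtabs() builds the same counts from a formula.
tab2 <- xtabs(~ safety + class, data = toy)

# Index by level name to pull out a single row or a single cell.
tab["low", ]
tab["high", "vgood"]

# Check "low safety is always unacc": every other cell in that row must be zero.
low_always_unacc <- all(tab["low", colnames(tab) != "unacc"] == 0)
low_always_unacc
```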
Tables can be transformed in a variety of ways in R. The addmargins() function can put any sort of margin you want onto a table. The sum of each column and row is a common one.
addmargins(t, FUN=sum)
Margins computed over dimensions
in the following order:
1: safety
2: class
      class
safety unacc  acc good vgood  sum
  high   277  204   30    65  576
  med    357  180   39     0  576
  low    576    0    0     0  576
  sum   1210  384   69    65 1728
This reveals some other things about the data. For example, every safety level has exactly 576 cars. In fact this data is constructed so that every variable except class is evenly split between its levels.
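You don’t have to add both margins at once. As a rough sketch, the code below rebuilds the safety by class counts by hand (typed in from the table above so that it runs without downloading anything) and then uses the margin argument of addmargins(), along with margin.table(), to get one set of totals at a time.

```r
# Counts typed in from the safety-by-class table above.
counts <- matrix(c(277, 204, 30, 65,
                   357, 180, 39,  0,
                   576,   0,  0,  0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(safety = c("high", "med", "low"),
                                 class  = c("unacc", "acc", "good", "vgood")))
tab <- as.table(counts)

# Just the totals, without the body of the table.
margin.table(tab, 1)   # total per safety level
margin.table(tab, 2)   # total per class

# margin = 1 sums over the safety dimension: one extra row of column totals.
addmargins(tab, margin = 1)

# margin = 2 sums over the class dimension: one extra column of row totals.
addmargins(tab, margin = 2)

# FUN does not have to be sum. Here the extra column holds each row's largest count.
addmargins(tab, margin = 2, FUN = max)
```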
A proportion table, made with the prop.table() function, shows how the data is distributed more directly than raw counts do. By default each cell is divided by the grand total, so the whole table sums to 1. Setting margin=1 makes each row sum to 1 instead, and margin=2 does the same for each column.
And then you can add margins to get more information about that! All of the pieces fit together.
round(prop.table(t), 3)
      class
safety unacc   acc  good vgood
  high 0.160 0.118 0.017 0.038
  med  0.207 0.104 0.023 0.000
  low  0.333 0.000 0.000 0.000
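The row and column versions mentioned earlier answer different questions, so it is worth seeing them side by side. This sketch again types the counts in by hand from the safety by class table (so it runs on its own) and then combines prop.table() with addmargins().

```r
# Counts typed in from the safety-by-class table above.
counts <- matrix(c(277, 204, 30, 65,
                   357, 180, 39,  0,
                   576,   0,  0,  0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(safety = c("high", "med", "low"),
                                 class  = c("unacc", "acc", "good", "vgood")))
tab <- as.table(counts)

# margin = 1: each row sums to 1.
# Reads as "given this safety level, how were the cars classified?"
by_row <- prop.table(tab, margin = 1)
round(by_row, 3)

# margin = 2: each column sums to 1.
# Reads as "given this class, what safety levels did the cars have?"
by_col <- prop.table(tab, margin = 2)
round(by_col, 3)

# Margins on the overall proportions. The bottom right cell is the grand total, 1.
with_margins <- addmargins(prop.table(tab))
round(with_margins, 3)
```

In the row version the “low” row is 1 under “unacc” and 0 everywhere else, and in the column version the “vgood” column is 1 in the “high” row, which restates the two observations made from the raw counts.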