Tables

Tables are probably the world’s oldest statistical technique. There are, however, a lot of different kinds of tables. A dataframe is a sort of table itself, of course. A contingency table (also known as a cross tabulation, and closely related to a spreadsheet pivot table) shows the relationship between two categorical variables by counting how often each combination of their levels occurs.

Let’s get some data to work with by using the read.table() function from base R to read it from an online source. The first six variables are traits of the cars and the seventh is how the car was classified (how acceptable it was). For our own convenience let’s also put the factor levels in a meaningful order rather than leave them in alphabetical order.

# Read the comma-separated file; it has no header row, so name the columns by hand.
url <- "http://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data"
carstuff <- read.table(url, sep=",")
names(carstuff) <- c("buying", "maint", "doors", "persons", "lug.boot", "safety", "class")

# Convert to ordered factors so that tables list the levels in this order.
carstuff$buying <- factor(carstuff$buying, levels=c("vhigh","high","med","low"), ordered=TRUE)
carstuff$maint <- factor(carstuff$maint, levels=c("vhigh","high","med","low"), ordered=TRUE)
carstuff$lug.boot <- factor(carstuff$lug.boot, levels=c("small","med","big"), ordered=TRUE)
carstuff$safety <- factor(carstuff$safety, levels=c("high","med","low"), ordered=TRUE)
carstuff$class <- factor(carstuff$class, levels=c("unacc","acc","good","vgood"), ordered=TRUE)

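(The actual page is served over https. Older versions of R could not read https URLs directly, which is why the http address is used here; recent versions of R generally can.)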
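To see why the explicit levels matter, here is a minimal sketch using a made-up vector rather than the car data. The names safety, alphabetical and by_rating are only for illustration.

```r
# Toy vector (not the car data): three labels in arbitrary order.
safety <- c("low", "high", "med", "low", "high", "low")

# Default factor(): levels are sorted alphabetically.
alphabetical <- factor(safety)
levels(alphabetical)

# Explicit levels: tables will follow the order we supply.
by_rating <- factor(safety, levels = c("high", "med", "low"), ordered = TRUE)
levels(by_rating)

# Same counts, different row order.
table(alphabetical)
table(by_rating)
```

With alphabetical levels "low" lands between "high" and "med", which makes a table harder to read than it needs to be.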

Here’s a random sample of the dataframe that R made.

carstuff[sample(nrow(carstuff),5),]
     buying maint doors persons lug.boot safety class
500    high vhigh     4       4      med    med unacc
1439    low  high     3       2      big    med unacc
1215    med   low     2    more      big   high vgood
866     med vhigh     2       2    small    med unacc
549    high  high     2       2      big   high unacc
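Because the rows are drawn at random, a different five will come up on every run. If you want the same sample each time, one option is to set the random seed first. This is a small sketch on a made-up dataframe (toy is hypothetical, and the seed value 42 is arbitrary).

```r
# Hypothetical stand-in for a real dataframe.
toy <- data.frame(id = 1:20, grp = rep(c("a", "b"), times = 10))

# Setting the same seed before each draw gives the same row numbers.
set.seed(42)
first <- sample(nrow(toy), 5)
set.seed(42)
second <- sample(nrow(toy), 5)

identical(first, second)  # the two draws match
toy[first, ]              # the sampled rows
```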

Here is a contingency table showing the relationship between safety and class. This is one of the best ways to report categorical data. You can see immediately that if a car has “low” safety it is always classified as “unacc” (unacceptable). You can also see that the only way to get a “vgood” classification is to have a high safety rating.

t <- with(carstuff,table(safety,class))
t
      class
safety unacc acc good vgood
  high   277 204   30    65
  med    357 180   39     0
  low    576   0    0     0
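The two observations above can also be checked in code instead of by eye. The sketch below rebuilds the table from the counts printed above so that it runs without the download; the names counts and tab are my own.

```r
# Rebuild the safety-by-class table from the printed counts.
counts <- matrix(
  c(277, 204, 30, 65,
    357, 180, 39,  0,
    576,   0,  0,  0),
  nrow = 3, byrow = TRUE,
  dimnames = list(safety = c("high", "med", "low"),
                  class  = c("unacc", "acc", "good", "vgood"))
)
tab <- as.table(counts)

# "low" safety is always "unacc": every other cell in that row is zero.
all(tab["low", c("acc", "good", "vgood")] == 0)

# Which safety levels contain any "vgood" cars?
names(which(tab[, "vgood"] > 0))
```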

Tables can be transformed in a variety of ways in R. The addmargins() function can put any sort of margin you want onto a table. Row and column sums are the most common choice.

addmargins(t, FUN=sum)
Margins computed over dimensions
in the following order:
1: safety
2: class
      class
safety unacc  acc good vgood  sum
  high   277  204   30    65  576
  med    357  180   39     0  576
  low    576    0    0     0  576
  sum   1210  384   69    65 1728
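If you only want one set of totals, a couple of variations are worth knowing. This sketch again rebuilds the table from the printed counts (tab is my own name) and uses margin.table() and the margin argument of addmargins().

```r
# Rebuild the safety-by-class table from the printed counts.
tab <- as.table(matrix(
  c(277, 204, 30, 65,
    357, 180, 39,  0,
    576,   0,  0,  0),
  nrow = 3, byrow = TRUE,
  dimnames = list(safety = c("high", "med", "low"),
                  class  = c("unacc", "acc", "good", "vgood"))
))

# Totals on their own, without the body of the table.
margin.table(tab, 1)  # one total per safety level
margin.table(tab, 2)  # one total per class

# Add a margin along dimension 1 only: a single extra row of column totals.
with_totals <- addmargins(tab, margin = 1)
with_totals
```

When FUN is left at its default the extra row is labelled "Sum".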

This reveals some other things about the data. For example, the data set is constructed so that every variable except class is evenly split across its levels.

A proportion table, made with the prop.table() function, shows how the data is distributed more explicitly than raw counts do. By default each cell is divided by the grand total, so the whole table sums to 1, but the margin argument can be set so that the proportions are computed within each row or within each column instead. You can then add margins to the result to get even more information. All of the pieces fit together.

round(prop.table(t), 3)
      class
safety unacc   acc  good vgood
  high 0.160 0.118 0.017 0.038
  med  0.207 0.104 0.023 0.000
  low  0.333 0.000 0.000 0.000
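Here is a short sketch of the row-wise and column-wise versions, and of margins added to a proportion table. It rebuilds the table from the printed counts so it runs on its own; tab, by_row, by_col and with_margins are names I made up.

```r
# Rebuild the safety-by-class table from the printed counts.
tab <- as.table(matrix(
  c(277, 204, 30, 65,
    357, 180, 39,  0,
    576,   0,  0,  0),
  nrow = 3, byrow = TRUE,
  dimnames = list(safety = c("high", "med", "low"),
                  class  = c("unacc", "acc", "good", "vgood"))
))

# margin = 1: proportions within each safety level (each row sums to 1).
by_row <- prop.table(tab, margin = 1)
round(by_row, 3)

# margin = 2: proportions within each class (each column sums to 1).
by_col <- prop.table(tab, margin = 2)
round(by_col, 3)

# Overall proportions with row and column totals added.
with_margins <- addmargins(prop.table(tab))
round(with_margins, 3)
```

The row-wise version answers "given this safety rating, how are the cars classified?", while the column-wise version answers "given this classification, what safety ratings do the cars have?".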
