More With Tables

Tables can also be used to make graphics in R. A traditional bar plot, showing the count for each level of a variable, is most easily made by passing a one-dimensional table to barplot().

Using the baseball player data from before:

barplot(table(tab$position))

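The baseball data isn't reproduced here, so the sketch below uses a small, made-up `position` vector (hypothetical values) in place of `tab$position`. Calling barplot() on the one-dimensional table draws one bar per level, and barplot() also invisibly returns the bar midpoints, which can be handy for adding labels later.

```r
# Hypothetical player positions, standing in for tab$position
position <- c("P", "C", "1B", "P", "SS", "P", "C", "OF", "OF", "OF")

# One-dimensional table: one count per position
counts <- table(position)
print(counts)

# Draw on a null PDF device so the sketch also runs without a screen
pdf(NULL)
# barplot() draws one bar per level and invisibly returns the bar midpoints
mids <- barplot(counts, xlab = "Position", ylab = "Number of players")
invisible(dev.off())
```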

If you call plot() on a table that has two or more dimensions, you get a mosaic plot instead, which can be very difficult to read.
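A minimal sketch using the built-in mtcars data (not the baseball data) shows the mosaic plot that plot() produces for a two-dimensional table, plus one alternative that is often easier to read: barplot() with beside = TRUE, which draws grouped bars. The variable name cyl_gear is just for illustration.

```r
# Two-dimensional table from the built-in mtcars data: cylinders by gears
cyl_gear <- table(cyl = mtcars$cyl, gear = mtcars$gear)
print(cyl_gear)

pdf(NULL)  # null device so the sketch runs without a screen

# plot() on a 2-D table draws a mosaic plot (tile areas proportional to counts)
plot(cyl_gear, main = "Cylinders by gears")

# Alternative: grouped bars, one group per gear level (columns)
# and one bar per cylinder level (rows); returns a matrix of midpoints
mids <- barplot(cyl_gear, beside = TRUE, legend.text = TRUE,
                xlab = "Gears", ylab = "Number of cars")
invisible(dev.off())
```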

Tables with three or more dimensions can be flattened into a readable two-dimensional layout with the ftable() function, short for “flat table”. Like addmargins() and prop.table(), it is usually applied to an existing table, although ftable() will also accept factors or a data frame directly. Here is a three-dimensional flat table using the carstuff data.

t <- with(carstuff, table(safety, class, buying))
ftable(t)
             buying vhigh high med low
safety class                          
high   unacc           98   75  52  52
       acc             46   69  56  33
       good             0    0  10  20
       vgood            0    0  26  39
med    unacc          118  105  72  62
       acc             26   39  59  56
       good             0    0  13  26
       vgood            0    0   0   0
low    unacc          144  144 144 144
       acc              0    0   0   0
       good             0    0   0   0
       vgood            0    0   0   0
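ftable() also lets you choose the layout: row.vars names the dimensions that go down the side and col.vars those that go across the top (by default, the last dimension becomes the columns). The sketch below uses the built-in four-dimensional Titanic table rather than carstuff, so it runs without downloading anything.

```r
# Titanic is a built-in 4-D table: Class x Sex x Age x Survived
print(dim(Titanic))

# Default layout: the last dimension (Survived) goes across the top
flat_default <- ftable(Titanic)

# Class and Sex down the side, Age and Survived across the top
flat_custom <- ftable(Titanic, row.vars = c("Class", "Sex"),
                      col.vars = c("Age", "Survived"))
print(flat_custom)
```

With the car data, something like ftable(t, row.vars = c("buying", "safety")) should likewise put buying and safety down the side and leave class across the top.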

Without ftable(), printing the table itself shows the same counts, but as a separate two-dimensional slice for each level of buying, which is much harder to read.
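To see the difference, the sketch below prints the built-in three-dimensional UCBAdmissions table (Admit by Gender by Dept) both ways. Printed directly, it appears as six separate 2 x 2 slices, one per department; ftable() shows the same counts as a single layout with 4 rows and 6 columns.

```r
# UCBAdmissions is a built-in 3-D table: Admit x Gender x Dept
print(dim(UCBAdmissions))

# Printed directly: one 2-D slice per department (", , Dept = A", ...)
print(UCBAdmissions)

# ftable(): the same counts in one 2-D layout, Dept across the top
flat <- ftable(UCBAdmissions)
print(flat)
```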

Tables

Tables are probably the world’s oldest statistical technique. There are, however, many different kinds of tables. A dataframe is a sort of table itself, of course. A contingency table (also known as a cross-tabulation, and similar to a spreadsheet pivot table) shows the relationship between two or more categorical variables by counting how many observations fall into each combination of their levels.
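As a minimal illustration with made-up data (the smoker and exercise vectors are hypothetical), table() on two categorical vectors counts the observations in each combination of levels:

```r
# Two categorical variables measured on the same eight made-up observations
smoker   <- c("yes", "no", "no", "yes", "no", "no", "yes", "no")
exercise <- c("low", "high", "low", "low", "high", "high", "low", "low")

# table() counts how many observations fall into each combination of levels
ct <- table(smoker, exercise)
print(ct)
# no/high = 3, no/low = 2, yes/high = 0, yes/low = 3
```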

Let's get some data to work with by using the read.table() function from base R to read it from an online source. The file has no header row, so we name the columns ourselves. The first six variables are traits of the cars, and the seventh is how the car was classified (how acceptable it was). For our own convenience, let's also put the factor levels in a meaningful order rather than leaving them in alphabetical order.

url <- "http://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data"
carstuff <- read.table(url, sep=",")
names(carstuff) <- c("buying", "maint", "doors", "persons", "lug.boot", "safety", "class")

carstuff$buying <- factor(carstuff$buying, levels=c("vhigh","high","med","low"), ordered=TRUE)
carstuff$maint <- factor(carstuff$maint, levels=c("vhigh","high","med","low"), ordered=TRUE)
carstuff$lug.boot <- factor(carstuff$lug.boot, levels=c("small","med","big"), ordered=TRUE)
carstuff$safety <- factor(carstuff$safety, levels=c("high","med","low"), ordered=TRUE)
carstuff$class <- factor(carstuff$class, levels=c("unacc","acc","good","vgood"), ordered=TRUE)

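(The actual page is served over https, but R won’t always read that properly, so the http address is used here; recent versions of R can usually read https URLs directly.)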
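When several columns share the same levels, the factor() calls above could also be written with lapply(). The sketch below uses a tiny hypothetical data frame (cars_demo, with made-up rows) in place of carstuff, and also shows what the ordering buys you: levels listed first count as "smaller", so comparisons follow that order.

```r
# A tiny hypothetical stand-in for carstuff, with made-up rows
cars_demo <- data.frame(buying = c("low", "vhigh", "med", "high", "low"),
                        maint  = c("high", "med", "vhigh", "low", "low"))

# Without explicit levels, factor() sorts them alphabetically
print(levels(factor(cars_demo$buying)))   # "high" "low" "med" "vhigh"

# Apply the same ordered levels to both columns in one step
price_levels <- c("vhigh", "high", "med", "low")
cars_demo[c("buying", "maint")] <- lapply(cars_demo[c("buying", "maint")], factor,
                                          levels = price_levels, ordered = TRUE)
print(levels(cars_demo$buying))

# Levels listed first count as "smaller", so "low" > "med" here
print(cars_demo$buying > "med")   # TRUE FALSE FALSE FALSE TRUE
```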

Here’s a random sample of five rows from the resulting dataframe.

carstuff[sample(nrow(carstuff),5),]
     buying maint doors persons lug.boot safety class
500    high vhigh     4       4      med    med unacc
1439    low  high     3       2      big    med unacc
1215    med   low     2    more      big   high vgood
866     med vhigh     2       2    small    med unacc
549    high  high     2       2      big   high unacc
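Because sample() draws the row numbers at random, the line above gives a different five rows each time it runs. Calling set.seed() first makes the draw repeatable. The sketch below shows this on a small hypothetical data frame (demo_df) rather than carstuff; the seed value 42 is arbitrary.

```r
# A hypothetical stand-in data frame with 20 numbered rows
demo_df <- data.frame(id = 1:20, value = letters[1:20])

# sample(nrow(demo_df), 5) draws 5 distinct row numbers;
# demo_df[rows, ] keeps just those rows
set.seed(42)  # fix the random number generator so the draw can be repeated
first <- demo_df[sample(nrow(demo_df), 5), ]

set.seed(42)  # same seed again
second <- demo_df[sample(nrow(demo_df), 5), ]

print(identical(first, second))  # TRUE: same seed, same rows
```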

Here is a contingency table showing the relationship between safety and class. This is one of the best ways to report categorical data. You can see immediately that if a car has “low” safety it is always classified as “unacc” (unacceptable). You can also see that the only way to get a “vgood” classification is to have a high safety rating.

t <- with(carstuff,table(safety,class))
t
      class
safety unacc acc good vgood
  high   277 204   30    65
  med    357 180   39     0
  low    576   0    0     0
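The rows and columns above come out in the order given by the factor levels set earlier (high, med, low, and unacc through vgood) rather than alphabetically. A small sketch with a made-up rating vector (hypothetical) shows the difference, and also that declared levels with no observations are kept as zero counts:

```r
# Made-up ratings; "vgood" never occurs
rating <- c("acc", "unacc", "good", "unacc", "acc")

# Character input: only observed values appear, in alphabetical order
print(table(rating))

# Factor input: every declared level appears in the declared order,
# including levels with a count of zero
rating_f <- factor(rating, levels = c("unacc", "acc", "good", "vgood"), ordered = TRUE)
counts <- table(rating_f)
print(counts)
```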

Tables can be transformed in a variety of ways in R. The addmargins() function adds margins to a table, computed with any summary function you supply. The sum of each row and column is the most common choice (and the default).

addmargins(t, FUN=sum)
Margins computed over dimensions
in the following order:
1: safety
2: class
      class
safety unacc  acc good vgood  sum
  high   277  204   30    65  576
  med    357  180   39     0  576
  low    576    0    0     0  576
  sum   1210  384   69    65 1728
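addmargins() also takes a margin argument to add totals along only some dimensions, and a nested named list for FUN sets the label of the new margin; quiet = TRUE suppresses the "Margins computed over dimensions" message. A sketch on a small made-up table (tab and its values are hypothetical):

```r
# A small made-up table: 2 groups x 2 results
tab <- table(group  = c("a", "a", "b", "b", "b"),
             result = c("pass", "fail", "pass", "pass", "fail"))

# Default FUN (sum) over both dimensions; the margins are labelled "Sum"
full <- addmargins(tab)
print(full)

# Same totals with FUN given explicitly, without the printed message
full_quiet <- addmargins(tab, FUN = sum, quiet = TRUE)

# margin = 1 adds only a "Sum" row holding the column totals
col_totals <- addmargins(tab, margin = 1)
print(col_totals)

# margin = 2 adds only a column of row totals; the nested named list
# labels that column "Total"
row_totals <- addmargins(tab, margin = 2, FUN = list(list(Total = sum)))
print(row_totals)
```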

This reveals some other things about the data. For example, the dataset is constructed so that every variable except class is split evenly across its levels: each safety level accounts for exactly 576 of the 1728 cars.
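The even split follows if the file contains every combination of the six attributes exactly once. The sketch below assumes the attribute levels listed in the UCI description of the car data and builds that full design with expand.grid() (the name design is hypothetical); it is a consistency check, not a reading of the actual file.

```r
# Assumed attribute levels from the UCI description of the car data;
# expand.grid() builds every combination exactly once
design <- expand.grid(
  buying   = c("vhigh", "high", "med", "low"),
  maint    = c("vhigh", "high", "med", "low"),
  doors    = c("2", "3", "4", "5more"),
  persons  = c("2", "4", "more"),
  lug.boot = c("small", "med", "big"),
  safety   = c("high", "med", "low")
)

print(nrow(design))          # 4 * 4 * 4 * 3 * 3 * 3 = 1728
print(table(design$safety))  # each level 1728 / 3 = 576 times
print(table(design$buying))  # each level 1728 / 4 = 432 times
```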

The prop.table() function turns a table of counts into proportions, which shows more explicitly how the data is distributed. By default each cell is divided by the grand total, so the whole table sums to 1; setting margin = 1 gives proportions within each row, and margin = 2 within each column.
You can then use addmargins() on the result to get more information. All of the pieces fit together.

round(prop.table(t), 3)
      class
safety unacc   acc  good vgood
  high 0.160 0.118 0.017 0.038
  med  0.207 0.104 0.023 0.000
  low  0.333 0.000 0.000 0.000
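The margin argument and the combination with addmargins() can be sketched on a small made-up table (counts and its labels are hypothetical). With margin = 1 each row sums to 1, with margin = 2 each column does, and appending a total column confirms the row proportions. The round() call only tidies the printed output.

```r
# A made-up 2 x 3 table of counts
counts <- as.table(matrix(c(6, 2, 3, 1, 3, 0), nrow = 2,
                          dimnames = list(row = c("a", "b"),
                                          col = c("x", "y", "z"))))

# Default: every cell divided by the grand total (15)
p_all <- prop.table(counts)

# margin = 1: proportions within each row
p_row <- prop.table(counts, margin = 1)

# margin = 2: proportions within each column
p_col <- prop.table(counts, margin = 2)

# Append a row-total column to check the row proportions
print(round(addmargins(p_row, margin = 2), 3))
# a: 0.5 0.25 0.25 1; b: 0.667 0.333 0 1
```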