Wilcoxon Signed-Rank Test

The Wilcoxon Rank Sum Test

Because the t-test is sensitive to violations of normality, people have long sought improvements on it. The most popular of these robust options are the Wilcoxon Rank Sum Test and the Wilcoxon Signed-Rank Test. The rank sum test is also known as the Mann-Whitney U Test.

In R we can use the Wilcoxon test with the wilcox.test() function.

The logic of the test is incredibly simple: if there is a difference between groups, then one group will tend to have greater values than the other.

For the simplest case, with paired data, the signed-rank test can be used. We have two vectors, x and y, from which we will make a vector of differences called w.

x <- c(3,1,5,8,2)
y <- c(1,2,11,4,10)
w <- x-y

wilcox <- data.frame(x,y,w)
wilcox
  x  y  w
1 3  1  2
2 1  2 -1
3 5 11 -6
4 8  4  4
5 2 10 -8

Now we rank each element of w by its absolute value, so the largest difference, 8 in row five, gets the highest rank.

## Notice that the rank() function returns
## the ranks but doesn't sort the data.
rank(abs(w))
[1] 2 1 4 3 5

wilcox$r <- rank(abs(w))
wilcox
  x  y  w r
1 3  1  2 2
2 1  2 -1 1
3 5 11 -6 4
4 8  4  4 3
5 2 10 -8 5

Next we sum the ranks where w is positive and, separately, sum the ranks where w is negative. This results in two numbers, W+ and W-.

Wp <- 2+3
Wn <- 1+4+5
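The hand-computed sums can be cross-checked in code. The following is a minimal sketch that recomputes both sums from the ranks and compares them with the V statistic reported by wilcox.test(), which for paired data is the sum of the positive ranks. The variable names r and res are only illustrative.

```r
x <- c(3, 1, 5, 8, 2)
y <- c(1, 2, 11, 4, 10)
w <- x - y
r <- rank(abs(w))

# Sum the ranks separately for the positive and negative differences.
Wp <- sum(r[w > 0])
Wn <- sum(r[w < 0])

# With no zero differences the two sums add up to n(n+1)/2.
n <- length(w)
Wp + Wn == n * (n + 1) / 2

# wilcox.test() reports the positive rank sum as V.
res <- wilcox.test(x, y, paired = TRUE)
res$statistic
res$p.value
```

The V statistic here equals W+ (5), and because the two sums always total n(n+1)/2, either one determines the other.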

The question becomes, what is the probability of getting each of those numbers randomly? Answering this question is how we construct the distribution that the test uses.

The simplest non-trivial case is to imagine that there are three pairs of data points, which means there are eight possible ways (2^3) to assign signs to the ranks. The greatest possible value for W+ is 6 and the smallest possible value is 0. The same is true of W-. Let us consider a few cases.

How many ways can we get W+ = 6? Only one, if all of the differences are positive. The sum of the positive ranks would be 1+2+3 = 6.

How many ways can we get W+ = 5? Again, just one. The only arrangement that works is for the smallest difference to be negative and the 2nd and 3rd to be positive, because 2+3 = 5.

How many ways can we get W+ = 4? Yes, again, only one arrangement works: the 1st and 3rd ranks are positive (1+3 = 4).

How many ways can we get W+ = 3? There are two! Either the 1st and 2nd might be positive (1+2 = 3) or only the third rank might be positive (3 = 3).

For each of the possible sums 2 through 0 there is again only one possible arrangement.
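The counting argument above can be checked by brute force. The sketch below lists all eight sign assignments with expand.grid(), computes W+ for each one, and compares the resulting proportions with dsignrank(). The names signs, Wplus and counts are only illustrative.

```r
n <- 3

# One column per rank: 1 means that difference is positive, 0 negative.
signs <- expand.grid(rep(list(0:1), n))

# W+ for each of the 2^n sign assignments.
Wplus <- as.vector(as.matrix(signs) %*% (1:n))

# Count how many assignments produce each possible sum, 0 through 6.
counts <- table(factor(Wplus, levels = 0:sum(1:n)))
counts

# Dividing by 2^n gives probabilities that should agree with dsignrank().
all.equal(as.vector(counts) / 2^n, dsignrank(0:sum(1:n), n))
```

Brute-force enumeration is only practical for small n, since the number of sign assignments doubles with every added pair.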

The distribution we end up with for this case is a somewhat silly looking probability mass:

comb <- 0:6
p <- dsignrank(comb,3)

plot(p~comb,
	type='h',ylim=c(0,max(p)),lwd=3,
	ylab='Probability',xlab='Sum',
	main='Signed Rank Distribution for N=3')

[Figure: Signed Rank Distribution for N=3]

This can be generalized for an arbitrarily large number of pairs:

n <- 15
s <- sum(1:n)
comb <- 0:s

main <- paste0('Signed Rank Distribution for N=',n)

p <- dsignrank(comb,n)

plot(p~comb,
	type='h',ylim=c(0,max(p)),lwd=3,
	ylab='Probability',xlab='Sum',
	main=main)

[Figure: Signed Rank Distribution for N=15]

Notice that for N = 15 the distribution is distinctly bell shaped. In fact, it is approaching a normal distribution. This is fortunate for two reasons. First, the exact Wilcoxon distribution is expensive to calculate (even for a computer) when the number of pairs is large. Second, the normal approximation performs somewhat better when some values share the same rank, since the exact p-value becomes slightly conservative.
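To get a feel for how close the approximation is, the exact cumulative distribution can be compared with a normal one. The sketch below assumes no ties and uses the standard null mean n(n+1)/4 and variance n(n+1)(2n+1)/24 for W+, along with a continuity correction of 0.5. The names mu, sigma, exact and normal are only illustrative.

```r
n <- 15
q <- 0:sum(1:n)

# Mean and standard deviation of W+ under the null hypothesis (no ties).
mu    <- n * (n + 1) / 4
sigma <- sqrt(n * (n + 1) * (2 * n + 1) / 24)

# Exact cumulative probabilities and their normal approximation.
exact  <- psignrank(q, n)
normal <- pnorm(q + 0.5, mean = mu, sd = sigma)  # continuity correction

# Largest absolute gap between the two curves.
max(abs(exact - normal))
```

Rerunning this with different values of n shows how the gap changes as the number of pairs grows.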