Prior Probabilities

What is a Prior Distribution?

Prior probabilities are a cornerstone of Bayesian statistics, a popular and powerful but still somewhat disputed mathematical tool. The lie detector analysis we worked through in the last post is a variation on the mammogram problem (or the prosecutor’s fallacy) that you can find explained in many places as part of an introduction to Bayesian statistics. We divided the population of a dystopian Barcelona between murderers and innocents and declared that 25% of people were murderers. That was how we defined our prior probability distribution.

A prior probability distribution, which we’ll simply refer to as a prior, represents a measure of uncertainty about a parameter before you account for your data. The parameter is what you’re trying to learn about (say, the average weight of a population) and the data is whatever information you just gathered (the average weight of a sample of that population).

The simplest example of a prior distribution is for a fair coin, which has a 50% chance of landing on heads and a 50% chance of landing on tails. It represents a uniform prior, where both options are equally likely. A fair die also has a uniform prior, because each of its faces is equally likely to be rolled. A loaded die or an unfair coin is more likely to give certain results than others and so has a differently shaped prior. People often talk about the “shape” of a prior (or of any probability function) because these distributions are visualized as graphs.
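To make “shape” concrete, here is a minimal sketch in Python comparing a uniform prior to a non-uniform one (the loaded-die weights are invented for illustration):

```python
import numpy as np

# A fair die: a uniform prior over the six faces.
fair_die = np.full(6, 1 / 6)

# A hypothetical loaded die: the same six outcomes,
# but most of the prior mass piled onto one face.
loaded_die = np.array([0.05, 0.05, 0.05, 0.05, 0.05, 0.75])

# Both are valid priors: non-negative and summing to 1.
assert np.isclose(fair_die.sum(), 1.0)
assert np.isclose(loaded_die.sum(), 1.0)

print("fair:  ", fair_die)    # flat shape
print("loaded:", loaded_die)  # peaked shape
```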

Because our Barcelona example was a thought experiment, I could declare any number I wanted and have it be true by fiat. In reality we can rarely be so precise about priors; if our information were already perfect, we wouldn’t need to do statistics! Because of this there is a legitimate danger in using prior information: our assumptions about how likely things are could be totally wrong and lead us to inaccurate conclusions. For example: if my cyberpunk Barcelona were more like a real city, where only a fraction of a percent of people are murderers, then our calculations would have been off by orders of magnitude!
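To see how dramatic that effect can be, here is a minimal sketch of the Bayes’ theorem calculation. The 90% detector accuracy is a number I am assuming purely for illustration, not one taken from the earlier post:

```python
def posterior_murderer(prior, sensitivity=0.90, specificity=0.90):
    """P(murderer | detector flags you), via Bayes' theorem."""
    true_pos = sensitivity * prior               # murderers flagged
    false_pos = (1 - specificity) * (1 - prior)  # innocents flagged
    return true_pos / (true_pos + false_pos)

# Dystopian prior: 25% of the population are murderers.
print(posterior_murderer(0.25))    # ~0.75
# Realistic prior: 0.02% of the population are murderers.
print(posterior_murderer(0.0002))  # ~0.0018
```

The same test result moves from three-in-four certainty to a fraction of a percent, purely because the prior changed.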

Of course, that is also a point in favor of prior probabilities. They can have a huge effect on the interpretation of the data, so we really need to be aware of them. This is so useful that it would be foolish to abandon priors simply on the basis that we might pick the wrong ones! After all, no one advocates picking priors at random. Furthermore, there are certain pieces of information we can never get without knowing something about the prior distribution of the data. At a minimum, being aware of the effects priors can have is important.

Because priors in most realistic systems are quite complicated and only effectively expressed in probability notation, this post will look at priors mainly from the standpoint of the philosophy of statistics, with a few small illustrative sketches along the way.

Types of Priors

There are two broad classes of priors: Informative and uninformative. Informative priors try to represent our current state of knowledge or belief about the world. Uninformative priors try to assume nothing about the world.

It is generally accepted that both informative and uninformative priors have their place. The debates usually center on when each type should be used and how each affects our interpretation of the results. In the sciences, for instance, many people advocate using only uninformative priors in order to reduce potential bias. In decision making, informative priors are the norm.

What is an informative prior?

An informative prior is based on the idea that we almost always know something about the question we are asking. Many people initially dislike informative priors because they notice that you can push your conclusions in whatever direction you want simply by picking the right prior. In the sciences this should be corrected by the publication process; you have to state what your priors are and justify them, or people won’t put much stock in your results. In personal decision making, a popular use of Bayesian statistics outside of the sciences, we don’t need certainty, just a way to estimate our belief.

The argument in favor of informative priors is that it would be a mistake to disregard all of our previous work each time we do an experiment (or to ignore our previous experiences each time something happens to us). Unfortunately it isn’t obvious, or widely agreed upon, how to construct a prior from all the previous information that exists. Nonetheless, well-known statisticians like Nate Silver and Andrew Gelman have shown that you can apply Bayesian methods to things like political forecasting and produce impressive results.

For example: it is rare for a given US state to undergo a sudden, massive demographic shift. This stability is part of what underlies Nate Silver’s startlingly accurate election predictions. His predictions tend to wobble only slightly, while the individual polls he draws from swing considerably more from month to month.
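A minimal sketch of that damping effect, using Beta-Binomial updating (the prior strength and poll numbers are invented; this is not Silver’s actual model):

```python
# Beta-Binomial updating: a strong prior damps noisy poll swings.
# All numbers are hypothetical; this is not Nate Silver's model.

# Informative prior: past results suggest ~55% support, encoded as
# Beta(550, 450) -- the weight of ~1000 "virtual respondents".
alpha, beta = 550, 450

polls = [(320, 680), (560, 440), (420, 580)]  # (supporters, others)

for yes, no in polls:
    raw = yes / (yes + no)
    post_mean = (alpha + yes) / (alpha + beta + yes + no)
    print(f"poll says {raw:.0%}, posterior estimate {post_mean:.1%}")
    alpha, beta = alpha + yes, beta + no  # carry the update forward
```

The raw polls swing by over twenty points, while the posterior estimate moves by only a few.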

For personal priors there is the issue that it can be very difficult to say exactly what we believe. Are you 40% sure that you’re right, or is it more like 32%? Pierre-Simon Laplace (a famous French mathematician) found that one could use wagering and betting to estimate personal belief, and he often couched his results in gambler’s terms. What to do with this knowledge once we have it is much less clear. Are you actually going to act differently now that you know there’s a 45% chance of you being right rather than a 40% chance? Despite the granularity of the results, I suspect people will, consciously or not, construct arbitrary thresholds that are “good enough” or “not good enough” and only change their beliefs or behaviors when those thresholds are crossed.
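Here is a minimal sketch of the betting trick for turning a wager into a degree of belief (the dollar amounts are hypothetical):

```python
def implied_probability(stake: float, payout: float) -> float:
    """If you would happily risk `stake` to win `payout` when a claim
    is true, a fair bet implies you believe the claim with probability
    stake / (stake + payout)."""
    return stake / (stake + payout)

# Willing to risk $40 to win $60 that you're right?
# That implies you're about 40% sure.
print(implied_probability(40, 60))  # 0.4
```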

Proponents of informative priors often emphasize that, when you use them, extraordinary claims require extraordinary evidence. Critics prefer to emphasize that this means you are encouraged not to change your mind.

What is an uninformative prior?

Uninformative priors only involve specifying things that are “objective”, which means they may still carry some information (hence they’re sometimes called minimally informative), but always less than an informative prior would. For example: if you were constructing an objective prior for human weights, you would use a prior that assigns no probability below zero, since weights less than zero are objectively impossible. If you also want to make extremely low and extremely high weights less likely, then you are edging toward an informative prior.
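A minimal sketch of that distinction for the weights example (the particular distributions and bounds are my own illustrative choices):

```python
import numpy as np
from scipy import stats

weights = np.linspace(0.1, 300, 1000)  # kg; the support excludes w < 0

# Roughly uninformative: flat over a physically plausible range.
flat = stats.uniform(loc=0, scale=300).pdf(weights)

# Informative: mass concentrated around typical human weights
# (a lognormal centered near 70 kg -- an illustrative choice).
informative = stats.lognorm(s=0.35, scale=70).pdf(weights)

idx = np.argmin(np.abs(weights - 70))
print("flat prior density at 70 kg:       ", flat[idx])
print("informative prior density at 70 kg:", informative[idx])
```

The flat prior treats 70 kg and 270 kg as equally plausible; the informative one does not.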

Now there are criticisms here, too. While an uninformative prior reduces the risk of certain kinds of bias, there can be more than one uninformative prior that could be applied to a given situation, some of which are completely inappropriate and some of which will simply give different answers. Knowing how to pick the correct prior is very important, and critics from the informative side say this is a place where the much valued “objectivity” is lost. We also would not wish to deceive ourselves into thinking our results represent one thing when they really represent another.
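For a concrete instance (the data here are made up): the uniform prior Beta(1, 1) and the Jeffreys prior Beta(1/2, 1/2) are both standard “uninformative” choices for estimating a proportion, yet on a small sample they yield different answers:

```python
# Two standard "uninformative" priors for a proportion, updated on
# the same small hypothetical dataset: 9 successes in 10 trials.
successes, trials = 9, 10

priors = {"uniform Beta(1,1)": (1.0, 1.0),
          "Jeffreys Beta(1/2,1/2)": (0.5, 0.5)}

for name, (a, b) in priors.items():
    post_mean = (a + successes) / (a + b + trials)
    print(f"{name}: posterior mean = {post_mean:.3f}")

# uniform:  (1 + 9) / (2 + 10)   = 0.833
# Jeffreys: (0.5 + 9) / (1 + 10) = 0.864
```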

Other critics point out that using an uninformative prior eliminates or reduces some of the benefits of Bayesian statistics. Perhaps this is worth it to avoid producing results that might unreasonably favor one answer, or perhaps it is not.

Uninformative priors have been advocated by many people over the years, from Thomas Bayes himself through Edwin Jaynes and others in the 20th and 21st centuries. The subtle differences between uninformative priors, and how to construct them for more complex problems, have been a major area of research. Some of this seems to have been in hopes of making Bayesian methods more palatable to scientific philosophy.

Final Thoughts

There are many considerations that determine whether one should use an informative or an uninformative prior. Sometimes, objections of subjective Bayesians aside, you really don’t have any information at all. Other times the previous data is so clear that an uninformative prior would be foolish no matter what philosophical standpoint you take. Reasons can also be more politic.

For example: when investigating a contentious claim like the existence of the supernatural, it is reasonable for a skeptic to enter with a very low prior belief in the supernatural. However, in order to be politic you might still have to start with an uninformative prior, because believers will (and should, in my opinion) object to their side being placed at a disadvantage no matter what the data shows.

In our next post we will look at an example of Bayesian statistics that is slightly more complex than a traditional binary example and note one simple mathematical criticism of this methodology.