The Bechdel Index

The Bechdel Test is a powerful tool for talking about gender bias in fiction. It helps that the test is very simple. A work of fiction is taken and tested on the following question: Is there any point where women talk to each other about something other than men?

As it happens, the test is rarely passed, while its reverse (men talking to each other about something other than women) usually is.

Various changes have been suggested to the original test (named characters only, no man in the room, no girly topics, etc.), but they distract from the fundamental elegance of the test. It requires the work to have female characters with interests or motivations that do not center around a man. The test dodges tokenism and drives home the general primacy of male characters in fiction. In any given work it is a reasonable excuse to say “They always talk about Jack because he’s the main character,” but when every movie has Jack as the main character something strange is going on.

Unfortunately the Bechdel test has a number of serious limitations. It has been noted, for instance, that movies often pass the test because of women talking about their weddings, which is indirectly a conversation about a man. Others have criticized it for telling us very little about how progressive the work is in general. Lesbian porn can pass no matter how exploitative it is, and Twelve Angry Men never had a chance purely because of when it was set.

However there is one issue that we can address mathematically. A lot of important information is not available about a movie that passes the test. The ease with which an author can pass the Bechdel test is part of what makes it so impactful but also makes it easy to game. It isn’t hard to toss in one token Bechdel conversation. As media becomes more and more aware of it, the Bechdel test will become less and less useful for talking about issues in media. Going forward, what we really want is something that tells us about biased representation of groups in a much broader sense, and a binary test is insufficient for that.

Before we begin I’d like to define a bit of language: “Two characters of a given group talk to each other about something other than groups or members of other groups in the same category.” I will call this the generic Bechdel and refer to variations on it in the same style. The original Bechdel test is the ‘female Bechdel’ in this post while a test that looks for Russian characters talking to each other is the ‘Russian Bechdel’.

A ratio is the easiest way to look at overall gender bias.

A simple Bechdel Ratio works like this. Count up all the conversations that pass the generic Bechdel test for men and for women. We’ll call this T, for total. Pf is the number of conversations that pass the female Bechdel. Pm is the number that pass the male Bechdel.

Then the Bechdel Ratio is:

(Pf/T)/(Pm/T)

Say over the course of a movie there are 100 conversations.

Of these, 30 pass the generic test. Out of the thirty, 3 pass the female Bechdel and 27 pass the male Bechdel.

The score is then (3/30)/(27/30) = 0.100/0.900 = 0.111

This way of doing things has some desirable properties. It is very simple to do and it is very easy to interpret. An exactly balanced work scores 1 and a work that fails the Bechdel test scores 0, while a work that has more conversations that pass the female test than the male test scores greater than 1.
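For anyone who would rather not do the division by hand, here is a minimal Python sketch of the simple ratio, following the worked example above (female share over male share). The function name and argument names are my own invention for illustration, and the error for a zero male count is an assumption about how you would want to handle a division by zero.

```python
def simple_bechdel_ratio(pass_female, pass_male):
    """Female share of passing conversations divided by the male share."""
    if pass_male == 0:
        # Dividing by a male share of zero has no defined result.
        raise ValueError("undefined when nothing passes the male Bechdel")
    total = pass_female + pass_male  # T in the text
    return (pass_female / total) / (pass_male / total)


# The example movie: 3 female passes and 27 male passes.
print(round(simple_bechdel_ratio(3, 27), 3))  # 0.111
```

Because T appears on both sides of the division it cancels out, so the score is really just Pf divided by Pm.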

A more complex Bechdel Ratio can be made that compares representation within each group.

It works like this. Take every conversation that involves a woman and determine the proportion that pass the female Bechdel. Take every conversation that involves a man and determine the proportion that pass the male Bechdel.

The alternate Bechdel Ratio is then:

(Pf/Tf)/(Pm/Tm)

Pm and Pf are the same as they were before: the number of conversations that pass the male or female test respectively. Tm and Tf are the total number of conversations including men and the total number including women.

In our movie before let’s say exactly 70 conversations involve men and 50 involve women. This adds up to more than 100 conversations because some involve both men and women.

The score is then (3/50)/(27/70) = 0.060/0.386 = 0.156

Again, an exactly balanced work scores 1 and a work that fails the Bechdel test scores a 0.

This number is higher than the one before because we’re looking at how often the test is passed proportionally. By using this method we can make some useful distinctions. Imagine a different film. There are 100 conversations. 90 involve men. 30 involve women. The male Bechdel is passed 10 times. The female Bechdel is passed 5 times.

This movie scores a (5/30)/(10/90) = 0.167/0.111 = 1.500

Note that men spoke more and passed the male Bechdel more, yet the movie still scores above 1. What has happened is that men spend proportionally more of their conversations talking about women than women do talking about men. This probably means the movie is centered around a woman but has a largely male cast.
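The alternate ratio can be sketched the same way. This is only an illustration under my own naming (the function and its four arguments are hypothetical), and it assumes both groups appear in at least one conversation and that the male Bechdel is passed at least once.

```python
def alternate_bechdel_ratio(pass_female, total_female, pass_male, total_male):
    """Female pass rate (Pf/Tf) divided by male pass rate (Pm/Tm)."""
    if total_female == 0 or total_male == 0 or pass_male == 0:
        # Any of these would force a division by zero.
        raise ValueError("ratio is undefined for these counts")
    female_rate = pass_female / total_female
    male_rate = pass_male / total_male
    return female_rate / male_rate


# First movie: 3 of 50 female conversations pass, 27 of 70 male ones do.
print(round(alternate_bechdel_ratio(3, 50, 27, 70), 3))  # 0.156
# Second movie: 5 of 30 female conversations pass, 10 of 90 male ones do.
print(round(alternate_bechdel_ratio(5, 30, 10, 90), 3))  # 1.5
```

Carried at full precision, the first movie comes out at 0.1556, which is why the sketch prints 0.156.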

Unfortunately there are some issues with ratio methods. These tests will skyrocket above 1 if the work is female biased. Theoretically we know that the best result in a vacuum is exactly 1, but I’d like to make use of the perception that higher is always better, since it isn’t going to be trained out of the public. These tests are also undefined if the work completely fails the male Bechdel (rare but possible).

More broadly, this method doesn’t work when there are more than two groups, as there would be in a race or sexuality Bechdel test, so we need a more complex summary statistic.

It happens that biologists have already done a lot of our work for us! Ecologists have been using diversity indexes since the 40s. There are a number of calculations we can use but Pielou’s J’ (jay prime) statistic calculates the evenness of groups in a population, which is something we’d like to know.

Although it involves an operation that most people aren’t familiar with (the natural logarithm, written ln), it can be determined easily on an online calculator like Google or Wolfram|Alpha, so applying it isn’t restricted to people with a deep mathematical background. It favors neither group. It can quickly be converted to a racial or sexuality version.

The Pielou-Bechdel score for a work is:

-(((F*lnF)+(M*lnM))/0.693)

A perfectly even work scores a 1 and a perfectly biased work scores a 0.

Count up all the conversations that pass the generic Bechdel for men and for women then determine the proportion that pass the female Bechdel (F) and the proportion that pass the male Bechdel (M). You’ll notice that these terms are the same ones we used in the first type of Bechdel Ratio. However you choose to determine F and M they have to add up to exactly 1 or the output will be nonsense.

Our original movie scores a 0.469
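If you would rather let a computer handle the logarithms, the two-group score can be sketched in a few lines of Python. The function name is made up for illustration. I am also assuming the usual convention that a group with a share of exactly 0 contributes nothing to the sum, since ln(0) is not defined and the score for a perfectly biased work could not be computed without it.

```python
import math


def pielou_bechdel(f, m):
    """Two-group Pielou-Bechdel score from the proportions F and M."""
    if abs(f + m - 1.0) > 1e-9:
        raise ValueError("F and M must add up to exactly 1")

    def term(p):
        # Assumed convention: a share of 0 contributes 0 to the sum.
        return p * math.log(p) if p > 0 else 0.0

    return -(term(f) + term(m)) / math.log(2)  # ln(2) is about 0.693


# The original movie: 3 of 30 passes are female, 27 of 30 are male.
print(round(pielou_bechdel(0.1, 0.9), 3))  # 0.469
```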

For other versions you only have to add more groups and change the constant at the end so that it is the natural log of the number of groups. For three groups it is:

-(((A*lnA)+(B*lnB)+(C*lnC))/1.099)

Here is a short list of group sizes and the constant that goes with each.

  • 2 — 0.693
  • 3 — 1.099
  • 4 — 1.386
  • 5 — 1.609
  • 6 — 1.792
  • 7 — 1.946
  • 8 — 2.079
  • 9 — 2.197
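The constants above are just natural logs, so a general version does not need the table at all. Here is a rough sketch that takes raw counts for any number of groups. The function name is hypothetical, the three-group counts are invented purely to show a perfectly even case, and groups with a count of 0 are assumed to contribute nothing to the sum.

```python
import math


def pielou_evenness(counts):
    """Pielou's J' for a list of per-group counts (one entry per group)."""
    n_groups = len(counts)
    total = sum(counts)
    if n_groups < 2 or total == 0:
        raise ValueError("need at least two groups and a nonzero total")
    shares = [c / total for c in counts]
    # Assumed convention: empty groups are skipped rather than fed to ln.
    entropy = -sum(p * math.log(p) for p in shares if p > 0)
    return entropy / math.log(n_groups)


# Reproduce the constants listed above.
for n in range(2, 10):
    print(n, round(math.log(n), 3))

# Three perfectly even groups (made-up counts) score 1.
print(round(pielou_evenness([4, 4, 4]), 3))  # 1.0
```

Empty groups still count toward the number of groups here, so a category nobody belongs to drags the score down, which seems like the right behavior for a test of representation.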

Let’s apply this to something easy to work with. The Justice League of America was created by DC Comics in the 1960s (The Brave and The Bold #28) and its roster has been updated continuously since then. Also, nerds keep track of exactly what those rosters were. We can look to see if there is any progression over time.

The original team was pretty biased, scoring a 0.592, saved from total failure only by Wonder Woman. I’m a big fan of the 1996 team (JLA #1, much of the cast is introduced later in the run) which consistently used Green Lantern, Aztek, Green Arrow, Oracle, Plastic Man, Steel, Zauriel, Wonder Woman, Big Barda, and Hourman. They do better with a score of 0.881. The most recent version of the team was created in 2012 as part of an event in the New 52 universe and the seven founding members score an impressive 0.986, as even as the cast can get (although male biased).

You can see here that the results are quite grainy when working with only a few data points.

Using a larger number of groups we can see other things. For example, there was a Justice League International with a fairly large roster. We can evaluate how international it really was. If we divide the world into North America, South America, Europe, Asia, Africa, and Oceania (and ignore the aliens) then we get a Pielou’s J’ of 0.549, suggesting they did an extremely poor job. It is perhaps more amusing that the Justice League Europe series added a single European character to the team, one who was created specifically for the series. On the other hand, a quarter of the team was female, for a score of 0.811.

Finally we need a way to make sense of these unitless numbers. They’re fine for comparison but difficult to interpret on their own. Fortunately the scale does not depend on the number of groups, so a threshold of “pretty good” set for one kind of test is applicable to others.

The following are some proportional representations for two groups and their corresponding Pielou’s J’. For example, 1/20 would mean that 95% of the male/female Bechdels passed are male (or female), or alternatively that 95% of the cast is male (or female) if you are simply determining the evenness of representation in a cast.

  • 1/20 — 0.286
  • 1/10 — 0.469
  • 1/4 — 0.811
  • 1/3 — 0.918