## Conditional probability is hard -- but teaching it *shouldn't* be!

So, consider these two problems:

**A. Which is more difficult?**

**B. Which is it easier to teach someone to do correctly?**

My answers: BAYES is more difficult but *also* easier to teach someone to do correctly.

Does that seem plausible to you? I won't be surprised if you say no, particularly if your answer reflects experience in seeing how poorly people do with conditional probability problems.

But if you disagree with me, I do want to challenge your sense of what the problem is.

Okay, so here are some data.

For sure, BAYES is harder. In a diverse sample of 1,000 adults (over half of whom had either a four-year college or post-graduate degree), *only 3%* got the correct answer (50%). For COVARY, 55% got the correct answer (“patients administered the new treatment were **not more likely** to survive”).

This is not surprising. BAYES involves conditional probability, a concept that most people find very counterintuitive. There is a strong tendency to treat the accuracy rate of the witness’s color discernment, 90%, as the likelihood that the bus is blue.

That was the modal answer—one supplied by 34% of the respondents—within the sample here. This response ignores information about the base rate of blue versus green buses.

Another 23% picked 10%, the base rate frequency of blue buses. They thus ignored the additional information associated with the witness’s perception of the color of the bus.

How to combine the base rate information with the accuracy of the witness’s perception of color (or their equivalent in other problems that involve the same general type of reasoning task) is reflected in Bayes’s Theorem, a set of logical operations that most people find utterly baffling.

COVARY is a standard “covariance detection” problem. It’s not as hard as BAYES, but it’s still pretty difficult!

Many people (usually most; this fairly well educated sample did better than a representative sample would) use one of two heuristics to analyze a problem that has the formal characteristics of this one (Arkes & Harkness 1983). The first, and most common, simply involves comparing the number of “survivors” to the number of “nonsurvivors” in the treatment condition. The second involves comparing in addition the number of survivors in the treatment and the number of survivors in the control.

Both of these approaches generate the wrong answer—that patients given the new treatment were *more* likely to survive than those who didn’t receive it—for the data generated in this hypothetical experiment.

What’s important is the *ratio* of survivors to nonsurvivors in the two experimental groups. In the group whose members received the treatment, patients were about three times more likely to survive than not (223:75 = 2.97:1). In the untreated group, however, patients were just over *five* times more likely to survive than not (107:21 = 5.10:1).
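To make the contrast concrete, here is a minimal Python sketch of the two heuristics and the ratio comparison, using the counts quoted above. The variable and function names are my own illustrative choices, not anything from the original study materials.

```python
# 2x2 contingency table from the hypothetical experiment in COVARY.
# Names below are illustrative assumptions, not part of the original problem.
treated = {"survived": 223, "died": 75}
untreated = {"survived": 107, "died": 21}


def survival_odds(group):
    """Ratio of survivors to nonsurvivors within one group."""
    return group["survived"] / group["died"]


# Heuristic 1: look only at the treatment row (survivors vs. nonsurvivors).
heuristic_1_says_helps = treated["survived"] > treated["died"]

# Heuristic 2: additionally compare raw survivor counts across the two groups.
heuristic_2_says_helps = treated["survived"] > untreated["survived"]

# Correct approach: compare the survivor-to-nonsurvivor ratios of both groups.
treated_odds = survival_odds(treated)
untreated_odds = survival_odds(untreated)
ratio_says_helps = treated_odds > untreated_odds

print(f"treated odds:   {treated_odds:.2f}:1")    # 2.97:1
print(f"untreated odds: {untreated_odds:.2f}:1")  # 5.10:1
print(heuristic_1_says_helps, heuristic_2_says_helps, ratio_says_helps)
# True True False
```

Both shortcuts say "the treatment helps"; only the comparison that uses all four cells says it doesn't.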

Pretty much anyone who got the wrong answer can see why the correct one is right once the difference in the “likelihood ratios” (which is actually an important common element in conditional probability and covariance problems) is pointed out.

The math is pretty tame (a fifth grader should be able to handle it), and the inferential logic (the essence of the sort of causal inference strategy that informs controlled experimentation) pretty much explains itself.

The reason such a significant number of people get the answer wrong is that they don’t reliably *recognize* that they have to compare the ratios of positive to negative outcomes. They effectively succumb to the temptation to settle for “hypothesis-confirming” evidence without probing for the disconfirming evidence that one can extract only by making use of all the available information in the 2x2 contingency table.

Now, why do *I* feel that it is nevertheless easier to teach people how to solve conditional probability problems of the sort reflected in BAYES than to teach them how to reliably solve covariance-detection ones of the sort reflected in COVARY?

There are a couple of related reasons.

The first is that doing a conditional probability problem is actually *easy* once one grasps why the base rate matters, and enabling someone to grasp that turns out to be super easy too with the right pedagogical techniques.

The most important of these is to illustrate how a conditional probability problem can be conceived of as a population-sampling one (Spiegelhalter, Pearson & Short 2011).

In BAYES, we are told that 90% of the buses that could have struck Bill are green, and 10% of them are blue.

Accordingly, if we imagine a simulation in which Bill was hit by 100 city buses drawn at random, we’d expect him to be run down by a green bus 90 times and a blue one 10 times.

If we add Wally to the simulation, we’ll expect him to correctly perceive *81* (90%) of the 90 green buses that struck Bill to be green and incorrectly perceive *9* (10%) of them to be blue.

Likewise, we’ll expect him to correctly perceive *9* of the 10 blue buses (90%) that hit Bill to be blue, but incorrectly perceive *1* of them (10%) to be green.

Overall, then, in 100 trials, Wally will perceive Bill to have been hit 18 times by a blue bus. Nine of those will be cases in which Wally *correctly* perceived a blue bus to be blue. But nine will be cases in which Wally *incorrectly* perceived as blue a bus that was in fact green.

Because in our 100-trial simulation, the number of times Wally was *correct* when he identified the bus that hit Bill as blue is exactly equal to the number of times he was *incorrect*, Bill will have been hit by a blue bus 50% of the time and by a green one 50% of the time in all the cases in which Wally perceives he was hit by a blue bus.
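The 100-trial bookkeeping above can be sketched in a few lines of Python. This is only an illustration under the problem's stated numbers (10% blue buses, 90% witness accuracy); the function name and its parameters are hypothetical, and integer percentages are used to keep the counts exact.

```python
def natural_frequencies(trials=100, blue_pct=10, accuracy_pct=90):
    """Count expected outcomes in an imagined run of `trials` accidents.

    Hypothetical helper: percentages are integers so all counts stay exact.
    """
    blue = trials * blue_pct // 100                  # buses that really are blue
    green = trials - blue                            # buses that really are green
    blue_seen_as_blue = blue * accuracy_pct // 100   # Wally right about blue
    green_seen_as_blue = green - green * accuracy_pct // 100  # Wally wrong about green
    seen_as_blue = blue_seen_as_blue + green_seen_as_blue
    return {
        "blue": blue,
        "green": green,
        "blue_seen_as_blue": blue_seen_as_blue,
        "green_seen_as_blue": green_seen_as_blue,
        "seen_as_blue": seen_as_blue,
        # Share of "blue" reports in which the bus really was blue.
        "share_truly_blue": blue_seen_as_blue / seen_as_blue,
    }


result = natural_frequencies()
print(result["seen_as_blue"])      # 18
print(result["share_truly_blue"])  # 0.5
```

Changing `blue_pct` or `accuracy_pct` shows how quickly the answer moves away from the witness's raw accuracy rate.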

This “natural frequency” strategy for analyzing conditional probability problems has been shown to be an effective pedagogical tool in experimental studies (Sedlmeier & Gigerenzer 2001; Kurzenhäuser & Hoffrage 2002; Wheaton, Lee & Deshmukh 2009).

After using it to help someone grasp the conceptual logic of conditional probability, one can also connect the steps involved to a *very straightforward* rendering of Bayes’s Theorem: *prior odds x likelihood ratio = revised (posterior) odds*.

In this rendering, the base rate is represented in terms of the *odds* that a particular proposition or hypothesis is true: here, independently of Wally’s observation, we’d compute the odds that the bus that struck Bill was blue as 10:90 (10 blue buses for every 90 green ones), or 1:9.

The new information or evidence is represented as a *likelihood ratio*, which reflects how much more consistent that evidence is with the hypothesis or proposition in question being true than with its negation (or some alternative hypothesis) being true.

Wally is able to correctly distinguish blue from green 90% of the time.

So if the bus that struck Bill was in fact blue, we’d expect Wally to perceive it as blue **9** times out of 10, whereas if the bus that struck Bill was in fact green, we’d expect Wally to perceive it as blue only **1** time out of 10.

Because Wally is nine times (90% vs. 10%) more likely to perceive a bus as “blue” when it was truly blue than when it was green, the likelihood ratio is *9*.

“Multiplying” the prior odds by the likelihood ratio involves multiplying the element of the odds expression that corresponds to the hypothesis by the likelihood ratio value.

Here the prior odds were 1:9 that the bus that struck Bill was blue. Nine (likelihood ratio) times one (from 1:9) equals *9*.

The revised odds that the bus that struck Bill was blue are thus 1:9 x 9 = 9:9 or 1:1, which is equivalent to 50%.
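Here is a short Python sketch of the same *prior odds x likelihood ratio = revised odds* arithmetic, using exact fractions. The helper names are hypothetical; the only inputs are the 1:9 prior odds and the 90%-vs-10% likelihood ratio from the problem.

```python
from fractions import Fraction


def update_odds(prior_odds, likelihood_ratio):
    """Odds form of Bayes's Theorem: posterior odds = prior odds x likelihood ratio."""
    return prior_odds * likelihood_ratio


def odds_to_probability(odds):
    """Convert odds in favor (as a single ratio) to a probability."""
    return odds / (1 + odds)


prior_odds_blue = Fraction(1, 9)       # 1:9 that the bus was blue
likelihood_ratio = Fraction(90, 10)    # "blue" report: 90% if blue vs. 10% if green

posterior_odds_blue = update_odds(prior_odds_blue, likelihood_ratio)
posterior_prob_blue = odds_to_probability(posterior_odds_blue)

print(posterior_odds_blue)         # 1   (i.e., 1:1)
print(float(posterior_prob_blue))  # 0.5
```

Using `Fraction` rather than floats keeps the 9:9 = 1:1 step exact, which mirrors how one would do it by hand.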

I’m not saying that one exposure to this sort of exercise will be sufficient to reliably program someone to do conditional probability problems.

But I *am* saying that students of even middling levels of numeracy can be expected over the course of a reasonable number of repetitions to develop a reliable facility with conditional probability. The “natural frequencies” representation of the elements of the problem *makes sense*, and students can *see* which parts of that conceptualization map onto the “prior odds x likelihood ratio = revised odds” rendering of Bayes’s Theorem and why.

If you want to make it even easier for this sort of lesson to take hold, & related hardwiring to settle in, give your students this cool Bayes's calculator.

Students *can’t* be expected, in contrast, to see why any of the other more complex but logically equivalent renderings of Bayes’s Theorem actually make sense. They thus can't be expected to retain them, to become adept at heuristically deploying them, or to experience the sort of improvement in discernment and reasoning that occurs as one assimilates statistical concepts.
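For comparison, here is the sort of probability-form rendering I have in mind, applied to BAYES. The symbols are my own shorthand for this illustration: $B$ = the bus was blue, $G$ = the bus was green, $W$ = Wally reports "blue."

$$
P(B \mid W) = \frac{P(W \mid B)\,P(B)}{P(W \mid B)\,P(B) + P(W \mid G)\,P(G)} = \frac{0.9 \times 0.1}{0.9 \times 0.1 + 0.1 \times 0.9} = \frac{0.09}{0.18} = 0.5
$$

It gets to the same 50%, but nothing in the notation tells a student *why* the denominator looks the way it does.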

Teachers who try to get students to learn to apply these formalisms, then, are doing a shitty job!

Now what about covariance?

Actually, there’s really nothing to it from an instructional point of view. It explains itself, as I said.

But that’s exactly the problem: facility with it is *not* a matter of learning how to do any particular thing.

Rather it is a matter of reliably *recognizing* when one is dealing with a problem in which the sort of steps necessary to detect covariance have to be done.

The typical reaction when it's pointed out to someone that he or she got the covariance problem wrong is an instant recognition of the mistake, and the sense that his or her error was a result of an uncharacteristic lapse or even a “trick” on the part of the examiner.

But in fact, in order to make reliable causal inferences based on observation in their everyday life, people will constantly be required to detect covariance. If they are unable to see the need for, or just lack the motivation to perform, the necessary operations even when all the essential information has been pre-packaged *for them* into a 2x2 contingency table, then the likelihood that they will lapse into the defective heuristic alternative when they encounter covariance-detection problems in the wild is very high (Stanovich 2009).

How likely someone is to get the right answer in the covariance problem is associated with their *numeracy*. The standard numeracy scale (e.g., Peters et al. 2006) is a measure not so much of math skill as of the capacity to reliably recognize when a quantitative reasoning problem *requires* one or another type of effortful analysis akin to what's involved in detecting covariance.

Frankly, I’m pessimistic that I can instill that sort of capacity in students. That's *not* because I have a modest sense of my abilities as a teacher. It’s because I have due respect for the difficulty that *many* indisputably great researchers and teachers have encountered in trying to come up with pedagogical techniques that are as successful in imparting critical reasoning dispositions in students as the “natural frequencies” strategy is for imparting a reliable facility in them to do conditional probability problems.

Of course, in order for students to successfully use the “natural frequencies” strategy and, after they become comfortable with it, the *prior odds x likelihood ratio = revised odds* rendering of Bayes’s Theorem, they must reliably *recognize* conditional probability problems when they see them.

But in my experience, at least, that’s not a big deal; when a conditional probability problem makes its appearance, one is about as likely to overlook it as one is to fail to notice that a mother black bear & cub or a snarling honey badger has appeared alongside the trail during a hike in the woods.

Which then leads me to the question: *how can it be that only 3% of a sample as well educated and intelligent* as the one I tested can do a conditional probability problem as simple as the one I put in this battery?

Doesn't that mean that too many math teachers are failing to *use* the empirical knowledge that has been developed by great education researchers & teachers?

Or am I (once again; it happens!) missing something?

**References**

Arkes, H.R. & Harkness, A.R. Estimates of Contingency Between Two Dichotomous Variables. *J. Experimental Psychol.* **112**, 117-135 (1983).

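Kurzenhäuser, S. & Hoffrage, U. Teaching Bayesian reasoning: an evaluation of a classroom tutorial for medical students. *Medical Teacher* **24**, 516-521 (2002).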

Peters, E., Västfjäll, D., Slovic, P., Mertz, C.K., Mazzocco, K. & Dickert, S. Numeracy and Decision Making. *Psychol Sci* **17**, 407-413 (2006).

Sedlmeier, P. & Gigerenzer, G. Teaching Bayesian reasoning in less than two hours. *Journal of Experimental Psychology: General* **130**, 380-400 (2001).

Spiegelhalter, D., Pearson, M. & Short, I. Visualizing Uncertainty About the Future. *Science* **333**, 1393-1400 (2011).

Stanovich, K.E. *What intelligence tests miss: the psychology of rational thought* (Yale University Press, New Haven, 2009).

Wheaton, K.J., Lee, J. & Deshmukh, H. Teaching Bayesian Statistics To Intelligence Analysts: Lessons Learned. *J. Strategic Sec.* **2**, 39-58 (2009).

doh! [inevitable!]

I suspect math typos of this sort do indeed constrain student learning of how to do conditional probability! But if that defect in instruction is the only thing that gets fixed, I'm estimating no more than, say, 6% of the population will ever get the right answer to the "Blue Bus" problem.