
"I was wrong?! Coooooooooool!"

Okay—now here’s a model for everyone who aspires to cultivate the virtues that signify a genuine scholarly disposition.

As discussed previously (here & here), a pair of economists have generated quite a bit of agitation and excitement by exposing an apparent flaw in the methods of the classic “hot hand fallacy” studies.

These studies purported to show that, contrary to popular understanding not only among sports fans but among professional athletes and coaches, professional basketball players do not experience “hot streaks,” or periods of above-average performance longer in duration than one would expect to see by chance. The papers in question have for thirty years enjoyed canonical status in the field of decision science research as illustrations of the inferential perils associated with the propensity of human beings to look for and see patterns in independent events.

Actually, the reality of that form of cognitive misadventure isn’t genuinely in dispute.  People are way too quick to discern signal in noise.

But what is open to doubt now is whether the researchers  used the right analytical strategy in testing whether this mental foible is the source of the widespread impression that professional basketball players experience "hot hands."

I won’t rehearse the details (in part to avoid the amusingly embarrassing spectacle of trying to make intuitively graspable a proof that stubbornly assaults the intuitions of highly numerate persons in particular), but the nub of the proof supplied by the challenging researchers, Joshua Miller & Adam Sanjurjo, is that the earlier researchers mistakenly treated “hit” and “missed” shots as recorded in a previous, finite sequence of shots as if they were independent. In fact, because the proportion of “hits” and “misses” in a past sequence is fixed, strings of “hits” should reduce the likelihood of subsequent “hits” in the remainder of the sequence. Not taking this feature of sampling without replacement into account caused the original “hot hand fallacy” researchers to miscalculate the “null” in a manner that overstated the chance probability that a player would hit another shot after a specified string of hits....
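For readers who want to see the selection effect rather than take it on faith, here is a minimal Python sketch. It is an illustration under stated assumptions, not Miller & Sanjurjo's own code and not a reconstruction of the original studies' analysis: each "shot" is assumed to be an independent 50% coin flip (True = hit), every possible sequence of n shots is enumerated, the proportion of hits among shots immediately following k consecutive hits is computed within each sequence, and those per-sequence proportions are then averaged. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def proportion_after_streak(seq, k):
    """Proportion of hits among shots immediately preceded by k consecutive hits.

    seq is a tuple of booleans (True = hit). Returns None when no shot in the
    sequence is preceded by such a streak, since the proportion is undefined.
    """
    followers = [seq[i] for i in range(k, len(seq)) if all(seq[i - k:i])]
    if not followers:
        return None
    return Fraction(sum(followers), len(followers))


def expected_proportion(n, k):
    """Average of the per-sequence proportions over all 2**n equally likely
    sequences of n independent 50% shots, skipping sequences where it is undefined."""
    values = [
        p
        for p in (proportion_after_streak(seq, k) for seq in product((True, False), repeat=n))
        if p is not None
    ]
    return sum(values, Fraction(0)) / len(values)


print(expected_proportion(4, 1))  # 17/42 (about 0.405), not 1/2
```

For four shots and a "streak" of a single hit, the average comes out to 17/42 rather than 1/2, even though every shot is independent with a 50% hit rate. That shortfall is the kind of bias in the "null" that a per-sequence comparison against 50% fails to account for.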

Bottom line is that the data in the earlier studies didn’t convincingly rule out the possibility that basketball players’ performances did indeed display the sort of “streakiness” that defies chance expectations and supports the “hot hand” conjecture.

But in any case . . . the point of this update is to call attention to the truly admirable and inspiring reaction of the original researchers to the news that their result had been called into question in this way.

As I said, the “hot hand fallacy” studies are true classics. One could understand if those who had authored such studies would react defensively (many others who have been party to celebrating the studies for the last 30 yrs understandably have!) to the suggestion that the studies reflect a methodological flaw, one that itself seems to reflect the mischief of an irresistible but wrong intuition about how to distinguish random from systematic variations in data.

But instead, the reaction of the lead researcher to the M&S result, Tom Gilovich, is: “Coooool!!!!!!!!”

“Unlike a lot of stuff that’s come down the pike since 1985,” Gilovich was quoted as saying in a Wednesday Wall Street Journal piece, “this is truly interesting. What they discovered is correct.” Whether the real effect is “so small that the original conclusion stands or needs to be modified,” he said, “is what needs to be determined.”

The article goes on to report that Gilovich, along with others, is now himself contemplating re-analyses and new experiments to try to do exactly that.

In a word, Gilovich, far from having his nose bent out of joint by the M&S finding, is excited that a truly unexpected development is now furnishing him and others with a chance to resume investigation of an interesting and complex question.

I bet, too, that at least part of what intrigues Gilovich is how a mistake like this could have evaded the attention of decision scientists for this long, and why even now the modal reaction among readers of the M&S paper is “BS!!” It takes about 45.3 (± 7) readings to really believe M&S’s proof, and even then the process has to be repeated at weekly intervals for a period of two months before the point they are making itself starts to seem intuitive enough to have the ring of truth.

But the point is, Gilovich, whose standing as a preeminent researcher is not diminished one iota by this surprising turn in the scholarly discussion his work initiated, has now enriched us even more by furnishing us with a compelling and inspiring example of the mindset of a real scholar!

Whatever embarrassment he might have been expected to experience (none is warranted in my view, nor evident in the WSJ article) is dwarfed by his genuine intellectual excitement over a development that is truly cool & interesting, both for what it teaches us about a particular problem in probability and for the opportunity it furnishes to extend examination into human psychology (here, the distinctive vulnerability to error that likely is itself unique to people with intuitions fine-tuned to avoid making the mistakes that intuitions characteristically give rise to when people try to make sense of randomness).

I’m going to try to reciprocate the benefit of the modeling of scholarly virtue Gilovich is displaying by owning up to, and getting excited about, as many mistakes in my own previous work as I can find! 



Why do we seem to agree less & less as we learn more & more-- and what should we do about that?

from correspondence ...

Dear Prof Kahan,
I’m working on an article describing how our ideologies skew our ability to deal with the facts, no matter how true/scientifically sound they are. While researching this, I (obviously ;) landed upon your research. I’ve been eagerly reading papers and posts on the Cultural Cognition site, but there are a couple of things I’m still unsure of. Namely:
1. How does cultural cognition differ from motivated reasoning? Or is the latter included in the former, so that motivated reasoning is merely cultural cognition “in action”?
An account here
Also see this 
2. Are smart people more prone to twist given facts so that they fit into their existing beliefs/values? Or are intelligent persons just more skillful in this process...?
I think the latter. That is, I don't think the reason various critical reasoning proficiencies magnify cultural cognition is that they are correlated with a greater stake in, or unconscious motivation to form, identity-protective beliefs; individuals who are better than average in critical reasoning aren't more partisan or more intensely partisan when one measures those things in them. I think they are just better at doing what people naturally do with information that helps them to form "beliefs" that express who they are. Our motivated numeracy paper is in line w/ that interpretation.
3. Is motivated reasoning an unconscious reaction? Do we know we do it? Does everybody do it, even the ones who try not to?
That's the theory, & I believe the evidence supports it; well-designed experiments have for sure connected motivated-reasoning dynamics to unconscious processes.
Knowing doesn't seem to help, no. One can't "observe" the effect of the dynamic in oneself, much less control it. I'm sure, though, that one can behave in ways that anticipate the effect: trying to manage the conditions under which one examines information, & also being conscious when an issue is of the sort about which one's beliefs might well have been influenced in this way & taking that into account in acting.
4. If motivated reasoning is unconscious (= automatic), how on earth do we stop it? Can we?
I have to confess this whole phenomenon bothers me to the bone, both as a human being and (especially) as a science journalist. How can we, how can anyone, promote rational ideas or actions or work towards the kind of society s/he thinks is worthwhile, if s/he isn't first able to take in the facts and so know how things are?
The only grounds any of us ever has for confidence in our perception of what is in fact known to science is the reliability of the faculties we use to recognize who knows what about what.  Those faculties are vulnerable to disruption by one or another form of social pathology.  We can attend to those pathologies; we all have an interest in that no matter what our cultural worldviews or our positions on particular issues. 
I would appreciate it enormously if you found a minute to answer me.
With kind regards,
By enabling free and reasoning people to understand what science can teach us about how members of a pluralistic liberal democratic society come to know the vast amount of scientific knowledge that their way of life makes possible, you are a critical part of the solution. Thanks, & good luck w/ your story. 




Am I doing the right thing? . . . The “chick-sexing” disanalogy

Okay, here’s a set of reflections that seem topical as another school year begins.

The reflections can be structured with reference to a question:

What’s the difference between a lawyer and a chick sexer?

It’s not easy, at first, to figure out what they have in common.  But once one does, the risk that one won’t see what distinguishes them is much bigger, in actuarial and consequential terms.

I tell people about the link between them all the time—and they chuckle.  But in fact, I spend hours and hours and hours per semester eviscerating comprehension of the critical distinction between them in people who are filled with immense intelligence and ambition, and who are destined to occupy positions of authority in our society.

That fucking scares me.

Anyway, the chick sexer is the honey badger of cognitive psychology: relentlessly fascinating, and adorable. But because cognitive psychology doesn’t have nearly as big a presence on YouTube as do amusing voice-overs of National Geographic wildlife videos, the chick sexer is a lot less famous.

So likely you haven’t heard of him or her.

But in fact the chick sexer plays a vital role in the poultry industry. It’s his or her responsibility to separate the baby chicks, moments after birth, on the basis of gender.

The females are more valuable, at least from the point of view of the industry. They lay eggs.  They are also plumper and juicier, if one wants to eat them. Moreover, the stringy scrawny males, in addition to being not good for much, are ill-tempered & peck at the females, steal their food, & otherwise torment them.

So the poultry industry basically just gets rid of the males (or the vast majority of them; a few are kept on and lead a privileged existence) at soonest opportunity—minutes after birth.

The little newborn hatchlings come flying (not literally; chickens can’t fly at any age) down a roomful of conveyor belts, 100’s per minute. Each belt is manned (personed) by a chick sexer, who deftly plucks (as in grabs; no feathers at this point) each chick off the belt, quickly turns him/her over, and in a split second determines the creature’s gender, tossing the males over his or her shoulder into a “disposal bin” and gently setting the females back down to proceed on their way.

They do this unerringly—or almost unerringly (99.99% accuracy or whatever).

Which is astonishing. Because there’s no discernible difference, or at least not one that anyone can confidently articulate, in the relevant anatomical portions of the minutes-old chicks.

You can ask the chick sexer how he or she can tell the difference. Many will tell you some story about how a bead of sweat forms involuntarily on the male chick’s beak, or how he tries to distract you by asking for the time of day or for a cigarette, or how the female will hold one’s gaze for a moment longer or whatever.

This is all bull/chickenshit. Or technically speaking, “confabulation.”

Indeed, the more self-aware and honest members of the profession just shrug their shoulders when asked what it is that they are looking for when they turn the newborn chicks upside down & splay their little legs.

But while we don’t know what exactly chick sexers are seeing, we do know how they come to possess their proficiency in distinguishing male from female chicks: by being trained by a chick-sexing grandmaster.

For hours a day, for weeks on end, the grandmaster drills the aspiring chick sexers with slides—“male,” “female,” “male,” “male,” “female,” “male,” “female,” “female”—until they finally acquire the same power of discernment as the grandmaster, who likewise is unable to give a genuine account of what that skill consists in.

This is a true story (essentially).

But the perceptive feat that the chick sexer is performing isn’t particularly exotic.  In fact, it is ubiquitous.

What the chick sexer does to discern the gender of chicks is an instance of pattern recognition.

Pattern recognition is a cognitive operation in which we classify a phenomenon by rapidly appraising it in comparison to a large stock of prototypes acquired by experience.

The classification isn’t made via conscious deduction from a set of necessary and sufficient conditions but rather tacitly, via a form of perception that is calibrated to detect whether the object possesses a sufficient number of the prototypical attributes—as determined by a gestalt, “critical mass” intuition—to count as an instance of it.

All manner of social competence, from recognizing faces to reading others’ emotions, depends on pattern recognition.

But so do many specialized ones. What distinguishes a chess grandmaster from a modestly skilled amateur player isn’t her capacity to conjure and evaluate a longer sequence of potential moves but rather her ability to recognize favorable board positions based on their affinity to a large stock of ones she has determined by experience to be advantageous.

Professional judgment, too, depends on pattern recognition.

For sure, being a good physician requires the capacity and willingness to engage in conscious and unbiased weighing of evidence diagnostic of medical conditions. But that’s not sufficient; unless the doctor includes only genuinely plausible illnesses in her set of maladies worthy of such investigation, the likelihood that she will either fail to test for the correct one or fail to identify it soon enough to intervene effectively will be too high.

Expert forensic auditors must master more than the technical details of accounting; they must acquire a properly calibrated capacity to recognize the pattern of financial irregularity that helps them to extract evidence of the same from mountains of business records.

The sort of professional judgment one needs to be a competent lawyer depends on a properly calibrated capacity for pattern recognition, too.

Indeed, this was the key insight of Karl Llewellyn.  The most brilliant member of the Legal Realist school, Llewellyn observed that legal reasoning couldn’t plausibly be reduced to deductive application of legal doctrines. Only rarely were outcomes uniquely determined by the relevant set of formal legal materials (statutes, precedents, legal maxims, and the like).

Nevertheless, judges and lawyers, he noted, rarely disagree on how particular cases should be resolved. How this could be fascinated him!

The solution he proposed was professional “situation sense”: a perceptive faculty, acquired by education and experience, that enabled lawyers to reliably appraise specific cases with reference to a stock of prototypical “situation types,” the proper resolution of which was governed by shared apprehensions of “correctness” instilled by the same means.

This feature of Llewellyn’s thought (the central feature of it) is weirdly overlooked by many scholars who characterize themselves as “realists” or “New Realists,” and who think that Llewellyn’s point was that because there’s no “determinacy” in “law,” judges must be deciding on the basis of “political” sensibilities of the conventional “left-right” sort, generating differences in outcome across judges of varying ideologies.

It’s really hard to get Llewellyn more wrong than that!

Again, his project was to identify how there could be pervasive agreement among lawyers and judges on what the law is despite its logical indeterminacy. His answer was that members of the legal profession, despite heterogeneity in their “ideologies” politically understood, shared a form of professionalized perception—“situation sense”—that by and large generated convergence on appropriate outcomes the coherence of which would befuddle non-lawyers.

Llewellyn denied, too, that the content of situation sense admitted of full specification or articulation. The arguments that lawyers made and the justifications that judges give for their decisions, he suggested, were post hoc rationalizations.  

Does that mean that for Llewellyn, legal argument is purely confabulatory? There are places where he seems to advance that claim.

But the much more intriguing and I think ultimately true explanation he gives for the practice of reason-giving in lawyerly argument (or just for lawyerly argument) is its power to summon and focus “situation sense”: when effective, argument evokes both apprehension of the governing “situation” and motivation to reach a situation-appropriate conclusion.

Okay. Now what is analogous between lawyering and chick-sexing should be readily apparent.

The capacity of the lawyer (including the one who is a judge) to discern “correct” outcomes as she grasps and manipulates indeterminate legal materials is the professional equivalent of, and involves the exercise of the same cognitive operation as, the chick sexer’s power to apprehend the gender of the day-old chick from inspection of its fuzzy, formless genitalia.

In addition, the lawyer acquires her distinctive pattern-recognition capacity in the same way the chick sexer acquires his: through professional acculturation.

What I do as a trainer of lawyers is analogous to what the chick-sexing grandmaster does. “Proximate causation,” “unlawful restraint of trade,” “character propensity proof/permissible purpose,” “collateral (not penal!) law”; “male,” “male,” “female,” “male”: I bombard my students with a succession of slides that feature the situation types that stock the lawyer’s inventory, and inculcate in students the motivation to conform the results in particular cases to what those who practice law recognize (see, feel) to be the correct outcome.

It works. I see it happen all the time. 

It’s quite amusing. We admit students to law school in large part because of their demonstrated proficiency in solving the sorts of logic puzzles featured on the LSAT. Then we torment them, Alice-in-Wonderland fashion, by presenting to them as “paradigmatic” instances of legal reasoning outcomes that clearly can’t be accounted for by the contorted simulacra of syllogistic reasoning that judges offer to explain them. 

They stare uncomprehendingly at written opinions in which a structural ambiguity is resolved one way in one statute and the opposite way in another--by judges who purport to be following the “plain meaning” rule.

They throw their hands up in frustration when judges insist that their conclusions are logically dictated by patently question-begging standards  (“when the result was a reasonably foreseeable consequence of the defendant’s action. . .  “) that can be applied only on the basis of some unspecified, and apparently not even consciously discerned, extra-doctrinal determination of the appropriate level of generality at which to describe the relevant facts.

But the students do learn that the life of the law is not “logic” (to paraphrase Holmes, a proto-realist) but “experience,” or better, perception founded on the “experience” of becoming a lawyer, replete with all the sensibilities that being that sort of professional entails.

The learning is akin to the socialization process that the students all experienced as they negotiated the path from morally and emotionally incompetent child to competent adult. Those already socially competent model the right reactions for them in their own reactions to the materials—and in their reactions to the halting and imperfect attempts of the students to reproduce it on their own. 

“What,” I ask in mocking surprise, “you don’t get why these two cases reached different results in applying the ‘reasonable foreseeability’ standard of proximate causation?” 

Seriously, you don’t see why, for an arsonist to be held liable for causing the death of firefighters, it's enough to show that he could ‘reasonably foresee’ 'death by fire,' whether or not he could foresee  ‘death by being trapped by fires travelling the particular one of 5x10^9 different paths the flames might have spread through a burning building'?! But why ‘death by explosion triggered by a spark emitted from a liquid nitrate stamping machine when knocked off its housing by a worker who passed out from an insulin shock’—and not simply 'death by explosion'—is what must be "foreseeable" to a manufacturer (one warned of explosion risk by a safety inspector) to be convicted for causing the death of employees killed when the manufacturer’s plant blew up? 

"Anybody care to tell Ms. Smith what the difference is,” I ask in exasperation.

Or “Really,” I ask in a calculated (or worse, in a wholly spontaneous, natural) display of astonishment,

you don’t see why someone's ignorance of what's on the ‘controlled substance’ list doesn’t furnish a "mistake of law" defense (in this case, to a prostitute who hid her amphetamines in tin foil wrap tucked in her underwear--is that where you keep your cold medicine or ibuprofen! Ha ha ha ha ha!!), but why someone's ignorance of the types of "mortgage portfolio swaps" that count as loss-generating "realization events" under IRS regs (the sort of tax-avoidance contrivance many of you will be paid handsomely by corporate law firm clients to do) does furnish one? Or why ignorance of the criminal prohibition on "financial structuring" (the sort of stratagem a normal person might resort to to hide assets from his spouse during a divorce proceeding) furnishes a defense as well?!

Here Mr. Jones: take my cellphone & call your mother to tell her there’s serious doubt about your becoming a lawyer. . . .

This is what I see, experience, do.  I see my students not so much “learning to think” like lawyers but just becoming them, and thus naturally seeing what lawyers see.

But of course I know (not as a lawyer, but as a thinking person) that I should trust how things look and feel to me only if corroborated by the sort of disciplined observation, reliable measurement, and valid causal inference distinctive of empirical investigation.

So, working with collaborators, I design a study to show that lawyers and judges are legal realists—not in the comic-book “politicians in robes” sense that some contemporary commentators have in mind but in the subtle, psychological one that Llewellyn actually espoused.

Examining a pair of genuinely ambiguous statutes, members of the public predictably conform their interpretation of them to outcomes that gratify their partisan cultural or political outlooks, polarizing in patterns the nature of which is dutifully obedient to experimental manipulation of factors extraneous to law but very relevant indeed to how people with those outlooks think about virtue and vice.

But not lawyers and judges: they converge on interpretations of these statutes, regardless of their own cultural outlooks and regardless of experimental manipulations that vary which outcome gratifies those outlooks.

They do that not because they, unlike members of the public, have acquired some hyper-rational information-processing capacity that blocks out the impact of “motivated reasoning”: the lawyers and judges are just as divided as members of the public, on the basis of the same sort of selective crediting and discrediting of evidence, on issues like climate change, and legalization of marijuana and prostitution.

Rather the lawyers and judges converge because they have something else that members of the public don’t: Llewellyn’s situation sense—a professionalized form of perception, acquired through training and experience, that reliably fixes their attention on the features of the “situation” pertinent to its proper legal resolution and blocks out the distracting allure of features of it that might be pertinent to how a non-lawyer—i.e., a normal person, with one or another kind of “sense” reliably tuned to enabling them to be a good member of a cultural group on which their status depends . . . .

So, that’s what lawyers and chick sexers have in common: pattern recognition, situation sense, appropriately calibrated to doing what they do—or in a word professional judgment.

But now, can you see what the chick sexer and the lawyer don’t have in common?

Perhaps you don’t; because even in the course of this account, I feel myself having become an agent of the intoxicating, reason-bypassing process that imparting “situation sense” entails.

But you might well see it—b/c here all I’ve done is give you an account of what I do as opposed to actually doing it to you.

We know something important about the chick sexer’s judgment in addition to knowing that it is an instance of pattern recognition: namely, that it works.

The chick sexer has a mission in relation to a process aimed at achieving a particular end. That end supplies a normative standard of correctness that we can use to test not only whether chick sexers, individually and collectively, agree in their classifications but also whether they are classifying correctly.

Obviously, we’ll have to wait a bit, but if we collect rather than throw half of them away, we can simply observe what gender the baby chicks classified by the sexer as “male” and “female” grow up to be.

If we do that test, we’ll find out that the chick sexers are indeed doing a good job.

We don’t have that with lawyers’ or judges’ situation sense.  We just don’t.

We know they see the same thing; that they are, in the astonishing way that fascinated Llewellyn, converging in their apprehension of appropriate outcomes across cases that “lay persons” lack the power to classify correctly.

But we aren’t in a position to test whether they are seeing the right thing.

What is the goal of the process the lawyers and judges are involved in?  Do we even agree on that?

I think we do: assuring the just and fair application of law.

That’s a much more general standard, though, than “classifying the gender of chicks.”  There are alternative understandings of “just” and “fair” here.

Actually, though, this is still not the point at which I’m troubled.  Although for sure I think there is heterogeneity in our conceptions of the “goals” that the law aims at, I think they are all conceptions of a liberal political concept of “just” and “fair,” one that insists that the state assume a stance of neutrality with respect to the diverse understandings of the good life that freely reasoning individuals (or more accurately groups of individuals) will inevitably form.

But assuming that this concept, despite its plurality of conceptions, has normative purchase with respect to laws and applications of the same (I believe that; you might not, and that’s reasonable), we certainly don’t have a process akin to the one we use for chick sexers to determine whether lawyers and judges’ situation sense is genuinely calibrated to achieving it.

Or if anyone does have such a process, we certainly aren’t using it in the production of legal professionals.

To put it in terms used to appraise scientific methods, we know the professional judgment of the chick sexer is not only reliable—consistently attuned to whatever it is that appropriately trained members of their craft are unconsciously discerning—but also valid: that is, we know that the thing the chick sexers are seeing (or measuring, if we want to think of them as measuring instruments of a special kind) is the thing we want to ascertain (or measure), viz., the gender of the chicks.
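The distinction can be put in miniature. In the toy Python sketch below, all labels are made up and purely hypothetical: two trainees schooled by the same grandmaster classify eight chicks identically, so their agreement with each other (a crude stand-in for reliability) is perfect, while their agreement with what the chicks actually grow up to be (a crude stand-in for validity) is no better than a coin toss. Simple percent agreement is used for transparency; it is not a chance-corrected statistic such as Cohen's kappa. Only the second number requires an outcome check of the kind the chick-sexing case permits.

```python
def agreement(labels_a, labels_b):
    """Proportion of items on which two equal-length label lists agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must be the same length")
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)


# Hypothetical data: what the eight chicks actually grow up to be.
truth = ["F", "M", "F", "F", "M", "M", "F", "M"]

# Two hypothetical trainees who make identical calls on every chick.
trainee_1 = ["F", "F", "M", "F", "M", "F", "M", "M"]
trainee_2 = list(trainee_1)

reliability = agreement(trainee_1, trainee_2)  # agreement with each other
validity = agreement(trainee_1, truth)         # agreement with the outcome

print(reliability, validity)  # 1.0 0.5
```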

In the production of lawyers, we have reliability only, without validity—or at least without validation.  We do successfully (remarkably!) train lawyers to make out the same patterns when they focus their gaze at the “mystifying cloud of words” that Cardozo identified the law as comprising. But we do nothing to assure that what they are discerning is the form of justice that the law is held forth as embodying.

Observers fret—and scholars using empirical methods of questionable reliability and validity purport to demonstrate—that judges are mere “politicians in robes,” whose decisions reflect the happenstance of their partisan predilections.

That anxiety that judges will disagree based on their “ideologies” bothers me not a bit.

What does bother me—more than just a bit—is the prospect that the men and women I’m training to be lawyers and judges will, despite the diversity of their political and moral sensibilities, converge on outcomes that defy the basic liberal principles that we expect to animate our institutions.

The only thing that I can hope will stop that from happening is for me to tell them that this is how it works.  Because if it troubles me, I have every reason to think that they, as reflective decent people committed to respecting the freedom & reason of others, will find some of this troubling too.

Not so troubling that they can’t become good lawyers. 

But maybe troubling enough that they won't stop being reflective moral people in their careers as lawyers; troubling enough so that if they find themselves in a position to do so, they will enrich the stock of virtuous-lawyer prototypes that populate our situation sense by doing something that they, as reflective, moral people—“conservative” or “liberal”—recognize is essential to reconciling being a “good lawyer” with being a member of a profession essential to the good of a liberal democratic regime.

That can happen, too.


How big a difference in mean CRT scores is "big enough" to matter? or NHT: A malignant craft norm, part 2

1.   Now where was I . . . ? 

Right . . . So yesterday I posted part I of this series, which is celebrating the bicentennial (or perhaps it’s the tricentennial; one loses track after a while) of the “NHT Fallacy” critique.

The nerve of it is that “rejection of the null [however it is arbitrarily defined] at p < 0.05 [or p < 10^-50 or whatever]” furnishes no inferentially relevant information in hypothesis testing. To know whether an observation counts as evidence in support of a hypothesis, the relevant information is not how likely we were to observe a particular value if the “null” is true but how much more or less likely we were to observe that value if a particular hypothesized “true” value is correct than if another hypothesized “true” value is correct (e.g., Rozeboom 1960; Edwards, Lindman & Savage 1963; Cohen 1994; Goodman 1999a; Gigerenzer 2004).
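A minimal Python sketch of the contrast, assuming (purely for illustration) that an observed difference in means d has a normal sampling distribution with known standard error se; the function names and numbers are hypothetical. The p-value answers only "how surprising is d if the true difference is 0?", whereas the likelihood ratio answers "how much more probable is d if the true difference is delta_1 than if it is delta_2?"

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def two_sided_p(d, se):
    """Two-sided p-value for H0: true difference = 0 (normal approximation)."""
    z = abs(d) / se
    return math.erfc(z / math.sqrt(2))


def likelihood_ratio(d, se, delta_1, delta_2):
    """How much more likely the observed difference d is if the true difference
    is delta_1 than if it is delta_2, given sampling error with SE se."""
    return normal_pdf(d, delta_1, se) / normal_pdf(d, delta_2, se)


d, se = 2.0, 1.0
print(two_sided_p(d, se))                 # about 0.0455: "reject the null at p < 0.05"
print(likelihood_ratio(d, se, 1.5, 2.5))  # 1.0: no help choosing between H1 and H2
```

With these made-up numbers the null is "rejected," yet the data are exactly as likely under a true difference of 1.5 as under 2.5, so they furnish no evidence at all for choosing between those two substantive hypotheses.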

Actually, I’m not sure when the first formulation of the critique appeared. Amusingly, in his 1960 classic The Fallacy of the Null-Hypothesis Significance Test, Rozeboom apologetically characterized his own incisive attack on the inferential barrenness of NHT as “not a particularly original view”!

The critique has been refined and elaborated many times, in very useful ways, since then, too.  Weirdly, the occasion for so many insightful elaborations has been the persistence of NHT despite the irrefutable proofs of those critiquing it.

More on that in a bit, but probably the most interesting thing that has happened in the career of the critique in the last 50 yrs. or so has been the project to devise tractable alternatives to NHT that really do quantify the evidentiary weight of any particular set of data.

I’m certainly not qualified to offer a reliable account of the intellectual history of using Bayesian likelihood ratios as a test statistic in the social sciences (cf. Good). But the utility of this strategy was clearly recognized by Rozeboom, who observed that the inferential defects in NHT could readily be repaired by analytical tools forged in the kiln of “the classic theory [of] inverse probabilities.”

The “Bayes Factor” (actually “the” misleadingly implies that there is only one variant of it) is the most muscular, deeply theorized version of the strategy.

But one can, I believe, still get a lot of mileage out of less technically elaborate analytical strategies using likelihood ratios to assess the weight of the evidence in one’s data (e.g., Goodman, 1999b). 
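To make the contrast concrete, here is a minimal sketch, assuming an estimate whose sampling distribution is normal with a known standard error; the function names and numbers are hypothetical. The p-value consults only the null, while the likelihood ratio compares two specified hypotheses, so one and the same "significant" observation can favor the null over one alternative and favor a different alternative over the null.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd, evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def two_sided_p(x, null_mean, se):
    """Two-sided p-value: probability of a value at least this far from the null mean."""
    z = abs(x - null_mean) / se
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))

def likelihood_ratio(x, mean_a, mean_b, se):
    """How many times more probable x is if hypothesis A is true than if B is true."""
    return normal_pdf(x, mean_a, se) / normal_pdf(x, mean_b, se)

obs, se = 0.25, 0.1  # hypothetical estimate and standard error
print(two_sided_p(obs, 0.0, se))             # about 0.012: "reject the null"
print(likelihood_ratio(obs, 0.0, 1.0, se))   # far above 1: favors "no effect" over an effect of 1.0
print(likelihood_ratio(obs, 0.0, 0.3, se))   # about 0.05: favors an effect of 0.3 over "no effect"
```

The p-value is identical in both comparisons; only the likelihood ratio changes, because only it depends on what the competing hypothesis predicts.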

For many purposes, I think, the value of using Bayesian likelihood ratios is largely heuristic: having to specify the predictions that opposing plausible hypotheses would generate with respect to the data, and to formulate an explicit measure of the relative consistency of the observed outcome with each, forces the researcher to do what the dominance of NHT lets researchers evade, namely, report information that enables a reflective person to draw an inference about the weight of the evidence in relation to competing explanations of the dynamic at issue.

That’s all that’s usually required for others to genuinely learn from and critically appraise a researcher’s work. For sure, there are times when everything turns on how precisely one is able to estimate some quantity of interest, and when key conceptual issues about how to specify one or another parameter of a Bayes Factor will have huge consequences for interpretation of the data.

But in lots of experimental models, particularly in social psychology, it’s enough to be able to say “yup, that evidence is definitely more consistent—way more consistent—with what we’d expect to see if H1 rather than H2 is true”—or instead, “wait a sec, that result is not really any more supportive of that hypothesis than this one!” In which case, a fairly straightforward likelihood ratio analysis can, I think, add a lot, and even more importantly avoid a lot of the inferential errors that accompany permitting authors to report “p < 0.05” and then make sweeping, unqualified statements not supported by their data.

That’s exactly the misadventure that, as I described “yesterday,” befell a smart researcher relying on NHT. That researcher found a “statistically significant” correlation (i.e., rejection of the “null at p<0.0xxx”) between a sample of Univ of Ky undergraduates’ CRT scores (Frederick 2005) and their responses to a standard polling question on “belief in” evolution; he then treated that as corroboration of his hypothesis that “individuals who are better able to analytically control their thoughts are more likely” to overcome the intuitive attraction of the idea that “living things, are ... intentionally designed by some external agent” to serve some “function and purpose,” and thus “more likely to eventually endorse evolution’s role in the diversity of life and the origin of our species."

But as I pointed out, the author’s data, contrary to his assertion, unambiguously didn’t support that hypothesis.

Rather than showing that “analytic thinking consistently predicts endorsement of evolution,” his data demonstrated that knowing the study subjects’ CRT scores furnished absolutely no predictive insight into their "evolution beliefs."  The CRT predictor in the author’s regression model was “statistically significant” (p < 0.01), but was way too small in size to outperform a “model” that simply predicted “everyone” in the author’s sample—regardless of their CRT score—rejected science’s account of the natural history of human beings.  

(Actually, there were even more serious—or maybe just more interesting—problems having to do with the author’s failure to test the data's relative support for a genuine alternative about how cognitive reflection relates to "beliefs" in evolution: by magnifying the opposing positions of groups for whom "evolution beliefs" have become (sadly, pointlessly, needlessly) identity defining. But I focused “yesterday” on this one b/c it so nicely illustrates the NHT fallacy.)
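To see how a "significant" predictor can add nothing to prediction, here is a toy sketch with hypothetical logistic coefficients and made-up data (none of it from the study). Because the slope is small and believers are a minority, the predicted probability of belief never reaches 0.5 anywhere on the 0 to 3 CRT range, so the model classifies everyone as a nonbeliever, exactly as the "predict the modal answer for everyone" baseline does.

```python
import math

def predicted_prob(crt, b0, b1):
    """Logistic model: probability of 'believes in evolution' at a given CRT score."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * crt)))

def accuracy(predictions, outcomes):
    """Share of cases in which the 0/1 prediction matches the 0/1 outcome."""
    return sum(p == y for p, y in zip(predictions, outcomes)) / len(outcomes)

def compare_to_baseline(crt_scores, outcomes, b0, b1):
    """Accuracy of the CRT model (cutoff 0.5) vs. predicting the modal answer for everyone."""
    model_preds = [int(predicted_prob(c, b0, b1) >= 0.5) for c in crt_scores]
    modal = int(2 * sum(outcomes) > len(outcomes))  # 1 only if believers are a strict majority
    baseline_preds = [modal] * len(outcomes)
    return accuracy(model_preds, outcomes), accuracy(baseline_preds, outcomes)

# Hypothetical data: CRT scores (0-3) and 1 = "believes", 0 = "doesn't believe".
crt_scores = [0, 0, 1, 1, 2, 3, 0, 1, 2, 3]
outcomes = [0, 0, 0, 1, 0, 1, 0, 0, 1, 0]
# Hypothetical coefficients: positive slope, but b0 + 3 * b1 stays below 0.
print(compare_to_baseline(crt_scores, outcomes, b0=-1.0, b1=0.25))
```

Nothing here computes a p-value; the point is that whatever the p-value on the slope, a model whose predictions never cross the classification cutoff cannot outperform the base rate.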

Had he asked the question that his p-value necessarily doesn’t address (how much more consistent are the data with one hypothesis than with another?), he would actually have found that the results of his study were more consistent with the hypothesis that “cognitive reflection makes no goddam difference” in what people say when they answer a standard “belief in evolution” survey item of the sort administered by Gallup or Pew.

The question I ended on, then, was,

How much more or less probable is it that we’d observe the reported difference in believer-nonbeliever CRT scores if differences in cognitive reflection do “predict” or “explain” evolution beliefs among Univ. Ky undergrads than if they don't?

That’s a very complicated and interesting question, and so now I’ll offer my own answer, one that uses the inference-disciplining heuristic of forming a Bayesian likelihood ratio.

2 provisos:

1. Using a Bayesian likelihood ratio is not, in my view, the only device that can be used to extract from data like these the information necessary to form cogent inferences about the support of the data for study hypotheses. Anything that helps the analyst and reader gauge the relative support of the data for the study hypothesis in relation to a meaningful alternative, or set of meaningful alternatives, can do that.

Often it will be *obvious* how the data do that, given the sign of the value observed in the data or the size of it in relation to what common understanding tells one the competing hypotheses would predict.

But sometimes those pieces of information might not be so obvious, or might be open to debate. Or in any case, there could be circumstances in which extracting the necessary information is not so straightforward and in which a device like forming a Bayesian likelihood ratio in relation to the competing hypotheses helps, a lot, to figure out what the inferential import of the data is.

That's the pragmatic position I mean to be staking out here in advocating alternatives to the pernicious convention of permitting researchers to treat "p < 0.05" as evidence in support of a study hypothesis.

2. My "Bayesian likelihood ratio" answer here is almost surely wrong! 

But it is at least trying to answer the right question, and by putting it out there, maybe I can entice someone else who has a better answer to share it.

Indeed, it was exactly by enticing others into scholarly conversation that I came to see what was cool and important about this question.   Without implying that they are at all to blame for any deficiencies in this analysis, it’s one that emerged from my on-line conversations with Gordon Pennycook, who commented on my original post on this article, and my off-line ones with Kevin Smith, who shared a bunch of enlightening thoughts with me in correspondence relating to a post that I did on an interesting paper that he co-authored.

2.   What sorts of differences can the CRT reliably measure? 

Here’s the most important thing to realize: the CRT is friggin hard!

It turns out that the median score on the CRT, a three-question test, is zero when administered to the general population. I kid you not: studies w/ general population samples (not student or MTurk samples, or ones recruited from visitors to a website that offers to furnish study subjects with information on the relationship between their moral outlooks and their intellectual styles) show that 60% of the subjects can't get a single answer correct.

Hey, maybe 60% of the population falls short of the threshold capacity in conscious, effortful information processing that critical reasoning requires.  I doubt that but it's possible.

What that means, though, is that if we use the CRT in a study (as it makes a lot of sense to do; it’s a pretty amazing little scale), we necessarily can't get any information from our data on differences in cognitive reflection among a group of people comprising 60% of the population. Accordingly, if we had two groups neither of whose mean scores were appreciably above the "population mean," we'd be making fools of ourselves to think we were observing any real difference: the test just doesn't have any measurement precision or discrimination at that "low" a level of the latent disposition.

We can be even more precise about this -- and we ought to be, in order to figure out how "big" a difference in mean CRT scores would warrant saying stuff like "group x is more reflective than group y" or "differences in cognitive reflection 'predict'/'explain' membership in group x as opposed to y...."

Using item response theory, which scores the items on the basis of how likely a person with any particular level of the latent disposition (theta) is to get that particular item correct, we can assess the measurement precision of an assessment instrument at any point along theta.  We can express that measurement precision in terms of a variable "reliability coefficient," which reflects what fraction of the differences in individual test scores in that vicinity of theta is attributable to "true differences" & how much to measurement error.

Here's what we get for CRT (based on a general population sample of about 1800 people):

The highest degree of measurement precision occurs around +1 SD, or approximately "1.7" answers correct. Reliability there is 0.60, which actually is pretty mediocre; for something like the SAT, it would be pretty essential to have 0.8 along the entire continuum from -2 to +2 SD. That’s b/c there is so much at stake, both for schools that want to rank students pretty much everywhere along the continuum, and for the students they are ranking.

But I think 0.60 is "okay" if one is trying to make claims about groups in general & not rank individuals. If one gets below 0.5, though, the correlations between the latent variable & anything else will be so attenuated as to be worthless....
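The point about reliabilities below 0.5 can be made concrete with Spearman's classical attenuation formula: the correlation we expect to observe equals the true correlation times the square root of the product of the two measures' reliabilities. This sketch assumes a hypothetical true correlation of 0.5 and a perfectly measured second variable; treating reliability as one number is a classical-test-theory simplification of the variable, IRT-based reliability discussed above.

```python
import math

def attenuated_correlation(true_r, reliability_x, reliability_y=1.0):
    """Spearman's attenuation formula: expected observed correlation between two
    measures, given their true correlation and their reliabilities."""
    return true_r * math.sqrt(reliability_x * reliability_y)

# Hypothetical true correlation of 0.5 between the latent disposition and some outcome.
for rel in (0.8, 0.6, 0.5, 0.3):
    print(rel, round(attenuated_correlation(0.5, rel), 2))
```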

So here are some judgments I'd make based on this understanding of the psychometric properties of CRT:

  • If the "true" mean CRT scores of two groups (like "conservatives" & "liberals" or "evolution believers" & "disbelievers") are both within the red zone, then one has no reasonable grounds for treating the two as different in their levels of reflection: CRT just doesn't have the measurement precision to justify the claim that the higher-scoring group is "more reflective" even if the difference in means is "statistically significant."

  • Obviously, if one group's true mean is in the red zone and another's in the green or yellow, then we can be confident the two really differ in their disposition to use conscious, effortful processing.

  • Groups within the green zone probably can be compared, too. There's reasonable measurement precision there, although it's still iffy (alpha is about 0.55 on avg...).

If I want to see if groups differ in their reflectiveness, then, I should not be looking to see if the difference in their CRT scores is "significant, p < 0.05," since that by itself won't support any inferences relating to the hypotheses given my guidelines above.

If one group has a "true" mean CRT score that is in the "red" zone, the hypothesis that it is less reflective than another group can be supported with CRT results only if the latter group's "true" mean score is in the green zone.

3.  Using likelihood ratios to weigh the evidence on “whose is bigger?” 

So how can we use this information to form a decent hypothesis-testing strategy here?

Taking the "CRT makes no goddam difference" position, I'm going to guess that those who "don't believe" in evolution are pretty close to the population mean of "0.7."  If so, then those who "do believe" will need to have a “true” mean score of +0.5 SD or about "1.5 answers correct" before there is a "green to red" zone differential.

That's a difference in mean score of approximately "0.8 answers correct."

Thus, the "believers more reflective" hypothesis says we should expect to find that believers will have a mean score 0.8 points higher than the population mean, or 1.5 correct.

The “no goddam difference” hypothesis, we’ll posit, predicts the "null": no difference whatsoever in mean CRT scores of the believers & nonbelievers.

Now turning to the data, it turns out the "believers" in the author’s sample had a mean CRT of 0.86, SEM = 0.07. The "nonbelievers" had a mean CRT score of 0.64, SEM = 0.05.

I calculate the difference as 0.22, SEM = 0.08.
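As a check on that arithmetic, here is a minimal sketch assuming the two groups are independent samples, so their standard errors add in quadrature. With the rounded SEMs reported above it gives about 0.086 rather than 0.08; the small gap may come from rounding of the inputs, or from an estimation route the post doesn't spell out.

```python
import math

def difference_of_means(mean_a, se_a, mean_b, se_b):
    """Difference between two independent group means and its standard error."""
    diff = mean_a - mean_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)  # SEs add in quadrature for independent groups
    return diff, se_diff

diff, se = difference_of_means(0.86, 0.07, 0.64, 0.05)  # believers vs. nonbelievers
print(round(diff, 2), round(se, 3))  # 0.22 0.086
```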

Again, it doesn’t matter that this difference is “statistically significant” (at p < 0.01, in fact). What we want to know is the inferential import of this data for our competing hypotheses. Which one does it support more, and how much more supportive is it?

As indicated at the beginning, a really good (or Good) way to gauge the weight of the evidence in relation to competing study hypotheses is through the use of Bayesian likelihood ratios. To calculate them, we look at where the observed difference in mean CRT scores falls in the respective probability density distributions associated with the “no goddam difference” and “believers more reflective” hypotheses.

By comparing how probable it is that we’d observe such a value under each hypothesis, we get the Bayesian likelihood ratio, which is how much more consistent the data are with one hypothesis than the other:

The author’s data are thus roughly 2000 times more consistent with the “no goddam difference” prediction than with the “believers more reflective” prediction.

Roughly! Figuring out the exact size of this likelihood ratio is not important.
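The post doesn't say exactly how the two probability density distributions were specified, so this sketch assumes the simplest option: each hypothesis predicts a normal distribution centered on its predicted difference (0 and 0.8), with a common spread `sd` that is an assumption, not something taken from the post. Using the observed SE (0.08) as the spread gives a ratio in the billions; a wider spread of about 0.14 brings it down into the low thousands. Either way the data favor "no goddam difference" by orders of magnitude, which is why the exact figure doesn't matter.

```python
import math

def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution with the given mean and sd, at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2 * math.pi))

def likelihood_ratio(observed, predicted_null, predicted_alt, sd):
    """How many times more probable the observed difference is under the 'null'
    prediction than under the alternative, each modeled as a normal with spread sd."""
    log_lr = normal_logpdf(observed, predicted_null, sd) - normal_logpdf(observed, predicted_alt, sd)
    return math.exp(log_lr)

# Believers minus nonbelievers: observed 0.22. "No goddam difference" predicts 0;
# "believers more reflective" predicts 0.8. The spread sd is an assumption.
for sd in (0.08, 0.10, 0.14):
    print(sd, f"{likelihood_ratio(0.22, 0.0, 0.8, sd):.1e}")
```

Working in log densities keeps the calculation stable when one likelihood is vanishingly small.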

All that matters (all I’m using the likelihood ratio, heuristically, to show) is that we can now see that, given what we know CRT is capable of measuring among groups whose scores are so close to the population mean, the size of the observed difference in mean CRT scores is orders of magnitude more consistent with the “no goddam difference” hypothesis than with the “believers more reflective” hypothesis, notwithstanding its "statistical significance."

That’s exactly why it’s not a surprise that a predictive model based on CRT scores does no better than a model that just uses the population (or sample) frequency to predict whether any given student (regardless of his or her CRT score) believes in evolution.

Constructing a Bayesian likelihood ratio here was so much fun that I’m sure you’ll agree we should do it one more time. 

In this one, I’m going to re-analyze data from another study I recently did a post on: “Reflective liberals and intuitive conservatives: A look at the Cognitive Reflection Test and ideology,” Judgment and Decision Making, July 2015, pp. 314–331, by Deppe, Gonzalez, Neiman, Jackson Pahlke, the previously mentioned Kevin Smith & John Hibbing.

Here the authors reported data on the correlation between CRT scores and individuals identified with reference to their political preferences.  They reported that CRT scores were negatively correlated (p < 0.05) with various conservative position “subscales” in various of their convenience samples, and with a “conservative preferences overall” scale in a stratified nationally representative sample.  They held out these results as “offer[ing] clear and consistent support to the idea that liberals are more likely to be reflective compared to conservatives.”

As I pointed out in my earlier post, I thought the authors were mistaken in reporting that their data showed any meaningful correlation—much less a statistically significant one—with “conservative preferences overall” in their nationally representative sample; they got that result, I pointed out, only because they left 2/3 of the sample out of their calculation.

I did point out, too, that the reported correlations seemed way too small, in any case, to support the conclusion that “liberals” are “more reflective” than conservatives. It was Smith’s responses in correspondence that moved me to try to formulate in a more systematic way an answer to the question that a p-value, no matter how minuscule, begs: namely, just “how big” a difference between two groups’ “true” mean CRT scores has to be before one can declare one group to be “more reflective,” “analytical,” “open-minded,” etc. than the other.

Well, let’s use likelihood ratios to measure the strength of the evidence in the data in just the 1/3 of the nationally representative sample that the authors used in their paper.

Once more, I’ll assume that “conservatives” are about average in CRT—0.7. 

So again, the "liberal more reflective" hypothesis predicts we should expect to find that liberals will have a mean score 0.8 points higher than the population mean, or 1.5 correct.   That’s the minimum difference for group means on CRT necessary to register a difference for a group to be deemed more reflective than another whose scores are close to the population mean.

Again, the “no goddam difference” hypothesis predicts the "null": here, no difference whatsoever in mean CRT scores of liberals & conservatives.

By my calculation, in the subsample of the data in question, “conservatives” (individuals above the mean on the “conservative positions overall” scale) have a mean CRT of 0.55, SE = 0.08; “liberals” a mean score of 0.73, SE = 0.08.

The estimated difference (w/ rounding) in means is 0.19, SE = 0.09.

So here is the likelihood ratio assessment of the relative support of the evidence for the two hypotheses:

Again, the data are orders of magnitude more consistent with “makes no goddam difference.”

Once more, whether the likelihood ratio is 5x10^3 or 4.6x10^3 or even 9.7x10^2 or 6.3x10^4 is not important.

What matters is that there’s clearly much, much, much more reason for treating this data as supporting an inference diametrically opposed to the one drawn by the authors.

Or at least there is if I’m right about how to specify the range of possible observations we should expect to see if the “makes no goddam difference” hypothesis is true and the range of possible observations we should expect to see if the “liberals are more reflective than conservatives” hypothesis is true.
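How much rides on that specification can be seen by holding the spread fixed at the reported SE (0.09, an assumption about how the predictions should be modeled) and varying only the difference the "liberals more reflective" hypothesis is taken to predict. In this minimal sketch the ratio is exactly 1 when the alternative sits as far above the observed 0.19 as the null sits below it (0.38), favors the alternative for smaller predicted gaps, and favors the null overwhelmingly at the 0.8 gap used above.

```python
import math

def likelihood_ratio(observed, predicted_null, predicted_alt, sd):
    """Ratio of normal likelihoods with a shared spread sd.
    Values above 1 favor the null prediction; values below 1 favor the alternative."""
    # With equal spreads the normalizing constants cancel, leaving only the squared distances.
    log_lr = ((observed - predicted_alt) ** 2 - (observed - predicted_null) ** 2) / (2 * sd ** 2)
    return math.exp(log_lr)

observed, se = 0.19, 0.09  # liberal minus conservative mean CRT score, as reported in the post
for predicted_alt in (0.3, 0.38, 0.5, 0.8):
    print(predicted_alt, f"{likelihood_ratio(observed, 0.0, predicted_alt, se):.3g}")
```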

Are those specifications correct?

Maybe not!  They're just the best ones I can come up with for now! 


If someone sees a problem & better still a more satisfying solution, it would be very profitable to discuss that! 


What's not even worth discussing, though, is the idea that "rejecting the null at p<0.05" is the way to figure out if the data support the strong conclusions these papers purport to draw, because in fact that information does not support any particular inference on its own.

4.  What to make of this

The point here isn’t to suggest any distinctive defects in these papers, both of which actually report interesting data.

Again, these are just illustrations of the manifest deficiency of NHT, and in particular of the convention of treating “rejection of the null at p < 0.05” (by itself!) as license for declaring that the observed data support a hypothesis, much less that they “prove” it or even furnish “strong,” “convincing,” etc. evidence in favor of it.

And again in applying this critique to these particular papers, and in using Bayesian likelihood ratios to liberate the inferential significance locked up in the data, I’m not doing anything the least bit original!

On the contrary, I’m relying on arguments that were advanced over 50 years ago, and that have been strengthened and refined by myriad super smart people in the interim.

For sure, exposure of the “NHT fallacy” reflected admirable sophistication on the part of those who developed the critique. 

But as I hope I’ve shown in the last couple of posts, the defects in NHT that these scholars identified are really, really easy to understand. Once they’ve been pointed out, any smart middle schooler can readily grasp them!

So what the hell is going on?

I think the best explanation for the persistence of the NHT fallacy is that it is a malignant craft norm.

Treating “rejection of the null at p < 0.05” as license for asserting support of one’s hypothesis is “just the way the game works,” “the way it’s done.” Someone being initiated into the craft can plainly see that in the pages of the leading journals, and in the words and attitudes (the facial expressions, even) of the practitioners whose competence and status are vouched for by all of their NHT-based publications and by the words, attitudes, and even facial expressions of other certified members of the field.

Most of those who enter the craft will therefore understandably suppress whatever critical sensibilities might otherwise have alerted them to the fallacious nature of this convention. Indeed, if they can’t do that, they are likely to find the path to establishing themselves barred by jagged obstacles.

The way to progress freely down the path is to produce and get credit and status for work that embodies the NHT fallacy.  Once a new entrant gains acceptance that way, then he or she too acquires a stake in the vitality of the convention, one that not only reinforces his or her aversion to seriously interrogating studies that rest on the fallacy but that also motivates him or her to evince thereafter the sort of unquestioning, taken-for-granted assent that perpetuates the convention despite its indisputably fallacious character.

And in case you were wondering, this diagnosis of the malignancy of NHT as a craft norm in the social sciences is not the least bit original to me either! It was Rozeboom’s diagnosis over 50 yrs ago.

So I guess we can see it’s a slow-acting disease.  But make no mistake, it’s killing its host.


Cohen, J. The Earth is Round (p < .05). Am Psychol 49, 997 - 1003 (1994).

Edwards, W., Lindman, H. & Savage, L.J. Bayesian Statistical Inference in Psychological Research. Psych Rev 70, 193 - 242 (1963).

Frederick, S. Cognitive Reflection and Decision Making. Journal of Economic Perspectives 19, 25-42 (2005).

Gigerenzer, G. Mindless statistics. Journal of Socio-Economics 33, 587-606 (2004).

Goodman, S.N. Toward evidence-based medical statistics. 2: The Bayes factor. Annals of Internal Medicine 130, 1005-1013 (1999a).

Goodman, S.N. Towards Evidence-Based Medical Statistics. 1: The P Value Fallacy. Ann Int Med 130, 995 - 1004 (1999b).

Rozeboom, W.W. The fallacy of the null-hypothesis significance test. Psychological Bulletin 57, 416 (1960).





Is the unreal at least *sometimes* rational and the rational at least *sometimes* unreal?

From something I'm working on . . .

Identity-protective cognition and accuracy

Identity-protective cognition is a form of motivated reasoning—an unconscious tendency to conform information processing to some goal collateral to accuracy (Kunda, 1990). In the case of identity-protective cognition, that goal is protection of one’s status within an affinity group whose members share defining cultural commitments.

Sometimes (for reasons more likely to originate in misadventure than conscious design) positions on a disputed societal risk become conspicuously identified with membership in competing groups of this sort. In those circumstances, individuals can be expected to attend to information in a manner that promotes beliefs that signal their commitment to the position associated with their group (Sherman & Cohen, 2006; Kahan, 2015b).

We can sharpen understanding of identity-protective reasoning by relating this style of information processing to a nuts-and-bolts Bayesian one. Bayes’s Theorem instructs individuals to revise the strength of their current beliefs (“priors”) by a factor that reflects how much more consistent the new evidence is with that belief being true than with it being false. Conceptually, that factor—the likelihood ratio—is the weight the new information is due. Many cognitive biases (e.g., base rate neglect, which involves ignoring the information in one’s “priors”) can be understood to reflect some recurring failure in people’s capacity to assess information in this way.
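The odds form of Bayes’s Theorem makes the division of labor explicit: the prior odds carry what one believed before, and the likelihood ratio is the only thing the new evidence contributes. A minimal sketch with made-up numbers:

```python
def prob_to_odds(p):
    """Convert a probability to odds in favor."""
    return p / (1.0 - p)

def odds_to_prob(odds):
    """Convert odds in favor back to a probability."""
    return odds / (1.0 + odds)

def update_odds(prior_odds, likelihood_ratio):
    """Bayes's Theorem in odds form: posterior odds = prior odds x likelihood ratio."""
    return prior_odds * likelihood_ratio

prior_odds = prob_to_odds(0.25)                # hypothetical prior: probability 0.25, odds 1:3
posterior_odds = update_odds(prior_odds, 6.0)  # evidence 6x more consistent with "true" than "false"
print(round(odds_to_prob(posterior_odds), 2))  # 0.67
```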

That’s not quite what’s going on, though, with identity-protective cognition. The signature of this dynamic isn’t so much the failure of people to “update” their priors based on new information but rather the role that protecting their identities plays in fixing the likelihood ratio they assign to new information. In effect, when they display identity-protective reasoning, individuals unconsciously adjust the weight they assign to evidence based on its congruency with their group’s position (Kahan, 2015a).

If, e.g., they encounter a highly credentialed scientist, they will deem him an “expert” worthy of deference on a particular issue—but only if he is depicted as endorsing the factual claims on which their group’s position rests (Fig. 1) (Kahan, Jenkins-Smith, & Braman, 2011). Likewise, when shown a video of a political protest, people will report observing violence warranting the demonstrators’ arrest if the demonstrators’ cause was one their group opposes (restricting abortion rights; permitting gays and lesbians to join the military)—but not otherwise (Kahan, Hoffman, Braman, Evans, & Rachlinski, 2012).

In fact, Bayes’s Theorem doesn’t say how to determine the likelihood ratio—only what to do with the resulting factor: multiply one’s prior odds by it. But in order for Bayesian information processing to promote accurate beliefs, the criteria used to determine the weight of new information must themselves be calibrated to truth-seeking. What those criteria are might be open to dispute in some instances. But clearly, whose position the evidence supports—ours or theirs?—is never one of them.

The most persuasive demonstrations of identity-protective cognition show that individuals opportunistically alter the weight they assign one and the same piece of evidence based on experimental manipulation of the congruence of it with their identities. This design is meant to rule out the possibility that disparate priors or pre-treatment exposure to evidence is what’s blocking convergence when opposing groups evaluate the same information (Druckman, 2012).

But if this is how people assess information outside the lab, then opposing groups will never converge, much less converge on the truth, no matter how much or how compelling the evidence they receive. Or at least they won’t so long as the conventional association of positions with loyalty to opposing identity-defining groups remains part of their “objective social reality.”
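A toy simulation shows the non-convergence. Everything in it is hypothetical: the priors, the evidence stream (ten items favoring H with likelihood ratio 3, ten weaker items favoring not-H with ratio 0.5), and the simplifying assumption that an identity-protective agent treats incongruent evidence as worthless (ratio 1) rather than merely discounting it. Both kinds of agents apply Bayes’s Theorem correctly; they differ only in how they fix the likelihood ratio. Truth-calibrated agents with opposing priors both end up confident in H, while identity-protective agents polarize.

```python
import math

def posterior_prob(prior_prob, likelihood_ratios):
    """Apply a sequence of likelihood ratios (for H) to a prior probability, in log-odds form."""
    log_odds = math.log(prior_prob / (1.0 - prior_prob))
    for lr in likelihood_ratios:
        log_odds += math.log(lr)
    return 1.0 / (1.0 + math.exp(-log_odds))

def weights_assigned(evidence, group_position, identity_protective):
    """Likelihood ratios an agent assigns to each (true_lr, direction) evidence item.

    direction is +1 if the item supports H, -1 if it supports not-H. A calibrated
    agent uses true_lr. A hypothetical identity-protective agent uses true_lr only
    when the item favors its group's position (+1 = H, -1 = not-H), otherwise 1.0.
    """
    return [1.0 if identity_protective and direction != group_position else true_lr
            for true_lr, direction in evidence]

# Hypothetical evidence stream in which H is true.
evidence = [(3.0, +1)] * 10 + [(0.5, -1)] * 10
groups = {"A (position H)": (+1, 2 / 3), "B (position not-H)": (-1, 1 / 3)}

for label, (position, prior) in groups.items():
    calibrated = posterior_prob(prior, weights_assigned(evidence, position, False))
    protective = posterior_prob(prior, weights_assigned(evidence, position, True))
    print(label, round(calibrated, 3), round(protective, 3))
```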

Bounded rationality?

Frustration of truth-convergent Bayesian information processing is the thread that binds together the diverse collection of cognitive biases of the bounded-rationality paradigm. Identity-protective cognition, we’ve seen, frustrates truth-convergent Bayesian information processing. Thus, assimilation of identity-protective reasoning into the paradigm—as has occurred within both behavioral economics (e.g., Sunstein, 2006, 2007) and political science (e.g., Taber & Lodge, 2013)— seems perfectly understandable.

Understandable, but wrong!

The bounded-rationality paradigm rests on a particular conception of dual-process reasoning. This account distinguishes between an affect-driven, “heuristic” form of information processing, and a conscious, “analytical” one. Both styles—typically referred to as System 1 and System 2, respectively—contribute to successful decisionmaking. But it is the limited capacity of human beings to summon System 2 to override errant System 1 intuitions that generates the grotesque assortment of mental miscues—the “availability effect,” “hindsight bias,” the “conjunction fallacy,” “denominator neglect,” “confirmation bias”—on display in decision science’s benighted picture of human reason (Kahneman & Frederick, 2005).

It stands to reason, then, that if identity-protective cognition is properly viewed as a member of the bounded-rationality menagerie of biases, it, too, should be most pronounced among people (the great mass of the population) disposed to rely on System 1 information processing. This assumption is commonplace in the work reflecting the bounded-rationality paradigm (e.g., Lilienfeld, Ammirati, & Landfield, 2009; Westen, Blagov, Harenski, Kilts, & Hamann, 2006).

But actual data are to the contrary. Observational studies consistently find that individuals who score highest on the Cognitive Reflection Test and other reliable measures of System 2 reasoning are not less polarized but more so on facts relating to divisive political issues (e.g., Kahan et al., 2012).

Experimental data support the inference that these individuals use their distinctive analytic proficiencies to form identity-congruent assessments of evidence. When assessing quantitative data that predictably trips up those who rely on System 1 processing, individuals disposed to use System 2 are much less likely to miss information that supports their group’s position. When the evidence contravenes their group’s position, these same individuals are better able to explain it away (Kahan, Peters, Dawson, & Slovic, 2013).

Another study that fits this account addresses the tendency of partisans to form negative impressions of their opposing number (Fig. 2). In the study, subjects selectively credited or dismissed evidence of the validity of the CRT as an “open-mindedness” test depending on whether the subjects were told that individuals who held their political group’s position on climate change had scored higher or lower than those who held the opposing view. Already large among individuals of low to modest cognitive reflection, this effect was substantially more pronounced among those who scored the highest on the CRT (Kahan, 2013b).

The tragic conflict of expressive rationality

As indicated, identity-protective reasoning is routinely included in the roster of cognitive mechanisms that evince bounded rationality. But where an information-processing dynamic is consistently shown to be magnified, not constrained, by exactly the types of reasoning proficiencies that counteract the mental pratfalls associated with heuristic information processing, then one should presumably update one’s classification of that dynamic as a “cognitive bias.”

In fact, the antagonism between identity-protective cognition and perceptual accuracy is not a consequence of too little rationality but too much.

Nothing an ordinary member of the public does as consumer, as voter, or participant in public discourse will have any effect on the risk that climate change poses to her or anyone else. Same for gun control, fracking, and nuclear waste disposal: her actions just don’t matter enough to influence collective behavior or policymaking.

But given what positions on these issues signify about the sort of person she is, adopting a mistaken stance on one of these in her everyday interactions with other ordinary people could expose her to devastating consequences, both material and psychic. It is perfectly rational under these circumstances to process information in a manner that promotes formation of the beliefs on these issues that express her group allegiances, and to bring all her cognitive resources to bear in doing so.

Of course, when everyone uses their reason this way at once, collective welfare suffers. In that case, culturally diverse democratic citizens won’t converge, or converge as quickly, on the significance of valid evidence on how to manage societal risks. But that doesn’t change the social incentives that make it rational for any individual—and hence every individual—to engage information in this way.

Only some collective intervention, one that effectively dispels the conflict between the individual’s interest in forming identity-expressive risk perceptions and society’s interest in the formation of accurate ones, could change those incentives (Kahan et al., 2012; Lessig, 1995).

Rationality ≠ accuracy (necessarily)

. . . . Obviously, it isn’t possible to assess the “rationality” of any pattern of information processing unless one gets what the agent processing the information is trying to accomplish. Because forming accurate “factual perceptions” is not the only thing people use information for, a paradigm that motivates empirical researchers to appraise cognition exclusively in relation to that objective will indeed end up painting a distorted picture of human thinking.

But worse, the picture will simply be wrong. The body of science this paradigm generates will fail, in particular, to supply us with the information a pluralistic democratic society needs to manage the forces that create the conflict between the stake citizens have in using their reason to know what’s known and using it to be who they are as members of diverse cultural groups (Kahan, 2015b).


Akerlof, G. A., & Kranton, R. E. (2000). Economics and Identity. Quarterly Journal of Economics, 115(3), 715-753.

Anderson, E. (1993). Value in ethics and economics. Cambridge, Mass.: Harvard University Press.

Druckman, J. N. (2012). The Politics of Motivation. Critical Review, 24(2), 199-216.

Kahan, D. M. (2015a). Laws of cognition and the cognition of law. Cognition, 135, 56-60.

Kahan, D. M. (2015b). What is the “science of science communication”? J. Sci. Comm., 14(3), 1-12.

Kahan, D. M., Hoffman, D. A., Braman, D., Evans, D., & Rachlinski, J. J. (2012). They Saw a Protest: Cognitive Illiberalism and the Speech-Conduct Distinction. Stan. L. Rev., 64, 851-906.

Kahan, D. M., Jenkins-Smith, H., & Braman, D. (2011). Cultural Cognition of Scientific Consensus. J. Risk Res., 14, 147-174.

Kahan, D. M., Peters, E., Dawson, E., & Slovic, P. (2013). Motivated Numeracy and Enlightened Self Government. Cultural Cognition Project Working Paper No. 116.

Kahan, D. M., Peters, E., Wittlin, M., Slovic, P., Ouellette, L. L., Braman, D., & Mandel, G. (2012). The polarizing impact of science literacy and numeracy on perceived climate change risks. Nature Climate Change, 2, 732-735.

Kahneman, D., & Frederick, S. (2005). A model of heuristic judgment. The Cambridge handbook of thinking and reasoning, 267-293.

Kunda, Z. (1990). The Case for Motivated Reasoning. Psychological Bulletin, 108, 480-498.

Lessig, L. (1995). The Regulation of Social Meaning. U. Chi. L. Rev., 62, 943-1045.

Lilienfeld, S. O., Ammirati, R., & Landfield, K. (2009). Giving Debiasing Away: Can Psychological Research on Correcting Cognitive Errors Promote Human Welfare? Perspectives on Psychological Science, 4(4), 390-398.

Lodge, M., & Taber, C. S. (2013). The rationalizing voter. Cambridge; New York: Cambridge University Press.

Peirce, C. S. (1877). The Fixation of Belief. Popular Science Monthly, 12, 1-15.

Sherman, D. K., & Cohen, G. L. (2006). The Psychology of Self-defense: Self-Affirmation Theory. In Advances in Experimental Social Psychology (Vol. 38, pp. 183-242). Academic Press.

Sunstein, C. R. (2006). Misfearing: A reply. Harvard Law Review, 119(4), 1110-1125.

Sunstein, C. R. (2007). On the Divergent American Reactions to Terrorism and Climate Change. Columbia Law Review, 107, 503-557.

Westen, D., Blagov, P. S., Harenski, K., Kilts, C., & Hamann, S. (2006). Neural Bases of Motivated Reasoning: An fMRI Study of Emotional Constraints on Partisan Political Judgment in the 2004 U.S. Presidential Election. Journal of Cognitive Neuroscience, 18(11), 1947-1958.


NHT: A malignant craft norm, part 1

1. The NHT Fallacy

So, as I promised “yesterday,” here are some additional reflections on the deficiencies of “null hypothesis testing” (NHT).

Actually, my objection is to the convention of permitting researchers to treat “rejection of the null, p < 0.05” as evidence for crediting their study hypotheses.

In one fit-statistic variation or another, “p < 0.05” is the modal “reported result” in social science research.

But the idea that a p-value supports any inference from the data is an out-and-out fallacy of the rankest sort!

Because of measurement error, any value will have some finite probability of being observed whatever the “true” value of the quantity being measured happens to be. Nothing at all follows from learning that the probability of obtaining the precise value observed in a particular sample was less than 5% (or even less than 1%, or less than 0.000000001%) on the assumption that the true value is zero or any other particular quantity.

What matters is how much more or less likely the observed result is in relation to one hypothesized true value than another.  From that information, we can determine the inferential significance of the data: that is, we can determine whether the data support a particular hypothesis, and if so, how strongly.  But if we don’t have that information at our disposal and a researcher doesn’t supply it, then anything the researcher says about his or her data is literally meaningless.

This is likely to seem obvious to most of the 14 billion readers of this blog. It is--thanks to a succession of super smart people who've helped to spell out this "NHT fallacy" critique (e.g., Rozeboom 1960; Edwards, Lindman & Savage 1963; Cohen 1994; Goodman 1999, 1999; Gigerenzer 2004).

As these critics note, though, the problem with NHT is that it supplies a mechanical testing protocol that elides these basic points.  Researchers who follow the protocol can appear to be furnishing us with meaningful information even if they are not. 

Or worse, they can declare that a result that is “significant at p < 0.05” supports all manner of conclusions that it just doesn’t support—because as improbable as it might have been that the reported result would be observed if the “true” value were zero, the probability of observing such a result if the researcher’s hypothesis were true is even smaller.
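To make the point concrete, here is a minimal Python sketch under simple, hypothetical assumptions: the estimate is normally distributed with a known standard error, and the study hypothesis predicts one particular effect size. The numbers (an observed effect of 0.20, a standard error of 0.10, a hypothesized effect of 0.50) and the helper names are illustrative only, not drawn from any study discussed here.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def two_sided_p(estimate, se, null=0.0):
    """Conventional two-sided p-value for a normally distributed estimate."""
    z = abs(estimate - null) / se
    return math.erfc(z / math.sqrt(2))


def likelihood_ratio(estimate, se, h1, h0=0.0):
    """How many times more probable the estimate is if the true value is h1
    rather than h0 (values below 1 favor h0)."""
    return normal_pdf(estimate, h1, se) / normal_pdf(estimate, h0, se)


# Hypothetical inputs: observed effect 0.20 with standard error 0.10,
# and a study hypothesis that predicts a true effect of about 0.50.
est, se = 0.20, 0.10
p = two_sided_p(est, se)
lr = likelihood_ratio(est, se, h1=0.50)
print(f"p = {p:.3f}")
print(f"likelihood ratio (H1 vs. H0) = {lr:.3f}")
```

With these inputs the result is "significant" (p is about 0.046), yet the likelihood ratio is about 0.08: the observed value is roughly 12 times more probable if the true effect is zero than if it is the 0.50 the hypothesis predicts. The p-value alone can't reveal that; only the comparison of the two hypotheses can.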

2.  This straw man has legs

I know: you think I’m attacking a straw man.

I might be.  But that straw man publishes a lot of studies.  Let me show you an example.

In one recent paper--one reporting the collection of a trove of interesting data that definitely enrich scholarly discussion-- a researcher purported to test the “core hypothesis” that “analytic thinking promotes endorsement of evolution.” 

That researcher, a very good scholar, reasoned that if this was so, “endorsement of evolution” ought to be correlated with “performance on an analytic thinking task.” The task he chose was the Cognitive Reflection Test (Frederick 2005), the leading measure of the capacity and motivation of individuals to use conscious, effortful “System 2” information processing rather than intuitive, affect-driven “System 1” processing.   

After administering a survey to a sample of University of Kentucky undergraduates, the researcher reported finding the predicted correlation between the subjects' CRT scores and their responses to a survey item on beliefs in evolution (p < 0.01).  He therefore concluded: 

  • "analytic thinking consistently predicts endorsement of evolution”;
  • “individuals who are better able to analytically control their thoughts are more likely to eventually endorse evolution’s role in the diversity of life and the origin of our species";
  • “[the] results suggest that it does not take a great deal of analytic thinking to overcome creationist intuitions.”

If you are nodding your head at this point, you really shouldn’t be.  This is not nearly enough information to know whether the author’s data support any of the inferences he draws.

In fact, they demonstrably don’t.

Here is a model in which belief in science's understanding of evolution (i.e., one that doesn't posit "any supreme being guid[ing] ... [it] for the purpose of creating humans") is regressed on the CRT scores of the student-sample respondents:

The outcome variable is the probability that a student will believe in evolution. 

If, as the author concludes, “analytic thinking consistently predicts endorsement of evolution,” then we should be able to use this model to, well, predict whether subjects in the sample believe in evolution, or at least to predict that with a higher degree of accuracy than we would be able to without knowing the subjects’ CRT scores.

But we can’t.  

Yes, just as the author reported, there is a positive & significant correlation coefficient for CRT.

But look at the "Count" & "Adjusted Count" R^2s.  

The first reports the proportion of subjects whose “belief in evolution” was correctly predicted (based on whether the predicted probability for them was > or < 0.50): 62%.  

That's exactly the proportion of the sample that reports not believing in evolution.

As a result, the "adjusted count R^2" is "0.00."  This statistic reflects the proportion of correct predictions the model makes in excess of the proportion one would have made by just predicting the most frequent outcome in the sample for all the cases.

Imagine a reasonably intelligent person were offered a prize for correctly “predicting” any study respondent’s “beliefs” knowing only that a majority of the sample purported not to accept science's account of the natural history of human beings.  Obviously, she’d “predict” that any given student “disbelieves” in evolution.  This “everyone disbelieves” model would have a predictive accuracy rate of 62% were it applied to the entire sample.

Knowing each respondent's CRT score would not enable that person to predict “beliefs” in evolution with any greater accuracy than that!  The students’ CRT scores, in other words, are useless, predictively speaking.
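Both fit statistics can be computed directly from observed and predicted classifications. The sketch below is hypothetical: it uses a made-up sample of 100 respondents that merely mirrors the 62%/38% split described above, not the study's data, and it uses the usual normalization of the adjusted count R² (correct predictions beyond the modal-category count, divided by the number of cases not in the modal category).

```python
from collections import Counter


def count_r2(y_true, y_pred):
    """Share of cases whose outcome the model classifies correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def adjusted_count_r2(y_true, y_pred):
    """Correct predictions in excess of always guessing the modal outcome,
    as a share of the cases that the modal guess gets wrong."""
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    n_modal = Counter(y_true).most_common(1)[0][1]
    return (correct - n_modal) / (n - n_modal)


# Hypothetical sample of 100 mirroring the 62% / 38% split in the text:
# 0 = does not accept evolution, 1 = accepts it.
y_true = [0] * 62 + [1] * 38
# A model too weak to push anyone's predicted probability past 0.50
# classifies every respondent as a nonbeliever.
y_pred = [0] * 100

print(count_r2(y_true, y_pred))           # 0.62
print(adjusted_count_r2(y_true, y_pred))  # 0.0
```

The "everyone disbelieves" guess earns the same 62% on its own, which is why the adjusted statistic comes out at zero.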

Here's a classification table that helps us to see exactly what's happening:


The CRT predictor, despite being "positive" & "significant," is so weak that the regression model that included it just threw up its hands and defaulted to the "everyone disbelieves” strategy.
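Why would a "positive" and "significant" coefficient leave every prediction on the same side of 0.50? A minimal sketch, assuming a 3-item CRT scored 0 to 3 and an illustrative baseline probability of 0.33 of accepting evolution for someone who scores 0 (the author's actual intercept is not reported here), shows how an odds ratio of 1.23 per CRT point moves predicted probabilities only from 0.33 to roughly 0.48 across the whole scale. The function name and baseline are hypothetical.

```python
def predicted_probability(crt, p_at_zero, odds_ratio):
    """Logistic-model probability of accepting evolution at a given CRT score.

    p_at_zero is the assumed probability for someone who scores 0;
    odds_ratio is the multiplicative change in the odds per CRT point.
    """
    odds = p_at_zero / (1 - p_at_zero) * odds_ratio ** crt
    return odds / (1 + odds)


OR_CRT = 1.23      # odds ratio for CRT reported in the text
P_AT_ZERO = 0.33   # ASSUMPTION: illustrative baseline, not from the study

for score in range(4):  # ASSUMPTION: a 3-item CRT scored 0-3
    prob = predicted_probability(score, P_AT_ZERO, OR_CRT)
    label = "believer" if prob > 0.5 else "nonbeliever"
    print(score, round(prob, 3), label)
```

With a different baseline the top scorers could cross 0.50; the point is only that a modest odds ratio spread over a short scale need not move anyone across the classification cutoff.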

The reason the “significant” difference in the CRT scores of believers & nonbelievers in the sample doesn’t support the author's conclusion-- that “analytic thinking consistently predicts endorsement of evolution”--is that the size of the effect isn’t nearly as big as it would have to be to furnish actual evidence for his hypothesis (something that one can pretty well guess is the case by just looking at the raw data).

Indeed, as the analysis I’ve just done illustrates, the observed effect is actually more consistent with the prediction that “CRT makes no goddam difference” in what people say they believe about the natural history of human beings.

Why the hell (excuse my French) would we expect any other result? As I’ve pointed out 17,333,246 times, answers to this facile survey question do not reflect respondents' science comprehension; they express their cultural identity!

But that's not a very good reply.  Empirical testing is all about looking for surprises, or at least holding oneself open to the possibility of being surprised by evidence that cuts against what one understands to be the truth. 

That didn't happen, however, in this particular case.

Actually, I should point out the author constructs two separate models: one relating CRT to the probability that someone will believe in “young earth creationism” as opposed to “evolution according to a divine plan” (something akin to “intelligent design”); and another relating CRT to the probability that someone will believe in “young earth creationism” as opposed to “evolution without any divine agency” (science’s position). It seems odd to me to do that, given that the author's theory was that “analytic thinking tends to reduce belief in supernatural agents.”

So my model just looks at whether CRT scores predict that someone believes in science’s view of evolution (man evolves without any guidance from or plan by God) vs. belief in any alternative account. That’s why there is a tiny discrepancy between my logit model’s "odds ratio" coefficient for CRT (OR = 1.23, p < 0.01) and the author’s (OR = 1.28, p < 0.01).

But it doesn’t matter. The CRT scores are just as useless for predicting simply whether someone believes in “young earth” creationism versus either “intelligent design” or the modern synthesis. Thirty-three percent of the author’s Univ. Ky undergrad sample reported believing in “young earth creationism.” A model that regresses that “belief” on CRT classifies everyone in the sample as rejecting that position, and thus gets a predictive accuracy rate of 67%.

3.  What’s the question? 

So there you go: a snapshot of the pernicious vitality of the NHT fallacy in action. A researcher who has in fact collected some very interesting data announces empirical support for a bunch of conclusions that aren’t supported by them. What licenses him to do so is a “statistically significant” difference between an observed result and a value (zero difference in mean CRT scores) that turns out to be way too small to support his hypothesis.

The relevant question, inferentially speaking, is,

How much more or less probable is it that we’d observe the reported difference in believer-nonbeliever CRT scores if differences in cognitive reflection do “predict” or “explain” evolution beliefs among Univ. Ky undergrads than if they don't?

That’s a super interesting problem, the sort one actually has to use reflection to solve. It's one I hadn't thought hard enough about until engaging the author's interesting study results. I wish the author, a genuinely smart guy, had thought about it in analyzing his data.

I’ll give this problem a shot myself “tomorrow.”

For now, my point is simply that the convention of treating "p < 0.05" as evidence in support of a study hypothesis is what prevents researchers from figuring out what question they should actually be posing to their data. 


Cohen, J. The Earth is Round (p < .05). Am Psychol 49, 997 - 1003 (1994).

Edwards, W., Lindman, H. & Savage, L.J. Bayesian Statistical Inference in Psychological Research. Psych Rev 70, 193 - 242 (1963).

Frederick, S. Cognitive Reflection and Decision Making. Journal of Economic Perspectives 19, 25-42 (2005).

Gigerenzer, G. Mindless statistics. Journal of Socio-Economics 33, 587-606 (2004).

Goodman, S.N. Toward evidence-based medical statistics. 2: The Bayes factor. Annals of Internal Medicine 130, 1005-1013 (1999).

Goodman, S.N. Towards Evidence-Based Medical Statistics. 1: The P Value Fallacy. Annals of Internal Medicine 130, 995-1004 (1999).

Rozeboom, W.W. The fallacy of the null-hypothesis significance test. Psychological Bulletin 57, 416 (1960).



Resolved: System 2 should be abolished!

Big showdown between System 1 & System 2 on Thursday.  Come on down & root for your favorite team! 

My strategy is to talk super fast so that Shane Frederick doesn't have enough time to reflect on the nature of my arguments & spot the holes in them!  

For a cool David Rand discussion of "intuition," "reflection," & cooperation & related forms of beneficence, see this cool piece from the NY Times:



Weekend update: What is this "science of science communication" thing?

Get your copy before newsstands sell out!



Are people more conservative when “primed for reflection” or when “primed for intuition”? Apparently both . . . . (or CRT & identity-protective reasoning Part 2^8)

1.  The obvious reason people disagree with me is because they just can’t think clearly! Right? Right?? Well, I don’t think so, but I could be wrong

As the 14 billion readers of this blog know, I’m interested in the relationship between cognition and political outlooks. Is there a connection between critical reasoning dispositions and left-right ideology? Does higher cognitive proficiency of one sort or another counteract the tendency of people to construe empirical data in a politically biased way?

The answer to both these questions,  the data I’ve collected persuades me, is, No.

But as I explained just the other day, if one gets how empirical proof works, then one understands that any conclusion one comes to is always provisional. What one “believes” about some matter that admits of empirical inquiry is just the position one judges to be most supported by the best available evidence now at hand.

2.  New evidence that liberals are in fact “more reflective” than conservatives?

So I was excited to see the paper “Reflective liberals and intuitive conservatives: A look at the Cognitive Reflection Test and ideology,” Judgment and Decision Making, July 2015, pp. 314–331, by Deppe, Gonzalez, Neiman, Jacobs, Pahlke, Smith & Hibbing.

Deppe et al. report the results from a number of studies on critical reasoning and political ideology. The one that got my attention was one in which Deppe et al. reported that they had found “moderately sized negative correlations between CRT scores and conservative issue preferences” in a “nationally representative sample” (pp. 316, 320).

As explained 9,233 times on this blog, the CRT is the standard assessment instrument used to measure the disposition of individuals to engage in effortful, conscious “System 2” information processing as opposed to the intuitive, heuristic “System 1” sort associated with myriad cognitive biases (Frederick 2005).

It was really really important, Deppe et al. recognized, to use a stratified general population sample recruited by valid means to test the relationship between political outlooks and CRT. 

Various other studies, they noted, had relied on samples that don’t support valid inferences about the relationship between cognitive style and political outlooks. These included M Turk workers, whose scores on the CRT are unrealistically high (likely b/c they’ve been repeatedly exposed to it); who underrepresent conservatives, and thus necessarily include atypical ones; and who often turn out to be non-Americans disguising their identities (Chandler, Mueller, & Paolacci 2014; Krupnikov & Levine 2014; Shapiro, Chandler, & Mueller 2013).

Other scholars, Deppe et al. noted, have constructed samples from “visitors to a web site” on cognition and moral values who were expressly solicited to participate in studies in exchange for finding out about the relationship between the two in themselves. As a reflective colleague pointed out, this not particularly reflective sampling method is akin to polling visitors to try to figure out what the frequency of “liking football” is among different groups in the general population.

The one study Deppe et al. could find that used a valid general population sample to examine the correlation between CRT scores and right-left political outlooks was one I had done (Kahan 2013).  And mine, they noted, had found no meaningful correlation.

Deppe et al. attributed the likely difference in our results to the way in which they & I measured political orientations.  I used a composite measure that combined responses to standard, multi-point conservative-liberal ideology and party self-identification measures.  But  “self-reported ideology,” they observed, “is well-known to be a highly imperfect indicator of individual issue preferences.”

So instead they measured such preferences, soliciting their subjects’ responses to a variety of specific policies, including gay marriage, torture of terrorist subjects, government health insurance, and government price controls (a goody but oldie; “liberal” Richard Nixon was the last US President to resort to this policy).

On the basis of these responses they formed separate “Economic,” “Moral,” and “Punishment” “conservative policy-preference” scales. The “Moral” and “Punishment” scales, but not the “Economic” one, had a negative correlation with CRT, as did a respectably reliable scale (α = 0.69) that aggregated all of these positions.
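The α reported for these scales is Cronbach's alpha, a standard summary of how consistently a set of items hang together. A minimal sketch of the textbook formula, applied to a tiny, made-up set of responses (not data from either study); the function name and item values are illustrative only:

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, where each item is a list of
    every respondent's score on that item (same respondent order throughout).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent sums
    item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


# Hypothetical 1-5 responses from six people to three policy items.
items = [
    [1, 2, 4, 5, 3, 2],
    [2, 2, 5, 4, 3, 1],
    [1, 3, 4, 5, 2, 2],
]
print(round(cronbach_alpha(items), 2))
```

Items that rise and fall together push alpha toward 1; items that move independently push it toward 0.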

Having collected data from a Knowledge Networks sample “to determine if the findings” they obtained with M Turk workers “held up in a more representative sample” (p. 319), they heralded this result as  “offer[ing] clear and consistent support to the idea that liberals are more likely to be reflective compared to conservatives.”

That’s pretty interesting! 

So I decided I should for sure take the study into account in my own perpetual weighing of the evidence on how critical reasoning relates to political outlooks and comparable indicators of cultural identity.

I downloaded their data from the JDM website with the intention of looking it over and then seeing if I could replicate their findings with nationally representative datasets of my own that had liberal and conservative policy positions and CRT scores.

Well, I was in fact able to replicate the results in the Deppe et al. data. 

However, what I ended up replicating were results materially different from what Deppe et al. had  actually reported. . . .

3.  Unreported data from a failed “priming” experiment: System 2 reasoners get more conservative when primed to be “reflective” and when primed to be “intuitive”!

Deppe et al. had collected their CRT and political-position data as part of a “priming” experiment. The idea was to see if subjects’ political outlooks became more or less conservative when induced or “primed” to rely either on “reflection,” of the sort associated with System 2 reasoning, or on “intuition,” of the sort associated with System 1.

[Figure: Full results from TESS/Knowledge Networks sample (Study 2).]

They thus assigned 2/3 of their subjects randomly to distinct “reflection” and “intuition” conditions. Both were given word-unscrambling puzzles that involved dropping one of five words and using the other four to form a sentence. The sentences that a person could construct in the “reflection” condition emphasized use of reflective reasoning (e.g., “analyze the numbers carefully”; “I think all day”), while those in the “intuition” condition emphasized the use of “intuitive” reasoning (e.g., “Go with your gut”; “she used her instinct”).

The remaining 1/3 of the sample got a “neutral prime”: a puzzle that consisted of dropping and unscrambling words to form statements having nothing to do with either reflection or intuition (e.g., “the sky is blue”; “he rode the train”).

Deppe et al.’s hypothesis was that “subjects receiving an intuitive prime w[ould] report more conservative attitudes” and those “receiving a reflective prime . . . more liberal attitudes,” relative to “those receiving a ‘neutral prime.’”

Well, the experiment didn’t exactly come out as planned.  Statistical analyses, they reported  (p. 320),

show[ed] no differences in the number of correct CRT answers provided by the subjects between any group, indicating that the priming protocol manipulation . . . failed to induce any higher or lower amounts of reflection. With no differences in thinking style, again unsurprisingly, there were no statistically significant differences between the groups on self-reported ideology  or issue attitudes.

But I discovered that the results were actually way more interesting than that!

There may have been “no differences” in the CRT scores and “conservative issue preferences” of subjects assigned to different conditions, but it’s not true there were no differences in the correlation between these two variables in the various conditions: in both the “reflection” and “intuition” conditions, subjects scoring higher on the CRT adopted “significantly” more conservative policy stances than their counterparts in the “neutral priming” condition! By the same token, subjects scoring lower in CRT necessarily became more liberal in their policy stances in the "reflection" & "intuition" conditions.

Wow!  That’s really weird!

If one took the experimental effect seriously, one would have to conclude that priming individuals for “reflection” makes those who are the most capable and motivated to use System 2 reasoning (the conscious, effortful, analytic type) become more conservative--and that priming these same persons for “intuition” makes them more conservative too!
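How might one check whether the CRT-conservatism correlation really differs between two conditions? The post doesn't say which test was used; a regression with a CRT × condition interaction is a common choice. One simple alternative, sketched below on the assumption that the conditions are independent samples and the correlations are Pearson rs, is Fisher's r-to-z test. The correlations and sample sizes are hypothetical, not Deppe et al.'s.

```python
import math


def compare_correlations(r1, n1, r2, n2):
    """Fisher r-to-z test of whether two independent Pearson correlations
    differ. Returns (z statistic, two-sided p-value)."""
    z1, z2 = math.atanh(r1), math.atanh(r2)          # variance-stabilizing transform
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))      # SE of the difference
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2))             # two-sided normal p-value
    return z, p


# Hypothetical: r = 0.20 in a "priming" condition (n = 200) versus
# r = -0.15 in the "neutral" condition (n = 200).
z, p = compare_correlations(0.20, 200, -0.15, 200)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A "significant" z here only says the two correlations differ; as the post argues, whether that difference means anything is a separate question.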

4.  True result in Deppe et al.: “more representative sample” fails to “replicate” negative correlation between conservative policy positions and CRT!

Deppe et al. don’t report this result. Likely they concluded, quite reasonably, that this wacky, atheoretical outcome was just noise, and that the only thing that mattered was that the priming experiment just didn’t work (same for the ones they attempted on M Turk workers, and same for a whole bunch of “replications” of classic studies in this genre).

But here’s the rub.

The “moderately sized negative correlation[] between CRT scores and conservative issue preferences overall” that Deppe et al. report finding in their "nationally representative" sample (p. 319) was based only on subjects in the “neutral prime” condition.

As I just explained, relative to the “neutral priming” condition, there was a positive relationship "between CRT scores and conservative issue preferences overall" in both the “reflection” and “intuition priming” conditions.

If Deppe et al. had included the subjects from the latter two conditions in their analysis of the results of study 2, they wouldn’t have detected any meaningful correlation (positive or negative) “between CRT scores and conservative issue preferences overall” in their critical “more representative sample.”

It doesn’t take a ton of reflection to see why, under these circumstances, it is simply wrong to characterize the results in study 2 as furnishing “correlational evidence to support the hypothesis that higher CRT scores are associated with being liberal.”

For purposes of assessing how CRT and conservatism relate to one another, being assigned to the "neutral priming" condition was no more or less a "treatment" than being assigned to the "intuition" and "reflection" conditions. The subjects in the "neutral prime" condition did a word puzzle, just as the subjects in the other treatments did. Insofar as the experimental assignment didn't generate "differences in the number of correct CRT answers" or in "issue attitudes" between the conditions (p. 320), then either no one was treated for practical purposes or everyone was but in the same way: by being assigned to do a word puzzle that had no effect on ideology or CRT scores.

Of course, the correlations between conservative policy positions and CRT did differ between conditions. As I pointed out, Deppe et al. understandably chose not to report that their “priming” experiment had "caused" individuals high in System 2 reasoning capacity to become more conservative (and those low in System 2 reasoning correspondingly more liberal) both when “primed” for “reflection” and when “primed” for intuition. The more sensible interpretation of their weird data was that the priming manipulation had no meaningful effect on either conservativism or CRT scores.

But if one takes that very reasonable view, then it is unreasonable to treat the CRT-conservatism relationship in the “neutral priming” condition as if it alone were the “untreated” or “true” one.

If the effects of experimental assignments are viewed simply as noise (as I agree they should be!), then the correct way to assess the relationship between CRT & conservatism in study 2 is to consider the responses of subjects from all three conditions.

An alternative that would be weird but at least fully transparent would be to say that “in 2 out of 3 ‘subsamples,’ ” the “more representative sample” failed to “replicate” the negative conservative-CRT correlation observed in their M Turk samples.

But the one thing that surely isn’t justifiable is to divide the sample into 3 & then report the data from the one subsample that happens to support the authors' hypothesis -- that conservatism & CRT are negatively correlated -- while simply ignoring the contrary results in the other two.

I’m 100% sure this wasn’t Deppe et al.’s intent, but by only partially reporting the data from their "nationally representative sample" Deppe et al. have unquestionably created a misimpression.  There's just no chance any reader would ever have guessed that the data looked like this given their description of the results—and no way a reader apprised of the real results would ever agree that their "more representative sample" had "replicated" their M Turk sample finding of a “negative correlation[] between CRT scores and conservative issue preferences overall” (p. 320).

5. Replicating Deppe et al.

As I said, I was intrigued by Deppe et al.’s claim that they had found a negative correlation between conservative policy positions and CRT scores and wanted to see if I could replicate their finding in my own data set.

It turns out, though, that their study doesn’t show the negative correlation they reported once one includes the responses of the 2/3 of the subjects unjustifiably omitted from their analysis of the relationship between CRT scores and conservative policy positions.

Well, I didn’t find any such correlation either when I performed a comparable data analysis on a large (N = 1600) nationally representative CCP (YouGov) study sample from 2012—one in which subjects hadn’t been assigned to do any sort of word-unscrambling puzzle before taking the CRT.

In my sample, subjects responded to this “issues positions” battery:

The responses formed two distinct factors, one suggesting a disposition to support or oppose legalization of prostitution and legalization of marijuana, and the other a disposition to support or oppose liberal policy positions on the remaining issues except for resumption of the draft, which loaded on neither factor.

Reversing the signs of the factor scores, I suppose one could characterize these as “social” and “economic_plus” conservativism, respectively.

Both had very very small but “significant” correlations with CRT. 

[Figure: Bivariate correlations between CRT and "conservative overall" and subdomains in nationally representative CCP/YouGov sample. Z_conservrepub is a composite scale comprising liberal-conservative ideology and partisan self-identification (α = 0.82).]

But the signs were in opposing directions: Economic_plus, r = 0.06, p < 0.05; and Social, r = -0.14, p < 0.01.

Not surprisingly, then, these two canceled each other out (r = -0.01, p = 0.80) when one examined “conservative policy positions overall”—i.e., all the policy positions aggregated into a single scale (α = 0.80).

That is exactly what I found, too, when I included the 2/3 of the subjects that Deppe et al. excluded from their report of the correlation between CRT and conservative policy positions in Study 2. That is, if one takes their conservative subdomain scales as Deppe et al. formed them, there is a small negative correlation between CRT and “Punishment” conservativism (r = -0.13, p < 0.01) but a small positive one (r = 0.17, p < 0.01) between CRT and “Economic conservativism.”

There is another, even smaller negative correlation between CRT and the “Moral” conservative policy position scale (r = -0.08, p = 0.08).

[Figure: Bivariate correlations in Deppe et al. TESS/Knowledge Networks sample overall.]

Overall, these tiny correlations all wash out (“conservative issue preferences overall”: r = -0.01, p = 0.76).

That, and not any deficiency in conventional left-right ideology measures (the very measures routinely used by the “neo-authoritarian personality” scholars, e.g., Jost et al. 2003, whose work Deppe et al. cite their own study as supporting), also explains why there is zero correlation between CRT and liberal-conservative ideology and partisan self-identification.

In any event, when one  simply looks at all the data in a fair-minded way, one is left with nothing—and hence nothing that supplies anyone with any reason to revise his or her views on the relationship between political outlooks and critical reasoning capacities.
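The "washing out" in both datasets follows from simple covariance algebra. As a simplified sketch, assume the overall scale is an equal-weighted sum of two standardized subscales (the actual overall scales aggregated individual items, so exact values will differ). Then the correlation of CRT with the overall scale is (r1 + r2) / sqrt(2 + 2ρ), where r1 and r2 are the subscale correlations with CRT and ρ is the correlation between the two subscales. The function name and inputs below are illustrative.

```python
import math


def composite_correlation(r1, r2, rho):
    """Correlation between X and Y1 + Y2, where X, Y1 and Y2 are standardized,
    corr(X, Y1) = r1, corr(X, Y2) = r2 and corr(Y1, Y2) = rho.

    cov(X, Y1 + Y2) = r1 + r2 and var(Y1 + Y2) = 2 + 2 * rho.
    """
    return (r1 + r2) / math.sqrt(2 + 2 * rho)


# Hypothetical subscale correlations of equal size and opposite sign.
print(composite_correlation(0.10, -0.10, 0.3))  # 0.0
# Two subscales that each correlate 0.5 with X and are perfectly
# correlated with each other: the composite correlates 0.5 as well.
print(composite_correlation(0.50, 0.50, 1.0))   # 0.5
```

Opposite-signed subdomain correlations pull the composite toward zero, so a near-zero overall correlation can coexist with small, offsetting subdomain ones.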

6. Yucky NHT--again

One last point, again on the vices of “null hypothesis testing.”

Because they were so focused on their priming experiment non-result, I’m sure it just didn’t occur to Deppe et al. that it made no sense for them to exclude 2/3 of their sample when computing the relationship between conservativism and CRT scores in Study 2.

But here’s something I think they really should have thought a bit more about. . . . Even if the results in their study were exactly as they reported, the correlations were so trivially small that they could not, in my view, reasonably support a conclusion so strong (not to mention so clearly demeaning for 50% of the U.S. population!) as

We find a consistent pattern showing that those more likely to engage in reflection are more likely to have liberal political attitudes while those less likely to do so are more likely to have conservative attitudes....

...The results of the studies reported above offer clear and consistent support to the idea that liberals are more likely to be reflective compared to conservatives....

I’ll say more about that “tomorrow,” when I return to a theme I touched on briefly a couple of days ago: the common NHT fallacy that statistical “significance” conveys information on the weight of the evidence in relation to a study hypothesis.


Chandler, J., Mueller, P. & Paolacci, G. Nonnaïveté among Amazon Mechanical Turk workers: Consequences and solutions for behavioral researchers. Behavior Research Methods 46, 112-130 (2014).

Deppe, K.D., Gonzalez, F.J., Neiman, J.L., Jacobs, C., Pahlke, J., Smith, K.B. & Hibbing, J.R. Reflective liberals and intuitive conservatives: A look at the Cognitive Reflection Test and ideology. Judgment and Decision Making 10, 314-331 (2015).

Frederick, S. Cognitive Reflection and Decision Making. Journal of Economic Perspectives 19, 25-42 (2005).

Jost, J.T., Glaser, J., Kruglanski, A.W. & Sulloway, F.J. Political Conservatism as Motivated Social Cognition. Psychological Bulletin 129, 339-375 (2003).

Kahan, D.M. Ideology, Motivated Reasoning, and Cognitive Reflection. Judgment and Decision Making 8, 407-424 (2013).

Krupnikov, Y. & Levine, A.S. Cross-Sample Comparisons and External Validity. Journal of Experimental Political Science 1, 59-80 (2014). 

Shapiro, D.N., Chandler, J. & Mueller, P.A. Using Mechanical Turk to Study Clinical Populations. Clinical Psychological Science 1, 213-220 (2013).


"I told you -- the ball cost 5 cents!"



Weekend update: Who really did write the CRT-evolution paper then?

So Will Gervais has a very artful response to my post on his evolution-CRT paper.

The gist of it is that I mischaracterized his views -- that I was addressing some other "Will Gervais," who subscribes to positions wholly unrelated to his.

For sure people should read (a) his paper, (b) my post, & (c) his blog, so they can form their own view.

But I have to say that I find Will's eagerness to distance himself from the position I attributed to him perplexing.

Gervais (I think it was him!) wrote in Cognition:

Many supernatural beliefs come easily to people, perhaps because they are supported by a variety of core intuitive processes. As with creationism, reliably developing intuitions support the mental representation of supernatural agents, such as God. However, dual process approaches to cognition suggest that at times people are able to analytically inhibit or override their intuitions.

[P]eople who are more willing or able to engage analytic thinking might be more likely to endorse evolution than people who tend to trust their intuitions. If true, then measures of analytic thinking should predict greater endorsement of evolution. In the present paper, two large studies tested this core hypothesis.

He concludes that his data support this conjecture:

Two studies revealed that—consistent with dual process approaches to cognition in general, and supernatural cognition in particular—an analytic cognitive style predicts increased endorsement of evolution. Reliably developing intuitions may give creationist views an early cognitive advantage. This early advantage also is likely bolstered by early enculturation advantages for creationist, rather than evolutionary, concepts in many cultural contexts. However, individuals who are better able to analytically control their thoughts are more likely to eventually endorse evolution’s role in the diversity of life and the origin of our species.

Re-analyzing his data, and primarily just showing what the actual raw data look like, I argued that the results of his study didn't support his hypothesis.  That they didn't come anywhere close to supporting it.  

The impact of the disposition to rely on "analytic" as opposed to "intuitive" thinking (measured by the CRT) was "statistically significant" but practically irrelevant. Even the most "analytic" thinkers in Gervais's sample did not endorse a conception of evolution free of divine agency--i.e., did not accept science's own conception of evolution as reflected in the modern synthesis.

The "Will Gervais" who wrote the very interesting Cognition paper states "analytic thinking consistently predicts endorsement of evolution."

But it doesn't. The (very modest incremental) effect of CRT on increased endorsement of evolution was confined to relatively non-religious subjects. Among relatively religious individuals, those who displayed the highest degree of cognitive reflection weren't any more likely to endorse science's account of the natural history of human beings than ones who scored the lowest.  

That's not what we'd expect to see if in fact disbelief in evolution reflected a deficit in the capacity and motivation to engage in System 2 reasoning.

This result is consistent, however, with an alternative hypothesis. At least modestly supported by existing research, this rival position denies that cognitive reflection is antagonistic to the formation of, and persistence in, culturally identity-defining beliefs that are opposed to scientific evidence.

On the contrary, according to this theory, individuals will use all of the cognitive resources at their disposal to form and persist in beliefs that express their cultural identities on facts that come to symbolize their group allegiances. We should thus expect those most proficient in conscious, effortful, "System 2" analytic reasoning to be even more divided on issues like climate change & evolution than those inclined to rely on "intuitive" System 1 reasoning.

Gervais's data lend more support to that hypothesis than to what he describes as his own "core hypothesis": that "measures of analytic thinking should predict greater endorsement of evolution."

I'm pretty sure that's all I said in my post, so I'm confused about why Gervais thinks I was mischaracterizing him (maybe he was blogging about another "Dan Kahan"?!).

Gervais complains that the media mischaracterized his study, too. So I took a look at the very impressive volume of press coverage the Cognition study generated.

For sure the media can get things horribly wrong, particularly when a researcher is reporting on how cognitive biases can influence perceptions of disputed issues in science.

But here, I think the media got it right.  Or at least they accurately reported the finding that the "Will Gervais" who authored the article in Cognition unambiguously purported to make: "individuals who are more prone and/or able to engage in analytic thinking to override their intuitions were more likely to endorse evolution."

So I'm really curious now to know who that "Will Gervais" is. I'd also like to know what the Will Gervais who responded to me in his blog post thinks about that other Will Gervais' Cognition study; I gather he (the blog-post author Gervais) is largely in agreement with me that the Cognition study drew conclusions not supported by the data that Gervais (not sure at this point which one) uploaded to the Cognition site.

Finally and most important of all, I'd really really like to know what the Gervais who wrote the Cognition article has to say in response to the substance of the points I made.

The questions the study addressed are really interesting & important. They are also hard; he might point out that there's something I missed--or some additional insight to be gained from the data on the relative strengths of his hypothesis and mine--in which case, I'd like to know that!

I hope that Will Gervais joins the discussion, too.

(Note: I'm closing off comments here; readers should post their responses in the comment thread for my original post-- a more sensible place, I think, for discussion. By all means respond if you have thoughts!)


So we know we can't defeat entropy; but what about overplotting????

I had some correspondence off-line with loyal listener @Steve (aka @sjgenco) about the classic "what does a valid measure of climate-change risk-perceptions look like graph?" Inspired by loyal listener @FrankL (now that they've finally discovered "missing Malaysia Airlines Flight MH370"--or at least a piece of it--maybe someone will find @FrankL, or at least a piece of him, too), the WDVMCCRLLG graphic has of course achieved iconic status and is pretty much ubiquitous in popular culture.

But it is pretty darn old. Isn't it time for something new? Can't we do better?

Yes, its comforting familiarity, its association with memorable moments both personal and world-historical, will likely motivate loud howls of protest, at least initially.

But everything, no matter how wonderful, admits of incremental improvement as human knowledge continues to expand as a result of science and improved sports drink formulas.

In response to @Steve's inquiry, I revealed the secret formula for generating the graphic. When Steve said he wasn't enamored of "jitters" as a way to handle overplotting & preferred "bubbles" scaled to reflect observation densities, I directed @Steve to a CCP dataset he could use (one posted with a "codebook" the last time the CCP blog was the site for a furious display of graphic genius on the part of @thompn4) to perfect his own improvements.

Here's what he wrote back: 

Hi Dan,
I've been playing around with jitters in R. I like your Gervais jitters. Keeping the clouds more separate helps. That's harder to do when your x-var is continuous, like your libcon variable in your "challenge" dataset.
Your dataset was like catnip so I've squandered a couple of days trying to brush up on my R to see if I could implement my bubble plot idea with your data. For what it's worth, I seem to have succeeded so I thought I'd forward my results. (I use RStudio, btw, I highly recommend it.)
First, I was able to replicate your colored jitter charts in R (seems to require less code than in Stata). Here's gwrisk by libcon (making the points 50% transparent also helps highlight the clustering imho):
When I figured out how to put bubbles representing the frequency of responses around each datapoint on the same plot, it looked like this:
It does show the densities nicely, I think. For comparison, here's the bubble plot for scicomp by gwrisk:
You can really see that scicomp clusters in the middle vs. libcon, and how those densities are going to generate a flat regression.
You can also combine the two plots, which is kind of interesting:
Note how the jittering on libcon stretches out the values along the x-axis. There actually aren't any "real" values above 2 or below -2.
I've attached a PPT with all my results, a commented R script for running the plots, and the Rdata image I created for inputting the data.
It was a good excuse for digging into R again. 
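For readers who want to try their own hand, here is a minimal Python sketch of the two overplotting fixes @Steve describes. It is not his R script, which isn't reproduced here; the function names, the default jitter width, and the bubble scale factor are all hypothetical choices. Jittering adds a little uniform noise to each point so that tied responses spread into a visible cloud (which is also why jittered libcon values can spill past the real range of about -2 to 2). Bubble sizing instead counts how many respondents share each exact (x, y) pair and makes the marker area proportional to that count.

```python
import random
from collections import Counter


def jitter(values, width=0.2, seed=None):
    """Return a copy of `values` with uniform noise in [-width, width] added.

    The default width is a hypothetical choice; wider jitter separates tied
    points more but also stretches the apparent range of the variable.
    """
    rng = random.Random(seed)
    return [v + rng.uniform(-width, width) for v in values]


def bubble_sizes(xs, ys, scale=20.0):
    """Map each distinct (x, y) pair to a marker area proportional to its count."""
    counts = Counter(zip(xs, ys))
    return {point: scale * n for point, n in counts.items()}


if __name__ == "__main__":
    # Toy responses, not CCP data: x = ideology score, y = risk rating.
    xs = [1, 1, 1, 2, 2, 3]
    ys = [5, 5, 4, 5, 5, 1]
    print(bubble_sizes(xs, ys))
    # {(1, 5): 40.0, (1, 4): 20.0, (2, 5): 40.0, (3, 1): 20.0}
    print(jitter(xs, seed=42))
```

If one wanted to draw these, the areas could be passed to matplotlib's `scatter` through its `s` argument, with `alpha=0.5` for the 50% transparency @Steve mentions.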

So what do people think? Time to retire WDVMCCRLLG? Time to adopt one of @Steve's alternatives as the new symbol of the Un-United States of Risk Perception?

Voice your opinion --as with everything else relating to this blog, matters will be decided by a democratic vote of the site's 14 billion regular readers -- and by all means try your own hand at devising a graphic that conveys the information in WDVMCCRLLG in an even more compelling, cool way!

And if you want, you can go back to @thompn4's project to create the perfect 3D graphic presentation that incorporates in addition the impact of science comprehension in magnifying polarization over climate change risk.

I'd offer one of our standard CCP prizes, but obviously the fame of being the originator of the successor of WDVMCCRLLG is incentive enough!

Manny models WDVMCCRLLG high fashion

WDVMCCRLLG as backdrop for dramatic & inspired (but ultimately failed) gesture to heal the nation's wounds


Cognitive reflection and "belief in" evolution: critically engaging the evidence

1. Two hypotheses on "disbelief in evolution"

Why do 45% or so of Americans consistently say they don’t “believe” humans evolved from an earlier species?

Why do only about one-third of them say they accept a conception of evolution, science’s conception, featuring mechanisms of natural selection, random mutation, and genetic variance (the modern synthesis), as opposed to an alternative religious one that asserts a “supreme being guided the evolution of living things for the purpose of creating humans and other life in the form it exists today”?

These questions get asked constantly. Makes sense: they’re complicated, and also extremely consequential for the status of science in a liberal democratic society.

One popular answer attributes “disbelief in” evolution to a deficit in critical reasoning that interferes with people’s ability to recognize or accept scientific evidence. I’ve referred to this in other contexts as the “public irrationality thesis” (PIT) (Kahan in press).

Actually, I think PIT, while a plausible enough conjecture, is itself contrary to the weight of the scientific evidence on who believes what and why about human evolution.

It’s well established that there is no meaningful correlation between what a person says he or she “believes” about evolution and having the rudimentary understanding of natural selection, random mutation, and genetic variance necessary to pass a high school biology exam (Bishop & Anderson 1990; Shtulman 2006).

Item response profiles rock!

There is a correlation between “belief” in evolution and possession of the kinds of substantive knowledge and reasoning skills essential to science comprehension generally.

But what the correlation is depends on religiosity: a relatively nonreligious person is more likely to say he or she “believes in” evolution, but a relatively religious person less likely to do so, as their science comprehension capacity goes up (Kahan 2015).

That’s what “belief in” evolution of the sort measured in a survey item signifies: who one is, not what one knows. 

Americans don’t disagree about evolution because they have different understandings of or commitments to science.  They disagree because they subscribe to competing cultural worldviews that invest positions on evolution with identity-expressive significance. 

As with the climate change debate, the contours and depth of the divide on evolution are a testament not to defects in human rationality but to the adroit use of it by individuals to conform their “beliefs” to the ones that signal their allegiance to groups engaged in a (demeaning, illiberal, and unnecessary) form of cultural status competition.

Call this the “expressive rationality thesis” (ERT). It's what I believe—on the basis of my understanding of the best currently available evidence (Kahan 2015).

2. New evidence for PIT?

But if one gets how science works, then one knows that all one’s positions—all of one’s “beliefs”—about empirical issues are provisional.  If I encounter evidence contrary to the view I just stated, I’ll revise my beliefs on that accordingly (I’ve done it before; it doesn’t hurt!).

So I happily sat down last weekend to read Gervais, W., “Override the controversy: Analytic thinking predicts endorsement of evolution,” Cognition, 142, 312-321 (2015).

Gervais is a super smart psychologist at the University of Kentucky. He's done a number of interesting and important studies that I think are really cool, including one that shows that people engage in biased information processing to gratify their animus against atheists (Gervais, Shariff & Norenzayan 2011), and another that reports a negative association between critical reasoning and religiosity (Gervais & Norenzayan 2012).

In this latest study, Gervais correlated the scores of two samples of Univ. of Kentucky undergrads on the Cognitive Reflection Test (CRT) and their beliefs on evolution.

As discussed in 327 previous posts, the CRT is regarded as the premier measure of the capacity and disposition to use conscious, effortful, “System 2” information processing as opposed to unconscious, heuristic “System 1” processing, the sort that tends to be at the root of various cognitive miscues, from confirmation bias to the gambler's fallacy, from base rate neglect to covariance non-detection (Frederick 2005).

Gervais hypothesized that disbelief in evolution is associated with overreliance on “intuitive” or heuristic “System 1” forms of information processing as opposed to conscious or “analytic” “System 2” forms.

 “[M]any scientific concepts are difficult for people to grasp intuitively while supernatural concepts may come more easily,” he explains.

From a young age, children view things in the world as existing for a reason; they view objects as serving functions. This promiscuous teleology persists into adulthood, even among those with advanced scientific training. Further, functionally specialized features of animals (such as a zebra’s stripes or a kangaroo’s tail) are viewed as inherently characteristics of an animal’s ‘‘kind,’’ perhaps implying a deeper and more temporally stable essence of the animal. If objects in the world, including living things, are intuitively imbued with function and purpose, it seems a small step to viewing them as intentionally designed by some external agent. . ..

Given that children and adults alike share the intuition that objects in the world, including living things, serve functions and exist for purposes, they may infer intentional agency behind intuited purpose.

Finding a positive correlation between CRT and belief in evolution, he treats the results of his study as supporting the hypothesis that “analytic thinking consistently predicts endorsement of evolution.”

Because the influence of CRT persists after the inclusion of religiosity covariates, Gervais concludes that the “cultural” influence of religiosity, while not irrelevant, is “less robust” an explanation for “disbelief in” evolution than overreliance on heuristic reasoning.

In sum, Gervais is offering up what he regards as strong evidence for PIT.

3. Weighing Gervais’s evidence

So what do I think now?

I think Gervais's data are really cool and add to the stock of evidence that it makes sense to assess in connection with competing conjectures on the source of variance in belief in evolution.

But in fact, I don’t think the study results furnish any support for PIT! On the contrary, on close examination I think they more strongly support the alternative expressive rationality thesis (ERT).

a. Just look at the data. To begin, the correlation that Gervais reports between CRT scores and disbelief in evolution actually belies his conclusion.

Sure, the correlation is “statistically significant.” But that just tells us we wouldn’t expect to find an effect as big as or bigger than that if the true correlation were zero.  The question we are interested in is whether the effect is as big as PIT implies it should be.

The answer is no way!
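To make the distinction between "significant" and "big" concrete, here is a minimal Python sketch using hypothetical numbers, not Gervais's data. It computes the two-sided p-value for a Pearson correlation with Fisher's z approximation. A correlation of r = 0.10, which accounts for about 1% of the variance (r² = 0.01), comes out "significant" with 1,000 observations but not with 100: the p-value tracks sample size as much as effect size, so by itself it can't tell us whether an effect is as big as a hypothesis implies.

```python
import math


def corr_p_value(r: float, n: int) -> float:
    """Two-sided p-value for H0: rho = 0, using Fisher's z approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)  # approximately standard normal under H0
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area


if __name__ == "__main__":
    r = 0.10  # hypothetical correlation
    print(f"variance explained: {r ** 2:.2f}")
    for n in (100, 1000):
        print(n, round(corr_p_value(r, n), 4))
```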

People familiar with logistic regression would probably have an inkling of this when Gervais reports that the “odds ratio” coefficient for CRT is a mere 1.3. An odds ratio of “1” means that there is no effect—and 1.3 isn’t much different from 1.
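One way to see how little an odds ratio of 1.3 implies is to convert it back into probabilities. The minimal Python sketch below assumes, for illustration only, that the 1.3 applies to each one-point increase on the 0-3 CRT scale and that 20% of the lowest scorers accept the modern synthesis; both are hypothetical inputs, not figures from the paper. Under those assumptions, moving from the bottom to the very top of the CRT scale raises the probability from 0.20 to only about 0.35.

```python
def prob_after_odds_ratio(p0: float, odds_ratio: float, steps: int) -> float:
    """Probability after `steps` one-unit increases, given a constant odds ratio per unit."""
    odds = p0 / (1 - p0) * odds_ratio ** steps  # odds are multiplied by the OR at each step
    return odds / (1 + odds)  # convert the odds back to a probability


if __name__ == "__main__":
    baseline = 0.20  # hypothetical share endorsing evolution at CRT = 0
    for crt in range(4):  # CRT scores 0 through 3
        print(crt, round(prob_after_odds_ratio(baseline, 1.3, crt), 3))
```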

But researchers shouldn’t presuppose that readers have “inklings,” much less leave them with nothing more to go on. They should graphically display the data in a way that makes their practical effect amenable to reasoned assessment by any reflective person.

The simplest way to do that is to look at the raw data here.

Admirably, Gervais posted his data to his website.  Here’s a scatterplot that helps convey what the “OR = 1.3” finding means as a practical matter:

These scatter plots relate CRT to endorsement of the modern synthesis position as opposed to either “young earth creationism” or a “divine agency” conception of evolution in which a “supreme being guided the evolution of living things for the purpose of creating humans and other life in the form it exists today.”

I think that’s the right comparison if we are trying to assess Gervais's conjecture that overreliance on System 1 reasoning accounts for the stubbornness of “the intuition that objects in the world, including living things, serve functions and exist for purposes” reflecting “intentional agency.” But the picture is pretty much the same when we look at how CRT relates to endorsement of the proposition that “God created human beings pretty much in the present form at one time within the last 10,000 years or so.”

Sure, there’s a modest uptick in belief in evolution as CRT increases.

But even those extremely reflective "3's"--a decided majority of whom attribute the natural history of human beings to divine agency-- don't exactly look like a sample of Richard Dawkinses to me!

Gervais states that these “results suggest that it does not take a great deal of analytic thinking to overcome creationist intuitions.”

But in fact they show that, at least for the overwhelming majority of University of Kentucky undergrads, it would take an amount that far exceeds the maximum value on the CRT scale!

This just isn't the picture one would expect to see if resistance to science's account of evolution was a consequence of overreliance on heuristic or System 1 reasoning.

b. Test the alternative hypothesis!  Even more important, the data do look like what you’d expect if the expressive rationality thesis (ERT) explained “belief”/“disbelief” in evolution. 

ERT posits that individuals will use their reason to fit their beliefs to the ones that predominate in their cultural group (Kahan 2013). As explained, existing evidence is consistent with that: it shows that, as science comprehension increases, individuals who have a cultural style that features modest religiosity become more likely, but those with one that features strong religiosity less likely, to profess belief in evolution.

Mmm raw data! Always demand a helping of it when served statistically processed data

The way to test for such an effect is not to put religion into a multivariate model as a “control,” as Gervais did, but to examine whether there is an interaction between religiosity and CRT such that the effect of the latter depends on the level of the former.
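Here is a minimal Python sketch of the difference, with made-up coefficients rather than estimates from any dataset. In a logistic model with a CRT × religiosity product term, the effect of a one-point CRT increase on the log-odds of believing in evolution is b_crt + b_int × religiosity, so it can shrink to nothing for more religious respondents. Entering religiosity only as a "control" amounts to fixing b_int at zero, which forces the CRT effect to be the same at every level of religiosity.

```python
import math


def predicted_prob(crt, relig, b0, b_crt, b_rel, b_int):
    """P(believes in evolution) from a logistic model with a CRT x religiosity term."""
    logit = b0 + b_crt * crt + b_rel * relig + b_int * crt * relig
    return 1 / (1 + math.exp(-logit))


def crt_slope(relig, b_crt, b_int):
    """Change in log-odds per one-point CRT increase at a given religiosity level."""
    return b_crt + b_int * relig


if __name__ == "__main__":
    # Hypothetical coefficients; religiosity is a z-score (-1 = 1 SD below the mean).
    coefs = dict(b0=-1.0, b_crt=0.4, b_rel=-0.5, b_int=-0.4)
    for relig in (-1, 0, 1):
        probs = [round(predicted_prob(c, relig, **coefs), 2) for c in range(4)]
        print(relig, crt_slope(relig, coefs["b_crt"], coefs["b_int"]), probs)
```

With these made-up values, the CRT slope is 0.8 for low-religiosity respondents and exactly zero for high-religiosity ones, the kind of pattern the interaction term is designed to detect.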

Here’s what that interaction looks like in a regression model of "belief in evolution" for a general population sample, in which religiosity is measured with a composite scale reflecting self-reported church attendance, frequency of prayer, and importance of religion in one’s life (α = 0.80):

If we look, we can find the same interaction in Gervais’s data. 

This figure graphically displays output of a regression model that uses Study 1’s 7-point “belief in God” scale.


The modest impact of CRT in the sample as a whole is driven entirely by its effect on relatively less religious subjects.

yummy! raw data for regression model above!

Study 2 has a “belief in God” measure, too, scaled 1-100.  One-hundred point measures are a very bad idea; they aren’t going to measure variance any better than a 10-point (or probably even 7-point) one, but are going to have tons of noise in them.

The study also had a 7-point church attendance measure, so I combined these two into a scale.
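The post doesn't spell out how the two items were combined. A common approach, assumed here for illustration, is to standardize each item (since one runs 1-100 and the other has 7 points) and average the z-scores, then check internal consistency with Cronbach's alpha computed on the standardized items. A minimal Python sketch, with hypothetical function names and toy data:

```python
from statistics import mean, pstdev, pvariance


def zscores(xs):
    """Standardize a list of item responses to mean 0, SD 1."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]


def composite(*items):
    """One score per respondent: the mean of that respondent's z-scored items."""
    standardized = [zscores(item) for item in items]
    return [mean(vals) for vals in zip(*standardized)]


def cronbach_alpha(*items):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = len(items)
    totals = [sum(vals) for vals in zip(*items)]
    item_var = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))


if __name__ == "__main__":
    # Toy responses for five people, not Gervais's data.
    belief_in_god = [90, 75, 20, 55, 5]  # 1-100 scale
    attendance = [7, 5, 2, 4, 1]  # 7-point scale
    print(composite(belief_in_god, attendance))
    print(cronbach_alpha(zscores(belief_in_god), zscores(attendance)))
```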

Here’s what the raw data look like when we examine how CRT relates to acceptance of the modern synthesis position on evolution in Study 2:

Once more, it's plain to see that CRT isn't having any effect on subjects above average in religiosity. The interaction is there in the regression model, too, but because of the wobbly religiosity measure and smaller sample the model is underpowered (b = -.36, p = 0.07, for "theistic evolution" vs. "creationism"; b = -.39, p = 0.19, for "naturalistic" vs. "creationism"). (Actually, if one just uses the 100-point "belief in God" measure, there it is, "statistically significant"--for those who view p < 0.05 as having talismanic significance!)

Contrary to what Gervais concludes from his analyses, then, the evidence doesn’t in fact show a “consistent pattern whereby individuals who are more prone and/or able to engage in analytic thinking” use that capacity to “override” the “intuition that objects in the world, including living things, serve functions and exist for purposes” reflecting “intentional agency” in their creation.

We see that “pattern” consistently only in non-religious individuals.

That’s what ERT predicts: as individuals become more cognitively proficient, they become even more successful at forming and persisting in beliefs that express their identity.

I think Gervais missed this because he didn’t structure his analyses to assess the relative support of his data for the most important rival hypothesis to his own.

In fairness, Gervais does advert to some analyses in his footnotes that might have led him to believe he could rule out this view. E.g., he didn’t find an interaction between the predictors, he reports, when he regressed belief in evolution on CRT and a “religious upbringing” variable in study 1. But that's hardly surprising: that variable was dichotomous and answered affirmatively by 75% of the subjects; it doesn’t have enough variance, and hence enough statistical power, to detect a meaningful interaction.

In study 2, Gervais administered a nonstandard collection of variables he calls “CREDS,” or “credibility enhancing displays.” Unfortunately, the item wording wasn't specified in the paper, but Gervais describes them as measuring variance in “believing” in and “acting” on “supernatural beliefs.”

Gervais reports that the CREDS had only a modest correlation with disbelief in evolution, and also didn’t interact with religiosity when included as predictors of CRT. I really don’t know what to say about that, except that the discrepancy in the performance of the CREDS items, on the one hand, and the Belief in God and church attendance ones, on the other, makes me skeptical about what the former are measuring.

I think Gervais should have displayed a bit more skepticism too before he concluded that his data supported PIT.

4. Limits of Yucky NHT

One last point, this one on methods.

The problem I have with Gervais’s paper is that it relies on an analytical strategy that doesn't test the weight of the evidence in his data in relation to hypotheses of consequence. He tells us that he has found a “significant” correlation, but doesn’t show us that the effect observed supports the inference that his hypothesis depends on or rules out a contrary inference supportive of an alternative hypothesis.

These problems are intrinsic to so-called “null hypothesis testing.” Because the “null” is not usually a plausible hypothesis, and because “rejecting" it is often perfectly consistent with multiple competing hypotheses that are plausible, a testing strategy that aims only to “reject the null” will rarely give us any reason to revise our prior assessments of how the world works.

Good studies pit opposing hypotheses against each other in designs where the result, whatever it is, is highly likely to give us more reason than we had before for crediting one over the other.
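A likelihood ratio is one standard way to express that kind of comparison. The minimal Python sketch below, using hypothetical numbers and the Fisher z approximation, asks how much more probable an observed correlation is under one hypothesized effect size than under another. With r = 0.12 and n = 400, a test against zero would come out "significant" (p ≈ 0.02), yet the same result is hundreds of times more probable if the true correlation is 0.05 than if it is 0.30, so it weighs heavily against any hypothesis that predicts a large effect. The specific hypothesized values are illustrative assumptions, not ones proposed by Gervais.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and SD sigma at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def likelihood_ratio(r_obs, n, rho_1, rho_2):
    """How many times more probable r_obs is if the true correlation is rho_1 rather than rho_2.

    Uses Fisher's z: atanh(r) is roughly normal with mean atanh(rho) and SD 1/sqrt(n - 3).
    """
    se = 1 / math.sqrt(n - 3)
    z = math.atanh(r_obs)
    return normal_pdf(z, math.atanh(rho_1), se) / normal_pdf(z, math.atanh(rho_2), se)


if __name__ == "__main__":
    lr = likelihood_ratio(0.12, 400, rho_1=0.05, rho_2=0.30)
    print(f"data favor rho = 0.05 over rho = 0.30 by about {lr:.0f} to 1")
```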

Gervais is a very good psychologist, whose previous studies definitely reflect this strategy. This one, in my view, wasn’t as well designed—or at least as well analyzed—as his previous ones.

Or maybe I'm missing something, and he or someone else will helpfully tell me what that is!

But no matter what, given the balance of the evidence, I remain as convinced that Gervais is a superb scholar as I am that PIT doesn't explain conflicts over evolution, climate change, and other culturally contested science issues in the U.S.


Bishop, B.A. & Anderson, C.W. Student conceptions of natural selection and its role in evolution. Journal of Research in Science Teaching 27, 415-427 (1990).

Frederick, S. Cognitive Reflection and Decision Making. Journal of Economic Perspectives 19, 25-42 (2005).

Gervais, W.M. Override the controversy: Analytic thinking predicts endorsement of evolution. Cognition 142, 312-321 (2015).

Gervais, W.M. & Norenzayan, A. Analytic Thinking Promotes Religious Disbelief. Science 336, 493-496 (2012). doi: 10.1126/science.1215647

Gervais, W.M., Shariff, A.F. & Norenzayan, A. Do you believe in atheists? Distrust is central to anti-atheist prejudice. Journal of Personality and Social Psychology 101, 1189 (2011).

Kahan, D.M. Climate-Science Communication and the Measurement Problem. Advances in Political Psychology 36, 1-43 (2015).

Kahan, D.M. Ideology, Motivated Reasoning, and Cognitive Reflection. Judgment and Decision Making 8, 407-424 (2013).

Kahan, D.M. What is the science of science communication? J. Sci. Comm. (in press).

Shtulman, A. Qualitative differences between naïve and scientific theories of evolution. Cognitive Psychology 52, 170-194 (2006).



The science of science documentary filmmaking: the missing audience hypothesis

More on this, soon . . .

The scholarly and practical motivation behind the proposed research is to reconcile two facts about science documentary programming in American society. The first is that such programming has outstanding content. Programs like NOVA, Nature, and Frontline, among others, enable curious non-experts to participate in the thrill of discoveries attained through the most advanced forms of scientific inquiry. Second, the audience for these programs is modest and demographically distinctive. These viewers, television industry analyses consistently find, tend to be older, more affluent, and more educated than the general television audience. They are known to be less religious, and they are more likely to identify themselves as politically liberal.

Why is enjoyment of such excellent programming confined so disproportionately to this particular audience? The most straightforward explanation is that these are the only members of the public who are situated to comprehend and enjoy science documentary programming. They are the natural audience for programs like NOVA, whereas non-viewers simply are not interested in the content of science documentaries.

The professionals who produce such programs find this “natural audience” hypothesis unconvincing, and so do we. One reason to doubt the “natural audience” hypothesis is that it’s plainly not the case that appreciation of science is confined to individuals who fit the distinctive profile of typical PBS documentary viewers. Measures of attitudes such as interest in science and trust of scientists are not strongly associated with demographic variables (Gauchat, 2011) and in fact are highly positive across the entire population (National Science Board, 2014, ch. 7).

Another reason to question the “natural audience” explanation is the popularity of what might be called “reality TV” science programming. Mythbusters is a weekly show broadcast by the Discovery Channel that features the use of innovative, jury-rigged experiments to test popular lore (“would a penny dropped from the top of the Empire State Building really penetrate the skull of a person on the sidewalk?”). Consistently among the top-rated primetime cable television programs among men 25-54 years of age (Good, 2010), the show is broadly representative of a niche collection of successful shows that feature real-life characters interacting in dramatic ways with technology or nature.

It would be impossible to explain the appeal of these programs if those who watch them did not find science and environmental TV shows entertaining. The protagonists of Mythbusters are not scientists, but they are using the mode of discovering truth—controlled experimentation—that is the signature of scientific inquiry. The show would not be such a tremendous success unless there was a broad popular audience that is exhilarated to observe such methods being used to satisfy curiosity about how the world works.

National Geographic Channel (co-owned by Fox Cable Networks) also serves an audience markedly different from PBS’s. Nat Geo’s series Wild Justice, a popular program that for four seasons chronicled the activities of California game wardens patrolling the wilds of the Sierra Nevada Mountains, testifies to its viewers’ fascination with nature and to their identification with the characters’ mission of protecting wildlife.

The reality-based science/nature genre is distinct from science documentary programming, which focuses on conveying the work of, and the insights generated by, professional scientists. But when combined with evidence of the breadth of curiosity about science across diverse segments of the population, including those from which these shows draw their principal viewers, the popularity of Mythbusters and like programs suggests an alternative explanation for the more limited appeal of science documentaries. We will call it the “excluded audience” hypothesis.

At least as striking as the difference in content between the reality-based shows, on the one hand, and science documentary programs, on the other, is the feel of them. Contrasting elements of the two—including the personality of the characters they feature, the dramatic quality of the situations they depict, and the narrative modes of presentation that they use—seem to fit the distinctive cultural styles of their audiences.

“The only difference between science and screwing around,” Mythbusters host Adam Savage once explained, “is when you write it down” (MythBusters Adam Savage and Kari Byron on the Art of Science and Experimentation, 2012). This statement might well perplex one class of documentary viewers, who would cringe at the suggestion that, say, work being done to investigate conjectures on quantum gravity at the Large Hadron Collider is even remotely akin to “screwing around.”

But Savage’s statement no doubt made perfect sense to, and even thrilled, the person to whom it was made: a sixth-grade girl, whose adulatory letter asked Savage and his co-host, “what did you want to be when you grow up, and what inspired you to be scientists?” When that girl grows up, she might well be a scientist. Even if she decides to do something else, there is every likelihood that she’ll have retained the disposition to experience wonder and awe (as Savage plainly has) at how science enlarges our knowledge.

But what is most likely of all is that she will still be the kind of person who was engaged by Mythbusters. Science documentaries that don’t resonate with that person’s outlooks will thus be highly unlikely to engage her.

The “excluded audience” hypothesis holds that the failure to find an idiom that can speak to the diversity of cultural styles that characterize citizens of a pluralistic society creates a barrier between science documentaries and a class of viewers, ones whose curiosity to participate in knowing what is known to science these programs could fully satisfy. The barrier takes the form of cues that viewers unconsciously use to determine if a program is “right” for someone with their distinctive experiences, values, and social ties (Kahan, Jenkins-Smith, Tarantola, Silva & Braman 2015).

If anything approaching a “law” has been established at this point by the nascent science of science communication, it is that hostile or antagonistic cultural meanings stifle cognitive engagement (Kahan, 2010; Nisbet, 2010). A better understanding of how science documentary programming can avoid conveying such meanings would allow producers to make their shows more cognitively engaging to a larger segment of the population. The now-missing audience would then be able to experience the thrill and wonder that such programs consistently allow their current audience to enjoy.


Gauchat, G. (2011). The cultural authority of science: Public trust and acceptance of organized science. Public Understanding of Science, 20(6), 751-770. doi: 10.1177/0963662510365246.

Kahan, D. (2010). Fixing the Communications Failure. Nature, 463, 296-297.

Kahan, D. M., Jenkins-Smith, H., Tarantola, T., Silva, C., & Braman, D. (2015). Geoengineering and Climate Change Polarization: Testing a Two-Channel Model of Science Communication. Annals of the American Academy of Political and Social Science, 658, 192-222.

National Science Board. (2014). Science and Engineering Indicators 2014. Arlington, VA: National Science Foundation.

Nisbet, M. C. (2010). Framing Science: A New Paradigm of Public Engagement. In L. Kahlor & P. Stout (Eds.), Communicating Science: New Agendas in Communication (pp. 40-67). New York: Routledge.

MythBusters Adam Savage and Kari Byron on the Art of Science and Experimentation. (2012).


Perplexed--once more--by "emotions in criminal law," Part 3: Motivated reasoning & the evaluative conception

Okay, here's part 3 of the n-part series on my continuing perplexity over the criminal law's understanding of emotions.

I started off with the fundamental question: what's really going on?

This is what one asks when one has to swim through the current of dissonant idioms on emotions that flow through judicial opinions: of “highly respected” men of “good moral character,” possessing “high conceptions of the sanctity of the home and the virtue of women,” in whom the “shock” of spousal infidelity would thus naturally trigger “temporary insanity” and a resulting “loss of control” over their “mental processes”; versus the “rounders and libertines,” whose own lack of virtue would surely inure them to the same “mind-unbalancing” effect of discovering immorality on the part of others.

It's also what one asks when one encounters the sort of selectivity courts display toward impassioned offenders: excusing the "true man" who resorts to lethal violence to protect the "sacredness of his person" rather than beat a cowardly retreat when "wrongfully assailed" in a place he has "every right to be" -- b/c after all, who thinks "rationally in the presence of an uplifted knife"?—while condemning the chronically battered woman who shoots her sleeping husband, because she was motivated not by the "primal impulse" of "self-preservation" but only by her perception that the alternative was a "life of the worst kind of torture and . . . degradation . . . ."


In the last part I offered an explanation, one advanced in a 1996 article I wrote w/ Martha Nussbaum, that I called the "two conceptions thesis" or TCT.

TCT identifies two positions on what emotions are and why they matter: the "mechanistic conception," which treats emotions as unreasoning forces or impulses that acquit an actor of moral responsibility in whole or in part because of their destructive effect on volition; and the "evaluative" conception, which sees actors' emotions as moral evaluations that can in turn be evaluated in light of social norms that define who is entitled to what.

From voluntary manslaughter to duress, from self-defense to insanity--doctrines of criminal law all appear on casual inspection to reflect the "mechanistic conception."

But on reflection their legal elements create space for and thus demand the exercise of moral judgment, which decisionmakers inevitably exercise in the manner the evaluative conception envisions—by measuring the quality of the impassioned actor’s character, as revealed by his or her anger or fear or disgust.

That’s the account that in the 2011 essay “Two conceptions of two conceptions of emotion” I declared I no longer found satisfactory. The source of my doubts about it was the work I had done in the intervening time, mainly in collaboration with others, on cultural cognition, which to me suggested an alternative and likely more compelling answer to the “what is going on?” question: not conscious moral evaluation of the evaluations embodied in impassioned actors’ emotional motivations but rather the unconscious subversion of a genuine commitment to the normative theory (however cogent) that informs the rival mechanistic conception of emotion.

Below I reproduce from the 2011 essay the explanation for this shift in my understanding.

“Tomorrow” I’ll tell you why I now no longer have confidence in that view either. 

Because that’s what this whole series is about: repeatedly changing one’s mind. I don't think there's anything wrong with that; on the contrary, I think something is wrong when this doesn't happen to someone who is doing what one is supposed to do as an empiricist: using valid methods of observation, measurement and inference to incrementally enlarge the stock of evidence available to adjudicate between competing plausible explanations for a matter of genuine complexity....


So what’s wrong with TCT? Despite its considerable explanatory power, TCT still leaves one obvious mystery unresolved: why is the mechanistic conception so conspicuous in the law? If it is merely a veneer, why are the decisionmakers covering things up? Why don’t they just say, in unmistakably clear terms, that they are evaluating the moral evaluations that offenders’ (and sometimes victims’) emotions embody?

My answer is that they aren’t covering up anything. I see this response as not so much an alternative to TCT, however, as an alternative to the version I have just described. I will call this alternative the cognitive conception of TCT (or C-TCT) and distinguish it from the standard one, which I will call the moral evaluation conception (ME-TCT).

To sharpen the relevant distinctions, consider three models of the role of emotions in criminal law (Figure 1). The first contemplates that decisionmakers’ perceptions of the impact of offenders’ emotions should (and, when decisionmakers aren’t being dishonest, do) determine outcomes wholly independent of any moral evaluations of the quality of those emotions. This is the naïve mechanistic view that TCT seeks to discredit and that it aggressively critiques when articulated by conservative opponents of reforming traditional doctrines. In its place, ME-TCT asserts that outcomes in fact flow from decisionmakers’ evaluations of the moral quality of emotions independently of their perceptions of the impact of emotions on offenders’ volition. C-TCT, in contrast, accepts that decisionmakers are honestly (at least in most cases) reaching outcomes based on their view of the volitional impact of emotions. However, in assessing the intensity of emotions, they are unconsciously conforming what they see (actually, their perception of something that they can’t literally see) to outcomes that reflect culturally congenial social meanings.

One reason that I find C-TCT more compelling than ME-TCT is that I can’t bring myself to take seriously any understanding of TCT that implies decisionmakers are being systematically disingenuous when they appeal to the mechanistic conception of emotion to explain their legal determinations. The idea that they might be secretly invoking it en masse in order to conceal their commitments to politically contestable evaluative norms is preposterous; there’s no way the ever-expanding number of insiders could maintain—or even be expected uniformly to want to maintain—such a conspiracy! The idea that they are being openly disingenuous—that they are winking and grinning as they turn loose the cuckold, the homophobe, or the battered woman—also doesn’t ring true. People just aren’t that cynical; on the contrary, anyone who has taught substantive criminal law to thoughtful people will see that they are as intensely earnest as they are divided about the mental lives of cuckolds, battered women, beleaguered subway car commuters, and all the others, a point that Mark Kelman has brilliantly explored.

Even more important, though, I find myself compelled to accept C-TCT by what I’ve learned about the phenomenon of motivated reasoning during the years since I co-authored Two Conceptions of Emotion in Criminal Law. Motivated reasoning refers to a complex of unconscious cognitive processes that converge to promote formation of factual beliefs that suit some end or need extrinsic to the actual truth of those beliefs. One such end is the stake individuals have in protecting their association with and status within groups united by their commitment to shared understandings of the best life and the ideal society.

In the course of an ongoing research project that I have had the good fortune to be a part of, my collaborators and I have studied how this dynamic shapes perceptions of risk. People unconsciously search out and selectively credit information that supports beliefs that predominate in their cultural affinity groups; they turn to those who share their values, and whom they therefore trust, to certify what sorts of empirical claims they should believe; they even construe their first-hand experiences, including what they see and hear, to fit expectations that cohere with their defining group commitments. As a result, even when they agree on ends (safe streets, a clean environment, a prosperous economy), they end up culturally divided on how to secure them.

Our research group has recently begun to use these methods to explain disagreement about legally consequential facts. We’ve found, for example, that people of diverse cultural outlooks form systematically different impressions when they view videotape evidence bearing on the degree of risk associated with a high-speed police car chase or on the intent of political demonstrators to intimidate passersby.

Much like the work I did earlier on emotions in criminal law, moreover, this work is part of a multi-faceted and dynamic scholarly conversation. Our work on cultural cognition and law builds on that of social psychologists such as Mark Alicke. More recently, too, other scholars, including Janice Nadler as well as John Darley and Avani Sood, have completed important studies supporting the likely impact of motivated reasoning on perceptions of legally consequential facts.

C-TCT flows naturally out of this work. The most plausible reason that the mechanistic conception is so conspicuous in the criminal law, on this view, is that ordinary people, including the ones who become judges, juries, and legislators, believe it. They believe (not without reason, including personal experience!) that volition-constraining affect is a signature element of emotion; they also accept that the intensity of such affective responses should have moral consequence akin to what doctrines informed by the mechanistic view seem to say they should. But in assessing one or another form of evidence that bears on offenders’ emotions, culturally diverse individuals unconsciously gravitate toward perceptions that connect them to and otherwise are congenial to persons who share their defining commitments.

There are two studies, in particular, that are supportive of this conclusion. One is a study that Donald Braman and I did, in which we found that mock jurors of opposing cultural outlooks formed opposing pro-defendant or pro-prosecution fact perceptions in a self-defense case involving a battered woman who killed her sleeping husband, and then flipped positions in one involving a beleaguered subway commuter who killed an African-American panhandler. Another study, by Nadler, found that extrinsic facts bearing on the moral quality of parties’ characters influenced mock jurors’ perceptions of various facts, including intent and causation.

I certainly would not say that the verdict is in on the relative strength of C-TCT and ME-TCT. But I’m convinced the case can and should be decided by empirical proof, and that the weight of the evidence to date supports C-TCT.


Fall seminar: Law & Cognition

Will be offering this course in law school & psychology dept this fall:

Law & Cognition. The goal of this seminar will be to deepen participants' understanding of how legal decisionmakers--particularly judges and juries--think. We will compile an in-depth catalog of empirically grounded frameworks, including ones founded in behavioral economics, social psychology, and political science; relate these to historical and contemporary jurisprudential perspectives, such as "formalism," "legal realism," and the "legal process school"; and develop critical understandings of the logic and presuppositions of pertinent forms of proof--controlled experiments, observational studies, and neuroscience imaging, among others. Students will write short response papers on weekly readings.

I've taught the course before, but for sure I'll be updating the previous reading list, particularly in connection with the study of judicial decisionmaking, where there are now valid experimental alternatives to the observational studies of "judicial behavior" featured in political science.

The course is really pretty cool because it is equally valuable, in my view, for those who want to learn the "laws of cognition" (or at least the best current understandings of the mechanisms of them) & those who want to learn how cognitive dynamics shape the law. 

I advanced a theme similar to this to explain why law furnishes such a useful laboratory for studying cognitive science in Laws of Cognition and Cognition of Law, 135 Cognition 56 (2015), which is a passable preview for this course.

For sure we'll get to do fun things w/ little diagrams that relate various decisionmaking dynamics--from confirmation bias to motivated cognition, from the "story telling model" to "coherence based reasoning"-- to a straightforward Bayesian model of information processing!

I'm hoping, too, that this course can have a "virtual space," on-line counterpart.  That worked super well for last spring's Science of Science Communication seminar.

If anyone is eager to help facilitate the on-line counterpart, I'm happy to accommodate. Just send me an email!

I'll post various materials as they become available. But for now here is some more "course information":

General Information & Course Outline

A.  Nature of the Seminar

The focus of this seminar will be a set of interrelated frameworks for studying how legal decisionmakers think. These frameworks use concepts and methods from a variety of disciplines, including social psychology, behavioral economics, and political science. What unites—but also divides—them is their ambition to generate empirically grounded accounts of the various cognitive elements of legal decisionmaking: from values and motivations to perceptions and reasoning processes.

For our purposes, “legal decisionmakers” will mean mainly judges and jurors. Our aim will be to assess the contribution that the various frameworks make to explaining, predicting, and identifying means for improving the judgments of these actors. Because we will be interested in how the cognitive tendencies of these two groups of decisionmakers diverge, moreover, we will also afford some consideration to the professional(ized) habits of mind of lawyers more generally.

There are a number of things that we will not be examining in great detail. We will not be trying to identify how the study of cognition can be used to enhance the regulatory efficacy of the law, for example. Nor will we be examining the contribution that the study of cognition might make to improving the law’s use of forensic science. We will, of course, form some insights on these matters, for it is impossible to evaluate the cognitive functioning of legal decisionmakers without reference to its impact on the effectiveness of law and the accuracy of adjudication. But the limited duration of the seminar will prevent us from systematically assessing the relevance of the frameworks to these objectives—in large part because doing so adequately would require consideration of so many phenomena in addition to how legal decisionmakers think.

The seminar will also have a secondary objective: to form a working familiarity with the empirical methods featured in the study of cognition. We will not be designing studies or performing statistical analyses.  But we will be devoting time and attention to acquiring the conceptual knowledge necessary to make independent critical appraisals of the empirical work we will be examining. 


Vaccine hesitancy, acupuncture mania, and the methodological challenge of making sense of "boutique risk-benefit perceptions" (BRBPs)

A thoughtful correspondent drew my attention to evidence of the persistence of enthusiasm for acupuncture despite evidence that it doesn’t have any actual benefit.

He was struck by the contrast with the mirror-image resistance to evidence that the benefits of childhood vaccines far outweigh their risks.

What sorts of cultural outlook might there be, he wondered, that predisposes some people to believe that sticking needles into their bodies promotes health and others that doing so will compromise it?! 

Maybe it’s a continuum with vaccine-hesitant people at one end and acupuncture devotees on the other?

Tongue-in-cheek on his part, but there’s an important point here about the role of fine-grained local influences on risk perception.  

My response:

Uh, no. The study finds that a group of exemption-seekers with those characteristics is *atypical* of people seeking exemptions generally. I am willing to bet that belief in the benefits of acupuncture will defy explanation by the sort of correlational, risk-predisposition profiling methods of which cultural cognition is an example.

Indeed, your comment actually highlights a research blind spot in the project to identify risk-perception propensities and to anticipate them through effective science communication.

The counterproductive media din to the contrary notwithstanding, vaccine hesitancy defies explanation by the sorts of cultural & like profiles that are so helpful in charting conflict over various other risks.

Ditto with GM food risks.

Same w/, oh, concern about pasteurized milk (and belief in the benefits of raw milk); fear of cell phone radiation; anxiety about drones; fluoridation of water; etc.

There's some small segment of the US general population that believes in the effectiveness of acupuncture and its advantages over conventional medical treatments, which presumably those same people view as nonbeneficial or overly risky. I bet their views are unshared by the vast majority of people who share their cultural commitments generally.

Let's call these outlier views "boutique risk-benefit perceptions" -- BRBPs.

But let’s agree with "fearless Dave" Ropeik’s consistent point that it is not satisfying to shrug off BRBPs as disconnected from any social context, as lacking any genuine social meaning, or as simply random patterns of risk perception, unamenable to systematic explanation ...

I think the problem in accounting for BRBPs has two related causes:

First, the sorts of characteristics that matter in BRBPs might be ones that are featured in schemes like cultural cognition, but they always depend in addition on some local variable, one that makes those characteristics matter only in particular places, & indeed could make different sets of characteristics have different valences across space.

Second, the large-sample correlational studies that are used to examine such relationships in standard risk-profiling studies are unsuited for identifying the relevant indicators of BRBPs because the local variable will resist being operationalized in such a study, and when it's omitted the remaining cultural characteristics will always lack any systematic relationship to the risk perception in question.

For an example of a closely related research problem where this dynamic is present and researchers just don't seem to get its significance, consider studies that purport to corroborate the trope that "rich, white, liberal, suburbanite parents" are anti-vax militants.

The most recent highly publicized study (or most recent highly publicized one I noticed) that purported to support this conclusion used a form of analysis that identifies "clusters" of school districts in which parents requested personal-belief exemptions in Calif.  

The clusters, as hypothesized, were in particular highly affluent, white, suburban school districts in Marin County (Bay Area) and in certain demographically comparable suburban school districts in the vicinity of LA.

Taking the cue from the authors' own characterization of their results, the media widely reported the study as confirming that “[t]he parents most likely to opt out of vaccines” are “typically white and well-to-do" etc.

One doctor, who has no training in or familiarity with the empirical study of risk perception and science communication, & who apparently has no familiarity with the empirical methods used in this particular study either, excitedly proclaimed that "[w]hile the study looked only at California, ... similar patterns of demographics on parents would show up in other states as well."

Well, if so, then the conclusion will be that personal-exemption rates are not correlated with being "affluent, white, and suburban."  

In a state-wide regression analysis, this same study showed that suburban schools (which are affluent and mainly white in California) had substantially lower personal-exemption rates.

There's no contradiction or even paradox here.

"Cluster" analysis is a statistical technique designed, in effect, to find outliers: concentrated patterns of results that defy the sort of distribution one would expect in a statistical model in which one variable or set of variables is treated as the "cause" of another generally.  

If one can find such a cluster (i.e., one that can't be explained by a simple linear model that includes appropriate predictors), and can confidently rule out its appearance by chance, then necessarily one can infer that there is some other unobserved influence at work that is causing this unexpected concentration of whatever one is observing.

Generally speaking, cluster analysis isn't designed to identify causes of diseases or other like conditions. It is a form of analysis that tells you that there's some anomaly in need of explanation, almost certainly by other forms of empirical methods.

Strangely, the authors of the study apparently didn't get this.

They noted, with evident surprise, that "[s]uburban location had a negative relationship with PBEs [personal belief exemptions], opposite of what was anticipated given the maps of cluster assignments" -- & trotted out a series of post hoc explanations for this supposed anomaly.

But there was no anomaly to explain.  

If there are genuinely high-personal-exemption-rate clusters in certain white, affluent, suburban schools, that implies that there isn't an association between those characteristics and high personal-exemption rates generally--indeed, that there is more likely a negative association between them (if the association weren't negative outside the clusters, the high concentration in the clusters would be more likely to generate a positive linear correlation overall, albeit a weak one).

Thus, the researchers, if it made sense for them to resort to spatial cluster analysis in the first place, should have anticipated the finding that "affluent, white, and suburban" school districts don’t have high personal-exemption rates generally.
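To make the parenthetical point about clusters and the overall correlation concrete, here is a toy Python sketch. Everything in it is hypothetical: the district counts and exemption rates are invented for illustration and are not the California study's data, and the helper names (`make_districts`, `suburban_slope`) are made up for this sketch. It regresses a district's exemption rate on a suburban indicator. A small cluster of very-high-exemption suburban districts coexists with a negative overall slope when the suburban districts outside the cluster have lower rates than non-suburban ones; raise that background rate to the non-suburban level and the same cluster flips the slope positive.

```python
from statistics import mean


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (one predictor plus an intercept)."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def make_districts(suburban_background_rate):
    """Hypothetical (is_suburban, exemption_rate_percent) pairs; all numbers are invented."""
    districts = [(0, 4.0)] * 60                          # non-suburban districts
    districts += [(1, suburban_background_rate)] * 36   # suburban, outside any cluster
    districts += [(1, 12.0)] * 4                         # small high-exemption suburban cluster
    return districts


def suburban_slope(suburban_background_rate):
    """Regress exemption rate on the suburban indicator for the toy districts."""
    districts = make_districts(suburban_background_rate)
    xs = [float(is_suburban) for is_suburban, _ in districts]
    ys = [rate for _, rate in districts]
    return ols_slope(xs, ys)


if __name__ == "__main__":
    # Cluster present, but suburban districts outside it sit below the non-suburban rate.
    print(f"{suburban_slope(2.0):.2f}")  # -1.00
    # Same cluster, but the suburban background rate equals the non-suburban rate.
    print(f"{suburban_slope(4.0):.2f}")  # 0.80
```

With a single binary predictor, the least-squares slope is just the difference between the suburban and non-suburban mean rates, which is why its sign is driven by what happens outside the cluster.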

Instead of announcing that their results had corroborated a common but incorrect stereotype, they should have recognized and advised readers that their study shows that in fact the influence that accounts for higher personal exemption rates in these schools is not that they are “affluent, white, and suburban” -- and is necessarily still unaccounted for!

They should also have called attention to the surplus of personal-exemption rate requests in school districts that are non-suburban-- in fact, among students in charter schools, whose attendees are more likely to be poor and minority.

I don't know why there would be higher exemption rates in students attending those schools. I seriously doubt that parents of these children are teeming with anti-vax sentiment. More likely, there’s a hole in the universal-vaccination net that should be identified and repaired here.

But the point is, researchers (at least those looking for the truth and not for the attention they can get for confirming a congenial misconception) aren't going to find out what influences, cultural or otherwise, explain vaccine hesitancy or ambivalence using general-population correlational studies. The influences are too local, too fine-grained, to be picked up by such means.

Indeed, the "cluster" analysis methodology used in this and other studies is proof that something else-- something still not observed  -- is causing such behavior in these areas.  

It's something that necessarily evades the sorts of profiles one can identify using the sorts of attitudes and characteristics one can measure with a general-population survey.  

That's exactly what sets BRBPs apart from other types of risk perceptions.

BRBPs fall into a blind spot in the study of risk perception and science of science communication.  

We need valid empirical methods to remedy that. 


*Now* what do alternative sanctions mean? And how'd I miss the memo?

There were pretty much three things that I found very mysterious about the disconnect between empirical evidence and public policy when I started as an academic in the late 1950s or whenever it was, and the main one was the excessive reliance on imprisonment in the U.S.

I've reproduced the first few paragraphs of what was one of my first published articles (Kahan 1996) (the other was on how the latest developments in cold fusion were likely to radically alter constitutional interpretation; could still happen!). But basically the idea was that the argument for so-called "alternative sanctions" was a loser b/c it ignored the phenomenon of social meaning.

The case for reducing or eliminating imprisonment for a host of non-violent offenders, ones who didn't need to be incapacitated for public safety, was largely focused on costs and benefits: Tossing people in jail is expensive for society, not to mention degrading and debilitating for offenders, and doesn't deter those forms of criminality any more effectively ("empirical evidence demonstrated") than fines and community service.

The reason this argument, which had proponents across the ideological spectrum, persistently failed to gain traction, I maintained, was that it disregarded the societal expectation that punishment convey an official attitude of disapprobation, and indeed visit, symbolically, on offenders a kind of lowering in status commensurate with the severity of their own disregard for the value of the goods their actions had transgressed.  Decades' worth of experience, I concluded, showed things wouldn't get better until the stock of alternatives was enriched with punishments that not only regulated behavior more efficiently than imprisonment but expressed condemnation as effectively.  I proposed shaming punishments as a candidate.

Well, something seems to have changed. Very dramatically so. 

It's not just that there is "bipartisan support" for reducing incarceration -- at various times there had been that, too, in the past.

But the actual carrying through on these policies seems now to be largely a matter of indifference to the public.  The mood hasn't so much changed as just evaporated. 

Who cares? (Hey, did you hear about that lion in Milwaukee?!)

And what's more, I have no idea how this transformation took place. 

I don't think the explanation is that those making the argument for "alternative sanctions" just stuck to it, refining and improving and amplifying their arguments until finally everyone "got it."

I think the arguments that are being credited now were just as available 10, 20, or 30 years ago (the process that led to the dominance of incarceration as a mode of punishment started in the 1970s and really got locked in by the mid-80s).

What changed was the unacceptable meaning of the alternatives.

Or even more accurately, I think, what changed was the intensity with which the demand for what imprisonment conveys-- the distinctive gesture of condemnation associated with liberty deprivation -- just sort of withered and was forgotten about.... Take away that motivation to resist it, and the case that has always been so compelling actually starts to compel.

But like I said, I have no idea why this happened, and barely any idea when the significance of imprisonment changed.

I just averted my eyes, or widened my perspective to try to make sense of other examples of public policy disputes where the question of what laws do seemed subordinate, not just morally but cognitively, to what laws say, particularly about the social status of competing groups—and “poof,” the “alternative sanctions” debate was gone. . . .

Unless, of course, it isn’t!

BTW, the second place where this same dynamic loomed large and fascinated me when I started “working” as an academic was the debate over capital punishment.  The primacy of “symbolic” motivations (morally, cognitively) over instrumental, deterrence considerations was widely understood to explain the persistence of capital punishment in the U.S. (Kahan 1999; Ellsworth & Gross 1994; Ellsworth & Ross 1983; Stolz 1983; Tyler & Weber 1982).

It was assumed, too, that the intensity and durability of those expressive sensibilities meant the death penalty, like the overreliance on imprisonment in the U.S., was not going to go away.

Well, guess what? That’s changed too, and again for reasons that I don’t feel confident I can identify. I do feel confident that the “obvious” reasons (cost, conviction of the innocent, etc.) are not the reasons; those arguments were always available and likely even more compelling at an earlier time! The strength of the arguments didn’t change; the strength of the motivation to resist did, because, as with imprisonment, the demand for the meanings that capital punishment expresses abated.

Likely these developments are related. Capital punishment and “get tough on crime” were big issues—really, really big!—in every presidential election between 1968 and 1988.  And then the whole thing just went away. . . .


The last issue of the three that had this quality when I started: gun control.  Good to see that some things never change.

But even better that many things do--in ways that furnish assurance that there will never be any shortage of mysteries to investigate.

What Do Alternative Sanctions Mean?

Dan M. Kahan


Imprisonment is the punishment of choice in American jurisdictions. In everyday life, the modes of human suffering are numerous and diverse: when we lose our property, we experience need; when we are denounced by those whose opinions we respect, we feel shame; when our bodies are tormented, we suffer physical pain. But for those who commit serious criminal offenses, the law strongly prefers one form of suffering, the deprivation of liberty, to the near exclusion of all others. Some alternatives to imprisonment, such as corporal punishment, are barely conceivable. Others, including fines and community service, do exist but are used sparingly and with great reluctance.

The singularity of American criminal punishments has been widely lamented. Imprisonment is harsh and degrading for offenders and extraordinarily expensive for society. Nor is there any evidence that imprisonment is more effective than its rivals in deterring various crimes. For these reasons, theorists of widely divergent orientations (from economics-minded conservatives to reform-minded civil libertarians) are united in their support for alternative sanctions.

The problem is that there is no political constituency for such reform. If anything, the public's commitment to imprisonment has intensified in step with the theorists' disaffection with it. In the last decade, prison sentences have been both dramatically lengthened for many offenses and extended to others that have traditionally been punished only with fines and probation.

What accounts for the resistance to alternative sanctions? The conventional answer is a failure of democratic politics. Members of the public are ignorant of the availability and feasibility of alternative sanctions; as a result, they are easy prey for self-interested politicians, who exploit their fear of crime by advocating more severe prison sentences. The only possible solution, on this analysis, is a relentless effort to educate the public on the virtues of the prison's rivals.

I want to advance a different explanation. The political unacceptability of alternative sanctions, I will argue, reflects their inadequacy along the expressive dimension of punishment. The public rejects the alternatives not because they perceive that these punishments won't work or aren't severe enough, but because they fail to express condemnation as dramatically and unequivocally as imprisonment.

This claim challenges the central theoretical premise of the case for alternative sanctions: that all forms of punishment are interchangeable along the dimension of severity or "bite." The purpose of imprisonment, on this account, is to make offenders suffer. The threat of such discomfort is intended to deter criminality, and the imposition of it to afford a criminal his just deserts. But liberty deprivation, the critics point out, is not the only way to make criminals uncomfortable. On this account, it should be possible to translate any particular term of imprisonment into an alternative sanction that imposes an equal amount of suffering. The alternatives, moreover, should be preferred whenever they can feasibly be imposed and whenever they cost less than the equivalent term of imprisonment.

This account is defective because it ignores what different forms of affliction mean. Punishment is not just a way to make offenders suffer; it is a special social convention that signifies moral condemnation. Not all modes of imposing suffering express condemnation or express it in the same way. The message of condemnation is very clear when society deprives an offender of his liberty. But when it merely fines him for the same act, the message is likely to be different: you may do what you have done, but you must pay for the privilege. Because community service penalties involve activities that conventionally entitle people to respect and admiration, they also fail to express condemnation in an unambiguous way. This mismatch between the suffering that a sanction imposes and the meaning that it has for society is what makes alternative sanctions politically unacceptable.

The importance of the expressive dimension of punishment should be evident. It reveals, for one thing, that punishment reformers face certain objective constraints. The social norms that determine what different forms of suffering mean cannot be simply dismissed as the product of ignorance or bias; rather, they reflect deeply rooted public understandings that mere exhortation is unlikely to change. But there are also more hopeful implications. If we can understand the expressive dimension of punishment, we should be able to perceive not only what kinds of punishment reforms won't work but also which ones will. Careful attention to social norms might allow us to translate alternative sanctions into a punitive vocabulary that makes them a meaningful substitute for imprisonment.


Ellsworth, P.C. & Ross, L. Public Opinion and Capital Punishment: A Close Examination of the Views of Abolitionists and Retentionists. Crime & Delinquency 29, 116-169 (1983).

Ellsworth, P.C. & Gross, S.R. Hardening of the Attitudes: Americans’ Views on the Death Penalty. J. Soc. Issues 50, 19 (1994).

Kahan, D.M. The Secret Ambition of Deterrence. Harv. L. Rev. 113, 413 (1999).

Stolz, B.A. Congress and Capital Punishment: An Exercise in Symbolic Politics. L. & Pol. Q. 5, 157-180 (1983).

Tyler, T.R. & Weber, R. Support for the Death Penalty: Instrumental Response to Crime, or Symbolic Attitude. L. & Soc. Rev. 17, 21-45 (1982).



Cognitive dualism as an adaptive resource in a polluted science communication environment ... a fragment

from something I'm working on. . . .

I. Overview: the “entanglement” problem

By no means the only threat to the science communication environment, the “entanglement problem” is nonetheless a recurring and especially damaging one. It occurs when positions on issues that admit of scientific investigation become suffused with antagonistic cultural meanings, transforming them into badges of membership in and loyalty to competing groups. At that point, to protect the standing of their groups and their status within them, individuals can be expected to conform their assessment of all manner of information to the position that predominates among those who share their defining commitments.

It’s almost certainly a mistake to attribute this form of identity-protective cognition (Kahan 2010) to the constraints on rationality responsible for “base rate neglect,” “the availability effect,” “confirmation bias” and like reasoning errors (Kahneman, Slovic & Tversky 1982). For one thing, unlike those biases, identity-protective cognition does not originate in overreliance on heuristic (“System 1”) information processing. On the contrary, the forms of conscious, effortful (“System 2”) information processing most essential to recognizing and giving proper effect to scientific evidence, including cognitive reflection, numeracy, and science comprehension, amplify the tendency of individuals to form and persist in identity-protective beliefs (Kahan 2013; Kahan, Peters et al. 2013; Kahan, Peters et al. 2012). . . .

This problem—the entanglement problem—is not a consequence of stupid people but of a polluted science communication environment ("stupid!") (Kahan 2012). The antagonistic cultural meanings that transform positions on scientific issues into badges of cultural identity are a toxin that disables the normally reliable reasoning faculties that people use to align themselves with what’s known by science.

Protecting the science communication environment from this sort of contamination is a central mission of the science of science communication (Kahan in press). . . .

II.  Entanglement and science communication environment protection

. . . . Once some scientific issue has become entangled in antagonistic cultural meanings, the process of detoxification is likely to be a slow one. In the interval it takes to quiet the dynamics that excite culturally polarizing forms of identity-protective cognition, society will stand in need of techniques for counteracting the debilitating impact of such a condition on its citizens’ capacity to reason (Hall Jamieson & Hardy 2014). . . .

B.  Cognitive dualism

Observed in both religious students of science and in religious science-trained professionals, cognitive dualism involves the capacity of individuals to maintain apparently contradictory beliefs about some fact—such as the natural history of human beings—that admits of scientific investigation.

Cognitive dualism challenges the premise, however, that such beliefs are genuinely contradictory. According to this position, a “belief” cannot, as a psychological matter, be defined solely by the propositions it embodies.

As mental objects, “beliefs” exist only within clusters or ensembles of mental states (including emotions, desires, and moral evaluations) distinctly suited for the performance of some action (Peirce 1877; Braithwaite 1933, 1946; Hetherington 2011). A highly religious doctor, for example, might explain that whether he “believes” in evolution depends on where he is: at “work,” where he uses knowledge of human evolution in his practice as an oncologist or as a medical researcher; or at “home,” where belief that humans were divinely created guides his behavior as a member of a particular religious community (Everhart & Hameed 2013). Because those opposing stances on the natural history of human beings exist only within the mental routines that enable him to do those activities, and because those activities do not contradict one another, the idea that the doctor harbors self-contradictory "beliefs" imposes a psychologically false criterion of identity on the constituents of his mind.

A similar account exists for religious science students who “don’t believe” in evolution. Research shows that it is possible to teach the modern synthesis to students who say they “don’t believe” in evolution just as readily as to students who say they “do believe” in it. Afterwards, however, the former still profess not to “believe in” or accept evolution (Lawson & Worsnop 1992), a result that researchers typically understand to signify a limitation in the success of instruction for “nonbelieving” students.

Cognitive dualism, however, suggests that it is a mistake to infer that there is in fact any meaningful difference in the impact of the instruction on “believing” and “nonbelieving” students. If, as cognitive dualism supposes, beliefs as mental objects are “dispositions to action,” the science class has in fact generated the same belief in both: the sort that is linked to demonstrating the sort of knowledge of the modern synthesis certified by a high school biology exam (DiSessa 1982).

Such instruction has also left completely unaffected, in both, a distinct state of “belief” that exists for purposes of being a particular sort of person. The “disbelief in” evolution that the religious student has retained obviously performs that function. But so did the “belief in” evolution the nonreligious student held before he learned the modern synthesis. “Believing in” evolution at that point enabled him to inhabit a particular cultural style notwithstanding that he almost certainly subscribed to the naive Lamarckian view of how it works that the vast majority of people (believers and nonbelievers alike) entertain (Bishop & Anderson 1990; Shtulman 2006). What is more, he will almost certainly retain that identity-enabling “belief in” evolution even if (as is again highly likely) he thereafter completely forgets the rudiments of the modern synthesis. Should the religious student, in contrast, grow up, say, to be a doctor, she is likely to remember what she learned about the modern synthesis and to use it when doing anything that requires that knowledge, even as she continues to “disbelieve in” evolution in her life as a person who finds meaning in holding a particular faith (Everhart & Hameed 2013; cf. Hermann 2012).

The course, in sum, imparted in both the “believer” and “nonbeliever” the sort of knowledge supportive of doing the things that one can do effectively only by accepting science’s understanding of the natural history of human beings (take exams, carry out responsibilities as a science-trained professional).  But it left unaffected, in both, a state of “belief” that enables something completely orthogonal to what science actually knows: being a person who finds meaning in the world through the exercise of free reason in collaboration with others exercising the same.

Cognitive dualism supplies an adaptive resource in a polluted science communication environment.  Where a person experiences as distinct opposing states of belief embedded in discrete and fully compatible clusters of action-enabling intentional states, she is freed from having to choose between being who she is and knowing what’s known by science. Understanding how to accommodate cognitive dualism, and to repel conditions that in fact can be shown to subvert it (Hameed 2015), is thus a form of scientific understanding integral to promoting the effective transmission of scientific knowledge—in classrooms, in businesses, in public meeting halls, and anywhere else—during the periods in which one or another scientific proposition has become enmeshed in antagonistic cultural meanings.


Bishop, B.A. & Anderson, C.W. Student conceptions of natural selection and its role in evolution. Journal of Research in Science Teaching 27, 415-427 (1990).

Braithwaite, R.B. The Inaugural Address: Belief and Action. Proceedings of the Aristotelian Society, Supplementary Volumes 20, 1-19 (1946).

Braithwaite, R.B. The nature of believing. Proceedings of the Aristotelian Society 33, 129-146 (1932).

DiSessa, A.A. Unlearning Aristotelian Physics: A Study of Knowledge-Based Learning. Cognitive Science 6, 37-75 (1982).

Everhart, D. & Hameed, S. Muslims and evolution: a study of Pakistani physicians in the United States. Evo Edu Outreach 6, 1-8 (2013).

Hall Jamieson, K. & Hardy, B.W. Leveraging scientific credibility about Arctic sea ice trends in a polarized political environment. Proceedings of the National Academy of Sciences 111, 13598-13605 (2014).

Hameed, S. Making sense of Islamic creationism in Europe. Public Understanding of Science 24, 388-399 (2015).

Hermann, R.S. Cognitive apartheid: On the manner in which high school students understand evolution without Believing in evolution. Evo Edu Outreach 5, 619-628 (2012).

Hetherington, S.C. How to Know: A Practicalist Conception of Knowledge (J. Wiley, Chichester, West Sussex, U.K.; Malden, MA, 2011).

Kahan, D. Fixing the Communications Failure. Nature 463, 296-297 (2010).

Kahan, D. Why we are poles apart on climate change. Nature 488, 255 (2012).

Kahan, D.M. Climate-Science Communication and the Measurement Problem. Advances in Political Psychology 36, 1-43 (2015).

Kahan, D.M. Ideology, Motivated Reasoning, and Cognitive Reflection. Judgment and Decision Making 8, 407-424 (2013).

Kahan, D.M. What is the "science of science communication"? J. Sci. Comm. (in press).

Kahan, D.M., Peters, E., Dawson, E. & Slovic, P. Motivated Numeracy and Enlightened Self Government. Cultural Cognition Project Working Paper No. 116 (2013).

Kahan, D.M., Peters, E., Wittlin, M., Slovic, P., Ouellette, L.L., Braman, D. & Mandel, G. The polarizing impact of science literacy and numeracy on perceived climate change risks. Nature Climate Change 2, 732-735 (2012).

Kahneman, D., Slovic, P. & Tversky, A. Judgment under Uncertainty: Heuristics and Biases (Cambridge University Press, Cambridge; New York, 1982).

Lawson, A.E. & Worsnop, W.A. Learning about evolution and rejecting a belief in special creation: Effects of reflective reasoning skill, prior knowledge, prior belief and religious commitment. Journal of Research in Science Teaching 29, 143-166 (1992).

Peirce, C.S. The Fixation of Belief. Popular Science Monthly (1877); reprinted in Philosophical Writings of Peirce.

Shtulman, A. Qualitative differences between naïve and scientific theories of evolution. Cognitive Psychology 52, 170-194 (2006).


Weekend update: going to SENCER summer camp to learn about the "self-measurement paradox," the "science communication problem," & the "disentanglement project"

I'll be participating next week in the annual SENCER Summer Institute.

The 14 billion regular readers of this blog already know this, but for the rest of you, SENCER is an organization dedicated to obliterating the “self-measurement paradox”: the truly weird and ultimately intolerable failure of professions that traffic in scientific knowledge to use science's signature methods to assess and refine their own craft norms.

Most of the organization's members are educators who teach math & science.

But SENCER definitely recognizes the link between the self-measurement paradox and the broader science communication problem in the Liberal Republic of Science.  That problem is a consequence of the self-measurement paradox on a grand scale: our systematic failure to use evidence-based methods of science communication to assure that the vast scientific knowledge at our society's disposal is conveyed under conditions that enable free, reasoning citizens to reliably recognize it and give it the effect it is due when they govern themselves.

(Just to be clear: What effect it is due depends on citizens' values. Anyone who insists the best available scientific evidence uniquely determines policies either is very ill-informed or engaged in deliberative bad faith. Values, of course, naturally vary in a free society, creating the project of deliberative accommodation that is democracy's answer to the puzzle of how to reconcile individual autonomy with law.)

So ... in the session I'll be helping to lead, we'll be focusing on what I regard as the precise point of intersection between the self-measurement paradox and the science communication problem: the disentanglement project.  

In the science classroom, the "disentanglement project" refers to the development (by scientific means, of course) of strategies for unconfounding the question "what does science know" from the question "who are you & whose side are you on" in the study of scientific topics that have become enmeshed in antagonistic cultural meanings.

Critical in itself, learning how to disentangle knowledge and identity in education can, however, be expected to generate benefits that are even more far-reaching.  Disentangling knowledge from identity is in fact central to solving the broader science communication problem. Thus, studies aimed at implementing the disentanglement principle in science classrooms supply researchers with laboratories for acquiring the knowledge necessary for them to discern how to implement the disentanglement principle in institutions of self-government, too. That is the primary objective of the "new political science" essential to perfecting the Liberal Republic of Science as a political regime (Kahan in press). . . .

Boy, I can't wait for my SENCER summer camp session! Not to mention all the between-session volleyball games and evening marshmallow roasts!

My session description:

The science communication disentanglement project: What is to be done, and how to do it with reliable and valid empirical methods

The topics of climate change and human evolution both feature the science communication entanglement problem. This problem occurs when a fact or set of facts that admits of scientific investigation becomes enmeshed in antagonistic cultural meanings that transform positions on those facts into badges of membership in opposing cultural groups. This condition is actually rare, but where it occurs the consequences can be spectacularly damaging to propagation of both the collective knowledge and the norms of constructive deliberation essential to enlightened self-government. The session will feature existing research on how to disentangle knowledge from antagonistic meanings both in and outside the classroom. The primary goal, however, will be to draw on the informed judgment of the participants to form conjectures on how, using the tools of empirical inquiry, educators and other science communicators can enlarge public understanding of how to protect free and reasoning citizens from being put in the position of having to choose between knowing what's known by science and being who they are.


Kahan, D.M. What is the "science of science communication"? J. Sci. Comm. (in press).


On "best practices," "how to" manuals, and *genuinely* evidence-based science communication

From correspondence with a reflective person on whether there is utility in compiling “guide books” of “best practices” for climate-science and like-situated communicators . . . .

I think our descriptions of what we each have in mind are likely farther apart than what each of us actually has in mind.  My fault, I'm sure, b/c I haven't articulated clearly what it is that I think is "good" & what "not good" in the sorts of manuals that synthesizers of social science research compile and distribute.

I think the best thing would be for me to try to show you examples of each.

This is very very very good:

The concept of "best practices as best guesses" that is featured in the intro & at various points throughout is very helpful. It reminds users that advice is a provisional assessment of the best current evidence -- and indeed, can't even be meaningfully understood by a potential user who doesn't have a meaningful comprehension of what observations & inferences therefrom inform the "guess."

Also, as developed, the "best practices as best guesses" concept makes readers conscious that recommendations are necessarily hypotheses, to be applied in a manner that enables empirical assessment both in the course of implementation & at the conclusion of the intervention.  They are not mechanical, do-this directives.  The essays are written, too, in a manner that reflects an interpretive synthesis of bodies of literature, including the issues on which there are disagreements or competing understandings.

This is bad-- very very very very bad.

It is a compilation of general banalities.  No one can get any genuine guidance from information presented in this goldilocks form: e.g., "don't use numbers, engage emotions to get attention ... but be careful not to rely too much on emotions b/c that will numb people..."

If they think they are getting that, they are just projecting their own preconceptions onto the cartoons -- literally -- that the manual comprises.  

The manual ignores complexity and issues of external validity that reflective real-world communicators should be conscious of.

Worst of all, there is zero engagement with what it means to have an evidence-based orientation and mode of operation.  As a result, this facile type of work reinforces rather than revises & reforms the understandings of real-world communicators who mistakenly expect lab researchers to hand them a set of "how to" directives, as opposed to a set of tools for testing their own best judgments about how to proceed.

I know you have concerns about whether I have unrealistic expectations about the motivation and ability of individuals associated with climate-science communication groups to make effective use of materials of the sort I think are "good."  Maybe you won't have that reaction after you look at the FDA manual.  

But if you do, then I'd say that part of the practice that has to change here involves evaluation of which sorts of groups ought to be funded by NGOs eager to promote better public engagement with climate science.  Those NGOs should adopt standards for awards that will reliably weed out of the pool of support recipients the ones that by disposition & mindset can't conduct themselves in a genuinely evidence-based way & replace them with ones who can and will structure themselves in a manner that enables them to do so.  

There's too much at stake here to rely on people who just won't use the available financial resources in a manner that one could reasonably expect to generate success in the world.

In particular, such resources shouldn't go to any group that thinks the success of a “science communication strategy” should be measured by how much it boosts contributions to the group’s own fund raising efforts.  It doesn’t surprise me to know that this happens but it does shock me to constantly observe members of these groups talking so unself-consciously about it, in a manner that betrays that perpetuation of their own existence is a measure of success in their minds independently of whether they are achieving the results that they presumably exist to bring about.