From something that collaborators and I are working on . . . .
Showing why she has come to be known as the "science communication honeybadger," Tamar Wilner bites into the #scicomm puff adder of a question: what's the goal -- "belief" or "comprehension" -- in teaching evolution?
This was one of the central questions posed by the Session 10 reading materials.
I've only posted an image of the beginning of her essay -- go to her site to read it. I will leave the comment field open here, though, in case anyone wants to share their reaction (if they disagree with Tamar, I'm not responsible for her devouring them with the enthusiasm that a honeybadger displays for sticking her face into a hive of angry bees so she can have a nice larvae dessert after her main course of poisonous snake).
But here's another excerpt.
This one shows how we supplemented our use of conventional "statistical significance"/NHT testing of the study results with use of Bayesian likelihood ratios.
We did use the former, but I think the latter are more useful generally for conveying the practical strength of evidence & also for assessing the relative plausibility of competing hypotheses, an objective central to empirical inquiry for which NHT/"statistical significance" is ill-suited (see Goodman, S.N., Introduction to Bayesian methods I: measuring the strength of evidence, Clin Trials 2, 282 - 290 (2005); Edwards, W., Lindman, H. & Savage, L.J., Bayesian Statistical Inference in Psychological Research., Psych Rev 70, 193 - 242 (1963)). Anyone who disagrees that likelihood ratios are cool is a Marxist!
Oh, BTW: "IPCI" refers to "identity-protective cognition impact," which is the average percentage-point difference in the probability that a subject type (judge, lawyer, student, member of the public, or house pet) would find a statutory violation when doing so affirmed rather than defied his or her cultural worldview.
* * *
c. Judges vs. members of the public using Bayesian methods. As an alternative to assessing the improbability of the “null hypothesis,” one can use Bayesian methods to assess the strength of the evidence in relation to competing hypothesized IPCIs. Under Bayes’s Theorem the likelihood ratio reflects how much more consistent an observed outcome is with one hypothesis than a rival one. It is the factor in proportion to which one should adjust one’s assessment of the relative probability (expressed in odds) of one hypothesis in relation to another.
Imagine, for example, that we are shown two opaque canvas bags, labeled “B1” and “B2,” each of which is filled with marbles (we use “canvas bags” for this example in anticipation of the reasonable concern that Bayes’s Theorem might apply only to marble-filled urns). We are not told which is which, but one bag, it is stipulated, contains 75% red marbles and 25% blue, and the other 75% blue and 25% red. We are instructed to “sample” the contents of the bags by drawing one marble from each, after which we should make our best estimate of the probability that B1 is the bag containing mostly blue marbles and B2 the one containing mostly red. We extract a blue marble from B1 and a red one from B2.
Bayes’s Theorem furnishes logical instructions on how to use this “new evidence” to revise our estimates of the probability of the hypothesis that B1 is the bag containing mostly blue marbles (and hence B2 mostly red). If we assume that that hypothesis is true, then the probability that we would have drawn a blue marble from B1 is 3/4 or 0.75, as is the probability that we would have drawn a red marble from B2. The joint probability of these independent events—that is, the probability of the two occurring together, as they did—is 3/4 x 3/4 or 9/16. If we assume that hypothesis “B1 is the one that contains mostly blue marbles” is false, then the joint probability of drawing a blue marble from B1 followed by a red marble from B2 would be 1/4 x 1/4, or 1/16. Other possible combinations of colors could have occurred, of course (indeed, there are four possible combinations for such a trial). But if we were to repeat this “experiment” over and over (with the marbles being replaced and the labels on the bags being randomly reassigned after each trial), then we would expect the sequence “blue, red” to occur nine times more often when the bag containing mostly blue marbles is the one labeled “B1” than when it is the bag labeled “B2.” Because “blue, red” is the outcome we observed in our trial, we should revise our estimate of the probability of the hypothesis “B1 contains mostly blue marbles” by a factor of 9—from odds of 1:1 (50%) to 9:1 (90%).
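The arithmetic in the marble example can be checked with a few lines of Python. This is only an illustrative sketch: the function name `update_odds` and the variable names are made up here, and the 1:1 prior odds are the ones the example itself assumes.

```python
from fractions import Fraction

def update_odds(prior_odds, likelihood_ratio):
    """Odds form of Bayes's Theorem: posterior odds = prior odds x likelihood ratio."""
    return prior_odds * likelihood_ratio

# Joint probability of "blue from B1, red from B2" under each hypothesis.
p_if_b1_mostly_blue = Fraction(3, 4) * Fraction(3, 4)  # 9/16
p_if_b1_mostly_red = Fraction(1, 4) * Fraction(1, 4)   # 1/16

likelihood_ratio = p_if_b1_mostly_blue / p_if_b1_mostly_red
posterior_odds = update_odds(Fraction(1, 1), likelihood_ratio)  # prior odds of 1:1
posterior_prob = posterior_odds / (1 + posterior_odds)

print(likelihood_ratio, posterior_odds, posterior_prob)  # 9 9 9/10
```

Exact fractions are used instead of floats so that the 9:1 odds and the 90% probability come out without rounding noise.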
We can use precisely the same logic to assess the relative probability of hypothesized judge and public IPCIs. In effect, one can imagine each subject-type as an opaque vessel containing some propensity to engage in identity-protective cognition. The strengths of those propensities (the subject types’ “true” IPCIs) are not amenable to direct inspection, but we can sample observable manifestations of them by performing this study’s statutory interpretation experiment. Calculating the relative likelihood of the observed results under competing hypotheses, we can construct a likelihood ratio that conveys how much more consistent the evidence is with one hypothesized subject-type IPCI than with another.
Figure 8 illustrates the use of this method to test two competing hypotheses about the public’s “true” IPCI: that members of the public would be 25 percentage points more likely to find a violation when doing so is culturally affirming, and alternatively that they would be only 15 percentage points more likely to do so. To make the rival hypotheses commensurable with the study results, we can represent each as a probability distribution with the predicted IPCI as its mean and a standard error equivalent to the one observed in the experimental results. Within any one such distribution, the relative probability of alternative IPCIs (e.g., 15% and 25%) can be determined by assessing their relative “heights” on that particular curve. Likewise, the relative probability of observing any particular IPCI under alternative distributions can be determined by taking the ratio of the heights of the probability density distributions in question.
The public IPCI was 22%. Observing such a result (or any in close proximity to it) is eight times more likely under the more extreme “public IPCI = 25%” hypothesis than it is under the more modest “public IPCI = 15%” hypothesis (Figure 8). This is the Bayesian likelihood ratio, or the factor in proportion to which one should modify one’s assessment of the relative probability that the “true” public IPCI is 25 as opposed to 15 percentage points.
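Stated as a formula, the comparison of curve heights looks like this. The notation is introduced here only for illustration: x is the observed IPCI, \mu_1 and \mu_2 are the two hypothesized IPCIs, and \sigma is the shared standard error. Treating each curve as a normal density f is an assumption of this sketch; the text says only that each hypothesis is represented by a probability distribution with the hypothesized mean and the observed standard error.

```latex
\[
\mathrm{LR}
  = \frac{f(x \mid \mu_1, \sigma)}{f(x \mid \mu_2, \sigma)}
  = \exp\!\left( \frac{(x - \mu_2)^2 - (x - \mu_1)^2}{2\sigma^2} \right),
\qquad
\frac{\Pr(H_1 \mid x)}{\Pr(H_2 \mid x)}
  = \mathrm{LR} \times \frac{\Pr(H_1)}{\Pr(H_2)} .
\]
```

The second expression is the odds form of Bayes's Theorem: the likelihood ratio is the factor that converts prior odds into posterior odds.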
We will use the same process to assess the weight of four competing hypotheses about the vulnerability of judges to identity-protective cognition. The first is that judges will be “unaffected” (IPCI = 0%). This prediction, of course, appears similar to the “null hypothesis.” But whereas “null hypothesis testing” purports to specify only whether the null hypothesis can be “rejected,” Bayesian methods can be used to obtain a genuine assessment of the strength of the evidence in support of there being “no effect” if that is a genuine hypothesis of interest, as it is here. The remaining three hypotheses, the plausibility of which will be tested relative to the “IPCI = 0%” hypothesis, are that judges will be “just as affected as the public” (IPCI = 22%); that judges will be moderately affected (IPCI = 10%); and that judges will be affected to only a comparatively mild degree (IPCI = 5%).
The results are reflected in Figure 9. Not surprisingly, the experimental data are much more supportive of the first hypothesis (that judges would be unaffected by the experimental manipulation) than of the second (that they would be “just as affected as the public”). Indeed, because the probability that we would have observed the actual experimental result if the latter hypothesis were true is astronomically low, there is little practical value in assigning a likelihood ratio to how much more strongly the evidence supports the hypothesis that judges were “unaffected” by the experimental manipulation.
Of course, members of the public were influenced by their cultural predispositions to a strikingly large extent. To learn that the evidence strongly disfavors the inference that judges are that biased does not in itself give us much insight into whether judges possess the capacity for impartial decisionmaking that their duties demand. It was precisely for that reason that less extreme IPCIs were also hypothesized.
Even those predictions, however, proved to be less supported by the evidence than was the hypothesis that judges would be unaffected by identity-protective reasoning. The evidence was 20 times more consistent with the “judge IPCI = 0” hypothesis than the “judge IPCI = 10%” hypothesis. The weight of the evidence was not as decided but still favored—by a factor of about three—the “judge IPCI = 0” hypothesis over the “judge IPCI = 5%” hypothesis (Figure 9).
This is an excerpt from “ ‘Ideology’ or ‘Situation Sense’? An Experimental Investigation of Motivated Reasoning and Professional Judgment.” That paper reports the results of a CCP study designed to test whether judges are vulnerable to motivated reasoning.
As described in more detail in a previous post, the answer turned out to be "yes and no": yes when they assessed societal risks like climate change and marijuana legalization, on which judges, like members of the public, polarized along cultural lines; but no when those same judges analyzed statutory interpretation problems that were designed to and did trigger ideologically biased reasoning in members of the public who shared those judges' values.
This excerpt discusses the implications of this finding for the question whether scientists should be viewed as vulnerable to ideologically motivated reasoning when they are making in-domain judgments relating to climate change and other societal risks.
* * *
C. Motivated reasoning, professional judgment & political conflict
... Sensibly, citizens tend to treat “scientific consensus” on environmental risk and other highly technical matters as a reliable normative guide for decisionmaking, collective and individual. But what makes it sensible for them to do so is that the method of inquiry scientists themselves use does not afford existing “scientific consensus” any particular weight. On the contrary, the entitlement of any previously supported proposition to continued assent is, for science, conditional on its permanent amenability to re-examination and revision in light of new evidence.
If, then, there were reason to believe that scientists themselves were being unconsciously motivated to discount evidence challenging “consensus” positions on issues like climate change, say, or nuclear power or GM foods, by their cultural outlooks, that would be a reason for treating apparent scientific-consensus positions as a less reliable guide for decisionmaking.
Various commentators, including some scientists, now assert that identity-protective reasoning has pervasively distorted the findings of climate scientists, making their conclusions, as reflected in reports like those issued by the Intergovernmental Panel on Climate Change, the National Academy of Sciences, and the Royal Society, unreliable.
Obviously, the best way to test this claim is by conducting valid empirical studies of the scientists whose findings on risk or other policy-relevant facts are being challenged on this basis. But we believe our study, although confined to judges and lawyers, furnishes at least some evidence for discounting the likelihood of the hypothesis that climate scientists or other comparable experts are being influenced by identity-protective reasoning.
The reason is the connection between our study results and the theory of professional judgment on which the study was founded.
As explained, the theoretical basis for our study design and hypotheses was the account of professional judgment most conspicuously associated with the work of Howard Margolis. Margolis treats professional judgment as consisting in the acquisition of specialized prototypes that enable those possessing the relevant form of expertise to converge on the recognition of phenomena of consequence to their special decisionmaking responsibilities.
Margolis used this account of professional judgment among scientists to help explain lay-expert conflicts over environmental risk. Nonexperts necessarily lack the expert prototypes that figure in expert pattern recognition. Nevertheless, members of the public possess other forms of prototypes—ones consisting of what expert judgments look like—that help them to recognize “who knows what about what.” Their adroit use of these prototypes, through the cognitive process of pattern recognition, enables them to reliably converge on what experts know, and thus to get the benefit of it for their own decisionmaking, despite their inability to corroborate (or even genuinely comprehend) that knowledge for themselves.
Nevertheless, in Margolis’s scheme, the bridging function that these “expertise prototypes” play in connecting lay judgments to expert ones can be disrupted. Such sources of disruption create fissures between expert and lay judgment and resulting forms of public conflict over environmental risk.
Identity-protective cognition can be understood to be a disrupting influence of this character. When a fact subject to expert judgment (Is the earth heating up and are humans causing that? Does permitting citizens to carry handguns in public make crime rates go up or down? Does the HPV vaccine protect adolescent girls from a cancer-causing disease—or lull them into sexual promiscuity that increases their risk of pregnancy or other STDs?) becomes entangled in antagonistic cultural meanings, positions on that fact can become transformed into badges of membership in and loyalty to opposing groups. At that point the stake people have in protecting their status in their group will compete with, and likely overwhelm, the one they have in forming perceptions that align with expert judgments.
As we have noted, there is a striking affinity between the account Margolis gives of pattern recognition in expert judgment among scientists and other professionals and Karl Llewellyn’s account of “situation sense” as a professionalized recognition capacity that enables lawyers and judges to converge on appropriate legal outcomes despite the indeterminacy of formal legal rules. We would surmise, based on this study and previous ones, a parallel account of public conflict over judicial decisions.
Lacking lawyers’ “situation sense,” members of the public will not reliably be able to make sense of the application of legal rules. But members of the public will presumably have acquired lay prototypes that enable them, most of the time anyway, to recognize the validity of legal decisions despite their own inability to verify their correctness or comprehend their relationship to relevant sources of legal authority.
But just like their capacity to recognize the validity of scientific expert judgments, the public’s capacity to recognize the validity of expert legal determinations will be vulnerable to conditions that excite identity-protective reasoning. When that happens, culturally diverse citizens will experience disagreement and conflict over legal determinations that do not generate such disagreement among legal decisionmakers.
This was the basic theoretical account that informed our study. It was the basis for our prediction that judges, as experts possessing professional judgment, would be largely immune to identity-protective cognition when making in-domain decisions. By access to their stock of shared prototypes, judges and lawyers could be expected to reliably attend only to the legally pertinent aspects of controversies and disregard the impertinent ones that predictably generate identity-protective cognition in members of the public—and thus resist cultural polarization themselves in their expert determinations. That is exactly the result we found in the study.
Because this result was derived from and corroborates surmises about a more general account of the relationship between identity-protective reasoning and professional judgment, it seems reasonable to imagine that the same relationship between the two would be observed among other types of experts, including scientists studying climate change and other societal risks. Public conflict over climate change and like issues, on this account, reflects a reasoning distortion peculiar to those who lack access to the prototypes or patterns that enable experts to see how particular problems should be solved. But since the experts do possess access to those prototypes, their reasoning, one would thus predict, is immune to this same form of disruption when they are making in-domain decisions.
This is the basis for our conclusion that the current study furnishes reason for discounting the assertion that scientists and other risk-assessment experts should be distrusted because of their vulnerability to identity-protective cognition. Discount does not mean dismiss, of course. Any judgment anyone forms on the basis of this study would obviously be subject to revision on the basis of evidence of even stronger probative value—the strongest, again, being the results of a study of the relevant class of professionals.
At a minimum, though, this study shows that existing work on the impact of identity-protective cognition on members of the public has no probative value in assessing whether the in-domain judgments of climate scientists or other risk-assessment professionals are being distorted by this form of bias. Generalizing from studies of members of the public to these experts would reflect the same question-begging mistake as generalizing from such studies to judges. The results of this study help to illustrate that commentators who rely on experiments involving general-public samples to infer that judges are influenced by identity-protective cognition are making a mistake. Those who rely on how members of the public reason to draw inferences about the in-domain judgments of scientists are making one, too.
* * *
Now here is one more thing that is worth noting & that is noted (but perhaps not stressed enough) in a portion of the paper not excerpted here: the conclusion that professional judgment insulates experts from identity-protective cognition (the species of motivated reasoning associated with ideologically biased information processing) either in whole or in part does not mean that those experts are not subject to other cognitive biases that might distort their judgments in distinct or even closely analogous ways! There is a rich literature on this. For a really great example, see Koehler, J.J., The Influence of Prior Beliefs on Scientific Judgments of Evidence Quality. Org. Behavior & Human Decision Processes 56, 28-55 (1993).
Dynamics of cognition need to be considered with appropriate specificity--at least if the goal is to be clear and to figure out what is actually going on.
I’ve posted previously about the quality of “computer models” developed by political scientists for predicting judicial decisions by the U.S. Supreme Court. So this is, in part, an update, in which I report what I’ve learned since then.
As explained previously, the models are integral to the empirical proof that these scholars offer in favor of their hypothesis that judicial decisionmaking generally is driven by “ideology” rather than “law.”
That proof is “observational” in nature; i.e., it relies not on experiments but on correlational analyses that relate case outcomes to various “independent variables” or predictors. Those predictors, of course, include “ideology” (measured variously by the party of the President who appointed the sitting judges, the composition of the Senate that confirmed them, and, in the case of the Supreme Court, the Justices’ own subsequent voting records), the “statistical significance” of which, “controlling for” the other predictors, is thought to corroborate the hypothesis that judges are indeed relying on “ideology” rather than “law” in making decisions.
Commentators have raised lots of pretty serious objections going to the internal validity of these studies. Among the difficulties are sampling biases arising from the decision of litigants to file or not file cases (Kastellec & Lax 2008), and outcome “coding” decisions that (it is said) inappropriately count as illicit “ideological” influences what actually are perfectly legitimate differences of opinion among judges over which legally relevant considerations should be controlling in particular areas of law (Edwards & Livermore 2008; Shapiro 2009, 2010).
But the main issue that concerns me is the external validity of these studies: they don’t, it seems to me, predict case outcomes very well at all.
That was the point of my previous post. In it, I noted the inexplicable failure of scholars and commentators to recognize that a computer model that beat a group of supposed “experts” in a widely heralded (e.g., Ayres 2007) contest to predict Supreme Court decisions (Ruger et al. 2004) itself failed to do better than chance.
It’s kind of astonishing, actually, but the reason that this evaded notice is that the scholars and commentators either didn’t get or didn’t get the significance of the (well known!) fact that the U.S. Supreme Court, which has a discretionary docket, reverses (i.e., overturns the decision of the lower court) in well over 50% of the cases.
Under these circumstances, it is a mistake (plain and simple) to gauge the predictive power of the model by assessing whether it does better than “tossing a coin.”
Because it is already known that the process in question disproportionately favors one outcome, the model, to have any value, has to outperform someone who simply chooses the most likely outcome—here, reverse—in all cases (Long 1997; Pampel 2000).
The greater-than 50% predictive success rate of following that obvious strategy is how well someone could be expected to do by chance. Anyone who randomly varied her decisions between “reverse” and “affirm” would do worse than chance—just like the non-expert law professors who got their asses whupped by the computer, who I have in fact befriended and learned is named “Lexy," in the widely (and embarrassingly!) heralded contest.
The problem, as I pointed out in the post, is that Lexy’s “75%” success rate (as compared to the “experts’” 59%) was not significantly better -- practically or statistically (“p = 0.58”) -- than the 72% reversal rate for the SCOTUS Term in question.
A non-expert who had the sense to recognize that she was no expert would have correctly “predicted” 49 of the 68 decisions that year, just two fewer than Lexy managed to predict.
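For anyone who wants to check these figures, here is a rough sketch. The post does not say which test produced “p = 0.58”; a two-sided, one-sample z-test of Lexy's success rate against the 72% reversal rate is assumed here because it gives a value in that neighborhood. The function name is hypothetical.

```python
import math

def one_sample_proportion_z(successes, n, baseline_rate):
    """Two-sided z-test of an observed proportion against a fixed baseline rate."""
    observed_rate = successes / n
    std_error = math.sqrt(baseline_rate * (1 - baseline_rate) / n)
    z = (observed_rate - baseline_rate) / std_error
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_two_sided

n_cases = 68
always_reverse_correct = round(0.72 * n_cases)  # the "predict reverse every time" score
lexy_correct = round(0.75 * n_cases)            # Lexy's score at a 75% success rate

z, p = one_sample_proportion_z(lexy_correct, n_cases, baseline_rate=0.72)
print(always_reverse_correct, lexy_correct, round(z, 2), round(p, 2))
```

Whatever test was actually used, the point is the same: a two-decision edge over the always-reverse strategy in a 68-case Term is well within what chance alone would produce.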
I was moved to write the post by a recent recounting of Lexy’s triumph in 538.com, but I figured that surely in the intervening years (the contest was 13 yrs ago!) the field would have made some advances.
A couple of scholars in the area happily guided me to a cool working paper by Katz, Bommarito & Blackman (2014), who indeed demonstrate the considerable progress that this form of research has made.
KBB discuss the performance of a model, whose name I’ve learned (from communication with that computer, whom I met while playing on-line poker against her) is Lexy2.
Lexy2 was fed a diet of several hundred cases decided from 1946 to 1953 (her “training set”), and then turned loose to “predict” the outcomes in 7000 more cases from the years 1953 to 2013 (technically, that’s “retrodiction,” but same thing, since no one “told” Lexy2 how those cases came out before she guessed; they weren’t part of her training set).
Lexy2 got 70% of the case outcomes right over that time.
KBB, to their credit (and my relief; I found it disorienting, frankly, that so many scholars seemed to be overlooking the obvious failure of Lexy1 in the big “showdown” against the “experts”), focus considerable attention on the difference between Lexy2’s predictive-success rate and the Court’s reversal rate, which they report was 60% over the period in question.
Their working paper (which is under review somewhere and so will surely be even more illuminating still when it is published) includes some really cool graphics, two of which I’ve superimposed to illustrate the true predictive value of Lexy2:
As can be seen, variability in Lexy2’s predictive success rate ("KBB" in the graphic) is driven largely by variability in the Court’s reversal rate.
Still, 70% vs. 60% is a “statistically significant” difference -- but with 7000+ observations, pretty much anything even 1% different from 60% would be.
The real question is whether the 10-percentage-point margin over chance is practically significant.
(Of course, it's also worth pointing out that trends in the reversal rate should be incorporated into evaluation of Lexy2's performance so we can be sure her success in periods when reversal might have been persistently less frequent doesn't compensate for predictive failure during periods when the reversal rate was persistently higher; impossible to say from eyeballing, but it kind of looks like Lexy2 did better before 1988, when the Court still had a considerable mandatory appellate jurisdiction, than it has done with today's wholly discretionary one. But leave that aside for now.)
How should we assess the practical significance of Lexy2's predictive acumen?
If it helps, one way to think about it is that Lexy2 in effect correctly predicted “25%” of the cases that “Mr. Non-expert,” who would have wisely predicted "reverse" in all cases, would have missed (10 percentage points out of the 40% of cases that were “affirmed”). Called “adjusted count R2,” this is a logistic regression equivalent of R2 for linear regression.
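Written out, the measure compares the model's correct predictions with those of someone who always picks the most frequent outcome. The symbols are introduced here for illustration: n_correct is the number of cases the model gets right, n_modal is the number of cases in the most frequent outcome category ("reverse"), and N is the total number of cases. The right-hand side plugs in the rates reported by KBB.

```latex
\[
R^2_{\text{AdjCount}}
  = \frac{n_{\text{correct}} - n_{\text{modal}}}{N - n_{\text{modal}}}
  = \frac{0.70\,N - 0.60\,N}{N - 0.60\,N}
  = \frac{0.10}{0.40}
  = 0.25 .
\]
```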
But I think an even more interesting way to gauge Lexy2’s performance is to compare it to the original Lexy’s.
As I noted, Lexy didn’t genuinely do better than chance.
Lexy2 did, but the comparison is not really fair to the original Lexy.
Lexy2 got to compete against "Mr. Chance" (the guy who predicts reverse in every case) for 60 terms, during which the average number of decisions was 128 cases as compared to 68 in the single term in which Lexy competed. Lexy2 thus had a much more substantial period to prove her mettle!
So one thing we can do is see how well we'd expect Lexy2 to perform against Mr. Chance in an "average" Supreme Court Term.
Using the 0.60 reverse rate KBB report for their prediction (or retrodiction) sample & the 0.70 prediction-success rate they report for Lexy2, I simulated 5000 "75-decision" Terms--75 being about average for the modern Supreme Court, which is very lazy in historical terms.
Here's a graphic summary of the results:
In the 5000 simulated 75-decision Terms, Lexy2 beats Mr. Chance in 88%. In other words, the odds are a bit better than 7:1 that in a given Term Lexy2 will rack up a score of correct predictions that exceeds Mr. Chance's by at least 1.
But what if we want (for bookmaking purposes, say) to determine the spread -- that is, the margin by which Lexy2 will defeat Mr. Chance in a given term?
Remember that Lexy "won" against Mr. Chance in their one contest, but by a pretty unimpressive 3 percentage points (which with N = 68 was, of course, not even close to "significant").
If we look at the distribution of outcomes in 5000 simulated 75-decision terms, Lexy2 beats Mr. Chance by at least 10% in 50% of the 75-decision terms & fails to beat Mr. Chance by at least 10% in 50%. Not surprising; something would definitely be wrong with the simulation if matters were otherwise! But in any given term, then, Lexy2 is "even money" at +10 pct.
The odds of Lexy2 winning by 5% or more over Mr. Chance (4 more correct predictions in a 75-decision Term) are around 3:1. That is, in about 75% (73.9% to be meaninglessly more exact) of the 75-decision Supreme Court Terms, Lexy2 wins by at least +5 pct.
The odds are about 3:1 against Lexy2 beating Mr. Chance by 15 pct points.
Obviously the odds are higher than 3:1 that Lexy2 will eclipse the 3-pct-point win eked out by the original Lexy in her single contest with Mr. Chance. The odds of that are, according to this simulation, about 5:1.
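A simulation along these lines can be sketched as follows. The post doesn't describe how the simulation was set up, so this is one plausible reconstruction and not the actual code: it assumes Lexy2's and Mr. Chance's scores in each Term are independent binomial draws with success rates 0.70 and 0.60. The function names and the random seed are made up for the illustration.

```python
import random

def simulate_margins(n_terms=5000, n_cases=75, p_model=0.70, p_baseline=0.60, seed=1):
    """Simulate per-Term margins (model's correct calls minus baseline's correct calls).

    Assumption: the two scores are independent binomial draws in every Term.
    """
    rng = random.Random(seed)
    margins = []
    for _ in range(n_terms):
        model_correct = sum(rng.random() < p_model for _ in range(n_cases))
        baseline_correct = sum(rng.random() < p_baseline for _ in range(n_cases))
        margins.append(model_correct - baseline_correct)
    return margins

def share_winning_by(margins, min_decisions):
    """Fraction of simulated Terms in which the model wins by at least min_decisions."""
    return sum(m >= min_decisions for m in margins) / len(margins)

margins = simulate_margins()
for label, k in [("at least 1 decision", 1), ("+5 pct (4 decisions)", 4),
                 ("+10 pct (8 decisions)", 8), ("+15 pct (12 decisions)", 12)]:
    print(f"{label}: {share_winning_by(margins, k):.3f}")
```

Margins are counted in whole decisions because a Term has only 75 cases: +5 pct means at least 4 decisions, +10 pct at least 8, and +15 pct at least 12. The exact shares will wobble a little from one seed to the next.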
But what if we want to test the relative strength of the competing hypotheses (a) that “Lexy2 is no better than the original 9001 series Lexy” and (b) that Lexy2 enjoys, oh, a “5-pct point advantage over Lexy” in a 75-decision term?
To do that, we have to figure out the relative likelihood of the observed data-- that is, the results reported in KBB -- under the competing hypotheses. Can we do that?
Here I've juxtaposed the probability distributions associated with the "Lexy2 is no better than Lexy" hypothesis and the "Lexy2 will outperform Lexy by 5 pct points" hypothesis.
The proponents of those hypotheses are asserting that on average Lexy2 will beat “Mr. Chance” by 3%, Lexy’s advantage in her single term of competition, and 8% (+5% more), respectively, in a 75-decision term.
Those “averages” are means that sit atop probability distributions characterized by standard errors of 0.09, which is, by my calculation (corroborated, happily!, by the simulation), the standard error of the difference in success rates for both 0.72 and 0.75, on the one hand, and 0.60, on the other, for a 75-decision Term.
The ratio of the densities at 0.10, the observed data, for the "Lexy2 +5 " hypothesis & the "Lexy2 no better" hypothesis is 1.2. That's the equivalent of the Bayesian likelihood ratio, or the factor by which we should update our prior odds of Hypothesis 2 rather than Hypothesis 1 being correct (Goodman 1999, 2005; Edwards, Lindman & Savage 1963; Good 1985).
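The density-ratio calculation can be sketched like this. It assumes both hypotheses are represented by normal curves with the rounded values quoted above (means of 0.03 and 0.08, a standard error of 0.09, and observed data of 0.10), so it lands near but not exactly on the 1.2 read off the actual distributions. The function names are hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    """Height of a normal density curve at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def likelihood_ratio(observed, mean_h2, mean_h1, sd):
    """How many times more likely the observation is under H2 than under H1."""
    return normal_pdf(observed, mean_h2, sd) / normal_pdf(observed, mean_h1, sd)

# H1: "Lexy2 is no better" (mean margin 0.03); H2: "Lexy2 +5" (mean margin 0.08).
lr = likelihood_ratio(observed=0.10, mean_h2=0.08, mean_h1=0.03, sd=0.09)
prior_odds = 1.0                  # illustrative starting point: even odds
posterior_odds = prior_odds * lr  # odds of H2 over H1 after seeing the data

print(round(lr, 2), round(posterior_odds, 2))
```

With a standard error this large relative to the 5-point gap between the hypotheses, the ratio stays close to 1 for any observation near either mean, which is why the data do so little to separate them.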
That's close enough to a likelihood ratio of 1 to justify the conclusion that the evidence is really just as consistent with both hypotheses: “Lexy2 is no better” and “Lexy2 +5 over Lexy.”
Is this “Bayes factor” (Goodman 1999, 2005) approach the right way to assess things?
I’m not 100% sure, of course, but this is how I see things for now, subject to revision, of course, if someone shows me that I made a mistake or that there is a better way to think about this problem.
In any case, the assessment has caused me to revise upward my estimation of the ability of Lexy! I really have no good reason to think Lexy isn’t just as good as Lexy2. Indeed, it’s not 100% clear from the graphics in KBB, but it looks to me that Lexy's 75% "prediction success" rate probably exceeded that of Lexy2 in 2002-03, the one year in which Lexy competed!
At the same time, this analysis makes me think a tad bit less than I initially did of the ability of Lexy2 (& only a tad; it's obviously an admirable thinking machine).
Again, Lexy2, despite “outperforming” Mr. Chance by +10 pct over 60 terms, shouldn’t be expected to do any better than the original Lexy in any given Term.
More importantly, being only a 7:1 favorite to beat chance by at least a single decision, & only a 3:1 favorite to beat chance by 4 decisions or more (+5%), in an average 75-decision Term just doesn’t strike me as super impressive.
Or in any case, if that is what the political scientists’ “we’ve proven it: judges are ideological!” claim comes down to, it’s kind of underwhelming.
I mean, shouldn’t we see stronger evidence of an effect stronger than that? Especially for the U.S. Supreme Court, which people understandably suspect of being “more political” than all the other courts that political scientists also purport to find are deciding cases on an ideological basis?
No empirical method is perfect. They are all strategies for conjuring observable proxies of process that in fact we cannot observe directly.
Accordingly, the only “gold standard,” methodologically speaking, is convergent validity: when multiple (valid) methods reinforce one another, then we can more confident in all of them; if they don’t agree, then we should wary about picking just one as better than another.
The quest for convergent validity was one of the central motivations for our study (discussed in my post “yesterday”), which probed the “ideology thesis” (the political science conclusion, based on observational studies) via experimental methods.
That our study (Kahan, Hoffman, Evans, Lucci, Devins & Cheng in press) came to a result so decidedly unsupportive of the claim that judges are ideologically biased in their reasoning reinforces my conclusion that the evidence observational researchers have come up with so far doesn’t add much to whatever grounds one otherwise would have had for believing that judges are or are not “neutral umpires.”
But I'm really not sure. What do you think?
Edwards, H.T. & Livermore, M.A. Pitfalls of empirical studies that attempt to understand the factors affecting appellate decisionmaking. Duke LJ 58, 1895 (2008).
Edwards, W., Lindman, H. & Savage, L.J. Bayesian Statistical Inference in Psychological Research. Psych Rev 70, 193 - 242 (1963).
Good, I.J. Weight of evidence: A brief survey. in Bayesian statistics 2: Proceedings of the Second Valencia International Meeting (ed. J.M. Bernardo, M.H. DeGroot, D.V. Lindley & A.F.M. Smith) 249-270 (Elsevier, North-Holland, 1985).
Goodman, S.N. Introduction to Bayesian methods I: measuring the strength of evidence. Clin Trials 2, 282 - 290 (2005).
Goodman, S.N. Toward evidence-based medical statistics. 2: The Bayes factor. Annals of internal medicine 130, 1005-1013 (1999).
Kastellec, J.P. & Lax, J.R. Case selection and the study of judicial politics. Journal of Empirical Legal Studies 5, 407-446 (2008).
Katz, D.M., Bommarito, M.J. & Blackman, J. Predicting the Behavior of the Supreme Court of the United States: A General Approach (July 21, 2014). Available at SSRN: http://ssrn.com/abstract=2463244 or http://dx.doi.org/10.2139/ssrn.2463244
Long, J.S. Regression models for categorical and limited dependent variables (Sage Publications, Thousand Oaks, 1997).
Pampel, F.C. Logistic regression : a primer (Sage Publications, Thousand Oaks, Calif., 2000).
Shapiro, C. Coding Complexity: Bringing Law to the Empirical Analysis of the Supreme Court. Hastings Law Journal 60 (2009).
Shapiro, C. The Context of Ideology: Law, Politics, and Empirical Legal Scholarship. Missouri Law Review 75 (2010).
I have to say, as the day goes by I wonder if I'm giving Lexy2 her due... Given the high reversal rate in the US S Ct, there isn't a lot of room to show predictive proficiency. There's necessarily going to be a high degree of variance in a 75-decision term-- just as there is in a 75-hand session of poker (or a 750-hand one, for that matter). If over time she is racking up an avg margin of 7.5 correct predictions per term over Mr. Chance, then that's something certainly.
Anyway, KBB are doing this right. I hope the observational scholars purporting to prove "ideology" is driving lower court decisions are putting their models to a test this genuine; it's what has to be done to earn the claim that a decisionmaking model is truly predicting rather than simply being fit to the data one is falsely purporting to "predict."
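To put rough numbers on the variance point, here is a small sketch of how noisy a single Term is compared with a long-run average. The accuracy rates (0.70 for the model, 0.60 for Mr. Chance) are assumptions chosen only to reproduce the 7.5-decision average margin; independence between the two predictors' counts is likewise assumed, and the function names are made up for illustration.

```python
from math import sqrt

def margin_sd_per_term(n, p_model, p_chance):
    """Standard deviation of (model correct - chance correct) in one n-decision Term.

    Treats the two counts as independent binomials (an assumption), so the
    variances add: n*p*(1-p) for each predictor.
    """
    return sqrt(n * (p_model * (1 - p_model) + p_chance * (1 - p_chance)))

def standard_error_of_average_margin(n, p_model, p_chance, terms):
    """Standard error of the average per-Term margin across `terms` Terms."""
    return margin_sd_per_term(n, p_model, p_chance) / sqrt(terms)

N_DECISIONS = 75          # decisions per Term (from the text)
P_MODEL, P_CHANCE = 0.70, 0.60   # assumed accuracy rates
TERMS = 60                # number of Terms (from the text)

expected_margin = N_DECISIONS * (P_MODEL - P_CHANCE)   # 7.5 decisions
sd_one_term = margin_sd_per_term(N_DECISIONS, P_MODEL, P_CHANCE)
se_average = standard_error_of_average_margin(N_DECISIONS, P_MODEL, P_CHANCE, TERMS)

print(f"expected margin per Term: {expected_margin:.1f}")
print(f"sd of margin in a single Term: {sd_one_term:.2f}")
print(f"standard error of the average over {TERMS} Terms: {se_average:.2f}")
```

Under these assumptions the single-Term spread is nearly as large as the expected margin itself, while the average over many Terms is far more tightly pinned down, which is the poker-session intuition in numerical form.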
We were just fooling around with the design of study stimuli (you know, things like this & this), & all of a sudden one of us (AR) comes up with this graphic image the content of which we can't agree on! Half of us on hand in the lab that day saw Hayek & the other half Honey Boo Boo! WTF?!!
Check it out!
So who is it? Hayek? Honey Boo Boo? Weird!
Next we showed it to a nationally representative sample of 482 million Americans on MTurk. It turns out that 24.75% see Hayek, 24.75% Honey Boo Boo, 24.75% the Marlboro Man, and 24.75% Michael Moore.
Even more astonishingly, there was a 0.999*** correlation between which of those characters the subjects reported seeing and their "cultural worldviews" as measured with the CCW scales!
Those seeing either Hayek or Honey Boo Boo turned out to be either Egalitarian Individualists or Hierarchical Communitarians, respectively-- of course! The Marlboro Man? Yup: Hierarchical Individualists! And Michael Moore--Egalitarian Communitarians, naturally.
Actually, if you back up or move forward (zoom in, zoom out), you'll likely see the image morph from one of these characters into the other.
Pretty cool, don't you think?!
Now one more thing that will really blow your mind.
It turns out that 1% of the subjects in our study didn't see any of these characters but someone else entirely. They gave consistent descriptions of him/her, but we have no idea who he or she actually is!
We were able to figure out, though, that the 1% of the people who saw this non-conforming, weird image were ... Ludwicks! The super rare types who actually form risk-perception clusters that don't conform to the ones characteristic of their cultural group!
This is, of course, a real breakthrough, since before now we had no reliable predictive indicators for identifying Ludwicks!
For sure we'll be using this graphic, in place of the short-form CWV scales, in all future studies.
Weirdest freak discovery since Michelson-Morley device proved speed of light is a constant! What a strange world we live in!
Nice work, AR! (Makes up for your vomiting on my laptop this morning, for sure.)
From something I'm working on in connection with the CCP Evidence-based science communication initiative.
1. Overview. This section describes an evidence-based and evidence-generating program of science communication carried out in support of effective local policymaking. Known as “Communicating Normality,” the program aims to stimulate within key opinion-formation communities self-replicating interactions that maximize citizens’ exposure to the confidence of their own peers in the science that informs local climate-policymaking initiatives.
2. Theoretical grounding. In order to live well, or just to live, ordinary individuals must make effective use of far more scientific information than they have either the time or capacity to understand in meaningful detail. For this purpose, they become experts not in particular forms of decision-relevant science but in recognizing the forms of insight generated by valid scientific methods. The primary source of information that guides this expertise is individuals’ observation of others whom they trust and regard as informed, socially competent actors. These actors, whose ranks include not just science-trained professionals but ordinary individuals’ own neighbors, friends, and coworkers, do not, for the most part, “frame” or deliver “messages” about science; rather they vouch for the validity of science by relying on it in making decisions of consequence (CCP 2014).
The dynamics of “communicating normality,” moreover, not only explain why it is that ordinary citizens normally converge on the best available scientific evidence but also why they sometimes don’t. On issues like climate change and the HPV vaccine, conspicuous forms of cultural conflict obscure, distort, and ultimately stifle the orienting signals that culturally diverse citizens use to identify valid decision-relevant science (Kahan 2012, 2013, 2015).
3. Practical evidence. “Communicating normality” as a science communication strategy has played an important role in the activities of local governments involved in promoting public engagement with climate science. Those governments have used a variety of public outreach techniques aimed at vitalizing the spontaneous community interactions that ordinary citizens use to recognize valid science.
In effect, ordinary citizens who already are actively involved in the local-decisionmaking processes have been encouraged to assume the role of “proselytizers of normality” to make their own views about the legitimacy and importance of local decisionmaking initiatives known within relevant opinion-formation communities: from local business groups to home-owner associations, from church congregations to civic organizations (Kahan 2015). This activity, government actors believe, has contributed to their success both by amplifying the signals that individuals use to recognize valid science and by counteracting the disruptive impact of groups committed to entangling their policymaking agendas in the forms of cultural rivalry that have prevented public recognition of the validity of climate science generally (CCP 2015).
“Communicating normality” is both an evidence-informed and an evidence-generating strategy (Kahan 2014; Han & Stenhouse 2014; Stenhouse 2014). Applying their experience-informed judgment to the best available evidence, local government actors and affiliated communicators, with assistance from researchers affiliated with the CCP Evidence-based Science Communication Initiative, have implemented it, and in the course of carrying it out have assessed its impact and revised its operation, on the basis of experimental studies the designs of which they were intimately involved in formulating.
4. Enlarging the program. The outlined program would systematize and enlarge the “Communicating Normality” strategy. As valuable as “Communicating Normality” has been, its overall role in local government communications activities has been constrained by the limited government staff available to carry it out. Moreover, these government actors justifiably anticipate an intensified need for the contribution that “Communicating normality” is uniquely suited to making: as their activities to use climate-science to protect their communities’ interests assume an increasingly larger profile in the everyday lives of ordinary citizens, those citizens will have even greater need both for access to the orienting signals they use to identify valid decision-relevant science and for insulation from the (often strategically orchestrated) forms of cultural-rivalry that obscure and distort the accessibility of those signals. Finally, this program is founded on the conviction that the information generated by the evidence-based science communication techniques that guide “Communicating normality” should be magnified in extent and made as widely accessible as possible to groups pursuing similar objectives (Kahan 2014).
CCP, Evidence-Based Science Communication Initiative Rept. No. 1: Assessing and Forecasting the Quality of the Local Science Communication Environment (Oct. 13, 2013).
CCP, Evidence-based Science Communication Initiative Rept. No. 2: Proselytizing Normality, an Experimental Assessment (Nov. 14, 2014).
Stenhouse, N. Spreading Success Beyond the Laboratory: Applying the RE-AIM Framework for Effective Environmental Communication Interventions at Scale. Conf. Paper National Communication Association 100th Annual Convention (Mar. 26, 2014).
Despite missing my connecting flight as a result of blogging in the Detroit Metropolitan Airport yesterday, I did manage to make it to Boulder! Will be giving lecture this afternoon on They Saw a Statutory Ambiguity (aka “Ideology” or “Situation Sense”? An Experimental Investigation of Motivated Reasoning and Professional Judgment). If in neighborhood, stop by!
So here is the first of a planned 73 blog posts on the paper, which describes a study we did on whether judges’ decisionmaking is vulnerable to distortion by cultural cognition.
By now, all 14 billion readers of this blog, along with the remaining 18 other persons in the world (the ones whose internet connections were disconnected for failing to pay Comcast on a timely basis), know about Hastorf & Cantril’s classic paper “They Saw a Game.”
H&C found that when students from two Ivy League colleges were shown a film of a football game between their two schools, students selectively perceived the referee to be making correct or mistaken penalty calls depending on whether those calls were beneficial or detrimental to their school’s team.
This was the first finding of “motivated reasoning,” which refers to the tendency of individuals to conform their assessments of all manner of information—from brute sense impressions to assessments of logical arguments to evaluations of empirical evidence—to some end or goal independent of factual accuracy.
In H&C, that goal was the experience of solidarity with their school.
“Cultural cognition” is a form of motivated reasoning that manifests itself in individuals conforming their perceptions of risk or other policy relevant facts to propositions that promote the status of, or their standing in, their own cultural group. Cultural cognition, studies suggest, makes a substantial contribution to public conflicts over climate change, gun control, marijuana legalization and other putative societal risks.
One of the things that I and my colleagues are curious about is whether cultural cognition influences (and in this context, we’d say biases) legal decisionmaking.
One of the papers in which we examine this issue is called “They Saw a Protest.”
In that study, we showed the subjects—a group of 200 members of the public drawn from a demographically diverse panel of US adults—a videotape of a political protest. We told them the protestors were suing the police for breaking up their demonstration in violation of the protestors’ free speech rights.
The police, we explained, were justifying their dispersing of the protestors on the ground that the protestors were threatening and intimidating bystanders, and preventing their access to a building. The protestors denied this, asserting that they were peacefully chanting and directing comments at the bystanders in a nonthreatening, lawful fashion.
All the subjects had to do was, in the role of jury, decide whose position—the protestors or the police—the tape supported.
Half the subjects, however, were told that the protestors were anti-abortion activists demonstrating in front of an abortion clinic. The other half were told that the protestors were college students demonstrating against “Don’t ask, don’t tell” (for those of you too young to remember, the long-time policy that excluded open Lesbians and Gays from the US military) in front of the college recruitment center, where the military was interviewing students interested in signing up for service.
Consistent with the influence of cultural cognition—and reminiscent of the result from the classic “They Saw a Game” study—we found that the study subjects polarized along cultural lines.
Those who had opposing cultural outlooks (“egalitarian communitarian” vs. “hierarchical individualist,” or “egalitarian individualist” vs. “hierarchical communitarian”) who were assigned to the same experimental condition (who, in other words, thought that they were watching the same kind of protest) disagreed about whether they saw the protestors blocking pedestrians, threatening them with their signs, shouting in their faces, etc. Those who had the same cultural outlooks but who were assigned to different experimental conditions (and so thought they were watching different kinds of protests) also disagreed with each other about what had occurred.
Not a happy outcome for the law. The First Amendment is supposed to protect the right of citizens to express themselves no matter how objectionable anyone finds their message, so long as those citizens don’t cross the line from speech to intimidation. But if decisionmakers are motivated by cultural cognition to attribute intimidating behavior to those whose messages offend the decisionmakers’ values, then this constitutional guarantee won’t be worth much!
But these decisionmakers were ordinary citizens playing the role that ordinary citizens often do—juror—in a legal proceeding.
Would judges be subject to the same sort of bias?
That’s the question our new study was designed to help answer.
The paper is entitled “ ‘Ideology’ or ‘Situation Sense’? An Experimental Investigation of Motivated Reasoning and Professional Judgment,” although a more appropriate title might have been “They Saw a Statutory Ambiguity.”
In the study, we had the subjects (a group of 800 members of the public and 700 law students, lawyers and judges, the latter three represented in roughly equal proportions) play the role of judge in two cases.
Both cases featured statutory ambiguities: in one (“Littering”), whether leaving reusable plastic water bottles in the desert constituted “depositing debris” in a protected wildlife preserve; and in the other (“Disclosure”), whether a police officer who knew he was releasing confidential investigatory information but didn’t realize doing so was illegal was “knowingly violating” a law against making such disclosures.
Again, we had experimental manipulations.
In Littering, the alleged violator was either a group of construction workers, who left the bottles in the desert to drink water from as they built a fence to keep undocumented immigrants from illegally crossing the Mexico-Texas border; or an immigrant-aid group, which put the bottles in the desert in anticipation that the water would be consumed by the illegal immigrants.
In Disclosure, the party receiving the leaked information was either a family-planning center counseling individuals on where to obtain abortions or a religious family-planning center counseling individuals on alternatives to the same.
Those experimental manipulations were ones irrelevant to proper interpretation of the statutes at issue (or so we believed). But they were ones that could still provoke the sort of bias associated with cultural cognition, which could be expected to bias decisionmakers’ interpretations toward finding violations where such outcomes were consistent with rather than contrary to their cultural values.
We found exactly that pattern of biased decisionmaking in members of the public.
But not in the judges: the judges converged on outcomes (no violation in Littering; violation in Disclosure) regardless of which version of the problem they considered and regardless of their cultural values.
In other words, the judges got a cultural-cognition clean bill of health. They decided the cases free of the sort of influences that, the study showed, could be expected to generate group-favoring biases in ordinary citizens.
In another part of the study, however, the judges divided along exactly the same cultural lines as members of the public on the risks of climate change, marijuana legalization, and a host of other issues.
The resistance the judges displayed to cultural cognition, then, was very specific to their legal reasoning.
Lawyers, by the way, were pretty much identical to judges in their responses to both the legal reasoning and risk-perception portions of the study.
Law students were in between the members of the public (who again, were biased in both types of responses), on the one hand, and the lawyers and judges, on the other.
So that’s what we found.
In future posts, I’ll say more about the theoretical and practical motivations for the study, the methods we used to analyze the results, and the implications of the study for assessment of the performance not only of judges but of professionals generally, including climate scientists and others who specialize in assessing societal risks.
[Ah shit! Because I was concentrating on writing on this I didn’t notice that they were boarding my connecting flight from Detroit to Boulder where I’m supposed to give a lecture on this study. . . . Whoops! I’ll pay more attention when the next flight is getting ready to go.]
Will have more to say about this new study "tomorrow"-- but if anyone wants to get a head start in commenting/questioning/qualifying/annihilating, dive in.
“Ideology” or “Situation Sense”? An Experimental Investigation of Motivated Reasoning and Professional Judgment
Dan M. Kahan, David Hoffman, Danieli Evans, Neal Devins, Eugene Lucci, and Katherine Cheng
This paper reports the results of a study on whether political predispositions influence judicial decisionmaking. The study was designed to overcome the two principal limitations on existing empirical studies that purport to find such an influence: the use of nonexperimental methods to assess the decisions of actual judges; and the failure to use actual judges in ideologically-biased-reasoning experiments. The study involved a sample of sitting judges (n = 253), who, like members of a general public sample (n = 800), were culturally polarized on climate change, marijuana legalization and other contested issues. When the study subjects were assigned to analyze statutory interpretation problems, however, only the responses of the general-public subjects and not those of the judges varied in patterns that reflected the subjects’ cultural values. The responses of a sample of lawyers (n = 217) were also uninfluenced by their cultural values; the responses of a sample of law students (n = 284), in contrast, displayed a level of cultural bias only modestly less pronounced than that observed in the general-public sample. Among the competing hypotheses tested in the study, the results most supported the position that professional judgment imparted by legal training and experience confers resistance to identity-protective cognition—a dynamic associated with politically biased information processing generally—but only for decisions that involve legal reasoning. The scholarly and practical implications of the findings are discussed.
Once again, Tamar Wilner shares illuminating insights with a response paper, this one on Session 9, the readings for which were designed to help us formulate advice on how science communicators should promote constructive public engagement with synthetic biology (in the event the public ever does engage).
I'm reproducing the beginning of her post below (no more than is consistent with the federal Copyright Act's "fair use" provisions; everything that happens on this blog, in contrast to the ones of our so-called "competitors," is in strict compliance with the law), and then linking over to her site for the rest.
There's already great discussion going on in connection with the "virtual class" post, so post your own thoughts there!
Was a super great audience, brimming with knowledge, intelligence & curiosity.
I’ve given talks before on the Measurement Problem & its significance for science communication.
But in this one for the first time I gave a pretty central place to the “Pakistani Dr” paradox—the apparently simultaneous belief & disbelief in one or another scientific proposition (human evolution, human-caused climate change).
Indeed, Everhart & Hameed’s Pakistani Dr only arrived after the “Measurement Problem” study was done, to try to help me answer the question a perceptive audience member asked after I gave a lecture at RMIT University last summer. . . . The Dr’s helped a lot, but for sure I remain perplexed.
The audience members yesterday were aroused and agitated by him, and particularly by his buddy the Kentucky Farmer.
There was the usual impulse to try to explain away the paradox with moves that dissolve the apparent contradiction, ones involving either specifying the propositions believed/disbelieved in more fine-grained ways (“micro- vs. macro-evolution”; “scientists say that, but they are wrong”) or positing unrevealed attitudes (“he doesn’t really disbelieve evolution; he’s just saying that”; “FYATHYRIO”; “hypocritical selfish bastard acting on basis of self-interest”; etc.).
That's understandable. It’s everyone’s first instinct, and isn’t necessarily the wrong answer! But as I tried to explain, I think we should resist the impulse to accept those “solutions” too readily, lest they preempt valid empirical inquiry into the range of plausible hypotheses.
Actually, as far as I could tell, everyone readily agreed with me when I raised that point.
I, of course, found myself engaged in a kind of “cheerleading” for my favorite conjecture—the “pragmatic dualism” position, I guess I’d call it (not b/c that is a very good label but b/c it’s as good as anything else I can think of for now).
On this account, the appearance of contradiction reflects a mistaken model of how “beliefs” figure in reasoning.
The mistaken model is that “beliefs” are mental objects akin to factual or empirical propositions that can be identified exclusively with their states-of-affairs referents: “natural history of humans” or “scientific theory of same originating in work of Darwin"; “global temperature trends over last decade” and “impact of burning fossil fuels on the same.”
It makes sense to treat “facts” (essentially) that way for purposes of scientific inquiry, and “beliefs” about them as summaries of our assessment of the best available scientific evidence, I agree.
But “inside of people’s heads” it doesn’t make sense to think of “beliefs” being isolated proposition bits switched to either a “true” (“1”) or “false”(“0”) position.
Rather, “beliefs” in states of affairs are always parts of bundles of intentional states that include not just assessments of such propositions but also affective reactions to them that reflect their significance and that incline one to particular courses of action (Damasio 2010; Lewandowsky 2000; Elga & Rayo 2014).
“Knowing that” is always part of a “knowing how”—for psychological purposes.
There is no way, on this account, to individuate a belief as a “mental object” abstracted from the action-enabling bundle of intentional states that it is part of.
Because beliefs can’t be individuated independently of the actions they enable, there’s no necessary “contradiction” in both “believing” and “not believing” propositions about external states of affairs. There would be a contradiction only if the kinds of things that a person is enabled to do by the bundles of intentional states that contain those opposing "beliefs" themselves interfered with one another.
Hameed’s Pakistani Dr is enabled to be a doctor (enabled to practice medicine and experience a sense of identity as a part of a science-based profession) by believing in evolution.
He is also enabled to be a part of a certain religious community by disbelieving that particular account of the natural history of human beings.
What’s the problem, he keeps asking us? I am both of those things—and there’s no tension, in the life I lead (in the society in which I live) in doing so.
Similarly for Kentucky Farmer. He is enabled to be a certain kind of person—a “hierarchical individualist,” let’s say—by “disbelieving human-caused climate change.” But he is also enabled to be a successful farmer by “believing in human-caused climate change”—by using, in fact, the best available information on how human activity is affecting the climate so that he can make sensible decisions about his farming practices (no-till farming, crop-rotation, use of genetically modified seeds, etc.) and about conducting his commercial operations (buying crop-failure insurance, etc.).
Big deal, he says. I do both of those things—and they fit together for me just fine.
I don’t know if this is right. I’d like to figure out experiments for testing this & other plausible conjectures about “what’s going on in their heads.”
But one thing that I realized might be making people resist this account at the workshop wasn’t the implausibility of it.
On the contrary, it was the very likelihood that this might be exactly what is happening.
The objection, for some, I think, was less to the apparent “contradiction” in the Kentucky Farmer’s “beliefs” (he came in for the most critical attention at the workshop). Rather it was to what he was being enabled to do by his particular “knowing that”/“knowing how” clusters on climate change.
People were distressed, in particular, by the absence in him of a bundle of action-enabling intentional states containing “belief in climate change” that was geared toward impelling him to demand a particular set of policies relating to mitigation via restrictions on various sorts of commercial and market behavior in the US and other countries.
If Ky Farmer had "belief in human-caused climate change" within a cluster of intentional states that impelled him to demand that, I doubt these critics would have cared much if he, like the vast majority of people who have that bundle of intentional states, actually didn't know even the most rudimentary aspects of climate change science.
In other words, at least some people weren't really objecting to the "irrationality" of the Kentucky Farmer's beliefs. They just didn’t like the person that his “beliefs” were rationally enabling him to be.
They are entitled to feel that way!
But I do think it is useful to recognize that that's what the objection is.
Or in any case, it occurred to me that this might be one way to make sense of how others were making sense of the Kentucky Farmer and the Pakistani Dr.
I could be wrong about that.
As I said, I’m perplexed, and curious what others think (and of course for guidance to treatments of these issues by thoughtful people who have already investigated them)!
From something I'm working on....
Who is fooling whom?
Identity-protective cognition is a species of motivated reasoning that consists in the tendency of people to conform their perceptions of disputed facts (particularly ones relevant to political controversies) to positions associated with membership in one or another affinity group. I will present evidence (in the form of correlational studies, standardized assessment tests, and critical-reasoning experiments) that shows that identity-protective cognition is not a consequence of over-reliance on heuristic information processing. On the contrary, proficiency in one or another aspect of critical reasoning magnifies individuals’ tendency to selectively credit evidence in a manner that conforms to the position associated with their group identity. The question I want to frame is, Which of these two conclusions is more supportable: that individuals who engage in this form of information processing are using their reason to fool themselves; or that we (those who study them) are fooling ourselves about what these individuals are actually using their reason to do?
A thoughtful correspondent writes:
I am a physician . . . I was reading an article on Vox debunking the theory which states that more information makes people smarter. This article referenced your study concluding that those with the most scientific literacy and technical reasoning ability were less likely to be concerned about climate change and the safety of nuclear energy.
I read the paper which shows this quite nicely.
I am confused about the conclusions. I scored a perfect score on the science literacy test and on a technical reasoning test as well. I do not believe climate change is a settled science and I believe nuclear power is the safest form of reliable energy available.
The conclusion that I am biased by my scientific knowledge is strange.
In medical experiments data are scientifically gathered and tabulated. Conclusions are used as a way to explain the data. Could an alternate conclusion be reached that scientific and reasonable people downplay the danger of climate change and nuclear power precisely because we are well informed and able to reason logically? It seems just as likely a conclusion as the one you reached yet it was never discussed.
Thanks for these thoughtful reflections. They deserve a reciprocally reflective and earnest response.
1st, I don't think the methods we use are useful for explaining individuals. In the study you described, they identify in large samples patterns that furnish more support than one would otherwise have for the inference that some group-related influence or dynamic is at work that helps to explain variance in general.
One can then do additional studies, experimental in nature (like this & this), that try to help to furnish even more support for the inference -- or less, since that is what a valid study has to be in the position to do to be valid.
But once one has done that, all one has is an explanation for some portion of the variance in groups of people. One doesn't have an explanation of all the variance (the practical & not merely "statistical" significance of which is what a reflective person must assess). One doesn't have an instrument that "diagnoses" or tells one why any particular individual believes what he or she does.
And most important of all you don't have a basis for saying anyone on any of the issues one is studying is "right" or "wrong": to figure that out, do a valid study on the issue on which people like this disagree; then do another & another & another & another. And compare your results w/ others doing the same thing.
2d, I don't believe the dynamic we are looking at is a "bias" per se. Things are more complicated than that, at least for me!
I'm inclined to think that the dynamics that we observe generating polarization in our studies are the very ones that normally enable people to figure out what is known by science.
They are also the very same processes that enable people to effectively use information for another of their aims, which is to form stances and positions on issues that evince commitments that they care about and that connect them to others. That is a matter that is cognitively demanding as well -- & of course one that most people, even ones who don't get a perfect score on "science comprehension" tests, possess the reasoning proficiency to perform.
What to make of the situations, then, in which that same form of reasoning generates states of polarization on facts that admit of empirical inquiry is a challenging issue -- conceptually, morally & psychologically. This is very perplexing to me!
I suspect sometimes it reflects the experience of a kind of interference between or confounding of mental operations that serve one purpose and those that serve another. That in effect, the "science communication environment" has become degraded by conflicts between the stake people have in knowing what's known & being who they are.
At other times, it might simply be that nothing is amiss from the point of view of the people who are polarized; they are simply treating being who they are as the thing that matters most for them in processing information on the issue in question. . . .
3d, notwithstanding all this, I don't think our studies admit of your "alternate conclusion": that "scientific and reasonable people downplay the danger of climate change and nuclear power precisely because we are well informed and able to reason logically."
The reason is that that's not what the data show. They show that those highest in one or another measure of science comprehension are the most polarized on a small subset of risk issues including climate change.
That doesn't tell us which side is "right" & which "wrong."
But it tells us that we can't rely on what would otherwise be a sensible heuristic -- that the answer individuals with those proficiencies are converging on is most likely the right answer. Because again, those very people aren't converging; on the contrary, they are the most polarized.
Many people write to me suggesting that an "alternative explanation" for our data is that "their side" is right.
About 50% of the time they are part of the group that is "climate skeptical" & the other half of the time the one that is "climate nonskeptical" (I have no idea what terms I'm supposed to be using for these groups at this point; if they hold a convention and vote on a preferred label, I will abide by their decisions!).
I tell them every time that can’t actually be what the data are showing—for all the reasons I’ve just spelled out.
Some fraction (only a small one, sadly) say "ah, yes, I see."
I can't draw any inferences, as I said, about the relationship between their "worldviews" & how they are thinking.
I have no information about their scores on "science comprehension" or "critical reasoning" tests.
But at that point I can draw an inference about their intellectual character: that they possess the virtue of being able and willing to recognize complexity.
This is from something I've been working on. For a long time. The paper of which it is a part will be posted soon.
But for now I am treating it as the final installment of a 3-part series on the relevance of dual-process reasoning theories to science communication. As I'm sure all 14 billion regular readers of this blog recall, the first installment appeared on July 19, 2013, and the second on July 24, 2013.
Even as of that period, I had been working on this project for a long time. . . .
II. Information Processing, Pattern Recognition, and Professional Judgment
Legal training and practice can reasonably be understood to cultivate proficiency in conscious, analytical forms of reasoning. Thus, the work on “motivated System 2 reasoning”—the tendency of conscious, effortful information processing to magnify identity-protective cognition—might in fact be regarded as supplying the strongest support for the conjecture that unconscious cultural partisanship can be expected to subvert judicial neutrality.
Nevertheless, when judges decide cases, they are not merely engaging in conscious, effortful information processing: they are exercising professional judgment. Professional judgment consists, essentially, in habits of mind—conscious and effortful to some degree, but just as much tacit and perceptive—that are distinctively fitted to reasoning tasks the nature of which falls outside ordinary experience. Indeed, it is characterized, in many fields, by resistance to all manner of error, including ones founded on heuristic information processing, that would defeat the special form of decision that professional judgment facilitates.
The dominant scholarly account of professional judgment roots it in the dynamic of pattern recognition. Pattern recognition consists in the rapid, un- or pre-conscious matching of phenomena with mentally inventoried prototypes. A ubiquitous form of information processing, pattern recognition is the type of cognition that enables human beings reliably to recognize faces and read one another’s emotions. But it is also the basis for many highly specialized forms of expert decisionmaking. Highly proficient chess players, for example, outperform less proficient ones not by anticipating and consciously simulating a longer sequence of potential moves but by more reliably perceiving the relative value of different board positions based on their prototypical affinity to ones learned from experience to confer an advantage to one player or another. Likewise, the proficiency of aerial photography analysts consists in their tacit ability to discern prototypical clusters of subtle cues that allow them to cull from large masses of scanned images ones that profitably merit more fine-grained analysis. Forensic accountants must use the same form of facility as they comb through mountains of records in search of financial irregularities or fraud.
Expert medical judgment supplies an especially compelling and instructive example of the role of pattern recognition. Without question, competent medical diagnosis depends on the capacity to draw valid inferences from myriad sources of evidence that reflect the correlation between particular symptoms and various pathologies, a form of critical reasoning that figures in System 2 information processing. But studies have shown that an appropriately attuned capacity for pattern recognition plays an indispensable role in expert medical diagnosis, for unless a physician is able to form an initial set of plausible conjectures (based on the match between a patient’s symptoms and an appropriately stocked inventory of disease prototypes), the probability that the physician will even know to collect the evidence that enables a proper diagnosis will be unacceptably low.
The proposition that pattern recognition plays this role in professional judgment generally is most famously associated with Howard Margolis. Focusing on expert assessment of risk, Margolis described a form of information processing that differs markedly from the standard “System 1/System 2” conception of dual process reasoning. The latter attributes proficient risk assessment to an individual’s capacity and disposition to “override” his or her unconscious System 1 affective reactions with ones that reflect effortful System 2 assessments of evidence. Margolis, in contrast, suggests an integrated and reciprocal relationship between unconscious, perceptive forms of cognition, on the one hand, and conscious, analytical ones, on the other. Much as in the case of proficient medical diagnosis, expert risk assessment demands reliable, preconscious apprehension of the phenomena that merit valid analytical processing. Even then, the effective use of data generated by such means, Margolis maintains, will depend on the risk expert’s reliable assimilation of such evidence to an inventory of prototypical representations of cases in which the appropriate data were given proper effect. Of course, the quality of an expert’s pattern recognition capacity will depend heavily on his or her proficiency in conscious, analytical reasoning: that form of information processing, employed to assess and re-assess successes and failures over the course of the expert’s training and experience, is what calibrates the experts’ perceptive faculty.
To translate Margolis’s account back into the dominant conception of dual-process reasoning, System 2 gets nowhere, because it is not reliably activated, without a discerning System 1 faculty of perception. The reliability of System 1, however, reflects the contribution System 2 makes to the process of continual self-evaluation that imparts to perceptive judgments their reliability.
Karl Llewellyn suggested an account of the reasoning style of lawyers and judges very much akin to Margolis’s view of the professional judgment of risk experts. Although Llewellyn is often identified as emphasizing the indeterminacy of formal legal rules and doctrines, the aim of his most important works was to explain how there could be such a tremendously high degree of consensus among lawyers and judges on what those rules and doctrines entail. His answer was “situation sense”: a perceptive faculty, formed through professional training and experience, that enabled lawyers and judges to reliably assimilate controversies to “situation types” that include within them their own proper resolutions. Llewellyn discounted the emphasis on deductive logic featured in legal argumentation. But he did not dismiss such reasoning as mere confabulation; in his view, lawyers and judges (legislators, too, in drafting rules) employed formal reasoning to prime or activate the “situation sense” of other lawyers and judges, the same function that Margolis sees it as playing in professional discourse among risk experts and indeed in any setting in which human beings resort to it.
Margolis also identified the role that pattern recognition plays in professional judgment to explain expert-public conflicts over risk. Lacking the experience and training of experts, and hence the stock of prototypes that reliably guide expert risk assessment, members of the public, he argued, were prone to one or another heuristic bias. By the same token, the experts’ access to those prototypes reliably fixes their attention on the pertinent features of risks and inures them to the features that excite cognitive biases on the part of the lay public.
Based on the role of pattern recognition in professional judgment, one might make an analogous claim about judicial and lay judgments in culturally contested legal disputes. On this account, lawyers’ and judges’ “situation sense” can be expected to reliably fix their attention on pertinent elements of case “situation types,” thereby immunizing them from the distorting influence that identity-protective cognition exerts on the judgments of legally untrained members of the public. It is thus possible that the professional judgment of the judge, as an expert neutral decisionmaker, embodies exactly the form of information processing most likely to counteract identity-protective reasoning, including the elements of it magnified by System 2 reasoning.
Tamar Wilner has posted another very perceptive and provocative essay in reaction to the readings for Science of Science Communication 2.0, this time in relation to Session 8, on “emerging technologies.” I’ve posted the first portion of it, plus a link to her site for continuation.
She also posed a very interesting question in the comments about an experiment that CCP did on nanotechnology risk perceptions. I’ve posted my answer to her question below the excerpt from her own post.
1. Tamar Wilner on studying perceived risks of emerging technologies ...
2. Q&A on a CCP study of nanotechnology risk perceptions
[I]n your paper (Cultural Cognition of the Risks and Benefits of Nanotechnology) you say, “The ‘cultural cognition’ hypothesis holds that these same patterns [cultural polarization] are likely to emerge as members of the public come to learn more about nanotechnology.” But in your blog you repeatedly make the point that only a minority of public science topics end up getting polarized - that such polarization is “pathological” in its rarity. Why then did you hypothesize that such a pattern would be likely to emerge for nanotech?
I noticed that you start to address this later in the paper when you say, “At the same time, nothing in our study suggests that cultural polarization over nanotechnology is inevitable…” and point out that proper framing can help people to extract factual information. Does this indicate that the passages used in your study employed framing likely to encourage polarization? They seem to use pretty neutral language, to me. What about them makes them polarizing - and is it possible that some polarizing language is unavoidable? For example it seems like just talking about "risks of a new technology" taps into certain egalitarian/communitarian sensibilities, but since that's exactly what the topic of discussion is, I don't see how you would avoid it.
This is a great question. It raises some important general issues & also gives me a chance to say some things about how my own views of the phenomenon of cultural contestation over risk have evolved since performing the study.
The main motivation for the study, actually, was a position that we characterized as the “familiarity hypothesis”: that as people learned more about nanotechnology, their views were likely to be positive.
This was an inference from a consistent survey finding that although only a small percentage of the public reports having heard of nanotechnology, those who say they have tend to express very favorable views about the ratio of benefits to risks that it is likely to involve.
That inference is specious: there is obviously something unusual about people who know about a technology 80% of the rest of the public is unfamiliar with; it reflects poor reasoning not to anticipate that whatever is causing them to become familiar with a novel technology might also dispose them to form a view that others who lack their interest in technology might fail to form when they eventually learn about a novel form of science.
Our hypothesis, largely corroborated by the study, was that those who were already familiar with nanotechnology (or actually, simply saying they were familiar; the surveys were using self-report measures) were likely people with a protechnology “individualist” cultural outlook, and that when individuals with anti-technology “egalitarian communitarian” ones were exposed to information on nanotechnology they would likely form more negative reactions.
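A toy simulation can make the selection problem with the “familiarity hypothesis” concrete. The sketch below is purely illustrative: every number in it (a uniformly distributed “pro-technology” disposition, a 5%-to-35% chance of having already heard of the technology, the Gaussian noise, the 0.5 threshold) is an invented assumption, not an estimate from our data, and the function name `simulate` is hypothetical. It shows only that if the same disposition drives both familiarity and favorability, the already-familiar minority will look more favorable than the public as a whole would be once everyone learned about the technology.

```python
import random


def simulate(n=100_000, seed=0):
    """Toy model of self-selection into familiarity (all parameters are assumptions)."""
    rng = random.Random(seed)
    familiar_n = familiar_fav = all_fav = 0
    for _ in range(n):
        # Latent pro-technology disposition, assumed uniform on [0, 1].
        disposition = rng.random()
        # Assumption: more pro-technology people are more likely to have
        # already heard of the technology (about 20% familiar overall).
        familiar = rng.random() < 0.05 + 0.30 * disposition
        # Assumption: the view a person forms once informed depends on the
        # same disposition plus noise; "favorable" means above 0.5.
        favorable = disposition + rng.gauss(0, 0.1) > 0.5
        all_fav += favorable
        if familiar:
            familiar_n += 1
            familiar_fav += favorable
    # Share familiar, share favorable among the familiar,
    # share favorable if everyone were informed.
    return familiar_n / n, familiar_fav / familiar_n, all_fav / n


share_familiar, fav_among_familiar, fav_if_all_informed = simulate()
print(f"familiar: {share_familiar:.2f}")
print(f"favorable among familiar: {fav_among_familiar:.2f}")
print(f"favorable if everyone were informed: {fav_if_all_informed:.2f}")
```

In this made-up world, extrapolating from the familiar group overstates how favorable everyone else would become, which is the inferential mistake at issue; it says nothing about how large the effect is in reality.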
But Tamar’s perceptive question is why did we expect people unfamiliar with a technology to react at all when exposed to such a small amount of info?
As she notes, only a small minority of potentially risky technologies excite polarization. People tend to overlook this fact b/c they understandably fixate on those and ignore the vast majority of noncontroversial ones.
My answer, basically, is that I don’t think the research team really had a good grasp of that point at the time we did the study. I know I didn’t!
I think, actually, that I really did mistakenly believe that culturally infused and hence opposing reactions to putative risk sources were “the norm,” and that it was therefore likely our subjects would polarize in the way they did.
Looking back, I’d say the reason it was reasonable to expect subjects would polarize is that the study was putting them in the position of consciously evaluating risks and benefits.
On the vast majority of putative risk sources on which there isn’t any meaningful level of polarization—from pasteurization of milk to medical x-rays to cell phone radiation to high-power transmission lines etc.—people don’t consciously think anything; they just model their behavior on what they see other people like them doing; when they do so, it’s rare for them to observe signs that give them reason to think there is anything to worry about.
Perfectly sensible approach, in my view, given how much more information known to science it makes sense to use in our lives than we have time to make sense of on our own.
But as I said, the study subjects were being prompted to do conscious risk assessment. Apparently, in doing that, they reliably extracted from the balanced risk-benefit information culturally affective resonances that enabled them to assimilate this novel putative risk source—nanotechnology—to a class of risks, environmental ones, on which members of their group are in fact culturally polarized.
Being made to expect, in effect, that there would be an issue here, the subjects reliably anticipated too what position “people like them” culturally speaking would likely take.
This interpretation raises a second point on which my thinking has evolved: the external validity of public opinion studies of novel technologies.
This was (as Tamar’s excellent blog post on the readings as a whole discusses) a major theme of the readings. Basically, when pollsters ask people their views on technological risks about which members of the public have never heard and don’t have discussions about in their daily lives, they aren’t genuinely measuring a real-world phenomenon.
They are, in effect, modeling how people react to the strange experience of being asked questions about something they have not thought about. To pretend that one can draw inferences from that to what actual people in the world are truly thinking is flat-out bogus. Serious social science researchers know this is a mistake; news-maker and advocacy pollsters either don’t or don’t care.
One can of course try to anticipate how people, including ones with different cultural outlooks, might react to an emerging technology when they do learn about it. Indeed, I think that is a very sensible thing to do; the failure to make the effort can result in disaster, as it did in the case of the HPV vaccine!
But to perform what amounts to a risk-perception forecasting study, one must use an experimental design that it is reasonable to think will induce in subjects the reaction that people in the real world will form when they learn about the technology—or could form depending on how they learn about it. That is what one is trying to model.
A simple survey question—like the one Pew asked respondents about GM foods in its recent public attitudes study—cannot plausibly be viewed as doing that. The real-world conditions in which people learn things about a new technology will be much richer—much more dense with cues relating to the occasions for discussing an issue, the setting in which the discussion is being had, and the identity and perceived motivations of the information sources—than are accounted for in a simple survey question.
I think it is possible to do forecasting studies that reasonable people can reasonably rely on. I think our HPV vaccine risk study, e.g., which tried to model how people would likely react depending on whether they learned about the vaccine in conditions that exposed them to cues of group-conflict or not was like that.
But I think it is super hard to do it.
Frankly, I now don’t think our nanotechnology experiment design was sufficiently rich with the sorts of contextual background to model the likely circumstances in which people would form nanotechnology risk perceptions!
The study helped to show that the “familiarity hypothesis,” as we styled it, was simplistic. It also supported the inference that it was possible people might assimilate nanotechnology to the sorts of technological-risk controversies that now polarize members of different groups.
But the stimulus was too thin to be viewed as modeling the conditions in which that was actually likely to happen.
We should be mindful of hindsight bias, of course, but the fact that nanotechnology has not provoked any sort of cultural divisions in what is now approaching two decades of its use in commercial manufacturing helps show the limited strength of inferences on the likelihood of conflict that can be drawn from experiments like the one we did.
As Tamar notes, we were careful in our study to point out that the experimental result didn’t imply that conflict over nanotechnology was “inevitable” or necessarily even “likely.”
But I myself am very willing—eager even—to acknowledge that we viewed the design we used as more informative than it could have been expected to be about the likely career of nanotechnology.
I have acknowledged this before in fact.
In doing so, too, I pointed out that that doesn’t mean studies like the ones we and other researchers did on nanotechnology risk perceptions weren’t or aren’t generally useful. It just means that the value people can get from those studies depends on researchers and readers forming a valid understanding of what designs of that sort are modeling and what they are not.
In order for that to happen, moreover, researchers must reflect on their own studies over time to see what the fit between them and experience tells them about what is involved in modeling real-world processes in a manner that is most supportive of real-world inferences.
Speaking for myself, at least, I acknowledge that, despite my best efforts, I cannot guarantee anyone I will always make the right assessment of the inferences that can be drawn from my studies. I can promise, though, that when I figure out that I didn’t, I’ll say so—not just to set the record straight but also to help enlarge understanding of the phenomena that it is in fact my goal to make sense of.
Of course, if a cultural conflagration over nanotechnology ignites in the future, I suppose I’ll have to acknowledge the “me” I was then had a better grasp of things than the “me” I am now; I doubt that will happen, but life, thank goodness, is filled with surprises!