
Thursday, Apr 28, 2016

Hey, everyone! Try your hand at graphic reporting and see if you can win the Gelman Cup!

Score!

Former Freud expert & current stats legend  Andrew Gelman posted a blog (one he likely wrote in the late 1990s; he stockpiles his dispatches, so probably by the time he sees mine he'll have completely forgotten this whole thing, & even if he does respond I’ll be close to 35 yrs. old  by then & will be interested in other things like drinking and playing darts) in which he said he liked one of my graphics!

Actually, he said mine was “not wonderful”—but that it kicked the ass of one that really sucked!

USA USA USA USA!

Alright, alright.

Celebration over.

Time to get back to the never-ending project of self-improvement that I’ve dedicated my life to.

The question is, How can I climb to that next rung—“enh,” the one right above “not wonderful”?

I’m going to show you a couple of graphics. They aren’t the same ones Gelman showed, but they use the same strategy to report more interesting data. Because the data are more interesting (not substantively, but from a graphic-reporting point of view), they’ll supply us with even more motivation to generate a graphic-reporting performance worthy of an “enh”—or possibly even a “meh,” if we can get really inspired here.

I say we because I want some help. I’ve actually posted the data & am inviting all of you—including former Freud expert & current stats legend Gelman (who also is a bully of WTF study producers, whose only recourse is to puff themselves up to look really big, like a scared cat would)—to show me what you’d do differently with the data.

Geez, we’ll make it into a contest, even!  The “Gelman Graphic Reporting Challenge Cup,” we’ll call it, which means the winner will get—a cup, which I will endeavor to get Gelman himself to sign, unless of course he wins, in which case I’ll sign it & award it to him!

Okay, then. The data, collected from a large nationally representative sample, show the relationship between religiosity, left-right political outlooks, and belief in human-caused climate change.

It turns out that religiosity and left-right outlooks actually interact. That is, the impact of one on the likelihood someone will report “believing in” human-caused climate change depends on the value of the other.

Wanna see?? Look!!

That’s a scatter plot with left_right, the continuous measure of political outlooks, on the x-axis, and “belief in human-caused climate change” on the y-axis.

Belief in climate change is actually a binary variable—0 for “disbelief” and 1 for “belief.”

But in order to avoid having the observations completely clumped up on one another, I’ve “jittered” them—that is, added a tiny bit of random noise to the 0’s and 1’s (and a bit too for the left_right scores) to space the observations out and make them more visible.

Plus I’ve color-coded them based on religiosity!  I’ve selected orange for people who score above the mean on the religiosity scale and light blue for those who score below the mean. That way you can see how religiosity matters at the same time that you can see that political outlook matters in determining whether someone believes in climate change.
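If you want to reproduce that plot from the posted data, something like this Stata sketch would do it (the jitter amounts and colors are my own guesses at reasonable settings; the posted .do file has the real thing):

    * jitter: add a little uniform noise so the 0/1 responses don't stack up
    set seed 1234
    generate agw_jit = AGW + 0.05*(runiform() - 0.5)
    generate lr_jit  = left_right + 0.10*(runiform() - 0.5)

    * orange = above-mean religiosity, light blue = below-mean religiosity
    twoway (scatter agw_jit lr_jit if relig_category == 1, mcolor(orange) msize(tiny)) ///
           (scatter agw_jit lr_jit if relig_category == 0, mcolor(eltblue) msize(tiny)), ///
           ytitle("belief in human-caused climate change (jittered)") ///
           xtitle("left_right (jittered)") ///
           legend(order(1 "above-mean religiosity" 2 "below-mean religiosity"))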

Or at least you can sort of see that. It’s still a bit blurry, right?

So I’ve added the locally weighted regression lines to add a little resolution.  Locally weighted regression is a nonmodel way to model the data. Rather than assuming the data fit some distributional form (linear, sigmoidal, whatever) and then determining the “best fitting” parameters consistent with that form, the locally weighted regression basically slices the x-axis predictor  into zillions of tiny bits, with individual regressions being fit over those tiny little intervals and then stitched together.

It’s the functional equivalent of getting a running tally of the proportion of observations at many many many contiguous points along left_right (and hence my selection of the label “proportion agreeing” on the y-axis, although “probability of agreeing” would be okay too; the lowess regression can be conceptualized as estimating that). 
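In Stata the group-wise lowess overlay is a one-liner per group, something like this (again, a sketch; the posted .do file has the actual version):

    * lowess fit of the binary outcome on political outlook, by religiosity
    * group; each curve is a running local estimate of the proportion agreeing
    twoway (lowess AGW left_right if relig_category == 0, lcolor(eltblue)) ///
           (lowess AGW left_right if relig_category == 1, lcolor(orange)), ///
           ytitle("proportion agreeing") xtitle("left_right") ///
           legend(order(1 "below-mean religiosity" 2 "above-mean religiosity"))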

What the lowess lines help us “see” is that in fact the impact of political outlooks is a bit more intense for subjects who are “low” in religiosity. The slope for their S-shaped curve is a bit steeper, so that those at the “top,” on the far left, are more likely to believe in human-caused climate change. Those at the “bottom,” on the right, seem comparably skeptical.

The difference in those S-shaped curves is what we can model with a logistic regression (one that assumes that the probability of “agreeing” will be S-shaped in relation to the x-axis predictor).  To account for the possible difference in the slopes of the curves, the model should include a cross-product interaction term that indicates how differences in religiosity affect the impact of differences in political outlooks on “believing” in human-caused climate change.
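In Stata’s factor-variable notation that model is a single line; a sketch (the posted .do file has the actual specification):

    * logistic regression with both main effects plus their cross-product;
    * c. marks continuous variables, ## adds the interaction term
    logit AGW c.left_right##c.religiosity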

Okay, it's important to report this. But if someone gives you *nothing* more than a regression output when reporting their data ... well, make them wish they had competed for & won a Gelman Cup. I’ve fit such a model, the parameters of which are in the table in the inset.

That  regression actually corroborates, as it were, what we “saw” in the raw data: the parameter estimates for both religiosity and political outlooks “matter” (they have values that are practically and statistically significant), and so does the parameter estimate for the cross-product interaction term.

But the output doesn’t in itself show us what the estimated relationships look like. Indeed, precisely because it doesn’t, we might get embarrassingly carried away if we started crowing about the “statistically significant” interaction term and strutting around as if we had really figured out something important. Actually, insisting that modelers show their raw data is the most important way to deter that sort of obnoxious behavior, but graphic reporting of modeling definitely helps too.

So let’s graph the regression output:

 

Here I’m using the model to predict how likely a person who is relatively “high” in religiosity—1 SD above the population mean—and a person who is relatively “low”—1 SD below the mean—are to agree that human-caused climate change is occurring.  To represent the model’s measurement precision, I’m using solid bars—0.95 confidence intervals at 25 evenly spaced points—along the x-axis.
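In Stata, predictions like these come out of margins and marginsplot after the logit is fit; a sketch, assuming the left_right and religiosity scales are standardized (substitute the actual mean ± 1 SD values otherwise):

    * predicted probability of AGW = 1 across left_right, for 'low' (-1 SD)
    * and 'high' (+1 SD) religiosity
    quietly logit AGW c.left_right##c.religiosity
    margins, at(left_right = (-2(0.2)2) religiosity = (-1 1))
    marginsplot, recast(line) recastci(rarea)  // omit options for default CI bars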

Well, that’s a model of the raw data.

What good is it? Well, for one thing it allows us to be confident that we weren’t just seeing things.  It looked like there was a little interaction between religiosity and political outlooks. Now that we see that the model basically agrees with us—the parameter that reflects the expectation of an interaction is actually getting some traction when the model is fit to the data—we can feel more confident that’s what the data really are saying (the right attitude, I think, both when one hypothesized the observed effect in advance and when one is doing exploratory analysis).  The model disciplines the inference, I’d say, that we drew from just looking at the data.

Also, with a model, we can refine, extend,  and appraise  the inferences we draw from the data. 

You might say to me, e.g., “hey, can you tell me how much more likely a nonreligious liberal Democrat is to accept human-caused climate change than a religious one?”

I’d say, well, about “12%, ± 6, based on my model.”  I’d add, “But realize that even the average religious liberal Democrat is awfully likely to believe in human-caused climate change—73%, ± 5%, according to the model.”

“So there is an interaction between religiosity & political outlooks, but it's nothing to get excited about--the way someone trained only to look at the 'significance' of regression model coefficients might--huh?” you’d say.

“Well, that’s my impression as well. But others might disagree with us. They can draw their own conclusions about how important all of this is, if they look at the data and use the model to make sense of it.”

Or whatever!
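For the record, numbers like those come from differencing the model’s predicted probabilities. A Stata sketch, with illustrative at() values standing in for “liberal Democrat” and ± 1 SD religiosity:

    * difference in predicted probability between low (-1 SD) and high (+1 SD)
    * religiosity at an illustrative left-leaning value of left_right
    quietly logit AGW c.left_right##c.religiosity
    margins, at(left_right = -1 religiosity = (-1 1)) post
    lincom 1._at - 2._at   // low-religiosity minus high-religiosity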

Now. 

What’s Gelman’s reservation? How come my graphic rates only “not wonderful” instead of “enh” or “meh”?

He says “I think all those little bars are misleading in that they make it look like it’s data that are being plotted, not merely a fitted model . . . .”

Hm. Well, I did say that the graphic was a fitted model, and that the bars were 0.95 CIs.

The 0.95 CIs *could* mislead people--if they were being generated by a model that didn't fairly convey what the actual data look like. But that's why one starts by looking at, and enabling others to see, what the raw data “look like.”

But hey--I don’t want to quibble; I just want to get better!

So does anyone have a better idea about how to report the data?

If so, speak up. Or really, much much better, show us what you think is better.

I’ve posted the data.  The relevant variables are “left_right,” the continuous political outlook scale; “religiosity,” the continuous religiosity scale; and “AGW,” belief in human-caused climate change = 1 and disbelief = 0. I’ve also included “relig_category,” which splits the subjects at the mean on religiosity (0 = below the mean, 1 = above; see note below if you were using the "relig" variable).  Oh, and here's my Stata .do file, in case you want to see how I generated the analyses reported here.

So ... either link to your graphics in the comments thread for this post or send them to me by email.  Either way, I’ll post them for all to see & discuss.

And remember, the winner—the person who graphically reports the data in a way that exceeds “not wonderful” by the greatest increment-- will get the Gelman Cup! 


Reader Comments (7)

What about a contour plot as a solution? (I'll forgo an official entry because I would specifically like to AVOID winning that prize.) Religiosity is getting reduced to an attribute variable, potentially losing something in the translation, no? Both your charts show high religion = lower acceptance of climate change, so on a contour plot, I'd expect to see contours sloping from low religion/liberal to high religion/conservative.

When I tried it in Excel, it didn't look like that, although perhaps the effect along the religion scale is far too subtle to exhibit itself in a contour plot? That said, I think the idea would be helpful inasmuch as, just as we see a slope on the political-beliefs axis, we ought to see one (say, rather than a step change) on the religiosity axis too.

April 29, 2016 | Unregistered CommenterAdam Schwartz

@Adam--

Wow.

Believe it or not, the regulations of the Macao Gaming Commission (which oversees all CCP contests) oblige me to reverse the general relationship between entries and prize eligibility in your case as a penalty for disparaging "the dignity and honor of CCP or any other Macao-Gaming related entity." Less technically, you will be obliged to accept the prize now unless you submit a winning entry!

But for sure, a contour plot would be *great*!

This is in fact a graphic issue -- the need for 3-dimensional graphing techniques where the interaction of continuous predictors is in play -- that figured in previous graphic competitions, ones that predate formation of the Gelman Graphic Reporting Challenge Cup series.

We didn't get to a resolution of the problem. But we didn't have the incentive of the "Cup" then.

Now you have it, even though it was necessary to "reverse code" it in your case.

Actually, maybe the dataset from the previous event would be a more interesting one for you to work with? That would be fine!

Or you could use the data from Kahan, D.M. ‘Ordinary science intelligence’: a science-comprehension measure for study of risk and science communication, with notes on evolution and climate change. J Risk Res (2016), 10.1080/13669877.2016.1148067. That article has *lots* of e.g.s of the "not wonderful" graphic w/ the color-coded 0.95 CI bars involving 2-way interactions between continuous variables, themselves being used to predict a continuous outcome variable -- scores on a science comprehension scale. They involve much more dramatic instances of the relationship depicted in the graphic in this post. Should have picked one of those!

April 29, 2016 | Registered CommenterDan Kahan

This is indeed a predicament I find myself in. If my graphic was a winner prior to my declaration I would've "lost." And now, if I don't win I still "lose"... very well. Here's my submission

As I said, you don't really see the effect of religion on AGW belief. And the extreme corners have very low observation counts, so I'm a bit suspect of any effects there. (I guess you don't run into a lot of liberals who are highly religious or highly conservative folks who are not.)

April 30, 2016 | Unregistered CommenterAdam Schwartz

@Adam Schwartz--

wow. I think you are taking us inside Feynman's mind...

You are now atop the leader board! (and free, momentarily, from jeopardy of being awarded the cup!)

To be clear: the contours convey information about relative density of observations, right? And the colors refer to probability of AGW = 1?

Did you do this w/ *excel*? Can you share .xls sheet if so? Or if w/ other program, the code?

You might well find that the data for Kahan, D.M. ‘Ordinary science intelligence’: a science-comprehension measure for study of risk and science communication, with notes on evolution and climate change. J Risk Res (2016), 10.1080/13669877.2016.1148067 -- make for much more visually interesting displays of the cool properties of this.

But if you send me the info on how you did your graphic, I might do it myself for those.

thanks!!

& how is one to understand the darkest green patch in the bottom left corner? what is its depth?

April 30, 2016 | Registered CommenterDan Kahan

It's just an Excel contour plot. The slight variation in the color bands is Excel's attempt to encode differences within the band, but I think it's goofy. It'd be great if Excel had some means (transparency?) to encode the observation counts as well. As it is, each vertex is simply the mean AGW for the intersection of the two independent variables. Btw, despite the thousands of data points, any given x, y value was quite sparsely populated, so I binned the data in .5 unit increments just to smooth out the weird spikes caused by single observations at a given x, y. To me, the most interesting feature of the data is the flat patch in the middle of the lighter orange band. Not sure what's up with that. Alas, Gelman's garden of forking paths rears its head and I might be inclined to report something interesting that isn't really there.

Wonder if hex binning the data might be better. Perhaps tomorrow. :)
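For anyone who wants to try the same recipe outside Excel, here's a rough Stata sketch of what I described above (not what I actually ran, just the equivalent steps):

    * bin both predictors in 0.5-unit increments, take the mean of AGW in
    * each cell, and draw a contour plot of those cell means
    generate lr_bin  = round(left_right, 0.5)
    generate rel_bin = round(religiosity, 0.5)
    collapse (mean) AGW, by(lr_bin rel_bin)
    twoway contour AGW rel_bin lr_bin, levels(10)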

April 30, 2016 | Unregistered CommenterAdam Schwartz

Hi Dan,
Thanks for posting the shaded CI band ("bounded envelope") version...
As I was saying on Twitter, while I personally find the spikes approach informative, the eyes of most people I've shown your graphs to start to glaze over (especially the less stats-orientated people)... But, when I show them the bounded envelope versions, they almost instantly grasp it & are fascinated!

Meanwhile the more stats-orientated people (which presumably includes many of your ~14 billion readers) seem to find both versions just as informative and fascinating. So, in my opinion, the bounded envelope approach doesn't detract from the interest of the more stats-orientated viewers (they're usually more interested in what the data is saying than in how it's plotted), but dramatically increases the interest of the less stats-orientated viewers (who have almost zero interest in how the data is plotted!).

I agree the slight asymmetry of the "point estimates" (i.e., the thick lines between the envelopes) at either extreme can be a bit confusing if you're thinking the lines and bands represent a simple mean +/- CI. But, I don't think it's as big a concern as you were suggesting on Twitter. As I see it, there are at least two ways to address this issue:

1) Just plot the CI bands... without the point estimates! After all, in effect, isn't this what you were doing with the original spike versions? I've found at this stage most people seem to kind of appreciate the concept of an uncertainty envelope. So, in my experience, people don't seem to get as hung up by the absence of a "middle" line in between the bands as you might initially think!

2) Add a simple note explaining that the CI bands are slightly asymmetric about the thick line "point estimates" because the results are derived from a logistic model of the data (or something like that!).

I know you've argued above and elsewhere that you've adopted the spike approach partly because you believe it's "more obvious" that your plots are model-derived (is that an accurate summary?). However, I'm not sure that it's been working. When I show people your spike plots, nobody says to me, "oh cool, I see the spikes aren't joined up - does that mean the results don't represent a straightforward statistical mean +/- CI, but rather are derived from a logistic model of the underlying data?" ;)

Instead, I typically get the reactions I described above, i.e., the more stats-aware people look at the results and are intrigued (but no more than if I showed them the bounded envelope version), while the less stats-aware people get a bored, glazed-over-eyes look, and ask me, "sorry, what am I supposed to be looking at?"

P.S. For what it's worth, I've enjoyed the other "Gelman Cup" entries by Adam Schwartz, Anoneuoid & JoeHilgard along with the accompanying PDDs for the banded envelope. I'm presumably biased in preferring the approach I've advocated, i.e., the simple banded envelope. ;) But, they all are interesting approaches to data visualisation, which have made me think, and are definitely useful for more fully "grokking" the data...

May 11, 2016 | Unregistered CommenterRonan Connolly

Dan,
It seems I was getting a bit reckless with my HTML formatting and by mistake apparently used a "/n" instead of a "/b" closing bracket at one point in my above comment, thereby messing up the formatting! :( Any chance you could fix that mistake?

May 11, 2016 | Unregistered CommenterRonan Connolly
