Saturday, January 21, 2012

R^2 ("r squared") envy

Am at a conference & a (perfectly nice & really smart) guy in the audience warns everyone not to take social psychology data on risk perception too seriously: "some of the studies have R^2's of only 0.15...."

Oy.... Where to start? Well, how about with this: the R^2 for Viagra effectiveness versus placebo ... 0.14!

R^2 is the "percentage of the variance explained" by a statistical model. I'm sure this guy at the conference knew what he was talking about, but arguments about whether a study's R^2 is "big enough" are an annoying, and annoyingly common, distraction.
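To make the definition concrete, here's a minimal sketch in Python (with made-up numbers, not data from any study mentioned here) of what the statistic is: fit a model, then see what share of the outcome's variance the residuals leave behind.

```python
# A minimal sketch of what R^2 is computed from (made-up numbers, not data
# from any study mentioned in this post).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)              # a predictor
y = 0.4 * x + rng.normal(size=500)    # an outcome that is mostly noise

# Fit a least-squares line y ~ a + b*x
b, a = np.polyfit(x, y, 1)
resid = y - (a + b * x)

ss_res = np.sum(resid ** 2)           # variance the model leaves unexplained
ss_tot = np.sum((y - y.mean()) ** 2)  # total variance in the outcome
r2 = 1 - ss_res / ss_tot
print(round(r2, 2))                   # somewhere around 0.14, depending on the random draw
```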

Remarkably, the mistakes -- the conceptual misunderstandings, really -- associated with R^2 fixation were articulated very clearly and authoritatively decades ago, by scholars who were then, or who have since become, giants in the field of empirical methods.

I'll summarize the nub of the mistake associated with R^2 fixation, but it is worth noting that its durability suggests more than a lack of information is at work; there's some sort of congeniality between R^2 fixation and a way of seeing the world, or of doing research, or of defending turf, or of dealing with anxiety/inferiority complexes, or something... It would be interesting for someone to figure out what's going on.

But anyway, two points:

1.  R^2 is an effect size measure, not a grade on an exam with a top score of 100%. We see a world that is filled with seeming randomness. Any time you make it less random -- make part of it explainable to some appreciable extent by identifying some systematic process inside it -- good! R^2 is one way of characterizing how big a chunk of randomness you have vanquished (or have vanquished if your model is otherwise valid, something the size of R^2 has nothing to do with). But the difference between it & 1.0 is neither here nor there -- or in any case, it has nothing to do with whether you in fact know something or with how important what you know is.

2. The "how important what you know is" question is related to R^2, but the relationship is not revealed by subtracting R^2 from 1.0. Indeed, there is no abstract formula for figuring out "how big" R^2 has to be before the effect it measures is important. Has extracting that much order from randomness done anything to help you with the goal that motivated you to collect data in the first place? The answer to that question is always contextual. But in many contexts, "a little is a lot," as Abelson says. Hey: if you can explain 14% of the variance in men's sexual performance/enjoyment by whether they got Viagra, that is a very practical effect! Got a headache? Take some ibuprofen (R^2 = 0.02).
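To put that 0.14 in more familiar effect-size terms, here's a quick back-of-the-envelope conversion (a sketch that assumes two equal-sized groups; the only input is the 0.14 figure quoted above).

```python
# Converting R^2 = 0.14 into a correlation (r) and a standardized mean
# difference (Cohen's d), using the standard equal-groups formulas.
import math

r2 = 0.14
r = math.sqrt(r2)                  # ~0.37
d = 2 * r / math.sqrt(1 - r ** 2)  # ~0.81 -- "large" by Cohen's conventional benchmarks

print(round(r, 2), round(d, 2))    # 0.37 0.81
```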

What about in a social psychology study? Well, in our experimental examination of how cultural cognition shaped perceptions of the behavior of political protestors, the R^2 for the statistical analysis was 0.19. To see the practical importance of an effect size that big in this context, one can compare the percentage of subjects identified by one or another set of cultural values who saw "shoving," "blocking," etc., across the experimental conditions.

If, say, 75% of egalitarian individualists in the abortion-clinic condition but only 33% of them in the military-recruitment-center condition thought the protestors were physically intimidating pedestrians; and if only 25% of hierarchical communitarians in the abortion-clinic condition but 60% of them in the recruitment-center condition saw a protestor "screaming in the face" of a pedestrian -- is my 0.19 R^2 big enough to matter? I think so; how about you?
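For the curious, here's a back-of-the-envelope check of how percentage differences of that size translate into variance explained (using the hypothetical percentages above, not the actual study data, and assuming four equal-sized cells with a binary saw-it/didn't outcome).

```python
# Variance explained by cell-to-cell differences in a binary outcome,
# using the hypothetical percentages from the paragraph above.
import numpy as np

p = np.array([0.75, 0.33, 0.25, 0.60])  # "saw it" rates by worldview x condition
grand = p.mean()

between = np.mean((p - grand) ** 2)     # variance accounted for by cell membership
total = grand * (1 - grand)             # total variance of a binary outcome
print(round(between / total, 2))        # ~0.16 -- the same ballpark as 0.19
```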

There are cases, too, where a "lot" is pretty useless -- indeed, models that have notably high R^2's are often filled with predictors the effects of which are completely untheorized and that add nothing to our knowledge of how the world works or of how to make it work better.

Bottom line: It's not how big your R^2 is; it's what you (and others) can do with it that counts!

Reference: Meyer, G.J., et al. Psychological testing and psychological assessment: A review of evidence and issues. Am. Psychol. 56, 128-165 (2001).

 


Reader Comments (6)

Dan is, of course, right. I'll add a few other thoughts. First, the obsession with R^2 is understandable in that it is an attractive heuristic and conceptually relatively simple. It is a useful and straightforward summary of a relationship. But that's all it is. Just as a measure of central tendency (such as the mean) tells you nothing about the shape of the distribution, R^2 tells you only some of the story. Moreover, as Dan points out, you can't evaluate the importance of a given R^2 in a vacuum. Just as it is always important to distinguish between statistical significance and practical significance, an R^2 value must be viewed in context. In some applications, an R^2 of .06 may have no practical significance; in others, it may matter a great deal. Finally, it is worth pointing out that, like model-fit statistics, perhaps the most useful approach to the R^2 statistic is in comparing models or theories. If I have two competing models of a process and one gives me an R^2 that is significantly higher than the other, that is useful information; the absolute sizes of the two R^2 values may be completely unimportant.
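A toy illustration of that last point (made-up numbers, not from any study discussed here): two candidate predictors of the same outcome, judged by which accounts for more of the variance rather than by the absolute size of either R^2.

```python
# Comparing two single-predictor models by R^2; the absolute values matter
# less than which theory's predictor explains more.
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=300)                        # predictor favored by theory A
x2 = rng.normal(size=300)                        # predictor favored by theory B
y = 0.5 * x1 + 0.1 * x2 + rng.normal(size=300)   # outcome (mostly noise)

def r_squared(x, y):
    b, a = np.polyfit(x, y, 1)
    resid = y - (a + b * x)
    return 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

print(round(r_squared(x1, y), 2), round(r_squared(x2, y), 2))  # e.g. ~0.20 vs ~0.01
```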

January 25, 2012 | Unregistered Commenter Robert Rocklin

It depends whether you're arguing that your factor *contributes* to the effect, or whether it *explains* the effect. It's perfectly reasonable to say that if an outcome depends 19% on a selected factor and 81% on something else, you've shown the 19% contributor really *does* contribute. But you can't ignore the 81% and say your factor is *the* primary explanation of the outcome, which a lot of people try to do - especially when they have a point to make.

You can usually tell that by looking at the title of the paper.

To take another example, let's say you reconstruct a difficult-to-measure quantity by a method that is open to question regarding whether it really does measure the quantity it purports to measure. So you take some out-of-sample observations of the quantity and compare them to your reconstruction of them and find the r-squared is 0.018. Does your reconstruction work, providing you with the quantity you want? No. Is the quantity 'important' in determining the reconstruction's value? Well, it depends on sample sizes and stuff, but quite possibly. You look at another bit, and get an r-squared value of 0.00003. Still think it might be working? Never mind, publish it anyway and keep quiet about the numbers.

The point is that even if your factor shifts the probability from 0.33 to .75, most of the question is about what distinguishes the 33% from the 66% in one category, the 75% from the 25% in the other. You have only answered a small fraction of the question, although you may have answered that fraction rightly. If you are clear that that's what you've done, there's no problem. Your guy in the audience, I suspect, will be thinking mostly of the claims made by all those others.

October 8, 2012 | Unregistered Commenter NiV

@NiV: You say:

The point is that even if your factor shifts the probability from 0.33 to .75, most of the question is about what distinguishes the 33% from the 66% in one category, the 75% from the 25% in the other. You have only answered a small fraction of the question, although you may have answered that fraction rightly....

Actually, I'd say "the question" -- not even most but *all* of it -- was one that existed independently of anything having to do with the data, much less what the R^2 was. It was whether the phenomenon of cultural cognition can generate meaningful differences in perceptions of fact when the basis for those differences is brute sense impression (what people see in a video, e.g.). The answer (subject to questions anyone might raise about the validity of the design; that's independent of R^2, too) is "yup, definitely!"

Or to address your point in your first paragraph, cultural cognition didn't merely "contribute to" but did indeed "explain" the different relationship between subjects' cultural outlooks and their perceptions of fact in the two experimental conditions. The experimental manipulation is the *only* thing that was different in the situation of the subjects in the two conditions; it caused the difference in perception. Being able to assign the cause unambiguously and uniquely in this way is the whole point of doing an experiment.

The R^2 being 0.19 means that there are other things besides cultural outlook that are explaining the variance observed in the responses of all the subjects -- including variance among individuals with comparable cultural outlooks within each condition. It might be interesting to try to figure out what that is, certainly. But if one does figure that out, it won't detract *at all* from the conclusion that the difference in the relationship between cultural outlooks & perceptions between the two conditions -- the thing that explains 19% of the variance overall -- is explained by/attributable to/a result of the experimental manipulation.

And as explained in the post, whether that amount of variance is "important" -- that has to be judged independently of R^2. It's a practical matter. If the sorts of shifts in proportion of subjects, defined by cultural outlook, who saw things one way or the other between conditions seem important to you -- as, say, a lawyer trying to pick a jury, or a citizen assessing the power of adjudication to quiet conflict over culturally charged issues even when the evidence is undisputed -- then the result is "important," end of story. Nothing in the R^2 detracts from that.

I think, consistent with what you say, that it would be a mistake for me or anyone else to say that the experiment explained "all" the variance in the responses observed to the outcome variables in the experiment. It explained only 19% of it. But the question "what explained all the variance observed in the responses" *wasn't* the question. The question was -- can cultural cognition affect what people see in a videotape of a political protest?

Indeed, this is the nub of the mistake that R^2 fetishism makes: it assumes that the only thing one can be trying to do with a regression analysis is account for *all* or *as close to 100% as one can* of the observed variance. That's virtually never either an interesting thing to do or a necessary condition of doing anything interesting.

Disagree?

October 9, 2012 | Unregistered Commenter dmk38

"It was whether the phenomenon of cultural cognition can generate meaningful differences in perceptions of fact when the basis for those differences is brute sense impression"

I'd be inclined to say 'correlation doesn't imply causation' to that, but that's a separate point from the one I was making.

"But if one does figure that out, it won't detract *at all* from the conclusion that the difference [...] is expalined by/attributable to/a result of the experimental manipulation."

I don't disagree. The conclusion is certainly plausible, and the design of the experiment certainly isolates that component and measures it. If you want to know whether culture affects perception, the experiment certainly answers that. But if you want to know *what* affects perception, the answer is at best incomplete.

Suppose hypothetically that people's perceptions depended 20% on their political worldview and 80% on their height. Tall people find protestors shouting less threatening. If asked to explain what caused people to perceive what they did, it would be more accurate to talk about their height than their politics. It's a better explanation. But if you're only interested in politics, then you can arrange the experiment to eliminate the major factor and find the political connection you're looking for, pulling the signal from the 'noise'.

I'm not criticising your particular experiment. It's a worthy thing to do, furthers our understanding, and I don't for one moment doubt your good intentions. I was just using the numbers as an example for discussion. A lot of the public/media presentation of social/epidemiological science findings does tend to trumpet some isolated factor as 'causing' some undesirable outcome, not mentioning that it is only a minor component. I have to say I'm an admirer of yours because you have resisted when others have tried to use your research that way. Nevertheless, I'm pretty sure that's what your conference guy was on about.

I don't disagree with you that r-squared isn't everything. I doubt your conference guy would, either. But it depends on the question, and I think he was thinking about a different sort of question. (Two people can even look at the same experimental result and, because they have different questions in mind, draw completely opposite conclusions as to its adequacy.) Just as the rest of us shouldn't think r-squared is everything, you shouldn't assume that questioning a low r-squared necessarily, or even usually, amounts to nothing. Both perspectives have value - my intention wasn't to contradict yours, but to add the other viewpoint.

October 9, 2012 | Unregistered Commenter NiV

@NiV: I'm curious what you do have in mind by "correlation doesn't imply causation." As you know, this wasn't a correlational (or "observational") study; it was an experiment, in which we hypothesized and found that the manipulation would generate (i.e., cause) practically & statistically significant differences in the perceptions of subjects in the two experimental treatment groups, conditional on the subjects' cultural worldviews.

October 10, 2012 | Unregistered Commenter dmk38

What I had in mind was that there are potentially several other mechanisms that could explain the observations besides the proposed one. There are many other factors correlated with the conditional variable, and the manipulated variable could affect or conflict with many other correlated cultural and non-cultural factors. It's also not clear from the description here whether it was the perception that was affected or only the reporting of it. It's possible, for example, that everyone *perceived* the video the same way but partisans reported their perceptions selectively, to maintain consistency with their previously expressed views. They may have sought to manipulate what they perceived to be the experimental outcome for ulterior motives - e.g. if they thought saying protestors against x were violent, it would get reported that way in the paper. It could be that people don't like abortion clinics or military recruitment for the same reason they have a particular political outlook (they have a common cause). They might have moved to the particular political stereotype *because* of their views on abortion or the military (cause and effect reversed). It could be there is a local area where people are particularly opposed to one or other of the protested institutions (e.g. because of recent local events) and coincidentally predominantly of a particular political outlook (e.g. because it's a poor neighbourhood). Another neighbourhood with a different history might see the effect reversed. And so on.

There are an awful lot of psychology experiments along the lines of "if people think a certain way they'll do this in response to that, experiment shows they do this in response to that, therefore they think that certain way". It's *extremely* difficult to design an experiment to eliminate all other possibilities. People are complicated, and they can do things for very unobvious reasons. Even in the physical sciences it is very difficult - people are a nightmare in comparison.

In this case I find the result perfectly plausible and think the explanation offered is very likely, but as a scientist I routinely try to find reasons to doubt. The harder that is to do, the better the experiment. This one's better than many, but it's still not that hard to come up with alternative explanations.

October 11, 2012 | Unregistered Commenter NiV
