WSMD? JA! How confident should we be that what one "believes" about global warming, on the one hand, and one's political outlooks, on the other, measure the same *one* thing?
Wednesday, June 18, 2014 at 7:57PM
Dan Kahan

This is the 983rd--I think; it could also be the 613th--episode in the insanely popular CCP series, "Wanna see more data? Just ask!," the game in which commentators compete for world-wide recognition and fame by proposing amazingly clever hypotheses that can be tested by re-analyzing data collected in one or another CCP study. For "WSMD?, JA!" rules and conditions (including the mandatory release from defamation claims), click here.

@DaneGWendell, snickering at a bar graph (I pretty much agree: bar graphs are almost always a yucky way to report interesting data graphically!)

A couple of days ago I posted something on “what belief in global warming measures.” The answer, I said, was one’s group-based sense of self-identity.

To support this basic point, I stated that (1) the Industrial Strength Measure of global warming risk perceptions, (2) a standard “belief in” human-caused global warming item, (3) the standard 5-point “liberal-conservative” ideology measure, and (4) the standard 7-point partisan self-identification item display the psychometric properties of observable indicators for a single latent variable.

A “latent variable” is something that can’t be observed directly. “Indicators” are things one can observe that correlate with the latent variable, typically because they are caused by it (that’s not strictly necessary; one can model a latent variable as being caused by indicators, or both indicators and latent variables as being caused by some other exogenous variable, etc.).

We can thus use the indicators as a substitute for the latent variable in modeling how the latent variable relates to other quantities of interest. When the indicators are aggregated appropriately, their “noise”—the parts of them that vary independently of their causal connection to the latent variable—cancels out, making the resulting scale or index an even more discerning measure of the latent variable (DeVellis 2012).
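If you want to see this "noise cancellation" in action, here's a little simulation sketch (mine, not anything from the CCP data; the parameters are arbitrary). It generates four noisy indicators of a single latent variable and shows that their average tracks the latent variable better than any one of them does:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# the latent variable itself -- unobservable in real data
latent = rng.normal(size=n)

# four observable indicators: each is the latent variable plus independent noise
indicators = np.column_stack(
    [latent + rng.normal(scale=1.0, size=n) for _ in range(4)]
)

# each indicator on its own correlates only moderately with the latent variable...
for j in range(4):
    print(f"indicator {j}: r = {np.corrcoef(latent, indicators[:, j])[0, 1]:.2f}")

# ...but the aggregated scale tracks it more closely, because the
# independent noise terms tend to cancel when averaged
scale = indicators.mean(axis=1)
print(f"aggregated scale: r = {np.corrcoef(latent, scale)[0, 1]:.2f}")
```

With these (arbitrary) noise levels, each indicator correlates with the latent variable at about 0.7, while the four-item average correlates at about 0.9.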

But before one can do that, one has to be confident the putative indicators really do have the properties one would expect of variables that are measuring the same thing.

I noted that the scale formed by combining the global-warming risk ISM, the “belief” in climate change item, and the two right-left political outlook ones displays a high “Cronbach’s α,” an inter-item correlation statistic that is conventionally understood to measure how reliably the aggregated items (the indicators) can be taken to be measuring any latent variable.
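For those who like to peek under the hood, α is easy to compute directly: it compares the sum of the individual item variances to the variance of the summed scale. A minimal sketch (my code, not CCP's):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) array of item responses."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each separate item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)
```

Applied to the four simulated indicators above, cronbach_alpha(indicators) comes out at roughly 0.8.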

But a curious & reflective guy named @DaneGWendell correctly noted—on Twitter—that a high α doesn’t by itself guarantee that the aggregated items are measuring a single latent variable.

Particularly where one has a large number of items, a scale formed by summing item responses can display a reasonably high α even when the items are in fact measuring two or maybe even more correlated latent variables.
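Here's a quick illustration of that point, continuing the simulation sketch from above (again, invented numbers): six items that are indicators of two distinct but correlated latent variables still produce a respectable α when thrown into one scale.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# two latent variables that correlate at 0.6 -- related, but not the same thing
latents = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n)

# three noisy indicators of each latent variable, six items in all
items = np.column_stack(
    [latents[:, k] + rng.normal(scale=1.0, size=n) for k in (0, 0, 0, 1, 1, 1)]
)

# alpha (computed with the cronbach_alpha function sketched above) is still
# respectable even though the items are measuring two distinct factors
print(f"alpha = {cronbach_alpha(items):.2f}")   # comes out around 0.8
```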

Linear factor analysis is one of the conventional ways to assess the “dimensionality” of a scale. Conceptually, factor analysis estimates how much variance in the responses to the items can be accounted for by positing a single factor or latent variable, how much of the remaining variance can then be accounted for by positing a second, and so forth.

@DaneGWendell was interested in what a factor analysis of the global warming ISM, global warming belief, and political outlook measures would reveal.

Good question & worthy of a WSMD, JA!

To start, here’s the item “correlation matrix.” The coefficients express polychoric correlation, which is more appropriate than Pearson correlation where, as here, one wants to do a factor analysis of "mixed" data (the ISM is a multi-point rating scale, the political outlook measures are multi-point Likert items, and the “belief in” measure is a dichotomous item).

[Figure: polychoric correlation matrix for the four items]
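In case you'd like to compute polychoric correlations yourself, here's a minimal sketch of the standard two-step estimator (my illustration of the technique, not the code behind these figures; for real work you'd want a vetted implementation, such as the one in R's psych package):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import multivariate_normal, norm

def polychoric(x, y):
    """Two-step polychoric correlation for two integer-coded ordinal variables."""
    x, y = np.asarray(x), np.asarray(y)

    # step 1: estimate thresholds from each variable's marginal category proportions
    def thresholds(v):
        cats = np.sort(np.unique(v))
        cum = np.cumsum([np.mean(v == c) for c in cats])[:-1]
        return np.concatenate(([-np.inf], norm.ppf(cum), [np.inf])), cats

    tx, cats_x = thresholds(x)
    ty, cats_y = thresholds(y)
    counts = np.array([[np.sum((x == a) & (y == b)) for b in cats_y]
                       for a in cats_x])

    # step 2: pick the rho that maximizes the likelihood of the observed table;
    # each cell probability is a bivariate-normal rectangle probability
    def neg_loglik(rho):
        bvn = multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]])

        def cdf(a, b):
            if a == -np.inf or b == -np.inf:
                return 0.0
            return bvn.cdf([min(a, 8.0), min(b, 8.0)])   # cap +inf for the cdf call

        ll = 0.0
        for i in range(len(cats_x)):
            for j in range(len(cats_y)):
                p = (cdf(tx[i + 1], ty[j + 1]) - cdf(tx[i], ty[j + 1])
                     - cdf(tx[i + 1], ty[j]) + cdf(tx[i], ty[j]))
                ll += counts[i, j] * np.log(max(p, 1e-12))
        return -ll

    return minimize_scalar(neg_loglik, bounds=(-0.99, 0.99), method="bounded").x
```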

Here is the factor analysis of that correlation matrix: 

[Figure: factor analysis of the four-item correlation matrix]

There are a variety of conventional “rules of thumb” used to assess factor structure, all of which suggest that the four items here are appropriately treated as forming a “unidimensional” (i.e., one latent variable) scale.

E.g., the ratio of the “eigenvalues” of the first factor (which explains 90% of the variance in the items) and of the second (which explains almost all the rest) is “greater than 3.”

In addition, the eigenvalue for the second factor is “less than 1.”

Or if we look at a “scree plot,” which plots the eigenvalue of successive factors, there is an “elbow” at 2.
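In code form, all of these rules of thumb amount to nothing more than inspecting the eigenvalues of the item correlation matrix. A sketch (mine; it uses the quick principal-components version of the check, with R standing in for the polychoric matrix above):

```python
import numpy as np

def rule_of_thumb_check(R: np.ndarray) -> None:
    """Apply the conventional eigenvalue rules of thumb to a correlation matrix R."""
    ev = np.sort(np.linalg.eigvalsh(R))[::-1]        # eigenvalues, largest first
    print("eigenvalues:         ", np.round(ev, 2))
    print("variance proportions:", np.round(ev / ev.sum(), 2))
    print("1st/2nd ratio (>3 suggests one factor): ", round(ev[0] / ev[1], 2))
    print("2nd eigenvalue (<1 suggests one factor):", round(ev[1], 2))
    # a scree plot is just ev plotted against factor number 1, 2, 3, ...;
    # the "elbow" is wherever the curve abruptly flattens
```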

Maybe you can tell, but I find this way of proceeding, which is exactly what you'll see in most articles or textbooks, pretty mechanical and unmotivated. 

Call me silly, but I think it makes more sense to use judgment in assessing the covariance structure to determine whether the items can plausibly be understood to be measuring only one latent variable.

Indeed, it's been shown by people who are actually thinking about what they are doing and why that treating a two-dimensional scale as one-dimensional often has no adverse effect on the accuracy of that scale as a measure of a single latent variable if the two factors are very closely correlated (e.g., Bolt 1999).

Also, the various statistical techniques and rules of thumb (pragmatic fit indexes, etc.) that researchers typically use to investigate scale "dimensionality" have been described as essentially "completely worthless" (Embretson & Reise 2000, p. 228).

But in fact, that's an unfair appraisal. They are useful--but not if used mechanically, as if (to quote Chris Hedges) "the answer to the question" whether a group of items can be treated as observable indicators of a single latent variable were the same as asking, “I mean, what exact buttons do I have to hit?”

"There is utility" (to paraphrase Chris Hedges), in these techniques "in that they may provide supporting evidence that a data set is reasonably dominated by a single common factor" (Embretson & Reise 2000, p. 228).

Or in other words, factor analysis, Cronbach's α, and various related statistical measures are tools one can use to equip judgment to do a more reliable job in helping to form valid inferences.

But treated as substitutes for judgment, they are "completely worthless" (Hedges, of course, 1999, 2000, 2006, 2012, 2014, 2014).

So applying some judgment, what am I trying to say here, and how confident should I be about that given this particular set of observations?

Basically, I’m saying that the 4 items are all measuring the “same thing”—a latent disposition to form coherent stances on matters political. The responses to the “climate change” items are expressions of that disposition—are caused by it—in the same way as responses to the liberal-conservative ideology and party self-identification measures.

The factor analysis is consistent with that. 

But wouldn’t it be more satisfying if I showed this interpretation was more convincing than some alternative plausible hypothesis?

One might think—very reasonably!—that perceptions of environmental risk reflect a latent disposition that is correlated with, but in fact distinct from, the sense of identity that one might think political outlooks measure.

A good alternative hypothesis, then, would be that “climate change” risk perceptions and related factual beliefs are better understood as indicators of some “environmental concern” disposition that is connected to but actually not the "same thing" as the "self-identity" disposition indicated by liberal-conservative ideology and party self-identification.

That alternative hypothesis would have been supported, for sure, if variance in these items had turned out to be more convincingly explained by two discrete factors, one comprising the political outlook items and the other the climate-change items.

But an even more convincing test would be to add some additional “environmental risk concern” items to the “mix,” and then see what happens.

Here is a correlation matrix that adds to the four items in question ISMs for “artificial food colorings,” “use of artificial sweeteners in diet soft drinks,” and “genetically modified food.”

The signs of the correlations are consistent with what one might expect if one believed both that environmental risk perceptions will cohere with each other and that political outlooks will correlate with environmental risk perceptions.

But the correlations between the artificial food coloring, artificial sweetener, and GM food ISMs, on the one hand, and the climate-change items, on the other, are much smaller than the correlations between the climate-change items and the political outlook ones!

That makes me think it's less likely that the global warming items are measuring the "same thing" as those other risk items than that the global warming items are indeed measuring the "same thing" as the political outlook items.

Now consider the factor analysis of these 7 items:

The relative proportions of variance explained by the first two factors—0.6 and 0.3—are much closer than was the case for the two factors in the first analysis (0.9 and 0.1).

By the same token, the rule-of-thumb criteria all support treating the items as measuring two discrete factors: the ratio of the first two eigenvalues is only about 2, the second factor’s eigenvalue is greater than 1, and the scree plot has its “elbow” at 3 rather than 2.

More importantly, in my judgmental opinion, if we look at the “factor loadings”—essentially the correlations between each factor and the indicated items—we can see that the covariance structure looks as you might expect if there were 2 latent variables being measured here rather than 1.

The first consists of the global warming ISM, the “belief in” climate change item, the liberal-conservative ideology item, and the partisan self-identification item.

That's a discrete factor corresponding to the hypothesized latent disposition for which those four variables are all indicators.

The second factor loads much less heavily on those four items and much more heavily on the food coloring, artificial sweetener, and GM food risk ISMs.

We might, then, want to treat the latter three variables as a scale that measures a concern with environmental risks, or maybe with “food risks” in particular.
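For anyone who wants to try this kind of analysis on their own data, here's a sketch using the Python factor_analyzer package. The DataFrame df, its source file, and the column names are all invented for illustration (they aren't the CCP variable names); note too that factor_analyzer works from Pearson correlations by default, so a faithful replication of the analysis here would feed it polychoric ones.

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer   # pip install factor_analyzer

# hypothetical data file with one column per item; names are mine, not CCP's
df = pd.read_csv("items.csv")
items = ["gw_ism", "gw_belief", "libcon", "partyid",
         "food_color_ism", "sweetener_ism", "gm_food_ism"]

# oblique rotation, since the two hypothesized factors may be correlated
fa = FactorAnalyzer(n_factors=2, rotation="oblimin")
fa.fit(df[items])

loadings = pd.DataFrame(fa.loadings_, index=items,
                        columns=["identity", "food_risk"])
print(loadings.round(2))
```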

The Cronbach’s α for a scale that aggregates those three items would be 0.76.  Usually 0.70 is considered “good.”

The Cronbach’s α for a scale formed by aggregating the climate-change and political outlook items that form the first factor would be 0.85. 

I'm happy about that, though, less b/c I cleared some arbitrary statistical threshold than b/c it just is the case that w/ a "low" Cronbach’s α, one won't be able to connect variance in the scale to variance in other quantities of interest.

The correlation between the two scales is a very modest 0.15 (p < 0.01). In other words, the identity disposition explains some of the variance in this “food risk” disposition, but not much (that's kind of interesting, don't you think? but the 14 billion readers of this blog are among the select few who already know that it's not true that GM foods divide the US general public along political lines).
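Checking those scale-level numbers is equally simple: standardize the items, average them within each factor, and correlate the two resulting scales. A sketch, continuing with the hypothetical df and column names from above:

```python
from scipy.stats import pearsonr

# standardize the items first so each gets equal weight in its scale
z = (df[items] - df[items].mean()) / df[items].std(ddof=1)

identity = z[["gw_ism", "gw_belief", "libcon", "partyid"]].mean(axis=1)
food_risk = z[["food_color_ism", "sweetener_ism", "gm_food_ism"]].mean(axis=1)

r, p = pearsonr(identity, food_risk)
print(f"scale correlation: r = {r:.2f}, p = {p:.3g}")
```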

Well there you go!

I’m even more confident than I would have been had I not done these analyses, or had I just done a recipe-book factor analysis of the four items I hypothesized form a single latent “identity” variable and stopped there.

But that’s all I am: more confident than I’d be otherwise.

Also, not as confident as I could be if I were to do even more things that admit of meaningful assessment than the still too recipe-bookish application of factor analysis I just performed.

And for sure not so confident that I wouldn't change my mind if I were shown meaningful evidence that seemed to support a different conclusion, the factor analyses notwithstanding.

The idea that one can perform some set of tests in a mechanical, judgment-free fashion and get “the answer” on questions about how elements of cognition work is commonplace, but wrong.

References

Bolt, D.M. Evaluating the Effects of Multidimensionality on IRT True-Score Equating. Applied Measurement in Education 12, 383-407 (1999).

DeVellis, R.F. Scale Development: Theory and Applications (SAGE, Thousand Oaks, Calif., 2012).

Embretson, S.E. & Reise, S.P. Item Response Theory for Psychologists (L. Erlbaum Associates, Mahwah, N.J., 2000).

 
