Friday, October 2, 2015

"I was wrong?! Coooooooooool!"

Okay—now here’s a model for everyone who aspires to cultivate the virtues that signify a genuine scholarly disposition.

As discussed previously (here & here), a pair of economists have generated quite a bit of agitation and excitement by exposing an apparent flaw in the methods of the classic “hot hand fallacy” studies.

These studies purported to show that, contrary to popular understanding not only among sports fans but among professional athletes and coaches, professional basketball players do not experience "hot streaks," or periods of above-average performance longer in duration than one would expect to see by chance. The papers in question have for thirty years enjoyed canonical status in the field of decision science research as illustrations of the inferential perils associated with the human propensity to look for and see patterns in independent events.

Actually, the reality of that form of cognitive misadventure isn’t genuinely in dispute.  People are way too quick to discern signal in noise.

But what is open to doubt now is whether the researchers used the right analytical strategy to test whether this mental foible is the source of the widespread impression that professional basketball players experience "hot hands."

I won’t rehearse the details—in part to avoid the amusingly embarrassing spectacle of trying to make intuitively graspable a proof that stubbornly assaults the intuitions of highly numerate persons in particular—but the nub of the  proof supplied by the challenging researchers, Joshua Miller & Adam Sanjurjo, is that the earlier researchers mistakenly treated “hit” and “missed” shots as recorded in a previous, finite sequence of shots as if they were independent. In fact, because the proportion of “hits” and “misses” in a past sequence is fixed, strings of “hits” should reduce the likelihood of subsequent “hits” in the remainder of the sequence. Not taking this feature of sampling without replacement into account caused the original “hot hand fallacy” researchers to miscalculate the “null" in a manner that overstated the chance probability that a player would hit another shot after a specified string of hits....
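For readers who would rather see the effect than take it on faith, here is a minimal simulation sketch (in Python, purely for illustration; the streak length, sequence length, and shooting percentage are arbitrary choices, not parameters from the original studies) of how tallying "hit after a streak of hits" within fixed finite sequences, and then averaging across sequences, produces a rate below the true hit probability even though every shot is generated independently:

import random

def prop_hit_after_streak(shots, k):
    # Within one finite sequence: among the shots that immediately follow
    # k consecutive hits, what proportion are themselves hits?
    hits, opportunities = 0, 0
    for i in range(k, len(shots)):
        if all(shots[i - j] == 1 for j in range(1, k + 1)):
            opportunities += 1
            hits += shots[i]
    return hits / opportunities if opportunities else None

def average_over_sequences(n_sequences=100000, n_shots=20, p=0.5, k=3, seed=1):
    random.seed(seed)
    results = []
    for _ in range(n_sequences):
        shots = [1 if random.random() < p else 0 for _ in range(n_shots)]
        r = prop_hit_after_streak(shots, k)
        if r is not None:        # sequences with no qualifying streak drop out,
            results.append(r)    # just as players with no streaks would
    return sum(results) / len(results)

print(average_over_sequences())   # comes out well below 0.5

With the defaults above, the average lands well below 0.5; the gap shrinks as the sequences get longer, but it does not vanish for any finite sequence.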

Bottom line is that the data in the earlier studies didn’t convincingly rule out the possibility that basketball players’ performances did indeed display the sort of “streakiness” that defies chance expectations and supports the “hot hand” conjecture.

But in any case . . . the point of this update is to call attention to the truly admirable and inspiring reaction of the original researchers to the news that their result had been called into question in this way.

As I said, the “hot hand fallacy” studies are true classics. One could understand if the authors of such studies reacted defensively (many others who have been party to celebrating the studies for the last 30 years understandably have!) to the suggestion that the studies rest on a methodological flaw, one that itself seems to reflect the mischief of an irresistible but wrong intuition about how to distinguish random from systematic variation in data.

But instead, the reaction of the lead researcher, Tom Gilovich, to the M&S result is: “Coooool!!!!!!!!”

“Unlike a lot of stuff that’s come down the pike since 1985,” Gilovich was quoted as saying in a Wed. Wall Street Journal piece, “this is truly interesting. What they discovered is correct.” Whether the real effect is “so small that the original conclusion stands or needs to be modified,” he said, “is what needs to be determined.”

The article goes on to report that Gilovich, along with others, is now himself contemplating re-analyses and new experiments to try to do exactly that.

In a word, Gilovich, far from having his nose bent out of joint by the M&S finding, is excited that a truly unexpected development is now furnishing him and others with a chance to resume investigation of an interesting and complex question.

I bet, too, that at least part of what intrigues Gilovich is how a mistake like this could have evaded the attention of decision scientists for this long, and why even now the modal reaction among readers of the M&S paper is “BS!!” It takes about 45.3 (± 7) readings to really believe M&S’s proof, and even then the process has to be repeated at weekly intervals for a period of two months before the point they are making itself starts to seem intuitive enough to have the ring of truth.

But the point is, Gilovich, whose standing as a preeminent researcher is not diminished one iota by this surprising turn in the scholarly discussion his work initiated, has now enriched us even more by furnishing us with a compelling and inspiring example of the mindset of a real scholar!

Whatever embarrassment he might have been expected to experience (none is warranted in my view, nor evident in the WSJ article) is dwarfed by his genuine intellectual excitement over a development that is truly cool & interesting—both for what it teaches us about a particular problem in probability and for the opportunity it furnishes to extend examination into human psychology (here, the distinctive vulnerability to error that likely is itself unique to people with intuitions fine-tuned to avoid making the mistakes that intuitions characteristically give rise to when people try to make sense of randomness).

I’m going to try to reciprocate the benefit of the modeling of scholarly virtue Gilovich is displaying by owning up to, and getting excited about, as many mistakes in my own previous work as I can find! 

 

Reader Comments (6)

Hmm. And they still managed to get the explanation wrong.

Their breakthrough is the surprising math of coin flips. Take the 14 equally likely sequences of heads and tails with at least one heads in the first three flips—HHHH, HTHH, HTTH, etc. Look at a sequence at random. Select any flip immediately after a heads, and you’ll see the bias: There is a 60% chance it will be tails in the average sequence.

OK. Let's do that.

HHHH 3 heads following 3 heads
HHHT 2 heads following 3 heads
HHTH 1 head following 2 heads
HHTT 1 head following 2 heads
HTHH 1 head following 2 heads
HTHT 0 heads following 2 heads
HTTH 0 heads following 1 head
HTTT 0 heads following 1 head
THHH 2 heads following 2 heads
THHT 1 head following 2 heads
THTH 0 heads following 1 head
THTT 0 heads following 1 head
TTHH 1 head following 1 head
TTHT 0 heads following 1 head

(3+3+2+2+2+2+1+1+2+2+1+1+1+1) = 24 heads in the first three flips
(3+2+1+1+1+0+0+0+2+1+0+0+1+0) = 12 of them followed by another head
12 / 24 = 50%

Et voila! Magic!

The other method goes wrong because it's combining results with different sample sizes by simply averaging them.

If I group 100 experimental subjects into two sets, one with 98 members, and the other with 2 members, and I observe that 100% of the first set say 'We believe in cognitive cognition!' and 0% of the second set say the same, then I'm sure you would agree that the average percentage is (100 + 0) / 2 = 50% of the samples agreeing with the proposition. But do you therefore think the probability of a randomly-selected subject agreeing with the proposition is 50%?
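Here is a short sketch (in Python, purely for illustration) that runs both calculations over the same 14 sequences: pooling all the flips that come right after a Heads gives 50%, while averaging the per-sequence percentages, which is what the quoted passage describes, gives about 40% Heads, i.e. the "60% chance it will be tails":

from itertools import product

# The 14 equally likely 4-flip sequences with at least one Heads in the first three flips
sequences = ["".join(s) for s in product("HT", repeat=4) if "H" in s[:3]]

per_seq = []                  # one percentage per sequence
pooled_heads, pooled_n = 0, 0

for s in sequences:
    n = sum(1 for i in range(1, 4) if s[i - 1] == "H")                  # flips right after a Heads
    h = sum(1 for i in range(1, 4) if s[i - 1] == "H" and s[i] == "H")  # ...that are themselves Heads
    per_seq.append(h / n)
    pooled_heads += h
    pooled_n += n

print(pooled_heads, "/", pooled_n, "=", pooled_heads / pooled_n)   # 12 / 24 = 0.5
print(sum(per_seq) / len(per_seq))                                 # ~0.405, i.e. ~60% tails "in the average sequence"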

October 3, 2015 | Unregistered CommenterNiV

@NiV:

It's very hard to explain what the problem was in the method Gilovich et al used (one that seemed perfectly reasonable--to them & others for 30 yrs, as you know).

But I'm convinced that the 4-toss sequence illustration itself misleads many more people than it enlightens. Flip the coins yourself & you can confirm that P(H|HHH) in a finite sample of 100 coin tosses is considerably less than 50%.

Figuring out exactly what the defect in the method is is very very very very complicated, but the way to explain it, in my view, is to focus on how the Gilovich method was equivalent to sampling w/o replacement. If one sees that, one can see that the chance probability of a recurrence of "hit" after a string of hits should in fact be declining as one tallies observations in a given sample, and hence that a "null effect" would be more consistent with there being a genuine "hot hand."

I hope M&S ditch the 4-coin sequence illustration.

October 3, 2015 | Registered CommenterDan Kahan

" Flip the coins yourself ..."

As I'm sure you'll recall, I wrote a whole load of R code to do exactly that - and it showed that the probability is still 50%.

"P(H|HHH) in a finite sample of 100 coin tosses is considerably less than 50%."

No it isn't!

If you take blocks of a hundred, work out the percentage of HHHH vs HHHT in each, and then average the percentages you get, you'll get considerably less than 50%. But this isn't the correct way to calculate P(H|HHH).

For a start, the mathematical expression P(H|HHH) contains nowhere within it the number "100", or anything equal to it, and yet given that this apparently affects the probability, there has to be a parameter by which it can get input into the result. What if you took the *same* sequence and split it into blocks of different lengths? Would the probability vary, depending on how you chose to slice the results up? What if you did the experiments with one block size, and then re-partitioned them into differently sized blocks later after the experiment, with all the coin tosses done and fixed history? Would that change the probability of runs of Heads occurring retrospectively? It makes no sense for P(H|HHH) to depend on block size.

"Figuring out exactly what the defect in the method is is very very very very complicated, but the way to explain it, in my view, is to focus on how the Gilovich method was equivalent to sampling w/o replacement."

That's not the problem. What do you think it is we're failing to replace? Are there only a fixed number of heads that a coin can output? Do you suppose that we might one day run out of Heads and only have Tails left? That we'll know with certainty that we can only get Tails because we've seen - how many? - Heads already and that's all there are?

The problem, as I said, is that they're combining results from trials with different sample sizes without taking that into account. They might be correct that the original authors did indeed make that same mistake in one of their tests - as I've said previously, their description of their methods lacks the necessary detail to tell - but the problem they describe does what it does because the number of leading Heads varies between the fixed-size blocks of coin tosses. The probability of an individual H following a run of Hs is 50%. It has to be, by construction. The coin tosses are independent, by assertion of the problem definition, and so the probability must be by definition completely unaffected by what went before. Where you get a difference is when you pick a block with a variable number of leading Hs in it and look at the distribution of block percentages. Although it might look similar, it's a different question!

"I hope M&S ditch the 4-coin sequence illustration."

The 4-coin illustration works fine. If the maths is true, then it must be true of any case, and analysing the simplest case, one accessible to intuition and inspection, and direct calculation, is a whole lot better for understanding what's going on than picking a number too big to figure out what's going on.

As I show above, you can use the 4-coin example to prove that the probability remains 50%. You've not refuted this. So if the alternative calculation is coming up with a different answer, it must be the answer to a different question.

--

I didn't really mean to resurrect the technical argument. I just thought it interesting that a lot of people are getting excited at a new discovery, but don't seem to understand what the new discovery actually is, or how it works. They are apparently getting excited by the fact that other people are excited. Following the herd.

My immediate reaction, on being told of the problem, was to spend a good few hours figuring out what was going on. I didn't want to "take anybody's word for it", I wanted to understand it for myself! Because if I didn't understand exactly where the flaw lay, it would mean my own reasoning was flawed and unreliable, which might extend to other situations, and I personally find it philosophically intolerable to know my methods of reasoning are wrong and not do anything about it.


I found it fascinating that other people don't have the same reaction to that, and I still do.

October 3, 2015 | Unregistered CommenterNiV

@NiV:

Yes, this is a rehash.

As I explained before, your simulation doesn't do what the hot hand researchers did. You aren't addressing the question, which is whether their method of calculating the null was correct.

On why this is sampling w/o replacement, go back look at what I said then.

It does take time, I agree, to figure out why M&S are in fact correct.

Probably it takes longer if one gets stuck on the 4-coin-toss illustration & mistakes it for a purported proof that the probability of flipping "heads" immediately after flipping "heads" is < 0.5. It is supposed to be a simple illustration of why it is wrong to treat "P(H|H) - P(H|T) = 0" as the null when one is sampling previous finite sequences of coin tosses to see if there has been a "string" of heads that exceeds what one would expect to see by chance. The number of smart people I've observed who point out that there is an equal number of heads & tails in the sample space for all four-toss sequences -- something that is true but irrelevant to the point the authors are actually making -- is what convinces me that their illustration is much more likely to entice readers to waste time than to help them get what is admittedly a hard & surprising thing to see.
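Here is a minimal illustrative sketch of that point about the null (in Python; the sequence length and number of simulated sequences are arbitrary choices): averaged across finite sequences of independent fair coin flips, the within-sequence difference P(H|H) - P(H|T) comes out below zero, not equal to it.

import random

def within_seq_difference(flips):
    # P(H | previous flip H) - P(H | previous flip T), computed within one sequence;
    # undefined (None) if either conditioning event never occurs
    after_h = [flips[i] for i in range(1, len(flips)) if flips[i - 1] == 1]
    after_t = [flips[i] for i in range(1, len(flips)) if flips[i - 1] == 0]
    if not after_h or not after_t:
        return None
    return sum(after_h) / len(after_h) - sum(after_t) / len(after_t)

random.seed(2)
diffs = []
for _ in range(100000):
    flips = [random.randint(0, 1) for _ in range(20)]   # 20 fair, independent flips per sequence
    d = within_seq_difference(flips)
    if d is not None:
        diffs.append(d)

print(sum(diffs) / len(diffs))   # noticeably below 0, so 0 is the wrong chance benchmark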

October 3, 2015 | Registered CommenterDan Kahan

"For sure they were applying null tests that assume binomial distributions. But they didn't understand either (a) that events aren't genuinely independent when you sample w/o replacement from fixed number of random binary outcomes (& so don't follow binomial distribution) or more likely (b) that they were in fact sampling w/o replacement."

What does (a) mean? Which "fixed number of binary outcomes" are you talking about? The 16 blocks of 4 coin tosses? If so, that's a uniform distribution with replacement. Or the individual coin tosses? Or what?

And again, what is it you think they're not replacing?

"Another coin flip isn't the right model to use to answer that question; b/c P(H) in a past sequence depends on proportion of Hs & Ts remaining."

Remaining where? And if you redivide the blocks into different lengths after the experiment has been done, how can P(H) change?

"As, I explained before your simulation doesn't do what the hot hand researchers did."

The problem, as I noted in the previous discussion, is that the hot hand researchers didn't say what they did. As I recall they did indeed compare percentages from observations with different sample sizes, but all they said was that the differences were not significant, without saying how they tested significance, so we can't tell if they compensated for the sample size issue or not. I'm inclined to suspect that they didn't and just used a simplistic Z test which would be wrong, but we don't know.

" You aren't addressing the question, which is whether there method of calculating the null was correct."

But did they say what their method actually was? If not, how can I address it?

October 3, 2015 | Unregistered CommenterNiV

@NiV, let's investigate with strings of 3 flips. So, here's the procedure: flip a coin 3 times, and write down the sequence of flips. Then, look at only the flips that come immediately after a Heads, and determine what percentage of THOSE flips are Heads.

Well, first, if the sequence turns out to be either TTT or TTH, you won't get any answer at all. In these cases, there are 0 flips immediately after a Heads, and division by 0 is UNDEFINED, so there isn't a result here.

Let's throw away sequences like this! (If you like, let's say we start over from scratch and redo the entire 3-flip procedure in a case like this, and keep starting over until we get a usable result.)

So, throwing out results of TTT and TTH, the remaining possibilities are:

THT: (1 flip coming after a Heads, and 0 Heads on those flips) = 0 / 1 = 0.0

THH: 1 / 1 = 1.0

HTT: 0 / 1 = 0.0

HTH: 0 / 1 = 0.0

HHT: 1 / 2 = 0.5

HHH: 2 / 2 = 1.0

So, six possible sequences, all of which are equally likely to result. Of these, three sequences produce the empirical result of 0%, while two produce the empirical result of 100%, and the last produces the empirical result of 50%.

So overall, the Expected Value (average) of this procedure is:

(3*0 + 2*1 + 1*0.5) / 6 = 0.417.

In other words, if you repeat this 3-coin experiment, say, a million times (each time getting a result of either 0.0, 1.0 or 0.5), then the average of your results will be around 0.417.

This EVEN THOUGH we can actually see a total of 8 flips in these sequences which are immediately preceded by an H, of which 4 (or exactly 50%) are themselves Heads. Essentially, under this procedure the "HHH" result is being undercounted as only a single sequence, even though that sequence has two spots in it that could count toward the result. (And I guess the HHT sequence is also undercounted for the same reason.)

Of course you can do the same analysis with 4-flip sequences, or sequences of any length N, and if N > 2 then the expected value will be less than 0.5. That's all the authors are saying, I think.
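If you'd rather check that by brute force, here is a small sketch (in Python, just for illustration) that enumerates the 3-flip sequences, discards TTT and TTH, and averages the per-sequence results:

from itertools import product

results = []
for seq in product("HT", repeat=3):
    after_heads = [seq[i] for i in range(1, 3) if seq[i - 1] == "H"]
    if after_heads:                                   # skip TTT and TTH
        results.append(after_heads.count("H") / len(after_heads))

print(sum(results) / len(results))                    # 0.41666..., i.e. about 0.417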

May 5, 2016 | Unregistered Commentermathmandan
