## Check out wild & crazy "coherence based reasoning"! Are rules of evidence "impossible"?, part 2 (another report from Law & Cognition seminar)

This is part 2 in a 3-part series, the basic upshot of which is that “rules of evidence” are **impossible**.

**A recap.** Last time I outlined a conception of “the rules of evidence” I called the “Bayesian Cognitive Correction Model” or BCCM. BCCM envisions judges using the rules to “cognitively fine-tune” trial proofs in the interest of simulating/stimulating jury fact-finding more consistent with a proper Bayesian assessment of all the evidence in a case.

Cognitive dynamics like hindsight bias and identity-protective cognition can be conceptualized as inducing the factfinder to over- or undervalue evidence relative to its “true” weight—or likelihood ratio (LR). Under Rule 403, judges should thus exclude an admittedly “relevant” item of proof (Rule 401: LR ≠ 1) *when the tendency of that item of proof to induce jurors to over- or undervalue other items of proof (i.e., to assign them LRs further from or closer to 1 than their true values) impedes verdict accuracy more than constraining the factfinder to assign the item of proof in question no weight at all (LR = 1)*.
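As a hypothetical illustration (not anything from the post itself), the updating model BCCM presupposes fits in a few lines of Python: each relevant item shifts the odds multiplicatively by its LR, and excluding an item under Rule 403 amounts to constraining its LR to 1. The LR values are made up.

```python
# Sketch of Bayesian updating by likelihood ratios (LRs), with Rule 403
# exclusion modeled as forcing an item's LR to 1. Values are made up.

def update_odds(prior_odds, lrs):
    """Multiply the prior odds by each item's likelihood ratio."""
    odds = prior_odds
    for lr in lrs:
        odds *= lr
    return odds

def to_prob(odds):
    """Convert odds to a probability."""
    return odds / (1 + odds)

lrs = [4.0, 0.5, 2.0]                    # three hypothetical items of proof
print(to_prob(update_odds(1.0, lrs)))    # prior 1:1 -> posterior 0.8

# "Excluding" the first item (its LR is constrained to 1):
print(to_prob(update_odds(1.0, [1.0, 0.5, 2.0])))  # posterior 0.5
```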

“Coherence based reasoning”—CBR—is one of the kinds of cognitive biases a judge would have to use the BCCM strategy to contain. This part of the series describes CBR and the distinctive threat it poses to rational factfinding in adjudication.

**Today's episode.** CBR can be viewed as an information-processing dynamic rooted in aversion to residual uncertainty.

A factfinder, we can imagine, might initiate her assessment of the evidence in a reasonably unbiased fashion, assigning modestly probative pieces of evidence more or less the likelihood ratios they are due.

But should she encounter a piece of evidence that is much more consistent with one party’s position, the resulting confidence in that party’s case (a state that ought to be only provisional, in a Bayesian sense) will dispose her to assign the *next* piece of evidence a likelihood ratio supportive of the same inference—viz., that that party’s position is “true.” As a result, she’ll be all the more confident in the merit of that party’s case—and thus all the more motivated to adjust the weight assigned the next piece of evidence to fit her “provisional” assessment, and so forth and so on (Carlson & Russo 2001).

Once she has completed her evaluation of trial proof, moreover, she will be motivated to revisit earlier-considered pieces of evidence, *re*adjusting the weight she assigned *them* so that they now fit with what has emerged as the more strongly supported position (Simon, Pham, Le & Holyoak 2001; Holyoak & Simon 1999; Pennington & Hastie 1991). When she concludes, she will necessarily have formed an inflated assessment of the probability of the facts that support the party whose “strong” piece of evidence initiated this “likelihood ratio cascade.”

Why does this matter?

Well, to start, in the law, the party who bears the “burden of proof” will often be entitled to win only if she establishes the facts essential to her position to a heightened degree of certainty like “beyond a reasonable doubt.” One practical consequence of the overconfidence associated with CBR, then, will be to induce the factfinder to decide in favor of a party whose evidence, if evaluated in an unbiased fashion, would not have satisfied the relevant proof standard (Simon 2004). Indeed, one really cool set of experiments (Scurich 2012) suggests that "coherence based reasoning" effects might actually reflect a dissonance-avoidance mechanism that manifests itself in factfinders reducing the standard of proof after exposure to highly probative items of proof!

But even more disconcertingly, CBR makes the outcome sensitive to the *order* in which critical pieces of evidence are considered (Carlson, Meloy & Russo 2006).

A piece of evidence that merits considerable weight might be assigned a likelihood ratio of 1 or < 1 if the factfinder considers it after having already assigned a low probability to the position it supports. In that event, the evidence will do nothing to shake the factfinder’s confidence in the opposition position.

But had the factfinder considered that same piece of evidence “earlier”—before she had formed a confident estimation of the cumulative strength of the previously considered proof—she might well have given that piece of evidence the greater weight it was due.

If *that* had happened, she would then have been motivated to assign *subsequent* pieces of proof likelihood ratios *higher* than *they* in fact merited. Likewise, to achieve a “coherent” view of the evidence as a whole, she would have been motivated to revisit and revise *upward* the weight assigned to earlier considered, equivocal items of proof. The final result would thus have been a highly confident determination *in exactly the opposite direction* from the one she in fact reached.

This is not the way things should work if one is engaged in Bayesian information processing—or at least any normatively defensible understanding of Bayesian information processing geared to reaching an accurate result!

Indeed, this is the sort of spectacle that BCCM directs the judge to preempt by the judicious use of Rule 403 to exclude evidence the “prejudicial” effect of which “outweighs” its “probative value.”

**But it turns out that using the rules of evidence to neutralize CBR in that way is IMPOSSIBLE!**

**Why? I’ll explain that in Part 3!**

**# # #**

**But right now I’d like to have some more “extra-credit”/“optional” fun w/ CBR!** It turns out it is possible & very enlightening to create a simulation to model the accuracy-annihilating effects I described above.

Actually, I’m just going to model a “tame” version of CBR—what Carlson & Russo call “biased predecisional processing.” Basically, it’s the “rolling confirmation bias” of CBR without the “looping back” that occurs when the factfinder decides for good measure to reassess the more-or-less unbiased LRs she awarded to items of proof before she became confident enough to start distorting all the proof to fit one position.

Imagine that a factfinder begins with the view that the “truth” is equally likely to reside in either party’s case—i.e., prior odds of 1:1. The case consists of eight “pieces” of evidence, four pro-prosecution (likelihood ratio > 1) and four pro-defense (likelihood ratio < 1).

The factfinder makes an unbiased assessment of the “first” piece of evidence she considers, and forms a revised assessment of the odds that reflects its “true” likelihood ratio. As a result of CBR, however, her assessment of the likelihood ratio of the *next* piece of evidence—and *every piece* thereafter—will be biased by her resulting perception that one side’s case is in fact “stronger” than the other’s.

To operationalize this, we need to specify a “CBR factor” of some sort that reflects the disposition of the factfinder to adjust the likelihood ratios of successive pieces of proof up or down to match her evolving (and self-reinforcing!) perception of the disparity in the strength of the parties’ cases.

Imagine the factfinder misestimates the likelihood ratio of all pieces of evidence by a continuous amount that results in her over-valuing or under-valuing an item of proof by a factor of *2* at the point she becomes convinced that the odds in favor of one party’s position rather than the other’s being “true” have reached *10:1*.

What justifies selecting this particular “CBR factor”? Well, I suppose nothing, really, besides that it supplies a fairly tractable starting point for thinking critically about the practical upshot of CBR.

But also, it’s cool to use this function b/c it reflects a “weight of the evidence” metric developed by Turing and Good to help them break the Enigma code!

For Turing and Good, a piece of evidence with a likelihood ratio of 10 was judged to have a weight of “1 *ban.*” They referred to a piece of proof that had a likelihood ratio 1/10 that big as a “deci-ban”—and were motivated to use *that* as the fundamental unit of evidentiary currency in their code-breaking system based on their seat-of-the-pants conjecture that a “deciban” was the smallest *shift* in the relative likelihoods of two hypotheses that human beings could plausibly perceive (Good 1985).

So with this “CBR factor,” I am effectively imputing to the factfinder a disposition to “add to” (or subtract from) an item of proof one “deciban”—the smallest humanly discernible “evidentiary weight,” in Turing and Good’s opinion—for every 1-unit increase (1:1 to 2:1; 2:1 to 3:1, etc.) or decrease (1:1 to 1:2; 1:2 to 1:3, etc.) in the “odds” of that party’s position being true.
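The post doesn't spell out the function, but one form consistent with the stated calibration (no distortion at even odds, a factor of 2 once the odds reach 10:1) is f(odds) = 2 ** log10(odds). A sketch under that assumption (my reconstruction, not the author's stated formula):

```python
import math

# Hypothetical "CBR factor": equals 1 at even odds and reaches 2 (or 1/2)
# when the odds hit 10:1 (or 1:10), growing continuously in between.
# This functional form is an assumed reconstruction, not the post's own.

def cbr_factor(odds):
    return 2 ** math.log10(odds)

def perceived_lr(true_lr, odds):
    """Bias an item's LR toward the side currently favored by the odds."""
    return true_lr * cbr_factor(odds)

print(cbr_factor(1))              # 1.0 -> no adjustment at 1:1
print(cbr_factor(10))             # 2.0 -> overvaluation by a factor of 2 at 10:1
print(round(cbr_factor(0.1), 3))  # ~0.5 -> mirror-image undervaluation at 1:10
```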

And this figure illustrates how this distorting potential can be affected by CBR generally:

In the “unbiased” table, “prior” reflects the factfinder’s current estimate of the probability of the “prosecutor’s” position being true, and “post odds” the revised estimate based on the weight of the current “item” of proof, which is assigned the likelihood ratio indicated in the “LR” column. The “post %” column transforms the revised estimate of the probability of “guilt” into a percentage.

I’ve selected an equal number of pro-prosecution (LR >1) and pro-defense (LR<1) items of proof, and arranged them so they are perfectly offsetting—resulting in a final estimate of guilt of 1:1 or 50%.

In the “coherence based reasoning” table, “tLR” is the “true likelihood ratio” and “pLR” the perceived likelihood ratio assigned the current item of proof. The latter is derived by applying the CBR factor to the former. When the odds are 1:1, CBR is 1, resulting in no adjustment of the weight of the evidence. But as soon as the odds shift in one party’s favor, the CBR factor biases the assessment of the next item of proof accordingly.

As can be seen, the impact of CBR in this case is to push the factfinder to an inflated estimate of the probability of the prosecution’s position being true, which the factfinder puts at 29:1, or 97%, by the “end” of the case.

But things could have been otherwise. Consider:

I’ve now swapped the “order” of proof items “4” and “8,” respectively. That doesn't make any difference, of course, if one is "processing" the evidence the way a Bayesian would; but it does if one is CBRing.

The reason is that the factfinder now “encounters” the defendant’s strongest item of proof—LR = 0.1—earlier than the prosecution’s strongest—LR = 10.0.

Indeed, it was precisely because the factfinder encountered the prosecutor’s best item of proof “early” in the previous case that she was launched into a self-reinforcing spiral of overvaluation that made her convinced that a dead-heat case was a runaway winner for the prosecutor.

The effect when the proof is reordered this way is exactly the opposite: a devaluation cascade that convinces the factfinder that the odds in favor of the prosecutor’s case are infinitesimally small!
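To make the order effect concrete, here is a short sketch. Both the LR values and the CBR-factor form (2 ** log10(odds), i.e., a factor of 2 at odds of 10:1) are assumptions of mine, chosen only so that the eight items perfectly offset:

```python
import math

def cbr_factor(odds):
    # Assumed form: no distortion at 1:1, a factor of 2 at 10:1.
    return 2 ** math.log10(odds)

def assess(lrs, biased):
    """Return the posterior probability of guilt after all items."""
    odds = 1.0                          # prior odds 1:1
    for lr in lrs:
        if biased:
            lr *= cbr_factor(odds)      # CBR-adjusted "perceived" LR
        odds *= lr
    return odds / (1 + odds)

# Four pro-prosecution and four pro-defense items whose LRs multiply to 1,
# so an unbiased factfinder ends at 50% regardless of order. Values made up.
pros_first = [10, 2, 5, 1.25, 0.8, 0.5, 0.2, 0.1]
def_first = list(reversed(pros_first))

print(round(assess(pros_first, biased=False), 2))  # 0.5 (order-independent)
print(assess(pros_first, biased=True) > 0.95)      # True: runaway "guilty"
print(assess(def_first, biased=True) < 0.05)       # True: runaway "not guilty"
```

The same evidence, read front-to-back versus back-to-front, lands on opposite near-certainties, which is exactly the cascade described above.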

These illustrations are static, and based on “pieces” of evidence with stipulated LRs “considered” in a specified order (one that could reflect the happenstance of when particular pieces register in the mind of the factfinder, or are featured in post-trial deliberations, as well as when they are “introduced” into evidence at trial—who the hell knows!).

But we can construct a simulation that randomizes those values in order to get a better feel for the potentially chaotic effect that CBR injects into evidence assessments.

The simulation constructs trial proofs for 100 criminal cases, each consisting of eight pieces of evidence. Half of the 800 pieces of evidence reflect LRs drawn randomly from a uniform distribution between 0.05 and 0.95; these are “pro-defense” pieces of evidence. Half reflect LRs drawn randomly from a uniform distribution between 1.05 and 20. They are “pro-prosecution” pieces.

We can then compare the “true” strength of the evidence in the 100 cases —the probability of guilt determined by Bayesian weighting of each one’s eight pieces of evidence—to the “biased” assessment generated when the likelihood ratios for each piece of evidence are adjusted in a manner consistent with CBR.

This figure compares the relative distribution of outcomes in the 100 cases:

As one would expect, a factfinder whose evaluation is influenced by CBR will encounter many fewer “close” cases than will one who engages in unbiased Bayesian updating.

This tendency to form overconfident judgments will, in turn, affect the accuracy of *case outcomes*. Let’s assume, consistent with the “beyond a reasonable doubt” standard, that the prosecution is entitled to prevail only when the probability of its case being “true” is ≥ 0.95. In that case, we are likely to see this sort of divergence between outcomes informed by rational information processing and outcomes informed by CBR:

The overall “error rate” is “only” about 0.16. But there are 7x as many incorrect convictions as incorrect acquittals. The "false conviction" rate is 0.21, whereas the "false acquittal" rate is 0.04.
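Here is a minimal self-contained sketch of this simulation. It assumes a CBR factor of 2 ** log10(odds) (a reconstruction; the post doesn't state its exact function) and draws LRs from the uniform ranges described; the exact rates depend on the random draws and on the assumed form, so they won't match the post's figures exactly.

```python
import math
import random

def cbr_factor(odds):
    # Assumed reconstruction: no distortion at 1:1, a factor of 2 at 10:1.
    return 2 ** math.log10(odds)

def posterior(lrs, biased):
    odds = 1.0
    for lr in lrs:
        if biased:
            lr *= cbr_factor(odds)
        odds *= lr
    return odds / (1 + odds)

random.seed(0)  # arbitrary; results vary with the seed
true_p, biased_p = [], []
for _ in range(100):
    # Eight items per case: four pro-defense, four pro-prosecution,
    # encountered in a random order.
    lrs = ([random.uniform(0.05, 0.95) for _ in range(4)] +
           [random.uniform(1.05, 20.0) for _ in range(4)])
    random.shuffle(lrs)
    true_p.append(posterior(lrs, biased=False))
    biased_p.append(posterior(lrs, biased=True))

# Convict iff P(guilt) >= 0.95 ("beyond a reasonable doubt").
false_conv = sum(b >= 0.95 > t for t, b in zip(true_p, biased_p))
false_acq = sum(t >= 0.95 > b for t, b in zip(true_p, biased_p))
print(false_conv, false_acq)  # counts out of 100 simulated cases
```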

The reason for the asymmetry between false convictions and false acquittals is pretty straightforward. In the CBR-influenced cases, there are a substantial number of “close” cases that the factfinder concluded “strongly” supported one side or the other. Which side—prosecution or defendant—got the benefit of this overconfidence is roughly equally divided. However, a defendant is no less entitled to win when the factfinder assesses the strength of the evidence to be 0.5 or 0.6 than when the factfinder assesses the strength of the evidence as 0.05 or 0.06. Accordingly, in all the genuinely “close” cases in which CBR induced the factfinder to form an overstated sense of confidence in the weakness of the prosecution’s case, the resulting judgment of “acquittal” was still the *correct* one. But by the same token, the result was *incorrect* in every close case in which CBR induced the factfinder to form an exaggerated sense of confidence in the *strength* of the prosecution’s case. The proportion of cases, in sum, in which CBR can generate a “wrong” answer is much higher in ones that defendants deserve to win than in ones in which the prosecution does.

This feature of the model is an artifact of the strong “Type 1” error bias of the “beyond a reasonable doubt” standard. The “preponderance of the evidence” standard, in contrast, is theoretically neutral between “Type 1” and “Type 2” errors. Accordingly, were we to treat the simulated cases as “civil” rather than “criminal” ones, the false “liability” outcomes and false “no liability” ones would be closer to the overall error rate of 16%.

Okay, I did this simulation *once* for 100 cases. But let’s do it *1,000* times for 100 cases—so that we have a full-blown Monte Carlo simulation of the resplendent CBR at work!

These are the kernel distributions for the “accurate outcome,” “false acquittal,” and “false conviction” rates over 1,000 trials of 100 cases each:

*Refs*

Carlson, K.A., Meloy, M.G. & Russo, J.E. Leader‐driven primacy: using attribute order to affect consumer choice. Journal of Consumer Research 32, 513-518 (2006).

Carlson, K.A. & Russo, J.E. Biased interpretation of evidence by mock jurors. *Journal of Experimental Psychology: Applied* **7**, 91-103 (2001).

Good, I.J. Weight of evidence: a brief survey. in *Bayesian Statistics 2: Proceedings of the Second Valencia International Meeting* (J.M. Bernardo et al. eds., 1985).

Holyoak, K.J. & Simon, D. Bidirectional reasoning in decision making by constraint satisfaction. *J. Experimental Psych.: General* **128**, 3-31 (1999).

Kahan, D.M. Laws of cognition and the cognition of law. *Cognition* **135**, 56-60 (2015).

Simon, D. A Third View of the Black Box: Cognitive Coherence in Legal Decision Making. *Univ. Chi. L. Rev.* **71**, 511-586 (2004).

Simon, D., Pham, L.B., Le, Q.A. & Holyoak, K.J. The emergence of coherence over the course of decision making. *J. Experimental Psych.: Learning, Memory & Cognition* **27**, 1250-1260 (2001).

## Reader Comments (15)

Dan -

Completely off-topic, but I thought you might find it interesting if you haven't seen it (I've only made my way through part three) - in particular the discussion of "naive realism."

http://fivethirtyeight.com/features/science-isnt-broken/?ex_cid=endlink#part4

And this too:

http://fivethirtyeight.com/features/not-even-scientists-can-easily-explain-p-values/

@Josh--thx!

That interactive graphic is great!

"Imagine the factfinder misestimates the likelihood ratio of all pieces of evidence by a continuous amount that results in her over-valuing or under-valuing an item of proof by a factor of 2 at the point she becomes convinced that the odds in favor of one party’s position rather than the other’s position being “true” has reached 10:1."

I've not thought about it very deeply, but if this was how it worked, I'd say it would be relatively easy to get round it. First, you consider the marginal pieces of evidence first, to get an accurate idea of what contribution they make. Second, you break the major pieces of evidence up into multiple components, each with a lower likelihood ratio. And third, you pre-classify each bit of evidence as either for or against the defendant, and then *alternate* their presentation to the judge/jury. Or better, keep track of where the LR is at any given time and select the next bit of evidence to push it back towards the middle, until you run out of one or the other.

Doing that should keep the cumulative LR inside the bounds where accurate assessments are made for as long as possible, until going outside the bounds becomes unavoidable. At which point, the bias that kicks in will inevitably be supporting the right decision.

However, I believe the biggest problem with this is that it assumes there's a single, constant, objectively determined, universally agreed assignment of LRs to evidence. (If there was, we wouldn't need human juries.) The assessment of the probability of an event under each hypothesis requires a statistical model. Different people start from different models, and sometimes even operate several models of varying fidelity simultaneously. And they learn as they go, modifying their models accordingly.

Back in the early days of AI research they had hopes for 'expert systems' which could capture the decision-making expertise and knowledge of an expert in a computer program. Bayesian methods were an obvious model to build on. But they quickly found that the beliefs experts used to make their judgements were not consistent with the rules of Bayesian probability - but were nevertheless more accurate than any computer program they were able to produce. It appeared that the experts were using a wider range of inputs - not just the raw inputs but the correlations, patterns, and context, and were selecting between different models, tailoring them to the specific situation as they went. This sort of higher-level 'holistic' meta-understanding was very difficult to capture explicitly - even the experts couldn't articulate exactly what they were doing. Like most AI problems, it turned out to be a lot harder than it looked at first.

So I'm not particularly surprised that you would find difficulties trying to capture the legal reasoning process in a Bayesian framework. While my inclination is to suspect that legal judgements are a lot less accurate and reliable than many in the legal profession would like to claim, I also suspect that some of the 'biases' are actually compensating for the difficulty of the problem. We assume it's a bias because it doesn't fit with what our simple model says we *ought* to do, but it might just be that our simple model is wrong, and we're doing something more sophisticated.

An adaptive model-adjusting process may be vulnerable to changes in the order of presentation, but may be *less* vulnerable to the problem of starting with a completely wrong model in the first place. Given how often we start with wrong models, and how rarely (in daily life) the order in which evidence randomly appears is dramatically lop-sided, it might actually be an improvement.

@NiV:

Well, see the next installment to see if there is a "way around" the problem. You'll see why I'm skeptical that there's anything that can be done to "solve" the problem that doesn't in the end imply that we shouldn't rely on anything that remotely resembles our current form of adjudication to determine facts.

But as for "single, constant, objectively determined, universally agreed assignment of LRs to evidence," I don't *think* that the argument assumes that.

First of all, the Bayesian framework is used heuristically here -- just to create expositional clarity and to model the rudiments of a reasoning process that has qualities essential to reasoned decisionmaking.

One rudiment --not a particularly challenging or controversial one --is that people should "weight" the probative force of independent pieces of evidence and aggregate them.

I think people do this all the time; they don't assign formal LRs to information in doing so, but that's a convenient way to model what they do, or model what they could do in order to see if what they are doing instead is suited for their ends.

Another "rudiment": don't conflate the weight of pieces of evidence that are in fact independent of each other.

And still another, related one: don't assign weight to evidence based on what you currently believe to be true.

As long as *you* feel that you have a serviceable metric for "weighing," it doesn't matter whether it takes the form of an LR or whether everyone else would "agree" with you about the weight you have assigned.

You will, for your purposes, want to make sure you engage in the sort of process that doesn't involve the problems I just mentioned -- or will if you are in fact trying to figure out what the truth is (as opposed to reach a result that makes you happy, affirms your identity, relieves you from the anxiety of uncertainty, etc.).

That's all that's assumed!

Or I suppose a bit more than that: that if a *group* of people decide to do this sort of reasoning task together, they will be able to work out a shared metric of weighting that will enable them to reason collectively. They won't have to use LRs or even necessarily formalize or quantify weight; they will just have to be able to use a metric that is commensurable in some rough-and-ready sense & then start deliberating. They might not agree w/ one another, sure. But I think in fact we see people doing this all the time & don't find it particularly remarkable.

But if the group dynamic reproduced the sort of endogeneity between LRs that I'm describing -- then those actors would, by their own lights, be frustrated in achieving their ends.

Of course, if you believe that assigning "single, constant, objectively determined, universally agreed assignment of LRs to evidence" is a barrier to any person or any group *ever* doing any sort of aggregation of items of proof in making a judgment about the probability of some fact, then you have a reason to be skeptical of adjudication that is much broader than the one I'm offering here, which is limited, I think, to a particular *way* of assessing evidence.

If, however, you believe that individual and collective reasoning about the weight of evidence is possible despite there being no "single, constant, objectively determined, universally agreed assignment of LRs to evidence," then I'm pretty sure the CBR problem remains one -- a problem, that is.

Hmm. Perhaps my criticism was addressed at your illustrative example rather than the argument it was illustrating.

Nevertheless, I think the criticism likewise is more generally applicable. The point is that different people use different models to assign 'weights' to evidence, whether you interpret those weights as probabilities or not, and that the weight-assigning model could potentially change in light of new evidence.

So the assumption at issue is the following: "And still another, related one: don't assign weight to evidence based on what you currently believe to be true."

Consider for a moment the "Falsus in uno, falsus in omnibus" dictum. You listen to evidence from a particular expert witness, which initially seems credible. You start constructing theories, weighing how his statements and observations support or contradict them, operating with a simplified model that assumes he's telling the truth. Most expert witnesses do, after all, so it's a reasonable approximation to be making.

And then, all of a sudden, he makes a statement that you *know* is untrue, and that he ought to know is untrue as well if he's really an expert. Maybe because it contradicts something he said earlier, maybe because it contradicts something you already knew independently. Your simplistic model of a truth-telling expert witness was wrong!

So now our juror will "be motivated to revisit earlier-considered pieces of evidence, readjusting the weight she assigned them so that they now fit with what has emerged as the more strongly supported position". However, that's suddenly a computationally expensive operation. Instead of listening to each item of evidence, weighing its implications with the possibility of a deceptive witness accounted for, and building up the conclusion gradually, the juror is being asked to remember and re-evaluate the entire chain of evidence in a moment! Obviously, the heuristic answer to the challenge will be to dismiss the expert witness's evidence in its entirety, and if a motivation for the deception is suspected, maybe even to assume the opposite of whatever conclusions they were trying to lead you to. Most people are well aware that an expert can fool them, making what seems like a plausible argument lead to ridiculous conclusions (ask any mathematician for their favourite proof that 2 + 2 = 5 and you'll see what I mean!), so it's not totally unreasonable to reject even those parts of the expert's presentation that *seemed* plausible. It's a case of not trusting even one's *own* judgement.

At the same time, it's pretty hard to throw away all those patiently constructed hypotheses and start again from scratch, so the presentation is likely to influence subsequent thinking.

Whether the expert is recognised as untrustworthy at the beginning or towards the end of their presentation will have a major influence on how the decision makers will take their evidence. It is, in a sense, a coherence-based criterion, in that the evidence and the winning hypothesis is expected to form a coherent and consistent story. Being caught out in an untruth creates such an inconsistency in the picture the expert is trying to paint.

This is one example of a more general phenomenon - that the hypothesis being considered itself can affect the weight one assigns to evidence, when people may be motivated to deceive, have been deceived, or mistaken, or "economical with the truth" in what evidence they present. There is therefore a genuine feedback loop between the hypotheses under consideration and the likelihood ratios we rationally assign to the evidence.

From a mathematical point of view, the problem is illusory. The possibility that the expert witness was a liar was there from the start, and a full rendering of the Bayesian decision making process would have run the implications of both alternatives in parallel, up until the deceptiveness was revealed. In practical terms, we soon run into a combinatorial explosion, as every combination of truth and lie across all the statements made has to be considered a separate hypothesis. It's impossible for even a human brain to do it! Hence we use heuristics, to build, knock down, reconstruct, and tinker with a complicated and ever-shifting structure of highly approximated hypotheses and weights. I don't think it's reasonable to assume these heuristics don't, or shouldn't, assign weight to evidence based on what one currently believes to be true.

I don't know. As I said, I haven't thought about it very deeply. My guess would be that any computationally tractable method is always going to be vulnerable to some sort of manipulation, but that people's intuitive approach to evidence is probably more powerful at resisting being fooled than your concerns would suggest. The rules of evidence wouldn't necessarily reject strongly probative evidence on the basis of CBR, because while increasing the risk of one sort of bias, it possibly reduces the risk of another.

That said, *everybody* believes *hundreds* of things that on closer inspection cannot be true, and they seem to survive their errors just fine, so maybe the situation is really that humans are fallible and we just have to do the best we can with it. Maybe that's close enough.

In practice, when we build giant cases for or against something, we try to make little mini-conclusions that are individually rather robust. That's what the theory of the prosecution is for in criminal cases.

Imagine you were playing a game of Clue. Someone suggests, "Col. Mustard with the Revolver in the Hall." If you have the Revolver card, you know for sure that theory's incorrect.

Now imagine you were in a slightly more complicated version of the game where instead of having cards that definitely excluded an element, there were instead random associations between weapons, locations, and suspects. If you adopt a strictly Bayesian approach to the whole case without breaking it down, it would be easy to get lost in irrelevancies; e.g., "Exactly how important is it that Scarlet had the Revolver when it was in the Observatory? Doesn't that argue against the proposition that Mustard had it when it was in the Hall?"

The cognitive swindle that the theory of the prosecution pulls off is to say, "Some associations just don't matter. Focus on the following propositions as if they were independent: how likely was it the Colonel? how likely was it the Revolver? how likely was it the Hall?" That's how we deal with n < p, where p, the number of interaction terms between potentially relevant circumstances, grows exponentially with investigation time, and n grows arithmetically with time if you're lucky.

On a related note, if you're right about all this, doesn't your line of reasoning here strengthen Friedman (2003)'s argument against the overvaluation justification for 403?

@dypoon--

I think Friedman's proposal for how to weigh the prejudicial effect of overvalued evidence relative to its probative value is wrong.

But whether he is right or I am on that issue, it just doesn't matter; the positive correlation between probative value & prejudicial effect that is created by CBR makes it impossible to do the sort of cognitive fine tuning that BCCM envisions the judge doing w/ Rule 403. That is the upshot of the next installment. Friedman & I are arguing about a detail that is collateral to the sort of contradiction that CBR creates in the whole operation.

If I'm right about *that*

@Niv--

I'm not following. But I want to.

I want to start by figuring out if you subscribe to some much more radical theory than I do about the "impossibility" of rational factfinding in legal proceedings.

I in fact don't believe rational factfinding is impossible; only that the sort of "cognitive fine tuning" envisioned by BCCM is.

So...

Imagine that we have a "factfinder" (FF) who is trying to test the hypothesis that someone has a disease.

FF is furnished w/ 8 or 800 or whatever items of proof, each of which has some independent quantum of probative value w/r/t the hypothesis. In some cases, figuring out that quantum -- the LR -- is super easy. In others, it is more difficult; FF has to expend effort & time assessing the information before figuring out its import. But she can.

For each item, FF determines the LR in this fashion & records that value in a ledger.

When she is done, she multiplies the 8 or 800 or however many LRs. She then uses that as the factor in proportion to which she revises her prior probability (expressed in odds) that the person had the "disease."
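FF's ledger procedure amounts to a product of LRs (the values below are hypothetical):

```python
from math import prod

# Sketch of FF's ledger: one LR per independent item of proof; multiply
# them all and update the prior odds by the product. LRs are made up.
ledger = [3.0, 0.5, 4.0, 1.2]
prior_odds = 1.0                      # 1:1 prior on "has the disease"
post_odds = prior_odds * prod(ledger)
post_prob = post_odds / (1 + post_odds)
print(round(post_odds, 2), round(post_prob, 2))  # 7.2 0.88
```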

No problem, so far right? FF's head doesn't explode or anything from the cognitive load or from any sort of mistaken sense that she has to adjust the LR for any item of proof based on what the LR is for any other (in fact, that would be a mistake, given what I've stipulated--that each item has independent probative significance w/r/t the hypothesis).

Do you think it is not possible or useful to expect a legal factfinder to do something equivalent to what FF has just done? That is, is it not possible for the legal factfinder to parcel the evidence in a case in a manner that faithfully reflects the probative weight each item of proof has & that avoids treating items the weight of which is genuinely independent as if they were interdependent?

You agree, don't you, that we can expect the legal factfinder to reason in a manner that is valid even though the weight of the evidence doesn't admit of the sort of precise quantification envisioned by a Bayesian rational reconstruction of what valid reasoning amounts to?

Do you see heads exploding here?

If so, then this is where I am confused & you should elaborate.

If so far, though, everything is okay, the question then becomes what to make of the prospect of one or another predictable, recurring cognitive bias to which human factfinders are vulnerable. It's not *obvious* that we won't be able to handle that in our system.

Indeed, it seems reasonable to me to believe that there are plenty of satisfactory ways to address that -- the same way there would be for FF as FF assesses all the information at his disposal in testing the hypothesis that a person has a disease. The same way there are plenty of ways to address this problem in the myriad contexts in which people do in fact generally reason validly about all sorts of matters, many of which are much more complex than assessing the evidence in a legal proceeding.

"FF is furnished w/ 8 or 800 or whatever items of proof, each of which has some independent quantum of probative value w/r/t the hypothesis. In some cases, figuring out that quantum -- the LR -- is super easy. In others, it is more difficult; FF has to expend effort & time assessing the information before figuring out its import. But she can."

The issue is the method used to determine the LR, and the evidence on which it itself is based. Suppose we calculate the first 700 out of 800 LRs in our ledger by making some reasonable heuristic approximation. Then we realise on the seven hundred and first bit of evidence that the approximation must be wrong! We're not just multiplying in a low LR that affects the conclusion, we're undermining the basis for all the previously calculated LRs! And we're challenged to instantly go back and re-evaluate those 700 previous bits of evidence in light of our new finding, as well as the next 100.

The standard Bayesian framework has the priors and observations as free variables, but takes the LR function as a given. But in reality, it's not. The process by which we calculate LRs is itself something we have to make an argument for, subject to evidence and differences of opinion.

The disease has various symptoms, risk factors, and medical tests by which it can be identified. An LR can be calculated on the basis of prevalence statistics and calibration tests (sensitivity, false positive rates, etc.). It takes some maths skills, but is nevertheless fairly straightforward to do. We listen to the patient and check them off as we go. Then we hear that the patient has previously been diagnosed with Munchausen syndrome, and is currently engaged in a fierce legal battle with their medical insurance company over a payout for major medical bills. Are you so sure about your ledger of carefully calculated LRs now? Or suppose evidence is presented late on that the condition is often confused with a separate disease, that has the same symptoms but a different cause and treatment, and a lot of the previously published statistics are wrong. Or rather, some experts say they're wrong, while other experts disagree and say the original statistics were right. And the guy giving evidence turns out to be a prominent proponent of one side in this heated and acrimonious debate. Is calculating LRs so easy now?

We always need a model that tells us how likely an observation is under each hypothesis to calculate the LR. In the typical textbooks teaching Bayesian methods, this is always provided in the question. But the real world isn't a textbook exercise - where are we supposed to get these models from? Where there are several plausible possibilities, how do we pick between them? And what are we supposed to do if the evidence being presented doesn't just bear on the likelihood of the hypothesis we're considering, but also on the validity of the models we're using to calculate the LRs themselves?

I think people *can* deal with such complexities; in legal judgements as well as in real life. But I suspect they would have to do it by methods that violate the assumption that we "don't assign weight to evidence based on what you currently believe to be true".

@Niv:

Does it help if I reformulate "don't assign weight to evidence based on what you currently believe to be true" as

"Determine the LR independently of your priors."

Failing to do that is exactly what confirmation bias amounts to.

Bayesianism doesn't tell you you can't do that, of course. Indeed, "confirmation bias" might be the "rational" decision strategy where the cost of evaluating information w/o reference to one's priors is higher than the expected benefit of updating any mistaken view (go ahead & ignore me when I tell you I have proof that gravity can be turned off when I pull my cat's tail etc).

But if after the first strong piece of evidence a juror encounters -- say a paternity test that is 0.95 accurate (LR = 19) -- she decides "screw it; it's just not worth the bother to think about the probative value of the next piece of independent evidence" -- say, the authenticity of a medical record indicating that the dfdt in a paternity suit got a vasectomy 12 mos. before the child was born -- then I think normatively the law can say, "sorry, try harder, butthead."

Same if juror says "screw it" or "must be bullshitting -- odds are so high given the DNA test!" -- in considering the credibility of a witness who testifies to an ironclad alibi after the prosecution introduces a positive DNA test that has an RMP of 1 in 1,000,000 (only 1 in a million randomly tested individuals have the relevant DNA profile).

Be lazy (or biased, if you don't get that it is in fact biased in favor of your priors to determine the likelihood ratio based on conformity of evidence to priors) on your own time, as it affects your own life. Here you are under a civic obligation to put forth the effort to evaluate the genuinely independent pieces of evidence independently. Much the same as would, say, a scientist asked to review the validity of the methods in a study that reaches a surprising result. If you are too lazy to figure out the validity of the methods independently of consulting the consistency of the result w/ your priors, then please leave the community of scholars. If you just *can't* do it, that's a big problem too.

"Determine the LR independently of your priors."

I'm not sure what that means. Do you mean at each step of the process, so the prior is the posterior resulting from previous observations, or the prior to the whole process? And do you mean priors regarding the truth of the hypothesis, independent of your prior beliefs about the validity of the LR calculation model? What, as I say, do you do if the evidence bears on both?

" say a paternity test that is 0.95 accurate (LR = 19)"

LR = P(Obs|H)/P(Obs|¬H). Are you saying that if P(Obs|H) = 0.95 that P(Obs|¬H) must be 0.05? What does "0.95 accurate" really mean? Sorry, just being picky.

" she decides "screw it; it's just not worth the bother to think about the probative value of the next piece of independent evidence""

That's an interesting question. If a scientist publishes a result demonstrated with 95% confidence, and says the matter is settled and stops looking for further evidence, doesn't this have the same effect? Would any scientist do that?

I agree, it's not quite the same thing to stop looking as it is to be told that there's contrary evidence and ignore it, but there is nevertheless a fundamental question here: when - if ever - can you stop looking?

But to answer your point - what we have here is two pieces of evidence that according to our current method of calculating them, *both* have very strong LRs, but in opposite directions. The observation of both is incredibly unlikely, according to our model.

If the LR model is *absolutely certain*, e.g. it's given in the textbook exercise, then we simply have to accept that an incredibly unlikely event has happened, and the two bits of evidence cancel out. However, if the LR model is *not* certain, then we can consider our alternatives to be: "the LR model is correct and an incredibly unlikely event has happened" or "Our LR model is not correct", and we see that this is strong evidence that the LR model is not correct.
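With assumed numbers, the cancellation -- and the improbability that signals model trouble -- looks like this (the 0.95/0.05 figures and the independence assumption are illustrative, not from any real record):

```python
# Sketch with assumed numbers: two strong but opposing items of proof.
# Take P(match | father) = 0.95, P(match | not father) = 0.05 (LR = 19),
# and P(vasectomy record | father) = 0.05, P(record | not father) = 0.95
# (LR = 1/19), treating the items as independent given the hypothesis.

lr_test = 0.95 / 0.05      # 19: strongly favors paternity
lr_record = 0.05 / 0.95    # 1/19: strongly disfavors paternity
print(round(lr_test * lr_record, 6))  # 1.0 -- the two items cancel exactly

# But the joint observation is improbable under *either* hypothesis,
# which is the signal that the LR model itself may be wrong:
p_both_given_father = 0.95 * 0.05
p_both_given_not = 0.05 * 0.95
print(round(p_both_given_father, 4), round(p_both_given_not, 4))  # 0.0475 0.0475
```

Under a certain model the posterior simply reverts to the prior; under an uncertain one, that sub-5% joint probability is itself evidence against the model.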

Supposing we tentatively accept this second hypothesis as true, we now will ask whether it's the accuracy of the paternity test, or the effectiveness of the vasectomy, that we doubt the most? The former is backed up by a lot of tests and experiments measuring its accuracy. The latter is a messy medical question - did the surgeon do the operation correctly? What evidence do we have that this particular surgeon, operating on this particular patient, at this particular time and place, is as reliable a process as the carefully calibrated DNA test performed under controlled and audited laboratory conditions? Do you know?

Is the juror ignoring the strength of the second piece of evidence simply because they saw it second, or are they assessing the meta-evidence about the reliability of the evidence (i.e. the LR calculation model) and finding it less convincing? And even if they *are* simply picking whichever one came first - might this be a heuristic that actually *works* for some reason? Isn't it the case that older hypotheses tend to have survived more validity checks? If you don't *know* which has the stronger meta-evidence, isn't age a plausible and easily-determined heuristic to use?

I would agree that sticking doggedly to your first hypothesis whatever it is and bending any subsequent evidence to fit is a bad idea. But I'd say the same of doggedly applying your initial LR assumptions in the face of evidence that there's something wrong with them.

As I say, I don't know. I'm just raising the possibility that there might be more sense to cognitive biases than is apparent at first glance. Life is complicated.

@NiV:

Don't determine the LR based on our priors: don't set value of P(H|E)/P(H|~E) to current P(H).

A paternity test w/ LR = 19 is one in which the probability of a match conditional on the testee being the father is 19x greater than the probability of a match conditional on the testee *not* being the father. This is conventionally referred to as "0.95 accuracy" -- the same way a drug test that has LR = 19 w/r/t the hypothesis that the testee used drugs is called "0.95 accurate." Amazingly, it has been shown that a large percentage of physicians don't understand that when someone tests positive, say, for HIV, the test's "0.99 accuracy rate" (99 true positives to 1 false positive; 99 true negatives to 1 false negative -- they are about the same for this test) furnishes only a likelihood ratio -- & that if the test was administered to someone in random screening, the probability the individual has HIV is actually low (b/c the population frequency of HIV is so low; if the person isn't in a high-risk group, the likelihood he's a true positive under those circumstances is very very small).
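A quick sketch of that screening arithmetic, with an assumed prevalence of 1 in 1,000 (the prevalence figure is hypothetical, chosen only to make the base-rate effect visible):

```python
# The random-screening point above, with an assumed population prevalence.
# A "0.99 accurate" test: LR = P(+|HIV) / P(+|no HIV) = 0.99 / 0.01 = 99.
# Suppose prevalence in the screened population is 1 in 1,000.

prevalence = 1.0 / 1000.0
prior_odds = prevalence / (1.0 - prevalence)   # 1 : 999
lr = 0.99 / 0.01                               # 99

posterior_odds = prior_odds * lr               # 99 : 999
posterior_prob = posterior_odds / (1.0 + posterior_odds)
print(round(posterior_prob, 3))  # 0.09 -- a positive still leaves only ~9%
```

The "0.99 accuracy" scales the odds by 99, but starting from 1:999 odds the testee is still far more likely a false positive than a true one.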

There's nothing particularly surprising about an infertile man being identified as the father by a paternity test that has "0.95 accuracy," i.e., LR = 19. Of every 20 infertile men, 1 will be falsely identified as the father by such a test (one that until not so many yrs ago was standard -- & routinely admissible in evidence despite the inevitable confusion by jurors of 0.95 w/ "probability of paternity").

If in fact a man is infertile, then there is zero probability that he is the father, of course -- a probability that doesn't change when we know the result of the LR = 19 paternity test.

Likewise if he was on the opposite side of the planet when the child was conceived.

If jury thinks that it can discount an alibi or ignore evidence of infertility b/c it thinks the paternity test is "so strong," then it is really making a huge huge huge huge huge error.
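In odds form, the point is mechanical (assuming, as the argument does, that infertility entails a zero prior probability of paternity):

```python
# The infertility point in odds form: if the prior odds of paternity are
# zero (the man cannot be the father), no finite LR can move them.

prior_odds = 0.0          # infertile, or on the other side of the planet
lr_paternity_test = 19.0  # the "0.95 accurate" test
posterior_odds = prior_odds * lr_paternity_test
print(posterior_odds)  # 0.0 -- the test result cannot rescue a zero prior
```

Discounting the infertility evidence because the test "is so strong" gets the multiplication backwards: the LR scales the prior, it does not override it.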

It *is* exactly the very error that I am talking about -- as it would appear as a single instance of the pattern of reasoning I'm describing in general.

Also, I'm sure you recognize (but could easily be misunderstood to be saying otherwise) that P < 0.05 is definitely not equivalent to a likelihood ratio of 19; the two are not even commensurable. LRs are a metric for assessing the weight of the evidence in relation to a hypothesis -- a p-value definitely isn't. That most people who use statistics don't get this is a very sad thing indeed.

In any case, the time to "stop looking" for more evidence is when the marginal expected value of additional evidence = 0. LRs & p-values can't tell you that. You need to figure that out w/ an external normative framework that gives you a metric for cost of error & cost of more evidence. The metric implicit in a trial says that the jury certainly can't "stop" until all the evidence has been produced.

"Don't determine the LR based on our priors: don't set value of P(H|E)/P(H|~E) to current P(H)."

Did you mean P(E|H)/P(E|~H)?

"A paternity test w/ LR = 19 is one in which the probability of a match conditional on the testee being the father is 19x greater than the probability of a match conditional on the testee *not* being the father. This is conventionally referred to as "0.95 accuracy""

Interesting. That would imply a test that said "yes" 0.95% of the time for a father, and 0.05% of the time for a non-father, would count as "95% accurate". I have to say, I think that would be very confusing to the layman ("Less than 1% chance of correctly identifying the father?!"), and it's a usage I've not seen before. Nevertheless, we all live and learn.

"Of every 20 infertile men, 1 will be falsely identified as the father by such a test"

In the above situation, of every 2000 infertile men, one will be incorrectly identified as the father.

"If in fact a man is infertile, then there is zero probability that he is the father, of course"

Not true since the day they invented artificial insemination with frozen sperm. He may be infertile *now*...

There was a case in this country not so long ago where a woman who had recently had an acrimonious divorce from her husband nevertheless subsequently used previously frozen sperm of his to have a child, forging his signature on the permission slip. He was understandably annoyed, claiming he had in effect been 'forced' to conceive a child against his will. It's an interesting question whether he owes her patrimony payments...

"If jury thinks that it can discount an alibi or ignore evidence of infertility b/c it thinks the paternity test is "so strong," then it is really making a huge huge huge huge huge error."

It depends whether they think the evidence of infertility is equally strong. There are numerous reported cases of men conceiving after having had a vasectomy. It's a procedure with a known failure rate. (1 in 2000.)

"In any case, the time to "stop looking" for more evidence is when the marginal expected value of additional evidence = 0. LRs & p-values can't tell you that. You need to figure that out w/ an external normative framework that gives you a metric for cost of error & cost of more evidence."

Good point. You're quite right.

@NiV:

Yes, P(E|H)/P(E|~H) is LR.

I don't know what I'd tell someone about the "accuracy rate" of a test that correctly id's the true father 0.95% of the time and falsely ids a nonfather as father 0.05% of the time.

But I do know that if someone tested "positive" under such a test, that would give me exactly as much additional reason to view him as the father as would his testing "positive" under a test that correctly ids the true father 95% of the time and falsely ids a nonfather as father 5% of the time.
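That equivalence is easy to verify numerically, using the hypothetical 95%/5% and 0.95%/0.05% tests from the exchange above:

```python
# Two hypothetical tests with very different hit rates but the same LR,
# illustrating why a "positive" from either carries identical weight.

lr_common_test = 0.95 / 0.05      # ids father 95% of time, non-father 5%
lr_rare_test = 0.0095 / 0.0005    # ids father 0.95% of time, non-father 0.05%
print(round(lr_common_test, 6), round(lr_rare_test, 6))  # 19.0 19.0

# A "positive" from either test multiplies the prior odds by 19,
# even though the second test almost never says "positive" at all.
```

The two tests differ enormously in how often they fire, but not in what a firing proves -- which is why "accuracy rate" is a misleading name for the ratio.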

It's confusing to call an LR an "accuracy rate" no matter what P(E|H) and P(E|~H) are....