Let's take it from the top. When we examine any study, there are a number of questions we ask before we even engage in incorporating its results into our meta-knowledge of the subject. Once we have determined that the study indeed asks a valid and important question, we focus on the methods. It is here that we talk about threats to validity, of which the traditional quartet consists of bias, confounding, misclassification and generalizability.

Bias generally refers to a systematic error in the design, conduct or analysis of a study that gives us an erroneous estimate of the association between the exposure and the outcome. We had a great example of it right here on Healthcare, etc., over the last few days. If you did not know better, from reading the comments left by people in the last week, you would think that the general population consensus was that I was an idiot not worthy of a cogent debate. But in reality, once we dig a tad deeper, this was certainly not a random sample of comments, but one highly skewed toward the particular rhetoric encouraged on the blog of origin, and so not at all representative of the general population. This was a classic example of selection bias, which, if not recognized, would have led to an erroneous interpretation.

Confounding refers to factors that are related to both the exposure and the outcome. A classic example of a confounder is cigarette smoking, as it impacts the relationship between drinking and the development of head and neck cancers.

Misclassification is just that: classifying the primary exposure, a confounder or an outcome as present when in fact it is not, or vice versa. I discussed the issue of misclassification in my recent post on astroturfing diseases.

Finally, generalizability refers to whether or not the results apply to a broad swath of a population or only to some narrowly defined groups. In some ways, bias and generalizability can be related.
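To make confounding concrete, here is a toy simulation in the smoking/drinking/cancer spirit of the example above. All of the probabilities are invented for illustration: in this sketch, smoking raises the chances of both drinking and cancer, while drinking itself has no effect on cancer at all — yet a crude comparison makes drinkers look sicker, and the illusion disappears once we stratify by the confounder.

```python
import random

random.seed(0)

# Toy model (all probabilities invented for illustration):
# smoking raises the probability of both drinking and cancer;
# drinking has NO causal effect on cancer in this model.
def simulate(n=100_000):
    people = []
    for _ in range(n):
        smokes = random.random() < 0.3
        drinks = random.random() < (0.7 if smokes else 0.2)
        cancer = random.random() < (0.05 if smokes else 0.01)
        people.append((smokes, drinks, cancer))
    return people

def cancer_rate(people, drinks):
    """Cancer rate among drinkers (drinks=True) or non-drinkers."""
    group = [p for p in people if p[1] == drinks]
    return sum(p[2] for p in group) / len(group)

people = simulate()

# Crude comparison: drinkers appear to develop cancer more often...
print(cancer_rate(people, True), cancer_rate(people, False))

# ...but within each smoking stratum the apparent drinking effect vanishes.
for smokes in (True, False):
    stratum = [p for p in people if p[0] == smokes]
    print(smokes, cancer_rate(stratum, True), cancer_rate(stratum, False))
```

Stratification is of course only the simplest way to handle a known confounder; the point of the sketch is merely that an association can be entirely manufactured by a third variable.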
So, my example of the comments I have received is also a good example of generalizability: since the commenters represented a narrowly selected group, their feelings are not broadly generalizable to the population at large.
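The comment-thread example can be sketched the same way. In this toy model (numbers invented), only a small minority of the readership disagrees with the author, but disagreeing readers are far more likely to leave a comment — so the comment thread wildly overstates the disagreement.

```python
import random

random.seed(1)

# Toy model (all numbers invented): 20% of readers disagree,
# but disagreeing readers comment 10% of the time vs 0.5% for the rest.
readers = ["disagree" if random.random() < 0.2 else "agree"
           for _ in range(100_000)]

def comments_from(readers):
    """Self-selected sample: commenting probability depends on opinion."""
    rates = {"disagree": 0.10, "agree": 0.005}
    return [r for r in readers if random.random() < rates[r]]

comments = comments_from(readers)

def disagree_share(sample):
    return sample.count("disagree") / len(sample)

print(disagree_share(readers))   # true population sentiment, about 20%
print(disagree_share(comments))  # what the thread suggests, far higher
```

The estimator itself is perfectly honest; it is the selection mechanism feeding it that produces the distortion — which is why unrecognized selection bias cannot be fixed by collecting more of the same comments.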
So, these are the beloved tangible pitfalls that we get bogged down in when evaluating studies. But there are other, more insidious sources of bias that are only now beginning to surface in methodologists' discussions. So nefarious are they that we work hard to convince ourselves that we are not subject to them (but of course our colleagues are). Ted Kaptchuk at Harvard has done a lot of work in this area. A few words about Ted: if you look him up, his credentials come up as O.M.D., which stands for Doctor of Oriental Medicine. Ted did his O.M.D. studies in a 5-year Oriental Medicine program in China. If this very fact discredits all of his work for you, dear reader, you are then a perfect example of some of the cognitive biases that he researches. So, if you can bring yourself to keep reading, you might get a greater insight into your aversion to some ideas, a reaction perhaps not quite as rational as you think.
Here is a paper that Dr. Kaptchuk published in BMJ in 2003, which summarizes well what we know about a type of cognitive bias called interpretation bias, and thus helps us guard against it. The first paragraph of the paper sets the stage:
Facts do not accumulate on the blank slates of researchers' minds and data simply do not speak for themselves. Good science inevitably embodies a tension between the empiricism of concrete data and the rationalism of deeply held convictions. Unbiased interpretation of data is as important as performing rigorous experiments. This evaluative process is never totally objective or completely independent of scientists' convictions or theoretical apparatus. This article elaborates on an insight of Vandenbroucke, who noted that “facts and theories remain inextricably linked… At the cutting edge of scientific progress, where new ideas develop, we will never escape subjectivity.” Interpretation can produce sound judgments or systematic error. Only hindsight will enable us to tell which has occurred. Nevertheless, awareness of the systematic errors that can occur in evaluative processes may facilitate the self regulating forces of science and help produce reliable knowledge sooner rather than later.

Does this sound familiar? I blogged about this a few months back, where I maintained that evidence accumulation tends to be unidirectional. Here is what I wrote:
The scientific community, based on some statistical and other methodological considerations, has come to a consensus around what constitutes valid study designs. This consensus is based on a profound understanding of the tools available to us to answer the questions at hand. The key concept here is that of "available tools". As new tools become available, we introduce them into our research armamentarium to go deeper and further. What we need to appreciate, however, is that "deeper" and "further" are directional words: they imply the same direction as before, only beyond the current stopping point. This is a natural way for us to think, since even our tools are built on the foundation of what has been used previously.

But enough self-quoting; back to Kaptchuk. Having identified the premise, he goes on:
The interaction between data and judgment is often ignored because there is no objective measure for the subjective components of interpretation. Taxonomies of bias usually emphasise technical problems that can be fixed. The biases discussed below, however, may be present in the most rigorous science and are obvious only in retrospect.

This is critical. So, the impact of our preconceived notions on how we interpret and assimilate new data is essentially ignored because we do not have a good way to measure it. Now, mind you, this tells us nothing of the magnitude of this potential impact, since we have not yet measured it, but merely that it is there.
In the next section of the paper he discusses the relationship between quality assessment and confirmation bias. What underlies confirmation bias is exactly our own preconceived notions of what is correct. In other words, we are likely to scrutinize more thoroughly the results that disagree with our understanding of the subject than those that agree with it:
This scrutiny, however, may cause a confirmation bias: researchers may evaluate evidence that supports their prior belief differently from that apparently challenging these convictions. Despite the best intentions, everyday experience and social science research indicate that higher standards may be expected of evidence contradicting initial expectations.

Does this sound familiar? Remember the USPSTF mammography recommendation debate? How about hormone replacement therapy? In colloquial terms, we are just much more likely to poke holes in and reject anything that disagrees with what we think we know. This is one of the difficulties in advancing scientific knowledge, since all of us (myself included) are much more skeptical of that which contradicts our beliefs than of that which confirms them. How many of us have found ourselves saying "I do not believe these data" because they contradicted our previously held notions? And how often do we nod vigorously and agree with the data that confirm them? To deny this is simply silly and dishonest. And if this is too anecdotal for you, Kaptchuk cites some experimental evidence that confirms this:
Two examples might be helpful. Koehler asked 297 advanced university science graduate students to evaluate two supposedly genuine experiments after being induced with different “doses” of positive and negative beliefs through false background papers. Questionnaires showed that their beliefs were successfully manipulated. The students gave significantly higher ratings to reports that agreed with their manipulated beliefs, and the effect was greater among those induced to hold stronger beliefs. In another experiment, 398 researchers who had previously reviewed experiments for a respected journal were unknowingly randomly assigned to assess fictitious reports of treatment for obesity. The reports were identical except for the description of the intervention being tested. One intervention was an unproved but credible treatment (hydroxycitrate); the other was an implausible treatment (homoeopathic sulphur). Quality assessments were significantly higher for the more plausible version.

The next two related cognitive biases, examined in the context of the expectation of a result, are called rescue bias and auxiliary hypothesis bias:
Experimental findings are inevitably judged by expectations, and it is reasonable to be suspicious of evidence that is inconsistent with apparently well confirmed principles. Thus an unexpected result is initially apt to be considered an indication that the experiment was poorly designed or executed. This process of interpretation, so necessary in science, can give rise to rescue bias, which discounts data by selectively finding faults in the experiment. Although confirmation bias is usually unintended, rescue bias is a deliberate attempt to evade evidence that contradicts expectation.

One example of rescue bias cited by Kaptchuk was the letters to the editor generated by a vintage study in the 1970s, which showed in a randomized controlled trial that coronary artery bypass was no better than medical treatment among veterans. And the endpoint was the hardest one we have: death. And here is what's particularly telling about the rescue bias in these debates:
Instead of settling the clinical question, the trial spurred fierce debate in which supporters and detractors of the surgery perceived flaws that, they claimed, would skew the evidence away from their preconceived position. Each stakeholder found selective faults to justify preexisting positions that reflected their disciplinary affiliations (cardiologist v cardiac surgeon), traditions of research (clinical v physiological), and personal experience.

And again, echoes of the ongoing mammography debate should come through loud and clear. And from these fiercely held views springs an additional and related cognitive bias, the auxiliary hypothesis bias. It is characterized by the mental contortions we go through to come up with a set of different experimental conditions that would have resulted in a different outcome, one held dear by us. And this, folks, is exactly what is still happening in the HRT world, where the mammoth WHI randomized controlled trial discredited its use. Instead, the committed proponents are still lamenting that the population was too old, that the dose and composition of the therapy were wrong, etc., etc. And mind you, all of these hypotheses may be worth exploring, but my bet is that, had the trial results conformed to the prior expectations, the scrutiny of the design would be orders of magnitude weaker. Tell me, be honest now: did you engage in an auxiliary hypothesis bias when reading about the experiments conducted among students? Bet you did. This is how susceptible we are to this stuff!
Since this post is already getting a bit lengthy, I think that I will stop here for now, and keep you in suspense about other cognitive biases for a day or two. In the meantime, see if you can remember a time when you were guilty of interpretation, confirmation, rescue or auxiliary hypothesis bias yourself. Or perhaps you have never engaged in such self-deception, but your colleagues have for sure.
In the meantime, all of the stuff we talk about on this blog should really give the reader a picture of the complexity of ideas we are faced with every day. Neither the press nor academic reductionism can possibly capture this complexity fairly. So, again, we are faced with rampant uncertainties, which for me make the science ever more exciting. So, go forth and keep an open mind.