Sunday, October 31, 2010

Matthew Taylor: 21st century enlightenment

For the next few days I am going to be in a bunch of very busy meetings, so my posts will either be shorter (if I can manage that :)) or just different. Here is the first installment, very much apropos the events of the past week. Enjoy!

Saturday, October 30, 2010

Cognitive biases in medicine, part deux

In yesterday's post we summarized some of the conventional biases we guard against in clinical research, but then went on to open the Pandora's box of cognitive biases, which we do not look for routinely, as we have no good way of measuring them. The quartet we discussed yesterday is:
1. Interpretation bias: Our interpretation of the validity of data and of the data themselves relies upon our own preconceived judgments and notions of the field.
2. Confirmation bias: We tend to be less skeptical and therefore less critical of a study that supports what we think we know and understand than of one that goes against it.
3. Rescue bias: When a study does not conform to our preconceived ideas, we tend to find selective faults with the study in order to rescue our notions.
4. Auxiliary hypothesis bias: To summarize it briefly, this is the "wrong patient population and wrong dose" bias.

And although none of these is particularly surprising, especially given our predictably irrational human nature, they are certainly worth examining critically, particularly in the context of how we and our colleagues interpret data and how data are communicated to the public at large. It is necessary to acknowledge that these exist in order to understand the complexities of clinical science and its uses and misuses. So, let's move on to the three remaining types of bias that Kaptchuk talks about in his paper.

As if it were not enough that we will defend our preconceived notions tooth and nail, that we will make up stories for why they may have been refuted, and maintain that a different design would fix the universe, we also fall prey to the so-called plausibility or mechanism bias. What the heck is this? Simply put, "evidence is more easily accepted when supported by accepted scientific mechanisms". Well, no duh, as my kids would say. I fear that I cannot improve on the examples Ted cites in the paper, so I will just quote them:
For example, the early negative evidence for hormone replacement therapy would have undoubtedly been judged less cautiously if a biological rationale had not already created a strong expectation that oestrogens would benefit the cardiovascular system. Similarly, the rationale for antiarrhythmic drugs for myocardial infarction was so imbedded that each of three antiarrhythmic drugs had to be proved harmful individually before each trial could be terminated. And the link between Helicobacter pylori and peptic ulcer was rejected initially because the stomach was considered to be too acidic to support bacterial growth.
Of course, you say, this is how science continues on its evolutionary path. And of course, you say, we as real scientists are the first to acknowledge this fact. And while this is true, and we should certainly be saluted for this forthright admission and willingness to be corrected, the trouble is that this tendency does not become clear until it can be visualized through the 20/20 lens of the retrospectoscope. When we are in the midst of our debates, we frequently cite biologic plausibility or our current understanding of underlying mechanisms as a barrier to accepting not just a divergent result, but even a new hypothesis. I touched upon this when mentioning equipoise here, as well as in my exploration of Bayesian vs. frequentist views here. In fact, John Ioannidis' entire argument as to why most of clinical research is invalid, which I blogged about a few weeks ago, relies on this Bayesian kind of thinking about the established prior probability of an event. So entrenched are we in this type of thinking that we feel scientifically justified in pooh-poohing anything without an apparently plausible mechanistic explanation. And the experience cited above provides evidence that at least some of the time we will end up with egg on our faces. Of course, the trouble is that we cannot prospectively know what will turn out to be explainable and what will remain complete hoo-ha. My preferred way of dealing with this, one of many uncertainties, is simply to acknowledge the possibility. To do otherwise seems somehow, I don't know, arrogant?
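Since Ioannidis' argument hinges on prior probability, here is a minimal back-of-the-envelope sketch of that arithmetic. The numbers are purely illustrative assumptions (a 5% significance threshold, 80% power, a handful of made-up priors), and the calculation ignores bias, multiple comparisons and everything else that makes real research messier:

```python
# Sketch of the prior-probability argument: the chance that a "positive",
# statistically significant finding reflects a true effect depends heavily
# on how plausible the hypothesis was before the study was run.
def post_study_probability(prior, alpha=0.05, power=0.80):
    """Probability a significant result is true, given the prior probability."""
    true_positives = power * prior          # truly real effects detected
    false_positives = alpha * (1 - prior)   # null effects crossing p < 0.05
    return true_positives / (true_positives + false_positives)

for prior in (0.50, 0.10, 0.01):
    print(f"prior {prior:.2f} -> probability the finding is true: "
          f"{post_study_probability(prior):.2f}")
# prior 0.50 -> 0.94, prior 0.10 -> 0.64, prior 0.01 -> 0.14
```

The same p < 0.05 carries very different weight depending on how plausible the hypothesis was to begin with, which is exactly why mechanistic priors keep sneaking into how we read the evidence.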

All right, point made, let's move on to the colorfully named "time will tell" bias. This refers to the different thresholds we have for the amount of evidence needed to accept something as valid. This in and of itself would not seem all that objectionable, if it were not for a couple of features. The first is the feature of extremes (extremism?). An "evangelist" jumps on the bandwagon of the data as soon as they are out. The only issue here is that an evangelist is likely to have a conflict of interest, either financial or professional or intellectual, where there is some vested interest in the data. One of these vested interests, the intellectual, is very difficult to detect and measure, to the point that peer-reviewed journals do not even ask authors to disclose whether a potential for it exists. Yet, how can it not, when we build our careers on moving in a particular direction of research (yes, another illustration of its unidirectionality), and our entire professional (and occasionally personal) selves may be invested in proving it right? Anyhow, at the other extreme is the naysayer who needs large heaps of supporting data before accepting the evidence. And here as well, the same conflicts of interest abound, but in the direction away from what is being shown. To illustrate this, Kaptchuk gives one of my favorite quotes from Max Planck:
Max Planck described the “time will tell” bias cynically: “a new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it.”
The final bias we will explore is hypothesis and orientation bias. This is where the researcher's bias in the hypothesis affects how data are gathered in the study. The way I understand this type of bias makes me think that the infamous placebo effect fits nicely into this category, where, if not blinded to experimental conditions, we tend to see effects that we want to see. You would think that blinding would alleviate this type of a bias, yet, as Kaptchuk cites, even blinded studies are susceptible: RCTs sponsored by pharmaceutical companies, for example, are more likely to be positive than those that are not. To be fair (and he is), the anatomy of this discrepancy is not well understood. As he mentions, this may have something to do with a publication bias, where negative trials do not come to publication either because of methodological flaws or due to suppression of (or sometimes, with less malfeasance, even apathy about) negative data. I am not convinced that this finding is not simply due to the fact that folks who design trials for pharma are better at it than non-pharma folks, but that is a different discussion altogether.

So here we are, the end of this particular lesson. And lest we walk away thinking that on the one hand we know nothing or on the other, this is just a nutty guy and we should not pay attention, here is the concluding paragraph from the paper, which should be eerily familiar; I will let it be the final word.
I have argued that research data must necessarily undergo a tacit quality control system of scientific scepticism and judgment that is prone to bias. Nonetheless, I do not mean to reduce science to a naive relativism or argue that all claims to knowledge are to be judged equally valid because of potential subjectivity in science. Recognition of an interpretative process does not contradict the fact that the pressure of additional unambiguous evidence acts as a self regulating mechanism that eventually corrects systematic error. Ultimately, brute data are coercive. However, a view that science is totally objective is mythical, and ignores the human element of medical inquiry. Awareness of subjectivity will make assessment of evidence more honest, rational, and reasonable.

Friday, October 29, 2010

Cognitive biases in medicine

Since I have been so focused on the foibles of human interpretive abilities lately, I thought it might be a good time to review thoroughly some of the evidence for how we interpret scientific data. The interactions of the last few days have really put some of the misconceptions about clinical science front and center for me. And although people passionately claim to be objective and skeptical about scientific claims, the limitations of human neuro-wiring deceive us every time.

Let's take it from the top. When we examine any study, there are a number of questions that we ask before we even engage in incorporating its results in our meta-knowledge of the subject. Once we have determined that the study indeed asks a valid and important question, we focus on the methods. It is here that we talk about threats to validity, of which the traditional quartet consists of bias, confounding, misclassification and generalizability. Bias generally refers to a systematic error in the design, conduct or analysis of a study that gives us an erroneous estimate of the association between the exposure and the outcome. We had a great example of it right here on Healthcare, etc., over the last few days. If you did not know better, from reading the comments left by people in the last week, you would think that the general population consensus was that I was an idiot not worthy of a cogent debate. But in reality, once we dig a tad deeper, this was certainly not a random sample of comments, but one highly skewed toward the particular rhetoric encouraged on the blog of origin, and so not at all representative of the general population. This was a classic example of a selection bias, which, if not recognized, would have led to an erroneous interpretation. Confounding refers to factors that are related to both the exposure and the outcome. A classic example of a confounder is cigarette smoking as it impacts the relationship between drinking and the development of head and neck cancers. Misclassification is just that: classifying the primary exposure, a confounder or an outcome as being present when in fact it is not, or vice versa. I discussed the issue of misclassification in my recent post on astroturfing diseases. Finally, generalizability refers to whether or not the results apply to a broad swath of a population or only to some narrowly defined groups. In some ways, bias and generalizability can be related. So, my example of the comments I have received is also a good example of generalizability, where, since the commenters represented a narrowly selected group, their feelings are not broadly generalizable to the population at large.
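Since confounding is easier to see with numbers than with words, here is a toy simulation of the smoking, drinking and head-and-neck-cancer example above. Every prevalence and risk in it is invented purely for illustration:

```python
# Toy demonstration of confounding: smoking raises cancer risk and is more
# common among drinkers, while drinking itself does nothing. The crude
# comparison still makes drinking look harmful; stratifying by smoking fixes it.
import random

random.seed(0)
people = []
for _ in range(100_000):
    drinks = random.random() < 0.5
    smokes = random.random() < (0.6 if drinks else 0.2)    # confounder
    cancer = random.random() < (0.05 if smokes else 0.01)  # depends on smoking only
    people.append((drinks, smokes, cancer))

def risk(group):
    return sum(cancer for _, _, cancer in group) / len(group)

drinkers = [p for p in people if p[0]]
abstainers = [p for p in people if not p[0]]
print("crude risk ratio:", round(risk(drinkers) / risk(abstainers), 2))  # ~1.9, spurious

for smokes in (True, False):
    d = [p for p in drinkers if p[1] == smokes]
    a = [p for p in abstainers if p[1] == smokes]
    print("smokers" if smokes else "non-smokers",
          "risk ratio:", round(risk(d) / risk(a), 2))  # ~1.0 in each stratum
```

Within each smoking stratum the apparent "effect" of drinking disappears, which is the whole point of adjusting for a confounder.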

So, these are the beloved tangible pitfalls that we get bogged down in when evaluating studies. But there are other more insidious sources of bias that are only now beginning to surface in methodologists' discussions. So nefarious are they that we work hard to convince ourselves that we are not subject to them (but of course our colleagues are). Ted Kaptchuk at Harvard has done a lot of work in this area. A few words about Ted: if you look him up, his credentials come up as O.D.M., which stands for Oriental Doctor of Medicine. Ted did his ODM studies in a 5-year Oriental Medicine program in China. If this very fact discredits all of his work for you, dear reader, you are then a perfect example of some of the cognitive biases that he researches. So, if you can bring yourself to keep reading, you might get a greater insight into your aversion to some ideas, a reaction perhaps not quite as rational as you think.

Here is a paper that Dr. Kaptchuk published in BMJ in 2003, which summarizes well what we know about a type of cognitive bias called interpretation bias, and thus helps us guard against it. The first paragraph of the paper sets the stage:
Facts do not accumulate on the blank slates of researchers' minds and data simply do not speak for themselves. Good science inevitably embodies a tension between the empiricism of concrete data and the rationalism of deeply held convictions. Unbiased interpretation of data is as important as performing rigorous experiments. This evaluative process is never totally objective or completely independent of scientists' convictions or theoretical apparatus. This article elaborates on an insight of Vandenbroucke, who noted that “facts and theories remain inextricably linked… At the cutting edge of scientific progress, where new ideas develop, we will never escape subjectivity.” Interpretation can produce sound judgments or systematic error. Only hindsight will enable us to tell which has occurred. Nevertheless, awareness of the systematic errors that can occur in evaluative processes may facilitate the self regulating forces of science and help produce reliable knowledge sooner rather than later.
Does this sound familiar? I blogged about this a few months back, where I maintained that evidence accumulation tends to be unidirectional. Here is what I wrote:
The scientific community, based on some statistical and other methodological considerations has come to a consensus around what constitutes valid study designs. This consensus is based on a profound understanding of the tools available to us to answer the questions at hand. The key concept here is that of "available tools". As new tools become available, we introduce them into our research armamentarium to go deeper and further. What we need to appreciate, however, is that "deeper" and "further" are directional words: they imply the same direction as before, only beyond the current stopping point. This is a natural way for us to think, since even our tools are built on the foundation of what has been used previously. 
But enough self-quoting; back to Kaptchuk. Having identified the premise, he goes on:
The interaction between data and judgment is often ignored because there is no objective measure for the subjective components of interpretation. Taxonomies of bias usually emphasise technical problems that can be fixed. The biases discussed below, however, may be present in the most rigorous science and are obvious only in retrospect.
This is critical. So, the impact of our preconceived notions on how we interpret and assimilate new data is essentially ignored because we do not have a good way to measure it. Now, mind you, this tells us nothing of the magnitude of this potential impact, since we have not yet measured it, but merely that it is there.

In the next section of the paper he discusses the relationship between quality assessment and confirmation bias. What underlies confirmation bias is exactly our own preconceived notions of what is correct. In other words, we are likely to scrutinize more thoroughly the results that disagree with our understanding of the subject than those that agree with it:
This scrutiny, however, may cause a confirmation bias: researchers may evaluate evidence that supports their prior belief differently from that apparently challenging these convictions. Despite the best intentions, everyday experience and social science research indicate that higher standards may be expected of evidence contradicting initial expectations.
Does this sound familiar? Remember the USPSTF mammography recommendation debate? How about hormone replacement therapy? In colloquial terms, we are just much more likely to poke holes and reject anything that disagrees with what we think we know. This is one of the difficulties in advancing scientific knowledge, since all of us (myself included) are much more skeptical of that which contradicts than of what confirms our beliefs. How many of us have found ourselves saying "I do not believe these data" because they have contradicted our previously held notions? And how often do we nod vigorously and agree with the data that confirm them? To deny this is simply silly and dishonest. And if this is too anecdotal for you, Kaptchuk cites some experimental evidence that confirms this:
Two examples might be helpful. Koehler asked 297 advanced university science graduate students to evaluate two supposedly genuine experiments after being induced with different “doses” of positive and negative beliefs through false background papers. Questionnaires showed that their beliefs were successfully manipulated. The students gave significantly higher rating to reports that agreed with their manipulated beliefs, and the effect was greater among those induced to hold stronger beliefs. In another experiment, 398 researchers who had previously reviewed experiments for a respected journal were unknowingly randomly assigned to assess fictitious reports of treatment for obesity. The reports were identical except for the description of the intervention being tested. One intervention was an unproved but credible treatment (hydroxycitrate); the other was an implausible treatment (homoeopathic sulphur). Quality assessments were significantly higher for the more plausible version. 
The next two related cognitive biases, examined in the context of the expectation of a result, are called rescue and auxiliary hypothesis biases.
Experimental findings are inevitably judged by expectations, and it is reasonable to be suspicious of evidence that is inconsistent with apparently well confirmed principles. Thus an unexpected result is initially apt to be considered an indication that the experiment was poorly designed or executed. This process of interpretation, so necessary in science, can give rise to rescue bias, which discounts data by selectively finding faults in the experiment. Although confirmation bias is usually unintended, rescue bias is a deliberate attempt to evade evidence that contradicts expectation.  
One example of rescue bias cited by Kaptchuk was the letters to the editor generated by a vintage study in the 1970s that showed that a coronary artery bypass was no better than medical treatment among veterans in a randomized controlled trial. And the endpoint was the hardest one we have: death. And here is what's particularly telling about the rescue bias in these debates:
Instead of settling the clinical question, the trial spurred fierce debate in which supporters and detractors of the surgery perceived flaws that, they claimed, would skew the evidence away from their preconceived position. Each stakeholder found selective faults to justify preexisting positions that reflected their disciplinary affiliations (cardiology v cardiac surgeon), traditions of research (clinical v physiological), and personal experience.
And again, echoes of the ongoing mammography debate should come through loud and clear. And from these fiercely held views springs an additional and related cognitive bias, the auxiliary hypothesis bias. It is characterized by the mental contortions that we go through to come up with a set of different experimental conditions that would have resulted in a different outcome, one we hold dear. And this, folks, is exactly what is still happening in the HRT world, where the mammoth WHI randomized controlled trial discredited its use. Instead, the committed proponents are still lamenting the fact that the population was too old, the dose and composition of the therapy were wrong, etc., etc. And mind you, all of these hypotheses may be worth exploring, but my bet is that, had the trial results conformed to the prior expectations, the scrutiny of the design would be orders of magnitude weaker. Tell me, be honest now: did you engage in an auxiliary hypothesis bias when reading about the experiments conducted among students? Bet you did. This is how susceptible we are to this stuff!

Since this post is already getting a bit lengthy, I think that I will stop here for now, and keep you in suspense about other cognitive biases for a day or two. In the meantime, see if you can remember a time when you were guilty of interpretation, confirmation, rescue or auxiliary hypothesis bias yourself. Or perhaps you have never engaged in such self-deception, but your colleagues have for sure.

In the meantime, all of the stuff we talk about on this blog should really give the reader a picture of the complexity of ideas we are faced with every day. Neither the press nor academic reductionism can possibly capture this complexity fairly. So, again, we are faced with rampant uncertainties, which for me make the science ever more exciting. So, go forth and keep an open mind.

Thursday, October 28, 2010

Clarifying my views on chicken pox vaccination

First of all, let me say that the discussion with Dr. Novella and Orac has been partly enjoyable, as in sharpening my debate skill, and partly like a school yard brawl, which I will invariably lose because I am not interested in that kind of a zero-sum game. It is all well and good to say that your tone is your tone, but it is an altogether different matter when personal attacks are involved. I know, I started it. But did I really? Well, no matter, that is quite unimportant.

Here are a couple of interesting bits. First of all, go to this blog, where Eddy Jenner, a clinician in Australia, blogs about his encounters with EBM at the bedside -- he provides a fascinating, well-reasoned and well-read perspective. His most recent post is illuminating particularly in the context of the current discussion. It reminded me that I am much more interested in general in improving everyone's understanding of how we do clinical research than debunking alternative approaches. 

Next, I want to articulate why the chicken pox and HPV vaccines strike me as less straightforward than, say, the smallpox, polio or pertussis vaccines. There are 3 things to remember about medicine as a science:
1. With every intervention's benefit there is also a risk of an adverse event.
2. What we think we know today will be different in a decade.
3. It is the obligation of medicine first to do no harm.
None of the 3 statements is particularly controversial or new, and I think everyone can agree on them. So, what? 

Here is the so what. Since what we know today is necessarily incomplete, it is quite probable that there are many risks to our treatments that we simply do not yet have the data to understand, but that will become known in the future. Alas, we do not have a crystal ball to see what is coming down the pike, so being circumspect about what we are so sure about today should be the norm. For example, when the risk of serious complications from a disease is extremely high (think smallpox, polio, diarrheal diseases in Africa, etc.), then, if the treatment (vaccine) diminishes this risk substantially (how about no more smallpox?), and there is no immediate reason to think that the risk of that treatment is overwhelming, the risk-benefit equation is hard to tilt away from the benefit. However, when the risks of complications or death from a particular disease are not that high (relatively speaking, of course), then one really has to examine the risks of the intervention with a much finer lens. And indeed, here is a study on chicken pox vaccination and deaths showing that from the pre-vaccination period to the post-vaccination period deaths declined on average from 145 to 66 per year. That would be pretty good if there were absolutely no deaths associated with the vaccine itself. But invariably there are, and this is less a function of the vaccine's safety than of the human substrate that is being injected. In fact, if you check CDC Wonder's VAERS data, you will see that on average there are about 14 annual deaths that may be related to varicella vaccination. Now you may say that the balance of this intervention is in the direction of the benefit, and you will be correct. But the magnitude of this benefit makes me a bit cautious. If we were talking in purely scientific terms, I would worry about the possibility of a type I error, where the observed difference is really due to random chance. But we also know that we are not all rationality and science, and there is a lot of emotion involved in this debate. So, it is easy to see how these numbers lead to a variety of interpretations and opinions. And add to this the idea that it is not inconceivable that some other safety signal may come along in the future, and it becomes patently obvious why the issue is difficult to reduce to either ethics or idiocy when one is really committed to science and its principal dictum of "first do no harm".
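For what it is worth, here is the back-of-the-envelope tally implied by the numbers above. It uses only the round figures quoted in the text, ignores VAERS reporting biases, non-fatal outcomes and the uncertainty around every estimate, and so is a sketch of the reasoning rather than a real risk-benefit analysis:

```python
# Crude yearly tally using the figures cited above (illustrative only).
deaths_before_vaccine = 145    # average annual chicken pox deaths, pre-vaccine era
deaths_after_vaccine = 66      # average annual deaths, post-vaccine era
possibly_vaccine_related = 14  # average annual deaths reported to VAERS, as cited

deaths_averted = deaths_before_vaccine - deaths_after_vaccine    # 79
net_deaths_averted = deaths_averted - possibly_vaccine_related   # 65
print(f"deaths averted per year: {deaths_averted}")
print(f"net deaths averted per year: {net_deaths_averted}")
```

The balance still favors the vaccine, as the text concedes; the point is simply that the margin is far narrower than for something like smallpox, which is where the room for honest disagreement comes from.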

Wednesday, October 27, 2010

A teachable moment

There are two things I want to discuss in this post. Though at first glance seemingly unrelated, they will fit together in the end.

As I already mentioned here, I utterly delighted in Dan Ariely's erudite account of our predictable irrationality. In this book, he reminded me of a thought experiment initially conceived by the recently deceased philosopher Philippa Foot. The problem is usually referred to as "the trolley problem", and can be presented in several ways. Here is how Joshua Greene at Harvard formulates it:

   First, we have the switch dilemma:  A runaway trolley is hurtling down the tracks toward five people who will be killed if it proceeds on its present course. You can save these five people by diverting the trolley onto a different set of tracks, one that has only one person on it, but if you do this that person will be killed. Is it morally permissible to turn the trolley and thus prevent five deaths at the cost of one?   Most people say "Yes."

   Then we have the footbridge dilemma:  Once again, the trolley is headed for five people. You are standing next to a large man on a footbridge spanning the tracks. The only way to save the five people is to push this man off the footbridge and into the path of the trolley.  Is that morally permissible?  Most people say "No."

    These two cases create a puzzle for moral philosophers:  What makes it okay to sacrifice one person to save five others in the switch case but not in the footbridge case?  There is also a psychological puzzle here:  How does everyone know (or "know") that it's okay to turn the trolley but not okay to push the man off the footbridge?
Indeed, if the sum total of deaths is the same, how is it that one seems OK, while the other does not? Is this just a bunch of irrational emotional baloney, or is there science that can shed some light on it? Well, as it turns out, Greene has postulated the "dual process theory" of moral judgment. He has performed brain imaging to understand better what is happening physiologically, and here is what he is finding:
   According to my dual-process theory of moral judgment, our differing responses to these two dilemmas reflect the operations of at least two distinct psychological/neural systems.  On the one hand, there is a system that tends to think about both of these problems in utilitarian terms:  Better to save as many lives as possible.  The operations of this system are more controlled, perhaps more reasoned, and tend to be relatively unemotional.  This system appears to depend on the dorsolateral prefrontal cortex, a part of the brain associated with "cognitive control" and reasoning.

  On the other hand, there is a different neural system that responds very differently to these two dilemmas.  This system typically responds with a relatively strong, negative emotional response to the action in the footbridge dilemma, but not to the action in the switch dilemma.  
When this more emotional system is engaged, its responses tend to dominate people's judgments, explaining why people tend to make utilitarian judgments in response to the switch dilemma, but not in response to the footbridge dilemma.
So, here is the science to explain what we already know intuitively. It underscores just how difficult risk-benefit decisions are, and that if we only take into account our rational selves, we will always walk away puzzled as to why it is that there is controversy about stuff, such as, say, interventions in healthcare.

So, this is point one. Point two illustrates how counterproductive acrimony can be in trying to achieve some common ground. There is a response from Orac on Dr. Novella's site here, which calls me on the use of the adjective "rabid" to describe vaccination "defenders". Without explicitly stating it, he also continues to equate me with "anti-vaxers". But he does something else as well: he provides a link to a previous post that I did in the middle of the H1N1 pandemic. Please, indulge me for a moment while I quote myself:
The H1N1 pandemic is bringing into focus not only the world's vulnerabilities vis a vis the spread of an infectious disease, but also our complete lack of a framework in which to make rational choices about prevention. The cornerstone of preventive efforts for any significant infectious disease is vaccination of a large swath of the population. The rapid development and approval of the H1N1 vaccines is both a blessing and a curse, given the sophomoric level of discussion about its risks and benefits.
Does this look like I am anti-vaccination? Indeed, it looks to me, and this reflects my intellectual stance, that I am in fact advocating for vaccination, as well as for bringing our vaccine discussion to a more scientifically literate level. Feel free to read the entire post if you do not believe me. But what is obvious is that my name-calling resulted in people who consider themselves otherwise rational paying attention only to the insult. Thus, I negated the substance of my post with just one poorly chosen word. Is this a rational reaction to what is otherwise a perfectly pro-vaccination stance? I do not think so, but what it is is a human reaction, and in that it is perfectly understandable and, dare I say it, predictable.

In fact, the post in question ties together the two seemingly unrelated ideas, as promised at the beginning. Although admittedly not very skillfully, in my vaccination post I illustrated the trolley problem, where the accounting of the risk-benefit balance is not at all straightforward. I will probably address it more extensively in a future blog post. On the other hand, I also showed how a single insult in a 500-word essay can become the only take-away, thus detracting from the overall message. This is exactly the reason for both sides to stop the name-calling and engage in a discussion to arrive at some mutual understanding of what is really a worthwhile exchange of ideas.

Addressing the comments

A very lively discussion indeed! Thanks to everyone who contributed, and thanks for keeping it civil, mostly.

A couple of thoughts on some of the comments:
1. Craigmont: "Between the testable and the untestable; between science and woo; there can be no middle ground. You're going to have to pick a side."
Really? There is no middle ground? I have to choose a side? Really? I am pretty sure that this is not how we advance a discussion or knowledge. Seems to me that this is a good way to get elected to office these days, as well as to sell news. But does it really get us to a better, more advanced place? I think not.
2. Timm: "You may not have meant the term "allopathic medicine" as derogatory, but that is indeed, what it is. It is a slight, a slur, a marginalization. Many people these days recognize what it refers to and do not (as you) intend it as a slur, but it remains a belittling term."
OK, Timm, the point is clear, even though there is nothing that seems to suggest its derogatory nature other than who coined it. This notwithstanding, I am perfectly happy to respect your experience of the word and not use it in any way. But here is what I need from you: 1). Please, tell me what term you would like me to apply to Western medicine that differentiates it from others? and 2). Do you think that "woo" is a respectful way to refer to the other side? Or perhaps you think that "those people" do not deserve your respect, so it is OK to use derogatory shorthand. Either way, if we are interested in advancing the issue in some direction, there has to be a civil conversation between the opposing sides. So, I would suggest that we aim for that, and the way to start is to stop calling each other names, knowingly or not, as you pointed out.
3. Liz Ditz: "To me, high uptake rates for all vaccine-preventable illnesses, including those you characterize as minor (varicella or chicken pox) are a social justice issue. The social and economic cost burden of vaccine-preventable diseases falls disproportionately on those least able to pay for them: the poor and the working poor."
Liz, can you please say more about what you mean? I think that I understand, but want to be sure that we are talking about the same thing. Thanks.
4. Ian Monroe: "They are constantly talking about what a complicated process science is. And it is a process, not an answer. Of course the media is pretty much a four-letter word on their blog."
Yes, Ian, and I am constantly talking about it on my blog as well. And to me, even though I share a similar understanding of evidence with the SBM group, my conclusions are different. And they are not only my conclusions. And this is why it is important to have this conversation: in science, as in anything else, you can look at the same data and walk away with very different lessons. This is why I advocate an open-minded conversation between the opposing sides, rather than just throwing grenades at each other across an imaginary separating line.
5. Calli Arcale: "Homeopathy, in the traditional sense, should be harmless except insofar as it causes people to delay effective treatment. However, what is sold today is not strictly homeopathy in the traditional sense."
Dear Calli, can you show me a study or a body of evidence that indicates the presence of a delay, the magnitude of that delay and the real consequences of it (i.e., morbidity and mortality)? Something that can be specifically attributed to it, rather than to the patient's own reluctance to present to a physician for a work-up? In other words, I am interested in the attributable outcomes of what you are referring to. As for your second point (and this was brought up by several commenters), can someone tell me how the mechanism for recalling harmful products from the market is any different for these preparations than it is for conventional pharmaceuticals? Thanks.
6. Opit: "Sorry. You've lost."
Really? Somehow I do not feel like I've lost. Somehow I feel like I've won. The discussion is enriched, the tone is more constructive, and we are actually getting to the issues. Contrary to our political discourse, this is not a zero-sum game. My aim is not to walk away with the same opinions that I started out with or to make the chasm between us more prominent. My desire is to understand the issues better and to have a respectful conversation about stuff that we feel is important. So, while I thank you for your strategic advice, I will not be following it.
7. moderation: "As to the varicella vaccine, I think you have fallen victim to 'I have not seen it, so it must not exist' syndrome."
Dear moderation, while it seems to be the habit in the conversations that I have seen on other blogs to draw inferences about people, I would prefer it if you did not do so here. Most of the time, as I try to teach my students, these inferences will be incorrect. I think we should stick to the arguments that have been made, and if you want to extend it to inquiring as to whether or not I have fallen victim to denying the invisible (in this case a bad thing), please, inquire away first. Now you have to go back and come up with a different hypothesis, alas. And if you sense a little bit of snideness in my remark, please forgive me this transgression, as I am so wary of people assigning labels to others based only on what they want to see and not on what is really there.
8. Orac: "Ah, yes, the 'just asking questions' gambit."
Really? Come on! You don't mean that asking questions is anti-science, do you?

I am grateful to see that Dr. Novella has posted a response to my response. His response seems measured and invites a respectful discussion, and I particularly appreciate that. I will take the time later to address some of his points further in a different post. In the meantime, let the discussion continue. I would love the answers to my questions above, so that I can get educated further on these issues.

As ever, thanks!  

Monday, October 25, 2010

Furthering the discussion

My original intent was to go through Dr. Novella's and Orac's criticisms individually and take it from there. On second thought I decided not to take that approach. Instead, here is my response.

Firstly, I am grateful that there has been so much discussion about our views. Amid many valid points in their posts made with a skillful turn of phrase, I saw quite a lot of sarcasm as well. I am sure that the tone of my original post is what incited it, and for that I am sorry: I really do want to have a civil discussion about these ideas, as I realize that we are all learning all the time, and the only way to gain a better understanding of a topic is through discourse. So, again, I apologize for setting the confrontational tone, and will try to avoid it in the future.

I do believe our views are more similar than different. We both (the SBM group and I) understand that science evolves, that evidence is not stagnant, and that the sense of certainty frequently conveyed to the lay public by the media is oftentimes misplaced. We simply disagree on the extent to which there is uncertainty in the evidence. While it is true that the oft-cited 5-20% figure for the proportion of medical treatments having solid evidence behind them is very likely outdated, the kind of evidence we are talking about is a different matter.

In the hierarchy of evidence, depending on where you look, it is either the meta-analysis or the randomized controlled trial that is the gold standard. The latter is a great proof-of-concept tool, but it is necessarily limited in its external validity, or generalizability. The reason for this is that these trials, frequently done for regulatory purposes, enroll very limited types of patients, exert extremely stringent controls on the total care of the patient (or else are criticized for failing to do so), focus on short-term and surrogate outcomes (hence the use of cholesterol lowering as a marker for cardiac mortality, for example), and as a rule do a fairly abysmal job of considering the sources of heterogeneity of response. The interventions then come to market and are typically used in a much broader population than the RCT evidence covers. This paper, one of many in the same vein, is a nice illustration of a perennial problem with trial evidence, where real-world use of a therapy goes far beyond the available evidence. And although this paper addresses issues with evidence used for reimbursement, these are the same studies that feed guideline recommendations. Certainly meta-analyses, which are a way to combine the data from multiple RCTs in a systematic way, can, when done well, give us greater confidence in the direction and the magnitude of the treatment effect, but they in no way overcome the generalizability issues of their component RCTs.

The next rung of the evidence ladder is observational data, specifically cohort studies, first prospective, then retrospective. I am actually a great fan of observational data, as I have mentioned in the past. Cohort studies give us the opportunity to examine what happens in the real world without imposing the artificial conditions necessary in a clinical trial. Observational data can be great for describing the epidemiology of a particular disease, the frequency of a given exposure, and how different characteristics can modify the relationship between the exposure and the outcome. One of the most attractive features of cohort studies is that the population can be observed over a long period of time -- just look at the Nurses' Health Study, the Framingham Cohort, and others. But these types of studies also have important limitations, which are readily acknowledged: a heightened susceptibility to bias (especially in retrospective studies), the possibility of misclassifying important events, and, despite our best efforts to adjust for it, residual confounding. I will come clean and admit my affection for observational data, despite the fact that they are lower on the totem pole of evidence than an RCT. I really love this paper by Rothman and Greenland that takes a bird's eye view of our research debates. The whole paper really tickles the brain, but I will quote from a section of it briefly here:
Impossibility of Proof
Vigorous debate is a characteristic of modern scientific philosophy, no less in epidemiology than in other areas. Perhaps the most important common thread that emerges from the debated philosophies stems from 18th-century empiricist David Hume’s observation that proof is impossible in empirical science. This simple fact is especially important to epidemiologists, who often face the criticism that proof is impossible in epidemiology, with the implication that it is possible in other scientific disciplines. Such criticism may stem from a view that experiments are the definitive source of scientific knowledge. Such a view is mistaken on at least two counts. First, the nonexperimental nature of a science does not preclude impressive scientific discoveries; the myriad examples include plate tectonics, the evolution of species, planets orbiting other stars, and the effects of cigarette smoking on human health. Even when they are possible, experiments (including randomized trials) do not provide anything approaching proof, and in fact may be controversial, contradictory, or irreproducible. The cold-fusion debacle demonstrates well that neither physical nor experimental science is immune to such problems.

Some experimental scientists hold that epidemiologic relations are only suggestive, and believe that detailed laboratory study of mechanisms within single individuals can reveal cause–effect relations with certainty. This view overlooks the fact that all relations are suggestive in exactly the manner discussed by Hume: even the most careful and detailed mechanistic dissection of individual events cannot provide more than associations, albeit at a finer level. Laboratory studies often involve a degree of observer control that cannot be approached in epidemiology; it is only this control, not the level of observation, that can strengthen the inferences from laboratory studies. Furthermore, such control is no guarantee against error. All of the fruits of scientific work, in epidemiology or other disciplines, are at best only tentative formulations of a description of nature, even when the work itself is carried out without mistakes.
What follows in the hierarchy of evidence are case-control studies, done for some very specific reasons, then case reports and finally expert opinion. When evidence-based guidelines are developed, a comprehensive systematic literature review is undertaken, and all the evidence is examined and ranked. Based on these papers, a recommendation is made, and the strength of this recommendation is reported based on the quality of the underlying evidence. This is an arduous and costly process, and it is commendable that it is undertaken. At the same time, given the limitations of the components of the guideline, the final product can be quite inconclusive or even misleading (I hate to bring it up, but look at the screening mammography debate, as well as the recent HRT recommendation reversal). I think it is obvious that I believe in the scientific method; I am simply not convinced that we have done such a great job generating trustworthy evidence in many instances. At the same time, I am not totally nihilistic about what we know, but am somewhere between thinking we have good evidence for a lot of stuff and thinking we have none for anything at all.

Allow me one more piece of evidence, if you will, though it is merely anecdotal, coming from my dual experience as an author and a peer reviewer. I am occasionally floored by the quality of peer review. I have had reviews that say, on the one hand, that of course the paper should be accepted, since it comes from such a reputable group, and, on the other, that reject papers out of hand based on the reviewers' profound lack of understanding of the methods employed. And lest I sound like a crybaby, let me say that I welcome a well-reasoned rejection. What I am talking about is not that. And this is not a surprise, since pretty much anyone can sign up to be a peer reviewer: to the best of my knowledge, there is no set of qualifications that journals ask for in their reviewers. And this, so far as I know, applies even to such high-caliber publications as JAMA.

So, these are my thoughts on evidence-based medicine. I welcome responses to this, as my understanding of this science is constantly evolving, and differing well-reasoned opinions really help me get a better handle on this stuff.

I will try to tackle my CAM argument next. If I in any way implied in my remarks that I encourage allopathic physicians (by the way, I am not using the term in a derogatory way, but merely as it is defined here; in fact, until today I was blissfully unaware of its negative connotation) to be purveyors of CAM, I sincerely apologize. I am pretty sure, however, that I have never made such a statement, as this is not what I believe. My belief is that all modalities that may impact what happens to the public's health need to be evaluated for safety, no question. I think we both agree that, since there is really no reason to think that something like homeopathy contains anything that can help, by the same token there is no reason to believe that it contains anything that can hurt. Same with healing crystals, reiki and prayer. So, if a person wants to engage in these activities, and they are perfectly safe physically, be my guest. Other modalities, such as chiropractic, acupuncture, herbalism and the like, definitely need to be evaluated more stringently, as there is reason to think that they may cause harm. And the decisions about their use must be made based on the probability of harm vs. the perceived probability of benefit. Why do I think that these should not be regulated the same way as allopathic treatments? Well, herbs grow naturally, and I dare say we have little to say about what our patients grow in their back yards, unless of course the thought is to regulate them the way we do marijuana. As for chiropractic, it is already regulated, though to what extent I am not sure, and I would love to hear from someone who knows. Since its techniques most resemble surgical interventions, the level of evidence for them should perhaps be the same as that which we demand in the surgical literature. This is just a thought, and I am not sure that I am correct in this, so other views are, as always, welcomed. What is coming through for me is that perhaps my call to equipoise was a little over the top, as I do not seem to be approaching the above CAM issues in a frequentist way, but more in a Bayesian one (though I remain committed to equanimity). Yet, there is something to be said for the frequentist approach, even though it is not my way generally. The frequentist approach, which underlies the bulk of our traditional clinical research, does not rely on differential prior probabilities for different possible associations, but treats them all equally. Despite its many disadvantages, one obvious advantage is that we do not discount potential associations that lack biologic plausibility given our current understanding of biology, and this sometimes helps us stumble on brand new hypotheses. So, clearly, there is a tension here, and I am still working out what the better way is, if any.
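To make the frequentist-versus-Bayesian tension concrete, here is a hypothetical sketch of a Bayesian update. The likelihood ratio of 10 stands in for a single "positive" trial, and both it and the priors are invented purely for illustration:

```python
# The same positive result moves belief very differently depending on the
# prior plausibility we assign to the treatment (all numbers illustrative).
def posterior(prior, likelihood_ratio=10.0):
    """Update a prior probability with a likelihood ratio from one study."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

for label, prior in [("biologically plausible drug", 0.30),
                     ("homeopathic remedy", 0.001)]:
    print(f"{label}: prior {prior:.3f} -> posterior {posterior(prior):.3f}")
# biologically plausible drug: 0.300 -> 0.811
# homeopathic remedy: 0.001 -> 0.010
```

A strict frequentist reading treats both trials identically; the Bayesian reading, for better or worse, is the one most of us actually apply, which is exactly the tension described above.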

My final words will be about vaccination. It is disheartening to be lumped with "anti-vaxers", as has been done in the comments to Dr. Novella's and Orac's posts. While my bruised ego will survive this insult, I would like to question this assertion. Nowhere have I said that vaccinations are a bad idea or present a real danger to our children. The hype surrounding the vaccination-autism "debate" is abhorrent to me. What I have stated, however, is that I am of the opinion that we have gone a bit overboard with some of them, one being the chicken pox vaccine. Now, this does not make me an "anti-vaxer"; this just makes me a bit skeptical. The way I view the data is that the advantages of this vaccine are mostly economic, in that it prevents parents from missing days at work. Now, I am certainly not opposed to making such a vaccine available to parents who desire it, but I am not convinced that it should be a prerequisite for my kid to go to school. Given that there is always a possibility of an adverse reaction, no matter how small that possibility is, the risk has to be weighed very carefully against the benefit (and here I do not mean the benefit of having mom show up at work). And even though one of the comments raises the issue of an immunocompromised child worrying about potentially being exposed to chicken pox, given the known issues with breakthrough disease, I am not sure that immunizing all of his or her contacts would produce anything other than a false sense of security. My sentiment about the HPV vaccine is similar, to respond to another comment. For reasons laid out here and elsewhere on my blog, I am pretty convinced that it would be a complete subversion of the intent of vaccinations to make it into another mandated shot. To date the HPV vaccine is merely a recommendation, whose validity I am free to question, though last I heard there was movement afoot to make it mandatory. If in your informed opinion your daughter should get vaccinated against HPV, well then I have very little to say about it. But if I were to counsel an individual patient in the context of my understanding of the data, I would be very upfront with my view.

To me the fact that there are such heated debates about this stuff is a testament exactly to how NOT straightforward our science is. I do understand that as a researcher I can afford a certain amount of analysis paralysis that is unacceptable at the bedside. However, I think we (and the press) do a disservice to the patients, to ourselves and the science if we are not upfront about just how uncertain much of what we think we know is. I could not have said it better than this story about Dr. Devereaux's presentation at the recent ASA meeting did here:
It would be nice if we could all agree that science is not static, but rather progresses and regresses. We learn, and then find out that some of what we thought we had learned was wrong, and set about using that information to seek the next level of truth. Repeat, ad infinitum. Personally, I’d love it if my doctors couched every bit of advice with, “Here’s what we think we know today.”
But I suspect that wouldn’t sit well with many patients, who want certainty (as if there is such a thing). And it especially seems like a difficult proposition in our contentious society, where anti-science nay-sayers like to jump on contradictory findings to challenge the basic value of science overall.
That is exactly NOT what I am trying to do. I am merely reflecting on many of the issues that threaten the validity of what we think we know. I am confident that disclosure and transparency not only lead to better science, but they also lead to science that can withstand the test of nay-sayers.

I have come to the end. I am not sure that I have addressed each and every one of the criticisms, though I hope that I have addressed the majority; I am sure you will point out what I missed. A couple of things about comments: Tomorrow I will be in the air most of the day and may not have the opportunity to sign on to approve comments. So, please, if your comment does not go up until Wednesday, do not think that it has been rejected. Also, I really would like to keep it civil and, though I did not apply this rule today, I will not accept any overt insults or name calling from either side of the debate.

I am sincerely looking forward to continuing this discussion!                


Dr. Steve Novella responds

Excellent response by Dr. Steve Novella to my posts on paternalism and science-based medicine. He does such a great job cherry-picking my arguments that he finally concludes "Zilberberg’s position is anti-science, although perhaps not deliberately so".

But this is my favorite quote, the final paragraph of the post (emphasis mine):
One final note – I would much prefer to have a conversation with the critics of science-based medicine that does not constantly involve defending SBM and myself from false accusations of arrogance and paternalism. I think it says a lot about their intellectual position when that is constantly the best they have.
If the shoe fits?

But seriously, he does make some interesting points. He oversimplifies a lot of the science (HTE, or heterogeneity of treatment effects, is really not "medical student stuff"), and, while the distinction is admittedly not straightforward, the difference between absence of evidence and evidence of absence is important to understand.

Anyhow, I am eager for my readers to go and read this response and continue this discussion. Let us keep it civil and informative. In the end, it is possible that we may come to some mutual understanding, no?  

Sunday, October 24, 2010

Restoring medicine's social contract: Lessons from Dan Ariely

One of the treats I look forward to when traveling is buying a pop science book at the airport and devouring it during my flights. It provides a welcome change from my daily immersion in the highly technical lingo of my profession. Usually I am drawn to single-verb titles, such as “Blink”, “Nudge”, “Bonk”. But today was different, as I succumbed to the expository wiles of my favorite behavioral economist Dan Ariely and his “Predictably Irrational”. It is a great read for anyone who still holds on to the Cartesian illusion that man is the most rational being on our planet. So, there will be a few posts inspired by what I have learned, or rather by how my thinking about certain aspects of our existence has been enriched by Ariely’s work.


A few words about behavioral economics. It is a field born not so long ago from the work of Nobel laureate Daniel Kahneman and his long-time collaborator Amos Tversky, who applied cognitive psychology to how we make decisions under conditions of risk. Traditional neoclassical economic theory vigorously maintained (and to a large extent still does) that we are rational, yet something was amiss. If we were so rational, why did some of our most basic choices seem not so?


The current generation of behavioral economists, like Ariely, sits firmly at the nexus of economics and the behavioral sciences, and through creative and telling social experiments is extending our knowledge of human nature not only in the economic sense, but in our social and professional interactions as well. “Predictably Irrational” is geek brain candy, in which the author describes his personal and academic quest with poignancy and self-deprecating humor. His journey started abruptly with a devastating accident that left him with burns on 70% of his body. Through the ensuing three years of recovery he suffered many painful moments, leading him to question the assumptions made by his experienced clinicians about pain trade-offs. This tragedy turned into a boon for the rest of us by thrusting his creative mind into this exciting and burgeoning field.

I will not try to talk about each and every concept in his book; for this I urge you to purchase it yourselves – it is a terrific read. I do, however, want to apply some of his ideas to what I know and care about, and that is our healthcare system.

One aspect of Ariely’s research has focused on understanding the nuances of what he refers to as social vs. market interactions. Social interactions are ones where we interact without any overt financial incentives, while market interactions explicitly involve an exchange of money. One fascinating experiment described in the book illustrates several aspects of these interactions. It took place at a daycare center in Israel, where at baseline the parents would occasionally be late picking up their children without any penalty other than the guilt associated with keeping the teacher late. To discourage this behavior, a scheme was imposed whereby the parents were allowed to be late, but at a monetary cost commensurate with their lateness. The change in their behavior was profound: instead of the desired effect of reduced lateness, the parents now felt entitled to be late, since they were now engaged in a commercial relationship that justified the behavior. Even more striking was the effect of reversing the decision and going back to guilt as the driver of the desired behavior: the lateness of the market interaction persisted even after the arrangement had reverted to a social contract. Ariely sees this as an illustration of the difficulty of restoring a social context to what has been a market interaction, however briefly or temporarily.



Other examples of this distinction are described: companies that want to promote a social relationship with their employees are better off using gifts rather than checks or cash to reward performance and loyalty. Although the difference between these vehicles seems subtle, cash invariably turns things in the direction of a market interaction, fostering a sense of entitlement that overwhelms good will.

Let us now apply this to medicine in the US. Over the last century medicine went from a community-based cottage industry to a multi-trillion dollar behemoth. Back then it was not unusual for the doctor in the community to serve his patients’ needs not for a formal payment, but for a chicken or a sack of flour. Although not exactly gifts, these forms of payment avoided the formal exposure to dollars and cents, and this perhaps maintained the relationship as a hybrid of social and market interaction.

Today, the idea of bartering for medical services is laughable. The bureaucracy supported by healthcare financing, viewed by many pundits as a reliable engine of economic growth, would not do well under such a model. And as there is more and more scrutiny of individual clinician performance, there is also more and more talk of rewards and punishments, both of which take monetary shape. Extend the social vs. market dichotomy to this phenomenon and it becomes obvious why there is such rampant discontent in the ranks of physicians in particular: when money is the only feedback, the amount becomes a proxy for the worth of the service, and even for one’s sense of self-worth. Once this interaction is set up, the size of the payment determines not only what kind of service is provided, but how appreciated the clinician feels. No wonder so many are cynically eschewing Medicaid patients, and threatening to do the same with Medicare as their reimbursements dwindle.

But imagine for a moment a different system of healthcare. In this system the doc does not need to worry about maximizing her income. In this system there is a well-established floor and a well-established ceiling of salaries, with not too much distance in between. With this financial wild card taken out of the interaction, the clinician’s reward can become the practice itself. If reimbursement no longer serves as the proxy for one’s worth, perhaps the therapeutic relationship can enter the realm of a social, rather than market, contract. In this construct, the clinician can focus on achieving good relationships with her patients, and on promoting good health outcomes.

Am I oversimplifying? After all, how does such a system ensure quality and efficiency? Well, again, it is possible that rewards can come in the form of restored social status rather than zeros on a check. You will concede that there is a widespread perception, mistaken in my view, that physicians are paid too much for what they provide. The same field of behavioral economics tells us that such judgments are not made in a vacuum; perhaps we are setting up an erroneous comparison group. Do Fortune 500 CEOs get paid too much? I think so, yet we are in no way ready to limit their looting with regulation, and those executives’ incomes would make any self-respecting doctor blush. As a corollary, are these executives providing a vital service to society? We can certainly argue about that, but there is no argument about whether docs do. So, how much is too much? You tell me. My only point is that if we set a reasonable salary for our physicians (reasonable being defined not by government and industry bureaucrats in isolation from the physician community, but arrived at through vigorous and transparent debate), then these highly skilled individuals can go back to prioritizing their therapeutic relationships above all other incentives, thereby regaining their own sense of social self-worth and their status within the community.

You may say that I am naïve in thinking this. I do not think so, especially having understood better our predictably irrational ways. What can I say, even the name of the phenomenon, predictable irrationality, suggests that this scenario, so implausible to some, may be the very remedy that we all need to restore medicine's social contract.                     

Friday, October 22, 2010

Comparative effectiveness 101

I have to say I do not understand the opposition to including costs in the comparative effectiveness research (CER) equation. Perhaps I simplify too much, but here is how I think about it.

Comparing two therapies is like using a microscope to focus at different levels of depth. Starting from the lowest magnification, we can ask "do they both work?" Of course, in order to make this question answerable with data, we need to define what we mean by "work". Once we have defined that, the question becomes whether or not both comparators produce an effect in the same (desired) direction. If they do not, then the comparison can stop here, as the one with the positive direction of effect wins out. If they do both produce a desired effect, then we focus on the magnitude of that effect. There are several ways to do this, including comparison of 1) the effect size and variability of each against the other, 2) the proportion of patients who achieve a certain threshold of response (aka response rates), and 3) the frequency and severity of adverse events. If these parameters are identical between the two, then the decision clearly hinges on the cost. Why is this so hard to accept? We would not want to pay more money for an identical car, so why would we pay more for a drug?
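
To make this ladder of comparisons concrete, here is a minimal sketch in code. The therapy attributes and all of the numbers are made up purely for illustration; this is just the sequence of questions laid out above, not a formal CER algorithm.

```python
# A toy sketch of the comparison "ladder" described above.
# All attributes and numbers are hypothetical, for illustration only.
from dataclasses import dataclass

@dataclass
class Therapy:
    name: str
    works: bool                # does it produce an effect in the desired direction?
    effect_size: float         # magnitude of the desired effect
    response_rate: float       # proportion of patients reaching the response threshold
    adverse_event_rate: float  # frequency of adverse events
    cost: float                # cost per course of treatment

def prefer(a: Therapy, b: Therapy) -> str:
    # Step 1: direction of effect
    if a.works != b.works:
        return (a if a.works else b).name
    # Step 2: magnitude of effect (effect size, then response rate)
    if a.effect_size != b.effect_size:
        return (a if a.effect_size > b.effect_size else b).name
    if a.response_rate != b.response_rate:
        return (a if a.response_rate > b.response_rate else b).name
    # Step 3: adverse events
    if a.adverse_event_rate != b.adverse_event_rate:
        return (a if a.adverse_event_rate < b.adverse_event_rate else b).name
    # Step 4: everything else being equal, the decision hinges on cost
    return (a if a.cost <= b.cost else b).name

old_drug = Therapy("old drug", True, 0.30, 0.55, 0.10, 120.0)
new_drug = Therapy("new drug", True, 0.30, 0.55, 0.10, 480.0)
print(prefer(old_drug, new_drug))  # "old drug" -- identical except for the price
```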

Of course, life is never quite this simple. Most of the time we will find small differences that are amplified by marketing messages into a reason to prefer a particular therapy. And this is all well and good, and it is OK if a patient prefers one to the other because of some of these subtle differences. The question then becomes "how much are we willing to pay for a unit of this difference?" And unless the patient herself is willing to write the check, this is the pivotal issue facing our society today in the realm of the healthcare debate. So far our politicians have refused to do the real math, and are pandering to the opinion that we can pay for whatever we want in healthcare. This stance resonates with the public, terrified of fictional death panels, and, more importantly, with the business of medicine, as this approach provides ample fuel for that engine of economic growth. But does it really improve our health, the primary goal of healthcare? I think not; just look at the state of our health and compare it to the rest of the world. And second, what is it doing to our national budget?
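
The arithmetic behind "how much per unit of difference" is not exotic; it is essentially an incremental cost-effectiveness calculation. Here is a sketch with numbers I have invented purely for illustration:

```python
# Hypothetical numbers, for illustration only: drug B helps slightly more
# patients than drug A, but at a much higher price per course.
cost_a, response_a = 100.0, 0.20   # $ per course, proportion of responders
cost_b, response_b = 400.0, 0.22

incremental_cost = cost_b - cost_a              # $300 more per patient treated
incremental_benefit = response_b - response_a   # 0.02 extra responders per patient

print(incremental_cost / incremental_benefit)   # about $15,000 per additional responder
```

Whether $15,000 per additional responder is a bargain or an outrage is exactly the judgment we have so far refused to make in the open.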

I will concede that I have simplified a complex issue. But not by much, believe it or not. With a modicum of math literacy, people can easily wrap their brains around these concepts and make their own decisions, rather than being manipulated by the disingenuous obfuscations of politicians more concerned with stuffing their coffers with corporate money than with looking out for the well-being of the nation.

Thursday, October 21, 2010

Lies and more lies: Are all lies created equal?


The Atlantic article about John Ioannidis' research has sparked a lively debate about the trustworthiness of much of the evidence generated in science. Much of what is referenced is his 2005 PLoS paper, where, through a modeling exercise, he concludes that a shotgun approach to hypothesis generation is the formula for garbage-in, garbage-out data. The Atlantic article compelled me to search out the primary paper and, despite its dense language, get through it and see what it really says. Below is my attempt at a synthesis and interpretation of the salient points. My conclusion, as you will see, is that not all lies are in fact created equal.

As if the title, “Why Most Published Research Findings Are False”, were not incendiary enough, Ioannidis goes on in this oft-cited 2005 PLoS paper to provoke further with

“There is increasing concern that in modern research, false findings may be the majority or even the vast majority of published research claims [6–8]. However, this should not be surprising. It can be proven that most claimed research findings are false.”

He develops a model simulation to prove to us mathematically that this is the case. But first he rightfully criticizes the fact that we place no value on replicating prior findings:

“Several methodologists have pointed out [9–11] that the high rate of nonreplication (lack of confirmation) of research discoveries is a consequence of the convenient, yet ill-founded strategy of claiming conclusive research findings solely on the basis of a single study assessed by formal statistical significance, typically for a p-value less than 0.05. Research is not most appropriately represented and summarized by p-values, but, unfortunately, there is a widespread notion that medical research articles should be interpreted based only on p-values.”

He then briefly touches upon the fact that negative findings are also of importance (this is the huge issue of publication bias, which gets amplified in meta-analyses), but forgoes an extensive discussion of this in favor of explaining why we cannot trust the bulk of modern research findings.

“As has been shown previously, the probability that a research finding is indeed true depends on the prior probability of it being true (before doing the study), the statistical power of the study, and the level of statistical significance [10,11].”

He then uses the tedious yet effective and familiar lingo of epidemiology and biostatistics to develop his idea:

“Consider a 2 x 2 table in which research findings are compared against the gold standard of true relationships in a scientific field. In a research field both true and false hypotheses can be made about the presence of relationships. Let R be the ratio of the number of “true relationships” to “no relationships” among those tested in the field.”

So, let’s look at something that I know about – healthcare-associated pneumonia. We, and others, have shown that administering empiric antibiotics that do not cover the likely pathogens within the first 24 hours of hospitalization in this population is associated with a 2- to 3-fold increase in the risk of hospital death. So, the association is between antibiotic choice and hospital survival. Any clinician will tell you that this idea has a lot of biologic plausibility: get the bug with the right drug and you improve the outcome. It is also easy to justify based on the germ theory. Finally, it does not get any more “gold standard” than death. We also look at the bugs themselves to see if some are worse than others, at some of the process measures, and at how sick the patient is, both acutely and chronically. Again, it is not unreasonable to hypothesize that all of these factors influence the biology of host-pathogen interaction. So, again, if you are Bayesian, you are comfortable with the prior probability.

The next idea he puts forth is in my opinion the critical piece of the puzzle:   

“R is characteristic of the field and can vary a lot depending on whether the field targets highly likely relationships or searches for only one or a few true relationships among thousands and millions of hypotheses that may be postulated.”

To me what this says is that the more carefully we in any given field define what is probable prior to torturing the data, the more chance we have of being correct. Following through on his computation he derives this:

“Since usually the vast majority of investigators depend on α = 0.05, this means that a research finding is more likely true than false if (1 − β)R > 0.05.”

So, given the conventional α of 0.05 and the conventional power (1 − β) of 0.8, R has to be greater than 0.0625; that is, roughly 1 in every 16 hypothesized associations in a given field must be correct (with a badly underpowered study of only 20% power, R would have to exceed 0.25, or 1 in 4). This does not seem like an unreasonable proportion if we are invoking a priori probabilities instead of plunging into analyses head first. In the field of genomics, as I understand it, the shotgun approach to finding associations would alter this ratio in favor of a very large denominator of tested “no relationships”, thus making the probability that any claimed association is real much lower.
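
For those who, like me, find it easier to see a formula in action, here is a minimal sketch of the paper's positive predictive value (PPV) calculation, using its notation (α the significance level, β the type II error rate, so power is 1 − β). The R values plugged in below are my own illustrative guesses, not numbers Ioannidis reports.

```python
# PPV of a claimed research finding, per Ioannidis (2005):
#   PPV = (1 - beta) * R / (R - beta * R + alpha)
# A finding is more likely true than false when PPV > 0.5,
# which is equivalent to (1 - beta) * R > alpha.

def ppv(R, alpha=0.05, beta=0.2):
    """Probability that a 'statistically significant' finding is actually true."""
    return (1 - beta) * R / (R - beta * R + alpha)

print(ppv(0.0625))  # 0.5    -- the break-even point at 80% power and alpha = 0.05
print(ppv(0.25))    # 0.8    -- a field where 1 in 4 tested hypotheses is true
print(ppv(0.001))   # ~0.016 -- a shotgun search over thousands of hypotheses
```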

Do I disagree with his assertions about multiple comparisons, biases, fabrication, etc.? Of course not – these are well known and difficult to quantify or remedy. Should we work to get rid of them with more transparency? Absolutely! But does his paper really mean that all of what we think we know is garbage, based on his mathematical model? I do not think so.

As everyone who reads this blog by now knows, I do not believe that we can ever arrive at the absolute truth. We can only inch as close as our methods and interpretations will allow. That is why I am so enamored of Dave deBronkart’s term “illusion of certainty”. I do, however, think that we need to examine these criticisms reasonably and not throw the baby out with the bath water.

I would be grateful to hear others’ interpretations of the Ioannidis paper, as it is quite possible that I have missed or misinterpreted something very important. After all, we are all learning.