Wednesday, September 29, 2010

Disruptive innovation in healthcare: Overcoming HTE

I have been working on a talk for the American College of Chest Physicians (Chest) annual meeting. The session is on alternative research study designs, and I chose to talk about N of 1 trials. I think you can get an idea of why I chose this topic from reading my previous posts here, here and here. Doing my research, I came upon a term, "heterogeneous treatment effect", or HTE, that is well worth exploring. I think that every clinician who has ever seen patients is familiar with this effect, but let us trace its explanation.

This excellent article in the UK Independent summarizes the premise with some scathing comments from none other than GSK's chief geneticist, Dr. Allen Roses:
"The vast majority of drugs - more than 90 per cent - only work in 30 or 50 per cent of the people," Dr Roses said. "I wouldn't say that most drugs don't work. I would say that most drugs work in 30 to 50 per cent of people. Drugs out there on the market work, but they don't work in everybody."
There is even a table presented with response rates by therapeutic area, though the reference(s) is(are) not cited, so, please, take with a grain of salt:
Response rates
Therapeutic area: drug efficacy rate in per cent
  • Alzheimer's: 30
  • Analgesics (Cox-2): 80
  • Asthma: 60
  • Cardiac Arrhythmias: 60
  • Depression (SSRI): 62
  • Diabetes: 57
  • Hepatitis C (HCV): 47
  • Incontinence: 40
  • Migraine (acute): 52
  • Migraine (prophylaxis): 50
  • Oncology: 25
  • Rheumatoid arthritis: 50
  • Schizophrenia: 60
In essence, what Dr. Roses was referring to is the phenomenon of HTE, described aptly by Kent et al. as the fact that "the effectiveness and safety of a treatment varies across the patient population". The authors preface it by saying that 
Although “evidence-based medicine” has become the dominant paradigm for shaping clinical recommendations and guidelines, recent work demonstrates that many clinicians’ initial concerns about “evidence-based medicine” come from the very real incongruence between the overall effects of a treatment in a study population (the summary result of a clinical trial) and deciding what treatment is best for an individual patient given their specific condition, needs and desires (the task of the good clinician). The answer, however, is not to accept clinician or expert opinion as a replacement for scientific evidence for estimating a treatment’s efficacy and safety, but to better understand how the effectiveness and safety of a treatment varies across the patient population (referred to as heterogeneity of treatment effect [HTE]) so as to make optimal decisions for each patient.
Ah, so it is not your imagination: when someone brings an evidence-based guideline to you and insists that unless you comply 95% of the time you are providing less than great quality of care, and you say "this does not represent my patients", you are actually not crazy. To be sure, a good EBPG will apply to most patients encountered with the particular condition. But the devil, of course, is in the details. As I have already pointed out, we impose statistical principles onto data to whip them into submission. When we do a good job, we acknowledge the limitations of what measures of central tendency provide us with. But I see physicians relying on the p value alone to compare effects so often that I am convinced the variation around the center is mostly lost on us. And further, how does this variance help a clinician faced with an individual patient, who has at best a probability of response somewhere on a continuum of a population of probabilities? And more importantly, what will this individual patient's risk-benefit balance be for a particular therapy?
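To make the variation-around-the-center point concrete, here is a minimal simulation sketch in Python (entirely made-up numbers, not from any trial): a hypothetical drug that does nothing for most patients can still produce a convincingly "significant" average benefit, which is exactly the kind of summary result behind which HTE hides.

# A minimal sketch of heterogeneous treatment effect: a hypothetical drug helps
# ~40% of patients and does nothing for the rest, yet the trial-level average
# comparison looks uniformly and "significantly" positive.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 500  # hypothetical patients per arm

# Control arm: change in symptom score is pure noise around zero
control = rng.normal(loc=0.0, scale=1.0, size=n)

# Treatment arm: 40% "responders" improve by 2 points, the rest do not respond
responder = rng.random(n) < 0.40
treated = rng.normal(loc=0.0, scale=1.0, size=n) + np.where(responder, 2.0, 0.0)

t_stat, p_value = stats.ttest_ind(treated, control)
print(f"average benefit (treated - control): {treated.mean() - control.mean():.2f}")
print(f"p-value for the average effect:      {p_value:.2e}")
print(f"treated patients who actually responded: {responder.mean():.0%}")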

I think what I am walking away with after thinking about this issue is that it is of utmost importance to understand what kind of data have gone into a recommendation. What is the degree of HTE in the known research, and specifically, what is known about the population that your patient represents? The less HTE and the more knowledge about the specific subgroups, the more confident you can be that the therapy will work. Ultimately, however, each patient is a universe unto herself, since no two people will share the same genetics, environmental exposures, chronic condition profile or other treatments, to name just a few potential characteristics that may impact response to therapy.

This is the reason that we need better trials, in which people are represented more broadly, increasing external validity. To make this information useful at the bedside, we need a priori plans to analyze many different subgroups, as that will give clinicians at least some of the granularity so desperately needed in the office. And while pharmacogenomics may be helpful, I am sure that it will not be the panacea that reduces all of this complexity to zero.

Until technology gives us a better way (assuming that it will), where possible, a systematic approach to treatment trials should be undertaken. Later I will blog about N of 1 trials, which, though not appropriate in every situation, may be quite helpful in optimizing treatment in some chronic conditions. With the advent of health IT, these trials may become less daunting and, in aggregate, provide some very useful generalizable information on what happens in the real world. Each clinician will need to take some ownership in advancing our collective understanding of the diseases s/he treats. This may truly be the disruptive innovation we are all looking for to improve the quality of care, not just to please the bureaucrats, but to promote better health and quality of life.

 



Monday, September 27, 2010

Does disproving a statistical null automatically render the clinical null disproved?

A good friend of mine lost her mother to pancreatic cancer recently. The whole process from diagnosis to her death took 6 weeks. And despite wonderful care from a palliative medicine team, the process proved grueling to her family. And no wonder: how do you assimilate a loved one's going from healthy to dead in six weeks? Of course, my friend's family made all the right choices, forgoing aggressive treatment in favor of maximizing their mother's comfort and quality of life. Their experience made me think of the new generation of cancer treatments, experienced by my father in his dying days, and how it all fits in the healthcare debate. 

Much of the debate around healthcare reform has centered on the unsustainable cost trajectory, as well as on the value of evidence in improving the quality and effectiveness of care. Congress has appropriated a substantial sum for comparative effectiveness research (CER), intended to provide data comparing one treatment to another rather than just a treatment to a placebo. This in turn has sparked debate as to whether cost-effectiveness should be included in the comparison. I am definitely in the camp that believes it should be, for the reasons so eloquently outlined by Weinstein and Skinner. At the same time, this MSNBC article made me re-evaluate its need, at least in some special cases. Let's peel back the cabbage, specifically looking at Tarceva in advanced pancreatic cancer, exactly what my friend's mother was diagnosed with.

Tarceva, or erlotinib, is a kinase inhibitor manufactured by Genentech, indicated for the treatment of some cases of non-small cell lung carcinoma and, most recently, approved by the FDA for advanced pancreatic cancer. Reading the package insert, it becomes clear that the FDA-approved 100 mg dose of this drug, when given in combination with gemcitabine, prolongs median survival by less than 2 weeks, from 6 months in the gemcitabine+placebo arm to 6.4 months in the gemcitabine+Tarceva arm, at p=0.028. This difference imparts statistical significance at the conventionally set p<0.05, and therefore renders Tarceva better than placebo. Period.

Delving a tad more deeply into the peer-reviewed publication of the phase 3 trial, one gleans a few other facts. I quote:
Survival and Response

The final analysis was conducted after 486 deaths (239 on erlotinib and gemcitabine and 247 on placebo and gemcitabine). Overall survival was significantly longer in the erlotinib and gemcitabine arm with an estimated HR of 0.82 (95% CI, 0.69 to 0.99; P = .038; log-rank test stratified for performance status, extent of disease, and pain score at baseline; Fig 1A). Median survival times were 6.24 months versus 5.91 months for the erlotinib and gemcitabine versus placebo and gemcitabine groups with 1-year survival rates of 23% (95% CI, 18% to 28%) and 17% (95% CI, 12% to 21%), respectively (P = .023). A multivariate Cox regression analysis showed that erlotinib treatment (HR, 0.82; 95% CI, 0.69 to 0.99; P = .04) and female sex (P = .03) were significantly associated with longer overall survival. While there was an imbalance in male:female ratio between the arms, the treatment effect remains significant when adjusted for sex.
[Figure 1 from the paper: Kaplan-Meier curves for overall (A) and progression-free (B) survival]
 Results of subgroup analyses of survival by baseline stratification factors and other factors such as sex, race, pain intensity score, and age are displayed in Figure 2.
[Figure 2 from the paper: subgroup analyses of survival]

Progression-free survival was significantly longer in the erlotinib and gemcitabine arm than the placebo and gemcitabine arm with an estimated HR of 0.77 (95% CI, 0.64 to 0.92; P = .004; log-rank test stratified for performance status, extent of disease, and pain score at baseline; median, 3.75 months v 3.55 months; Fig 1B).
So, what do we have overall? We have a hazard ratio of dying that very nearly crosses 1.0, thus coming perilously close to not disproving the null; we have a prolongation of median survival by 1/3 of a month, and a progression-free median survival prolongation by 1/5 of a month.

But given my fondness for Gould's "The Median Isn't the Message" essay, let's practice full disclosure and look at the tail of the Kaplan-Meier curves above. As the text points out,
...1-year survival rates of 23% (95% CI, 18% to 28%) and 17% (95% CI, 12% to 21%), respectively (P = .023).
Indeed, these are significant differences, both statistically and clinically. Within the trial this represents a difference in favor of survival for 18 additional patients, which by my back-of-the-envelope calculation renders the cost roughly $1.5 million for 1 year of life saved. Of course, this difference is not adjusted for confounders, so it is difficult to say whether the number is real or under- or over-estimated. Because the adjusted analysis is given as a hazard ratio of death, I cannot calculate the corresponding adjusted cost.
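For transparency, here is the bare arithmetic behind that back-of-the-envelope figure, written out as a Python sketch. It uses only the numbers quoted above (18 extra 1-year survivors and my rough $1.5 million per life-year); the implied total incremental spend is derived from them, not taken from the trial.

# Back-of-the-envelope cost-effectiveness arithmetic using the figures above.
def cost_per_life_year(total_incremental_cost, extra_survivors, years_gained_each=1.0):
    """Crude cost-effectiveness: total extra spend divided by life-years gained."""
    return total_incremental_cost / (extra_survivors * years_gained_each)

# Inverting the estimate above: 18 extra 1-year survivors at ~$1.5 million per
# life-year implies a total incremental spend on the order of $27 million.
implied_total_cost = 1_500_000.0 * 18
print(f"implied total incremental cost: ${implied_total_cost:,.0f}")
print(f"check: ${cost_per_life_year(implied_total_cost, 18):,.0f} per life-year")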

So, for me this raises the following question (and I would love to hear the thoughts of my colleagues who proudly proclaim being science-based and thus eschew the placebo effect as a valid way to get a therapeutic response): Is what we are seeing here real, or is this in fact equivalent to a placebo effect? Is the median survival prolongation of less than 2 weeks indeed a real effect that means something to the patient and the clinician, or is it just clinical noise, if you will? In other words, should "disproving" the statistical null by default disprove the clinical null? Or does the bar for disproving the clinical null need to be set just a tad higher than a statistically significant increase in life expectancy of 2 weeks?

Obviously, no one knows a priori who will do better and who will not. So, without a crystal ball, it is one's values and preferences that have to drive these decisions. But does society have a say in any of this, as we struggle with equitable distribution of a limited resource? Is $1.5 million for 1 year of life a good societal investment? And what is the quality of this life? And given that at least half of all treated patients get far less than an extra 2 weeks of life, how do we strike the balance between a reasonable expectation of a response and false hope?

My friend's family based their choices on their mother's wishes and their values and utilities for her comfort. The choice they made was very different from that made by my parents with regard to my father's palliation. Both were right for the respective families. But one may have been far too costly, both financially and emotionally.    

Wednesday, September 22, 2010

Is VAP prevention woo science?

My students and I are continuing with the VAP theme, this week exploring the development and implementation of evidence-based practice guidelines (EBPG) through the ATS/IDSA HAP/VAP/HCAP guideline. To continue what we started last week, we are specifically talking about VAP prevention. In this EBPG, there are over 20 suggested maneuvers to prevent VAP, most of them based on level I or II evidence. One lively thread in the discussion deals with the logistics of implementing so many interventions. The complexity of codifying and durably introducing this many processes was duly acknowledged. And although I have used the word "bundle" once so far, I have not yet alluded to the IHI's effort to simplify the process. As some of you know, I have been somewhat critical of our current CMS Administrator's approach to quality and safety improvements, and I have even engendered the wrath of some colleagues by publishing this evidence-driven criticism in a review paper (yes, quoting myself again):
    Given that the MV bundle simply represents a conglomeration of some of the recommended practices, it is still important to evaluate how it performs as a VAP preventive strategy for several reasons. First, for example, it is possible that one or two of the interventions chosen to be included into the bundle drive most of the VAP preventive benefit. If this is the case, then it may be inefficient to include the remaining elements, as they may divert the necessary implementation resources from the elements that truly matter. One example of seemingly simple, cheap, yet rarely attainable goal is compliance with head of the bed elevation. Some studies have indicated that a variety of reasons preclude this goal from being achieved 85% of the time, and even call its effectiveness into question (31). Understanding which elements of the bundle drive improvements in which populations may deprioritize head of the bed elevation as a goal to be achieved across the board. Alternatively, it may help to make a more forceful argument to improve compliance with this recommendation. Second, other evidence-based recommendations included in the EBPG, but not the bundle, may impart a greater magnitude of VAP prevention, thus once again making the current approach inefficient. For example, some educational strategies incorporating more broadly the EBPG recommendations have been demonstrated to effect a substantial reduction in the rates of VAP (27, 32). Third, neither the expenditures associated with building the infrastructure for bundle implementation nor the potential return on such investment has been explicitly quantified. In general, the recent disappointing results of two meta-analyses of studies evaluating the impact of a rapid response team on hospital outcomes should serve as a cautionary note for adoption of any new process, even one with a great deal of face validity, that has not undergone rigorous testing as a whole (33, 34). More importantly, in the absence of such rigorous validation scoring requiring nearly complete compliance with these processes (e.g., 95% compliance advocated in the case of the MV bundle) as a quality measure would be misguided.
The 95% compliance refers to the IHI's stipulation that an institution reach this level of implementation of all the components of the bundle in order to be considered compliant.

Although several studies out there have demonstrated that by applying a group of evidence-based preventive strategies at least some cases of VAP can be prevented, none has addressed the potential hierarchy of or interactions between the components. What is clear from the literature, however, is that a concomitant educational effort is necessary to make the guideline stick.

And how much of the effect is actually due to the Hawthorne effect rather than to the specific intervention? I could of course argue that the Hawthorne effect is not something to avoid if it helps reduce VAP rates, but then I might get accused of advocating "woo". Perish that thought!

     

Friday, September 17, 2010

VAP: A case of mistaken identity?

This week in my class we are talking about systematic reviews and meta-analyses. As in the past, I assigned this excellent example published a couple of years ago by Canadian friends:
BMJ. 2007 Apr 28;334(7599):889. Epub 2007 Mar 26.

Oral decontamination for prevention of pneumonia in mechanically ventilated adults: systematic review and meta-analysis.

Department of Nursing Services, Tan Tock Seng Hospital, Singapore. ee_yuee_chan@ttsh.com.sg

Abstract

OBJECTIVE: To evaluate the effect of oral decontamination on the incidence of ventilator associated pneumonia and mortality in mechanically ventilated adults.
DESIGN: Systematic review and meta-analysis.
DATA SOURCES: Medline, Embase, CINAHL, the Cochrane Library, trials registers, reference lists, conference proceedings, and investigators in the specialty.
REVIEW METHODS: Two independent reviewers screened studies for inclusion, assessed trial quality, and extracted data. Eligible trials were randomised controlled trials enrolling mechanically ventilated adults that compared the effects of daily oral application of antibiotics or antiseptics with no prophylaxis.
RESULTS: 11 trials totalling 3242 patients met the inclusion criteria. Among four trials with 1098 patients, oral application of antibiotics did not significantly reduce the incidence of ventilator associated pneumonia (relative risk 0.69, 95% confidence interval 0.41 to 1.18). In seven trials with 2144 patients, however, oral application of antiseptics significantly reduced the incidence of ventilator associated pneumonia (0.56, 0.39 to 0.81). When the results of the 11 trials were pooled, rates of ventilator associated pneumonia were lower among patients receiving either method of oral decontamination (0.61, 0.45 to 0.82). Mortality was not influenced by prophylaxis with either antibiotics (0.94, 0.73 to 1.21) or antiseptics (0.96, 0.69 to 1.33) nor was duration of mechanical ventilation or stay in the intensive care unit.
CONCLUSIONS: Oral decontamination of mechanically ventilated adults using antiseptics is associated with a lower risk of ventilator associated pneumonia. Neither antiseptic nor antibiotic oral decontamination reduced mortality or duration of mechanical ventilation or stay in the intensive care unit.
The meta-analysis looks at effectiveness of oral care in preventing VAP. Of interest was the overall finding that VAP could indeed be prevented, but preventing it altered neither mortality nor such hospital utilization parameters as duration of mechanical ventilation (MV) or ICU length of stay (LOS).
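For those curious about the mechanics, pooled estimates like the ones in the abstract typically come from inverse-variance weighting of the individual trial results. Here is a minimal Python sketch of a fixed-effect pooling of relative risks; the three studies are made up for illustration and are not the Chan data.

# Fixed-effect, inverse-variance pooling of relative risks (hypothetical studies).
import math

# (events_treated, n_treated, events_control, n_control) for three made-up trials
studies = [(12, 100, 20, 100), (8, 150, 15, 150), (30, 400, 45, 400)]

weights, log_rrs = [], []
for a, n1, c, n0 in studies:
    rr = (a / n1) / (c / n0)
    var = (1 / a - 1 / n1) + (1 / c - 1 / n0)  # approximate variance of log(RR)
    log_rrs.append(math.log(rr))
    weights.append(1 / var)

pooled_log_rr = sum(w * lr for w, lr in zip(weights, log_rrs)) / sum(weights)
se = math.sqrt(1 / sum(weights))
lo, hi = math.exp(pooled_log_rr - 1.96 * se), math.exp(pooled_log_rr + 1.96 * se)
print(f"pooled RR = {math.exp(pooled_log_rr):.2f} (95% CI {lo:.2f} to {hi:.2f})")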

The study has precipitated a vigorous discussion in class. I will excerpt below some of my responses to the students' questions (all right, so I feel a little tacky quoting myself, but perish the thought I should be accused of plagiarizing anyone, even myself).

One of the students brought up CMS never events (you know, those hospital-acquired conditions that CMS will no longer pay for because they should never happen), and presented me with an opportunity to talk about the subtleties of VAP diagnosis within that context:  

I could not agree more that prevention is critical. The question of whether we can prevent VAP 100% of the time is a little more complicated, however. For one, we are not even sure how to diagnose VAP. Applying the CDC's surveillance definition results in rates of VAP that are quite different from invasive diagnostic testing data. Applying the same definitions to different populations results in rates that are vastly different. Furthermore, diagnostics are driven by somewhat arbitrary thresholds for bacterial counts that may not have the greatest sensitivity or specificity. So, when you are dealing first with the wild west of the patient and disease interaction, then add the muddy diagnostic issues to the stew, and season everything with variable processes of care, the issue, to me at least, becomes a little less straightforward.
Then, when I asked them what the use of preventing VAP is when it does not impact such important outcomes as mortality or LOS, I got some great answers, appropriately ranging from "you are full of s**t" to "OK, my intuition tells me VAP is good to prevent, but here we have no reason for it". Some even referred to these outcomes as patient-oriented, so this was a great teachable moment. I responded, reiterating:
I am personally a great believer in prevention, but a) it has to be sensible prevention and not just a convenient conglomeration of poorly tested modalities, and b) not everything can be prevented. There are a couple of points to make here.
CMS has not curtailed payment for VAP for the reasons that I outlined above -- what exactly constitutes VAP, what are the best preventive strategies, and exactly how good are they. CMS, on the other hand, HAS stopped paying for completely preventable errors (yes, there are such things), such as leaving instruments inside patients during surgery or administering the wrong unit of blood. These are process errors for which zero tolerance is reasonable. 
Now, on to VAP. From the Chan MA we are under the impression that there is no reason to prevent VAP, since there is no difference in patient-oriented outcomes. First, let me challenge your notion that LOS and MV duration are patient-oriented outcomes. I would argue that the patient cares less about this than about comfort, quality of life, post-critical illness functional status and the development of PTSD after the ICU stay, to name a few. These are rarely measured in RCTs. So, even if the LOS and mortality are not altered by VAP prevention, there may be other perfectly valid reasons to prevent it. Not to mention curbing the use of antibiotics to curtail the spread of resistance. 
One final point to make. None of the individual studies was powered to detect a mortality difference. Having said that, combining the data into a very respectable total of >3,000 patients analyzable for mortality should have been enough to at least show an important trend, if there was one. And in fact the VAP literature is fraught with controversy over whether VAP imparts attributable mortality or not. The LOS issue is even more complex, however. Because LOS is a highly variable outcome, an RCT powered to capture this difference would have to be enormously large (a measly 2,000 patients would not do). However, the epidemiologic and outcomes literature abounds with data on attributable LOS and $$ due to VAP.
So, here is a valuable MA that shows that there are sound strategies to prevent at least some cases of VAP, but that by implication does not justify the effort for its prevention. Here is a situation where policy requires expert analysis.
Bottom line, MAs are useful and dangerous at the same time. Their results, if not examined carefully and in context, can worsen rather than improve care. And the MA that I assigned is actually of great quality!
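To put a rough number on the power argument in that exchange, here is a hedged Python sketch of an approximate power calculation for a two-arm mortality comparison. The baseline mortality and the effect sizes are illustrative assumptions, not data from the pooled trials.

# Approximate power of a two-sided test comparing two proportions (normal approximation).
import math
from scipy.stats import norm

def power_two_proportions(p_control, p_treated, n_per_arm, alpha=0.05):
    p_bar = (p_control + p_treated) / 2
    se0 = math.sqrt(2 * p_bar * (1 - p_bar) / n_per_arm)                 # SE under H0
    se1 = math.sqrt(p_control * (1 - p_control) / n_per_arm
                    + p_treated * (1 - p_treated) / n_per_arm)           # SE under H1
    z_alpha = norm.ppf(1 - alpha / 2)
    return norm.cdf((abs(p_control - p_treated) - z_alpha * se0) / se1)

# Assume ~30% baseline ICU mortality and ~1,600 analyzable patients per arm
for reduction in (0.03, 0.05, 0.08):
    pw = power_two_proportions(0.30, 0.30 - reduction, n_per_arm=1600)
    print(f"absolute mortality reduction of {reduction:.0%}: power ≈ {pw:.0%}")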
I posted this to underline how tricky applying evidence can be. Particularly in an area where there is so much diagnostic confusion. This of course does not mean that we should not strive to understand things better. On the contrary, this calls for more integration of knowledge in a multidisciplinary fashion.

Fools rush in where wise people fear to tread...  
 
  

Tuesday, September 14, 2010

Does a flip of the switch cause the bulb to light?

"When men are most sure and arrogant they are commonly most mistaken, giving views to passion without that proper deliberation which alone can secure them from grossest absurdities"
                                                                                             --David Hume

The seemingly absurd question in the title of today's post becomes much less absurd when reading Rothman and Greenland's paper in the AJPH Supplement from 2005, found here. This paper, which is really an abridged version of a chapter in their textbook Modern Epidemiology, is creating a lot of angst among my MPH students this semester. As well it should: the authors make a compelling case for not trusting any assumptions whatsoever, and thus seem to imply that every time we think we know something, we should think again.

First, they talk about the multicausality of any phenomenon in human biology. Pouring water on the dry ice of our cognitive abilities, they delight in vaporizing our most coveted ideas, such as the notion that the contributions of all causes to a condition cannot add up to more than 100%:
How can we attribute 75% of the cases to smoking and 67% to alcohol drinking among those who are exposed to both? We can because some cases are counted more than once. Smoking and alcohol interact in some cases of head and neck cancer, and these cases are attributable both to smoking and to alcohol drinking. One consequence of interaction is that we should not expect that the proportions of disease attributable to various component causes will sum to 100%.
Their discussion of causal inference is quotable for the context it provides for the various approaches of scientific inquiry. I feel that ideas in this entire section, with its apt name, are completely lost in our day-to-day scientific discourse:
Impossibility of Proof
Vigorous debate is a characteristic of modern scientific philosophy, no less in epidemiology than in other areas. Perhaps the most important common thread that emerges from the debated philosophies stems from 18th-century empiricist David Hume’s observation that proof is impossible in empirical science. This simple fact is especially important to epidemiologists, who often face the criticism that proof is impossible in epidemiology, with the implication that it is possible in other scientific disciplines. Such criticism may stem from a view that experiments are the definitive source of scientific knowledge. Such a view is mistaken on at least two counts. First, the nonexperimental nature of a science does not preclude impressive scientific discoveries; the myriad examples include plate tectonics, the evolution of species, planets orbiting other stars, and the effects of cigarette smoking on human health. Even when they are possible, experiments (including randomized trials) do not provide anything approaching proof, and in fact may be controversial, contradictory, or irreproducible. The cold-fusion debacle demonstrates well that neither physical nor experimental science is immune to such problems.

Some experimental scientists hold that epidemiologic relations are only suggestive, and believe that detailed laboratory study of mechanisms within single individuals can reveal cause–effect relations with certainty. This view overlooks the fact that all relations are suggestive in exactly the manner discussed by Hume: even the most careful and detailed mechanistic dissection of individual events cannot provide more than associations, albeit at a finer level. Laboratory studies often involve a degree of observer control that cannot be approached in epidemiology; it is only this control, not the level of observation, that can strengthen the inferences from laboratory studies. Furthermore, such control is no guarantee against error. All of the fruits of scientific work, in epidemiology or other disciplines, are at best only tentative formulations of a description of nature, even when the work itself is carried out without mistakes.
They further proceed to decimate Hill's criteria for confirming causality. In fact, in their usual scholarly fashion, having gone back to the original source, the authors convey Hill's own ambivalence:
As is evident, the standards of epidemiologic evidence offered by Hill are saddled with reservations and exceptions. Hill himself was ambivalent about the utility of these "viewpoints" (he did not use the word criteria in the paper). On the one hand, he asked, "In what circumstances can we pass from this observed association to a verdict of causation?" Yet despite speaking of verdicts on causation, he disagreed that any "hard-and-fast rules of evidence" existed by which to judge causation: This conclusion accords with the views of Hume, Popper, and others that causal inferences cannot attain the certainty of logical deductions.
Indeed, they even poke gaping holes in Bayesian thinking, which I myself am quite partial to. And incidentally, I am a big fan of Hill's "criteria", as well.

So, what is the point of assigning this paper to my students early in a course that addresses evidence generation and evaluation in policy? The paper serves as a reminder to examine our assumptions systematically and regularly. By its nature, our science is imprecise. While most physical and biological processes are subject to mathematical rules, clinical sciences rely heavily on statistics, which is simply mathematics with uncertainty introduced into it. This uncertainty magnifies with every assumption that we make, and for this reason we need to be quite circumspect when insisting that we know something. In this respect, our science is more similar to civil litigation, where a preponderance of the evidence is enough to prevail, than to the "beyond a reasonable doubt" standard of criminal law.

Now, let's bring this back to the real world of clinical practice, where we do not have the luxury to ruminate on these uncertainties. This is the true intent of EBM: understand what the preponderance of evidence tells us about a phenomenon at the level of the pertinent population, but do not forget that these are mere approximations of what really happens, some closer to and some further from reality. So, while systematically observed measures of central tendency should give us comfort in the current gestalt at the bedside, the uncertainty around them is an invitation to tailor our approach to the individual before us.      
  
  
  

Wednesday, September 8, 2010

"Clinical Decision Making: Evidence Is Never Enough"


A senior resident, a junior attending, a senior attending, and an emeritus professor were discussing evidence-based medicine (EBM) over lunch in the hospital cafeteria.

"EBM," announced the resident with some passion, "is a revolutionary development in medical practice." She went on to describe EBM's fundamental innovations in solving patient problems.
"A compelling exposition," remarked the emeritus professor.
"Wait a minute," the junior attending exclaimed, also with some heat, and presented an alternative position stating that EBM merely provided a set of additional tools for traditional approaches to patient care.
"You make a strong and convincing case," the emeritus professor commented.
"Wait a minute," the senior attending exclaimed to her older colleague, "their positions are diametrically opposed. They can't both be right."
  
The emeritus professor looked thoughtfully at the puzzled physician and, with the barest hint of a smile, replied, "Come to think of it, you're right too."

So begins the Users' Guides to the Medical Literature installment XXV, "Principles of Applying the Users' Guides to Patient Care", published 10 years ago, almost to the day, in JAMA (JAMA 2000;284:1290-6). I am enjoying re-reading it, as I now appreciate the authors' evolving positions, acknowledging the complexities of modern practice and the need to hold together apparent dualities. They acknowledge their continued commitment to the scientific rigor of EBM:
In 1992, in an article that provided a background to the Users' Guides, we described EBM as a shift in medical paradigms.2 In contrast to the traditional paradigm, EBM acknowledges that intuition, unsystematic clinical experience, and pathophysiologic rationale are insufficient grounds for clinical decision making, and stresses the examination of evidence from clinical research. The philosophy underlying EBM suggests that a formal set of rules must complement medical training and common sense for clinicians to effectively interpret the results of clinical research. Finally, EBM places a lower value on authority than the traditional paradigm of medical practice.
However, they point out that the complexities of the world often call for more than one way of approaching problems:
While we continue to find the paradigm shift a valid way of conceptualizing EBM, as the scenario suggests, the world is often complex enough to invite more than 1 useful way of thinking about an idea or a phenomenon. In this article, we describe the 2 key principles that clinicians must grasp to be effective practitioners of EBM. One of these relates to the value-laden nature of clinical decisions; the other to the hierarchy of evidence postulated by EBM. 
A few paragraphs into the paper, we come to a surprising heading, which serves as the title for today's post: "Clinical Decision Making: Evidence Is Never Enough". Wow, is this not a bit heretical, especially coming from the cradle of EBM? What do the authors mean by this? To illustrate it they develop three scenarios, each involving a patient with pneumococcal pneumonia, a disease for whose treatment plenty of strong evidence exists. Yet, depending on the context of the patient's situation, treatment decisions may or may not be straightforward. Their examples are a terminal cancer patient who may herself forgo treatment; an elderly demented nursing home patient with no family, whose treatment decisions have to be made solely by the clinicians; and a 30-year-old mother of two, whom only Lord Voldemort would consign to non-treatment. The point is that evidence of effectiveness, in this case for the treatment of pneumococcal pneumonia, is necessarily applied to the particular circumstance at the bedside. In other words, value judgments are employed to apply the existing evidence to a trade-off decision. Quantifying this process is far from a clear science:
Acknowledging that values play a role in every important patient care decision highlights our limited understanding of eliciting and incorporating societal and individual values. Health economists have played a major role in developing a science of measuring patient preferences.14-15 Some decision aids are based on the assumption that if patients truly understand the potential risks and benefits, their decisions will reflect their preferences.16 These developments constitute a promising start. Nevertheless, many unanswered questions concerning how to elicit preferences, and how to incorporate them in clinical encounters already subject to crushing time pressures, remain. Addressing these issues constitutes an enormously challenging frontier for EBM.
Switching gears somewhat to delve into some of the pragmatic issues, the authors talk about the hierarchy of evidence, acknowledging the superiority of systematic clinical observation over ad hoc anecdotal evidence as a tool to reduce common threats to inference validity. However, they loudly and clearly state that in practice one size does not fit all:
Clinical research goes beyond unsystematic clinical observation in providing strategies that avoid or attenuate the spurious results. Because few, if any, interventions are effective in all patients, we would ideally test a treatment in the patient to whom we would like to apply it. Numerous factors can lead clinicians astray as they try to interpret the results of conventional open trials of therapy, which include natural history, placebo effects, patient and health worker expectations, and the patient's desire to please.
... and offer the following hierarchy of evidence for individual treatment decisions:



[Table 1 from the paper: A Hierarchy of Strength of Evidence for Treatment Decisions]
So, for a patient encounter, the most robust kind of evidence is the N of 1 trial! Not too coincidentally, I came to the same conclusion here. With a few caveats, the authors confirm the feasibility of this undertaking:
N of 1 RCTs are unsuitable for short-term problems; for therapies that cure (such as surgical procedures); for therapies that act over long periods of time or prevent rare or unique events (such as stroke, myocardial infarction, or death); and are possible only when patients and clinicians have the interest and time required. However, when the conditions are right, N of 1 RCTs are feasible,24-25 can provide definitive evidence of treatment effectiveness in individual patients, and may lead to long-term differences in treatment administration.26
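As an aside, the analysis of such a trial can be quite simple. Here is a minimal Python sketch, with hypothetical data, of one common approach: several randomized drug/placebo period pairs in a single patient, compared with a paired t-test on the within-pair differences in a symptom score.

# Hypothetical N of 1 RCT: paired comparison of drug vs placebo periods in one patient.
from scipy import stats

# One symptom score per period (lower is better), in randomized drug/placebo pairs
drug_periods    = [3.1, 2.8, 3.4, 2.9, 3.0]
placebo_periods = [4.2, 3.9, 4.5, 3.7, 4.1]

t_stat, p_value = stats.ttest_rel(drug_periods, placebo_periods)
mean_diff = sum(d - p for d, p in zip(drug_periods, placebo_periods)) / len(drug_periods)
print(f"mean within-pair difference (drug - placebo): {mean_diff:.2f}")
print(f"paired t-test p-value: {p_value:.3f}")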
They go on to say that applying results based on group data may be fraught with erroneous assumptions:
When considering any source of evidence about treatment other than N of 1 RCTs, clinicians are generalizing from results in other people to their patients, inevitably weakening inferences about treatment impact and introducing complex issues of how trial results apply to individuals.
They again stress the importance of the individual encounter and making collaborative decisions in the context of the patient's values:
Thus, knowing the tools of evidence-based practice is necessary but not sufficient for delivering the highest-quality patient care. In addition to clinical expertise, the clinician requires compassion, sensitive listening skills, and broad perspectives from the humanities and social sciences. These attributes allow understanding of patients' illnesses in the context of their experience, personalities, and cultures.
They rightfully treat scientific evidence as the foundation of practice, but not the whole of it. Indeed, they conclude that
A continuing challenge for EBM, and for medicine in general, will be to better integrate the new science of clinical medicine with the time-honored craft of caring for the sick.
I really like the tempered tone of this paper, paying homage not only to the science, but also to the art of medicine. It is clear that the authors have evolved with the field, and if these giants can evolve, so can we. Clearly, in the art of medicine, "evidence is never enough".
    


  

   
   
      

Tuesday, September 7, 2010

Assume a spherical cow

You know that joke about the farmer whose cows are not producing enough milk? A university panel gathers under the leadership of a theoretical physicist. They analyze each aspect of the problem thoroughly and carefully, and after much deliberation produce a report, the first line of which is "First, assume a spherical cow in a vacuum". This joke has become shorthand for some of the reductionist thinking in theoretical physics, but it can just as easily personify the field of statistics, upon which we rely so heavily to inform our evidence-based practices.

Here is what I mean. Let us look at four common applications of statistical principles. First, descriptive statistics, usually represented by Table 1 in a paper, where we are interested in measures of central tendency such as the mean and median values. I would argue that both measures are somewhat flawed in the real world. As we all learned, a mean is a bona fide measure of central tendency in a normal, or Gaussian, distribution. And furthermore, in such a distribution 95% of the values fall within roughly 2 standard deviations of the mean. Although this is very convenient, few things about human biology are actually distributed normally; many cluster to the left or to the right of the center, creating a tail on the opposite side. For these skewed distributions a median is the recommended measure, bracketed by the 25th to 75th percentiles of the values, or the interquartile range. But in a skewed distribution it is exactly the tail that is its most telling feature, as described so eloquently and personally by Stephen Jay Gould in his "The Median Isn't the Message" essay. This is especially true when more specific identification of characteristics makes the numbers dwindle; e.g., if instead of such a general measure as blood pressure among all-comers you want to focus on morning blood pressures in a specific group, say African-American males with a history of diabetes and hypercholesterolemia who attend a smoking cessation program. And this is important because at the level of the office encounter each patient is a universe of his/her own risk factors and parameters that does not necessarily obey the rules of the herd.
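To see how badly a mean can misrepresent a skewed distribution, here is a minimal Python sketch with simulated, length-of-stay-like data (not from any study); the long right tail drags the mean well above the median that describes the typical patient.

# Simulated right-skewed data (lognormal, LOS-like, in days): mean vs median.
import numpy as np

rng = np.random.default_rng(1)
los = rng.lognormal(mean=1.5, sigma=0.8, size=10_000)

q25, q75 = np.percentile(los, [25, 75])
print(f"mean LOS:   {los.mean():.1f} days")
print(f"median LOS: {np.median(los):.1f} days")
print(f"interquartile range: {q25:.1f} to {q75:.1f} days")
print(f"share of patients above the mean: {(los > los.mean()).mean():.0%}")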

Second, analytic modeling often relies on the assumption of normality. To overcome this limitation we transform certain non-normal distributions, taking their logarithms, for example, to force them to play nicely. Once normalized, we perform the prestidigitation of a regression, reverse-transform the outcome to its anti-log, and, voila, we have God's own truth! And even though statisticians tend to argue about such techniques, in the real world of research, where evidence cannot wait for perfection, we accept this legerdemain as the best we can do.
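Here is a minimal sketch of that maneuver in Python, with simulated data standing in for a skewed outcome such as hospital charges. One subtlety the legerdemain glosses over: exponentiating a prediction made on the log scale recovers something closer to a geometric than an arithmetic mean.

# Log-transform, regress, back-transform: a sketch with simulated skewed charges.
import numpy as np

rng = np.random.default_rng(2)
age = rng.uniform(40, 80, size=2_000)
charges = np.exp(8 + 0.02 * age + rng.normal(0, 0.6, size=2_000))  # skewed outcome

# simple linear regression on the log scale
slope, intercept = np.polyfit(age, np.log(charges), deg=1)

pred_log = intercept + slope * 65                     # prediction for a 65-year-old
print(f"back-transformed prediction at age 65: ${np.exp(pred_log):,.0f}")
print(f"naive arithmetic mean of all charges:  ${charges.mean():,.0f}")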

The next example I will use is that of pooled analyses, specifically meta-analyses. The intent of meta-analyses is to present the totality of evidence in a single convenient and easily comprehended value. One specific circumstance in which a meta-analysis is considered useful is when different studies are in conflict with one another, as for example, when one study demonstrates that an intervention is effective, while another does not show such effectiveness. In my humble opinion, it is one thing to pool data when studies suffer from Type II error due to small sample sizes. However, what if the studies simply show opposite results? That is, what if one study indicates a therapeutic advantage of treatment T over placebo P, but another shows the exact opposite? Is it still valid to combine these studies to get at the "true" story? Or is it better to leave them separate and try to understand the potential inherent differences in the study designs, populations, interventions, measurements, etc.? I am not saying that all meta-analyses mislead us, but I do think that in the wrong hands this technique can be dangerous, smoothing out differences that are potentially critical. This is one spherical cow that needs to be milked before it is bought.    
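Before buying that particular cow, one can at least check how badly the studies disagree. Here is a minimal Python sketch, with made-up numbers, of the heterogeneity statistics (Cochran's Q and I²) that should temper any pooled estimate built from conflicting trials.

# Heterogeneity check before pooling: Cochran's Q and I^2 for two conflicting trials.
import math
from scipy.stats import chi2

# log relative risks and their variances for two hypothetical, conflicting trials
log_rr = [math.log(0.60), math.log(1.50)]   # one suggests benefit, the other harm
var    = [0.04, 0.05]

w = [1 / v for v in var]
pooled = sum(wi * yi for wi, yi in zip(w, log_rr)) / sum(w)
Q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, log_rr))
df = len(log_rr) - 1
i_squared = max(0.0, (Q - df) / Q) * 100

print(f"pooled RR (fixed effect): {math.exp(pooled):.2f}")
print(f"Cochran's Q = {Q:.1f} (p = {chi2.sf(Q, df):.3f}), I^2 = {i_squared:.0f}%")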

The last cow is truly spherical, and that is the one-on-one patient-clinician encounter, wherein the doctor needs to cram the individual patient with his/her binary predispositions into the continuous container of evidence. It is here that all of the other cows are magnified to obscene proportions to create a cumbersome, at times incomprehensible and frequently useless pile of manure. It is one thing for a clinician to ignore evidence willfully; it is entirely another to be a conscientious objector to what is known but not applicable to the individual in the office.

But let's remember that I, as a health services researcher and a self-acknowledged sustainability zealot, am in awe of manure's life-giving properties. Extending the metaphor, then, this pile of manure can and should be used to fertilize our field of knowledge by viewing each therapeutic encounter systematically as its own experiment. The middle way lies between cynically discarding and blindly accepting any and all evidence. That is where the art of medicine must come in, and the emerging payment systems must take it into account. Doctors need time, intellectual curiosity and skills to conduct these "n of 1" trials in their offices. If the fire hose of new data insists on staying on, this is the surest way to direct "conscientious and judicious" application of the resulting oceans of population data in the service of the public's health. And that should make the farmer happy, even if his cows are spherical.