
Friday, May 18, 2012

Why I have a propensity to believe the azithromycin data

Before I take off for the Health Foo Camp in Cambridge, MA, where I will be rubbing elbows with such luminaries as Susannah Fox, Ted Eytan, e-Patient Dave deBronkart, Regina Holliday, Nick Christakis and Nancy Etcoff, I thought I'd say a few words about one of my favorite topics: antibiotics. In particular I want to talk about azithromycin. A lot has been said about it this week already in the wake of the NEJM study (alas, the full paper is behind a paywall) indicating an increase in the risk of cardiac death among patients exposed to this drug compared to those either not getting antibiotics or receiving a different class of antimicrobials. And a lot more will be said in the future as the FDA reviews this risk. This is not specifically what I wanted to talk about. I wanted to focus on the methods.

The study was done meticulously, as far as I can see. It was based on a large Medicaid database from Tennessee (this may present a generalizability problem, of course, though I cannot think of a specific reason why it would). The study design was a retrospective cohort, which, as we already know, sets the study up for all kinds of biases. So, why should we believe anything the study showed, particularly given many people's distaste for invoking causality from observational data?

There are at least 2 reasons why we should pay attention to these results. The first is that we are talking about the ultimate harm: death. When it comes to harm, my philosophical approach leans in the direction of caution. What this means scientifically is that I accept a much lower threshold for the certainty that the data convey than when it comes to evidence of benefit. This is an extension of the precautionary principle, where the burden of proof of safety now lies with the drug.

But there is a second, possibly more important reason that I am inclined to believe the data. The reason can be summed up succinctly as "propensity scoring." This is the technique the investigators used to adjust away, as much as feasible, the possibility that factors other than exposure to the drug caused the observed effect. In my book I briefly discuss propensity scoring in Chapter 21. And here is what I say:
Propensity scoring is gaining popularity as an adjustment method in the medical literature. A propensity score is essentially a number, usually derived from a regression analysis, describing the propensity of each subject for a particular exposure. So, in terms of smoking, we can create a propensity score based on other common characteristics that predict smoking. We take advantage of the presence of some of these characteristics also in people who are non-smokers to yield a similar propensity score in the absence of this exposure. In turn, the outcome of interest can be adjusted in several ways for the propensity for smoking. One common way is to match smokers to non-smokers based on the same (or similar) propensity scores and then examine their respective outcomes. This allows us to understand the independent impact of smoking on, say, the development of coronary artery disease. 
And if you are able to access Table 1 of the paper, you will see that their propensity matching was spectacularly successful. So, although it does not eliminate the possibility that something unobserved or unmeasured is causing this increase in deaths, the meticulous methods used lower the probability of this.
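To make the matching idea concrete, here is a minimal sketch of propensity score matching in Python. The simulated data, the covariates (age, diabetes, heart failure) and the greedy 1:1 nearest-neighbor matching are my own illustrative assumptions, not the variables or algorithm used in the NEJM paper; the point is only to show how a score derived from a logistic regression can be used to pair exposed and unexposed subjects before comparing outcomes.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

# Simulated cohort: measured confounders, an exposure whose probability depends
# on them, and an outcome driven by the confounders but NOT by the exposure.
df = pd.DataFrame({
    "age": rng.normal(55, 12, n),
    "diabetes": rng.binomial(1, 0.25, n),
    "heart_failure": rng.binomial(1, 0.10, n),
})
expo_logit = -5 + 0.07 * df["age"] + 1.0 * df["diabetes"] + 1.5 * df["heart_failure"]
df["exposed"] = rng.binomial(1, 1 / (1 + np.exp(-expo_logit)))
death_logit = -7 + 0.08 * df["age"] + 1.0 * df["heart_failure"]
df["died"] = rng.binomial(1, 1 / (1 + np.exp(-death_logit)))

# 1. Model each subject's propensity for exposure from the measured confounders.
covariates = ["age", "diabetes", "heart_failure"]
ps_model = LogisticRegression(max_iter=1000).fit(df[covariates], df["exposed"])
df["ps"] = ps_model.predict_proba(df[covariates])[:, 1]

# 2. Greedy 1:1 nearest-neighbor matching on the propensity score (a real
#    analysis would add a caliper and then check covariate balance, which is
#    what the paper's Table 1 reports).
exposed = df[df["exposed"] == 1]
controls = df[df["exposed"] == 0].copy()
matched_idx = []
for _, row in exposed.iterrows():
    j = (controls["ps"] - row["ps"]).abs().idxmin()
    matched_idx.append(j)
    controls = controls.drop(j)            # match without replacement
matched_controls = df.loc[matched_idx]

# 3. Crude vs. matched comparison: the crude gap reflects confounding by age and
#    heart failure; the matched gap should shrink toward zero, since in this
#    simulation the exposure has no causal effect on death.
print("crude:  ", exposed["died"].mean(), "vs", df.loc[df["exposed"] == 0, "died"].mean())
print("matched:", exposed["died"].mean(), "vs", matched_controls["died"].mean())
```

In the simulation the exposed look sicker than the unexposed in the crude comparison, and matching on the propensity score removes most of that difference; whatever gap survives a well-balanced match is harder to explain away by the measured confounders.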

One final word about azithromycin. There are data that suggest that macrolides (the class of drugs that includes erythromycin, clarithromycin and azithromycin) are actually associated with improved outcomes in the setting of community-acquired pneumonia, or CAP. This is why these drugs are in the CAP treatment guideline. The point is that again, as in everything, the benefit of using azithromycin in any individual case will have to be weighed against this newly-identified risk.
  
If you like Healthcare, etc., please consider a donation (button in the right margin) to support development of this content. But just to be clear, it is not tax-deductible, as we do not have a non-profit status. Thank you for your support!

Wednesday, February 1, 2012

Marie Curie, Geiger counters and mass hysteria: more in common than meets the eye

What do Marie Curie, a Geiger counter and mass hysteria have in common? Well, to answer this question we need to go to Sir Arthur Eddington, who was a British astrophysicist and philosopher of science at the turn of the 20th century. He came up with what is frequently referred to as the Eddington parable, which has nothing to do with the stars specifically and everything to do with how we make scientific progress. Here it is for your reading enjoyment, as told in this editorial (available by subscription) by Diamond and Kaul, two highly respected clinician-researchers:
Let us suppose that an ichthyologist is exploring the life of the ocean. He casts a net into the water and brings up a fishy assortment. Surveying his catch, he [concludes that no] sea-creature is less than two inches long. An onlooker may object that the generalization is wrong. "There are plenty of sea-creatures under two inches long, only your net is not adapted to catch them." The ichthyologist dismisses this objection contemptuously: "Anything uncatchable by my net is ipso facto outside the scope of ichthyological knowledge, and is not part of the kingdom of fishes which has been defined as the theme of ichthyological knowledge. In short, what my net can't catch isn't fish”.
Suppose that a more tactful onlooker makes a rather different suggestion: "I realize that you are right in refusing our friend's hypothesis of uncatchable fish, which cannot be verified by any tests you and I would consider valid. By keeping to your own method of study, you have reached a generalization of the highest importance—to fishmongers, who would not be interested in generalizations about uncatchable fish. Since these generalizations are so important, I would like to help you. You arrived at your generalization in the traditional way by examining the fish. May I point out that you could have arrived more easily at the same generalization by examining the net and the method of using it?"
So, you see my point? Tools determine knowledge. Period.
  

Tuesday, September 20, 2011

Eminence or evidence, or how not to look like a fool when reporting your own data

A study presented at the ICAAC meeting and reported by Family Practice News piqued my interest. Firstly, it is a study of C. difficile infection (CDI) treatment, and secondly, its findings run counter to the evidence that has accumulated to date. So, I read the story very carefully, as, alas, the actual study presentation does not appear to be available.

Before I launch into the deconstruction of the data, I need to state that I do have a potential conflict of interest here. I am very involved in the CDI research from the health services and epidemiology perspective. But equally importantly, I have received research and consulting funding from ViroPharma, the manufacturer of oral Vancocin that is used to treat severe CDI.

And here is an important piece of background information: the reason the study was done. The recent evidence-based guideline on CDI developed jointly by SHEA and IDSA recommends initial treatment with metronidazole in the case of an infection that does not meet severe criteria, while advocating the use of vancomycin for severe disease. We will get into the reasons for this recommendation below.  
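For orientation, here is a rough sketch of what that guideline-based stratification looks like in code. The thresholds (a white cell count of 15,000 or more, or a creatinine 1.5 times the premorbid level, for "severe"; hypotension, ileus or megacolon for "severe, complicated") are my paraphrase of the SHEA/IDSA criteria from memory, meant only as illustration and to be checked against the guideline itself.

```python
def classify_cdi_severity(wbc, creatinine, baseline_creatinine,
                          hypotension=False, ileus=False, megacolon=False):
    """Rough CDI severity stratification in the spirit of the SHEA/IDSA guideline.

    Thresholds are paraphrased from memory for illustration only; consult the
    guideline before using anything like this clinically."""
    if hypotension or ileus or megacolon:
        return "severe, complicated"   # guideline calls for measures beyond initial oral therapy
    if wbc >= 15_000 or creatinine >= 1.5 * baseline_creatinine:
        return "severe"                # guideline favors oral vancomycin
    return "mild-to-moderate"          # guideline favors metronidazole

# Example: leukocytosis without organ dysfunction -> "severe"
print(classify_cdi_severity(wbc=18_000, creatinine=1.0, baseline_creatinine=0.9))
```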

OK, with that out of the way, let us consider the information at hand.

My first contention is that this is a great example of how NOT to conduct a study (or how not to report it, or both). The study was a retrospective chart review at a single VA hospital in Chicago. All patients admitted between 1/09 and 3/10 who had tested positive for C. difficile toxin were identified and their hospitalization records reviewed. A total of 147 patients were thus studied, of whom 25 (17%) received vancomycin and 122 (83%) metronidazole. It is worth mentioning that of the 122 initially treated with metronidazole, 28 (23%) were switched over to vancomycin treatment. The reasons for the switch as well as their outcomes remain obscure.

The treatment groups were stratified based on disease severity. Though the abstract states that severity was judged based on "temperature, white blood cell count, serum creatinine, serum albumin, acute mental status changes, systolic blood pressure <90, requirement for pressors," the thresholds for most of these variables are not stated. One can only assume that this stratification was done consistently and comported with the guideline.

Here is how the severity played out:

Nowhere can I find where those patients who were switched from metronidazole to vancomycin fell in these categories. And this is obviously important.

Now, for the outcomes. Those assessed were "need for colonoscopy, presence of pseudomembranes, adynamic ileus, recurrence within 30 days, reinfection > 30 days post therapy, number of recurrences >1, shock, megacolon, colon perforation, emergent colectomy, death." But what was reported? The only outcome to be reported in detail is recurrence in 30 days. And here is how it looks:

The other outcomes are reported merely as "M was equivalent to V irrespective of severity of illness (p=0.14). There was no difference in rate of recurrence (p= 0.41) nor in rate of complications between the groups (p=0.77)."
What the heck does this mean? Is the implication that the p-value tells the whole story? This is absurd! In addition, it does not appear to me from the abstract or the Family Practice News report as if the authors bothered to do any adjusting for potential confounders. Granted, their minuscule sample size did not leave much room for that, but the lack of even an attempt invalidates the conclusion.

Oh, but if this were only the biggest of the problems! I'll start with what I think is the least of the threats to validity and work my way to the top of that heap, skipping much in the middle, as I do not have the time and the information available is full of holes. First, in any observational study of treatment there is a very strong possibility of confounding by indication. I have talked about this phenomenon previously here. I think of it as a clinician noticing something about the patient's severity of illness that does not manifest as a clear physiologic or laboratory sign, yet is very much present. A patient with this characteristic, although on paper looking much like one whose disease is not that severe, will be treated as someone at a higher threat level. In this case it may translate into treatment with vancomycin of patients who do not meet our criteria for severe disease, but who nevertheless are severely ill. If present, this type of confounding blunts the observed differences between groups.

The lack of adjustment for potential confounding of any sort is a huge issue that negates any possibility of drawing a valid conclusion. Simply comparing groups based on severity of CDI does not eliminate the need to compare based on other factors that may be related to both the exposure and the outcome. This is pretty elementary. But again, this is minor compared to the fatal flaw.

And here it is, the final nail in the coffin of this study for me: sample size and superiority design. Firstly, the abstract and the write-up say nothing of what the study was powered to show. At least if this information had been available, we could make slightly more sense of the p-values presented. But, no, this is nowhere to be found. As we all know, finding statistical significance depends on the effect size and the variation within the population: the smaller the effect size and the greater the variation, the more subjects are needed to show a meaningful difference. Note, I said meaningful, NOT significant, and this they likewise neglect. What would be a clinically meaningful difference in the outcome(s)? Could an 11% difference in recurrence rates be clinically important? I think so. But it is not statistically significant, you say! Bah-humbug, I say, go back and read all about the bunk that p-values represent!
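For illustration, here is a minimal power calculation using statsmodels. The recurrence rates (20% vs. 31%, an 11-percentage-point difference) are numbers I am assuming, since the actual rates are not given in the abstract; the point is simply that detecting a difference of this size takes far more than 25 patients in one arm.

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Assumed recurrence rates: 20% with vancomycin vs. 31% with metronidazole
# (an 11-percentage-point difference; the study's real rates are not reported).
p_vanco, p_metro = 0.20, 0.31

effect = proportion_effectsize(p_metro, p_vanco)   # Cohen's h
analysis = NormalIndPower()

# Patients needed per arm for 80% power at a two-sided alpha of 0.05
n_per_arm = analysis.solve_power(effect_size=effect, alpha=0.05, power=0.80,
                                 ratio=1.0, alternative="two-sided")
print(f"needed per arm: {n_per_arm:.0f}")          # roughly 120+ per arm

# Power actually achieved with arms of 25 and 122 patients
power = analysis.solve_power(effect_size=effect, nobs1=25, alpha=0.05,
                             ratio=122 / 25, alternative="two-sided")
print(f"power with n=25 vs n=122: {power:.2f}")    # well under 0.5
```

Under these assumed rates, a non-significant p-value in a 147-patient cohort tells us almost nothing; the study simply could not see a difference of this size.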

One final issue: a superiority study is the wrong design here, in the absence of a placebo arm. In fact, the appropriate design is a non-inferiority study, with a very explicit development of valid non-inferiority margins that have to be met. It is true that a non-inferiority study may signal a superior result, but only if it is properly designed and executed, which this is not.
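To show what a non-inferiority comparison actually requires, here is a minimal sketch. The counts and the 10-percentage-point margin are made-up numbers for illustration, not values from the study or any guideline; the logic is simply that metronidazole is declared non-inferior only if the confidence interval for the difference in cure rates stays above the pre-specified margin.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical counts: cures out of patients treated (not the study's data).
cures_metro, n_metro = 95, 122
cures_vanco, n_vanco = 21, 25

p_m = cures_metro / n_metro
p_v = cures_vanco / n_vanco
diff = p_m - p_v                      # metronidazole minus vancomycin

# 95% Wald confidence interval for the difference in cure proportions
se = np.sqrt(p_m * (1 - p_m) / n_metro + p_v * (1 - p_v) / n_vanco)
z = norm.ppf(0.975)
lower, upper = diff - z * se, diff + z * se

# Pre-specified non-inferiority margin: metronidazole may be at most
# 10 percentage points worse than vancomycin.
margin = -0.10
print(f"difference = {diff:+.3f}, 95% CI ({lower:+.3f}, {upper:+.3f})")
print("non-inferior" if lower > margin else "non-inferiority NOT shown")
```

With only 25 patients in one arm, the interval is far too wide to rule out even a generous margin, which is exactly why "p > 0.05" here cannot be read as "equivalent."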

So, am I surprised that the study found "no differences" as supported by the p-values between the two treatments? Absolutely not. The sample size, the design and other issues touched on above preclude any meaningful conclusions being made. Yet this does not seem to stop the authors from doing exactly that, and the press from parroting them. Here is what the lead author states with aplomb:
              "There is a need for a prospective, head-to-head trial of these two medications, but I’m not sure who’s going to fund that study," Dr. Saleheen said in an interview at the meeting, which was sponsored by the American Society for Microbiology. "There is a paucity of data on this topic so it’s hard to say which antibiotic is better. We’re not jumping to any conclusions. There is no fixed management. We have to individualize each patient and treat accordingly."
OK, so I cannot disagree with the individualized treatment recommendation. But do we really need a "prospective head-to-head trial of these two medications"? I would say "yes," if there were not already not one but two randomized controlled trials addressing this very question: one by Zar and colleagues, and another done as a regulatory study of the failed Genzyme drug tolevamer. Both of the trials contained separate arms for metronidazole and vancomycin (the Genzyme trial also had a tolevamer arm), and both stratified by disease severity. Zar and colleagues reported that in the severe CDI group the rate of clinical response was 76% in the metronidazole-treated patients versus 97% in the vancomycin group, with p=0.02. In the tolevamer trial, presented as a poster at the 2007 ICAAC, there was an 85% clinical response rate to vancomycin and 65% to metronidazole (p=0.04).

We can always desire a better trial with better designs and different outcomes, but at some point practical considerations have to enter the equation. These are painstakingly performed studies that show a fairly convincing and consistent result. So, to put the current deeply flawed study against these findings is foolish, which is why I suspect the investigators failed to mention anything about these RCTs.

Why do I seem so incensed by this report? I am really getting impatient with both scientists and reporters for willfully misrepresenting the strength and validity of data. This makes everyone look like idiots, but more importantly such detritus clogs the gears of real science and clinical decision-making.

Wednesday, December 8, 2010

Getting beyond the p-value

Update 12/8/10, 9:30 AM: I just got an e-mail from Steve Goodman, MD, MHS, PhD, from Johns Hopkins about this post. Firstly, my apologies for getting his role at the Annals wrong -- he is the Senior Statistical Editor for the journal, not merely a statistical reviewer. I am happy to report that he added more fuel to the p-value fire, and you are likely to see more posts on this (you are overjoyed, right?). So, thanks to Dr. Goodman for his input this morning!

Yesterday I blogged about our preference to avoid false positive associations at the expense of failing to detect some real associations. The p-value conundrum, where the arbitrary threshold for statistical significance is set at <0.05, has bothered me for a long time. I finally got curious enough to search out the origins of the p-value. Believe it or not, the information was not easy to find. I have at least 10 biostatistics or epidemiology textbooks on the shelves of my office -- not one of them went into the history of the p-value threshold. But Professor Google came to my rescue, and here is what I discovered.

Using a carefully crafted search phrase, I found a discussion forum on WAME, the World Association of Medical Editors, which I felt represented a credible source. Here I discovered a treasure trove of information and references to what I was looking for. Specifically, one poster referred to Steven Goodman's work, which I promptly looked up. And by the way, Steven Goodman, as it turns out, is the Senior Statistical Editor for the Annals of Internal Medicine and a member of WAME. So, I went to this gem in the journal Epidemiology from May 2001, called unpretentiously "Of P-values and Bayes: A Modest Proposal". I have to say that some of the discussion was so in the weeds that even I had to go back and reread it several times to understand what the good Dr. Goodman is talking about. But here are some of the more salient and accessible points.

The author begins by stating his mixed feelings about the p-value:
I am delighted to be invited to comment on the use of P-values, but at the same time, it depresses me. Why? So much brainpower, ink, and passion have been expended on this subject for so long, yet plus ça change, plus c'est la même chose - the more things change, the more they stay the same. The references on this topic encompass innumerable disciplines, going back almost to the moment that P-values were introduced (by R.A. Fisher in the 1920s). The introduction of hypothesis testing in 1933 precipitated more intense engagement, caused by the subsuming of Fisher's significance test into the hypothesis test machinery.1-9 The discussion has continued ever since. I have been foolish enough to think I could whistle into this hurricane and be heard. 10-12 But we (and I) still use P-values. And when a journal like Epidemiology takes a principled stand against them, 13 epidemiologists who may recognize the limitations of P-values still feel as if they are being forced to walk on one leg. 14
So, here we learn that the p-value is something that has been around for 90 years and was brought into being by the father of frequentist statistics R.A. Fisher. And the users are ambivalent about it, to say the least. So, why, Goodman asks, continue to debate the value of the p-value (or its lack)? And here is the reason: publications.
Let me begin with an observation. When epidemiologists informally communicate their results (in talks, meeting presentations, or policy discussions), the balance between biology, methodology, data, and context is often appropriate. There is an emphasis on presenting a coherent epidemiologic or pathophysiologic story, with comparatively little talk of statistical rejection or other related tomfoolery. But this same sensibility is often not reflected in published papers. Here, the structure of presentation is more rigid, and statistical summaries seem to have more power. Within these confines, the narrative flow becomes secondary to the distillation of complex data, and inferences seem to flow from the data almost automatically. It is this automaticity of inference that is most distressing, and for which the elimination of P-values has been attempted as a curative.
This is clearly a condemnation of the way we publish: it demands a reductionist approach to the lowest common denominator, in this case the p-value. Much like our modern medical paradigm, the p-value does not get at the real issues:
I and others have discussed the connections between statistics and scientific philosophy elsewhere, 11,12,15-22 so I will cut to the chase here. The root cause of our problem is a philosophy of scientific inference that is supported by the statistical methodology in dominant use. This philosophy might best be described as a form of naïve inductivism,23 a belief that all scientists seeing the same data should come to the same conclusions. By implication, anyone who draws a different conclusion must be doing so for nonscientific reasons. It takes as given the statistical models we impose on data, and treats the estimated parameters of such models as direct mirrors of reality rather than as highly filtered and potentially distorted views. It is a belief that scientific reasoning requires little more than statistical model fitting, or in our case, reporting odds ratios, P-values and the like, to arrive at the truth. [emphasis mine]
Here is a sacred scientific cow that is getting tipped! You mean science is not absolute? Well, no, it is not, as the readers of this blog are amply aware. Science at best represents a model of our current understanding of the Universe; it builds upon itself, usually in one direction, and it rarely gives even an asymptotic approximation of what is really going on -- merely our current understanding of reality, given the tools we have at our disposal. Goodman continues to drive home the naïveté of our inductivist thinking in the following paragraph:
How is this philosophy manifest in research reports? One merely has to look at their organization. Traditionally, the findings of a paper are stated at the beginning of the discussion section. It is as if the finding is something derived directly from the results section. Reasoning and external facts come afterward, if at all. That is, in essence, naïve inductivism. This view of the scientific enterprise is aided and abetted by the P-value in a variety of ways, some obvious, some subtle. The obvious way is in its role in the reject/accept hypothesis test machinery. The more subtle way is in the fact that the P-value is a probability - something absolute, with nothing external needed for its interpretation.
In fact the point is that the p-value is exactly NOT absolute. The p-value needs to be judged relative to some other standard of probability, for example the prior probability of an event. And yet what do we do? We worship at the altar of the p-value without giving any thought to its meaning. And this is certainly convenient for those who want to invoke evidence of absence of certain associations, such as toxic exposures and health effects, for example, when the reality simply indicates absence of evidence.
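One way to make "judging the p-value against a prior" concrete is the minimum Bayes factor that Goodman has written about elsewhere. Here is a minimal sketch; the prior probabilities are assumptions chosen purely for illustration. It converts a two-sided p-value into the strongest evidence against the null that the data could possibly provide, and then uses it to update a prior.

```python
import numpy as np
from scipy.stats import norm

def minimum_bayes_factor(p_value):
    """Minimum Bayes factor for a two-sided p-value: exp(-z**2 / 2).

    The smallest possible ratio of the likelihood of the data under the null
    to its likelihood under the best-supported alternative, i.e. the strongest
    case the data can make against the null."""
    z = norm.isf(p_value / 2)          # z corresponding to a two-sided p
    return np.exp(-z ** 2 / 2)

def posterior_prob_null(p_value, prior_prob_null):
    """Update an assumed prior probability of 'no effect' with the minimum BF."""
    prior_odds = prior_prob_null / (1 - prior_prob_null)
    post_odds = prior_odds * minimum_bayes_factor(p_value)
    return post_odds / (1 + post_odds)

# Example: a 50-50 prior (equipoise) on whether there is any effect at all.
for p in (0.05, 0.01, 0.001):
    print(f"p={p:<6} min BF={minimum_bayes_factor(p):.3f} "
          f"posterior P(null)={posterior_prob_null(p, 0.50):.3f}")
```

Even at p=0.05, the null hypothesis keeps more than a tenth of its probability under a 50-50 prior, and that is under the most anti-null reading of the data; this is precisely why a p-value cannot be interpreted in a vacuum.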

The point is that we need to get beyond the p-value and develop a more sophisticated, nuanced and critical attitude toward data. Furthermore, regulatory bodies need to find a more nuanced way of communicating scientific data, particularly data evidencing harm, in order not to lose credibility with the public. Most importantly, however, we need to do a better job training researchers in the subtleties of statistical analyses, so that the p-value does not become the ultimate arbiter of the truth.

Tuesday, November 9, 2010

Could our application of EBM be unethical?

As you may have noticed, I have been thinking a lot about the nature of our research enterprise and its output as it relates to practical decisions that need to be made in the real world by physicians and policy makers. I have come to the conclusion that it is woefully inadequate in so many ways, and in fact it may even be borderline unethical. My thinking on this has been influenced at least in part by some of my reading of the work by the Tufts group led by David M. Kent, who has written a lot about the impact of heterogeneous treatment response on the central measures we report in trials -- I urge you to look their work up on Medline. (My potential COI here is that I did all of my Internal Medicine and Pulmonary and Critical Care training at Tufts in the 1990s, but did not know or work with Kent or his group). But admittedly I have been thinking about a lot of this stuff on my own as well, as I have advocated risk stratification for quite some time now. So, here is what I have been thinking.

The randomized controlled trial is what puts "evidence" into evidence-based medicine. It is the sine qua non of EBM, and is the preferred way to resolve questions of "does it work?" The rigorous experimental design, the blinding and placebo controls are all meant to maximize what we call internal validity, or assure that what we are detecting is not due to bias or chance or confounding or placebo effect or some other invalidating reason. But the one downfall of these massive undertakings is that we arrive at some measure that tells us what the response was on average, and how that differed on average from placebo. The beauty of such a measure of central tendency is also its curse: it tends to smooth out any of what we call "noise" in the signal, but at the same time this smoothing cannot differentiate between noise and important population differences in both baseline characteristics and responses. Therefore, I contend that it lacks what I have termed "individualizability", a threat to validity most felt at the bedside.

Take, for example, a cholesterol-lowering medication. Let us assume that the mean or median lowering that it provides is 20 points over 6 months. This number conflates those patients who respond vigorously (say with a 40-point lowering) with those who have much more sluggish responses, or none at all. So, while the total patient group may represent a broad spectrum of the population, we walk away from the data not really understanding who is likely to respond well and who not so well. Make sense?
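A toy simulation makes the point. The numbers below (a 50/50 split between vigorous responders with roughly a 40-point drop and patients who get essentially nothing) are invented for illustration only; they show how a respectable average effect can coexist with a large group of patients who derive little benefit from the drug.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Two hypothetical subgroups hidden inside one trial arm: half respond
# vigorously (~40-point LDL drop), half get essentially nothing.
is_responder = rng.random(n) < 0.5
drop = np.where(is_responder,
                rng.normal(40, 10, n),     # vigorous responders
                rng.normal(0, 10, n))      # little to no response

print(f"mean lowering, whole arm:  {drop.mean():5.1f} points")
print(f"mean lowering, responders: {drop[is_responder].mean():5.1f} points")
print(f"mean lowering, the rest:   {drop[~is_responder].mean():5.1f} points")
print(f"share getting <10 points of benefit: {np.mean(drop < 10):.0%}")
```

The whole-arm mean lands near the 20 points of the example above, yet roughly four in ten simulated patients get less than 10 points of lowering; the average alone cannot tell those two stories apart.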

Now, this becomes potentially even more important when looking at adverse responses to interventions. Just because on average we do not detect more deaths, for example, in the treatment than in the placebo arm, there may be important differences between those susceptible to this outcome in the two groups. That is, while in the placebo group the deaths may be due to the disease under investigation, in the treatment group the deaths may be directly attributable to the experimental treatment, and thus may impact different subjects. And although the balance of the accounting does not differ between groups, we may be creating a variant of the trolley problem.

What would be helpful, but is less accepted by statisticians and regulators, is exploration of subgroups of patients within any given trial. I will go even further and say that the trial design should a priori include randomization stratified by these known potential effect modifiers, so as to provide the power and validity to understand the continuum of efficacy and harm. Because we rarely do this, the instrument of evidence becomes a blender of forced homogenization, lumping everyone together to create results we like to call generalizable, but lacking the individualizable data that a clinician needs in the office. And it is this lack of individualizability that to me creates a possibility of an ethical breach at the point of treatment.
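For what it is worth, stratified randomization itself is not complicated; the barrier is planning for it up front. Here is a minimal sketch in which the strata (diabetes status and an age cut-off) are my own illustrative choice of effect modifiers, using permuted blocks within each stratum so that the treatment arms stay balanced inside every subgroup we intend to analyze.

```python
import random
from collections import defaultdict

random.seed(7)
BLOCK = ["treatment", "treatment", "placebo", "placebo"]   # permuted block of 4

# One shuffled block queue per stratum keeps arms balanced within each subgroup.
queues = defaultdict(list)

def stratum(patient):
    # Illustrative effect modifiers; a real trial would pre-specify its own.
    return (patient["diabetic"], patient["age"] >= 65)

def assign(patient):
    key = stratum(patient)
    if not queues[key]:                    # start a fresh permuted block
        block = BLOCK[:]
        random.shuffle(block)
        queues[key] = block
    return queues[key].pop()

patients = [{"id": i,
             "diabetic": random.random() < 0.3,
             "age": random.randint(40, 85)} for i in range(200)]

counts = defaultdict(lambda: defaultdict(int))
for p in patients:
    counts[stratum(p)][assign(p)] += 1

for key, arms in sorted(counts.items()):
    print(f"diabetic={key[0]!s:5} age>=65={key[1]!s:5} -> {dict(arms)}")
```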

It has been asserted here that 90% of all medicines work only in 30-50% of the people who qualify for them. So, for the majority of drugs, we have less than a coin-toss chance that it will work. While at 50% we have equipoise, below 50% equipoise disappears. And may I remind you that equipoise is a requirement for an experimental protocol. In other words, if we have data that leans in either direction away from the 50-50% proposition of efficacy, the ethics of an investigation are brought into question. In the current situation, where the probability of a drug working is less than or equal to 50%, does not the office trial of such therapy qualify as investigational at best and potentially unethical at worst? Of course, this would depend, as always, on the risk-benefit balance of treating vs. not treating a particular condition. But my overarching question remains: should such a drug not be considered investigational in every patient in the office when the more individualizable data are not available?

Some of these issues may be remedied by the adaptive trial designs that the FDA is exploring. But knowing the glacial pace of progress at the agency, this solution is not imminent. Also, I am by no means a bioethicist, but I do spend a lot of time thinking about clinical research as it applies to patient care. And this situation seems ripe for a better-informed, multi-disciplinary discussion. As always, my ruminations here are just a manifestation of how I am thinking through some of the issues I consider important. None of this is written in stone, and all is evolving. Your thoughtful comments help my evolution.



 

Wednesday, September 29, 2010

Disruptive innovation in healthcare: Overcoming HTE

I have been working on a talk for the American College of Chest Physicians (Chest) annual meeting. The session is on alternative research study designs, and I chose to talk about the N of 1 trials. I think that you can get an idea of why I chose this topic from reading my previous posts here, here and here. Doing my research, I came upon a term "heterogeneous treatment effect", or HTE, that is well worth exploring. I think that every clinician who has ever seen patients is familiar with this effect, but let us trace its explanation.

This excellent article in the UK Independent summarizes the premise with some scathing comments from none other than GSK's chief geneticist, Dr. Allen Roses:
"The vast majority of drugs - more than 90 per cent - only work in 30 or 50 per cent of the people," Dr Roses said. "I wouldn't say that most drugs don't work. I would say that most drugs work in 30 to 50 per cent of people. Drugs out there on the market work, but they don't work in everybody."
There is even a table presented with response rates by therapeutic area, though the reference(s) is(are) not cited, so, please, take with a grain of salt:
Response rates
Therapeutic area: drug efficacy rate in per cent
  • Alzheimer's: 30
  • Analgesics (Cox-2): 80
  • Asthma: 60
  • Cardiac arrhythmias: 60
  • Depression (SSRI): 62
  • Diabetes: 57
  • Hepatitis C (HCV): 47
  • Incontinence: 40
  • Migraine (acute): 52
  • Migraine (prophylaxis): 50
  • Oncology: 25
  • Rheumatoid arthritis: 50
  • Schizophrenia: 60
In essence, what Dr. Roses was referring to is the phenomenon of HTE, described aptly by Kent et al. as the fact that "the effectiveness and safety of a treatment varies across the patient population". The authors preface it by saying that 
Although “evidence-based medicine” has become the dominant paradigm for shaping clinical recommendations and guidelines, recent work demonstrates that many clinicians’ initial concerns about “evidence-based medicine” come from the very real incongruence between the overall effects of a treatment in a study population (the summary result of a clinical trial) and deciding what treatment is best for an individual patient given their specific condition, needs and desires (the task of the good clinician). The answer, however, is not to accept clinician or expert opinion as a replacement for scientific evidence for estimating a treatment’s efficacy and safety, but to better understand how the effectiveness and safety of a treatment varies across the patient population (referred to as heterogeneity of treatment effect [HTE]) so as to make optimal decisions for each patient.
Ah, so it is not your imagination: when someone brings an evidence-based guideline to you, and insists that unless you comply 95% of the time you are providing less than great quality of care, and you say "this does not represent my patients", you are actually not crazy. To be sure, a good EBPG will apply to most patients encountered with the particular condition. But the devil, of course, is in the details. As I have already pointed out, we impose statistical principles onto data to whip them into submission. When we do a good job, we acknowledge the limitations of what measures of central tendency provide us with. But so much of the time I see physicians relying on the p-value alone to compare effects that I am convinced the variation around the center is mostly lost on us. And further, how does this variance help a clinician faced with an individual patient, who has at best a probability of response on some continuum of a population of probabilities? And more importantly, what will this individual patient's risk-benefit balance be for a particular therapy?
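As an illustration of why the variance around the center matters at the bedside, here is a small sketch. The numbers (a 20-point mean reduction, a 10-point threshold for a clinically worthwhile response, and the range of standard deviations of individual responses) are assumptions of mine, and the normal model of individual effects is a simplification; the point is that the same mean can imply very different chances that a given patient will benefit, depending on the spread.

```python
from scipy.stats import norm

mean_effect = 20.0     # average lowering reported by a hypothetical trial (points)
meaningful = 10.0      # smallest lowering we would call clinically worthwhile

# Probability that an individual patient clears the meaningful threshold,
# assuming individual responses are roughly normal around the trial mean.
for sd_individual in (5.0, 15.0, 30.0):
    p_benefit = norm.sf(meaningful, loc=mean_effect, scale=sd_individual)
    print(f"SD of individual responses = {sd_individual:4.0f} -> "
          f"P(benefit >= {meaningful:.0f} points) = {p_benefit:.2f}")
```

With a tight spread nearly every patient benefits; with a wide one, the same "20-point average" leaves a third of patients below the worthwhile threshold, which is exactly the information a summary p-value never conveys.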

I think what I am walking away with after thinking about this issue is that it is of utmost importance to understand what kind of data have gone into a recommendation: what is the degree of HTE in the known research, and, specifically, what is known about the population that your patient represents? The less HTE and the more knowledge about the specific subgroups, the more confident you can be that the therapy will work. Ultimately, however, each patient is a universe unto herself, since no two people will share the same genetics, environmental exposures, chronic condition profile or other treatments, to name just a few potential characteristics that may impact response to therapy.

This is the reason that we need better trials, where people are represented more broadly, leading to an increase in external validity. To make this information useful at the bedside, we need a priori plans to analyze many different subgroups, as that will give clinicians at least some granularity so desperately needed in the office. And while pharmacogenomics may be helpful, I am sure that it will not be the panacea for reducing all of this complexity to zero.

Until technology gives us a better way (assuming that it will), where possible, a systematic approach to treatment trials should be undertaken. Later I will blog about N of 1 trials, which, though not appropriate in every situation, may be quite helpful in optimizing treatment in some chronic conditions. With the advent of health IT, these trials may become less daunting and, in aggregate, provide some very useful generalizable information on what happens in the real world. Each clinician will need to take some ownership in advancing our collective understanding of the diseases s/he treats. This may truly be the disruptive innovation we are all looking for to improve the quality of care not just to please the bureaucrats, but to promote better health and quality of life.
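To preview what such a systematic approach can look like, here is a minimal sketch of an N of 1 analysis: a single patient alternates between two treatments in paired periods, and the within-patient differences are examined directly. The symptom scores are fabricated for illustration, and a real N of 1 trial would add blinding, washout periods and a pre-specified decision rule.

```python
import numpy as np
from scipy.stats import ttest_rel

# Hypothetical weekly symptom scores (lower is better) for one patient who
# alternates between treatment A and treatment B over six paired periods.
scores_a = np.array([4.0, 3.5, 4.2, 3.8, 3.6, 4.1])
scores_b = np.array([5.1, 4.8, 5.5, 4.6, 5.0, 5.2])

diff = scores_b - scores_a
t_stat, p_value = ttest_rel(scores_b, scores_a)

print(f"mean within-patient difference (B - A): {diff.mean():.2f}")
print(f"periods favoring A: {(diff > 0).sum()} of {len(diff)}")
print(f"paired t-test p-value: {p_value:.3f}")
```

The appeal is that the conclusion applies to the one patient who actually matters in that encounter, and pooling many such records, which health IT makes feasible, is one route to the real-world, individualizable evidence argued for above.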