As you may have noticed, I have been thinking a lot about the nature of our research enterprise and its output as it relates to practical decisions that need to be made in the real world by physicians and policy makers. I have come to the conclusion that it is woefully inadequate in so many ways, and in fact it may even be borderline unethical. My thinking on this has been influenced at least in part by some of my reading of the work by the Tufts group led by David M. Kent, who has written a lot about the impact of heterogeneous treatment response on the central measures we report in trials -- I urge you to look their work up on Medline. (My potential COI here is that I did all of my Internal Medicine and Pulmonary and Critical Care training at Tufts in the 1990s, but did not know or work with Kent or his group). But admittedly I have been thinking about a lot of this stuff on my own as well, as I have advocated risk stratification for quite some time now. So, here is what I have been thinking.
The randomized controlled trial is what puts the "evidence" into evidence-based medicine. It is the sine qua non of EBM, and the preferred way to resolve questions of "does it work?" The rigorous experimental design, the blinding, and the placebo controls are all meant to maximize what we call internal validity -- that is, to assure that what we are detecting is not due to bias, chance, confounding, the placebo effect, or some other invalidating factor. But the one downfall of these massive undertakings is that we arrive at a measure that tells us what the response was on average, and how that differed on average from placebo. The beauty of such a measure of central tendency is also its curse: it smooths out what we call "noise" in the signal, but in doing so it cannot differentiate between noise and important population differences in both baseline characteristics and responses. Therefore, I contend that it lacks what I have termed "individualizability", a threat to validity most felt at the bedside.
Take, for example, a cholesterol-lowering medication. Let us assume that the mean or median lowering it provides is 20 points over 6 months. This number conflates patients who respond vigorously (say, with a 40-point lowering) with those who have sluggish to no responses. So, while the total patient group may represent a broad spectrum of the population, we walk away from the data not really understanding who is likely to respond well and who is not. Make sense?
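To make the arithmetic concrete, here is a toy simulation -- entirely made-up numbers, not data from any real trial. Half the patients are vigorous responders averaging a 40-point drop, half barely respond at all, and the overall mean dutifully reports something close to 20, telling us nothing about either subgroup:

```python
import random

random.seed(0)

# Hypothetical subgroups (illustration only): vigorous responders
# averaging a 40-point lowering, and near non-responders around 0.
responders = [random.gauss(40, 5) for _ in range(500)]
non_responders = [random.gauss(0, 5) for _ in range(500)]

everyone = responders + non_responders
overall_mean = sum(everyone) / len(everyone)

# The headline trial result smooths the two subgroups into one number.
print(f"overall mean lowering:  {overall_mean:.1f} points")  # close to 20
print(f"responder mean:         {sum(responders) / len(responders):.1f}")
print(f"non-responder mean:     {sum(non_responders) / len(non_responders):.1f}")
```

The point is not the simulation itself but what it hides: the "average patient" with a 20-point response may barely exist in the room.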
Now, this becomes potentially even more important when looking at adverse responses to interventions. Even if on average we do not detect more deaths, for example, in the treatment arm than in the placebo arm, there may be important differences between those susceptible to this outcome in the two groups. That is, while in the placebo group the deaths may be due to the disease under investigation, in the treatment group they may be directly attributable to the experimental treatment, and thus may befall different subjects. And although the final accounting does not differ between groups, we may be creating a variant of the trolley problem.
What would be helpful, but is less accepted by statisticians and regulators, is the exploration of subgroups of patients within any given trial. I will go even further and say that the trial design should a priori include randomization stratified by known potential effect modifiers, so as to provide the power and validity needed to understand the continuum of efficacy and harm. Because we rarely do this, the instrument of evidence becomes a blender of forced homogenization, lumping patients together to create results we like to call generalizable, but lacking the individualizable data that a clinician needs in the office. And it is this lack of individualizability that, to me, creates the possibility of an ethical breach at the point of treatment.
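For what it is worth, the mechanics of stratified randomization are simple. Here is a minimal sketch under invented assumptions -- hypothetical patient records with a made-up "high_risk" flag standing in for a known effect modifier -- showing that randomizing within each stratum balances the arms on that factor by design:

```python
import random
from collections import defaultdict

random.seed(42)

def stratified_randomize(patients, stratum_of):
    """Shuffle and split patients to treatment/placebo WITHIN each
    stratum, so every stratum is (near-)evenly represented in both arms."""
    arms = {}
    by_stratum = defaultdict(list)
    for p in patients:
        by_stratum[stratum_of(p)].append(p)
    for members in by_stratum.values():
        random.shuffle(members)
        half = len(members) // 2  # odd strata differ by at most one
        for p in members[:half]:
            arms[p["id"]] = "treatment"
        for p in members[half:]:
            arms[p["id"]] = "placebo"
    return arms

# Hypothetical cohort: 40 patients, 10 of them flagged high-risk.
patients = [{"id": i, "high_risk": i % 4 == 0} for i in range(40)]
arms = stratified_randomize(patients, lambda p: p["high_risk"])
```

With simple (unstratified) randomization, a small high-risk stratum can end up lopsided between arms by chance; stratifying guarantees the balance, which is what gives the subgroup comparison its power and validity.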
It has been asserted here that 90% of all medicines work in only 30-50% of the people who qualify for them. So, for the majority of drugs, we have at best a coin-toss chance that any given one will work. At 50% we have equipoise; below 50%, equipoise disappears. And may I remind you that equipoise is a requirement for an experimental protocol. In other words, if we have data that lean in either direction away from the 50-50 proposition of efficacy, the ethics of an investigation are brought into question. In the current situation, where the probability of a drug working is less than or equal to 50%, does not the office trial of such a therapy qualify as investigational at best and potentially unethical at worst? Of course, this would depend, as always, on the risk-benefit balance of treating vs. not treating a particular condition. But my overarching question remains: should such a drug not be considered investigational in every patient in the office when more individualizable data are not available?
Some of these issues may be remedied by the adaptive trial designs that the FDA is exploring. But knowing the glacial pace of progress at the agency, this solution is not imminent. Also, I am by no means a bioethicist, but I do spend a lot of time thinking about clinical research as it applies to patient care, and this situation seems ripe for a better-informed, multidisciplinary discussion. As always, my ruminations here are just a manifestation of how I am thinking through some of the issues I consider important. None of this is written in stone, and all of it is evolving. Your thoughtful comments help my evolution.