The Atlantic article about John Ioannidis’s research has sparked a lively debate about the trustworthiness of much of the evidence generated in science. Much of what is referenced is his 2005 PLoS paper, in which, through a modeling exercise, he concludes that a shotgun approach to hypothesis generation is a formula for garbage-in, garbage-out data. The Atlantic article compelled me to seek out the primary paper and, despite its dense language, get through it and see what it really says. Below is my attempt at a synthesis and interpretation of the salient points. My conclusion, as you will see, is that not all lies are in fact created equal.
As if the title, “Why Most Published Research Findings Are False”, were not incendiary enough, Ioannidis goes on to provoke further in this oft-cited 2005 PLoS paper:
“There is increasing concern that in modern research, false findings may be the majority or even the vast majority of published research claims [6–8]. However, this should not be surprising. It can be proven that most claimed research findings are false.”
He develops a model simulation to prove to us mathematically that this is the case. But first he rightly criticizes the fact that we place no value on replicating prior findings:
“Several methodologists have pointed out [9–11] that the high rate of nonreplication (lack of confirmation) of research discoveries is a consequence of the convenient, yet ill-founded strategy of claiming conclusive research findings solely on the basis of a single study assessed by formal statistical significance, typically for a p-value less than 0.05. Research is not most appropriately represented and summarized by p-values, but, unfortunately, there is a widespread notion that medical research articles should be interpreted based only on p-values.”
He then briefly touches upon the fact that negative findings are also of importance (this is the huge issue of publication bias, which gets amplified in meta-analyses), but forgoes an extensive discussion of this in favor of explaining why we cannot trust the bulk of modern research findings.
“As has been shown previously, the probability that a research finding is indeed true depends on the prior probability of it being true (before doing the study), the statistical power of the study, and the level of statistical significance [10,11].”
He then uses the tedious, yet effective and familiar lingo of Epidemiology and Biostatistics methods to develop his idea:
“Consider a 2 x 2 table in which research findings are compared against the gold standard of true relationships in a scientific field. In a research field both true and false hypotheses can be made about the presence of relationships. Let R be the ratio of the number of “true relationships” to “no relationships” among those tested in the field.”
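His 2 × 2 setup can be made concrete with a short sketch (my own illustration, not code from the paper; the function and variable names are mine) that tabulates the expected cell counts for c relationships tested in a field, given R, the significance level α, and the study power 1 − β:

```python
def two_by_two(c, R, alpha=0.05, power=0.8):
    """Expected counts of research findings compared against the gold
    standard of true relationships, for c relationships tested in a field."""
    true_rel = c * R / (R + 1)  # relationships that actually exist
    no_rel = c / (R + 1)        # relationships that do not
    return {
        "true_positive": power * true_rel,         # true, and claimed
        "false_negative": (1 - power) * true_rel,  # true, but missed
        "false_positive": alpha * no_rel,          # false, but claimed
        "true_negative": (1 - alpha) * no_rel,     # false, and not claimed
    }

table = two_by_two(c=1000, R=0.25)
```

With these conventional values, 160 of the 200 truly existing relationships are claimed as findings, but so are 40 of the 800 non-relationships; the balance between those two “claimed” cells is what drives everything that follows in his argument.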
So, let’s look at something that I know about – healthcare-associated pneumonia. We, and others, have shown that administering empiric antibiotics that do not cover the likely pathogens within the first 24 hours of hospitalization in this population is associated with a two- to three-fold increase in the risk of hospital death. So, the association in question is between antibiotic choice and hospital survival. Any clinician will tell you that this idea has a lot of biologic plausibility: get the bug with the right drug and you improve the outcome. It is also easy to justify based on the germ theory. Finally, it does not get any more “gold standard” than death. We also look at the bugs themselves to see if some are worse than others, at some of the process measures, and at how sick the patient is, both acutely and chronically. Again, it is not unreasonable to hypothesize that all of these factors influence the biology of the host–pathogen interaction. So, again, if you are a Bayesian, you are comfortable with the prior probability.
The next idea he puts forth is in my opinion the critical piece of the puzzle:
“R is characteristic of the field and can vary a lot depending on whether the field targets highly likely relationships or searches for only one or a few true relationships among thousands and millions of hypotheses that may be postulated.”
To me, what this says is that the more carefully we in any given field define what is probable prior to torturing the data, the better our chance of being correct. Following through on his computation, he derives this:
“Since usually the vast majority of investigators depend on α = 0.05, this means that a research finding is more likely true than false if (1 − β)R > 0.05.”
So, given the conventional power (1 − β) of 0.8, R has to be greater than 0.0625 to meet this threshold. That is, roughly 1 of every 16 hypothesized associations in a given field must be correct. This does not seem like an unreasonable proportion if we are invoking a priori probabilities instead of plunging into analyses head first. In the field of genomics, as I understand it, the shotgun approach to finding associations certainly would alter this relationship in favor of a very high denominator, thus making the probability of a real association much lower.
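The paper’s positive predictive value, PPV = (1 − β)R / (R − βR + α), makes this threshold easy to check numerically. Here is a minimal sketch (my own illustration; the function name is mine, and the no-bias version of his model is assumed):

```python
def ppv(R, alpha=0.05, power=0.8):
    """Post-study probability that a claimed finding is true (no-bias model)."""
    beta = 1 - power
    return power * R / (R - beta * R + alpha)

# A finding is more likely true than false when (1 - beta) * R > alpha,
# i.e., when R > alpha / power = 0.0625 at conventional values.
well_motivated = ppv(R=0.25)   # 1 true per 4 hypotheses tested: PPV = 0.80
shotgun = ppv(R=0.001)         # 1 true per 1,000 tested: PPV ~ 0.016
```

At R = 0.0625 the PPV is exactly 0.5, and it falls off quickly as the hypothesis pool becomes more speculative – which is, as I read it, the central point of the paper.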
Do I disagree with his assertions about multiple comparisons, biases, fabrication, etc.? Of course not – these are well known and difficult to quantify or remedy. Should we work to get rid of them with more transparency? Absolutely! But does his paper really mean that all of what we think we know is garbage, based on his mathematical model? I do not think so.
As everyone who reads this blog by now knows, I do not believe that we can ever arrive at the absolute truth. We can only inch as close as our methods and interpretations will allow. That is why I am so enamored of Dave deBronkart’s term “illusion of certainty”. I do, however, think that we need to examine these criticisms reasonably and not throw the baby out with the bathwater.
I would be grateful to hear others’ interpretations of the Ioannidis paper, as it is quite possible that I have missed or misinterpreted something very important. After all, we are all learning.