Monday, December 13, 2010

Can a "negative" p-value obscure a positive finding?

I am still on my p-value kick, brilliantly fueled by Dr. Steve Goodman's correspondence with me and another paper by him aptly named "A Dirty Dozen: Twelve P-Value Misconceptions". It is definitely worth a read in toto, as I will only focus on some of its more salient parts.

Perhaps the most important point that I have gleaned from my p-value quest is that the word "significance" should be taken quite literally. Here is what Merriam-Webster dictionary says about it:


noun \sig-ˈni-fi-kən(t)s\
Definition of SIGNIFICANCE
1 a : something that is conveyed as a meaning often obscurely or indirectly
  b : the quality of conveying or implying
2 a : the quality of being important : moment
  b : the quality of being statistically significant
It is the very meaning in point 2a that the word "significance" was meant to convey in reference to statistical testing. That is, "worth noting" or "noteworthy". Nowhere do we find any references to God or truth or dogma. So, the first lesson is to drift away from taking statistical significance as a sign from God that we have discovered the absolute truth, and to focus on the fact that we need to make note of the association. The follow-up to the noting action is confirmation or refutation. That is, once identified, this relationship needs to be tested again (and sometimes again and again) before we can say that there may be something to it.

As an aside, how many times have you received comments from peer reviewers saying that what you are showing has already been shown? Yet, in all of our Discussion sections we are quite cautious to say that "more research is needed" to confirm what we have seen. So, it seems we -- researchers, editors and reviewers -- just need to get on the same page.

To go on, as the title of the paper states, we are introduced to 12 common misconceptions about what the p-value means. Here is the table that enumerates all 12 (since the paper is easy to find for no fee, I am assuming that reproducing the table with attribution is not a problem):

Even though some of them seem quite similar, it is worth understanding the degrees of difference, as they provide important insights.

The one I wanted to touch upon further today is Misconception #12, as it dovetails with our prior discussion vis-a-vis environmental risks. But before we do this, it is worth defining the elusive meaning of the p-value once again: "The p-value signifies the probability of obtaining the association (or difference) of the magnitude obtained or one of greater magnitude when in reality there is no association (or difference)". So, let's apply this to an everyday example of smoking and lung cancer risk. Let's say a study shows a 2-fold increase in lung cancer among smokers compared to non-smokers, and the p-value for this association is 0.06. What this really means is that "under conditions of no true association between smoking and lung cancer, there is a 6% chance that a study would find a 2-fold or greater increase in cancer associated with smoking". Make sense? Yet, according to the "rules" of statistical significance, we would call this study negative. But is this a true negative? (To the reader of this blog this is obvious, but I assure you that, given how cursory our reading of the literature tends to be, and how often I hear my peers discount findings with the careless "But the p-value was not significant", this is a point worth harping on.)
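For those who like to see the definition in action, here is a minimal Python sketch of that "under conditions of no true association" calculation. Everything in it is my own assumption for illustration: the group sizes (500 smokers and 500 non-smokers), the shared 1.5% risk, and the helper names `binom_pmf` and `prob_ratio_at_least`. It computes a one-sided probability from the binomial distribution and is not meant to reproduce the 0.06 in the example above.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k cases among n people when each has risk p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_ratio_at_least(ratio, n_per_group, null_risk):
    """One-sided chance, under no true association, of a risk ratio >= ratio.

    Both groups have n_per_group people and share the same risk (null_risk),
    so the risk ratio is simply smoker cases / non-smoker cases.
    """
    pmf = [binom_pmf(k, n_per_group, null_risk) for k in range(n_per_group + 1)]
    total = 0.0
    # Start at 1 smoker case: zero cases cannot count as an increase.
    for smoker_cases in range(1, n_per_group + 1):
        for nonsmoker_cases in range(n_per_group + 1):
            if smoker_cases >= ratio * nonsmoker_cases:
                total += pmf[smoker_cases] * pmf[nonsmoker_cases]
    return total


# Hypothetical study: 500 smokers, 500 non-smokers, 1.5% risk in everyone.
p_two_fold = prob_ratio_at_least(2.0, 500, 0.015)
p_three_fold = prob_ratio_at_least(3.0, 500, 0.015)
print(f"Chance of a >=2-fold increase under the null: {p_two_fold:.3f}")
print(f"Chance of a >=3-fold increase under the null: {p_three_fold:.3f}")
```

The point to take away is that the number is a statement about how often chance alone would produce a difference this large or larger. It says nothing, by itself, about whether smoking causes cancer.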

The bottom line answer to this is found in the discussion of the bottom line misconception in the Table: "A scientific conclusion or treatment policy should be based on whether or not the p-value is significant". I would like to quote directly from Goodman's paper here, as it really drives home the idiocy of this idea:
"This misconception encompasses all of the others. It is equivalent to saying that the magnitude of effect is not relevant, that only evidence relevant to a scientific conclusion is in the experiment at hand, and that both beliefs and actions flow directly from the statistical results. The evidence from a given study needs to be combined with that from prior work to generate a conclusion. In some instances, a scientifically defensible conclusion might be that the null hypothesis is still probably true even after a significant result, and in other instances, a nonsignificant P value might still lead to a conclusion that a treatment works. This can be done formally only through Bayesian approaches. To justify actions, we must incorporate the seriousness of errors flowing from the actions together with the chance that the conclusions are wrong."
When the author advocates Bayesian approaches, he is referring to the idea that a positive result in the setting of a low pre-test probability can still have a low chance of reflecting a truly positive association. This is better illustrated by Bayes' theorem, which allows us to combine what the bulk of prior evidence and/or thought has indicated about the association ("prior probability") with the result at hand to arrive at an updated estimate ("posterior probability"). This implies that the lower our prior probability, the less convinced we can be by a single positive result. As a corollary, the higher our prior probability for an association, the less credence we can put in a single negative result. So, the Bayesian approach to evidence, as Goodman indicates here, can merely move us in the direction of either greater or lesser doubt about our results, NOT bring us to the truth or falsity.
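To put rough numbers on this, here is a small Python sketch that treats a study like a diagnostic test. This is a simplification on my part, not Goodman's own method. The function names are made up for this post, and the 80% power and 5% significance threshold are assumed values, as are the priors in the two examples.

```python
def posterior_after_significant(prior, power, alpha):
    """Chance the association is real, given a statistically significant result."""
    true_positive = prior * power          # real association, study detects it
    false_positive = (1 - prior) * alpha   # no association, significant by chance
    return true_positive / (true_positive + false_positive)


def posterior_after_nonsignificant(prior, power, alpha):
    """Chance the association is real, given a nonsignificant result."""
    false_negative = prior * (1 - power)         # real association, study misses it
    true_negative = (1 - prior) * (1 - alpha)    # no association, none found
    return false_negative / (false_negative + true_negative)


# Assumed values: 80% power, 5% significance threshold.
long_shot = posterior_after_significant(prior=0.01, power=0.8, alpha=0.05)
well_supported = posterior_after_nonsignificant(prior=0.90, power=0.8, alpha=0.05)
print(f"Low prior (1%), significant result: {long_shot:.2f}")
print(f"High prior (90%), nonsignificant result: {well_supported:.2f}")
```

With these assumed inputs, a "positive" study of a long-shot hypothesis leaves us at about a 14% chance that the association is real, while a "negative" study of a well-supported hypothesis still leaves us at about 65%. Neither single result settles the question.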

Taken together, all these points merely confirm my prior assertion that we need to be a lot more cautious about calling results negative when deciding about potentially risky exposures than about beneficial ones. Similarly, we need to set a much higher bar for all threats to validity in studies designed to look at risky rather than beneficial outcomes (more on this in a future post). These are the principles we should be employing when evaluating environmental exposures. This becomes particularly critical in view of the startling findings from genome-wide association studies that our genes determine only a very small minority of the diseases to which we are subject. This means that the ante has been upped dramatically for environmental exposures as culprits, and demands a much more serious push for the precautionary principle as the foundation for our environmental policy.



  1. What I'm learning here is that I can actually learn to read and understand research better, so when people send Cochrane reviews my way, they don't always mean what we think they mean? Would that be, well, true - not as in a sign from God, but significantly true?

  2. Karen, good to see you here and welcome!

    Well, yes, it is absolutely possible for a Cochrane review to be incorrect, although Cochrane methods are usually quite fastidious. Given their proclivity toward randomized controlled trials to the exclusion of all other types of studies as evidence, however, they are more likely to invoke "more data needed" as the mantra than the "no clear evidence of association". My feeling is, and this is what I try to teach to my students, that a look under the hood of any study is a worthy pursuit before parroting anyone else's conclusions. Does this answer your question?

  3. Dr Z,
    Thanks for a meaningful series on p values. It is an eye opener. On the gene comment, I think the jury is still out and will be out for a good while. Mutations are still worthy of study. Blog on.