Thursday, December 16, 2010

P-values, Bayes and Ioannidis, oh my!

When I graduated from college in the early 1980s, much to my parents' chagrin, I was not sure what to do with my career. So, instead of following some of my wiser classmates to Wall Street, I got a job in a very well regarded molecular endocrinology laboratory in Boston. When I look back on that time, almost 30 years ago, so much seems surreal. For example, in those days it took us well over a year to sequence a gene! How about that? Something that takes hours with today's technology took the equivalent of an eternity. And what a pain it was, running those cumbersome sequencing gels, with the glass cracking in the night, negating days of work. Oh, well, times have changed and for the better, I believe. But these changes are bringing with them some odd incongruities in our prior thinking.

A few days ago, I picked up a link from Michael Pollan's Twitter feed to a story that startled me: It was insinuating that all these decades of sequencing the human genome to hunt for targets of disease susceptibility have come up nearly empty-handed (with a few well known exceptions). The story recounted a tale of decades of vigorous funding, meteoric career growth and very few results to date. Yet, the scientists involved are not ready to give up on human genes as the primary locus for disease susceptibility. In fact, the strong rescue bias and the fear of losing all that beautiful funding are colluding to generate some creative and complex hypotheses. I came upon one such hypothesis earlier today, following a link from Ed Yong, that British science journalist extraordinaire. But the hypothesis itself, though fascinating, is not what interested me most, no. Care to guess what did? You are correct if you said the p-value.

What grabbed me is the calculation that to identify interactions of significance, the adjusted alpha level has to be set at 10^-12; that's 0.000000000001! Why is this, and what does it mean in the context of our recent ruminations on the topic of p-values? Well, oddly enough it ties in very nicely with Bayesian thinking and John Ioannidis' incendiary assertion of lies in research. How? Follow me.

The hallmark of identifying "significant" relationships (remember that the word "significant" merely means "worthy of noting") in genetic research is searching for statistical associations between the exposure (in this case a mutation in the genetic locus) and the outcome (the specific disease in question). When we start this analysis, we have absolutely no idea which gene mutation or mutations are likely to be at play. This means that we have no way of establishing... can you guess what? Yes, that's right, the pre-test probability. We know nothing about the pre-test probability. This shotgun screening of thousands of genes is in essence a random search for matching pairs of mutation and disease. So, why should this matter?

The reason that it matters resides in the simple, albeit overused, coin toss analogy. When you toss a coin, what are your chances of getting heads on any given toss? The odds are 1:1 every time whether you will get heads or tails, meaning that the chance of getting heads is 50% (unless the coin is rigged, in which case we are talking about a Bayesian coin, not the focus of what we are getting at here). So, if you call heads prior to any given attempt, you will be correct half the time. Now, what does this have to do with the question at hand? Well, let's go back to our definition of the p-value: a p-value represents the probability of obtaining the result (in our gene case it is the association with disease) of the magnitude observed or greater under the condition that no real association exists. So, our customary p-value threshold of 0.05 can be verbalized as follows: "There is a 5% probability that the association of the magnitude observed or greater could happen by chance if there is in reality no association". Now, what happens if we test 20 associations, none of which is real? In other words, what are the chances that at least one of them will come up "significant" at the p=0.05 level? Each individual test has a 1 in 20 (or 5 in 100) chance of hitting our level of significance preset at 0.05, so on average we should expect one spurious "hit" among the 20, and, if the tests are independent, the chance of getting at least one is 1 - 0.95^20, or about 64%. And it follows that the chances of getting a "significant" result grow with more testing.
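For readers who like to see the arithmetic, here is a minimal sketch of how quickly chance "hits" accumulate as the number of tests grows. It assumes that every test is independent and that no real association exists, which is a simplification (real genetic markers are often correlated), and the function names are mine, chosen for illustration only.

```python
def prob_at_least_one_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of n_tests independent null tests has p < alpha."""
    # Probability that every single test stays above alpha is (1 - alpha)^n_tests;
    # the complement is the chance of one or more spurious "significant" results.
    return 1.0 - (1.0 - alpha) ** n_tests


def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Average number of spurious 'significant' results among n_tests null tests."""
    return n_tests * alpha


if __name__ == "__main__":
    for n in (1, 20, 100, 1000):
        p_any = prob_at_least_one_false_positive(n)
        expected = expected_false_positives(n)
        print(f"{n:>5} tests: P(at least one hit) = {p_any:.3f}, "
              f"expected hits = {expected:.1f}")
```

With a single test the chance of a spurious hit is the familiar 5%; by 20 tests it is already better than even, and by 100 tests a chance finding is all but guaranteed.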

This is the very reason that gene scientists have asked (and now answered) the question of what level of statistical significance (also called alpha, the threshold against which the p-value is compared) is acceptable when exploring hundreds of thousands of potential associations without any prior clue of what to expect. And that level is a shocking 0.000000000001!
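I do not know exactly how the authors arrived at their figure, but one common and simple way to get thresholds of this order is the Bonferroni correction: divide the overall alpha you are willing to tolerate by the number of tests you run. The sketch below is only an illustration under that assumption; the marker count of 500,000 is a hypothetical number I picked to stand in for "hundreds of thousands", and the function name is mine.

```python
from math import comb


def bonferroni_threshold(n_tests: int, family_alpha: float = 0.05) -> float:
    """Per-test significance threshold that keeps the overall alpha at family_alpha."""
    return family_alpha / n_tests


if __name__ == "__main__":
    n_markers = 500_000            # hypothetical number of genetic markers screened
    n_pairs = comb(n_markers, 2)   # every possible two-marker interaction

    print(f"single-marker tests: {n_markers:,} -> "
          f"threshold {bonferroni_threshold(n_markers):.1e}")
    print(f"pairwise interactions: {n_pairs:,} -> "
          f"threshold {bonferroni_threshold(n_pairs):.1e}")
```

Testing each marker on its own already pushes the threshold far below 0.05; testing every pair of markers for an interaction multiplies the number of tests into the hundreds of billions, and the threshold drops to somewhere between 10^-13 and 10^-12.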

This reinforces a couple of ideas that we have been discussing of late. First, to say simply that the p-value of 0.05 was reached is to say nothing. The p-value needs to be put into the context of a) the prior probability of association and b) the number of tests of association performed. Second, as a corollary, the p-value threshold needs to be set according to these two parameters: the lower the prior probability and/or the higher the number of tests, the lower the p-value needs to be in order to be noteworthy. Third, if we pay attention only to the p-value, and particularly if we fail to understand how to set the threshold for significance, what we get is junk, and, as Ioannidis aptly points out, lies. Fourth, and final, genetic data are tougher to interpret than most of us appreciate. They are possibly even less precise than the clinical data we usually discuss on this blog. And we know how uncertain those can be!
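To make the role of the prior probability concrete, here is a rough sketch that applies Bayes' theorem to a "significant" result. The 80% power figure and the example priors are assumptions of mine, not numbers from any of the studies mentioned, and the function name is purely illustrative.

```python
def prob_true_given_significant(prior: float,
                                alpha: float = 0.05,
                                power: float = 0.8) -> float:
    """Probability that a 'significant' association is real (Bayes' theorem).

    prior -- pre-test probability that the association is real
    alpha -- significance threshold (false positive rate when nothing is there)
    power -- chance of detecting a real association (assumed 0.8 here)
    """
    true_positives = power * prior
    false_positives = alpha * (1.0 - prior)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # A clinical question with a coin-flip prior vs. a blind genetic screen
    # where, hypothetically, 1 in 10,000 candidate associations is real.
    for label, prior in (("clinical, prior 0.5", 0.5),
                         ("genetic screen, prior 0.0001", 1e-4)):
        at_05 = prob_true_given_significant(prior, alpha=0.05)
        at_strict = prob_true_given_significant(prior, alpha=1e-12)
        print(f"{label}: P(real | p<0.05) = {at_05:.4f}, "
              f"P(real | p<1e-12) = {at_strict:.4f}")
```

Under these assumptions, a p-value below 0.05 is reasonably convincing when the prior is 50%, but nearly worthless when the prior is 1 in 10,000; only a drastically lower threshold restores confidence that a hit is real.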

So, as sexy as the emerging field of genetics was in the 1980s, I am pretty happy that, after four years in the lab, I decided to go first to medical and then to epidemiology schools. Dealing with conditions where we at least have some notion of the pre-test probabilities makes this quantitative window through which I see healthcare just a little less opaque. And these days, I am even happier about having bucked my classmates' Wall Street trend. But that's a story for another day.



  1. Marya, I have really been enjoying your posts. Thanks for sharing your ideas. I think your point about looking at the context of a chosen p-value is important. So, given the incredibly small p-value necessary for genetic association studies, are you suggesting that this type of research is not worthwhile?

  2. Oh, Adrienne, dear Adrienne! So good to see you here!

    No, I am certainly not suggesting anything of the sort. I am merely pointing out that we need to use a different yardstick to measure the data in the lab from the one in the office. As I ruminate more on these ideas, I am even thinking that there may be a gradient of "acceptable" p-value thresholds depending on where we are with our knowledge. What I mean is that, early on, when we have no clue what the pre-test probability of an association is, we may be more interested in not missing something that may turn out to be important. So, at this point it may be sensible to accept a higher frequentist p-value. As we learn more, or, to be more precise, as we replicate the data more, a lower significance threshold may help us, shall we say, separate the wheat from the chaff.

    What do you think? I will probably write something about that at some point, once I have thought it through.

  3. Great info! Astounding. Will spread the word

  4. Marya, just found your blog via Gary Schwitzer's Health News Review. I liked the recent posts. Nice work. I'll have to check in regularly.

  5. Another excellent walk through the statistical fields. I wonder if an extension of this line of reasoning is that a p-value should be chosen which is fit for purpose. I'm afraid my library doesn't give me access to current issues of that journal online so all I can see is the abstract. My question, which you may be able to answer, is what is done with cases meeting this p-value threshold? Is the purpose merely to identify individuals at greater risk of disease so that measures can be taken to minimise other modifiable risk factors? Or is it to drive future research, better understanding of what the implicated genes do and whether their disruption could feasibly cause the disease in question, bearing in mind what we already understand of the pathology and molecular biology of the disease? For the first purpose a less specific (higher) p-value may be appropriate, acting as a screening test which modifies risk and can then be acted on with clinical judgement. For the second purpose a lower p-value may be necessary to really narrow down the number of suspects to a few possible genes which are then further intensively (and expensively) examined. Does this make sense??

    I'm not a trained epidemiologist like yourself but this discussion reminds me of a piece suggesting that for population epidemiological studies of association a lower p-value threshold is appropriate than for clinical trials of therapy (0.001 vs 0.05, Sterne JAC, Davey Smith G. Sifting the evidence – what’s wrong with significance tests? BMJ 2001; 322: 226–31.)

    I'm a bit surprised at commentators' impatience for magic therapies pouring out from the genome genie bottle. Surely this is sensationally impressive basic science which will form the basis of progress for decades and centuries to come in ways we cannot imagine. When Robert Koch described the isolation of pathogenic organisms I bet there were a few critics crowing 'nice one Bob but really will this ever be of any use to anyone?'