When I graduated from college in the early 1980s, much to my parents' chagrin, I was not sure what to do with my career. So, instead of following some of my wiser classmates to Wall Street, I got a job in a very well regarded molecular endocrinology laboratory in Boston. When I look back on that time, almost 30 years ago, so much seems surreal. For example, in those days it took us well over a year to sequence a gene! How about that? Something that takes hours with today's technology took the equivalent of an eternity. And what a pain it was, running those cumbersome sequencing gels, with the glass cracking in the night, negating days of work. Oh, well, times have changed and for the better, I believe. But these changes are bringing with them some odd incongruities in our prior thinking.
A few days ago, I picked up a link from Michael Pollan's Twitter feed to a story that startled me: it was insinuating that all these decades of sequencing the human genome to hunt for targets of disease susceptibility have come up nearly empty-handed (with a few well-known exceptions). The story recounted a tale of decades of vigorous funding, meteoric career growth and very few results to date. Yet, the scientists involved are not ready to give up on human genes as the primary locus for disease susceptibility. In fact, the strong rescue bias and the fear of losing all that beautiful funding are colluding to generate some creative and complex hypotheses. I came upon one such hypothesis earlier today, following a link from Ed Yong, that British science journalist extraordinaire. But the hypothesis itself, though fascinating, is not what interested me most, no. Care to guess what did? You are correct if you said the p-value.
What grabbed me is the calculation that to identify interactions of significance, the adjusted alpha level has to be set at 10^-12; that's 0.000000000001! Why is this, and what does it mean in the context of our recent ruminations on the topic of p-values? Well, oddly enough it ties in very nicely with Bayesian thinking and John Ioannidis' incendiary assertion of lies in research. How? Follow me.
The hallmark of identifying "significant" relationships (remember that the word "significant" merely means "worthy of noting") in genetic research is searching for statistical associations between the exposure (in this case a mutation in the genetic locus) and the outcome (the specific disease in question). When we start this analysis, we have absolutely no idea which mutations in which genes are likely to be at play. This means that we have no way of establishing... can you guess what? Yes, that's right, the pre-test probability. We know nothing about the pre-test probability. This shotgun screening of thousands of genes is in essence a random search for matching pairs of mutation and disease. So, why should this matter?
The reason that it matters resides in the simple, albeit overused, coin toss analogy. When you toss a coin, what are your chances of getting heads on any given toss? The odds of heads versus tails are 1:1 on every toss, meaning that the chance of getting heads is 50% (unless the coin is rigged, in which case we are talking about a Bayesian coin, not the focus of what we are getting at here). So, if you call heads prior to any given attempt, you will be correct half the time. Now, what does this have to do with the question at hand? Well, let's go back to our definition of the p-value: a p-value represents the probability of obtaining a result (in our gene case, the association with disease) of the magnitude observed or greater under the condition that no real association exists. So, our customary p-value threshold of 0.05 can be verbalized as follows: "There is a 5% probability that an association of the magnitude observed or greater could happen by chance if there is in reality no association". Now, what happens if we test 20 associations, none of which is real? Each one has a 1 in 20 (or 5 in 100) chance of hitting our level of significance preset at 0.05, so on average we should expect one of the 20 to come up "significant" by chance alone. If the tests are independent, the probability that at least one of them does so is 1 - 0.95^20, or about 64%. And it follows that the chances of getting a spurious "significant" result grow with more testing.
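To put numbers on how quickly chance findings pile up, here is a minimal Python sketch. It assumes that every test is independent and that no real association exists in any of them; the function names are mine, made up for illustration.

```python
def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance that at least one of n_tests independent null tests
    crosses the alpha threshold purely by chance."""
    return 1.0 - (1.0 - alpha) ** n_tests


def expected_false_positives(n_tests, alpha=0.05):
    """Average number of null tests that come up 'significant' by chance."""
    return n_tests * alpha


if __name__ == "__main__":
    for n in (1, 20, 100, 1000):
        p_any = prob_at_least_one_false_positive(n)
        n_false = expected_false_positives(n)
        print(f"{n:>5} tests: P(at least one false hit) = {p_any:.3f}, "
              f"expected false hits = {n_false:.1f}")
```

With 20 tests the chance of at least one spurious hit is already close to two in three, and with 100 tests it is a near certainty.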
This is the very reason that gene scientists have asked (and now answered) the question of what level of statistical significance (also called alpha, the threshold against which the p-value is compared) is acceptable when exploring hundreds of thousands of potential associations without any prior clue of what to expect. And that level is a shocking 0.000000000001!
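I do not know exactly how the authors arrived at their figure, but one common and simple way to get a threshold of this order is the Bonferroni correction: divide the overall alpha you are willing to tolerate by the number of tests. The sketch below assumes, purely for illustration, 300,000 genetic markers tested in every possible pair; that marker count and the function name are my own assumptions, not numbers from the paper.

```python
from math import comb


def bonferroni_alpha(n_tests, family_alpha=0.05):
    """Per-test threshold that keeps the overall chance of any
    false positive at or below family_alpha (Bonferroni correction)."""
    return family_alpha / n_tests


if __name__ == "__main__":
    n_markers = 300_000             # hypothetical number of markers screened
    n_pairs = comb(n_markers, 2)    # every possible pairwise interaction
    threshold = bonferroni_alpha(n_pairs)
    print(f"pairwise tests: {n_pairs:,}")
    print(f"per-test alpha: {threshold:.2e}")
```

Under these assumptions there are about 45 billion pairs to test, and the per-test threshold lands right around 10^-12.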
This reinforces a couple of ideas that we have been discussing of late. First, to say simply that the p-value of 0.05 was reached is to say nothing. The p-value needs to be put into the context of (a) the prior probability of association and (b) the number of tests of association performed. Second, as a corollary, the p-value threshold needs to be set according to these two parameters: the lower the prior probability and/or the higher the number of tests, the lower the p-value needs to be in order to be noteworthy. Third, if we pay attention only to the p-value, and particularly if we fail to understand how to set the threshold for significance, what we get is junk, and, as Ioannidis aptly points out, lies. Fourth, and finally, genetic data are tougher to interpret than most of us appreciate. They are possibly even less precise than the clinical data we usually discuss on this blog. And we know how uncertain those can be!
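The interplay between prior probability and the significance threshold can be sketched with Bayes' rule: among all the associations that cross the threshold, what fraction are real? The Python below is a rough illustration only. The 80% power and the one-in-100,000 prior are assumptions I picked for the example, and the function name is hypothetical.

```python
def prob_association_is_real(prior, alpha, power=0.8):
    """Probability that a 'significant' association is real, by Bayes' rule.

    prior -- pre-test probability that the association exists
    alpha -- significance threshold (false-positive rate when nothing is there)
    power -- chance of detecting a real association (assumed, not measured)
    """
    true_hits = power * prior
    false_hits = alpha * (1.0 - prior)
    return true_hits / (true_hits + false_hits)


if __name__ == "__main__":
    scenarios = [
        ("coin-flip prior, alpha 0.05", 0.5, 0.05),
        ("blind gene screen, alpha 0.05", 1e-5, 0.05),
        ("blind gene screen, alpha 1e-12", 1e-5, 1e-12),
    ]
    for label, prior, alpha in scenarios:
        p_real = prob_association_is_real(prior, alpha)
        print(f"{label}: P(real | significant) = {p_real:.6f}")
```

With a decent prior, a hit at 0.05 is probably real. With a blind-screen prior, a hit at 0.05 is almost certainly noise, and only a drastically lower threshold rescues it.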
So, as sexy as the emerging field of genetics was in the 1980s, I am pretty happy that, after four years in the lab, I decided to go first to medical and then to epidemiology schools. Dealing with conditions where we at least have some notion of the pre-test probabilities makes this quantitative window through which I see healthcare just a little less opaque. And these days, I am even happier about having bucked my classmates' Wall Street trend. But that's a story for another day.