We got a little into the weeds last week about significance testing and test characteristics. Because information is power, I realized that it may be prudent to back up a bit and do a very explicit primer on medical testing. I am hoping that this will provide some vocabulary for improved patient-clinician communication. But, alas, please do not be surprised if your practitioner looks at you as if you were an alien -- it is safe to say that most clinicians do not think in these terms in their everyday practices. So, educate them!
Let's dig a little deeper into some of the ideas we batted around last week, specifically those pertaining to testing. Let's start by explicitly establishing the purpose of a medical test: to detect disease when disease is present. This alone should underscore the importance of your physician's ability to arrive at the most likely reasons for your symptoms. The exercise that every doc should go through as he/she evaluates you is called "differential diagnosis". When I was a practicing MD, my strategy was to come up with the 3-5 most likely diagnoses and the 3-5 that would be most deadly if missed, and to explore them further with appropriate testing. Arranging these possible diagnoses into a hierarchy helps the clinician assign informal probabilities to each, a task that is central to Bayesian thinking. From this hierarchy then follows a tactical, sequential work-up, avoiding the frantic shotgun approach.
So, having established a hierarchy of diagnoses, we now engage in adjunctive testing. And here is where we really need to be aware not only of our degree of suspicion for each diagnosis, but also of the test characteristics as they are reported in the literature and as they exist in the local center where the testing actually takes place. Why do I differentiate between the literature and practice? We know very well that the mere fact of observation, not to mention the experimental cleanliness of trials, tends to exaggerate the benefits of an intervention. In other words, the real world is much messier than the laboratory of clinical research (which of course is messy enough itself). So, it is this compounded messiness that each clinician has to contend with when making testing decisions.
OK, so let us now deconstruct test characteristics even further. We have used the terms sensitivity, specificity, and positive and negative predictive values. We've even explored their meanings to an extent. But let's break them down a bit further. Epidemiologists find it helpful to construct 2-by-2 (or 2 x 2) tables to think through some of these constructs, so let's engage in that briefly. Below you see a typical 2 x 2 table.
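For readers who like to see the arithmetic, the standard layout can be sketched in a few lines of code; the counts here are invented purely for illustration, not taken from any real study:

```python
# A 2 x 2 table cross-tabulates the test result (rows) against true
# disease status (columns). The four cells have standard names:
#
#                 Disease present    Disease absent
# Test positive   true positive      false positive
# Test negative   false negative     true negative
#
# Illustrative counts only:
tp, fp = 80, 95    # top row: all positive test results
fn, tn = 20, 805   # bottom row: all negative test results

diseased = tp + fn        # left column total: everyone with the disease
healthy = fp + tn         # right column total: everyone without it
print(diseased, healthy)  # 100 900
```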
It is easiest to think of sensitivity as how well the test picks up the corresponding disease. In the case of mammography from last week, this number is 80%. This means that among 100 women who actually harbor breast cancer, a mammogram will recognize 80. These are the "true positives": disease actually present when the test is positive. What about the remaining 20? Well, those real cancers will be missed by this test, and we call them "false negatives". If you look at the 2 x 2 table, it should become obvious that the sum of the true positives and the false negatives adds up to the total number of people with the disease. Are you shocked that our wonderful tests may miss so much disease? Well, stay tuned.
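The mammography arithmetic above can be checked directly; the 80% sensitivity figure is from the discussion, and everything else follows from it:

```python
sensitivity = 0.80       # mammography's reported sensitivity (from the text)
women_with_cancer = 100  # the group who actually harbor breast cancer

# Sensitivity = TP / (TP + FN), so among those with disease:
true_positives = sensitivity * women_with_cancer       # cancers the test catches
false_negatives = women_with_cancer - true_positives   # cancers the test misses

print(true_positives, false_negatives)  # 80.0 20.0
```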
The flip side of sensitivity is "specificity". Specificity refers to how well the test rules out disease in people who do not have it; the noise in this value comes from the test in effect hallucinating disease in a person who is disease-free. A test with high specificity will be negative in the overwhelming proportion of people without the disease, so the "true negative" cell of the table will contain almost the entire disease-free group. Alas, for any test we develop, we walk a tight-rope between sensitivity and specificity. That is, depending on our priorities for testing, we have to give up some accuracy in either the sensitivity or the specificity. The more sensitive the test, the higher our confidence that we will not miss the disease when it is there. Unfortunately, what we gain in sensitivity we usually lose in specificity, thus creating higher odds of a host of false positive results. So, there really is no free lunch when it comes to testing. In fact, it is this very tension between sensitivity and specificity that is the crux of the mammography debate. Not as straightforward as we had thought, right? And this is not even getting into pre-test probabilities or positive and negative predictive values!
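To see the tight-rope concretely, imagine a test that measures some marker and calls the result "positive" above a cutoff. The marker values below are invented for illustration; lowering the cutoff buys sensitivity at the price of specificity:

```python
# Marker values for 5 diseased and 5 healthy people (invented data).
diseased_markers = [7, 8, 9, 10, 12]
healthy_markers = [2, 3, 4, 6, 8]

def sens_spec(cutoff):
    """Sensitivity and specificity if 'positive' means marker >= cutoff."""
    tp = sum(x >= cutoff for x in diseased_markers)  # diseased, flagged
    tn = sum(x < cutoff for x in healthy_markers)    # healthy, cleared
    return tp / len(diseased_markers), tn / len(healthy_markers)

print(sens_spec(9))  # strict cutoff: (0.6, 1.0) -- misses disease, no false alarms
print(sens_spec(7))  # lax cutoff:    (1.0, 0.8) -- catches everything, more false positives
```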
Well, let's get into these ideas now. I believe that the positive and negative predictive values of the test are fairly well understood at this point, no? Just to reiterate: the positive predictive value, which is the ratio of true positives to all positive test results (the latter being the sum of the true and false positives, or the sum across the top row of the 2 x 2 table), tells us how confident we can be that a positive test result corresponds to disease being present. Similarly, the negative predictive value, the ratio of true negatives to all negative test results (again, the latter being the sum across the bottom row of the 2 x 2 table, or true and false negatives), tells us how confident we can be that a negative test result really represents the absence of disease. The higher the positive and negative predictive values, the more useful the test becomes. However, when one is likely to be quite high but the other quite low, it is a pitfall of our irrationality to rush headfirst into the test in hopes of obtaining the answer with the high value (as in the case of the negative predictive value for mammography in women aged 40-50 years), since the opposite test result creates a potentially difficult conundrum. This is where the pre-test probability of disease comes in.
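In 2 x 2 terms, the two predictive values are just row-wise ratios; again, these counts are illustrative only:

```python
tp, fp = 80, 95    # top row: all positive test results
fn, tn = 20, 805   # bottom row: all negative test results

ppv = tp / (tp + fp)  # positive predictive value: true positives / all positives
npv = tn / (tn + fn)  # negative predictive value: true negatives / all negatives

print(round(ppv, 3), round(npv, 3))  # 0.457 0.976
```

With these invented counts, a negative result is quite reassuring (NPV near 98%), while a positive result is right only about 46% of the time: exactly the high/low split the paragraph warns about.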
Now, what is this pre-test probability, and how do we calculate it? Ah, this is the pivotal question. The pre-test probability is estimated from population epidemiology data. In other words, given the type of person you are (no, I do not mean nice or nasty or funny or droll) in terms of your demographics, heredity, chronic disease burden and current symptoms, what category of risk do you fit into based on these population studies of disease? This approach relies on filing you into a particular cubbyhole with other subjects whose characteristics are most similar to yours. Are you beginning to appreciate the complexity of this task? And the imprecision of it? Add to this barely functioning crystal ball the clinician's personal cognitive biases, and is it any wonder that we do not do better? And need I even overlay this with another bugaboo, the overwhelming amount of information in the face of the incredible shrinking appointment, to demonstrate just how NOT straightforward any of this medicine stuff is?
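This is exactly where Bayes' rule earns its keep: hold a test's sensitivity and specificity fixed, vary only the pre-test probability (the prevalence in your risk category), and watch the positive predictive value swing. The 80%/90% figures below are assumptions for illustration, not the characteristics of any particular real test:

```python
def ppv(prevalence, sensitivity, specificity):
    """Positive predictive value via Bayes' rule.

    Among those who test positive, what fraction truly have the disease?
    """
    true_pos = prevalence * sensitivity               # diseased AND test positive
    false_pos = (1 - prevalence) * (1 - specificity)  # healthy AND test positive
    return true_pos / (true_pos + false_pos)

# The same hypothetical test (80% sensitive, 90% specific) in two populations:
print(round(ppv(0.01, 0.80, 0.90), 3))  # low-risk group (1% prevalence): 0.075
print(round(ppv(0.20, 0.80, 0.90), 3))  # high-risk group (20% prevalence): 0.667
```

Same test, same lab, same radiologist: a positive result means a roughly 7% chance of disease in the low-risk cubbyhole and a 67% chance in the high-risk one. That is why the clinician's estimate of your pre-test probability matters so much.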
OK, get your fingers out of that Prozac bottle -- it is not all bad! Yes, these are significant barriers to good healthcare. But guess what? The mere fact that you now know these challenges and can call them by their appropriate names gives you more power to be your own steward of your healthcare. Next time a doc appears certain and recommends some sexy new test, you will know that you cannot just say OK and await further results. Your healthcare is a chess match: you and your healthcare provider need to plan 10 moves ahead and play out many different contingencies.
On the system's end, researchers, policy makers and software developers all need to do a better job of developing more useful and individualizable information, integrating it into user-friendly systems, and encouraging thoughtful healthcare encounters. I am convinced that empowering patients with this information, and then teaming up with providers to advocate in this vein, is the only thing that can assure a course correction for our mammoth, unruly, dangerous and irrational healthcare system.