Wednesday, December 15, 2010

Why medical testing is never a simple decision

A couple of days ago, Archives of Internal Medicine published a case report online. Now, it is rather unusual for a high-impact journal to publish even a case series, let alone a case report. Yet this was done to highlight the journal's "less is more" theme in medicine. This motif was announced by Rita Redberg many months ago, when she solicited papers to shed light on the potential harms that we perpetrate in healthcare through errors of commission.

The case in question is one of a middle-aged woman presenting to the emergency room with vague chest pain. Although from reading the paper it becomes clear that the pain is highly unlikely to represent heart disease, the doctors caring for the patient elect to do a non-invasive CT angiography test, just to "reassure" the patient, as the authors put it. Well, lo and behold, the test comes back positive, and the woman goes for an invasive cardiac catheterization where, though no disease is found, she suffers a very rare but devastating tear of one of the arteries in her heart. As you can imagine, she gets very ill, requires bypass surgery and ultimately an urgent heart transplant. Yup, from healthy to a heart transplant patient in just a few weeks. Nice, huh?

The case illustrates the pitfalls of getting a seemingly innocuous test for what appears to be a humanistic reason -- patient reassurance. Yet, look at the tsunami of harm that followed this one decision. But what is done is done. The big question is, can cases like this be prevented in the future? And if so, how? I will submit to you that Bayesian approaches to testing can and should reduce such complications. Here is how.

First, what is Bayesian thinking? Bayesian thinking, formalized mathematically through Bayes' theorem, means taking the probability that the disease is present into account when interpreting subsequent test results. What does this mean? Well, let us take the much-embattled example of mammography and put some numbers to the probabilities. Let us assume that an otherwise healthy woman between 40 and 50 years of age has a 1% chance of having breast cancer (that is 1 out of every 100 such women, or 100 out of 10,000 undergoing screening). Now, let's say that a screening mammogram picks up 80% of all cancers that are actually there (true positives), meaning that 20% go unnoticed by this technology; this 80% is the test's sensitivity. So, among the 100 women with actual breast cancer of the 10,000 women screened, 80 will be diagnosed as having cancer, while 20 will be missed. OK so far? Let's go on. Let us also assume that, in a certain fraction of screenings, mammography will flag a cancer that is not there (a false positive). Let us say that this happens in about 10% of women without cancer, which is the same as saying that the test's specificity is 90%. So, going back to the 10,000 women we are screening, of the 9,900 who do NOT have cancer (remember that only 100 can have a true cancer), 10%, or 990 individuals, will still be diagnosed as having cancer. So, tallying up all of the positive mammograms, we are now faced with 1,070 women diagnosed with breast cancer. But of course, only 80 of these women actually have cancer, so what's the deal? Well, we have arrived at the very important idea of the value of a positive test: it tells us how sure we should be that a positive test actually means that the disease is present. It is simply the ratio of the real positives (true positives, in our case the 80 women with true cancer) to all of the positives obtained with the test (in our case 1,070). This is called the positive predictive value (PPV) of a test, and in our mammography example for women between the ages of 40 and 50 it turns out to be 80/1,070, or about 7.5%.
So, what this means is that over 90% of the positive mammograms in this population (990 of 1,070, or 92.5%) will turn out to be false positives.
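For readers who like to check the arithmetic, here is a minimal Python sketch of the tally above. The variable names are my own, and the inputs are simply the illustrative numbers from the example (1% prevalence, 80% sensitivity, 10% false positives), not real mammography data.

```python
# Illustrative inputs from the mammography example (not real screening data).
screened = 10_000
prevalence = 0.01            # 1% of women aged 40-50 assumed to have cancer
sensitivity = 0.80           # share of real cancers the mammogram picks up
false_positive_rate = 0.10   # share of cancer-free women flagged anyway

# Counts of people are whole numbers, so round each tally.
with_cancer = round(screened * prevalence)                        # 100
without_cancer = screened - with_cancer                           # 9,900
true_positives = round(with_cancer * sensitivity)                 # 80
false_positives = round(without_cancer * false_positive_rate)     # 990
all_positives = true_positives + false_positives                  # 1,070

# Positive predictive value: real positives divided by all positives.
ppv = true_positives / all_positives
print(f"Positive mammograms: {all_positives:,}")
print(f"PPV: {ppv:.1%}")                                   # 7.5%
print(f"Positives that are false alarms: {1 - ppv:.1%}")   # 92.5%
```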

Now, let us look at the flip side of this equation, the value of a negative test. Of the 8,930 negative mammograms, only 20 will be false negatives (remember that in our case mammography picks up only 80 of the 100 true cancers). This means that the other 8,910 negative results are true negatives, making the value of a negative test, or negative predictive value (NPV), 8,910/8,930 = 99.8%, which is just fantastic! So, if the test is negative, we can be pretty darn sure that there is no cancer. However, if the test is positive, while cancer is present in 80 women, 990 others will undergo unnecessary further testing. And for every subsequent test a similar calculus applies, since all tests are fallible.
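The same kind of sketch, again assuming the illustrative numbers from the example, reproduces the negative side of the ledger, including the 990 women whose false-positive mammograms send them on to further testing.

```python
screened = 10_000
with_cancer = 100            # 1% prevalence, as in the example
sensitivity = 0.80
false_positive_rate = 0.10

without_cancer = screened - with_cancer                         # 9,900
true_positives = round(with_cancer * sensitivity)               # 80
false_negatives = with_cancer - true_positives                  # 20 missed cancers
false_positives = round(without_cancer * false_positive_rate)   # 990 false alarms
true_negatives = without_cancer - false_positives               # 8,910
all_negatives = true_negatives + false_negatives                # 8,930

# Negative predictive value: true negatives divided by all negatives.
npv = true_negatives / all_negatives
print(f"Negative mammograms: {all_negatives:,}")
print(f"NPV: {npv:.1%}")                                        # 99.8%
print(f"Cancer-free women sent for more tests: {false_positives:,}")
```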

Let's do one more maneuver. Let's say that now we have a population of 10,000 women who have a 10% chance of having breast cancer (as is the case with an older population). The sensitivity (80%) and specificity (90%) of mammography do not change, yet the positive and negative predictive values do. So, among these 10,000 women, 1,000 are expected to have cancer, of whom 800 will be picked up on mammography. Among the 9,000 without cancer, a mammogram will "find" a cancer in 900. So, the positive mammograms add up to 1,700, of which nearly 50% are true positives (800/1,700 = 47.1%). Interestingly, the negative predictive value does not change a whole lot: 8,100/(8,100 + 200) = 97.6%, still quite acceptably high. So, while among younger women at a lower risk for breast cancer a positive mammogram indicates the presence of disease in only about 7.5% of cases, for older women it is right nearly half the time.
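Wrapping the same tally in a small function makes it easy to see that only the pre-test probability changed between the two populations. This is a minimal sketch; the function name is illustrative, and specificity here is taken as 90%, the complement of the 10% false-positive rate assumed above.

```python
def predictive_values(prevalence, sensitivity=0.80, specificity=0.90, n=10_000):
    """Return (PPV, NPV) for a hypothetical cohort of n people."""
    diseased = round(n * prevalence)
    healthy = n - diseased
    true_pos = round(diseased * sensitivity)
    false_neg = diseased - true_pos
    true_neg = round(healthy * specificity)
    false_pos = healthy - true_neg
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


for label, prevalence in [("ages 40-50, 1% risk", 0.01), ("older, 10% risk", 0.10)]:
    ppv, npv = predictive_values(prevalence)
    print(f"{label}: PPV {ppv:.1%}, NPV {npv:.1%}")
# ages 40-50, 1% risk: PPV 7.5%, NPV 99.8%
# older, 10% risk: PPV 47.1%, NPV 97.6%
```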

These two examples illustrate how exquisitely sensitive the interpretation of any test result is to the pre-test probability that the patient has the disease. If we apply this to the woman in the Archives case report, some back-of-the-napkin calculations based on the numbers in the report suggest that, while a negative CT angiogram would indeed have been reassuring, a positive one would only create confusion, as in fact it did.
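The post does not reproduce the case report's own figures, so the sketch below makes no attempt to recompute the CT angiogram numbers. Instead, it reuses the illustrative mammography characteristics (80% sensitivity, 90% specificity) and applies Bayes' theorem directly to probabilities rather than head counts, sweeping the pre-test probability to show how strongly it drives the meaning of a result. The function name is my own.

```python
def post_test(pretest, sensitivity, specificity):
    """Bayes' theorem: probability of disease after a positive and after a negative result."""
    # Overall chance of each result, summed over people with and without disease.
    p_positive = sensitivity * pretest + (1 - specificity) * (1 - pretest)
    p_negative = (1 - sensitivity) * pretest + specificity * (1 - pretest)
    after_positive = sensitivity * pretest / p_positive
    after_negative = (1 - sensitivity) * pretest / p_negative
    return after_positive, after_negative


for pretest in (0.01, 0.05, 0.10, 0.30, 0.50, 0.90):
    pos, neg = post_test(pretest, sensitivity=0.80, specificity=0.90)
    print(f"pre-test {pretest:4.0%}: if positive {pos:6.1%}, if negative {neg:6.1%}")
```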

To be sure, if we had a perfect test, one that picked up disease 100% of the time when it was present and never mislabeled people without the disease as having it, we would not need this type of Bayesian accounting. However, to the best of my knowledge, no such test exists in today's clinical practice. Therefore, explicitly calculating what results can be expected from a particular test in a particular patient before ordering it can save a lot of headaches, and perhaps even lives. In fact, I do hope that the developers of our new electronic medical environments are giving this serious thought, as these simple algorithms should be built into all decision support systems. Bayes' theorem is an idea whose time has surely come.
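What might such a built-in check look like? Below is a hypothetical sketch, not any existing system's API: before a test is ordered, it reports the probability of disease under each possible result and flags cases where even a positive result would leave the disease more likely absent than present. The 50% action threshold is purely an illustrative assumption; the right threshold depends on the disease, the test, and the patient. The first usage line also confirms the point about a perfect test: with 100% sensitivity and specificity, the prior drops out entirely.

```python
from dataclasses import dataclass


@dataclass
class PreOrderReport:
    """Hypothetical summary a decision support system could show before ordering."""
    if_positive: float           # probability of disease if the result is positive
    if_negative: float           # probability of disease if the result is negative
    positive_is_ambiguous: bool  # True if a positive result stays below action_threshold


def pre_order_check(pretest, sensitivity, specificity, action_threshold=0.5):
    """Apply Bayes' theorem; action_threshold is an illustrative assumption."""
    p_positive = sensitivity * pretest + (1 - specificity) * (1 - pretest)
    p_negative = 1 - p_positive
    if_positive = sensitivity * pretest / p_positive
    if_negative = (1 - sensitivity) * pretest / p_negative
    return PreOrderReport(if_positive, if_negative, if_positive < action_threshold)


# A perfect test: the result settles the question whatever the prior.
print(pre_order_check(0.01, sensitivity=1.0, specificity=1.0))
# The illustrative mammography numbers in a 1%-risk woman.
print(pre_order_check(0.01, sensitivity=0.80, specificity=0.90))
```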


  1. Great post and incredibly valuable to decision makers.
    In the case mentioned, cost might have been the main decision factor that would have avoided the disaster. The persons involved seem to assume FREE as the price of the test. If the patient had also been a Consumer during her brief moment with the physician, and the physician aware of the added professional role of Steward of scarce, valuable resources, the no-scan decision might have been made even without Bayesian reasoning. Just a thought, but your posts are awesome. Thanks!

  2. Thanks, Pat. I am not sure that cost would have definitely avoided the cascade that ensued -- perhaps in some cases, but not in others. My point is that even without invoking cost, that third rail of the healthcare debate, we can make some solid clinical arguments for the "less is more" philosophy. It also argues for slow medicine, since the clinician needs to have time to actually think about the patient.
    Thanks again for your comment!

  3. Bayes is the way of the future. This sort of wariness of investigating too far should be a natural impulse of a good clinician. Antenatal ultrasounds and new higher-Tesla MRI scanners are examples of new technologies whose findings tempt us to investigate further, although we know little about their significance. Renal pelvis dilatation detected antenatally may mean some risk of vesicoureteric reflux, renal scarring and renal failure, but we have little idea how much risk (the modest knowledge we already have relates to children who have actually had a UTI, not just an abnormal ultrasound appearance). Higher-resolution scans lead to more bubs referred with findings. Similarly, neurologists tell me they're seeing more findings on MRI brains these days that are hard to provide advice about and simply would not have been seen with CT or earlier MR technology (and whatever you do, don't send these patients to a neurosurgeon; they will reach for the biopsy gun faster than you can say 'informed consent'). We can't yet analyse these sorts of cases with Bayesian methods, but that should be the aim of research: prospective series with outcomes, providing likelihood ratios we can get our teeth into. And not a p-value in sight.

    @ Dr Syn, agree entirely, but would also like to add that patients should be part of the decision-making too, not only the physician-steward. There are economic systems of health care delivery that would give patients more power (and responsibility) over spending decisions; such systems are supported by neither political side of the healthcare debate in your country, or in mine for that matter.

  4. Well, Dr. Jenner, great to see you back! Thanks for your comment. Unfortunately, the idea of studying these new technologies and their findings at the population level collides with our view of the ethical considerations. Take MRI for breast cancer detection. Do all the cancers it detects go on to cause problems? It is clear from Gil Welch's work that they do not, at least not all. But how do you design a study where you merely monitor some of the women over time once they know they have a cancer?
    Until our discussion of ethics gets back to weighing all of the risks and benefits, rather than simply straying into the emotional quicksands of "cancer", we will not get a better sense of prior probabilities.

  5. Excellent job, Marya. This is the kind of discussion I used to have with residents. In many cases I would become dismayed when, instead of a robust discussion of the relative merits of basing clinical decisions on probabilities, my residents would essentially invoke the "more is better" argument or the "well you never can be sure" argument or the "what if it were your wife" argument as reasons to pursue additional diagnostic tests or invasive procedures that are implicitly assumed to be risk-free.

    This (in my opinion) is the real problem. People do not even allow themselves to see a Bayesian argument because the default position in Medicine is that intervention trumps non-intervention.

    Even before a discussion of statistics, I always told my residents that it's better to ask "why" rather than "why not". The former seeks information. The latter seeks an excuse to act.

    With regard to Dr. Synonymous' point about cost I would say that adding cost to the mix can actually get us to worse outcomes than a strictly statistical approach.

    Quite simply, in a construct that emphasizes cost, those with means will be more likely to pursue an intervention/test/procedure, while those without will (on average) be less likely to consider the same intervention/test/procedure. This, then, introduces incentives to perform or not perform the intervention/test/procedure that are (by definition) not clinical.

    Unfortunately (or fortunately) health related expenditures do not follow a free-market consumer oriented algorithm.

  6. I think basic statistics should be a requirement for high school graduation in this country, and informed consent should involve an honest review of the stats. Would the woman have had the procedure done if she truly knew the risks down the road? What if she had a connective tissue disorder (as I do; it is found in approximately 1 in 2,000 people and is grossly under-diagnosed in women) that could make vascular complications more likely? I think the patient, once stabilized and calmed, should be offered multiple options of varying invasiveness and cost, if possible, with the stats for success/harm for each, and be equipped to make a rational decision with the doctor. As it is, he/she is often pressured by lawsuit-wary doctors to undergo invasive tests, with the benefits of subsequent (expensive, marginally useful) treatment emphasized and the harms minimized. The doctor walks with the cash, and the patient and his/her primary physician live (or not) with the consequences. Most docs mean well, but I've encountered several who did not. Please spread the idea that the stats are important as far as you can. I've stared a physician down with papers in hand more than once now, and being perceived as a nut is getting old (You don't want a mammogram in your 30s? You like physical therapy better than surgery? Crazy.). Thanks.

  7. Chuk, thanks for taking the time to comment! Agree that cost is not the winning argument here -- it is quite simply the risk-benefit equation, given the patient's profile.
    Mitzi, I could not agree more. When we talk about science literacy and numeracy, schools should be the place to assure this kind of basic preparation. Yet, we are lagging so badly behind the rest of the world. Sometimes I lose hope that we can actually empower people to be better users of the system. But I will keep on with my message! Thanks for the encouragement.

  8. It seems to me that the authors of the Archives case report have conflated likelihood of obstructive coronary artery disease (CAD) with Framingham risk score-derived risk of coronary death or nonfatal heart attack events.

    While they are correct in stating that the risk of coronary death or nonfatal heart attack over 10 years is less than 10% (thereby implying low risk), a 52-year-old obese female with high blood pressure and ATYPICAL CHEST PAIN has an INTERMEDIATE (about 30%) likelihood of having obstructive CAD. A CT angiogram would be quite reasonable in this patient, provided her coronary artery calcium (CAC) score is less than 400. No mention is made of the CAC score in the case report, but the left anterior descending coronary artery (LAD) had a complex calcified lesion that precluded evaluation of coronary stenosis. Even if the CT angiogram was accurate in diagnosing LAD stenosis, how would one have prevented a rare, but known, complication of invasive angiography? I think both the authors' and the editorialists' comments are right on target, but for the wrong reason.

    I think CT angiogram (sensitivity of 98% and specificity of 89%) is quite reasonable for someone with intermediate likelihood of CAD (pretest likelihood of 30%). The PPV of 90% and NPV of 98% of the test means that a positive test would increase the pretest probability from 30% to about 80% and a negative test would reduce the probability to under 3%. Reverend Bayes would regard such a test to have clinically relevant diagnostic yield!

  9. Dear Dr. Kaul,

    It is an honor to have you visit and comment. While I cannot dispute your assertion that her risk for CAD is 30%, you do use the best possible test characteristics (sensitivity 98%, specificity 89%) to arrive at her posterior probability. If we use the most conservative estimates mentioned in the paper, along with a prior probability of 30% for CAD, we get the PPV of 49% and NPV of 96%. I can see how for some 50% may mean the need for further testing, while for others it may mean wait and see.

    But regardless of the particulars of the case, I think you and I both agree that this is an important exercise to engage in prior to ordering the test.

    Thank you again for stopping by and commenting!

  10. Thank you for giving me the opportunity to clarify. And I apologize for the long post.

    The sensitivity and specificity values used in my calculations are based on a meta-analysis of 89 studies published recently in Annals of Internal Medicine by Schuetz et al, Ann Intern Med 2010;152:167-177. The sensitivity and specificity values apply to CT scanners with more than 16 detector rows (64-row detectors are common in clinical practice). But your point is right: the predictive accuracies depend upon the sensitivity and specificity of the diagnostic test.

    With regard to pretest likelihood of CAD, it is modulated by the typicality of symptoms, age, gender and risk factors. For example, a 55-year-old female with atypical chest pain will have a 10% likelihood of CAD without risk factors and a 47% likelihood of CAD with risk factors. The probabilities increase to 45% and 79% for a 55-year-old male with atypical symptoms. So even if the pretest likelihood is, say, 20%, the PPV is 69% and the NPV is 99% (I must clarify that my previously posted PPV and NPV values were based on a 50% prevalence of CAD).

    Another useful measure is the positive likelihood ratio (LR+) or negative likelihood ratio (LR-). The LR+ for CT angio is 8.9 and the LR- is 0.02 (both consistent with large effects). Perhaps the best metric for performance of a diagnostic test is the area under the receiver operating characteristic curve (AUC), which measures how well the test discriminates those with disease from those without. The AUC value of 0.98 for CT angio makes it a highly discriminant test.

    Hope this is useful.

    I must say that I find your posts very illuminating.
    Thank you

  11. Thanks for clarifying, Sanjay. As you can imagine, you know this literature a lot better than I do.

    As for using other test characteristics, I will at some point talk about them here as well. My sense is that PPV and NPV may be the most intuitive values for both clinicians and patients, but LR+ and LR- are certainly not that difficult to grasp. AUROC, on the other hand, may be a trickier concept for people who do not think about this all the time to wrap their heads around.

    Again, thank you for taking the time to contribute your comments!

  12. I think the default position in cardiology is intervention, not in all of medicine, please.

    Statistics are great and everything, and I'm all for a Bayesian discussion, but testing is often done because of defensive medicine, perverted incentivisation/conflicts of interest, and a few bad apples; see below:

    St Joseph Stent Lawyers: Reviewing Cases for Unneeded Stent Implants. Learn More.

  13. I recommend reading Dr. Nissen's 2008 editorial on a study of CT angiography.
    Nissen. Limitations of Computed Tomography Coronary Angiography. J Am Coll Cardiol. 2008;52:2145-2147.

  14. Here's why I performed an unnecessary medical test.
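The CT angiography figures traded back and forth in comments 8 through 11 can be checked with one more short sketch. It assumes the sensitivity (98%) and specificity (89%) quoted in comment 10 from the Schuetz et al. meta-analysis; the function names are illustrative. It shows that the PPV of 90% and NPV of 98% in comment 8 correspond to a 50% prevalence, as comment 10 clarifies, that a 30% pre-test probability rises to about 79% after a positive result, and that the odds form of Bayes' theorem (post-test odds = pre-test odds × likelihood ratio) gives the same answer as the PPV.

```python
def predictive_values(pretest, sensitivity, specificity):
    """Return (PPV, NPV) from Bayes' theorem, working with probabilities."""
    true_pos = sensitivity * pretest
    false_pos = (1 - specificity) * (1 - pretest)
    false_neg = (1 - sensitivity) * pretest
    true_neg = specificity * (1 - pretest)
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-)."""
    return sensitivity / (1 - specificity), (1 - sensitivity) / specificity


def post_test_probability(pretest, likelihood_ratio):
    """Odds form of Bayes' theorem: post-test odds = pre-test odds x LR."""
    odds = pretest / (1 - pretest) * likelihood_ratio
    return odds / (1 + odds)


SENS, SPEC = 0.98, 0.89  # CT angiography figures quoted in comment 10

for pretest in (0.20, 0.30, 0.50):
    ppv, npv = predictive_values(pretest, SENS, SPEC)
    print(f"pre-test {pretest:.0%}: PPV {ppv:.0%}, NPV {npv:.0%}")
# pre-test 20%: PPV 69%, NPV 99%
# pre-test 30%: PPV 79%, NPV 99%
# pre-test 50%: PPV 90%, NPV 98%

lr_pos, lr_neg = likelihood_ratios(SENS, SPEC)
print(f"LR+ {lr_pos:.1f}, LR- {lr_neg:.2f}")  # LR+ 8.9, LR- 0.02
print(f"After a positive at 30% pre-test: {post_test_probability(0.30, lr_pos):.0%}")  # 79%
```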