
Friday, June 29, 2012

Molecular diagnostics: Making the uncertainties more certain?

Scott Hensley over at NPR's Shots blog posted a story about a recently approved molecular diagnostic test that can rapidly identify several gram-positive bacteria that cause bloodstream infections. This is indeed important, since conventional microbiologic techniques rely on bacterial growth, which can take 2 to 3 days. This is too long to wait to identify the bug that is the cause of a serious infection. What doctors have done to date is make the best guess, based on several factors including the type of patient, the source of the infection and the patterns of bacterial resistance at their site, to tailor empiric antibiotic coverage. The sicker the patient, the broader the coverage, until the culture results come back, at which point the doctor is meant to alter this treatment accordingly, either by narrowing or broadening the spectrum. The pitfalls of this workflow are obvious -- too many points where error can enter the equation. So on the surface the new tests are a great advance. And they actually are, but they are not free of problems, and we need to be very explicit in confronting them.

Each diagnostic test can be evaluated on its sensitivity (how well it identifies the problem when the problem exists), specificity (how rarely it identifies a problem when it does NOT exist), positive predictive value (what proportion of all positive tests represents a true problem) and negative predictive value (what proportion of all negative tests represents a true absence of the problem). Sensitivity and specificity are intrinsic properties of the test and can be altered only by how the test is performed. Positive and negative predictive values depend not only on the test and how it is done, but also on the population that is getting tested.
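
For readers who like to see the arithmetic laid out, here is a minimal Python sketch (purely illustrative, not tied to any particular test or software) that computes these four characteristics from the counts in a 2x2 table:

    # Illustrative only: test characteristics from the four cells of a 2x2 table.
    def test_characteristics(tp, fp, fn, tn):
        sensitivity = tp / (tp + fn)   # positives caught when disease is present
        specificity = tn / (tn + fp)   # negatives returned when disease is absent
        ppv = tp / (tp + fp)           # positive predictive value
        npv = tn / (tn + fn)           # negative predictive value
        return sensitivity, specificity, ppv, npv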

Let's take Nanosphere's test in Scott's story. If you trawl the company's web site, you will find that the sensitivity and specificity of this technology are close to 100%, if not 100%, the "gold standard" for comparison being conventional microbiology culture. And perhaps this really is the case in the very specialized hands that tested the diagnostic. If these characteristics remain at 100%, disregard the rest of this post, please. However, the odds that they will remain at 100% in the wild of clinical practice are slim. But I am willing to give them 99% on each of these characteristics nevertheless.

OK, so now we have a near-perfect test that is available for anyone to use. Imagine that you are an ED doc at the beginning of your shift. An ambulance pulls up and rolls a septic patient into an empty bay. The astute ED nurses rush in to settle the patient and, as part of the protocol, take a sample of blood for determining the pathogen that is making your patient sick. You quickly start the patient on broad-spectrum antibiotics and walk away to take care of the next patient, who has just rolled in with a heart attack. A few hours later, the septic patient, who is still in the ED because there are no ICU beds for him yet, is pretty stable, and you get the lab result back: he has MRSA sepsis. You pat yourself on the back because one of the antibiotics that you ordered was vancomycin, which should cover this bug quite adequately. You had also put him on ceftazidime to cover any potential gram-negative critters that may be lurking within as well. Now that you have the data, though, you can stop ceftaz and just continue vanc. The patient finally gets a bed upstairs, your shift is over, and you go home with a sense of accomplishment.

The next morning you come in refreshed with your double-venti iced macchiato in your hand, sit at the computer and check on the septic patient. You are shocked to find out that last night he decompensated, went into shock and is now requiring breathing assistance and 3 vasopressors to maintain his blood pressure. You scratch your head wondering what happened. Then you come upon this crazy blog post that tells you.

Here is what happened. What you (and these tests) did not take into account is the likelihood of MRSA being the real problem rather than just a decoy false positive. Let's run some numbers. The literature tells us that the likelihood of MRSA causing sepsis is on the order of 5%. Let's create a 2x2 table to figure out what this means for the value of a positive test, shall we?


             MRSA present   MRSA absent     Total
Test +                495            95       590
Test -                  5         9,405     9,410
Total                 500         9,500    10,000

What this says is the following. We have 10,000 patients roll into our ED with sepsis (in reality there are about 1/2 million to 1 million sepsis cases in the US annually), and we test them all with this great new test that has 99% sensitivity and 99% specificity. Of these 10,000, five hundred (thanks, Brad, for noticing this error!) are expected to have MRSA. Given this situation, we are likely to get 590 positive tests, of which 95, or 16%, will be false positives. Face-palm, you drop your head on the desk realizing that Mr. Sepsis from yesterday was probably one of these 16 per 100 false positives, and MRSA is probably not the cause of his infection.
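
If you want to reproduce these numbers yourself, here is a minimal Python sketch of the same arithmetic (the counts are simply the illustrative ones from the table above):

    # 10,000 septic patients, 5% MRSA prevalence, 99% sensitivity and specificity.
    n, prevalence = 10_000, 0.05
    sens = spec = 0.99

    with_mrsa = n * prevalence                 # 500 patients with MRSA
    without_mrsa = n - with_mrsa               # 9,500 without
    true_pos = sens * with_mrsa                # 495 true positives
    false_pos = (1 - spec) * without_mrsa      # 95 false positives
    share_false = false_pos / (true_pos + false_pos)
    print(f"{share_false:.0%} of positive tests are false")   # ~16%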

You begin to wonder: what if your lab really did not achieve the sensitivity and specificity of 99%, but more like 98%? Still pretty generous, but what if? You start writing madly on a napkin that you grabbed at Starbucks, and your jaw drops when you see your 2x2:


             MRSA present   MRSA absent     Total
Test +                490           190       680
Test -                 10         9,310     9,320
Total                 500         9,500    10,000

Wow, you think, this false positive rate is now nearly 30% (190/680)! You can't believe that you could be jeopardizing your patients' lives 3 times out of 10 because you are under the mistaken impression that they have MRSA sepsis. This is unacceptable. But can you really trust yourself with these calculations? You have to do one more thing to convince yourself. What if your lab only gets 97% specificity and sensitivity? What then? You choke when you see the numbers:




             MRSA present   MRSA absent     Total
Test +                485           285       770
Test -                 15         9,215     9,230
Total                 500         9,500    10,000


It's an OMG moment -- nearly 40% of these patients would be treated for MRSA when they potentially have something else.
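
The same napkin math can be done in a few lines of Python, sweeping sensitivity and specificity from 99% down to 97% (again, just a sketch of the calculation above, nothing more):

    # Share of positive tests that are false, at 5% MRSA prevalence.
    prevalence = 0.05
    for acc in (0.99, 0.98, 0.97):          # sensitivity = specificity = acc
        tp = acc * prevalence
        fp = (1 - acc) * (1 - prevalence)
        print(f"sens = spec = {acc:.0%}: {fp / (tp + fp):.0%} of positives are false")
    # prints roughly 16%, 28% and 37%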

But you, my dear reader, realize that in the real world docs are not that likely to remove gram-negative coverage if MRSA shows up as the culprit pathogen. Why should you think otherwise, when there is so much evidence that people are not that great about de-escalating antimicrobial coverage in response to culture data? But then I have to ask: what's the use of this new test if no one will act on it anyway? In other words, how is it expected to help curb the rise of resistance? In fact, given the false positive MRSA rates we see above, might there not even be a paradoxical increase in the proliferation of resistance?

The point is this: we are about to see many new molecular diagnostic technologies on the market with really, really high sensitivity and specificity. The fly in this ointment, of course, is the pre-test probability of the bug causing the problem. Look at how, in a very low-risk group (5% MRSA prevalence), even a near-perfect test's positive predictive value is reduced to an almost ridiculous degree. Do feel free to check my math.

So you trudge into the hospital the next day for your shift and check on Mr. Sepsis one more time. Sure enough, his conventional blood culture grew out E. coli, a gram-negative bug. You notice that he is turning around, though, ceftazidime having been restarted by the astute intensivist (well, I am a bit biased here, of course). All is well in the world once again. Except you hear an ambulance pull up and the nurse talking on the phone to the EMTs -- it's another sepsis on your hands. What are you going to do now?
              


Thursday, February 16, 2012

In medicine, beware of what seems too good to be true

Update 2/17/12:
A reader brought to my attention (thanks!) a very slight inaccuracy in the first table below, which I have corrected. I did the calculations in Excel, which, as you may know, likes to round numbers. 

File this under "misleading." Here is the story:
What's the Latest Development? 
A California start up has developed a breath test that can diagnose lung cancer with an 83 percent accuracy and distinguish between different types of the disease. The procedures which currently exist to test for lung cancer, which is the leading cause of cancer deaths worldwide, result in too many false positives, meaning unnecessary biopsies and radiation imaging. The new device works by drawing breath "through a series of filters to dry it out and remove bacteria, then [carries it] over an array of sensors."  
What's the Big Idea? 
The company is now testing a version of the machine 1,000 times more accurate than its latest model, which could increase the accuracy of diagnoses to 90 percent, the level likely needed to take the device to market. Because the machine is not specific to a particular group of chemicals, the breath tester could, in principle, test for any disease that has a metabolic breath signature, for example, tuberculosis. "A breath signature could give a snapshot of overall health," says the company's founder, Paul Rhodes. 
Am I just being a luddite by not getting, well, breathless about this? I'll just lay out my argument, and you can be the judge.

There is no doubt that lung cancer is a devastating disease, and we have not done a great job reducing its burden or the associated mortality. However, there are several issues with what is implied above, and some of the assumptions are unclear. First, what does "accuracy" mean? In the world of epidemiology it refers to how well the test identifies true positives and true negatives. If that is in fact what the story means, then 83% may not be bad; we'll regroup on that point later in this post. This brings me to my second point: what is the gold standard that the test is being measured against? In other words, what is it that has the 100% accuracy in lung cancer detection? Is it a chest X-ray, a CT scan, a biopsy, what?

The SEER database, the most rigorous source of cancer statistics in the US, classifies tissue diagnosis as the highest evidence of cancer. However, in some cases a clinical diagnosis is acceptable. The inference of cancer when no tissue is examined is possible when weighing patient risk factors and the behavior of the tumor. So, you see where I am going here? The gold standard is tissue or tumor behavior in a specific patient. Is that what this technology is being measured against? We need to know. And here is another consideration. What if the tissue provides a cancer diagnosis, but the cancer is not likely to become a problem, like in the prostate cancer story, for example?

But all of these issues are but a prelude to what is the real problem with a technology like the one described: the predictive value of a positive test. The story even alludes to this, pointing the finger at other current-day technologies and their rates of false positivity, and away from itself. Yet, in fact, this is the crux of the matter for all diagnostics. Let me show you what I mean.

The incidence of lung cancer in the US is on the order of 60 cases per 100,000 population. Now, let us give this test a huge break and say that it yields (consistently) 99% sensitivity (identifies patients with cancer when cancer is really present) and 99% specificity (identifies patients without cancer when they really do not have cancer). What will this look like numerically given the incidence above if we test 100,000 people?

             Cancer present   Cancer absent     Total
Test +                   59             999     1,058
Test -                    1          98,941    98,942
Total                    60          99,940   100,000

If we add up all the "wrong" test results, the false negative (n=1) and the false positives (n=999), we arrive at a 1% "inaccuracy" rate, or 99% accuracy. But what is hiding behind this 99% accuracy is the fact that of all those people with a positive test only a handful, a paltry 6%, actually have cancer. And what does this mean to the other 94%? Additional testing, a lot of it invasive. And what does this testing mean for the healthcare system? You connect the dots.
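
Here is the same calculation as a short Python sketch (illustrative numbers only), showing how a 99% "accurate" test still yields a positive predictive value of only about 6% at this incidence:

    # 100,000 people screened, incidence 60 per 100,000, 99% sensitivity/specificity.
    n, incidence = 100_000, 60 / 100_000
    sens = spec = 0.99

    tp = sens * incidence * n                  # ~59 true positives
    fp = (1 - spec) * (1 - incidence) * n      # ~999 false positives
    fn = (1 - sens) * incidence * n            # ~1 false negative
    accuracy = 1 - (fp + fn) / n               # ~99%
    ppv = tp / (tp + fp)                       # ~6%
    print(f"accuracy {accuracy:.0%}, positive predictive value {ppv:.0%}")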

Let's explore a slightly different scenario. Let us assume that there is a population of patients whose risk for developing lung cancer is 10 times higher than the population average. Let us say that their incidence is 600 cases per 100,000 population. Let us perform the same calculation assigning this same bionic accuracy to the test:

             Cancer present   Cancer absent     Total
Test +                  594             994     1,588
Test -                    6          98,406    98,412
Total                   600          99,400   100,000

The accuracy remains at 99%, but the value of the positive test rises to 37%. Still, 63% of all people testing positive for cancer will go on to unnecessary testing. And imagine the numbers when we try to screen millions of people, rather than just 100,000.
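
To see the effect of the higher pre-test probability directly, here is a two-incidence comparison in Python (a sketch of the same formula used above, nothing more):

    # PPV of a 99%/99% test at the two incidences used above.
    sens = spec = 0.99
    for incidence in (60 / 100_000, 600 / 100_000):
        ppv = sens * incidence / (sens * incidence + (1 - spec) * (1 - incidence))
        print(f"incidence {incidence:.2%}: PPV {ppv:.0%}")   # ~6% and ~37%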

Let us do just one final calculation. Let us reflect the data back to the test in question, where the article claims that the accuracy of the next version of the technology will be 90%. Assuming a high risk population (600 cases per 100,000 population), what does a positive result mean?

             Cancer present   Cancer absent     Total
Test +                  540           9,940    10,480
Test -                   60          89,460    89,520
Total                   600          99,400   100,000

From this table, the accuracy is indeed 90%, concealing the very low value of a positive test of 5%! This means that of the people testing positive for lung cancer with this technology, 95% will be false positives! What is most startling is that to arrive at the same mediocre 37% value for a positive test that we saw above in this population, we would need a population where cancer incidence is a whopping 6,000 per 100,000, or 6%!
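
If you want to verify that last figure, you can rearrange the positive predictive value formula and solve for the prevalence; here is a quick Python sketch of that algebra (again, just a check on the arithmetic above):

    # Prevalence needed for a 37% PPV when sensitivity = specificity = 90%.
    # PPV = sens*p / (sens*p + (1 - spec)*(1 - p)), solved for p.
    sens = spec = 0.90
    target_ppv = 0.37
    p = target_ppv * (1 - spec) / (sens * (1 - target_ppv) + target_ppv * (1 - spec))
    print(f"required prevalence: {p:.1%}")     # ~6%, i.e. about 6,000 per 100,000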

I do not want to belabor this issue any further. Screening for disease that is not yet a clinical problem is fraught with many problems, and manufacturers need to be aware of these logical pitfalls. What I have shown you here is that even when the "accuracy" of a test is exquisitely (almost impossibly) high, it is the pre-test probability of, or the patient's risk for, the disease that is the overwhelming driver of false positives. Therefore, I give you this conclusion: beware of tests that sound too good to be true -- most of the time they are.

h/t to @gingerly_onward for the story link