
Friday, June 29, 2012

Molecular diagnostics: Making the uncertainties more certain?

Scott Hensley over at NPR's Shots blog posted a story about the recently approved molecular diagnostic test that can rapidly identify several gram-positive bacteria that cause bloodstream infections.  This is indeed important, since conventional microbiologic techniques rely on bacterial growth, which can take 2 to 3 days. This is too long to wait to identify the bug that is the cause of a serious infection. What doctors have done to date is make the best guess, based on several factors, including the type of patient, the source of the infection and the patterns of bacterial resistance at their site, to tailor empiric antibiotic coverage. The sicker the patient, the broader the coverage, until the culture results come back, when the doctor is meant to alter this treatment accordingly, either by narrowing or broadening the spectrum. The pitfalls of this workflow are obvious -- too many points where error can enter the equation. So on the surface the new tests are a great advance. And they actually are, but they are not free of problems, and we need to be very explicit in confronting them.

Each diagnostic test can be evaluated on its sensitivity (how well it identifies the problem when the problem exists), its specificity (how rarely it identifies a problem when the problem does NOT exist), its positive predictive value (what proportion of all positive tests represents a true problem) and its negative predictive value (what proportion of all negative tests represents a true absence of the problem). Sensitivity and specificity are intrinsic properties of the test and can be altered only by how the test is performed. Positive and negative predictive values depend not only on the test and how it is done, but also on the population that is getting tested.
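For those who prefer their definitions as arithmetic, here is a minimal sketch (in Python, my choice of language for illustration) of how these four quantities fall out of the four cells of a standard 2 x 2 table:

```python
# The four cells of a 2 x 2 table:
# tp = true positives, fp = false positives,
# fn = false negatives, tn = true negatives.

def sensitivity(tp, fn):
    """How often the test is positive when the problem is present."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """How often the test is negative when the problem is absent."""
    return tn / (tn + fp)

def positive_predictive_value(tp, fp):
    """What proportion of all positive tests represents a true problem."""
    return tp / (tp + fp)

def negative_predictive_value(tn, fn):
    """What proportion of all negative tests represents a true absence of the problem."""
    return tn / (tn + fn)
```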

Let's take Nanosphere's test in Scott's story. If you trawl the company's web site, you will find that the sensitivity and specificity of this technology are close to 100%, if not 100%, the "gold standard" for comparison being conventional microbiology culture. And perhaps this is really the case in the very specialized hands that were testing the diagnostic. If these characteristics remain at 100%, disregard the rest of this post, please. However, the odds that they will remain at 100% in the wild of clinical practice are slim. But I am willing to give them 99% on each of these characteristics nevertheless.

OK, so now we have a near-perfect test that is available for anyone to use. Imagine that you are an ED doc at the beginning of your shift. An ambulance pulls up and rolls a septic patient into an empty bay. The astute ED nurses rush in to settle the patient and, as a part of the protocol, take a sample of blood for determining the pathogen that is making your patient sick. You quickly start the patient on broad-spectrum antibiotics and walk away to take care of the next patient, who has just rolled in with a heart attack. A few hours later, the septic patient, who is still in the ED because there are no ICU beds for him yet, is pretty stable, and you get the lab result back: he has MRSA sepsis. You pat yourself on the back, because one of the antibiotics that you ordered was vancomycin, which should cover this bug quite adequately. You had also put him on ceftazidime to cover any potential gram-negative critters that might be lurking within as well. Now that you have the data, though, you can stop the ceftaz and just continue the vanc. The patient finally gets a bed upstairs, your shift is over, and you go home with a sense of accomplishment.

The next morning you come in refreshed with your double-venti iced macchiato in your hand, sit at the computer and check on the septic patient. You are shocked to find out that last night he decompensated, went into shock and is now requiring breathing assistance and 3 vasopressors to maintain his blood pressure. You scratch your head wondering what happened. Then you come upon this crazy blog post that tells you.

Here is what happened. What you (and these tests) did not take into account is the likelihood of MRSA being the real problem rather than just a decoy false positive. Let's run some numbers. The literature tells us that the likelihood of MRSA causing sepsis is on the order of 5%. Let's create a 2 x 2 table to figure out what this means for the value of a positive test, shall we?

              MRSA present    MRSA absent      Total
Test +             495              95           590
Test -               5           9,405         9,410
Total              500           9,500        10,000

What this says is the following. We have 10,000 patients roll into our ED with sepsis (in reality there are about 1/2 million to 1 million sepsis cases in the US annually), and we test them all with this great new test that has 99% sensitivity and 99% specificity. Of these 10,000, five hundred (thanks, Brad, for noticing this error!) are expected to have MRSA. Given this situation, we are likely to get 590 positive tests, of which 95, or 16%, will be false positives. Face-palm: you drop your head on the desk, realizing that Mr. Sepsis from yesterday was probably one of these 16 per 100 false positives, and MRSA is probably not the cause of his infection.
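(If you would rather let a computer fill in the table, here is a little Python sketch of the same back-of-the-envelope arithmetic; it illustrates the calculation above and is not any vendor's algorithm:)

```python
def two_by_two(n, prevalence, sens, spec):
    """Expected cell counts of the 2 x 2 table for n tested patients."""
    diseased = n * prevalence
    healthy = n - diseased
    tp = sens * diseased        # true positives
    fn = diseased - tp          # missed disease
    tn = spec * healthy         # true negatives
    fp = healthy - tn           # false alarms
    return tp, fp, fn, tn

tp, fp, fn, tn = two_by_two(10_000, 0.05, 0.99, 0.99)
print(tp, fp, fn, tn)                                  # 495.0 95.0 5.0 9405.0
print(f"{fp / (tp + fp):.0%} of positives are false")  # 16%
```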

You begin to wonder: what if your lab really did not achieve a sensitivity and specificity of 99%, but more like 98%? Still pretty generous, but what if? You start writing madly on a napkin that you grabbed at Starbucks, and your jaw drops when you see your 2 x 2:

              MRSA present    MRSA absent      Total
Test +             490             190           680
Test -              10           9,310         9,320
Total              500           9,500        10,000

Wow, you think, this false positive rate is now nearly 30% (190/680)! You can't believe that you could be jeopardizing your patients' lives 3 times out of 10 because you are under the mistaken impression that they have MRSA sepsis. This is unacceptable. But can you really trust yourself with these calculations? You have to do one more thing to convince yourself. What if your lab only gets 97% sensitivity and specificity? What then? You choke when you see the numbers:

              MRSA present    MRSA absent      Total
Test +             485             285           770
Test -              15           9,215         9,230
Total              500           9,500        10,000

It's an OMG moment -- nearly 40% of these patients would be treated for MRSA when they potentially have something else.

But you, my dear reader, realize that in the real world docs are not that likely to remove gram-negative coverage if MRSA shows up as the culprit pathogen. Why should you think otherwise, when there is so much evidence that people are not that great about de-escalating antimicrobial coverage in response to culture data? But then I have to ask you: what's the use of this new test if no one will act on it anyway? In other words, how is it expected to help curb the rise of resistance? In fact, given the false positive MRSA rates we see above, might there not even be a paradoxical increase in the proliferation of resistance?

The point is this: we are about to see many new molecular diagnostic technologies on the market that have really, really high sensitivity and specificity. The fly in this ointment, of course, is the pre-test probability of the bug causing the problem. Look at how, in a very low-risk group (5% MRSA prevalence), even a near-perfect test's positive predictive value is degraded almost to the point of absurdity. Do feel free to check my math.
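If you want to check it with a computer rather than a napkin, here is the same arithmetic as a short Python sketch sweeping across the three scenarios above (the numbers come straight from this post, nothing more):

```python
# Back-of-the-envelope check of the three 2 x 2 tables above.
N, PREVALENCE = 10_000, 0.05                 # 5% of septic patients have MRSA

for acc in (0.99, 0.98, 0.97):               # sensitivity = specificity
    diseased = N * PREVALENCE
    healthy = N - diseased
    tp = acc * diseased                      # true positives
    fp = (1 - acc) * healthy                 # false positives
    print(f"sens = spec = {acc:.0%}: {tp + fp:.0f} positive tests, "
          f"{fp / (tp + fp):.0%} of them false")
```

Running it gives 16%, 28% and 37% false positives, matching the tables above.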

So you trudge into the hospital the next day for your shift and check on Mr. Sepsis one more time. Sure enough, his conventional blood culture grew out E. coli, a gram-negative bug. You notice that he is turning around, though, ceftazidime having been restarted by the astute intensivist (well, I am a bit biased here, of course). All is well in the world once again. Except you hear an ambulance pull up and the nurse talking on the phone to the EMTs -- it's another sepsis on your hands. What are you going to do now?
              

If you like Healthcare, etc., please consider a donation (button in the right margin) to support development of this content. But just to be clear, it is not tax-deductible, as we do not have a non-profit status. Thank you for your support!

Sunday, December 19, 2010

How e-patients can fix our healthcare system

We got a little into the weeds last week about significance testing and test characteristics. Because information is power, I realized that it may be prudent to back up a bit and do a very explicit primer on medical testing. I am hoping that this will provide some vocabulary for improved patient-clinician communication. But, alas, please do not be surprised if your practitioner looks at you as if you were an alien -- it is safe to say that most clinicians do not think in these terms in their everyday practices. So, educate them!

Let's dig a little deeper into some of the ideas we batted around last week, specifically those pertaining to testing. Let's start by explicitly establishing the purpose of a medical test: to detect disease when such disease is present. This fact alone should underscore the importance of your physician's ability to arrive at the most likely reasons for your symptoms. The exercise that every doc should go through as he/she is evaluating you is called "differential diagnosis". When I was a practicing MD, my strategy was to come up with the 3-5 most likely and the 3-5 most deadly-if-missed potential diagnoses and explore them further with appropriate testing. Arranging these possible diagnoses as a hierarchy can help the clinician assign informal probabilities to each, a task that is central to Bayesian thinking. From this hierarchy then follows the tactical sequential work-up, avoiding the frantic shotgun approach.

So, having established a hierarchy of diagnoses, we now engage in adjunctive testing. And here is where we really need to be aware not only of our degree of suspicion for each diagnosis, but also of the test characteristics as they are reported in the literature and as they exist in the local center where the testing takes place. Why do I differentiate between the literature and practice? We know very well that the mere fact of observation, not to mention the experimental cleanliness of trials, tends to exaggerate the benefits of an intervention. In other words, the real world is much messier than the laboratory of clinical research (which of course is messy enough itself). So, it is this compounded messiness that each clinician has to contend with when making testing decisions.

OK, so let us now deconstruct test characteristics even further. We have used the terms sensitivity, specificity, and positive and negative predictive values. We've even explored their meanings to an extent. But let's break them down a bit further. Epidemiologists find it helpful to construct 2-by-2 (or 2 x 2) tables to think through some of these constructs, so let's engage in that briefly. Below you see a typical 2 x 2 table.

                 Disease present     Disease absent
Test positive    true positives      false positives
Test negative    false negatives     true negatives

In it we traditionally situate disease information in columns and test information in rows. A good test picks up the signal when the signal is there while adding minimal noise. The signal is the disease, while the noise is the imprecise nature of all tests. Even simple blood tests, whose "objective accuracy" we take for granted, are subject to these limitations.

It is easiest to think of sensitivity as how well the test picks up the corresponding disease. In the case of mammography from last week, this number is 80%. This means that among 100 women who actually harbor breast cancer a mammogram will recognize 80. This is the "true positive" value, or disease actually present when the test is positive. What about the remaining 20? Well, those real cancers will be missed by this test, and we call them a "false negative". If you look at the 2 x 2 table, it should become obvious that the sum of the true positives and the false negatives adds up to the total number of people with the disease. Are you shocked that our wonderful tests may miss so much disease? Well, stay tuned.

The flip side of sensitivity is "specificity". Specificity refers to whether the test is identifying what we think it is identifying. The noise in this value comes from the test in effect hallucinating disease when the person does not have the disease. A test with high specificity will be negative in the overwhelming proportion of people without the disease, so the "true negative" cell of the table will contain almost the entire group of people without disease. Alas, for any test we develop we walk the tightrope between sensitivity and specificity. That is, depending on our priorities for testing, we have to give up some accuracy in either the sensitivity or the specificity. The more sensitive the test, the higher our confidence that we will not miss the disease when it is there. Unfortunately, what we gain in sensitivity we usually lose in specificity, thus creating higher odds for a host of false positive results. So, there really is no free lunch when it comes to testing. In fact, it is this very tension between sensitivity and specificity that is the crux of the mammography debate. Not as straightforward as we had thought, right? And this is not even getting into pre-test probabilities or positive and negative predictive values!
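To see this tightrope in action, consider a toy example, sketched in Python with entirely made-up numbers: imagine a test that calls "positive" whenever some blood marker exceeds a cutoff, with the marker running higher in the diseased. Slide the cutoff and watch sensitivity and specificity trade places:

```python
from statistics import NormalDist

# Hypothetical marker distributions (invented purely for illustration):
healthy = NormalDist(mu=50, sigma=10)
diseased = NormalDist(mu=70, sigma=10)

for cutoff in (55, 60, 65):
    sens = 1 - diseased.cdf(cutoff)   # P(test positive | disease present)
    spec = healthy.cdf(cutoff)        # P(test negative | disease absent)
    print(f"cutoff {cutoff}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

A low cutoff misses little disease but cries wolf on the healthy; a high cutoff does the reverse.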

Well, let's get into these ideas now. I believe that the positive and negative predictive values of the test are fairly well understood at this point, no? Just to reiterate, a positive predictive value, which is the ratio of true positives to all positive test results (the latter is the sum of the true and false positives, or the sum of the values across the top row of the 2 x 2 table), tells us how confident we can be that a positive test result corresponds to disease being present. Similarly, the negative predictive value, the ratio of true negative test results to all negative test results (again, the latter being the sum across the second row of the 2 x 2 table, or true and false negatives), tells us how confident we can be that a negative test result really represents the absence of disease. The higher the positive and negative predictive values, the more useful the test becomes. However, when one is likely to be quite high but the other quite low, it is a pitfall of our irrationality to rush headfirst into the test in hopes of obtaining the answer with a high value (as in the case of the negative predictive value for mammography in women aged 40-50 years), since the opposite test result creates a potentially difficult conundrum. This is where pre-test probability of disease comes in.

Now, what is this pre-test probability and how do we calculate it? Ah, this is the pivotal question. The pre-test probability is estimated based on population epidemiology data. In other words, given the type of person you are (no, I do not mean nice or nasty or funny or droll) in terms of your demographics, heredity, chronic disease burden and current symptoms, what category of risk do you fit into based on these population studies of disease? This approach relies on filing you into a particular cubbyhole with other subjects whose characteristics are most similar to yours. Are you beginning to appreciate the complexity of this task? And the imprecision of it? Add to this barely functioning crystal ball the clinician's personal cognitive biases, and is it any wonder that we do not do better? And need I even overlay this with another bugaboo, that of the overwhelming amount of information in the face of the incredible shrinking appointment, to demonstrate to you just how NOT straightforward any of this medicine stuff is?
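To make the interplay between pre-test probability and test results concrete, here is a short Python sketch applying Bayes' theorem; the 90% sensitivity and specificity are invented for illustration and belong to no particular test:

```python
def post_test_probability(pre_test, sens, spec):
    """Probability of disease given a positive test (Bayes' theorem)."""
    true_pos = pre_test * sens               # positives with disease
    false_pos = (1 - pre_test) * (1 - spec)  # positives without disease
    return true_pos / (true_pos + false_pos)

# The same positive result means very different things in different people:
for pre in (0.01, 0.10, 0.50):
    post = post_test_probability(pre, sens=0.90, spec=0.90)
    print(f"pre-test {pre:.0%} -> post-test {post:.0%}")
# pre-test 1% -> post-test 8%; 10% -> 50%; 50% -> 90%
```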

OK, get your fingers out of that Prozac bottle -- it is not all bad! Yes, these are significant barriers to good healthcare. But guess what? The mere fact that you now know these challenges and can call them by their appropriate names gives you more power to be your own steward of your healthcare. Next time a doc appears certain and recommends some sexy new test, you will know that you cannot just say OK and await further results. Your healthcare is a chess match: you and your healthcare provider need to plan 10 moves ahead and play out many different contingencies.

On our end, researchers, policy makers and software developers all need to do better at developing more useful and individualizable information, integrating this information into user-friendly systems, and encouraging thoughtful healthcare encounters. I am convinced that empowering patients with this information, and then having them team up with providers to advocate in this vein, is the only thing that can assure a course correction for our mammoth, unruly, dangerous and irrational healthcare system.