Saturday, February 18, 2012

The implications of a blood test for depression

So, a couple of days ago we spent considerable (virtual) ink on discussing the risk of a false positive result in the setting of screening for rare events. Today something else has caught my eye: a blood test for depression. The Atlantic reported this much-retweeted piece on the same day that I was droning on about lung cancer screening. So what is this about, and does it bear any similarity to what we discussed here?

Let us examine the lede of the Atlantic article:
New research shows that blood screenings can accurately spot multiple telltale biomarkers in patients with classic symptoms of depression.
The writer uses the word "screening" while talking about patients with "classic symptoms of depression." This choice of language is problematic. In clinical medicine the term "screening" generally refers to testing a population without any signs or symptoms of disease. Think of breast cancer and prostate cancer screening. When symptoms or signs are present, the testing becomes diagnostic, not screening, in its purpose. This difference is actually critical to appreciate in the context of what we talked about in the lung cancer screening post. The presence of signs that make a disease suspect presumably increases the pre-test probability of that disease. This means that the prevalence of this disease is higher in the population that has these particular signs/symptoms than in the overall population without them. And recall that it is this very pre-test probability that drives the predictive value of a positive test. Namely, the higher the pre-test probability of the disease, the more credence we can put in a positive test result.
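For those who like to see the relationship written out, here is the standard Bayes' theorem form of the positive predictive value. The symbols are my own shorthand, not anything from the article: p is the pre-test probability (prevalence), Se the sensitivity, and Sp the specificity.

```latex
\mathrm{PPV} = \frac{\mathrm{Se} \cdot p}{\mathrm{Se} \cdot p + (1 - \mathrm{Sp})\,(1 - p)}
```

With Se and Sp held fixed, a larger p makes the true-positive term in the numerator grow and the false-positive term in the denominator shrink, so the PPV rises along with the pre-test probability.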

OK, so let's move on to the data. I went to the primary source, but all I could get to was the abstract (paywall and all), so bear in mind that I do not have all of the data. I am reproducing the abstract here for your convenience:
Despite decades of intensive research, the development of a diagnostic test for major depressive disorder (MDD) had proven to be a formidable and elusive task, with all individual marker-based approaches yielding insufficient sensitivity and specificity for clinical use. In the present work, we examined the diagnostic performance of a multi-assay, serum-based test in two independent samples of patients with MDD. Serum levels of nine biomarkers (alpha1 antitrypsin, apolipoprotein CIII, brain-derived neurotrophic factor, cortisol, epidermal growth factor, myeloperoxidase, prolactin, resistin and soluble tumor necrosis factor alpha receptor type II) in peripheral blood were measured in two samples of MDD patients, and one of the non-depressed control subjects. Biomarkers measured were agreed upon a priori, and were selected on the basis of previous exploratory analyses in separate patient/control samples. Individual assay values were combined mathematically to yield an MDDScore. A ‘positive’ test, (consistent with the presence of MDD) was defined as an MDDScore of 50 or greater. For the Pilot Study, 36 MDD patients were recruited along with 43 non-depressed subjects. In this sample, the test demonstrated a sensitivity and specificity of 91.7% and 81.3%, respectively, in differentiating between the two groups. The Replication Study involved 34 MDD subjects, and yielded nearly identical sensitivity and specificity (91.1% and 81%, respectively). The results of the present study suggest that this test can differentiate MDD subjects from non-depressed controls with adequate sensitivity and specificity. Further research is needed to confirm the performance of the test across various age and ethnic groups, and in different clinical settings.
So, what did they really do? Well, let us go through the info applying the PICO framework. The population (P) is people with a major depressive disorder as diagnosed by clinical criteria. The intervention (I) is the new multi-assay serum test for 9 biomarkers associated with depression. The comparator (C) is the clinical diagnosis of MDD, and the outcome (O) is the concordance of the serum test and the clinical diagnosis. OK so far?

Bear in mind that there were actually two studies, and here is how they played out. For the first study the researchers recruited 36 patients with MDD (disease present) and 43 subjects without MDD (disease absent). Given the sensitivity of 91.7% and specificity of 81.3%, here are the results:

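                 Disease present   Disease absent   Total
Test positive          33                 8           41
Test negative           3                35           38
Total                  36                43           79

(Counts are derived from the reported sensitivity and specificity and rounded to whole subjects.)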

Based on these numbers, the positive predictive value is 80.4% and the negative predictive value is 92.1%. What does this mean? This means that in a population with a 45.6% (36/79) prevalence of MDD, about 20% of all positive tests will be false positives, that is, they will identify the disease when it is absent. Conversely, about 8% of all negative tests will be false negatives, missing the disease when it is present. And for the second study, where 34 MDD patients were involved, frankly not enough information is given in the abstract to say anything about it -- I do not have the denominator (the total pool of subjects including those with and without MDD), and therefore cannot say anything about the PPV or NPV.
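If you want to check my arithmetic, here is a minimal Python sketch of the calculation. It assumes only the figures given in the abstract for the Pilot Study (sensitivity 91.7%, specificity 81.3%, 36 MDD patients, 43 controls); the function and variable names are mine, and the cell counts are left unrounded, which is why they are not whole people.

```python
def predictive_values(sensitivity, specificity, n_diseased, n_healthy):
    """Rebuild the 2x2 table from sensitivity/specificity and group sizes,
    then derive the positive and negative predictive values."""
    true_pos = sensitivity * n_diseased    # diseased, test positive
    false_neg = n_diseased - true_pos      # diseased, test negative
    true_neg = specificity * n_healthy     # healthy, test negative
    false_pos = n_healthy - true_neg       # healthy, test positive
    return {
        "prevalence": n_diseased / (n_diseased + n_healthy),
        "ppv": true_pos / (true_pos + false_pos),
        "npv": true_neg / (true_neg + false_neg),
    }


# Pilot Study figures as reported in the abstract
pilot = predictive_values(0.917, 0.813, n_diseased=36, n_healthy=43)
for name, value in pilot.items():
    print(f"{name}: {value:.1%}")
```

Note that the same calculation cannot be run for the Replication Study, because the abstract does not report the number of controls.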

So what does all of this mean? Well, there are 3 take-home points:
1. When a test is used to diagnose rather than to screen for a disease, you are dealing with a population that has a higher pre-test probability of the disease. So, when the pre-test probability is close to 50%, even a test with suboptimal sensitivity and specificity can be fairly accurate.
2. Your test is only as good as the "gold standard" against which it is being tested. In this case we are talking about a clinical diagnosis of major depression. The assumption here is that this gold standard test is perfect already. In the absence of anything else to compare it to, it really is: 100% sensitive, 100% specific and quick. How can you improve upon that? And if this is the case, then why do we need a serum test that will give us false results a good part of the time? One argument for this is given here:
[One] of the paper’s co-authors said at the very least establishing a physiological link to depression will hopefully get patients to look at their depression as a treatable condition rather than something that’s wrong with their minds.
But I guess I am not sure that this is really a valid reason for developing a test. It is much like looking for a biological mechanism for homosexuality for the purpose of proving that it is OK to be gay. I already know that it is OK, and find its biological origins of mere intellectual curiosity with little practical consequence. Perhaps we just need to change our minds about it, that is all.
3. Finally, would this test be used to screen people for depression? In medicine there is a temptation to go after "the answer" even when the question is rather oblique. In other words, will the testing in the wild of clinical practice really be limited to those with suspected MDD, or is it likely to metastasize into others, those with milder presentations or even those whom the clinician just finds annoying? If it is the latter (and I can almost guarantee that), then we are in deep doo-doo as far as false positive rates are concerned. If you think that we have had an epidemic of depression up until now, just you wait.
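To put a rough number on the worry in point 3, here is a small Python sketch of how the positive predictive value would move if the same test were applied to groups where MDD is less common. The sensitivity and specificity come from the Pilot Study abstract; the 10% and 5% prevalence figures are purely hypothetical values I picked for illustration, not estimates of any real population.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' theorem."""
    true_pos_rate = sensitivity * prevalence
    false_pos_rate = (1 - specificity) * (1 - prevalence)
    return true_pos_rate / (true_pos_rate + false_pos_rate)


SENSITIVITY = 0.917   # from the Pilot Study abstract
SPECIFICITY = 0.813   # from the Pilot Study abstract

# 36/79 is the pilot sample; the other two prevalences are hypothetical
for label, prevalence in [
    ("pilot sample", 36 / 79),
    ("hypothetical 10%", 0.10),
    ("hypothetical 5%", 0.05),
]:
    value = ppv(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"{label}: PPV {value:.1%}, false positives {1 - value:.1%}")
```

Under these assumptions, at a 5% prevalence only about one positive result in five would be a true positive, which is the reverse of what we saw in the pilot sample.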

My final word for the day is "caution." I want to be very clear that asking scientific questions is never a bad idea, and that the answers do not always have to bring practical or applied value. I just want to inject some caution into the breathless discussion of screening for everything and our dogged search for "hard" evidence.  
