Anyhow, I am finally taking a very close look at the much touted Keystone VAP prevention study. I have written quite a bit about VAP prevention here, and my diatribes about the value proposition of "evidence" in this area are well known and tiresome to my reader by now. Yet, I must dissect the most recent installment in this fallacy-laden field, where random chance occurrences and willful reclassifications are deemed causal of dramatic performance improvements.
So, the paper. Here is the link to the abstract, and if you subscribe to the journal, you can read the whole study. But fear not, I will describe it to you in detail.
In its design it was quite similar to the central line-associated blood stream infection prevention study published in the New England Journal in 2006, and similarly the sample frame included Keystone ICUs in Michigan. Now, recall that the reason this demonstration project happened in Michigan is their astronomical healthcare-associated infection (HAI) rates. Just to digress briefly, I am sure you have all heard of MRSA; but have you heard of VRSA? VRSA stands for vancomycin-resistant Staphylococcus aureus, MRSA's even more troubling cousin, vancomycin being a drug that MRSA is susceptible to. Now, thankfully, VRSA has not yet emerged as an endemic phenomenon, but of the handful of cases of this virtually untreatable scourge that have been reported, Michigan has had a plurality. So, you get the picture: Michigan is an outlier (and not in the desirable direction) when it comes to HAIs.
Why is it important to remember Michigan's outlier status? Because of the deceptively simple yet devilishly confounding concept of regression to the mean. The idea is that in an outlier situation, at least some of the extreme value is due to random luck. Therefore, if the performance of an extreme outlier is measured twice, the second measurement will tend to be closer to the population mean by chance alone, even if nothing else has changed. But I do not want to get too deeply into this somewhat muddy concept right now -- I will reserve a longer discussion of it for another post. For now I would like to focus on some of the more tangible aspects of the study. As usual, two or three features of the study design substantially reduce the likelihood that the causal inference is correct.
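For readers who like to see regression to the mean in action, here is a minimal simulation sketch. Everything in it is hypothetical: the function name, the number of units, the "true" rate and the noise level are illustrative assumptions, not numbers from the Keystone study. Every simulated ICU has the same underlying rate, so any difference between measurements is pure noise. We pick the worst-looking units on the first measurement and simply measure them again, with no intervention at all.

```python
import random
import statistics


def simulate_regression_to_mean(n_units=1000, true_rate=5.0, noise_sd=2.0,
                                top_fraction=0.1, seed=0):
    """Return (first, second) mean observed rates for the worst-looking units.

    All values are made up for illustration. Every unit shares the same
    true rate, so outlier status on the first measurement is luck alone.
    """
    rng = random.Random(seed)

    # Two independent noisy measurements of the same underlying rate.
    first = [true_rate + rng.gauss(0, noise_sd) for _ in range(n_units)]
    second = [true_rate + rng.gauss(0, noise_sd) for _ in range(n_units)]

    # Select the units that looked worst (highest rate) the first time.
    k = max(1, int(n_units * top_fraction))
    worst = sorted(range(n_units), key=lambda i: first[i], reverse=True)[:k]

    mean_first = statistics.mean(first[i] for i in worst)
    mean_second = statistics.mean(second[i] for i in worst)
    return mean_first, mean_second


if __name__ == "__main__":
    before, after = simulate_regression_to_mean()
    print(f"worst units, first measurement:  {before:.2f}")
    print(f"same units, second measurement: {after:.2f}")
```

In this sketch the selected units "improve" on the second measurement even though nothing was done to them, which is why a before-and-after comparison in a group chosen for its extreme baseline cannot by itself establish that an intervention worked.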
The first feature is the training period. Prior to the implementation of the protocol, which by the way consisted of the famous VAP bundle that we have discussed on this blog ad nauseam, there was intensive educational training of the personnel on a "culture of change", as well as on the proper definitions of the interventions and outcomes. It is at this time that the "trained hospital infection prevention personnel" were intimately focused on the definition of VAP that they were using. And even though the protocol states that the surveillance definition of VAP would not change throughout the study period, what are the chances that this intensified education and emphasis did not alter at least some of the classification practices?
Skeptical? Good. Here is another piece of evidence supporting my stance. A study from Michael Klompas at Harvard examined inter-rater variability in the assessment of VAP, looking at the same surveillance definition applied in the Keystone study (and many others). Here is what he wrote:
"Three infection control personnel assessing 50 patients for VAP disagreed on 38% of patients and reported an almost 2-fold variation in the total number of patients with VAP. Agreement was similarly limited for component criteria of the CDC VAP definition (radiographic infiltrates, fever, abnormal leukocyte count, purulent sputum, and worsening gas exchange) as well as on final determination of whether VAP was present or absent."
And here is his conclusion:
"High interobserver variability in the determination of VAP renders meaningful comparison of VAP rates between institutions and within a single institution with multiple observers questionable. More objective measures of ventilator-associated complication rates are needed to facilitate benchmarking and quality improvement efforts."
Yet, the Keystone team writes this in their Methods section:
"Using infection preventionists minimized the potential for diagnosis bias because they are trained to conduct surveillance for VAP and other healthcare-associated infections by using standardized definitions and methods provided by the CDC in its National Healthcare Safety Network (NHSN)."
Really? Am I cynical to invoke circular reasoning here? Have I convinced you yet that VAP diagnosis is a moving target? And that as such it can be moved by cognitive biases, such as the one introduced by the pre-implementation training of study personnel?
No? OK, consider this additional piece from the Keystone study. The investigators state that "teams were instructed to submit at least 3 months of baseline VAP data." What they do not state is whether this was a retrospective collection or a prospective one, and this matters a little. Retrospective reporting in this case would be a lot more representative of what has been, since these rates of VAP are already recorded for posterity and presumably cannot be altered. On the other hand, if the reporting is prospective, I can still conceive of ways to introduce a bias into this baseline measure. Imagine, if you will, that you are employed by a hospital that is under scrutiny for a particular transgression, and that you know the hospital will look bad if you do not demonstrate improvement following a very popular and "common-sense" intervention. Might you be a tad more liberal with identifying these transgressive episodes in your baseline period than after the intervention has been instituted? This is a subtle, yet all too real conflict of interest, which, as we know so well, can introduce a substantial bias into any study. Still don't believe me? OK, come to my office after school and we will discuss. In the meantime, let's move on.
The next nugget is in the graph in Figure 1, where VAP trends over the pre-specified time periods are plotted (you can find the identical graph in this presentation on slide #20). Look at the mean, rather than the median line. (The reason I want you to look at the mean is that the median is zero, and therefore not credible. Additionally, if we want to assess the overall impact of the intervention, we need to be embracing the outliers, which the median ignores.) What is tremendously interesting to me is that there is a precipitous drop in VAP during the period called "intervention", followed by much smaller fluctuations around the new mean across the subsequent time periods. This to me points to a high probability that reclassification (and the Hawthorne effect), rather than an actual improvement in VAP rates, was the cause of the drop.
Another piece of data makes me think that it was not the bundle that "did it." Figure 2 in the paper depicts the rates of compliance with all 5 of the bundle components in the corresponding time periods. Again, here as in the VAP rates graph, the greatest jump in adherence to all 5 strategies is observed in the intervention period. However, there is still a substantial linear increase in this metric from the intervention period through to the 25-27 month period. Yet, looking back at the VAP data, no commensurate reduction is observed over that same stretch. While this is somewhat circumstantial, it makes me that much more wary of trusting this study.
So, does this study add anything to our understanding of what bundles do for VAP prevention? I would say not, and it actually muddies the waters. What would have been helpful to see is whether any of the downstream outcomes, such as antibiotic administration, time on the ventilator and length of stay, were impacted. Without impacting these outcomes, our efforts are Quixotic, merely swinging at windmills, mistaking them for a real threat.