
Wednesday, February 15, 2012

Big changes in the world of VAP?

As you may or may not be aware, the four main professional societies in the US that include large critical care constituencies, AACN, Chest, SCCM and ATS, have created something called a Critical Care Societies Collaborative (CCSC). Its purpose is essentially to give the critical care community a voice in shaping public policy. And, as you can imagine, one of the current-day issues they are tackling is performance measures.

Now, on this web site we have spent a lot of virtual ink talking about quality metrics, particularly where ventilator-associated pneumonia, or VAP, is concerned. Well, I am happy to report that finally, a strong voice in Critical Care medicine is in agreement with what we have been saying: the VAP bundle needs to go! In fact, here is the sum of the recommendations made to the National Quality Forum in a letter dated February 9, 2012, from the CCSC leaders about VAP (emphasis mine):
The Task Force felt that the VAP “care bundle” minimizes the importance of each individual component measure and neglects the fact that many elements of the existing VAP bundle are known to have important effects outside of VAP reduction, including improved patient survival. The task force also notes that one of the components of the VAP care bundle, stress ulcer prophylaxis, may actually increase the risk of VAP [51]. Therefore, the Task Force would like to make the following recommendations regarding measure gaps related to VAP:
(1) Dissolve the VAP care bundle and instead develop a new group of quality measures related to general evidence-based practices for patients requiring mechanical ventilation (described above). These potential measure gaps would include care processes known to reduce morbidity and mortality in patients who are ventilated.
(2) Develop measures using the VAP-specific measure gaps supported by recent guidelines [52, 53]. These may include measures for the following evidence-based practices:
• Orotracheal rather than nasotracheal intubation to prevent VAP [54];
• Subglottic secretion drainage to prevent VAP [55];
• Elevating the head of bed to 45 degrees to prevent VAP [56];
• Oral antiseptic administration to prevent VAP [57];
• When empiric antibiotics are used to treat VAP, initial treatment based on qualitative endotracheal aspirates rather than quantitative bronchoscopic aspirates [58]; and
• No more than an 8-day course of antibiotics as treatment for uncomplicated VAP [59].
All of these VAP prevention strategies are supported by randomized controlled trials. However, not all have favorable cost-benefit profiles, and all have significant barriers, which may make widespread adoption unfeasible. Although we list them all here, we note that all may not be good quality measures.
So, here it is -- the recommendation. But will it be followed? When I was at SCCM, I heard a presentation that talked about some new metrics being developed by the CCSC in collaboration with the CDC, which will likely replace VAP as the focus of mechanical ventilation complications. I am in the process of learning more about these developments even as we speak, and will update my readers on what I learn. Suffice it to say, change is coming to the world of VAP. And it's about time. 

Friday, July 29, 2011

Quality measures: Process, outcome, or both?

Last week I wrote about our quality improvement, or QI, efforts in healthcare. And although there is a burgeoning field representing itself as the "science" of QI, I question much of its scientific validity. As always, VAP is my poster child for these discussions, since neither the definition of the condition itself nor its prevention efforts are subject to much scientific scrutiny. This gives VAP a surreal, ghost-like quality: now you see it, now you don't. And this alone makes it difficult to assess prevention efforts. Much as in the heated mammography debate, where passionate anecdote prevails, the sanctity of the QI rubric blunts the usual critical approach to the data.

So, the central point I made in that post was essentially to devalue the VAP eradication efforts as not grounded in solid scientific evidence. What has occurred to me, however, is that this position may in fact be at odds with a realization I blogged about here and here, wherein I agreed with Dan Ariely's suggestion that outcomes in the real world, influenced as they are by so much randomness, are not the thing to reward. It would be much more rational to reward best efforts at best results, thus rewarding the process rather than the outcome. So, here is the apparent contradiction: on the one hand I agree that outcomes may be too unpredictable, being influenced by too many factors that are not in our control, yet I am also advocating that we start measuring such outcomes as antibiotic use associated with VAP and its reduction. What gives?

Well, on the one hand, I am OK with contradiction; life is full of instances where we have to hold conflicting information and feelings together. But as a scientist it is my predisposition to analyze (which literally means splitting into smaller, more manageable chunks), so I have given this ostensible paradox more thought. What I came up with is that measuring process is the right thing to do, but only under very specific conditions. Avedis Donabedian, who is considered the father of quality science, introduced the triad of structure-process-outcome as the backbone of quality science. This relationship certainly lends validity to the "process" metrics as surrogates for "outcome." But the condition that has to be met is that there be an actual correlation between the process and the said outcome. If there is no such solid correlation, then we are simply going through the motions, doing a rain dance to cause rain.

So, what I have said about VAP prevention in particular is that we are nowhere near being able to say that the recommended processes correlate with any changes in meaningful clinical outcomes. And because the data on these interventions are so weak, throwing massive resources behind implementing them is irrational and resembles religious fervor more than scientific pragmatism.

It is entirely understandable that we would jump on this bandwagon so rapidly, given the magnitude of harm in our healthcare system combined with the need to rein in healthcare spending. But there is a more subtle point to be made here too. It relates to the fertile soil of our American psyche, where doing something is always perceived as better than thinking about our course of action, which is frequently referred to with contempt as "doing nothing." In the end, this crisis response mentality is good in a crisis, but potentially detrimental in the long term: we are unlikely to be altering meaningful outcomes, and we are spending billions of dollars on interventions lacking evidence.

So, I stand behind both of my assertions and maintain that they are not mutually exclusive. Yes, outcomes are subject to much randomness; yes, processes known to alter these outcomes are the sensible measures of our efforts to improve quality; and yes, these processes need first to be rigorously validated for their impact on the outcomes in question. Anything short of this pathway is not just a waste of our collective resources, but a manipulation of the public trust. And that is as far from the intent of science as it can get.

Tuesday, July 26, 2011

Tipping a sacred cow: QI under the microscope

So much media and journal space has been devoted to financial conflicts of interest, particularly within and related to pharma and device manufacturers, that to write any more about it may be redundant. On this site we have also intermittently addressed COI from other perspectives, such as the financial interest of the members of the American College of Radiology in maintaining the mammography screening status quo, thinly veiled in its own version of the pernicious "death panel" language. We have also spoken a bit about non-financial COI. And even though we are so very much aware of COI's potential to lurk around every corner, there are still some surprises.

Take the sacred cow of "quality improvement" in healthcare. Even the name, much like the "pro life" moniker, suggests that it is untouchable in its purity and nobility of purpose. So necessary is it because of the epic magnitude of morbidity and mortality attributed to healthcare itself, that the billions of dollars spent on it seem unquestionably justified. Indeed, much like our public education system, the QI movement garners higher and higher allocations simply due to the sheer face validity of the assumption that more of it is better. And the most fascinating aspect is that, in our current zeal for sensible economic allocation through evidence, QI, much like education, appears immune to scrutiny. This is the very definition of politics driving policy.

I return to the case of ventilator-associated pneumonia, or VAP, as the poster child for this movement. I have already alluded to the fact that VAP is definitionally slippery: its diagnosis varies based not only on the tools used to diagnose it, but also on who is doing the diagnosing. Yes, indeed, what one clinician calls VAP another may call absence of VAP. I have also dissected the weak evidence behind some of the strongest purportedly evidence-based recommendations aimed at VAP prevention. But what if VAP itself is the wrong endpoint? What if we are spending untold dollars and other resources on a futile pursuit?

Do you feel yourself bristling yet? If so, that is the normal response I get from my colleagues and from people who read my scholarly papers. Because how can anyone be against QI? Well, I am not against QI. I am simply against sanctifying it as a sacred cow and thus shielding it from sensible and rational evaluation.

So, if you are over the initial shock, allow me to explain myself. I am sure you have heard of surrogate endpoints. Here is a definition from Wikipedia:

In clinical trials, a surrogate endpoint (or marker) is a measure of effect of a certain treatment that may correlate with a real clinical endpoint but doesn't necessarily have a guaranteed relationship. The National Institutes of Health (USA) defines surrogate endpoint as "a biomarker intended to substitute for a clinical endpoint".[1][2]
Surrogate markers are used when the primary endpoint is undesired (e.g., death), or when the number of events is very small, thus making it impractical to conduct a clinical trial to gather a statistically significant number of endpoints. The FDA and other regulatory agencies will often accept evidence from clinical trials that show a direct clinical benefit to surrogate markers. [3]
This raises the question of what constitutes a "real" clinical endpoint. Well, in my simplemindedness I think of them as endpoints that matter to the patient or in the long run. So, death, disability, quality of life, functionality: these are the real endpoints. Something that alters one's life or threatens it is a real endpoint. Thus, blood pressure and cholesterol are surrogate endpoints, since they usually, but not always, correlate with the risk of a myocardial infarction or death. But what if such a correlation did not exist? Furthermore, what if a cholesterol level were measured with, say, tea leaves, and therefore were subject to a tremendous variation in detection? Would we then spend hundreds of billions of dollars on trying to alter this factor, or would we calmly and rationally walk away and look for something that truly impacts the real outcome of a heart attack or death? I think I am making my point fairly clearly.


Let me explain why I think that VAP is but a surrogate outcome, and, given its diagnostic challenges, not a sensible one in the least. VAP by definition occurs in patients on mechanical ventilation (a breathing machine), whose quality of life is fairly badly damaged in the short term. The literature would suggest that not all VAP impacts mortality adversely, but some forms of VAP indeed do, particularly VAP that develops late in the course of illness. So in this respect VAP does correlate with a real endpoint. Also, there is very little doubt that getting VAP prolongs one's dependence on mechanical ventilation and increases the duration of the stay in the ICU and the hospital overall. So, this too can be considered a real, albeit undesirable, outcome. An additional point to remember is that VAP engenders the use of additional, usually broad-spectrum, antibiotics, putting both the individual and society at risk for such unwanted consequences as the emergence of highly resistant microorganisms.


So, even though VAP is a surrogate endpoint, it certainly seems to fit the bill for something we would want to prevent. But here is the monkey wrench in this argument: what seem to be great surrogate endpoints do not always end up correlating with clinical reality. The association of VAP with morbidity and mortality has been detected mostly in retrospective observational studies. Trials of VAP prevention rarely, if ever, report any endpoint other than VAP. And, given how elusive VAP diagnosis is, there is plenty of room for the gamesmanship so pervasive in the real world to make any data fit our preconceived hypotheses and political needs.


So, what is my point? My point is that if QI wants to be a science, it needs to be subject to the same rules that guide all other science. Since we do not even know how much money we are spending on the ubiquitous QI efforts (likely hundreds of billions), and since we are not sure what they are accomplishing (see my many prior posts on the lack of validity of current claims in VAP prevention), we need to pause and ask ourselves whether the cheering alone justifies such an investment. I hate to say it, but can we really trust those with the most to lose, financially and politically, if in reality QI does little more than lather the masses, to be the oracles of truth about the results of these efforts? The cognitive biases alone should disqualify them from being the arbiters of their own success. So, if we do not want to continue to indulge the principle of diminishing returns in QI, we need to take a sober look at what we have invested and what this investment has accomplished. Then and only then can we claim to practice evidence- rather than politics- or dogma-based policy.

Saturday, April 2, 2011

Invalidated Results Watch, Ivan?

My friend Ivan Oransky runs a highly successful blog called Retraction Watch; if you have not yet discovered it, you should! In it he and his colleague Adam Marcus document (with shocking regularity) retractions of scientific papers. While most of the studies are from the bench setting, some are in the clinical arena. One of the questions they have raised is what should happen to citations of these retracted studies by other researchers. How do we deal with this proliferation of oftentimes fraudulent and occasionally simply mistaken data?

A more subtle but no less difficult conundrum arises when papers that are cited are recognized to be of poor quality, yet are still used to build the defense of one's theses. The latest case in point comes from the paper I discussed at length yesterday, describing the success of the Keystone VAP prevention initiative. And even though I am very critical of the data, I do not mean to single out these particular researchers. In fact, because I am intimately familiar with the literature in this area, I can judge what is being cited. I have seen similar transgressions from other authors, and I am sure that they are ubiquitous. But let me be specific.

In the Methods section on page 306, the investigators lay out the rationale for their approach (bundles) by stating that the "ventilator care bundle has been an effective strategy to reduce VAP..." As supporting evidence they cite references #16-19. Well, it just so happens that these are the references that yours truly had included in her systematic review of the VAP bundle studies, and the conclusions of that review are largely summarized here. I hope that you will forgive me for citing myself again:
A systematic approach to understanding this research revealed multiple shortcomings. First, since all of the papers reported positive results and none reported negative ones, there is a potential for publication bias. For example, a recent story in a non-peer-reviewed trade publication questioned the effectiveness of bundle implementation in a trauma ICU, where the VAP rate actually increased directionally from 10 cases per 1,000 MV days in the period before to 11.9 cases per 1,000 MV days in the period after implementation of the bundle (24). This was in contradistinction to the medical ICU in the same institution, which achieved a reduction from 7.8 to 2.0 cases per 1,000 MV days with the same intervention (24). Since the results did not appear in a peer-reviewed form, it is difficult to judge the quality or significance of these data; however, the report does highlight the need for further investigation, particularly focusing on groups at heightened risk for VAP, such as trauma and neurological critically ill (25).             
Second, each of the four reported studies suffers from a great potential for selection bias, which was likely present in the way VAP was diagnosed. Since all of the studies were naturalistic and none was blinded, and since all of the participants were aware of the overarching purpose of the intervention, the diagnostic accuracy of VAP may have been different before as compared to after the intervention. This concern is heightened by the fact that only one study reports employing the same team approach to VAP identification in the two periods compared (23). In other studies, although all used the CDC-NNIS VAP definition, there was either no reporting of or heterogeneity in the personnel and methods of applying these definitions. Given the likely pressure to show measurable improvement to the management, it is possible that VAP classification suffered from a bias. 
Third, although interventional in nature, naturalistic quality improvement studies can suffer from confounding much in the same way that observational epidemiologic studies do. Since none of the studies addressed issues related to case mix, seasonal variations, secular trends in VAP, and since in each of the studies adjunct measures were employed to prevent VAP, there is a strong possibility that some or all of these factors, if examined, would alter the strength of the association between the bundle intervention and VAP development. Additional components that may have played a role in the success of any intervention are the size and academic affiliation of the hospital. In a study of interventions aimed at reducing the risk of CRBSI, Pronovost et al. found that smaller institutions had a greater magnitude of success with the intervention than their larger counterparts (26). Similarly, in a study looking at an educational program to reduce the risk of VAP, investigators found that community hospital staff were less likely to complete the educational module than the staff at an academic institution; in turn, the rate of VAP was correlated with the completion of the educational program (27). Finally, although two of the studies included in this review represent data from over 20 ICUs each (20, 22), the generalizability of the findings in each remains in question. For example, the study by Unahalekhaka and colleagues was performed in the institutions in Thailand, where patient mix and the systems of care for the critically ill may differ dramatically from those in the US and other countries in the developed world (22). On the other hand, while the study by Resar and coworkers represents a cross section of institutions within the US and Canada, no descriptions are given of the particular ICUs with respect to the structure and size of their institutions, patient mix or ICU care model (e.g., open vs. closed; intensivists present vs. intensivists absent, etc.) (20). This aggregate presentation of the results gives one little room to judge what settings may benefit most and least from the described interventions. The third study includes data from only two small ICUs in two community institutions in the US (21), while the remaining study represents a single ICU in a community hospital where ICU patients are not cared for by an intensivist (23).  Since it is acknowledged that a dedicated intensivist model leads to improved ICU outcomes (28, 29), the latter study has limited usefulness to institutions that have a more rigorous ICU care model.
OK, you say, maybe the investigators did not buy into my questions about the validity of the "findings." Maybe not, but the evidence suggests otherwise. In the Discussion section on page 311 they actually say:
While the bundle has been published as an effective strategy for VAP prevention and is advocated by national organizations, there is significant concern about its internal validity.
And guess what they cite? Yup, you guessed it, the paper excerpted above. So, to me it feels like they are trying to have it both ways -- the evidence FOR implementing the bundle is the same evidence AGAINST its internal validity. Much like Bertrand Russell, I am not that great at dealing with paradoxes. Will this contradiction persist in our psyche, or will sense prevail? Perhaps Ivan and Adam need to start a new blog: Invalidated Results Watch. Oh? Did you say that peer review is supposed to be the answer to this? Right.  
    

Friday, April 1, 2011

Another swing at the windmill of VAP

Sorry, folks, but I have been so swamped with work that I have been unable to produce anything cogent here. I see today as a gift day, as my plans to travel to SHEA were foiled by mother nature's sense of humor. So, here I am trying to catch up on some reading and writing before the next big thing. To be sure, I have not been wasting time, but have completed some rather interesting analyses and ruminations, which, if I am lucky, I will be able to share with you in a few weeks.

Anyhow, I am finally taking a very close look at the much-touted Keystone VAP prevention study. I have written quite a bit about VAP prevention here, and my diatribes about the value proposition of "evidence" in this area are well known and tiresome to my readers by now. Yet, I must dissect the most recent installment in this fallacy-laden field, where random chance occurrences and willful reclassifications are deemed the cause of dramatic performance improvements.

So, the paper. Here is the link to the abstract, and if you subscribe to the journal, you can read the whole study. But fear not, I will describe it to you in detail.

In its design it was quite similar to the central line-associated blood stream infection prevention study published in the New England Journal in 2006, and similarly the sample frame included Keystone ICUs in Michigan. Now, recall that this demonstration project happened in Michigan because of its astronomical healthcare-associated infection (HAI) rates. Just to digress briefly, I am sure you have all heard of MRSA; but have you heard of VRSA? VRSA stands for vancomycin-resistant Staphylococcus aureus, MRSA's even more troubling cousin, vancomycin being a drug that MRSA is susceptible to. Now, thankfully, VRSA has not yet emerged as an endemic phenomenon, but of the handful of cases of this virtually untreatable scourge that have been reported, Michigan has had the plurality. So, you get the picture: Michigan is an outlier (and not in the desirable direction) when it comes to HAIs.

Why is it important to remember Michigan's outlier status? Because of the deceptively simple yet devilishly confounding concept of regression to the mean. The idea is that in an outlier situation, at least some of the observed extremity is due to random luck. Therefore, if the performance of an extreme outlier is measured twice, the second measurement will be closer to the population mean by pure luck alone. But I do not want to get too deeply into this somewhat muddy concept right now -- I will reserve a longer discussion of it for another post. For now I would like to focus on some of the more tangible aspects of the study. As usual, two or three features of the study design substantially reduce the likelihood that the causal inference is correct.
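To make regression to the mean concrete, here is a toy simulation of my own (entirely made-up numbers, nothing to do with the actual Keystone data): a hundred ICUs with the identical true VAP rate, measured over two periods. Pick out the apparent worst performers in period one, and they "improve" in period two with no intervention whatsoever.

```python
# Toy illustration of regression to the mean (hypothetical numbers, not Keystone data)
import numpy as np

rng = np.random.default_rng(0)
n_icus, vent_days, true_rate = 100, 1000, 5 / 1000   # every ICU has the same true rate

period1 = rng.binomial(vent_days, true_rate, n_icus) / vent_days * 1000
period2 = rng.binomial(vent_days, true_rate, n_icus) / vent_days * 1000

worst = period1 >= np.percentile(period1, 80)          # the apparent "outliers" in period 1
print(f"Apparent outliers, period 1: {period1[worst].mean():.1f} VAPs per 1,000 vent-days")
print(f"Same ICUs, period 2:         {period2[worst].mean():.1f} VAPs per 1,000 vent-days")
```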

The first feature is the training period. Prior to the implementation of the protocol (which, by the way, consisted of the famous VAP bundle we have discussed on this blog ad nauseam), there was intensive educational training of the personnel on a "culture of change," as well as on the proper definitions of the interventions and outcomes. It is at this time that the "trained hospital infection prevention personnel" were intimately focused on the definition of VAP that they were using. And even though the protocol states that the surveillance definition of VAP would not change throughout the study period, what are the chances that this intensified education and emphasis did not alter at least some classification practices?

Skeptical? Good. Here is another piece of evidence supporting my stance. A study by Michael Klompas of Harvard examined inter-rater variability in the assessment of VAP, looking at the same surveillance definition applied in the Keystone (and many other) studies. Here is what he wrote:
Three infection control personnel assessing 50 patients for VAP disagreed on 38% of patients and reported an almost 2-fold variation in the total number of patients with VAP. Agreement was similarly limited for component criteria of the CDC VAP definition (radiographic infiltrates, fever, abnormal leukocyte count, purulent sputum, and worsening gas exchange) as well as on final determination of whether VAP was present or absent.
And here is his conclusion:
High interobserver variability in the determination of VAP renders meaningful comparison of VAP rates between institutions and within a single institution with multiple observers questionable. More objective measures of ventilator-associated complication rates are needed to facilitate benchmarking and quality improvement efforts. 
Yet, the Keystone team writes this in their Methods section:
Using infection preventionists minimized the potential for diagnosis bias because they are trained to conduct surveillance for VAP and other healthcare-associated infections by using standardized definitions and methods provided by the CDC in its National Healthcare Safety Network (NHSN).
Really? Am I cynical to invoke circular reasoning here? Have I convinced you yet that VAP diagnosis is a moving target? And as such it can be moved by cognitive biases, such as the one introduced by the pre-implementation training of study personnel? No? OK, consider this additional piece from the Keystone study. The investigators state that "teams were instructed to submit at least 3 months of baseline VAP data." What they do not state is whether this was a retrospective collection or a prospective one, and this matters. First, retrospective reporting in this case would be a lot more representative of what has been, since these rates of VAP were already recorded for posterity and presumably cannot be altered. On the other hand, if the reporting was prospective, I can still conceive of ways to introduce a bias into this baseline measure. Imagine, if you will, that you are employed by a hospital that is under scrutiny for a particular transgression, and that you know the hospital will look bad if you do not demonstrate improvement following a very popular and "common-sense" intervention. Might you be a tad more liberal with identifying these transgressive episodes in your baseline period than after the intervention has been instituted? This is a subtle, yet all too real conflict of interest, which, as we know so well, can introduce a substantial bias into any study. Still don't believe me? OK, come to my office after school and we will discuss. In the meantime, let's move on.

The next nugget is in the graph in Figure 1, where VAP trends over the pre-specified time periods are plotted (you can find the identical graph in this presentation on slide #20). Look at the mean line, rather than the median. (The reason I want you to look at the mean is that the median is zero, and therefore not credible. Additionally, if we want to assess the overall impact of the intervention, we need to be embracing the outliers, which the median ignores.) What is tremendously interesting to me is that there is a precipitous drop in VAP during the period called "intervention", followed by much smaller fluctuations around the new mean across the subsequent time periods. This to me confirms the high probability of reclassification (and the Hawthorne effect), rather than an actual improvement in VAP rates, as the cause of the drop.
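For anyone puzzled by how the median can sit at zero while the mean still moves, here is a trivial, made-up illustration: once most units report zero VAPs, the median is blind to the handful of units carrying the entire burden, while the mean is not.

```python
# Made-up VAP rates (per 1,000 ventilator-days) for eight hypothetical ICUs
import statistics

rates = [0, 0, 0, 0, 0, 2.5, 6.0, 9.5]
print("median:", statistics.median(rates))           # 0.0 -- ignores the outliers entirely
print("mean:  ", round(statistics.mean(rates), 2))   # 2.25 -- carries the outliers' weight
```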

Another piece of data makes me think that it was not the bundle that "did it." Figure 2 in the paper depicts the rates of compliance with all 5 of the bundle components in the corresponding time periods. Again, here as in the VAP rates graph, the greatest jump in adherence to all 5 strategies is observed in the intervention period. However, there is still a substantial linear increase in this metric from the intervention period through the 25-27 month period. Yet, looking back at the VAP data, no commensurately robust reduction is observed. While this is somewhat circumstantial, it makes me that much more wary of trusting this study.

So, does this study add anything to our understanding of what bundles do for VAP prevention? I would say not, and it actually muddies the waters. What would have been helpful to see is whether any of the downstream outcomes, such as antibiotics administration, time on the ventilator and length of stay were impacted. Without impacting these outcomes, our efforts are Quixotic, merely swinging at windmills, mistaking them for a real threat.


       

          

Friday, February 25, 2011

Guidelines: What really constitutes level I evidence?

There has been some interesting buzz in the blogosphere about where evidence-based guideline recommendations come from, and I wanted to add a little fuel to that fire today.

As you know, I think a lot about the nature of evidence, about the "science" in clinical science, and about pneumonia, specifically ventilator-associated pneumonia or VAP. Last week I wrote here and here about a specific recommended intervention to prevent VAP consisting of semi-recumbent, as opposed to supine, positioning. This recommendation, one of 21 maneuvers aimed at modifiable risk factors for VAP, had level I evidence behind it. Given my recent deconstruction of this level I evidence, consisting of a single unblinded RCT in a single academic urban center in Spain, and given that we already know that level I data represent a very small proportion of all the evidence behind guideline recommendations, I got curious about this level I stuff. How is level I really defined? Is there a lot of room for subjective judgment? So, I went to the source.

In its HAP/VAP guideline, the ATS and IDSA committee define the levels of evidence in the following way:
Level I (high): Evidence comes from well conducted, randomized controlled trials.

Level II (moderate): Evidence comes from well designed, controlled trials without randomization (including cohort, patient series, and case-control studies). Level II studies also include any large case series in which systematic analysis of disease patterns and/or microbial etiology was conducted, as well as reports of new therapies that were not collected in a randomized fashion.

Level III (low): Evidence comes from case studies and expert opinion. In some instances therapy recommendations come from antibiotic susceptibility data without clinical observations.
So, well conducted, randomized controlled trials. But what does "well conducted" mean? Seems to me that one person's well conducted may be another person's garbage. Well, I went to the text of the document for clarification:
The grading system for our evidence-based recommendations was previously used for the updated ATS Community-acquired Pneumonia (CAP) statement, and the definitions of high-level (Level I), moderate-level (Level II), and low-level (Level III) evidence are summarized in Table 1 (8). 
OK, then. We have to go to reference #8, or the CAP guideline to get to the bottom of the definition. And here is what that document states:
Therefore, in grading the evidence supporting our recommendations, we used the following scale, similar to the approach used in the recently updated Canadian CAP statement (46): Level I evidence comes from well-conducted randomized controlled trials; Level II evidence comes from well-designed, controlled trials without randomization (including cohort, patient series, and case control studies); Level III evidence comes from case studies and expert opinion. Level II studies included any large case series in which systematic analysis of disease patterns and/or microbial etiology was conducted, as well as reports of new therapies that were not collected in a randomized fashion. In some instances therapy recommendations come from antibiotic susceptibility data, without clinical observations, and these constitute Level III recommendations.
Again, we are faced with the nebulous "well-conducted" descriptor with no further defining guidance on how to discern this quality. I resigned myself to going to the next source citation, #46 above, the Canadian CAP statement:
We applied a hierarchical evaluation of the strength of evidence modified from the Canadian Task Force on the Periodic Health Examination [4]. Well-conducted randomized, controlled trials constitute strong or level I evidence; well-designed controlled trials without randomization (including cohort and case-control studies) constitute level II or fair evidence; and expert opinion, case studies, and before-and-after studies are level III (weak) evidence. Throughout these guidelines, ratings appear as roman numerals in parentheses after each recommendation.
Another "well-conducted" construct, another reference, another wild goose chase. The reference #4 above clarified the definition for me thus:
OK, so, now we have "at least one properly randomized controlled trial." So, having gotten to the origin of this broken telephone game, it looks like proper randomization trumps all other markers for a well-done trial. The price of such neglect is giving up generalizability, confirmation, appropriate analyses, and many other important properties that need to be evaluated before stamping the intervention with a seal of approval. 

And this is just one guideline for one syndrome. The bigger point that I wanted to illustrate is that, even though we now know that only 14% of all IDSA guideline recommendations have so-called level I evidence behind them, what is dubious is the value and validity of assigning this highest level of evidence to these recommendations, given the room for subjectivity and misclassification. So, what does all of this mean? Well, for me it means no foreseeable shortage of fodder for blogging. But for our healthcare policy and our public's health? Big doo-doo.

Tuesday, February 15, 2011

The rose-colored glasses of early trial termination

The other day I did a post on semi-recumbent positioning to prevent VAP. The point I wanted to make was that an already existing quality measure for a condition that is well on its way to becoming a CMS "never event" is based on one unreplicated single-center small unblinded randomized controlled trial that was terminated early for efficacy. In my post I cited several issues with the study that question its validity. Today I want to touch upon the issue of early termination, which in and of itself is problematic.

What is early termination? It is just that: stopping the trial before enrolling the pre-planned number of subjects. First, it is important to be explicit in the planning phases about how many subjects will need to be enrolled. This is known as the power calculation and is based on the anticipated effect size and the uncertainty in this effect. Termination can happen for efficacy (the intervention works so splendidly that it becomes unethical not to offer it to everyone), safety (the intervention is so dangerous that it becomes unethical to offer it to anyone) or for other reasons (e.g., the recruitment is taking too long, etc.).

Who makes the decision to terminate early, and how is the decision made? Well, under the best of circumstances there is a Data Safety Monitoring Board (DSMB), a body that is specifically in place to look at the data at certain points in the recruitment process and look for certain pre-specified differences between groups. This DSMB is fire-walled from both the investigators and the patients. The interim looks at the data should also be pre-specified by the protocol, since the number of these looks actually influences the initial power calculation: the more you look, the more differences you are likely to find by chance alone.
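As a rough illustration of why the number of looks matters, here is a small simulation of my own (not the trial's actual monitoring plan): two arms drawn from the same distribution, so there is no true effect, yet four uncorrected looks at p < 0.05, stopping at the first "significant" one, reject the null far more often than the nominal 5%.

```python
# Hypothetical illustration: uncorrected interim looks inflate the type I error
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, looks = 2000, (50, 100, 150, 200)   # interim analyses at these per-arm sizes

false_positives = 0
for _ in range(n_sims):
    a = rng.normal(size=200)                # control arm: no true effect
    b = rng.normal(size=200)                # "intervention" arm: no true effect
    if any(stats.ttest_ind(a[:n], b[:n]).pvalue < 0.05 for n in looks):
        false_positives += 1

print(f"False-positive rate with 4 uncorrected looks: {false_positives / n_sims:.1%}")
# Typically lands around 11-13%, not 5%
```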

So, without going into too much detail on these interim looks, understand that they are not to be taken lightly, and their conditions and reporting require full transparency. To their credit, the semi-recumbent position investigators reported their plan for one interim analysis upon reaching 50% enrollment. Neither the Methods section nor the Acknowledgements, however, specifies who the analyst and the decision-maker were. Most likely it was the investigators themselves who ended up taking the look and deciding on the subsequent course of action. And this in itself is not all that methodologically clean.

Now, let's talk about one problem with early termination. This gargantuan effort led by the team from McMaster in Canada and published last year in JAMA sheds the needed light on what had been suspected before: early termination leads to inflated effect estimates. The sheer massiveness of the work done is mind-boggling -- over 2,500 studies were reviewed! The investigators elegantly paired meta-analyses of truncated RCTs with meta-analyses of matched but nontruncated ones, and compared the magnitude of the inter-group differences between the two categories of RCTs. Here is one interesting tidbit (particularly for my friend @ivanoransky):
Compared with matching nontruncated RCTs, truncated RCTs were more likely to be published in high-impact journals (30% vs 68%, P<.001).
But here is what should really grab the reader:

Of 63 comparisons, the ratio of RRs was equal to or less than 1.0 in 55 (87%); the weighted average ratio of RRs was 0.71 (95% CI, 0.65-0.77; P<.001) (Figure 2). In 39 of 63 comparisons (62%), the pooled estimates for nontruncated RCTs were not statistically significant. Comparison of the truncated RCTs with all RCTs (including the truncated RCTs) demonstrated a weighted average ratio of RRs of 0.85; in 16 of 63 comparisons (25%), the pooled estimate failed to demonstrate a significant effect. [Emphasis mine]
The authors went on to conclude the following:

In this empirical study including 91 truncated RCTs and 424 matching nontruncated RCTs addressing 63 questions, we found that truncated RCTs provide biased estimates of effects on the outcome that precipitated early stopping. On average, the ratio of RRs in the truncated RCTs and matching nontruncated RCTs was 0.71. This implies that, for instance, if the RR from the nontruncated RCTs was 0.8 (a 20% relative risk reduction), the RR from the truncated RCTs would be on average approximately 0.57 (a 43% relative risk reduction, more than double the estimate of benefit). Nontruncated RCTs with no evidence of benefit—ie, with an RR of 1.0—would on average be associated with a 29% relative risk reduction in truncated RCTs addressing the same question.

So, what does this mean? It means that truncated RCTs do indeed tend to inflate the effect size substantially and to show differences by chance alone where none exist.

This is concerning in general, and specifically for our example of the semi-recumbent positioning study. Let us do some calculations to see just how this effect inflation would play out in the said study. Recall that microbiologically confirmed pneumonia occurred in 2 of 39 (5%) semi-recumbent cases and in 11 of 47 (23%) supine cases. The investigators calculated the adjusted odds ratio of VAP in the supine compared to the semi-recumbent group to be 6.8 (95% CI 1.7 - 26.7). This, as I mentioned before, is an inflated estimate, as odds ratios tend to be when events are frequent. Furthermore, I obviously cannot redo the adjusted calculation, as I would need the primary patient data for that. What we need anyway is the relative reduction in VAP due to the intervention being investigated, which runs in the direction opposite to the odds ratio above. So, I can derive the unadjusted relative risk thusly: (2/39)/(11/47) = 0.22. Now, if truncation alone shrinks the observed relative risk by 29% (the ratio of RRs of 0.71 cited above), then had the trial been allowed to go to completion, this relative risk would have been ~0.3. In this range, the difference does not seem all that impressive. But as all of the threats to validity we discussed in the original post begin to chisel mercilessly away at this risk reduction, the 29% inflation becomes a proportionally bigger deal.
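For the arithmetically inclined, here is the back-of-the-envelope version of the calculation above, unadjusted and using the raw published counts; the 0.71 comes from the JAMA analysis discussed earlier.

```python
# Unadjusted relative risk from the raw counts, deflated by the average
# truncated-vs-nontruncated ratio of RRs reported in the JAMA meta-analysis
rr_truncated = (2 / 39) / (11 / 47)      # ~0.22: semi-recumbent vs supine
ratio_of_rrs = 0.71                      # average ratio of RRs, truncated / nontruncated
rr_if_completed = rr_truncated / ratio_of_rrs

print(f"Observed RR in the stopped-early trial:      {rr_truncated:.2f}")
print(f"Expected RR had the trial run to completion: {rr_if_completed:.2f}")   # ~0.31
```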

Well, that does it.  

Friday, February 11, 2011

CMS never events: Evidence of smoke in mirrors?

Let me tell you a fascinating story. In 1999, I was still fresh out of my Pulmonary and Critical Care Fellowship, struggling for breath in the vortex of private practice, when a cute little paper appeared in the Lancet from a great group of researchers in Spain, describing a study performed in one large academic urban medical center's two ICUs: one respiratory and one medical. Its modest aim was to see if semi-recumbent (partly sitting up), as compared to supine (lying flat on the back), positioning could reduce the incidence of that bane of the ICU, ventilator-associated pneumonia (VAP). The study was a well-done randomized controlled trial, and the investigators even went so far as to calculate the power (the number needed to enroll in order to detect a pre-determined magnitude of effect [in this case an ambitious 50% reduction in clinically suspected VAP]), and this number was 182, based on the assumption of a 40% VAP prevalence in the control (supine) group. The primary endpoint was clinically suspected VAP, based on the CDC criteria, expressed both as prevalence (the percentage of all mechanically ventilated [MV] patients developing it) and as incidence density (the number of cases among all MV patients spread over the cumulative days of MV [patient-days of MV]), while microbiologically confirmed VAP (also rigorously defined) served as the secondary endpoint.
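As an aside, the reported sample size of 182 can be roughly reconstructed. The sketch below is my own back-calculation, and everything beyond the 40% control-group risk and the 50% relative reduction (a two-sided alpha of 0.05, 80% power, and Fleiss's continuity correction) is my assumption, not something stated in the paper.

```python
# Back-of-the-envelope reconstruction of the sample size (assumptions: two-sided
# alpha = 0.05, power = 0.80, Fleiss continuity correction for two proportions)
from math import sqrt
from scipy.stats import norm

p1, p2 = 0.40, 0.20                      # control VAP risk and a 50% relative reduction
alpha, power = 0.05, 0.80
z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
p_bar = (p1 + p2) / 2

n = (z_a * sqrt(2 * p_bar * (1 - p_bar)) +
     z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / (p1 - p2) ** 2
n_fleiss = n / 4 * (1 + sqrt(1 + 4 / (n * abs(p1 - p2)))) ** 2

print(f"Per group: {n_fleiss:.0f}; total: {2 * n_fleiss:.0f}")   # roughly 91 and 182
```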

Here is what they found. The study was stopped early due to efficacy (this means that the intervention was so superior to the control in reaching the endpoint that it was deemed unethical after the interim look to continue the study), enrolling only 86 patients, 39 in the intervention and 47 in the control groups. And here are the results for the primary and secondary outcomes:

So, this is great! No matter how you slice it, VAP is reduced substantially; there is a microbiologically confirmed prevalence reduction of nearly 6-fold (this is unadjusted for potential differences between groups; and there were differences!). Well, you know what's coming next. That's right, the "not so fast" warning. Let's examine the numbers in context.

First of all, if we look at the evidence-based guideline on HCAP, HAP and VAP from the ATS and IDSA, the prevalence of VAP is generally between 5 and 15%; in the current study the control group exceeds 20%. As for the incidence density, the CDC has for years been keeping and reporting these numbers in the US, and the rate in patients comparable to the ones in the study should be around 2-4 cases per 1,000 MV days. In this study, no matter how you slice it, clinically or microbiologically, the incidence density is exceedingly high, more in line with some of the ex-US numbers reported in other studies. So, they started high and ended high, albeit with a substantial reduction.

Second of all, there is a wonderful flow chart in the paper that shows the enrollment algorithm. One small detail has always been somewhat obscure to me: the 4 patients in the semi-recumbent group that were excluded from analysis due to reintubation (this means that they were taken off MV, but had to go back on it within a day or two), which was deemed a protocol violation. Now, you might think that 4 patients is a pretty small number to worry about. But look at the total number of patients in the group: 39. If the excluded 4 all had microbiologically confirmed VAP, that would bring our prevalence from 5% to 14% (6 out of 43). This would certainly be a less than 6-fold reduction in VAP.
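Here is that worst-case arithmetic spelled out, assuming (purely for illustration) that all 4 excluded, reintubated patients would have gone on to develop microbiologically confirmed VAP.

```python
# Worst-case sensitivity check on the 4 reintubated patients excluded from analysis
semi_events, semi_n = 2, 39                       # semi-recumbent group as analyzed
reported = semi_events / semi_n
worst_case = (semi_events + 4) / (semi_n + 4)

print(f"Reported semi-recumbent VAP prevalence:    {reported:.0%}")    # ~5%
print(f"Worst case with the 4 exclusions added in: {worst_case:.0%}")  # ~14%
```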

Thirdly, and this I think is critical, the study was not blinded. In other words, the people who took care of the patients knew the group assignment. So what, you ask. Well remember that VAP is a pretty difficult, elusive and unclear diagnosis. So, let us pretend that I am a doc who is also an investigator on the study, and I am really invested in showing how marvelous semi-recumbent positioning is for VAP prevention. I am likely to have a much lower threshold for suspecting and then diagnosing VAP in the comparator group than in my pet intervention group. And this is not an indictment of anyone's judgment or integrity; it is just how our brains are wired.

Next, there were indeed important differences between the groups in their baseline risk factors for VAP. For example, more patients in the control (38%) than in the intervention (26%) group were on MV for a week or longer, the single most important risk factor for developing VAP. Likewise, the baseline severity of illness was higher in the control than in the intervention group. To be sure, the authors did statistical analyses to adjust these differences away, and still found the adjusted odds ratio of VAP in the supine group to be 6.8, with the 95% confidence interval between 1.7 and 26.7. This is generally taken to mean that, on average, the risk of VAP increases nearly 7-fold for the supine position as opposed to the semi-recumbent one, and that if the trial were repeated 100 times, 95 of those times this estimate would fall between a 1.7- and a 26.7-fold increase. OK, so we can accept this as a possible viable strategy, right?

But wait, there is more. Remember what we said about the odds ratio? When the event happens in more than 10% of the sample, the odds ratio vastly overestimates the relative risk of that event. 28.4%, anyone?
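To see the mechanics, here is the unadjusted arithmetic from the published counts (my own calculation; the paper's 6.8 is an adjusted figure). Once events are common, the odds ratio runs ahead of the corresponding relative risk.

```python
# Unadjusted odds ratio vs relative risk from the trial's raw 2x2 counts
semi_vap, semi_n = 2, 39                 # semi-recumbent group
sup_vap, sup_n = 11, 47                  # supine group

relative_risk = (sup_vap / sup_n) / (semi_vap / semi_n)
odds_ratio = (sup_vap / (sup_n - sup_vap)) / (semi_vap / (semi_n - semi_vap))

print(f"Unadjusted RR (supine vs semi-recumbent): {relative_risk:.1f}")   # ~4.6
print(f"Unadjusted OR (supine vs semi-recumbent): {odds_ratio:.1f}")      # ~5.7
```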

Now, let's put it all together. A single center study from a Spanish academic hospital, among respiratory and medical ICU patients, with a minuscule sample size, yet halted early for efficacy, an exceedingly high baseline rate of VAP, a substantial number of patients excluded for a nebulous reason, unblinded and therefore prone to biased diagnosis, reporting an inflated reduction in VAP development in the intervention group. It would be very easy to write this off as a flawed study (like all studies tend to be in one way or another) in need of confirmatory evidence, if it were not so critical in the current punitive environment of quality improvement. (By the way, to the best of my knowledge, there is no study that replicates these results). The ATS/IDSA guideline includes semi-recumbent positioning as a level I (highest possible level of evidence) recommendation for VAP prevention, and it is one of the elements of the MV bundle, as promoted by the Institute for Healthcare Improvement, which demands 95% compliance with all 5 elements of the bundle in order to get the "compliant" designation. And even this is not the crux of the matter. The diabolical detail here is that CMS is creeping up on making VAP into one of their magical "never" events, and the efforts by hospitals will most assuredly be including this intervention. So, ICU nurses are already expected to fall in step with this deceptively simple yet not-so-easily executable practice.

And this is what is under the hood of just one simple level I recommendation by two reputable professional organizations in their evidence-based guidelines. One shudders to think...              

Monday, December 6, 2010

"Invisibility, inertia and income" and patient safety

Hat tip to @KentBottles for a link to this story

I spend a lot of time thinking about the quality and safety of our healthcare system, as well as our efforts to improve it. I have written a lot about it here in this blog and in some of my peer-reviewed publications. You, my reader, have surely sensed my frustration with the fact that we have been unable to put any kind of a dent in the killing that goes on within our hospitals and other healthcare encounter locations. So, it is always with much interest and appreciation that I learn that I am not alone, and that others have had it with the criminal lack of the sense of urgency to stop this medical holocaust. For this reason, I was really happy to read Michael Millenson's post on the Health Affairs Blog titled "Why We Still Kill Patients: Invisibility, Inertia and Income". I was very curious to see how he structured his argument to boil it down to these three I's, since I think that sexy slogans and memorable triplets are the way to go. So, here is how his arguments went.

First, establish the problem. And indeed, we have been killing around 100,000 people annually since the late 1970s (and probably since before then, as you actually have to look in order to find these deaths), which amounts to a total 20-year toll of 2.5 million unnecessary deaths due to healthcare in the US. This is truly appalling. And this is just up through the 1999 IOM report! Here is what I was thinking: if we take into account not just the killing fields of the hospital, but all of life's interfaces with healthcare, we arrive at an even more frightening 400,000 deaths annually, as known back in 2000. Multiply this by 10, and now we really are talking about a killing machine of holocaust proportions! And I completely agree with Millenson that the fact that we continue to say "more research needed" and other pablum like that is utterly and completely irresponsible. However, is this really an invisible problem? The author makes a good argument for how we minimize these numbers by failing to add them up:

I laid out those numbers in a March, 2003 Health Affairs article that challenged the profession to break a silence of deed — failing to take corrective actions — and a silence of word — failing to discuss openly the consequences of that failure. This pervasive silence, I wrote:
continually distorts the public policy debate [and] gives individuals and institutions that must undergo difficult changes a license to postpone them. Most seriously of all, it allows tens of thousands of preventable patient deaths and injuries to continue to accumulate while the industry only gradually starts to fix a problem that is both long-standing and urgent.
Nearly eight years later, medical professionals now talk freely about the existence of error and loudly about the need for combating it, but silence about the extent of professional inaction and its causes remains the norm. You can see it in this latest study, which decries the continuing “patient-safety epidemic” while failing to do next what any public health professional would instinctually do: tally up the toll. Instead, we get dry language about the IOM’s goal of a 50 percent error reduction over five years not being met.
Let’s fill in the blanks: If this unchecked “epidemic” were influenza and not iatrogenesis, then from 1999 to date it would have killed the equivalent of every man, woman and child in the cities of Raleigh (this study took place in North Carolina) and Washington, D.C. Does a disaster of that magnitude really suggest that “further study” and a “refocusing of resources” are what’s needed?
I guess this makes sense -- adding up the numbers is pretty startling, yet we are reluctant to do so. At the same time I hesitate to call this "invisible", since as you saw in a paragraph above, I just multiplied by 10! Yet I am willing to concede the first "I" to Millenson, since I do see the power in these startling numbers.


On to the next "I", inertia. I agree with Millenson generally, and we actually know this, that physicians do not reliably practice evidence-based medicine, and that, even when evidence exists, it takes decades to penetrate practice. And there is every reason to be upset that the medical profession has not rushed to adopt the evidence-based prevention measures that Millenson talks about. But there is a greater subtlety here than meets the eye. True, the Keystone project is frequently held up as an example of a simple evidence-based bundled intervention resulting in a huge reduction in central line-associated blood stream infections. Indeed, this is a great success and everyone should be practicing the checklist instituted in the project by Peter Pronovost's group. What is less obvious and even less talked about is that the same evidence-based bundled approach to prevention of ventilator-associated pneumonia (VAP) has also been piloted by the Keystone group, yet none of us has seen any data from that. All I have is rumors at this point, but they are not good. Why is this? Well, I have discussed this before here and here: VAP is a very tricky diagnosis in a very tricky population. This is not to say that we need not work as hard as we can to prevent it. It is just to clarify that we are not sure of the best ways to accomplish this. Is this in and of itself shameful? Well, yes, if you think that medicine is a precise science. But if you have been reading my blog long enough, you know this is not the case.


Millenson further cites his reading of the Joint Commission Journal, which has been documenting the progress within one large Catholic healthcare system, Ascension, in its efforts to reduce infections, falls and other common iatrogenic harms. By the system's account, they are now able to save over 2,000 lives annually with these measures. This is impressive. But is it trustworthy? Unfortunately, without reading the primary studies I cannot comment on the latter. However, I did publish a review of studies from this very journal on VAP prevention efforts, and here is what I found:
A systematic approach to understanding this research revealed multiple shortcomings. First, since all of the papers reported positive results and none reported negative ones, there is a potential for publication bias. For example, a recent story in a non-peer-reviewed trade publication questioned the effectiveness of bundle implementation in a trauma ICU, where the VAP rate actually increased directionally from 10 cases per 1,000 MV days in the period before to 11.9 cases per 1,000 MV days in the period after implementation of the bundle (24). This was in contradistinction to the medical ICU in the same institution, which achieved a reduction from 7.8 to 2.0 cases per 1,000 MV days with the same intervention (24). Since the results did not appear in a peer-reviewed form, it is difficult to judge the quality or significance of these data; however, the report does highlight the need for further investigation, particularly focusing on groups at heightened risk for VAP, such as trauma and neurological critically ill (25).             
Second, each of the four reported studies suffers from a great potential for selection bias, which was likely present in the way VAP was diagnosed. Since all of the studies were naturalistic and none was blinded, and since all of the participants were aware of the overarching purpose of the intervention, the diagnostic accuracy of VAP may have been different before as compared to after the intervention. This concern is heightened by the fact that only one study reports employing the same team approach to VAP identification in the two periods compared (23). In other studies, although all used the CDC-NNIS VAP definition, there was either no reporting of or heterogeneity in the personnel and methods of applying these definitions. Given the likely pressure to show measurable improvement to the management, it is possible that VAP classification suffered from a bias.
Third, although interventional in nature, naturalistic quality improvement studies can suffer from confounding much in the same way that observational epidemiologic studies do. Since none of the studies addressed issues related to case mix, seasonal variations, secular trends in VAP, and since in each of the studies adjunct measures were employed to prevent VAP, there is a strong possibility that some or all of these factors, if examined, would alter the strength of the association between the bundle intervention and VAP development. Additional components that may have played a role in the success of any intervention are the size and academic affiliation of the hospital. In a study of interventions aimed at reducing the risk of CRBSI, Pronovost et al. found that smaller institutions had a greater magnitude of success with the intervention than their larger counterparts (26). Similarly, in a study looking at an educational program to reduce the risk of VAP, investigators found that community hospital staff were less likely to complete the educational module than the staff at an academic institution; in turn, the rate of VAP was correlated with the completion of the educational program (27). Finally, although two of the studies included in this review represent data from over 20 ICUs each (20, 22), the generalizability of the findings in each remains in question. For example, the study by Unahalekhaka and colleagues was performed in the institutions in Thailand, where patient mix and the systems of care for the critically ill may differ dramatically from those in the US and other countries in the developed world (22). On the other hand, while the study by Resar and coworkers represents a cross section of institutions within the US and Canada, no descriptions are given of the particular ICUs with respect to the structure and size of their institutions, patient mix or ICU care model (e.g., open vs. closed; intensivists present vs. intensivists absent, etc.) (20). This aggregate presentation of the results gives one little room to judge what settings may benefit most and least from the described interventions. The third study includes data from only two small ICUs in two community institutions in the US (21), while the remaining study represents a single ICU in a community hospital where ICU patients are not cared for by an intensivist (23).  Since it is acknowledged that a dedicated intensivist model leads to improved ICU outcomes (28, 29), the latter study has limited usefulness to institutions that have a more rigorous ICU care model.           
So, not to toot my own horn here, and not expecting you to read the long-winded Discussion, suffice it to say that we found enough methodologic errors in this body of research from the Joint Commission's own journal to potentially invalidate nearly all of the reported findings. My point, again, is that unless you have read each study with a critical eye and put it into the larger context, you should not believe someone else's cursory reference to staggering improvements. I guess the point pertinent to our discussion is that inertia, while present, is a more nuanced issue than we are led to believe.
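One small technical aside before moving on: the "cases per 1,000 MV days" figures quoted in that Discussion are simple device-day rates. Here is a minimal sketch of the arithmetic, using made-up case counts and ventilator-day denominators (the trade-publication report did not give the raw numbers):

```python
def vap_rate_per_1000_mv_days(vap_cases: int, ventilator_days: int) -> float:
    """Device-associated infection rate: VAP cases per 1,000 mechanical ventilation (MV) days."""
    return 1000.0 * vap_cases / ventilator_days

# Hypothetical counts, chosen only to land near the trauma-ICU rates quoted above;
# the actual numerators and denominators were not reported.
print(vap_rate_per_1000_mv_days(vap_cases=20, ventilator_days=2000))  # 10.0
print(vap_rate_per_1000_mv_days(vap_cases=25, ventilator_days=2100))  # ~11.9
```

Note that the denominator matters as much as the numerator: a unit's rate can drift simply because ventilator days accrue differently between periods, which is one more reason to read these before-and-after comparisons cautiously.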


And finally, income. I do agree that it is annoying that economic arguments are even necessary to promote a culture of prevention and safety. What I disagree with is the notion that these economic fallacies of the C-suite in any way impact the implementation of the needed prevention systems. Most of the evidence-based preventions are pretty low tech. And although they do require teams and commitment and systems to implement broadly, small demonstrations at the level of individual clinicians are possible. Also, I shudder at the thought that a group of dedicated clinicians could not persuade a group of equally dedicated administrators to do the right thing, even at the risk of losing some revenue.


Bottom line? While I like Millenson's sexy little "three I's of safety", I think the solutions, as is always the case when you start looking under the hood, are more complicated and nuanced. In a recent post I cited 5 potential solutions to our quality problem, and I will repeat them here:
1. Empower clinicians to provide only care that is likely to produce a benefit that outweighs risks, be they physical or emotional.
2. Reward the signal and not the noise. I wrote about this here and here.
3. Reward clinicians with more time rather than money. Although I am not aware of any data to back up this hypothesis, my intuition is that slowing down the appointment may result not only in reduction of harm by cutting out unnecessary interventions, but also in overall lowering of healthcare expenditures. It is also sure to improve the crumbling therapeutic relationship.
4. We need to re-engineer our research enterprise for the most important stakeholder in healthcare: the clinician-patient dyad. We need to make the data that are currently manufactured and consumed for large scale policy decisions more friendly at the individual level. And as a corollary, we need to re-think how we help information diffuse into practice and adopt some of the methods of the social sciences.
5. Let's get back to the tried and true methods of public health, where an ounce of prevention continues to be worth a pound of cure. Yes, let's strive for reducing cancer mortality, but let us invest appropriately in stuffing that tobacco horse back into its barn -- getting people to stop smoking will reduce lung cancer mortality by 85% rather than 0.3%, and at a much lower cost with no complications or false positives. Same goes for our national nutrition and physical activity struggles. Our social policies must support these well-recognized and efficient population interventions.
No, they are not simple, they are not sexy, and most importantly they may be painful. Yet, what is the alternative? We must stop this massive bleeder before the American public starts thinking that the cure is worse than the disease.     

Wednesday, September 22, 2010

Is VAP prevention woo science?

My students and I are continuing with the VAP theme, this week exploring the development and implementation of evidence-based practice guidelines (EBPG) through the ATS/IDSA HAP/VAP/HCAP guideline. To continue what we started last week, we are specifically talking about VAP prevention. In this EBPG, there are over 20 suggested maneuvers to prevent VAP, most of them based on level I or II evidence. One lively thread in the discussion deals with the logistics of implementing so many interventions. The complexity of codifying and durably introducing so many processes was duly acknowledged. And although I have used the word "bundle" once so far, I have not yet alluded to the IHI's effort to simplify the process. As some of you know, I have been somewhat critical of our current CMS Administrator's approach to quality and safety improvements, and I have even engendered the wrath of some colleagues by publishing this evidence-driven criticism in a review paper (yes, quoting myself again):
    Given that the MV bundle simply represents a conglomeration of some of the recommended practices, it is still important to evaluate how it performs as a VAP preventive strategy for several reasons. First, for example, it is possible that one or two of the interventions chosen for inclusion in the bundle drive most of the VAP preventive benefit. If this is the case, then it may be inefficient to include the remaining elements, as they may divert the necessary implementation resources from the elements that truly matter. One example of a seemingly simple, cheap, yet rarely attainable goal is compliance with head of the bed elevation. Some studies have indicated that a variety of reasons preclude this goal from being achieved 85% of the time, and even call its effectiveness into question (31). Understanding which elements of the bundle drive improvements in which populations may deprioritize head of the bed elevation as a goal to be achieved across the board. Alternatively, it may help to make a more forceful argument to improve compliance with this recommendation. Second, other evidence-based recommendations included in the EBPG, but not the bundle, may impart a greater magnitude of VAP prevention, thus once again making the current approach inefficient. For example, some educational strategies incorporating the EBPG recommendations more broadly have been demonstrated to effect a substantial reduction in the rates of VAP (27, 32). Third, neither the expenditures associated with building the infrastructure for bundle implementation nor the potential return on such investment has been explicitly quantified. In general, the recent disappointing results of two meta-analyses of studies evaluating the impact of a rapid response team on hospital outcomes should serve as a cautionary note for adoption of any new process, even one with a great deal of face validity, that has not undergone rigorous testing as a whole (33, 34). More importantly, in the absence of such rigorous validation, scoring nearly complete compliance with these processes (e.g., the 95% compliance advocated in the case of the MV bundle) as a quality measure would be misguided.
The 95% compliance refers to the IHI's stipulation that an institution reach this level of implementation of all the components of the bundle in order to be considered compliant.
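In practice this is an all-or-none measure: a ventilated patient-day counts as compliant only if every element of the bundle was delivered, and the institution is then judged against the 95% bar across its observed patient-days. Here is a minimal sketch of that bookkeeping; the element names are hypothetical stand-ins, not a statement of the bundle's actual composition:

```python
from typing import Dict, List

# Hypothetical bundle elements, for illustration only.
BUNDLE_ELEMENTS = ["head_of_bed_elevated", "sedation_vacation",
                   "dvt_prophylaxis", "stress_ulcer_prophylaxis", "oral_care"]

def all_or_none_compliance(observations: List[Dict[str, bool]]) -> float:
    """Fraction of observed patient-days on which *every* bundle element was delivered."""
    compliant_days = sum(all(obs.get(element, False) for element in BUNDLE_ELEMENTS)
                         for obs in observations)
    return compliant_days / len(observations)

# Example: three patient-days, one missing a single element -> 67%, well under a 95% bar.
days = [
    {element: True for element in BUNDLE_ELEMENTS},
    {**{element: True for element in BUNDLE_ELEMENTS}, "oral_care": False},
    {element: True for element in BUNDLE_ELEMENTS},
]
print(f"bundle compliance: {all_or_none_compliance(days):.0%}")
```

The all-or-none construction is exactly why a single hard-to-achieve element (head of the bed elevation, say) can sink the whole score.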

Although several studies out there have demonstrated that by applying a group of evidence-based preventive strategies at least some cases of VAP can be prevented, none has addressed the potential hierarchy of or interactions between the components. What is clear from the literature, however, is that a concomitant educational effort is necessary to make the guideline stick.

And how much of the effect is actually due to the Hawthorne effect rather than the result of the specific intervention? I could of course argue that the Hawthorne effect is not something to avoid if it helps reduce VAP rates, but then I might get accused of advocating "woo". Perish that thought!

     

Friday, September 17, 2010

VAP: A case of mistaken identity?

This week in my class we are talking about systematic reviews and meta-analyses. As in the past, I assigned this excellent example published a couple of years ago by Canadian friends:
BMJ. 2007 Apr 28;334(7599):889. Epub 2007 Mar 26.

Oral decontamination for prevention of pneumonia in mechanically ventilated adults: systematic review and meta-analysis.

Department of Nursing Services, Tan Tock Seng Hospital, Singapore. ee_yuee_chan@ttsh.com.sg

Abstract

OBJECTIVE: To evaluate the effect of oral decontamination on the incidence of ventilator associated pneumonia and mortality in mechanically ventilated adults.
DESIGN: Systematic review and meta-analysis.
DATA SOURCES: Medline, Embase, CINAHL, the Cochrane Library, trials registers, reference lists, conference proceedings, and investigators in the specialty.
REVIEW METHODS: Two independent reviewers screened studies for inclusion, assessed trial quality, and extracted data. Eligible trials were randomised controlled trials enrolling mechanically ventilated adults that compared the effects of daily oral application of antibiotics or antiseptics with no prophylaxis.
RESULTS: 11 trials totalling 3242 patients met the inclusion criteria. Among four trials with 1098 patients, oral application of antibiotics did not significantly reduce the incidence of ventilator associated pneumonia (relative risk 0.69, 95% confidence interval 0.41 to 1.18). In seven trials with 2144 patients, however, oral application of antiseptics significantly reduced the incidence of ventilator associated pneumonia (0.56, 0.39 to 0.81). When the results of the 11 trials were pooled, rates of ventilator associated pneumonia were lower among patients receiving either method of oral decontamination (0.61, 0.45 to 0.82). Mortality was not influenced by prophylaxis with either antibiotics (0.94, 0.73 to 1.21) or antiseptics (0.96, 0.69 to 1.33) nor was duration of mechanical ventilation or stay in the intensive care unit.
CONCLUSIONS: Oral decontamination of mechanically ventilated adults using antiseptics is associated with a lower risk of ventilator associated pneumonia. Neither antiseptic nor antibiotic oral decontamination reduced mortality or duration of mechanical ventilation or stay in the intensive care unit.
The meta-analysis looks at effectiveness of oral care in preventing VAP. Of interest was the overall finding that VAP could indeed be prevented, but preventing it altered neither mortality nor such hospital utilization parameters as duration of mechanical ventilation (MV) or ICU length of stay (LOS).
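For those who have never looked under the hood of a meta-analysis, pooled estimates like the 0.56 (0.39 to 0.81) above are typically obtained by averaging trial-level effects on the log scale, weighting each trial by the inverse of its variance. The sketch below uses that generic fixed-effect approach on made-up trial counts; it is not the authors' actual model or data:

```python
import math

def pooled_rr_fixed_effect(trials):
    """Inverse-variance (fixed-effect) pooling of relative risks.

    Each trial is a tuple: (events_treated, n_treated, events_control, n_control).
    """
    weighted_sum, total_weight = 0.0, 0.0
    for a, n1, c, n2 in trials:
        log_rr = math.log((a / n1) / (c / n2))
        var = 1 / a - 1 / n1 + 1 / c - 1 / n2   # variance of log(RR)
        weight = 1 / var
        weighted_sum += weight * log_rr
        total_weight += weight
    pooled_log_rr = weighted_sum / total_weight
    se = math.sqrt(1 / total_weight)
    ci = (math.exp(pooled_log_rr - 1.96 * se), math.exp(pooled_log_rr + 1.96 * se))
    return math.exp(pooled_log_rr), ci

# Made-up antiseptic trials (VAP events / patients in each arm) -- NOT the actual data.
trials = [(12, 150, 22, 148), (8, 120, 15, 118), (20, 300, 33, 295)]
rr, (lo, hi) = pooled_rr_fixed_effect(trials)
print(f"pooled RR {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

A random-effects model would additionally allow for between-trial heterogeneity and generally widens the confidence interval.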

The study has precipitated a vigorous discussion in class. I will excerpt below some of my responses to the students' questions (all right, so I feel a little tacky quoting myself, but perish the thought I should be accused of plagiarizing anyone, even myself).

One of the students brought up CMS never events (you know, those hospital-acquired conditions that CMS will no longer pay for because they should never happen), and presented me with an opportunity to talk about the subtleties of VAP diagnosis within that context:  

I could not agree more that prevention is critical. The question of whether we can prevent VAP 100% of the time is a little more complicated, however. For one, we are not even sure how to diagnose VAP. Applying the CDC's surveillance definition results in rates of VAP that are quite different from invasive diagnostic testing data. Applying the same definitions to different populations results in rates that are vastly different. Furthermore, diagnostics are driven by somewhat arbitrary thresholds for bacterial counts that may not have the greatest sensitivity or specificity. So, when you are dealing, first, with the wild west of the patient-disease interaction, then add the muddy diagnostic issues to the stew, and season everything with variable processes of care, the issue, to me at least, becomes a little less straightforward.
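To make the "arbitrary threshold" point concrete, here is a toy example with invented quantitative-culture results. The numbers are not from any study, and real cutoffs vary by specimen type, but the trade-off is the same: raise the threshold and you gain specificity at the cost of sensitivity.

```python
# Invented BAL quantitative-culture results (CFU/mL) for patients with and without
# "true" VAP by some reference standard -- purely illustrative numbers.
vap_present = [1e5, 1e6, 5e4, 1e4, 2e5]
vap_absent  = [1e3, 1e4, 5e3, 2e4, 1e2]

def sens_spec(threshold: float):
    """Sensitivity and specificity of calling VAP when the count meets the threshold."""
    sensitivity = sum(x >= threshold for x in vap_present) / len(vap_present)
    specificity = sum(x < threshold for x in vap_absent) / len(vap_absent)
    return sensitivity, specificity

for exponent in (3, 4, 5):
    sens, spec = sens_spec(10 ** exponent)
    print(f"threshold 10^{exponent} CFU/mL: sensitivity {sens:.0%}, specificity {spec:.0%}")
```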
Then, when I asked them what the use of preventing VAP is when it does not impact such important outcomes as mortality or LOS, I got some great answers, appropriately ranging from "you are full of s**t" to "OK, my intuition tells me VAP is good to prevent, but here we have no reason for it". Some even referred to these outcomes as patient-oriented, so this was a great teachable moment, and I responded, reiterating:
I am personally a great believer in prevention, but a). it has to be sensible prevention and not just a convenient conglomeration of poorly tested modalities, and b). not everything can be prevented. There are a couple of points to make here.
CMS has not curtailed payment for VAP for the reasons that I outlined above -- what exactly constitutes VAP, what are the best preventive strategies, and exactly how good are they. CMS, on the other hand, HAS stopped paying for completely preventable errors (yes, there are such things), such as leaving instruments inside patients during surgery or administering the wrong unit of blood. These are process errors for which zero tolerance is reasonable. 
Now, on to VAP. From the Chan MA we are under the impression that there is no reason to prevent VAP, since there is no difference in patient-oriented outcomes. First, let me challenge your notion that LOS and MV duration are patient-oriented outcomes. I would argue that the patient cares less about this than about comfort, quality of life, post-critical illness functional status and the development of PTSD after the ICU stay, to name a few. These are rarely measured in RCTs. So, even if the LOS and mortality are not altered by VAP prevention, there may be other perfectly valid reasons to prevent it. Not to mention curbing the use of antibiotics to curtail the spread of resistance. 
One final point: none of the individual studies was powered to detect a mortality difference. Having said that, the combined data, with a very respectable total of >3,000 patients analyzable for mortality, should have been enough to at least show an important trend, if there was one. And in fact the VAP literature is fraught with controversy on whether VAP imparts attributable mortality or not. The LOS issue is even more complex, however. Because LOS is a highly variable outcome, an RCT powered to capture this difference would have to be enormously large (a measly 2,000 patients would not do). However, the epidemiologic and outcomes literature abounds with data on attributable LOS and $$ due to VAP.
So, here is a valuable MA that shows that there are sound strategies to prevent at least some cases of VAP, but that, by implication, does not justify the effort of preventing it. Here is a situation where policy requires expert analysis.
Bottom line, MAs are useful and dangerous at the same time. Their results, if not examined carefully and in context, can worsen rather than improve care. And the MA that I assigned is actually of great quality!
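To put some rough numbers behind the power argument in that exchange, here is a back-of-envelope sample-size sketch using the standard normal-approximation formula for comparing two proportions. The baseline mortality and effect size are assumptions picked purely for illustration, not figures taken from the trials in the meta-analysis:

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_arm(p_control: float, p_treatment: float,
              alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-arm sample size to detect a difference between two proportions
    (two-sided alpha, normal approximation). Back-of-envelope only."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treatment) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar)) +
                 z_beta * sqrt(p_control * (1 - p_control) +
                               p_treatment * (1 - p_treatment))) ** 2
    return ceil(numerator / (p_control - p_treatment) ** 2)

# Assumed 30% control-arm mortality; detecting a 5-point absolute reduction takes
# on the order of 1,250 patients per arm (~2,500 total) -- in the same ballpark as
# the >3,000 patients the pooled analysis provides, and far beyond a typical
# single-center prevention trial.
print(n_per_arm(0.30, 0.25))
```

Shrink the detectable difference, or switch to a noisy continuous outcome like LOS, and the required numbers balloon further, which is the point about the "measly 2,000."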
I posted this to underline how tricky applying evidence can be, particularly in an area where there is so much diagnostic confusion. This of course does not mean that we should not strive to understand things better. On the contrary, it calls for more integration of knowledge in a multidisciplinary fashion.

Fools rush in where wise people fear to tread...