Saturday, April 2, 2011

Invalidated Results Watch, Ivan?

My friend Ivan Oransky runs a highly successful blog called Retraction Watch; if you have not yet discovered it, you should! In it he and his colleague Adam Marcus document (with shocking regularity) retractions of scientific papers. While most of the studies are from the bench setting, some are in the clinical arena. One of the questions they have raised is what should happen with citations of these retracted studies by other researchers? How do we deal with this proliferation of oftentimes fraudulent and occasionally simply mistaken data?

A more subtle but no less difficult conundrum arises when the papers cited are recognized to be of poor quality, yet are still used to defend one's thesis. The latest case in point comes from the paper I discussed at length yesterday, describing the success of the Keystone VAP prevention initiative. And even though I am very critical of the data, I do not mean to single out these particular researchers. It is simply that I am intimately familiar with the literature in this area, so I can judge what is being cited. I have seen similar transgressions from other authors, and I am sure that they are ubiquitous. But let me be specific.

In the Methods section on page 306, the investigators lay out the rationale for their approach (bundles) by stating that the "ventilator care bundle has been an effective strategy to reduce VAP..." As supporting evidence they cite references #16-19. Well, it just so happens that these are the references that yours truly had included in her systematic review of the VAP bundle studies, and the conclusions of that review are largely summarized here. I hope that you will forgive me for citing myself again:
A systematic approach to understanding this research revealed multiple shortcomings. First, since all of the papers reported positive results and none reported negative ones, there is a potential for publication bias. For example, a recent story in a non-peer-reviewed trade publication questioned the effectiveness of bundle implementation in a trauma ICU, where the VAP rate actually increased directionally from 10 cases per 1,000 MV days in the period before to 11.9 cases per 1,000 MV days in the period after implementation of the bundle (24). This was in contradistinction to the medical ICU in the same institution, which achieved a reduction from 7.8 to 2.0 cases per 1,000 MV days with the same intervention (24). Since the results did not appear in a peer-reviewed form, it is difficult to judge the quality or significance of these data; however, the report does highlight the need for further investigation, particularly focusing on groups at heightened risk for VAP, such as trauma and neurological critically ill (25).             
Second, each of the four reported studies suffers from a great potential for selection bias, which was likely present in the way VAP was diagnosed. Since all of the studies were naturalistic and none was blinded, and since all of the participants were aware of the overarching purpose of the intervention, the diagnostic accuracy of VAP may have been different before as compared to after the intervention. This concern is heightened by the fact that only one study reports employing the same team approach to VAP identification in the two periods compared (23). In other studies, although all used the CDC-NNIS VAP definition, there was either no reporting of or heterogeneity in the personnel and methods of applying these definitions. Given the likely pressure to show measurable improvement to the management, it is possible that VAP classification suffered from a bias. 
Third, although interventional in nature, naturalistic quality improvement studies can suffer from confounding much in the same way that observational epidemiologic studies do. Since none of the studies addressed issues related to case mix, seasonal variations, secular trends in VAP, and since in each of the studies adjunct measures were employed to prevent VAP, there is a strong possibility that some or all of these factors, if examined, would alter the strength of the association between the bundle intervention and VAP development. Additional components that may have played a role in the success of any intervention are the size and academic affiliation of the hospital. In a study of interventions aimed at reducing the risk of CRBSI, Pronovost et al. found that smaller institutions had a greater magnitude of success with the intervention than their larger counterparts (26). Similarly, in a study looking at an educational program to reduce the risk of VAP, investigators found that community hospital staff were less likely to complete the educational module than the staff at an academic institution; in turn, the rate of VAP was correlated with the completion of the educational program (27). Finally, although two of the studies included in this review represent data from over 20 ICUs each (20, 22), the generalizability of the findings in each remains in question. For example, the study by Unahalekhaka and colleagues was performed in the institutions in Thailand, where patient mix and the systems of care for the critically ill may differ dramatically from those in the US and other countries in the developed world (22). On the other hand, while the study by Resar and coworkers represents a cross section of institutions within the US and Canada, no descriptions are given of the particular ICUs with respect to the structure and size of their institutions, patient mix or ICU care model (e.g., open vs. closed; intensivists present vs. intensivists absent, etc.) (20). 
This aggregate presentation of the results gives one little room to judge what settings may benefit most and least from the described interventions. The third study includes data from only two small ICUs in two community institutions in the US (21), while the remaining study represents a single ICU in a community hospital where ICU patients are not cared for by an intensivist (23).  Since it is acknowledged that a dedicated intensivist model leads to improved ICU outcomes (28, 29), the latter study has limited usefulness to institutions that have a more rigorous ICU care model.
OK, you say, maybe the investigators did not buy into my questions about the validity of the "findings." Maybe not, but evidence suggests otherwise. In the Discussion section on page 311 they actually say:
"While the bundle has been published as an effective strategy for VAP prevention and is advocated by national organizations, there is significant concern about its internal validity."
And guess what they cite? Yup, you guessed it, the paper excerpted above. So, to me it feels like they are trying to have it both ways -- the evidence FOR implementing the bundle is the same evidence AGAINST its internal validity. Much like Bertrand Russell, I am not that great at dealing with paradoxes. Will this contradiction persist in our psyche, or will sense prevail? Perhaps Ivan and Adam need to start a new blog: Invalidated Results Watch. Oh? Did you say that peer review is supposed to be the answer to this? Right.  


  1. The bundle supporters don't want you to look at the evidence. Even components of the bundle are bogus. Take elevation of the head of the bed, for example. The original paper said the bed had to be elevated to 45 degrees. The bundlers chose 30 degrees because they knew no one could comply with 45. Guess what? No one complies with 30 degrees either.

  2. Thanks for your comment, Skeptical. There certainly are issues with what is chosen for incorporation into guidelines. To be sure, I am not averse to the bundle. I just think that the evidence is too weak to be used as a policy stick that makes everyone spin their wheels to implement it. It is not that there is evidence that the bundle does not work; rather, there is an absence of evidence that it does, and, given the gravity with which it is taken as a policy issue, this is not acceptable. Furthermore, the science in this area is still crappy, and spreading crappy science is a bad thing.

    I talked about the semi-recumbent positioning evidence recently:

  3. Marya
    Great Post

    Two questions:

    1) Is a naturalistic study the same as a prospective cohort trial, so that the difference is a matter of semantics?

    2) "where the VAP rate actually increased directionally"

    What is the need for saying "directionally?" VAP rate increased, so why this modifier? Is there something I am missing statistically whereby inserting this communicates additional information I am overlooking?

  4. Brad, thanks for your comments. Yes, a naturalistic study is essentially the same as an observational one in that you have to worry about all the same threats to validity. I used "directionally" to differentiate from "statistically significant", as reviewers and editors get all nervous when you say "increase" when the sacred p-value is not achieved.

    Again, thanks for your comments and questions.
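
To make the "directionally" point concrete, here is a minimal sketch of how a rise from 10 to 11.9 cases per 1,000 MV days can fail to reach statistical significance. The trade report quoted above gives only the rates, so the denominators here (10,000 MV days in each period, which imply 100 and 119 cases) are purely hypothetical, and the function name `rate_ratio_ci` is made up for illustration. The interval is a standard Wald approximation on the log scale, not necessarily what any of the cited studies used.

```python
import math

def rate_ratio_ci(events_before, days_before, events_after, days_after, z=1.96):
    """Rate ratio (after / before) with an approximate Wald confidence interval.

    The interval is built on the log scale, using
    SE(log ratio) = sqrt(1/events_before + 1/events_after).
    """
    rate_before = events_before / days_before
    rate_after = events_after / days_after
    ratio = rate_after / rate_before
    se_log = math.sqrt(1 / events_before + 1 / events_after)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper

# HYPOTHETICAL denominators: 10,000 MV days per period, chosen only so that
# the case counts match the reported rates of 10 and 11.9 per 1,000 MV days.
ratio, lower, upper = rate_ratio_ci(100, 10_000, 119, 10_000)
print(f"rate ratio {ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Under these assumed denominators the rate ratio is 1.19 and the 95% interval runs from roughly 0.91 to 1.55. Because the interval includes 1, the increase is "directional" only. A single trauma ICU would likely contribute far fewer MV days than this, which would widen the interval further.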