As you know, I think a lot about the nature of evidence, about the "science" in clinical science, and about pneumonia, specifically ventilator-associated pneumonia or VAP. Last week I wrote here and here about a specific recommended intervention to prevent VAP consisting of semi-recumbent, as opposed to supine, positioning. This recommendation, one of 21 maneuvers aimed at modifiable risk factors for VAP, had level I evidence behind it. Given my recent deconstruction of this level I evidence, consisting of a single unblinded RCT in a single academic urban center in Spain, and given that we already know that level I data represent a very small proportion of all the evidence behind guideline recommendations, I got curious about this level I stuff. How is level I really defined? Is there a lot of room for subjective judgment? So, I went to the source.
In its HAP/VAP guideline, the ATS and IDSA committee defines the levels of evidence in the following way:
Level I (high): Evidence comes from well conducted, randomized controlled trials
Level II (moderate)
Level III (low)
The grading system for our evidence-based recommendations was previously used for the updated ATS Community-acquired Pneumonia (CAP) statement, and the definitions of high-level (Level I), moderate-level (Level II), and low-level (Level III) evidence are summarized in Table 1 (8).
OK, then. We have to go to reference #8, the CAP guideline, to get to the bottom of the definition. And here is what that document states:
Therefore, in grading the evidence supporting our recommendations, we used the following scale, similar to the approach used in the recently updated Canadian CAP statement (46): Level I evidence comes from well-conducted randomized controlled trials; Level II evidence comes from well-designed, controlled trials without randomization (including cohort, patient series, and case control studies); Level III evidence comes from case studies and expert opinion. Level II studies included any large case series in which systematic analysis of disease patterns and/or microbial etiology was conducted, as well as reports of new therapies that were not collected in a randomized fashion. In some instances therapy recommendations come from antibiotic susceptibility data, without clinical observations, and these constitute Level III recommendations.
Again, we are faced with the nebulous "well-conducted" descriptor with no further defining guidance on how to discern this quality. I resigned myself to going to the next source citation, #46 above, the Canadian CAP statement:
We applied a hierarchical evaluation of the strength of evidence modified from the Canadian Task Force on the Periodic Health Examination. Well-conducted randomized, controlled trials constitute strong or level I evidence; well-designed controlled trials without randomization (including cohort and case-control studies) constitute level II or fair evidence; and expert opinion, case studies, and before-and-after studies are level III (weak) evidence. Throughout these guidelines, ratings appear as roman numerals in parentheses after each recommendation.
Another "well-conducted" construct, another reference, another wild goose chase. The reference cited there for the Canadian Task Force, #4 in the Canadian statement, clarified the definition for me thus:
OK, so, now we have "at least one properly randomized controlled trial." So, having gotten to the origin of this broken telephone game, it looks like proper randomization trumps all other markers of a well-done trial. The price of neglecting those other markers is giving up generalizability, confirmation, appropriate analyses, and many other important properties that need to be evaluated before stamping the intervention with a seal of approval.
And this is just one guideline for one syndrome. The bigger point that I wanted to illustrate is this: we now know that only 14% of all IDSA guideline recommendations have so-called level I evidence behind them, and even for those, the value and validity of assigning this highest level of evidence is dubious, given the room for subjectivity and misclassification. So, what does all of this mean? Well, for me it means no foreseeable shortage of fodder for blogging. But for our healthcare policy and our public's health? Big doo-doo.