Friday, February 25, 2011

Guidelines: What really constitutes level I evidence?

There has been some interesting buzz in the blogosphere about where evidence-based guideline recommendations come from, and I wanted to add a little fuel to that fire today.

As you know, I think a lot about the nature of evidence, about the "science" in clinical science, and about pneumonia, specifically ventilator-associated pneumonia or VAP. Last week I wrote here and here about a specific recommended intervention to prevent VAP consisting of semi-recumbent, as opposed to supine, positioning. This recommendation, one of 21 maneuvers aimed at modifiable risk factors for VAP, had level I evidence behind it. Given my recent deconstruction of this level I evidence, consisting of a single unblinded RCT in a single academic urban center in Spain, and given that we already know that level I data represent a very small proportion of all the evidence behind guideline recommendations, I got curious about this level I stuff. How is level I really defined? Is there a lot of room for subjective judgment? So, I went to the source.

In its HAP/VAP guideline, the ATS and IDSA committee define the levels of evidence in the following way:
Level I (high): Evidence comes from well conducted, randomized controlled trials

Level II (moderate): Evidence comes from well designed, controlled trials without randomization (including cohort, patient series, and case-control studies). Level II studies also include any large case series in which systematic analysis of disease patterns and/or microbial etiology was conducted, as well as reports of new therapies that were not collected in a randomized fashion

Level III (low): Evidence comes from case studies and expert opinion. In some instances therapy recommendations come from antibiotic susceptibility data without clinical observations

So, well conducted, randomized controlled trials. But what does "well conducted" mean? Seems to me that one person's well conducted may be another person's garbage. Well, I went to the text of the document for clarification:
The grading system for our evidence-based recommendations was previously used for the updated ATS Community-acquired Pneumonia (CAP) statement, and the definitions of high-level (Level I), moderate-level (Level II), and low-level (Level III) evidence are summarized in Table 1 (8). 
OK, then. We have to go to reference #8, or the CAP guideline to get to the bottom of the definition. And here is what that document states:
Therefore, in grading the evidence supporting our recommendations, we used the following scale, similar to the approach used in the recently updated Canadian CAP statement (46): Level I evidence comes from well-conducted randomized controlled trials; Level II evidence comes from well-designed, controlled trials without randomization (including cohort, patient series, and case control studies); Level III evidence comes from case studies and expert opinion. Level II studies included any large case series in which systematic analysis of disease patterns and/or microbial etiology was conducted, as well as reports of new therapies that were not collected in a randomized fashion. In some instances therapy recommendations come from antibiotic susceptibility data, without clinical observations, and these constitute Level III recommendations.
Again, we are faced with the nebulous "well-conducted" descriptor with no further defining guidance on how to discern this quality. I resigned myself to going to the next source citation, #46 above, the Canadian CAP statement:
We applied a hierarchical evaluation of the strength of evidence modified from the Canadian Task Force on the Periodic Health Examination [4]. Well-conducted randomized, controlled trials constitute strong or level I evidence; well-designed controlled trials without randomization (including cohort and case-control studies) constitute level II or fair evidence; and expert opinion, case studies, and before-and-after studies are level III (weak) evidence. Throughout these guidelines, ratings appear as roman numerals in parentheses after each recommendation.
Another "well-conducted" construct, another reference, another wild goose chase. The reference #4 above clarified the definition for me thus:
OK, so, now we have "at least one properly randomized controlled trial." So, having gotten to the origin of this broken telephone game, it looks like proper randomization trumps all other markers of a well-done trial. The price of neglecting those other markers is giving up generalizability, confirmation, appropriate analyses, and many other important properties that need to be evaluated before stamping the intervention with a seal of approval.

And this is just one guideline for one syndrome. The bigger point I wanted to illustrate is this: we already know that only 14% of all IDSA guideline recommendations have so-called level I evidence behind them, and even for those, the value and validity of assigning this highest level of evidence is dubious, given the room for subjectivity and misclassification. So, what does all of this mean? Well, for me it means no foreseeable shortage of fodder for blogging. But for our healthcare policy and our public's health? Big doo-doo.


  1. Actually, we use an evidence pyramid in meta-analysis, with the top being meta-analyses using RCTs and individual patient data.

  2. Thanks for the comment, datacooker! I am certainly aware that in some EBPGs that is the definition of level I evidence. However, in the series of guidelines that I am citing that is certainly NOT the case. This bears reviewing in other so-called EBPGs.

  3. However, from what I have seen of exclusion and inclusion criteria, even an RCT is useless to me, as almost none of the patients I see would have qualified for any of the studies.

    The assumption that those study results apply to patients who would not have qualified for the study is false.

    Medical science is as much an oxymoron as jumbo shrimp, or others that are too political to list here.