A very close friend of mine has breast cancer. It is a very small tumor, diagnosed on an annual mammogram, requiring confirmation with a breast MRI. She had a lumpectomy today, and I was with her at the hospital. This proved to be an enlightening experience.
To put things in perspective, when I was in training and in practice (yes, in the dark ages when we were expected to stay awake AND care for patients for over 48 hours at a time every 3 days), we had not heard of patient-centered medicine. I learned that my role was to diagnose, come up with a plan of action and convince the patient at any cost that my plan was the correct one. To be sure, I always tried to do this in a nice way, but would get a bit impatient when my judgment was questioned. This was the behavior modeled for me by my elders and others whom I respected.
Well, that was then. Having had quite a few years to reflect on the practice of medicine in the context of our healthcare system, I have learned just how misguided this attitude is. And, being a Sagittarius, I cannot fathom how this universal truth is escaping others. Yet escaping it is. This became obvious to me today.
My friend had to have a nuclear medicine test prior to her lumpectomy to define the extent of axillary nodal involvement. She had been told that this is an arduous and painful experience that cannot be mitigated with pre-medication. She was also informed that asking the radiologist to deliver the radionuclide slowly rather than as a rapid push might reduce the sensation. So, my friend, who is herself a physician, was prepared for a civilized and simple conversation with the practitioner. Yet, this is not what transpired. You would think that being asked to deliver the chemical slowly is not such a big and unreasonable request. Well, if you thought this, you were wrong: evidently this was such a big ego blow to the radiologist that she felt compelled to respond snidely, "Well, OK, I am not going to fight with you about it". Now, this is off-putting under the best of circumstances. Imagine being about to go to the OR to have a cancer removed from your breast, and having this snide come-back thrown at you. And why? What is the harm in going along with the patient's request if it makes no difference in the end-result of the test? Is it really necessary to diminish her in such a blatant way?
Well, this physician was of a similar vintage to me, and I can only imagine that she came into practice before patient-centered care became the standard. In her mind, as in mine in those distant days, the physician's involvement with the patient's care was not necessarily about the patient, unless the patient fell in line with the recommendation. The shameful fact is that my ego was much too fragile to allow a discussion or questions about my considered course of action. How could patients go against my years of training, my deep knowledge and their own best interests? I cannot say for sure, but it is likely that my friend's radiologist was cut from similar cloth. And what is so obvious to me today has not yet been assimilated by so many of my colleagues, including this person.
As I have said before, the new direction for medicine cannot be what I am used to in real estate: "I do not have what you need, but I will show what I do have". The new direction in medicine must undoubtedly be one where the patient is the center of the encounter, and it is the patient's interest rather than the doctor's ego that must be protected assiduously.
Lest you think that the entire hospital experience was negative, let me be clear: of all the people taking care of my friend, the radiologist was the sole disappointing exception. Her surgeons, anesthesiologists, nurses and ancillary personnel went above and beyond my expectations. I was amazed by the level of civility, good humor, politeness and real involvement everyone exhibited -- it was truly different from my days on the wards and pleasantly eye-opening. It even gave me some hope for the future of medicine in the midst of my normally nihilistic ruminations.            
The great poet Rumi said that changing language can change our life. Well, when the recovery room nurse said to my friend "Let me know when you feel that you would rather rest at home than here", I was overcome with warmth and good will. The language of medicine does seem to be changing. And if it continues in this vein, perhaps it will change our lives.
Tuesday, December 21, 2010
Monday, December 20, 2010
Why we need collaborations across healthcare sectors
I want to digress from our recent focus on methods and talk a bit about conflict of interest (COI for short). There has been a lot in the press lately about doctors taking money from the biopharmaceutical manufacturers, and doctors inserting unnecessary hardware into patients' hearts and spines. All of this has been happening against the background of a low hum of an ongoing discussion of what constitutes a COI, how much is too much and for what (for example, can a doc who takes research and education dollars from a manufacturer with an interest in anticoagulation sit on a committee that develops the guidelines for prevention of thromboembolic disease?), and how to mitigate these ubiquitous and pesky COIs.
In some ways watching this discussion has been amusing, while in others it has been downright sad. Medical journals insist that advertising money is OK to take (presumably because the editorial and marketing offices are separated by some sort of a firewall), yet maintain that professional societies should not be able to take this tainted education money. Professional societies, on the other hand, are running away from the accusations by tightening their continuing medical education (CME) criteria and scrambling to replace the lavish budgets derived from pharma to develop their coveted evidence-based practice guidelines. And while all the pots are calling all the kettles black, academic researchers are being barred from collaborating with the industry on research projects, and industry researchers are being precluded from presenting their data at professional society meetings. All the while, the public is being whipped into a lather about these alleged systematic transgressions, and forced to cheer for the ensuing retribution.
But, like many things in life, and especially stuff that we discuss on this blog, this issue is neither black nor white. Don't get me wrong: I am not condoning the egregious excesses of greed demonstrated by some members of my hallowed profession. If you have been reading my blog for some time, you know that I do not dispute the shameful reality of many breaches of public trust. I am an ardent supporter of exposing these breaches and of the harsh punishments that they deserve. This is not what I am talking about here.
I am much more concerned about the one-sided story that we have been hearing about pharma-academic collaborations. Because of the persecutory nature of public opinion, some institutions are now shying away from such collaborations. This attitude is akin to navigating a treacherous road while looking in the rearview mirror. Yes, there have been transgressions; yes, there has been greed and even scientific fraud in the name of money. Does this mean that we need to stop everything and come up with an entirely new way of managing these risks? Absolutely! Does this mean that we have to get rid of all pharma-academic collaborations? Absolutely not! In my humble opinion, erecting unscalable walls between these two groups is a big mistake. Here is why.
First, let me make a disclaimer: I do have active ongoing collaborations with multiple manufacturers. I do not take speaking or other promotional money, but limit myself to consulting and research grant funding. I also do a good deal of unfunded research, and I have never taken a penny for any of my blogging or blogging-related activities. And here is the crux of the matter: In this world of über-subspecialization, with the expertise being demographically and geographically diffuse, how can we afford not to collaborate across different types of organizations with different types of capabilities? Can we really afford to leave all of therapeutic development in the hands of organizations whose overarching purpose is to make money? And equally importantly, can we afford to continue this fragmented model of medical development without any thought to integrating the needs of all of the stakeholders? I think not. Just as we are reaping the fruit of electronic medical record development in isolation from the end-user, so this isolation of research effort will lead to even less coherence in medicine. And unless we are ready to socialize our entire healthcare system, it seems naïve to expect that this one sector will acquiesce and start working outside of our coveted free market for the good of humankind alone.
My readers know that I am not an industry apologist. On the contrary, I have said many times that there has been bad behavior across all the sectors of healthcare, starting with biopharma. But if we want to advance rather than stagnate and regress, we need robust collaborations. We also need higher ethical standards and greater professionalism to keep the public's health as our top priority.
There is COI everywhere, and, while financial COI is most visible, it is the more hidden COI that is most insidious. A hidden COI can be intellectual, reputational, ego-driven, career-mediated, etc. It is incumbent on us all in this complex world to ask questions and mitigate any ill effects of any cognitive biases, including those created by COI. Ultimately, as I have begun to realize of late, nothing will replace an educated and empowered patient: This is the only model that can provide appropriate checks and balances for our oftentimes misaligned and perverse incentives, both academic and economic.
Sunday, December 19, 2010
How e-patients can fix our healthcare system
We got a little into the weeds last week about significance testing and test characteristics. Because information is power, I realized that it may be prudent to back up a bit and do a very explicit primer on medical testing. I am hoping that this will provide some vocabulary for improved patient-clinician communication. But, alas, please do not be surprised if your practitioner looks at you as if you were an alien -- it is safe to say that most clinicians do not think in these terms in their everyday practices. So, educate them!
Let's dig a little deeper into some of the ideas we batted around last week, specifically those pertaining to testing. Let's start by explicitly establishing the purpose of a medical test. The purpose of a medical test is to detect disease when such disease is present. This fact alone should underscore the importance of your physician's ability to arrive at the most likely reasons for your symptoms. This exercise that every doc should go through as he/she is evaluating you is called "differential diagnosis". When I was a practicing MD, my strategy was to come up with the 3-5 most likely potential diagnoses and the 3-5 most deadly if missed, and to explore them further with appropriate testing. Arranging these possible diagnoses as a hierarchy can help the clinician to assign informal probabilities to each, a task that is central to Bayesian thinking. From this hierarchy then follows the tactical sequential work-up, avoiding the frantic shotgun approach.
So, having established a hierarchy of diagnoses, we now engage in adjunctive testing. And here is where we really need to be aware not only of our degree of suspicion for each diagnosis, but also of the test characteristics as they are reported in the literature and the test characteristics as they exist in the local center where the testing takes place. Why do I differentiate between the literature and practice? We know very well that the mere fact of observation, not to mention the experimental cleanliness of trials, often tends to exaggerate the benefits of an intervention. In other words, the real world is much messier than the laboratory of clinical research (which of course itself is messy enough). So, it is this compounded messiness that each clinician has to contend with when making testing decisions.
OK, so let us now deconstruct test characteristics even further. We have used the terms sensitivity, specificity, positive and negative predictive values. We've even explored their meanings to an extent. But let's break them down a bit further. Epidemiologists find it helpful to construct 2-by-2 (or 2 x 2) tables to think through some of these constructs, so, let's engage in that briefly. Below you see a typical 2 x 2 table.
In it we traditionally situate disease information in columns and test information in rows. A good test picks up signal when the signal is there while adding minimal noise. The signal is the disease, while the noise is the imprecise nature of all tests. Even simple blood tests, whose "objective accuracy" we take for granted, are subject to these limitations.
It is easiest to think of sensitivity as how well the test picks up the corresponding disease. In the case of mammography from last week, this number is 80%. This means that among 100 women who actually harbor breast cancer a mammogram will recognize 80. This is the "true positive" value, or disease actually present when the test is positive. What about the remaining 20? Well, those real cancers will be missed by this test, and we call them a "false negative". If you look at the 2 x 2 table, it should become obvious that the sum of the true positives and the false negatives adds up to the total number of people with the disease. Are you shocked that our wonderful tests may miss so much disease? Well, stay tuned.
The flip side of sensitivity is "specificity". Specificity refers to how reliably the test comes back negative in people who do not have the disease; in other words, whether the test is identifying only what we think it is identifying. The noise in this value comes from the test in effect hallucinating disease when the person does not have the disease. A test with high specificity will be negative in the overwhelming proportion of people without the disease, so the "true negative" cell of the table will contain almost the entire group of people without disease. Alas, for any test we develop we walk the tightrope between sensitivity and specificity. That is, depending on our priorities for testing, we have to give up some accuracy in either the sensitivity or the specificity. The more sensitive the test, the higher our confidence that we will not miss the disease when it is there. Unfortunately, what we gain in sensitivity we usually lose in specificity, thus creating higher odds for a host of false positive results. So, there really is no free lunch when it comes to testing. In fact, it is this very tension between sensitivity and specificity that is the crux of the mammography debate. Not as straightforward as we had thought, right? And this is not even getting into pre-test probabilities or positive and negative predictive values!
Well, let's get into these ideas now. I believe that the positive and negative predictive values of the test are fairly well understood at this point, no? Just to reiterate, a positive predictive value, which is the ratio of true positives to all positive test results (the latter is the sum of the true and false positives, or the sum of the values across the top row of the 2 x 2 table), tells us how confident we can be that a positive test result corresponds to disease being present. Similarly, the negative predictive value, the ratio of true negative test results to all negative test results (again, the latter being the sum across the second row of the 2 x 2 table, or true and false negatives), tells us how confident we can be that a negative test result really represents the absence of disease. The higher the positive and negative predictive values, the more useful the test becomes. However, when one is likely to be quite high but the other quite low, it is a pitfall of our irrationality to rush head first into the test in hopes of obtaining the answer with a high value (as in the case of the negative predictive value for mammography in women aged 40-50 years), since the opposite test result creates a potentially difficult conundrum. This is where pre-test probability of disease comes in.
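For readers who like to see the arithmetic, here is a minimal sketch of the four test characteristics computed from the cells of a 2 x 2 table. The 80% sensitivity is the mammography figure quoted above; the 90% specificity, the 1% prevalence and the cohort of 10,000 women are purely hypothetical numbers chosen for illustration, and the function name is my own invention.

```python
def summarize_two_by_two(tp, fp, fn, tn):
    """Return the four test characteristics from the cells of a 2 x 2 table.

    Layout (disease in columns, test result in rows):

                      disease present   disease absent
      test positive         tp                fp
      test negative         fn                tn
    """
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all with disease
        "specificity": tn / (tn + fp),  # true negatives among all without disease
        "ppv": tp / (tp + fp),          # true positives among the top (positive) row
        "npv": tn / (tn + fn),          # true negatives among the second (negative) row
    }


# Hypothetical cohort: 10,000 women, 100 of whom have cancer (assumed 1% prevalence).
# Sensitivity of 80% (from the text) gives 80 true positives and 20 false negatives.
# Specificity of 90% (an assumption) gives 8,910 true negatives and 990 false positives.
result = summarize_two_by_two(tp=80, fp=990, fn=20, tn=8910)

for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed numbers the negative predictive value comes out very high while the positive predictive value is low, which is exactly the lopsided situation described above.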
Now, what is this pre-test probability and how do we calculate it? Ah, this is the pivotal question. The pre-test probability is estimated based on population epidemiology data. In other words, given the type of a person you are (no, I do not mean nice or nasty or funny or droll) in terms of your demographics, heredity, chronic disease burden and current symptoms, what category of risk do you fit into based on these population studies of disease? This approach relies on filing you into a particular cubby hole with other subjects whose characteristics are most similar to yours. Are you beginning to appreciate the complexity of this task? And the imprecision of it? Add to this barely functioning crystal ball the clinician's personal cognitive biases, and is it any wonder that we do not do better? And need I even overlay this with another bugaboo, that of the overwhelming amount of information in the face of the incredible shrinking appointment, to demonstrate to you just how NOT straightforward any of this medicine stuff is?
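To make the role of the pre-test probability concrete, here is a rough sketch using Bayes' rule. It shows how the same positive result means very different things depending on which risk "cubby hole" you start in. The 80% sensitivity is the figure used earlier; the 90% specificity and the two pre-test probabilities (1% and 30%) are assumptions picked only to show the contrast, not estimates for any real population.

```python
def post_test_probability(pre_test, sensitivity, specificity):
    """Probability of disease given a positive test, by Bayes' rule."""
    true_positive_rate = pre_test * sensitivity
    false_positive_rate = (1 - pre_test) * (1 - specificity)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


SENSITIVITY = 0.80  # from the mammography example in the text
SPECIFICITY = 0.90  # hypothetical, for illustration only

# Two hypothetical risk categories: low (1%) and high (30%) pre-test probability.
low_risk = post_test_probability(0.01, SENSITIVITY, SPECIFICITY)
high_risk = post_test_probability(0.30, SENSITIVITY, SPECIFICITY)

print(f"Positive test, 1% pre-test probability:  {low_risk:.1%}")
print(f"Positive test, 30% pre-test probability: {high_risk:.1%}")
```

With these made-up inputs, a positive result in the low-risk person still leaves disease unlikely, while the same result in the high-risk person makes it probable. The test did not change; only the starting estimate did.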
OK, get your fingers out of that Prozac bottle -- it is not all bad! Yes, these are significant barriers to good healthcare. But guess what? The mere fact that you now know these challenges and can call them by their appropriate names gives you more power to be your own steward of your healthcare. Next time a doc appears certain and recommends some sexy new test, you will know that you cannot just say OK and await further results. Your healthcare is a chess match: you and your healthcare provider need to plan 10 moves ahead and play out many different contingencies.
On our end, researchers, policy makers and software developers all need to do better developing more useful and individualizable information, integrating this information into user-friendly systems, and encouraging thoughtful healthcare encounters. I am convinced that patient empowerment with this information followed by teaming up with providers in advocacy in this vein is the only thing that can assure course correction for our mammoth, unruly, dangerous and irrational healthcare system.
           
Top 5 this week
There has been significant interest in the p-value this week...
Here are the top 5 posts for the week:
#1: Why medical testing is never a simple decision
#2: Of P values, power, tobacco and cell phones
#3: P-values, Bayes and Ioannidis, oh my!
#4: Can a "negative" p-value obscure a positive findin...
#5: Getting beyond the p-value
Thursday, December 16, 2010
P-values, Bayes and Ioannidis, oh my!
When I graduated from college in the early 1980s, much to my parents' chagrin, I was not sure what to do with my career. So, instead of following some of my wiser classmates to Wall Street, I got a job in a very well regarded molecular endocrinology laboratory in Boston. When I look back on that time, almost 30 years ago, so much seems surreal. For example, in those days it took us well over a year to sequence a gene! How about that? Something that takes hours with today's technology took the equivalent of an eternity. And what a pain it was, running those cumbersome sequencing gels, with the glass cracking in the night, negating days of work. Oh, well, times have changed and for the better, I believe. But these changes are bringing with them some odd incongruities in our prior thinking.
A few days ago, I picked up a link from Michael Pollan's Twitter feed to a story that startled me: it was insinuating that all these decades of sequencing the human genome to hunt for targets of disease susceptibility have come up nearly empty-handed (with a few well-known exceptions). The story recounted a tale of decades of vigorous funding, meteoric career growth and very few results to date. Yet, the scientists involved are not ready to give up on human genes as the primary locus for disease susceptibility. In fact, the strong rescue bias and the fear of losing all that beautiful funding are colluding to generate some creative and complex hypotheses. I came upon one such hypothesis earlier today, following a link from Ed Yong, that British science journalist extraordinaire. But the hypothesis itself, though fascinating, is not what interested me most, no. Care to guess what did? You are correct if you said the p-value.
What grabbed me is the calculation that to identify interactions of significance, the adjusted alpha level has to be set at 10^-12; that's 0.000000000001! Why is this, and what does it mean in the context of our recent ruminations on the topic of the p-value? Well, oddly enough it ties in very nicely with Bayesian thinking and John Ioannidis' incendiary assertion of lies in research. How? Follow me.
The hallmark of identifying "significant" relationships (remember that the word "significant" merely means "worthy of noting") in genetic research is searching for statistical associations between the exposure (in this case a mutation in the genetic locus) and the outcome (the specific disease in question). When we start this analysis, we have absolutely no idea which gene mutation or mutations are likely to be at play. This means that we have no way of establishing... can you guess what? Yes, that's right, the pre-test probability. We know nothing about the pre-test probability. This shotgun screening of thousands of genes is in essence a random search for matching pairs of mutation and disease. So, why should this matter?
The reason that it matters resides in the simple, albeit overused, coin toss analogy. When you toss a coin, what are your chances of getting heads on any given toss? The odds are 1:1 every time whether you will get heads or tails, meaning that the chance of getting heads is 50% (unless the coin is rigged, in which case we are talking about a Bayesian coin, not the focus of what we are getting at here). So, if you call heads prior to any given attempt, you will be correct half the time. Now, what does this have to do with the question at hand? Well, let's go back to our definition of the p-value: a p-value represents the probability of obtaining the result (in our gene case it is the association with disease) of the magnitude observed or greater under the conditions that no real association exists. So, our customary p-value threshold of 0.05 can be verbalized as follows: "There is a 5% probability that the association of the magnitude observed or greater could happen by chance if there is in reality no association". Now, what happens if we test 20 associations, none of which is real? In other words, what are the chances that at least one of them will come up "significant" at the p=0.05 level? Each individual test has a 1 in 20 (or 5 in 100) chance of hitting our level of significance preset at 0.05, so across 20 tests we should expect about one spurious "hit", and, if the tests are independent, the chance of getting at least one is about 64% (1 - 0.95^20). And it follows that the chances of getting a "significant" result grow with more testing.
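For those who like to see the arithmetic, here is a minimal sketch of how quickly chance "hits" pile up as the number of tests grows. It assumes that every tested association is truly null and that the tests are independent of one another; both are simplifying assumptions of mine, and the function names are made up for illustration.

```python
def prob_at_least_one_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of n independent tests of a true null
    hypothesis comes out "significant" at the given alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Average number of spurious "significant" results among n null tests."""
    return n_tests * alpha


if __name__ == "__main__":
    for n in (1, 20, 100, 1000):
        p_any = prob_at_least_one_false_positive(n)
        mean_hits = expected_false_positives(n)
        print(f"{n:>5} tests: P(at least one hit) = {p_any:.3f}, "
              f"expected hits = {mean_hits:.1f}")
```

With a single test the chance of a spurious hit is the familiar 5%; by the time we run a hundred tests, a spurious hit is all but guaranteed.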
This is the very reason that gene scientists have asked (and now answered) the question of what level of statistical significance (also called alpha, the threshold against which the p-value is judged) is acceptable when exploring hundreds of thousands of potential associations without any prior clue of what to expect. And that level is a shocking 0.000000000001!
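To get a feel for where a number that small might come from, here is a rough sketch using the Bonferroni correction, which simply divides the overall alpha by the number of tests. To be clear, the story I read does not say how its threshold was derived, so both the choice of Bonferroni and the figure of 300,000 genetic markers below are my own assumptions, used only for illustration.

```python
from math import comb


def bonferroni_alpha(n_tests: int, family_alpha: float = 0.05) -> float:
    """Per-test significance threshold that keeps the overall (family-wise)
    false positive rate at or below family_alpha."""
    return family_alpha / n_tests


if __name__ == "__main__":
    n_markers = 300_000              # hypothetical number of markers screened
    n_pairs = comb(n_markers, 2)     # every possible pairwise interaction

    print(f"single-marker tests : {n_markers:,}")
    print(f"  per-test alpha    : {bonferroni_alpha(n_markers):.1e}")
    print(f"pairwise interactions: {n_pairs:,}")
    print(f"  per-test alpha    : {bonferroni_alpha(n_pairs):.1e}")
```

Under these assumptions, testing every pair of markers means tens of billions of tests, and the per-test threshold lands on the order of 10^-12.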
This reinforces several ideas that we have been discussing of late. First, to say simply that the p-value of 0.05 was reached is to say nothing. The p-value needs to be put into the context of a) the prior probability of association and b) the number of tests of association performed. Second, as a corollary, the p-value threshold needs to be set according to these two parameters: the lower the prior probability and/or the higher the number of tests, the lower the p-value needs to be in order to be noteworthy. Third, if we pay attention only to the p-value, and particularly if we fail to understand how to set the threshold for significance, what we get is junk, and, as Ioannidis aptly points out, lies. Fourth, and finally, genetic data are tougher to interpret than most of us appreciate. They are possibly even less precise than the clinical data we usually discuss on this blog. And we know how uncertain that can be!
So, as sexy as the emerging field of genetics was in the 1980s, I am pretty happy that, after four years in the lab, I decided to go first to medical and then to epidemiology schools. Dealing with conditions where we at least have some notion of the pre-test probabilities makes this quantitative window through which I see healthcare just a little less opaque. And these days, I am even happier about having bucked my classmates' Wall Street trend. But that's a story for another day.
          
Wednesday, December 15, 2010
Why medical testing is never a simple decision
A couple of days ago, Archives of Internal Medicine published a case report online. Now, it is rather unusual for a high impact journal to publish even a case series, let alone a case report. Yet this was done in the vein of highlighting their theme of "less is more" in medicine. This motif was announced by Rita Redberg many months ago, when she solicited papers to shed light on the potential harms that we perpetrate in healthcare with errors of commission.
The case in question is one of a middle-aged woman presenting to the emergency room with vague symptoms of chest pain. Although from reading the paper it becomes clear that the pain is highly unlikely to represent heart disease, the doctors caring for the patient elect to do a non-invasive CT angiography test, just to "reassure" the patient, as the authors put it. Well, lo and behold, the test comes back positive, and the woman goes for an invasive cardiac catheterization, where, though no disease is found, she suffers a very rare but devastating tear of one of the arteries in her heart. As you can imagine, she gets very ill, requires bypass surgery and ultimately an urgent heart transplant. Yup, from healthy to a heart transplant patient in just a few weeks. Nice, huh?
The case illustrates the pitfalls of getting a seemingly innocuous test for what appears to be a humanistic reason -- patient reassurance. Yet, look at the tsunami of harm that followed this one decision. But what is done is done. The big question is, can cases like this be prevented in the future? And if so, how? I will submit to you that Bayesian approaches to testing can and should reduce such complications. Here is how.
First, what is Bayesian thinking? Bayesian thinking, formalized mathematically through Bayes' theorem, refers to taking the probability of disease being there into account when interpreting subsequent test results. What does this mean? Well, let us take the much embattled example of mammography and put some numbers to the probabilities. Let us assume that an otherwise healthy woman between 40 and 50 years of age has a 1% chance of having breast cancer (that is 1 out of every 100 such women, or 100 out of 10,000 undergoing screening). Now, let's say that a screening mammogram is able to pick up 80% of all cancers that are actually there (true positives), meaning that 20% go unnoticed by this technology. So, among the 100 women with actual breast cancer of the 10,000 women screened, 80 will be diagnosed as having cancer, while 20 will be missed. OK so far? Let's go on. Let us also assume that, in a certain fraction of the screenings, mammography will merely imagine that a cancer is present, when in fact there is no cancer. Let us say that this happens about 10% of the time. So, going back to the 10,000 women we are screening, of 9,900 who do NOT have cancer (remember that only 100 can have a true cancer), 10%, or 990 individuals, will still be diagnosed as having cancer. So, tallying up all of the positive mammograms, we are now faced with 1,070 women diagnosed with breast cancer. But of course, of these women only 80 actually have the cancer, so what's the deal? Well, we have arrived at the very important idea of the value of a positive test: this roughly tells us how sure we should be that a positive test actually means that the disease is present. It is a simple ratio of the real positives (true positives, in our case the 80 women with true cancer) to all of the positives obtained with the test (in our case 1,070). This is called the positive predictive value of a test, and in our mammography example for women between ages of 40 and 50 it turns out to be 7.5%.
So, what this means is that over 90% of the positive mammograms in this population will turn out to be false positives.
Now, let us look at the flip side of this equation, or the value of a negative test. Of the 8,930 negative mammograms, only 20 will be false negatives (remember that in our case mammography will only pick up 80 out of 100 true cancers). This means that the other 8,910 negative results are true negatives, making the value of a negative test, or negative predictive value, 8,910/8,930 = 99.8%, or just fantastic! So, if the test is negative, we can be pretty darn sure that there is no cancer. However, if the test is positive, while cancer is present in 80 women, 990 others will undergo unnecessary further testing. And for every subsequent test a similar calculus applies, since all tests are fallible.
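The tally for this group of 10,000 women can be reproduced in a few lines. This is only a sketch of the bookkeeping: the 1% prevalence, 80% pick-up rate and 10% false positive rate are the assumed numbers from the example, not measured properties of mammography, and the function name is my own invention.

```python
def screening_counts(population, prevalence, sensitivity, false_positive_rate):
    """Split a screened population into true/false positives and negatives
    and derive the predictive values from those counts."""
    diseased = population * prevalence
    healthy = population - diseased

    true_pos = diseased * sensitivity            # cancers the test picks up
    false_neg = diseased - true_pos              # cancers the test misses
    false_pos = healthy * false_positive_rate    # healthy women flagged anyway
    true_neg = healthy - false_pos               # healthy women correctly cleared

    return {
        "true_pos": true_pos,
        "false_neg": false_neg,
        "false_pos": false_pos,
        "true_neg": true_neg,
        "ppv": true_pos / (true_pos + false_pos),
        "npv": true_neg / (true_neg + false_neg),
    }


if __name__ == "__main__":
    result = screening_counts(10_000, 0.01, 0.80, 0.10)
    for name, value in result.items():
        if name in ("ppv", "npv"):
            print(f"{name}: {value:.1%}")
        else:
            print(f"{name}: {value:.0f}")
```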
Let's do one more maneuver. Let's say that now we have a population of 10,000 women who have a 10% chance of having breast cancer (as is the case with an older population). The sensitivity and specificity of mammography do not change, yet the positive and negative predictive values do. So, among these 10,000 women, 1,000 are expected to have cancer, of which 800 will be picked up on mammography. Among the 9,000 without cancer, a mammogram will "find" a cancer in 900. So, the total positive mammograms add up to 1,700, of which nearly 50% are true positives (800/1,700 = 47.1%). Interestingly, the negative predictive value does not change a whole lot (8,100/[8,100 + 200] = 97.6%, or still quite acceptably high). So, while among younger women at a lower risk for breast cancer a positive mammogram indicates the presence of disease in only about 8% of the cases, for older women it is about 50% correct.
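The same comparison can be written directly in terms of probabilities, which is just Bayes' theorem applied to a test result. In this sketch the sensitivity (80%) and specificity (90%) are held fixed at the assumed values from the example, and only the pre-test probability is varied; the prevalences other than 1% and 10% are extra illustrative values of my own.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(disease | positive test), by Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def negative_predictive_value(prevalence, sensitivity, specificity):
    """P(no disease | negative test), by Bayes' theorem."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


if __name__ == "__main__":
    sensitivity, specificity = 0.80, 0.90   # assumed, as in the example
    for prevalence in (0.001, 0.01, 0.10, 0.30):
        ppv = positive_predictive_value(prevalence, sensitivity, specificity)
        npv = negative_predictive_value(prevalence, sensitivity, specificity)
        print(f"pre-test probability {prevalence:6.1%}: "
              f"PPV = {ppv:5.1%}, NPV = {npv:5.1%}")
```

The test itself never changes from one row to the next; only the population it is applied to does.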
These two examples illustrate how exquisitely sensitive an interpretation of any test result is to the pre-test probability that a patient has the disease. Applying this to the woman in the case report in the Archives, some back-of-the-napkin calculations based on the numbers in the report suggest that, while a negative CT angiogram would indeed have been reassuring, a positive one would only create confusion, as it, in fact, did.
To be sure, if we had a perfect test, or one that picked up disease 100% of the time when it was present and did not mislabel people without the disease as having it, we would not need to apply this type of Bayesian accounting. However, to the best of my knowledge, no such test exists in today's clinical practice. Therefore, engaging in explicit calculations of what results can be expected in a particular patient from a particular test before ordering such a test can save a lot of headaches, and perhaps even lives. In fact, I do hope that the developers of our new electronic medical environments are giving this serious thought, as these simple algorithms should be built into all decision support systems. Bayes' theorem is an idea whose time has surely come.
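As a sketch of what such a built-in calculation might look like, here is the odds form of Bayes' theorem, which converts a pre-test probability into a post-test probability using a likelihood ratio. This is a standard textbook formulation rather than anything taken from the case report, the function names are hypothetical, and the test characteristics are the assumed mammography numbers from above, not those of CT angiography.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-): how much a positive or a negative result
    multiplies the odds of disease."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert probability to odds, apply the likelihood ratio,
    and convert back to probability."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


if __name__ == "__main__":
    lr_pos, lr_neg = likelihood_ratios(sensitivity=0.80, specificity=0.90)
    for pre_test in (0.01, 0.10, 0.50):
        after_pos = post_test_probability(pre_test, lr_pos)
        after_neg = post_test_probability(pre_test, lr_neg)
        print(f"pre-test {pre_test:4.0%}: after a positive result {after_pos:5.1%}, "
              f"after a negative result {after_neg:5.1%}")
```

Running something like this before ordering a test shows up front whether either possible result would actually change what we believe about the patient.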
Monday, December 13, 2010
Can a "negative" p-value obscure a positive finding?
I am still on my p-value kick, brilliantly fueled by Dr. Steve Goodman's correspondence with me and another paper by him aptly named "A Dirty Dozen: Twelve P-Value Misconceptions". It is definitely worth a read in toto, as I will only focus on some of its more salient parts.
Perhaps the most important point that I have gleaned from my p-value quest is that the word "significance" should be taken quite literally. Here is what Merriam-Webster dictionary says about it:
 It is the very meaning in point 2a that the word "significance" was meant to convey in reference to statistical testing. That is "worth noting" or "noteworthy". Nowhere do we find any references to God or truth or dogma. So, the first lesson is to drift away from taking statistical significance as the sign from God that we have discovered the absolute truth, and to focus on the fact that  we need to make note of the association. The follow up to the noting action is confirmation or refutation. That is, once identified, this relationship needs to be tested again (and sometimes again and again) before we can say that there may be something to it.
As an aside, how many times have you received comments from peer reviewers saying that what you are showing has already been shown? Yet, in all of our Discussion section we are quite cautious to say that "more research is needed" to confirm what we have seen. So, it seems we -- researchers, editors and reviewers -- just need to get on the same page.
To go on, as the title of the paper states, we are initiated into the 12 common misconceptions about what p-value is not. Here is the table that enumerates all 12 (since the paper is easy to find for no fee, I am assuming that reproducing the table with attribution is not a problem):
    
Even though some of them seem quite similar, it is worth understanding the degrees of difference, as they provide important insights.
The one I wanted to touch upon further today is Misconception #12, as it dovetails with our prior discussion vis-a-vis environmental risks. But before we do this, it is worth defining the elusive meaning of the p-value once again: "The p-value signifies the probability of obtaining the association (or difference) of the magnitude obtained or one of greater magnitude when in reality there is no association (or difference)". So, let's apply this to an everyday example of smoking and lung cancer risk. Let's say a study shows a 2-fold increase in lung cancer among smokers compared to non-smokers, and the p-value for this association is 0.06. What this really means is that "under conditions of no true association between smoking and lung cancer, there is a 6% or less chance that a study would find a 2-fold or greater increase in cancer associated with smoking". Make sense? Yet, according to the "rules" of statistical significance, we would call this study negative. But is this a true negative? (To the reader of this blog this is obvious, but assure you that, given how cursory our reading of the literature tends to be, and how often I hear my peers discount findings with the careless "But the p-value was not significant", this is a point worth harping on).
The bottom line answer to this is found in the discussion of the bottom line misconception in the Table: "A scientific conclusion or treatment policy should be based on whether or not the p-value is significant". I would like to quote directly from Goodman's paper here, as it really drives home the idiocy of this idea:
Taken together, all these points merely confirm my prior assertion that we need to be a lot more cautious about calling results negative when deciding about potentially risky exposures than about beneficial ones. Similarly, we need to set a much higher bar for all threats to validity in studies designed to look at risky rather than beneficial outcomes (more on this in a future post). These are the principles we should be employing when evaluating environmental exposures. This becomes particularly critical in view of the startling revelations of the genome-wide association experiments findings that our genes determine a very small minority of diseases to which we are subject. This means that the ante has been upped dramatically for environmental exposures as culprits, and demands a much more serious push for the precautionary principle as the foundation for our environmental policy.
Perhaps the most important point that I have gleaned from my p-value quest is that the word "significance" should be taken quite literally. Here is what Merriam-Webster dictionary says about it:
sig·nif·i·cance
noun \sig-ˈni-fi-kən(t)s\
Definition of SIGNIFICANCE
1
a : something that is conveyed as a meaning often obscurely or indirectly
b : the quality of conveying or implying
2
a : the quality of being important : moment
b : the quality of being statistically significant
As an aside, how many times have you received comments from peer reviewers saying that what you are showing has already been shown? Yet, in all of our Discussion section we are quite cautious to say that "more research is needed" to confirm what we have seen. So, it seems we -- researchers, editors and reviewers -- just need to get on the same page.
To go on, as the title of the paper states, we are initiated into the 12 common misconceptions about what p-value is not. Here is the table that enumerates all 12 (since the paper is easy to find for no fee, I am assuming that reproducing the table with attribution is not a problem):
Even though some of them seem quite similar, it is worth understanding the degrees of difference, as they provide important insights.
The one I wanted to touch upon further today is Misconception #12, as it dovetails with our prior discussion vis-a-vis environmental risks. But before we do this, it is worth defining the elusive meaning of the p-value once again: "The p-value signifies the probability of obtaining the association (or difference) of the magnitude obtained or one of greater magnitude when in reality there is no association (or difference)". So, let's apply this to an everyday example of smoking and lung cancer risk. Let's say a study shows a 2-fold increase in lung cancer among smokers compared to non-smokers, and the p-value for this association is 0.06. What this really means is that "under conditions of no true association between smoking and lung cancer, there is a 6% or less chance that a study would find a 2-fold or greater increase in cancer associated with smoking". Make sense? Yet, according to the "rules" of statistical significance, we would call this study negative. But is this a true negative? (To the reader of this blog this is obvious, but assure you that, given how cursory our reading of the literature tends to be, and how often I hear my peers discount findings with the careless "But the p-value was not significant", this is a point worth harping on).
The bottom line answer to this is found in the discussion of the bottom line misconception in the Table: "A scientific conclusion or treatment policy should be based on whether or not the p-value is significant". I would like to quote directly from Goodman's paper here, as it really drives home the idiocy of this idea:
This misconception encompasses all of the others. It is equivalent to saying that the magnitude of effect is not relevant, that only evidence relevant to a scientific conclusion is in the experiment at hand, and that both beliefs and actions flow directly from the statistical results. The evidence from a given study needs to be combined with that from prior work to generate a conclusion. In some instances, a scientifically defensible conclusion might be that the null hypothesis is still probably true even after a significant result, and in other instances, a nonsignificant P value might still lead to a conclusion that a treatment works. This can be done formally only through Bayesian approaches. To justify actions, we must incorporate the seriousness of errors flowing from the actions together with the chance that the conclusions are wrong.When the author advocates Bayesian approaches, he is referring to the idea that a positive result in the setting of a low pre-test probability still has a very low chance of describing a truly positive association. This is better illustrated by the Bayes theorem, which allows us to quantify the result at hand ("posterior probability") to what the bulk of prior evidence and/or thought has indicated about the association ("prior probability"). This implies that the lower our prior probability, the less convinced we can be by a single positive result. As a corollary, the higher our prior probability for an association, the less credence we can put in a single negative result. So, Bayesian approach to evidence, as Goodman indicates here, can merely move us in the direction of either a greater or a lesser doubt about our results, NOT bring us to the truth or falsity.
Taken together, all these points merely confirm my prior assertion that we need to be a lot more cautious about calling results negative when deciding about potentially risky exposures than about beneficial ones. Similarly, we need to set a much higher bar for all threats to validity in studies designed to look at risky rather than beneficial outcomes (more on this in a future post). These are the principles we should be employing when evaluating environmental exposures. This becomes particularly critical in view of the startling revelations from genome-wide association experiments that our genes determine a very small minority of the diseases to which we are subject. This means that the ante has been upped dramatically for environmental exposures as culprits, and demands a much more serious push for the precautionary principle as the foundation for our environmental policy.
Thursday, December 9, 2010
1,000 lives per day or 45 lives every hour
In the wake of the recent studies confirming our suspicions that we are no better off today than a decade ago as far as the safety of our healthcare system is concerned, I have been doing a lot of thinking and writing about this issue. The other day I blogged about the fact that there are no simple solutions, yet we must pursue change. Today, this e-mail from 350.org really stopped me in my tracks:
Dear friends,
Climate negotiations can seem quite abstract sometimes.
I'm here in Cancún, Mexico, where UN delegates from around the world spend hours debating details of complex regulations. Sometimes it seems that everyone has forgotten a crucial fact: the climate is changing much faster than these negotiations are moving.
Meanwhile, out in the real world, climate impacts are all too visible. Since the negotiations began 10 days ago, climate disasters have struck all over the world: flooding in Australia, Venezuela, the Balkans, Colombia, India; wildfires in Israel, Lebanon, Tibet; freak winter storms in Europe and the United States. These events have been devastating--hundreds are dead, and hundreds of thousands have been affected.
To put it in the context of our healthcare system, the unnecessary mortalities and morbidities are happening faster than our quality improvements are moving! In other words, if there are approximately 400,000 avoidable deaths annually attributable to healthcare encounters, this means that every day we delay implementing a viable solution we lose over 1,000 lives, or about 45 lives every hour, or 1 life roughly every 1 and 1/3 minutes! In the time that it took me to write this post, 20 patients have lost their lives unnecessarily. Are any of them your loved ones?
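For anyone who wants to check the healthcare arithmetic in this post, here is a small sketch in Python. The only input is the figure of approximately 400,000 avoidable deaths per year cited here; the function name and the assumption of a 365-day year are mine, for illustration only.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year


def toll(deaths_per_year):
    """Break an annual death count into per-day, per-hour and per-death rates."""
    per_day = deaths_per_year / 365
    per_hour = per_day / 24
    seconds_between = SECONDS_PER_YEAR / deaths_per_year
    return per_day, per_hour, seconds_between


per_day, per_hour, seconds_between = toll(400_000)
print(f"{per_day:,.0f} deaths per day")
print(f"{per_hour:.1f} deaths per hour")
print(f"one death every {seconds_between:.0f} seconds")
# How long it takes for 20 deaths to accumulate at this rate, in minutes.
print(f"20 deaths in about {20 * seconds_between / 60:.0f} minutes")
```

The per-day and per-hour results are the numbers in the title of this post; the interval between deaths comes out somewhat under a minute and a half.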
All these lives come with stories, all these lives are loved by someone, and all these lives cannot just be written off as sacrificial lambs in the name of a growing bureaucracy that cannot move the meter. We can wring our collective hands and say that we wish we knew how to stop this gushing bleed. Yet, we continue to conduct business as usual, increasing revenues and testing and interventions and cognitive loads and questionable evidence. Ultimately, should eleven years of doing the same thing and getting the same woefully inadequate result encourage us to continue in the same direction, or should we just come to a full stop for a moment?
I realize that medicine cannot stop -- illness will not stop. But the lifestyle that feeds the gluttonous homicidal machine of healthcare can be altered. A combination of prevention, reduction of interventions of questionable effectiveness and safety, more time for doctors to think about their patients and make decisions together -- this is the path. It is not easy, but neither is losing a partner, a brother or a child to the very idol at whose altar we have come to worship and atone for all of our individual and societal bad choices. Today is the day. Who is with me?
Wednesday, December 8, 2010
Getting beyond the p-value
Update 12/8/10, 9:30 AM: I just got an e-mail from Steve Goodman, MD, MHS, PhD, from Johns Hopkins about this post. Firstly, my apologies for getting his role at the Annals wrong -- he is the Senior Statistical Editor for the journal, not merely a statistical reviewer. I am happy to report that he added more fuel to the p-value fire, and you are likely to see more posts on this (you are overjoyed, right?). So, thanks to Dr. Goodman for his input this morning!
Yesterday I blogged about our preference to avoid false positive associations at the expense of failing to detect some real associations. The p value conundrum, where the arbitrary statistical significance is set at <0.05, has bothered me for a long time. I finally got curious enough to search out the origins of the p value. Believe it or not, the information was not easy to find. I have at least 10 biostatistics or epidemiology textbooks on the shelves of my office -- not one of them went into the history of the p value threshold. But Professor Google came to my rescue, and here is what I discovered.
Using a carefully crafted search phrase, I found a discussion forum on WAME, World Association of Medical Editors, which I felt represented a credible source. Here I discovered a treasure trove of information and references to what I was looking for. Specifically, one poster referred to Steven Goodman's work, which I promptly looked up. And by the way, Steven Goodman, as it turns out, is the Senior Statistical Editor for the Annals of Internal Medicine and a member of WAME. So, I went to this gem in the journal Epidemiology from May 2001, called unpretentiously "Of P-values and Bayes: A Modest Proposal". I have to say that some of the discussion was so in the weeds that even I had to go back and reread it several times to understand what the good Dr. Goodman is talking about. But here are some of the more salient and accessible points.
The author begins by stating his mixed feelings about the p-value:
I am delighted to be invited to comment on the use of P-values, but at the same time, it depresses me. Why? So much brainpower, ink, and passion have been expended on this subject for so long, yet plus ça change, plus c'est la même chose - the more things change, the more they stay the same. The references on this topic encompass innumerable disciplines, going back almost to the moment that P-values were introduced (by R.A. Fisher in the 1920s). The introduction of hypothesis testing in 1933 precipitated more intense engagement, caused by the subsuming of Fisher's significance test into the hypothesis test machinery.1-9 The discussion has continued ever since. I have been foolish enough to think I could whistle into this hurricane and be heard. 10-12 But we (and I) still use P-values. And when a journal like Epidemiology takes a principled stand against them, 13 epidemiologists who may recognize the limitations of P-values still feel as if they are being forced to walk on one leg. 14
So, here we learn that the p-value is something that has been around for 90 years and was brought into being by the father of frequentist statistics R.A. Fisher. And the users are ambivalent about it, to say the least. So, why, Goodman asks, continue to debate the value of the p-value (or its lack)? And here is the reason: publications.
Let me begin with an observation. When epidemiologists informally communicate their results (in talks, meeting presentations, or policy discussions), the balance between biology, methodology, data, and context is often appropriate. There is an emphasis on presenting a coherent epidemiologic or pathophysiologic story, with comparatively little talk of statistical rejection or other related tomfoolery. But this same sensibility is often not reflected in published papers. Here, the structure of presentation is more rigid, and statistical summaries seem to have more power. Within these confines, the narrative flow becomes secondary to the distillation of complex data, and inferences seem to flow from the data almost automatically. It is this automaticity of inference that is most distressing, and for which the elimination of P-values has been attempted as a curative.
This is clearly a condemnation of the way we publish: it demands a reductionist approach to the lowest common denominator, in this case the p-value. Much like our modern medical paradigm, the p-value does not get at the real issues:
I and others have discussed the connections between statistics and scientific philosophy elsewhere, 11,12,15-22 so I will cut to the chase here. The root cause of our problem is a philosophy of scientific inference that is supported by the statistical methodology in dominant use. This philosophy might best be described as a form of naïve inductivism,23 a belief that all scientists seeing the same data should come to the same conclusions. By implication, anyone who draws a different conclusion must be doing so for nonscientific reasons. It takes as given the statistical models we impose on data, and treats the estimated parameters of such models as direct mirrors of reality rather than as highly filtered and potentially distorted views. It is a belief that scientific reasoning requires little more than statistical model fitting, or in our case, reporting odds ratios, P-values and the like, to arrive at the truth. [emphasis mine]
Here is a sacred scientific cow that is getting tipped! You mean science is not absolute? Well, no, it is not, as the readers of this blog are amply aware. Science at best represents a model of our current understanding of the Universe, it builds upon itself usually in one direction, and it rarely gives an asymptotic approximation of what is really going on. Merely our current understanding of reality, given the tools we have at our disposal. Goodman continues to drive home the naïveté of our inductivist thinking in the following paragraph:
How is this philosophy manifest in research reports? One merely has to look at their organization. Traditionally, the findings of a paper are stated at the beginning of the discussion section. It is as if the finding is something derived directly from the results section. Reasoning and external facts come afterward, if at all. That is, in essence, naïve inductivism. This view of the scientific enterprise is aided and abetted by the P-value in a variety of ways, some obvious, some subtle. The obvious way is in its role in the reject/accept hypothesis test machinery. The more subtle way is in the fact that the P-value is a probability - something absolute, with nothing external needed for its interpretation.
In fact the point is that the p-value is exactly NOT absolute. The p-value needs to be judged relative to some other standard of probability, for example the prior probability of an event. And yet what do we do? We worship at the altar of the p-value without giving any thought to its meaning. And this is certainly convenient for those who want to invoke evidence of absence of certain associations, such as toxic exposures and health effects, for example, when the reality simply indicates absence of evidence.
The point is that we need to get beyond the p-value and develop a more sophisticated, nuanced and critical attitude toward data. Furthermore, regulatory bodies need to get a more nuanced way of communicating scientific data, particularly data evidencing harm, in order not to lose credibility with the people. Most importantly, however, we need to do a better job training researchers on the subtleties of statistical analyses, so that the p-value does not become the ultimate arbiter of the truth.
Tuesday, December 7, 2010
Of P values, power, tobacco and cell phones
Like everything else political, science, as written in the tobacco playbook, is being used for divisive purposes. A new study from Denmark, as reported by WebMD, indicates that exposure to cell phones in utero may influence the child's behavior. The authors are appropriately tempered in their conclusions and subsequent recommendations, responsibly acknowledging the obvious limitations of their research. Yet, to paraphrase one of the authors, what is the harm in asking a pregnant woman to keep her cell phone away from her uterus? It is at most a slight inconvenience. This sentiment is echoed by Devra Davis, the author of Disconnect, a book summarizing what we know about human health effects of microwave radiation and cellular technology, in her comments to WebMD, wherein she plainly echoes the authors to say that the current study was not perfect. Yet, she urges precaution until we know more about the potential ill effects of cell phones, particularly on the most vulnerable in our population.
Now, I have blogged before about Devra Davis and about my sense that her messages of caution are less than popular even with those who consider themselves to be scientifically skeptical. Perhaps especially with those who consider themselves to be scientifically skeptical. It seems to me that we have gone a little bit overboard on this critical thinking idea, where we demand that data on harms reach the same level of evidentiary standards as data on salutary effects. This is a fallacy. It is a convenient fallacy, to be sure. In fact, what many people may not realize is that it is much more than an accidental byproduct of our skepticism. No, it is a strategy, as discussed in Davis's book The Secret History of the War on Cancer, deliberately developed and skillfully executed by the tobacco industry to cast doubt for decades on the harms of smoking. Look how well it has done -- 80 years after learning of the cancer risks associated with smoking and nearly 50 years after the Surgeon General's report on the dangers of smoking, we are still looking to mass screening for diseases caused by tobacco rather than mass cessation efforts to diminish its impact. This incredibly effective and durable strategy relies on scientific data generated by men with loud reputations funded by the tobacco companies to demand unequivocal causality between the exposure and the outcome. So, inherently, it demands bodies in the street to show unequivocal links.
Predictably, now the cell phone industry is using the same strategy. Here are a couple of quotes from the WebMD article, which I personally find less than amusing:
John Walls, vice president of public affairs at CTIA-The Wireless Association, a trade group representing the wireless industry, tells WebMD that his group “stands behind the research review by independent and renowned public health agencies around the world which states that there are no known adverse health effects associated with using wireless devices.”
Jeff Stier, a senior fellow at the National Center for Public Policy Research, a conservative think tank, says that the new study is full of holes. “For starters, self-reporting of cell phone use makes it impossible to assign any meaning to the exposure,” he says.
“Different phones give off different exposures, and even those who were reported to be not exposed, probably had significant environmental exposure, rendering the study only slightly more than amusing,” he tells WebMD in an email.
Surprise! The science is just not there! So, let's party, right?
Well, wrong. In epidemiology we set a very high bar for avoiding a false positive, and a much lower bar for a false negative. You have all heard of the much misunderstood p value. What this number represents is exactly the probability of chance giving us a result of the magnitude observed, or one of a greater magnitude, under the circumstances of no real association. By convention (whim?) we usually set the threshold for significance at <0.05, meaning that there is a <5% chance of obtaining the given result or one of greater magnitude if there is no actual association. This clearly stacks the deck against calling an association positive. In contradistinction, we are much more willing to obtain a false negative, as illustrated by the conventional power of a study, the probability of detecting a difference where a true difference exists, being set at only 80%. So, colloquially, we want to be 95% sure that a positive result is not a false positive association, yet only 80% sure that a negative result is not a false negative. You see the difference? We are clearly more keen to discard associations that do not really exist than to identify those that do. To an untrained eye the difference may be subtle, but it creates a world of philosophical difference in how we approach data.
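A quick sketch in Python shows what these two conventions imply in practice. The thresholds (alpha of 0.05 and power of 80%) are the conventional ones described above; the two batches of 1,000 studies each are purely hypothetical, chosen only to make the counts easy to read.

```python
def expected_errors(n_no_association, n_real_association, alpha=0.05, power=0.80):
    """Expected numbers of false positives and false negatives.

    n_no_association: studies of exposures with no real effect
    n_real_association: studies of exposures with a real effect
    """
    false_positives = alpha * n_no_association
    false_negatives = (1.0 - power) * n_real_association
    return false_positives, false_negatives


# Hypothetical scenario: 1,000 studies of each kind.
fp, fn = expected_errors(1000, 1000)
print(f"false alarms among studies with no real association: {fp:.0f}")
print(f"real associations missed: {fn:.0f}")
print(f"missed associations per false alarm: {fn / fp:.0f}")
```

With equal numbers of each kind of study, the conventions tolerate four missed real associations for every false alarm. That is the asymmetry in plain numbers.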
And it is exactly these, in some ways arbitrarily accepted, standards of our science that are corrupted to sell false certainty of the negative associations. I have written about my thoughts on proof of harm vs. benefit in prior posts. I firmly believe that our preference for being more certain that an association is not spurious, at the expense of being more willing to discard associations that are true, is what facilitates our national blindness to many potentially adverse health effects from stuff around us. I am convinced that causation of harm needs to be viewed with greater caution and therefore with greater scientific permissiveness. If we do not adopt this precautionary principle, used by the EU liberally in their environmental policies, we will continue to see even more piles of bodies in the streets -- just look at tobacco. This is especially true when all there is is a risk with no benefit (tobacco), or when a simple behavior change can eliminate any concern for an adverse outcome (cell phones).
Monday, December 6, 2010
"Invisibility, inertia and income" and patient safety
Hat tip to @KentBottles for a link to this story
I spend a lot of time thinking about the quality and safety of our healthcare system, as well as our efforts to improve it. I have written a lot about it here in this blog and in some of my peer-reviewed publications. You, my reader, have surely sensed my frustration with the fact that we have been unable to put any kind of a dent in the killing that goes on within our hospitals and other healthcare encounter locations. So, it is always with much interest and appreciation that I learn that I am not alone, and that others have had it with the criminal lack of the sense of urgency to stop this medical holocaust. For this reason, I was really happy to read Michael Millenson's post on the Health Affairs Blog titled "Why We Still Kill Patients: Invisibility, Inertia and Income". I was very curious to see how he structured his argument to boil it down to these three I's, since I think that sexy slogans and memorable triplets are the way to go. So, here is how his arguments went.
First, establish the problem. And indeed, we have been killing around 100,000 people annually since the late 1970s (and probably since before then, as you actually have to look in order to find), which amounts to the total 20-year toll of 2.5 million unnecessary deaths due to healthcare in the US. This is truly appalling. And this is just up through the 1999 IoM report! Here is what I was thinking: And if we take into account not just the killing fields of the hospital, but all of life's interfaces with healthcare, we arrive at an even more frightening 400,000 deaths annually, as known back in 2000. Multiply this by 10, and now we really are talking about a killing machine of holocaust proportions! And I completely agree with Millenson that the fact that we continue to say "more research needed" and other pablum like that is utterly and completely irresponsible. However, is this really an invisible problem? The author makes a good argument for how we minimize these numbers by failing to add them up:
I laid out those numbers in a March, 2003 Health Affairs article that challenged the profession to break a silence of deed — failing to take corrective actions — and a silence of word — failing to discuss openly the consequences of that failure. This pervasive silence, I wrote:
continually distorts the public policy debate [and] gives individuals and institutions that must undergo difficult changes a license to postpone them. Most seriously of all, it allows tens of thousands of preventable patient deaths and injuries to continue to accumulate while the industry only gradually starts to fix a problem that is both long-standing and urgent.
Nearly eight years later, medical professionals now talk freely about the existence of error and loudly about the need for combating it, but silence about the extent of professional inaction and its causes remains the norm. You can see it in this latest study, which decries the continuing “patient-safety epidemic” while failing to do next what any public health professional would instinctually do: tally up the toll. Instead, we get dry language about the IOM’s goal of a 50 percent error reduction over five years not being met.
Let’s fill in the blanks: If this unchecked “epidemic” were influenza and not iatrogenesis, then from 1999 to date it would have killed the equivalent of every man, woman and child in the cities of Raleigh (this study took place in North Carolina) and Washington, D.C. Does a disaster of that magnitude really suggest that “further study” and a “refocusing of resources” are what’s needed?
I guess this makes sense -- adding up the numbers is pretty startling, yet we are reluctant to do so. At the same time I hesitate to call this "invisible", since as you saw in a paragraph above, I just multiplied by 10! Yet I am willing to concede the first "I" to Millenson, since I do see the power in these startling numbers.
On to the next "I", inertia. I agree with Millenson generally, and we actually know this, that physicians do not consistently practice evidence-based medicine, and that, even when evidence does penetrate practice, it takes decades to do so. And there is every reason to be upset that the medical profession has not rushed to adopt the evidence-based prevention measures that Millenson talks about. But there is a greater subtlety here than meets the eye. True, the Keystone project is frequently held up as an example of a simple evidence-based bundled intervention resulting in a huge reduction in central line-associated blood stream infections. Indeed, this is a great success and everyone should be practicing the checklist instituted in the project by Peter Pronovost's group. What is less obvious and even less talked about is that the same evidence-based bundled approach to prevention of ventilator-associated pneumonia (VAP) has also been piloted by the Keystone group, yet none of us has seen any data from that. All I have is rumors at this point, but they are not good. Why is this? Well, I have discussed this before here and here: VAP is a very tricky diagnosis in a very tricky population. This is not to say that we need not work as hard as we can to prevent it. It is just to clarify that we are not sure of the best ways to accomplish this. Is this in and of itself shameful? Well, yes, if you think that medicine is a precise science. But if you have been reading my blog long enough, you know this is not the case.
Millenson further cites his reading of the Joint Commission Journal, which has been documenting the progress within one large Catholic healthcare system, Ascension, in its efforts to reduce infections, falls and other common iatrogenic harms. By the system's account, they are now able to save over 2,000 lives annually with these measures. This is impressive. But is it trustworthy? Unfortunately, without reading the primary studies I cannot comment on the latter. However, I did publish a review of studies from this very journal on VAP prevention efforts, and here is what I found:
A systematic approach to understanding this research revealed multiple shortcomings. First, since all of the papers reported positive results and none reported negative ones, there is a potential for publication bias. For example, a recent story in a non-peer-reviewed trade publication questioned the effectiveness of bundle implementation in a trauma ICU, where the VAP rate actually increased directionally from 10 cases per 1,000 MV days in the period before to 11.9 cases per 1,000 MV days in the period after implementation of the bundle (24). This was in contradistinction to the medical ICU in the same institution, which achieved a reduction from 7.8 to 2.0 cases per 1,000 MV days with the same intervention (24). Since the results did not appear in a peer-reviewed form, it is difficult to judge the quality or significance of these data; however, the report does highlight the need for further investigation, particularly focusing on groups at heightened risk for VAP, such as trauma and neurological critically ill (25).
Second, each of the four reported studies suffers from a great potential for selection bias, which was likely present in the way VAP was diagnosed. Since all of the studies were naturalistic and none was blinded, and since all of the participants were aware of the overarching purpose of the intervention, the diagnostic accuracy of VAP may have been different before as compared to after the intervention. This concern is heightened by the fact that only one study reports employing the same team approach to VAP identification in the two periods compared (23). In other studies, although all used the CDC-NNIS VAP definition, there was either no reporting of or heterogeneity in the personnel and methods of applying these definitions. Given the likely pressure to show measurable improvement to the management, it is possible that VAP classification suffered from a bias.
Third, although interventional in nature, naturalistic quality improvement studies can suffer from confounding much in the same way that observational epidemiologic studies do. Since none of the studies addressed issues related to case mix, seasonal variations, secular trends in VAP, and since in each of the studies adjunct measures were employed to prevent VAP, there is a strong possibility that some or all of these factors, if examined, would alter the strength of the association between the bundle intervention and VAP development. Additional components that may have played a role in the success of any intervention are the size and academic affiliation of the hospital. In a study of interventions aimed at reducing the risk of CRBSI, Pronovost et al. found that smaller institutions had a greater magnitude of success with the intervention than their larger counterparts (26). Similarly, in a study looking at an educational program to reduce the risk of VAP, investigators found that community hospital staff were less likely to complete the educational module than the staff at an academic institution; in turn, the rate of VAP was correlated with the completion of the educational program (27).
Finally, although two of the studies included in this review represent data from over 20 ICUs each (20, 22), the generalizability of the findings in each remains in question. For example, the study by Unahalekhaka and colleagues was performed in the institutions in Thailand, where patient mix and the systems of care for the critically ill may differ dramatically from those in the US and other countries in the developed world (22). On the other hand, while the study by Resar and coworkers represents a cross section of institutions within the US and Canada, no descriptions are given of the particular ICUs with respect to the structure and size of their institutions, patient mix or ICU care model (e.g., open vs. closed; intensivists present vs. intensivists absent, etc.) (20). This aggregate presentation of the results gives one little room to judge what settings may benefit most and least from the described interventions. The third study includes data from only two small ICUs in two community institutions in the US (21), while the remaining study represents a single ICU in a community hospital where ICU patients are not cared for by an intensivist (23). Since it is acknowledged that a dedicated intensivist model leads to improved ICU outcomes (28, 29), the latter study has limited usefulness to institutions that have a more rigorous ICU care model.
So, not to toot my own horn here, and not expecting you to read the long-winded Discussion, suffice it to say that we found enough methodologic errors in this body of research from the Joint Commission's own journal to potentially invalidate nearly all of the reported findings. My point is again to reiterate that unless you read each study with a critical eye and then put it into the larger context, do not believe someone else's cursory reference to the staggering improvements. I guess pertinent to our discussion, inertia, while present, is a more nuanced issue than we are led to believe.
And finally, income. I do agree that it is annoying that economic arguments are even necessary to promote a culture of prevention and safety. What I disagree with is that these economic fallacies of the C-suite impact in any way the implementation of the needed prevention systems. Most of the evidence-based preventions are pretty low tech. And although they do require teams and commitment and systems to implement broadly, small demonstrations at the level of individual clinicians are possible. Also, I shudder at the thought that a group of dedicated clinicians could not persuade a group of equally dedicated administrators to do the right thing, even at the risk of losing some revenue.
Bottom line? While I like Millenson's sexy little "three I's of safety", I think the solutions, as is always the case when you start looking under the hood, are more complicated and nuanced. In a recent post I cited 5 potential solutions to our quality problem, and I will repeat them here:
1. Empower clinicians to provide only care that is likely to produce a benefit that outweighs risks, be they physical or emotional.
2. Reward the signal and not the noise. I wrote about this here and here.
3. Reward clinicians with more time rather than money. Although I am not aware of any data to back up this hypothesis, my intuition is that slowing down the appointment may result not only in reduction of harm by cutting out unnecessary interventions, but also in overall lowering of healthcare expenditures. It is also sure to improve the crumbling therapeutic relationship.
4. We need to re-engineer our research enterprise for the most important stakeholder in healthcare: the clinician-patient dyad. We need to make the data that are currently manufactured and consumed for large scale policy decisions more friendly at the individual level. And as a corollary, we need to re-think how we help information diffuse into practice and adopt some of the methods of the social sciences.
5. Let's get back to the tried and true methods of public health, where an ounce of prevention continues to be worth a pound of cure. Yes, let's strive for reducing cancer mortality, but let us invest appropriately in stuffing that tobacco horse back into its barn -- getting people to stop smoking will reduce lung cancer mortality by 85% rather than 0.3%, and at a much lower cost with no complications or false positives. Same goes for our national nutrition and physical activity struggles. Our social policies must support these well-recognized and efficient population interventions.
No, they are not simple, they are not sexy, and most importantly they may be painful. Yet, what is the alternative? We must stop this massive bleeder before the American public starts thinking that the cure is worse than the disease.
"When evil walks into a room, it is not wearing horns and a tail"
I was listening to a podcast of Krista Tippett's On Being this past weekend while hiking. She was having a conversation with Darius Rejali, an Iranian-born American scholar who studies and writes about torture. His most memorable line was: "When evil walks into a room, it is not wearing horns and a tail". He said that one does not have to read Hannah Arendt to understand the banality of evil.
Yet for me, reading Eichmann in Jerusalem did produce the very epiphany evoked by the book's subtitle: A Report on the Banality of Evil. More than any other previous experience, Arendt's narrative wove for me a convincing tale demonstrating how easily one can slide into evil. The journey is incremental and, by virtue of this, insidious. The travel companions also matter -- they must be people whom one respects and emulates, and the logic becomes that, if they are acting in a certain way, it must be OK for one to act similarly. She illustrates these ideas by recounting the testimony of Eichmann, a petty clerk, a follower essentially, who ascended in the Nazi bureaucracy to the level of orchestrating a massive genocide. The story is one of almost witless and blind compliance with the higher-ups, all people commanding Eichmann's respect and admiration. The thesis is so spooky that it demands constant self-examination to avoid the barely noticeable descent into evil.
Similarly, Rejali described people who are hired to perpetrate torture not as psychopaths, passionate about hurting others, but more as methodical disciplinarians, careful not to inflict too much damage, lest the victim expire prior to giving the wanted information. So, essentially, they are just "doing their jobs". Just as marketers who sell us cigarettes are doing their jobs. I am sure that they do not perceive themselves as evil -- they have families, children, they have their faith, they play golf and eat Chinese food, just like the rest of us. They drive their children to soccer games and sit on school committees and boards of charitable organizations. Yet, what they do at work is follow orders and money and sell certain death.
And how much other evil gets perpetrated by regular people? What about climate change deniers? Because they do not have unequivocal proof that humans are catalyzing an environmental catastrophe, they feel justified in prioritizing the economy over the facts. They do not understand that observational science does not provide proof beyond a shadow of a doubt, and that we indeed need to heed the circumstantial evidence before us. Yet, since respected politicians deny any possibility of an impending man-made environmental disaster, citizen deniers feel justified in their willful ignorance.
Am I stretching this concept too much? I do not think so. Nicholas Christakis and colleagues have been showing the importance of networks to our health, and it is not a stretch to imagine that networks may be critical to the health of other common concerns. These include the environment, and may even include our vast and growing economic disparities, which are no longer merely causing illness and inconvenience, but are responsible for killing our children.
So, next time we cavalierly dismiss others' concerns about shared resources, next time we turn our heads away from socially and economically inconvenient realities, let us inspect methodically the possibility that this very banality of evil may be working to insinuate itself into who we are. Because an ounce of prevention is still worth a pound of cure.
Sunday, December 5, 2010
Top 5 this week
#5: Could our application of EBM be unethical?
#4: Evidence of harm
#3: Our nation's shocking Lady Macbeth moment
#2: Why are we still paying tobacco executives to kill...
And the #1 post this week is... Healthcare quality: 5 ways to stop the insanity
Thank you all for stopping by, commenting and broadening my thinking with your contributions!

