The study was a meta-analysis, where data from all clinical trials comparing Effexor to an SSRI were pooled together. The authors used remission on the Hamilton Rating Scale for Depression (HAM-D) as their measure of treatment effectiveness. On the HAM-D, a score of less than or equal to 7 was used to define remission. They found that remission rates on Effexor were 5.9% greater than remission rates on SSRIs. Thus, one would need to treat 17 depressed patients with Effexor rather than an SSRI to yield one remission that would not have occurred had all 17 patients received an SSRI. Not a big difference, you say? Here's what the authors said:
...the pooled effect size across all comparisons of venlafaxine versus SSRIs reflected an average difference in remission rates of 5.9%, which reflected a NNT of 17 (1/.059), that is, one would expect to treat approximately 17 patients with venlafaxine to see one more success than if all had been treated with another SSRI. Although this difference was reliable and would be important if applied to populations of depressed patients, it is also true that it is modest and might not be noticed by busy clinicians in everyday practice. Nonetheless, an NNT of 17 may be of public health relevance given the large number of patients treated for depression and the significant burden of illness associated with this disorder. [my emphasis]
Public Health Relevance/Remission: The public health claim is pretty far over the top. If one had to treat 17 patients with Effexor to prevent a suicide or homicide that would have occurred had SSRIs been used, then yes, we'd be talking about a significant impact on public health. But that's not what we're dealing with in this study. The outcome variable was remission on the HAM-D, which is a soft, squishy measure of convenience. The authors state that remission rates are "the most rigorous measure of antidepressant efficacy," but to my knowledge there is no evidence supporting their adoption of the magic cutoff score of 7 on the HAM-D as the definition for depressed/not depressed. Are people who scored 8 or 9 on the HAM-D really significantly more depressed than people who scored 6 or 7? Take a look at the HAM-D yourself and make your own decision. I know of not a single piece of empirical data stating that such small differences are meaningful. So I'm not buying the public health benefit -- in fact, I think it is patently ridiculous.
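For readers who want to check the NNT figure, here is a minimal Python sketch of the arithmetic. The function name is my own, and rounding up to the next whole patient is an assumed convention; the only number taken from the study is the 5.9% difference in remission rates.

```python
import math


def number_needed_to_treat(risk_difference: float) -> int:
    """Return 1 / risk_difference, rounded up to a whole patient.

    risk_difference is the absolute difference in remission rates,
    expressed as a proportion (5.9% -> 0.059). Rounding up is an
    assumed convention, not something stated in the article.
    """
    if risk_difference <= 0:
        raise ValueError("risk difference must be positive")
    return math.ceil(1 / risk_difference)


# Pooled Effexor-vs-SSRI difference reported in the meta-analysis.
pooled_nnt = number_needed_to_treat(0.059)
print(pooled_nnt)
```

Note what the calculation leaves out: it says nothing about whether the endpoint behind the 5.9% is clinically meaningful, which is the real issue here.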
Outcome measures can be either categorical (e.g., remission or no remission) or continuous (e.g., change on HAM-D scores from pretest to posttest). Joanna Moncrieff and Irving Kirsch discuss how using cut-off scores (categorical measures) can make a treatment appear much more effective than an examination of mean change (continuous measures) would suggest. Applied to this case, one wonders why the data on mean improvement were not provided. One can make a very weak case that Effexor works better than SSRIs based on an arbitrary categorical measure, but not one shred of data was presented to show superiority on a continuous measure. If the data supported Effexor on both categorical and continuous measures, then I'd bet they would have been discussed in this article, as it was funded by Wyeth (patent holder for Effexor). Thus, the absence of data on continuous measures (e.g., difference in mean improvement on the HAM-D between Effexor-treated patients and SSRI-treated patients) is suspicious.
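To illustrate the general point about cut-off scores, here is a rough Python sketch. Every number in it is hypothetical: the end-of-treatment means (12 and 11), the standard deviation (8), and the assumption that scores are normally distributed are all invented for illustration and are not taken from the study. Only the remission cutoff of 7 comes from the article.

```python
from statistics import NormalDist

REMISSION_CUTOFF = 7  # HAM-D score <= 7 counts as remission (from the article)

# Hypothetical end-of-treatment HAM-D distributions. These values are
# made up for illustration and do not come from the meta-analysis.
ssri_scores = NormalDist(mu=12.0, sigma=8.0)
effexor_scores = NormalDist(mu=11.0, sigma=8.0)

# Continuous view: the difference in mean final score.
mean_difference = ssri_scores.mean - effexor_scores.mean

# Categorical view: the share of each group at or below the cutoff.
ssri_remission = ssri_scores.cdf(REMISSION_CUTOFF)
effexor_remission = effexor_scores.cdf(REMISSION_CUTOFF)
remission_difference = effexor_remission - ssri_remission

print(f"Mean difference: {mean_difference:.1f} HAM-D points")
print(f"Remission rates: {ssri_remission:.1%} vs {effexor_remission:.1%}")
print(f"Remission difference: {remission_difference:.1%}")
```

Under these made-up assumptions, a one-point difference in average HAM-D score turns into a gap of a few percentage points in "remission." Whether anything like this happened in the actual data is unknown, which is exactly why the continuous results should have been reported.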
Even if the authors decided to use only categorical measures, it would have been nice had they opted to use multiple measures. They could have used the equally arbitrary 50% improvement criterion (HAM-D scores drop by 50% during treatment), for example. However, such data were not provided. So the authors decided to use one highly arbitrary measure, on which they found a very small benefit for venlafaxine over SSRIs. Whoopee.
I received an email from a respected psychiatrist (who shall remain anonymous) about this study. He/she opined:
...it would have been interesting if the authors had used other cutoffs for the Hamilton scale besides 7 to define remission; i.e., if they had done a sensitivity analysis. Apparently, has all the raw data from the studies, so a lot of interesting science could be done with this very large aggregate database. For example, there are robust factor analyses of the Hamilton scale that indicate reasonably independent dimensions of depressed mood, agitation/anxiety, diurnal variation, etc., and it would be of great interest to determine the relative effects of the various drugs on these different illness dimensions.
In other words, the authors could have attempted to see if there were meaningful differences between Effexor and SSRIs on important variables, yet they opted to not undertake such analysis. A skeptical view is that they analyzed the data in such a fashion, found nothing, and thus just reported the "good news" about Effexor. I don't know if they conducted additional analyses that were not reported. However, it would seem to me that someone at Wyeth would have run such analyses at some point, perhaps as part of this meta-analysis, because any advantage over SSRIs would make for excellent marketing copy. In fact, Effexor has been running the "better than SSRIs" line for years, based on rather scant data. If there were more impressive data, they would have been reported by now.
Prozac and the Rest: The findings showed that Effexor was only superior to a statistically significant degree (i.e., we'd not expect such differences by chance alone) when compared to Prozac (fluoxetine). The authors, to their credit, pointed this out on multiple occasions. However, their reporting seems a little contradictory when, on one hand, they report that venlafaxine was superior to SSRIs as a class (see quote toward the top of the post), but then note that the differences were only statistically significant when compared to Prozac. The percentage difference in remission favoring Effexor over Zoloft (sertraline) was 3.4%, over Paxil (paroxetine) 4.6%, over Celexa (citalopram) 3.9%, and over Luvox (fluvoxamine) 14.1%. I think just about anyone would concur that the difference versus fluvoxamine seems too high to be credible, and it was based on only one study, making the fluke factor more tenable. Again, the advantage of Effexor over all SSRIs except Prozac was not statistically significant. Even if these differences were statistically significant, would the authors claim that needing to treat 26 patients with Effexor rather than Celexa to achieve one additional depression remission would improve public health? Small differences on a soft, squishy, arbitrary endpoint combined with not performing (or not reporting) more meaningful analyses = Not news.
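The same 1/difference arithmetic can be applied to each SSRI comparison. This is a small Python sketch: the percentages are the ones quoted above, while the dictionary and function names are my own and rounding up to a whole patient is an assumed convention. Keep in mind that none of these four differences was statistically significant, so the resulting NNTs are illustrative only.

```python
import math

# Remission-rate differences favoring Effexor, as quoted in the post.
# None of these four comparisons reached statistical significance.
remission_differences = {
    "Zoloft (sertraline)": 0.034,
    "Paxil (paroxetine)": 0.046,
    "Celexa (citalopram)": 0.039,
    "Luvox (fluvoxamine)": 0.141,  # based on a single study
}


def nnt(risk_difference: float) -> int:
    """Number needed to treat, rounded up to a whole patient (assumed convention)."""
    return math.ceil(1 / risk_difference)


nnt_by_drug = {drug: nnt(diff) for drug, diff in remission_differences.items()}

for drug, patients in nnt_by_drug.items():
    print(f"{drug}: treat {patients} patients for one extra remission")
```

The Celexa row reproduces the figure of 26 used above.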
The Editor Piles On: In a press release, the editor of the journal in which this article appears jumped on board in a big way:
Acknowledging the seemingly small advantage, John H. Krystal, M.D., Editor of Biological Psychiatry and affiliated with both Yale University School of Medicine and the VA Connecticut Healthcare System, comments that this article “highlights an advance that may have more importance for public health than for individual doctors and patients.” He explains this reasoning:
"If the average doctor was actively treating 200 symptomatic depressed patients and switched all of them to venlafaxine from SSRI, only 12 patients would be predicted to benefit from the switch. This signal of benefit might be very hard for that doctor to detect. But imagine that the entire population of depressed patients in the United States, estimated to be 7.1% of the population or over 21 million people, received a treatment that was 5.9% more effective, then it is conceivable that more than 1 million people would respond to venlafaxine who would not have responded to an SSRI. This may be an example of where optimal use of existing medications may improve public health even when it might not make much difference for individual doctors and patients."
Seeing a journal editor swallow the Kool-Aid is not encouraging. Again, the 5.9% difference is based on an endpoint that may well mean nothing.
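The editor's numbers are easy to reproduce. Here is a quick Python sketch that scales the 5.9% difference to a 200-patient practice and to the 21 million figure from the press release. The variable names are my own. The calculation simply takes the 5.9% at face value, which is the very assumption in dispute.

```python
# Figures taken from the press release quote; variable names are my own.
remission_difference = 0.059   # pooled Effexor-vs-SSRI difference
practice_size = 200            # patients treated by "the average doctor"
us_depressed = 21_000_000      # "over 21 million people"

# Extra remissions predicted if everyone were switched to venlafaxine,
# taking the 5.9% difference at face value.
extra_in_practice = practice_size * remission_difference
extra_nationwide = us_depressed * remission_difference

print(f"Per practice: about {round(extra_in_practice)} patients")
print(f"Nationwide: about {extra_nationwide:,.0f} patients")
```

So the arithmetic holds up: about 12 patients per practice and a bit over 1.2 million nationwide. The multiplication was never the problem; the input is.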
Ghostwriter Watch: Who wrote the study and who conducted the analyses? The authors are listed as Charles Nemeroff, Richard Entsuah, Isma Benattia, Mark Demitrack, Diane Sloan, and Michael Thase. Their respective contributions are not listed in the text of the article. The contribution of Wilfrido Ortega-Leon for assistance with statistical analysis is acknowledged in the article, as are the contributions of Sherri Jones and Lorraine Sweeney of Advogent for "editorial assistance."
Ortega-Leon appears to be an employee of Wyeth. So did an employee of Wyeth run all of the stats, then pass them along to the authors for writeup? Last time I checked, there were sometimes problems associated with having a company-funded statistician run the stats then pass them along without any independent oversight. I don't know what happened, but my questions could have been easily resolved: Describe each author's contributions in a note at the end of the article.
Sherri Jones and Lorraine Sweeney have served in an "editorial assistant" role for other studies promoting Effexor, such as this one. I suspect that they are familiar with the key marketing messages for the drug. An important question: What does "editorial assistance" mean? Did Jones and Sweeney simply spell-check the paper and make sure the figures looked pretty? Did they consult the authors to get the main points, then fill in a few gaps? Or did they write the whole paper then watch the purported authors rubber-stamp their names on the author byline? Simply listing "editorial assistance" is not transparency. I have no problem with medical writers helping with a manuscript, depending on what "helping" means. Many researchers are not skilled writers and cleaning up their writing is a good idea for all parties. But having a medical writer who is paid by a drug company to make sure that key marketing messages are included in the paper can lead to problems.
See Part 2, regarding the unemphasized but important finding from this study that antidepressants yield mediocre benefits over placebo.
Update (03-03-08): See comments. A wise reader has pointed out that there are actually three authors from Advogent. Well, um, one author and two editorial assistants. A skeptical person would add that the presence of three medical writers and a Wyeth statistician who appears in a footnote at the end of the study obviates the need for those pesky academic authors except for the need to lend the study a stamp of approval from "independent scientists." Is that too cynical?