Crisis of faith? Instead of CBT, we should be worrying about meta-analyses


[Please note: changes were made to this blog on 27/7/15, following discussion between Ioana Cristea and Tom Johnsen – see comments below].

Mental Elf readers are well aware of criticisms directed at psychotherapy’s prodigal son: cognitive behaviour therapy (CBT). Unarguably the most studied and most recommended form of psychotherapy, CBT has nonetheless been shown to have some problems (see my previous blogs on CBT for adult depression and wait-list control exaggerating the efficacy of CBT).

This may explain why critics particularly rejoiced at this recent meta-analysis (Johnsen and Friborg, 2015), and why it attracted so much attention and almost indiscriminate praise, both among researchers and in the media. Publication in psychology’s number 1 journal, Psychological Bulletin, seemed to additionally guarantee its trustworthiness. But did it?

Disclosure: I am currently involved in a re-analysis of the Johnsen and Friborg study (2015).


CBT is clearly not the universal cure for depression, even if it is shown to work well for many service users.


The primary objective of this meta-analysis was to examine whether published clinical CBT trials for depressive disorders showed an historical change in their treatment effects over time, independent of other study-related variables. The authors included both uncontrolled and controlled trials (randomised or not).

They used an impressive set of exclusion criteria. Studies were excluded if:

  • The implemented therapy was not “pure” CBT (e.g. mindfulness-based CBT)
  • Unipolar depression was not the primary diagnosis
  • Participants were not adults
  • Therapy was not implemented by a trained CBT therapist
  • The psychological intervention was not intended to treat depression
  • Outcome was not measured with the Hamilton Rating Scale for Depression (HRSD) or the Beck Depression Inventory (BDI)
  • Patients had acute physical illnesses, bipolar or psychotic disorders
  • Treatment was not implemented as individual face-to-face therapy
  • Patients had a BDI score lower than 13.5

For studies that did not include a control group, effect sizes (ES) were computed as the standardised mean difference, by subtracting the post- from the pre-intervention means and dividing by the standard deviation of the change score. For controlled trials, effect sizes were calculated separately for the intervention and control groups, using a similar procedure. The authors also computed remission rates, defined as the number of patients who completed treatment with a BDI score below a predefined cut-off of 10.
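The two quantities described above can be sketched in a few lines of code. This is a minimal illustration of the computations as the blog describes them, not the authors' actual analysis code; the function names and the patient scores are invented for the example.

```python
import numpy as np

def prepost_effect_size(pre, post):
    """Standardised mean gain for a single-group (uncontrolled) trial:
    (pre mean - post mean) divided by the SD of the change scores."""
    change = np.asarray(pre, dtype=float) - np.asarray(post, dtype=float)
    return change.mean() / change.std(ddof=1)

def remission_rate(post, cutoff=10):
    """Proportion of treatment completers with a post-treatment BDI
    score below the predefined cut-off."""
    post = np.asarray(post, dtype=float)
    return (post < cutoff).mean()

# Hypothetical BDI scores for six completers of one trial
pre = [25, 30, 22, 28, 26, 31]
post = [12, 15, 9, 14, 11, 16]
print(prepost_effect_size(pre, post))  # positive ES = symptom improvement
print(remission_rate(post))            # fraction of completers below cut-off
```

Note that dividing by the SD of the *change* score (rather than the pre-test SD) typically inflates the ES when pre and post scores are highly correlated, which is one reason pre-post effect sizes are hard to compare with controlled ones.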

You wouldn’t serve a trifle with garlic. Should you serve a meta-analysis with randomised AND non-randomised trials?


The authors identified 70 trials, of which 52 were randomised controlled trials (RCTs) and the rest were non-randomised trials. Apart from computing overall effect sizes (thus combining RCTs and non-randomised studies), they also made an unusual split. They computed a controlled effect size (RCTs with a waitlist or treatment-as-usual control) and within-study design effect sizes (all non-randomised trials PLUS the CBT arm from RCTs that, in the authors’ words, could not be qualified as controlled in the present analyses, such as comparisons with medication). What is essential to note is that only the former category (controlled effect sizes) included exclusively RCTs.

In total, 53 within-study design effect sizes and 17 controlled effect sizes (of which 15 were wait-list comparisons) were analysed.

  • There was a negative relationship between the ESs of CBT based on the BDI and publication year (p<.001). Subgroup analysis indicated that a similar relationship was evident among both within-study design (p<.001) and controlled studies (p<.05).
  • A similar trend of the ESs of CBT decreasing with publication year was shown on the HRSD (p=.01). However, in this case, while the relationship was evident in the within-study design studies (p<.01), it was not significant in the controlled studies (p=.51).
  • Remission rates were also negatively related to publication year (p<.01). Unfortunately, the authors did not report separate results for within-study design and controlled effect sizes.
  • The waiting list control groups did not show a similar trend of decreasing ESs across time (p=.48).
  • Analyses excluding studies with small sample sizes (arbitrarily defined as n<20) yielded the same significant negative trend (p=.02). Again, the authors failed to report separate results for within-study design and controlled studies.
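The core analysis behind these bullet points is a meta-regression of effect sizes on publication year. The sketch below shows the mechanics of a simple inverse-variance weighted regression; the study data are entirely made up for illustration and are not the values from the paper.

```python
import numpy as np

# Hypothetical studies: publication year, effect size, sampling variance
years = np.array([1980.0, 1985, 1990, 1995, 2000, 2005, 2010])
es = np.array([2.2, 2.0, 1.9, 1.6, 1.5, 1.4, 1.2])
var = np.array([0.30, 0.25, 0.20, 0.15, 0.12, 0.10, 0.08])

w = 1.0 / var                               # inverse-variance weights
X = np.column_stack([np.ones_like(years), years])

# Weighted least squares: beta = (X'WX)^(-1) X'Wy
beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * es))
print(beta[1])  # slope per year; a negative value = declining ES over time
```

A full meta-regression (as in the `metafor` R package, for instance) would also model between-study heterogeneity, but the weighted slope is the quantity whose p-values are reported above.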

The authors did not always report separate results for within-study design and controlled studies, which makes it impossible to interpret these findings.


The authors concluded that:

The main finding was that the treatment effect of CBT showed a declining trend across time and across both measures of depression (the BDI and the HRSD).

The authors also discuss possible reasons for the decreasing effects of CBT, ranging from deviations from the therapy manual, to the reduction of treatment fidelity and the dynamics of the placebo effect. As the placebo effect is always higher for new treatments, they wonder if that may have been the case for CBT, and whether as time passed, positive expectations about CBT dwindled. In fact, they even worry whether their own meta-analysis might further weaken faith in CBT.

But for anyone familiar with the methodological aspects of meta-analyses, this particular one does not engender a loss of confidence in CBT. It does, however, elicit considerable loss of faith in meta-analyses and in their reliability and usefulness.

Is our faith in CBT dwindling over time, or is it meta-analyses that we should be doubting?


  • The most important limitation is the combination of uncontrolled and controlled trials, or rather of non-randomised and randomised ones. Randomisation of participants to treatment groups serves to ensure that sources of bias are equally distributed between these groups, with the only difference between them being the intervention. Non-randomised trials are subject to a whole array of sources of bias, which we have no way of appropriately gauging. Effects in these trials might be due to many factors other than the intervention: the passing of time, non-specific factors like participants’ expectations (the placebo effect), the participants being particular cases, variables unbeknownst to the experimenter being responsible for change, and so on. This problem is of course worse in uncontrolled trials, where we have absolutely no way of knowing whether effects were due to the specific nature of the intervention at all.
  • Why, then, the reader may justly ask, do we even have non-randomised and indeed uncontrolled trials? Well, in some cases, for pragmatic reasons, it is impossible to randomise participants to treatment conditions. Maybe the disease is so rare or so serious that randomisation would be unfeasible or unethical. Maybe the treatment is so new and untested that one first needs to see whether it is worth pursuing at all, or whether it carries serious side-effects. Fortunately, none of these applies to CBT, which has plenty of RCTs.
  • The authors seem completely oblivious to the many recent meta-analyses of the efficacy of CBT in depression. At least 4 such meta-analyses including comparisons between CBT and a control group have been conducted since 2013 (Cuijpers et al, 2013; Barth et al, 2013; Furukawa et al, 2014; Chen et al, 2014), and without exception all of them exclusively included RCTs, with numbers for CBT versus presumably non-active control group comparisons (waitlist, no treatment, treatment as usual, placebo) ranging from 49 to 115. In contrast, the authors of this meta-analysis included only 17 such group comparisons. This difference is staggering and difficult to explain, even if we assume more restrictive inclusion criteria.
  • One meta-analysis (Chen et al, 2014) was also an historical analysis, looking at changes in the quality and quantity of psychotherapy trials (including CBT) for depression. It revealed a relevant, albeit unsurprising, fact about RCTs for depression: most trial quality criteria improved over time, and this improvement was particularly marked in CBT trials. So it is plausible that the apparent decrease in the efficacy of CBT for depression over time might simply be a by-product of the increasing quality of trials. Johnsen and Friborg did look at study quality and found no moderating effect. But given their hotchpotch of uncontrolled and controlled trials and their limited sample of CBT studies, this analysis is not very informative.
  • Related to this, another aspect that has changed over time is sample size, with earlier studies having smaller samples. This is an important confounder: it is well established for treatments in general that larger studies yield smaller effect sizes, and almost all meta-analyses of psychotherapy for depression have found evidence of this small sample bias. The authors did redo their analysis using an arbitrarily defined cut-off for sample size, but did not examine whether sample size significantly moderated effect sizes, or the relationship between effect sizes and publication year.
  • An important missing analysis concerns heterogeneity, which the authors say they analysed, but which I was unable to find in the paper. Given their combination of studies, heterogeneity was probably very high. So high, in fact, as to indicate there is not much point in combining these studies at all. It is telling that in the most homogeneous sample of studies (RCTs) the decreasing trend of CBT was much less evident.
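To make the heterogeneity point concrete, here is a minimal sketch of the standard statistics a meta-analysis is expected to report: Cochran's Q and the I² statistic (the percentage of variability in effect sizes beyond what sampling error alone would produce). The effect sizes and variances below are invented purely to show the computation.

```python
import numpy as np

# Hypothetical effect sizes and their sampling variances for five studies
es = np.array([1.8, 0.4, 2.5, 0.9, 1.6])
var = np.array([0.10, 0.08, 0.15, 0.09, 0.12])

w = 1.0 / var                          # inverse-variance weights
pooled = np.sum(w * es) / np.sum(w)    # fixed-effect pooled estimate

Q = np.sum(w * (es - pooled) ** 2)     # Cochran's Q statistic
df = len(es) - 1                       # expected value of Q under homogeneity
I2 = max(0.0, (Q - df) / Q) * 100      # I^2 as a percentage

print(round(Q, 2), round(I2, 1))
```

An I² above roughly 75% is conventionally read as considerable heterogeneity, which is exactly the scenario one would expect when uncontrolled pre-post effects are pooled with controlled ones.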
Why did this meta-analysis only include 17 comparisons with presumably non-active control when there are so many more relevant published studies available?


I think the most important implication of this systematic review is not whether the initial efficacy of CBT has vanished into thin air, or what the reasons for that may be, but whether we can really put our faith in such reviews anymore.

It has always been argued that a major advantage of meta-analyses is that, by aggregating more trials, they can provide a more objective, balanced view of a field, in which sources of bias are effectively controlled. But if researchers conducting these reviews can obtain such widely different results, if their methodological choices can have such an influence over the findings, and indeed if publication in the number one journal in a field is no guarantee, should end users still trust the objectivity and reliability of meta-analyses?

Some things in life you can rely on. Unfortunately, not all meta-analyses fit the bill.


Primary paper

Johnsen TJ, Friborg O, (2015) The effects of cognitive behavioral therapy as an anti-depressive treatment is falling: A meta-analysis (PDF). Psychol. Bull. 141, 747–768. doi:10.1037/bul0000015

Other references

Barth J, Munder T, Gerger H, Nüesch E, Trelle S, Znoj H, Jüni P, Cuijpers P (2013) Comparative efficacy of seven psychotherapeutic interventions for patients with depression: a network meta-analysis. PLoS Med. 10, e1001454. doi:10.1371/journal.pmed.1001454

Chen P, Furukawa TA, Shinohara K, Honyashiki M, Imai H, Ichikawa K, Caldwell DM, Hunot V, Churchill R (2014) Quantity and quality of psychotherapy trials for depression in the past five decades. J. Affect. Disord. 165, 190–195. doi:10.1016/j.jad.2014.04.071 [Abstract]

Cuijpers P, Berking M, Andersson G, Quigley L, Kleiboer A, Dobson KS (2013) A meta-analysis of cognitive-behavioural therapy for adult depression, alone and in comparison with other treatments. Can. J. Psychiatry Rev. Can. Psychiatr. 58, 376–385. [Abstract]

Furukawa TA, Noma H, Caldwell DM, Honyashiki M, Shinohara K, Imai H, Chen P, Hunot V, Churchill R (2014) Waiting list may be a nocebo condition in psychotherapy trials: a contribution from network meta-analysis. Acta Psychiatr. Scand. doi:10.1111/acps.12275 [Abstract]
