There is uncertainty and controversy around the best psychosocial interventions for bipolar disorder.
Network meta-analysis (NMA) is increasingly used to try to resolve such uncertainty. NMA can be an ideal tool for data synthesis (Leucht et al, 2016); however, for the approach to be valid, the relevant data must meet exact formal criteria. These methodological requirements of NMA need to be more widely appreciated.
Mary Lou Chatterton, Emily Stockings, Michael Berk, Jan J. Barendregt, Rob Carter and Cathrine Mihalopoulos have entered published data identified by systematic review from psychosocial interventions for bipolar disorder into a network meta-analysis (Chatterton et al, 2017) focused on:
- Relapse to mania or depression,
- Medication adherence, and
- Symptom scales for mania, depression and Global Assessment of Functioning (GAF).
Forty-one trials were identified and the interventions were grouped as:
- Cognitive Behaviour Therapy (CBT),
- Psychoeducation alone,
- Psychoeducation in combination with CBT,
- Psychoeducation and Personalized Real-time Intervention for Stabilizing Mood,
- Family focused psychotherapy and
- Carer-focused interventions.
Networks were created for the different outcomes described in the methods section and are illustrated in the online supplementary material (PDF). They are all sparse and not well connected, in that most of the comparisons are not between different interventions but with the common control condition known as ‘treatment as usual’ or TAU. The treatments were then ranked for each outcome using relative risk (for events) or effect size (g, for scaled outcomes).
The analysis suggested that:
- Carer-focused interventions significantly reduced the risk of depressive or manic relapse;
- Psychoeducation alone and in combination with cognitive behavioural therapy (CBT) significantly reduced medication non-adherence;
- Psychoeducation plus CBT significantly reduced manic symptoms and increased GAF;
- No intervention was associated with a significant reduction in depression symptom scale scores.
The authors rightly highlight the desirability of more standardized outcome measures in trials of psychosocial interventions and highlight the challenge of treating depression in bipolar disorder, but they are cautiously confident in endorsing their findings as a basis for practice and indeed policy. We are not so sure that their confidence is really justified.
Strengths and weaknesses
In 2014, NICE recommended psychological treatments as the primary modality of treatment in primary care for bipolar depression, as equivalent to medication in the management of bipolar depression in secondary care and sharing equal importance with medication in the long term (Kendall 2014).
An earlier blog reviewed a study demonstrating the bias apparent in the handling of the relevant evidence by NICE (Jauhar et al, 2016). In particular, the use of multiple parallel meta-analyses and subsequent cherry-picking of possible positive findings was highlighted. In principle, NMA should take us beyond that kind of approach and the present study appears to do so.
Was the network meta-analysis valid?
Network meta-analysis has been developed to synthesise evidence across a network of randomised trials, assess the relative effectiveness of several interventions and rank treatment options. This statistical method is based on the simultaneous analysis of direct evidence (which compares treatments within the same study) and indirect evidence (comparing interventions across different studies using a treatment in common). Indirect evidence is important because, alone, it allows comparison of treatments that have not been compared directly and, in combination with direct evidence, it increases the precision of the estimates (producing so-called mixed evidence).
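To make the distinction between direct, indirect and mixed evidence concrete, here is a minimal sketch of an adjusted indirect comparison through a common comparator, followed by an inverse-variance combination with a direct estimate. It assumes effects are on an additive scale (for example log relative risk) and that the two sources are independent. The function names and all numbers are illustrative assumptions, not values from Chatterton et al, and this is not the method used in that paper.

```python
import math


def indirect_estimate(d_a_tau, se_a_tau, d_b_tau, se_b_tau):
    """Indirect A-vs-B effect from A-vs-TAU and B-vs-TAU trials.

    Effects must be on an additive scale (e.g. log relative risk).
    The variances add, so the indirect estimate is always less precise
    than either of the comparisons it is built from.
    """
    d_ab = d_a_tau - d_b_tau
    se_ab = math.sqrt(se_a_tau ** 2 + se_b_tau ** 2)
    return d_ab, se_ab


def mixed_estimate(d_direct, se_direct, d_indirect, se_indirect):
    """Inverse-variance weighted average of direct and indirect evidence."""
    w_direct = 1.0 / se_direct ** 2
    w_indirect = 1.0 / se_indirect ** 2
    total = w_direct + w_indirect
    d_mixed = (w_direct * d_direct + w_indirect * d_indirect) / total
    se_mixed = math.sqrt(1.0 / total)
    return d_mixed, se_mixed


# Hypothetical log relative risks of relapse versus TAU (illustrative only).
d_ind, se_ind = indirect_estimate(-0.4, 0.3, -0.1, 0.4)

# Hypothetical head-to-head (direct) A-vs-B estimate.
d_mix, se_mix = mixed_estimate(-0.2, 0.5, d_ind, se_ind)

print(f"indirect: {d_ind:.3f} (SE {se_ind:.3f})")
print(f"mixed:    {d_mix:.3f} (SE {se_mix:.3f})")
```

The mixed estimate has a smaller standard error than either source alone, which is the gain in precision described above. It is only meaningful if the direct and indirect estimates are measuring the same thing, which is the transitivity question.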
NMA is becoming very popular in the scientific literature, but unfortunately this success too often goes together with an increased risk of poor methodological quality. NMA requires that the subject data meet various criteria, the most basic of which is transitivity. In other words, before carrying out an NMA (and in order to be confident that the final results are correct and clinically informative), the authors should be able to answer “yes” to the following questions:
- Can any patient within the network be randomised to any of the treatments included in the network?
- Is it possible to imagine a mega trial where patients can be randomised to any of the interventions included in the network?
If yes to both, transitivity is preserved. If no, it is not. In fact the answer appears to be ‘no’ here, because some studies required patients to be “euthymic” and other studies simply required that they could give consent (and so might be highly symptomatic, in an episode of depression or even treatment resistant). In such situations, having access to the full protocol usually helps because authors report why they think the assumption of transitivity holds. We searched but we could not find the document (in PROSPERO there is only a very short and not really informative summary). We would be grateful to Mary Lou Chatterton and colleagues if they could clarify their position.
A further critical issue is the shape of the network. Ideally, networks should be well connected and these are not. This means that the analysis depends heavily on indirect comparisons via TAU, which is problematic. The choice of a fair comparison treatment is difficult in psychotherapy trials. When the active treatment is superior to TAU, no specificity can be claimed for the content and TAU is often very poorly specified. Any network so dependent on indirect comparison is likely to be unstable.
NMA allows consistency to be examined formally (i.e. statistically): so if A beats B and B beats C, does A beat C? If the answer is yes, there is no inconsistency in the loop between A, B and C. If not, the loop is inconsistent. In this paper there are no data/tables that report this, and the authors did not perform local tests of inconsistency (even when the global test is OK, there might be important local inconsistencies, which must be appraised in order to evaluate the affected mixed estimates). Despite the best efforts of investigators to construct a consistent network, statistically significant inconsistency may arise, which should be investigated when found. Systematic review protocols list potential sources of heterogeneity, possibly using them to form more homogeneous subgroups of studies and generate hypotheses for effect modifiers. Similarly, network meta-analysis should describe in the protocol a clear strategy to deal with inconsistency. There is nothing like that here and, indeed, the reporting of the statistical analysis is inadequate.
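A local test of inconsistency for a single comparison can be sketched as a simple z-test on the difference between its direct and indirect estimates. This is a generic illustration that assumes the two estimates are independent and approximately normal. The function name and the numbers are hypothetical, and it is not a description of what Chatterton et al did.

```python
import math


def local_inconsistency(d_direct, se_direct, d_indirect, se_indirect):
    """Compare direct and indirect estimates of the same contrast.

    Returns the difference (the inconsistency factor), its z statistic
    and a two-sided p-value under a normal approximation.
    """
    diff = d_direct - d_indirect
    se_diff = math.sqrt(se_direct ** 2 + se_indirect ** 2)
    z = diff / se_diff
    # Two-sided normal p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return diff, z, p_value


# Hypothetical direct and indirect log relative risks for one loop.
diff, z, p = local_inconsistency(-0.2, 0.5, -0.3, 0.5)
print(f"difference {diff:.2f}, z = {z:.2f}, p = {p:.2f}")
```

In a sparse network the standard errors are large, so a test like this has little power: a non-significant result does not show that a loop is consistent. That is one reason why each mixed estimate needs to be appraised individually.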
Finally, the quality of the evidence is crucial to interpret the results from a network meta-analysis. Guidance exists on how to rate the quality of evidence supporting treatment effect estimates obtained from NMA, and simply assessing the risk of bias of individual studies (as the authors did and reported in detail in the Appendix) is not enough. Methods developed by the GRADE working group have standardised the procedure and shown that the quality of evidence supporting NMA estimates can vary from high to very low across comparisons (quality ratings given to a whole network are uninformative and likely to mislead). In any NMA, quality of evidence is likely to be different from estimate to estimate; therefore the GRADE ratings need to be attempted for all primary outcome estimates. For these elements not to be addressed in an NMA published in a peer-reviewed journal is disappointing; the authors deserved more searching editorial scrutiny.
The problem of interpreting psychotherapy trials
It is possible to carry out a methodologically sound NMA, but we are left with the generic problems for psychotherapy trials of small scale, allegiance bias, lack of blinding, demand characteristics when assessing outcomes and publication bias (Flint et al, 2015). These problems are certainly not something a systematic reviewer can change; however, they should be discussed and appraised in any attempt at evidence synthesis of psychological therapies.
‘Adverse reactions’ to psychological treatment are not collected systematically and hence are under-appreciated (Nutt and Sharpe, 2008). A consideration only of efficacy is inherently unbalanced in the perspective it gives to treatment selection.
NMA does offer a powerful method for synthesising data and avoiding bias in the selection of evidence. However, to be valid, it places high demands on the statistical methods employed and the data included. This is not widely enough appreciated and the present publication illustrates some of the limitations. Its conclusions may be correct, but we remain concerned that the evidence supporting them is actually less than compelling.
- Guy Goodwin has advised many pharmaceutical companies doing trials in the bipolar therapeutic area and drafted the BAP guidelines for the treatment of bipolar disorder (PDF).
- Andrea Cipriani is leading a network meta-analysis of psychological interventions in bipolar disorder (CRD42015016085)
Chatterton ML, Stockings E, Berk M, Barendregt JJ, Carter R, Mihalopoulos C. (2017) Psychosocial therapies for the adjunctive treatment of bipolar disorder in adults: network meta-analysis. The British Journal of Psychiatry Feb 2017, DOI: 10.1192/bjp.bp.116.195321
Is the NICE guideline for bipolar disorder biased in favour of psychosocial interventions?
Flint J, Cuijpers P, Horder J, Koole SL, Munafò MR. (2015) Is there an excess of significant findings in published studies of psychotherapy for depression? Psychol Med 45: 439-446.
Kendall T, Morriss R, Mayo-Wilson E, Marcus E. (2014) Guideline Development Group of the National Institute for Health and Care Excellence. Assessment and management of bipolar disorder: summary of updated NICE guidance. BMJ. 2014 Sep 25;349:g5673.
Leucht S, Chaimani A, Cipriani A, Davis JM, Furukawa TA, Salanti G. (2016) Network meta-analyses should be the highest level of evidence in treatment guidelines. Eur Arch Psychiatry Clin Neurosci. 2016 Sep;266(6):477-80.
Nutt DJ and Sharpe M. (2008) Uncritical positive regard? Issues in the efficacy and safety of psychotherapy. Journal of psychopharmacology (Oxford, England) 22: 3-6.
I read with interest the blog comments by Cipriani and Goodwin on the paper by Chatterton et al, which on first read appear reasonable, and so I went on to read the Chatterton et al paper in detail. I then realized that there had been much ado about nothing.
The first criticism was that NMA requires that the subject data meet the assumption of transitivity. In other words, for the reader to be confident that the final results are clinically informative, it should be possible to imagine a mega trial where patients can be randomised to any of the interventions included in the network. Since the authors included both studies requiring patients to be “euthymic” and studies simply requiring that they could give consent (and who thus may be symptomatic), Cipriani and Goodwin concluded that this condition is not met and transitivity is not preserved. Unfortunately, what they do not consider is that patients who are euthymic are labeled so simply because they no longer meet the threshold for a disorder such as depression or mania, as assessed by diagnostic criteria or by cutoff points on rating scales. However, considerable fluctuations in psychological distress, often subsumed under the rubric of subclinical or residual symptomatology, have been recorded in studies with longitudinal designs, suggesting that the illness is always active, even though its intensity may vary. Such findings are consistent with the socioeconomic, psychosocial and clinical deterioration in these patients. It is thus questionable whether subthreshold symptomatic periods truly represent euthymia or are simply part of the manifestations of bipolar illness, and this justifies the inclusion of such patients in this study.
A further criticism was that the authors did not evaluate consistency formally (i.e. statistically). This is also wrong, as the authors clearly state that “The weighted average H statistic for all the networks in this analysis was less than 3, indicating minimal inconsistency in treatment effects”. Given that the authors used multiple three-treatment loops in the GPM framework, they were able to compute a weighted average H across all mixed pooled estimates, and the values suggest that the analyses were more or less consistent. Had inconsistency been detected, they would then have attempted to investigate each mixed pooled estimate, identify the comparisons mainly responsible and remove them from the network in an attempt to reduce overall inconsistency. Maybe the criticism alludes to no P value being provided, but then do we need a P value for I² in the heterogeneity assessment undertaken with direct treatment effect meta-analyses?
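For readers unfamiliar with H, the following minimal sketch computes the generic heterogeneity statistics used in conventional meta-analysis: Cochran's Q, H = sqrt(Q / df) and I² = (Q - df) / Q. This is an assumption-laden illustration of the standard definitions only. How the GPM framework computes and weights H across mixed estimates may differ, and the input numbers are hypothetical.

```python
import math


def heterogeneity(effects, standard_errors):
    """Cochran's Q, H and I-squared for a set of independent estimates.

    Uses fixed-effect inverse-variance weights. Needs at least two
    estimates so that the degrees of freedom are positive.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    h = math.sqrt(q / df)
    # I-squared is truncated at zero when Q is smaller than its df.
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, h, i_squared


# Three hypothetical estimates of the same contrast with equal precision.
q, h, i2 = heterogeneity([0.0, 2.0, 4.0], [1.0, 1.0, 1.0])
print(f"Q = {q:.1f}, H = {h:.1f}, I^2 = {i2:.2f}")
```

Here H and I² are two expressions of the same quantity, since I² = (H² - 1) / H² whenever Q exceeds its degrees of freedom, which is why the comparison with I² in direct meta-analysis is a natural one.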
A final criticism was that, while the authors assessed the quality of all studies included in this analysis individually, they did not extend this to the body of evidence across the network. This is again simply wrong. The authors make use of the quality effects model (results in supplementary material) to demonstrate that the individual quality assessments do not impact the network. This resulted in similar findings, suggesting that the variation in the quality of evidence across the network had no major implication for these results. This approach is even better and more objective than the GRADE approach. Cipriani and Goodwin may not agree with the application of this method for this purpose, but that is a different discussion and cannot be conflated with the authors not addressing the issue.
Finally, a strength of the authors' analysis is the GPM approach, because this is devoid of all assumptions (associated with multivariate frequentist and Bayesian approaches) except that of transitivity.
College of Medicine, Qatar University
Research School of Population Health, Australian National University
Conflict of interest statement
Suhail Doi is a close affiliate of one of the authors (Jan J Barendregt) and was the co-author of the GPM framework for network meta-analysis. Suhail Doi is also the author of the IVhet and quality effects models used by the authors in this paper. However, Suhail Doi had no role in the conception, design, analysis or drafting of this paper and only read this document after publication. Interestingly, this paper represents the first application of the GPM framework to network meta-analysis, which cannot be criticized given its simplicity and transparency of procedure.
Fava GA. (1999) Subclinical symptoms in mood disorders: pathophysiological and therapeutic implications. Psychol Med 29: 47–61.