Bipolar disorder is a complex and recurrent mental illness characterised by episodes of mania and depression. There is evidence to suggest that bipolar disorder is often under-recognised in primary care, with a large proportion of cases misdiagnosed as unipolar depression (Cerimele et al., 2014). This may be because, on average, people with bipolar disorder experience depressive episodes more frequently and for longer than manic episodes (Morriss et al., 2013). In addition, patients presenting with depressive symptoms may be less likely to recall a previous manic or hypomanic episode.
The under-recognition of bipolar disorder is problematic because delayed diagnosis is associated with a range of negative outcomes for the individual, their family and society as a whole. For example, Shi and colleagues (2004) found a higher risk of attempted suicide in people with undiagnosed bipolar disorder compared to those with a diagnosis. A delayed diagnosis is also likely to affect treatment and lead to suboptimal outcomes. This can have wider social and economic consequences through increased medical costs and lost productivity when people are unable to work (Matza et al., 2005).
Screening tools can facilitate the diagnosis of mental health disorders, especially in primary care settings where resources are limited. Carvalho and colleagues (2015) conducted a systematic review to investigate the accuracy of screening tools for bipolar disorder in adults.
They looked at the most commonly used self-report screening tools found in the literature:
- Mood Disorders Questionnaire (MDQ)
- Bipolar Spectrum Diagnostic Scale (BSDS)
- Hypomania Checklist (HCL-32)
They also aimed to compare the accuracy of these measures and to explore the effect of heterogeneity on estimates.
The authors looked for eligible articles by searching electronic databases, scanning reference lists of included studies and tracking citations using Google Scholar.
Studies were included if they reported the:
- Sensitivity (the proportion of people with bipolar disorder, according to the gold standard, who screen positive on the measure) and
- Specificity (the proportion of people without bipolar disorder, according to the gold standard, who screen negative on the measure)
of the MDQ, BSDS or the HCL-32 using the DSM-IV or DSM-IV-TR as the gold standard in the following settings:
- Mental health care
- Primary care/general population
Studies investigating perinatal, postnatal, child and adolescent samples were excluded.
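The two accuracy measures defined in the inclusion criteria above can be illustrated with a short Python sketch. The counts used here are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the review, and the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class ScreeningCounts:
    """Screening results cross-tabulated against the gold standard diagnosis."""
    true_positives: int   # have bipolar disorder, screen positive
    false_negatives: int  # have bipolar disorder, screen negative
    true_negatives: int   # no bipolar disorder, screen negative
    false_positives: int  # no bipolar disorder, screen positive


def sensitivity(c: ScreeningCounts) -> float:
    """Proportion of people with the disorder who screen positive."""
    return c.true_positives / (c.true_positives + c.false_negatives)


def specificity(c: ScreeningCounts) -> float:
    """Proportion of people without the disorder who screen negative."""
    return c.true_negatives / (c.true_negatives + c.false_positives)


# Hypothetical study: 50 people with bipolar disorder, 200 without.
counts = ScreeningCounts(
    true_positives=30,
    false_negatives=20,
    true_negatives=180,
    false_positives=20,
)

print(f"Sensitivity: {sensitivity(counts):.0%}")
print(f"Specificity: {specificity(counts):.0%}")
```

In this made-up example the tool picks up 30 of the 50 people with the disorder and correctly rules out 180 of the 200 people without it.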
A series of meta-analyses were used to calculate the average summary values of sensitivity and specificity. These were plotted in receiver-operating characteristic (ROC) space using data from a single cut-off from each study. As some studies used different cut-offs, meta-analyses using hierarchical summary ROC models were also carried out. Meta-analytic regression models were used to compare the accuracy of the three measures with indirect and direct comparison and investigate the potential effect of heterogeneity on estimates. Separate analyses were conducted for study setting (mental health care and primary care/community) and diagnosis (bipolar disorder in general, bipolar II disorder, and bipolar disorder not otherwise specified [NOS]).
In total, 53 studies of 21,543 participants met the inclusion criteria.
Mental health care setting
A total of 44 studies (including 17,451 participants) were carried out in a mental health care setting. At the recommended cut-offs the summary sensitivity and specificity for the screening of bipolar disorder were as follows:
In direct comparisons, three studies compared the MDQ with the BSDS. The studies used different cut-offs for the measures and overall findings were inconsistent. Eight studies compared the MDQ with the HCL-32. Different cut-offs were used across studies, but the HCL-32 consistently showed higher sensitivity and lower specificity than the MDQ. There was no overall difference in accuracy between the measures.
Primary care setting
Five studies (including 3,321 participants) were carried out in a primary care/general population setting. Four of these studies investigated the MDQ, which showed a sensitivity of 43% and a specificity of 95%. One study compared the BSDS with the HCL-32 and found a higher sensitivity and a lower specificity for the BSDS, although the cut-offs used differed from the recommended levels.
Due to limited data, it was not possible to conduct a meta-analysis to compare the three measures.
Bipolar disorder, type II
Seventeen studies evaluated the MDQ, BSDS and HCL-32 for the detection of bipolar II disorder in a mental health care setting. There was evidence that the HCL-32 was more accurate than the MDQ in detecting bipolar II disorder in this setting. Other comparisons did not show any significant differences.
Bipolar disorder, not otherwise specified (NOS)
Two studies evaluated the MDQ for the detection of bipolar disorder NOS in a mental health care setting and found sensitivities of 29% and 68%, whereas specificity estimates were 77% and 80%.
For the MDQ there was evidence that the way in which participants were recruited to the study (enrolling a consecutive or random sample) had an effect on the diagnostic accuracy of the measure. This did not hold for the HCL-32. There was no evidence of a difference in diagnostic accuracy for either measure when the use of a case-control design or the inclusion of Asian studies was explored.
Due to a lack of studies it was not possible to assess heterogeneity in the diagnostic accuracy of the BSDS.
The authors concluded that:
Screening instruments for bipolar disorder have elevated specificities indicating that these scales would effectively screen out a large proportion of true negatives. However, a positive screen should be confirmed by a clinical diagnostic evaluation for bipolar disorder. The accuracy properties of the MDQ and HCL-32 are supported by a larger evidence base than those of the BSDS. The HCL-32 is more accurate for the detection of type II bipolar disorder than the MDQ in mental health care settings.
Strengths and limitations
This was a good quality review which was guided by the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) statement and the Cochrane Handbook for Diagnostic Test Accuracy. The search strategy for locating research articles was thorough and the quality of included studies was assessed using a validated tool, the QUADAS-2.
An important limitation of the review is the lack of studies that investigated the screening tools in a primary care or general population setting. Compared with 44 studies in mental health care settings, only five studies were conducted in community-based samples, the majority of which only looked at the MDQ (4/5).
Another limitation lies in the quality of included studies. Over half of the studies (55%) had a high risk of bias in at least one domain of the QUADAS-2, with over a third (36%) showing biases in the way in which participants were recruited into the study (e.g. enrolling a consecutive sample). This bias was explored in the analysis, where a significant difference in accuracy was found between studies that enrolled a consecutive sample compared with those that enrolled a random sample. Whilst this finding is of interest, the authors did not detail how this affected sensitivity and specificity estimates.
Accuracy estimates of the MDQ, BSDS and the HCL-32 varied in this review. In a mental health setting, all measures had an acceptable level of accuracy; the MDQ showed a better specificity than the HCL-32 and the BSDS, whilst the HCL-32 indicated a higher sensitivity than the other two measures. There was limited evidence in a general/primary care population which indicated poor sensitivity but excellent specificity for the MDQ and inconclusive evidence for the BSDS and the HCL-32, as this was based on a single study. HCL-32 was more accurate in detecting type II bipolar disorder compared with the MDQ.
The higher levels of specificity found in this review indicate that the tools will adequately screen out true negatives. However, sensitivity was generally low, especially in primary care and general population settings (only 43%), which would result in a large number of true cases being missed.
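To make the consequence of low sensitivity concrete, the following Python sketch works through the primary care figures for the MDQ reported in the review (sensitivity 43%, specificity 95%). The prevalence of 2% and the cohort of 1,000 people are assumptions made purely for illustration and do not come from the review; the function name is also illustrative.

```python
def screening_outcomes(sensitivity, specificity, prevalence, n_screened=1000):
    """Expected screening results for a cohort, given test accuracy and prevalence."""
    cases = prevalence * n_screened
    non_cases = n_screened - cases
    detected = sensitivity * cases                 # true positives
    missed = cases - detected                      # false negatives
    false_alarms = (1 - specificity) * non_cases   # false positives
    # Positive predictive value: share of positive screens that are true cases.
    ppv = detected / (detected + false_alarms)
    return {
        "cases": cases,
        "detected": detected,
        "missed": missed,
        "false_alarms": false_alarms,
        "ppv": ppv,
    }


# Sensitivity and specificity from the review; prevalence is an assumed value.
result = screening_outcomes(sensitivity=0.43, specificity=0.95, prevalence=0.02)

for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions, fewer than half of the expected 20 true cases would screen positive, and only around 15% of positive screens would be true cases. A different assumed prevalence would change these figures, so they should be read as an illustration rather than an estimate.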
Taking everything into account, the findings indicate a limited use for screening tools in recognising bipolar spectrum disorders and highlight the need for a specialist diagnostic evaluation in confirming a diagnosis. This is in line with NICE guidance on bipolar disorder which specifically recommends against the use of screening tools in primary care (NICE, 2013). Whilst this was a good quality review, further research is needed to find accurate screening tools which can be used in primary care to improve the recognition of bipolar spectrum disorders.
Carvalho AF, et al. (2015) Screening for bipolar spectrum disorders: A comprehensive meta-analysis of accuracy studies. Journal of Affective Disorders, 172, 337-346.
Cerimele JM, Chwastiak LA, Dodson S, Katon WJ. (2014) The prevalence of bipolar disorder in general primary care samples: a systematic review. General Hospital Psychiatry, 36, 19-25.
Judd L, Akiskal HS, Schettler P, Endicott J, Maser J, Solomon DA, et al. (2002) The long term natural history of the weekly symptomatic status of bipolar I disorder. Archives of General Psychiatry, 59, 530-537.
Morriss R, Yang M, Chopra A, Bentall R, Paykel E, Scott J. (2013) Differential effects of depression and mania symptoms on social adjustment: prospective observational study in bipolar disorder. Bipolar Disorders, 15, 80-91.
Shi L, Thiebaud P, McCombs JS. (2004) The impact of unrecognized bipolar disorders for patients treated for depression with antidepressants in the fee-for-services California Medicaid (Medi-Cal) program. Journal of Affective Disorders, 82, 373-383.
Matza LS, Rajagopalan KS, Thompson CL, de Lissovoy G. (2005) Misdiagnosed patients with bipolar disorder: comorbidities, treatment patterns, and direct treatment costs. Journal of Clinical Psychiatry, 66, 1432-1440.