Depression has a global (and European) point prevalence of about 4.4% (Baxter et al 2014); this translates into over 2.5 million Britons depressed at any one time.
It is estimated that up to 50% of cases are not identified in primary care and general medical practice (Cleare et al 2015).
NICE, in their depression clinical guideline (National Institute for Health and Care Excellence, 2009), do not advocate general screening for depression (e.g. of everyone attending their GP) but do recommend using the two Whooley questions (see Box) for ‘case finding’ in patients at increased risk of depression, such as those with long-term physical health conditions. NICE acknowledge that this is based on limited evidence about diagnostic accuracy and, it could be added, about whether there is any benefit from doing so.
Whooley* questions for depression
1. During the last month, have you often been bothered by feeling down, depressed or hopeless? (YES/NO)
2. During the last month, have you often been bothered by little interest or pleasure in doing things? (YES/NO)
YES to one or both questions is taken as a positive screen for depression.
* So called after the first author of the original publication (Whooley et al, 1997)
Given that the Whooley questions feature prominently in NICE guidance, Bosanquet et al (2015) set out to determine their diagnostic accuracy in a systematic review, and also to examine the value of an additional question, sometimes used alongside the two questions, asking whether the person wants help.
The authors undertook a systematic review and meta-analysis, searching a wide range of electronic databases, including sources of studies in progress, unpublished studies and the grey literature, from 1994 (when the Whooley questions were first published) to April 2015. The search strategy is available in the supplementary material.
Studies were selected by at least two reviewers using a pre-piloted form. They had to use the standard Whooley wording (or a translation derived from it) and scoring (see Box), with no restriction on how the questions were administered (self-administration included). The comparator was a gold standard diagnostic interview for major depression based on either the Diagnostic and Statistical Manual (DSM) or the International Classification of Diseases (ICD), and studies had to report sufficient data to construct 2×2 contingency tables (i.e. true positive, true negative, false positive and false negative results).
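To make the 2×2 table concrete, here is a minimal Python sketch of how the accuracy measures pooled in the review are derived from a single study's counts. The counts are hypothetical, chosen only to give a sensitivity of 0.95 and a specificity of 0.65; they do not come from any included study, and the `TwoByTwo` class is an illustrative name, not the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from one (hypothetical) study: Whooley screen vs diagnostic interview."""
    tp: int  # screen positive, depressed on interview
    fp: int  # screen positive, not depressed
    fn: int  # screen negative, depressed
    tn: int  # screen negative, not depressed

    @property
    def sensitivity(self) -> float:
        # Proportion of people with depression who screen positive
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Proportion of people without depression who screen negative
        return self.tn / (self.tn + self.fp)

    @property
    def positive_lr(self) -> float:
        # How much more likely a positive screen is in depressed than non-depressed people
        return self.sensitivity / (1 - self.specificity)

    @property
    def negative_lr(self) -> float:
        # How much more likely a negative screen is in depressed than non-depressed people
        return (1 - self.sensitivity) / self.specificity

    @property
    def diagnostic_or(self) -> float:
        # Odds of a positive screen if depressed / odds of a positive screen if not
        return (self.tp / self.fn) / (self.fp / self.tn)


# Hypothetical study: 1,000 people, 100 of whom are depressed on interview
study = TwoByTwo(tp=95, fp=315, fn=5, tn=585)
print(f"sensitivity {study.sensitivity:.2f}")
print(f"specificity {study.specificity:.2f}")
print(f"LR+ {study.positive_lr:.2f}")
print(f"LR- {study.negative_lr:.2f}")
print(f"DOR {study.diagnostic_or:.1f}")
```

For a single table like this, the diagnostic OR is identical to LR+ divided by LR- (here roughly 2.71 / 0.077, or about 35).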
A bivariate diagnostic meta-analysis was undertaken to obtain pooled estimates of sensitivity, specificity, likelihood ratios and diagnostic odds ratios (ORs), with their associated 95% confidence intervals (CIs). The bivariate model took into account the precision with which sensitivity and specificity had been estimated in each study, and incorporated and estimated the amount of between-study variability in sensitivity and specificity.
Pre-specified subgroup analyses were carried out, and possible causes of heterogeneity were examined.
Ten studies met the inclusion criteria, ranging in size from 89 to 1,025 participants, with the prevalence of depression varying from 3.3% to 34%. Six studies used the questions in English, and in most studies they were administered by clinicians.
- For all studies, the pooled sensitivity was high at 0.95 (95% CI 0.88 to 0.97), with a lower pooled specificity of 0.65 (95% CI 0.56 to 0.74).
- The pooled positive likelihood ratio was 2.78 (95% CI 2.16 to 3.57) and the pooled negative likelihood ratio 0.07 (95% CI 0.03 to 0.16). This means that a positive result increases the likelihood that the person has depression only modestly (e.g. to no more than about 40% when the prevalence is 20% or less), whereas a negative result makes depression much less likely.
- The diagnostic OR (the odds of the test being positive if the subject has depression relative to the odds of it being positive if the subject does not) was a healthy 36.91 (95% CI 17.52 to 77.76).
- The level of between-study heterogeneity was low (I²=24.1%), suggesting that the studies tended to be measuring the same thing; in the exploration of heterogeneity, only the prevalence of depression significantly influenced the findings.
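The likelihood ratios can be turned into post-test probabilities with Bayes' theorem in odds form (post-test odds = pre-test odds × likelihood ratio). The minimal Python sketch below uses the pooled estimates quoted above and, as a simplifying assumption, treats the population prevalence as the individual's pre-test probability; the function name is illustrative.

```python
def post_test_probability(pre_test: float, likelihood_ratio: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pre-test odds x likelihood ratio."""
    if not 0 < pre_test < 1:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * likelihood_ratio
    # Convert odds back to a probability
    return post_odds / (1 + post_odds)


# Pooled likelihood ratios reported by Bosanquet et al (2015)
LR_POSITIVE = 2.78
LR_NEGATIVE = 0.07

for prevalence in (0.05, 0.10, 0.20):
    pos = post_test_probability(prevalence, LR_POSITIVE)
    neg = post_test_probability(prevalence, LR_NEGATIVE)
    print(f"prevalence {prevalence:.0%}: positive -> {pos:.0%}, negative -> {neg:.1%}")
```

At a prevalence of 20%, a positive screen raises the probability of depression to about 41%, while a negative screen lowers it to about 1.7%, which is the asymmetry described in the second bullet.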
Analysis of the five primary care studies gave similar results. There were too few studies with similarly phrased ‘help’ questions to allow pooling; in general, adding the requirement to acknowledge a need for help appeared to decrease the sensitivity and increase the specificity of the test.
This meta-analysis confirmed the findings of previous reviews and individual studies. The questions are efficient at ruling out depression when the population prevalence is low (e.g. <20%), but they are not an efficient way to identify depression. A positive screen needs to be followed by a standard clinical assessment, and in such low-prevalence settings most of those so assessed would not be depressed.
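The same point can be put in natural frequencies. The sketch below is a minimal illustration using the rounded pooled sensitivity (0.95) and specificity (0.65) and an assumed prevalence of 10%: it counts the expected results among 1,000 people screened and solves for the prevalence at which half of positive screens are true cases. The function names are hypothetical.

```python
def screen_cohort(n: int, prevalence: float, sensitivity: float, specificity: float) -> dict:
    """Expected counts (natural frequencies) when n people are screened."""
    depressed = n * prevalence
    not_depressed = n - depressed
    true_pos = depressed * sensitivity
    false_pos = not_depressed * (1 - specificity)
    return {
        "positive_screens": true_pos + false_pos,  # people needing a full assessment
        "true_positives": true_pos,
        "false_positives": false_pos,
        "missed_cases": depressed - true_pos,
    }


def break_even_prevalence(sensitivity: float, specificity: float) -> float:
    """Prevalence at which half of positive screens are true cases (PPV = 0.5).

    Solves sensitivity * p = (1 - specificity) * (1 - p) for p.
    """
    false_positive_rate = 1 - specificity
    return false_positive_rate / (sensitivity + false_positive_rate)


SENS, SPEC = 0.95, 0.65  # pooled estimates, rounded
for name, value in screen_cohort(1000, 0.10, SENS, SPEC).items():
    print(f"{name}: {value:.0f}")
print(f"break-even prevalence: {break_even_prevalence(SENS, SPEC):.0%}")
```

Under these assumptions about 410 of 1,000 people screen positive, of whom only about 95 are depressed, and half of positive screens are true cases only once prevalence reaches roughly 27%. That figure depends on the rounded inputs (working from the pooled positive likelihood ratio instead gives about 26%), so it is indicative only.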
Strengths and limitations
The systematic review and meta-analysis were carried out to a high standard according to current best practice. Unfortunately, the authors were unable to reach definitive conclusions about the use of an additional question about whether help was needed. A very useful Bayesian graph of pre-test versus post-test probability illustrates the trade-off between positive and negative results according to the population prevalence of depression, and shows that the questions are clinically useful only for excluding depression at low prevalence.
The authors acknowledge potential methodological issues in the included studies that could have led to test performance being overestimated; in particular, most studies did not exclude people already known to have depression, and some studies were not blinded.
It is worth pointing out that the Whooley questions are not independent of the ‘gold’ standards used to make the syndromal diagnosis of major depression, and it could be argued that this study is really a confirmation of the obvious; the real surprise would have been if the results had been different. The Whooley questions are extremely similar to the two core symptoms in the DSM system (depressed mood and loss of interest or pleasure), at least one of which must be elicited to make the diagnosis. In fact, it is difficult to see how a diagnosis of major depression could be made without endorsing at least one Whooley question (hence the high sensitivity). The lower specificity is explained by the further symptoms required for diagnosis (to reach the minimum threshold of five symptoms).
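The circularity argument can be checked mechanically. The sketch below encodes an idealised version of the DSM symptom rule for a major depressive episode (at least five of nine symptoms, at least one of them a core symptom) and assumes, as a simplification, that answering YES to a Whooley question is equivalent to having the matching core symptom. It ignores duration, impairment, exclusion criteria and the different time frames (one month for Whooley, two weeks for DSM), and the symptom labels are abbreviated paraphrases.

```python
from itertools import product

# Nine DSM-style symptoms; the first two are the core symptoms
SYMPTOMS = (
    "depressed mood",          # core; assumed to mirror Whooley question 1
    "loss of interest",        # core; assumed to mirror Whooley question 2
    "weight/appetite change",
    "sleep disturbance",
    "psychomotor change",
    "fatigue",
    "worthlessness/guilt",
    "poor concentration",
    "thoughts of death",
)


def meets_symptom_criteria(present: tuple) -> bool:
    """Idealised rule: at least five symptoms, at least one of which is core."""
    return sum(present) >= 5 and (present[0] or present[1])


def whooley_positive(present: tuple) -> bool:
    """Idealised screen: YES to either question == either core symptom present."""
    return present[0] or present[1]


# Enumerate every possible present/absent pattern of the nine symptoms
patterns = list(product((False, True), repeat=len(SYMPTOMS)))
diagnosed = [p for p in patterns if meets_symptom_criteria(p)]

# Every pattern meeting the rule screens positive, so no case can be missed
print(all(whooley_positive(p) for p in diagnosed))  # True

# Both core symptoms plus two others: screen positive, but below the threshold of five
example = (True, True, False, True, False, True, False, False, False)
print(whooley_positive(example), meets_symptom_criteria(example))  # True False
```

Counting symptom patterns says nothing about how often each occurs in real populations, so this illustrates the logic behind high sensitivity and lower specificity rather than estimating either.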
What this study cannot do is shed light on two key issues. First, whether screening/case finding makes any clinical difference (e.g. Goldberg et al, 1998) and, indeed, whether the benefits (e.g. appropriate treatment) outweigh the potential harms (e.g. increased assessment time, or over-diagnosis and inappropriate treatment). Second, whether excluding or identifying depression is enough when considering psychological distress and disorders more broadly. There is a danger that thinking only about ‘screening out’ depression may get in the way of recognising anxiety disorders. These are nearly as common as depression (Baxter et al 2014), occur in similar ‘high risk’ populations, cause significant morbidity and warrant treatment (Baldwin et al 2014).
The Whooley questions are sensitive but not specific in identifying major depression as defined by accepted diagnostic systems, which is not surprising given their close similarity to the core symptoms that must be present (but are not sufficient on their own) to make the diagnosis. Although a negative test might be helpful in ruling out the syndrome of major depression in populations with a low prevalence, there is no reason to believe that it performs well in excluding equally important anxiety disorders, so it cannot be relied on to exclude a broader range of common psychiatric diagnoses. In individual situations where there is a high suspicion of depression, a full clinical assessment for depression is warranted; the Whooley questions may be a good place to start, but they are not enough on their own.
Bosanquet K, Bailey D, Gilbody S, et al (2015). Diagnostic accuracy of the Whooley questions for the identification of depression: a diagnostic meta-analysis. BMJ Open 5:e008913.
Baldwin DS, Anderson IM, Nutt DJ, et al (2014). Evidence-based pharmacological treatment of anxiety disorders, post-traumatic stress disorder and obsessive-compulsive disorder: a revision of the 2005 guidelines from the British Association for Psychopharmacology. J Psychopharmacol 28:403-39.
Baxter AJ, Scott KM, Ferrari AJ, et al (2014). Challenging the myth of an “epidemic” of common mental disorders: trends in the global prevalence of anxiety and depression between 1990 and 2010. Depress Anxiety 31:506-16.
Cleare A, Pariante CM, Young AH, et al (2015). Evidence-based guidelines for treating depressive disorders with antidepressants: a revision of the 2008 British Association for Psychopharmacology guidelines. J Psychopharmacol 29:459-525.
Goldberg D, Privett M, Ustun B, et al (1998). The effects of detection and treatment on the outcome of major depression in primary care: a naturalistic study in 15 cities. Br J Gen Pract 48:1840-4.
National Institute for Health and Care Excellence (2009). Clinical Guideline 90. Depression in adults (update): full guideline.
Whooley M, Avins A, Miranda J, et al (1997). Case-finding instruments for depression. Two questions are as good as many. J Gen Intern Med 12:439-45.