Two-question screening for depression in older adults


At least one in twenty elderly people have major depressive disorder and two to three times as many have sub-threshold depressive symptoms (Meeks T. et al., 2011). Older people are at particular risk of depression because of physical illness, bereavement and increasing social isolation.

As well as being deeply distressing for the person and their family and friends, depression in older people is associated with increased dementia risk (Ownby R. et al., 2006), worse day-to-day functioning, increased contact with healthcare services (Creed F. et al., 2002) and suicide (Iliffe S. et al., 2010).

The core features of depression in old age are the same as in younger populations:

  • Pervasive low mood
  • Loss of pleasure in activities
  • Feeling guilty or worthless
  • Marked change in appetite, weight or sleep
  • Observable slowing or agitation of movements
  • Tiredness and poor concentration
  • Thoughts of dying or suicide

A range of effective treatments is available, including medication and psychological and social therapies, but treatment rates are lower among older people than among younger adults (Rodda J. et al., 2011). This is because depression is harder to detect in older people, as symptoms are incorrectly attributed to physical illness or dementia (Wells C. et al., 1979), and because of therapeutic nihilism (Burroughs H. et al., 2006).

Though stopping short of recommending screening, NICE does suggest that clinicians be alert to people who may have depression (NICE, 2009).

The authors of a new systematic review (Tsoi et al., 2017) set out to determine whether people with depression can be identified by an extremely short screening test.

Two-question screening test

  1. During the last month, have you often been bothered by feeling down, depressed or hopeless?
  2. During the last month, have you often been bothered by having little interest or pleasure in doing things?

Answering yes to either of these questions is considered a positive test result, warranting further assessment.
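
The scoring rule above is a simple "either/or". As a minimal sketch (the function and parameter names are illustrative and do not come from the review), it could be written as:

```python
def two_question_screen(low_mood: bool, little_interest: bool) -> bool:
    """Return True (screen positive) if either question is answered 'yes'.

    low_mood:        answer to question 1 (feeling down, depressed or hopeless)
    little_interest: answer to question 2 (little interest or pleasure)
    Both names are hypothetical labels chosen for this sketch.
    """
    return low_mood or little_interest


# A 'yes' to only one question is enough to warrant further assessment.
print(two_question_screen(low_mood=False, little_interest=True))
```

A positive result here only flags the need for further assessment; it is not a diagnosis.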

This study is a systematic review and meta-analysis of studies evaluating how the two-question screen performed compared to other instruments. The authors scrutinised review articles to generate a list of depression screening instruments and then searched four scientific databases looking for studies assessing the accuracy of the instruments they found in identifying depression in people aged over 60 years, compared to gold-standard diagnosis using accepted criteria. The authors excluded studies not written in English.

Two researchers extracted information from the included studies about their publication year and location, participants, measurements and results. The authors adapted a rating scale for assessing study quality. Random effects meta-analysis combined the sensitivity, specificity and diagnostic odds ratios from different studies for each screening test. Sensitivity analyses investigated whether diagnostic performance differed in studies of major depressive disorder compared to less severe depression, and for people from clinical settings and nursing homes, compared to those recruited from general populations or primary care.
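
To give a feel for what random effects pooling involves, the sketch below combines per-study sensitivities on the logit scale using the DerSimonian-Laird method. This is an assumption made for illustration: this summary does not state which random effects model the authors used, and diagnostic accuracy reviews often use bivariate models instead. The study counts are hypothetical, not data from the review.

```python
import math


def expit(x: float) -> float:
    """Inverse of the logit transform."""
    return 1.0 / (1.0 + math.exp(-x))


def pool_proportions_dl(counts):
    """Pool proportions with a DerSimonian-Laird random effects model.

    counts: list of (events, non_events) pairs, e.g. (true positives,
            false negatives) when pooling sensitivity.
    Returns (pooled, ci_low, ci_high, tau2), with a 95% confidence interval.
    """
    ys, vs = [], []
    for events, non_events in counts:
        # Apply a 0.5 continuity correction only when a cell is zero.
        if events == 0 or non_events == 0:
            events, non_events = events + 0.5, non_events + 0.5
        ys.append(math.log(events / non_events))    # logit of the proportion
        vs.append(1.0 / events + 1.0 / non_events)  # its approximate variance

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1.0 / v for v in vs]
    sum_w = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum_w
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))

    # Between-study variance (tau squared), truncated at zero.
    k = len(ys)
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0

    # Random effects weights add tau2 to each study's own variance.
    w_star = [1.0 / (v + tau2) for v in vs]
    sum_ws = sum(w_star)
    y_pooled = sum(wi * yi for wi, yi in zip(w_star, ys)) / sum_ws
    se = math.sqrt(1.0 / sum_ws)
    return (expit(y_pooled), expit(y_pooled - 1.96 * se),
            expit(y_pooled + 1.96 * se), tau2)


# Hypothetical (true positive, false negative) counts from six studies.
example_counts = [(45, 5), (80, 4), (30, 9), (60, 3), (25, 6), (110, 12)]
pooled, low, high, tau2 = pool_proportions_dl(example_counts)
print(f"pooled sensitivity {pooled:.2f} (95% CI {low:.2f} to {high:.2f}), tau2 {tau2:.2f}")
```

The more the studies disagree, the larger tau2 becomes and the wider the pooled confidence interval, which is how heterogeneity between studies feeds into the uncertainty of the combined estimate.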

Measuring a screening test’s performance

A new test’s performance is measured against a more thorough, established ‘gold-standard’ test. The perfect new screening test would be brief and easily administered to many people, and would agree completely with the gold-standard, i.e. correctly identify everyone who has the illness and not misdiagnose anyone who is healthy. As this is rarely possible, a trade-off is needed. A test usually aims to pick up almost all unwell people, and it is acceptable if it flags as potentially unwell some people who eventually prove not to have the illness. These ‘false positives’ can be given a clean bill of health later, when more thoroughly assessed.

The following measures of test performance are used in this study:

  • Sensitivity = the proportion of people with the illness who are identified as unwell by the new test.
  • Specificity = the proportion of people who do not have the illness who are correctly identified as healthy.
  • Diagnostic odds ratio = the odds of a positive test result in unwell people divided by the odds of a positive test result in well people.
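
The three measures above can be computed from a simple two-by-two table of test result against gold-standard diagnosis. The sketch below uses hypothetical counts chosen for round numbers; they are not taken from the review.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Proportion of unwell people who test positive."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Proportion of well people who test negative."""
    return tn / (tn + fp)


def diagnostic_odds_ratio(tp: int, fn: int, fp: int, tn: int) -> float:
    """Odds of a positive test in unwell people divided by the odds in well people.

    (tp / fn) / (fp / tn) simplifies to (tp * tn) / (fn * fp).
    Undefined if fn or fp is zero.
    """
    return (tp * tn) / (fn * fp)


# Hypothetical sample: 100 people with depression, 900 without.
tp, fn = 90, 10    # unwell: test positive / test negative
fp, tn = 270, 630  # well:   test positive / test negative

print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"specificity = {specificity(tn, fp):.2f}")
print(f"diagnostic odds ratio = {diagnostic_odds_ratio(tp, fn, fp, tn):.1f}")
```

With these made-up counts the test misses 10 of the 100 unwell people and flags 270 of the 900 well people for further assessment, which illustrates the trade-off between the two kinds of error.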

For comparison, liquid-based cytology, used in the UK national cervical cancer screening programme, has a sensitivity of around 90% and a specificity of around 70% (Coste J. et al., 2003; Arbyn M. et al., 2008).


The authors identified 132 eligible studies assessing 16 instruments and including 40,000 people, of whom nearly 7,000 had depression. The instruments ranged from the 30-question Geriatric Depression Scale (GDS) to a ‘one-question screen’.

From the six studies evaluating the two-question screen, for which significant heterogeneity in results was found, the combined:

  • Sensitivity was 0.92 (95% CI, 0.85 to 0.96)
  • Specificity was 0.68 (95% CI, 0.58 to 0.76)
  • Diagnostic odds ratio (OR) was 23.6 (95% CI, 9.4 to 58.9)

This compares with other commonly used instruments as follows (sensitivity and specificity are given as percentages):

Instrument                                            Sensitivity (%)  Specificity (%)  Diagnostic OR
Two-question screen                                   91.8             67.7             23.6
Geriatric Depression Scale                            82.8             72.2             12.5
Beck Depression Inventory                             85.7             73.5             16.7
Patient Health Questionnaire-9                        83.4             85.8             30.5
Center for Epidemiologic Studies Depression Scale-20  79.7             76.5             12.8
Hamilton Rating Scale for Depression                  88.6             84.9             43.8
One-question screen                                   66.4             82.1             9.0
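
As a rough arithmetic check on the table above, the diagnostic odds ratio implied by a single sensitivity and specificity pair is the odds of sensitivity multiplied by the odds of specificity. The sketch below applies this to the tabulated figures. A pooled diagnostic odds ratio from a meta-analysis need not equal this product exactly, so small differences are expected; here the implied values land within about 0.2 of the reported ones.

```python
def odds(p: float) -> float:
    """Convert a proportion into odds."""
    return p / (1.0 - p)


def implied_dor(sens: float, spec: float) -> float:
    """Diagnostic odds ratio implied by one sensitivity/specificity pair.

    [sens / (1 - sens)] / [(1 - spec) / spec] = odds(sens) * odds(spec)
    """
    return odds(sens) * odds(spec)


# (sensitivity %, specificity %, reported diagnostic OR), copied from the table.
instruments = {
    "Two-question screen": (91.8, 67.7, 23.6),
    "Geriatric Depression Scale": (82.8, 72.2, 12.5),
    "Beck Depression Inventory": (85.7, 73.5, 16.7),
    "Patient Health Questionnaire-9": (83.4, 85.8, 30.5),
    "Center for Epidemiologic Studies Depression Scale-20": (79.7, 76.5, 12.8),
    "Hamilton Rating Scale for Depression": (88.6, 84.9, 43.8),
    "One-question screen": (66.4, 82.1, 9.0),
}

for name, (sens_pct, spec_pct, reported) in instruments.items():
    implied = implied_dor(sens_pct / 100.0, spec_pct / 100.0)
    print(f"{name}: implied {implied:.1f}, reported {reported:.1f}")
```

The calculation also shows why the two-question screen has a respectable diagnostic odds ratio despite its modest specificity: its high sensitivity contributes large odds to the product.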

Results for the two-question screen held up when only studies of major depressive disorder were assessed (sensitivity = 89.8%, specificity = 66.2%). The instruments for which comparison between clinical and community settings was possible seemed to be more accurate in clinical settings.

The authors conclude that the two-question screen is comparable in accuracy to other instruments and suggest that, given its acceptability, which stems from its brevity and ease of use, it should be favoured over other instruments when screening for depression in older people.

Strengths and limitations

Overall, this is an informative and well-conducted study which aids our understanding of screening instruments for depression in the elderly. The methodology was appropriate and the authors’ conclusions are accurate. However, the reporting of this study could have been more thorough and some caution is needed when interpreting the results.

  • Reporting of methods and results should be completely transparent. Some journals require authors to complete a PRISMA checklist (Moher D. et al., 2009) reporting adherence to gold-standard conduct, and encourage prospective registration of study protocols in a database such as PROSPERO. This paper includes neither, so some information about the study is lacking.
  • A search strategy should be sufficiently detailed to allow a reader to reproduce the search. However, we are not provided with the full search terms and, rather than a single search looking for studies evaluating screening tools for depression in older people, the authors used a two-stage approach of generating a list of screening tools and then seeking studies for each instrument, using keywords of ‘depression’ and ‘elderly’. They also did not contact experts in the field, ask for unpublished data or include non-English language articles. These deviations from best practice may have resulted in eligible screening tools not being found.
  • Quality ratings for studies (online appendix DS1) are high; each tool scores a median of 7 or 8 out of 8, which raises questions about how critical these rating criteria were. The quality rating was not used in the analysis, and it would have been interesting to see a sensitivity analysis of only the highest quality studies.
  • We are not provided with the full information extracted from studies; instead, summary information is presented for each screening instrument. Considering the high level of heterogeneity observed in the two-question screen results (heterogeneity statistics for other instruments are not presented), a detailed table, e.g. in an online appendix, would have let the interested reader look for potential sources of heterogeneity.
  • A potential source of heterogeneity is participant eligibility criteria differing between studies. While there is benefit in including a wide range of studies, as the authors here have done, this might have been better restricted (or examined with sensitivity analyses). Depression in dementia has different causative factors and can present with different symptoms to depression in non-demented older people (Korczyn A. et al., 2009), so it is unusual to analyse general depression screens together with a specialised screen such as the Cornell Scale for Depression in Dementia. Similarly, the six included studies for the two-question screen included such diverse participants as people with Parkinson’s disease, those receiving palliative care, people with coronary heart disease and patients on an acute medical ward. The nature of depressive symptoms in these diverse settings may well have differed and affected diagnostic performance.
  • Finally, for the main outcome regarding the two-question screen, the included studies were set only in Ireland, the US and the UK, so we can only really apply the main conclusion about the performance of this test to these and culturally similar settings. The meaning of these questions to non-English-speakers may well be different and further testing would be needed to evaluate their use in other populations.

This study summarises the published evidence and finds that the two-question screen performs comparably to other brief instruments for detecting depression in older people, so it can readily be used by clinicians to identify those who need more thorough assessment.

Though screening is not recommended, this instrument’s performance is comparable to that of tests used in national screening programmes.

Considering depression’s prevalence and the high number of at-risk older people who are in contact with healthcare services, including these two simple questions in your assessment might help to make a difference.

Primary paper

Tsoi KKF, Chan JYC, Hirai HW, Wong SYS. (2017) Comparison of diagnostic performance of Two-Question Screen and 15 depression screening instruments for older adults: systematic review and meta-analysis. The British Journal of Psychiatry 1–6. doi: 10.1192/bjp.bp.116.186932
