Depression in young people is a huge health challenge. The peak age of onset is in the mid-teens, fewer than half of the young people who are affected maintain their recovery into adult life, and many experience recurrences, some of them repeatedly (Costello & Maughan, 2015). Indeed, the World Health Organization attributes the greatest burden of disease among all conditions in 12-19 year olds to depression (WHO, 2017).
Like many health conditions, depressive symptoms are normally distributed, so that as with height and weight, most of us have some mild issues, a very lucky few have almost none, and a small minority are severely impaired and can be considered to have depressive disorder. As depression tends to run a fluctuating course, it can be difficult to clearly define when and how things have improved (Costello & Maughan, 2015; Krause et al, 2019). This poses a challenge to those who are evaluating the effectiveness of interventions in routine practice as well as research.
An added complication of outcome assessment is the use of multiple informants, which increases the accuracy of measurements (Garb, 2005), but conflicting reports are common (Achenbach et al, 1987). Parents, young people and practitioners will have different frames of reference and experiences, which are often reflected in different opinions and responses to outcome measures, even when completing the same questionnaire (Ford & Parker, 2016).
This study (Krause et al, 2019) aimed to explore:
- which types of change were being used to assess the outcomes of treatment for depression,
- which measures/questionnaires were most often used,
- how often young people’s reports were included, and
- whether this had changed between 2007 and 2017.
The reviewers searched 3 medical databases for studies that reported the outcomes of treatment for depression (defined as diagnosis, seeking help or referral for treatment) among young people aged 12-19 years. Studies had to be peer reviewed, but there was no restriction on study design, setting or type of intervention. Several comorbidities (but importantly not anxiety) were excluded, as were pilot studies/feasibility studies, or those focused on prevention, maintenance, treatment adherence or engagement.
The authors consulted manuals and existing taxonomies for outcome measures to map the primary domain they measured, and the resulting coding framework was also applied to the 3 included qualitative studies. The authors used an abbreviated form of the Downs and Black (1998) checklist to assess the quality of the 92 studies that used quantitative methods.
There were 95 included studies, two thirds of which were conducted in North America, and a similar proportion were randomised controlled trials (RCTs). Most took place in outpatient settings. Half the included studies were testing novel interventions or adaptations to existing interventions, while the others were evaluating factors that influence treatment response or longer term outcomes. Several of the latter comprised secondary analyses of large RCTs, which means some of the studies draw from the same datasets.
Studies varied in quality (scores 10-22 out of a possible 23). Randomised studies scored higher on study quality (n = 72; mean 18.1, SD 2.0) than non-randomised studies (n = 20; mean 13.7, SD 2.2).
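To give a feel for the size of that quality gap, the summary figures above can be turned into a standardised mean difference. The review itself does not report an effect size, so this is only a back-of-envelope sketch: it assumes the two groups can be compared with a pooled-SD Cohen's d, it works from the rounded means and SDs quoted above, and the function and variable names are made up for illustration.

```python
import math


def pooled_sd(n1, sd1, n2, sd2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def cohens_d(n1, mean1, sd1, n2, mean2, sd2):
    """Standardised mean difference (group 1 minus group 2)."""
    return (mean1 - mean2) / pooled_sd(n1, sd1, n2, sd2)


def combined_mean(n1, mean1, n2, mean2):
    """Sample-size-weighted mean across both groups."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


# Quality scores as quoted in the text (rounded values, so results are approximate).
randomised = {"n": 72, "mean": 18.1, "sd": 2.0}
non_randomised = {"n": 20, "mean": 13.7, "sd": 2.2}

d = cohens_d(
    randomised["n"], randomised["mean"], randomised["sd"],
    non_randomised["n"], non_randomised["mean"], non_randomised["sd"],
)
overall = combined_mean(
    randomised["n"], randomised["mean"],
    non_randomised["n"], non_randomised["mean"],
)

print(f"Cohen's d: {d:.2f}")
print(f"Mean quality score across all 92 quantitative studies: {overall:.1f}")
```

On these figures the difference works out at roughly two pooled standard deviations, a very large gap, although part of it may simply reflect that the checklist contains items on randomisation that non-randomised studies cannot satisfy.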
The authors identified 10 overarching domains of outcome; the mean number of domains assessed was 2.1 per study. The mean number of outcomes measured was 4, with a somewhat symmetrical range: from a single outcome measure in 14 studies to a maximum of 14 measures in 1 study.
- Nearly all (94%) studies assessed depressive symptoms as their outcome, and for 57, this was the primary outcome
- The second commonest outcome was functioning (in 52% of studies, primary outcome in 27)
- Nearly a fifth of studies measured change in comorbid symptoms
- 3% assessed cognitive or behavioural patterns associated with depression
- The other domains in their coding framework (interpersonal relationships, personal growth, service satisfaction, quality of life, parental symptoms and physical health) were less commonly evaluated.
Interestingly, there was an increase in the number of domains assessed over time, and smaller studies tended to cover more domains than larger studies. Overall, the young person’s self-report was collected in 53% of studies, but practitioner report was the primary outcome measure in 75%. There was no clear pattern over time in relation to the type of informant, but clinician report dominated functioning, and young person’s report was the main source of information for personal growth, service satisfaction and interpersonal relationships.
- Studies of treatment for depression have mostly focused on the symptoms related to depression, or how well the young person can function, often as reported by practitioners rather than young people themselves.
- These are not necessarily the domains that young people or parents would see as the most important, nor the most valid measures of recovery.
Strengths and limitations
This paper systematically explored the important issue of what outcome measures, completed by which informants, were most commonly used in the peer reviewed literature. The study has several methodological strengths, and the amount of work in conducting such a study should not be underestimated.
The review included a wide range of studies that were addressing different types of questions, which can greatly complicate the synthesis of findings. The authors clearly applied systematic review approaches, and checked the reliability of the full text screening by double screening 10%, as well as double-data extraction in 25% of studies. Although not made explicit in the paper, the study was registered on the PROSPERO database prior to data extraction, and this work represents part of a wider study that also examined outcomes in anxiety. The PRISMA diagram clearly illustrates the selection of studies and describes inclusion and exclusion criteria, so their work could be replicated. The authors evaluated study quality using a validated checklist, although the reference supplied is for the full initial version, so it is not clear which abbreviated version they used.
The authors point out that they necessarily focused on published work, and that this may not reflect clinical practice. As the aim was to explore the measures used, I was not clear why feasibility or pilot studies were excluded, as their aim is often to test the feasibility and acceptability of different measures. Similarly, there was no double screening of titles and abstracts, only partial double screening of full texts and data extraction, and no information about how disagreements were handled or the level of agreement on data extraction. Given that some studies were based on data from the same RCTs, we might assume that some of them shared outcome measures. In this situation, many reviewers would select one study to represent these data to avoid double counting.
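For readers less familiar with how the level of agreement between two reviewers is usually summarised, one common choice is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is only an illustration: the screening decisions are entirely hypothetical (the review reports none), and the function name is made up for this example.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must rate the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters made the same decision.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    if expected == 1:
        # Both raters used a single identical category throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical full-text screening decisions for ten papers (not data from the review).
reviewer_1 = ["include", "include", "exclude", "exclude", "exclude",
              "include", "exclude", "exclude", "include", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude", "exclude",
              "include", "exclude", "include", "include", "exclude"]

kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"Cohen's kappa: {kappa:.2f}")
```

In this made-up example the reviewers agree on 8 of 10 papers, but once chance agreement is taken into account kappa is noticeably lower than 0.8, which is why reporting raw percentage agreement alone can flatter a screening process.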
The coding framework was complex and I can see the inherent difficulty in developing it. Few would doubt the importance of patient satisfaction with services or the role of parental mental health in treatment response, but are they really valid outcomes of treatment for depression in young people? In addition, goals-based outcomes were classified with patient satisfaction, whereas they could be conceptualised as a measure of function or symptoms, depending on what the goal was.
There is often a trade-off in terms of resources between the number of measures and the sample size, which was clearly demonstrated by this study. RCTs push hard for data completeness, which may favour clinician report as a primary outcome because it can be easier to ensure completion.
Implications for practice
Much of the research literature prioritises clinician report, yet most practitioners would accept the importance of multi-informant assessment and the need to hear the young person’s opinion. Most are skilled in balancing conflicting reports, and do so by applying sophisticated judgements that are difficult to operationalise in research.
Most studies also focus on symptoms and global functioning rather than other potentially important outcomes. The evidence base should value the outcomes of most importance to young people and their families, which might well give rise to faster improvements in effectiveness and novel targets for treatment. Given how important being able to cope with school is to children’s outcomes and mental health, it is a notable gap that none of the measures addressed attendance and performance at school.
Conflicts of interest
References
Krause KR, Bear HA, Edbrooke-Childs J, Wolpert M. (2019) Review: What Outcomes Count? A Review of Outcomes Measured for Adolescent Depression Between 2007 and 2017. J Am Acad Child Adolesc Psychiatry. 2019 Jan;58(1):61-71. Epub 2018 Oct 29. https://doi.org/10.1016/j.jaac.2018.07.893
Costello EJ & Maughan B. (2015) Annual Research Review: optimal outcomes of child and adolescent mental illness. Journal of Child Psychology and Psychiatry 2015; 56: 324-341.
WHO (2017) Depression and Other Common Mental Disorders: Global Health Estimates. Geneva: World Health Organization; 2017. Licence: CC BY-NC-SA 3.0 IGO.
Garb HN. (2005) Clinical judgement and decision making. Clinical Psychology 2005; 1: 67.
Achenbach TM, McConaughy SH, Howell CT. (1987) Child/adolescent behavioral and emotional problems: implications of cross-informant correlations for situational specificity. Psychol Bull. 1987 Mar;101(2):213-32.
Ford T, Parker C. (2016) Emotional and behavioural difficulties and mental (ill) health. Emotional and Behavioural Difficulties 21(1) 1-7.
Downs SH, Black N. (1998) The feasibility of creating a checklist for the assessment of the methodological quality both of randomised and non-randomised studies of health care interventions. J Epidemiol Community Health. 1998;52(6):377-384.