Despite ongoing concern about levels of anxiety and depression among young people (Bor et al., 2014; NHS Digital, 2018) and an increased focus on the need to invest in treatment (e.g., see Sampson, 2016), it remains unclear how often treatment can be considered effective. Previous research in this area has focused on effect sizes, or overall change across the group being studied. This approach does not provide clear information about the extent of individual-level change within the group; that is, just because the group as a whole has improved, it doesn’t mean that everybody within the group has improved (Jensen & Corralejo, 2017; Wolpert, 2017).
The other problem with using effect sizes across a group to measure change is that it doesn’t provide a lot of information about how meaningful any improvements are. A small effect size doesn’t necessarily mean everybody truly improved by a small amount, because it might be that some people improved a lot, and some people didn’t improve at all (or even got worse), but the differences in effects were cancelled out by averaging them. Effect sizes are also reliant on standard thresholds that tell us whether a change is small, moderate, or large, but these don’t always map clearly onto what can be considered “meaningful” in clinical practice (Jacobson & Truax, 1991).
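To make the averaging problem concrete, here is a minimal sketch in Python. The scores are invented for illustration and are not taken from the study, and the effect size is computed as mean reduction divided by the baseline standard deviation, which is only one of several common conventions.

```python
from statistics import mean, stdev

# Hypothetical symptom scores (higher = more symptoms); NOT data from the study.
pre = [20, 22, 24, 26, 28, 20, 22, 24, 26, 28]
# The first five people drop by 12 points; the last five do not change at all.
post = [8, 10, 12, 14, 16, 20, 22, 24, 26, 28]

changes = [after - before for before, after in zip(pre, post)]

# Pre-post effect size: mean reduction divided by the baseline SD
# (an assumed convention; other definitions exist).
effect_size = -mean(changes) / stdev(pre)

# Share of individuals whose score fell at all.
proportion_improved = sum(c < 0 for c in changes) / len(changes)

print(f"effect size: {effect_size:.2f}")
print(f"proportion improved: {proportion_improved:.0%}")
```

In this made-up group the effect size is about 2.0, which standard thresholds would call large, even though only half of the individuals improved.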
As such, examining individual-level rates of improvement can provide clearer information about how frequently treatment can be considered to improve outcomes (Jensen & Corralejo, 2017; Wolpert, 2017). Here, the authors used “reliable change”, which estimates potentially clinically important change among individuals rather than groups. Reliable change metrics indicate whether a change in scores on a scale reflects “meaningful” change rather than random fluctuation or imprecision in the measure (Jacobson & Truax, 1991).
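For reference, the reliable change index as formulated by Jacobson & Truax (1991) can be written as below. The symbols are chosen here for illustration: s is the standard deviation of the measure in a reference sample, r_xx is its reliability, and x_first and x_last are one individual's two scores. Which standard deviation and reliability estimates the study plugged in is not stated in this summary, so treat this as the general formulation rather than the authors' exact calculation.

```latex
S_E = s\sqrt{1 - r_{xx}}, \qquad
S_{\text{diff}} = \sqrt{2\,S_E^{2}}, \qquad
RC = \frac{x_{\text{last}} - x_{\text{first}}}{S_{\text{diff}}}
```

A change is conventionally treated as reliable when |RC| exceeds 1.96, which is equivalent to the raw score changing by more than 1.96 × S_diff.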
The authors analysed the data of 4,464 adolescents across 70 publicly funded services involved in the Children and Young People’s Improving Access to Psychological Therapies (CYP IAPT) programme. Participants were included if:
- their cases were recorded as closed, with at least three recorded events beyond assessment;
- they had completed the same measure of anxiety or depression on at least two time points;
- they were aged 8-18; and
- they scored above age- and gender-appropriate thresholds for anxiety (social phobia, separation anxiety, panic disorder, or generalised anxiety) or depression sub-scales, as measured by the Revised Child Anxiety and Depression Scale (RCADS).
The authors paired the first-ever and last-ever recorded completion of participants’ anxiety and depression measures and assessed change in self-reported symptoms over time. For each sub-scale, they identified the change in score required to indicate “reliable change” rather than measurement error alone (the “reliable change criterion”). The authors then assessed whether cases:
- had reliably improved on at least one sub-scale without reliably deteriorating on any other sub-scale; and
- recovered to below the RCADS threshold for all sub-scales.
Cases which showed both of these forms of improvement were considered to have “reliably recovered”.
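The classification rules above can be sketched in Python as follows. This is an illustration rather than the authors' code: the function name, the sub-scale names, and every number (criteria, thresholds, scores) are hypothetical, and treating a change as reliable only when it strictly exceeds the criterion is an assumption.

```python
def classify_case(first, last, criteria, thresholds):
    """Classify one case from its first and last sub-scale scores.

    All arguments are dicts keyed by sub-scale name. Lower scores mean
    fewer symptoms. `criteria` holds the reliable change criterion per
    sub-scale and `thresholds` the clinical cut-off per sub-scale.
    """
    # Reliable improvement: the score fell by more than the criterion.
    improved = [first[s] - last[s] > criteria[s] for s in first]
    # Reliable deterioration: the score rose by more than the criterion.
    deteriorated = [last[s] - first[s] > criteria[s] for s in first]

    reliably_improved = any(improved) and not any(deteriorated)
    # Recovery: the last score is below the threshold on every sub-scale.
    recovered = all(last[s] < thresholds[s] for s in first)

    return {
        "reliably_improved": reliably_improved,
        "recovered": recovered,
        "reliably_recovered": reliably_improved and recovered,
    }


# Hypothetical values for illustration only; not taken from the RCADS or the study.
criteria = {"depression": 8.0, "generalised_anxiety": 7.0}
thresholds = {"depression": 70, "generalised_anxiety": 70}

case_a = classify_case(
    first={"depression": 78, "generalised_anxiety": 72},
    last={"depression": 65, "generalised_anxiety": 68},
    criteria=criteria,
    thresholds=thresholds,
)
case_b = classify_case(
    first={"depression": 78, "generalised_anxiety": 72},
    last={"depression": 65, "generalised_anxiety": 80},
    criteria=criteria,
    thresholds=thresholds,
)
print(case_a)
print(case_b)
```

In this sketch, case_a counts as reliably recovered, whereas case_b improves on depression but reliably deteriorates on generalised anxiety, so it is counted as neither reliably improved nor recovered.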
Effect sizes were also calculated for all cases above the threshold for each scale to compare findings with individual level findings (n = 1,149 to 3,052 across 75 services). It is important to note that this analysis focused on a larger sample than was used for the reliable change analysis, because the same inclusion criteria weren’t applied.
Findings for reliable improvement showed that:
- Of 1,208 participants initially scoring above the threshold for anxiety,
  - 52.8% were found to have reliably improved,
  - with 45.9% considered to have “reliably recovered” to a point below the threshold.
- Of 621 participants initially scoring above the threshold for depression,
  - 44.3% were classified as reliably improved,
  - while 41.5% were identified as “reliably recovered” to below the threshold.
- Of 2,635 participants who initially scored above the threshold for comorbid depression and anxiety,
  - 34.6% were identified as having reliably improved,
  - with 25.5% considered to have “reliably recovered” to below the threshold.
Pre-post effect sizes across the depression and individual anxiety sub-scales (social phobia, separation anxiety, panic, and generalised anxiety) ranged from 0.8 to 1.4 in size.
The authors concluded that:
these relatively modest rates of individual improvement are in contrast with how the results might have been conceived if this paper had focused on pre-post effect size analyses.
Observed effect sizes ranging from 0.8 to 1.4 would be considered large using standard thresholds, which would have given the impression of substantial change across the group. Examining individual-level rates provides a different picture, giving a more realistic estimate of how often treatment can be considered effective.
Strengths and limitations
This research has a number of strengths:
- The focus on reliable change provides a more robust estimation of the effectiveness of treatment for adolescent depression and anxiety.
- Including a comparison between individual rates and effect sizes is important as it allows examination of the extent of this issue.
- The sample was taken from a large number of public-funded services, offering insight into effectiveness in provision at a general level.
However, there were also some limitations that need to be considered, including:
- Only 49% of possible cases had sufficient data at closure to be included in this study. It’s possible that the other cases experienced different rates of improvement, which might have skewed findings.
- The timing of measure completion may not have been ideal for detecting changes. The last-ever completed measure does not have to have been completed at the end of treatment and could have been carried out during treatment. It’s also plausible that there are long-term effects that cannot be picked up by assessing rates immediately after treatment.
- It was beyond the scope of this study to examine or control for different processes within the treatment received, but reliable change may have been affected by particular elements of treatment. Services were delivering a wide range of different treatments, and some may have been more effective than others. The extent to which individuals engaged in their treatment may also have affected its impact, as discussed by Jessamy Hibberd in a previous Mental Elf blog.
- The use of a score above “abnormal” thresholds for one of the RCADS sub-scales means that findings are limited to those reporting high levels of symptoms. Future research could examine reliable improvement (though not reliable recovery) across those who engaged in services but were not reporting symptoms to this extent.
Implications for practice
Currently there is little available guidance for practitioners on expected improvement rates, and the dominant message given to children and young people is that treatment will be beneficial. The authors suggest that these findings indicate a need to have open and realistic conversations with children, adolescents, and families at the beginning of their engagement with services about the likelihood of improvement. Nevertheless, they note that this presents clear challenges in handling a sensitive issue while also ensuring that individuals begin care feeling hopeful, which can be an important component in therapy (Wampold, 2013).
Practitioners reviewing the evidence base for different forms of treatment should be mindful of the potential pitfalls of relying on effect sizes to examine effectiveness. There is also a need for researchers and service providers to ensure that they themselves are using the best methods to assess change, as well as setting realistic expectations with funders and the general public.
Conflicts of interest
Ola collaborates with Miranda Wolpert on a separate project and other research outputs, but these do not relate to the project or data reported in the study covered here. Ola also has an interest in depression and anxiety during adolescence, but her research focuses on exploring the mechanisms underpinning these symptoms rather than therapeutic treatment experiences or outcomes.
Edbrooke-Childs J, Wolpert M, Zamperoni V, Napoleone E, Bear H. (2018) Evaluation of reliable improvement rates in depression and anxiety at the end of treatment in adolescents. BJPsych Open 2018;4(4):250–5. https://doi.org/10.1192/bjo.2018.31
Bor, W., Dean, A. J., Najman, J., & Hayatbakhsh, R. (2014) Are child and adolescent mental health problems increasing in the 21st century? ANZJP 2014 48(7) 606-616. [abstract]
Hibberd, J. (2013) More frequent psychotherapy may lead to better depression outcomes, says new meta-analysis. The Mental Elf, 9 Jul 2013.
Jacobson, N. S., & Truax, P. (1991) Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology 1991 59(1) 12-19. [abstract]
Jensen, S. A., & Corralejo, S. M. (2016) Measurement issues: Large effect sizes do not mean most people get better – clinical significance and the importance of individual results. Child and Adolescent Mental Health 2016 22(3) 163-166. [abstract]
NHS Digital. Mental health of children and young people in England, 2017 (PDF). NHS Digital website 2018, last accessed 6th March 2019
Sampson, C. (2016) The case for investing in anxiety and depression treatment on a global scale. The Mental Elf, 6 Sep 2016.
Wampold, B. (2013) The Great Psychotherapy Debate: Models, Methods, and Findings. Routledge, 2013.
Wolpert, M. (2017) Commentary: Why measuring clinical change at the individual level is challenging but crucial – commentary on Jensen and Corralejo (2017). Child and Adolescent Mental Health 2017 22(3), 167-169. [abstract]