Most individuals who experience a first episode of psychosis (FEP) will remit (Crespo-Facorro et al., 2016). However, around 1 in 4 (23-24%) will be classified as having “treatment-resistant schizophrenia” (TRS). Treatment resistance can be very debilitating and often involves poorer quality of life, higher healthcare costs and a larger societal burden (Kennedy et al., 2014). The only current effective treatment for TRS is an antipsychotic called clozapine (Lally et al., 2016). However, there is evidence that the later in the course of illness clozapine is prescribed, the less effective it is (Howes et al., 2012; John et al., 2018). Given the substantial harms of TRS, and that the only effective treatment loses efficacy the later it is administered, identifying those who are likely to develop TRS is of vital importance.
Methods to predict who will develop TRS have been studied repeatedly, and this group of patients has frequently been reported to show identifiable differences from those who remit. For example, in a systematic review, Bozzatello et al. (2019) found that TRS could be predicted by multiple factors, including demographics, co-morbidities, trajectory of illness, and several neurobiological factors. However, creating a prediction model from most of these measures is impractical – we cannot conduct imaging studies on every patient, and many of these measures are not routinely collected in clinical settings.
Osimo et al. (2023) set out to address this issue by developing a prediction model based only on routinely collected clinical measures. Using this approach, the study aimed to develop a prediction model suitable for current clinical settings, while still effectively identifying those who would likely later develop TRS.
Osimo et al. (2023) made use of data from three UK-based Early Intervention in Psychosis (EIP) services. To develop the model, data from 758 patients with FEP from the Cambridgeshire and Peterborough Assessing, Managing and Enhancing Outcomes EIP and the Birmingham EIP were used. Data from 1,100 other patients with FEP from the South London and Maudsley NHS Foundation Trust EIP were then used to validate the model. TRS was defined as treatment with clozapine at any point during the follow-up period. Measures used to predict patient outcome included several sociodemographic and biological measures, such as sex, age, smoking status, and lymphocyte blood cell counts.
Two predictive modelling approaches were used:
- Forced-entry logistic regression model – all predictors were added to the model simultaneously.
- LASSO-based selection model – the model retains only the predictors that contribute to prediction, shrinking the coefficients of the others to zero.
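For readers unfamiliar with the distinction, here is a minimal sketch of the two approaches using scikit-learn and synthetic data – an illustration of the general idea only, not the study’s own code, predictors or settings:

```python
# Illustrative sketch (not the authors' code): forced-entry vs LASSO-type
# logistic regression on synthetic data with some "noise" predictors.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 6))            # 6 candidate predictors
logit = 1.2 * X[:, 0] - 0.8 * X[:, 1]  # only the first two truly matter
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Forced entry: every predictor stays in the model
# (C=1e6 makes the default penalty negligible, approximating no penalty).
forced = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)

# LASSO: the L1 penalty can shrink irrelevant coefficients to exactly zero,
# performing predictor selection as part of the fit.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.05).fit(X, y)

print("forced-entry coefficients:", forced.coef_.round(2))
print("LASSO coefficients:      ", lasso.coef_.round(2))
```

Typically the forced-entry fit keeps a (small) coefficient on every predictor, while the LASSO fit zeroes out some of the uninformative ones – the selection step the bullet above refers to.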
The primary outcome was discrimination, i.e. whether the models could correctly predict if someone would be in the TRS group or not. Following the external validation, the models were recalibrated using a “fine-tuning” method called logistic recalibration, in which the model’s probability estimates are adjusted based on the actual proportion of individuals who had TRS in the external sample. Additionally, where the association between an individual predictor and TRS status differed distinctly between the internal and external samples, a single predictor was removed and the model recalibrated; this was limited to one predictor per model. Finally, an analysis technique referred to as decision curve analysis was performed to assess the potential clinical benefit of the final models.
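As a rough illustration of logistic recalibration (on synthetic data – the authors’ actual implementation may differ), the adjustment amounts to refitting an intercept and slope on the original model’s logit in the external sample:

```python
# Hedged sketch of logistic recalibration: refit intercept and slope on the
# original model's logit using the external sample's observed outcomes.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Synthetic stand-ins: p_old plays the role of the original model's predicted
# probabilities in the external sample, y the observed TRS outcomes there.
true_logit = rng.normal(-1.0, 1.5, size=800)
y = (rng.random(800) < 1 / (1 + np.exp(-true_logit))).astype(int)
p_old = 1 / (1 + np.exp(-(0.5 * true_logit + 1.0)))  # mis-calibrated model

logit_old = np.log(p_old / (1 - p_old)).reshape(-1, 1)

# One intercept + one slope: the probabilities are pulled towards the
# external sample's actual event rate.
recal = LogisticRegression().fit(logit_old, y)
p_new = recal.predict_proba(logit_old)[:, 1]

print("observed TRS rate:", y.mean().round(3))
print("mean prediction before:", p_old.mean().round(3), "after:", p_new.mean().round(3))
```

Because this adjustment only rescales the logit, it mainly improves calibration (how well probabilities match observed rates) rather than the ranking of patients.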
The forced-entry logistic regression (MOZART) model ultimately included all sociodemographic measures available (e.g., sex, age, ethnicity) and three biological predictors (triglyceride concentration, lymphocyte counts, and alkaline phosphatase levels). These were chosen based on previous research, clinical knowledge and clinical utility. The LASSO-based selection model included all predictors from the MOZART model plus additional measures of smoking status, BMI, random plasma glucose and neutrophil blood cell count. Because the LASSO method has a predictor selection step and is less vulnerable to overfitting (making the model too specific to one dataset, and less applicable to other samples), these additional predictors could be included despite the smaller sample. As with all real-world clinical data, not all predictors were available for all participants. The authors’ solution was multiple imputation, a method that accounts for missing data by filling in missing values using the information available in the rest of the dataset. Additionally, a technique called bootstrapping was used, which repeatedly resamples the dataset with replacement to estimate how accurate and stable the model is.
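To make the idea of multiple imputation concrete, here is a hedged sketch using scikit-learn’s IterativeImputer. Everything below (the variables, the number of imputations, the pooled quantity) is a simplified stand-in, not the authors’ pipeline:

```python
# Hedged sketch of multiple imputation: create several completed copies of a
# dataset with missing values, analyse each copy, then pool the results.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 3))
X[:, 2] += 0.7 * X[:, 0]              # correlated columns help imputation
mask = rng.random(X.shape) < 0.15     # ~15% of values missing at random
X_missing = np.where(mask, np.nan, X)

# m imputed datasets: sample_posterior=True adds draw-to-draw variability,
# which is what makes this *multiple* rather than single imputation.
m = 5
column_means = []
for i in range(m):
    imputer = IterativeImputer(sample_posterior=True, random_state=i)
    X_imp = imputer.fit_transform(X_missing)          # no NaNs remain
    column_means.append(X_imp.mean(axis=0))           # stand-in for a model estimate

pooled = np.mean(column_means, axis=0)  # Rubin-style pooling of the m estimates
print("pooled column means:", pooled.round(2))
```

In the study itself, the quantity estimated in each imputed dataset would be the prediction model and its coefficients, not a simple column mean.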
The C-statistic was 0.70 (95% CI 0.63 to 0.76) for the MOZART model and 0.69 (95% CI 0.63 to 0.77) for the LASSO model. A C-statistic of 0.70 or above is conventionally considered “good”, so both models performed at a fair-to-good level.
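The C-statistic reported here is simply the probability that a randomly chosen person who developed TRS received a higher predicted risk than a randomly chosen person who did not. A small sketch on synthetic data (not the study’s data) of computing it, together with a bootstrap confidence interval of the kind described above:

```python
# Illustrative only: C-statistic and a bootstrap CI on synthetic data.
import numpy as np

def c_statistic(y, p):
    """Probability that a randomly chosen case gets a higher predicted
    probability than a randomly chosen non-case (equivalent to ROC AUC)."""
    pos, neg = p[y == 1], p[y == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()  # ties count half
    return greater + 0.5 * ties

rng = np.random.default_rng(2)
n = 600
risk = rng.normal(size=n)
p = 1 / (1 + np.exp(-(risk - 1)))               # predicted probabilities
y = (rng.random(n) < p).astype(int)             # observed outcomes

point = c_statistic(y, p)

# Bootstrap: resample the dataset with replacement (same size each time)
# and recompute the C-statistic to gauge its uncertainty.
boots = []
for _ in range(200):
    idx = rng.integers(0, n, size=n)
    boots.append(c_statistic(y[idx], p[idx]))
lo, hi = np.percentile(boots, [2.5, 97.5])
print(f"C-statistic {point:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```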
A limitation of many prediction models is that they are never externally validated, i.e. it is never determined whether the identified measures are useful for predicting the outcome in general, or only in the specific sample used in the study. Osimo et al. tested both models in a separate sample. The C-statistic dropped to 0.63 (95% CI 0.58 to 0.69) for the MOZART model and 0.64 (95% CI 0.58 to 0.69) for the LASSO model, meaning the models would now be classified as “fair” for prediction. However, this step of external validation allows for better confidence in the models’ reliability, robustness, and applicability to different groups, and so it was an important one to include.
Following the external validation, the researchers recalibrated the models on the external validation group using logistic recalibration. The MOZART model now had a C-statistic of 0.67 (95% CI 0.62 to 0.73) – a small improvement. The LASSO model, however, did not improve, with a C-statistic of 0.64 (95% CI 0.58 to 0.69).
The decision curve analysis on the MOZART model showed that the predictive model had a slight benefit in a clinical setting, although this was not substantial overall. Osimo et al. (2023) also produced a useful data-visualisation tool to better show how each predictive measure in the model affects an individual’s chance of developing TRS. You can check out this tool for yourself here – https://eosimo.shinyapps.io/trs_app/.
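For the curious, decision curve analysis boils down to computing a “net benefit” at each threshold probability and comparing the model against the default strategies of treating everyone or treating no one. A hedged sketch on synthetic data (not the study’s data or thresholds):

```python
# Illustrative net-benefit calculation underlying decision curve analysis.
import numpy as np

def net_benefit(y, p, pt):
    """Net benefit at threshold probability pt: true positives credited,
    false positives penalised by the odds of the threshold."""
    n = len(y)
    treat = p >= pt
    tp = np.sum(treat & (y == 1)) / n
    fp = np.sum(treat & (y == 0)) / n
    return tp - fp * pt / (1 - pt)

rng = np.random.default_rng(3)
n = 1000
risk = rng.normal(size=n)
p = 1 / (1 + np.exp(-(risk - 1)))        # model's predicted probabilities
y = (rng.random(n) < p).astype(int)      # outcomes consistent with the model

for pt in (0.1, 0.2, 0.3):
    nb_model = net_benefit(y, p, pt)
    nb_all = y.mean() - (1 - y.mean()) * pt / (1 - pt)  # "treat everyone"
    print(f"pt={pt:.1f}  model={nb_model:.3f}  treat-all={nb_all:.3f}  treat-none=0")
```

A model is clinically useful at a given threshold only when its net benefit exceeds both default strategies; plotting this across thresholds gives the decision curve.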
Osimo et al. (2023) set out to address a limitation of many previous prediction models developed in research, i.e., the impracticality of implementing these models in clinical settings. By limiting the predictive model to routinely collected data and assessing its generalisability, they produced an externally validated prediction model with genuine potential for clinical use.
The final recalibrated MOZART model was the most useful following this analysis, but it only reached fair-to-good accuracy in identifying those who would develop TRS. While it showed no significant benefit for decisions about more invasive interventions, it could add clinical utility for low-level interventions.
Strengths and limitations
The key strength of this study is its focus on clinical utility. The samples used are all clinically based, meaning the model was developed and validated in a relevant and applicable population. Similarly, the use of routinely collected clinical measures means that the prediction model could be implemented effectively in clinical settings.
Secondly, the statistical methods used are robust, and address limitations of many prediction models previously presented in the field. Specifically, the current study includes an appropriate sample size, clearly reports its methods, uses multiple imputation to address missing data, conducts sufficient external validation in an appropriate and relevant group, and clearly reports the calibration, recalibration and discrimination measures used. All of these are previously identified limitations of past prediction models in psychosis (Lee et al., 2022), and the statistical approaches this paper uses to address them are excellent.
However, with this strong focus on clinical utility comes a trade-off in the practicalities of using pre-existing FEP service data: the authors had no control over which measures were collected. Notably, TRS in this study was classified as “received clozapine treatment”. As the authors themselves note, clozapine can be given to people who do not actually have TRS, and co-morbidity as a reason for prescribing clozapine could not be excluded using the available data. Similarly, while the logic of using routinely collected clinical measures is valuable for future use, the measures available did not produce a strong predictive model, and the clinical utility of the models produced is limited.
Implications for practice
As we continue to attempt to bridge the gap between research and clinical utility, this paper is incredibly valuable as a guide for future research. It is clear that the authors kept this as a primary objective, central to all stages of their project. Choosing clinical samples and clinical measures to develop and validate their models increases their practicality and applicability.
On the other side of this coin, the model yielded was not as accurate as we would ideally want a prediction model to be, and its clinical benefit did not hold up well against many other types of intervention. The measures identified produced a fair model, but to accurately predict who will develop TRS – and ultimately improve their outcomes – we need models of a higher standard. Given such a robust methodology, the conclusion may be that clinical services need to accept the necessity of including additional measures in routine clinical collection. Previous research has identified a substantial number of predictors that are not routinely collected (Bozzatello et al., 2019) but could in theory be implemented. For example, negative symptoms can be screened for during routine clinical interviews (Galderisi et al., 2021).
We often focus on how research doesn’t translate to clinical settings, but studies like this, which work within current clinical standards, highlight the need for clinical settings to also adapt to aid research. However, there are many reasonable and practical complications to such adjustments, including the extra time required to screen additional measures, the need for further staff training, and the adaptations to current clinical practices and policies that such measures would require. At a time when many mental health services are under extreme duress, it is not sensible, or appropriate, simply to demand that clinical settings adopt additional screening and measurement standards. Instead, this must be a more collaborative process, with the focus on achievable implementation rather than a perfect design.
Statement of interests
Lorna Staines has no conflicts of interests to declare.
Osimo, E. F., Perry, B. I., Mallikarjun, P., Pritchard, M., Lewis, J., Katunda, A., Murray, G. K., Perez, J., Jones, P. B., Cardinal, R. N., Howes, O. D., Upthegrove, R., & Khandaker, G. M. (2023). Predicting treatment resistance from first-episode psychosis using routinely collected clinical information. Nature Mental Health, 1(1), 1. https://doi.org/10.1038/s44220-022-00001-z
Bozzatello, P., Bellino, S., & Rocca, P. (2019). Predictive Factors of Treatment Resistance in First Episode of Psychosis: A Systematic Review. Frontiers in Psychiatry, 10, 67. https://doi.org/10.3389/fpsyt.2019.00067
Crespo-Facorro, B., Pelayo-Teran, J. M., & Mayoral-van Son, J. (2016). Current Data on and Clinical Insights into the Treatment of First Episode Nonaffective Psychosis: A Comprehensive Review. Neurology and Therapy, 5(2), 105–130. https://doi.org/10.1007/s40120-016-0050-8
Galderisi, S., Mucci, A., Dollfus, S., Nordentoft, M., Falkai, P., Kaiser, S., Giordano, G. M., Vandevelde, A., Nielsen, M. Ø., Glenthøj, L. B., Sabé, M., Pezzella, P., Bitter, I., & Gaebel, W. (2021). EPA guidance on assessment of negative symptoms in schizophrenia. European Psychiatry, 64(1), e23. https://doi.org/10.1192/j.eurpsy.2021.11
Howes, O. D., Vergunst, F., Gee, S., McGuire, P., Kapur, S., & Taylor, D. (2012). Adherence to treatment guidelines in clinical practice: Study of antipsychotic treatment prior to clozapine initiation. The British Journal of Psychiatry: The Journal of Mental Science, 201(6), 481–485. https://doi.org/10.1192/bjp.bp.111.105833
John, A. P., Ko, E. K. F., & Dominic, A. (2018). Delayed Initiation of Clozapine Continues to Be a Substantial Clinical Concern. Canadian Journal of Psychiatry. Revue Canadienne de Psychiatrie, 63(8), 526–531. https://doi.org/10.1177/0706743718772522
Kennedy, J. L., Altar, C. A., Taylor, D. L., Degtiar, I., & Hornberger, J. C. (2014). The social and economic burden of treatment-resistant schizophrenia: A systematic literature review. International Clinical Psychopharmacology, 29(2), 63. https://doi.org/10.1097/YIC.0b013e32836508e6
Lally, J., Gaughran, F., Timms, P., & Curran, S. R. (2016). Treatment-resistant schizophrenia: Current insights on the pharmacogenomics of antipsychotics. Pharmacogenomics and Personalized Medicine, 9, 117–129. https://doi.org/10.2147/PGPM.S115741
Lee, R., Leighton, S. P., Thomas, L., Gkoutos, G. V., Wood, S. J., Fenton, S.-J. H., Deligianni, F., Cavanagh, J., & Mallikarjun, P. K. (2022). Prediction models in first episode psychosis: A systematic review and critical appraisal. The British Journal of Psychiatry: The Journal of Mental Science, 220(4), 179–191. https://doi.org/10.1192/bjp.2021.219
Osimo, E. F., Perry, B. I., Cardinal, R. N., Lynall, M.-E., Lewis, J., Kudchadkar, A., Murray, G. K., Perez, J., Jones, P. B., & Khandaker, G. M. (2021). Inflammatory and cardiometabolic markers at presentation with first episode psychosis and long-term clinical outcomes: A longitudinal study using electronic health records. Brain, Behavior, and Immunity, 91, 117–127. https://doi.org/10.1016/j.bbi.2020.09.011