Can proteomics improve our prediction of depression remission?


An important characteristic of major depressive disorder (MDD) is that symptoms vary considerably from patient to patient (Musliner et al. 2016). Treatment success is also patient specific, with 20-25% of patients with MDD at risk of developing chronic depression (Penninx et al. 2011). Therefore, recent research has sought biomarkers that can help guide treatment decisions and improve the prediction of treatment outcomes (Gadad et al. 2018). The hope is that tailoring treatments to patients, particularly those at high risk, could improve the remission rate.

It is known that chronic depression is associated with a specific set of clinical, psychological and biological characteristics, including:

longer symptom duration, increased symptom severity and earlier age of onset; higher levels of neuroticism and lower levels of extraversion and conscientiousness; and various inflammatory markers, low levels of vitamin D, metabolic syndrome and lower cortisol awakening response (Habets et al. 2023)

However, previous studies that have attempted to predict individual-level treatment response have not been very successful. Therefore, Habets et al. (2023) used multi-omics data (lipid-metabolomics, proteomics, transcriptomics, and genetics), demographic, physiological and clinical data in combination with a non-linear prediction method to capture the complex pathophysiology of MDD. The authors generated different prediction models and evaluated how well each model predicts MDD remission after two years.


Depression symptoms and response to treatment differ vastly from patient to patient, and models attempting to predict treatment response have not been very successful to date.


A total of 804 participants from the Netherlands Study of Depression and Anxiety (NESDA) were included. All participants had a diagnosis of depression or dysthymia in the 6 months prior to participation, assessed with the Composite International Diagnostic Interview (CIDI), and they were assessed again with the same tool at a 2-year follow-up.

In addition to routine clinical data such as depressive symptom severity, proteomics (available for n = 611), lipid-focused metabolomics (n = 790), transcriptomics (n = 669) and genetic data (n = 701) were collected. For every set of data, a prediction model was trained separately with a non-linear technique called XGBoost, using cross validation. This means that the model was repeatedly trained on subsets of the data and evaluated on the left-out data to assess how well it can generalise to unseen data. Additional models were then trained using a combination of the clinical data and each of the omics datasets, as well as a combination of all datasets. The performance of each prediction model was assessed by the area under the receiver operating characteristic curve (AUROC) in a separate test set (20% of the total sample). The AUROC is a measure of how well the model separates the two outcomes, where 0.5 is equivalent to guessing and 1 means a perfect prediction (i.e., maximal sensitivity and specificity).
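To make the evaluation terms above more concrete, here is a minimal Python sketch of what an AUROC measures and how cross validation splits a sample. It is not the authors' pipeline: no XGBoost model is trained, the function names `auroc` and `k_fold_indices` are illustrative, and the labels and scores are made-up toy values.

```python
import random


def auroc(labels, scores):
    """Chance that a random positive case outscores a random negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample once, then yield (train, validation) index lists for k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, validation


# Toy example (made-up values): 1 = remission, 0 = no remission.
labels = [1, 1, 1, 0, 0, 0]
perfect = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]   # every remitter scores higher
uninformative = [0.5] * 6                  # same score for everyone
partly = [0.9, 0.4, 0.7, 0.6, 0.2, 0.1]    # one remitter ranked too low

print(auroc(labels, perfect))        # 1.0
print(auroc(labels, uninformative))  # 0.5
print(auroc(labels, partly))         # 8/9, about 0.89

for train, validation in k_fold_indices(n_samples=10, k=5):
    print(len(train), len(validation))  # 8 training and 2 validation cases per fold
```

The pairwise definition used here is equivalent to the area under the ROC curve; it is quadratic in the sample size, which is fine for illustration but slower than rank-based implementations on large datasets.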

To test the importance of each variable in terms of its predictive ability, a SHAP (SHapley Additive exPlanations) analysis was performed on the proteomics and proteomics plus clinical data models. Lastly, four clinical psychiatrists were asked to predict the likelihood of remission for 200 patients based on either 10 or 17 clinical variables.
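SHAP analyses build on Shapley values, which split a single prediction fairly among the variables that produced it. The sketch below computes exact Shapley values for a tiny toy model by averaging each variable's marginal contribution over every possible ordering. Everything in it is an assumption for illustration: the model `toy_model`, its coefficients and the variable names are hypothetical and are not taken from the study, and practical SHAP tools use faster approximations rather than this brute-force approach.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average marginal contribution over all variable orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)          # start from the reference patient
        previous = predict(current)
        for name in order:
            current[name] = instance[name]  # switch one variable to the patient's value
            new = predict(current)
            totals[name] += new - previous
            previous = new
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_model(x):
    # Hypothetical non-linear score: the two proteins only matter together.
    return 0.5 - 0.3 * x["severity"] + 0.2 * x["protein_a"] * x["protein_b"]


baseline = {"severity": 0.0, "protein_a": 0.0, "protein_b": 0.0}
patient = {"severity": 1.0, "protein_a": 1.0, "protein_b": 1.0}

values = shapley_values(toy_model, patient, baseline)
for name, value in values.items():
    print(name, round(value, 3))
```

In this toy case the interaction between the two proteins is shared equally between them, and the contributions add up to the difference between the patient's score and the baseline score. That additivity is what allows variables to be ranked by importance.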


So, which set of data best predicted treatment response when used in isolation?

  • All models had an accuracy level above chance. However, the model based solely on polygenic risk scores (PRS), a technique used to calculate one’s genetic risk of having a certain outcome, was biased towards false negative classifications. This means that the model tended to wrongly classify individuals who were in remission as having depression. You can read more about polygenic risk scores in this recent Mental Elf blog on Tourette Syndrome (Palmer, 2023).
  • The model trained solely on proteomic data had the highest AUROC of 0.67.
  • The next best model was based on 10 clinical variables (AUROC = 0.63). These included age, sex, years of education, and depressive symptom severity, as measured by the Inventory of Depressive Symptomatology – Self-Report (IDS-SR) in a continuous and categorical fashion, and five personality dimensions.
  • A model based on 63 clinical variables did not perform any better than the model based on 10 variables.

Combination of omics and clinical data

When clinical information was added to the omics datasets, the models all outperformed their respective individual models, which included only the omics data. The combination of clinical and proteomics data had the highest AUROC of 0.78. This was also the only combination of datasets where the difference in performance, when compared to the individual datasets, was found to be significant (p<0.05). Interestingly, all modalities together only showed an AUROC of 0.70.

When using linear prediction models, models based only on proteomics data showed poor predictive performance. Moreover, the addition of proteomics data to clinical data did not improve the predictive performance.

Variable importance analysis

In both models (proteomics alone and proteomics plus clinical data), fibrinogen showed the highest variable importance. Among the clinical variables, symptom severity at baseline was deemed most important, both in the model with clinical data alone and in the model with proteomics plus clinical data.

Protein-protein-interaction networks and pathway enrichment analyses were also calculated separately for the proteomic variables that were deemed important for the prediction in both models. Networks involved in the inflammatory response and lipid metabolism were found. These networks showed that pathways related to interleukin 10 signalling, chemokine signalling, cholesterol esterification and reverse cholesterol transport were most important for predicting remission outcome.

Clinician prediction of remission

For the purposes of comparison, clinicians were asked to predict the remission status for 200 patients based on clinical data. The clinicians’ ratings showed a low average accuracy of 0.51. Interestingly, providing more clinical variable information only marginally increased this prediction accuracy (0.55).

Both prediction models trained using this same clinical data outperformed the human raters (AUROC of 0.63 and 0.65, respectively).


Prediction models using both clinical and proteomic data showed the best performance. These models also both outperformed clinician remission predictions.


This study showed that combining datasets from different domains and using a non-linear model can improve prediction performance as compared to previously applied simpler approaches.

The authors conclude that:

this study shows that what is predictive of remission of MDD within 2 years is a combined signature of symptom severity, personality traits and immune and lipid metabolism related proteins at baseline.

Even though the balanced accuracy of 71% is still too low for clinical use, this model nevertheless performs better than predictions made by clinicians themselves. Therefore, this study can be seen as a starting point, highlighting which data types are most informative for machine learning models that should ultimately be tested in clinical trials.
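Balanced accuracy is commonly defined as the average of sensitivity and specificity, which makes it more informative than plain accuracy when one outcome is more frequent than the other. The following Python sketch illustrates the difference with made-up labels; the function names are illustrative and the numbers are not from the study.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives, with 1 = remission, 0 = no remission."""
    tp = fn = tn = fp = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1
        elif a == 1 and p == 0:
            fn += 1
        elif a == 0 and p == 0:
            tn += 1
        else:
            fp += 1
    return tp, fn, tn, fp


def accuracy(actual, predicted):
    tp, fn, tn, fp = confusion_counts(actual, predicted)
    return (tp + tn) / (tp + fn + tn + fp)


def balanced_accuracy(actual, predicted):
    """Mean of sensitivity and specificity (assumes both classes are present)."""
    tp, fn, tn, fp = confusion_counts(actual, predicted)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Toy example: 8 of 10 patients remit, and the "model" predicts remission for everyone.
actual = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
always_remission = [1] * 10

print(accuracy(actual, always_remission))           # 0.8
print(balanced_accuracy(actual, always_remission))  # 0.5
```

A predictor that ignores the data entirely reaches 80% plain accuracy here but only 50% balanced accuracy, which is why a balanced figure is the fairer summary for imbalanced outcomes.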


This study suggests that symptom severity, personality traits and proteins related to immune and lipid metabolism can best predict depression remission after 2 years.

Strengths and limitations


  • The authors used a comparatively large dataset with multi-omics measurements, and the prediction modelling was set up well, with cross validation.
  • The pre-processing of the data was conducted in a manner that ensured no information was leaked from the training to the test set.
  • Additionally, they used a separate test set that was not part of the cross validation for model evaluation.
  • Lastly, they chose a long enough follow up timepoint of 2 years to allow for the evaluation of depression remission in a meaningful way.


  • The model was evaluated using patients from the same cohort as those included in the training set. This could inflate the model performance and reduce its generalisability.
  • Also, the mean accuracy of 71% is still too low for general clinical practice.
  • In addition, the number of predictor variables used needs to be reduced so that they can be reliably and cost-effectively measured in a clinical laboratory.
  • Lastly, the most important proteomic analyte, fibrinogen, was below the lower limit of detection in nearly 70% of the samples. This is a regular occurrence with the methods applied in this study, but it means the finding needs to be interpreted with caution.

While this study was well conducted and has addressed several limitations of previous depression-focused prediction models, the overall accuracy of the final model is still too low to consider its use in general clinical practice.

Implications for practice

It is promising that this study shows that the addition of proteomics data to clinical data increases the accuracy of the model predictions for depression remission after two years. This shows that it is possible to find biomarkers that are related to the condition.

For researchers, it is interesting that the best omics dataset was proteomics. Most often, transcriptomics or genomics data are used because they can be easily measured in a high-throughput fashion and are relatively cost-effective. In contrast, proteomic measurements are not as common. In immunopsychiatry studies, the protein concentrations of only a few immune markers are often measured (e.g. interleukin (IL) 1 alpha, IL-6, tumour necrosis factor alpha or C-reactive protein). Habets et al. (2023) provide evidence that researchers could benefit from using a more general proteomics approach.

For clinicians, this study shows that there is value in biomarker data. The increased accuracy of the predictions when proteomics data were added to the clinical data is evidence of this. This rings particularly true in comparison to the clinicians’ own prediction ratings. Perhaps the future lies in using machine learning models and multi-omics data to support practitioners in their remission outcome predictions and, ultimately, their treatment response predictions.


Machine learning models using multi-omics data could one day support clinicians in their predictions of remission outcome and treatment response in depression.

Statement of interests

No conflicts to declare.


Primary paper

Habets PC, Thomas RM, Milaneschi Y, Jansen R, Pool R, Peyrot WJ, Penninx BWJH, Meijer OC, van Wingen GA, Vinkers CH. (2023) Multimodal Data Integration Advances Longitudinal Prediction of the Naturalistic Course of Depression and Reveals a Multimodal Signature of Remission During 2-Year Follow-up. Biol Psychiatry. 2023 Dec 15;94(12):948-958. doi: 10.1016/j.biopsych.2023.05.024. Epub 2023 Jun 15. PMID: 37330166.

Other references

Gadad, Bharathi S., Manish K. Jha, Andrew Czysz, Jennifer L. Furman, Taryn L. Mayes, Michael P. Emslie, and Madhukar H. Trivedi. 2018. “Peripheral Biomarkers of Major Depression and Antidepressant Treatment Response: Current Knowledge and Future Outlooks.” Journal of Affective Disorders, Are there Biomarkers for Mood Disorders?, 233 (June): 3–14.

Musliner, Katherine L., Trine Munk-Olsen, William W. Eaton, and Peter P. Zandi. 2016. “Heterogeneity in Long-Term Trajectories of Depressive Symptoms: Patterns, Predictors and Outcomes.” Journal of Affective Disorders 192 (March): 199–211.

Palmer, E. Genetic risk for Tourette Syndrome and related conditions. The Mental Elf, 23 November 2023

Penninx, Brenda W. J. H., Willem A. Nolen, Femke Lamers, Frans G. Zitman, Johannes H. Smit, Philip Spinhoven, Pim Cuijpers, et al. 2011. “Two-Year Course of Depressive and Anxiety Disorders: Results from the Netherlands Study of Depression and Anxiety (NESDA).” Journal of Affective Disorders 133 (1): 76–85.
