Measuring treatment effects in dementia studies: towards a consistent approach


It is now well accepted across the health and social care communities that the incidence of dementia is rising as people continue to live longer. The projected prevalence of dementia over the next ten to twenty years is causing widespread concern at all levels of policy making and care provision. There is a very real concern that care and treatment services could become overwhelmed, to the detriment of all.

Clearly a situation such as this requires the development of evidence-based, cost-effective interventions to prevent dementia, slow its development and manage its symptoms. In the midst of this we must not neglect the needs of the growing population of carers, who bear a huge burden in looking after people with dementia in the community.


The breadth and volume of dementia research can make it hard to pool findings in meta-analyses

The research base for interventions to tackle dementia has grown hugely over the last ten years. One consequence is that the plethora of trials and studies has given birth to a large, heterogeneous group of measures and tests used to assess the effects of the interventions under investigation. This heterogeneity limits the capacity to compare the outcomes of clinical trials, a crucial step in agreeing those interventions to be recommended at policy-making level, and in specifying and managing their introduction.

I will not dwell on the nature of Randomised Controlled Trials (RCTs) other than to note that they remain the gold standard for investigating treatments and their effectiveness; this blog is concerned with a Systematic Review published by Bossers et al (2012). A Systematic Review is a study which gathers together all the clinical trials relating to a specific research question and, through detailed analysis of these, tries to synthesise all the high-quality research evidence relevant to that question. This analysis is then generally followed by a statement of effectiveness and applicability which helps to direct further research and/or policy.

Bossers et al did just this to investigate the spectrum of neuropsychological and exercise tests used in clinical trials to assess cognition and physical fitness in older people with dementia. The aim of the review was to distil a list of recommended tests, bringing an element of consistency and standardisation to these trials and thus improving their comparability and applicability in practice (Bossers et al, 2012).

There are a number of tools available for critically appraising a Systematic Review, for example the Critical Appraisal Skills Programme tool (CASP, 2010). Some people design their own appraisal tools based on a combination of others (I am happy to share mine if anyone is interested). There are three key elements to appraising a Systematic Review: Validity, Reliability and Applicability.


In respect of Validity, this review does very well. The research question is clear and concise, and the process of identifying and selecting trials is clearly set out, with the all-important flow chart present and correct. The quality of the studies is assessed using the PEDro tool, which is well recognised and validated (De Morton, 2009), and effect sizes are measured using Cohen's coefficients, which are standard statistical tools in this context. After screening, 89 RCTs were included, containing 59 neuropsychological tests and 10 exercise tests, alongside 45 reliability and validity studies. Large heterogeneity was found, confirming the pre-study assumption.
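For readers who want the arithmetic behind the effect sizes: the usual "Cohen coefficient" in this context is Cohen's d (the review itself should be consulted for the exact formula used), which compares the difference in group means against their pooled standard deviation:

```latex
d = \frac{\bar{x}_1 - \bar{x}_2}{s_{\text{pooled}}},
\qquad
s_{\text{pooled}} = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}}
```

By Cohen's conventional benchmarks, d of roughly 0.2 is a small effect, 0.5 a medium effect and 0.8 a large effect.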


In respect of Reliability, the results (which I list below) are set out robustly and systematically, with an excellent series of tables listing the vital metrics assessed (tables such as these are often crucial when investigating the effectiveness of an intervention in a commissioning setting). There is a clear line of sight from results to conclusions. I must admit I struggled slightly with the identification and reduction of the impact of chance on the conclusions, but I suspect that is down to my limitations as a statistician and my ingrained need to see confidence intervals everywhere, rather than any fault of the study authors!
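For completeness, and as my own sketch rather than anything taken from the review, the large-sample 95% confidence interval I instinctively look for around a Cohen's d takes the form:

```latex
d \pm 1.96 \times SE_d,
\qquad
SE_d \approx \sqrt{\frac{n_1 + n_2}{n_1 n_2} + \frac{d^2}{2(n_1 + n_2)}}
```

where n1 and n2 are the two group sizes; this is the standard large-sample approximation, not a formula reported by Bossers et al.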


As regards Applicability, the outcomes are systematically set out and I can see no reason why the results could not be applied to any generic UK population.

The results were as follows:

Neuropsychological tests:

The review found that so many different memory tests were in use that it proved impossible to recommend any one test over the others.

  • Global cognitive tests were used more often than neuropsychological tests that measure one specific domain
  • The global cognitive tests Mini Mental State Examination (MMSE), Alzheimer Disease Assessment Scale – cognitive subscale (ADAS-cog) and the Severe Impairment Battery (SIB) were recommended to measure global cognition
  • The Verbal Fluency Test Category/Letters, Clock Drawing Test and Trail Making Test-B were recommended to measure executive functioning
  • The Digit Span Forward, Digit Span Backward and Trail Making Test-A were recommended to measure attention
  • No specific memory test could be recommended due to the large heterogeneity that was found in memory test use.

Physical Exercise Tests:

  • The Timed Up and Go and the Six Meter Walk were recommended for mobility, the Six Minute Walk Distance for endurance capacity, and the Tinetti Balance Scale for balance

The authors conclude by stating that the list of recommended tests they set out may lead to a more evidence-based choice of tests in further studies. They add the important caution that psychometric quality is in many cases still insufficient and that tests should be selected with care; as such, researchers are advised to select the recommended tests that most closely fit their study objectives. This leads us to perhaps the most important question in appraising any Systematic Review, the "So What?" that I have referred to before, i.e. should policy or practice change as a result of evidence contained within the review?

So what?

The authors do not claim that their findings should signal a paradigm shift in dementia research, and indeed their recommendations are directed squarely at the research community. I would imagine that dementia researchers will welcome this study, and I plan to pursue this further with colleagues in Bradford and Airedale. I do not feel that this study can be used by frontline clinicians other than as an adjunct to their broader knowledge and skills; the authors are clear that this should represent the limits of its direct application.


  1. Bossers W, van der Woude L, Boersma T, Scherder E, van Heuvelen M. Recommended Measures for the Assessment of Cognitive and Physical Performance in Older Patients with Dementia: A Systematic Review. Dement Geriatr Cogn Dis Extra 2012;2(1):589-609.
  2. CASP. Systematic Review Appraisal Checklist. Critical Appraisal Skills Programme; 2010 (accessed 15/08/13).
  3. De Morton N. The PEDro scale is a valid measure of the methodological quality of clinical trials: a demographic study. Australian Journal of Physiotherapy 2009;55:129-133.