A large clinical trial might be said to resemble an ocean liner, which is leaving Southampton to sail to New York. It is a complicated system. There will be a captain on the bridge and a large crew. They will have filed a course some time before they set sail, but not everything can be foreseen in advance; the weather for example. Sometimes small corrections to the route will need to be made en voyage. These will be meticulously recorded, authorised by the ship’s owners, otherwise known as the Trial Steering Committee, which can overrule the captain if needed. Occasionally something happens; the vessel is seaworthy, the cabins are ready, and the band has started playing, but the ship never sets out, most often because they can’t get enough passengers on board to make the voyage worthwhile. Very occasionally there is a shipwreck, but this is very rare. Few trials suffer the fate of the Titanic, but sometimes the ship gets to the USA, but not to New York, but some other place; destination changed en route, which is considered bad form. In that case, people may debate for years afterwards what actually pushed them off course, and what that means. But most often the ship does eventually dock in New York, with satisfied passengers, and a tired but relieved crew.
Such a ship was the clinical trial known as PACE, standing for Pacing, graded Activity, and Cognitive behaviour therapy: a randomised Evaluation. It was a very large ship, one of the largest of its kind, and its voyage was undoubtedly one of the choppiest crossings ever. It certainly did make it to the other side, completing its voyage, with a full complement of passengers, and hardly any washed overboard (“lost to follow up”), but its troubles were not over. Indeed even safely in harbour it continues to be buffeted to this day. The question is whether or not it is still shipshape, and whether or not its voyage fulfilled its goals. Some claimed even before the ship had sailed that it should stay in port; one of the main patient organisations in this country campaigned for that before a single passenger got on board. Others have said that the ship struck an iceberg on the way, and even though it limped into New York, all the passengers and crew had a wasted voyage and nothing of benefit emerged.
In this blog I will argue that HMS PACE did make it successfully across the Atlantic. Small corrections to the route were made on the way, but these were of little significance. The fundamental mechanics of the ship remained watertight, and at no time were the ship or its passengers in peril before it docked safely, exactly where it was supposed to. Storms continue to buffet the ship even as it remains in harbour, but none of these has damaged the ship enough to impair its seaworthiness.
I was not on the ship, neither as passenger nor as crew. I helped recruit some patients to the study from our clinic, as did many doctors, but that was as far as it went. I am not an author on the ship’s log, but I am not a neutral observer. I know a lot about how these ships sail (Everitt and Wessely, 2003). I know a lot about the passengers, those with the illness known as chronic fatigue syndrome (CFS), because I have seen what must be a few thousand now as a doctor who used to research the illness and continues to see sufferers in the clinic. I have done a few voyages similar to the one undertaken by PACE (e.g. Deale et al, 1997), but not in such a large and complex boat, at least not for this illness. I also make no secret of the fact that I know some members of the crew well. I have worked happily with many of them over the years, and in particular I consider the most senior officers on board this particular ship to be personal friends. Do I have competing interests? Sure I do.
Back on dry land
So moving on from our nautical analogy, I am well informed about clinical trials in general, and about the issues that surround chronic fatigue syndrome in particular. I have previously made it clear that I think that PACE was a good trial; I once described it as a thing of beauty. In this blog I will describe why I still think that and I will try and avoid very technical issues, which have been addressed by the investigators on many occasions. Here is a recent response to criticisms, few of them new.
Nor will I drown the reader in the details of the trial itself, except where necessary. Again, here is a link to the main paper of the trial, so you can check what I say against the main record (White et al, 2011) and here is the follow-up paper published last week (Sharpe et al, 2015).
Finally I will not discuss some of the wider issues that the trial raises, or the wider debate on chronic fatigue syndrome. I will simply state that CFS is a genuine illness, can cause severe disability and distress, affects not just patients but their families and indeed wider society, as it predominantly affects working age adults, and its cause, or more likely causes, remains fundamentally unknown. I do not think that chronic fatigue syndrome is “all in the mind”, whatever that means, and nor do the PACE investigators. I do think that, as with most illnesses, of whatever nature, psychological and social factors can be important in understanding illness and helping patients recover. Like many of the PACE team, I have run a clinic for patients with chronic fatigue syndrome for many years. Like the PACE investigators, I have also in the past done research into the biological nature of the illness; research that has indicated some of the biological abnormalities that have been found repeatedly in CFS.
What was the PACE trial and what were the main results?
The PACE trial randomly allocated 641 patients with chronic fatigue syndrome, recruited in six clinics across the UK, into one of four treatments. Everyone received Specialist Medical Care (SMC), where specialist doctors gave advice on managing the illness and may have prescribed medication for symptoms.
One group received SMC alone; the other three groups also received a therapy:
- Adaptive Pacing Therapy (APT), where patients adapt their lives to live better with the limits of their condition;
- Cognitive Behaviour Therapy (CBT), where the therapy aims to help patients explore different ways to understand their illness and cope actively; or
- Graded Exercise Therapy (GET), where patients receive help to gradually increase the time they are physically active and then the activity’s intensity.
All were followed up until one year after they entered the trial.
What were its main findings? These were simple:
- That both cognitive behaviour therapy (CBT) and graded exercise therapy (GET) improved fatigue and physical function more than either adaptive pacing therapy (APT) or specialist medical care (SMC) a year after entering the trial.
- All four treatments were equally safe.
These findings are consistent with previous trials (and there are also more trials in the pipeline), but PACE, because of its sheer size, has attracted the most publicity, both good and bad. It has already been used as an example of how to conduct a large complex intervention, and has been cited 219 times in Scopus. But of course it has also been subjected to what in my experience is an unprecedented campaign of criticism, which sometimes has merged into something approaching vilification that goes well beyond a reasoned scientific critique.
Reading some of the criticism, I am struck that some of the critics are not familiar with the fundamental strengths of the randomised controlled trial, and why medicine continues to value it so highly. Likewise, some show unfamiliarity with the core methodological components that contribute to the integrity of a clinical trial, and whose violation calls the findings into question, as compared to what one might call secondary, less important features. In other words, they miss what distinguishes a good trial whose results are likely to be sound from one in which there is a definite risk of bias. And so, returning to my nautical analogy, what are the main pitfalls that might occur from the moment the naval architects start to design the ship, to it coming safely to rest in New York harbour?
What makes a good trial and how does PACE measure up?
So what does the literature on randomised controlled trials tell us about the factors that are known to influence or bias the results of trials?
a. Allocation concealment
Far and away the most important is allocation concealment: preventing investigators or patients from influencing the randomisation process (which in PACE was a computer algorithm organised independently of the investigators). If trials are to be judged by one quality alone, there is agreement that it would be allocation concealment (Schulz & Grimes, 2002a and b). When this is violated, it calls into question all the findings of a trial, and considerably increases the risk of error. No one has criticised allocation concealment in PACE; it was exemplary.
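For readers who like to see the idea in concrete form, here is a minimal sketch of what concealed allocation means in practice. It is not the PACE randomisation procedure, which I have not reproduced here; the class name `CentralAllocator`, the permuted-block scheme and the block size are all assumptions made purely for illustration. The point is only that the upcoming sequence is held centrally and an arm is revealed once, after a participant has been irrevocably registered.

```python
import random

# The four arms described in this post; SMC stands for SMC alone.
ARMS = ["SMC", "APT", "CBT", "GET"]


class CentralAllocator:
    """Hypothetical central randomisation service (illustrative only).

    Recruiting clinicians can call allocate(), but they cannot inspect
    the upcoming sequence, so they cannot steer a patient into an arm.
    """

    def __init__(self, seed, block_size=8):
        if block_size % len(ARMS) != 0:
            raise ValueError("block_size must be a multiple of the number of arms")
        self._rng = random.Random(seed)
        self._block_size = block_size
        self._queue = []  # upcoming allocations, never exposed to callers
        self._log = {}    # participant_id -> arm, written once per participant

    def allocate(self, participant_id):
        # A participant can be randomised once only; no second draw.
        if participant_id in self._log:
            raise ValueError("participant already randomised")
        if not self._queue:
            # Permuted block: each arm appears equally often, in random order.
            block = ARMS * (self._block_size // len(ARMS))
            self._rng.shuffle(block)
            self._queue = block
        arm = self._queue.pop()
        self._log[participant_id] = arm
        return arm
```

A recruiting clinic would call something like `CentralAllocator(seed=2024).allocate("P001")` only after consent and baseline assessment are complete. Real trials typically add stratification or minimisation on top of this, which the sketch deliberately leaves out.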
b. Power
Next comes power. A study needs to be big. If a study is small then it might well decide that a treatment is not effective when actually it is. Alternatively, it might do the opposite and find that something works when it doesn’t. Randomisation can’t overcome the chances of a maverick result in a small sample. None of this applies to PACE. It was planned to recruit 600 patients to four arms and over-recruited. Predetermined sample size calculations showed it had plenty of power to detect clinically significant differences. It was one of the largest behavioural or psychological medicine trials ever undertaken. No one has criticised its size.
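To show why size matters, here is a rough sketch of a textbook power calculation for comparing the means of two arms, using the normal approximation. It is not the calculation from the PACE protocol: the standardised effect size of 0.5 and the function names `n_per_arm` and `achieved_power` are my own illustrative assumptions.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def n_per_arm(effect_size, alpha=0.05, power=0.80):
    """Patients needed in each of two arms (two-sided test, normal approximation).

    effect_size is the difference in means divided by the common
    standard deviation; it is an assumed input, not a PACE figure.
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


def achieved_power(effect_size, n, alpha=0.05):
    """Approximate power of a two-arm comparison with n patients per arm."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(effect_size * sqrt(n / 2) - z_alpha)


print(n_per_arm(0.5))            # 63 per arm for 80% power
print(achieved_power(0.5, 20))   # a small trial: well under 50% power
print(achieved_power(0.5, 160))  # about 160 per arm: above 99% power
```

The last line uses roughly 160 per arm, which is what you get if 641 patients are split evenly across four arms. Under the assumed effect size, a trial of that size would very rarely miss a real difference, whereas a trial of 20 per arm would miss it more often than not.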
c. Loss to follow-up
The next thing that can jeopardise the integrity of a trial is major losses to follow up, which would reduce the ability of a trial to deliver a significant result (i.e. loss of power). That is bad enough; it reduces the efficiency of a trial, and might mean that nothing much can be concluded from the results. But the situation would be worse if follow up is also biased by allocation. That would happen if people receiving one treatment were more likely to be followed up than those in a different arm. This would introduce bias, rather than waste, and can invalidate the results even if statistically significant. The key end point in PACE was pre-defined as the one year follow up. 95% of patients provided follow up data at this stage. I am unaware of any large scale behavioural medicine trial that has exceeded this. Again, no one has questioned this, and indeed one of the fiercest critics has specifically praised this. Even more importantly, what little loss to follow up there was did not differ between the treatment arms. So again, we can have confidence in the main results.
d. Treatment infidelity
Next comes treatment infidelity, which is where participants do not get the treatment they were allocated to. PACE had a series of checks on this, including therapy supervisors listening to randomly chosen audio-recordings during the trial, and providing feedback to therapists. At the end of the trial, two independent scrutineers, masked to treatment allocation, both rated over 90% of the randomly chosen 62 sessions they listened to as the allocated therapy. Only one session was thought by both scrutineers not to be the right therapy. Again, no criticism has been made on the basis of therapy infidelity.
e. Analytical bias
The analytical protocol was predetermined (before the analysis started) and published. Two statisticians were involved in the analysis, blind to treatment group until the analysis was completed and signed off. So again, the chances of bias being introduced at this stage are also negligible.
f. Post-hoc sub-group analysis (fishing for significant differences)
This often happens when investigators are frustrated that their main hypothesis is not supported. Especially in large trials, they can then go looking for particular sub-groups which might have responded to the treatment, even if overall there was no effect. A landmark paper on this is the classic analysis of a massive cardiology trial that showed there were significant differences in responses to treatment according to signs of the Zodiac (ISIS-2, 1988). The only sub-group analyses undertaken in the main PACE paper were pre-specified and showed that the outcomes were similar in those patients who met two other definitions of CFS. There were no post-hoc sub-group analyses in the main outcome paper. A couple of post-hoc sub-group analyses were done in follow-up publications; they were clearly identified as such, with appropriate cautions issued. None concerned the main outcomes. Again, no one has raised the issue of sub-group analyses.
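The arithmetic behind the Zodiac joke is worth spelling out. The sketch below assumes twelve independent sub-group tests, each at the conventional 5% level, on a treatment that does nothing at all; the function names are mine and no trial data are involved. Under that assumption a p-value is just a uniform random number, which is what the simulation draws.

```python
import random


def familywise_error(n_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


def simulate(n_tests, alpha=0.05, n_trials=20000, seed=0):
    """Monte Carlo check: the treatment has no effect in any sub-group,
    so each sub-group p-value is uniform on [0, 1)."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < alpha for _ in range(n_tests))
        for _ in range(n_trials)
    )
    return hits / n_trials


print(round(familywise_error(12), 3))  # 0.46
print(simulate(12))                    # close to the figure above
```

So with one test per star sign, an entirely useless treatment has nearly an even chance of appearing to "work" for at least one of them. That is why pre-specification matters.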
g. Blinding
Trials can be rated as single, double or even triple blind. This means that the patients, clinicians and raters either know or don’t know which treatment is which. PACE was not blinded; the therapists and patients knew what treatments were being given, which would be hard to avoid. This has been raised by several critics, and of course is true. It could hardly be otherwise; therapists knew they were delivering APT, or CBT or whatever, and patients knew what they were receiving. This is not unique to PACE. It is true in any trial of a psychological, behavioural or surgical intervention, for example. Indeed, it turns out to be true in many trials of drug treatments as well, since it is difficult and sometimes impossible to stop a medicine being recognised by its side effects.
So patients knew what they were getting. This is what would happen in real life, which is what the PACE trial was trying to recreate. Did this matter? One way to find out is to see whether patients differed in what they thought of the treatment to which they were allocated before they started it. There might be problems if one treatment was thought to be better than another, whether rightly or wrongly. Expectations can influence the outcomes, especially in psychological treatments, which is why so-called patient preference trials, in which patients choose the intervention they prefer, give results that can be difficult to interpret. That indeed is an issue around the longer term outcomes of PACE after the end of the formal follow up (see the references below). Randomisation removes the worst of this problem, since patients by definition cannot select what they get. But if they still have higher or lower expectations of one treatment over another, it can still matter. And that did happen in the PACE trial itself. One therapy was rated beforehand by patients as being less likely to be helpful, but that treatment was CBT. In the event, CBT came out as one of the two treatments that did perform better. If it had been the other way round, with CBT favoured over the other three, then that would have been a problem. But as it is, CBT actually had a higher mountain to climb, not a smaller one, compared to the others.
So far then, I would suggest that PACE has passed the main challenges to the integrity of a trial with flying colours. If we check it against any of the many rating scales that exist for randomised controlled trials, it comes out well, losing points only on the issue of blinding, as do most trials in surgery or psychiatry, and every trial in clinical psychology, social interventions or health psychology. For example, the two most recent systematic reviews in this field rated PACE as good quality, with a low risk of bias, much the same as numerous previous systematic reviews have rated PACE’s predecessors (Larun et al, 2015; Smith et al, 2015).
Response to other criticisms of PACE
So now we move on to lesser issues. I say these are lesser issues because that is what the literature on randomised trials says. They are not unimportant, but they are not likely to affect the fundamental integrity of a trial, nor the confidence with which we can view the results. The major criticisms have been often repeated and can be crystallised as follows:
a. Entry criteria too broad
The criteria for deciding who had the illness were too broad and included people who did not have CFS
There is no “gold standard” definition of chronic fatigue syndrome. Some 20 definitions have been published. The PACE trial used the Oxford criteria. These are broad criteria, chosen to include as many clinic attenders with CFS as possible. Randomisation will have made sure that this had no impact on the main results of the trial. It might potentially influence what is known as generalisation: how much do the results of the trial apply to other populations? Measures taken during the trial allowed the authors to see if the results would have been different if other, narrower criteria had been used. The answer was no. However, because all patients were recruited in clinics, those who could not attend clinics were not included. Although there are some case reports of, for example, bed-bound patients receiving similar therapies to those tested in PACE, there is no suggestion that the results of PACE can be generalised to such patients. So the findings only apply to those patients who were able to attend clinic regularly, and not to bed-bound patients. The important point to remember is that the choice of criteria did influence generalisability (not to bed-bound patients) but did not influence the key findings of the study.
The researchers also used stringent procedures to ensure that those people with another diagnosis that would explain their fatigue were excluded (White et al, 2007).
b. Incremental point change in entry criteria
There was a change of one increment in the entry criteria for physical disability, introduced 11 months after the trial started, both to include those who would normally be offered treatment and to boost recruitment.
As before, randomisation would ensure that this would not have any impact on the main findings.
c. Patient newsletter
A patient newsletter sent during the trial included some positive feedback from patients that particularly affected expectations of CBT.
A PDF of the patient newsletter is freely available.
You can see there were six comments from patients, praising the trial, their therapy, their treatment and research staff. The important thing is that all four arms were represented and no treatment or therapy was named. There is a quote from Number 10, Downing Street praising the trial, with no treatments named, which was a response to a public petition to stop the trial, a reminder if one is needed of the external and unpleasant atmosphere. Finally there is a quote from a doctor praising a therapy “which I know is recommended for CFS”. The newsletter was written after all 641 patients had been recruited. At that time just 30 or so would have still been receiving CBT.
There is no way of knowing how many people actually read the newsletter; my own experience of similar newsletters is not encouraging. It is also rather implausible that reading brief anonymous feedback would really have had much influence over and above all the direct one to one sessions with trial therapists and all the other material provided to participants. Even if it did, again, that would be immaterial unless it specifically impacted on any particular therapeutic approach. This seems unlikely. The patient comments came from all four arms of the trial and no treatment or therapy was named, so it is very unlikely that any bias towards one arm or another would follow from this. Perhaps the quote from Downing Street had a positive influence on Labour supporters and the opposite on Conservative; that seems unlikely as well, but even more unlikely is that more Labour supporters were receiving CBT and so on. The medical quote could have referred to any of the therapies, since all three, including activity management and pacing, were recommended by NICE.
But frankly, it all seems far fetched. If a few encouraging anonymous sentences are all it takes to improve outcomes in CFS, then one wonders why we are bothering with all these complex treatments anyway. It is more plausible that there would have been a negative impact from some of the relentless negative publicity out there, some of it backed by one of the UK patient associations, and which was explicitly aimed against CBT and GET, and in favour of pacing, and some of it also specifically directed against PACE.
d. It’s good to talk
Non-specific effects of seeing a therapist (i.e. just having someone to talk to helps).
These are certainly important, but could only impact on the key results if they differed between the treatment groups. The team ensured that all therapists were well trained and supervised, and that the total number of treatment sessions and time offered was the same across all therapies. Patients reported that they were similarly satisfied after all three therapies (APT, CBT and GET). Independent scrutineers of audio-recordings of therapies reported that the therapies had been delivered as designed. So it seems improbable that the differences between the therapy groups could be due to non-specific effects.
e. Changes to original protocol
The researchers changed the way they scored and analysed the primary outcomes from the original protocol.
The actual outcome measures did not change, but it is true that the investigators changed the way that fatigue was scored from one method to another (both methods have been described before and both are regularly used by other researchers) in order to provide a better measure of change (one method gives a maximum score of 11, the other 33). How the two primary outcomes (fatigue and physical function) were analysed was also changed from using a more complex measure, which combined two ways to measure improvement, to a simple comparison of mean (average) scores. This is a better way to see which treatment works best, and made the main findings easier to understand and interpret. This was all done before the investigators were aware of outcomes and before the statisticians started the analysis of outcomes. The changes were approved by the two independent oversight committees. The very detailed analysis plan, including these changes, was published, and these changes and the reasons for them were also described in the main paper.
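The difference between the two scoring methods is easier to see with a worked example. The sketch below assumes a fatigue questionnaire of 11 items, each with four response options, which is consistent with the maximum scores of 11 and 33 mentioned above. The two patients' answers are entirely hypothetical and are not trial data.

```python
N_ITEMS = 11

# Each item has four response options, coded 0 (best) to 3 (worst).
BIMODAL_WEIGHTS = (0, 0, 1, 1)  # symptom absent/present, maximum 11
LIKERT_WEIGHTS = (0, 1, 2, 3)   # graded severity, maximum 33


def score(responses, weights):
    """Total score for one completed questionnaire."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item responses")
    return sum(weights[r] for r in responses)


# Hypothetical patient: the worst option on every item at baseline,
# one step better on every item at follow-up.
baseline = [3] * N_ITEMS
follow_up = [2] * N_ITEMS

print(score(baseline, BIMODAL_WEIGHTS), score(follow_up, BIMODAL_WEIGHTS))  # 11 11
print(score(baseline, LIKERT_WEIGHTS), score(follow_up, LIKERT_WEIGHTS))    # 33 22
```

In this made-up case the first method registers no change at all, because every symptom still counts as present, while the second picks up an 11-point improvement. That is the sense in which the 0 to 33 scoring is a more sensitive measure of change.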
f. Interpreting the follow-up study
The 2.5 year follow up is hard to interpret because it was no longer randomised.
Correct. The study team has just published the results of the 2.5 year follow up (Sharpe et al, 2015). It has been criticised because after the end of the main study (one year after entering the trial) participants were then able to choose further treatments if they wished, thus breaking the randomisation. As those who do trials know, it is unethical (as well as impossible) to deny this, and indeed this was mandated for PACE. The findings of the long term follow up are clear. There was no deterioration from the one year gains in patients originally allocated to CBT and GET. Meanwhile those originally allocated to SMC and APT improved so that their outcomes were now similar. What isn’t clear is why. It may be because many had CBT and GET after the trial, but it may not. Whatever the explanation for the convergence, it does seem that CBT and GET accelerate improvement, as the accompanying commentary pointed out (Moylan, 2015).
No trial is perfect. Nothing as complex as a multi-centre trial (there were six centres involved), that recruited 641 people, delivered thousands of hours of treatment, and managed to track nearly all of them a year later, can ever be without some faults. But this trial was a landmark in behavioural complex intervention studies. That is why it survived all the independent scrutiny as it progressed, survived the rigorous review processes of one of the world’s top medical journals, which rejects nearly all the papers it receives, and this is why it has already been cited in over 200 medical publications. But even then, one trial does not a summer make, and one needs to see it as part of the totality of similar trials before and since.
Were the results maverick? Did PACE report the opposite to what has gone before or happened since? The answer is no. It is a part of a jigsaw (admittedly the biggest piece) but the picture it paints fits with the other pieces. I think that we can have confidence in the principal findings of PACE, which, to repeat, are that two therapies (CBT and GET) are superior to adaptive pacing or specialist medical care alone when it comes to generating improvement in patients with chronic fatigue syndrome, and that all these approaches are safe.
What does that mean? Is it important? Does it matter? That is a matter of judgement for you, the reader, to decide. Here is my take; you are welcome to make your own. I think this trial is the best evidence we have so far that there are two treatments that can provide some hope for improvement for people with chronic fatigue syndrome. Furthermore the treatments are safe, so long as they are provided by appropriately trained therapists who are properly supervised, and in a way that is appropriate to each patient. These treatments are not “exercise and positive thinking” as one newspaper unfortunately termed it; these are sophisticated, collaborative therapies between a patient and a professional. Having said that, there were a significant number of patients who did not improve with these treatments. Some patients deteriorated, but this seems to be the nature of the illness, rather than related to a particular treatment.
It would be nice to think that perhaps we can now all move on. We can accept that PACE was a good trial and we can have some confidence in its findings, for all the reasons outlined above. We can differ in our views as to whether this matters; whether it was all worth it. And then we can come together again and agree that PACE or no PACE, we need more research to provide treatments for those who do not respond to presently available treatments. The PACE trial will not be the last word. But it is the best we have for now.
References
White, PD et al. (2011) Comparison of adaptive pacing therapy, cognitive behaviour therapy, graded exercise therapy, and specialist medical care for chronic fatigue syndrome (PACE): a randomised trial. The Lancet , Volume 377 , Issue 9768 , 823 – 836 doi:10.1016/S0140-6736(11)60096-2
Sharpe M et al. (2015) Rehabilitative treatments for chronic fatigue syndrome: long-term follow-up from the PACE trial. The Lancet Psychiatry 2015 28th October online. doi:10.1016/S2215-0366(15)00317-X
[Note: both of the above papers are free to access, but you may need to register on The Lancet website to download them].
Deale A, Chalder T, Marks I, Wessely S. (1997) A randomised controlled trial of cognitive behaviour therapy for chronic fatigue syndrome. Am J Psychiatry 154: 408-414.
Everitt B, Wessely S. Clinical Trials in Psychiatry. Oxford University Press, 2003.
ISIS-2 Collaborative Group. (1988) Randomized trial of intravenous streptokinase, oral aspirin, both or neither among 17,187 cases of suspected acute myocardial infarction: ISIS-2. Lancet 2: 349-360.
Larun L, Brurberg KG, Odgaard-Jensen J, Price JR. (2015) Exercise therapy for chronic fatigue syndrome. Cochrane Database of Systematic Reviews 2015, Issue 2. Art. No.: CD003200. DOI: 10.1002/14651858.CD003200.pub3.
Moylan, S et al. (2015) Chronic fatigue syndrome: what is it and how to treat? The Lancet Psychiatry commentary 27/10/15 doi:10.1016/S2215-0366(15)00475-7
Schulz K, Grimes D. (2002a) Allocation concealment in randomised trials: defending against deciphering. Lancet 359: 614-618. doi:10.1016/S0140-6736(02)07750-4
Schulz K, Grimes D. (2002b) Blinding in randomised trials: hiding who got what. Lancet 359: 696-700.
Smith MB et al. (2015) Treatment of Myalgic Encephalomyelitis/Chronic Fatigue Syndrome: A Systematic Review for a National Institutes of Health Pathways to Prevention Workshop. Ann Intern Med. 162: 841-850. doi: http://dx.doi.org/10.7326/M15-0114.
White PD et al. (2007) Protocol for the PACE trial: a randomised controlled trial of adaptive pacing, cognitive behaviour therapy, and graded exercise, as supplements to standardised specialist medical care versus standardised specialist medical care alone for patients with the chronic fatigue syndrome/myalgic encephalomyelitis or encephalopathy. BMC Neurol 7:6. doi: http://dx.doi.org/10.1186/1471-2377-7-6