Background: The high failure rate in phase III oncology trials is partly because the signal obtained from phase II trials is often weak. Several papers have considered the appropriateness of various phase II end-points for individual trials, but there has not been a systematic comparison using simulated data to determine which end-point should be used in which situation.
Methods: We carry out simulation studies to compare the power of several end-points based on Response Evaluation Criteria in Solid Tumours (RECIST) response for one-arm and two-arm trials, together with progression-free survival (PFS) and a direct test of tumour shrinkage for two-arm trials. We consider six scenarios: (1) short-term cytotoxic therapy; (2) continuous cytotoxic therapy; (3 and 4) cytostatic therapy; (5 and 6) a delayed tumour-shrinkage effect (as seen in some immunotherapies). We also consider measurement error in the assessment of tumour size.
Results: Measurement error affects the type-I error rate and power of single-arm trials, and the power of two-arm trials. No single end-point performed well in all scenarios. Best observed response rate, PFS and a direct test of tumour shrinkage each performed best in a number of scenarios. PFS performed very poorly when the effect of the treatment was short-lived. In scenario 6, where the delay in effect was long, no end-point performed well.
Conclusions: A clinician setting up a phase II trial should consider the drug's likely mechanism of action and choose an end-point that provides high power for that scenario. Testing the difference in tumour shrinkage is often powerful. Alternative end-points are required for therapies whose effect is subject to a long delay.
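Illustrative sketch: the finding that measurement error affects the type-I error rate of single-arm trials can be reproduced in a toy simulation. The code below is not the simulation model used in this paper. It assumes (hypothetically) that each patient's log tumour-size ratio relative to baseline is normally distributed, that measurement error is additive on the log scale, that a response is an observed decrease of at least 30% (the RECIST partial-response threshold), and that the trial uses an exact one-sided binomial test against the error-free null response rate. All function names and parameter values are illustrative assumptions.

```python
import math
import random

# RECIST partial response: observed tumour size <= 70% of baseline.
LOG_PR_THRESHOLD = math.log(0.7)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_value(n, p0, alpha):
    """Smallest responder count whose exact one-sided p-value is <= alpha."""
    for k in range(n + 1):
        if binomial_upper_tail(k, n, p0) <= alpha:
            return k
    return n + 1  # the test can never reject


def rejection_rate(mean_log_ratio, sd_log_ratio, error_sd, p0,
                   n=40, alpha=0.05, n_sims=2000, seed=1):
    """Proportion of simulated single-arm trials that reject H0: rate <= p0."""
    rng = random.Random(seed)
    k_crit = critical_value(n, p0, alpha)
    rejections = 0
    for _ in range(n_sims):
        responders = 0
        for _ in range(n):
            # Assumed model: true log size ratio is normal; error is additive.
            observed = rng.gauss(mean_log_ratio, sd_log_ratio)
            if error_sd > 0:
                observed += rng.gauss(0.0, error_sd)
            if observed <= LOG_PR_THRESHOLD:
                responders += 1
        if responders >= k_crit:
            rejections += 1
    return rejections / n_sims
```

For example, with a null scenario of no average change (`mean_log_ratio=0.0`, `sd_log_ratio=0.3`), the error-free null response rate is `p0 = normal_cdf(LOG_PR_THRESHOLD / 0.3)`. Calling `rejection_rate(0.0, 0.3, 0.0, p0=p0)` and `rejection_rate(0.0, 0.3, 0.2, p0=p0)` compares the type-I error rate without and with measurement error. Under these assumptions the extra variability pushes more patients past the response threshold, so the error rate with measurement error exceeds the nominal level.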