Figure 4

Combining datasets of tumours and mean-centering significantly improves prognostic prediction. A, Before mean batch-centering. B, After mean batch-centering. The R² statistic (Cox proportional hazards model) measures the performance of the predictor, generated by supervised principal components analysis from each combination of training datasets, on the remaining test datasets. Median values are used where a training dataset was used to assess more than one test dataset (up to 5). R² and p-value results for all possible combinations of training datasets and test datasets (1016) are given in the matrix in Additional File 6.
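
As an illustration of the mean batch-centering step compared in panels A and B, the following is a minimal sketch, not the authors' implementation. It assumes an expression matrix with samples in rows and genes in columns, and one batch (dataset) label per sample; the function name `mean_center_by_batch` and its arguments are hypothetical.

```python
import numpy as np

def mean_center_by_batch(expr, batches):
    """Subtract each gene's within-batch mean from every sample in that batch.

    expr    : array-like, shape (n_samples, n_genes)  (assumed layout)
    batches : array-like, length n_samples, batch/dataset label per sample
    """
    expr = np.asarray(expr, dtype=float)
    batches = np.asarray(batches)
    centered = np.empty_like(expr)
    for b in np.unique(batches):
        mask = batches == b
        # Per-gene mean over the samples of this batch only
        centered[mask] = expr[mask] - expr[mask].mean(axis=0)
    return centered

# Two hypothetical batches measured on very different scales
expr = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]]
batches = ["a", "a", "b", "b"]
centered = mean_center_by_batch(expr, batches)
```

After centering, every gene has mean zero within each batch, so datasets with different baseline intensities can be pooled before a predictor is trained.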