Cancer Survival Prediction From Whole Slide Images With Self-Supervised Learning and Slide Consistency
Histopathological Whole Slide Images (WSIs) at gigapixel resolution are the gold standard for cancer analysis and prognosis. Because pixel- and patch-level annotations of WSIs are scarce, many existing methods predict survival outcomes with a three-stage strategy: patch selection, patch-level feature extraction, and feature aggregation. However, the patch features are usually extracted with truncated models (e.g., ResNet) pretrained on ImageNet and not fine-tuned on WSI tasks, and the aggregation stage ignores the many-to-one relationship between multiple WSIs and a single patient. In this paper, we propose a novel survival prediction framework that consists of patch sampling, feature extraction, and patient-level survival prediction. Specifically, we employ two self-supervised learning methods, colorization and cross-channel prediction, as pretext tasks to train CNN-based models tailored to extracting features from WSIs. For patient-level survival prediction, we then explicitly aggregate features from multiple WSIs, using consistency and contrastive losses to normalize slide-level features at the patient level. We conduct extensive experiments on three large-scale datasets: TCGA-GBM, TCGA-LUSC, and NLST. Experimental results demonstrate the effectiveness of our proposed framework, which achieves state-of-the-art performance compared with previous studies, with concordance indices of 0.670, 0.679, and 0.711 on TCGA-GBM, TCGA-LUSC, and NLST, respectively.
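The abstract mentions consistency and contrastive losses that pull the features of a patient's multiple slides together. The exact formulations are not given here, so the following is only an illustrative sketch under common assumptions: the consistency term penalizes the spread of a patient's slide features around their centroid, and the contrastive term is an InfoNCE-style loss that contrasts a slide feature against features from other patients. Function names, the temperature value, and the use of cosine similarity are all assumptions, not the paper's definitions.

```python
import numpy as np

def patient_consistency_loss(slide_feats):
    """Mean squared distance of each slide feature to the patient centroid.

    slide_feats: array of shape (num_slides, dim), one row per WSI of a
    single patient. Identical slide features give a loss of zero.
    """
    centroid = slide_feats.mean(axis=0)
    return float(np.mean(np.sum((slide_feats - centroid) ** 2, axis=1)))

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (an assumed form, not the paper's exact loss).

    anchor, positive: features of two slides from the same patient.
    negatives: list of slide features from other patients.
    """
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    pos = np.exp(cos(anchor, positive) / temperature)
    neg = sum(np.exp(cos(anchor, n) / temperature) for n in negatives)
    return float(-np.log(pos / (pos + neg)))
```

In this sketch, minimizing both terms drives all slides of one patient toward a shared representation while keeping different patients' representations separated, which matches the abstract's stated goal of normalizing slide-level features at the patient level.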
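The reported evaluation metric is the concordance index (C-index), which measures how often a model's predicted risk ordering agrees with the observed survival ordering over comparable patient pairs, with right-censored patients handled by skipping pairs whose ordering cannot be determined. A minimal sketch of Harrell's C-index (the function name and argument layout are mine, not the paper's):

```python
import itertools

def concordance_index(times, events, risks):
    """Harrell's concordance index.

    times:  observed survival or censoring time per patient
    events: 1 if the event (e.g. death) was observed, 0 if censored
    risks:  predicted risk scores (higher risk = shorter expected survival)
    """
    concordant, permissible = 0.0, 0
    for (t_i, e_i, r_i), (t_j, e_j, r_j) in itertools.combinations(
            zip(times, events, risks), 2):
        # A pair is comparable only if the earlier time is an observed event.
        if t_i == t_j:
            continue
        if t_i < t_j and not e_i:
            continue
        if t_j < t_i and not e_j:
            continue
        permissible += 1
        shorter_risk, longer_risk = (r_i, r_j) if t_i < t_j else (r_j, r_i)
        if shorter_risk > longer_risk:
            concordant += 1.0        # correctly ordered pair
        elif shorter_risk == longer_risk:
            concordant += 0.5        # tied risks count half
    return concordant / permissible
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so the reported values of 0.670, 0.679, and 0.711 indicate a substantial improvement over chance.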