Principal Investigator
Yi Chen
Position Title
Machine Learning Engineer
About this CDAS Project
NLST (Learn more about this study)
Project ID
Initial CDAS Request Approval
Feb 19, 2020
Improve Lung Cancer Prediction by Training Multi-task Self-supervised Pre-training on Large Unlabeled Data
Supervised deep learning for image classification has been highly successful on natural-image datasets such as ImageNet. In medical imaging, however, annotations are scarce: they must come from physicians or radiologists, so the pool of labeled data is hard to grow quickly. To address this, some researchers pre-train a self-supervised model on surrogate labels, which can be a scalar [1] or an image [2]. Others improve model performance through multi-task learning, which feeds several image sources to the model; these can be different tasks [3] or different modalities [4], such as CT and MRI. We propose training a self-supervised pre-training model with multi-task learning. We first design several surrogate labels for unlabeled images and then train the neural network simultaneously with different "heads", each of which can be a decoder or a fully connected layer attached to the encoder. This architecture regularizes the encoder to learn prior information in medical images by satisfying the different pre-defined surrogate classifiers or decoders. Afterwards, we can fine-tune the pre-trained encoder, or the whole model, to obtain better performance and faster convergence in lung cancer prediction. We therefore need a large amount of unlabeled data, such as NLST, to train this self-supervised model.
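
As an illustration only, the sketch below shows one way surrogate targets could be derived from a single unlabeled image and how the losses of several heads might be combined. The two surrogate tasks (rotation prediction for a scalar label, masked-patch restoration for an image label), the helper names, and the weighted-sum loss are assumptions made for this example; the project summary does not specify which tasks or weighting the authors use.

```python
import random

def rotate90(image, k):
    """Rotate a 2D list-of-lists image counter-clockwise by k quarter turns."""
    out = [list(row) for row in image]  # copy so the input is never aliased
    for _ in range(k % 4):
        # transpose, then reverse the row order: one counter-clockwise turn
        out = [list(row) for row in zip(*out)][::-1]
    return out

def mask_patch(image, top, left, size, fill=0.0):
    """Return a copy of the image with a size x size square set to `fill`."""
    out = [list(row) for row in image]
    for r in range(top, min(top + size, len(out))):
        for c in range(left, min(left + size, len(out[r]))):
            out[r][c] = fill
    return out

def make_surrogate_samples(image, rng):
    """Build (input, target) pairs for two hypothetical surrogate tasks.

    rotation:    input is the rotated image, target is the scalar k
    restoration: input is the masked image, target is the original image
    """
    k = rng.randrange(4)
    top = rng.randrange(len(image))
    left = rng.randrange(len(image[0]))
    return {
        "rotation": (rotate90(image, k), k),
        "restoration": (mask_patch(image, top, left, size=2), image),
    }

def mse(pred, target):
    """Mean squared error between two images of the same shape."""
    sq = [(p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return sum(sq) / len(sq)

def multitask_loss(head_losses, weights):
    """Weighted sum of per-head losses (assumed combination rule)."""
    return sum(weights[name] * loss for name, loss in head_losses.items())

# Usage: one unlabeled "image" yields a target for each head.
image = [[1.0, 2.0], [3.0, 4.0]]
samples = make_surrogate_samples(image, random.Random(0))
```

In a real pipeline each entry of `samples` would be fed through the shared encoder to its own head, and the value returned by `multitask_loss` would be the quantity minimized during pre-training.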

[1] Tajbakhsh, Nima, et al. "Surrogate supervision for medical image analysis: Effective deep learning from limited quantities of labeled data." *2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019)*. IEEE, 2019.
[2] Zhou, Zongwei, et al. "Models genesis: Generic autodidactic models for 3d medical image analysis." *International Conference on Medical Image Computing and Computer-Assisted Intervention*. Springer, Cham, 2019.
[3] Jaeger, Paul F., et al. "Retina U-Net: Embarrassingly simple exploitation of segmentation supervision for medical object detection." *arXiv preprint arXiv:1811.08661* (2018).
[4] Chen, Sihong, Kai Ma, and Yefeng Zheng. "Med3d: Transfer learning for 3d medical image analysis." *arXiv preprint arXiv:1904.00625* (2019).

- Build a self-supervised pre-trained model through multi-task learning that gives better performance and faster convergence.
- Validate the performance and convergence on other open datasets, such as LUNA16 or Kaggle.


Allen Kao (ASUS AICS,
HungWei Chen (ASUS AICS,