NLST-637: Improve Lung Cancer Prediction by Training Multi-task Self-supervised Pre-training on … - Approved Projects

Principal Investigator

Name

Yi Chen

Degrees

M.D.

Institution

ASUS AICS

Position Title

Machine Learning Engineer

yi1_chen@asus.com

About this CDAS Project

Study

NLST

Project ID

NLST-637

Initial CDAS Request Approval

Feb 19, 2020

Title

Improve Lung Cancer Prediction by Training Multi-task Self-supervised Pre-training on Large Unlabeled Data

Summary

Supervised deep learning for image classification has been highly successful on natural-image datasets such as ImageNet. In medical imaging, however, annotations are scarce: labels must come from physicians or radiologists, so the amount of annotated data cannot be increased quickly. To address this, some researchers train self-supervised pre-training models on surrogate labels, which can be scalars[1] or images[2]. Others improve model performance through multi-task learning, which feeds several data sources to one model; these can be different tasks[3] or different modalities[4], such as CT and MRI. We propose to train a self-supervised pre-training model by multi-task learning. We first design several surrogate labels for unlabeled images and then train a neural network with several "heads" simultaneously, where each head is a decoder or a fully connected layer attached to the encoder. This architecture regularizes the encoder to learn prior information about medical images, because it must satisfy several pre-defined surrogate classifiers or decoders at once. Afterwards, we can fine-tune the pre-trained encoder, or the whole model, to obtain better performance and faster convergence in lung cancer prediction. Training this self-supervised model requires a large amount of unlabeled data, which NLST provides.

[1] Tajbakhsh, Nima, et al. "Surrogate supervision for medical image analysis: Effective deep learning from limited quantities of labeled data." *2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019)*. IEEE, 2019.
[2] Zhou, Zongwei, et al. "Models Genesis: Generic autodidactic models for 3D medical image analysis." *International Conference on Medical Image Computing and Computer-Assisted Intervention*. Springer, Cham, 2019.
[3] Jaeger, Paul F., et al. "Retina U-Net: Embarrassingly simple exploitation of segmentation supervision for medical object detection." *arXiv preprint arXiv:1811.08661* (2018).
[4] Chen, Sihong, Kai Ma, and Yefeng Zheng. "Med3D: Transfer learning for 3D medical image analysis." *arXiv preprint arXiv:1904.00625* (2019).
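
The multi-head idea in the summary can be illustrated with a minimal NumPy sketch. It is not the project's implementation. Everything specific in it is an assumption made for illustration: the two surrogate tasks (predicting which rotation was applied, as a scalar label, and restoring a clean image from a noisy one, as an image label), the single linear layers, the layer sizes, the loss weights, and the random arrays standing in for unlabeled CT patches. It shows only how one encoder receives gradients from a fully connected head and a decoder head at the same time.

```python
import numpy as np


def make_surrogate_batch(images, rng):
    """Derive two surrogate targets from unlabeled square images.

    Both tasks are hypothetical examples:
    - scalar label: index of the rotation applied (0, 90, 180, 270 degrees)
    - image label: the clean rotated image, to be restored from a noisy input
    """
    n = len(images)
    ks = rng.integers(0, 4, size=n)
    rotated = np.stack([np.rot90(img, int(k)) for img, k in zip(images, ks)])
    noisy = rotated + 0.1 * rng.standard_normal(rotated.shape)
    return noisy.reshape(n, -1), ks, rotated.reshape(n, -1)


def init_params(rng, d_in, d_hidden=16, n_classes=4):
    """One encoder matrix plus one matrix per head (sizes are arbitrary)."""
    return {
        "enc": 0.1 * rng.standard_normal((d_in, d_hidden)),
        "cls": 0.1 * rng.standard_normal((d_hidden, n_classes)),
        "dec": 0.1 * rng.standard_normal((d_hidden, d_in)),
    }


def forward(params, x):
    h = np.maximum(0.0, x @ params["enc"])  # encoder features (ReLU)
    logits = h @ params["cls"]              # fully connected head
    recon = h @ params["dec"]               # decoder head
    return h, logits, recon


def joint_loss_and_grads(params, x, y_cls, y_img, w_cls=1.0, w_rec=1.0):
    """Weighted sum of cross-entropy and MSE, with hand-written gradients."""
    n = x.shape[0]
    h, logits, recon = forward(params, x)

    # Softmax cross-entropy for the scalar surrogate label.
    z = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    ce = -np.log(p[np.arange(n), y_cls]).mean()

    # Mean squared error for the image surrogate label.
    mse = ((recon - y_img) ** 2).mean()
    loss = w_cls * ce + w_rec * mse

    # Gradients of the joint loss with respect to each head's output.
    d_logits = p.copy()
    d_logits[np.arange(n), y_cls] -= 1.0
    d_logits *= w_cls / n
    d_recon = w_rec * 2.0 * (recon - y_img) / recon.size

    # The encoder gradient is the sum of the contributions from both heads.
    d_h = d_logits @ params["cls"].T + d_recon @ params["dec"].T
    d_h[h <= 0.0] = 0.0  # ReLU passes no gradient where it was inactive

    grads = {
        "enc": x.T @ d_h,
        "cls": h.T @ d_logits,
        "dec": h.T @ d_recon,
    }
    return loss, grads


rng = np.random.default_rng(0)
images = rng.random((32, 8, 8))  # random stand-in for unlabeled image patches
x, y_cls, y_img = make_surrogate_batch(images, rng)
params = init_params(rng, d_in=x.shape[1])

losses = []
for _ in range(100):  # plain full-batch gradient descent on one fixed batch
    loss, grads = joint_loss_and_grads(params, x, y_cls, y_img)
    losses.append(loss)
    for name in params:
        params[name] -= 0.05 * grads[name]

print(f"joint loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

After pre-training of this kind, the `cls` and `dec` heads would be discarded and `params["enc"]` reused as the starting point for fine-tuning on a labeled task. A real setup would more likely use a deep-learning framework with automatic differentiation and a convolutional encoder; the manual gradients here only keep the sketch dependency-light.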

Aims

- Build a self-supervised pre-trained model, trained by multi-task learning, that yields better performance and faster convergence.
- Validate performance and convergence on other open datasets, such as LUNA16 or Kaggle.

Collaborators

Allen Kao (ASUS AICS, Allen1_Kao@asus.com)
HungWei Chen (ASUS AICS, Hungwei_Chen@asus.com)