Longitudinal Deep Radiomic Signatures for Lung Cancer Prognosis and Treatment Response Prediction (renewal of project NLST-256)

Principal Investigator

Name
Lisa Di Jorio

Degrees
Ph.D.

Institution
Imagia Cybernetics

Position Title
Director of AI Research and Strategy

Email
lisa@imagia.com

About this CDAS Project

Study
NLST

Project ID
NLST-761

Initial CDAS Request Approval
Mar 9, 2021

Title
Longitudinal Deep Radiomic Signatures for Lung Cancer Prognosis and Treatment Response Prediction (renewal of project NLST-256)

Summary
This project is a renewal and an extension of NLST-256, approved on November 18th, 2016.
The past two years have witnessed explosive growth in the application of deep convolutional neural network (DCNN) architectures to medical image analysis (e.g. IEEE Transactions on Medical Imaging May 2016 special issue, and papers cited therein). Several of those studies demonstrate the potential of DCNN models for lung nodule detection and characterization, with some results suggesting performance that can exceed that of models based on traditional imaging features and shallow classifiers (Anthimopoulos et al., 2016; Setio et al., 2016).

It is well known that DCNNs are very data-hungry architectures, and an assessment of the best architecture for a given task might depend strongly on whether one trains on 1000 or 50,000 patient records. Many recent studies of computer-aided diagnosis (CAD) for lung nodule detection make use of the LIDC/IDRI dataset (Armato et al., 2011); however, with about 1000 patients, this dataset might be too small to fully understand algorithm performance, particularly the behavior on rare nodule types. A much larger dataset would allow a finer understanding of the trade-offs between algorithm complexity and accuracy for different training dataset sizes.

Moreover, recent progress in the field of radiomics has shown promise in improving diagnosis and patient stratification in lung cancer (Scrivener et al., 2016). Radiomics aims to characterize tumor phenotypes through the analysis of quantitative imaging features, such as texture and tumor shape, and such features have been associated with specific mutations in lung cancer (Liu et al., 2016) and in a number of other cancers (Aerts et al., 2014). So far, most work on radiomics has centered on feature extraction from single images, without much consideration for the sequence of features that may be extracted from a multi-year longitudinal patient screening and follow-up effort. In addition, published work on radiomics continues to use hand-engineered features instead of features trained to be maximally predictive of a clinical outcome.

This project aims to bridge this gap by evaluating whether DCNNs can serve to extract predictive radiomic features, either from individual scans and/or pathology slices, or from a longitudinal sequence thereof. To this end, we will correlate features extracted from DCNNs trained to detect lung nodules with available patient outcome data, including patient survival, diagnostic procedures undertaken, medical complications, lung cancer stage, and treatment administered. We intend to evaluate a number of DCNN architectures for this task, including multitask and transfer learning models that have proven successful in a medical imaging context (Tajbakhsh et al., 2016), as well as automatic model discovery (Pham et al., 2018).
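
As a rough illustration of the correlation step described above, the following sketch ranks the dimensions of a feature vector by their linear association with a binary outcome. It is a minimal example under stated assumptions, not the project's method: the summary does not specify which statistic is used, so Pearson correlation is a stand-in, and the names `pearson`, `rank_features`, `toy_features` and `toy_outcome`, as well as all numeric values, are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear association
    return cov / math.sqrt(var_x * var_y)


def rank_features(features, outcome):
    """Rank feature columns by absolute correlation with the outcome.

    features: one feature vector per patient (hypothetical DCNN embeddings)
    outcome:  one label per patient, e.g. 1 = event observed, 0 = not observed
    """
    n_dims = len(features[0])
    scores = []
    for j in range(n_dims):
        column = [row[j] for row in features]
        scores.append((j, pearson(column, outcome)))
    return sorted(scores, key=lambda s: abs(s[1]), reverse=True)


# Toy data invented for illustration only; not derived from NLST.
toy_features = [
    [0.1, 5.0, 1.0],
    [0.2, 4.0, 1.0],
    [0.8, 5.5, 1.0],
    [0.9, 4.5, 1.0],
]
toy_outcome = [0, 0, 1, 1]

ranking = rank_features(toy_features, toy_outcome)
for index, r in ranking:
    print(f"feature {index}: r = {r:+.3f}")
```

In practice, time-to-event outcomes such as survival would call for survival-analysis tools rather than a plain correlation coefficient; the sketch only shows the general shape of relating learned features to outcomes.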

If successful, this work will establish the usefulness of DCNNs as a general modelling framework for integrating imaging with other clinical patient data into a predictive system that could help support clinical decisions and ultimately improve patient care.

Aims

Aim 1: Nodule Detection and Transfer Learning (COMPLETED)
Update of January 2021: Imagia has actively worked on nodule detection (Jesson et al., 2017) and transfer learning (Varno et al., 2020) and is currently applying this foundational work to medical imaging. However, validating such methods on NLST requires accurate pixel-level ground truth to quantify algorithm performance, as most benchmark metrics found in the literature are computed on segmentation masks. We are consolidating a very small validation sample annotated by human experts in order to close this line of research.

Aim 1: Nodule Detection using Weak Localisation and Auto Machine Learning Methods (NEW)
Since the beginning of its work on automatic detection, Imagia has noted the need for pixel-wise annotated datasets in order to train models appropriately. This approach is neither efficient nor scalable: to be accurate, annotators must be domain experts, which is time-consuming and costly, and the annotations cannot easily be ported to new organs or lesion types. Imagia expects that recent advances in semi-supervised methods will help solve this problem. We developed multiple weak localisation techniques that were tested on NLST and LIDC, allowing us to pinpoint, at the pixel level, the parts of the image that matter most for the DCNN's decision.
More recently, we developed models able to generate architectures tailored to a specific volumetric dataset and able to solve weak localisation tasks. Our models have been tested on the LIDC dataset and work on small volumes (48x48x48). As we make further progress in adapting to larger volumes, we believe that NLST will be the most appropriate dataset on which to test and publish our final results.
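
To make the idea of weak localisation more concrete, here is a minimal sketch of occlusion sensitivity, one generic way of attributing a classifier's decision to image regions without pixel-level labels. The summary does not state which techniques Imagia developed, so this is a swapped-in illustration rather than the project's method. The function names, the toy 2D image and the `toy_score` stand-in for a trained DCNN are all hypothetical.

```python
def occlusion_map(image, score_fn, patch=2, fill=0.0):
    """Score drop caused by masking each patch-sized window of a 2D image.

    image:    list of rows of floats
    score_fn: callable returning a scalar score for an image; a stand-in
              for the output of a trained classifier (hypothetical here)
    """
    height, width = len(image), len(image[0])
    baseline = score_fn(image)
    heat = [[0.0] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))
            occluded = [row[:] for row in image]  # copy, keep the input intact
            for r in rows:
                for c in cols:
                    occluded[r][c] = fill
            drop = baseline - score_fn(occluded)
            # Every pixel in the window shares the drop measured for it.
            for r in rows:
                for c in cols:
                    heat[r][c] = drop
    return heat


def toy_score(image):
    """Toy 'classifier' that simply responds to the brightest pixel."""
    return max(max(row) for row in image)


# Toy 4x4 image with a bright 2x2 blob in the lower-left corner.
toy_image = [
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1],
    [1.0, 1.0, 0.1, 0.1],
    [1.0, 1.0, 0.1, 0.1],
]

heat = occlusion_map(toy_image, toy_score)
for row in heat:
    print(" ".join(f"{value:.1f}" for value in row))
```

The resulting map is non-zero only where masking changes the score, which is the sense in which an image-level label can yield a coarse, pixel-level localisation. Extending the same loop to volumetric data would add a third index over slices.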

Aim 2: Extraction of Radiomic Features from Single Images (UPDATED from project 256)
Update of January 2021: Imagia recently ported its Deep Radiomic method to real-world use cases (Elkrief et al., 2020) with very encouraging results. Our next step is to integrate more imaging modalities, such as pathology scans, hence Aim 4.

Aim 3: Extraction of Longitudinal Radiomics Features (UNCHANGED from project 256)

Aim 4: Radiomic Features from Multi-Modal Data (NEW from project 256)
Imagia has recently been exploring new potential signal sources from different modalities. Our experiments on various public datasets empirically demonstrate that some information is present only in specific sources and can complement and improve the signals contained in other sources (Sylvain et al., 2020). We aim to extend this idea to the combination of macro-level data (such as CT scans) with micro-level data (such as pathology slices). Research from other groups shows promising results in this direction (Schmauch et al., 2020), and the available NLST data, complemented by pathology scans, provide a unique dataset for testing this hypothesis.

Collaborators

Imagia Inc.