Lung nodule characterisation with deep learning

Principal Investigator

Name
Julia Schnabel

Degrees
Ph.D., M.Sc.

Institution
King's College London

Position Title
Professor of Computational Imaging

Email
julia.schnabel@kcl.ac.uk

About this CDAS Project

Study
NLST

Project ID
NLST-454

Initial CDAS Request Approval
Nov 19, 2018

Title
Lung nodule characterisation with deep learning

Summary
This project aims to improve the identification and characterization of lung nodules in low-dose CT. One particular aspect that we aim to investigate is early detection and prediction of nodule progression, using baseline and follow-up lung CT scans to train novel machine learning algorithms. To gain insight into early detection, we aim to develop and train deep learning network architectures based on the radiologist annotations, particularly the annotations about newly observed abnormalities that were not present, or were missed, in the first screening scan.

Another important aspect of the project is to analyze nodule characteristics for more differentiated diagnosis and patient management. This includes identifying nodule subcategories, or clusters, that might be challenging to handle and where there is uncertainty about their diagnosis. Such subcategories might not correspond to the set of categories commonly used to describe nodules. This information could provide insight into the challenging task of nodule characterization and patient handling, where human decision making still plays a significant role. Including this uncertainty information in a state-of-the-art machine learning model would bring it closer to current evaluation procedures, where uncertainty and judgment are present and should be described. To achieve these goals, we are collaborating with our clinical partners at Guy’s and St. Thomas’ hospital, who have provided dedicated, labeled datasets. Unfortunately, these are not sufficiently large to train state-of-the-art machine learning models. We therefore hope that by augmenting our data with the NLST dataset, we will be able to achieve these goals.

Finally, we are planning to explore new ideas and models for lung nodule characterization using generative models and semi-supervised learning algorithms. Generative models (e.g. generative adversarial networks (GANs)) have recently obtained impressive results in medical imaging. Semi-supervised methods, on the other hand, combine labeled and unlabeled datasets. Their general idea is to use the unlabeled datasets to define relevant features, e.g. with unsupervised learning methods, and then to improve on these features using the labeled dataset and supervised methods. In a similar way to our previous proposals, we intend to use a combination of the NLST dataset and our smaller datasets to characterize the nodules in our own cases.
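As a rough illustration of the semi-supervised idea described above, the following minimal sketch uses a simple cluster-then-label scheme: k-means on unlabeled points stands in for the unsupervised feature step, and a majority vote over a few labeled points stands in for the supervised step. This is a toy stand-in chosen for brevity, not the method planned in the project; the 2-D features, the labels "benign" and "suspicious", and all function names are hypothetical.

```python
# Toy cluster-then-label sketch. All data and names here are hypothetical.
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def kmeans(points, centroids, n_iter=20):
    """Unsupervised step: plain k-means from the given initial centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(n_iter):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        centroids = [
            tuple(sum(dim) / len(g) for dim in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids


def label_clusters(centroids, labeled):
    """Supervised step: give each cluster the majority label of its labeled points."""
    votes = [Counter() for _ in centroids]
    for features, label in labeled:
        votes[nearest(features, centroids)][label] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]


def predict(point, centroids, cluster_labels):
    """Predict the label of the cluster the point falls into."""
    return cluster_labels[nearest(point, centroids)]


# Larger unlabeled set: made-up 2-D features.
unlabeled = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (5.0, 5.2), (5.3, 4.9), (4.8, 5.1)]
# Much smaller labeled set.
labeled = [((1.0, 1.0), "benign"), ((5.0, 5.0), "suspicious")]

centroids = kmeans(unlabeled, centroids=[unlabeled[0], unlabeled[3]])
cluster_labels = label_clusters(centroids, labeled)
prediction = predict((4.9, 5.0), centroids, cluster_labels)
```

In this sketch the unlabeled set alone determines the cluster structure, and the labeled set is needed only to name the clusters, which is why a small number of annotations can suffice in the toy setting.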

Aims

We intend to use the NLST database to pre-train our machine learning models (based on deep neural networks), and then further adapt them using transfer learning to our own, much smaller databases, to achieve the following aims:

1. to identify characteristics related to nodule progression in a dataset composed of follow-up scans. Using the NLST radiologist observations from the three annual screening exams (T0, T1, T2), we will train a deep learning model to capture the progression of nodules and other clinically significant abnormalities and how it relates to the radiologists' follow-up recommendations, and then test the model on our own dataset of follow-up scans.

2. to train a deep learning model on the radiologist annotations from the second and third scans regarding new abnormalities that were missed in previous scans. This tests whether our model identifies subtle patterns related to early-stage nodules that radiologists did not identify during the first evaluations, which might offer valuable information for screening tests. We will further test the model on our own dataset of follow-up scans.

3. to apply such models to a recently collected dataset where a large number of radiologists have evaluated a set of nodules, in order to identify the key aspects of the nodules that may lead to diverging diagnoses. Using deep learning models, we intend to compare the uncertainty in these data against nodule states in the NLST database.

4. to evaluate state-of-the-art architectures of generative models and semi-supervised approaches that combine labeled and unlabeled data. We would like to use GANs, a class of deep generative models, to model nodule progression and identify the threshold at which a nodule can be detected by clinicians.

5. to establish whether semi-supervised learning can help to combine unlabeled and labeled datasets, by using the intrinsic information from the NLST dataset to identify features that can help in improving results on our much smaller, but heavily annotated dataset.
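The pre-train-then-adapt strategy that underlies these aims can be sketched in miniature as follows. This is only an illustrative sketch under strong assumptions: a one-feature logistic regression stands in for a deep neural network, and the "source" and "target" datasets are made-up numbers rather than NLST or clinical data. The names train, accuracy, source and target are hypothetical.

```python
import math

# Toy stand-in for "pre-train on a large dataset, then fine-tune on a small one".
# The model is a 1-D logistic regression, not a deep network; all data are made up.


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(data, w, b, lr, epochs):
    """Stochastic gradient descent on the logistic loss, starting from (w, b)."""
    for _ in range(epochs):
        for x, y in data:
            err = sigmoid(w * x + b) - y  # gradient of the loss w.r.t. the logit
            w -= lr * err * x
            b -= lr * err
    return w, b


def accuracy(data, w, b):
    """Fraction of examples whose predicted class matches the label."""
    hits = sum((w * x + b >= 0.0) == (y == 1) for x, y in data)
    return hits / len(data)


# "Large" source dataset (hypothetical): label is 1 when the feature is positive.
source = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)] * 25
# Small target dataset (hypothetical) whose decision boundary is slightly shifted.
target = [(-0.5, 0), (0.3, 0), (0.8, 1), (1.5, 1)]

# Pre-training starts from zero-initialised parameters.
w_pre, b_pre = train(source, w=0.0, b=0.0, lr=0.1, epochs=5)

# Fine-tuning starts from the pre-trained parameters, with a smaller learning rate.
w_ft, b_ft = train(target, w=w_pre, b=b_pre, lr=0.05, epochs=200)

acc_source = accuracy(source, w_pre, b_pre)
acc_before = accuracy(target, w_pre, b_pre)  # pre-trained model, no adaptation
acc_after = accuracy(target, w_ft, b_ft)     # after fine-tuning on the target set
```

The only point of the sketch is the hand-over of parameters: fine-tuning reuses w_pre and b_pre as its starting point instead of a fresh initialisation. Comparing acc_before with acc_after shows the effect of adaptation on the toy target set; no particular outcome is claimed here.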

Collaborators

Julia Schnabel (King's College London)
Vasileios Baltatzis (King's College London)
Ben Glocker (Imperial College London)
Loic Le Folfoc (Imperial College London)
Arjun Nair (University College London)