Principal Investigator
Padraig Cantillon-Murphy
University College Cork
Position Title
About this CDAS Project
NLST
Project ID
Initial CDAS Request Approval
Jun 15, 2016
Lung cancer malignancy prediction tool using chest CT and open-source machine learning
The overall aim of our project is to create an open-source tool to estimate the likelihood of malignancy in pulmonary nodules identified on chest CT. In recent years, several systems have been developed to support radiologists in the detection and classification of pulmonary nodules. Such computer-aided detection (CAD) methods aim to automatically identify, segment and classify nodules in CT images, which is not an easy task. The development of enhanced CAD methods for identification and classification of pulmonary nodules is of great interest to researchers and physicians, and advanced CAD systems can aid in the decision to proceed with a biopsy for a specific nodule.

A complete CAD system should be able to detect the nodule, classify it based on its location and connection to other structures, and ideally determine whether it is malignant or not. Despite significant progress in this area, no optimal and widely used CAD scheme for lung nodule detection and classification that uses chest CT as a starting point is available. This is due to the great diversity in shape, size, CT density, etc., of pulmonary nodules, as well as co-existing pulmonary conditions. Moreover, connection to lung structures and possible noise patterns can greatly affect the performance and robustness of CAD systems.

In addition to CAD systems, several tools have been proposed to help radiologists determine the probability of malignancy of pulmonary nodules based on different clinical and radiographic characteristics. As an example, Chest X-Ray (Gurney) requires the user to specify the size, location, edge smoothness, growth rate, cavity wall thickness and calcification of the nodule, along with clinical characteristics such as the age of the patient, smoking level, previous history of malignancy, and presence of hemoptysis, to determine the probability of malignancy (McWilliams, 2013; Gurney, 1993a; Gurney et al., 1993b).

To evaluate our nodule malignancy estimation tool, each nodule is identified by the user via a simple graphical user interface. The algorithm then computes this nodule’s characteristics and describes its size, spicularity, calcification, edge properties, etc. From those properties, the tool automatically derives the likelihood of malignancy in this nodule based on existing Bayesian estimators (Gurney et al.). The software has been developed and tested on smaller local data sets with encouraging early results. The immediate goal of using the NLST datasets is to extend these results to a larger, sufficiently powered sample. We plan to compare the tool’s estimates with the follow-up clinical diagnoses in the NLST datasets, which will allow the new tool to be benchmarked against confirmed clinical diagnosis.
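Bayesian estimators of this kind are commonly written in odds-likelihood form: a prior probability of malignancy is converted to odds, multiplied by one likelihood ratio per observed finding, and converted back to a probability. The sketch below illustrates that general calculation only; it is not the project's implementation. The function name, the finding names and all numeric values are hypothetical placeholders and are not taken from Gurney's published tables. Multiplying the ratios assumes the findings are conditionally independent.

```python
from math import prod


def posterior_probability(prior_probability, likelihood_ratios):
    """Combine a prior probability with per-finding likelihood ratios.

    Uses Bayes' theorem in odds form and assumes the findings are
    conditionally independent given the true diagnosis.
    """
    if not 0.0 < prior_probability < 1.0:
        raise ValueError("prior_probability must lie strictly between 0 and 1")
    ratios = list(likelihood_ratios)
    if any(ratio <= 0.0 for ratio in ratios):
        raise ValueError("likelihood ratios must be positive")
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = prior_odds * prod(ratios)
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical placeholder values for illustration only,
# NOT published likelihood ratios.
example_ratios = {
    "spiculated_edge": 4.0,
    "no_calcification": 2.0,
    "never_smoker": 0.5,
}
p = posterior_probability(0.2, example_ratios.values())
print(f"estimated probability of malignancy: {p:.2f}")
```

With these placeholder inputs the prior odds of 0.25 are multiplied by a combined ratio of 4, giving posterior odds of about 1, i.e. a probability of about 0.5.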

The second part of the project is to develop a software tool (CAD) capable of supporting a radiologist in identifying pulmonary nodules for input into our malignancy estimator. To do this, we require a large volume of CT studies with pre-identified nodules and appropriate clinical or radiological follow-up. These scans will be used to train and evaluate the underlying machine learning algorithms using standard cross-validation methods, as in other machine learning applications. The use of recently open-sourced machine learning tools such as Google’s TensorFlow and Microsoft’s CNTK will be actively explored.
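As a rough illustration of the cross-validation step, the sketch below builds k folds over a list of nodules using only the Python standard library. Keeping all nodules from one patient in the same fold is an assumption made here to avoid leakage between training and test data; the proposal does not state how folds would be formed. The function name and the patient identifiers are hypothetical.

```python
import random
from collections import defaultdict


def patient_level_folds(nodule_patient_ids, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    All nodules belonging to one patient land in the same fold, so no
    patient contributes to both the training and the test side of a split.
    """
    by_patient = defaultdict(list)
    for index, patient_id in enumerate(nodule_patient_ids):
        by_patient[patient_id].append(index)

    patients = sorted(by_patient)
    if k < 2 or k > len(patients):
        raise ValueError("k must be between 2 and the number of patients")
    random.Random(seed).shuffle(patients)

    for fold in range(k):
        test_patients = set(patients[fold::k])  # every k-th patient
        test = [i for p in test_patients for i in by_patient[p]]
        train = [i for p in patients if p not in test_patients
                 for i in by_patient[p]]
        yield sorted(train), sorted(test)


# Hypothetical patient identifiers, one entry per annotated nodule.
example_ids = ["p1", "p1", "p2", "p3", "p3", "p4"]
for train_idx, test_idx in patient_level_folds(example_ids, k=2):
    print("train:", train_idx, "test:", test_idx)
```

Each nodule appears in the test set of exactly one fold, so every scan is used for both training and evaluation across the k rounds.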

Specific requirements:

1. Pre-identified nodules, with their position within the CT indicated (e.g., coordinates in combination with a slice number, or something similar). This is an essential requirement, and all annotations should be in a machine-readable format such as an SQL file, a comma-separated file, an Excel file or similar.

2. Appropriate follow-up providing clinical ground truth for these nodules, i.e. one of the following: (i) pathological cell typing from a biopsy or a resected surgical specimen; (ii) radiological diagnosis of malignancy, either in patients unfit for biopsy or in the case of metastatic disease; (iii) “benign” behavior of a nodule on follow-up imaging (min. 2 years for a solid nodule, min. 3 years for a sub-solid nodule); or (iv) “benign” appearance of a lung nodule on a one-off imaging study, e.g. hamartoma (fat) or granuloma (calcium).

3. Any additional nodule-specific properties, such as shape estimation, classification as solid / semi-solid / ground-glass, or proximity to vessels or the pleura, are all highly valuable. All such information should be on a structured scale.
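To make the first two requirements concrete, the sketch below reads nodule annotations from a comma-separated file into typed records. The column names, the ground-truth source labels and the two sample rows are all hypothetical and invented for illustration; they do not describe the actual layout of the NLST data.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical labels for the four kinds of ground truth listed above.
GROUND_TRUTH_SOURCES = {
    "pathology",
    "radiological_diagnosis",
    "follow_up_imaging",
    "benign_appearance",
}


@dataclass(frozen=True)
class NoduleAnnotation:
    scan_id: str
    slice_number: int
    x: float
    y: float
    malignant: bool
    ground_truth_source: str


def read_annotations(text):
    """Parse CSV text into NoduleAnnotation records, rejecting unknown sources."""
    annotations = []
    for row in csv.DictReader(io.StringIO(text)):
        source = row["ground_truth_source"]
        if source not in GROUND_TRUTH_SOURCES:
            raise ValueError(f"unknown ground truth source: {source!r}")
        annotations.append(NoduleAnnotation(
            scan_id=row["scan_id"],
            slice_number=int(row["slice_number"]),
            x=float(row["x"]),
            y=float(row["y"]),
            malignant=row["malignant"] == "1",
            ground_truth_source=source,
        ))
    return annotations


# Made-up sample rows, for illustration only.
SAMPLE = """scan_id,slice_number,x,y,malignant,ground_truth_source
scan-001,42,130.5,211.0,1,pathology
scan-002,17,88.0,301.25,0,follow_up_imaging
"""
annotations = read_annotations(SAMPLE)
for annotation in annotations:
    print(annotation)
```

Additional structured properties from the third requirement, such as nodule type or proximity to the pleura, could be added as further columns and fields in the same way.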


Dr. Stephen Power (Department of Radiology, Cork University Hospital)

Dr. Kevin O’Regan (Department of Radiology, Cork University Hospital)