Using the NLST database to develop a lung cancer malignancy prediction tool from CT and WSI data
A complete computer-aided detection and diagnosis (CAD) system should be able to detect the nodule, classify it based on its location and connection to other structures, and ideally determine whether it is malignant. Despite significant progress in this area, an optimal and widely used CAD scheme for lung nodule detection and classification that uses chest CT as a starting point is not yet available. This is due to the great diversity in shape, size, CT density, etc., of pulmonary nodules, as well as to co-existing pulmonary conditions. Moreover, connections to lung structures and noise patterns can affect the performance and robustness of CAD systems.
In addition to CAD systems, several tools have been proposed to help radiologists estimate the probability of malignancy of pulmonary nodules from clinical and radiographic characteristics. As an example, Chest X-Ray (Gurney) requires specification of the size, location, edge smoothness, growth rate, cavity wall thickness and calcification of the nodule, along with clinical characteristics such as the age of the patient, smoking level, previous history of malignancy, and presence of haemoptysis, to estimate the probability of malignancy (McWilliams, 2013; Gurney, 1993a; Gurney et al., 1993b).
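As an illustration of how such a calculator can turn individual findings into a single probability, the sketch below assumes a Bayesian combination of likelihood ratios: prior odds are multiplied by one ratio per finding and converted back to a probability. The function name, the prior, and all ratio values are hypothetical and are not taken from the cited publications.

```python
from math import prod


def malignancy_probability(prior_probability, likelihood_ratios):
    """Combine a prior probability with per-finding likelihood ratios.

    Assumes the findings are conditionally independent, which is a
    simplification made for this sketch.
    """
    if not 0.0 < prior_probability < 1.0:
        raise ValueError("prior_probability must lie strictly between 0 and 1")
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical likelihood ratios for illustration only; real values would
# have to come from a validated published table.
example_ratios = {"size": 2.0, "edge": 3.0, "age": 0.5}
probability = malignancy_probability(0.2, example_ratios.values())
print(f"Estimated probability of malignancy: {probability:.3f}")
```

A ratio above 1 raises the estimate and a ratio below 1 lowers it, so each characteristic contributes independently of the order in which it is entered.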
To use our nodule malignancy estimation tool, the user first identifies each nodule via a simple graphical user interface. The algorithm then computes the nodule’s characteristics, describing size, spiculation, calcification, edge properties, etc. From those properties, the tool automatically derives the likelihood of malignancy of the nodule. The software has been developed and tested on smaller local data sets with encouraging early results. The immediate goal of using the NLST datasets is to validate these results in a larger, sufficiently powered sample. We plan to compare the tool’s estimates with the follow-up clinical diagnoses in the NLST datasets, which will enable benchmarking of the new tool against confirmed clinical diagnosis.
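One possible way to benchmark predicted likelihoods against confirmed diagnoses is the area under the ROC curve (AUC). The text does not state which metric will be used, so the choice of AUC, the function name, and the toy scores and labels below are assumptions made purely for illustration.

```python
def roc_auc(scores, labels):
    """Probability that a random malignant nodule scores above a random benign one.

    labels use 1 for confirmed malignant and 0 for confirmed benign;
    tied scores count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one malignant and one benign nodule")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data, not real results: predicted likelihoods and follow-up diagnoses.
predicted = [0.9, 0.8, 0.3, 0.6, 0.2]
confirmed = [1, 1, 0, 0, 1]
auc = roc_auc(predicted, confirmed)
print(f"AUC on toy data: {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of malignant from benign nodules.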
The second part of the project is to develop a CAD software tool capable of supporting a radiologist in identifying pulmonary nodules for input into our malignancy estimator. To do this, we require a large volume of CT studies with pre-identified nodules and appropriate clinical or radiological follow-up. These scans will be used to train and evaluate the underlying machine learning algorithms using standard cross-validation methods.
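A minimal sketch of k-fold cross-validation for such data follows. It assumes that scans are grouped by patient, so that no patient contributes to both the training and the test portion of a fold; this grouping, the function name, and the patient identifiers are illustrative assumptions rather than the project's stated protocol.

```python
import random


def patient_level_folds(patient_ids, k, seed=0):
    """Yield (train_ids, test_ids) pairs over unique patients for k folds."""
    unique_ids = sorted(set(patient_ids))
    if not 2 <= k <= len(unique_ids):
        raise ValueError("k must be between 2 and the number of unique patients")
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(unique_ids)
    folds = [unique_ids[i::k] for i in range(k)]
    for i, test_ids in enumerate(folds):
        train_ids = [pid for j, fold in enumerate(folds) if j != i for pid in fold]
        yield train_ids, test_ids


# Hypothetical patient ID per CT scan; some patients have several scans.
scan_patients = ["p1", "p1", "p2", "p3", "p3", "p4", "p5", "p6"]
splits = list(patient_level_folds(scan_patients, k=3))
for train_ids, test_ids in splits:
    print("train:", sorted(train_ids), "test:", sorted(test_ids))
```

Each patient appears in exactly one test fold, so every scan is used for evaluation once while repeated scans of the same person never leak between training and testing.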
Dr. Håkon Olav Leira (Department of Lung Medicine, St. Olavs University Hospital)
Dr. Hanne Sorger (Department of Lung Medicine, Levanger Hospital)
Dr. Tore Amundsen (Department of Lung Medicine, St. Olavs University Hospital)
Dr. Arne Kildahl-Andersen (Department of Lung Medicine, St. Olavs University Hospital)
Dr. Arve Jørgensen (Department of Radiology, St. Olavs University Hospital)