LDCT Pulmonary Nodule Assessment Model Based on Multi-Omics Approach

Principal Investigator

Name
Rayjean Hung

Degrees
Ph.D., M.S.

Institution
Lunenfeld-Tanenbaum Research Institute, Sinai Health System

Position Title
Senior Investigator

Email
rayjean.hung@lunenfeld.ca

About this CDAS Project

Study
NLST

Project ID
NLST-349

Initial CDAS Request Approval
Sep 20, 2017

Title
LDCT Pulmonary Nodule Assessment Model Based on Multi-Omics Approach

Summary
Background: Lung cancer continues to be the most common cancer, and reducing lung cancer-related deaths is a global priority. With the use of LDCT screening growing rapidly since the NLST report, there is an urgent need to determine how to manage nodules once they are found. Our research team is uniquely positioned to conduct this much-needed work, as we have already built extensive resources and established collaborations with five additional lung cancer CT screening programs in Canada and Europe.

The overall goal of this project is to establish comprehensive nodule assessment models for individuals with LDCT-detected non-calcified pulmonary nodules, based on both two-dimensional (2D) measurements and 3D volumetric and radiomics-based probability models, and incorporating patients' health information and molecular profiles.

Method: In addition to the radiographic features that are routinely collected (such as type, location, and size), we will include 3D volume measurements, volume doubling time (VDT), and quantitative image features for radiomics analysis. Radiomics is an emerging field of high-dimensional image data analysis that focuses on extracting quantitative variables from radiographic images for subsequent agnostic data mining. For research efficiency, we will initially focus on four main groups of image features: intensity, size and shape, texture, and wavelet. Several families of machine learning algorithms could be used for feature selection and prediction, such as neural networks and support vector machines. For our project, we will use the two-step SURE screening procedure, which consists of individual feature screening followed by regularized elastic net regression, and the naïve Bayesian kernel machine regression method, which has been shown to outperform regularized regression by allowing for nonlinear effects.
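The two-step selection idea (screen features one at a time, then fit a penalized regression on the survivors) can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's pipeline: step 1 is a generic marginal-correlation screen in the spirit of sure independence screening, and step 2 is an elastic-net logistic regression fitted by proximal gradient descent. All function names, the penalty settings (`lam`, `alpha`), the number of retained features (`keep`), and the simulated data are hypothetical; the Bayesian kernel machine regression alternative is not shown.

```python
import math
import random


def standardize(columns):
    """Center and scale each feature column to mean 0 and SD 1."""
    out = []
    for col in columns:
        mu = sum(col) / len(col)
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / len(col)) or 1.0
        out.append([(v - mu) / sd for v in col])
    return out


def screen(columns, y, keep):
    """Step 1: rank standardized features by |marginal correlation| with y; keep the top `keep`."""
    n = len(y)
    ybar = sum(y) / n
    ysd = math.sqrt(sum((v - ybar) ** 2 for v in y) / n)
    scores = []
    for j, col in enumerate(columns):
        corr = sum(c * (v - ybar) for c, v in zip(col, y)) / (n * ysd)
        scores.append((abs(corr), j))
    return [j for _, j in sorted(scores, reverse=True)[:keep]]


def soft_threshold(z, t):
    """Proximal operator of the L1 penalty."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def elastic_net_logistic(columns, y, lam=0.05, alpha=0.5, iters=300):
    """Step 2: elastic-net logistic regression fitted by proximal gradient descent.

    Minimizes mean log-loss + lam * ((1 - alpha) / 2 * ||b||^2 + alpha * ||b||_1).
    The intercept b0 is not penalized.
    """
    n, d = len(y), len(columns)
    b0, b = 0.0, [0.0] * d
    # Step 1/L, where L bounds the Lipschitz constant of the smooth part for standardized columns.
    step = 1.0 / (0.25 * (d + 1) + lam * (1 - alpha))
    for _ in range(iters):
        resid = []
        for i in range(n):
            eta = b0 + sum(b[j] * columns[j][i] for j in range(d))
            resid.append(1.0 / (1.0 + math.exp(-eta)) - y[i])
        b0 -= step * sum(resid) / n
        for j in range(d):
            grad = sum(r * x for r, x in zip(resid, columns[j])) / n + lam * (1 - alpha) * b[j]
            b[j] = soft_threshold(b[j] - step * grad, step * lam * alpha)
    return b0, b


# Toy data (simulated, not NLST): 20 hypothetical features, only features 0 and 1 carry signal.
rng = random.Random(2017)
n, p = 400, 20
X = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]  # one list per feature
y = []
for i in range(n):
    eta = 2.0 * X[0][i] - 2.0 * X[1][i]
    y.append(1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0)

Z = standardize(X)
kept = screen(Z, y, keep=5)
b0, coefs = elastic_net_logistic([Z[j] for j in kept], y)
selected = {j: c for j, c in zip(kept, coefs) if c != 0.0}
print("kept after screening:", kept)
print("nonzero after elastic net:", {j: round(c, 3) for j, c in selected.items()})
```

In a real analysis the penalty strength would be tuned by cross-validation rather than fixed, and features would number in the hundreds rather than twenty.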

To avoid overfitting, we will use cross-validation (CV) to determine the best inclusion threshold based on the optimal signal-to-noise ratio, in addition to the bootstrap method (internal validation). Instead of conventional m-fold CV, which has been shown to often inflate discrimination accuracy and to underperform when evaluated in independent datasets, we will conduct cross-study validation to improve the generalizability of the learning algorithm. This approach has been shown to account for heterogeneity across studies and to identify outlying studies, producing more generalizable models.
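A leave-one-study-out loop illustrates the cross-study validation idea: each screening program is held out in turn, the model is fitted on the remaining programs, and discrimination is measured on the held-out one. This sketch assumes a plain logistic regression with two hypothetical image features and three simulated cohorts (`study_A` to `study_C`) that differ only in baseline prevalence; it is not the project's model, and real cohorts would also differ in scanners, protocols, and populations.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, x):
    """Predicted probability; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(rows, iters=300, lr=0.5):
    """Unpenalized logistic regression by batch gradient descent; rows are (features, label)."""
    d = len(rows[0][0])
    w = [0.0] * (d + 1)
    for _ in range(iters):
        g = [0.0] * (d + 1)
        for x, y in rows:
            err = predict(w, x) - y
            g[0] += err
            for j, xj in enumerate(x):
                g[j + 1] += err * xj
        w = [wj - lr * gj / len(rows) for wj, gj in zip(w, g)]
    return w


def c_statistic(scores, labels):
    """Share of case-control pairs ranked correctly (ties count 1/2); equals the AUC."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    hits = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return hits / (len(pos) * len(neg))


def cross_study_validation(studies):
    """Leave-one-study-out: train on all other studies, then score the held-out study."""
    results = {}
    for held_out, test_rows in studies.items():
        train = [row for name, rows in studies.items() if name != held_out for row in rows]
        w = fit_logistic(train)
        scores = [predict(w, x) for x, _ in test_rows]
        results[held_out] = c_statistic(scores, [lab for _, lab in test_rows])
    return results


def simulate(rng, n, baseline):
    """Toy cohort: two hypothetical standardized features; `baseline` shifts malignancy prevalence."""
    rows = []
    for _ in range(n):
        size, texture = rng.gauss(0, 1), rng.gauss(0, 1)
        risk = sigmoid(baseline + 1.5 * size + 0.5 * texture)
        rows.append(([size, texture], 1 if rng.random() < risk else 0))
    return rows


rng = random.Random(349)
studies = {
    "study_A": simulate(rng, 300, -1.0),
    "study_B": simulate(rng, 300, -1.5),
    "study_C": simulate(rng, 300, -0.5),
}
results = cross_study_validation(studies)
for name, c in results.items():
    print(f"held out {name}: C-statistic = {c:.3f}")
```

A held-out study whose C-statistic falls well below the others would be a candidate outlying study in the sense described above.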

After assessing the predictive performance of these models, we will evaluate the added predictive value of incorporating epidemiological information, pulmonary function (FEV1/FVC), and genomic data. Risk stratification table analysis will serve as a complementary approach for assessing this added value. We have approval to genotype NLST samples under a nested case-control design and are now in the process of retrieving the samples.

We will compare model performance with existing classification systems such as Lung-RADS and conduct net benefit and decision curve analyses to assess their clinical usefulness. These models will be valuable for the general public, clinicians, researchers, and health administrators. They will increase the efficiency of LDCT lung cancer screening and reduce unnecessary workup (and patient anxiety) for individuals with detected nodules.
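Net benefit, the quantity plotted in a decision curve, weighs true positives against false positives at a threshold probability pt: NB = TP/n - (FP/n) * pt / (1 - pt). The sketch below computes it for a model and for the "treat all" strategy (the "treat none" strategy has NB = 0 at every threshold). The eight predictions and the thresholds are made up for illustration, and the function names are hypothetical.

```python
def net_benefit(probs, labels, threshold):
    """Net benefit of flagging nodules with predicted risk >= threshold.

    NB = TP/n - FP/n * pt / (1 - pt): false positives are weighted by the odds of the threshold.
    """
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def decision_curve(probs, labels, thresholds):
    """Rows of (threshold, model NB, treat-all NB); treat-none NB is 0 at every threshold."""
    prevalence = sum(labels) / len(labels)
    return [
        (t, net_benefit(probs, labels, t), prevalence - (1 - prevalence) * t / (1 - t))
        for t in thresholds
    ]


# Toy predictions for 8 nodules (illustrative only).
probs = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1, 0.05, 0.6]
labels = [1, 1, 0, 1, 0, 0, 0, 0]
curve = decision_curve(probs, labels, [0.05, 0.1, 0.2, 0.5])
for t, nb_model, nb_all in curve:
    print(f"pt={t:.2f}  model NB={nb_model:+.4f}  treat-all NB={nb_all:+.4f}")
```

A model is clinically useful over the range of thresholds where its net benefit exceeds both the treat-all and treat-none curves.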

Aims

Background: Only a small fraction of nodules are lung cancers. The NLST reported that only 1 in 20 nodules detected by low-dose computed tomography (LDCT) scans is actually lung cancer. To address this issue, several clinical probability models have been proposed to improve nodule assessment (including two models from our team), and the Lung-RADS (Lung CT Screening Reporting and Data System) classification released by the American College of Radiology was shown to reduce false-positive rates from 26.6% to 12.8% at the baseline scan and from 21.8% to 5.3% after the baseline scan, with a moderate reduction in sensitivity. However, none of these approaches considers all potential predictors, and external validation has yet to be established. Consequently, clinical protocols for managing patients with LDCT-detected pulmonary nodules still vary widely, and the diagnostic evaluation of suspicious abnormalities ranges from watchful waiting and monitoring to needle biopsy and pulmonary resection.
Our overall objective is to establish a nodule assessment model based on integrative analysis of 3D quantitative image features, clinical information, and molecular profiles. Our specific aims are as follows:

Aim 1: (a) To extract quantitative image features from the LDCT images obtained in NLST, focusing on four main groups of image features (intensity, size and shape, texture, and wavelet) and including approximately 100 features that have been shown to have high stability.
(b) To establish an integrated nodule assessment model based on quantitative image features and individuals' clinical information, such as lung function, using NLST data. Model performance will be assessed in terms of calibration and discrimination. Calibration will be assessed by how much the slope of the calibration line (plotting predicted against observed probabilities) deviates from the ideal value of 1. Discrimination will be assessed using the AUC or its equivalent, the concordance (C) statistic.
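One common way to obtain the calibration slope is logistic recalibration: regress the observed outcome on the logit of the predicted probability, so a slope of 1 is ideal and a slope below 1 signals predictions that are too extreme. The sketch below assumes that approach (the aim does not specify the exact estimator) and computes the C-statistic as the proportion of case-control pairs ranked correctly. The cohort is simulated; the hypothetical "overfit" model has the same ranking as the well-calibrated one but doubled logits, so its discrimination is unchanged while its slope should come out near 0.5.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def calibration_intercept_slope(probs, labels, iters=25):
    """Logistic recalibration: fit logit P(y=1) = a + b * logit(p) by Newton-Raphson.

    A slope b of 1 (with intercept a near 0) indicates good calibration;
    b < 1 means predictions are too extreme, the usual sign of overfitting.
    """
    xs = [math.log(p / (1 - p)) for p in probs]
    a = b = 0.0
    for _ in range(iters):
        ga = gb = haa = hab = hbb = 0.0
        for x, y in zip(xs, labels):
            mu = sigmoid(a + b * x)
            w = mu * (1 - mu)
            ga += y - mu          # score for the intercept
            gb += (y - mu) * x    # score for the slope
            haa += w
            hab += w * x
            hbb += w * x * x
        det = haa * hbb - hab * hab
        a += (hbb * ga - hab * gb) / det   # Newton step: inverse 2x2 information times score
        b += (haa * gb - hab * ga) / det
    return a, b


def c_statistic(scores, labels):
    """Share of case-control pairs ranked correctly (ties count 1/2); equals the AUC."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    hits = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return hits / (len(pos) * len(neg))


# Simulated cohort (not NLST): outcomes drawn from the true risk.
rng = random.Random(53)
true_eta = [rng.gauss(-1.5, 1.2) for _ in range(1500)]
labels = [1 if rng.random() < sigmoid(e) else 0 for e in true_eta]
well_calibrated = [sigmoid(e) for e in true_eta]
overfit = [sigmoid(2 * e) for e in true_eta]  # same ranking, logits twice as extreme

summary = {}
for name, probs in [("well calibrated", well_calibrated), ("overfit", overfit)]:
    a, b = calibration_intercept_slope(probs, labels)
    summary[name] = (a, b, c_statistic(probs, labels))
    print(f"{name}: slope={b:.2f} intercept={a:.2f} C={summary[name][2]:.3f}")
```

The example shows why both criteria are needed: the two models have identical C-statistics but very different calibration slopes.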

Aim 2: To conduct cross-study validation using five additional independent screening programs, with a total of 19,335 patients with pulmonary LDCT scans, in Canada (PanCan, BCCA and IELCAP-Toronto), the UK (UKLS) and the Netherlands (NELSON).

Aim 3: To assess the added value of genomic information in nodule assessment at no additional request or cost (a proposal to genotype NLST samples was previously approved; PIs: Amos and Hung). The added value will be assessed by risk stratification table analysis, in which clinically meaningful risk probability cut-points are assigned and classification accuracy, calibration, and stratification capacity are assessed.
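A risk stratification (reclassification) table cross-tabulates each person's risk category under the baseline model and under the model with genomic data added, separately for cases and controls. The sketch below assumes hypothetical cut-points of 1% and 5% and summarizes the table with the categorical net reclassification improvement (NRI), one common summary that the aim does not name explicitly. The eight risk pairs and the function names are invented for illustration.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical risk cut-points (1% and 5%), for illustration only.
CUTS = [0.01, 0.05]
LABELS = ["<1%", "1-5%", ">=5%"]


def category(p, cuts=CUTS):
    """Index of the risk category containing p (a value equal to a cut-point moves up)."""
    return bisect_right(cuts, p)


def reclassification_tables(old, new, outcomes, cuts=CUTS):
    """Cross-tabulate (old category, new category) separately for cases (1) and controls (0)."""
    tables = {0: Counter(), 1: Counter()}
    for p_old, p_new, y in zip(old, new, outcomes):
        tables[y][(category(p_old, cuts), category(p_new, cuts))] += 1
    return tables


def categorical_nri(tables):
    """Net proportion of cases moving up minus net proportion of controls moving up."""
    def net_up(table):
        up = sum(c for (o, n), c in table.items() if n > o)
        down = sum(c for (o, n), c in table.items() if n < o)
        return (up - down) / sum(table.values())
    return net_up(tables[1]) - net_up(tables[0])


# Toy baseline vs. baseline-plus-genomics risks for 4 cases and 4 controls.
old = [0.03, 0.004, 0.06, 0.06, 0.03, 0.07, 0.006, 0.009]
new = [0.08, 0.02, 0.07, 0.04, 0.004, 0.02, 0.007, 0.02]
outcomes = [1, 1, 1, 1, 0, 0, 0, 0]
tables = reclassification_tables(old, new, outcomes)
for y, name in [(1, "cases"), (0, "controls")]:
    print(name, "(rows: old category, columns: new category)")
    for i, row_label in enumerate(LABELS):
        print(f"  {row_label:>5}", [tables[y][(i, j)] for j in range(len(LABELS))])
print("NRI =", categorical_nri(tables))
```

Here the NRI is 0.5: among the four cases, two move up and one moves down, and among the four controls, two move down and one moves up.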

Aim 4: To compare the model's effectiveness with existing guidelines, such as the Lung-RADS classification system and the guidelines from the American College of Chest Physicians, in terms of sensitivity, false-positive rate, positive predictive value, and negative predictive value.
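All four comparison metrics come from the 2x2 table of screening call versus final diagnosis. The sketch below assumes each approach is reduced to a binary positive/negative call; the 5% model threshold, the "category >= 3" rule standing in for a guideline category, the helper names, and the ten nodules are all hypothetical.

```python
def ratio(num, den):
    """Safe proportion; undefined (nan) when the denominator is empty."""
    return num / den if den else float("nan")


def screening_metrics(flagged, outcomes):
    """Sensitivity, false-positive rate, PPV and NPV for a binary 'positive screen' call."""
    tp = sum(1 for f, y in zip(flagged, outcomes) if f and y == 1)
    fp = sum(1 for f, y in zip(flagged, outcomes) if f and y == 0)
    fn = sum(1 for f, y in zip(flagged, outcomes) if not f and y == 1)
    tn = sum(1 for f, y in zip(flagged, outcomes) if not f and y == 0)
    return {
        "sensitivity": ratio(tp, tp + fn),
        "false_positive_rate": ratio(fp, fp + tn),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Toy data for 10 nodules (illustrative only).
outcomes = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
model_risk = [0.30, 0.12, 0.02, 0.08, 0.01, 0.03, 0.15, 0.02, 0.04, 0.09]
rule_category = [4, 2, 1, 3, 2, 1, 3, 2, 3, 4]  # hypothetical guideline-style categories

by_model = screening_metrics([r >= 0.05 for r in model_risk], outcomes)
by_rule = screening_metrics([c >= 3 for c in rule_category], outcomes)
for name, m in [("model (risk >= 5%)", by_model), ("rule (category >= 3)", by_rule)]:
    print(name, {k: round(v, 3) for k, v in m.items()})
```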

Collaborators

Harry de Koning, PhD, Erasmus MC
John Field, PhD, University of Liverpool
Stephen Lam, PhD, BC Cancer Agency
Christopher Amos, PhD, Geisel School of Medicine at Dartmouth College

Related Publications

Respiratory Function as a Prognostic Factor for Lung Cancer in Screening and General Populations.
Murison KR, Warkentin MT, Khodayari Moez E, Brhane Y, Liu G, Hung RJ
Ann Am Thorac Soc. 2025 Apr;22(4):591-597.
Assessing Lung Cancer Absolute Risk Trajectory Based on a Polygenic Risk Model.
Hung RJ, Warkentin MT, Brhane Y, Chatterjee N, Christiani DC, Landi MT, Caporaso NE, Liu G, Johansson M, Albanes D, Marchand LL, Tardon A, Rennert G, Bojesen SE, Chen C, Field JK, Kiemeney LA, Lazarus P, Zienolddiny S, Lam S, Andrew AS, Arnold SM, Aldrich MC, Bickeböller H, Risch A, Schabath MB, McKay JD, Brennan P, Amos CI
Cancer Res. 2021 Jan 20.