Genetic and AI Multimodal Risk Model for Lung Cancer Screening Efficiency

Principal Investigator

Name
CHRISTOPHER AMOS

Degrees
Ph.D.

Institution
The Regents of the University of New Mexico for its Health Science Center, specifically for its Comprehensive Cancer Center

Position Title
Professor, University of New Mexico

Email
ciamos@salud.unm.edu

About this CDAS Project

Study
NLST

Project ID
NLST-1466

Initial CDAS Request Approval
Jul 11, 2025

Title
Genetic and AI Multimodal Risk Model for Lung Cancer Screening Efficiency

Summary
The purpose of this study is to validate a comprehensive multimodal model that combines genetic risk and artificial intelligence (AI) to make lung cancer screening more efficient, and to analyze low-dose computed tomography (LDCT) images from the National Lung Screening Trial (NLST) to improve lung cancer diagnosis.
• Hypothesis 1: An aggregated genetic risk score can significantly improve risk stratification among individuals already at elevated risk of lung cancer because of their smoking behavior and age.
• Hypothesis 2: Integrating genetic marker data with smoking and demographic features will improve the accuracy of lung cancer risk assessment, measured as the area under the ROC curve (AUC), by at least 5%.
• Hypothesis 3: Analyzing LDCT images in conjunction with genetic risk models will further enhance the early detection and diagnosis of lung cancer, leading to better screening outcomes.
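Hypothesis 1 rests on an aggregated genetic risk score. A common way to build one, assumed here rather than taken from this project, is a weighted sum of risk-allele dosages in which each variant is weighted by its per-allele log odds ratio. The sketch below is illustrative only: the function `genetic_risk_score`, the variant names `snp_A`, `snp_B`, `snp_C`, and their weights are hypothetical placeholders, not ILCCO estimates.

```python
import math
from typing import Dict, Optional


def genetic_risk_score(dosages: Dict[str, Optional[float]],
                       log_odds_ratios: Dict[str, float]) -> float:
    """Weighted sum of risk-allele dosages (0 to 2) times per-allele log odds ratios.

    Variants absent from `dosages` or set to None are skipped here;
    a real pipeline would more likely impute them.
    """
    score = 0.0
    for variant, beta in log_odds_ratios.items():
        dosage = dosages.get(variant)
        if dosage is None:
            continue  # missing genotype: skipped in this sketch
        if not 0.0 <= dosage <= 2.0:
            raise ValueError(f"dosage for {variant} must be between 0 and 2")
        score += dosage * beta
    return score


# Hypothetical placeholder weights (log of made-up per-allele odds ratios), not ILCCO results.
weights = {"snp_A": math.log(1.30), "snp_B": math.log(1.15), "snp_C": math.log(0.90)}
person = {"snp_A": 2.0, "snp_B": 1.0, "snp_C": None}
print(genetic_risk_score(person, weights))
```

Using log odds ratios as weights makes the score additive on the log-odds scale, so it can enter a logistic regression directly as a single covariate.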
This is an observational study using de-identified data from the National Lung Screening Trial (NLST). The primary focus is on validating a genetic risk model for lung cancer screening efficiency and on optimizing an AI algorithm that diagnoses lung cancer from LDCT scans. The study will analyze data from 1,416 subjects: 400 lung cancer patients and 1,016 matched controls. This sample size was chosen to provide adequate power to detect improvements in risk prediction accuracy.
The data analysis plan involves several key steps to validate the genetic risk model and optimize the AI algorithm (Pulmo-Pilot) for lung cancer diagnosis:
1. Data Preprocessing:
Clean and preprocess the de-identified data from NLST, including genetic information, medical history, demographic data, and LDCT scans.
2. Genetic Risk Model Validation:
Use statistical methods such as logistic regression and Cox proportional hazards models to validate the predictive performance of genetic variants identified by ILCCO.
Integrate genetic markers with smoking behavior and demographic features to improve lung cancer risk prediction models.
3. AI Algorithm Optimization and Validation:
Develop and optimize an AI algorithm that analyzes LDCT scans to diagnose lung diseases through detailed semantic features.
Validate the accuracy of the AI tool's diagnostic capabilities against established task-specific models, and fine-tune the tool for robust feature estimation.
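Step 2 can be illustrated with a dependency-free sketch, assuming a logistic regression model whose discrimination is summarized by the AUC. The helpers `fit_logistic`, `predict`, and `auc` are hypothetical; plain gradient descent stands in for a proper maximum-likelihood solver, and the Cox proportional hazards analysis is not shown.

```python
import math
from typing import List, Sequence


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: List[List[float]], y: List[int],
                 lr: float = 0.5, n_iter: int = 2000) -> List[float]:
    """Unpenalized logistic regression fitted by batch gradient descent.

    Returns [intercept, coef_1, ..., coef_p].
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = _sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        # Step against the mean gradient of the log loss.
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def predict(w: Sequence[float], X: List[List[float]]) -> List[float]:
    """Predicted probability of lung cancer for each row of X."""
    return [_sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) for xi in X]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random case outscores a random control (ties count 1/2)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))
```

Fitting once on smoking and demographic covariates and once with a genetic risk score column added, then comparing `auc` of each model's predictions on held-out subjects, mirrors the comparison described in Hypothesis 2. A production analysis would more likely rely on an established statistics package.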
Using multiple genetic markers selected from the pathways most strongly associated with lung cancer, without the added information from smoking behavior, the AUC is currently estimated to improve from 76% to 81%. Applied to the 950 screen-detected cases (from the NLST, NELSON, Canadian, and UK screening studies) matched to 1,900 controls, this improvement provides 85% power. The calculation assumes a 5% significance level, treats the improvement in accuracy as a comparison of binomial proportions, and assumes that all cases and controls are independent. The AUC improvement from adding a smoking behavior index has not yet been estimated, but adding this information is expected to improve the AUC further.
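The power statement treats accuracy as a binomial proportion. A minimal normal-approximation sketch of that kind of calculation for two independent proportions is shown below. `two_proportion_power` is a hypothetical helper, and the summary does not say which test or sample-size definition produced the 85% figure, so this sketch is not expected to reproduce that number exactly.

```python
from statistics import NormalDist


def two_proportion_power(p1: float, p2: float, n1: int, n2: int,
                         alpha: float = 0.05, two_sided: bool = True) -> float:
    """Approximate power of a z-test comparing two independent binomial proportions.

    Uses the pooled standard error under H0 and the unpooled one under H1.
    """
    z = NormalDist()
    p_bar = (p1 * n1 + p2 * n2) / (n1 + n2)
    se0 = (p_bar * (1 - p_bar) * (1 / n1 + 1 / n2)) ** 0.5
    se1 = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5
    diff = abs(p2 - p1)
    if two_sided:
        crit = z.inv_cdf(1 - alpha / 2)
        # Probability of rejecting in either tail when the true difference is `diff`.
        return z.cdf((diff - crit * se0) / se1) + z.cdf((-diff - crit * se0) / se1)
    crit = z.inv_cdf(1 - alpha)
    return z.cdf((diff - crit * se0) / se1)


print(two_proportion_power(0.76, 0.81, 950, 950))
```

The call above plugs in the stated AUCs with an illustrative 950 per group. Because both models are scored on the same subjects, a paired comparison of correlated AUCs (such as DeLong's test) is the more usual choice in practice.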
The study is expected to be completed by April 27, 2030. This includes the time required for data analysis, validation of the genetic risk model, optimization of the AI algorithm, and preparation of the final report.

Aims

• Aim 1: Validate the predictive performance of genetic variants identified by the International Lung Cancer Consortium (ILCCO) in the NLST.
• Aim 2: Enhance lung cancer risk prediction models by integrating genetic markers with medical history and demographic information.
• Aim 3: Analyze LDCT images from the NLST using AI tools to improve the accuracy of lung cancer diagnosis and screening recommendations.
• Aim 4: Demonstrate the added value of a genetic risk model among high-risk populations that fit current screening criteria.
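Aim 4 restricts attention to people who meet current screening criteria. The summary does not name those criteria; the sketch below assumes the 2021 USPSTF recommendation (age 50 to 80, at least 20 pack-years, currently smoking or quit within the past 15 years) as defaults, and NLST-style thresholds (age 55 to 74, at least 30 pack-years) can be passed instead. The `Participant` fields, the genetic risk score cutoff, and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Participant:
    age: float
    cigs_per_day: float
    years_smoked: float
    years_since_quit: Optional[float]  # None means currently smoking
    grs: float  # aggregated genetic risk score (hypothetical field)


def pack_years(p: Participant) -> float:
    # One pack-year = 20 cigarettes per day for one year.
    return p.cigs_per_day / 20.0 * p.years_smoked


def meets_criteria(p: Participant, min_age: float = 50, max_age: float = 80,
                   min_pack_years: float = 20.0, max_years_quit: float = 15.0) -> bool:
    # Defaults follow the 2021 USPSTF criteria; pass 55, 74, 30.0 for NLST-style eligibility.
    quit_ok = p.years_since_quit is None or p.years_since_quit <= max_years_quit
    return min_age <= p.age <= max_age and pack_years(p) >= min_pack_years and quit_ok


def split_by_grs(people: List[Participant],
                 cutoff: float) -> Tuple[List[Participant], List[Participant]]:
    """Among screening-eligible participants, separate high and low genetic risk groups."""
    eligible = [p for p in people if meets_criteria(p)]
    high = [p for p in eligible if p.grs >= cutoff]
    low = [p for p in eligible if p.grs < cutoff]
    return high, low
```

Comparing observed lung cancer rates in the `high` and `low` groups is one simple way to examine whether the genetic score adds information beyond the eligibility criteria themselves.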

Collaborators

Avinash Sahu - University of New Mexico
Kushal Virupakshappa - University of New Mexico
Rayjean Hung - University of Toronto
Jinyoung Byun - University of New Mexico
Yafang Li - University of New Mexico
Younghun Han - University of New Mexico
Matthew Schabath - Moffitt Comprehensive Cancer Center