Principal Investigator
Name
Hyunsuk Yoo
Degrees
MD
Institution
Evom AI
Position Title
Chief Medical Officer
About this CDAS Project
Study
PLCO
Project ID
PLCOI-1892
Initial CDAS Request Approval
Apr 23, 2025
Title
Development of a Multimodal AI Model for Predicting Future Lung Cancer Risk from Chest X-rays in Asymptomatic Individuals
Summary
Background
Lung cancer remains one of the leading causes of cancer-related mortality worldwide. Although screening efforts have largely focused on individuals with heavy smoking histories, recent epidemiological trends show a significant rise in lung cancer incidence among never-smokers and light smokers. These cases often fall outside current screening guidelines, leading to delayed diagnoses and poorer prognoses. This shift underscores the need for new approaches that enable earlier and more individualized assessment of lung cancer risk in broader populations.
Traditional risk models based on demographic and clinical variables have demonstrated some utility in risk stratification. However, their predictive performance, particularly among low-risk individuals, remains suboptimal. Advances in artificial intelligence (AI) and machine learning, especially the emergence of multimodal foundation models, offer a promising solution to this challenge.
Recent studies have shown that incorporating image-based information can significantly improve performance on disease risk prediction tasks. One notable example is the Mirai model, which combines mammography images with clinical variables to predict future breast cancer risk. Inspired by these developments, we hypothesize that chest X-rays (CXRs) contain valuable visual signals that, when combined with clinical data, can improve long-term lung cancer risk prediction in asymptomatic individuals.

Proposal
We propose to develop, train, and validate an image-based risk prediction model that predicts the risk of lung cancer occurring 1 to 6 years after a baseline CXR exam. This model will integrate patient-level clinical data and chest radiographs to estimate personalized, calibrated risk scores.
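As an illustration of what "calibrated" means in practice, the following minimal sketch computes two common calibration summaries, the Brier score and a binned expected calibration error. The function names, the equal-width binning, and the choice of 10 bins are assumptions made for this example; the proposal does not state which calibration metrics will be used.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between predicted risk and observed outcome (0 or 1)."""
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("probs and outcomes must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def expected_calibration_error(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> float:
    """Weighted average gap between mean predicted risk and observed event rate per bin.

    Uses equal-width bins on [0, 1]; the bin count is an illustrative choice.
    """
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("probs and outcomes must be non-empty and of equal length")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_pred = sum(p for p, _ in members) / len(members)
        event_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(mean_pred - event_rate)
    return ece
```

A well-calibrated model yields a small expected calibration error: among individuals assigned a risk near 5%, roughly 5% should go on to be diagnosed within the prediction window.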

Methods
Model Development
The proposed AI model will take as input a baseline CXR and the available clinical variables described under Input Data. The model consists of two modules: an image encoder and a risk prediction module. The risk prediction module uses patient data and image features to produce a multitask output: lung cancer risk, age, smoking history, and PLCOm2012 score. The lung cancer risk will be a continuous, calibrated probability score indicating the likelihood of a lung cancer diagnosis within 1 to 6 years post-examination.
In the first stage of training, we will use age, smoking history, and PLCOm2012 score as proxy labels to make full use of all available data. In the later stages, we will incorporate actual clinical outcomes for fine-tuning and validation.
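The staged training described above can be sketched as a loss function that changes with the training stage. This is a minimal sketch under stated assumptions, not the project's implementation: the task names, the use of squared error for the proxy targets, treating smoking history as a single numeric value, equal default weights, and keeping the proxy terms active during fine-tuning are all hypothetical choices made for illustration.

```python
import math
from typing import Dict, Optional

# Proxy targets named in the proposal; the dictionary keys are hypothetical.
PROXY_TASKS = ("age", "smoking_history", "plcom2012")


def squared_error(pred: float, target: float) -> float:
    return (pred - target) ** 2


def binary_cross_entropy(prob: float, label: float, eps: float = 1e-7) -> float:
    """Cross-entropy for one example; prob is clipped to avoid log(0)."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1.0 - label) * math.log(1.0 - prob))


def multitask_loss(
    pred: Dict[str, float],
    target: Dict[str, Optional[float]],
    stage: int,
    proxy_weight: float = 1.0,
    outcome_weight: float = 1.0,
) -> float:
    """Stage 1: proxy labels only. Stage 2 and later: proxy labels plus outcome.

    Targets set to None are skipped, so examples without an outcome label
    can still contribute through the proxy tasks.
    """
    loss = 0.0
    for task in PROXY_TASKS:
        if target.get(task) is not None:
            loss += proxy_weight * squared_error(pred[task], target[task])
    if stage >= 2 and target.get("lung_cancer") is not None:
        loss += outcome_weight * binary_cross_entropy(
            pred["lung_cancer"], target["lung_cancer"]
        )
    return loss
```

In this sketch a stage-1 call ignores the lung cancer label even when it is present, which mirrors the idea of pre-training on proxy labels before outcome-based fine-tuning.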

Input Data
The proposed model will accept two categories of input data:
- Clinical Variables: age, sex, BMI, smoking history (e.g., pack-years), family history of lung cancer, and history of lung-related diseases such as tuberculosis (TB), chronic obstructive pulmonary disease (COPD), and interstitial lung disease (ILD)
- Image Data: CXRs
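
The clinical inputs listed above could be represented as a simple record that is flattened into a numeric feature vector before being passed to the risk prediction module. The class name, field names, binary encoding of sex, and feature order below are hypothetical and chosen only for illustration; the pack-years helper uses the standard definition (packs smoked per day multiplied by years of smoking).

```python
from dataclasses import dataclass
from typing import List


def pack_years(packs_per_day: float, years_smoked: float) -> float:
    """Standard smoking exposure measure: packs per day times years smoked."""
    if packs_per_day < 0 or years_smoked < 0:
        raise ValueError("packs_per_day and years_smoked must be non-negative")
    return packs_per_day * years_smoked


@dataclass
class ClinicalRecord:
    """Hypothetical container for one individual's baseline clinical variables."""

    age_years: float
    is_female: bool  # illustrative encoding of sex
    bmi: float
    pack_years: float
    family_history_lung_cancer: bool
    history_tb: bool
    history_copd: bool
    history_ild: bool

    def to_features(self) -> List[float]:
        """Flatten the record into a fixed-order numeric vector."""
        return [
            self.age_years,
            float(self.is_female),
            self.bmi,
            self.pack_years,
            float(self.family_history_lung_cancer),
            float(self.history_tb),
            float(self.history_copd),
            float(self.history_ild),
        ]
```

For example, a 62-year-old woman who smoked 1.5 packs per day for 20 years has a smoking exposure of 30 pack-years, which would occupy the fourth position of the feature vector in this sketch.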

Data Sources
Model development will be supported by both public and private data sources. The public datasets are NLST and PLCO, which provide longitudinal data with linked clinical variables and imaging.
Additionally, we hope to collaborate with tertiary hospitals in South Korea to collect retrospective chest CT exams, baseline clinical characteristics, and follow-up information including biopsy results, surgical pathology, and survival data.

Added Clinical Value
The image-based risk prediction model will support early identification of high-risk individuals beyond traditional screening populations, ultimately improving patient outcomes and clinical decision-making.
Aims

We aim to develop and validate a multimodal artificial intelligence (AI) model that predicts future lung cancer risk in asymptomatic individuals, using baseline CXR images and several types of clinical data. Leveraging both public datasets (e.g., NLST, PLCO) and retrospective cohorts from South Korean tertiary hospitals (e.g., SNUH), the proposed model will output a calibrated, continuous probability score estimating the likelihood of lung cancer development within 1 to 6 years post-baseline. We hope to publish our results in journals such as Radiology, Investigative Radiology, NEJM AI, and Nature Medicine, reflecting the work's potential to contribute meaningfully to the evolving landscape of AI-driven precision medicine.

Collaborators

Hyeonseob Nam, Evom AI
Hyunjae Lee, Evom AI