
Treatment-Conditioned World Models for Lung Cancer Progression Prediction Using Longitudinal Low-Dose CT Imaging

Principal Investigator

Name
Luca Pegolotti

Degrees
Ph.D.

Institution
Nucleo Research, Inc.

Position Title
Senior researcher / CTO

Email
luca@nucleoresearch.com

About this CDAS Project

Study
NLST

Project ID
NLST-1502

Initial CDAS Request Approval
Apr 20, 2026

Title
Treatment-Conditioned World Models for Lung Cancer Progression Prediction Using Longitudinal Low-Dose CT Imaging

Summary
We propose to build and evaluate a treatment-conditioned world model that predicts lung cancer progression from longitudinal low-dose CT scans. A world model is a learned simulator: given a patient's current CT scan and a treatment plan, it predicts how the patient's anatomy will evolve over time. Our approach works in a learned latent space, predicting how abstract representations of patient anatomy change under different therapies rather than generating synthetic images. This sidesteps the hallucination problem of pixel-level generative models, which is particularly dangerous in medical imaging.

The system has three components. A frozen pretrained 3D CT encoder (Stanford's Merlin) compresses each volume into a dense 2048-dimensional embedding. A Recurrent State Space Model (RSSM), adapted from DreamerV3, learns to predict how these embeddings evolve between timepoints, conditioned on structured treatment actions encoding therapy type, procedure, and timing. Prediction heads then decode the model's latent state into treatment response categories, survival estimates, and tumor burden changes.
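The three-component pipeline can be sketched as follows. This is an illustrative PyTorch skeleton, not the actual implementation: the class, head, and dimension names (other than the 2048-dimensional Merlin embedding stated above) are placeholders, and the Merlin encoder is treated as a frozen black box whose output is already available as a tensor.

```python
# Hypothetical sketch: frozen Merlin embedding -> RSSM step -> prediction heads.
import torch
import torch.nn as nn

EMB_DIM, ACT_DIM, HID_DIM = 2048, 16, 512  # ACT_DIM/HID_DIM are illustrative

class TreatmentConditionedRSSM(nn.Module):
    """Minimal RSSM-style dynamics model over frozen CT embeddings."""
    def __init__(self):
        super().__init__()
        self.gru = nn.GRUCell(EMB_DIM + ACT_DIM, HID_DIM)  # deterministic path
        self.post = nn.Linear(HID_DIM, 2 * HID_DIM)        # stochastic latent (mean, logvar)
        self.decode = nn.Linear(2 * HID_DIM, EMB_DIM)      # predicted next embedding
        self.survival_head = nn.Linear(2 * HID_DIM, 1)     # e.g. a log-hazard score
        self.response_head = nn.Linear(2 * HID_DIM, 4)     # response-category logits

    def step(self, emb, action, h):
        h = self.gru(torch.cat([emb, action], dim=-1), h)
        mean, logvar = self.post(h).chunk(2, dim=-1)
        z = mean + torch.randn_like(mean) * (0.5 * logvar).exp()  # reparameterized sample
        state = torch.cat([h, z], dim=-1)
        return self.decode(state), self.survival_head(state), self.response_head(state), h

model = TreatmentConditionedRSSM()
emb = torch.randn(8, EMB_DIM)   # stands in for frozen Merlin embeddings (batch of 8)
act = torch.zeros(8, ACT_DIM)   # null treatment action
h = torch.zeros(8, HID_DIM)
next_emb, hazard, response_logits, h = model.step(emb, act, h)
```

Conditioning on the treatment action at every step is what lets the same recurrent state be rolled forward under different therapy plans.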

This opens up two capabilities that do not currently exist in clinical practice: simulating disease trajectories under a planned treatment, and comparing alternative treatments from a single baseline scan to predict which therapy is most likely to benefit a specific patient given their anatomy.

We have built a proof-of-concept using the publicly available Anti-PD-1_Lung dataset (TCIA), processing 86 CT series from 46 patients through the full pipeline: DICOM ingestion, NIfTI conversion, Merlin embedding extraction, and RSSM training. The model learns treatment-conditioned dynamics and produces different predictions for treatment versus no-treatment scenarios. Same-patient longitudinal similarity is significantly higher than cross-patient similarity in the embedding space (cosine similarity 0.94 vs. 0.89, p < 1e-15), confirming that the encoder preserves patient-specific signal useful for longitudinal modeling.
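The embedding-space sanity check above can be reproduced in miniature. The snippet below uses synthetic vectors in place of Merlin embeddings (46 simulated patients with two scans each, and an assumed patient-specific component) purely to illustrate the same-patient versus cross-patient cosine-similarity comparison; the magnitudes differ from the real data.

```python
# Illustrative version of the sanity test: same-patient longitudinal pairs
# should be more similar in embedding space than cross-patient pairs.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Simulate 46 patients; each patient's two scans share a patient-specific
# component plus independent scan-level noise.
patients = {pid: None for pid in range(46)}
for pid in patients:
    base = rng.normal(size=2048)
    patients[pid] = [base + 0.4 * rng.normal(size=2048) for _ in range(2)]

same = [cosine(scans[0], scans[1]) for scans in patients.values()]
cross = [cosine(patients[i][0], patients[j][0])
         for i, j in combinations(range(46), 2)]

print(f"same-patient mean cosine:  {np.mean(same):.3f}")
print(f"cross-patient mean cosine: {np.mean(cross):.3f}")
```

In the real analysis the gap (0.94 vs. 0.89) is smaller but highly significant given the number of pairs.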

With only 38 training pairs, however, the model cannot outperform a no-change baseline, and the compressed embedding space limits the signal-to-noise ratio for detecting treatment effects. These are data-scale problems. NLST offers two complementary sources of training signal. Approximately 1,900 treated lung cancer patients (across both screening arms) with roughly 4,600 treatment procedure records provide the treatment-conditioned supervision central to our model, an 80-fold increase over our proof-of-concept. Additionally, over 70,000 longitudinal CT examinations from non-cancer participants provide a large corpus for learning baseline anatomical dynamics under no treatment, establishing the background behavior from which the model must separate treatment effects. The treatment diversity (surgery, radiation, chemotherapy), 12-year survival follow-up, and detailed nodule measurements in NLST make it the right dataset for training a dynamics model that generalizes across treatment modalities and for testing whether richer representations improve longitudinal sensitivity.

Our goal is a decision support tool that lets oncologists simulate personalized treatment outcomes before committing to a therapy. Results will be published in peer-reviewed journals such as Nature Medicine, Radiology: AI, or Medical Image Analysis.

Aims

* Aim 1: Train a treatment-conditioned latent dynamics model on NLST longitudinal CT data and evaluate whether large-scale training overcomes the data bottleneck identified in preliminary experiments.

We will use a two-stage training strategy that takes advantage of NLST's structure. First, we extract Merlin embeddings for all CT-arm examinations (approximately 75,000 CTs across three screening rounds for ~26,000 participants) and pretrain the RSSM on longitudinal transitions under a null treatment action. Using the large non-cancer cohort, this stage teaches the model baseline anatomical dynamics: how lungs change with aging and continued smoking. Second, we fine-tune the model on the approximately 1,900 treated lung cancer patients (~4,600 treatment procedures spanning surgery, radiation, chemotherapy, and combinations), conditioning on treatment actions from the NLST Treatment dataset. The model first learns what normal progression looks like, then learns how treatment deflects that trajectory. Our proof-of-concept on 38 pairs could not beat a no-change baseline; this aim tests whether NLST's 80-fold increase in treated patients changes that. We will also test whether intermediate encoder features improve sensitivity to treatment-induced changes. Success criterion: predicted post-treatment embeddings achieve lower mean squared error than the no-change baseline on held-out patients.
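The success criterion for this aim reduces to a simple comparison. The sketch below shows the evaluation logic on synthetic embedding pairs (the drift magnitudes and the "model" that recovers half the true drift are hypothetical stand-ins, used only to make the criterion concrete).

```python
# Aim 1 success criterion: the dynamics model must beat a no-change
# baseline (predict next embedding = current embedding) in MSE on
# held-out patients. Synthetic data for illustration only.
import numpy as np

rng = np.random.default_rng(1)
n_pairs, dim = 200, 2048

emb_t = rng.normal(size=(n_pairs, dim))
true_drift = 0.1 * rng.normal(size=(n_pairs, dim))  # real anatomical change
emb_t1 = emb_t + true_drift                          # observed next-timepoint embedding

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

baseline_mse = mse(emb_t, emb_t1)        # no-change baseline
model_pred = emb_t + 0.5 * true_drift    # hypothetical model recovering part of the drift
model_mse = mse(model_pred, emb_t1)

print(f"no-change MSE: {baseline_mse:.4f}")
print(f"model MSE:     {model_mse:.4f}")
```

A model that captures any consistent component of the true transition will fall below the baseline; our 38-pair proof-of-concept could not.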

* Aim 2: Validate the model's predicted future states against observed clinical outcomes.

For the approximately 2,050 NLST participants diagnosed with lung cancer (of whom roughly 1,900 received treatment), we will test whether latent states predicted by the dynamics model carry clinically meaningful information. We will train prediction heads on the model's latent features to predict three outcomes: treatment response categories derived from longitudinal nodule size changes in the Spiral CT Abnormalities and Comparison Read datasets, using standard RECIST thresholds; overall survival from the Cause of Death dataset with 12-year follow-up, modeled with Cox proportional hazards; and tumor burden trajectory from aggregate nodule measurements. The question is whether the dynamics model captures prognostic features that a static, single-timepoint model would miss. Success criterion: concordance index exceeding 0.65 for survival and above-chance classification for response categories.
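The survival criterion uses Harrell's concordance index, which handles the right-censoring present in the Cause of Death follow-up. A minimal self-contained implementation (pairwise form; in practice risk scores would come from the Cox head on the model's latent features, and the toy numbers below are invented):

```python
# Harrell's C-index for right-censored survival data: the fraction of
# comparable pairs in which the higher-risk subject has the earlier event.
import numpy as np

def concordance_index(time, event, risk):
    n_conc, n_comp = 0.0, 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            # Pair (i, j) is comparable only if i's event is observed
            # and occurs before j's (possibly censored) time.
            if event[i] and time[i] < time[j]:
                n_comp += 1
                if risk[i] > risk[j]:
                    n_conc += 1
                elif risk[i] == risk[j]:
                    n_conc += 0.5
    return n_conc / n_comp

time  = np.array([2.0, 5.0, 7.0, 9.0, 12.0])   # years of follow-up
event = np.array([1, 1, 0, 1, 0])              # 0 = censored
risk  = np.array([3.1, 2.5, 1.0, 0.3, 0.4])    # model risk scores
print(f"C-index: {concordance_index(time, event, risk):.3f}")  # -> 0.875
```

A C-index of 0.5 is chance; the 0.65 threshold above is a deliberately modest bar for a model given only imaging and treatment history.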

* Aim 3: Validate counterfactual treatment comparison using propensity-score matched cohorts.

The central question in treatment planning is which therapy will work best for the specific patient, not the average patient. Current tools answer this with population-level statistics; the world model addresses it through counterfactual reasoning, predicting outcomes under treatments the patient did not actually receive. To validate this, we will construct propensity-score matched cohorts of patients who received different treatments but had similar baseline characteristics (age, sex, smoking history, tumor stage, nodule size). For each patient, we predict the outcome under the treatment they did not receive and compare against the actual outcome of their matched counterpart who did. This tests whether the model captures real treatment effects or just artifacts of the training process. Success criterion: predicted treatment effect direction agrees with observed subgroup differences in at least 70% of matched comparisons.
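The matching step can be sketched as follows. Covariate names, cohort sizes, the gradient-descent propensity model, and the 0.05 caliper are all illustrative choices, not the study protocol; the point is the structure, namely fit P(arm | baseline covariates), then 1:1 match across arms on the estimated score.

```python
# Sketch of propensity-score matching: logistic propensity model fit by
# gradient descent (no external dependencies), then greedy 1:1
# nearest-neighbor matching on the score with a caliper.
import numpy as np

rng = np.random.default_rng(2)
n = 200
# Standardized stand-ins for age, sex, pack-years, stage, nodule size.
X = rng.normal(size=(n, 5))
# Simulated treatment assignment correlated with the first covariate.
treat = (X[:, 0] + rng.normal(scale=1.5, size=n) > 0).astype(float)

w, b = np.zeros(5), 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.5 * (X.T @ (p - treat) / n)
    b -= 0.5 * float(np.mean(p - treat))

score = 1.0 / (1.0 + np.exp(-(X @ w + b)))
treated = np.where(treat == 1)[0]
control = np.where(treat == 0)[0]

pairs, used = [], set()
for i in treated:
    j = control[np.argmin(np.abs(score[control] - score[i]))]
    if j not in used and abs(score[j] - score[i]) < 0.05:  # caliper
        pairs.append((int(i), int(j)))
        used.add(j)

print(f"matched pairs: {len(pairs)}")
```

Each matched pair then supplies one counterfactual test: the model's predicted outcome for the treatment a patient did not receive is compared against the observed outcome of the matched counterpart who did.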

Collaborators

Luca Pegolotti, Nucleo Research, Inc.
Angelica Iacovelli, Nucleo Research, Inc.
Diego Palumbo, San Raffaele Hospital