AI-Driven Non-Invasive Biomarker Prediction for Lung Cancer Using Self-Supervised Learning on Large-Scale CT Imaging Data
Lung and bronchus cancers remain the leading cause of cancer-related mortality in the U.S. While targeted therapies and immunotherapies improve outcomes, invasive biopsies are often required for treatment decisions, posing challenges for many patients. This project aims to develop an AI-based foundation model for non-invasive biomarker prediction using CT imaging. Leveraging large-scale CT datasets from the Penn Medicine Biobank (PMBB) and the National Lung Screening Trial (NLST), we will train a vision transformer (ViT) model with self-supervised learning (SSL) to extract imaging features predictive of cancer risk, survival, and molecular biomarkers. By correlating imaging features with tumor biology, we aim to advance patient stratification and personalized treatment strategies.
Current AI-based risk models for lung cancer require extensive manual annotations, which limits their scalability. We propose an SSL-trained 3D ViT model that learns task-agnostic imaging representations from whole CT volumes, overcoming annotation constraints. To address computational challenges, we will integrate softmax-free or dilated attention mechanisms and masked autoencoding, allowing efficient full-volume modeling while reducing redundancy.
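As an illustration of the masked autoencoding idea, the following minimal sketch splits a CT volume into non-overlapping 3D patches and randomly hides most of them, so that an encoder would only need to process the visible subset. The volume size, patch size, masking ratio, and the function names `patch_grid` and `random_mask` are assumptions chosen for illustration, not the project's actual configuration.

```python
import random


def patch_grid(volume_shape, patch_shape):
    """Number of non-overlapping patches along each axis (depth, height, width)."""
    if any(v % p for v, p in zip(volume_shape, patch_shape)):
        raise ValueError("volume dimensions must be divisible by the patch size")
    return tuple(v // p for v, p in zip(volume_shape, patch_shape))


def random_mask(num_patches, mask_ratio, seed=None):
    """Randomly split patch indices into (visible, masked) lists."""
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked


# Hypothetical volume of 128 x 256 x 256 voxels with 16^3 patches.
grid = patch_grid((128, 256, 256), (16, 16, 16))
num_patches = grid[0] * grid[1] * grid[2]
visible, masked = random_mask(num_patches, mask_ratio=0.75, seed=0)
print(grid, num_patches, len(visible), len(masked))
# (8, 16, 16) 2048 512 1536
```

With a masking ratio of 0.75, only a quarter of the patches would reach the encoder in this sketch, which is one way masking can reduce the cost of full-volume pre-training.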
Specific Aim 1: Develop imaging signatures of cancer risk and survival.
We will construct a 3D ViT model for whole-CT analysis, trained on over 500,000 CT scans from PMBB and NLST. Efficient attention mechanisms will enable learning of risk-relevant imaging features without excessive computational burden.
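To make the computational burden concrete, the sketch below counts the query-key pairs that full self-attention would score for one whole-CT volume and compares this with a simple sparse scheme. The volume size, patch size, segment length, and dilation are hypothetical, and the sparse scheme (split the token sequence into segments and keep every r-th token within each segment) is only one possible reading of "dilated attention", not necessarily the mechanism the project will use.

```python
def num_tokens(volume_shape, patch_shape):
    """Number of patch tokens for a volume split into non-overlapping patches."""
    n = 1
    for v, p in zip(volume_shape, patch_shape):
        n *= v // p
    return n


def full_attention_pairs(n):
    """Every token attends to every token: n * n scored pairs."""
    return n * n


def dilated_attention_pairs(n, segment_len, dilation):
    """Assumed sparse scheme: attention within each segment over every
    `dilation`-th token only."""
    segments = n // segment_len
    kept_per_segment = segment_len // dilation
    return segments * kept_per_segment * kept_per_segment


# Hypothetical volume of 256 x 512 x 512 voxels with 16^3 patches.
n = num_tokens((256, 512, 512), (16, 16, 16))
full = full_attention_pairs(n)
sparse = dilated_attention_pairs(n, segment_len=2048, dilation=4)
print(n, full, sparse, full // sparse)
# 16384 268435456 2097152 128
```

Under these assumed settings the sparse scheme scores 128 times fewer pairs than full attention for a single volume and a single attention head.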
Specific Aim 2: Predict molecular biomarkers from CT imaging.
We will link imaging features to molecular data to predict mutations, gene expression, and PD-L1 status in non-small cell lung cancer (NSCLC), using vision deep learning (DL) models combined with an aggregation transformer. This approach aims to reduce the need for invasive biopsies by identifying radiologic correlates of tumor biology.
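The aggregation step can be pictured with a much simpler stand-in: attention pooling, in which a single query vector weights a set of per-region feature vectors and combines them into one scan-level vector. This is a simplified sketch and not the proposed aggregation transformer; the feature values, the query, and the function name `attention_pool` are made up for illustration, and in practice the query would be learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, query):
    """Combine feature vectors into one vector using scaled dot-product
    attention weights computed against a single query vector."""
    dim = len(query)
    scores = [
        sum(q * f for q, f in zip(query, feat)) / math.sqrt(dim)
        for feat in features
    ]
    weights = softmax(scores)
    pooled = [
        sum(w * feat[i] for w, feat in zip(weights, features))
        for i in range(dim)
    ]
    return pooled, weights


# Three hypothetical per-region feature vectors from one CT scan.
features = [[0.1, 0.0, 0.2], [0.9, 0.8, 0.7], [0.2, 0.1, 0.0]]
query = [1.0, 1.0, 1.0]  # would be a learned parameter in a real model
pooled, weights = attention_pool(features, query)
print(weights)
print(pooled)
```

The region whose features align most strongly with the query (here the second one) receives the largest weight, so the pooled vector is dominated by the regions the model treats as most informative for the prediction target.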
All methods will be implemented as open-source Python packages. By applying our model to multiple datasets, we aim to establish a robust, AI-driven approach for non-invasive lung cancer biomarker profiling.
Prof. Daniel Truhn, RWTH Aachen University
Prof. Christos Davatzikos, University of Pennsylvania