Principal Investigator
Saeed Hassanpour
Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth
Position Title
Associate Professor
About this CDAS Project
PLCO
Project ID
Initial CDAS Request Approval
May 2, 2023
Multimodal Machine Learning for Precision Lung Cancer Management
Cancer is a major cause of death in the US, ranking second only to cardiovascular diseases. Among all cancers, lung cancer is particularly concerning: it is the second most prevalent cancer and the leading cause of cancer death globally, accounting for 1.6 million deaths per year, or 25% of all cancer deaths. Despite the development of numerous new treatments each year, the effectiveness of these treatments for individual lung cancer patients can be highly unpredictable due to tumor heterogeneity and differences in patient genomics. This uncertainty makes it important to identify the most appropriate treatment strategy for each patient. However, manually reviewing the large volume of data in a patient's health profile to find patterns that indicate treatment response is highly time-consuming and impractical. To address these challenges, we propose the development of a multimodal machine learning pipeline that can analyze the various types of data in patient health profiles and predict patient resistance to medications. Previous studies (Chen et al., 2021, 2022; Laury et al., 2021; Singh et al., 2018; Wang et al., 2020) have built models that predict patient prognosis from data such as genomics or pathology images. However, few of them examined how patients fare, and whether they develop resistance, under specific treatments, and they therefore provide little guidance for choosing among treatments. In this project, the pipeline will take multiple data types (e.g., images, numerical data) from the patient clinical profile as input and integrate them using multiple machine learning algorithms to predict patient resistance under selected treatments. By combining multiple sources of information, the pipeline is intended to make more informed predictions about patient outcomes, allowing for the development of more effective, personalized treatment plans.
By automatically analyzing patient health data and providing more accurate predictions, this pipeline has the potential to transform how lung cancer, and potentially other types of cancer, is treated and to improve patient outcomes.

Aim 1: Develop, optimize, and evaluate a multimodal machine learning pipeline that predicts the prognosis of patients under treatment by integrating information from different clinical data sources. In Aim 1, we propose to build unimodal machine learning models on genomics data and pathology images and then combine them into a multimodal machine learning model with multimodal fusion layers. Patient data including pathology images, gene mutations, demographic information, findings in radiology reports and oncology notes, and other clinical information will be used to train and validate this pipeline. We will apply an existing image analysis method to the pathology images and integrate it into the pipeline. In addition, we plan to improve on the existing methods by adjusting the pipeline based on evaluation metrics such as the concordance index (c-index). The initial dataset will comprise lung cancer patients who underwent targeted treatment with tyrosine kinase inhibitors (TKI). We will extend these approaches to cohorts of lung cancer patients receiving other treatments, such as immunotherapy, based on data availability.
Aim 2: Test for statistical associations between patterns in the patient clinical profile and patient prognosis under selected treatments for cancer patients. In this aim, we propose to use statistical tests and regression methods to test whether there are factors associated with patient outcomes (e.g., resistance and response to treatment, overall survival) while accounting for possible confounders. As in Aim 1, the initial cohort will be lung cancer patients who used TKI medications, with plans to extend to other treatments.
Aim 3: Build and validate a multimodal machine learning pipeline for application to treatments with similar variability in drug resistance. In Aim 3, the pipeline developed for the treatment of interest in Aim 2 will be adapted for application to other cancer treatments to demonstrate the generalizability of the machine learning framework developed in this project.
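As an illustration of the evaluation step named in Aim 1, the following is a minimal sketch of how a concordance index could be computed for fused risk scores. It is not the project's implementation. The function names `concordance_index` and `late_fusion`, the modality keys, the weights, and all patient values are hypothetical and invented for this example; the weighted average is a simple stand-in for learned fusion layers, and the c-index here is a simplified form of Harrell's statistic that skips pairs with tied follow-up times.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Simplified Harrell's c-index for right-censored survival data.

    times  : observed follow-up time per patient
    events : 1 if the event was observed, 0 if the patient was censored
    risks  : predicted risk score; higher means shorter expected survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient a has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the patient
        # with the shorter time had an observed event (not censored).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # higher risk, earlier event: concordant
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def late_fusion(risk_by_modality, weights):
    """Weighted average of per-modality risk scores.

    Assumption: a fixed weighted average stands in for learned fusion layers.
    """
    total = sum(weights.values())
    n = len(next(iter(risk_by_modality.values())))
    return [
        sum(weights[m] * risk_by_modality[m][k] for m in weights) / total
        for k in range(n)
    ]


# Toy values, invented purely for illustration (not study data).
times = [5, 8, 12, 20, 25]
events = [1, 1, 0, 1, 0]
unimodal = {
    "genomics": [0.9, 0.4, 0.5, 0.3, 0.1],
    "pathology": [0.7, 0.8, 0.2, 0.4, 0.2],
}
fused = late_fusion(unimodal, {"genomics": 1.0, "pathology": 1.0})
print("fused c-index:", concordance_index(times, events, fused))
```

A c-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so comparing the value for each unimodal score against the fused score gives one rough indication of whether combining modalities helps.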
As expected outcomes, we will test for and identify possible associations between patient prognosis and genetic mutation patterns, pathological and radiological findings, and other clinical information for each particular treatment. This project will also provide a method for integrating information from different modalities and data sources in the patient clinical profile to help tailor treatment plans. These outcomes are expected to assist decision making in cancer management by providing recommendations on treatment choices, and thus to have a positive impact on precision care for cancer patients and their outcomes. Finally, these results may generalize to other cancer types and to other diseases and their treatments, which will be explored in future work.


Saeed Hassanpour, Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth
Shuai Jiang, Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth
Arief A. Suriawinata, Department of Pathology and Laboratory Medicine, Dartmouth-Hitchcock Medical Center
Liesbeth Hondelink, Leiden University Medical Center
Faraz Farhadi, Geisel School of Medicine at Dartmouth