Integration of Preprocessing Methods and Deep Learning Models for Predicting Cancer Improvement or Deterioration Using Longitudinal Data

Principal Investigator

Name
Pilar Pazos-Lago

Degrees
Ph.D.

Institution
Old Dominion University

Position Title
Professor

Email
mpazosla@odu.edu

About this CDAS Project

Study
PLCO

Project ID
PLCO-1654

Initial CDAS Request Approval
Aug 28, 2024

Title
Integration of Preprocessing Methods and Deep Learning Models for Predicting Cancer Improvement or Deterioration Using Longitudinal Data

Summary
The rapid advancement of machine learning (ML) and deep learning (DL) technologies has opened new avenues for improving predictive modeling in clinical oncology (Kourou, Exarchos et al. 2021, Chakraborty, Bhattacharya et al. 2023, Ghavidel and Pazos 2023). In this research, we aim to develop a robust framework for predicting the improvement or deterioration of cancer patients using longitudinal data. Our approach will integrate advanced preprocessing methods, such as resampling techniques to address class imbalance and feature selection methods to reduce dimensionality. We will then apply deep learning models designed for time series and longitudinal data to build a dynamic predictive model of cancer progression. We propose clustering patients by their staging at diagnosis and combining the resulting severity status with longitudinally measured patient biomarkers to enhance the predictive capability of our models. This integration is intended to yield more accurate predictions of cancer outcomes, focusing specifically on the temporal progression of the disease and the likelihood of deterioration or improvement over time.
Longitudinal data, with its capacity to capture temporal changes in patient health, provides a valuable resource for developing dynamic predictive models (Cascarano, Mur-Petit et al. 2023). However, challenges such as class imbalance, high dimensionality, and the complexity of cancer progression demand sophisticated preprocessing and modeling approaches. Our study will combine the strengths of traditional statistical methods with those of modern deep learning architectures, such as Long Short-Term Memory (LSTM) networks, to address these challenges in healthcare data and provide more accurate predictions of cancer prognosis.

Aims

1. Develop and Evaluate Preprocessing Methods for Longitudinal Cancer Data:

• Implement and evaluate various resampling techniques, such as the Synthetic Minority Over-sampling Technique (SMOTE), to address class imbalance in longitudinal cancer data.
• Apply feature selection methods, such as Random Forest feature importance and recursive feature elimination, to reduce the dimensionality of high-complexity cancer datasets while preserving relevant predictive information.
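As a rough illustration of the resampling step in Aim 1, the Python sketch below implements the core SMOTE idea with NumPy alone: each synthetic minority sample lies on the line segment between a minority sample and one of its k nearest minority neighbours. It is a minimal sketch, not the project's implementation. The function name `smote` and its parameters are hypothetical, and it assumes one row per sample with numeric features. In practice a maintained library such as imbalanced-learn (`imblearn.over_sampling.SMOTE`) would likely be preferred.

```python
import numpy as np

def smote(X, y, minority_label, k=5, n_new=None, seed=0):
    """Minimal SMOTE sketch (hypothetical helper): create synthetic minority rows by
    interpolating between a minority row and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_min = X[y == minority_label]
    n_min = len(X_min)
    if n_min < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    if n_new is None:
        # Default: oversample until the minority count matches the non-minority count.
        n_new = int((y != minority_label).sum()) - n_min
    if n_new <= 0:
        return X, y
    k = min(k, n_min - 1)
    # Pairwise Euclidean distances between minority rows; a row is not its own neighbour.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n_min, size=n_new)               # random minority anchor rows
    nb = neighbours[base, rng.integers(0, k, size=n_new)]   # one random neighbour per anchor
    gap = rng.random((n_new, 1))                            # interpolation factor in [0, 1)
    X_syn = X_min[base] + gap * (X_min[nb] - X_min[base])
    X_out = np.vstack([X, X_syn])
    y_out = np.concatenate([y, np.full(n_new, minority_label, dtype=y.dtype)])
    return X_out, y_out
```

Because each longitudinal record contributes several visits, one design choice worth testing is whether to oversample whole patients or individual rows. Either way, resampling should be applied only to the training portion of each split so that synthetic samples never reach the evaluation data.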
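Recursive feature elimination repeatedly fits a model, drops the feature it ranks least important, and refits on the remaining features. The sketch below shows that loop under a stated assumption: to stay dependency-free, it ranks features by the absolute coefficients of a closed-form ridge regression on standardized features, whereas the aim names Random Forest importance as a candidate ranking. The helpers `ridge_coefs` and `rfe` are hypothetical; scikit-learn's `RFE` and `RandomForestClassifier.feature_importances_` are the usual library routes.

```python
import numpy as np

def ridge_coefs(X, y, alpha=1.0):
    """Closed-form ridge regression on standardized, centred data (hypothetical ranking model).
    Assumes no constant columns, since each column is divided by its standard deviation."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    p = X.shape[1]
    return np.linalg.solve(Xs.T @ Xs + alpha * np.eye(p), Xs.T @ yc)

def rfe(X, y, n_keep, alpha=1.0):
    """Recursive feature elimination: refit, drop the feature with the smallest
    absolute coefficient, and repeat until n_keep features remain.
    Returns the original column indices of the kept features."""
    keep = list(range(X.shape[1]))
    while len(keep) > n_keep:
        w = ridge_coefs(X[:, keep], y, alpha)
        weakest = int(np.argmin(np.abs(w)))
        del keep[weakest]
    return keep
```

Refitting after every removal, rather than ranking once, lets the importance of the remaining features be re-estimated once correlated competitors are gone, which is the main point of the recursive variant.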

2. Integrate Deep Learning Models with Preprocessed Data:

• Design and implement deep learning models, particularly Long Short-Term Memory (LSTM) networks and other time series models, to capture the temporal dynamics in longitudinal cancer data.
• Feed the preprocessed data into these DL models to predict patient outcomes, focusing specifically on cancer deterioration or improvement and on the time to incident events.
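To make the LSTM idea concrete, the sketch below runs a single LSTM layer forward over one patient's sequence of visits and maps the final hidden state to a probability with a logistic output unit. It is an assumption-laden illustration: the weights are untrained placeholders, the stacked gate ordering (input, forget, candidate, output) is a convention chosen here, and `lstm_forward` and `predict_deterioration` are hypothetical names. Training would normally be done in a framework such as PyTorch (`torch.nn.LSTM`) or Keras.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(x_seq, W, U, b):
    """Run one LSTM layer over a single patient's sequence.
    x_seq: (T, d) visits x features; W: (4H, d); U: (4H, H); b: (4H,).
    Stacked gate order: input, forget, cell candidate, output."""
    H = U.shape[1]
    h = np.zeros(H)
    c = np.zeros(H)
    hs = []
    for x_t in x_seq:
        z = W @ x_t + U @ h + b
        i = sigmoid(z[0:H])            # input gate
        f = sigmoid(z[H:2 * H])        # forget gate
        g = np.tanh(z[2 * H:3 * H])    # candidate cell update
        o = sigmoid(z[3 * H:4 * H])    # output gate
        c = f * c + i * g              # cell state carries information across visits
        h = o * np.tanh(c)             # hidden state exposed at this visit
        hs.append(h)
    return np.stack(hs), c

def predict_deterioration(x_seq, params):
    """Hypothetical head: probability of deterioration from the last hidden state."""
    hs, _ = lstm_forward(x_seq, params["W"], params["U"], params["b"])
    return float(sigmoid(params["w_out"] @ hs[-1] + params["b_out"]))
```

Irregular visit spacing, missing visits, and sequences of different lengths (usually handled by padding and masking in deep learning frameworks) are not addressed in this sketch.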

3. Predict Cancer Deterioration and Improvement Using Longitudinal Data:

• Cluster Patients Based on Staging at Diagnosis: Develop a clustering algorithm to group patients by their staging at diagnosis, which will serve as a critical variable in assessing the severity of their condition. This clustered staging variable will be used together with longitudinally measured biomarkers, such as prostate-specific antigen (PSA) and CA-125 levels, to assess changes in cancer status over time.
• Incorporate Severity Status and Biomarkers in Predictive Models: Use the clustered staging variable and longitudinal biomarkers to train DL models such as LSTM networks. These models will predict whether a patient's cancer status is likely to deteriorate or improve over time, taking into account the severity of the condition at diagnosis and subsequent changes in biomarker levels.
• Develop a Comprehensive Prediction Framework: Integrate the preprocessing techniques with the DL models to dynamically assess the risk of cancer deterioration or improvement over time. The framework will allow continuous monitoring and prediction of patient outcomes, supporting more personalized and timely intervention.
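The staging-based severity variable in Aim 3 could be prototyped as below: patients are clustered on numeric encodings of their stage at diagnosis, and the clusters are then ranked by mean ordinal stage so that the cluster label reads as a severity level. This is a hypothetical sketch. The three-column encoding used in the comments (ordinal stage, nodal involvement flag, distant metastasis flag) is an assumption rather than a PLCO variable definition, and k-means with farthest-first initialization is only one possible choice of clustering algorithm.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means with farthest-first initialization (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Farthest-first seeding: start from a random row, then repeatedly add the row
    # farthest from all centroids chosen so far.
    idx = [int(rng.integers(len(X)))]
    for _ in range(1, k):
        dist = np.linalg.norm(X[:, None, :] - X[idx][None, :, :], axis=-1).min(axis=1)
        idx.append(int(dist.argmax()))
    centroids = X[idx].copy()
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = d.argmin(axis=1)                      # assign each patient to nearest centroid
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])            # move centroids to cluster means
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

def severity_groups(stage_features, k=3, seed=0):
    """Cluster hypothetical stage encodings, e.g. [ordinal stage, node flag, metastasis flag],
    and relabel clusters 0..k-1 in order of increasing mean ordinal stage (column 0)."""
    labels, centroids = kmeans(stage_features, k, seed=seed)
    order = np.argsort(centroids[:, 0])
    rank = np.empty(k, dtype=int)
    rank[order] = np.arange(k)
    return rank[labels]
```

Because staging is largely categorical, k-modes or a Gower-distance method such as hierarchical clustering may be worth comparing against k-means.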

4. Assess and Optimize Model Performance:

• Use advanced metrics, including time-dependent ROC curves, to evaluate the accuracy and reliability of the predictive models.
• Optimize model performance through hyperparameter tuning and cross-validation, ensuring robustness across different cancer types and patient demographics.
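A time-dependent ROC analysis asks how well a risk score separates patients who experience the event by a horizon t from those still event-free after t. The sketch below computes the area under that curve in its cumulative/dynamic form by counting concordant case-control pairs. It is a simplified, hypothetical helper (`cd_auc`): patients censored before t are simply dropped, whereas established implementations such as scikit-survival's `cumulative_dynamic_auc` apply inverse-probability-of-censoring weights.

```python
import numpy as np

def cd_auc(time, event, score, t):
    """Unweighted cumulative/dynamic AUC at horizon t.
    Cases: event observed at or before t. Controls: still event-free after t.
    Subjects censored at or before t are excluded (no censoring weights)."""
    time, event, score = map(np.asarray, (time, event, score))
    cases = (time <= t) & (event == 1)
    controls = time > t
    if not cases.any() or not controls.any():
        return float("nan")
    sc = score[cases][:, None]      # case scores as a column
    sn = score[controls][None, :]   # control scores as a row
    # A case ranked above a control counts 1; tied scores count 1/2.
    return float(((sc > sn) + 0.5 * (sc == sn)).mean())
```

Evaluating this quantity at several horizons gives a curve of discrimination over follow-up time, which matches the aim of predicting changes in status rather than a single endpoint.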
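With several visits per patient, ordinary row-level cross-validation can place visits from the same patient in both training and test folds and inflate performance estimates. A minimal sketch of patient-level splitting, assuming each row carries a patient identifier (the helper `patient_kfold` is hypothetical; scikit-learn's `GroupKFold` provides the same guarantee):

```python
import numpy as np

def patient_kfold(patient_ids, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) row indices such that all rows of a given patient
    fall in the same fold, so no patient's visits leak across the split."""
    patient_ids = np.asarray(patient_ids)
    unique = np.unique(patient_ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(unique)                              # randomize patient-to-fold assignment
    for fold in np.array_split(unique, n_splits):
        test_mask = np.isin(patient_ids, fold)
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)
```

Hyperparameter tuning, resampling, and feature selection would then run inside each training split only, so the held-out patients remain untouched until evaluation.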

Collaborators

Arman Ghavidel,
Old Dominion University