Principal Investigator
Name
Oliver Díaz
Degrees
Ph.D.
Institution
Universitat de Barcelona
Position Title
Associate Professor
About this CDAS Project
Study
PLCO
Project ID
PLCO-1742
Initial CDAS Request Approval
Nov 12, 2024
Title
Exploring Multimodal Factors Influencing Breast Cancer Risk: A Data-Driven Approach
Summary
The proposed research aims to investigate how patient characteristics, including age, ethnicity, medication, and lifestyle factors, correlate with breast cancer risk in the PLCO Breast dataset. By applying data science techniques and machine learning algorithms, this study seeks to uncover new patterns and potential biomarkers that could improve both our understanding of breast cancer development and risk assessment.

Breast cancer is a complex and multifactorial disease, with numerous factors contributing to its incidence and progression. While significant progress has been made in early detection and treatment, there is a pressing need to further elucidate the underlying mechanisms and identify novel risk factors that can aid in the development of more personalized and effective preventive strategies.

The PLCO Breast dataset is a rich source of information that may reveal new insights into the interplay between patient demographics, clinical characteristics, and breast cancer risk. Using this dataset, the proposed research will explore how a wide range of patient-level variables, such as age, ethnicity, medication history, and lifestyle factors (e.g., physical activity, diet, smoking), correlate with the risk of developing breast cancer.

Through the application of advanced data science and machine learning techniques, the research team aims to identify patterns, associations, and potential biomarkers that can contribute to a more nuanced understanding of breast cancer risk. This knowledge can lead to the development of improved risk assessment tools, personalized screening and intervention strategies, and ultimately, enhanced patient outcomes.

The findings of this study will not only advance the scientific understanding of breast cancer but also have the potential to inform clinical practice and guide the development of targeted prevention and early detection programs. By uncovering new insights into the multifaceted nature of breast cancer risk, this project can pave the way for more effective and tailored approaches to breast cancer management and ultimately improve the quality of life for individuals at risk.
Aims

1. Comprehensive Data Analysis:
- Conduct a thorough exploration and preprocessing of the PLCO Breast dataset to ensure data quality and integrity.
- Perform extensive data visualizations and statistical analyses to identify significant associations between patient characteristics and breast cancer risk.
- Assess the interplay between demographic factors, clinical variables, and lifestyle-related attributes in the context of breast cancer development.

2. Machine Learning Model Development:
- Develop and train advanced machine learning models, such as logistic regression, decision trees, random forests, and neural networks, to predict breast cancer risk based on the identified risk factors.
- Evaluate the performance of these models using appropriate metrics, including accuracy, precision, recall, and F1-score, to ensure robust and reliable risk assessment.
- Explore the use of ensemble techniques and feature engineering to further enhance the predictive capabilities of the models.
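
The evaluation metrics named above can be illustrated with a minimal sketch. The function name `binary_metrics` and the toy labels are hypothetical and are not taken from the project or the PLCO data; the code only shows how accuracy, precision, recall, and F1-score follow from the four confusion-matrix counts.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for 0/1 labels (1 = case)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return BinaryMetrics(accuracy, precision, recall, f1)


# Hypothetical toy labels, for illustration only (not PLCO data).
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 0, 1, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

When cases are rare relative to non-cases, accuracy alone can look high even for a model that misses most cases, which is one reason to report precision, recall, and F1-score alongside it.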

3. Biomarker Discovery:
- Employ feature importance and selection techniques to identify the most influential variables contributing to breast cancer risk.
- Investigate the potential of these variables as novel biomarkers that can be used for early detection, risk stratification, and targeted interventions.
- Validate the identified biomarkers through additional statistical analyses and, if feasible, through external validation using independent datasets.
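
The project does not say which feature importance technique will be used. As one possible illustration, the sketch below implements permutation importance: shuffle one feature column at a time and measure how much the model's accuracy drops. The names `permutation_importance` and `toy_model`, and the toy data, are hypothetical.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(1 for row, label in zip(X, y) if model(row) == label) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j replaced by its shuffled values.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy model: uses feature 0 only and ignores feature 1.
def toy_model(row):
    return 1 if row[0] > 0.5 else 0


# Hypothetical toy data, for illustration only (not PLCO data).
X = [[0.1, 5.0], [0.9, 1.0], [0.4, 3.0], [0.8, 2.0], [0.2, 4.0], [0.7, 0.0]]
y = [toy_model(row) for row in X]
importances = permutation_importance(toy_model, X, y)
print(importances)
```

Because `toy_model` never reads feature 1, shuffling that column cannot change any prediction, so its importance is zero. A high importance indicates that the model relies on a variable; it does not by itself establish that the variable is a biomarker, which is why the validation step listed above matters.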

4. Model Interpretation and Explainability:
- Implement interpretable machine learning methods, such as SHAP (SHapley Additive exPlanations), to provide insights into the underlying relationships between the input variables and the predicted breast cancer risk.
- Communicate the model's decision-making process in a transparent and understandable manner, enabling better clinical interpretation and facilitating the integration of the developed models into healthcare decision-making processes.
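
To make the idea behind SHAP concrete, the sketch below computes exact Shapley values by enumerating every feature subset for a tiny hypothetical risk score. It is not the SHAP library's API. The function `shapley_values`, the score `toy_risk`, its coefficients, and the feature values are all assumptions made for illustration. "Absent" features are replaced by a baseline value, which is one common convention.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance x relative to a baseline instance.

    Enumerates all feature subsets, so the cost grows exponentially with
    the number of features; suitable only for a handful of features.
    """
    n = len(x)

    def value(subset):
        # Features in the subset take their values from x, the rest from baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size that excludes feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical linear risk score; coefficients are invented for illustration.
def toy_risk(features):
    age, activity_hours, smoker = features
    return 0.02 * age - 0.01 * activity_hours + 0.3 * smoker


x = [60, 2, 1]         # hypothetical individual
baseline = [50, 5, 0]  # hypothetical reference individual
phi = shapley_values(toy_risk, x, baseline)
print(phi)
```

For a linear score like this one, each feature's Shapley value equals its coefficient times its difference from the baseline, and the values sum to the difference between the prediction for `x` and the prediction for the baseline. Practical SHAP implementations rely on approximations or model-specific algorithms because full enumeration does not scale.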

5. Translational Research and Clinical Implications:
- Explore the potential clinical applications of the developed risk assessment models and identified biomarkers.

Collaborators

N/A