An explainable liver disease screening-based machine learning model

Principal Investigator

Name
Angélica Atehortúa

Degrees
Ph.D.

Institution
ISGlobal Barcelona Institute for Global Health

Position Title
Postdoctoral fellow

Email
angelica.atehortua@isglobal.org

About this CDAS Project

Study
PLCO

Project ID
PLCO-1712

Initial CDAS Request Approval
Oct 17, 2024

Title
An explainable liver disease screening-based machine learning model

Summary
Unmet need: Liver diseases, including fibrosis, cirrhosis, viral hepatitis, and liver cancer, cause over two million deaths annually and account for 4% of all global deaths. Because their early phases are asymptomatic, they are often diagnosed late. Early detection and treatment are urgently needed to prevent progression to advanced stages, where treatments are less effective.
Solution: We are developing a patient-centered, interpretable machine learning model that combines clinical, demographic, biomarker, and lifestyle variables to diagnose liver fibrosis. An interpretable method could improve the understanding of complex liver pathologies and support early, personalized detection and monitoring.
Methods: We use a pipeline composed of data preprocessing (handling of missing and categorical data), feature selection, automatic hyperparameter tuning, and various machine learning models such as Random Forests, Gradient Boosting Trees (e.g., XGBoost, LightGBM, CatBoost), Neural Networks, and ensemble strategies. We apply SHAP (SHapley Additive exPlanations), Permutation Importance, and LIME (Local Interpretable Model-agnostic Explanations) to interpret features at both the local level (e.g., the importance of each feature per individual, and clustering of individual liver fibrosis signatures to stratify patients into phenotypic classes) and the global level (e.g., understanding how disease development relates to comorbidities, age, sex, etc.).
Impact: A predictive tool for the early diagnosis of liver fibrosis that combines accessible data (biomarkers, lifestyle, and clinical information) and provides both local and global explainability of the variables involved in disease development.

This work is part of the LIVERAIM project, which is building a personalized biomarker platform for the early diagnosis of liver fibrosis: the LIVERAIM platform.
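
The pipeline outlined under Methods could be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the project's actual implementation: the data are synthetic, the column names (age, bmi, alt, sex) are hypothetical stand-ins rather than PLCO variables, and a Random Forest with a small grid search stands in for the wider set of models and tuning strategies mentioned above.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.inspection import permutation_importance
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-in data; column names are hypothetical, not PLCO variables.
X = pd.DataFrame({
    "age": rng.normal(60, 8, n),
    "bmi": rng.normal(27, 4, n),
    "alt": rng.normal(30, 10, n),
    "sex": rng.choice(["F", "M"], n),
})
# The label depends on "alt" and "age" so the example has a signal to find.
signal = (X["alt"] - 30) / 10 + (X["age"] - 60) / 16 + rng.normal(0, 0.5, n)
y = (signal > 0).astype(int)
# Remove some values to exercise the missing-data step.
X.loc[rng.choice(n, 40, replace=False), "bmi"] = np.nan

# Preprocessing: impute numeric columns, one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), ["age", "bmi", "alt"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["sex"]),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("select", SelectKBest(f_classif, k=3)),         # feature selection
    ("clf", RandomForestClassifier(random_state=0)),
])

# Automatic hyperparameter tuning over a deliberately tiny grid.
search = GridSearchCV(
    model,
    {"clf__n_estimators": [50, 100], "clf__max_depth": [3, None]},
    cv=3,
    scoring="roc_auc",
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
search.fit(X_train, y_train)

# Global explanation: drop in held-out AUC when each input column is shuffled.
result = permutation_importance(
    search, X_test, y_test, n_repeats=10, random_state=0, scoring="roc_auc"
)
importance = dict(zip(X.columns, result.importances_mean))
for name, score in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

Wrapping preprocessing and feature selection inside the pipeline means they are refit on each cross-validation fold, which avoids leaking information from validation data into the tuning step.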

Aims

1) Robust data management: a standard data quality framework, reproducible across different types of data, for regulatory decision-making and/or health technology assessment, with a characterization of data collection, management, and reporting, and an empirical data quality validation.
2) Generation and validation of a liver fibrosis screening algorithm based on machine learning and advanced analytics, leveraging demographic, clinical, and serum biomarker data for the personalized risk assessment of liver fibrosis in the general population.
3) Development of an explainable AI layer for the screening platform to identify the most informative variables/features/biomarkers for the population and the individual “early liver fibrosis signature” for each patient.
4) Development and deployment of a scalable screening platform for the personalized risk assessment of liver fibrosis, based on the validated machine learning algorithms, to be implemented in WP4.
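
To illustrate the difference between the individual signature and the population-level ranking in aim 3, the sketch below computes exact Shapley values in plain Python. It is an assumption-laden toy, not the SHAP library and not the project's code: `risk_score` is a hypothetical linear model, the three features (age, ALT, diabetes flag) and all numbers are invented, and exact enumeration is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley attributions for one individual x.

    Features outside a coalition take their values from each background
    row in turn, and the resulting predictions are averaged.
    """
    n = len(x)

    def value(coalition):
        total = 0.0
        for row in background:
            mixed = [x[i] if i in coalition else row[i] for i in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def risk_score(features):
    """Hypothetical toy model; the coefficients are invented for illustration."""
    age, alt, diabetes = features
    return 0.03 * age + 0.02 * alt + 0.5 * diabetes


# Invented reference population: [age, ALT, diabetes flag].
background = [[50, 20, 0], [60, 30, 0], [70, 40, 1]]
patient = [70, 60, 1]

# Local explanation: contribution of each feature for this one individual.
local = shapley_values(risk_score, patient, background)

# Global explanation: mean absolute contribution across the population.
per_person = [shapley_values(risk_score, row, background) for row in background]
global_importance = [
    sum(abs(p[i]) for p in per_person) / len(per_person) for i in range(3)
]

print("local:", local)
print("global:", global_importance)
```

The local vector is the kind of per-patient signature that could later be clustered for stratification, while the averaged absolute values give a population-level ranking of features.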

Collaborators

Fundacio de Recerca Clinici Barcelona, Hospital Clinic de Barcelona, Universitat Autonoma de Barcelona, Odense University Hospital, Università Degli Studi Di Padova, University of Newcastle Upon Tyne, Fondation Cardiometabolisme Nutrition, Università Degli Studi Di Torino, Siemens, Roche, Sysmex, Nordic