An explainable machine learning model for liver disease screening
Solution: We are developing a patient-centered, interpretable machine learning model that combines clinical, demographic, biomarker, and lifestyle variables to diagnose liver fibrosis. An interpretable method could improve the understanding of complex liver pathologies and support early, personalized detection and monitoring.
Methods: Our pipeline comprises data preprocessing (handling of missing and categorical data), feature selection, automated hyperparameter tuning, and several machine learning models, including random forests, gradient-boosted trees (e.g., XGBoost, LightGBM, CatBoost), neural networks, and ensemble strategies. We apply SHAP (SHapley Additive exPlanations), permutation importance, and LIME (Local Interpretable Model-agnostic Explanations) to interpret features at two levels. At the local level, this includes estimating the importance of each feature for each individual and clustering individual liver fibrosis signatures to stratify patients into phenotypic classes. At the global level, it includes understanding how disease development relates to comorbidities, age, sex, and other factors.
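To make the distinction between local and global explanations concrete, here is a minimal sketch in Python that computes exact Shapley values, the quantity that SHAP estimates, for a toy risk score. The feature names, the scoring function, the baseline values, and the three patients are all hypothetical and chosen only for illustration. This is not the project's model or data, and it uses the standard library rather than the `shap` package.

```python
from itertools import combinations
from math import factorial

# Hypothetical features and a hypothetical reference ("baseline") patient.
FEATURES = ["age", "alt", "bmi"]
BASELINE = {"age": 50.0, "alt": 30.0, "bmi": 25.0}


def risk_score(x):
    """Made-up risk score with an ALT x BMI interaction (not a clinical model)."""
    return (
        0.02 * x["age"]
        + 0.01 * x["alt"]
        + 0.001 * (x["alt"] - 30.0) * (x["bmi"] - 25.0)
    )


def value(patient, subset):
    """Score when features in `subset` use the patient's values, the rest the baseline."""
    x = {f: (patient[f] if f in subset else BASELINE[f]) for f in FEATURES}
    return risk_score(x)


def shapley_values(patient):
    """Exact Shapley attribution of risk_score(patient) - risk_score(BASELINE)."""
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(n):  # size of the coalition that f joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(patient, s | {f}) - value(patient, s))
        phi[f] = total
    return phi


# Hypothetical patients.
patients = [
    {"age": 62.0, "alt": 80.0, "bmi": 33.0},
    {"age": 45.0, "alt": 25.0, "bmi": 22.0},
    {"age": 70.0, "alt": 40.0, "bmi": 29.0},
]

# Local explanation: one attribution vector ("signature") per patient.
local = [shapley_values(p) for p in patients]

# Global explanation: mean absolute attribution per feature across patients.
global_importance = {
    f: sum(abs(phi[f]) for phi in local) / len(local) for f in FEATURES
}

for p, phi in zip(patients, local):
    print(p, {f: round(v, 3) for f, v in phi.items()})
print("global:", {f: round(v, 3) for f, v in global_importance.items()})
```

Each patient's attributions sum to the difference between that patient's score and the baseline score, which is what makes per-patient signatures comparable and suitable for clustering. Exact enumeration grows exponentially with the number of features, so realistic pipelines rely on the efficient algorithms and approximations provided by explanation libraries.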
Impact: A predictive tool for the early diagnosis of liver fibrosis that combines accessible data (biomarkers, lifestyle, and clinical information) and provides both local and global explainability of the variables involved in disease development.
This work is part of the LIVERAIM project, which is building a personalized biomarker platform for the early diagnosis of liver fibrosis: the LIVERAIM platform.
1) Robust data management: a standard data quality framework, reproducible across different types of data, for regulatory decision-making and/or health technology assessment, including a characterization of data collection, management, and reporting, and an empirical validation of data quality.
2) Generation and validation of a liver fibrosis screening algorithm based on machine learning and advanced analytics, leveraging demographic, clinical, and serum biomarker data for the personalized risk assessment of liver fibrosis in the general population.
3) Development of an explainable AI layer for the screening platform to identify the most informative variables, features, and biomarkers at the population level and the individual “early liver fibrosis signature” of each patient.
4) Development and deployment of a scalable screening platform, based on the validated machine learning algorithms, for the personalized risk assessment of liver fibrosis, to be implemented in WP4.
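As an illustration of the empirical data quality validation mentioned in item 1 above, the sketch below computes two simple indicators on tabular records: completeness per variable and the share of recorded values that fall outside a plausible range. The variable names, ranges, and records are hypothetical, and the project's actual framework is not described here.

```python
# Hypothetical plausibility ranges (illustrative only, not clinical reference ranges).
PLAUSIBLE_RANGES = {"age": (18, 110), "alt": (0, 2000), "bmi": (10, 80)}

# Hypothetical records; None marks a missing value.
records = [
    {"age": 54, "alt": 32, "bmi": 27.1},
    {"age": 61, "alt": None, "bmi": 31.4},
    {"age": 230, "alt": 45, "bmi": None},
    {"age": 47, "alt": 28, "bmi": 24.0},
]


def quality_report(rows, ranges):
    """Return completeness and out-of-range rate for each variable in `ranges`."""
    report = {}
    for var, (low, high) in ranges.items():
        values = [row.get(var) for row in rows]
        present = [v for v in values if v is not None]
        out_of_range = [v for v in present if not low <= v <= high]
        report[var] = {
            # Fraction of rows with a recorded value.
            "completeness": len(present) / len(values) if values else 0.0,
            # Fraction of recorded values outside the plausible range.
            "out_of_range": len(out_of_range) / len(present) if present else 0.0,
        }
    return report


report = quality_report(records, PLAUSIBLE_RANGES)
for var, stats in report.items():
    print(var, stats)
```

Indicators of this kind can be computed in the same way for each data source, which is one way to keep a quality check reproducible across different types of data.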
Fundacio de Recerca Clinici Barcelona, Hospital Clinic de Barcelona, Universitat Autonoma de Barcelona, Odense University Hospital, Università Degli Studi Di Padova, University of Newcastle Upon Tyne, Fondation Cardiometabolisme Nutrition, Università Degli Studi Di Torino, Siemens, Roche, Sysmex, Nordic