Explainable multi-label prediction of respiratory conditions in CTs with multi-task Dirichlet-prior VAE
disorder and lung cancer screenings). As a consequence, multiple pathologies are often observed in a single exam in practice. Co-occurring diseases can present with similar imaging features, or can obscure and distract from one another, creating diagnostic uncertainty for both expert radiologists and deep learning systems. Learning explainable representations of clinically complex medical images, in which anatomical and extrinsic acquisition-related features are disentangled from relevant disease features, and disease-specific features are disentangled from each other, is a crucial step towards overcoming these challenges.
In this study, we propose to use variational autoencoders (VAEs) to learn disentangled, multi-modal latent representations of CTs for explainable classification of multiple co-occurring labels/classes (i.e. multi-label classification). VAEs learn compressed representations of data in which observed variations of salient visual features are captured by a number of latent factors. Disentanglement is achieved when each factor corresponds uniquely to some meaningful, constituent part of the data. Disentangled representations therefore offer a more explainable latent space than their entangled counterparts. The matching of latent factors to interpretable variations in the data manifold makes disentangled representations advantageous for downstream predictive tasks that utilise the learned latent space (e.g. classification, regression, clustering).
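To make the mechanism concrete, the following is a minimal PyTorch sketch of a Gaussian-prior VAE and its (β-weighted) ELBO objective, of the kind extended by the approaches discussed below; the layer sizes, latent dimensionality and image resolution are illustrative placeholders rather than the architecture used in this project.

```python
import torch
import torch.nn as nn

class GaussianVAE(nn.Module):
    """Minimal Gaussian-prior VAE: the encoder maps an image to the mean and
    log-variance of a latent Gaussian; the decoder reconstructs the image from
    a latent sample drawn via the reparameterisation trick."""

    def __init__(self, in_dim: int = 64 * 64, latent_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(in_dim, 512), nn.ReLU())
        self.fc_mu = nn.Linear(512, latent_dim)      # latent means
        self.fc_logvar = nn.Linear(512, latent_dim)  # latent log-variances
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 512), nn.ReLU(),
                                     nn.Linear(512, in_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterisation
        x_hat = self.decoder(z).view_as(x)
        return x_hat, mu, logvar

def neg_elbo(x, x_hat, mu, logvar, beta: float = 1.0):
    """Negative ELBO: reconstruction term plus a beta-weighted KL to N(0, I).
    Setting beta > 1 recovers the beta-VAE objective used to encourage
    disentanglement; x is assumed to be scaled to [0, 1]."""
    recon = nn.functional.binary_cross_entropy(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl
```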
Popular approaches to learning disentangled representations with VAEs include the β-VAE, FactorVAE and β-TCVAE. While these approaches improve disentanglement in Gaussian-prior VAEs, their success is limited by the dense, unimodal nature of the multivariate Gaussian prior. With this in mind, we propose to use a VAE with a Dirichlet prior, referred to as the Dirichlet VAE (DirVAE). We hypothesise that the DirVAE will allow a multi-modal latent representation of CTs to be learned through multi-peak sampling, and will encourage latent disentanglement due to the sparse nature of the Dirichlet prior.
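A minimal sketch of such a Dirichlet latent layer is given below, assuming PyTorch's pathwise-differentiable `torch.distributions.Dirichlet` for sampling; the actual DirVAE implementation may use a different Dirichlet sampling approximation, and the latent dimensionality and prior concentration shown are illustrative assumptions only.

```python
import torch
import torch.nn as nn
from torch.distributions import Dirichlet, kl_divergence

class DirichletLatent(nn.Module):
    """Dirichlet latent layer: encoder features are mapped to positive
    concentration parameters alpha, a latent code is sampled from
    Dirichlet(alpha), and the posterior is regularised towards a sparse
    Dirichlet prior."""

    def __init__(self, hidden_dim: int = 512, latent_dim: int = 32,
                 prior_alpha: float = 0.1):
        super().__init__()
        self.to_alpha = nn.Linear(hidden_dim, latent_dim)
        # A prior concentration < 1 places probability mass near the corners
        # of the simplex, encouraging sparse, multi-peaked latent codes.
        self.register_buffer("prior_concentration",
                             torch.full((latent_dim,), prior_alpha))

    def forward(self, h):
        alpha = nn.functional.softplus(self.to_alpha(h)) + 1e-3  # keep alpha > 0
        posterior = Dirichlet(alpha)
        z = posterior.rsample()  # pathwise-differentiable Dirichlet sample
        kl = kl_divergence(posterior, Dirichlet(self.prior_concentration)).mean()
        return z, kl
```

The sampled code `z` lies on the probability simplex, so a low prior concentration drives most of its mass onto a few components, which is the source of the hypothesised sparsity and multi-modality.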
- We aim to develop a deep learning method for explainable multi-label classification of CT images.
- We aim to perform rigorous model evaluations of both predictive capacity and explainability. We will conduct a subgroup analysis of model prediction performance (a subgroup evaluation sketch follows this list). Explainability will be compared with well-established and widely used methods such as Grad-CAM and LIME.
- We aim to extend the VAE framework to incorporate multiple data types, such that important patient information (e.g. gender, ethnicity, age) is incorporated in the learned data representation and is therefore considered in model prediction (a sketch of such a multi-task prediction head follows this list).
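As an illustration of the multi-label, multi-task prediction referred to in the aims above, the sketch below concatenates the VAE latent code with encoded patient metadata before classification; the dimensionalities, the number of labels and the metadata encoder are assumptions made for illustration rather than the project's final design.

```python
import torch
import torch.nn as nn

class MultiLabelHead(nn.Module):
    """Multi-label classifier operating on the VAE latent code concatenated
    with encoded patient metadata (e.g. age, gender); one logit is produced
    per pathology so that co-occurring labels can be predicted independently."""

    def __init__(self, latent_dim: int = 32, meta_dim: int = 4, n_labels: int = 14):
        super().__init__()
        self.meta_encoder = nn.Sequential(nn.Linear(meta_dim, 16), nn.ReLU())
        self.classifier = nn.Linear(latent_dim + 16, n_labels)

    def forward(self, z, metadata):
        m = self.meta_encoder(metadata)
        return self.classifier(torch.cat([z, m], dim=-1))  # raw logits

# Element-wise binary cross-entropy treats each label as its own binary task,
# unlike the softmax cross-entropy used for single-label classification.
criterion = nn.BCEWithLogitsLoss()
```

In a multi-task setting, this classification loss would be combined with the VAE reconstruction and KL terms in a single weighted objective, so that the latent space is shaped by both generative and discriminative signals.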
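For the subgroup analysis of predictive performance, one natural recipe is per-label AUROC stratified by patient subgroup; the scikit-learn sketch below is an assumed evaluation outline rather than the study's fixed protocol.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def subgroup_auroc(y_true, y_score, groups):
    """Per-label AUROC within each patient subgroup (e.g. gender or age band).
    y_true and y_score are (n_samples, n_labels) arrays of binary labels and
    predicted scores; groups is a length-n_samples array of subgroup IDs."""
    results = {}
    for g in np.unique(groups):
        mask = groups == g
        results[g] = [
            roc_auc_score(y_true[mask, k], y_score[mask, k])
            if len(np.unique(y_true[mask, k])) > 1 else float("nan")  # skip one-class subgroups
            for k in range(y_true.shape[1])
        ]
    return results
```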
Dr Nishant Ravikumar, University of Leeds
Dr Kieran Zucker, University of Leeds