Interpretable Deep Learning for Early Detection of Lung Disease Using the PLCO Dataset: Integrating Image Features with Large Language Models
In a novel extension, this project will also examine the role of textual context by incorporating structured clinical annotations or large language model (LLM)-derived interpretations. The aim is to compare the image-based convolutional neural network (CNN) outputs with natural language summaries to investigate whether textual and visual cues align, support, or contradict one another. While the primary emphasis is on lung pathology, data from additional pathology sites will be used to test how well the interpretability framework generalizes to other organ systems. This comparative analysis will provide insight into the robustness and transferability of the methods beyond the lung domain, an important consideration for future applications of interpretable AI in healthcare.
This work will contribute to the growing field of explainable AI in healthcare, providing tools and frameworks to ensure that models not only perform well but also offer interpretable, clinically meaningful insights. The broader impact includes increased clinician trust in AI systems, improved diagnostic accuracy, and the potential for earlier intervention in lung-related diseases. This research is conducted as part of a doctoral dissertation project and will contribute to peer-reviewed publications and open-source tools for the medical AI community.
Aim 1: Develop a deep learning model for multi-class classification of lung-associated diseases using PLCO imaging data.
We will train CNN models on chest imaging data to distinguish between the disease classes present in the PLCO dataset. Model training will be accompanied by rigorous evaluation (sensitivity, specificity, AUC) for each disease type. Preprocessing steps such as lung field segmentation and anatomical registration will be employed to minimize artifact-driven learning and focus the models on biologically meaningful regions.
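As a concrete starting point, the following minimal sketch (in PyTorch) illustrates the kind of training and evaluation loop we anticipate for Aim 1. The data loader, the number of disease classes, and the ResNet-50 backbone are illustrative assumptions rather than final design choices; preprocessing (segmentation, registration) is assumed to happen upstream of the loader.

# Minimal sketch of the Aim 1 training and evaluation loop, assuming a user-supplied
# PyTorch DataLoader that yields preprocessed (image, label) pairs. NUM_CLASSES and
# the ResNet-50 backbone are placeholders, not the project's fixed design.
import torch
import torch.nn as nn
from torchvision import models
from sklearn.metrics import roc_auc_score

NUM_CLASSES = 4  # placeholder for the disease classes available in the PLCO imaging data

def build_model(num_classes: int = NUM_CLASSES) -> nn.Module:
    # Fine-tune an ImageNet-pretrained ResNet-50 for multi-class chest-image classification.
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

def train_one_epoch(model, loader, optimizer, device):
    model.train()
    criterion = nn.CrossEntropyLoss()
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

@torch.no_grad()
def evaluate_per_class_auc(model, loader, device):
    # One-vs-rest AUC per disease class; sensitivity and specificity would be derived
    # from the confusion matrix at a chosen operating threshold.
    model.eval()
    probs, targets = [], []
    for images, labels in loader:
        probs.append(torch.softmax(model(images.to(device)), dim=1).cpu())
        targets.append(labels)
    probs = torch.cat(probs).numpy()
    targets = torch.cat(targets).numpy()
    return [roc_auc_score(targets == k, probs[:, k]) for k in range(NUM_CLASSES)]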
Aim 2: Generate and evaluate interpretable outputs to visualize model decision-making.
To improve clinical relevance, we will generate class activation maps (CAMs), saliency maps, and other visualization outputs that highlight the regions of an image that contribute most to model decisions. These maps will be compared across correctly and incorrectly classified images to understand when and why the model fails. We will also assess the consistency of highlighted features across different model architectures and training conditions. In addition, we will apply the same analysis to other pathology sites in the PLCO dataset, assessing whether the interpretability framework consistently identifies biologically relevant regions across organ systems and thereby testing the generalizability of the methods beyond the lungs.
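To make the visualization step concrete, the sketch below implements one common form of class activation map (Grad-CAM) for the ResNet backbone assumed in the Aim 1 sketch. The choice of the final convolutional block as the target layer and the min-max normalization are illustrative assumptions, not fixed design decisions.

# Illustrative Grad-CAM sketch for Aim 2: weight the final conv feature maps by the
# pooled gradient of the target class score, then ReLU and upsample to image size.
# The manual forward mirrors the torchvision ResNet forward pass assumed above.
import torch
import torch.nn.functional as F

class GradCAM:
    def __init__(self, model):
        self.model = model.eval()

    def __call__(self, image, class_idx):
        # image: (1, 3, H, W) preprocessed tensor; class_idx: class to explain.
        m = self.model
        x = m.conv1(image); x = m.bn1(x); x = m.relu(x); x = m.maxpool(x)
        x = m.layer1(x); x = m.layer2(x); x = m.layer3(x)
        feats = m.layer4(x)                          # final conv feature maps (1, C, h, w)
        score = m.fc(m.avgpool(feats).flatten(1))[0, class_idx]
        grads, = torch.autograd.grad(score, feats)   # gradient of the class score w.r.t. features
        weights = grads.mean(dim=(2, 3), keepdim=True)        # channel-wise importance weights
        cam = F.relu((weights * feats).sum(dim=1, keepdim=True))
        cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear", align_corners=False)
        return ((cam - cam.min()) / (cam.max() - cam.min() + 1e-8)).squeeze()

# Usage (hypothetical): heatmap = GradCAM(model)(image_tensor, predicted_class)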
Aim 3: Integrate clinical text or LLM-generated interpretations to contextualize model predictions.
We will explore how structured text data (e.g., radiology reports) and LLM-generated summaries can support, augment, or contradict image-based AI predictions. We hypothesize that combining visual and textual modalities will enhance diagnostic performance and provide a mechanism to cross-validate model decisions. This aim will involve both quantitative comparison and qualitative analysis of how radiologists and models interpret similar features.
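As one illustration of how textual and visual signals might be cross-checked, the sketch below scores a radiology report against the same disease labels with an off-the-shelf zero-shot text classifier and fuses the result with the CNN's class probabilities. The label set, the fusion weight, and the use of a generic zero-shot model (rather than a purpose-built clinical LLM) are assumptions for illustration only.

# Hedged sketch of late fusion between image and text predictions for Aim 3.
import torch
from transformers import pipeline

DISEASE_LABELS = ["lung cancer", "emphysema", "pulmonary fibrosis", "no significant finding"]  # placeholder labels

text_scorer = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def fuse_predictions(image_probs: torch.Tensor, report_text: str, alpha: float = 0.5):
    # image_probs: CNN softmax output ordered as DISEASE_LABELS.
    # alpha: weight on the imaging modality (assumed; would be tuned on validation data).
    result = text_scorer(report_text, candidate_labels=DISEASE_LABELS)
    # Re-order the text scores to match the image model's label order.
    score_by_label = dict(zip(result["labels"], result["scores"]))
    text_probs = torch.tensor([score_by_label[lbl] for lbl in DISEASE_LABELS])
    text_probs = text_probs / text_probs.sum()
    fused = alpha * image_probs + (1 - alpha) * text_probs
    agree = int(image_probs.argmax()) == int(text_probs.argmax())
    return fused, agree  # disagreements flag cases for qualitative radiologist review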