
About this Publication
Title
Vision-language model-based semantic-guided imaging biomarker for lung nodule malignancy prediction.
PubMed ID
41161557
Publication
J Biomed Inform. 2025 Oct 27; Pages 104947
Authors
Zhuang L, Tabatabaei SMH, Salehi-Rad R, Tran LM, Aberle DR, Prosper AE, Hsu W
Affiliations
  • Medical & Imaging Informatics, Department of Radiological Sciences, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095, USA.
  • Department of Medicine, Division of Pulmonology and Critical Care, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095, USA.
  • Medical & Imaging Informatics, Department of Radiological Sciences, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095, USA. Electronic address: whsu@mednet.ucla.edu.
Abstract

OBJECTIVE: Machine learning models have utilized semantic features, deep features, or both to assess lung nodule malignancy. However, their reliance on manual annotation during inference, limited interpretability, and sensitivity to imaging variations hinder their application in real-world clinical settings. Thus, this research aims to integrate semantic features derived from radiologists' assessments of nodules, guiding the model to learn clinically relevant, robust, and explainable imaging features for predicting lung cancer.

METHODS: We obtained 938 low-dose CT scans from the National Lung Screening Trial (NLST), comprising 1,261 nodules with semantic features. Additionally, the Lung Image Database Consortium (LIDC) dataset contributed 1,018 CT scans with 2,625 lesions annotated for nodule characteristics. Three external datasets were obtained from UCLA Health, the LUNGx Challenge, and the Duke Lung Cancer Screening dataset. For imaging input, we extracted 2D nodule slices in nine directions from a 50×50×50 mm nodule crop. We converted the structured semantic features into sentences using Gemini. We then fine-tuned a pretrained Contrastive Language-Image Pretraining (CLIP) model with a parameter-efficient fine-tuning approach to align imaging and semantic text features and to predict one-year lung cancer diagnosis.
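The abstract does not spell out which nine directions are used; a common multi-view convention for cubic crops takes the three orthogonal central planes plus the six diagonal planes of the cube. The NumPy sketch below illustrates that convention under this assumption; nine_views is a hypothetical helper, not the authors' code.

```python
import numpy as np

def nine_views(vol):
    """Return nine 2D central planes from a cubic crop of shape (n, n, n):
    the three orthogonal planes plus the six diagonal planes of the cube."""
    n = vol.shape[0]
    c = n // 2
    idx = np.arange(n)
    rev = idx[::-1]
    return [
        vol[c, :, :], vol[:, c, :], vol[:, :, c],  # axial, coronal, sagittal
        vol[:, idx, idx], vol[:, idx, rev],        # diagonals spanning axes 1-2
        vol[idx, :, idx], vol[idx, :, rev],        # diagonals spanning axes 0-2
        vol[idx, idx, :], vol[idx, rev, :],        # diagonals spanning axes 0-1
    ]

# Example: a 50-voxel cubic crop (the paper crops a 50x50x50 mm region).
crop = np.random.rand(50, 50, 50)
slices = nine_views(crop)
assert len(slices) == 9 and all(s.shape == (50, 50) for s in slices)
```

The alignment step could look like the following minimal sketch, assuming Hugging Face transformers and peft as the toolchain: a generic CLIP checkpoint (openai/clip-vit-base-patch16, a stand-in, not necessarily the paper's backbone) is wrapped with LoRA adapters so only low-rank updates on the attention projections are trained, and image-text pairs are scored with CLIP's contrastive loss. The sentences are illustrative Gemini-style descriptions, not taken from the paper.

```python
import numpy as np
from PIL import Image
from peft import LoraConfig, get_peft_model
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch16")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")

# Parameter-efficient fine-tuning: freeze the backbone, train only low-rank
# (LoRA) updates on the attention query/value projections.
model = get_peft_model(
    model, LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"]))

# Illustrative Gemini-style sentences paired with dummy nodule views.
texts = [
    "A solid nodule with spiculated margins and pleural attachment.",
    "A part-solid nodule with smooth margins, away from the pleura.",
]
views = [Image.fromarray(
             np.random.randint(0, 256, (224, 224), dtype=np.uint8)).convert("RGB")
         for _ in texts]

inputs = processor(text=texts, images=views, return_tensors="pt", padding=True)
outputs = model(**inputs, return_loss=True)  # CLIP's symmetric contrastive loss
outputs.loss.backward()                      # gradients reach only LoRA params
```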

RESULTS: Our model outperformed state-of-the-art (SOTA) models on the NLST test set, with an AUROC of 0.901 and an AUPRC of 0.776, and remained robust on the external datasets. Using CLIP, we also obtained zero-shot predictions of semantic features such as nodule margin (AUROC: 0.807), nodule consistency (0.812), and pleural attachment (0.840).
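Zero-shot prediction of a semantic feature amounts to scoring a nodule view against opposing text descriptions and taking a softmax over the image-text similarities. The sketch below assumes the same Hugging Face CLIP setup as above; the prompt wording and checkpoint are illustrative (in the paper's setting the fine-tuned model would be used), and zero_shot_feature is a hypothetical helper.

```python
import numpy as np
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Stand-in checkpoint; the paper would use its fine-tuned CLIP weights here.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch16")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")

@torch.no_grad()
def zero_shot_feature(image, prompt_pair):
    """Score one nodule view against two opposing semantic descriptions."""
    inputs = processor(text=list(prompt_pair), images=image,
                       return_tensors="pt", padding=True)
    logits = model(**inputs).logits_per_image  # cosine similarity * temperature
    return logits.softmax(dim=-1).squeeze(0)   # (P(first), P(second))

view = Image.fromarray(
    np.random.randint(0, 256, (224, 224), dtype=np.uint8)).convert("RGB")
p_smooth, p_spiculated = zero_shot_feature(
    view, ("a lung nodule with smooth, well-defined margins",
           "a lung nodule with spiculated margins"))
```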

CONCLUSION: By incorporating semantic features into the vision-language model, our approach surpasses SOTA models in predicting lung cancer from CT scans collected across diverse clinical settings. It provides explainable outputs, helping clinicians understand the basis of model predictions. The code is available at https://github.com/luotingzhuang/CLIP_nodule.
