About this Publication
Title
Decoding ancestry-specific genetic risk: interpretable deep feature selection reveals prostate cancer SNP disparities in diverse populations.
Pubmed ID
41024066
Digital Object Identifier
Publication
BioData Min. 2025 Sep 29; Volume 18 (Issue 1): Pages 66
Authors
Chen Z, Lao Z, Lu Y, Zhang W, Edwards A, Zhang K
Affiliations
  • School of Computing, Southern Illinois University, Carbondale, IL, 62901, USA.
  • Mechanical Engineering and Applied Mechanics, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, 19104, USA.
  • Department of Medicine, School of Medicine, Tulane University, New Orleans, LA, 70118, USA.
  • Department of Computer Science, Xavier University of Louisiana, New Orleans, LA, 70125, USA.
  • Department of Computer Science, Xavier University of Louisiana, New Orleans, LA, 70125, USA. kzhang@xula.edu.
Abstract

BACKGROUND: The clinical potential of single nucleotide polymorphisms (SNPs) in prostate cancer (PCa) diagnosis has been extensively explored using conventional statistical and machine learning approaches. However, the predictive power and interpretability of these methods remain inadequate for clinical translation, primarily due to limited generalization across high-dimensional SNP datasets. This study addresses the contested diagnostic utility of SNPs by integrating interpretable feature selection with deep learning to enhance both classification performance and biological relevance.

METHODS: We propose an interpretable deep feature selection framework designed to improve both the classification performance and the biological relevance of SNP markers for distinguishing benign from malignant prostate samples. The framework comprises four key components: (1) heuristic feature reduction, which eliminates irrelevant SNPs during gradient computation when training deep neural networks (DNNs); (2) iterative SNP subset optimization, which aims to maximize classification AUC during model training; (3) gradient variance minimization, which mitigates instability caused by limited sample sizes; and (4) nonlinear interaction modeling, which extracts high-level SNP interactions through hierarchical representations.
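
As a rough illustration of component (2), the sketch below searches for a SNP subset that maximizes AUC on a toy genotype matrix. It is not the authors' framework: a greedy forward search and an unweighted allele-count score stand in for the DNN-based procedure, and the names `auc`, `risk_score`, `greedy_select`, and the toy data are hypothetical.

```python
def auc(scores, labels):
    """Rank-based AUC: chance that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def risk_score(genotypes, subset):
    """Hypothetical stand-in for the DNN: sum of allele counts over the subset."""
    return [sum(row[j] for j in subset) for row in genotypes]


def greedy_select(genotypes, labels, max_snps):
    """Forward selection: keep adding the SNP that most improves AUC."""
    selected, best = [], 0.5  # 0.5 is the AUC of a random classifier
    remaining = set(range(len(genotypes[0])))
    while remaining and len(selected) < max_snps:
        gains = {j: auc(risk_score(genotypes, selected + [j]), labels)
                 for j in remaining}
        j_best = max(sorted(gains), key=gains.get)  # lowest index wins ties
        if gains[j_best] <= best:
            break  # no candidate improves AUC, so stop early
        selected.append(j_best)
        best = gains[j_best]
        remaining.remove(j_best)
    return selected, best


# Toy data (invented for illustration): 6 samples x 4 SNPs, allele counts 0/1/2.
labels = [1, 1, 1, 0, 0, 0]  # 1 = case, 0 = control
genotypes = [
    [2, 0, 1, 1],
    [1, 1, 0, 2],
    [2, 2, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 2],
    [0, 2, 1, 0],
]

selected, best = greedy_select(genotypes, labels, max_snps=2)
print("selected SNP columns:", selected, "AUC:", best)
```

In practice the AUC would be estimated on held-out samples rather than the data used for selection; the sketch skips that step to stay short.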

RESULTS: Evaluated on the PLCO, BPC3, and MEC-AA datasets, our method achieved mean AUC scores of 0.747, 0.751, and 0.559, respectively, demonstrating statistically significant improvements (p < 0.05, paired t-test) over existing approaches. Notably, the lower AUC for MEC-AA may reflect inherent population-specific complexities, as this dataset focuses on African American men, a group historically underrepresented in genomic studies. For interpretability, our framework identified 345, 373, and 437 consensus SNP markers across the PLCO, BPC3, and MEC-AA cohorts, respectively. Key SNPs were further validated against prior research on PCa racial disparities: rs10086908 and rs2273669 (PLCO); rs12284087, rs902774, rs9364554, and rs7611694 (BPC3); and rs3123078 and rs1447295 (MEC-AA) exhibited strong concordance with established loci linked to ethnic-specific risk profiles. For instance, rs1447295 on chromosome 8q24, recurrently associated with African ancestry, underscores the method's ability to recover population-relevant variants.
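
For readers unfamiliar with the paired t-test mentioned above, the following sketch shows how such a comparison could be computed from per-fold AUCs of two methods. The fold-level values are invented placeholders, not results from the paper, and the function name `paired_t` and the five-fold setup are assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def paired_t(a, b):
    """Return (t statistic, degrees of freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    se = stdev(diffs) / math.sqrt(len(diffs))  # standard error of the mean difference
    return mean(diffs) / se, len(diffs) - 1


# Invented placeholder AUCs for five cross-validation folds (NOT from the paper).
proposed = [0.75, 0.74, 0.76, 0.73, 0.75]
baseline = [0.72, 0.70, 0.73, 0.71, 0.72]

t_stat, dof = paired_t(proposed, baseline)

# Two-sided 5% critical value of Student's t with 4 degrees of freedom.
T_CRIT_DF4 = 2.776
significant = abs(t_stat) > T_CRIT_DF4
print(f"t = {t_stat:.2f}, df = {dof}, significant at 0.05: {significant}")
```

The test pairs the two methods fold by fold, so it measures whether the per-fold differences are consistently positive rather than comparing the two overall means directly.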

CONCLUSION: By synergizing interpretable feature selection with deep learning, this work advances the translation of SNP-based biomarkers into clinically actionable tools while clarifying their contested diagnostic role in PCa.
