Principal Investigator
Md. Salman Karim Khan
Chittagong Medical College
Position Title
About this CDAS Project
PLCO
Project ID
Initial CDAS Request Approval
Jan 17, 2024
Predicting Prostate Cancer Tissue Type with Machine Learning
Prostate cancer, the second most common cancer among men, poses a substantial health challenge. The American Cancer Society projected an estimated 288,300 new cases of prostate cancer and approximately 34,700 deaths in the United States for 2023.
These statistics underscore the urgency of refining our approach to prostate cancer diagnosis. To improve detection, this research combines BERT (Bidirectional Encoder Representations from Transformers) embeddings with Support Vector Machine (SVM) classification. This integrative approach aims to address the complexities of prostate cancer diagnosis and to offer a more comprehensive understanding of the disease.
The American Urological Association advocates proactive discussions between men and their healthcare providers about prostate cancer testing. Men aged 45 to 69 at average risk are advised to initiate a conversation with their doctor. Heightened vigilance is advised for those at increased risk, such as African-American men or individuals with a family history of cancer, who may benefit from starting these discussions earlier, between 40 and 54 years of age.

Existing studies on prostate cancer diagnosis focus primarily on numerical data such as prostate-specific antigen (PSA) levels and biopsy features. While these provide valuable information, textual data in medical records has been largely underutilized. Patient narratives often contain rich descriptions of symptoms, medical history, and lifestyle factors that could enhance diagnostic accuracy.
Motivated by this untapped potential, we aimed to develop a novel approach to prostate cancer diagnosis that combines numerical clinical features with textual information from patient histories. By leveraging the strengths of both natural language processing (NLP) and machine learning (ML), we sought to build a model that could extract hidden insights from textual data and improve diagnostic accuracy beyond traditional methods.
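
The idea of combining numerical clinical features with textual information can be illustrated with a minimal Python sketch of the feature-fusion step. This is not the project's actual pipeline, only one plausible reading of it: the function names, the embedding size, and the sample records are all hypothetical, the values are made up for illustration, and a deterministic hashed bag-of-words vector stands in for a real BERT embedding so that the example runs with the standard library alone.

```python
import hashlib
import math

EMBED_DIM = 8  # hypothetical size; real BERT embeddings are far larger


def toy_text_embedding(text, dim=EMBED_DIM):
    """Stand-in for a BERT embedding: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0  # bucket each token deterministically
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def standardize(rows):
    """Z-score each numeric column so it is on a scale comparable to the embedding."""
    n, n_cols = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(n_cols)]
    stds = []
    for j in range(n_cols):
        var = sum((row[j] - means[j]) ** 2 for row in rows) / n
        stds.append(math.sqrt(var) or 1.0)  # avoid dividing by zero
    return [[(row[j] - means[j]) / stds[j] for j in range(n_cols)] for row in rows]


def fuse(numeric_rows, texts):
    """Concatenate scaled numeric features with the text embedding, per patient."""
    scaled = standardize(numeric_rows)
    return [num + toy_text_embedding(txt) for num, txt in zip(scaled, texts)]


# Made-up records for illustration only: [PSA level, age] plus a free-text note.
numeric_rows = [[4.1, 62.0], [9.8, 70.0], [2.3, 55.0]]
texts = [
    "frequent urination at night",
    "family history of prostate cancer",
    "no urinary symptoms reported",
]

features = fuse(numeric_rows, texts)
print(len(features), len(features[0]))
```

Each fused vector could then be passed to a classifier such as an SVM. In a real setting the toy embedding would be replaced by embeddings from a pretrained BERT model, and the scaling statistics would be computed on the training split only.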


Asma Sadia Khan