Predicting Prostate Cancer Tissue Type with Machine Learning
The gravity of these statistics underscores the urgency of improving and refining our approach to prostate cancer diagnosis. In pursuit of better detection methods, this research combines BERT (Bidirectional Encoder Representations from Transformers) embeddings with Support Vector Machine (SVM) classification: BERT converts text into numerical vectors, and the SVM classifies those vectors. This combined approach aims to address the complexities of prostate cancer diagnosis and to offer a more comprehensive understanding of the disease.
The American Urological Association advocates proactive discussions between men and their healthcare providers about prostate cancer testing. Men aged 45 to 69 at average risk are advised to start a conversation with their doctor. Greater vigilance is advised for those at increased risk, such as African-American men or individuals with a family history of cancer, who may benefit from having these discussions earlier, between the ages of 40 and 54.
Existing studies on prostate cancer diagnosis focus primarily on numerical data such as PSA (prostate-specific antigen) levels and biopsy features. While these provide valuable information, the textual data in medical records has been largely underutilized. Patient narratives often contain rich descriptions of symptoms, medical history, and lifestyle factors that could enhance diagnostic accuracy.
Motivated by this untapped potential, we aimed to develop a novel approach to prostate cancer diagnosis that combines numerical clinical features with textual information from patient histories. By drawing on both natural language processing (NLP) and machine learning (ML), we sought to build a model that extracts information from textual data that numerical features alone do not capture, and that improves diagnostic accuracy beyond traditional methods.
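The idea of combining text-derived features with numerical clinical features can be illustrated with a minimal sketch. It is not the study's implementation and rests on several assumptions: embed_text is a hypothetical stand-in that uses a hashed bag-of-words vector in place of a real BERT embedding, the classifier is a small hand-written linear SVM (hinge loss, no bias term) rather than a library SVM, and the records, the two numeric columns (PSA and age), and all function names are synthetic and chosen only for illustration.

```python
import math
import zlib

EMBED_DIM = 8  # toy size (assumption); a real BERT-base vector has 768 dimensions


def embed_text(text, dim=EMBED_DIM):
    """Hypothetical placeholder for a BERT embedding: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # crc32 gives a hash that is stable across Python runs
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def fit_scaler(rows):
    """Per-column mean and (population) standard deviation of the numeric features."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def fuse(text, numeric, means, stds):
    """Concatenate the text vector with the standardized numeric features."""
    scaled = [(x - m) / s for x, m, s in zip(numeric, means, stds)]
    return embed_text(text) + scaled


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the L2-regularized hinge loss (no bias term)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wj for wj in w]  # gradient of the regularizer
        for xi, yi in zip(X, y):
            # only samples inside the margin contribute to the hinge subgradient
            if yi * sum(wj * xj for wj, xj in zip(w, xi)) < 1:
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def predict(w, x):
    """Return +1 or -1 according to the sign of the linear score."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


# Synthetic toy records (NOT clinical data): note text, [PSA ng/mL, age], label.
records = [
    ("mild urinary frequency no family history", [2.0, 50.0], -1),
    ("routine checkup no urinary complaints", [2.5, 51.0], -1),
    ("occasional nocturia otherwise well", [3.0, 52.0], -1),
    ("weak stream and father had prostate cancer", [11.0, 68.0], 1),
    ("hematuria with weight loss and bone pain", [11.5, 69.0], 1),
    ("rising psa with difficulty urinating", [12.0, 70.0], 1),
]
notes = [r[0] for r in records]
numeric = [r[1] for r in records]
labels = [r[2] for r in records]

means, stds = fit_scaler(numeric)
X = [fuse(t, v, means, stds) for t, v in zip(notes, numeric)]
w = train_linear_svm(X, labels)
train_predictions = [predict(w, x) for x in X]
print(train_predictions)
```

In a realistic pipeline the placeholder embedding would be replaced by vectors from a pretrained BERT encoder and the hand-written trainer by a library SVM. The fusion step itself stays the same: one vector per patient, formed by concatenating the text embedding with the scaled numeric features. The toy data is deliberately easy to separate, so the training predictions say nothing about real diagnostic performance.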
Asma Sadia Khan