Utilization of machine learning to create a predictive model for ovarian cancer based on pelvic ultrasound, CA-125, and clinical characteristics

Principal Investigator

Name
Graham Chapman

Degrees
M.D.

Institution
Case Western Reserve University

Position Title
Fellow

Email
graham.chapman@uhhospitals.org

About this CDAS Project

Study
PLCO

Project ID
PLCO-1074

Initial CDAS Request Approval
Oct 18, 2022

Title
Utilization of machine learning to create a predictive model for ovarian cancer based on pelvic ultrasound, CA-125, and clinical characteristics

Summary
Ovarian cancer is the second most common gynecologic malignancy and the most common cause of gynecologic cancer mortality in the United States. Attempts to screen for ovarian cancer have proven ineffective while increasing the risk of unnecessary surgery and related morbidity. Diagnosis is therefore generally centered on imaging in patients with suspicious symptoms or physical exam findings consistent with a pelvic mass. The primary imaging modality for ovarian masses is ultrasound, which is used in conjunction with serum tumor markers and clinical characteristics. Despite this multimodal approach, diagnostic uncertainty remains high, which results in a high false-positive rate. Because the gold standard for diagnosing an ovarian mass is surgical management, the ability to discriminate between benign and cancerous ovarian masses is highly important.

Machine learning techniques are increasingly being used to assimilate complicated clinical information and generate predictive models that can improve the accuracy of clinical diagnosis. The Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial collected data that are highly pertinent to the diagnosis of ovarian cancer, including detailed and standardized pelvic ultrasound measurements, serum CA-125 levels, and patient information such as family history, contraception use, and medical comorbidities. We aim to use machine learning techniques to generate a predictive model into which clinicians can enter readily available clinical information to obtain an estimated risk that an ovarian mass seen on ultrasound represents ovarian cancer.

Data will be divided into a training cohort and a testing cohort. Multiple machine learning models will be trained on the training cohort, and internal validation will then be conducted with data from the testing cohort. If a successful prediction model is generated, future studies may focus on external validation of the model using hospital data, followed by prospective study of the model. Ultimately, a successful prediction model should reduce false-positive results and unnecessary surgical morbidity while maintaining high sensitivity for the diagnosis of ovarian cancer.
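
As an illustration of the cohort division described above, the following is a minimal sketch of a stratified split. The project text does not say how the split will be made; stratifying by outcome is an assumption here, chosen because cancers are a small minority of cases and a purely random split could leave very few of them in the testing cohort. The function name, the 30% test fraction, and the seed are all hypothetical.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[Hashable],
    test_fraction: float = 0.3,  # assumed value, not taken from the project text
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices) with a similar class ratio in each cohort."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible

    # Group row indices by outcome label (e.g. 1 = cancer, 0 = benign).
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx: List[int] = []
    test_idx: List[int] = []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Send the same fraction of every class to the testing cohort,
        # keeping at least one case per class there.
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy example: 10 cancers among 100 masses (made-up numbers).
toy_labels = [1] * 10 + [0] * 90
train_rows, test_rows = stratified_split(toy_labels)
```

The testing indices would be set aside until model development on the training cohort is finished, so that the internal validation estimate is not influenced by model selection.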

Aims

1. Utilize the PLCO dataset (training cohort) including pelvic ultrasound measurements, CA-125 levels, family history, medical history, and health-questionnaire results to determine which factors are associated with ovarian cancer in a patient with an ovarian mass.
-Multiple machine learning algorithms will be generated in order to help identify the most accurate model
-To improve the simplicity and clinical utility of this model, we will attempt to sequentially remove predictive characteristics in order to minimize the number of clinical parameters that must be entered into the model without compromising accuracy

2. Internal validation of the model
-Utilize the PLCO dataset (testing cohort) to identify the most accurate machine learning algorithm
-Perform internal validation to provide AUC and other measures of accuracy

3. External validation
-Externally validate the model utilizing retrospective hospital-based data in conjunction with the University Hospitals Cleveland Medical Center Institutional Review Board
-Further external validation utilizing prospective and potentially randomized methodology
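
The sequential removal of predictors in Aim 1 and the AUC reporting in Aim 2 can be sketched as follows. This is an illustration under stated assumptions, not the investigators' method: the AUC is computed with the rank-based (Mann-Whitney) formulation, the removal procedure is a simple greedy backward elimination, and the feature names, the toy data, the additive toy score, and the 0.01 tolerance are all hypothetical.

```python
from typing import Callable, List, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney statistic.

    This is the fraction of (cancer, benign) pairs in which the cancer case
    receives the higher score; ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def backward_eliminate(
    features: Sequence[str],
    score_fn: Callable[[List[str]], float],
    tolerance: float = 0.01,  # assumed acceptable loss relative to the full model
) -> List[str]:
    """Greedily drop features while the score stays within `tolerance` of the full model."""
    selected = list(features)
    baseline = score_fn(selected)
    while len(selected) > 1:
        # Score every subset obtained by leaving out exactly one feature.
        trials = [
            (score_fn([f for f in selected if f != drop]), drop)
            for drop in selected
        ]
        best_score, drop = max(trials, key=lambda t: t[0])
        if best_score < baseline - tolerance:
            break  # any further removal would compromise accuracy
        selected.remove(drop)
    return selected


# Toy data with hypothetical feature names and made-up integer values.
toy_rows = [
    {"ca125": 9, "volume": 8, "noise": 1},
    {"ca125": 8, "volume": 3, "noise": 9},
    {"ca125": 2, "volume": 4, "noise": 5},
    {"ca125": 1, "volume": 2, "noise": 7},
]
toy_labels = [1, 1, 0, 0]


def toy_score(selected: List[str]) -> float:
    # Stand-in for "fit a model on these features and measure its AUC":
    # here the "model" is just the sum of the selected feature values.
    return auc(toy_labels, [sum(row[f] for f in selected) for row in toy_rows])


kept = backward_eliminate(["ca125", "volume", "noise"], toy_score)
```

In practice `score_fn` would refit the chosen model on each feature subset and would be evaluated by cross-validation within the training cohort, so that the testing cohort remains untouched until the final AUC is reported.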

Collaborators

David Sheyn MD, Soumya Ray PhD