Principal Investigator
Name
Jason Wright
Degrees
M.D.
Institution
Columbia University
Position Title
Sol Goldman Associate Professor; Chief, Division of Gynecologic Oncology
Email
About this CDAS Project
Study
PLCO
Project ID
PLCO-515
Initial CDAS Request Approval
Aug 23, 2019
Title
Towards Precision Prevention: Applying advanced data analytic and machine learning techniques to improve ovarian cancer screening.
Summary
Ovarian cancer is the leading cause of death among gynecologic malignancies. Early detection of localized disease increases 5-year survival more than 3-fold compared with late-stage diagnosis: 5-year survival rates range from 92.5% for localized cancer to 28.9% for cancer with distant spread. Early-stage disease is often asymptomatic, and there are no effective screening strategies for asymptomatic women who are not known to be at high risk for ovarian cancer.

More sensitive and specific screening strategies are needed. Research is ongoing into novel serologic tests and better diagnostic imaging, which may improve disease detection in the future. In the meantime, analyzing available data with advanced data analytics and machine learning techniques may lead to better screening strategies that are testable in the near term.
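
For readers less familiar with the two terms, the following is a minimal sketch of how sensitivity and specificity are computed for a binary screening result. The function name and the toy labels are illustrative assumptions, not PLCO data or part of the project's code.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Convention assumed here: 1 = cancer present / screen positive,
    0 = cancer absent / screen negative.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # Sensitivity: fraction of true cases that the screen flags.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: fraction of non-cases that the screen clears.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Invented toy labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Because ovarian cancer is rare in the general population, even a small loss of specificity can produce many false positives, which is one reason both measures matter for a screening strategy.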

We propose to analyze the PLCO study data using a wider range of advanced data analytics and machine learning techniques than any previous study in ovarian cancer detection. We will study relevant variables in the PLCO ovarian cancer data sets, which include comprehensive screening, screening abnormalities, diagnostic procedures, medical complications, and treatments. We will use regression and classification techniques including linear models: both ordinary linear regression and regression with regularization techniques such as Ridge, Lasso, and Elastic Net. We will also test a wide variety of other techniques such as random forests, gradient boosting, neural networks, decision trees, and support vector machines.
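
To make the relationship between ordinary, Ridge, Lasso, and Elastic Net regression concrete, here is a small sketch of Elastic Net fitted by coordinate descent in plain Python. It is an illustration under stated assumptions, not the project's implementation: the function names, the `alpha` / `l1_ratio` parametrization (borrowed from a common convention), the absence of an intercept, and the toy numbers are all assumptions.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 if it falls inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, alpha, l1_ratio, n_iter=200):
    """Minimize (1/2n)||y - Xw||^2 + alpha*l1_ratio*||w||_1
    + 0.5*alpha*(1 - l1_ratio)*||w||^2 by cyclic coordinate descent.

    No intercept is fitted, so the data are assumed to be centered.
    alpha=0 gives ordinary least squares, l1_ratio=0 gives Ridge,
    l1_ratio=1 gives Lasso. Assumes no all-zero feature column when the
    L2 part of the penalty is zero (otherwise the update divides by zero).
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        for j in range(d):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
            rho /= n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1.0 - l1_ratio))
    return w


# Invented toy data: y depends only on the first feature (y = 2 * x1).
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2, 2, -2, -2]

w_ols = elastic_net(X, y, alpha=0.0, l1_ratio=0.0)
w_ridge = elastic_net(X, y, alpha=0.5, l1_ratio=0.0)
w_lasso = elastic_net(X, y, alpha=0.5, l1_ratio=1.0)
w_enet = elastic_net(X, y, alpha=0.5, l1_ratio=0.5)
print(w_ols, w_ridge, w_lasso, w_enet)
```

In this toy case every penalized fit shrinks the first coefficient below its unpenalized value and leaves the irrelevant second coefficient at zero; with correlated real-world features the three penalties behave differently, which is why comparing them is worthwhile.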
Aims

(1) Improve our understanding of the risk factors associated with ovarian cancer, and identify better screening procedures and subsets of the general population who might derive greater benefit from screening. Specific questions include:

• Can we identify a subset of patients who may benefit from evaluation by transvaginal ultrasound (TVUS) and CA125 but were not picked up in the highest-quality studies to date (PLCO and UKCTOCS)?

• Can we identify an algorithm that improves on current methods for calculating the risk of undetected ovarian cancer?

(2) Generate the best ensemble risk models by combining a variety of constituent models built with a variety of techniques.

(3) Apply advanced data preprocessing techniques to put the data into a format that the techniques mentioned earlier can analyze properly. Missing data is a common and major problem. We have developed sophisticated data imputation techniques for handling missing data that go beyond what existing software packages offer. These preprocessing and imputation techniques will also be key differentiators that enhance the value of this research.
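
As a rough illustration of aims (2) and (3), the sketch below fills missing values with a column median and then averages the risk scores of two constituent models. Everything here is an assumption made for illustration: the project's own imputation techniques are not described on this page and are stated to go beyond such baselines, the two "models" are hypothetical hand-written stand-ins rather than fitted models, and the numbers are invented rather than taken from PLCO.

```python
from statistics import median


def impute_median(rows):
    """Replace None entries with the median of the observed values in that column.

    A deliberately simple baseline; raises statistics.StatisticsError if a
    column has no observed values at all.
    """
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        medians.append(median(observed))
    return [
        [medians[j] if row[j] is None else row[j] for j in range(n_cols)]
        for row in rows
    ]


def ensemble_risk(models, features):
    """Unweighted average of the risk scores returned by the constituent models."""
    scores = [model(features) for model in models]
    return sum(scores) / len(scores)


def marker_model(features):
    """Hypothetical stand-in: risk score that rises with the second column."""
    return min(1.0, features[1] / 100.0)


def age_model(features):
    """Hypothetical stand-in: step-function risk score based on the first column."""
    return 0.5 if features[0] >= 60 else 0.1


# Hypothetical columns: [age in years, serum marker level]; None marks a missing value.
rows = [[55, 20.0], [62, None], [None, 35.0], [70, 50.0]]
completed = impute_median(rows)
risks = [ensemble_risk([marker_model, age_model], row) for row in completed]
print(completed)
print(risks)
```

Simple averaging is only one way to combine constituent models; weighted averages or a second-stage model trained on the constituent outputs are common alternatives.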

Collaborators

Arun Iyengar, Ph.D., Thomas J. Watson Research Center, Yorktown Heights, NY USA

Robert C. Knapp, MD, William H. Baker Professor of Gynecology, Emeritus, Harvard Medical School