Stratifying Cancer Risks for Individuals Based on Deep Learning of PLCO Data
Principal Investigator
Name
Jun Deng
Degrees
Ph.D.
Institution
Yale University
Position Title
Professor
Email
jun.deng@yale.edu
About this CDAS Project
Study
PLCO
Project ID
PLCO-392
Initial CDAS Request Approval
Aug 10, 2018
Title
Stratifying Cancer Risks for Individuals Based on Deep Learning of PLCO Data
Summary
Cancer is a worldwide public health issue, with an estimated 21.7 million new cases and 13 million cancer deaths expected globally by 2030. Although a tremendous amount of money and resources has been spent on cancer care over the past 50 years, there are an estimated 1,735,350 new cancer cases and 609,640 cancer deaths in the United States alone in 2018. Besides disparities in resources, environment, diet, and lifestyle, one of the major reasons for high cancer mortality rates is the failure to diagnose cancers at early stages, which misses perhaps the best window of opportunity for intervention and cure. Recently, artificial intelligence has shown great promise in leveraging big health data to diagnose and stage diseases, reduce cost, and improve healthcare and patient outcomes. While access to big data stored in silo-like electronic medical record systems has been very restrictive, the large amount of health data collected in some nationwide multicenter studies, such as the National Health Interview Survey (NHIS) and the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO), is readily available. We have recently demonstrated that deep neural networks can be used for cancer prediction with acceptable sensitivity and specificity based only on personal health data from NHIS, without family history, cancer staging, or follow-up information. The advantages of this approach lie in its predictive power, cost-effectiveness, non-invasiveness, real-time turnaround, and reliance on health data that are easy to obtain at the individual level.
Building on our preliminary study, we hypothesize that the large amount of data collected in the PLCO trial can be used to train and validate a deep learning algorithm to stratify cancer risks for individuals. These data include not only personal health data but also staging information for 18 types of cancer (Prostate, Lung, Colorectal, Ovarian, Biliary, Bladder, Breast, Endometrial, Glioma, Head & Neck, Hematopoietic, Liver, Male Breast, Melanoma, Pancreas, Renal, Thyroid, and Upper GI), long-term follow-up, family history, socio-behavioral, lifestyle, and dietary data, serum vitamin D, and cancer-specific screening results such as colonoscopy, PSA, and ovarian pathology. The goal of this project is therefore to develop a risk predictor and classifier based on deep learning of PLCO data for better risk stratification and more precise intervention, improving outcomes for high-risk people while minimizing overtreatment of low-risk disease. If implemented successfully, we envision a risk predictor and classifier tool embedded in the clinical workflow for effective risk stratification and prevention for millions of people worldwide, reducing cancer mortality in the long run. Given the daunting cost of healthcare and the nontrivial rate of cancer mortality, it is critical to stratify an individual’s cancer risks before disease onset to maximize outcomes for people with aggressive disease and minimize unnecessary treatment of indolent disease.
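As a rough illustration of the two ideas in this summary (judging a predictor by its sensitivity and specificity, and stratifying individuals by predicted risk), the following is a minimal Python sketch. It is not the project's model: the scores and labels are synthetic, and the cut-offs `LOW_CUTOFF` and `HIGH_CUTOFF` and the function names are hypothetical choices made only for this example.

```python
from typing import List, Tuple

# Hypothetical cut-offs on a predicted risk score in [0, 1].
# These values are illustrative assumptions, not taken from the project.
LOW_CUTOFF = 0.2
HIGH_CUTOFF = 0.6


def stratify(score: float) -> str:
    """Map a predicted risk score to a coarse risk tier."""
    if score < LOW_CUTOFF:
        return "low"
    if score < HIGH_CUTOFF:
        return "medium"
    return "high"


def sensitivity_specificity(
    scores: List[float], labels: List[int], threshold: float
) -> Tuple[float, float]:
    """Compute sensitivity and specificity when scores >= threshold
    are treated as positive predictions (label 1 = cancer, 0 = no cancer)."""
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if label == 1 and predicted_positive:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Synthetic example data, standing in for the output of a trained model.
scores = [0.05, 0.15, 0.30, 0.55, 0.70, 0.90]
labels = [0, 0, 1, 0, 1, 1]

tiers = [stratify(s) for s in scores]
sens, spec = sensitivity_specificity(scores, labels, threshold=0.5)
print(tiers)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In practice the cut-offs would have to be chosen from validation data and clinical considerations; the fixed values here only show the mechanics of turning a continuous score into risk tiers.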
Aims
1. Develop a deep learning model based on PLCO data for dynamic tracking of individual cancer risks.
2. Identify core factors highly correlated with individual cancer risks for effective cancer prevention.
3. Embed the developed model in an electronic medical record system and assess its efficacy.
Collaborators
Roy Decker, M.D., Yale University, New Haven, CT
James Duncan, Ph.D., Yale University, New Haven, CT
James Farrell, M.D., Yale University, New Haven, CT
Cary Gross, M.D., Yale University, New Haven, CT
Gloria Huang, M.D., Yale University, New Haven, CT
Melinda Irwin, Ph.D., Yale University, New Haven, CT
Kimberly Johung, M.D., Yale University, New Haven, CT
Michael Leapman, M.D., Yale University, New Haven, CT
Xavier Llor, M.D., Ph.D., Yale University, New Haven, CT
Shuangge Ma, Ph.D., Yale University, New Haven, CT
Xiaomei Ma, Ph.D., Yale University, New Haven, CT
Sahand Negahban, Ph.D., Yale University, New Haven, CT
Bonnie Rothberg, M.D., Ph.D., Yale University, New Haven, CT
Richard Yang, Ph.D., Yale University, New Haven, CT
James Yu, M.D., Yale University, New Haven, CT
Yawei Zhang, Ph.D., Yale University, New Haven, CT
Harrison Zhou, Ph.D., Yale University, New Haven, CT
Related Publications
- Predicting time-to-first cancer diagnosis across multiple cancer types.
  Lau K, Hart GR, Deng J
  Sci Rep. 2025 Jul 9; Volume 15 (Issue 1): Pages 24790
- Liver cancer risk quantification through an artificial neural network based on personal health data.
  Ataei A, Deng J, Muhammad W
  Acta Oncol. 2023 May 21; Pages 1-8
- Population-Based Screening for Endometrial Cancer: Human vs. Machine Intelligence.
  Hart GR, Yan V, Huang GS, Liang Y, Nartowt BJ, Muhammad W, Deng J
  Front Artif Intell. 2020; Volume 3: Pages 539879
- Robust Machine Learning for Colorectal Cancer Risk Prediction and Stratification.
  Nartowt BJ, Hart GR, Muhammad W, Liang Y, Stark GF, Deng J
  Front Big Data. 2020; Volume 3: Pages 6
- Pancreatic Cancer Prediction Through an Artificial Neural Network.
  Muhammad W, Hart GR, Nartowt B, Farrell JJ, Johung K, Liang Y, Deng J
  Front Artif Intell. 2019; Volume 2: Pages 2
- Predicting breast cancer risk using personal health data and machine learning models.
  Stark GF, Hart GR, Nartowt BJ, Deng J
  PLoS One. 2019; Volume 14 (Issue 12): Pages e0226765
- Scoring colorectal cancer risk with an artificial neural network based on self-reportable personal health data.
  Nartowt BJ, Hart GR, Roffman DA, Llor X, Ali I, Muhammad W, Liang Y, Deng J
  PLoS One. 2019; Volume 14 (Issue 8): Pages e0221421
- Stratifying Ovarian Cancer Risk Using Personal Health Data.
  Hart GR, Nartowt BJ, Muhammad W, Liang Y, Huang GS, Deng J
  Front Big Data. 2019; Volume 2: Pages 24
- A multi-parameterized artificial neural network for lung cancer risk prediction.
  Hart GR, Roffman DA, Decker R, Deng J
  PLoS One. 2018; Volume 13 (Issue 10): Pages e0205264
- Predicting non-melanoma skin cancer via a multi-parameterized artificial neural network.
  Roffman D, Hart G, Girardi M, Ko CJ, Deng J
  Sci Rep. 2018 Jan 26; Volume 8 (Issue 1): Pages 1701