Identifying risk factors for colorectal neoplasia through machine learning
Principal Investigator
Name
Daniella Araujo
Degrees
Ph.D. student
Institution
Universidade Federal de Minas Gerais
Position Title
Research Associate, Artificial Intelligence Lab
About this CDAS Project
Study
PLCO
Project ID
PLCO-870
Initial CDAS Request Approval
Nov 30, 2021
Title
Identifying risk factors for colorectal neoplasia through machine learning
Summary
Research has demonstrated that screening for colorectal cancer (CRC) improves survival and reduces mortality. The American Cancer Society clinical guidelines, based on scientific evidence, recommend CRC screening for all individuals over the age of 50, even if no additional risk factors are present. In Brazil, CRC was the fifth most diagnosed cancer 20 years ago and is now the second most diagnosed in both women and men, and about half of patients die within five years of treatment. This progressive increase in the number of cases reflects the country's industrialization, which is bringing it closer to the profile of more developed countries, where colorectal cancer is the second or third most important cause of malignant neoplasms. There is a growing need for a screening program for early detection. We propose to create a low-cost, minimally invasive pre-screening process that uses only demographic and plasma biomarker data.
Using a machine learning methodology developed at the Artificial Intelligence Lab at UFMG, we have already obtained relevant results with plasma data in identifying prodromal Alzheimer's disease, in COVID-19 prognosis and diagnosis (comparable to RT-PCR), and in identifying polycystic ovary syndrome (PCOS); we are currently working with breast cancer data. This methodology has four steps: data engineering, large-scale exploration, feature selection, and interpretability.
Aims
- To predict the risk of CRC using only basic, low-cost data (such as demographic data and/or plasma analytes). This model could then be used as a pre-screening process for CRC;
- To understand the patterns underlying the model's decisions, which could reveal general patterns of CRC pathogenesis.
Collaborators
Adriano Alonso Veloso
Gianlucca Zuin