Predictive analytics with machine learning for ovarian cancer

Principal Investigator

Name
Jacob Levman

Degrees
Ph.D., M.A.Sc., B.A.Sc.

Institution
St. Francis Xavier University

Position Title
Associate Professor of Computer Science

Email
jlevman@stfx.ca

About this CDAS Project

Study
PLCO

Project ID
PLCO-1990

Initial CDAS Request Approval
Sep 22, 2025

Title
Predictive analytics with machine learning for ovarian cancer

Summary
In this research project, we propose to apply automated machine learning (AutoML) technology to predictive analytics of ovarian cancer.
AutoML is an emerging research domain concerned with the automated search for high-quality, reliable machine learning models. The AutoML package df-analyze is a tool for performing automated machine learning on small- to medium-sized datasets [1]. The package supports various machine learning algorithms and can perform several forms of feature selection [2] and two forms of validation [1]. It creates summary tables reporting the performance of all combinations of learning machines and feature selection approaches, as well as markdown files with readable reports and statistical assessments [1]. It also supports a novel redundancy-aware, wrapper-based step-up feature selection method, a technique that helps find small feature sets that may exhibit predictive potential [1]. The df-analyze package has previously been used in studies of schizophrenia diagnostics [3], chronic kidney disease [4], ethical artificial intelligence [5], thyroid cancer recurrence prediction [6], diagnosis, treatment prediction, and staging in pediatric appendicitis [7], and proteins potentially linked with learning in the cerebral cortex [8].
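As an illustration only, the general idea behind wrapper-based step-up (forward) feature selection can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the df-analyze implementation: it performs plain forward selection without the redundancy-aware component, it uses a simple nearest-centroid classifier as a stand-in learning machine, and it runs on synthetic data. All function names (`make_data`, `fit_centroids`, `predict`, `cv_accuracy`, `step_up_select`) and parameters (`max_features`, `min_gain`) are hypothetical.

```python
import random
from statistics import mean


def make_data(n=120, seed=0):
    """Synthetic binary data: features 0 and 1 carry signal, 2-4 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([
            rng.gauss(3.0 * label, 1.0),   # strongly informative
            rng.gauss(-1.5 * label, 1.0),  # weakly informative
            rng.gauss(0.0, 1.0),           # noise
            rng.gauss(0.0, 1.0),           # noise
            rng.gauss(0.0, 1.0),           # noise
        ])
        y.append(label)
    return X, y


def fit_centroids(X, y, cols):
    """Per-class mean of the selected columns (stand-in learning machine)."""
    centroids = {}
    for label in set(y):
        members = [row for row, t in zip(X, y) if t == label]
        centroids[label] = [mean(row[c] for row in members) for c in cols]
    return centroids


def predict(centroids, row, cols):
    """Assign the class whose centroid is closest in squared distance."""
    def sq_dist(label):
        return sum((row[c] - m) ** 2 for c, m in zip(cols, centroids[label]))
    return min(sorted(centroids), key=sq_dist)


def cv_accuracy(X, y, cols, k=5):
    """k-fold cross-validated accuracy using only the columns in `cols`."""
    n = len(X)
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_X = [X[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        centroids = fit_centroids(train_X, train_y, cols)
        for i in test_idx:
            correct += int(predict(centroids, X[i], cols) == y[i])
    return correct / n


def step_up_select(X, y, max_features=3, min_gain=0.01):
    """Greedily add the feature that most improves cross-validated accuracy."""
    n_features = len(X[0])
    selected, best_score = [], 0.0
    while len(selected) < max_features:
        candidates = [f for f in range(n_features) if f not in selected]
        scored = [(cv_accuracy(X, y, selected + [f]), f) for f in candidates]
        score, feature = max(scored)
        if score - best_score < min_gain:
            break  # stop when no candidate gives a meaningful improvement
        selected.append(feature)
        best_score = score
    return selected, best_score


X, y = make_data()
selected, score = step_up_select(X, y)
print("selected feature indices:", selected)
print("cross-validated accuracy:", round(score, 3))
```

Because the wrapper scores each candidate feature set by the cross-validated performance of the model itself, the selected subset tends to stay small, which is the property that makes this family of methods useful for identifying compact sets of predictive features.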
In this study, we hypothesize that applying df-analyze AutoML technology to ovarian cancer data may 1) create technologies with diagnostic, staging, and/or prognostic value, and 2) improve our understanding of factors predictive of important aspects of ovarian cancer and its management, including potential insights into factors predictive of severity and issues associated with detection method (screen-detected vs. interval-detected tumors).

References:
1. stfxecutables. (2024). df-analyze documentation. Retrieved from https://github.com/stfxecutables/df-analyze
2. Train in Data. (n.d.). Feature selection with wrapper methods. Retrieved October 21, 2024, from https://www.blog.trainindata.com/feature-selection-with-wrapper-methods/
3. Levman, J.; Jennings, M.; Rouse, E.; Berger, D.; Kabaria, P.; Nangaku, M.; Gondra, I.; Takahashi, E. A Morphological Study of Schizophrenia with Magnetic Resonance Imaging, Advanced Analytics, and Machine Learning. Front. Neurosci. 2022, 16, doi:10.3389/fnins.2022.926426.
4. Figueroa, J.; Etim, P.; Shibu, A.; Berger, D.; Levman, J. Diagnosing and Characterizing Chronic Kidney Disease with Machine Learning: The Value of Clinical Patient Characteristics as Evidenced from an Open Dataset. Electronics 2024, 13, 4326.
5. Saville, K.; Berger, D.; Levman, J. Mitigating Bias Due to Race and Gender in Machine Learning Predictions of Traffic Stop Outcomes. Information. Accepted for Publication Oct. 28th, 2024.
6. Penner, M.; Berger, D.; Guo, X.; Levman, J. Machine Learning in Differentiated Thyroid Cancer Recurrence and Risk Prediction. Applied Sciences 2025, 15(17), 9397.
7. Kendall, J.; Gaspar, G.; Berger, D.; Levman, J. Machine Learning and Feature Selection in Pediatric Appendicitis. Tomography 2025, 11(8), 90. https://doi.org/10.3390/tomography11080090
8. Huang, X.; Gauthier, C.; Berger, D.; Cai, H.; Levman, J. Identifying Cortical Molecular Biomarkers Potentially Associated with Learning in Mice Using Artificial Intelligence. Int. J. Mol. Sci. 2025, 26, 6878. https://doi.org/10.3390/ijms26146878.

Aims

The specific aims are to: 1) develop machine learning models for predicting important aspects of ovarian cancer, such as tumor stage/severity and detection method (screen vs. interval cancers); and
2) uncover patterns in the data by identifying highly predictive subsets of features for each predictive task considered. Thus, we aim not only to develop novel technologies that could assist in improving the standard of patient care, but also to inform the medical research community about feature subsets that are highly predictive of important aspects of ovarian cancer. By investigating not only methods for predicting disease severity but also tumor characteristics that are predictive of screen vs. interval cancers, our analyses may identify factors that improve our understanding of the shortcomings of current screening methods.

Collaborators

Jacob Levman, St. Francis Xavier University
Keely Ralf, St. Francis Xavier University
Xuchen Guo, St. Francis Xavier University