Principal Investigator
Name
Ivan Shtepliuk
Degrees
Ph.D.
Institution
Linköping University
Position Title
Principal Research Engineer
About this CDAS Project
Study
PLCO
Project ID
PLCO-1879
Initial CDAS Request Approval
Apr 21, 2025
Title
Blood Analysis-Driven Machine Learning for Ovarian Cancer Detection
Summary
Reliable methods for detecting ovarian cancer are critically important: mortality is high, and early diagnosis, which significantly improves patient outcomes, remains difficult. One promising approach uses an electronic nose to detect volatile organic compounds (VOCs) emitted from the blood plasma of healthy individuals and ovarian cancer patients. Our research group has developed a machine learning model based on electronic nose measurement data that classifies blood samples with an accuracy of 97%. The model's sensitivity and specificity are also both 97%, highlighting its potential as a robust diagnostic tool.
To validate and extend this approach, the electronic nose-based method must be compared with existing diagnostic techniques. We therefore aim to develop an additional machine learning model from available clinical data, specifically blood test results for biomarkers from healthy individuals and ovarian cancer patients, with the biomarker concentrations serving as the model's features. Such a model would provide a complementary diagnostic tool and is also valuable in its own right, as it may identify novel biomarkers associated with ovarian cancer. These could contribute to a unified panel of indicators that improves the prediction and early-stage detection of ovarian cancer, with corresponding benefits for patient prognosis and treatment.
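The accuracy, sensitivity, and specificity figures quoted in the summary follow the standard confusion-matrix definitions. The sketch below shows how such figures can be computed from true and predicted labels. It is illustrative only: the function name `classification_metrics`, the `Metrics` container, and the labels are assumptions made for this example and are not the project's code or data.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    sensitivity: float
    specificity: float


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity, and specificity for binary labels.

    Assumes both classes occur in y_true; otherwise a ratio is undefined
    and a ZeroDivisionError is raised.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return Metrics(
        accuracy=(tp + tn) / len(pairs),
        sensitivity=tp / (tp + fn),  # fraction of cancer samples flagged as cancer
        specificity=tn / (tn + fp),  # fraction of healthy samples flagged as healthy
    )


# Made-up labels for illustration: 1 = ovarian cancer, 0 = healthy.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
m = classification_metrics(y_true, y_pred)
print(m)
```

Reporting sensitivity and specificity alongside accuracy matters when the classes are imbalanced, since accuracy alone can look high even if most cancer samples are missed.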
Aims

1. Elucidate Key Biomarkers Differentiating Classes: Determine which biomarkers differ most significantly between healthy individuals and ovarian cancer patients. This will involve statistical analysis, such as t-tests, to identify the biomarkers with the greatest discriminatory power based on their concentrations in blood samples.
2. Develop and Compare Classification Models: Construct a variety of classification models, including k-Nearest Neighbors (kNN), Support Vector Machines (SVM), ensemble models, and neural networks, to classify blood samples into healthy and ovarian cancer categories. Evaluate and compare their performance metrics (e.g., accuracy, sensitivity, specificity) to determine the most effective approach.
3. Identify High-Impact Biomarkers: Determine which biomarkers have the greatest influence on distinguishing between the two classes through feature importance analysis. This will provide insights into the biological relevance of specific markers and guide future diagnostic development.
4. Assess Model Stability: Validate the stability and robustness of the optimal classification model using appropriate methods, such as cross-validation or bootstrapping, to ensure consistent performance across diverse datasets and conditions.
5. Develop Regression Models for Predictive Indexing: Create regression models to link biomarker concentrations to a probability-based predictive index, analogous to the Risk of Ovarian Malignancy Algorithm (ROMA). This will enable quantitative risk assessment for ovarian cancer based on clinical biomarker data.
6. Integrate Dual Detection Approaches: Devise a method to combine the machine learning-enhanced electronic nose approach (based on volatile organic compounds) with the clinical biomarker-based detection method. This unified strategy aims to enhance diagnostic accuracy and early-stage prediction by leveraging complementary data sources.
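
As a rough illustration of Aims 1 and 5, the sketch below ranks biomarkers by the absolute value of Welch's t statistic and maps concentrations to a probability-like index through a logistic function of log-transformed values, the general form that ROMA-style indices take. Everything here is an assumption made for the example: the marker names `marker_A` and `marker_B`, the concentrations, the intercept, and the weights are hypothetical, and in practice the coefficients would be fitted to the study data. The sketch also omits p-values, which would require the t distribution.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_biomarkers(healthy, cancer):
    """Return (name, t) pairs sorted by |t|, most discriminating first."""
    scores = {name: welch_t(cancer[name], healthy[name]) for name in healthy}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


def predictive_index(concentrations, intercept, weights):
    """Logistic index in (0, 1) from log-transformed concentrations."""
    z = intercept + sum(
        w * math.log(concentrations[name]) for name, w in weights.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical concentrations (arbitrary units), not study data.
healthy = {
    "marker_A": [10, 12, 11, 13, 9],
    "marker_B": [5.0, 5.5, 4.8, 5.2, 5.1],
}
cancer = {
    "marker_A": [30, 35, 28, 40, 33],
    "marker_B": [5.3, 5.0, 5.6, 4.9, 5.4],
}

ranking = rank_biomarkers(healthy, cancer)
for name, t in ranking:
    print(f"{name}: t = {t:.2f}")

# Hypothetical coefficients; real values would come from a fitted regression.
index = predictive_index(
    {"marker_A": 33.0, "marker_B": 5.2},
    intercept=-6.0,
    weights={"marker_A": 2.0, "marker_B": 0.1},
)
print(f"predictive index = {index:.3f}")
```

Welch's variant is used here because it does not assume equal variances in the two groups. When many biomarkers are screened this way, a multiple-comparison correction would normally be applied before any marker is called significant.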

Collaborators

Jens Eriksson, Linköping University
Donatella Puglisi, Linköping University