Ovarian Cancer Classification using machine learning approaches

Principal Investigator

Name
DSS Lakshmi Kumari P

Degrees
Ph.D.

Institution
Puducherry Technological University

Position Title
Research scholar

Email
dsslakshmikumari@ptuniv.edu.in

About this CDAS Project

Study
PLCO

Project ID
PLCO-1651

Initial CDAS Request Approval
Aug 28, 2024

Title
Ovarian Cancer Classification using machine learning approaches

Summary
Ovarian cancer (OC) is one of the most lethal cancers affecting women. Because it is largely asymptomatic in its early stages, it is often diagnosed only at an advanced stage, when treatment options are limited and survival rates are markedly lower. Despite considerable research effort, the disease remains a "silent killer," which makes early diagnosis crucial for improving patient outcomes.

This project aims to develop an interpretable machine learning algorithm for the early detection of ovarian cancer, focusing on the analysis of serum biomarker concentration patterns. The ultimate goal is to create a tool that medical researchers and practitioners can readily adopt, making OC screening and diagnosis more accessible and effective.

Machine learning has shown great potential in data mining tasks; however, in fields like medical research where explainability is critical, there is a need for algorithms that not only perform well but also provide insights into their decision-making processes. This project will focus on developing machine learning models with high specificity and sensitivity for OC detection, while also incorporating interpretable AI techniques to ensure that the models are transparent and accountable.

In recent years, advanced technologies such as mass spectrometry have been used to detect cancer through biomarker analysis. For example, the cancer antigen 125 (CA125) biomarker is used for early detection and identifies 50-60% of women with stage I ovarian cancer. This project will build on these advances by integrating machine learning with image-based diagnostics, using MRI images to improve the accuracy of early detection.

The proposed approach includes the use of optimization algorithms for feature selection and extraction, employing methods such as multi-layer convolutional neural networks (CNNs) for classification. The project will emphasize evaluating and comparing performance metrics, such as peak signal-to-noise ratio, to improve image quality and processing efficiency. The outcome is expected to be a robust diagnostic tool that can detect ovarian cancer at an early stage, potentially reducing the mortality rate associated with the disease.
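As an illustration of the image-quality metric named above, the following is a minimal sketch of peak signal-to-noise ratio (PSNR) for two grayscale images. It assumes images are given as equally sized lists of pixel rows with an 8-bit intensity range (maximum value 255); the function names and the `max_value` parameter are illustrative choices, not part of the project description.

```python
import math


def mean_squared_error(reference, test):
    """Mean squared error between two equally sized grayscale images.

    Each image is a list of rows, each row a list of pixel intensities.
    """
    if len(reference) != len(test) or any(
        len(r) != len(t) for r, t in zip(reference, test)
    ):
        raise ValueError("images must have the same shape")
    n_pixels = sum(len(row) for row in reference)
    if n_pixels == 0:
        raise ValueError("images must not be empty")
    squared_error = sum(
        (a - b) ** 2 for r, t in zip(reference, test) for a, b in zip(r, t)
    )
    return squared_error / n_pixels


def psnr(reference, test, max_value=255.0):
    """PSNR in decibels: 10 * log10(max_value^2 / MSE).

    max_value=255 assumes 8-bit images (an assumption of this sketch).
    Identical images have zero error, so PSNR is reported as infinity.
    """
    err = mean_squared_error(reference, test)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


if __name__ == "__main__":
    original = [[10, 20], [30, 40]]
    processed = [[12, 20], [30, 40]]  # one pixel differs by 2
    print(round(psnr(original, processed), 2))
```

A higher PSNR means the processed image is closer to the reference, so the metric can be used to compare preprocessing or denoising steps applied to the same source image.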

By improving early detection through interpretable machine learning, this project has the potential to significantly impact public health by enabling more timely and accurate diagnosis of ovarian cancer, ultimately saving lives.

Aims

Ovarian Cancer Data Collection
Collect and curate comprehensive datasets related to ovarian cancer, including diagnostic test data, imaging, and clinical records. These datasets will enable the identification of ovarian cancer presence and the extent to which it has spread to other organs.
Data Preprocessing
Develop and implement robust data preprocessing techniques to convert raw, unstructured data into a clean and formatted dataset. This process will ensure that the data is suitable for analysis and machine learning model development, facilitating accurate and efficient processing.
Feature Extraction from Data
Employ feature extraction methods to reduce the dimensionality of the raw data, thereby simplifying the dataset while preserving the essential characteristics. This will enhance the efficiency of the machine learning algorithms and ensure that they can process the data with fewer computational resources without losing critical information.
Feature Selection from Data
Implement feature selection techniques to identify and retain the most relevant attributes from the dataset. This process will focus on including the most informative features, eliminating irrelevant or redundant ones, thereby improving the performance and interpretability of the machine learning models.
Learning Algorithm Selection
Explore and select appropriate machine learning algorithms for both classification (e.g., SVM, Neural Networks, KNN, Logistic Regression, Random Forest) and regression (e.g., Decision Tree, Linear Regression, Neural Networks, SVR, Polynomial Regression) tasks. The chosen algorithms will be tailored to effectively address the challenges of early ovarian cancer detection.
Model Training
Train machine learning models using both supervised and unsupervised learning techniques. Supervised learning will focus on labeled datasets to predict specific attributes, while unsupervised learning will explore hidden patterns and similarities within the data. The goal is to optimize the models for accurate classification and prediction.
Model Performance Evaluation
Evaluate the performance of the machine learning models using classification metrics (e.g., Accuracy, Sensitivity, Specificity) and regression metrics (e.g., R², MSE, RMSE). The aim is to maximize the sensitivity and specificity of the models for early detection of ovarian cancer, ensuring that the models are reliable and accurate.
Enhancing Predictive Behavior for Early OC Detection
Improve the predictive behavior of the machine learning algorithms, particularly their sensitivity and specificity, to enhance their effectiveness in the early detection of ovarian cancer. This will contribute to reducing the mortality rate by enabling earlier intervention.
Study of Interpretability and Explainability Techniques
Investigate and integrate interpretability and explainability techniques in the machine learning models, ensuring that the results are transparent and comprehensible to medical researchers and practitioners. This will help in the adoption of the models in clinical settings by providing clear explanations for their predictions.
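
The evaluation metrics named in the aims above can be sketched as follows. This is a minimal, standard-library illustration that assumes binary labels with 1 marking the positive (cancer) class; the function names and the toy data are hypothetical and do not come from the project or from PLCO data.

```python
import math


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity from paired true/predicted labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # true positive
            else:
                fp += 1  # false positive
        else:
            if truth == positive:
                fn += 1  # false negative (missed case)
            else:
                tn += 1  # true negative
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        # Sensitivity: fraction of actual positives that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        # Specificity: fraction of actual negatives correctly ruled out.
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


def regression_metrics(y_true, y_pred):
    """MSE, RMSE and R^2 for paired true/predicted numeric values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("value lists must be non-empty and of equal length")
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mse = ss_res / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        # R^2 is undefined when the true values have zero variance.
        "r2": 1.0 - ss_res / ss_tot if ss_tot else float("nan"),
    }


if __name__ == "__main__":
    # Toy labels for illustration only.
    truth = [1, 1, 1, 0, 0, 0, 0, 0]
    preds = [1, 1, 0, 0, 0, 0, 1, 0]
    print(classification_metrics(truth, preds))
    print(regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
```

Reporting sensitivity and specificity separately matters for screening: when cases are rare, a model can reach high accuracy while still missing most of them, and only the sensitivity figure exposes that.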

Collaborators

Dr. P. Maragathavalli

Related Publications

A leakage-aware entropy screening protocol for structured biomarker panel evaluation in ovarian cancer risk modelling.
P DLK, P M
MethodsX. 2026 Jun; Volume 16: Pages 103880