Principal Investigator
Undergraduate program
Kongju National University
Position Title
university student
About this CDAS Project
PLCO
Project ID
Initial CDAS Request Approval
Apr 2, 2024
Prediction of Mortality Rates Based on Types of Melanoma Using Machine Learning
We are students in a combined Bachelor's and Master's degree program. For this project, we plan to use the melanoma data from the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO). We intend to use both patient-related and melanoma-related variables, including the patient's age, gender, melanoma diagnosis, personal history of previous cancers, underlying cause of death, melanoma behavior, and melanoma morphology.

We plan to use the random forest method for several reasons: it provides estimates of variable importance, it reduces overfitting by averaging many trees that are trained independently on bootstrap samples, and it generally improves the model's generalization performance.

Our approach involves loading melanoma-related data, performing necessary data preprocessing, selecting independent and dependent variables, splitting the data into training and testing sets, training the model on the training data, and evaluating its performance on the testing data. We will then output the model's accuracy, inspect the classification report to assess the model's performance, and perform feature selection to filter out unnecessary data variables.
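
The workflow described above could be sketched roughly as follows, assuming scikit-learn and pandas are used (the summary names only Python and random forests). The data here is synthetic, and the column names (`age`, `sex`, `prior_cancer`, `morphology`, `behavior`, `died`) are hypothetical placeholders, not actual PLCO variable names.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the melanoma data; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "age": rng.integers(55, 75, size=n),
    "sex": rng.choice(["M", "F"], size=n),
    "prior_cancer": rng.integers(0, 2, size=n),
    "morphology": rng.choice(
        ["superficial_spreading", "nodular", "lentigo_maligna"], size=n
    ),
    "behavior": rng.choice(["in_situ", "invasive"], size=n),
})

# Synthetic outcome so that the example has some signal to learn (not real rates).
risk = (
    0.1
    + 0.3 * (df["morphology"] == "nodular")
    + 0.2 * (df["behavior"] == "invasive")
)
df["died"] = (rng.random(n) < risk).astype(int)

# Preprocessing: one-hot encode the categorical variables.
X = pd.get_dummies(df.drop(columns="died"))
y = df["died"]

# Split into training and testing sets, keeping the class ratio in both.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Train on the training data, evaluate on the held-out testing data.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"accuracy: {accuracy:.3f}")
print(classification_report(y_test, y_pred, zero_division=0))

# Variable importances, which can guide feature selection.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)
```

Low-importance variables in the printed ranking would be candidates for removal in the feature selection step, although impurity-based importances can be biased toward variables with many distinct values.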

We have chosen accuracy as our primary evaluation metric: we will calculate the accuracy of the melanoma mortality prediction model based on the type of melanoma, and we expect this to provide an overall assessment of the model's performance. Additionally, we plan to use ROC curves to visualize how the true positive rate (TPR) and false positive rate (FPR) change as the classification threshold changes, which we believe will be useful for evaluating the model's performance, especially in cases of class imbalance.
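
To make the TPR/FPR relationship concrete, here is a minimal pure-Python sketch of how ROC points and the area under the curve can be computed from predicted scores. The labels and scores are made-up illustrative values, and the helper names `roc_points` and `auc` are hypothetical; in practice a library routine would likely be used instead.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs as the threshold is lowered step by step."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        # Everything scoring at or above the threshold is predicted positive.
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Illustrative values only: 1 = died, 0 = survived, scores = predicted risk.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.2, 0.9]

points = roc_points(y_true, scores)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
print(f"AUC={auc(points):.4f}")
```

Each printed pair is one point on the ROC curve; plotting TPR against FPR gives the visualization described above.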

We also want to confirm the performance of our model visually. We intend to carry out this project in Python. As fourth-year students in a combined Bachelor's and Master's degree program, we have limited experience with projects of this kind, so our summary and objectives may have some shortcomings. However, we planned this project specifically around the PLCO data, and we are committed to using it.

Our set goals are as follows:

1. Predicting Mortality Rates Based on Types of Melanoma:
Developing a model to predict mortality rates based on the types of melanoma. Analyzing mortality rate patterns for each type of melanoma and determining the impact of specific types of melanoma on mortality rates.

2. Evaluating Prediction Model Accuracy:
Assessing the accuracy of the developed prediction model through methods such as cross-validation or using metrics like accuracy, precision, and recall.

3. Verifying Predictive Capability:
Verifying how accurately the model predicts mortality rates based on the types of melanoma. Evaluating the model's predictive capability by comparing actual mortality rates with predicted ones.

4. Confirming Generalization of Developed Model:
Testing the generalization ability of the developed model by applying it to external datasets. Evaluating the model's accuracy and its ability to generalize to new patients.

5. Interpreting and Improving the Model:
Identifying key factors influencing mortality rates based on the model's prediction results. Seeking to improve the model's performance by considering additional variables or modifying the model structure.
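
As an illustration of goal 2, the following sketch runs stratified 5-fold cross-validation and reports accuracy, precision, and recall per metric. It assumes scikit-learn and uses a synthetic, imbalanced dataset from `make_classification` in place of the PLCO data; the fold count and model settings are arbitrary choices for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic imbalanced data (about 20% positives) standing in for the real dataset.
X, y = make_classification(
    n_samples=400,
    n_features=8,
    n_informative=4,
    weights=[0.8, 0.2],
    random_state=0,
)

# Stratified folds keep the positive/negative ratio similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

scores = cross_validate(
    RandomForestClassifier(n_estimators=100, random_state=0),
    X,
    y,
    cv=cv,
    scoring=["accuracy", "precision", "recall"],
)

for metric in ("accuracy", "precision", "recall"):
    fold_scores = scores[f"test_{metric}"]
    print(f"{metric}: mean={fold_scores.mean():.3f}, std={fold_scores.std():.3f}")
```

Reporting precision and recall alongside accuracy matters when deaths are the minority class, because a model that rarely predicts death can still score high accuracy.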

Expected outcomes of our project include:

1. Efficient Early Diagnosis and Monitoring:
Predicting the prognosis of melanoma based on tumor characteristics and patient status, allowing for early detection of tumors that may deteriorate and close monitoring of patients with specific types of melanoma.

2. Patient Counseling and Education:
Providing information on tumor prognosis to patients and their families and educating them about related risks and treatment options to help patients understand their situation and make informed decisions.

3. Tailored Treatment Based on Specific Melanoma Types:
Developing personalized treatment plans based on individual patient and tumor characteristics to maximize treatment effectiveness and minimize side effects.

We anticipate the project results will be utilized as follows:

1. Drug Development and Side Effect Prediction:
Identifying effective treatment candidates using the prediction model and efficiently selecting patients for clinical trials to develop new treatments. Predicting and managing potential side effects in specific patient groups to improve safety.

2. Efficient Use of Medical Resources:
Identifying patients with a high risk of mortality so that treatment can be prioritized for those with high-mortality melanoma types. Allocating medical resources more efficiently by using more general tracking and management methods for other patients.

3. Utilization of Melanoma Type-related Research:
Aiding in understanding various factors that influence tumor progression and prognosis, which can lead to the development and evaluation of new treatment methods or prevention strategies.


Kongju National University: HyeWon Lee
Kongju National University: SeHyun Park