Principal Investigator
Name
Shandong Wu
Degrees
Ph.D.
Institution
University of Pittsburgh, Department of Radiology
Position Title
Professor
About this Project
Study
PLCO
Project ID
PLCOI-519
Title
Data mining study on clinical and imaging data for assessing cancer risk and estimating outcome
Summary

1. Data collection: all clinical and supplementary data collected for patients in the PLCO study.
• Questionnaires collecting basic information (age, sex, BMI, medical history, etc.), dietary intake, physical activity, etc.
• Lab examinations: sex hormones (estrone, androstenedione, testosterone, sex hormone-binding globulin, etc.), tumor biomarkers (CA125, CA19-9, CA15-3, CA72-4, etc.), and other lab examinations, depending on availability in the database (liver enzymes, creatinine, uric acid, routine blood tests, thyroid hormones, etc.).
2. Cancer risk assessment
• Patients in the PLCO database with clinical data and supplementary questionnaires will be included.
• Computational modeling techniques (e.g. data-driven unsupervised k-means clustering and other machine learning methods) will be employed or developed to classify patients into several subgroups with different disease characteristics.
• The incidence/risk of different cancers (prostate, lung, colorectal, ovarian cancers, etc.) and their subtypes will be evaluated and compared between subgroups.
3. Prognosis prediction with specific cancers
• Patients in the PLCO database with a specific kind of cancer (e.g. lung cancer) will be included.
• Computational modeling techniques (e.g. data-driven unsupervised k-means clustering and other machine learning methods) will be employed or developed to classify patients into several subgroups with different pathological characteristics.
• The effect of treatment and mortality will be calculated and compared between subgroups.
4. Cancer risk prediction in subgroups with sex hormones and tumor biomarkers data available
• Patients in the PLCO database with sex hormones and tumor biomarkers will be included.
• Computational modeling techniques (e.g. data-driven unsupervised k-means clustering and other machine learning methods) will be employed or developed to classify patients into several subgroups with different tumor biomarker profiles.
• The incidence of sex hormone-related cancers (prostate, breast, ovarian cancers, etc.) and their subtypes will be calculated and compared between subgroups.
5. Quantitative image analysis (both radiology and pathology images) to characterize cancers that have imaging data available (prostate, lung, colorectal, ovarian cancers, etc.).
• Patients with a radiological image examination (chest radiograph, transvaginal ultrasound, digital rectal examination, etc.) for a specific disease will be included.
• Radiomics and/or deep learning models will be developed and applied to explore image features for early detection of cancers (lung, ovarian, colorectal cancers, etc.), subtype classification, and cancer prognosis prediction.
• A study connecting pathology and radiology/radiomics will be conducted to explore how well radiomics models estimate pathology markers/findings.
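
As an illustration only, the subgrouping step named in items 2 to 4 (unsupervised k-means clustering, followed by a comparison of incidence between subgroups) could be sketched as below. This is a minimal sketch under stated assumptions, not the project's actual pipeline: the two features (age, BMI), the six toy rows, the choice of k = 2, and the function names kmeans and incidence_by_cluster are all hypothetical and are not taken from PLCO data.

```python
import random


def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means (Lloyd's algorithm) on a list of numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct rows as starting centroids
    labels = None
    for _ in range(iterations):
        # Assignment step: each patient joins the nearest centroid.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


def incidence_by_cluster(labels, has_cancer):
    """Fraction of patients with a cancer diagnosis in each cluster."""
    counts = {}
    for lab, case in zip(labels, has_cancer):
        n, cases = counts.get(lab, (0, 0))
        counts[lab] = (n + 1, cases + int(case))
    return {lab: cases / n for lab, (n, cases) in counts.items()}


# Hypothetical toy data: (age, BMI) per patient and a 0/1 diagnosis flag.
# Real features would normally be standardized before clustering.
patients = [(55, 22), (57, 23), (56, 21), (70, 31), (72, 33), (71, 32)]
has_cancer = [0, 0, 1, 1, 1, 0]

labels, centroids = kmeans(patients, k=2)
incidence = incidence_by_cluster(labels, has_cancer)
for lab in sorted(incidence):
    print(f"cluster {lab}: incidence {incidence[lab]:.2f}")
```

Cluster numbering depends on the random initialization, so subgroups should be compared by their members and centroids rather than by label value.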

Aims

We will perform a retrospective study on the data available in the PLCO study; here, data refers to patients' clinical profiles, supplementary questionnaires, and outcome data. A wide range of clinical profile data (including, but not limited to, disease types, medical history, lifestyle, sex hormones, and tumor markers) will be collected from the PLCO study. Computational modeling methods based on data mining and artificial intelligence techniques will be adapted to cluster the patients into subgroups. Then, for patients with outcome data available (such as survival, recurrence, prognosis, and treatment), a correlation study will be conducted between clinical data and outcomes. For the subset of patients with imaging (radiology and/or pathology) data available, radiomics analysis and deep learning will be applied.
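
One possible way to compare survival outcomes between patient subgroups is the Kaplan-Meier estimator. The proposal does not name a specific survival method, so this choice is an assumption made for illustration. The sketch below uses hypothetical follow-up times and event flags (1 = death observed, 0 = censored); the function name kaplan_meier and the two toy subgroups are not part of the PLCO data.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each time with at least one observed death.

    times: follow-up durations; events: 1 = death observed, 0 = censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical subgroups (follow-up in years), for illustration only.
curve_a = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
curve_b = kaplan_meier([4, 6, 7, 9, 9], [0, 1, 0, 1, 0])

for name, curve in (("A", curve_a), ("B", curve_b)):
    steps = ", ".join(f"S({t}) = {s:.3f}" for t, s in curve)
    print(f"subgroup {name}: {steps}")
```

A formal comparison between subgroups would additionally need a statistical test (for example a log-rank test), which this sketch omits.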

We propose the following specific aims:

1. Classify patients in the PLCO database with different clinical profiles (medical history, lifestyle, etc.) into different subgroups, and link the subgroups to the incidence/risk of different cancers (prostate, lung, colorectal, ovarian cancers, etc.).
2. Classify patients with a specific cancer diagnosis (e.g. lung cancer) into different cancer subgroups and link the subgroups to treatment effect, prognosis, survival, etc.
3. Classify a subset of patients who have lab tests for sex hormones and tumor biomarkers into different subgroups and link the subgroups to the incidence of sex hormone-related cancers (prostate, breast, ovarian cancers, etc.) and their treatment effect, prognosis, survival, etc.
4. Apply radiomics and/or deep learning modeling to a subset of patients who have imaging examinations available (such as chest radiograph, transvaginal ultrasound, digital rectal examination, etc.), and correlate the quantitative image features with the risk assessment of cancers (lung, ovarian, colorectal cancers, etc.).
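
To make "quantitative image features" in aim 4 concrete, the sketch below computes three simple first-order intensity features (mean, variance, and histogram entropy) over a region of interest. It is a minimal illustration under stated assumptions: the pixel values, the bin width of 10, and the function name first_order_features are hypothetical, and a real radiomics pipeline would typically extract many more features (shape, texture, etc.) from segmented images.

```python
import math
from collections import Counter


def first_order_features(pixels, bin_width=10):
    """Mean, population variance, and Shannon entropy (bits) of binned intensities."""
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    # Discretize intensities into fixed-width bins before computing entropy.
    bins = Counter(p // bin_width for p in pixels)
    entropy = -sum((c / n) * math.log2(c / n) for c in bins.values())
    return {"mean": mean, "variance": variance, "entropy": entropy}


# Hypothetical flattened regions of interest (integer gray levels).
heterogeneous_roi = [10, 12, 30, 35, 50, 55, 70, 75]
uniform_roi = [20] * 8

features_het = first_order_features(heterogeneous_roi)
features_uni = first_order_features(uniform_roi)
print(features_het)
print(features_uni)
```

Features like these could then be related to cancer outcomes or risk scores; the entropy value depends on the chosen bin width, so that parameter should be fixed across patients.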

Collaborators

NA