Principal Investigator
Name
Gao Longfei
Degrees
Doctoral graduate student
Institution
Harbin Medical University School of Bioinformatics and Technology
Position Title
Doctoral graduate student
About this CDAS Project
Study
PLCO
Project ID
PLCOI-1771
Initial CDAS Request Approval
Dec 16, 2024
Title
New subtype classification of colorectal cancer based on deep learning analysis of pathological images
Summary
In this study, we aim to combine deep learning analysis of histopathological slide images with molecular data to identify new subtypes of colorectal cancer (CRC). The first step uses a pre-trained ResNet50 model to extract features from the TCGA-CRC pathological slide dataset. ResNet50 is a 50-layer residual convolutional neural network (CNN) that is widely used as a feature extractor in medical image analysis. Applying the pre-trained model yields high-dimensional features that summarize key information in the histopathological images; these features serve as the inputs for downstream analysis.
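ResNet50 cannot take a whole-slide image as a single input, so a common workflow (assumed here; the summary does not describe it) is to cut each slide into tiles, run every tile through ResNet50 with its classification layer removed (in torchvision, for example, by setting `model.fc = torch.nn.Identity()` on a `resnet50` model), and pool the resulting 2048-dimensional tile vectors into one vector per patient. The sketch below covers only that pooling step plus per-feature standardization; mean pooling and the function names `patient_embeddings` and `standardize` are hypothetical choices, not the project's method.

```python
import numpy as np

FEATURE_DIM = 2048  # width of ResNet50's pooled output (final fc layer removed)


def patient_embeddings(tile_features_by_patient):
    """Mean-pool per-tile feature vectors into one vector per patient.

    Hypothetical helper. tile_features_by_patient maps a patient ID to an
    array of shape (n_tiles, FEATURE_DIM), e.g. ResNet50 outputs for the
    tiles cut from that patient's slide(s).
    Returns (patient_ids, matrix) with matrix shape (n_patients, FEATURE_DIM).
    """
    patient_ids = sorted(tile_features_by_patient)
    matrix = np.stack(
        [np.asarray(tile_features_by_patient[pid], dtype=float).mean(axis=0)
         for pid in patient_ids]
    )
    return patient_ids, matrix


def standardize(matrix, eps=1e-8):
    """Z-score each feature column so no single feature dominates distances.

    Returns the scaled matrix plus the column means and standard deviations,
    so the same scaling can be reapplied to another cohort.
    """
    mean = matrix.mean(axis=0)
    std = matrix.std(axis=0)
    return (matrix - mean) / (std + eps), mean, std
```

Returning the discovery-set mean and standard deviation means an external cohort can later be scaled with exactly the same statistics instead of its own.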

Once the features are extracted, we cluster all patients on these features to identify candidate CRC subtypes. Clustering algorithms such as k-means or hierarchical clustering can group patients by the similarity of their extracted features. The resulting subtypes may offer new insight into CRC classification, potentially highlighting previously under-recognized variations in tumor biology.
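The summary names k-means as one candidate algorithm. A minimal NumPy sketch of k-means (Lloyd's algorithm with k-means++ seeding) on the standardized patient matrix might look like the following. In practice a library implementation such as scikit-learn's `sklearn.cluster.KMeans` would usually be used, and the number of subtypes `k` would still have to be chosen (for example with silhouette scores or consensus clustering), which this sketch does not do. The function name `kmeans` is hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means with k-means++ initialization (hypothetical helper).

    X: array of shape (n_patients, n_features), e.g. standardized embeddings.
    Returns (labels, centroids).
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # k-means++ seeding: first centroid uniformly at random, each further one
    # sampled with probability proportional to its squared distance to the
    # nearest centroid chosen so far.
    centroids = [X[rng.integers(n)]]
    for _ in range(1, k):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centroids], axis=0)
        probs = d2 / d2.sum() if d2.sum() > 0 else None  # None = uniform
        centroids.append(X[rng.choice(n, p=probs)])
    centroids = np.array(centroids)

    labels = np.zeros(n, dtype=int)
    for it in range(n_iter):
        # Assignment step: each patient joins its nearest centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if it > 0 and np.array_equal(new_labels, labels):
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels, centroids
```

Hierarchical clustering, the other option the summary mentions, would replace this step while leaving the rest of the pipeline unchanged.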

After identifying the subtypes, we evaluate their clinical relevance. Specifically, we compare prognosis between subtypes by assessing survival outcomes such as overall survival (OS) or progression-free survival (PFS) across the groups. We also investigate molecular differences between subtypes by analyzing omics data, such as gene expression, somatic mutations, and protein levels, to determine which biomarkers are associated with each subtype.
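To compare prognosis between two subtypes, a standard tool is the log-rank test on OS or PFS times with censoring. The sketch below is a minimal two-group version (the function name `logrank_two_groups` is hypothetical). Comparing more than two subtypes at once needs the multi-group form of the test, available for example in the lifelines package, or a Cox proportional hazards model.

```python
import math

import numpy as np


def logrank_two_groups(time_a, event_a, time_b, event_b):
    """Two-sample log-rank test (hypothetical helper).

    time_*: follow-up times; event_*: 1 if the event (e.g. death) was
    observed, 0 if the patient was censored.
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    times = np.concatenate([time_a, time_b]).astype(float)
    events = np.concatenate([event_a, event_b]).astype(int)
    in_a = np.concatenate([np.ones(len(time_a), bool), np.zeros(len(time_b), bool)])

    observed_minus_expected = 0.0
    variance = 0.0
    # Walk through each distinct time at which at least one event occurred.
    for t in np.unique(times[events == 1]):
        at_risk = times >= t
        n = at_risk.sum()                  # patients still at risk
        n_a = (at_risk & in_a).sum()       # of whom in group A
        died = (times == t) & (events == 1)
        d = died.sum()                     # events at time t
        d_a = (died & in_a).sum()          # events in group A at time t
        # Observed minus expected group-A events under H0 (equal hazards).
        observed_minus_expected += d_a - d * n_a / n
        # Hypergeometric variance of d_a.
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    stat = observed_minus_expected ** 2 / variance if variance > 0 else 0.0
    # Chi-square(1) survival function: P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```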

Finally, to test the robustness and generalizability of our findings, we use data from the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial as an external validation set. This lets us assess whether the subtypes identified in the TCGA-CRC data hold up in an independent cohort, which would support their clinical applicability. External validation is an important check that the subtypes do not merely reflect overfitting to the TCGA-CRC data and are likely to be reproducible in real-world clinical settings.
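The summary does not say how PLCO patients would be mapped onto the TCGA-derived subtypes. One assumed option is to freeze everything learned on TCGA-CRC (the per-feature means and standard deviations and the cluster centroids), scale the PLCO embeddings with those statistics, and assign each PLCO patient to the nearest centroid; the survival comparison can then be repeated on the PLCO groups. The function `assign_to_subtypes` is hypothetical. Differences in staining and scanning between cohorts can shift image features, so stain normalization may be needed before this step.

```python
import numpy as np


def assign_to_subtypes(X_new, train_mean, train_std, centroids, eps=1e-8):
    """Assign external-cohort patients to the nearest discovery-set centroid.

    Hypothetical helper.
    X_new: (n_patients, n_features) pooled embeddings from the external cohort.
    train_mean, train_std: per-feature statistics from the discovery cohort,
        reused so both cohorts are scaled identically.
    centroids: (k, n_features) centroids in the standardized discovery space.
    Returns an array of subtype indices in 0..k-1.
    """
    Z = (np.asarray(X_new, dtype=float) - train_mean) / (train_std + eps)
    dists = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)
```

Reusing the discovery-set scaling, rather than standardizing PLCO on its own statistics, keeps the centroids and the new patients in the same coordinate system.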
Aims

MSI, BRAF, KRAS, ...
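Markers such as MSI status and BRAF or KRAS mutation status are categorical, so one way to check whether they are enriched in particular subtypes (an assumption; the aims do not name a test) is a Pearson chi-square test of independence on a subtype-by-status table. The sketch below is self-contained; the names `chi2_sf` and `chi2_independence` are hypothetical. `scipy.stats.chi2_contingency` performs the same test, but for 2×2 tables it applies Yates' continuity correction by default, so its statistic will differ from this uncorrected version. With small counts, Fisher's exact test is the usual alternative.

```python
import math

import numpy as np


def chi2_sf(x, df):
    """Survival function P(X > x) of the chi-square distribution, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_{r=1}^{(df-1)/2} x^(r-1) / (2r-1)!!
    term, total = 1.0, 0.0  # term holds x^(r-1) / (2r-1)!!
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


def chi2_independence(table):
    """Pearson chi-square test of independence (no continuity correction).

    table: rows = subtypes, columns = marker status (e.g. mutant, wild type).
    Assumes every row and column total is positive.
    Returns (statistic, degrees of freedom, p-value).
    """
    observed = np.asarray(table, dtype=float)
    row_totals = observed.sum(axis=1, keepdims=True)
    col_totals = observed.sum(axis=0, keepdims=True)
    expected = row_totals * col_totals / observed.sum()
    stat = float(((observed - expected) ** 2 / expected).sum())
    df = (observed.shape[0] - 1) * (observed.shape[1] - 1)
    return stat, df, chi2_sf(stat, df)
```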

Collaborators

None (sole investigator)