Principal Investigator
University of Texas at Arlington
Position Title
About this CDAS Project
NLST
Project ID
Initial CDAS Request Approval
Jun 30, 2015
Machine Learning algorithm development for lung cancer subtype identification and survival prediction
Lung cancer is a serious disease and a major cause of death in both men and women. In this project, we propose computer-aided lung cancer subtype diagnosis and survival prediction on NLST data. We start from a challenging and clinically important case: differentiating two subtypes of non-small cell lung cancer (NSCLC), the most common type of lung cancer. The process comprises feature extraction and subtype classification. For feature extraction, we plan to extract local and holistic features from NLST histopathology images. To extract local features, a robust cell detection and segmentation method is used to segment each individual cell in the images. Then, based on the cell detection results, an extensive set of local features is extracted using efficient geometry and texture descriptors. To investigate the effectiveness of holistic features in lung cancer images, we extract architecture features from labeled nuclei centroids. Each lung cancer sample will be described by one high-dimensional feature vector. To reduce this dimensionality, we will use machine learning methods to identify the important features (markers) among all features. After feature extraction, several classification techniques that can handle high-dimensional data, such as Support Vector Machine (SVM) and Random Forest, will be evaluated.
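As a rough illustration of what architecture features computed from nuclei centroids might look like, the sketch below summarizes nearest-neighbor distances between centroids. This is only an assumed, simplified stand-in: the project description does not specify which architecture descriptors are used, and the function names and coordinates here are hypothetical.

```python
import math
from statistics import mean, pstdev


def nearest_neighbor_distances(centroids):
    """Distance from each centroid to its closest other centroid (brute force, O(n^2))."""
    dists = []
    for i, (xi, yi) in enumerate(centroids):
        best = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(centroids)
            if j != i
        )
        dists.append(best)
    return dists


def architecture_features(centroids):
    """Summarize the spatial arrangement of nuclei as a small feature dictionary."""
    if len(centroids) < 2:
        raise ValueError("need at least two centroids")
    d = nearest_neighbor_distances(centroids)
    return {
        "nn_mean": mean(d),
        "nn_std": pstdev(d),
        "nn_min": min(d),
        "nn_max": max(d),
    }


# Hypothetical centroids (pixel coordinates): a unit square plus one isolated nucleus.
centroids = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (4.0, 1.0)]
features = architecture_features(centroids)
print(features)
```

Values like these could be appended to the local geometry and texture descriptors to form the per-sample feature vector. Graph-based descriptors (for example, from a Delaunay triangulation) would be a natural extension, but they are not shown here.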

Survival analysis studies time-to-event data, such as death in biological organisms and failure in mechanical systems. In survival analysis, the Cox proportional hazards model is one of the most commonly used multivariate approaches for analyzing survival time data in medical research. It is a semi-parametric method that does not require specifying the baseline hazard function and can handle censored observations effectively. In our project, a Cox proportional hazards model based on the important features is fitted by component-wise likelihood-based boosting. Significant image markers can be discovered using bootstrap analysis, and the survival prediction performance of the model will be evaluated.
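To make the fitting procedure more concrete, here is a minimal sketch of component-wise boosting for a Cox model, written in plain Python. It is not the project's implementation: the function names, the penalty value, the number of steps, and the toy data are all assumptions, and tied event times receive no special handling. In each step, every feature gets a penalized one-step Newton update computed from the Cox partial likelihood, and only the feature with the largest penalized score statistic is updated.

```python
import math


def risk_sets(times, events):
    """For each observed event i, list the indices still at risk at time t_i."""
    n = len(times)
    return [
        (i, [k for k in range(n) if times[k] >= times[i]])
        for i in range(n)
        if events[i] == 1
    ]


def partial_loglik(X, times, events, beta):
    """Cox partial log-likelihood (censored subjects only appear in risk sets)."""
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    ll = 0.0
    for i, risk in risk_sets(times, events):
        ll += eta[i] - math.log(sum(math.exp(eta[k]) for k in risk))
    return ll


def boost_cox(X, times, events, steps=10, penalty=1.0):
    """Update one coefficient per step: the one with the best penalized score statistic."""
    p = len(X[0])
    beta = [0.0] * p
    rs = risk_sets(times, events)
    for _ in range(steps):
        # Current relative risks act as a fixed offset during this step.
        w = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
        best_j, best_stat, best_step = 0, -1.0, 0.0
        for j in range(p):
            score, info = 0.0, 0.0
            for i, risk in rs:
                s0 = sum(w[k] for k in risk)
                s1 = sum(w[k] * X[k][j] for k in risk)
                s2 = sum(w[k] * X[k][j] ** 2 for k in risk)
                score += X[i][j] - s1 / s0          # first derivative
                info += s2 / s0 - (s1 / s0) ** 2    # negative second derivative
            stat = score ** 2 / (info + penalty)
            if stat > best_stat:
                best_j, best_stat, best_step = j, stat, score / (info + penalty)
        beta[best_j] += best_step
    return beta


# Hypothetical data: feature 0 tracks risk, feature 1 is small-scale noise.
X = [[2.0, 0.1], [1.5, -0.3], [1.0, 0.4], [0.5, -0.2],
     [0.0, 0.3], [-0.5, -0.4], [-1.0, 0.2], [-1.5, -0.1]]
times = [1, 3, 2, 5, 4, 7, 6, 8]
events = [1, 1, 1, 0, 1, 1, 0, 1]  # 1 = death observed, 0 = censored

beta = boost_cox(X, times, events, steps=10)
print(beta, partial_loglik(X, times, events, beta))
```

Because only one coefficient changes per step, features that are never selected keep a coefficient of exactly zero, which is what makes this style of fitting attractive for high-dimensional image features. Repeating the fit on bootstrap resamples and counting how often each feature is selected would be one way to approximate the bootstrap analysis mentioned above.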

In this project, we first aim to investigate important and novel image features for both computer-aided diagnosis and prognosis of lung cancer. The planned framework includes cell detection, segmentation, feature extraction, classification, and survival analysis for NLST NSCLC histopathology images. A complete set of cellular features is extracted, and several advanced machine learning classification approaches are compared using the image features extracted in the previous steps. If this succeeds, we can identify representative feature variables for NSCLC subtype classification.

We conduct survival analysis based on a Cox model and also apply several other survival analysis approaches to evaluate the discovered image features. Through these evaluations, we expect to find a set of prognostic image markers that are highly correlated with NSCLC patients’ survival. Using these image markers, we aim to predict NSCLC patients’ survival accurately. Together with clinical information, these markers can provide significant clinical value for patient prognosis.
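The description above does not name a specific evaluation measure. One widely used option for survival prediction is Harrell's concordance index, which is the fraction of comparable patient pairs in which the patient who died earlier was assigned the higher predicted risk. The sketch below is a plain-Python illustration under that assumption; the function name and the toy inputs are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: 1.0 is perfect ranking, 0.5 is no better than chance."""
    concordant, tied, comparable = 0, 0, 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a pair is comparable only if the earlier time is an observed event
        for j in range(n):
            if times[j] > times[i]:  # subject j outlived subject i
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    tied += 1
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return (concordant + 0.5 * tied) / comparable


# Hypothetical follow-up times, event indicators (0 = censored), and predicted risks.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
good_scores = [0.9, 0.7, 0.4, 0.1]   # higher risk for earlier deaths
print(concordance_index(times, events, good_scores))
```

In practice the risk score would be the linear predictor of the fitted Cox model, computed on patients who were not used for fitting.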

In summary, our project, based on NLST data, aims to design a system that assists doctors in making more objective and accurate diagnoses and prognoses of lung cancer.
