Study
NLST
Project ID
NLST-114
Initial CDAS Request Approval
Jan 21, 2015
Title
QIN Pilot Project: Object Oriented Data Analysis of NLST Results
Summary
One aim of our Quantitative Imaging Network (QIN) U01 is to use NLST data available from the Cancer Imaging Archive (TCIA) in a pilot project that explores novel graph analysis and statistics tools for identifying candidate multi-parametric biomarkers. We will extract the full set of anonymized trial data from TCIA's clinical trials database and, using the Eureka! tool from our collaborators at Emory University, convert these data to a database of graphs. The graphs will be created and stored in a secure database at Washington University (WU) that is hosted in the same protected cloud computing environment as TCIA. They will then be analyzed using a set of graph statistics tools developed at WU and implemented on the WU Center for High Performance Computing cluster. The goal of the project is to identify statistically significant subgraphs that distinguish participants who developed cancer from those who did not; the hypothesis is that these subgraphs represent candidate biomarkers. This type of graph statistical analysis is one form of Object Oriented Data Analysis (OODA) and represents a state-of-the-art approach to data mining. The results of this pilot project will be published, and the tools (but not the NLST data) will be made available to other NCI-funded QIN researchers. In the second phase of the project, a subset of CT images will be analyzed using a feature extraction pipeline to characterize lung nodules, and the full set of pathology images will be automatically segmented to identify all cellular nuclei, which will then also undergo feature extraction and characterization. These quantitative image features will be added to the Phase 1 graphs for a more detailed analysis.
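As a rough illustration of what "statistically significant subgraphs" could mean in practice, the sketch below treats the simplest possible subgraph, a single edge, and asks whether its presence differs between participants who developed cancer and those who did not, using a two-sided Fisher exact test. This is not the WU graph statistics toolset or the Eureka! output format, neither of which is described in this summary. The function names, the edge-set representation of a graph, and the toy node labels are all hypothetical, and the data are synthetic, not NLST data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: subgraph present / absent. Columns: cancer / no cancer.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x "present and cancer" participants.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables at least as extreme (as improbable) as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def edge_association(graphs, labels):
    """Return {edge: p-value} for every edge seen in any participant graph.

    graphs: list of sets of edges, one set per participant (hypothetical format).
    labels: list of bools, True if the participant developed cancer.
    """
    results = {}
    for edge in set().union(*graphs):
        a = sum(1 for g, y in zip(graphs, labels) if edge in g and y)
        b = sum(1 for g, y in zip(graphs, labels) if edge in g and not y)
        c = sum(1 for g, y in zip(graphs, labels) if edge not in g and y)
        d = sum(1 for g, y in zip(graphs, labels) if edge not in g and not y)
        results[edge] = fisher_exact_two_sided(a, b, c, d)
    return results


# Synthetic toy data (NOT NLST data): one edge set per participant.
common = ("participant", "ct_scan")   # hypothetical edge present in every graph
marker = ("smoker", "nodule")         # hypothetical edge present only in the cancer group
toy_graphs = [{common, marker}] * 4 + [{common}] * 4
toy_labels = [True] * 4 + [False] * 4

if __name__ == "__main__":
    for edge, p in sorted(edge_association(toy_graphs, toy_labels).items()):
        print(edge, round(p, 4))
```

A real analysis would need to consider larger subgraphs rather than single edges and to correct for the many hypotheses tested; both are outside the scope of this sketch.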
Aims
Support biomarker development and validation with advanced analytics. Object Oriented Data Analysis (OODA) statistical techniques developed by our team support hypothesis testing and statistical analysis of populations of graphs. We will deploy an instance of Eureka! Clinical Analytics to create OODA-ready graphs that represent hierarchies of clinical, imaging, and pathology data elements. Together, Eureka! and the OODA analytic tools will provide a novel environment for identifying new candidate multi-parametric cancer biomarkers. Existing TCIA-hosted data from the National Lung Screening Trial provide a rich set of data elements and relationships with which to demonstrate the power of these tools.
Collaborators
Ashish Sharma, PhD - Emory University
Andrew Post, PhD - Emory University
Lawrence Tarbox, PhD - UAMS
William Shannon, PhD - WUSTL
Malcolm Tobias, PhD - WUSTL
Ken Clark, MBA - WUSTL
Paul Koppel, PhD - WUSTL
Elena Deych, MS - WUSTL
Joel Saltz, MD/PhD - Stony Brook University
Tahsin Kurc, PhD - Stony Brook University
Yi Gao, PhD - Stony Brook University
Liangjia Zhu, PhD - Stony Brook University
Si Wen, PhD - Stony Brook University
Fusheng Wang, PhD - Stony Brook University
Jonas Almeida, PhD - Stony Brook University
Erich Bremer, M.Sc. - Stony Brook University
Le Hou, M.S. - Stony Brook University
Mary Saltz, MD - Stony Brook University
Barbara Nemesure, PhD - Stony Brook University