
Principal Investigator
Name
Hengyong Yu
Degrees
PhD
Institution
Wake Forest University Health Sciences
Position Title
Assistant Professor
About this CDAS Project
Study
NLST
Project ID
NLST-53
Initial CDAS Request Approval
Feb 3, 2014
Title
Tensor-based Dictionary Learning for Imaging Biomarkers
Summary
As a central concept in systems biomedicine, biomarkers are multi-scale, diverse, and interconnected indicators of physiological and pathological states and activities. Over the past decade, research in this area has been active and exciting, including imaging informatics based on imaging biomarkers. In this context, genome-wide association studies are being performed to establish fundamental links between genotypic and phenotypic biomarkers, but a prime challenge is that progress in this direction has fallen far short of what was widely expected. A critical observation is that while data from genome sequencing and epigenetic analysis are exploding, in most cases medical image features are still subjective or defined only in classic ways, which is an unreasonable imbalance between the genotypic and phenotypic worlds.
Lung cancer screening is an emerging CT application and an opportunity to identify imaging biomarkers. Like other cancers, lung cancer is not one disease but many. It differs from patient to patient, and even from one tumor site to another, with highly nonlinear and dynamic behavior. It is clear that comprehensive, adaptive, and individualized therapies are needed to win the battle against lung cancer. Consistent with this big picture, research on sophisticated rather than simplistic biomarkers is not only helpful but necessary in cancer research, and imaging informatics must perform extensive, intelligent mining of rich in vivo imaging data for biomarkers so that correlative and predictive models can be established. The general hypothesis behind this R21 project is that new phenotypic information can be unlocked in tomographic data to improve the sensitivity and specificity of lung cancer CT screening significantly. The overall goal of this project is to develop a tensor-based dictionary learning approach for extracting CT imaging biomarkers, and to optimize an artificial neural network that uses these biomarkers to differentiate true/false positive/negative lung CT screening results. The major innovation of this project is to synergistically integrate tensor decomposition, dictionary learning, compressive sensing, low-dose reconstruction, neural networks, supercomputing, and big data mining into a brand-new imaging informatics approach, which can be viewed as “phenome sequencing” by analogy with genome sequencing.
Upon successful completion of this project, the identified imaging biomarkers will have been shown to be instrumental in significantly reducing the false positive rate of lung CT scans while keeping the false negative rate constant. The biomarkers will also help to stage lung cancers accurately and to monitor cancer progression and therapeutic response non-invasively. Equally important is the technical significance of this project: if the approach is established, it will have a lasting impact on the field of imaging informatics at large.
Aims

The general hypothesis behind this project is that new phenotypic information can be unlocked in tomographic data to improve the sensitivity and specificity of lung cancer CT screening significantly. The goal is to develop a tensor-based dictionary learning approach for extracting CT imaging biomarkers, and to optimize an artificial neural network that uses these biomarkers to differentiate true/false positive/negative lung CT screening results. The major innovation of this project is to synergistically integrate tensor decomposition, dictionary learning (DL), compressive sensing (CS), low-dose reconstruction, neural networks, supercomputing, and big data mining into a brand-new imaging informatics approach, which can be viewed as “phenome sequencing” by analogy with genome sequencing. The two specific aims are as follows.
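The proposal does not name the tensor decomposition it will use. As a hedged illustration only, the NumPy sketch below swaps in a truncated higher-order SVD (a Tucker decomposition) and applies it to a small synthetic 3D patch. The helper names (`unfold`, `mode_dot`, `truncated_hosvd`, `tucker_to_tensor`), the 8x8x8 patch size, and the ranks are assumptions, not details from the project.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front and flatten the others."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_dot(tensor, matrix, mode):
    """Mode-n product: contract axis `mode` of `tensor` with the columns of `matrix`."""
    result = np.tensordot(matrix, np.moveaxis(tensor, mode, 0), axes=(1, 0))
    return np.moveaxis(result, 0, mode)

def truncated_hosvd(tensor, ranks):
    """Truncated higher-order SVD (Tucker form): per-mode orthonormal factors and a core."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])  # leading left singular vectors of the unfolding
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_dot(core, u.T, mode)  # project onto each mode's subspace
    return core, factors

def tucker_to_tensor(core, factors):
    """Rebuild the full tensor from a Tucker core and its factor matrices."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_dot(out, u, mode)
    return out

# Hypothetical toy 8x8x8 "patch" with exact multilinear rank (2, 2, 2).
rng = np.random.default_rng(0)
patch = tucker_to_tensor(rng.standard_normal((2, 2, 2)),
                         [rng.standard_normal((8, 2)) for _ in range(3)])
core, factors = truncated_hosvd(patch, ranks=(2, 2, 2))
```

Because the toy patch has exact multilinear rank (2, 2, 2), the 2x2x2 core reproduces it exactly. On a real CT patch, ranks below the full size would give a lossy but compact core, and the reconstruction error would show how much structure the chosen ranks capture.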
Specific Aim 1 – Methodological Development: We will develop tensor-based discriminative dictionary learning and artificial neural network methods to extract CT biomarkers for lung cancer screening.
Task 1.1. Tensor Dictionary Learning: We will use a state-of-the-art tensor decomposition technique for tensor-based discriminative dictionary learning, analyze dictionary-learning-based reconstructions in terms of cancer-relevant dictionary atoms, and treat the atoms' frequencies and relationships as novel phenotypic biomarkers.
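One way to read "frequencies and relationships" of dictionary atoms is sketched below, under several assumptions. The dictionary is taken as already learned (the learning step is not shown), patches are flattened to vectors rather than coded as tensors, orthogonal matching pursuit (OMP) is swapped in as the sparse coder, and pairwise atom co-occurrence stands in for "relationships". The function names are hypothetical. A small orthonormal dictionary keeps the toy example deterministic; a learned dictionary would normally be overcomplete.

```python
import numpy as np

def omp(dictionary, signal, sparsity):
    """Orthogonal matching pursuit: greedily select `sparsity` atoms (columns of
    `dictionary`, assumed unit-norm) and refit their coefficients by least squares."""
    residual = signal.astype(float)
    support = []
    coef = np.zeros(dictionary.shape[1])
    for _ in range(sparsity):
        scores = np.abs(dictionary.T @ residual)
        if support:
            scores[support] = -np.inf  # never reselect an atom already in the support
        support.append(int(np.argmax(scores)))
        sub = dictionary[:, support]
        sol, *_ = np.linalg.lstsq(sub, signal, rcond=None)
        coef[support] = sol
        residual = signal - sub @ sol
    return coef, support

def atom_statistics(dictionary, patches, sparsity):
    """Per-atom usage frequency and pairwise co-occurrence over a set of patches."""
    used = np.zeros((len(patches), dictionary.shape[1]))
    for i, patch in enumerate(patches):
        _, support = omp(dictionary, patch, sparsity)
        used[i, support] = 1.0
    frequency = used.mean(axis=0)               # fraction of patches using each atom
    cooccurrence = used.T @ used / len(patches)  # fraction of patches using both atoms
    return frequency, cooccurrence

# Hypothetical toy data: every patch is built from atoms 0 and 5 only.
rng = np.random.default_rng(0)
dictionary, _ = np.linalg.qr(rng.standard_normal((16, 16)))
patches = [a * dictionary[:, 0] + b * dictionary[:, 5]
           for a, b in rng.uniform(1.0, 2.0, size=(50, 2))]
frequency, cooccurrence = atom_statistics(dictionary, patches, sparsity=2)
```

The resulting `frequency` vector (and, if desired, the upper triangle of `cooccurrence`) could be concatenated into a per-nodule feature vector.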
Task 1.2. Neural Network Optimization: We will use the tensor-based biomarkers defined in Task 1.1 as an integrated input tensor to a multilayer neural network, optimize the network architecture for the highest accuracy and precision in lung cancer CT screening, and accelerate computation using supercomputing techniques.
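Task 1.2 does not fix a network design, so the following is only an assumed stand-in. It is a NumPy two-layer perceptron (tanh hidden layer, sigmoid output, mean binary cross-entropy, full-batch gradient descent) that takes one flattened biomarker vector per case and outputs a probability, for example of a true-positive finding. The function names, hidden size, learning rate, and the synthetic two-class data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(x, y, hidden=16, lr=0.1, epochs=1000, seed=0):
    """Train a one-hidden-layer classifier by full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w1 = rng.normal(0.0, 1.0 / np.sqrt(d), (d, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), hidden)
    b2 = 0.0
    for _ in range(epochs):
        h = np.tanh(x @ w1 + b1)              # hidden activations, shape (n, hidden)
        p = sigmoid(h @ w2 + b2)              # predicted probabilities, shape (n,)
        grad_logit = (p - y) / n              # d(mean BCE)/d(logit)
        grad_w2 = h.T @ grad_logit
        grad_b2 = grad_logit.sum()
        grad_h = np.outer(grad_logit, w2) * (1.0 - h ** 2)  # backprop through tanh
        grad_w1 = x.T @ grad_h
        grad_b1 = grad_h.sum(axis=0)
        w1 -= lr * grad_w1
        b1 -= lr * grad_b1
        w2 -= lr * grad_w2
        b2 -= lr * grad_b2
    return w1, b1, w2, b2

def predict(params, x):
    w1, b1, w2, b2 = params
    return sigmoid(np.tanh(x @ w1 + b1) @ w2 + b2)

# Hypothetical data: 10 biomarker features per case, two well-separated classes.
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(-1.0, 1.0, (100, 10)), rng.normal(1.0, 1.0, (100, 10))])
y = np.concatenate([np.zeros(100), np.ones(100)])
params = train_mlp(x, y)
accuracy = np.mean((predict(params, x) > 0.5) == y)
```

Architecture optimization in the sense of Task 1.2 could wrap `train_mlp` in a search over `hidden` (and network depth), scored on held-out cases rather than on the training accuracy computed here.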
Specific Aim 2 – Clinical Application: We will demonstrate that the identified imaging biomarkers are instrumental in distinguishing true positive from false positive results in lung cancer CT screening.
Task 2.1. Numerical Simulation: The correctness of individual algorithmic components will be verified using synthesized images, and the proposed algorithms will then be accelerated to process the clinical data outlined in Task 2.2. Guided by the clinical co-investigator, Dr. Chiles, we will select representative clinical volumetric lung CT images as the basis for numerically simulating various digital nodules and texture features. The parameters of the multi-objective function minimized for dictionary learning and image reconstruction will also be tuned in extensive numerical tests.
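A digital nodule of the kind Task 2.1 describes can be sketched as a sphere with a smoothly blurred edge added to a CT volume. This is a minimal geometric sketch only. The logistic edge profile, the function name, the background level of about -850 HU for lung parenchyma, the 800 HU nodule contrast, and the noise level are all assumptions, and texture modeling is not attempted.

```python
import numpy as np

def insert_spherical_nodule(volume, center, radius, contrast, edge_width=1.0):
    """Return a copy of `volume` (in HU) with a spherical nodule added.

    The boundary follows a logistic profile of width `edge_width` voxels, so the
    synthetic lesion has no unrealistically sharp edge."""
    zz, yy, xx = np.indices(volume.shape)
    dist = np.sqrt((zz - center[0]) ** 2 + (yy - center[1]) ** 2 + (xx - center[2]) ** 2)
    profile = 1.0 / (1.0 + np.exp((dist - radius) / edge_width))  # ~1 inside, ~0 outside
    return volume + contrast * profile

# Hypothetical background: lung-like HU level plus Gaussian noise.
rng = np.random.default_rng(0)
background = -850.0 + 20.0 * rng.standard_normal((32, 32, 32))
phantom = insert_spherical_nodule(background, center=(16, 16, 16), radius=5.0, contrast=800.0)
```

In practice the background would be a selected clinical volume rather than synthetic noise, and nodule radius, contrast, and edge width could be varied to build a test set with known ground truth.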
Task 2.2. Retrospective Clinical Trial: In the NLST, 26,722 patients underwent three low-dose CT screenings (T0, T1, and T2) at 1-year intervals, resulting in 75,126 screening cases in total. Of these, 17,327 were reported as positive, and 649 cancers were diagnosed among them. In addition, 44 cancers were subsequently confirmed after negative screening results. All the CT images are available to investigators with approved access. We will perform a pilot study on this database and show that the proposed imaging informatics methodology can help differentiate true/false positive/negative lung CT screening results. We plan to study 3,000 datasets; the number of cases may be increased if needed to reach statistical significance (p < 0.05).
Upon successful completion of this project, the identified imaging biomarkers will have been shown to be instrumental in reducing the false positive rate from 22.2% in the NLST screening results to below 15%, while maintaining the false negative rate.
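The 22.2% baseline can be recovered from the NLST counts given in Task 2.2, under two assumptions. First, every positive screen not followed by one of the 649 diagnosed cancers counts as a false positive. Second, the rate is taken over all 75,126 screens; taking it over cancer-free screens only would give about 22.4%. The sketch also assumes the 44 cancers found after negative screens are the only false negatives. The variable names are illustrative.

```python
import math

screens = 75_126          # total NLST low-dose CT screens (T0, T1, T2)
positives = 17_327        # screens reported as positive
true_positives = 649      # cancers diagnosed after a positive screen
false_negatives = 44      # cancers confirmed after a negative screen (assumed to be all FNs)

false_positives = positives - true_positives
fp_rate = false_positives / screens                  # denominator assumed to be all screens
ppv = true_positives / positives                     # positive predictive value
sensitivity = true_positives / (true_positives + false_negatives)

# Largest false-positive count that stays strictly below the 15% target.
max_fp_at_target = math.ceil(0.15 * screens) - 1
fp_to_remove = false_positives - max_fp_at_target

print(f"FP = {false_positives}, FP rate = {fp_rate:.1%}, "
      f"PPV = {ppv:.2%}, sensitivity = {sensitivity:.1%}")
print(f"False positives to remove for <15%: {fp_to_remove}")
```

Under the same assumptions, meeting the <15% target means reclassifying at least 5,410 of the 16,678 false-positive screens as negative without losing any of the 649 true positives.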

Collaborators

Ge Wang, Rensselaer Polytechnic Institute
Caroline Chiles, Wake Forest University Health Sciences