Principal Investigator
Name
Yan Li
Degrees
Ph.D.
Institution
University of Michigan
Position Title
Postdoctoral Fellow
Email
About this CDAS Project
Study
NLST (Learn more about this study)
Project ID
NLST-326
Initial CDAS Request Approval
Jul 6, 2017
Title
Precision medicine via survival analysis
Summary
Precision medicine has recently become a national effort that aims to provide personalized prevention methods and treatment strategies. It has several characteristics that challenge state-of-the-art machine learning and data mining methods, including the following: (i) although the last few years have seen an explosive increase in the volume, variety, and veracity of healthcare data, the data remain insufficient to build a robust personalized prediction model; (ii) precision medicine commonly requires integrating multiple heterogeneous sources of information, e.g., molecular biology, medical image diagnosis, and laboratory diagnosis, to make accurate decisions; (iii) models must handle both static and time-dependent information. Overcoming these challenges opens up several exciting new research directions.
Aims

1) To address the first challenge above, we can cluster patients into several subgroups and build survival prediction and/or diagnosis models for each subgroup that shares a common susceptibility to a particular disease, rather than targeting a single patient. Specifically, a latent class survival regression model can be designed to simultaneously cluster patients into subgroups and train one accurate survival prediction model per subgroup. This model can be trained via the expectation-maximization (EM) algorithm or tensor-decomposition-based methods.
2) Integrating medical image information with other clinical information is a real challenge, not only because the data sources are heterogeneous but also because medical image information usually contains far more features than other clinical information. The integration method should therefore prevent the clinical information from being overwhelmed. To achieve this, we can use representation learning methods to map the raw multi-source data into intermediate representations that preserve the properties of each data source and can be easily combined. Alternatively, we can build a survival prediction model for each data source and then integrate the learned models to obtain the final prediction.
3) In healthcare analysis, data are typically longitudinal, with both time-dependent features, e.g., blood pressure and blood glucose, and static features, e.g., race and sex. To encode both time-dependent and static features in a survival prediction model, we can structure the time-dependent features as a third-order tensor with modes sample × feature × time. The temporal smoothness and concept drift of the time-dependent features can then be efficiently encoded by adding regularization terms, such as the fused lasso and the trace norm, on the slices of the third-order tensor.
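
As an illustration of the third aim, the two regularization terms it names can be sketched on a small sample-by-feature-by-time tensor. This is only one possible reading of the aim, not the project's implementation: the function names, the choice to penalize differences between adjacent time slices, and the toy values are all assumptions made for the example.

```python
import numpy as np


def fused_lasso_penalty(X):
    """Sum of absolute differences between adjacent time slices.

    X is assumed to have shape (samples, features, time). Small values
    mean the features change smoothly from one time step to the next.
    """
    return np.abs(np.diff(X, axis=2)).sum()


def slice_trace_norm(X):
    """Sum of trace (nuclear) norms of the sample-by-feature slice at each time step."""
    return sum(np.linalg.norm(X[:, :, t], ord="nuc") for t in range(X.shape[2]))


# Hypothetical toy tensor: 2 samples, 2 features, 3 time steps.
X = np.zeros((2, 2, 3))
X[:, :, 0] = [[1.0, 0.0], [0.0, 1.0]]
X[:, :, 1] = [[1.0, 0.0], [0.0, 1.0]]
X[:, :, 2] = [[2.0, 0.0], [0.0, 1.0]]

smooth = fused_lasso_penalty(X)   # only one entry changes, between steps 2 and 3
low_rank = slice_trace_norm(X)    # sum of singular values over the three slices
```

In a full model, terms like these would be added, each with its own weight, to a survival loss; the weights and the loss itself are left open here.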

Collaborators

Jieping Ye, University of Michigan
Jiayu Zhou, Michigan State University
Lu Wang, Wayne State University