Building a Common Data Model for Cancer Research
Principal Investigator
Name
Guoqian Jiang
Degrees
M.D., Ph.D.
Institution
Mayo Clinic
Position Title
Associate Professor of Biomedical Informatics
About this CDAS Project
Study
NLST
Project ID
NLST-285
Initial CDAS Request Approval
Feb 17, 2017
Title
Building a Common Data Model for Cancer Research
Summary
The Observational Health Data Sciences and Informatics (OHDSI) program has been established as a multi-stakeholder, interdisciplinary collaborative to create open-source solutions that bring out the value of observational health data through large-scale analytics. The OHDSI Common Data Model (CDM), originally developed as part of the Observational Medical Outcomes Partnership (OMOP), is a deep information model that specifies how to encode and store clinical data at a fine-grained level, ensuring that the same query can be applied consistently to databases around the world. With the advancement of the OHDSI CDM and its applications, there is an increasing need to leverage promising methods and tools developed in the OHDSI consortium for advanced cancer research. However, the cancer research communities face the following major challenges: 1) current OHDSI vocabulary services do not contain cancer-specific vocabularies (e.g., NCI Thesaurus); 2) the OHDSI CDM does not cover sufficient elements that are required for capturing critical cancer data extracted from unstructured sources by natural language processing (NLP) applications; and 3) the extract, transform, load (ETL) tools are not optimized for translational cancer research. The proposed collaboration will develop a common cancer research clinical data model and associated ETL tools to enable cancer research data to be automatically loaded into an OHDSI data repository.
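As a rough illustration of the ETL idea described in the summary, the following minimal sketch maps a source cancer record into a CDM-style row. It is not the project's tooling. The field names are loosely modeled on the OMOP CDM condition_occurrence table, while the source record layout, the `SOURCE_TO_CONCEPT` lookup, the `transform` function, and the concept ids are hypothetical placeholders chosen for the example.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical lookup from (vocabulary, source code) to a standard concept id.
# The ids below are made-up placeholders, not real OHDSI concept ids.
SOURCE_TO_CONCEPT = {
    ("ICD-O-3", "8140/3"): 900001,  # adenocarcinoma
    ("ICD-O-3", "8070/3"): 900002,  # squamous cell carcinoma
}

# Sentinel used in this sketch to flag a source code with no mapping.
UNMAPPED = 0


@dataclass
class ConditionRow:
    """One CDM-style condition row (a subset of fields, for illustration)."""
    person_id: int
    condition_concept_id: int
    condition_start_date: date
    condition_source_value: str


def transform(record: dict) -> ConditionRow:
    """Map one source record (assumed layout) to a CDM-style row."""
    key = (record["vocabulary"], record["code"])
    return ConditionRow(
        person_id=record["patient_id"],
        condition_concept_id=SOURCE_TO_CONCEPT.get(key, UNMAPPED),
        condition_start_date=date.fromisoformat(record["diagnosis_date"]),
        # Keep the original code so unmapped rows can be reviewed later.
        condition_source_value=record["code"],
    )


# Example source records in the assumed layout.
source_records = [
    {"patient_id": 1, "vocabulary": "ICD-O-3", "code": "8140/3",
     "diagnosis_date": "2005-03-14"},
    {"patient_id": 2, "vocabulary": "ICD-O-3", "code": "9999/9",
     "diagnosis_date": "2006-11-02"},
]

rows = [transform(r) for r in source_records]

# Once every dataset shares the same row shape and concept ids,
# the same query can be run unchanged against each of them.
adenocarcinoma_patients = [
    r.person_id for r in rows if r.condition_concept_id == 900001
]
unmapped_codes = [
    r.condition_source_value for r in rows
    if r.condition_concept_id == UNMAPPED
]
```

The lookup table stands in for a vocabulary service: extending it with cancer-specific codes is what lets records that would otherwise be unmapped participate in shared queries.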
Aims
Aim 1: Examine metadata associated with existing cancer datasets
Aim 2: Create methods for enhancing the OHDSI CDM with cancer-specific vocabularies and NLP elements
Aim 3: Create tools for loading the enhanced OHDSI CDM with both structured and unstructured cancer data
Aim 4: Evaluate system utility using the NLST datasets
Collaborators
Dr. Chen Wang, Mayo Clinic