Principal Investigator
Name
Jonas De Almeida
Degrees
Ph.D.
Institution
National Cancer Institute
Position Title
Senior Investigator, Chief Data Scientist
About this CDAS Project
Study
PLCO
Project ID
PLCO-843
Initial CDAS Request Approval
Oct 29, 2021
Title
PLCO Data Modeling as RDF/JSON-LD
Summary
Context

Some of the PLCO GWAS data have recently been wrangled to support the development of HTTP REST APIs at https://exploregwas-dev.cancer.gov/plco-atlas/#/api-access (requires VPN). I am involved in that data modeling process through the development of client SDK libraries such as https://github.com/episphere/plco.

Rationale

Although modeling the whole dataset as a data API would not make sense, the incremental development of a Linked Data model would validate, and greatly facilitate, the exploration of larger non-public data by those authorized to traverse it from CADC's managed data enclave. Accordingly, a formal study of PLCO's data structure is proposed here as a Linked Data exercise within the JSON-LD framework. Specifically (see Aims), the final result would be delivered as a JSON-LD metadata model.
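To make "traversing" a Linked Data model concrete, here is a minimal Python sketch, assuming a small JSON-LD `@graph` whose nodes reference one another by `@id`. The example.org IRIs, the schema.org terms (`Dataset`, `hasPart`, `variableMeasured`, `PropertyValue`), the node layout and the function names are hypothetical placeholders, not PLCO's actual model.

```python
from collections import deque
from typing import Any

# Hypothetical JSON-LD graph; example.org IRIs are placeholders, not real PLCO identifiers.
GRAPH: dict[str, Any] = {
    "@context": {"@vocab": "https://schema.org/"},
    "@graph": [
        {"@id": "https://example.org/plco", "@type": "Dataset",
         "hasPart": [{"@id": "https://example.org/plco/gwas"},
                     {"@id": "https://example.org/plco/followup"}]},
        {"@id": "https://example.org/plco/gwas", "@type": "Dataset",
         "variableMeasured": [{"@id": "https://example.org/plco/gwas/snp"}]},
        {"@id": "https://example.org/plco/followup", "@type": "Dataset",
         "variableMeasured": {"@id": "https://example.org/plco/followup/status"}},
        {"@id": "https://example.org/plco/gwas/snp", "@type": "PropertyValue", "name": "snp"},
    ],
}


def node_refs(value: Any) -> list[str]:
    """Collect @id references from a property value (a single object or a list)."""
    items = value if isinstance(value, list) else [value]
    return [item["@id"] for item in items if isinstance(item, dict) and "@id" in item]


def reachable(graph: dict[str, Any], start: str) -> list[str]:
    """Breadth-first traversal over @id links, returning IRIs in visit order.

    Referenced IRIs without their own node in @graph are still listed,
    but have no outgoing links to follow.
    """
    nodes = {node["@id"]: node for node in graph["@graph"]}
    seen = {start}
    order: list[str] = []
    queue = deque([start])
    while queue:
        iri = queue.popleft()
        order.append(iri)
        for key, value in nodes.get(iri, {}).items():
            if key.startswith("@"):  # skip JSON-LD keywords such as @id and @type
                continue
            for ref in node_refs(value):
                if ref not in seen:
                    seen.add(ref)
                    queue.append(ref)
    return order


if __name__ == "__main__":
    for iri in reachable(GRAPH, "https://example.org/plco"):
        print(iri)
```

Because the traversal follows only `@id` links between descriptions, it runs against the metadata alone, without touching participant-level records.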
Aims

The specific deliverable proposed by this project is a JSON-LD description of PLCO data resources. The specific aim is to script a metadata description engine for a Linked Data model. This follows the same lines as our earlier work on The Cancer Genome Atlas (TCGA) [Robbins 2013, Saleem 2014], but with a more compact descriptor that can be ported without carrying any sensitive information. This new approach is enabled by more recent advances in data modeling, which treat the description of a data resource as its predication by metadata links.
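In its simplest form, a metadata description engine of this kind might read a table and emit only its structure. The sketch below assumes the W3C CSV on the Web (CSVW) metadata vocabulary as the JSON-LD target. That vocabulary is one plausible choice and not necessarily the one this project will adopt. The functions `describe_table` and `infer_datatype`, the file name and the sample column names are all hypothetical.

```python
import json
from typing import Any

# Assumption: CSVW is used as the JSON-LD vocabulary for tabular metadata.
CSVW_CONTEXT = "http://www.w3.org/ns/csvw"


def infer_datatype(values: list[Any]) -> str:
    """Map a column's non-missing values to a CSVW built-in datatype name."""
    present = [v for v in values if v is not None]
    if not present:
        return "string"
    if all(isinstance(v, bool) for v in present):
        return "boolean"
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(v, int) and not isinstance(v, bool) for v in present):
        return "integer"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        return "decimal"
    return "string"


def describe_table(url: str, rows: list[dict[str, Any]]) -> dict[str, Any]:
    """Build a CSVW-style JSON-LD table description from in-memory rows.

    Only column names and inferred datatypes are emitted; cell values are
    inspected for typing but never copied into the output.
    """
    columns = sorted({key for row in rows for key in row})
    return {
        "@context": CSVW_CONTEXT,
        "url": url,
        "tableSchema": {
            "columns": [
                {"name": col, "datatype": infer_datatype([row.get(col) for row in rows])}
                for col in columns
            ]
        },
    }


if __name__ == "__main__":
    # Hypothetical sample rows; column names are placeholders, not PLCO variables.
    sample = [
        {"subject_id": 1, "age": 63.5, "smoker": True},
        {"subject_id": 2, "age": None, "smoker": False},
    ]
    print(json.dumps(describe_table("hypothetical_table.csv", sample), indent=2))
```

The descriptor records column names and coarse datatypes but no cell values. That property is what would let such a descriptor be ported without carrying sensitive information.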

References

Robbins DE, Gruneberg A, Deus HF, Tanik MM, Almeida JS (2013) A Self-Updating Roadmap of The Cancer Genome Atlas. Bioinformatics 29(10):1333-40 [PMID:23595662]

Saleem M, Padmanabhuni SS, Ngomo AN, Iqbal A, Almeida JS, Decker S, Deus HF (2014) TopFed: TCGA Tailored Federated Query Processing and Linking to LOD. J Biomed Semantics [PMID:25937882]

Collaborators

Jonas S Almeida, PhD - Senior Investigator and Chief Data Scientist at NCI/DCEG
Daniel Russ, PhD - Staff Scientist at DCEG's Data Science and Engineering group