Principal Investigator
Name
Mohammed Eslami
Degrees
Ph.D.
Institution
Netrias, LLC
Position Title
Chief Scientist
Email
About this CDAS Project
Study
PLCO
Project ID
PLCO-1789
Initial CDAS Request Approval
Jan 13, 2025
Title
Automated Data Harmonization
Summary
PLCO questionnaires give respondents the opportunity to answer certain questions with manually entered text. The MUQ is one of the few questionnaires in which the majority of responses are entered manually. The data consist primarily of drug names. We want to determine whether we can develop a language model to harmonize those entries automatically.
Aims

Aim 1: Conduct an exploratory analysis of the responses provided for the medication list to identify variations (e.g., typos and specific acronyms) commonly used for drug names.

Aim 2: Develop a set of data generators to produce identified variations using the therapeutic_agent entity in the NCI Thesaurus.

Aim 3: Train and evaluate a harmonization model with the data generators to semi-automatically standardize responses for medication names.
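
The workflow in Aims 2 and 3 can be illustrated with a minimal sketch. This is not the project's method. The vocabulary `CANONICAL`, the functions `make_typo`, `generate_pairs`, and `harmonize`, and the 0.6 similarity cutoff are all hypothetical. A simple string-similarity matcher from Python's standard library stands in for the proposed language model, and a short hard-coded list stands in for terms drawn from the NCI Thesaurus.

```python
import difflib
import random

# Hypothetical canonical vocabulary; the project would draw terms from the NCI Thesaurus.
CANONICAL = ["aspirin", "ibuprofen", "acetaminophen", "metformin"]


def make_typo(name, rng):
    """Return `name` with one random character edit: delete, swap, or substitute.

    The result may occasionally equal the input (e.g. swapping two identical letters).
    """
    i = rng.randrange(len(name) - 1)
    kind = rng.choice(["delete", "swap", "substitute"])
    if kind == "delete":
        return name[:i] + name[i + 1:]
    if kind == "swap":
        return name[:i] + name[i + 1] + name[i] + name[i + 2:]
    return name[:i] + rng.choice("abcdefghijklmnopqrstuvwxyz") + name[i + 1:]


def generate_pairs(names, n_per_name, seed=0):
    """Produce (variant, canonical) pairs that could serve as synthetic training data."""
    rng = random.Random(seed)
    return [(make_typo(n, rng), n) for n in names for _ in range(n_per_name)]


def harmonize(entry, vocabulary, cutoff=0.6):
    """Map a free-text entry to its closest canonical name, or None if nothing is close.

    Returning None leaves the entry for manual review, which is one way to read
    "semi-automatic"; the cutoff value is an assumption, not a tuned threshold.
    """
    cleaned = entry.lower().strip()
    matches = difflib.get_close_matches(cleaned, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

In this sketch, `harmonize("asprin", CANONICAL)` resolves to `"aspirin"`, while an unrecognizable entry returns `None` and would be routed to a human reviewer. The pairs from `generate_pairs` could be used to measure how often a candidate model recovers the canonical name from a synthetic variant.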

Collaborators

NA