Principal Investigator
Name
Mohammed Eslami
Degrees
Ph.D.
Institution
Netrias, LLC
Position Title
Chief Scientist
About this CDAS Project
Study
PLCO
Project ID
PLCO-1789
Initial CDAS Request Approval
Jan 13, 2025
Title
Automated Data Harmonization
Summary
PLCO questionnaires allow respondents to answer certain questions in free text. The MUQ is one of the few in which the majority of responses are manually entered. The data consist primarily of drug names. We want to see whether we can develop a language model to automatically harmonize those entries.
Aims

Aim 1: Conduct an exploratory analysis of responses provided for the medication list to identify variations (e.g., typos, specific acronyms) often used for drug names.

Aim 2: Develop a set of data generators to produce identified variations using the therapeutic_agent entity in the NCI Thesaurus.

Aim 3: Train and evaluate a harmonization model with the data generators to semi-automatically standardize responses for medication names.
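
As an illustration only, the pipeline outlined in the aims might be sketched as follows. This is not the project's method: the drug list, the function names (make_typo, generate_pairs, harmonize), and the 0.6 similarity cutoff are all hypothetical, and plain string similarity from Python's standard difflib module stands in for the language model the project proposes to train. The generator produces (noisy entry, canonical name) pairs of the kind Aim 2 describes, and the matcher returns None when no candidate is close enough, so that such entries can go to manual review (the "semi-automatic" part of Aim 3).

```python
import difflib
import random

# Hypothetical canonical names; the project would draw these from the
# therapeutic_agent entity in the NCI Thesaurus instead.
CANONICAL = ["aspirin", "ibuprofen", "metformin", "lisinopril"]


def make_typo(name, rng):
    """Return `name` with one random edit: deletion, adjacent swap, or substitution."""
    i = rng.randrange(len(name) - 1)  # position i and i + 1 are both valid indices
    kind = rng.choice(["delete", "swap", "substitute"])
    if kind == "delete":
        return name[:i] + name[i + 1:]
    if kind == "swap":
        return name[:i] + name[i + 1] + name[i] + name[i + 2:]
    return name[:i] + rng.choice("abcdefghijklmnopqrstuvwxyz") + name[i + 1:]


def generate_pairs(names, n_per_name, seed=0):
    """Build (noisy entry, canonical name) pairs for training or evaluation."""
    rng = random.Random(seed)
    return [(make_typo(n, rng), n) for n in names for _ in range(n_per_name)]


def harmonize(entry, names, cutoff=0.6):
    """Map a free-text entry to its closest canonical name, or None for manual review.

    Baseline only: difflib string similarity, not a trained model.
    """
    cleaned = entry.strip().lower()
    matches = difflib.get_close_matches(cleaned, names, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    pairs = generate_pairs(CANONICAL, n_per_name=3)
    correct = sum(harmonize(noisy, CANONICAL) == truth for noisy, truth in pairs)
    print(f"{correct}/{len(pairs)} synthetic entries mapped to the right name")
    print(harmonize("Asprin", CANONICAL))
```

A baseline like this mainly gives a reference point: a trained model would need to outperform it on variations that string similarity handles poorly, such as acronyms and brand names.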

Collaborators

NA