Principal Investigator
Timothy Rebbeck
Dana-Farber Cancer Institute & Harvard School of Public Health
Position Title
Vincent L. Gregory, Jr. Professor of Cancer Prevention
About this CDAS Project
PLCO
Project ID
Initial CDAS Request Approval
Sep 27, 2019
Self-Identified Race and Ethnicity: Ancestry and Geospatial Context
Self-identified race and ethnicity (SIRE) is a complex, multifactorial trait. SIRE is inherently a social construct, defined by the individual based on their personal phenotype, history, behavior, and experiences. SIRE is often used as a comparator in studies of health disparities, yet the trait itself is often poorly defined and understood. A better understanding of SIRE may be of value in defining groups that require specific interventions or have unique risk profiles in studies of health disparities.

Currently, most health disparities studies rely on SIRE as the basis for comparison between groups who may have different disease risks or outcomes. This approach has been extremely valuable and robust in identifying health disparities. However, the use of SIRE to eliminate health disparities has met with limited success because SIRE-based groups are heterogeneous with respect to disease etiology. Without a more precise definition of the groups that experience health disparities, it is questionable whether elimination of health disparities may be completely achieved.

A number of correlates of SIRE exist that may be used to better understand the nature of this complex trait. First, SIRE is correlated with the social, behavioral, cultural, and historical environment in which an individual lives. Second, SIRE is correlated with ancestral (e.g., continental) origin. Inherent to this ancestral origin are specific patterns of genomic architecture, including mutational frequency, haplotype block structure, and other features of the individual's genome that have arisen through evolutionary history. These non-genetic and genetic factors are also correlated with one another in complex ways. Both classes of factors are also strongly correlated with individual predisposition to develop disease, as well as with response to prevention and treatment strategies related to the disease.

Because of the complex nature of SIRE, a multilevel approach may be required to understand the relationships among underlying factors associated with SIRE, and how they may be used to define groups who may have different disease risks or responses to prevention and treatment. The goal of this approach is to disentangle the complex relationships among genetic and non-genetic factors correlated with SIRE. With an improved understanding of the underlying correlates of SIRE, novel metrics could be constructed that better capture SIRE. These metrics may then be used to define the population groups who are at greatest risk of developing disease or experiencing unfavorable outcomes, and who differ in response to prevention and treatment modalities and in other disease endpoints. As a pilot study, we will assess the feasibility of incorporating genomic ancestry and neighborhood contextual measures correlated with SIRE to develop a mortality risk stratification model in men in the PLCO trial.

In self-identified African Americans, evaluate the relationship of genomic ancestry with neighborhood-level data through the following aims:
1) The proportion of African genomic ancestry differs by residential characteristics related to socio-economic characteristics, including education, wealth, and income.
2) The proportion of African genomic ancestry differs by residential characteristics related to segregation and discrimination.
3) The proportion of African genomic ancestry differs by residential characteristics related to health care access, including distance, transportation and insurance.
4) Apply methods for studying multiple correlated environmental factors to develop models for mortality risk stratification based on genomic ancestry and neighborhood correlates of SIRE.