Genome-Wide Association Studies (GWAS)
Every PLCO participant who consented to genetic testing and had a usable biospecimen available has been included in various Genome-Wide Association Studies. A harmonized GWAS dataset containing ~110,000 participants, called the "Total GWAS Set," is strongly recommended for secondary GWAS analyses. These genetic data may be requested through the database of Genotypes and Phenotypes (dbGaP) website. The dbGaP application includes verification by the institution that the investigators are full-time employees, that the IT system meets specific standards for the protection of sensitive data, that any posted embargo will be respected, and that the research will conform to standard protections of privacy and confidentiality. The investigator must also describe appropriate research goals.
The GWAS Explorer website contains additional information and is an interactive resource for any researchers interested in the Total GWAS Set. Summary statistics of GWAS findings for selected phenotypes may be browsed and downloaded, and the site can generate a variety of plots specified by the user.
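As an illustration of working with downloaded summary statistics, the following is a minimal sketch of filtering a table for genome-wide significant variants. The column names (`variant`, `chromosome`, `position`, `p_value`), the tab-separated layout, and the example rows are all assumptions made for this sketch and are not taken from the GWAS Explorer; check the actual download format before adapting it. The 5e-8 cutoff is the conventional genome-wide significance threshold, not a PLCO-specific rule.

```python
import csv
import io

# Conventional genome-wide significance threshold (an assumption here,
# not a value specified by PLCO or the GWAS Explorer).
GENOME_WIDE_P = 5e-8

# Made-up example rows with hypothetical column names; a real download
# will have its own header and many more rows.
SUMMARY_TSV = (
    "variant\tchromosome\tposition\tp_value\n"
    "rs0001\t1\t10500\t3.2e-9\n"
    "rs0002\t1\t20750\t0.04\n"
    "rs0003\t2\t5100\t4.9e-8\n"
    "rs0004\t2\t9900\t6.0e-8\n"
)


def significant_variants(tsv_text, threshold=GENOME_WIDE_P):
    """Return rows with p_value below the threshold, smallest p first."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    hits = [row for row in reader if float(row["p_value"]) < threshold]
    return sorted(hits, key=lambda row: float(row["p_value"]))


hits = significant_variants(SUMMARY_TSV)
for row in hits:
    print(row["variant"], row["chromosome"], row["position"], row["p_value"])
```

To run this against a real file, the text could be read with `open(path).read()` and passed to `significant_variants`, after adjusting the column names to match the downloaded header.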
Several legacy GWAS datasets are also archived on dbGaP should they be needed. Each of these datasets contains fewer participants than the Total GWAS Set, and they often include data from non-PLCO cohorts and were genotyped on older platforms. As noted, it is suggested that the Total GWAS Set be requested for most analyses, even if the population of interest is a small subset of the ~110,000 participants.
GWAS data from PLCO samples are made available under established NIH and NCI data policies, which are intended to enhance access, maximize scientific use, adhere to ethical guidelines, ensure fair play, and protect stakeholders. Clear guidelines can also help avoid duplicative studies and enhance collaborative opportunities by providing a transparent approval process and making information on existing studies available.
All posted PLCO GWAS data have limited covariates associated with the genotypes. Individuals who receive GWAS data do not automatically receive access to other covariate data. A more comprehensive set of covariates is available only through the CDAS website by submitting a project proposal. Approval can be gained through the PLCO Data-Only project process, or through the EEMS application process if biospecimens are also requested. After approval is obtained, a dataset will be created for the investigator containing the requested covariates and an ID linkage to the GWAS data. Any attempt to identify individuals for other linkage to preexisting PLCO datasets will be considered a violation of the Data Use Certification that is agreed upon for the release of genotype data. These violations are taken very seriously by NCI and NIH and may lead to censure and removal of funding.
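Once an approved covariate dataset with its ID linkage has been received, joining it to the GWAS sample list is a simple keyed merge. The sketch below assumes a shared column named `link_id`, a `gwas_sample_id` column, and the covariates `age` and `sex`; all of these names, the ID formats, and the example rows are hypothetical and will differ in the delivered files. It uses only the linkage supplied with the approved dataset and does not attempt any other form of re-identification.

```python
import csv
import io

# Made-up example contents with hypothetical column names and ID formats.
COVARIATES_CSV = """link_id,age,sex
A001,62,F
A002,58,M
A003,70,F
"""

GWAS_SAMPLES_CSV = """link_id,gwas_sample_id
A001,S-17
A003,S-42
A004,S-99
"""


def read_rows(text, key):
    """Parse CSV text into a dict of rows keyed by the given column."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def link_covariates(cov_text, gwas_text, key="link_id"):
    """Inner-join covariate rows to GWAS sample rows on the shared ID."""
    covariates = read_rows(cov_text, key)
    samples = read_rows(gwas_text, key)
    linked = []
    # Keep only IDs present in both files, in a stable sorted order.
    for pid in sorted(covariates.keys() & samples.keys()):
        merged = dict(samples[pid])
        merged.update(covariates[pid])
        linked.append(merged)
    return linked


linked_rows = link_covariates(COVARIATES_CSV, GWAS_SAMPLES_CSV)
for row in linked_rows:
    print(row)
```

An inner join is used so that participants missing from either file are dropped rather than carried forward with empty fields; whether that is appropriate depends on the analysis.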