
Requests for two kinds of PLCO images can be submitted through the CDAS system, as described below.

Digitized Screening Chest X-ray Section

Chest x-rays were used to screen for lung cancer in the PLCO Trial. Of the 237,000 x-rays taken, approximately 205,000 are available in TIF format. The images were saved with a low-contrast compression technique, so they may appear black to the naked eye, but various image viewing applications can be used to adjust the contrast. Nearly 58,000 participants who completed at least one screen have a digitized chest x-ray image available. NCI currently limits image requests to a subset of 25,000 participants. A standardized selection of 25,000 participants, optimized to meet the requirements of most analyses, is available to streamline the selection process. The selection of participants can be customized when the standard selection does not meet the project requirements.
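The contrast adjustment mentioned above can also be done programmatically. The following is a minimal sketch of a percentile-based linear contrast stretch, assuming the images look black because their pixel values occupy a narrow part of the available range. The function name `stretch_contrast` and its default percentiles are hypothetical, and the pixel values are assumed to have already been read from the TIF file with an image library of your choice.

```python
def stretch_contrast(pixels, low_pct=1.0, high_pct=99.0, out_max=255):
    """Linearly rescale pixel values so the chosen percentiles map to 0..out_max.

    `pixels` is a flat sequence of numeric intensities (assumed to be already
    loaded from the image file). Values outside the percentile window are
    clipped to 0 or out_max.
    """
    ordered = sorted(pixels)
    n = len(ordered)
    # Simple index-based percentiles; the defaults are illustrative only.
    lo = ordered[int((n - 1) * low_pct / 100)]
    hi = ordered[int((n - 1) * high_pct / 100)]
    if hi == lo:
        # A constant image has no contrast to stretch.
        return [0 for _ in pixels]
    scale = out_max / (hi - lo)
    return [min(out_max, max(0, round((p - lo) * scale))) for p in pixels]


# Toy example: intensities crowded into a narrow band are spread over 0..255.
dark = [100, 102, 104, 106, 110]
print(stretch_contrast(dark, low_pct=0, high_pct=100))  # [0, 51, 102, 153, 255]
```

For real images, a percentile window slightly inside the full range (such as the defaults) is usually less sensitive to a few extreme pixels than a plain min/max stretch.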

Standard 25k Selection

The Standard 25k Selection can be downloaded by investigators who have an approved CDAS project and a fully executed Data Transfer Agreement (DTA). It includes:

  • All participants with a T0-T5 lung cancer (n=585) or T0-T5 non-target cancer (n=12)
  • All participants with any abnormal/suspicious screens (n=10,152)
  • Randomly selected participants with any abnormal/not suspicious screens (n=7,781)
  • Randomly selected participants with all negative screens (n=6,470)
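As a quick check of the composition listed above, the sketch below sums the four groups and reports each group's share of the selection. The counts are taken from the list; the dictionary keys are shorthand labels introduced here for illustration.

```python
# Participant counts per group, as listed for the Standard 25k Selection.
strata = {
    "lung cancer (T0-T5)": 585,
    "non-target cancer (T0-T5)": 12,
    "any abnormal/suspicious screen": 10_152,
    "any abnormal/not suspicious screen (random sample)": 7_781,
    "all negative screens (random sample)": 6_470,
}

total = sum(strata.values())
# Share of each group, in percent of the full selection.
shares = {name: round(100 * n / total, 1) for name, n in strata.items()}

print(total)  # 25000
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

Only the last two groups are random samples, which is why sampling weights are needed before drawing conclusions about the full screened population.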

Because the images for 25,000 participants are large (~89,000 images totaling 850 GB), the x-ray images are organized into 12 batches (ranging in size from 30 to 83.5 GB) to facilitate download. By default, the batches are arranged in equally sized training and testing sets of 6 batches each. The training and testing sets were constructed to contain equal numbers of cases and equal numbers of controls.

For investigator convenience, the first batch of the training set and the first batch of the testing set have been set aside as a screen-based case/control set, the Case-Control Subset (~6,000 images totaling ~60 GB). This subset is designed to provide the most relevant lung cancer cases and matched controls at a more manageable size. For every cancer diagnosed within one year of an x-ray image, it includes the x-ray from the year of diagnosis. It also includes 15 control images for every case image: five each from abnormal/suspicious, abnormal/not suspicious, and negative screens. All control images are matched to cases on study year and smoking status at the time of randomization. In detail, the Case-Control Subset includes:

  • Study image from the year of diagnosis of a screen-detected cancer or interval cancer within one year of screening (n=374)
  • For every case, study images from five randomly selected screening years with an abnormal/suspicious screen result (n=1,870)
  • For every case, study images from five randomly selected screening years with an abnormal/not suspicious screen result (n=1,870)
  • For every case, study images from five randomly selected screening years with a negative screen result (n=1,870)
  • Note: For ~6% of study years with an image, participants have multiple chest x-ray images that will be included.
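The counts above follow directly from the matching design: each of the 374 case study images is paired with five control study images from each of the three screen-result categories. A short sketch of that arithmetic, with variable names chosen here for illustration:

```python
CASES = 374                      # case study images (screen-detected or interval cancers)
CONTROLS_PER_CASE = 5            # per screen-result category
CONTROL_CATEGORIES = (
    "abnormal/suspicious",
    "abnormal/not suspicious",
    "negative",
)

controls_by_category = {c: CASES * CONTROLS_PER_CASE for c in CONTROL_CATEGORIES}
total_controls = sum(controls_by_category.values())
total_study_images = CASES + total_controls

print(controls_by_category["negative"])  # 1870
print(total_controls)                    # 5610
print(total_study_images)                # 5984
```

The number of image files can be higher than 5,984 because some study years have more than one chest x-ray image per participant, as noted in the list.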

Sampling weights were calculated so that analyses of the randomly selected individuals and images can be made representative of the larger PLCO screened population. The Standard 25k Selection includes person-based weights, while each image in the Case-Control Subset is assigned a screen-based weight. For more information on the sampling weights, please see the user’s guide.
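To illustrate how such weights are typically used, here is a minimal sketch of a weighted proportion. It assumes each weight is the number of people (or screens) in the screened population that a sampled record stands for; that interpretation, the function name `weighted_proportion`, and the toy records are all assumptions made for illustration. The user's guide remains the authority on how the actual weights are defined and should be applied.

```python
def weighted_proportion(records):
    """Estimate a population proportion from (weight, has_outcome) pairs.

    Each weight is assumed to be the number of population members that the
    sampled record represents (an assumption; check the user's guide).
    """
    total_weight = 0.0
    outcome_weight = 0.0
    for weight, has_outcome in records:
        if weight <= 0:
            raise ValueError("weights must be positive")
        total_weight += weight
        if has_outcome:
            outcome_weight += weight
    if total_weight == 0:
        raise ValueError("no records supplied")
    return outcome_weight / total_weight


# Toy data, not PLCO values: records with the outcome were all sampled
# (weight 1), while records without it were sampled 1 in 4 (weight 4).
sample = [(1.0, True), (1.0, True), (4.0, False), (4.0, False)]
print(weighted_proportion(sample))  # 0.2
```

Without weights, half of the toy records have the outcome; with weights, the estimate drops to 0.2, which shows why oversampled groups should not be analyzed as if they were a simple random sample.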

Custom Selections

Investigators may opt to select their own custom populations, limited to at most 25,000 participants. However, custom selections are available only on a hard drive and cannot be downloaded from the CDAS website. The requestor must bear the expense of the hard drive and all related shipping expenses, including prepaid return shipping. A signature is required on delivery, and the requestor must promptly acknowledge physical possession of the drive. Contact us if you have any questions.

Digital Pathology Images

PLCO pathology images come from select prostate, lung, colorectal, colorectal adenoma, ovarian, female breast, male breast, and bladder cancer patients. They are images of hematoxylin and eosin (H&E) and immunohistochemistry (IHC)-stained slides obtained as part of a pathology specimen collection to construct tissue microarrays (TMAs). The H&E slides (and corresponding images) each came from blocks of tissue resected during diagnosis and treatment of cancer and preserved by pathology labs. Slides were imaged at 40x magnification with standard brightfield settings using the Aperio AT2 whole slide scanner (Leica) and saved in .svs format at the Cancer Genomics Research Laboratory (CGR) at the NCI.

Requests for images are handled at CDAS. If your CDAS request for pathology images is approved and data agreements are in place, you will be connected with staff at CGR who will arrange access to the images via an NIH-approved file share application. Linkage between the images and the PLCO data will be provided in the data delivery package for your project.

The currently available image catalog is summarized below.

PLCO Tissues Slide Image Catalog: 13,165 Images

| Cancer | Tissue Type | Stain Type | Subject Count | Image Count | Folder Size | Comment |
|---|---|---|---|---|---|---|
| Adenoma | Whole Tissue | H&E | 768 | 1103 | 931 GB | |
| Bladder | Whole Tissue | H&E | 285 | 483 | 185 GB | |
| Breast | Whole Tissue | H&E | 1012 | 2909 | 4.74 TB | |
| Breast in situ | Whole Tissue | IHC | 52 | 434 | 152 GB | Counts include non-PLCO IHC control slides |
| Male Breast | Whole Tissue | H&E | 18 | 56 | 95 GB | |
| Colorectal | Whole Tissue | H&E | 749 | 2777 | 4.32 TB | |
| Lung | Whole Tissue | H&E | 492 | 1521 | 1.49 TB | |
| Ovarian | Whole Tissue | H&E | 227 | 874 | 1.67 TB | |
| Prostate | Whole Tissue | H&E | 1095 | 2999 | 8.37 TB | |
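
For storage planning, the folder sizes in the catalog can be added up. The sketch below does so under the assumption that 1 TB = 1000 GB (the catalog does not state which convention it uses); the helper `to_gb` is a hypothetical name introduced here.

```python
# Folder sizes as listed in the catalog.
folder_sizes = {
    "Adenoma": "931 GB",
    "Bladder": "185 GB",
    "Breast": "4.74 TB",
    "Breast in situ": "152 GB",
    "Male Breast": "95 GB",
    "Colorectal": "4.32 TB",
    "Lung": "1.49 TB",
    "Ovarian": "1.67 TB",
    "Prostate": "8.37 TB",
}

# Assumption: decimal units (1 TB = 1000 GB).
GB_PER_UNIT = {"GB": 1, "TB": 1000}


def to_gb(size):
    """Convert a string such as '4.74 TB' to a number of gigabytes."""
    value, unit = size.split()
    return float(value) * GB_PER_UNIT[unit]


total_gb = sum(to_gb(s) for s in folder_sizes.values())
print(f"{total_gb / 1000:.2f} TB")  # 21.95 TB
```

Under that assumption the full pathology catalog occupies roughly 22 TB, with the prostate images alone accounting for more than a third of it.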