Principal Investigator
Aakanksha Sanctis
B.E. Computer Science, MSc Economics
Maastricht University, Institute of Data Science
Position Title
Research Intern
About this CDAS Project
NLST (Learn more about this study)
Project ID
Initial CDAS Request Approval
Oct 24, 2019
Application and evaluation of Deep Learning to Predict tumors in medical images annotated using crowdsourcing
Lung cancer is one of the deadliest cancers in the world, claiming over 2.5 million lives every year. If diagnosed early, it can potentially be treated with chemotherapy, immunotherapy, and other systemic anti-cancer therapies. However, successful diagnosis requires an expert to identify a tumor on the scan. This does not scale: as the volume of clinical image data grows, expert analysis becomes expensive and time consuming. We therefore need a scalable, affordable, and faster means of annotating large numbers of tumor images. With the emergence of deep learning architectures, many computer-aided diagnosis (CAD) systems have been developed to automatically detect and identify tumors in CT scans. The main problem with CAD systems for tumor detection is that the datasets available for training them are quite small. This may cause deep learning models to overfit and yield lower accuracy than could be achieved with a larger dataset. Tumors are highly variable in density, shape, size, and other features, so training on a larger dataset may capture a wider variety of tumors and make the CAD system more inclusive. But curating a medical image dataset is expensive and time consuming, because it requires a team of experts to annotate it. We therefore propose a crowdsourcing approach in which non-experts annotate the dataset for our research.

The pipeline that we are proposing can be divided into two main stages:
• First, we implement an effective method of crowdsourcing annotations of our CT scan images (2D slices of CT scans) from non-experts on a crowdsourcing platform such as Amazon Mechanical Turk. Appropriate feedback mechanisms and trust measures will be applied to ensure that the crowdsourced annotations are reliable.
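
This summary does not specify which trust measures will be used. As a minimal sketch of one common option, assuming a small set of expert-labeled "gold" slices is mixed into the crowdsourcing tasks, each worker's trust can be estimated from their accuracy on those slices and then used to weight their votes. The function names, label strings, and the default trust of 0.5 for unseen workers are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def estimate_trust(gold, answers):
    """Fraction of gold-standard slices each worker labeled correctly.

    gold:    dict slice_id -> known label (assumed expert-provided)
    answers: list of (worker_id, slice_id, label)
    """
    correct = defaultdict(int)
    seen = defaultdict(int)
    for worker, slice_id, label in answers:
        if slice_id in gold:
            seen[worker] += 1
            correct[worker] += int(label == gold[slice_id])
    return {w: correct[w] / seen[w] for w in seen}


def aggregate(votes, trust, default_trust=0.5):
    """Trust-weighted vote over the labels given to one slice.

    votes: list of (worker_id, label)
    Returns (winning_label, share_of_total_weight),
    or (None, 0.0) when there is no weight to count.
    """
    weight = defaultdict(float)
    for worker, label in votes:
        # Workers without a trust estimate get the assumed default weight.
        weight[label] += trust.get(worker, default_trust)
    total = sum(weight.values())
    if total == 0:
        return None, 0.0
    best = max(weight, key=weight.get)
    return best, weight[best] / total
```

The returned weight share can serve as a rough reliability signal: slices whose winning label holds only a narrow share of the weight are candidates for re-annotation or expert review.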

• The main research outcome will be an evaluation of the feasibility of using deep learning architectures, and of combining humans and machines, for lung cancer classification tasks. We also expect to find the inflection points at which machine output outperforms human output on specific lung cancer images. This will shed light on which subsets of annotations are easier for humans and which are easier for the machine.
• Another output will be an annotated corpus of the curated lung cancer images.
• This project will reveal strengths and weaknesses of such models, which may open new areas of research on involving humans in the loop in annotation tasks (e.g. text, images, videos). This research will be reported in a scientific article for submission to a journal or conference such as the Human Computation and Crowdsourcing conference or the Human Computation journal.
• The end product will be a reusable, generalized software suite that enables users to apply active learning to their own use cases.
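
The summary does not say which active learning strategy the software suite will use. As an illustrative sketch only, assuming a binary tumor / no-tumor classifier that outputs a probability per slice, uncertainty sampling sends the slices the model is least sure about to human annotators first. The function names and slice identifiers below are hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy (in bits) of a predicted tumor probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def select_for_annotation(predictions, budget):
    """Pick the `budget` slices with the most uncertain predictions.

    predictions: dict slice_id -> model's P(tumor)
    Returns slice ids ordered from most to least uncertain.
    """
    ranked = sorted(
        predictions,
        key=lambda slice_id: binary_entropy(predictions[slice_id]),
        reverse=True,
    )
    return ranked[:budget]
```

In a loop of this kind, the selected slices would be annotated by the crowd, added to the training set, and the model retrained before the next selection round.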


Dr. Amrapali Zaveri, PostDoc, Maastricht University, Institute of Data Science, Netherlands
Dr. Deniz Iren, PostDoc, Open Universiteit, Netherlands