Principal Investigator
Name
Mengjie Shen
Degrees
Master of Information Technology
Institution
The University of Auckland
Position Title
Graduate Teaching Assistant
About this CDAS Project
Study
NLST
Project ID
NLST-431
Initial CDAS Request Approval
Jul 31, 2018
Title
Computerized Early Diagnosis of Lung Cancer
Summary
Use the NLST participant dataset to build a model that combines screening results, smoking history, disease history, alcohol use, and family history. Given an individual's information, the model should return a reliable diagnosis.

This study will be conducted using four analytics solutions:
1. IBM Software Analytics Solution (SPSS Modeler).
2. Microsoft Software Analytics Solution (Microsoft SQL Server, Azure Machine Learning, and Power BI).
3. Open Source Analytics Solution (Python, Jupyter, MySQL, MySQL Workbench, Kettle/Spoon, Tableau, and Weka).
4. Big Data Analytics Solution (AWS, Jupyter, PySpark, Spark).

The modelling methods used in this study will be four classification algorithms:
1. IF-THEN rules
2. Decision tree
3. Bayesian classifiers
4. Neural networks
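
As an illustration of one of the listed methods, the following is a minimal sketch of a Bayesian (naive Bayes) classifier over categorical attributes, written in plain Python. It is not the project's actual model. The feature names (smoking status, family history, screening result), the labels, and every record in TRAIN are hypothetical, invented for this example, and are not drawn from the NLST dataset. Laplace smoothing is an assumed design choice so that unseen attribute values do not produce zero probabilities.

```python
from collections import Counter, defaultdict
import math

# Synthetic toy records, invented purely for illustration (NOT NLST data).
# Each record: ((smoking_status, family_history, screening_result), label)
TRAIN = [
    (("current", "yes", "positive"), "cancer"),
    (("current", "no", "positive"), "cancer"),
    (("former", "yes", "positive"), "cancer"),
    (("current", "yes", "negative"), "no_cancer"),
    (("former", "no", "negative"), "no_cancer"),
    (("never", "no", "negative"), "no_cancer"),
    (("never", "yes", "negative"), "no_cancer"),
    (("former", "no", "positive"), "no_cancer"),
]


def fit(records):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(label for _, label in records)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    vocab = defaultdict(set)             # feature index -> distinct values seen
    for features, label in records:
        for i, value in enumerate(features):
            value_counts[(i, label)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab


def predict_proba(model, features):
    """Return a dict mapping each label to its posterior probability."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        for i, value in enumerate(features):
            # Laplace (add-one) smoothed conditional probability.
            numerator = value_counts[(i, label)][value] + 1
            denominator = n + len(vocab[i])
            score += math.log(numerator / denominator)
        log_scores[label] = score
    # Normalise in a numerically stable way.
    top = max(log_scores.values())
    unnormalised = {k: math.exp(s - top) for k, s in log_scores.items()}
    z = sum(unnormalised.values())
    return {k: v / z for k, v in unnormalised.items()}


def predict(model, features):
    """Return the most probable label for one individual's attributes."""
    probs = predict_proba(model, features)
    return max(probs, key=probs.get)


model = fit(TRAIN)
print(predict(model, ("current", "yes", "positive")))
print(predict_proba(model, ("never", "no", "negative")))
```

A real analysis would need far more records, a held-out evaluation set, and careful handling of the low prevalence of cancer among screened participants. The sketch only shows how per-attribute evidence is combined into a single prediction.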
Aims

- Diagnose lung cancer at screening, before severe symptoms appear.
- Potentially increase the proportion of lung cancer patients who are diagnosed early.

Collaborators

Project supervisor: David Sundaram