Risk analysis of prostate cancer occurrence based on PLCO data
Additionally, since prostate cancer seems to be less frequent in South-East Asia (China especially) than in Western countries, and the most obvious difference between the two regions is dietary habits, I would like to test whether the dietary and diet history data sets show any significant correlation with the main prostate cancer data set.
Also, using the ancillary data, I would like to check for any trends that may have emerged since patients began their participation in the study. Because PSA level was also measured during the study and is a commonly used indicator in prostate cancer diagnostics, I would like to include this variable as precisely as I can in any model that arises from my project. The controversy surrounding PSA-based diagnosis makes this variable even more interesting than any other correlation that can be identified in the prostate cancer data set.
Data mining techniques will be applied in order to identify any potential trends and dependencies that might exist in the prostate cancer data sets. An attempt will also be made to create a model that explains the observations included in the data sets (decision trees, neural networks). The results of my project will be summarized in a thesis concluding my postgraduate study in statistical analysis and data mining at Warsaw School of Economics, Poland.
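As a rough illustration of the decision tree approach mentioned above, the sketch below finds the single best threshold split on one numeric variable using Gini impurity, which is the basic step a decision tree repeats at every node. It is only a sketch under stated assumptions: the variable names `psa` and `diagnosed` are hypothetical, the values are made up for the example and are not PLCO data, and the toy sample is deliberately clean, which real data will not be.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best rule `value <= threshold`.

    If no split improves on the unsplit impurity, the threshold is None.
    """
    best = (None, gini(labels))
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        t = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best[1]:
            best = (t, weighted)
    return best


# Made-up toy values for illustration only; NOT taken from the PLCO data sets.
psa = [0.8, 1.2, 2.5, 3.9, 4.6, 6.1, 7.4, 9.0]   # hypothetical PSA levels
diagnosed = [0, 0, 0, 0, 1, 1, 1, 1]             # hypothetical outcome labels

threshold, impurity = best_split(psa, diagnosed)
print(f"best split: psa <= {threshold:.2f}, weighted Gini = {impurity:.3f}")
```

A full decision tree would apply `best_split` recursively to each resulting subgroup and across all candidate variables (for example dietary items alongside PSA level); in practice an established library implementation would normally be used instead of hand-written code.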
individual work