PLCO-1338: Investigation into the Effect of Dimensionality Reduction Techniques in Machine … - Approved Projects

Principal Investigator

Name

Oluwademilade Adeyemo

Degrees

M.Sc

Institution

University of Hull

Position Title

Student

alexadeyemo@gmail.com

About this CDAS Project

Study

PLCO

Project ID

PLCO-1338

Initial CDAS Request Approval

Sep 28, 2023

Title

Investigation into the Effect of Dimensionality Reduction Techniques in Machine Learning Algorithms

Summary

Machine learning's efficacy relies heavily on the quality and quantity of features within a dataset. However, as the number of features grows, so do the computational burden and the risk of overfitting. Dimensionality reduction (DR) is a key tool for addressing these challenges. This research investigates how various DR methods influence machine learning model performance.

The research begins by reviewing existing literature and drawing insights from seminal works. This foundational knowledge will serve as a benchmark for evaluating our findings. DR techniques will then be applied to the dataset, and classification models, including Decision Trees, Support Vector Machines, and Naive Bayes, will subsequently be trained on the reduced data.

The key objective is to understand how DR affects model performance in terms of accuracy, generalization, and computational efficiency. Does it genuinely improve these crucial aspects? The research aims to determine whether DR strikes a worthwhile balance between improved efficiency and the potential loss or distortion of data.

Autoencoders, a neural-network-based alternative to conventional DR methods such as PCA and t-SNE, will play a prominent role in our investigation. An autoencoder is trained to reproduce its own input: an encoder compresses each sample into a low-dimensional latent representation, and a decoder maps that representation back to the original feature space. Because the encoder can learn non-linear mappings, the latent features can capture complex patterns in compressed form, which we believe can enhance data quality.
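
To make the encoder/decoder idea concrete, here is a minimal NumPy sketch, assuming a single hidden layer with a tanh encoder, a linear decoder, and plain full-batch gradient descent on mean squared reconstruction error. The function names (encode, reconstruct, train_autoencoder), layer sizes, learning rate, and synthetic data are hypothetical illustrations, not the project's actual architecture; in practice a framework such as PyTorch or Keras would typically be used.

```python
# Illustrative sketch only: function names, hyperparameters and data are hypothetical.
import numpy as np


def encode(X, params):
    """Map inputs to the compressed latent representation (tanh encoder)."""
    return np.tanh(X @ params["W_enc"] + params["b_enc"])


def reconstruct(X, params):
    """Decode the latent representation back into the original feature space."""
    return encode(X, params) @ params["W_dec"] + params["b_dec"]


def train_autoencoder(X, latent_dim, epochs=3000, lr=0.01, seed=0):
    """Fit a one-hidden-layer autoencoder by full-batch gradient descent on the
    mean (over samples) squared reconstruction error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    params = {
        "W_enc": rng.normal(scale=0.1, size=(d, latent_dim)),
        "b_enc": np.zeros(latent_dim),
        "W_dec": rng.normal(scale=0.1, size=(latent_dim, d)),
        "b_dec": np.zeros(d),
    }
    for _ in range(epochs):
        Z = encode(X, params)                               # forward: encoder
        X_hat = Z @ params["W_dec"] + params["b_dec"]       # forward: decoder
        g_out = 2.0 * (X_hat - X) / n                       # dLoss/dX_hat
        g_hid = (g_out @ params["W_dec"].T) * (1.0 - Z**2)  # backprop through tanh
        grads = {
            "W_dec": Z.T @ g_out,
            "b_dec": g_out.sum(axis=0),
            "W_enc": X.T @ g_hid,
            "b_enc": g_hid.sum(axis=0),
        }
        for name, grad in grads.items():
            params[name] -= lr * grad                       # gradient-descent step
    return params


# Synthetic data: 10 observed features driven by 2 hidden factors plus noise.
rng = np.random.default_rng(42)
factors = rng.normal(size=(300, 2))
X = factors @ rng.normal(size=(2, 10)) + 0.1 * rng.normal(size=(300, 10))
X = (X - X.mean(axis=0)) / X.std(axis=0)                    # standardise each feature

params = train_autoencoder(X, latent_dim=2)
Z = encode(X, params)                                       # 300 x 2 reduced features
mse = np.mean(np.sum((reconstruct(X, params) - X) ** 2, axis=1))
baseline = np.mean(np.sum(X**2, axis=1))                    # error of always predicting the mean
print(Z.shape, round(float(mse), 3), round(float(baseline), 3))
```

The tanh non-linearity is what separates this from PCA: with a purely linear encoder and decoder, an autoencoder trained on squared error spans the same subspace as the leading principal components.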

The anticipated outcomes are multifaceted. Using autoencoders as a DR technique is expected to improve data quality by exploiting their ability to capture intricate patterns. Consequently, we expect enhanced model performance in terms of computational efficiency, accuracy, and generalization.

Moreover, a comparative analysis of autoencoders and conventional methods will clarify the relative efficacy of these approaches. If autoencoders prove superior, their practical applications could expand, encouraging the adoption of artificial intelligence and machine learning across diverse industries.

The research also offers a distinct perspective on how dimensionality-reduced data affects subsequent machine learning tasks, particularly model training. It is hypothesized that such data can reduce overfitting and potentially improve accuracy.

In summary, the project sets out to clarify the relationship between dimensionality reduction and machine learning model performance. By combining advanced techniques, it aims to harness the power of autoencoders, uncover new insights, and contribute to the ongoing evolution of machine learning methodologies, facilitating their adoption across various sectors. Ultimately, the research seeks to advance data-driven decision-making by optimizing the data preprocessing stage, paving the way for more efficient and accurate machine learning models.

Aims

-Literature Review:
Conduct an extensive review of existing literature on dimensionality reduction techniques and their impact on machine learning model performance.
Identify key research papers and methodologies that will serve as benchmarks for our investigation.

-Dimensionality Reduction Techniques:
Implement a variety of dimensionality reduction methods, including Principal Component Analysis (PCA), t-distributed Stochastic Neighbour Embedding (t-SNE), and autoencoders.
Apply these techniques to the datasets to reduce feature dimensions while preserving critical information.
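
A minimal sketch of the PCA step, assuming NumPy and a simple train/test split of synthetic data: PCA is fitted on the training split only and the same projection is reused for unseen data, so no test information leaks into the reduction. The helpers pca_fit and pca_transform are hypothetical names; scikit-learn's PCA class (fit, transform, explained_variance_ratio_) provides the same functionality. t-SNE is deliberately not sketched: scikit-learn's TSNE offers only fit_transform, with no transform for unseen samples, so using it ahead of a classifier needs extra care.

```python
# Illustrative sketch only: helper names and data are hypothetical.
import numpy as np


def pca_fit(X_train, n_components):
    """Fit PCA on training data via the SVD of the centred matrix.

    Returns the feature means, the top `n_components` principal axes (as rows),
    and the explained-variance ratio of every component."""
    mean = X_train.mean(axis=0)
    # Rows of Vt are principal axes, ordered by decreasing singular value S.
    _, S, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    explained_ratio = S**2 / np.sum(S**2)
    return mean, Vt[:n_components], explained_ratio


def pca_transform(X, mean, components):
    """Project data (training or unseen) onto the fitted principal axes."""
    return (X - mean) @ components.T


rng = np.random.default_rng(0)
X = rng.normal(size=(250, 8)) @ rng.normal(size=(8, 8))  # correlated synthetic features
X_train, X_test = X[:200], X[200:]

mean, components, ratio = pca_fit(X_train, n_components=3)
Z_train = pca_transform(X_train, mean, components)       # 200 x 3
Z_test = pca_transform(X_test, mean, components)         # 50 x 3, reuses the training fit
print(Z_train.shape, Z_test.shape, round(float(ratio[:3].sum()), 3))
```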

-Model Training and Evaluation:
Train machine learning classification models, such as Decision Trees, Support Vector Machines, and Naive Bayes, using both the original and dimensionality-reduced datasets.
Evaluate model performance metrics, including accuracy, precision, recall, F1-score, and computational efficiency.
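
The metrics listed above can be computed directly from predictions. The following standard-library sketch assumes binary labels with a chosen positive class and treats undefined ratios as 0.0; the helper names binary_metrics and timed are hypothetical. Wall-clock timing is one simple proxy for computational efficiency, for example by timing a model's training call on original versus reduced features. scikit-learn's sklearn.metrics module provides equivalent accuracy_score, precision_score, recall_score and f1_score functions.

```python
# Illustrative sketch only: helper names and toy labels are hypothetical.
import time


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1-score, treating `positive` as the
    positive class. Undefined ratios (zero denominators) are reported as 0.0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics, seconds = timed(binary_metrics, y_true, y_pred)
print(metrics)  # every metric is 0.75 for this toy example
```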

-Comparison of Dimensionality Reduction Methods:
Conduct a comparative analysis of the performance of autoencoders against traditional techniques like PCA and t-SNE.
Assess which method yields superior results in terms of model accuracy, generalization, and computational efficiency.

-Overfitting Mitigation and Accuracy Enhancement:
Investigate the effects of dimensionality reduction on overfitting tendencies in machine learning models.
Analyze whether dimensionality-reduced data contributes to improved model accuracy.
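
One practical proxy for overfitting is the gap between training and held-out accuracy. The sketch below, a hand-written Gaussian Naive Bayes (one of the three classifiers named in this proposal) in NumPy, computes that gap on raw versus PCA-reduced features. The synthetic data (3 informative and 60 noise features), the single split, the choice of 5 components, and all function names are hypothetical assumptions; the printed numbers depend on the random draw and are not findings of this project. The var_smoothing term mirrors the parameter of the same name in scikit-learn's GaussianNB.

```python
# Illustrative sketch only: helper names, data and k are hypothetical.
import numpy as np


def gaussian_nb_fit(X, y, var_smoothing=1e-9):
    """Per-class feature means/variances and class priors for Gaussian Naive Bayes."""
    classes = np.unique(y)
    eps = var_smoothing * X.var(axis=0).max()               # keeps variances positive
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + eps
    priors = np.array([np.mean(y == c) for c in classes])
    return classes, means, variances, priors


def gaussian_nb_predict(model, X):
    """Pick the class with the highest log prior + summed per-feature log-likelihood."""
    classes, means, variances, priors = model
    diff = X[:, None, :] - means[None, :, :]                # (samples, classes, features)
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None] + diff**2 / variances[None]).sum(axis=2)
    return classes[np.argmax(log_lik + np.log(priors), axis=1)]


def pca_project(X_train, X_test, k):
    """Fit PCA on the training split only and project both splits onto k axes."""
    mean = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return (X_train - mean) @ Vt[:k].T, (X_test - mean) @ Vt[:k].T


def generalization_gap(X_train, y_train, X_test, y_test):
    """Train/test accuracy of Gaussian NB; a large positive gap suggests overfitting."""
    model = gaussian_nb_fit(X_train, y_train)
    train_acc = float(np.mean(gaussian_nb_predict(model, X_train) == y_train))
    test_acc = float(np.mean(gaussian_nb_predict(model, X_test) == y_test))
    return train_acc, test_acc, train_acc - test_acc


# Synthetic data: 3 informative features and 60 pure-noise features.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=120)
X = np.hstack([rng.normal(size=(120, 3)) + y[:, None], rng.normal(size=(120, 60))])
X_tr, X_te, y_tr, y_te = X[:80], X[80:], y[:80], y[80:]

full = generalization_gap(X_tr, y_tr, X_te, y_te)
Z_tr, Z_te = pca_project(X_tr, X_te, k=5)
reduced = generalization_gap(Z_tr, y_tr, Z_te, y_te)
print("original:", full, "PCA-5:", reduced)
```

A single split is used only to keep the sketch short; cross-validation would give more stable gap estimates.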

-Trade-off Analysis:
Examine the trade-offs associated with dimensionality reduction, including potential data loss and the introduction of distortion.
Evaluate the balance between data reduction and maintaining the integrity of essential information.
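
For PCA, the data-loss side of this trade-off can be quantified exactly: the fraction of variance discarded equals the relative squared reconstruction error, so one can choose the smallest number of components that retains a chosen share of variance. This sketch assumes a 95% threshold, a common heuristic rather than a value fixed by the proposal; the helper name components_for_variance and the synthetic data are hypothetical. scikit-learn's PCA also accepts a float n_components between 0 and 1 for the same purpose. Autoencoders have no explained-variance ratio, so reconstruction error on held-out data would play the equivalent role there.

```python
# Illustrative sketch only: helper name, threshold and data are hypothetical.
import numpy as np


def components_for_variance(X, threshold=0.95):
    """Smallest k whose leading principal components retain at least `threshold`
    of the total variance, the variance actually retained, and the relative
    reconstruction error after projecting onto k components and back."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    cumulative = np.cumsum(S**2) / np.sum(S**2)
    k = min(int(np.searchsorted(cumulative, threshold)) + 1, len(S))
    V_k = Vt[:k]
    X_rec = (Xc @ V_k.T) @ V_k                      # compress, then reconstruct
    rel_error = np.sum((Xc - X_rec) ** 2) / np.sum(Xc**2)
    return k, float(cumulative[k - 1]), float(rel_error)


# 20 observed features generated from 3 latent factors plus small noise.
rng = np.random.default_rng(7)
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 20)) + 0.05 * rng.normal(size=(500, 20))
k, retained, rel_error = components_for_variance(X, threshold=0.95)
print(k, round(retained, 4), round(rel_error, 4))
```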

-Anticipated Outcomes and Conclusions:
Summarize the study's findings and their implications for machine learning practitioners.
Provide a definitive answer to the research question: Does dimensionality reduction influence machine learning model performance in terms of computational efficiency, accuracy, and generalization ability?

-Contributions and Future Directions:
Highlight the unique contributions of the research, particularly regarding the utilization of autoencoders as a dimensionality reduction methodology.
Suggest potential future research directions based on the outcomes of this study.

-Practical Applications and Industry Impact:
Discuss the practical applications of our research findings in real-world scenarios, emphasizing the potential for improving machine learning model performance across various industries.
Promote the adoption of artificial intelligence and machine learning by showcasing the benefits of dimensionality reduction.

Collaborators

Oluwademilade Adeyemo - Lead Investigator
Dr Lawrence Bilton - Project Supervisor