Utilizing Machine Learning for Causal Inference Using Observational Data
Identifying the cause-and-effect relationships among the variables of a system from data is called causal discovery (CD). The resulting causal structure indicates which variables have a direct effect on which other variables, but it does not specify how strong each effect is or whether it is positive or negative. Causal inference is then used to estimate the causal effect, that is, the effect of a treatment or intervention on an outcome variable. Estimating causal effects and discovering causal relations among a set of variables are fundamental and challenging tasks in scientific research. Although interventions or randomized trials can be used to infer causal effects and relations, in many cases they are unethical, expensive, or even impossible. Hence, it is often desirable to perform causal effect estimation and causal discovery from passively observed data. Practical applications arise in a range of fields, including survival analysis, public health, epidemiology, medicine, education, artificial intelligence, econometrics, and marketing. In this research study, we will investigate existing methods for causal inference and causal discovery, and we will modify existing or propose new treatment effect estimators based on advanced statistical and machine learning techniques.
Research Objectives
1. The primary objective of causal inference and causal discovery is to identify causal relationships between variables. This involves determining whether changes in one variable cause changes in another variable. Research may focus on developing methods and algorithms to accurately identify causal relationships from observational and experimental data.
2. Once causal relationships are identified, researchers often aim to estimate the magnitude and direction of causal effects. This involves quantifying the impact of one variable on another, accounting for confounding factors and other sources of bias.
3. Many real-world systems involve multiple interconnected variables and complex dependencies. Research objectives may include developing methods to infer causal relationships in such complex systems, for example network-based approaches and time-series analysis.
4. With the abundance of data available in various fields, researchers aim to develop techniques for causal inference in big data settings. The objective is to handle high-dimensional data and extract meaningful causal relationships, accounting for potential challenges such as spurious correlations and overfitting.
5. Research objectives may also revolve around applying causal inference and causal discovery techniques to specific domains or problems, such as survival analysis, healthcare, economics, epidemiology, social sciences, or policy evaluation. The aim is to provide actionable insights and improve decision-making in these domains.
6. One of the main objectives of this research study is to propose machine learning and reinforcement learning based methods for causal inference and causal discovery.
We will propose new methodologies or improve existing ones for causal effect estimation and causal discovery. We will consider approaches such as propensity score matching, instrumental variables, graphical models, Bayesian networks, machine learning, and causal discovery algorithms.
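As an illustration of the kind of test that graphical-model and causal discovery algorithms build on, the following minimal sketch checks conditional independence through a partial correlation. It assumes a hypothetical linear-Gaussian chain X -> Z -> Y with made-up coefficients; the helper name partial_corr is illustrative and not taken from any library. It is a sketch under these assumptions, not a complete discovery algorithm.

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation between x and y after linearly regressing out z."""
    Z = np.column_stack([np.ones_like(z), z])
    # Residuals of x and y after least-squares projection onto [1, z]
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return float(np.corrcoef(rx, ry)[0, 1])

rng = np.random.default_rng(0)
n = 5000

# Hypothetical chain X -> Z -> Y (coefficients are assumptions)
x = rng.normal(size=n)
z = 0.8 * x + rng.normal(size=n)
y = 0.8 * z + rng.normal(size=n)

marginal = float(np.corrcoef(x, y)[0, 1])   # X and Y are dependent
conditional = partial_corr(x, y, z)         # close to zero given Z

print(f"corr(X, Y)     = {marginal:.3f}")
print(f"corr(X, Y | Z) = {conditional:.3f}")
```

In this simulated chain, X and Y are clearly correlated, but the correlation nearly vanishes once Z is controlled for. Constraint-based discovery methods use this pattern to conclude that X has no direct effect on Y; with real data the linearity and Gaussianity assumptions would need to be checked.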
Causal inference and causal discovery methods are used to estimate and discover causal effects in observational and experimental data. Some established methods in the field are:
1. Randomized Controlled Trials (RCTs)
2. Difference-in-Differences (DiD)
3. Instrumental Variables (IV)
4. Propensity Score Matching (PSM) and Inverse Probability Weighting (IPW)
5. Structural Equation Modelling (SEM)
6. Doubly Robust Estimation methods
7. Graphical model estimation methods
8. Bayesian Networks
9. Granger causality
10. Regression Discontinuity Design (RDD)
11. Generalized Random Forest (GRF)
12. Neural Networks
13. Support Vector Machines (SVM) and K-Nearest Neighbours (KNN), which can be used to estimate propensity scores, i.e. the probabilities of treatment assignment given the observed covariates. The estimated propensity scores can then be used in matching or weighting methods to estimate the causal effect.
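The propensity score idea in the last item can be sketched as follows. This is a minimal illustration on simulated data, not the estimator proposed in this study: the data-generating process (one confounder, true effect 2.0), the function names knn_propensity and ipw_ate, the neighbourhood size k = 50, and the clipping threshold 0.05 are all assumptions made for the example. The propensity score is estimated as the share of treated units among the k nearest neighbours, and the effect is then estimated with normalized inverse probability weights.

```python
import numpy as np

def knn_propensity(X, t, k=50):
    """Estimate P(T=1 | X) as the treated share among the k nearest neighbours."""
    # Pairwise squared Euclidean distances (n x n); acceptable for small n
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a unit is not its own neighbour
    nn = np.argpartition(d2, k, axis=1)[:, :k]  # indices of the k closest units
    return t[nn].mean(axis=1)

def ipw_ate(y, t, e, clip=0.05):
    """Normalized IPW estimate of the average treatment effect."""
    e = np.clip(e, clip, 1 - clip)  # avoid extreme weights
    w1 = t / e
    w0 = (1 - t) / (1 - e)
    return float((w1 * y).sum() / w1.sum() - (w0 * y).sum() / w0.sum())

rng = np.random.default_rng(0)
n = 2000

# Simulated observational data (assumed): x confounds treatment and outcome
X = rng.normal(size=(n, 1))
e_true = 1.0 / (1.0 + np.exp(-X[:, 0]))
t = (rng.random(n) < e_true).astype(float)
y = 2.0 * t + 3.0 * X[:, 0] + rng.normal(size=n)  # true effect is 2.0

e_hat = knn_propensity(X, t, k=50)
ate_naive = float(y[t == 1].mean() - y[t == 0].mean())
ate_ipw = ipw_ate(y, t, e_hat)

print(f"naive difference in means: {ate_naive:.2f}")
print(f"IPW estimate:              {ate_ipw:.2f}")
```

Because treated units tend to have larger x in this simulation, the naive difference in means overstates the effect, while the weighted estimate is much closer to the simulated value of 2.0. Any classifier that outputs treatment probabilities could replace the KNN step; the choice of k and of the clipping threshold trades bias against variance.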
We will modify and improve some of the above methods, and then compare and analyse the strengths, weaknesses, and applicability of the different methods for causal effect estimation and causal discovery. Finally, we will evaluate the performance and robustness of the proposed methodologies through extensive experiments on simulated and real-world data, covering both observed and unobserved variables.
Dr. Wang Honq
Professor at Central South University, China