Causal Discovery from Observational Data in the Presence of Latent Confounders and Other Data Complexities

Author(s)
Yang, Yuqin
Advisor(s)
Davenport, Mark
Kiyavash, Negar
Abstract
Causal discovery aims to recover the causal relationships among the variables of interest in a system. When interventions (controlled experiments) on system variables are not allowed, causal discovery must rely on observational data alone; existing methods either exploit the conditional independence relations among observed variables or impose additional semi-parametric assumptions on the underlying model. However, complexities in real-life data make causal discovery even more challenging. Some of the main sources of data complexity include: (i) latent confounding, where unobserved variables may affect more than one observed variable in the system; (ii) deterministic relations, where one observed variable may be fully determined by other observed variables in the system; (iii) measurement error, where we observe not the exact values of the variables but corrupted versions of them; and (iv) data heterogeneity, where the data are collected from multiple domains and do not follow the same distribution. Most causal discovery methods assume that these complexities are absent from the system. Naively applying such methods in settings that are in fact subject to these complexities leads to detecting spurious or erroneous causal links among the variables of interest. This dissertation focuses on developing causal discovery methods that can handle these data complexities. Specifically:
- We study the problem of causal discovery in linear causal models with deterministic relations and latent confounding. We provide necessary and sufficient conditions for unique identifiability of the model under a separability condition (i.e., the matrix indicating the independent exogenous noise terms pertaining to the observed variables is identifiable).
- We study the problem of causal discovery in linear causal models in the presence of latent confounding and/or measurement error. We characterize the extent of identifiability of the model under the separability condition together with two versions of the faithfulness assumption, and we provide a graphical characterization of the models that are observationally equivalent.
- We study the problem of learning the unknown intervention targets in linear or nonlinear causal models from a collection of interventional data obtained from multiple environments. We propose the LIT algorithm, which allows latent confounders to be intervention targets. Our theoretical analysis shows that the LIT algorithm gives a more accurate estimate of the intervention target set than previous work.
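The latent-confounding problem named in the abstract can be illustrated with a small simulation. The sketch below is not taken from the dissertation: the linear Gaussian model, the sample size, and the helper names `simulate` and `partial_corr` are assumptions chosen only for illustration. Two observed variables share a latent cause but have no causal edge between them, yet they appear strongly associated until the latent variable is accounted for.

```python
import numpy as np


def simulate(n=20000, seed=0):
    """Draw samples from a toy linear model: latent -> x, latent -> y, no x -> y edge."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=n)          # unobserved confounder (hypothetical)
    x = latent + rng.normal(size=n)      # observed variable 1
    y = latent + rng.normal(size=n)      # observed variable 2
    return latent, x, y


def partial_corr(a, b, c):
    """Correlation of a and b after linearly regressing c out of both."""
    design = np.column_stack([c, np.ones_like(c)])
    res_a = a - design @ np.linalg.lstsq(design, a, rcond=None)[0]
    res_b = b - design @ np.linalg.lstsq(design, b, rcond=None)[0]
    return np.corrcoef(res_a, res_b)[0, 1]


latent, x, y = simulate()
marginal = np.corrcoef(x, y)[0, 1]        # association seen without the latent variable
conditional = partial_corr(x, y, latent)  # association once the latent variable is known

print(f"corr(x, y)          = {marginal:.3f}")
print(f"corr(x, y | latent) = {conditional:.3f}")
```

In this toy model the marginal correlation is close to 0.5 in theory, while the partial correlation given the latent variable is close to zero. A method that sees only `x` and `y` and assumes no hidden variables would therefore report a link between them that does not exist.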
Date
2024-07-27
Resource Type
Text
Resource Subtype
Dissertation