Beyond Correlation: the Search for Causal Relationships Between Flow Percentiles and Watershed Variables

Ssegane, Herbert
Tollner, E. W.
Mohamoud, Yusuf
Rasmussen, Todd C.
Dowd, John F.
Carroll, G. Denise
This study explored the use of causal feature selection algorithms to select the dominant watershed variables that drive high, medium, and low flows. A two-step approach was implemented. The first step minimized variable redundancy by examining the variable relevance, variable redundancy, and conditional relevance of variable pairs whose correlation was greater than 0.9. The second step used six algorithms that seek to reconstruct a Bayesian network structure around a target variable for each flow percentile. Nineteen flow percentiles were used to characterize the high, medium, and low flow conditions of 26 Piedmont watersheds in the Mid-Atlantic. The algorithms were: (1) Grow-Shrink (GS); (2) interleaved Incremental Association Markov Boundary (interIAMB); (3) Incremental Association Markov Boundary with Peter-Clark (IAMBnPC); (4) Local Causal Discovery (LCD2); (5) HITON-PC; and (6) HITON-MB. A new method was developed to quantify the reliability of each algorithm, and its performance was compared with that of existing reliability methods. The effect of the initial number of variables on the final variable set selected by each algorithm was tested. Fusion of the algorithms was used to determine the overall dominant features for each flow percentile.
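The redundancy-reduction step can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract states only that pairs with correlation above 0.9 were examined for relevance, redundancy, and conditional relevance. The sketch assumes a simpler rule, namely that from each highly correlated pair the variable more strongly correlated with the target flow percentile is kept. The function name `prune_redundant` and its parameters are hypothetical.

```python
import numpy as np

def prune_redundant(X, y, names, threshold=0.9):
    """Keep one variable from each pair whose |Pearson r| exceeds threshold.

    Assumption (not from the study): relevance is measured as the absolute
    Pearson correlation with the target y, and the more relevant variable
    of a redundant pair is the one retained.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_vars = X.shape[1]

    # Pairwise correlations between candidate watershed variables.
    corr = np.corrcoef(X, rowvar=False)
    # Relevance of each variable to the target flow percentile.
    relevance = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_vars)]
    )

    keep = np.ones(n_vars, dtype=bool)
    # Visit variables from most to least relevant; each retained variable
    # eliminates the less relevant variables it is redundant with.
    for i in np.argsort(-relevance):
        if not keep[i]:
            continue
        for j in range(n_vars):
            if j != i and keep[j] and abs(corr[i, j]) > threshold:
                keep[j] = False

    return [name for name, kept in zip(names, keep) if kept]
```

The reduced variable set returned by such a filter would then be passed to the Markov-boundary algorithms listed above, once per flow percentile.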
Sponsored by: Georgia Environmental Protection Division; U.S. Geological Survey, Georgia Water Science Center; U.S. Department of Agriculture, Natural Resources Conservation Service; Georgia Institute of Technology, Georgia Water Resources Institute; The University of Georgia, Water Resources Faculty