Developing trust and managing uncertainty in partially observable sequential decision-making environments

Bishop, Robert Reid
White, Chelsea C., III
This dissertation consists of three distinct but conceptually related papers. All three address data-driven, stochastic sequential decision-making environments; they differ in their applications. In Chapter 2, we discuss a special class of partially observable Markov decision processes (POMDPs) in which the sources of uncertainty separate naturally into a hierarchy of effects: controllable, completely observable effects and exogenous, partially observable effects. For this class of POMDPs, we provide conditions under which structural properties of the value and policy functions are inherited from an analogous class of MDPs, and we discuss specialized solution procedures. In Chapter 3, we discuss an inventory control problem in which actions are time-lagged and there are three explicit sources of demand uncertainty: the state of the macroeconomy, product-specific demand variability, and information quality. We prove that a base stock policy, defined with respect to pipeline inventory and a Bayesian belief distribution over states of the macroeconomy, is optimal, and we demonstrate how to compute the base stock levels efficiently using support vector machines and Monte Carlo simulation. Further, we show how to use these results to determine how best to allocate capital toward a better information infrastructure or a more agile supply chain. Finally, in Chapter 4, we consider how to generate trust in so-called development processes, such as supply chains, certain artificial intelligence systems, and maintenance processes, in which there can be adversarial manipulation and we must hedge against the risk of misapprehending attacker objectives and resources. We model dynamic agent interaction using a partially observable Markov game (POMG) framework and present a heuristic solution procedure, based on self-training concepts, for determining a robust defender policy.
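As a rough illustration of the Chapter 3 idea, the sketch below shows a generic Bayesian belief update over macroeconomic states and an order-up-to rule driven by that belief. It is not the dissertation's method: the function names, the two-state example, and in particular the belief-weighted average used for the base stock level are assumptions made for illustration only. The abstract states that the actual levels are computed with support vector machines and Monte Carlo simulation.

```python
from typing import List, Sequence


def update_belief(
    belief: Sequence[float],
    transition: Sequence[Sequence[float]],
    obs_likelihood: Sequence[float],
) -> List[float]:
    """One Bayes-filter step over hidden macroeconomic states.

    belief[i]          prior probability of state i
    transition[i][j]   probability of moving from state i to state j
    obs_likelihood[j]  probability of the received observation in state j
    """
    n = len(belief)
    # Predict: push the prior through the state transition matrix.
    predicted = [
        sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)
    ]
    # Correct: weight each state by how well it explains the observation.
    unnormalized = [predicted[j] * obs_likelihood[j] for j in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [u / total for u in unnormalized]


def illustrative_base_stock(
    belief: Sequence[float], per_state_levels: Sequence[float]
) -> float:
    """Hypothetical placeholder: belief-weighted average of per-state levels.

    This stands in for the belief-dependent base stock level; it is an
    assumption of this sketch, not the procedure used in the dissertation.
    """
    return sum(b * s for b, s in zip(belief, per_state_levels))


def order_quantity(pipeline_inventory: float, base_stock_level: float) -> float:
    """Order-up-to rule: raise pipeline inventory to the base stock level."""
    return max(0.0, base_stock_level - pipeline_inventory)
```

A typical use would update the belief after each new signal, look up the base stock level for that belief, and order the shortfall relative to pipeline inventory (on-hand stock plus orders already placed but not yet delivered).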