Context Aware Policy Selection

Liu, Anthony J.
Boots, Byron
Associated Organization(s)
Organizational Unit
Supplementary to
In optimal control and reinforcement learning, the difference in performance between a state-of-the-art policy and a mediocre one is minuscule in comparison to their difference in amortized computational cost. Further, in certain situations the mediocre policy performs as well as the state-of-the-art policy while using significantly fewer computational resources. This phenomenon arises because additional compute is needed only to solve difficult scenarios; however, although these scenarios occur sparsely, they can be catastrophic for most planning tasks. In this work, we focus on addressing this imbalance between performance and computational cost in the context of planning. We combine ideas that have been prevalent in other machine learning problems and in Hierarchical Reinforcement Learning, and propose a Context-Aware Adaptive Policy Selector (CAAPS). We use our selector to create a meta-policy that minimizes the occurrence of these catastrophic states (thus maximizing the policy's ultimate performance) while also minimizing the computational cost necessary to run the policy. Our meta-policy accomplishes this by adaptively selecting from a set of pre-trained candidate policies that vary in performance and complexity, and we show that in certain environments we are able to plan trajectories at near-optimal performance while minimizing the amortized computational cost.
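The selection idea described in the abstract can be illustrated with a small sketch. This is not the CAAPS method from the thesis, whose selector is not specified in this record; it is a hand-written rule that picks the cheapest candidate judged adequate for the current context. All names here (Candidate, select, run, difficulty_of), the per-step costs, and the difficulty thresholds are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

State = float  # hypothetical 1-D state, for illustration only


@dataclass(frozen=True)
class Candidate:
    name: str
    cost: float                     # assumed per-step compute cost
    act: Callable[[State], float]   # stand-in for a pre-trained policy
    max_difficulty: float           # assumed: hardest context it handles well


def select(candidates: Sequence[Candidate], difficulty: float) -> Candidate:
    """Return the cheapest candidate judged adequate for this context."""
    adequate = [c for c in candidates if c.max_difficulty >= difficulty]
    if not adequate:
        # No candidate is judged adequate: fall back to the most capable one.
        return max(candidates, key=lambda c: c.max_difficulty)
    return min(adequate, key=lambda c: c.cost)


def difficulty_of(state: State) -> float:
    """Hypothetical context estimate: states far from the origin are harder."""
    return min(abs(state), 1.0)


def run(candidates: Sequence[Candidate],
        states: Sequence[State]) -> Tuple[List[str], float]:
    """Apply the selector at every state; return the picks and the mean cost."""
    picks = [select(candidates, difficulty_of(s)) for s in states]
    mean_cost = sum(c.cost for c in picks) / len(picks)
    return [c.name for c in picks], mean_cost


cheap = Candidate("cheap", cost=1.0, act=lambda s: -0.5 * s, max_difficulty=0.3)
strong = Candidate("strong", cost=10.0, act=lambda s: -1.0 * s, max_difficulty=1.0)

names, mean_cost = run([cheap, strong], [0.1, 0.2, 0.9, 0.05])
print(names, mean_cost)
```

In this toy run the expensive policy is invoked only for the single hard state, so the mean per-step cost stays well below the cost of always running the strong policy. A learned selector would replace the fixed difficulty rule.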
Date Issued
Resource Type
Resource Subtype
Undergraduate Thesis
Rights Statement
Rights URI