Title:
Policy Shaping: Integrating Human Feedback with Reinforcement Learning
Author(s)
Griffith, Shane
Subramanian, Kaushik
Scholz, Jonathan
Isbell, Charles L.
Thomaz, Andrea L.
Abstract
A long-term goal of Interactive Reinforcement Learning is to incorporate non-expert human feedback to solve complex tasks. Some state-of-the-art methods have approached this problem by mapping human information to rewards and values and iterating over them to compute better control policies. In this paper we argue for an alternate, more effective characterization of human feedback: Policy Shaping. We introduce Advise, a Bayesian approach that attempts to maximize the information gained from human feedback by utilizing it as direct policy labels. We compare Advise to state-of-the-art approaches and show that it can outperform them and is robust to infrequent and inconsistent human feedback.
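The abstract's core idea, treating human feedback as direct policy labels rather than as reward signal, can be sketched as follows. All function names, the per-action count representation, and the simple multiply-and-renormalize combination rule are illustrative assumptions for this sketch, not the authors' exact implementation.

```python
def feedback_policy(delta, consistency):
    """Estimate P(action is optimal) from net human feedback counts.

    delta[a] is (# positive labels) - (# negative labels) for action a
    in the current state; `consistency` is the assumed probability that
    any single human label is correct. Both are modeling assumptions.
    """
    C = consistency
    # With no feedback (d == 0) this yields 0.5 for every action.
    weights = {a: C**d / (C**d + (1 - C)**d) for a, d in delta.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def shaped_policy(agent_probs, fb_probs):
    """Combine the learning agent's action distribution with the
    feedback-derived distribution by multiplying per-action
    probabilities and renormalizing (one simple shaping rule)."""
    combined = {a: agent_probs[a] * fb_probs[a] for a in agent_probs}
    total = sum(combined.values())
    return {a: p / total for a, p in combined.items()}
```

Because feedback enters as labels on actions rather than as extra reward, sparse or occasionally wrong feedback only tilts the action distribution instead of distorting the value function, which is consistent with the robustness claim in the abstract.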
Date Issued
2013
Resource Type
Text
Resource Subtype
Proceedings