Organizational Unit: Socially Intelligent Machines Lab

Publication Search Results

  • Item
    Policy Shaping: Integrating Human Feedback with Reinforcement Learning
    (Georgia Institute of Technology, 2013) Griffith, Shane; Subramanian, Kaushik; Scholz, Jonathan; Isbell, Charles L.; Thomaz, Andrea L.
    A long-term goal of Interactive Reinforcement Learning is to incorporate non-expert human feedback to solve complex tasks. Some state-of-the-art methods have approached this problem by mapping human information to rewards and values and iterating over them to compute better control policies. In this paper we argue for an alternate, more effective characterization of human feedback: Policy Shaping. We introduce Advise, a Bayesian approach that attempts to maximize the information gained from human feedback by utilizing it as direct policy labels. We compare Advise to state-of-the-art approaches and show that it can outperform them and is robust to infrequent and inconsistent human feedback. (An illustrative sketch of combining feedback-derived policy labels with the agent's own policy appears after this listing.)
  • Item
    Object Focused Q-Learning for Autonomous Agents
    (Georgia Institute of Technology, 2013) Cobo, Luis C.; Isbell, Charles L.; Thomaz, Andrea L.
    We present Object Focused Q-learning (OF-Q), a novel reinforcement learning algorithm that can offer exponential speed-ups over classic Q-learning on domains composed of independent objects. An OF-Q agent treats the state space as a collection of objects organized into different object classes. Our key contribution is a control policy that uses non-optimal Q-functions to estimate the risk of ignoring parts of the state space. We compare our algorithm to traditional Q-learning and previous arbitration algorithms in two domains, including a version of Space Invaders. (An illustrative sketch of per-object Q-functions with a risk-based arbitration rule appears after this listing.)
  • Item
    Automatic Task Decomposition and State Abstraction from Demonstration
    (Georgia Institute of Technology, 2012-06) Cobo, Luis C.; Isbell, Charles L.; Thomaz, Andrea L.
    Both Learning from Demonstration (LfD) and Reinforcement Learning (RL) are popular approaches for building decision-making agents. LfD applies supervised learning to a set of human demonstrations to infer and imitate the human policy, while RL uses only a reward signal and exploration to find an optimal policy. For complex tasks both of these techniques may be ineffective: LfD may require many more demonstrations than it is feasible to obtain, and RL can take an impractical amount of time to converge. We present Automatic Decomposition and Abstraction from demonstration (ADA), an algorithm that uses mutual information measures over a set of human demonstrations to decompose a sequential decision process into several sub-tasks, finding state abstractions for each of these sub-tasks. ADA then projects the human demonstrations into the abstracted state space to build a policy. This policy can later be improved using RL algorithms to surpass the performance of the human teacher. We find empirically that ADA can find satisficing policies for problems that are too complex to be solved with traditional LfD and RL algorithms. In particular, we show that we can use mutual information across state features to leverage human demonstrations, reducing the effects of the curse of dimensionality by finding sub-tasks and abstractions in sequential decision processes. (An illustrative sketch of the mutual-information feature-selection step appears after this listing.)
  • Item
    Combining function approximation, human teachers, and training regimens for real-world RL
    (Georgia Institute of Technology, 2010) Zang, Peng; Irani, Arya; Zhou, Peng; Isbell, Charles L.; Thomaz, Andrea L.
  • Item
    Batch versus Interactive Learning by Demonstration
    (Georgia Institute of Technology, 2010) Zang, Peng; Tian, Runhe; Thomaz, Andrea L.; Isbell, Charles L.
    Agents that operate in human environments will need to learn new skills from everyday people. Learning from demonstration (LfD) is a popular paradigm for this. Drawing from our interest in Socially Guided Machine Learning, we explore the impact of interactivity on learning from demonstration. We present findings from a study with human subjects showing that people who are able to interact with the learning agent provide better demonstrations, in part by adapting to learner performance, which results in improved learning. We also find that interactivity increases a sense of engagement and may encourage players to participate longer. Our exploration of interactivity sheds light on how best to obtain demonstrations for LfD applications.
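
Illustrative Sketches

The Policy Shaping abstract above characterizes human feedback as direct policy labels. Below is a minimal sketch of that idea, assuming "right"/"wrong" labels are tallied per state-action pair and that the feedback-derived distribution is combined multiplicatively with a Boltzmann policy over the agent's Q-values. The consistency parameter, the tally delta, and the function names feedback_policy and shaped_policy are illustrative assumptions, not the paper's exact implementation.

    import numpy as np

    def feedback_policy(delta, consistency=0.8):
        # Probability that each action is optimal given the net human feedback
        # delta[a] = (# "right" labels) - (# "wrong" labels) for that action,
        # assuming each label matches the optimal policy with prob. `consistency`.
        pos = consistency ** delta
        neg = (1.0 - consistency) ** delta
        return pos / (pos + neg)

    def shaped_policy(q_values, delta, consistency=0.8, temperature=1.0):
        # Combine the agent's Boltzmann policy over its Q-values with the
        # feedback-derived policy by multiplying and renormalizing.
        q = np.asarray(q_values, dtype=float)
        agent = np.exp((q - q.max()) / temperature)
        agent /= agent.sum()
        human = feedback_policy(np.asarray(delta, dtype=float), consistency)
        combined = agent * human
        return combined / combined.sum()

    # Three actions: action 1 has two net "right" labels, action 2 one net "wrong".
    print(shaped_policy(q_values=[0.1, 0.0, 0.2], delta=[0, 2, -1]))

Because the two distributions are multiplied, sparse or inconsistent feedback simply leaves the agent's own policy dominant, which matches the robustness claim in the abstract.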
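
The Object Focused Q-Learning abstract above treats the state space as a collection of objects and uses non-optimal Q-functions to estimate the risk of ignoring parts of it. The sketch below assumes discrete local object states, per-class reward attribution, a risk value learned as the return of a random policy, and a prune-then-maximize arbitration rule; these specifics, and the hypothetical ObjectFocusedAgent class, are assumptions rather than the paper's exact algorithm.

    import random
    from collections import defaultdict

    class ObjectFocusedAgent:
        def __init__(self, actions, alpha=0.1, gamma=0.99, risk_threshold=-1.0):
            self.actions = list(actions)
            self.alpha, self.gamma, self.risk_threshold = alpha, gamma, risk_threshold
            self.q = defaultdict(lambda: defaultdict(float))     # q[class][(state, action)]
            self.risk = defaultdict(lambda: defaultdict(float))  # random-policy returns

        def act(self, objects, epsilon=0.1):
            # objects: list of (class_name, local_state) pairs currently visible.
            if random.random() < epsilon:
                return random.choice(self.actions)
            # Prune actions that any object's risk estimate deems too dangerous,
            # then pick the best remaining action by per-object Q-value.
            safe = [a for a in self.actions
                    if all(self.risk[c][(s, a)] >= self.risk_threshold for c, s in objects)]
            candidates = safe or self.actions
            return max(candidates, key=lambda a: max(self.q[c][(s, a)] for c, s in objects))

        def update(self, objects, action, rewards, next_objects):
            # rewards: {class_name: reward attributed to that object class}
            for (c, s), (_, s2) in zip(objects, next_objects):
                r = rewards.get(c, 0.0)
                best = max(self.q[c][(s2, a)] for a in self.actions)
                self.q[c][(s, action)] += self.alpha * (r + self.gamma * best - self.q[c][(s, action)])
                # Risk: expected return of behaving randomly from here on.
                mean = sum(self.risk[c][(s2, a)] for a in self.actions) / len(self.actions)
                self.risk[c][(s, action)] += self.alpha * (r + self.gamma * mean - self.risk[c][(s, action)])

Learning the risk values under a random rather than greedy policy keeps them pessimistic, so pruning errs on the side of caution; the threshold would need tuning per domain.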
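
The ADA abstract above uses mutual information over human demonstrations to find state abstractions for each sub-task. The sketch below covers only that measurement step, assuming discretized features and a hand-picked threshold; the decomposition into sub-tasks itself is not reproduced, and the select_features helper is a hypothetical name.

    import numpy as np

    def mutual_information(xs, ys):
        # Mutual information (in nats) between two discrete sequences.
        xs, ys = np.asarray(xs), np.asarray(ys)
        mi = 0.0
        for x in np.unique(xs):
            for y in np.unique(ys):
                p_xy = np.mean((xs == x) & (ys == y))
                if p_xy > 0:
                    mi += p_xy * np.log(p_xy / (np.mean(xs == x) * np.mean(ys == y)))
        return mi

    def select_features(demo_states, demo_actions, threshold=0.05):
        # Keep the state features that carry information about the demonstrated
        # action within one sub-task's segment of the demonstrations.
        demo_states = np.asarray(demo_states)
        scores = [mutual_information(demo_states[:, f], demo_actions)
                  for f in range(demo_states.shape[1])]
        return [f for f, s in enumerate(scores) if s > threshold], scores

    # Feature 0 determines the demonstrated action; feature 1 is uninformative.
    states = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0, 0], [0, 1], [1, 0], [1, 1]])
    actions = np.array([0, 0, 1, 1, 0, 0, 1, 1])
    print(select_features(states, actions))   # -> ([0], [~0.69, 0.0])

Dropping low-information features is what shrinks the state space each sub-task policy must cover, which is the dimensionality-reduction effect the abstract describes.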