Person:
Isbell, Charles L.

Publication Search Results

Now showing 1 - 10 of 13
  • Item
    Let’s Talk about Bias and Diversity in Data, Software, and Institutions
    (2020-11-20) Deng, Tiffany ; Desai, Deven ; Gontijo Lopes, Raphael ; Isbell, Charles L.
    Bias and lack of diversity have long been deep-rooted problems across industries. We discuss how these issues impact data, software, and institutions, and how we can improve moving forward. The panel will feature thought leaders from Google, Georgia Tech, and Queer in AI, who will together answer questions like "What implications and problems exist or will exist if the tech workforce does not become more diverse?" and "How does anyone make sure they are not introducing their bias into a given system? What questions should we be asking or actions should we be taking to avoid this?"
  • Item
    The Realities of Resilience: An Authentic Leadership Discussion
    (Georgia Institute of Technology, 2019-11-21) Alvarez-Robinson, Sonia ; Durham, Lynn ; Isbell, Charles L. ; Stein, John M.
    In this rare and candid panel discussion, Georgia Tech leaders Lynn Durham, John Stein and Dr. Charles Isbell will share their insights on building personal resilience. Through their own stories of adversity, opportunity, tragedy and triumph, the panel will describe ways to use challenges as a catalyst for personal and professional growth.
  • Item
    Policy Shaping: Integrating Human Feedback with Reinforcement Learning
    (Georgia Institute of Technology, 2013) Griffith, Shane ; Subramanian, Kaushik ; Scholz, Jonathan ; Isbell, Charles L. ; Thomaz, Andrea L.
    A long-term goal of Interactive Reinforcement Learning is to incorporate non-expert human feedback to solve complex tasks. Some state-of-the-art methods have approached this problem by mapping human information to rewards and values and iterating over them to compute better control policies. In this paper we argue for an alternate, more effective characterization of human feedback: Policy Shaping. We introduce Advise, a Bayesian approach that attempts to maximize the information gained from human feedback by utilizing it as direct policy labels. We compare Advise to state-of-the-art approaches and show that it can outperform them and is robust to infrequent and inconsistent human feedback.
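As a rough illustration of the policy-shaping idea in the abstract above, the sketch below combines an agent's Boltzmann policy over Q-values with a distribution derived from accumulated human "right"/"wrong" labels. This is a minimal Python sketch under assumed names (the feedback-consistency parameter `C` and the per-action label balance `delta` are illustrative), not the authors' implementation.

```python
import numpy as np

def feedback_policy(delta, C=0.8):
    """Per-action probability of being optimal given human labels.

    delta[a] = (# 'right' labels) - (# 'wrong' labels) for action a.
    C is an assumed probability that the feedback is consistent with
    the optimal policy (illustrative value).
    """
    delta = np.asarray(delta, dtype=float)
    pos = C ** delta
    neg = (1.0 - C) ** delta
    p = pos / (pos + neg)
    return p / p.sum()

def shaped_policy(q_values, delta, temperature=1.0, C=0.8):
    """Multiply the agent's Boltzmann policy by the feedback policy and renormalize."""
    boltz = np.exp(q_values / temperature)
    boltz /= boltz.sum()
    combined = boltz * feedback_policy(delta, C)
    return combined / combined.sum()

# Three actions; the first has net-positive human labels, the last net-negative.
print(shaped_policy(np.array([0.1, 0.0, 0.2]), delta=[2, 0, -1]))
```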
  • Item
    Object Focused Q-Learning for Autonomous Agents
    (Georgia Institute of Technology, 2013) Cobo, Luis C. ; Isbell, Charles L. ; Thomaz, Andrea L.
    We present Object Focused Q-learning (OF-Q), a novel reinforcement learning algorithm that can offer exponential speed-ups over classic Q-learning on domains composed of independent objects. An OF-Q agent treats the state space as a collection of objects organized into different object classes. Our key contribution is a control policy that uses non-optimal Q-functions to estimate the risk of ignoring parts of the state space. We compare our algorithm to traditional Q-learning and previous arbitration algorithms in two domains, including a version of Space Invaders.
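The arbitration idea sketched in the abstract above might look roughly like the following: each object contributes Q-values from its class, and objects whose pessimistic ("risk") Q-values stay above a threshold are ignored when choosing the action. The per-class tables `q_opt` and `q_risk` and the threshold are hypothetical placeholders, not the paper's code.

```python
import numpy as np

def of_q_action(objects, q_opt, q_risk, n_actions, risk_threshold=-1.0):
    """Choose an action from per-object Q-values (sketch of the OF-Q arbitration idea).

    objects: list of (class_name, obj_state) pairs extracted from the raw state.
    q_opt[cls][obj_state]: Q-values for that object under the learned policy.
    q_risk[cls][obj_state]: pessimistic Q-values standing in for the paper's
        non-optimal Q-functions, used to judge whether ignoring the object is safe.
    """
    totals = np.zeros(n_actions)
    for cls, obj_state in objects:
        risk = q_risk[cls].get(obj_state, np.zeros(n_actions))
        if risk.min() < risk_threshold:          # too risky to ignore this object
            totals += q_opt[cls].get(obj_state, np.zeros(n_actions))
    return int(np.argmax(totals))
```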
  • Item
    Automatic Task Decomposition and State Abstraction from Demonstration
    (Georgia Institute of Technology, 2012-06) Cobo, Luis C. ; Isbell, Charles L. ; Thomaz, Andrea L.
    Both Learning from Demonstration (LfD) and Reinforcement Learning (RL) are popular approaches for building decision-making agents. LfD applies supervised learning to a set of human demonstrations to infer and imitate the human policy, while RL uses only a reward signal and exploration to find an optimal policy. For complex tasks both of these techniques may be ineffective. LfD may require many more demonstrations than it is feasible to obtain, and RL can take an inadmissible amount of time to converge. We present Automatic Decomposition and Abstraction from demonstration (ADA), an algorithm that uses mutual information measures over a set of human demonstrations to decompose a sequential decision process into several subtasks, finding state abstractions for each one of these subtasks. ADA then projects the human demonstrations into the abstracted state space to build a policy. This policy can later be improved using RL algorithms to surpass the performance of the human teacher. We find empirically that ADA can find satisficing policies for problems that are too complex to be solved with traditional LfD and RL algorithms. In particular, we show that we can use mutual information across state features to leverage human demonstrations to reduce the effects of the curse of dimensionality by finding subtasks and abstractions in sequential decision processes.
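The feature-selection step mentioned above (mutual information between state features and demonstrated actions) can be sketched in a few lines. The cutoff value and the use of scikit-learn's `mutual_info_score` are illustrative choices, not the paper's procedure.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def relevant_features(demo_states, demo_actions, threshold=0.1):
    """Score each discrete state feature by its mutual information with the action.

    demo_states: (N, F) array of feature values from human demonstrations.
    demo_actions: (N,) array of the actions the human took.
    Returns indices of features whose MI exceeds `threshold` (an assumed cutoff).
    """
    demo_states = np.asarray(demo_states)
    scores = [mutual_info_score(demo_states[:, f], demo_actions)
              for f in range(demo_states.shape[1])]
    return [f for f, s in enumerate(scores) if s > threshold]
```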
  • Item
    CPATH EAE: Extending contextualized computing in multiple institutions using threads
    (Georgia Institute of Technology, 2012-06) Isbell, Charles L. ; Biggers, Maureen S.
  • Item
    Combining function approximation, human teachers, and training regimens for real-world RL
    (Georgia Institute of Technology, 2010) Zang, Peng ; Irani, Arya ; Zhou, Peng ; Isbell, Charles L. ; Thomaz, Andrea L.
  • Item
    Batch versus Interactive Learning by Demonstration
    (Georgia Institute of Technology, 2010) Zang, Peng ; Tian, Runhe ; Thomaz, Andrea L. ; Isbell, Charles L.
    Agents that operate in human environments will need to be able to learn new skills from everyday people. Learning from demonstration (LfD) is a popular paradigm for this. Drawing from our interest in Socially Guided Machine Learning, we explore the impact of interactivity on learning from demonstration. We present findings from a study with human subjects showing that people who are able to interact with the learning agent provide better demonstrations, in part by adapting based on learner performance, which results in improved learning performance. We also find that interactivity increases a sense of engagement and may encourage players to participate longer. Our exploration of interactivity sheds light on how best to obtain demonstrations for LfD applications.
  • Item
    Autonomous Nondeterministic Tour Guides: Improving Quality of Experience with TTD-MDPs
    (Georgia Institute of Technology, 2007) Cantino, Andrew S. ; Roberts, David L. ; Isbell, Charles L.
    In this paper, we address the problem of building a system of autonomous tour guides for a complex environment, such as a museum with many visitors. Visitors may have varying preferences for types of art or may wish to visit different areas across multiple visits. Often, these goals conflict. For example, many visitors may wish to see the museum's most popular work, but that could cause congestion, ruining the experience. Thus, our task is to build a set of agents that can satisfy their visitors' goals while simultaneously providing quality experiences for all. We use targeted trajectory distribution MDPs (TTD-MDPs), a technology developed to guide players in an interactive entertainment setting. The solution to a TTD-MDP is a probabilistic policy that results in a specific distribution of trajectories through a state space. We motivate TTD-MDPs for the museum tour problem, then describe the development of a number of models of museum visitors. Additionally, we propose a museum model and simulate tours using personalized TTD-MDP tour guides for each kind of visitor. We explain how the use of probabilistic policies reduces the congestion experienced by visitors while preserving their ability to pursue and realize goals.
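To give a flavor of the probabilistic policies described above, the sketch below assumes deterministic transitions over a trajectory tree: at each node the guide picks the action leading to a child with probability proportional to the target probability mass of trajectories passing through that child. The names (`children`, `target_prob`) are illustrative, and the full TTD-MDP formulation also handles stochastic transitions, which this ignores.

```python
def ttd_action_distribution(node, children, target_prob):
    """Action distribution at one node of a trajectory tree (deterministic transitions).

    children[node]: list of (action, child_node) pairs.
    target_prob[child_node]: desired probability mass of trajectories through child.
    Choosing each action in proportion to its child's target mass reproduces the
    target distribution over complete trajectories.
    """
    kids = children[node]
    total = sum(target_prob[child] for _, child in kids)
    return {action: target_prob[child] / total for action, child in kids}

# Example: from the lobby, send 70% of tours toward the modern wing.
children = {"lobby": [("go_modern", "modern_wing"), ("go_classical", "classical_wing")]}
target_prob = {"modern_wing": 0.7, "classical_wing": 0.3}
print(ttd_action_distribution("lobby", children, target_prob))
```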
  • Item
    Horizon-based Value Iteration
    (Georgia Institute of Technology, 2007) Zang, Peng ; Irani, Arya ; Isbell, Charles L.
    We present a horizon-based value iteration algorithm called Reverse Value Iteration (RVI). Empirical results on a variety of domains, both synthetic and real, show RVI often yields speedups of several orders of magnitude. RVI does this by ordering backups by horizons, with preference given to closer horizons, thereby avoiding many unnecessary and incorrect backups. We also compare to related work, including prioritized and partitioned value iteration approaches, and show that our technique performs favorably. The techniques presented in RVI are complementary and can be used in conjunction with previous techniques. We prove that RVI converges and often has better (but never worse) complexity than standard value iteration. To the authors’ knowledge, this is the first comprehensive theoretical and empirical treatment of such an approach to value iteration.
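A very rough sketch of the horizon-ordered backups described above, for a tabular MDP: states are grouped by BFS distance to a goal in the reversed transition graph and backed up closest-horizon first. The data-structure conventions (`P`, `R`, `goal_states`) and the fixed sweep count are assumptions for illustration, not the paper's algorithm in full.

```python
from collections import deque
import numpy as np

def reverse_value_iteration(P, R, goal_states, gamma=0.95, sweeps=50):
    """Value iteration with backups ordered by horizon from the goal (sketch).

    P[s][a] -> list of (prob, next_state); R[s][a] -> immediate reward.
    goal_states: terminal states with value fixed at 0.
    """
    n_states = len(P)
    # Reverse adjacency: which states can reach s2 in one step?
    predecessors = [set() for _ in range(n_states)]
    for s in range(n_states):
        for a in range(len(P[s])):
            for prob, s2 in P[s][a]:
                if prob > 0:
                    predecessors[s2].add(s)

    # Order states by horizon: BFS distance to a goal in the reversed graph.
    order, seen, queue = [], set(goal_states), deque(goal_states)
    while queue:
        s = queue.popleft()
        for p in predecessors[s]:
            if p not in seen:
                seen.add(p)
                order.append(p)
                queue.append(p)

    V = np.zeros(n_states)
    for _ in range(sweeps):
        for s in order:  # closest horizons first
            V[s] = max(R[s][a] + gamma * sum(prob * V[s2] for prob, s2 in P[s][a])
                       for a in range(len(P[s])))
    return V
```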