Organizational Unit:
School of Interactive Computing


Publication Search Results

Now showing 1 - 10 of 10
  • Item
    Policy-based exploration for efficient reinforcement learning
    (Georgia Institute of Technology, 2020-04-25) Subramanian, Kaushik
    Reinforcement Learning (RL) is the field of research focused on solving sequential decision-making tasks modeled as Markov Decision Processes. Researchers have shown RL to be successful at solving a variety of problems like system operations (logistics), robot tasks (soccer, helicopter control) and computer games (Go, backgammon); however, in general, standard RL approaches do not scale well with the size of the problem. This problem arises because RL approaches rely on obtaining samples useful for learning the underlying structure of the domain. In this work we tackle the problem of smart exploration in RL, both autonomously and through human interaction. We propose policy-based methods that serve to effectively bias exploration towards important aspects of the domain. Reinforcement Learning agents use function approximation methods to generalize over large and complex domains. One of the most well-studied approaches is using linear regression algorithms to model the value function of the decision-making problem. We introduce a policy-based method that uses statistical criteria derived from linear regression analysis to bias the agent to explore samples useful for learning. We show how we can learn exploration policies autonomously and from human demonstrations (using concepts of active learning) to facilitate fast convergence to the optimal policy. We then tackle the problem of human-guided exploration in RL. We present a probabilistic method which combines human evaluations, instantiated as policy signals, with Bayesian RL. We show how this approach provides performance speedups while being robust to noisy, suboptimal human signals. We also present an approach that makes use of the inherent structure in exploratory human demonstrations to assist Monte Carlo RL in overcoming its limitations and efficiently solving large-scale problems. We implement our methods on popular arcade games and highlight the improvements achieved using our approach. We show how using humans to help agents efficiently explore sequential decision-making tasks is an important and necessary step in applying Reinforcement Learning to complex problems.
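The exploration-bias idea in this abstract can be sketched in miniature. Here the agent's uncertainty is approximated by a simple inverse visit-count bonus, a stand-in for the statistical criteria derived from linear regression analysis that the thesis actually uses; all names and values are illustrative.

```python
def biased_explore(q, counts, state, actions, bonus=1.0):
    """Pick the action maximizing estimated value plus an exploration
    bonus that shrinks as a (state, action) pair is sampled more often."""
    def score(action):
        n = counts.get((state, action), 0)
        return q.get((state, action), 0.0) + bonus / (1 + n)
    return max(actions, key=score)

# A rarely tried action wins despite a lower value estimate.
q = {("s", "left"): 0.5, ("s", "right"): 0.4}
counts = {("s", "left"): 10}
print(biased_explore(q, counts, "s", ["left", "right"]))  # right
```

Once both actions have been sampled equally often, the bonus cancels out and the greedy choice ("left") wins again.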
  • Item
    Advanced machine learning approaches for characterization of transcriptional regulatory elements and genome-wide associations
    (Georgia Institute of Technology, 2020-03-20) Hassanzadeh, Hamid Reza
    The deep learning revolution has initiated a surge of remarkable achievements in diverse research areas where large volumes of data that underlie complex processes exist. Despite the successful application of deep models to certain problems in the Biomedical and Bioinformatics domains, the field has yet to deliver on many other challenging problems that deal with genomic complexity. The goal of my Ph.D. research has been to develop advanced machine learning techniques to address two challenging problems in the Bioinformatics domain: the characterization of transcriptional regulatory elements, and the modeling of genome-wide associations and linkage disequilibrium using genomic and evolutionary annotation of variants. The genome codes for almost all biological phenomena that take place inside living cells. One such key interaction is the association between transcription factors and a number of degenerate binding sites on DNA which facilitate initiation of transcription of genes. While each protein can potentially bind to any site on the DNA, it is the strength of this binding that plays the key role in the initiation process. Predicting binding sites and binding affinities are two interesting yet challenging problems that remain largely unsolved. Yet we know that cellular machinery constantly identifies such sites on DNA with near-perfect accuracy. The last two decades witnessed the production of multiple in-vivo and in-vitro high-throughput technologies for elucidating these interactions. Protein Binding Microarrays (PBMs) have been one of the most effective in-vitro technologies developed so far. The results of PBM experiments, however, are not easily interpretable and require advanced downstream analysis tools to discover the patterns of binding. In the first half of my thesis, I will develop a series of computational methods that can learn such patterns from data generated by this technology, using tools and techniques from the natural language and image processing domains. I will also show the superiority of my proposed pipelines in predicting binding patterns and affinity. The second part of my thesis is devoted to developing methods for modeling genome-wide associations and linkage disequilibrium. Both of these tasks pose similar challenges that restrict our ability to utilize recent advances in deep learning research. Specifically, when dealing with GWA studies, we are often bound by the high dimensionality of variant data, a significant degree of missing information (i.e., missing heritability), highly complex, weak patterns to learn, and relatively small datasets. As a consequence, the state-of-the-art approaches for GWAS used in practice are variations of linear models. In my thesis, I showed that part of the failure in learning higher-capacity models can be attributed to how we train such models. Specifically, I showed that using Siamese networks and tools from graph theory, we can achieve performance higher than or on par with state-of-the-art Bayesian non-parametric approaches. Having successfully learned weak relationships using the proposed model, I then extended my approach to show that there is a previously unknown relation between variant annotations and their underlying haplotype structure. The existence of such a relationship can increase the power of GWA models and, if biologically validated, will have important implications for population genetics.
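The Siamese idea mentioned above, judging two inputs with one shared model so that similarity is learned in a common embedding space, can be illustrated in miniature. The linear "tower" and toy genotype vectors below are hypothetical, not the networks the thesis trains.

```python
def embed(genotype, weights):
    """A shared 'tower': one linear map applied identically to both inputs.
    Purely illustrative; real Siamese models use learned deep networks."""
    return sum(w * g for w, g in zip(weights, genotype))

def siamese_distance(a, b, weights):
    """Siamese comparison: both inputs pass through the SAME embedding,
    and similarity is judged by distance in embedding space."""
    return abs(embed(a, weights) - embed(b, weights))

# Two variant vectors that agree on the weighted positions score as close,
# even though they differ at an ignored (zero-weight) position.
w = [1.0, 0.0, 1.0]
print(siamese_distance([1, 0, 1], [1, 1, 1], w))  # 0.0
```

The key property is weight sharing: because both inputs go through the same `embed`, the comparison is symmetric and the model needs only one set of parameters.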
  • Item
    Manipulating state space distributions for sample-efficient imitation-learning
    (Georgia Institute of Technology, 2020-03-16) Schroecker, Yannick Karl Daniel
    Imitation learning has emerged as one of the most effective approaches to train agents to act intelligently in unstructured and unknown domains. On its own or in combination with reinforcement learning, it enables agents to copy the expert's behavior and to solve complex, long-term decision making problems. However, to utilize demonstrations effectively and learn from a finite amount of data, the agent needs to develop an understanding of the environment. This thesis investigates estimators of the state-distribution gradient as a means to influence which states the agent will see and thereby guide it to imitate the expert's behavior. Furthermore, this thesis will show that approaches which reason over future states in this way are able to learn from sparse signals and thus provide a way to effectively program agents. Specifically, this dissertation aims to validate the following thesis statement: Exploiting inherent structure in Markov chain stationary distributions allows learning agents to reason about likely future observations, and enables robust and efficient imitation learning, providing an effective and interactive way to teach agents from minimal demonstrations.
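The "inherent structure in Markov chain stationary distributions" that the thesis statement invokes starts from the stationary distribution itself, which for a small chain can be computed by plain power iteration. This is a generic sketch on an invented two-state chain, not the gradient estimator developed in the thesis.

```python
def stationary_distribution(P, iters=200):
    """Power-iterate a row-stochastic transition matrix P (list of lists)
    toward its stationary distribution d, satisfying d = d P."""
    n = len(P)
    d = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        d = [sum(d[i] * P[i][j] for i in range(n)) for j in range(n)]
    return d

# A symmetric two-state chain: stays with prob 0.9, switches with 0.1.
P = [[0.9, 0.1], [0.1, 0.9]]
print(stationary_distribution(P))  # ~[0.5, 0.5]
```

In imitation learning, matching this distribution to the expert's visitation distribution is what guides which states the agent will see.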
  • Item
    Robust approaches and optimization for 3D data
    (Georgia Institute of Technology, 2018-04-06) Sawhney, Rahul
    We introduce a robust, purely geometric, representation framework for fundamental association and analysis problems involving multiple views and scenes. The framework utilizes surface patches / segments as the underlying data unit, and is capable of effectively harnessing macro scale 3D geometry in real world scenes. We demonstrate how this results in discriminative characterizations that are robust to high noise, local ambiguities, sharp viewpoint changes, occlusions, partially overlapping content and related challenges. We present a novel approach to find localized geometric associations between two vastly varying views of a scene, through semi-dense patch correspondences, and align them. We then present means to evaluate structural content similarity between two scenes, and to ascertain their potential association. We show how this can be utilized to obtain geometrically diverse data frame retrievals, and resultant rich, atemporal reconstructions. The presented solutions are applicable over both depth images and point cloud data. They are able to perform in settings that are significantly less restrictive than ones under which existing methods operate. In our experiments, the approaches outperformed pure 3D methods in the literature. Under high variability, the approaches also compared well with solutions based on RGB and RGB-D. We then introduce a robust loss function that is generally applicable to estimation and learning problems. The loss, which is nonconvex as well as nonsmooth, is shown to have a desirable combination of theoretical properties well suited for estimation (or fitting) and outlier suppression (or rejection). In conjunction, we also present a methodology for effective optimization of a broad class of nonsmooth, nonconvex objectives --- some of which would prove problematic for popular methods in the literature. Promising results were obtained from our empirical analysis on 3D data. Finally, we discuss a nonparametric approach for robust mode seeking. It is based on mean shift, but does not assume homoscedastic or isotropic bandwidths. It is useful for finding modes and clustering in irregular data spaces.
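As a baseline for the mode-seeking discussion: standard mean shift with a single fixed (homoscedastic, isotropic) bandwidth, precisely the assumption the proposed nonparametric approach drops, looks like this in one dimension. The data and bandwidth are made up for illustration.

```python
import math

def mean_shift_1d(points, x, bandwidth=1.0, iters=50):
    """Fixed-bandwidth mean shift in 1-D: repeatedly move x to the
    Gaussian-weighted mean of the data around it, climbing to a mode."""
    for _ in range(iters):
        w = [math.exp(-((p - x) / bandwidth) ** 2 / 2) for p in points]
        x = sum(wi * p for wi, p in zip(w, points)) / sum(w)
    return x

# Two clusters near 0 and 10; a start at 1.0 climbs to the mode near 0.
pts = [-0.2, 0.0, 0.1, 9.8, 10.0, 10.3]
mode = mean_shift_1d(pts, 1.0)
print(mode)  # close to 0
```

With one global bandwidth, clusters of very different scales are over- or under-smoothed, which motivates the heteroscedastic, anisotropic treatment discussed in the abstract.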
  • Item
    Physics-based reinforcement learning for autonomous manipulation
    (Georgia Institute of Technology, 2015-08-21) Scholz, Jonathan
    With recent research advances, the dream of bringing domestic robots into our everyday lives has become more plausible than ever. Domestic robotics has grown dramatically in the past decade, with applications ranging from house cleaning to food service to health care. To date, the majority of the planning and control machinery for these systems is carefully designed by human engineers. A large portion of this effort goes into selecting the appropriate models and control techniques for each application, and these skills take years to master. Relieving the burden on human experts is therefore a central challenge for bringing robot technology to the masses. This work addresses this challenge by introducing a physics engine as a model space for an autonomous robot, and defining procedures for enabling robots to decide when and how to learn these models. We also present an appropriate space of motor controllers for these models, and introduce ways to intelligently select when to use each controller based on the estimated model parameters. We integrate these components into a framework called Physics-Based Reinforcement Learning, which features a stochastic physics engine as the core model structure. Together these methods enable a robot to adapt to unfamiliar environments without human intervention. The central focus of this thesis is on fast online model learning for objects with under-specified dynamics. We develop our approach across a diverse range of domestic tasks, starting with a simple table-top manipulation task, followed by a mobile manipulation task involving a single utility cart, and finally an open-ended navigation task with multiple obstacles impeding robot progress. We also present simulation results illustrating the efficiency of our method compared to existing approaches in the learning literature.
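The flavor of using a physics engine as the model space can be conveyed by a toy example: search for the physical parameter whose simulated outcome best matches observation. The one-line "engine" and the candidate grid below are stand-ins for the stochastic physics engine and inference machinery in the thesis.

```python
def simulate(mass, push):
    """Toy 'physics engine' step: acceleration from a push, a = F / m."""
    return push / mass

def fit_mass(observations, candidates):
    """Pick the engine parameter (here, mass) whose simulated motion
    best matches observed (push, acceleration) pairs."""
    def error(m):
        return sum((simulate(m, f) - a) ** 2 for f, a in observations)
    return min(candidates, key=error)

obs = [(2.0, 1.0), (4.0, 2.0)]  # pushes and the accelerations observed
print(fit_mass(obs, [0.5, 1.0, 2.0, 4.0]))  # 2.0
```

The payoff of this model space is that one interpretable parameter (mass) generalizes to every future push, unlike a black-box dynamics model fit per situation.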
  • Item
    Representing and reasoning about videogame mechanics for automated design support
    (Georgia Institute of Technology, 2015-05-14) Nelson, Mark J.
    Videogame designers hope to sculpt gameplay, but actually work in the concrete medium of computation. What they create is code, artwork, dialogue---everything that goes inside a videogame cartridge. In other materially constrained design domains, design-support tools help bridge this gap by automating portions of a design in some cases, and helping a designer understand the implications of their design decisions in others. I investigate AI-based videogame-design support, and do so from the perspective of putting knowledge-representation and reasoning (KRR) at the front. The KRR-centric approach starts by asking whether we can formalize an aspect of the game-design space in a way suitable for automated or semi-automated analysis, and if so, what can be done with the results. It begins with the question, "what could a computer possibly do here?", attempts to show that the computer actually can do so, and then looks at the implications of the computer doing so for design support. To organize the space of game-design knowledge, I factor the broad notion of game mechanics into four categories: abstract mechanics, concrete audiovisual representations, thematic mappings, and input mappings. Concretely, I investigate KRR-centric formalizations in three domains, which probe into different portions of the four quadrants of game-design knowledge: 1. using story graphs and story-quality functions for writing interactive stories, 2. automatic game design focused on the "aboutness" of games, which auto-reskins videogames by formalizing generalized spaces of thematic references, and 3. enhancing mechanics-oriented videogame prototypes by encoding the game mechanics in temporal logic, so that they can be both played and queried.
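The played-and-queried idea in the third domain can be miniaturized: record a play trace from a mechanic, then query it with finite-trace versions of the temporal operators "globally" and "finally". The trace and predicates here are invented; the dissertation uses a full temporal-logic encoding of the mechanics themselves.

```python
def always(pred, trace):
    """Finite-trace 'globally': the predicate holds in every state."""
    return all(pred(state) for state in trace)

def eventually(pred, trace):
    """Finite-trace 'finally': the predicate holds in some state."""
    return any(pred(state) for state in trace)

# A trace of game states from a toy mechanic: move right, gain score.
trace = [{"x": 0, "score": 0}, {"x": 1, "score": 0}, {"x": 2, "score": 1}]
print(always(lambda s: s["x"] >= 0, trace))        # True
print(eventually(lambda s: s["score"] > 0, trace))  # True
```

A designer can then ask design-support questions ("can the player ever score without moving?") as queries over traces rather than by replaying the game by hand.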
  • Item
    Utilizing negative policy information to accelerate reinforcement learning
    (Georgia Institute of Technology, 2015-04-08) Irani, Arya John
    A pilot study by Subramanian et al. on Markov decision problem task decomposition by humans revealed that participants break down tasks into both short-term subgoals with a defined end-condition (such as "go to food") and long-term considerations and invariants with no end-condition (such as "avoid predators"). In the context of Markov decision problems, behaviors having clear start and end conditions are well-modeled by an abstraction known as options, but no abstraction exists in the literature for continuous constraints imposed on the agent's behavior. We propose two representations to fill this gap: the state constraint (a set or predicate identifying states that the agent should avoid) and the state-action constraint (identifying state-action pairs that should not be taken). State-action constraints can be directly utilized by an agent, which must choose an action in each state, while state constraints require an approximation of the MDP’s state transition function to be used; however, it is important to support both representations, as certain constraints may be more easily expressed in terms of one as compared to the other, and users may conceive of rules in either form. Using domains inspired by classic video games, this dissertation demonstrates the thesis that explicitly modeling this negative policy information improves reinforcement learning performance by decreasing the amount of training needed to achieve a given level of performance. In particular, we will show that even the use of negative policy information captured from individuals with no background in artificial intelligence yields improved performance. We also demonstrate that the use of options and constraints together form a powerful combination: an option and constraint can be taken together to construct a constrained option, which terminates in any situation where the original option would violate a constraint. In this way, a naive option defined to perform well in a best-case scenario may still accelerate learning in domains where the best-case scenario is not guaranteed.
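A hedged sketch of how a state-action constraint might filter an agent's choices: the greedy policy, the constraint predicate, and all names below are illustrative, not the dissertation's implementation.

```python
def constrained_action(policy, constraint, state, actions):
    """Choose the policy's preferred action, skipping any
    (state, action) pair ruled out by a negative-policy constraint."""
    allowed = [a for a in actions if not constraint(state, a)]
    if not allowed:
        raise RuntimeError("constraint blocks every action in " + state)
    return max(allowed, key=lambda a: policy.get((state, a), 0.0))

# "Avoid predators" expressed as a state-action constraint:
no_go = lambda s, a: s == "near_predator" and a == "advance"
policy = {("near_predator", "advance"): 1.0,
          ("near_predator", "retreat"): 0.2}
print(constrained_action(policy, no_go, "near_predator",
                         ["advance", "retreat"]))  # retreat
```

A constrained option works the same way at the option level: the option runs normally, but terminates whenever its next action would be filtered out.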
  • Item
    Scaling solutions to Markov Decision Problems
    (Georgia Institute of Technology, 2011-11-14) Zang, Peng
    The Markov Decision Problem (MDP) is a widely applied mathematical model useful for describing a wide array of real world decision problems ranging from navigation to scheduling to robotics. Existing methods for solving MDPs scale poorly when applied to large domains where there are many components and factors to consider. In this dissertation, I study the use of non-tabular representations and human input as scaling techniques. I will show that the joint approach has desirable optimality and convergence guarantees, and demonstrates several orders of magnitude speedup over conventional tabular methods. Empirical studies of speedup were performed using several domains including a clone of the classic video game, Super Mario Bros. In the course of this work, I will address several issues including: how approximate representations can be used without losing convergence and optimality properties, how human input can be solicited to maximize speedup and user engagement, and how that input should be used so as to insulate against possible errors.
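For contrast with the scaling techniques studied here, the conventional baseline is tabular value iteration over an explicit state table, whose cost grows with the number of states. This is a minimal sketch on an invented two-state MDP, not the dissertation's method.

```python
def value_iteration(states, actions, T, R, gamma=0.9, eps=1e-6):
    """Tabular value iteration. T[s][a] is a list of (prob, next_state)
    pairs; R[s][a] is the immediate reward for taking a in s."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in T[s][a])
                       for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V

# Two-state chain: "goal" is absorbing with reward 1 per step.
states, actions = ["start", "goal"], ["go"]
T = {"start": {"go": [(1.0, "goal")]}, "goal": {"go": [(1.0, "goal")]}}
R = {"start": {"go": 0.0}, "goal": {"go": 1.0}}
V = value_iteration(states, actions, T, R)
print(V)  # V[goal] ~ 1/(1-0.9) = 10, V[start] ~ 9
```

Because every state gets its own table entry, this approach is exactly what breaks down in large factored domains like the Super Mario Bros. clone mentioned above.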
  • Item
    Computational techniques for reasoning about and shaping player experiences in interactive narratives
    (Georgia Institute of Technology, 2010-04-06) Roberts, David L.
    Interactive narratives are marked by two characteristics: 1) a space of player interactions, some subset of which are specified as aesthetic goals for the system; and 2) the affordance for players to express self-agency and have meaningful interactions. As a result, players are (often unknowing) participants in the creation of the experience. They cannot be assumed to be cooperative, nor adversarial. Thus, we must provide paradigms to designers that enable them to work with players to co-create experiences without transferring the system's goals (specified by authors) to players and without systems having a model of players' behaviors. This dissertation formalizes compact representations and efficient algorithms that enable computer systems to represent, reason about, and shape player experiences in interactive narratives. Early work on interactive narratives relied heavily on "script-and-trigger" systems, requiring sizable engineering efforts from designers to provide concrete instructions for when and how systems can modify an environment to provide a narrative experience for players. While there have been advances in techniques for representing and reasoning about narratives at an abstract level that automate the trigger side of script-and-trigger systems, few techniques have reduced the need for scripting system adaptations or reconfigurations---one of the contributions of this dissertation. We first describe a decomposition of the design process for interactive narrative into three technical problems: goal selection, action/plan selection/generation, and action/plan refinement. This decomposition allows techniques to be developed for reasoning about the complete implementation of an interactive narrative. We then describe representational and algorithmic solutions to these problems: a Markov Decision Process-based formalism for goal selection, a schema-based planning architecture using theories of influence from social psychology for action/plan selection/generation, and a natural language-based template system for action/plan refinement. To evaluate these techniques, we conduct simulation experiments and human subjects experiments in an interactive story. These techniques realize three goals: 1) providing efficient algorithmic support for authoring interactive narratives; 2) designing a paradigm for AI systems to reason and act to shape player experiences based on author-specified aesthetic goals; and 3) accomplishing (1) and (2) with players feeling more engaged and without perceiving a decrease in self-agency.
  • Item
    Estimating the discriminative power of time varying features for EEG BMI
    (Georgia Institute of Technology, 2009-11-16) Mappus, Rudolph Louis, IV
    In this work, we present a set of methods aimed at improving the discriminative power of time-varying features of signals that contain noise. These methods use properties of noise signals as well as information-theoretic techniques to factor types of noise and support signal inference for electroencephalographic (EEG)-based brain-machine interfaces (BMI). EEG data were collected over two studies aimed at addressing psychophysiological issues involving symmetry and mental rotation processing. The psychophysiological data gathered in the mental rotation study also tested the feasibility of using dissociations of mental rotation tasks correlated with rotation angle in a BMI. We show the feasibility of mental rotation for BMI by demonstrating bitrates and recognition accuracy comparable to state-of-the-art BMIs. The conclusion is that by using the feature selection methods introduced in this work to dissociate mental rotation tasks, we produce bitrates and recognition rates comparable to current BMIs.
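One common information-theoretic way to score a candidate feature is its empirical mutual information with the task labels. This generic sketch (a stand-in, not the thesis's specific method) scores a discretized feature against class labels; the toy data are invented.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete
    sequences of equal length: I(X;Y) = sum p(x,y) log2 p(x,y)/(p(x)p(y))."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

# A feature identical to a balanced binary label carries exactly 1 bit.
labels = [0, 1, 0, 1, 0, 1, 0, 1]
print(mutual_information(labels, labels))  # 1.0
```

Features with near-zero mutual information contribute nothing to discriminating the mental-rotation conditions and can be dropped before classification.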