Organizational Unit:
Institute for Robotics and Intelligent Machines (IRIM)

Publication Search Results

Now showing 1 - 10 of 42
  • Item
    Deep Segments: Comparisons between Scenes and their Constituent Fragments using Deep Learning
    (Georgia Institute of Technology, 2014-09) Doshi, Jigar ; Mason, Celeste ; Wagner, Alan ; Kira, Zsolt
    We examine the problem of visual scene understanding and abstraction from first-person video. This is an important problem, and successful approaches would enable complex scene characterization tasks that go beyond classification, for example characterization of novel scenes in terms of previously encountered visual experiences. Our approach utilizes the final layer of a convolutional neural network as a high-level, scene-specific representation that is robust enough to noise to be used with wearable cameras. Researchers have demonstrated the use of convolutional neural networks for object recognition. Inspired by results from cognitive science and neuroscience, we use output maps created by a convolutional neural network as a sparse, abstract representation of visual images. Our approach abstracts scenes into constituent segments that can be characterized by the spatial and temporal distribution of objects. We demonstrate the viability of the system on video taken from Google Glass. Experiments examining the ability of the system to determine scene similarity indicate a ρ(384) = 0.498 correlation to human evaluations and 90% accuracy on a category-match problem. Finally, we demonstrate high-level scene prediction by showing that the system matches two scenes using only a few initial segments and predicts objects that will appear in subsequent segments.
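    A minimal sketch of the comparison step, under assumptions: random vectors stand in for the final-layer CNN activations, segments are summarized by averaging per-frame features, and two segments are compared by cosine similarity. The paper's own representation and human-evaluation protocol are not reproduced.

    # Illustrative sketch (not the authors' code): average per-frame final-layer
    # activations into a segment descriptor and compare segments by cosine similarity.
    import numpy as np

    def cosine_similarity(a, b):
        """Cosine similarity between two feature vectors."""
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def segment_descriptor(frame_features):
        """Average per-frame activations into one abstract segment descriptor."""
        return np.mean(np.stack(frame_features), axis=0)

    rng = np.random.default_rng(0)
    seg_a = segment_descriptor([rng.random(4096) for _ in range(10)])  # stand-in activations
    seg_b = segment_descriptor([rng.random(4096) for _ in range(10)])
    print("scene similarity:", cosine_similarity(seg_a, seg_b))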
  • Item
    Multimodal Real-Time Contingency Detection for HRI
    (Georgia Institute of Technology, 2014-09) Chu, Vivian ; Bullard, Kalesha ; Thomaz, Andrea L.
    Our goal is to develop robots that naturally engage people in social exchanges. In this paper, we focus on the problem of recognizing that a person is responsive to a robot's request for interaction. Inspired by human cognition, our approach is to treat this as a contingency detection problem. We present a simple discriminative Support Vector Machine (SVM) classifier to compare against the generative methods introduced in prior work by Lee et al. [1]. We evaluate these methods in two ways. First, we train three separate SVMs with multimodal sensory input on a set of batch data collected in a controlled setting, obtaining an average F₁ score of 0.82. Second, in an open-ended experiment setting with seven participants, we show that our model is able to perform contingency detection in real time and generalize to new people with a best F₁ score of 0.72.
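    A minimal sketch of the discriminative baseline, assuming precomputed multimodal feature vectors and binary contingency labels (both synthetic stand-ins here); it trains an SVM and reports the F₁ score used above.

    # Sketch only: synthetic features/labels stand in for the real multimodal data.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import f1_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 32))                       # stand-in multimodal features
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)        # stand-in contingency labels

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    clf = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)       # discriminative classifier
    print("F1:", f1_score(y_te, clf.predict(X_te)))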
  • Item
    Feasibility of Identifying Eating Moments from First-Person Images Leveraging Human Computation
    (Georgia Institute of Technology, 2013-11) Thomaz, Edison ; Parnami, Aman ; Essa, Irfan ; Abowd, Gregory D.
    There is widespread agreement in the medical research community that more effective mechanisms for dietary assessment and food journaling are needed to fight back against obesity and other nutrition-related diseases. However, it is presently not possible to automatically capture and objectively assess an individual’s eating behavior. Currently used dietary assessment and journaling approaches have several limitations; they pose a significant burden on individuals and are often not detailed or accurate enough. In this paper, we describe an approach where we leverage human computation to identify eating moments in first-person point-of-view images taken with wearable cameras. Recognizing eating moments is a key first step both in terms of automating dietary assessment and building systems that help individuals reflect on their diet. In a feasibility study with 5 participants over 3 days, where 17,575 images were collected in total, our method was able to recognize eating moments with 89.68% accuracy.
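    As an illustration of the human-computation step, a hedged sketch under assumptions: per-image crowd judgments (hypothetical IDs and labels) are aggregated by majority vote and scored against ground truth, which is one simple way to turn worker labels into eating-moment detections.

    # Illustrative only: aggregate crowd labels per image and compute accuracy.
    from collections import Counter

    def majority_vote(labels):
        """Return the most common label among crowd workers for one image."""
        return Counter(labels).most_common(1)[0][0]

    crowd = {  # hypothetical worker labels per first-person image
        "img_001": ["eating", "eating", "not"],
        "img_002": ["not", "not", "eating"],
        "img_003": ["eating", "eating", "eating"],
    }
    truth = {"img_001": "eating", "img_002": "not", "img_003": "eating"}

    pred = {k: majority_vote(v) for k, v in crowd.items()}
    acc = sum(pred[k] == truth[k] for k in truth) / len(truth)
    print(f"accuracy: {acc:.2%}")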
  • Item
    Learning Stable Pushing Locations
    (Georgia Institute of Technology, 2013-08) Hermans, Tucker ; Li, Fuxin ; Rehg, James M. ; Bobick, Aaron F.
    We present a method by which a robot learns to predict effective push-locations as a function of object shape. The robot performs push experiments at many contact locations on multiple objects and records local and global shape features at each point of contact. The robot observes the outcome trajectories of the manipulations and computes a novel push-stability score for each trial. The robot then learns a regression function in order to predict push effectiveness as a function of object shape. This mapping allows the robot to select effective push locations for subsequent objects whether they are previously manipulated instances, new instances from previously encountered object classes, or entirely novel objects. In the totally novel object case, the local shape property coupled with the overall distribution of the object allows for the discovery of effective push locations. These results are demonstrated on a mobile manipulator robot pushing a variety of household objects on a tabletop surface.
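    A hedged sketch of the core learning step, with a gradient-boosted regressor standing in for the paper's regression function and random vectors standing in for the local/global shape features: the model predicts a push-stability score for candidate contact points, and the best-scoring point is selected on a new object.

    # Sketch only: synthetic shape features and stability scores; the regressor is a stand-in.
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(1)
    X_train = rng.normal(size=(500, 10))                                   # shape features per contact point
    y_train = X_train[:, 0] - 0.3 * X_train[:, 3] + rng.normal(scale=0.1, size=500)  # observed stability

    model = GradientBoostingRegressor().fit(X_train, y_train)

    candidates = rng.normal(size=(50, 10))        # candidate contact points on a novel object
    best = int(np.argmax(model.predict(candidates)))
    print("push at candidate", best)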
  • Item
    Generating Human-like Motion for Robots
    (Georgia Institute of Technology, 2013-07) Gielniak, Michael J. ; Liu, C. Karen ; Thomaz, Andrea L.
    Action prediction and fluidity are key elements of human-robot teamwork. If a robot's actions are hard to understand, it can impede fluid HRI. Our goal is to improve the clarity of robot motion by making it more human-like. We present an algorithm that autonomously synthesizes human-like variants of an input motion. Our approach is a three-stage pipeline. First, we optimize motion with respect to spatio-temporal correspondence (STC), which emulates the coordinated effects of human joints that are connected by muscles. We present three experiments that validate that our STC optimization approach increases human-likeness and recognition accuracy for human social partners. Next in the pipeline, we avoid repetitive motion by adding variance, exploiting redundant and underutilized spaces of the input motion, which creates multiple motions from a single input. In two experiments we validate that our variance approach maintains the human-likeness from the previous step, and that a social partner can still accurately recognize the motion's intent. As a final step, we maintain the robot's ability to interact with its world by providing it the ability to satisfy constraints. We provide experimental analysis of the effects of constraints on the synthesized human-like robot motion variants.
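    One way to picture the "exploiting redundant spaces" step is a null-space perturbation, sketched below under strong assumptions (a random stand-in Jacobian, no STC optimization): joint-space perturbations projected into the task null space add variance to a pose without changing the task-space motion to first order.

    # Illustrative sketch only; not the authors' pipeline.
    import numpy as np

    rng = np.random.default_rng(2)
    n_joints, task_dim = 7, 3
    J = rng.normal(size=(task_dim, n_joints))      # task Jacobian at one pose (stand-in)
    q = rng.normal(size=n_joints)                  # original joint configuration

    # Null-space projector: perturbations in this subspace leave the task velocity unchanged.
    N = np.eye(n_joints) - np.linalg.pinv(J) @ J
    delta = N @ rng.normal(scale=0.1, size=n_joints)
    q_variant = q + delta

    print("task-space change (should be ~0):", np.linalg.norm(J @ delta))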
  • Item
    Augmenting Bag-of-Words: Data-Driven Discovery of Temporal and Structural Information for Activity Recognition
    (Georgia Institute of Technology, 2013-06) Bettadapura, Vinay ; Schindler, Grant ; Plötz, Thomas ; Essa, Irfan
    We present data-driven techniques to augment Bag of Words (BoW) models, which allow for more robust modeling and recognition of complex long-term activities, especially when the structure and topology of the activities are not known a priori. Our approach specifically addresses the limitations of standard BoW approaches, which fail to represent the underlying temporal and causal information that is inherent in activity streams. We also propose the use of randomly sampled regular expressions to discover and encode patterns in activities. We demonstrate the effectiveness of our approach in experimental evaluations where we successfully recognize activities and detect anomalies in four complex datasets.
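    A minimal sketch of the augmentation idea, under assumptions: a plain BoW histogram over activity "words" is extended with bigram counts that retain temporal ordering, which standard BoW discards; the randomly sampled regular-expression discovery is omitted.

    # Sketch only: toy activity stream; bigrams stand in for the temporal augmentation.
    from collections import Counter

    def bow_histogram(words):
        """Order-free counts of activity words."""
        return Counter(words)

    def temporal_bigrams(words):
        """Counts of consecutive word pairs, preserving local temporal order."""
        return Counter(zip(words, words[1:]))

    stream = ["open", "pour", "stir", "pour", "close"]
    features = {**bow_histogram(stream), **temporal_bigrams(stream)}
    print(features)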
  • Item
    DDF-SAM 2.0: Consistent Distributed Smoothing and Mapping
    (Georgia Institute of Technology, 2013-05) Cunningham, Alexander ; Indelman, Vadim ; Dellaert, Frank
    This paper presents a consistent decentralized data fusion approach for robust multi-robot SLAM in dangerous, unknown environments. The DDF-SAM 2.0 approach extends our previous work by combining local and neighborhood information in a single, consistent augmented local map, without the overly conservative approach to avoiding information double-counting in the previous DDF-SAM algorithm. We introduce the anti-factor as a means to subtract information in graphical SLAM systems, and illustrate its use both to replace information in an incremental solver and to cancel out neighborhood information from shared summarized maps. This paper presents and compares three summarization techniques, two exact approaches and an approximation. We evaluate the proposed system on a synthetic example and show that the augmented local system and the associated summarization technique do not double-count information, while keeping performance tractable.
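    The anti-factor idea can be sketched in Gaussian information form (a conceptual illustration, not the DDF-SAM 2.0 implementation): adding a factor adds its information matrix and vector, and the anti-factor subtracts exactly the same contribution, cancelling double-counted neighborhood information.

    # Conceptual sketch only; stand-in values.
    import numpy as np

    def add_factor(Lambda, eta, L_f, e_f):
        """Add a factor's information contribution."""
        return Lambda + L_f, eta + e_f

    def add_anti_factor(Lambda, eta, L_f, e_f):
        """Subtract a previously added factor's information (cancels double-counting)."""
        return Lambda - L_f, eta - e_f

    Lambda, eta = np.eye(2), np.zeros(2)                 # local prior information
    L_shared = np.array([[0.5, 0.1], [0.1, 0.5]])        # neighborhood summary (stand-in)
    e_shared = np.array([0.2, -0.1])

    Lambda, eta = add_factor(Lambda, eta, L_shared, e_shared)
    Lambda, eta = add_anti_factor(Lambda, eta, L_shared, e_shared)
    print("back to prior:", np.allclose(Lambda, np.eye(2)) and np.allclose(eta, 0))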
  • Item
    A Visualization Framework for Team Sports Captured using Multiple Static Cameras
    (Georgia Institute of Technology, 2013) Hamid, Raffay ; Kumar, Ramkrishan ; Hodgins, Jessica K. ; Essa, Irfan
    We present a novel approach for robust localization of multiple people observed using a set of static cameras. We use this location information to generate a visualization of the virtual offside line in soccer games. To compute the position of the offside line, we need to localize players' positions and identify their team roles. We solve the problem of fusing corresponding players' positional information by finding minimum-weight K-length cycles in a complete K-partite graph. Each partite set of the graph corresponds to one of the K cameras, whereas each node of a partite set encodes the position and appearance of a player observed from a particular camera. To find the minimum-weight cycles in this graph, we use a dynamic-programming-based approach that varies over a continuum from maximally to minimally greedy in terms of the number of graph paths explored at each iteration. We present proofs for the efficiency and performance bounds of our algorithms. Finally, we demonstrate the robustness of our framework by testing it on 82,000 frames of soccer footage captured over eight different illumination conditions, play types, and team attire. Our framework runs in near real time, and processes video from 3 full HD cameras in about 0.4 seconds for each set of 3 corresponding frames.
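    A hedged sketch of the maximally greedy end of the continuum described above, with random 2D positions standing in for per-camera player detections: each detection in the first camera is chained to its nearest neighbor in each subsequent camera, and the K-length cycle is closed back to the start; the lightest cycles give cross-camera correspondences.

    # Illustrative sketch only; not the paper's dynamic-programming formulation.
    import numpy as np

    rng = np.random.default_rng(3)
    K, players = 3, 4
    cams = [rng.normal(size=(players, 2)) for _ in range(K)]   # detections per camera (stand-ins)

    def greedy_cycle(start_idx):
        """Greedily chain a camera-1 detection through all K cameras and close the cycle."""
        path, cost = [start_idx], 0.0
        point = cams[0][start_idx]
        for k in range(1, K):
            d = np.linalg.norm(cams[k] - point, axis=1)
            j = int(np.argmin(d))
            cost += d[j]
            point = cams[k][j]
            path.append(j)
        cost += np.linalg.norm(cams[0][start_idx] - point)      # closing edge of the K-length cycle
        return path, cost

    for i in range(players):
        print(greedy_cycle(i))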
  • Item
    Object Focused Q-Learning for Autonomous Agents
    (Georgia Institute of Technology, 2013) Cobo, Luis C. ; Isbell, Charles L. ; Thomaz, Andrea L.
    We present Object Focused Q-learning (OF-Q), a novel reinforcement learning algorithm that can offer exponential speed-ups over classic Q-learning on domains composed of independent objects. An OF-Q agent treats the state space as a collection of objects organized into different object classes. Our key contribution is a control policy that uses non-optimal Q-functions to estimate the risk of ignoring parts of the state space. We compare our algorithm to traditional Q-learning and previous arbitration algorithms in two domains, including a version of Space Invaders.
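    A simplified sketch of the object-focused decomposition (the paper's risk estimates from non-optimal Q-functions are not reproduced): one Q-table per object class, with action arbitration by summing Q-values over the objects currently present.

    # Toy sketch only; object classes, states, and rewards are stand-ins.
    from collections import defaultdict
    import random

    ACTIONS = ["left", "right", "fire"]
    ALPHA, GAMMA = 0.1, 0.95

    # Q[obj_class][(obj_state, action)] -> value
    Q = defaultdict(lambda: defaultdict(float))

    def select_action(objects):
        """objects: list of (obj_class, obj_state). Pick the action with the highest summed Q."""
        return max(ACTIONS, key=lambda a: sum(Q[c][(s, a)] for c, s in objects))

    def update(objects, action, reward, next_objects):
        """Per-object Q-learning update, assuming objects keep their order across steps."""
        for (c, s), (_, s2) in zip(objects, next_objects):
            best_next = max(Q[c][(s2, a)] for a in ACTIONS)
            Q[c][(s, action)] += ALPHA * (reward + GAMMA * best_next - Q[c][(s, action)])

    objs = [("invader", 3), ("bullet", 7)]
    a = select_action(objs) if random.random() > 0.1 else random.choice(ACTIONS)
    update(objs, a, reward=1.0, next_objects=[("invader", 2), ("bullet", 6)])
    print(a)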
  • Item
    Linguistic Transfer of Human Assembly Tasks to Robots
    (Georgia Institute of Technology, 2012-10) Dantam, Neil ; Essa, Irfan ; Stilman, Mike
    We demonstrate the automatic transfer of an assembly task from human to robot. This work extends efforts showing the utility of linguistic models in verifiable robot control policies by now performing real visual analysis of human demonstrations to automatically extract a policy for the task. This method tokenizes each human demonstration into a sequence of object connection symbols, then transforms the set of sequences from all demonstrations into an automaton, which represents the task-language for assembling a desired object. Finally, we combine this assembly automaton with a kinematic model of a robot arm to reproduce the demonstrated task.
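    A minimal sketch of the tokenize-and-merge step, under assumptions: demonstrations are given as sequences of object-connection symbols and merged into a prefix-tree acceptor, a simple stand-in for the paper's task-language automaton; the visual tokenization and robot execution are not shown.

    # Sketch only: hypothetical connection symbols; builds a prefix-tree acceptor.
    def build_acceptor(demonstrations):
        """Build {state: {symbol: next_state}} plus the set of accepting states."""
        transitions, accepting, next_state = {0: {}}, set(), 1
        for demo in demonstrations:
            state = 0
            for symbol in demo:
                if symbol not in transitions[state]:
                    transitions[state][symbol] = next_state
                    transitions[next_state] = {}
                    next_state += 1
                state = transitions[state][symbol]
            accepting.add(state)
        return transitions, accepting

    demos = [
        ["connect(A,B)", "connect(B,C)"],
        ["connect(B,C)", "connect(A,B)"],
    ]
    print(build_acceptor(demos))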