Organizational Unit:
School of Computer Science


Publication Search Results

Now showing 1 - 10 of 11
  • Item
    Grasp selection strategies for robot manipulation using a superquadric-based object representation
    (Georgia Institute of Technology, 2016-07-29) Huaman, Ana Consuelo
    This thesis presents work on the implementation of a robotic system targeted at performing a set of basic manipulation tasks instructed by a human user. The core motivation for developing this system was to enable our robot to achieve these tasks reliably, in a time-efficient manner, and under mildly realistic constraints. Robot manipulation as a field has grown rapidly in recent years, presenting us with a vast array of robots exhibiting skills as sophisticated as preparing dinner, making an espresso or operating a drill. These complex tasks are in general achieved by using equally complex frameworks that assume extensive pre-existing knowledge, such as perfect knowledge of the environment, sizable amounts of training data or the availability of crowdsourcing resources. In this work we postulate that elementary tasks, such as pick-up, pick-and-place and pouring, can be realized with online algorithms and limited knowledge of the objects to be manipulated. The presented work shows a fully implemented pipeline in which each module is designed to meet the core requirements specified above. We present a number of experiments involving a database of 10 household objects used in 3 selected elementary manipulation tasks. Our contributions are distributed across the modules of our pipeline: (1) We demonstrate that superquadrics are primitive shapes well suited to representing, on the fly, a considerable number of convex household objects; their parametric nature (3 axis and 2 shape parameters) is shown to be helpful for attaching simple semantic labels to objects (e.g., for a pouring task) that are useful for grasp and motion planning. (2) We introduce a hand-and-arm metric that considers both grasp robustness and arm end-comfort to select grasps for simple pick-up tasks. We show with real and simulation results that considering both the hand and arm aspects of the manipulation task helps to select grasps that are easier to execute in real environments without sacrificing grasp stability in the process. (3) We present grasp selection and planning strategies that exploit task constraints to select the most appropriate grasp for carrying out a manipulation task in an online and efficient manner (in terms of planning and execution time).
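    As a concrete illustration of the object representation above, the following minimal sketch (our own toy code, not the thesis implementation) evaluates the standard superquadric inside-outside function; the five parameters are the three semi-axis lengths and two shape exponents mentioned in the abstract, and all numeric values are invented.

```python
import numpy as np

def superquadric_inside_outside(points, a1, a2, a3, e1, e2):
    """Evaluate the superquadric inside-outside function F at Nx3 points
    given in the object frame. F < 1: inside, F == 1: on the surface,
    F > 1: outside. The five parameters are the three semi-axis lengths
    (a1, a2, a3) and the two shape exponents (e1, e2)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    return ((np.abs(x / a1) ** (2.0 / e2) +
             np.abs(y / a2) ** (2.0 / e2)) ** (e2 / e1) +
            np.abs(z / a3) ** (2.0 / e1))

# Example: one point on the surface of a boxy superquadric, one far outside.
pts = np.array([[0.05, 0.0, 0.0], [0.0, 0.0, 0.2]])
print(superquadric_inside_outside(pts, a1=0.05, a2=0.03, a3=0.10, e1=0.3, e2=0.3))
```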
  • Item
    Planning in constraint space for multi-body manipulation tasks
    (Georgia Institute of Technology, 2016-04-05) Erdogan, Can
    Robots are inherently limited by physical constraints on their link lengths, motor torques, battery power and structural rigidity. To thrive in circumstances that push these limits, such as search and rescue scenarios, intelligent agents can use the available objects in their environment as tools. Reasoning about arbitrary objects and how they can be placed together to create useful structures such as ramps, bridges or simple machines is critical to pushing beyond one's physical limitations. Unfortunately, the solution space is combinatorial in the number of available objects, and the joint configuration space of the chosen objects and of the robot that uses the structure is high-dimensional. To address these challenges, we propose using constraint satisfaction as a means to test the feasibility of candidate structures, and we adopt search algorithms from the classical planning literature to find sufficient designs. The key idea is that the interactions between the components of a structure can be encoded as equality and inequality constraints on the configuration spaces of the respective objects. Furthermore, constraints that are induced by a broadly defined action, such as placing an object on another, can be grouped together using logical representations such as the Planning Domain Definition Language (PDDL). Then, a classical planning search algorithm can reason about which set of constraints to impose on the available objects, iteratively creating a structure that satisfies the task goals and the robot constraints. To demonstrate the effectiveness of this framework, we present both simulation and real robot results with static structures such as ramps, bridges and stairs, and quasi-static structures such as lever-fulcrum simple machines.
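    The following toy sketch illustrates the constraint-satisfaction idea in one dimension (our own illustration; object names, heights, and the reach limit are assumed): a "place on" action induces an equality constraint between object configurations, and feasibility of a candidate structure is tested by a constrained solve.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical 1-D illustration: configurations are base heights (z) of
# stacked objects. "place(a, b)" induces the equality z_a = z_b + h_b
# (a rests on b); an inequality keeps the stack within the robot's reach.

H = {"block": 0.10, "crate": 0.30}   # assumed object heights (meters)
REACH = 0.9                           # assumed maximum reachable height

def stack_feasible(order):
    """Test feasibility of a candidate stacking order by searching for
    base heights that satisfy all induced constraints."""
    n = len(order)
    cons = [{"type": "eq",
             "fun": lambda z, i=i: z[i + 1] - (z[i] + H[order[i]])}
            for i in range(n - 1)]
    cons.append({"type": "eq", "fun": lambda z: z[0]})           # base on floor
    cons.append({"type": "ineq",
                 "fun": lambda z: REACH - (z[-1] + H[order[-1]])})
    res = minimize(lambda z: 0.0, np.zeros(n), constraints=cons)
    return res.success

print(stack_feasible(["crate", "block"]))  # True: total height 0.4 <= 0.9
```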
  • Item
    Timing multimodal turn-taking in human-robot cooperative activity
    (Georgia Institute of Technology, 2015-04-07) Chao, Crystal
    Turn-taking is a fundamental process that governs social interaction. When humans interact, they naturally take initiative and relinquish control to each other using verbal and nonverbal behavior in a coordinated manner. In contrast, existing approaches for controlling a robot's social behavior do not explicitly model turn-taking, resulting in interaction breakdowns that confuse or frustrate the human and detract from the dyad's cooperative goals. They also lack generality, relying on scripted behavior control that must be designed for each new domain. This thesis seeks to enable robots to cooperate fluently with humans by automatically controlling the timing of multimodal turn-taking. Based on our empirical studies of interaction phenomena, we develop a computational turn-taking model that accounts for multimodal information flow and resource usage in interaction. This model is implemented within a novel behavior generation architecture called CADENCE, the Control Architecture for the Dynamics of Embodied Natural Coordination and Engagement, which controls a robot's speech, gesture, gaze, and manipulation. CADENCE controls turn-taking using a timed Petri net (TPN) representation that integrates resource exchange, interruptible modality execution, and modeling of the human user. We demonstrate progressive developments of CADENCE through multiple domains of autonomous interaction encompassing situated dialogue and collaborative manipulation. We also iteratively evaluate improvements in the system using quantitative metrics of task success, fluency, and balance of control.
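    The sketch below is a minimal timed Petri net in the spirit described above (it is not CADENCE's actual TPN): the speaking floor is a place holding a single token, so whichever party consumes it holds the turn until the timed transition completes.

```python
import heapq

class TimedPetriNet:
    """Toy timed Petri net: places hold tokens; a fired transition consumes
    input tokens now and releases output tokens after a delay."""

    def __init__(self, marking):
        self.marking = dict(marking)   # place name -> token count
        self.events = []               # heap of (finish_time, output places)

    def fire(self, now, inputs, outputs, duration):
        """Fire at time `now` if every input place has a token; output
        tokens appear `duration` seconds later."""
        if all(self.marking.get(p, 0) > 0 for p in inputs):
            for p in inputs:
                self.marking[p] -= 1
            heapq.heappush(self.events, (now + duration, tuple(outputs)))
            return True
        return False

    def advance(self, now):
        """Deliver output tokens of all transitions finished by `now`."""
        while self.events and self.events[0][0] <= now:
            _, outputs = heapq.heappop(self.events)
            for p in outputs:
                self.marking[p] = self.marking.get(p, 0) + 1

net = TimedPetriNet({"floor": 1, "robot_ready": 1})
print(net.fire(0.0, ["floor", "robot_ready"], ["floor", "robot_done"], 2.5))  # True
print(net.fire(1.0, ["floor"], ["floor"], 1.0))  # False: floor token is held
net.advance(3.0)                                  # robot's turn ends
print(net.fire(3.0, ["floor"], ["floor"], 1.0))  # True: floor is free again
```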
  • Item
    The roles of allocentric representations in autonomous local navigation
    (Georgia Institute of Technology, 2015-02-20) Ta Huynh, Duy Nguyen
    In this thesis, I study the computational advantages of the allocentric representation as compared to the egocentric representation for autonomous local navigation. Whereas in the allocentric framework, all variables of interest are represented with respect to a coordinate frame attached to an object in the scene, in the egocentric one, they are always represented with respect to the robot frame at each time step. In contrast with well-known results in the Simultaneous Localization and Mapping literature, I show that the amounts of nonlinearity of these two representations, where poses are elements of Lie-group manifolds, do not affect the accuracy of Gaussian-based filtering methods for perception at both the feature level and the object level. Furthermore, although these two representations are equivalent at the object level, the allocentric filtering framework is better than the egocentric one at the feature level due to its advantages in the marginalization process. Moreover, I show that the object-centric perspective, inspired by the allocentric representation, enables novel linear-time filtering algorithms, which significantly outperform state-of-the-art feature-based filtering methods with a small trade-off in accuracy due to a low-rank approximation. Finally, I show that the allocentric representation is also better than the egocentric representation in Model Predictive Control for local trajectory planning and obstacle avoidance tasks.
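    A minimal sketch of the distinction, using 2-D poses rather than the thesis's Lie-group machinery (frame names and numeric values are invented): the same landmark is expressed allocentrically in a fixed frame and egocentrically in the moving robot frame.

```python
import numpy as np

def se2(x, y, theta):
    """Homogeneous transform for a 2-D pose."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x], [s, c, y], [0, 0, 1]])

T_wr = se2(1.0, 2.0, np.pi / 4)      # robot pose in the world frame
l_w = np.array([3.0, 1.0, 1.0])      # landmark, allocentric (world frame)

# Egocentric coordinates: pull the landmark into the robot frame.
l_r = np.linalg.inv(T_wr) @ l_w
print(l_r[:2])

# After the robot moves, the allocentric coordinates are unchanged, while
# every egocentric variable must be re-expressed in the new robot frame;
# this re-expression is where the two filtering frameworks differ at
# marginalization time.
T_wr2 = T_wr @ se2(0.5, 0.0, 0.0)    # robot advances 0.5 m
l_r2 = np.linalg.inv(T_wr2) @ l_w
print(l_r2[:2])
```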
  • Item
    Deep Segments: Comparisons between Scenes and their Constituent Fragments using Deep Learning
    (Georgia Institute of Technology, 2014-09) Doshi, Jigar ; Mason, Celeste ; Wagner, Alan ; Kira, Zsolt
    We examine the problem of visual scene understanding and abstraction from first-person video. This is an important problem, and successful approaches would enable complex scene characterization tasks that go beyond classification, for example the characterization of novel scenes in terms of previously encountered visual experiences. Our approach utilizes the final layer of a convolutional neural network as a high-level, scene-specific representation that is sufficiently robust to noise to be used with wearable cameras. Researchers have demonstrated the use of convolutional neural networks for object recognition. Inspired by results from cognitive science and neuroscience, we use output maps created by a convolutional neural network as a sparse, abstract representation of visual images. Our approach abstracts scenes into constituent segments that can be characterized by the spatial and temporal distribution of objects. We demonstrate the viability of the system on video taken from Google Glass. Experiments examining the ability of the system to determine scene similarity indicate a correlation of ρ(384) = 0.498 with human evaluations and 90% accuracy on a category-match problem. Finally, we demonstrate high-level scene prediction by showing that the system matches two scenes using only a few initial segments and predicts objects that will appear in subsequent segments.
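    A rough sketch of the descriptor idea (assuming torchvision's AlexNet as a stand-in for the paper's network, and invented image paths): a final-layer activation vector per scene segment, compared by cosine similarity.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Stand-in network: torchvision AlexNet with pretrained weights.
model = models.alexnet(weights=models.AlexNet_Weights.DEFAULT).eval()
preprocess = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor(),
                        T.Normalize(mean=[0.485, 0.456, 0.406],
                                    std=[0.229, 0.224, 0.225])])

def scene_descriptor(path):
    """Final-layer activations for one frame (or segment keyframe)."""
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return model(img).squeeze(0)

# Hypothetical keyframes from two video segments.
d1 = scene_descriptor("segment_a.jpg")
d2 = scene_descriptor("segment_b.jpg")
print(float(torch.nn.functional.cosine_similarity(d1, d2, dim=0)))
```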
  • Item
    Support-theoretic subgraph preconditioners for large-scale SLAM and structure from motion
    (Georgia Institute of Technology, 2014-06-19) Jian, Yong-Dian
    Simultaneous localization and mapping (SLAM) and Structure from Motion (SfM) are important problems in robotics and computer vision. One of the challenges is to solve a large-scale optimization problem associated with all of the robot poses, camera parameters, landmarks and measurements. Yet neither of the two reigning paradigms, direct and iterative methods, scales well to very large and complex problems. Recently, the subgraph-preconditioned conjugate gradient method has been proposed to combine the advantages of direct and iterative methods. However, how to find a good subgraph is still an open problem. The goal of this dissertation is to address the following two questions: (1) What are good subgraph preconditioners for SLAM and SfM? (2) How to find them? To this end, I introduce support theory and support graph theory to evaluate and design subgraph preconditioners for SLAM and SfM. More specifically, I make the following contributions: First, I develop graphical and probabilistic interpretations of support theory and use them to visualize the quality of subgraph preconditioners. Second, I derive a novel support-theoretic metric for the quality of spanning tree preconditioners and design an MCMC-based algorithm to find high-quality subgraph preconditioners. I further improve the efficiency of finding good subgraph preconditioners by using heuristics and domain knowledge available in the problems. Our results show that the support-theoretic subgraph preconditioners significantly improve the efficiency of solving large SLAM problems. Third, I propose a novel Hessian factor graph representation, and use it to develop a new class of preconditioners, generalized subgraph preconditioners, that combine the advantages of subgraph preconditioners and Hessian-based preconditioners. I apply them to solve large SfM problems and obtain promising results. Fourth, I develop the incremental subgraph-preconditioned conjugate gradient method for large-scale online SLAM problems. The main idea is to combine the advantages of two state-of-the-art methods, incremental smoothing and mapping, and the subgraph-preconditioned conjugate gradient method. I also show that the new method is efficient, optimal and consistent. To sum up, preconditioning can significantly improve the efficiency of solving large-scale SLAM and SfM problems. While existing preconditioning techniques do not utilize the problem structure and have no performance guarantee, I take the first step toward a more general setting and obtain promising results.
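    The toy example below sketches the subgraph-preconditioning idea on an invented chain-plus-loop-closures system (not the thesis's code): conjugate gradient on the full matrix, preconditioned by a direct factorization of the spanning-tree part.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 200
rng = np.random.default_rng(0)

# "Tree" part: a chain, analogous to an odometry spanning tree.
A_tree = sp.diags([4.0 * np.ones(n), -np.ones(n - 1), -np.ones(n - 1)],
                  [0, -1, 1], format="csc")

# Full system: the chain plus random "loop-closure" couplings.
loops = sp.lil_matrix((n, n))
for i, j in rng.integers(0, n, size=(20, 2)):
    if i != j:
        loops[i, j] = loops[j, i] = -0.5
        loops[i, i] += 0.5
        loops[j, j] += 0.5
A_full = (A_tree + loops.tocsc()).tocsc()

# Subgraph preconditioner: factor only the tree part once, then use its
# solve as the preconditioner inside conjugate gradient.
tree_factor = spla.splu(A_tree)
M = spla.LinearOperator((n, n), matvec=tree_factor.solve)

b = rng.standard_normal(n)
x, info = spla.cg(A_full, b, M=M)
print("converged" if info == 0 else f"cg returned {info}")
```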
  • Item
    Constructing mobile manipulation behaviors using expert interfaces and autonomous robot learning
    (Georgia Institute of Technology, 2013-11-19) Nguyen, Hai Dai
    With current state-of-the-art approaches, development of a single mobile manipulation capability can be a labor-intensive process that presents an impediment to the creation of general-purpose household robots. At the same time, we expect that involving a larger community of non-roboticists can accelerate the creation of novel behaviors. We introduce the use of a software authoring environment called ROS Commander (ROSCo) that allows end-users to create, refine, and reuse robot behaviors with complexity similar to those currently created by roboticists. Akin to Photoshop, which provides end-users with interfaces to advanced computer vision algorithms, our environment provides interfaces to mobile manipulation algorithmic building blocks that can be combined and configured to suit the demands of new tasks and their variations. As our system can be more demanding of users than alternatives such as kinesthetic guidance or learning from demonstration, we performed a user study with 11 able-bodied participants and one person with quadriplegia to determine whether computer-literate non-roboticists would be able to learn to use our tool. In our study, all participants were able to successfully construct functional behaviors after being trained. Furthermore, participants were able to produce behaviors that demonstrated a variety of creative manipulation strategies, showing the power of enabling end-users to author robot behaviors. Additionally, we show how autonomous robot learning, where the robot captures its own training data, can complement human authoring of behaviors by freeing users from the repetitive task of capturing data for learning. By taking advantage of the robot's embodiment, our method creates classifiers that predict, from visual appearance, the 3D locations on home mechanisms where user-constructed behaviors will succeed. With active learning, we show that such classifiers can be learned using a small number of examples. We also show that this learning system works with behaviors constructed by non-roboticists in our user study. As far as we know, this is the first instance of perception learning with behaviors not hand-crafted by roboticists.
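    The sketch below shows generic pool-based active learning with uncertainty sampling (a stand-in for the thesis's learner; the data, oracle, and query budget are invented): the robot queries only the candidate points whose predicted success probability is most uncertain, so few labeled examples suffice.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X_pool = rng.uniform(-1, 1, size=(500, 3))                    # candidate 3-D points
y_pool = (X_pool[:, 0] + 0.5 * X_pool[:, 2] > 0).astype(int)  # hidden "oracle":
                                                              # would the behavior succeed?

# Start with a few randomly labeled points, then query actively.
labeled = list(rng.choice(len(X_pool), size=10, replace=False))
clf = LogisticRegression()
for _ in range(15):                                  # 15 active queries
    clf.fit(X_pool[labeled], y_pool[labeled])
    proba = clf.predict_proba(X_pool)[:, 1]
    uncertainty = np.abs(proba - 0.5)                # 0 = most uncertain
    uncertainty[labeled] = np.inf                    # never re-query
    labeled.append(int(np.argmin(uncertainty)))      # robot executes & labels it

print("accuracy:", clf.score(X_pool, y_pool))
```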
  • Item
    Trust and reputation for formation and evolution of multi-robot teams
    (Georgia Institute of Technology, 2013-11-15) Pippin, Charles Everett
    Agents in most types of societies use information about potential partners to determine whether to form mutually beneficial partnerships. When this information is used to decide to form a partnership, we can say that one agent trusts another; and when agents work together for mutual benefit in a partnership, we refer to this as a form of cooperation. Current multi-robot teams typically have the team's goals either explicitly or implicitly encoded into each robot's utility function and are expected to cooperate and perform as designed. However, there are many situations in which robots may not be interested in full cooperation, or may not be capable of performing as expected. In addition, the control strategy for robots may be fixed, with no mechanism for modifying the team structure if teammate performance deteriorates. This dissertation investigates the application of trust to multi-robot teams. This research also addresses the problem of how cooperation can be enabled through the use of incentive mechanisms. We posit a framework wherein robot teams may be formed dynamically, using models of trust. These models are used to improve performance on the team, through evolution of the team dynamics. In this context, robots learn online which of their peers are capable and trustworthy, and dynamically adjust their teaming strategy. We apply this framework to multi-robot task allocation and patrolling domains and show that performance is improved when this approach is used on teams that may have poorly performing or untrustworthy members. The contributions of this dissertation include algorithms for applying performance characteristics of individual robots to task allocation, methods for monitoring performance of robot team members, and a framework for modeling trust of robot team members. This work also includes experimental results gathered using simulations and on a team of indoor mobile robots to show that the use of a trust model can improve performance on multi-robot teams in the patrolling task.
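    For illustration, the sketch below implements one common trust model, a Beta-reputation update from observed task outcomes (the dissertation's exact formulation may differ; teammate names and priors are assumed): each robot tracks successes and failures per teammate and allocates tasks to the teammate with the highest expected trust.

```python
class TrustModel:
    """Beta-reputation trust: Beta(alpha, beta) per teammate, updated
    online from observed task outcomes."""

    def __init__(self):
        self.alpha = {}   # prior + observed successes, per teammate
        self.beta = {}    # prior + observed failures, per teammate

    def observe(self, teammate, success):
        self.alpha.setdefault(teammate, 1.0)   # uniform Beta(1, 1) prior
        self.beta.setdefault(teammate, 1.0)
        if success:
            self.alpha[teammate] += 1.0
        else:
            self.beta[teammate] += 1.0

    def trust(self, teammate):
        """Mean of the Beta distribution: expected success rate."""
        a = self.alpha.get(teammate, 1.0)
        b = self.beta.get(teammate, 1.0)
        return a / (a + b)

model = TrustModel()
for outcome in [True, True, False, True]:   # observed patrol outcomes
    model.observe("robot_2", outcome)
print(model.trust("robot_2"))               # (1+3)/(1+3+1+1) ≈ 0.67
```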
  • Item
    Learning descriptive models of objects and activities from egocentric video
    (Georgia Institute of Technology, 2013-06-13) Fathi, Alireza
    Recent advances in camera technology have made it possible to build a comfortable, wearable system that can capture the scene in front of the user throughout the day. Products based on this technology, such as GoPro and Google Glass, have generated substantial interest. In this thesis, I present my work on egocentric vision, which leverages wearable camera technology and provides a new line of attack on classical computer vision problems such as object categorization and activity recognition. The dominant paradigm for object and activity recognition over the last decade has been based on using the web. In this paradigm, in order to learn a model for an object category like a coffee jar, various images of that object type are fetched from the web (e.g. through Google image search), features are extracted and then classifiers are learned. This paradigm has led to great advances in the field and has produced state-of-the-art results for object recognition. However, it has two main shortcomings: a) objects on the web appear in isolation and miss the context of daily usage; and b) web data does not represent what we see every day. In this thesis, I demonstrate that egocentric vision can address these limitations as an alternative paradigm. I will demonstrate that contextual cues and the actions of a user can be exploited in an egocentric vision system to learn models of objects under very weak supervision. In addition, I will show that measurements of a subject's gaze during object manipulation tasks can provide novel feature representations to support activity recognition. Moving beyond surface-level categorization, I will showcase a method for automatically discovering object state changes during actions, and an approach to building descriptive models of social interactions between groups of individuals. These new capabilities for egocentric video analysis will enable new applications in life logging, elder care, human-robot interaction, developmental screening, augmented reality and social media.
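    A small illustrative sketch of the gaze-as-context idea (our own simplification; the frame size and gaze coordinates are invented): crop a patch around the measured gaze point so that downstream features describe the object the subject is attending to.

```python
import numpy as np

def gaze_patch(frame, gaze_xy, size=96):
    """Extract a size x size patch centered on the gaze point (in pixels),
    clamped so the crop stays inside the frame."""
    h, w = frame.shape[:2]
    x = int(np.clip(gaze_xy[0] - size // 2, 0, w - size))
    y = int(np.clip(gaze_xy[1] - size // 2, 0, h - size))
    return frame[y:y + size, x:x + size]

frame = np.zeros((480, 640, 3), dtype=np.uint8)   # stand-in video frame
patch = gaze_patch(frame, gaze_xy=(320, 200))
print(patch.shape)   # (96, 96, 3): fed to any feature extractor downstream
```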
  • Item
    Computational video: post-processing methods for stabilization, retargeting and segmentation
    (Georgia Institute of Technology, 2013-04-05) Grundmann, Matthias
    In this thesis, we address a variety of challenges in the analysis and enhancement of computational video. We present novel post-processing methods to bridge the gap between professionally produced videos and the casually shot videos most often seen on online sites. Our research presents solutions to three well-defined problems: (1) Video stabilization and rolling shutter removal in casually shot, uncalibrated videos; (2) Content-aware video retargeting; and (3) Spatio-temporal video segmentation to enable efficient video annotation. We showcase several real-world applications building on these techniques.

    We start by proposing a novel algorithm for video stabilization that generates stabilized videos by employing L1-optimal camera paths to remove undesirable motions. We compute camera paths that are optimally partitioned into constant, linear and parabolic segments, mimicking the camera motions employed by professional cinematographers. To achieve this, we propose a linear programming framework to minimize the first, second, and third derivatives of the resulting camera path; a simplified sketch of this formulation appears after this abstract. Our method allows for video stabilization beyond conventional filtering, which only suppresses high-frequency jitter. An additional challenge in videos shot from mobile phones is rolling shutter distortion. Modern CMOS cameras capture the frame one scanline at a time, which results in non-rigid image distortions such as shear and wobble. We propose a solution based on a novel mixture model of homographies, parametrized by scanline blocks, to correct these rolling shutter distortions. Our method does not rely on a priori knowledge of the readout time, nor does it require prior camera calibration. Our novel video stabilization and calibration-free rolling shutter removal have been deployed on YouTube, where they have successfully stabilized millions of videos. We also discuss several extensions to the stabilization algorithm and present technical details behind the widely used YouTube Video Stabilizer.

    We address the challenge of changing the aspect ratio of videos by proposing algorithms that retarget videos to fit the form factor of a given device without stretching or letter-boxing. Our approaches use all of the screen's pixels, while striving to deliver as much of the original video content as possible. First, we introduce a new algorithm that uses discontinuous seam-carving in both space and time for resizing videos. Our algorithm relies on a novel appearance-based temporal coherence formulation that allows for frame-by-frame processing and results in temporally discontinuous seams, as opposed to geometrically smooth and continuous seams. Second, we present a technique that builds on the above-mentioned video stabilization approach: we effectively automate classical pan-and-scan techniques by smoothly guiding a virtual crop window via saliency constraints.

    Finally, we introduce an efficient and scalable technique for spatio-temporal segmentation of long video sequences using a hierarchical graph-based algorithm. We begin by over-segmenting a volumetric video graph into space-time regions grouped by appearance. We then construct a "region graph" over the obtained segmentation and iteratively repeat this process over multiple levels to create a tree of spatio-temporal segmentations. This hierarchical approach generates high-quality segmentations and allows subsequent applications to choose from varying levels of granularity. We demonstrate the use of spatio-temporal segmentation as users interact with the video, enabling efficient annotation of objects within the video.
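    A simplified sketch of the L1-optimal camera path formulation referenced above, in one dimension (weights, bounds, and data are invented; this is not the production YouTube code): find a smooth path close to the shaky input path by minimizing the L1 norms of its first, second, and third derivatives.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(120)
c = 0.02 * t + rng.normal(scale=0.5, size=t.size)   # shaky input camera path

# L1 penalties on the derivatives encourage exactly constant, linear, and
# parabolic path segments, mimicking professional camera moves.
p = cp.Variable(t.size)
objective = cp.Minimize(10 * cp.norm1(cp.diff(p, 1)) +
                        1 * cp.norm1(cp.diff(p, 2)) +
                        100 * cp.norm1(cp.diff(p, 3)))
constraints = [cp.abs(p - c) <= 1.0]   # keep the virtual crop window near
                                       # the original path
problem = cp.Problem(objective, constraints)
problem.solve()

print(problem.status, float(cp.norm1(cp.diff(p, 3)).value))
```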