Organizational Unit: Institute for Robotics and Intelligent Machines (IRIM)

Publication Search Results

  • Item
    Representing and learning affordance-based behaviors
    (Georgia Institute of Technology, 2014-03-27) Hermans, Tucker Ryer
    Autonomous robots deployed in complex, natural human environments such as homes and offices need to manipulate numerous objects throughout their deployment. For an autonomous robot to operate effectively in such a setting and not require excessive training from a human operator, it should be capable of discovering how to reliably manipulate novel objects it encounters. We characterize the possible methods by which a robot can act on an object using the concept of affordances. We define affordance-based behaviors as object manipulation strategies available to a robot, which correspond to specific semantic actions over which a task-level planner or end user of the robot can operate. This thesis concerns itself with developing the representation of these affordance-based behaviors along with associated learning algorithms. We identify three specific learning problems. The first asks which affordance-based behaviors a robot can successfully apply to a given object, including ones seen for the first time. Second, we examine how a robot can learn to best apply a specific behavior as a function of an object’s shape. Third, we investigate how learned affordance knowledge can be transferred between different objects and different behaviors. We claim that decomposing affordance-based behaviors into three separate factors (a control policy, a perceptual proxy, and a behavior primitive) aids an autonomous robot in learning to manipulate. Having a varied set of affordance-based behaviors available allows a robot to learn which behaviors perform most effectively as a function of an object’s identity or pose in the workspace. For a specific behavior, a robot can use interactions with previously encountered objects to learn to robustly manipulate a novel object when first encountered. Finally, our factored representation allows a robot to transfer knowledge learned with one behavior to effectively manipulate an object in a qualitatively different manner by using a distinct controller or behavior primitive. We evaluate all work on a bimanual, mobile-manipulator robot. In all experiments the robot interacts with real-world objects sensed by an RGB-D camera.
    (An illustrative sketch of this factored representation appears after the list of results.)
  • Item
    Learning descriptive models of objects and activities from egocentric video
    (Georgia Institute of Technology, 2013-06-13) Fathi, Alireza
    Recent advances in camera technology have made it possible to build a comfortable, wearable system which can capture the scene in front of the user throughout the day. Products based on this technology, such as GoPro and Google Glass, have generated substantial interest. In this thesis, I present my work on egocentric vision, which leverages wearable camera technology and provides a new line of attack on classical computer vision problems such as object categorization and activity recognition. The dominant paradigm for object and activity recognition over the last decade has been based on using the web. In this paradigm, in order to learn a model for an object category like coffee jar, various images of that object type are fetched from the web (e.g. through Google image search), features are extracted and then classifiers are learned. This paradigm has led to great advances in the field and has produced state-of-the-art results for object recognition. However, it has two main shortcomings: a) objects on the web appear in isolation and they miss the context of daily usage; and b) web data does not represent what we see every day. In this thesis, I demonstrate that egocentric vision can address these limitations as an alternative paradigm. I will demonstrate that contextual cues and the actions of a user can be exploited in an egocentric vision system to learn models of objects under very weak supervision. In addition, I will show that measurements of a subject's gaze during object manipulation tasks can provide novel feature representations to support activity recognition. Moving beyond surface-level categorization, I will showcase a method for automatically discovering object state changes during actions, and an approach to building descriptive models of social interactions between groups of individuals. These new capabilities for egocentric video analysis will enable new applications in life logging, elder care, human-robot interaction, developmental screening, augmented reality and social media.
  • Item
    Visual place categorization
    (Georgia Institute of Technology, 2009-07-06) Wu, Jianxin
    Knowing the semantic category of a robot's current position not only facilitates the robot's navigation, but also greatly improves its ability to serve human needs and to interpret the scene. This dissertation addresses Visual Place Categorization (VPC), the problem of predicting the semantic category of a place from visual information collected on an autonomous robot platform. The Census Transform (CT) histogram and Histogram Intersection Kernel (HIK) based visual codebooks are proposed to represent an image. The CT histogram encodes the stable spatial structure of an image that reflects the functionality of a location. It is suitable for categorizing places and has shown better performance than commonly used descriptors such as SIFT or Gist on the VPC task. HIK has been shown to work better than the Euclidean distance for classifying histograms; we extend it in an unsupervised manner to generate visual codebooks for the CT histogram descriptor. HIK codebooks help the CT histogram descriptor cope with the large visual variations in VPC and improve system accuracy. An efficient computational method for generating HIK codebooks is also proposed. The first significant VPC dataset in home environments is collected and made publicly available; it is used to evaluate a VPC system based on the proposed techniques. The system achieves promising results on this challenging problem, especially for important categories such as bedroom, bathroom, and kitchen, and the proposed techniques achieve higher accuracies than competing descriptors and visual codebook generation methods.
    (A sketch of the Census Transform histogram and the histogram intersection kernel appears after the list of results.)
  • Item
    Object categorization for affordance prediction
    (Georgia Institute of Technology, 2008-07-01) Sun, Jie
    A fundamental requirement of any autonomous robot system is the ability to predict the affordances of its environment, which define how the robot can interact with various objects. In this dissertation, we demonstrate that the conventional direct perception approach can indeed be applied to the task of training robots to predict affordances, but it does not consider that objects can be grouped into categories such that objects of the same category have similar affordances. Although the connection between object categorization and the ability to make predictions of attributes has been extensively studied in cognitive science research, it has not been systematically applied to robotics in learning to predict a number of affordances from recognizing object categories. We develop a computational framework of learning and predicting affordances where a robot explicitly learns the categories of objects present in its environment in a partially supervised manner, and then conducts experiments to interact with the objects to both refine its model of categories and the category-affordance relationships. In comparison to the direct perception approach, we demonstrate that categories make the affordance learning problem scalable, in that they make more effective use of scarce training data and support efficient incremental learning of new affordance concepts. Another key aspect of our approach is to leverage the ability of a robot to perform experiments on its environment and thus gather information independent of a human trainer. We develop the theoretical underpinnings of category-based affordance learning and validate our theory on experiments with physically-situated robots. Finally, we refocus the object categorization problem of computer vision back to the theme of autonomous agents interacting with a physical world consisting of categories of objects. This enables us to reinterpret and extend the Gluck-Corter category utility function for the task of learning categorizations for affordance prediction.