Rehg, James M.


Publication Search Results

  • Item
    The Middle Child Problem: Revisiting Parametric Min-cut and Seeds for Object Proposals
    (Georgia Institute of Technology, 2015-12) Humayun, Ahmad ; Li, Fuxin ; Rehg, James M.
    Object proposals have recently fueled the progress in detection performance. These proposals aim to provide category-agnostic localizations for all objects in an image. One way to generate proposals is to perform parametric min-cuts over seed locations. This paper demonstrates that standard parametric-cut models are ineffective at recovering medium-sized objects, which we refer to as the middle child problem. We propose a new energy minimization framework, incorporating geodesic distances between segments, that solves this problem. In addition, we introduce a new superpixel merging algorithm that generates a small set of seeds which reliably cover a large number of objects of all sizes. We call our method POISE - "Proposals for Objects from Improved Seeds and Energies." POISE enables parametric min-cuts to reach their full potential. On PASCAL VOC it generates ~2,640 segments with an average overlap of 0.81, whereas the closest competing methods require more than 4,200 proposals to reach the same accuracy. We show detailed quantitative comparisons against 5 state-of-the-art methods on the PASCAL VOC and Microsoft COCO segmentation challenges.
  • Item
    An In Depth View of Saliency
    (Georgia Institute of Technology, 2013-09) Ciptadi, Arridhana ; Hermans, Tucker ; Rehg, James M.
    Visual saliency is a computational process that identifies important locations and structure in the visual field. Most current methods for saliency rely on cues such as color and texture while ignoring depth information, which is known to be an important saliency cue in the human cognitive system. We propose a novel computational model of visual saliency which incorporates depth information. We compare our approach to several state-of-the-art visual saliency methods, and we introduce a method for saliency-based segmentation of generic objects. We demonstrate that by explicitly constructing 3D layout and shape features from depth measurements, we can obtain better performance than methods which treat the depth map as just another image channel. Our method requires no learning and can operate on scenes for which the system has no previous knowledge. We conduct object segmentation experiments on a new dataset of registered RGB-D images captured on a mobile-manipulator robot.
  • Item
    Learning Stable Pushing Locations
    (Georgia Institute of Technology, 2013-08) Hermans, Tucker ; Li, Fuxin ; Rehg, James M. ; Bobick, Aaron F.
    We present a method by which a robot learns to predict effective push locations as a function of object shape. The robot performs push experiments at many contact locations on multiple objects and records local and global shape features at each point of contact. The robot observes the outcome trajectories of the manipulations and computes a novel push-stability score for each trial. The robot then learns a regression function in order to predict push effectiveness as a function of object shape. This mapping allows the robot to select effective push locations for subsequent objects, whether they are previously manipulated instances, new instances from previously encountered object classes, or entirely novel objects. For entirely novel objects, local shape properties combined with the object's overall shape distribution allow the robot to discover effective push locations. These results are demonstrated on a mobile manipulator robot pushing a variety of household objects on a tabletop surface.
  • Item
    Categorizing Turn-Taking Interactions
    (Georgia Institute of Technology, 2012-10) Prabhakar, Karthir ; Rehg, James M.
    We address the problem of categorizing turn-taking interactions between individuals. Social interactions are characterized by turn-taking and arise frequently in real-world videos. Our approach is based on the use of temporal causal analysis to decompose a space-time visual word representation of video into co-occurring independent segments, called causal sets [1]. These causal sets then serve as the input to a multiple instance learning framework to categorize turn-taking interactions. We introduce a new turn-taking interactions dataset consisting of social games and sports rallies. We demonstrate that our formulation of multiple instance learning (QP-MISVM) is better able to leverage the repetitive structure in turn-taking interactions and demonstrates superior performance relative to a conventional bag-of-words model.
  • Item
    Automated Macular Pathology Diagnosis in Retinal OCT Images Using Multi-Scale Spatial Pyramid and Local Binary Patterns in Texture and Shape Encoding
    (Georgia Institute of Technology, 2011-10) Liu, Yu-Ying ; Chen, Mei ; Ishikawa, Hiroshi ; Wollstein, Gadi ; Schuman, Joel S. ; Rehg, James M.
    We address a novel problem domain in the analysis of optical coherence tomography (OCT) images: the diagnosis of multiple macular pathologies in retinal OCT images. The goal is to identify the presence of normal macula and each of three types of macular pathologies, namely, macular edema, macular hole, and age-related macular degeneration, in the OCT slice centered at the fovea. We use a machine learning approach based on global image descriptors formed from a multi-scale spatial pyramid. Our local features are dimension-reduced Local Binary Pattern histograms, which are capable of encoding texture and shape information in retinal OCT images and their edge maps, respectively. Our representation operates at multiple spatial scales and granularities, leading to robust performance. We use 2-class Support Vector Machine classifiers to identify the presence of normal macula and each of the three pathologies. To further discriminate sub-types within a pathology, we also build a classifier to differentiate full-thickness holes from pseudo-holes within the macular hole category. We conduct extensive experiments on a large dataset of 326 OCT scans from 136 subjects. The results show that the proposed method is very effective (all AUC > 0.93).
  • Item
    Learning to Recognize Objects in Egocentric Activities
    (Georgia Institute of Technology, 2011-06) Fathi, Alireza ; Ren, Xiaofeng ; Rehg, James M.
    This paper addresses the problem of learning object models from egocentric video of household activities, using extremely weak supervision. For each activity sequence, we know only the names of the objects which are present within it, and have no other knowledge regarding the appearance or location of objects. The key to our approach is a robust, unsupervised bottom-up segmentation method, which exploits the structure of the egocentric domain to partition each frame into hand, object, and background categories. By using Multiple Instance Learning to match object instances across sequences, we discover and localize object occurrences. Object representations are refined through transduction, and object-level classifiers are trained. We demonstrate encouraging results in detecting novel object instances using models produced by weakly supervised learning.
  • Item
    HDCCSR: software self-awareness using dynamic analysis and Markov models
    (Georgia Institute of Technology, 2008-12-20) Harrold, Mary Jean ; Rugaber, Spencer ; Rehg, James M.
  • Item
    On-line Learning of the Traversability of Unstructured Terrain for Outdoor Robot Navigation
    (Georgia Institute of Technology, 2006) Oh, Sang Min ; Rehg, James M. ; Dellaert, Frank
  • Item
    Data-Driven MCMC for Learning and Inference in Switching Linear Dynamic Systems
    (Georgia Institute of Technology, 2005-07) Oh, Sang Min ; Rehg, James M. ; Balch, Tucker ; Dellaert, Frank
    Switching Linear Dynamic System (SLDS) models are a popular technique for modeling complex nonlinear dynamic systems. An SLDS has significantly more descriptive power than an HMM, but inference in SLDS models is computationally intractable. This paper describes a novel inference algorithm for SLDS models based on the Data-Driven MCMC paradigm. We describe a new proposal distribution which substantially increases the convergence speed. Comparisons to standard deterministic approximation methods demonstrate the improved accuracy of our new approach. We apply our approach to the problem of learning an SLDS model of the bee dance. Honeybees communicate the location and distance to food sources through a dance that takes place within the hive. We learn SLDS model parameters from tracking data which is automatically extracted from video. We then demonstrate the ability to successfully segment novel bee dances into their constituent parts, effectively decoding the dance of the bees.
  • Item
    Segmental Switching Linear Dynamic Systems
    (Georgia Institute of Technology, 2005) Oh, Sang Min ; Rehg, James M. ; Dellaert, Frank
    We introduce Segmental Switching Linear Dynamic Systems (S-SLDS), which improve on standard SLDSs by explicitly incorporating duration modeling capabilities. We show that S-SLDSs can adopt arbitrary finite-sized duration models that describe data more accurately than the geometric distributions induced by standard SLDSs. We also show that we can convert an S-SLDS to an equivalent standard SLDS with sparse structure in the resulting transition matrix. This insight makes it possible to adapt existing inference and learning algorithms for standard SLDS models to the S-SLDS framework. As a consequence, the more powerful S-SLDS model can be adopted with only modest additional effort in most cases where an SLDS model can be applied. Experimental results on honeybee dance decoding tasks demonstrate the robust inference capabilities of the proposed S-SLDS model.
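The Segmental SLDS abstract contrasts the geometric duration distribution induced by a standard SLDS with arbitrary finite duration models, and says an S-SLDS can be converted into a standard SLDS with a sparse transition matrix. The sketch below illustrates one common construction of that kind of conversion for the discrete duration part only: each segment label is expanded into a left-to-right chain of sub-states whose exit probabilities (hazards) reproduce a given duration pmf. It is an illustration under assumptions, not the paper's implementation; the function names `geometric_duration`, `expand_duration`, and `implied_duration` are hypothetical, and the continuous LDS part (which would be shared across a label's sub-states) is omitted.

```python
import numpy as np


def geometric_duration(self_prob: float, max_d: int) -> np.ndarray:
    """Duration pmf P(d) = a**(d-1) * (1 - a), d = 1..max_d, implied by a single
    discrete state whose self-transition probability is a."""
    d = np.arange(1, max_d + 1)
    return self_prob ** (d - 1) * (1.0 - self_prob)


def expand_duration(p) -> np.ndarray:
    """Turn a finite-support duration pmf p[d-1] = P(duration = d) into a sparse
    left-to-right chain of len(p) sub-states plus one absorbing 'exit' state.
    Hypothetical helper for illustration only."""
    p = np.asarray(p, dtype=float)
    assert np.isclose(p.sum(), 1.0), "duration pmf must sum to 1"
    D = len(p)
    T = np.zeros((D + 1, D + 1))
    survival = 1.0  # P(duration >= k + 1) when entering sub-state k
    for k in range(D):
        # Hazard: probability the segment ends at step k + 1, given it lasted that long.
        h = min(1.0, p[k] / survival) if survival > 1e-12 else 1.0
        if k == D - 1:
            h = 1.0  # finite support: the last sub-state must exit
        T[k, D] = h
        if k < D - 1:
            T[k, k + 1] = 1.0 - h
        survival -= p[k]
    T[D, D] = 1.0
    return T


def implied_duration(T: np.ndarray) -> np.ndarray:
    """Recover the duration pmf produced by an expanded chain, starting in sub-state 0."""
    D = T.shape[0] - 1
    occupancy = np.zeros(D)
    occupancy[0] = 1.0
    pmf = np.zeros(D)
    for d in range(D):
        pmf[d] = occupancy @ T[:D, D]      # mass that exits after d + 1 steps
        occupancy = occupancy @ T[:D, :D]  # mass that stays in the segment
    return pmf


if __name__ == "__main__":
    p = [0.2, 0.5, 0.3]  # made-up duration model with its mode at 2 steps
    T = expand_duration(p)
    print(implied_duration(T))          # reproduces p up to rounding
    print(geometric_duration(0.5, 3))   # always decreasing in d
```

Each sub-state has at most two outgoing transitions, so the expanded matrix stays sparse. A single self-looping state can only produce the monotonically decreasing geometric shape, which is why a peaked duration model like the one above needs the expanded chain.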
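The retinal OCT abstract describes global descriptors built from Local Binary Pattern histograms pooled over a multi-scale spatial pyramid. The sketch below shows the basic version of that idea: 8-neighbour, radius-1 LBP codes, a 256-bin histogram per pyramid cell, and concatenation across levels (1x1, 2x2, 4x4 grids by default). It is a simplified illustration, not the paper's feature pipeline: the paper's dimension reduction of the histograms, its use of edge maps for shape, and the SVM stage are not reproduced, and the function names and the `levels` parameter are hypothetical.

```python
import numpy as np

# Neighbour offsets (dy, dx), clockwise from the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(img) -> np.ndarray:
    """Basic LBP: for each interior pixel, set bit i if neighbour i >= centre."""
    img = np.asarray(img, dtype=float)
    H, W = img.shape
    centre = img[1:H - 1, 1:W - 1]
    codes = np.zeros(centre.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(OFFSETS):
        neighbour = img[1 + dy:H - 1 + dy, 1 + dx:W - 1 + dx]
        codes |= (neighbour >= centre).astype(np.uint8) << bit
    return codes


def spatial_pyramid_lbp(img, levels: int = 3) -> np.ndarray:
    """Concatenate normalised 256-bin LBP histograms over 2**l x 2**l grids."""
    codes = lbp_codes(img)
    feats = []
    for level in range(levels):
        n = 2 ** level
        for band in np.array_split(codes, n, axis=0):
            for cell in np.array_split(band, n, axis=1):
                hist = np.bincount(cell.ravel(), minlength=256).astype(float)
                feats.append(hist / max(hist.sum(), 1.0))  # guard empty cells
    return np.concatenate(feats)
```

With the default three levels the descriptor has 256 x (1 + 4 + 16) = 5376 dimensions, which illustrates why some form of dimension reduction is attractive before training a classifier.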
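The POISE abstract reports an "average overlap of 0.81" for its segment proposals on PASCAL VOC. Assuming this refers to the average best overlap commonly used to evaluate segment proposals (for each ground-truth object, take the highest intersection-over-union with any proposal, then average over objects), the metric can be sketched as below. The exact evaluation protocol used in the paper is not stated in the abstract, so treat this as an assumption; `mask_iou` and `average_best_overlap` are hypothetical names.

```python
import numpy as np


def mask_iou(a, b) -> float:
    """Intersection-over-union of two binary masks of the same shape."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0


def average_best_overlap(gt_masks, proposals) -> float:
    """Mean over ground-truth objects of the best IoU achieved by any proposal."""
    best = [max((mask_iou(g, p) for p in proposals), default=0.0) for g in gt_masks]
    return float(np.mean(best))


if __name__ == "__main__":
    top = [[1, 1], [0, 0]]
    bottom = [[0, 0], [1, 1]]
    # One proposal matches `top` exactly; the other covers half of `bottom`.
    print(average_best_overlap([top, bottom], [top, [[0, 0], [1, 0]]]))  # 0.75
```

Under this metric, adding proposals can only raise or keep the score, which is why results of this kind are reported alongside the number of proposals needed to reach a given overlap.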