Person:
Rehg, James M.

Publication Search Results

  • Item
    The Middle Child Problem: Revisiting Parametric Min-cut and Seeds for Object Proposals
    (Georgia Institute of Technology, 2015-12) Humayun, Ahmad ; Li, Fuxin ; Rehg, James M.
    Object proposals have recently fueled the progress in detection performance. These proposals aim to provide category-agnostic localizations for all objects in an image. One way to generate proposals is to perform parametric min-cuts over seed locations. This paper demonstrates that standard parametric-cut models are ineffective in obtaining medium-sized objects, which we refer to as the middle child problem. We propose a new energy minimization framework incorporating geodesic distances between segments which solves this problem. In addition, we introduce a new superpixel merging algorithm which can generate a small set of seeds that reliably cover a large number of objects of all sizes. We call our method POISE - "Proposals for Objects from Improved Seeds and Energies." POISE enables parametric min-cuts to reach their full potential. On PASCAL VOC it generates ~2,640 segments with an average overlap of 0.81, whereas the closest competing methods require more than 4,200 proposals to reach the same accuracy. We show detailed quantitative comparisons against 5 state-of-the-art methods on PASCAL VOC and Microsoft COCO segmentation challenges.
  • Item
    Visualizing State-Based Hypertension Progression Models
    (Georgia Institute of Technology, 2014-11) Gupta, Amrita ; Liu, Yu-Ying ; Sun, Jimeng ; Rehg, James M.
    We present a novel interactive visualization scheme for state-based hypertension progression modeling using hidden Markov models, applied to electronic health records of a cohort population enrolled in a hypertension management program. The visualization tool provides an interface for exploratory analysis and model validation, and improves the interpretability of the model results for healthcare researchers. We demonstrate a preliminary application of the visualization to compare states visited and transitions taken by two different subgroups with distinctive hypertension trajectories.
  • Item
    BioGlass: Physiological Parameter Estimation Using a Head-mounted Wearable Device
    (Georgia Institute of Technology, 2014-11) Hernandez, Javier ; Li, Yin ; Rehg, James M. ; Picard, Rosalind W.
    This work explores the feasibility of using sensors embedded in Google Glass, a head-mounted wearable device, to measure physiological signals of the wearer. In particular, we develop new methods to use Glass’s accelerometer, gyroscope, and camera to extract pulse and respiratory rates of 12 participants during a controlled experiment. We show it is possible to achieve a mean absolute error of 0.83 beats per minute (STD: 2.02) for heart rate and 1.18 breaths per minute (STD: 2.04) for respiration rate when considering different combinations of sensors. These results included testing across sitting, supine, and standing still postures before and after physical exercise.
  • Item
    Inferring Object Properties from Incidental Contact with a Tactile-Sensing Forearm
    (Georgia Institute of Technology, 2014-09) Bhattacharjee, Tapomayukh ; Rehg, James M. ; Kemp, Charles C.
    Whole-arm tactile sensing enables a robot to sense properties of contact across its entire arm. By using this large sensing area, a robot has the potential to acquire useful information from incidental contact that occurs while performing a task. Within this paper, we demonstrate that data-driven methods can be used to infer mechanical properties of objects from incidental contact with a robot’s forearm. We collected data from a tactile-sensing forearm as it made contact with various objects during a simple reaching motion. We then used hidden Markov models (HMMs) to infer two object properties (rigid vs. soft and fixed vs. movable) based on low-dimensional features of time-varying tactile sensor data (maximum force, contact area, and contact motion). A key issue is the extent to which data-driven methods can generalize to robot actions that differ from those used during training. To investigate this issue, we developed an idealized mechanical model of a robot with a compliant joint making contact with an object. This model provides intuition for the classification problem. We also conducted tests in which we varied the robot arm’s velocity and joint stiffness. We found that, in contrast to our previous methods [1], multivariate HMMs achieved high cross-validation accuracy and successfully generalized what they had learned to new robot motions with distinct velocities and joint stiffnesses.
  • Item
    Learning to Reach into the Unknown: Selecting Initial Conditions When Reaching in Clutter
    (Georgia Institute of Technology, 2014-09) Park, Daehyung ; Kapusta, Ariel ; Kim, You Keun ; Rehg, James M. ; Kemp, Charles C.
    Often in highly-cluttered environments, a robot can observe the exterior of the environment with ease, but cannot directly view nor easily infer its detailed internal structure (e.g., dense foliage or a full refrigerator shelf). We present a data-driven approach that greatly improves a robot’s success at reaching to a goal location in the unknown interior of an environment based on observable external properties, such as the category of the clutter and the locations of openings into the clutter (i.e., apertures). We focus on the problem of selecting a good initial configuration for a manipulator when reaching with a greedy controller. We use density estimation to model the probability of a successful reach given an initial condition and then perform constrained optimization to find an initial condition with the highest estimated probability of success. We evaluate our approach with two simulated robots reaching in clutter, and provide a demonstration with a real PR2 robot reaching to locations through random apertures. In our evaluations, our approach significantly outperformed two alternative approaches when making two consecutive reach attempts to goals in distinct categories of unknown clutter. Notably, our approach only uses sparse readily-apparent features.
  • Item
    Joint Semantic Segmentation and 3D Reconstruction from Monocular Video
    (Georgia Institute of Technology, 2014-09) Kundu, Abhijit ; Li, Yin ; Dellaert, Frank ; Li, Fuxin ; Rehg, James M.
    We present an approach for joint inference of 3D scene structure and semantic labeling for monocular video. Starting with a monocular image stream, our framework produces a 3D volumetric semantic + occupancy map, which is much more useful than a series of 2D semantic label images or a sparse point cloud produced by traditional semantic segmentation and Structure from Motion (SfM) pipelines, respectively. We derive a Conditional Random Field (CRF) model defined in 3D space that jointly infers the semantic category and occupancy for each voxel. Such joint inference in the 3D CRF paves the way for more informed priors and constraints, which are otherwise not possible if the two problems are solved separately in their traditional frameworks. We make use of class-specific semantic cues that constrain the 3D structure in areas where multiview constraints are weak. Our model comprises higher-order factors, which help when the depth is unobservable. We also make use of class-specific semantic cues either to reduce the degree of such higher-order factors or to approximately model them with unaries if possible. We demonstrate improved 3D structure and temporally consistent semantic segmentation for difficult, large-scale, forward-moving monocular image sequences.
  • Item
    RIGOR: Reusing Inference in Graph Cuts for generating Object Regions
    (Georgia Institute of Technology, 2014-06) Humayun, Ahmad ; Li, Fuxin ; Rehg, James M.
    Popular figure-ground segmentation algorithms generate a pool of boundary-aligned segment proposals that can be used in subsequent object recognition engines. These algorithms can recover most image objects with high accuracy, but are usually computationally intensive since many graph cuts are computed with different enumerations of segment seeds. In this paper we propose an algorithm, RIGOR, for efficiently generating a pool of overlapping segment proposals in images. By precomputing a graph which can be used for parametric min-cuts over different seeds, we speed up the generation of the segment pool. In addition, we have made design choices that avoid extensive computations without losing performance. In particular, we demonstrate that the segmentation performance of our algorithm is slightly better than the state-of-the-art on the PASCAL VOC dataset, while being an order of magnitude faster.
  • Item
    The Secrets of Salient Object Segmentation
    (Georgia Institute of Technology, 2014-06) Li, Yin ; Hou, Xiaodi ; Koch, Christof ; Rehg, James M. ; Yuille, Alan L.
    In this paper we provide an extensive evaluation of fixation prediction and salient object segmentation algorithms as well as statistics of major datasets. Our analysis identifies serious design flaws of existing salient object benchmarks, called the dataset design bias, that arise from overemphasizing stereotypical concepts of saliency. The dataset design bias not only creates a discomforting disconnection between fixations and salient object segmentation, but also misleads algorithm design. Based on our analysis, we propose a new high-quality dataset that offers both fixation and salient object segmentation ground truth. With fixations and salient objects being presented simultaneously, we are able to bridge the gap between fixations and salient objects, and propose a novel method for salient object segmentation. Finally, we report significant benchmark progress on 3 existing datasets of segmenting salient objects.
  • Item
    Movement Pattern Histogram for Action Recognition and Retrieval
    (Georgia Institute of Technology, 2014) Ciptadi, Arridhana ; Goodwin, Matthew S. ; Rehg, James M.
    We present a novel action representation based on encoding the global temporal movement of an action. We represent an action as a set of movement pattern histograms that encode the global temporal dynamics of an action. Our key observation is that the temporal dynamics of an action are robust to variations in appearance and viewpoint, making them useful for action recognition and retrieval. We pose the problem of computing similarity between action representations as a maximum matching problem in a bipartite graph. We demonstrate the effectiveness of our method for cross-view action recognition on the IXMAS dataset. We also show how our representation complements existing bag-of-features representations on the UCF50 dataset. Finally, we show the power of our representation for action retrieval on a new real-world dataset containing repetitive motor movements exhibited by children with autism in an unconstrained classroom setting.
  • Item
    Video Segmentation by Tracking Many Figure-Ground Segments
    (Georgia Institute of Technology, 2013-12) Li, Fuxin ; Kim, Taeyoung ; Humayun, Ahmad ; Tsai, David ; Rehg, James M.
    We propose an unsupervised video segmentation approach by simultaneously tracking multiple holistic figure-ground segments. Segment tracks are initialized from a pool of segment proposals generated from a figure-ground segmentation algorithm. Then, online non-local appearance models are trained incrementally for each track using a multi-output regularized least squares formulation. By using the same set of training examples for all segment tracks, a computational trick allows us to track hundreds of segment tracks efficiently, as well as perform optimal online updates in closed form. In addition, a new composite statistical inference approach is proposed for refining the obtained segment tracks, which breaks down the initial segment proposals and recombines them into better ones by utilizing high-order statistic estimates from the appearance model and enforcing temporal consistency. For evaluating the algorithm, a dataset, SegTrack v2, is collected with about 1,000 frames with pixel-level annotations. The proposed framework outperforms state-of-the-art approaches on the dataset, showing its efficiency and robustness to challenges in different video sequences.