Rehg, James M.

Publication Search Results

  • Item
    C⁴ : A Real-time Object Detection Framework
    (Georgia Institute of Technology, 2013-10) Wu, Jianxin ; Liu, Nini ; Geyer, Christopher ; Rehg, James M.
    A real-time and accurate object detection framework, C⁴, is proposed in this paper. C⁴ achieves 20 fps speed and state-of-the-art detection accuracy, using only one processing thread without resorting to special hardware such as a GPU. Real-time accurate object detection is made possible by two contributions. First, we conjecture (with supporting experiments) that contour is what we should capture and that signs of comparisons among neighboring pixels are the key information for capturing contour cues. Second, we show that the CENTRIST visual descriptor is suitable for contour-based object detection, because it encodes the sign information and can implicitly represent the global contour. When CENTRIST and a linear classifier are used, we propose a computational method that does not need to explicitly generate feature vectors. It involves no image preprocessing or feature vector normalization, and only requires O(1) steps to test an image patch. C⁴ is also friendly to further hardware acceleration. It has been applied to detect objects such as pedestrians, faces, and cars on benchmark datasets. Its detection accuracy is comparable to that of state-of-the-art methods, and it has a clear advantage in detection speed.
  • Item
    Learning to Recognize Daily Actions using Gaze
    (Georgia Institute of Technology, 2012-10) Fathi, Alireza ; Li, Yin ; Rehg, James M.
    We present a probabilistic generative model for simultaneously recognizing daily actions and predicting gaze locations in videos recorded from an egocentric camera. We focus on activities requiring eye-hand coordination and model the spatio-temporal relationship between the gaze point, the scene objects, and the action label. Our model captures the fact that the distribution of both visual features and object occurrences in the vicinity of the gaze point is correlated with the verb-object pair describing the action. It explicitly incorporates known properties of gaze behavior from the psychology literature, such as the temporal delay between fixation and manipulation events. We present an inference method that can predict the best sequence of gaze locations and the associated action label from an input sequence of images. We demonstrate improvements in action recognition rates and gaze prediction accuracy relative to state-of-the-art methods, on two new datasets that contain egocentric videos of daily activities and gaze.
  • Item
    Computerized Macular Pathology Diagnosis in Spectral Domain Optical Coherence Tomography Scans Based on Multiscale Texture and Shape Features
    (Georgia Institute of Technology, 2011-10) Liu, Yu-Ying ; Ishikawa, Hiroshi ; Chen, Mei ; Wollstein, Gadi ; Duker, Jay S. ; Fujimoto, James G. ; Schuman, Joel S. ; Rehg, James M.
    To develop an automated method to identify the normal macula and three macular pathologies (macular hole [MH], macular edema [ME], and age-related macular degeneration [AMD]) from the fovea-centered cross sections in three-dimensional (3D) spectral-domain optical coherence tomography (SD-OCT) images.
  • Item
    Automated Macular Pathology Diagnosis in Retinal OCT Images Using Multi-Scale Spatial Pyramid and Local Binary Patterns in Texture and Shape Encoding
    (Georgia Institute of Technology, 2011-10) Liu, Yu-Ying ; Chen, Mei ; Ishikawa, Hiroshi ; Wollstein, Gadi ; Schuman, Joel S. ; Rehg, James M.
    We address a novel problem domain in the analysis of optical coherence tomography (OCT) images: the diagnosis of multiple macular pathologies in retinal OCT images. The goal is to identify the presence of normal macula and each of three types of macular pathologies, namely, macular edema, macular hole, and age-related macular degeneration, in the OCT slice centered at the fovea. We use a machine learning approach based on global image descriptors formed from a multi-scale spatial pyramid. Our local features are dimension-reduced Local Binary Pattern histograms, which are capable of encoding texture and shape information in retinal OCT images and their edge maps, respectively. Our representation operates at multiple spatial scales and granularities, leading to robust performance. We use 2-class Support Vector Machine classifiers to identify the presence of normal macula and each of the three pathologies. To further discriminate sub-types within a pathology, we also build a classifier to differentiate full-thickness holes from pseudo-holes within the macular hole category. We conduct extensive experiments on a large dataset of 326 OCT scans from 136 subjects. The results show that the proposed method is very effective (all AUC > 0.93).
  • Item
    CENTRIST: A Visual Descriptor for Scene Categorization
    (Georgia Institute of Technology, 2011-08) Wu, Jianxin ; Rehg, James M.
    CENTRIST (CENsus TRansform hISTogram), a new visual descriptor for recognizing topological places or scene categories, is introduced in this paper. We show that place and scene recognition, especially for indoor environments, requires a visual descriptor with properties that differ from those needed in other vision domains (e.g. object recognition). CENTRIST satisfies these properties and suits the place and scene recognition task. It is a holistic representation and has strong generalizability for category recognition. CENTRIST mainly encodes the structural properties within an image and suppresses detailed textural information. Our experiments demonstrate that CENTRIST outperforms the current state of the art on several place and scene recognition datasets, compared with other descriptors such as SIFT and Gist. In addition, it is easy to implement and extremely fast to evaluate.
  • Item
    Visual Place Categorization: Problem, Dataset, and Algorithm
    (Georgia Institute of Technology, 2009-10) Wu, Jianxin ; Rehg, James M. ; Christensen, Henrik I.
    In this paper we describe the problem of Visual Place Categorization (VPC) for mobile robotics, which involves predicting the semantic category of a place from image measurements acquired from an autonomous platform. For example, a robot in an unfamiliar home environment should be able to recognize the functionality of the rooms it visits, such as kitchen, living room, etc. We describe an approach to VPC based on sequential processing of images acquired with a conventional video camera. We identify two key challenges: dealing with non-characteristic views and integrating restricted-FOV imagery into a holistic prediction. We present a solution to VPC based upon a recently developed visual feature known as CENTRIST (CENsus TRansform hISTogram). We describe a new dataset for VPC which we have recently collected and are making publicly available. We believe this is the first significant, realistic dataset for the VPC problem. It contains the interiors of six different homes with ground truth labels. We use this dataset to validate our solution approach, achieving promising results.
  • Item
    Learning and Inferring Motion Patterns Using Parametric Segmental Switching Linear Dynamic Systems
    (Georgia Institute of Technology, 2008) Oh, Sang Min ; Rehg, James M. ; Balch, Tucker ; Dellaert, Frank
    Switching Linear Dynamic System (SLDS) models are a popular technique for modeling complex nonlinear dynamic systems. An SLDS provides the possibility to describe complex temporal patterns more concisely and accurately than an HMM by using continuous hidden states. However, the use of SLDS models in practical applications is challenging for several reasons. First, exact inference in SLDS models is computationally intractable. Second, the geometric duration model induced in standard SLDSs limits their representational power. Third, standard SLDSs do not provide a systematic way to robustly interpret systematic variations governed by higher order parameters. The contributions in this paper address all three challenges above. First, we present a data-driven MCMC sampling method for SLDSs as a robust and efficient approximate inference method. Second, we present segmental switching linear dynamic systems (S-SLDS), where the geometric distributions are replaced with arbitrary duration models. Third, we extend the standard model with a parametric model that can capture systematic temporal and spatial variations. The resulting parametric SLDS model (P-SLDS) uses EM to robustly interpret parametrized motions by incorporating additional global parameters that underlie systematic variations of the overall motion. The overall development of the proposed inference methods and extensions for SLDSs provides a robust framework to interpret complex motions. The framework is applied to the honey bee dance interpretation task in the context of the ongoing BioTracking project at Georgia Institute of Technology. The experimental results suggest that the enhanced models provide an effective framework for a wide range of motion analysis applications.
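
Several abstracts in this listing build on CENTRIST, a histogram of census transform values that records the signs of comparisons between each pixel and its neighbors. The following is a minimal illustrative sketch of that idea in plain Python, not the authors' implementation. It assumes a grayscale image given as a list of rows, a 3x3 neighborhood, a bit set to 1 when the center pixel is at least as large as the neighbor, and a left-to-right, top-to-bottom bit ordering. The function names `census_transform` and `centrist_histogram` are hypothetical.

```python
def census_transform(image):
    """Return an 8-bit census code for every interior pixel of a grayscale image.

    `image` is a list of equal-length rows of numbers. Border pixels are
    skipped because they lack a full 3x3 neighborhood.
    """
    rows, cols = len(image), len(image[0])
    # Neighbor offsets scanned left to right, top to bottom.
    # This bit ordering is an assumption made for illustration.
    offsets = [(-1, -1), (-1, 0), (-1, 1),
               (0, -1),           (0, 1),
               (1, -1),  (1, 0),  (1, 1)]
    codes = []
    for r in range(1, rows - 1):
        row_codes = []
        for c in range(1, cols - 1):
            code = 0
            for dr, dc in offsets:
                # Assumed convention: bit is 1 when center >= neighbor.
                bit = 1 if image[r][c] >= image[r + dr][c + dc] else 0
                code = (code << 1) | bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes


def centrist_histogram(image):
    """Return a 256-bin histogram of census codes over the interior pixels."""
    hist = [0] * 256
    for row_codes in census_transform(image):
        for code in row_codes:
            hist[code] += 1
    return hist


if __name__ == "__main__":
    ramp = [[1, 2, 3],
            [4, 5, 6],
            [7, 8, 9]]
    print(census_transform(ramp))
```

Because each code depends only on comparison signs, adding a constant to every pixel leaves the histogram unchanged, which is one reason such a descriptor can skip intensity normalization.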