GVU Technical Report Series

Permanent Link

https://hdl.handle.net/1853/71001

Series Type

Publication Series

Associated Organization(s)

Organizational Unit

GVU Center

Publication Search Results

CENTRIST: A Visual Descriptor for Scene Categorization

(Georgia Institute of Technology, 2009-07-23) Wu, Jianxin ; Rehg, James M.

CENTRIST (CENsus TRansform hISTogram), a new visual descriptor for recognizing topological places or scene categories, is introduced in this paper. We show that place and scene recognition, especially in indoor environments, requires a visual descriptor with properties different from those needed in other vision domains (e.g., object recognition). CENTRIST satisfies these properties and is therefore well suited to the place and scene recognition task. It is a holistic representation and generalizes well for category recognition. CENTRIST mainly encodes the structural properties within an image and suppresses detailed textural information. Our experiments demonstrate that CENTRIST outperforms the current state of the art on several place and scene recognition datasets, as well as other descriptors such as SIFT and Gist. In addition, it is easy to implement, has almost no parameters to tune, and is extremely fast to evaluate.
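The core of the descriptor named above, a census transform followed by a histogram, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the bit convention (1 when the centre pixel is greater than or equal to a neighbour, neighbours in row-major order from the top-left) and the function names `census_transform` and `centrist` are choices made for this sketch, and any further refinements the full descriptor may use are omitted.

```python
import numpy as np

def census_transform(img: np.ndarray) -> np.ndarray:
    """8-bit census transform of the interior pixels of a 2-D grayscale image.

    Each bit is 1 when the centre pixel is >= the corresponding neighbour.
    Neighbours are visited in row-major order from the top-left, so the
    top-left comparison ends up as the most significant bit.
    """
    img = np.asarray(img, dtype=np.int32)
    h, w = img.shape
    center = img[1:h - 1, 1:w - 1]
    ct = np.zeros_like(center)
    offsets = [(-1, -1), (-1, 0), (-1, 1),
               (0, -1),           (0, 1),
               (1, -1),  (1, 0),  (1, 1)]
    for dy, dx in offsets:
        # Shifted view holding each interior pixel's neighbour at (dy, dx)
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        ct = (ct << 1) | (center >= neighbour).astype(np.int32)
    return ct

def centrist(img: np.ndarray) -> np.ndarray:
    """Normalised 256-bin histogram of census-transform values (hypothetical helper)."""
    ct = census_transform(img)
    hist = np.bincount(ct.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()
```

Because each census value only records local intensity orderings, the histogram captures coarse structure (edges, corners, flat regions) rather than fine texture, which matches the property the abstract emphasizes.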
Shadow Elimination and Blinding Light Suppression for Interactive Projected Displays

(Georgia Institute of Technology, 2006) Summet, Jay W. ; Flagg, Matthew ; Cham, Tat-Jen ; Rehg, James M. ; Sukthankar, Rahul

A major problem with interactive displays based on front projection is that users cast undesirable shadows on the display surface. This situation is only partially addressed by mounting a single projector at an extreme angle and warping the projected image to undo keystoning distortions. This paper demonstrates that shadows can be muted by redundantly illuminating the display surface using multiple projectors, all mounted at different locations. However, this technique alone does not eliminate shadows: multiple projectors create multiple dark regions on the surface (penumbral occlusions) and cast undesirable light onto the users. These problems can be solved by actively modifying the projected output to eliminate shadows and to suppress the light that falls on occluding users. This paper categorizes various methods that can be used to achieve redundant illumination, shadow elimination, and blinding light suppression, and evaluates their performance.
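The combined idea of redundant illumination, shadow elimination, and blinding light suppression can be illustrated with a deliberately simplified per-pixel rule. This sketch assumes the projectors are already warped into a common display frame, contribute light additively and identically, never saturate, and that an occlusion mask per projector is somehow available; the function `compensate` is hypothetical and is not one of the methods the report evaluates.

```python
from typing import List

def compensate(target: List[float], occluded: List[List[bool]]) -> List[List[float]]:
    """Per-projector output for each display pixel (simplified additive model).

    target[p]      desired intensity at display pixel p
    occluded[k][p] True when a user blocks projector k's ray to pixel p
    Returns out[k][p], the intensity projector k should emit toward pixel p.
    """
    num_proj = len(occluded)
    out = [[0.0] * len(target) for _ in range(num_proj)]
    for p, t in enumerate(target):
        visible = [k for k in range(num_proj) if not occluded[k][p]]
        if not visible:
            continue  # every projector is blocked here: the shadow cannot be filled
        share = t / len(visible)  # unblocked projectors make up for blocked ones
        for k in visible:
            out[k][p] = share
        # Blocked projectors keep 0.0, so no light is thrown onto the occluder
    return out
```

Setting occluded projectors to zero is what distinguishes blinding light suppression from simple redundant illumination, where every projector would keep emitting its share regardless of occlusion.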
Learning and Inferring Motion Patterns using Parametric Segmental Switching Linear Dynamic Systems

(Georgia Institute of Technology, 2006) Oh, Sang Min ; Rehg, James M. ; Balch, Tucker ; Dellaert, Frank

Switching Linear Dynamic System (SLDS) models are a popular technique for modeling complex nonlinear dynamic systems. Because it uses continuous hidden states, an SLDS has significantly more descriptive power than an HMM. However, the use of SLDS models in practical applications is challenging for several reasons. First, exact inference in SLDS models is computationally intractable. Second, the geometric duration model induced in standard SLDSs limits their representational power. Third, standard SLDSs provide no principled way to robustly interpret systematic variations governed by higher-order parameters. The contributions in this paper address all three challenges. First, we present a data-driven MCMC sampling method for SLDSs as a robust and efficient approximate inference method. Second, we present segmental switching linear dynamic systems (S-SLDS), in which the geometric distributions are replaced with arbitrary duration models. Third, we extend the standard model with a parametric model that can capture systematic temporal and spatial variations. The resulting parametric SLDS model (P-SLDS) uses EM to robustly interpret parametrized motions by incorporating additional global parameters that underlie systematic variations of the overall motion. Together, the proposed inference methods and extensions provide a robust framework for interpreting complex motions. The framework is applied to the honey bee dance interpretation task in the context of the ongoing BioTracking project at Georgia Institute of Technology. The experimental results suggest that the enhanced models provide an effective framework for a wide range of motion analysis applications.
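The difference between a standard SLDS and the segmental variant is easiest to see in the generative process. The sketch below samples a one-dimensional segmental SLDS in which each mode lasts for an explicitly drawn duration, rather than an implicit geometric duration arising from a self-transition probability. The scalar state, the mode names, and the function `sample_segmental_slds` are assumptions made for illustration; the report's models (including inference and the parametric P-SLDS extension) are not reproduced here.

```python
import random
from typing import Dict, List, Tuple

def sample_segmental_slds(
    trans: Dict[str, Dict[str, float]],      # mode -> {next mode: probability}
    durations: Dict[str, Dict[int, float]],  # mode -> {duration in steps: probability}
    dyn: Dict[str, Tuple[float, float]],     # mode -> (a, process-noise std)
    obs_std: float,
    first_mode: str,
    length: int,
    seed: int = 0,
) -> Tuple[List[str], List[float]]:
    """Sample a 1-D segmental SLDS: each mode persists for an explicitly drawn
    duration instead of an implicit geometric one."""
    rng = random.Random(seed)
    modes: List[str] = []
    obs: List[float] = []
    mode, x = first_mode, 0.0
    while len(modes) < length:
        pmf = durations[mode]
        steps = rng.choices(list(pmf), weights=list(pmf.values()))[0]
        a, q = dyn[mode]
        for _ in range(min(steps, length - len(modes))):
            x = a * x + rng.gauss(0.0, q)            # mode-specific linear dynamics
            modes.append(mode)
            obs.append(x + rng.gauss(0.0, obs_std))  # noisy observation of x
        nxt = trans[mode]                            # switch only at segment boundaries
        mode = rng.choices(list(nxt), weights=list(nxt.values()))[0]
    return modes, obs
```

For example, with hypothetical modes `"turn"` and `"waggle"` that always alternate and have fixed durations of 3 and 2 steps, the sampled mode sequence is deterministic even though the continuous states are random.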
Learning for Ground Robot Navigation with Autonomous Data Collection

(Georgia Institute of Technology, 2005) Su, Jie ; Rehg, James M. ; Bobick, Aaron F.

Robot navigation using vision is a classic example of a scene understanding problem. We describe a novel approach to estimating the traversability of an unknown environment based on modern object recognition methods. Traversability, whose meaning is clear in context, is an example of an affordance jointly determined by the environment and the physical characteristics of a robot vehicle. However, it is extremely difficult to estimate the traversability of a given terrain structure in general, or to find rules that work for a wide variety of terrain types. Nevertheless, by learning to recognize similar terrain structures, it is possible to turn a limited amount of interaction between the robot and its environment into global statements about the traversability of the scene. We describe a novel online learning algorithm that learns to recognize terrain features from images and aggregates the traversability information acquired by a navigating robot. An important property of our method, which is desirable for any learning-based approach to object recognition, is the ability to autonomously acquire arbitrary amounts of training data as needed without any human intervention. Tests of our algorithm on a real robot in complicated, unknown natural environments suggest that it is both robust and efficient.
On the Design of Cascades of Boosted Ensembles for Face Detection

(Georgia Institute of Technology, 2005) Brubaker, S. Charles ; Wu, Jianxin ; Sun, Jie ; Mullin, Matthew D. ; Rehg, James M.

Cascades of boosted ensembles have become popular in the object detection community following their highly successful introduction in the face detector of Viola and Jones. Since then, researchers have sought to improve upon the original approach by incorporating new methods along a variety of axes (e.g., alternative boosting methods and feature sets). We explore several axes that have not yet received adequate attention in this context: cascade learning, stronger weak hypotheses, and feature filtering. We present a novel strategy to determine the appropriate balance between false-positive and detection rates in the individual stages of the cascade, enabling us to control our experiments to a degree not previously possible. We show that while the choice of boosting method has little impact on the detector's performance and feature filtering is largely ineffective, the use of stronger weak hypotheses based on CART classifiers can significantly improve upon the standard results.
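Two mechanics underlie the per-stage trade-off discussed above: a window counts as a face only if every stage accepts it, and, if stages are treated as independent, the overall detection and false-positive rates are the products of the per-stage rates. The sketch below shows early-rejection evaluation and the simplest way to split an overall goal equally across stages. This equal split is only the textbook baseline, not the report's novel balancing strategy, and the names `cascade_accepts` and `uniform_stage_targets` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# A stage is (boosted ensemble score function, acceptance threshold)
Stage = Tuple[Callable[[Sequence[float]], float], float]

def cascade_accepts(stages: List[Stage], window: Sequence[float]) -> bool:
    """A window is labelled a face only if every stage's score clears its threshold."""
    for score, threshold in stages:
        if score(window) < threshold:
            return False  # early rejection: later stages are never evaluated
    return True

def uniform_stage_targets(overall_det: float, overall_fp: float,
                          n_stages: int) -> Tuple[float, float]:
    """Per-stage (detection, false-positive) rates whose n-th powers equal the
    overall goals, assuming independent stages that share the budget equally."""
    return overall_det ** (1.0 / n_stages), overall_fp ** (1.0 / n_stages)
```

With 10 stages, an overall goal of 0.9 detection and 1e-6 false positives requires each stage to detect about 98.95% of faces while passing about 25% of non-faces, which is why individual stages can be weak but must have very high detection rates.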
A Variational inference method for Switching Linear Dynamic Systems

(Georgia Institute of Technology, 2005) Oh, Sang Min ; Ranganathan, Ananth ; Rehg, James M. ; Dellaert, Frank

This paper presents the structured variational inference algorithm for switching linear dynamical systems (SLDSs) that was originally introduced by Pavlovic and Rehg. Starting from the motivation for the variational approach, we derive the generic (model-independent) variational update formulas obtained under the mean-field assumption. These formulas lead to an approximate variational inference algorithm for an SLDS, and we present the derivation of the SLDS-specific variational update equations in detail.
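For orientation, the generic (model-independent) mean-field results mentioned above take the following standard form. Here $Y$ denotes the observations, $Z$ the hidden variables, and $q$ the approximating distribution; for an SLDS, the structured choice shown in the last two lines assumes $q$ factorizes into a factor $q_S$ over the discrete switching states $S$ and a factor $q_X$ over the continuous states $X$. The notation is an assumption of this summary and may differ from the report's.

```latex
\begin{aligned}
\log p(Y) &\ge \mathcal{L}(q) = \mathbb{E}_{q}\left[\log p(Y, Z)\right] - \mathbb{E}_{q}\left[\log q(Z)\right], \\
\log q_i^{*}(Z_i) &= \mathbb{E}_{\prod_{j \neq i} q_j}\left[\log p(Y, Z)\right] + \text{const}, \\
\log q_S^{*}(S) &= \mathbb{E}_{q_X}\left[\log p(Y, X, S)\right] + \text{const}, \\
\log q_X^{*}(X) &= \mathbb{E}_{q_S}\left[\log p(Y, X, S)\right] + \text{const}.
\end{aligned}
```

Alternating the last two updates increases the bound $\mathcal{L}(q)$ at every step; each update is a tractable problem on its own (a discrete chain for $q_S$, a linear-Gaussian chain for $q_X$), which is what makes the structured approximation attractive when exact SLDS inference is intractable.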
Fast Asymmetric Learning for Cascade Face Detection

(Georgia Institute of Technology, 2005) Wu, Jianxin ; Brubaker, S. Charles ; Mullin, Matthew D. ; Rehg, James M.

A cascade face detector uses a sequence of node classifiers to distinguish faces from non-faces. This paper presents a new approach to designing the node classifiers in a cascade detector. Previous methods used machine learning algorithms that simultaneously select features and form ensemble classifiers. We argue that if these two parts are decoupled, we have the freedom to design a classifier that explicitly addresses the difficulties caused by the asymmetric learning goal. This paper makes three contributions. The first is a categorization of the asymmetries in the learning goal and an explanation of why they make face detection hard. The second is the Forward Feature Selection (FFS) algorithm and a fast caching strategy for AdaBoost. Compared with a naive implementation of the AdaBoost feature selection method, FFS and the fast AdaBoost reduce training time by factors of approximately 100 and 50, respectively. The last contribution is the Linear Asymmetric Classifier (LAC), a classifier that explicitly handles the asymmetric learning goal as a well-defined constrained optimization problem. We demonstrate experimentally that LAC results in improved ensemble classifier performance.
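The abstract does not spell out the caching strategy, so the following is only a common way of making AdaBoost's weak-learner search fast, and it may or may not match the report. Because only the example weights change between boosting rounds, the sorted order of every feature's values can be computed once and reused; each round then finds the best decision stump for a feature with a single linear scan over running weight sums. The names `presort` and `best_stump` are hypothetical.

```python
from typing import List, Sequence, Tuple

def presort(features: Sequence[Sequence[float]]) -> List[List[int]]:
    """Cache, once per training run, the example order for every feature."""
    return [sorted(range(len(col)), key=col.__getitem__) for col in features]

def best_stump(features: Sequence[Sequence[float]], labels: Sequence[int],
               weights: Sequence[float],
               order: List[List[int]]) -> Tuple[float, int, float, int]:
    """Lowest weighted-error decision stump as (error, feature, threshold, polarity).

    features[j][i] is feature j on example i; labels are +1/-1. Polarity +1
    predicts +1 when value > threshold, polarity -1 predicts +1 otherwise.
    """
    total_pos = sum(w for w, y in zip(weights, labels) if y > 0)
    total_neg = sum(weights) - total_pos
    best = (float("inf"), -1, 0.0, 1)
    for j, idx in enumerate(order):  # reuse the cached order; no per-round sorting
        col = features[j]
        pos_below = neg_below = 0.0  # weight of examples at or below the threshold
        for k in range(len(idx) + 1):
            if k > 0:
                i = idx[k - 1]
                if labels[i] > 0:
                    pos_below += weights[i]
                else:
                    neg_below += weights[i]
                if k < len(idx) and col[idx[k]] == col[i]:
                    continue  # cannot place a threshold between tied values
            thr = col[idx[k - 1]] if k > 0 else float("-inf")
            err_pos = pos_below + (total_neg - neg_below)  # polarity +1 mistakes
            err_neg = neg_below + (total_pos - pos_below)  # polarity -1 mistakes
            if err_pos < best[0]:
                best = (err_pos, j, thr, 1)
            if err_neg < best[0]:
                best = (err_neg, j, thr, -1)
    return best
```

In a boosting loop, `presort` runs once and `best_stump` runs every round with the updated weights, so the per-round cost drops from sorting to a linear scan per feature.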
Oil Painting Assistance Using Projected Light: Bridging the Gap Between Digital and Physical Art

(Georgia Institute of Technology, 2005) Flagg, Matthew ; Rehg, James M.

This paper presents a novel interactive system for guiding artists to paint using traditional media and tools. The enabling technology is a multi-projector display capable of controlling the appearance of an artist's canvas. Artists are guided by this display-on-canvas to execute painting techniques. The artist paints according to a linear process of painting by numbers, one layer at a time. Each layer is painted using a set of interaction modes. Preview mode shows the entire layer as the current painting goal. Blank mode shows the current state of the painting. Color selection mode displays where to paint a certain color, orientation mode shows how to paint it, and texture highlight mode enhances the texture of the paint following its application. These interaction modes let novices focus on individual painting sub-tasks, simplifying the painting process while providing technical guidance ranging from high-level composition to detailed brushwork. In addition to assisting artists with painting, we discuss how our system could be extended to sculpture, woodwork, and other areas of the fine arts.
Boosted Bayesian Network Classifiers

(Georgia Institute of Technology, 2005) Jing, Yushi ; Pavlovic, Vladimir ; Rehg, James M.

The use of Bayesian networks for classification problems has received significant recent attention. Although computationally efficient, the standard maximum likelihood learning method tends to be suboptimal due to the mismatch between its optimization criterion (data likelihood) and the actual goal of classification (label prediction accuracy). Recent approaches to optimizing classification performance during parameter or structure learning show promise, but lack the favorable computational properties of maximum likelihood learning. In this paper we present Boosted Bayesian Network Classifiers, a framework that combines discriminative data weighting with generative training of intermediate models. We show that Boosted Bayesian Network Classifiers encompass the basic generative models in isolation, but improve their classification performance when the model structure is suboptimal. This framework can easily be extended to temporal Bayesian network models such as HMMs and DBNs. On a large suite of benchmark datasets, this approach outperforms generative graphical models such as naive Bayes, TAN, unrestricted Bayesian networks, and DBNs in classification accuracy. Boosted Bayesian network classifiers perform comparably to or better than other discriminatively trained graphical models, including ELR-NB, ELR-TAN, BNC-2P, BNC-MDL, and CRF. Furthermore, boosted Bayesian networks require significantly less training time than all of the competing methods.
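The combination described above, discriminative reweighting of the data around generatively trained models, can be sketched with discrete AdaBoost over naive Bayes classifiers. This is an illustrative sketch under assumptions, not the report's implementation: it uses only naive Bayes (not TAN or temporal models), binary labels in {-1, +1}, discrete feature values that all appear in training, and Laplace smoothing; the function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Model = Tuple[Dict[int, float], Dict[Tuple[int, int], Dict[int, float]]]

def fit_weighted_nb(X: Sequence[Sequence[int]], y: Sequence[int],
                    w: Sequence[float], alpha: float = 1.0) -> Model:
    """Laplace-smoothed maximum-likelihood naive Bayes on weighted examples.
    Weights are rescaled to sum to len(X) so they act like example counts."""
    n, n_feat = len(X), len(X[0])
    scale = n / sum(w)
    values = [sorted({row[f] for row in X}) for f in range(n_feat)]
    log_prior: Dict[int, float] = {}
    log_cond: Dict[Tuple[int, int], Dict[int, float]] = {}
    for c in (-1, 1):
        mass = sum(wi * scale for wi, yi in zip(w, y) if yi == c)
        log_prior[c] = math.log((mass + alpha) / (n + 2 * alpha))
        for f in range(n_feat):
            counts: Dict[int, float] = defaultdict(float)
            for row, yi, wi in zip(X, y, w):
                if yi == c:
                    counts[row[f]] += wi * scale
            denom = mass + alpha * len(values[f])
            log_cond[(c, f)] = {v: math.log((counts[v] + alpha) / denom) for v in values[f]}
    return log_prior, log_cond

def predict_nb(model: Model, row: Sequence[int]) -> int:
    log_prior, log_cond = model
    score = {c: log_prior[c] + sum(log_cond[(c, f)][v] for f, v in enumerate(row))
             for c in (-1, 1)}
    return 1 if score[1] >= score[-1] else -1

def boost_nb(X: Sequence[Sequence[int]], y: Sequence[int],
             rounds: int = 10) -> List[Tuple[float, Model]]:
    n = len(X)
    w = [1.0 / n] * n  # uniform weights: round 1 is plain generative naive Bayes
    ensemble: List[Tuple[float, Model]] = []
    for _ in range(rounds):
        model = fit_weighted_nb(X, y, w)  # generative training on reweighted data
        pred = [predict_nb(model, row) for row in X]
        err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
        if err >= 0.5:
            break  # no better than chance on the weighted data
        err = max(err, 1e-10)
        vote = 0.5 * math.log((1 - err) / err)
        ensemble.append((vote, model))
        if err <= 1e-10:
            break  # training data fit perfectly; stop early
        # Discriminative data weighting: up-weight examples this model got wrong
        w = [wi * math.exp(-vote * yi * p) for wi, yi, p in zip(w, y, pred)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict_boosted(ensemble: List[Tuple[float, Model]], row: Sequence[int]) -> int:
    score = sum(vote * predict_nb(model, row) for vote, model in ensemble)
    return 1 if score >= 0 else -1
```

Because the first round uses uniform weights, the ensemble starts from the ordinary (smoothed) maximum-likelihood model, consistent with the abstract's point that the framework encompasses the basic generative model and only adds rounds where reweighting helps.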
Sound Source Localization in Domestic Environment

(Georgia Institute of Technology, 2004) Bian, Xuehai ; Rehg, James M. ; Abowd, Gregory D.

Sound source localization strategies can be traced back to radar and sonar localization systems. In this report, we review the main challenges of sound source localization, especially talker localization, along with the major current strategies. We propose a practical peak-weighted PHAT TDOA method for reliably locating sources in the Awarehome, a residential laboratory at Georgia Tech. Finally, we suggest application scenarios in the domestic environment and outline future directions for our work.
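The PHAT-weighted time-difference-of-arrival (TDOA) estimate that the proposed method builds on can be sketched with the standard generalized cross-correlation with phase transform (GCC-PHAT). This sketch shows only the standard GCC-PHAT estimator for one microphone pair; the report's peak-weighting refinement is not described in the abstract and is not reproduced. The function name `gcc_phat_tdoa` and the `max_tau` parameter are assumptions of this sketch.

```python
import numpy as np

def gcc_phat_tdoa(sig: np.ndarray, ref: np.ndarray, fs: float, max_tau=None) -> float:
    """Delay (seconds) of `sig` relative to `ref`, estimated with standard GCC-PHAT."""
    n = len(sig) + len(ref)  # zero-pad so the correlation does not wrap around
    S = np.fft.rfft(sig, n=n)
    R = np.fft.rfft(ref, n=n)
    cross = S * np.conj(R)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:
        # Physically possible delays are bounded by mic spacing / speed of sound
        max_shift = min(int(fs * max_tau), max_shift)
    # Reorder so index max_shift corresponds to zero lag (negative lags first)
    cc = np.concatenate((cc[n - max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs
```

Whitening the cross-spectrum sharpens the correlation peak, which helps in reverberant rooms such as a home; with two microphones spaced `d` metres apart, a TDOA `tau` corresponds to an arrival angle of roughly `arcsin(343 * tau / d)` under a far-field assumption.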