Person: Essa, Irfan


Publication Search Results

Now showing 1 - 10 of 14
  • Item
    A Bayesian View of Boosting and Its Extension
    (Georgia Institute of Technology, 2005) Bobick, Aaron F. ; Essa, Irfan ; Shi, Yifan
    In this paper, we provide a Bayesian perspective on the boosting framework, which we refer to as Bayesian Integration. Through this perspective, we prove that standard ADABOOST is a special case of the naive Bayesian tree with a mapped conditional probability table and a particular weighting schema. Based on this perspective, we introduce a new algorithm, ADABOOST.BAYES, which takes the dependencies between the weak classifiers into account and extends the boosting framework to non-linear combinations of weak classifiers. Compared with standard ADABOOST, ADABOOST.BAYES requires fewer training iterations but exhibits a stronger tendency to overfit. To leverage the strengths of both, we introduce a simple switching schema, ADABOOST.SOFTBAYES, that integrates ADABOOST and ADABOOST.BAYES. Experiments on synthetic data and the UCI data set demonstrate the validity of our framework.
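    As context for the extension described in this abstract, here is a minimal sketch of the standard ADABOOST weighting schema the paper builds on; it does not reproduce ADABOOST.BAYES itself, and the `weak_learner` callable interface is an illustrative assumption, not from the paper.

    ```python
    import numpy as np

    def adaboost(X, y, weak_learner, T=50):
        """Standard ADABOOST with labels y in {-1, +1}. `weak_learner` is a
        callable that fits a weak classifier on (X, y, sample_weights) and
        returns a predict function mapping X -> {-1, +1}."""
        n = len(y)
        w = np.full(n, 1.0 / n)              # uniform initial sample weights
        hypotheses, alphas = [], []
        for _ in range(T):
            h = weak_learner(X, y, w)        # train on the weighted data
            pred = h(X)
            err = np.sum(w * (pred != y))    # weighted training error
            if err >= 0.5:                   # no better than chance: stop
                break
            alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
            w *= np.exp(-alpha * y * pred)   # up-weight misclassified samples
            w /= w.sum()
            hypotheses.append(h)
            alphas.append(alpha)
        def strong(Xq):                      # linear combination of weak votes
            return np.sign(sum(a * h(Xq) for a, h in zip(alphas, hypotheses)))
        return strong
    ```

    The linear combination in `strong` is exactly what ADABOOST.BAYES generalizes to a non-linear combination by modeling dependencies between the weak classifiers.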
  • Item
    Choreography Driven Characters
    (Georgia Institute of Technology, 2002) Sternberg, Daniel ; Essa, Irfan
    High-level control of an articulated humanoid character is much desired by animators. The current options of key-framing, motion capture, and simulation give either too much or too little control to the animator when dealing with general motions. The main reason for this lack of higher-level, action-based control is that current systems use low-level representations, mostly driven by data or samples. Though high-level representations of motion do exist, it is difficult to incorporate them into animation systems. To address this, we first introduce a representation based on dance notation. We then introduce a second notation based on L-systems. We show how the latter representation falls in the middle of the range of notations, allowing us to notate, encode, and synthesize various movements. We then show the applicability of these representations by presenting animations created from an input of dance notation.
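    The abstract does not give its production rules, so the sketch below only illustrates the general L-system mechanism it builds on: parallel string rewriting that expands a high-level phrase into movement primitives. The alphabet ("S", "step", "turn") is hypothetical.

    ```python
    # Hypothetical movement alphabet: each symbol names a motion primitive.
    rules = {
        "S": ["step", "turn", "S"],      # a phrase expands into primitives
        "step": ["step"],                # terminals rewrite to themselves
        "turn": ["turn"],
    }

    def expand(axiom, depth):
        """Apply the production rules `depth` times (parallel rewriting)."""
        seq = list(axiom)
        for _ in range(depth):
            seq = [sym for s in seq for sym in rules.get(s, [s])]
        return seq

    print(expand(["S"], 3))
    # -> ['step', 'turn', 'step', 'turn', 'step', 'turn', 'S']
    ```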
  • Item
    Exemplar Based Non-parametric BRDFs
    (Georgia Institute of Technology, 2002) Haro, Antonio ; Essa, Irfan
    Realistic rendering of computer-modeled three-dimensional surfaces typically involves building a parameterized model of the bidirectional reflectance distribution function (BRDF) of the desired surface material. We present a technique to render these surfaces with proper illumination and material properties using only a photograph of a sphere of the desired material under the desired lighting conditions. Capitalizing on the fact that the geometry of the material in the photograph is known, we sample pixels of the sphere's reflectance to create photo-realistic renderings of computer models with the same material properties. The reflectance is sampled using texture synthesis techniques that compensate for the fact that very little of the BRDF observed in the photograph is known. The technique uses these limited observations of the function to create a plausible, realistic rendering of the surface that can be easily composited onto a background plate.
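    A minimal sketch of the underlying lookup idea, assuming an orthographic photo of a unit sphere: because the sphere's geometry is known, each pixel pairs a surface normal with an observed color, so a model pixel can be shaded by copying the sphere pixel with the nearest normal. The paper's actual method layers texture synthesis on top of this; the function names here are invented.

    ```python
    import numpy as np

    def sphere_normal_grid(size):
        """Unit-sphere normals as seen in an orthographic size x size photo."""
        ys, xs = np.mgrid[-1:1:size * 1j, -1:1:size * 1j]
        z2 = 1.0 - xs**2 - ys**2
        mask = z2 > 0                            # pixels on the sphere
        normals = np.zeros((size, size, 3))
        normals[mask] = np.stack([xs[mask], ys[mask], np.sqrt(z2[mask])], axis=-1)
        return normals, mask

    def shade_by_exemplar(model_normals, sphere_photo, sphere_normals, mask):
        """Shade each model pixel with the sphere pixel of closest normal.
        model_normals: (H, W, 3) unit normals; sphere_photo: (S, S, 3) colors."""
        ref_n = sphere_normals[mask]             # (K, 3) observed normals
        ref_c = sphere_photo[mask]               # (K, 3) observed colors
        flat = model_normals.reshape(-1, 3)
        idx = np.argmax(flat @ ref_n.T, axis=1)  # nearest by cosine similarity
        return ref_c[idx].reshape(model_normals.shape[:2] + (3,))
    ```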
  • Item
    Visual Coding and Tracking of Speech Related Facial Motion
    (Georgia Institute of Technology, 2001) Reveret, Lionel ; Essa, Irfan
    This article presents a visual characterization of the facial motions inherent in speaking. We propose a set of four Facial Speech Parameters (FSP): jaw opening, lip rounding, lip closure, and lip raising, to represent the primary visual gestures of speech articulation as a multidimensional linear manifold. This manifold is initially generated as a statistical model, obtained by analyzing accurate 3D data of a reference human subject. The FSP are then associated with the linear modes of this statistical model, resulting in a 3D parametric facial mesh. We have tested the speaker-independence hypothesis of this manifold with a model-based video tracking task applied to different subjects. First, the parametric model is adapted and aligned to a subject's face for a single shape. The face motion is then tracked by optimally aligning the incoming video frames with the face model, textured with the first image and deformed by varying the FSP, head rotations, and translations. We show tracking results for different subjects using our method. Finally, we demonstrate the encoding of facial activity into the four FSP values to represent speaker-independent phonetic information.
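    A minimal sketch of the linear manifold described above, assuming the four FSP act as coefficients on per-vertex deformation modes added to a mean mesh; the shapes and names are illustrative, not the paper's data.

    ```python
    import numpy as np

    def deform_face(mean_mesh, fsp_modes, fsp_values):
        """Linear facial manifold: vertices = mean + sum_k fsp_k * mode_k.
        mean_mesh: (V, 3) rest shape; fsp_modes: (4, V, 3) deformation modes
        for jaw opening, lip rounding, lip closure, and lip raising."""
        return mean_mesh + np.tensordot(fsp_values, fsp_modes, axes=1)

    # Toy example: 2 vertices, 4 modes; open the jaw only.
    mean = np.zeros((2, 3))
    modes = np.random.randn(4, 2, 3) * 0.01
    open_jaw = deform_face(mean, modes, np.array([1.0, 0.0, 0.0, 0.0]))
    ```

    Tracking then amounts to searching over the four FSP values (plus head rotation and translation) for the deformation that best aligns the textured model with each incoming frame.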
  • Item
Real-time, Photo-realistic, Physically Based Rendering of Fine Scale Human Skin Structure
    (Georgia Institute of Technology, 2001) Haro, Antonio ; Guenter, Brian K. ; Essa, Irfan
    Skin is noticeably bumpy in character, which is clearly visible in close-up shots in a film or game. Methods that rely on simple texture-mapping of faces lack such high-frequency shape detail, which makes them look unrealistic. This detail is usually ignored in real-time applications, or is drawn in manually by an artist. In this paper, we present techniques for capturing and rendering the fine-scale structure of human skin. First, we present a method for creating normal maps of skin with a high degree of accuracy from physical data. We also present techniques inspired by texture synthesis to "grow" skin normal maps to cover the face. Finally, we demonstrate how such skin models can be rendered in real time on consumer graphics hardware.
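    A minimal sketch of rendering with a skin normal map, assuming simple per-pixel Lambertian shading; the paper targets real-time graphics hardware, and this NumPy version only illustrates the math.

    ```python
    import numpy as np

    def shade_normal_map(normal_map, albedo, light_dir):
        """Per-pixel Lambertian shading driven by a skin normal map.
        normal_map: (H, W, 3) unit normals; albedo: (H, W, 3) base color."""
        l = np.asarray(light_dir, dtype=float)
        l /= np.linalg.norm(l)                       # normalize light direction
        ndotl = np.clip(normal_map @ l, 0.0, None)   # clamp back-facing to 0
        return albedo * ndotl[..., None]
    ```

    The high-frequency bumpiness comes entirely from the normal map varying per pixel, which is why it can be layered onto an otherwise smooth face mesh.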
  • Item
    Machine Learning for Video-Based Rendering
    (Georgia Institute of Technology, 2000) Schodl, Arno ; Essa, Irfan
    We recently introduced a new paradigm for computer animation, video textures, which allows us to use a recorded video to generate novel animations by replaying the video samples in a new order. Video sprites are a special type of video texture. Instead of storing whole images, the object of interest is separated from the background and the video samples are stored as a sequence of alpha-matted sprites with associated velocity information. They can be rendered anywhere on the screen to create a novel animation of the object. To create such an animation, we have to find a sequence of sprite samples that is both visually smooth and shows the desired motion. In this paper, we address both problems. To estimate visual smoothness, we train a linear classifier to estimate visual similarity between video samples. If the motion path is known in advance, we then use a beam search algorithm to find a good sample sequence. We can also specify the motion interactively by precomputing a set of cost functions using Q-learning.
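    A minimal sketch of the known-path case described above: a beam search over sprite samples that trades off transition smoothness against deviation from the target motion path. The cost-function interfaces and beam width are stand-ins, not the paper's.

    ```python
    import heapq

    def beam_search(samples, transition_cost, path_cost, length, beam_width=20):
        """Find a low-cost sequence of sprite sample indices.
        transition_cost(a, b): visual cost of playing sample b after a.
        path_cost(s, t): deviation of sample s's motion from the path at step t."""
        beam = heapq.nsmallest(beam_width,
                               [(path_cost(s, 0), [s]) for s in samples],
                               key=lambda c: c[0])
        for t in range(1, length):
            candidates = [(cost + transition_cost(seq[-1], s) + path_cost(s, t),
                           seq + [s])
                          for cost, seq in beam for s in samples]
            beam = heapq.nsmallest(beam_width, candidates, key=lambda c: c[0])
        return min(beam, key=lambda c: c[0])[1]
    ```

    For interactive control, where the path is not known in advance, the abstract instead precomputes cost-to-go functions with Q-learning rather than searching at playback time.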
  • Item
    Motion Based Decompositing of Video
    (Georgia Institute of Technology, 1999) Brostow, Gabriel Julian ; Essa, Irfan
    We present a method to decompose video sequences into layers that represent the relative depths of complex scenes. Our method combines spatial information with temporal occlusions to determine the relative depths of these layers. Spatial information is obtained through edge detection and a customized contour-completion algorithm. Activity in a scene is used to extract temporal occlusion events, which are, in turn, used to classify objects as occluders or occludees. The paths traversed by the moving objects determine the segmentation of the scene. Several examples of decompositing and compositing of video are shown. This approach can be applied in the pre-processing of sequences for compositing or tracking purposes, and to determine the approximate 3D structure of a scene.
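    A minimal sketch of how temporal occlusion events can induce a relative depth order, assuming each event names an occluder and an occludee and that the observed events are mutually consistent; the paper's edge detection and contour completion are not reproduced here.

    ```python
    from collections import defaultdict

    def layer_order(occlusion_events):
        """Order objects front-to-back from (occluder, occludee) events."""
        in_front_of = defaultdict(set)
        objects = set()
        for a, b in occlusion_events:
            in_front_of[a].add(b)           # a was seen covering b
            objects.update((a, b))
        order, placed = [], set()
        while len(placed) < len(objects):
            # pick an object not occluded by anything still unplaced
            front = next(o for o in objects if o not in placed
                         and not any(o in in_front_of[p]
                                     for p in objects - placed - {o}))
            order.append(front)
            placed.add(front)
        return order

    print(layer_order([("person", "car"), ("car", "building")]))
    # -> ['person', 'car', 'building']
    ```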
  • Item
    Robust Tracking of People by a Mobile Robotic Agent
    (Georgia Institute of Technology, 1999) Tanawongsuwan, Rawesak ; Stoytchev, Alexander ; Essa, Irfan
    We present methods for tracking people in dynamic and changing environments from a camera mounted on a mobile robot. We describe processes to extract color, motion, and depth information from video, and we present methods to merge these processes to allow for reliable tracking of people. We discuss how merging these different measurements can aid in instances where there is motion in the scene due to large movements by people, camera movement, or lighting variation, even in the presence of skin-like colors in the scene. We also apply the results of our tracking system to gesture recognition in the context of human-robot interaction.
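    A minimal sketch of merging the three cues, assuming each has already been converted into a per-pixel evidence map in [0, 1]; the fixed weights are an illustrative assumption, not the paper's merging scheme.

    ```python
    import numpy as np

    def fuse_cues(color_map, motion_map, depth_map, weights=(0.4, 0.3, 0.3)):
        """Merge per-pixel evidence maps into one target likelihood and
        return the best candidate pixel location plus the fused map."""
        fused = (weights[0] * color_map
                 + weights[1] * motion_map
                 + weights[2] * depth_map)
        return np.unravel_index(np.argmax(fused), fused.shape), fused
    ```

    Because each cue fails under a different condition (skin-like colors, camera motion, lighting changes), a weighted combination stays informative when any single cue degrades.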
  • Item
    Detecting and Tracking Eyes by Using their Physiological Properties, Dynamics and Appearance
    (Georgia Institute of Technology, 1999) Haro, Antonio ; Flickner, Myron ; Essa, Irfan
    Reliable detection and tracking of eyes is an important requirement for attentive user interfaces. In this paper, we present a methodology for detecting eyes robustly in indoor environments in real time. We exploit the physiological properties and appearance of eyes as well as head/eye motion dynamics. Structured infrared lighting is used to capture the physiological properties of eyes, Kalman trackers are used to model eye/head dynamics, and a probabilistic appearance model is used to represent eye appearance. By combining these three modalities, with specific enhancements within each, our approach allows eyes to be treated as robust features that can be used for other higher-level processing.
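    A minimal sketch of the Kalman-tracking component, assuming a constant-velocity model for a 2D eye position updated by one infrared-based detection per frame; the noise values are illustrative.

    ```python
    import numpy as np

    # State x = [px, py, vx, vy]; a detection z = [px, py] arrives each frame.
    dt = 1.0
    F = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
                  [0, 0, 1, 0],  [0, 0, 0, 1]], dtype=float)  # dynamics
    H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)   # observe position
    Q = np.eye(4) * 1e-2      # process noise (illustrative)
    R = np.eye(2) * 1.0       # measurement noise (illustrative)

    def kalman_step(x, P, z):
        # predict state and covariance forward one frame
        x = F @ x
        P = F @ P @ F.T + Q
        # update with the detected eye center z
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (z - H @ x)
        P = (np.eye(4) - K @ H) @ P
        return x, P
    ```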
  • Item
    Exploiting Human Actions and Object Context for Recognition Tasks
    (Georgia Institute of Technology, 1999) Moore, Darnell Janssen ; Essa, Irfan ; Hayes, Monson H.
    Our goal is to exploit human motion and object context to perform action recognition and object classification. Towards this end, we introduce a framework for recognizing actions and objects by measuring image-, object-, and action-based information from video. Hidden Markov models are combined with object context to classify hand actions, which are aggregated by a Bayesian classifier to summarize activities. We also use Bayesian methods to differentiate the class of unknown objects by evaluating detected actions along with low-level, extracted object features. Our approach is appropriate for locating and classifying objects under a variety of conditions, including full occlusion. We show experiments where both familiar and previously unseen objects are recognized using action and context information.
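    A minimal sketch of the classification step, assuming discrete observation symbols, one trained HMM per hand action, and a context prior derived from the detected object; all names and the prior are illustrative.

    ```python
    import numpy as np

    def log_forward(obs, pi, A, B):
        """Log-likelihood of a discrete observation sequence under an HMM
        (pi: initial probs, A: transitions, B: emissions, obs: symbol indices)."""
        alpha = np.log(pi) + np.log(B[:, obs[0]])
        for o in obs[1:]:
            alpha = (np.logaddexp.reduce(alpha[:, None] + np.log(A), axis=0)
                     + np.log(B[:, o]))
        return np.logaddexp.reduce(alpha)

    def classify_action(obs, hmms, context_prior):
        """Pick the action whose HMM best explains obs, weighted by the
        prior that the detected object affords that action."""
        scores = {a: log_forward(obs, *hmms[a]) + np.log(context_prior[a])
                  for a in hmms}
        return max(scores, key=scores.get)
    ```

    Weighting the per-action likelihoods by an object-derived prior is what lets the same motion be read differently depending on which object the hand is near.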