Person:
Essa, Irfan


Publication Search Results

Now showing 1 - 8 of 8
  • Item
    Motion Based Decompositing of Video
    (Georgia Institute of Technology, 1999) Brostow, Gabriel Julian ; Essa, Irfan
    We present a method to decompose video sequences into layers that represent the relative depths of complex scenes. Our method combines spatial information with temporal occlusions to determine the relative depths of these layers. Spatial information is obtained through edge detection and a customized contour completion algorithm. Activity in a scene is used to extract temporal occlusion events, which are, in turn, used to classify objects as occluders or occludees. The path traversed by the moving objects determines the segmentation of the scene. Several examples of decompositing and compositing of video are shown. This approach can be applied in the pre-processing of sequences for compositing or tracking purposes and to determine the approximate 3D structure of a scene.
  • Item
    Robust Tracking of People by a Mobile Robotic Agent
    (Georgia Institute of Technology, 1999) Tanawongsuwan, Rawesak ; Stoytchev, Alexander ; Essa, Irfan
    We present methods for tracking people in dynamic and changing environments from a camera mounted on a mobile robot. We describe processes to extract color, motion, and depth information from video, and we present methods to merge these processes to allow for reliable tracking of people. We discuss how this merging of different measurements can aid in instances where there is motion in the scene due to large movements by people, camera movements, and lighting variations, even in the presence of skin-like colors in the scene. We also apply the results from our tracking system to gesture recognition in the context of human-robot interaction.
  • Item
    Detecting and Tracking Eyes by Using their Physiological Properties, Dynamics and Appearance
    (Georgia Institute of Technology, 1999) Haro, Antonio ; Flickner, Myron ; Essa, Irfan
    Reliable detection and tracking of eyes is an important requirement for attentive user interfaces. In this paper, we present a methodology for detecting eyes robustly in indoor environments in real-time. We exploit the physiological properties and appearance of eyes as well as head/eye motion dynamics. Structured infrared lighting is used to capture the physiological properties of eyes, Kalman trackers are used to model eye/head dynamics, and a probabilistic appearance model is used to represent eye appearance. By combining three separate modalities, with specific enhancements within each modality, our approach allows eyes to be treated as robust features that can be used for other higher-level processing.
  • Item
    Exploiting Human Actions and Object Context for Recognition Tasks
    (Georgia Institute of Technology, 1999) Moore, Darnell Janssen ; Essa, Irfan ; Hayes, Monson H.
    Our goal is to exploit human motion and object context to perform action recognition and object classification. Towards this end, we introduce a framework for recognizing actions and objects by measuring image-, object- and action-based information from video. Hidden Markov models are combined with object context to classify hand actions, which are aggregated by a Bayesian classifier to summarize activities. We also use Bayesian methods to differentiate the class of unknown objects by evaluating detected actions along with low-level, extracted object features. Our approach is appropriate for locating and classifying objects under a variety of conditions including full occlusion. We show experiments where both familiar and previously unseen objects are recognized using action and context information.
  • Item
    A System for Tracking and Recognizing Multiple People with Multiple Cameras
    (Georgia Institute of Technology, 1998) Stillman, Scott T. ; Tanawongsuwan, Rawesak ; Essa, Irfan
    In this paper we present a robust real-time method for tracking and recognizing multiple people with multiple cameras. Our method uses both static and Pan-Tilt-Zoom (PTZ) cameras to provide visual attention. The PTZ camera system uses face recognition to register people in the scene and "lock on" to those individuals. The static camera system provides a global view of the environment and is used to re-adjust the tracking of the system when the PTZ cameras lose their targets. The system works well even when people occlude one another. The underlying visual processes rely on color segmentation, movement tracking, and shape information to locate target candidates. Color indexing and face recognition modules help register these candidates with the system.
  • Item
    Head Tracking using a Textured Polygonal Model
    (Georgia Institute of Technology, 1998) Schodl, Arno ; Haro, Antonio ; Essa, Irfan
    We describe the use of a three-dimensional textured model of the human head under perspective projection to track a person's face. The system is hand-initialized by projecting an image of a face onto the polygonal head model. Tracking is achieved by finding the six translation and rotation parameters that register a rendered image of the textured model with the video image. We find the parameters by relating the derivative of the error with respect to the parameters to intensity gradients in the image, using a robust estimator to pool the information, and performing gradient descent.
  • Item
    Coding, Analysis, Interpretation, and Recognition of Facial Expressions
    (Georgia Institute of Technology, 1998) Essa, Irfan
    We describe a computer vision system for observing facial motion by using an optimal estimation optical flow method coupled with geometric, physical, and motion-based dynamic models describing the facial structure. Our method produces a reliable parametric representation of the face's independent muscle action groups, as well as an accurate estimate of facial motion. Previous efforts at analysis of facial expression have been based on the Facial Action Coding System (FACS), a representation developed in order to allow human psychologists to code expression from static pictures. To avoid use of this heuristic coding scheme, we have used our computer vision system to probabilistically characterize facial motion and muscle activation in an experimental population, thus deriving a new, more accurate representation of human facial expressions that we call FACS+. Finally, we show how this method can be used for coding, analysis, interpretation, and recognition of facial expressions.
  • Item
    Object Spaces: Context Management for Human Activity Recognition
    (Georgia Institute of Technology, 1998) Moore, Darnell Janssen ; Essa, Irfan ; Hayes, Monson H.
    In this paper, we propose a vision-based method for developing computer awareness of human activities. We present an object-oriented approach called ObjectSpaces that encapsulates context into scene objects. Objects provide clues about which human motions to anticipate, making them powerful tools for discriminating actions and activities. Our hierarchical process leverages both low- and high-level representations of motion to label human interaction with objects in the surroundings. Hidden Markov models and Bayesian relations are used to characterize and summarize activity.
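The eye-tracking abstract above mentions Kalman trackers for modeling eye/head dynamics. The sketch below shows what a minimal constant-velocity Kalman tracker for a 2D point looks like; it is an illustration of the general technique only, since the papers' actual state models, noise settings, and measurement pipelines are not given here, and all parameter values are assumptions.

```python
import numpy as np

class KalmanTracker2D:
    """Minimal constant-velocity Kalman tracker for a 2D point.

    A sketch of the standard predict/update cycle; the noise
    parameters (process_var, meas_var) are illustrative defaults.
    """

    def __init__(self, x, y, dt=1.0, process_var=1e-2, meas_var=1.0):
        # State vector: [x, y, vx, vy]
        self.state = np.array([x, y, 0.0, 0.0])
        self.P = np.eye(4)                              # state covariance
        self.F = np.array([[1, 0, dt, 0],               # constant-velocity
                           [0, 1, 0, dt],               # motion model
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],                # we observe
                           [0, 1, 0, 0]], dtype=float)  # position only
        self.Q = process_var * np.eye(4)                # process noise
        self.R = meas_var * np.eye(2)                   # measurement noise

    def predict(self):
        # Propagate state and covariance one time step forward.
        self.state = self.F @ self.state
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.state[:2]

    def update(self, zx, zy):
        # Fold in a new position measurement (zx, zy).
        z = np.array([zx, zy])
        innovation = z - self.H @ self.state
        S = self.H @ self.P @ self.H.T + self.R      # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)     # Kalman gain
        self.state = self.state + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.state[:2]
```

In practice such a tracker is fed one detection per video frame; the predict step supplies a search location even when the detector momentarily fails, which is what makes the eyes usable as "robust features."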
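Two of the abstracts above ("Exploiting Human Actions and Object Context..." and "Object Spaces...") classify hand actions with hidden Markov models. A minimal sketch of that idea: score an observation sequence under one discrete-output HMM per action with the forward algorithm, then pick the best-scoring action. All model names and parameters here are toy assumptions, not the papers' trained models.

```python
import numpy as np

def hmm_log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    obs: sequence of symbol indices; pi: initial state probs (N,);
    A: state transitions (N, N); B: emission probs (N, M).
    """
    alpha = pi * B[:, obs[0]]
    log_prob = np.log(alpha.sum())
    alpha = alpha / alpha.sum()          # rescale to avoid underflow
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        s = alpha.sum()
        log_prob += np.log(s)
        alpha /= s
    return log_prob

def classify_action(obs, models):
    """Pick the action whose HMM best explains the observation sequence.

    models: dict mapping action name -> (pi, A, B).
    """
    return max(models, key=lambda name: hmm_log_likelihood(obs, *models[name]))
```

A toy usage: two hypothetical action models whose emissions favor different symbols, so a sequence dominated by symbol 0 is labeled with the first action.

```python
models = {
    "wave":  (np.array([0.5, 0.5]),
              np.array([[0.9, 0.1], [0.1, 0.9]]),
              np.array([[0.9, 0.1], [0.8, 0.2]])),   # emits mostly symbol 0
    "point": (np.array([0.5, 0.5]),
              np.array([[0.9, 0.1], [0.1, 0.9]]),
              np.array([[0.1, 0.9], [0.2, 0.8]])),   # emits mostly symbol 1
}
label = classify_action([0, 0, 0, 0, 0], models)
```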