Person:
Essa, Irfan


Publication Search Results

Now showing 1 - 9 of 9
  • Item
    Geometric Context from Videos
    (Georgia Institute of Technology, 2013-06) Raza, S. Hussain ; Grundmann, Matthias ; Essa, Irfan
    We present a novel algorithm for estimating the broad 3D geometric structure of outdoor video scenes. Leveraging spatio-temporal video segmentation, we decompose a dynamic scene captured by a video into geometric classes, based on predictions made by region classifiers trained on appearance and motion features. By examining the homogeneity of the predictions, we combine them across multiple levels of the segmentation hierarchy, alleviating the need to fix the granularity a priori. To evaluate our method, we built a novel, extensive dataset for geometric context of video, consisting of over 100 ground-truth-annotated outdoor videos with over 20,000 frames. To further scale beyond this dataset, we propose a semi-supervised learning framework that expands the pool of labeled data with high-confidence predictions obtained from unlabeled data. Our system produces accurate predictions of the geometric context of video, achieving 96% accuracy across the main geometric classes.
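    The abstract does not spell out the fusion rule across hierarchy levels; below is a minimal, illustrative Python sketch assuming entropy as the (inverse) homogeneity measure and per-pixel class probabilities at each level. The function name and weighting scheme are hypothetical, not the paper's actual formulation.

    ```python
    import numpy as np

    def combine_hierarchy_predictions(level_probs, eps=1e-9):
        """Fuse per-pixel class probabilities predicted at several
        segmentation-hierarchy levels into a single prediction.

        level_probs: list of (H, W, C) arrays, one per hierarchy level.
        Each level is weighted by the homogeneity (low entropy) of its
        prediction, so no single granularity must be chosen a priori.
        """
        fused = np.zeros_like(level_probs[0])
        total_w = np.zeros(level_probs[0].shape[:2] + (1,))
        for probs in level_probs:
            entropy = -np.sum(probs * np.log(probs + eps), axis=-1, keepdims=True)
            weight = 1.0 / (1.0 + entropy)   # homogeneous prediction => higher weight
            fused += weight * probs
            total_w += weight
        return fused / total_w

    # Toy example: two hierarchy levels, 3 geometric classes (e.g. sky/ground/vertical)
    rng = np.random.default_rng(0)
    lvl1 = rng.dirichlet(np.ones(3), size=(4, 4))
    lvl2 = rng.dirichlet(np.ones(3), size=(4, 4))
    labels = combine_hierarchy_predictions([lvl1, lvl2]).argmax(axis=-1)
    ```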
  • Item
    Post-processing Approach for Radiometric Self-Calibration of Video
    (Georgia Institute of Technology, 2013-04) Grundmann, Matthias ; McClanahan, Chris ; Kang, Sing Bing ; Essa, Irfan
    We present a novel data-driven technique for radiometric self-calibration of video from an unknown camera. Our approach self-calibrates radiometric variations in video and is applied as a post-process; there is no need to access the camera, and in particular the method is applicable to internet videos. The technique builds on empirical evidence that the camera response function (CRF) in video should be regarded as time-variant, as it changes with scene content and exposure, rather than being modeled as a single fixed response. We show that a time-varying mixture of responses produces better accuracy and consistently reduces the error in mapping intensity to irradiance compared to a single-response model. Furthermore, our mixture model counteracts the effects of possible nonlinear, exposure-dependent intensity perturbations and white-balance changes caused by proprietary camera firmware. We further show how radiometrically calibrated video improves the performance of other video analysis algorithms, enabling a video segmentation algorithm to be invariant to exposure and gain variations over the sequence. We validate our data-driven technique on videos from a variety of cameras and demonstrate the generality of our approach by applying it to internet videos.
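    A rough sketch of the core idea of a time-varying mixture of responses, assuming a hypothetical basis of inverse gamma curves; the paper's actual response basis and fitting procedure are not given in the abstract.

    ```python
    import numpy as np

    # Hypothetical basis of inverse camera response functions (gamma curves);
    # illustrative only, not the basis used in the paper.
    GAMMAS = [1.0, 1.8, 2.2, 2.6]

    def intensity_to_irradiance(intensity, weights):
        """Map normalized intensity in [0, 1] to irradiance using a
        time-varying mixture of inverse responses.

        weights: per-frame mixture weights over the gamma basis, so the
        effective response may change with scene content and exposure.
        """
        weights = np.asarray(weights) / np.sum(weights)
        basis = np.stack([intensity ** g for g in GAMMAS], axis=0)
        return np.tensordot(weights, basis, axes=1)

    frame = np.linspace(0.0, 1.0, 5)
    print(intensity_to_irradiance(frame, [0.1, 0.2, 0.6, 0.1]))
    ```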
  • Item
    Weakly Supervised Learning of Object Segmentations from Web-Scale Video
    (Georgia Institute of Technology, 2012-10) Hartmann, Glenn ; Grundmann, Matthias ; Hoffman, Judy ; Tsai, David ; Kwatra, Vivek ; Madani, Omid ; Vijayanarasimhan, Sudheendra ; Essa, Irfan ; Rehg, James M. ; Sukthankar, Rahul
    We propose to learn pixel-level segmentations of objects from weakly labeled (tagged) internet videos. Specifically, given a large collection of raw YouTube content along with potentially noisy tags, our goal is to automatically generate spatio-temporal masks for each object, such as "dog", without employing any pre-trained object detectors. We formulate this problem as learning weakly supervised classifiers for a set of independent spatio-temporal segments. The object seeds obtained using segment-level classifiers are further refined using graph cuts to generate high-precision object masks. Our results, obtained by training on a dataset of 20,000 YouTube videos weakly tagged into 15 classes, demonstrate automatic extraction of pixel-level object masks. Evaluated against a ground-truthed subset of 50,000 frames with pixel-level annotations, we confirm that our proposed methods can learn good object masks just by watching YouTube.
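    One plausible reading of the weakly supervised setup, sketched below: every spatio-temporal segment inherits its video's tags as noisy labels, and a per-tag classifier is trained on segment features. The feature dimensionality, classifier choice (logistic regression), and helper names are illustrative assumptions, not the paper's exact pipeline.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def train_segment_classifier(segment_feats, video_tags, target_tag):
        """Weakly supervised segment classifier: every spatio-temporal
        segment inherits its video's tag as a (noisy) label.

        segment_feats: list of (n_segments_i, d) arrays, one per video.
        video_tags:    list of tag sets, one per video.
        """
        X, y = [], []
        for feats, tags in zip(segment_feats, video_tags):
            X.append(feats)
            y.append(np.full(len(feats), target_tag in tags, dtype=int))
        clf = LogisticRegression(max_iter=1000)
        clf.fit(np.vstack(X), np.concatenate(y))
        return clf  # its scores seed object masks, later refined with graph cuts

    # Toy example: two videos with 8-D segment features, one tagged "dog"
    rng = np.random.default_rng(0)
    feats = [rng.normal(size=(20, 8)), rng.normal(size=(15, 8))]
    clf = train_segment_classifier(feats, [{"dog"}, {"cat"}], "dog")
    ```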
  • Item
    Calibration-Free Rolling Shutter Removal
    (Georgia Institute of Technology, 2012-04) Grundmann, Matthias ; Kwatra, Vivek ; Castro, Daniel ; Essa, Irfan
    We present a novel algorithm for efficient removal of rolling shutter distortions in uncalibrated streaming videos. Our proposed method is calibration-free: it requires neither knowledge of the camera used nor calibration from specially recorded calibration sequences. Our algorithm can perform rolling shutter removal under varying focal lengths, as in videos from CMOS cameras equipped with an optical zoom. We evaluate our approach across a broad range of cameras and video sequences, demonstrating robustness, scalability, and repeatability. We also conducted a user study, which demonstrates a preference for the output of our algorithm over other state-of-the-art methods. Our algorithm is computationally efficient, easy to parallelize, and robust to challenging artifacts introduced by various cameras with differing technologies.
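    The abstract gives no algorithmic detail, but the underlying distortion model is well known: CMOS rows are exposed sequentially, so each scanline sees the camera at a slightly different time. Below is a toy per-row correction under a pure horizontal pan, a drastic simplification of the paper's calibration-free approach, which handles richer motion models and varying focal lengths.

    ```python
    import numpy as np

    def unwarp_rows(frame, dx_per_frame):
        """Toy rolling-shutter correction under a pure horizontal pan:
        row r is read out at normalized time r/H, so it is displaced by
        roughly (r/H) * dx_per_frame; shift it back by that amount.
        (Nearest-pixel shifts, no interpolation or inpainting.)
        """
        H = frame.shape[0]
        out = np.empty_like(frame)
        for r in range(H):
            shift = int(round((r / H) * dx_per_frame))
            out[r] = np.roll(frame[r], -shift, axis=0)
        return out

    frame = np.tile(np.arange(8), (6, 1))   # 6 scanlines, 8 columns
    print(unwarp_rows(frame, dx_per_frame=4))
    ```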
  • Item
    Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths
    (Georgia Institute of Technology, 2011-06) Grundmann, Matthias ; Kwatra, Vivek ; Essa, Irfan
    We present a novel algorithm for automatically applying constrainable, L1-optimal camera paths to generate stabilized videos by removing undesired motions. Our goal is to compute camera paths composed of constant, linear, and parabolic segments, mimicking the camera motions employed by professional cinematographers. To this end, our algorithm uses a linear programming framework to minimize the first, second, and third derivatives of the resulting camera path. Our method allows for video stabilization beyond the conventional filtering of camera paths, which only suppresses high-frequency jitter. We incorporate additional constraints on the path of the camera directly into our algorithm, allowing for stabilized and retargeted videos. Our approach accomplishes this without the need for user interaction or costly 3D reconstruction of the scene, and works as a post-process for videos from any camera or from an online source.
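    The derivative-minimizing objective maps naturally onto a linear program. A sketch for a 1-D path (e.g., per-frame x-translation), with slack variables bounding each absolute derivative; the derivative weights and crop bound are illustrative, and the actual paper operates on 2-D parametric camera paths with additional constraints.

    ```python
    import numpy as np
    from scipy.optimize import linprog

    def stabilize_path_l1(shaky_path, crop=15.0, weights=(10.0, 1.0, 100.0)):
        """Smooth a 1-D camera path by minimizing the weighted L1 norms of
        its first, second, and third derivatives, subject to the stabilized
        path staying within +/- crop of the input (the crop-window bound).
        Each |D_k p| term is linearized with a slack vector e_k >= 0.
        """
        p0 = np.asarray(shaky_path, dtype=float)
        n = len(p0)
        diffs = [np.diff(np.eye(n), k, axis=0) for k in (1, 2, 3)]
        sizes = [D.shape[0] for D in diffs]
        m = sum(sizes)
        # Variable vector: [p (n entries), e1, e2, e3]; only slacks carry cost.
        cost = np.concatenate([np.zeros(n)] +
                              [wk * np.ones(s) for wk, s in zip(weights, sizes)])
        A, b, offset = [], [], n
        for D, s in zip(diffs, sizes):
            slack = np.zeros((s, n + m))
            slack[:, offset:offset + s] = np.eye(s)
            path_part = np.zeros((s, n + m))
            path_part[:, :n] = D
            A += [path_part - slack, -path_part - slack]   # encodes |D p| <= e
            b += [np.zeros(s), np.zeros(s)]
            offset += s
        bounds = [(x - crop, x + crop) for x in p0] + [(0, None)] * m
        res = linprog(cost, A_ub=np.vstack(A), b_ub=np.concatenate(b),
                      bounds=bounds, method="highs")
        return res.x[:n]

    shaky = np.cumsum(np.random.default_rng(1).normal(0.0, 2.0, 60))
    smooth = stabilize_path_l1(shaky)
    ```

    With a large third-derivative weight, the solver favors piecewise constant, linear, and parabolic segments, which is exactly why the L1 (rather than L2) penalty is used: L1 drives most derivative entries exactly to zero.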
  • Item
    Motion Fields to Predict Play Evolution in Dynamic Sport Scenes
    (Georgia Institute of Technology, 2010-06) Kim, Kihwan ; Grundmann, Matthias ; Shamir, Ariel ; Matthews, Iain ; Hodgins, Jessica K. ; Essa, Irfan
    Videos of multi-player team sports provide a challenging domain for dynamic scene analysis. Player actions and interactions are complex, as they are driven by many factors, such as the short-term goals of the individual player, the overall team strategy, the rules of the sport, and the current context of the game. We show that constrained multi-agent events can be analyzed and even predicted from video. Such analysis requires estimating the global movements of all players in the scene at any time, and is needed for modeling and predicting how the multi-agent play evolves over time on the field. To this end, we propose a novel approach to detect the locations toward which the play will evolve, e.g., where interesting events will occur, by tracking player positions and movements over time. We start by extracting the sparse, ground-level movement of players at each time step, and then generate a dense motion field. Using this field, we detect locations where the motion converges, implying positions toward which the play is evolving. We evaluate our approach by analyzing videos of a variety of complex soccer plays.
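    A compact sketch of the convergence-detection idea: interpolate sparse per-player velocities into a dense field, then look for sinks of the flow via its divergence. The Gaussian interpolation, grid sizes, and function name are illustrative assumptions.

    ```python
    import numpy as np

    def dense_motion_field(positions, velocities, grid_w, grid_h, sigma=5.0):
        """Interpolate sparse per-player ground velocities into a dense
        field using Gaussian radial weights, then locate a convergence
        point as the minimum of the field's divergence (flow pointing in).
        """
        ys, xs = np.mgrid[0:grid_h, 0:grid_w]
        field = np.zeros((grid_h, grid_w, 2))
        weight = np.full((grid_h, grid_w), 1e-9)
        for (px, py), v in zip(positions, velocities):
            w = np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2 * sigma ** 2))
            field += w[..., None] * np.asarray(v)
            weight += w
        field /= weight[..., None]
        # div(v) = d(vx)/dx + d(vy)/dy; strongly negative => converging motion
        div = np.gradient(field[..., 0], axis=1) + np.gradient(field[..., 1], axis=0)
        return field, np.unravel_index(np.argmin(div), div.shape)

    pos = [(10, 10), (30, 12), (20, 25)]          # player (x, y) on the ground plane
    vel = [(1.0, 0.5), (-1.0, 0.5), (0.0, -1.0)]  # their (vx, vy) velocities
    field, hotspot = dense_motion_field(pos, vel, grid_w=40, grid_h=30)
    ```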
  • Item
    Efficient Hierarchical Graph-Based Video Segmentation
    (Georgia Institute of Technology, 2010-06) Grundmann, Matthias ; Kwatra, Vivek ; Han, Mei ; Essa, Irfan
    We present an efficient and scalable technique for spatio-temporal segmentation of long video sequences using a hierarchical graph-based algorithm. We begin by oversegmenting a volumetric video graph into space-time regions grouped by appearance. We then construct a “region graph” over the obtained segmentation and iteratively repeat this process over multiple levels to create a tree of spatio-temporal segmentations. This hierarchical approach generates high-quality segmentations, which are temporally coherent with stable region boundaries, and allows subsequent applications to choose from varying levels of granularity. We further improve segmentation quality by using dense optical flow to guide temporal connections in the initial graph. We also propose two novel approaches to improve the scalability of our technique: (a) a parallel out-of-core algorithm that can process volumes much larger than an in-core algorithm, and (b) a clip-based processing algorithm that divides the video into overlapping clips in time, and segments them successively while enforcing consistency. We demonstrate hierarchical segmentations on video shots as long as 40 seconds, and even support a streaming mode for arbitrarily long videos, albeit without the ability to process them hierarchically.
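    A toy version of the hierarchical region-merging loop, using union-find over a region graph with per-level merge thresholds; the real algorithm's appearance distances, region descriptors, and out-of-core machinery are omitted here.

    ```python
    import numpy as np

    def find(parent, i):
        """Union-find root lookup with path halving."""
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def hierarchical_merge(n_regions, edges, thresholds):
        """Build a segmentation hierarchy: at each level, merge region-graph
        edges whose appearance distance falls below that level's threshold.

        edges: list of (i, j, dist); thresholds: increasing, one per level.
        Returns one label array per level, coarser as thresholds grow.
        """
        parent = list(range(n_regions))
        levels = []
        for tau in thresholds:
            for i, j, dist in sorted(edges, key=lambda e: e[2]):
                if dist < tau:
                    parent[find(parent, i)] = find(parent, j)
            levels.append(np.array([find(parent, i) for i in range(n_regions)]))
        return levels

    edges = [(0, 1, 0.1), (1, 2, 0.4), (2, 3, 0.9)]
    for lvl in hierarchical_merge(4, edges, thresholds=[0.2, 0.5, 1.0]):
        print(lvl)   # regions collapse into fewer labels at each level
    ```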
  • Item
    Player Localization Using Multiple Static Cameras for Sports Visualization
    (Georgia Institute of Technology, 2010-06) Hamid, Raffay ; Kumar, Ram Krishan ; Grundmann, Matthias ; Kim, Kihwan ; Essa, Irfan ; Hodgins, Jessica K.
    We present a novel approach for robust localization of multiple people observed using multiple cameras. We use this location information to generate sports visualizations, which include displaying a virtual offside line in soccer games and showing players' positions and motion patterns. Our main contribution is modeling the problem of fusing the players' corresponding positional information across cameras as finding minimum-weight K-length cycles in complete K-partite graphs. To this end, we use a dynamic-programming-based approach that varies over a continuum from maximally to minimally greedy in terms of the number of paths explored at each iteration. We present an end-to-end sports visualization framework that employs our proposed class of algorithms. We demonstrate the robustness of our framework by testing it on 60,000 frames of soccer footage spanning 5 different illumination conditions, play types, and team attires.
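    A small sketch of the beam-style dynamic program the abstract describes: one part per camera, one node per candidate player position, and a tunable beam width spanning the greedy-to-exhaustive continuum. The candidate positions and distance function below are illustrative.

    ```python
    import numpy as np

    def min_weight_cycle(parts, dist, beam=3):
        """Beam-search DP for a minimum-weight K-length cycle visiting one
        node in each part of a complete K-partite graph (one part per
        camera, one node per candidate position). beam=1 is maximally
        greedy; a very large beam degenerates to exhaustive search.
        """
        # Paths start at each node of part 0, then extend part by part.
        paths = [([v], 0.0) for v in range(len(parts[0]))]
        for k in range(1, len(parts)):
            ext = [(p + [v], c + dist(parts[k - 1][p[-1]], parts[k][v]))
                   for p, c in paths for v in range(len(parts[k]))]
            paths = sorted(ext, key=lambda e: e[1])[:beam]   # keep best `beam`
        # Close the cycle back to the starting part.
        return min(((p, c + dist(parts[-1][p[-1]], parts[0][p[0]]))
                    for p, c in paths), key=lambda e: e[1])

    cams = [np.array([[0.0, 0.0], [5.0, 5.0]]),    # candidates seen by camera 1
            np.array([[0.2, 0.1], [4.0, 4.0]]),    # ... camera 2
            np.array([[0.1, -0.1], [6.0, 6.0]])]   # ... camera 3
    euclid = lambda a, b: float(np.linalg.norm(a - b))
    print(min_weight_cycle(cams, euclid, beam=2))
    ```

    A low-weight cycle corresponds to one physical player whose projections agree across all K cameras, which is how the fusion step resolves correspondences.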
  • Item
    Discontinuous Seam-Carving for Video Retargeting
    (Georgia Institute of Technology, 2010-06) Grundmann, Matthias ; Kwatra, Vivek ; Han, Mei ; Essa, Irfan
    We introduce a new algorithm for video retargeting that uses discontinuous seam-carving in both space and time for resizing videos. Our algorithm relies on a novel appearance-based temporal coherence formulation that allows for frame-by-frame processing and results in temporally discontinuous seams, as opposed to geometrically smooth and continuous seams. This formulation optimizes the difference in appearance between the retargeted frame and the optimal temporally coherent one, and allows for carving around fast-moving salient regions. Additionally, we generalize the idea of appearance-based coherence to the spatial domain by introducing piecewise spatial seams. Our spatial coherence measure minimizes the change in gradients during retargeting, which preserves spatial detail better than minimizing color difference alone. We also show that per-frame saliency (gradient-based or feature-based) does not always produce desirable retargeting results, and we propose a novel, automatically computed measure of spatio-temporal saliency. As needed, a user may also augment the saliency via interactive region brushing. Our retargeting algorithm processes the video sequentially, making it well suited for streaming applications.
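    A minimal dynamic-programming seam finder illustrating the per-frame processing; the temporal term here penalizes seam-position drift, a crude stand-in for the paper's appearance-based coherence measure, and the weights are illustrative.

    ```python
    import numpy as np

    def carve_one_seam(gray, prev_seam=None, temporal_w=0.5):
        """Find one vertical seam by dynamic programming. Cost is the
        horizontal gradient magnitude (spatial detail), optionally plus a
        penalty for drifting from the previous frame's seam; since each
        frame is solved independently, seams may be discontinuous in time.
        """
        H, W = gray.shape
        cost = np.abs(np.gradient(gray, axis=1))
        if prev_seam is not None:                       # crude temporal term
            cols = np.arange(W)
            cost += temporal_w * np.abs(cols[None, :] - prev_seam[:, None])
        M = cost.copy()                                 # cumulative seam cost
        for r in range(1, H):
            left = np.roll(M[r - 1], 1);   left[0] = np.inf
            right = np.roll(M[r - 1], -1); right[-1] = np.inf
            M[r] += np.minimum(np.minimum(left, M[r - 1]), right)
        seam = np.empty(H, dtype=int)                   # backtrack top to bottom
        seam[-1] = int(np.argmin(M[-1]))
        for r in range(H - 2, -1, -1):
            lo = max(seam[r + 1] - 1, 0); hi = min(seam[r + 1] + 2, W)
            seam[r] = lo + int(np.argmin(M[r, lo:hi]))
        return seam

    frame = np.random.default_rng(2).random((12, 16))
    s0 = carve_one_seam(frame)
    s1 = carve_one_seam(frame, prev_seam=s0)   # coherent with, not tied to, s0
    ```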