Person:
Essa, Irfan


Publication Search Results

Now showing 1 - 10 of 32
  • Item
    Selfie-Presentation in Everyday Life: A Large-scale Characterization of Selfie Contexts on Instagram
    (Georgia Institute of Technology, 2017) Deeb-Swihart, Julia ; Polack, Christopher ; Gilbert, Eric ; Essa, Irfan
    Carefully managing the presentation of self via technology is a core practice on all modern social media platforms. Recently, selfies have emerged as a new, pervasive genre of identity performance. In many ways unique, selfies bring us full-circle to Goffman—blending the online and offline selves together. In this paper, we take an empirical, Goffman-inspired look at the phenomenon of selfies. We report a large-scale, mixed-method analysis of the categories in which selfies appear on Instagram—an online community comprising over 400M people. Applying computer vision and network analysis techniques to 2.5M selfies, we present a typology of emergent selfie categories which represent emphasized identity statements. To the best of our knowledge, this is the first large-scale, empirical research on selfies. We conclude, contrary to common portrayals in the press, that selfies are really quite ordinary: they project identity signals such as wealth, health and physical attractiveness common to many online media, and to offline life.
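As a rough illustration of the kind of pipeline described above (visual features plus network analysis yielding emergent categories), the sketch below clusters images by building a visual-similarity graph and running community detection. The feature vectors, neighborhood size, and community-detection method are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np
import networkx as nx
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 128))  # stand-in for per-image visual descriptors

# Link each image to its k most visually similar neighbors.
k = 10
nn = NearestNeighbors(n_neighbors=k + 1).fit(features)
dists, idx = nn.kneighbors(features)

graph = nx.Graph()
for i, (drow, irow) in enumerate(zip(dists, idx)):
    for d, j in zip(drow[1:], irow[1:]):  # position 0 is the image itself
        graph.add_edge(i, int(j), weight=1.0 / (1e-6 + d))

# Communities in the similarity graph stand in for emergent selfie categories.
communities = nx.algorithms.community.greedy_modularity_communities(graph, weight="weight")
print(f"{len(communities)} candidate categories; largest holds {len(max(communities, key=len))} images")
```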
  • Item
    Towards Using Visual Attributes to Infer Image Sentiment Of Social Events
    (Georgia Institute of Technology, 2017) Ahsan, Unaiza ; De Choudhury, Munmun ; Essa, Irfan
    Widespread and pervasive adoption of smartphones has led to instant sharing of photographs that capture events ranging from mundane to life-altering happenings. We propose to capture sentiment information of such social event images leveraging their visual content. Our method extracts an intermediate visual representation of social event images based on the visual attributes that occur in the images going beyond sentiment-specific attributes. We map the top predicted attributes to sentiments and extract the dominant emotion associated with a picture of a social event. Unlike recent approaches, our method generalizes to a variety of social events and even to unseen events, which are not available at training time. We demonstrate the effectiveness of our approach on a challenging social event image dataset and our method outperforms state-of-the-art approaches for classifying complex event images into sentiments.
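A hypothetical sketch of the attribute-to-sentiment mapping step: the top predicted visual attributes for an image are looked up in an assumed attribute-sentiment lexicon and vote, weighted by confidence, for the image's dominant emotion. The attribute names, scores, and lexicon here are illustrative placeholders, not the paper's vocabulary or model.

```python
from collections import Counter

# Stand-in for an attribute classifier's output: (attribute, confidence).
predicted_attributes = [
    ("smiling", 0.92), ("crowd", 0.81), ("balloons", 0.74),
    ("dark", 0.30), ("crying", 0.10),
]

# Assumed attribute-to-sentiment lexicon (positive / negative / neutral).
attribute_sentiment = {
    "smiling": "positive", "balloons": "positive", "crowd": "neutral",
    "dark": "negative", "crying": "negative",
}

def dominant_sentiment(attrs, lexicon, top_k=3):
    """Map the top-k predicted attributes to sentiments and vote by confidence."""
    votes = Counter()
    for name, score in sorted(attrs, key=lambda a: a[1], reverse=True)[:top_k]:
        votes[lexicon.get(name, "neutral")] += score
    return votes.most_common(1)[0][0]

print(dominant_sentiment(predicted_attributes, attribute_sentiment))  # -> "positive"
```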
  • Item
    Leveraging Context to Support Automated Food Recognition in Restaurants
    (Georgia Institute of Technology, 2015-01) Bettadapura, Vinay ; Thomaz, Edison ; Parnami, Aman ; Abowd, Gregory D. ; Essa, Irfan
    The pervasiveness of mobile cameras has resulted in a dramatic increase in food photos, which are pictures reflecting what people eat. In this paper, we study how taking pictures of what we eat in restaurants can be used for the purpose of automating food journaling. We propose to leverage the context of where the picture was taken, with additional information about the restaurant, available online, coupled with state-of-the-art computer vision techniques to recognize the food being consumed. To this end, we demonstrate image-based recognition of foods eaten in restaurants by training a classifier with images from restaurants' online menu databases. We evaluate the performance of our system in unconstrained, real-world settings with food images taken in 10 restaurants across 5 different types of food (American, Indian, Italian, Mexican and Thai).
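A minimal sketch of the context-leveraging idea, under the assumption that it can be approximated by restricting a generic food classifier's predictions to dishes found on the menu of the restaurant where the photo was taken. The classifier scores, menu database, and restaurant names are hypothetical.

```python
def classify_with_menu_context(food_scores, restaurant, menu_db):
    """Keep only dishes that appear on this restaurant's online menu,
    then return the highest-scoring remaining dish."""
    menu = set(menu_db.get(restaurant, []))
    candidates = {dish: s for dish, s in food_scores.items() if dish in menu}
    if not candidates:  # fall back to the unconstrained prediction
        candidates = food_scores
    return max(candidates, key=candidates.get)

# Stand-in classifier output over a generic food vocabulary.
scores = {"pad thai": 0.41, "lasagna": 0.38, "green curry": 0.15, "tacos": 0.06}
menus = {"Thai Spice": ["pad thai", "green curry", "tom yum"]}

print(classify_with_menu_context(scores, "Thai Spice", menus))  # -> "pad thai"
```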
  • Item
    A Practical Approach for Recognizing Eating Moments With Wrist-Mounted Inertial Sensing
    (Georgia Institute of Technology, 2015) Thomaz, Edison ; Essa, Irfan ; Abowd, Gregory D.
    Recognizing when eating activities take place is one of the key challenges in automated food intake monitoring. Despite progress over the years, most proposed approaches have been largely impractical for everyday usage, requiring multiple on-body sensors or specialized devices such as neck collars for swallow detection. In this paper, we describe the implementation and evaluation of an approach for inferring eating moments based on 3-axis accelerometry collected with a popular off-the-shelf smartwatch. Trained with data collected in a semi-controlled laboratory setting with 20 subjects, our system recognized eating moments in two free-living condition studies (7 participants, 1 day; 1 participant, 31 days), with F-scores of 76.1% (66.7% precision, 88.8% recall) and 71.3% (65.2% precision, 78.6% recall). This work represents a contribution towards the implementation of a practical, automated system for everyday food intake monitoring, with applicability in areas ranging from health research to food journaling.
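A simplified sketch of an eating-moment pipeline along the lines described above: 3-axis accelerometer data is split into fixed windows, per-axis statistics are computed, and a classifier labels each window. The window length, features, classifier choice, and synthetic data are assumptions, not the paper's exact configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

rng = np.random.default_rng(1)

def window_features(signal, window=50):
    """signal: (n_samples, 3) accelerometer array -> per-window mean, std and range."""
    feats = []
    for start in range(0, len(signal) - window + 1, window):
        w = signal[start:start + window]
        feats.append(np.concatenate([w.mean(axis=0), w.std(axis=0), np.ptp(w, axis=0)]))
    return np.array(feats)

# Synthetic stand-in data: 'eating' windows follow a slightly different motion profile.
non_eating = rng.normal(0.0, 1.0, size=(5000, 3))
eating = rng.normal(0.5, 1.5, size=(5000, 3))
X = np.vstack([window_features(non_eating), window_features(eating)])
y = np.array([0] * 100 + [1] * 100)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print("training F-score:", round(f1_score(y, clf.predict(X)), 3))
```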
  • Item
    Inferring Meal Eating Activities in Real World Settings from Ambient Sounds: A Feasibility Study
    (Georgia Institute of Technology, 2015) Thomaz, Edison ; Zhang, Cheng ; Essa, Irfan ; Abowd, Gregory D.
    Dietary self-monitoring has been shown to be an effective method for weight loss, but it remains an onerous task despite recent advances in food journaling systems. Semi-automated food journaling can reduce the effort of logging, but often requires that eating activities be detected automatically. In this work we describe results from a feasibility study conducted in the wild, where eating activities were inferred from ambient sounds captured with a wrist-mounted device; twenty participants wore the device during one day for an average of 5 hours while performing normal everyday activities. Our system was able to identify meal eating with an F-score of 79.8% in a person-dependent evaluation, and with 86.6% accuracy in a person-independent evaluation. Our approach is intended to be practical, leveraging off-the-shelf devices with audio sensing capabilities, in contrast to systems for automated dietary assessment based on specialized sensors.
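The abstract distinguishes person-dependent from person-independent evaluation; the sketch below shows how a person-independent (leave-one-subject-out) protocol might be set up, with synthetic stand-ins for the ambient-audio frame features and labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(2)
n_frames, n_subjects = 600, 20
X = rng.normal(size=(n_frames, 16))                  # stand-in audio features per frame
y = rng.integers(0, 2, size=n_frames)                # 1 = eating frame, 0 = other
groups = rng.integers(0, n_subjects, size=n_frames)  # which participant each frame came from

clf = LogisticRegression(max_iter=1000)

# Person-independent protocol: hold out all frames of one participant at a time.
scores = cross_val_score(clf, X, y, groups=groups, cv=LeaveOneGroupOut())
print("person-independent mean accuracy:", round(scores.mean(), 3))
# A person-dependent protocol would instead split train/test within each participant.
```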
  • Item
    Egocentric Field-of-View Localization Using First-Person Point-of-View Devices
    (Georgia Institute of Technology, 2015-01) Bettadapura, Vinay ; Essa, Irfan ; Pantofaru, Caroline
    We present a technique that uses images, videos and sensor data taken from first-person point-of-view devices to perform egocentric field-of-view (FOV) localization. We define egocentric FOV localization as capturing the visual information from a person’s field of view in a given environment and transferring this information onto a reference corpus of images and videos of the same space, hence determining what a person is attending to. Our method matches images and video taken from the first-person perspective with the reference corpus and refines the results using the wearer’s head-orientation information obtained from the device sensors. We demonstrate single- and multi-user egocentric FOV localization in different indoor and outdoor environments, with applications in augmented reality, event understanding and studying social interactions.
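An illustrative sketch of the match-then-refine idea: reference images are first ranked by visual similarity and then re-ranked by how well the wearer's sensor-reported head orientation agrees with each reference view's known heading. The similarity scores, headings, and weighting scheme are hypothetical.

```python
def angular_diff(a, b):
    """Smallest absolute difference between two compass headings, in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def rerank(visual_matches, device_heading, alpha=0.5):
    """visual_matches: list of (reference_id, visual_similarity, reference_heading)."""
    scored = []
    for ref_id, sim, ref_heading in visual_matches:
        orientation_score = 1.0 - angular_diff(device_heading, ref_heading) / 180.0
        scored.append((alpha * sim + (1 - alpha) * orientation_score, ref_id))
    return [ref_id for _, ref_id in sorted(scored, reverse=True)]

# Hypothetical candidate matches from the reference corpus.
matches = [("view_A", 0.81, 95.0), ("view_B", 0.78, 10.0), ("view_C", 0.60, 100.0)]
print(rerank(matches, device_heading=100.0))  # head orientation separates the near-tied views
```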
  • Item
    Automated Assessment of Surgical Skills Using Frequency Analysis
    (Georgia Institute of Technology, 2015) Zia, Aneeq ; Sharma, Yachna ; Bettadapura, Vinay ; Sarin, Eric L. ; Clements, Mark A. ; Essa, Irfan
    We present an automated framework for visual assessment of the expertise level of surgeons using the OSATS (Objective Structured Assessment of Technical Skills) criteria. Video analysis techniques for extracting motion quality via frequency coefficients are introduced. The framework is tested on videos of medical students with different expertise levels performing basic surgical tasks in a surgical training lab setting. We demonstrate that transforming the sequential time data into frequency components effectively extracts the useful information differentiating between different skill levels of the surgeons. The results show significant performance improvements using DFT and DCT coefficients over known state-of-the-art techniques.
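A minimal sketch of the frequency-analysis idea: a motion time series is transformed into DCT coefficients, and the distribution of energy across coefficients serves as a compact descriptor of motion quality. The signals, coefficient count, and band split are illustrative assumptions, not the paper's exact features.

```python
import numpy as np
from scipy.fft import dct

def frequency_descriptor(trajectory, n_coeffs=20):
    """1-D motion signal -> its first n_coeffs DCT coefficients."""
    return dct(np.asarray(trajectory, dtype=float), norm="ortho")[:n_coeffs]

t = np.linspace(0, 10, 500)
smooth_motion = np.sin(0.8 * t)                       # expert-like, low-frequency movement
jerky_motion = np.sin(0.8 * t) + 0.3 * np.sin(4 * t)  # novice-like, extra higher-frequency energy

for name, motion in [("smooth", smooth_motion), ("jerky", jerky_motion)]:
    d = frequency_descriptor(motion)
    print(name, "high-band energy:", round(float(np.abs(d[8:]).sum()), 2))
```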
  • Item
    Predicting Daily Activities From Egocentric Images Using Deep Learning
    (Georgia Institute of Technology, 2015) Castro, Daniel ; Hickson, Steven ; Bettadapura, Vinay ; Thomaz, Edison ; Abowd, Gregory D. ; Christensen, Henrik I. ; Essa, Irfan
    We present a method to analyze images taken from a passive egocentric wearable camera, along with contextual information such as time and day of week, to learn and predict the everyday activities of an individual. We collected a dataset of 40,103 egocentric images over a 6-month period with 19 activity classes and demonstrate the benefit of state-of-the-art deep learning techniques for learning and predicting daily activities. Classification is conducted using a Convolutional Neural Network (CNN) with a classification method we introduce called a late fusion ensemble. This late fusion ensemble incorporates relevant contextual information and increases our classification accuracy. Our technique achieves an overall accuracy of 83.07% in predicting a person's activity across the 19 activity classes. We also demonstrate promising results from two additional users by fine-tuning the classifier with one day of training data.
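A schematic sketch of the late-fusion idea: the image CNN and a simple contextual model (e.g., hour of day, day of week) each produce class probabilities, which are combined only after both predictions are made. The class set, probabilities, and fusion weight are hypothetical; the paper's actual ensemble may combine the models differently.

```python
import numpy as np

classes = ["working", "eating", "commuting"]

# Stand-ins for the two models' per-class probabilities for one image.
cnn_probs     = np.array([0.50, 0.30, 0.20])   # from pixels
context_probs = np.array([0.20, 0.65, 0.15])   # from hour-of-day / day-of-week features

def late_fusion(p_image, p_context, w=0.6):
    """Weighted average of the two models' outputs, renormalized."""
    fused = w * p_image + (1 - w) * p_context
    return fused / fused.sum()

fused = late_fusion(cnn_probs, context_probs)
print(classes[int(np.argmax(fused))], fused.round(3))
```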
  • Item
    Efficient Hierarchical Graph-Based Segmentation of RGBD Videos
    (Georgia Institute of Technology, 2014-06) Hickson, Steven ; Birchfield, Stan ; Essa, Irfan ; Christensen, Henrik I.
    We present an efficient and scalable algorithm for segmenting 3D RGBD point clouds by combining depth, color, and temporal information using a multistage, hierarchical graph-based approach. Our algorithm processes a moving window over several point clouds to group similar regions over a graph, resulting in an initial over-segmentation. These regions are then merged to yield a dendrogram using agglomerative clustering via a minimum spanning tree algorithm. Bipartite graph matching at a given level of the hierarchical tree yields the final segmentation of the point clouds by maintaining region identities over arbitrarily long periods of time. We show that a multistage segmentation with depth then color yields better results than a linear combination of depth and color. Due to its incremental processing, our algorithm can process videos of any length and in a streaming pipeline. The algorithm’s ability to produce robust, efficient segmentation is demonstrated with numerous experimental results on challenging sequences from our own as well as public RGBD data sets.
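A rough sketch of the agglomerative-merging stage: over-segmented regions, each summarized by mean depth and color, are merged by single-linkage clustering (the minimum-spanning-tree view of agglomeration) and the resulting dendrogram is cut at a chosen level. The region features and cut threshold are illustrative, and the sketch omits the temporal and bipartite-matching stages.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(3)

# Stand-in over-segmentation: 30 regions, each described by (mean depth, mean R, G, B).
regions = np.vstack([
    rng.normal([1.0, 0.2, 0.2, 0.7], 0.05, size=(10, 4)),  # e.g., a blue object up close
    rng.normal([3.0, 0.6, 0.6, 0.6], 0.05, size=(10, 4)),  # grey background farther away
    rng.normal([1.2, 0.8, 0.1, 0.1], 0.05, size=(10, 4)),  # a red object up close
])

# Single linkage merges along minimum-spanning-tree edges, giving a dendrogram.
dendrogram = linkage(regions, method="single")
labels = fcluster(dendrogram, t=0.5, criterion="distance")
print("segments found:", len(set(labels)))
```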
  • Item
    Clustering Social Event Images using Kernel Canonical Correlation Analysis
    (Georgia Institute of Technology, 2014-06) Ahsan, Unaiza ; Essa, Irfan
    Sharing user experiences in the form of photographs, tweets, text, audio and/or video has become commonplace on social networking websites. Browsing through large collections of social multimedia remains a cumbersome task. It requires a user to initiate a textual search query and manually go through a list of resulting images to find relevant information. We propose an automatic clustering algorithm which, given a large collection of images, groups them into clusters of different events using the image features and related metadata. We formulate this problem as a kernel canonical correlation clustering problem, in which data samples from different modalities or ‘views’ are projected to a space where correlations between the samples’ projections are maximized. Our approach enables us to learn a semantic representation of potentially uncorrelated feature sets, and this representation is clustered to give unique social events. Furthermore, we leverage the rich information associated with each uploaded image (such as usernames, dates/timestamps, etc.) and empirically determine which combination of feature sets yields the best clustering score for a dataset of 100,000 images.
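A simplified sketch of the two-view clustering idea: image features and metadata features are projected into a shared correlated space and clustered there to recover events. For brevity this uses scikit-learn's linear CCA as a stand-in for kernel CCA; the synthetic features, dimensions, and cluster count are assumptions.

```python
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
n_images, n_events = 300, 5

# Synthetic stand-ins: each event induces correlated structure in both views.
event_ids = rng.integers(0, n_events, size=n_images)
event_effect_visual = rng.normal(size=(n_events, 64)) * 3
event_effect_meta = rng.normal(size=(n_events, 16)) * 3
visual_view = rng.normal(size=(n_images, 64)) + event_effect_visual[event_ids]
metadata_view = rng.normal(size=(n_images, 16)) + event_effect_meta[event_ids]

# Project both views into a shared space where their correlation is maximized.
cca = CCA(n_components=4)
z_visual, z_meta = cca.fit_transform(visual_view, metadata_view)
shared = np.hstack([z_visual, z_meta])

# Clusters in the shared space play the role of recovered social events.
clusters = KMeans(n_clusters=n_events, n_init=10, random_state=0).fit_predict(shared)
print("cluster sizes:", np.bincount(clusters))
```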