Title:
Learning descriptive models of objects and activities from egocentric video

dc.contributor.advisor Rehg, James M.
dc.contributor.author Fathi, Alireza
dc.contributor.committeeMember Bobick, Aaron
dc.contributor.committeeMember Abowd, Gregory D.
dc.contributor.committeeMember Starner, Thad
dc.contributor.committeeMember Hebert, Martial
dc.contributor.committeeMember Torralba, Antonio
dc.contributor.department Computer Science
dc.date.accessioned 2013-08-29T14:12:06Z
dc.date.available 2013-08-29T14:12:06Z
dc.date.created 2013-08
dc.date.issued 2013-06-13
dc.date.submitted August 2013
dc.date.updated 2013-08-29T14:12:07Z
dc.description.abstract Recent advances in camera technology have made it possible to build a comfortable, wearable system that can capture the scene in front of the user throughout the day. Products based on this technology, such as GoPro and Google Glass, have generated substantial interest. In this thesis, I present my work on egocentric vision, which leverages wearable camera technology to provide a new line of attack on classical computer vision problems such as object categorization and activity recognition. The dominant paradigm for object and activity recognition over the last decade has been based on web data. In this paradigm, to learn a model for an object category such as "coffee jar," images of that object are fetched from the web (e.g., through Google image search), features are extracted, and classifiers are learned. This paradigm has led to great advances in the field and has produced state-of-the-art results for object recognition. However, it has two main shortcomings: (a) objects on the web appear in isolation, stripped of the context of daily use; and (b) web data is not representative of what we see every day. In this thesis, I demonstrate that egocentric vision offers an alternative paradigm that addresses these limitations. I show that contextual cues and the actions of a user can be exploited in an egocentric vision system to learn models of objects under very weak supervision. In addition, I show that measurements of a subject's gaze during object manipulation tasks can provide novel feature representations to support activity recognition. Moving beyond surface-level categorization, I present a method for automatically discovering object state changes during actions and an approach to building descriptive models of social interactions between groups of individuals. These new capabilities for egocentric video analysis will enable new applications in life logging, elder care, human-robot interaction, developmental screening, augmented reality, and social media.
dc.description.degree Ph.D.
dc.format.mimetype application/pdf
dc.identifier.uri http://hdl.handle.net/1853/48738
dc.language.iso en_US
dc.publisher Georgia Institute of Technology
dc.subject Gaze
dc.subject Segmentation
dc.subject Egocentric vision
dc.subject Activity recognition
dc.subject Object recognition
dc.subject Attentional cues
dc.subject First-person vision
dc.subject Descriptive models
dc.subject Weakly supervised learning
dc.subject Social interactions
dc.subject Human object interaction
dc.subject Wearable camera
dc.subject.lcsh Computer vision
dc.subject.lcsh Wearable video devices
dc.title Learning descriptive models of objects and activities from egocentric video
dc.type Text
dc.type.genre Dissertation
dspace.entity.type Publication
local.contributor.advisor Rehg, James M.
local.contributor.corporatename College of Computing
local.contributor.corporatename School of Computer Science
local.contributor.corporatename Institute for Robotics and Intelligent Machines (IRIM)
relation.isAdvisorOfPublication af5b46ec-ffe2-4ce4-8722-1373c9b74a37
relation.isOrgUnitOfPublication c8892b3c-8db6-4b7b-a33a-1b67f7db2021
relation.isOrgUnitOfPublication 6b42174a-e0e1-40e3-a581-47bed0470a1e
relation.isOrgUnitOfPublication 66259949-abfd-45c2-9dcc-5a6f2c013bcf
thesis.degree.level Doctoral
Files
Original bundle
Name: FATHI-DISSERTATION-2013.pdf
Size: 41.51 MB
Format: Adobe Portable Document Format
License bundle
Name: LICENSE_1.txt
Size: 3.87 KB
Format: Plain Text
Name: LICENSE.txt
Size: 3.74 KB
Format: Plain Text