Weakly Supervised Learning of Object Segmentations from Web-Scale Video

dc.contributor.author Hartmann, Glenn
dc.contributor.author Grundmann, Matthias
dc.contributor.author Hoffman, Judy
dc.contributor.author Tsai, David
dc.contributor.author Kwatra, Vivek
dc.contributor.author Madani, Omid
dc.contributor.author Vijayanarasimhan, Sudheendra
dc.contributor.author Essa, Irfan
dc.contributor.author Rehg, James M.
dc.contributor.author Sukthankar, Rahul
dc.contributor.corporatename Georgia Institute of Technology. College of Computing en_US
dc.contributor.corporatename Georgia Institute of Technology. School of Interactive Computing en_US
dc.contributor.corporatename Georgia Institute of Technology. Center for Robotics and Intelligent Machines en_US
dc.contributor.corporatename University of California, Berkeley en_US
dc.contributor.corporatename Google Research en_US
dc.date.accessioned 2013-08-28T16:10:23Z
dc.date.available 2013-08-28T16:10:23Z
dc.date.issued 2012-10
dc.description ©2012 Springer-Verlag Berlin Heidelberg. The original publication is available at www.springerlink.com en_US
dc.description DOI: 10.1007/978-3-642-33863-2_20
dc.description.abstract We propose to learn pixel-level segmentations of objects from weakly labeled (tagged) internet videos. Specifically, given a large collection of raw YouTube content, along with potentially noisy tags, our goal is to automatically generate spatiotemporal masks for each object, such as "dog", without employing any pre-trained object detectors. We formulate this problem as learning weakly supervised classifiers for a set of independent spatiotemporal segments. The object seeds obtained using segment-level classifiers are further refined using graph cuts to generate high-precision object masks. Our results, obtained by training on a dataset of 20,000 YouTube videos weakly tagged into 15 classes, demonstrate automatic extraction of pixel-level object masks. Evaluated against a ground-truthed subset of 50,000 frames with pixel-level annotations, we confirm that our proposed methods can learn good object masks just by watching YouTube. en_US
dc.embargo.terms null en_US
dc.identifier.citation Hartmann, G.; Grundmann, M.; Hoffman, J.; Tsai, D.; Kwatra, V.; Madani, O.; Vijayanarasimhan, S.; Essa, I.A.; Rehg, J.M.; & Sukthankar, R. (2012). “Weakly Supervised Learning of Object Segmentations from Web-Scale Video”. Computer Vision – ECCV 2012. Workshops and Demonstrations 7-13 October 2012. Proceedings, Part I. In Lecture Notes in Computer Science, 2012, Vol. 7583, pp. 198-208. en_US
dc.identifier.doi 10.1007/978-3-642-33863-2_20
dc.identifier.isbn 978-3-642-33862-5 (Print)
dc.identifier.isbn 978-3-642-33863-2 (Online)
dc.identifier.issn 0302-9743
dc.identifier.uri http://hdl.handle.net/1853/48736
dc.language.iso en_US en_US
dc.publisher Georgia Institute of Technology en_US
dc.publisher.original Springer-Verlag Berlin / Heidelberg
dc.subject Object masks en_US
dc.subject Spatiotemporal segmentation en_US
dc.subject Video segmentation en_US
dc.subject Video stabilization en_US
dc.title Weakly Supervised Learning of Object Segmentations from Web-Scale Video en_US
dc.type Text
dc.type.genre Book Chapter
dc.type.genre Proceedings
dspace.entity.type Publication
local.contributor.author Essa, Irfan
local.contributor.author Rehg, James M.
local.contributor.author Hoffman, Judy
local.contributor.corporatename Institute for Robotics and Intelligent Machines (IRIM)
relation.isAuthorOfPublication 84ae0044-6f5b-4733-8388-4f6427a0f817
relation.isAuthorOfPublication af5b46ec-ffe2-4ce4-8722-1373c9b74a37
relation.isAuthorOfPublication 403cff3c-8f25-4db5-978b-ef617a9f8b6a
relation.isOrgUnitOfPublication 66259949-abfd-45c2-9dcc-5a6f2c013bcf