Title:
Words, Pictures, and Common Sense

dc.contributor.author Parikh, Devi
dc.contributor.corporatename Georgia Institute of Technology. Institute for Robotics and Intelligent Machines en_US
dc.contributor.corporatename Virginia Polytechnic Institute and State University. Dept. of Electrical and Computer Engineering en_US
dc.date.accessioned 2015-12-08T20:47:38Z
dc.date.available 2015-12-08T20:47:38Z
dc.date.issued 2015-12-01
dc.description Presented on December 1, 2015 at 12:00 p.m. in the TSRB Banquet Hall. en_US
dc.description Devi Parikh is an assistant professor in the Bradley Department of Electrical and Computer Engineering at Virginia Tech (VT) and an Allen Distinguished Investigator of Artificial Intelligence. Parikh’s research interests include computer vision, pattern recognition, and AI, particularly visual recognition problems.
dc.description Runtime: 52:45 minutes
dc.description.abstract As computer vision and natural language processing techniques mature, there is heightened activity in exploring the connection between images and language. In this talk, I will present several recent and ongoing projects in my lab that take a new perspective on problems like automatic image captioning, which are receiving a lot of attention lately. In particular, I will start by describing a new methodology for evaluating image-captioning approaches. I will then discuss image specificity, a concept capturing the phenomenon that some images are specific and elicit consistent descriptions from people, while other images are ambiguous and elicit a wider variety of descriptions from different people. Rather than treat this variance as noise, we model it as a signal. We demonstrate that modeling image specificity improves performance in applications such as text-based image retrieval. I will then talk about our work on leveraging visual common sense for seemingly non-visual tasks such as textual fill-in-the-blanks or paraphrasing. We propose imagining the scene behind the text to solve these problems. The imagination need not be photorealistic, so we imagine the scene as a visual abstraction using clipart. We show that jointly reasoning about the imagined scene and the text yields better performance on these textual tasks than reasoning about the text alone. Finally, I will introduce a new task that pushes the understanding of language and vision beyond automatic image captioning: visual question answering (VQA). Not only does it involve computer vision and natural language processing, but doing well at this task will also require the machine to reason about visual and non-visual common sense, as well as factual knowledge bases. More importantly, it will require the machine to know when to tap which source of information.
I will describe our ongoing efforts at collecting a first-of-its-kind, large VQA dataset that will enable the community to explore this rich, challenging, and fascinating task, which pushes the frontier towards truly AI-complete problems. en_US
dc.embargo.terms null en_US
dc.format.extent 52:45 minutes
dc.identifier.uri http://hdl.handle.net/1853/54216
dc.relation.ispartofseries IRIM Seminar Series
dc.subject Automatic image captioning en_US
dc.subject Image modeling en_US
dc.subject Machine learning en_US
dc.title Words, Pictures, and Common Sense en_US
dc.type Moving Image
dc.type.genre Lecture
dspace.entity.type Publication
local.contributor.author Parikh, Devi
local.contributor.corporatename Institute for Robotics and Intelligent Machines (IRIM)
local.relation.ispartofseries IRIM Seminar Series
relation.isAuthorOfPublication 2b8bc15b-448f-472b-8992-ca9862368cad
relation.isOrgUnitOfPublication 66259949-abfd-45c2-9dcc-5a6f2c013bcf
relation.isSeriesOfPublication 9bcc24f0-cb07-4df8-9acb-94b7b80c1e46
Files
Original bundle
Name:
parikh.mp4
Size:
400.48 MB
Format:
MP4 Video file
Description:
Download video
Name:
parikh_videostream.html
Size:
985 B
Format:
Hypertext Markup Language
Description:
Streaming Video
Name:
Transcription.txt
Size:
51.8 KB
Format:
Plain Text
Description:
Transcription
License bundle
Name:
license.txt
Size:
3.13 KB
Format:
Item-specific license agreed upon to submission