Parikh, Devi

Publication Search Results

  • Item
    Words, Pictures, and Common Sense
    (2015-12-01) Parikh, Devi
    As computer vision and natural language processing techniques mature, there is heightened activity in exploring the connection between images and language. In this talk, I will present several recent and ongoing projects in my lab that take a new perspective on problems like automatic image captioning, which are receiving a lot of attention lately. In particular, I will start by describing a new methodology for evaluating image-captioning approaches. I will then discuss image specificity, a concept capturing the phenomenon that some images are specific and elicit consistent descriptions from people, while other images are ambiguous and elicit a wider variety of descriptions from different people. Rather than treat this variance as noise, we model it as a signal. We demonstrate that modeling image specificity improves performance in applications such as text-based image retrieval. I will then talk about our work on leveraging visual common sense for seemingly non-visual tasks such as textual fill-in-the-blanks or paraphrasing. We propose imagining the scene behind the text to solve these problems. The imagination need not be photorealistic, so we imagine the scene as a visual abstraction using clipart. We show that jointly reasoning about the imagined scene and the text yields better performance on these textual tasks than reasoning about the text alone. Finally, I will introduce a new task that pushes the understanding of language and vision beyond automatic image captioning: visual question answering (VQA). Not only does it involve computer vision and natural language processing, but doing well at this task will also require the machine to reason about visual and non-visual common sense, as well as factual knowledge bases. More importantly, it will require the machine to know when to tap which source of information. I will describe our ongoing efforts at collecting a first-of-its-kind, large VQA dataset that will enable the community to explore this rich, challenging, and fascinating task, which pushes the frontier towards truly AI-complete problems.
  • Item
    Improving the quality of speech in noisy environments
    (Georgia Institute of Technology, 2012-11-06) Parikh, Devi
    In this thesis, we are interested in processing noisy speech signals that are meant to be heard by humans, and hence we approach the noise-suppression problem from a perceptual perspective. We develop a noise-suppression paradigm that is based on a model of the human auditory system, where we process signals in a way that is natural to the human ear. Under this paradigm, we transform an audio signal into a perceptual domain and process the signal in this perceptual domain. This approach allows us to reduce the background noise and the audible artifacts that arise in traditional noise-suppression algorithms, while preserving the quality of the processed speech. We develop single- and dual-microphone algorithms based on this perceptual paradigm, and conduct subjective tests to show that this approach outperforms traditional noise-suppression techniques. Moreover, we investigate the cause of audible artifacts that are generated as a result of suppressing the noise in noisy signals, and introduce constraints on the noise-suppression gain such that these artifacts are reduced.