Title:
Learning from Multi-Source Weak Supervision for Neural Text Classification

Author(s)
Ren, Wendi
Advisor(s)
Zhang, Chao
Abstract
Text classification is a fundamental text mining task with numerous real-life applications. While deep neural networks achieve superior performance for text classification, they rely on large-scale labeled data, which can be prohibitively expensive to obtain in many applications. In this project, we study the problem of learning neural text classifiers without any labeled data, using only easy-to-provide heuristic rules as weak supervision. This problem is challenging because rule-induced weak labels are often noisy and incomplete. To address these challenges, we propose a model that learns from multiple weak supervision sources with two key components. The first component is a rule denoiser, which estimates conditional source reliability using a soft attention mechanism and reduces label noise by aggregating rule-induced noisy labels. The second is a neural classifier that predicts soft labels for samples unmatched by any rule, addressing the rule coverage issue. The two components are integrated into a co-training framework and trained end-to-end so that they mutually enhance each other. We evaluate our model on five benchmarks spanning four popular text classification tasks: sentiment analysis, topic classification, spam classification, and relation extraction. The results show that our model outperforms state-of-the-art weakly supervised and semi-supervised methods, and achieves performance comparable to fully supervised methods even without any labeled data.
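
To make the abstract's two components concrete, the sketch below shows one way the soft-attention rule denoiser and the co-training objective could be wired together in PyTorch. It is a minimal illustration assuming one-hot rule votes and a linear attention layer; the class names, tensor shapes, loss form, and the unmatched_weight factor are hypothetical and not drawn from the thesis's actual implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RuleDenoiser(nn.Module):
    # Scores each weak source per sample (conditional source reliability)
    # and aggregates the sources' noisy label votes into denoised soft labels.
    def __init__(self, feat_dim, num_sources):
        super().__init__()
        self.attn = nn.Linear(feat_dim, num_sources)

    def forward(self, features, weak_labels, coverage_mask):
        # features:      (batch, feat_dim) sample representations
        # weak_labels:   (batch, num_sources, num_classes) one-hot rule votes,
        #                zeroed where a rule does not fire
        # coverage_mask: (batch, num_sources), 1 if the source matches the sample
        scores = self.attn(features)
        scores = scores.masked_fill(coverage_mask == 0, float("-inf"))
        weights = F.softmax(scores, dim=-1)  # soft attention over sources
        return torch.einsum("bs,bsc->bc", weights, weak_labels)

def co_training_step(denoiser, classifier, features, weak_labels,
                     coverage_mask, optimizer, unmatched_weight=0.5):
    # One joint update: the classifier fits the denoiser's aggregated labels on
    # rule-matched samples and supplies its own soft predictions on unmatched ones.
    logits = classifier(features)
    log_probs = F.log_softmax(logits, dim=-1)
    matched = coverage_mask.bool().any(dim=1)
    loss = logits.new_zeros(())
    if matched.any():
        soft = denoiser(features[matched], weak_labels[matched],
                        coverage_mask[matched])
        # The soft targets are not detached, so this loss also updates the
        # attention weights: the classifier's predictions inform which sources
        # look reliable (mutual enhancement).
        loss = loss - (soft * log_probs[matched]).sum(-1).mean()
    if (~matched).any():
        # Unmatched samples: the classifier's own (detached) predictions act as
        # soft labels, extending supervision beyond rule coverage.
        probs = F.softmax(logits[~matched], dim=-1).detach()
        loss = loss - unmatched_weight * (probs * log_probs[~matched]).sum(-1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy run: 8 samples, 3 weak sources, 2 classes, 16-dim features (all hypothetical).
torch.manual_seed(0)
B, S, C, D = 8, 3, 2, 16
features = torch.randn(B, D)
coverage = (torch.rand(B, S) > 0.5).float()
votes = F.one_hot(torch.randint(0, C, (B, S)), C).float() * coverage.unsqueeze(-1)
classifier = nn.Sequential(nn.Linear(D, 64), nn.ReLU(), nn.Linear(64, C))
denoiser = RuleDenoiser(D, S)
opt = torch.optim.Adam(list(denoiser.parameters()) + list(classifier.parameters()), lr=1e-3)
print(co_training_step(denoiser, classifier, features, votes, coverage, opt))

The key design choice this sketch tries to capture is the end-to-end coupling: a single backward pass updates both the classifier and the denoiser's attention, which is what allows the two components to reinforce each other during co-training.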
Date Issued
2020-07-28
Resource Type
Text
Resource Subtype
Thesis