Title:
Learning from Multi-Source Weak Supervision for Neural Text Classification

Author(s)
Ren, Wendi
Advisor(s)
Zhang, Chao
Abstract
Text classification is a fundamental text mining task with numerous real-life applications. While deep neural networks achieve superior performance for text classification, they rely on large-scale labeled data, which can be prohibitively expensive to obtain in many applications. In this project, we study the problem of learning neural text classifiers without any labeled data, using only easy-to-provide heuristic rules as weak supervision. This problem is challenging because rule-induced weak labels are often noisy and incomplete. To address these challenges, we propose a model that learns from multiple weak supervision sources with two key components. The first component is a rule denoiser, which estimates conditional source reliability using a soft attention mechanism and reduces label noise by aggregating rule-induced noisy labels. The second is a neural classifier that predicts soft labels for samples unmatched by any rule, addressing the rule coverage issue. The two components are integrated into a co-training framework and trained end-to-end so that they mutually enhance each other. We evaluate our model on five benchmarks spanning four popular text classification tasks: sentiment analysis, topic classification, spam classification, and relation extraction. The results show that our model outperforms state-of-the-art weakly supervised and semi-supervised methods, and achieves performance comparable to fully supervised methods even without any labeled data.
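
To make the abstract's two components concrete, the sketch below shows one way the soft-attention rule denoiser and the co-training objective could be wired together in PyTorch. It is a minimal illustration assuming one-hot rule votes and a linear attention layer; the class names, tensor shapes, loss form, and the unmatched_weight factor are hypothetical and not drawn from the thesis's actual implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RuleDenoiser(nn.Module):
    # Scores each weak source per sample (conditional source reliability)
    # and aggregates the sources' noisy label votes into denoised soft labels.
    def __init__(self, feat_dim, num_sources):
        super().__init__()
        self.attn = nn.Linear(feat_dim, num_sources)

    def forward(self, features, weak_labels, coverage_mask):
        # features:      (batch, feat_dim) sample representations
        # weak_labels:   (batch, num_sources, num_classes) one-hot rule votes,
        #                zeroed where a rule does not fire
        # coverage_mask: (batch, num_sources), 1 if the source matches the sample
        scores = self.attn(features)
        scores = scores.masked_fill(coverage_mask == 0, float("-inf"))
        weights = F.softmax(scores, dim=-1)  # soft attention over sources
        return torch.einsum("bs,bsc->bc", weights, weak_labels)

def co_training_step(denoiser, classifier, features, weak_labels,
                     coverage_mask, optimizer, unmatched_weight=0.5):
    # One joint update: the classifier fits the denoiser's aggregated labels on
    # rule-matched samples and supplies its own soft predictions on unmatched ones.
    logits = classifier(features)
    log_probs = F.log_softmax(logits, dim=-1)
    matched = coverage_mask.bool().any(dim=1)
    loss = logits.new_zeros(())
    if matched.any():
        soft = denoiser(features[matched], weak_labels[matched],
                        coverage_mask[matched])
        # The soft targets are not detached, so this loss also updates the
        # attention weights: the classifier's predictions inform which sources
        # look reliable (mutual enhancement).
        loss = loss - (soft * log_probs[matched]).sum(-1).mean()
    if (~matched).any():
        # Unmatched samples: the classifier's own (detached) predictions act as
        # soft labels, extending supervision beyond rule coverage.
        probs = F.softmax(logits[~matched], dim=-1).detach()
        loss = loss - unmatched_weight * (probs * log_probs[~matched]).sum(-1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy run: 8 samples, 3 weak sources, 2 classes, 16-dim features (all hypothetical).
torch.manual_seed(0)
B, S, C, D = 8, 3, 2, 16
features = torch.randn(B, D)
coverage = (torch.rand(B, S) > 0.5).float()
votes = F.one_hot(torch.randint(0, C, (B, S)), C).float() * coverage.unsqueeze(-1)
classifier = nn.Sequential(nn.Linear(D, 64), nn.ReLU(), nn.Linear(64, C))
denoiser = RuleDenoiser(D, S)
opt = torch.optim.Adam(list(denoiser.parameters()) + list(classifier.parameters()), lr=1e-3)
print(co_training_step(denoiser, classifier, features, votes, coverage, opt))

The key design choice this sketch tries to capture is the end-to-end coupling: a single backward pass updates both the classifier and the denoiser's attention, which is what allows the two components to reinforce each other during co-training.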
Date Issued
2020-07-28
Resource Type
Text
Resource Subtype
Thesis