Organizational Unit:
School of Computational Science and Engineering

Research Organization Registry ID
Description
Previous Names
Parent Organization
Parent Organization
Organizational Unit
Includes Organization(s)

Publication Search Results

Now showing 1 - 1 of 1
  • Item
    Learning from Multi-Source Weak Supervision for Neural Text Classification
    (Georgia Institute of Technology, 2020-07-28) Ren, Wendi
    Text classification is a fundamental text mining task with numerous real-life applications. While deep neural nets have achieved superior performance for text classification, they rely on large-scale labeled data to achieve strong performance. Obtaining large-scale labeled data, however, can be prohibitively expensive in many applications. In this project, we study the problem of learning neural text classifiers without using any labeled data, but only easy-to-provide heuristic rules as weak supervision. This problem is challenging because rule-induced weak labels are often noisy and incomplete. To address these challenges, we propose a model that can be learned from multiple weak supervision sources with two key components. The first component is a rule denoiser, which estimates conditional source reliability using a soft attention mechanism and reduces label noise by aggregating rule- induced noisy data. The second is a neural classifier that predicts soft labels for unmatchable samples to address the rule coverage issue. The two components are integrated into a co-training framework, which can be trained end-to-end to mutually enhance each other. We evaluate our model on five benchmarks for four popular text classification tasks, including sentiment analysis, topic classification, spam classification, and relation extraction. The results show that our model outperforms state-of-the-art weakly-supervised and semi-supervised methods, and achieves comparable performance with fully-supervised methods even without any labeled data.