Integrating Distributional, Compositional, and Relational Approaches to Neural Word Representations

Pinter, Yuval D.

Files

PINTER-DISSERTATION-2021.pdf (1.42 MB)

Author(s)

Pinter, Yuval D.

Advisor(s)

Eisenstein, Jacob

Associated Organization(s)

Organizational Unit

College of Computing

Organizational Unit

School of Interactive Computing

Collections

Theses and Dissertations

Permanent Link

http://hdl.handle.net/1853/67113

Abstract

When the field of natural language processing (NLP) entered the era of deep neural networks, the task of representing basic units of language, an inherently sparse and symbolic medium, using low-dimensional dense real-valued vectors, or embeddings, became crucial. The dominant technique to perform this task has for years been to segment input text sequences into space-delimited words, for which embeddings are trained over a large corpus by means of leveraging distributional information: a word is reducible to the set of contexts it appears in. This approach is powerful but imperfect; words not seen during the embedding learning phase, known as out-of-vocabulary words (OOVs), emerge in any plausible application where embeddings are used. One approach applied in order to combat this and other shortcomings is the incorporation of compositional information obtained from the surface form of words, enabling the representation of morphological regularities and increasing robustness to typographical errors. Another approach leverages word-sense information and relations curated in large semantic graph resources, offering a supervised signal for embedding space structure and improving representations for domain-specific rare words. In this dissertation, I offer several analyses and remedies for the OOV problem based on the utilization of character-level compositional information in multiple languages and the structure of semantic knowledge in English. In addition, I provide two novel datasets for the continued exploration of vocabulary expansion in English: one with a taxonomic emphasis on novel word formation, and the other generated by a real-world data-driven use case in the entity graph domain. Finally, recognizing the recent shift in NLP towards contextualized representations of subword tokens, I describe the form in which the OOV problem still appears in these methods, and apply an integrative compositional model to address it.

Date Issued

2021-06-17

Resource Type

Text

Resource Subtype

Dissertation

Georgia Tech Library
