A Zero-Shot Annotation Projection Approach to Multilingual Token Classification

Author(s)
Shah, Vedaant
Abstract
A fundamental task in natural language processing is token classification: assigning each token in an input sentence a label from a discrete set of classes. This formulation generalizes many other tasks. For example, one may wish to know the part of speech (POS) of each word in a sentence, or to identify the words most relevant to a particular domain (slot filling). In either case, every token in the sentence must be classified. In this work, we examine zero-shot multilingual token classification, where training data is provided in only one language and the model must generalize to new target languages. We present a novel annotation projection technique that improves on the performance of baseline models in this setting. Unlike previous work, our method is applied only at inference time rather than at training time, allowing it to be used in a variety of settings.
Resource Type
Text
Resource Subtype
Undergraduate Research Option Thesis