A Zero-Shot Annotation Projection Approach to Multilingual Token Classification
Author(s)
Shah, Vedaant
Abstract
Token classification is a fundamental task in natural language processing. Assigning each token
in an input sentence a label from a discrete set of classes generalizes many other tasks. For
example, one may wish to know the part of speech (POS) of each word in a sentence, or to identify
the words most relevant to a certain domain (slot filling). In either case, each token in the
sentence must be classified to accomplish the task. In this work, we examine zero-shot
multilingual token classification, where training data is provided in only one language and the
model must generalize to new target languages. We present a novel annotation projection technique
that improves on the performance of baseline models on this task. In contrast to previous work,
our method operates only at inference time, not at training time, allowing it to be applied in a
variety of settings.
Resource Type
Text
Resource Subtype
Undergraduate Research Option Thesis