Modeling of Language-Universal Speech Attributes for Multilingual Speech Recognition and Processing
Author(s)
Yen, Hao
Abstract
The performance of multilingual automatic speech recognition (ASR) systems depends critically on their ability to generalize across diverse languages and linguistic environments. Conventional ASR approaches typically train a separate model for each language, which is challenging because phonetic and linguistic structures vary widely across languages. Alternatively, some methods train a single model on data from multiple languages, but these models often struggle to capture the unique characteristics of each language, particularly when the training data is heavily imbalanced. This disparity can lead to substantial performance degradation, especially for languages with limited or no training data. To address these issues in multilingual speech recognition, it is essential to develop approaches that operate uniformly across multiple languages without being constrained by language-specific characteristics. Current systems often fail to scale effectively because they rely on language-dependent tokens, such as phonemes and characters, which are not universally applicable. This limitation poses significant challenges in building robust multilingual systems capable of accommodating a wide variety of languages and dialects.
In this dissertation, we aim to establish a language-universal framework for ASR that overcomes language-specific limitations by leveraging universal speech attributes, such as manner and place of articulation. These attributes, which remain consistent across all languages, serve as a foundation for building multilingual models capable of performing effectively across diverse linguistic settings. Our approach seeks to address the limited knowledge sharing between linguistically distant languages, for which traditional language-dependent tokens are insufficient. By utilizing a compact set of language-universal speech attributes, we aim to bridge the performance gap for low-resource and unseen languages, enhancing the adaptability and scalability of ASR systems.
Date
2025-08-07
Resource Type
Text
Resource Subtype
Dissertation (PhD)