Towards perceptual metrics for audio quality assessment

Author(s)
Vinay, Ashvala
Abstract
Recent years have seen considerable advances in audio synthesis with deep generative models known as Neural Audio Synthesizers (NAS). However, the state of the art is difficult to quantify directly: different studies use different evaluation methodologies and metrics when reporting results, which makes direct comparison between systems difficult. Furthermore, the perceptual relevance and meaning of the reported objective metrics are in most cases unknown, which prevents conclusive insights into the quality of the generated audio. This dissertation focuses on (1) investigating how effectively contemporary metrics capture the audio quality of sounds generated by NAS systems, (2) assessing the behavior of popular embedding distance measures and the embeddings they rely on, and (3) designing metrics that measure the difference in audio quality between two signals when one of them is a reference signal. The work thus aims to develop objective metrics that provide comparable, reproducible, and perceptually meaningful audio quality measurements and to help lay the foundation for future NAS development.
Date
2024-08-28
Resource Type
Text
Resource Subtype
Dissertation