Interpretable Quantitative Evaluation Metrics Of Generative Models In Symbolic Music

Author(s)
Sun, Qianyi
Advisor(s)
Arthur, Claire
Abstract
Despite the emergence of innovative architectures claiming improved capabilities in modeling human-level creativity, state-of-the-art generative music systems still struggle to create musical content that follows technical rules and expectations. The conventional subjective evaluation method for generative models can introduce bias and lacks transparency, rigor, and reproducibility, emphasizing the need for quantitative metrics. However, existing approaches to quantitative evaluation either rely on overly broad criteria that capture neither higher-level music-theoretic nor perceptual properties, or are narrowly tailored to the design of a specific model, limiting their generalizability. To address this, this thesis proposes a reproducible and interpretable framework for evaluating the output of symbolic music generation models using musicologically and perceptually informed quantitative metrics. Specifically, I assessed the performance of two prominent models, FolkRNN and the Jazz Transformer, by comparing each model's training data against its generated results through systematic computational music analysis. Benchmark testing revealed that this approach surpasses the discriminative capabilities of the widely cited, seminal quantitative metrics proposed by Yang and Lerch, offering a more ecologically valid assessment of model behavior and highlighting areas for targeted improvement. To further substantiate the perceptual validity of my metrics, I then reported results from a listening study employing a Turing test disguised as a style-classification task. This experiment tested for the salient musical features that influence individuals' decision-making in identifying stylistic provenance and provided insights into the perceptual dimensions of style-imitation challenges in AI music generation. Together, my findings hold the potential to advance the reliability and validity of AI-generated music evaluation by incorporating human perceptual attributes.
Date
2024-06-05
Resource Type
Text
Resource Subtype
Thesis