Organizational Unit:
School of Music

Research Organization Registry ID
Previous Names
Parent Organization
Parent Organization
Organizational Unit
Includes Organization(s)

Publication Search Results

Now showing 1 - 4 of 4
  • Item
    Learning to manipulate latent representations of deep generative models
    (Georgia Institute of Technology, 2021-01-14) Pati, Kumar Ashis
    Deep generative models have emerged as a tool of choice for the design of automatic music composition systems. While these models are capable of learning complex representations from data, a limitation of many of these models is that they allow little to no control over the generated music. Latent representation-based models, such as Variational Auto-Encoders, have the potential to alleviate this limitation as they are able to encode hidden attributes of the data in a low-dimensional latent space. However, the encoded attributes are often not interpretable and cannot be explicitly controlled. The work presented in this thesis seeks to address these challenges by learning to manipulate and design latent spaces in a way that allows control over musically meaningful attributes that are understandable by humans. This in turn can allow explicit control of such attributes during the generation process and help users realize their compositional goals. Specifically, three different approaches are proposed to investigate this problem. The first approach shows that we can learn to traverse latent spaces of generative models to perform complex interactive music composition tasks. The second approach uses a novel latent space regularization technique which can encode individual musical attributes along specific dimensions of the latent space. The third approach attempts to use attribute-informed non-linear transformations over an existing latent space such that the transformed latent space allows controllable generation of data. In addition, the problem of disentanglement learning in the context of symbolic music is investigated systematically by proposing a tailor-made dataset for the task and evaluating the performance of several different methods for unsupervised and supervised disentanglement learning. Together, the proposed methods will help address critical shortcomings of deep music generative models and pave the path towards intuitive interfaces which can be used by humans in real compositional settings.
  • Item
    Weakly Supervised Learning for Musical Instrument Classification
    (Georgia Institute of Technology, 2020-08-18) Gururani, Siddharth Kumar
    Automatically recognizing musical instruments in audio recordings is an important task in music information retrieval (MIR). With increasing complexity of modeling techniques, the focus of the Musical Instrument Classification (MIC) task has shifted from single note audio analysis to MIC with real world polytimbral music. Increasingly complex models also increase the need for high quality labeled data. For the MIC task, there do not exist such large-scale fully annotated datasets. Instead researchers tend to utilize multi-track data to obtain fine-grained instrument activity annotation. Such datasets are also known as strongly labeled datasets (SLDs). These datasets are usually small and skewed in terms of genre and instrument distribution. Hence, SLDs are not the ideal choice for training generalizable MIC models. Recently, weakly labeled datasets (WLDs), with only clip-level annotations, have been presented. These are typically larger in scale than SLDs. However, methods popular in MIC literature are designed to be trained and evaluated SLDs. These do not naturally extend to the task of weakly labeled MIC. Additionally, during the labeling process, clips are not necessarily annotated with a class label for each instrument. This leads to missing labels in the dataset making it a partially labeled dataset. In this thesis, three methods are proposed to address challenges posed by weakly labeled and partially labeled data. The first one aims at learning using weak labels. The MIC task is formulated as a multi-instance multi-label classification problem. Under this framework, an attention-based model is proposed that can focus on salient instances in weakly labeled data. The other two methods focus on utilizing any information that may be gained from data with missing labels. These methods fall under the semi-supervised learning (SSL) framework, where models are trained using labeled and unlabeled data. The first semi-supervised method involves deep generative models that extend the unsupervised variational autoencoder to a semi-supervised model. The final method is based on consistency regularization-based SSL. The method proposed uses the mean teacher model, where a teacher model maintains a moving average or low-pass filtered version of a student model. The consistency regularization loss is unsupervised and may thus be applied to both labeled and unlabeled data. Additional experiments on music tagging with a large-scale WLD demonstrates the effectiveness of consistency regularization with limited labeled data. The methods presented in this thesis generally outperform methods developed using SLDs. The findings in this thesis not only impact the MIC task but also impact other music classification tasks where labeled data might be scarce. This thesis hopes to pave the way for future researchers to venture away from purely supervised learning and also consider weakly supervised approaches to solve MIR problems without access to large amounts of data.
  • Item
    Addressing the data challenge in automatic drum transcription with labeled and unlabeled data
    (Georgia Institute of Technology, 2018-07-23) Wu, Chih-Wei
    Automatic Drum Transcription (ADT) is a sub-task of automatic music transcription that involves the conversion of drum-related audio events into musical notations. While noticeable progress has been made in the past by combining pattern recognition methods with audio signal processing techniques, many systems are still impeded by the lack of a meaningful amount of labeled data to support the data-driven algorithms. To address this data challenge in ADT, this work presents three approaches. First, a dataset for ADT tasks is created using a semi-automatic process that minimizes the workload of human annotators. Second, an ADT system that requires minimum training data is designed to account for the presence of other instruments (e.g., non-percussive or pitched instruments). Third, the possibility of improving generic ADT systems with a large amount of unlabeled data from online resources is explored. The main contributions of this work include the introduction of a new ADT dataset, the methods for realizing ADT systems under the constraint of data insufficiency, and a scheme for data-driven methods to benefit from the abundant online resources and might have impact on other audio and music related tasks traditionally impeded by small amounts of labeled data.
  • Item
    Supervised feature learning via sparse coding for music information rerieval
    (Georgia Institute of Technology, 2015-04-24) O'Brien, Cian John
    This thesis explores the ideas of feature learning and sparse coding for Music Information Retrieval (MIR). Sparse coding is an algorithm which aims to learn new feature representations from data automatically. In contrast to previous work which uses sparse coding in an MIR context the concept of supervised sparse coding is also investigated, which makes use of the ground-truth labels explicitly during the learning process. Here sparse coding and supervised coding are applied to two MIR problems: classification of musical genre and recognition of the emotional content of music. A variation of Label Consistent K-SVD is used to add supervision during the dictionary learning process. In the case of Music Genre Recognition (MGR) an additional discriminative term is added to encourage tracks from the same genre to have similar sparse codes. For Music Emotion Recognition (MER) a linear regression term is added to learn an optimal classifier and dictionary pair. These results indicate that while sparse coding performs well for MGR, the additional supervision fails to improve the performance. In the case of MER, supervised coding significantly outperforms both standard sparse coding and commonly used designed features, namely MFCC and pitch chroma.