Series
Doctor of Philosophy with a Major in Music Technology

Series Type
Degree Series

Publication Search Results

Now showing 1 - 8 of 8
  • Item
    Machine Learning Driven Emotional Musical Prosody for Human-Robot Interaction
    (Georgia Institute of Technology, 2021-11-18) Savery, Richard
    This dissertation presents a method for non-anthropomorphic human-robot interaction using a newly developed concept entitled Emotional Musical Prosody (EMP). EMP consists of short expressive musical phrases capable of conveying emotions, which can be embedded in robots to accompany mechanical gestures. The main objective of EMP is to improve human engagement with, and trust in, robots while avoiding the uncanny valley. We contend that music - one of the most emotionally meaningful human experiences - can serve as an effective medium to support human-robot engagement and trust. EMP allows for the development of personable, emotion-driven agents capable of giving subtle cues to collaborators while presenting a sense of autonomy. We present four research areas aimed at developing and understanding the potential role of EMP in human-robot interaction. The first research area focuses on collecting and labeling a new EMP dataset from vocalists, and on using this dataset to generate prosodic emotional phrases through deep learning methods. Through extensive listening tests, the collected dataset and generated phrases were validated with a high level of accuracy by a large subject pool. The second research effort focuses on understanding the effect of EMP in human-robot interaction with industrial and humanoid robots. Here, significant results were found for improved trust, perceived intelligence, and likeability of EMP-enabled robotic arms, but not for humanoid robots. We also found significant results for improved trust in a social robot, as well as for perceived intelligence, creativity, and likeability in a robotic musician. The third and fourth research areas shift to broader use cases and potential methods for using EMP in HRI. The third research area explores the effect of robotic EMP on different personality types, focusing on extraversion and neuroticism. For robots, personality traits offer a unique way to implement custom responses, individualized to human collaborators. We discovered that humans prefer robots with emotional responses based on high extraversion and low neuroticism, with some correlation with the human collaborator's own personality traits. The fourth and final research area focuses on scaling EMP up to support interaction between groups of robots and humans. Here, we found that the improvements in trust and likeability carried over from single robots to groups of industrial arms. Overall, the thesis suggests that EMP is useful for improving trust and likeability for industrial robots, social robots, and robotic musicians, but not for humanoid robots. The thesis has implications for future HRI designers, showing the extensive potential of careful audio design and the wide range of outcomes audio can have on HRI.
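    As an illustration of the first research area's generation step, here is a minimal sketch of one way emotion-conditioned phrase generation could be framed: a recurrent model predicts the next pitch given the phrase so far and an embedding of the target emotion. The architecture, names, and hyperparameters below are hypothetical, not the dissertation's actual model.
    ```python
    # Hypothetical emotion-conditioned phrase generator (illustrative only).
    import torch
    import torch.nn as nn

    class EmotionConditionedLSTM(nn.Module):
        def __init__(self, n_pitches=128, n_emotions=4, emb=64, hidden=256):
            super().__init__()
            self.pitch_emb = nn.Embedding(n_pitches, emb)
            self.emotion_emb = nn.Embedding(n_emotions, emb)
            self.lstm = nn.LSTM(2 * emb, hidden, batch_first=True)
            self.out = nn.Linear(hidden, n_pitches)

        def forward(self, pitches, emotion):
            # pitches: (batch, time) pitch indices; emotion: (batch,) labels
            e = self.emotion_emb(emotion)                # (batch, emb)
            e = e.unsqueeze(1).expand(-1, pitches.size(1), -1)
            x = torch.cat([self.pitch_emb(pitches), e], dim=-1)
            h, _ = self.lstm(x)
            return self.out(h)                           # next-pitch logits
    ```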
  • Item
    Composing and Decomposing Electroacoustic Sonifications: Towards a Functional-Aesthetic Sonification Design Framework
    (Georgia Institute of Technology, 2021-05-01) Tsuchiya, Takahiko
    The field of sonification invites musicians and scientists to create novel auditory interfaces. However, opportunities for incorporating musical design ideas into general functional sonifications have been limited by issues of transparency and communication surrounding musical aesthetics. This research proposes a new design framework that facilitates the use of musical ideas as well as the transparent representation and conveyance of data, verified with two human-subject tests. An online listening test analyzes the effect of the structural elements of sound, as well as of guided analytical listening, on the perceptibility of data. A design test examines the range of variety the framework affords and how the design process is affected by functional and aesthetic design goals. The results indicate that framework elements such as the synthesis models and mapping destinations affect the perceptibility of data, with some contradictions between the designers' general strategies and the listeners' responses. Neither analytical listening nor the listeners' musical background shows strong statistical trends; instead, the results imply complex relationships between types of interpretation and structural understanding. There are also several contrasting types in the design and listening processes, which indicate different levels of structural transparency as well as the applicability of a wider variety of designs.
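    To make the notion of mapping data to sound parameters concrete, here is a minimal parameter-mapping sonification sketch; the function, its mapping choices, and its defaults are hypothetical, not the framework's own design.
    ```python
    # Illustrative parameter-mapping sonification: each data point becomes
    # the pitch of a short sine tone (assumed design, for illustration).
    import numpy as np

    def sonify(data, sr=44100, note_dur=0.25, fmin=220.0, fmax=880.0):
        data = np.asarray(data, dtype=float)
        d = (data - data.min()) / (np.ptp(data) + 1e-12)  # normalize to [0, 1]
        freqs = fmin * (fmax / fmin) ** d                 # log-frequency mapping
        t = np.arange(int(sr * note_dur)) / sr
        env = np.hanning(t.size)                          # fade to avoid clicks
        return np.concatenate([env * np.sin(2 * np.pi * f * t) for f in freqs])
    ```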
  • Item
    Learning to manipulate latent representations of deep generative models
    (Georgia Institute of Technology, 2021-01-14) Pati, Kumar Ashis
    Deep generative models have emerged as a tool of choice for the design of automatic music composition systems. While these models are capable of learning complex representations from data, a limitation of many of them is that they allow little to no control over the generated music. Latent representation-based models, such as Variational Auto-Encoders, have the potential to alleviate this limitation, as they are able to encode hidden attributes of the data in a low-dimensional latent space. However, the encoded attributes are often not interpretable and cannot be explicitly controlled. The work presented in this thesis seeks to address these challenges by learning to manipulate and design latent spaces in a way that allows control over musically meaningful attributes that are understandable by humans. This, in turn, can allow explicit control of such attributes during the generation process and help users realize their compositional goals. Specifically, three different approaches are proposed to investigate this problem. The first approach shows that we can learn to traverse latent spaces of generative models to perform complex interactive music composition tasks. The second approach uses a novel latent space regularization technique that can encode individual musical attributes along specific dimensions of the latent space. The third approach attempts to use attribute-informed non-linear transformations over an existing latent space such that the transformed latent space allows controllable generation of data. In addition, the problem of disentanglement learning in the context of symbolic music is investigated systematically by proposing a tailor-made dataset for the task and evaluating the performance of several different methods for unsupervised and supervised disentanglement learning. Together, the proposed methods will help address critical shortcomings of deep music generative models and pave the way toward intuitive interfaces that can be used by humans in real compositional settings.
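    The second approach's regularization idea can be sketched as follows: pairwise orderings along one latent dimension are pushed to agree with pairwise orderings of a musical attribute (e.g., note density), so that the dimension becomes an interpretable control. The loss below follows the general shape of published attribute regularization, but treat the exact scaling and loss form as assumptions.
    ```python
    # Sketch of an attribute regularizer for one latent dimension
    # (illustrative; exact scaling and loss form are assumptions).
    import torch

    def attribute_reg_loss(z_dim, attr, delta=10.0):
        # z_dim: (batch,) values of the regularized latent dimension
        # attr:  (batch,) attribute value of each training example
        dz = z_dim.unsqueeze(0) - z_dim.unsqueeze(1)  # pairwise latent diffs
        da = attr.unsqueeze(0) - attr.unsqueeze(1)    # pairwise attribute diffs
        # push the ordering of latent values to match the attribute ordering
        return torch.mean(torch.abs(torch.tanh(delta * dz) - torch.sign(da)))
    ```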
  • Item
    Weakly Supervised Learning for Musical Instrument Classification
    (Georgia Institute of Technology, 2020-08-18) Gururani, Siddharth Kumar
    Automatically recognizing musical instruments in audio recordings is an important task in music information retrieval (MIR). With the increasing complexity of modeling techniques, the focus of the Musical Instrument Classification (MIC) task has shifted from single-note audio analysis to MIC with real-world polytimbral music. Increasingly complex models also increase the need for high-quality labeled data. For the MIC task, no such large-scale, fully annotated datasets exist. Instead, researchers tend to utilize multi-track data to obtain fine-grained instrument activity annotations. Such datasets are also known as strongly labeled datasets (SLDs). These datasets are usually small and skewed in terms of genre and instrument distribution. Hence, SLDs are not the ideal choice for training generalizable MIC models. Recently, weakly labeled datasets (WLDs), with only clip-level annotations, have been presented. These are typically larger in scale than SLDs. However, methods popular in the MIC literature are designed to be trained and evaluated on SLDs and do not naturally extend to the task of weakly labeled MIC. Additionally, during the labeling process, clips are not necessarily annotated with a class label for each instrument. This leads to missing labels, making the dataset a partially labeled one. In this thesis, three methods are proposed to address the challenges posed by weakly labeled and partially labeled data. The first aims at learning from weak labels: the MIC task is formulated as a multi-instance multi-label classification problem, and under this framework an attention-based model is proposed that can focus on salient instances in weakly labeled data. The other two methods focus on utilizing any information that may be gained from data with missing labels. These methods fall under the semi-supervised learning (SSL) framework, where models are trained using both labeled and unlabeled data. The first semi-supervised method involves deep generative models that extend the unsupervised variational autoencoder to a semi-supervised model. The final method is based on consistency-regularization-based SSL and uses the mean teacher model, where a teacher model is maintained as a moving average, or low-pass-filtered version, of a student model. The consistency regularization loss is unsupervised and may thus be applied to both labeled and unlabeled data. Additional experiments on music tagging with a large-scale WLD demonstrate the effectiveness of consistency regularization with limited labeled data. The methods presented in this thesis generally outperform methods developed using SLDs. The findings not only impact the MIC task but also other music classification tasks where labeled data may be scarce. This thesis hopes to pave the way for future researchers to venture beyond purely supervised learning and consider weakly supervised approaches to solving MIR problems without access to large amounts of labeled data.
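    A minimal sketch of the mean teacher idea named above, in its generic form (the model, the perturbations, and all hyperparameters are placeholders rather than the thesis's actual configuration):
    ```python
    # Generic mean teacher: the teacher is an exponential moving average
    # (low-pass-filtered version) of the student; the consistency loss
    # needs no labels, so it applies to labeled and unlabeled clips alike.
    import torch
    import torch.nn.functional as F

    def ema_update(teacher, student, alpha=0.999):
        # teacher is typically initialized as a deep copy of the student
        with torch.no_grad():
            for t, s in zip(teacher.parameters(), student.parameters()):
                t.mul_(alpha).add_(s, alpha=1 - alpha)

    def consistency_loss(student, teacher, x_view1, x_view2):
        # x_view1 / x_view2: two perturbed versions of the same audio clip
        with torch.no_grad():
            target = torch.sigmoid(teacher(x_view1))
        return F.mse_loss(torch.sigmoid(student(x_view2)), target)
    ```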
  • Item
    Empathic Effects of Auditory Heartbeats: A Neurophysiological Investigation
    (Georgia Institute of Technology, 2020-04-22) Winters, Raymond Michael
    I hypothesized that hearing the heartbeat of another person would affect listeners’ empathic state, and I designed an experiment to measure changes in behavior and cardiac neurophysiology. In the experiment, participants (N = 27) completed modified versions of the Reading the Mind in the Eyes Task (RMET) in different auditory heartbeat conditions (slow, fast, silence, audio-only). For each trial, participants completed two measures of empathic state: cognitive (“What is this person feeling?”) and affective (“How well could you feel what they were feeling?”). I found that the presence of auditory heartbeats i) changed cognitive empathy and ii) increased affective empathy, and that these responses depended on the heartbeat tempo. I also analyzed two markers of cardiac neurophysiology: i) heart rate (HR) and ii) the heartbeat-evoked potential (HEP). The auditory heartbeat decreased listeners’ HR, with additional effects of tempo and affective empathy. Finally, a frontal component of the HEP was more negative in the 350-500 ms time range, which I attribute to a decrease in cardiac attention (i.e., “interoception”) when listening empathically to the heartbeats of others.
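    For readers unfamiliar with the HEP, the usual computation (an assumed pipeline, not necessarily this dissertation's exact analysis) averages EEG segments time-locked to the heartbeat's R-peaks:
    ```python
    # Assumed HEP pipeline: epoch one EEG channel around R-peak samples,
    # baseline-correct on the pre-peak interval, and average the epochs.
    import numpy as np

    def heartbeat_evoked_potential(eeg, r_peaks, sr, tmin=-0.1, tmax=0.6):
        eeg = np.asarray(eeg, dtype=float)
        lo, hi = int(tmin * sr), int(tmax * sr)
        epochs = np.stack([eeg[p + lo : p + hi] for p in r_peaks
                           if p + lo >= 0 and p + hi <= eeg.size])
        epochs -= epochs[:, : -lo].mean(axis=1, keepdims=True)  # baseline
        return epochs.mean(axis=0)  # the HEP waveform
    ```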
  • Item
    Addressing the data challenge in automatic drum transcription with labeled and unlabeled data
    (Georgia Institute of Technology, 2018-07-23) Wu, Chih-Wei
    Automatic Drum Transcription (ADT) is a sub-task of automatic music transcription that involves the conversion of drum-related audio events into musical notation. While noticeable progress has been made by combining pattern recognition methods with audio signal processing techniques, many systems are still impeded by the lack of a meaningful amount of labeled data to support data-driven algorithms. To address this data challenge in ADT, this work presents three approaches. First, a dataset for ADT tasks is created using a semi-automatic process that minimizes the workload of human annotators. Second, an ADT system that requires minimal training data is designed to account for the presence of other instruments (e.g., non-percussive or pitched instruments). Third, the possibility of improving generic ADT systems with a large amount of unlabeled data from online resources is explored. The main contributions of this work are the introduction of a new ADT dataset, methods for realizing ADT systems under the constraint of data insufficiency, and a scheme that allows data-driven methods to benefit from abundant online resources, which might also have an impact on other audio- and music-related tasks traditionally impeded by small amounts of labeled data.
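    As background for the second approach, one established family of ADT methods that needs little training data is template-based non-negative matrix factorization; the generic fixed-template sketch below is offered for illustration and is not the specific system proposed in this thesis.
    ```python
    # Generic template-based NMF for drum activity estimation (assumed
    # baseline for illustration; the thesis's system differs in detail).
    import numpy as np

    def nmf_activations(V, W, n_iter=100, eps=1e-10):
        """V: magnitude spectrogram (freq, time); W: fixed drum templates
        (freq, n_drums). Returns H (n_drums, time), one activation curve
        per drum, whose peaks suggest onset candidates."""
        H = np.random.rand(W.shape[1], V.shape[1])
        for _ in range(n_iter):
            # multiplicative update minimizing KL divergence, W held fixed
            H *= (W.T @ (V / (W @ H + eps))) / (W.T.sum(axis=1, keepdims=True) + eps)
        return H
    ```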
  • Item
    The algorithmic score language: Extending common western music notation for representing logical behaviors
    (Georgia Institute of Technology, 2018-05-22) Martinez Nieto, Juan Carlos
    This work proposes extensions to Western Music Notation so that it can play a dual role: first, as a human-readable representation of music performance information in the context of live electronics, and second, as a programming language that is executed during the live performance of a piece. This novel approach simplifies the compositional workflow, the communication with performers, the musical analysis, and the actual performance of scored pieces that involve computer interactions. Extending Western Music Notation into a programming language creates musical scores that encode performance information in a form that is human-readable, cohesive, self-contained, and sustainable, making the interactive music genre attractive to a wide spectrum of composers and performers of new music. A collection of pieces was composed and performed using the new extended notation, and several repertoire pieces were transcribed, enabling evaluation of the syntax in the context of different compositional aesthetics. This research resulted in a unique approach to the composition and performance of interactive music that is supported by technology and founded in traditional music practices that have been in use for centuries.
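    A purely hypothetical miniature may help convey the core idea of notation that doubles as a program: a token attached to a measure both documents and triggers a computer behavior when the performance reaches that measure. Everything below is invented for illustration and is not the dissertation's actual syntax.
    ```python
    # Invented miniature of an executable score annotation (illustrative).
    ACTIONS = {
        "start_delay": lambda: print("fx: delay line on"),
        "stop_delay":  lambda: print("fx: delay line off"),
        "transpose+2": lambda: print("fx: live transposition +2 semitones"),
    }

    # measure number -> annotation tokens written into the score
    score = {4: ["start_delay"], 9: ["transpose+2"], 12: ["stop_delay"]}

    def on_measure(n):
        """Called by a score follower each time measure n begins."""
        for token in score.get(n, []):
            ACTIONS[token]()  # the notation itself names the behavior
    ```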
  • Item
    Towards an embodied musical mind: Generative algorithms for robotic musicians
    (Georgia Institute of Technology, 2017-04-19) Bretan, Peter Mason
    Embodied cognition is a theory stating that the processes and functions comprising the human mind are influenced by a person's physical body. The theory of embodied musical cognition holds that a person's body largely influences his or her musical experiences and actions. This work presents multiple frameworks for computer music generation as it pertains to robotic musicianship, such that musical decisions result from a joint optimization between the robot's physical constraints and its musical knowledge. First, a generative framework based on hand-designed higher-level musical concepts and the Viterbi beam search algorithm is described. The system allows for efficient and autonomous exploration of the relationship between music and physicality, and of the resulting music that is contingent on such a connection. It is evaluated objectively on its ability to plan a series of sound-actuating robotic movements (path planning) that minimize the risk of collision, the number of dropped notes, spurious movements, and energy expenditure. Second, a method for developing higher-level musical concepts (semantics) based on machine learning is presented. Using strategies based on neural networks and deep learning, we show that it is possible to learn perceptually meaningful higher-level representations of music. These learned musical “embeddings” are applied to an autonomous music generation system that utilizes unit selection. The embeddings and the generative system are evaluated with objective ranking tasks and a subjective listening study. Third, the method for learning musical semantics is extended to a robot such that its embodiment becomes integral to the learning process. The resulting embeddings simultaneously encode information describing both important musical features and the robot's physical constraints.
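    The first framework's joint optimization can be sketched, under assumptions, as a beam search whose accumulated cost combines a musical cost with a physical movement cost; the data layout and cost functions below are stand-ins rather than the dissertation's actual formulation.
    ```python
    # Beam search over note choices scoring musical fit plus movement
    # cost (stand-in sketch of a joint musical/physical optimization).
    import heapq

    def plan(candidates_per_step, music_cost, move_cost, beam=8):
        """candidates_per_step: list of lists of (pitch, arm_position);
        both cost functions take (previous_choice_or_None, choice)."""
        beams = [(0.0, [])]  # (accumulated cost, chosen sequence)
        for candidates in candidates_per_step:
            scored = []
            for cost, seq in beams:
                prev = seq[-1] if seq else None
                for note in candidates:
                    c = cost + music_cost(prev, note) + move_cost(prev, note)
                    scored.append((c, seq + [note]))
            beams = heapq.nsmallest(beam, scored, key=lambda x: x[0])
        return min(beams, key=lambda x: x[0])[1]
    ```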