Organizational Unit:

School of Music

Permanent Link

https://hdl.handle.net/1853/70774

Parent Organization

Organizational Unit

College of Design

ArchiveSpace Name Record

https://archivesspace.library.gatech.edu/agents/agent_corporate_entity/134/edit

Publication Search Results

Rhythm Recreation Study To Inform Intelligent Pedagogy Systems

(Georgia Institute of Technology, 2023-08-28) Alben, Noel

Web-based intelligent pedagogy systems have great potential to provide interactive music lessons to those unable to access conventional, face-to-face music instruction from human experts. A key component of any effective pedagogy system is the expert domain knowledge used to generate, present, and evaluate the teachable content that makes up the "syllabus" of the system (Brusilovskiy, 1994). In this work, we investigate the application of computational musicology algorithms to devise the "syllabus" of intelligent rhythm pedagogy software. Many computational metrics that quantify and characterize rhythmic patterns have been proposed (Toussaint). We employ Cao et al.'s (2012) family theory of rhythms as a metric of rhythmic similarity and an entropy-based coded-element metric of rhythmic complexity (Thul, 2008). Both metrics have been shown to correlate with human judgments of rhythmic similarity and complexity. We hypothesize that a rhythmic syllabus that uses these metrics to determine the order in which rhythmic patterns are learned will be easier for musicians to progress through. We test this hypothesis in a rhythm reproduction study hosted on a custom-designed web-based experimental interface. Our experiment consists of six individual blocks. In each block, a participant listens to five unique rhythmic patterns, which they must then reproduce by clapping into their computer's microphone. Each rhythmic pattern is two measures long on an eighth-note grid, presented at 105 BPM, and looped four times. The order and content of rhythmic patterns within each block are determined using our chosen complexity and similarity metrics. A participant completes a block when they reproduce all the rhythmic patterns of the block within the performance constraints defined by the automatic performance assessment built into the experimental interface.
Each of our six blocks represents key interactions: the order of the stimuli determined by our prescribed metrics, melodic information added to the rhythmic stimuli, and the presence of a visual representation of the rhythmic pattern. We also have control blocks where the patterns of each block are selected randomly without any theoretically informed metrics. The dependent variable used to measure the effectiveness of the syllabus is the number of trials taken to reproduce a given rhythmic stimulus accurately. Participant reproductions are stored to afford future analyses, and the designed interface helps efficiently automate the data collection, making it more accessible for future rhythm reproduction studies. We conducted the rhythm recreation study with 28 participants across the United States, who accessed the experiment through a web-based portal. The data gathered from our experiment suggest that computational music theory algorithms can contribute to creating syllabi that align with human perception. However, these results deviate from our initial predictions. Furthermore, it appears that while incorporating visual stimuli aided in learning rhythmic patterns, the introduction of pitched onsets negatively affected reproduction performance.
Toward Natural Singing Via External Prosthesis

(Georgia Institute of Technology, 2022-12-15) Irvin, Bryce

The accessibility of expressive singing is limited by the physical mechanisms that produce speech and singing. For individuals without these physical mechanisms, singing is either difficult or impossible. Through this work, we propose the development of an external electronic prosthesis capable of inducing a natural singing voice in a performer without the need for traditional singing mechanisms. This prosthesis will offer performers of any background and ability a new way to express themselves and participate in social music activities. Specifically, we first aim to resolve issues with common prosthesis transducers. We then aim to discover methods for inducing the most natural singing voice in users, focusing on the nature of the excitation waveform used to drive the transducer of the prosthesis.
Using music to modulate emotional memory

(Georgia Institute of Technology, 2021-12-14) Mehdizadeh, Sophia Kaltsouni

Music is powerful in both affecting emotion and evoking memory. This thesis explores whether music might be able to modulate, or change, aspects of our emotional episodic memories. We present a behavioral, human-subjects experiment with a cognitive memory task targeting the reconsolidation mechanism. Memory reconsolidation allows for a previous experience to be relived and simultaneously reframed in memory. Moreover, reconsolidation of emotional, potentially maladaptive, autobiographical episodic memories has become a research focus in the development of new affective psychotherapy protocols. To this end, we propose that music may be a useful tool in driving and reshaping our memories and their associated emotions. This thesis additionally focuses on the roles that affect and preference may play in these memory processes. Through this research, we provide evidence supporting music’s ability to serve as a context for emotional autobiographical episodic memories. Overall, our results suggest that affective characteristics of the music and the emotions induced in the listener significantly influence memory creation and retrieval, and, furthermore, that the musical emotion may be as powerful as the musical structure in contextualizing and cueing memories. We also find support for individual differences and personal relevance of the musical context playing a determining role in these processes. This thesis establishes a foundation for subsequent neuroimaging work and future clinical research directions.
Machine Learning Driven Emotional Musical Prosody for Human-Robot Interaction

(Georgia Institute of Technology, 2021-11-18) Savery, Richard

This dissertation presents a method for non-anthropomorphic human-robot interaction using a newly developed concept entitled Emotional Musical Prosody (EMP). EMP consists of short expressive musical phrases capable of conveying emotions, which can be embedded in robots to accompany mechanical gestures. The main objective of EMP is to improve human engagement with, and trust in, robots while avoiding the uncanny valley. We contend that music, one of the most emotionally meaningful human experiences, can serve as an effective medium to support human-robot engagement and trust. EMP allows for the development of personable, emotion-driven agents, capable of giving subtle cues to collaborators while presenting a sense of autonomy. We present four research areas aimed at developing and understanding the potential role of EMP in human-robot interaction. The first research area focuses on collecting and labeling a new EMP dataset from vocalists, and using this dataset to generate prosodic emotional phrases through deep learning methods. Through extensive listening tests, the collected dataset and generated phrases were validated with a high level of accuracy by a large subject pool. The second research effort focuses on understanding the effect of EMP in human-robot interaction with industrial and humanoid robots. Here, significant results were found for improved trust, perceived intelligence, and likeability of EMP-enabled robotic arms, but not for humanoid robots. We also found significant results for improved trust in a social robot, as well as perceived intelligence, creativity, and likeability in a robotic musician. The third and fourth research areas shift to broader use cases and potential methods to use EMP in HRI. The third research area explores the effect of robotic EMP on different personality types, focusing on extraversion and neuroticism. For robots, personality traits offer a unique way to implement custom responses, individualized to human collaborators.
We discovered that humans prefer robots with emotional responses based on high extraversion and low neuroticism, with some correlation with the human collaborator’s own personality traits. The fourth and final research area focused on scaling up EMP to support interaction between groups of robots and humans. Here, we found that improvements in trust and likeability carried across from single robots to groups of industrial arms. Overall, the thesis suggests EMP is useful for improving trust and likeability for industrial robots, social robots, and robotic musicians, but not for humanoid robots. The thesis bears future implications for HRI designers, showing the extensive potential of careful audio design and the wide range of outcomes audio can have on HRI.
Composing and Decomposing Electroacoustic Sonifications: Towards a Functional-Aesthetic Sonification Design Framework

(Georgia Institute of Technology, 2021-05-01) Tsuchiya, Takahiko

The field of sonification invites musicians and scientists to create novel auditory interfaces. However, the opportunities for incorporating musical design ideas into general functional sonifications have been limited because of the transparency and communication issues with musical aesthetics. This research proposes a new design framework that facilitates the use of musical ideas as well as a transparent representation or conveyance of data, verified with two human-subject tests. An online listening test analyzes the effect of the structural elements of sound, as well as of guided analytical listening, on the perceptibility of data. A design test examines the range of variety the framework affords and how the design process is affected by functional and aesthetic design goals. The results indicate that the framework elements, such as the synthetic models and mapping destinations, affect the perceptibility of data, with some contradictions between the designer's general strategies and the listener's responses. Neither the analytical listening nor the listener's musical background shows a clear statistical trend; instead, both imply complex relationships between types of interpretation and structural understanding. There are also several contrasting types in the design and listening processes, which indicate different levels of structural transparency as well as the applicability of a wider variety of designs.
Learning to manipulate latent representations of deep generative models

(Georgia Institute of Technology, 2021-01-14) Pati, Kumar Ashis

Deep generative models have emerged as a tool of choice for the design of automatic music composition systems. While these models are capable of learning complex representations from data, a limitation of many of these models is that they allow little to no control over the generated music. Latent representation-based models, such as Variational Auto-Encoders, have the potential to alleviate this limitation as they are able to encode hidden attributes of the data in a low-dimensional latent space. However, the encoded attributes are often not interpretable and cannot be explicitly controlled. The work presented in this thesis seeks to address these challenges by learning to manipulate and design latent spaces in a way that allows control over musically meaningful attributes that are understandable by humans. This in turn can allow explicit control of such attributes during the generation process and help users realize their compositional goals. Specifically, three different approaches are proposed to investigate this problem. The first approach shows that we can learn to traverse latent spaces of generative models to perform complex interactive music composition tasks. The second approach uses a novel latent space regularization technique which can encode individual musical attributes along specific dimensions of the latent space. The third approach attempts to use attribute-informed non-linear transformations over an existing latent space such that the transformed latent space allows controllable generation of data. In addition, the problem of disentanglement learning in the context of symbolic music is investigated systematically by proposing a tailor-made dataset for the task and evaluating the performance of several different methods for unsupervised and supervised disentanglement learning. 
Together, the proposed methods will help address critical shortcomings of deep music generative models and pave the path towards intuitive interfaces which can be used by humans in real compositional settings.
Directed Evolution in Live Coding Music Performance

(Georgia Institute of Technology, 2020-10-24) Dasari, Sandeep; Freeman, Jason

Genetic algorithms are extensively used to understand, simulate, and create works of art and music. In this paper, a similar approach is taken to apply basic evolutionary algorithms to perform music live using code. Often considered an improvisational or experimental performance, live coding music comes with its own set of challenges. Genetic algorithms offer potential to address these long-standing challenges. Traditional evolutionary applications in music focused on novelty search to create new sounds, sequences of notes or chords, and effects. In contrast, this paper focuses on live performance to create directed evolving musical pieces. The paper also details some key design decisions, implementation, and usage of a novel genetic algorithm API created for a popular live coding language.
Weakly Supervised Learning for Musical Instrument Classification

(Georgia Institute of Technology, 2020-08-18) Gururani, Siddharth Kumar

Automatically recognizing musical instruments in audio recordings is an important task in music information retrieval (MIR). With increasing complexity of modeling techniques, the focus of the Musical Instrument Classification (MIC) task has shifted from single-note audio analysis to MIC with real-world polytimbral music. Increasingly complex models also increase the need for high-quality labeled data. For the MIC task, no such large-scale, fully annotated datasets exist. Instead, researchers tend to utilize multi-track data to obtain fine-grained instrument activity annotation. Such datasets are also known as strongly labeled datasets (SLDs). These datasets are usually small and skewed in terms of genre and instrument distribution. Hence, SLDs are not the ideal choice for training generalizable MIC models. Recently, weakly labeled datasets (WLDs), with only clip-level annotations, have been presented. These are typically larger in scale than SLDs. However, methods popular in MIC literature are designed to be trained and evaluated on SLDs. These do not naturally extend to the task of weakly labeled MIC. Additionally, during the labeling process, clips are not necessarily annotated with a class label for each instrument. This leads to missing labels in the dataset, making it a partially labeled dataset. In this thesis, three methods are proposed to address challenges posed by weakly labeled and partially labeled data. The first one aims at learning using weak labels. The MIC task is formulated as a multi-instance multi-label classification problem. Under this framework, an attention-based model is proposed that can focus on salient instances in weakly labeled data. The other two methods focus on utilizing any information that may be gained from data with missing labels. These methods fall under the semi-supervised learning (SSL) framework, where models are trained using labeled and unlabeled data.
The first semi-supervised method involves deep generative models that extend the unsupervised variational autoencoder to a semi-supervised model. The final method is based on consistency regularization-based SSL. The method proposed uses the mean teacher model, where a teacher model maintains a moving average or low-pass filtered version of a student model. The consistency regularization loss is unsupervised and may thus be applied to both labeled and unlabeled data. Additional experiments on music tagging with a large-scale WLD demonstrate the effectiveness of consistency regularization with limited labeled data. The methods presented in this thesis generally outperform methods developed using SLDs. The findings in this thesis not only impact the MIC task but also impact other music classification tasks where labeled data might be scarce. This thesis hopes to pave the way for future researchers to venture away from purely supervised learning and also consider weakly supervised approaches to solve MIR problems without access to large amounts of data.
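The mean teacher idea mentioned in this abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the function names `ema_update` and `consistency_loss`, the smoothing factor `alpha`, the use of plain Python lists for weights, and the choice of mean squared error as the consistency measure are all assumptions made for illustration.

```python
def ema_update(teacher, student, alpha=0.99):
    """Return teacher weights moved toward the student weights.

    Each teacher weight is an exponential moving average (a simple
    low-pass filter) of the corresponding student weight over training
    steps. `alpha` is a hypothetical smoothing factor.
    """
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def consistency_loss(student_probs, teacher_probs):
    """Mean squared difference between student and teacher predictions.

    No ground-truth label is used, so this term can be computed on
    labeled and unlabeled clips alike.
    """
    n = len(student_probs)
    return sum((s - t) ** 2 for s, t in zip(student_probs, teacher_probs)) / n


# Illustrative usage with made-up numbers.
teacher = [0.0, 1.0]
student = [1.0, 1.0]
teacher = ema_update(teacher, student, alpha=0.9)
loss = consistency_loss([1.0, 0.0], [0.0, 0.0])
```

In a typical setup, the student is trained by gradient descent on a supervised loss plus this consistency term, and the teacher is refreshed with `ema_update` after each step; how the thesis weights and schedules these terms is not stated in the abstract.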
The sound within: Learning audio features from electroencephalogram recordings of music listening

(Georgia Institute of Technology, 2020-04-28) Vinay, Ashvala

We look at the intersection of music, machine learning, and neuroscience. Specifically, we are interested in understanding how we can predict audio onset events by using the electroencephalogram response of subjects listening to the same music segment. We present models and approaches to this problem derived from deep learning. We worked with a highly imbalanced dataset and present two methods to address the imbalance: tolerance windows and aggregations. Our presented models are a feed-forward network, a convolutional neural network (CNN), a recurrent neural network (RNN), and an RNN with a custom unrolling method. Our results show that at a tolerance window of 40 ms, a feed-forward network performed well. We also found that an aggregation of 200 ms yielded promising results, with aggregations being a simple way to reduce model complexity.
Empathic Effects of Auditory Heartbeats: A Neurophysiological Investigation

(Georgia Institute of Technology, 2020-04-22) Winters, Raymond Michael

I hypothesized that hearing the heartbeat of another person would affect listeners’ empathic state, and designed an experiment to measure changes in behavior and cardiac neurophysiology. In my experiment, participants (N = 27) completed modified versions of the Reading the Mind in the Eyes Task (RMET) in different auditory heartbeat conditions (slow, fast, silence, audio-only). For each trial, participants completed two measures of empathic state: cognitive (“What is this person feeling?”) and affective (“How well could you feel what they were feeling?”). From my results, I found that the presence of auditory heartbeats i) changed cognitive empathy and ii) increased affective empathy, and these responses depended on the heartbeat tempo. I also analyzed two markers of cardiac neurophysiology: i) Heart Rate (HR) and ii) the Heartbeat-Evoked Potential (HEP). I found that the auditory heartbeat decreased listeners’ HR, and there were additional effects due to tempo and affective empathy. Finally, a frontal component of the HEP was more negative in the time range of 350-500 ms, which I attribute to a decrease in cardiac attention (i.e., “interoception”) when listening empathically to the heartbeat of others.