Organizational Unit: School of Music


Publication Search Results

Now showing 1 - 10 of 86
  • Item
    Addressing the data challenge in automatic drum transcription with labeled and unlabeled data
    (Georgia Institute of Technology, 2018-07-23) Wu, Chih-Wei
Automatic Drum Transcription (ADT) is a sub-task of automatic music transcription that involves the conversion of drum-related audio events into musical notations. While noticeable progress has been made in the past by combining pattern recognition methods with audio signal processing techniques, many systems are still impeded by the lack of a meaningful amount of labeled data to support the data-driven algorithms. To address this data challenge in ADT, this work presents three approaches. First, a dataset for ADT tasks is created using a semi-automatic process that minimizes the workload of human annotators. Second, an ADT system that requires minimum training data is designed to account for the presence of other instruments (e.g., non-percussive or pitched instruments). Third, the possibility of improving generic ADT systems with a large amount of unlabeled data from online resources is explored. The main contributions of this work include the introduction of a new ADT dataset, methods for realizing ADT systems under the constraint of data insufficiency, and a scheme that enables data-driven methods to benefit from abundant online resources, which may also have impact on other audio- and music-related tasks traditionally impeded by small amounts of labeled data.
  • Item
    The algorithmic score language: Extending common western music notation for representing logical behaviors
    (Georgia Institute of Technology, 2018-05-22) Martinez Nieto, Juan Carlos
This work proposes extensions to Western Music Notation so it can play a dual role: first, as a human-readable representation of music performance information in the context of live electronics, and second, as a programming language that is executed during the live performance of a piece. This novel approach simplifies the compositional workflow, the communication with performers, the musical analysis, and the actual performance of scored pieces that involve computer interactions. Extending Western Music Notation as a programming language creates musical scores that encode performance information in a form that is human-readable, cohesive, self-contained, and sustainable, making the interactive music genre attractive to a wide spectrum of composers and performers of new music. A collection of pieces was composed and performed based on the new extended notation, and some repertoire pieces were transcribed, enabling evaluation of the syntax in the context of different compositional aesthetics. This research produced a unique approach to the composition and performance of interactive music that is supported by technology and founded in traditional music practices that have been used for centuries.
  • Item
    Regressing dexterous finger flexions using machine learning and multi-channel single element ultrasound transducers
    (Georgia Institute of Technology, 2018-04-27) Hantrakul, Lamtharn
Human Machine Interfaces, or "HMIs", come in many shapes and sizes. The mouse and keyboard are a typical and familiar HMI. In applications such as virtual reality or music performance, a precise HMI for tracking finger movement is often required. Ultrasound, a safe and non-invasive imaging technique, has shown great promise as an alternative HMI that addresses the shortcomings of vision-based and glove-based sensors. This thesis develops a first-in-class system enabling real-time regression of individual and simultaneous finger flexions using single element ultrasound transducers. A comprehensive dataset of ultrasound signals is collected from a study of 10 users. A series of machine learning experiments using this dataset demonstrates promising results supporting the use of single element transducers as an HMI device.
  • Item
    Towards an embodied musical mind: Generative algorithms for robotic musicians
    (Georgia Institute of Technology, 2017-04-19) Bretan, Peter Mason
Embodied cognition is a theory stating that the processes and functions comprising the human mind are influenced by a person's physical body. The theory of embodied musical cognition holds that a person's body largely influences his or her musical experiences and actions. This work presents multiple frameworks for computer music generation as it pertains to robotic musicianship such that the musical decisions result from a joint optimization between the robot's physical constraints and musical knowledge. First, a generative framework based on hand-designed higher level musical concepts and the Viterbi beam search algorithm is described. The system allows for efficient and autonomous exploration of the relationship between music and physicality and the resulting music that is contingent on such a connection. It is evaluated objectively based on its ability to plan a series of sound actuating robotic movements (path planning) that minimize risk of collision, the number of dropped notes, spurious movements, and energy expenditure. Second, a method for developing higher level musical concepts (semantics) based on machine learning is presented. Using strategies based on neural networks and deep learning we show that it is possible to learn perceptually meaningful higher-level representations of music. These learned musical "embeddings" are applied to an autonomous music generation system that utilizes unit selection. The embeddings and generative system are evaluated based on objective ranking tasks and a subjective listening study. Third, the method for learning musical semantics is extended to a robot such that its embodiment becomes integral to the learning process. The resulting embeddings simultaneously encode information describing both important musical features and the robot's physical constraints.
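The joint optimization of musical fit and physical cost described in the first framework can be sketched as a Viterbi-style beam search. This is a minimal illustrative sketch, not the thesis implementation: the candidate notes, `musicalCost` values, and `moveCost` function are assumptions standing in for the system's hand-designed musical concepts and robot kinematics.

```javascript
// Toy Viterbi-style beam search: at each time step, extend every
// hypothesis with every candidate note, scoring musical cost plus a
// physical movement cost relative to the previous note, then prune.
function beamSearch(candidateSteps, moveCost, beamWidth) {
  let beam = [{ path: [], cost: 0 }]; // each hypothesis: notes so far + total cost
  for (const candidates of candidateSteps) {
    const expanded = [];
    for (const hyp of beam) {
      for (const cand of candidates) {
        const prev = hyp.path[hyp.path.length - 1];
        const physical = prev === undefined ? 0 : moveCost(prev, cand.note);
        expanded.push({
          path: [...hyp.path, cand.note],
          cost: hyp.cost + cand.musicalCost + physical,
        });
      }
    }
    expanded.sort((a, b) => a.cost - b.cost);
    beam = expanded.slice(0, beamWidth); // keep only the best hypotheses
  }
  return beam[0]; // lowest-cost joint note/movement plan
}
```

Pruning to a fixed beam width is what keeps the search tractable while still trading off musical preference against movement effort at every step.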
  • Item
    reNotate: The Crowdsourcing and Gamification of Symbolic Music Encoding
    (Georgia Institute of Technology, 2016-04) Taylor, Benjamin ; Shanahan, Daniel ; Wolf, Matthew ; Allison, Jesse ; Baker, David John
Musicologists and music theorists have, for quite some time, hoped to make use of computational methods to examine large corpora of music. As far back as the 1940s, an IBM card-sorter was used to implement pattern-finding in traditional British folk songs (Bronson 1949, 1959). Alan Lomax famously implemented statistical methods in his Cantometrics project (Lomax, 1968), which sought to collate a large corpus of folk music from across many cultures. In the 1980s and 90s, a number of encoding projects were instituted in an attempt to make music notation searchable on a large scale. The Essen Folksong Collection (Schaffrath, 1995) collected ethnographic transcriptions, whereas projects at the Center for Computer Assisted Research in the Humanities (CCARH) focused on scores in the Western Art Music tradition (Bach chorales, Mozart sonatas, instrumental themes, etc.). Recently, scholars have focused on improving Optical Music Recognition, in the hopes of facilitating the acquisition of large numbers of musical scores (Fujinaga, et al., 2014), but non-notated music, such as improvisational jazz, is often overlooked. While there have been many advances in music information retrieval in recent years, parameters that would facilitate in-depth musicological analysis are still out of reach (for example, stream segregation to examine specific melodic lines, or the analysis of harmony at a resolution that would allow for an analysis of specific chord voicings). Our project seeks to implement methods similar to those used in CAPTCHA and RECAPTCHA technology to crowdsource the symbolic encoding of musical information through a web-based gaming interface. The introductory levels ask participants to tap along with an audio recording's tempo, giving us an approximate BPM, while the second level asks participants to tap along with onsets.
The third level asks them to match a contour of a three-note segment, and the final stage asks for specific note matching within that contour. A social-gaming interface allows for users to compete against one another. It is our hope that this work can be generalized to many types of musical genres, and that a web-based framework might facilitate the encoding of musicological and music-theoretic datasets that might be underrepresented by current MIR work.
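The first level's tap-to-tempo step amounts to turning a list of tap timestamps into a BPM estimate. A minimal sketch of that computation follows; the function name and the choice of the median inter-tap interval (rather than the mean) are illustrative assumptions, not details taken from the reNotate implementation.

```javascript
// Estimate BPM from tap timestamps (in seconds). Using the median
// inter-tap interval makes the estimate robust to a few mistimed taps.
// Requires at least two taps.
function estimateBpm(tapTimes) {
  const intervals = [];
  for (let i = 1; i < tapTimes.length; i++) {
    intervals.push(tapTimes[i] - tapTimes[i - 1]);
  }
  intervals.sort((a, b) => a - b);
  const mid = Math.floor(intervals.length / 2);
  const median = intervals.length % 2
    ? intervals[mid]
    : (intervals[mid - 1] + intervals[mid]) / 2;
  return 60 / median; // seconds per beat -> beats per minute
}
```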
  • Item
    Live Looping Electronic Music Performance with MIDI Hardware
    (Georgia Institute of Technology, 2016-04) McKegg, Matt
I have spent much of the last three years building a Web Audio based desktop application for live electronic music performance called Loop Drop. It was inspired by my own frustration with existing tools for live performance. I wanted a tool that would give me, as a performer, the desired level of control and expression but still feel like playing a musical instrument instead of programming a computer. My application was built using web technologies such as JavaScript and HTML5, leveraging my existing experience as a web developer and providing an excellent workflow for quick prototyping and user interface design. In combination with Electron, a single developer can build a desktop audio application very efficiently. Loop Drop uses Web MIDI to interface with hardware such as the Novation Launchpad. The software allows creation of sounds using synthesis and sampling, and arranges these into "chunks" which may be placed in any configuration across the MIDI controller's button grid. These sounds may be triggered directly, or played quantised to the current tempo at a given rate using "beat repeat". Everything the performer plays is collected in a buffer that at any time may be turned into a loop. This allows the performer to avoid recording anxiety — a common problem with most live looping systems. They can jam out ideas, then once happy with the sequence, press the loop button to lock it in. In my performance, I will use Loop Drop in conjunction with multiple Novation Launchpad MIDI controllers, to improvise 15 minutes of electronic music using sounds that I have organised ahead of time. The user interface will be visible to the audience as a projection. I will also be demonstrating the power of hosting Web Audio in Electron by interfacing with an external LED array connected over Serial Peripheral Interface Bus (SPI) to be used as an audio visualiser light show.
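The "beat repeat" behaviour described above boils down to snapping a trigger to the next point on a tempo-derived grid. The following is a sketch of that idea under stated assumptions — the function name and parameters are illustrative, not Loop Drop's actual API.

```javascript
// Snap a trigger time to the next quantise grid point.
// tempoBpm: current tempo; rate: grid size in beats (0.25 = sixteenth notes).
// In a Web Audio context, `now` would typically be audioContext.currentTime.
function nextQuantisedTime(now, tempoBpm, rate) {
  const gridSeconds = (60 / tempoBpm) * rate; // spacing of the grid in seconds
  return Math.ceil(now / gridSeconds) * gridSeconds;
}
```

Scheduling the sound at the returned time, rather than immediately, is what makes rapid button presses land in tempo.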
  • Item
    An interactive, graphical coding environment for EarSketch online using Blockly and Web Audio API
    (Georgia Institute of Technology, 2016-04) Mahadevan, Anand ; Freeman, Jason ; Magerko, Brian
This paper presents an interactive graphical programming environment for EarSketch, using Blockly and the Web Audio API. This visual programming element sidesteps syntactical challenges common to learning text-based languages, thereby targeting a wider range of users in both informal and academic settings. The implementation allows seamless integration with the existing EarSketch web environment, saving block-based code to the cloud as well as exporting it to Python and JavaScript.
  • Item
    BPMTimeline: JavaScript Tempo Functions and Time Mappings using an Analytical Solution
    (Georgia Institute of Technology, 2016-04) Dias, Bruno ; Pinto, H. Sofia ; Matos, David M.
Time mapping is a common feature in many (commercial and/or open-source) Digital Audio Workstations, allowing the musician to automate tempo changes of a musical performance or work, as well as to visualize the relation between score time (beats) and real/performance time (seconds). Unfortunately, available music production, performance, and remixing tools implemented with web technologies like JavaScript and the Web Audio API do not offer any mechanism for flexible and seamless tempo manipulation and automation. In this paper, we present BPMTimeline, a time mapping library providing a seamless mapping between score and performance time. To achieve this, we model tempo changes as tempo functions (a well documented subject in the literature) and realize the mappings through the integral of the tempo functions and its inverse.
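For a single linear tempo ramp, the integral described above has a closed form: with tempo T(b) = T0 + k·b (BPM as a function of beats), elapsed time is ∫ 60/T(b) db = (60/k)·ln(T(b)/T0). The sketch below illustrates this analytical mapping and its inverse for one segment; the function names and signatures are illustrative, not BPMTimeline's API.

```javascript
// Beat -> seconds for one linear tempo segment, from the analytical
// integral of 60 / (startBpm + k*b), where k is BPM change per beat.
function beatsToSeconds(beats, startBpm, endBpm, segmentBeats) {
  const k = (endBpm - startBpm) / segmentBeats;
  if (k === 0) return (60 / startBpm) * beats; // constant-tempo special case
  return (60 / k) * Math.log((startBpm + k * beats) / startBpm);
}

// Seconds -> beats: the inverse mapping, solving the integral for b.
function secondsToBeats(seconds, startBpm, endBpm, segmentBeats) {
  const k = (endBpm - startBpm) / segmentBeats;
  if (k === 0) return seconds * (startBpm / 60);
  return (startBpm / k) * (Math.exp((seconds * k) / 60) - 1);
}
```

A full timeline would chain such segments, accumulating the beat and time offsets at each tempo change.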
  • Item
    rMIXr: how we learned to stop worrying and love the graph
    (Georgia Institute of Technology, 2016-04) Fields, Ben ; Phippen, Sam
In this talk we present a case study in the use of the Web Audio APIs, specifically, our use of them for the creation of a rapidly developed prototype application. The app, called rMIXr, is a simple digital audio workstation (DAW) for fan remix contests. We created rMIXr in 48 hours at the Midem Hack Day in June 2015. We'll give a brief demo of the app and show multi-channel sync. We'll also show various effects as well as cutting/time-slicing.
  • Item
    Virtual Sound Gallery
    (Georgia Institute of Technology, 2016-04) Bundin, Andrey
    Virtual Sound Gallery (VSG) is a web stage for modern multichannel music, sound, and audiovisual art. It is an accessible, web-based virtual reality (VR) environment for a visualized binaural simulation of multichannel sound reproduction. In this environment, a user can change their location among virtual loudspeakers and rotate their head to get the best spatial listening experience. In addition, an integrated video engine provides the ability to play visual content on one or several virtual screens in sync with the audio. VSG provides access to different electroacoustic music compositions presented in several virtual exhibitions and classified by concepts, styles, and organizations.