Title:
ATTENTION-BASED CONVOLUTIONAL NEURAL NETWORK MODEL AND ITS COMBINATION WITH FEW-SHOT LEARNING FOR AUDIO CLASSIFICATION

dc.contributor.advisor Anderson, David V.
dc.contributor.author Wang, You
dc.contributor.committeeMember Rozell, Christopher J.
dc.contributor.committeeMember Davenport, Mark A.
dc.contributor.committeeMember Plötz, Thomas
dc.contributor.committeeMember Dyer, Eva L.
dc.contributor.department Electrical and Computer Engineering
dc.date.accessioned 2022-08-25T13:39:52Z
dc.date.available 2022-08-25T13:39:52Z
dc.date.created 2022-08
dc.date.issued 2022-07-30
dc.date.submitted August 2022
dc.date.updated 2022-08-25T13:39:52Z
dc.description.abstract Environmental sound and acoustic scene classification are crucial tasks in audio signal processing and audio pattern recognition. In recent years, deep learning methods such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), and their combinations have achieved great success in these tasks. However, numerous challenges remain in this domain. For example, in most cases the sound events of interest are present in only a portion of the audio clip, and the clip may also suffer from background noise. Furthermore, in many application scenarios the amount of labelled training data is very limited; in such settings, few-shot learning methods, especially prototypical networks, have achieved great success. However, metric learning methods such as prototypical networks often suffer from poor feature embeddings of support samples or from outliers, and may not perform well on noisy data. The proposed work therefore seeks to overcome these limitations by introducing a multi-channel temporal attention-based CNN model and then incorporating a hybrid attention module into the prototypical network framework. Additionally, a Π-model is integrated into our model to improve performance on noisy data, and a new time-frequency feature is explored. Various experiments have shown that the proposed framework can address the issues mentioned above and provide promising results.
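As background for readers unfamiliar with the prototypical networks named in the abstract, the following is a minimal illustrative sketch of their core classification step (not the dissertation's implementation, and without the attention or Π-model components it proposes): each class prototype is the mean of its support embeddings, and a query is assigned to the nearest prototype. NumPy is assumed for brevity.

```python
import numpy as np

def prototypes(support: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Mean embedding per class; returns shape (n_classes, dim),
    with rows ordered by sorted class label."""
    classes = np.unique(labels)
    return np.stack([support[labels == c].mean(axis=0) for c in classes])

def classify(query: np.ndarray, protos: np.ndarray) -> np.ndarray:
    """Nearest-prototype label index for each query embedding,
    using Euclidean distance in the embedding space."""
    dists = np.linalg.norm(query[:, None, :] - protos[None, :, :], axis=-1)
    return dists.argmin(axis=1)
```

In a real few-shot pipeline the embeddings would come from a learned encoder (here they are raw vectors), and the thesis's contribution is precisely in making those embeddings more robust via attention and consistency regularization.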
dc.description.degree Ph.D.
dc.format.mimetype application/pdf
dc.identifier.uri http://hdl.handle.net/1853/67314
dc.language.iso en_US
dc.publisher Georgia Institute of Technology
dc.subject Deep learning
dc.subject Attention mechanism
dc.subject Few-shot learning
dc.subject Audio classification
dc.title ATTENTION-BASED CONVOLUTIONAL NEURAL NETWORK MODEL AND ITS COMBINATION WITH FEW-SHOT LEARNING FOR AUDIO CLASSIFICATION
dc.type Text
dc.type.genre Dissertation
dspace.entity.type Publication
local.contributor.advisor Anderson, David V.
local.contributor.corporatename School of Electrical and Computer Engineering
local.contributor.corporatename College of Engineering
relation.isAdvisorOfPublication eefeec08-2c7a-4e05-9f4b-7d25059e20a0
relation.isOrgUnitOfPublication 5b7adef2-447c-4270-b9fc-846bd76f80f2
relation.isOrgUnitOfPublication 7c022d60-21d5-497c-b552-95e489a06569
thesis.degree.level Doctoral
Files
Original bundle
Name: WANG-DISSERTATION-2022.pdf
Size: 12.05 MB
Format: Adobe Portable Document Format