Title:
Adaptation of hybrid deep neural network-hidden Markov model speech recognition system using a sub-space approach

dc.contributor.advisor Anderson, David V.
dc.contributor.author Rizwan, Muhammad
dc.contributor.committeeMember Clements, Mark A.
dc.contributor.committeeMember Davenport, Mark A.
dc.contributor.committeeMember Inan, Omer T.
dc.contributor.committeeMember Liu, Fang (Cherry)
dc.contributor.committeeMember Daley, Wayne
dc.contributor.department Electrical and Computer Engineering
dc.date.accessioned 2018-08-20T15:31:14Z
dc.date.available 2018-08-20T15:31:14Z
dc.date.created 2017-08
dc.date.issued 2017-08-02
dc.date.submitted August 2017
dc.date.updated 2018-08-20T15:31:14Z
dc.description.abstract The performance of an automatic speech recognition (ASR) system can be enhanced by adapting it to a particular speaker or group of speakers. In ASR, training and testing data often do not follow the same statistics; this mismatch leads to a gap in performance. The difference between training and testing statistics can be minimized by speaker adaptation techniques, which require adaptation data from a target speaker to optimize system performance. In many cases, only a limited amount of adaptation data is available for the target speaker. This thesis proposes multiple methods for adapting a speech recognition system using a limited amount of data (a few words). The first method classifies the accent of a speaker to identify variability in speaking style. Results indicated that using multiple words from a speaker can be efficient and can provide better accent classification accuracy. Next, an adaptive phoneme classification method is proposed based on the target speaker's similarity to speakers in the training data. The DNN's last hidden layer activations are found to be more useful for identifying the phoneme classes of frames than traditional raw Mel-frequency cepstral coefficient features. Finally, speaker adaptation of the ASR system is presented by augmenting the speech features with speaker features. Universal background sparse coding can provide useful speaker information for speaker adaptation. These methods may open new research opportunities in ASR adaptation.
dc.description.degree Ph.D.
dc.format.mimetype application/pdf
dc.identifier.uri http://hdl.handle.net/1853/60171
dc.language.iso en_US
dc.publisher Georgia Institute of Technology
dc.subject Speaker adaptation
dc.subject Adaptive phoneme classification
dc.subject Deep neural networks
dc.subject Accent classification
dc.title Adaptation of hybrid deep neural network-hidden Markov model speech recognition system using a sub-space approach
dc.type Text
dc.type.genre Dissertation
dspace.entity.type Publication
local.contributor.advisor Anderson, David V.
local.contributor.corporatename School of Electrical and Computer Engineering
local.contributor.corporatename College of Engineering
relation.isAdvisorOfPublication eefeec08-2c7a-4e05-9f4b-7d25059e20a0
relation.isOrgUnitOfPublication 5b7adef2-447c-4270-b9fc-846bd76f80f2
relation.isOrgUnitOfPublication 7c022d60-21d5-497c-b552-95e489a06569
thesis.degree.level Doctoral
Files
Original bundle
Name:
RIZWAN-DISSERTATION-2017.pdf
Size:
1.13 MB
Format:
Adobe Portable Document Format
License bundle
Name:
LICENSE.txt
Size:
3.87 KB
Format:
Plain Text