Title:
Discriminative and adaptive training for robust speech recognition and understanding

dc.contributor.author Meng, Zhong
dc.contributor.committeeMember Lee, Chin-Hui
dc.contributor.committeeMember Moore, Elliot
dc.contributor.committeeMember McClellan, James H.
dc.contributor.committeeMember Xie, Yao
dc.contributor.department Electrical and Computer Engineering
dc.date.accessioned 2018-08-20T15:36:35Z
dc.date.available 2018-08-20T15:36:35Z
dc.date.created 2018-08
dc.date.issued 2018-07-25
dc.date.submitted August 2018
dc.date.updated 2018-08-20T15:36:35Z
dc.description.abstract Robust automatic speech recognition (ASR) and understanding (ASU) under various conditions remains a challenging problem even with the advances of deep learning. To achieve robust ASU, two discriminative training objectives are proposed for keyword spotting and topic classification: (1) To accurately recognize the semantically important keywords, non-uniform error cost minimum classification error training of deep neural network (DNN) and bi-directional long short-term memory (BLSTM) acoustic models is proposed to minimize the recognition errors of only the keywords. (2) To compensate for the mismatched objectives of speech recognition and understanding, minimum semantic error cost training of the BLSTM acoustic model is proposed to generate semantically accurate lattices for topic classification. Further, four adaptive training approaches are proposed to improve the robustness of ASR under different conditions and thereby extend the ASU system to a wider range of applications: (1) To suppress the effect of inter-speaker variability on a speaker-independent DNN acoustic model, speaker-invariant training is proposed to learn a deep representation in the DNN that is both senone-discriminative and speaker-invariant through adversarial multi-task training. (2) To achieve condition-robust unsupervised adaptation with parallel data, adversarial teacher-student learning is proposed to suppress multiple factors of condition variability during knowledge transfer from a well-trained source-domain LSTM acoustic model to the target domain. (3) To further improve adversarial learning for unsupervised adaptation with non-parallel data, domain separation networks are used to enhance the domain-invariance of the senone-discriminative deep representation by explicitly modeling the private component unique to each domain. (4) To achieve robust far-field ASR, an LSTM adaptive beamforming network is proposed to estimate real-time beamforming filter coefficients that cope with non-stationary environmental noise and the dynamic nature of source and microphone positions.
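
Note: the adversarial multi-task training mentioned in adaptive approach (1) can be illustrated with a minimal sketch (not taken from the dissertation) of learning a senone-discriminative, speaker-invariant representation in PyTorch via a gradient reversal layer. All module names, layer sizes, and the senone/speaker counts below are assumptions for illustration only.

    # Illustrative sketch: shared encoder feeds a senone classifier (primary task)
    # and a speaker classifier (adversarial task) through a gradient reversal layer,
    # so the encoder learns features that predict senones but confuse the speaker classifier.
    import torch
    import torch.nn as nn

    class GradReverse(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, lam):
            ctx.lam = lam
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            # Reverse and scale the gradient flowing back into the encoder.
            return -ctx.lam * grad_output, None

    class SpeakerInvariantNet(nn.Module):
        def __init__(self, feat_dim=40, hidden=512, n_senones=3000, n_speakers=100, lam=0.5):
            super().__init__()
            self.lam = lam
            self.encoder = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                                         nn.Linear(hidden, hidden), nn.ReLU())
            self.senone_head = nn.Linear(hidden, n_senones)    # primary task
            self.speaker_head = nn.Linear(hidden, n_speakers)  # adversarial task

        def forward(self, x):
            h = self.encoder(x)
            senone_logits = self.senone_head(h)
            speaker_logits = self.speaker_head(GradReverse.apply(h, self.lam))
            return senone_logits, speaker_logits

    # One training step: minimize senone loss while the reversed gradient
    # pushes the encoder toward speaker-invariant features.
    model = SpeakerInvariantNet()
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    ce = nn.CrossEntropyLoss()
    feats = torch.randn(32, 40)                 # a batch of acoustic frames (assumed 40-dim)
    senone_tgt = torch.randint(0, 3000, (32,))
    speaker_tgt = torch.randint(0, 100, (32,))
    senone_logits, speaker_logits = model(feats)
    loss = ce(senone_logits, senone_tgt) + ce(speaker_logits, speaker_tgt)
    opt.zero_grad(); loss.backward(); opt.step()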
dc.description.degree Ph.D.
dc.format.mimetype application/pdf
dc.identifier.uri http://hdl.handle.net/1853/60262
dc.language.iso en_US
dc.publisher Georgia Institute of Technology
dc.subject Discriminative training
dc.subject Adaptation
dc.subject Deep neural network
dc.subject Acoustic model
dc.title Discriminative and adaptive training for robust speech recognition and understanding
dc.type Text
dc.type.genre Dissertation
dspace.entity.type Publication
local.contributor.corporatename School of Electrical and Computer Engineering
local.contributor.corporatename College of Engineering
relation.isOrgUnitOfPublication 5b7adef2-447c-4270-b9fc-846bd76f80f2
relation.isOrgUnitOfPublication 7c022d60-21d5-497c-b552-95e489a06569
thesis.degree.level Doctoral
Files
Original bundle
Name: MENG-DISSERTATION-2018.pdf
Size: 1.34 MB
Format: Adobe Portable Document Format

License bundle
Name: LICENSE.txt
Size: 3.86 KB
Format: Plain Text