Title: Estimation of glottal source features from the spectral envelope of the acoustic speech signal

dc.contributor.advisor Moore, Elliot
dc.contributor.author Torres, Juan Félix en_US
dc.contributor.committeeMember Haas, Kevin
dc.contributor.committeeMember Hayes, Monson
dc.contributor.committeeMember Lee, Chin-Hui
dc.contributor.committeeMember Wu, Hongwei
dc.contributor.department Electrical and Computer Engineering en_US
dc.date.accessioned 2010-09-15T18:53:59Z
dc.date.available 2010-09-15T18:53:59Z
dc.date.issued 2010-05-17 en_US
dc.description.abstract Speech communication encompasses diverse types of information, including phonetics, affective state, voice quality, and speaker identity. From a speech production standpoint, the acoustic speech signal can be mainly divided into glottal source and vocal tract components, which play distinct roles in rendering the various types of information it contains. Most deployed speech analysis systems, however, do not explicitly represent these two components as distinct entities, as their joint estimation from the acoustic speech signal becomes an ill-defined blind deconvolution problem. Nevertheless, because of the desire to understand glottal behavior and how it relates to perceived voice quality, there has been continued interest in explicitly estimating the glottal component of the speech signal. To this end, several inverse filtering (IF) algorithms have been proposed, but they are unreliable in practice because of the blind formulation of the separation problem. In an effort to develop a method that can bypass the challenging IF process, this thesis proposes a new glottal source information extraction method that relies on supervised machine learning to transform smoothed spectral representations of speech, which are already used in some of the most widely deployed and successful speech analysis applications, into a set of glottal source features. A transformation method based on Gaussian mixture regression (GMR) is presented and compared to current IF methods in terms of feature similarity, reliability, and speaker discrimination capability on a large speech corpus, and potential representations of the spectral envelope of speech are investigated for their ability to represent glottal source variation in a predictable manner. The proposed system was found to produce glottal source features that reasonably matched their IF counterparts in many cases, while being less susceptible to spurious errors.
The development of the proposed method entailed a study into the aspects of glottal source information that are already contained within the spectral features commonly used in speech analysis, yielding an objective assessment regarding the expected advantages of explicitly using glottal information extracted from the speech signal via currently available IF methods, versus the alternative of relying on the glottal source information that is implicitly contained in spectral envelope representations. en_US
dc.description.degree Ph.D. en_US
dc.identifier.uri http://hdl.handle.net/1853/34736
dc.publisher Georgia Institute of Technology en_US
dc.subject Inverse filtering en_US
dc.subject Glottal waveform en_US
dc.subject Voice source en_US
dc.subject Speech processing en_US
dc.subject.lcsh Glottalization (Phonetics)
dc.subject.lcsh Speech synthesis
dc.subject.lcsh Machine learning
dc.subject.lcsh Supervised learning (Machine learning)
dc.title Estimation of glottal source features from the spectral envelope of the acoustic speech signal en_US
dc.type Text
dc.type.genre Dissertation
dspace.entity.type Publication
local.contributor.advisor Moore, Elliot
local.contributor.corporatename School of Electrical and Computer Engineering
local.contributor.corporatename College of Engineering
relation.isAdvisorOfPublication f9dfded5-fb9c-46e1-9c74-6efcda63ec13
relation.isOrgUnitOfPublication 5b7adef2-447c-4270-b9fc-846bd76f80f2
relation.isOrgUnitOfPublication 7c022d60-21d5-497c-b552-95e489a06569
Files
Original bundle
Name: torres_juan_f_201008_phd.pdf
Size: 1.76 MB
Format: Adobe Portable Document Format