Coupled Auto-Enrollment and Speaker Identification Platform for Real-Time Applications
Author(s)
Shu, Nicolas
Abstract
While many individuals experience a natural cognitive decline with age, a significant number experience a faster decline, which may manifest as dementia. Although there are many studies on preventative care for these individuals, the quantity of their daily social interactions plays a large role in whether or not they experience a rapid cognitive decline.
A monitoring system capable of identifying speakers with very little data would be a crucial first step toward understanding the nature of those interactions, but it is challenging to identify speakers with close to zero training data. There have been many advancements in speaker identification in the field of speech processing. Speaker identification differs from other classification problems in that it requires an enrollment component. Some high-performing frameworks need long durations of audio for each speaker, and while there has been progress on reducing the amount of training data needed to develop such speaker identification systems, research on this topic remains important. A system capable of quickly enrolling speakers for identification could lead to many more applications beyond preventative health care.
One example is enhanced subtitles for movies, TV shows, and other forms of media. While present-day subtitles inform the audience what is being said, they do not always indicate who is speaking. Hard-of-hearing audiences are poorly served by such a presentation, as they can only infer that the person speaking is present on the screen. If such a framework were capable of running as an online algorithm, thousands of hours of videos and podcasts could be properly tagged to assist deaf individuals. Additionally, if such a system were capable of running in real time, journalistic interviews, presidential debates, and sports commentaries could also benefit.
This work builds the backbone of, and a functional system for, real-time speaker identification, aiming to bridge these gaps for the purpose of monitoring the quantity of interactions for at-risk populations. It lays out a pathway for an individual to map the interior of a space (e.g., home, office), determine optimal locations to place microphone arrays, set up a server and edge nodes, and run an autonomous system capable of detecting new classes (i.e., speakers) with only 2.5 seconds of audio, auto-enrolling new speakers, and re-identifying the speakers in real time.
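The detect, auto-enroll, and re-identify loop described above can be illustrated with a minimal sketch. This is not the dissertation's implementation: it assumes each 2.5-second audio window has already been converted to a fixed-length embedding by some speaker-embedding model (not shown), and the names `SpeakerRegistry` and `identify_or_enroll`, the cosine-similarity matching, and the 0.7 threshold are all hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SpeakerRegistry:
    """Hypothetical open-set registry: unknown voices are enrolled on first sight."""

    def __init__(self, threshold: float = 0.7) -> None:
        # Assumed similarity threshold; a real system would tune this on data.
        self.threshold = threshold
        self.centroids: Dict[str, List[float]] = {}
        self.counts: Dict[str, int] = {}

    def identify_or_enroll(self, embedding: List[float]) -> Tuple[str, bool]:
        """Return (speaker_id, is_new) for one embedding of a short audio window."""
        best_id: Optional[str] = None
        best_score = -1.0
        for speaker_id, centroid in self.centroids.items():
            score = cosine_similarity(embedding, centroid)
            if score > best_score:
                best_id, best_score = speaker_id, score

        if best_id is not None and best_score >= self.threshold:
            # Re-identification: fold the new embedding into the running mean.
            n = self.counts[best_id]
            old = self.centroids[best_id]
            self.centroids[best_id] = [
                (c * n + e) / (n + 1) for c, e in zip(old, embedding)
            ]
            self.counts[best_id] = n + 1
            return best_id, False

        # No enrolled speaker is close enough: auto-enroll as a new class.
        new_id = f"speaker_{len(self.centroids) + 1}"
        self.centroids[new_id] = list(embedding)
        self.counts[new_id] = 1
        return new_id, True


if __name__ == "__main__":
    registry = SpeakerRegistry()
    # Toy 3-dimensional embeddings standing in for real model outputs.
    for window in ([1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]):
        print(registry.identify_or_enroll(window))
```

In this sketch, enrollment and identification share one code path, so a new speaker needs only a single window to become a known class; the trade-off is that the threshold alone decides between creating a new speaker and updating an existing one.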
Date
2024-04-17
Resource Type
Text
Resource Subtype
Dissertation