Most people are familiar with security technology that scans a person’s handprint or eye for identification purposes. Now, thanks in part to research from North Carolina State University, we are closer to practical technology that can test someone’s voice to confirm their identity.
“The big picture is speaker authentication by computer,” says Dr. Robert Rodman, professor of computer science at NC State and co-author of a new paper on the subject. “The acoustic parameters of the voice are affected by the shape of the vocal tract, and different people have different vocal tracts,” Rodman explains. “This new research will help improve the speed of speech authentication, without sacrificing accuracy.”
Rodman explains that speech authentication could have a host of applications in this age of heightened security and mobile electronics. “Potential users of this technology include government, financial, health-care and telecommunications industries,” Rodman says, “for applications ranging from preventing ID theft and fraud to data protection.”
Current computer models used to compare acoustic profiles, effectively evaluating whether a speaker is who he or she claims to be, can take several seconds or more to process the information, which is still too long for the technology to gain widespread acceptance. "In order for this technology to gain traction among users," Rodman says, "the response time needs to improve without increasing the error rate."
To address this problem, Rodman and his fellow researchers modified existing computer models to streamline the authentication process so that it operates more efficiently. “This is part of the evolution of speech authentication software,” Rodman says, “and it moves us closer to making this technology a practical, secure tool.”
The research was co-authored by NC State’s Rodman; Rahim Saeidi, Tomi Kinnunen and Pasi Franti of the University of Joensuu in Finland; and Hamid Reza Sadegh Mohammadi of the Iranian Academic Center for Education, Culture & Research.
The research, “Joint Frame and Gaussian Selection for Text Independent Speaker Verification,” will be presented at the International Conference on Acoustics, Speech and Signal Processing (ICASSP) in Dallas, March 14-19. The research was funded, in part, by the Centre for International Mobility.
NC State’s Department of Computer Science is part of the university’s College of Engineering.
Note to Editors: The presentation abstract follows.
“Joint Frame and Gaussian Selection for Text Independent Speaker Verification”
Authors: Robert Rodman, North Carolina State University; Rahim Saeidi, Tomi Kinnunen and Pasi Franti, University of Joensuu; Hamid Reza Sadegh Mohammadi, Academic Center for Education, Culture & Research
Presented: March 14-19, 2010, at the International Conference on Acoustics, Speech and Signal Processing in Dallas.
Abstract: Gaussian selection is a technique applied in the GMM-UBM framework to accelerate score calculation. We have recently introduced a novel Gaussian selection method known as sorted GMM (SGMM). SGMM uses scalar-indexing of the universal background model mean vectors to achieve fast search of the top-scoring Gaussians. In the present work we extend this method by using 2-dimensional indexing, which leads to simultaneous frame and Gaussian selection. Our results on the NIST 2002 speaker recognition evaluation corpus indicate that both the 1- and 2-dimensional SGMMs outperform frame decimation and temporal tracking of top-scoring Gaussians by a wide margin (in terms of Gaussian computations relative to GMM-UBM as baseline).
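The core idea in the abstract is that, instead of evaluating every Gaussian in the universal background model for every speech frame, a scalar index over the mean vectors lets the scorer binary-search its way to a small neighbourhood of likely top-scoring Gaussians. The sketch below illustrates that indexing idea in simplified form; the toy model sizes, the sum-of-components scalar key, and the neighbourhood and top-C parameters are illustrative assumptions, not the authors' actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy UBM: 64 diagonal-covariance Gaussians in 12 dimensions.
n_gauss, dim = 64, 12
means = rng.normal(size=(n_gauss, dim))
variances = np.full((n_gauss, dim), 0.5)
weights = np.full(n_gauss, 1.0 / n_gauss)

# Sorted-GMM-style index: map each mean vector to a scalar (here, simply
# the sum of its components) and sort the Gaussians by that scalar.
scalar_key = means.sum(axis=1)
order = np.argsort(scalar_key)
sorted_keys = scalar_key[order]

def log_gauss(x, mu, var):
    """Log density of diagonal-covariance Gaussians for one frame x."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var).sum(axis=1)

def indexed_frame_score(x, m_search=8, c_top=4):
    """Score one frame against only a neighbourhood of m_search Gaussians
    found by binary search on the scalar index, then log-sum-exp the
    top c_top terms as an approximation to the full GMM score."""
    pos = np.searchsorted(sorted_keys, x.sum())
    lo = max(0, min(pos - m_search // 2, n_gauss - m_search))
    cand = order[lo:lo + m_search]
    ll = np.log(weights[cand]) + log_gauss(x, means[cand], variances[cand])
    top = np.sort(ll)[-c_top:]
    return top.max() + np.log(np.exp(top - top.max()).sum())

frame = rng.normal(size=dim)
approx = indexed_frame_score(frame)

# Full-GMM score over all 64 Gaussians, for comparison.
ll_all = np.log(weights) + log_gauss(frame, means, variances)
full = ll_all.max() + np.log(np.exp(ll_all - ll_all.max()).sum())
```

Because the approximate score sums only a subset of the mixture terms, it can never exceed the full-GMM score; the speed-up comes from evaluating 8 Gaussians per frame instead of 64. The paper's 2-dimensional extension additionally indexes over frames, so low-value frames are skipped as well.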