Automatic Speaker Recognition 于嘉威 2018/8/13
Outline
• First, since some of you are not familiar with SRE-related work, I will quickly review what I presented at the biweekly task report, to give everyone an intuitive picture
• Second, I will summarize the key points of some recent research (mostly the SRE-related content of ICASSP 2018), along with some of my own thoughts and questions
Outline
• Introduction
• The i-vector methodology of speaker recognition
• The d-vector methodology of speaker recognition
• The end-to-end methodology of speaker recognition
• Inter-speaker variability in speaker recognition
• Example of variations in speaker recognition
• State-of-the-art approach in SRE
Introduction
• Definition: speaker recognition is the task of recognizing a person from their voice
• Speaker recognition covers three tasks: speaker identification, speaker verification, and speaker diarization
• Further distinctions shown in the diagram: text-dependent vs. text-independent; open-set vs. closed-set
Speaker Identification
• Definition: Determine whether an unknown speaker matches one of a set of known speakers
• One-to-many mapping
• It is often assumed that the unknown voice must come from the set of known speakers; this is referred to as closed-set identification
• Adding a “none of the above” option to closed-set identification gives open-set identification
(Figure caption: Whose voice is this?)
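The closed-set and open-set cases above can be sketched in a few lines. This is only an illustration, not a method from the slides: it assumes each speaker is already represented by a fixed-length vector, uses cosine similarity as the score, and the names `identify`, `enrolled`, and the threshold value are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify(test_vec, enrolled, threshold=None):
    """One-to-many search over all enrolled speakers.

    threshold=None  -> closed-set: always return the top-scoring speaker.
    threshold=float -> open-set: return None ("none of the above") when
                       even the best score falls below the threshold.
    """
    best_name, best_score = max(
        ((name, cosine(test_vec, vec)) for name, vec in enrolled.items()),
        key=lambda pair: pair[1],
    )
    if threshold is not None and best_score < threshold:
        return None
    return best_name

# Hypothetical 3-dimensional speaker vectors, for illustration only.
enrolled = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
unknown = [0.6, 0.4, 0.7]

closed_result = identify(unknown, enrolled)               # forced choice
open_result = identify(unknown, enrolled, threshold=0.8)  # may reject
```

In the closed-set call the unknown voice is forced onto the nearest enrolled speaker, while the open-set call rejects it because no score reaches the (assumed) threshold.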
Speaker Verification
• Determine whether an unknown speaker matches a specific (claimed) speaker
• One-to-one mapping
• Closed-set verification: The population of clients is fixed
• Open-set verification: New clients can be added without having to redesign the system
(Figure caption: Is this Bob's voice?)
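Verification reduces to a single accept/reject decision against one claimed model. The sketch below assumes vector speaker models scored by cosine similarity; the function `verify`, the vectors, and the threshold of 0.7 are hypothetical values chosen for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(test_vec, claimed_model, threshold=0.7):
    """One-to-one check: accept the identity claim if the score is high enough."""
    return cosine(test_vec, claimed_model) >= threshold

# Hypothetical model for the claimed speaker "Bob".
bob_model = [0.2, 0.9, 0.1]

accepted = verify([0.25, 0.85, 0.05], bob_model)  # close to Bob's model
rejected = verify([0.9, 0.1, 0.3], bob_model)     # far from Bob's model
```

Unlike identification, the cost of a trial does not grow with the number of enrolled clients, since only the claimed model is scored.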
Speaker Diarization
• Determine when a speaker change has occurred in the speech signal (segmentation)
• Group together speech segments corresponding to the same speaker (clustering)
• Prior speaker information may or may not be available
(Figure captions: Where are speaker changes? Which segments are from the same speaker?)
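The two steps, segmentation followed by clustering, can be illustrated with a toy example. This is a deliberately simplified sketch: it assumes one scalar feature per frame, detects a change when consecutive frames differ by more than a threshold, and clusters segments greedily. Real diarization systems use far richer features and models; the function names and thresholds are hypothetical.

```python
def segment(frame_feats, change_threshold=1.0):
    """Segmentation: cut wherever consecutive frames differ strongly.

    Returns a list of (start, end) frame index pairs, end exclusive.
    """
    boundaries = [0]
    for i in range(1, len(frame_feats)):
        if abs(frame_feats[i] - frame_feats[i - 1]) > change_threshold:
            boundaries.append(i)
    boundaries.append(len(frame_feats))
    return list(zip(boundaries, boundaries[1:]))

def cluster(frame_feats, segments, merge_threshold=1.0):
    """Clustering: give each segment the label of the first cluster whose
    centroid is within merge_threshold, otherwise open a new cluster."""
    centroids, labels = [], []
    for start, end in segments:
        seg_mean = sum(frame_feats[start:end]) / (end - start)
        for idx, centroid in enumerate(centroids):
            if abs(seg_mean - centroid) <= merge_threshold:
                labels.append(idx)
                break
        else:
            centroids.append(seg_mean)
            labels.append(len(centroids) - 1)
    return labels

# Hypothetical per-frame features: speaker A, then speaker B, then A again.
feats = [0.1, 0.2, 0.0, 5.0, 5.1, 4.9, 0.1, 0.0]
segs = segment(feats)
labels = cluster(feats, segs)
```

No prior speaker information is used here: the number of speakers falls out of the clustering step.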
Introduction: Generic Speaker Recognition System
• Basic structure of a speaker recognition system
[Diagram] Test path: Unknown Speech → Preprocessing → Analysis Frames → Feature Extraction → Feature Vector → Pattern Matching → Scoring → Decision
[Diagram] Enrollment path: Enrollment → Preprocessing → Feature Extraction → Speaker Models (used by Pattern Matching)
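The block diagram can be sketched end to end as follows. Every stage here is a stand-in chosen to keep the example short, not what a real system uses: DC-offset removal for preprocessing, frame energy and zero-crossing count as features, an averaged feature vector as the speaker model, and negative Euclidean distance as the score. All function names and the threshold are hypothetical. `math.dist` requires Python 3.8 or later.

```python
import math

def preprocess(signal):
    """Preprocessing stand-in: remove the DC offset."""
    mean = sum(signal) / len(signal)
    return [s - mean for s in signal]

def frames(signal, size=4):
    """Split the signal into non-overlapping analysis frames."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]

def extract_features(frame):
    """Toy feature vector per frame: [energy, zero-crossing count]."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    return [energy, float(crossings)]

def utterance_vector(signal):
    """Average the per-frame features into one vector per utterance."""
    feats = [extract_features(f) for f in frames(preprocess(signal))]
    return [sum(col) / len(feats) for col in zip(*feats)]

def enroll(signal):
    """Enrollment path: the speaker model is just the utterance vector."""
    return utterance_vector(signal)

def score(signal, model):
    """Pattern matching: negative Euclidean distance (higher is closer)."""
    return -math.dist(utterance_vector(signal), model)

def decide(signal, model, threshold=-0.5):
    """Decision: accept when the score reaches the threshold."""
    return score(signal, model) >= threshold

# Hypothetical signals: a rapidly alternating "voice" and a slow one.
model = enroll([1, -1, 1, -1, 1, -1, 1, -1])
same_speaker = decide([0.9, -0.9, 0.9, -0.9, 0.9, -0.9, 0.9, -0.9], model)
other_speaker = decide([1, 1, 1, 1, -1, -1, -1, -1], model)
```

The enrollment and test paths share the same preprocessing and feature extraction code, which mirrors the two parallel branches of the diagram.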
Introduction: Main Research Fields in SRE
• Feature extraction
• Pattern matching
• Scoring method
PROPERTIES OF IDEAL FEATURES
Ideally, a feature parameter should (F. Nolan, 1983):
• show high between-speaker variability and low within-speaker variability
• be resistant to attempted disguise or mimicry
• have a high frequency of occurrence in relevant materials
• be robust in transmission
• be relatively easy to extract and measure
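The first property can be made concrete with a variance ratio. The sketch below is not taken from Nolan; it assumes a single scalar feature and uses an F-ratio style measure (variance of the speaker means divided by the average within-speaker variance), so that a larger value suggests a more discriminative feature. The data and the name `f_ratio` are hypothetical.

```python
from statistics import mean, pvariance

def f_ratio(samples_by_speaker):
    """Between-speaker variance divided by mean within-speaker variance."""
    speaker_means = [mean(values) for values in samples_by_speaker.values()]
    between = pvariance(speaker_means)
    within = mean([pvariance(values) for values in samples_by_speaker.values()])
    return between / within

# Hypothetical measurements of one feature for two speakers.
good_feature = {"a": [1.0, 1.2, 0.8], "b": [3.0, 3.2, 2.8]}  # speakers well separated
poor_feature = {"a": [1.0, 3.0, 2.0], "b": [2.2, 1.2, 3.2]}  # speakers overlap

good = f_ratio(good_feature)
poor = f_ratio(poor_feature)
```

The well-separated feature scores far higher than the overlapping one, which is the behavior the first bullet asks for.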