VIU Oct 2007: Speaker Recognition (F. Schiel)
Florian Schiel, Venice International University
Oct 2007
Speaker Recognition = Speaker Identification, Speaker Verification
Agenda
• See the Context
• Speech Recognition vs. Speaker Recognition
• Speaker Identification vs. Speaker Verification
• Speaker Recognition: Basics
• Speaker Verification using HMM
• Discussion
• and then ...
General Approach to Authentication
• Three general ways to perform authentication:
- proof of knowledge (e.g. password)
- proof of possession (e.g. chip card)
- proof of property (biometrics)
and their combinations
• Biometrics: physiology-based vs. behaviour-based
• Biometric features: fingerprint, iris scan, facial scan, hand geometry, signature, voice
from U. Türk 2007
Biometric Features: General Requirements
• universal: can be found in any user
• unique: even for identical twins
• measurable: does not require human evaluation
• robust to short-term and long-term variability
• low dimensionality
• robust to changing environment
• robust to impersonation
from U. Türk 2007
++++++ooo+
Taxonomy of Speech Processing
[Diagram relating Natural Language Processing (NLP) and Spoken Language Processing (SLP). Labels: Lexica, Syntax, Parsing, Spellers, Search / Indexing, Semantics, Terminology, Thesaurus, Dialogue systems, Speech Identification, Speech Synthesis, Speaker recognition, Speech Recognition, Forensics]
Speech Recognition
"Decode the spoken content from the acoustic signal"
Speaker Recognition
"Determine the identity of a speaker from acoustic signal"
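[Diagram: ASR uses speech models to turn the signal into text, e.g. "Sehr geehrter .." (German: "Dear .."). SI/SV uses speaker characteristics and, for verification, a claimed identity; the output is an ID (identification) or Accepted/Rejected (verification)]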
Speaker Recognition
Speaker Verification
• Authentication according to claimed identity
• Result is binary: "accept" / "reject"
• Scaling: effort independent of the number of participants
• Accuracy: depends on the size of the enrolment data
Speaker Identification
• Identification from a limited number of participants
• Result is the speaker identity
• Scaling: effort increases linearly with the number of participants
• Accuracy: depends on
+ size of enrolment data
+ number of participants
[Diagram, verification outcomes: identity ok + accept = correct accept; identity ok + reject = false reject; identity wrong + reject = correct reject; identity wrong + accept = false accept]
[Plot: correctness (up to 100) over the number of participants N]
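The scaling difference between the two tasks can be sketched as follows. This is a minimal illustration, not the method of the slides: the `score` function (a negative squared distance to a stored mean vector), the speaker names and the threshold are all assumptions made for the example. Verification evaluates one model, identification evaluates all N.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def score(model: Vector, features: Vector) -> float:
    """Toy similarity (assumption): negative squared distance to the model mean."""
    return -sum((m - x) ** 2 for m, x in zip(model, features))


def verify(models: Dict[str, Vector], claimed_id: str,
           features: Vector, threshold: float) -> Tuple[bool, int]:
    """One model evaluation, regardless of how many speakers are enrolled."""
    s = score(models[claimed_id], features)
    return s > threshold, 1


def identify(models: Dict[str, Vector], features: Vector) -> Tuple[str, int]:
    """N model evaluations: effort grows linearly with the number of speakers."""
    scores = {name: score(model, features) for name, model in models.items()}
    best = max(scores, key=lambda name: scores[name])
    return best, len(scores)


# Hypothetical enrolled speakers and one test utterance.
models = {"anna": [1.0, 2.0], "ben": [4.0, 0.5], "carla": [-2.0, 3.0]}
utterance = [0.9, 2.2]

accepted, n_eval_verify = verify(models, "anna", utterance, threshold=-0.5)
speaker, n_eval_identify = identify(models, utterance)
print(accepted, n_eval_verify, speaker, n_eval_identify)
```

With more enrolled speakers, `identify` also has more chances to pick a wrong neighbour, which is one way to read the dependence of accuracy on the number of participants.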
Speaker Verification
• Applications:
– Access control
– Verification of identity via the phone
– Automatic teller machines
– Password resetting
– Banking: identity for new accounts etc.
– Protection against theft (cars ...)
Speaker Identification
• Applications:
– Forensics
– Police work
– Automatic user settings
– Speaker classification: advertising
Speaker Verification: Doddington's Zoo (1)
User = registered speaker, Impostor = non-registered speaker
• Goats : users that are often rejected wrongly (increasing 'false reject' errors)
• Lambs : users that are easily imitated (increasing 'false accept' errors)
• Sheep : users that 'behave' (not goats and not lambs)
• Wolves : particularly successful impostors (increasing 'false accept' errors)
from Doddington 1998
Speaker Verification: Doddington's Zoo (2)
Wolves may perform zero-effort or active impostor attempts to break into an SV system.
Problem: speaker verification databases do not contain data from active impostor attempts by wolves -> most technical evaluations are based on non-realistic data!
Technical Speech Processing
[Block diagram: analog signal -> anti-aliasing filter -> A / D -> digital signal -> highpass -> feature detection -> vectors (m1 ... mN per frame, time axis ticks 0, 10, 20, ...) -> decoder -> symbols]
Symbols:
• Text
• Action
• Semantics
Examples: "Call Richard!", "Radio off!", "216"
Speaker Verification: Basics (1)
[Block diagram: anti-aliasing filter -> A / D -> highpass -> feature detection -> verification -> "Accept" / "Reject". The claimed identity (PIN, fingerprint, ASR) selects the ID whose entry in the speaker models is used for verification]
Speaker Verification: Basics (2)
[Block diagram: anti-aliasing filter -> A / D -> highpass -> feature detection -> verification -> "Accept" / "Reject". Detail for the first two blocks: filter response over frequency f with cut-off at f_sam / 2]
Analog low-pass filter to avoid aliasing effects + analog-digital converter
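Why the low-pass filter has to cut off at f_sam / 2 can be shown numerically: after sampling, a tone above half the sampling rate is indistinguishable from a lower "alias" tone. The sketch below assumes a sampling rate of 8000 Hz and a 5000 Hz test tone; both values are chosen for illustration only.

```python
import math


def alias_frequency(f_hz: float, f_sam_hz: float) -> float:
    """Apparent frequency of a sinusoid at f_hz after sampling at f_sam_hz."""
    folded = f_hz % f_sam_hz
    return f_sam_hz - folded if folded > f_sam_hz / 2 else folded


F_SAM = 8000.0   # assumed sampling rate in Hz
tone = 5000.0    # above F_SAM / 2, so it will alias
alias = alias_frequency(tone, F_SAM)

# The sampled sequences of the tone and of its alias are identical.
samples_tone = [math.cos(2 * math.pi * tone * n / F_SAM) for n in range(16)]
samples_alias = [math.cos(2 * math.pi * alias * n / F_SAM) for n in range(16)]
max_diff = max(abs(a - b) for a, b in zip(samples_tone, samples_alias))
print(alias, max_diff)
```

Once the signal is sampled the two tones cannot be separated any more, which is why the filtering has to happen in the analog domain, before the A / D converter.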
Speaker Verification: Basics (3)
Features:
• speaker specific
• robust against noise
• partly long term
[Block diagram: anti-aliasing filter -> A / D -> highpass -> feature detection -> verification -> "Accept" / "Reject". Detail for feature detection (feature computation): extraction of speaker characteristics from a 25 ms window, giving one feature vector (m1 ... mN) per frame at 0, 10, 20, 30, 40, ...]
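The windowing step can be sketched as below. The 25 ms window is taken from the slide; the 10 ms frame shift is an assumption read off the axis ticks (0, 10, 20, ...), and the 8000 Hz sampling rate and the log-energy "feature" are placeholders for illustration, not the features used in the lecture.

```python
import math
from typing import List


def frame_signal(signal: List[float], sample_rate: int,
                 window_ms: float = 25.0, shift_ms: float = 10.0) -> List[List[float]]:
    """Cut the signal into overlapping windows of window_ms, advanced by shift_ms."""
    window = int(sample_rate * window_ms / 1000)
    shift = int(sample_rate * shift_ms / 1000)
    frames = []
    start = 0
    while start + window <= len(signal):
        frames.append(signal[start:start + window])
        start += shift
    return frames


def log_energy(frame: List[float]) -> float:
    """Placeholder feature (assumption): log of the frame energy."""
    return math.log(sum(x * x for x in frame) + 1e-12)


SAMPLE_RATE = 8000  # assumed sampling rate in Hz
# 100 ms of a 200 Hz sine as a stand-in for a speech signal.
signal = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(800)]

frames = frame_signal(signal, SAMPLE_RATE)
features = [log_energy(frame) for frame in frames]
print(len(frames), len(frames[0]))
```

A real front end would replace `log_energy` with a vector of N values per frame (m1 ... mN in the diagram).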
Speaker Verification: Basics (4)
[Block diagram: anti-aliasing filter -> A / D -> highpass -> feature detection -> verification -> "Accept" / "Reject". Detail for verification: the vector sequence S (m1 ... mN per frame at 10, 20, ...) is scored against the speaker model of the claimed ID]
Decision:
• p(S | ID) > threshold: "Accept"
• p(S | ID) < threshold: "Reject"
Speaker Verification: Tuning
• Error types highly dependent on threshold
high security -> false accept low, false reject high
user friendly -> false reject low, false accept high
[Plot: false accept and false reject rates over the threshold; the two curves cross at the Equal Error Rate]
• Both errors increase by:
- channel disturbance
- crosstalk
- noise
- room acoustics
• Solution:
- multiple enrolments
- adaptive learning
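The trade-off between the two error types can be made concrete with a small threshold sweep. The scores below are invented toy values, and the convention "accept if score > threshold" is an assumption matching the decision rule of the previous slide; the sketch only shows how the false accept rate, the false reject rate and the equal error rate relate to each other.

```python
from typing import List, Tuple


def error_rates(genuine: List[float], impostor: List[float],
                threshold: float) -> Tuple[float, float]:
    """Return (false accept rate, false reject rate) for 'accept if score > threshold'."""
    false_reject = sum(1 for s in genuine if s <= threshold) / len(genuine)
    false_accept = sum(1 for s in impostor if s > threshold) / len(impostor)
    return false_accept, false_reject


def equal_error_rate(genuine: List[float], impostor: List[float]) -> Tuple[float, float]:
    """Sweep all observed scores and pick the threshold where both rates are closest."""
    def gap(threshold: float) -> float:
        fa, fr = error_rates(genuine, impostor, threshold)
        return abs(fa - fr)

    candidates = sorted(set(genuine) | set(impostor))
    best = min(candidates, key=gap)
    fa, fr = error_rates(genuine, impostor, best)
    return best, (fa + fr) / 2


# Toy scores (hypothetical): true users score higher on average than impostors.
genuine = [2.1, 1.8, 2.5, 0.9, 1.6, 2.9, 1.2, 2.2]
impostor = [0.2, 1.0, -0.5, 0.7, 1.7, 0.1, -0.3, 0.5]

eer_threshold, eer = equal_error_rate(genuine, impostor)
strict_fa, strict_fr = error_rates(genuine, impostor, 1.7)  # "high security" setting
print(eer_threshold, eer, strict_fa, strict_fr)
```

Raising the threshold to 1.7 removes all false accepts in this toy set, at the price of rejecting three of the eight genuine attempts.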
Speaker Verification: Score Normalisation (1)
Problem: How to set the optimal threshold?
HMMs generate likelihoods $p(O \mid \lambda)$, not posterior probabilities:
O : observation = sequence of features
λ : speaker model
Bayes:
$$P(\lambda \mid O) = \frac{p(O \mid \lambda)\, P(\lambda)}{P(O)}$$
but $P(O)$ is dependent on various factors
Speaker Verification: Score Normalisation (2)
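Solution: Bayesian Decision Rule (accept if):
$$C_{FR}\, P(\lambda \mid O) > C_{FA}\, P(\bar{\lambda} \mid O)$$
with Bayes
$$P(\lambda \mid O) = \frac{p(O \mid \lambda)\, P(\lambda)}{P(O)}$$
and log to both sides this leads to:
$$\log p(O \mid \lambda) - \log p(O \mid \bar{\lambda}) > \log \frac{C_{FA}\, P(\bar{\lambda})}{C_{FR}\, P(\lambda)} = \text{threshold}$$
$C_{FR}$, $C_{FA}$ : cost functions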
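The intermediate steps between the decision rule and the log-domain threshold can be written out as follows. This is a sketch under two assumptions that the extracted slide does not state explicitly: $\bar{\lambda}$ denotes the competing (impostor) model, and the claim is accepted when the left-hand side is the larger one.

$$
\begin{aligned}
C_{FR}\, P(\lambda \mid O) &> C_{FA}\, P(\bar{\lambda} \mid O) \\
C_{FR}\, \frac{p(O \mid \lambda)\, P(\lambda)}{P(O)} &> C_{FA}\, \frac{p(O \mid \bar{\lambda})\, P(\bar{\lambda})}{P(O)} \\
\frac{p(O \mid \lambda)}{p(O \mid \bar{\lambda})} &> \frac{C_{FA}\, P(\bar{\lambda})}{C_{FR}\, P(\lambda)} \\
\log p(O \mid \lambda) - \log p(O \mid \bar{\lambda}) &> \log \frac{C_{FA}\, P(\bar{\lambda})}{C_{FR}\, P(\lambda)}
\end{aligned}
$$

The evidence $P(O)$ appears on both sides and cancels, so the decision needs only the two likelihoods, the costs and the priors.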
Speaker Verification: Score Normalisation (3)
Often assumed: costs are equal and speakers occur equally distributed
$$\log p(O \mid \lambda) - \log p(O \mid \bar{\lambda}) > \log (N - 1)$$
N : number of users and impostors
$p(O \mid \bar{\lambda})$ is estimated using a world or cohort model
world model : speaker model trained on all speakers
cohort model : speaker model trained on a group of most competing models (wolves)
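The normalised decision can be sketched in a few lines. The log-likelihood values are hypothetical numbers standing in for the output of a claimed-speaker model and a world model; the function names are illustrative. With equal costs and a prior of 1/N for the claimed speaker, the general threshold reduces to log(N - 1).

```python
import math


def decision_threshold(cost_fa: float, cost_fr: float, prior_target: float) -> float:
    """log( C_FA * P(impostor) / (C_FR * P(target)) ), with P(impostor) = 1 - P(target)."""
    prior_impostor = 1.0 - prior_target
    return math.log((cost_fa * prior_impostor) / (cost_fr * prior_target))


def accept(loglik_claimed: float, loglik_world: float, threshold: float) -> bool:
    """Accept if the log-likelihood ratio exceeds the threshold."""
    return (loglik_claimed - loglik_world) > threshold


N = 100  # assumed number of users and impostors
theta = decision_threshold(cost_fa=1.0, cost_fr=1.0, prior_target=1.0 / N)

# Hypothetical log-likelihoods of one utterance under the two models.
clear_case = accept(loglik_claimed=-4200.0, loglik_world=-4210.0, threshold=theta)
weak_case = accept(loglik_claimed=-4200.0, loglik_world=-4203.0, threshold=theta)
print(theta, clear_case, weak_case)
```

Because only the difference of the two log-likelihoods enters the decision, factors that shift both scores by the same amount do not move the result across the threshold.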
Speaker Verification: Enrolment
| Method | Enrolment | Remarks |
| --- | --- | --- |
| Fixed, pre-specified sentence, e.g. "My voice is my password" | Speak sentence 3 – 5 times | Sentence may be intercepted and played back |
| Fixed, selectable sentence, e.g. maiden name of grandmother | Speak sentence 3 – 5 times | Additional security by content |
| Changing number triplets, e.g. fifteen, thirty-nine, seventy-three | Speak each number 3 – 5 times | High security by many possible combinations |
| System generates a new sentence for each verification | Speak each phoneme 3 – 5 times | Elaborate enrolment, high processing effort, very high security |
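How "many possible combinations" arise from number triplets can be illustrated with a short sketch. The vocabulary of two-digit numbers (10 to 99) is an assumption chosen to match the example "fifteen, thirty-nine, seventy-three"; the slides do not specify the actual range.

```python
import secrets
from typing import List

LOW, HIGH = 10, 99  # assumed vocabulary: all two-digit numbers


def n_combinations(triplet_length: int = 3) -> int:
    """Number of distinct prompts if every position is drawn independently."""
    return (HIGH - LOW + 1) ** triplet_length


def new_prompt(triplet_length: int = 3) -> List[int]:
    """Draw a fresh, unpredictable prompt for one verification attempt."""
    return [LOW + secrets.randbelow(HIGH - LOW + 1) for _ in range(triplet_length)]


total = n_combinations()
prompt = new_prompt()
print(total, prompt)
```

Under these assumptions a recording of one earlier session matches a freshly drawn prompt only by chance, with probability 1 / `total`.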
Speaker Verification: HMM types
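| Method | Model |
| --- | --- |
| pre-specified sentence | linear |
| recombination of segments taken from enrolment data | piecewise linear |
| modeling without time structure | ergodic |
[Further columns "Security" and "Accuracy": the rating symbols did not survive extraction, only a single "o" remains]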
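The difference between a linear and an ergodic HMM lies in which state transitions are allowed. The sketch below builds initial transition matrices for both; the number of states and the probability values are arbitrary starting points chosen for illustration, not trained parameters from the lecture. A piecewise linear model (a concatenation of linear segment models) is not shown.

```python
from typing import List

Matrix = List[List[float]]


def linear_transitions(n_states: int, p_stay: float = 0.5) -> Matrix:
    """Left-to-right model: each state may only repeat or move to the next state."""
    a = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i == n_states - 1:
            a[i][i] = 1.0  # last state can only repeat
        else:
            a[i][i] = p_stay
            a[i][i + 1] = 1.0 - p_stay
    return a


def ergodic_transitions(n_states: int) -> Matrix:
    """Fully connected model: every state can follow every state (no time structure)."""
    return [[1.0 / n_states] * n_states for _ in range(n_states)]


linear = linear_transitions(4)
ergodic = ergodic_transitions(4)
for row in linear:
    print(row)
```

The zeros in the linear matrix enforce the temporal order of a fixed sentence, while the ergodic matrix leaves the order of the sounds open.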
Speaker Verification: Features (1)
Variable signal characteristics
• often required: telephone band 300 – 3300 Hz (higher resonances cut off)
• changing channel characteristics, caused by transmission line, handset, distance to mouth
• static and intermittent noise
• user: health, intoxication, fatigue
Speaker Verification: Features (2)
Candidates determined by physiology:
• fundamental frequency, average
• waveform of vocal folds: shimmer, jitter, irregularities
• formants: average and dynamics
• places of articulation: fricatives, plosives
• nasal cavity resonance
• sub-glottal resonance
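Two of these candidates, average fundamental frequency and the cycle-to-cycle irregularities (jitter for period length, shimmer for amplitude), can be computed from a list of glottal cycles. The sketch uses the common "local" definition (mean absolute difference of consecutive cycles divided by the mean); the period and amplitude values are invented, and extracting the cycles from a real signal is not covered.

```python
from typing import List


def relative_perturbation(values: List[float]) -> float:
    """Mean absolute difference of consecutive values, relative to the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    mean_abs_diff = sum(diffs) / len(diffs)
    return mean_abs_diff / (sum(values) / len(values))


# Hypothetical measurements for five consecutive glottal cycles.
periods_ms = [8.0, 8.2, 7.9, 8.1, 8.0]         # cycle durations
amplitudes = [0.80, 0.84, 0.78, 0.82, 0.80]    # peak amplitudes

mean_f0_hz = 1000.0 / (sum(periods_ms) / len(periods_ms))
jitter = relative_perturbation(periods_ms)
shimmer = relative_perturbation(amplitudes)
print(round(mean_f0_hz, 1), round(jitter, 4), round(shimmer, 4))
```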
Speaker Verification: Features (3)
Candidates determined by behaviour:
• voiced/unvoiced ratio
• fundamental frequency, dynamics
• syllable rate, pause/speech ratio
• dialectal features: vowel quality
Candidates determined by speech technology:
• Linear Predictor Coefficients (LPC)
• filter bank, Bark filter bank, Mel filter bank
• Cepstrum, Mel-Cepstrum
• (derivatives with respect to time)
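As an illustration of the Mel filter bank, the sketch below places filter centre frequencies at equal distances on the mel scale. It assumes the widely used conversion mel = 2595 * log10(1 + f / 700), the telephone band 300 – 3300 Hz mentioned earlier, and ten filters; the lecture does not state which variant or filter count was used.

```python
import math
from typing import List


def hz_to_mel(f_hz: float) -> float:
    """Common mel-scale approximation (assumed variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(f_low_hz: float, f_high_hz: float, n_filters: int) -> List[float]:
    """Centre frequencies in Hz, equally spaced on the mel scale between the band edges."""
    m_low, m_high = hz_to_mel(f_low_hz), hz_to_mel(f_high_hz)
    step = (m_high - m_low) / (n_filters + 1)
    return [mel_to_hz(m_low + step * (i + 1)) for i in range(n_filters)]


centers = mel_center_frequencies(300.0, 3300.0, 10)
spacings = [b - a for a, b in zip(centers, centers[1:])]
print([round(c) for c in centers])
```

The spacing between neighbouring centres grows with frequency, so the low frequencies are resolved more finely than the high ones.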
Speaker Verification: Road Map
[Timeline: 1990, today, 2010, 2020]
• Access control for security areas
• Authentication over the telephone
• Devices "recognise" their user
• Speaker profile on chip cards
• Access control for keyboardless PDAs
• Authentication in the background
• Public speaker profiles
• Automatic alcohol test in the vehicle
Thank You!