Voice Biometrics

Upload: mrz200

Post on 07-Apr-2018

TRANSCRIPT

  • 8/4/2019 Voice Bio Metrics

    1/22

    Voice Biometrics

    General Description

    Each individual has individual voice components called phonemes.

    Each phoneme has a pitch, cadence, and inflection.

    These three give each one of us a unique voice sound.

    The similarity in voice comes from cultural and regional influences in the form of accents.

    General Description

    According to the National Center for Voice and Speech, as one phonates, the vocal folds vibrate and produce a complex sound spectrum made up of a range of frequencies and overtones. As the spectrum travels through the various-sized areas in the vocal tract, some of the frequencies resonate more than others.

    Larger spaces resonate at lower frequencies.

    Smaller spaces resonate at higher frequencies.

    The two largest spaces in the vocal tract, the throat and the mouth, produce the two lowest resonant frequencies, or formants.

    Certain inflections and pitches we learn from family members.

    Voice is both a physiological and a behavioral biometric, influenced by our body, environment, and age.

    It is possible that our voice does not always sound the same.

    So is voice a good biometric?
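    The relationship between the size of a resonating space and its resonant frequencies can be sketched with a common textbook idealization that the slides do not use: a uniform tube closed at one end (the glottis) and open at the other (the lips), whose resonances fall at odd multiples of c/(4L). The function name, the tube lengths, and the speed of sound below are assumptions chosen for illustration only.

```python
def tube_resonances(length_m, count=3, speed_of_sound=350.0):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Idealized quarter-wavelength model: F_n = (2n - 1) * c / (4 * L).
    """
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, count + 1)]


# Assumed tract lengths: 17.5 cm (longer tract) and 14 cm (shorter tract).
long_tract = tube_resonances(0.175)
short_tract = tube_resonances(0.14)

print("longer tract :", [round(f) for f in long_tract])
print("shorter tract:", [round(f) for f in short_tract])
```

    Under these assumptions the longer tube resonates at roughly 500, 1500, and 2500 Hz, and every resonance of the shorter tube is higher, which matches the statement that larger spaces resonate at lower frequencies.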

    General Description

    Formants are the resonant frequencies of the vocal tract when vowels are pronounced. While vowels are attributed to this periodic resonance, consonants are not periodic. They are produced by restriction of air flow with the mouth, tongue, and jaw.

    Linguists classify each type of speech sound (called phonemes) into different categories. In order to identify each phoneme, it is often useful to look at its spectrogram or frequency response, where one can find the characteristic formants.
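    As a rough illustration of how a spectrogram is obtained, the sketch below cuts a signal into short overlapping frames, applies a Hann window, and takes the magnitude of a naive discrete Fourier transform of each frame. The sample rate, frame length, and the two-sinusoid test signal standing in for a vowel are assumptions made for this example, not values from the slides.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of DFT bins 0..N/2 for one frame (naive O(N^2) transform)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        mags.append(abs(acc))
    return mags


def spectrogram(signal, frame_len=64, hop=32):
    """One magnitude spectrum per Hann-windowed, overlapping frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        frames.append(dft_magnitudes(frame))
    return frames


SAMPLE_RATE = 8000  # Hz (assumed)

# Synthetic stand-in for a vowel: strong energy at 500 Hz, weaker at 1500 Hz.
signal = [math.sin(2 * math.pi * 500 * t / SAMPLE_RATE)
          + 0.5 * math.sin(2 * math.pi * 1500 * t / SAMPLE_RATE)
          for t in range(512)]

spec = spectrogram(signal)
bin_hz = SAMPLE_RATE / 64  # frequency spacing of the DFT bins
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * bin_hz
print("strongest band in first frame:", peak_hz, "Hz")
```

    Each row of `spec` corresponds to one time slice; plotting the rows side by side with larger magnitudes drawn darker gives the familiar spectrogram picture. A real implementation would normally use an FFT routine rather than this naive transform.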

    Although all phonemes have their own formants, vowel sound formants are usually the easiest to identify.

    All formants have the trait of waxing and waning in energy in all frequencies, which is caused by the repeated closing and opening of the vocal folds. On average, this repeated closing and opening occurs at a rate of 125 times per second in an adult male and 250 times per second in an adult female. This rate gives the sensation of pitch (higher frequencies result in higher pitches).

    Formant values can vary widely from person to person, but the spectrogram reader learns to recognize patterns which are independent of particular frequencies and which identify the various phonemes with a high degree of reliability.
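    The opening and closing rate described above can be recovered from a signal by finding the time lag at which the signal best matches a shifted copy of itself. The sketch below applies that idea (plain autocorrelation, one of several possible pitch estimators) to idealized pulse trains; the sample rate, search range, and helper names are assumptions made for this example.

```python
def pulse_train(rate_hz, sample_rate, n_samples):
    """Idealized source: one unit pulse per opening/closing cycle."""
    period = round(sample_rate / rate_hz)
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]


def estimate_pitch_hz(samples, sample_rate, min_hz=60.0, max_hz=400.0):
    """Return the pitch whose lag maximizes the raw autocorrelation."""
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


SAMPLE_RATE = 8000  # Hz (assumed)

male_pitch = estimate_pitch_hz(pulse_train(125, SAMPLE_RATE, 800), SAMPLE_RATE)
female_pitch = estimate_pitch_hz(pulse_train(250, SAMPLE_RATE, 800), SAMPLE_RATE)
print(male_pitch, female_pitch)
```

    A rate of 125 cycles per second corresponds to a period of 8 ms and 250 cycles per second to 4 ms; on real recordings the estimate is noisier and usually needs smoothing across frames.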

    [Figure panels: Vowel I, Vowel A]

    Formants can be seen very clearly in a wideband spectrogram, where they are displayed as dark bands.

    The darker a formant is reproduced in the spectrogram, the stronger it is (the more energy there is there, or the more audible it is):

    But there is a difference between oral vowels on one hand, and consonants and nasal vowels on the other.

    Nasal consonants and nasal vowels can exhibit additional formants, nasal formants, arising from resonance within the nasal branch.

    Consequently, nasal vowels may show one or more additional formants due to nasal resonance, while one or more oral formants may be weakened or missing due to nasal antiresonance.

    Oral formants are numbered consecutively upwards from the lowest frequency. In the example, a fragment from the previous wideband spectrogram shows the sequence [ins] from the beginning. Five formants are visible in this [i], labeled F1-F5. Four are visible in this [n] (F1-F4) and there is a hint of the fifth. There are four more formants between 5000 Hz and 8000 Hz in [i] and [n], but they are too weak to show up on the spectrogram, and mostly they are also too weak to be heard.

    The situation is reversed in this [s], where F4-F9 show very strongly, but there is little to be seen below F4.

    Individual Differences in Vowel Production

    There are differences in individual formant frequencies attributable to: size, age, gender, environment, and speech.

    The acoustic differences that allow us to differentiate between various vowel productions are usually explained by a source-filter theory.

    The source is the sound spectrum created by airflow through the glottis, which varies as the vocal folds vibrate. The filter is the vocal tract itself; its shape is controlled by the speaker.
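    The source-filter idea can be sketched in a few lines: a pulse train stands in for the glottal source, and a cascade of two-pole digital resonators stands in for the vocal tract filter, one resonator per formant. This is a simplified illustration, not a speech synthesizer; the formant frequencies are rough textbook-style averages and, like the bandwidth, pitch, and sample rate, are assumptions made for this example.

```python
import math


def glottal_source(pitch_hz, sample_rate, n_samples):
    """Source: one unit pulse per vocal-fold cycle."""
    period = round(sample_rate / pitch_hz)
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]


def resonator(signal, center_hz, bandwidth_hz, sample_rate):
    """Filter: two-pole resonator y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius (< 1)
    theta = 2 * math.pi * center_hz / sample_rate        # pole angle
    a1, a2 = 2 * r * math.cos(theta), -r * r
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def vowel(formants_hz, pitch_hz=125, sample_rate=8000, n_samples=800):
    """Pass the same source through one resonator per formant."""
    signal = glottal_source(pitch_hz, sample_rate, n_samples)
    for formant in formants_hz:
        signal = resonator(signal, formant, 100.0, sample_rate)
    return signal


# Assumed first two formants (Hz) for two vowel shapes.
ee = vowel([270, 2290])  # roughly [i] as in "beet"
ah = vowel([730, 1090])  # roughly [a] as in "father"
print(len(ee), len(ah))
```

    Both outputs share the same source (and therefore the same pitch); only the filter settings differ, which is the sense in which the speaker's control of the vocal tract shape changes the vowel.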

    The three figures below (taken from Miller) illustrate how different configurations of the vocal tract selectively pass certain frequencies and not others. The first shows the configuration of the vocal tract while articulating the phoneme [i] as in the word "beet," the second the phoneme [a], as in "father," and the third [u] as in "boot."

    Note how each configuration uniquely affects the acoustic spectrum, i.e., the frequencies that are passed.

    Voice Capture

    Voice can be captured in two ways:

    Dedicated resource like a microphone

    Existing infrastructure like a telephone

    Captured voice is influenced by two factors:

    Quality of the recording device

    The recording environment

    In wireless communication, voice travels through open air and then through terrestrial lines; it therefore suffers from great interference.

    Algorithms for Voice Interpretation

    Algorithms used to capture, enroll and match voice fall into the following categories:

    Fixed phrase verification

    Fixed vocabulary verification

    Flexible vocabulary verification

    Text-independent verification

    Voice Verification

    Voice biometrics works by digitizing a profile of a person's speech to produce a stored model voice print, or template.

    Biometric technology reduces each spoken word to segments composed of several dominant frequencies called formants.

    Each segment has several tones that can be captured in a digital format.

    The tones collectively identify the speaker's unique voice print.

    Voice prints are stored in databases in a manner similar to the storing of fingerprints or other biometric data.
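    A minimal sketch of the enroll-then-verify flow described above: several feature vectors from the same speaker are averaged into a template, the template is stored under the user's identifier, and a later sample is accepted if it lies close enough to that template. The three-number feature vectors, the Euclidean distance, the threshold, and the user name are all hypothetical choices for illustration; deployed systems use much richer features and statistical models.

```python
import math


def enroll(feature_vectors):
    """Average several feature vectors into a single voice print template."""
    n = len(feature_vectors)
    dims = len(feature_vectors[0])
    return [sum(v[d] for v in feature_vectors) / n for d in range(dims)]


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def verify(voiceprints, user_id, sample, threshold=50.0):
    """Accept the claimed identity if the sample is close to the stored template."""
    template = voiceprints.get(user_id)
    if template is None:
        return False  # nobody enrolled under this identifier
    return distance(template, sample) <= threshold


# Hypothetical formant measurements (Hz) from three enrollment utterances.
enrollment = [
    [510.0, 1490.0, 2480.0],
    [495.0, 1510.0, 2520.0],
    [495.0, 1500.0, 2500.0],
]

voiceprints = {"alice": enroll(enrollment)}  # stands in for the database

genuine = verify(voiceprints, "alice", [505.0, 1495.0, 2510.0])
impostor = verify(voiceprints, "alice", [620.0, 1250.0, 2300.0])
print(genuine, impostor)
```

    The threshold sets the trade-off between rejecting the genuine speaker on a bad day and accepting an impostor with a similar voice, so in practice it is tuned on recorded data rather than fixed in advance.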

    Application of Voice Technology

    Voice technology is applicable in a variety of areas but for us, those used in biometric technology include:

    Voice Verification

    Internet/intranet security:

        on-line banking

        on-line security trading

        access to corporate databases

        on-line information services

    PC access restriction software

    Parental control

    Business software as a DSP solution at check points where smart cards or PINs are used (entrance/exit control points)

    Voice Recognition

    hands-free devices, for example car mobile hands-free sets

    electronic devices, for example telephone, PC, or ATM cash dispenser

    software applications, for example games, educational or office software

    industrial areas, warehouses, etc.

    spoken multiple choice in interactive voice response systems, for example in telephony

    applications for people with disabilities

    Voice verification systems are different from voice recognition systems, although the two are often confused.

    Voice recognition is used to translate the spoken word into a specific response. The goal of voice recognition systems is simply to understand the spoken word, not to establish the identity of the speaker. A familiar example of a voice recognition system is an automated call center asking a user to press the number one on the phone keypad or say the word "one". In this case, the system is not verifying the identity of the person who says the word "one"; it is merely checking that the word "one" was said instead of another option.

    Voice verification verifies the vocal characteristics against those associated with the enrolled user.

    The US PORTPASS Program, deployed at remote locations along the U.S.-Canadian border, recognizes voices of enrolled local residents speaking into a handset. This system enables enrollees to cross the border when the port is unstaffed.

    How is voice recognition performed?

    Voice recognition can be divided into two classes:

    template matching - template matching is the simplest technique and has the highest accuracy when used properly, but it also suffers from the most limitations.

    feature analysis

    With template matching, the first step is for the user to speak a word or phrase into a microphone.

    The electrical signal from the microphone is digitized by an "analog-to-digital (A/D) converter", and is stored in memory.

    To determine the "meaning" of this voice input, the computer attempts to match the input with a digitized voice sample, or template, that has a known meaning.

    This technique is a close analogy to the traditional command inputs from a keyboard. The program contains the input template, and attempts to match this template with the actual input using a simple conditional statement.
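    The template-matching steps above can be sketched as follows: quantize the incoming samples as an A/D converter would, compare the result against each stored template, and accept the closest template with a single conditional test. The 8-bit resolution, the mean-absolute-difference measure, the tolerance, and the tiny hand-made waveforms are assumptions made for this example; real recordings also differ in length and timing, which this sketch ignores.

```python
def digitize(analog, bits=8):
    """Quantize samples in [-1.0, 1.0] to signed integers, like an A/D converter."""
    top = 2 ** (bits - 1) - 1
    return [max(-top, min(top, round(x * top))) for x in analog]


def mismatch(a, b):
    """Mean absolute difference between two equal-length digitized signals."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def recognize(samples, templates, max_mismatch=10.0):
    """Return the word of the closest template, or None if nothing is close enough."""
    best_word = min(templates, key=lambda word: mismatch(samples, templates[word]))
    if mismatch(samples, templates[best_word]) <= max_mismatch:
        return best_word
    return None


# Hypothetical stored templates, each with a known meaning.
templates = {
    "one": digitize([0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]),
    "two": digitize([0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]),
}

# A slightly noisy utterance of "one", and an input matching no template.
spoken = digitize([0.02, 0.48, 0.97, 0.52, -0.01, -0.49, -0.98, -0.53])
unknown = digitize([1.0] * 8)

print(recognize(spoken, templates), recognize(unknown, templates))
```

    The need for the input to line up closely with a stored template is one of the limitations mentioned above: a word spoken faster, slower, or by a different person may no longer fall within the tolerance.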

    [Figure: The two stages of a biometric system]

    Software

    Open Source Speech Software from Carnegie Mellon University

    Hephaestus: Open Source activities at Carnegie Mellon

    CMU Sphinx recognition engines -- Sphinx 2, Sphinx 3, Sphinx 4, and SphinxTrain

    PocketSphinx -- Sphinx for embedded platforms

    Festvox Project -- speech synthesis engines, voices and tools

    CMU Statistical Language Modeling Toolkit (CMU SLM)

    CMUdict -- pronunciation dictionary

    OpenVXI -- VoiceXML browser

    SALT browser -- finally online!

    Audio Databases -- AN4, Microphone array, etc.

    RavenClaw-Olympus -- Dialog system development toolkit

    We will try CMU Sphinx Group Open Source Speech Recognition:
    http://cmusphinx.sourceforge.net/html/cmusphinx.php