University of Joensuu, Dept. of Computer Science, P.O. Box 111, FIN-80101 Joensuu, Tel. +358 13 251 7959, fax +358 13 251 7955, www.cs.joensuu.fi
Automatic Speaker Recognition for Series 60 Mobile Devices
University of Joensuu, Department of Computer Science
Specom’2004, Sep 20, 2004
Juhani Saastamoinen, Evgeny Karpov, Ville Hautamäki, and Pasi Fränti
Background
• Project in National FENIX programme
– New Methods and Applications in Speech Technology
• 7 research institutes
• Project partners: NRC, Lingsoft, National Bureau of Investigation, etc.
• Joensuu: Speaker Recognition
• http://cs.joensuu.fi/pages/pums
Research Group
Pasi Fränti, Professor
Juhani Saastamoinen, Project manager
Evgeny Karpov, Project researcher
Ville Hautamäki, Project researcher
Tomi Kinnunen, Researcher
Ismo Kärkkäinen, Clustering algorithms
PUMS project
Application Scenarios
Speaker Recognition
• Speaker Verification: "Is this Bob's voice?" (speech + claim → verification; a false claim is flagged "Imposter!")
• Speaker Identification: "Whose voice is this?"
Project Goal
Port speaker recognition to Series 60 mobile phone
Symbian Phones
• Series 60 phone features:
– 16 MB ROM
– 8 MB RAM
– 176 x 208 display
– ARM processor
– No floating-point unit!!!
[Figure: Series 80, Series 60, and UIQ phones]
Symbian OS
• Defined by Symbian consortium
• Based on EPOC
• Operating system for mobile phones
– Real-time system
– Long uptime required
• Multitasking, multithreading
Problems of Porting
• Usual considerations when porting to a phone
– GUI, event-driven programming
– Platform-specific programming model
– Real-time system, exceptions
• Application-specific porting problems
– Number crunching without a floating-point unit!!!
– Signal processing numerically challenging
Identification System
[Block diagram]
Speech Audio → Signal Processing / Feature Extraction → Feature Vectors
• Training: Feature Vectors → Speaker Modelling (create speaker profile) → Speaker Profile Database (add speaker profiles during training)
• Recognition: Feature Vectors → Speaker Recognition (classify input speech based on existing profiles; read and use all profiles during recognition) → Decision
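The training and recognition paths in the diagram can be sketched roughly as follows. This is a minimal illustration, not the project's code: it assumes that a speaker profile is a VQ codebook (later slides mention VQ/GLA) and that the decision picks the profile with the smallest average quantization distortion. The names `ProfileDatabase`, `train`, `identify`, and `avg_distortion` are hypothetical, and the training vectors are stored directly as the codebook in place of a real GLA run.

```python
def avg_distortion(vectors, codebook):
    """Mean squared distance from each vector to its nearest code vector."""
    total = 0.0
    for v in vectors:
        total += min(sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook)
    return total / len(vectors)


class ProfileDatabase:
    """Hypothetical speaker profile store: speaker name -> codebook."""

    def __init__(self):
        self.profiles = {}

    def train(self, speaker, feature_vectors):
        # Assumption: a real system would compress the vectors into a small
        # codebook (e.g. with GLA); here they are stored unchanged.
        self.profiles[speaker] = [tuple(v) for v in feature_vectors]

    def identify(self, feature_vectors):
        # Read and use all profiles, return the best-matching speaker.
        return min(
            self.profiles,
            key=lambda s: avg_distortion(feature_vectors, self.profiles[s]),
        )


db = ProfileDatabase()
db.train("alice", [(0.0, 0.0), (1.0, 0.0)])
db.train("bob", [(5.0, 5.0), (6.0, 5.0)])
print(db.identify([(0.2, 0.1), (0.9, -0.1)]))
```

The toy two-dimensional vectors stand in for the feature vectors produced by the signal processing stage.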
MFCC Signal Processing
Digital speech signal frame → Pre-emphasis → Time windowing → DFT → Abs → Filter bank → Log → DCT → Feature vector
• Pre-emphasis coefficient 0.97, Hamming window, 30 triangular mel filters, base-2 logarithm, output 12 MFCCs
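The first two stages of the chain can be sketched as follows, in floating point for readability. The 0.97 coefficient and the Hamming window come from the slide; the function names, the pass-through handling of the first sample, and the common 0.54/0.46 Hamming coefficients are assumptions of this sketch.

```python
import math


def pre_emphasize(frame, coeff=0.97):
    """y[n] = x[n] - coeff * x[n-1]; the first sample is passed through."""
    return [frame[0]] + [
        frame[n] - coeff * frame[n - 1] for n in range(1, len(frame))
    ]


def hamming(n_samples):
    """Hamming window coefficients for a frame of n_samples samples."""
    if n_samples == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_samples - 1))
        for n in range(n_samples)
    ]


def window(frame):
    """Multiply a frame element-wise by the Hamming window."""
    return [s * w for s, w in zip(frame, hamming(len(frame)))]


frame = [math.sin(0.1 * n) for n in range(16)]
windowed = window(pre_emphasize(frame))
```

The remaining stages (DFT, magnitude, mel filter bank, logarithm, DCT) would follow on `windowed`.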
Fixed-Point Implementation
• Numerical analysis needed for fixed-point arithmetic implementation
• Truncation and re-scaling to avoid overflows in the converted algorithm
• Minimize information loss caused by computation in fixed-point arithmetic
– Minimize relative error
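As a small illustration of truncation, re-scaling, and relative error, the following sketch multiplies two numbers in a fixed-point format. The Q15 format (15 fractional bits) and all function names are assumptions chosen for the example; the slides do not state which format the implementation uses.

```python
FRAC_BITS = 15  # assumed Q15 format: value = integer / 2**15


def to_fixed(x):
    """Quantize a float to the nearest fixed-point integer."""
    return int(round(x * (1 << FRAC_BITS)))


def to_float(q):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << FRAC_BITS)


def fixed_mul(a, b):
    """Multiply two fixed-point numbers and re-scale the product.

    The raw product carries 2 * FRAC_BITS fractional bits; shifting right
    by FRAC_BITS truncates it back to the working format.
    """
    return (a * b) >> FRAC_BITS


def relative_error(approx, exact):
    return abs(approx - exact) / abs(exact)


product = to_float(fixed_mul(to_fixed(0.3), to_fixed(0.7)))
print(relative_error(product, 0.3 * 0.7))
```

Python integers never overflow, so the sketch shows only the precision loss; on a 32-bit target the raw product must also fit in the available word.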
FFT, Fixed-Point
• Frequency spectrum of speech
– Biggest source of numerical error
– Butterflies have multiplications
– Layers repeat truncation errors
• Fixed number of bits per element
– 32, native integer size in many systems
• Reference implementation: FFTGEN
– http://www.jjj.de/fft/fftgen.tgz
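To show where the multiplications and truncations occur, here is a hedged sketch of a single radix-2 butterfly in integer arithmetic. It is not taken from FFTGEN: the Q15 twiddle scaling and the names `cmul_fixed` and `butterfly` are assumptions made for illustration.

```python
FRAC = 15  # assumed twiddle scaling: 1.0 is stored as 2**15


def cmul_fixed(xr, xi, wr, wi):
    """Complex multiply of an integer sample by a Q15 twiddle factor.

    Each component is truncated by the right shift, which is the
    per-layer truncation error that repeats through the FFT.
    """
    return ((xr * wr - xi * wi) >> FRAC, (xr * wi + xi * wr) >> FRAC)


def butterfly(a, b, w):
    """Radix-2 butterfly: returns (a + w*b, a - w*b) on (re, im) tuples."""
    tr, ti = cmul_fixed(b[0], b[1], w[0], w[1])
    return (a[0] + tr, a[1] + ti), (a[0] - tr, a[1] - ti)


ONE = 1 << FRAC
top, bottom = butterfly((100, 0), (50, 0), (0, -ONE))  # twiddle factor -i
print(top, bottom)
```

An N-point FFT applies log2(N) layers of such butterflies, so every output element passes through that many truncating multiplications.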
FFTGEN (16/16)
• Multiplication: the result of a 32 x 32-bit product must fit in 32 bits, so the inputs are truncated
• FFTGEN: Truncate inputs to 16/16 bits
[Diagram] FFT layer input (16-bit integer) x FFT twiddle factor (16-bit integer) → 32-bit multiplication result (16 used bits, 16 crop-off bits) → FFT layer output (part of it), 16-bit integer
• Crop-off for next layer: 16 bits!
Info Preserving FFT (22/10)
• Approximate DFT operator F with G
• Increase ||F-G||, preserve more signal information
– minimize maximum relative error in scaled sine values with respect to scale; 980 good for FFT sizes up to 1024
– Truncate multiplication inputs to 22/10 bits (signal/op)
[Diagram] FFT layer input (32-bit integer, 22 bits used) x FFT twiddle factor (16-bit integer, 10 bits used) → 32-bit multiplication result (22 used bits, 10 crop-off bits) → FFT layer output (part of it), 32-bit integer
• Crop-off for next layer: 10 bits
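The trade-off between signal bits and operator bits can be illustrated with a simplified model of one multiplication. This is a sketch under stated assumptions, not the authors' implementation: the sample is taken as a value in 32-bit range that is truncated to `signal_bits`, the factor is scaled by a power of two rather than by the constant 980 from the slide, and sign-bit accounting is ignored. The function name `truncated_product` is hypothetical.

```python
def truncated_product(sample, factor, signal_bits, factor_bits):
    """Model sample * factor with inputs truncated to the given widths.

    sample: integer in 32-bit range; factor: float in [-1, 1].
    signal_bits + factor_bits is expected to be 32, so that the integer
    product has magnitude at most 2**31.
    """
    drop = 32 - signal_bits
    s = sample >> drop                        # keep the top signal bits
    f = round(factor * (1 << factor_bits))    # quantize the factor
    product = s * f
    # Crop the factor's fractional bits, then restore the original scale
    # so that results of different splits are comparable.
    return (product >> factor_bits) << drop


sample, factor = 1_000_000, 0.7071
exact = sample * factor
for bits in ((16, 16), (22, 10)):
    approx = truncated_product(sample, factor, *bits)
    print(bits, approx, abs(approx - exact) / exact)
```

The example deliberately uses a small-amplitude sample, where discarding signal bits hurts most; for samples near full scale the coarser 10-bit factor can dominate the error instead.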
FFT Spectrum, Fixed-Point
[Figure: plots for the original TIMIT signal and the TIMIT signal x 4; panels: 16/16 abs values, 22/10 abs values]
• x-axis: fixed-point FFT element abs. values
• y-axis: correct FFT element abs. values
Scale of Error in Proposed FFT
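[Figure: log10 of relative error in FFT elements; panels: 16/16 and 22/10]

Log10 of relative error in FFT elements:

| | 16/16 | 22/10 |
|---|---|---|
| average | -0.775 | -2.118 |
| standard deviation | 0.797 | 0.590 |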
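Statistics of this kind could be computed along the following lines. The sketch assumes element-wise relative error against a floating-point reference and a population standard deviation; the slides do not specify either detail, and the function names are hypothetical.

```python
import math
import statistics


def log10_relative_errors(approx, exact):
    """log10(|a - e| / |e|) per element.

    Elements with a zero reference value or zero error are skipped,
    because the logarithm is undefined for them.
    """
    return [
        math.log10(abs(a - e) / abs(e))
        for a, e in zip(approx, exact)
        if e != 0 and a != e
    ]


def error_summary(approx, exact):
    """Return (average, standard deviation) of the log10 relative errors."""
    logs = log10_relative_errors(approx, exact)
    return statistics.mean(logs), statistics.pstdev(logs)


print(error_summary([101.0, 99.0, 100.1], [100.0, 100.0, 100.0]))
```

An average of -2 on this scale corresponds to a typical relative error of about 1%.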
Magnitude Spectrum, Fixed-Point
• Compute complex absolute values using maximum coordinate and coordinate ratio
• Suppose |x| > |y| for z = x + i y, then
$$|z| = \sqrt{x^2 + y^2} = |x|\sqrt{1 + (y/x)^2}$$
• Denote the squared ratio (y/x)^2 by t
• Approximate the square root by a polynomial P(t)
• Constant time algorithm (vs. Newton)
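A minimal integer-only sketch of this idea follows. The slides do not give the polynomial, so the quadratic used here, which interpolates sqrt(1 + t) at t = 0, 0.5, and 1, is an assumption, as are the Q15 scaling and the name `magnitude`. The coefficients are computed once in floating point, as an offline table would be.

```python
import math

FRAC = 15
ONE = 1 << FRAC

# Hypothetical quadratic P(t) = 1 + b*t + c*t**2 through sqrt(1 + t)
# at t = 0, 0.5, 1, with coefficients stored in Q15.
_b = 4.0 * math.sqrt(1.5) - math.sqrt(2.0) - 3.0
_c = 2.0 * math.sqrt(2.0) - 4.0 * math.sqrt(1.5) + 2.0
B = round(_b * ONE)
C = round(_c * ONE)


def magnitude(x, y):
    """Approximate |x + iy| for integers using max(|x|,|y|) * P(t)."""
    big, small = max(abs(x), abs(y)), min(abs(x), abs(y))
    if big == 0:
        return 0
    ratio = (small << FRAC) // big        # small/big in Q15, within [0, 1]
    t = (ratio * ratio) >> FRAC           # squared ratio in Q15
    t2 = (t * t) >> FRAC
    p = ONE + ((B * t) >> FRAC) + ((C * t2) >> FRAC)
    return (big * p) >> FRAC


print(magnitude(3000, 4000))
```

The work per element is fixed (one division and a few multiplications), whereas a Newton iteration for the square root needs a data-dependent number of steps to reach a given accuracy.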
Logarithm, Fixed-Point
• Use base 2 instead of base 10
– corresponds to multiplying the output by a constant
• Standard technique:
– Reduce the problem to the interval [1,2)
– Use linear interpolation from values stored in a look-up table
– 8 bits used for indexing the look-up table values
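The standard technique can be sketched as below. The 8 index bits come from the slide; the Q16 output format, the 24-bit mantissa normalization, the extra table entry for the right endpoint, and the name `log2_fixed` are assumptions of this sketch.

```python
import math

FRAC = 16        # assumed output format: log2 value in Q16
INDEX_BITS = 8   # from the slide: 8 bits index the look-up table
NORM = 24        # assumed number of fractional mantissa bits

# log2(1 + i/256) in Q16; 257 entries so the last segment has an endpoint.
TABLE = [
    round(math.log2(1.0 + i / (1 << INDEX_BITS)) * (1 << FRAC))
    for i in range((1 << INDEX_BITS) + 1)
]


def log2_fixed(x):
    """Return log2(x) in Q16 for a positive integer x."""
    if x <= 0:
        raise ValueError("x must be positive")
    exponent = x.bit_length() - 1            # x = m * 2**exponent, m in [1, 2)
    mant = (x << NORM) >> exponent           # m in Q24, within [2**24, 2**25)
    frac = mant - (1 << NORM)                # m - 1 in Q24
    index = frac >> (NORM - INDEX_BITS)      # top 8 bits select the segment
    rem = frac & ((1 << (NORM - INDEX_BITS)) - 1)
    step = TABLE[index + 1] - TABLE[index]
    interp = TABLE[index] + ((step * rem) >> (NORM - INDEX_BITS))
    return (exponent << FRAC) + interp


print(log2_fixed(1000) / (1 << FRAC), math.log2(1000))
```

Because log10(x) = log2(x) * log10(2), switching the base only rescales every output by the same constant.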
Rest of System, Fixed-Point
• No improvement needed in VQ/GLA
• Should apply a similar technique as with FFT to other signal processing
– Pre-emphasis, utilize full 32 bits
– Time windowing, use fewer bits in windowing function
– Filter bank (FB), use fewer bits in frequency responses
– DCT, use fewer bits for the cosines
Effect of Signal Processing
• TIMIT data sets, varying number of speakers (N)
• For each N repeat (6x, 5x, 2x) train/recognize cycles (eliminate GLA initial solution randomness)
• FFTGEN: FFT with 16/16 multiplication
• Fixed-point: use proposed 22/10 FFT
• Mixed: floating-point DSP, fixed-point GLA/VQ

| | N=10 (6x) | N=20 (5x) | N=100 (2x) |
|---|---|---|---|
| FFTGEN | 93.3% | 68.0% | 59.5% |
| Fixed-point | 98.3% | 95.0% | 82.5% |
| Mixed | 100.0% | 100.0% | 100.0% |
| Floating-point | 100.0% | 100.0% | 100.0% |
Effect of Signal Quality
• GSM/PC data: 16 aligned dual recordings
• All computations in floating-point arithmetic
• Signal recorded with laptop and PC mic gives average recognition rate 100%
• Signal recorded with Nokia 3660 results in average recognition rate 84.9%

| | 13/16 | 14/16 | 15/16 | 16/16 |
|---|---|---|---|---|
| Symbian audio | 1 | 3 | 3 | 10 |
| PC audio | 0 | 0 | 0 | 17 |
Conclusion
• Speaker identification was ported to a Symbian Series 60 mobile phone
• 22/10 bit usage in multiplication proposed instead of “standard” 16/16
• Experiments indicate that recognition accuracy improves from 68% to 95% (TIMIT, N=20 speakers)