University of Joensuu, Dept. of Computer Science, P.O. Box 111, FIN-80101 Joensuu, Tel. +358 13 251 7959, fax +358 13 251 7955, www.cs.joensuu.fi
Speaker Recognition Research in Joensuu
Speech and Image Processing Unit (SIPU) http://cs.joensuu.fi/sipu/
Speech Technology Winter Seminar (Puheteknologian talviseminaari)
Pasi Fränti
Joensuu, 10.3.2006
Goals for PUMS season 3 (1/2)
1. Usability of automatic speaker identification in forensic applications
2. Compatibility with large databases
3. Automation of LTAS and fusion with MFCC
4. Voice activity detection
Goals for PUMS season 3 (2/2)
5. Speaker verification in real (noisy) environment
6. Prototype for access control
7. Solving technical requirements for prototype in elevator
8. Usability for detecting sound sources in general
9. Keyword search (using HTK or Lingsoft Recognizer)
Research Group
Pasi Fränti, Professor
Juhani Saastamoinen, PhLic
Tomi Kinnunen, PhD (Singapore)
Ville Hautamäki, MSc
Ismo Kärkkäinen, MSc
PUMS personnel
Marko Tuononen, BSc
Doctoral researchers
Collaborators
Rosa Gonzalez-Hautamäki, MSc
Ilja Sidoroff
Victoria Yanulevskaya
Evgeny Karpov, MSc (NRC)
1. Applicability to forensic applications
• A study of automatic speaker recognition has been done.
• Results are not reported, but actions were taken within tasks 3 and 4.
• Material can be found in Kinnunen’s PhD thesis [4] and Niemi-Laitinen’s presentation.
2. Support for large databases
- Not yet done -
3. LTAS and other features
• Automatic calculation of LTAS done. Integration into WinSprofiler in progress. Reporting in progress.
• The benefit of LTAS is merely its speed and ease of use: no difficult control parameters.
• No additional benefit to recognition accuracy: MFCC already includes the same information.
• Could be used for preliminary pruning in case of large datasets.
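As a rough illustration of why LTAS needs so few control parameters, a long-term average spectrum can be sketched as the mean magnitude spectrum over all frames of a recording. This is a minimal sketch using NumPy; the function names `ltas` and `ltas_distance`, the frame length, the hop size, and the use of a Euclidean distance as a pruning score are assumptions for illustration and are not taken from WinSprofiler or SRLIB.

```python
import numpy as np

def ltas(signal, frame_len=512, hop=256):
    """Long-term average spectrum: mean magnitude spectrum over all frames, in dB."""
    signal = np.asarray(signal, dtype=float)
    window = np.hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    if n_frames < 1:
        raise ValueError("signal is shorter than one frame")
    # Cut the signal into overlapping, windowed frames.
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitudes = np.abs(np.fft.rfft(frames, axis=1))
    # Average over time first, then convert to dB (small offset avoids log(0)).
    return 20.0 * np.log10(magnitudes.mean(axis=0) + 1e-12)

def ltas_distance(a, b):
    """Euclidean distance between two LTAS vectors (one possible pruning score)."""
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))
```

For pruning a large database, one could compute `ltas_distance` between the test recording and every enrolled speaker and pass only the closest candidates on to the slower MFCC-based matching.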
Noise robustness of F0 feature
Results reported in [3, 5]
4. Voice activity detection
• Software for speech segmentation (VoiceGrep).
• Command line version for Linux.
• Windows version in WinSprofiler.
• Testing done in SIPU laboratory.
– Labtec® pc mic 333, 44.1 kHz
– Recordings were amplified by 24 dB with the Audacity audio editor
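Two of the points above lend themselves to a small sketch. A 24 dB amplification corresponds to a linear amplitude gain of 10^(24/20), roughly 15.85. The second part shows a very simple energy-threshold activity detector. It is not VoiceGrep's algorithm, which these slides do not describe; the function names, the frame length, and the threshold are assumptions for illustration.

```python
import math

def db_to_gain(db):
    """Convert an amplitude gain in decibels to a linear factor."""
    return 10.0 ** (db / 20.0)

def frame_energies_db(samples, frame_len):
    """Mean energy of each non-overlapping frame, in dB."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        mean_sq = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(mean_sq + 1e-12))
    return energies

def detect_segments(samples, frame_len=400, threshold_db=-30.0):
    """Return (start, end) sample indices of runs of frames above the threshold."""
    segments = []
    seg_start = None
    energies = frame_energies_db(samples, frame_len)
    for i, energy in enumerate(energies):
        if energy > threshold_db and seg_start is None:
            seg_start = i * frame_len                     # a segment begins
        elif energy <= threshold_db and seg_start is not None:
            segments.append((seg_start, i * frame_len))   # the segment ends
            seg_start = None
    if seg_start is not None:                             # still active at the end
        segments.append((seg_start, len(energies) * frame_len))
    return segments
```

A detector this simple reacts to any sufficiently loud sound, so it cannot by itself separate speech from other sound sources.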
4a. Test material and results
• Material
– 4 hours in total.
– Poor-quality recordings: 11-bit data, of which 4-5 bits are information and the rest is noise.
• VoiceGrep made 168 detections:
– 56 speech (33 %)
– 112 non-speech (67 %)
• Material included 71 real speech segments:
– Average segment length 16 s.
– VoiceGrep found 25 of these (35 %)
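The percentages above follow directly from the counts. The short calculation below reproduces them; the variable names and the terms "precision" and "recall" are labels chosen here, not terms used in the original slides. Note that the two figures are measured on different units: precision is counted per detection, recall per real speech segment.

```python
detections_total = 168   # all detections made by VoiceGrep
detections_speech = 56   # detections that actually contained speech
segments_total = 71      # real speech segments in the material
segments_found = 25      # real speech segments that VoiceGrep found

precision = detections_speech / detections_total   # share of detections that were speech
false_rate = 1.0 - precision                       # share of detections that were not speech
recall = segments_found / segments_total           # share of real segments that were found

print(f"speech among detections:     {precision:.1%}")   # 33.3%
print(f"non-speech among detections: {false_rate:.1%}")  # 66.7%
print(f"real segments found:         {recall:.1%}")      # 35.2%
```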
4b. VoiceGrep overall results
Detections by sound source:
• Doors (36 %)
• Speech (33 %)
• Running water (8 %)
• Kitchen activity (7 %)
• Rattle, clatter (7 %)
• Misc (5 %)
• Music (4 %)
4c. VoiceGrep example (correct detection)
Start of the speech is detected correctly
End of the speech is missed
4d. VoiceGrep example (false detections)
[Figure: two false detections, with sounds labelled door opening, running water, walking, and door]
4e. VoiceGrep example (missed speech segment)
[Figure: a missed speech segment, with sounds labelled in order: door, speech and walking, door]
4f. Entire data set (4 hours)
[Figure: the whole recording shown as three rows: data, speech segments, and result of VoiceGrep]
5. Speaker verification in noisy environment
• Systematic testing of the effective parameters has been reported in [1].
• Applicability of speaker verification in a real environment has been reported in [2] and in Kinnunen’s PhD thesis [4].
• Additional testing will be done if time permits.
5a. Text-dependent verification in access control
• Utilizing time series information improves recognition.
• Best result if everyone has their own password.
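The slides do not say how the time series information is used. One classical way to exploit the order of frames in text-dependent verification is dynamic time warping (DTW), which aligns the spoken password with a stored template before comparing them. The sketch below is an illustration under that assumption only; the function names, the length normalization, and the threshold are hypothetical.

```python
import math

def dtw_distance(seq_a, seq_b):
    """Dynamic time warping cost between two sequences of feature vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i frames of A with the first j of B.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # frame of A repeated
                                 cost[i][j - 1],      # frame of B repeated
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

def verify_password(template, utterance, threshold):
    """Accept when the length-normalized warped distance is below the threshold."""
    score = dtw_distance(template, utterance) / (len(template) + len(utterance))
    return score < threshold
```

Because the alignment respects frame order, the same sounds spoken in a different order score badly, which is what makes a personal password useful in addition to the voice itself.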
6. Prototype for access control
[Figure: access control prototype, with labels: microphone, motion detector, emergency button]
7. Calling elevator (technical requirements)
• Communication with OPC server:
– Implemented with Matrikon server.
• Program logic for the elevator implemented:
– Reads variables from OPC server.
– Interprets and shows elevator status.
– Includes recording logic.
• Speaker and voice related functionality:
– Not yet implemented.
– Main window does not show anything yet.
8. Usability for detecting sound sources in general
- Not yet done -
9. Keyword search
- Not yet done -
Publications (season 3)
1. J. Saastamoinen, Z. Fiedler, T. Kinnunen and P. Fränti, "On factors affecting MFCC-based speaker recognition accuracy", Int. Conf. on Speech and Computer (SPECOM'05), Patras, Greece, 503-506, October 2005.
2. H. Gupta, V. Hautamäki, T. Kinnunen and P. Fränti, "Field evaluation of text-dependent speaker recognition in an access control application", Int. Conf. on Speech and Computer (SPECOM'05), Patras, Greece, 551-554, October 2005.
3. T. Kinnunen and R. Gonzalez-Hautamäki, "Long-Term F0 Modeling for Text-Independent Speaker Recognition", Int. Conf. on Speech and Computer (SPECOM'05), Patras, Greece, 567-570, October 2005.
Theses (season 3)
4. T. Kinnunen, "Optimizing Spectral Feature Based Text Independent Speaker Recognition", PhD thesis, University of Joensuu, June 2005.
5. R. Gonzalez-Hautamäki, "Fundamental Frequency Estimation and Modeling for Speaker Recognition", MSc thesis, University of Joensuu, July 2005.
Application scenarios
Speaker recognition covers two tasks:
• Speaker verification: "Is this Bob’s voice?" A voice sample plus an identity claim is either verified or rejected ("Imposter!").
• Speaker identification: "Whose voice is this?"
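The difference between the two tasks can be sketched in a few lines: identification picks the best match among all enrolled speakers, while verification tests one claimed identity against a threshold. This is a toy sketch in which each speaker model is a single feature vector; the function names, the example vectors, and the threshold are assumptions for illustration, not the models used in the project.

```python
import math

# Toy speaker models: one feature vector per enrolled speaker (made-up values).
enrolled = {"Alice": (1.0, 0.0), "Bob": (0.0, 1.0), "Minna": (1.0, 1.0)}

def identify(sample, models):
    """Identification: 'Whose voice is this?' Returns the closest enrolled speaker."""
    return min(models, key=lambda name: math.dist(sample, models[name]))

def verify(sample, claimed, models, threshold):
    """Verification: 'Is this Bob's voice?' Accepts or rejects a claimed identity."""
    return math.dist(sample, models[claimed]) <= threshold

sample = (0.1, 0.9)
print(identify(sample, enrolled))              # closest model
print(verify(sample, "Bob", enrolled, 0.5))    # claim accepted or rejected
print(verify(sample, "Alice", enrolled, 0.5))  # a false claim should be rejected
```

Identification as written always returns some enrolled speaker; handling an unregistered voice would need an additional threshold, as in verification.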
Software 1: Console program
Software 2: WinSprofiler
Software 3: Symbian
Port to Symbian OS with Series 60 UI platform
Software 4: Door SProfiler
Opening laboratory door by speaking
Software 5: Lift SProfiler (to appear in season 4 perhaps…)
Future development (1): Software integration
• SRLIB: MSE, GMM, MFCC, VQ
• WinSprofiler: Windows (JoY)
• Mobile: Series 60 (JoY)
• VAD
• DB support
• LTAS
• F0 extraction
• Fusion by weighted MSE
• Keyword search
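The slide names "fusion by weighted MSE" without giving details. One possible reading, sketched below as an assumption, is that each feature type (for example MFCC and F0) is matched against a speaker's VQ codebook by mean squared quantization error, and the per-feature errors are then combined as a weighted sum. The function names, feature names, and weights are hypothetical.

```python
def vq_mse(vectors, codebook):
    """Mean squared error of feature vectors to their nearest code vector."""
    total = 0.0
    for v in vectors:
        # Squared Euclidean distance to the closest code vector.
        total += min(sum((x - c) ** 2 for x, c in zip(v, code))
                     for code in codebook)
    return total / len(vectors)

def fused_score(mse_by_feature, weights):
    """Weighted sum of per-feature MSE scores (lower means a better match)."""
    return sum(weights[name] * mse for name, mse in mse_by_feature.items())

# Example with made-up numbers: MFCC is trusted more than F0.
scores = {"mfcc": 0.5, "f0": 2.0}
weights = {"mfcc": 0.8, "f0": 0.2}
print(fused_score(scores, weights))
```

In practice the per-feature scores would need to be on comparable scales before weighting, and the weights would have to be tuned on development data.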
Future development (2): Applications
[Diagram labels: srlib, DB, common speaker recognition app. interface, classifier fusion, segmentation, VAD, verification, keyword search, access control, speech analyzer tool, forensic applications, calling elevator, call center]
Future development (3): Technical development
• Implement and integrate F0, and possibly also formants (F1, F2).
• Automatic voiced/unvoiced segmentation.
• User enrollment.
• Use of sequence information (triplets).
• Development of WinSprofiler toward a voice profiler and speech analyzer tool.
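F0 extraction and the voiced/unvoiced decision are closely linked, which a basic autocorrelation estimator illustrates. The sketch below is a textbook-style method, not the estimator studied in [3] or [5]; the function name, the search range of 50 to 400 Hz, and the 30 % voicing rule are assumptions for illustration.

```python
import math

def estimate_f0(frame, sample_rate, f_min=50.0, f_max=400.0):
    """Estimate F0 of one frame from the highest autocorrelation peak.

    Returns None when the frame looks unvoiced (weak periodicity).
    """
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    energy = sum(x * x for x in frame)
    if energy == 0.0 or lag_max < lag_min:
        return None
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation of the frame with itself shifted by `lag` samples.
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    # Assumed voicing rule: the peak must reach 30 % of the frame energy.
    if best_lag is None or best_r / energy < 0.3:
        return None
    return sample_rate / best_lag

# A 200 Hz tone sampled at 8 kHz has a period of 40 samples.
tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(800)]
print(estimate_f0(tone, 8000))
```

Frames for which the function returns None can be treated as unvoiced, so the same routine gives a crude voiced/unvoiced segmentation.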
Future development (4): Elevator prototype
[Diagram labels: our PC, LiftCaller, SRLIB 3.0, OPC client, microphone, display, approach detection, DCOM, OPC server, Ethernet TCP/IP, GW box, CAN, machine room, lift car & hardware]
Vision 1: Teleconferencing
[Diagram: a teleconference over a VPN with a speaker recognition box for each participant. Participant labels: Alice, Bob, Minna, Paul, Unknown. Outcome labels: "Verified & allowed", "Not registered"]
Vision 2: Call-center
• Speech is the main tool for people working in a call center.
• Voice login of personnel.
• Removes the need for manual entry.
Vision 3: Language recognition
• A problem related to speaker recognition – the same research groups usually study both problems.
• Not trivial to solve.
• Studied a lot for Asian languages, even for rare languages that do not have any "written form".
Vision 4: Medical applications
• Doctors use voice to record summaries of patient meetings.
• Access by keyword search.
• Annotation.
• Authentication of speaker.
Thank you for your patience!
Questions?