University of Joensuu, Dept. of Computer Science, P.O. Box 111, FIN-80101 Joensuu. Tel. +358 13 251 7959, fax +358 13 251 7955, www.cs.joensuu.fi

Speaker Recognition Research in Joensuu

Speech and Image Processing Unit (SIPU) http://cs.joensuu.fi/sipu/

Speech Technology Winter Seminar (Puheteknologian talviseminaari)

Pasi Fränti

Joensuu, 10 March 2006

Goals for PUMS season 3 (1/2)

1. Usability of automatic speaker identification in forensic applications

2. Compatibility with large databases

3. Automating LTAS (long-term average spectrum) computation and fusing it with MFCC features.

4. Voice activity detection

Goals for PUMS season 3 (2/2)

5. Speaker verification in real (noisy) environments

6. Prototype for access control

7. Solving the technical requirements for an elevator prototype

8. Usability for detecting sound sources in general

9. Keyword search (using HTK or the Lingsoft recognizer)

Research Group

Pasi Fränti, Professor

Juhani Saastamoinen, PhLic

Tomi Kinnunen, PhD (Singapore)

Ville Hautamäki, MSc

Ismo Kärkkäinen, MSc

PUMS personnel

Marko Tuononen, BSc

Doctoral researchers

Collaborators

Rosa Gonzalez-Hautamäki, MSc

Ilja Sidoroff

Victoria Yanulevskaya

Evgeny Karpov, MSc (NRC)

1. Applicability to forensic applications

• A study of automatic speaker recognition for forensic use has been carried out.

• Results have not been reported separately, but follow-up actions were taken within tasks 3 and 4.

• Material can be found in Kinnunen’s PhD thesis [4] and Niemi-Laitinen’s presentation.

2. Support for large databases

- Not yet done -

3. LTAS and other features

• Automatic calculation of LTAS is done; integration into WinSprofiler and reporting are in progress.

• The benefit of LTAS lies mainly in its speed and ease of use: there are no difficult control parameters to tune.

• No additional gain in recognition accuracy: MFCC already captures the same information.

• LTAS could be used for preliminary pruning of candidates in large datasets.
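As a rough illustration of why LTAS is cheap and has few parameters, the sketch below averages framewise power spectra over a whole recording, then uses the resulting vectors to shortlist candidate speakers before any slower MFCC scoring. The frame length, hop, Hamming window, dB scale, Euclidean distance and the helper names (`ltas`, `prune_candidates`) are assumptions for illustration, not details of the WinSprofiler implementation; a naive DFT keeps the sketch dependency-free, whereas a real system would use an FFT.

```python
import cmath
import math


def power_spectrum(frame):
    """Power spectrum of one frame, bins 0..N/2, via a naive DFT (use an FFT in practice)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]


def ltas(signal, frame_len=256, hop=128):
    """Long-term average spectrum in dB: mean power spectrum over Hamming-windowed frames."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * t / (frame_len - 1)) for t in range(frame_len)]
    starts = range(0, len(signal) - frame_len + 1, hop)
    if not starts:
        raise ValueError("signal is shorter than one frame")
    total = [0.0] * (frame_len // 2 + 1)
    for s in starts:
        spec = power_spectrum([x * w for x, w in zip(signal[s:s + frame_len], window)])
        total = [acc + p for acc, p in zip(total, spec)]
    # The small floor avoids log10(0) for bins with no energy.
    return [10 * math.log10(p / len(starts) + 1e-12) for p in total]


def prune_candidates(test_ltas, enrolled, keep=10):
    """Hypothetical pre-filter: keep the `keep` enrolled speakers whose LTAS is nearest (Euclidean)."""
    def distance(name):
        return math.dist(test_ltas, enrolled[name])
    return sorted(enrolled, key=distance)[:keep]
```

Only the short list returned by `prune_candidates` would then be scored with the full MFCC-based models, which is where the saving on a large database would come from.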

Noise robustness of the F0 feature: results reported in [3, 5]
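For readers unfamiliar with the F0 feature, a common textbook way to estimate the fundamental frequency of a voiced frame is autocorrelation: the lag at which the frame best matches a shifted copy of itself is the pitch period. The sketch below is that generic method with assumed search limits of 60-400 Hz; it is not necessarily the estimator evaluated in [3, 5], and it does nothing about noise.

```python
def estimate_f0(frame, sr, f0_min=60.0, f0_max=400.0):
    """Autocorrelation F0 estimate for one voiced frame; returns Hz, or None if no positive peak.

    Searches the lags corresponding to f0_max..f0_min and picks the lag with the
    largest (unnormalised) autocorrelation, so F0 = sr / lag.
    """
    lag_min = max(1, int(sr / f0_max))
    lag_max = min(int(sr / f0_min), len(frame) - 1)
    mean = sum(frame) / len(frame)
    x = [v - mean for v in frame]  # remove DC so it does not bias the correlation
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[t] * x[t + lag] for t in range(len(x) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sr / best_lag if best_lag is not None else None
```

Because fewer terms enter the sum at longer lags, the unnormalised autocorrelation tends to favour the true period over its multiples; with noise, however, such estimators can still make octave errors, which is part of why noise robustness needs separate study.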

4. Voice activity detection

• Software for speech segmentation (VoiceGrep).

• Command-line version for Linux.

• Windows version in WinSprofiler.

• Testing done in the SIPU laboratory:

– Labtec® PC Mic 333, 44.1 kHz

– Recordings were amplified by 24 dB with the Audacity audio editor.
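The slides do not describe how VoiceGrep decides what is speech. As a point of reference only, the simplest voice activity detector thresholds short-time frame energy; the sketch below does that, and also converts the 24 dB boost into an amplitude factor (10^(24/20), about 15.8). The 20 ms frames, the fixed -35 dBFS threshold, the minimum run length and the function names are illustrative assumptions.

```python
import math


def db_to_amplitude(gain_db):
    """Amplitude factor for a gain in dB: 10 ** (dB / 20), so 24 dB is about x15.85."""
    return 10 ** (gain_db / 20)


def energy_vad(signal, sr, frame_ms=20, threshold_db=-35.0, min_frames=3):
    """Return (start_s, end_s) pairs for runs of frames whose energy exceeds threshold_db.

    Samples are assumed to lie in [-1, 1], so the threshold is relative to full scale.
    Runs shorter than min_frames frames are discarded as clicks.
    """
    frame_len = int(sr * frame_ms / 1000)
    active = []
    for s in range(0, len(signal) - frame_len + 1, frame_len):
        energy = sum(x * x for x in signal[s:s + frame_len]) / frame_len
        active.append(10 * math.log10(energy + 1e-12) > threshold_db)
    segments, start = [], None
    for i, is_active in enumerate(active + [False]):  # sentinel closes a final run
        if is_active and start is None:
            start = i
        elif not is_active and start is not None:
            if i - start >= min_frames:
                segments.append((start * frame_len / sr, i * frame_len / sr))
            start = None
    return segments
```

Multiplying a quiet recording by `db_to_amplitude(24)` before detection can lift it above a fixed threshold, which suggests why low-level recordings benefit from the boost. A pure energy threshold also fires on any loud non-speech sound, so practical detectors usually add spectral or statistical cues.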

4a. Test material and results

• Material

– 4 hours in total.

– Poor-quality recordings: 11-bit data, of which only 4-5 bits carry information and the rest is noise.

– VoiceGrep made 168 detections:

– 56 speech (33%)

– 112 non-speech (67%)

• The material included 71 real speech segments:

– Average segment length 16 s.

– VoiceGrep found 25 of these (35 %)
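The two percentages on this slide answer different questions: 33 % is the share of VoiceGrep's detections that really were speech (precision, 56/168), and 35 % is the share of the 71 annotated speech segments that VoiceGrep found (recall, 25/71). The short calculation below reproduces both figures; the F1 score and the bits-to-decibels conversion (about 6.02 dB per bit, so roughly 24-30 dB of usable range for 4-5 informative bits) are added for context and are not results from the slides.

```python
import math


def precision(true_detections, all_detections):
    """Fraction of the detector's outputs that were really speech."""
    return true_detections / all_detections


def recall(found_segments, reference_segments):
    """Fraction of the annotated speech segments that the detector found."""
    return found_segments / reference_segments


def dynamic_range_db(bits):
    """Ideal quantisation dynamic range of `bits` bits: 20*log10(2**bits), about 6.02 dB per bit."""
    return 20 * math.log10(2 ** bits)


p = precision(56, 168)  # 56 of the 168 VoiceGrep detections were speech
r = recall(25, 71)      # 25 of the 71 real speech segments were found
f1 = 2 * p * r / (p + r)
print(f"precision {p:.0%}, recall {r:.0%}, F1 {f1:.2f}")  # precision 33%, recall 35%, F1 0.34
print(f"4-5 informative bits ~ {dynamic_range_db(4):.0f}-{dynamic_range_db(5):.0f} dB")  # 24-30 dB
```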

4b. VoiceGrep overall results

Breakdown of the 168 VoiceGrep detections by sound source:

– Doors: 36 %

– Speech: 33 %

– Running water: 8 %

– Kitchen activities: 7 %

– Rattle, clatter: 7 %

– Music: 4 %

– Miscellaneous: 5 %

4c. VoiceGrep example (correct detection)

Start of the speech is detected correctly.

End of the speech is missed.

4d. VoiceGrep example (false detections)

Figure: waveforms of false detections, with segments labelled door opening, running water, walking and door.

4e. VoiceGrep example (missed speech segment)

Figure: waveform with segments labelled door, speech and walking, and door; the speech segment is not detected.

4f. Entire data set (4 hours)

Figure: the full recording (Data) with the annotated speech segments and the result of VoiceGrep.

5. Speaker verification in noisy environments

• Systematic testing of the parameters that affect recognition accuracy has been reported in [1].

• The applicability of speaker verification in a real environment has been reported in [2] and in Kinnunen’s PhD thesis [4].

• Additional testing will be done if time permits.

5a. Text-dependent verification in access control

• Utilizing time-series information improves recognition.

• The best results are obtained when each user has their own password.
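One standard way to exploit time-series information in text-dependent verification is dynamic time warping (DTW): the test utterance is aligned frame by frame against the enrolled template of the claimed user's password, and the claim is accepted if the aligned distance is small. The slides do not say that DTW was the method used; the sketch, the Euclidean frame distance, the length normalisation and the threshold are assumptions for illustration, operating on already extracted feature vectors (for example MFCC frames).

```python
import math


def dtw_distance(test, template):
    """DTW distance between two sequences of equal-dimension feature vectors.

    cost[i][j] is the cheapest alignment of the first i test frames with the first
    j template frames; the result is normalised by n + m so that long and short
    utterances are comparable.
    """
    n, m = len(test), len(template)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(test[i - 1], template[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def verify_utterance(test, template, threshold):
    """Accept the identity claim if the warped distance to the claimed user's template is small."""
    return dtw_distance(test, template) <= threshold
```

With a personal password per user, an impostor has to match both the voice and the word sequence, which is one plausible reading of why individual passwords gave the best results.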

6. Prototype for access control

Figure: the prototype, with its microphone, motion detector and emergency button labelled.

7. Calling the elevator (technical requirements)

• Communication with the OPC server:
– Implemented with a Matrikon server.

• Elevator program logic implemented:
– Reads variables from the OPC server.
– Interprets and displays the elevator status.
– Includes recording logic.

• Speaker- and voice-related functionality:
– Not yet implemented.
– The main window does not display anything yet.

8. Usability for detecting sound sources in general

- Not yet done -

9. Keyword search

- Not yet done -

Publications (season 3)

1. J. Saastamoinen, Z. Fiedler, T. Kinnunen and P. Fränti, "On factors affecting MFCC-based speaker recognition accuracy", Int. Conf. on Speech and Computer (SPECOM'05), Patras, Greece, 503-506, October 2005.

2. H. Gupta, V. Hautamäki, T. Kinnunen and P. Fränti, "Field evaluation of text-dependent speaker recognition in an access control application", Int. Conf. on Speech and Computer (SPECOM'05), Patras, Greece, 551-554, October 2005.

3. T. Kinnunen and R. Gonzalez-Hautamäki, "Long-Term F0 Modeling for Text-Independent Speaker Recognition", Int. Conf. on Speech and Computer (SPECOM'05), Patras, Greece, 567-570, October 2005.

Theses (season 3)

4. T. Kinnunen, "Optimizing Spectral Feature Based Text Independent Speaker Recognition", PhD thesis, University of Joensuu, June 2005.

5. R. Gonzalez-Hautamäki, "Fundamental Frequency Estimation and Modeling for Speaker Recognition", MSc thesis, University of Joensuu, July 2005.

Application scenarios

Figure: speaker recognition comprises speaker identification ("Whose voice is this?") and speaker verification ("Is this Bob’s voice?"), in which an identity claim is checked and an impostor is rejected.
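The difference between the two tasks on this slide can be stated in a few lines. In the sketch below, `scores` maps each enrolled speaker to a similarity score for the test utterance (higher is more similar, for example a model log-likelihood); how the scores are computed, the closed-set assumption, the speaker names and the threshold value are hypothetical.

```python
def identify(scores):
    """Identification, "Whose voice is this?": pick the best-scoring enrolled speaker (closed set)."""
    return max(scores, key=scores.get)


def verify_claim(scores, claimed_speaker, threshold):
    """Verification, "Is this Bob's voice?": accept only if the claimed speaker's score clears the threshold."""
    return scores.get(claimed_speaker, float("-inf")) >= threshold


scores = {"Alice": -2.1, "Bob": -0.4, "Minna": -1.7}  # hypothetical per-speaker scores
print(identify(scores))                     # Bob
print(verify_claim(scores, "Bob", -1.0))    # True: claim accepted
print(verify_claim(scores, "Alice", -1.0))  # False: rejected as an impostor
```

Identification has to score every enrolled speaker, whereas verification scores only the claimed one, so database size mainly affects the identification task.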

Software 1: Console program

Software 2: WinSprofiler

Software 3: Symbian

Port to Symbian OS with Series 60 UI platform

Software 4: Door SProfiler

Opening the laboratory door by speaking

Software 5: Lift SProfiler (possibly to appear in season 4)

Future development (1)

Figure: software integration diagram with the components VAD; WinSprofiler for Windows (JoY); a mobile version for Series 60 (JoY); SRLIB (MSE, GMM, MFCC, VQ, DB support, LTAS, F0 extraction, fusion by weighted MSE); and keyword search.
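The diagram lists VQ models scored by MSE and "fusion by weighted MSE". A minimal sketch of that idea, under stated assumptions: each speaker has one VQ codebook per feature stream (for example MFCC, LTAS, F0), a stream's match score is the mean squared quantisation error of the test vectors against that codebook, and the streams are combined with fixed weights. The weights, stream names and data layout are hypothetical, and codebook training (for example by k-means) is not shown.

```python
def vq_mse(vectors, codebook):
    """Mean squared error when each test vector is replaced by its nearest codevector."""
    total = 0.0
    for v in vectors:
        total += min(sum((x - c) ** 2 for x, c in zip(v, code)) for code in codebook)
    return total / len(vectors)


def fused_score(features, codebooks, weights):
    """Weighted sum of per-stream MSEs; lower means a better match."""
    return sum(w * vq_mse(features[stream], codebooks[stream]) for stream, w in weights.items())


def identify(features, speakers, weights):
    """Return the enrolled speaker whose codebooks give the lowest fused MSE."""
    return min(speakers, key=lambda spk: fused_score(features, speakers[spk], weights))
```

The weights also compensate for scale: raw F0 values in hertz produce far larger squared errors than normalised cepstral coefficients, so an unweighted sum would let one stream dominate.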

Future development (2)

Figure: architecture diagram labelled srlib, DB, classifier fusion, segmentation, VAD, verification, keyword search, a common speaker recognition application interface, and Applications: access control, speech analyzer tool, forensic applications, calling elevator and call center.

Future development (3)

• Implement and integrate F0, and possibly also the formants F1 and F2.

• Automatic voiced/unvoiced segmentation.

• User enrollment.

• Use of sequence information (triplets).

• Develop WinSprofiler further towards a voice profiler and speech analyzer tool.

Technical development

Future development (4)

Elevator prototype

Figure: system diagram of the elevator prototype with the components our PC, LiftCaller, SRLIB 3.0, approach detection, OPC client, microphone, display, DCOM, OPC server, Ethernet TCP/IP, GW box, machine room, CAN, and lift car & hardware.

Vision 1: Teleconferencing

Figure: teleconference participants Alice, Bob, Minna and Paul connected over a VPN, each passing through speaker recognition. Recognized speakers are labelled Alice, Bob and Minna and marked "verified & allowed"; one speaker is labelled "Unknown" and marked "not registered".

Vision 2: Call-center

• Speech is the main working tool for call-center staff.

• Voice login for personnel.

• Removes the need for manual entry.

Vision 3: Language recognition

• A problem related to speaker recognition; the same research groups usually study both.

• Not trivial to solve.

• Studied extensively for Asian languages, even for rare languages that have no written form.

Vision 4: Medical applications

• Doctors use their voice to record summaries of patient meetings.

• Access to the recordings by keyword search.

• Annotation.

• Speaker authentication.

Thank you for your patience!

Questions?