    Improving Monaural Speaker Identification by Double-Talk Detection

    R. Saeidi 1, P. Mowlaee 2, T. Kinnunen 1, Z.-H. Tan 2, M. G. Christensen 3, S. H. Jensen 2, and P. Fränti 1

    1 Speech and Image Processing Unit (SIPU), School of Computing, University of Eastern Finland

    2 Dept. of Electronic Systems, 3 Dept. of Architecture, Design and Media Technology, Aalborg University, Denmark

    Interspeech 2010, Japan, September 2010

    Presenter: Zheng-Hua Tan

    Presentation Outline

    1 Problem Definition and Background

    2 Proposed system

    3 Double Talk Detection

    4 Speaker Identification

    5 Performance Evaluation

    Monaural Speaker Identification
    Problem Definition

    Fundamentals
    Recognize BOTH speakers present in a MIXED audio file
    The novelty of this work is a double-talk detector (DTD) used as a pre-processor for a previously proposed speaker identification back-end:

    R. Saeidi, P. Mowlaee, T. Kinnunen, Z.-H. Tan, M. G. Christensen, S. H. Jensen, and P. Fränti, "Signal-to-signal ratio independent speaker identification for co-channel speech signals," IEEE 20th International Conference on Pattern Recognition (ICPR 2010), pp. 4565-4568, Istanbul, Turkey, August 2010.

    Monaural Speaker Identification
    Motivation

    There are SOME studies that recognize BOTH speakers, but they need at least TWO microphones [1]
    There are FEW studies that recognize BOTH speakers when only ONE microphone is available [2]

    Building a stand-alone speaker identification system as a computationally less intensive alternative to the Super Human Iroquois system [2]
    Bringing in single-talk/double-talk information at the frame level to improve monaural speaker identification

    [1] Y. E. Kim, J. M. Walsh, and T. M. Doll, "Comparison of a joint iterative method for multiple speaker identification with sequential blind source separation and speaker identification," in Odyssey 2008: The Speaker and Language Recognition Workshop, Jan. 2008.
    [2] J. R. Hershey, S. J. Rennie, P. A. Olsen, and T. T. Kristjansson, "Super-human multi-talker speech recognition: A graphical modeling approach," Elsevier Computer Speech and Language, vol. 24, no. 1, pp. 45-66, Jan. 2010.

    System structure

    Figure: The block diagram of the proposed system.

    Double Talk Detection

    Assume that we have K candidate models, denoted by $M_k$ (here $M_0$, $M_1$, and $M_2$), for describing a monaural speech signal.
    We adopt a maximum a posteriori (MAP) criterion for a multiple-hypothesis test to determine the double-talk and single-talk regions in segments of a mixed signal. Given the mixed signal, we select the model with the maximum a posteriori probability.

    We apply different policies in speaker identification for mixed and single-talker frames.
    $M_0$: None of the speakers is active,
    $M_1$: One of the speakers is active,
    $M_2$: Both of the speakers are active.
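
    The MAP selection among the hypotheses $M_0$, $M_1$, $M_2$ can be sketched as follows. The slides do not say how the per-model likelihoods or the priors are obtained, so this minimal Python sketch assumes they are already available; the function name `map_select` and all numbers are hypothetical.

```python
import numpy as np

def map_select(frame_loglik, log_prior):
    """Return, for each frame, the index k of the hypothesis M_k
    with the largest (unnormalized) log posterior.

    frame_loglik: (T, K) array of log p(x_t | M_k)  (assumed given)
    log_prior:    (K,)   array of log P(M_k)        (assumed given)
    """
    log_post = np.asarray(frame_loglik) + np.asarray(log_prior)
    return np.argmax(log_post, axis=1)

# Toy values, not taken from the slides: 3 frames, hypotheses M_0, M_1, M_2.
loglik = np.array([[-1.0, -5.0, -9.0],
                   [-6.0, -2.0, -4.0],
                   [-8.0, -3.5, -3.0]])
prior = np.log(np.array([0.2, 0.4, 0.4]))
labels = map_select(loglik, prior)  # one hypothesis index per frame
```

    Frames labelled 2 would then be treated as double-talk, and frames labelled 1 as single-talk.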

    Double Talk Detection Performance

    [Plot: three stacked panels (Mixed signal, Speaker 1, Speaker 2) against Time (sec). Each panel shows the signal, the ground truth, and the boundaries estimated by the DTD.]

    Figure: Double-talk detection results for a mixture of a male and a female speaker mixed at 3 dB SSR. (Labels are -1 for no speech, 0 for mixed signal, 1 for speaker 1 and 2 for speaker 2.)

    Speaker Identification
    Frame Level Likelihood (FLL)

    The main idea is to use GMM speaker models in the mixed-speech domain
    We use the $T_0$ frames of the input feature stream that are recognized as mixed speech
    We compute the FLL as:
    \[ s_{igt} = \log p(x_t \mid \lambda_{ig}) - \log p(x_t \mid \lambda_{\mathrm{UBM}}) \tag{1} \]
    After finding the most probable speaker for each frame, we count the number of winning frames per speaker and normalize the counts

    Figure: $\lambda_{ig}$ is the model for the $i$th speaker at SSR level $g$.
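
    The frame-counting step can be sketched in Python as below, assuming the per-frame log-likelihoods under every speaker/SSR model and under the UBM are already computed. Taking the best SSR level per speaker before choosing the winning speaker is an assumption of this sketch (the slide does not say how the SSR index $g$ is resolved), and `fll_scores` is a hypothetical name.

```python
import numpy as np

def fll_scores(loglik, loglik_ubm):
    """Normalized count of winning frames per speaker.

    loglik:     (I, G, T) array, log p(x_t | lambda_ig)
    loglik_ubm: (T,)      array, log p(x_t | lambda_UBM)
    """
    s = np.asarray(loglik) - np.asarray(loglik_ubm)  # Eq. (1) for all i, g, t
    best_over_g = s.max(axis=1)            # (I, T); assumption: best SSR level
    winners = best_over_g.argmax(axis=0)   # winning speaker index per frame
    counts = np.bincount(winners, minlength=s.shape[0])
    return counts / s.shape[2]             # normalize by the number of frames

# Toy input (hypothetical): 3 speakers, 2 SSR levels, 4 mixed frames.
toy = np.full((3, 2, 4), -10.0)
toy[0, 0, 0] = toy[0, 1, 1] = -1.0   # speaker 0 wins frames 0 and 1
toy[2, 0, 2] = -1.0                  # speaker 2 wins frame 2
toy[1, 1, 3] = -1.0                  # speaker 1 wins frame 3
fll = fll_scores(toy, np.full(4, -5.0))
```

    Because the UBM term is the same for every speaker within a frame, it does not change which speaker wins that frame; it matters only if the scores $s_{igt}$ themselves are used elsewhere.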

    Speaker Identification
    Kullback-Leibler divergence (KLD)

    We use the $T_0$ frames of the input feature stream that are recognized as mixed speech

    We compute the KLD as:
    \[ \mathrm{KLD}_{ig} = \frac{1}{2} \sum_{m=1}^{M} w_m (\mu_{me} - \mu_{mig})^{T} \Sigma_m^{-1} (\mu_{me} - \mu_{mig}) \tag{2} \]

    The KLD scores are averaged over the SSR levels $g$ and then normalized
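
    A minimal Python sketch of Eq. (2) follows. It assumes diagonal covariances shared by all models, and it reads $\mu_{me}$ as the $m$th mean of a GMM fitted to the test frames and $\mu_{mig}$ as the $m$th mean of $\lambda_{ig}$; the slide does not define these symbols, so both readings are assumptions. The final normalization is not specified on the slide and is left out.

```python
import numpy as np

def kld(weights, mu_test, mu_model, variances):
    """Eq. (2) with diagonal covariances.

    weights:   (M,)   mixture weights w_m
    mu_test:   (M, D) means assumed to come from the test segment
    mu_model:  (M, D) means of one speaker/SSR model
    variances: (M, D) diagonal of Sigma_m
    """
    diff = np.asarray(mu_test) - np.asarray(mu_model)
    per_component = np.sum(diff * diff / np.asarray(variances), axis=1)
    return 0.5 * float(np.dot(weights, per_component))

def kld_over_ssr(weights, mu_test, mu_models_by_ssr, variances):
    """Average Eq. (2) over SSR levels; mu_models_by_ssr is (G, M, D)."""
    return float(np.mean([kld(weights, mu_test, mu_g, variances)
                          for mu_g in mu_models_by_ssr]))

# Toy values (hypothetical): M = 2 components, D = 2 dimensions.
w = np.array([0.5, 0.5])
mu_e = np.array([[0.0, 0.0], [1.0, 1.0]])
mu_ig = np.array([[1.0, 0.0], [1.0, 3.0]])
var = np.array([[1.0, 1.0], [2.0, 2.0]])
d = kld(w, mu_e, mu_ig, var)
d_avg = kld_over_ssr(w, mu_e, np.stack([mu_ig, mu_e]), var)
```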

    Speaker Identification
    Score Fusion

    For the $T_0$ frames of the input feature stream that are recognized as mixed speech, we form the score per speaker as:
    score = 0.5 KLD + 0.5 FLL

    The $T_1$ ($T_2$) frames of the input that are recognized as belonging to speaker 1 (2) are passed to the KLD module to find the best match
    If idx is the speaker identified from the single-talk frames, we add a bonus to its decision score:
    score[idx] = score[idx] + T_1/T (or T_2/T)
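
    The fusion rule and the single-talk bonus can be sketched as follows. The sketch assumes that both inputs are already normalized so that a larger value means a better match (the slides do not state how the KLD, a divergence, is normalized) and that T is the total number of frames in the utterance; `fuse_scores` and `top_n` are hypothetical helper names.

```python
import numpy as np

def fuse_scores(kld_norm, fll_norm, n_frames, single_talk=()):
    """Equal-weight fusion plus a bonus for speakers found in single-talk frames.

    kld_norm, fll_norm: (I,) normalized per-speaker scores from the T_0 frames
    n_frames:           T, assumed to be the total number of frames
    single_talk:        (idx, n_single) pairs, e.g. [(idx1, T_1), (idx2, T_2)]
    """
    score = (0.5 * np.asarray(kld_norm, dtype=float)
             + 0.5 * np.asarray(fll_norm, dtype=float))
    for idx, n_single in single_talk:
        score[idx] += n_single / n_frames   # bonus T_1/T or T_2/T
    return score

def top_n(score, n=3):
    """Speaker indices of the n largest scores, best first."""
    return np.argsort(score)[::-1][:n].tolist()

# Toy values (hypothetical): 4 speakers, T = 100, T_1 = 30, T_2 = 20.
score = fuse_scores([0.2, 0.5, 0.3, 0.0], [0.4, 0.3, 0.2, 0.1],
                    n_frames=100, single_talk=[(2, 30), (0, 20)])
ranking = top_n(score)
```

    The top-3 ranking mirrors the evaluation criterion used in the results table, where both speakers must appear in the top-3 list.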

    Evaluation Corpus

    Grid corpus

    Number of sentences per talker: 1000
    Number of speakers: 34 (18 male and 16 female)
    Corpus size: 34,000
    Number of distinct sentences: 2048

    File duration: typically 1-2 sec

    Figure: The Speech Separation Challenge.

    Speaker Identification
    Results

    Table: Speaker identification performance (% error), where both speakers are correctly found in the top-3 list. Yes/No indicates whether the proposed DTD method is included. For the ST scenario, both systems give no error.

                 SG              DG              Average
    DTD          No     Yes      No     Yes      No     Yes
    SSR
    -9 dB        7.26   6.70     17.50  13.03    8.00   5.32
    -6 dB        3.35   3.35     6.00   5.00     3.00   2.29
    -3 dB        0.56   0.56     2.50   2.00     1.00   0.61
     0 dB        1.68   1.68     1.00   2.00     0.83   0.61
     3 dB        2.79   2.23     6.50   5.00     3.00   1.89
     6 dB        6.15   5.59     9.50   10.50    5.00   4.37

    Average      3.64   3.35     7.17   6.17     3.47   2.57

    Conclusion

    Successful ideas from speaker verification are applied to monaural speaker identification
    Mixed speech at different SSRs is used to train the speaker GMMs
    Speaker models are created by MAP adaptation rather than conventional ML training
    Double-talk detection is introduced to enhance the performance of the speaker identification system
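
    For readers unfamiliar with MAP adaptation of GMMs, the sketch below shows the common mean-only "relevance MAP" recipe for a diagonal-covariance UBM. The slides do not say which parameters are adapted or which relevance factor is used, so the mean-only choice, the value 16, and the name `map_adapt_means` are assumptions, not the authors' stated configuration.

```python
import numpy as np

def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a diagonal-covariance GMM.

    X: (T, D) training frames; weights: (M,); means, variances: (M, D).
    Returns the adapted means, shape (M, D).
    """
    X = np.asarray(X, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)

    # Per-frame, per-component Gaussian log-densities, shape (T, M).
    diff = X[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2.0 * np.pi * variances), axis=1))

    # Responsibilities, computed stably in the log domain.
    log_post = np.log(weights) + log_gauss
    log_post -= log_post.max(axis=1, keepdims=True)
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)

    n = gamma.sum(axis=0)                                   # soft counts (M,)
    data_mean = (gamma.T @ X) / np.maximum(n, 1e-10)[:, None]
    alpha = (n / (n + relevance))[:, None]                  # adaptation weight
    return alpha * data_mean + (1.0 - alpha) * means

# Toy check (hypothetical): one component, 16 frames at 2.0, UBM mean 0.0.
# With 16 frames and relevance 16, alpha = 0.5, so the mean moves halfway.
adapted = map_adapt_means(np.full((16, 1), 2.0),
                          [1.0], [[0.0]], [[1.0]])
```

    Components that see little data stay close to the UBM means, which is the usual argument for MAP adaptation over ML training when per-speaker data are limited.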

    MATLAB code will be made available on my webpage: cs.joensuu.fi/pages/saeidi
    Contact: [email protected]
