A Speech / Music Discriminator using RMS and Zero-crossings

Costas Panagiotakis and George Tziritas
Department of Computer Science, University of Crete, Heraklion, Greece


DESCRIPTION

A Speech / Music Discriminator using RMS and Zero-crossings. Costas Panagiotakis and George Tziritas, Department of Computer Science, University of Crete, Heraklion, Greece. EUSIPCO 2002, Toulouse, France.

TRANSCRIPT

Page 1: Computer Science Department

Computer Science Department

A Speech / Music Discriminator using RMS and Zero-crossings

Costas Panagiotakis and George Tziritas

Department of Computer Science, University of Crete, Heraklion, Greece

Page 2: Computer Science Department

Presentation Organization

I. Introduction
II. Segmentation
III. Classification
IV. Results
V. Conclusion

EUSIPCO 2002, Toulouse, France

Page 3: Computer Science Department

Introduction (1/3)

Input

Figure 1: Original sound signal (sample rate 44100 or 22050 Hz)

Output

Figure 2: Real-time segmentation and classification (speech, music, silence)

Page 4: Computer Science Department

Introduction (2/3)

Approaches

• Feature extraction (energy, frequency)
• Feature-based segmentation and classification

Basic purpose

• Real-time segmentation and classification
• Algorithmic and computational constraints
• Low number of features
• Low change-detection error (20 msec)
• Low minimum distance between two changes (1 sec)
• High accuracy (95%)

Page 5: Computer Science Department

Introduction (3/3)

Basic Features

• Root Mean Square (RMS): signal energy
• Zero Crossings (ZC): mean frequency
• Computed every 20 msec
• Independent characteristics

RMS of a frame of N samples x(n):

$$A = \sqrt{\frac{1}{N} \sum_{n} x(n)^2}$$

Figure 3: RMS in music
Figure 4: RMS in speech
Figure 5: ZC in music
Figure 6: ZC in speech
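The two basic features can be sketched as follows. This is a minimal illustration, assuming non-overlapping 20 msec frames and counting a zero crossing whenever consecutive samples change sign; the function name `frame_features` and the test tone are hypothetical and not taken from the slides.

```python
import math

def frame_features(samples, sample_rate, frame_sec=0.02):
    """Return (rms_values, zc_values), one pair per non-overlapping frame."""
    n = int(round(sample_rate * frame_sec))  # samples per frame, e.g. 441 at 22050 Hz
    rms_values, zc_values = [], []
    for start in range(0, len(samples) - n + 1, n):
        frame = samples[start:start + n]
        # RMS: square root of the mean squared sample value.
        rms_values.append(math.sqrt(sum(x * x for x in frame) / n))
        # ZC: number of sign changes between consecutive samples in the frame.
        zc_values.append(sum(1 for a, b in zip(frame, frame[1:])
                             if (a >= 0) != (b >= 0)))
    return rms_values, zc_values

# Usage: one second of a 440 Hz sine sampled at 22050 Hz (hypothetical input).
sr = 22050
tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]
rms, zc = frame_features(tone, sr)
```

For a pure tone the RMS stays near 0.707 of the amplitude and the ZC count per frame is roughly twice the frequency times the frame duration, which is why ZC can serve as a cheap mean-frequency estimate.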

Page 6: Computer Science Department

Segmentation (1/3)

Basic characteristics

• RMS based
• A χ² distribution fits the RMS histograms well

$$p(x) = \frac{x^{a}\, e^{-x/b}}{b^{a+1}\, \Gamma(a+1)}, \quad x \ge 0$$

$$a = \frac{m^2}{s^2} - 1, \qquad b = \frac{s^2}{m}$$

m: mean, s²: variance

Two-stage algorithm

• Stage 1: 1 sec accuracy (low computation cost)
• Stage 2: 20 msec accuracy (high computation cost)

Figure 7: Histogram RMS in speech, approximation by χ² distribution

Figure 8: Histogram RMS in speech, approximation by χ² distribution
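A minimal sketch of the moment fit, assuming that b acts as a scale parameter so that the fitted density has mean (a + 1)·b = m and variance (a + 1)·b² = s². The function names and the sample RMS values are hypothetical.

```python
import math

def fit_chi2(rms_values):
    """Moment fit: a = m^2/s^2 - 1, b = s^2/m (m: mean, s^2: variance)."""
    n = len(rms_values)
    m = sum(rms_values) / n
    s2 = sum((v - m) ** 2 for v in rms_values) / n
    return m * m / s2 - 1.0, s2 / m

def chi2_pdf(x, a, b):
    """p(x) = x^a e^(-x/b) / (b^(a+1) Gamma(a+1)) for x > 0, else 0."""
    if x <= 0:
        return 0.0
    # Evaluate in the log domain to avoid overflow for large a.
    log_p = a * math.log(x) - x / b - (a + 1) * math.log(b) - math.lgamma(a + 1)
    return math.exp(log_p)

# Usage with a few made-up RMS values.
values = [0.10, 0.12, 0.08, 0.15, 0.11, 0.09, 0.13, 0.10]
a, b = fit_chi2(values)
dx = 1e-4
grid = [j * dx for j in range(1, 10001)]
area = sum(chi2_pdf(x, a, b) for x in grid) * dx        # should be close to 1
mean = sum(x * chi2_pdf(x, a, b) for x in grid) * dx    # should be close to m
```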

Page 7: Computer Science Department

Segmentation (2/3)

Stage 1

• Partitioning into 1 sec frames (50 RMS values each)
• Change in frame i: frame i-1 and frame i+1 have to differ
• Computation of the frame distance D (Matusita distance) using the frame similarity ρ

$$\rho(p_1, p_2) = \int \sqrt{p_1(x)\, p_2(x)}\, dx$$

$$D(i) = 1 - \rho(p_{i-1}, p_{i+1})$$

• Frame i is a candidate for Stage 2 (there is a change) if D(i) > threshold and D(i) is a local maximum

[Diagram: RMS over time split into 1 sec frames (Frame i-1, Frame i, Frame i+1, Frame i+2); the distance is HIGH where there is a change in frame i and LOW elsewhere]
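Stage 1 can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: each frame is fitted by moments with shape k = m²/s² and scale b = s²/m, the similarity integral is evaluated with the closed form that holds for two densities of this family, and the threshold value 0.5 and all function names are hypothetical.

```python
import math

def fit(frame):
    """Moment fit of one frame: shape k = m^2/s^2, scale b = s^2/m."""
    m = max(sum(frame) / len(frame), 1e-12)      # guard against an all-zero frame
    s2 = max(sum((v - m) ** 2 for v in frame) / len(frame), 1e-12)
    return m * m / s2, s2 / m

def similarity(p1, p2):
    """Integral of sqrt(p1 * p2) for two fitted densities (closed form)."""
    (k1, b1), (k2, b2) = p1, p2
    k = 0.5 * (k1 + k2)
    beta = 0.5 * (1.0 / b1 + 1.0 / b2)
    log_rho = (math.lgamma(k) - k * math.log(beta)
               - 0.5 * (math.lgamma(k1) + math.lgamma(k2)
                        + k1 * math.log(b1) + k2 * math.log(b2)))
    return min(1.0, math.exp(log_rho))

def stage1_candidates(rms, frame_len=50, threshold=0.5):
    """Return (candidate frame indices, distance per frame)."""
    frames = [rms[i:i + frame_len]
              for i in range(0, len(rms) - frame_len + 1, frame_len)]
    params = [fit(f) for f in frames]
    dist = [0.0] * len(frames)
    for i in range(1, len(frames) - 1):
        # D(i) compares the frames on either side of frame i.
        dist[i] = 1.0 - similarity(params[i - 1], params[i + 1])
    candidates = [i for i in range(1, len(frames) - 1)
                  if dist[i] > threshold
                  and dist[i] >= dist[i - 1] and dist[i] >= dist[i + 1]]
    return candidates, dist

# Usage: 4 sec of quiet, steady RMS followed by 4 sec of loud, variable RMS.
low = [0.10 + 0.01 * math.sin(j) for j in range(200)]
high = [0.50 + 0.20 * math.sin(j) for j in range(200)]
candidates, dist = stage1_candidates(low + high)
```

Only the frames next to the boundary between the two regimes get a high distance, so only they are passed on to the more expensive second stage.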

Page 8: Computer Science Department

Segmentation (3/3)

Stage 2

• 20 msec accuracy
• For each candidate frame i from Stage 1:
  1. Move two successive frames (1 sec each) over the region located before and after frame i
  2. Find the time instant where the two successive frames have the maximum Matusita distance between their RMS distributions
• Possible oversegmentation

Figure 10: The RMS data and the distance D

Figure 11: The segmentation result and the RMS data
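A hedged sketch of the refinement step: two adjacent 1 sec windows are slid one RMS value (20 msec) at a time, and the boundary position with the largest distance is kept. The search range (the extent of candidate frame i) and the function names are assumptions; the distance uses the same moment fit and closed-form similarity as a gamma-type density allows.

```python
import math

def fit(window):
    """Moment fit: shape k = m^2/s^2, scale b = s^2/m."""
    m = max(sum(window) / len(window), 1e-12)
    s2 = max(sum((v - m) ** 2 for v in window) / len(window), 1e-12)
    return m * m / s2, s2 / m

def distance(w1, w2):
    """1 - integral of sqrt(p1 * p2) for the two fitted densities."""
    (k1, b1), (k2, b2) = fit(w1), fit(w2)
    k = 0.5 * (k1 + k2)
    beta = 0.5 * (1.0 / b1 + 1.0 / b2)
    log_rho = (math.lgamma(k) - k * math.log(beta)
               - 0.5 * (math.lgamma(k1) + math.lgamma(k2)
                        + k1 * math.log(b1) + k2 * math.log(b2)))
    return 1.0 - min(1.0, math.exp(log_rho))

def stage2_refine(rms, i, frame_len=50):
    """Return (index of the RMS value where the change occurs, distance there)."""
    lo = max(i * frame_len, frame_len)                   # left window must fit
    hi = min((i + 1) * frame_len, len(rms) - frame_len)  # right window must fit
    best_t, best_d = None, -1.0
    for t in range(lo, hi + 1):
        d = distance(rms[t - frame_len:t], rms[t:t + frame_len])
        if d > best_d:
            best_t, best_d = t, d
    return best_t, best_d

# Usage: the regime changes at RMS index 175, inside frame 3 (indices 150..199).
low = [0.10 + 0.01 * math.sin(j) for j in range(175)]
high = [0.50 + 0.20 * math.sin(j) for j in range(225)]
t, d = stage2_refine(low + high, 3)
change_time_sec = t * 0.02
```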

Page 9: Computer Science Department

Classification (1/4)

Basic purpose

Classify each segment into one of the following classes:

• Music
• Speech
• Silence

Main algorithm

• Hypothesis: segmentation gives homogeneous segments
• Input: basic characteristics RMS, ZC
• Computation of the actual features of the segment
• Classification based on the actual feature values

Page 10: Computer Science Department

Classification (2/4)

Actual features specification

• Normalized RMS variance, σ²_A

$$\sigma_A^2 = \frac{\mathrm{var}(\mathrm{RMS})}{(\mathrm{mean}(\mathrm{RMS}))^2}$$

  Usually (86%) σ²_A(music) < σ²_A(speech)

• The probability of null ZC, ZC0
  Always ZC0(music) = 0
  In 40% of cases ZC0(speech) > 0

• Maximal mean frequency, max(ZC)
  Almost always in speech max(ZC) < 2.4 kHz
  In 2% of the cases in music max(ZC) > 2.4 kHz
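The three features above can be computed from the per-frame RMS and ZC values of a segment roughly as follows. The conversion of a ZC count to a frequency assumes that a tone of frequency f crosses zero 2f times per second; that conversion and the function name are assumptions, not stated on the slides.

```python
def actual_features(rms, zc, frame_sec=0.02):
    """Return (sigma2_a, zc0, max_freq_khz) for one segment."""
    n = len(rms)
    mean_rms = sum(rms) / n
    var_rms = sum((v - mean_rms) ** 2 for v in rms) / n
    # Normalized RMS variance; a zero-energy segment is left to the silence check.
    sigma2_a = var_rms / (mean_rms ** 2) if mean_rms > 0 else 0.0
    # Probability of null ZC: fraction of frames without any zero crossing.
    zc0 = sum(1 for z in zc if z == 0) / len(zc)
    # Assumed conversion: crossings per frame -> crossings per second -> Hz -> kHz.
    max_freq_khz = max(zc) / (2.0 * frame_sec) / 1000.0
    return sigma2_a, zc0, max_freq_khz

# Usage with two made-up frames.
sigma2_a, zc0, max_freq_khz = actual_features([1.0, 3.0], [0, 40])
```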

Page 11: Computer Science Department

Classification (3/4)

Actual features specification

• Joint RMS/ZC measure, Cz
  Speech: high correlation between RMS and ZC; many void intervals with low RMS and ZC
  Music: RMS and ZC essentially independent

• Void intervals frequency, Fu
  Void interval detection (20 msec):

  (RMS < T1) && (RMS < 0.1•max(RMS(i))) && (RMS < T2) || (ZC = 0)

  Group neighboring silent intervals
  Fu: frequency of grouped silent intervals
  Always in speech Fu > 0.6
  In at least 65% of music Fu < 0.6
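A minimal sketch of the void interval test and of Fu. It reads the condition with the usual precedence (the three RMS tests are combined with AND, then OR-ed with ZC = 0) and assumes Fu is the number of grouped void intervals per second of segment; the thresholds `t1` and `t2` are hypothetical parameters, since the slides do not give their values.

```python
def void_frequency(rms, zc, t1, t2, frame_sec=0.02):
    """Return Fu: grouped void intervals per second (assumed definition)."""
    peak = max(rms)
    void = [((r < t1) and (r < 0.1 * peak) and (r < t2)) or (z == 0)
            for r, z in zip(rms, zc)]
    # Group neighboring void frames: count the starts of runs of True values.
    groups = sum(1 for j, v in enumerate(void)
                 if v and (j == 0 or not void[j - 1]))
    return groups / (len(void) * frame_sec)

# Usage: 2 sec of made-up data with three separate low-energy gaps.
rms = [1.0] * 100
for j in (10, 11, 12, 40, 70, 71):
    rms[j] = 0.01
zc = [10] * 100
fu = void_frequency(rms, zc, t1=0.05, t2=0.05)
```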

Page 12: Computer Science Department

Classification (4/4)

Silence segment recognition

$$E = 0.7\, \mathrm{median}_i(\mathrm{RMS}(i)) + 0.3\, \mathrm{mean}_i(\mathrm{RMS}(i))$$

The segment is silence if E < threshold.

Decision making algorithm

1. Silence segment check: if it succeeds, the segment is silence.
2. Otherwise, actual features check: the segment is speech or music.
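The decision flow can be sketched as a short cascade. This is only an illustration: it assumes the second term of E is the mean of the RMS values, the order of the feature checks and the final fallback on σ²_A are assumptions built from the per-feature statements on the earlier slides, and both threshold parameters are hypothetical.

```python
import statistics

def silence_measure(rms):
    """E = 0.7 * median(RMS) + 0.3 * mean(RMS) (mean term is an assumption)."""
    return 0.7 * statistics.median(rms) + 0.3 * statistics.mean(rms)

def classify_segment(rms, zc0, fu, max_freq_khz, sigma2_a,
                     silence_threshold, sigma2_threshold):
    """Return 'silence', 'speech' or 'music' for one homogeneous segment."""
    if silence_measure(rms) < silence_threshold:
        return "silence"
    if zc0 > 0:              # slides: ZC0(music) is always 0
        return "speech"
    if fu < 0.6:             # slides: speech always has Fu > 0.6
        return "music"
    if max_freq_khz > 2.4:   # slides: speech almost always stays below 2.4 kHz
        return "music"
    # Assumed fallback: music usually has the lower normalized RMS variance.
    return "speech" if sigma2_a > sigma2_threshold else "music"

# Usage with made-up feature values and thresholds.
e = silence_measure([1, 2, 3, 10])
label = classify_segment([0.3] * 10, zc0=0.0, fu=1.0, max_freq_khz=1.0,
                         sigma2_a=0.5, silence_threshold=0.01,
                         sigma2_threshold=0.2)
```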

Page 13: Computer Science Department

Results

Data

• 11,328 sec speech
• 3,131 sec music

Data source

• 70% audio CDs
• 15% WWW
• 15% recordings

Segmentation performance

• 97% detection probability
• Change accuracy ~0.2 sec

Actual features performance

[Bar chart: accuracy (vertical axis) per feature set (horizontal axis); the feature sets shown include σ²_A, Cz, ZC0, Fu, combinations of these, and All]

Page 14: Computer Science Department

Conclusion

Complexity

• Minimum complexity O(N)
• Low computation cost

Summary

• Real-time segmentation and classification into three classes
• The energy distribution (RMS) suffices for segmentation
• RMS and ZC suffice for classification
• Purpose: minimum cost and high performance

Future extension

• Content-based indexing and retrieval of audio signals
• Pre-processing stage for speech recognition

Page 15: Computer Science Department

Segmentation - Classification Demo

Page 16: Computer Science Department

Sound Player Demo