anechoic signal bathroom center 16 bathroom corner 16 ping−pong...

1
Effects of Reverberant Energy on Statistics of Speech Madhu Shashanka 1 , Barbara Shinn-Cunningham 1 , and Martin Cooke 2 1 Boston University, 2 University of Sheffield { shashanka,shinn } @cns.bu.edu, [email protected] Anechoic Signal Anechoic Bathroom Center 16 Bath Ctr 16 Bathroom Corner 16 Bath Cnr 16 Ping−pong Center 16 PP Ctr 16 Ping−pong Corner 16 PP Cnr 16 Ping−pong Side 16 PP Side 16 Anechoic Signal Anechoic Bathroom Center 50 Bath Ctr 50 Bathroom Corner 50 Bath Cnr 50 Ping−pong Center 50 PP Ctr 50 Ping−pong Corner 50 PP Cnr 50 Ping−pong Side 50 PP Side 50 Anechoic Signal Anechoic Bathroom Center 100 Bath Ctr 100 Bathroom Corner 100 Bath Cnr 100 Ping−pong Center 100 PP Ctr 100 Ping−pong Corner 100 PP Cnr 100 Ping−pong Side 100 PP Side 100 Anechoic Signal Anechoic Bathroom Center 200 Bath Ctr 200 Bathroom Corner 200 Bath Cnr 200 Ping−pong Center 200 PP Ctr 200 Ping−pong Corner 200 PP Cnr 200 Ping−pong Side 200 PP Side 200 Objective Study how reverberation affects sparsity and statistics of speech. Measurements: We measured impulse responses in five dif ferent room/position conditions and at four dif ferent distances fr om the sour ce. Bathr oom Center Corner PingPong Room Center Corner Side At every location/condition, responses wer e recorded at two mics (left and right) which wer e 0.4 m apart. How does reverb distort a speech signal? Reverberation smears energy across time. A reverb signal is less sparse in time and frequency than the anechoic signal. The waveforms and spectrograms of an anechoic signal and a reverb signal for a particular condition are shown below. Waveforms ANECHOIC BR CENTER 2m Spectrograms Approach We compar ed energy dif ferences between signals using a time-fr equency representation (spectr ogram). Reverberant signals wer e first time-aligned with the anechoic signal and levels wer e corr ected so that RMS energy of the dir ect sound was equal to that of the anechoic signal. −100 −50 0 50 0 5 10 15 20 Energy (dB) s t i n u F T f o e g a t n e c r e P Clean 0.16 m 0.50 m 1.0 m 2.0 m BR CENTER −100 −50 0 50 0 5 10 15 20 Energy (dB) s t i n u F T f o e g a t n e c r e P Clean Br Center Br Corner Pr Center Pr Corner Pr Side 2.0 m Speech and Silence We consider another measur e of the effects of reverberation – percentage of TF units for which anechoic energy was within ± 3 dB of reverb energy. Speech - if energy is within 20 dB of max energy. Silence - if energy is less than max energy by mor e than 20 dB. Below , we plot these percentages separately for the bathroom and the pingpong room. Bathroom : 16 50 100 200 0 10 20 16 50 100 200 0 20 40 ) c i o h c e n a ( E f o B d 3 n i h t i w ) b r e v e r ( E Speech 16 50 100 200 0 5 10 15 Silence Center Corner Pingpong room : 16 50 100 200 0 20 40 20 40 60 80 0 20 40 Center Corner Side ) c i o h c e n a ( E f o B d 3 n i h t i w ) b r e v e r ( E Speech Silence Below , we plot the histogram of the energy dif ference between S1 and S2 in the anechoic and reverberant cases.To visualize the dif ference mor e clearly , we also plot the dif ference between the two curves (right). Energy(S1-S2) Histograms and their dif ference −100 −50 0 50 100 0 5 10 15 20 Energy(S1) − Energy(S2) s t i n u F T f o % Anechoic Reverb Bathroom Center 2.0m −100 −50 0 50 100 −3 −2 −1 0 1 2 3 4 ) c i o h c e n A ( % ) b r e v e R ( % Energy(S1) − Energy(S2) Bathroom Center 2.0m We can also see to what extent energies in the two signals overlapped. Below , we plot the percentage of TF units for which energy in speech sample 1 was within 3 dB of energy in the other sample. Bathroom : 16 50 100 200 18 19 20 20 25 30 35 ( y g r e n e n i p a l r e v o h c i h w s t i n u F T % ± B d 3 Speech 18 19 20 21 Silence Center Corner Anechoic Pingpong room : 16 50 100 200 18 19 20 Center Corner Side Anechoic 20 25 30 35 19 20 21 ( y g r e n e n i p a l r e v o h c i h w s t i n u F T % ± ) B d 3 Speech Silence Energy of S2 in the gaps of S1 We consider how much energy of speech sample 2 is present in the silence regions of speech sample 1. The plot below shows the energy of speech sample 2 and dark blue regions represent TF units where sample 1 has high energy. Sample 2 in the "silence" of Sample 1 Below, we plot energy histograms. Silence of S1 −100 −80 −60 −40 −20 0 0 5 10 15 20 Energy s t i n u F T f o % Anechoic Reverb Energy of S1 −100 −80 −60 −40 −20 0 0 5 10 15 20 Energy s t i n u F T f o % Anechoic Reverb Energy of S2 Energy of (S1+S2) −100 −80 −60 −40 −20 0 0 5 10 15 20 Energy s t i n u F T f o % Anechoic Reverb Silence of S1 −100 −80 −60 −40 −20 0 0 5 10 15 20 Energy s t i n u F T f o % Silence of S2 Anechoic Reverb Reverberation More Energy Below are histograms of the energy in time frequency (TF) units. The relative amount of energy from reverberation grows with distance. Histogram of TF energy for the bathroom center condition. The hard-walled bathroom (BR) produces more reverberant energy than a classroom (the ping-pong room, PR). Histogram of TF energy for different room conditions, for a 2-m distance. Silence is filled in by reverberant energy, especially in the hard- walled bathroom. As distance increases, the effects of reverberation increase, both for silence and for speech. However, filling in the silence destroys the normal modulations that convey speech meaning. Plots show the percentage of TF units for which the energy in the reverberant condition is within 3 dB of what it was in the clean speech. 16 50 100 200 16 50 100 200 Simultaneous Speech We now consider the effect of reverberation on simultaneous speech signals. We analyzed all TF units togethe, but also broke the TF units into two categories: 16 50 100 200 16 50 100 200 16 50 100 200 16 50 100 200 Distance (cm) Distance (cm) Distance (cm) Distance (cm) All TF units All TF units All TF units All TF units Histograms of energy of signal 1 (left panel) and signal 2 (right panel) in the silence regions of signal 1. Histograms of energy of both signals in silence regions of signal 1 (left panel) and signal 2 (right panel). Acknowledgements 1. Tim Streeter for helping with measu- rements of impulse responses. 2. Portions of this work were supported by grants from (a) Air Force Office of Scientific Research (b) National Science Foundation.

Upload: others

Post on 07-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Anechoic Signal Bathroom Center 16 Bathroom Corner 16 Ping−pong …labrosa.ee.columbia.edu/Montreal2004/talks/poster_madhu.pdf · than a classroom (the ping-pong room, PR). Histogram

Effects of Reverberant Energy on Statistics of SpeechMadhu Shashanka1, Barbara Shinn-Cunningham1, and Martin Cooke2

1Boston University, 2University of Sheffield{ shashanka,shinn} @cns.bu.edu, [email protected]

Anechoic Signal

Anechoic

Bathroom Center 16

Bath Ctr 16

Bathroom Corner 16

Bath Cnr 16

Ping−pong Center 16

PP Ctr 16

Ping−pong Corner 16

PP Cnr 16

Ping−pong Side 16

PP Side 16

Anechoic Signal

Anechoic

Bathroom Center 50

Bath Ctr 50

Bathroom Corner 50

Bath Cnr 50

Ping−pong Center 50

PP Ctr 50

Ping−pong Corner 50

PP Cnr 50

Ping−pong Side 50

PP Side 50

Anechoic Signal

Anechoic

Bathroom Center 100

Bath Ctr 100

Bathroom Corner 100

Bath Cnr 100

Ping−pong Center 100

PP Ctr 100

Ping−pong Corner 100

PP Cnr 100

Ping−pong Side 100

PP Side 100

Anechoic Signal

Anechoic

Bathroom Center 200

Bath Ctr 200

Bathroom Corner 200

Bath Cnr 200

Ping−pong Center 200

PP Ctr 200

Ping−pong Corner 200

PP Cnr 200

Ping−pong Side 200

PP Side 200

Objective

Study how reverberation affects sparsity andstatistics of speech.

Measurements: We measured impulse responses in five dif ferent room/positionconditions and at four dif ferent distancesfrom the sour ce.

• Bathr oom

– Center

– Corner

• PingPong Room

– Center

– Corner

– Side

At every location/condition, responses wer erecorded at two mics (left and right) whichwer e 0.4 m apart.

How does reverb distort a speechsignal?Reverberation smears energy across time. Areverb signal is less sparse in time andfrequency than the anechoic signal. Thewaveforms and spectrograms of an anechoicsignal and a reverb signal for a particularcondition are shown below.

Waveforms

ANECHOIC BR CENTER 2m

Spectrograms

Approach

We compar ed energy dif ferences betweensignals using a time-fr equencyrepresentation (spectr ogram). Reverberantsignals wer e first time-aligned with theanechoic signal and levels wer e corr ected sothat RMS energy of the dir ect sound wasequal to that of the anechoic signal.

−100 −50 0 500

5

10

15

20

Energy (dB)

stinu FT fo eg atn ec r eP

Clean0.16 m0.50 m1.0 m2.0 m

BR CENTER

−100 −50 0 500

5

10

15

20

Energy (dB)

stinu FT fo e gatn e cre P

CleanBr CenterBr CornerPr CenterPr CornerPr Side

2.0 m

Speech and Silence

We consider another measur e of the effectsof reverberation – percentage of TF units forwhich anechoic energy was within ± 3 dB ofreverb energy.

• Speech - if energy is within 20 dB of maxenergy.

• Silence - if energy is less than max energyby more than 20 dB.

Below, we plot these percentages separatelyfor the bathroom and the pingpong room.

Bathroom :

16 50 100 2000

10

20

16 50 100 2000

20

40

)ciohcena(E fo Bd 3 ni htiw )b re ver(E

Speech

16 50 100 2000

5

10

15 Silence

CenterCorner

Pingpong room :

16 50 100 2000

20

40

20

40

60

80

0

20

40

CenterCornerSide

)ciohcena(E f o B d 3 n iht iw ) brev er(E

Speech Silence

Below, we plot the histogram of the energydif ference between S1 and S2 in the anechoicand reverberant cases.To visualize thedif ference more clearly , we also plot thedif ference between the two curves (right).

Energy(S1-S2) Histograms and theirdif ference

−100 −50 0 50 1000

5

10

15

20

Energy(S1) − Energy(S2)

stinu FT fo %

AnechoicReverb

BathroomCenter2.0m

−100 −50 0 50 100−3

−2

−1

0

1

2

3

4

)ciohcenA(% − )br ev e

R(%

Energy(S1) − Energy(S2)

BathroomCenter2.0m

We can also see to what extent energies inthe two signals overlapped. Below, we plotthe percentage of TF units for which energyin speech sample 1 was within 3 dB ofenergy in the other sample.

Bathroom :

16 50 100 20018

19

20

20

25

30

35

( ygrene ni pal rev o hcihw stinu FT

) Bd 3

Speech

18

19

20

21 Silence

CenterCornerAnechoic

Pingpong room :

16 50 100 20018

19

20CenterCornerSideAnechoic

20

25

30

35

19

20

21

( ygrene ni pa lrev o hcihw stinu FT

) Bd 3

Speech Silence

Energy of S2 in the gaps of S1

We consider how much energy of speech sample2 is present in the silence regions of speech sample 1. The plot below shows the energy of speech sample 2 and dark blue regions representTF units where sample 1 has high energy.

Sample 2 in the "silence" of Sample 1

Below, we plot energy histograms.

Silence of S1

−100 −80 −60 −40 −20 00

5

10

15

20

Energy

stinu FT fo %

AnechoicReverb

Energy of S1

−100 −80 −60 −40 −20 00

5

10

15

20

Energy

stinu FT fo %

AnechoicReverb

Energy of S2

Energy of (S1+S2)

−100 −80 −60 −40 −20 00

5

10

15

20

Energy

stinu FT fo %

AnechoicReverb

Silence of S1

−100 −80 −60 −40 −20 00

5

10

15

20

Energy

stinu FT fo %

Silence of S2AnechoicReverb

Reverberation More EnergyBelow are histograms of the energy intime frequency (TF) units.

The relative amount of energy fromreverberation grows with distance.Histogram of TF energy for thebathroom center condition.

The hard-walled bathroom (BR)produces more reverberant energythan a classroom (the ping-pong room, PR). Histogram of TF energy for different room conditions, for a 2-m distance.

Silence is filled in by reverberantenergy, especially in the hard-walled bathroom. As distanceincreases, the effects of reverberationincrease, both for silence and forspeech. However, filling in the silencedestroys the normal modulations that convey speech meaning. Plots show the percentage of TF unitsfor which the energy in the reverberantcondition is within 3 dB of what it was in the clean speech.

16 50 100 200 16 50 100 200

Simultaneous SpeechWe now consider the effect of reverberationon simultaneous speech signals.

We analyzed all TF units togethe, but alsobroke the TF units into two categories:

16 50 100 200 16 50 100 200

16 50 100 200 16 50 100 200

Distance (cm)

Distance (cm)

Distance (cm)

Distance (cm)

All TF units

All TF units

All TF units

All TF units

Histograms of energy of signal 1 (left panel) andsignal 2 (right panel) in the silence regions of signal 1.

Histograms of energy of both signals in silenceregions of signal 1 (left panel) andsignal 2 (right panel).

Acknowledgements1. Tim Streeter for helping with measu-rements of impulse responses.2. Portions of this work were supportedby grants from (a) Air Force Office of Scientific Research(b) National Science Foundation.