detecting proximity from personal audio recordings

Page 1: Detecting proximity from personal audio recordings

Detecting proximity from personal audio recordings - Ellis, Satoh, Chen 2014-09-14 - 1/12

1. Detecting Proximity
2. Audio Similarity: Cross Correlation
3. Audio Similarity: Fingerprints
4. Evaluation & Conclusions

Detecting proximity from personal audio recordings

Dan Ellis, Hiroyuki Satoh, Zhuo Chen

LabROSA, Columbia Univ., NY USA
ICSI, Berkeley, CA, USA
Morikawa lab, University of Tokyo, Tokyo, Japan

[email protected] http://labrosa.ee.columbia.edu/

COLUMBIA UNIVERSITY IN THE CITY OF NEW YORK
Laboratory for the Recognition and Organization of Speech and Audio

Page 2: Detecting proximity from personal audio recordings

1. Detecting Proximity

• Easy for smartphones to “listen” to ambient audio
  • what can they do with the information?

• Ubiquitous smartphones
  • opportunities from having everyone’s phones connected via the cloud?

Page 3: Detecting proximity from personal audio recordings

Detecting Proximity

• Application: Who did I speak with?

• Approaches:
  • High-resolution indoor GPS
    - walls?
  • Local wireless (NFC, Bluetooth)
    - what is the right range?

• Ambient audio similarity
  • all phones have microphones
  • “radius” depends on noisiness
    - matches practical conversation radius

[Figure: timelines for 2004-09-13 and 2004-09-14, 09:00 to 18:00, segmented with labels such as preschool, cafe, lecture, office, outdoor, group, lab, meeting2, L2, postlec, DSP03, compmtg, and annotated with names (Ron, Manuel, Arroyo?, Lesser, Mike, Sambarta?).]

Page 4: Detecting proximity from personal audio recordings

Data: Poster Sessions

• Simultaneous recordings by multiple subjects in a real “poster session”
  • two attempts: SANE 2013, NEMISIG 2014

• Live subjects wore Red Hats
  • warning others
  • for tracking in video

• Final data set
  • six subjects
  • 30 mins with at least 5 of 6

Page 5: Detecting proximity from personal audio recordings

2. Audio Similarity: Cross Correlation

• Are two audio signals “proximal”?
  • recorded at slightly different places
  • .. different orientations, etc

• Expect differences in detail, but shared core

$$M_A(e^{j\omega}) = H_A(e^{j\omega})\,C(e^{j\omega}) + N_A(e^{j\omega})$$

$$M_B(e^{j\omega}) = H_B(e^{j\omega})\,C(e^{j\omega}) + N_B(e^{j\omega})$$

• Cross-correlation reveals common part

$$S_{M_A M_B} = M_A M_B^* = H_A H_B^* |C|^2 + H_A C N_B^* + H_B^* C^* N_A + N_A N_B^*$$
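The model above can be tried numerically. The following is a minimal sketch, not the authors' code: the sample count, the 40-sample delay, the three-tap channel responses `h_a` and `h_b`, and the noise level are all assumptions chosen for illustration. It forms the cross-spectrum M_A M_B^* as on the slide and checks that the time-domain peak recovers the delay, since the cross terms involving independent noise do not add up coherently.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16000          # samples per recording (assumed: 2 s at 8 kHz)
true_lag = 40      # assumed: recorder B hears the shared source 40 samples after A

# Shared "core" C, plus hypothetical short channel responses H_A and H_B.
c = rng.standard_normal(n + true_lag)
h_a = np.array([1.0, 0.5, 0.25])
h_b = np.array([0.8, -0.3, 0.1])

# M = H * C + N for each recorder, with independent noise N_A and N_B.
m_a = np.convolve(c[true_lag:], h_a)[:n] + 0.5 * rng.standard_normal(n)
m_b = np.convolve(c[:n], h_b)[:n] + 0.5 * rng.standard_normal(n)

# Cross-spectrum S = M_A M_B^*, zero-padded so the correlation is linear.
nfft = 2 * n
s_ab = np.fft.rfft(m_a, nfft) * np.conj(np.fft.rfft(m_b, nfft))
xc = np.fft.irfft(s_ab, nfft)   # xc[k] = sum_t m_a[t + k] * m_b[t]

k = int(np.argmax(xc))
lag = k - nfft if k >= n else k  # indices in the upper half are negative lags
peak = xc[k] / np.sqrt(np.sum(m_a ** 2) * np.sum(m_b ** 2))

print("lag at peak:", lag, "samples; normalized peak:", round(float(peak), 3))
```

With this sign convention the peak lands at a negative lag when B is the delayed recording. The normalized peak stays well below 1 because of the differing channels and the added noise.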

Page 6: Detecting proximity from personal audio recordings

Short-Time Cross Correlation

• Calculate cross-correlation between corresponding short windows
  • e.g. 2 s windows every 1 s

• Find peak in time domain correlation
  • lag at peak = best local time alignment
  • value at peak (normalized by energies) = degree of similarity between signals

• Plot best lag vs. window time
  • threshold peak value to ignore chance correlation
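The windowed procedure can be sketched as follows. This is an illustrative sketch and not the skewview implementation: the function name `short_time_xcorr`, the ±0.1 s lag search range, the 4 kHz sample rate, and the synthetic test signals are assumptions. The 2 s window and 1 s hop follow the slide, and the 0.15 threshold is borrowed from the XC threshold quoted in the evaluation.

```python
import numpy as np

def short_time_xcorr(ref, targ, sr, win_s=2.0, hop_s=1.0, max_lag_s=0.1):
    """Per window: start time (s), best lag t_targ - t_ref (s), normalized peak."""
    win, hop, max_lag = int(win_s * sr), int(hop_s * sr), int(max_lag_s * sr)
    nfft = 2 * win  # zero-pad so each window's correlation is linear
    times, lags, peaks = [], [], []
    for start in range(0, min(len(ref), len(targ)) - win + 1, hop):
        a = ref[start:start + win]
        b = targ[start:start + win]
        spec = np.conj(np.fft.rfft(a, nfft)) * np.fft.rfft(b, nfft)
        xc = np.fft.irfft(spec, nfft)  # xc[k] = sum_t a[t] * b[t + k]
        # Keep lags -max_lag..+max_lag; negative lags sit at the end of xc.
        xc = np.concatenate((xc[-max_lag:], xc[:max_lag + 1]))
        k = int(np.argmax(xc))
        norm = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2)) + 1e-12
        times.append(start / sr)
        lags.append((k - max_lag) / sr)
        peaks.append(xc[k] / norm)
    return np.array(times), np.array(lags), np.array(peaks)

# Synthetic check: the target shares the source (20 samples late) for the
# first half, then contains unrelated noise.
rng = np.random.default_rng(1)
sr, d = 4000, 20
n = 8 * sr
c = rng.standard_normal(n + d)
ref = c[d:] + 0.5 * rng.standard_normal(n)
targ = c[:n] + 0.5 * rng.standard_normal(n)
targ[n // 2:] = 1.1 * rng.standard_normal(n - n // 2)

times, lags, peaks = short_time_xcorr(ref, targ, sr)
proximal = peaks >= 0.15  # threshold the peak value to ignore chance correlation
for t, lag, p, ok in zip(times, lags, peaks, proximal):
    print(f"t={t:4.1f}s  lag={lag:+.4f}s  peak={p:.2f}  proximal={bool(ok)}")
```

Restricting the search to a small lag range keeps chance peaks low. A real pair of recorders with unsynchronized clocks would need a wider range or a coarse alignment first.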

Page 7: Detecting proximity from personal audio recordings

skewview

• Compiled MATLAB application to calculate & plot short-time cross correlation of long-duration signals
  • raw cross-correlation plotted in grayscale
  • export peak lag times & values

http://labrosa.ee.columbia.edu/projects/skewview/

[Figure: skewview output. Ref: 100_1198, Targ: 20140125 134129. Horizontal axis: t_ref / sec (500 to 2000); vertical axis: t_targ − t_ref / sec (−80.8 to −79.8); grayscale range −0.06 to 0.06.]

Page 8: Detecting proximity from personal audio recordings

3. Audio Similarity: Fingerprints

• Landmark-based audio fingerprinting:
  • Represent audio as “constellation” of energy peaks
  • Index nearby pairs of peaks for rapid search
  • Match as multiple peaks in same relative positions

• Robust to…
  • channels - peak level not used
  • noise - only a few peaks need to match

• Fast search in large archives (Shazam)
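The three steps (constellation of peaks, indexed pairs, matching by consistent relative position) can be sketched in a few functions. This is a simplified illustration under stated assumptions, not audfprint or Shazam code: the function names, the STFT size, the peak neighborhood, the fan-out of 3, and the white-noise test signal are all hypothetical choices.

```python
import numpy as np
from collections import Counter, defaultdict

def spectrogram(x, nfft=512, hop=256):
    """Magnitude STFT, shape (frames, bins)."""
    frames = np.lib.stride_tricks.sliding_window_view(x, nfft)[::hop]
    return np.abs(np.fft.rfft(frames * np.hanning(nfft), axis=1))

def find_peaks(spec, nbr_t=3, nbr_f=8):
    """Cells that are the maximum of their time-frequency neighborhood."""
    n_t, n_f = spec.shape
    padded = np.pad(spec, ((nbr_t, nbr_t), (nbr_f, nbr_f)),
                    constant_values=-np.inf)
    is_max = np.ones(spec.shape, dtype=bool)
    for dt in range(2 * nbr_t + 1):
        for df in range(2 * nbr_f + 1):
            is_max &= spec >= padded[dt:dt + n_t, df:df + n_f]
    return np.argwhere(is_max)  # rows of (frame, bin), ordered by frame

def landmarks(x, fan_out=3, max_dt=20):
    """Pair each peak with a few later peaks: hash = (f1, f2, dt), at time t1."""
    pk = find_peaks(spectrogram(x))
    out = []
    for i, (t1, f1) in enumerate(pk):
        n_paired = 0
        for t2, f2 in pk[i + 1:]:
            dt = t2 - t1
            if dt > max_dt or n_paired == fan_out:
                break
            if dt > 0:
                out.append(((int(f1), int(f2), int(dt)), int(t1)))
                n_paired += 1
    return out

def build_index(ref_landmarks):
    index = defaultdict(list)
    for h, t in ref_landmarks:
        index[h].append(t)
    return index

def best_offset(query_landmarks, index):
    """Vote on (reference time - query time); a true match piles up on one offset."""
    votes = Counter()
    for h, tq in query_landmarks:
        for tr in index.get(h, ()):
            votes[tr - tq] += 1
    return votes.most_common(1)[0] if votes else (None, 0)

# Synthetic check: the query is a quieter, noisier excerpt of the reference
# starting 100 frames in. Peak levels are never used, only peak positions.
rng = np.random.default_rng(0)
sr, hop = 8000, 256
ref = rng.standard_normal(10 * sr)
start = 100 * hop
query = 0.3 * ref[start:start + 3 * sr] + 0.03 * rng.standard_normal(3 * sr)

offset, count = best_offset(landmarks(query), build_index(landmarks(ref)))
print("best offset (frames):", offset, "matching landmarks:", count)
```

Only a fraction of the query landmarks survive the added noise, but chance matches rarely agree on a single offset, so a modest count at one offset is already strong evidence.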

[Figure: two time-frequency panels, “Query audio” and “Match: 05−Full Circle at 0.032 sec”. Horizontal axis: time / sec (0 to 5); vertical axis: freq / Hz (0 to 4000).]

Avery Wang, 2003

Page 9: Detecting proximity from personal audio recordings

audfprint

• Open source audio fingerprinting tool
  • Matlab: http://labrosa.ee.columbia.edu/matlab/audfprint/
  • Python: https://github.com/dpwe/audfprint

• Rapid retrieval of short noisy queries within large databases
  • 10 sec over-the-air queries within 100k+ reference items in ~1 s

Page 10: Detecting proximity from personal audio recordings

4. Results

• Mutual proximity between all six channels:

[Figure: pairwise proximity among the six channels (bm, cr, dl, de, hp, zc) over time / min (1 to 30), in two panels: Cross-correlation (grayscale: correlation, 0 to 0.3) and Fingerprints (grayscale: counts, 0 to 5).]

• Various “proximal episodes” between targets visible (dark)

• Good agreement between two methods

Page 11: Detecting proximity from personal audio recordings

Evaluation

• Ground truth?
  • did not hand-mark video…

• Cross-correlation is quite reliable …
  • use it as reference for fingerprinting

• DET curve for thresholded proximity
  • fprint vs. xcorr

• Execution time [for 6 x 30 min tracks]:
  • Cross-corr: ~(0.006 x tdur) x N^2 [427 s]
  • Fingerprint: ~(0.030 x tdur) x N [317 s, linear]
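The two execution-time estimates can be compared with a few lines of arithmetic. This sketch assumes that tdur is the per-track duration in seconds and that "N2" on the slide means N squared (every pair of tracks is compared); the function names are hypothetical. The coefficients are the slide's approximate values, so the computed figures only roughly match the measured 427 s and 317 s.

```python
def xcorr_cost(tdur_s, n_tracks):
    """Approximate cross-correlation time in seconds: quadratic in track count."""
    return 0.006 * tdur_s * n_tracks ** 2

def fprint_cost(tdur_s, n_tracks):
    """Approximate fingerprinting time in seconds: linear in track count."""
    return 0.030 * tdur_s * n_tracks

tdur = 30 * 60  # 30 min tracks, in seconds
for n in (6, 60, 600):
    print(f"N={n:4d}  xcorr ~{xcorr_cost(tdur, n):12.1f} s"
          f"  fprint ~{fprint_cost(tdur, n):10.1f} s")
```

Under these coefficients the two costs are equal at N = 5, and cross-correlation grows N/5 times more expensive than fingerprinting beyond that.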

[Figure: FP DET curve for XC threshold 0.15. Horizontal axis: False Alarm probability (in %); vertical axis: Miss probability (in %); both axes span 0.1 to 40.]
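The points of such a curve can be computed as sketched below. This is a minimal sketch under assumptions, not the authors' evaluation script: `det_points` is a hypothetical helper, and the six example values are made up for illustration. Thresholded cross-correlation (at 0.15, as in the figure title) serves as the reference labels, and the fingerprint match count is swept as the detection score.

```python
import numpy as np

def det_points(ref_score, test_score, ref_threshold=0.15):
    """Miss and false-alarm rates of test_score against thresholded ref_score."""
    truth = ref_score >= ref_threshold      # reference "proximal" labels
    thresholds = np.unique(test_score)      # sorted candidate thresholds
    p_miss, p_fa = [], []
    for th in thresholds:
        detected = test_score >= th
        p_miss.append(np.mean(~detected[truth]))   # proximal but not detected
        p_fa.append(np.mean(detected[~truth]))     # detected but not proximal
    return thresholds, np.array(p_miss), np.array(p_fa)

# Made-up example: one entry per (channel pair, time window).
xc_peak = np.array([0.30, 0.20, 0.05, 0.01, 0.25, 0.02])
fp_count = np.array([5, 1, 0, 2, 4, 0])

thresholds, p_miss, p_fa = det_points(xc_peak, fp_count)
for th, pm, pf in zip(thresholds, p_miss, p_fa):
    print(f"count >= {th}: miss {100 * pm:5.1f} %  false alarm {100 * pf:5.1f} %")
```

A DET plot draws these miss and false-alarm pairs on normal-deviate axes, which is why the tick values in the figure are unevenly spaced.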

Page 12: Detecting proximity from personal audio recordings

Conclusions

• Similarity between ambient audio, e.g. from smartphone mics, can be used to track personal proximity

• Similarity can be measured by:
  • cross-correlation (accurate but expensive)
  • landmark fingerprinting (fast, but adequate?)

• Experiments showed both approaches gave very similar results
  • fingerprinting suitable for scaling to very large datasets, e.g. across many users