Audio Lifelogging - Dan Ellis 2014-01-29
1. Audio Lifelogging
2. Speech
3. Environmental Sound Analysis
4. Medical Applications
Audio Lifelogging Dan Ellis
Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA
http://labrosa.ee.columbia.edu/
Personal Audio Lifelogs
• Easy to record everything you hear
  ~250 GB / year @ 64 kbps
• Very hard to find anything
  how to scan? how to visualize? how to index?
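The storage figure on this slide can be checked with a few lines of arithmetic. This is a minimal sketch; it assumes around-the-clock recording and decimal gigabytes (10^9 bytes), neither of which the slide states, and the function name is made up for illustration.

```python
def lifelog_bytes_per_year(bitrate_kbps=64, hours_per_day=24, days=365):
    """Bytes of audio produced per year at a constant bitrate.

    hours_per_day=24 is an assumption (continuous recording);
    the slide only gives the bitrate and the yearly total.
    """
    bytes_per_second = bitrate_kbps * 1000 / 8
    return bytes_per_second * 3600 * hours_per_day * days


gigabytes = lifelog_bytes_per_year() / 1e9
print(f"{gigabytes:.0f} GB/year")
```

Recording only during waking hours (say `hours_per_day=16`) would reduce the total proportionally.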
Speech Recognition
• Transcripts of discussions would be useful…
  distant-mic / noisy ASR still quite limited
  CHiME (Computational Hearing in Multisource Environments):
Barker et al. ’13 http://spandh.dcs.shef.ac.uk/chime_workshop/slides/CHiME13_overview.pdf
Datasets and tasks
The ‘CHiME’ noise backgrounds
Binaural noise backgrounds recorded in a family home (living room).
Plenty of sources, well-defined application domain with a learnable noise ‘vocabulary’ and ‘grammar’.
Total of 14 h of audio in 0.5 to 1.5 h sessions over several weeks.
(The 2nd ‘CHiME’ Challenge, 01/06/2013, slide 6 / 23)
Results
Track 2 results
[Figure: word error rate (%), axis 0 to 80, against SNR (dB), axis −6 to 9. Curves: ASR Baseline (reverb); ASR Baseline (noisy); TU Tampere & KU Leuven; TU Munich, TUT, KUL & BMW; FBK-Irst & INESC-ID; Mitsubishi Electric]
Best system: spatial enhancement, MLLT, SAT, LDA, f-bMMI, feature augmentation, bMMI noise-adaptive training, DLM and MBR decoding
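The results above are reported as word error rate (WER). As a reminder of what that metric measures, here is a minimal sketch using the standard definition (word-level edit distance divided by the number of reference words). It is not the challenge's scoring tool, and the function name and example sentences are invented for illustration.

```python
def word_error_rate(reference, hypothesis):
    """WER in percent: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return 100.0 * prev[-1] / len(ref)


# Hypothetical transcripts: one substitution and one deletion in six words.
print(word_error_rate("the meeting starts at nine today",
                      "the meeting started at today"))
```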
Environmental Sound Classification
• Classify soundtracks with “texture” features
• Mixed results…
[Figure: block diagram of texture feature extraction. Sound → Automatic gain control → mel filterbank (18 chans). Per-channel outputs: Histogram → mean, var, skew, kurt (18 x 4); FFT → Octave bins 0.5, 1, 2, 4, 8, 16 Hz → Modulation energy (18 x 6); Envelope correlation → Cross-band correlations (318 samples)]
Ellis, Zheng, McDermott ’11
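Two of the feature groups named in the diagram (per-band moments and octave-band modulation energy) can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the input is taken to be an already-computed array of subband envelopes (the gain control and mel filterbank stages are skipped), the listed frequencies are treated as octave-bin centers, and the cross-band correlation features are left out. The function name `texture_features` is made up.

```python
import numpy as np


def texture_features(env, frame_rate):
    """Moment and modulation features from subband envelopes.

    env: array (n_bands, n_frames) of envelopes sampled at frame_rate Hz.
    Returns (moments, modulation) with shapes (n_bands, 4) and (n_bands, 6).
    """
    mean = env.mean(axis=1)
    centered = env - mean[:, None]
    var = centered.var(axis=1)
    std = np.sqrt(var) + 1e-12  # avoid division by zero for flat bands
    skew = np.mean((centered / std[:, None]) ** 3, axis=1)
    kurt = np.mean((centered / std[:, None]) ** 4, axis=1)
    moments = np.stack([mean, var, skew, kurt], axis=1)

    # Power spectrum of each envelope, summed into octave-wide bins.
    power = np.abs(np.fft.rfft(centered, axis=1)) ** 2
    freqs = np.fft.rfftfreq(env.shape[1], d=1.0 / frame_rate)
    centers = np.array([0.5, 1, 2, 4, 8, 16])  # Hz, assumed to be bin centers
    modulation = np.zeros((env.shape[0], len(centers)))
    for k, fc in enumerate(centers):
        in_bin = (freqs >= fc / np.sqrt(2)) & (freqs < fc * np.sqrt(2))
        modulation[:, k] = power[:, in_bin].sum(axis=1)
    return moments, modulation


# Synthetic check: 18 bands, 10 s at 100 frames/s, modulated at 4 Hz.
t = np.arange(1000) / 100.0
env = np.tile(1.0 + 0.5 * np.sin(2 * np.pi * 4 * t), (18, 1))
moments, modulation = texture_features(env, frame_rate=100)
print(moments.shape, modulation.shape)
```

With 18 bands this yields the 18 x 4 and 18 x 6 blocks shown in the diagram.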
Segmentation & Clustering
• Features from 1 min windows
• Segmentation by local statistics
• Cluster across whole set
• Manual labels of each class
[Figure: timelines from 09:00 to 18:00 for two days of recordings, 2004-09-13 and 2004-09-14, divided into labeled segments. Labels include preschool, cafe, lecture, office, outdoor, group, lab, meeting2, L2, postlec, DSP03 and compmtg; further annotations: Ron, Manuel, Arroyo?, Lesser, Mike, Sambarta?]
Lee & Ellis ’04
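The pipeline on this slide (per-minute features, segmentation by local statistics, clustering across the whole set) can be sketched roughly as below. The specific choices are stand-ins, not the method of the cited work: boundaries are placed where the mean feature vector just before and just after a frame differ most, and segments are grouped with a greedy nearest-centroid rule. All names, thresholds and the synthetic data are hypothetical.

```python
import numpy as np


def segment_boundaries(feats, half_width=5, threshold=3.0):
    """Frames where the local means before and after differ strongly.

    feats: (n_frames, n_dims), one row per 1-minute window.
    """
    n = len(feats)
    scores = np.zeros(n)
    for t in range(half_width, n - half_width):
        before = feats[t - half_width:t].mean(axis=0)
        after = feats[t:t + half_width].mean(axis=0)
        scores[t] = np.linalg.norm(after - before)
    # Keep local maxima of the change score that exceed the threshold.
    return [t for t in range(1, n - 1)
            if scores[t] > threshold
            and scores[t] >= scores[t - 1] and scores[t] > scores[t + 1]]


def cluster_segments(feats, bounds, max_dist=2.0):
    """Greedy grouping: a segment joins the nearest existing cluster
    if its mean is within max_dist, otherwise it starts a new one."""
    edges = [0] + list(bounds) + [len(feats)]
    centroids, labels = [], []
    for a, b in zip(edges[:-1], edges[1:]):
        seg_mean = feats[a:b].mean(axis=0)
        dists = [np.linalg.norm(seg_mean - c) for c in centroids]
        if dists and min(dists) < max_dist:
            labels.append(int(np.argmin(dists)))
        else:
            centroids.append(seg_mean)
            labels.append(len(centroids) - 1)
    return labels


# Synthetic "day": 30 min in one place, 30 min elsewhere, then back again.
rng = np.random.default_rng(0)
feats = np.vstack([m + 0.1 * rng.standard_normal((30, 4))
                   for m in (0.0, 5.0, 0.0)])
bounds = segment_boundaries(feats)
labels = cluster_segments(feats, bounds)
print(bounds, labels)
```

In this toy example the first and third segments fall into the same cluster, which is the behavior that lets one manual label per cluster cover every recurrence of a place.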
Medical Applications
• Tracking behavior & symptoms
  coughing
  sleep disturbances
• Tracking hospital interactions
  correlate recordings
Privacy
• Privacy-preserving features
  scramble audio over 200ms windows
[Figure: two spectrograms, "Original (dan+kean-ex.wav)" and "Scrambled (200ms wins over 1s)". Axes: time / s (0 to 14) and freq / kHz (0 to 4); color scale: level / dB (-60 to 20)]
Hearium
• Self-audio only
  augmented-reality earphones/mics
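The scrambling idea from the first bullet (200 ms windows shuffled over 1 s) might look like the sketch below. It is a simplification under assumptions: windows are non-overlapping and simply reordered within each 1 s block, with no overlap or cross-fading, and the function name and parameters are hypothetical rather than taken from the talk.

```python
import numpy as np


def scramble_audio(x, sr, win_s=0.2, block_s=1.0, seed=0):
    """Shuffle the order of win_s-long windows inside each block_s-long block.

    x: 1-D array of samples, sr: sample rate in Hz.
    Short-time spectral content is kept; word order within a block is not.
    """
    rng = np.random.default_rng(seed)
    win = int(round(win_s * sr))
    per_block = int(round(block_s / win_s))
    n_win = len(x) // win
    frames = x[:n_win * win].reshape(n_win, win)
    order = np.arange(n_win)
    for start in range(0, n_win, per_block):
        stop = min(start + per_block, n_win)
        order[start:stop] = rng.permutation(order[start:stop])
    # Samples past the last full window are appended unchanged.
    return np.concatenate([frames[order].ravel(), x[n_win * win:]])


# Three seconds of a ramp signal at 16 kHz, so reordering is easy to inspect.
sr = 16000
x = np.arange(3 * sr, dtype=float)
y = scramble_audio(x, sr)
print(len(y), y[:3])
```

Because samples never leave their 1 s block, features computed over longer spans are largely unaffected, which is what makes this kind of scrambling compatible with the classification and clustering shown earlier.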
Summary
• Personal Audio Lifelogs
  Too good to waste!
• Speech & Nonspeech Content
  Both useful in different ways
• Applications
  Personal information, behavior measurement