Voice Separation with tiny ML on the edge · 2020-03-13



TRANSCRIPT

Page 1:

Voice Separation with tiny ML on the edge

Tiny ML Summit 2020

Main collaborators:

Dr. Lars Bramsløw (Eriksholm Research Centre, Denmark)

Prof. Tuomas Virtanen (University of Tampere, Finland)

Gaurav Naithani (University of Tampere, Finland)

Niels H. Pontoppidan, PhD

Research Area Manager, Augmented Hearing Science

Page 2:

Additional acknowledgements and references

• Thomas “Tom” Barker

• Giambattista Parascandolo

• Joonas Nikunen

• Rikke Rossing

• Atefeh Hafez

• Marianna Vatti

• Umaer Hanif

• Christian Grant

• Christian Hansen

• Bramsløw, L., Naithani, G., Hafez, A., Barker, T., Pontoppidan, N. H., & Virtanen, T. (2018). Improving competing voices segregation for hearing impaired listeners using a low-latency deep neural network algorithm. The Journal of the Acoustical Society of America, 144(1), 172–185.

• Naithani, G., Barker, T., Parascandolo, G., Bramsløw, L., Pontoppidan, N. H., & Virtanen, T. (2017). Low latency sound source separation using convolutional recurrent neural networks. 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 71–75.

• Naithani, G., Barker, T., Parascandolo, G., Bramsløw, L., Pontoppidan, N. H., & Virtanen, T. (2017). Evaluation of the benefit of neural network based speech separation algorithms with hearing impaired listeners. Proceedings of the 1st International Conference on Challenges in Hearing Assistive Technology. CHAT-17, Stockholm, Sweden.

• Naithani, G., Parascandolo, G., Barker, T., Pontoppidan, N. H., & Virtanen, T. (2016). Low-latency sound source separation using deep neural networks. 2016 IEEE Global Conference on Signal and Information Processing (GlobalSIP), 272–276.

• Pontoppidan, N. H., Vatti, M., Rossing, R., Barker, T., & Virtanen, T. (2016). Separating known competing voices for people with hearing loss. Proceedings of the Speech Processing in Realistic Environments Workshop, SPIRE Workshop.

• Barker, T., Virtanen, T., & Pontoppidan, N. H. (2016). Hearing device comprising a low-latency sound source separation unit.

• Barker, T., Virtanen, T., & Pontoppidan, N. H. (2014). Hearing device comprising a low-latency sound source separation unit (Patent No. US Patent App. 14/874,641).

• Barker, T., Virtanen, T., & Pontoppidan, N. H. (2015). Low-latency sound-source-separation using non-negative matrix factorisation with coupled analysis and synthesis dictionaries. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015, 241–245.

Page 3:

Facts and stats about hearing aids and market

Market

• 15+ million units sold per year

• Global wholesale market of USD 4+ billion per year

• The six largest manufacturers hold a combined market share of over 90%

• Main market: OECD countries

• 4–6% yearly unit growth, mainly due to demographic development

• Growing aging population and increasing life expectancy

Hearing-aid users

• 10% of the population in OECD countries suffer from hearing loss

• Only 20% of people suffering from a hearing loss use a hearing aid

• 35–40% of the population aged 65+ suffer from a hearing loss

• Average age of first-time users is 69 years (USA)

• Average age of all users is 72 years (USA)

Page 4:

Hearing devices

• Hearing devices help people communicate in simple and complex listening situations, including sound environments where people with normal hearing give up using phones and headsets

• Some rely on hearing devices for a few hours a day in specific situations, while many use them during all waking hours

• Power: about 1 mA, from zinc-air batteries replaced every week or Li-ion batteries recharged every night

• Hardware design employs many low-voltage and low-clock-frequency techniques
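The power bullet above implies the weekly replacement cadence fairly directly. A quick back-of-the-envelope sketch, assuming a typical size-312 zinc-air cell of roughly 180 mAh and 16 waking hours of daily use (both figures are assumptions, not from the talk):

```python
# Rough battery-life estimate for a hearing device, assuming a typical
# size-312 zinc-air cell (~180 mAh) and the ~1 mA average draw from the
# slide. Cell capacity and hours of daily use are assumptions.
def battery_life_hours(capacity_mah: float, draw_ma: float) -> float:
    """Hours of operation at a constant average current draw."""
    return capacity_mah / draw_ma

hours = battery_life_hours(capacity_mah=180.0, draw_ma=1.0)
days = hours / 16  # assuming ~16 waking hours of use per day
print(f"{hours:.0f} h of operation, roughly {days:.0f} days of use")
```

The result is on the order of one to two weeks, consistent with the weekly replacement schedule once real-world draw above 1 mA is accounted for.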

Page 5:

Enhancing segregation by transforming “mono” to “stereo”

Page 6:

History

1953

• Cocktail Party Problem coined by Colin Cherry

• Cherry proposes mono-to-stereo to solve the problem

2000

• Sam Roweis: One Microphone Source Separation at NIPS shows separation of known voices

2018

• Bramsløw et al.: for the first time, algorithms improve segregation of known voices for people with hearing loss

2020’s

• When will Tiny ML enable enhanced voice segregation in a hearing device?

Page 7:

Spatial augmentation

• The algorithm separates the voices into two (or more) channels

• The hearing device increases the spatial difference cues, i.e. it repositions the sound sources further apart

• In case of spatial audio-visual cue conflicts, the visual cues are expected to override the auditory cues, just as with ventriloquists

[Figure: a mono signal transformed into artificial stereo]
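The repositioning idea can be sketched with simple constant-power panning of the two separated channels. Real devices would use richer interaural cues (time differences, head-related transfer functions); the function names and panning law below are illustrative assumptions, not the talk's method:

```python
import numpy as np

# Illustrative spatial augmentation: two separated mono voices are
# repositioned further apart by applying level differences via
# constant-power panning, yielding an artificial stereo signal.
def pan(signal: np.ndarray, angle: float) -> np.ndarray:
    """Constant-power pan; angle in [-1, 1], -1 = hard left, +1 = hard right."""
    theta = (angle + 1) * np.pi / 4          # map [-1, 1] to [0, pi/2]
    left, right = np.cos(theta), np.sin(theta)
    return np.stack([left * signal, right * signal], axis=-1)

def spatialize(voice_a: np.ndarray, voice_b: np.ndarray, spread: float = 0.8) -> np.ndarray:
    """Place the two separated voices symmetrically left and right."""
    return pan(voice_a, -spread) + pan(voice_b, +spread)

# Example with two dummy "separated" voices (1 s of sine tones at 16 kHz)
fs = 16_000
t = np.arange(fs) / fs
stereo = spatialize(np.sin(2 * np.pi * 200 * t), np.sin(2 * np.pi * 320 * t))
print(stereo.shape)  # (16000, 2)
```

Constant-power panning keeps the perceived loudness of each voice roughly constant while the level difference between ears pushes the two voices toward opposite sides.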

Page 8:

Flowchart for training

[Figure: DNN training flowchart]
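One common way to form training targets for mask-based voice separation is an ideal ratio mask per time-frequency bin, computed from the known clean voices before they are mixed. The sketch below shows that generic recipe; it is an assumption for illustration, not the exact formulation from Bramsløw et al. (2018):

```python
import numpy as np

# Sketch of training-target generation for mask-based separation:
# given magnitude spectrograms of two known voices, compute an ideal
# ratio mask per time-frequency bin for each voice. The DNN (not shown)
# would be trained to predict these masks from the mixture spectrogram.
def ideal_ratio_masks(mag_a: np.ndarray, mag_b: np.ndarray, eps: float = 1e-8):
    """Per-bin ratio masks for two sources; the two masks sum to ~1."""
    total = mag_a + mag_b + eps
    return mag_a / total, mag_b / total

# Toy magnitude spectrograms (freq_bins x frames)
mag_a = np.array([[1.0, 3.0], [2.0, 2.0]])
mag_b = np.array([[1.0, 1.0], [2.0, 6.0]])
mask_a, mask_b = ideal_ratio_masks(mag_a, mag_b)
print(mask_a)  # each bin: voice A's share of the mixture energy
```

Applying a predicted mask to the mixture spectrogram then recovers an estimate of the corresponding voice.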

Page 9:

Flowchart for processing
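The processing path can be sketched as frame-wise spectral masking: window an incoming frame, apply a (here, dummy) DNN-predicted mask in the frequency domain, and resynthesize. The frame and hop sizes below are assumptions chosen to match the 250 Hz frame rate mentioned later at a 16 kHz sample rate; the talk does not specify them:

```python
import numpy as np

# Minimal per-frame processing sketch: FFT of a windowed frame, apply a
# frequency-domain mask, inverse FFT. A real system would run this in a
# streaming overlap-add loop with a DNN producing the mask each frame.
FS = 16_000
HOP = FS // 250          # 250 frames per second -> 64-sample hop
FRAME = 4 * HOP          # 4x overlap -> 256-sample frames (assumption)
window = np.hanning(FRAME)

def process_frame(frame: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Apply a frequency-domain mask to one windowed frame."""
    spectrum = np.fft.rfft(window * frame)
    return np.fft.irfft(mask * spectrum, n=FRAME)

# Sanity check: an all-ones (identity) mask reconstructs the windowed input
frame = np.random.randn(FRAME)
out = process_frame(frame, np.ones(FRAME // 2 + 1))
print(np.allclose(out, window * frame))  # True
```

The short hop is what keeps the algorithmic latency low: each output sample waits at most one frame plus the network's inference time.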

Page 10:

Enhanced segregation for people with mild/moderate hearing loss

• DNN processing

• 4 million weights for the FDNNs (not optimized)

• 250 Hz audio frame processing rate

Bramsløw et al., JASA, 2018

[Figure: separation results for the unprocessed and ideal conditions]
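Combining the two figures on this slide gives a rough compute budget, assuming one multiply-accumulate (MAC) per weight per frame:

```python
# Back-of-the-envelope compute estimate from the slide's two figures:
# ~4 million weights evaluated once per frame at 250 frames per second.
# Assuming one multiply-accumulate per weight per frame:
weights = 4_000_000
frame_rate_hz = 250
macs_per_second = weights * frame_rate_hz
print(f"{macs_per_second / 1e9:.1f} GMAC/s")  # 1.0 GMAC/s
```

A sustained 1 GMAC/s is far beyond a hearing-aid power budget at typical DSP clock rates, which is why the slide flags the network as "not optimized" and why the later slides focus on shrinking it.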

Page 11:

How listeners with normal hearing hear two competing voices

Page 12:

How listeners with impaired hearing hear the two voices [the example could be harder to segregate]

Page 13:

How it sounds when the two voices are separated out

Page 14:

Focusing only on the female voice

Page 15:

Focusing only on the male voice

Page 16:

Enhanced segregation for people with mild/moderate hearing loss

• Processing requirements

• 4 million weights for the FDNNs (not optimized)

• 250 Hz audio frame processing rate

Bramsløw et al., JASA, 2018

[Figure: separation results for the unprocessed and ideal conditions]

Page 17:

Next steps

Feature performance

Increasing robustness to additional noise and reverberation

Increasing robustness to personal voice changes

Break reliance on training on specific voices (transfer learning)

Further decrease network sizes from 4 million weights

Hardware performance

See Zuzana Jelčicová’s poster:

• Benchmarking and improving NN execution on DSP vs. custom accelerator for hearing instruments

• From float to fixed point

• Parallel MACs

• Two-step scaling
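The float-to-fixed-point bullet can be illustrated with generic symmetric int8 quantization, where a single per-tensor scale maps float weights onto an integer grid. This is a textbook sketch, not the scheme from the poster:

```python
import numpy as np

# Illustrative float -> fixed-point step: symmetric per-tensor
# quantization of weights to int8, keeping one float scale factor so
# the integers can be mapped back (dequantized) after integer MACs.
def quantize_int8(weights: np.ndarray):
    """Return int8 weights plus the scale needed to dequantize."""
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

w = np.array([0.5, -1.27, 0.01, 1.0])
q, scale = quantize_int8(w)
print(q, q.astype(np.float32) * scale)  # dequantized values approximate w
```

Storing weights as int8 cuts memory traffic by 4x versus float32 and lets the MAC units operate on cheap integer arithmetic, which is exactly where the DSP-vs-accelerator benchmarking in the poster becomes interesting.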

Page 18:

Zuzana Jelčicová: Benchmarking and improving NN execution on DSP vs. custom accelerator for hearing instruments

Page 19:

Voice Separation with tiny ML on the edge

Tiny ML Summit 2020

Main collaborators:

Dr. Lars Bramsløw (Eriksholm Research Centre, Denmark)

Prof. Tuomas Virtanen (University of Tampere, Finland)

Gaurav Naithani (University of Tampere, Finland)

Niels H. Pontoppidan, PhD

Research Area Manager, Augmented Hearing Science