
Voice Separation with tiny ML on the edge

Tiny ML Summit 2020

Main collaborators:

Dr. Lars Bramsløw (Eriksholm Research Centre, Denmark)

Prof. Tuomas Virtanen (University of Tampere, Finland)

Gaurav Naithani (University of Tampere, Finland)

Niels H. Pontoppidan, PhD

Research Area Manager, Augmented Hearing Science

Additional acknowledgements and references

• Thomas “Tom” Barker

• Giambattista Parascandolo

• Joonas Nikunen

• Rikke Rossing

• Atefeh Hafez

• Marianna Vatti

• Umaer Hanif

• Christian Grant

• Christian Hansen

• Bramsløw, L., Naithani, G., Hafez, A., Barker, T., Pontoppidan, N. H., & Virtanen, T. (2018). Improving competing voices segregation for hearing impaired listeners using a low-latency deep neural network algorithm. The Journal of the Acoustical Society of America, 144(1), 172–185.

• Naithani, G., Barker, T., Parascandolo, G., Bramsløw, L., Pontoppidan, N. H., & Virtanen, T. (2017). Low latency sound source separation using convolutional recurrent neural networks. 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 71–75.

• Naithani, G., Barker, T., Parascandolo, G., Bramsløw, L., Pontoppidan, N. H., & Virtanen, T. (2017). Evaluation of the benefit of neural network based speech separation algorithms with hearing impaired listeners. Proceedings of the 1st International Conference on Challenges in Hearing Assistive Technology. CHAT-17, Stockholm, Sweden.

• Naithani, G., Parascandolo, G., Barker, T., Pontoppidan, N. H., & Virtanen, T. (2016). Low-latency sound source separation using deep neural networks. 2016 IEEE Global Conference on Signal and Information Processing (GlobalSIP), 272–276.

• Pontoppidan, N. H., Vatti, M., Rossing, R., Barker, T., & Virtanen, T. (2016). Separating known competing voices for people with hearing loss. Proceedings of the Speech Processing in Realistic Environments Workshop, SPIRE Workshop.

• Barker, T., Virtanen, T., & Pontoppidan, N. H. (2016). Hearing device comprising a low-latency sound source separation unit.

• Barker, T., Virtanen, T., & Pontoppidan, N. H. (2014). Hearing device comprising a low-latency sound source separation unit (US Patent Application No. 14/874,641).

• Barker, T., Virtanen, T., & Pontoppidan, N. H. (2015). Low-latency sound-source-separation using non-negative matrix factorisation with coupled analysis and synthesis dictionaries. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2015, 241–245.

Facts and stats about hearing aids and market

Market

• 15+ million units sold per year

• Global wholesale market of USD 4+ billion per year

• The six largest manufacturers hold a combined market share of over 90%

• Main market: OECD countries

• 4–6% yearly unit growth, mainly due to demographic development

• Growing aging population and increasing life expectancy

Hearing-aid users

• 10% of the population in OECD countries suffer from hearing loss

• Only 20% of people suffering from hearing loss use a hearing aid

• 35-40% of the population aged 65+ suffer from a hearing loss

• Average age of first-time user is 69 years (USA)

• Average age of all users is 72 years (USA)

Hearing devices

• Hearing devices help people communicate in simple and complex listening situations, including sound environments where people with normal hearing give up on phones and headsets

• Some rely on hearing devices for a few hours a day in specific situations; many wear them all waking hours

• Powered by roughly 1 mA from zinc-air batteries replaced every week, or Li-ion batteries recharged every night

• Hardware design employs many low voltage and low clock-frequency methods
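To make the power constraint above concrete, here is a back-of-the-envelope budget. The ~1 mA draw is from the slide; the 1.4 V nominal zinc-air cell voltage and the 16-hour wearing day are illustrative assumptions, not figures from the deck:

```python
# Illustrative power-budget arithmetic for a hearing device.
CELL_VOLTAGE_V = 1.4   # nominal zinc-air cell voltage (assumption)
AVG_CURRENT_A = 1e-3   # ~1 mA average draw (from the slide)
WAKING_HOURS = 16      # assumed hours of daily use

power_w = CELL_VOLTAGE_V * AVG_CURRENT_A       # total budget, ~1.4 mW
energy_per_day_j = power_w * WAKING_HOURS * 3600

print(f"power budget: {power_w * 1e3:.1f} mW")       # ~1.4 mW
print(f"energy per day: {energy_per_day_j:.1f} J")   # ~80.6 J
```

Every block of the device, including any neural network, has to fit inside that milliwatt-scale envelope, which is why the low-voltage, low-clock-frequency design methods above matter.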

Enhancing segregation by transforming “mono” to “stereo”

History

1953

• Cocktail Party Problem coined by Colin Cherry

• Cherry proposes mono-to-stereo to solve the problem

2000

• Sam Roweis: One Microphone Source Separation at NIPS shows separation of known voices

2018

• Bramsløw et al: First time algorithms improve segregation of known voices for people with hearing loss

2020’s

• When will Tiny ML enable enhanced voice segregation in a hearing device?

Spatial augmentation

• The algorithm separates the voices into two (or more) channels

• The hearing devices increase the spatial difference cues, i.e. reposition the sound sources further apart

• In case of spatial audio-visual cue conflicts, visual cues are expected to override the auditory cues, just as with ventriloquists
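A minimal sketch of the spatial-augmentation idea: once the network has split the mono mixture into two voice channels, opposite level differences (ILD-style cues) push the voices apart in the stereo image. The function name and the 10 dB gain are illustrative choices, not values from the slides:

```python
import numpy as np

def spatially_augment(voice_a, voice_b, gain_db=10.0):
    """Widen two separated voice channels into artificial stereo by
    applying opposite interaural level differences (sketch only)."""
    g = 10.0 ** (gain_db / 20.0)
    left = g * voice_a + (1.0 / g) * voice_b   # voice A pushed toward the left ear
    right = (1.0 / g) * voice_a + g * voice_b  # voice B pushed toward the right ear
    return left, right
```

A real device would likely also manipulate interaural time differences and fit the result to the listener's hearing loss; this only shows the repositioning principle.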

[Figure: a mono mixture transformed into artificial stereo]

[Figures: flowchart for DNN training and flowchart for processing]
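The processing flowchart can be sketched as a per-frame loop: window one audio frame, transform it to the frequency domain, apply a DNN-predicted soft mask, and resynthesize two voices. The `dnn_mask` callable stands in for the trained network, an assumption made here to keep the sketch self-contained:

```python
import numpy as np

def process_frame(frame, dnn_mask, window):
    """One step of low-latency mask-based separation (sketch).

    frame:    one block of audio samples
    dnn_mask: callable mapping a magnitude spectrum to per-bin mask
              values in [0, 1] (stands in for the trained DNN)
    window:   analysis window of the same length as the frame
    """
    spec = np.fft.rfft(frame * window)
    mask = dnn_mask(np.abs(spec))
    voice_a = np.fft.irfft(mask * spec, n=len(frame))          # target voice
    voice_b = np.fft.irfft((1.0 - mask) * spec, n=len(frame))  # competing voice
    return voice_a, voice_b
```

With complementary masks the two outputs sum back to the windowed input, so no energy is lost; the papers cited above describe the actual network and training setup.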

Enhanced segregation for people with mild/moderate hearing loss

• DNN processing

• 4 million weights for feed-forward DNNs (FDNNs, not optimized)

• 250 Hz audio frame processing rate

Bramsløw et al, JASA, 2018
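The two figures above imply the compute load. Assuming one multiply-accumulate per weight per frame (reasonable for a fully-connected network, though the slide does not state it), the arithmetic is:

```python
WEIGHTS = 4_000_000   # FDNN size from the slide (not optimized)
FRAME_RATE_HZ = 250   # audio frame processing rate from the slide

# Assumption: each weight contributes one multiply-accumulate per frame.
macs_per_second = WEIGHTS * FRAME_RATE_HZ
print(f"{macs_per_second / 1e9:.1f} GMAC/s")  # 1.0 GMAC/s
```

One GMAC/s is far beyond a milliwatt-class hearing-aid DSP today, which is why the next-steps slide targets smaller networks and the hardware work below targets cheaper execution.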

Audio examples:

• Unprocessed

• Ideal

• How listeners with normal hearing hear two competing voices

• How listeners with impaired hearing hear the two voices (the example could be harder to segregate)

• How it sounds when the two voices are separated out

• Focusing only on the female voice

• Focusing only on the male voice


Next steps

Feature performance

Increasing robustness to additional noise and reverberation

Increasing robustness to personal voice changes

Break reliance on training on specific voices (transfer learning)

Further decrease network sizes from 4 million weights

Hardware performance

See Zuzana Jelčicová’s poster:

• Benchmarking and improving NN execution on DSP vs. custom accelerator for hearing instruments

• From float to fixed point

• Parallel MACS

• Two-step scaling
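As a rough illustration of the float-to-fixed-point step, here is symmetric per-tensor int8 quantization. This is a generic scheme chosen for the sketch; the exact format and scaling used on the DSP or accelerator in the poster may differ:

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization (sketch): map float weights
    to integers in [-127, 127] with a single shared scale factor."""
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 representation."""
    return q.astype(np.float32) * scale
```

Fixed-point weights shrink memory traffic four-fold versus float32 and let the MACs run on cheap integer units, at the cost of a bounded rounding error per weight.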

