noise reduction for automatic speech recognition in ... · noise reduction for automatic speech...

1
Objective: quality of life in the ageing society Ɣ independence within one’s own residence Ɣ development of systems for assisting older people, relatives and caregivers Ɣ identification of threats Ɣ support of care structures Scenario 2: Monitoring of sports activities in prevention & rehabilitation Monitoring of relevant vital parameters Noise reduction for automatic speech recognition in ambient assisted living Background: Background: The The AAL AAL project project GAL GAL Introduction Introduction Scenario 1: Personal activity and household assistant Assisting system for everyday planning of activities and housekeeping Noise reduction using multi-microphone arrays with beamforming algorithms is a powerful means for the enhancement of speech in ambient noise. Current automatic speech recogniser (ASR) still need high SNR to perform accurately. Hence, a close distance between user and microphone of the ASR is usually required. In the ambient assistive living (AAL) project “Design of Environments for Ageing”, an acoustical interface for the interaction between users and assistive systems in their home environment is developed, including an ASR system for user input. In order to allow the users to interact with the system at any position in their home, a special, ambient system for the acquisition of acoustical signals is being developed. It consists of two spherical microphone arrays and algorithms for localisation and beamforming for SNR improvement. The noise-reduced signal is provided to the ASR system. www.altersgerechte-lebenswelten.de The Lower Saxony research network “Design of Environments for Ageing” (“Gestaltung Altersgerechter Lebenswelten” – GAL) deals with information and communication technologies for promoting and sustaining quality of life, health and self-sufficiency in the second half of life. Approach: interdisciplinary research Ɣ synergy of geriatrics, gerontology, economics, computer science, engineering, medicine, nursing science and special needs education Ɣ survey of requirements and resources Ɣ development and evaluation of exemplary assisting systems Expected Outcome: Four exemplary assisting systems Scenario 3: Sensor-based activity determination Comprehensive, automatic and continuous determination of activities at home Scenario 4: Sensor-based fall prevention and fall recognition Automatic recognition of falls and risk of falling Rainer Huber 1 , Christian Bartsch 2 and Joerg Bitzer 2 1) HörTech, Oldenburg, Germany; e-mail: [email protected] 2) Institute for Hearing Technology and Audiology (IHA), Jade University Of Applied Sciences, Oldenburg, Germany Acoustical front Acoustical front - - end of the assisting systems end of the assisting systems For the acoustical interaction between user and assisting systems, an acoustical front-end is developed. Components of the front-end Ɣ microphone array for signal acquisition Ɣ automatic source localisation Ɣ noise reduction by beamforming Ɣ signal classification for automatic event detection Ɣ automatic speech recognition Demands for application in home environments Ɣ ambient, non-intrusive integration possible (mounted microphones should be invisible / barely visible) Ɣ low-priced equipment Ɣ no microphone calibration required Conclusions Conclusions Realisation of signal acquisition technology Two spherical microphone arrays Ɣ Ø = 15 cm, 8 microphones each Ɣ mimics head shadow effect Ɣ cheap; no calibration needed Ɣ ambient integration possible (e.g. in lamp) Algorithms: Ɣ Time Delay of Arrival estimation by - Generalized Cross Correlation 1 (GCC) - Phase-Transform spectral weighting 1 of GCC Ɣ localisation: Global Coherence Field algorithm 2 Ɣ beamforming: Minimum Variance Distortion Response algorithm 3 Ɣ Voice Activity Detector 4 First results Ɣ localisation with both spheres within a radius of 30 cm at a hit rate of 90% Ɣ enhanced speech intelligibility (compared to binaural listening in original sound field) Theoretical directivity pattern of array directed to 90° Ɣ A cheap, ambient solution for signal acquisition including noise reduction in home environments could be found Ɣ First results show an enhancement of speech intelligibility benefit for ASR and hence for the AAL system is expected Acknowledgement This research was (partly) funded by grant VWZN2420 ("Lower Saxony Research Network Design of Environments for Ageing") from the Ministry for Science and Culture of Lower Saxony, Germany. References 1) Knapp, C.H., and Carter, G.C. (1976) The Generalized Correlation Method for Estimation of Time Delay. IEEE Transactions on Acoustics, Speech and Signal Processing 24(4), pp. 320–327. 2) Brutti, A., Omologo, M., and Svaizer, P. (2008) Localization of multiple speakers based on a two step acoustic map analysis. Proc. ICASSP '08, Las Vegas, USA 3) Bitzer, J. and Simmer, K. U. (2001) Superdirective microphone arrays, in Brandstein, Ward (Editors), "Microphone Arrays", Springer Verlag 4) Marzinzik, M., and Kollmeier, B. (2002) Speech pause detection for noise spectrum estimation by tracking power envelope dynamics..IEEE Transactions on Speech and Audio Processing, 10(2), p. 109-118. Positions of the 8 microphones Frequency [Hz] Damping [dB] Azimuth angle [ °] Exemplary ambient integration in lamp

Upload: others

Post on 21-Sep-2020

16 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Noise reduction for automatic speech recognition in ... · Noise reduction for automatic speech recognition in ambient assisted living ... Current automatic speech recogniser (ASR)

Objective: quality of life in the ageing society

independence within one’s own residencedevelopment of systems for assisting older people,relatives and caregiversidentification of threatssupport of care structures

Scenario 2: Monitoring of sports activities in prevention & rehabilitation

Monitoring of relevant vitalparameters

Noise reduction for automatic speech recognition in ambient assisted living

Background: Background: TheThe AAL AAL projectproject „„GALGAL““

IntroductionIntroduction

Scenario 1: Personal activity andhousehold assistant

Assisting system for everyday planningof activities andhousekeeping

Noise reduction using multi-microphone arrays with beamforming algorithms is a powerful means for the enhancement of speech in ambient noise. Current automatic speech recogniser (ASR) still need high SNR to perform accurately. Hence, a close distance between user and microphone of the ASR is usually required. In the ambient assistive living (AAL) project “Design of Environments for Ageing”, an acoustical interface for the interaction between users and assistive systems in their home environment is developed, including an ASR system for user input. In order to allow the users to interact with the system at any position in their home, a special, ambient system for the acquisition of acoustical signals is being developed. It consists of two spherical microphone arrays and algorithms for localisation and beamforming for SNR improvement. The noise-reduced signal is provided to the ASR system.

www.altersgerechte-lebenswelten.de

The Lower Saxony research network “Design of Environments for Ageing” (“Gestaltung Altersgerechter Lebenswelten” – GAL) deals with information and communication technologies for promoting and sustaining quality of life, health and self-sufficiency in the second half of life.

Approach: interdisciplinary research

synergy of geriatrics, gerontology, economics, computer science, engineering, medicine, nursing scienceand special needs educationsurvey of requirements and resourcesdevelopment and evaluation of exemplary assisting systems

Expected Outcome: Four exemplary assisting systems

Scenario 3: Sensor-based activitydetermination

Comprehensive, automatic andcontinuous determination ofactivities at home

Scenario 4: Sensor-based fall prevention and fall recognition

Automatic recognition of fallsand risk of falling

Rainer Huber1, Christian Bartsch2 and Joerg Bitzer2

1) HörTech, Oldenburg, Germany; e-mail: [email protected]) Institute for Hearing Technology and Audiology (IHA), Jade University Of Applied Sciences, Oldenburg, Germany

Acoustical frontAcoustical front--end of the assisting systemsend of the assisting systems

For the acoustical interaction between user and assisting systems, an acoustical front-end is developed.

Components of the front-end

microphone array for signal acquisition

automatic source localisation

noise reduction by beamforming

signal classification for automatic event detection

automatic speech recognition

Demands for application in home environmentsambient, non-intrusive integration possible(mounted microphones should be invisible / barelyvisible)low-priced equipment no microphone calibration required

ConclusionsConclusions

Realisation of signal acquisition technology

Two spherical microphone arrays

Ø = 15 cm, 8 microphones each

mimics head shadow effect

cheap; no calibration needed

ambient integration possible (e.g. in lamp)

Algorithms:

Time Delay of Arrival estimation by- Generalized Cross Correlation1 (GCC)

- Phase-Transform spectral weighting1 of GCC

localisation: Global Coherence Field algorithm2

beamforming: Minimum Variance DistortionResponse algorithm3

Voice Activity Detector4

First results

localisation with both spheres within a radiusof 30 cm at a hit rate of 90%

enhanced speech intelligibility (comparedto binaural listening in original sound field)

Theoretical directivity pattern of array directed to 90°

A cheap, ambient solution for signal acquisition includingnoise reduction in home environments could be found

First results show an enhancement of speech intelligibility

à benefit for ASR and hence for the AAL system is expected

Acknowledgement

This research was (partly) funded by grant VWZN2420 ("Lower Saxony Research Network Design of Environments for Ageing") from the Ministry for Science and Culture of Lower Saxony, Germany.

References1) Knapp, C.H., and Carter, G.C. (1976) The Generalized Correlation Method for

Estimation of Time Delay. IEEE Transactions on Acoustics, Speech and Signal Processing 24(4), pp. 320–327.

2) Brutti, A., Omologo, M., and Svaizer, P. (2008) Localization of multiple speakers based on a two step acoustic map analysis. Proc. ICASSP '08, Las Vegas, USA

3) Bitzer, J. and Simmer, K. U. (2001) Superdirective microphone arrays, in Brandstein, Ward (Editors), "Microphone Arrays", Springer Verlag

4) Marzinzik, M., and Kollmeier, B. (2002) Speech pause detection for noise spectrum estimation by tracking power envelope dynamics..IEEE Transactions on Speech and Audio Processing, 10(2), p. 109-118.

Positions of the 8 microphones

Frequency [Hz]

Dam

ping

[dB

]

Azimuth angle [°]

Exemplary ambient integration in lamp