![Page 1: HIWIRE meeting ITC-irst Activity report Marco Matassoni, Piergiorgio Svaizer March 9.-10. 2006 Torino](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d7b5503460f94a5f8b6/html5/thumbnails/1.jpg)
HIWIRE meeting
ITC-irst
Activity report
Marco Matassoni, Piergiorgio Svaizer
March 9-10, 2006, Torino
Outline
• Beamforming and Adaptive Noise Cancellation
• Environmental Acoustics Estimation
• Audio-Video data collection
• Multi-channel pitch estimation
• Fixed-platform prototype acquisition module
Beamforming: D&S
The availability of multi-channel signals makes it possible to selectively capture the desired source with a delay-and-sum (D&S) beamformer:
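$$\tilde{s}(t) = \frac{1}{M} \sum_{i=1}^{M} s_i(t - \tau_i)$$
where $s_i(t)$ is the signal of microphone $i$, $M$ is the number of microphones, and $\tau_i$ is the delay applied to channel $i$ to align it with the others.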
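The delay-and-sum average can be sketched in a few lines of Python. This is a minimal illustration, not the ITC-irst implementation: it assumes NumPy, integer steering delays in samples, and a hypothetical function name `delay_and_sum`; the toy signals are synthetic.

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Average M channels after delaying channel i by delays[i] samples.

    channels: array of shape (M, N); delays: M integers with 0 <= d < N.
    Computes (1/M) * sum_i s_i(t - tau_i) for integer tau_i.
    """
    channels = np.asarray(channels, dtype=float)
    m, n = channels.shape
    out = np.zeros(n)
    for sig, d in zip(channels, delays):
        d = int(d)
        if not 0 <= d < n:
            raise ValueError("each delay must satisfy 0 <= d < N")
        # Delay by d samples: shift right, the first d samples stay zero.
        out[d:] += sig[:n - d]
    return out / m

# Toy example (assumed values): the source reaches the microphones
# after 0, 3 and 7 samples; steering aligns all channels to the latest one.
rng = np.random.default_rng(0)
source = rng.standard_normal(1000)
arrival = [0, 3, 7]
mics = np.stack([np.concatenate([np.zeros(a), source])[:1000] for a in arrival])
steering = [max(arrival) - a for a in arrival]
enhanced = delay_and_sum(mics, steering)
```

In practice the steering delays would come from the estimated TDOAs, and fractional delays would require interpolation, which this sketch leaves out.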
Issues:
• estimation of reliable TDOAs;
Method:
• CSP analysis over multiple frames
Advantages:
• robustness
• low computational cost
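A possible way to obtain a TDOA from CSP (Crosspower Spectrum Phase) analysis over multiple frames is sketched below. The frame length, hop size, lag range, and the function name `csp_tdoa` are assumptions chosen for illustration, and accumulating the CSP function by simple summation over frames is only one of several plausible choices.

```python
import numpy as np

def csp_tdoa(x1, x2, frame_len=1024, hop=512, max_lag=32):
    """Estimate the delay of x2 relative to x1 (in samples) from the CSP
    function accumulated over all frames. Positive result: x2 lags x1."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    nfft = 2 * frame_len                  # zero-padding limits circular wrap-around
    window = np.hanning(frame_len)
    acc = np.zeros(nfft)
    last_start = min(len(x1), len(x2)) - frame_len
    for start in range(0, last_start + 1, hop):
        f1 = np.fft.rfft(window * x1[start:start + frame_len], nfft)
        f2 = np.fft.rfft(window * x2[start:start + frame_len], nfft)
        cross = f2 * np.conj(f1)
        cross /= np.abs(cross) + 1e-12    # keep only the phase of the cross-spectrum
        acc += np.fft.irfft(cross, nfft)  # accumulate the CSP function over frames
    # Reorder so that index k corresponds to lag k - max_lag.
    lags = np.concatenate([acc[-max_lag:], acc[:max_lag + 1]])
    return int(np.argmax(lags)) - max_lag
```

Accumulating over several frames before picking the peak is what makes the estimate less sensitive to frames dominated by noise.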
D&S with MarkIII
Test set:
• set N1_SNR0 of MC-TIDIGITS (cockpit noise), MarkIII channels
• clean models, trained on original TIDIGITS
Results (WRR [%]):
C_1 38.5
C_32 50.8
DS_C8 79.9
DS_C16 83.0
DS_C32 85.3
DS_C64 85.4
Adaptive Noise Cancellation
A remote microphone can be used as a reference for noise estimation:
[Block diagram: (cockpit) noise → equivalent noise path filter → added to the (beamformed) speech → noisy speech; (cockpit) noise → adaptive filter → filtered noise, subtracted from the noisy speech → denoised speech]
NLMS
The tested algorithm is Normalized Least Mean Squares (NLMS): it iteratively estimates an FIR filter that minimizes the difference between the primary channel and the filtered reference.
We implemented two algorithms:
• time domain
• frequency domain (subband)
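The time-domain variant can be sketched as follows. This is a generic textbook NLMS canceller, not the code used for the experiments: the filter length, step size `mu`, regularization `eps`, and the function name `nlms_cancel` are assumptions. The frequency-domain (subband) variant is not shown.

```python
import numpy as np

def nlms_cancel(primary, reference, taps=32, mu=0.5, eps=1e-8):
    """Time-domain NLMS noise canceller.

    primary:   noisy speech (speech plus noise filtered by an unknown path)
    reference: noise reference, e.g. from the remote microphone
    Returns the error signal, i.e. the denoised speech estimate.
    """
    primary = np.asarray(primary, dtype=float)
    reference = np.asarray(reference, dtype=float)
    w = np.zeros(taps)        # current FIR filter estimate
    buf = np.zeros(taps)      # most recent reference samples, newest first
    out = np.zeros(len(primary))
    for n in range(len(primary)):
        buf = np.roll(buf, 1)
        buf[0] = reference[n]
        e = primary[n] - w @ buf                # subtract the filtered noise
        w += mu * e * buf / (buf @ buf + eps)   # normalized gradient step
        out[n] = e
    return out
```

The normalization by the reference energy `buf @ buf` keeps the adaptation speed roughly independent of the noise level, which is the main difference from plain LMS.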
D&S + ANC
Test set:
• set N1_SNR0 of MC-TIDIGITS (cockpit noise), MarkIII channels
• clean models, trained on original TIDIGITS
Results (WRR [%]; T = time-domain ANC, F = frequency-domain ANC):
C_32 (T) 64.7
C_32 (F) 72.4
DS_C64 (T) 81.8
DS_C64 (F) 88.4
Acoustics estimation
Idea:
Realistically simulate an environment (and its noise)
Method:
• Measure several impulse responses in an environment with multi-channel equipment (by playing back chirp signals), preserving relative amplitudes and mutual delays;
• Generate appropriate noisy signals starting from clean data.
The acoustic models derived in this way perform better in the given environment, also on real data.
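The generation step can be illustrated with a short Python sketch: clean speech is convolved with one measured impulse response per microphone, and multi-channel noise is added at a chosen SNR. The function name `simulate_channels`, the SNR definition (average over all channels), and the use of a single common noise gain are assumptions made for this example.

```python
import numpy as np

def simulate_channels(clean, impulse_responses, noise, snr_db):
    """Convolve clean speech with one impulse response per microphone and
    add multi-channel noise scaled to reach snr_db on average.

    clean: shape (N,); impulse_responses: shape (M, L); noise: shape (M, >= N).
    One common gain is applied to all noise channels, so the relative
    amplitudes and mutual delays between channels are preserved.
    """
    clean = np.asarray(clean, dtype=float)
    n = len(clean)
    reverberant = np.stack(
        [np.convolve(clean, np.asarray(h, dtype=float))[:n] for h in impulse_responses]
    )
    noise = np.asarray(noise, dtype=float)[:, :n]
    speech_power = np.mean(reverberant ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return reverberant + gain * noise
```

Because the impulse responses are not normalized per channel, level and delay differences between microphones carry over to the simulated data, which is what a multi-channel front-end relies on.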
Audio–Video Data Collection
Idea:
In a noisy environment, exploit additional features from video data
(collaboration with NTUA and TUC)
Design of AV corpus:
• Task: English connected digits, HIWIRE commands/keywords
• Channels: 4 audio, 3 video
• Environment: acoustically-treated room + noise diffusion
Audio–Video Setup
[Figure: recording setup with loudspeakers diffusing cockpit noise; annotated distance: 70-80 cm]
Audio–Video Setup
Audio
4 omnidirectional PZM Shure microphones, 16 kHz/16 bits
background noise diffused by 2 loudspeakers
Video
Webcam: 640x480, 30 fps, color, Unix timestamps
Stereoscopic camera pair: 640x480, 30 fps (b/w) or 15 fps (color), perfectly synchronous
Current data sets
• 8 speakers / connected digits
• 2 speakers / HIWIRE keyword lists
Fixed prototype acquisition device
Hardware platform:
8 Shure microphones + RME Hammerfall
Software environment:
Linux, ALSA driver
Acquisition module:
• synchronously acquires multiple channels (8);
• writes (to its standard output or a file) the enhanced signal plus additional information/features (speech start/end hypotheses, voiced/unvoiced decisions, pitch, …)
Multi-channel pitch analysis
The basic principle is that we can exploit many observations of the same speech process. Once the speaker has been located, we can take into account the different propagation times to the microphones and perform a time alignment.
Pitch analysis can then be performed using adjacent time intervals extracted from different microphone signals.
Basic correlation techniques: AMDF, AUTOC, WAUTOC
A Multichannel WAUTOC Method
WAUTOC is computed for each channel and summed over the M channels.
For a given frame:
$$f(\tau) = \sum_{i=1}^{M} w_i \, \mathrm{wautoc}_i(\tau)$$
$$\hat{\tau} = \arg\max_{\tau} f(\tau)$$
Issues:
• Weights $w_i$ may represent the channel reliability;
• Possible use of intraframe smoothing of the resulting fundamental frequency contour, which could improve the overall accuracy.
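The channel-summed pitch estimate can be sketched in Python as below. The sketch assumes that WAUTOC denotes the autocorrelation weighted by the inverse of the AMDF (autocorrelation divided by AMDF plus a constant `k`), that the frames are already time-aligned, and that the pitch is read off the lag maximizing the weighted sum. The function names, the search range, and `k = 1` are illustrative assumptions.

```python
import numpy as np

def wautoc(frame, max_lag, k=1.0):
    """Autocorrelation divided by (AMDF + k) for lags 0..max_lag."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame) - max_lag          # same number of terms for every lag
    if n <= 0:
        raise ValueError("frame must be longer than max_lag")
    head = frame[:n]
    values = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        shifted = frame[lag:lag + n]
        autoc = np.mean(head * shifted)
        amdf = np.mean(np.abs(head - shifted))
        values[lag] = autoc / (amdf + k)
    return values

def multichannel_pitch(frames, fs, weights=None, f_min=60.0, f_max=400.0):
    """Sum weighted WAUTOC functions over M time-aligned frames and return
    the fundamental frequency (Hz) at the maximum of the sum."""
    frames = np.asarray(frames, dtype=float)
    m = frames.shape[0]
    weights = np.ones(m) if weights is None else np.asarray(weights, dtype=float)
    lag_min = int(fs / f_max)
    lag_max = int(fs / f_min)
    f = sum(w * wautoc(x, lag_max) for w, x in zip(weights, frames))
    best_lag = lag_min + int(np.argmax(f[lag_min:lag_max + 1]))
    return fs / best_lag

# Toy example (assumed values): a 100 Hz harmonic signal seen by 4 microphones,
# each with independent additive noise.
fs = 16000
t = np.arange(1024) / fs
clean = (np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 200 * t)
         + 0.3 * np.sin(2 * np.pi * 300 * t))
rng = np.random.default_rng(0)
frames = clean + 0.1 * rng.standard_normal((4, 1024))
f0 = multichannel_pitch(frames, fs)
```

Setting `weights` to something other than ones is where a channel reliability measure would enter; the sketch does not propose one.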
Video example: distant-talking speech recognition
Video example: multi-channel pitch estimation
Forthcoming activities
• more effective combination of beamforming and ANC;
• also test ANC before D&S beamforming;
• test post-filtering after D&S;
• audio-video collection: improved audio/video synchronization would be advisable;
• audio-video collection: select the best balance between quality and frame rate;
• acoustically characterize the target environment (prototype);
• integrate the selected features in the multi-channel front-end;