RASTA Processing of Speech
A presentation of Hermansky & Morgan's 1994 paper, RASTA Processing of Speech. Learn about the dramatic effect of RASTA on critical-band analysis when combined with PLP for speech recognition! Hermansky, Hynek, and Nelson Morgan. "RASTA processing of speech." IEEE Transactions on Speech and Audio Processing 2.4 (1994): 578-589.
![Page 1: Rasta processing of speech](https://reader031.vdocuments.us/reader031/viewer/2022013105/5552c20fb4c90581158b4849/html5/thumbnails/1.jpg)
RASTA Processing of Speech
Hynek Hermansky & Nelson Morgan
![Page 2: Rasta processing of speech](https://reader031.vdocuments.us/reader031/viewer/2022013105/5552c20fb4c90581158b4849/html5/thumbnails/2.jpg)
The Question
Using stochastic techniques to derive information from sound seems wasteful, especially since non-speech components have a predictable effect on the speech signal.
Can we suppress spectral components that change too quickly or slowly to be speech?
![Page 3: Rasta processing of speech](https://reader031.vdocuments.us/reader031/viewer/2022013105/5552c20fb4c90581158b4849/html5/thumbnails/3.jpg)
The Answer
RASTA, much like human listeners, attends not to the absolute speech spectrum but to relative spectral changes, suppressing slowly changing or steady-state factors (noise!). This emphasizes changes, or “edges”.
![Page 4: Rasta processing of speech](https://reader031.vdocuments.us/reader031/viewer/2022013105/5552c20fb4c90581158b4849/html5/thumbnails/4.jpg)
Quick disclaimer: we definitely know what we’re talking about
![Page 5: Rasta processing of speech](https://reader031.vdocuments.us/reader031/viewer/2022013105/5552c20fb4c90581158b4849/html5/thumbnails/5.jpg)
Edge Detection
![Page 6: Rasta processing of speech](https://reader031.vdocuments.us/reader031/viewer/2022013105/5552c20fb4c90581158b4849/html5/thumbnails/6.jpg)
Inspiration
Humans can perceive speech-like sounds based on the spectral difference between the current sound and the preceding sound.
![Page 7: Rasta processing of speech](https://reader031.vdocuments.us/reader031/viewer/2022013105/5552c20fb4c90581158b4849/html5/thumbnails/7.jpg)
Sounds!
An analogous situation might occur in time-reversed speech:
Intelligibility of Time-Reversed Speech
![Page 9: Rasta processing of speech](https://reader031.vdocuments.us/reader031/viewer/2022013105/5552c20fb4c90581158b4849/html5/thumbnails/9.jpg)
Filters
![Page 10: Rasta processing of speech](https://reader031.vdocuments.us/reader031/viewer/2022013105/5552c20fb4c90581158b4849/html5/thumbnails/10.jpg)
More Sounds!
What band-pass filters sound like, from Chris’ experiments.
![Page 11: Rasta processing of speech](https://reader031.vdocuments.us/reader031/viewer/2022013105/5552c20fb4c90581158b4849/html5/thumbnails/11.jpg)
Speech Processing Review
http://www.learnartificialneuralnetworks.com/images/srfig01.jpg
![Page 12: Rasta processing of speech](https://reader031.vdocuments.us/reader031/viewer/2022013105/5552c20fb4c90581158b4849/html5/thumbnails/12.jpg)
![Page 13: Rasta processing of speech](https://reader031.vdocuments.us/reader031/viewer/2022013105/5552c20fb4c90581158b4849/html5/thumbnails/13.jpg)
Perceptual Linear Prediction
http://svr-www.eng.cam.ac.uk/~ajr/SA95/img181.gif
![Page 14: Rasta processing of speech](https://reader031.vdocuments.us/reader031/viewer/2022013105/5552c20fb4c90581158b4849/html5/thumbnails/14.jpg)
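Replace the conventional critical-band short-term spectrum in PLP analysis with a spectral estimate in which the time trajectory of each frequency channel is band-pass filtered by a filter with a sharp spectral zero at zero frequency.
Because constant or slowly varying components in each channel are suppressed, the new estimate is less sensitive to slow variations in the short-term spectrum.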
The RASTA Method
![Page 15: Rasta processing of speech](https://reader031.vdocuments.us/reader031/viewer/2022013105/5552c20fb4c90581158b4849/html5/thumbnails/15.jpg)
1. Compute the critical-band power spectrum (PLP)
2. Transform the spectral amplitude through a compressing static nonlinear transformation (RASTA)
3. Filter the time trajectory of each transformed spectral component (RASTA)
4. Transform the filtered speech representation through an expanding static nonlinear transformation (RASTA)
5. Multiply by the equal-loudness curve and raise to the power 0.33 to simulate the power law of hearing (PLP)
6. Compute an all-pole model of the result (PLP)
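Steps 2 to 4 (the RASTA part of the list above) can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code: it assumes the logarithmic flavour, assumes the step-1 critical-band spectrum is already available as a list of frames, and uses the filter coefficients as commonly cited for this paper (numerator 0.1·(2, 1, 0, -1, -2), pole at 0.98). The helper names `rasta_filter` and `log_rasta` are hypothetical, and frames before the start of the utterance are assumed equal to the first frame so the filter starts in steady state.

```python
import math

# Assumed RASTA filter (coefficients as commonly cited for the 1994 paper):
#   H(z) = 0.1 * z^4 * (2 + z^-1 - z^-3 - 2 z^-4) / (1 - 0.98 z^-1)
# Implemented causally (the z^4 advance is dropped), so output lags by 4 frames.
B = [0.2, 0.1, 0.0, -0.1, -0.2]  # FIR (numerator) taps
POLE = 0.98                      # single IIR pole


def rasta_filter(trajectory):
    """Band-pass filter one band's time trajectory (hypothetical helper)."""
    out = []
    for n in range(len(trajectory)):
        # FIR part; frames before the start are taken to equal frame 0,
        # so a constant input gives (numerically) zero output from the start
        acc = sum(b * trajectory[max(n - k, 0)] for k, b in enumerate(B))
        # IIR part: feed back the previous output through the pole
        if out:
            acc += POLE * out[-1]
        out.append(acc)
    return out


def log_rasta(spectrogram, floor=1e-10):
    """Steps 2-4 of log-RASTA (hypothetical helper).

    spectrogram: list of frames, each a list of critical-band powers (step 1).
    Returns a list of frames of the same shape, ready for PLP steps 5-6.
    """
    n_frames, n_bands = len(spectrogram), len(spectrogram[0])
    out = [[0.0] * n_bands for _ in range(n_frames)]
    for band in range(n_bands):
        # Step 2: compressing static nonlinearity (natural log, floored)
        traj = [math.log(max(frame[band], floor)) for frame in spectrogram]
        # Step 3: filter the time trajectory of this band
        filtered = rasta_filter(traj)
        # Step 4: expanding static nonlinearity (exponential)
        for t, y in enumerate(filtered):
            out[t][band] = math.exp(y)
    return out
```

Because a fixed per-band gain (for example, a fixed telephone channel) becomes a constant offset after the log, and the filter has zero gain at DC, multiplying one band of the input by a constant leaves that band of the output essentially unchanged.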
RASTA-PLP
![Page 16: Rasta processing of speech](https://reader031.vdocuments.us/reader031/viewer/2022013105/5552c20fb4c90581158b4849/html5/thumbnails/16.jpg)
The Key → suppress constant factors in each spectral component of the auditory-like spectrum, prior to estimation of the all-pole model.
Research issues:
- In what domain should the filtering be done?
- What filter should be used?
Block diagram: Speech Signal → Spectral Analysis → Bank of Compressing Static Nonlinearities → Bank of Linear Bandpass Filters → Bank of Expanding Static Nonlinearities → Continued Processing
![Page 17: Rasta processing of speech](https://reader031.vdocuments.us/reader031/viewer/2022013105/5552c20fb4c90581158b4849/html5/thumbnails/17.jpg)
For this paper: an IIR filter with this transfer function
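The transfer function itself is only in the slide image. For reference, here it is as it is commonly cited from the paper; treat the exact coefficients as our reading rather than a transcription of the slide. Time is in frames of the short-term analysis, and the causal difference equation drops the z^4 advance, which delays the output by 4 frames.

```latex
H(z) = 0.1\, z^{4}\,\frac{2 + z^{-1} - z^{-3} - 2z^{-4}}{1 - 0.98\,z^{-1}}

y[n] = 0.98\,y[n-1] + 0.2\,x[n] + 0.1\,x[n-1] - 0.1\,x[n-3] - 0.2\,x[n-4]
```

The numerator coefficients sum to zero, which is the sharp spectral zero at zero frequency; the pole at 0.98 sets how quickly the response to a constant offset dies away.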
![Page 18: Rasta processing of speech](https://reader031.vdocuments.us/reader031/viewer/2022013105/5552c20fb4c90581158b4849/html5/thumbnails/18.jpg)
Resulting Filter
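The shape of the resulting filter can be checked numerically. This sketch evaluates the magnitude response of the assumed coefficients (numerator 0.1·(2, 1, 0, -1, -2), pole 0.98) at a few modulation frequencies; the 100 frames/s frame rate (10 ms step) is an assumption used only to convert to Hz, and `rasta_magnitude` is a hypothetical helper name.

```python
import cmath
import math

FRAME_RATE_HZ = 100.0            # assumption: 10 ms frame step
B = [0.2, 0.1, 0.0, -0.1, -0.2]  # assumed FIR taps, 0.1 * [2, 1, 0, -1, -2]
POLE = 0.98                      # assumed IIR pole


def rasta_magnitude(mod_freq_hz, frame_rate=FRAME_RATE_HZ):
    """|H(e^{jw})| at a modulation frequency in Hz.

    The z^4 advance has unit magnitude, so it is ignored here.
    """
    w = 2 * math.pi * mod_freq_hz / frame_rate
    z_inv = cmath.exp(-1j * w)
    num = sum(b * z_inv ** k for k, b in enumerate(B))
    den = 1 - POLE * z_inv
    return abs(num / den)


for f in (0.0, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 29.0, 50.0):
    db = 20 * math.log10(max(rasta_magnitude(f), 1e-12))  # floor avoids log(0)
    print(f"{f:5.1f} Hz  {db:7.1f} dB")
```

Under these assumptions the numerator factors as 0.1·(1 - z^-2)(2 + z^-1 + 2z^-2), so besides the zero at DC there are zeros at the Nyquist frequency (50 Hz) and near 29 Hz; the passband lies at the few-Hz modulation rates typical of speech.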
![Page 19: Rasta processing of speech](https://reader031.vdocuments.us/reader031/viewer/2022013105/5552c20fb4c90581158b4849/html5/thumbnails/19.jpg)
- The filtering domain is set by the choice of compressing/expanding static nonlinear function:
1. Logarithmic
2. Lin-Log
Two Flavors of RASTA
![Page 20: Rasta processing of speech](https://reader031.vdocuments.us/reader031/viewer/2022013105/5552c20fb4c90581158b4849/html5/thumbnails/20.jpg)
Logarithmic amplitude transformation (step 2)
Antilogarithmic (exponential) transformation (step 4)
![Page 21: Rasta processing of speech](https://reader031.vdocuments.us/reader031/viewer/2022013105/5552c20fb4c90581158b4849/html5/thumbnails/21.jpg)
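A natural logarithm of the form y = ln(1 + J·x), where J is a signal-dependent positive constant: the mapping is nearly linear where J·x ≪ 1 and nearly logarithmic where J·x ≫ 1, so a smaller J gives a more linear mapping
(Figure: lin-log curves for J = 0.1 and J = 1.0)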
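The lin-log pair for steps 2 and 4 can be sketched as follows. The compressing function is y = ln(1 + J·x); for the expanding function this sketch simply assumes the exact algebraic inverse x = (e^y - 1)/J, which may differ from what the paper uses. The helper names are hypothetical. Note that after band-pass filtering, y can be negative, which makes this inverse negative; how to handle that is not covered here.

```python
import math


def lin_log(x, J):
    """Compressing lin-log nonlinearity: y = ln(1 + J*x)."""
    return math.log1p(J * x)


def lin_log_inverse(y, J):
    """Expanding nonlinearity, assumed here to be the exact inverse: x = (e^y - 1) / J."""
    return math.expm1(y) / J


# Compare ln(1 + Jx) with its two limiting regimes for the slide's J values
for J in (0.1, 1.0):
    for x in (0.01, 1.0, 100.0):
        print(f"J={J:<4} x={x:<6} ln(1+Jx)={lin_log(x, J):8.4f}  "
              f"linear Jx={J * x:8.4f}  log ln(J)+ln(x)={math.log(J) + math.log(x):8.4f}")
```

In the near-linear regime, an additive noise floor stays additive and a constant floor can be removed by the band-pass filter; in the near-log regime, a multiplicative (channel) factor becomes additive instead.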
![Page 22: Rasta processing of speech](https://reader031.vdocuments.us/reader031/viewer/2022013105/5552c20fb4c90581158b4849/html5/thumbnails/22.jpg)
Results
![Page 23: Rasta processing of speech](https://reader031.vdocuments.us/reader031/viewer/2022013105/5552c20fb4c90581158b4849/html5/thumbnails/23.jpg)
Digits recorded over phone lines, with or without noise or changes in noise over time
Isolated Digits Recognition
![Page 24: Rasta processing of speech](https://reader031.vdocuments.us/reader031/viewer/2022013105/5552c20fb4c90581158b4849/html5/thumbnails/24.jpg)
Large Vocab Continuous Speech
Four speakers, each reading 2,652 sentences
Sentences were either left as recorded or low-pass filtered
![Page 25: Rasta processing of speech](https://reader031.vdocuments.us/reader031/viewer/2022013105/5552c20fb4c90581158b4849/html5/thumbnails/25.jpg)
Next Experiments
● Let’s train the model with no noise and then test it with noise in the background
● Analogous to software assembled in the factory and used in the real world
![Page 26: Rasta processing of speech](https://reader031.vdocuments.us/reader031/viewer/2022013105/5552c20fb4c90581158b4849/html5/thumbnails/26.jpg)
● RASTA > PLP when noise changes between training and test
● The success of RASTA depends on the amplitude transformation (the domain) applied to the signal
Isolated Digits Recognition
![Page 27: Rasta processing of speech](https://reader031.vdocuments.us/reader031/viewer/2022013105/5552c20fb4c90581158b4849/html5/thumbnails/27.jpg)
Large Vocab Continuous Speech
● Again, success depends on filter used
![Page 28: Rasta processing of speech](https://reader031.vdocuments.us/reader031/viewer/2022013105/5552c20fb4c90581158b4849/html5/thumbnails/28.jpg)
Optimizing J
● It seems important, then, to pick an appropriate value of the domain parameter J for each noise level
● J can be estimated by measuring the energy in the first part of an utterance
● Performance improves even more!
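A sketch of estimating J from the start of an utterance. Two parts are assumptions, not the paper's procedure: that the first `n_init_frames` frames contain no speech, and the rule mapping noise power to J (here J inversely proportional to the noise power, with a hypothetical `scale` constant). The rule only encodes the direction suggested by the lin-log curves: more noise, smaller J, a more linear domain.

```python
def estimate_noise_power(spectrogram, n_init_frames=10):
    """Mean critical-band power over the first frames of an utterance.

    Hypothetical assumption: the first n_init_frames frames are non-speech.
    """
    frames = spectrogram[:n_init_frames]
    total = sum(sum(frame) for frame in frames)
    count = sum(len(frame) for frame in frames)
    return total / count


def choose_j(noise_power, scale=1.0):
    """Hypothetical rule: J inversely proportional to the estimated noise power.

    More noise -> smaller J -> a more linear compressing function.
    """
    if noise_power <= 0:
        raise ValueError("noise power must be positive")
    return scale / noise_power


# Usage: 10 quiet frames followed by louder (speech) frames
spec = [[2.0, 2.0]] * 10 + [[100.0, 100.0]] * 5
print(choose_j(estimate_noise_power(spec)))  # 0.5
```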
![Page 29: Rasta processing of speech](https://reader031.vdocuments.us/reader031/viewer/2022013105/5552c20fb4c90581158b4849/html5/thumbnails/29.jpg)
Consequences of RASTA Processing
● The most important advance of RASTA: compare current spectral information to preceding information
● This highlights transitions and changes → edge detection!
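The edge-detection behaviour can be seen by filtering a single band's log-energy trajectory that sits at one level and then jumps to another. This sketch reuses the assumed coefficients (numerator 0.1·(2, 1, 0, -1, -2), pole 0.98); the trajectory is made up for illustration, and samples before the start are taken equal to the first sample so the filter starts in steady state.

```python
B = [0.2, 0.1, 0.0, -0.1, -0.2]  # assumed FIR taps, 0.1 * [2, 1, 0, -1, -2]
POLE = 0.98                      # assumed IIR pole


def rasta_filter(x):
    """Causal RASTA-style filter; samples before the start are taken equal to x[0]."""
    y = []
    for n in range(len(x)):
        acc = sum(b * x[max(n - k, 0)] for k, b in enumerate(B))
        if y:
            acc += POLE * y[-1]
        y.append(acc)
    return y


# Hypothetical log-energy trajectory: steady at 1.0, jumping to 3.0 at frame 200
traj = [1.0] * 200 + [3.0] * 400
out = rasta_filter(traj)
print(max(abs(v) for v in out[:200]))  # steady part: essentially zero
print(max(out[200:210]))               # burst right after the jump
print(abs(out[-1]))                    # long after the jump: decayed toward zero
```

Both steady levels are suppressed; only the transition between them produces a sizeable output.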
![Page 30: Rasta processing of speech](https://reader031.vdocuments.us/reader031/viewer/2022013105/5552c20fb4c90581158b4849/html5/thumbnails/30.jpg)