[Advanced] Speech & Audio Signal Processing
ES 157/257: Speech and Audio Processing
Prof. Patrick Wolfe, Harvard DEAS
02 February 2006
State of the Art in Speech/Audio
Speech and audio processing may be divided into “low-level” and “high-level” inference.
Speech enhancement, compression, and coding are all widely used technologies. This low-level work is the most mature.
High-level tasks will drive future advances: speech/music database information retrieval, and automatic speaker and speech recognition.
But low-level issues also remain…
Fundamental Questions
How to obtain highly structured representations of speech and audio signals? Time-frequency “atoms” as building blocks.
How can statistical inference enable advances in speech signal processing? A means to obtain an “atomic decomposition”.
Statistical modeling of time-frequency coefficients provides a principled solution.
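To make the idea of a time-frequency “atom” concrete, here is a minimal sketch that assumes Gaussian-windowed complex exponentials (Gabor atoms) as the building blocks; the slides do not say which atoms the course uses. The names `gabor_atom` and `coefficient` are illustrative only. The coefficient is the inner product of the signal with an atom. The statistical modeling of those coefficients mentioned above is not shown.

```python
import cmath
import math

def gabor_atom(length, center, width, freq):
    """Unit-energy Gaussian-windowed complex exponential (freq in cycles/sample)."""
    atom = [
        math.exp(-0.5 * ((t - center) / width) ** 2)
        * cmath.exp(2j * math.pi * freq * t)
        for t in range(length)
    ]
    norm = math.sqrt(sum(abs(a) ** 2 for a in atom))
    return [a / norm for a in atom]

def coefficient(signal, atom):
    """Inner product <signal, atom>: how strongly the atom is present."""
    return sum(s * a.conjugate() for s, a in zip(signal, atom))

# Hypothetical test signal: a steady tone at 0.1 cycles/sample.
signal = [math.cos(2 * math.pi * 0.1 * t) for t in range(256)]

# An atom tuned to the tone responds strongly; a detuned atom barely responds.
matched = coefficient(signal, gabor_atom(256, center=128, width=16, freq=0.1))
mismatched = coefficient(signal, gabor_atom(256, center=128, width=16, freq=0.3))
```

Comparing `abs(matched)` with `abs(mismatched)` shows how a few well-chosen atoms can capture most of a signal's structure.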
Representative Applications
Missing data in the context of VoIP: Original, Missing, Restored
Source / Speaker Separation: Source 1, Source 2; Mixture 1, Mixture 2; Recovery 1, Recovery 2
Digital Speech/Audio Processing
Speech Production
Time-Scale Modification
Male & Female Speaker: Original, Fast, Faster, Slower
Trumpet: Original, Fast, Slow
Speech and Quasi-Periodic Audio: Sinewave-based Modification, Voicing-dependent Rate Factor
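One way to picture sinewave-based time-scale modification is a rough sketch, assuming a single sinusoid described by per-sample amplitude and frequency tracks. The tracks are stretched in time, and the phase is rebuilt by accumulating frequency, so the duration changes while the pitch does not. The function names and the constant `rate` are assumptions made for illustration. The voicing-dependent rate factor named on the slide is not modeled here.

```python
import math

def stretch_track(track, rate):
    """Linearly resample a parameter track to about len(track) / rate samples."""
    n_out = max(1, round(len(track) / rate))
    out = []
    for i in range(n_out):
        pos = min(i * rate, len(track) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(track) - 1)
        frac = pos - lo
        out.append((1 - frac) * track[lo] + frac * track[hi])
    return out

def synthesize(amps, freqs_hz, fs):
    """Single-sinusoid synthesis; phase is the running sum of frequency."""
    phase = 0.0
    out = []
    for a, f in zip(amps, freqs_hz):
        out.append(a * math.sin(phase))
        phase += 2 * math.pi * f / fs
    return out

def time_scale(amps, freqs_hz, fs, rate):
    """rate > 1 speeds up (shorter output); rate < 1 slows down (longer output)."""
    return synthesize(stretch_track(amps, rate), stretch_track(freqs_hz, rate), fs)

# Hypothetical example: 0.1 s of a 200 Hz tone at 8 kHz, slowed to half speed.
fs = 8000
amps = [1.0] * 800
freqs = [200.0] * 800
original = time_scale(amps, freqs, fs, rate=1.0)
slower = time_scale(amps, freqs, fs, rate=0.5)
```

The slowed output has twice as many samples but still oscillates at 200 Hz. Plain resampling would not give this result, because it would also lower the pitch.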
More Time-Scale Modification
Falling Can, Bongo Drums, Loon: Original, Slow
Complex Non-Speech Signals: Phase-Vocoder-based Modification, Event-Dependent Phase Coherence
Pitch and Vocal Tract Change
Male & Female Speaker: Original, Low pitch/Long vocal tract, High pitch/Short vocal tract
Male Speaker: Original and Monotone
Sinewave-based Modification
Speech Coding
Female Speaker: Original, CELP 8000 bps, Sine 4800 bps, Sine 2400 bps
Sinewave-based, Code-Excited Linear Prediction (CELP)
Male Speaker: Original, CELP 8000 bps, Sine 4800 bps, Sine 2400 bps
Noise Reduction
Cell Phone Noise, Cocktail Party, Automobile Noise: Original, Enhanced
Adaptive Wiener Filter: Adaptation Based on Spectral Change
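For reference, the basic Wiener gain can be sketched as below, assuming per-frequency-bin power estimates of the noisy signal and of the noise are already available. The function name `wiener_gains` and the `floor` parameter are illustrative assumptions. The adaptation based on spectral change that the slide refers to is not reproduced; this is only the textbook gain rule, applied one bin at a time.

```python
def wiener_gains(noisy_power, noise_power, floor=0.05):
    """Per-bin Wiener gain G = max(P_y - P_n, 0) / P_y, limited below by `floor`.

    noisy_power: power of the noisy observation in each frequency bin
    noise_power: estimated noise power in each bin
    floor:       minimum gain (an assumed safeguard against over-suppression)
    """
    gains = []
    for p_y, p_n in zip(noisy_power, noise_power):
        if p_y <= 0.0:
            gains.append(floor)
            continue
        g = max(p_y - p_n, 0.0) / p_y
        gains.append(max(g, floor))
    return gains

# Hypothetical bins: one dominated by speech, two at or below the noise level.
gains = wiener_gains([10.0, 1.0, 0.5], [1.0, 1.0, 1.0])
```

Bins with high signal-to-noise ratio pass almost unchanged, while bins dominated by noise are attenuated down to the floor.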
Compression
Low-noise case: Original, 1.5 dB Reduction, 3.0 dB Reduction
Reduction of Peak-to-RMS amplitude ratio, Based on Sinewave Analysis/Synthesis
High-noise case: Original, 1.5 dB Reduction, 3.0 dB Reduction
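The peak-to-RMS ratio (crest factor) mentioned above can be computed as in the sketch below. To suggest why a sinewave representation is a natural place to reduce it, the example compares a sum of equal-amplitude harmonics with aligned phases against the same harmonics with a quadratic phase schedule. The phase schedule and the function names are assumptions chosen for illustration, not the procedure used to produce the audio examples.

```python
import math

def crest_factor_db(x):
    """Peak-to-RMS amplitude ratio of a signal, in decibels."""
    peak = max(abs(v) for v in x)
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return 20.0 * math.log10(peak / rms)

def harmonic_sum(phases, n_samples=1024):
    """One period of equal-amplitude harmonics 1..len(phases) with given phases."""
    return [
        sum(
            math.cos(2 * math.pi * (k + 1) * n / n_samples + ph)
            for k, ph in enumerate(phases)
        )
        for n in range(n_samples)
    ]

N = 16  # number of harmonics (arbitrary choice for the demonstration)

# All phases zero: every harmonic peaks at the same instant.
aligned = crest_factor_db(harmonic_sum([0.0] * N))

# Quadratic phases spread the peaks out in time; the magnitudes are unchanged.
quadratic = [-math.pi * k * (k - 1) / N for k in range(1, N + 1)]
dispersed = crest_factor_db(harmonic_sum(quadratic))
```

Both signals have identical harmonic amplitudes, yet `dispersed` is several dB lower than `aligned`. Changing only the sinewave phases therefore lowers the peaks without altering the magnitude spectrum.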