improvement of probabilistic acoustic tube model for...

IMPROVEMENT OF PROBABILISTIC ACOUSTIC TUBE MODEL FOR SPEECH DECOMPOSITION

Yang Zhang1

[email protected] Ou2

[email protected] Hasegawa-Johnson1

[email protected]

1 University of Illinois at Urbana Champaign, Urbana IL, USADepartment of Electrical and Computer Engineering

2 Tsinghua University, Beijing, ChinaDepartment of Electronic Engineering

MotivationDrawbacks of Current Model-based Methods Highlights of PAT2

• Incomplete - tend to model only a part of parameters of interest,and disregard others that might also be important.

• Speech analysis may be inaccurate or even incorrect:• Chicken and egg effect;• LPC and MFCC corrupted by spectral tilt.

• A probabilistic generative model that jointly considers all speech parameters;

• Incorporates breathiness and glottal vibration;• Incorporates phase modeling and so completely defines a

probabilistic model for the complex spectrum of speech;• Makes U/V states a continuum by introducing voiced amplitude and

unvoiced amplitude, which is closer to the nature of speech.

PAT2 Signal ModelingThe Source Filter Model with Mixed Excitation

Glottal Filter Vocal Tract Filter

Impulse train

Glottal filter

Vocal tract filter

Voiced speech

Unvoiced Excitation

Unvoiced Speech

𝑎𝑡 usually dominates 𝑏𝑡

𝑏𝑡

𝑎𝑡

𝑎𝑡 = 0

voiced excitation

𝑆𝑡 𝜔 = 𝑎𝑡𝑉𝑡 𝜔 + 𝑏𝑡𝑈𝑡 𝜔 𝐻𝑡 𝜔 ⊛ 𝑊𝑡 𝜔

𝑉𝑡 𝜔 = 𝐺𝑡 𝜔 𝑒−𝑗𝜔𝜏𝑡

𝑘

𝛿 𝜔 − 𝑘𝜔0𝑡

𝐺𝑡 𝜔 = 1 + 𝑔1𝑡 cos 𝛽𝑡𝑒−𝑗𝑤 + 𝑔1𝑡

2 𝑒−2𝑗𝜔 −1

1 + 𝑔2𝑡𝑒−𝑗𝜔 −1

magnitude & phase of the anti-causal poles

𝐻𝑡 𝜔 = exp

𝑛=1

𝐾

ℎ 𝑛 exp −𝑗𝑚 𝜔 𝑛

mel-frequency complex cepstral coefficientsWhen negative sign and group delay is removed, complex cepstrum decay at the rate of 1/ 𝑛.

Experimental ResultsVoiced Reconstruction Voiced Reconstruction – Single Frame GCI Location Estimation

Vocal Tract Filter Estimation Voiced vs Whispered

Original Spectrogram

Voiced Reconstruct

Real Spectrum

Imaginary Spectrum

PAT2 MFCC PAT2 MFCC

PAT2 Probabilistic ModelingConvert to DFT and Vectorize Convert to Mel Frequency Add Prior

𝒔𝑡 = 𝑎𝑡𝝃𝑡 + 𝑏𝑡𝜼𝑡

𝜼𝑡~𝒩 0,𝑯𝑡

𝒔𝑡 = 𝑭𝒔𝑡 = 𝑎𝑡𝑭𝝃𝑡 + 𝑏𝑡𝑭𝜼𝑡

𝒔𝑡~𝒩 𝑎𝑡𝑭𝝃𝑡, 𝑏𝑡2𝑭𝑯𝑡𝑭

𝑇

𝑃𝜃𝑡|𝜃𝑡−1𝑢|𝑣 ∝ −

𝑢 − 𝑣 2

𝜎𝜃2

Analysis with MAP

unvoiced excitation

dirac delta function

vocal tract transfer function

windowing function

glottal transfer function

group delay

speech

magnitude of the causal pole quefrency mel-frequency

𝐻𝑡

𝑉𝑡

𝑈𝑡 fundamental frequency

improvement of probabilistic acoustic tube model for...

Documents