
EG-348_371_09

1

Multimedia Communications (371) Speech and Image Communications (348)

John Mason

Engineering

Swansea University

Features in speech

[Figure: Acquisition (frame: 20-30 ms, sampling frequency: 8 kHz) followed by Feature extraction, giving a sequence of feature vectors X1, ..., Xi, ... along the time axis]


Speech production

Air from the lungs -> Vocal folds -> Vocal tract -> Speech

LPC Short and Long

Spectral envelope reflects morphological characteristics of the vocal tract

[Diagram: air from the lungs (noise) -> H1(z) -> H2(z) -> synthesised speech]


Features: building of statistical model

[Figure: diagram with repeated T1 / T2 labels]

EG-348_371_09

7

VT Shape & Some Vowels - Ladefoged ‘62

Speech Processing - Applications

Why?
• Communications
• Synthesis
• Recognition: Speech & Speaker

How?
• Frame-based
• Systems approach

Some Books

• Flanagan - ‘Speech Analysis, Synthesis and Perception’, Springer-Verlag - a classic!
• Furui - several books on recognition
• Parsons - ‘Voice and Speech Processing’, McGraw Hill - one of the first text books on computer speech processing
• O’Shaughnessy - ‘Speech Comms - human and machine’, Addison-Wesley
• Rabiner & Juang - ‘Fundamentals of Speech Recognition’, Prentice Hall, 1993
• Ramachandran & Mammone (eds) - ‘Modern Methods of Speech Processing’, Kluwer Academic, 1995

Speech Communications

Person-to-Person

Person-to-Machine: speech/speaker recognition

Machine-to-Person: speech synthesis

(Electronic) Speech Communications

perhaps separated by long distance (or in time)

Telephony & Broadcasting

Acoustic Air Path -> Electronic Link (Channel / Transmission Path) -> Acoustic Air Path

Speech Comms: Telephony

Electronic Link: Channel / Transmission Path

Sending: Microphone -> ADC -> Analysis -> Coding -> Transmitter

Receiving: Receiver -> Decoding -> (Re-)Synthesis -> DAC -> Loudspeaker

Speech Bit Rates

Sending: Message Creation -> Language Coding -> Human Acoustic Generation -> Transmission (Acoustic Space)

Receiving: Human Hearing Extraction -> Language Decoding -> Message Realisation

Approx. bit rate in bps, from message level to acoustic level: tens, hundreds, thousands, tens of thousands

Criteria in Speech Comms.

Quality versus Bit-rate

[Figure: Quality (Poor, Fair, Good, Excellent) plotted against bit rate (4, 8, 16, 32, 64 kbps) for GSM, ADPCM and CELP]

4 Quality Measures: intelligibility, loudness, naturalness, ease-of-listening

Low Bit Rate Speech Coding

Compandent http://www.compandent.com/

Speech Processing

The three main application areas are:
• Speech Comms. (the ‘electronic link’)
• Automatic Speech/Speaker recognition
• Speech Synthesis

Much of the underlying analysis is common, eg linear predictive coding

What does speech look like?

[Figure: speech waveform, sample index 0 to 7000]

Dynamic Range - for flexibility and robustness

Time-varying - to convey information

Frame-based Analysis

[Figure: speech waveform, sample index 0 to 7000]

To capture time variations:
• 20-30 ms frames - ‘centi-second’ labeling
• spectral analysis: FFT, Filter-bank, Linear Predictive Coding
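The framing step above can be sketched as follows. This is a minimal illustration, not the course's own code: the function name `frame_signal`, the 10 ms hop and the synthetic 200 Hz tone are assumptions; the 20 ms frame, 8 kHz sampling rate and Hamming shaping are the values quoted in these slides.

```python
import math

def frame_signal(samples, fs=8000, frame_ms=20, hop_ms=10):
    """Split samples into overlapping frames and apply a Hamming window."""
    frame_len = int(fs * frame_ms / 1000)   # 160 samples for 20 ms at 8 kHz
    hop = int(fs * hop_ms / 1000)           # assumed 10 ms hop (100 frames/second)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        frames.append([x * w for x, w in zip(frame, window)])
    return frames

# One second of a 200 Hz tone as a stand-in for speech (assumption).
fs = 8000
tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(fs)]
frames = frame_signal(tone, fs)
print(len(frames), len(frames[0]))
```

Each windowed frame would then be passed to whichever spectral analysis is in use (FFT, filter-bank or LPC).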

Speech Analysis/Coding

Two general cases:
• Waveform coders
• Source (voice) coders (vo-coders)

Source coders, eg linear predictive coding (LPC):
• Model the source, ie the vocal tract (VT)
• Linear, time-varying model of VT, plus excitation

[Diagram: excitation en (voiced or unvoiced) -> H(z) -> speech sn]

Systems Approach

[Diagram: Excitation (Voiced, with f0, or Unvoiced) -> Vocal Tract Model (Time-Varying Parameters) -> Speech]

LPC Analysis/Synthesis

Synthesis: input: Excitation, output: Speech
en, E(z) -> H(z), hn -> sn, S(z)

Analysis: input: Speech, output: Excitation
sn, S(z) -> 1/H(z) -> en, E(z)

EG-348_371_09

24

‘Perfect’ Analysis/Synthesis

H(z)S(z)E(z)

en sn


Input sn and output sn are identical (within arithmetic limits)
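The claim can be checked numerically with a short sketch. It assumes the all-pole form H(z) = 1/(1 – A’(z)) used later in these slides; the function names and the test signal are illustrative, and the two coefficients are borrowed from the 2nd-order example elsewhere in the deck.

```python
def analyse(s, a):
    """Inverse filter 1 - A'(z): e[n] = s[n] - sum_i a[i] * s[n-i]."""
    p = len(a)
    return [s[n] - sum(a[i] * s[n - 1 - i] for i in range(p) if n - 1 - i >= 0)
            for n in range(len(s))]

def synthesise(e, a):
    """All-pole filter 1 / (1 - A'(z)): s[n] = e[n] + sum_i a[i] * s[n-i]."""
    p = len(a)
    s = []
    for n in range(len(e)):
        pred = sum(a[i] * s[n - 1 - i] for i in range(p) if n - 1 - i >= 0)
        s.append(e[n] + pred)
    return s

a = [1.5537, -0.8276]                        # a1, a2 (2nd-order example values)
s_in = [0.0, 1.0, 0.5, -0.3, -0.8, 0.2, 0.9, 0.1]   # arbitrary test samples
e = analyse(s_in, a)
s_out = synthesise(e, a)
max_err = max(abs(x - y) for x, y in zip(s_in, s_out))
print(max_err)                               # limited only by floating-point arithmetic
```

Any difference between `s_in` and `s_out` comes from rounding alone, which is the sense of "within arithmetic limits".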

Practical Analysis/Synthesis

Source Coding: sn -> Analysis -> Coding -> De-coding -> Synthesis -> ŝn

LPC-based Systems (eg CELP):
Analysis: sn -> 1/H(z) -> en
Re-Synthesis: ên -> Ĥ(z) -> ŝn

Practical Analysis/Synthesis

[Diagram: Sending (analysis of sn) -> Transmission -> Receiving (en, E(z) -> H(z) -> sn, S(z))]

Parameters for Transmission:
• Input / Excitation en
• Source model H(z)

Thus Analysis must derive these parameters, and Synthesis must use them to re-generate speech

Linear Predictive Coding - LPC

Principle of linear prediction: the next value (or sample) in a series, ie at time n, is predicted or estimated by a weighted sum of previous values, ie those at time n-1, n-2, ...

Thus for a predictor of order p, we have:

$$\hat{s}_n = a_1 s_{n-1} + a_2 s_{n-2} + a_3 s_{n-3} + \dots + a_p s_{n-p}$$

Linear Prediction

$$\hat{s}_n = a_1 s_{n-1} + a_2 s_{n-2} + \dots + a_p s_{n-p} = \sum_{i=1}^{p} a_i s_{n-i}$$

Transforming to the z-domain gives:

$$\hat{S}(z) = a_1 z^{-1} S(z) + a_2 z^{-2} S(z) + \dots + a_p z^{-p} S(z)$$

$$E(z) = S(z) - \{a_1 z^{-1} S(z) + a_2 z^{-2} S(z) + \dots + a_p z^{-p} S(z)\} = \{1 - A'(z)\}\,S(z)$$

where $A'(z) = a_1 z^{-1} + a_2 z^{-2} + \dots + a_p z^{-p}$

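LPC Error Terms

Error is simply the difference between predicted and actual values:

$$e_n = s_n - \hat{s}_n = s_n - \sum_{i=1}^{p} a_i s_{n-i}$$

$$E(z) = S(z) - \hat{S}(z) = S(z) - \{a_1 z^{-1} S(z) + a_2 z^{-2} S(z) + \dots + a_p z^{-p} S(z)\} = \{1 - A'(z)\}\,S(z)$$

where $A'(z) = a_1 z^{-1} + a_2 z^{-2} + \dots + a_p z^{-p}$, so that

$$\frac{E(z)}{S(z)} = 1 - A'(z)$$

[Diagram: sn is passed through A’(z) and the result is subtracted from sn to give en]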

Synthesis

[Diagram: en -> H(z) -> sn, with H(z) realised as a feedback loop in which A’(z) acts on past outputs and its output is added to en]

Parameters updated at frame rate

NB ‘hat’ of approximation omitted for simplicity

Analysis for Synthesis

The Analysis and Synthesis must match. What is needed for the Synthesis?

Answer: en - the excitation, and H(z) - the system

Thus the Analysis must derive these terms (from sn):

The speech signal, sn, is analysed to give en and H(z), ie the A’(z) parameters, for transmission.

Synthesis: en -> H(z) -> sn

Analysis: sn, S(z) -> 1/H(z) -> en, E(z), ie the output of A’(z) is subtracted from sn to give en

Derivation of LPC Coefficients - A(z)

Recall:

$$e_n = s_n - \hat{s}_n = s_n - \sum_{i=1}^{p} a_i s_{n-i}$$

where ai are the p prediction coefficients. The principle behind LPC is to find a set of p coefficients, a1, a2, a3, ... ap, which in some sense minimizes the error signal en over a frame of speech of N samples. This leads to a set of p coefficients for each frame.

$$E_n = \sum_{n=0}^{N-1} (s_n - \hat{s}_n)^2 = \sum_{n=0}^{N-1} \Big(s_n - \sum_{i=1}^{p} a_i s_{n-i}\Big)^2$$

Derivation of A(z) – (2)

Minimisation of En is achieved by setting the p partial derivatives to zero:

$$\frac{\partial E_n}{\partial a_i} = 0 \quad \text{for } i = 1, 2, \dots, p$$

From which:

$$\sum_{k=1}^{p} a_k r_{jk} = r_{j0} \quad \text{where} \quad r_{jk} = \sum_{n} s_{n-j}\, s_{n-k}$$

In matrix form:

$$R\,a = r \quad \text{or} \quad a = R^{-1} r$$

The matrix [R] is Toeplitz symmetric, offering numerically efficient inversion techniques - Durbin’s recursion algorithm being one of the most popular.
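A minimal sketch of Durbin's recursion, assuming the autocorrelation form in which r depends only on the lag |j - k| and the predictor convention ŝn = Σ ai sn-i used in these slides. The function names and the short test frame are illustrative, not taken from the course material.

```python
def autocorrelation(frame, p):
    """r[k] = sum_n frame[n] * frame[n-k] for lags k = 0..p."""
    N = len(frame)
    return [sum(frame[n] * frame[n - k] for n in range(k, N)) for k in range(p + 1)]

def durbin(r, p):
    """Solve R a = r for Toeplitz R; returns (a1..ap, residual energy)."""
    a = [0.0] * (p + 1)          # a[1..p]; a[0] is unused
    err = r[0]                   # prediction error energy for order 0
    for i in range(1, p + 1):
        # reflection coefficient for order i (err must be non-zero)
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err

frame = [1.0, 0.8, 0.3, -0.2, -0.5, -0.4, 0.0, 0.3]   # arbitrary samples
r = autocorrelation(frame, 2)
a, err = durbin(r, 2)
print(a, err)
```

The recursion needs of the order of p² operations, against p³ for a general matrix inversion, which is why the Toeplitz structure matters.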

Derivation of A(z) – (3)

• When N is very large, r holds the autocorrelation coefficients of s
• s comes from e convolved with h (excitation & vocal tract)
• we are interested here in separating e and h
• the predictor order, p, is small to reflect the short-term periodicities (formants)
• with higher predictor orders we will get the longer-term periodicities (pitch)
• 2 practical problems with evaluating a:
  - matrix singularities in R^-1
  - unstable resultant H(z)
• in practice both are solved by windowing (shaping the frame), eg Hamming

Speech Signal Characteristics

• Duration
• Dynamic Range
• Periodicities: vocal tract, pitch

Frame-based Analysis
• frame size: quasi-stationary, capture transition; typically 20 - 30 ms
• frame rate: task dependent; more means more band-width/computation - up to 100 frames/second

EG-348_371_09

36

Harmonic Structures and Periodicities

Harmonic Structures & Periodicities give potential for data reduction

LPC is one way of gaining this compression

Speech has two obvious separate structures

vocal tract resonances

pitch

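Short Term

Short term prediction:

$$\hat{s}_n = \sum_{i=1}^{p} a_i s_{n-i}$$

$$e_n = s_n - \hat{s}_n$$

$$e_n = s_n - \sum_{i=1}^{p} a_i s_{n-i}$$

$$E_n = \sum_{n=0}^{N-1} (e_n)^2$$

[Diagram: excitation en (voiced, with pitch period Tp, or unvoiced) -> H(z) (vocal tract) -> speech sn; plot of sn against n with the span p marked]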

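Long Term

Long term prediction:

$$\hat{s}_n = \sum_{i=1}^{P} a_i s_{n-i}$$

$$e_n = s_n - \hat{s}_n$$

$$e_n = s_n - \sum_{i=1}^{P} a_i s_{n-i}$$

$$E_n = \sum_{n=0}^{N-1} (e_n)^2$$

[Diagram: en (voiced or unvoiced) -> Hlt(z) (pitch) -> epn -> Hst(z) (vocal tract) -> speech sn; plot of sn against n with the span P and the pitch period Tp marked]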

Two Structures: short-term (formants) & long-term - pitch (excitation)

en -> Hlt(z) -> epn -> Hst(z) -> sn

eg 20 ms frame: 160 samples @ 8 kHz

Parameters: ai for Hlt(z), eg p=3; ai for Hst(z), eg p=10; Gain; k

NB Representations of these parameters are transmitted

Practical Coding Systems

Waveform & Source Coders (Vocoders)

Source Coders (Vocoders): 2 periodicities/redundancies in source
• short-term (formants)
• long-term - pitch

Excitation en

en -> Hlt(z) -> epn -> Hst(z) -> sn

EG-348_371_09

41

‘Perfect’ Analysis/Synthesis (1)

H(z)S(z)E(z)

en sn


Input sn and output sn are identical (within arithmetic limits)

Synthesis: en, E(z) -> H(z) = 1/(1 – A’(z)) -> sn, S(z)

Analysis: sn, S(z) -> 1 – A’(z) -> en, E(z)

Combined: sn -> 1 – A’(z) -> en -> 1/(1 – A’(z)) -> sn

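sn -> 1 – A’(z) -> en -> 1/(1 – A’(z)) -> sn

[Diagram: analysis filter. sn passes through a chain of Z^-1 delays giving sn-1, ..., sn-i, ..., sn-p; these are weighted by a1, ..., ai, ..., ap and summed, and the sum is subtracted from sn to give en. Original Speech -> Residual]

$$e_n = s_n - \hat{s}_n = s_n - \sum_{i=1}^{p} a_i s_{n-i}$$

Note – minus sign: in Matlab combined with ai

What determines p?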

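sn -> 1 – A’(z) -> en -> 1/(1 – A’(z)) -> sn

[Diagram: analysis filter followed by re-synthesis filter. Analysis: the delayed samples sn-1, ..., sn-i, ..., sn-p, weighted by a1, ..., ai, ..., ap, are subtracted from sn to give en. Re-synthesis: the delayed outputs sn-1, ..., sn-i, ..., sn-p, weighted by a1, ..., ai, ..., ap, are added to en to give sn (note: no minus). Original Speech -> Residual -> Re-Synth.]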

Practical System

[Diagram: sn -> analysis -> Transmitted Data Frame -> en, E(z) -> H(z) -> sn, S(z)]

Input sn and output sn are “similar”

What does the Transmitted Data Frame Contain?

Analysis-by-Synthesis: LPAS

Integrated encoder & decoder at the encoder

[Diagram: LPAS Encoder. An adaptive encoder drives a basic decoder; the decoder output is subtracted from sn and the weighted error is fed back to the adaptive encoder]

Log Spectral Estimates

• Comparisons between frames are very important in many situations
• log spectral estimates are the most common (though in Comms. an approximation is used to reduce computation)

$$D = \frac{1}{B}\int_{0}^{B} \big(S(w) - S'(w)\big)^2\,dw \approx \frac{1}{N}\sum_{k=0}^{N/2-1} (S_k - S'_k)^2$$

where $S_k = \log(|\mathrm{DFT}(s_n)|)$ or $\log(|H(z)|)$ at $z = e^{jw}$

In Comms, computation is expensive and parameter vector approximations to D are used
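A minimal sketch of the discrete form of D, under stated assumptions: the 1/N scaling and the sum over k = 0 to N/2 - 1 follow one reading of the slide, the small floor inside the logarithm is added only to avoid log(0), and the function names and the two tone frames are illustrative.

```python
import cmath
import math

def log_spectrum(frame):
    """S_k = log|DFT(s_n)| for k = 0 .. N/2 - 1 (direct DFT, fine for short frames)."""
    N = len(frame)
    spec = []
    for k in range(N // 2):
        X = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        spec.append(math.log(abs(X) + 1e-12))   # assumed floor to avoid log(0)
    return spec

def log_spectral_distance(frame_a, frame_b):
    """Mean squared difference between the log spectra of two equal-length frames."""
    N = len(frame_a)
    Sa, Sb = log_spectrum(frame_a), log_spectrum(frame_b)
    return sum((x - y) ** 2 for x, y in zip(Sa, Sb)) / N

fs, N = 8000, 64
frame_a = [math.sin(2 * math.pi * 500 * n / fs) for n in range(N)]
frame_b = [math.sin(2 * math.pi * 1500 * n / fs) for n in range(N)]
d_same = log_spectral_distance(frame_a, frame_a)
d_diff = log_spectral_distance(frame_a, frame_b)
print(d_same, d_diff)
```

The direct DFT costs of the order of N² operations per frame, which illustrates why cheaper parameter-vector approximations are attractive in communications.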

Some Standards

Standard   Application         Coder      Bit rate (kb/s)
GSM        European Cellular   RPE-LTP    13
FS1016     Secure Voice        CELP       4.8
IS54       NA Cellular         VSELP      7.95
IS96       NA Cellular         QCELP      1-8
JDC-FR     Japanese Cellular   VSELP      6.7
JDC-HR     Japanese Cellular   PSI-CELP   3.67
G.728      (terrestrial)       LD-CELP    16


CELP eg

[Diagram: CB Index -> codebook -> Gain -> en -> Hlt(z) -> Hst(z) -> sn]

Parameters: CB Index, Gain, Long-term coefficients (pitch), Short-term coefficients (formants)

Excitation en is represented by address, ie CB Index

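CELP – LPAS (Encoder)

[Diagram: basic decoder (CB Index, Gain -> en -> Hlt(z) -> Hst(z) -> sn) inside an adaptive encoder; the decoder output is subtracted from the input sn and the weighted error is fed back to the adaptive encoder]

Parameters: CB Index, Gain, Long-term coefficients (pitch), Short-term coefficients (formants)

Excitation en is represented by address, ie CB Index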

Conversion of LPC Parameters

• $A(z) = 1 + a_1 z^{-1} + a_2 z^{-2} + \dots + a_p z^{-p}$ and the ai are to be Tx’d
• Line Spectral Frequencies (LSF) present a clever way of representing the LPC coefficients, the ai’s of A(z)
• The ai’s are floating point numbers and their accuracy is important
• Factorising A(z) tends to give complex roots in the z-domain
• LSF’s map these complex roots on to the unit circle

LSF’s:
• Lead to efficient coding
• Ensure a minimum phase filter
• Bit errors are spectrum localised, minimising loss of speech quality

[Diagram: z-plane (x, jy) with a root on the unit circle at angle φ; LSF = ws . φ/2π]

Line Spectral Frequencies

• Consider

$$P(z) = A(z) + z^{-(n+1)} A(z^{-1})$$

and

$$Q(z) = A(z) - z^{-(n+1)} A(z^{-1})$$

then P(z) and Q(z) lead to what are known as LSF’s

• Clearly if P(z) and Q(z) are known then A(z) can be found: $A(z) = \{P(z) + Q(z)\}/2$

• Roots of P(z) and Q(z) lie on the unit circle in the z-domain. The locations give: the LSF’s, P(z) and Q(z), and whence A(z)

LSF Evaluation

Consider one pair of complex roots, A1(z):

$$A_1(z) = 1 + a_1 z^{-1} + a_2 z^{-2}$$

$$P_1(z) = 1 + a_1 z^{-1} + a_2 z^{-2} + z^{-3}(1 + a_1 z + a_2 z^{2}) = (z^2 + (a_1 + a_2 - 1)z + 1)(z + 1)\,z^{-3}$$

$$Q_1(z) = 1 + a_1 z^{-1} + a_2 z^{-2} - z^{-3}(1 + a_1 z + a_2 z^{2}) = (z^2 + (a_1 - a_2 + 1)z + 1)(z - 1)\,z^{-3}$$

The roots at z = -1 and z = +1 are discarded

It follows that the LSF’s, φ1 & φ2, are given by:

$$\cos(\phi_1) = -(a_1 + a_2 - 1)/2 \quad \text{and} \quad \cos(\phi_2) = -(a_1 - a_2 + 1)/2$$

Show:

$$a_1 = -(\cos(\phi_1) + \cos(\phi_2)) \quad \text{and} \quad a_2 = \cos(\phi_2) - \cos(\phi_1) + 1$$
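The second-order conversion in both directions can be sketched as below, using the convention A(z) = 1 + a1 z^-1 + a2 z^-2 from this slide. The function names are illustrative, and the input a1 = -1.8, a2 = 0.9 is one of the example pairs tabulated in these slides.

```python
import math

def lpc_to_lsf(a1, a2):
    """LSFs (radians) of a 2nd-order A(z); valid when both cosines lie in [-1, 1]."""
    phi1 = math.acos(-(a1 + a2 - 1) / 2)   # from the quadratic factor of P(z)
    phi2 = math.acos(-(a1 - a2 + 1) / 2)   # from the quadratic factor of Q(z)
    return phi1, phi2

def lsf_to_lpc(phi1, phi2):
    """Inverse mapping: recover a1, a2 from the two LSFs."""
    a1 = -(math.cos(phi1) + math.cos(phi2))
    a2 = math.cos(phi2) - math.cos(phi1) + 1
    return a1, a2

phi1, phi2 = lpc_to_lsf(-1.8, 0.9)
a1_back, a2_back = lsf_to_lpc(phi1, phi2)
print(phi1, phi2, a1_back, a2_back)
```

The round trip returns the original coefficients, which is the relationship the "Show" exercise asks for.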

LSF Test Example

$$A_1(z) = 1 + a_1 z^{-1} + a_2 z^{-2} = (z^2 + a_1 z + a_2)\,z^{-2} = (z^2 + 2\cos(\theta)\,w_n z + w_n^2)\,z^{-2}$$

where wn is the radius and θ is the angle from π. So: radius = √a2 & φ = π - θ

Note: in P & Q all wn^2 terms (of the multiple 2nd orders) are unity

EG 1: a2 = 1, then cos(φ1) = -(a1 + a2 - 1)/2 = -(a1)/2
roots already on circle and do not move (unstable system – not practical)

EG 2: a1 = 0, then cos(φ1) = -(a1 + a2 - 1)/2 = -(a2 - 1)/2
and cos(φ2) = -(a1 - a2 + 1)/2 = -(-a2 + 1)/2
so LSF’s are symmetric about ws/4 (ie φ = π/2)

LSF Review & Example (1)

LSF’s/LSP’s are defined as:

$$P(z) = A(z) + z^{-(n+1)} A(z^{-1})$$

and

$$Q(z) = A(z) - z^{-(n+1)} A(z^{-1})$$

thus $A(z) = \{P(z) + Q(z)\}/2$

For a second order $A(z) = 1 + a_1 z^{-1} + a_2 z^{-2}$:

$$P(z) = 1 + a_1 z^{-1} + a_2 z^{-2} + (1 + a_1 z + a_2 z^{2})\,z^{-3} = (z^2 + (a_1 + a_2 - 1)z + 1)(z + 1)\,z^{-3}$$

$$Q(z) = 1 + a_1 z^{-1} + a_2 z^{-2} - (1 + a_1 z + a_2 z^{2})\,z^{-3} = (z^2 + (a_1 - a_2 + 1)z + 1)(z - 1)\,z^{-3}$$

cf: $(s^2 + (2\cos(\theta)\,w_n)s + w_n^2)$


For a second order $A(z) = 1 + a_1 z^{-1} + a_2 z^{-2}$:

$$P(z) = (z^2 + (a_1 + a_2 - 1)z + 1)(z + 1)\,z^{-3}$$

$$Q(z) = (z^2 + (a_1 - a_2 + 1)z + 1)(z - 1)\,z^{-3}$$

cf: $(s^2 + (2\cos(\theta)\,w_n)s + w_n^2)$

Thus: $(a_1 + a_2 - 1) = 2\cos(\theta_1) = -2\cos(\phi_1)$

& $(a_1 - a_2 + 1) = -2\cos(\phi_2)$

So, given:
i) LPC coeffs., a1 and a2, then the LSFs φ1 & φ2 can be found
ii) LSFs, φ1 & φ2, then the LPC coeffs. a1 and a2 can be found

[Figure: z-plane plot showing the roots of P(z) and Q(z) on the unit circle at angles φ1 and φ2]


For a second order A(z), with P(z) corresponding to the first root and Q(z) to the second root, and for the second pair of LSFs, 1.37 and 1.77:

$$P(z) = 1 + a_1 z^{-1} + a_2 z^{-2} + (1 + a_1 z + a_2 z^{2})\,z^{-3} = (z^2 + (a_1 + a_2 - 1)z + 1)(z + 1)\,z^{-3}$$

$$= (z^2 - 2\cos(1.37)\,z + 1)(z + 1)\,z^{-3} = (z^3 + (1 - 2\cos(1.37))z^2 + (1 - 2\cos(1.37))z + 1)\,z^{-3}$$

Likewise

$$Q(z) = 1 + a_1 z^{-1} + a_2 z^{-2} - (1 + a_1 z + a_2 z^{2})\,z^{-3} = (z^2 + (a_1 - a_2 + 1)z + 1)(z - 1)\,z^{-3}$$

$$= (z^2 - 2\cos(1.77)\,z + 1)(z - 1)\,z^{-3} = (z^3 + (-1 - 2\cos(1.77))z^2 + (1 + 2\cos(1.77))z - 1)\,z^{-3}$$

Then

$$A(z) = \{P(z) + Q(z)\}/2 = (z^3 - (\cos(1.37) + \cos(1.77))z^2 + (1 - \cos(1.37) + \cos(1.77))z)\,z^{-3}$$


LSF Examples

LPC coeffs.        LSF’s
a1      a2         φ1            φ2
0       0.5        1.31812       1.82348
-1.8    0.9        0.31756       0.554811
+1.8    0.9        π-0.554811    π-0.31756
                   2.2274        2.3743

[Figure: three z-plane plots, axes -1 to 1]

LSF Examples

LPC coeffs.        LSF’s
a1      a2         φ1            φ2
0       0.5        1.31812       1.82348
-1.8    0.9        0.31756       0.554811
+1.8    0.9        π-0.554811    π-0.31756
                   2.2274        2.3743

$$A(z) = 1 + a_1 z^{-1} + a_2 z^{-2}$$

$$P(z) = 1 + a_1 z^{-1} + a_2 z^{-2} + (1 + a_1 z + a_2 z^{2})\,z^{-3}$$

$$= (z^2 + (a_1 + a_2 - 1)z + 1)(z + 1)\,z^{-3}$$

$$= (z^2 + (-1.8 + 0.9 - 1)z + 1)(z + 1)\,z^{-3}$$

$$= (z^2 - 1.9z + 1)(z + 1)\,z^{-3}$$

cf: $(z^2 + (2\cos(\theta)\,w_n)z + w_n^2)$

thus cos(θ) = -1.9/2 or θ = 2.824 and φ1 = π - θ = 0.318

Example Bit Allocation

Bit allocation    Voiced   Unvoiced
V/U decision      1        1
Excitation        11       11
Sync              1        1
Φ1 = 0.3176       5        5
Φ2 = 0.5548       5        5
Φ3 = 1.4454       5        5
Φ4 = 1.6961       5        5
Φ5                4        0
Φ6                4        0
Φ7                4        0
Φ8                4        0
Φ9                3        0
Φ10               2        0
Error check       0        21
Total / frame     54       54

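Codebooks & VQ

[Diagram: a sequence over time of vectors with p elements is replaced by a sequence of indices i (0 … N-1) into a codebook of N = 2^L vectors; an identical book is held at the other end]

Data reduction: (p x B) to L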

Codebook Compression

Principle
• representative data sets
• data vector is replaced / represented by “nearest” vector, chosen from a “codebook” - a closed set of vectors

Examples
• LPC parameter sets
• Excitation as in CELP

[Diagram: codebook of N = 2^k vectors, each with M elements, addressed by index i; the index selects A(z) or en for en -> H(z) -> sn]

Codebook Compression - CELP

[Diagram: codebook of P = 2^j entries of time-domain samples, each y ms long from its start point; the selected en (y ms) -> H(z) -> sn (y ms)]

• en are time domain samples (integers)
• R samples per second (eg 8000 Hz)
• Frame rate governs vector size
• P = 2^j
• Bit rate = j/y bits/ms

NB en also includes gain

Codebook Compression of H(z)

[Diagram: A[z] at time t, one vector every x ms along the time axis, replaced by index i into a codebook]

• Vector with M elements, every x ms
• Codebook with N = 2^k vectors
• Bit rate = k/x bits per ms (not a function of M)

In practice A[z] is converted to LSF’s.
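A minimal sketch of codebook look-up and the resulting bit rate. Squared Euclidean distance is an assumed choice of "nearest", and the toy codebook, the 20 ms update interval and the function names are illustrative only.

```python
def nearest_index(vector, codebook):
    """Index of the codebook entry with the smallest squared Euclidean distance."""
    best_i, best_d = 0, float("inf")
    for i, entry in enumerate(codebook):
        d = sum((v - c) ** 2 for v, c in zip(vector, entry))
        if d < best_d:
            best_i, best_d = i, d
    return best_i

def bit_rate_bits_per_ms(k, x_ms):
    """One k-bit index every x ms, independent of the vector size M."""
    return k / x_ms

# Toy codebook: N = 2^k vectors with k = 2, each with M = 2 elements.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
index = nearest_index([0.9, 0.2], codebook)
rate = bit_rate_bits_per_ms(k=2, x_ms=20)
print(index, rate)
```

Only the index is transmitted; the receiver looks up the same entry in its identical copy of the codebook.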

Codebook Generation

1) Initialise: form a single centroid of all training data, N = 1

2) Repeat
     Split centroids: N -> 2N
     Repeat
       Cluster data to nearest centroid
     until convergence
   until N large enough
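The splitting procedure above can be sketched as follows. The additive perturbation `eps`, the iteration cap, the squared Euclidean distance and the function names are assumptions made for illustration; the slide does not specify how centroids are split or how convergence is tested.

```python
def centroid(vectors):
    """Mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def nearest(v, centroids):
    """Index of the centroid closest to v (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(v, centroids[i])))

def train_codebook(data, target_size, eps=0.01, max_iter=50):
    """Grow a codebook by repeated splitting; sizes double, so use a power of 2."""
    centroids = [centroid(data)]                       # 1) single centroid, N = 1
    while len(centroids) < target_size:                # until N large enough
        # split each centroid into two slightly displaced copies: N -> 2N
        centroids = ([[x + eps for x in c] for c in centroids] +
                     [[x - eps for x in c] for c in centroids])
        for _ in range(max_iter):                      # cluster until convergence
            clusters = [[] for _ in centroids]
            for v in data:
                clusters[nearest(v, centroids)].append(v)
            updated = [centroid(cl) if cl else c       # empty cells keep their centroid
                       for cl, c in zip(clusters, centroids)]
            if updated == centroids:
                break
            centroids = updated
    return centroids

data = [[0.0], [0.1], [10.0], [10.1]]                  # toy 1-D training set
book = train_codebook(data, target_size=2)
print(sorted(c[0] for c in book))
```

On this toy data the two centroids settle on the means of the two obvious groups; real training data would be LPC (or LSF) vectors or excitation segments.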

VQ Performance on Unseen Data

[Figures from Ramachandran & Mammone (eds) ‘Modern Methods of Speech Processing’, Kluwer Academic, 1995]

LPC & FFT Spectra

[Figure: waveform against time (0 to 25.6 ms), and LPC and FFT magnitude spectra (dB, -40 to 40) against frequency (0 to 5 kHz, ie 0 to Fs/2)]

LPC Roots:
-0.6651 ± 0.6695i
-0.0560 ± 0.9709i
0.7228 ± 0.6225i
0.8714 ± 0.3694i
0.5758
-0.4200

LSFs:
φ2 of Q(z)   φ1 of P(z)
2.3743       2.2274
1.6540       1.5997
0.8261       0.6954
0.6106       0.3937

LPC Spectra & LSF’s

[Figure: LPC magnitude spectrum (dB, -40 to 40) against frequency (0 to 5 kHz), and z-plane plot (axes -1 to 1) of the roots]

LPC Roots:
-0.6651 ± 0.6695i
-0.0560 ± 0.9709i
0.7228 ± 0.6225i
0.8714 ± 0.3694i
0.5758
-0.4200

LSFs:
φ2 of Q(z)   φ1 of P(z)
2.3743       2.2274
1.6540       1.5997
0.8261       0.6954
0.6106       0.3937

LPC & FFT Spectra - 2nd Order

[Figure: waveform against time (0 to 25.6 ms), and LPC and FFT magnitude spectra (dB, -40 to 40) against frequency (0 to 5 kHz)]

A(z): 1.5537, -0.8276
Roots: 0.7769 ± 0.4733i

$$H(0) = \frac{K}{1 - (1.5537 - 0.8276)}$$

$$H(w_s/2) = \frac{K}{1 - (-1.5537 - 0.8276)}$$

$$\frac{H(0)}{H(w_s/2)} = \frac{K/0.274}{K/3.38} = 21.8\ \text{dB}$$
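The 21.8 dB figure can be reproduced with a few lines. This sketch assumes H(z) = K / (1 – A’(z)) with the two coefficients quoted on the slide; the function name `H_mag` and the choice K = 1 are illustrative (K cancels in the ratio).

```python
import cmath
import math

a = [1.5537, -0.8276]   # a1, a2 as quoted on the slide

def H_mag(w, a, K=1.0):
    """|H| at normalised angular frequency w (radians/sample), H(z) = K / (1 - A'(z))."""
    z = cmath.exp(1j * w)
    denom = 1 - sum(ai * z ** -(i + 1) for i, ai in enumerate(a))
    return abs(K / denom)

# w = 0 is DC; w = pi corresponds to ws/2.
ratio_db = 20 * math.log10(H_mag(0.0, a) / H_mag(math.pi, a))
print(round(ratio_db, 1))   # 21.8
```

Sweeping `w` from 0 to pi with the same function would trace the 2nd-order LPC spectrum itself.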

GSM

• Groupe Special Mobile - EU
• First digital cellular system in world
• See Hodge 1990
• Based on TDMA & FDMA at 900 MHz, and RPE-LPC (ie it is an ‘LPAS’ system)
• Now at 1800 MHz
• Carriers at 200 kHz
• Supporting 8 TDMA time slots each
• Time slots: 577 µs - 156.25 bit slots
• 8 time slots form 1 GSM frame of 4.62 ms
• Modulation: Gaussian minimum shift keying
• 26 bit training in every time slot
• Round-trip delay ~ 80 ms
• EU: GSM; US: D-AMPS
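The timing figures are easy to cross-check. This sketch takes 156.25 bit periods per 577 µs slot (the standard GSM value, assumed here) and 8 slots per frame; the variable names are illustrative.

```python
slot_us = 577.0          # time slot duration in microseconds
slots_per_frame = 8      # TDMA time slots per carrier
bits_per_slot = 156.25   # bit periods per slot (assumed standard value)

# Frame duration in ms: 8 slots of 577 us each.
frame_ms = slots_per_frame * slot_us / 1000.0

# Gross bit rate per carrier in kbit/s: bits per microsecond times 1000.
gross_rate_kbps = bits_per_slot / slot_us * 1000.0

print(frame_ms, round(gross_rate_kbps, 1))
```

The frame length comes out at 4.616 ms, matching the rounded 4.62 ms above, and the gross rate per carrier is roughly 271 kbit/s shared between the 8 slots.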

EG-348_371_09

75

Other Related Topics

Spectral Lifting: $H(z) = 1 - a z^{-1}$

Codebook Training

Spectral Differences between 2 frames

Cepstra

Modeling Speech Space - HMM’s

Pre-Emphasis Example

[Figure Q1: two waveform panels, (a) and (b), each over 30 ms, with amplitude axes -8000 to 8000 and -1 to 1]

Pre-Emphasis Example

[Diagram: z-plane (jy axis) with a zero at a on the real axis; the distance from the zero to the point at ws/2 is labelled 1+a = 2]

G(ws/2) = 1 + a
G(0) = 1 - a

For G(ws/2) > G(0), a must be > 0
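A minimal sketch of the pre-emphasis filter G(z) = 1 - a z^-1 and its gain at the two band edges. The value a = 0.95 is only a commonly used choice and is an assumption here, as are the function names.

```python
def pre_emphasise(s, a=0.95):
    """y[n] = s[n] - a * s[n-1], ie G(z) = 1 - a z^-1 (first sample passed through)."""
    return [s[n] - (a * s[n - 1] if n > 0 else 0.0) for n in range(len(s))]

def gain(a, at_nyquist):
    """|G| at frequency 0 (z^-1 = +1) or at ws/2 (z^-1 = -1)."""
    z_inv = -1.0 if at_nyquist else 1.0
    return abs(1 - a * z_inv)

g_low = gain(0.95, at_nyquist=False)     # 1 - a
g_high = gain(0.95, at_nyquist=True)     # 1 + a
flat = pre_emphasise([1.0, 1.0, 1.0, 1.0])        # constant input is attenuated
alternating = pre_emphasise([1.0, -1.0, 1.0, -1.0])  # ws/2 input is boosted
print(g_low, g_high, flat, alternating)
```

With a > 0 the high-frequency end is lifted relative to the low-frequency end, as the inequality above requires.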

Z-plane to Magnitude Spectrum

[Figure: z-plane plot (Real Part against Imaginary Part, -1 to 1) with the distance 1+a = 2 to ws/2 marked, and the corresponding magnitude spectrum (dB, -30 to 50) against frequency (0 to 5)]



ST & LT Prediction

[Diagram: Speech sn -> STP (1 – A’(z), a chain of Z^-1 delays with coefficients a1, ..., ai, ..., ap) -> Residual en -> LTP (1 – A’(z), a chain of Z^-1 delays with coefficients a1, ..., ai, ..., ap) -> e`n]
