vector quantization lecture6new.pdf



    Vector Quantization and Clustering

    Introduction

    K-means clustering

    Clustering issues

    Hierarchical clustering

    Divisive (top-down) clustering

    Agglomerative (bottom-up) clustering

    Applications to speech recognition

    6.345 Automatic Speech Recognition, Lecture #6, Session 2003: Vector Quantization & Clustering


    Acoustic Modelling

    Waveform → Signal Representation → Feature Vectors → Vector Quantization → Symbols

    Signal representation produces feature vector sequence

    Multi-dimensional sequence can be processed by:

    Methods that directly model continuous space

    Quantizing and modelling of discrete symbols

    Main advantages and disadvantages of quantization:

    Reduced storage and computation costs

    Potential loss of information due to quantization



    Vector Quantization (VQ)

    Used in signal compression, speech and image coding

    More efficient information transmission than scalar quantization (can achieve less than 1 bit/parameter)

    Used for discrete acoustic modelling since early 1980s

    Based on standard clustering algorithms:

    Individual cluster centroids are called codewords

    Set of cluster centroids is called a codebook

    Basic VQ is K-means clustering

    Binary VQ is a form of top-down clustering (used for efficient quantization)
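As a minimal sketch (names and data are illustrative, not from the lecture), quantizing a feature vector is a nearest-codeword lookup in the codebook, and decoding maps the symbol back to its centroid:

```python
import numpy as np

def vq_encode(codebook, x):
    """Index of the nearest codeword under squared Euclidean distance."""
    return int(np.argmin(np.sum((codebook - x) ** 2, axis=1)))

def vq_decode(codebook, index):
    """Map a discrete symbol back to its codeword (the cluster centroid)."""
    return codebook[index]

# Toy 3-codeword codebook in 2 dimensions
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
symbol = vq_encode(codebook, np.array([0.9, 1.2]))  # nearest codeword is [1, 1]
```

The information loss the slide mentions is exactly the gap between `x` and `vq_decode(codebook, symbol)`.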



    VQ & Clustering

    Clustering is an example of unsupervised learning

    Number and form of classes {C_i} unknown

    Available data samples {x_i} are unlabeled

    Useful for discovery of data structure before classification, or for tuning or adaptation of classifiers

    Results strongly depend on the clustering algorithm



    Acoustic Modelling Example

    (Figure: segmentation example with phone labels [I], [n], [t].)



    K-Means Clustering

    Used to group data into K clusters, {C_1, . . . , C_K}

    Each cluster is represented by mean of assigned data

    Iterative algorithm converges to a local optimum:

    Select K initial cluster means, {μ_1, . . . , μ_K}

    Iterate until a stopping criterion is satisfied:

    1. Assign each data sample to the closest cluster

    x ∈ C_i  if  d(x, μ_i) ≤ d(x, μ_j),  ∀ j ≠ i

    2. Update K means from assigned samples

    μ_i = E[x | x ∈ C_i],  1 ≤ i ≤ K

    Nearest neighbor quantizer used for unseen data
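The two-step iteration above can be sketched in a few lines of NumPy (a simplified illustration, not the lecture's code; ties and empty clusters are handled crudely):

```python
import numpy as np

def kmeans(data, k, n_iter=100, seed=0):
    """Plain K-means: assign each sample to the nearest mean, then
    re-estimate each mean from its assigned samples."""
    rng = np.random.default_rng(seed)
    # Select K initial cluster means at random from the data samples
    means = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    labels = np.full(len(data), -1)
    for _ in range(n_iter):
        # Step 1: assign each data sample to the closest cluster
        dists = np.linalg.norm(data[:, None, :] - means[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # no assignment changed: local optimum reached
        labels = new_labels
        # Step 2: update the K means from the assigned samples
        for i in range(k):
            if np.any(labels == i):
                means[i] = data[labels == i].mean(axis=0)
    return means, labels
```

The same `argmin` over distances to the means is the nearest-neighbor quantizer applied to unseen data.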



    K-Means Example: K = 3

    Random selection of 3 data samples for initial means

    Euclidean distance metric between means and samples

    (Figure: six snapshots, iterations 0 to 5, of the K-means procedure on the sample data.)


    K-Means Properties

    Usually used with a Euclidean distance metric

    d(x, μ_i) = ||x - μ_i||^2 = (x - μ_i)^T (x - μ_i)

    The total distortion, D, is the sum of squared errors:

    D = Σ_{i=1}^{K} Σ_{x ∈ C_i} ||x - μ_i||^2

    D decreases between the n-th and (n+1)-st iterations:

    D(n + 1) ≤ D(n)

    Also known as Isodata, or generalized Lloyd algorithm

    Similarities with Expectation-Maximization (EM) algorithm for learning parameters from unlabeled data



    K-Means Clustering: Stopping Criterion

    Many criteria can be used to terminate K-means:

    No changes in sample assignments

    Maximum number of iterations exceeded

    Change in total distortion, D , falls below a threshold

    1 - D(n + 1)/D(n) < T
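The relative-distortion criterion can be written directly (a hypothetical helper; the threshold T is a tunable parameter, and the name is illustrative):

```python
def should_stop(d_prev, d_curr, threshold=1e-4):
    """Stop when the relative drop in distortion falls below a threshold:
    1 - D(n+1)/D(n) < T.  Assumes D(n) > 0 and D(n+1) <= D(n)."""
    return 1.0 - d_curr / d_prev < threshold
```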



    Clustering Issues: Number of Clusters

    In general, the number of clusters is unknown

    (Figure: the same data clustered with K = 2 and with K = 4.)

    Dependent on clustering criterion, space, computation or distortion requirements, or on recognition metric



    Clustering Issues: Clustering Criterion

    The criterion used to partition data into clusters plays a strong role in determining the final results



    Clustering Issues: Distance Metrics

    A distance metric usually has the properties:

    1. d(x, y) ≥ 0

    2. d(x, y) = 0 iff x = y

    3. d(x, y) = d(y, x)

    4. d(x, y) ≤ d(x, z) + d(y, z)

    5. d(x + z, y + z) = d(x, y)  (translation invariant)

    In practice, distance metrics may not obey some of these properties, but still serve as measures of dissimilarity



    Clustering Issues: Distance Metrics

    Distance metrics strongly influence cluster shapes:

    Normalized dot-product: x^T y / (||x|| ||y||)

    Euclidean: ||x - μ_i||^2 = (x - μ_i)^T (x - μ_i)

    Weighted Euclidean: (x - μ_i)^T W (x - μ_i)  (e.g., W = Σ^{-1}, an inverse covariance)

    Minimum distance (chain): min d(x, x_i),  x_i ∈ C_i

    Representation specific . . .
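Hedged NumPy sketches of these four dissimilarity measures (function names are illustrative; the choice of W is an assumption noted in the comment):

```python
import numpy as np

def normalized_dot(x, y):
    """Cosine similarity x^T y / (||x|| ||y||); larger means more similar."""
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

def euclidean_sq(x, mu):
    """Squared Euclidean distance (x - mu)^T (x - mu)."""
    return float((x - mu) @ (x - mu))

def weighted_euclidean(x, mu, W):
    """(x - mu)^T W (x - mu); if W is an inverse covariance matrix this
    is the squared Mahalanobis distance (one common choice of W)."""
    d = x - mu
    return float(d @ W @ d)

def chain_distance(x, cluster):
    """Minimum distance from x to any member x_i of the cluster."""
    return min(euclidean_sq(x, xi) for xi in cluster)
```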



    Clustering Issues: Impact of Scaling

    Scaling feature vector dimensions can significantly impact clustering results

    Scaling can be used to normalize dimensions so a simple distance metric is a reasonable criterion for similarity
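One common normalization, sketched below, rescales each dimension to zero mean and unit variance (z-scoring) so an unweighted Euclidean metric treats all dimensions comparably; this is an illustrative choice, not the only one:

```python
import numpy as np

def zscore(data):
    """Scale each dimension of (N, d) data to zero mean, unit variance."""
    mu = data.mean(axis=0)
    sigma = data.std(axis=0)
    sigma[sigma == 0] = 1.0  # leave constant dimensions unscaled
    return (data - mu) / sigma
```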



    Hierarchical Clustering

    Clusters data into a hierarchical class structure

    Top-down (divisive) or bottom-up (agglomerative)

    Often based on stepwise-optimal, or greedy, formulation

    Hierarchical structure useful for hypothesizing classes

    Used to seed clustering algorithms such as K-means



    Example of Non-Uniform Divisive Clustering

    (Figure: three levels, 0 to 2, of a non-uniform divisive clustering.)


    Example of Uniform Divisive Clustering

    (Figure: three levels, 0 to 2, of a uniform divisive clustering.)


    Divisive Clustering Issues

    Initialization of new clusters

    Random selection from cluster samples

    Selection of member samples far from center

    Perturb dimension of maximum variance

    Perturb all dimensions slightly

    Uniform or non-uniform tree structures

    Cluster pruning (due to poor expansion)

    Cluster assignment (distance metric)

    Stopping criterion

    Rate of distortion decrease

    Cannot increase cluster size


    Divisive Clustering Example: Binary VQ

    Often used to create a codebook of size M = 2^B (a B-bit codebook)

    Uniform binary divisive clustering used

    On each iteration each cluster is divided in two

    μ_i^+ = μ_i (1 + ε)

    μ_i^- = μ_i (1 - ε)

    K-means used to determine cluster centroids

    Also known as the LBG (Linde, Buzo, Gray) algorithm

    A more efficient version does K-means only within each binary split, and retains the tree for efficient lookup
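A compact sketch of the binary-split procedure under the split rule above (ε and the number of Lloyd iterations are illustrative parameters, not values from the lecture):

```python
import numpy as np

def lbg(data, n_bits, eps=0.01, n_iter=20):
    """Grow a 2^B-codeword codebook by repeated binary splitting.
    Each codeword mu splits into mu*(1+eps) and mu*(1-eps), then the
    doubled codebook is refined with Lloyd (K-means) iterations."""
    codebook = data.mean(axis=0, keepdims=True)  # start with one codeword
    for _ in range(n_bits):
        # Split every codeword in two
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        # Refine with K-means updates
        for _ in range(n_iter):
            d = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            for i in range(len(codebook)):
                if np.any(labels == i):
                    codebook[i] = data[labels == i].mean(axis=0)
    return codebook
```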


    Agglomerative Clustering

    Structures N samples or seed clusters into a hierarchy

    On each iteration, the two most similar clusters are merged together to form a new cluster

    After N - 1 iterations, the hierarchy is complete

    Structure displayed in the form of a dendrogram

    By keeping track of the similarity score when new clusters are created, the dendrogram can often yield insights into the natural grouping of the data



    Agglomerative Clustering Issues

    Measuring distances between clusters C_i and C_j with respective numbers of tokens n_i and n_j:

    Average distance: (1 / (n_i n_j)) Σ_{i,j} d(x_i, x_j)

    Maximum distance (compact): max_{i,j} d(x_i, x_j)

    Minimum distance (chain): min_{i,j} d(x_i, x_j)

    Distance between two representative vectors of each cluster, such as their means: d(μ_i, μ_j)
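The four cluster-to-cluster distances can be sketched as follows (illustrative names; each cluster is an (n, d) array of member vectors):

```python
import numpy as np

def pairwise(ci, cj):
    """All Euclidean distances between members of two clusters."""
    return np.linalg.norm(ci[:, None, :] - cj[None, :, :], axis=2)

def average_distance(ci, cj):
    return float(pairwise(ci, cj).mean())  # (1/(n_i n_j)) * sum over pairs

def maximum_distance(ci, cj):
    return float(pairwise(ci, cj).max())   # "compact" linkage

def minimum_distance(ci, cj):
    return float(pairwise(ci, cj).min())   # "chain" linkage

def centroid_distance(ci, cj):
    """Distance between the cluster means, d(mu_i, mu_j)."""
    return float(np.linalg.norm(ci.mean(axis=0) - cj.mean(axis=0)))
```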



    Stepwise-Optimal Clustering

    Common to minimize the increase in total distortion on each merging iteration: stepwise-optimal or greedy

    On each iteration, merge the two clusters which produce the smallest increase in distortion

    Distance metric for minimizing distortion, D, is:

    d(C_i, C_j) = (n_i n_j / (n_i + n_j)) ||μ_i - μ_j||^2

    Tends to combine small clusters with large clusters before merging clusters of similar sizes
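The merge cost follows directly from the cluster sizes and means (a sketch; `merge_cost` is an illustrative name):

```python
import numpy as np

def merge_cost(ci, cj):
    """Increase in total distortion from merging clusters C_i and C_j:
    n_i * n_j / (n_i + n_j) * ||mu_i - mu_j||^2."""
    ni, nj = len(ci), len(cj)
    mu_i, mu_j = ci.mean(axis=0), cj.mean(axis=0)
    return ni * nj / (ni + nj) * float(np.sum((mu_i - mu_j) ** 2))
```

For two singleton clusters the cost is half the squared distance between them, which is exactly the distortion the merged two-point cluster acquires.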



    Clustering for Segmentation



    Speaker Clustering

    23 female and 53 male speakers from TIMIT corpus

    Vector based on F1 and F2 averages for 9 vowels

    Distance d(C_i, C_j) is the average of distances between members

    (Figure: speaker-clustering dendrogram; merge-distance axis from 2 to 6.)



    Velar Stop Allophones



    Acoustic-Phonetic Hierarchy

    Clustering of phonetic distributions across 12 clusters



    Word Clustering

    (Figure: dendrogram clustering roughly 100 air-travel-domain words, from A_M and AFTERNOON through WASHINGTON and ZERO; merge-distance axis from 0 to 6.)



    References

    Huang, Acero, and Hon, Spoken Language Processing, Prentice-Hall, 2001.

    Duda, Hart, and Stork, Pattern Classification, John Wiley & Sons, 2001.

    A. Gersho and R. Gray, Vector Quantization and Signal Compression, Kluwer Academic Press, 1992.

    R. Gray, "Vector Quantization," IEEE ASSP Magazine, 1(2), 1984.

    B. Juang, D. Wang, A. Gray, "Distortion Performance of Vector Quantization for LPC Voice Coding," IEEE Trans. ASSP, 30(2), 1982.

    J. Makhoul, S. Roucos, H. Gish, "Vector Quantization in Speech Coding," Proc. IEEE, 73(11), 1985.

    L. Rabiner and B. Juang, Fundamentals of Speech Recognition, Prentice-Hall, 1993.
