vector quantization lecture6new.pdf



    Vector Quantization and Clustering

    Introduction

    K-means clustering

    Clustering issues

    Hierarchical clustering

    Divisive (top-down) clustering

    Agglomerative (bottom-up) clustering

    Applications to speech recognition

    6.345 Automatic Speech Recognition, Lecture #6, Session 2003: Vector Quantization & Clustering


    Acoustic Modelling

    Waveform → Signal Representation → Feature Vectors → Vector Quantization → Symbols

    Signal representation produces feature vector sequence

    Multi-dimensional sequence can be processed by:

    Methods that directly model continuous space

    Quantizing and modelling of discrete symbols

    Main advantages and disadvantages of quantization:

    Reduced storage and computation costs

    Potential loss of information due to quantization



    Vector Quantization (VQ)

    Used in signal compression, speech and image coding

    More efficient information transmission than scalar quantization (can achieve less than 1 bit/parameter)

    Used for discrete acoustic modelling since early 1980s

    Based on standard clustering algorithms:

    Individual cluster centroids are called codewords

    Set of cluster centroids is called a codebook

    Basic VQ is K-means clustering

    Binary VQ is a form of top-down clustering (used for efficient quantization)
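As a minimal sketch (names and data are illustrative, not from the lecture), quantizing a feature vector is a nearest-codeword lookup in the codebook, and decoding maps the symbol back to its centroid:

```python
import numpy as np

def vq_encode(codebook, x):
    """Index of the nearest codeword under squared Euclidean distance."""
    return int(np.argmin(np.sum((codebook - x) ** 2, axis=1)))

def vq_decode(codebook, index):
    """Map a discrete symbol back to its codeword (the cluster centroid)."""
    return codebook[index]

# Toy 3-codeword codebook in 2 dimensions
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
symbol = vq_encode(codebook, np.array([0.9, 1.2]))  # nearest codeword is [1, 1]
```

The information loss the slide mentions is exactly the gap between `x` and `vq_decode(codebook, symbol)`.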



    VQ & Clustering

    Clustering is an example of unsupervised learning

    Number and form of classes {C_i} unknown

    Available data samples {x_i} are unlabeled

    Useful for discovery of data structure before classification, or for tuning or adaptation of classifiers

    Results strongly depend on the clustering algorithm



    Acoustic Modelling Example

    (Figure: segmentation example with phone labels [I], [n], [t].)



    K-Means Clustering

    Used to group data into K clusters, {C_1, . . . , C_K}

    Each cluster is represented by mean of assigned data

    Iterative algorithm converges to a local optimum:

    Select K initial cluster means, {μ_1, . . . , μ_K}

    Iterate until a stopping criterion is satisfied:

    1. Assign each data sample to the closest cluster

    x ∈ C_i  if  d(x, μ_i) ≤ d(x, μ_j),  ∀ j ≠ i

    2. Update K means from assigned samples

    μ_i = E[x | x ∈ C_i],  1 ≤ i ≤ K

    Nearest neighbor quantizer used for unseen data
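The two-step iteration above can be sketched in a few lines of NumPy (a simplified illustration, not the lecture's code; ties and empty clusters are handled crudely):

```python
import numpy as np

def kmeans(data, k, n_iter=100, seed=0):
    """Plain K-means: assign each sample to the nearest mean, then
    re-estimate each mean from its assigned samples."""
    rng = np.random.default_rng(seed)
    # Select K initial cluster means at random from the data samples
    means = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    labels = np.full(len(data), -1)
    for _ in range(n_iter):
        # Step 1: assign each data sample to the closest cluster
        dists = np.linalg.norm(data[:, None, :] - means[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # no assignment changed: local optimum reached
        labels = new_labels
        # Step 2: update the K means from the assigned samples
        for i in range(k):
            if np.any(labels == i):
                means[i] = data[labels == i].mean(axis=0)
    return means, labels
```

The same `argmin` over distances to the means is the nearest-neighbor quantizer applied to unseen data.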



    K-Means Example: K = 3

    Random selection of 3 data samples for initial means

    Euclidean distance metric between means and samples

    (Figure: six snapshots, iterations 0 to 5, of the K-means procedure on the sample data.)


    K-Means Properties

    Usually used with a Euclidean distance metric

    d(x, μ_i) = ||x - μ_i||^2 = (x - μ_i)^T (x - μ_i)

    The total distortion, D, is the sum of squared errors:

    D = Σ_{i=1}^{K} Σ_{x ∈ C_i} ||x - μ_i||^2

    D decreases between the n-th and (n+1)-st iterations:

    D(n + 1) ≤ D(n)

    Also known as Isodata, or generalized Lloyd algorithm

    Similarities with Expectation-Maximization (EM) algorithm for learning parameters from unlabeled data



    K-Means Clustering: Stopping Criterion

    Many criteria can be used to terminate K-means:

    No changes in sample assignments

    Maximum number of iterations exceeded

    Change in total distortion, D , falls below a threshold

    1 - D(n + 1)/D(n) < T
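The relative-distortion criterion can be written directly (a hypothetical helper; the threshold T is a tunable parameter, and the name is illustrative):

```python
def should_stop(d_prev, d_curr, threshold=1e-4):
    """Stop when the relative drop in distortion falls below a threshold:
    1 - D(n+1)/D(n) < T.  Assumes D(n) > 0 and D(n+1) <= D(n)."""
    return 1.0 - d_curr / d_prev < threshold
```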



    Clustering Issues: Number of Clusters

    In general, the number of clusters is unknown

    (Figure: the same data clustered with K = 2 and with K = 4.)

    Dependent on clustering criterion, space, computation or distortion requirements, or on recognition metric



    Clustering Issues: Clustering Criterion

    The criterion used to partition data into clusters plays a strong role in determining the final results



    Clustering Issues: Distance Metrics

    A distance metric usually has the properties:

    1. d(x, y) ≥ 0

    2. d(x, y) = 0 iff x = y

    3. d(x, y) = d(y, x)

    4. d(x, y) ≤ d(x, z) + d(y, z)

    5. d(x + z, y + z) = d(x, y)  (translation invariant)

    In practice, distance metrics may not obey some of these properties, but still serve as measures of dissimilarity



    Clustering Issues: Distance Metrics

    Distance metrics strongly influence cluster shapes:

    Normalized dot-product: x^T y / (||x|| ||y||)

    Euclidean: ||x - μ_i||^2 = (x - μ_i)^T (x - μ_i)

    Weighted Euclidean: (x - μ_i)^T W (x - μ_i)  (e.g., W = Σ^{-1}, an inverse covariance)

    Minimum distance (chain): min d(x, x_i),  x_i ∈ C_i

    Representation specific . . .
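Hedged NumPy sketches of these four dissimilarity measures (function names are illustrative; the choice of W is an assumption noted in the comment):

```python
import numpy as np

def normalized_dot(x, y):
    """Cosine similarity x^T y / (||x|| ||y||); larger means more similar."""
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

def euclidean_sq(x, mu):
    """Squared Euclidean distance (x - mu)^T (x - mu)."""
    return float((x - mu) @ (x - mu))

def weighted_euclidean(x, mu, W):
    """(x - mu)^T W (x - mu); if W is an inverse covariance matrix this
    is the squared Mahalanobis distance (one common choice of W)."""
    d = x - mu
    return float(d @ W @ d)

def chain_distance(x, cluster):
    """Minimum distance from x to any member x_i of the cluster."""
    return min(euclidean_sq(x, xi) for xi in cluster)
```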



    Clustering Issues: Impact of Scaling

    Scaling feature vector dimensions can significantly impact clustering results

    Scaling can be used to normalize dimensions so a simple distance metric is a reasonable criterion for similarity
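One common normalization, sketched below, rescales each dimension to zero mean and unit variance (z-scoring) so an unweighted Euclidean metric treats all dimensions comparably; this is an illustrative choice, not the only one:

```python
import numpy as np

def zscore(data):
    """Scale each dimension of (N, d) data to zero mean, unit variance."""
    mu = data.mean(axis=0)
    sigma = data.std(axis=0)
    sigma[sigma == 0] = 1.0  # leave constant dimensions unscaled
    return (data - mu) / sigma
```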



    Hierarchical Clustering

    Clusters data into a hierarchical class structure

    Top-down (divisive) or bottom-up (agglomerative)

    Often based on stepwise-optimal, or greedy, formulation

    Hierarchical structure useful for hypothesizing classes

    Used to seed clustering algorithms such as K-means



    Example of Non-Uniform Divisive Clustering

    (Figure: three levels, 0 to 2, of a non-uniform divisive clustering.)


    Example of Uniform Divisive Clustering

    (Figure: three levels, 0 to 2, of a uniform divisive clustering.)


    Divisive Clustering Issues

    Initialization of new clusters

    Random selection from cluster samples

    Selection of member samples far from center

    Perturb dimension of maximum variance

    Perturb all dimensions slightly

    Uniform or non-uniform tree structures

    Cluster pruning (due to poor expansion)

    Cluster assignment (distance metric)

    Stopping criterion

    Rate of distortion decrease

    Cannot increase cluster size


    Divisive Clustering Example: Binary VQ

    Often used to create a codebook of size M = 2^B (a B-bit codebook)

    Uniform binary divisive clustering used

    On each iteration each cluster is divided in two

    μ_i^+ = μ_i (1 + ε)

    μ_i^- = μ_i (1 - ε)

    K-means used to determine cluster centroids

    Also known as the LBG (Linde, Buzo, Gray) algorithm

    A more efficient version does K-means only within each binary split, and retains the tree for efficient lookup
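A compact sketch of the binary-split procedure under the split rule above (ε and the number of Lloyd iterations are illustrative parameters, not values from the lecture):

```python
import numpy as np

def lbg(data, n_bits, eps=0.01, n_iter=20):
    """Grow a 2^B-codeword codebook by repeated binary splitting.
    Each codeword mu splits into mu*(1+eps) and mu*(1-eps), then the
    doubled codebook is refined with Lloyd (K-means) iterations."""
    codebook = data.mean(axis=0, keepdims=True)  # start with one codeword
    for _ in range(n_bits):
        # Split every codeword in two
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        # Refine with K-means updates
        for _ in range(n_iter):
            d = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            for i in range(len(codebook)):
                if np.any(labels == i):
                    codebook[i] = data[labels == i].mean(axis=0)
    return codebook
```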


    Agglomerative Clustering

    Structures N samples or seed clusters into a hierarchy

    On each iteration, the two most similar clusters are merged together to form a new cluster

    After N - 1 iterations, the hierarchy is complete

    Structure displayed in the form of a dendrogram

    By keeping track of the similarity score when new clusters are created, the dendrogram can often yield insights into the natural grouping of the data



    Agglomerative Clustering Issues

    Measuring distances between clusters C_i and C_j with respective numbers of tokens n_i and n_j:

    Average distance: (1 / (n_i n_j)) Σ_{i,j} d(x_i, x_j)

    Maximum distance (compact): max_{i,j} d(x_i, x_j)

    Minimum distance (chain): min_{i,j} d(x_i, x_j)

    Distance between two representative vectors of each cluster, such as their means: d(μ_i, μ_j)
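The four cluster-to-cluster distances can be sketched as follows (illustrative names; each cluster is an (n, d) array of member vectors):

```python
import numpy as np

def pairwise(ci, cj):
    """All Euclidean distances between members of two clusters."""
    return np.linalg.norm(ci[:, None, :] - cj[None, :, :], axis=2)

def average_distance(ci, cj):
    return float(pairwise(ci, cj).mean())  # (1/(n_i n_j)) * sum over pairs

def maximum_distance(ci, cj):
    return float(pairwise(ci, cj).max())   # "compact" linkage

def minimum_distance(ci, cj):
    return float(pairwise(ci, cj).min())   # "chain" linkage

def centroid_distance(ci, cj):
    """Distance between the cluster means, d(mu_i, mu_j)."""
    return float(np.linalg.norm(ci.mean(axis=0) - cj.mean(axis=0)))
```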



    Stepwise-Optimal Clustering

    Common to minimize the increase in total distortion on each merging iteration: stepwise-optimal or greedy

    On each iteration, merge the two clusters which produce the smallest increase in distortion

    Distance metric for minimizing distortion, D, is:

    d(C_i, C_j) = (n_i n_j / (n_i + n_j)) ||μ_i - μ_j||^2

    Tends to combine small clusters with large clusters before merging clusters of similar sizes
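The merge cost follows directly from the cluster sizes and means (a sketch; `merge_cost` is an illustrative name):

```python
import numpy as np

def merge_cost(ci, cj):
    """Increase in total distortion from merging clusters C_i and C_j:
    n_i * n_j / (n_i + n_j) * ||mu_i - mu_j||^2."""
    ni, nj = len(ci), len(cj)
    mu_i, mu_j = ci.mean(axis=0), cj.mean(axis=0)
    return ni * nj / (ni + nj) * float(np.sum((mu_i - mu_j) ** 2))
```

For two singleton clusters the cost is half the squared distance between them, which is exactly the distortion the merged two-point cluster acquires.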



    Clustering for Segmentation



    Speaker Clustering

    23 female and 53 male speakers from TIMIT corpus

    Vector based on F1 and F2 averages for 9 vowels

    Distance d(C_i, C_j) is the average of distances between members

    (Figure: speaker-clustering dendrogram; merge-distance axis from 2 to 6.)



    Velar Stop Allophones



    Acoustic-Phonetic Hierarchy

    Clustering of phonetic distributions across 12 clusters



    Word Clustering

    (Figure: dendrogram clustering roughly 100 air-travel-domain words, from A_M and AFTERNOON through WASHINGTON and ZERO; merge-distance axis from 0 to 6.)



    References

    Huang, Acero, and Hon, Spoken Language Processing, Prentice-Hall, 2001.

    Duda, Hart, and Stork, Pattern Classification, John Wiley & Sons, 2001.

    A. Gersho and R. Gray, Vector Quantization and Signal Compression, Kluwer Academic Press, 1992.

    R. Gray, "Vector Quantization," IEEE ASSP Magazine, 1(2), 1984.

    B. Juang, D. Wang, A. Gray, "Distortion Performance of Vector Quantization for LPC Voice Coding," IEEE Trans. ASSP, 30(2), 1982.

    J. Makhoul, S. Roucos, H. Gish, "Vector Quantization in Speech Coding," Proc. IEEE, 73(11), 1985.

    L. Rabiner and B. Juang, Fundamentals of Speech Recognition, Prentice-Hall, 1993.
