Landmark-Based Speech Recognition: Spectrogram Reading, Support Vector Machines, Dynamic Bayesian Networks, and Phonology
Mark Hasegawa-Johnson, [email protected], University of Illinois at Urbana-Champaign, USA


TRANSCRIPT

Page 1:

Landmark-Based Speech Recognition:

Spectrogram Reading, Support Vector Machines,

Dynamic Bayesian Networks, and Phonology

Mark Hasegawa-Johnson, [email protected]

University of Illinois at Urbana-Champaign, USA

Page 2:

Lecture 4: Hyperplanes, Perceptrons, and Kernel-Based Classifiers

• Definition: Hyperplane Classifier
• Minimum Classification Error Training Methods
  – Empirical risk
  – Differentiable estimates of the 0-1 loss function
  – Error backpropagation
• Kernel Methods
  – Nonparametric expression of a hyperplane
  – Mathematical properties of a dot product
  – Kernel-based classifier
  – The implied high-dimensional space
  – Error backpropagation for a kernel-based classifier
• Useful kernels
  – Polynomial kernel
  – RBF kernel

Page 3:

Classifier Terminology

Page 4:

Hyperplane Classifier

[Figure: a two-class scatter of training samples separated by a hyperplane. The class boundary (“separatrix”) is the plane wᵀx = b, drawn with its normal vector w; the plane lies at distance b from the origin (x = 0).]
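As an illustration (not part of the original slides), the hyperplane classifier summarized in the figure can be written in a few lines of Python; the variable names, the choice of w and b, and the y ∈ {−1, +1} label convention are assumptions made for this sketch.

```python
import numpy as np

def hyperplane_classify(x, w, b):
    """Hyperplane classifier: return +1 if w'x > b, else -1.

    The separatrix (class boundary) is the plane w'x = b, whose
    normal vector is w.  Names and the {-1, +1} label convention
    are illustrative, not taken from the slides.
    """
    return 1 if np.dot(w, x) > b else -1

# Example: the boundary x1 + x2 = 1 in two dimensions.
w = np.array([1.0, 1.0])
b = 1.0
print(hyperplane_classify(np.array([2.0, 2.0]), w, b))  # +1 (above the plane)
print(hyperplane_classify(np.array([0.0, 0.0]), w, b))  # -1 (below the plane)
```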

Page 5:

Loss, Risk, and Empirical Risk
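The equations on this slide were lost in extraction. Using the definitions restated on the summary slide (loss = the cost of a mistake, risk = the expected loss, empirical risk = the average loss on training data), the standard forms are as follows; the use of M for the number of training tokens is a notational assumption here.

```latex
% Loss: the cost of hypothesis h making a mistake on a labeled token (x, y)
\ell\bigl(h(x),\,y\bigr)

% Risk: the expected loss over the joint distribution of (x, y)
R(h) = E_{x,y}\!\left[\,\ell\bigl(h(x),y\bigr)\,\right]

% Empirical risk: the average loss over M training tokens (x_m, y_m)
\hat{R}(h) = \frac{1}{M}\sum_{m=1}^{M} \ell\bigl(h(x_m),\,y_m\bigr)
```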

Page 6:

Empirical Risk with 0-1 Loss Function = Error Rate on Training Data
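A short sketch of the claim, with hypothetical arrays `y_true` and `y_pred`: under the 0-1 loss (cost 1 for every mistake, 0 otherwise), the empirical risk is exactly the error rate on the training data.

```python
import numpy as np

def zero_one_loss(y_hat, y):
    """0-1 loss: costs 1 for a misclassification, 0 for a correct label."""
    return 0.0 if y_hat == y else 1.0

def empirical_risk(y_pred, y_true):
    """Empirical risk = average loss over the training data."""
    return np.mean([zero_one_loss(yh, y) for yh, y in zip(y_pred, y_true)])

y_true = np.array([+1, -1, +1, +1])
y_pred = np.array([+1, +1, +1, -1])    # two mistakes out of four
print(empirical_risk(y_pred, y_true))  # 0.5 == training error rate
```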

Page 7:

Differentiable Approximations of the 0-1 Loss Function: Hinge Loss

Page 8:

Differentiable Approximations of the 0-1 Loss Function: Hinge Loss
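The plots on these two slides are not recoverable. For reference, the standard hinge loss, which upper-bounds the 0-1 loss and is differentiable except at the hinge point, is shown below; the label convention y ∈ {−1, +1} and the discriminant f(x) = wᵀx − b are notational assumptions.

```latex
\ell_{\mathrm{hinge}}\bigl(f(x),\,y\bigr) = \max\bigl(0,\;1 - y\,f(x)\bigr),
\qquad y \in \{-1,+1\},\quad f(x) = w^{\mathsf T}x - b
```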

Page 9:

Differentiable Empirical Risks

Page 10:

Error Backpropagation: Hyperplane Classifier with Sigmoidal Loss

Page 11:

Sigmoidal Classifier = Hyperplane Classifier with Fuzzy Boundaries

[Figure: the same two-class scatter of samples, now with a soft boundary: the hypothesis grades from “more red” to “less red” on one side and from “less blue” to “more blue” on the other as x crosses the fuzzy boundary region.]

Page 12:

Error Backpropagation: Sigmoidal Classifier with Absolute Loss
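The derivation on this slide did not survive extraction. A hedged sketch of the idea: treat the sigmoid output h(x) = σ(wᵀx − b) as a soft class label in [0, 1], take the absolute loss |y − h(x)| against a target y ∈ {0, 1}, and update the weights by gradient descent. Variable names, the target convention, and the learning rate are illustrative.

```python
import numpy as np

def sigmoid(g):
    return 1.0 / (1.0 + np.exp(-g))

def backprop_step(w, b, x, y, eta=0.1):
    """One gradient-descent step for a sigmoidal classifier with absolute loss.

    h(x) = sigmoid(w'x - b), loss = |y - h(x)| with target y in {0, 1}.
    By the chain rule (error backpropagation for this one-unit network):
        dL/dw = sign(h - y) * h * (1 - h) * x
        dL/db = sign(h - y) * h * (1 - h) * (-1)
    """
    h = sigmoid(np.dot(w, x) - b)
    delta = np.sign(h - y) * h * (1.0 - h)   # error signal propagated back
    w = w - eta * delta * x
    b = b - eta * delta * (-1.0)
    return w, b

w, b = np.zeros(2), 0.0
w, b = backprop_step(w, b, x=np.array([1.0, 2.0]), y=1)
```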

Page 13:

Sigmoidal Classifier: Signal Flow Diagram

[Figure: signal flow diagram of the sigmoidal classifier. The inputs x1, x2, x3 are scaled by connection weights w1, w2, w3 and summed to form the sigmoid input g(x); the sigmoid of g(x) is the hypothesis h(x).]

Page 14:

Multilayer Perceptron

[Figure: signal flow diagram of a two-layer perceptron. The input h0(x) ≡ x = (x1, x2, x3) is combined by the first layer’s connection weights and biases b11, b12, b13 to form the sigmoid inputs g1(x); the sigmoid outputs h1(x) are combined by the second layer’s connection weights and bias b21 to form g2(x), whose sigmoid output is the hypothesis h2(x).]

Page 15:

Multilayer Perceptron: Classification Equations
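The equations on this slide were lost. Reconstructing from the labels in the preceding figure (input h0(x) ≡ x, sigmoid inputs g1 and g2, sigmoid outputs h1 and h2, per-layer weights and biases), a standard two-layer form is shown below; the matrix notation W1, W2, b1, b2 is an assumption.

```latex
h_0(x) \equiv x, \qquad
g_1(x) = W_1\,h_0(x) + b_1, \qquad h_1(x) = \sigma\bigl(g_1(x)\bigr),
\qquad
g_2(x) = W_2\,h_1(x) + b_2, \qquad h_2(x) = \sigma\bigl(g_2(x)\bigr),
\qquad \sigma(g) = \frac{1}{1 + e^{-g}}
```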

Page 16:

Error Backpropagation for a Multilayer Perceptron
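A compact sketch (not the slide's own notation) of error backpropagation for the two-layer network above; squared error is used here as a differentiable stand-in for the 0-1 loss, in the spirit of the lecture's "differentiable empirical risks", and all names and sizes are illustrative.

```python
import numpy as np

def sigmoid(g):
    return 1.0 / (1.0 + np.exp(-g))

def mlp_backprop_step(W1, b1, W2, b2, x, y, eta=0.1):
    """One backpropagation step for a two-layer perceptron.

    Forward pass:  g1 = W1 x + b1,  h1 = sigmoid(g1)
                   g2 = W2 h1 + b2, h2 = sigmoid(g2)
    Loss: 0.5 * (h2 - y)^2, a differentiable surrogate for the 0-1 loss.
    """
    # Forward pass
    g1 = W1 @ x + b1
    h1 = sigmoid(g1)
    g2 = W2 @ h1 + b2
    h2 = sigmoid(g2)

    # Backward pass: propagate the error from the output toward the input
    delta2 = (h2 - y) * h2 * (1.0 - h2)          # dLoss/dg2
    delta1 = (W2.T @ delta2) * h1 * (1.0 - h1)   # dLoss/dg1

    # Gradient-descent updates
    W2 -= eta * np.outer(delta2, h1)
    b2 -= eta * delta2
    W1 -= eta * np.outer(delta1, x)
    b1 -= eta * delta1
    return W1, b1, W2, b2

# Usage sketch: 3 inputs, 4 hidden units, 1 output
W1, b1 = np.random.randn(4, 3) * 0.1, np.zeros(4)
W2, b2 = np.random.randn(1, 4) * 0.1, np.zeros(1)
W1, b1, W2, b2 = mlp_backprop_step(W1, b1, W2, b2,
                                   x=np.array([1.0, 0.0, 2.0]),
                                   y=np.array([1.0]))
```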

Page 17:

Classification Power of a One-Layer Perceptron

Page 18:

Classification Power of a Two-Layer Perceptron

Page 19:

Classification Power of a Three-Layer Perceptron

Page 20:

Output of Multilayer Perceptron is an Approximation of Posterior Probability

Page 21:

Kernel-Based Classifiers

Page 22:

Representation of Hyperplane in terms of Arbitrary Vectors
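The algebra on this slide is missing. The step it refers to is the standard one: the normal vector w can be written as a weighted sum of a set of reference vectors x_n (for example, training samples), so the hyperplane discriminant needs only dot products with those vectors. The symbols a_n and N are notational assumptions here.

```latex
w = \sum_{n=1}^{N} a_n\,x_n
\quad\Longrightarrow\quad
w^{\mathsf T}x - b = \sum_{n=1}^{N} a_n\,\bigl(x_n^{\mathsf T}x\bigr) - b
```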

Page 23:

Kernel-based Classifier
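A minimal sketch of a kernel-based classifier built on that representation, replacing each dot product x_nᵀx with a kernel function K(x_n, x); the kernel choice and names are illustrative, not from the slides.

```python
import numpy as np

def linear_kernel(xn, x):
    return np.dot(xn, x)

def kernel_classify(x, centers, a, b, kernel=linear_kernel):
    """Kernel-based classifier: sign( sum_n a_n K(x_n, x) - b ).

    `centers` holds the reference vectors x_n (e.g., training samples or
    support vectors), `a` their weights.  With the linear kernel this is
    exactly the hyperplane classifier with w = sum_n a_n x_n.
    """
    g = sum(an * kernel(xn, x) for an, xn in zip(a, centers)) - b
    return 1 if g > 0 else -1

centers = np.array([[1.0, 0.0], [0.0, 1.0]])
a = np.array([0.5, 0.5])
print(kernel_classify(np.array([2.0, 2.0]), centers, a, b=1.0))  # +1
```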

Page 24:

Error Backpropagation for a Kernel-Based Classifier

Page 25:

The Implied High-Dimensional Space
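The content of this slide is missing; the property it names is the standard one: a valid kernel equals a dot product in some (possibly very high-dimensional) feature space Φ(x), so the kernel-based classifier is a hyperplane classifier in that space. The symbol w_Φ is an assumption used only to make the correspondence explicit.

```latex
K(x,z) = \Phi(x)^{\mathsf T}\Phi(z)
\quad\Longrightarrow\quad
\sum_{n} a_n\,K(x_n,x) - b = w_\Phi^{\mathsf T}\,\Phi(x) - b,
\qquad w_\Phi = \sum_{n} a_n\,\Phi(x_n)
```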

Page 26:

Some Useful Kernels

Page 27:

Polynomial Kernel
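The formula on this slide is missing; a common form of the polynomial kernel of order d (whether the slide used the homogeneous variant, without the added 1, is not recoverable) is:

```latex
K(x,z) = \bigl(1 + x^{\mathsf T}z\bigr)^{d}
```

Expanding this kernel yields monomials of the components of x up to degree d, which is why the separatrix of the following slide is a polynomial surface of order d.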

Page 28:

Polynomial Kernel: Separatrix (Boundary Between Two Classes)

is a Polynomial Surface

Page 29:

Classification Boundaries Available from a Polynomial Kernel (Hastie, Rosset, Tibshirani, and Zhu, NIPS 2004)

Page 30:

Implied Higher-Dimensional Space has a Dimension of K^d

Page 31:

The Radial Basis Function (RBF) Kernel
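The definition on this slide is missing; the standard radial basis function kernel, with width parameter γ (the quantity adjusted on the later slides), is:

```latex
K(x,z) = \exp\bigl(-\gamma\,\lVert x - z\rVert^{2}\bigr)
```

Larger γ makes each basis function narrower, giving a wigglier separatrix; smaller γ gives a smoother one.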

Page 32:

RBF Classifier Can Represent Any Classifier Boundary

Page 33:

RBF Classifier Can Represent Any Classifier Boundary

(Hastie, Rosset, Tibshirani, and Zhu, NIPS 2004)

In these figures, C was adjusted, not γ, but a similar effect can be achieved by setting N << M and adjusting γ.

– More training corpus errors, smoother boundary

– Fewer training corpus errors, wigglier boundary

Page 34:

If N<M, Gamma can Adjust Boundary Smoothness

Page 35:

Summary

• Classifier definitions
  – Classifier = a function from x into y
  – Loss = the cost of a mistake
  – Risk = the expected loss
  – Empirical Risk = the average loss on training data

• Multilayer Perceptrons
  – Sigmoidal classifier is similar to a hyperplane classifier with a sigmoidal loss function
  – Train using error backpropagation
  – With two hidden layers, can model any boundary (MLP is a “universal approximator”)
  – MLP output is an estimate of p(y|x)

• Kernel Classifiers
  – Equivalent to: (1) project into Φ(x), (2) apply a hyperplane classifier
  – Polynomial kernel: separatrix is a polynomial surface of order d
  – RBF kernel: separatrix can be any surface (RBF is also a “universal approximator”)
  – RBF kernel: if N < M, can adjust the “wiggliness” of the separatrix