
Landmark-Based Speech Recognition:
Spectrogram Reading, Support Vector Machines, Dynamic Bayesian Networks, and Phonology

Mark Hasegawa-Johnson (jhasegaw@uiuc.edu)
University of Illinois at Urbana-Champaign, USA

Lecture 4: Hyperplanes, Perceptrons, and Kernel-Based Classifiers

• Definition: Hyperplane Classifier
• Minimum Classification Error Training Methods
– Empirical risk
– Differentiable estimates of the 0-1 loss function
– Error backpropagation
• Kernel Methods
– Nonparametric expression of a hyperplane
– Mathematical properties of a dot product
– Kernel-based classifier
– The implied high-dimensional space
– Error backpropagation for a kernel-based classifier
• Useful Kernels
– Polynomial kernel
– RBF kernel

Classifier Terminology

Hyperplane Classifier

Class Boundary (“Separatrix”): the plane wᵀx = b

[Figure: two classes of training points in the plane, separated by the separatrix wᵀx = b; the normal vector w points away from the boundary, which lies at distance b from the origin (x = 0).]
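A minimal Python sketch of this decision rule (the weight vector and offset below are illustrative values, not numbers from the lecture):

```python
import numpy as np

# Hyperplane classifier: the label is the side of the plane w'x = b on
# which x falls. When ||w|| = 1, b is the plane's distance from the
# origin, as in the figure above.
def hyperplane_classify(x, w, b):
    return 1 if np.dot(w, x) > b else -1

w = np.array([0.6, 0.8])   # unit-length normal vector (illustrative)
b = 0.5                    # distance from the origin to the plane
print(hyperplane_classify(np.array([1.0, 1.0]), w, b))   # prints 1
```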

Loss, Risk, and Empirical Risk

Empirical Risk with 0-1 Loss Function = Error Rate on Training Data
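For concreteness, a short sketch of that quantity; `classify`, `X`, and `y` are placeholder names for any classifier and training set:

```python
# Empirical risk under the 0-1 loss: each mistake costs 1 and each correct
# answer costs 0, so the average loss is just the training error rate.
def empirical_risk_01(classify, X, y):
    return sum(classify(x) != label for x, label in zip(X, y)) / len(y)
```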

Differentiable Approximations of the 0-1 Loss Function: Hinge Loss
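A sketch of the hinge loss itself, assuming labels y ∈ {-1, +1} and a hyperplane score wᵀx − b:

```python
# Hinge loss, a surrogate for the 0-1 loss: zero when the token is on the
# correct side with margin at least 1, and growing linearly with the
# violation otherwise.
def hinge_loss(y, score):
    return max(0.0, 1.0 - y * score)
```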

Differentiable Empirical Risks
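Averaging the hinge loss over the training set gives one such differentiable empirical risk; this sketch reuses hinge_loss from above:

```python
import numpy as np

# Differentiable empirical risk: substitute the hinge loss for the 0-1
# loss and average over the training set, so the risk can be minimized
# by gradient descent.
def empirical_risk_hinge(w, b, X, y):
    return float(np.mean([hinge_loss(yi, np.dot(w, xi) - b)
                          for xi, yi in zip(X, y)]))
```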

Error Backpropagation: Hyperplane Classifier with Sigmoidal Loss

Sigmoidal Classifier = Hyperplane Classifier with Fuzzy Boundaries

[Figure: the same two-class scatter, now with a fuzzy boundary; class membership shades gradually from “more red” to “less red” on one side of the separatrix and from “less blue” to “more blue” on the other.]

Error Backpropagation: Sigmoidal Classifier with Absolute Loss
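A sketch covering both of these slides, assuming labels y ∈ {0, 1}: the classifier output is a sigmoid of wᵀx − b, and one backpropagation step descends the gradient of the absolute loss |y − h(x)| (the learning rate eta is an illustrative choice):

```python
import numpy as np

def sigmoid(g):
    return 1.0 / (1.0 + np.exp(-g))

# Sigmoidal classifier: the hard sign(w'x - b) is replaced by a smooth
# sigmoid, so the boundary is fuzzy and the loss is differentiable.
def sigmoidal_classify(x, w, b):
    return sigmoid(np.dot(w, x) - b)      # soft membership in (0, 1)

# One gradient-descent step on the absolute loss |y - h(x)|.
def backprop_step(x, y, w, b, eta=0.1):
    h = sigmoidal_classify(x, w, b)
    dh = np.sign(h - y) * h * (1.0 - h)   # chain rule through the sigmoid
    return w - eta * dh * x, b + eta * dh # updated w and b
```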

Sigmoidal Classifier: Signal Flow Diagram

[Diagram: inputs x1, x2, x3 are scaled by connection weights w1, w2, w3 and summed to form the sigmoid input g(x); the sigmoid of g(x) is the hypothesis h(x).]

Multilayer Perceptron

[Diagram: a two-layer network. The input h0(x) ≡ x feeds first-layer connection weights w¹ᵢⱼ and biases b11, b12, b13 to form the sigmoid inputs g1(x) and sigmoid outputs h1(x); second-layer connection weights w²ᵢⱼ and bias b21 form the sigmoid input g2(x), whose sigmoid output is the hypothesis h2(x).]

Multilayer Perceptron: Classification Equations
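A sketch of the forward (classification) pass for the two-layer network in the diagram, with illustrative shapes (3 inputs, 3 hidden units, 1 output); the slides' exact bias sign convention may differ:

```python
import numpy as np

# Forward pass: g1 = W1 x + b1, h1 = sigmoid(g1),
#               g2 = W2 h1 + b2, h2 = sigmoid(g2).
def mlp_forward(x, W1, b1, W2, b2):
    h0 = x                             # input layer h0(x) = x
    g1 = W1 @ h0 + b1                  # sigmoid inputs g1(x)
    h1 = 1.0 / (1.0 + np.exp(-g1))     # sigmoid outputs h1(x)
    g2 = W2 @ h1 + b2                  # sigmoid input g2(x)
    h2 = 1.0 / (1.0 + np.exp(-g2))     # hypothesis h2(x)
    return h1, h2
```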

Error Backpropagation for a Multilayer Perceptron
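A matching backpropagation sketch, using the squared error (y − h2)²/2 for concreteness (the lecture's exact loss may differ); it reuses mlp_forward from above:

```python
import numpy as np

# One backpropagation step. The deltas d2, d1 are the per-layer error
# signals; eta is an illustrative learning rate.
def mlp_backprop_step(x, y, W1, b1, W2, b2, eta=0.1):
    h1, h2 = mlp_forward(x, W1, b1, W2, b2)
    d2 = (h2 - y) * h2 * (1.0 - h2)     # output-layer error signal
    d1 = (W2.T @ d2) * h1 * (1.0 - h1)  # error propagated back to layer 1
    W2 -= eta * np.outer(d2, h1); b2 -= eta * d2
    W1 -= eta * np.outer(d1, x);  b1 -= eta * d1
    return W1, b1, W2, b2
```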

Classification Power of a One-Layer Perceptron

Classification Power of a Two-Layer Perceptron

Classification Power of a Three-Layer Perceptron

Output of Multilayer Perceptron is an Approximation of Posterior Probability

Kernel-Based Classifiers

Representation of Hyperplane in terms of Arbitrary Vectors

Kernel-based Classifier
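A sketch of the classifier in this nonparametric form; the kernel K, the weights aₙ, and the offset b are placeholders to be chosen or trained, not values from the lecture:

```python
# Nonparametric form of the hyperplane:
#   h(x) = sum_n a_n K(x_n, x) - b,
# where the x_n are N stored training vectors.
def kernel_classify(x, vectors, alphas, b, K):
    return sum(a * K(xn, x) for a, xn in zip(alphas, vectors)) - b
```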

Error Backpropagation for a Kernel-Based Classifier
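A sketch of one such training step, reusing sigmoid and kernel_classify from the earlier sketches and again assuming the absolute loss with y ∈ {0, 1}:

```python
import numpy as np

# Pass h(x) through a sigmoid and descend the gradient of |y - h| with
# respect to each weight a_n and to b; dg/da_n is just K(x_n, x).
def kernel_backprop_step(x, y, vectors, alphas, b, K, eta=0.1):
    h = sigmoid(kernel_classify(x, vectors, alphas, b, K))
    dh = np.sign(h - y) * h * (1.0 - h)
    alphas = [a - eta * dh * K(xn, x) for a, xn in zip(alphas, vectors)]
    return alphas, b + eta * dh
```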

The Implied High-Dimensional Space

Some Useful Kernels

Polynomial Kernel
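A one-line sketch (d = 2 is an arbitrary default):

```python
import numpy as np

# Polynomial kernel of order d: a dot product in an implied space of
# monomials of the input features, computed without visiting that space.
def poly_kernel(x, z, d=2):
    return (1.0 + np.dot(x, z)) ** d
```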

Polynomial Kernel: Separatrix (Boundary Between Two Classes) is a Polynomial Surface

Classification Boundaries Available from a Polynomial Kernel (Hastie, Rosset, Tibshirani, and Zhu, NIPS 2004)

Implied Higher-Dimensional Space has a Dimension of K^d
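The implied space can be exhibited directly in a small case. For K = 2 inputs and d = 2, the kernel (1 + x·z)² equals an ordinary dot product between six-dimensional monomial feature vectors, a standard identity checked numerically below:

```python
import numpy as np

# Explicit feature map for the order-2 polynomial kernel in 2 dimensions:
# Phi(x) lists the monomials of x up to degree 2, scaled so that
# (1 + x.z)^2 == Phi(x) . Phi(z).
def phi(x):
    x1, x2 = x
    return np.array([1.0, np.sqrt(2)*x1, np.sqrt(2)*x2,
                     x1*x1, np.sqrt(2)*x1*x2, x2*x2])

x, z = np.array([1.0, 2.0]), np.array([3.0, 0.5])
assert np.isclose((1.0 + x @ z) ** 2, phi(x) @ phi(z))
```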

The Radial Basis Function (RBF) Kernel
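A sketch (gamma = 1.0 is an arbitrary default):

```python
import numpy as np

# RBF kernel: similarity decays with squared Euclidean distance; the
# width parameter gamma controls how quickly.
def rbf_kernel(x, z, gamma=1.0):
    return np.exp(-gamma * np.sum((x - z) ** 2))
```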

RBF Classifier Can Represent Any Classifier Boundary

(Hastie, Rosset, Tibshirani, and Zhu, NIPS 2004)

In these figures, C was adjusted, not γ, but a similar effect can be achieved by setting N << M and adjusting γ.

[Figure: two decision boundaries from Hastie et al.; one setting yields more training-corpus errors and a smoother boundary, the other fewer training-corpus errors and a wigglier boundary.]

If N < M, γ Can Adjust Boundary Smoothness
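An illustrative check of this trade-off, using scikit-learn's SVC as a convenient RBF classifier (the lecture does not prescribe a toolkit, and the dataset here is a stock toy problem):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Larger gamma narrows each RBF, so the separatrix can wiggle more and
# the training error typically falls.
X, y = make_moons(noise=0.3, random_state=0)
for gamma in (0.1, 1.0, 10.0):
    clf = SVC(kernel="rbf", gamma=gamma).fit(X, y)
    print(f"gamma={gamma}: training accuracy {clf.score(X, y):.2f}")
```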

Summary

• Classifier definitions
– Classifier = a function from x into y
– Loss = the cost of a mistake
– Risk = the expected loss
– Empirical risk = the average loss on training data
• Multilayer Perceptrons
– A sigmoidal classifier is similar to a hyperplane classifier with a sigmoidal loss function
– Train using error backpropagation
– With two hidden layers, an MLP can model any boundary (the MLP is a “universal approximator”)
– MLP output is an estimate of p(y|x)
• Kernel Classifiers
– Equivalent to: (1) project into Φ(x), (2) apply a hyperplane classifier
– Polynomial kernel: the separatrix is a polynomial surface of order d
– RBF kernel: the separatrix can be any surface (the RBF kernel is also a “universal approximator”)
– RBF kernel: if N < M, γ can adjust the “wiggliness” of the separatrix
