
Landmark-Based Speech Recognition:
Spectrogram Reading, Support Vector Machines, Dynamic Bayesian Networks, and Phonology

Mark Hasegawa-Johnson (jhasegaw@uiuc.edu)
University of Illinois at Urbana-Champaign, USA

Lecture 4: Hyperplanes, Perceptrons, and Kernel-Based Classifiers

• Definition: Hyperplane Classifier
• Minimum Classification Error Training Methods
– Empirical risk
– Differentiable estimates of the 0-1 loss function
– Error backpropagation
• Kernel Methods
– Nonparametric expression of a hyperplane
– Mathematical properties of a dot product
– Kernel-based classifier
– The implied high-dimensional space
– Error backpropagation for a kernel-based classifier
• Useful Kernels
– Polynomial kernel
– RBF kernel

Classifier Terminology

Hyperplane Classifier

Class Boundary (“Separatrix”): the plane wᵀx = b

[Figure: two classes of training points in the plane, separated by the separatrix wᵀx = b; the normal vector w points away from the boundary, which lies at distance b from the origin (x = 0).]
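A minimal Python sketch of this decision rule (the weight vector and offset below are illustrative values, not numbers from the lecture):

```python
import numpy as np

# Hyperplane classifier: the label is the side of the plane w'x = b on
# which x falls. When ||w|| = 1, b is the plane's distance from the
# origin, as in the figure above.
def hyperplane_classify(x, w, b):
    return 1 if np.dot(w, x) > b else -1

w = np.array([0.6, 0.8])   # unit-length normal vector (illustrative)
b = 0.5                    # distance from the origin to the plane
print(hyperplane_classify(np.array([1.0, 1.0]), w, b))   # prints 1
```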

Loss, Risk, and Empirical Risk

Empirical Risk with 0-1 Loss Function = Error Rate on Training Data
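For concreteness, a short sketch of that quantity; `classify`, `X`, and `y` are placeholder names for any classifier and training set:

```python
# Empirical risk under the 0-1 loss: each mistake costs 1 and each correct
# answer costs 0, so the average loss is just the training error rate.
def empirical_risk_01(classify, X, y):
    return sum(classify(x) != label for x, label in zip(X, y)) / len(y)
```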

Differentiable Approximations of the 0-1 Loss Function: Hinge Loss
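A sketch of the hinge loss itself, assuming labels y ∈ {-1, +1} and a hyperplane score wᵀx − b:

```python
# Hinge loss, a surrogate for the 0-1 loss: zero when the token is on the
# correct side with margin at least 1, and growing linearly with the
# violation otherwise.
def hinge_loss(y, score):
    return max(0.0, 1.0 - y * score)
```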

Differentiable Empirical Risks
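Averaging the hinge loss over the training set gives one such differentiable empirical risk; this sketch reuses hinge_loss from above:

```python
import numpy as np

# Differentiable empirical risk: substitute the hinge loss for the 0-1
# loss and average over the training set, so the risk can be minimized
# by gradient descent.
def empirical_risk_hinge(w, b, X, y):
    return float(np.mean([hinge_loss(yi, np.dot(w, xi) - b)
                          for xi, yi in zip(X, y)]))
```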

Error Backpropagation: Hyperplane Classifier with Sigmoidal Loss

Sigmoidal Classifier = Hyperplane Classifier with Fuzzy Boundaries

[Figure: the same two-class scatter, now with a fuzzy boundary; class membership shades gradually from “more red” to “less red” on one side of the separatrix and from “less blue” to “more blue” on the other.]

Error Backpropagation: Sigmoidal Classifier with Absolute Loss
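A sketch covering both of these slides, assuming labels y ∈ {0, 1}: the classifier output is a sigmoid of wᵀx − b, and one backpropagation step descends the gradient of the absolute loss |y − h(x)| (the learning rate eta is an illustrative choice):

```python
import numpy as np

def sigmoid(g):
    return 1.0 / (1.0 + np.exp(-g))

# Sigmoidal classifier: the hard sign(w'x - b) is replaced by a smooth
# sigmoid, so the boundary is fuzzy and the loss is differentiable.
def sigmoidal_classify(x, w, b):
    return sigmoid(np.dot(w, x) - b)      # soft membership in (0, 1)

# One gradient-descent step on the absolute loss |y - h(x)|.
def backprop_step(x, y, w, b, eta=0.1):
    h = sigmoidal_classify(x, w, b)
    dh = np.sign(h - y) * h * (1.0 - h)   # chain rule through the sigmoid
    return w - eta * dh * x, b + eta * dh # updated w and b
```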

Sigmoidal Classifier: Signal Flow Diagram

[Diagram: inputs x1, x2, x3 are scaled by connection weights w1, w2, w3 and summed to form the sigmoid input g(x); the sigmoid of g(x) is the hypothesis h(x).]

Multilayer Perceptron

[Diagram: a two-layer network. The input h0(x) ≡ x feeds first-layer connection weights w¹ᵢⱼ and biases b11, b12, b13 to form the sigmoid inputs g1(x) and sigmoid outputs h1(x); second-layer connection weights w²ᵢⱼ and bias b21 form the sigmoid input g2(x), whose sigmoid output is the hypothesis h2(x).]

Multilayer Perceptron: Classification Equations
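A sketch of the forward (classification) pass for the two-layer network in the diagram, with illustrative shapes (3 inputs, 3 hidden units, 1 output); the slides' exact bias sign convention may differ:

```python
import numpy as np

# Forward pass: g1 = W1 x + b1, h1 = sigmoid(g1),
#               g2 = W2 h1 + b2, h2 = sigmoid(g2).
def mlp_forward(x, W1, b1, W2, b2):
    h0 = x                             # input layer h0(x) = x
    g1 = W1 @ h0 + b1                  # sigmoid inputs g1(x)
    h1 = 1.0 / (1.0 + np.exp(-g1))     # sigmoid outputs h1(x)
    g2 = W2 @ h1 + b2                  # sigmoid input g2(x)
    h2 = 1.0 / (1.0 + np.exp(-g2))     # hypothesis h2(x)
    return h1, h2
```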

Error Backpropagation for a Multilayer Perceptron
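A matching backpropagation sketch, using the squared error (y − h2)²/2 for concreteness (the lecture's exact loss may differ); it reuses mlp_forward from above:

```python
import numpy as np

# One backpropagation step. The deltas d2, d1 are the per-layer error
# signals; eta is an illustrative learning rate.
def mlp_backprop_step(x, y, W1, b1, W2, b2, eta=0.1):
    h1, h2 = mlp_forward(x, W1, b1, W2, b2)
    d2 = (h2 - y) * h2 * (1.0 - h2)     # output-layer error signal
    d1 = (W2.T @ d2) * h1 * (1.0 - h1)  # error propagated back to layer 1
    W2 -= eta * np.outer(d2, h1); b2 -= eta * d2
    W1 -= eta * np.outer(d1, x);  b1 -= eta * d1
    return W1, b1, W2, b2
```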

Classification Power of a One-Layer Perceptron

Classification Power of a Two-Layer Perceptron

Classification Power of a Three-Layer Perceptron

Output of Multilayer Perceptron is an Approximation of Posterior Probability

Kernel-Based Classifiers

Representation of Hyperplane in terms of Arbitrary Vectors

Kernel-based Classifier
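A sketch of the classifier in this nonparametric form; the kernel K, the weights aₙ, and the offset b are placeholders to be chosen or trained, not values from the lecture:

```python
# Nonparametric form of the hyperplane:
#   h(x) = sum_n a_n K(x_n, x) - b,
# where the x_n are N stored training vectors.
def kernel_classify(x, vectors, alphas, b, K):
    return sum(a * K(xn, x) for a, xn in zip(alphas, vectors)) - b
```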

Error Backpropagation for a Kernel-Based Classifier
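A sketch of one such training step, reusing sigmoid and kernel_classify from the earlier sketches and again assuming the absolute loss with y ∈ {0, 1}:

```python
import numpy as np

# Pass h(x) through a sigmoid and descend the gradient of |y - h| with
# respect to each weight a_n and to b; dg/da_n is just K(x_n, x).
def kernel_backprop_step(x, y, vectors, alphas, b, K, eta=0.1):
    h = sigmoid(kernel_classify(x, vectors, alphas, b, K))
    dh = np.sign(h - y) * h * (1.0 - h)
    alphas = [a - eta * dh * K(xn, x) for a, xn in zip(alphas, vectors)]
    return alphas, b + eta * dh
```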

The Implied High-Dimensional Space

Some Useful Kernels

Polynomial Kernel
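A one-line sketch (d = 2 is an arbitrary default):

```python
import numpy as np

# Polynomial kernel of order d: a dot product in an implied space of
# monomials of the input features, computed without visiting that space.
def poly_kernel(x, z, d=2):
    return (1.0 + np.dot(x, z)) ** d
```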

Polynomial Kernel: Separatrix (Boundary Between Two Classes) is a Polynomial Surface

Classification Boundaries Available from a Polynomial Kernel (Hastie, Rosset, Tibshirani, and Zhu, NIPS 2004)

Implied Higher-Dimensional Space has a Dimension of K^d
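The implied space can be exhibited directly in a small case. For K = 2 inputs and d = 2, the kernel (1 + x·z)² equals an ordinary dot product between six-dimensional monomial feature vectors, a standard identity checked numerically below:

```python
import numpy as np

# Explicit feature map for the order-2 polynomial kernel in 2 dimensions:
# Phi(x) lists the monomials of x up to degree 2, scaled so that
# (1 + x.z)^2 == Phi(x) . Phi(z).
def phi(x):
    x1, x2 = x
    return np.array([1.0, np.sqrt(2)*x1, np.sqrt(2)*x2,
                     x1*x1, np.sqrt(2)*x1*x2, x2*x2])

x, z = np.array([1.0, 2.0]), np.array([3.0, 0.5])
assert np.isclose((1.0 + x @ z) ** 2, phi(x) @ phi(z))
```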

The Radial Basis Function (RBF) Kernel
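A sketch (gamma = 1.0 is an arbitrary default):

```python
import numpy as np

# RBF kernel: similarity decays with squared Euclidean distance; the
# width parameter gamma controls how quickly.
def rbf_kernel(x, z, gamma=1.0):
    return np.exp(-gamma * np.sum((x - z) ** 2))
```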

RBF Classifier Can Represent Any Classifier Boundary

(Hastie, Rosset, Tibshirani, and Zhu, NIPS 2004)

In these figures, C was adjusted, not γ, but a similar effect can be achieved by setting N << M and adjusting γ.

[Figure: two decision boundaries from Hastie et al.; one setting yields more training-corpus errors and a smoother boundary, the other fewer training-corpus errors and a wigglier boundary.]

If N < M, γ Can Adjust Boundary Smoothness
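An illustrative check of this trade-off, using scikit-learn's SVC as a convenient RBF classifier (the lecture does not prescribe a toolkit, and the dataset here is a stock toy problem):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Larger gamma narrows each RBF, so the separatrix can wiggle more and
# the training error typically falls.
X, y = make_moons(noise=0.3, random_state=0)
for gamma in (0.1, 1.0, 10.0):
    clf = SVC(kernel="rbf", gamma=gamma).fit(X, y)
    print(f"gamma={gamma}: training accuracy {clf.score(X, y):.2f}")
```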

Summary

• Classifier definitions
– Classifier = a function from x into y
– Loss = the cost of a mistake
– Risk = the expected loss
– Empirical risk = the average loss on training data
• Multilayer Perceptrons
– A sigmoidal classifier is similar to a hyperplane classifier with a sigmoidal loss function
– Train using error backpropagation
– With two hidden layers, an MLP can model any boundary (the MLP is a “universal approximator”)
– MLP output is an estimate of p(y|x)
• Kernel Classifiers
– Equivalent to: (1) project into Φ(x), (2) apply a hyperplane classifier
– Polynomial kernel: the separatrix is a polynomial surface of order d
– RBF kernel: the separatrix can be any surface (the RBF kernel is also a “universal approximator”)
– RBF kernel: if N < M, γ can adjust the “wiggliness” of the separatrix
