This week: overview on pattern recognition (related to machine learning)


Upload: annabella-cora-douglas

Post on 11-Jan-2016


TRANSCRIPT

Page 1: This week: overview on pattern recognition (related to machine learning)

This week: overview on pattern recognition (related to machine learning)

Page 2

Non-review of chapters 6/7

• Z-transforms
• Convolution
• Sampling/aliasing
• Linear difference equations
• Resonances
• FIR/IIR filtering
• DFT/FFT

Page 3

Speech Pattern Recognition

• Soft pattern classification plus temporal sequence integration

• Supervised pattern classification: class labels used in training

• Unsupervised pattern classification: labels not available (or at least not used)

Page 4

Page 5

Training and testing

• Training: learning the parameters of the classifier
• Testing: classify an independent test set, compare with the labels, and score
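The protocol above can be sketched in a few lines of Python; the trivial majority-class "learner" and the tiny label sets here are made up purely to illustrate the train-then-score loop:

```python
# Minimal sketch of the train/test protocol: fit on one labeled set,
# then score on an independent labeled test set.
from collections import Counter

def train(labels):
    # "Learning" here is just memorizing the most common training class.
    return Counter(labels).most_common(1)[0][0]

def score(predictions, labels):
    # Fraction of test items whose predicted class matches the label.
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

train_labels = ["a", "a", "b"]
model = train(train_labels)                      # learns "a"
test_labels = ["a", "b", "a", "a"]
accuracy = score([model] * len(test_labels), test_labels)
print(accuracy)                                  # 0.75
```

The key point is that the score is computed on data the training step never saw.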

Page 6

F1 and F2 for various vowels

Page 7

Swedish basketball players vs. speech recognition researchers

Page 8

Feature extraction criteria

• Class discrimination
• Generalization
• Parsimony (efficiency)

Page 9

Page 10

Page 11

Feature vector size

• The representations that best discriminate on the training set are typically high-dimensional

• The representations that generalize best to the test set tend to be succinct

Page 12

Dimensionality reduction

• Principal components (i.e., SVD, KL transform, eigenanalysis, ...)
• Linear Discriminant Analysis (LDA)
• Application-specific knowledge
• Feature selection via PR evaluation
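The first item in the list can be sketched concretely; this is a minimal principal-components reduction via the SVD, assuming NumPy is available, with random made-up data standing in for real feature vectors:

```python
# Sketch of principal-components dimensionality reduction via the SVD.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))        # 100 examples, 5 features (made up)
Xc = X - X.mean(axis=0)              # center each feature

# The right singular vectors of the centered data matrix are the
# principal axes; projecting onto the top k of them reduces dimension.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
Z = Xc @ Vt[:k].T                    # (100, 2) reduced representation
print(Z.shape)
```

Note that this choice of axes maximizes retained variance, which (as the opinions slide later warns) is not always the right criterion for supervised discrimination.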

Page 13

Page 14

Page 15

PR Methods

• Minimum distance
• Discriminant functions
– Linear
– Nonlinear (e.g., quadratic, neural networks)
– Some aspects of each: SVMs
• Statistical discriminant functions

Page 16

Minimum Distance

• Vector or matrix representing each element
• Define a distance function
• Collect examples for each class
• In testing, choose the class of the closest example
• The choice of distance is equivalent to an implicit statistical assumption
• Signals (e.g., speech) add temporal variability
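The steps above can be sketched as a minimal-distance classifier with one stored prototype per class; the (F1, F2) prototype values below are invented for illustration, not measured formants:

```python
# Sketch of a minimum-distance classifier: one prototype vector per
# class, Euclidean distance, pick the class of the closest prototype.
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(x, prototypes):
    # prototypes: dict mapping class label -> feature vector
    return min(prototypes, key=lambda c: euclidean(x, prototypes[c]))

protos = {"vowel_a": (700.0, 1200.0),   # (F1, F2) in Hz; values made up
          "vowel_i": (300.0, 2300.0)}
print(classify((650.0, 1300.0), protos))   # vowel_a
```

Swapping in a different distance function changes the implicit statistical assumption the classifier makes.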

Page 17

Page 18

Limitations

• Variable scale of dimensions
• Variable importance of dimensions
• For high dimensions, a sparsely sampled space
• For tough problems, resource limitations (storage, computation, memory access)

Page 19

Decision Rule for Min Distance

• Nearest Neighbor (NN): in the limit of infinite samples, at most twice the error of the optimum classifier
• k-Nearest Neighbor (kNN)
• Lots of storage for large problems; potentially large searches
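A minimal sketch of the kNN rule, with a made-up 2-D training set; note that all the "learning" is just storing the labeled examples, which is exactly where the storage and search costs mentioned above come from:

```python
# Sketch of the k-nearest-neighbor decision rule: keep every labeled
# training example, classify by majority vote among the k closest.
import math
from collections import Counter

def knn(x, examples, k=3):
    # examples: list of (feature_vector, label) pairs
    nearest = sorted(examples, key=lambda e: math.dist(x, e[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
         ((1.0, 1.0), "b"), ((0.9, 1.1), "b")]
print(knn((0.15, 0.15), train, k=3))   # a
```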

Page 20

Some Opinions

• Better to throw away bad data than to reduce its weight

• Dimensionality-reduction based on variance often a bad choice for supervised pattern recognition

• Both of these are only true sometimes

Page 21

Discriminant Analysis

• Discriminant functions: maximum for the correct class, minimum for the others
• Decision surface between classes
• A linear decision surface in 2 dimensions is a line, in 3 a plane; in general, a hyperplane
• For 2 classes, the linear surface is at wᵀx + w₀ = 0
• In the 2-class quadratic case, the surface is at xᵀWx + wᵀx + w₀ = 0

Page 22

Two-prototype example

• Dᵢ² = (x − zᵢ)ᵀ(x − zᵢ) = xᵀx + zᵢᵀzᵢ − 2xᵀzᵢ

• D₁² − D₂² = 2xᵀz₂ − 2xᵀz₁ + z₁ᵀz₁ − z₂ᵀz₂

• At the decision surface the distances are equal, so

2xᵀz₂ − 2xᵀz₁ = z₂ᵀz₂ − z₁ᵀz₁

or xᵀ(z₂ − z₁) = ½ (z₂ᵀz₂ − z₁ᵀz₁)

• If the prototypes are all normalized to 1, the decision surface is

xᵀ(z₂ − z₁) = 0

• And each discriminant function is xᵀzᵢ
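The derivation can be checked numerically; the two unit-norm prototypes and the test point below are made-up values chosen so that x lies on the decision surface:

```python
# Numeric check of the two-prototype derivation: for unit-norm
# prototypes the decision surface is x.(z2 - z1) = 0, and any point
# on it is equidistant from the two prototypes.

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def dist2(x, z):                        # squared Euclidean distance
    return sum((p - q) ** 2 for p, q in zip(x, z))

z1, z2 = (1.0, 0.0), (0.6, 0.8)         # both have unit norm
x = (2.0, 1.0)                          # satisfies x.(z2 - z1) = 0

diff = (z2[0] - z1[0], z2[1] - z1[1])
print(dot(x, diff))                     # on the surface: inner product is 0
print(dist2(x, z1), dist2(x, z2))       # the two squared distances agree
```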

Page 23

System to discriminate between two classes

Page 24

Training Discriminant Functions

• Minimum distance
• Fisher linear discriminant
• Gradient learning

Page 25

Generalized Discriminators - ANNs

• McCulloch-Pitts neuron model
• Rosenblatt perceptron
• Multilayer systems
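As a concrete illustration of the Rosenblatt perceptron listed above, here is the classic mistake-driven update rule; the tiny linearly separable data set is made up so that the rule provably converges:

```python
# Sketch of the Rosenblatt perceptron: on each misclassified example,
# nudge the weights and bias toward the correct side.

def perceptron(data, epochs=20, lr=1.0):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:               # y is +1 or -1
            if y * (w[0] * x[0] + w[1] * x[1] + b) <= 0:   # mistake
                w = [w[0] + lr * y * x[0], w[1] + lr * y * x[1]]
                b += lr * y
    return w, b

data = [((2.0, 1.0), +1), ((1.5, 2.0), +1),
        ((-1.0, -1.5), -1), ((-2.0, -0.5), -1)]
w, b = perceptron(data)
print(all(y * (w[0] * x[0] + w[1] * x[1] + b) > 0 for x, y in data))  # True
```

On data that is not linearly separable, this rule never settles, which is one motivation for the multilayer systems that follow.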

Page 26

Page 27

Page 28

Page 29

Typical unit for MLP

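The figure's unit (a weighted sum fed through a squashing nonlinearity) can be written directly; the logistic sigmoid is used here as the nonlinearity, and the weights are made-up example values:

```python
# Sketch of a typical MLP unit: weighted sum of inputs plus a bias,
# passed through a squashing nonlinearity (the logistic sigmoid).
import math

def unit(inputs, weights, bias):
    s = sum(w * x for w, x in zip(weights, inputs)) + bias   # the sum node
    return 1.0 / (1.0 + math.exp(-s))                        # squashing

print(unit([1.0, 2.0], [0.5, -0.25], 0.0))   # sigmoid(0) = 0.5
```

Stacking layers of such units, with the output of one layer feeding the next, gives the multilayer perceptron on the following slide.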

Page 30

Page 31

Typical MLP

Page 32

Support Vector Machines (SVMs)

• High-dimensional feature vectors
– Transformed from simple features (e.g., polynomial)
– Can potentially classify the training set arbitrarily well

• Improve generalization by maximizing the margin
• Via the “kernel trick”, the explicit high-dimensional space is never needed
– An inner-product function between 2 points in that space

• Slack variables allow for imperfect classification
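The kernel-trick point can be verified numerically for one standard case: the degree-2 polynomial kernel k(x, y) = (x·y)² equals an ordinary inner product after an explicit quadratic feature map. The vectors below are made-up examples:

```python
# Check that a polynomial kernel equals an inner product in an
# explicitly constructed higher-dimensional feature space.
import math

def phi(x):
    # Explicit degree-2 feature map for 2-d input.
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

x, y = (1.0, 2.0), (3.0, -1.0)
print(dot(x, y) ** 2)            # kernel value, computed in 2 dimensions
print(dot(phi(x), phi(y)))       # same value, computed in 3 dimensions
```

An SVM only ever needs the left-hand computation, which is why it can work with feature spaces too large to build explicitly.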

Page 33

Maximum margin decision boundary, SVM

Page 34

Unsupervised clustering

• Large and diverse literature
• Many methods
• Next time: one method explained in the context of a larger, statistical system

Page 35

Some PR/machine learning Issues

• Testing on the training set
• Training on the test set
• # parameters vs. # training examples: overfitting and overtraining
• For much more on machine learning, see CS 281A/B
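The overfitting point can be demonstrated in a few lines, assuming NumPy is available; the data are made up (a line plus fixed "noise" offsets), and a model with as many parameters as training points fits them exactly yet predicts a held-out point worse than a simple line does:

```python
# Sketch of overfitting: more parameters fit the training set better
# but can generalize worse to held-out data.
import numpy as np

x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
noise = np.array([0.0, 0.3, -0.2, 0.25, -0.3])    # fixed, made-up offsets
y_train = x_train + noise                          # underlying rule: y = x
x_test, y_test = 2.5, 2.5                          # held-out point on the line

lo = np.polyfit(x_train, y_train, 1)   # 2 parameters: a straight line
hi = np.polyfit(x_train, y_train, 4)   # 5 parameters: interpolates exactly

err_lo = abs(np.polyval(lo, x_test) - y_test)
err_hi = abs(np.polyval(hi, x_test) - y_test)
print(err_lo < err_hi)                 # the simpler model generalizes better
```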