TRANSCRIPT
Pattern Recognition 2018-03-04
George Bebis 1
Introduction to Pattern Recognition
What is Pattern Recognition?
• Assign an unknown pattern to one of several known categories (or classes).
What is a Pattern?
• A pattern could be an object or event.
[Figures: biometric patterns; hand gesture patterns]
Pattern Class
• A collection of “similar” objects.
[Figure: examples of the classes Female and Male]
How do we model a Pattern Class?
• Typically, using a statistical model.
– probability density function (e.g., Gaussian)
[Figure: gender classification, male and female probability densities]
How do we model a Pattern Class? (cont’d)
• Key challenges:
– Intra-class variability
– Inter-class variability
Letters/Numbers that look similar
The letter “T” in different typefaces
Pattern Recognition: Main Objectives
• Hypothesize the models that describe each pattern class (e.g., recover the process that generated the patterns).
• Given a novel pattern, choose the best-fitting model for it and then assign it to the pattern class associated with the model.
Classification vs Clustering
– Classification (known categories)
– Clustering (unknown categories)
[Figure: two groups of points, Category “A” and Category “B”]
Classification (Recognition) (Supervised Classification)
Clustering (Unsupervised Classification)
Pattern Recognition Applications
Handwriting Recognition
License Plate Recognition
Biometric Recognition
Fingerprint Classification
Face Detection
Gender Classification
Balanced classes (i.e., male vs female)
Autonomous Systems
Medical Applications
Skin Cancer Detection Breast Cancer Detection
Land Classification (from aerial or satellite images)
“Hot” Applications
• Recommendation systems
– Amazon, Netflix
• Targeted advertising
The Netflix Prize
• Predict how much someone is going to enjoy a movie based on their movie preferences.
– $1M awarded in Sept. 2009
• Can software recommend movies to customers?
– Not Rambo to Woody Allen fans
– Not Saw VI if you’ve seen all previous Saw movies
Main Classification Approaches
x: input vector (pattern)
y: class label (class)
• Generative
– Model the joint probability p(x, y).
– Make predictions by using Bayes' rule to compute p(y|x).
– Pick the most likely label y.
• Discriminative
– Estimate p(y|x) directly (e.g., learn a direct map from inputs x to the class labels y).
– Pick the most likely label y.
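The generative recipe can be sketched with 1-D Gaussian class-conditional densities. All numbers below (means, variances, priors) are made-up illustrative values, not data from the slides:

```python
import math

# Hypothetical 1-D Gaussian class-conditional densities p(x|y) with
# assumed priors p(y); the numbers are illustrative only.
CLASSES = {
    "salmon":   {"mean": 3.0, "var": 1.0, "prior": 0.5},
    "sea bass": {"mean": 6.0, "var": 1.0, "prior": 0.5},
}

def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def generative_predict(x):
    """Pick argmax_y p(x|y) p(y); by Bayes' rule this maximizes p(y|x)."""
    return max(CLASSES,
               key=lambda y: gaussian_pdf(x, CLASSES[y]["mean"],
                                          CLASSES[y]["var"]) * CLASSES[y]["prior"])

print(generative_predict(2.5))  # salmon (closer to the salmon mean)
```

A discriminative classifier (e.g., logistic regression) would instead fit p(y|x) directly and skip the joint model entirely.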
“Syntactic” Pattern Recognition Approach
• Represent patterns in terms of simple primitives.
• Describe patterns using deterministic grammars or formal languages.
Complexity of PR – An Example
Problem: sorting incoming fish on a conveyor belt.
Assumption: two kinds of fish:
(1) sea bass
(2) salmon
Pre-processing Step
Example:
(1) Image enhancement
(2) Separate touching or occluding fish
(3) Find the boundary of each fish
Feature Extraction
• Assume a fisherman told us that a sea bass is generally longer than a salmon.
• We can use length as a feature and decide between sea bass and salmon according to a threshold on length.
• How should we choose the threshold?
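A minimal sketch of the length rule, with one plausible answer to the threshold question: pick the candidate that minimizes errors on the training examples. All fish lengths below are hypothetical:

```python
def classify_by_length(length, l_star):
    """Decide 'sea bass' if the fish is longer than the threshold l*."""
    return "sea bass" if length > l_star else "salmon"

def best_threshold(salmon_lengths, bass_lengths):
    """Pick the candidate threshold with the fewest training errors."""
    candidates = sorted(salmon_lengths + bass_lengths)
    def errors(t):
        return (sum(l > t for l in salmon_lengths)     # salmon called sea bass
                + sum(l <= t for l in bass_lengths))   # sea bass called salmon
    return min(candidates, key=errors)

l_star = best_threshold([4, 5, 6], [11, 12, 13])  # hypothetical lengths
print(classify_by_length(12, l_star))  # sea bass
```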
“Length” Histograms
• Even though sea bass are longer than salmon on average, there are many fish for which this observation does not hold.
[Figure: length histograms for the two classes, with decision threshold l*]
“Average Lightness” Histograms
• Consider a different feature, such as “average lightness”.
• It seems easier to choose the threshold x*, but we still cannot make a perfect decision.
[Figure: average-lightness histograms for the two classes, with decision threshold x*]
Multiple Features
• To improve recognition accuracy, we might have to use more than one feature at a time.
– Single features might not yield the best performance.
– Using combinations of features might yield better performance.
• How many features should we choose?
x = (x1, x2), where x1 = lightness and x2 = width
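With two features, the decision rule becomes a boundary in the (lightness, width) plane. A linear boundary is sketched below; the weights and bias are made-up illustrative values, not fitted to real fish data:

```python
# Hypothetical linear decision boundary w . x + b = 0 over
# the feature vector x = (lightness, width).
W = (1.5, 0.8)   # illustrative weights
B = -10.0        # illustrative bias

def classify(x):
    """Label the positive side of the boundary 'sea bass', the other 'salmon'."""
    score = W[0] * x[0] + W[1] * x[1] + B
    return "sea bass" if score > 0 else "salmon"

print(classify((8.0, 5.0)))  # sea bass
print(classify((2.0, 2.0)))  # salmon
```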
Classification
• Partition the feature space into two regions by finding the decision boundary that minimizes the error.
• How should we find the optimal decision boundary?
PR System – Two Phases
[Figure: the two phases of a PR system, a training phase and a test phase]
Sensors & Preprocessing
• Sensing:
– Use a sensor (camera or microphone) for data capture.
– PR performance depends on the bandwidth, resolution, sensitivity, and distortion of the sensor.
• Pre-processing:
– Removal of noise in data.
– Segmentation (i.e., isolation of patterns of interest from background).
Training/Test data
• How do we know that we have collected an adequately large and representative set of examples for training/testing the system?
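There is no simple test for representativeness, but holding out part of the labeled data is the standard way to estimate how the system behaves on unseen examples. The 80/20 split below is a conventional choice, not one prescribed by the slides:

```python
import random

def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a list of (pattern, label) pairs and split it in two."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)

data = [(i, i % 2) for i in range(10)]   # toy (pattern, label) pairs
train, test = train_test_split(data)
print(len(train), len(test))  # 8 2
```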
[Figure: available data divided into a training set and a test set]
Feature Extraction
• How to choose a good set of features?
– Discriminative features
– Invariant features (e.g., translation, rotation and scale)
• Are there ways to automatically learn which features are best?
How Many Features?
• Does adding more features always improve performance?
– It might be difficult and computationally expensive to extract certain features.
– Correlated features might not improve performance.
– “Curse” of dimensionality.
Curse of Dimensionality
• Adding too many features can, paradoxically, lead to worse performance.
– Divide each input feature into a number of intervals, so that the value of a feature can be specified approximately by saying which interval it lies in.
– If each input feature is divided into M divisions, the total number of cells is M^d (d: number of features).
– Since each cell must contain at least one point, the amount of training data needed grows exponentially with d.
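The exponential growth above is easy to verify numerically: with M intervals per feature and d features, the grid has M**d cells.

```python
def num_cells(M, d):
    """Number of cells when each of d features is split into M intervals."""
    return M ** d

# With M = 10 intervals per feature, every extra feature multiplies the
# number of cells (and hence the training data needed) by 10.
for d in (1, 2, 5, 10):
    print(d, num_cells(10, d))
```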
Missing Features
• Certain features might be missing (e.g., due to occlusion).
• How should we train the classifier with missing features?
• How should the classifier make the best decision with missing features?
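One simple, if crude, answer to the second question is mean imputation: fill a missing feature with its training-set mean before classifying. This is only a heuristic sketch; marginalizing the class posterior over the missing value is the more principled option.

```python
def impute_missing(x, feature_means):
    """Replace None entries of a feature vector with per-feature training means."""
    return [mean if value is None else value
            for value, mean in zip(x, feature_means)]

# A pattern whose first feature was occluded, with hypothetical training means.
print(impute_missing([None, 4.0], [2.5, 3.0]))  # [2.5, 4.0]
```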
Complexity
• We can get perfect classification performance on the training data by choosing complex models.
• Complex models are tuned to the particular training samples, rather than to the characteristics of the true model.
How well can the model generalize to unknown samples?
[Figure: an overfitted model]
Generalization
• Generalization is defined as the ability of a classifier to produce correct results on novel patterns.
• How can we improve generalization performance?
– More training examples (i.e., better model estimates).
– Simpler models usually yield better performance.
[Figure: a complex model vs. a simpler model]
More on model complexity
• Consider the following 10 sample points (blue circles), generated with some noise.
• The green curve is the true function that generated the data.
• Approximate the true function from the sample points.
More on model complexity (cont’d)
Polynomial curve fitting: polynomials having various orders, shown as red curves, fitted to the set of 10 sample points.
More on complexity (cont’d)
Polynomial curve fitting: 9th-order polynomials fitted to 15 and 100 sample points.
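The overfitting effect in the curve-fitting example can be reproduced in a few lines. The sketch below assumes a setup like the one in the figures: 10 noisy samples of a sine curve. Lagrange interpolation stands in for least-squares fitting; it gives the unique 9th-order polynomial through all 10 points, so the training error is exactly zero even though the curve chases the noise between the points.

```python
import math

def lagrange_interpolate(xs, ys, x):
    """Evaluate at x the unique degree-(n-1) polynomial through (xs, ys)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

xs = [i / 9 for i in range(10)]                       # 10 equispaced samples
true_f = lambda x: math.sin(2 * math.pi * x)          # assumed "green" curve
noise = [0.2 * (-1) ** i for i in range(10)]          # deterministic "noise"
ys = [true_f(x) + n for x, n in zip(xs, noise)]

# Zero error on the training points: the model has memorized the samples.
train_err = max(abs(lagrange_interpolate(xs, ys, x) - y) for x, y in zip(xs, ys))
print(train_err)  # 0.0
```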
Ensembles of Classifiers
• Performance can be improved using a "pool" of classifiers.
• How should we build and combine different classifiers?
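Majority voting is one common way to combine a pool of classifiers (alongside weighted averaging, bagging, and boosting). The three toy threshold classifiers below are purely illustrative:

```python
from collections import Counter

def majority_vote(classifiers, x):
    """Return the label predicted by the most classifiers in the pool."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]

# Three toy threshold classifiers on a scalar input.
pool = [
    lambda x: "A" if x > 1.0 else "B",
    lambda x: "A" if x > 2.0 else "B",
    lambda x: "A" if x > 3.0 else "B",
]
print(majority_vote(pool, 2.5))  # A (two of the three vote "A")
```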
PR System (cont’d)
• Post-processing:
– Exploit context to improve performance.
How m ch info mation are y u
mi sing?
Cost of misclassifications
• Consider the fish classification example; there are two possible classification errors:
(1) Deciding the fish was a sea bass when it was a salmon.
(2) Deciding the fish was a salmon when it was a sea bass.
• Are both errors equally important?
Cost of misclassifications (cont’d)
• Suppose the fish packing company knows that:
– Customers who buy salmon will object vigorously if they see sea bass in their cans.
– Customers who buy sea bass will not be unhappy if they occasionally see some expensive salmon in their cans.
• How does this knowledge affect our decision?
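The knowledge changes the decision rule from "most probable class" to "minimum expected cost". The cost numbers below are hypothetical, chosen so that canning a sea bass as salmon is ten times worse than the reverse, as the packing-company story suggests:

```python
# COST[decision][truth]: hypothetical misclassification costs.
COST = {
    "salmon":   {"salmon": 0, "sea bass": 10},  # bass in a salmon can: bad
    "sea bass": {"salmon": 1, "sea bass": 0},   # salmon in a bass can: mild
}

def min_cost_decision(p_salmon):
    """Choose the label minimizing expected cost given p(salmon | x)."""
    p = {"salmon": p_salmon, "sea bass": 1 - p_salmon}
    expected = {d: sum(COST[d][t] * p[t] for t in p) for d in COST}
    return min(expected, key=expected.get)

# Even at p(salmon | x) = 0.6 we still label the fish 'sea bass',
# because the sea-bass-in-salmon-can error is 10x more costly.
print(min_cost_decision(0.6))  # sea bass
```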
Computational Complexity
• How does an algorithm scale with the number of:
– features
– patterns
– categories
• Consider tradeoffs between computational complexity and performance.
Would it be possible to build a “general purpose” PR system?
• Humans have the ability to switch rapidly and seamlessly between different pattern recognition tasks.
• It is very difficult to design a system that is capable of performing a variety of classification tasks.
– Different decision tasks may require different features.
– Different features might yield different solutions.
– Different tradeoffs exist for different tasks.