TRANSCRIPT
Pattern Recognition 2018-03-04
George Bebis 1
Introduction to Pattern Recognition
What is Pattern Recognition?
• Assign an unknown pattern to one of several known categories (or classes).
What is a Pattern?
• A pattern could be an object or event.
[Figures: biometric patterns; hand gesture patterns]
Pattern Class
• A collection of “similar” objects.
[Figure: examples of the classes Female and Male]
How do we model a Pattern Class?
• Typically, using a statistical model.
– probability density function (e.g., Gaussian)
[Figure: gender classification, male and female probability densities]
How do we model a Pattern Class? (cont’d)
• Key challenges:
– Intra-class variability
– Inter-class variability
Letters/Numbers that look similar
The letter “T” in different typefaces
Pattern Recognition: Main Objectives
• Hypothesize the models that describe each pattern class (e.g., recover the process that generated the patterns).
• Given a novel pattern, choose the best-fitting model for it and then assign it to the pattern class associated with the model.
Classification vs Clustering
– Classification (known categories)
– Clustering (unknown categories)
[Figure: two groups of points, Category “A” and Category “B”]
Classification (Recognition) (Supervised Classification)
Clustering (Unsupervised Classification)
Pattern Recognition Applications
Handwriting Recognition
License Plate Recognition
Biometric Recognition
Fingerprint Classification
Face Detection
Gender Classification
Balanced classes (i.e., male vs female)
Autonomous Systems
Medical Applications
Skin Cancer Detection Breast Cancer Detection
Land Classification (from aerial or satellite images)
“Hot” Applications
• Recommendation systems
– Amazon, Netflix
• Targeted advertising
The Netflix Prize
• Predict how much someone is going to enjoy a movie based on their movie preferences.
– $1M awarded in Sept. 2009
• Can software recommend movies to customers?
– Not Rambo to Woody Allen fans
– Not Saw VI if you’ve seen all previous Saw movies
Main Classification Approaches
x: input vector (pattern)
y: class label (class)
• Generative
– Model the joint probability p(x, y).
– Make predictions by using Bayes' rule to compute p(y|x).
– Pick the most likely label y.
• Discriminative
– Estimate p(y|x) directly (e.g., learn a direct map from inputs x to the class labels y).
– Pick the most likely label y.
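The generative recipe can be sketched with 1-D Gaussian class-conditional densities. All numbers below (means, variances, priors) are made-up illustrative values, not data from the slides:

```python
import math

# Hypothetical 1-D Gaussian class-conditional densities p(x|y) with
# assumed priors p(y); the numbers are illustrative only.
CLASSES = {
    "salmon":   {"mean": 3.0, "var": 1.0, "prior": 0.5},
    "sea bass": {"mean": 6.0, "var": 1.0, "prior": 0.5},
}

def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def generative_predict(x):
    """Pick argmax_y p(x|y) p(y); by Bayes' rule this maximizes p(y|x)."""
    return max(CLASSES,
               key=lambda y: gaussian_pdf(x, CLASSES[y]["mean"],
                                          CLASSES[y]["var"]) * CLASSES[y]["prior"])

print(generative_predict(2.5))  # salmon (closer to the salmon mean)
```

A discriminative classifier (e.g., logistic regression) would instead fit p(y|x) directly and skip the joint model entirely.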
“Syntactic” Pattern Recognition Approach
• Represent patterns in terms of simple primitives.
• Describe patterns using deterministic grammars or formal languages.
Complexity of PR – An Example
Problem: sorting incoming fish on a conveyor belt.
Assumption: two kinds of fish:
(1) sea bass
(2) salmon
Pre-processing Step
Example:
(1) Image enhancement
(2) Separate touching or occluding fish
(3) Find the boundary of each fish
Feature Extraction
• Assume a fisherman told us that a sea bass is generally longer than a salmon.
• We can use length as a feature and decide between sea bass and salmon according to a threshold on length.
• How should we choose the threshold?
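A minimal sketch of the length rule, with one plausible answer to the threshold question: pick the candidate that minimizes errors on the training examples. All fish lengths below are hypothetical:

```python
def classify_by_length(length, l_star):
    """Decide 'sea bass' if the fish is longer than the threshold l*."""
    return "sea bass" if length > l_star else "salmon"

def best_threshold(salmon_lengths, bass_lengths):
    """Pick the candidate threshold with the fewest training errors."""
    candidates = sorted(salmon_lengths + bass_lengths)
    def errors(t):
        return (sum(l > t for l in salmon_lengths)     # salmon called sea bass
                + sum(l <= t for l in bass_lengths))   # sea bass called salmon
    return min(candidates, key=errors)

l_star = best_threshold([4, 5, 6], [11, 12, 13])  # hypothetical lengths
print(classify_by_length(12, l_star))  # sea bass
```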
“Length” Histograms
• Even though sea bass are longer than salmon on average, there are many fish for which this observation does not hold.
[Figure: length histograms for the two classes, with decision threshold l*]
“Average Lightness” Histograms
• Consider a different feature, such as “average lightness”.
• It seems easier to choose the threshold x*, but we still cannot make a perfect decision.
[Figure: average-lightness histograms for the two classes, with decision threshold x*]
Multiple Features
• To improve recognition accuracy, we might have to use more than one feature at a time.
– Single features might not yield the best performance.
– Using combinations of features might yield better performance.
• How many features should we choose?
x = (x1, x2), where x1 = lightness and x2 = width
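With two features, the decision rule becomes a boundary in the (lightness, width) plane. A linear boundary is sketched below; the weights and bias are made-up illustrative values, not fitted to real fish data:

```python
# Hypothetical linear decision boundary w . x + b = 0 over
# the feature vector x = (lightness, width).
W = (1.5, 0.8)   # illustrative weights
B = -10.0        # illustrative bias

def classify(x):
    """Label the positive side of the boundary 'sea bass', the other 'salmon'."""
    score = W[0] * x[0] + W[1] * x[1] + B
    return "sea bass" if score > 0 else "salmon"

print(classify((8.0, 5.0)))  # sea bass
print(classify((2.0, 2.0)))  # salmon
```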
Classification
• Partition the feature space into two regions by finding the decision boundary that minimizes the error.
• How should we find the optimal decision boundary?
PR System – Two Phases
[Figure: the two phases of a PR system, a training phase and a test phase]
Sensors & Preprocessing
• Sensing:
– Use a sensor (camera or microphone) for data capture.
– PR performance depends on the bandwidth, resolution, sensitivity, and distortion of the sensor.
• Pre-processing:
– Removal of noise in data.
– Segmentation (i.e., isolation of patterns of interest from background).
Training/Test data
• How do we know that we have collected an adequately large and representative set of examples for training/testing the system?
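There is no simple test for representativeness, but holding out part of the labeled data is the standard way to estimate how the system behaves on unseen examples. The 80/20 split below is a conventional choice, not one prescribed by the slides:

```python
import random

def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a list of (pattern, label) pairs and split it in two."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)

data = [(i, i % 2) for i in range(10)]   # toy (pattern, label) pairs
train, test = train_test_split(data)
print(len(train), len(test))  # 8 2
```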
[Figure: available data divided into a training set and a test set]
Feature Extraction
• How to choose a good set of features?
– Discriminative features
– Invariant features (e.g., translation, rotation and scale)
• Are there ways to automatically learn which features are best?
How Many Features?
• Does adding more features always improve performance?
– It might be difficult and computationally expensive to extract certain features.
– Correlated features might not improve performance.
– “Curse” of dimensionality.
Curse of Dimensionality
• Adding too many features can, paradoxically, lead to worse performance.
– Divide each input feature into a number of intervals, so that the value of a feature can be specified approximately by saying which interval it lies in.
– If each input feature is divided into M divisions, the total number of cells is M^d (d: number of features).
– Since each cell must contain at least one point, the amount of training data needed grows exponentially with d.
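The exponential growth above is easy to verify numerically: with M intervals per feature and d features, the grid has M**d cells.

```python
def num_cells(M, d):
    """Number of cells when each of d features is split into M intervals."""
    return M ** d

# With M = 10 intervals per feature, every extra feature multiplies the
# number of cells (and hence the training data needed) by 10.
for d in (1, 2, 5, 10):
    print(d, num_cells(10, d))
```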
Missing Features
• Certain features might be missing (e.g., due to occlusion).
• How should we train the classifier with missing features?
• How should the classifier make the best decision with missing features?
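One simple, if crude, answer to the second question is mean imputation: fill a missing feature with its training-set mean before classifying. This is only a heuristic sketch; marginalizing the class posterior over the missing value is the more principled option.

```python
def impute_missing(x, feature_means):
    """Replace None entries of a feature vector with per-feature training means."""
    return [mean if value is None else value
            for value, mean in zip(x, feature_means)]

# A pattern whose first feature was occluded, with hypothetical training means.
print(impute_missing([None, 4.0], [2.5, 3.0]))  # [2.5, 4.0]
```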
Complexity
• We can get perfect classification performance on the training data by choosing complex models.
• Complex models are tuned to the particular training samples, rather than to the characteristics of the true model.
How well can the model generalize to unknown samples?
[Figure: an overfitted model]
Generalization
• Generalization is defined as the ability of a classifier to produce correct results on novel patterns.
• How can we improve generalization performance?
– More training examples (i.e., better model estimates).
– Simpler models usually yield better performance.
[Figure: a complex model vs. a simpler model]
More on model complexity
• Consider the following 10 sample points (blue circles), generated with some noise.
• The green curve is the true function that generated the data.
• Approximate the true function from the sample points.
More on model complexity (cont’d)
Polynomial curve fitting: polynomials having various orders, shown as red curves, fitted to the set of 10 sample points.
More on complexity (cont’d)
Polynomial curve fitting: 9th-order polynomials fitted to 15 and 100 sample points.
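The overfitting effect in the curve-fitting example can be reproduced in a few lines. The sketch below assumes a setup like the one in the figures: 10 noisy samples of a sine curve. Lagrange interpolation stands in for least-squares fitting; it gives the unique 9th-order polynomial through all 10 points, so the training error is exactly zero even though the curve chases the noise between the points.

```python
import math

def lagrange_interpolate(xs, ys, x):
    """Evaluate at x the unique degree-(n-1) polynomial through (xs, ys)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

xs = [i / 9 for i in range(10)]                       # 10 equispaced samples
true_f = lambda x: math.sin(2 * math.pi * x)          # assumed "green" curve
noise = [0.2 * (-1) ** i for i in range(10)]          # deterministic "noise"
ys = [true_f(x) + n for x, n in zip(xs, noise)]

# Zero error on the training points: the model has memorized the samples.
train_err = max(abs(lagrange_interpolate(xs, ys, x) - y) for x, y in zip(xs, ys))
print(train_err)  # 0.0
```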
Ensembles of Classifiers
• Performance can be improved using a "pool" of classifiers.
• How should we build and combine different classifiers?
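Majority voting is one common way to combine a pool of classifiers (alongside weighted averaging, bagging, and boosting). The three toy threshold classifiers below are purely illustrative:

```python
from collections import Counter

def majority_vote(classifiers, x):
    """Return the label predicted by the most classifiers in the pool."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]

# Three toy threshold classifiers on a scalar input.
pool = [
    lambda x: "A" if x > 1.0 else "B",
    lambda x: "A" if x > 2.0 else "B",
    lambda x: "A" if x > 3.0 else "B",
]
print(majority_vote(pool, 2.5))  # A (two of the three vote "A")
```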
PR System (cont’d)
• Post-processing:
– Exploit context to improve performance.
How m ch info mation are y u
mi sing?
Cost of misclassifications
• Consider the fish classification example; there are two possible classification errors:
(1) Deciding the fish was a sea bass when it was a salmon.
(2) Deciding the fish was a salmon when it was a sea bass.
• Are both errors equally important?
Cost of misclassifications (cont’d)
• Suppose the fish packing company knows that:
– Customers who buy salmon will object vigorously if they see sea bass in their cans.
– Customers who buy sea bass will not be unhappy if they occasionally see some expensive salmon in their cans.
• How does this knowledge affect our decision?
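The knowledge changes the decision rule from "most probable class" to "minimum expected cost". The cost numbers below are hypothetical, chosen so that canning a sea bass as salmon is ten times worse than the reverse, as the packing-company story suggests:

```python
# COST[decision][truth]: hypothetical misclassification costs.
COST = {
    "salmon":   {"salmon": 0, "sea bass": 10},  # bass in a salmon can: bad
    "sea bass": {"salmon": 1, "sea bass": 0},   # salmon in a bass can: mild
}

def min_cost_decision(p_salmon):
    """Choose the label minimizing expected cost given p(salmon | x)."""
    p = {"salmon": p_salmon, "sea bass": 1 - p_salmon}
    expected = {d: sum(COST[d][t] * p[t] for t in p) for d in COST}
    return min(expected, key=expected.get)

# Even at p(salmon | x) = 0.6 we still label the fish 'sea bass',
# because the sea-bass-in-salmon-can error is 10x more costly.
print(min_cost_decision(0.6))  # sea bass
```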
Computational Complexity
• How does an algorithm scale with the number of:
– features
– patterns
– categories
• Consider tradeoffs between computational complexity and performance.
Would it be possible to build a “general purpose” PR system?
• Humans have the ability to switch rapidly and seamlessly between different pattern recognition tasks.
• It is very difficult to design a system that is capable of performing a variety of classification tasks.
– Different decision tasks may require different features.
– Different features might yield different solutions.
– Different tradeoffs exist for different tasks.