

What is Pattern Recognition

Recognizing the fish!


WHAT IS A PATTERN?

• Structures regulated by rules
• Goal: Represent empirical knowledge in mathematical forms
• The Mathematics of Perception
• Need: Algebra, probability theory, graph theory

Everything flows! (Heraclitus)

• It is only the invariance, the permanent facts, that enable us to find the meaning in a world of flux.
• We can only perceive variances.
• Our aim is to find the invariant laws of our varying observations.

Pattern Recognition

Learning vs. Recognition
• Learning: Find the mathematical model of a class, using data
• Recognition: Use that model to recognize the unknown object

Two Schools of Thought

• Statistical Pattern Recognition:
Express the image class by a random variable corresponding to class features, and model the class by finding the probability density function.

• Structural Pattern Recognition:
Express the image class by a set of picture primitives. Model the class by finding the relationships among the primitives using grammars or graphs.

1. STATISTICAL PR: ASSUMPTIONS

SOURCE: hypotheses, classes, objects
CHANNEL: noisy
OBSERVATION: multiple sensors, variations

Probability Theory: Apples and Oranges

Marginal probability, conditional probability, joint probability.

Illustration: a red urn (x = x1, c1 samples) and a blue urn (x = x2, c2 samples); each urn contains some oranges (y = y1) and some apples (y = y2).

Probability Theory

The Rules of Probability

    Sum Rule:      p(X) = Σ_Y p(X, Y)
    Product Rule:  p(X, Y) = p(Y | X) p(X)

Bayes' Theorem

    p(Y | X) = p(X | Y) p(Y) / p(X)

    posterior ∝ likelihood × prior
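As a quick illustration of these rules, here is a minimal Python sketch for the urn-and-fruit setting above; the prior over urns and the per-urn fruit proportions are made-up numbers, not taken from the slides.

```python
# Minimal sketch of the sum rule, product rule, and Bayes' theorem.
# The urn prior and fruit proportions below are illustrative assumptions.

# p(urn): prior probability of picking each urn
p_urn = {"red": 0.4, "blue": 0.6}

# p(fruit | urn): conditional probability of drawing each fruit from an urn
p_fruit_given_urn = {
    "red":  {"apple": 0.25, "orange": 0.75},
    "blue": {"apple": 0.75, "orange": 0.25},
}

# Sum rule over the product rule: p(fruit) = sum over urns of p(fruit | urn) * p(urn)
p_orange = sum(p_fruit_given_urn[u]["orange"] * p_urn[u] for u in p_urn)

# Bayes' theorem: p(urn | fruit) = p(fruit | urn) * p(urn) / p(fruit)
p_red_given_orange = p_fruit_given_urn["red"]["orange"] * p_urn["red"] / p_orange

print(f"p(orange) = {p_orange:.3f}")
print(f"p(red urn | orange) = {p_red_given_orange:.3f}")
```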

Probability Densities

Expectations

    E[f] = Σ_x p(x) f(x)   (discrete)        E[f] = ∫ p(x) f(x) dx   (continuous)

Conditional Expectation (discrete):   E_x[f | y] = Σ_x p(x | y) f(x)

Approximate Expectation (discrete and continuous):   E[f] ≈ (1/N) Σ_n f(x_n)

Variances and Covariances

    var[f] = E[ (f(x) − E[f(x)])² ]
    cov[x, y] = E_{x,y}[ x y ] − E[x] E[y]
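A small sketch of the approximate (Monte Carlo) expectation, E[f] ≈ (1/N) Σ_n f(x_n); the choice of a standard normal for p(x) and of f(x) = x² is arbitrary, just to show the estimate converging to the exact value.

```python
import numpy as np

# Approximate expectation: E[f] ~= (1/N) * sum_n f(x_n), x_n drawn from p(x).
rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=100_000)   # x_n ~ N(0, 1)

f = lambda x: x ** 2                        # any function of the random variable
approx_expectation = np.mean(f(samples))    # Monte Carlo estimate of E[x^2]
approx_variance = np.var(samples)           # var[x] = E[(x - E[x])^2]

print(approx_expectation, approx_variance)  # both close to 1 for N(0, 1)
```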

1. BASIC CONCEPTS

• A class is a set of objects having some important properties in common

• A feature extractor is a program that inputs the data (image) and extracts features that can be used in classification.

• A classifier is a program that inputs the feature vector and assigns it to one of a set of designated classes or to the “reject” class.

With what kinds of classes do you work?

Feature Vector Representation

X = [x1, x2, …, xn], each xj a real number
• xj may be an object measurement
• xj may be a count of object parts
Example: object representation [#holes, #strokes, moments, …]

Possible Features for Character Recognition

Modeling the source

Functions f(x, K) perform some computation on feature vector x

Knowledge K from training or programming is used

Final stage determines class



Classifiers often used in CV

• Nearest mean classifier
• Nearest neighbor classifier
• Bayesian classifiers

• Decision tree classifiers
• Artificial neural net classifiers
• Support vector machines

1. Classification using nearest class mean

Compute the Euclidean distance between feature vector X and the mean of each class.

Choose the closest class, if close enough (reject otherwise).
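A minimal sketch of the nearest-class-mean classifier just described, assuming Euclidean distance and an optional rejection threshold; the toy data and threshold value are illustrative.

```python
import numpy as np

def train_means(X, y):
    """Compute one mean feature vector per class label."""
    return {label: X[y == label].mean(axis=0) for label in np.unique(y)}

def classify(x, means, reject_threshold=np.inf):
    """Assign x to the class with the closest mean (Euclidean distance),
    or reject if even the closest mean is too far away."""
    label, dist = min(((c, np.linalg.norm(x - m)) for c, m in means.items()),
                      key=lambda pair: pair[1])
    return label if dist <= reject_threshold else "reject"

# Toy 2-D feature vectors for two classes
X = np.array([[0.0, 0.1], [0.2, 0.0], [1.0, 1.1], [0.9, 1.0]])
y = np.array([0, 0, 1, 1])
means = train_means(X, y)
print(classify(np.array([0.1, 0.05]), means))                        # -> 0
print(classify(np.array([5.0, 5.0]), means, reject_threshold=1.0))   # -> 'reject'
```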

Nearest mean might yield poor results with complex structure.

Class 2 has two modes; where is its mean?

But if the modes are detected, two subclass mean vectors can be used.

Scaling Coordinates by the Covariance Matrix

Define the Mahalanobis distance between a vector x and the class center xc:

    d²(x, xc) = (x − xc)^T C^(-1) (x − xc)

where the covariance matrix is

    C = (1/N) Σ_i (xi − xc)(xi − xc)^T
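A sketch of the Mahalanobis distance with the covariance matrix C estimated from the class samples as above; the sample values are made up, and a real implementation would guard against a singular C.

```python
import numpy as np

def mahalanobis_sq(x, class_samples):
    """d^2 = (x - xc)^T C^-1 (x - xc), with C estimated from the class samples."""
    xc = class_samples.mean(axis=0)                # class center
    diffs = class_samples - xc
    C = diffs.T @ diffs / len(class_samples)       # covariance matrix
    d = x - xc
    return float(d @ np.linalg.inv(C) @ d)

samples = np.array([[1.0, 2.0], [1.5, 1.8], [0.8, 2.2], [1.2, 2.4]])
print(mahalanobis_sq(np.array([1.1, 2.1]), samples))
```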

2. Nearest Neighbor Classification: the k-nearest-neighbor rule

• Goal: Classify x by assigning it the label most frequently represented among its k nearest training samples, i.e. by a voting scheme.

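A minimal sketch of this k-nearest-neighbor voting rule with Euclidean distance; ties are broken arbitrarily by Counter.most_common, and the toy data is illustrative.

```python
import numpy as np
from collections import Counter

def knn_classify(x, X_train, y_train, k=3):
    """Label x by a majority vote among its k nearest training samples."""
    dists = np.linalg.norm(X_train - x, axis=1)     # Euclidean distances to all samples
    nearest = np.argsort(dists)[:k]                 # indices of the k closest samples
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array(["A", "A", "A", "B", "B", "B"])
print(knn_classify(np.array([0.5, 0.5]), X_train, y_train, k=3))   # -> "A"
```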

Bayes Decision Making (Thomas Bayes, 1701-1761)

Goal: Given the training data, compute

    max_i P(wi | x)

i.e. assign x to the class wi with the largest posterior probability.
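A sketch of the Bayes decision rule max_i P(wi | x), modeling each class-conditional density p(x | wi) as a 1-D Gaussian fitted to training data; the class names and numbers are invented for illustration.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

def fit_class_models(samples_by_class):
    """Estimate prior, mean, and std for each class from its training samples."""
    n_total = sum(len(s) for s in samples_by_class.values())
    return {c: (len(s) / n_total, np.mean(s), np.std(s))
            for c, s in samples_by_class.items()}

def bayes_decision(x, models):
    """Pick the class with the largest posterior (up to the common factor p(x))."""
    return max(models,
               key=lambda c: models[c][0] * gaussian_pdf(x, models[c][1], models[c][2]))

train = {"bruised": [0.2, 0.3, 0.25, 0.35], "good": [0.7, 0.8, 0.75, 0.65]}
models = fit_class_models(train)
print(bayes_decision(0.4, models))   # -> "bruised"
```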

Normal distribution

Zero mean and unit standard deviation.

A table of the standard normal enables us to fit histograms and represent them simply.

A new observation of the variable x can then be translated into a probability.
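A sketch of translating a new observation into a probability under a fitted normal model, via the standard normal CDF; the fitted mean and standard deviation below are assumed values.

```python
from math import erf, sqrt

# Translate a new observation x into a probability under a normal model
# fitted to the training histogram. Mean and std are assumed values.
mean, std = 12.0, 2.0
x = 15.0

z = (x - mean) / std                        # standardize: zero mean, unit std deviation
p_le_x = 0.5 * (1.0 + erf(z / sqrt(2.0)))   # P(X <= x) from the standard normal CDF
print(f"z = {z:.2f}, P(X <= x) = {p_le_x:.3f}")
```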

Parametric models can be used

Cherry with bruise: intensities at about 750 nanometers wavelength. Some overlap is caused by the cherry surface turning away.

Information Theory: Claude Shannon (1916-2001)

Goal: Find the amount of information carried by a specific value of a random variable.

Need something intuitive.

Information Theory: C. Shannon
Information: giving form or shape to the mind.
Assumptions: a source and a receiver.

• Information is the quality of a message.
• It may be a truth or a lie.
• If the amount of information in the received message increases, the message is more accurate.
• A common alphabet is needed to communicate the message.

Quantification of information: Given a r.v. X with distribution p(x), what is the amount of information when we receive an outcome x?

Self-information:

    h(x) = −log p(x)

Low probability means surprise, hence high information.
Base e: nats. Base 2: bits.

Entropy: Average self-information needed to specify the state of a random variable.

Given a set of training vectors S, if there are c classes,

    Entropy(S) = − Σ_{i=1..c} pi log2(pi)

where pi is the proportion of category i examples in S.

If all examples belong to the same category, the entropy is 0.

If the examples are equally mixed (1/c examples of each class), the entropy is at its maximum (1.0 for c = 2).

e.g. for c = 2:  −0.5 log2(0.5) − 0.5 log2(0.5) = −0.5(−1) − 0.5(−1) = 1
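A minimal sketch of the entropy formula above applied to a labelled training set; the label lists are toy examples.

```python
import numpy as np

def entropy(labels):
    """Entropy(S) = - sum_i p_i * log2(p_i), with p_i the class proportions in S."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

print(entropy(["A", "A", "A", "A"]))    # 0.0  (pure set)
print(entropy(["A", "A", "B", "B"]))    # 1.0  (equally mixed, c = 2)
```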

Entropy

Why does entropy measure information?
• It makes sense intuitively.
• "Nobody knows what entropy really is, so in any discussion you will always have an advantage." (von Neumann)

2. Structural Techniques
Goal: Represent the classes by graphs.

Training the system: Given the training data,

Step 1: Process the image
– Remove noise
– Binarize
– Find the skeleton
– Normalize the size

Step 2: Extract features
– Side, lake, bay

Step 3: Represent the character by a graph G(N, A)

Training: Represent each class by a graph.
Recognition: Use graph similarities to assign a label.
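A rough sketch of the structural idea: each class is a graph G(N, A) of primitives, and an unknown character gets the label of the most similar class graph. The primitive names and the Jaccard-overlap similarity below are illustrative stand-ins for real graph matching.

```python
# Class templates as graphs of primitives (nodes) and their relations (edges).
template_A = {
    "nodes": {"stroke1", "stroke2", "crossbar", "lake"},
    "edges": {("stroke1", "crossbar"), ("stroke2", "crossbar"), ("stroke1", "stroke2")},
}
template_L = {
    "nodes": {"vertical", "horizontal"},
    "edges": {("vertical", "horizontal")},
}

def graph_similarity(g1, g2):
    """Jaccard overlap of node and edge sets, a crude stand-in for graph matching."""
    node_sim = len(g1["nodes"] & g2["nodes"]) / len(g1["nodes"] | g2["nodes"])
    edge_sim = len(g1["edges"] & g2["edges"]) / len(g1["edges"] | g2["edges"])
    return 0.5 * (node_sim + edge_sim)

unknown = {"nodes": {"stroke1", "stroke2", "crossbar"},
           "edges": {("stroke1", "crossbar"), ("stroke2", "crossbar")}}
classes = {"A": template_A, "L": template_L}
print(max(classes, key=lambda c: graph_similarity(unknown, classes[c])))   # -> "A"
```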

Decision Trees

Example: a binary decision tree for character recognition. The root node tests #holes; lower nodes test #strokes, the L/W ratio, and the best axis direction to separate the characters - / 1 x w 0 A 8 B.

Entropy-Based Automatic Decision Tree Construction

At each node: what feature should be used? What values?

Training set S: x1 = (f11, f12, …, f1m), x2 = (f21, f22, …, f2m), …, xn = (fn1, fn2, …, fnm)

Quinlan suggested information gain in his ID3 system and later the gain ratio, both based on entropy.

Information Gain

The information gain of an attribute A is the expected reduction in entropy caused by partitioning on this attribute:

    Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|Sv| / |S|) Entropy(Sv)

where Sv is the subset of S for which attribute A has value v.

Choose the attribute A that gives the maximum information gain.
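A sketch of the information gain computation for a single attribute on a toy character set; the attribute values and class labels are invented.

```python
import numpy as np

def entropy(labels):
    """Entropy(S) = - sum_i p_i * log2(p_i) over the class proportions in S."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(attribute_values, labels):
    """Gain(S, A) = Entropy(S) - sum_v (|Sv|/|S|) * Entropy(Sv)."""
    attribute_values, labels = np.asarray(attribute_values), np.asarray(labels)
    weighted = sum((np.sum(attribute_values == v) / len(labels)) *
                   entropy(labels[attribute_values == v])
                   for v in np.unique(attribute_values))
    return entropy(labels) - weighted

# Toy set: attribute "#holes" vs. character class
holes  = [0, 0, 1, 1, 2, 2]
labels = ["/", "1", "A", "0", "8", "B"]
print(information_gain(holes, labels))
```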

Information Gain (cont.)

Attribute A partitions the set S into subsets Sv = {s ∈ S | value(A) = v} for its values v1, v2, …, vk; tree construction then repeats recursively on each subset.

Information gain has the disadvantage that it prefers attributes with a large number of values that split the data into small, pure subsets.

Gain Ratio

Gain ratio is an alternative metric from Quinlan's 1986 paper and is used in the popular C4.5 package (free!).

    GainRatio(S, A) = Gain(S, A) / SplitInfo(S, A)

    SplitInfo(S, A) = − Σ_{i=1..ni} (|Si| / |S|) log2(|Si| / |S|)

where Si is the subset of S in which attribute A takes its ith value, and ni is the number of distinct values of A.

SplitInfo measures the amount of information provided by an attribute that is not specific to the category.
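A sketch of the gain ratio for the same toy attribute; note that SplitInfo is simply the entropy of the attribute's own value distribution, so the same helper can be reused.

```python
import numpy as np

def entropy(values):
    """- sum_i p_i * log2(p_i) over the value proportions."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def gain_ratio(attr, labels):
    """GainRatio(S, A) = Gain(S, A) / SplitInfo(S, A)."""
    attr, labels = np.asarray(attr), np.asarray(labels)
    gain = entropy(labels) - sum(
        (np.sum(attr == v) / len(labels)) * entropy(labels[attr == v])
        for v in np.unique(attr))
    split_info = entropy(attr)          # entropy of the attribute's value distribution
    return gain / split_info

holes  = [0, 0, 1, 1, 2, 2]
labels = ["/", "1", "A", "0", "8", "B"]
print(gain_ratio(holes, labels))
```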

Information Content

Note: A related method of decision tree construction, using a measure called Information Content, is given in the text, with a full numeric example of its use.

Artificial Neural Nets

Artificial Neural Nets (ANNs) are networks of artificial neuron nodes, each of which computes a simple function.

An ANN has an input layer, an output layer, and "hidden" layers of nodes; inputs feed forward to outputs.

Node Functions

Neuron i receives inputs x1, x2, …, xn over weighted links w(1, i), …, w(j, i), …, w(n, i) and computes

    output = g( Σ_j xj · w(j, i) )

Function g is commonly a step function, sign function, or sigmoid function (see text).
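A minimal sketch of a single neuron node computing output = g(Σ_j xj · w(j, i)), with g chosen as the sigmoid; the inputs and weights are arbitrary example values.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def neuron_output(x, w, g=sigmoid):
    """output = g(sum_j x_j * w(j, i)) for a single neuron i."""
    return g(sum(xj * wj for xj, wj in zip(x, w)))

x = [0.5, -1.0, 2.0]        # input feature values
w = [0.8, 0.2, 0.5]         # weights w(j, i) into neuron i
print(neuron_output(x, w))  # ~ sigmoid(1.2) ≈ 0.77
```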

Neural Net Learning

That's beyond the scope of this text; only simple feed-forward learning is covered.

The most common method is called back propagation.

There are software packages available.

What do you use?

How to Measure PERFORMANCE

Classification rate: Count the number of correctly classified samples in class wi, call it ci, and find the frequency:

    classification rate = ci / N
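A sketch of the classification rate on toy predictions; the label lists are invented.

```python
# classification rate = (number of correctly classified samples) / N
true_labels      = ["A", "A", "B", "B", "A", "B"]
predicted_labels = ["A", "B", "B", "B", "A", "A"]

correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
classification_rate = correct / len(true_labels)
print(classification_rate)   # 4/6 ≈ 0.667
```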

2-Class problem

                            Estimated Class 1 (present)      Estimated Class 2 (absent)
True Class 1 (present)      True hit, correct detection      False dismissal, false negative
True Class 2 (absent)       False alarm, false positive      True dismissal
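A sketch that tallies the four cells of this 2-class table from true vs. estimated labels; the label lists are toy data (1 = class present, 2 = class absent).

```python
true      = [1, 1, 1, 2, 2, 2, 1, 2]
estimated = [1, 1, 2, 2, 1, 2, 1, 2]

hits             = sum(t == 1 and e == 1 for t, e in zip(true, estimated))  # correct detections
false_dismissals = sum(t == 1 and e == 2 for t, e in zip(true, estimated))  # false negatives
false_alarms     = sum(t == 2 and e == 1 for t, e in zip(true, estimated))  # false positives
true_dismissals  = sum(t == 2 and e == 2 for t, e in zip(true, estimated))

print(hits, false_dismissals, false_alarms, true_dismissals)   # 3 1 1 3
```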

Receiver Operating Characteristic (ROC) curve

Plots the correct detection rate versus the false alarm rate.

Generally, false alarms go up with attempts to detect a higher percentage of the known objects.
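A sketch of how ROC points arise: sweeping a decision threshold over detector scores trades off detection rate against false alarm rate. The scores, labels, and thresholds are made up.

```python
# 1 = object present, 0 = absent; scores are detector outputs for each sample.
scores = [0.9, 0.8, 0.7, 0.55, 0.5, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,    0,   1,   0,   0  ]

positives = sum(labels)
negatives = len(labels) - positives

for threshold in (0.35, 0.6, 0.85):
    detected   = [s >= threshold for s in scores]
    detection  = sum(d and l == 1 for d, l in zip(detected, labels)) / positives
    false_alarm = sum(d and l == 0 for d, l in zip(detected, labels)) / negatives
    print(f"threshold={threshold:.2f}  detection rate={detection:.2f}  "
          f"false alarm rate={false_alarm:.2f}")
```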

Precision and Recall in CBIR or DR
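A sketch of precision and recall for a retrieval result, as used in CBIR or document retrieval; the relevant and retrieved document-id sets are illustrative.

```python
relevant  = {1, 2, 3, 4, 5}          # documents that are actually relevant
retrieved = {3, 4, 5, 6, 7, 8}       # documents returned by the system

true_positives = relevant & retrieved
precision = len(true_positives) / len(retrieved)   # fraction of retrieved that are relevant
recall    = len(true_positives) / len(relevant)    # fraction of relevant that are retrieved
print(f"precision = {precision:.2f}, recall = {recall:.2f}")   # 0.50, 0.60
```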

Confusion matrix shows empirical performance.
