Pattern Classification
Content
General Method, K Nearest Neighbors, Decision Trees, Neural Networks
General Method
Training: learning knowledge or parameters from the training data.
Testing: applying the learned knowledge or parameters to new instances.
K Nearest Neighbors
Advantages: nonparametric architecture, simple, powerful, requires no training time.
Disadvantages: memory intensive; classification/estimation is slow.
K Nearest Neighbors
The key issues involved in training this model include setting the variable K (using validation techniques such as cross-validation) and choosing the type of distance metric.
Euclidean measure:
Dist(X, Y) = sqrt( Σ_{i=1}^{D} (X_i - Y_i)^2 )
Figure: K Nearest Neighbors example. The stored training set patterns are plotted; X marks the input pattern to be classified; dashed lines show the Euclidean distance to the nearest three patterns.
Store all input data in the training set.
For each pattern in the test set:
  Search for the K nearest patterns to the input pattern using a Euclidean distance measure.
  For classification, compute the confidence for each class as C_i / K, where C_i is the number of patterns among the K nearest patterns belonging to class i.
  The classification for the input pattern is the class with the highest confidence.
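A minimal NumPy sketch of this procedure; the array names (X_train, y_train) and the toy data are illustrative, not part of the slides:

```python
import numpy as np

def knn_classify(x, X_train, y_train, k=3):
    """Classify input pattern x with the K nearest neighbors rule."""
    # Euclidean distance from x to every stored training pattern
    dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    # Indices of the K nearest patterns
    nearest = np.argsort(dists)[:k]
    # Confidence for each class i is C_i / K
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    confidences = counts / k
    # Predicted class = class with the highest confidence
    return labels[np.argmax(confidences)], dict(zip(labels, confidences))

# Example: three stored patterns, two classes
X_train = np.array([[0.0, 0.0], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 1, 1])
print(knn_classify(np.array([1.0, 0.9]), X_train, y_train, k=3))
# predicts class 1 with confidence 2/3
```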
Training parameters and typical settings
Number of nearest neighbors: the number of nearest neighbors (K) should be chosen by cross-validation over a range of K settings.
K = 1 is a good baseline model to benchmark against.
A good rule of thumb is that K should be less than the square root of the total number of training patterns.
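As an illustration only, K could be picked by cross-validation over the range suggested by the rule of thumb; this sketch assumes scikit-learn, which the slides do not mention:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def choose_k(X, y, folds=5):
    """Pick K by cross-validation, trying values up to sqrt(#patterns)."""
    k_max = max(1, int(np.sqrt(len(X))))           # rule of thumb: K < sqrt(N)
    scores = {}
    for k in range(1, k_max + 1):                  # K = 1 is the baseline
        model = KNeighborsClassifier(n_neighbors=k)
        scores[k] = cross_val_score(model, X, y, cv=folds).mean()
    return max(scores, key=scores.get), scores
```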
Training parameters and typical settings
Input compression: since KNN is very storage intensive, we may want to compress the data patterns as a preprocessing step before classification.
Using input compression will usually result in slightly worse performance.
Sometimes compression will improve performance, because it performs an automatic normalization of the data, which can equalize the effect of each input in the Euclidean distance measure.
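One possible preprocessing sketch, assuming NumPy and a hypothetical target dimensionality d: z-score normalization equalizes each input's effect on the Euclidean distance, and a PCA projection compresses the patterns.

```python
import numpy as np

def compress(X, d):
    """Normalize the patterns, then project onto the d leading principal components."""
    # Z-score normalization: each input now contributes comparably to the distance
    Xn = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    # PCA via SVD of the normalized data
    _, _, Vt = np.linalg.svd(Xn, full_matrices=False)
    return Xn @ Vt[:d].T          # compressed patterns, one row per pattern
```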
Euclidean distance metric fails
Figure: the pattern to be classified, shown with Prototype A and Prototype B.
Prototype B seems more similar than Prototype A according to the Euclidean distance, so the digit "9" is misclassified as a "4".
A possible solution is to use a distance metric that is invariant to irrelevant transformations.
Decision trees
Decision trees are popular for pattern recognition because the models they produce are easier to understand.
Figure: a decision tree growing from a root node, where A marks the internal nodes of the tree, B the leaves (terminal nodes) of the tree, and C the branches (decision points) of the tree.
Decision trees-Binary decision trees
Classification of an input vector is done by traversing the tree, beginning at the root node and ending at a leaf.
Each node of the tree computes an inequality (e.g. BMI < 24, yes or no) based on a single input variable.
Each leaf is assigned to a particular class.
Figure: a binary decision tree; each node tests an inequality such as BMI < 24 and branches on Yes/No.
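A minimal sketch of this traversal; the tree below is hypothetical (only the BMI < 24 test comes from the slide), with one test per node and a class label at each leaf:

```python
# Each internal node tests a single input variable against a threshold;
# each leaf carries a class label.
tree = {
    "var": "BMI", "threshold": 24,
    "yes": {"class": "normal"},                       # leaf
    "no": {
        "var": "age", "threshold": 50,                # hypothetical second test
        "yes": {"class": "at_risk"},
        "no": {"class": "normal"},
    },
}

def classify(x, node):
    """Traverse from the root, following Yes/No branches, until a leaf is reached."""
    while "class" not in node:
        node = node["yes"] if x[node["var"]] < node["threshold"] else node["no"]
    return node["class"]

print(classify({"BMI": 22, "age": 60}, tree))   # -> normal
```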
Decision trees-Binary decision trees
Since each inequality used to split the input space is based on only one input variable, each node draws a boundary that can be geometrically interpreted as a hyperplane perpendicular to that variable's axis.
Decision trees-Linear decision trees
Linear decision trees are similar to binary decision trees, except that the inequality computed at each node takes an arbitrary linear form that may depend on multiple variables.
Figure: a linear decision tree; each node tests a linear inequality such as a·X1 + b·X2 < c and branches on Yes/No.
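A single linear node, sketched with made-up coefficients a, b and threshold c (none of these values come from the slides); unlike the axis-parallel tests above, its boundary is an oblique line in the (X1, X2) plane:

```python
def linear_node(x, a=1.0, b=-2.0, c=0.5):
    """Branch on a linear inequality over several input variables."""
    return "yes" if a * x[0] + b * x[1] < c else "no"

print(linear_node([1.0, 0.5]))   # 1.0 - 1.0 = 0.0 < 0.5 -> "yes"
```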
Biological Neural Systems
Neuron switching time: > 10^-3 secs
Number of neurons in the human brain: ~10^10
Connections (synapses) per neuron: ~10^4 to 10^5
Face recognition: 0.1 secs
High degree of distributed and parallel computation
Highly fault tolerant, highly efficient
Learning is key
(Excerpt from Russell and Norvig)
A Neuron
Computation: input signals → input function (linear) → activation function (nonlinear) → output signal.
Figure: a single unit j with input links carrying activations a_k, weights W_{k,j}, and output links carrying a_j.
in_j = Σ_k W_{k,j} · a_k
a_j = output(in_j)
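A minimal sketch of this computation for one unit j; the sigmoid is only one possible choice of nonlinear activation (the slide does not specify which):

```python
import numpy as np

def unit_output(a_in, W_j):
    """in_j = sum_k W_kj * a_k (linear input function), a_j = g(in_j) with nonlinear g."""
    in_j = np.dot(W_j, a_in)            # weighted sum of the incoming activations
    return 1.0 / (1.0 + np.exp(-in_j))  # nonlinear activation -> output signal

print(unit_output(np.array([0.5, 1.0]), np.array([0.2, -0.4])))
```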
Part 1. Perceptrons: Simple NN
Figure: a perceptron; the inputs x1, ..., xn are multiplied by the weights w1, ..., wn, summed into the activation a, which is thresholded to give the output y.
a = Σ_{i=1}^{n} w_i x_i
x_i's range: [0, 1]
y = 1 if a ≥ θ, 0 if a < θ
Decision Surface of a Perceptron
Figure: decision surface of a perceptron in the (x1, x2) plane. The decision line w1·x1 + w2·x2 = θ separates the points labeled 1 from the points labeled 0.
Linear Separability
Figure: the AND function plotted in the (x1, x2) plane; only the point (1, 1) is labeled 1, the other three points are labeled 0, so a single line separates the two classes.
Logical AND
x1 x2 a y
0 0 0 0
0 1 1 0
1 0 1 0
1 1 2 1
w1 = 1, w2 = 1, θ = 1.5
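A quick check of these settings against the AND truth table above, as a small sketch:

```python
def perceptron(x1, x2, w1=1.0, w2=1.0, theta=1.5):
    """y = 1 if w1*x1 + w2*x2 >= theta, else 0."""
    a = w1 * x1 + w2 * x2
    return 1 if a >= theta else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, perceptron(x1, x2))   # matches the y column of the AND table
```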
Figure: the XOR function plotted in the (x1, x2) plane; the points labeled 1 and the points labeled 0 cannot be separated by a single line, so no setting w1 = ?, w2 = ?, θ = ? works.
Logical XOR
x1 x2 y
0 0 0
0 1 1
1 0 1
1 1 0
Threshold as Weight: W0
Figure: a perceptron with an extra input x0 = -1 whose weight is w0, alongside the inputs x1, ..., xn with weights w1, ..., wn.
a = Σ_{i=0}^{n} w_i x_i
y = 1 if a ≥ 0, 0 if a < 0
θ = w0
Thus, y = sgn(a) = 0 or 1.
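The same idea as a short sketch: the input vector is augmented with x0 = -1 carrying w0 = θ, so the comparison is against 0.

```python
import numpy as np

def perceptron_bias(x, w):
    """x and w are augmented: x0 = -1 and w[0] = theta, so y = 1 iff sum_i w_i x_i >= 0."""
    a = np.dot(w, np.concatenate(([-1.0], x)))
    return 1 if a >= 0 else 0

# AND again: w0 = theta = 1.5, w1 = w2 = 1
w = np.array([1.5, 1.0, 1.0])
print([perceptron_bias(np.array([x1, x2]), w) for x1 in (0, 1) for x2 in (0, 1)])
```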
Perceptron Learning Rule
w' = w + α (t - y) x
w_i := w_i + Δw_i = w_i + α (t - y) x_i   (i = 1..n)
The parameter α is called the learning rate (in Han's book it is lower case l). It determines the magnitude of the weight updates Δw_i.
If the output is correct (t = y), the weights are not changed (Δw_i = 0).
If the output is incorrect (t ≠ y), the weights w_i are changed such that the output of the perceptron for the new weights w'_i moves closer to / further from the input x_i.
Perceptron Training Algorithm
Repeatfor each training vector pair (x,t)
evaluate the output y when x is the inputif yt then
form a new weight vector w’ accordingto w’=w + (t-y) x
else do nothing
end if end forUntil y=t for all training vector pairs or # iterations > k
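A runnable sketch of this algorithm using the threshold-as-weight form and 0/1 outputs; the learning rate value and the AND training data are illustrative assumptions:

```python
import numpy as np

def train_perceptron(X, t, alpha=0.1, max_iters=100):
    """Repeat the update w' = w + alpha*(t - y)*x until all patterns are classified correctly."""
    X = np.hstack([-np.ones((len(X), 1)), X])   # x0 = -1 column (threshold as weight w0)
    w = np.zeros(X.shape[1])
    for _ in range(max_iters):
        errors = 0
        for x_i, t_i in zip(X, t):
            y = 1 if np.dot(w, x_i) >= 0 else 0
            if y != t_i:                        # only misclassified patterns change w
                w = w + alpha * (t_i - y) * x_i
                errors += 1
        if errors == 0:                         # y = t for all training pairs
            break
    return w

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 0, 0, 1])                      # logical AND
print(train_perceptron(X, t))
```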
Perceptron Learning Example
Targets are t = 1 and t = -1; the perceptron outputs o = 1 on one side of the decision line and o = -1 on the other.
Initial weights w = [0.25, -0.1, 0.5] with the input augmented as [1, x1, x2]; decision line x2 = 0.2·x1 - 0.5.
(x, t) = ([-1, -1], 1): o = sgn(0.25 + 0.1 - 0.5) = -1, so Δw = [0.2, -0.2, -0.2]
(x, t) = ([2, 1], -1): o = sgn(0.45 - 0.6 + 0.3) = 1, so Δw = [-0.2, -0.4, -0.2]
(x, t) = ([1, 1], 1): o = sgn(0.25 - 0.7 + 0.1) = -1, so Δw = [0.2, 0.2, 0.2]
(These updates correspond to Δw = α (t - o)·[1, x1, x2] with learning rate α = 0.1.)
Part 2. Multi Layer Networks
Figure: a multilayer network; the input vector feeds the input nodes, which connect to hidden nodes, which connect to the output nodes producing the output vector.
Multiple layers can be used to learn nonlinear functions.
How to set the weights?
Figure: the XOR problem again; no single perceptron (w1 = ?, w2 = ?, θ = ?) can separate the 1s from the 0s in the (x1, x2) plane.
Logical XOR
x1 x2 y
0 0 0
0 1 1
1 0 1
1 1 0
Figure: a two-layer network for XOR with inputs x1 and x2, hidden nodes 3 and 4, output node 5, and weights such as w23 and w35.
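One hand-picked set of weights that makes such a network compute XOR (these values are an illustration, not the weights w23 or w35 from the figure): node 3 acts as OR, node 4 as AND, and node 5 outputs 1 only when OR is true and AND is false.

```python
def step(a, theta):
    """Threshold unit: 1 if a >= theta, else 0."""
    return 1 if a >= theta else 0

def xor_net(x1, x2):
    h3 = step(1.0 * x1 + 1.0 * x2, 0.5)      # hidden node 3: OR(x1, x2)
    h4 = step(1.0 * x1 + 1.0 * x2, 1.5)      # hidden node 4: AND(x1, x2)
    return step(1.0 * h3 - 1.0 * h4, 0.5)    # output node 5: OR and not AND

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))       # matches the XOR table above
```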
End