Linear Discriminant Functions (Wen-Hung Liao, 11/25/2008)
TRANSCRIPT
Linear Discriminant Functions
Wen-Hung Liao, 11/25/2008
Introduction: LDF
Assume we know the proper form of the discriminant functions, instead of the underlying probability densities. Use samples to estimate the parameters of the classifier (statistical or non-statistical). We will be concerned with discriminant functions that are either linear in the components of x, or linear in some given set of functions of x.
Why LDF?
- Simplicity vs. accuracy
- Attractive candidates for initial, trial classifiers
- Related to neural networks
Approach
- Find the LDF by minimizing a criterion function.
- Use a gradient descent procedure for the minimization; consider its convergence properties and computational complexity.
- Example of a criterion function: sample risk, or training error. (Not appropriate, why? Because a small training error does not guarantee a small test error.)
LDF and Decision Surfaces
A linear discriminant function:

$$g(x) = w^t x + w_0$$

where w is the weight vector and $w_0$ is the bias or threshold.
Two-Category Case

Decision rule: decide $\omega_1$ if g(x) > 0, decide $\omega_2$ if g(x) < 0.
In other words, x is assigned to $\omega_1$ if the inner product $w^t x$ exceeds the threshold $-w_0$.
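As a minimal sketch of this rule (the weight vector and threshold below are arbitrary illustrative values, not from the slides):

```python
import numpy as np

def decide(x, w, w0):
    """Two-category linear decision rule: class 1 if g(x) > 0, else class 2."""
    g = np.dot(w, x) + w0
    return 1 if g > 0 else 2

# Illustrative weights and threshold.
w = np.array([1.0, -2.0])
w0 = 0.5
print(decide(np.array([3.0, 1.0]), w, w0))  # g = 3 - 2 + 0.5 = 1.5 > 0, so class 1
print(decide(np.array([0.0, 1.0]), w, w0))  # g = -2 + 0.5 = -1.5 < 0, so class 2
```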
Decision Boundary
A hyperplane H is defined by g(x) = 0. If $x_1$ and $x_2$ are both on the decision surface, then:

$$w^t x_1 + w_0 = w^t x_2 + w_0 \;\Rightarrow\; w^t (x_1 - x_2) = 0$$

so w is normal to any vector lying in the hyperplane.
Distance Measure
For any x,

$$x = x_p + r\,\frac{w}{\|w\|}$$

where $x_p$ is the normal projection of x onto H, and r is the algebraic distance. Since $g(x_p) = 0$,

$$g(x) = w^t x + w_0 = w^t\!\left(x_p + r\,\frac{w}{\|w\|}\right) + w_0 = r\,\|w\|$$

so

$$r = \frac{g(x)}{\|w\|}$$
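The relation r = g(x)/||w|| can be checked numerically; the hyperplane below ($x_1 + x_2 - 1 = 0$ in 2-D) is an illustrative choice:

```python
import numpy as np

def signed_distance(x, w, w0):
    """Algebraic (signed) distance from x to the hyperplane w.x + w0 = 0."""
    return (np.dot(w, x) + w0) / np.linalg.norm(w)

# Illustrative hyperplane: x1 + x2 - 1 = 0.
w = np.array([1.0, 1.0])
w0 = -1.0
x = np.array([1.0, 1.0])
r = signed_distance(x, w, w0)           # g(x) = 1, ||w|| = sqrt(2), r = 1/sqrt(2)
# Recover the projection x_p = x - r * w/||w|| and verify it lies on the plane.
x_p = x - r * w / np.linalg.norm(w)
print(r, x_p)                            # x_p = [0.5, 0.5], g(x_p) = 0
```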
Multi-category Case
General case: use either c-1 two-class classifiers (each separating one class from the rest), or c(c-1)/2 linear discriminants (one for every pair of classes).
Use c linear discriminants:

$$g_i(x) = w_i^t x + w_{i0}, \quad i = 1, \ldots, c$$
Distance Measure
$w_i - w_j$ is normal to $H_{ij}$. The distance from x to $H_{ij}$ is given by:

$$r_{ij} = \frac{g_i(x) - g_j(x)}{\|w_i - w_j\|}$$
Quadratic DF
Add terms involving products of pairs of components of x to obtain the quadratic discriminant function:

$$g(x) = w_0 + \sum_{i=1}^{d} w_i x_i + \sum_{i=1}^{d}\sum_{j=1}^{d} w_{ij}\, x_i x_j$$

The separating surface defined by g(x) = 0 is a hyperquadric surface.
Hyperquadric Surfaces
If W = [w_ij] is not singular, the linear terms in g(x) can be eliminated by translating the axes. Define a scaled matrix:

$$\bar{W} = \frac{W}{\tfrac{1}{4}\, w^t W^{-1} w - w_0}$$

Depending on $\bar{W}$, the separating surface is a hypersphere, a hyperellipsoid, or a hyperhyperboloid.
Generalized LDF
Polynomial discriminant functions. Generalized LDF:

$$g(x) = \sum_{i=1}^{\hat{d}} a_i\, y_i(x) = a^t y$$
Augmented Vectors

Augmented feature vector:

$$y = \begin{bmatrix} 1 \\ x_1 \\ \vdots \\ x_d \end{bmatrix} = \begin{bmatrix} 1 \\ x \end{bmatrix}$$

Augmented weight vector:

$$a = \begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_d \end{bmatrix} = \begin{bmatrix} w_0 \\ w \end{bmatrix}$$

so that

$$g(x) = w_0 + \sum_{i=1}^{d} w_i x_i = \sum_{i=0}^{d} w_i x_i = a^t y$$

This maps the d-dimensional x-space to a (d+1)-dimensional y-space.
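A small sketch of the augmented mapping, using arbitrary illustrative weights, confirming that $a^t y$ reproduces $w^t x + w_0$:

```python
import numpy as np

def augment(x):
    """Map d-dimensional x to (d+1)-dimensional y = [1, x1, ..., xd]."""
    return np.concatenate(([1.0], x))

# Illustrative weights: a = [w0, w] folds the bias into the weight vector.
w0, w = 0.5, np.array([1.0, -2.0])
a = np.concatenate(([w0], w))
x = np.array([3.0, 1.0])
y = augment(x)
print(np.dot(a, y))        # a.y = 0.5 + 3 - 2 = 1.5
print(np.dot(w, x) + w0)   # same value: w.x + w0
```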
2-Category Separable Case
Look for a weight vector that classifies all of the samples correctly. If such a weight vector exists, the samples are said to be linearly separable.
Gradient Descent Procedure
Define a criterion function J(a) that is minimized when a is a solution vector.

Step 1: Randomly pick a(1), and compute the gradient vector $\nabla J(a(1))$.

Step 2: a(2) is obtained by moving some distance from a(1) in the direction of steepest descent. In general:

$$a(k+1) = a(k) - \eta(k)\,\nabla J(a(k))$$
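A minimal sketch of the two-step procedure on an illustrative quadratic criterion (the criterion, its minimizer, and the fixed learning rate below are stand-ins chosen for the example, not a criterion from the slides):

```python
import numpy as np

# Illustrative criterion J(a) = ||a - a_star||^2 with known minimizer a_star.
a_star = np.array([2.0, -1.0])

def grad_J(a):
    """Gradient of J(a) = ||a - a_star||^2."""
    return 2.0 * (a - a_star)

a = np.zeros(2)           # step 1: initial guess a(1)
eta = 0.1                 # fixed learning rate eta(k)
for k in range(100):      # step 2: repeatedly move against the gradient
    a = a - eta * grad_J(a)

print(a)                  # converges toward a_star = [2, -1]
```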
Setting the Learning Rate
Second-order expansion of J(a) around a(k):

$$J(a) \simeq J(a(k)) + \nabla J^t\,(a - a(k)) + \frac{1}{2}\,(a - a(k))^t H\,(a - a(k))$$

where H is the Hessian matrix, $h_{ij} = \partial^2 J / \partial a_i\, \partial a_j$.

Substituting $a(k+1) = a(k) - \eta(k)\,\nabla J(a(k))$:

$$J(a(k+1)) \simeq J(a(k)) - \eta(k)\,\|\nabla J\|^2 + \frac{1}{2}\,\eta(k)^2\,\nabla J^t H\,\nabla J$$

This is minimized when

$$\eta(k) = \frac{\|\nabla J\|^2}{\nabla J^t H\,\nabla J}$$
Newton Descent

For nonsingular H, use the update:

$$a(k+1) = a(k) - H^{-1}\,\nabla J$$

Newton's rule converges in fewer steps, but each step is more expensive to compute.
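On a quadratic criterion with a constant Hessian, Newton's rule reaches the minimum in a single step; the matrix H and vector b below are illustrative stand-ins:

```python
import numpy as np

# Illustrative quadratic criterion J(a) = 1/2 a^t H a - b^t a, constant Hessian H.
H = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

def grad_J(a):
    """Gradient of the quadratic criterion: H a - b."""
    return H @ a - b

a = np.zeros(2)                         # a(1)
a = a - np.linalg.solve(H, grad_J(a))   # a(2) = a(1) - H^{-1} grad J
print(np.linalg.norm(grad_J(a)))        # ~0: minimum reached in one Newton step
```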
Perceptron Criterion Function
The perceptron criterion function:

$$J_p(a) = \sum_{y \in Y(a)} (-a^t y)$$

where Y(a) is the set of samples misclassified by a. Since

$$\nabla J_p = \sum_{y \in Y(a)} (-y)$$

the update rule is:

$$a(k+1) = a(k) + \eta(k) \sum_{y \in Y_k} y$$
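A batch perceptron sketch on a small, illustrative, linearly separable set (following the usual convention, class-2 samples are negated so that a correct solution satisfies $a^t y > 0$ for every y):

```python
import numpy as np

# Augmented samples y = [1, x1, x2]; class-2 rows are negated.
# This tiny data set is illustrative and linearly separable.
Y = np.array([[ 1.0, 1.0,  2.0],   # class 1
              [ 1.0, 2.0,  1.0],   # class 1
              [-1.0, 1.0, -1.0],   # class 2, negated
              [-1.0, 0.5, -2.0]])  # class 2, negated

a = np.zeros(3)
eta = 1.0
for k in range(100):
    misclassified = Y[Y @ a <= 0]            # Y_k: samples with a.y <= 0
    if len(misclassified) == 0:
        break                                # all samples correctly classified
    a = a + eta * misclassified.sum(axis=0)  # a(k+1) = a(k) + eta * sum over Y_k

print(a, np.all(Y @ a > 0))
```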
Convergence Proof
Refer to pages 229 to 232 of the textbook.