Object Orie’d Data Analysis, Last Time
Classification / Discrimination
• Classical Statistical Viewpoint
  – FLD “good”
  – GLR “better”
  – Conclude always do GLR
• No longer true for HDLSS data
  – GLR fails
  – FLD gave strange effects

HDLSS Discrimination
Movie Through Increasing Dimensions

HDLSS Discrimination
Simple Solution: Mean Difference (Centroid) Method
• Recall not classically recommended
  – Usually no better than FLD
  – Sometimes worse
• But avoids estimation of covariance
• Means are very stable
• Don’t feel HDLSS problem
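The centroid rule is simple enough to sketch in a few lines; a minimal version (function and variable names are mine, not from the slides) that classifies by projecting onto the difference of class means, with no covariance estimate anywhere:

```python
import numpy as np

def mean_difference_classify(X_plus, X_minus, x_new):
    """Mean Difference (Centroid) rule: project x_new onto the
    difference of the class centroids and split at their midpoint.
    No covariance estimate is needed -- the reason the method
    stays stable in HDLSS settings."""
    m_plus = X_plus.mean(axis=0)     # centroid of Class +1
    m_minus = X_minus.mean(axis=0)   # centroid of Class -1
    w = m_plus - m_minus             # normal direction of the split
    midpoint = (m_plus + m_minus) / 2
    return np.sign((x_new - midpoint) @ w)  # +1 -> Class +1

# tiny HDLSS demo: n = 10 per class, d = 50 (made-up data)
rng = np.random.default_rng(0)
X_plus = rng.normal(3.0, 1.0, size=(10, 50))
X_minus = rng.normal(-3.0, 1.0, size=(10, 50))
print(mean_difference_classify(X_plus, X_minus, np.full(50, 3.0)))  # → 1.0
```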

HDLSS Discrimination
Mean Difference (Centroid) Method
Same Data, Movie over dim’s

HDLSS Discrimination
Mean Difference (Centroid) Method
• Far more stable over dimensions
• Because is likelihood ratio solution (for known variance Gaussians)
• Doesn’t feel HDLSS boundary
• Eventually becomes too good?!? Widening gap between clusters?!?
• Careful: angle to optimal grows
• So lose generalizability (since noise inc’s)
HDLSS data present some odd effects…
Maximal Data Piling
Strange FLD effect at HDLSS boundary:
Data Piling: For each class, all data project to a single value

Maximal Data Piling
What is happening?
• Hard to imagine
• Since our intuition is 3-dim’al
• Came from our ancestors…
Try to understand data piling with
some simple examples

Maximal Data Piling
Simple example (Ahn & Marron 2005): $n_{+1} = n_{-1} = 2$ in $\mathbb{R}^3$

Let $\tilde{H}^{(+1)}$ be the hyperplane:
• Generated by Class +1
• Which has dimension = 1
• I.e. line containing the 2 points
Similarly, let $\tilde{H}^{(-1)}$ be the hyperplane
• Generated by Class -1

Maximal Data Piling
Simple example: $n_{+1} = n_{-1} = 2$ in $\mathbb{R}^3$

Let $H^{(+1)}$, $H^{(-1)}$ be
• Parallel shifts of $\tilde{H}^{(+1)}$, $\tilde{H}^{(-1)}$
• So that they pass through the origin
• Still have dimension 1
• But now are subspaces

Maximal Data Piling
Simple example: $n_{+1} = n_{-1} = 2$ in $\mathbb{R}^3$

Construction 1:
Consider the subspace generated by $H^{(+1)}$ & $H^{(-1)}$:
• Two dimensional
• Shown as cyan plane

Maximal Data Piling
Simple example: $n_{+1} = n_{-1} = 2$ in $\mathbb{R}^3$

Construction 1 (cont.):
Let $v_{MDP}$ be
• Direction orthogonal to that subspace
• One dimensional
• Makes Class +1 data project to one point
• And Class -1 data project to one point
• Called Maximal Data Piling Direction

Maximal Data Piling
Simple example: $n_{+1} = n_{-1} = 2$ in $\mathbb{R}^3$

Construction 2:
Let $H^{(+1)\perp}$ & $H^{(-1)\perp}$ be
• Subspaces orthogonal to $H^{(+1)}$ & $H^{(-1)}$ (respectively)
• Projection onto $H^{(+1)\perp}$ collapses Class +1
• Projection onto $H^{(-1)\perp}$ collapses Class -1
• Both are 2-d (planes)

Maximal Data Piling
Simple example: $n_{+1} = n_{-1} = 2$ in $\mathbb{R}^3$

Construction 2 (cont.):
Let the intersection of $H^{(+1)\perp}$ & $H^{(-1)\perp}$ be $v_{MDP}$
• Same Maximal Data Piling Direction
• Projection collapses both Class +1 and Class -1
• Intersection of 2-d planes is a 1-d dir’n

Maximal Data Piling
General Case: $n_{+1}$, $n_{-1}$ in $\mathbb{R}^d$ with $d \ge n_{+1} + n_{-1}$

Let $\tilde{H}^{(+1)}$ & $\tilde{H}^{(-1)}$ be
• Hyperplanes generated by the Classes
• Of dimensions $n_{+1} - 1$, $n_{-1} - 1$ (resp.)
Let $H^{(+1)}$ & $H^{(-1)}$ be
• Parallel subspaces
• I.e. shifts to origin
• Of dimensions $n_{+1} - 1$, $n_{-1} - 1$ (resp.)

Maximal Data Piling
General Case: $n_{+1}$, $n_{-1}$ in $\mathbb{R}^d$ with $d \ge n_{+1} + n_{-1}$

Let $H^{(+1)\perp}$ & $H^{(-1)\perp}$ be
• Orthogonal hyperplanes
• Of dim’ns $d - n_{+1} + 1$, $d - n_{-1} + 1$ (resp.)
• Where
  – Proj’n in dir’ns of $H^{(+1)\perp}$ collapses Class +1
  – Proj’n in dir’ns of $H^{(-1)\perp}$ collapses Class -1
• Expect intersection of dimension $d - n_{+1} - n_{-1} + 2$

Maximal Data Piling
General Case: $n_{+1}$, $n_{-1}$ in $\mathbb{R}^d$ with $d \ge n_{+1} + n_{-1}$

Can show (Ahn & Marron 2005):
• Most dir’ns in the intersection collapse all data to 0
• But there is a direction, $v_{MDP}$
• Where the Classes collapse to different points
• Unique in subspace generated by the data
• Called Maximal Data Piling Direction

Maximal Data Piling
Movie Through Increasing Dimensions

Maximal Data Piling
MDP in Increasing Dimensions:
• Sub-HDLSS dimensions (d = 1-37):
  – Looks similar to FLD?!?
  – Reason for this?
• At HDLSS boundary (d = 38):
  – Again similar to FLD…

Maximal Data Piling
FLD in Increasing Dimensions:
• For HDLSS dimensions (d = 39-1000):
  – Always have data piling
  – Gap between grows for larger n
  – Even though noise increases?
  – Angle (gen’bility) first improves, d = 39-180
  – Then worsens, d = 200-1000
  – Eventually noise dominates
  – Trade-off is where gap near optimal diff’nce

Maximal Data Piling
How to compute $v_{MDP}$?
Can show (Ahn & Marron 2005):
$$ v_{MDP} = \hat\Sigma^{-1} \left( \bar{X}^{(+1)} - \bar{X}^{(-1)} \right) $$
Recall FLD formula:
$$ v_{FLD} = \hat\Sigma_w^{-1} \left( \bar{X}^{(+1)} - \bar{X}^{(-1)} \right) $$
Only difference is global vs. within-class covariance estimates!
Maximal Data Piling
Historical Note:
• Discovery of MDP
• Came from a programming error
• Forgetting to use within-class covariance in FLD…

Maximal Data Piling
Visual similarity of $v_{MDP}$ & $v_{FLD}$?

Can show (Ahn & Marron 2005), for d < n:
$$ v_{MDP} / \| v_{MDP} \| = v_{FLD} / \| v_{FLD} \| $$
I.e. directions are the same!
• How can this be?
• Note lengths are different…
• Study from transformation viewpoint
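The d < n direction equality is easy to check numerically: the global covariance differs from the within-class one by a rank-one term along the mean difference, so the two whitened directions coincide. A minimal sketch with simulated data (all names are mine):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 3, 20  # low dimension: d < n, so both covariances are invertible
Xp = rng.normal(2.0, 1.0, size=(n, d))   # Class +1 sample
Xm = rng.normal(-2.0, 1.0, size=(n, d))  # Class -1 sample
mean_diff = Xp.mean(0) - Xm.mean(0)

# Within-class covariance (pool the centered classes) -> FLD
Cw = np.cov(np.vstack([Xp - Xp.mean(0), Xm - Xm.mean(0)]).T)
v_fld = np.linalg.solve(Cw, mean_diff)

# Global covariance (ignore labels entirely) -> MDP
Cg = np.cov(np.vstack([Xp, Xm]).T)
v_mdp = np.linalg.solve(Cg, mean_diff)

# Same direction, different lengths
u_fld = v_fld / np.linalg.norm(v_fld)
u_mdp = v_mdp / np.linalg.norm(v_mdp)
print(np.allclose(u_fld, u_mdp, atol=1e-8))  # → True
```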

Maximal Data Piling
Recall transfo’ view of FLD:

Maximal Data Piling
Include corresponding MDP transfo’:
Both give same result!

Maximal Data Piling
Details (figure panel labels): FLD sep’ing plane normal vector; Within Class PC1, PC2; Global PC1, PC2

Maximal Data Piling

Acknowledgement:
• This viewpoint
• I.e. insight into why FLD = MDP (for low dim’al data)
• Suggested by Daniel Peña

Maximal Data Piling
Fun e.g.: rotate from PCA to MDP dir’ns

Maximal Data Piling
MDP for other class labellings:
• Always exists
• Separation bigger for natural clusters
• Could be used for clustering
  – Consider all directions
  – Find one that makes largest gap
  – Very hard optimization problem
  – Over $2^{n-2}$ possible directions
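A brute-force version of this clustering idea can be sketched directly: enumerate labellings, compute the MDP direction for each, keep the largest gap. The function names and the exact gap criterion below are my choices, and the nested loop makes the exponential cost of the optimization problem concrete:

```python
import numpy as np
from itertools import combinations

def mdp_direction(X, labels):
    """MDP direction for one labelling: global covariance
    (pseudo-)inverse times the class mean difference."""
    mu = X[labels].mean(axis=0) - X[~labels].mean(axis=0)
    return np.linalg.pinv(np.cov(X.T)) @ mu  # pinv handles HDLSS rank deficiency

def best_piling_split(X):
    """Try every 2-class labelling (up to label swap) and keep the one
    whose projected class means sit furthest apart.  Exponentially many
    labellings in n, which is why this optimization is so hard."""
    n = X.shape[0]
    best_gap, best_labels = -np.inf, None
    for size in range(1, n // 2 + 1):
        for idx in combinations(range(n), size):
            labels = np.zeros(n, dtype=bool)
            labels[list(idx)] = True
            v = mdp_direction(X, labels)
            v = v / np.linalg.norm(v)
            gap = abs((X[labels].mean(0) - X[~labels].mean(0)) @ v)
            if gap > best_gap:
                best_gap, best_labels = gap, labels
    return best_gap, best_labels

# two tight clusters of 3 points each in d = 12 (HDLSS, made-up data)
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(8.0, 0.5, size=(3, 12)),
               rng.normal(-8.0, 0.5, size=(3, 12))])
gap, labels = best_piling_split(X)
```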

Maximal Data Piling
A point of terminology (Ahn & Marron 2005):
MDP is “maximal” in 2 senses:
1. # of data piled
2. Size of gap (within subspace gen’d by data)

Maximal Data Piling
Recurring, over-arching issue:
HDLSS space is a weird place

Kernel Embedding
Aizerman, Braverman and Rozonoer (1964)
• Motivating idea: extend scope of linear discrimination, by adding nonlinear components to data (embedding in a higher dim’al space)
• Better use of name: nonlinear discrimination?

Kernel Embedding
Toy Examples:
In 1d, linear separation splits the domain into only 2 parts
(embedding map: $x \mapsto x$)

Kernel Embedding
But in the “quadratic embedded domain”
$$ \{ (x, x^2) : x \in \mathbb{R} \} $$
linear separation can give 3 parts

Kernel Embedding
But in the quadratic embedded domain $\{(x, x^2) : x \in \mathbb{R}\}$, linear separation can give 3 parts:
• original data space lies in a 1-d manifold
• very sparse region of $\mathbb{R}^2$
• curvature of manifold gives: better linear separation
• can have any 2 break points (2 points determine a line)
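The 3-part split can be seen concretely. A minimal sketch with made-up 1-d data (a middle class that no single threshold on x can isolate) and a hand-picked linear rule in the embedded space:

```python
import numpy as np

# Class +1 sits in the middle, Class -1 on both sides: no single
# threshold on x separates them in 1-d.
x_plus = np.array([-0.5, 0.0, 0.5])
x_minus = np.array([-2.0, -1.5, 1.5, 2.0])

# Quadratic embedding x -> (x, x^2): the data now lie on a parabola
# in R^2, where the LINEAR rule x2 < 1 carves the original line into
# 3 parts (outer / middle / outer).
def embed(x):
    return np.column_stack([x, x ** 2])

sep = lambda p: 1 if p[1] < 1.0 else -1  # linear rule in embedded space
print([sep(p) for p in embed(x_plus)])   # → [1, 1, 1]
print([sep(p) for p in embed(x_minus)])  # → [-1, -1, -1, -1]
```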

Kernel Embedding
Stronger effects for higher order polynomial embedding:
E.g. for cubic,
$$ \{ (x, x^2, x^3) : x \in \mathbb{R} \} $$
linear separation can give 4 parts (or fewer)

Kernel Embedding
Stronger effects for high order poly. embedding:
• original space lies in a 1-d manifold, even sparser in $\mathbb{R}^3$
• higher-d curvature gives: improved linear separation
• can have any 3 break points (3 points determine a plane)?
• Note: relatively few “interesting separating planes”

Kernel Embedding
General View: for original data matrix:
$$ \begin{pmatrix} x_{11} & \cdots & x_{1n} \\ \vdots & \ddots & \vdots \\ x_{d1} & \cdots & x_{dn} \end{pmatrix} $$
add rows:
$$ \begin{pmatrix} x_{11} & \cdots & x_{1n} \\ \vdots & & \vdots \\ x_{d1} & \cdots & x_{dn} \\ x_{11}^2 & \cdots & x_{1n}^2 \\ \vdots & & \vdots \\ x_{d1}^2 & \cdots & x_{dn}^2 \\ x_{11}x_{21} & \cdots & x_{1n}x_{2n} \\ \vdots & & \vdots \end{pmatrix} $$
i.e. embed in a higher dimensional space
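The row augmentation above can be written generically. A short sketch (function name is mine) that appends all degree-2 monomial rows, squares and cross products, to a d x n data matrix:

```python
import numpy as np
from itertools import combinations_with_replacement

def quadratic_embed(X):
    """Append all degree-2 monomial rows (squares and cross products)
    to a d x n data matrix, embedding the n points in a higher
    dimensional space as in the augmented matrix above."""
    d, n = X.shape
    quad_rows = [X[i] * X[j]
                 for i, j in combinations_with_replacement(range(d), 2)]
    return np.vstack([X] + quad_rows)

X = np.arange(6.0).reshape(2, 3)   # d = 2 variables, n = 3 points
print(quadratic_embed(X).shape)    # → (5, 3): rows x1, x2, x1^2, x1*x2, x2^2
```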

Kernel Embedding
Embedded Fisher Linear Discrimination:
Choose Class 1, for any $x_0 \in \mathbb{R}^d$, when:
$$ x_0^t \, \hat\Sigma_w^{-1} \left( \bar{X}^{(1)} - \bar{X}^{(2)} \right) > \tfrac{1}{2} \left( \bar{X}^{(1)} + \bar{X}^{(2)} \right)^t \hat\Sigma_w^{-1} \left( \bar{X}^{(1)} - \bar{X}^{(2)} \right) $$
in embedded space.
• image of class boundaries in original space is nonlinear
• allows more complicated class regions
• Can also do Gaussian Lik. Rat. (or others)
• Compute image by classifying points from original space

Kernel Embedding
Visualization for Toy Examples:
• Have Linear Disc. in Embedded Space
• Study Effect in Original Data Space
• Via Implied Nonlinear Regions
Approach:
• Use Test Set in Original Space (dense equally spaced grid)
• Apply embedded discrimination rule
• Color using the result
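The grid-coloring approach can be sketched end to end: fit FLD in the embedded space, then label a dense grid from the original space through the embedding. The data values and names below are mine, reusing the 1-d quadratic embedding for illustration:

```python
import numpy as np

def fld_rule(Xp, Xm):
    """Fit FLD in the (embedded) feature space; return a classifier."""
    mp, mm = Xp.mean(axis=0), Xm.mean(axis=0)
    Cw = np.cov(np.vstack([Xp - mp, Xm - mm]).T)  # within-class covariance
    w = np.linalg.solve(Cw, mp - mm)
    thresh = (mp + mm) / 2 @ w  # split at the projected midpoint
    return lambda Z: np.sign(Z @ w - thresh)

# 1-d toy data (middle class vs outer class), quadratic embedding
embed = lambda x: np.column_stack([x, x ** 2])
classify = fld_rule(embed(np.array([-0.4, 0.1, 0.5])),
                    embed(np.array([-2.0, -1.6, 1.7, 2.1])))

# "Test set" = dense grid in the ORIGINAL space, pushed through the
# embedding and labeled by the embedded rule (this is what gets colored):
# the implied regions are nonlinear (here: 3 intervals on the line)
grid = np.linspace(-3.0, 3.0, 13)
regions = classify(embed(grid))
```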

Kernel Embedding
Polynomial Embedding, Toy Example 1: Parallel Clouds

Kernel Embedding
Polynomial Embedding, Toy Example 1: Parallel Clouds
• PC 1:
  – always bad
  – finds “embedded greatest var.” only
• FLD:
  – stays good
• GLR:
  – OK discrimination at data
  – but overfitting problems

Kernel Embedding
Polynomial Embedding, Toy Example 2: Split X

Kernel Embedding
Polynomial Embedding, Toy Example 2: Split X
• FLD:
– Rapidly improves with higher degree
• GLR:
– Always good
– but never ellipse around blues…

Kernel Embedding
Polynomial Embedding, Toy Example 3: Donut

Kernel Embedding
Polynomial Embedding, Toy Example 3: Donut
• FLD:
  – Poor fit for low degree
  – then good
  – no overfit
• GLR:
  – Best with no embedding
  – Square shape for overfitting?

Kernel Embedding
Drawbacks to polynomial embedding:
• too many extra terms create spurious structure
• i.e. have “overfitting”
• HDLSS problems typically get worse

Kernel Embedding
Hot Topic Variation: “Kernel Machines”
Idea: replace polynomials by other nonlinear functions
e.g. 1: sigmoid functions from neural nets
e.g. 2: radial basis functions: Gaussian kernels
Related to “kernel density estimation” (recall: smoothed histogram)

Kernel Density Estimation
Chondrite Data:
• Represent points by red bars
• Where are data “more dense”?

Kernel Density Estimation
Chondrite Data:
• Put probability mass 1/n at each point
• Smooth piece of “density”

Kernel Density Estimation
Chondrite Data:
• Sum pieces to estimate density
• Suggests 3 modes (rock sources)
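The three steps above (mass 1/n per point, Gaussian smoothing, summing the pieces) are easy to sketch. The chondrite values are not reproduced here; the data below are made up:

```python
import numpy as np

def kde(data, grid, bandwidth):
    """Kernel density estimate: probability mass 1/n at each data
    point, smeared by a Gaussian of the given bandwidth, with the
    pieces summed over the data."""
    bumps = np.exp(-0.5 * ((grid[:, None] - data[None, :]) / bandwidth) ** 2)
    bumps /= bandwidth * np.sqrt(2.0 * np.pi)  # normalize each bump
    return bumps.sum(axis=1) / len(data)       # average of the n pieces

data = np.array([1.0, 1.2, 4.0])   # made-up sample with a pair + a loner
grid = np.linspace(0.0, 5.0, 101)
dens = kde(data, grid, bandwidth=0.3)
# Riemann-sum check that the estimate integrates to 1
print(round(float(dens.sum() * (grid[1] - grid[0])), 2))  # → 1.0
```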

Kernel Embedding
Radial Basis Functions:
Note: there are several ways to embed:
• Naïve Embedding (equally spaced grid)
• Explicit Embedding (evaluate at data)
• Implicit Embedding (inner prod. based)
(everybody currently does the latter)

Kernel Embedding
Naïve Embedding, Radial Basis Functions:
At some “grid points” $g_1, \ldots, g_k$ in $\mathbb{R}^d$,
For a “bandwidth” (i.e. standard dev’n) $\sigma$,
Consider ($k$ dim’al) functions:
$$ \varphi_\sigma(x - g_1), \ldots, \varphi_\sigma(x - g_k) $$
Replace data matrix with:
$$ \begin{pmatrix} \varphi_\sigma(X_1 - g_1) & \cdots & \varphi_\sigma(X_n - g_1) \\ \vdots & \ddots & \vdots \\ \varphi_\sigma(X_1 - g_k) & \cdots & \varphi_\sigma(X_n - g_k) \end{pmatrix} $$

Kernel Embedding
Naïve Embedding, Radial Basis Functions:
For discrimination: work in radial basis space,
With new data vector $x_0$ represented by:
$$ \begin{pmatrix} \varphi_\sigma(x_0 - g_1) \\ \vdots \\ \varphi_\sigma(x_0 - g_k) \end{pmatrix} $$
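A minimal sketch of this naïve radial basis embedding, taking $\varphi_\sigma$ to be the (unnormalized) Gaussian kernel; the names are mine:

```python
import numpy as np

def rbf_embed(points, grid_points, sigma):
    """Naive radial basis embedding: represent each point by its
    vector of Gaussian kernel values at fixed grid points g_1..g_k."""
    diff = points[:, None, :] - grid_points[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / sigma ** 2)  # shape (n, k)

grid_points = np.array([[-1.0], [0.0], [1.0]])  # k = 3 grid points in 1-d
X = np.array([[0.0], [0.9]])                    # n = 2 data points
Phi = rbf_embed(X, grid_points, sigma=0.5)      # rows = embedded points
print(Phi.shape)  # → (2, 3)
```

A new data vector $x_0$ is embedded the same way, e.g. `rbf_embed(x0[None, :], grid_points, sigma)`, and discrimination is then done on these k-dimensional feature vectors.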

Kernel Embedding
Naïve Embedd’g, Toy E.g. 1: Parallel Clouds
• Good at data
• Poor outside

Kernel Embedding
Naïve Embedd’g, Toy E.g. 2: Split X
• OK at data
• Strange outside

Kernel Embedding
Naïve Embedd’g, Toy E.g. 3: Donut
• Mostly good
• Slight mistake for one kernel

Kernel Embedding
Naïve Embedding, Radial Basis Functions:
Toy Examples, Main lessons:
• Generally good in regions with data
• Unpredictable where data are sparse

Kernel Embedding
Toy Example 4: Checkerboard
Very challenging!
Linear method?
Polynomial embedding?

Kernel Embedding
Toy Example 4: Checkerboard
Polynomial Embedding:
• Very poor for linear
• Slightly better for higher degrees
• Overall very poor
• Polynomials don’t have needed flexibility

Kernel Embedding
Toy Example 4: Checkerboard
Radial Basis Embedding + FLD is excellent!

Kernel Embedding
Drawbacks to naïve embedding:
• Equally spaced grid too big in high d
• Not computationally tractable ($g^d$ grid points)
Approach:
• Evaluate only at data points
• Not on full grid
• But where data live
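The evaluate-at-data idea can be sketched as follows (names mine): using the data points themselves as the kernel centers gives n features per point, an n x n matrix, instead of a grid that grows exponentially with d:

```python
import numpy as np

def explicit_rbf_embed(X, sigma):
    """Evaluate Gaussian radial basis functions at the data points
    themselves rather than a full grid: n features per point (an
    n x n matrix) instead of g^d grid evaluations."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq / sigma ** 2)

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # n = 3 points in d = 2
K = explicit_rbf_embed(X, sigma=1.0)
print(K.shape)  # → (3, 3): each point's feature vector is its kernel
                #   value at every data point
```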

Kernel Embedding
Other types of embedding:
• Explicit
• Implicit
Will be studied soon, after
introduction to Support Vector Machines…

Kernel Embedding
Generalizations of this idea to other types of analysis, & some clever computational ideas.
E.g. “Kernel based, nonlinear Principal Components Analysis”
Ref: Schölkopf, Smola and Müller (1998)