Part II: Practical Implementations

Description: Modeling the classes with Stochastic Discrimination (SD). Algorithm for training an SD classifier: generate a projectable weak model; evaluate the model w.r.t. the training set and check enrichment; check uniformity w.r.t. the existing collection; add to the discriminant. A transcript of the slides follows.
Modeling the Classes
Stochastic Discrimination
Algorithm for Training an SD Classifier
• Generate a projectable weak model
• Evaluate the model w.r.t. the training set; check enrichment
• Check uniformity w.r.t. the existing collection
• Add to the discriminant
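The training loop above can be sketched in code. This is a minimal, hypothetical illustration rather than the authors' implementation: it assumes a two-class 2D toy problem, uses random axis-aligned boxes as weak models (boxes are projectable regions), and omits the uniformity check for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_box(lo, hi, rng):
    # A weak model: an axis-aligned box placed at random in feature space.
    a, b = rng.uniform(lo, hi, 2), rng.uniform(lo, hi, 2)
    lo_c, hi_c = np.minimum(a, b), np.maximum(a, b)
    return lambda X: np.all((np.atleast_2d(X) >= lo_c) &
                            (np.atleast_2d(X) <= hi_c), axis=1)

def enriched(model, X, y, margin=0.1):
    # Enrichment check: the model must cover class 1 noticeably more than class 2.
    return model(X[y == 1]).mean() - model(X[y == 2]).mean() > margin

# Toy training data: class 1 in [0,1]^2, class 2 in [1,2]^2.
X = np.vstack([rng.uniform(0, 1, (50, 2)), rng.uniform(1, 2, (50, 2))])
y = np.array([1] * 50 + [2] * 50)

models = []
while len(models) < 100:
    m = random_box(0.0, 2.0, rng)
    if enriched(m, X, y):            # keep only enriched weak models
        models.append(m)

# Discriminant: Y(q) is the fraction of retained models covering point q.
Y = lambda q: np.mean([float(m(q)[0]) for m in models])
print(Y([0.5, 0.5]), Y([1.5, 1.5]))
```

Because every retained model covers class 1 more than class 2 by construction, the average Y over class-1 training points is guaranteed to exceed the average over class-2 points.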
Dealing with Data Geometry:
SD in Practice
2D Example
• Adapted from [Kleinberg, PAMI, May 2000]
• An “r=1/2” random subset in the feature space that covers ½ of all the points
• Watch how many such subsets cover a particular point, say, (2,17)
It’s in 0/1 models: Y = 0/1 = 0.00
It’s in 1/2 models: Y = 1/2 = 0.50
It’s in 2/3 models: Y = 2/3 = 0.67
It’s in 3/4 models: Y = 3/4 = 0.75
It’s in 4/5 models: Y = 4/5 = 0.80
It’s in 5/6 models: Y = 5/6 = 0.83
It’s in 5/7 models: Y = 5/7 = 0.71
It’s in 6/8 models: Y = 6/8 = 0.75
It’s in 7/9 models: Y = 7/9 = 0.78
It’s in 8/10 models: Y = 8/10 = 0.80
It’s in 8/11 models: Y = 8/11 = 0.73
It’s in 8/12 models: Y = 8/12 = 0.67
• Fraction of “r=1/2” random subsets covering point (2,17) as more such subsets are generated
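The behavior plotted on the next few slides, the coverage fraction of a fixed point settling toward r = 1/2 as models accumulate, can be simulated directly. The grid size and point index below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# 400 points in the space; each "r=1/2" model is a random half of them.
n_points = 400
q = 137                              # index of one particular point

covered, fracs = 0, []
for t in range(1, 5001):
    half = rng.permutation(n_points)[: n_points // 2]   # random r=1/2 subset
    covered += int((half == q).any())
    fracs.append(covered / t)

print(fracs[9], fracs[-1])           # noisy at first, then settles near 0.5
```

By the law of large numbers, the running fraction converges to the coverage rate r = 1/2 for every point, which is exactly why unbiased models alone cannot discriminate.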
• Fractions of “r=1/2” random subsets covering several selected points as more such subsets are generated
• Distribution of model coverage for all points in space, with 100 models
• Distribution of model coverage for all points in space, with 200 models
• Distribution of model coverage for all points in space, with 300 models
• Distribution of model coverage for all points in space, with 400 models
• Distribution of model coverage for all points in space, with 500 models
• Distribution of model coverage for all points in space, with 1000 models
• Distribution of model coverage for all points in space, with 2000 models
• Distribution of model coverage for all points in space, with 5000 models
• Introducing enrichment:
For any discrimination to happen, the models must have some difference in coverage for different classes.
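A tiny numeric illustration of enrichment, with made-up membership vectors for eight points per class:

```python
import numpy as np

# Membership of eight points from each class in one hypothetical weak model
# (1 = covered). Enrichment is the gap between the per-class coverage rates.
in_model_class1 = np.array([1, 1, 1, 0, 1, 1, 0, 1])   # 6/8 covered
in_model_class2 = np.array([0, 1, 0, 0, 1, 0, 0, 0])   # 2/8 covered

enrichment = in_model_class1.mean() - in_model_class2.mean()
print(enrichment)   # 0.5: the model treats the two classes differently
```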
• Enforcing enrichment (adding in a bias): require each subset to cover more points of one class than another
(Left: class distribution. Right: a biased, i.e. enriched, weak model.)
• Distribution of model coverage for points in each class, with 100 enriched weak models
• Distribution of model coverage for points in each class, with 200 enriched weak models
• Distribution of model coverage for points in each class, with 300 enriched weak models
• Distribution of model coverage for points in each class, with 400 enriched weak models
• Distribution of model coverage for points in each class, with 500 enriched weak models
• Distribution of model coverage for points in each class, with 1000 enriched weak models
• Distribution of model coverage for points in each class, with 2000 enriched weak models
• Distribution of model coverage for points in each class, with 5000 enriched weak models
• Error rate decreases as the number of models increases
Decision rule: if Y < 0.5 then class 2 else class 1
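A small simulation of this effect, under the assumption that each weak model covers a class-1 point with probability 0.6 and a class-2 point with probability 0.4 (numbers chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed per-model coverage rates: 0.6 for class-1 points, 0.4 for class-2.
def error_rate(n_models, n_points=2000):
    Y1 = rng.binomial(n_models, 0.6, n_points) / n_models   # class-1 Y values
    Y2 = rng.binomial(n_models, 0.4, n_points) / n_models   # class-2 Y values
    # Decision rule from the slide: if Y < 0.5 then class 2 else class 1.
    return ((Y1 < 0.5).mean() + (Y2 >= 0.5).mean()) / 2

print(error_rate(10), error_rate(1000))   # error falls as models are added
```

The per-point Y values concentrate around their class means as models accumulate, so the overlap across the 0.5 threshold, and hence the error rate, shrinks.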
• Sparse Training Data:
Incomplete knowledge about class distributions
Training Set / Test Set
• Distribution of model coverage for points in each class, with 100 enriched weak models
Training Set / Test Set
• Distribution of model coverage for points in each class, with 200 enriched weak models
Training Set / Test Set
• Distribution of model coverage for points in each class, with 300 enriched weak models
Training Set / Test Set
• Distribution of model coverage for points in each class, with 400 enriched weak models
Training Set / Test Set
• Distribution of model coverage for points in each class, with 500 enriched weak models
Training Set / Test Set
• Distribution of model coverage for points in each class, with 1000 enriched weak models
Training Set / Test Set
• Distribution of model coverage for points in each class, with 2000 enriched weak models
Training Set / Test Set
• Distribution of model coverage for points in each class, with 5000 enriched weak models
Training Set / Test Set
No discrimination!
• Models of this type, when enriched for the training set, are not necessarily enriched for the test set
Training Set / Test Set
Random model with 50% coverage of space
• Introducing projectability:
Maintain local continuity of class interpretations.
Neighboring points of the same class should share similar model coverage.
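A minimal contrast between a non-projectable and a projectable model (all numbers are arbitrary): a bare set of memorized training points says nothing about unseen neighbors, while a geometric region lets a nearby point inherit the training point's interpretation.

```python
import numpy as np

rng = np.random.default_rng(3)

train = rng.uniform(0, 1, (20, 2))
neighbor = train[0] + 0.01        # an unseen point right next to a training point

# Non-projectable "model": membership is a bare set of training points,
# so it is undefined off the training set.
point_model = set(map(tuple, train[:10]))

# Projectable model: a geometric region (a small box around train[0]),
# so nearby points share the training point's membership.
lo, hi = train[0] - 0.05, train[0] + 0.05
in_box = lambda p: bool(np.all((p >= lo) & (p <= hi)))

print(tuple(neighbor) in point_model, in_box(train[0]), in_box(neighbor))
```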
• Allow some local continuity in model membership, so that interpretation of a training point can generalize to its immediate neighborhood
(Left: class distribution. Right: a projectable model.)
• Distribution of model coverage for points in each class, with 100 enriched, projectable weak models
Training Set / Test Set
• Distribution of model coverage for points in each class, with 300 enriched, projectable weak models
Training Set / Test Set
• Distribution of model coverage for points in each class, with 400 enriched, projectable weak models
Training Set / Test Set
• Distribution of model coverage for points in each class, with 500 enriched, projectable weak models
Training Set / Test Set
• Distribution of model coverage for points in each class, with 1000 enriched, projectable weak models
Training Set / Test Set
• Distribution of model coverage for points in each class, with 2000 enriched, projectable weak models
Training Set / Test Set
• Distribution of model coverage for points in each class, with 5000 enriched, projectable weak models
Training Set / Test Set
• Promoting uniformity:
All points in the same class should have an equal likelihood of being covered by models of each particular rating.
Retain models that cover points whose coverage by the current collection is lower.
• Distribution of model coverage for points in each class, with 100 enriched, projectable, uniform weak models
Training Set / Test Set
• Distribution of model coverage for points in each class, with 1000 enriched, projectable, uniform weak models
Training Set / Test Set
• Distribution of model coverage for points in each class, with 5000 enriched, projectable, uniform weak models
Training Set / Test Set
• Distribution of model coverage for points in each class, with 10000 enriched, projectable, uniform weak models
Training Set / Test Set
• Distribution of model coverage for points in each class, with 50000 enriched, projectable, uniform weak models
Training Set / Test Set
The 3 Necessary Conditions
Enrichment: discriminating power
Uniformity: complementary information
Projectability: generalization power
Extensions and Comparisons
Alternative Discriminants
• [Berlind 1994]
• Different discriminants for N-class problems
• Additional condition on symmetry
• Approximate uniformity
• Hierarchy of indiscernibility
Estimates of Classification Accuracies
• [Chen 1997]
• Statistical estimate of classification accuracy under weaker conditions:
Approximate uniformity
Approximate indiscernibility
Multi-class Problems
• For n classes, define n discriminants Yi, one for each class i vs. the others
• Classify an unknown point to the class i for which the computed Yi is the largest
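The multi-class rule in miniature, with made-up discriminant values for a single unknown point:

```python
# Hypothetical discriminant values Y_i for one unknown point, one per class
# (each built from models enriched for class i vs. the rest):
Y = {1: 0.42, 2: 0.81, 3: 0.55}

predicted = max(Y, key=Y.get)   # the class with the largest Y_i wins
print(predicted)                # 2
```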
[Ho & Kleinberg ICPR 1996]
Open Problems
• Algorithm for uniformity enforcement: deterministic methods?
• Desirable form of weak models: fewer, more sophisticated classifiers?
• Other ways to address the 3-way trade-off among enrichment, uniformity, and projectability
Random Decision Forest
• [Ho 1995, 1998]
• A structured way to create models: fully split a tree, use leaves as models
• Perfect enrichment and uniformity on the training set
• Promote projectability by subspace projection
Compact Distribution Maps
• [Ho & Baird 1993, 1997]
• Another structured way to create models
• Start with projectable models by coarse quantization of feature value range
• Seek enrichment and uniformity
(Figure: signatures of two types of events, and measurements from a new observation; axes: signal index vs. signal level.)
SD & Other Ensemble Methods
• Ensemble learning via boosting:
A sequential way to promote uniformity of ensemble element coverage
• XCS (a genetic algorithm)
A way to create, filter, and use stochastic models that are regions in feature space
XCS Classifier System
• [Wilson, 1995]: a recent focus of the GA community
Good performance
Reinforcement Learning + Genetic Algorithms
Model: set of rules
(Diagram: the environment supplies inputs and rewards; reinforcement learning updates a set of rules mapping input to class; a genetic algorithm searches for new rules.)
if (shape=square and number>10) then class=red
if (shape=circle and number<5) then class=yellow
Multiple Classifier Systems: Examples in Word Image Recognition
Complementary Strengths of Classifiers
The case for classifier combination
… decision fusion
… mixture of experts
… committee decision making
Rank of true class out of a lexicon of 1091 words, by 10 classifiers for 20 images
Classifier Combination Methods
• Decision Optimization:
find consensus among a given set of classifiers
• Coverage Optimization:
create a set of classifiers that work best with a given decision combination function
Decision Optimization
• Develop classifiers with expert knowledge
• Try to make the best use of their decisions via majority/plurality vote, sum/product rule, probabilistic methods, Bayesian methods, rank/confidence score combination …
• The joint capability of the classifiers sets an intrinsic limit on the combined accuracy
• There is no way to handle the blind spots
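Two of the fixed combination functions mentioned, plurality vote and the sum rule, sketched with hypothetical classifier outputs:

```python
from collections import Counter

# Hypothetical outputs of three classifiers for one input.
votes = ["cat", "dog", "cat"]                     # hard decisions
scores = [{"cat": 0.6, "dog": 0.4},               # soft scores (confidences)
          {"cat": 0.3, "dog": 0.7},
          {"cat": 0.7, "dog": 0.3}]

# Plurality vote over the hard decisions.
plurality = Counter(votes).most_common(1)[0][0]

# Sum rule over the soft scores.
summed = {c: sum(s[c] for s in scores) for c in scores[0]}
sum_rule = max(summed, key=summed.get)

print(plurality, sum_rule)   # cat cat
```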
Difficulties in Decision Optimization
• Reliability versus overall accuracy
• Fixed or trainable combination function
• Simple models or combinatorial estimates
• How to model complementary behavior
Coverage Optimization
• Fix a decision combination function
• Generate classifiers automatically and systematically via training-set sub-sampling (stacking, bagging, boosting), subspace projection (RSM), superclass/subclass decomposition (ECOC), random perturbation of training processes, noise injection …
• Need enough classifiers to cover all blind spots (how many are enough?)
• What else is critical?
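Two of the listed coverage-generation mechanisms, bootstrap sub-sampling (bagging) and random subspace projection (RSM), reduced to their sampling step; sizes here are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(5)
n_samples, n_features = 100, 10

# Bagging: each component classifier trains on a bootstrap resample.
bootstrap_idx = rng.integers(0, n_samples, n_samples)

# Random subspace method (RSM): each component sees a random feature subset.
subspace = rng.choice(n_features, size=5, replace=False)

# A bootstrap of size n typically repeats samples and omits about 37% of them.
print(len(set(bootstrap_idx.tolist())), sorted(subspace.tolist()))
```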
Difficulties in Coverage Optimization
• What kind of differences to introduce:
– Subsamples? Subspaces? Super/subclasses?
– Training parameters?
– Model geometry?
• 3-way tradeoff: discrimination + diversity + generalization
• Effects of the form of component classifiers
Dilemmas and Paradoxes in Classifier Combination
• Weaken individuals for a stronger whole?
• Sacrifice known samples for unseen cases?
• Seek agreements or differences?
Stochastic Discrimination
• A mathematical theory that relates several key concepts in pattern recognition:
– Discriminative power … enrichment
– Complementary information … uniformity
– Generalization power … projectability
• It offers a way to describe complementary behavior of classifiers
• It offers guidelines to design multiple classifier systems (classifier ensembles)