Near-Minimax Optimal Learning with Decision Trees
University of Wisconsin-Madison and Rice University
Rob Nowak and Clay Scott
Supported by the NSF and the ONR
Basic Problem
Classification: build a decision rule based on labeled training data
Given n training points, how well can we do?
Smooth Decision Boundaries
Suppose that the Bayes decision boundary behaves locally like a Lipschitz function
Mammen & Tsybakov ‘99
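One way to make this assumption concrete (a standard box-counting sketch, not shown on the slide, but it drives the tree analysis later in the talk): a boundary that is locally the graph of a Lipschitz function meets only a vanishing fraction of the dyadic cells at fine scales.

```latex
\[
  N_j \;\le\; C\, 2^{j(d-1)}
  \qquad\Longrightarrow\qquad
  \frac{N_j}{2^{jd}} \;\le\; C\, 2^{-j} \;\longrightarrow\; 0
  \quad \text{as } j \to \infty,
\]
where $N_j$ is the number of dyadic cells of side $2^{-j}$ in $[0,1]^d$
that the boundary intersects, and $C$ depends on the Lipschitz constant.
```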
Dyadic Thinking about Classification Trees
recursive dyadic partition
Pruned dyadic partition
Pruned dyadic tree
Dyadic Thinking about Classification Trees
Hierarchical structure facilitates optimization
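The point about optimization can be sketched concretely: because the best pruning of a cell depends only on the best prunings of its children, an optimal pruned tree under a penalized empirical risk is computable bottom-up by dynamic programming. A minimal Python sketch (the `Node` class, the stored cell errors, and the additive per-leaf penalty are illustrative assumptions, not the talk's exact criterion):

```python
# Prune a binary recursive dyadic partition by bottom-up dynamic
# programming, minimizing (empirical errors) + penalty * (#leaves).

class Node:
    def __init__(self, error, left=None, right=None):
        self.error = error            # empirical errors if this cell is a leaf
        self.left, self.right = left, right

def prune(node, penalty):
    """Return (best cost, pruned tree) for the subtree rooted at node."""
    leaf_cost = node.error + penalty
    if node.left is None:             # already a leaf: nothing to prune
        return leaf_cost, Node(node.error)
    lc, lt = prune(node.left, penalty)
    rc, rt = prune(node.right, penalty)
    if lc + rc < leaf_cost:           # keeping the split is cheaper
        return lc + rc, Node(node.error, lt, rt)
    return leaf_cost, Node(node.error)  # prune this cell to a leaf
```

For example, a root cell with 10 errors whose children fit the data well is split, while uninformative splits collapse back to leaves; the whole optimization is one linear-time pass over the tree.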
The Classification Problem
Problem:
Classifiers
The Bayes Classifier:
Minimum Empirical Risk Classifier:
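The two rules can be contrasted in code: the Bayes classifier thresholds the unknown posterior, so in practice one minimizes empirical risk over a class of candidate rules. A minimal sketch over a finite class of threshold classifiers (the data and candidate rules are illustrative, not from the talk):

```python
# Empirical risk of a classifier h on labeled data, and selection of
# the minimum empirical risk classifier from a finite class.

def empirical_risk(h, data):
    """Fraction of training points (x, y) with h(x) != y."""
    return sum(h(x) != y for x, y in data) / len(data)

def erm(classifiers, data):
    """Minimum empirical risk classifier from a finite class."""
    return min(classifiers, key=lambda h: empirical_risk(h, data))

data = [(0.1, 0), (0.3, 0), (0.6, 1), (0.9, 1)]
thresholds = [lambda x, t=t: int(x > t) for t in (0.0, 0.45, 1.0)]
best = erm(thresholds, data)
```

Here `best` is the threshold rule at 0.45, which separates the sample perfectly; the generalization question on the following slides is how far its empirical risk can be from its true risk.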
Generalization Error Bounds
Selecting a good h
Convergence to Bayes Error
Ex. Dyadic Classification Trees
[Figure: labeled training data, Bayes decision boundary, complete RDP, pruned RDP, and the resulting dyadic classification tree]
Codes for DCTs
[Figure: a pruned dyadic tree with each internal node labeled 0 and each leaf labeled 1]
code-lengths: one bit per tree node (0 = internal, 1 = leaf) plus one bit per leaf for its class label
ex: code: 0001001111 + 6 bits for leaf labels
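This tree code can be generated mechanically. A sketch under the assumed convention (one reading of the slide, not confirmed by it) that a preorder traversal emits 0 at each internal node and 1 at each leaf, followed by one bit per leaf for its class label:

```python
# Prefix code for a pruned binary tree: preorder 0/1 structure bits,
# then one bit per leaf for its class label. The 0 = internal,
# 1 = leaf convention is an assumed reading of the slide.

def encode(tree):
    """tree: nested pair (left, right) for an internal node, or an int
    class label (0/1) for a leaf. Returns (structure bits, label bits)."""
    if isinstance(tree, int):                 # leaf
        return "1", str(tree)
    sl, ll = encode(tree[0])
    sr, lr = encode(tree[1])
    return "0" + sl + sr, ll + lr

t = (((0, 1), 0), (1, (0, 1)))                # 6 leaves, 5 internal nodes
structure, labels = encode(t)                 # 2*6 - 1 = 11 structure bits
```

Because the code is prefix-free, the code length of a tree serves directly as the complexity term in the penalized criterion on the next slides.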
Error Bounds for DCTs
Compare with CART:
Rate of Convergence
Suppose that the Bayes decision boundary behaves locally like a Lipschitz function
Mammen & Tsybakov ‘99 C. Scott & RN ‘02
Why too slow?
Because the Bayes decision boundary is a (d-1)-dimensional manifold, "good" trees are unbalanced; yet the global bound favors all |T|-leaf trees equally.
Local Error Bounds in Classification
Spatial Error Decomposition: Mansour & McAllester ‘00
Relative Chernoff Bound
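The bound can be sanity-checked numerically. Assuming the multiplicative form P(p_hat <= (1 - eps) p) <= exp(-n p eps^2 / 2), one common statement of the relative Chernoff bound (the slide's exact form may differ), a quick Monte Carlo check:

```python
import math
import random

# Monte Carlo illustration of the relative (multiplicative) Chernoff
# bound  P( p_hat <= (1 - eps) * p ) <= exp(-n * p * eps**2 / 2),
# where p_hat is the empirical mean of n Bernoulli(p) samples.
# The parameters below are illustrative, not from the talk.

def lower_tail_freq(p, n, eps, trials, seed=0):
    """Empirical frequency of the lower-tail event over many repetitions."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        p_hat = sum(rng.random() < p for _ in range(n)) / n
        hits += p_hat <= (1 - eps) * p
    return hits / trials

p, n, eps = 0.2, 400, 0.5
bound = math.exp(-n * p * eps ** 2 / 2)          # exp(-10), about 4.5e-5
freq = lower_tail_freq(p, n, eps, trials=2000)   # observed frequency
```

The relative form is what makes the local analysis work: for cells with small probability p, the deviation scale sqrt(p) is much smaller than the additive (Hoeffding) scale.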
Local Error Bounds in Classification
Bounded Densities
Global vs. Local
Key: local complexity is offset by small volumes!
Local Bounds for DCTs
Unbalanced Tree
J leaves, depth J-1
Global bound:
Local bound:
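The gap between the two bounds for this unbalanced cascade tree can be made numerical. In this sketch the constants and log factors are dropped (an illustrative simplification): the global bound charges each of the J leaves the full tree complexity, while the local bound charges a depth-j leaf a term scaled by its cell volume 2^-j:

```python
import math

# Illustrative comparison of global vs local penalties for the cascade
# tree with J leaves: one leaf at each depth 1, ..., J-1, plus a second
# leaf at depth J-1. Constants and log factors are dropped; only the
# dependence on tree size, depth, and cell volume is kept.

def global_bound(J, n):
    """Every leaf is charged the full tree complexity, ~ sqrt(J / n)."""
    return J * math.sqrt(J / n)

def local_bound(J, n):
    """A leaf at depth j has volume 2^-j and local complexity ~ j."""
    depths = list(range(1, J)) + [J - 1]
    return sum(math.sqrt(j * 2.0 ** (-j) / n) for j in depths)
```

For J = 16 and n = 10000 the global bound is 0.64 while the local bound is about 0.04: the deep leaves' growing complexity terms are killed by their shrinking 2^-j volumes, which is exactly the "local complexity offset by small volumes" point.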
Convergence to Bayes Error
Mammen & Tsybakov ‘99 C. Scott & RN ‘03
Concluding Remarks
data-dependent bound
Neural Information Processing Systems 2002, 2003 [email protected]