How to solve a classification problem with 45 class levels using Random Forests
Nicholas L. Crookston, Gerald E. Rehfeldt
US Forest Service, Rocky Mountain Research Station, Moscow, ID
Western Mensurationists, Missoula, MT, June 20-22, 2010
Big classification problems
Contents
• Problem (we have 45 class levels, that’s a lot)
• Solution (we broke the problem into many subsets and formed an ensemble classifier)
• Results (very good, and we have a measure of extrapolation)
• Discussion
Problem
• We want to predict the biotic community as a function of climate.
• There are 45 biotic communities of interest.
   Brown, D.E., F. Reichenbacher, S.E. Franson. 1998. A classification of North American biotic communities. University of Utah Press, Salt Lake City. 141 pp.
Problem
• In a 2006 effort on a subset of these communities, we had great results using:
   Breiman, Leo. 2001. Random Forests. Machine Learning 45:5-32.
• These results were published in:
   Rehfeldt, G.E., N.L. Crookston, M.V. Warwell and J.S. Evans. 2006. Empirical analyses of plant-climate relationships for the western United States. Int. J. Plant Sci. 167:1123-1150.
Random Forests
• A Random Forest (RF) is a set of classification or regression trees (CART).
• RF builds many trees; each one minimizes the classification error on a bootstrap sample of the training data.
• 32 class levels are supported, but when there are more than 10, the implementation uses a sampling scheme for each tree.
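The bootstrap sample mentioned above can be illustrated with a minimal sketch. It uses only the Python standard library, not the randomForest package; the function name `bootstrap_sample`, the seed, and the toy size of 10,000 rows are assumptions made for illustration.

```python
import random

def bootstrap_sample(n_rows, seed=0):
    """Draw n_rows row indices with replacement (one bootstrap sample)."""
    rng = random.Random(seed)
    return [rng.randrange(n_rows) for _ in range(n_rows)]

n_rows = 10_000  # hypothetical toy size, far smaller than the real training set
sample = bootstrap_sample(n_rows)

# Because rows are drawn with replacement, some rows repeat and others are
# left out; in expectation about 63% of the distinct rows appear in a sample.
unique_fraction = len(set(sample)) / n_rows
print(f"distinct rows in the sample: {unique_fraction:.3f}")
```

Each tree in a forest would be grown on a different sample of this kind, which is one source of the variation among trees.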
Random Forests -- continued
• To classify a new observation:
   – RF puts the new observation down each of the trees in the forest.
   – Each tree gives a classification; that classification is a vote.
   – The forest chooses the class having the most votes over all the trees.
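The voting rule can be sketched in a few lines. This is an illustration only: the per-tree predictions are made-up values, `forest_vote` is a hypothetical helper, and ties are broken by whichever class was seen first, which is an assumption rather than anything the slides specify.

```python
from collections import Counter

def forest_vote(tree_predictions):
    """Return the class with the most votes and the full vote tally."""
    votes = Counter(tree_predictions)
    winner, _ = votes.most_common(1)[0]
    return winner, votes

# Hypothetical predictions from a 7-tree forest for one new observation.
tree_predictions = ["A", "B", "A", "C", "A", "B", "A"]
winner, votes = forest_vote(tree_predictions)
print(winner, dict(votes))
```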
Problem -- continued
• We have 45 class levels, over the limit of 32 in the randomForest package!
• We want to make predictions using future climates.
• RF might predict nonsense answers for future climatic conditions that do not occur in the training data.
• These are extrapolations that we need to detect.
Solution -- Steps
1. Training data: ~1.6 million obs, 35 climate variables from the Moscow climate model.
2. We created 100 Random Forests.
3. To create 1 of the forests:
   a. Sample 9 of 45 class levels (without replacement).
   b. Make a copy of the training data.
   c. Recode the biotic community in this copy: keep the code as is if it is one of the 9 in the sample; otherwise change the observed class to “other”.
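The sampling and recoding in step 3 might look like the sketch below. It is not the authors' code: the helper names, the seed, and the six toy labels are assumptions, and integer codes 1 to 45 stand in for the real community codes.

```python
import random

N_CLASSES = 45    # biotic communities
N_FORESTS = 100   # forests in the ensemble
SAMPLE_SIZE = 9   # class levels kept per forest

def sample_class_subsets(class_codes, n_forests, sample_size, seed=0):
    """For each forest, sample the class levels it keeps (without replacement)."""
    rng = random.Random(seed)
    return [set(rng.sample(class_codes, sample_size)) for _ in range(n_forests)]

def recode(labels, kept):
    """Keep a label if its class was sampled; otherwise change it to 'other'."""
    return [label if label in kept else "other" for label in labels]

class_codes = list(range(1, N_CLASSES + 1))
subsets = sample_class_subsets(class_codes, N_FORESTS, SAMPLE_SIZE)

# Hypothetical observed classes for six training rows.
labels = [1, 2, 3, 45, 2, 17]
recoded = recode(labels, subsets[0])

# Number of forests that contain each class; needed later as the divisor.
forests_per_class = {c: sum(c in s for s in subsets) for c in class_codes}
```

With 100 forests and 9 of 45 classes per forest, a class appears in 100 × 9/45 = 20 forests on average, though the count for any single class varies with the random draw.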
Steps -- continued
4. Fit each of the 100 RFs.
5. To make a prediction:
   a. Put the new case down all 100 RFs, giving a vector of 100 predictions for the case.
   b. Count the number of predictions by biotic community code, including “other”. This gives a table of codes and counts with 46 rows (one for each community code plus “other”).
Steps -- continued
   c. Divide the count for each code by the number of RFs that contained the code.
   d. The ensemble classification is the class corresponding to the maximum of these quotients.
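The prediction steps (count, divide, take the maximum) can be sketched as follows. This is a minimal illustration under stated assumptions: `ensemble_classify` is a hypothetical name, the four-forest example is invented, and “other” is divided by the total number of forests because every forest contains it.

```python
from collections import Counter

def ensemble_classify(predictions, forest_classes):
    """
    predictions:    one predicted label per forest (a class code or "other")
    forest_classes: for each forest, the set of class codes it was trained on
    Returns the winning label and the quotient for every predicted label.
    """
    counts = Counter(predictions)
    n_forests = len(forest_classes)
    quotients = {}
    for code, count in counts.items():
        if code == "other":
            denominator = n_forests  # every forest has an "other" class
        else:
            denominator = sum(code in kept for kept in forest_classes)
        quotients[code] = count / denominator
    winner = max(quotients, key=quotients.get)
    return winner, quotients

# Hypothetical ensemble of 4 forests over classes 1 to 3.
forest_classes = [{1, 2}, {1, 3}, {2, 3}, {1, 2}]
predictions = [1, 1, "other", 2]
winner, quotients = ensemble_classify(predictions, forest_classes)
print(winner, quotients)
```

Dividing by the number of forests that contained a code puts classes on a common footing, since a class that was sampled into more forests has more chances to be predicted.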
Example 1 (contemporary climate):

| Code | Number predicted | Number of forests | Quotient |
|---|---|---|---|
| 1 | 20 | 25 | 0.80 |
| 2 | 3 | 34 | 0.09 |
| 3 | 8 | 33 | 0.24 |
| 4 | 2 | 29 | 0.07 |
| Other | 6 | 100 | 0.06 |
Example 2 (future climate 1):

| Code | Number predicted | Number of forests | Quotient |
|---|---|---|---|
| 1 | 20 -> 4 | 25 | 0.16 |
| 2 | 3 -> 25 | 34 | 0.74 |
| 3 | 8 -> 4 | 33 | 0.12 |
| 4 | 2 -> 3 | 29 | 0.10 |
| Other | 6 -> 20 | 100 | 0.20 |
Example 3 (future climate 2):

| Code | Number predicted | Number of forests | Quotient |
|---|---|---|---|
| 1 | 20 -> 8 | 25 | 0.32 |
| 2 | 3 -> 4 | 34 | 0.12 |
| 3 | 8 -> 4 | 33 | 0.12 |
| 4 | 2 -> 3 | 29 | 0.10 |
| Other | 6 -> 40 | 100 | 0.40 |
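The three example tables can be checked with a short script. The counts and forest numbers are copied from the tables; only the codes shown there are included, and treating a win by “other” as an extrapolation flag is the interpretation this sketch assumes. The names `examples` and `classify` are illustrative.

```python
# (number predicted, number of forests containing the code) per code.
examples = {
    "contemporary": {
        "1": (20, 25), "2": (3, 34), "3": (8, 33), "4": (2, 29), "other": (6, 100),
    },
    "future climate 1": {
        "1": (4, 25), "2": (25, 34), "3": (4, 33), "4": (3, 29), "other": (20, 100),
    },
    "future climate 2": {
        "1": (8, 25), "2": (4, 34), "3": (4, 33), "4": (3, 29), "other": (40, 100),
    },
}

def classify(table):
    """Compute the quotient for each code and return the code with the largest."""
    quotients = {code: n_pred / n_forests for code, (n_pred, n_forests) in table.items()}
    winner = max(quotients, key=quotients.get)
    return winner, quotients

results = {}
for name, table in examples.items():
    winner, quotients = classify(table)
    results[name] = winner
    flag = " (extrapolation)" if winner == "other" else ""
    print(f"{name}: class {winner}{flag}")
```

Code 1 wins under the contemporary climate (0.80), code 2 under future climate 1 (0.74), and “other” under future climate 2 (0.40), which is the case flagged as an extrapolation.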
Results
• We interpret predictions of “other” to indicate extrapolation.
• For this work, extrapolation indicates there is no biotic community in our study area that corresponds to the (new) climate.
• It is not a perfect indication of extrapolation.
Results
• Application to Brown’s biotic communities
   – All of North America
   – Prediction of community as a function of climatic metrics
   – Mapped at 0.0083333 arc degrees (~1 km²)
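As a rough check on the stated grid resolution, the sketch below converts 0.0083333 degrees (1/120 of a degree, or 30 arc-seconds) to kilometres. It assumes a spherical Earth with a mean radius of 6371 km, which is a simplification and not something the slides state; the function names are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth, mean radius
CELL_DEG = 0.0083333      # grid spacing from the slide

km_per_degree = 2 * math.pi * EARTH_RADIUS_KM / 360

def cell_size_km(lat_deg):
    """Approximate east-west width and north-south height of one grid cell."""
    height = CELL_DEG * km_per_degree
    width = height * math.cos(math.radians(lat_deg))
    return width, height

def cell_area_km2(lat_deg):
    width, height = cell_size_km(lat_deg)
    return width * height

area_equator = cell_area_km2(0.0)
area_45 = cell_area_km2(45.0)
print(f"cell area at the equator: {area_equator:.2f} km^2, at 45 degrees: {area_45:.2f} km^2")
```

Under these assumptions a cell is a little under 1 km on a side at the equator and narrower in the east-west direction at higher latitudes, so "~1 km²" is a nominal figure.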
No analog: contemporary
No analog: 2030
No analog: 2090 (map panels: Canadian, Princeton, Hadley)
Discussion / Conclusion
• The method can be used on larger problems and perhaps with CART-based methods other than Random Forests.
• One could add samples that are actually “other”, that is, not any of those of interest.
• Random Forests remains a very important tool in our tool set.