Cliff Notes on Ecological Niche Modeling with RandomForest (ensembles)
Falk Huettmann, EWHALE lab
University of Alaska, Fairbanks AK 99775
Email [email protected], Tel. 907 474 7882
Modeling Ecological Niches
[Figure: Geographic Space (Latitude vs. Longitude) and Ecological Space (Environmental factor a vs. Environmental factor b); Sampling Space, Model Space => Predictions]
A Super Model
LM, GLM, GAM, CART, MARS, NN, GARP, TN, RF, GDM, Maxent…
=> Ensembles
Linear regression
A starting point…
One formula capturing the data: y = a + bx (‘Mean’, SD)
Response Variable ~ Predictor1 (Y ~ X)
[Figure: plot of Y against X]
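As a minimal sketch of the "one formula" idea, the snippet below fits y = a + bx by ordinary least squares in plain Python. The function names and both toy datasets are assumptions made for illustration; they are not from the slides. The second dataset is hump-shaped, which shows how a straight line can end up with a very low r2 even though the pattern in the data is strong.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def r_squared(xs, ys, a, b):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_res / ss_tot


# Hypothetical data: a perfectly linear response ...
xs = [0, 1, 2, 3, 4]
ys_linear = [1, 3, 5, 7, 9]
a_lin, b_lin = fit_line(xs, ys_linear)
r2_lin = r_squared(xs, ys_linear, a_lin, b_lin)

# ... and a hump-shaped response that a straight line cannot capture.
ys_hump = [0, 3, 4, 3, 0]
a_hump, b_hump = fit_line(xs, ys_hump)
r2_hump = r_squared(xs, ys_hump, a_hump, b_hump)

print(a_lin, b_lin, r2_lin)
print(a_hump, b_hump, r2_hump)
```

For the hump-shaped data the fitted slope is zero: the line reduces to the mean of Y and explains none of the variance.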
Common Ground
A Multiple Regression framework
Response Variable ~ Predictor1 + Predictor2 + Predictor3…
Traditionally, we used 1-5 predictors
But: 1 to 1000s of predictors are possible
‘One single algorithm’ that explains the relationship between response and predictors
The derived relationship can be used to predict to other locations where the predictors are known
GLM vs CART etc.
Linear (~unrealistic): ‘Mean’, SD => potentially low r2
Non-Linear (driven by data): ‘Mean’? SD? CART, TreeNet & RandomForest (there are many other algorithms!)
Our Free Algorithms …
RandomForest (R-Project, Fortran, C …): http://rweb.stat.umn.edu/R/library/randomForest/html/00Index.html
TreeNet: http://salford-systems.com/products.php (free 30 day trial)
Tree/CART - Family
Classification & Regression Tree (CART) => Binary recursive partitioning
Leo Breiman 1984, and others
[Figure: example tree with binary YES/NO splits on Temp>15, Precip <100 and Temp<5]
Binary splits: a widely used concept
Free of data assumptions! No significances.
Binary split recursive partitioning (the same predictor can re-occur elsewhere as a ‘splitter’)
Splits maximize node homogeneity (minimize within-node variance)
Stopping rules for the number of branches are based on optimization/cross-validation
Terminal nodes show means (Regression Tree) or categories (Classification Tree)
Binary splits are widely used; multiple splits are rarely used, yet.
[Figure: Classification Tree with terminal nodes showing categories (A, B, C); Regression Tree with terminal nodes showing means (0.3, 3, 0.1, 2, 2.3)]
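To make binary recursive partitioning concrete, here is a small regression-tree sketch in plain Python. It is an illustration under stated assumptions, not the Salford CART or rpart implementation: the predictor names, the toy data and the simple `min_leaf` stopping rule are all made up for this example (the slides mention optimization/cross-validation for stopping, which is not shown here). Each split is chosen to minimize the summed within-node squared deviation, and each terminal node stores a mean.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, target, min_leaf):
    """Return (feature, threshold, cost) of the best binary split, or None."""
    best = None
    for f in (k for k in rows[0] if k != target):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r[target] for r in rows if r[f] <= thr]
            right = [r[target] for r in rows if r[f] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[2]:
                best = (f, thr, cost)
    return best


def grow(rows, target, min_leaf=2):
    """Recursively partition the rows; terminal nodes hold the mean response."""
    ys = [r[target] for r in rows]
    split = best_split(rows, target, min_leaf)
    if split is None or split[2] >= sse(ys):
        return {"mean": sum(ys) / len(ys)}
    f, thr, _ = split
    return {
        "feature": f,
        "threshold": thr,
        "left": grow([r for r in rows if r[f] <= thr], target, min_leaf),
        "right": grow([r for r in rows if r[f] > thr], target, min_leaf),
    }


def predict(tree, row):
    """Follow the decision rules down to a terminal node."""
    while "mean" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["mean"]


# Hypothetical data: warm and dry sites have the highest response.
rows = [
    {"temp": 2, "precip": 50, "y": 0.1},
    {"temp": 3, "precip": 60, "y": 0.1},
    {"temp": 4, "precip": 150, "y": 0.1},
    {"temp": 5, "precip": 140, "y": 0.1},
    {"temp": 16, "precip": 50, "y": 3.0},
    {"temp": 18, "precip": 60, "y": 3.0},
    {"temp": 17, "precip": 150, "y": 0.3},
    {"temp": 20, "precip": 160, "y": 0.3},
]
tree = grow(rows, "y")
warm_dry = predict(tree, {"temp": 19, "precip": 55})
warm_wet = predict(tree, {"temp": 19, "precip": 155})
cold = predict(tree, {"temp": 1, "precip": 100})
print(tree["feature"], tree["threshold"], warm_dry, warm_wet, cold)
```

On this toy data the first split is on temperature and the warm branch is then split on precipitation, so a prediction can be read off by following the decision rules to the end.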
CART Salford (rpart in R)
Nice to interpret (e.g. for small trees, or when following through specific decision rules until the end)
[Figure: Relative Cost vs. Number of Nodes; annotations: From withheld Test Data, Optimum]
Importance Value
DEM       100.00
TAIR_AUG   77.58
PREC_AUG   69.46
HYDRO      54.59
POP        47.39
LDUSE      40.88
ROC curves for accuracy tests
e.g. correctly predicted absence app. 77%
e.g. correctly predicted presence app. 85%
=> Apply to a dataset for predictions
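The accuracy figures on this slide (correctly predicted absence and presence, ROC) can be computed along the following lines. This is a hedged sketch in plain Python; the observed labels, the predicted scores and the 0.5 threshold are invented for illustration. The AUC is computed as the probability that a randomly chosen presence receives a higher score than a randomly chosen absence.

```python
def correct_rates(y_true, scores, threshold):
    """Return (share of presences, share of absences) predicted correctly."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    return tp / n_pos, tn / n_neg


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical withheld test data: 1 = presence, 0 = absence.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

presence_ok, absence_ok = correct_rates(y_true, scores, threshold=0.5)
area = auc(y_true, scores)
print(presence_ok, absence_ok, area)
```

As on the slide, these metrics are most meaningful when computed on withheld test data rather than on the data used to grow the tree.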
TreeNet (~A sequence of CARTs): ‘boosting’
[Figure: tree + tree + tree + tree + …]
The more nodes… the more detail… the slower
Many trees make for a ‘net of trees’, or ‘a forest’ => Leo Breiman + Data Mining
Each tree explains remaining variance
Difficult to interpret, but good graphs
[Figure: Risk vs. Number of Trees]
[Figure: ROC, Pct. Class 1 vs. Pct. Population]
Importance Value
Variable   Score
LDUSE     100.00
TAIR_AUG   97.62
HYDRO      94.35
DEM        94.01
PREC_AUG   90.17
POP        82.54
HMFPT      81.46
ROC curves for accuracy tests
e.g. correctly predicted absence app. 97%
e.g. correctly predicted presence app. 92%
=> Apply to a dataset for predictions
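The idea that each tree in the sequence explains the remaining variance can be sketched as follows. This is a generic boosting illustration with one-split trees (stumps) and squared error, written in plain Python; it is not the TreeNet implementation, and the learning rate, the number of trees and the toy data are assumptions made for this example.

```python
def fit_stump(xs, residuals):
    """Best one-split regression tree: (threshold, left_mean, right_mean)."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        cost = (sum((r - lm) ** 2 for r in left)
                + sum((r - rm) ** 2 for r in right))
        if best is None or cost < best[0]:
            best = (cost, thr, lm, rm)
    return best[1:]


def boost(xs, ys, n_trees, learning_rate=0.1):
    """Fit a sequence of stumps, each on the residuals left by the previous ones."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        thr, lm, rm = fit_stump(xs, residuals)
        stumps.append((thr, lm, rm))
        preds = [p + learning_rate * (lm if x <= thr else rm)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + sum(learning_rate * (lm if x <= thr else rm)
                      for thr, lm, rm in stumps)


def mse(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Hypothetical hump-shaped response along one predictor.
xs = list(range(10))
ys = [0, 0, 1, 1, 1, 1, 1, 0, 0, 0]

error_0 = mse(boost(xs, ys, n_trees=0), xs, ys)
error_1 = mse(boost(xs, ys, n_trees=1), xs, ys)
error_100 = mse(boost(xs, ys, n_trees=100), xs, ys)
print(error_0, error_1, error_100)
```

The training error drops as trees are added, which corresponds to the Risk vs. Number of Trees plot; in practice the number of trees would be chosen on withheld data.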
TreeNet: Graphic Output example
[Figure: Response Curve, Bear Occurrence (Partial Dependence, yes/no) vs. Distance to Lake (m)]
(the function above is virtually impossible to fit in linear algorithms => misleading coefficients, e.g. from LMs, GLMs)
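A response curve of this kind is a partial dependence plot: one predictor is varied over a grid while the other predictors keep their observed values, and the model's predictions are averaged. The sketch below shows that computation in plain Python. The `toy_model` function, the predictor names `dist_lake_m` and `elevation_m`, and all numbers are hypothetical stand-ins for a fitted TreeNet or RandomForest model.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction for each grid value of `feature`, others as observed."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)          # copy so the data is not changed
            modified[feature] = value
            total += predict(modified)
        curve.append((value, total / len(rows)))
    return curve


def toy_model(row):
    """Hypothetical fitted model: a step-shaped response to lake distance."""
    near_lake = 0.8 if row["dist_lake_m"] < 500 else 0.2
    lowland = 0.1 if row["elevation_m"] < 300 else 0.0
    return near_lake + lowland


# Hypothetical observed predictor values.
rows = [
    {"dist_lake_m": 100, "elevation_m": 100},
    {"dist_lake_m": 400, "elevation_m": 200},
    {"dist_lake_m": 900, "elevation_m": 400},
    {"dist_lake_m": 1500, "elevation_m": 600},
]
curve = partial_dependence(toy_model, rows, "dist_lake_m", [0, 250, 750, 1000])
for distance, response in curve:
    print(distance, round(response, 3))
```

The abrupt step in the resulting curve is the kind of shape that a single linear coefficient cannot represent.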
RandomForest (Prasad et al. 2006, Furlanello et al. 2003, Breiman 2001)
‘Boosting & Bagging’ algorithms (~Ensemble)
[Figure: data table with predictor columns DEM, Slope, Aspect, Climate, Land-cover and rows (cases) 1 to 5; trees are grown from a random set of rows (cases) and a random set of columns (predictors): Random set 1, Random set 2, …]
Average final tree from e.g. >2000 trees, done by VOTING
Bagging: optimization based on In-Bag and Out-of-Bag samples
In RF no pruning => difficult to overfit (robust)
Handles ‘noise’, interactions and categorical data fine!
Difficult to interpret, but good graphs
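The random-rows, random-columns and voting scheme can be sketched in a few lines of plain Python. This is a deliberately simplified illustration, not the randomForest algorithm itself: each "tree" is a single split (a stump) rather than a fully grown tree, the column subset is drawn once per tree, and the data, the predictor names and the parameters `n_trees`, `mtry` and `seed` are assumptions chosen for the example. It does show bootstrap sampling of rows (in-bag), the out-of-bag error estimate and majority voting.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, features, target):
    """One binary split with the lowest weighted Gini among the allowed features."""
    best = None
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r[target] for r in rows if r[f] <= thr]
            right = [r[target] for r in rows if r[f] > thr]
            cost = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or cost < best[0]:
                best = (cost, f, thr, majority(left), majority(right))
    if best is None:                      # no split possible: constant leaf
        label = majority([r[target] for r in rows])
        return lambda row: label
    _, f, thr, left_label, right_label = best
    return lambda row: left_label if row[f] <= thr else right_label


def fit_forest(rows, features, target, n_trees=25, mtry=2, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        in_bag = [rng.randrange(len(rows)) for _ in rows]  # random set of rows
        cols = rng.sample(features, mtry)                  # random set of columns
        tree = fit_stump([rows[i] for i in in_bag], cols, target)
        forest.append((tree, set(in_bag)))
    return forest


def predict(forest, row):
    """Final prediction by majority VOTING over all trees."""
    return majority([tree(row) for tree, _ in forest])


def oob_error(forest, rows, target):
    """Error on rows, using only trees that did not see the row (out-of-bag)."""
    wrong = scored = 0
    for i, row in enumerate(rows):
        votes = [tree(row) for tree, in_bag in forest if i not in in_bag]
        if votes:
            scored += 1
            wrong += majority(votes) != row[target]
    return wrong / scored if scored else None


# Hypothetical, well-separated presence/absence data.
features = ["dem", "slope", "aspect"]
rows = ([{"dem": 1 + i, "slope": 1 + i, "aspect": 1 + i, "present": 0}
         for i in range(10)]
        + [{"dem": 101 + i, "slope": 101 + i, "aspect": 101 + i, "present": 1}
           for i in range(10)])

forest = fit_forest(rows, features, "present", n_trees=25, mtry=2, seed=42)
low_site = predict(forest, {"dem": 0, "slope": 0, "aspect": 0})
high_site = predict(forest, {"dem": 200, "slope": 200, "aspect": 200})
err = oob_error(forest, rows, "present")
print(low_site, high_site, err)
```

The out-of-bag rows act as a built-in test set, which is why the slide speaks of optimization based on In-Bag and Out-of-Bag samples.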
RandomForest and GIS: Spatial Modeling
[Figure: workflow linking GIS Overlays, a Table of Predictors and Response, RandomForest (quantification: Train & Develop Model, Apply Model) and GIS Visualization of Predictions]
aaahhhhuuhhhh ?! - Makes sense because of... - No, wait a minute, that’s wrong…
RandomForest: Why so good and useable?
Allows for:
Works multivariate (100s of predictors)
Best Possible Predictions
Best Possible Clustering (without a response variable)
Tracking of Complex Interactions
Predictor Ranking
Handling Noisy Data
Fast & convenient applications
Allows for multiple (!) response variables!
Algorithms: RandomForest (R, Fortran, Salford), YAIMPUTE (R), PARTY (R) …
=> Change in World’s Science
What to read, for instance…
http://www.stat.berkeley.edu/~breiman/RandomForests/
Breiman, L. 2001. Statistical modeling: the two cultures. Statistical Science 16(3): 199-231.
Craig, E., and F. Huettmann. 2008. Using “blackbox” algorithms such as TreeNet and Random Forests for data-mining and for finding meaningful patterns, relationships and outliers in complex ecological data: an overview, an example using golden eagle satellite data and an outlook for a promising future. Chapter IV in Intelligent Data Analysis: Developing New Methodologies through Pattern Discovery and Recovery (Hsiao-fan Wang, Ed.). IGI Global, Hershey, PA, USA.
Magness, D.R., F. Huettmann, and J.M. Morton. 2008. Using Random Forests to provide predicted species distribution maps as a metric for ecological inventory & monitoring programs. Pages 209-229 in T.G. Smolinski, M.G. Milanova & A-E. Hassanien (eds.). Applications of Computational Intelligence in Biology: Current Trends and Open Problems. Studies in Computational Intelligence, Vol. 22, Springer-Verlag Berlin Heidelberg. 428 pp.
Prasad, A.M., L.R. Iverson, and A. Liaw. 2006. Newer Classification and Regression Tree Techniques: Bagging and Random Forests for Ecological Prediction. Ecosystems 9: 181-199.
(and Hastie & Tibshirani, Furlanello et al. 2003, Elith et al. 2006 etc. etc.)
From now on, simply referred to as …
A Super Model
LM, GLM, CART, MARS, NN, GARP, TN, RF, GDM, Maxent…
=> Ensembles
Some Super Models: Ensembles
LM, GLM, CART, MARS, NN, GARP, TN, RF, GDM, Maxent…
Find the best model for a given section of your data => the best possible fit & prediction
[Figure: Ivory Gull example, Pres/Abs vs. Predictors, with different model types (RF, LM, log, poly) applied to different sections of the data]
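One way to read "find the best model for a given section of your data" is sketched below: the data are cut into sections, several candidate models are fitted within each section, and the best-fitting candidate is kept per section. Everything here is an assumption for illustration: the three candidates (linear, log and squared transforms of a single predictor), the breakpoint, the toy data, and the use of training error for the comparison. A more careful version would compare candidates on withheld data.

```python
import math


def fit_line(ts, ys):
    """Least squares for y = a + b*t; returns (a, b)."""
    n = len(ts)
    mt, my = sum(ts) / n, sum(ys) / n
    b = (sum((t - mt) * (y - my) for t, y in zip(ts, ys))
         / sum((t - mt) ** 2 for t in ts))
    return my - b * mt, b


# Candidate model forms y = a + b*g(x); names are hypothetical labels.
TRANSFORMS = {
    "linear": lambda x: x,
    "log": math.log,
    "poly": lambda x: x ** 2,
}


def fit_candidates(xs, ys):
    """Fit every candidate; return {name: sum of squared errors}."""
    errors = {}
    for name, g in TRANSFORMS.items():
        ts = [g(x) for x in xs]
        a, b = fit_line(ts, ys)
        errors[name] = sum((y - (a + b * t)) ** 2 for t, y in zip(ts, ys))
    return errors


def best_per_section(xs, ys, breakpoint):
    """Pick the lowest-error candidate separately below and above the breakpoint."""
    sections = {
        "low": [(x, y) for x, y in zip(xs, ys) if x <= breakpoint],
        "high": [(x, y) for x, y in zip(xs, ys) if x > breakpoint],
    }
    chosen = {}
    for label, pairs in sections.items():
        errors = fit_candidates([x for x, _ in pairs], [y for _, y in pairs])
        chosen[label] = min(errors, key=errors.get)
    return chosen


# Hypothetical data: linear in the low section, logarithmic in the high one.
xs = list(range(1, 11))
ys = [2 * x for x in range(1, 6)] + [3 * math.log(x) for x in range(6, 11)]

chosen = best_per_section(xs, ys, breakpoint=5)
print(chosen)
```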
On Greyboxes, Philosophy and Science
[Figure: Data => Algorithm with a Known Behavior => (Data Mining) Prediction & Accuracy]
Such a statistical relationship will be found by either CART, TN, RF or LM, GLM
GLMs as a blackbox!? YES. Just think of software implementations, Max-Likelihood, Model Fitting, AIC and Research Design (sensu Keating & Cherry 1994)
[Figure: Model Performance (0% to 100%) over time: GLM -> ANN -> Boosting, Bagging …; Improvement Increases]
Parsimony, Inference and Prediction ?!
Sole focus on predictions and their accuracy, whereas…
…R2, p-values and traditional inference (variable rankings, AIC) are of lower relevance
Why Parsimony?
No real need for optimizing the fit and for parsimony when prediction is the goal
Global accuracy metrics: ROC, AUC, kappa, meta-analysis … (instead of p-values and significance levels or AIC)
[Figure: Relative Cost vs. Number of Nodes]
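Of the global accuracy metrics named above, kappa is easy to compute by hand. The sketch below implements Cohen's kappa in plain Python: observed agreement between predictions and observations, corrected for the agreement expected by chance. The presence/absence vectors are made up for illustration.

```python
def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    n = len(y_true)
    observed = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    labels = set(y_true) | set(y_pred)
    expected = sum((y_true.count(label) / n) * (y_pred.count(label) / n)
                   for label in labels)
    # Note: undefined (division by zero) if expected agreement is exactly 1.
    return (observed - expected) / (1 - expected)


# Hypothetical test data: 1 = presence, 0 = absence.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

kappa = cohens_kappa(y_true, y_pred)
kappa_perfect = cohens_kappa(y_true, y_true)
print(kappa, kappa_perfect)
```

Here 6 of 8 cases are predicted correctly (75%), but half of that agreement is expected by chance, so kappa is 0.5.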