Semi-random model tree ensembles: an effective and scalable regression method
DESCRIPTION
We present and investigate ensembles of semi-random model trees as a novel regression method. Such ensembles combine the scalability of tree-based methods with predictive performance rivalling the state of the art in numeric prediction. An empirical investigation shows that semi-random model trees are competitive with state-of-the-art methods such as Gaussian process regression and Additive Groves of regression trees. Their training and optimization scale better than Gaussian process regression to larger datasets, and they enjoy a consistent speed advantage of one to two orders of magnitude over Additive Groves.
TRANSCRIPT
Semi-random model tree ensembles: an effective and scalable regression method
Bernhard Pfahringer
Department of Computer Science, University of Waikato, New Zealand
September 22nd, 2011
Bernhard Pfahringer, Department of Computer Science, University of Waikato, New Zealand. Semi-random model tree ensembles: an effective and scalable regression method. September 22nd, 2011. Slide 1 / 28.
Background
Outline
1 Background
2 Algorithm
3 Results
4 Summary
Local regression
Non-linear functions can be approximated by a set of locally linear estimators
Regression and model trees are fast multi-variate versions of local regression
Piece-wise linear approximation example
Sample Regression Tree: constants in the leaves
A159 <= -0.62 :
    A149 <= 0.52 :  Y = 1.6977
    A149 > 0.52 :   Y = 1.2213
A159 > -0.62 :
    A149 <= 0.638 :
        A57 <= -0.485 : Y = 0.8388
        A57 > -0.485 :  Y = 1.0569
    A149 > 0.638 :  Y = 0.6062
Sample Model Tree: linear models in the leaves
A159 <= -0.62 :
    A149 <= 0.52 :  LM1
    A149 > 0.52 :   LM2
A159 > -0.62 :
    A149 <= 0.638 : LM3
    A149 > 0.638 :  LM4

LM1: Y = -0.597 * A149 - 0.211 * A159 + 1.901
LM2: Y = -0.471 * A149 - 0.211 * A159 + 1.353
LM3: Y = -0.365 * A149 - 0.232 * A159 + 1.017
LM4: Y = -0.555 * A149 - 0.232 * A159 + 0.776
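To make the leaf-model idea concrete, here is a small Python sketch (an illustration, not from the slides) that evaluates the example model tree above by routing an instance to a leaf and then applying that leaf's linear model:

```python
def predict_example(a149, a159):
    """Evaluate the sample model tree: route the instance to a
    leaf, then apply that leaf's linear model."""
    if a159 <= -0.62:
        if a149 <= 0.52:   # LM1
            return -0.597 * a149 - 0.211 * a159 + 1.901
        else:              # LM2
            return -0.471 * a149 - 0.211 * a159 + 1.353
    else:
        if a149 <= 0.638:  # LM3
            return -0.365 * a149 - 0.232 * a159 + 1.017
        else:              # LM4
            return -0.555 * a149 - 0.232 * a159 + 0.776
```

Unlike the regression tree, instances that fall into the same leaf can still receive different predictions, because the leaf model varies with the attribute values.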
Algorithm
Ensembles of Semi-Random Model Trees
Ensembles usually improve results
Most ensembles use randomization to generate diversity
Two sources of randomness:
    For each tree: divide the data into a train and a validation set
    To split: select the best attribute from a random subset of all attributes
Single Semi-Random Model Tree
Only consider the median as split value (=> balanced trees)
Leaf model: linear ridge regression model
Cap model predictions inside the observed extremes
Optimise tree depth and ridge value using the validation set
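The capping step can be sketched as a one-line clamp (a minimal illustration, not the authors' code; `lo` and `hi` stand for the minimum and maximum target values observed in the training data):

```python
def capped_prediction(leaf_model, x, lo, hi):
    """Clamp the leaf model's output to the target range seen
    during training, so linear extrapolation cannot run wild."""
    return max(lo, min(hi, leaf_model(x)))
```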
Build ensemble
BUILDENSEMBLE(data, numTrees, k)
for i = 1 to numTrees
    do randomly split data into two: train + validate
       BUILDTREE(train, validate, k)
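A minimal Python sketch of this loop (hypothetical helper names; `build_tree` stands in for the BUILDTREE routine of the next slide and is passed in so the sketch stays self-contained):

```python
import random

def build_ensemble(data, num_trees, k, build_tree):
    """Build num_trees semi-random model trees, each on its own
    random half/half train-validation split of the data."""
    trees = []
    for _ in range(num_trees):
        shuffled = random.sample(data, len(data))
        half = len(shuffled) // 2
        train, validate = shuffled[:half], shuffled[half:]
        trees.append(build_tree(train, validate, k))
    return trees
```

The ensemble prediction is then simply the average of the individual tree predictions.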
BuildTree
BUILDTREE(train, validate, k)
min ← MINTARGETVALUE(train)
max ← MAXTARGETVALUE(train)
localSSE ← LINREG(train, validate)

if |train| > 10 and |validate| > 10
    do split ← RANDOMSPLIT(train, k)

       smT ← SMALLER(train, split)
       smV ← SMALLER(validate, split)
       smaller ← BUILDTREE(smT, smV, k)

       laT ← LARGER(train, split)
       laV ← LARGER(validate, split)
       larger ← BUILDTREE(laT, laV, k)

       subSSE ← SSE(smaller, larger, validate)

       if localSSE < subSSE
           do smaller ← null
              larger ← null
           else localModel ← null
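A self-contained Python sketch of the recursion for one-dimensional data. To keep it short, a constant (mean) leaf model stands in for the slides' ridge regression and the split is simply the median of x, but the control flow (fit a local model, split at the median, recurse, keep whichever of local model and subtree has the lower validation SSE) follows the pseudocode:

```python
def sse(predict, data):
    """Sum of squared errors of predict over (x, y) pairs."""
    return sum((predict(x) - y) ** 2 for x, y in data)

def build_tree(train, validate, min_size=10):
    """Recursive semi-random tree on 1-D (x, y) data, with a
    constant leaf model substituted for ridge regression."""
    mean_y = sum(y for _, y in train) / len(train)
    local = lambda x: mean_y                # local model for this node
    local_sse = sse(local, validate)
    if len(train) > min_size and len(validate) > min_size:
        xs = sorted(x for x, _ in train)
        split = xs[len(xs) // 2]            # approximate median
        sm_t = [(x, y) for x, y in train if x <= split]
        la_t = [(x, y) for x, y in train if x > split]
        sm_v = [(x, y) for x, y in validate if x <= split]
        la_v = [(x, y) for x, y in validate if x > split]
        if sm_t and la_t and sm_v and la_v:
            smaller = build_tree(sm_t, sm_v, min_size)
            larger = build_tree(la_t, la_v, min_size)
            sub = lambda x: smaller(x) if x <= split else larger(x)
            if sse(sub, validate) < local_sse:
                return sub                  # the subtree wins
    return local                            # prune: keep the local model
```

Note how the validation set does double duty: at every node it decides whether the subtree actually beats the single local model, which is what keeps the individual trees from overfitting.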
Ridge regression
LINREG(train, validate)
for ridge in 10^-8, 10^-4, 10^-2, 10^-1, 1, 10
    do model_r ← RIDGEREGRESS(train, ridge)
       sse_r ← SSE(model_r, validate)
if bestModel == model_10
    do build models for ridge = 10^2, 10^3, ...
       and so on while improving
localModel ← bestModel
return the model with minimum SSE on the validation data
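A numpy sketch of this search (an illustration, not the authors' implementation): closed-form ridge regression, a fixed grid of ridge values, and the upward extension of the grid while larger values keep improving validation SSE:

```python
import numpy as np

def ridge_regress(X, y, ridge):
    """Closed-form ridge regression: solve (X'X + ridge*I) w = X'y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ y)

def lin_reg(X_train, y_train, X_val, y_val):
    """Return the weights and validation SSE of the best ridge value."""
    def val_sse(w):
        r = X_val @ w - y_val
        return float(r @ r)
    grid = [1e-8, 1e-4, 1e-2, 1e-1, 1.0, 10.0]
    best_w, best_sse, best_ridge = None, float("inf"), None
    for ridge in grid:
        w = ridge_regress(X_train, y_train, ridge)
        s = val_sse(w)
        if s < best_sse:
            best_w, best_sse, best_ridge = w, s, ridge
    if best_ridge == grid[-1]:          # keep growing while still improving
        ridge = grid[-1] * 10.0
        while True:
            w = ridge_regress(X_train, y_train, ridge)
            s = val_sse(w)
            if s >= best_sse:
                break
            best_w, best_sse, best_ridge = w, s, ridge
            ridge *= 10.0
    return best_w, best_sse
```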
Random split selection
RANDOMSPLIT(train, k)
for i = 1 to k
    do splitAttr ← RANDOMCHOICE(allAttrs)
       stump ← STUMP(APPROXMEDIAN(splitAttr))
       compute SSE(stump, train)
return the minimum-SSE stump
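In Python, the stump search might look like this (a sketch with hypothetical helper names, not the real implementation):

```python
import random
import statistics

def random_split(train, k):
    """Try k randomly chosen attributes, split each at its median,
    and return the (attribute, threshold) pair whose two-leaf
    mean-predicting stump has the lowest training SSE.
    train is a list of (attribute_vector, target) pairs."""
    n_attrs = len(train[0][0])

    def stump_sse(attr, threshold):
        sse = 0.0
        for keep in (lambda v: v <= threshold, lambda v: v > threshold):
            side = [y for x, y in train if keep(x[attr])]
            if side:
                m = statistics.fmean(side)
                sse += sum((y - m) ** 2 for y in side)
        return sse

    best = None
    for _ in range(k):
        attr = random.randrange(n_attrs)
        threshold = statistics.median(x[attr] for x, _ in train)
        s = stump_sse(attr, threshold)
        if best is None or s < best[0]:
            best = (s, attr, threshold)
    return best[1], best[2]
```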
Parameter Settings
reported experiments:
    average the predictions of 50 randomized model trees
    to split, select the best of a random 50% subset of all attributes
generally: these should be optimised separately for every application, e.g. using cross-validation
    number of trees: "the more the merrier", but with diminishing returns
    number of randomly selected attributes: 50% is a good default, but may depend on the total number of attributes and on sparseness
Results
Comparison
use more than 20 Torgo/UCI datasets, each with > 900 examples
repeated 2/3 training, 1/3 testing splits
the training data is split into equal build and validation halves (1/3, 1/3)
preprocessed for missing or categorical values
compare to:
    LR: linear ridge regression, with optimised ridge value
    GP: Gaussian process regression, with optimised noise level and RBF gamma
    AG: Additive Groves, using the "fast" script
use RMAE: relative mean absolute error
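The slide does not spell out the RMAE formula; a common reading (an assumption here) is the model's mean absolute error as a percentage of the MAE obtained by always predicting the mean target:

```python
def rmae(predictions, targets):
    """Relative mean absolute error in percent: model MAE divided
    by the MAE of the constant mean predictor. (The slides do not
    define the formula; this is one common definition.)"""
    n = len(targets)
    mae = sum(abs(p - t) for p, t in zip(predictions, targets)) / n
    mean_t = sum(targets) / n
    baseline = sum(abs(mean_t - t) for t in targets) / n
    return 100.0 * mae / baseline
```

Under this definition, 100 means no better than the mean predictor and 0 means perfect predictions.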
RMAE on Torgo/UCI
RMAE for Torgo/UCI data
[Bar chart: RMAE (y-axis 0 to 100) of RMT, GP, LR, and AG on the Torgo/UCI datasets: colorhistogram, layout, cooctexture, colormoments, bank8FM, stock, mv, ailerons, elnino, elevators, fried, delta_ailerons, 2dplanes, delta_elevators, cal_housing, cpu_act, cpu_small, bank32nh, abalone, pol, house_8L, puma8NH, kin8nm, house_16H, puma32H, quake]
Figure: RMAE for Torgo/UCI datasets, sorted by the linear regression result.
Build times on Torgo/UCI
Training time in seconds for Torgo/UCI data
[Bar chart, logarithmic y-axis from 0.1 to 100000 seconds: training times of RMT, GP, LR, and AG on the Torgo/UCI datasets, from stock and quake up to cooctexture and elnino]
Figure: Training time in seconds for Torgo/UCI datasets, sorted by the number of instances in each dataset; note the use of a logarithmic y-scale.
UCI Census dataset
Table: Partial results, 2458285 examples in total, therefore about 800000 in the training fold.

Method   RMAE    Time (secs)
LR       15.96    1205
RMT       9.78   19811
GP          ?       ?  (would need 5 TB of RAM)
AG          ?       ?  (estimated 2000000)
Near infrared (NIR) Datasets
proprietary NIR data
7 datasets
from 255 up to 7500 spectra
between 170 and 500-odd features
preprocessed for noise and baseline shift
Sample NIR spectrum
Preprocessed sample spectrum (nitrogen in soil)
[Line plot: roughly 170 spectral channels on the x-axis, preprocessed values from -2 to 4 on the y-axis]
RMAE on NIR data
RMAE for NIR datasets
[Bar chart: RMAE (y-axis 10 to 90) of RMT, GP, LR, and AG on the NIR datasets n, omd, rmd, tc, phe, ph, p5, na, g5]
Figure: RMAE for NIR datasets, sorted by the linear regression result.
Build times on NIR data
Training time in seconds for NIR data
[Bar chart, logarithmic y-axis from 0.1 to 100000 seconds: training times of RMT, GP, LR, and AG on the NIR datasets omd, rmd, na, n, tc, ph, phe, p5, g5]
Figure: Training time in seconds for NIR datasets, sorted by the number of instances in each dataset; note the use of a logarithmic y-scale.
Random Model Tree Build Times discussion
complexity is O(K · N · log N + K² · N), with N instances and K attributes
the second term (the linear model computation) seems to dominate
therefore the observed complexity is approximately O(K² · N)
Summary
Conclusions
Semi-Random Model Trees perform well
They are fast: build time is practically linear in N
They can model non-linear relationships
Future Work
Improve efficiency for large K
Study more and different regression problems
More comparisons to alternative regression schemes
A streaming/MOA variant