Leveraging Bagging for Evolving Data Streams

DESCRIPTION

This talk presents the new Leveraging Bagging method for evolving data streams.

TRANSCRIPT
Leveraging Bagging for Evolving Data Streams
Albert Bifet, Geoff Holmes, and Bernhard Pfahringer
University of Waikato, Hamilton, New Zealand
ECML PKDD 2010, Barcelona, 21 September 2010
Mining Data Streams with Concept Drift
Extract information from a potentially infinite sequence of data, possibly varying over time, using few resources.

Adaptively: with no prior knowledge of the type or rate of change.
Leveraging Bagging
New improvements for adaptive bagging methods using:
- input randomization
- output randomization
Outline

1. Data stream constraints
2. Leveraging Bagging for Evolving Data Streams
3. Empirical evaluation
Mining Massive Data
Eric Schmidt, August 2010: "Every two days now we create as much information as we did from the dawn of civilization up until 2003."

5 exabytes of data
Data stream classification cycle
1. Process an example at a time, and inspect it only once (at most)
2. Use a limited amount of memory
3. Work in a limited amount of time
4. Be ready to predict at any point
Mining Massive Data
Koichi Kawana: "Simplicity means the achievement of maximum effect with minimum means."

Figure: data stream constraints as a triangle of time, accuracy, and memory.
Evaluation Example
|              | Accuracy | Time | Memory |
|--------------|----------|------|--------|
| Classifier A | 70%      | 100  | 20     |
| Classifier B | 80%      | 20   | 40     |
Which classifier is performing better?
RAM-Hours
RAM-Hour: every GB of RAM deployed for 1 hour.

Based on cloud computing rental cost options.
Evaluation Example
|              | Accuracy | Time | Memory | RAM-Hours |
|--------------|----------|------|--------|-----------|
| Classifier A | 70%      | 100  | 20     | 2,000     |
| Classifier B | 80%      | 20   | 40     | 800       |
Which classifier is performing better?
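The arithmetic behind the RAM-Hours column can be sketched as follows (a minimal example, assuming Time is measured in hours and Memory in GB, which the table leaves implicit; the function name is illustrative):

```python
def ram_hours(memory_gb, time_hours):
    """Cost of a run in RAM-Hours: 1 RAM-Hour = 1 GB of RAM held for 1 hour."""
    return memory_gb * time_hours

# Values from the table: A uses 20 GB for 100 hours, B uses 40 GB for 20 hours.
cost_a = ram_hours(memory_gb=20, time_hours=100)
cost_b = ram_hours(memory_gb=40, time_hours=20)
print(cost_a, cost_b)  # 2000 800
```

By this measure Classifier B is cheaper (800 vs. 2,000 RAM-Hours) as well as more accurate.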
Hoeffding Trees

Hoeffding Tree: VFDT

Pedro Domingos and Geoff Hulten. Mining High-Speed Data Streams. 2000.

With high probability, constructs a model identical to the one a traditional (greedy) batch method would learn, with theoretical guarantees on the error rate.
Figure: example decision tree with tests on Contains "Money" (Yes/No) and Time (Day/Night).
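The guarantee rests on the Hoeffding bound: for a random variable with range R, after n observations the true mean is, with probability at least 1 − δ, within ε = sqrt(R² ln(1/δ) / (2n)) of the observed mean. A small sketch (the numeric values are illustrative, not from the talk):

```python
import math

def hoeffding_bound(r, delta, n):
    """Hoeffding bound: with probability >= 1 - delta, the true mean of a
    variable with range r lies within epsilon of the mean of n samples."""
    return math.sqrt((r * r) * math.log(1.0 / delta) / (2.0 * n))

# The tree splits a leaf once the gap between the two best attributes'
# scores exceeds epsilon; more examples shrink epsilon toward a decision.
for n in (100, 1000, 10000):
    print(n, round(hoeffding_bound(r=1.0, delta=1e-7, n=n), 4))
```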
Hoeffding Naive Bayes Tree
Hoeffding Tree: Majority Class learner at the leaves
Hoeffding Naive Bayes Tree
G. Holmes, R. Kirkby, and B. Pfahringer. Stress-Testing Hoeffding Trees. 2005.

- monitors the accuracy of a Majority Class learner
- monitors the accuracy of a Naive Bayes learner
- predicts using the more accurate method
Bagging
Figure: Poisson(1) Distribution.
Bagging builds a set of M base models, each trained on a bootstrap sample created by drawing random samples with replacement.
Each base model's training set contains each of the original training examples K times, where P(K = k) follows a binomial distribution. For large training-set sizes this binomial tends to a Poisson(1) distribution, which is what the online version samples from.
Oza and Russell’s Online Bagging for M models
1: Initialize base models h_m for all m ∈ {1, 2, ..., M}
2: for all training examples do
3:     for m = 1, 2, ..., M do
4:         Set w = Poisson(1)
5:         Update h_m with the current example with weight w
6: anytime output:
7: return hypothesis: h_fin(x) = arg max_{y ∈ Y} Σ_{t=1}^{T} I(h_t(x) = y)
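The loop above can be sketched in Python; a toy majority-class learner stands in for a Hoeffding tree, and the `learn_one`/`predict_one` interface is a hypothetical stand-in rather than MOA's API:

```python
import math
import random
from collections import Counter

def poisson(lam, rng):
    """Draw k ~ Poisson(lam) (Knuth's method)."""
    l, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= l:
            return k
        k += 1

class MajorityClass:
    """Toy base learner standing in for a Hoeffding tree."""
    def __init__(self):
        self.counts = Counter()

    def learn_one(self, x, y, weight=1):
        self.counts[y] += weight

    def predict_one(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None

class OnlineBagging:
    """Each model trains on each example with weight k ~ Poisson(1),
    simulating a bootstrap sample over an unbounded stream."""
    def __init__(self, make_model, n_models=10, seed=1):
        self.models = [make_model() for _ in range(n_models)]
        self.rng = random.Random(seed)

    def learn_one(self, x, y):
        for m in self.models:
            w = poisson(1.0, self.rng)
            if w > 0:
                m.learn_one(x, y, weight=w)

    def predict_one(self, x):
        # anytime output: majority vote over the ensemble
        votes = Counter(m.predict_one(x) for m in self.models)
        return votes.most_common(1)[0][0]
```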
ADWIN Bagging (KDD’09)
ADWIN
An adaptive sliding window whose size is recomputed online according to the rate of change observed.

ADWIN has rigorous guarantees (theorems):
- on the ratio of false positives and false negatives
- on the relation between the size of the current window and the rate of change

ADWIN Bagging
When a change is detected, the worst classifier is removed and a new classifier is added.
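The published ADWIN algorithm uses exponential histograms to check many window cuts efficiently; the following is only a loose illustrative sketch of the idea (split a window of recent 0/1 errors, compare sub-window means against a Hoeffding-style cut value, drop the old part on change), not the real algorithm:

```python
import math
from collections import deque

class SimpleAdaptiveWindow:
    """Loose sketch of ADWIN's idea, not the published algorithm: keep a
    window of recent 0/1 errors; if the means of an older and a newer
    sub-window differ by more than a Hoeffding-style cut value, drop the
    older part and report a change."""

    def __init__(self, delta=0.002, max_size=1000):
        self.delta = delta
        self.window = deque(maxlen=max_size)

    def _cut(self, n0, n1):
        m = 1.0 / (1.0 / n0 + 1.0 / n1)  # m = 1/(1/n0 + 1/n1)
        return math.sqrt(math.log(4.0 / self.delta) / (2.0 * m))

    def add(self, x):
        """Add an observation (e.g. 1 = error); return True on change."""
        self.window.append(x)
        n = len(self.window)
        if n < 12:
            return False
        data = list(self.window)
        for split in range(5, n - 5):  # try cuts leaving >= 5 points per side
            old, new = data[:split], data[split:]
            diff = abs(sum(old) / len(old) - sum(new) / len(new))
            if diff > self._cut(len(old), len(new)):
                self.window = deque(new, maxlen=self.window.maxlen)
                return True
        return False
```

A stable error stream keeps the window growing; a jump in the error rate shrinks it and signals the ensemble to replace a classifier.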
ADWIN Bagging for M models
1: Initialize base models h_m for all m ∈ {1, 2, ..., M}
2: for all training examples do
3:     for m = 1, 2, ..., M do
4:         Set w = Poisson(1)
5:         Update h_m with the current example with weight w
6:     if ADWIN detects a change in the error of one of the classifiers then
7:         Replace the classifier with the highest error with a new one
8: anytime output:
9: return hypothesis: h_fin(x) = arg max_{y ∈ Y} Σ_{t=1}^{T} I(h_t(x) = y)
Leveraging Bagging for Evolving Data Streams

Randomization as a powerful tool to increase accuracy and diversity.

There are three ways of using randomization:
- manipulating the input data
- manipulating the classifier algorithms
- manipulating the output targets
Input Randomization
Figure: Poisson distribution P(X = k) for k = 0, ..., 20, with λ = 1, λ = 6, and λ = 10.
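The figure's point can be reproduced from the pmf P(X = k) = λ^k e^{−λ} / k!: with λ = 1 over a third of the draws are 0 (so a base model skips that example), while a larger λ shifts mass toward larger weights:

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

# With lambda = 1, P(X = 0) = 1/e ~ 0.368: over a third of the examples
# get weight 0 and are skipped by a given base model.
print(round(poisson_pmf(0, 1.0), 3))  # 0.368
# With lambda = 6 the mass moves to larger weights, e.g. P(X = 6):
print(round(poisson_pmf(6, 6.0), 3))  # 0.161
```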
ECOC Output Randomization
Table: Example matrix of random output codes for 3 classes and 6 classifiers

|              | Class 1 | Class 2 | Class 3 |
|--------------|---------|---------|---------|
| Classifier 1 | 0       | 0       | 1       |
| Classifier 2 | 0       | 1       | 1       |
| Classifier 3 | 1       | 0       | 0       |
| Classifier 4 | 1       | 1       | 0       |
| Classifier 5 | 1       | 0       | 1       |
| Classifier 6 | 0       | 1       | 0       |
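A sketch of how such a code matrix is used (function names are illustrative): each classifier m is trained on the binary label in its row for the true class, and prediction picks the class whose column is nearest in Hamming distance to the classifiers' combined output:

```python
# Code matrix from the table: rows = classifiers, columns = classes 1..3
# (0-indexed below).
CODES = [
    [0, 0, 1],
    [0, 1, 1],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]

def relabel(y, m):
    """Binary label classifier m is trained on when the true class is y."""
    return CODES[m][y]

def decode(bits):
    """Class whose code column is nearest (Hamming distance) to the
    classifiers' binary predictions."""
    n_classes = len(CODES[0])
    return min(range(n_classes),
               key=lambda c: sum(b != CODES[m][c] for m, b in enumerate(bits)))

# Class 3's column is (1, 1, 0, 0, 1, 0); even with one flipped bit the
# nearest column still decodes to class index 2:
print(decode([1, 1, 0, 0, 1, 1]))  # 2
```

The redundancy of the code gives the ensemble some error correction for free, using only binary base classifiers.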
Leveraging Bagging for Evolving Data Streams
Leveraging Bagging
Using Poisson(λ)

Leveraging Bagging MC
Using Poisson(λ) and Random Output Codes

Fast Leveraging Bagging ME
If an instance is misclassified: weight = 1; if not: weight = eT / (1 − eT)
Input Randomization
Bagging
resampling with replacement using Poisson(1)

Other strategies:
- subagging: resampling without replacement
- half subagging: resampling without replacement of half of the instances
- bagging without taking out any instance: using 1 + Poisson(1)
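Online, these strategies differ only in how the per-example weight is drawn. A sketch under one plausible reading (treating half subagging as a Bernoulli(1/2) inclusion is an assumption of this sketch, not a claim about the paper's implementation):

```python
import math
import random

rng = random.Random(42)

def poisson(lam):
    """Draw k ~ Poisson(lam) (Knuth's method)."""
    l, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= l:
            return k
        k += 1

def example_weight(strategy):
    """Weight one base model gives to the incoming example."""
    if strategy == "bagging":         # resampling with replacement
        return poisson(1.0)
    if strategy == "half_subagging":  # each model sees about half the stream, once
        return 1 if rng.random() < 0.5 else 0
    if strategy == "wt":              # bagging without taking out any instance
        return 1 + poisson(1.0)
    raise ValueError(strategy)
```

Subagging variants keep weights in {0, 1} (no example is repeated), while the WT strategy guarantees every model sees every example at least once.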
Leveraging Bagging for Evolving Data Streams
1: Initialize base models h_m for all m ∈ {1, 2, ..., M}
2: for all training examples (x, y) do
3:     for m = 1, 2, ..., M do
4:         Set w = Poisson(λ)
5:         Update h_m with the current example with weight w
6:     if ADWIN detects a change in the error of one of the classifiers then
7:         Replace the classifier with the highest error with a new one
8: anytime output:
9: return h_fin(x) = arg max_{y ∈ Y} Σ_{t=1}^{T} I(h_t(x) = y)
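The whole loop can be sketched compactly; a toy class-counting model stands in for a Hoeffding tree, and a crude error-gap threshold stands in for ADWIN (both are illustrative simplifications):

```python
import math
import random
from collections import Counter

class CountLearner:
    """Toy stand-in for a Hoeffding tree: predicts its heaviest class
    and tracks its own prequential error."""
    def __init__(self):
        self.counts = Counter()
        self.errors = 0
        self.seen = 0

    def learn_one(self, x, y, weight=1):
        if self.counts and self.predict_one(x) != y:
            self.errors += 1
        self.seen += 1
        self.counts[y] += weight

    def predict_one(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None

    def error_rate(self):
        return self.errors / self.seen if self.seen else 0.0

class LeveragingBagging:
    """Sketch of Leveraging Bagging: online bagging with Poisson(lam)
    weights; when one model's error drifts far above the best model's
    (a crude stand-in for ADWIN), the worst model is replaced."""

    def __init__(self, make_model, n_models=10, lam=6.0, seed=1):
        self.make_model = make_model
        self.models = [make_model() for _ in range(n_models)]
        self.lam = lam
        self.rng = random.Random(seed)

    def _poisson(self, lam):
        l, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= self.rng.random()
            if p <= l:
                return k
            k += 1

    def learn_one(self, x, y):
        for m in self.models:
            w = self._poisson(self.lam)
            if w > 0:
                m.learn_one(x, y, weight=w)
        rates = [m.error_rate() for m in self.models]
        if max(rates) - min(rates) > 0.3:  # "change detected"
            self.models[rates.index(max(rates))] = self.make_model()

    def predict_one(self, x):
        votes = Counter(m.predict_one(x) for m in self.models)
        return votes.most_common(1)[0][0]
```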
Leveraging Bagging for Evolving Data Streams MC
1: Initialize base models h_m for all m ∈ {1, 2, ..., M}
2: Compute coloring µ_m(y)
3: for all training examples (x, y) do
4:     for m = 1, 2, ..., M do
5:         Set w = Poisson(λ)
6:         Update h_m with the current example with weight w and class µ_m(y)
7:     if ADWIN detects a change in the error of one of the classifiers then
8:         Replace the classifier with the highest error with a new one
9: anytime output:
10: return h_fin(x) = arg max_{y ∈ Y} Σ_{t=1}^{T} I(h_t(x) = µ_t(y))
Leveraging Bagging for Evolving Data Streams ME
1: Initialize base models h_m for all m ∈ {1, 2, ..., M}
2: for all training examples (x, y) do
3:     for m = 1, 2, ..., M do
4:         Set w = 1 if misclassified, otherwise eT / (1 − eT)
5:         Update h_m with the current example with weight w
6:     if ADWIN detects a change in the error of one of the classifiers then
7:         Replace the classifier with the highest error with a new one
8: anytime output:
9: return h_fin(x) = arg max_{y ∈ Y} Σ_{t=1}^{T} I(h_t(x) = y)
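The ME variant replaces the random Poisson weights with a deterministic weight derived from the current error rate eT, which is what makes it cheap. Just the weighting rule, as a sketch:

```python
def me_weight(misclassified, e_t):
    """Leveraging Bagging ME weight: 1 for a misclassified instance,
    otherwise e_T / (1 - e_T), where e_T is the current error rate."""
    return 1.0 if misclassified else e_t / (1.0 - e_t)

# With a 20% error rate, correctly classified instances are down-weighted
# to 0.25, rebalancing the stream toward the harder examples without
# drawing any random numbers.
print(me_weight(True, 0.2), me_weight(False, 0.2))  # 1.0 0.25
```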
What is MOA?
{M}assive {O}nline {A}nalysis is a framework for mining data streams.

- Based on experience with Weka and VFML
- Focussed on classification trees, but with lots of active development: clustering, item set and sequence mining, regression
- Easy to extend
- Easy to design and run experiments
MOA: the bird
The Moa (another native NZ bird) is not only flightless, like theWeka, but also extinct.
Leveraging Bagging: Empirical Evaluation

Figure: Accuracy (%) against number of instances on dataset SEA with three concept drifts, comparing Leveraging Bagging, Leveraging Bagging MC, ADWIN Bagging, and Online Bagging.
Empirical Evaluation

|                      | Accuracy | RAM-Hours |
|----------------------|----------|-----------|
| Hoeffding Tree       | 74.03%   | 0.01      |
| Online Bagging       | 77.15%   | 2.98      |
| ADWIN Bagging        | 79.24%   | 1.48      |
| ADWIN Half Subagging | 78.36%   | 1.04      |
| ADWIN Subagging      | 78.68%   | 1.13      |
| ADWIN Bagging WT     | 81.49%   | 2.74      |
ADWIN Bagging strategies:
- half subagging: resampling without replacement of half of the instances
- subagging: resampling without replacement
- WT (bagging without taking out any instance): using 1 + Poisson(1)
Empirical Evaluation

|                       | Accuracy | RAM-Hours |
|-----------------------|----------|-----------|
| Hoeffding Tree        | 74.03%   | 0.01      |
| Online Bagging        | 77.15%   | 2.98      |
| ADWIN Bagging         | 79.24%   | 1.48      |
| Leveraging Bagging    | 85.54%   | 20.17     |
| Leveraging Bagging MC | 85.37%   | 22.04     |
| Leveraging Bagging ME | 80.77%   | 0.87      |
Leveraging Bagging variants:
- Leveraging Bagging: using Poisson(λ)
- Leveraging Bagging MC: using Poisson(λ) and Random Output Codes
- Leveraging Bagging ME: using weight 1 if misclassified, otherwise eT / (1 − eT)
Empirical evaluation
|                                  | Accuracy | RAM-Hours |
|----------------------------------|----------|-----------|
| Hoeffding Tree                   | 74.03%   | 0.01      |
| Online Bagging                   | 77.15%   | 2.98      |
| ADWIN Bagging                    | 79.24%   | 1.48      |
| Random Forest Leveraging Bagging | 80.69%   | 5.51      |
| Random Forest Online Bagging     | 72.91%   | 1.30      |
| Random Forest ADWIN Bagging      | 74.24%   | 0.89      |
Random Forests:
- the input training set is obtained by sampling with replacement
- the nodes of the tree use only √n random attributes to split
- we only keep statistics for these attributes
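The per-node attribute subsampling can be sketched as follows (the function name is illustrative):

```python
import math
import random

def random_attribute_subset(n_attributes, rng=None):
    """Indices of the ceil(sqrt(n)) attributes a tree node keeps split
    statistics for; the other attributes are ignored at this node."""
    rng = rng or random.Random(0)
    k = math.ceil(math.sqrt(n_attributes))
    return sorted(rng.sample(range(n_attributes), k))

subset = random_attribute_subset(25)
print(len(subset))  # 5
```

Tracking statistics for only √n attributes per node is what keeps the memory (and hence RAM-Hours) of the Random Forest variants low.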
Leveraging Bagging Diversity
Figure: Kappa-Error diagrams (error vs. Kappa Statistic) for Leveraging Bagging and Online Bagging on the SEA data with three concept drifts, plotting 576 pairs of classifiers.
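Each point in such a diagram plots a pair of ensemble members' error against their Kappa Statistic: agreement corrected for chance, 1 for identical outputs and near 0 for chance-level agreement (lower kappa means more diversity). A sketch of the statistic for two prediction sequences:

```python
from collections import Counter

def kappa(preds_a, preds_b):
    """Kappa agreement between two classifiers' prediction sequences:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(preds_a)
    observed = sum(a == b for a, b in zip(preds_a, preds_b)) / n
    ca, cb = Counter(preds_a), Counter(preds_b)
    chance = sum(ca[y] * cb[y] for y in set(preds_a) | set(preds_b)) / n**2
    return (observed - chance) / (1.0 - chance)

a = ["x", "x", "y", "y", "x", "y"]
b = ["x", "x", "y", "x", "x", "y"]
print(round(kappa(a, b), 3))  # 0.667
```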
Summary
http://moa.cs.waikato.ac.nz/
Conclusions

New improvements for bagging methods using input randomization:
- improving accuracy: using Poisson(λ)
- improving RAM-Hours: using weight 1 if misclassified, otherwise eT / (1 − eT)

New improvements for bagging methods using output randomization:
- no need for multi-class classifiers