Machine Learning
Lecture 6: Supervised Learning - Ensemble
Dr. Patrick Chan ([email protected])
South China University of Technology, China

Agenda: Why Ensemble, Fusion, Diversity, Construction Method




Why Ensemble?

How do we choose the best model for a classification problem?
Trial and error: train many classifiers with different settings, then evaluate how good each one is.

Why Ensemble? Banana Artificial Dataset

A 2-class problem with 2 features and 1000 samples.
10% of the samples are selected for training; the remaining 90% are reserved for evaluation (testing).


Why Ensemble?

Assume a simple MLPNN with one hidden layer is used.
We have no idea how many hidden neurons should be used, so many 3-layer MLPNNs with different settings are trained.

Why Ensemble?

[Figure: training error and testing error plotted against the number of hidden neurons]


Why Ensemble? More MLPNNs

[Figure: results of additional MLPNNs trained with different settings]

Why Ensemble?

How to choose the best classifier? Selection criteria:
Training accuracy: many classifiers tie (e.g. #2, #3, ...).
Training accuracy + complexity: the classifier with the smallest number of hidden neurons among those with the lowest training error (#6?).
Which criterion is the best?

 #   HN   Training Error   Test Error
 1    9       0.020          0.051
 2    9       0.010          0.040
 3   13       0.010          0.021
 4    8       0.010          0.029
 5   14       0.010          0.041
 6    6       0.010          0.039
 7   11       0.010          0.040
 8    9       0.010          0.024
 9   13       0.010          0.021
10    2       0.140          0.161
11   10       0.010          0.018
12    2       0.160          0.152
13   12       0.010          0.018
14   15       0.010          0.019
15    9       0.010          0.029
16   12       0.010          0.023
17   10       0.010          0.027
18   10       0.010          0.021

Best test error: 0.018 (#11, #13). Worst test error: 0.161 (#10).


Why Ensemble?

How about combining all of them?
Training error = 0.0100, testing error = 0.0167.
Its performance is better than the best individual classifier (0.018), but there is no guarantee!

Why Ensemble?

Drawbacks of selecting the BEST:
Selecting a wrong one definitely leads to an erroneous result.
The "best" classifier is not necessarily the ideal choice; different classifiers may contain different valuable information.
Potentially valuable information may be lost by discarding the results of less successful classifiers.
A single classifier may not be adequate to handle today's increasingly complex problems.


Ensemble

Sometimes called a Multiple Classifier System (MCS).
Consists of a set of individual classifiers united by a fusion method.

[Figure: a single classifier maps x through one model f to a result; an ensemble classifier feeds x into L base (individual) classifiers f1, f2, ..., fL, whose outputs are combined by a fusion method, which may itself depend on x, to produce the result]

Ensemble

1. Sample x is fed into each base classifier.
2. Each base classifier makes its own decision.
3. The final decision is made by combining all individual decisions.


Ensemble

Must an ensemble be better than a single classifier?
In all cases: NO! But in practice, yes for many cases.

Ensemble

Three factors affect the performance (accuracy) of an ensemble:
Accuracy of base classifiers: how good are the base classifiers?
Fusion method: how are the classifiers combined?
Diversity among base classifiers: how different are the decisions reached by the classifiers?


Three Key Factors
Accuracy of Base Classifiers

The performance of a base classifier is affected by:
Training dataset (samples and features)
Learning model (type of classifier)
Parameters (e.g. the number of neurons and layers in a NN)

If the base classifiers are poor, the ensemble cannot be good, but we can still hope it will be better than the base classifiers.

Three Key Factors
Fusion Method

A method to arrive at a group decision.
Two categories based on classifier output:
Label output: the output is a class ID, e.g. [1 0 0] means x is Class 1.
Continuous-valued output: the output is a real value (e.g. a probability) for each class, e.g. [0.7 0.1 0.2] means the support for x being Class 1 is 0.7, Class 2 is 0.1 and Class 3 is 0.2.


Three Key Factors: Fusion Method
Decision Profile

The Decision Profile D(x) collects the outputs of the base classifiers fi (i = 1, ..., L) as an L x c matrix:

D(x) = [ y1,1 ... y1,c ]
       [  ...       ... ]
       [ yL,1 ... yL,c ]

c : the number of classes
L : the number of base classifiers
yi,j : the output of the ith classifier on the jth class

Column: outputs of all base classifiers on one class.
Row: outputs of one base classifier on all classes.

Three Key Factors: Fusion Method
Label Output

The label output of a base classifier can be represented as a one-hot vector: 1 indicates the class x belongs to, and the other classes are 0.

For example, in a 3-class problem with 4 base classifiers whose decisions are class 2, class 3, class 1 and class 2:

D(x) = [ 0 1 0 ]
       [ 0 0 1 ]
       [ 1 0 0 ]
       [ 0 1 0 ]

Column: a class. Row: a classifier.


Three Key Factors: Fusion Method
Label Output

Majority Vote (also called Plurality): the class with the most votes wins.

For example, for the 3-class, 4-classifier decision profile on the previous slide, the vote counts are y1 = 1, y2 = 2, y3 = 1, so Class 2 is the majority.
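As a concrete sketch (the helper name is my own, not from the slides), plurality voting over a one-hot decision profile can be implemented by summing each class column:

```python
# Hypothetical sketch: majority (plurality) vote over a label-output
# decision profile, one one-hot row per base classifier.

def majority_vote(decision_profile):
    """Return the 1-based index of the class with the most votes."""
    # Sum each column: total votes received by each class.
    votes = [sum(col) for col in zip(*decision_profile)]
    return votes.index(max(votes)) + 1

# 3-class problem, 4 base classifiers voting class 2, 3, 1, 2 (slide example).
D = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0],
     [0, 1, 0]]
print(majority_vote(D))  # 2 (Class 2 is the majority)
```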

Three Key Factors: Fusion Method
Label Output

Simple Majority: a class needs more than 50% of the votes (at least floor(L/2) + 1).
Stricter than Majority Vote; many cases end with no decision.

Unanimity: all base classifiers must reach the same decision.
Even stricter; many cases end with no decision.


Three Key Factors: Fusion Method
Label Output

Voting methods assume each base classifier has the same classification ability. However, in most cases this is not true.

Weighted Majority Vote
Assign a weight wi to the ith base classifier based on its ability; a larger wi indicates a more accurate classifier (e.g. evaluated by accuracy on the training set).
The decision is class yk if

    k = argmax_j  Σi wi yi,j

Three Key Factors: Fusion Method
Label Output

Example: 3 classes, 5 base classifiers with weights w = (0.1, 0.1, 0.2, 0.5, 0.1).
Classifiers f1, f2, f3 vote for Class 1, f4 votes for Class 2 and f5 votes for Class 3 (vote counts: 3, 1, 1).

Unanimity (all votes agree): no decision
Simple Majority (votes > 50%): y1
Majority Vote (most votes): y1
Weighted Majority Vote:
Class 1: 0.1×1 + 0.1×1 + 0.2×1 + 0.5×0 + 0.1×0 = 0.4
Class 2: 0.1×0 + 0.1×0 + 0.2×0 + 0.5×1 + 0.1×0 = 0.5
Class 3: 0.1×0 + 0.1×0 + 0.2×0 + 0.5×0 + 0.1×1 = 0.1
so the decision is y2.
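A minimal sketch of the weighted majority vote (the weights and votes below restate the slide's example as I read it, so treat them as an assumption):

```python
# Hypothetical sketch: weighted majority vote over a one-hot decision profile.

def weighted_majority_vote(decision_profile, weights):
    """Return the 1-based class index maximizing sum_i w_i * y_ij."""
    scores = [sum(w * y for w, y in zip(weights, col))
              for col in zip(*decision_profile)]
    return scores.index(max(scores)) + 1

# 5 classifiers: f1, f2, f3 vote class 1; f4 votes class 2; f5 votes class 3.
D = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
w = [0.1, 0.1, 0.2, 0.5, 0.1]
print(weighted_majority_vote(D, w))  # 2: the weights overturn the plain majority (class 1)
```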


Three Key Factors: Fusion Method
Continuous-valued Output

A base classifier outputs a real value (not a label) for each class, so the values in D are real numbers.
For example: a 3-class problem with 4 base classifiers.
Based on D, fusion of continuous-valued outputs calculates a real value μj for each class (j = 1, ..., c).
Column: a class. Row: a classifier.

Three Key Factors: Fusion Method
Continuous-valued Output

Statistical operators (applied to the jth column of D):
Simple Mean:  μj = (1/L) Σi yi,j
Product:      μj = Πi yi,j
Minimum:      μj = mini yi,j
Maximum:      μj = maxi yi,j
Median:       μj = mediani yi,j
Trimmed Mean: the values are sorted and K percent of the values are dropped on each side; the mean of the remaining values is taken.


Three Key Factors: Fusion Method
Continuous-valued Output

Weighted Average
L weights (one weight per base classifier):
    μj = (1/L) Σi wi yi,j
c × L weights (one weight per class per base classifier):
    μj = (1/L) Σi wi,j yi,j

Three Key Factors: Fusion Method
Continuous-valued Output

Example: 3 classes, 5 base classifiers with decision profile

D(x) = [ 0.6 0.4 0.1 ]
       [ 0.7 0.2 0.7 ]
       [ 0.5 0.2 0.1 ]
       [ 0.5 0.7 0.6 ]
       [ 0.5 0.8 0.6 ]

Operator           Class 1   Class 2   Class 3   Decision
Product             0.05      0.01      0.00       y1
Median              0.5       0.4       0.6        y3
Maximum             0.7       0.8       0.7        y2
Minimum             0.5       0.2       0.1        y1
Average             0.56      0.46      0.42       y1
Trim 20% Average    0.53      0.43      0.43       y1
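The column-wise operators above can be checked with a short sketch (the decision-profile values are reconstructed from this example, so treat them as an assumption):

```python
# Hypothetical sketch: statistical fusion operators applied column-wise
# to a continuous-valued decision profile.
from statistics import mean, median

def fuse(decision_profile, op):
    """Apply an operator to each class column; return per-class supports."""
    return [op(col) for col in zip(*decision_profile)]

D = [[0.6, 0.4, 0.1],
     [0.7, 0.2, 0.7],
     [0.5, 0.2, 0.1],
     [0.5, 0.7, 0.6],
     [0.5, 0.8, 0.6]]

print([round(v, 2) for v in fuse(D, mean)])    # [0.56, 0.46, 0.42] -> y1
print([round(v, 1) for v in fuse(D, median)])  # [0.5, 0.4, 0.6]    -> y3
print(fuse(D, max))                            # [0.7, 0.8, 0.7]    -> y2
```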


Three Key Factors: Fusion Method
Continuous-valued Output

Example: 3 classes, 5 base classifiers, weighted average with L weights w = (0.4, 0.2, 0.1, 0.1, 0.2):

Class 1: (0.4×0.6 + 0.2×0.7 + 0.1×0.5 + 0.1×0.5 + 0.2×0.5) / 5 = 0.12
Class 2: (0.4×0.4 + 0.2×0.2 + 0.1×0.2 + 0.1×0.7 + 0.2×0.8) / 5 = 0.09
Class 3: (0.4×0.1 + 0.2×0.7 + 0.1×0.1 + 0.1×0.6 + 0.2×0.6) / 5 = 0.07

The decision is y1.

Three Key Factors: Fusion Method
Continuous-valued Output

Example: 3 classes, 5 base classifiers, weighted average with L × c weights (one weight per class per classifier):

Class 1: (0.1×0.6 + 0.1×0.7 + 0.2×0.5 + 0.4×0.5 + 0.2×0.5) / 5 = 0.11
Class 2: (0.2×0.4 + 0.1×0.2 + 0.1×0.2 + 0.2×0.7 + 0.4×0.8) / 5 = 0.12
Class 3: (0.2×0.1 + 0.4×0.7 + 0.1×0.1 + 0.1×0.6 + 0.2×0.6) / 5 = 0.10

The decision is y2.


Three Key Factors
Diversity

If all base classifiers always make the same decision, there is no need to consider all of them.
Diversity is a measure of the difference between base classifiers.
It is an intuitive, key concept for ensembles, with many definitions.
Diversity measures can be categorized according to output type: label output and continuous-valued output.

Three Key Factors: Diversity
Label Output

Pairwise Method: consider two base classifiers Di and Dk.
There are four possibilities:

             Dk correct   Dk wrong
Di correct      N11          N10
Di wrong        N01          N00

N = N00 + N01 + N10 + N11

N11 : number of times both base classifiers are correct
N10 : number of times Di is correct and Dk is wrong
N01 : number of times Di is wrong and Dk is correct
N00 : number of times both base classifiers are wrong
N   : total number of times


Three Key Factors: Diversity
Label Output

Disagreement Measure
    (N01 + N10) / N
The probability that the two classifiers disagree with each other.
Range: 0 to 1 (1 = totally disagree = most diverse).

Double Fault Measure
    N00 / N
The probability that the two classifiers are wrong together.
Range: 0 to 1 (a lower value indicates more diversity; 1 means both are wrong all the time).

Three Key Factors: Diversity
Label Output

An ensemble contains L(L-1)/2 pairs of base classifiers.
The ensemble diversity measure is the mean of the pairwise diversity values over those pairs.
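The pairwise counts and their averaging over all L(L-1)/2 pairs can be sketched as follows (the correctness vectors are made-up illustration data):

```python
# Hypothetical sketch: pairwise diversity from 0/1 correctness vectors.
from itertools import combinations

def pair_counts(ci, ck):
    """Return N11, N10, N01, N00 for two 0/1 correctness vectors."""
    n11 = sum(1 for a, b in zip(ci, ck) if a and b)
    n10 = sum(1 for a, b in zip(ci, ck) if a and not b)
    n01 = sum(1 for a, b in zip(ci, ck) if not a and b)
    n00 = sum(1 for a, b in zip(ci, ck) if not a and not b)
    return n11, n10, n01, n00

def disagreement(ci, ck):
    _, n10, n01, _ = pair_counts(ci, ck)
    return (n01 + n10) / len(ci)

def double_fault(ci, ck):
    *_, n00 = pair_counts(ci, ck)
    return n00 / len(ci)

def ensemble_diversity(correctness, measure):
    """Mean pairwise measure over the L(L-1)/2 classifier pairs."""
    pairs = list(combinations(correctness, 2))
    return sum(measure(ci, ck) for ci, ck in pairs) / len(pairs)

# 3 classifiers evaluated on 4 samples (1 = correct, 0 = wrong).
C = [[1, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 1, 0]]
print(ensemble_diversity(C, disagreement))  # 0.5
print(ensemble_diversity(C, double_fault))  # 0.25
```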


Three Key Factors: Diversity
Label Output

Measure of "Difficulty"
Let d ∈ {0/L, 1/L, ..., L/L} denote the proportion of base classifiers that correctly classify a training sample.
Diversity is measured by var(d): the higher the value, the worse the ensemble (given the accuracy of the base classifiers is the same).

Example: 100 samples, 7 base classifiers, 60% accuracy for all 7 classifiers.

All 7 classifiers identical: 40 samples have d = 0/7 and 60 samples have d = 7/7.
    var(d) = 0.240
All 7 classifiers independent: the counts at d = 0/7, 1/7, ..., 7/7 are approximately 0, 2, 8, 19, 29, 26, 13, 3.
    var(d) = 0.031
The 7 classifiers recognize different subsets of the 100 points: 81 samples have d = 4/7, 18 have d = 5/7 and 1 has d = 6/7.
    var(d) = 0.004

(d = 4/7 is the majority-vote border: samples with d ≥ 4/7 are classified correctly by majority vote.)

Three Key Factors: Diversity
Continuous-valued Output

Correlation Coefficient (CC)
The CC between two classifiers' outputs (pairwise):

    ρi,k = Σn (yi(xn) − ȳi)(yk(xn) − ȳk) / (N σi σk)

where yi denotes the ith classifier's outputs, and ȳi and σi are the mean and standard deviation of the ith classifier's output over all samples.

Diversity is the average of the CCs of the L(L-1)/2 pairs.
 1: not diverse (identical)
 0: independent
-1: the most diverse
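A small sketch of this pairwise correlation diversity (the output values are made up for illustration):

```python
# Hypothetical sketch: Pearson correlation between two classifiers'
# continuous outputs, averaged over all L(L-1)/2 pairs.
from itertools import combinations
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
    sa = sqrt(sum((x - ma) ** 2 for x in a) / n)
    sb = sqrt(sum((y - mb) ** 2 for y in b) / n)
    return cov / (sa * sb)

def avg_correlation(outputs):
    pairs = list(combinations(outputs, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)

# Per-classifier support for one class over 4 samples (made-up values).
outputs = [[0.9, 0.2, 0.7, 0.1],
           [0.8, 0.3, 0.6, 0.2],   # similar to the first -> CC near +1
           [0.1, 0.9, 0.2, 0.8]]   # opposite trend -> CC near -1
print(round(avg_correlation(outputs), 2))  # -0.31
```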


Three Key Factors
Diversity

How to make base classifiers diverse?
Implicit methods:
Using different training sets (samples, features)
Using different base classifiers (learning models, training parameters)
Explicit methods:
Maximize diversity directly during training

Construction Method

The most well-known ensemble construction methods:
Bagging
Boosting
Random Forest


Construction Method
Bagging

Bagging = Bootstrap Aggregating.
Use bootstrapping to generate L training sets (drawn randomly with replacement), and train one base classifier on each. Majority voting is used as the fusion method.

Algorithm:
1. Randomly draw a portion of the full training set with replacement.
2. Train a base classifier on that portion.
3. Repeat until L base classifiers have been trained.
4. Finally, voting is used to combine these L base classifiers.
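The steps above can be sketched as follows. The slides do not fix a base classifier, so a 1-D decision stump and a toy dataset are assumptions here:

```python
# A minimal bagging sketch: bootstrap + base learner + majority vote.
import random
from collections import Counter

def bootstrap(data, rng):
    """Draw len(data) samples with replacement."""
    return [rng.choice(data) for _ in data]

def train_stump(data):
    """1-feature threshold classifier minimizing training error."""
    best = None
    for t in sorted({x for x, _ in data}):
        for sign in (1, -1):
            err = sum(1 for x, y in data if sign * (1 if x > t else -1) != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    _, t, sign = best
    return lambda x, t=t, s=sign: s * (1 if x > t else -1)

def bagging(train, L, seed=0):
    rng = random.Random(seed)
    models = [train_stump(bootstrap(train, rng)) for _ in range(L)]
    def predict(x):  # majority vote over the L base classifiers
        return Counter(m(x) for m in models).most_common(1)[0][0]
    return predict

neg = [(x, -1) for x in (0.1, 0.2, 0.3, 0.4)] * 3
pos = [(x, 1) for x in (0.6, 0.7, 0.8, 0.9)] * 3
f = bagging(neg + pos, L=11)
print(f(0.1), f(0.9))  # -1 1
```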

Construction Method
Bagging

Advantages:
Simple and easy to understand.
Good for unstable classifiers, i.e. where small changes in the training set cause large differences in the generated classifier (the algorithm has high variance), e.g. Decision Tree, MLPNN.

Disadvantage:
Generating complementary base classifiers is left to chance.


Construction Method
Boosting

Actively generates complementary base classifiers: the next base classifier is trained based on the mistakes made by previous classifiers.

Adaptive Boosting (AdaBoost)
Generates a sequence of base classifiers, each focusing on the previous one's errors.
Each base classifier is trained by minimizing the weighted error; a larger weight is assigned to samples that were classified wrongly.
Weighted average is used as the fusion method.

Construction Method
Boosting: AdaBoost

1. Initialize the weight on all N samples: wn(1) = 1/N.
2. For t = 1, ..., L (each classifier):
   Train a classifier ft that minimizes the weighted error
       εt = Σn wn(t) I(ft(xn) ≠ yn)
   where I is the 0/1 classification error.
   Calculate its weight in the fusion
       αt = (1/2) ln((1 − εt) / εt)
   If the accuracy is higher than 50% (εt < 0.5), the weight αt is positive.
   Update the weight on each sample
       wn(t+1) = wn(t) exp(−αt yn ft(xn)) / Zt
   where Zt is a normalization factor that ensures the sum of all instance weights is equal to 1.
   For a correctly classified sample, exp(−αt) < 1, so its weight decreases; for a wrongly classified sample, exp(αt) > 1, so its weight increases.
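The loop above can be sketched compactly. The 1-D dataset and the decision-stump base learner are illustrative assumptions (the slides do not fix a base classifier); labels are +1/-1:

```python
# A compact AdaBoost sketch following the weighted-error / alpha / re-weight steps.
import math

def train_weighted_stump(data, w):
    """Pick the threshold and sign minimizing the weighted 0/1 error."""
    best = None
    for t in sorted({x for x, _ in data}):
        for sign in (1, -1):
            err = sum(wi for (x, y), wi in zip(data, w)
                      if sign * (1 if x > t else -1) != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    err, t, sign = best
    return err, (lambda x, t=t, s=sign: s * (1 if x > t else -1))

def adaboost(data, L):
    n = len(data)
    w = [1.0 / n] * n                               # uniform initial weights
    models = []
    for _ in range(L):
        eps, f = train_weighted_stump(data, w)      # minimize weighted error
        eps = min(max(eps, 1e-10), 1 - 1e-10)       # guard the logarithm
        alpha = 0.5 * math.log((1 - eps) / eps)     # fusion weight
        # increase weights on mistakes, decrease them on correct samples
        w = [wi * math.exp(-alpha * y * f(x)) for (x, y), wi in zip(data, w)]
        z = sum(w)                                  # normalization factor Z_t
        w = [wi / z for wi in w]
        models.append((alpha, f))
    return lambda x: 1 if sum(a * f(x) for a, f in models) > 0 else -1

# Not stump-separable: the +1 point at 0.45 sits among the -1 points.
data = [(0.1, -1), (0.3, -1), (0.45, 1), (0.5, -1), (0.7, 1), (0.9, 1)]
F = adaboost(data, L=3)
print([F(x) for x, _ in data])  # [-1, -1, 1, -1, 1, 1]: all 6 correct after 3 rounds
```

Note that no single stump classifies this data perfectly, but the weighted combination of three stumps does, which is exactly the complementarity boosting aims for.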


Construction Method
Boosting: AdaBoost

The L trained base classifiers are combined by a weighted average:
    F(x) = sign( Σt αt ft(x) )   (for a binary problem with labels ±1)

Construction Method
Random Forest

A random forest (introduced by Tin Kam Ho in 1995) is an ensemble classifier that consists of many decision trees.
Majority vote is used as the fusion method.


Construction Method
Random Forest

Each tree is constructed using the following algorithm:
Training set: bootstrap a set of N samples from the original training set (choose a sample N times with replacement).
For each node:
Randomly choose M of the m features.
Calculate the best split based on these M features in the training set.
Each tree is fully grown and not pruned.
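A heavily simplified sketch of this idea: depth-1 "trees" on 2 features, so the per-node feature subsampling reduces to picking one of the two features per tree. The dataset is a made-up assumption:

```python
# Minimal random-forest-style sketch: bootstrap + random feature + vote.
import random
from collections import Counter

def train_tree(sample, feat):
    """Depth-1 'tree': best threshold/sign on the chosen feature."""
    best = None
    for t in sorted({p[feat] for p, _ in sample}):
        for sign in (1, -1):
            err = sum(1 for p, y in sample
                      if sign * (1 if p[feat] > t else -1) != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    _, t, sign = best
    return lambda p, f=feat, t=t, s=sign: s * (1 if p[f] > t else -1)

def random_forest(train, L, seed=0):
    rng = random.Random(seed)
    trees = []
    for _ in range(L):
        sample = [rng.choice(train) for _ in train]  # bootstrap N samples
        feat = rng.randrange(2)                      # random feature choice
        trees.append(train_tree(sample, feat))
    def predict(p):  # majority vote over the trees
        return Counter(t(p) for t in trees).most_common(1)[0][0]
    return predict

neg = [((0.1, 0.2), -1), ((0.2, 0.1), -1), ((0.3, 0.3), -1)] * 3
pos = [((0.7, 0.8), 1), ((0.8, 0.7), 1), ((0.9, 0.9), 1)] * 3
forest = random_forest(neg + pos, L=15)
print(forest((0.1, 0.1)), forest((0.9, 0.9)))  # -1 1
```

In a full random forest each tree is grown to full depth, re-sampling the feature subset at every node; this sketch keeps only the bootstrap-plus-feature-randomness core.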