
Regression Using Boosting

Vishakh (vv2131@columbia.edu)

Advanced Machine Learning, Fall 2006


Introduction

● Classification with boosting
  – Well-studied
  – Theoretical bounds and guarantees
  – Empirically tested

● Regression with boosting
  – Rarely used
  – Some bounds and guarantees
  – Very little empirical testing


Project Description

● Study existing algorithms & formalisms
  – AdaBoost.R (Freund & Schapire, 1997)
  – SquareLev.R (Duffy & Helmbold, 2002)
  – SquareLev.C (Duffy & Helmbold, 2002)
  – ExpLev (Duffy & Helmbold, 2002)

● Verify effectiveness by testing on an interesting dataset
  – Football Manager 2006


A Few Notes

● Want PAC-like guarantees.
● Can't directly transfer processes from classification:
  – Simply re-weighting the distribution over iterations doesn't work.
  – Can modify samples and still remain consistent with the original function class.
● Perform gradient descent on a potential function.


SquareLev.R

● Squared-error regression.
● Uses a regression algorithm for the base learner.
● Modifies labels, not the distribution.
● Potential function uses the variance of the residuals.
● New label is proportional to the negative gradient of the potential function (see the sketch after this list).
● Each iteration, the mean squared error decreases by a multiplicative factor.
● Can get arbitrarily small squared error as long as the correlation between the residuals and the base learner's predictions stays above a threshold.
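A minimal sketch of that loop in Python, assuming a depth-limited tree as the base regressor and a shrunken least-squares step size (assumptions for illustration; the paper's actual edge condition and stopping rule are more careful):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def squarelev_r(X, y, n_rounds=50, shrink=0.5):
    """Sketch of a SquareLev.R-style loop: each base regressor is fit to
    the current residuals, i.e. the negative gradient of the squared-error
    potential. Names and the step-size rule are illustrative assumptions."""
    F = np.zeros(len(y))        # current combined prediction
    ensemble = []               # list of (coefficient, base learner) pairs
    for _ in range(n_rounds):
        residuals = y - F       # relabeled targets: proportional to -gradient
        h = DecisionTreeRegressor(max_depth=3).fit(X, residuals)
        pred = h.predict(X)
        denom = pred @ pred
        if denom == 0.0:        # base learner predicts all zeros; stop
            break
        c = shrink * (pred @ residuals) / denom   # least-squares step size
        F += c * pred
        ensemble.append((c, h))
    return ensemble, F
```

The multiplicative drop in mean squared error corresponds to each step removing a constant fraction of the residual variance whenever the base learner's predictions correlate with the residuals.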


SquareLev.C

● Squared-error regression.
● Uses a base classifier.
● Modifies both labels and the distribution.
● Potential function uses the residuals.
● New label is the sign of the instance's residual (sketch below).
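A minimal sketch in the same style, assuming a tree classifier as the base learner and a heuristic step size (both assumptions, not the paper's exact rule):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def squarelev_c(X, y, n_rounds=50, shrink=0.1):
    """Sketch of a SquareLev.C-style loop: each instance is relabeled with
    the sign of its residual and reweighted by the residual's magnitude,
    then a base *classifier* is fit. Step-size rule is an assumption."""
    F = np.zeros(len(y))
    ensemble = []
    for _ in range(n_rounds):
        residuals = y - F
        labels = np.where(residuals >= 0, 1, -1)   # new label: sign of residual
        weights = np.abs(residuals)                # distribution ~ |residual|
        total = weights.sum()
        if total == 0.0:                           # all residuals zero; done
            break
        h = DecisionTreeClassifier(max_depth=3).fit(
            X, labels, sample_weight=weights / total)
        pred = h.predict(X).astype(float)          # values in {-1, +1}
        c = shrink * (pred @ residuals) / len(y)   # small step toward residuals
        F += c * pred
        ensemble.append((c, h))
    return ensemble, F
```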


ExpLev

● Attempts to get small residuals at each point.
● Uses an exponential potential.
● AdaBoost pushes all instances to a positive margin; ExpLev pushes all instances to have small residuals.
● Uses a base regressor ([-1, +1]) or classifier ({-1, +1}).
● Two-sided potential uses exponentials of the residuals (written out below).
● Base learner must perform well on the relabeled instances.
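To make the "two-sided potential" concrete, one plausible form consistent with the slide (the exact expression and constants in Duffy & Helmbold, 2002 may differ) is

    Φ(F) = Σ_i [ exp(s · (F(x_i) − y_i)) + exp(−s · (F(x_i) − y_i)) ]

with scale parameter s > 0. Each summand is minimized when the residual F(x_i) − y_i is zero, so reducing Φ forces every residual to be small, the analogue of AdaBoost reducing Σ_i exp(−margin_i) to push every margin positive.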


Naive Approach

● Directly translate AdaBoost to the regression setting.

● Use thresholding of the squared error to reweight.
● Use it as a baseline against which to test the other approaches (see the sketch below).
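A minimal sketch of this naive translation, assuming a tree base regressor and an illustrative threshold tau (both assumptions):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def naive_boost(X, y, tau=0.05, n_rounds=50):
    """Sketch of the naive translation: an instance counts as 'wrong' when
    its squared error exceeds the threshold tau, and the usual AdaBoost
    reweighting is applied. tau and the base learner are assumptions."""
    m = len(y)
    D = np.full(m, 1.0 / m)                     # distribution over instances
    ensemble = []
    for _ in range(n_rounds):
        h = DecisionTreeRegressor(max_depth=3).fit(X, y, sample_weight=D)
        wrong = (h.predict(X) - y) ** 2 > tau   # thresholded squared error
        eps = D[wrong].sum()                    # weighted 'error rate'
        if eps == 0.0 or eps >= 0.5:            # perfect or too weak: stop
            break
        beta = eps / (1.0 - eps)
        D[~wrong] *= beta                       # downweight 'correct' instances
        D /= D.sum()
        ensemble.append((np.log(1.0 / beta), h))
    return ensemble
```

A final prediction would typically combine the ensemble with a weighted median, as AdaBoost.R does.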


Dataset

● Data from Football Manager 2006
  – Very popular game
  – Statistically driven
● Features are player attributes.
● Labels are average performance ratings over a season.
● Predict performance levels and use the learned model to guide game strategy.


Work So Far

● Conducted a survey of existing work.
● Studied the methods and their formal guarantees and bounds.
● Implementation still underway.


Conclusions

● Interesting approaches to, and analyses of, boosting for regression are available.

● Insufficient real-world verification.
● Further work:
  – Regression on noisy data
  – Formal results under more relaxed assumptions

