Regression Using Boosting
Vishakh ([email protected])
Advanced Machine Learning...
Introduction
● Classification with boosting
– Well-studied
– Theoretical bounds and guarantees
– Empirically tested
● Regression with boosting
– Rarely used
– Some bounds and guarantees
– Very little empirical testing
Project Description
● Study existing algorithms & formalisms
– AdaBoost.R (Freund & Schapire, 1997)
– SquareLev.R (Duffy & Helmbold, 2002)
– SquareLev.C (Duffy & Helmbold, 2002)
– ExpLev (Duffy & Helmbold, 2002)
● Verify effectiveness by testing on an interesting dataset.
– Football Manager 2006
A Few Notes
● Want PAC-like guarantees.
● Can't directly transfer processes from classification.
– Simply re-weighting the distribution over iterations doesn't work.
– Can modify samples and still remain consistent with the original function class.
● Performing gradient descent on a potential function.
SquareLev.R
● Squared error regression.
● Uses a regression algorithm as the base learner.
● Modifies labels, not the distribution.
● Potential function uses the variance of the residuals.
● New label is proportional to the negative gradient of the potential function.
● Each iteration, mean squared error decreases by a multiplicative factor.
● Can get arbitrarily small squared error as long as the correlation between residuals and predictions > threshold.
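The label-modification loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm as published: the base learner `fit_stump`, the line-search step `alpha`, the final mean correction `offset`, and the toy sine data are all hypothetical choices made for this sketch.

```python
import math

def mean(values):
    return sum(values) / len(values)

def fit_stump(xs, ys):
    """Hypothetical base regressor: one-threshold stump minimizing squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm = mean(left)
        rm = mean(right) if right else lm
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def squarelev_r_sketch(xs, ys, rounds=20):
    """Boost a regressor by relabeling with centered residuals (assumed form)."""
    learners = []                 # list of (alpha, base function)
    preds = [0.0] * len(xs)
    history = []                  # residual variance before each round
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        r_bar = mean(residuals)
        # Centered residuals point along the negative gradient of the
        # variance-of-residuals potential; they become the new labels.
        labels = [r - r_bar for r in residuals]
        history.append(mean([l * l for l in labels]))
        f = fit_stump(xs, labels)
        fx = [f(x) for x in xs]
        f_bar = mean(fx)
        centered = [v - f_bar for v in fx]
        denom = sum(v * v for v in centered)
        if denom < 1e-12:         # base learner returned a constant; stop
            break
        # Step size that minimizes the residual variance along f.
        alpha = sum(l * v for l, v in zip(labels, centered)) / denom
        learners.append((alpha, f))
        preds = [p + alpha * v for p, v in zip(preds, fx)]
    # The potential ignores the mean residual, so correct it once at the end.
    offset = mean([y - p for y, p in zip(ys, preds)])
    return (lambda x: offset + sum(a * f(x) for a, f in learners)), history

xs = [i / 10 for i in range(40)]
ys = [math.sin(x) for x in xs]
F, history = squarelev_r_sketch(xs, ys)
final_mse = mean([(y - F(x)) ** 2 for x, y in zip(xs, ys)])
```

Because `alpha` is chosen by exact line search, the recorded variance in `history` cannot increase from one round to the next; how fast it shrinks depends on how well the base learner correlates with the residuals.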
SquareLev.C
● Squared error regression.
● Uses a base classifier.
● Modifies labels and the distribution.
● Potential function uses the residuals.
● New label is the sign of the instance's residual.
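A rough sketch of the relabeling step for a classifier base learner is given below. The labels follow the slide (sign of the residual). The distribution is an assumption of this sketch: weights are taken proportional to the absolute residual, which the slide does not state. The function name `relabel_for_classifier` is hypothetical.

```python
def relabel_for_classifier(residuals):
    """Return (+1/-1 labels, normalized weights) for a base classifier.

    Labels are the sign of each residual. Weights proportional to
    |residual| are an assumption made for this sketch.
    """
    labels = [1 if r >= 0 else -1 for r in residuals]
    total = sum(abs(r) for r in residuals)
    if total == 0:
        # All residuals are zero: fall back to a uniform distribution.
        weights = [1.0 / len(residuals)] * len(residuals)
    else:
        weights = [abs(r) / total for r in residuals]
    return labels, weights

labels, weights = relabel_for_classifier([2.0, -1.0, 1.0])
```

Under this assumed weighting, instances with large residuals dominate the classifier's training distribution, while the label only tells it in which direction the prediction should move.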
ExpLev
● Attempts to get small residuals at each point.
● Uses an exponential potential.
● AdaBoost pushes all instances to positive margin.
● ExpLev pushes all instances to have small residuals.
● Uses a base regressor ([-1,+1]) or classifier ({-1,+1}).
● Two-sided potential uses exponents of residuals.
● Base learner must perform well with relabeled instances.
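One way to picture a two-sided exponential potential is sketched below. The exact form is an assumption of this sketch: the scale parameter `s`, the constant `- 2.0` that makes the potential zero at zero residual, and both function names are hypothetical and not taken from the slides.

```python
import math

def explev_potential(residuals, s=1.0):
    """Two-sided exponential potential (assumed form): penalizes residuals
    of either sign and equals zero when every residual is zero."""
    return sum(math.exp(s * r) + math.exp(-s * r) - 2.0 for r in residuals)

def explev_negative_gradient(residuals, s=1.0):
    """Negative gradient of the potential with respect to the predictions,
    where residual = label - prediction."""
    return [s * (math.exp(s * r) - math.exp(-s * r)) for r in residuals]

residuals = [0.0, 0.5, -2.0]
potential = explev_potential(residuals)
directions = explev_negative_gradient(residuals)
```

The gradient grows exponentially with the size of the residual, so the worst-fit instances receive the strongest push, in the same way AdaBoost concentrates on instances with negative margin.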
Naive Approach
● Directly translate AdaBoost to the regression setting.
● Use thresholding of squared error to reweight.
● Serves as a baseline for testing the other approaches.
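The thresholded reweighting could look roughly like the sketch below. It borrows the AdaBoost-style update, treating an instance as "wrong" when its squared error exceeds a threshold. The threshold `tau`, the factor `beta`, and the function name are assumptions of this sketch; the slides do not specify the update rule.

```python
def naive_reweight(weights, sq_errors, tau):
    """AdaBoost-style reweighting with a squared-error threshold (assumed rule).

    Instances with squared error <= tau count as correct and are
    down-weighted by beta = eps / (1 - eps); weights are then renormalized.
    """
    eps = sum(w for w, e in zip(weights, sq_errors) if e > tau)
    if eps <= 0.0 or eps >= 0.5:
        # No usable signal for this update; leave the distribution unchanged.
        return list(weights)
    beta = eps / (1.0 - eps)
    scaled = [w * beta if e <= tau else w for w, e in zip(weights, sq_errors)]
    total = sum(scaled)
    return [w / total for w in scaled]

new_weights = naive_reweight([0.25] * 4, [0.0, 0.1, 0.2, 5.0], tau=1.0)
```

In this toy call only the last instance exceeds the threshold, so its share of the distribution grows while the other three shrink equally.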
Dataset
● Data from Football Manager 2006
– Very popular game
– Statistically driven
● Features are player attributes.
● Labels are average performance ratings over a season.
● Predict performance levels and use the learned model to guide game strategy.
Work so far
● Conducted survey.
● Studied methods, formal guarantees, and bounds.
● Implementation still underway.
Conclusions
● Interesting approaches and analyses of boosting for regression are available.
● Insufficient real-world verification.
● Further work
– Regressing noisy data
– Formal results for more relaxed assumptions