# Regression Using Boosting

Vishakh (vv2131@columbia.edu), Advanced Machine Learning, Fall 2006

Post on 01-Jan-2016



• Introduction
  - Classification with boosting
    - Well-studied
    - Theoretical bounds and guarantees
    - Empirically tested
  - Regression with boosting
    - Rarely used
    - Some bounds and guarantees
    - Very little empirical testing

• Project Description
  - Study existing algorithms and formalisms
    - AdaBoost.R (Freund & Schapire, 1997)
    - SquareLev.R (Duffy & Helmbold, 2002)
    - SquareLev.C (Duffy & Helmbold, 2002)
    - ExpLev (Duffy & Helmbold, 2002)
  - Verify effectiveness by testing on an interesting dataset
    - Football Manager 2006

• A Few Notes
  - Want PAC-like guarantees.
  - Can't directly transfer processes from classification.
  - Simply re-weighting the distribution over iterations doesn't work.
  - Can modify samples and still remain consistent with the original function class.
  - Performing gradient descent on a potential function.
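The last note can be written out as a short sketch. The notation is an assumption made for illustration and does not come from the slides: $F_t$ is the combined regressor after $t$ rounds, $r_i = y_i - F_t(x_i)$ is the residual on example $i$, $P$ is a potential defined over the residuals, $f_t$ is the base hypothesis returned in round $t$, and $\alpha_t$ is a step size.

$$
\tilde{y}_i \;\propto\; -\frac{\partial P}{\partial F_t(x_i)} \;=\; \frac{\partial P}{\partial r_i},
\qquad
F_{t+1} = F_t + \alpha_t f_t
$$

Under this reading, each round keeps the inputs $x_i$ and changes only the labels $\tilde{y}_i$ handed to the base learner, so the base learner is still asked to fit a sample of the same form as the original one.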

• SquareLev.R
  - Squared-error regression.
  - Uses a regression algorithm as the base learner.
  - Modifies labels, not the distribution.
  - Potential function uses the variance of the residuals.
  - New labels are proportional to the negative gradient of the potential function.
  - At each iteration, the mean squared error decreases by a multiplicative factor.
  - Can reach arbitrarily small squared error as long as the correlation between residuals and predictions exceeds a threshold.
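The points above can be illustrated with a minimal leveraging loop in the spirit of SquareLev.R. It is a sketch under stated assumptions, not the algorithm as published by Duffy & Helmbold: inputs are one-dimensional, residuals are taken as `y - F(x)`, the base regressor `fit_stump` is a hypothetical single-threshold learner, and the step size is chosen by an exact line search on the variance potential.

```python
from statistics import mean


def fit_stump(xs, labels):
    """Hypothetical base regressor: least-squares fit with one threshold."""
    best = None
    for thr in sorted(set(xs)):
        left = [l for x, l in zip(xs, labels) if x <= thr]
        right = [l for x, l in zip(xs, labels) if x > thr]
        lo = mean(left)
        hi = mean(right) if right else lo
        err = sum((l - (lo if x <= thr else hi)) ** 2
                  for x, l in zip(xs, labels))
        if best is None or err < best[0]:
            best = (err, thr, lo, hi)
    _, thr, lo, hi = best
    return lambda x: lo if x <= thr else hi


def variance_potential(residuals):
    """Sum of squared deviations of the residuals from their mean."""
    r_bar = mean(residuals)
    return sum((r - r_bar) ** 2 for r in residuals)


def leverage_squared_error(xs, ys, rounds=20):
    """Relabel with centered residuals each round; the distribution is untouched."""
    terms = []                      # (step size, base hypothesis) pairs
    current = [0.0] * len(xs)       # predictions of the combined regressor
    potentials = []                 # potential at the start of each round
    for _ in range(rounds):
        residuals = [y - f for y, f in zip(ys, current)]
        potentials.append(variance_potential(residuals))
        r_bar = mean(residuals)
        # New labels: proportional to the negative gradient of the potential.
        new_labels = [r - r_bar for r in residuals]
        base = fit_stump(xs, new_labels)
        preds = [base(x) for x in xs]
        p_bar = mean(preds)
        centered = [p - p_bar for p in preds]
        denom = sum(c * c for c in centered)
        if denom < 1e-12:           # base hypothesis is constant: no progress
            break
        # Exact line search: the step that minimizes the variance potential.
        alpha = sum(l * c for l, c in zip(new_labels, centered)) / denom
        terms.append((alpha, base))
        current = [f + alpha * p for f, p in zip(current, preds)]
    # A constant offset removes the remaining mean residual.
    offset = mean([y - f for y, f in zip(ys, current)])

    def predict(x):
        return offset + sum(a * h(x) for a, h in terms)

    return predict, potentials


xs = [i / 10 for i in range(20)]
ys = [x * x for x in xs]
predict, potentials = leverage_squared_error(xs, ys, rounds=10)
final_mse = mean([(y - predict(x)) ** 2 for x, y in zip(xs, ys)])
```

With this particular step size, one round multiplies the potential by `1 - rho**2`, where `rho` is the correlation between the centered residuals and the centered base predictions. That is one way to see why progress depends on the correlation staying away from zero.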

• SquareLev.C
  - Squared-error regression.
  - Uses a base classifier.
  - Modifies both labels and distribution.
  - Potential function uses the residuals.
  - New label is the sign of the instance's residual.
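The relabeling step described above might look like the following sketch. The slide only states that the new label is the sign of the residual and that the distribution changes; weighting each instance in proportion to the magnitude of its residual is an assumption made here for illustration, and both function names are hypothetical.

```python
def relabel_for_classifier(residuals):
    """Turn real-valued residuals into +/-1 labels plus a distribution.

    Assumption: each instance is weighted by |residual|, so instances
    that are badly fit dominate what the base classifier sees.
    """
    total = sum(abs(r) for r in residuals)
    if total == 0:
        raise ValueError("all residuals are zero; nothing left to fit")
    labels = [1 if r >= 0 else -1 for r in residuals]
    weights = [abs(r) / total for r in residuals]
    return labels, weights


def weighted_edge(labels, weights, votes):
    """Weighted agreement between the new labels and +/-1 classifier votes."""
    return sum(w * l * v for w, l, v in zip(weights, labels, votes))


labels, weights = relabel_for_classifier([0.5, -1.5, 2.0])
perfect = weighted_edge(labels, weights, [1, -1, 1])
mixed = weighted_edge(labels, weights, [1, -1, -1])
```

Under this weighting, the edge equals the inner product of residuals and votes divided by the sum of absolute residuals, so a classifier with a large edge is one whose votes point in the direction of the residuals.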

• ExpLev
  - Attempts to get small residuals at each point.
  - Uses an exponential potential.
    - AdaBoost pushes all instances to a positive margin.
    - ExpLev pushes all instances to have small residuals.
  - Uses a base regressor (outputs in [-1,+1]) or a base classifier (outputs in {-1,+1}).
  - Two-sided potential uses exponents of residuals.
  - Base learner must perform well with relabeled instances.
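A two-sided exponential potential of the kind described above can be sketched as follows. The exact form, the scale parameter `s`, the residual convention `y - F(x)`, and the gradient-based relabeling are all assumptions made for illustration; the slides do not spell them out.

```python
import math


def two_sided_potential(residuals, s=1.0):
    """Penalizes residuals in both directions; zero only when all are zero."""
    return sum(math.exp(s * r) + math.exp(-s * r) - 2.0 for r in residuals)


def potential_gradient(residuals, s=1.0):
    """Derivative of the potential with respect to each residual."""
    return [s * (math.exp(s * r) - math.exp(-s * r)) for r in residuals]


def relabel(residuals, s=1.0):
    """Assumed relabeling: sign of the gradient as label, magnitude as weight."""
    grad = potential_gradient(residuals, s)
    total = sum(abs(g) for g in grad)
    if total == 0:
        raise ValueError("all residuals are zero; nothing left to fit")
    labels = [1 if g >= 0 else -1 for g in grad]
    weights = [abs(g) / total for g in grad]
    return labels, weights


labels, weights = relabel([0.1, -2.0, 0.5], s=1.0)
```

Because the gradient grows exponentially with the size of a residual, the instance with the largest residual receives most of the weight, which matches the stated goal of making the residual small at every point rather than only on average.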

• Naive Approach
  - Directly translate AdaBoost to the regression setting.
  - Use thresholding of the squared error to reweight.
  - Use as a baseline to test the veracity of the other approaches.
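One possible reading of the thresholding idea is sketched below: an instance counts as a "miss" when its squared error exceeds a threshold, and the weights are then updated as in classification AdaBoost. The function name, the threshold parameter, and the choice of the standard `beta = eps / (1 - eps)` update are assumptions; the slides do not give the exact rule used in the project.

```python
def reweight_by_threshold(weights, squared_errors, threshold):
    """One AdaBoost-style reweighting round driven by thresholded errors.

    Assumes `weights` is a distribution (non-negative, sums to 1).
    """
    misses = [e > threshold for e in squared_errors]
    eps = sum(w for w, m in zip(weights, misses) if m)   # weighted miss rate
    if not 0.0 < eps < 0.5:
        raise ValueError("weighted miss rate must lie strictly between 0 and 0.5")
    beta = eps / (1.0 - eps)
    # Shrink the weight of instances that were fit well enough.
    scaled = [w if m else w * beta for w, m in zip(weights, misses)]
    total = sum(scaled)
    return [w / total for w in scaled], beta


new_weights, beta = reweight_by_threshold(
    [0.25, 0.25, 0.25, 0.25],      # uniform starting distribution
    [0.01, 0.04, 1.0, 0.09],       # squared errors of the current hypothesis
    threshold=0.25,
)
```

After this update the missed instances carry exactly half of the total weight, as in classification AdaBoost, so the next base learner cannot succeed by repeating the previous hypothesis.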

• Dataset
  - Data from Football Manager 2006
    - Very popular game
    - Statistically driven
  - Features are player attributes.
  - Labels are average performance ratings over a season.
  - Predict performance levels and use the learned model to guide game strategy.

• Work so far
  - Conducted survey.
  - Studied methods and formal guarantees and bounds.
  - Implementation still underway.

• Conclusions
  - Interesting approaches and analyses of boosting regression are available.
  - Insufficient real-world verification.
  - Further work
    - Regressing noisy data
    - Formal results for more relaxed assumptions