Regression Using Boosting. Vishakh (vv2131@columbia.edu). Advanced Machine Learning, Fall 2006.


  • Regression Using Boosting
    Vishakh (vv2131@columbia.edu)
    Advanced Machine Learning, Fall 2006

  • Introduction
    Classification with boosting: well studied, with theoretical bounds and guarantees, and tested empirically.
    Regression with boosting: rarely used, with some bounds and guarantees but very little empirical testing.

  • Project Description
    Study existing algorithms and formalisms:
      AdaBoost.R (Freund & Schapire, 1997)
      SquareLev.R (Duffy & Helmbold, 2002)
      SquareLev.C (Duffy & Helmbold, 2002)
      ExpLev (Duffy & Helmbold, 2002)
    Verify their effectiveness by testing on an interesting dataset: Football Manager 2006.

  • A Few Notes
    We want PAC-like guarantees, but the procedures from classification cannot be transferred directly: simply re-weighting the distribution over iterations does not work.
    Instead, the samples themselves can be modified (relabeled) while remaining consistent with the original function class.
    Each round then amounts to gradient descent on a potential function, as sketched below.
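
    In symbols (the notation F, r_i, P, and the relabeled target are introduced here for illustration and are not from the slides): with master regressor F and residuals r_i = y_i - F(x_i), each round relabels the examples along the negative gradient of a chosen potential P and fits the base learner to the new labels. A minimal LaTeX rendering of that step:

      \tilde{y}_i \;\propto\; -\frac{\partial P}{\partial F(x_i)} \;=\; \frac{\partial P}{\partial r_i},
      \qquad \text{e.g. } P = \tfrac{1}{2}\sum_j r_j^2 \;\Rightarrow\; \tilde{y}_i \propto r_i .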

  • SquareLev.R
    Squared-error regression using a regression algorithm as the base learner.
    Modifies the labels, not the distribution: the potential function is the variance of the residuals, and each new label is proportional to the negative gradient of that potential.
    Each iteration the mean squared error decreases by a multiplicative factor, so the squared error can be made arbitrarily small as long as the correlation between the residuals and the base learner's predictions stays above a threshold.
    A code sketch follows below.
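
    A minimal Python sketch of the SquareLev.R idea above, assuming X and y are NumPy arrays. The base learner (scikit-learn's DecisionTreeRegressor), the least-squares step size, the round count, and the stopping rule are illustrative choices, not necessarily those of Duffy & Helmbold (2002).

      # SquareLev.R-style leveraging: relabel with centered residuals each round.
      import numpy as np
      from sklearn.tree import DecisionTreeRegressor

      def squarelev_r(X, y, n_rounds=50):
          F = np.zeros(len(y))            # current master regressor's predictions
          ensemble = []                   # list of (step_size, base_hypothesis)
          for _ in range(n_rounds):
              r = y - F                   # residuals of the master hypothesis
              r_tilde = r - r.mean()      # new labels: centered residuals
                                          # (negative gradient of the variance potential)
              h = DecisionTreeRegressor(max_depth=3).fit(X, r_tilde)
              pred = h.predict(X)
              denom = np.dot(pred, pred)
              if denom == 0.0:            # base learner returned a constant; stop
                  break
              alpha = np.dot(r_tilde, pred) / denom   # least-squares step size
              F += alpha * pred
              ensemble.append((alpha, h))
          return ensemble

      def squarelev_r_predict(ensemble, X):
          return sum(a * h.predict(X) for a, h in ensemble)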

  • SquareLev.C
    Squared-error regression using a base classifier.
    Modifies both the labels and the distribution: the potential function is built from the residuals, and each instance's new label is the sign of its residual.
    A code sketch follows below.
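
    A minimal sketch of the SquareLev.C idea above: a base classifier is trained on the signs of the residuals under a reweighted distribution. The residual-magnitude weighting, the choice of base classifier, and the least-squares step size are illustrative assumptions rather than the exact scheme from the paper.

      # SquareLev.C-style leveraging: relabel with the SIGN of each residual and
      # reweight the distribution (here by residual magnitude, an assumed choice).
      import numpy as np
      from sklearn.tree import DecisionTreeClassifier

      def squarelev_c(X, y, n_rounds=50):
          F = np.zeros(len(y))                      # master predictions
          ensemble = []
          for _ in range(n_rounds):
              r = y - F                             # residuals
              if np.allclose(r, 0.0):
                  break
              labels = np.where(r >= 0, 1, -1)      # new label: sign of the residual
              weights = np.abs(r) / np.abs(r).sum() # distribution over examples
              h = DecisionTreeClassifier(max_depth=3).fit(X, labels, sample_weight=weights)
              pred = h.predict(X).astype(float)     # values in {-1, +1}
              alpha = np.dot(r, pred) / np.dot(pred, pred)  # least-squares step size
              F += alpha * pred
              ensemble.append((alpha, h))
          return ensemble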

  • ExpLev
    Attempts to make the residual at every point small, using an exponential potential: where AdaBoost pushes all instances to a positive margin, ExpLev pushes all instances toward small residuals.
    Uses a base regressor (range [-1, +1]) or base classifier ({-1, +1}); the two-sided potential takes exponentials of the residuals, and the base learner must perform well on the relabeled instances.
    A code sketch follows below.
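
    A minimal sketch of the ExpLev idea above: a two-sided exponential potential, sum_i exp(s*r_i) + exp(-s*r_i), grows quickly for residuals of either sign, so reducing it drives every residual toward zero. The scale s, the sign/weight relabeling, and the grid-search step size are illustrative assumptions; the exact ExpLev procedure differs in its details.

      # ExpLev-style leveraging with a two-sided exponential potential.
      # s should be set relative to the scale of the residuals to avoid overflow.
      import numpy as np
      from sklearn.tree import DecisionTreeRegressor

      def explev(X, y, n_rounds=50, s=1.0):
          F = np.zeros(len(y))
          ensemble = []
          potential = lambda res: np.sum(np.exp(s * res) + np.exp(-s * res) - 2.0)
          for _ in range(n_rounds):
              r = y - F
              if np.allclose(r, 0.0):
                  break
              g = np.exp(s * r) - np.exp(-s * r)       # gradient term per example
              weights = np.abs(g) / np.abs(g).sum()    # distribution: large residuals dominate
              labels = np.sign(g)                      # relabel with the residual's sign
              h = DecisionTreeRegressor(max_depth=3).fit(X, labels, sample_weight=weights)
              pred = np.clip(h.predict(X), -1.0, 1.0)  # base hypotheses live in [-1, +1]
              # Crude grid search for a step size that lowers the potential.
              alpha = min(np.linspace(0.0, 1.0, 21),
                          key=lambda a: potential(y - (F + a * pred)))
              if alpha == 0.0:
                  break
              F += alpha * pred
              ensemble.append((alpha, h))
          return ensemble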

  • Naive Approach
    Directly translate AdaBoost to the regression setting, thresholding the squared error to decide which examples get reweighted.
    Serves as a baseline against which to test the other approaches; a code sketch follows below.
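
    A minimal sketch of this naive baseline: AdaBoost's reweighting is reused verbatim, treating an example as a "mistake" whenever its squared error exceeds a threshold tau. The threshold, the base regressor, and the final weighted-mean combination are illustrative assumptions.

      # Naive AdaBoost-for-regression baseline: threshold the squared error,
      # then update the distribution exactly as AdaBoost would.
      import numpy as np
      from sklearn.tree import DecisionTreeRegressor

      def naive_adaboost_regression(X, y, n_rounds=50, tau=1.0):
          m = len(y)
          D = np.full(m, 1.0 / m)                       # distribution over examples
          ensemble = []
          for _ in range(n_rounds):
              h = DecisionTreeRegressor(max_depth=3).fit(X, y, sample_weight=D)
              mistakes = (y - h.predict(X)) ** 2 > tau  # "error" = squared error above tau
              eps = D[mistakes].sum()                   # weighted error rate
              if eps >= 0.5:                            # base learner too weak; stop
                  break
              if eps == 0.0:                            # perfect fit under the threshold
                  ensemble.append((1.0, h))
                  break
              beta = eps / (1.0 - eps)
              D = np.where(mistakes, D, D * beta)       # down-weight the "correct" examples
              D /= D.sum()
              ensemble.append((np.log(1.0 / beta), h))
          return ensemble

      def naive_predict(ensemble, X):
          # Weighted mean of the base predictions (a weighted median also works).
          total = sum(w for w, _ in ensemble)
          return sum(w * h.predict(X) for w, h in ensemble) / total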

  • Dataset
    Data from Football Manager 2006, a very popular, statistically driven game.
    Features are player attributes; labels are average performance ratings over a season.
    The goal is to predict performance levels and use the learned model to guide game strategy.

  • Work so far
    Conducted a survey: studied the methods and their formal guarantees and bounds.
    Implementation is still underway.

  • Conclusions
    Interesting approaches to, and analyses of, boosting for regression are available, but real-world verification is insufficient.
    Further work: regression on noisy data, and formal results under more relaxed assumptions.
