Predicting an MVP - SAMSI · 2016-06-21



Predicting an MVP

Brian King, Derek Zhang, Juleen Graham, Erin Henning, Ryan Haney


How is an MVP selected?

◼ From 1979-1995, NBA players voted for the MVP

◼ 1995-2010: votes came strictly from a panel of sportswriters and broadcasters in the US and Canada, each of whom cast a vote for 1st through 5th place selections

◼ 2010-: one ballot is cast by fan votes online


Trends?

◼ What caused the change in trend from Centers/Forwards to Guards/Forwards?


Questions

◼ What are the most important statistical criteria for choosing an MVP?

◼ Can we create a model to predict the probability of an individual winning the MVP award?


Procedures

◼ Data from the 1991-1992 season through the 2015-2016 season

- The 150 players with the most playing time in each season

◼ Logistic regression model

◼ Fit the model on data from the 1991-1992 through 2012-2013 seasons

◼ Predicted on the 2013-2014 through 2015-2016 seasons

◼ Compared the “order” of the predictions to the true voting order
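The season-based split and the per-season ordering described above can be sketched roughly as follows. This is only an illustration under stated assumptions: seasons are labeled by their starting year (so 2012 stands for 2012-2013), the row layout and the helper names `split_by_season` and `rank_within_season` are hypothetical, and the scores are made-up toy values rather than output of the fitted model.

```python
from collections import defaultdict


def split_by_season(rows, last_train_season):
    """Hold out every season after last_train_season (hypothetical helper)."""
    train = [r for r in rows if r["season"] <= last_train_season]
    test = [r for r in rows if r["season"] > last_train_season]
    return train, test


def rank_within_season(rows, score_key):
    """Order players from highest to lowest score, separately per season."""
    by_season = defaultdict(list)
    for r in rows:
        by_season[r["season"]].append(r)
    return {
        season: [r["player"]
                 for r in sorted(group, key=lambda r: r[score_key], reverse=True)]
        for season, group in by_season.items()
    }


# Toy rows with made-up scores; "A", "B", "C" are placeholder players.
rows = [
    {"season": 2012, "player": "A", "score": 0.30},
    {"season": 2012, "player": "B", "score": 0.10},
    {"season": 2013, "player": "A", "score": 0.05},
    {"season": 2013, "player": "B", "score": 0.40},
    {"season": 2013, "player": "C", "score": 0.20},
]

train, test = split_by_season(rows, last_train_season=2012)
predicted_order = rank_within_season(test, "score")
print(predicted_order)
```

Splitting by season rather than at random keeps every held-out player-season strictly later than the data used for fitting, which matches the way the predictions would be used in practice.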


The Logistic Regression Model

Where Xi = Predictor variable

Assumptions

● Binary response variable (MVP or not)

● Continuous, independent explanatory variables
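For reference, the standard form of a logistic regression with a binary response can be written as below. Only the predictor symbol Xi comes from the slide; p (the probability that a player is the MVP), the coefficients β, and the number of predictors k are assumed notation.

```latex
\[
\log\frac{p}{1-p} = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k
\quad\Longleftrightarrow\quad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k)}}
\]
```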


The Variables

◼ Points Per Game

◼ Blocks, Steals, Assists, Rebounds

◼ Effective Field Goal Percentage

◼ Position

◼ Personal Fouls, Age, Minutes Played, Turnovers, ...


2013-14 Season: All Stats

Actual

MVP: Kevin Durant
2nd: LeBron James
3rd: Blake Griffin
4th: Joakim Noah
5th: James Harden

Kevin Love: 11th
Stephen Curry: 6th
LaMarcus Aldridge: 10th

Prediction

MVP: Kevin Love
2nd: LeBron James
3rd: Kevin Durant
4th: Stephen Curry
5th: LaMarcus Aldridge

Blake Griffin: 12th
Joakim Noah: 31st
James Harden: 8th


2013-14 Season: MVStats

Prediction

MVP: Kevin Durant
2nd: LeBron James
3rd: Kevin Love
4th: Stephen Curry
5th: Chris Paul

Blake Griffin: 7th
Joakim Noah: 23rd
James Harden: 9th

Actual

MVP: Kevin Durant
2nd: LeBron James
3rd: Blake Griffin
4th: Joakim Noah
5th: James Harden

Kevin Love: 11th
Stephen Curry: 6th
Chris Paul: 7th


2014-15 Season

Prediction

MVP: Russell Westbrook
2nd: LeBron James
3rd: Chris Paul
4th: James Harden
5th: Stephen Curry

Anthony Davis: 10th

Actual

MVP: Stephen Curry
2nd: James Harden
3rd: LeBron James
4th: Russell Westbrook
5th: Anthony Davis

Chris Paul: 6th


2015-16 Season

Prediction

MVP: Stephen Curry
2nd: Russell Westbrook
3rd: LeBron James
4th: Kevin Durant
5th: James Harden

Kawhi Leonard: 26th

Actual

MVP: Stephen Curry
2nd: Kawhi Leonard
3rd: LeBron James
4th: Russell Westbrook
5th: Kevin Durant

James Harden: 9th
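One way to put a number on how close a predicted order is to the actual one is sketched below, using the 2015-16 top fives listed above. The two measures (`top_k_overlap` and `exact_position_matches`) are illustrative assumptions, not necessarily the comparison used in the talk.

```python
# 2015-16 top fives as listed on the slide, in rank order.
predicted = ["Stephen Curry", "Russell Westbrook", "LeBron James",
             "Kevin Durant", "James Harden"]
actual = ["Stephen Curry", "Kawhi Leonard", "LeBron James",
          "Russell Westbrook", "Kevin Durant"]


def top_k_overlap(predicted, actual, k=5):
    """Count players that appear in both top-k lists, ignoring order."""
    return len(set(predicted[:k]) & set(actual[:k]))


def exact_position_matches(predicted, actual):
    """Count ranks at which the predicted and actual player coincide."""
    return sum(p == a for p, a in zip(predicted, actual))


overlap = top_k_overlap(predicted, actual)        # 4 shared names
exact = exact_position_matches(predicted, actual)  # 2 (Curry 1st, James 3rd)
print(overlap, exact)
```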


Random Forests

◼ Decision Tree Learning

◼ Bootstrap Aggregating

◼ Random Subspace Method


Decision Tree Learning

[Figure: example decision tree. Nodes: Points Per Game (Pts<x / Pts>x), Assists Per Game (Assists<x / Assists>x), Rebounds Per Game; the leaves list the 0/1 outcomes of the players that reach them.]

The algorithm chooses, at each step, the variable that best splits the data into successes and failures
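A minimal sketch of how a single split might be chosen is given below. The slide does not name the splitting criterion, so Gini impurity is used here as an assumption; the function names, feature keys, and toy rows are all hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means a pure node)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, labels, features):
    """Return (feature, threshold, weighted impurity) of the best split.

    A row goes left when row[feature] < threshold, right otherwise.
    """
    best = None
    n = len(rows)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] < t]
            right = [y for r, y in zip(rows, labels) if r[f] >= t]
            if not left or not right:
                continue  # a split must send rows to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return best


# Made-up toy data: 1 = success (MVP), 0 = failure.
rows = [
    {"ppg": 10, "apg": 7},
    {"ppg": 12, "apg": 3},
    {"ppg": 28, "apg": 4},
    {"ppg": 30, "apg": 8},
]
labels = [0, 0, 1, 1]

split = best_split(rows, labels, ["ppg", "apg"])
print(split)
```

In this toy data, points per game separates the two classes perfectly while assists do not, so the search settles on a points-per-game threshold. A full tree repeats the same search inside each resulting subset.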


Bootstrap Aggregating

◼ A random forest consists of B randomized tree models, indexed b = 1, …, B

◼ Each model (tree) is built on a bootstrap sample of the original data (a sample of the same size as the original data, drawn with replacement)

◼ Training many trees on the same data set leads to problems (possibly recreating the same tree)

◼ Averaging the predictions from all the individual regression trees reduces variance and leads to better performance
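The two ingredients of bootstrap aggregating, resampling with replacement and averaging, can be sketched as follows. The "trees" here are deliberately trivial stand-ins (each predicts the mean of its own bootstrap sample), and the helper names and toy data are assumptions for illustration only.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]


def bagged_prediction(models, x):
    """Average the predictions of all models for input x."""
    return sum(m(x) for m in models) / len(models)


rng = random.Random(0)      # fixed seed so the sketch is repeatable
data = list(range(10))      # made-up toy responses
B = 5                       # number of bootstrap models

samples = [bootstrap_sample(data, rng) for _ in range(B)]

# Stand-in "trees": each ignores x and predicts its sample mean.
models = [(lambda x, s=s: sum(s) / len(s)) for s in samples]

individual = [m(None) for m in models]
average = bagged_prediction(models, None)
print(individual, average)
```

Because each bootstrap sample differs, the individual models differ too, and their average is less sensitive to the quirks of any one sample.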


Random Forest Interpretation

◼ Samples not included in a given bootstrap sample are called “out-of-bag” samples

◼ %IncMSE “=” how much worse the predictions are when a permuted version of the variable is used instead of the true values

- Build the tree, make predictions using the “real” data values, and record the error (MSE)

- Permute the values of the variable in the out-of-bag sample, redo the predictions, and recompute the MSE

- %IncMSE is how much the error increases for the permuted samples vs. the true samples
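The permutation steps above can be sketched for a single model as follows. This computes only the raw increase in MSE on one held-out set; it is not claimed to reproduce the exact %IncMSE figure reported by any particular random forest package, which typically averages over trees and may rescale. The model, feature names, and data are made up.

```python
import random


def mse(model, X, y):
    """Mean squared error of model over rows X with targets y."""
    return sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(y)


def permutation_increase(model, X, y, feature, rng):
    """Increase in MSE after shuffling one feature's values across rows."""
    baseline = mse(model, X, y)
    shuffled = [x[feature] for x in X]
    rng.shuffle(shuffled)
    # Copy each row, replacing only the permuted feature.
    X_perm = [dict(x, **{feature: v}) for x, v in zip(X, shuffled)]
    return mse(model, X_perm, y) - baseline


def model(x):
    """Stand-in model that uses points per game and ignores assists."""
    return x["ppg"] / 30


# Made-up "out-of-bag" rows; targets equal the model output, so baseline MSE is 0.
X = [{"ppg": 10, "apg": 7}, {"ppg": 18, "apg": 3},
     {"ppg": 25, "apg": 4}, {"ppg": 30, "apg": 8}]
y = [model(x) for x in X]

rng = random.Random(1)
inc_apg = permutation_increase(model, X, y, "apg", rng)
inc_ppg = permutation_increase(model, X, y, "ppg", rng)
print(inc_apg, inc_ppg)
```

Shuffling a variable the model never uses leaves the error unchanged, while shuffling one it relies on can only keep the error the same or raise it in this toy setup, which is the intuition behind ranking variables by this increase.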


Most Important MVP Variables

According to the Random Forest method:

Variable   % Increase MSE
PPG        0.0020350589
APG        0.0010963324
MPG        0.0010791207
SPG        0.0010197867
PFPG       0.0007518895
TPG        0.0007515411
eFG.       0.0007482838
BPG        0.0006166867
Age        0.0002998612
RPG        0.0001863932
POS        0.0001672975

According to Logistic Regression:

Variable   Z-score (absolute value)
PPG        5.167
RPG        3.395
PFPG       3.08
APG        2.58
Age        2.291
eFG.       2.104
BPG        1.566
POS        1.459
MPG        0.62
TPG        0.314
SPG        0.128


Drawbacks to our models

◼ Only one MVP can be crowned every year

◼ Predictions using our models assume that the response variable (MVP or not) is independent between players

◼ As a result, the predicted probabilities within a season do not sum to 1

◼ Our models can rank players by likelihood of winning MVP, but cannot give explicit probabilities
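The point about probabilities not summing to 1 can be illustrated with a few made-up scores. The player labels and values below are hypothetical, and the rescaled "shares" are shown only to make clear that rescaling leaves the ranking unchanged; they are not presented as calibrated probabilities.

```python
# Made-up per-player scores for one season, each predicted independently.
scores = {"A": 0.60, "B": 0.45, "C": 0.05}

total = sum(scores.values())  # 1.10 here, so these are not a distribution

# Ranking only needs the order of the scores.
ranking = sorted(scores, key=scores.get, reverse=True)

# Dividing by the total forces the values to sum to 1 but keeps the order.
shares = {player: s / total for player, s in scores.items()}
share_ranking = sorted(shares, key=shares.get, reverse=True)

print(total, ranking, shares)
```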


Conclusions

◼ The most important variables are:

- Points Per Game

- Assists Per Game

- Rebounds Per Game

◼ The least important variables include:

- Blocks Per Game

- Steals Per Game

◼ The problem with defensive production

◼ MVP voting: stat-driven, but not completely

- Steve Nash, 2005


Future Work

◼ Further research into possible interactions between variables

◼ Better interpretability of logistic regression predictions

◼ Impact of team on MVP prospects

◼ Change in MVP selection criteria over the years

◼ Changes in rules over the years

◼ Growing data set and possible outcomes


Thanks!


Questions?