Simulating a basketball match with a homogeneous Markov model and forecasting the outcome
Abstract
We used a possession-based Markov model to model the progression of a basketball match. The model's transition matrix was estimated directly from NBA play-by-play data and indirectly from the teams' summary statistics. We evaluated both this approach and other commonly used forecasting approaches: logit regression of the outcome, a latent strength rating method, and bookmaker odds. We found that the Markov model approach is appropriate for modelling a basketball match and produces forecasts of a quality comparable to that of other statistical approaches, while giving more insight into basketball. Consistent with previous studies, bookmaker odds were the best probabilistic forecasts.
International Journal of Forecasting 28 (2012) 532-542
© 2011 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.
1. Introduction
As Stekler, Sendor, and Verlander (2010) recently pointed out in their survey of sports forecasting, a large amount of effort is spent in forecasting the outcomes of sporting events. While part of this effort focuses on the properties of different types of forecasts and forecasters, the majority of such research is driven by sports betting and revolves around betting markets. The evolution of betting markets brings new challenges for forecasters. In the past, only a few different betting markets were available for a sports event. Today, bets can be made on all kinds of outcomes, ranging from the number of rebounds in a basketball match to the number of corners in a football match. Furthermore, the increasing accessibility of in-play betting has brought a demand for real-time forecasts based on the progression of the sporting event. One way of dealing with these issues in a uniform way is to construct a model that can simulate the sporting event. In this paper we take a step in this direction by exploring the use of Markov models for basketball match simulation.
Corresponding author. E-mail addresses: [email protected] (E. Štrumbelj), [email protected] (P. Vračar).
1.1. Related work
Basketball has received far less attention in the scientific literature than horse racing or football. Zak, Huang, and Sigfried (1979) combined defensive and offensive elements to rank individual teams. This ranking produced the same order as the teams' actual efficiencies. However, no attempt was made to use the ranking to forecast the results of individual matches. Berri (1999) proposed a model for measuring how individual players contribute to a team's success. The sum of the contributions of all of its players was a good estimator of the actual number of wins achieved by a team. Kvam and Sokol (2006) used a Markov model to predict the winner of NCAA basketball matches. Each team was a state in the model, transition probabilities were fit using the teams' past performances, and the teams were ranked according to the model's stationary distribution. The model was better at forecasting the winner than several other statistical models and polls. Stern (1994) used the scores at the end of each quarter of 493 NBA matches to fit a Brownian motion model of the score progression. The model was good at estimating in-play win probabilities, given the current points differences. For a more comprehensive list of basketball-related forecasting references, especially those which focus on forecasting match outcomes from teams' ratings or rankings, see Stekler et al. (2010).
0169-2070/$ - see front matter. doi:10.1016/j.ijforecast.2011.01.004
Erik Štrumbelj, Petar Vračar
Faculty of Computer and Information Science, Tržaška 25, 1000 Ljubljana, Slovenia
Keywords: Sports forecasting; Probability forecasting; Monte Carlo; Simulation; National Basketball Association
The most relevant is the work by Shirley (2007). Shirley used a Markov model with states which corresponded to how a team obtained (retained) possession and how many points were scored during the previous transition. Shirley fit a model for each team using summary statistics and showed that the simulations based on such models were good at estimating a team's win probability against an average opponent. However, the results were in-sample and the quality of the model as a forecaster was not investigated. Shirley also discussed fitting the model using exclusively in-match data (as opposed to summary statistics) and taking into account the individual teams' strengths, but did not pursue this due to insufficient data.
1.2. Research objectives
Motivated by Shirley's work, we investigated the use of Markov models for basketball match simulation. While the long term goal is a detailed and accurate simulation of a basketball match, legitimate questions can be asked about whether a more complex model can still produce accurate forecasts for something as relevant as the match outcome. Therefore, our main objective was to determine whether such a model could be used to simulate a match between two specific teams, while still producing probabilistic forecasts of a quality comparable to that of the other statistical approaches that are most commonly used to forecast the outcome of a sporting event. We have revisited Shirley's average home vs. average away and team vs. average team models and fit them using play-by-play data from 2252 NBA basketball matches. We have also proposed an extension which takes into account the strengths of the two competing teams. This is achieved by inferring the transition probabilities from the teams' summary statistics.
We found that some related basketball forecasting studies do not include an ex-ante evaluation. Those that do, usually evaluate the forecasts using the percentage of correct predictions, which is more appropriate for evaluating binary forecasts than for evaluating probabilistic forecasts. Therefore, a secondary objective is an overview of the quality of different probabilistic forecasts for basketball. We evaluate our approach and three of the most commonly used approaches in forecasting: logit regression of the outcome, a latent strength ranking model, and bookmaker odds.
The remainder of the paper is organized as follows. In the following section we describe the Markov model and analyze its properties on NBA play-by-play data. Section 3 focuses on producing a match-specific Markov model from the teams' summary statistics. In Section 4 we describe three other well-known approaches to forecasting. The results of the empirical evaluation are given in Section 5. Finally, in Section 6 we conclude the paper and discuss some ideas for further work.
2. A possession-based Markov model
In basketball, two teams (A and B) alternate possession of the ball and try to score points. Most modern approaches to basketball analysis evaluate the match at the level of possessions (see Kubatko, Oliver, Pelton, & Rosenbaum, 2007), focusing on the number of possessions a team has and how effectively they convert them into points. Shirley (2007) proposed a Markov model with states that correspond to a team obtaining (or retaining) possession. He considered the following states: restarting the action with an inbound pass (i), steal or non-whistle turnover (s), offensive (o) or defensive rebound (d), and going to the free throw line after a shooting/bonus/technical foul (f).¹ These states cover all moments in a game during which the teams can obtain possession of the ball. However, such a state does not imply a change in possession. For example, a team can get two consecutive offensive rebounds, thus obtaining the ball while already having possession (that is, retaining possession). Therefore, it is possible for a team to keep possession for several consecutive states.
The model takes into account the obtainment (retainment) of possession, how it was obtained, and the number of points scored during the transition. A transition encompasses everything that happens between two states. During a transition, 0, 1, 2, or 3 points are scored. This leads us to the set of states: $\{A, B\} \times \{i, s, o, d, f\} \times \{0, 1, 2, 3\}$. For example, let team B miss a shot and team A rebound, then team A miss a 2pt shot, get an offensive rebound, and dunk for 2pt while being fouled for an extra free throw. The following states and transitions describe this sequence up to the free throw line: $\xrightarrow{0} Ad0 \xrightarrow{0} Ao0 \xrightarrow{2} Af2$. Only 30 of the 40 states are actually possible. If possession was obtained by a steal, no points could have been scored on the previous transition, and scoring 3 points in a transition can never lead to a rebound.² This eliminates 10 states. Table 1 shows the remaining 30 states and describes the transitions between them.
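The count of 30 feasible states can be checked with a short enumeration. The sketch below is only an illustration: the names `TEAMS`, `METHODS`, `is_possible` and `STATES` are ours, and the two exclusion rules are coded exactly as stated in the paragraph above.

```python
from itertools import product

TEAMS = "AB"
METHODS = "isodf"    # inbound pass, steal, offensive rebound, defensive rebound, free throws
POINTS = range(4)    # points scored on the previous transition


def is_possible(method: str, points: int) -> bool:
    """Apply the two exclusion rules stated in the text."""
    if method == "s" and points > 0:      # a steal cannot follow a score
        return False
    if method in "od" and points == 3:    # scoring 3 points never leads to a rebound
        return False
    return True


# State labels follow the paper's notation, e.g. "Ad0" or "Bf2".
STATES = [
    f"{team}{method}{points}"
    for team, method, points in product(TEAMS, METHODS, POINTS)
    if is_possible(method, points)
]

print(len(STATES))
print(STATES)
```

Per team this leaves four inbound states, one steal state, three states for each kind of rebound and four free-throw states, which matches the state labels listed in Table 1.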
First, we estimated the transition probabilities using the frequencies from the play-by-play data available for the 2007/08 and 2008/09 seasons, separately. The result was a model for a match between the average home team and the average away team for the corresponding season (the average home vs. average away model). For more details about the play-by-play data and how they were transformed into states, see Table 2. The NBA play-by-play data were obtained from www.basketballgeek.com.
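A minimal sketch of frequency-based estimation follows, assuming each match has already been parsed into a list of state labels. The function name `estimate_transitions` and the two toy sequences are hypothetical and are not taken from the play-by-play data.

```python
from collections import Counter, defaultdict


def estimate_transitions(sequences):
    """Return {from_state: {to_state: probability}} from observed state sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count every observed pair of consecutive states.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    # Normalize each row of counts into relative frequencies.
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }


# Toy sequences (hypothetical, for illustration only).
games = [
    ["Ai2", "Bd0", "As0", "Bi2", "Ad0", "Bi2"],
    ["Ai2", "Bd0", "Ai2", "Bd0"],
]
P = estimate_transitions(games)
print(P["Bd0"])
```

With real data, rows for rarely visited states rest on few observations, so some smoothing may be needed; the paper does not say whether any was applied.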
The transition matrix can be used to estimate the number of points scored and the win probabilities. We used a Markov chain Monte Carlo model to estimate the model's stationary distribution. From this distribution we get the number of points per transition, which, multiplied by the number of transitions, gives an estimate of the number of points scored. To estimate the win probabilities, 10 000 matches were simulated (draws were ignored). Note that some teams play fast-paced basketball, while others play at a slower pace. For example, the mean number of transitions per game for the 2007/08 (2008/09) season was 267 (264) with a standard deviation of 7.9 (8.0).
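The simulation step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the four-state chain `TOY_P` is invented, the match length is fixed at 267 transitions, and the points of a transition are credited to the team that held possession in the originating state.

```python
import random

# Hypothetical toy chain: A converts 50% of its possessions into 2 points, B 45%.
TOY_P = {
    "Ai2": {"Bi2": 0.50, "Bd0": 0.50},
    "Ad0": {"Bi2": 0.50, "Bd0": 0.50},
    "Bi2": {"Ai2": 0.45, "Ad0": 0.55},
    "Bd0": {"Ai2": 0.45, "Ad0": 0.55},
}


def simulate_match(P, start, n_transitions, rng):
    """Walk the chain and return the final score of both teams."""
    score = {"A": 0, "B": 0}
    state = start
    for _ in range(n_transitions):
        next_states = list(P[state])
        weights = [P[state][s] for s in next_states]
        nxt = rng.choices(next_states, weights=weights)[0]
        # The last character of a state is the number of points scored on the
        # transition into it; credit them to the team that had the ball.
        score[state[0]] += int(nxt[-1])
        state = nxt
    return score


def win_probability(P, start, n_transitions, n_matches=10_000, seed=0):
    """Share of simulated matches won by team A, ignoring draws."""
    rng = random.Random(seed)
    wins = decided = 0
    for _ in range(n_matches):
        s = simulate_match(P, start, n_transitions, rng)
        if s["A"] != s["B"]:
            decided += 1
            wins += s["A"] > s["B"]
    return wins / decided


p_a = win_probability(TOY_P, "Ai2", n_transitions=267, n_matches=2_000)
print(round(p_a, 3))
```

The example call uses fewer matches than the 10 000 reported in the text only to keep the run time short.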
1 Note that going to the free throw line is a state, while subsequent free throws are not considered as states. Instead, they are encompassed in the transition from the free-throw state to the next state.
2 Scoring 2 points can lead to a rebound, for example in the situation when a player has 3 free throws, and makes the first two but misses the last one.
Table 1
Outline of the transition matrix of Shirley's Markov model for basketball. The numbers/letters and the corresponding descriptions provide more details about how a transition can occur between the row and column states.

Row and column states, in order:
Team A has possession: Ai0, Ai1, Ai2, Ai3, As0, Ao0, Ao1, Ao2, Ad0, Ad1, Ad2, Af0, Af1, Af2, Af3
Team B has possession: Bi0, Bi1, Bi2, Bi3, Bs0, Bo0, Bo1, Bo2, Bd0, Bd1, Bd2, Bf0, Bf1, Bf2, Bf3

Codes in the non-empty cells of each row, from left to right (the column positions of the cells are not recoverable here):
Ai0-Ai3, As0, Ao0-Ao2, Ad0-Ad2: 0 2 4 6 7 8 A B C D F
Af0: 0 1 1 2 3 3 5 5 5 8 9 9 B D E E G G G
Af1, Af2: 0 1 2 3 5 5 8 9 9 D E G G
Af3: 0 2 5 8 9 D G
Bi0-Bi3, Bs0, Bo0-Bo2, Bd0-Bd2: 8 A B C D F 0 2 4 6 7
Bf0: 8 9 9 B D E E G G G 0 1 1 2 3 3 5 5 5
Bf1, Bf2: 8 9 9 D E G G 0 1 2 3 5 5
Bf3: 8 9 D G 0 2 5

Transition codes:
0  timeout; non-free-throw foul; defense knocks ball out of bounds
1  missed last free throw, defense tipped ball out of bounds
2  missed shot or all free throws, offensive rebound
3  missed last free throw, offensive rebound
4  fouled
5  missed last free throw, loose ball foul
6  made 2pt shot, shooting foul
7  made 3pt shot, shooting foul
8  missed shot, tipped ball out of bounds; offensive foul or violation
9  missed last FT, tipped ball out of bounds; made last free throw
A  made 2pt shot
B  made 3pt shot; made 3 free throws
C  steal; block; turnover without whistle
D  missed shot or all free throws, defensive rebound
E  missed last free throw, defensive rebound
F  technical foul; offensive foul when in bonus
G  missed last free throw, loose ball foul
For 2007/08 and 2008/09, the two teams with the quickest style of play were the Denver Nuggets (287) and the Golden State Warriors (285). The San Antonio Spurs had the slowest pace in both 2007/08 and 2008/09, with 255 and 249 transitions per game, respectively. Throughout the experiments, we use the teams' mean numbers of transitions as an estimate of the match length. When estimating for a particular team, only that team's games are used. When estimating for a match between two specific teams, the mean of the two teams' means is used.

Table 2
An excerpt from the first quarter of a match between Seattle (A) and Phoenix (B).

Play-by-play data                                  Parsed model states
#  Team  Player            Event         Result    Possession  Method        Pts. on previous  State
1  PHX   Amare Stoudemire  Shot          2pt       SEA         Inbound pass  2                 Ai2
2  SEA   Damien Wilkins    Shot          Missed    -
3  PHX   Grant Hill        Def. rebound  -         PHX         Def. rebound  0                 Bd0
4  PHX   Steve Nash        Bad pass      -         SEA         Steal         0                 As0
5  SEA   Kevin Durant      Shot          2pt       PHX         Inbound pass  2                 Bi2
6  PHX   Shawn Marion      Shot 3pt      Missed    -
7  SEA   Earl Watson       Def. rebound  -         SEA         Def. rebound  0                 Ad0
8  SEA   Damien Wilkins    Shot          2pt       PHX         Inbound pass

Table 3
Actual mean values and average home vs. average away models' estimates for the 2007/08 and 2008/09 seasons.

                  Actual                                 Estimated
Season   Matches  Home win %  Home points  Away points  Home win %  Home points  Away points
2007/08  1132     0.606       100.9        97.1         0.598       101.0        97.5
2008/09  1122     0.609       100.4        97.0         0.607       100.9        97.5

Fig. 1. Home team's win probability as a function of time and given leads of different sizes. Estimated using the 2007/08 season's home vs. away model.

Table 3 shows that both the 2007/08 and 2008/09 average home vs. average away models produce good estimates. Fig. 1 shows how the home team's win probability changes over time, given leads of different sizes. This was estimated using the 2007/08 model, and its shape is consistent with the results presented by Stern (1994). Therefore, the Markov model is capable of modelling the home team advantage and the progress of the score over time.
Shirley (2007) also fit a smaller 18-state model using the teams' summary statistics (3pt %, field goal %, rebounds, etc.) for the 2003/04 season, for each team separately (the team vs. average opponent model). Shirley's results suggest that the models provide a good estimate of the teams' actual win percentages ($R^2 = 0.935$). We repeated the procedure for the 2007/08 season, using the 30-state model and the play-by-play data. We estimated two models for each team, one from the team's home games and one from the team's away games. The two win probabilities estimated by these two models, weighted by the share of home (away) matches, give an estimate of the team's win percentage against an average opponent. Fig. 3(a) shows that this is a good estimate of the teams' actual win percentages. The results are similar if an out-of-sample ex-ante evaluation procedure is used (see Fig. 3(b)). We used a rolling procedure: only matches which preceded a given match were used to estimate the transition matrix for that match.

Fig. 3. Actual vs. estimated win percentages for the 2007/08 season. (a) In-sample results. (b) Out-of-sample results.

2.1. Limitations of the described Markov model approach
The teams' win percentages obtained using the Markov model and the Monte Carlo simulation are unbiased estimates of the home team win and the number of points. However, when looking at individual teams, the models overestimate the weaker teams (see Fig. 3). The same thing is observed in the results of Shirley (2007). The proposed Markov model is homogeneous: the transition probabilities do not change as the match progresses. This is a strong assumption, and those familiar with the sport will agree that it is not a valid assumption. We therefore treated each second as a separate bin and recorded the number of points scored in that particular second (see Fig. 2). Each quarter of an NBA match is 12 min or 720 s long. We pooled data from all four quarters of the 2008/09 season (the 2007/08 data give similar results). The number of points scored (and the method of scoring) remains on a similar level throughout a quarter, with the exception of approximately the first and last minutes. Those familiar with the game of basketball know how the dynamics of the match differ between time periods. In the final seconds, the teams hold up the ball and attempt to score just before time runs out to deny their opponents another possession, etc. Note also that more points are scored from free throws as a quarter progresses, because more bonus foul situations arise as the teams accumulate fouls.

Fig. 2. Actual total number of points scored for each second-long slice of a quarter. We pooled data from all four quarters of each available match from the 2008/09 season.
Differences also exist between quarters. Table 4 shows the mean total number of points scored and the mean difference between the numbers of points scored by the teams (net points), for each quarter. We tested the significance of the differences between quarters with a repeated measures analysis of variance, and obtained [...] a quarter does not have a strong impact on its practical use. Using all available matches, we also performed a two-way analysis of variance, using the quarter (4 categories) and minute (12 categories) as factors, and the number of points scored as the target variable. Both factors, quarter ($\Pr(>|F|) = 3.1 \times 10^{-5}$) and minute ($\Pr(>|F|) < 10^{-15}$), as well as the interaction quarter × minute ($\Pr(>|F|)$ [...]

hORBrh   5.687  3.936  0.149  0.278  0.223  0.359  0.025
DRBrh    5.444  4.212  0.196  0.743  0.680  0.792  0.021
FTFh     5.938  3.905  0.128  0.243  0.185  0.315  0.026
oEFG%h  25.490  4.961

The quadratic loss function (also known as the Brier score) is defined as $(p_i - o_i)^2$, where $p_i$ and $o_i$ are the predicted probability of a home win and the actual outcome for the home team (1 = won, 0 = lost), respectively.

4.1. Latent strength modelling
Team (player) ratings or rankings are often used for forecasting the outcome of a sports event. The rating is used to describe a team's latent strength, and the ratings of a given pair of teams are used to determine the outcome of a match between the two teams. Before any matches have been played, the ratings are set to a predetermined value, which is often the same for all teams. As matches are played, the ratings are updated, depending on the teams' performances. A well-known method for rating a player's skill level is the Elo rating system. Given the ratings $R_A$ and $R_B$ of teams A and B, the probability that team A wins is

$$\Pr(A \text{ wins}) = \frac{10^{R_A/400}}{10^{R_A/400} + 10^{R_B/400}}. \quad (1)$$

Given the outcome of the match, we adjust the ratings by adding $\Delta R = K(\text{actual outcome for } A - \Pr(A \text{ wins}))$ to $R_A$ and subtracting $\Delta R$ from $R_B$, where $K$ is a rate-of-change constant. Note that Eq. (1) does not take into account the home team advantage. For example, observe the match between home team A and away team B, where the two teams have similar ratings. The increase in rating for team A, if they win at home, will be the same as the increase in rating for team B, if they win an away match. This is unfair, because team A had the advantage of playing at home. We compensate for this by subtracting a constant from the away team's rating

$$\Pr(A \text{ wins}) = \frac{10^{R_A/400}}{10^{R_A/400} + 10^{(R_B - C)/400}}, \quad (2)$$

effectively handicapping the away team.
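The rating update with a home advantage can be sketched as below. The function names `win_prob` and `update` are ours, the defaults K = 24 and C = 64 are the calibrated values reported later in this section, and using the Eq. (2) probability inside the update is an assumption, since the text does not state it explicitly.

```python
def win_prob(r_home, r_away, c=0.0):
    """Home win probability as in Eq. (2); with c = 0 it reduces to Eq. (1)."""
    q_home = 10 ** (r_home / 400)
    q_away = 10 ** ((r_away - c) / 400)
    return q_home / (q_home + q_away)


def update(r_home, r_away, home_won, k=24.0, c=64.0):
    """Return the updated (home, away) ratings after one match."""
    outcome = 1.0 if home_won else 0.0
    delta = k * (outcome - win_prob(r_home, r_away, c))
    # Whatever the home team gains, the away team loses.
    return r_home + delta, r_away - delta


print(round(win_prob(1500, 1500), 3))        # equal ratings, no home advantage
print(round(win_prob(1500, 1500, c=64), 3))  # equal ratings, home advantage applied
print(update(1500, 1500, home_won=True))
```

For two equally rated teams the handicap of 64 points gives a home win probability of roughly 0.59, which is in line with the home win percentages of about 0.6 reported in Table 3.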
The Elo rating system was originally developed for rating chess players, where there is no need to take the winning margin into account. However, in basketball, it is reasonable to take the winning margin into account, rewarding bigger wins more.³ Hvattum and Arntzen (2010) used an extension of the Elo system, where the value of $K$ is set according to the winning margin, $K = K_0(1 + \delta)^{\lambda}$, where $\delta$ is the absolute winning margin and $K_0 > 0$ and $\lambda > 0$ are parameters. Hvattum and Arntzen (2010) used this approach for ranking football teams. However, it can easily be adapted for any sport. All that is required is a proper calibration of the parameters $K_0$ and $\lambda$. Note that sometimes parameters other than 10 and 400 are used in Eqs. (1) and (2). The purpose of these two parameters is to set the scale of the ratings. Parameters $K$ and $C$ require more attention. We used the following parameter values for the basic ELO model and its winning-margin extension: $K = 24.0$, $K_0 = 2$ and $\lambda = 1$. The home team advantage constant was set to $C = 64$. To facilitate the out-of-sample evaluation, we did not use data from 2007/08 or 2008/09 in the calibration. The chosen parameter values minimized the squared loss for the 2006/07 season. We rounded the values to simplify presentation, without significantly worsening the performance. The match scores which were used to perform the calibration were obtained from www.basketball-reference.com. References to several more complex latent-strength modelling approaches to sports forecasting, including that of Glickman and Stern (1998), are given by Stekler et al. (2010). Improvements might also be achieved by using the teams' ELO ratings (or some other measure of the team quality) and fitting a Bradley-Terry model (Bradley & Terry, 1952).
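The margin-dependent rate of change is a one-line function. The sketch below uses the calibrated values quoted above ($K_0 = 2$, $\lambda = 1$) as defaults; the function name `margin_k` is ours.

```python
def margin_k(margin, k0=2.0, lam=1.0):
    """Rate-of-change constant K = K0 * (1 + |margin|) ** lam."""
    return k0 * (1 + abs(margin)) ** lam


for margin in (1, 5, 11, 20):
    print(margin, margin_k(margin))
```

With these parameter values an 11-point win yields K = 24, the constant used by the basic model, so closer games move the ratings less than in the basic model and larger wins move them more.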
4.2. Bookmaker odds as forecasts
Bookmakers' odds are probabilistic forecasts. For example, let $d_A$ and $d_B$ be the quoted decimal odds for a match between teams A and B. Then $p_A = 1/d_A$ and $p_B = 1/d_B$ are win probability estimates. Due to bookmakers' profit margins, $1/d_A$ and $1/d_B$ can sum to more than 1, and thus it is common to normalize them, replacing $p_A$ with $p_A/(p_A + p_B)$ and $p_B$ with $p_B/(p_A + p_B)$.
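The conversion from decimal odds to normalized probabilities can be written in a few lines. The odds in the example are hypothetical, and `odds_to_probs` is our own name.

```python
def odds_to_probs(d_a, d_b):
    """Convert decimal odds for A and B into probabilities that sum to 1."""
    p_a, p_b = 1 / d_a, 1 / d_b
    total = p_a + p_b          # exceeds 1 by the bookmaker's margin
    return p_a / total, p_b / total


# Hypothetical quotes: 1.50 on team A, 2.50 on team B.
p_a, p_b = odds_to_probs(1.50, 2.50)
print(round(p_a, 3), round(p_b, 3))
```

Here the raw inverse odds sum to about 1.067, and the normalization spreads that margin proportionally over both outcomes.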
We used odds from the world's largest betting exchange: Betfair. Unlike regular bookmakers, betting exchanges do not offer odds directly, but simply facilitate peer-to-peer betting. Users can both back a selection (bet on an event occurring) or lay a bet (bet against an event occurring) at any odds they choose. Matching back and lay bets are matched by the system. The odds change over time, so we used the odds of the first matched bet (Betfair0) and the odds most frequently matched during the 5 min before the start of the match (Betfair1). These historical odds data were obtained from data.betfair.com.⁴
3 Some rating systems, for example Sagarin ratings, also adjust for blowouts, so that a very large margin in a single game does not have a proportionately large impact on the team's rating.
4 The data are free of charge, but some terms apply. One must be a registered Betfair user with a certain (symbolic) number of loyalty points.

Bookmaker odds are considered to be the overall best source for probabilistic forecasts of sporting events. However, recent results by Franck, Verbeek, and Nuesch (2010) and Smith, Patton, and Williams (2010) suggest that betting exchange odds are better forecasts than regular bookmaker odds. These results are limited to horse racing and football. To the best of our knowledge, no empirical evidence exists for basketball. To test this hypothesis, we compared forecasts from 11 bookmakers and Betfair across a set of 692 NBA matches from 2008/09, such that all bookmakers' forecasts were available for each match. The selected bookmakers were 5Dimes (0.1859, 6.24), Bet365 (0.1856, 6.68), BetCRIS (0.1857, 6.76), BetUS (0.1862, 7.05), Betway (0.1866, 7.20), Canbet (0.1858, 7.01), Expekt (0.1859, 6.70), Jetbull (0.1868, 6.64), Paddy Power (0.1864, 6.37), Sports Interaction (0.1858, 6.22), Unibet (0.1866, 5.60), and Betfair (0.1855, 5.55). The numbers in parentheses represent the mean quadratic loss and mean rank (with respect to the mean quadratic loss), both across all 692 selected matches. We found no significant difference between the bookmakers' prediction qualities (p-value ≈ 1). By applying the square root to the mean quadratic loss of the best model (Betfair, 0.4307) and the worst model (Betway, 0.4320), we express them in the same unit as the predictions, thus simplifying comparison and interpretation. The difference in quality between the best and worst bookmakers is approximately one tenth of a percent. However, Betfair has the lowest mean rank. That is, the Betfair odds are, on average, closest to the actual outcome. Friedman's non-parametric test revealed that the differences in rank are significant (p-value ≈ 0). We used Nemenyi's test for post-hoc comparison. The critical difference in rank at the 0.01 significance level was 0.74. Therefore, the Betfair loss ranks significantly lower (better) than those of eight other bookmakers.
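The arithmetic behind this comparison can be retraced from the reported figures alone. The sketch below uses only the mean losses, mean ranks and critical difference quoted in the paragraph above; it does not rerun the Friedman or Nemenyi tests, which would require the per-match losses. The names `BOOKMAKERS`, `rms_loss` and `significantly_worse_than` are ours.

```python
from math import sqrt

# (mean quadratic loss, mean rank) as reported in the text for 692 matches.
BOOKMAKERS = {
    "5Dimes": (0.1859, 6.24), "Bet365": (0.1856, 6.68), "BetCRIS": (0.1857, 6.76),
    "BetUS": (0.1862, 7.05), "Betway": (0.1866, 7.20), "Canbet": (0.1858, 7.01),
    "Expekt": (0.1859, 6.70), "Jetbull": (0.1868, 6.64), "Paddy Power": (0.1864, 6.37),
    "Sports Interaction": (0.1858, 6.22), "Unibet": (0.1866, 5.60),
    "Betfair": (0.1855, 5.55),
}
CRITICAL_DIFFERENCE = 0.74  # critical difference in mean rank at the 0.01 level


def rms_loss(name):
    """Square root of the mean quadratic loss, in the unit of the predictions."""
    return sqrt(BOOKMAKERS[name][0])


def significantly_worse_than(reference):
    """Bookmakers whose mean rank exceeds the reference by more than the critical difference."""
    ref_rank = BOOKMAKERS[reference][1]
    return sorted(
        name for name, (_, rank) in BOOKMAKERS.items()
        if rank - ref_rank > CRITICAL_DIFFERENCE
    )


worse = significantly_worse_than("Betfair")
print(round(rms_loss("Betfair"), 4), round(rms_loss("Betway"), 4))
print(len(worse), worse)
```

The mean ranks of 5Dimes, Sports Interaction and Unibet lie within the critical difference of Betfair's, which leaves eight bookmakers outside it.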
We also tested the distribution of the difference in loss between Betfair0 odds and Betfair1 odds. Using a pairwise t-test, we rejected the zero mean hypothesis and accepted the alternative hypothesis that the mean is greater than 0 (p-value ≈ 0). Therefore, the odds from just before the start of the match are significantly better probabilistic forecasts than the odds from the start of the betting period. This is consistent with the findings of Gandar et al., who showed that opening lines set by bookmakers (in our case, the market) are less accurate than the closing lines established by the market (see Gandar, Dare, Brown, & Zuber, 1998; Gandar, Zuber, & Dare, 2000).⁵
5. Empirical evaluation
We compared the following sources of forecasts: (1) the home team vs. average opponent model from Section 2 (MAvsAVG), (2) the Markov model approach (using multinomial logit row models) described in Section 3 (MML), (3) Betfair1 odds, (4) the Elo rating models (see Section 4.1), and (5) a logit regression model using the
5 Stekler et al. (2010) suggest that this indicates that the market (closing line) is more accurate than the bookmakers (opening line). We should interpret such indirect evidence with caution. Opening lines could also be worse because more information becomes available after opening lines have been set. Without all of the participants' forecasts at different points in time, we cannot distinguish between the two possibilities.
[...] the optimization of any such linear opinion pool which included Betfair odds always resulted in the Betfair odds being weighted approximately 1. The bookmaker odds forecasts could not be improved on using a linear opinion pool. We did not investigate other types of forecast combinations.
We selected the weights (0.54 for ELO, 0.46 for MML) that minimized the quadratic loss on the 2006/07 data, which were reserved for parameter selection and are not used in the evaluation. In the out-of-sample procedure, we started with a sample of the first five home and first five away matches for each team (we did not include these matches in the evaluation). A rolling procedure was then used to forecast the remaining matches in a season: after forecasting and evaluating the forecasts for the next day's matches, we added the matches to the sample and rebuilt all models using the current sample of matches. This procedure was repeated until the end of the season.
We evaluated each season separately. We evaluated the models' forecasts using the quadratic loss function. We also measured the percentage of correct predictions when the forecasts were treated as being binary. For each basketball game, we had all of the models' losses (that is, a complete block design). We viewed individual games as subjects and forecasting models as treatments. We were interested in seeing whether there exist differences in mean loss between forecasting models. The equivalence [...]
[...] MAvsAVG worse than all of the other forecasts. The Betfair1 odds are better than all of the other forecasts. The differences in quality between the remaining statistical models are not significant. The in-sample results for some of the models illustrate how misleading their quality can be and emphasize the importance of evaluating ex-ante forecasts.
All of the statistical approaches were unbiased estimators of the win percentage, but, when observing individual teams, they exhibited a bias towards weaker teams (see Table 12). This can be explained by the fact that the statistical models do not take into account the amount of effort that was invested in a match (see Section 2.1). In other words, the summary statistics (and/or transition frequencies) are not completely accurate representations of a team's actual strength. The three statistical approaches (logit regression, ELO rating, and the proposed model) exhibit different levels of bias, but produce forecasts of the same quality. Therefore, each model represents a different bias-variance trade-off.

6. Conclusion
Using summary statistics to estimate Shirley's Markov model for basketball produced a model for a match between two specific teams. The model was used to simulate the match and produce outcome forecasts of a quality comparable to that of other statistical approaches, [...]

Table 10
Empirical evaluation results. The accuracy is the proportion of correct predictions [...]

2007/08 (N = 957)
Forecast                   In-sample loss  Out-of-sample loss
Betfair1                                   0.1900
54/100 ELO + 46/100 MML                    0.1978
ELO                                        0.2009
Logit                      0.1800          0.2022
MML                        0.1890          0.2048
ELO                                        0.2065
MAvsAVG                    0.2003          0.2199
Pr(Home win) = 0.6                         0.2373

Table 11
Adjusted p-values for pairwise tests of the equality of means. The alternativ[...] column (model).
[Table body not recoverable; surviving fragments: column labels 8, 7, 6 and row label 1 Betfair1.]

Table 12
Linear models of the relationship between the actual win percentage and the [...] All of the forecasters exhibit a statistically significant bias towards the weak[...]

           MML                   54/100 ELO + 46/100 MML
           Intercept  Slope      Intercept  Slope
Estimate   0.170      0.657      0.131      0.73…
Std. Err.  0.020      0.039      0.015      0.02…
t-value    8.354      16.694     8.844      25.7…
Pr(>|t|)   10^-9 …
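The quadratic loss used throughout the evaluation and the linear opinion pool of the ELO and MML forecasts can be sketched as follows. The weights 0.54 and 0.46 and the constant forecast Pr(Home win) = 0.6 come from the text and Table 10; the three forecast triples are hypothetical, and the function names are ours.

```python
def quadratic_loss(p, outcome):
    """Brier score of one forecast: p is Pr(home win), outcome is 1 (won) or 0 (lost)."""
    return (p - outcome) ** 2


def pool(p_elo, p_mml, w_elo=0.54):
    """Linear opinion pool of two forecasts; the weights sum to 1."""
    return w_elo * p_elo + (1 - w_elo) * p_mml


def mean_loss(probs, outcomes):
    return sum(quadratic_loss(p, o) for p, o in zip(probs, outcomes)) / len(outcomes)


# Hypothetical (ELO forecast, MML forecast, home team won) triples.
matches = [(0.70, 0.60, 1), (0.55, 0.65, 0), (0.80, 0.75, 1)]
outcomes = [o for _, _, o in matches]

pooled = mean_loss([pool(e, m) for e, m, _ in matches], outcomes)
constant = mean_loss([0.6] * len(matches), outcomes)
print(round(pooled, 4), round(constant, 4))
```

A constant forecast of 0.6 scores 0.16 on a home win and 0.36 on a home loss, so with roughly 60% home wins its mean loss is about 0.24, close to the 0.2373 shown in the last row of Table 10.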