quantitative analysis in football


Upload: tamarai-selvi-arumugam

Post on 25-Oct-2015


INDIAN INSTITUTE OF MANAGEMENT, KOZHIKODE

QUANTITATIVE ANALYSIS IN FOOTBALL

Quantitative Methods Project

9/18/2013

This report shows the various ways quantitative analysis can be done in football to determine the performance of a team and predict the result of a match.


Quantitative Analysis in Football

Contents

INTRODUCTION

DESCRIPTIVE STATISTICS

PROBABILITY ANALYSIS

INTERVAL ESTIMATION

HYPOTHESIS TESTING

REFERENCES

Group 27 Page 2


INTRODUCTION:

Football has a large and growing impact on the financial world. The money generated by football has grown steadily since 1990, and there have been record-breaking financial deals and negotiations between football clubs and players. Beyond those deals, a huge amount of money changes hands in the form of betting. According to the Nevada Gaming Commission, $3.2 billion was wagered on sports bets in the state's casinos in 2011. Of that amount, $1.34 billion, or 41 percent, was wagered on football alone. Thirty-three million Americans participate in fantasy football, according to the Fantasy Sports Trade Association (FSTA). The FSTA found that $1.18 billion changes hands between players through pools each year.

Hence there is a need to evaluate quantitatively not only the players but also the performance of the team as a whole. The result of an individual match is random, but the likely outcomes of games can be estimated using statistical analysis. In this project we show how quantitative analysis can be used to analyse the performance of a team and, in turn, to predict the results of a match.

In football betting there are only three possible half-time and full-time outcomes (home win/draw/away win). We have used the results of league matches played by two teams, Real Madrid and Manchester United, over the 15 seasons from 1998-1999 to 2012-2013 to analyse and predict their performance.

For each season, the data used for analysis contains the number of matches played by the team, the position held by the team, the points gained by the team, and the home and away match records (number of matches won, drawn and lost, number of goals scored by the team and number of goals scored against the team).

Below is the data used for our analysis:

Real Madrid

                        Home                         Away
Season     Pos  Played  Win Draw Loss  For  Against  Win Draw Loss  For  Against  Points
2012-2013    2      38   17    2    0   67       21    9    5    5   36       21      85
2011-2012    1      38   16    2    1   70       19   16    2    1   51       13     100
2010-2011    2      38   16    1    2   61       12   13    4    2   41       21      92
2009-2010    2      38   18    0    1   60       18   13    3    3   42       17      96
2008-2009    2      38   14    2    3   49       29   11    1    7   34       23      78
2007-2008    1      38   17    0    2   53       18   10    4    5   31       18      85
2006-2007    1      38   12    4    3   32       18   11    3    5   34       22      76
2005-2006    2      38   11    4    4   40       21    9    6    4   30       19      70
2004-2005    2      38   15    1    3   43       12   10    4    5   28       20      80
2003-2004    4      38   13    2    4   43       26    8    5    6   29       28      70
2002-2003    1      38   13    5    1   52       22    9    7    3   34       20      78
2001-2002    3      38   14    5    0   48       14    5    4   10   21       30      66
2000-2001    1      38   15    3    1   53       15    9    5    5   28       25      80
1999-2000    5      38    9    4    6   31       27    7   10    2   27       21      62
1998-1999    2      38   14    2    3   46       24    7    3    9   31       38      68

Manchester United (EPL)

                        Home                         Away
Season     Pos  Played  Win Draw Loss  For  Against  Win Draw Loss  For  Against  Points
2012-2013    1      38   16    0    3   45       19   12    5    2   41       24      89
2011-2012    2      38   15    2    2   52       19   13    3    3   37       14      89
2010-2011    1      38   18    1    0   49       12    5   10    4   29       25      80
2009-2010    2      38   16    1    2   52       12   11    3    5   34       16      85
2008-2009    1      38   16    2    1   43       13   12    4    3   25       11      90
2007-2008    1      38   17    1    1   47        7   10    5    4   33       15      87
2006-2007    1      38   15    2    2   46       12   13    3    3   37       15      89
2005-2006    2      38   13    5    1   37        8   12    3    4   35       26      83
2004-2005    3      38   12    6    1   31       12   10    5    4   27       14      77
2003-2004    3      38   12    4    3   37       15   11    2    6   27       20      75
2002-2003    1      38   16    2    1   42       12    9    6    4   32       22      83
2001-2002    3      38   11    2    6   40       17   13    3    3   47       28      77
2000-2001    1      38   15    2    2   49       12    9    6    4   30       19      80
1999-2000    1      38   15    4    0   59       16   13    3    3   38       29      91
1998-1999    1      38   14    4    1   45       18    8    9    2   35       19      79

DESCRIPTIVE STATISTICS:

Descriptive statistics is the discipline of describing the main features of a collection of data. The measures most commonly used to describe a data set are measures of central tendency and measures of variability or dispersion. Measures of central tendency include the mean, median and mode, while measures of variability include the standard deviation (or variance) and the minimum and maximum values of the variables. Kurtosis and skewness additionally describe the shape of the distribution.

Based on the positions held by the teams across the 15 seasons, we can conclude that:

Manchester United has remained in the top three in every season, finishing first in the majority of them (9 of 15).

[Chart: Positions profile of Man U (positions 1, 2 and 3)]

Real Madrid has remained in the top five in every season, most often finishing second (7 of 15) and finishing first 5 times.

[Chart: Positions profile of Real Madrid (positions 1 to 5)]

Stacked column charts show the relationship of individual items to the whole, comparing the contribution of each value to a total across categories. The number of wins, draws and losses at home and away in each season can be depicted with a stacked column chart, in which the segments of each column represent the number of wins, the number of draws and the number of losses.

[Chart: Home – Manchester United. Stacked columns of wins, draws and losses per season, 1998-1999 to 2012-2013; vertical axis 0 to 20]

[Chart: Home – Real Madrid. Stacked columns of wins, draws and losses per season, 1998-1999 to 2012-2013; vertical axis 0 to 20]

[Chart: Away – Manchester United. Stacked columns of wins, draws and losses per season, 1998-1999 to 2012-2013; vertical axis 0 to 20]

[Chart: Away – Real Madrid. Stacked columns of wins, draws and losses per season, 1998-1999 to 2012-2013; vertical axis 0 to 20]

The summary statistics for the number of home and away wins per season for each team are as follows.

Manchester United

Statistic            Home - Win    Away - Win
Mean                 14.73333      10.73333
Standard Error       0.511456      0.589323
Median               15            11
Mode                 16            13
Standard Deviation   1.980861      2.282438
Sample Variance      3.92381       5.209524
Kurtosis             -0.44462      1.366206
Skewness             -0.46411      -1.16011
Range                7             8
Minimum              11            5
Maximum              18            13
Sum                  221           161
Count                15            15

Real Madrid

Statistic            Home - Win    Away - Win
Mean                 14.26667      9.8
Standard Error       0.628427      0.711805217
Median               14            9
Mode                 14            9
Standard Deviation   2.433888      2.75680975
Sample Variance      5.92381       7.6
Kurtosis             0.111816      0.715807738
Skewness             -0.52951      0.57022685
Range                9             11
Minimum              9             5
Maximum              18            16
Sum                  214           147
Count                15            15
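The summary statistics can be reproduced with a short script. The sketch below is illustrative only: the function name describe is hypothetical, and the bias-adjusted skewness and excess-kurtosis formulas are an assumption about how the report's figures were produced (they are the formulas commonly used by spreadsheet tools). The input is Real Madrid's away wins per season, read from the data table.

```python
import math
import statistics


def describe(values):
    """Return spreadsheet-style summary statistics for a list of numbers."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    z = [(v - mean) / sd for v in values]
    # Bias-adjusted sample skewness and excess kurtosis (assumed formulas).
    skewness = n / ((n - 1) * (n - 2)) * sum(t ** 3 for t in z)
    kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(t ** 4 for t in z)
                - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "mean": mean,
        "standard_error": sd / math.sqrt(n),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "standard_deviation": sd,
        "sample_variance": statistics.variance(values),
        "kurtosis": kurtosis,
        "skewness": skewness,
        "range": max(values) - min(values),
        "minimum": min(values),
        "maximum": max(values),
        "sum": sum(values),
        "count": n,
    }


# Real Madrid away wins, 2012-2013 back to 1998-1999, from the data table.
real_madrid_away_wins = [9, 16, 13, 13, 11, 10, 11, 9, 10, 8, 9, 5, 9, 7, 7]
summary = describe(real_madrid_away_wins)
for name, value in summary.items():
    print(f"{name:20s} {value}")
```

Passing any of the other three win columns to describe gives the corresponding column of the tables above.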

Box Plot

A box plot shows the dispersion of the values around the median and indicates the skewness of the values. It is drawn from the minimum, the first quartile (Q1), the median, the third quartile (Q3) and the maximum.

Real Madrid

Wins per season, sorted in ascending order:

Home Win   Away Win
9          5
11         7
12         7
13         8
13         9
14         9
14         9
14         9
15         10
15         10
16         11
16         11
17         13
17         13
18         16

                Home   Away
Minimum (x1)    9      5
Q1              13     8
Median          14     9
Q3              16     11
Maximum (x2)    18     16

[Box plot: Real Madrid home wins, marked at 9, 13, 14, 16 and 18]

Home wins are left-skewed (skewness -0.53): most seasons cluster at the higher win counts, with a tail of a few weaker seasons.

Away wins are right-skewed (skewness 0.57): most seasons cluster at the lower win counts, with a tail of a few stronger seasons.

Manchester United

Wins per season, sorted in ascending order:

Home Win   Away Win
11         5
12         8
12         9
13         9
14         10
15         10
15         11
15         11
15         12
16         12
16         12
16         13
16         13
17         13
18         13

                Home   Away
Minimum (x1)    11     5
Q1              13     9
Median          15     11
Q3              16     13
Maximum (x2)    18     13

Home wins are left-skewed (skewness -0.46): most seasons cluster at the higher win counts.

Away wins are also left-skewed (skewness -1.16), so the shape of the distribution is similar at home and away, although the median falls from 15 wins at home to 11 away.
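The five values behind each box plot can be computed as in the sketch below. It assumes the quartiles are taken as the medians of the lower and upper halves of the sorted data, which for 15 values matches the "exclusive" method of Python's statistics.quantiles; the function name five_number_summary is hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    # 'exclusive' places Q1, median and Q3 at positions (n + 1) * k / 4.
    q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return min(data), q1, median, q3, max(data)


# Manchester United home wins per season, from the data table.
man_utd_home_wins = [16, 15, 18, 16, 16, 17, 15, 13, 12, 12, 16, 11, 15, 15, 14]
# Real Madrid away wins per season, from the data table.
real_madrid_away_wins = [9, 16, 13, 13, 11, 10, 11, 9, 10, 8, 9, 5, 9, 7, 7]

print(five_number_summary(man_utd_home_wins))
print(five_number_summary(real_madrid_away_wins))
```

Other quartile conventions (for example the "inclusive" method) can give slightly different Q1 and Q3 for small samples, so the method should be stated alongside the plot.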

PROBABILITY ANALYSIS

Determining the distribution of the number of home wins for both teams

Let X be the random variable that denotes the number of home wins in a season.

X is assumed to follow a normal distribution with parameters µ and σ.

The standard normal variable is Z = (X - µ)/σ

f(Z) = (1/√(2π)) e^(-Z²/2) is the standard normal density function
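As a minimal sketch of this standardization, the code below computes Z and f(Z) for one observation. The function names are hypothetical, and the example uses Real Madrid's home-win mean and standard deviation from the summary statistics. Values evaluated directly from the formula can differ in the third or fourth decimal place from values read off a printed table, where Z is first rounded to two decimals.

```python
import math


def standardize(x, mu, sigma):
    """Return the standard normal variable Z = (x - mu) / sigma."""
    return (x - mu) / sigma


def standard_normal_pdf(z):
    """Return the standard normal density f(z) = exp(-z^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


# Real Madrid home wins: mean and sample standard deviation from the report.
mu, sigma = 14.26666667, 2.433887739

z = standardize(17, mu, sigma)  # a season with 17 home wins
print(round(z, 4), round(standard_normal_pdf(z), 4))
```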

Manchester United

µ = 14.733; σ = 1.980860804

Season      Win   Z         f(Z)
2012-2013   16    0.6395    0.3252
2011-2012   15    0.1346    0.3953
2010-2011   18    1.6491    0.1024
2009-2010   16    0.6395    0.3252
2008-2009   16    0.6395    0.3252
2007-2008   17    1.1443    0.2073
2006-2007   15    0.1346    0.3953
2005-2006   13    -0.8750   0.2720
2004-2005   12    -1.3799   0.1540
2003-2004   12    -1.3799   0.1540
2002-2003   16    0.6395    0.3252
2001-2002   11    -1.8847   0.0675
2000-2001   15    0.1346    0.3953
1999-2000   15    0.1346    0.3953
1998-1999   14    -0.3702   0.3725

Hence, the standard normal distribution of home wins is given by the graph

[Graph: standard normal density f(Z) plotted against Z; horizontal axis -2.5 to 2, vertical axis 0 to 0.45]

Real Madrid

µ = 14.26666667; σ = 2.433887739

Season      Win   Z              f(Z)
2012-2013   17    1.123031802    0.2131
2011-2012   16    0.712166509    0.3101
2010-2011   16    0.712166509    0.3101
2009-2010   18    1.533897096    0.1238
2008-2009   14    -0.109564078   0.397
2007-2008   17    1.123031802    0.2131
2006-2007   12    -0.931294665   0.2589
2005-2006   11    -1.342159959   0.1625
2004-2005   15    0.301301215    0.3814
2003-2004   13    -0.520429372   0.3485
2002-2003   13    -0.520429372   0.3485
2001-2002   14    -0.109564078   0.397
2000-2001   15    0.301301215    0.3814
1999-2000   9     -2.163890546   0.0387
1998-1999   14    -0.109564078   0.397

Hence, the standard normal distribution of home wins is given by the graph

[Graph: standard normal density f(Z) plotted against Z; horizontal axis -2.5 to 2, vertical axis 0 to 0.45]

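For both teams, the home-win data for the 15 seasons lie almost entirely within µ ± 2σ. Manchester United's home wins (11 to 18) all fall inside its band of 10.77 to 18.70, while Real Madrid's 9-win season in 1999-2000 (Z = -2.16) falls just below its lower limit of 9.40. The spread of the two distributions is similar (σ = 1.98 for Manchester United and σ = 2.43 for Real Madrid), indicating similar performance on home ground.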

Calculating the expected number of points gained by a team in a match

Number of points gained if the match is won = 3

Number of points gained if the match is drawn = 1

Number of points gained if the match is lost = 0

Manchester United

The probabilities below are the relative frequencies over the 285 home matches in the data (221 wins, 38 draws, 26 losses).

Outcome   weight (x)   p(x)          x·p(x)
win       3            0.775438596   2.326316
draw      1            0.133333333   0.133333
loss      0            0.09122807    0
Total                                2.459649

Hence, the expected number of points gained by Manchester United in a home match is 2.459.

Real Madrid

The probabilities below are the relative frequencies over the 285 away matches in the data (147 wins, 66 draws, 72 losses), so this figure is not directly comparable with the home figure for Manchester United.

Outcome   weight (x)   p(x)       x·p(x)
win       3            0.515789   1.547368
draw      1            0.231579   0.231579
loss      0            0.252632   0
Total                             1.778947

Hence, the expected number of points gained by Real Madrid in an away match is 1.778947.
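The expected-points calculation is a probability-weighted sum, E(X) = Σ x·p(x). A small sketch follows; the function name is hypothetical and the win/draw/loss counts are the column totals of the data tables (Manchester United's home record and Real Madrid's away record, which are the records the probabilities in this section correspond to).

```python
def expected_points(wins, draws, losses, points=(3, 1, 0)):
    """Expected points per match from win/draw/loss counts.

    Probabilities are estimated as relative frequencies, and the
    expectation is the sum of points * probability over the outcomes.
    """
    total = wins + draws + losses
    probabilities = (wins / total, draws / total, losses / total)
    return sum(x * p for x, p in zip(points, probabilities))


# Manchester United, home matches over 15 seasons: 221 wins, 38 draws, 26 losses.
man_utd_home = expected_points(221, 38, 26)
# Real Madrid, away matches over 15 seasons: 147 wins, 66 draws, 72 losses.
real_madrid_away = expected_points(147, 66, 72)

print(round(man_utd_home, 6), round(real_madrid_away, 6))
```

For a like-for-like comparison of the two teams, the same function can be called with both home records, both away records, or the combined records.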

Determining the expected amount of money that a team will make in the next season

Manchester United

Considering a sample of 15 English Premier League seasons

Position   Number of times in 15 years
1          9
2          3
3          3
Total      15

Event          x (in million dollars)   P(X)   xP(X)
Finishes 1st   15.1                     0.60   9.06
Finishes 2nd   7.3                      0.20   1.46
Finishes 3rd   4.5                      0.20   0.90
E(X)                                           11.42

Thus, for the next Premier League season the team's expected earnings are $11.42 million. The management can therefore afford a maintenance cost of at most $11.42 million to break even in expectation; a higher cost would result in an expected loss. (Currently the maintenance cost for Manchester United stands at around $9 million yearly.)

Real Madrid

Considering a sample of 15 Spanish La Liga seasons

Position   Number of times in 15 years
1          5
2          7
3          1
4          1
5          1
Total      15

Event          x (in million dollars)   P(X)   xP(X)
Finishes 1st   8.6                      0.33   2.87
Finishes 2nd   5.2                      0.47   2.43
Finishes 3rd   4.1                      0.07   0.27
Finishes 4th   3.3                      0.07   0.22
Finishes 5th   2.1                      0.07   0.14
E(X)                                           5.93

Thus, for the next La Liga season the team's expected earnings are $5.93 million. The management can therefore afford a maintenance cost of at most $5.93 million to break even in expectation; a higher cost would result in an expected loss. (Currently the maintenance cost for Real Madrid stands at around $4 million yearly.)
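The expected earnings can be computed directly from the position counts and the payoff for each position, which avoids rounding the probabilities before multiplying. This is only a sketch: the function name and dictionary layout are hypothetical, while the payoffs and counts are those given in the tables of this section.

```python
def expected_earnings(payoffs, counts):
    """Expected payoff, with P(position) estimated as count / total seasons."""
    total = sum(counts.values())
    return sum(payoffs[position] * counts[position] / total for position in counts)


# Payoffs in million dollars per finishing position, and seasons in each position.
man_utd_payoffs = {1: 15.1, 2: 7.3, 3: 4.5}
man_utd_counts = {1: 9, 2: 3, 3: 3}

real_madrid_payoffs = {1: 8.6, 2: 5.2, 3: 4.1, 4: 3.3, 5: 2.1}
real_madrid_counts = {1: 5, 2: 7, 3: 1, 4: 1, 5: 1}

man_utd_expected = expected_earnings(man_utd_payoffs, man_utd_counts)
real_madrid_expected = expected_earnings(real_madrid_payoffs, real_madrid_counts)
print(round(man_utd_expected, 2), round(real_madrid_expected, 2))
```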

INTERVAL ESTIMATION

Manchester United

Estimating the mean number of goals scored by Manchester United per season.

A sample of the past 15 seasons shows a mean of 78.73 goals and a standard deviation of 10.22. Assuming the number of goals scored per season is normally distributed, construct a 95% confidence interval for the mean.

Data and Analysis:

Sample size                  15
Mean                         78.73
Standard deviation           10.22
Confidence level             95%
Standard error Sx = s/√n     2.64
Degrees of freedom           14
t value                      2.145

Using the t distribution, the interval limits are mean ± t × Sx:

Max   84.39
Min   73.07

Real Madrid

Estimating the mean number of goals scored by Real Madrid per season.

A sample of the past 15 seasons shows a mean of 83 goals and a standard deviation of 17.23. Assuming the number of goals scored per season is normally distributed, construct a 95% confidence interval for the mean.

Data and Analysis:

Sample size                  15
Mean                         83
Standard deviation           17.23783215
Confidence level             95%
Standard error Sx = s/√n     4.450789122
Degrees of freedom           14
t value                      2.145

Max   92.55
Min   73.45

Conclusion:

With 95% confidence, Manchester United's mean number of goals per season lies between 73 and 84.

With 95% confidence, Real Madrid's mean number of goals per season lies between 73 and 92.

Comparing the two intervals, Manchester United's is much narrower, so it can be concluded that Manchester United scores more consistently, with less variation, than Real Madrid.
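The interval limits follow from mean ± t × s/√n. The sketch below reproduces them with the t value 2.145 (14 degrees of freedom, 95% confidence) taken from the report rather than computed; the function name is hypothetical.

```python
import math


def t_confidence_interval(mean, sd, n, t_value):
    """Return (lower, upper) limits of mean +/- t_value * sd / sqrt(n)."""
    standard_error = sd / math.sqrt(n)
    margin = t_value * standard_error
    return mean - margin, mean + margin


T_95_DF14 = 2.145  # two-sided 95% t value for 14 degrees of freedom, from the report

man_utd_interval = t_confidence_interval(78.73, 10.22, 15, T_95_DF14)
real_madrid_interval = t_confidence_interval(83, 17.23783215, 15, T_95_DF14)

print([round(v, 2) for v in man_utd_interval])
print([round(v, 2) for v in real_madrid_interval])
```

The interval describes the uncertainty in the long-run mean, not the range in which a single future season's total is expected to fall; a prediction interval for one season would be wider.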


HYPOTHESIS TESTING:

Manchester United

One sample hypothesis

Problem: A random sample of 570 English Premier League matches featuring Manchester United showed that the average number of goals scored by them was xbar = 1.182 per match, with standard deviation s = 0.1851. Is the average number of goals scored by Manchester United in a match greater than 1? (Level of significance = 1%)

EIGHT STEP PROCEDURE:

Step 1. The parameter of interest is the mean number of goals scored by Manchester United per match, µ. (σ is not given)

Step 2. H0 : µ <= 1

Step 3. Ha : µ > 1

Step 4. α = 0.01

Step 5. The test statistic is

t0 = (xbar - µ0) / (s / √n)

Step 6. Given that n = 570, d.f. = 569 (as d.f. > 100, it can be approximated as infinity and the critical value read from the corresponding row of the table). For α = 0.01 and d.f. = 569, the critical value is t(0.01, 569) = 2.326. Hence, reject H0 if t0 > 2.326.

Step 7. Computations: Since xbar = 1.182, s = 0.1851, µ0 = 1 and n = 570, we have

t0 = (1.182 - 1) / (0.1851 / √570) = 23.47

Step 8. Conclusion: Since t0 = 23.47 > 2.326 (t(0.01, 569)), we reject the null hypothesis (that is, H0 : µ <= 1) at the 0.01 level of significance. Therefore, we conclude that the mean number of goals scored by Manchester United per match exceeds 1, based on hypothesis testing using the sample of 570 Manchester United EPL matches and a 1% level of significance.

Real Madrid

One sample hypothesis

Problem: A random sample of 570 Spanish La Liga matches featuring Real Madrid showed that the average number of goals scored by them was xbar = 1.31 per match, with standard deviation s = 0.301. Is the average number of goals scored by Real Madrid in a match greater than 1? (Level of significance = 5%)

EIGHT STEP PROCEDURE:

Step 1. The parameter of interest is the mean number of goals scored by Real Madrid per match, µ. (σ is not given)

Step 2. H0 : µ <= 1

Step 3. Ha : µ > 1

Step 4. α = 0.05

Step 5. The test statistic is

t0 = (xbar - µ0) / (s / √n)

Step 6. Given that n = 570, d.f. = 569 (as d.f. > 100, it can be approximated as infinity and the critical value read from the corresponding row of the table). For α = 0.05 and d.f. = 569, the critical value is t(0.05, 569) = 1.645. Hence, reject H0 if t0 > 1.645.

Step 7. Computations: Since xbar = 1.31, s = 0.301, µ0 = 1 and n = 570, we have

t0 = (1.31 - 1) / (0.301 / √570) = 24.59

Step 8. Conclusion: Since t0 = 24.59 > 1.645 (t(0.05, 569)), we reject the null hypothesis (that is, H0 : µ <= 1) at the 0.05 level of significance. Therefore, we conclude that the mean number of goals scored by Real Madrid per match exceeds 1, based on hypothesis testing using the sample of 570 Real Madrid Spanish La Liga matches and a 5% level of significance.
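Steps 5 to 8 of the procedure can be sketched in a few lines. The function names are hypothetical, and the critical values 2.326 and 1.645 are taken from the report (the large-sample limits of the t table) rather than computed.

```python
import math


def one_sample_t(xbar, mu0, s, n):
    """Test statistic t0 = (xbar - mu0) / (s / sqrt(n))."""
    return (xbar - mu0) / (s / math.sqrt(n))


def reject_h0_right_tailed(t0, critical_value):
    """Right-tailed test (Ha: mu > mu0): reject H0 when t0 exceeds the critical value."""
    return t0 > critical_value


# Manchester United: xbar = 1.182, s = 0.1851, n = 570, alpha = 0.01.
t_man_utd = one_sample_t(1.182, 1.0, 0.1851, 570)
# Real Madrid: xbar = 1.31, s = 0.301, n = 570, alpha = 0.05.
t_real_madrid = one_sample_t(1.31, 1.0, 0.301, 570)

print(round(t_man_utd, 2), reject_h0_right_tailed(t_man_utd, 2.326))
print(round(t_real_madrid, 2), reject_h0_right_tailed(t_real_madrid, 1.645))
```

Because both statistics are far above their critical values, the conclusion does not depend on small rounding differences in the computed t0.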

REFERENCES:

Source of data:

http://www.statto.com/football/teams/real-madrid/history/modern

http://www.statto.com/football/teams/manchester-united/history/modern
