
MINIMUM VIOLATIONS SPORTS RANKING USING EVOLUTIONARY OPTIMIZATION AND BINARY INTEGER LINEAR PROGRAM APPROACHES

Timothy Chartier^a, Erich Kreutzer^a, Amy Langville^b, Kathryn Pedings^{b,d}, Yoshitsugu Yamamoto^c

^a Davidson College, Davidson, NC, USA

^b College of Charleston, Charleston, SC, USA

^c University of Tsukuba, Tsukuba, Japan

^d Corresponding author: [email protected]

Abstract

Typically the binary integer linear program (BILP) formulation of the minimum violations ranking (MVR) problem and the related rank aggregation problem is the preferred way to find a ranking that minimizes the number of violations to hillside form. However, for very large ranking problems, the BILP formulation is limited by the O(n^3) number of constraints. Even when constraint relaxation techniques are employed, there are practical limits on the size of n, the number of items being ranked. One goal of this paper is to demonstrate these limits on several ranking problems drawn from a wide range of application areas. Another goal is to overcome these limitations by using an evolutionary optimization (EO) algorithm to solve large MVR ranking problems. Our EO algorithm uses many features of the BILP formulation to improve its speed and convergence. Though EO, unlike BILP, is not guaranteed to produce the global optimum, its speed, scalability, and flexibility make it the method of choice for solving very large-scale linear ordering problems.

Keywords: evolutionary optimization, binary integer linear program, minimum violations, ranking, hillside form, March Madness

1. Introduction

In this paper, we present a rating method that, given information on the pairwise comparisons of n items, minimizes the number of inconsistencies in the ranking of those items. Though Minimum Violations Ranking (MVR) methods have many applications (Reinelt et al., 1984), we use examples from sports to explain our new MVR methods. Two algorithms discussed in this paper both use MVR. The binary integer linear program (BILP) shown in this paper always gives the optimal solution to the optimization problem. The evolutionary optimization (EO) algorithm gives a heuristic solution. In order to understand the methods, we must define terms that will be used throughout the paper.

The matrix D below, which we call a point differential matrix, contains pairwise comparison data and is commonly and easily produced for many sports.

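\[
D = \begin{array}{c|ccccc}
  & 1 & 2 & 3 & 4 & 5 \\ \hline
1 & 0 & 0 & 0 & 0 & 0 \\
2 & 9 & 0 & 4 & 0 & 2 \\
3 & 5 & 0 & 0 & 0 & 0 \\
4 & 15 & 3 & 8 & 0 & 5 \\
5 & 6 & 0 & 3 & 0 & 0
\end{array}
\]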

The (2, 3)-entry means that team 2 beat team 3 by 4 points in their matchup. We will analyze this point differential matrix in order to produce a ranking of these five teams. At this point we introduce a definition.

Preprint submitted to MATHSPORT 2010 June 18, 2010

A matrix D is in hillside form if

\[ d_{ij} \le d_{ik}, \quad \forall\, i \text{ and } \forall\, j \le k, \]

\[ d_{ij} \ge d_{kj}, \quad \forall\, j \text{ and } \forall\, i \le k. \]

The name is suggestive, since a cityplot of a matrix in hillside form looks like a sloping hillside, as in Figure 2. As an illustrative example, consider two point differential matrices from two different seasons. Since these represent seasonal data, it is possible that some of the teams played multiple match-ups. For instance, one possible scenario for matrix B is that teams 1 and 3 played two times during the season: the first time team 1 beat team 3 by 5 points (B(1,3)=5) and the second time team 3 beat team 1 by 7 points (B(3,1)=7). The matrix A is in hillside form and the matrix B is not.

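\[
A = \begin{pmatrix}
0 & 3 & 5 & 8 & 15 \\
0 & 0 & 2 & 4 & 9 \\
0 & 0 & 0 & 3 & 6 \\
0 & 0 & 0 & 0 & 5 \\
0 & 0 & 0 & 0 & 0
\end{pmatrix}
\quad \text{and} \quad
B = \begin{pmatrix}
0 & 3 & 5 & 8 & 15 \\
0 & 0 & 2 & 4 & 9 \\
7 & 0 & 0 & 3 & 4 \\
0 & 0 & 0 & 0 & 5 \\
0 & 0 & 0 & 0 & 0
\end{pmatrix}.
\]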

For n × n matrices in hillside form, the ranking r of the items is clear: r = ( 1 2 ··· n ). For non-hillside matrices, we can count the number of violations of the hillside conditions. In the above example, B has 7 violations. Often a matrix that appears to be non-hillside can be symmetrically reordered so that it is in hillside or near hillside form. In fact, the non-hillside matrix D, when reordered according to the vector ( 5 2 4 1 3 ), forms the hillside matrix A. Here the vector lists the rank of each team: team 1 is ranked fifth, team 2 second, team 3 fourth, team 4 first, and team 5 third. Finding such a hidden hillside structure is exactly the aim of both the EO and BILP methods. Our MVR methods find a reordering of the items that, when applied to the item-item matrix of differential data, forms a matrix that is as close to hillside form as possible. We measure closeness to hillside form by the number of violations, which are described in the next paragraph.
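The reordering claim can be checked directly. The following is a minimal sketch, not the authors' code: the helper names are hypothetical, teams are stored 0-indexed, and the vector ( 5 2 4 1 3 ) is read as a rank vector (entry t is the rank of team t), which is the reading that reproduces A.

```python
# Point differential matrix D from the text (team 1 is index 0, and so on).
D = [
    [0, 0, 0, 0, 0],
    [9, 0, 4, 0, 2],
    [5, 0, 0, 0, 0],
    [15, 3, 8, 0, 5],
    [6, 0, 3, 0, 0],
]

def order_from_ranks(ranks):
    """Turn a rank vector (ranks[t] = rank of team t) into a list of
    0-indexed teams sorted from best to worst."""
    return sorted(range(len(ranks)), key=lambda t: ranks[t])

def reorder(matrix, order):
    """Symmetrically permute the rows and columns of matrix by order."""
    return [[matrix[i][j] for j in order] for i in order]

def is_hillside(matrix):
    """Rows must be non-decreasing left to right and columns must be
    non-increasing top to bottom; checking neighbours is sufficient."""
    n = len(matrix)
    rows_ok = all(matrix[i][j] <= matrix[i][j + 1]
                  for i in range(n) for j in range(n - 1))
    cols_ok = all(matrix[i][j] >= matrix[i + 1][j]
                  for i in range(n - 1) for j in range(n))
    return rows_ok and cols_ok

ranks = [5, 2, 4, 1, 3]          # the vector ( 5 2 4 1 3 ) from the text
order = order_from_ranks(ranks)  # best-to-worst teams, 0-indexed
A = reorder(D, order)
print(order, is_hillside(D), is_hillside(A))
```

Under these assumptions the reordered matrix equals the matrix A shown above, and `is_hillside` accepts it while rejecting D in its original order.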

Hillside form gives a great deal of information about the difference in the strengths of teams. For example, matrix A says that not only is team 1 ranked above teams 2, 3, 4, and 5, but we expect team 1 to beat team 2 by some margin of victory, then team 3 by an even greater margin, and so on. Sometimes a data matrix has been reordered to be as close to hillside form as possible, yet violations remain. These violations are of two types: upsets and weak wins. Nonzero entries in the lower triangular part of the reordered matrix correspond to upsets, i.e., when a lower ranked team beat a higher ranked team. Weak wins manifest as violations of the hillside conditions that occur in the upper triangular part of the matrix. This is when a high ranked team beats a low ranked team, but by a smaller margin of victory than expected. Our MVR method inherently weights upsets more than weak wins. The example matrix B above demonstrates this well. Notice that the presence of the 7 in the lower triangular part of the matrix accounts for 6 of the 7 violations. Looking across the third row, the 7 is to the left of four numbers that are smaller in magnitude, giving 4 of the violations. Down the first column, the 7 is below two zeroes, giving another 2 of the violations. The last of the 7 violations can be seen in the fifth column, since the 5 falls below the 4.
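The count of 7 for matrix B can be reproduced by testing every pair of entries within each row and within each column against the hillside conditions. This is a minimal sketch with a hypothetical function name; it reports row and column violations separately.

```python
A = [
    [0, 3, 5, 8, 15],
    [0, 0, 2, 4, 9],
    [0, 0, 0, 3, 6],
    [0, 0, 0, 0, 5],
    [0, 0, 0, 0, 0],
]
B = [
    [0, 3, 5, 8, 15],
    [0, 0, 2, 4, 9],
    [7, 0, 0, 3, 4],
    [0, 0, 0, 0, 5],
    [0, 0, 0, 0, 0],
]

def count_violations(matrix):
    """Return (row_violations, column_violations) of the hillside conditions.

    A row violation is a pair j < k with d[i][j] > d[i][k];
    a column violation is a pair i < k with d[i][j] < d[k][j].
    """
    n = len(matrix)
    row_viol = sum(1 for i in range(n)
                   for j in range(n) for k in range(j + 1, n)
                   if matrix[i][j] > matrix[i][k])
    col_viol = sum(1 for j in range(n)
                   for i in range(n) for k in range(i + 1, n)
                   if matrix[i][j] < matrix[k][j])
    return row_viol, col_viol

rows_B, cols_B = count_violations(B)
print(rows_B, cols_B, rows_B + cols_B)
```

For B the four row violations all come from the 7 in the third row, and the three column violations are the two in the first column plus the one in the fifth column, matching the description above.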

Although we have not experimented with alternate ways to weight the data, it is possible to do so. The user could weight the seriousness of the upset. For example, a 12th ranked team beating a 4th ranked team would be weighted more heavily than a 9th ranked team beating an 8th ranked team. This would be implemented by taking a Hadamard (entrywise) product of the data matrix with a weight matrix whose values in the lower left-hand corner are larger than those in the rest of the matrix. A second idea is to weight the input data by date. If the ranking is used as a predictive method for a tournament, then games closer to the tournament would be weighted more heavily. Many different weights could be used here. Other methods such as Colley and Massey use linear, exponential, logarithmic, or step functions to weight the games.
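As an illustration of the upset weighting idea only, the sketch below builds one possible weight matrix and applies it entrywise. The weight formula (1 plus the rank gap below the diagonal) and the function names are assumptions made for this example; the text does not prescribe specific weights.

```python
def upset_weights(n):
    """Hypothetical weight matrix: entries below the diagonal grow with the
    rank gap i - j, so the lower left-hand corner carries the largest
    weights; all other entries have weight 1."""
    return [[1 + max(i - j, 0) for j in range(n)] for i in range(n)]

def hadamard(matrix, weights):
    """Entrywise (Hadamard) product of two equally sized square matrices."""
    n = len(matrix)
    return [[matrix[i][j] * weights[i][j] for j in range(n)] for i in range(n)]

# Matrix B from the text; its only upset is the 7 in position (3, 1).
B = [
    [0, 3, 5, 8, 15],
    [0, 0, 2, 4, 9],
    [7, 0, 0, 3, 4],
    [0, 0, 0, 0, 5],
    [0, 0, 0, 0, 0],
]
W = upset_weights(5)
B_weighted = hadamard(B, W)
print(B_weighted[2][0], B_weighted[0][4])
```

With these illustrative weights, the upset entry is scaled by its rank gap plus one while the upper triangular entries are left unchanged.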

There are many types of data that can be used as input to create MVR rankings. Below are four common data matrices that can be used for sports teams.


1. Point Matrices:

(a) Psum_{i,j} = sum of all the points scored by team i against team j

(b) Pavg_{i,j} = average of all points scored by team i against team j

2. Point Differential Matrices:

(a) Dsum_{i,j} = sum of all the positive point differences scored by team i against team j

(b) Davg_{i,j} = average of all positive point differences scored by team i against team j

3. Difference Matrices:

(a) Diffsum_{i,j} = sum of the points scored by team i minus points scored by team j

(b) Diffavg_{i,j} = average of the points scored by team i minus points scored by team j

4. Rank Aggregation Matrices:

(a) For rank aggregation, we use rankings from other models and combine the data in the following way:

(b) RankAgg_{i,j} = # lists having i above j
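The sum-type matrices and the rank aggregation matrix above can be assembled as in the following sketch. The input formats (a list of game tuples and a list of best-to-worst orderings), the function names, and the sample scores are assumptions made for illustration; the averaged variants are omitted because the text does not say which games enter the average.

```python
def point_matrices(n, games):
    """Build Psum, Dsum and Diffsum for n teams.

    games is an iterable of (i, j, points_i, points_j), teams 0-indexed.
    """
    psum = [[0] * n for _ in range(n)]
    dsum = [[0] * n for _ in range(n)]
    for i, j, pts_i, pts_j in games:
        psum[i][j] += pts_i              # points scored by i against j
        psum[j][i] += pts_j
        if pts_i > pts_j:                # only positive differences count
            dsum[i][j] += pts_i - pts_j
        elif pts_j > pts_i:
            dsum[j][i] += pts_j - pts_i
    diffsum = [[psum[i][j] - psum[j][i] for j in range(n)] for i in range(n)]
    return psum, dsum, diffsum

def rank_agg(n, orderings):
    """Entry (i, j) counts the input lists that place team i above team j.
    Each ordering lists all n teams from best to worst."""
    agg = [[0] * n for _ in range(n)]
    for order in orderings:
        pos = {team: p for p, team in enumerate(order)}
        for i in range(n):
            for j in range(n):
                if i != j and pos[i] < pos[j]:
                    agg[i][j] += 1
    return agg

# Hypothetical scores: team 0 beats team 2 by 5, then team 2 beats team 0 by 7.
games = [(0, 2, 70, 65), (2, 0, 80, 73)]
psum, dsum, diffsum = point_matrices(3, games)
agg = rank_agg(3, [[0, 1, 2], [1, 0, 2], [0, 2, 1]])
print(dsum[0][2], dsum[2][0], diffsum[0][2], agg[0][1])
```

The two sample games mirror the scenario described earlier for matrix B, where two teams split a pair of games and both margins appear in the point differential matrix.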

It is important to note that our algorithms have been tested with data from entire seasons to use as prediction models for tournaments. However, this does not necessarily mean that we have information on every head-to-head match-up between the teams. The rankings can still be computed using the indirect relationships in the data. Future work needs to be done to determine whether a warm-up period is needed for the algorithms to run successfully.

This paper is outlined as follows. First, in Section 2, we summarize the major findings from our prior MVR solution technique, which uses mathematical programming to find MVR rankings. Then in Section 3, we propose our new MVR solution technique, which uses the very intuitive method of EO. Section 4 gives results on data from the Southern Conference region of NCAA Division I basketball in the United States. The paper ends with some experiments from NCAA basketball and thoughts on future work for this topic.

2. Findings from Prior Work in Mathematical Programming

Other researchers have proposed various methods for solving the MVR problem (Ali et al., 1986; Cassady et al., 2005; Coleman, 2005; Park, 2005), and in another paper (Langville et al., 2009), we formulated a binary integer linear program (BILP) to solve the MVR problem described above. In that paper, our MVR methods used ideas from mathematical programming. While specialized knowledge of that field is required to appreciate and implement those methods, in this section we summarize the findings from that paper that pertain to this work. Our goal in this paper is to solve the MVR problem using a more intuitive technique that requires no specialized knowledge or software.

The BILP that we formulated and explained in detail in (Langville et al., 2009) is below.

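\[
\begin{aligned}
\min \quad & \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij} x_{ij} \\
& x_{ij} + x_{ji} = 1 && \text{for all distinct } i, j \\
& x_{ij} + x_{jk} + x_{ki} \le 2 && \text{for all distinct } i, j, k \\
& x_{ij} \in \{0, 1\},
\end{aligned}
\]

where

\[
x_{ij} =
\begin{cases}
1, & \text{if item } i \text{ is ranked above item } j \\
0, & \text{otherwise,}
\end{cases}
\]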

and C is a matrix of constants formed from the data matrix D. One definition assigns

\[ c_{ij} := \#\{\, k \mid d_{ik} < d_{jk} \,\} + \#\{\, k \mid d_{ki} > d_{kj} \,\}, \tag{1} \]

where # denotes the cardinality of the corresponding set. Thus, $\#\{\, k \mid d_{ik} < d_{jk} \,\}$ is the number of items ranking item j above item i. Similarly, $\#\{\, k \mid d_{ki} > d_{kj} \,\}$ is the number of items k that beat item i by more than they beat item j, which again counts against ranking item i above item j.
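To make definition (1) concrete, the sketch below builds C from a data matrix and evaluates the BILP objective for a given ordering, where x_ij = 1 exactly when i precedes j. The function names are hypothetical, and the brute-force search over all permutations is only an illustration for five teams, not the BILP solution method.

```python
from itertools import permutations

def cost_matrix(D):
    """c[i][j] = #{k : d[i][k] < d[j][k]} + #{k : d[k][i] > d[k][j]},
    the number of hillside violations caused by ranking i above j."""
    n = len(D)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                rows = sum(1 for k in range(n) if D[i][k] < D[j][k])
                cols = sum(1 for k in range(n) if D[k][i] > D[k][j])
                C[i][j] = rows + cols
    return C

def objective(C, order):
    """Sum of c[i][j] over all pairs with i ranked above j.
    order lists 0-indexed teams from best to worst."""
    n = len(order)
    return sum(C[order[a]][order[b]]
               for a in range(n) for b in range(a + 1, n))

D = [[0, 0, 0, 0, 0], [9, 0, 4, 0, 2], [5, 0, 0, 0, 0],
     [15, 3, 8, 0, 5], [6, 0, 3, 0, 0]]
B = [[0, 3, 5, 8, 15], [0, 0, 2, 4, 9], [7, 0, 0, 3, 4],
     [0, 0, 0, 0, 5], [0, 0, 0, 0, 0]]

C_D = cost_matrix(D)
# Exhaustive search is feasible only for tiny n (5! = 120 orderings here).
best = min(permutations(range(5)), key=lambda p: objective(C_D, p))
print(best, objective(C_D, best))
print(objective(cost_matrix(B), [0, 1, 2, 3, 4]))
```

For B in its given order the objective equals the 7 violations counted earlier, and for D the minimizing order is the one that reorders D into the hillside matrix A.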

Industrial software such as Xpress-MP finds the globally optimal MVR ranking. When the optimization algorithm concludes, the x_ij variables can be assembled into a binary matrix X, which is then used to create a ranking of the n items. The item with the greatest number of 1s in its row is the highest ranked item. In fact, this item has a row sum of n − 1, meaning that it is ranked above every other item. The second place item will have a row sum of n − 2, the third place item will have a row sum of n − 3, and so on down to the last place item, which has a row sum of 0, meaning that it is ranked above no items. In addition, in (Langville et al., 2009) we described a linear time algorithm for scanning the optimal ranking to identify multiple optimal solutions. In other words, multiple optimal solutions correspond to an optimal MVR ranking with ties in some rank positions.
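The row-sum rule can be sketched as follows. The helper names are hypothetical; the feasibility check simply tests the pairing and triangle constraints of the BILP on a candidate X.

```python
def x_from_order(order):
    """Binary matrix with X[i][j] = 1 when team i is ranked above team j."""
    n = len(order)
    pos = {team: p for p, team in enumerate(order)}
    return [[1 if i != j and pos[i] < pos[j] else 0 for j in range(n)]
            for i in range(n)]

def ranking_from_x(X):
    """Sort teams by decreasing row sum (n - 1 for first place, 0 for last)."""
    return sorted(range(len(X)), key=lambda i: -sum(X[i]))

def satisfies_constraints(X):
    """Check x_ij + x_ji = 1 and x_ij + x_jk + x_ki <= 2 for distinct indices."""
    n = len(X)
    pairs_ok = all(X[i][j] + X[j][i] == 1
                   for i in range(n) for j in range(n) if i != j)
    triples_ok = all(X[i][j] + X[j][k] + X[k][i] <= 2
                     for i in range(n) for j in range(n) for k in range(n)
                     if len({i, j, k}) == 3)
    return pairs_ok and triples_ok

X = x_from_order([3, 1, 4, 2, 0])
print([sum(row) for row in X], ranking_from_x(X))

# A cycle (0 above 1, 1 above 2, 2 above 0) passes the pairing constraint
# but fails the triangle constraint, so it does not encode a ranking.
cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
print(satisfies_constraints(X), satisfies_constraints(cycle))
```

The cyclic example shows why the triangle constraints are needed: without them, the pairwise variables need not be consistent with any single ordering.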

We discovered that the O(n^3) inequality constraints dramatically limit the size of ranking problems that can be solved with the BILP method. Consequently, we used classical relaxation techniques from the field of mathematical programming. One relaxation solves the linear program (LP) relaxation of the original BILP. The LP results were very good. We were able to prove that the LP solution was optimal for the BILP under certain conditions. In the few cases where the LP produces a suboptimal solution, we used bounding techniques to produce a measure ε, indicating that the LP solution is within ε% of the optimal BILP solution. Further, the LP relaxation requires slightly less computation, immediately identifies multiple optimal solutions, and enables sensitivity analysis, which provides measures of confidence in the assignments of items to rank positions. While we solved a 347-team example in under a minute using Xpress-MP software, we were still unsatisfied with the size of ranking (also known as linear ordering) problems we could solve. As a result, in this paper, we present a very intuitive solution technique called EO that requires no specialized knowledge of mathematical programming or its software and enables the solution of even bigger ranking problems.
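A quick count shows how fast the constraint set grows. This sketch counts ordered triples of distinct indices for the triangle inequalities; each inequality repeats under cyclic rotation of (i, j, k), so the number of distinct inequalities is one third of that. The function names are hypothetical.

```python
def pair_constraints(n):
    """Equalities x_ij + x_ji = 1, one per unordered pair of items."""
    return n * (n - 1) // 2

def triangle_constraints(n):
    """Inequalities x_ij + x_jk + x_ki <= 2, one per ordered triple of
    distinct items; dividing by 3 removes cyclic repeats."""
    return n * (n - 1) * (n - 2)

# Problem sizes used in the experiments reported later in the paper.
for n in (347, 634, 1041, 2034):
    total = triangle_constraints(n)
    print(n, pair_constraints(n), total, total // 3)
```

Even for the 347-team problem there are tens of millions of triangle inequalities, which is why constraint relaxation is needed in practice.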

3. EO

3.1. Overview of EO

EO, as the name suggests, takes its modus operandi from natural evolution, and every EO algorithm uses the basic evolutionary ideas of mating, mutation, and survival of the fittest to solve an optimization problem. The trick is to tailor these basic ideas to fit the specific problem at hand. The idea is to start with some initial population of p possible candidate solutions for the problem of interest. Each member of the population is evaluated for its fitness. The fittest members of the population are mated to create offspring that contain the best properties of their parents. Continuing with the evolutionary analogies, the less fit members are mutated in asexual fashion, while the least fit members are dropped and replaced with immigrants. This new population of p members is evaluated for its fitness and the process continues. As the iterations proceed, it is fascinating to watch Darwin’s principle of survival of the fittest. The populations march toward more evolved, fitter collections of members. Perhaps more fascinating is the fact that there are theorems proving that evolutionary algorithms converge to optimal solutions in many cases and near optimal solutions under certain conditions (Fogel and Michalewicz, 2004). Unfortunately, evolutionary algorithms can be slow-converging, which is why careful tailoring to the application is so important (Fogel and Michalewicz, 2004).

3.2. Tailoring EO to the MVR problem

In this section, we tailor the general EO ideas above to our specific MVR problem.

Figure 1: Overview of steps of EO for ranking problem


3.2.1. Members of the Population

For our MVR problem, each member of the EO population is a ranking vector, i.e., a permutation vector of the integers 1 through n.

3.2.2. Initialization

The EO algorithm moves steadily closer to hillside form as it progresses, which motivates finding a good initial population. There are many established methods of ranking that have done well in predictive settings such as the NCAA March Madness.[1] We use the output from these methods as our initial parent population. Some methods used are Massey, Colley, and mHits (Colley, 2002; Govan et al., 2009; Massey, 1997). As seen in Table 1, the initialization truly makes a difference. These experiments were run using all 347 NCAA Division I basketball teams.

Initialization   Time (sec)   Violations
Random           19.6         1,404,783
Best 10          10.3         1,163,143
Worst 10         30.8         1,182,274

Table 1: Runtimes (in seconds) and number of violations for EO with different initializations

Starting solutions from the Colley, Massey, and mHits methods were considered. The number of violations to hillside form was calculated for each solution to determine the best and worst 10 solutions. The runtime and number of violations were both lowest for the best initialization. Although the worst initialization had the highest runtime, it produced fewer violations than the random initialization. These results are only preliminary, but they suggest that further work should be done on the sensitivity to the initialization.

3.2.3. Fitness

The fitness function for EO is the number of violations to hillside form for the reordered data matrix. This can be calculated using the same C matrix defined for the BILP in Equation (1).

[1] Information about the NCAA March Madness Tournament can be found at http://www.ncaa.com/sports/m-baskbl/ncaa-m-baskbl-body.html

3.2.4. Offspring

There are two ways to create offspring: mating and mutating. When considering Darwinian ideas, mating should take preference over mutating. Mutating is used as a means to break the population out of a local optimum. Our algorithm allows the user to choose both the percentage of time to mate versus mutate and the probability density function (pdf) that determines which mating or mutating algorithm is used most often. All of our experiments were run with mating set at 85% and a uniform pdf over each mating and mutating algorithm. Future work should be performed to determine how these percentages affect the outcome.

This section presents the mating and mutating algorithms we used for our MVR problem.

1. Mating: These use two or more parent solutions to create one offspring.

(a) Borda Count: To compute the Borda Count for a particular team, count, in each parent list, the number of teams that it ranks above. Sum these counts over all parent lists.

(b) Average Rank: This method uses exactly two parent solutions. It averages the corresponding entries and then ranks these averages.

2. Mutating: These use one parent solution to create one offspring.

(a) Flip: Randomly choose two teams and flip their positions.

(b) Insert: Randomly choose a team and put it in a different location.

(c) Displace: Choose a group of teams and put them in a different location.

(d) Reverse: Choose a group of teams and reverse their order.
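The six operators above might be sketched as follows. This is not the authors' code: each member is stored here as a best-to-worst list of 0-indexed teams rather than as a rank vector, ties in the mating operators are broken by team index, groups are taken to be contiguous, and all function names are hypothetical.

```python
import random

def borda_count(parents):
    """Mate two or more orderings: a team earns one point for every team
    it ranks above in each parent list; sort by total points."""
    n = len(parents[0])
    score = [0] * n
    for order in parents:
        for pos, team in enumerate(order):
            score[team] += n - 1 - pos
    return sorted(range(n), key=lambda t: (-score[t], t))

def average_rank(parent_a, parent_b):
    """Mate exactly two orderings by ranking the average positions."""
    pos_a = {team: p for p, team in enumerate(parent_a)}
    pos_b = {team: p for p, team in enumerate(parent_b)}
    return sorted(range(len(parent_a)),
                  key=lambda t: ((pos_a[t] + pos_b[t]) / 2, t))

def flip(order, rng):
    """Swap the positions of two randomly chosen teams."""
    child = list(order)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child

def insert(order, rng):
    """Remove one random team and reinsert it at a different position."""
    child = list(order)
    i = rng.randrange(len(child))
    team = child.pop(i)
    j = rng.choice([p for p in range(len(child) + 1) if p != i])
    child.insert(j, team)
    return child

def displace(order, rng):
    """Move a random contiguous group of teams to a different position."""
    child = list(order)
    n = len(child)
    length = rng.randrange(1, n)            # group size between 1 and n - 1
    a = rng.randrange(n - length + 1)       # start of the group
    segment, rest = child[a:a + length], child[:a] + child[a + length:]
    j = rng.choice([p for p in range(len(rest) + 1) if p != a])
    return rest[:j] + segment + rest[j:]

def reverse(order, rng):
    """Reverse a random contiguous group of at least two teams."""
    child = list(order)
    a, b = sorted(rng.sample(range(len(child)), 2))
    child[a:b + 1] = child[a:b + 1][::-1]
    return child

rng = random.Random(0)
parent = [3, 1, 4, 2, 0]
print(borda_count([[0, 1, 2], [1, 0, 2], [0, 2, 1]]))
print(flip(parent, rng), insert(parent, rng))
```

Passing an explicit `random.Random` instance keeps runs reproducible, which helps when comparing how the individual operators affect convergence.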

After an offspring is formed, its fitness is checked. If the fitness is better than that of the worst parent solution, the offspring becomes part of the parent population and the worst parent is removed. This allows the population to always move toward improved hillside form.
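The replacement rule can be written in a few lines. In this sketch, fitness is the number of violations, so lower is better; the function name and the parallel-list representation are assumptions.

```python
def try_replace_worst(population, fitness, child, child_fitness):
    """Replace the worst member with the child if the child is fitter.

    population and fitness are parallel lists; fitness values are violation
    counts, so a smaller value is better. Returns True on replacement.
    """
    worst = max(range(len(population)), key=lambda m: fitness[m])
    if child_fitness < fitness[worst]:
        population[worst] = child
        fitness[worst] = child_fitness
        return True
    return False

population = [[0, 1, 2], [1, 0, 2], [2, 1, 0]]
fitness = [3, 5, 9]
accepted = try_replace_worst(population, fitness, [0, 2, 1], 4)
rejected = try_replace_worst(population, fitness, [2, 0, 1], 7)
print(accepted, rejected, fitness)
```

Because a child is accepted only when it beats the current worst member, the average fitness of the population never gets worse from one step to the next.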

3.2.5. Stopping Criteria

Texts suggest many different ways to select a stopping criterion (Fogel and Michalewicz, 2004). Our EO algorithm uses the average change of the average fitness of the parent population. Every time a new offspring is added to the population, the average fitness is calculated. We then calculate the average change of these fitness values. The user sets a tolerance level for the average change to fall below. To allow the algorithm ample chance to escape a local minimum, the average fitness change has to fall below this tolerance level five times in a row in order for the algorithm to stop. Upon termination, the best parent solution in the population is the ranking.
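One way to realize this stopping rule is sketched below. The text does not say over how many values the average change is taken, so the sliding window of recent average-fitness values is an assumption, as are the class and parameter names; only the five-in-a-row requirement comes from the text.

```python
from collections import deque

class StoppingRule:
    """Stop once the average change in average fitness has stayed below a
    tolerance for `patience` consecutive updates."""

    def __init__(self, tolerance, window=10, patience=5):
        self.tolerance = tolerance
        self.patience = patience
        self.history = deque(maxlen=window)  # recent average-fitness values
        self.streak = 0

    def update(self, avg_fitness):
        """Record a new average fitness; return True when EO should stop."""
        self.history.append(avg_fitness)
        if len(self.history) < 2:
            return False
        values = list(self.history)
        changes = [abs(b - a) for a, b in zip(values, values[1:])]
        avg_change = sum(changes) / len(changes)
        if avg_change < self.tolerance:
            self.streak += 1
        else:
            self.streak = 0                  # must be five times in a row
        return self.streak >= self.patience

rule = StoppingRule(tolerance=0.5, window=3)
signals = [rule.update(v) for v in [100, 90, 80] + [80] * 6]
print(signals)
```

Resetting the streak whenever the average change rises above the tolerance is what gives the population a chance to escape a plateau before the run is ended.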

Figure 1 gives a global view of the EO algorithm applied to the MVR problem.

3.3. Summary of EO for the MVR problem

We pause to consider the pros and cons of EO when compared to the alternative solution techniques from mathematical programming, described in Section 2.

Pros:

• EO is easy to understand and requires no specialized knowledge of mathematical programming.

• EO is easy to code and requires no specialized industrial software.[2]

• EO is flexible and adaptable. For instance, a user can easily implement multi-objective functions or secondary or tertiary objectives by changing the fitness function.

• EO can handle big datasets because it scales up well and can be parallelized.

• Early termination of an EO algorithm gives meaningful results that are at least locally optimal.

Cons:

• The EO solution is usually only locally optimal. Unlike the mathematical programming methods of Section 2, there is no guarantee that the solution is globally optimal or within some percentage of globally optimal.

• No sensitivity analysis is available with the EO method.

However, it is possible to use the EO solution to initialize or provide good bounds for the mathematical programming methods, thereby greatly reducing their runtimes, and enabling optimality guarantees and sensitivity analysis.

[2] Our EO code for the MVR problem is available upon email request.

4. Small Example of EO vs. BILP

For this section, we will use the 2009-2010 Southern Conference (SoCon) Men’s Basketball data. For this dataset, a total of 119 games were played, with each team playing every other team approximately twice. The results shown use the Davg data matrix formulation. The height of each bar in the plots represents the magnitude of the corresponding entry in the matrix. The top plot shows the matrix reordered with the actual SoCon standings, the middle plot shows the matrix reordered with the results from EO, and the bottom plot shows the matrix reordered with the results from the BILP. For the EO, only one initialization was used, with results from Massey, Colley, and mHits, and all types of mating and mutating described in Section 3 were used. Accompanying these plots are tables with the results.


Figure 2: SoCon Standings, EO Results, and BILP Results on 2009-2010 SoCon data.

You will notice that the EO and BILP rankings differ only in the last two teams. What is interesting about this is that the number of violations found for both the EO and BILP is 436. Since both objective values were the same but the rankings were different, we know there must be multiple optimal solutions for this SoCon example.

5. Large Experiments

This section explores some larger examples for both EO and BILP.

5.1. Datasets of Varying Size

We tested the runtime and number of violations for datasets of different sizes for each of the algorithms. The first dataset is all of the data from the 2009-2010 NCAA Division I basketball season with 347 total teams. The second dataset is all of the data from the 2009 NCAA football season with 634 total teams. The third dataset is all of the data from the 2009-2010 NCAA basketball season with 1041 total teams. The fourth dataset is all of the data from the 2009-2010 college basketball season with 2034 total teams. The runtimes in the table are in minutes.

        EO                         BILP/LP
n       Time (min)   Violations    Time (min)   Violations
347     0.35         1,161,919     0.66         1,147,912
634     4.2          2,054,127     106.8        1,538,490
1041    14.2         11,412,879    1343.8       9,311,502
2034    21.3         42,109,102    -            -

Table 2: Runtimes (in minutes) and number of violations for EO vs. BILP for problems of increasing size

Here we note some observations from Table 2.

• The EO number of violations for the n = 347 example is within 1.22% of the optimal BILP number of violations.

• The BILP took almost 100 times as many minutes as the EO to run for the n = 1041 example (1343.8 versus 14.2 minutes). The results are much better for the BILP, but one must consider the time factor in determining which algorithm to use.

• The BILP was allowed to run for 24 hours on the n = 2034 example and did not obtain a solution. We stopped the algorithm because the time was so much greater than that for the EO.
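The comparisons in these observations follow directly from Table 2. The short sketch below recomputes the relative gap in violations and the runtime ratio for the three sizes that the BILP solved; the dictionary layout and function names are hypothetical, and the numbers are copied from the table.

```python
# n: (EO violations, BILP violations, EO minutes, BILP minutes), from Table 2.
table2 = {
    347: (1_161_919, 1_147_912, 0.35, 0.66),
    634: (2_054_127, 1_538_490, 4.2, 106.8),
    1041: (11_412_879, 9_311_502, 14.2, 1343.8),
}

def gap_percent(eo_violations, bilp_violations):
    """Excess EO violations as a percentage of the BILP count."""
    return 100.0 * (eo_violations - bilp_violations) / bilp_violations

def time_ratio(eo_minutes, bilp_minutes):
    """How many times longer the BILP ran than the EO."""
    return bilp_minutes / eo_minutes

for n, (eo_v, bilp_v, eo_t, bilp_t) in table2.items():
    print(n, round(gap_percent(eo_v, bilp_v), 2),
          round(time_ratio(eo_t, bilp_t), 1))
```

The gap widens and the runtime ratio grows sharply as n increases, which is the trade-off between solution quality and time that the observations describe.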

5.2. March Madness

Both of these methods have been used for the past two years to predict the winner of each game of the NCAA Division I Men’s basketball tournament, which is often called March Madness. Before the tournament begins, many fans complete brackets predicting winners of each tournament game. Once the tournament begins, each correct prediction in a bracket accrues points. As such, a pool of brackets can be formed to compete for the highest score. A popular online pool is the ESPN Tournament Challenge, where over 4 million brackets were submitted for the 2009-2010 tournament. In this online pool, the brackets are scored so that correct guesses in later rounds earn the fan more points out of the total 1920 points possible. The following table shows results of using the various data matrices and methods to predict March Madness games. The final column gives the percentile ranking for the corresponding ESPN score. For example, the EO Pavg method with an ESPN score of 930 scored above 93.3% of the over 4 million brackets submitted by fans.

Method          ESPN Score   Percentile
EO Pavg         930          93.3
EO Psum         930          93.3
EO Rankagg      910          92.9
BILP Rankagg    900          92.7
EO Diffavg      850          90.9
EO Davg         780          86.7
EO Diffsum      750          84.1
EO Dsum         740          83.2
BILP Pavg       710          80.1
BILP Dsum       700          78.9
BILP Diffsum    660          72.2
BILP Diffavg    650          69.9
BILP Davg       570          44.2
BILP Psum       480          15.8

Table 3: Table of March Madness Results
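For readers unfamiliar with bracket scoring, the sketch below shows how a score out of 1920 can arise. The per-round point values are an assumption (a doubling scheme starting at 10 points per first-round game); the text states only the 1920-point total and that later rounds are worth more. The function names are hypothetical.

```python
# Assumed points per correct pick in each of the six rounds of a 64-team
# bracket; the text gives only the 1920-point maximum.
ROUND_POINTS = [10, 20, 40, 80, 160, 320]
GAMES_PER_ROUND = [32, 16, 8, 4, 2, 1]

def max_score():
    """Score of a perfect bracket under the assumed point values."""
    return sum(p * g for p, g in zip(ROUND_POINTS, GAMES_PER_ROUND))

def bracket_score(correct_per_round):
    """Score a bracket from the number of correct picks in each round."""
    for correct, games in zip(correct_per_round, GAMES_PER_ROUND):
        if not 0 <= correct <= games:
            raise ValueError("more correct picks than games in a round")
    return sum(p * c for p, c in zip(ROUND_POINTS, correct_per_round))

print(max_score())
# A hypothetical bracket: strong early rounds and a correct champion.
print(bracket_score([24, 10, 4, 2, 1, 1]))
```

Under this assumed scheme every round is worth the same 320 points in total, so a single correct champion pick counts as much as the entire first round.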

Let us note some observations from the table:

• Overall, the EO brackets did much better than the BILP brackets. The 2009-2010 season was filled with many upsets. This may explain why the local solutions of the EO better predicted the tournament. However, this is a very intriguing result that will need further investigation.

• The P and Rankagg data matrices performed the best.

• Almost all of the submissions were above the 50th percentile of all brackets submitted.

6. Conclusions

This paper presents two different methods for solving the same MVR problem. EO has a simple structure and can handle large datasets, but it gives a heuristic, often locally optimal value. The binary integer linear program gives a global optimum and can be converted to an LP, which helps with multiple optimal solutions and sensitivity analysis, but it has a large number of constraints that significantly affect the run time.

Our EO algorithm has become much more sophisticated since we began this research; however, there is more work that can be done. We need to update our code to use the definition of the C matrix to count the number of violations to hillside form. This should decrease the runtime. We can also analyze each of the mating and mutating algorithms to determine which lead to quicker convergence. Finally, there is a lot of user choice involved in initializing the algorithm. There are multiple ways to do this, using rankings from other methods or preprocessing the data itself. We believe that the EO algorithm can become a competitive method for ranking sports teams.

Acknowledgements

Dr. Chartier’s research was supported in part by a research fellowship from the Alfred P. Sloan Foundation. Dr. Langville’s research was supported in part by NSF grant CAREER-CCF-0546622. Dr. Yamamoto’s research was supported in part by the Grant-in-Aid for Scientific Research (B) 18310101 of the Ministry of Education, Culture, Sports, Science, and Technology of Japan.

We would also like to thank DASH for giving us use of the Xpress-MP Optimization Software through the Academic Partner Program.

References

Ali, I., Cook, W. D., and Kress, M. (1986). On the minimum violations ranking of a tournament. Management Science, 32(6):660–672.

Cassady, C. R., Maillart, L. M., and Salman, S. (2005). Ranking sports teams: A customizable quadratic assignment approach. INFORMS: Interfaces, 35(6):497–510.

Coleman, B. J. (2005). Minimizing game score violations in college football rankings. INFORMS: Interfaces, 35(6):483–496.

Colley, W. N. (2002). Colley’s bias free college football ranking method: The Colley matrix explained.

Fogel, D. B. and Michalewicz, Z. (2004). How to Solve It: Modern Heuristics. Springer.

Govan, A. Y., Langville, A. N., and Meyer, C. D. (2009). Offense-defense approach to ranking team sports. Journal of Quantitative Analysis in Sports, 5(1):1–17.

Langville, A. N., Pedings, K., and Yamamoto, Y. (2009). A minimum violations ranking method. Preprint.

Massey, K. (1997). Statistical models applied to the rating of sports teams. Bachelor’s thesis, Bluefield College.

Park, J. (2005). On minimum violations ranking in paired comparisons.

Reinelt, G., Grotschel, M., and Junger, M. (1984). A cutting plane algorithm for the linear ordering problem. Operations Research, 32(6):1195–1220.
