compiling and presenting ultimate datacutler/classes/... · an accessible presentation of currently...

10
Compiling and Presenting Ultimate Data Timothy Newman and Benjamin West May 2016 1 Abstract In this paper we present a tool to visu- alize the data publicly provided by one of Ultimate Frisbee’s professional leagues, the American Ultimate Disc League (AUDL). To our knowledge, there have only been a few attempts to visualize this data and there are zero efforts to produce some of the most ba- sic visuals. We have compiled all the data in one place and wrapped it in a nice tool to allow easy exploration. 2 Introduction Having detailed statistics for Ultimate is a very new concept[1]. USAU, the governing body for club ultimate, has tracked some ba- sic statistics like goals and D’s for a little while. These are limited, and they gener- ally only track them during Nationals, which leads to a small sample size. In 2012 two club leagues started - the American Ultimate Disc League (AUDL) and Major League Ultimate (MLU). With these has come far more detailed records of games. So far the only people we have seen explore this data have been the leagues themselves. For example, the AUDL displays the indi- vidual statistics for each player, a large and cluttered parallel coordinates system of all the players, and a visual of how often players passed to different team mates. What is lacking is a way to really explore these statistics in the basic ways we expect to be able to. Other sports have a wide va- riety of ways to do this, from compendiums of every game ever recorded[2], to news sites dedicated solely to covering them[3]. We aim to build one of the first tools in Ultimate to do this. Hopefully, this will allow players to ask and answer basic questions such as ”Who is the best at scoring?”, ”What are the stan- dards for good players?”, etc, and challenge the common wisdom in the sport today. To this end we have compiled all the data from the AUDL and wrapped it in a nice front end that allows for easy generation of basic graphics such as scatter plots and bar graphs. Even though we only have the basics done it has already prompted us to ask and answer questions we had never considered be- fore even though we have a collective 6 years of experience. Once our tool is a little more 1

Upload: others

Post on 29-May-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Compiling and Presenting Ultimate Datacutler/classes/... · an accessible presentation of currently avail-able data that was clean and intuitive, there would be sizable audiences

Compiling and Presenting Ultimate Data

Timothy Newman and Benjamin West

May 2016

1 Abstract

In this paper we present a tool to visu-alize the data publicly provided by one ofUltimate Frisbee’s professional leagues, theAmerican Ultimate Disc League (AUDL). Toour knowledge, there have only been a fewattempts to visualize this data and there arezero efforts to produce some of the most ba-sic visuals. We have compiled all the datain one place and wrapped it in a nice tool toallow easy exploration.

2 Introduction

Having detailed statistics for Ultimate is avery new concept[1]. USAU, the governingbody for club ultimate, has tracked some ba-sic statistics like goals and D’s for a littlewhile. These are limited, and they gener-ally only track them during Nationals, whichleads to a small sample size.

In 2012 two club leagues started - theAmerican Ultimate Disc League (AUDL) andMajor League Ultimate (MLU). With thesehas come far more detailed records of games.So far the only people we have seen explore

this data have been the leagues themselves.For example, the AUDL displays the indi-vidual statistics for each player, a large andcluttered parallel coordinates system of allthe players, and a visual of how often playerspassed to different team mates.

What is lacking is a way to really explorethese statistics in the basic ways we expectto be able to. Other sports have a wide va-riety of ways to do this, from compendiumsof every game ever recorded[2], to news sitesdedicated solely to covering them[3]. We aimto build one of the first tools in Ultimate todo this. Hopefully, this will allow players toask and answer basic questions such as ”Whois the best at scoring?”, ”What are the stan-dards for good players?”, etc, and challengethe common wisdom in the sport today.

To this end we have compiled all the datafrom the AUDL and wrapped it in a nicefront end that allows for easy generation ofbasic graphics such as scatter plots and bargraphs. Even though we only have the basicsdone it has already prompted us to ask andanswer questions we had never considered be-fore even though we have a collective 6 yearsof experience. Once our tool is a little more

1

Page 2: Compiling and Presenting Ultimate Datacutler/classes/... · an accessible presentation of currently avail-able data that was clean and intuitive, there would be sizable audiences

polished we plan to release it to the generalcommunity and garner feedback.

3 What Is Ultimate?

Before we get too far it’s important to givesome context. Ultimate is a field sport playedwith seven players per side. Just like in foot-ball, you score by catching the disc in theendzone you are attacking. Players may notrun with the disc, and it is advanced by pass-ing it to your teammates. This must be donewithin ten seconds, or it is a turnover. If thedisc touches the ground while a player is notholding it, only the other team may pick itup. To start a point, one team throws to theother - like a kickoff in football.

These rules result in statistics frequentlyseen in other sports such as basketball, foot-ball, and soccer. The current big four statis-tics that get the most talk are goals, assists,Ds (when you cause a turnover on defense),and turnovers. There are stats that everyoneknows about but less people track because itis more time consuming such as number ofthrows and completion percentage. Lastly,Ultimate has some plays unique to itself. Anexample is a Callahan, which is when youscore off of the other team’s throw.

The bottom line is that while Ultimate hasa unique combination of stats, most of themare pretty similar to other well-known gen-eral sports metrics. There are not yet anystatistics that are of comparable complex-ity to WAR from basketball or VORP frombaseball[7][8].

Figure 1: Photo credit Alain Hwang. Playcredit Andrew Yale, Data Structures TA

4 Formulation

Before we describe the tool, it is important todescribe our exact motivation for creating thetool. Understanding why we created it andwho we created it for should explain some ofthe decisions we made.

4.1 Motivation

We touched on this briefly above, but it isworth elaborating on. Ultimate is in the mid-dle of one of its formative stages. Between theflood of youth programs, the creation of proleagues, and USAU signing a broadcast dealwith ESPN, it is clear that the sport is start-ing to gain traction in the general public. Asthis happens more and more people are goingto want to talk about it.

We aren’t saying that easy access to stats

2

Page 3: Compiling and Presenting Ultimate Datacutler/classes/... · an accessible presentation of currently avail-able data that was clean and intuitive, there would be sizable audiences

is necessary to conversation. Before statis-tics in sports became mainstream, people cer-tainly had no trouble talking about them forhours on end. But to argue that an interest insporting numbers is only for a niche group ofnerds would be completely ignoring the behe-moth that is fantasy football. Knowing howthe best in the sport are doing is clearly en-ticing for a lot of people.

We don’t expect the only interest will comefrom diehard fans though. Ultimate is in afunny place right now where the highest levelof competition is Club. This means anyonecan dream of competing at the highest level,or at the very least hope to play against thebest a couple times. The Ultimate commu-nity is full of extremely dedicated amateursprobably looking to improve their own game.And one of the best ways to improve is to seewhat the best are doing.

To wrap up why we wanted to create thistool: We expect that if there already existedan accessible presentation of currently avail-able data that was clean and intuitive, therewould be sizable audiences coming from mul-tiple places within the Ultimate community.This is supported both by Ultimate’s struc-ture as a sport, and from experience withother sports.

4.2 Audience

Even though it may be obvious from the mo-tivation section, it is important to define whothe audience is. We are restricting to peoplealready entrenched in the community with agood understanding of the game. We aren’tintending for this to draw new players in.

Furthermore, we are restricting to people whoalready have some understanding of numbersand charts. It is more important to allowaccess to these numbers to those who mighthave good ideas on how to use it than itis to try and rope everyone in at once andchange the mindset around the whole game.We’ll get into this in the next section, but thestatistics available definitely aren’t the be-allend-all and there is a lot of work to be donebefore they are useful to a wide audience.

4.3 Goals

Our goals will go hand in hand with our mo-tivation and our audience. We want to cou-ple the available data to some clean and ba-sic visualizations that are easy to create forthose who already have a little backgroundknowledge. To what specifics does this endup translating, though? We have a list be-low.

4.3.1 Get The Basics Done

Basic chart types should be supported. Fornow these include bar charts, scatter plots,and a view of all statistics on an individualplayer (this last one coming by our presenta-tion on Tuesday).

4.3.2 Ease Of Creation

These charts should be easy for a user to cre-ate and alter. Without this feature, no actualexploration will occur.

3

Page 4: Compiling and Presenting Ultimate Datacutler/classes/... · an accessible presentation of currently avail-able data that was clean and intuitive, there would be sizable audiences

4.3.3 Give Them Everything

All the data that we have parsed should beaccessible within the page. We aren’t tryingto show people where to look.

4.3.4 Leave It Up To Them

There should be flexibility in which statisticsto plot. The user should be able to decidewhich metrics they think are most interest-ing.

4.3.5 Keep It Simple

The interface should be as minimal as pos-sible without causing inconvenience. Thiswill not be the tool to create elaborate plotsor generate complicated mathematics, andshould be as easy to learn as possible.

5 Previous Work

The previous work we have seen falls into twomajor categories: work done on Ultimate,and work done on other sports.

Outside of the AUDL’s own website therehas been two projects to gather and visualizeUltimate data that we can find[4]. UltiAppsis an app that allows players to track passeson the field to later get a good idea of wherethe disc tends to go in a typical point. It re-sulted in a paper at SLOAN [?] and an affir-mation of some of the most basic knowledgein the game-having the disc in the center ofthe field is valuable. Compared to what weare trying to do, it is aimed at an entirelydifferent area. It is for players to track their

own games, and it’s best visual is a heatmapof the field.

There is also Ultistats. This is a mobilesite meant to assist teams in tracking theirown statistics, and produces different visualsthen we do. Again, its main goal is to assistteams in keeping a record. This is a differentgoal from ours, which is to allow players toexplore a large amount of data generated byexcellent players to learn about the game ingeneral.

Outside of Ultimate there is a huge amountof previous work in sports data. Tim followsthe NBA heavily, and Ben is just interestedin sports in general. We have obviously bothbeen influenced by the reporting that goes onduring games broadcasts and in the generalmedia.

FiveThirtyEight in particular has done alot of good statistics based reporting onsports, and produced a lot of good visuals.While they have some really out their stuffin terms of visualizations, their fall backs arebar graphs and scatter-plots. The amount ofinformation they convey with these two ba-sics leads us to believe we could don’t haveto jump straight into the deep end.

Grantland did some great reporting andmade some cool visualizations that were lessstandard[11]. They would be a good place tolook for inspiration after we’ve got the basicsdone.

We also did some research on what kindsof statistics were available and how they werepresented during the 90’s and early 2000’s.This was a time that numbers were start-ing to gain traction, but still in their earlystages. We referenced Statistics in Sport to

4

Page 5: Compiling and Presenting Ultimate Datacutler/classes/... · an accessible presentation of currently avail-able data that was clean and intuitive, there would be sizable audiences

get a good idea of the landscape during thistime. What we saw was that the statisticswere still really basic, and the displays ofthem easily so. A lot of the discussion wasof how to either generate or combine the ba-sic statistics to make them more meaningful.We aim to mirror this. It would be better tocreate something that allows people to thinkcritically about the information we have thanto generate more information before we havedone any thinking.

We also took some early inspiration fromNBAVis, a site that does something similarto ours but for the NBA. It’s less fleshed outwith features, but still a good tool and itwould be wrong not to mention them.

6 Research Question And

Hypothesis

To state it explicitly: our research question iswhether a tool could be created that would al-low the ultimate community to interact withtheir data in a way they find meaningful. Ourhypotheses is that people will feel that ourtool allows them to generate and answer in-teresting questions, and get a better under-standing of how well certain players perform.

7 Description

7.1 The Visual

For the visual it was important to us thatwe have as little clutter on the screen as pos-sible. Where ever possible we did work to

either hide information when not needed, ordo work behind the scenes. Because neitherof us was that familiar with web developmentsome of the data binding and actual dynamicstuff took a while, but we are happy withboth the results and the skills we picked up.

The basics of the interface are the graphand the selection menu. The graph type canbe changed via a drop down menu, and closeto the axes are menus for selecting the stats.On the actual graph the labeling is minimalbecause we have enabled hovering over datapoints to get more information about them.

The selection menu is on the right. It con-tains the teams and players available to beselected. To add something to the graph findthem here and click on them. Their namewill be highlighted and the data automati-cally plotted. To remove them click on theirname again.

Figure 2: Here is our layout. Graph on theleft, selection on the right, and menus up top

This interface may seem rather simple, butthat was the whole point. We want anyoneto be able to use and understand it. With as

5

Page 6: Compiling and Presenting Ultimate Datacutler/classes/... · an accessible presentation of currently avail-able data that was clean and intuitive, there would be sizable audiences

basic a layout and as much work done aheadof time as possible we hope we have accom-plished this.

Originally we wanted to color data pointsbased on team colors, but we soon realizedthat team jerseys are too messy and theiraren’t yet defining colors attached to teams.Because of this we decided a two color schemeof blue and gold to avoid making people thinkthey had found information where there isnone.

We have had one problem with the bargraph when adding large amounts of data.As you add more data the bar’s thickness be-comes a small amount of pixels, and at cer-tain amounts of data points it causes themto clump together in the middle. We haven’tfound a great solution to this, but making thegraph bigger helps a lot.

Figure 3: An example of the bars clumpingtogether because they want to be next to eachother but are only a couple pixels wide.

7.2 Creation

To create this visual we used D3.js. We owea lot of credit to Mike Bostock and all of hisextremely helpful tutorials and examples[6].We are happy with the result we got fromD3, but at many times it was a pain to use.We are developing on both Ubuntu and MacOSX. Frequently something would work ononly one of our machines, or only on Firefoxbut not Chrome. Not only that, but outsideof the tutorials we found the documentationrather lacking. There wasn’t an equivalent toman pages that we could find, which meantthat every time we had a problem or wantedto use a function it was difficult to get actualspecs.

As far as gathering the data, we exploreda couple possibilities. The first was to ripthe data from the HTML on the sites them-selves. The AUDL’s website, however, usesAngularJS to fill its pages with data afterthey load, which makes web scraping verytricky. We ended up downloading the play-by-play data for each team and tabulatingthis into the statistics we thought were im-portant. This included pretty much everystatistic we had ever seen tracked for Ulti-mate. We did this in python, and wrote theresults to a results into a csv that we laterread with javascript.

This tabulation took some work, as we hadto write code for any stat that we wanted, anddeal with a lot of edge cases in the data for-matting. Now that we have an infrastructureset up though, it takes much less work to addstats as we need them so it is a good thingwe went this way instead of going only with

6

Page 7: Compiling and Presenting Ultimate Datacutler/classes/... · an accessible presentation of currently avail-able data that was clean and intuitive, there would be sizable audiences

the stats calculated by the AUDL.

While writing code to gather these statis-tics we created a visual debugging tool toshow how successfully we were handling thefiles. A green line represents a success and ared line represents a failure.

Figure 4: A visual of how well we processed afile. Green is good, red is a line we got noth-ing from. Most of the red lines are actuallyat cessations for validation of our tracking.These looked much more red earlier in devel-opment.

During the creation process we got someuseful feedback from the class. They gave ussome helpful tips on how to solve some of theproblems we experienced with D3, as well asuseful feedback on what features and layoutsthey considered the most important. Unfor-tunately, none of the classmates we talked tofollowed any sports so they didn’t give us alot of feedback on whether this was somethingour target audience might actually use.

8 Preliminary Insights

Though we have only had access to the toolfor a small amount of time, and it is still un-dergoing development, we have already cometo a couple interesting conclusions. We dis-cuss these as an illustration of the toolspower.

While looking at the number of toucheseach player got it quickly became apparentthere was not the divide between handlersand cutters we might have expected. Han-dlers are sort of like the quarterbacks of ul-timate - it is their job to handle the disc.Cutters are like the receivers - it is their jobto generate yardage downfield. One might ex-pect that by looking at the number of posses-sions that certain players on a team had youwould be able to draw a line between the han-dlers and the cutters. And for some teams,this is true. But most of the time there is asmooth decline right down to the last man.And when one looks at all the teams at oncethere is a lack of any sudden drop-off any-where.

Figure 5: A plot of the number of touchesfor players on the Atlanta Hustle. Notice thesmooth decline with no real cliffs.

Maybe the better statistic to look at wouldbe touches/point. We don’t have the abilityto do this yet, but without this tool we neverwould have even asked the question.

Another thing we found that we wouldn’thave otherwise was that the leading goalscorer was someone we had never heard of

7

Page 8: Compiling and Presenting Ultimate Datacutler/classes/... · an accessible presentation of currently avail-able data that was clean and intuitive, there would be sizable audiences

Figure 6: A plot of the number of touches forplayers acroos all of the AUDL. Again, noticethe very smooth decline with no cliffs.

before (though the second most prolific wasBeau Kittredge, widely regarded as one ofthe best in the game). Not only that, heaccomplished the feat on a freakishly smallnumber of touches. During the 2015 seasonEthan Beardsley scored 70 goals on only 140touches. This made him a pretty extremeoutlier.

Figure 7: A scatter-plot of catches vs. goals.Far in the top left we see Ethan Beardsley,who scores pretty much every other time hetouches the disc.

9 Bloopers

During our design process we definitely en-countered some bugs, and some things we hadnot planned for. We’ve put some of themhere.

Figure 8: y-axis destruction. There are alsono teams named ”name”. The bar chart isnot sorted

Figure 9: There were initially many lines ofcode that we failed to parse correctly.

10 Limitations

There are some definite limitations of ourtool. The main one being that users a lim-ited to only a couple basic visuals, and ifthey feel that a different graph would be bet-ter that is too bad. This may be less truein other tools, but the time taken to createa graph is also much longer. Another limi-tation is that currently we don’t have it set

8

Page 9: Compiling and Presenting Ultimate Datacutler/classes/... · an accessible presentation of currently avail-able data that was clean and intuitive, there would be sizable audiences

up such that users could easily feed in theirown data. It could be done by someone tech-nically savy enough to run a python script,but that is a higher barrier to entry thanwe would like. This is something we couldchange by allowing people to upload stats,but there is a site that does that already (Ul-tiAnalytics) that it doesn’t make too muchsense to compete with. Lastly, because ofthe fact that the original data is play by playusers are limited to statistics that we have al-ready calculated. We plan to expand this alittle in future work, but are only able to goso far with it. With the statistics we do havethere are some things we can’t do that maybe important. For example, we don’t pro-duce what the correlation between two statsis when looking at them on the plot. Thereare lot of things we could do here, but theymay clutter the interface. We want to gathersome user feedback before adding too manyfeatures. Lastly, there is a lot of data thatsimply isn’t available yet. The easiest exam-ple is that it would be great if the play byplay data included the location on the field,but it does not. There is no tracking of off-disc movement, and nothing like baseball haswhere they track ball velocity. If this wereever to be made available, our setup is notprimed to visualize it well.

11 Results and Future

Work

We haven’t run a user study, but so far weare happy with the results. Both of us gen-

uinely enjoy using the app, and have used itto ask and answer questions we had not pre-viously thought about. There are a couplefeatures we plan to add in the immediate fu-ture, perhaps even by Tuesday. These includea search bar for players and a way to navi-gate to the stats page of an individual playerby clicking at their data point on the graph.We would also like a way for players to crafttheir own statistics, e.g. touches/point. Af-ter we are done with this we plan to post oursite on Reddit’s Ultimate subreddit. Redditis currently the main place people go on theinternet to talk about Ultimate. From post-ing there we hope to get some exposure, andsome feedback about what people thoughtand what features we should add. If thefeedback was good, we would go about im-plementing them.

12 Division of Labor

In our project proposal we were docked somepoints for not being clear in the division oflabor. However, about 90% of the time wespent working on this project we were sittingat the same table coding together and lookingover each other’s shoulders. As a result bothof us were deeply involved in every part ofthe project, and we both did approximatelythe same amount of work.

9

Page 10: Compiling and Presenting Ultimate Datacutler/classes/... · an accessible presentation of currently avail-able data that was clean and intuitive, there would be sizable audiences

References

[1] Carl Bialik: Ultimate Frisbee Is In The Dark Ages Of Analytics - And It Wants ToEscape fivethirtyeight.com.

[2] Baseball Reference http://www.baseball-reference.com/.

[3] ESPN espn.go.com.

[4] Ulti Apps ultiapps.com.

[5] USA Ultimate http://www.usaultimate.org/index.html

[6] Mike Bostock Personal Site https://bost.ocks.org/mike/

[7] Fangraphs: What is WAR? http://www.fangraphs.com/library/misc/war/

[8] Baseball Prospectus: VORP http://www.baseballprospectus.com/glossary/index.php?search=VORP

[9] Jay Bennett Statistics in Sport Arnold Application of Statistics, 1998

[10] NBAVIS http://www.cc.gatech.edu/gvu/ii/sportvis/nbaVis/vis/index.html

[11] Grantland http://grantland.com/

10