
Page 1: Statistics 111 - Lecture 25 Advanced Statistical Research ...stjensen/stat111/lecture25.handout.pdfApril 23, 2015 Stat 111 - Lecture 25 - Baseball 1 Advanced Statistical Research in

April 23, 2015 Stat 111 - Lecture 25 - Baseball! 1

Advanced Statistical Research in Baseball

Statistics 111 - Lecture 25


Administrative Notes

•  Homework 7 due in recitation on Friday, April 24

•  Recitation on Friday, April 24th is mandatory

•  Recitation on Friday, May 1st is optional and will just be final exam review Q and A

•  My office hours on Tu April 28th are only 3-4pm

•  No office hours on Tu May 5th but instead I will hold office hours 3-5pm on Monday May 4th


Administrative Notes

•  Final Exam is Tuesday, May 5th (3-5pm)

•  Covers Chapters 1-8 and 10 in textbook

•  Bring ID cards to final!

•  Allowed: Calculators, double-sided 8.5 x 11 cheat sheet

•  List of additional textbook study problems will be posted

•  Rooms are same as the midterm:

Stat 111 Lecture | Last Name | Midterm Exam Room
11am – 12pm | Everyone | MEYERSON HALL B1
2 – 3pm | A-F | MEYERSON HALL B1
2 – 3pm | G-Z | COHEN HALL G17

Current Methods and Data Bayesball Model SAFE Future

Measuring Fielding in Baseball: Present and Future

Shane T. Jensen

Department of Statistics, The Wharton School, University of Pennsylvania

April 23, 2015


Quantifying Fielding Performance in Baseball

Overall goal: accurate evaluation of the fielding performance of each major league baseball player

Historical Method: Errors

•  Errors only penalize bad plays; there is no corresponding reward for good plays

•  No accounting for the relative difficulty of each play

Historical Method: Fielding Percentage

•  Percentage of time a player properly handles the ball

•  Ambiguity in the denominator: players with poor range could have a high FP because they reach fewer balls and so have fewer opportunities to make errors

Need to take into account the relative difficulty of individual balls-in-play (BIP)
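To see the denominator problem concretely, here is a small sketch using the standard definition FP = (PO + A) / (PO + A + E), where PO, A and E are putouts, assists and errors. The two players' season totals are hypothetical, chosen only to illustrate the point.

```python
def fielding_percentage(putouts: int, assists: int, errors: int) -> float:
    """Fielding percentage: (PO + A) / (PO + A + E)."""
    chances = putouts + assists + errors
    if chances == 0:
        raise ValueError("no chances recorded")
    return (putouts + assists) / chances


# Hypothetical season totals for two shortstops (made-up numbers).
# The rangy fielder converts more balls into plays but also reaches
# more hard chances, some of which become errors.
rangy = fielding_percentage(putouts=240, assists=480, errors=20)
limited = fielding_percentage(putouts=180, assists=360, errors=8)

# "Plays made" here is simply PO + A.
print(f"rangy:   FP = {rangy:.3f}, plays made = {240 + 480}")
print(f"limited: FP = {limited:.3f}, plays made = {180 + 360}")
```

The limited fielder ends up with the higher FP (about .985 vs. .973) while making 180 fewer plays, which is exactly the ambiguity described above.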


Available Data

Ball-in-play data available from Baseball Info Solutions

Each season has ≈120,000 balls-in-play (BIP)

I have worked with BIP data from 2002-08 (seven seasons)

Three BIP types: 42% grounders, 33% flies, 25% liners

BIP velocity is recorded as an ordinal category

[Figure: two scatter plots of BIP locations, "Flyballs Caught by CF" and "Flyballs Not Caught by CF"; X Coordinate (−300 to 300) vs. Y Coordinate (0 to 400).]


Current Methods: Ultimate Zone Rating

Ultimate Zone Rating: divides the field up into zones and tabulates successes/failures of each fielder within zones


Current Methods: Ultimate Zone Rating cont’d

Difference between each fielder's success rate and the average success rate calculated for each zone

Differences weighted by run value and then aggregated across zones for an overall rating

Advantage: UZR zones are a proxy for the difficulty of each BIP

Advantage: runs saved/cost is an easy-to-interpret scale

Disadvantage: zones are an ad hoc discretization of the continuous fielding surface
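As a rough illustration of the zone bookkeeping described above, here is a minimal sketch. It is a simplified zone rating, not the actual UZR methodology (which applies further adjustments not covered here); the `Zone` fields, league rates, run values, and counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    league_rate: float  # average success rate in this zone over all fielders at the position
    run_value: float    # hypothetical runs swung by turning a BIP in this zone into an out
    chances: int        # BIP hit into this zone while our fielder was on the field
    outs: int           # how many of those he converted into outs


def zone_rating(zones: list[Zone]) -> float:
    """Simplified zone-based rating in runs (not the full UZR method)."""
    total = 0.0
    for z in zones:
        # (fielder rate - average rate) * chances = extra outs in this zone;
        # multiplying by the zone's run value converts extra outs into runs.
        extra_outs = z.outs - z.chances * z.league_rate
        total += extra_outs * z.run_value
    return total


# Made-up zones: routine, in-between, and hard-to-reach.
zones = [
    Zone(league_rate=0.90, run_value=0.75, chances=200, outs=184),
    Zone(league_rate=0.50, run_value=0.80, chances=60, outs=33),
    Zone(league_rate=0.10, run_value=0.85, chances=40, outs=2),
]
print(f"{zone_rating(zones):+.2f} runs")
```

Here the fielder gains about 3.0 and 2.4 runs in the first two zones and loses 1.7 in the hardest one, for a total near +3.7 runs. The fixed zone boundaries are the ad hoc discretization the slide criticizes.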


Other Current Methods

Plus-Minus system (John Dewan): uses zones like UZR

•  Average success rate $s_k$ calculated within each zone $k$

•  Fielder gets a credit of $1 - s_k$ for each successful play and a debit of $-s_k$ for each unsuccessful play in the zone

•  Aggregating over zones gives the plus-minus value

•  Version with run values: defensive runs saved (DRS)

Probabilistic Model of Range (David Pinto): uses angles to represent BIP direction (instead of zones)

•  Predicted outs for each direction calculated over all players

•  Actual outs for each direction calculated for individual players and compared to predicted

•  Different PMR charts for grounders vs. liners vs. flies

Big Zone Metric (Peter Jensen):

•  Uses publicly available MLB Gameday data instead of BIS data

•  Data are lower resolution, so larger zones are used
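A minimal sketch of the plus-minus credit/debit rule, together with a PMR-style tally of actual vs. predicted outs, assuming each play is recorded as a hypothetical `(fielder, bin, success)` tuple where `bin` stands for either a zone (plus-minus) or a direction (PMR). Average success rates are computed from the same plays, and the data are made up.

```python
from collections import defaultdict

# Each play: (fielder, bin, success) with success = 1 for an out, 0 otherwise.
# "bin" is a hypothetical label for a zone (plus-minus) or a direction (PMR).


def league_rates(plays):
    """Average success rate s_k in each bin, pooled over all fielders."""
    made = defaultdict(int)
    total = defaultdict(int)
    for _, k, success in plays:
        made[k] += success
        total[k] += 1
    return {k: made[k] / total[k] for k in total}


def plus_minus(plays, fielder):
    """Credit 1 - s_k for each play made, debit -s_k for each play missed."""
    s = league_rates(plays)
    score = 0.0
    for who, k, success in plays:
        if who == fielder:
            score += (1.0 - s[k]) if success else -s[k]
    return score


def pmr_table(plays, fielder):
    """PMR-style tally per bin: (actual outs, predicted outs from league rates)."""
    s = league_rates(plays)
    actual = defaultdict(int)
    predicted = defaultdict(float)
    for who, k, success in plays:
        if who == fielder:
            actual[k] += success
            predicted[k] += s[k]
    return {k: (actual[k], predicted[k]) for k in predicted}


# Made-up plays for two fielders, A and B, in two bins.
plays = [
    ("A", "hole", 1), ("A", "hole", 1), ("A", "hole", 0),
    ("B", "hole", 0), ("B", "hole", 0), ("B", "hole", 1),
    ("A", "routine", 1), ("A", "routine", 1),
    ("B", "routine", 1), ("B", "routine", 0),
]
print(plus_minus(plays, "A"), plus_minus(plays, "B"))
print(pmr_table(plays, "A"))
```

Because credits and debits are measured against the pooled rate, the plus-minus values of all fielders in a bin sum to zero. That is why the two made-up fielders come out at +1 and −1.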


Continuous Fielding Curves

Zone-based methods break up the field into discrete bins for computational convenience

High-resolution data could also be used to fit smooth fielding curves to the continuous playing surface

An even more sophisticated approach embeds smooth fielding curves within a Bayesian hierarchical model

Allows for principled sharing of information within and between individual players
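To make "sharing of information" concrete, the sketch below uses the simplest hierarchical setup: a normal-normal model for a single player-level parameter. This is a toy stand-in chosen for illustration, not the lecture's model; `prior_mean` and `prior_sd` play the role of position-level parameters, and all numbers are hypothetical.

```python
def shrink(estimate: float, std_error: float, prior_mean: float, prior_sd: float) -> float:
    """Posterior mean of a player-level parameter under a normal-normal model:
    estimate ~ N(theta, std_error**2) and theta ~ N(prior_mean, prior_sd**2).
    The result is a precision-weighted average of the player's own estimate
    and the position-level mean."""
    w_data = 1.0 / std_error**2   # precision of the player's own data
    w_prior = 1.0 / prior_sd**2   # precision of the position-level prior
    return (w_data * estimate + w_prior * prior_mean) / (w_data + w_prior)


# Hypothetical numbers: same raw estimate, very different amounts of data.
everyday = shrink(estimate=2.0, std_error=0.5, prior_mean=0.0, prior_sd=1.0)
part_time = shrink(estimate=2.0, std_error=2.0, prior_mean=0.0, prior_sd=1.0)
print(f"everyday player: {everyday:.2f}, part-time player: {part_time:.2f}")
```

Both players start from the same raw estimate, but the part-time player's noisier estimate is pulled much further toward the position mean (0.4 vs. 1.6). This is the behaviour a hierarchical model provides automatically.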


Count Data

The outcome of each play is either a success or failure:

$$S_{ij} = \begin{cases} 1 & \text{if the } j\text{th BIP hit to the } i\text{th player leads to an out} \\ 0 & \text{if the } j\text{th BIP hit to the } i\text{th player leads to a hit} \end{cases}$$

Observed successes and failures are modeled as binary (Bernoulli) outcomes from an underlying probability $p_{ij}$

Each $p_{ij}$ is a function of the available data for that BIP: location $(x, y)_{ij}$, velocity $V_{ij}$, and type of BIP

These probability functions will be smooth parametric curves that can vary between different players
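Under this binary-outcome model each $S_{ij}$ is a Bernoulli draw with success probability $p_{ij}$. Assuming outcomes are independent given their probabilities, a player's log-likelihood is the sum of $\log p_{ij}$ over outs plus the sum of $\log(1 - p_{ij})$ over hits. A minimal sketch with made-up outcomes and probabilities:

```python
import math


def bernoulli_log_likelihood(outcomes, probs):
    """log L = sum_j [S_j log p_j + (1 - S_j) log(1 - p_j)] for one player,
    assuming outcomes are independent given their probabilities."""
    if len(outcomes) != len(probs):
        raise ValueError("need one probability per BIP")
    total = 0.0
    for s, p in zip(outcomes, probs):
        if s not in (0, 1):
            raise ValueError("outcomes must be 0 (hit) or 1 (out)")
        if not 0.0 < p < 1.0:
            raise ValueError("probabilities must lie strictly between 0 and 1")
        total += math.log(p) if s == 1 else math.log(1.0 - p)
    return total


# Made-up example: three BIP with model probabilities of an out.
print(bernoulli_log_likelihood([1, 0, 1], [0.9, 0.2, 0.6]))
```

Fitting a smooth curve for $p_{ij}$ amounts to choosing curve parameters that make this quantity large.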


Representation for Different BIP Types

•  Two-dimensional curves needed for flies/liners: success depends on velocity, direction and distance to BIP

•  One-dimensional curves needed for grounders: success depends on velocity, direction and angle to BIP

[Figure: two panels, X Coordinate vs. Y Coordinate. "Flyballs and Liners": CF location at (0,324), a BIP location, the distance between them, and the forward/backward directions. "Grounders": SS location, a grounder trajectory, the angle θ between them, and the left/right directions.]
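The sketch below shows one way the covariates in these figures could be computed from BIP coordinates. Several conventions are assumptions not stated in the lecture: home plate at (0, 0) with y increasing toward center field, "forward" meaning the ball lands closer to home plate than the fielder's starting point, and "left" meaning the first-base side of a fielder facing home plate. The CF starting point (0, 324) is taken from the figure; the SS location in the example is hypothetical.

```python
import math

# Assumed coordinate system: home plate at (0, 0), y increasing toward
# center field, positive x toward the first-base side.
CF_START = (0.0, 324.0)  # CF location shown in the lecture's figure


def fly_ball_covariates(bip_xy, fielder_xy=CF_START):
    """Return (D, F): distance from the fielder's start to the BIP, and
    F = 1 if the ball lands closer to home plate than the fielder started
    (our assumed meaning of 'forward'), else 0."""
    distance = math.hypot(bip_xy[0] - fielder_xy[0], bip_xy[1] - fielder_xy[1])
    forward = 1 if math.hypot(*bip_xy) < math.hypot(*fielder_xy) else 0
    return distance, forward


def direction_degrees(xy):
    """Angle of a point as seen from home plate, measured from the +y axis
    (straightaway center), positive toward the first-base side."""
    return math.degrees(math.atan2(xy[0], xy[1]))


def grounder_covariates(bip_xy, fielder_xy):
    """Return (theta, L): absolute angle in degrees between the grounder's
    path and the fielder's direction from home plate, and L = 1 if the ball
    is toward the first-base side (the fielder's left as he faces home,
    an assumed convention), else 0."""
    signed = direction_degrees(bip_xy) - direction_degrees(fielder_xy)
    return abs(signed), 1 if signed > 0 else 0


# Example: a fly ball landing in front of the CF and toward right field,
# and a grounder up the middle with a hypothetical SS start at (-60, 120).
print(fly_ball_covariates((30.0, 284.0)))
print(grounder_covariates((0.0, 100.0), (-60.0, 120.0)))
```

Because the fielder's true starting position is unknown, every distance and angle computed this way depends on the assumed starting point.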


Logistic regression for each smooth curve

Logistic regression used to model smooth curves for the probability $p_{ij}$ of successfully fielding BIP $j$ by player $i$

Logistic regression for fly-balls/liners:

$$\log\left(\frac{p_{ij}}{1 - p_{ij}}\right) = \beta_{i0} + \beta_{i1} D_{ij} + \beta_{i2} D_{ij} F_{ij} + \beta_{i3} D_{ij} V_{ij}$$

$D_{ij}$ = distance to BIP, $V_{ij}$ = velocity, $F_{ij}$ = 1 if forward (vs. backward)

Logistic regression for grounders:

$$\log\left(\frac{p_{ij}}{1 - p_{ij}}\right) = \beta_{i0} + \beta_{i1} \theta_{ij} + \beta_{i2} \theta_{ij} L_{ij} + \beta_{i3} \theta_{ij} V_{ij}$$

$\theta_{ij}$ = angle to BIP, $V_{ij}$ = velocity, $L_{ij}$ = 1 if left (vs. right)
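Evaluating the two logistic curves is straightforward once coefficients are available. In this minimal sketch the coefficient values are hypothetical (not estimates from the lecture's data), distances are assumed to be in feet, and velocity is assumed to be coded as an integer ordinal category (1, 2, 3).

```python
import math


def logistic(eta: float) -> float:
    """Numerically stable inverse logit."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def p_fly(beta, distance, forward, velocity):
    """Fly/liner curve: logit p = b0 + b1*D + b2*D*F + b3*D*V."""
    b0, b1, b2, b3 = beta
    return logistic(b0 + b1 * distance + b2 * distance * forward + b3 * distance * velocity)


def p_grounder(beta, theta, left, velocity):
    """Grounder curve: logit p = b0 + b1*theta + b2*theta*L + b3*theta*V."""
    b0, b1, b2, b3 = beta
    return logistic(b0 + b1 * theta + b2 * theta * left + b3 * theta * velocity)


# Hypothetical coefficients for one player (not fitted values).
BETA_FLY = (4.0, -0.05, 0.01, -0.01)       # distances assumed in feet
BETA_GROUNDER = (3.0, -0.15, 0.05, -0.02)  # angles in degrees

for d in (0, 30, 60, 90):
    fwd = p_fly(BETA_FLY, d, forward=1, velocity=2)
    back = p_fly(BETA_FLY, d, forward=0, velocity=2)
    print(f"D = {d:2d} ft: P(out) forward {fwd:.2f}, backward {back:.2f}")
```

With a positive coefficient on $D_{ij} F_{ij}$, a ball hit in front of the fielder is more likely to be caught than one the same distance behind him. With a negative coefficient on $D_{ij} V_{ij}$, harder-hit balls are caught less often at a given distance.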


Individual Grounder Curves

Compare curves of individual fielders ($\hat{\boldsymbol\beta}_i$) to the aggregate model ($\hat{\boldsymbol\beta}_+$) for all fielders at that position

[Figure: "P(Success) for Everett, Jeter vs. average SS": P(Success) (0 to 1) vs. Degrees from SS, with 3rd Base, SS Location, 2nd Base and 1st Base marked; curves for Average, Jeter and Everett.]


Individual Fly/Liner Curves

Compare curves of individual fielders ($\hat{\boldsymbol\beta}_i$) to the aggregate model ($\hat{\boldsymbol\beta}_+$) for all fielders at that position


Numerical Summary of Overall Performance

Beyond comparing curves between players, we can derive an overall numerical estimate of fielder performance

SAFE: Spatial Aggregate Fielding Evaluation

For each player, aggregate differences between the individual curve (based on $\boldsymbol\beta_i$) and the overall curve (based on $\boldsymbol\mu$)

Aggregation done by numerical integration over a fine grid of values (1D grid for grounders, 2D grid for flies/liners)

Estimates and standard errors of $\boldsymbol\beta_i$ give us the mean and 95% confidence interval of SAFE for each player
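A minimal sketch of the 1D grounder case: the difference between an individual curve and the average curve is integrated over an angle grid with the midpoint rule, and an interval is obtained by simulating coefficient draws. The simulation treats the coefficients as independent normals centred at their estimates with the given standard errors, which is a simplifying assumption (a fully Bayesian fit would use posterior draws instead). All coefficients, the angle range, and the fixed velocity category are hypothetical. The result is in units of probability × degrees, not runs, because no weighting is applied yet.

```python
import math
import random
import statistics


def logistic(eta: float) -> float:
    """Numerically stable inverse logit."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def p_grounder(beta, signed_angle: float, velocity: int = 2) -> float:
    """Grounder curve; signed_angle in degrees from the SS, positive toward 1B (assumed)."""
    b0, b1, b2, b3 = beta
    theta = abs(signed_angle)
    left = 1 if signed_angle > 0 else 0
    return logistic(b0 + b1 * theta + b2 * theta * left + b3 * theta * velocity)


def safe_unweighted(beta_i, beta_avg, lo=-22.5, hi=67.5, n=180) -> float:
    """Midpoint-rule integral of p_i(angle) - p_avg(angle) over a 1D angle grid."""
    step = (hi - lo) / n
    total = 0.0
    for k in range(n):
        t = lo + (k + 0.5) * step
        total += (p_grounder(beta_i, t) - p_grounder(beta_avg, t)) * step
    return total


def safe_interval(beta_hat, beta_se, beta_avg, draws=1000, seed=1):
    """Mean and 95% interval of SAFE from simulated coefficient draws,
    treating coefficients as independent N(beta_hat, beta_se**2) (an assumption)."""
    rng = random.Random(seed)
    values = sorted(
        safe_unweighted([rng.gauss(m, s) for m, s in zip(beta_hat, beta_se)], beta_avg)
        for _ in range(draws)
    )
    lower = values[round(0.025 * draws)]
    upper = values[round(0.975 * draws) - 1]
    return statistics.fmean(values), (lower, upper)


# Hypothetical coefficients: average SS and a player with slightly less range.
BETA_AVG = (3.0, -0.15, 0.05, -0.02)
BETA_PLAYER = (3.0, -0.17, 0.05, -0.02)
BETA_SE = (0.20, 0.01, 0.01, 0.005)

point = safe_unweighted(BETA_PLAYER, BETA_AVG)
mean, (lower, upper) = safe_interval(BETA_PLAYER, BETA_SE, BETA_AVG)
print(f"point {point:+.2f}; simulated mean {mean:+.2f}, 95% interval ({lower:+.2f}, {upper:+.2f})")
```

A finer grid (larger `n`) trades run time for integration accuracy. For flies/liners the same idea applies on a 2D grid over field locations.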


Differential Weighting in SAFE

Our full aggregation also weights grid points by BIP frequency, run value, and shared consequence

[Figure: four panels plotted against Degrees from SS, with 3rd Base, SS Location, 2nd Base and 1st Base marked: (a) P(Success) for Jeter vs. Average; (b) Density Estimate of Grounder Angle; (c) Run Consequence for Grounders (roughly 0.50 to 0.65 runs); (d) Shared Responsibility of SS (responsibility fraction, 0 to 1).]

SAFE value: runs saved/cost of fielder vs. average
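The weighting can be sketched by multiplying each grid point's probability difference by a BIP density, a run value, and the fielder's share of responsibility, then scaling by a number of BIP to express the result in runs. Every weight function here (a normal bump for the angle density, a constant 0.55 runs, a responsibility fraction that fades toward second base) and the count `n_bip` are hypothetical placeholders, not the lecture's estimated curves. The exact way SAFE combines these weights may also differ.

```python
import math


def logistic(eta: float) -> float:
    """Numerically stable inverse logit."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def p_grounder(beta, signed_angle, velocity=2):
    """Grounder curve with angle in degrees from the SS (positive toward 1B, assumed)."""
    b0, b1, b2, b3 = beta
    theta = abs(signed_angle)
    left = 1 if signed_angle > 0 else 0
    return logistic(b0 + b1 * theta + b2 * theta * left + b3 * theta * velocity)


# Hypothetical weight functions (placeholders, not the lecture's estimates).
def bip_density(t):
    """Density of grounder angles: a normal bump centred at 20 degrees."""
    mu, sd = 20.0, 25.0
    return math.exp(-0.5 * ((t - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def run_value(t):
    """Runs at stake on a grounder at angle t (constant placeholder)."""
    return 0.55


def responsibility(t):
    """Share of credit/blame assigned to the SS; fades to 0 toward 2B."""
    return 1.0 if t < 30.0 else max(0.0, 1.0 - (t - 30.0) / 30.0)


def safe_weighted(beta_i, beta_avg, n_bip, lo=-22.5, hi=67.5, n=360):
    """Runs saved (+) or cost (-) vs. average: midpoint-rule sum of the
    weighted probability difference, scaled by the number of BIP n_bip."""
    step = (hi - lo) / n
    total = 0.0
    for k in range(n):
        t = lo + (k + 0.5) * step
        diff = p_grounder(beta_i, t) - p_grounder(beta_avg, t)
        total += diff * bip_density(t) * run_value(t) * responsibility(t) * step
    return n_bip * total


BETA_AVG = (3.0, -0.15, 0.05, -0.02)
BETA_PLAYER = (3.0, -0.17, 0.05, -0.02)  # slightly less range than average
print(f"SAFE = {safe_weighted(BETA_PLAYER, BETA_AVG, n_bip=500):+.1f} runs")
```

Down-weighting points where the SS shares the ball with the second baseman keeps the two positions from both being credited (or blamed) in full for the same plays.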


Results for Corner Infielders: Best/Worst Posterior SAFE values

Ten Best 1B Player-Years
Name and Year | Mean | 95% Interval
Doug Mientkiewicz, 2007 | 7.2 | (2.8, 11.3)
Andy Phillips, 2007 | 7.1 | (2.6, 11.4)
Rich Aurilia, 2007 | 6.6 | (2.7, 10.2)
Albert Pujols, 2007 | 5.5 | (3.1, 8.2)
Doug Mientkiewicz, 2006 | 5.5 | (1.8, 9.1)
Albert Pujols, 2006 | 5.1 | (1.9, 8.1)
Kendry Morales, 2006 | 5.0 | (-0.5, 10.3)
Ken Harvey, 2003 | 5.0 | (1.5, 8.0)
Howie Kendrick, 2006 | 4.5 | (-0.8, 9.6)
Albert Pujols, 2008 | 4.1 | (1.0, 6.8)

Ten Best 3B Player-Years
Name and Year | Mean | 95% Interval
Marco Scutaro, 2003 | 12.6 | (10.0, 16.6)
Mark Bellhorn, 2004 | 10.4 | (4.0, 17.1)
Hank Blalock, 2002 | 10.0 | (4.2, 16.5)
Sean Burroughs, 2004 | 8.9 | (3.4, 14.2)
David Bell, 2003 | 7.4 | (1.7, 13.3)
Scott Rolen, 2002 | 7.4 | (1.9, 12.1)
Hank Blalock, 2002 | 7.3 | (1.4, 11.3)
Damian Rolls, 2005 | 7.2 | (0.1, 13.6)
Pedro Feliz, 2002 | 7.1 | (0.5, 13.3)
Joe Crede, 2002 | 7.0 | (0.0, 15.8)

Ten Worst 1B Player-Years
Name and Year | Mean | 95% Interval
Richie Sexson, 2002 | -4.9 | (-8.2, -1.9)
Robert Fick, 2002 | -5.0 | (-11.3, 2.0)
Mo Vaughn, 2002 | -5.1 | (-9.7, -0.3)
Dmitri Young, 2003 | -5.5 | (-9.9, 0.1)
Tony Clark, 2005 | -6.3 | (-11.7, -1.6)
Fred McGriff, 2002 | -6.4 | (-9.4, -2.8)
Mike Jacobs, 2002 | -6.4 | (-9.4, -2.9)
Ben Broussard, 2005 | -6.7 | (-10.4, -2.2)
Nomar Garciaparra, 2003 | -7.2 | (-11.1, -3.5)
Jason Giambi, 2003 | -7.7 | (-13.4, -3.2)

Ten Worst 3B Player-Years
Name and Year | Mean | 95% Interval
Eric Munson, 2003 | -7.1 | (-12.4, -2.8)
Michael Cuddyer, 2005 | -7.3 | (-11.4, -2.9)
Michael Cuddyer, 2004 | -7.4 | (-14.1, -2.3)
Garrett Atkins, 2007 | -7.8 | (-12.4, -2.4)
Fernando Tatis, 2002 | -8.1 | (-14.2, -2.0)
Chone Figgins, 2006 | -8.8 | (-18.7, -1.4)
Travis Fryman, 2002 | -9.4 | (-15.2, -4.4)
Joe Randa, 2006 | -9.8 | (-17.3, -2.8)
Ryan Braun, 2007 | -10.9 | (-17.4, -2.9)
Jose Bautista, 2006 | -11.6 | (-17.4, -5.9)


Results for Middle Infielders: Best/Worst Posterior SAFE values

Ten Best 2B Player-Years
Name and Year | Mean | 95% Interval
Julius Matos, 2002 | 18.1 | (12.4, 22.1)
Erick Aybar, 2007 | 17.6 | (10.0, 24.6)
Junior Spivey, 2005 | 14.5 | (4.7, 27.1)
Tony Graffanino, 2006 | 14.1 | (4.6, 27.6)
Adam Kennedy, 2008 | 11.3 | (1.7, 18.6)
Willie Bloomquist, 2005 | 10.9 | (4.3, 17.8)
Jose Valentin, 2006 | 10.9 | (4.2, 17.9)
Chase Utley, 2008 | 10.8 | (5.7, 17.5)
Chase Utley, 2005 | 10.8 | (3.1, 17.7)
Craig Counsell, 2005 | 10.8 | (5.3, 18.0)

Ten Best SS Player-Years
Name and Year | Mean | 95% Interval
Pokey Reese, 2004 | 22.6 | (12.0, 31.2)
Adam Everett, 2007 | 20.4 | (10.4, 27.4)
Adam Everett, 2006 | 17.1 | (9.0, 21.8)
Craig Counsell, 2006 | 14.7 | (6.9, 21.1)
Jorge Velandia, 2003 | 14.2 | (3.0, 24.0)
Alex Cora, 2005 | 14.1 | (3.0, 24.6)
Alex Rodriguez, 2003 | 13.5 | (3.5, 24.4)
Maicer Izturis, 2004 | 13.2 | (3.8, 22.2)
Marco Scutaro, 2008 | 13.0 | (4.0, 20.1)
Brent Lillibridge, 2008 | 11.8 | (5.0, 19.1)

Ten Worst 2B Player-Years
Name and Year | Mean | 95% Interval
Ronnie Belliard, 2008 | -9.8 | (-19.5, 2.6)
Geoff Blum, 2005 | -10.2 | (-17.5, -1.7)
Miguel Cairo, 2004 | -10.9 | (-17.9, -3.1)
Terry Shumpert, 2002 | -11.0 | (-22.2, 0.7)
Roberto Alomar, 2003 | -12.1 | (-19.3, -4.6)
Enrique Wilson, 2004 | -12.3 | (-18.9, -6.2)
Alberto Callaspo, 2008 | -12.4 | (-20.4, -4.5)
Dave Berg, 2002 | -13.5 | (-25.1, -2.4)
Luis Rivas, 2002 | -13.8 | (-20.9, -6.4)
Bret Boone, 2005 | -15.4 | (-22.4, -8.1)

Ten Worst SS Player-Years
Name and Year | Mean | 95% Interval
Erick Almonte, 2003 | -13.8 | (-26.9, 2.3)
Derek Jeter, 2007 | -13.9 | (-21.7, -5.8)
Michael Morse, 2005 | -14.2 | (-23.0, -4.5)
Damian Jackson, 2005 | -14.5 | (-30.6, -3.5)
Brandon Fahey, 2008 | -15.1 | (-22.4, -8.2)
Marco Scutaro, 2006 | -15.1 | (-22.0, -10.0)
Derek Jeter, 2003 | -15.6 | (-24.8, -6.4)
Michael Young, 2004 | -15.6 | (-23.6, -7.2)
Josh Wilson, 2007 | -15.8 | (-26.5, -6.4)
Derek Jeter, 2005 | -18.5 | (-29.1, -9.2)


Results for Outfielders: Best/Worst Posterior SAFE values

Ten Best Left Fielders
Name and Year | Mean | 95% Interval
E Brown, 07 | 14.4 | (2.2, 27.9)
D Dellucci, 06 | 13.7 | (5.7, 20.4)
R Johnson, 05 | 12.1 | (2.3, 21.0)
C Crisp, 05 | 11.2 | (4.1, 17.8)
S Hairston, 07 | 11.1 | (1.1, 23.5)
S Podsednik, 07 | 11.1 | (6.1, 17.8)
M Byrd, 05 | 10.7 | (0.5, 22.7)
G Vaughn, 02 | 10.7 | (1.9, 16.9)
O Palmeiro, 02 | 10.6 | (0.8, 22.2)
T Long, 04 | 10.3 | (-0.8, 21.7)

Ten Best Center Fielders
Name and Year | Mean | 95% Interval
J Michaels, 05 | 17.9 | (3.3, 32.5)
C Figgins, 03 | 15.5 | (3.8, 31.2)
J Hairston Jr., 05 | 13.7 | (0.3, 28.6)
A Jones, 05 | 11.8 | (2.2, 20.7)
D Glanville, 04 | 11.1 | (-3.1, 30.5)
J Payton, 05 | 10.2 | (0.0, 17.8)
J Edmonds, 05 | 10.1 | (-0.5, 20.5)
J Gathright, 05 | 10.1 | (-6.6, 25.0)
D Erstad, 03 | 10.0 | (-1.2, 20.7)
C Patterson, 04 | 9.8 | (1.9, 17.9)

Ten Best Right Fielders
Name and Year | Mean | 95% Interval
G Matthews Jr., 02 | 14.4 | (5.7, 22.3)
D Mohr, 05 | 11.8 | (2.3, 28.0)
T Nixon, 05 | 11.5 | (3.3, 18.1)
G Matthews Jr., 05 | 10.5 | (2.6, 19.0)
R Langerhans, 05 | 10.5 | (4.6, 19.3)
T Nixon, 04 | 9.4 | (0.7, 19.4)
A Escobar, 03 | 8.7 | (-0.1, 19.3)
A Ochoa, 02 | 8.7 | (-2.3, 20.9)
E Marrero, 02 | 8.7 | (-0.9, 20.7)
J Drew, 03 | 8.4 | (-1.4, 20.8)

Ten Worst Left Fielders
Name and Year | Mean | 95% Interval
K Mench, 02 | -10.7 | (-19.4, -2.3)
K Mench, 03 | -11.1 | (-14.9, -7.3)
A Piatt, 02 | -12.4 | (-16.8, -6.9)
L Berkman, 05 | -13.0 | (-16.7, -8.2)
M Ramirez, 07 | -13.5 | (-19.1, -5.4)
R Sierra, 03 | -13.8 | (-16.2, -10.3)
B Kielty, 06 | -15.2 | (-16.7, -9.1)
T Womack, 05 | -17.1 | (-25.0, -5.0)
L Berkman, 02 | -18.2 | (-18.9, -17.0)
M Ramirez, 06 | -19.5 | (-24.8, -13.5)

Ten Worst Center Fielders
Name and Year | Mean | 95% Interval
C Hermansen, 02 | -9.5 | (-23.6, 4.3)
D Roberts, 05 | -9.8 | (-21.0, 2.2)
R Ledee, 02 | -10.0 | (-19.6, 0.5)
K Griffey Jr., 04 | -12.5 | (-24.4, -1.3)
B Williams, 04 | -13.2 | (-24.5, -3.1)
S Green, 05 | -13.3 | (-28.3, 2.8)
L Terrero, 05 | -13.6 | (-29.4, 5.8)
B Williams, 05 | -14.2 | (-23.4, -5.3)
M Grissom, 05 | -20.3 | (-34.2, -9.4)
J Cruz, 05 | -22.4 | (-36.2, -5.4)

Ten Worst Right Fielders
Name and Year | Mean | 95% Interval
G Kapler, 05 | -8.1 | (-13.2, -1.6)
L Walker, 05 | -8.2 | (-17.2, 1.2)
J Guillen, 05 | -8.6 | (-17.0, 0.7)
K Mench, 02 | -8.6 | (-17.7, 0.7)
E Kingsale, 04 | -9.2 | (-13.1, -2.9)
W Pena, 02 | -9.7 | (-17.6, 0.2)
C Wilson, 02 | -11.1 | (-22.1, -2.6)
J Gonzalez, 05 | -13.2 | (-16.4, -10.4)
M Tucker, 03 | -14.1 | (-21.8, -8.1)
Sheffield, 04 | -14.7 | (-21.6, -9.5)


Comparing SAFE to Current Methods

Decent overall correlation between SAFE and other methods, though SAFE values are much smaller in magnitude


Summary of Our Approach

BIP data allows more detailed examination of differences between players

Parametric approach: a smooth probability function reduces the variance of results by sharing information between all points near a fielder

SAFE run value aggregates individual differences while weighting for BIP frequency, run value, and shared consequence between positions


Publicity and Feedback

Boston Globe (Gideon Gil, 02/16/08):

“Numbers tell a glove story”

Wired (Greta Lorge, 02/16/08):

“Statistics in the Outfield”

AP (Randolph E. Schmid, 02/16/08):

“Baseball’s top fielders ranked in new statistical system”

New York Post had different take on study:

“You’ve Got To Be Kidding!”

Jeter himself responded in NY Post:

“Must have been a computer glitch”


Player Starting Positions

Glossed over reality: we don’t actually know where fielders are located when a particular BIP is hit!

Each distance/angle is an estimated value, not a known value, since the starting location is not truly known

Starting location for each position fixed at the point with the highest overall success probability

Problematic when fielders are systematically different in their positioning, e.g., 1B when there is a man on base

Ideally, we would build in information about defensive shifts as well


Differences between Ballparks

Current analysis does not take into account differences in the playing field for different parks

Could impact the evaluation of both infielders (turf vs. grass) and outfielders (different outfield shapes)

Park-specific BIP densities could account for differences in shape but may have higher variance (less data)


Field F/X

New system tracks players and BIPs with video cameras

This data will revolutionize the estimation of fielding ability


Field F/X cont’d

How will Field F/X improve fielding estimation?

Real starting positions and speed for each player

Real hang time on flies/liners instead of current proxies based on distance/velocity

Real trajectories on all BIPs: was that liner to the shortstop 10 feet (catchable) or 20 feet (uncatchable) off the ground?

The issue is data availability: access is currently limited to only a few people (I’m not currently one of them).


True Defensive Range (TDR)

Greg Rybarczyk (Hittracker.com) co-authored an article in the 2011 Hardball Times on "An Introduction to FIELDf/x"

True Defensive Range (TDR) uses FIELDf/x location and hang time to evaluate each defensive play

Fielders credited/debited for plays based on the league-average likelihood of those plays

Expect many fantastic advances over the next few years!