Pitcher Cluster Analysis in Major League Baseball

Dr. John Harris, Dr. Tom Lewis, Jamey McDowell, Ian McConnell

Post on 14-Feb-2017

Page 1: Furman Engaged Pres

Page 2: Furman Engaged Pres

Issue:

When predicting an individual batter’s performance against an individual pitcher, the batter has frequently not faced that pitcher often enough to produce a meaningful sample size.

For example, Freddie Freeman has gone 2 for 10 against Cole Hamels this season; is this really an accurate predictor of how well he will perform against Hamels in his next at-bat?

Page 3: Furman Engaged Pres

Goal

We wanted to increase the sample size of batter-pitcher “matchups” in order to better predict future interactions between specific players.

To accomplish this, we used clustering algorithms to group pitchers with similar styles together.

Page 4: Furman Engaged Pres

Hypothesis

By seeing how well a batter does against a cluster of same-style pitchers, we can better predict how well he will fare against any particular pitcher in that cluster.

Page 5: Furman Engaged Pres

Data Sources

Sean Forman, founder of Baseball-Reference.com

Baseball-Reference.com

Brooks Baseball

Page 6: Furman Engaged Pres

Sample Data Sheet:

Page 7: Furman Engaged Pres

Metrics Analyzed

List of every plate appearance of the 2014 regular season, sorted by date

Pitch type statistics by pitcher

Pitcher style

Batter hand (L/R/Switch)

Page 8: Furman Engaged Pres

Data Exclusions

Batters who batted only in the second half of the season

Batters who only sacrifice bunted in the first half

Page 9: Furman Engaged Pres
Page 10: Furman Engaged Pres

Clustering Methods in Use

K-means: make initial guesses at k cluster centers, assign each observation to its nearest center, then move each center to the mean of the observations assigned to it; repeat until the assignments stop changing

Decision Tree Analysis: let the computer choose which pitcher characteristics most strongly affect opposing OBP (minimum cluster size of 50)
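The K-means procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code used for the presentation: the function name `kmeans`, the two features (strike percentage and GB/FB ratio), and the six pitchers' values are all hypothetical.

```python
import random

def kmeans(points, k, iterations=100, seed=0):
    """Basic k-means on a list of equal-length tuples; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial guesses: k of the observations
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center (squared distance).
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # empty cluster: leave center as is
        converged = new_centers == centers
        centers, labels = new_centers, new_labels
        if converged:
            break
    return centers, labels

# Hypothetical pitchers described by (strike percentage, GB/FB ratio).
pitchers = [(0.62, 0.80), (0.63, 0.90), (0.61, 0.85),
            (0.68, 1.90), (0.67, 2.00), (0.69, 1.80)]
centers, labels = kmeans(pitchers, k=2)
print(labels)
```

With real data the features sit on very different scales, so standardizing each column before clustering is usually worth considering; otherwise the widest-ranging statistic dominates the distance.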

Page 11: Furman Engaged Pres

Clustering Pitchers by...

Pitch-independent stats (strike percentage, GB/FB, etc.)

Throwing hand (L-R)

Batter performance against pitcher

Pitch similarity (e.g. fastballs thrown alike)

Page 12: Furman Engaged Pres

Pitch-Independent Stats, K-means

k = 17

Pitchers are spread evenly across the large clusters; the small clusters have obvious explanations

Page 13: Furman Engaged Pres
Page 14: Furman Engaged Pres

Pitch-Independent Stats, CART

8 clusters/leaves

Decided by OBP against

Important factors:

Strike percentage

Strikeout percentage

Velocity difference between top two pitches

Pitches per plate appearance
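To illustrate how a regression tree can "choose" the important factors, the sketch below searches for the single split that best separates pitchers by OBP against, subject to a minimum leaf size. A full tree repeats this search inside each resulting group, and the final leaves serve as clusters. The function name `best_split`, the feature names, and the numbers are hypothetical, and this is a simplified stand-in for whatever CART software was actually used.

```python
def best_split(rows, target, min_leaf=50):
    """Return (error, feature, threshold) for the split that minimizes the
    summed squared error of `target` over the two leaves, or None.

    rows   -- list of dicts mapping feature name -> numeric value
    target -- list of outcomes (here, OBP against), aligned with rows
    """
    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best = None
    for feature in rows[0]:
        order = sorted(range(len(rows)), key=lambda i: rows[i][feature])
        # Each candidate cut leaves at least min_leaf rows on both sides.
        for cut in range(min_leaf, len(rows) - min_leaf + 1):
            lo = rows[order[cut - 1]][feature]
            hi = rows[order[cut]][feature]
            if lo == hi:
                continue  # cannot split between identical values
            left = [target[i] for i in order[:cut]]
            right = [target[i] for i in order[cut:]]
            error = sse(left) + sse(right)
            if best is None or error < best[0]:
                best = (error, feature, (lo + hi) / 2)
    return best

# Hypothetical pitchers; min_leaf is lowered to 2 only to fit the toy data.
rows = [
    {"strike_pct": 0.60, "pitches_per_pa": 3.9},
    {"strike_pct": 0.61, "pitches_per_pa": 3.7},
    {"strike_pct": 0.67, "pitches_per_pa": 3.8},
    {"strike_pct": 0.68, "pitches_per_pa": 3.6},
]
obp_against = [0.340, 0.335, 0.300, 0.295]
print(best_split(rows, obp_against, min_leaf=2))
```

In this toy data strike percentage separates the OBP values far better than pitches per plate appearance, so it is the feature selected.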

Page 15: Furman Engaged Pres
Page 16: Furman Engaged Pres

Batter performance against

8 clusters

Pitchers are placed in the same cluster if the same batters perform similarly against them

Page 17: Furman Engaged Pres

Clustering Batters

When we treat batters as individual entities, they do not have enough plate appearances to make accurate predictions

We solve this by treating all left-handed batters as the same “batter”, and do the same with righties and switch hitters

Page 18: Furman Engaged Pres

Method

Using a given clustering method, assign each pitcher to a numbered cluster

For each pitcher cluster and batter type, total every plate appearance and every time on base in the first half of the season
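The compilation step can be sketched as a simple tally. This assumes the plate-appearance list has already been reduced to (batter, pitcher, reached base) records; the function name `compile_obp`, the record layout, and all player identifiers are hypothetical.

```python
from collections import defaultdict

def compile_obp(plate_appearances, pitcher_cluster, batter_hand):
    """Tally times on base (S) and plate appearances (N) for every
    (batter type, pitcher cluster) pair; returns {pair: (S, N, S / N)}."""
    totals = defaultdict(lambda: [0, 0])  # pair -> [S, N]
    for batter, pitcher, reached_base in plate_appearances:
        pair = (batter_hand[batter], pitcher_cluster[pitcher])
        totals[pair][0] += int(reached_base)
        totals[pair][1] += 1
    return {pair: (s, n, s / n) for pair, (s, n) in totals.items()}

# Hypothetical first-half records: (batter, pitcher, reached base?)
first_half = [
    ("batter_a", "pitcher_x", True),
    ("batter_a", "pitcher_y", False),
    ("batter_b", "pitcher_x", True),
    ("batter_c", "pitcher_z", False),
]
pitcher_cluster = {"pitcher_x": 3, "pitcher_y": 3, "pitcher_z": 5}
batter_hand = {"batter_a": "L", "batter_b": "L", "batter_c": "R"}

for pair, (s, n, obp) in sorted(compile_obp(first_half, pitcher_cluster, batter_hand).items()):
    print(pair, s, n, round(obp, 3))
```

Running the same function on second-half records gives the totals that the first-half rates are compared against.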

Page 19: Furman Engaged Pres

Example

Left-handed batters are 200 for 860 against cluster 3

This rate, 200/860 ≈ .233, is the predicted performance of LHBs against cluster 3 in the second half of 2014

Page 20: Furman Engaged Pres

Method (cont.)

Run the same compilation on the second half of the season

We test the accuracy of our prediction with a minimum variance test

Page 21: Furman Engaged Pres

Minimum Variance Test

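∑_{i=1}^{N} (x_i − p)^2 = S(1 − 2p) + Np^2

x_i = 1 if the batter reached base in plate appearance i, 0 otherwise

S = number of times on base

N = number of opportunities (PA)

p = predicted OBP vs. cluster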
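The closed form follows because each outcome is 0 or 1, so x_i^2 = x_i and the expanded sum is S − 2pS + Np^2. The sketch below checks that identity numerically and compares two candidate predictions; a lower score means a better prediction. The second-half totals (190 times on base in 800 plate appearances) and the alternative prediction of .300 are hypothetical, and the function names are illustrative.

```python
def squared_error_direct(outcomes, p):
    """Sum of (x_i - p)^2 over individual 0/1 plate-appearance outcomes."""
    return sum((x - p) ** 2 for x in outcomes)

def squared_error_closed_form(s, n, p):
    """Same quantity from totals only: S(1 - 2p) + N p^2."""
    return s * (1 - 2 * p) + n * p ** 2

# Prediction from the first-half example: LHBs were 200 for 860 vs. cluster 3.
p_cluster = 200 / 860

# Hypothetical second-half totals for the same batter type and cluster.
s, n = 190, 800
outcomes = [1] * s + [0] * (n - s)

direct = squared_error_direct(outcomes, p_cluster)
closed = squared_error_closed_form(s, n, p_cluster)
print(abs(direct - closed) < 1e-6)  # the two forms agree up to rounding

# Compare against a hypothetical alternative prediction of .300.
print(closed < squared_error_closed_form(s, n, 0.300))
```

Because the score is a quadratic in p that is smallest at p = S/N, whichever prediction lies closer to the observed second-half rate scores lower.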

Page 22: Furman Engaged Pres
Page 23: Furman Engaged Pres

Results

Hypothesis confirmed: every method we tested predicted second-half performance better than career history did

The only methods that “beat” a prediction based on each batter’s first-half OBP were those that also clustered batters

Page 24: Furman Engaged Pres
Page 25: Furman Engaged Pres

Conclusions

Computer-chosen decision statistics best separate pitchers into clusters

Sample sizes are large enough for accurate prediction when all batters are grouped into three clusters (left-handed, right-handed, switch)

Page 26: Furman Engaged Pres
Page 27: Furman Engaged Pres

Application

In-game decisions

Pre-game decisions

Page 28: Furman Engaged Pres

Future Work

Test on other years

Cluster batters in ways other than handedness

Probit Modeling