transferring lean six sigma and dfss data simply and effectively - baseball analytics

TRANSFERRING LEAN SIX SIGMA AND DFSS DATA SIMPLY AND EFFECTIVELY “Baseball Analytics” Baseball is the only field of endeavor where a man can succeed three times out of ten and be considered a good performer. ~Ted Williams 4 th Annual Design for Six Sigma Conference James M. Wasiloff Cary Young US Army TACOM LCMC 9 February 2009


Page 1: Transferring Lean Six Sigma and DFSS Data Simply and Effectively - Baseball Analytics

TRANSFERRING LEAN SIX SIGMA AND DFSS DATA

SIMPLY AND EFFECTIVELY

“Baseball Analytics”

Baseball is the only field of endeavor where a man can succeed three times out of ten and be considered a good performer.  ~Ted Williams

4th Annual Design for Six Sigma Conference

James M. Wasiloff

Cary Young

US Army TACOM LCMC

9 February 2009

Page 2

Agenda

• Introduction of Baseball Analytics
• Descriptive statistics and graphical data analysis
• Hypothesis development and testing
• Analysis of Variance (ANOVA)
• Pearson Correlation Coefficient
• Simple Linear Regression
• Multiple Regression and Best Fit Model
• Predictive Models
• Statistical Process Control
• Next Steps / Application in Other Sports

Page 3

Introduction

• Why the session…

– A better way to understand and teach LSS and DFSS tools
– Can Money Spent = Wins?
– Keep it “Statistically Simple”
– Just the Beginning

Baseball quote…

The charm of baseball is that, dull as it may be on the field, it is endlessly fascinating as a rehash.  ~Jim Murray

Page 4

Test of Hypothesis

• Null Hypothesis:

$H_0: \mu_1 = \mu_2$

MLB example: Ho: Mean Batting Average of the NY Yankees from 2006-2008 equals the Mean Batting Average of the Tampa Bay Rays from 2006-2008

• Alternative Hypothesis:

$H_a: \mu_1 < \mu_2$ or $\mu_1 > \mu_2$ (that is, $\mu_1 \neq \mu_2$)

“They are not the same”

During my 18 years I came to bat almost 10,000 times.  I struck out about 1,700 times and walked maybe 1,800 times.  You figure a ballplayer will average about 500 at bats a season.  That means I played seven years without ever hitting the ball.  ~Mickey Mantle, 1970

Page 5

Batting Stats

American League: Team Batting Averages

Team             2006   2007   2008
Baltimore        0.277  0.272  0.267
Boston           0.269  0.279  0.280
Chicago Sox      0.280  0.246  0.263
Cleveland        0.280  0.268  0.262
Detroit          0.274  0.287  0.271
Kansas City      0.271  0.261  0.269
LA Angels        0.274  0.284  0.268
Minnesota        0.287  0.264  0.279
NY Yankees       0.285  0.290  0.271
Oakland          0.260  0.256  0.242
Seattle          0.272  0.287  0.265
Tampa Bay        0.255  0.268  0.260
Texas            0.278  0.263  0.283
Toronto          0.284  0.259  0.264

National League: Team Batting Averages

Team             2006   2007   2008
Arizona          0.267  0.250  0.251
Atlanta          0.270  0.275  0.270
Chicago Cubs     0.268  0.271  0.278
Cincinnati       0.257  0.267  0.247
Colorado         0.270  0.280  0.263
Florida          0.264  0.267  0.254
Houston          0.255  0.260  0.263
LA Dodgers       0.276  0.275  0.264
Milwaukee        0.258  0.262  0.253
NY Mets          0.264  0.275  0.266
Philadelphia     0.267  0.274  0.255
Pittsburgh       0.263  0.263  0.258
San Diego        0.263  0.251  0.250
San Francisco    0.259  0.254  0.262
St. Louis        0.269  0.274  0.281
Washington       0.262  0.256  0.251

It ain't like football.  You can't make up no trick plays.  ~Yogi Berra

Page 6

Test of Hypothesis

• Are the batting averages of the National League different from those of the American League?

• T-test

• Interpretation: “P Low, null must go – P High, null will fly”

Two-Sample T-Test and CI: AL, NL

     N  Mean     StDev    SE Mean
AL  14  0.27086  0.00772  0.0021
NL  16  0.26356  0.00704  0.0018

Difference = mu (AL) - mu (NL)
Estimate for difference: 0.007296
95% CI for difference: (0.001717, 0.012872)
T-Test of difference = 0 (vs not =): T-Value = 2.69 P-Value = 0.012
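A minimal standard-library sketch of the AL vs. NL comparison. It assumes (the slide does not say) that each of the N = 14 AL and N = 16 NL observations is one team's mean batting average over 2006 to 2008, and that the test is Welch's unequal-variance two-sample t-test. The p-value is obtained by numerically integrating the Student t density rather than from a statistics package, and all helper names are illustrative. With the three-decimal table values the means come out near 0.2708 and 0.2636 and T is roughly 2.7, close to, but not identical with, the Minitab output above.

```python
import math

# Team batting averages (2006, 2007, 2008), transcribed from the tables above.
AMERICAN_LEAGUE = {
    "Baltimore": (0.277, 0.272, 0.267), "Boston": (0.269, 0.279, 0.280),
    "Chicago Sox": (0.280, 0.246, 0.263), "Cleveland": (0.280, 0.268, 0.262),
    "Detroit": (0.274, 0.287, 0.271), "Kansas City": (0.271, 0.261, 0.269),
    "LA Angels": (0.274, 0.284, 0.268), "Minnesota": (0.287, 0.264, 0.279),
    "NY Yankees": (0.285, 0.290, 0.271), "Oakland": (0.260, 0.256, 0.242),
    "Seattle": (0.272, 0.287, 0.265), "Tampa Bay": (0.255, 0.268, 0.260),
    "Texas": (0.278, 0.263, 0.283), "Toronto": (0.284, 0.259, 0.264),
}
NATIONAL_LEAGUE = {
    "Arizona": (0.267, 0.250, 0.251), "Atlanta": (0.270, 0.275, 0.270),
    "Chicago Cubs": (0.268, 0.271, 0.278), "Cincinnati": (0.257, 0.267, 0.247),
    "Colorado": (0.270, 0.280, 0.263), "Florida": (0.264, 0.267, 0.254),
    "Houston": (0.255, 0.260, 0.263), "LA Dodgers": (0.276, 0.275, 0.264),
    "Milwaukee": (0.258, 0.262, 0.253), "NY Mets": (0.264, 0.275, 0.266),
    "Philadelphia": (0.267, 0.274, 0.255), "Pittsburgh": (0.263, 0.263, 0.258),
    "San Diego": (0.263, 0.251, 0.250), "San Francisco": (0.259, 0.254, 0.262),
    "St. Louis": (0.269, 0.274, 0.281), "Washington": (0.262, 0.256, 0.251),
}


def mean(xs):
    return sum(xs) / len(xs)


def sample_variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_norm) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=10_000):
    """P(|T| >= |t|): integrate the density from 0 to |t| with Simpson's rule."""
    h = abs(t) / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)


def welch_t_test(x, y):
    """Welch two-sample t-test: returns (difference, t, df, two-sided p)."""
    vx, vy = sample_variance(x) / len(x), sample_variance(y) / len(y)
    diff = mean(x) - mean(y)
    t = diff / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return diff, t, df, two_sided_p(t, df)


# Assumption: one observation per team, equal to its 2006-2008 mean batting average.
al_means = [mean(seasons) for seasons in AMERICAN_LEAGUE.values()]
nl_means = [mean(seasons) for seasons in NATIONAL_LEAGUE.values()]
diff, t_stat, df, p_value = welch_t_test(al_means, nl_means)
print(f"AL mean={mean(al_means):.5f}  NL mean={mean(nl_means):.5f}")
print(f"difference={diff:.6f}  T={t_stat:.2f}  df={df:.1f}  P={p_value:.3f}")
```

A p-value well below 0.05 is the “P low, null must go” case: the sketch, like the slide, rejects Ho that the two leagues have the same mean batting average.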

Page 7

Are Salaries Correlated to Team Performance?

• The trend is…

• Problem statement:

– Will increasing player salaries lead to more success?

Baseball was the major American sport in which money bought success.  ~George Will, Moneyball

Page 8

2008 MLB Salaries and Win Count

Team Total Salary Wins Team Total Salary Wins

NY Yankees $207,108,489 89 San Francisco $76,194,000 72

NY Mets $137,391,376 89 Milwaukee $74,687,499 90

Detroit $137,290,196 74 Cincinnati $74,117,695 74

Boston $133,220,112 95 San Diego $72,626,616 63

Chicago Sox $121,189,332 89 Colorado $68,655,500 74

LA Angels $118,825,333 100 Baltimore $66,806,249 68

LA Dodgers $118,188,536 84 Texas $66,312,326 79

Chicago Cubs $117,954,333 97 Arizona $66,202,712 82

Seattle $116,876,482 61 Kansas City $57,855,500 75

Atlanta $102,849,666 72 Minnesota $56,932,766 88

St. Louis $99,624,449 86 Washington $54,166,000 59

Toronto $97,001,500 86 Pittsburgh $48,689,783 67

Philadelphia $95,479,880 92 Oakland $47,167,126 75

Houston $88,930,414 86 Tampa Bay $43,422,997 97

Cleveland $78,970,066 81 Florida $22,650,000 84

Page 9

Correlation Between Salary and Wins?

[Figure: Scatter Plot of Salary Versus Games Won - 2008 (x-axis: Total Salary 2008, $0 to $200,000,000; y-axis: Wins in 2008, 60 to 100)]

Page 10

Use these Derivation Formulae or?
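The usual textbook form of the sample Pearson correlation coefficient for paired observations $(x_i, y_i)$, $i = 1, \dots, n$, which we assume is what the derivation formulae lead to, is:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1
```

Values near +1 or -1 indicate a strong linear relationship; values near 0 indicate little or no linear relationship.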

Page 11

Use This Simple Graphic?

Pearson Correlation Coefficient Definition

[Figure: Values of r]

Page 12

Correlation Coefficient

• Graphic approximation… what do you think?

• Minitab results: Pearson correlation of Total Salary 2008 and Wins in 2008 = 0.323

• Interpretation of results
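The Minitab correlation can be checked directly from the 2008 salary table. The sketch below implements the sample Pearson formula in plain Python (Python 3.10 and later also provide `statistics.correlation`, which computes the same quantity). The team list is transcribed from the table; `pearson_r` is an illustrative helper name.

```python
import math

# 2008 total payroll (US$) and wins, transcribed from the 2008 MLB Salaries and Win Count table.
DATA_2008 = [
    ("NY Yankees", 207_108_489, 89), ("NY Mets", 137_391_376, 89),
    ("Detroit", 137_290_196, 74), ("Boston", 133_220_112, 95),
    ("Chicago Sox", 121_189_332, 89), ("LA Angels", 118_825_333, 100),
    ("LA Dodgers", 118_188_536, 84), ("Chicago Cubs", 117_954_333, 97),
    ("Seattle", 116_876_482, 61), ("Atlanta", 102_849_666, 72),
    ("St. Louis", 99_624_449, 86), ("Toronto", 97_001_500, 86),
    ("Philadelphia", 95_479_880, 92), ("Houston", 88_930_414, 86),
    ("Cleveland", 78_970_066, 81), ("San Francisco", 76_194_000, 72),
    ("Milwaukee", 74_687_499, 90), ("Cincinnati", 74_117_695, 74),
    ("San Diego", 72_626_616, 63), ("Colorado", 68_655_500, 74),
    ("Baltimore", 66_806_249, 68), ("Texas", 66_312_326, 79),
    ("Arizona", 66_202_712, 82), ("Kansas City", 57_855_500, 75),
    ("Minnesota", 56_932_766, 88), ("Washington", 54_166_000, 59),
    ("Pittsburgh", 48_689_783, 67), ("Oakland", 47_167_126, 75),
    ("Tampa Bay", 43_422_997, 97), ("Florida", 22_650_000, 84),
]


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


salary = [pay for _, pay, _ in DATA_2008]
wins = [w for _, _, w in DATA_2008]
r = pearson_r(salary, wins)
print(f"Pearson r (2008 salary vs. wins) = {r:.3f}, r squared = {r * r:.3f}")
```

Squaring r gives roughly 0.10, so in this single 30-team season payroll accounts for only about a tenth of the linear variation in wins: a weak positive association, not evidence that money alone buys wins.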

Page 13

American League West in 2002 (“Moneyball” Data Set)

Team Wins Payroll

Oakland 103 $41,942,665

Anaheim 99 $62,757,041

Seattle 93 $86,084,710

Texas 73 $106,915,180

Pearson correlation of Wins and Payroll = -0.928

Page 14

ANOVA

• Null Hypothesis:

$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_n$

MLB example: Ho: Mean Batting Average of the NY Yankees equals the Mean Batting Average of the Tampa Bay Rays equals the Mean Batting Average of the NY Mets equals the Mean Batting Average of the …

• Alternative Hypothesis:

$H_a$: At least one $\mu_k$ is different from at least one other $\mu_k$

MLB example: At least one team has a Mean Batting Average different from that of at least one other team

A baseball fan has the digestive apparatus of a billy goat.  He can, and does, devour any set of diamond statistics with insatiable appetite and then nuzzles hungrily for more.  ~Arthur Daley
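A one-way ANOVA compares the between-team variation in batting average with the within-team (season-to-season) variation. The sketch below is a minimal standard-library version. It assumes each team-season batting average from the tables is one observation and, for brevity, uses only the three teams named in the MLB example (NY Yankees, Tampa Bay, NY Mets), whereas the slide's hypothesis continues to further teams. The closed-form p-value helper is valid only when the numerator degrees of freedom equal 2; helper names are illustrative.

```python
# Batting averages (2006, 2007, 2008) for the three teams named in the MLB example.
TEAMS = {
    "NY Yankees": [0.285, 0.290, 0.271],
    "Tampa Bay": [0.255, 0.268, 0.260],
    "NY Mets": [0.264, 0.275, 0.266],
}


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = len(groups) - 1, len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def f_pvalue_df1_2(f_stat, df_within):
    """Upper-tail p-value of F(2, df_within); this closed form holds only for df1 = 2."""
    return (1 + 2 * f_stat / df_within) ** (-df_within / 2)


f_stat, df1, df2 = one_way_anova(list(TEAMS.values()))
p = f_pvalue_df1_2(f_stat, df2)
print(f"F({df1}, {df2}) = {f_stat:.2f}, p = {p:.3f}")
```

For these three teams F(2, 6) is about 5.9 with p just under 0.04, so at the 5% level this sketch would reject Ho; an analysis over all 30 teams could come out differently.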

Page 15

Regression Analysis

• Is it possible to model and predict the number of wins for a season based on statistical parameters?

• The initial simple linear regression model, 2002 data:

Team Wins Payroll

Oakland 103 $41,942,665

Anaheim 99 $62,757,041

Seattle 93 $86,084,710

Texas 73 $106,915,180
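A least-squares sketch of the simple linear regression on the four 2002 AL West teams, with payroll expressed in millions of dollars. `fit_line` is an illustrative helper that returns the intercept, slope, and R-squared; the correlation is recovered as the square root of R-squared with the sign of the slope.

```python
import math

# 2002 AL West ("Moneyball" data set): team, wins, payroll in US dollars.
AL_WEST_2002 = [
    ("Oakland", 103, 41_942_665),
    ("Anaheim", 99, 62_757_041),
    ("Seattle", 93, 86_084_710),
    ("Texas", 73, 106_915_180),
]


def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1, R-squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1, sxy ** 2 / (sxx * syy)


payroll_millions = [pay / 1e6 for _, _, pay in AL_WEST_2002]
wins = [w for _, w, _ in AL_WEST_2002]
b0, b1, r_squared = fit_line(payroll_millions, wins)
r = math.copysign(math.sqrt(r_squared), b1)  # correlation takes the sign of the slope
print(f"Wins = {b0:.1f} + ({b1:.3f}) * Payroll($M)   R-sq = {r_squared:.3f}   r = {r:.3f}")
```

With these four points the fitted slope is about -0.44 wins per additional $1 million of payroll and R-squared is about 0.86, the square of the -0.928 correlation reported for this data set. Four teams are far too few to support predictions outside the 2002 AL West, and the negative slope describes this small sample rather than a causal effect of payroll.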

Page 16

Multiple Regression and Best Fit Model

• Regression studies the relationship between the mean value of a random variable and the corresponding values of one or more independent variables.

– A model for predicting one variable from another.

– A statistical analysis assessing the association between two variables.

Regression analysis quantifies the relationship between one or more input variables (X) and an output variable (Y) by fitting a line or plane through the points so that the sum of the squared vertical distances (residuals) from the points to the line or plane is as small as possible.

• Multiple regression is a method of determining the relationship between a continuous process output (Y) and several factors (Xs).
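In symbols, the multiple regression model and its ordinary least-squares estimate can be written as follows, where $\mathbf{X}$ is the $n \times (k+1)$ matrix of predictor values with a leading column of ones and $\mathbf{X}^\top\mathbf{X}$ is assumed invertible. This is the standard textbook formulation, not necessarily the exact settings used in Minitab for these slides.

```latex
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon,
\qquad
\hat{\boldsymbol\beta}
  = \arg\min_{\boldsymbol\beta} \sum_{i=1}^{n}
    \bigl(y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_k x_{ik}\bigr)^2
  = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}
```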

Page 17

American League West in 2002 (“Moneyball” Data Set)

Team Wins Payroll

Oakland 103 $41,942,665

Anaheim 99 $62,757,041

Seattle 93 $86,084,710

Texas 73 $106,915,180

Page 18

Exploratory Data Analysis

[Figure: Matrix Plot of Wins in 2008, Total Salary, Average Age, New, ... (panels: Wins in 2008, Total Salary 2008, Average Age, New, Saves, Runs (P), Walks)]

What does it mean?

Page 19

Testing the Predictive Model

• Tigers 2008 data…

– Here is the predictive transfer function from Minitab:

Wins = 32.1 + 1.48 Average Age - 34.5 Team ERA + 154 Team Batting Average + 0.582 Saves (P) + 0.150 Runs (P) - 0.0202 Walks (P) - 0.0087 SO (P)

– Testing on 2008 data:
• Actual win count = 74
• Predicted win count = 74.26
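The transfer function can be wrapped in a small evaluation helper. The coefficients are copied from the slide; the keyword names (`average_age`, `team_era`, `team_batting_average`, `saves_p`, `runs_p`, `walks_p`, `so_p`) and the helpers `predict_wins` and `percent_error` are hypothetical, and the meaning of the "(P)" suffix is not defined in the source. The Tigers' input statistics are not reproduced in the presentation, so no example prediction is computed here.

```python
# Coefficients copied from the Minitab transfer function on the slide.
INTERCEPT = 32.1
COEFFICIENTS = {
    "average_age": 1.48,
    "team_era": -34.5,
    "team_batting_average": 154.0,
    "saves_p": 0.582,
    "runs_p": 0.150,
    "walks_p": -0.0202,
    "so_p": -0.0087,
}


def predict_wins(**stats):
    """Evaluate the transfer function; every predictor must be supplied by keyword."""
    missing = COEFFICIENTS.keys() - stats.keys()
    if missing:
        raise ValueError(f"missing predictors: {sorted(missing)}")
    return INTERCEPT + sum(coef * stats[name] for name, coef in COEFFICIENTS.items())


def percent_error(actual, predicted):
    """Signed prediction error as a percentage of the actual value."""
    return 100 * (predicted - actual) / actual


print(f"Tigers 2008 check: {percent_error(74, 74.26):.2f}% error")
```

Applied to the reported test, a prediction of 74.26 wins against 74 actual wins is an error of about 0.35 percent.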

Page 20

Statistical Process Control and Statistical Thinking

• “Statistical process control is the application of statistical methods to identify and control the special cause of variation in a process” – iSixSigma.com

• Statistical Thinking: The process of using wide ranging and interacting data to understand processes, problems, and solutions.

– The opposite of “one factor at a time” where the tendency is to change one factor and “see” what happens.

– Statistical thinking is the tendency to want to understand situational phenomena over a wide range of data where several control factors may be interacting at once to produce an outcome.

– Common cause variation becomes your friend and special cause variation your enemy.

– Attribute judgements of good and bad are replaced with estimates of significance with given confidence.

Page 21

[Figure: Statistical Process Control Model, Percent Variation from "Games Won" Target. I-MR chart vs. Games Played. Individual Value chart: X̄ = 0, UCL = 24.2, LCL = -24.2. Moving Range chart: MR̄ = 9.1, UCL = 29.7, LCL = 0. Several points are flagged "1" (outside the control limits).]

Example 1: Notional Data – Status at Game 37

A range outside the UCL indicates the process is “out of control”; investigate for a “special cause”.
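The limits on these charts follow the usual individuals and moving range (I-MR) formulas: moving ranges of two consecutive points, individuals limits at center ± 2.66 × MR̄ (that is, 3/d2 with d2 = 1.128), and a moving range UCL of 3.267 × MR̄ (the constant D4). The sketch below assumes the center line is fixed at the 0 percent target, as the charts' X̄ = 0 suggests, and defines percent variation as 100 × (actual - target) / target; that definition and all helper names are assumptions, not taken from the presentation.

```python
# I-MR chart constants for moving ranges of size 2.
E2 = 3 / 1.128  # 3/d2, about 2.66
D4 = 3.267


def percent_off_target(actual, target):
    """Percent variation of actual wins from the win target (assumed definition).

    Targets must be non-zero; a zero early-season target would need special handling.
    """
    return [100 * (a - t) / t for a, t in zip(actual, target)]


def moving_ranges(values):
    """Absolute differences between consecutive observations."""
    return [abs(b - a) for a, b in zip(values, values[1:])]


def control_limits(center, mr_bar):
    """I-MR control limits from a center line and an average moving range."""
    return {
        "center": center,
        "i_ucl": center + E2 * mr_bar,
        "i_lcl": center - E2 * mr_bar,
        "mr_bar": mr_bar,
        "mr_ucl": D4 * mr_bar,
        "mr_lcl": 0.0,
    }


def imr_limits(values, center=0.0):
    """Estimate MR-bar from the data; the center defaults to the 0 percent target."""
    mrs = moving_ranges(values)
    return control_limits(center, sum(mrs) / len(mrs))


def out_of_control(values, limits):
    """Indices of individuals outside the limits and of moving ranges above the MR UCL.

    A moving range is reported at the index of the later of its two points.
    """
    i_flags = [i for i, v in enumerate(values)
               if not limits["i_lcl"] <= v <= limits["i_ucl"]]
    mr_flags = [i + 1 for i, mr in enumerate(moving_ranges(values))
                if mr > limits["mr_ucl"]]
    return i_flags, mr_flags


for mr_bar in (9.1, 4.68):  # MR-bar values printed on the two charts
    lim = control_limits(0.0, mr_bar)
    print(f"MR-bar={mr_bar}: I chart +/-{lim['i_ucl']:.2f}, MR UCL={lim['mr_ucl']:.2f}")
```

Plugging in the charted MR̄ values reproduces the slide limits to within rounding: MR̄ = 9.1 gives an individuals UCL of about 24.2 and a moving range UCL of about 29.7, and MR̄ = 4.68 gives about 12.45 and 15.29 (the charts show 12.44 and 15.28, presumably computed from an unrounded MR̄).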

Page 22

[Figure: Statistical Process Control Model, Percent Variation from "Games Won" Target. I-MR chart vs. Games Played. Individual Value chart: X̄ = 0, UCL = 12.44, LCL = -12.44. Moving Range chart: MR̄ = 4.68, UCL = 15.28, LCL = 0. Several points are flagged "1" (outside the control limits).]

[Figure: Scatterplot of Win target, Actual Wins, Percent off vs Games Played (variables: Percent off Target, Win target, Actual Wins)]

Which Method Is Earliest at Detecting a “Special Cause”?

Old Way

Analytics Approach

Page 23

Next Steps

• Additional MLB Analytics

• System approach to baseball

• Other sports?

– Golf: Fishbone Cause and Effect Analysis example

Baseball statistics are like a girl in a bikini.  They show a lot, but not everything.  ~Toby Harrah, 1983

Page 24
Page 25
Page 26

Wasiloff – Young Baseball Analytics

“Systems Approach to Batting”

Analytic Based Reactive Batting Problem Solving

Pre Emptive Batting Problem Discovery

Optimal Batting System Design

[Diagram: Batter, Accessories, Fundamentals, Bat, Stadium]

Our Mission

Develop world-class batters who use consistent, disciplined, and proven methods of eliminating or preventing hitting problems, thereby providing our fans excellence in batting and league-leading run creation, resulting in a high level of fan satisfaction.

Systems-Based Potential Causes

• Lean Six Sigma Analytics
• Design for Six Sigma
• Statistical Methods
• Correlation/Regression Analysis
• Design of Experiments
• VOC / QFD
• Taguchi Methods
• Innovation Methods

Page 27
Page 28

Questions / comments?

Thanks!

Baseball?  It's just a game - as simple as a ball and a bat.  Yet, as complex as the American spirit it symbolizes.  It's a sport, business - and sometimes even religion.  ~Ernie Harwell, "The Game for All America," 1955