Mandrik 1

The Impact of Hitters on Winning Percentage and Salary Using Sabermetrics

Zach Mandrik

Professor Deprano

4/11/2017

Economics 490 Directed Research

Abstract

This paper addresses the question of whether advanced sabermetrics and traditional statistics are efficient predictors of a hitter’s salary entering free agency. In addition, the research looks to answer whether these same statistics can significantly predict a team’s winning percentage. Using data from 1985-2011, the models found that both traditional and advanced baseball metrics are significant in relation to salary and winning percentage. However, all of the regression models have low explanatory power in their ability to forecast results. I suggest that MLB clubs use these models as a checkpoint for free agent player salaries and/or ability to contribute to a team (winning percentage), rather than as an absolute determination of these variables.

Overview

Introduction
Literature Review
Statistics Review
    wOBA
    wRC+
    BsR
Regression Models
The Data
What are the outputs telling us?
Forecasts and Analysis
Conclusion
Suggestions for the Sabermetrics Community
References
Appendix; Graphs and Tables

Introduction

One of the major goals for a baseball franchise, or any professional sports franchise in general, is to ultimately win a championship to bring in fans. Winning, as a result, typically brings an inflow of revenue, which is an owner’s desire. A portion of building a winning baseball team is centered on statistics and analytics. Thanks to the works of Bill James and many other baseball analysts, the development of sabermetrics has revolutionized the way business is done in baseball. Major League Baseball (MLB) front offices year in and year out use analytics to sign players in the off-season to bolster their rosters. It’s essentially their job to maximize the efficiency of these investments to ensure their team can compete and win more games. The goal of this research is to develop statistical models that predict a hitter’s impact on winning along with forecasting the salary they deserve using both advanced and traditional statistics.

The invention of advanced metrics, including WAR, weighted runs created plus (wRC+), weighted on-base average (wOBA), and more, provides baseball analysts a deeper understanding of a player’s abilities. Using all of the necessary and available data, I want to test two separate relationships (salary and winning percentage) by regressing each on a multitude of hitting variables. I hypothesize that the statistics chosen will have a strong relationship with both salary and winning percentage. In addition, I hypothesize these models will enable teams to accurately validate whether a player is worth the investment.

Before diving straight into the results of this paper, I plan to introduce similar studies that address my question in a different manner. This leads directly into a statistics review of the advanced statistics chosen, followed by my reasons for choosing these variables. I’ll explain the origins of the data, then reveal the results of the experiment. Lastly, I’ll provide conclusions about sabermetrics and their relationship to salary and winning percentage tied to players entering or continuing free agency.

Literature Review

Past researchers have analyzed the relationships between a variety of baseball performance variables related to pay and winning. A couple of scholarly articles, including Miceli and Huber’s (2009) article in the Journal of Quantitative Analysis in Sports, explain how there is indeed a significant relationship between performance and winning. They also concluded that there isn’t a strong relationship between pay and performance at the team level.

To test this hypothesis, Nicholas Miceli and Alan Huber used a factor analysis to distinguish which team-level variables should be included in their regressions. The hitting variables chosen based on their analysis included hits, strikeouts, home runs, and walks. After running their models, they found that pay and performance are not strongly related at the team level. They did, however, find a statistically significant relationship between performance variables and individual pay, but the practical importance of the relationships (the R squared) was extremely low.

Miceli and Huber’s models and methods focus on the team level rather than the player level to determine where a team should focus its spending. This limits their regression models to using traditional statistics as independent variables to measure predicted salary and winning percentage. My models focus on the use of advanced sabermetrics to test the relationships with salary and winning percentage.

Another academic paper, Chang and Zenilman’s “Study of Sabermetrics in Major League Baseball…” (2013), focuses on the impact of sabermetrics on free agents. They created a hedonic pricing model, which included contract length, player height, stolen bases, on-base plus slugging percentage (OPS), ground into double plays (GDP), and Wins Above Replacement (WAR). With their model, they found that the Moneyball theory¹ has a tangible and lasting impact on MLB player valuations.

Chang and Zenilman (2013) ran regressions using player salary as the dependent variable with all of the previously mentioned independent variables for three different time periods. These time periods were labeled as pre-Moneyball (before 2000), post-Moneyball (2005), and post-post-Moneyball (2011). As a result, they found increasing significance in certain variables, including WAR. As time has passed, WAR has shown an increasing trend in monetary value as well as statistical significance.

Although this paper focuses its attention on multiple variables to create a pricing model, these authors revealed the impact of WAR over time on salaries. I’d assume that if WAR has had an increasing impact on salary, then other advanced statistics will carry a similar trend. It’s another reason why I analyze the effects of the other advanced statistics available on salary and winning percentage.

¹ Chang and Zenilman’s reference to “Moneyball theory” essentially means sabermetrics.

Statistics Review

wOBA

Tom Tango, a co-author of The Book, created weighted on-base average (wOBA), which essentially goes beyond standard rate statistics like OPS (on-base plus slugging) or batting average (AVG). The purpose of wOBA is to measure a hitter’s overall offensive value based on the relative values of each distinct offensive event (Tango 2007). Unlike on-base percentage (OBP) or AVG, wOBA treats each offensive outcome with linear weights to credit the hitter based on the outcome (e.g., a HR has a weight of 2.1). I included wOBA in my models because it’s easy to comprehend, since it is scaled similarly to OBP. In addition, the formula is based on a continually changing weight system according to the league average, keeping the statistic current with each passing year.

The weight system that creates the seasonal constants is a part of building wOBA. It requires calculating run expectancy matrices for each year to correspond to each year’s player wOBA. In general, “run expectancy measures the average number of runs scored (through the end of the current inning) given the current base-out state” (Weinberg 2016). These run expectancies essentially derive the weights, which are then scaled to on-base percentage (OBP). A further explanation of the weights and scaling used for wOBA can be found in Weinberg’s FanGraphs article, “The Beginner’s Guide to Deriving wOBA” (2016).

The formula for wOBA can be found in the appendix as Formula 1. In the numerator, it multiplies each statistic by its corresponding weight: unintentional walks (uBB), hit by pitch (HBP), singles, doubles, triples, and home runs. The denominator is simply at bats (AB) plus walks (BB) minus intentional walks (IBB) plus sacrifice flies (SF) plus hit by pitches (HBP). Since the weights associated with each variable change annually, the weights in Formula 1 are not the same from season to season.
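The calculation described above can be sketched in a few lines of Python. This is only an illustration: the weights are assumed values close to what FanGraphs has published for a single season (they are not taken from this paper’s appendix), and the `BattingLine` field names and the example season are hypothetical.

```python
from dataclasses import dataclass

# ASSUMPTION: illustrative linear weights for one season. Real weights change
# every year, so look up the season-specific constants before using real data.
ILLUSTRATIVE_WEIGHTS = {
    "uBB": 0.690,
    "HBP": 0.722,
    "1B": 0.888,
    "2B": 1.271,
    "3B": 1.616,
    "HR": 2.101,
}


@dataclass
class BattingLine:
    """Counting statistics for one hitter-season (field names are hypothetical)."""
    ab: int
    bb: int
    ibb: int
    hbp: int
    sf: int
    singles: int
    doubles: int
    triples: int
    hr: int


def woba(line: BattingLine, weights: dict = ILLUSTRATIVE_WEIGHTS) -> float:
    """Weighted on-base average: weighted events over AB + BB - IBB + SF + HBP."""
    unintentional_walks = line.bb - line.ibb
    numerator = (
        weights["uBB"] * unintentional_walks
        + weights["HBP"] * line.hbp
        + weights["1B"] * line.singles
        + weights["2B"] * line.doubles
        + weights["3B"] * line.triples
        + weights["HR"] * line.hr
    )
    denominator = line.ab + line.bb - line.ibb + line.sf + line.hbp
    if denominator <= 0:
        raise ValueError("no plate appearances to average over")
    return numerator / denominator


# Hypothetical season line, not a real player.
example = BattingLine(ab=500, bb=50, ibb=5, hbp=5, sf=5,
                      singles=100, doubles=30, triples=5, hr=20)
print(f"wOBA = {woba(example):.3f}")
```

Passing the weights as a parameter keeps the function usable for any season once that year’s constants are known.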

wRC+

In baseball, the only way to win is to score more runs than the opposing team. The sabermetrics community improved on Bill James’ runs created metric, which measures a hitter’s ability to produce runs, with a statistic called weighted runs created plus (wRC+) (FanGraphs 2017). Similar to wOBA, wRC+ is a rate statistic that credits a hitter based on the run value of each offensive outcome, but it also controls for run environment (ballpark and league). A more detailed explanation of how these factors are calculated can be found on the FanGraphs website, www.fangraphs.com (2017).

Park factors make wRC+ a highly regarded statistic because it values the hitter based on the ballpark they play in. Every ballpark has different distances, altitudes, and other factors, which is why it may be useful to provide this additional context in a statistic. For instance, a hitter who plays at Coors Field (a hitter’s park due to thinner air) won’t be credited as strongly as one who hits in Petco Park (a pitcher’s park). Because wRC+ is another metric that attempts to capture a player’s overall offensive abilities, I used it as a variable in my models.

wRC+ is appealing because it’s measured on a scale where the league average is set at 100. For instance, if a player has a wRC+ of 150, that means they are 50 percent better than the average player in their ability to create runs for their team. It’s an efficient and easy-to-understand statistic for comparing players’ offensive abilities. The rule of thumb is located in the appendix (Rule of Thumb 1) and indicates the rating scale for wRC+.

The formula for wRC+ is from the FanGraphs website and can be found in the appendix (Formula 2). In general terms, wRC+ compares the target player to league-average metrics while also including park and league factors calculated by FanGraphs.

BsR

Hitting isn’t the entirety of a player’s ability to score or drive in runs. That’s why I included BsR, which is an all-encompassing base-running statistic created by the people at FanGraphs (FanGraphs 2017). BsR is the base-running component of Wins Above Replacement (WAR) and goes beyond stolen bases. It’s an improvement over counting stolen bases to analyze a player’s base-running abilities.

BsR is a blend of three other existing base-running statistics: weighted stolen bases (wSB), ultimate base running (UBR), and weighted grounded into double plays (wGDP). The variables wGDP and wSB are self-explanatory, as they are both centered on the league average annually, while UBR evaluates a player based on the run expectancies of advancing (or not) on the bases. Additional information on how these are calculated can be found at FanGraphs.com (2017). Overall, BsR evaluates the impact a player has on the bases to provide runs for his team. The formula (Formula 3) and the rule of thumb for BsR (Rule of Thumb 2) are located in the appendix. The question now becomes, “How do these variables fit in with the models?”

Regression Model

The regressions test whether there is any difference in predictive power between traditional and advanced statistics. The goal is to discover the relationships between the dependent variables (salary and winning percentage) and a variety of independent variables. I split some of my models up by traditional statistics and advanced sabermetrics to also check whether there is a difference between the two with regard to statistical significance.

Before running these models, I create correlation matrices for traditional statistics and advanced sabermetrics to avoid multicollinearity in my regressions (see fig. 1 and fig. 2). The figures illustrate the relationships between all of the variables I consider using in my analysis. The colors indicate how strongly or weakly correlated the variables are. Darker shades of blue indicate higher correlation, whereas no color or red indicates a zero or negative correlation between variables.
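As a rough illustration of this screening step, the sketch below builds a Pearson correlation matrix in plain Python and flags pairs of candidate predictors that are strongly correlated. The five-player sample and the 0.8 cutoff are hypothetical choices made for the example, not values from this paper (the figures themselves were created in RStudio).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Nested dict: matrix[a][b] is the correlation between columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


def highly_correlated_pairs(matrix, threshold=0.8):
    """Pairs of distinct variables whose |correlation| meets the threshold."""
    names = list(matrix)
    return [
        (a, b, matrix[a][b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(matrix[a][b]) >= threshold
    ]


# Hypothetical sample of five hitters (made-up numbers).
sample = {
    "wOBA": [0.300, 0.320, 0.340, 0.360, 0.380],
    "wRC+": [85, 100, 112, 130, 142],
    "BsR": [2.0, -1.0, 0.5, -3.0, 1.5],
}

matrix = correlation_matrix(sample)
flagged = highly_correlated_pairs(matrix)
for a, b, r in flagged:
    print(f"{a} and {b} are strongly correlated (r = {r:.2f})")
```

Pairs flagged this way are candidates to keep in separate models rather than in the same regression.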

I decided to use wOBA (weighted on-base average), wRC+ (weighted runs created plus), hits (H), and strikeouts (SO) as independent variables based on the matrices. These variables are included for both dependent variables (salary and winning percentage). I did not use multiple regression in most of my models because the independent variables are correlated in some way, which influences the coefficients in the models (see regression results).

The Data

I use data from both FanGraphs and Sean Lahman’s baseball database, which contain numerous tables of data with filtering capabilities. FanGraphs enables users to create custom data tables with any traditional or advanced statistics for any given period of time. The data I use includes various variables from the period 1985-2011, since I want to compare my predictions against actual results from 2012-2016. After gathering data from FanGraphs, I merged it with all of the data from Lahman’s database.

Lahman’s database contains multiple tables of data, including a teams table, master table, batting table, and salary table. To get the FanGraphs data to merge correctly with the salary data, I combine the teams table with the players’ batting table by yearID and teamID. This is because we want to know how many wins a player’s team had with that player on the team. I then integrate the salary table within Lahman by playerID. After successfully performing these tasks, I merge the advanced statistics from FanGraphs with this newly formed table.

Because FanGraphs identifies players differently than Lahman’s playerID, I merge the master table with the newly merged team/batter/salary table to include full first and last names. After combining the data sets by season, first name, and last name, I have a working data set that has advanced metrics, salary, and team wins for each player for the years 1985-2011.
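The merge sequence above can be sketched with a small join helper in plain Python. Everything here is an assumption for illustration: the miniature tables are made up, the column names are modeled on Lahman’s layout and a typical FanGraphs export, and the actual merging tool used for this paper is not specified. Note that joining on names can mismatch players who share a first and last name in the same season.

```python
def inner_join(left, right, keys):
    """Keep rows of `left` that have a match in `right` on `keys`, merging columns."""
    lookup = {tuple(row[k] for k in keys): row for row in right}
    joined = []
    for row in left:
        match = lookup.get(tuple(row[k] for k in keys))
        if match is not None:
            joined.append({**match, **row})
    return joined


# Hypothetical miniature tables (made-up players and teams).
batting = [
    {"playerID": "doejo01", "yearID": 2010, "teamID": "AAA", "H": 150, "SO": 80},
    {"playerID": "roeja01", "yearID": 2010, "teamID": "BBB", "H": 120, "SO": 95},
]
teams = [
    {"yearID": 2010, "teamID": "AAA", "W": 90, "G": 162},
    {"yearID": 2010, "teamID": "BBB", "W": 70, "G": 162},
]
salaries = [
    {"playerID": "doejo01", "yearID": 2010, "salary": 5_000_000},
    {"playerID": "roeja01", "yearID": 2010, "salary": 1_200_000},
]
master = [
    {"playerID": "doejo01", "nameFirst": "John", "nameLast": "Doe"},
    {"playerID": "roeja01", "nameFirst": "Jane", "nameLast": "Roe"},
]
fangraphs = [
    {"Season": 2010, "Name": "John Doe", "wOBA": 0.350, "wRC+": 120, "BsR": 1.5},
]


def fangraphs_keys(row):
    """Reshape a FanGraphs-style row so it shares key columns with the Lahman tables."""
    first, last = row["Name"].split(" ", 1)
    return {"yearID": row["Season"], "nameFirst": first, "nameLast": last,
            "wOBA": row["wOBA"], "wRC+": row["wRC+"], "BsR": row["BsR"]}


# Step 1: attach team wins to each batting row by season and team.
merged = inner_join(batting, teams, ["yearID", "teamID"])
# Step 2: attach salary by player (and season).
merged = inner_join(merged, salaries, ["playerID", "yearID"])
# Step 3: attach first and last names from the master table.
merged = inner_join(merged, master, ["playerID"])
# Step 4: attach advanced metrics by season, first name, and last name.
merged = inner_join(merged, [fangraphs_keys(r) for r in fangraphs],
                    ["yearID", "nameFirst", "nameLast"])

for row in merged:
    row["win_pct"] = row["W"] / row["G"]

print(merged)
```

Only the player present in every table survives the inner joins, which is one reason the final data set is smaller than any of its sources.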

One last point to mention is that all of the samples include only players age 28 or older. The reason is that the average rookie age is 24 years old and a player must have six years of service to be eligible for free agency (Isaacs 2012). This isn’t a given fact but an assumption based on the average age of rookies and assuming they stay up for six years of service. The assumption is that a free agent will typically be around 28-32 years of age their first time around.

The final data set includes 2289 observations with 61 variables. Of course, not every variable was used in this paper, but I decided to include them anyway in case I ever decided to go back. The details of the data can be found in the appendix (Table 5). The table contains the first 20 rows of each variable of the free agent data from 1985-2011.

What Do the Regression Outputs Mean?

All of the variables in the winning percentage regressions are highly statistically significant (illustrated by low p-values). This is a positive sign because the models indicate a positive relationship between winning percentage and the independent variables. However, all of these models have R squared values between roughly .03 and .07 (see Table 1), which is fairly low. Although the explanatory power isn’t as high as expected, it isn’t necessarily a terrible thing because the variables in all of the regressions were still significant.

In all of the regressions, the coefficients produce expected results as far as their relationship to winning percentage. Model 1, for example, indicates that for every additional unit of wRC+, winning percentage increases by .00005217 on average while holding all other variables constant. That’s a relatively low number (even if scaled) and doesn’t necessarily give stellar predictions when applying the model. The good news is there’s an indication that wRC+ is statistically significant.
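To put that coefficient in perspective, here is a rough worked example. It takes the Model 1 coefficient at face value and assumes a 162-game season; the 50-point gap in wRC+ (roughly an average hitter versus one rated 150) is a hypothetical comparison.

```latex
\begin{align*}
\Delta \text{WinPct} &= 0.00005217 \times 50 = 0.0026085 \\
\Delta \text{Wins} &\approx 0.0026085 \times 162 \approx 0.42
\end{align*}
```

Under those assumptions, the model attributes less than half a win per season to a 50-point difference in wRC+.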

As with the winning percentage regressions, all the variables in the salary regressions are statistically significant (illustrated by low p-values). Again, this informs researchers that there is a relationship between salary and the sabermetric/traditional statistics in each model. Some independent variables in the regressions have the expected coefficient signs, while others don’t.

For the majority of the models, the independent variables have the expected coefficient signs and statistical relationships with salary. In models 5 and 6, wRC+ demonstrates a positive relationship. However, in model 5, BsR shows a negative coefficient. One would expect a higher BsR to increase salary on average because the higher the BsR, the better the player is at running the bases. This might be the case because there is a slight negative correlation between BsR and wRC+. Model 8 regresses salary on the traditional statistics and has a similar problem to model 5 in the sense that strikeouts have a positive coefficient for salary. Again, this is probably due to the correlation between the variables hits and strikeouts. Other than in these two models, the independent variables carry favorable coefficient values. The summary for each model can be found in Table 1 in the appendix.

Forecasts and Analysis

The subjects for these models include three players of different calibers (All-Star, average, and bench), all from free agent classes beyond 2011. These players are Jose Reyes, Jeff Keppinger, and Mike Aviles. Tables 2-4 contain the actual results versus the predicted results for winning percentage and salary for each player. Though I wouldn’t deem these models excellent, they provided moderately accurate results.

Before explaining the results for each player, I must mention that in every case, winning percentage predictions should be taken with a grain of salt. This is because winning percentage is not derived from a single player and is dependent on a team’s structure. The most significant and interesting results from the winning percentage models come from model 3. For all players, we see that wOBA is a significant predictor in some context for a team’s ability to win. But again, take this lightly because of the context of winning percentage. Overall, there is a significant relationship between wOBA and a team’s winning percentage.

The accuracy of the salary results varies by player. We’ll start with the All-Star case, Jose Reyes. Reyes at the time was classified as a perennial All-Star for his abilities to hit and run the bases. After the 2011 season, he became a free agent and signed a 6-year, $106 million contract with the Miami Marlins (Brisbee 2011). The average annual value of his contract was about $17.7 million, which is the number I compared the predicted values against. All of the predicted results were in the $4-$6 million range, which isn’t close to $17.7 million (see Table 2). These predicted values aren’t necessarily a surprise, given that a regression predicts the dependent variable based on averages. This might explain why the salary predictions for Jeff Keppinger and Mike Aviles were more accurate.
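To make the mechanics of these forecasts concrete, the sketch below plugs a stat line into the four salary equations reported in Table 1. The coefficients come from that table; the input values for Reyes are assumptions chosen so that the equations reproduce the salary predictions in Table 2, not figures quoted from the data set.

```python
# Salary models 5-8 as reported in Table 1 (intercept plus coefficients).
SALARY_MODELS = {
    5: {"intercept": -1005676, "wRC+": 50201, "BsR": -118827},
    6: {"intercept": -1070287, "wRC+": 50631},
    7: {"intercept": -7497207, "wOBA": 34526847},
    8: {"intercept": -159843, "H": 18254, "SO": 25516},
}


def predict(model, stats):
    """Evaluate a linear model: intercept + sum(coefficient * statistic)."""
    return model["intercept"] + sum(
        coef * stats[name] for name, coef in model.items() if name != "intercept"
    )


# ASSUMPTION: inputs chosen to be consistent with the Table 2 predictions.
reyes_inputs = {"wRC+": 142, "BsR": 6.7, "wOBA": 0.376, "H": 181, "SO": 41}

# Average annual value of the 6-year, $106 million contract.
actual_aav = 106_000_000 / 6

predictions = {n: predict(m, reyes_inputs) for n, m in SALARY_MODELS.items()}
for number, predicted in predictions.items():
    gap = actual_aav - predicted
    print(f"Model {number}: predicted ${predicted:,.2f}, shortfall ${gap:,.2f}")
```

Every model lands more than $11 million below the actual average annual value, which matches the pattern in Table 2.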

Jeff Keppinger was never an All-Star in his career but served as an everyday starter for a majority of his career. Based on his 2012 numbers, the models closely predict the average annual value of the 3-year, $12 million deal he signed for the 2013 season (Padilla 2012). Out of models 5-8, model 7 (wOBA) has the smallest differential between the actual and predicted results at $656,243.14 (see Table 3). So far, most of the models slightly overvalue a player with abilities similar to Keppinger’s.

The last prediction case is Mike Aviles, who served as a starter with the Cleveland Indians in 2013-2014 but later phased into a bench role with them in 2015. After completing his 2-year contract with the Indians, he re-signed with the Tribe in 2015 on a 1-year, $3.5 million contract. As in Keppinger’s case, the models came within roughly $0.9 million to $1.6 million of Aviles’ actual contract. However, model 8, which uses hits and strikeouts as independent variables, has the lowest differential (see Table 4). One could make the case that traditional statistics are better predictors, but in Keppinger’s case advanced metrics did a better job. Overall, these models predicted well for average and bench players in the league but not as well for All-Star caliber players.

Conclusion

After running these regressions using both advanced sabermetrics and traditional statistics, I believe MLB teams should use these prediction models with caution. Although the variables are significant in each model and provide sufficient results for an average/bench player, neither the sabermetric statistics nor the traditional statistics provide sufficient explanatory power for forecasting. We know this because all of the models had extremely low R squared values. Before concluding matters entirely, there are a few problems with these models, and suggestions I have, that the sabermetric community could further analyze.

Suggestions for the Sabermetrics Community

Aside from the models themselves, there are a couple of other things I would have done differently. One of the concerns throughout the research process was gathering accurate free agent data. To avoid assumptions, I wanted contract data for every free agent player dating back to 1985, since this was as far back as I could go using Lahman’s database. Instead, I assumed players hitting free agency were around age 28 and older based on average rookie ages. For validity, I would have preferred to include every player who reached free agency based on service time rather than an age assumption. With the proper resources, in my case time, I would have gathered data on every free agent and the number of years in their contracts as well to ensure sound results. In addition, I also believe this experiment could be improved by using a dependent variable other than winning percentage, perhaps marginal wins.

It would be interesting to predict efficiency using marginal wins attached to a player as the dependent variable. This would entail looking at a team’s wins before a player signed with the team versus the number of wins attained after the player signed. Maybe there’s a significant econometric model that uses sabermetrics to predict marginal wins and then uses those predicted wins in an efficiency formula. Unfortunately, due to time constraints I couldn’t perform this sort of analysis. However, I encourage those in the sabermetrics community to investigate this hypothesis and reveal what they come up with.
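One possible way to frame that idea is sketched below. The paper does not define an efficiency formula, so the dollars-per-marginal-win ratio here is purely an assumption, and the team totals and salary are hypothetical.

```python
def marginal_wins(wins_before, wins_after):
    """Change in team wins from the season before signing to the season after."""
    return wins_after - wins_before


def dollars_per_marginal_win(salary, wins_before, wins_after):
    """Hypothetical efficiency measure: salary paid per additional team win.

    Returns None when the team did not improve, since the ratio is not
    meaningful in that case.
    """
    delta = marginal_wins(wins_before, wins_after)
    if delta <= 0:
        return None
    return salary / delta


# Hypothetical signing: a 75-win team wins 81 games after adding a $12M hitter.
cost = dollars_per_marginal_win(12_000_000, wins_before=75, wins_after=81)
print(f"${cost:,.0f} per marginal win")
```

A raw before-and-after difference credits the whole change to one player, so a real study would need to control for other roster moves.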

Lastly, the most significant shortcoming of this research is the lack of an inflation adjustment for salary. The salary data dates back to 1985, which could have affected the results since salaries are not adjusted to today’s dollars. One possible solution would be to run regressions on only a couple of years of data and use those results to forecast salary. The other option would be to go through and adjust every player’s salary to today’s dollars, but this could become extremely time consuming. All in all, I encourage aspiring baseball researchers interested in forecasting baseball salaries to test these results and report their findings.
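The second option need not be done by hand. A minimal sketch, assuming a yearly price index is available: each salary is scaled by the ratio of the target year’s index to the index for the salary’s own year. The index values and salary rows below are hypothetical placeholders, not real CPI figures or real contracts.

```python
def to_target_dollars(nominal, year, price_index, target_year):
    """Convert a nominal amount from `year` into `target_year` dollars."""
    if year not in price_index or target_year not in price_index:
        raise KeyError("price index is missing a required year")
    return nominal * price_index[target_year] / price_index[year]


# HYPOTHETICAL index values for illustration only.
price_index = {1985: 100.0, 2000: 160.0, 2017: 230.0}

# Hypothetical salary rows in the shape (season, nominal salary).
salary_rows = [(1985, 1_000_000), (2000, 4_000_000), (2017, 9_000_000)]

adjusted = [
    (year, to_target_dollars(salary, year, price_index, target_year=2017))
    for year, salary in salary_rows
]
for year, real_salary in adjusted:
    print(f"{year}: ${real_salary:,.0f} in 2017 dollars")
```

Adjusting the salary column once before running the regressions keeps every observation on the same dollar scale.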

Sources

Brisbee, Grant. 2011. “Jose Reyes Contract Details Released.” SB Nation. http://www.sbnation.com/2011/12/7/2618435/jose-reyes-contract-details-released
2017. “Salary Arbitration.” FanGraphs. http://www.fangraphs.com/library/business/mlb-salary-arbitration-rules/
Chang, Jason and Zenilman, Joshua. 2013. “A Study of Sabermetrics in Major League Baseball: The Impact of Moneyball on Free Agent Salaries.” Washington University at St. Louis. http://olinblog.wustl.edu/wp-content/uploads/AStudyofSabermetricsinMajorLeagueBaseball.pdf
Krautmann, Anthony and Solow, John. 2009. “The Dynamics of Performance Over the Duration of Major League Baseball Long-Term Contracts.” Journal of Sports Economics. p. 1. http://journals.sagepub.com/doi/pdf/10.1177/1527002508327382
Gennaro, Vince. 2007. “Diamond Dollars: The Economics of Winning in Baseball (Part I).” The Hardball Times. p. 1. http://www.hardballtimes.com/diamond-dollars-the-economics-of-winning-in-baseball-part-1/
Isaacs, Noah. 2012. “Minor League Leaderboard Context.” FanGraphs. p. 1. http://www.fangraphs.com/blogs/minor-league-leaderboard-context/
Miceli, Nicholas S. and Huber, Alan D. 2009. “‘If the Team Doesn’t Win, Nobody Wins:’ A Team-Level Analysis of Pay and Performance Relationships in Major League Baseball.” Journal of Quantitative Analysis in Sports: Vol. 5: Iss. 2, Article 6. https://www.degruyter.com/downloadpdf/j/jqas.2009.5.2/jqas.2009.5.2.1170/jqas.2009.5.2.1170.pdf
Padilla, Doug. 2012. “Jeff Keppinger Joins White Sox.” ESPN. http://www.espn.com/chicago/mlb/story/_/id/8732840/jeff-keppinger-agrees-deal-chicago-white-sox
Tango, Tom, Lichtman, Mitchel, and Dolphin, Andrew. 2007. “Toolshed,” The Book: Playing the Percentages in Baseball. p. 29. Potomac Books, Inc.
Weinberg, Neil. 2016. “The Beginner’s Guide to Deriving wOBA.” FanGraphs. http://www.fangraphs.com/library/the-beginners-guide-to-deriving-woba/
Associated Press. 2014. “Mike Aviles gets two-year contract.” ESPN. http://www.espn.com/mlb/story/_/id/8926748/cleveland-indians-sign-mike-aviles-two-year-deal
2013. “2013 MLB Free Agent Tracker.” MLB Trade Rumors. http://www.mlbtraderumors.com/2013-mlb-free-agent-tracker
2013. “What is WAR?” FanGraphs. http://www.fangraphs.com/library/misc/war/
2017. “Park Factors – 5 year regressed.” FanGraphs. http://www.fangraphs.com/library/park-factors-5-year-regressed/
2017. FanGraphs. http://www.fangraphs.com/statss.aspx?playerid=1943&position=P
2017. “wRC and wRC+.” FanGraphs. http://www.fangraphs.com/library/offense/wrc/
2017. “wRC+ and Lessons of Context.” FanGraphs. http://www.fangraphs.com/library/wrc-and-lessons-of-context/
2017. “wOBA.” FanGraphs. http://www.fangraphs.com/library/offense/woba/

Appendix

Formula 1: wOBA

Formula 2: wRC+

Rule of Thumb 1: wRC+

FanGraphs

Formula 3: BsR

BsR = wSB + UBR + wGDP

Rule of Thumb 2: BsR

FanGraphs

Fig. 1. Correlation Matrix of Traditional Statistics

Created in RStudio

Fig. 2. Correlation Matrix of Advanced Sabermetrics

Created in RStudio

Table 1: Summary of models

Model | Dependent Variable | Intercept + Independent Variable(s) coefficients | R Squared | P-Value
1 | Winning Percentage | .04515 + .00005217 wRC+ + .0001459 BsR | 0.03975 | < 2.2e-16
2 | Winning Percentage | .04523 + .00005164 wRC+ | 0.06762 | < 2.2e-16
3 | Winning Percentage | .40551 + .29881 wOBA | 0.02925 | < 2.2e-16
4 | Winning Percentage | .04524 + .00003256 H + .00001309 SO | 0.02859 | 4.00E-15
5 | Salary | -1005676 + 50201 wRC+ - 118827 BsR | 0.09118 | < 2.2e-16
6 | Salary | -1070287 + 50631 wRC+ | 0.0848 | < 2.2e-16
7 | Salary | -7497207 + 34526847 wOBA | 0.09613 | < 2.2e-16
8 | Salary | -159843 + 18254 H + 25516 SO | 0.05899 | < 2.2e-16

Table 2: Jose Reyes Predictions

Jose Reyes Predictions 2012 (All Star)

Model | Predicted Results | Actual Results | Differential
1 | 0.054 | 0.475 | 0.422
2 | 0.053 | 0.475 | 0.423
3 | 0.518 | 0.475 | 0.043
4 | 0.052 | 0.475 | 0.424
5 | $5,326,725.10 | $17,666,667.00 | $12,339,941.90
6 | $6,119,315.00 | $17,666,667.00 | $11,547,352.00
7 | $5,484,887.47 | $17,666,667.00 | $12,181,779.53
8 | $4,190,287.00 | $17,666,667.00 | $13,476,380.00

Table 3: Jeff Keppinger Predictions

Jeff Keppinger Predictions 2013 (Average)

Model | Predicted Results | Actual Results | Differential
1 | 0.051 | 0.389 | 0.338
2 | 0.052 | 0.389 | 0.337
3 | 0.406 | 0.389 | 0.017
4 | 0.045 | 0.389 | 0.344
5 | $5,776,533.00 | $4,000,000.00 | $1,776,533.00
6 | $5,410,481.00 | $4,000,000.00 | $1,410,481.00
7 | $4,656,243.14 | $4,000,000.00 | $656,243.14
8 | $2,912,903.00 | $4,000,000.00 | $1,087,097.00

Table 4: Mike Aviles Predictions

Mike Aviles Predictions 2015 (Bench)

Model | Predicted Results | Actual Results | Differential
1 | 0.045 | 0.503 | 0.46
2 | 0.045 | 0.503 | 0.46
3 | 0.406 | 0.503 | 0.10
4 | 0.045 | 0.503 | 0.46
5 | $2,258,857.30 | $3,500,000.00 | $1,241,142.70
6 | $2,473,883.00 | $3,500,000.00 | $1,026,117.00
7 | $1,894,095.38 | $3,500,000.00 | $1,605,904.62
8 | $2,642,031.00 | $3,500,000.00 | $857,969.00

Model 1: WinPct = wRC+ + BsR

Model 2: WinPct = wRC+

Model 3: WinPct = wOBA

Model 4: WinPct = H + SO

Model 5: Salary = wRC+ + BsR

Model 6: Salary = wRC+

Model 7: Salary = wOBA

Model 8: Salary = H + SO

Table 5: Free Agent Data (first 20 rows) Used in Research (1985-2011)
