Mandrik 1
The Impact of Hitters on Winning Percentage and Salary Using Sabermetrics
Zach Mandrik
Professor Deprano
4/11/2017
Economics 490 Directed Research
Abstract
This paper addresses whether advanced sabermetrics and traditional statistics are efficient predictors of a hitter’s salary entering free agency. In addition, the research asks whether these same statistics can significantly predict a team’s winning percentage. Using data from 1985-2011, the models found that both traditional and advanced baseball metrics are significant in relation to salary and winning percentage. However, all of the regression models have low explanatory power in their ability to forecast results. I suggest that MLB clubs use these models as a checkpoint for free agent player salaries and/or ability to contribute to a team (winning percentage), rather than as an absolute determination of these variables.
Overview
Introduction
Literature Review
Statistics Review
  wOBA
  wRC+
  BsR
Regression Models
The Data
What are the outputs telling us?
Forecasts and Analysis
Conclusion
Suggestions for the Sabermetrics Community
References
Appendix: Graphs and Tables
Introduction
One of the major goals for a baseball franchise, or any professional sports franchise
in general, is to ultimately win a championship to bring in fans. Winning as a result
typically brings an inflow of revenue, which is an owner’s desire. A portion of building
a winning baseball team is centered on statistics and analytics. Thanks to the works
of Bill James and many other baseball analysts, the development of sabermetrics has
revolutionized the way business is done in baseball. Major League Baseball (MLB)
front offices year in and year out use analytics to sign players in the off-season to
bolster their rosters. It’s essentially their job to maximize the efficiency of these
investments to ensure their team can compete and win more games. The goal of this
research is to develop statistical models that predict a hitter’s impact on winning
along with forecasting the salary they deserve using both advanced and traditional
statistics.
The invention of advanced metrics including WAR, weighted runs created plus (wRC+),
weighted on base average (wOBA), and more, provides baseball analysts a deeper
understanding of a player’s abilities. Using all of the necessary and available data, I
test two separate relationships (salary and winning percentage) by regressing each
on a multitude of hitting variables. I hypothesize that the statistics chosen will
have a strong relationship with both salary and winning percentage. In addition, I
hypothesize these models will enable teams to accurately validate whether a player is
worth the investment.
Before diving straight into the results of this paper, I introduce similar studies
that address my question in a different manner. This leads directly into a review
of the advanced statistics chosen, followed by my reasons for choosing these
variables. I’ll explain the origins of the data then reveal the results of the experiment.
Lastly, I’ll provide conclusions about sabermetrics and their relationship to salary and
winning percentage tied to players entering or continuing free agency.
Literature Review
Past researchers have analyzed the relationships between a variety of baseball
performance variables related to pay and winning. A couple of scholarly articles,
including Miceli and Huber’s (2009) article in the Journal of Quantitative Analysis in
Sports, explain how there is indeed a significant relationship between performance
and winning. They also concluded that there isn’t a strong relationship between pay
and performance at the team level.
To test this hypothesis, Nicholas Miceli and Alan Huber used a factor analysis to
distinguish which team-level variables should be included in their regressions. The
hitting variables chosen based on their analysis included hits, strikeouts, home runs,
and walks. After running their models they found that pay and performance are not
strongly related at the team level. They did however find a statistically significant
relationship between performance variables and individual pay, but the practical
importance of the relationships (the R squared) was extremely low.
Miceli and Huber’s models and methods focus on the team level rather than the player
level to determine where a team should focus its spending. This limits their
regression models to using traditional statistics as independent variables to measure
predicted salary and winning percentage. My models focus on the use of advanced
sabermetrics to test the relationships with salary and winning percentage.
Another academic paper, Chang and Zenilman’s “Study of Sabermetrics in Major
League Baseball…” (2013), focuses on the impact of sabermetrics on free agents. They
created a hedonic pricing model, which included contract length, player height, stolen
bases, on-base plus slugging percentage (OPS), grounded into double plays (GDP), and
Wins Above Replacement (WAR). With their model, they found that the Moneyball
theory¹ has a tangible and lasting impact on MLB player valuations.
Chang and Zenilman (2013) ran regressions using player salary as the dependent
variable with all of the previously mentioned independent variables for 3 different
time periods. These time periods were labeled as pre-moneyball (before 2000),
post-moneyball (2005), and post post-moneyball (2011). As a result, they found increasing
significance in certain variables including WAR. As time has passed, WAR has shown
an increasing trend in monetary value as well as statistical significance.
Although this paper focuses its attention on multiple variables to create a pricing
model, these authors revealed the impact of WAR over time on salaries. I’d assume if
WAR has had an increasing impact on salary, then other advanced statistics will carry
a similar trend. It’s another reason why I analyze the effects of the other advanced
statistics available on salary and winning percentage.
¹ Chang and Zenilman’s reference to “Moneyball theory” essentially means sabermetrics.
Statistics Review
wOBA
Tom Tango, the author of The Book, created weighted on base average (wOBA), which
essentially goes beyond standard rate statistics like OPS (on base plus slugging) or
batting average (AVG). The purpose of wOBA is to measure a hitter’s overall offensive
value based on the relative values of each distinct offensive event (Tango 2007).
Unlike On-Base Percentage (OBP) or AVG, wOBA treats each offensive outcome with
linear weights to credit the hitter based on the outcome (ex. HR has weight of 2.1). I
included wOBA in my models because it’s easy to comprehend, since it is scaled
similarly to OBP. In addition, the formula is based on a continually changing weight
system according to the league average, keeping the statistic current with each
passing year.
The weight system that creates the seasonal constants is a part of building wOBA. It
requires calculating run expectancy matrices for each year to correspond to each
year’s player wOBA. In general, “run expectancy measures the average number of
runs scored (through the end of the current inning) given the current base-out state”
(Weinberg 2016). These run expectancies essentially derive the weights, which are
then scaled to on base percentage (OBP). A further explanation of the weights and
scaling used for wOBA can be found in Weinberg’s FanGraphs article, “The
Beginner’s Guide to Deriving wOBA” (2016).
The formula for wOBA can be found in the appendix as formula 1. It basically
multiplies each statistic by its corresponding weight in the numerator: unintentional
walks (uBB), hit by pitch (HBP), singles, doubles, triples, and home runs. The
denominator is simply at bats (AB) plus walks (BB) minus intentional walks (IBB)
plus sacrifice flies (SF) plus hit by pitches (HBP). Since the weights associated with each
variable change annually, the weights shown in formula 1 do not hold for every
season.
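
The numerator and denominator described above can be sketched in a few lines of Python. This is only an illustration: the weights below are rounded placeholder values (an assumption, chosen to sit near the home run weight of 2.1 quoted earlier), not the season-specific weights used in this paper, and the `BattingLine` class and sample hitter are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Counting stats for one hitter-season (hypothetical container)."""
    ab: int        # at bats
    bb: int        # total walks
    ibb: int       # intentional walks
    hbp: int       # hit by pitch
    sf: int        # sacrifice flies
    singles: int
    doubles: int
    triples: int
    hr: int


# Placeholder weights (assumption). Real wOBA weights change every season.
EXAMPLE_WEIGHTS = {"ubb": 0.69, "hbp": 0.72, "1b": 0.89,
                   "2b": 1.27, "3b": 1.62, "hr": 2.10}


def woba(line: BattingLine, w: dict = EXAMPLE_WEIGHTS) -> float:
    """Weighted on base average: weighted events over plate-appearance base."""
    ubb = line.bb - line.ibb  # unintentional walks
    numerator = (w["ubb"] * ubb + w["hbp"] * line.hbp
                 + w["1b"] * line.singles + w["2b"] * line.doubles
                 + w["3b"] * line.triples + w["hr"] * line.hr)
    denominator = line.ab + line.bb - line.ibb + line.sf + line.hbp
    return numerator / denominator


# Hypothetical hitter, not a player from the data set.
hitter = BattingLine(ab=500, bb=50, ibb=5, hbp=5, sf=5,
                     singles=100, doubles=30, triples=3, hr=20)
print(round(woba(hitter), 3))
```

Passing a different `w` dictionary for each season is one simple way to reflect the annually changing weights.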
wRC+
In baseball, the only way to win is to score more runs than the opposing team. The
sabermetrics community improved Bill James’ runs created metric, which measures
a hitter’s ability to provide runs, into a statistic called weighted runs created plus
(wRC+) (FanGraphs 2017). Similar to wOBA, wRC+ is a rate statistic that credits a hitter based
on the run value of each offensive outcome, but also controls for run environment
(ballpark and league). A further detailed explanation of how these factors are
calculated can be found at the FanGraphs website, www.fangraphs.com (2017).
Park factors make wRC+ a highly regarded statistic because it values the hitter based
on the ballpark they play in. Every ballpark has different distances, altitudes, and
other factors, which is why it may be useful to provide this additional context in a
statistic. For instance, a hitter who plays at Coors Field (a hitter’s park due to thinner
air) won’t be credited as strongly as one who hits in Petco Park (a pitcher’s park).
Because wRC+ is another metric that attempts to capture a player’s overall offensive
abilities, I used it as a variable for my models.
wRC+ is appealing because it’s measured on a scale where the average is set at 100.
For instance, if a player has a wRC+ of 150 that means they are 50 percentage points
above the average player in their ability to create runs for their team. It’s an efficient
and easy to understand statistic for comparing players’ offensive abilities. The rule of
thumb is located in the appendix (Rule of Thumb 1) and indicates the rating scale for wRC+.
The formula for wRC+ is from the FanGraphs website and can be located in the appendix
(Formula 2). In general terms, wRC+ essentially compares the target player to league
average metrics while also including park and league factors calculated
by FanGraphs.
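
For readers without the appendix at hand, the general shape of the calculation can be written out as follows. This is my recollection of the form published in the FanGraphs library, so treat the exact layout as an assumption and check it against the source. Here wRAA is weighted runs above average, PA is plate appearances, lgR/PA is league runs per plate appearance, PF is the park factor expressed as a ratio (1.00 for a neutral park), and lgwRC/PA is league wRC per plate appearance excluding pitchers.

```latex
\[
\mathrm{wRC}^{+} =
\frac{\left(\dfrac{\mathrm{wRAA}}{\mathrm{PA}} + \mathrm{lgR/PA}\right)
      + \left(\mathrm{lgR/PA} - \mathrm{PF}\cdot\mathrm{lgR/PA}\right)}
     {\mathrm{lgwRC/PA}} \times 100
\]
```

In a neutral park (PF = 1) the second term in the numerator vanishes, and a hitter with wRAA of zero lands near 100, which matches the average scale described above.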
BsR
Hitting isn’t the entirety of a player’s ability to score or drive in runs. It’s why I
included BsR, which is an all-encompassing base-running statistic created by the
people at FanGraphs (FanGraphs 2017). BsR is the base-running component of Wins
Above Replacement (WAR) and accounts for more than stolen bases. It’s an improvement
over counting stolen bases to analyze a player’s base running abilities.
BsR is a blend of three other existing base-running statistics: weighted stolen bases
(wSB), ultimate base running (UBR), and weighted grounded into double plays (wGDP).
The variables wGDP and wSB are self-explanatory as they are both centered on the
league average annually, while UBR evaluates a player based on the run expectancies
of advancing (or not) on the bases. Additional information as to how these are
calculated can be found at FanGraphs.com (2017). Overall, BsR evaluates the impact
a player has on the bases to provide runs for his team. The formula (Formula 3) and
the rule of thumb for BsR (Rule of Thumb 2) are located in the appendix. The question now
becomes, “how do these variables fit in with the models?”
Regression Model
The regressions test whether there is any difference in prediction power between
traditional and advanced statistics. The goal is to discover the relationships between
the dependent variables (salary and winning percentage) and a variety of
independent variables. I split some of my models up by traditional statistics and
advanced sabermetrics to also check if there is a difference between the two in
regards to statistical significance.
Before running these models, I create correlation matrices for traditional statistics
and advanced sabermetrics to avoid multicollinearity in my regressions (see fig. 1 and
fig. 2). The figures illustrate the relationship between all of the variables I consider
using in my analysis. The colors indicate how strongly or poorly correlated the
variables are. Darker shades of blue indicate higher correlation, while no color or red
indicates a zero or negative correlation between variables.
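
A minimal sketch of this screening step is shown here in Python with NumPy (the figures in this paper were created in RStudio, so this is a stand-in rather than the original code). The data are synthetic, and the 0.7 cutoff for flagging a pair is an arbitrary assumption for illustration only.

```python
import numpy as np

# Synthetic hitter-seasons (assumption): wRC+ is built to move with hits,
# strikeouts are drawn independently.
rng = np.random.default_rng(0)
n = 200
hits = rng.normal(140, 25, n)
strikeouts = rng.normal(90, 20, n)
wrc_plus = 100 + 0.4 * (hits - 140) + rng.normal(0, 15, n)

names = ["H", "SO", "wRC+"]
# np.corrcoef treats each row as one variable by default.
corr = np.corrcoef(np.vstack([hits, strikeouts, wrc_plus]))

# Flag pairs whose absolute correlation exceeds an illustrative cutoff.
THRESHOLD = 0.7
flagged = [
    (names[i], names[j], float(corr[i, j]))
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if abs(corr[i, j]) > THRESHOLD
]

print(np.round(corr, 2))
print("Pairs to avoid putting in the same regression:", flagged)
```

Any pair that ends up in `flagged` is a candidate for being kept out of the same regression, which is the purpose the matrices serve here.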
I decided to use wOBA (weighted on base average), wRC+ (weighted runs created plus),
hits (H), and strikeouts (SO) as independent variables based on the matrices. These
variables are included for both dependent variables (Salary and Winning Percentage).
I did not use multiple regression in most of my models because the independent
variables are correlated in some way, influencing the coefficients in the models (see
regression results).
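
To make the single-variable setup concrete, the following is a minimal sketch of an ordinary least squares fit with one predictor, including the R squared value discussed later. It uses NumPy and synthetic data (both assumptions); the function name `fit_simple_ols` is hypothetical and the numbers are not the paper’s results.

```python
import numpy as np


def fit_simple_ols(x: np.ndarray, y: np.ndarray):
    """Fit y = intercept + slope * x by least squares; return both and R^2."""
    X = np.column_stack([np.ones_like(x), x])      # add an intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    ss_res = np.sum((y - fitted) ** 2)             # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)           # total variation
    return float(beta[0]), float(beta[1]), float(1 - ss_res / ss_tot)


# Synthetic data (assumption): salary depends weakly on wRC+ plus heavy noise.
rng = np.random.default_rng(1)
wrc_plus = rng.normal(100, 25, 500)
salary = -1_000_000 + 50_000 * wrc_plus + rng.normal(0, 4_000_000, 500)

intercept, slope, r_squared = fit_simple_ols(wrc_plus, salary)
print(f"salary = {intercept:,.0f} + {slope:,.0f} * wRC+   (R^2 = {r_squared:.3f})")
```

With noise this large relative to the slope, the coefficient can be clearly nonzero while R squared stays low, which is the same pattern the regression results in this paper show.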
The Data
I use data from both FanGraphs and Sean Lahman’s baseball databases, which contain
numerous tables of data with filtering capabilities. FanGraphs enables users to create
custom data tables with any traditional or advanced statistics for any given period of
time. The data I use includes various variables from the period 1985-2011 since I
want to compare my predictions versus actual results from 2012-2016. After
gathering data from FanGraphs, I merged it with all of the data from Lahman’s
database.
Lahman’s database contains multiple tables of data including a teams table, master
table, batting table, and salary table. To get the FanGraphs data to merge correctly
with the salary data I combine the teams table with the players’ batting table by yearID and
teamID. This is because we want to know how many wins a player’s team had with
that player on the team. I then integrate the salary table within Lahman by playerID.
After successfully performing these tasks I merge the advanced statistics from
FanGraphs with this newly formed table.
Because FanGraphs data has a different way of identifying playerID, I merge the
master table with the newly merged team/batter/salary table to include full first and
last name. After combining the data sets by season, first name, and last name I have a
working data set that has advanced metrics, salary, and team wins for each player for
the years 1985-2011.
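
The chain of merges described above can be sketched with pandas as follows. The Lahman-style column names (`playerID`, `yearID`, `teamID`, `nameFirst`, `nameLast`) follow that database’s naming; the FanGraphs column names (`Season`, `Name`), the use of `yearID` and `teamID` alongside `playerID` in the salary join, and every row of data are assumptions made up for illustration.

```python
import pandas as pd

# Tiny made-up tables standing in for the Lahman tables (hypothetical players).
teams = pd.DataFrame({"yearID": [2010, 2010], "teamID": ["AAA", "BBB"],
                      "W": [90, 70], "L": [72, 92]})
batting = pd.DataFrame({"playerID": ["doejo01", "roeri01"], "yearID": [2010, 2010],
                        "teamID": ["AAA", "BBB"], "H": [150, 120], "SO": [80, 95]})
salaries = pd.DataFrame({"playerID": ["doejo01", "roeri01"], "yearID": [2010, 2010],
                         "teamID": ["AAA", "BBB"], "salary": [5_000_000, 1_200_000]})
master = pd.DataFrame({"playerID": ["doejo01", "roeri01"],
                       "nameFirst": ["John", "Richard"], "nameLast": ["Doe", "Roe"]})
# Stand-in for a FanGraphs export, which identifies players by name, not playerID.
fangraphs = pd.DataFrame({"Season": [2010, 2010], "Name": ["John Doe", "Richard Roe"],
                          "wOBA": [0.350, 0.310], "wRC+": [120, 92]})

# Step 1: attach team wins, then salary, then names, all inside Lahman.
merged = (batting
          .merge(teams, on=["yearID", "teamID"], how="inner")
          .merge(salaries, on=["playerID", "yearID", "teamID"], how="inner")
          .merge(master, on="playerID", how="inner"))
merged["Name"] = merged["nameFirst"] + " " + merged["nameLast"]
merged["WinPct"] = merged["W"] / (merged["W"] + merged["L"])

# Step 2: join the advanced metrics by season and full name.
final = merged.merge(fangraphs, left_on=["yearID", "Name"],
                     right_on=["Season", "Name"], how="inner",
                     validate="one_to_one")
print(final[["Name", "yearID", "salary", "WinPct", "wOBA", "wRC+"]])
```

Joining on names is fragile when two players share a name in the same season; `validate="one_to_one"` makes pandas raise an error in that situation instead of silently duplicating rows.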
One last point to mention is that all of the samples include players of age 28 or older.
The reason behind this is that the average rookie age is 24 years old and a player must
have six years of service to be eligible for free agency (Isaacs 2012). This isn’t a given
fact but an assumption based on the average age of rookies and assuming they stay
up for six years of service. The assumption is that a free agent will typically be around
28-32 years of age their first time around.
The final data set includes 2289 observations with 61 variables. Of course not every
variable was used in this paper, but I decided to include them anyway in case I
ever decided to go back. The details of the data can be found in the appendix (Table
5). The table contains the first 20 rows of each variable of the free agent data from
1985-2011.
What Do the Regression Outputs Mean?
All of the variables in the winning percentage regressions are highly statistically
significant (illustrated by low p-values). This is a positive sign because the models
indicate a positive relationship between winning percentage and the independent
variables. However, all models hold R squared values between .02 and .06, which is
fairly low. Although the explanatory power isn’t as expected, it isn’t necessarily a
terrible thing because the variables in all of the regressions were still significant.
In all of the regressions, the coefficients produce expected results as far as their
relationship to winning percentage. Model 1, for example, indicates that for every
additional unit of wRC+, on average, winning percentage increases by .00005217
while holding all other variables constant. That’s a relatively low number (even if
scaled) and doesn’t necessarily give stellar predictions when applying the model. The
good news is there’s an indication that wRC+ is statistically significant.
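
A quick piece of arithmetic shows how small this coefficient is even when scaled. The sketch below takes the Model 1 wRC+ coefficient from Table 1 and converts a change in wRC+ into implied wins over a 162-game season; the 50-point jump (an average hitter becoming a 150 wRC+ hitter) is an illustrative choice, not a case from the data.

```python
WRC_PLUS_COEF = 0.00005217   # Model 1 coefficient on wRC+ (Table 1)
GAMES_PER_SEASON = 162       # length of an MLB regular season


def implied_change(delta_wrc_plus: float) -> tuple[float, float]:
    """Return (change in winning percentage, change in wins per season)."""
    d_pct = WRC_PLUS_COEF * delta_wrc_plus
    return d_pct, d_pct * GAMES_PER_SEASON


d_pct, d_wins = implied_change(50)   # average hitter (100) vs. 150 wRC+
print(f"Winning percentage change: {d_pct:.6f}")
print(f"Implied wins per season:   {d_wins:.2f}")
```

Under Model 1, holding BsR constant, a 50-point gain in wRC+ moves winning percentage by about .0026, which is less than half a win over a full season.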
As with winning percentage as the dependent variable, all the variables in the salary
regressions are statistically significant (illustrated by low p-values). Again this
informs researchers that there is a relationship between salary and the
sabermetric/traditional statistics in each model. Some independent
variables in the regressions give the expected coefficient signs while others
don’t.
For the majority of the models, the independent variables have the expected
coefficient signs and statistical relationships with salary. In models 5 and 6, wRC+
demonstrates a positive relationship. However in model 5, BsR shows a negative
coefficient. One would expect that BsR on average would increase salary because the
higher the BsR, the better the player is at running the bases. This might be the case
because there is a slight negative correlation between BsR and wRC+. Model 8
regresses the traditional statistics and has a similar problem to model 5 in the sense
that strikeouts have a positive coefficient for salary. Again this is probably due to the
correlation between the variables hits and strikeouts. Other than these two models, the
other independent variables carry favorable coefficient values. The summary for
each model can be found in table 1 in the appendix.
Forecasts and Analysis
The subjects for these models include three players of different calibers (All Star,
Average, and Bench), all from free agent classes beyond 2011. These players are Jose
Reyes, Jeff Keppinger, and Mike Aviles. Tables 2-4 contain the actual results versus
the predicted results for winning percentage and salary for each player. Though I
wouldn’t deem these models excellent, they provided moderately accurate results.
Before explaining the results for each player I must mention that in every case,
winning percentage predictions should be taken with a grain of salt. This is because
winning percentage is not derived from a single player and is dependent on a team’s
structure. The most significant and interesting results from the winning percentage
models come from model 3. For all players, we see that wOBA is a significant predictor
in some context for a team’s ability to win. But again take this lightly because of the
context of winning percentage. Overall, there is a significant relationship between
wOBA and a team’s winning percentage.
The efficiency of the results for salary varies for each player’s case. We’ll start with
the all-star case in Jose Reyes. Reyes at the time was classified as a perennial all-star
for his abilities to hit and run the bases. After the 2011 season, he became a free agent
and signed a 6-year $106 million contract with the Miami Marlins (Brisbee
2011). The annual average value of his contract was about $17.7 million, which
is the number I compared the predicted values against. All of the predicted results
were in the $4-$6 million range, which isn’t close to $17.7 million (see table
2). These prediction values aren’t necessarily a surprise given that regressions predict
the dependent variable based on averages. This might explain why the salaries for Jeff
Keppinger and Mike Aviles were more accurate.
Jeff Keppinger was never an all-star in his career but served as an everyday starter for
a majority of his career. Based on his 2012 numbers, the model closely predicts the
annual average value of the 3-year $12 million deal he signed for the 2013 season
(Padilla 2012). Out of models 5-8, model 7 (wOBA) has the smallest differential
between the actual and predicted results at $656,243.14 (see table 3). So far, most of
the models slightly overvalue a player with similar abilities to Keppinger.
The last prediction case is Mike Aviles, who served as a starter with the Cleveland Indians
in 2013-2014 but later phased into a bench role with them in 2015. After
completing his 2-year contract with the Indians, he re-signed with the Tribe in 2015 on
a 1-year $3.5 million contract. As in Keppinger’s case, the models were able
to predict within roughly $1 million of Aviles’ actual contract. However,
model 8, which uses hits and strikeouts as independent variables, has the lowest
differential (see table 4). One could make the case that traditional statistics are better
predictors, but in Keppinger’s case advanced metrics did a better job. Overall,
these models predicted well for average and bench players in the league but not as
well for all-star caliber players.
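
To make the salary forecasts concrete, the sketch below applies the Model 6 equation from Table 1 (Salary = -1070287 + 50631 × wRC+) to the three players and compares the output with the actual annual values in Tables 2-4. The wRC+ inputs of 142, 128, and 70 are not stated in the text; they are the values that reproduce the Model 6 predictions in the tables, so treat them as back-solved assumptions.

```python
# Model 6 from Table 1: Salary = intercept + coefficient * wRC+
MODEL_6 = {"intercept": -1_070_287, "wrc_plus": 50_631}


def predict_salary(wrc_plus: int) -> int:
    """Predicted annual salary in dollars under Model 6."""
    return MODEL_6["intercept"] + MODEL_6["wrc_plus"] * wrc_plus


# (wRC+ input, actual annual value). The wRC+ inputs are back-solved from the
# Model 6 predictions in Tables 2-4 (assumption); actual values are from the tables.
CASES = {
    "Jose Reyes": (142, 17_666_667),
    "Jeff Keppinger": (128, 4_000_000),
    "Mike Aviles": (70, 3_500_000),
}

results = {}
for name, (wrc_plus, actual) in CASES.items():
    predicted = predict_salary(wrc_plus)
    results[name] = (predicted, abs(actual - predicted))
    print(f"{name:15s} predicted ${predicted:>12,}  actual ${actual:>12,}  "
          f"differential ${results[name][1]:>12,}")
```

Because the model is linear in wRC+, reaching Reyes’ actual annual value would require a wRC+ of about 370, far outside anything in the data, which illustrates why the all-star case is missed so badly.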
Conclusion
After running these regressions using both advanced sabermetrics and traditional
statistics, I believe MLB teams should use these prediction models with caution.
Although the variables are significant in each model and provide sufficient results for
an average/bench player, neither the sabermetric statistics nor the traditional
statistics provide sufficient explanatory power for forecasting. We know this because
all of the models had extremely low R squared values. Before concluding matters
entirely, there are a few problems with these models, and suggestions for addressing
them, that the sabermetric community could further analyze.
Suggestions for the Sabermetric Community
Aside from the models themselves, there are a couple of other things I would have done
differently. One of the concerns throughout the research process was gathering
accurate free agent data. To avoid assumptions, I wanted contract data for every free
agent player dating back to 1985, since this was as far back as I could go using
Lahman’s database. Instead, I assumed players hitting free agency were around ages 28 and
older based on average rookie ages. I would have preferred to have every player who
reached free agency based on service time rather than an age assumption, for validity.
With the proper resources, in my case time, I would have gathered data on every free
agent and the number of years in their contracts as well to ensure sound results. In
addition, I also believe this experiment could be improved by using a
dependent variable other than winning percentage, perhaps marginal wins.
It would be interesting to predict efficiency using marginal wins attached to a player
as the dependent variable. This would entail looking at a team’s wins before a player
signed with the team versus the number of wins attained after the player signed.
Maybe there’s a significant econometric model that uses sabermetrics to predict
marginal wins and then uses those predicted wins in an efficiency formula. Unfortunately,
due to time constraints I couldn’t perform this sort of analysis. However, I encourage
those in the sabermetrics community to investigate this hypothesis and reveal what
they come up with.
Lastly, the most significant shortcoming of this research is the lack of adjustment for
inflation in the salary data. The salary data dates back to 1985, which could have affected the
results since the salaries are not adjusted to today’s dollars. One possible solution
would be to run regressions on only a couple of years of data and use those results to forecast
salary. The other option would be to go through and adjust every player’s salary to
today’s dollars, but this could become extremely time consuming. All in all, I
encourage aspiring baseball researchers interested in forecasting baseball salaries to
test these results and report their findings.
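
The second option is less work than it sounds once a price index is in hand, since the adjustment is a single ratio per season. The sketch below is only an illustration: the index values are round placeholders, not official CPI figures, and the function name is hypothetical.

```python
def to_base_year_dollars(salary: float, year: int,
                         price_index: dict[int, float], base_year: int) -> float:
    """Rescale a nominal salary from `year` into `base_year` dollars."""
    return salary * price_index[base_year] / price_index[year]


# Placeholder index values (assumption), NOT official CPI figures.
PLACEHOLDER_INDEX = {1985: 100.0, 2011: 200.0, 2016: 220.0}

nominal_1985 = 1_000_000
adjusted = to_base_year_dollars(nominal_1985, 1985, PLACEHOLDER_INDEX, base_year=2016)
print(f"${nominal_1985:,} in 1985 is about ${adjusted:,.0f} in 2016 dollars "
      "(using placeholder index values)")
```

Applied to a whole salary column, the same ratio can be looked up by season, so every observation is expressed in the same year’s dollars before the regressions are run.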
Sources
Brisbee, Grant 2011. “Jose Reyes Contract Details Released” SBNation http://www.sbnation.com/2011/12/7/2618435/jose-reyes-contract-details-released
2017. “Salary Arbitration” FanGraphs http://www.fangraphs.com/library/business/mlb-salary-arbitration-rules/
Chang, Jason and Zenilman, Joshua 2013. “A Study of Sabermetrics in Major League Baseball: The Impact of Moneyball on Free Agent Salaries” Washington University at St. Louis. http://olinblog.wustl.edu/wp-content/uploads/AStudyofSabermetricsinMajorLeagueBaseball.pdf
Krautmann, Anthony and Solow, John 2009. “The Dynamics of Performance Over the Duration of Major League Baseball Long-Term Contracts” Journal of Sports Economics. p. 1 http://journals.sagepub.com/doi/pdf/10.1177/1527002508327382
Gennaro, Vince 2007. “Diamond Dollars: The Economics of Winning in Baseball (Part I)” The Hardball Times. p. 1 http://www.hardballtimes.com/diamond-dollars-the-economics-of-winning-in-baseball-part-1/
Issacs, Noah 2012. “Minor League Leaderboard Context” FanGraphs, p. 1 http://www.fangraphs.com/blogs/minor-league-leaderboard-context/
Miceli, Nicholas S. and Huber, Alan D. 2009. “‘If the Team Doesn't Win, Nobody Wins:’ A Team-Level Analysis of Pay and Performance Relationships in Major League Baseball” Journal of Quantitative Analysis in Sports: Vol. 5: Iss. 2, Article 6. https://www.degruyter.com/downloadpdf/j/jqas.2009.5.2/jqas.2009.5.2.1170/jqas.2009.5.2.1170.pdf
Padilla, Doug 2012. “Jeff Keppinger Joins White Sox” ESPN http://www.espn.com/chicago/mlb/story/_/id/8732840/jeff-keppinger-agrees-deal-chicago-white-sox
Tango, Tom, Lichtman, Mitchel, and Dolphin, Andrew 2007. “Toolshed”, The Book: Playing the Percentages in Baseball. p. 29 Potomac Books, Inc.
Weinberg, Neil 2016. “The Beginner’s Guide to Deriving wOBA” FanGraphs http://www.fangraphs.com/library/the-beginners-guide-to-deriving-woba/
Associated Press 2014. “Mike Aviles gets two-year contract” ESPN http://www.espn.com/mlb/story/_/id/8926748/cleveland-indians-sign-mike-aviles-two-year-deal
2013. “2013 MLB Free Agent Tracker” MLB Trade Rumors http://www.mlbtraderumors.com/2013-mlb-free-agent-tracker
2013. “What is WAR?” FanGraphs http://www.fangraphs.com/library/misc/war/
2017. “Park Factors – 5 year regressed” FanGraphs http://www.fangraphs.com/library/park-factors-5-year-regressed/
2017. FanGraphs http://www.fangraphs.com/statss.aspx?playerid=1943&position=P
2017. “wRC and wRC+” FanGraphs http://www.fangraphs.com/library/offense/wrc/
2017. “wRC+ and Lessons of Context” FanGraphs http://www.fangraphs.com/library/wrc-and-lessons-of-context/
2017. “wOBA” FanGraphs http://www.fangraphs.com/library/offense/woba/
Appendix
Formula 1: wOBA
Formula 2: wRC+
Rule of Thumb 1: wRC+
Source: FanGraphs
Formula 3: BsR
BsR = wSB + UBR + wGDP
Rule of Thumb 2: BsR
Source: FanGraphs
Fig 1. Correlation Matrix of Traditional Statistics
Created in RStudio
Fig 2. Correlation Matrix of Advanced Sabermetrics
Created in RStudio
Table 1: Summary of models
Model  Dependent Variable  Intercept + Independent Variable(s) coefficients  R Squared  P-Value
1      Winning Percentage  .04515 + .00005217 wRC+ + .0001459 BsR            0.03975    < 2.2e-16
2      Winning Percentage  .04523 + .00005164 wRC+                           0.06762    < 2.2e-16
3      Winning Percentage  .40551 + .29881 wOBA                              0.02925    < 2.2e-16
4      Winning Percentage  .04524 + .00003256 H + .00001309 SO               0.02859    4.00E-15
5      Salary              -1005676 + 50201 wRC+ - 118827 BsR                0.09118    < 2.2e-16
6      Salary              -1070287 + 50631 wRC+                             0.0848     < 2.2e-16
7      Salary              -7497207 + 34526847 wOBA                          0.09613    < 2.2e-16
8      Salary              -159843 + 18254 H + 25516 SO                      0.05899    < 2.2e-16
Table 2: Jose Reyes Predictions
Jose Reyes Predictions 2012 (All Star)
Model  Predicted Results  Actual Results   Differential
1      0.054              0.475            0.422
2      0.053              0.475            0.423
3      0.518              0.475            0.043
4      0.052              0.475            0.424
5      $5,326,725.10      $17,666,667.00   $12,339,941.90
6      $6,119,315.00      $17,666,667.00   $11,547,352.00
7      $5,484,887.47      $17,666,667.00   $12,181,779.53
8      $4,190,287.00      $17,666,667.00   $13,476,380.00
Table 3: Jeff Keppinger Predictions
Jeff Keppinger Predictions 2013 (Average)
Model  Predicted Results  Actual Results   Differential
1      0.051              0.389            0.338
2      0.052              0.389            0.337
3      0.406              0.389            0.017
4      0.045              0.389            0.344
5      $5,776,533.00      $4,000,000.00    $1,776,533.00
6      $5,410,481.00      $4,000,000.00    $1,410,481.00
7      $4,656,243.14      $4,000,000.00    $656,243.14
8      $2,912,903.00      $4,000,000.00    $1,087,097.00
Table 4: Mike Aviles Predictions
Mike Aviles Predictions 2015 (Bench)
Model  Predicted Results  Actual Results   Differential
1      0.045              0.503            0.46
2      0.045              0.503            0.46
3      0.406              0.503            0.10
4      0.045              0.503            0.46
5      $2,258,857.30      $3,500,000.00    $1,241,142.70
6      $2,473,883.00      $3,500,000.00    $1,026,117.00
7      $1,894,095.38      $3,500,000.00    $1,605,904.62
8      $2,642,031.00      $3,500,000.00    $857,969.00
Model 1: WinPct = wRC+ + BsR
Model 2: WinPct = wRC+
Model 3: WinPct = wOBA
Model 4: WinPct = H + SO
Model 5: Salary = wRC+ + BsR
Model 6: Salary = wRC+
Model 7: Salary = wOBA
Model 8: Salary = H + SO
Table 5: Free Agent Data (first 20 rows) Used in Research (1985-2011)