Baseball statistics and an introduction to R
Overview
Discussion of Big Data Baseball
Watch half an inning of the 2014 All-Star game
Review of structured data and classic baseball statistics
Introduction to R!
Discussion of Big Data Baseball chapter 1
Statistics can get us beyond what we can “see” if we trust them (Phillip)
• Should we just trust the analyses? What about players who have “heart”? (James)
• How do we make the best decisions using both analysis and human judgment? (Campbell)
How to find and quantify the relevant data? (on-base percentage, etc.) (Henne)
• New/different statistics and analyses can give powerful new insights (Aodhan)
• New computational systems can provide new insights (Kefentse)
• What is the value of different hit types, e.g., singles vs. home runs? (Julia)
Why didn’t anyone realize that equidistant spacing of defensive players was suboptimal? (Helen)
• Yes, the defensive changes will be explained more in future chapters (Ian)
How can we make changes that are within our reach? (Maddie)
• With only a $15 million budget (Sheyla)
• And taking on challenging situations (Matt)
Rules of the game are the same, but the way players are acquired has changed (Christopher)
2014 All-star game
National                               American
Order  Player             Position     Order  Player             Position
1      Andrew McCutchen   CF           1      Derek Jeter        SS
2      Yasiel Puig        RF           2      Mike Trout         LF
3      Troy Tulowitzki    SS           3      Robinson Canó      2B
4      Paul Goldschmidt   1B           4      Miguel Cabrera     1B
5      Giancarlo Stanton  DH           5      José Bautista      RF
6      Aramis Ramírez     3B           6      Nelson Cruz        DH
7      Chase Utley        2B           7      Adam Jones         CF
8      Jonathan Lucroy    C            8      Josh Donaldson     3B
9      Carlos Gómez       LF           9      Salvador Pérez     C
       Adam Wainwright    P                   Félix Hernández    P
Score card
Statistics and structured data
Statistic: a numerical summary of data
Statistics: the mathematics of collecting, organizing, and interpreting data
Describing and summarizing data
Statistics that are used to summarize a dataset (sample of data) are called descriptive statistics
Examples:
• Maximum value in the dataset
• Minimum value in the dataset
• Mean value of the dataset
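A minimal sketch of these three descriptive statistics in R (the language introduced later in these slides). The vector name home.runs and its values are made up for illustration and are not taken from the Lahman data.

```r
# Hypothetical home-run counts for four seasons (illustrative values only)
home.runs <- c(12, 30, 7, 25)

max.hr  <- max(home.runs)   # largest value in the dataset
min.hr  <- min(home.runs)   # smallest value in the dataset
mean.hr <- mean(home.runs)  # sum of the values divided by how many there are

# Show all three summaries together as a named vector
print(c(max = max.hr, min = min.hr, mean = mean.hr))
```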
Common baseball descriptive statistics
G = games
• Number of games a player participated in (out of 162 games in a season)
AB = at bats
• Number of times a batter was hitting and either got a hit or got out (does not include walks, hit-by-pitches, or sacrifices; reaching base on an error still counts as an at bat)
R = runs
• Number of runs the player scored
H = hits
• Number of times a player hit the ball and got on base or hit a home run (sum of 1B, 2B, 3B, HR)
Common baseball statistics
BB = base on balls (walks)
• Number of times a player got on base due to the pitcher throwing 4 balls
RBI = runs batted in
• How many runs scored as a result of a player getting a hit
SB = stolen bases
• Number of times a runner advanced by ‘stealing a base’
Common derived baseball statistics
AVG = batting average
• Hits / (At bats) = H / AB = (1B + 2B + 3B + HR) / AB
SLG = slugging percentage
• (1*1B + 2*2B + 3*3B + 4*HR) / AB
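As a worked sketch of the two formulas, here is how AVG and SLG could be computed in R. All season totals and variable names below are hypothetical, chosen only to make the arithmetic easy to follow.

```r
# Hypothetical season totals for one batter (illustrative values only)
singles   <- 100
doubles   <- 30
triples   <- 5
home.runs <- 20
at.bats   <- 500

# H = 1B + 2B + 3B + HR
hits <- singles + doubles + triples + home.runs

# AVG = H / AB
avg <- hits / at.bats

# SLG weights each hit by the number of bases it is worth, then divides by AB
total.bases <- 1 * singles + 2 * doubles + 3 * triples + 4 * home.runs
slg <- total.bases / at.bats

print(round(c(AVG = avg, SLG = slg), 3))
```

With these made-up totals there are 155 hits and 255 total bases, so AVG is 155/500 and SLG is 255/500. SLG is always at least as large as AVG because every hit counts for at least one base.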
Lahman Database – Individual player yearly batting statistics
[Table screenshot: each row is a case, each column is a variable]
Data taken from the Lahman Batting dataset
Example Dataset – Individual player yearly statistics
[Table screenshot: each row is a case, each column is a variable]
Categorical and Quantitative Variables
[Table screenshot: rows are cases; one column is marked as a categorical variable and another as a quantitative variable]
Another Dataset – 2014 Team statistics
[Table screenshot: each row is a case, each column is a variable]
A Question
Q: What programming language do the pirates use?
A: Arrrr
Q: Worst joke of the semester?
A: Wait and see…
Basics of R
Everyone log on to: https://asterius.hampshire.edu/
Create a new script to keep notes about your work
RStudio layout
1. R Markdown and scripts
2. Console
3. Environment
4. Files, etc.
RStudio layout
2.Console
R as a calculator
> 2 + 2
> 7 * 5
R Basics
Arithmetic:
> 2 + 2
> 7 * 5
Assignment:
> a <- 4
> b <- 7
> D <- a + b
> D
[1] 11
Number journey…
Number journey
> a <- 7
> b <- 52
> d <- a * b
> d
[1] 364
Character strings and booleans
> a <- 7
> s <- "hello everyone"
> b <- TRUE
> class(a)
[1] "numeric"
> class(s)
[1] "character"
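The slide checks the class of the number and the string but not the boolean. A short sketch that finishes the set, reusing the same three objects; R calls the boolean type "logical". The is.numeric(), is.character(), and nchar() calls are extra base R functions, not part of the original slide.

```r
a <- 7
s <- "hello everyone"
b <- TRUE

class.of.b <- class(b)      # booleans have class "logical" in R
a.is.num   <- is.numeric(a) # TRUE when the object holds numbers
s.is.chr   <- is.character(s) # TRUE when the object holds text
s.length   <- nchar(s)      # number of characters in the string, spaces included

print(class.of.b)
print(c(a.is.num, s.is.chr))
print(s.length)
```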
Functions
Functions use parentheses: functionName(x)
> sqrt(49)
> tolower("HELLO everyone")
To get help
> ?sqrt
One can add comments to your code
> sqrt(49)  # this takes the square root of 49
Getting help
You can get help about a function in R using the ? command.
> ?sqrt
Vectors
Vectors are ordered sequences of numbers or letters
The c() function is used to create vectors
> v <- c(5, 232, 5, 543)
One can access elements of a vector using square brackets []
> v[3]  # what will the answer be?
Works with strings too
> z <- c("a", "b", "c", "d")
> z[3]
Can add names to vector elements
> names(v) <- c("first", "second", "third", "fourth")
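A small sketch that pulls the vector operations above into one runnable script and adds two more ways to index: by name and by a range of positions. The vectors v and z come from the slide; the other object names are only for illustration.

```r
v <- c(5, 232, 5, 543)
names(v) <- c("first", "second", "third", "fourth")

third     <- v[3]         # index by position; R counts from 1, not 0
second    <- v["second"]  # index by name, once names have been assigned
first.two <- v[1:2]       # a range of positions returns a shorter vector
n         <- length(v)    # number of elements in the vector

z  <- c("a", "b", "c", "d")
z3 <- z[3]                # indexing works the same way for strings

print(third)
print(second)
print(first.two)
```

Note that names must be typed with straight quotes ("first"); curly quotes pasted from slides or a word processor cause a syntax error in R.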
Question?
Q: What kind of grades did the Pirates get in Statistics class?
A: High Seas
Q: Worst joke of the semester?
A: Stay tuned…
Data types: data frames
Data frames are collections of vectors of the same length.
• Each vector can have a different type of data
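To make this concrete, here is a minimal sketch that builds a tiny data frame by hand with base R's data.frame(). The player names, teams, and game counts are made up for illustration; they are not from the Lahman data.

```r
# Hypothetical mini data frame (names and numbers are made up for illustration)
players <- data.frame(
  name = c("Player A", "Player B", "Player C"),  # character vector
  team = c("PIT", "PIT", "BOS"),                 # character vector (categorical)
  G    = c(150, 98, 122),                        # numeric vector (quantitative)
  stringsAsFactors = FALSE
)

size      <- dim(players)            # number of rows (cases) and columns (variables)
col.types <- sapply(players, class)  # each column keeps its own type
games     <- players$G               # $ extracts one column as a vector

print(size)
print(col.types)
print(games)
```

Every column has three elements, which is what "vectors of the same length" means; data.frame() raises an error if the column lengths cannot be matched up.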
Let’s look at a data frame
Load a function I wrote into R by typing:
> source('/home/shared/baseball_stats_2017/baseball_class_functions.R')
If you load this correctly you should have a function in your Global Environment called get.Lahman.batting.data()
Let’s look at a data frame
Use this function to get batting data on a specific player:
> card.data <- get.Lahman.batting.data("Kelly", "Shoppach")
> View(card.data)
Let’s look at a data frame
Getting number of games (G) Kelly played each season:
> card.data$G
[1] 9 41 59 112 89 63 87 28 48 35 1
Computing statistics
One can compute statistics on vectors (columns of a data frame)
> sum(card.data$G)
[1] 572
Or we can assign vectors in a data frame to an object
> games <- card.data$G
> games
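The get.Lahman.batting.data() function only exists on the class server, so the sketch below builds a stand-in card.data holding just the G column, filled with the game counts printed on the earlier slide. The real data frame has many more columns. The object names total.games, n.rows, mean.games, and max.games are illustrative.

```r
# Stand-in for the class data: only the G column, using the values shown earlier
card.data <- data.frame(G = c(9, 41, 59, 112, 89, 63, 87, 28, 48, 35, 1))

games <- card.data$G          # pull the column out as a plain vector

total.games <- sum(games)     # total games across all rows
n.rows      <- length(games)  # number of rows in the data
mean.games  <- mean(games)    # average games per row
max.games   <- max(games)     # most games in a single row

print(c(total = total.games, rows = n.rows, mean = mean.games, max = max.games))
```

The total matches the 572 reported by sum(card.data$G) on the slide, and the other summaries are the descriptive statistics from earlier in the deck applied to the same column.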
Practice R with DataCamp!
Try chapters 1 and 2 on the Introduction to R DataCamp tutorial
https://www.datacamp.com/courses/free-introduction-to-r
Read chapter 2 of Big Data Baseball and post a quote and reaction by midnight on Wednesday