

Page 1: Berg-FinalProjectisaacberg.com/documents/Algorithms/Berg-FinalProject.pdf · Title: Microsoft Word - Berg-FinalProject .docx Created Date: 4/26/2018 4:40:08 PM

Isaac Berg
STAT 139 Final Project
November 14, 2016

Introduction: Ever since the NHL was founded in 1917, teams have been searching for ways to get a leg up on their competition. As constantly evolving technology has made it easier to record more in-depth statistics within games, teams have turned to analysts to use this data and find ways to give their team a competitive edge. Starting with the 2015-2016 season, hundreds of stats were kept on every player who played at least one game and posted to an Excel file that is open to the public. Using this data set, I will look at six different potential relationships that I think might be found within the data. These relationships include points scored vs. yearly salary, points vs. age, goals vs. age, assists vs. age, goalie heights vs. starts, and players' month of birth.

Data Used: I pulled this data from a website called hockeyabstract.com, copied the data I needed into another Excel file, and then loaded it as a CSV file into R. In order to avoid outliers from players who were only called up for short stints, I decided to only use data from players who had played a minimum of 20 games in the season.

> install.packages("gsheet")
> library(gsheet)
> # Full skater sheet
> Hockey <- gsheet2tbl("https://docs.google.com/spreadsheets/d/17h48ZPLqbV8qv2mEJDZ3uIyf9USypixQq1L1aHPPLY0/edit?usp=sharing")
> # Trimmed skater sheet used for the analysis
> mainhockey <- gsheet2tbl("https://docs.google.com/spreadsheets/d/1qQoY7ofuQz8iK4h1zJtzk4zSSngrULK-nnc7-4JJv74/edit?usp=sharing")
> # Keep skaters with more than 20 games played
> mainhockey <- mainhockey[which(mainhockey$GP > 20), ]
> # Goalie sheet, restricted to goalies with more than 5 games started
> fullgoalie <- gsheet2tbl("https://docs.google.com/spreadsheets/d/1E04cp2XPno5e4acyXR6bpMylszg_NoHpHrO7ggTEFYo/edit?usp=sharing")
> goalie <- fullgoalie[which(fullgoalie$GS > 5), ]
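As a small illustration of the filtering step, the sketch below applies the same which() pattern to a made-up data frame. The player labels and GP values are hypothetical and are not taken from the hockeyabstract.com data. It shows one reason which() is convenient here: a missing GP value is dropped instead of producing a row of NAs. It also shows that GP > 20 leaves out a player with exactly 20 games, whereas GP >= 20 would match "a minimum of 20 games" literally.

```r
# Hypothetical toy data; not the real data set.
toy <- data.frame(
  Player = c("A", "B", "C", "D", "E"),
  GP     = c(82, 12, NA, 45, 20),
  stringsAsFactors = FALSE
)

# Same pattern as the report: which() returns only positions where the test is TRUE.
kept_which <- toy[which(toy$GP > 20), ]

# Plain logical indexing keeps an all-NA row for the player whose GP is missing.
kept_logical <- toy[toy$GP > 20, ]

# Inclusive threshold: also keeps the player with exactly 20 games.
kept_inclusive <- toy[which(toy$GP >= 20), ]

print(kept_which$Player)
print(nrow(kept_logical))
print(kept_inclusive$Player)
```

Whether the strict or the inclusive threshold is the intended one is a judgment call for the analysis; the sketch only shows that the two differ at exactly 20 games.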


Data/Results:

Figure A: Salary vs. Points

> plot(mainhockey$Salary, mainhockey$PTS, main = "Salary vs. Points")
> cor.test(mainhockey$Salary, mainhockey$PTS)

    Pearson's product-moment correlation

data:  mainhockey$Salary and mainhockey$PTS
t = 17.172, df = 663, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.4998936 0.6053295
sample estimates:
      cor
0.5548354

The plot appears to show a positive correlation, and cor.test() confirms this, although at .55 the correlation is only moderate. In an attempt to show a stronger positive correlation, I decided to remove all defensemen, as they are generally paid to prevent goals rather than score them. After taking these values out, I redid both the scatterplot and the correlation test.
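To make the link between the reported correlation and the reported t statistic concrete, here is a minimal sketch that recomputes t from r and the degrees of freedom, using only the numbers printed in the output. The variable names are my own, and the squared correlation is added only as a rough gauge of how much of the variation in points moves with salary.

```r
# Values copied from the cor.test() output for salary vs. points.
r  <- 0.5548354
df <- 663

# t statistic for the null hypothesis of zero correlation.
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Squared correlation: share of variance in points associated with salary.
r_squared <- r^2

print(round(t_stat, 3))
print(round(r_squared, 3))
```

The recomputed t should agree with the printed value up to rounding, and a squared correlation of roughly 0.31 is consistent with calling the relationship positive but far from perfect.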


> # Drop players listed only as defensemen (Pos == "D")
> salvpoints <- mainhockey[which(mainhockey$Pos != "D"), ]
> plot(salvpoints$Salary, salvpoints$PTS, main = "Salary vs. Points for Forwards")
> cor.test(salvpoints$Salary, salvpoints$PTS)

    Pearson's product-moment correlation

data:  salvpoints$Salary and salvpoints$PTS
t = 21.18, df = 590, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.6088769 0.7006888
sample estimates:
      cor
0.6572141

Figure B: Age vs. Points

> plot(mainhockey$Age, mainhockey$PTS, main = "Age vs. Points")
> cor.test(mainhockey$Age, mainhockey$PTS)

    Pearson's product-moment correlation

data:  mainhockey$Age and mainhockey$PTS
t = 0.50098, df = 664, p-value = 0.6166
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.05661764  0.09526932
sample estimates:
       cor
0.01943799
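The age vs. points output can be read as a decision about the null hypothesis of no correlation. The sketch below recovers the two-sided p-value from the printed t and df and checks the two equivalent criteria: the p-value against a 0.05 level, and whether the 95 percent interval contains zero. The 0.05 level and the variable names are assumptions for illustration; the numbers are copied from the output.

```r
# Values copied from the cor.test() output for age vs. points.
t_stat <- 0.50098
df     <- 664
ci     <- c(-0.05661764, 0.09526932)

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), df)

# Assumed significance level for illustration.
alpha <- 0.05
reject_null <- p_value < alpha

# A 95 percent interval that contains 0 leads to the same conclusion.
zero_in_ci <- ci[1] < 0 && ci[2] > 0

print(round(p_value, 4))
print(reject_null)
print(zero_in_ci)
```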

This data has a very slight positive correlation at .019, which fails to reject the null hypothesis of no correlation. It would make sense for points to follow a roughly bell-shaped curve over age rather than a straight line: younger players score fewer points as they are introduced to the game, score their most points as their experience and athleticism peak, and then drop in points as their bodies begin to age.

> plot(model)   # diagnostic plots (including the Q-Q plot) for the goals model fit under Figure C
> hockey2 <- mainhockey[, c("G", "Age", "Month", "GP", "Salary")]
> pairs(hockey2)
> model1 <- lm(PTS ~ Age + Month + GP + Pos + Salary, data = mainhockey)
> summary(model1)

Call:
lm(formula = PTS ~ Age + Month + GP + Pos + Salary, data = mainhockey)

Residuals:
    Min      1Q  Median      3Q     Max
-27.786  -8.132  -0.611   6.529  42.707

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  15.33573    3.40921   4.498 8.12e-06 ***
Age          -1.16740    0.11368 -10.269   <2e-16 ***
Month         0.02076    0.13476   0.154    0.878
GP            0.53271    0.02651  20.098   <2e-16 ***
PosC/LW      -2.56243    2.13323  -1.201    0.230
PosC/LW/RW   -3.45631    5.80289  -0.596    0.552
PosC/N        5.08413   11.50769   0.442    0.659
PosC/RW       0.59291    2.55450   0.232    0.817
PosC/RW/LW   -1.96587    4.78371  -0.411    0.681
PosD         -9.61561    1.28959  -7.456 2.88e-13 ***
PosD/RW       6.07183    8.15749   0.744    0.457
PosLW         1.98615    1.91929   1.035    0.301
PosLW/C      -1.99046    2.33584  -0.852    0.394
PosLW/C/RW  -10.14135   11.47687  -0.884    0.377
PosLW/RW     -3.18540    2.03791  -1.563    0.119
PosLW/RW/C    3.73020    8.13804   0.458    0.647
PosRW         1.18813    1.90498   0.624    0.533
PosRW/C      -2.30766    3.34250  -0.690    0.490
PosRW/LW     -1.67797    1.94115  -0.864    0.388
Salary        4.24058    0.22858  18.551   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 11.41 on 645 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.6629,    Adjusted R-squared:  0.653
F-statistic: 66.77 on 19 and 645 DF,  p-value: < 2.2e-16

Figure C: Age vs. Goals

> plot(mainhockey$Age, mainhockey$G, main = "Age vs. Goals")
> cor.test(mainhockey$Age, mainhockey$G)

    Pearson's product-moment correlation

data:  mainhockey$Age and mainhockey$G
t = -0.29738, df = 664, p-value = 0.7663
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.08743500  0.06448892
sample estimates:
        cor
-0.01153963


Much like Figure B, the correlation between age and goals scored is very slight (here slightly negative), and it is so small that no correlation should be inferred. As in Figure B, I then fit a linear model, this time of goals against the other variables.

> model <- lm(G ~ Age + Month + GP + Pos + Salary, data = mainhockey)
> summary(model)

Call:
lm(formula = G ~ Age + Month + GP + Pos + Salary, data = mainhockey)

Residuals:
    Min      1Q  Median      3Q     Max
-12.426  -3.808  -0.650   2.854  24.306

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  7.804736   1.664988   4.688 3.38e-06 ***
Age         -0.504130   0.055521  -9.080   <2e-16 ***
Month        0.005292   0.065816   0.080  0.93594
GP           0.201754   0.012945  15.586   <2e-16 ***
PosC/LW     -0.191130   1.041825  -0.183  0.85450
PosC/LW/RW  -1.031605   2.834011  -0.364  0.71597
PosC/N       1.699149   5.620113   0.302  0.76250
PosC/RW      2.293083   1.247564   1.838  0.06651 .
PosC/RW/LW   0.505200   2.336264   0.216  0.82887
PosD        -6.752247   0.629807 -10.721   <2e-16 ***
PosD/RW      0.838773   3.983946   0.211  0.83331
PosLW        2.542346   0.937342   2.712  0.00686 **
PosLW/C     -0.737991   1.140776  -0.647  0.51791
PosLW/C/RW  -5.127938   5.605064  -0.915  0.36060
PosLW/RW     1.077599   0.995271   1.083  0.27934
PosLW/RW/C   1.790214   3.974447   0.450  0.65255
PosRW        1.875312   0.930353   2.016  0.04425 *
PosRW/C     -0.530768   1.632406  -0.325  0.74518
PosRW/LW     1.189815   0.948015   1.255  0.20991
Salary       1.594899   0.111636  14.287   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.57 on 645 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.6087,    Adjusted R-squared:  0.5971
F-statistic: 52.8 on 19 and 645 DF,  p-value: < 2.2e-16
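The summary lines at the bottom of the goals model are tied together by simple formulas. The sketch below recomputes the adjusted R-squared and the F-statistic from the printed multiple R-squared and degrees of freedom. The variable names are my own, and the number of observations used (665) is inferred from 645 residual degrees of freedom plus 19 slopes and an intercept.

```r
# Values copied from summary(model) for the goals model.
r_squared <- 0.6087
n_slopes  <- 19
resid_df  <- 645

# Observations actually used in the fit (inferred, see lead-in).
n_used <- resid_df + n_slopes + 1

# Adjusted R-squared penalizes the number of fitted slopes.
adj_r_squared <- 1 - (1 - r_squared) * (n_used - 1) / resid_df

# Overall F-statistic for the null model with intercept only.
f_stat <- (r_squared / n_slopes) / ((1 - r_squared) / resid_df)

print(round(adj_r_squared, 4))
print(round(f_stat, 1))
```

Both values should match the printed summary up to rounding of the inputs.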

Figure D: Age vs. Assists

> plot(mainhockey$Age, mainhockey$A, main = "Age vs. Assists")
> cor.test(mainhockey$Age, mainhockey$A)

    Pearson's product-moment correlation

data:  mainhockey$Age and mainhockey$A
t = 1.0066, df = 664, p-value = 0.3145
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.03704694  0.11466691
sample estimates:
       cor
0.03903494

Much like Figure B, there is only a slight positive correlation between age and assists, and it is so small that no correlation should be inferred.

> model2 <- lm(A ~ Age + Month + GP + Pos + Salary, data = mainhockey)
> summary(model2)

Call:
lm(formula = A ~ Age + Month + GP + Pos + Salary, data = mainhockey)

Residuals:
    Min      1Q  Median      3Q     Max
-22.671  -5.036  -0.864   3.912  34.242

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  7.53100    2.35661   3.196  0.00146 **
Age         -0.66327    0.07858  -8.440   <2e-16 ***
Month        0.01547    0.09315   0.166  0.86815
GP           0.33095    0.01832  18.063   <2e-16 ***
PosC/LW     -2.37130    1.47459  -1.608  0.10830
PosC/LW/RW  -2.42471    4.01123  -0.604  0.54574
PosC/N       3.38498    7.95465   0.426  0.67059
PosC/RW     -1.70018    1.76579  -0.963  0.33599
PosC/RW/LW  -2.47107    3.30673  -0.747  0.45516
PosD        -2.86336    0.89142  -3.212  0.00138 **
PosD/RW      5.23306    5.63884   0.928  0.35373
PosLW       -0.55620    1.32670  -0.419  0.67519
PosLW/C     -1.25247    1.61464  -0.776  0.43821
PosLW/C/RW  -5.01341    7.93335  -0.632  0.52765
PosLW/RW    -4.26300    1.40870  -3.026  0.00258 **
PosLW/RW/C   1.93999    5.62539   0.345  0.73031
PosRW       -0.68718    1.31681  -0.522  0.60195
PosRW/C     -1.77689    2.31049  -0.769  0.44214
PosRW/LW    -2.86779    1.34181  -2.137  0.03295 *
Salary       2.64568    0.15801  16.744   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7.884 on 645 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.5963,    Adjusted R-squared:  0.5844
F-statistic: 50.15 on 19 and 645 DF,  p-value: < 2.2e-16

Figure E: Distribution of Months

> hist(mainhockey$Month, main = "Distribution of Months")
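One caveat may be worth checking when reading a histogram of integer-coded months. If the bin breaks fall on the integers 1, 2, ..., 12, there are only 11 bins, and the first bin includes both of its endpoints, so January and February are pooled into a single bar. The sketch below shows this on made-up month values (hypothetical, not the real data) and compares it with half-integer breaks, which give one bar per month. Whether the breaks chosen for the real plot behaved this way is not something the report shows; this is only a sketch of how to check.

```r
# Hypothetical birth months (1 = January, ..., 12 = December); not the real data.
months <- c(rep(1, 5), rep(2, 7), rep(3:12, each = 4))

# Exact count per month.
per_month <- as.vector(table(factor(months, levels = 1:12)))

# Breaks at 1, 2, ..., 12 give 11 bins; the first bin [1, 2] holds January and February.
pooled <- hist(months, breaks = 1:12, plot = FALSE)

# Breaks at 0.5, 1.5, ..., 12.5 give exactly one bin per month.
centered <- hist(months, breaks = seq(0.5, 12.5, by = 1), plot = FALSE)

print(per_month)
print(pooled$counts)
print(centered$counts)
```

In the toy data February is actually the most common month, yet the pooled first bar is the tallest by a wide margin, which is why a per-month tabulation is a useful cross-check.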


Figure F: Goalie Heights vs. Starts

A trend I have noticed over the last few years is the increasing emphasis placed on having taller goalies on NHL teams, as they can take up more space in the net without sacrificing quickness; everyone in the NHL has become so quick that any difference in quickness has become almost negligible. I decided to test whether there was any correlation between the height of a goalie and the number of games they start in a season.

> plot(goalie$HT, goalie$GS, main = "Height vs. Games Started")
> cor.test(goalie$HT, goalie$GS)

    Pearson's product-moment correlation

data:  goalie$HT and goalie$GS
t = 1.2094, df = 69, p-value = 0.2306
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.09233123  0.36510728
sample estimates:
      cor
0.1440761
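The wide confidence interval in the goalie output reflects the small sample of 71 goalies. As a sketch, the interval can be reproduced from the printed correlation with Fisher's z transformation, which is the method cor.test() uses for Pearson intervals. The variable names are my own; the inputs are copied from the output, with n taken as df + 2.

```r
# Values copied from the cor.test() output for goalie height vs. games started.
r <- 0.1440761
n <- 69 + 2   # degrees of freedom plus 2

# Fisher z transformation of the correlation.
z <- atanh(r)

# Half-width of the 95 percent interval on the z scale.
half_width <- qnorm(0.975) / sqrt(n - 3)

# Transform the interval back to the correlation scale.
ci <- tanh(z + c(-1, 1) * half_width)

print(round(ci, 4))
```

Because the half-width shrinks only with the square root of n - 3, a sample of this size cannot separate a correlation of .14 from zero.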


Figure G: Pairs Data Plot

> hockey2 <- mainhockey[, c("G", "Age", "Month", "GP", "Salary")]
> pairs(hockey2)


Conclusion:

For Figure A, there is a positive correlation between salary and points scored, which means that a higher salary generally goes with more points scored by that player. After taking out players who are categorized only as defensemen and running the tests again, the correlation did in fact jump from .55 to just over .65. This shows how NHL teams place such a high value on points for their forwards, as it is the easiest way to generalize success on the ice.

For Figure B, there is a very slight positive correlation between age and points scored, but it is close enough to zero that we fail to reject the null of no correlation. I then created a linear model to test points against all of the other variables. Based on this model, the baseline (intercept) is about 15 points. Holding the other variables fixed, each additional year of age (the youngest players in the data are 19) generally takes about 1 point off a player's total, each extra game played (beyond the 20-game minimum) generally adds about .5 points, and each extra million dollars added to the player's salary generally goes with about 4 more points.

For Figure C, there was a very slight negative correlation between age and goals scored, but it was so close to zero that it failed to reject the null hypothesis of no correlation. Next I created a linear model of this data and found that the baseline (intercept) is about 8 goals, each additional year of age generally takes off about .5 of a goal, each extra game played adds about .2 of a goal, and each extra million added to a player's salary generally goes with about 1.6 more goals. I also included the Q-Q plot of this linear model, which was about the same for Figures B-D, and concluded that the residuals were approximately normally distributed except in the highest and lowest sections.
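The coefficient readings for the points model can be checked by hand. The sketch below plugs a made-up player into the fitted equation, using coefficients copied from summary(model1). The player's age, birth month, games played, and salary are hypothetical, and the player is assumed to be a plain center (C), which appears to be the reference position because it has no coefficient row of its own.

```r
# Coefficients copied from summary(model1); Salary is in millions of dollars.
coefs <- c(Intercept = 15.33573, Age = -1.16740, Month = 0.02076,
           GP = 0.53271, Salary = 4.24058)

# Hypothetical player: a 25-year-old center born in June, 82 games, $5 million salary.
player <- c(Intercept = 1, Age = 25, Month = 6, GP = 82, Salary = 5)

# Predicted points: sum of coefficient times value over all terms.
predicted_points <- sum(coefs * player[names(coefs)])

# Effect of one extra million in salary, everything else held fixed.
player_raise <- player
player_raise["Salary"] <- 6
raise_effect <- sum(coefs * player_raise[names(coefs)]) - predicted_points

print(round(predicted_points, 2))
print(round(raise_effect, 2))
```

Note that the intercept alone corresponds to Age, GP, and Salary all equal to zero, so it is best read as a baseline for the equation, not as a typical player's point total.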
For Figure D, there was a very slight positive correlation between age and assists earned, but it was close enough to zero that it failed to reject the null hypothesis of no correlation. Next I created a linear model of this data and found that the baseline (intercept) is about 7.5 assists, each additional year of age takes away about .66 of an assist, each extra game played adds about .33, and each extra million earned goes with about 2.6 more assists.

For Figure E, I decided to look at this data because of something called the "baseball effect". This is the idea that players who are born earlier in the year (January, February, March) are usually older than the other players in their youth hockey age group because of the cutoff date, and thus generally perform better and receive more attention from pro scouts when the time comes. Although this data did not show anything incredibly significant, the highest concentration of players were born in January, which would support this point to some extent. This phenomenon has already been looked into by many other people, for example (http://fans.canadiens.nhl.com/community/topic/21826-study-correlation-of-birth-month-and-of-canadians-in-the-nhl/).

For Figure F, there is a very slight positive correlation between goalie heights and games started, but at .144 it is not strong enough to reject the null. This does not show any significant evidence that taller goaltenders get started more often than shorter ones, which would make sense, as there are many more factors in play than just height when deciding on a goalie for each NHL team.

For Figure G, I used the pairs() function to explore different relationships within the data. The two main things that I noticed were that there was a positive exponential relationship between games played and goals and that there was a linear relationship between salary and games played.