Harness Racing and SAS

Posted on 14-Dec-2014


DESCRIPTION

Harvard Stats 135 midterm project evaluating SAS techniques.

TRANSCRIPT

HARNESS RACING AND SAS

USING SAS TO MODEL HORSE RACES

DATA SET

• Source: “Past Performance” (PP) data from TrackMaster for the September 26, 2013 races at Yonkers Raceway

• Published in advance of the races

• Cost: $1.50

• Delivered in XML format; parsed using Python

• Contains the 10 most recent PPs for each horse racing that day

• 12 races × 8 horses × 10 past performances = 960 records

• Variables used: lengths back at each quarter, final time, final time of the race leader, gait, age (from horse metadata), track condition, track name, track length

• Created race-level, horse-race-level, and longitudinal data sets for different aspects of this analysis
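The parsing script itself is not shown. As a minimal sketch, assuming a hypothetical XML layout (the element and attribute names `card`, `race`, `horse`, `pp`, `final_time`, and so on are invented for illustration and are not TrackMaster's actual schema), Python's standard `xml.etree.ElementTree` can flatten the file into one record per horse per past performance:

```python
import xml.etree.ElementTree as ET

# Toy input in a HYPOTHETICAL layout; the real TrackMaster schema differs.
# All values are made up for illustration.
SAMPLE_XML = """
<card date="2013-09-26" track="Yonkers Raceway">
  <race number="1">
    <horse name="Horse A" age="5" gait="Pacer">
      <pp date="2013-09-19" track="Track X" condition="Fast" final_time="113.4"/>
      <pp date="2013-09-12" track="Track X" condition="Good" final_time="114.1"/>
    </horse>
    <horse name="Horse B" age="7" gait="Pacer">
      <pp date="2013-09-18" track="Track Y" condition="Sloppy" final_time="115.0"/>
    </horse>
  </race>
</card>
"""


def flatten_past_performances(xml_text):
    """Return one flat dict per (race, horse, past performance)."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for race in root.iter("race"):
        for horse in race.iter("horse"):
            for pp in horse.iter("pp"):
                records.append({
                    "race": int(race.get("number")),
                    "horse": horse.get("name"),
                    "age": int(horse.get("age")),
                    "gait": horse.get("gait"),
                    "pp_date": pp.get("date"),
                    "pp_track": pp.get("track"),
                    "condition": pp.get("condition"),
                    "final_time": float(pp.get("final_time")),
                })
    return records


records = flatten_past_performances(SAMPLE_XML)
print(len(records), "records")
```

On the full card (12 races × 8 horses × 10 PPs) the same loop would produce 960 records; writing them out with `csv.DictWriter` gives a CSV file that SAS can import.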

GAIT AND CONDITION

• Hypothesis: gait and track condition influence race time

• Gait

  • Binary: pacers and trotters

  • Each race is one or the other

  • Each horse is one or the other

• Condition

  • Categorical: Fast, Good, or Sloppy

  • Each race falls into exactly one category

• Created and cleaned a race-level data set

• A comparison of means showed that mean race times differ across the levels of both variables

• T-tests showed that these differences are statistically significant
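The slides report the t-tests but not the code. A minimal sketch of the unequal-variance (Welch) two-sample t statistic, which is one of the two results SAS's PROC TTEST prints (labeled Satterthwaite, next to the pooled-variance test); the function name `welch_t` and the toy samples are hypothetical:

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # statistics.variance uses the n - 1 denominator (sample variance).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Toy final times (made up, not from the Yonkers data).
pacers = [112.0, 113.0, 114.0]
trotters = [116.0, 117.0, 118.0]
t, df = welch_t(pacers, trotters)
print(f"t = {t:.2f}, df = {df:.1f}")
```

The two-sided p-value then comes from the t distribution with `df` degrees of freedom (for example `2 * scipy.stats.t.sf(abs(t), df)`), or directly from PROC TTEST. Track condition has three levels, so each two-sample test compares one pair of conditions.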

REMOVING OUTLIERS


GAIT T-TEST

CONDITION T-TEST

CORRELATION: LENGTHS BACK AT CALLS

• Some horses pull away early; others seem to wait until the last quarter to go to the front

• TrackMaster reports each horse’s position call and lengths back from the leader at each quarter

• Lengths are recorded either as fractions (to the nearest quarter length) or as parts of a horse:

  • Nose

  • Head

  • Neck

• Additional complications: “costly breaks” (a horse breaking stride) and disqualifications

• Still not satisfied: lengths back at the final call look strange for some winners
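Before correlating the calls, the mixed entries need a common numeric scale. A minimal sketch, assuming entries arrive as strings like `3 1/4` or `Neck`; the function name `lengths_back` and the numeric values assigned to nose, head, and neck are placeholders, not values taken from the slides:

```python
from fractions import Fraction

# ASSUMED numeric values (in lengths) for margin terms; conventions vary and
# the slides do not say which values, if any, were used.
MARGIN_LENGTHS = {"nose": 0.05, "head": 0.2, "neck": 0.3}


def lengths_back(call):
    """Convert an entry such as '3 1/4', '1/2', 'Neck', or '0' to a float."""
    text = call.strip().lower()
    if text in MARGIN_LENGTHS:
        return MARGIN_LENGTHS[text]
    # A mixed number like '3 1/4' splits into '3' and '1/4'; Fraction parses both.
    return float(sum(Fraction(part) for part in text.split()))


for raw in ["3 1/4", "1/2", "Neck", "0"]:
    print(raw, "->", lengths_back(raw))
```

Using `Fraction` keeps quarter-length values exact until the final conversion to float.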

CORRELATION OF LENGTHS BACK BY QUARTER

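A minimal sketch of the pairwise Pearson correlations between lengths back at successive calls, in plain Python; SAS's PROC CORR produces the same Pearson matrix, though the slides do not show which procedure was used. Function names and toy values are hypothetical:

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(calls):
    """calls: dict of call name -> lengths back, one entry per horse, same order."""
    names = list(calls)
    return {(a, b): pearson(calls[a], calls[b]) for a in names for b in names}


# Toy lengths back for four horses (made up).
calls = {
    "q1": [0.0, 1.5, 3.0, 4.25],
    "q2": [0.5, 1.0, 2.75, 5.0],
    "q3": [1.0, 0.0, 2.0, 6.5],
    "final": [2.0, 0.0, 1.25, 7.0],
}
for (a, b), r in correlation_matrix(calls).items():
    if a < b:  # print each unordered pair once
        print(a, b, round(r, 3))
```

A high correlation between early and late calls would suggest that horses tend to hold their positions; a low one would fit the pattern of horses that wait until the last quarter.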

AGE AND SPEED

• Goal: quantify how much horses slow down with age

• Merged each horse’s metadata (including age) with its past performance data

• Ran a single-variable regression of speed on age using the data set of means

• Found that age is not a great predictor of speed

• Age is discrete but not categorical, so it enters the model as a single numeric variable
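A minimal sketch of a single-variable least-squares fit of race time on age, using the closed-form slope (Sxy / Sxx) and intercept (mean of y minus slope times mean of x); an R² near zero is what "age is not a great predictor" would look like. The function name `simple_ols` and the toy data are hypothetical:

```python
def simple_ols(x, y):
    """Fit y = b0 + b1 * x by least squares; return (b0, b1, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    ss_res = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return b0, b1, 1 - ss_res / ss_tot


# Toy (age, mean final time) pairs, made up for illustration.
ages = [3, 4, 5, 6, 7, 8]
mean_times = [114.2, 113.1, 114.8, 113.6, 114.0, 114.9]
b0, b1, r2 = simple_ols(ages, mean_times)
print(f"time = {b0:.2f} + {b1:.3f} * age, R^2 = {r2:.3f}")
```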

MULTIVARIATE REGRESSION

• Built a longitudinal data set (each horse’s past performances in date order)

• Created dummy variables for past and present track conditions, gaits, and track sizes

• Used SAS’s LAG and LAST features

• Removed disqualified races

• Modeled race time on current race conditions and the horse’s two previous races
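A minimal Python sketch of the same DATA-step logic, assuming hypothetical field names (`horse`, `date`, `final_time`, `condition`, `track_miles`, `gait`, `disqualified`). In SAS, `LAG` returns the value from its own previous execution rather than the previous row of the same horse, so lags are usually reset when `FIRST.horse` is true in a `BY horse` step. The sketch mirrors that by clearing the history at each new horse, and `last_per_horse` plays the role of `LAST.horse`. The reference levels are an assumption read off the dummy names in the regression table:

```python
# ASSUMED reference levels: trotter, "Good" condition, and (presumably) the
# half-mile track get no dummy of their own.
LAGGED = ("final_time", "fast", "sloppy", "track_5_8", "track_1")


def dummies(r):
    """0/1 indicators for one race record."""
    return {
        "pacer": int(r["gait"] == "Pacer"),
        "fast": int(r["condition"] == "Fast"),
        "sloppy": int(r["condition"] == "Sloppy"),
        "track_5_8": int(r["track_miles"] == 0.625),
        "track_1": int(r["track_miles"] == 1.0),
    }


def build_longitudinal(records):
    """Drop disqualified races, then add lag-1 and lag-2 values within each horse."""
    rows = sorted((r for r in records if not r["disqualified"]),
                  key=lambda r: (r["horse"], r["date"]))
    out, history, current = [], [], None
    for r in rows:
        if r["horse"] != current:  # first race of a new horse: reset the lags
            current, history = r["horse"], []
        row = {**r, **dummies(r)}
        for k, suffix in ((1, "lag"), (2, "lag2")):
            prev = history[-k] if len(history) >= k else None
            for name in LAGGED:
                row[f"{name}_{suffix}"] = None if prev is None else prev[name]
        history.append(row)
        out.append(row)
    return out


def last_per_horse(rows):
    """Most recent row per horse (rows must be sorted by horse, then date)."""
    last = {}
    for row in rows:
        last[row["horse"]] = row
    return last


# Toy records (made up). ISO date strings sort chronologically.
toy = [
    {"horse": "A", "date": "2013-09-01", "final_time": 115.0, "condition": "Fast",
     "track_miles": 0.5, "gait": "Pacer", "disqualified": False},
    {"horse": "A", "date": "2013-09-08", "final_time": 114.0, "condition": "Sloppy",
     "track_miles": 0.625, "gait": "Pacer", "disqualified": False},
    {"horse": "A", "date": "2013-09-15", "final_time": 113.0, "condition": "Good",
     "track_miles": 1.0, "gait": "Pacer", "disqualified": False},
    {"horse": "B", "date": "2013-09-02", "final_time": 118.0, "condition": "Good",
     "track_miles": 0.5, "gait": "Trotter", "disqualified": False},
    {"horse": "B", "date": "2013-09-09", "final_time": 117.0, "condition": "Fast",
     "track_miles": 0.5, "gait": "Trotter", "disqualified": True},
]
rows = build_longitudinal(toy)
for r in rows:
    print(r["horse"], r["date"], r["final_time_lag"], r["final_time_lag2"])
```

The first two races for each horse have missing lags, which is one reason a model that uses two prior races loses some records.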

Label            Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept                 104.67788          4.81142     21.76     <.0001
Lag final time              0.01412          0.03120      0.45     0.6510
Lag2 final time             0.11361          0.02975      3.82     0.0001
Pacer                      -3.68185          0.21247    -17.33     <.0001
Fast                       -0.77005          0.38954     -1.98     0.0484
Sloppy                      0.86942          0.43605      1.99     0.0465
Age                         0.05312          0.04023      1.32     0.1871
5/8 Track                  -2.74052          0.20313    -13.49     <.0001
1 Track                    -3.18411          0.47824     -6.66     <.0001
Fast lag                    0.35883          0.38598      0.93     0.3528
Sloppy lag                  0.48532          0.43151      1.12     0.2610
Fast lag2                   0.09472          0.37245      0.25     0.7993
Sloppy lag2                -0.39904          0.42068     -0.95     0.3431
5/8 Track lag               0.14639          0.23680      0.62     0.5366
1 Track lag                 0.40192          0.51792      0.78     0.4379
5/8 Track lag2              0.58564          0.21764      2.69     0.0073
1 Track lag2                0.67260          0.49172      1.37     0.1717
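How to read the table: each t Value is the parameter estimate divided by its standard error, and Pr > |t| is the two-sided p-value for the hypothesis that the coefficient is zero. Checking two rows:

```latex
t = \frac{\hat{\beta}}{\widehat{\mathrm{SE}}(\hat{\beta})},
\qquad
t_{\text{Pacer}} = \frac{-3.68185}{0.21247} \approx -17.33,
\qquad
t_{\text{Lag2 final time}} = \frac{0.11361}{0.02975} \approx 3.82
```

Read this way, holding the other variables fixed, pacers are predicted to finish about 3.68 (in the units of final time) faster than trotters, while each extra unit of lag-2 final time adds only about 0.11 to the prediction.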


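Final times from previous races are not great determinants of this race’s final time: the lag-1 coefficient is not significant (p = 0.6510), and the lag-2 coefficient, though significant, is small (0.11361).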

PREDICTION OF SEPTEMBER 26 RACES

Predicting the Winner

• Applied the coefficients from my multivariate regression to each horse’s two most recent races

• Ranked horses by predicted final time (lowest predicted time = predicted winner)

• My picks weren’t great, but they were better than choosing at random

• Reason: very low variance in race times among horses, so the model lacks the predictive power to separate them, even with R² > 0.5
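A minimal sketch of the ranking step, plugging the estimates from the regression table into the linear predictor. The feature names are hypothetical; any predictor missing from a horse's dict is treated as 0, i.e. the reference level (assumed here to be trotter, Good condition, and a half-mile track):

```python
# Coefficients copied from the multivariate regression table.
COEF = {
    "intercept": 104.67788,
    "final_time_lag": 0.01412,
    "final_time_lag2": 0.11361,
    "pacer": -3.68185,
    "fast": -0.77005,
    "sloppy": 0.86942,
    "age": 0.05312,
    "track_5_8": -2.74052,
    "track_1": -3.18411,
    "fast_lag": 0.35883,
    "sloppy_lag": 0.48532,
    "fast_lag2": 0.09472,
    "sloppy_lag2": -0.39904,
    "track_5_8_lag": 0.14639,
    "track_1_lag": 0.40192,
    "track_5_8_lag2": 0.58564,
    "track_1_lag2": 0.67260,
}


def predicted_time(features):
    """Linear prediction; predictors missing from `features` count as 0."""
    return COEF["intercept"] + sum(
        beta * features.get(name, 0) for name, beta in COEF.items() if name != "intercept")


def rank_horses(field):
    """field: dict horse -> features. Returns horses, fastest predicted time first."""
    return sorted(field, key=lambda horse: predicted_time(field[horse]))


# Toy field of two pacers on a fast half-mile track (made-up values).
field = {
    "A": {"pacer": 1, "fast": 1, "age": 5, "final_time_lag": 113.0, "final_time_lag2": 114.0},
    "B": {"pacer": 1, "fast": 1, "age": 5, "final_time_lag": 112.0, "final_time_lag2": 116.0},
}
for horse in rank_horses(field):
    print(horse, round(predicted_time(field[horse]), 2))
```

Because every horse in a race shares the same gait, condition, and track, those terms shift all predictions in the race equally. The ranking within a race is therefore driven only by age and the lagged terms, which is consistent with the slides' point that the model struggles to separate horses.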

FINAL THOUGHTS

• SAS’s LAG and LAST features work well for longitudinal data

• Most of the work was in the DATA steps, not the PROC steps

• My model was based on only 960 records from 96 horses

• With more data, I might model pacers and trotters separately, and each track condition separately

• I still want to investigate the lengths back reported for winning horses

• Learned a lot about both SAS and harness racing
