intro to forecasting - part 2 - hrug
Intro to Forecasting in R Part Deux!
Houston R Users Group
Ed Goodwin, CFA
Last time at HRUG…
• we left off discussing linear trend models.
• there was something VERY wrong with this forecast.
• WHAT WAS IT?
How accurate was it?
RMSE Training = 38.3
RMSE Test = 76.6
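RMSE (root mean squared error) is the square root of the average squared forecast error. It is expressed in the same units as the data. The talk works in R. The sketch below shows the same formula in Python, and its numbers are made up for illustration; they are not the Air Passenger data. A test RMSE well above the training RMSE, as on this slide, is the usual sign that a model fits history better than it forecasts.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: sqrt(mean((actual - predicted)^2))."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical values, not the talk's data.
train_actual, train_fit = [112, 118, 132, 129], [110, 120, 128, 131]
test_actual, test_fc = [121, 135, 148], [133, 140, 130]

print("RMSE training:", rmse(train_actual, train_fit))
print("RMSE test:", rmse(test_actual, test_fc))
```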
Our forecast was really inaccurate!
• the 95% prediction interval does a poor job of covering recent values.
• there seems to be a seasonal pattern in the data whose swings grow larger over time.
• we are not accounting for factors such as cheaper air travel and population growth that affect the data
The solution?
We need to transform the data!
What are transformations?
Transformations replace data with a function of that data.
Types of transformations
• convenience transforms - changing scale to make calculations easier (percentages, absolute values, Fahrenheit to Celsius, miles to kilometers)
• log transforms - for compounded data (CPI inflators, market returns, power laws)
• skew reductions - reduce left or right skewness
• additive transforms - make multiplicative relationships additive, and therefore linear (log turns products into sums)
• spread transforms - reduce heteroskedasticity
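The log transform is the one used later for the Air Passenger data. The sketch below shows why it helps with growth. The series is a hypothetical one that grows by 10% each period. Its raw period-to-period differences keep getting bigger, so the trend is curved. After taking logs, every difference equals log(1.1), so the growth becomes a straight line that a linear trend model can fit.

```python
import math

# Hypothetical series growing 10% per period (multiplicative growth).
series = [100 * 1.1 ** t for t in range(6)]

# Raw differences increase each period: the trend is curved.
raw_diffs = [b - a for a, b in zip(series, series[1:])]

# After a log transform every difference equals log(1.1): a straight line.
logged = [math.log(y) for y in series]
log_diffs = [b - a for a, b in zip(logged, logged[1:])]

print("raw differences:", [round(d, 2) for d in raw_diffs])
print("log differences:", [round(d, 4) for d in log_diffs])
```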
Some common transforms
TRANSFORM      EXAMPLE
Reciprocal     x = 1/x
Log            x = log(x)
Power / root   x = x^2; x = sqrt(x)
Common scale   y = 1:100; x = 1/y
Forecast with transform
• Use log( ) to account for the growth factor in the Air Passenger data
More accurate?
RMSE Training = 0.134; RMSE Test = 0.167 (measured on the log scale, so not directly comparable to the earlier 38.3 and 76.6)
Don’t forget to transform the data back!
Back Transformed Plot
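Back-transforming a log model means applying exp() to the point forecasts and to the interval bounds. exp() is monotonic, so the lower and upper bounds stay in order. The values below are hypothetical log-scale forecasts, not output from the talk's model. To compare accuracy with the untransformed model, compute RMSE on these back-transformed values.

```python
import math

# Hypothetical log-scale point forecasts and 95% interval bounds,
# e.g. from a model fitted to log(passengers).
log_point = [6.10, 6.20]
log_lower = [5.90, 5.98]
log_upper = [6.30, 6.42]


def back_transform(values):
    """Undo a natural-log transform with exp()."""
    return [math.exp(v) for v in values]


point = back_transform(log_point)
lower = back_transform(log_lower)
upper = back_transform(log_upper)

# exp() is monotonic, so interval bounds map directly to interval bounds.
for p, lo, hi in zip(point, lower, upper):
    print(f"{p:.1f}  [{lo:.1f}, {hi:.1f}]")
```

One caveat: if the log-scale forecast distribution is symmetric, the back-transformed point forecast behaves like a median rather than a mean on the original scale.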
Linear Models
• use the lm( ) function to create a linear model
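• tslm( ) (from the forecast package) is an lm( ) wrapper for time series that adds trend and season variables
• season is a set of dummy variables built from the series' frequency (e.g. one per month for monthly data, with one month as the baseline)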
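A regression on trend plus season amounts to ordinary least squares on a design matrix. That matrix has an intercept column, a trend column 1..n, and one 0/1 column for every season except a baseline season. This mirrors how R treats the first level of a factor. The Python sketch below builds that matrix for hypothetical quarterly data and solves the normal equations. It is not tslm's implementation. Generating y exactly from known coefficients lets you check that the fit recovers them.

```python
def design_matrix(n, period):
    """Rows of [intercept, trend, season_2 .. season_period]; season 1 is the baseline."""
    rows = []
    for i in range(n):
        season = i % period  # 0-based season index; assumes the series starts in season 1
        dummies = [1.0 if season == s else 0.0 for s in range(1, period)]
        rows.append([1.0, float(i + 1)] + dummies)
    return rows


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(x_rows, y):
    """Least squares coefficients from the normal equations X'X b = X'y."""
    p = len(x_rows[0])
    xtx = [[sum(row[i] * row[j] for row in x_rows) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(x_rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical quarterly log-series: level 5, trend 0.02/quarter, seasonal offsets.
true_coef = [5.0, 0.02, 0.10, 0.25, -0.05]
X = design_matrix(n=16, period=4)
y = [sum(c * v for c, v in zip(true_coef, row)) for row in X]
print([round(b, 4) for b in ols(X, y)])
```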
What does our model look like?
• Use the summary( ) function to get details
How well does it fit?
• Use the residuals( ) function to inspect the residuals; summary( ) reports their spread as the residual standard error
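The residual standard error that R's summary( ) prints for a linear model is sqrt(SSE / (n - p)). Here SSE is the sum of squared residuals, n is the number of observations, and p is the number of fitted coefficients. A minimal Python version follows. The residuals are hypothetical.

```python
import math


def residual_standard_error(residuals, n_params):
    """sqrt(SSE / (n - p)): residual spread adjusted for p fitted coefficients."""
    n = len(residuals)
    if n <= n_params:
        raise ValueError("need more observations than parameters")
    sse = sum(e ** 2 for e in residuals)
    return math.sqrt(sse / (n - n_params))


# Hypothetical residuals from a model with 2 coefficients (intercept + trend).
print(residual_standard_error([0.5, -0.3, 0.1, -0.4, 0.2, -0.1], n_params=2))
```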
Plot of Log Forecast using Seasonal Dummy Variables
Creating our own dummy variables
• A dummy variable is a time series with ‘1’ where the condition is TRUE and ‘0’ where it is FALSE
• Factors are a good place to start when creating dummy variables
• For a categorical variable with n levels, use n-1 dummy variables when the model has an intercept (e.g. days of the week need 6 dummy variables, because all ‘0’ represents the remaining day)
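The n-1 rule avoids perfect collinearity. With an intercept in the model, a full set of 7 day-of-week dummies would always sum to 1 and duplicate the intercept column. The Python sketch below encodes a day as 6 dummies, with Monday as the baseline (all zeros). The day labels and the `is_` naming are illustrative choices.

```python
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def day_dummies(day, levels=DAYS):
    """Encode one categorical value as len(levels) - 1 zero/one dummies.

    The first level ("Mon" here) is the baseline: it is the all-zero row.
    """
    if day not in levels:
        raise ValueError(f"unknown level: {day}")
    return {f"is_{lvl}": int(day == lvl) for lvl in levels[1:]}


for d in ["Mon", "Wed", "Sun"]:
    print(d, day_dummies(d))
```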
Examples of dummy variables
• Employment status (for credit scores)
• Bank holidays (for econometrics and market data)
• Black Friday and Christmas shopping season for retail sales
• Days of critical events whose dates move or cannot be predicted (e.g. Super Bowl Sunday, worker strikes, natural disasters)
Easter Holiday 2014-2017
• Let’s say you’re in charge of forecasting sales of Cadbury Eggs for Cadbury Schweppes. Sales peak near the Easter holiday in the US.
• Easter falls on a different date each year (in March or April)
• Solution? Create a dummy variable for Easter
EASTER HOLIDAY
2014 April 20th
2015 April 5th
2016 March 27th
2017 April 16th
Easter Dummy Variable
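The sketch below builds an Easter dummy for monthly data from the Easter dates in the table above. It assumes the dummy is 1 in the month that contains Easter Sunday and 0 otherwise. Real candy sales may peak in the weeks before Easter, so weekly data with several pre-Easter weeks flagged could be a better fit. That choice is an assumption, not something the talk specifies.

```python
from datetime import date

# Easter Sunday dates from the table (2014-2017).
EASTER = {
    2014: date(2014, 4, 20),
    2015: date(2015, 4, 5),
    2016: date(2016, 3, 27),
    2017: date(2017, 4, 16),
}


def monthly_easter_dummy(years):
    """Return [(year, month, dummy)] with dummy = 1 in the month containing Easter Sunday."""
    series = []
    for year in years:
        easter_month = EASTER[year].month
        for month in range(1, 13):
            series.append((year, month, int(month == easter_month)))
    return series


dummy = monthly_easter_dummy(range(2014, 2018))
print([(y, m) for y, m, d in dummy if d == 1])
```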