intro to forecasting - part 2 - hrug

Post on 16-Jul-2015


Intro to Forecasting in R Part Deux!

Houston R Users Group

Ed Goodwin, CFA

Last time at HRUG…

• we left off discussing linear trend models.

• there was something VERY wrong with this forecast.

• WHAT WAS IT?

How accurate was it?

RMSE Training = 38.3

RMSE Test = 76.6
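The two RMSE figures compare fitted values against the training data and forecasts against held-out data. A minimal sketch of that calculation follows, assuming the built-in AirPassengers series and a hypothetical split at the end of 1957. The slides do not state which split was used, so the numbers it prints need not match the ones above.

```r
# Assumption: AirPassengers (datasets package) with a hypothetical
# train/test split at December 1957; the talk's actual split is not stated.
y     <- AirPassengers
train <- window(y, end = c(1957, 12))
test  <- window(y, start = c(1958, 1))

# Linear trend model: passengers as a linear function of time
train_df <- data.frame(y = as.numeric(train), t = as.numeric(time(train)))
test_df  <- data.frame(y = as.numeric(test),  t = as.numeric(time(test)))
fit <- lm(y ~ t, data = train_df)

# Root mean squared error helper
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

rmse_train <- rmse(train_df$y, fitted(fit))
rmse_test  <- rmse(test_df$y, predict(fit, newdata = test_df))
c(train = rmse_train, test = rmse_test)
```

A test RMSE well above the training RMSE is the usual sign that the model is missing structure that matters out of sample.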

Our forecast was really inaccurate!

• the 95% prediction interval is doing a poor job of covering recent values.

• there seems to be a seasonal pattern in the data whose swings are growing over time.

• we are not accounting for things like lower cost of travel and population growth that are affecting the data

The solution?

We need to transform the data!

What are transformations?

Transformations replace data with a function of that data

Types of transformations

• convenience transforms - changing scale to make calculations easier (percentages, absolute values, Fahrenheit to Celsius, miles to kilometers)

• log transforms - for compounded data (CPI inflators, market returns, power laws)

• skew reductions - reduce left or right skewness

• additive transforms - make multiplicative relationships linear

• spread transforms - reduce heteroskedasticity

Some common transforms

TRANSFORM          EXAMPLE

Reciprocal         x = 1/x

Log                x = log(x)

Powers and roots   x = x^2; x = sqrt(x)

Common scale       y = 1:100; x = 1/y
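As a small illustration of these transforms in R, here is a sketch on a made-up vector of positive values (the data are an assumption, chosen only so that every transform is defined). It also checks that the log and root transforms can be undone, which matters once a forecast has to be reported in the original units.

```r
# Made-up positive values, for illustration only
x <- c(1, 10, 100, 1000)

recip <- 1 / x       # reciprocal
logx  <- log(x)      # natural log; log10(x) would give base 10
sq    <- x^2         # power
root  <- sqrt(x)     # root

# Inverse transforms recover the original values
back_log  <- exp(logx)
back_root <- root^2
all.equal(back_log, x)
all.equal(back_root, x)
```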

Forecast with transform

• Use log( ) to account for growth factor in Air Passenger data

More accurate?

RMSE Training = 0.134

RMSE Test = 0.167

Don’t forget to transform the data back!

Back Transformed Plot
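An RMSE computed on logged data is in log units, so it cannot be compared directly with the 38.3 and 76.6 from the untransformed model. A sketch of fitting on the log scale and then back-transforming with exp( ) follows, again assuming the built-in AirPassengers series and a hypothetical split at the end of 1957 (not necessarily the split used in the talk).

```r
# Assumption: hypothetical train/test split at December 1957
y     <- AirPassengers
train <- window(y, end = c(1957, 12))
test  <- window(y, start = c(1958, 1))

train_df <- data.frame(y = as.numeric(train), t = as.numeric(time(train)))
test_df  <- data.frame(y = as.numeric(test),  t = as.numeric(time(test)))

# Fit the linear trend to log(passengers) instead of passengers
fit_log <- lm(log(y) ~ t, data = train_df)

rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Forecast on the log scale
pred_log <- predict(fit_log, newdata = test_df)
rmse_log <- rmse(log(test_df$y), pred_log)    # in log units

# Back-transform with exp() to get passenger counts again
pred_back <- exp(pred_log)
rmse_back <- rmse(test_df$y, pred_back)       # in passengers

c(log_scale = rmse_log, original_scale = rmse_back)
```

Only the back-transformed RMSE is on the same footing as the RMSE of the original linear trend model.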

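Linear Models

• lm( ) function to create a linear model

• tslm( ) (from the forecast package) is an lm( ) wrapper that adds season and trend variables

• season is a set of dummy variables, one for each period of the seasonal cycle (e.g. each month), taken from the frequency of the time series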
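To show what a trend-plus-season regression involves, here is a sketch that builds the two regressors by hand and fits them with base lm( ). The column names trend and season are chosen to mirror the variables tslm( ) provides; the hand-built versions are an assumption about what the wrapper does, not a copy of its code.

```r
# Log of the built-in monthly AirPassengers series
y <- log(AirPassengers)

df <- data.frame(
  y      = as.numeric(y),
  trend  = seq_along(y),                    # 1, 2, ..., 144
  season = factor(as.integer(cycle(y)))     # month of the year, levels 1..12
)

# lm() expands the season factor into 11 dummy variables,
# with month 1 (January) as the baseline
fit <- lm(y ~ trend + season, data = df)
coef(fit)
```

With the forecast package installed, a call such as tslm(y ~ trend + season) on the same logged series is expected to fit the same kind of model without the manual setup.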

What does our model look like?

• Use the summary( ) function to get details

How well does it fit?

• Use the residuals( ) function to inspect the residuals (the residual standard error itself is reported by summary( ))
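A sketch of both steps on a trend-plus-season regression of the logged AirPassengers data, fitted here with base lm( ) so that it runs without extra packages (the model and variable names are assumptions for illustration):

```r
# Refit a log trend + season regression with hand-built regressors
y  <- log(AirPassengers)
df <- data.frame(
  y      = as.numeric(y),
  trend  = seq_along(y),
  season = factor(as.integer(cycle(y)))
)
fit <- lm(y ~ trend + season, data = df)

# summary() holds the coefficient table, R-squared and residual std. error
s <- summary(fit)
s$r.squared
s$sigma                      # residual standard error, in log units

# residuals() returns one residual per observation
res <- residuals(fit)
mean(res)                    # essentially zero for a fit with an intercept

# The residual standard error recomputed by hand from the residuals
sigma_by_hand <- sqrt(sum(res^2) / df.residual(fit))
sigma_by_hand
```

Plotting res against time is a quick way to see whether any trend or seasonal pattern is left unexplained.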

Plot of Log Forecast using seasonal Dummy Variable

Creating our own dummy variables

• Time series with ‘1’ where variable is TRUE, ‘0’ where FALSE

• Factors are a good place to start when creating dummy variables

• Always have n-1 dummy variables (e.g. days of week would have 6 dummy variables, since all ‘0’ would represent one of the days)
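The n-1 rule can be seen directly by letting R expand a factor. A minimal sketch with a made-up days-of-week factor (the data are an assumption for illustration):

```r
# Made-up example: one observation per day of the week
day_names <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
days <- factor(day_names, levels = day_names)

# model.matrix() expands the factor into 0/1 columns. With an intercept,
# the first level (Mon) is the baseline and gets no column of its own.
X <- model.matrix(~ days)
dummies <- X[, -1]     # drop the intercept column

dim(dummies)           # 7 rows, 6 dummy columns
dummies
```

The Monday row is all zeros, which is why a seventh dummy would be redundant.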

Examples of dummy variables

• Employment status (for credit scores)

• Bank holidays (for econometrics and market data)

• Black Friday and Christmas shopping season for retail sales

• Days of critical events that move (e.g. Super Bowl Sunday, worker strikes, natural disasters)

Easter Holiday 2014-2017

• Let’s say you’re in charge of forecasting sales of Cadbury Eggs for Cadbury Schweppes. The sales peak near the Easter holiday in the US.

• Easter falls at various times of the year (March or April)

• Solution? Create a dummy variable for Easter

YEAR   EASTER HOLIDAY

2014   April 20th

2015   April 5th

2016   March 27th

2017   April 16th

Easter Dummy Variable
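One way such a dummy could be built is sketched below, assuming monthly sales data running from January 2014 to December 2017 (the monthly frequency and date range are assumptions; the Easter dates are the four listed for 2014-2017).

```r
# Easter dates for 2014-2017
easter <- as.Date(c("2014-04-20", "2015-04-05", "2016-03-27", "2017-04-16"))

# Assumption: monthly observations, January 2014 to December 2017
months <- seq(as.Date("2014-01-01"), as.Date("2017-12-01"), by = "month")

# 1 in the month that contains Easter, 0 in every other month
is_easter    <- format(months, "%Y-%m") %in% format(easter, "%Y-%m")
easter_dummy <- ts(as.integer(is_easter), start = c(2014, 1), frequency = 12)
easter_dummy
```

The resulting series can be added as an extra regressor next to trend and season, so the model can shift the Easter sales peak between March and April from year to year.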
