Decision 411: Class 11 - Duke University


Page 1: Decision 411: Class 11 - Duke University

Decision 411: Class 11

Today:
ARIMA models with regressors
Forecasting new products
Review of models: what to use and when

Friday:
Automatic forecasting software
Political & ethical issues in forecasting

Page 2: Decision 411: Class 11 - Duke University

ARIMA models with regressors

By adding regressors to ARIMA models, you can combine the power of multiple regression and ARIMA:

Ability to include “causal” variables (e.g., promotion effects) in a time series model
Ability to fit the “optimal” time series model to the residuals of a regression model (⇒ “regression with ARIMA errors”)
Ability to fit “trend-stationary” as well as “difference-stationary” models

Page 3: Decision 411: Class 11 - Duke University

Simplest case: ARIMA(1,0,0) + regressor = regression with AR(1) errors

The ARIMA(1,0,0) forecasting equation is

$\hat{Y}_t = \mu + \phi_1 Y_{t-1}$

When a regressor $X$ is added, it becomes

$\hat{Y}_t = \mu + \phi_1 Y_{t-1} + \beta (X_t - \phi_1 X_{t-1})$

which is equivalent to

$\hat{Y}_t - \beta X_t = \mu + \phi_1 (Y_{t-1} - \beta X_{t-1})$

…i.e., the regression errors $Y_t - \beta X_t$ are assumed to be an AR(1) process. Note that the same AR transformation is also applied to X!
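As an added illustration (not part of the original slides), here is a minimal sketch of how such a model could be fitted outside Statgraphics, assuming Python with statsmodels and a simulated regressor x and response y:

import numpy as np
import statsmodels.api as sm

# Simulated data: x is the regressor, y has AR(1) errors around a linear function of x.
rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=200))
y = 10 + 0.5 * x + sm.tsa.arma_generate_sample(ar=[1, -0.8], ma=[1], nsample=200)

# ARIMA(1,0,0) + 1 regressor, i.e., regression of y on x with AR(1) errors.
model = sm.tsa.SARIMAX(y, exog=x, order=(1, 0, 0), trend="c")
result = model.fit(disp=False)
print(result.summary())  # reports the constant, the coefficient on x, and the AR(1) coefficient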

Page 4: Decision 411: Class 11 - Duke University

Regression with AR(1) errors

This is the “workhorse” model of econometrics, since most regressions of econometric variables have positively autocorrelated errors (DW << 2) with an autoregressive signature.

The AR(1) error process is essentially a proxy for the effects of other, unmodeled variables whose effects are slowly changing in time.

The “Cochrane-Orcutt transformation” option in multiple regression fits the AR(1) error model.

Equivalently, you can fit an ARIMA(1,0,0) model with (up to 4) regressors.
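For comparison (again an addition, not from the slides), statsmodels’ GLSAR class performs an iterated AR(1) correction in the spirit of Cochrane-Orcutt; a sketch, reusing the simulated y and x from the previous snippet:

import statsmodels.api as sm

# Regression of y on a constant and x with AR(1) errors, estimated by
# iterated feasible GLS (a Cochrane-Orcutt-style procedure).
X = sm.add_constant(x)
co_model = sm.GLSAR(y, X, rho=1)           # rho=1 requests an AR(1) error structure
co_result = co_model.iterative_fit(maxiter=10)
print(co_result.params)                     # intercept and slope after the AR(1) transformation
print(co_model.rho)                         # estimated AR(1) coefficient of the errors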

Page 5: Decision 411: Class 11 - Duke University

More general example

Consider an ARIMA(1,1,1) model*, whose forecasting equation is

$\hat{y}_t = \mu + \phi_1 y_{t-1} - \theta_1 e_{t-1}$

…where $y_t = Y_t - Y_{t-1}$

*This model is used here for purposes of illustration because it includes all ARIMA components

Page 6: Decision 411: Class 11 - Duke University

Example, continued

If a regressor $X$ is added, the new equation is

$\hat{y}_t = \mu + \phi_1 y_{t-1} - \theta_1 e_{t-1} + \beta (x_t - \phi_1 x_{t-1})$

…where $x_t = X_t - X_{t-1}$, i.e., X is differenced in the same way as Y, and the same AR transformation is also applied to it.

Equivalently:

$\hat{y}_t - \beta x_t = \mu + \phi_1 (y_{t-1} - \beta x_{t-1}) - \theta_1 e_{t-1}$

…i.e., the regression errors $Y_t - \beta X_t$ are assumed to be an ARIMA(1,1,1) process.

Page 7: Decision 411: Class 11 - Duke University

What’s the logic of this approach?

It assumes that the errors of the regression model $Y_t = \beta X_t + \varepsilon_t$ are an ARIMA process.

By applying the same differencing and AR transformations to both Y and X, the ARIMA model is effectively fitted to $Y - \beta X$, i.e., Y “controlled” for the effect of X.

This is the correct thing to do if X has the same degree of nonstationarity as Y and if its effect on Y is contemporaneous.

Page 8: Decision 411: Class 11 - Duke University

Reprise: trend-stationarity vs. difference-stationarity

Most naturally-occurring time series in business and economics are nonstationary by virtue of being trended… but some are more non-stationary than others.
“Trend-stationary” means that a series can be stationarized merely by detrending.
“Difference-stationary” means that it can only be stationarized by differencing (i.e., it is a random walk… more unpredictable).
There are two corresponding types of regression models for nonstationary data.

Page 9: Decision 411: Class 11 - Duke University

Trend-stationary model

Assumption: Y and X are trend-reverting, and their deviations from their respective trend lines are correlated.

Model: first regress Y on X and the time index by fitting an ARIMA(0,0,0) model with Y as the input variable and X and Time as regressors.

Then identify #’s of AR and/or MA terms to explain the autocorrelation in the residuals, ending up with ARIMA(p,0,q) + 2 regressors.
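A minimal sketch of this two-step recipe (added for illustration; it assumes Python with statsmodels rather than Statgraphics, and uses simulated data):

import numpy as np
import statsmodels.api as sm

# Simulated trended series: y depends on x and a linear time trend.
rng = np.random.default_rng(1)
n = 200
t = np.arange(n)                                   # time index used to detrend
x = 5 + 0.05 * t + rng.normal(scale=0.5, size=n)
y = 50 + 0.3 * t - 2.0 * x + rng.normal(scale=3.0, size=n)
exog = np.column_stack([x, t])

# Step 1: ARIMA(0,0,0) + regressors X and Time, i.e., an ordinary regression on X and t.
step1 = sm.tsa.SARIMAX(y, exog=exog, order=(0, 0, 0), trend="c").fit(disp=False)

# Step 2: after inspecting the residual ACF/PACF, add AR and/or MA terms,
# ending up with, e.g., ARIMA(1,0,0) + 2 regressors.
step2 = sm.tsa.SARIMAX(y, exog=exog, order=(1, 0, 0), trend="c").fit(disp=False)
print(step2.summary())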

Page 10: Decision 411: Class 11 - Duke University

Difference-stationary model

Assumption: Y and X are random walks, and their respective steps are correlated.

Model: first regress DIFF(Y) on DIFF(X) by fitting an ARIMA(0,1,0) model with Y as the input variable and X as a single regressor.

Then identify #’s of AR and/or MA terms to explain the autocorrelation in the residuals, ending up with ARIMA(p,1,q) + 1 regressor.

Both types of models can be fitted as ARIMA + regressors.
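A parallel sketch for the difference-stationary recipe (also an added illustration with simulated data). In statsmodels the exogenous variable enters a regression-with-ARIMA-errors formulation, which with d=1 is equivalent to regressing DIFF(Y) on DIFF(X):

import numpy as np
import statsmodels.api as sm

# Simulated random walks whose steps are correlated.
rng = np.random.default_rng(2)
x = np.cumsum(rng.normal(size=200))
y = 100 - 3.0 * x + np.cumsum(rng.normal(size=200))

# Step 1: ARIMA(0,1,0) + 1 regressor, i.e., regression with random-walk errors.
step1 = sm.tsa.SARIMAX(y, exog=x, order=(0, 1, 0)).fit(disp=False)

# Step 2: add AR/MA terms suggested by the residual ACF/PACF,
# e.g., an MA(2) signature leads to ARIMA(0,1,2) + 1 regressor.
step2 = sm.tsa.SARIMAX(y, exog=x, order=(0, 1, 2)).fit(disp=False)
print(step2.summary())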

Page 11: Decision 411: Class 11 - Duke University

Example: housing starts vs. mortgage rates

[Time series plot of HousesSAAR (roughly 500-1700) and MortgageRate (roughly 6-21), 1975-2005]

Housing starts and mortgage rates appear to be negatively related…

Let’s consider models for predicting housing starts from mortgage rates lagged by 1 month…

Page 12: Decision 411: Class 11 - Duke University

Prelude to a trend-stationary model: scatterplot of detrended variables

[Scatterplot of HOUSESdetrend vs. lag1MORTGAGEdetrend]

A trend-stationary model looks for a linear relationship between the detrended variables (r = -.74 here).

Page 13: Decision 411: Class 11 - Duke University

Here is the regression of HousesSAAR on lag(MortgageRate,1) and Time. (The time index serves to detrend both variables.) The regressors are highly significant, but the DW stat is only 0.33, and the lag-1 residual autocorrelation is 0.83! The residual-vs-time plot looks bad, as expected…

Page 14: Decision 411: Class 11 - Duke University

Instead of adding lagged variables as regressors (our previous approach to autocorrelated errors), let’s turn on the Cochrane-Orcutt transformation option, which fits an AR(1) model to the errors.

Page 15: Decision 411: Class 11 - Duke University

With an estimated AR(1) coefficient of 0.849, the residuals now look much better, and the standard error has been reduced by nearly 50% (down to 74). The plots of (studentized) residuals and predictions look MUCH better. Note that the mortgage rate coefficient is now around -53 (presumably more correct).

Page 16: Decision 411: Class 11 - Duke University

What happened to R-squared?!

Q: Why is R-squared now only 10%, despite the dramatic reduction in the standard error?

A: In this regression model the dependent variable is no longer considered to be HousesSAAR. Instead it is really HousesSAAR - 0.849*lag(HousesSAAR,1).

This variable has a much lower variance than HousesSAAR, because much of the original variance is “explained” merely by the AR(1) transformation.

Because there is now less variance to be explained by the regressors, a 10% R-squared in this model is actually better than 56% in the original model.

Page 17: Decision 411: Class 11 - Duke University

Now let’s fit the same model in the Forecasting procedure as an “ARIMA(1,0,0) model + 2 regressors”.

The estimated AR(1) coefficient is the same as in the Cochrane-Orcutt model, and the regression coefficients are also the same (apart from minor variations due to slightly different nonlinear estimation algorithms).

Page 18: Decision 411: Class 11 - Duke University

The residual ACF and PACF are good, but not “perfect”. We could consider adding more ARIMA terms, e.g., AR(2) or AR(3)…

Page 19: Decision 411: Class 11 - Duke University

Going up to AR(3), we get a slightly lower RMSE (70 rather than 74) and a higher estimated effect of lagged mortgage rates, but at the cost of additional model complexity.

Page 20: Decision 411: Class 11 - Duke University

Prelude to a difference-stationary model: scatterplot of differenced variables

[Scatterplot of diff(HousesSAAR) vs. diff(lag(MortgageRate,1))]

A difference-stationary model looks for a linear relationship between the differenced variables (r = -.18 here).

Page 21: Decision 411: Class 11 - Duke University

The regression of the differenced variables is a (0,1,0) model with HousesSAAR as the input and lag(MortgageRate,1) as the single regressor. One order of differencing is applied to both the input variable and the regressor(s) when d=1. The time index has been dropped because it becomes irrelevant when a difference is used: the trend is now represented by the mean in the model.

Page 22: Decision 411: Class 11 - Duke University

The residual ACF has two negative spikes ⇒ MA(2) signature.

Page 23: Decision 411: Class 11 - Duke University

After setting MA=2, we have a (0,1,2) model + 1 regressor. The estimated coefficient of lag(MortgageRate,1) is larger than in the trend-stationary model, but it plays a slightly different role in this model.

The MA(2) term is significant, and the MA coefficients add up to less than 0.5. The mean is not significant, and probably the constant term should be removed anyway.

Page 24: Decision 411: Class 11 - Duke University

Residuals again look fine…

Page 25: Decision 411: Class 11 - Duke University

Here are comparisons of the various trend-stationary and difference-stationary models (recall that B = the Cochrane-Orcutt model).

On the basis of these results, it’s hard to choose between the (fine-tuned) trend-stationary and difference-stationary models, although trend-stationary models tend to work well in practice.

The regressors do not reduce the errors by much! (A random walk w/drift yields RMSE=78.) Most of the work is done by the “time series” components of the models.

Page 26: Decision 411: Class 11 - Duke University

Another example: convenience store sales in a university town

Variables:
TOTSALES (total sales at 3 stores)
USESSION (dummy for university in session)
HOMEGAME (dummy for home football games)

326 daily values from 1/1/99 to 11/22/99

[Time series plot of TOTSALES]

Page 27: Decision 411: Class 11 - Duke University

A regression model with the 2 dummy variables (only) yields highly nonstationary residuals. The series evidently underwent a “step change” and also has local autocorrelation patterns.

Page 28: Decision 411: Class 11 - Duke University

After adding a 1st difference to the ARIMA model, slight autocorrelation remains at lag 2. This is a random walk + regressor model.

Page 29: Decision 411: Class 11 - Duke University

After adding an MA(2) factor, the autocorrelation is eliminated and the error stats are slightly improved. This is essentially a simple exponential smoothing model plus regressors.
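A sketch of that final model in Python (an added illustration, not from the slides; the dummies and sales series here are synthetic stand-ins for the real data, and the exact ARIMA order shown, (0,1,2), is an assumption based on the description above):

import numpy as np
import statsmodels.api as sm

# Synthetic stand-ins for the 326 daily observations described above.
n = 326
rng = np.random.default_rng(3)
usession = (rng.random(n) < 0.6).astype(float)     # placeholder "university in session" dummy
homegame = (rng.random(n) < 0.02).astype(float)    # placeholder "home football game" dummy
totsales = 500 + 150 * usession + 300 * homegame + np.cumsum(rng.normal(scale=20, size=n))

# ARIMA(0,1,2) + 2 regressors: roughly exponential smoothing of the level
# plus additive effects for the two dummies.
exog = np.column_stack([usession, homegame])
model = sm.tsa.SARIMAX(totsales, exog=exog, order=(0, 1, 2))
print(model.fit(disp=False).summary())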

Page 30: Decision 411: Class 11 - Duke University

Conclusions

ARIMA models with regressors provide a flexible tool for fitting regression models to nonstationary time series data.

The same order of differencing and/or AR terms is automatically applied to all variables.

AR and MA terms allow for fine-tuning to eliminate residual autocorrelation.

Page 31: Decision 411: Class 11 - Duke University

Forecasting new technologies and products

The problem: how can you forecast when you don’t have much (or any) historical data for the variable of interest?

The solution: base the forecast on some other type(s) of data.

Page 32: Decision 411: Class 11 - Duke University

Methods for new product forecasting

1. Conduct marketing experiments
• “Intentions” to purchase based on product characteristics
• Experimental purchasing behavior
• Test markets

Page 33: Decision 411: Class 11 - Duke University

Methods for new product forecasting

2. Poll the experts
• Ask 10 (+/-) independent experts for their estimates of market size, market penetration, etc., and take the median

Page 34: Decision 411: Class 11 - Duke University

Methods for new product forecasting

3. Search for “analogous” data
• Try to obtain historical data for products with similar characteristics and/or customer bases
• Try to find out what assumptions and/or models have been used by other forecasters in similar applications

Page 35: Decision 411: Class 11 - Duke University

Methods for new product forecasting

4. Use a diffusion-of-innovation/life cycle/growth curve model
• Models we have discussed so far have either assumed constant growth (zero, linear, or exponential) or else randomly-varying growth
• New products and/or technologies often follow a classic S-shaped growth curve characterized by linear or exponential early growth and subsequent market saturation

Page 36: Decision 411: Class 11 - Duke University

Growth curve models

There are many different “S-curve” or growth-curve models, originally popularized in the 1960s: logistic curves, Gompertz curves, exponential curves, etc.

The S-curve model in Statgraphics is a simple exponential formula: exp(a + b/t)

The best-known growth curve model is the Bass diffusion model (Bass 1969).

Page 37: Decision 411: Class 11 - Duke University

Bass model

$n_t = dN_t/dt = p(m - N_t) + q(N_t/m)(m - N_t)$

…where:
$n_t$ = sales/adoption rate at time t
$N_t$ = cumulative sales up to time t
$m$ = market potential
$p$ = coefficient of “innovation” (a.k.a. external influence)
$q$ = coefficient of “imitation” (a.k.a. internal influence)

Here $(m - N_t)$ is the number of potential customers who have not purchased yet, and $N_t/m$ is the fraction of the potential market already captured.
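To make the notation concrete, here is a small added sketch (Python/NumPy, not from the slides) that evaluates the Bass adoption rate and cumulative-adoption curve for assumed values of p, q, and m:

import numpy as np

def bass_cumulative(t, p, q, m):
    """Exact Bass cumulative adoption N(t); see the 'Fitting the Bass model to data' slide."""
    e = np.exp(-(p + q) * t)
    return m * (1.0 - e) / (1.0 + (q / p) * e)

def bass_rate(t, p, q, m):
    """Bass adoption rate n(t) = p*(m - N) + q*(N/m)*(m - N)."""
    N = bass_cumulative(t, p, q, m)
    return p * (m - N) + q * (N / m) * (m - N)

# Illustrative parameters (roughly the "all product averages" row of the representative-values table):
p, q, m = 0.03, 0.38, 1_000_000
t = np.arange(0, 21)
print(np.round(bass_cumulative(t, p, q, m) / m, 3))   # fraction of the market captured by year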

Page 38: Decision 411: Class 11 - Duke University

Interpretation of the Bass model

• p represents a “mass-media” effect while q represents a “word-of-mouth” effect.

• At time t, mass media influences a fraction p of the remaining market to adopt, while word of mouth influences a fraction $q(N_t/m)$ of the remaining market to adopt.

• The word-of-mouth effect grows in proportion to the number who have already adopted, hence it implies exponential early growth.

Page 39: Decision 411: Class 11 - Duke University

Bass model, continued

The inflection point in the growth curve occurs at time

$t^* = -\frac{1}{p+q}\ln(p/q)$

As the market nears saturation, the adoption rate per remaining potential customer, $n_t/(m - N_t) = p + q(N_t/m)$, approaches $p + q$.
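As a quick worked check (an added example, using the “all product averages” values p = 0.03 and q = 0.38 from the representative-values table below): $t^* = \frac{1}{p+q}\ln(q/p) = \ln(0.38/0.03)/0.41 \approx 2.54/0.41 \approx 6.2$, which matches the tabulated inflection point for that row.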

Page 40: Decision 411: Class 11 - Duke University

Fitting the Bass model to data

• Implied growth curve formula (exact):

$N_t = m\,\frac{1 - \exp(-t(p+q))}{1 + (q/p)\exp(-t(p+q))}$

• Difference equation (approximate):

$n_{t+1} = N_{t+1} - N_t = pm + (q - p)N_t - (q/m)N_t^2$

• The difference equation can be used to estimate p, q, and m by linear regression, although it is better to fit the exact equation by nonlinear least squares and/or Bayesian methods.

• The catch: to get reliable parameter estimates, it’s best if you are already past the inflection point!
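A minimal sketch of the nonlinear-least-squares approach (added here for illustration; SciPy’s curve_fit stands in for whatever software is actually used, and the data are simulated):

import numpy as np
from scipy.optimize import curve_fit

def bass_cumulative(t, p, q, m):
    # Exact Bass growth-curve formula from the slide above.
    e = np.exp(-(p + q) * t)
    return m * (1.0 - e) / (1.0 + (q / p) * e)

# Simulated cumulative-sales history generated from known parameters plus noise.
rng = np.random.default_rng(0)
t = np.arange(1, 13)                                 # 12 periods of history (past the inflection point)
true_p, true_q, true_m = 0.03, 0.38, 100_000
N_obs = bass_cumulative(t, true_p, true_q, true_m) * (1 + 0.05 * rng.normal(size=t.size))

# Estimate p, q, m by nonlinear least squares on the exact formula.
(p_hat, q_hat, m_hat), _ = curve_fit(
    bass_cumulative, t, N_obs, p0=[0.01, 0.3, 2 * N_obs[-1]], maxfev=10_000
)
print(p_hat, q_hat, m_hat)   # estimates become unreliable if the data stop before the inflection point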

Page 41: Decision 411: Class 11 - Duke University

Forecasting from the Bass model

• Extrapolation of a curve fitted to a few data points is always dangerous!

• It is especially dangerous to try to forecast the inflection point and/or the market potential from very early growth data.

• In practice, it’s best to try to estimate m, p, and q by independent methods (e.g., survey data analysis, expert opinion, analogous data).

• The parameters of choice are often m, p+q, and first-year sales, from which p and q can be backed out.

Page 42: Decision 411: Class 11 - Duke University

Representative values of p and q (Sultan et al. 1990)

Product                     Innovation (p)   Imitation (q)   Inflection point
Electronic fuel injectors   0.120            0.402           2.3
Hybrid corn                 0.039            1.01            3.1
Cellular telephones         0.004            1.76            3.5
4MB DRAM's                  0.050            0.365           4.8
Record players              0.025            0.65            4.8
3.5" floppy drives          0.030            0.540           5.1
Color TV                    0.005            0.84            6.1
McDonalds fast food         0.018            0.54            6.1
All product averages        0.030            0.38            6.2
Steam irons                 0.029            0.33            6.8
B&W TV                      0.028            0.25            7.9
Clothes dryers              0.017            0.36            8.1
Air conditioners            0.010            0.42            8.7
Water softeners             0.018            0.30            8.8
Motels                      0.007            0.36            10.7
Electric blankets           0.006            0.24            15.0

Page 43: Decision 411: Class 11 - Duke University

[Chart: cumulative adoption (0% to 100%) vs. years (0 to 20) for hybrid corn, cellular telephones, record players, color TV, McDonalds fast food, steam irons, B&W TV, clothes dryers, air conditioners, water softeners, motels, and electric blankets]

Page 44: Decision 411: Class 11 - Duke University

Extensions of the Bass model

• Repeat sales (additional parameters for the probability that an adopter will become a “regular customer” and the rate of sales to a regular customer)

• Time-varying parameters (market size, innovation coefficient, etc.)

• Market-mix variables (p, q, and m could be functions of price, advertising, promotions, etc.)

Page 45: Decision 411: Class 11 - Duke University

References

• “New Product Diffusion Models for Marketing: A Review and Directions for Further Research” by Mahajan, Muller, and Bass, Journal of Marketing, v. 54, no. 1, January 1990

• “A Meta-Analysis of Applications of Diffusion Models” by Sultan, Farley, and Lehmann, Journal of Marketing Research, v. 27, no. 1, Feb. 1990

• “Reflections on ‘A Meta-Analysis of Applications of Diffusion Models’” by Sultan, Farley, and Lehmann, Journal of Marketing Research, v. 33, no. 2, May 1996

• “New Product Forecasting” by Jeffrey Morrison, http://pdma.org/visions/jul99/morrison.html (part 1 of 3)

Page 46: Decision 411: Class 11 - Duke University

Review of Everything We’ve Covered

… or “what to use, and when”

Page 47: Decision 411: Class 11 - Duke University

DATA TRANSFORMATIONS

1. Deflation by CPI or another price index
• Properties:
Converts data from nominal dollars (or other currency) to constant dollars; usually helps to stabilize the variance
• When to use:
When data are measured in nominal dollars (or other currency) and you want to explicitly show the effect of inflation, i.e., uncover “real growth”
• Points to keep in mind:
To generate a true forecast for the future in nominal terms, you will need to make an explicit forecast of the future value of the price index, i.e., you will need to forecast the inflation rate (but this is easy if you're in a period of steady inflation)

Page 48: Decision 411: Class 11 - Duke University

2. Natural logarithm
• Properties:
Converts multiplicative patterns to additive patterns and/or linearizes exponential growth; differences of logged data are percentage differences; often stabilizes the variance of data with compound growth, regardless of whether deflation is also used; when the dependent variable is logged, the model attempts to minimize squared percentage error rather than squared error in original units
• When to use:
When compound growth is not due to inflation (e.g., when data are not measured in currency); when you do not need to separate inflation from real growth; when the data distribution is positive and highly skewed (e.g., exponential or log-normal); when variables are multiplicatively related; when you want to estimate “elasticities”
• Points to keep in mind:
Logging is not the same as deflating: it linearizes growth but does not remove a general upward trend; if logged data still have a consistent upward trend, then you should use a model that includes a trend factor (e.g., random walk with drift, ARIMA, linear exponential smoothing)

Page 49: Decision 411: Class 11 - Duke University

3. First difference
• Properties:
Converts “levels” to “changes”
• When to use:
When you need to stationarize a series with a strong trend and/or random-walk behavior (often useful when fitting regression models to time series data, but not required)
• Points to keep in mind:
Differencing is an explicit option in ARIMA modeling and is implicitly a part of random walk and exponential smoothing models; therefore you would not manually difference the input variable (using the DIFF function) when specifying the model type as “random walk”, “exponential smoothing”, or “ARIMA”
The first difference of LOG(Y) is approximately the percentage change in Y for changes on the order of less than 10%
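A quick added numerical check of that last point (Python, illustrative numbers only):

import numpy as np

y = np.array([100.0, 103.0, 99.0, 108.9])   # hypothetical series
pct_change = np.diff(y) / y[:-1]            # exact period-to-period percentage change
diff_log = np.diff(np.log(y))               # first difference of LOG(Y)
print(np.round(pct_change, 4))              # [ 0.03   -0.0388  0.1   ]
print(np.round(diff_log, 4))                # [ 0.0296 -0.0396  0.0953] -- close for small changes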

Page 50: Decision 411: Class 11 - Duke University

4. Seasonal difference
• Properties:
Converts “levels” to “seasonal changes”
• When to use:
When you need to remove the gross features of seasonality from a strongly seasonal series without going to the trouble of estimating seasonal indices
• Points to keep in mind:
Seasonal differencing is an explicit option in ARIMA modeling; you MUST include a seasonal difference (as a modeling option, not an SDIFF transformation of the input variable) if the seasonal pattern is consistent and you wish it to be maintained in long-term forecasts

Page 51: Decision 411: Class 11 - Duke University

5. Seasonal adjustment
• Properties:
Removes a constant seasonal pattern from a series (either multiplicative or additive)
Fancier versions (Census X-12) allow time-varying indices
• When to use:
When you wish to separate out the seasonal component of a series and then fit what's left with a nonseasonal model (regression, smoothing, or trend line); normally use the multiplicative version unless the data have been logged
• Points to keep in mind:
Adds a lot of parameters to the model, one for each season of the year. (In Statgraphics, the seasonal indices are not explicitly shown in the output of the Forecasting procedure; you must separately run the Seasonal Decomposition procedure to display them.)

Page 52: Decision 411: Class 11 - Duke University

FORECASTING MODELS

1. Random walk
• Properties:
Predicts that “next period equals this period” (perhaps plus a constant); a.k.a. the ARIMA(0,1,0) model
• When to use:
As a baseline against which to compare more elaborate models; when applied to logged data, it is a “geometric” random walk, the default model for stock market data
• Points to keep in mind:
A plot of the forecasts looks exactly like a plot of the data, except lagged by one period (and shifted slightly up or down if a drift term is included); long-term forecasts follow a straight line (horizontal if no growth term is included); confidence intervals for long-term forecasts widen according to a square-root law (sideways-parabola shape); logically equivalent to a MEAN model fitted to DIFF(Y)
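An added sketch of the random-walk-with-drift forecast and its square-root-law intervals (assuming approximately normal one-period changes; not from the slides):

import numpy as np

def random_walk_forecast(y, horizon, drift=True, conf_mult=1.96):
    """h-step forecasts and approximate limits for a random walk (with optional drift)."""
    steps = np.diff(y)
    d = steps.mean() if drift else 0.0
    sigma = steps.std(ddof=1)                        # std. dev. of one-period changes
    h = np.arange(1, horizon + 1)
    point = y[-1] + d * h                            # straight-line forecasts
    half_width = conf_mult * sigma * np.sqrt(h)      # intervals widen like sqrt(h)
    return point, point - half_width, point + half_width

# Example with a simulated random walk with drift:
rng = np.random.default_rng(0)
y = 100 + np.cumsum(rng.normal(loc=0.5, scale=2.0, size=120))
fc, lo, hi = random_walk_forecast(y, horizon=12)
print(np.round(hi - lo, 1))                          # interval width grows with the square root of the horizon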

Page 53: Decision 411: Class 11 - Duke University

2. Linear trend
• Properties:
Regression of Y on the time index
• When to use:
Rarely the best model for forecasting; use only when you have few data points, lots of “noise”, and no obvious pattern in the data other than a trend; can be used in conjunction with seasonal adjustment, but if you have enough data to seasonally adjust, you probably should use another model!
• Points to keep in mind:
Forecasts follow a straight line whose slope equals the average trend over the whole estimation period but whose intercept is anchored somewhere in the past; short-term forecasts therefore may miss badly, and confidence intervals for long-term forecasts are usually not reliable; very sensitive to the amount of past data used to fit the trend line; other models that extrapolate a linear trend into the future (random walk with drift, linear exponential smoothing, ARIMA models with 1 difference w/constant or 2 differences w/o constant) often do a better job by “reanchoring” the trend line on recent data

Page 54: Decision 411: Class 11 - Duke University

3. Simple moving average
• Properties:
Simple (equally weighted) average of recent data; the “average age” of the data in the forecast (the amount by which forecasts lag behind turning points) is (k+1)/2 for a k-term moving average
• When to use:
When data are in short supply and/or highly irregular
• Points to keep in mind:
Primitive but relatively robust against outliers and messy data; long-term forecasts are a horizontal line extrapolated from the most recent average
A long-term trend can be incorporated via fixed-rate deflation at an assumed interest rate
There is no theoretical basis for confidence limits for forecasts more than 1 period ahead; SG just shows constant-width limits based on an assumption that the data-generating process is stationary (no trend, etc.)

Page 55: Decision 411: Class 11 - Duke University

4. Simple exponential smoothing
• Properties:
Exponentially weighted moving average of recent data; the “average age” of the data in the forecast (the amount by which forecasts lag behind turning points) is 1/alpha; same as an ARIMA(0,1,1) model without constant
• When to use:
When data are nonseasonal (or deseasonalized) and display a time-varying mean without a consistent trend; when many series must be forecast in parallel
• Points to keep in mind:
Long-term forecasts are a horizontal line extrapolated from the most recent smoothed value; same as a random walk model without drift if alpha=0.9999; forecasts get smoother and slower to respond to turning points as alpha approaches zero; confidence intervals widen less rapidly than in the random walk model; a long-term trend can be incorporated via fixed-rate deflation at an assumed interest rate or by fitting an ARIMA(0,1,1) model with constant
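An added sketch illustrating the stated ARIMA(0,1,1) equivalence with statsmodels (simulated data; in statsmodels' parameterization the MA(1) coefficient is roughly alpha - 1):

import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

rng = np.random.default_rng(0)
y = 50 + np.cumsum(rng.normal(scale=1.0, size=150))    # series with a time-varying mean

ses = SimpleExpSmoothing(y, initialization_method="estimated").fit()
arima = sm.tsa.SARIMAX(y, order=(0, 1, 1)).fit(disp=False)

# Both produce flat long-term forecasts from an exponentially weighted average of the data.
print(ses.params["smoothing_level"], 1 + arima.params[0])   # alpha vs. 1 + theta (should be close)
print(ses.forecast(5))
print(arima.forecast(5))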

Page 56: Decision 411: Class 11 - Duke University

5. Linear exponential smoothing
• Properties:
Assumes a time-varying linear trend as well as a time-varying level (Brown's uses 1 parameter, Holt's uses separate smoothing parameters for level and trend); essentially an ARIMA(0,2,2) model without constant; “damped trend” versions are also available
• When to use:
When data are nonseasonal (or deseasonalized) and display time-varying local trends (usually applicable to data that are “smoother” in appearance, i.e., less noisy, than what would be well fitted by simple exponential smoothing)
• Points to keep in mind:
Long-term forecasts follow a straight line whose slope is the estimated local trend at the end of the series; confidence intervals for long-term forecasts widen rapidly, since the model assumes that the future is VERY uncertain because of time-varying trends; often does not outperform simple exponential smoothing, even for data with trends, because extrapolation of time-varying trends is risky

Page 57: Decision 411: Class 11 - Duke University

6. Winters' seasonal smoothing
• Properties:
Assumes a time-varying level, trend, and seasonal indices (either multiplicative or additive seasonality)
• When to use:
When data are trended and seasonal and you wish to decompose them into local level/trend/seasonal factors; normally you use the multiplicative version unless the data are logged
• Points to keep in mind:
Initialization of the seasonal indices and joint estimation of three smoothing parameters are sometimes tricky; watch to see that the parameter estimates converge and that the forecasts and confidence intervals look reasonable
A popular choice for “automatic” forecasting because it does a little of everything, but it has a lot of parameters and sometimes overfits the data or is unstable

Page 58: Decision 411: Class 11 - Duke University

7. Multiple regression
• Properties:
A general linear forecasting equation involving several variables
• When to use:
When data are correlated with other explanatory or causal variables (e.g., price, advertising, promotions, interest rates, indicators of general economic activity, etc.)
When your objective is perhaps not only to forecast, but to measure the impact of various factors on the dependent variable for purposes of decision making or hypothesis testing (e.g., for determining optimal values of control variables or doing bang-for-buck comparisons)
The key is to choose the right variables and the right transformations of those variables to justify the assumption of a linear model. Useful transformations may include logging, deflating, lagging, differencing, seasonal adjustment, taking reciprocals or ratios, etc., but don't try them blindly. Transformations should have an intuitive explanation and/or be strongly suggested by patterns in the data.

Page 59: Decision 411: Class 11 - Duke University

Regression, continued:
• Points to keep in mind:
Forecasts cannot be extrapolated into the future unless and until values are available for the independent variables; for this reason the independent variables must often be lagged by one or more periods; but when only lagged variables are used, a regression model may fail to outperform a time series model that relies only on the history of the dependent variable
R-squared is not the bottom line: regressions of nonstationary variables often have high “R-squared”, but this does not necessarily indicate a good model!
The standard error of the regression (RMSE) is the best single stat to focus on, although it can only be trusted if the model passes the various diagnostic tests of its assumptions
Beware of over-fitting the data by including too many regressors for the amount of data at hand. (Think: how many data points per coefficient?) As a reality check, it is good practice to “validate” the model by testing it on hold-out data and by comparing its performance to a random walk model or other time series model

Page 60: Decision 411: Class 11 - Duke University

Regression, continued:
When fitting regression models to time series data, it often helps to include lags of the dependent and independent variables as additional regressors, and/or to stationarize the dependent variable. (Suggestion: try 1 or 2 lags first; don't rush to difference the variables. Beware of “overdifferencing” or including high-order lags (>2) without good reason.)
Including a time index variable as a regressor is equivalent to de-trending all the variables before fitting the model, i.e., a trend-stationary model.
Corrections for autocorrelated errors (Cochrane-Orcutt or ARIMA+regressors) are other options for time series models.
“Automatic” model selection techniques such as stepwise regression and all-possible-regressions are available, but beware of overfitting: pre-screen the variables for relevance and rank models on the basis of an error measure that penalizes complexity (e.g., Cp or BIC). Also, remember that YOU, not the computer, are ultimately responsible for the model!

Page 61: Decision 411: Class 11 - Duke University

8. ARIMA
• Properties:
A general class of models that includes random walk, random trend, seasonal and non-seasonal exponential smoothing, and autoregressive models
Forecasts for the stationarized dependent variable are a linear function of lags of the dependent variable and/or lags of the errors
• When to use:
When data are relatively plentiful (4 seasons or more) and can be satisfactorily stationarized by differencing and other mathematical transformations
When it is not necessary to explicitly separate out the seasonal component (if any) in the form of seasonal indices

Page 62: Decision 411: Class 11 - Duke University

ARIMA, continued
• Points to keep in mind:
ARIMA models are designed to explain all of the autocorrelation in the original time series; a systematic procedure exists for identifying the best ARIMA model for any given time series; features of ARIMA and multiple regression models can be combined in a natural way; ARIMA models often provide a good fit to highly aggregated, well-behaved data; they may perform relatively less well on disaggregated, irregular, and/or sparse data
Regressors can be added to ARIMA models. The resulting model is really a regression model with an ARIMA error process instead of independent errors. This is often useful as a proxy for the effects of other, unmodeled variables.
The simplest ARIMA+regressor model is the ARIMA(1,0,0) model plus regressors, also known as a regression with AR(1) errors. This model can also be fitted (in theory!) using the Cochrane-Orcutt transformation option in regression.

Page 64: Decision 411: Class 11 - Duke University

Reporting of forecasts

Forecasts usually should be accompanied by plots (showing how the forecasts extend from the recent data) and credible confidence intervals.

Confidence intervals need not always be 95% (± 2 standard errors): sometimes a 50% interval (± 2/3 standard error) or an 80% interval (± 1.3 standard errors) will be more meaningful for decision making.
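The interval multipliers quoted above come from the normal distribution; a quick added check with SciPy:

from scipy.stats import norm

for coverage in (0.50, 0.80, 0.95):
    mult = norm.ppf(0.5 + coverage / 2)     # half-width in standard errors
    print(f"{coverage:.0%} interval: +/- {mult:.2f} standard errors")
# 50% -> +/- 0.67 (about 2/3), 80% -> +/- 1.28 (about 1.3), 95% -> +/- 1.96 (about 2)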

Page 65: Decision 411: Class 11 - Duke University

A final word

Remember that all models are based on assumptions about the patterns in the data: how much past data is relevant, what is the nature of the trend or volatility, what are the key “drivers”, etc.

What assumptions are you comfortable with, based on your knowledge of the data and the results of your analysis? What assumptions would make sense to your boss or client?