sta2604_2012_-_studyguide_-001_2012_4_b

8/9/2019 sta2604_2012_-_studyguide_-001_2012_4_b

1/127

STA2604/1

Department of Statistics

STA2604

Forecasting

Study guide for STA2604

Table of contents

UNIT 1: An Introduction to Forecasting
1.1 Introduction
1.1.1 Forecasting
1.1.2 Data
1.1.3 Components of a time series
1.1.4 Applications of forecasting
1.2 Forecasting methods
1.2.1 Qualitative methods
1.2.2 Quantitative methods
1.3 Errors in forecasting and forecast accuracy
1.3.1 Absolute deviation
1.3.2 Mean absolute deviation
1.3.3 Squared error
1.3.4 Mean squared error
1.3.5 Absolute percentage error (APE)
1.3.6 Mean absolute percentage error (MAPE)
1.3.7 Forecasting accuracy
1.4 Choosing a forecasting technique
1.4.1 Factors to consider
1.4.2 Strike the balance
1.5 An overview of quantitative forecasting techniques
1.6 Conclusion

UNIT 2: Model Building and Residual Analysis
2.1 Introduction
2.2 Multicollinearity
2.2.1 Clarification of multicollinearity
2.2.2 The variation inflation factor (VIF)
2.2.3 Comparing regression models
2.3 Basic residual analysis
2.3.1 Residual plots
2.3.2 Constant variation assumption
2.3.3 Correct functional form assumption
2.3.4 Normality assumption
2.3.5 Independence assumption
2.3.6 Remedy for violations of assumptions
2.4 Outliers and influential observations
2.4.1 Leverage values
2.4.2 Residuals
2.4.3 Studentised residuals
2.4.3.1 Deleted residuals
2.4.4 Cook’s distance
2.4.5 Dealing with outliers and influential observations
2.5 Conclusion

UNIT 3: Time Series Regression
3.1 Introduction
3.2 Modeling trend by using polynomial functions
3.2.1 No trend
3.2.2 Linear trend
3.2.3 Quadratic and higher order polynomial trend
3.3 Detecting autocorrelation
3.3.1 Residual plot inspection
3.3.2 First-order autocorrelation
3.3.2.1 Durbin-Watson test for positive autocorrelation
3.3.2.2 Durbin-Watson test for negative autocorrelation
3.3.2.3 Durbin-Watson test for autocorrelation
3.4 Seasonal variation types
3.4.1 Constant and increasing seasonal variation
3.5 Use of dummy variables and trigonometric function
3.5.1 Time series with constant seasonal variation
3.5.2 Use of dummy variables
3.5.3 High season and low season
3.5.4 Use of trigonometric on a model with a linear trend
3.6 Growth curve models
3.7 AR(1) and AR(p)
3.8 Use of trend and seasonality and forecast development
3.9 Conclusion

UNIT 4: Decomposition of a Time Series
4.1 Introduction
4.2 Multiplicative decomposition
4.2.1 Trend analysis
4.2.2 Seasonal analysis
4.2.3 Analysis of random variations in a time series
4.2.4 Obtaining a forecast
4.3 Additive decomposition
4.5 Conclusion

UNIT 5: Exponential Smoothing
5.1 Introduction
5.2 Simple exponential smoothing
5.3 Tracking signals
5.4 Holt’s trend corrected exponential smoothing
5.5 Holt-Winters methods
5.5.1 Additive Holt-Winters method
5.5.2 Multiplicative Holt-Winters method
5.6 Damped trend exponential
5.7 Conclusion

ABOUT THIS MODULE

Prologue

Forecasting is the process of making statements about events whose actual outcomes (typically)

have not yet been observed. A commonplace example might be estimation of the expected value for

some variable of interest at some specified future date. Prediction is a similar, but more general, term.

Both might refer to formal statistical methods employing time series, cross sectional or longitudinal

data, or alternatively to less formal judgemental methods. More will be seen at various parts of the

presentation of the module.

The module is about Forecasting, which deals with the methods used to predict the future, i.e. to

forecast. Can you think of a situation where predictions of the future are needed or cases where

forecasting is done? By its nature it is a quantitative method that uses numeric data. There are

various forecasting methods, some of them being qualitative because they are based on non-numeric

data. Even though qualitative methods feature in some of our discussions, they are not dealt with in

depth in this module.

This module presents fundamental aspects of Time Series analysis used in forecasting. The

prescribed textbook for this module is Bowerman, O’Connell and Koehler (2005). We will not study

all the chapters in the book for this module, but will focus on Chapters 1, 5, 6, 7 and 8.

The module is done in one semester. Make sure that you are registered for the right semester and

the material you receive is the correct one.

About the book

The prescribed book is reader-friendly and contains limited mathematical theory. It is geared towards

the practice of forecasting. The authors are experienced practitioners in the field of time series. The

book will assist you in understanding concepts and methodology, and in applying these in practice

(i.e. in real-life situations).

The computer and the calculator

We recommend that you acquire a non-programmable scientific calculator of your own. It is

imperative to have your own calculator in the examination. It is important, although not compulsory,

to have access to a computer in order to undertake the tasks in this module. You may visit a Regional

Centre to use a computer. The text contains output from Excel, MINITAB, JMP IN and SAS. However,

we encourage the use of any software to which you may have access. The above list of computer

software/packages may be used, as well as R, SPSS, Stata, S-Plus and EViews. Your ability to use

such software will increase your marketability in the workplace. You are encouraged to experiment

with the packages at your disposal.

REFERENCES

The prescribed book must be purchased. Refer to the study guide regularly. We shall also refer to a

number of user-friendly textbooks on Time Series that are available in the Unisa library. You do not

need to buy the recommended books for this module.

PRESCRIBED BOOK

Bowerman, B. L., O’Connell, R. T. & Koehler, A. B. (2005). Forecasting, time series and regression:

an applied approach, 4th edition. Singapore: Thomson Brooks/Cole.

ADDITIONAL USEFUL BOOKS FOR THIS MODULE

Crosby, J. V. (2000). Cycles, trends, and turning points: practical marketing and sales forecasting

techniques. Lincolnwood, IL: NTC Business Books.

Chapter 4 of this book deals specifically with Time Series, while chapters 1, 2, 3, 7, 10 and 20 deal

with other topics that are very relevant in this module. The remaining chapters illustrate applications

that may expose you even more to time series. It is useful.

Curwin, J. & Slater, R. (2002). Quantitative methods for business decisions (Chapter 14). London:

Thomson Learning.

This book also presents measures that we use in statistics and in time series applications. It can be

used for other modules as well. Find time to read it.

Dexter, B. (1996). Business mathematics (Chapter 15). London: Macdonald and Evans.

Only chapter 15 presents Time Series, and in not more than 12 pages. “Production planning and

forecasting” are presented in Chapter 4 of this book to expose you to real-life applications. I seriously

advise you to look at these two chapters.

Hair, J. F., Anderson, R. E., Tatham, R. L. & Black, W. C. (1998). Multivariate data analysis, 5th

edition. Prentice-Hall, Inc.

Appendix 4A of this book presents some distance measures that are useful in this module. Cook’s

distance is presented on pages 225 and 234 of this appendix. You are urged to read them. This

book is very useful in exposing various applications of multivariate statistics. Read and enjoy it.

Kendall, M. G. (1990). Time series, 3rd edition. London: Edward Arnold.

Simply the best! Kendall exposes us to time series. His is one of the greatest names remembered

when Time Series are mentioned. Even his previous editions still present good information about the

topic. Why not cash in on time series from the horse’s mouth!

THE PRESENTATION OF THE MODULE

This study guide summarises the five prescribed chapters of the textbook.

Prior knowledge

It is important that you are familiar with a section before moving to the next one. This will serve

as a foundation for the forthcoming work. Leaving out work without understanding it can only add

to the accumulation of problems during the examination. This is also true about the prerequisites

from first-year statistics and the knowledge you have acquired through the years. Sensible or smart

application is based on the use of the accumulated techniques, experiences and knowledge. Plotting

of graphs, fitting a linear model, and so on, are needed in some places. You are urged, therefore,

to incorporate all the useful techniques in the solutions to exercises. We advise you to revisit these

topics in your first-year module.

It is necessary to realise that numbers alone do not provide all the answers. It should be clear to

you that aspects of a qualitative nature add value to the predictions made so that the data context is

clear.

This study guide

In this study guide we attempt to present explanations of the concepts in the textbook. It contains

easy examples as well as activities for you to practise. You are encouraged to do the activities

in order to learn effectively. Reading of feedback alone leaves gaps in your learning. There are

discussions following the activities so that the feedback is immediate. Do not just read through them;

try to explore them by testing that you can do them as well, even if you use alternative methods.

The exercises selected for assignments are important in reinforcing what you need to understand in

this module. Take time to understand the aspects that go with them. Analyse the postulates in the

given statements and thereafter the requirements so that it becomes easy to recall what is necessary

in compiling a solution. In that way you do not only solve the problem, you understand it and enjoy

solving it. At the end of the semester there is a two-hour closed-book examination. The discussions

in the study guide and the textbook prepare you for that examination.

This study guide is prepared to guide you through the prescribed book. Therefore, we will always

use it together with the prescribed book. Read them together. The textbook presents the concepts;

the study guide attempts to bring the concepts closer to you.

Each study unit starts with the outcomes in order to show you what you need to know and to evaluate

yourself. The table of outcomes also gives each outcome together with the way the outcome will

be assessed, the content needed for that outcome, the activities that will be used to support the

understanding of the content and the way feedback will be given. Your input in the form of positive

criticism to improve the presentation will be of importance in the review of this study guide. You are

therefore encouraged to suggest ways that you believe can improve the presentation of this module.

Module position in the curriculum

We have been offering a postgraduate module on Time Series at Unisa, but have become aware of

the need to introduce the module at undergraduate level due to its necessity in the workplace and in

order to fill the gap that is evident when students attempt the postgraduate time series module.

This module is part of the whole Statistics curriculum at Unisa. Its position on the curriculum structure

is as follows:

1st year: STA1501 STA1502 STA1503

2nd year: STA2601 STA2602 STA2603 STA2604 (FORECASTING: we are here) STA2610

3rd year: STA3701 STA3702 STA3703 STA3704 STA3705 STA3710

You should already be familiar with some of the modules mentioned above. Knowledge from

STA2604 will help you in STA3704 (Forecasting III).

ASSIGNMENTS

There are two assignments for this module, which are intended to help you learn through various

activities. They also serve as tests to prepare you for the examination. As you do the assignments,

study the reading texts, consult other resources, discuss the work with fellow students or tutors or

do research, you are actively engaged in learning. Looking at the assessment criteria given for

each assignment will help you to understand better what is required of you. The two assignments

per semester prescribed for this module form part of the learning process. The typical assignment

question is a reflection of a typical examination question. There are fixed submission dates for the

assignments and each assignment is based on specific chapters (or sections) in the prescribed book.

You have to adhere to these dates as assignments are only marked if they are received on or before

the due dates.

Both assignments are compulsory as

• they are the sole contributors towards your year mark and

• they form an integral part of the learning process and indicate the form and nature of the

questions you can expect in the examination.

Please note that the submission of assignment 01 is the guarantee for examination entry. If you

do not submit assignment 01, UNISA, not the Department of Statistics, will deny you examination

entry.

You are urged to communicate with your lecturer(s) whenever you encounter difficulties in this

module. Do not wait until the assignment due date or the examination to make contact with lecturers.

It is helpful to be ready long in advance. You are also encouraged to work with your own peers,

colleagues, friends, etc. Details about the assignments will be given in Tutorial letter 101.

Time series has its own useful terminology that should be understood. In order to familiarise yourself

with it, let us start with an easy activity. Activities help in the creation of a mind map of the module.

The more you attempt these activities, the better you will understand the work.

GLOSSARY OF TERMS

ACTIVITY 0.1

(a) Make a list of all the concepts that are printed in bold type in Chapters 1, 5, 6, 7 and 8 of the

prescribed book. They serve as your glossary.

(b) Attempt meanings of these concepts before you deal with the various sections so that you have

an idea before we get there.

DISCUSSION OF ACTIVITY 0.1

(a) There is a missing concept/term among the ones you listed, which is absolutely fundamental. It

appears with other terms or phrases. The term is “data”. You came across the term many times

when you studied other modules and in some other contexts. It is emphasised that it is a useful

aspect in forecasting. If you do not have data, you will not be able to make forecasts.

(b) Do not worry if the meanings you gave do not match the content in the tutorial letter or textbook.

The intention was to make you aware of aspects on which to focus in your learning. What is

required from you is a step-by-step journey through the prescribed material.

ACTIVITY 0.2

What is the meaning of the word data?


There is a general misconception that data and information are the same concepts. This is not

necessarily the case. Data are records of occurrences from which we obtain information. They are not

necessarily information on their own, but may sometimes be information. The truth is, data possess

information that is seen after some analysis. They are often the raw answers we receive from an

investigation.

WHAT TO EXPECT IN THE MODULE

In this module we use a scientific calculator to perform calculations. We will also draw graphs, form

mathematical models (equations) that are used to develop forecasts and make decisions based on

time series data. Most of these aspects stated were taught at first-year level. The new topic is the

pattern of time series data. The way time series data appear is unique because without this form

they cannot qualify to be time series data.
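
To illustrate how a mathematical model (equation) can be used to develop a forecast, the following minimal Python sketch fits a straight-line trend y_t = b0 + b1*t by least squares (simple linear regression from first-year statistics) and extends the line one period ahead. The sales figures and the function names fit_linear_trend and forecast are hypothetical and chosen only for illustration; this is not a method taken from the prescribed book, and Python is just one of the many packages you may use.

```python
def fit_linear_trend(y):
    """Least-squares fit of y_t = b0 + b1 * t for t = 1, ..., n."""
    n = len(y)
    t = list(range(1, n + 1))          # time index 1, 2, ..., n
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    # Sums of cross-products and squares about the means
    s_ty = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    s_tt = sum((ti - t_mean) ** 2 for ti in t)
    b1 = s_ty / s_tt                   # slope: change per time period
    b0 = y_mean - b1 * t_mean          # intercept
    return b0, b1


def forecast(b0, b1, t):
    """Point forecast from the fitted trend line at time t."""
    return b0 + b1 * t


# Hypothetical sales observed in five consecutive periods (assumed data)
sales = [12.0, 14.0, 16.0, 18.0, 20.0]
b0, b1 = fit_linear_trend(sales)
next_value = forecast(b0, b1, len(sales) + 1)
print(b0, b1, next_value)  # 10.0 2.0 22.0 for this perfectly linear series
```

Note that the observations are listed in time order; the order itself carries information, which is what distinguishes time series data from other data.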

PREREQUISITES

• The ability to use a scientific calculator.

• Access to a computer package and the ability to use it are highly recommended.

• First-year statistics. These topics appear below and there will be a quick reminder whenever we

need them. We will need

- Simple linear regression

- Correlation measures

- Polynomials

- Graph plotting

When you draw plots required for statistical analysis, these plots should be accurate. Hence, use

a ruler and a lead pencil (not a pen) to construct plots. If you have access to a computer, you are

also encouraged to practise using any statistical package of your choice. Assignments may also be

prepared by means of a computer. Just make sure that you use the correct notation. Avoid using a

computer if you cannot write the correct notation. Remember that you are always welcome to contact

the lecturers whenever you have problems with any aspect of the module.

OUTCOMES

At the end of the module you should be able to do the following:

• Define and apply components of time series.

• Apply time series methods to develop forecasts.

• Specify a prototype forecast model, estimate its parameters and then validate it.

• Use the specified model to derive forecasts.

TABLE OF OUTCOMES

Outcomes (at the end of the module you should be able to) | Assessment | Content | Activities | Feedback

explain and expose time series components | analyse data; plot graphs | trend; seasonality; cycles; irregularity | examine data visually; plot graphs | discuss likely errors

select a model | balance factors | choosing a technique | analyse errors; plot graphs | scrutinise models

develop a model | forming an equation | regression; exponential smoothing | small build-up exercises | emphasise aptness

estimate parameters | perform estimations | estimation methods | perform calculations | discuss alternatives

validate a model | statistical tests | hypothesis testing | test hypotheses | peruse the various tests

develop forecasts | demonstrate patterns | model building | form equations | visit various alternatives

You will know that you understand this module once you understand the above issues.

Feedback is not just a follow-up of the preceding concepts. It is an opportunity to reinforce some

concepts and revise others. Make use of this opportunity. Feedback is given after every activity,

sometimes with some discussion after the activity, but in many instances, it follows immediately after

the activity.

OVERVIEW

Two of the five study units comprising this module are presented in this study guide.

Unit 1: Narration of the forecasting domain and support elements

(Chapter 1 of Bowerman et al.)

In this unit we will learn more about

• Situations requiring forecasts and forecasting

• Issues about useful data and use of data in developing forecasts

• Basic types of data and approaches (quantitative and qualitative methods)

• Errors, problems and pitfalls in forecasting, as well as depiction of good forecasts

• Factors useful in choosing a forecast technique

• More about quantitative methods

Do the above issues raise some response from you? Do you have any idea of what they mean or

imply? Think and chat with your colleagues, peers or family members. Remember that learning

becomes real and effective only when sharing is involved.

Unit 2: Building a forecast model and examining / verifying its strength

(Chapter 5 of Bowerman et al.)

In this study unit we will learn about

• Multicollinearity of variables:

- Variance inflation factors

- R²

- adjusted R²

- standard error

- interval length

- C-statistic

• Residual analysis:

- residual plots

- the constant variance assumption

- assumption of correct functional form

- normality assumption

- the independence assumption

• Outliers and influential observations:

- outliers

- influential data

- diagnostic methods to detect outliers and influential observations

- leverage points

- residuals

- Cook’s distance measure

The measures dealt with in this unit ensure that the model built for use in forecasting has desirable

properties of limited error and is influenced as little as possible, if at all. Also, it is

necessary to make a distinction between outliers and seasonal variations. Sometimes a mistake is

made with an effect of seasonality being misinterpreted as an outlier.

We hope you have come across some of the concepts or issues above. Discuss these with your

colleagues, peers, friends or family members.

DIFFICULTIES IN FORECASTING TECHNOLOGY

Nearly all futurists describe the past as unchangeable, consisting of a collection of knowable facts.

We generally perceive the existence of only one past. When two people give conflicting stories of

the past, we tend to believe that one of them must be lying or mistaken.

This widely accepted view of the past might not be correct. Historians often interject their own beliefs

and biases when they write about the past. Facts become distorted and altered over time. It may

be that the past is a reflection of our current conceptual reference. In the most extreme viewpoint, the

concept of time itself comes into question.

The future, on the other hand, is filled with uncertainty. Facts give way to opinions. The facts of the

past provide the raw materials from which the mind makes estimates of the future. All forecasts are

opinions of the future (some more carefully formulated than others). The act of making a forecast is

the expression of an opinion. The future consists of a range of possible future phenomena or events.

DEFINING A USEFUL FORECAST

The usefulness of a forecast is not something that lends itself readily to quantification along any

specific dimension (such as accuracy). It involves complex relationships between many things,

including the type of information being forecast, our confidence in the accuracy of the forecast, the

magnitude of our dissatisfaction with the forecast, and the versatility of ways that we can adapt to or

modify the forecast. In other words, the usefulness of a forecast is an application-sensitive construct.

Each forecasting situation must be evaluated individually regarding its usefulness.

One of the first rules is to consider how the forecast results will be used. It is important to consider

who the readers of the final report will be during the initial planning stages of a project. It is wasteful

to apply resources on an analysis that has little or no use. The same rule applies to forecasting. We

must strive to develop forecasts that are of maximum usefulness to planners. This means that each

situation must be evaluated individually as to the methodology and type of forecasts that are most

appropriate to the particular application.

FORECASTS CREATE THE FUTURE

Often the way we contemplate the future is an expression of our desire to create that future.

Arguments are that the future is invented, not predicted. The implication is that the future is an

expression of our present thoughts. The idea that we create our own reality is not a new concept. It

is easy to imagine how thoughts might translate into actions that affect the future.

Forecasting can, and often does, contribute to the creation of the future, but it is clear that other

factors are also operating. A holographic theory would stress the interconnectedness of all elements

in the system. At some level, everything contributes to the creation of the future. The degree to

which a forecast can shape the future (or our perception of the future) has yet to be determined

experimentally and experientially.

Sometimes forecasts become part of a creative process, and sometimes they do not. When two

people make mutually exclusive forecasts, both of them cannot be true. At least one forecast is

wrong. Does one person’s forecast create the future, and the other does not? The mechanisms

involved in the construction of the future are not well understood on an individual or social level.

ETHICS IN FORECASTING

Are predictions of the future a form of propaganda, designed to evoke a particular set of behaviours?

Note that the desire for control is implicit in all forecasts. Decisions made today are based on

forecasts, which may or may not come to pass. The forecast is a way to control today’s decisions.

The purpose of forecasting is to control the present. In fact, one of the assumptions of forecasting

is that the forecasts will be used by policy-makers to make decisions. It is therefore important to

discuss the ethics of forecasting. Since forecasts can and often do take on a creative role, no one

has the absolute right to make forecasts that involve other people’s futures.

Nearly everyone would agree that we have the right to create our own future. Goal setting is a form

of personal forecasting. It is one way to organize and invent our personal future. Each person has

the right to create their own future. On the other hand, a social forecast might alter the course of an

entire society. Such power can only be accompanied by equivalent responsibility.

There are no clear rules involving the ethics of forecasting. Value impact is important in forecasting,

i.e. the idea that social forecasting must involve physical, cultural and societal values. However,

forecasters cannot leave their own personal biases out of the forecasting process. Even the most

mathematically rigorous techniques involve judgmental inputs that can dramatically alter the forecast.

Many futurists have pointed out our obligation to create socially desirable futures. Unfortunately, a

socially desirable future for one person might be another person’s nightmare. For example, modern

ecological theory says that we should think of our planet in terms of sustainable futures. The finite

supply of natural resources forces us to reconsider the desirability of unlimited growth. An optimistic

forecast is that we achieve and maintain an ecologically balanced future. That same forecast, the

idea of zero growth, is a catastrophic nightmare for the corporate and financial institutions of the free

world. The system of profit depends on continual growth for the well-being of individuals, groups,

and institutions.

‘Desirable futures’ is a subjective concept. It can only be understood relative to other information.

The ethics of forecasting certainly involves the obligation to create desirable futures for the person(s)

that might be affected by the forecast. If a goal of forecasting is to create desirable futures, then the

forecaster must ask the ethical question of “desirable for whom?”

To embrace the idea of liberty is to recognise that each person has the right to create their own

future. Forecasters can promote libertarian beliefs by empowering people that might be affected by

the forecast. Involving these people in the forecasting process gives them the power to become

co-creators in their futures.

BENEFITS OF FORECASTING

Forecasting can help you make the right decisions, and earn/save money. Here are a few examples.

• Define better sales strategies

If a product is declining, maybe it is a good idea to consider stopping its production. But maybe not:

maybe it is just your sales that are declining, but not your competitors’?

In this case, is there a chance that you can get your market share back?

Forecasting techniques provide answers to these questions – vital questions to your business.

• Size your inventories optimally

Time is money. Room is money. So what you want to do is use all means at your disposal in order

to reduce your stocks – without experiencing any shortages, of course.

How? By forecasting!

Forecasting is designed to help decision making and planning in the present. Forecasts empower

people because their use implies that we can modify variables now to alter (or be prepared for)

the future. A prediction is an invitation to introduce change into a system. There are several

assumptions about forecasting:

• There is no way to state what the future will be with complete certainty. Regardless of the

methods that we use there will always be an element of uncertainty until the forecast horizon

has come to pass.

• There will always be blind spots in forecasts. We cannot, for example, forecast completely new

technologies for which there are no existing paradigms.

• Providing forecasts to policy-makers will help them formulate social policy. The new social

policy, in turn, will affect the future, thus changing the accuracy of the forecast.

STUDY UNIT 1: An Introduction to Forecasting

1.1 Introduction

Table of outcomes for the study unit

Outcomes (at the end of the module you should be able to) | Assessment | Content | Activities | Feedback

define time series terms | data plots and measures | time series word list | experiment with data | discuss each activity

decompose time series | graph, visual | time series components | plot graphs | critique the graphs

calculate time series measures | stepwise exercises | errors in forecasting | various calculations |

If you understand the above outcomes, it will be an indication that you understand this study unit. It

is based on Chapter 1 of the prescribed book.

Forecasting is the scientific process of estimating some aspects of the future in usually unknown

situations. Prediction is a similar, but more general, term. Both can refer to estimation of time

series, cross-sectional or longitudinal data. Usage can differ between areas of application: for

example in hydrology, the terms "forecast" and "forecasting" are sometimes reserved for estimates of

values at certain specific future times, while the term "prediction" is used for more general estimates,

such as the number of times floods will occur over a long period. It is essential to note that,

in this module, forecasting is understood to be scientific. This is to ensure

that we do not consider subjective predictions and spiritual prophecies as part of our scope for this

forecasting module. Risk and uncertainty are central to forecasting and prediction. Forecasting

is used in the practice of Customer Demand Planning in everyday business forecasting for

manufacturing companies. The discipline of demand planning, also sometimes referred to as supply

chain forecasting, embraces both statistical forecasting and a consensus process. Forecasting is

commonly used in discussion of time-series data. In this module the terms are fairly straightforward

from the prescribed book.

Forecasting has application in many situations:

• Supply chain management - Forecasting can be used in Supply Chain Management to make sure

that the right product is at the right place at the right time. Accurate forecasting will help retailers

reduce excess inventory and therefore increase profit margin. Accurate forecasting will also help

them meet consumer demand.

• Weather forecasting, Flood forecasting, and Meteorology

• Transport planning and Transport forecasting

• Economic forecasting

• Egain forecasting

• Technology forecasting

• Earthquake forecasting

• Land use forecasting

• Product forecasting

• Player and team performance in sports

• Telecommunications forecasting

• Political forecasting

• Sales forecasting

ACTIVITY 1.1

Consider the terms “forecasting”, “cross-sectional data” and “time series”, which are the main focus

of this study unit.

(a) Attempt to define these terms.

(b) Check the definitions in the book and compare your answers in (a).

Before we discuss the above activity, start by reading slowly through the following discussion. Make

sure you follow the discussion.

1.1.1 Forecasting

Study section 1.1 on page 2 up to the second bullet on page 3.

The few people with whom we discussed the term “forecasting” seemed to have an understanding

of the concept only “in a nutshell”. Many of them made reference to the weather forecast that

was presented on radio, television and the internet. A gap existed in the main understanding of

forecasting.

Various accounts show that, at every point in history, people have always been

interested in the future. There are stories from history that inform us that when people dreamed,

there were experts to explain the meanings of these dreams in terms of the future. When signs of

future drought arose, the implications of the drought were noted and plans were made to offset the

impacts that were anticipated. Drought led to hunger. Thus, when a drought was predicted,

preparations were made so that there would be enough

food for every member of the community for the duration of the drought. Predicting the future

even as it was done during those days can be referred to as forecasting. The predicted future was

then used to plan for the future as explained above.

Modern practice has encouraged that the "anticipation of the future" practice be conceptualised.

It was then formally termed “forecasting”. The current approaches are scientific in order to ensure

that forecasting is practised systematically. The predictions made are now called forecasts. In other

terms, forecasts are future expectations based on scientific guidelines.


The first term we listed in Activity 1.1 was “forecasting”. Did you get that? Forecasting is a

“natural” operation. We have always done it, sometimes unconsciously. As was explained, predicting

activities has always been practised, even in ancient times. For self-evaluation in terms of the time

series concept, did you define the term forecasting in line with “predicting the future”?

Forecasting indicates more or less what to expect in the future. Once the future is known, preparation

for equitable allocation of resources can be made. Wastages can thus be reduced or eliminated and

gains can be enhanced (or increased).

FURTHER DISCUSSION ON FORECASTING

Forecasting is applied in various real-life situations. Six examples of applications are listed on pages

2 and 3 of the prescribed book. We are close to them at different levels. But what about something

that we as students of the University of South Africa can appreciate?

The number of student enrolments at Unisa is the starting point. The trend pattern will give an

indication of whether there has been a decline or growth in the student numbers over the years. If

you are observant, you will realise that there has been an increase in student numbers over the past

few years. Our “forecast” for next year (2013) is that there will be more students than in 2012.

ACTIVITY 1.2

Weather forecasting was mentioned as a known example where forecasting is used abundantly.

There are many others.

(a) Provide an easy example of a situation where forecasting is needed.

(b) Attempt to explain the details of the example you provided in (a).


We discussed the Unisa example. If you are interested in Southern African politics and elections you

will be interested in making predictions about political parties that are going to be in the forefront in

the next election. We might anticipate extreme growth of one party (MDC) and decline of others in

Zimbabwe, based on the trends in the previous elections and developments that prevail. Therefore,

(a) one can for example predict how the political parties will perform in the next election; and

(b) recent performance of the various parties in previous elections may be revisited and analysed,

the current activities of the parties may be analysed closely and one may interact with people to

determine their impressions about various parties.

N.B.: Here we assume normal election conditions where no intimidation or harassment takes place.

1.1.2 Data

For this topic you need to study from the middle paragraph of page 3 to the end of page 4.

Data are important for forecasting. Quality data, which loosely refer to reliable and valid data, are the

ones needed for forecasting. We may be misled if we use data of poor quality because results are

likely to be poor as well, even if best methods are used by a proficient analyst. The term data refers

to groups of information that represent the qualitative or quantitative attributes of a variable or set of

variables. Data (plural of "datum", which is seldom used) are typically the results of measurements

and can be the basis of graphs, images, or observations of a set of variables. Data are often viewed

as the lowest level of abstraction from which information and knowledge are derived. Raw data are

unprocessed collections of numbers, characters, images or other outputs from devices that

convert physical quantities into symbols.

Without data there will not be forecasting. However, it is important that data be correct (reliable, valid,

realistic, etc). Data need to be both valid for the exercise and reliable. If one of these is missed,

then be warned that your forecasts may mislead you or any user. Also, collection of data may

be inadequate to help in supporting the reasoning behind some findings. Experience shows that

when data are collected under certain contexts, explanations and contexts become clearer when

findings are associated with those contexts. Thus, if you assist in data collection of time series or

any statistical data, whenever possible, advise on the inclusion of details of the occurrences of the

data. Giving details around happenings assists in reducing the extent of making assumptions which

may sometimes be incorrect.

The type of information used in forecasting determines the quality of the forecasts. Not all of us like

boxing, but let us discuss the next scenario. Imagine that two boxers were going to fight on the next

Saturday. We were required to make a prediction in order to win a million rand competition. Many

participants looked at the past records of these boxers. They were informed that in the previous

seven years boxer Kangaroo Gumbu had won 25 out of 27 fights while boxer Boetie Blood had won

22 of the 30 fights he had in the same period. Gumbu was known for winning well while Blood had

lost dismally in a recent fight. Let us pause and enjoy the predictions (forecasts) made, just to make

a good point.

ACTIVITY 1.3

Either as a person interested in boxing or someone hoping to win the money, you may be tempted to

take a chance at the answer. Make a prediction of the outcome of the fight based on the explanation

given.


Let us determine the odds as statisticians. Using relative frequencies, Gumbu had an estimated probability of 0.93 of

winning a fight while Blood had an estimated probability of 0.73. On the basis of these odds,

many participants predicted that Gumbu was going to win.

Do you know how the probabilities 0.93 and 0.73 have been obtained? If it is not clear, divide the

number of successes (wins) of each boxer by the total number of fights that each boxer had fought.
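
The division described above can be sketched in a few lines of Python. This is only an illustration: the names `records` and `win_proportion` are our own and do not come from the prescribed book. Note that these are proportions of past wins against different opponents, which is why they need not add up to 1.

```python
# Win records quoted in the boxing scenario: (wins, total fights).
records = {"Gumbu": (25, 27), "Blood": (22, 30)}


def win_proportion(wins, fights):
    """Relative frequency of wins: number of wins divided by fights fought."""
    return wins / fights


# Round to two decimals, as in the discussion above.
proportions = {
    name: round(win_proportion(wins, fights), 2)
    for name, (wins, fights) in records.items()
}

print(proportions)
```

The rounded values agree with the 0.93 and 0.73 used in the discussion.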

The data given were based on certain assumptions. Among others, there was the impression that

the opponents of the two boxers were of the same quality. If they were not, then the prediction would

be carrying some “inaccuracies”. Among other omissions, we were not told that the boxing bout

was going to be held in the catchweight division, where boxers came from different weight divisions

and could not both fall within a single previously defined weight division. Blood had fought only

world-class opponents and came from two weight divisions heavier than the weight to which Gumbu

belonged. That is, there was a difference between the original weights of the two boxers. Gumbu,

on the other hand, was a boxer who talked too much. He had fought some mediocre opponents and

wanted to pretend he was an excellent boxer. He had asked for the fight. In insisting on the fight,

he had called Blood a coward until the bout was sanctioned. At the time he was preparing for an

elimination bout in his weight division after which he was going to fight for a world title if he won.

The planned elimination bout was probably going to be the first real test for Gumbu as a professional

fighter. It was going to come “after I am done with Blood,” boasted Gumbu.

In the street some people were predicting that Gumbu was going to lose, but they did not bet as

money was required. None of those who paid to enter the competition predicted correctly. The fight

ended with a first-round knockout. Blood was the winner. Gumbu was no match.

DISCUSSION OF THE BOXING SCENARIO

The records given were correct, but not complete. Records are past data. We need complete

data and the exact context in which they occurred in order to be able to make accurate forecasts.

The analyses that were made about the boxers were correct, but some assumptions were wrong.

Assumptions are used to build cases, and methods are developed on conditions that are given as

assumptions. Wrong assumptions may lead to inappropriate methods for data analysis. In cases

where information can be found to limit the use of assumptions, this should be done. However, many

cases provide inadequate information, leaving us with no choice but to depend on assumptions.

Analysis should depend on reasonable assumptions. If in actual practice assumptions are made

for the sake of doing something, decisions and results reached may lead to improper actions. The

analyst should learn the art of making appropriate or reasonable assumptions.

In the case of the scenario given, some details were missing, such as the fact that the two boxers were

of different weights. Had we known this, it would have helped our analysis. In predicting the outcomes

of forthcoming games, one also needs to know the quality of the opposition that the two opponents

have met in the accumulation of their records. This was also missing in the example. We will insist

on the use of valid assumptions because, as we saw, wrong or invalid assumptions are likely to give

inaccurate predictions. The paragraph after the last bullet on page 3 of the prescribed book explains

possible repercussions of wrong assumptions (Bowerman et al., 2005: 3).

Types of data that are common in real life are cross-sectional data and time series data. Study

the definition of cross-sectional data in the rectangle on page 3. Cross-sectional data refers to data

collected by observing many subjects (such as individuals, firms or countries/regions) at the same

point of time, or without regard to differences in time. Analysis of cross-sectional data usually consists

of comparing the differences among the subjects. For example, we want to measure current obesity

levels in a population. We could draw a sample of 1,000 people randomly from that population (also

known as a cross section of that population), measure their weight and height, and calculate what

percentage of that sample is categorized as obese. Even though we may analyse cross-sectional

data for quality forecasts, in this module we use time series data.

Study the definition of time series on page 4.

We will have to be careful when we collect time series data. If the data are listed without time

specification, then we should not consider the data to be time series.

SCENARIO

Read the following scenario carefully and make notes as we will keep on referring back to it.

Suppose that Jabulani is a milk salesperson during the week, serving the Florida, Muckleneuk and

VUDEC UNISA campuses. Very fortunately for Jabulani, his milk cows increased and his market

in these campuses also increased from year to year. Jabulani’s business runs from Mondays to

Sundays. (In a time series analysis a typical question would be: what can we say about the trend

of the sales?) Asked differently: should we believe that the sales have a decreasing or increasing

trend? It will be clear later on that the sales levels differ according to days, high on some days and

low on others. The pattern of low sales or high sales on different days has an important connotation

in time series analysis. This will be discussed.

ACTIVITY 1.4

You have done some first-year statistics modules/courses and some of you did mathematics modules

as well. Let us consider the following data sets and look at them quite closely.

Data set 1.1

16   14   19   26   11   24   10
18   15   21   24   12   21    9
21   15   20   27   13   25   11
24   17   24   31   14   27   13

Data set 1.2

16   18   21   24
14   15   15   17
19   21   20   24
26   24   27   31
11   12   13   14
24   21   25   27
10    9   11   13

(a) The two data sets have exactly the same numbers. There is something strange about their

appearances though. Compare the two data sets.

(b) Can these two data sets be classified as time series data sets? Explain.


On whether data are time series or not

When information about the data presented is limited, there also tends to be limited feedback from

an analysis made from them. You probably realised that the rows of data set 1.1 are the same as the

columns of data set 1.2 and vice versa. Or, in short, that the data sets are transposes of each other.

The data in their current form cannot be classified as time series data since no chronological pattern

of the time at which they were collected is given. This will become clearer as we proceed.

Discussion

The data above do not necessarily represent time series data, but they can be presented in another way

to form time series data - provided they were collected chronologically over regular time intervals.

Suppose data set 1.1 represents the sales of milk sold by Jabulani from Monday to Sunday for four

weeks. Let 1 = Monday, 2 = Tuesday, ..., 7 = Sunday as given in data set 1.3. The data sets should

therefore be presented as follows:

Data set 1.3 Litres of milk sold by Jabulani

         Day
Week     1    2    3    4    5    6    7
1       16   14   19   26   11   24   10
2       18   15   21   24   12   21    9
3       21   15   20   27   13   25   11
4       24   17   24   31   14   27   13

We emphasise that in the initial presentation there was simply no information to explain or

demonstrate the chronological sequence with respect to time and that the data were therefore not

time series data.
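
To make the point concrete, the sketch below attaches an explicit time index to the numbers of data set 1.3. It assumes, as in the discussion, that each row is a week and each column a day from Monday to Sunday. The names `milk`, `DAYS` and `series` are our own choices for illustration.

```python
# Data set 1.3: one row per week, one column per day (Monday to Sunday).
milk = [
    [16, 14, 19, 26, 11, 24, 10],
    [18, 15, 21, 24, 12, 21, 9],
    [21, 15, 20, 27, 13, 25, 11],
    [24, 17, 24, 31, 14, 27, 13],
]
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Build the chronological series: (time index, week, day, litres sold).
series = []
t = 0
for week, row in enumerate(milk, start=1):
    for day, litres in zip(DAYS, row):
        t += 1
        series.append((t, week, day, litres))

for observation in series[:8]:
    print(observation)
```

Once every value carries its position in time (t = 1, 2, ..., 28), the same numbers that could not be called a time series in data sets 1.1 and 1.2 can be treated as one.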

ACTIVITY 1.5

You are required to use graphs in addition to other methods to detect patterns in time series data.

Graphical plots reveal information visually, but cannot always be done with ease. The example

that follows is one of the easy cases where we can draw graphical plots. Analyse the data about

Jabulani’s business by answering the following questions. Make any comments that you believe are

relevant.

(a) Are they time series data? Justify your answer.

(b) Plot the data to reveal the pattern using the following approaches:

(i) Plot the data for each week separately.

(ii) Plot the data of all the weeks in one graphical display.

(iii) Compare the shapes of the graphs.

(c) Which plot provides us with a better idea of comparison?


Whether a data set forms a time series or not depends entirely on its form, that is, on

the chronological order in which the various data points are presented. Did you answer

"yes" in question (a)? If not, what did you reveal? How did you reveal it?

(b) Graphs of the activity

(i) Graphs for separate weeks

[Figure: four separate plots, titled Week 1, Week 2, Week 3 and Week 4; horizontal axis: Days (1 to 7); vertical axis: Litres of milk]

(ii) Graph for data of all the weeks

[Figure: one plot showing Week 1, Week 2, Week 3 and Week 4 together; horizontal axis: Days (1 to 7); vertical axis: Litres of milk]

(iii) In terms of the pattern, the graphs reveal that milk sales were highest on Thursdays, Saturdays

and Wednesdays (in order from highest to lowest). The lowest sales were revealed for

Sundays, Fridays, Tuesdays and Mondays (in the order from lowest to highest).

(c) The graphs can be difficult to compare when they are on separate systems of axes. The last

graph makes comparison very easy, revealing that the patterns for all four weeks are similar.
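
The ordering of the days given in (iii) can also be checked numerically. The sketch below averages the sales of each day over the four weeks of data set 1.3 and ranks the days. The names `milk`, `DAYS`, `day_means` and `ranked` are our own, and averaging over weeks is only one simple way of summarising what the graphs show.

```python
# Data set 1.3: one row per week, one column per day (Monday to Sunday).
milk = [
    [16, 14, 19, 26, 11, 24, 10],
    [18, 15, 21, 24, 12, 21, 9],
    [21, 15, 20, 27, 13, 25, 11],
    [24, 17, 24, 31, 14, 27, 13],
]
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Average litres sold on each day of the week, taken over the four weeks.
day_means = {
    day: sum(week[i] for week in milk) / len(milk)
    for i, day in enumerate(DAYS)
}

# Days ordered from highest to lowest average sales.
ranked = sorted(day_means, key=day_means.get, reverse=True)

for day in ranked:
    print(day, day_means[day])
```

The ranking starts with Thursday, Saturday and Wednesday and ends with Sunday, which agrees with the pattern read from the graphs.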

The patterns of the highest activity and lowest activity about a phenomenon are important in time

series. Jabulani will easily know when he does more business, when he does least business and he

can plan to find better ways to improve business. Let us start formalising these patterns.

1.1.3 Components of a time series

The components of a time series serve as the building blocks of a time series and describe its pattern

(study pp. 5-7 of the textbook up to the end of section 1.2).

Components are important because they enable us to see the salient features of a structure. Through

them we can make descriptions of what we need to analyse. When we deal with something that we

can describe, we are better able to know the requirements for dealing with it. A time series also has

components that need to be considered and taken care of in its analysis.

Trend

The first component we discuss is trend. The term “trend” is about long-term decline or growth of

an activity. It is defined formally as the upward and downward movements that characterise a time

series over a period of time.

Time series data may show upward trend or downward trend for a period of years. This may be

due to factors such as increase in population, change in technological progress, large scale shift in

consumers’ demands, and so on. For example, population increases over a period of time, price

increases over a period of years, production of goods on the capital market of the country increases

over a period of years. These are the examples of upward trend. The sales of a commodity may

decrease over a period of time because of better products coming to the market. This is an example

of declining trend or downward trend. The increase or decrease in the movements of a time series

is called trend.
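
As a small illustration of an upward trend, the sketch below adds up Jabulani's milk sales (data set 1.3) for each week and fits a straight line through the four weekly totals by least squares. Four points are far too few for a serious analysis, and the function name `linear_trend` is our own; the aim is only to show how the direction of a trend can be put into a number.

```python
# Data set 1.3: one row per week, one column per day (Monday to Sunday).
milk = [
    [16, 14, 19, 26, 11, 24, 10],
    [18, 15, 21, 24, 12, 21, 9],
    [21, 15, 20, 27, 13, 25, 11],
    [24, 17, 24, 31, 14, 27, 13],
]

# Total litres sold in each week.
weekly_totals = [sum(week) for week in milk]


def linear_trend(y):
    """Least-squares line y = intercept + slope * t for t = 1, 2, ..., n."""
    n = len(y)
    t = range(1, n + 1)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    s_ty = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    s_tt = sum((ti - t_bar) ** 2 for ti in t)
    slope = s_ty / s_tt
    intercept = y_bar - slope * t_bar
    return intercept, slope


intercept, slope = linear_trend(weekly_totals)
print(weekly_totals)
print(intercept, slope)
```

A positive slope indicates an upward trend and a negative slope a downward trend. Here the weekly totals are 120, 120, 132 and 150 litres, and the fitted slope is about 10.2 litres per week.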

Usually one would not be able to determine from looking at the data whether there is a decreasing

or increasing trend. There are times (but rarely) when we can see the pattern by inspection. Often a

graphical plot clearly shows the trend. The trend may be given in shapes such as linear, exponential,

logarithmic, polynomial, power function, quadratic, and other forms. In general, we use the graphical

displays to find out if there is a decline or increase in the activity. Some examples of trend applications

that we must look at are given on page 5 of Bowerman et al. (2005). Study them.

- Technological changes in the industry

Currently, companies increase ICT usage in their activities for competitive edge over those that do

not incorporate it. Institutions of higher learning have aggressively incorporated ICT in facilitating

learning, especially the distance education ones.

- Changes in consumer tastes

Housing is very expensive and scarce, but for obvious reasons remains a priority for households.

Recently, cities such as Cape Town, Durban, East London, Johannesburg, Port Elizabeth and

Pretoria have experienced a high influx of people from other areas, and employment is biased

towards the youth. As a result housing in these cities is biased towards townhouses and flats.

- Increases in total population

There is an increase since there are more births than deaths. In SA, there is also an influx of

people from other countries. Elsewhere, natural deaths together with deaths from

holocausts, wars, terrorism and natural disasters such as tsunamis have been

numerous, but far fewer than the births that have occurred over the years. That is

why there is an increase in the world’s population.

- Market growth

In Gauteng, the market of umbrellas decreases in the period April to July. During the rainy season,

which in Gauteng happens to be the summer season, the sales of umbrellas increase.

- Inflation or deflation (price changes)

If we consider one item for simplicity, maize is produced in the period October to May,

approximately. During the entry period, the price of maize is high because there are more people

looking for a less available commodity. During the periods November to January, maize is in

abundance and the prices drop. As the production level declines, the prices start increasing

again.

ACTIVITY 1.6

Discuss what a time series is, and discuss the meaning of trend effects, seasonal variations, cyclical

variations, and irregular effects.


You should mention a sequence of observations of a variable presented in chronological form

when you describe a time series. Trend should imply a long-term tendency of that time series.

Seasonality should include a periodic pattern in the data. Describing cycles should imply up and

down movements of observations around trend levels. The irregular pattern is the portion of the time

series which cannot be accounted for by the three patterns discussed above.

Exploration data set

The next data set is important for exploration. ENJOY IT. It represents the litres of milk that were

demanded from Jabulani. Whether there was stock or not is not an issue here. The data set will be

revisited time and again.

Data set 1.4

         Day
Week     1    2    3    4    5    6    7
1       16   14   19   26   11   24   10
2       18   15   21   24   12   21    9
3       21   15   20   27   13   25   11
4       24   17   24   31   14   27   13

In general, methods of forecasting that depend on non-numeric information are qualitative forecasting

methods. (Do you remember this from first-year Statistics?) Qualitative data are nominal/words data.

Quantitative forecasting methods on the other hand depend on numerical data.

Bowerman et al. (2005: 7) present a graphical plot Figure 1.1 (a) to display an example of a trend

in a time series. There is no trend line to describe the trend, but can you explain whether there is a

decreasing or increasing trend in the plot to which we are referring?

Cycle

The next component of time series that we discuss is “cycle”. When trends have been identified, there

may be some recurring up and down movements visible around trend levels. These movements are

called cycles. Cycles occur over long and medium terms. Page 5 of Bowerman et al. (2005) presents

this component.

Some interesting explanation is presented by Bowerman et al. (2005: 5) about business cycles.

Study it in detail. Bowerman et al. (2005: 7) present Figure 1.1 (c) to display an example of a cycle

in a time series. We need to note that generally, natural occurrences have shown some cyclical

patterns over the years.

The impact of cycles on a time series is either to stimulate or depress its activity, but in general,

their causes are difficult to identify and explain. Certain actions by institutions such as government,

trade unions, world organisations, and so on, can induce levels of pessimism and optimism into the

economy which are reflected in changes in the time series levels. Economic indices are usually used

to describe cyclical fluctuations.

Cyclical variations are recurrent upward or downward movements in a time series, but the period of

a cycle is greater than a year. This restriction makes it different from trend. Also, cyclical variations

are not as regular as seasonal variations. There are different types of cycles, varying in length and

size. The ups and downs in business activities are the effects of cyclical variation. A business

cycle showing these oscillatory movements has to pass through four phases: prosperity, recession,

depression and recovery. In a business, these four phases are completed by passing one to another

in this order. Together, they form a cycle.

Cycles are useful in long-term forecasting. Usually this means centuries and millennia. Our

capabilities and interest in this module do not require us to look beyond a decade. Hence, methods

for developing forecasts that include cycles (or cyclical components) are not in the scope of this

module. However, you still need to understand when cycles are discussed or implied in a forecasting

situation.

Seasonality

The example about milk is given over weekly periods. The definition given by Bowerman et al. (2005:

6) is somewhat misleading! The impression it gives is that observations being investigated, must run

over a year. This is simply not the case. Even the values occurring within a day can be seen to be

seasonal, as you will soon see. First, we provide a more useful and realistic definition of seasonality,

which will be used in the module. The definition given in Bowerman et al. works when the periods are

yearly. Let us define the concept in the next line:

Seasonal variations are systematic variations that occur within a period and which are tied to some

properties of that period. They are repeated within the period. They are indeed periodic patterns in a

time series that complete themselves within a calendar period and are repeated on the basis of that

period.
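
With the milk data, the "period" is the week and the "season" is the day of the week. One simple way to put a number to this pattern, offered here only as a sketch, is to divide the average sales on each day by the overall average. We call the result a seasonal index; the names `milk`, `DAYS` and `seasonal_index` are our own, and this is not necessarily the method used in the prescribed book.

```python
# Data set 1.3: one row per week, one column per day (Monday to Sunday).
milk = [
    [16, 14, 19, 26, 11, 24, 10],
    [18, 15, 21, 24, 12, 21, 9],
    [21, 15, 20, 27, 13, 25, 11],
    [24, 17, 24, 31, 14, 27, 13],
]
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Overall average litres sold per day across all 28 observations.
all_values = [litres for week in milk for litres in week]
overall_mean = sum(all_values) / len(all_values)

# Seasonal index of a day: its own average divided by the overall average.
seasonal_index = {}
for i, day in enumerate(DAYS):
    day_mean = sum(week[i] for week in milk) / len(milk)
    seasonal_index[day] = day_mean / overall_mean

for day, index in seasonal_index.items():
    print(day, round(index, 2))
```

An index above 1 marks a day on which sales are typically above average (Thursday, for example), and an index below 1 marks a day on which they are below average (Sunday, for example).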

Seasonal variations are short-term fluctuations in a time series which occur periodically in a period,

such as a year. In this case it would continue to be repeated year after year. The major factors that

are responsible for the repetitive pattern of seasonal variations are weather conditions and customs

of people. More woollen clothes are sold in winter than in summer. Regardless of the

trend we can observe that in each year more ice cream is sold in summer and very little in winter

season. Sales in department stores are higher during festive seasons than on normal

days.

Irregular fluctuations

We have not mentioned whether Jabulani was ever robbed of his revenue or stock for his business.

Now we are giving you bad news.

Irregular fluctuations are variations in time series that are short in duration, erratic in nature and

follow no regularity in the occurrence pattern. These variations are also referred to as residual

variations since by definition they represent what is left out in a time series after trend, cyclical and

seasonal variations have been accounted for. Irregular fluctuations result from the occurrence of

unforeseen events like floods, earthquakes, wars, famines, and so on.

Remember that Jabulani was a smart entrepreneur who would make some estimations of revenue

each morning he left for work. One Tuesday afternoon after he had counted what he thought was his

revenue for the day, he was robbed by two thugs. Fortunately he was neither hurt nor discouraged

to continue with his business. It was happening for the first time. Could he have anticipated being

robbed on that day? We also could not have predicted that event.

The point is that the irregular event changed what could have been the revenue and/or profit for that

day. In time series, irregular fluctuations, which are also called irregular variations, refer to random

fluctuations that are attributed to unpredictable occurrences. Bowerman et al. (2005: 6) appropriately

define them as erratic movements in a time series that follow no recognisable or regular pattern. The

presentation about this concept simply implies that these patterns cannot be accounted for. They

are once-off events. Examples are natural disasters (such as fires, droughts, floods) or man-made

disasters (strikes, boycotts, accidents, acts of violence and so on).

Note that all the components of a time series influence the time series and can occur in any

combination. The most important problem to be solved in forecasting is trying to match the

appropriate model to the pattern of the time series data.

1.1.4 Applications of forecasting

Forecasting has application in many situations. Among others, it can be applied in:

• Supply chain management - Forecasting can be used in Supply Chain Management to make sure

that the right product is at the right place at the right time. Accurate forecasting will help retailers

reduce excess inventory and therefore increase profit margin. Accurate forecasting will also help

them meet consumer demand.

• Weather forecasting, Flood forecasting and Meteorology

• Transport planning and Transportation forecasting

• Economic forecasting


• Earthquake prediction

• Land use forecasting

• Product forecasting

• Player and team performance in sports

• Telecommunications forecasting

• Political forecasting

1.2 Forecasting Methods

This topic is discussed on pages 7 to 12. Study these pages. On page 7 there is a reminder that

there is no single best forecasting method. There are, however, appropriate methods for any time

series situation. The forecasting methods are described along the same line as types of data that you

dealt with in your Statistics courses/modules at first year level. They are qualitative and quantitative

in nature.

1.2.1 Qualitative methods

Study this topic from page 8 to page 11.

The textbook explains on page 8 that generally, qualitative forecasting methods become an option

to develop forecasts in situations where there are no historical numeric data or where statisticians

trained in time series are not available. Opinions of experts are generally used to make predictions

in such cases. Predictions are necessary in all situations, even where there are no data. When this

occurs, qualitative methods are involved.

Common examples of qualitative forecasting methods are judgemental methods. Judgemental

forecasting methods incorporate intuitive judgements, opinions and subjective probability estimates. Examples include:

• Composite forecasts

• Surveys

• Delphi method

• Scenario building


• Forecast by analogy

You do not need to learn more about these for the requirements of this module. However, you

may come across them in applications. Hence, your encounter with them may be of help in future

applications.

1.2.2 Quantitative methods

Quantitative forecasting methods are used (and only possible) when historical data that occur in

numeric form are available. These methods may occur as univariate forecasting methods or as

causal methods (Bowerman et al., 2005: 11).

Univariate forecasting methods depend only on past values of the time series to predict future

values. In this method, data patterns are identified from historical data, the assumption is made

that the patterns will continue in the future and then the pattern is extrapolated in order to develop

forecasts. Study this topic on page 11.

Causal forecasting models start by identifying variables that are related to the one to be predicted.

This is followed by forming a statistical model that describes the relationship between these

variables and the variable to be forecasted. The common ones are regression models and ordinary

polynomials. Study this topic on page 11.

In the causal forecasting method, the variable of interest, which is the one whose forecasts are

required, depends on other variables. It is thus the dependent variable. The ones on which the

variable of interest depends are known as the independent variables.

Discussion about dependence/independence

Note that Jabulani’s customers are mostly people who receive wages on a weekly basis. Some are

paid on Saturday afternoon, but an overwhelming majority is paid on Friday afternoon. In addition,

on Saturday afternoon, there is an item P that is also liked by many milk buyers. If item P is available

before milk arrives, then this item is bought in large quantities, leaving limited disposable income for

the milk purchases. Fortunately for Jabulani, he has in the past four weeks, managed to deliver milk

before item P was delivered. However, most of the buyers who are paid on Saturday tend to meet

the P seller before their milk purchases on Sunday morning.

It is necessary to understand dependencies and correlations when dealing with forecasting. If you fail to understand them, you may fall into the trap of making wrong assumptions: overlooked influences on your forecasts and constraints that come with correlated variables lead to inaccurate models and thus to wrong forecasts.

Useful common examples are time series and causal methods. There are others as well, but the

following may be of help in your development.


Time series methods

Time series methods use historical data as the basis of estimating future outcomes.

• Rolling forecast is a projection into the future based on past performance, routinely updated on a regular schedule to incorporate new data.

• Moving average

• Exponential smoothing

• Extrapolation

• Linear prediction

• Trend estimation

• Growth curve
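To give a feel for how a time series method turns past values into a forecast, here is a minimal sketch of a simple moving average forecast. It is an illustration only and not the textbook's procedure: the function name `moving_average_forecast` and the window length of 3 are assumptions, and the data are the seven daily Week 3 milk observations used in section 1.3.

```python
def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and the length of the series")
    recent = series[-window:]          # the most recent `window` observations
    return sum(recent) / window


# Week 3 daily milk sales in litres (Day 1 to Day 7), taken from section 1.3.
sales = [21, 15, 20, 27, 13, 25, 11]

# Hypothetical choice: average the last 3 days to forecast the next day.
next_day_forecast = moving_average_forecast(sales, window=3)
print(f"Forecast for the next day: {next_day_forecast:.2f} litres")
```

A longer window smooths out more of the day-to-day variation but reacts more slowly to recent changes.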

Causal / econometric methods

Some forecasting methods use the assumption that it is possible to identify the underlying factors

that might influence the variable that is being forecasted. For example, sales of umbrellas might

be associated with weather conditions. If the causes are understood, projections of the influencing

variables can be made and used in the forecast.

• Regression analysis using linear regression or non-linear regression

• Autoregressive moving average (ARMA)

• Autoregressive integrated moving average (ARIMA), e.g. Box-Jenkins

• Econometrics

Other methods

• Simulation

• Prediction market

• Probabilistic forecasting and ensemble forecasting

• Reference class forecasting

These methods are given to you so that when you make references from other forecasting sources,

you will be able to understand where they belong in your module. However, they are not necessarily

required to the extent that is presented in those other sources.

ACTIVITY 1.7

• Do you see any dependence of the variables?

Hint: Focus on milk purchases and disposable income.



Keeping to the hint, the purchase of an item that is in high demand depends on the availability of

disposable income.

ACTIVITY 1.8

(a) Classify the milk sales in the latest scenario as a dependent or independent variable.

(b) Explain your choice in (a) above. Here confine your response to milk purchases and disposable

income.

(c) Identify the dependent variable and the independent variable.


Regarding (a), milk sales are a dependent variable. The reason, for (b), is that milk sales depend on the availability of disposable income. This leads to (c): milk sales are the dependent variable and disposable income is the independent variable.

1.3 Errors in forecasting and forecast accuracy

When it was said that the pattern of information given, such as Jabulani’s milk sales, can help you

make future predictions, no one said your predictions would be perfect.

It is time to note that if the forecasts prepared/developed are not accurate, they may be useless since

they are probably going to mislead the user. We insist on a scientific method in forecasting to ensure that we can monitor the methods and test the models so that the inaccuracies in them are reduced or, ideally, eliminated.

It is important to know the likely errors when you attempt to make predictions or develop forecasts. If you know them, you can avoid or minimise them. A simple example of an error: you predict that Jabulani will sell 500 litres in a specific week and he ends up selling 520 litres. (Note that you could equally make an error in litres of milk by overestimating.)

The next sections require your skill in drawing graphs and interpreting them. The most common ones you should expect to encounter (draw and interpret) are the scatter diagram (or scatterplot) and the time plot. Revise them if you have forgotten how they are drawn.

Further, you are soon going to engage in a number of calculations. Thus, ensure that you are ready to perform them, and that you remember the descriptive statistics you learnt in your early years


of Statistics. It is also very important to be able to know why the calculations are necessary in any

exercise of building a forecast model.

Bowerman et al. (2005: 12) name two types of forecasts, the point forecast and the prediction

interval. A point forecast is a single number that estimates the actual observation. A prediction

interval is a range of values that gives us some confidence that the actual value is contained in the

interval.

The forecast error as defined in Bowerman et al. (2005: 13) requires that the estimate be found and

be “paired” with the actual observation.

In statistics, a forecast error is the difference between the actual or real and the predicted or forecast

value of a time series or any other phenomenon of interest. In simple cases, a forecast is compared

with an outcome at a single time-point and a summary of forecast errors is constructed over a

collection of such time-points. Here the forecast may be assessed using the difference or using

a proportional error. By convention, the error is defined using the value of the outcome minus the

value of the forecast. In other cases, a forecast may consist of predicted values over a number of

lead-times; in this case an assessment of forecast error may need to consider more general ways of

assessing the match between the time-profiles of the forecast and the outcome. If a main application

of the forecast is to predict when certain thresholds will be crossed, one possible way of assessing

the forecast is to use the timing-error—the difference in time between when the outcome crosses

the threshold and when the forecast does so. When there is interest in the maximum value being

reached, assessment of forecasts can be done using any of:

· the difference of times of the peaks;

· the difference in the peak values in the forecast and outcome;

· the difference between the peak value of the outcome and the value forecast for that time point.
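The three peak-based comparisons listed above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses the Week 3 actual and estimated milk sales from later in this section as the outcome and forecast profiles, follows the convention of outcome minus forecast, and takes the first occurrence if a maximum is repeated. The variable names are hypothetical.

```python
# Week 3 daily milk sales in litres, Day 1 to Day 7 (from this section).
outcome = [21, 15, 20, 27, 13, 25, 11]    # actual observations
forecast = [27, 11, 20, 26, 14, 22, 9]    # estimates made beforehand

# Position of each peak (list.index returns the first occurrence), as a day number.
outcome_peak_day = outcome.index(max(outcome)) + 1
forecast_peak_day = forecast.index(max(forecast)) + 1

# 1. Difference of the times of the peaks (outcome minus forecast), in days.
peak_time_difference = outcome_peak_day - forecast_peak_day

# 2. Difference in the peak values of the outcome and the forecast.
peak_value_difference = max(outcome) - max(forecast)

# 3. Peak value of the outcome minus the value forecast for that same day.
error_at_outcome_peak = max(outcome) - forecast[outcome_peak_day - 1]

print("Outcome peaks on day", outcome_peak_day, "and forecast peaks on day", forecast_peak_day)
print("Difference of peak times:", peak_time_difference, "days")
print("Difference of peak values:", peak_value_difference, "litres")
print("Error at the outcome's peak:", error_at_outcome_peak, "litres")
```

Note that the three measures can disagree: here the peak values match exactly even though the peaks fall on different days.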

Forecast error can be a calendar forecast error or a cross-sectional forecast error, when we want to

summarize the forecast error over a group of units. If we observe the average forecast error for a

time-series of forecasts for the same product or phenomenon, then we call this a calendar forecast

error or time-series forecast error. If we observe this for multiple products for the same period, then

this is a cross-sectional performance error.

To calculate the forecast errors we subtract the estimates (ŷi) from the actual observation (yi). The

difference is the forecast error. Can you tell what the values of the forecast errors imply? For

example, some may be smaller than others, some negative and others positive!

When Jabulani plans his sales, he makes some estimation of litres of milk that he hopes to sell. In

Week 3 prior to getting to the market, he had made the following estimations (ŷi):


Week  Day  Litres of milk estimated (ŷ)
3     1    27
      2    11
      3    20
      4    26
      5    14
      6    22
      7    9

Remember to refer to the appropriate week of the table of Data set 1.4 for observed values (yi).

ACTIVITY 1.9

(a) On which days was there overestimation?

(b) On which days was there underestimation?

(c) Calculate the forecast errors for these estimates.

(d) Identify the day on which the milk sales were most disappointing! Explain.

(e) On which day did he make the best prediction? Why?


We have not defined the terms overestimation and underestimation formally. They have been defined in other modules, but a reminder is useful. If you make a prediction and the actual observation turns out to be smaller, you have overestimated. What is the sign of the forecast error? Can you now define the term “underestimation”? What about the sign of the forecast error?

Let us get into the questions of the activity. The setup of week 3 is as follows:

Actual observations (yi)  21 15 20 27 13 25 11

Estimates (ŷi)            27 11 20 26 14 22 9

(a) Overestimations are visible after pairing by observing the pairs in which the actual observations

are lower than the estimates. These were on Day 1 and Day 5.

(b) Underestimations occurred on Day 2, Day 4, Day 6 and Day 7.

(c) The forecast errors are −6, 4, 0, 1, −1, 3 and 2 for the seven days, respectively.

(d) Day 1 was the most disappointing. This is because Jabulani expected to sell 27 litres but only sold 21 litres. It is the day with the biggest shortfall, that is, the day with the largest negative error.

(e) He made the best prediction on Day 3, where the sales were equal to the estimates.
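The answers to (a), (b), (c) and (e) can be checked with a short script. This is a minimal sketch rather than part of the prescribed material. It uses the Week 3 figures given above and the convention that the forecast error is the actual observation minus the estimate. The variable names are illustrative.

```python
# Week 3 data (litres of milk, Day 1 to Day 7).
actual = [21, 15, 20, 27, 13, 25, 11]      # observed values y_i
estimate = [27, 11, 20, 26, 14, 22, 9]     # Jabulani's estimates

# Forecast error: actual observation minus estimate.
errors = [y - y_hat for y, y_hat in zip(actual, estimate)]

# A negative error means the estimate was too high (overestimation);
# a positive error means the estimate was too low (underestimation).
over_days = [day for day, e in enumerate(errors, start=1) if e < 0]
under_days = [day for day, e in enumerate(errors, start=1) if e > 0]
exact_days = [day for day, e in enumerate(errors, start=1) if e == 0]

print("Forecast errors:", errors)
print("Overestimation on days:", over_days)
print("Underestimation on days:", under_days)
print("Exact prediction on days:", exact_days)
```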

If there was no day when the sales and estimates were equal, then the day with the smallest forecast error in absolute value would have been the one on which the best prediction was made. This means that Day 4 and Day 5 are the days on which good predictions were made. However, we note that


(b) The plot looks almost random. This means that the forecasting technique provides a good fit to

the data.

1.3.1 Absolute deviation

Forecast errors are used to calculate absolute deviations. The absolute deviation (Bowerman et al.,

2005: 15) requires the forecast errors in absolute terms, i.e., a matter of “how far is the estimate from

the actual observation”.

ACTIVITY 1.11

Calculate the absolute deviations for the estimates in Activity 1.9.


The calculation is fairly straightforward. We need the forecast errors, which were calculated as

Forecast errors (ei) −6 4 0 1 −1 3 2

The absolute deviations are the absolute values of the forecast errors, a concept you will recall from your high-school days. The absolute deviations are thus

Absolute deviations (|ei|) 6 4 0 1 1 3 2

1.3.2 Mean absolute deviation

The absolute deviations give us the mean absolute deviation (MAD) when we obtain their average in the usual way. The MAD (Bowerman et al., 2005: 15) requires the following steps: take the absolute deviations, add them and divide the sum by their number. The result is the MAD.

ACTIVITY 1.12

Calculate the MAD for the estimates in Activity 1.9.


Absolute deviations (|ei|) 6 4 0 1 1 3 2

The MAD is therefore

\[ MAD = \frac{\sum_{i=1}^{7} |e_i|}{n} = \frac{17}{7} = 2.42857. \]
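The same calculation can be reproduced as a quick check. This is a minimal sketch using the forecast errors from Activity 1.9. The variable names are illustrative.

```python
# Forecast errors for Week 3 (Day 1 to Day 7), from Activity 1.9.
errors = [-6, 4, 0, 1, -1, 3, 2]

# Step 1: absolute deviations.
abs_deviations = [abs(e) for e in errors]

# Steps 2 and 3: add them and divide by their number.
mad = sum(abs_deviations) / len(abs_deviations)

print("Absolute deviations:", abs_deviations)
print("Sum:", sum(abs_deviations))
print("MAD:", round(mad, 5))
```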


1.3.3 Squared error

Another way to remove the signs of the errors, so that positive and negative errors do not cancel out, is to square them (Bowerman et al., 2005: 15).

ACTIVITY 1.13

Calculate the squared errors for the estimates in Activity 1.9.


Forecast errors (ei) −6 4 0 1 −1 3 2

The squared errors are therefore

Squared errors (ei²) 36 16 0 1 1 9 4

1.3.4 Mean squared error

The MSE is the average of the squared errors.

ACTIVITY 1.14

Calculate the MSE for the estimates in Activity 1.9.


To calculate the MSE we need the squared errors, which were calculated as

Squared errors (ei²) 36 16 0 1 1 9 4

The MSE is therefore

\[ MSE = \frac{\sum_{i=1}^{7} e_i^2}{n} = \frac{67}{7} = 9.57143. \]
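As a check on the arithmetic, the MSE can be recomputed directly from the forecast errors. Squaring each error gives 36, 16, 0, 1, 1, 9 and 4 (the last error is 2, so its square is 4), and these sum to 67. The sketch below is illustrative only and its variable names are assumptions.

```python
# Forecast errors for Week 3 (Day 1 to Day 7), from Activity 1.9.
errors = [-6, 4, 0, 1, -1, 3, 2]

# Step 1: squared errors.
squared_errors = [e ** 2 for e in errors]

# Steps 2 and 3: add them and divide by their number.
mse = sum(squared_errors) / len(squared_errors)

print("Squared errors:", squared_errors)
print("Sum:", sum(squared_errors))
print("MSE:", round(mse, 5))
```

Because the errors are squared, the single large error of −6 contributes 36 of the total of 67, so the MSE penalises large errors much more heavily than the MAD does.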

Now, let us pause a little. We have done a few useful calculations. We have also answered a few

questions about errors.


Do you recall the value of the forecast error on the day that the estimate was perfect? Do you also

see what is meant by a poor estimate? Now can you say what is meant by a good estimate? You

will recall that the errors need to be as small as possible. So far it is not absolutely clear what small

entails.

The MAD and MSE are the measures that we will use to determine whether the errors are small. Small values indicate a good model. The objective is to select a good forecast model, that is, one that produces forecasts that are close to the actual observations. The MAD and the MSE will serve as our tools to select a forecast model.

We need to understand the MAD and the MSE as they relate to the forecast model. The steps are

as follows:

MAD steps                       MSE steps
Calculate forecast errors       Calculate forecast errors
Determine absolute deviations   Determine squared errors
Add the absolute deviations     Add the squared errors
Divide by their number          Divide by their number

MAD is not in any way “mad”. It is an objective route to good forecasting. The MSE serves the same

purpose.

Sometimes the effectiveness of a model is measured in percentages. Such measures are the

absolute percentage error (APE) and the mean absolute percentage error (MAPE) (Bowerman et

al., 2005: 18).

1.3.5 Absolute percentage error (APE)

APE is the absolute error divided by the corresponding actual observation, with the result multiplied by 100:

\[ APE_i = \frac{|e_i|}{y_i} \times 100. \]

ACTIVITY 1.15

Calculate the APE for the estimates in Activity 1.9.


To calculate the APE we need the absolute errors and the actual observations, which are

Absolute deviations (|ei|) 6 4 0 1 1 3 2

Actual observations (yi) 21 15 20 27 13 25 11

The APE is therefore

APEi 28.5714 26.6667 0.00 3.7037 7.6923 12.00 18.1818
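These percentages can be verified with a short script. This is a minimal sketch using the absolute deviations and actual observations listed above. The variable names are illustrative.

```python
# Week 3 values (Day 1 to Day 7).
abs_deviations = [6, 4, 0, 1, 1, 3, 2]     # |e_i|
actual = [21, 15, 20, 27, 13, 25, 11]      # y_i

# APE: absolute error divided by the actual observation, times 100.
ape = [abs_e / y * 100 for abs_e, y in zip(abs_deviations, actual)]

for day, value in enumerate(ape, start=1):
    print(f"Day {day}: APE = {value:.4f}%")
```

Note that the APE is undefined when an actual observation is zero, which does not occur in this data set.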


1.3.6 Mean absolute percentage error (MAPE)

MAPE is the mean of the APEs. It is defined as

\[ MAPE = \frac{\sum_{i=1}^{n} APE_i}{n}. \]

ACTIVITY 1.16

Calculate the MAPE corresponding to the estimates in Activity 1.11.


To calculate the MAPE we need the APE, which are

APEi 28.5714 26.6667 0.00 3.7037 7.6923 12.00 18.1818

We obtain

\[ \sum_{i=1}^{7} APE_i = 96.8159. \]

The MAPE is therefore

\[ MAPE = \frac{96.8159}{7} = 13.8308. \]
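The MAPE can also be computed directly from the actual observations and the estimates. This is a minimal sketch using the Week 3 data. The variable names are illustrative.

```python
# Week 3 data (litres of milk, Day 1 to Day 7).
actual = [21, 15, 20, 27, 13, 25, 11]      # y_i
estimate = [27, 11, 20, 26, 14, 22, 9]     # estimates

# APE for each day: |actual - estimate| / actual * 100.
ape = [abs(y - y_hat) / y * 100 for y, y_hat in zip(actual, estimate)]

# MAPE: the mean of the APEs.
mape = sum(ape) / len(ape)

print("Sum of APEs:", round(sum(ape), 4))
print("MAPE:", round(mape, 4))
```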

The intention when measuring the error is to monitor and control it, so that it can be reduced and the accuracy of the forecasting methods increased.

1.3.7 Forecasting accuracy

This section summarises the ‘errors in forecasting’ methods presented above and presents them as the level of accuracy achieved. It is important to know that forecast accuracy starts with the forecast error. As you have seen, the forecast error is the difference between the actual value and the forecast value for the corresponding period:

\[ e_t = y_t - F_t \]

where e_t is the forecast error at period t, y_t is the actual value at period t, and F_t is the forecast for period t. A summary of the methods is given in the next table.


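Measures of aggregate error:

Mean absolute deviation (MAD)

\[ MAD = \frac{\sum_{t=1}^{n} |e_t|}{n} \]

Mean absolute percentage error (MAPE), expressed as a percentage as in section 1.3.5

\[ MAPE = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{e_t}{y_t} \right| \]

Mean squared error (MSE)

\[ MSE = \frac{\sum_{t=1}^{n} e_t^2}{n} \]

Root mean squared error (RMSE)

\[ RMSE = \sqrt{\frac{\sum_{t=1}^{n} e_t^2}{n}} \]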
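The four aggregate measures can be gathered into one small helper, which also shows that the RMSE is simply the square root of the MSE. This is a sketch under stated assumptions and not a prescribed implementation: the function name `aggregate_errors` is hypothetical, the MAPE is expressed as a percentage (multiplied by 100, as in section 1.3.5), and all actual values are assumed to be non-zero.

```python
import math


def aggregate_errors(actual, forecast):
    """Return MAD, MAPE (in percent), MSE and RMSE for paired actual/forecast values."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and of equal length")
    n = len(actual)
    errors = [y - f for y, f in zip(actual, forecast)]   # e_t = y_t - F_t
    mad = sum(abs(e) for e in errors) / n
    mape = 100 * sum(abs(e / y) for e, y in zip(errors, actual)) / n
    mse = sum(e ** 2 for e in errors) / n
    rmse = math.sqrt(mse)
    return {"MAD": mad, "MAPE": mape, "MSE": mse, "RMSE": rmse}


# Week 3 milk sales data from Activity 1.9.
actual = [21, 15, 20, 27, 13, 25, 11]
forecast = [27, 11, 20, 26, 14, 22, 9]

measures = aggregate_errors(actual, forecast)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

Unlike the MSE, the RMSE is in the same units as the data (litres of milk here), which makes it easier to compare with the MAD.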

Please note that business forecasters and practitioners sometimes use different terminology in the industry. They sometimes refer to the PMAD as the MAPE, although what they actually compute is a volume-weighted MAPE. Please stick to the textbook notation.

1.4 Choosing a forecasting technique

We have to le
