sta2604_2012_-_studyguide_-001_2012_4_b
TRANSCRIPT
-
8/9/2019 sta2604_2012_-_studyguide_-001_2012_4_b
1/127
STA2604/1
Department of Statistics
STA2604
Forecasting
Study guide for STA2604
Table of contents
UNIT 1: An Introduction to Forecasting
1.1 Introduction
1.1.1 Forecasting
1.1.2 Data
1.1.3 Components of a time series
1.1.4 Applications of forecasting
1.2 Forecasting methods
1.2.1 Qualitative methods
1.2.2 Quantitative methods
1.3 Errors in forecasting and forecast accuracy
1.3.1 Absolute deviation
1.3.2 Mean absolute deviation
1.3.3 Squared error
1.3.4 Mean squared error
1.3.5 Absolute percentage error (APE)
1.3.6 Mean absolute percentage error (MAPE)
1.3.7 Forecasting accuracy
1.4 Choosing a forecasting technique
1.4.1 Factors to consider
1.4.2 Strike the balance
1.5 An overview of quantitative forecasting techniques
1.6 Conclusion
UNIT 2: Model Building and Residual Analysis
2.1 Introduction
2.2 Multicollinearity
2.2.1 Clarification of multicollinearity
2.2.2 The variance inflation factor (VIF)
2.2.3 Comparing regression models
2.3 Basic residual analysis
2.3.1 Residual plots
2.3.2 Constant variation assumption
2.3.3 Correct functional form assumption
2.3.4 Normality assumption
2.3.5 Independence assumption
2.3.6 Remedy for violations of assumptions
2.4 Outliers and influential observations
2.4.1 Leverage values
2.4.2 Residuals
2.4.3 Studentised residuals
2.4.3.1 Deleted residuals
2.4.4 Cook’s distance
2.4.5 Dealing with outliers and influential observations
2.5 Conclusion
UNIT 3: Time Series Regression
3.1 Introduction
3.2 Modeling trend by using polynomial functions
3.2.1 No trend
3.2.2 Linear trend
3.2.3 Quadratic and higher order polynomial trend
3.3 Detecting autocorrelation
3.3.1 Residual plot inspection
3.3.2 First-order autocorrelation
3.3.2.1 Durbin-Watson test for positive autocorrelation
3.3.2.2 Durbin-Watson test for negative autocorrelation
3.3.2.3 Durbin-Watson test for autocorrelation
3.4 Seasonal variation types
3.4.1 Constant and increasing seasonal variation
3.5 Use of dummy variables and trigonometric functions
3.5.1 Time series with constant seasonal variation
3.5.2 Use of dummy variables
3.5.3 High season and low season
3.5.4 Use of trigonometric functions in a model with a linear trend
3.6 Growth curve models
3.7 AR(1) and AR(p)
3.8 Use of trend and seasonality and forecast development
3.9 Conclusion
UNIT 4: Decomposition of a Time Series
4.1 Introduction
4.2 Multiplicative decomposition
4.2.1 Trend analysis
4.2.2 Seasonal analysis
4.2.3 Analysis of random variations in a time series
4.2.4 Obtaining a forecast
4.3 Additive decomposition
4.5 Conclusion
UNIT 5: Exponential Smoothing
5.1 Introduction
5.2 Simple exponential smoothing
5.3 Tracking signals
5.4 Holt’s trend corrected exponential smoothing
5.5 Holt-Winters methods
5.5.1 Additive Holt-Winters method
5.5.2 Multiplicative Holt-Winters method
5.6 Damped trend exponential smoothing
5.7 Conclusion
ABOUT THIS MODULE
Prologue
Forecasting is the process of making statements about events whose actual outcomes (typically)
have not yet been observed. A commonplace example might be estimation of the expected value for
some variable of interest at some specified future date. Prediction is a similar but more general term.
Both might refer to formal statistical methods employing time series, cross-sectional or longitudinal
data, or alternatively to less formal judgemental methods. More on this appears at various points in
the module.
The module is about forecasting, which deals with the methods used to predict the future, i.e. to
forecast. Can you think of a situation where predictions of the future are needed or cases where
forecasting is done? As treated in this module, forecasting is a quantitative method that uses numeric
data. There are various forecasting methods, however, and some of them are qualitative because they
are based on non-numeric data. Even though qualitative methods feature in some of our discussions,
they are not dealt with in depth in this module.
This module presents fundamental aspects of Time Series analysis used in forecasting. The
prescribed textbook for this module is Bowerman, O’Connell and Koehler (2005). We will not study
all the chapters in the book for this module, but will focus on Chapters 1, 5, 6, 7 and 8.
The module is completed in one semester. Make sure that you are registered for the right semester and
that the material you receive is the correct material.
About the book
The prescribed book is reader-friendly and contains limited mathematical theory. It is geared towards
the practice of forecasting. The authors are experienced practitioners in the field of time series. The
book will assist you in understanding concepts and methodology, and in applying these in practice
(i.e. in real-life situations).
The computer and the calculator
We recommend that you acquire a non-programmable scientific calculator of your own. It is
imperative to have your own calculator in the examination. It is important, although not compulsory,
to have access to a computer in order to undertake the tasks in this module. You may visit a Regional
Centre to use a computer. The text contains output from Excel, MINITAB, JMP IN and SAS. However,
we encourage the use of any software to which you may have access. The above list of computer
software/packages may be used, as well as R, SPSS, Stata, S-Plus and EViews. Your ability to use
such software will increase your marketability in the workplace. You are encouraged to experiment
with the packages at your disposal.
REFERENCES
The prescribed book must be purchased. Refer to the study guide regularly. We shall also refer to a
number of user-friendly textbooks on Time Series that are available in the Unisa library. You do not
need to buy the recommended books for this module.
PRESCRIBED BOOK
Bowerman, B. L., O’Connell, R. T. & Koehler, A. B. (2005). Forecasting, time series and regression:
an applied approach, 4th edition. Singapore: Thomson Brooks/Cole.
ADDITIONAL USEFUL BOOKS FOR THIS MODULE
Crosby, J. V. (2000). Cycles, trends, and turning points: practical marketing and sales forecasting
techniques. Lincolnwood, IL: NTC Business Books.
Chapter 4 of this book deals specifically with Time Series, while chapters 1, 2, 3, 7, 10 and 20 deal
with other topics that are very relevant in this module. The remaining chapters illustrate applications
that may expose you even more to time series. It is useful.
Curwin, J. & Slater, R. (2002). Quantitative methods for business decisions (Chapter 14). London:
Thomson Learning.
This book also presents measures that we use in statistics and in time series applications. It can be
used for other modules as well. Find time to read it.
Dexter, B. (1996). Business mathematics (Chapter 15). London: Macdonald and Evans.
Only chapter 15 presents Time Series, and in not more than 12 pages. “Production planning and
forecasting” are presented in Chapter 4 of this book to expose you to real-life applications. I seriously
advise you to look at these two chapters.
Hair, J. F., Anderson, R. E., Tatham, R. L. & Black, W. C. (1998). Multivariate data analysis, 5th
edition. Prentice-Hall, Inc.
Appendix 4A of this book presents some distance measures that are useful in this module. Cook’s
distance is presented on pages 225 and 234 of this appendix. You are urged to read them. This
book is very useful in exposing various applications of multivariate statistics. Read and enjoy it.
Kendall, M. G. (1990). Time series, 3rd edition. London: Edward Arnold.
Simply the best! Kendall exposes us to time series. His is one of the greatest names remembered
when Time Series are mentioned. Even his previous editions still present good information about the
topic. Why not cash in on time series from the horse’s mouth!
THE PRESENTATION OF THE MODULE
This study guide summarises the five prescribed chapters of the textbook.
Prior knowledge
It is important that you are familiar with a section before moving to the next one. This will serve
as a foundation for the forthcoming work. Leaving out work without understanding it can only add
to the accumulation of problems during the examination. This is also true about the prerequisites
from first-year statistics and the knowledge you have acquired through the years. Sensible or smart
application is based on the use of the accumulated techniques, experiences and knowledge. Plotting
of graphs, fitting a linear model, and so on, are needed in some places. You are urged, therefore,
to incorporate all the useful techniques in the solutions to exercises. We advise you to revisit these
topics in your first-year module.
It is necessary to realise that numbers alone do not provide all the answers. It should be clear to
you that aspects of a qualitative nature add value to the predictions made so that the data context is
clear.
This study guide
In this study guide we attempt to present explanations of the concepts in the textbook. It contains
easy examples as well as activities for you to practise. You are encouraged to do the activities
in order to learn effectively. Reading of feedback alone leaves gaps in your learning. There are
discussions following the activities so that the feedback is immediate. Do not just read through them;
try to explore them by testing that you can do them as well, even if you use alternative methods.
The exercises selected for assignments are important in reinforcing what you need to understand in
this module. Take time to understand the aspects that go with them. Analyse the postulates in the
given statements and thereafter the requirements so that it becomes easy to recall what is necessary
in compiling a solution. In that way you do not only solve the problem, you understand it and enjoy
solving it. At the end of the semester there is a two-hour closed-book examination. The discussions
in the study guide and the textbook prepare you for that examination.
This study guide is prepared to guide you through the prescribed book. Therefore, we will always
use it together with the prescribed book. Read them together. The textbook presents the concepts;
the study guide attempts to bring the concepts closer to you.
Each study unit starts with the outcomes in order to show you what you need to know and to evaluate
yourself. The table of outcomes also gives each outcome together with the way the outcome will
be assessed, the content needed for that outcome, the activities that will be used to support the
understanding of the content and the way feedback will be given. Your input in the form of positive
criticism to improve the presentation will be of importance in the review of this study guide. You are
therefore encouraged to suggest ways that you believe can improve the presentation of this module.
Module position in the curriculum
We have been offering a postgraduate module on Time Series at Unisa, but have become aware of
the need to introduce the module at undergraduate level due to its necessity in the workplace and in
order to fill the gap that is evident when students attempt the postgraduate time series module.
This module is part of the whole Statistics curriculum at Unisa. Its position on the curriculum structure
is as follows:
1st year: STA1501 STA1502 STA1503
2nd year: STA2601 STA2602 STA2603 STA2604 (FORECASTING, we are here) STA2610
3rd year: STA3701 STA3702 STA3703 STA3704 STA3705 STA3710
You should already be familiar with some of the modules mentioned above. Knowledge from
STA2604 will help you in STA3704 (Forecasting III).
ASSIGNMENTS
There are two assignments for this module, which are intended to help you learn through various
activities. They also serve as tests to prepare you for the examination. As you do the assignments,
study the reading texts, consult other resources, discuss the work with fellow students or tutors or
do research, you are actively engaged in learning. Looking at the assessment criteria given for
each assignment will help you to understand better what is required of you. The two assignments
per semester prescribed for this module form part of the learning process. The typical assignment
question is a reflection of a typical examination question. There are fixed submission dates for the
assignments and each assignment is based on specific chapters (or sections) in the prescribed book.
You have to adhere to these dates as assignments are only marked if they are received on or before
the due dates.
Both assignments are compulsory because
• they are the sole contributors towards your year mark and
• they form an integral part of the learning process and indicate the form and nature of the
questions you can expect in the examination.
Please note that submitting assignment 01 guarantees your examination entry. If you
do not submit assignment 01, Unisa (not the Department of Statistics) will deny you examination
entry.
You are urged to communicate with your lecturer(s) whenever you encounter difficulties in this
module. Do not wait until the assignment due date or the examination to make contact with lecturers.
It is helpful to be ready long in advance. You are also encouraged to work with your own peers,
colleagues, friends, etc. Details about the assignments will be given in Tutorial Letter 101.
Time series has its own useful terminology that should be understood. In order to familiarise yourself
with it, let us start with an easy activity. Activities help in the creation of a mind map of the module.
The more you attempt these activities, the better you will understand the work.
GLOSSARY OF TERMS
ACTIVITY 0.1
(a) Make a list of all the concepts that are printed in bold type in Chapters 1, 5, 6, 7 and 8 of the
prescribed book. They serve as your glossary.
(b) Attempt to give the meanings of these concepts before you deal with the various sections so that you have
an idea before we get there.
DISCUSSION OF ACTIVITY 0.1
(a) There is a missing concept/term among the ones you listed, which is absolutely fundamental. It
appears with other terms or phrases. The term is “data”. You came across the term many times
when you studied other modules and in some other contexts. It is emphasised that it is a useful
aspect in forecasting. If you do not have data, you will not be able to make forecasts.
(b) Do not worry if the meanings you gave do not match the content in the tutorial letter or textbook.
The intention was to make you aware of aspects on which to focus in your learning. What is
required from you is a step-by-step journey through the prescribed material.
ACTIVITY 0.2
What is the meaning of the word data?
DISCUSSION OF ACTIVITY 0.2
There is a general misconception that data and information are the same concept. This is not
necessarily the case. Data are records of occurrences from which we obtain information. They are not
necessarily information on their own, although they sometimes may be. The truth is that data contain
information that becomes visible after some analysis. They are often the raw answers we receive from
an investigation.
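The distinction can be illustrated with a small sketch. The monthly sales figures below are hypothetical values invented purely for illustration (they do not come from the prescribed book); the point is only that the raw records are the data, while the summaries computed from them are information.

```python
from statistics import mean

# Hypothetical monthly sales records (units sold): these raw values are the "data".
monthly_sales = [120, 135, 128, 150, 162, 158]

# A little analysis turns the records into information:
# the average level of sales and the net change over the period.
average_sales = mean(monthly_sales)
net_change = monthly_sales[-1] - monthly_sales[0]

print(f"Average monthly sales: {average_sales:.1f} units")
print(f"Change from first to last month: {net_change:+d} units")
```

The same two summaries can be worked out on a scientific calculator; the code merely makes the steps explicit.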
WHAT TO EXPECT IN THE MODULE
In this module we use a scientific calculator to perform calculations. We will also draw graphs, form
mathematical models (equations) that are used to develop forecasts and make decisions based on
time series data. Most of these aspects were taught at first-year level. The new topic is the
pattern of time series data. The way time series data appear is unique because without this form
they cannot qualify to be time series data.
PREREQUISITES
• The ability to use a scientific calculator.
• Access to a computer package and the ability to use it are highly recommended.
• First-year statistics. These topics appear below and there will be a quick reminder whenever we
need them. We will need
- Simple linear regression
- Correlation measures
- Polynomials
- Graph plotting
When you draw plots required for statistical analysis, these plots should be accurate. Hence, use
a ruler and a lead pencil (not a pen) to construct plots. If you have access to a computer, you are
also encouraged to practise using any statistical package of your choice. Assignments may also be
prepared by means of a computer. Just make sure that you use the correct notation. Avoid using a
computer if you cannot write the correct notation. Remember that you are always welcome to contact
the lecturers whenever you have problems with any aspect of the module.
OUTCOMES
At the end of the module you should be able to do the following:
• Define and apply components of time series.
• Apply time series methods to develop forecasts.
• Specify a prototype forecast model, estimate its parameters and then validate it.
• Use the specified model to derive forecasts.
TABLE OF OUTCOMES
Each outcome below (what you should be able to do at the end of the module) is listed with its
assessment, content, activities and feedback.
Outcome: explain and expose time series components
- Assessment: analyse data; plot graphs
- Content: trend; seasonality; cycles; irregularity
- Activities: examine data visually; plot graphs
- Feedback: discuss likely errors
Outcome: select a model
- Assessment: balance factors
- Content: choosing a technique
- Activities: analyse errors; plot graphs
- Feedback: scrutinise models
Outcome: develop a model
- Assessment: forming an equation
- Content: regression; exponential smoothing
- Activities: small build-up exercises
- Feedback: emphasise aptness
Outcome: estimate parameters
- Assessment: perform estimations
- Content: estimation methods
- Activities: perform calculations
- Feedback: discuss alternatives
Outcome: validate a model
- Assessment: statistical tests
- Content: hypothesis testing
- Activities: test hypotheses
- Feedback: peruse the various tests
Outcome: develop forecasts
- Assessment: demonstrate patterns
- Content: model building
- Activities: form equations
- Feedback: visit various alternatives
You will know that you understand this module once you understand the above issues.
Feedback is not just a follow-up of the preceding concepts. It is an opportunity to reinforce some
concepts and revise others. Make use of this opportunity. Feedback is given after every activity,
sometimes with some discussion after the activity, but in many instances, it follows immediately after
the activity.
OVERVIEW
Two of the five study units comprising this module are presented in this study guide.
Unit 1: Narration of the forecasting domain and support elements
(Chapter 1 of Bowerman et al.)
In this unit we will learn more about
• Situations requiring forecasts and forecasting
• Issues about useful data and use of data in developing forecasts
• Basic types of data and approaches (quantitative and qualitative methods)
• Errors, problems and pitfalls in forecasting, as well as depiction of good forecasts
• Factors useful in choosing a forecast technique
• More about quantitative methods
Do the above issues raise some response from you? Do you have any idea of what they mean or
imply? Think and chat with your colleagues, peers or family members. Remember that learning
becomes real and effective only when sharing is involved.
Unit 2: Building a forecast model and examining / verifying its strength
(Chapter 5 of Bowerman et al.)
In this study unit we will learn about
• Multicollinearity of variables:
- Variance inflation factors
- R²
- adjusted R²
- standard error
- interval length
- C-statistic
• Residual analysis:
- residual plots
- the constant variance assumption
- assumption of correct functional form
- normality assumption
- the independence assumption
• Outliers and influential observations:
- outliers
- influential data
- diagnostic methods to detect outliers and influential observations
- leverage points
- residuals
- Cook’s distance measure
The measures dealt with in this unit ensure that the model built for use in forecasting has the
desirable property of limited error and is affected as little as possible, if at all, by outliers and
influential observations. It is also necessary to distinguish between outliers and seasonal variations.
Sometimes an effect of seasonality is mistakenly interpreted as an outlier.
We hope you have come across some of the concepts or issues above. Discuss these with your
colleagues, peers, friends or family members.
DIFFICULTIES IN FORECASTING TECHNOLOGY
Nearly all futurists describe the past as unchangeable, consisting of a collection of knowable facts.
We generally perceive the existence of only one past. When two people give conflicting stories of
the past, we tend to believe that one of them must be lying or mistaken.
This widely accepted view of the past might not be correct. Historians often interject their own beliefs
and biases when they write about the past. Facts become distorted and altered over time. It may
be that the past is a reflection of our current conceptual reference. In the most extreme viewpoint, the
concept of time itself comes into question.
The future, on the other hand, is filled with uncertainty. Facts give way to opinions. The facts of the
past provide the raw materials from which the mind makes estimates of the future. All forecasts are
opinions of the future (some more carefully formulated than others). The act of making a forecast is
the expression of an opinion. The future consists of a range of possible future phenomena or events.
DEFINING A USEFUL FORECAST
The usefulness of a forecast is not something that lends itself readily to quantification along any
specific dimension (such as accuracy). It involves complex relationships between many things,
including the type of information being forecast, our confidence in the accuracy of the forecast, the
magnitude of our dissatisfaction with the forecast, and the versatility of ways that we can adapt to or
modify the forecast. In other words, the usefulness of a forecast is an application-sensitive construct.
Each forecasting situation must be evaluated individually regarding its usefulness.
One of the first rules is to consider how the forecast results will be used. It is important to consider
who the readers of the final report will be during the initial planning stages of a project. It is wasteful
to spend resources on an analysis that has little or no use. The same rule applies to forecasting. We
must strive to develop forecasts that are of maximum usefulness to planners. This means that each
situation must be evaluated individually as to the methodology and type of forecasts that are most
appropriate to the particular application.
FORECASTS CREATE THE FUTURE
Often the way we contemplate the future is an expression of our desire to create that future.
Some argue that the future is invented, not predicted. The implication is that the future is an
expression of our present thoughts. The idea that we create our own reality is not a new concept. It
is easy to imagine how thoughts might translate into actions that affect the future.
Forecasting can, and often does, contribute to the creation of the future, but it is clear that other
factors are also operating. A holographic theory would stress the interconnectedness of all elements
in the system. At some level, everything contributes to the creation of the future. The degree to
which a forecast can shape the future (or our perception of the future) has yet to be determined
experimentally and experientially.
Sometimes forecasts become part of a creative process, and sometimes they do not. When two
people make mutually exclusive forecasts, they cannot both be true. At least one forecast is
wrong. Does one person’s forecast create the future while the other’s does not? The mechanisms
involved in the construction of the future are not well understood on an individual or social level.
ETHICS IN FORECASTING
Are predictions of the future a form of propaganda, designed to evoke a particular set of behaviours?
Note that the desire for control is implicit in all forecasts. Decisions made today are based on
forecasts, which may or may not come to pass. The forecast is a way to control today’s decisions.
The purpose of forecasting is to control the present. In fact, one of the assumptions of forecasting
is that the forecasts will be used by policy-makers to make decisions. It is therefore important to
discuss the ethics of forecasting. Since forecasts can and often do take on a creative role, no one
has the absolute right to make forecasts that involve other people’s futures.
Nearly everyone would agree that we have the right to create our own future. Goal setting is a form
of personal forecasting. It is one way to organize and invent our personal future. Each person has
the right to create their own future. On the other hand, a social forecast might alter the course of an
entire society. Such power can only be accompanied by equivalent responsibility.
There are no clear rules involving the ethics of forecasting. Value impact is important in forecasting,
i.e. the idea that social forecasting must involve physical, cultural and societal values. However,
forecasters cannot leave their own personal biases out of the forecasting process. Even the most
mathematically rigorous techniques involve judgmental inputs that can dramatically alter the forecast.
Many futurists have pointed out our obligation to create socially desirable futures. Unfortunately, a
socially desirable future for one person might be another person’s nightmare. For example, modern
ecological theory says that we should think of our planet in terms of sustainable futures. The finite
supply of natural resources forces us to reconsider the desirability of unlimited growth. An optimistic
forecast is that we achieve and maintain an ecologically balanced future. That same forecast, the
idea of zero growth, is a catastrophic nightmare for the corporate and financial institutions of the free
world. The system of profit depends on continual growth for the well-being of individuals, groups,
and institutions.
‘Desirable futures’ is a subjective concept. It can only be understood relative to other information.
The ethics of forecasting certainly involves the obligation to create desirable futures for the person(s)
that might be affected by the forecast. If a goal of forecasting is to create desirable futures, then the
forecaster must ask the ethical question, “desirable for whom?”
To embrace the idea of liberty is to recognise that each person has the right to create their own
future. Forecasters can promote libertarian beliefs by empowering people who might be affected by
the forecast. Involving these people in the forecasting process gives them the power to become
co-creators in their futures.
BENEFITS OF FORECASTING
Forecasting can help you make the right decisions, and earn/save money. Here are a few examples.
• Define better sales strategies
If a product is declining, maybe it is a good idea to consider discontinuing it. But maybe not:
maybe it is only your sales that are declining, and not your competitors'.
In this case, is there a chance that you can get your market share back?
Forecasting techniques provide answers to these questions, which are vital to your business.
• Size your inventories optimally
Time is money. Room is money. So what you want to do is use all means at your disposal in order
to reduce your stocks – without experiencing any shortages, of course.
How? By forecasting!
Forecasting is designed to help decision making and planning in the present. Forecasts empower
people because their use implies that we can modify variables now to alter (or be prepared for)
the future. A prediction is an invitation to introduce change into a system. There are several
assumptions about forecasting:
• There is no way to state what the future will be with complete certainty. Regardless of the
methods that we use there will always be an element of uncertainty until the forecast horizon
has come to pass.
• There will always be blind spots in forecasts. We cannot, for example, forecast completely new
technologies for which there are no existing paradigms.
• Providing forecasts to policy-makers will help them formulate social policy. The new social
policy, in turn, will affect the future, thus changing the accuracy of the forecast.
STUDY UNIT 1: An Introduction to Forecasting
1.1 Introduction
Table of outcomes for the study unit
Each outcome below (what you should be able to do at the end of the module) is listed with its
assessment, content, activities and feedback.
Outcome: define time series terms
- Assessment: data plots and measures
- Content: time series word list
- Activities: experiment with data
- Feedback: discuss each activity
Outcome: decompose time series
- Assessment: graph, visual
- Content: time series components
- Activities: plot graphs
- Feedback: critique the graphs
Outcome: calculate time series measures
- Assessment: stepwise exercises
- Content: errors in forecasting
- Activities: various calculations
If you understand the above outcomes, it will be an indication that you understand this study unit. It
is based on Chapter 1 of the prescribed book.
Forecasting is the scientific process of estimating some aspects of the future in situations that are
usually unknown. Prediction is a similar but more general term. Both can refer to estimation of time
series, cross-sectional or longitudinal data. Usage can differ between areas of application: for
example in hydrology, the terms "forecast" and "forecasting" are sometimes reserved for estimates of
values at certain specific future times, while the term "prediction" is used for more general estimates,
such as the number of times floods will occur over a long period. Note the emphasis in this module
that forecasting is scientific. This ensures that we do not consider subjective predictions and
spiritual prophecies as part of the scope of this forecasting module. Risk and uncertainty are central
to forecasting and prediction. Forecasting is used in the practice of customer demand planning in
everyday business forecasting for manufacturing companies. The discipline of demand planning, also
sometimes referred to as supply chain forecasting, embraces both statistical forecasting and a
consensus process. Forecasting is commonly used in discussion of time-series data. In this module
the terms are taken fairly directly from the prescribed book.
Forecasting has application in many situations:
• Supply chain management - Forecasting can be used in Supply Chain Management to make sure
that the right product is at the right place at the right time. Accurate forecasting will help retailers
reduce excess inventory and therefore increase profit margin. Accurate forecasting will also help
them meet consumer demand.
• Weather forecasting, Flood forecasting, and Meteorology
• Transport planning and Transport forecasting
• Economic forecasting
• Egain forecasting
• Technology forecasting
• Earthquake forecasting
• Land use forecasting
• Product forecasting
• Player and team performance in sports
• Telecommunications forecasting
• Political forecasting
• Sales forecasting
ACTIVITY 1.1
Consider the terms “forecasting”, “cross-sectional data” and “time series”, which are the main focus
of this study unit.
(a) Attempt to define these terms.
(b) Check the definitions in the book and compare your answers in (a).
Before we discuss the above activity, start by reading slowly through the following discussion. Make
sure you follow the discussion.
1.1.1 Forecasting
Study section 1.1 on page 2 up to the second bullet on page 3.
The few people with whom we discussed the term “forecasting” seemed to have an understanding
of the concept only “in a nutshell”. Many of them made reference to the weather forecast that
was presented on radio, television and the internet. A gap existed in the main understanding of
forecasting.
History shows that people have always been interested in the future. There are stories from
history that inform us that when people dreamed,
there were experts to explain the meanings of these dreams in terms of the future. When signs of
future drought arose, the implications of the drought were noted and plans were made to offset the
impacts that were anticipated. Drought led to hunger. Thus, when a drought was predicted,
preparations were made so that there would be enough
food for every member of the community for the duration of the drought. Predicting the future
even as it was done during those days can be referred to as forecasting. The predicted future was
then used to plan for the future as explained above.
Modern practice has encouraged that the "anticipation of the future" practice be conceptualised.
It was then formally termed “forecasting”. The current approaches are scientific in order to ensure
that forecasting is practised systematically. The predictions made are now called forecasts. In other
terms, forecasts are future expectations based on scientific guidelines.
DISCUSSION OF ACTIVITY 1.1
The first term we listed in Activity 1.1 was “forecasting”. Did you get that? Forecasting is a
“natural” operation. We have always done it, sometimes unconsciously. As was explained, predicting
activities has always been practised, even in ancient times. For self-evaluation in terms of the time
series concept, did you define the term forecasting in line with “predicting the future”?
Forecasting indicates more or less what to expect in the future. Once the future is known, preparation
for equitable allocation of resources can be made. Wastages can thus be reduced or eliminated and
gains can be enhanced (or increased).
FURTHER DISCUSSION ON FORECASTING
Forecasting is applied in various real-life situations. Six examples of applications are listed on pages
2 and 3 of the prescribed book. We are close to them at different levels. But what about something
that we as students of the University of South Africa can appreciate?
The number of student enrolments at Unisa is the starting point. The trend pattern will give an
indication of whether there has been a decline or growth in the student numbers over the years. If
you are observant, you will realise that there has been an increase in student numbers over the past
few years. Our “forecast” for next year (2013) is that there will be more students than in 2012.
ACTIVITY 1.2
Weather forecasting was mentioned as a known example where forecasting is used abundantly.
There are many others.
(a) Provide an easy example of a situation where forecasting is needed.
(b) Attempt to explain the details of the example you provided in (a).
DISCUSSION OF ACTIVITY 1.2
We discussed the Unisa example. If you are interested in Southern African politics and elections you
will be interested in making predictions about political parties that are going to be in the forefront in
the next election. We might anticipate extreme growth of one party (MDC) and decline of others in
Zimbabwe, based on the trends in the previous elections and developments that prevail. Therefore,
(a) one can for example predict how the political parties will perform in the next election; and
(b) recent performance of the various parties in previous elections may be revisited and analysed,
the current activities of the parties may be analysed closely and one may interact with people to
determine their impressions about various parties.
N.B.: Here we assume normal election conditions where no intimidation or harassment takes place.
1.1.2 Data
For this topic you need to study from the middle paragraph of page 3 to the end of page 4.
Data are important for forecasting. Quality data, which loosely refer to reliable and valid data, are the
ones needed for forecasting. We may be misled if we use data of poor quality because results are
likely to be poor as well, even if the best methods are used by a proficient analyst. The term data refers
to groups of information that represent the qualitative or quantitative attributes of a variable or set of
variables. Data (plural of "datum", which is seldom used) are typically the results of measurements
and can be the basis of graphs, images, or observations of a set of variables. Data are often viewed
as the lowest level of abstraction from which information and knowledge are derived. Raw data refers
to a collection of numbers, characters, images or other outputs from devices that collect information
to convert physical quantities into symbols, that are unprocessed.
Without data there will not be forecasting. However, it is important that data be correct (reliable, valid,
realistic, etc). Data need to be both valid for the exercise, and be reliable. If one of these is missed,
then be warned that your forecasts may mislead you or any user. Also, collection of data may
be inadequate to help in supporting the reasoning behind some findings. Experience shows that
when data are collected under certain contexts, explanations and contexts become clearer when
findings are associated with those contexts. Thus, if you assist in data collection of time series or
any statistical data, whenever possible, advise on the inclusion of details of the occurrences of the
data. Giving details around happenings assists in reducing the extent of making assumptions which
may sometimes be incorrect.
The type of information used in forecasting determines the quality of the forecasts. Not all of us like
boxing, but let us discuss the next scenario. Imagine that two boxers were going to fight on the next
Saturday. We were required to make a prediction in order to win a million rand competition. Many
participants looked at the past records of these boxers. They were informed that in the previous
seven years boxer Kangaroo Gumbu had won 25 out of 27 fights while boxer Boetie Blood had won
22 of the 30 fights he had in the same period. Gumbu was known for winning well while Blood had
lost dismally in a recent fight. Let us pause and enjoy the predictions (forecasts) made, just to make
a good point.
ACTIVITY 1.3
Either as a person interested in boxing or someone hoping to win the money, you may be tempted to
take a chance at the answer. Make a prediction of the outcome of the fight based on the explanation
given.
DISCUSSION OF ACTIVITY 1.3
Let us determine the odds as statisticians. Using frequencies, Gumbu had a probability of 0.93 of
winning the fight while Blood had probability of 0.73 of winning the fight. On the basis of these odds,
many participants predicted that Gumbu was going to win.
Do you know how the probabilities 0.93 and 0.73 have been obtained? If it is not clear, divide the
number of successes (wins) of each boxer by the total number of fights that each boxer had fought.
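The division described above can be checked with a short Python sketch. The function name `win_proportion` is our own illustrative choice and does not come from the prescribed book; the records are those quoted in the scenario.

```python
def win_proportion(wins, fights):
    """Relative frequency of wins: number of wins divided by number of fights."""
    return wins / fights

# Records quoted in the scenario (previous seven years).
gumbu = win_proportion(25, 27)
blood = win_proportion(22, 30)

print(f"Gumbu: {gumbu:.2f}")  # 25/27 rounds to 0.93
print(f"Blood: {blood:.2f}")  # 22/30 rounds to 0.73
```

Note that such a proportion only summarises the past record. It says nothing about the quality of the opponents, which is exactly the point taken up in the discussion that follows.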
The data given were based on certain assumptions. Among others, there was the impression that
the opponents of the two boxers were of the same quality. If they were not, then the prediction would
be carrying some “inaccuracies”. Among other omissions, we were not told that the boxing bout
was going to be held in the catchweight division, where boxers came from different weight divisions
and could not both fall within a single previously defined weight division. Blood had fought only
world-class opponents and came from two weight divisions heavier than the weight to which Gumbu
belonged. That is, there was a difference between the original weights of the two boxers. Gumbu,
on the other hand, was a boxer who talked too much. He had fought some mediocre opponents and
wanted to pretend he was an excellent boxer. He had asked for the fight. In insisting on the fight,
he had called Blood a coward until the bout was sanctioned. At the time he was preparing for an
elimination bout in his weight division after which he was going to fight for a world title if he won.
The planned elimination bout was probably going to be the first real test for Gumbu as a professional
fighter. It was going to come “after I am done with Blood,” boasted Gumbu.
In the street some people were predicting that Gumbu was going to lose, but they did not bet as
money was required. None of those who paid to enter the competition predicted correctly. The fight
ended with a first-round knockout. Blood was the winner. Gumbu was no match.
DISCUSSION OF THE BOXING SCENARIO
The records given were correct, but not complete. Records are past data. We need complete
data and the exact context in which they occurred in order to be able to make accurate forecasts.
The analyses that were made about the boxers were correct, but some assumptions were wrong.
Assumptions are used to build cases, and methods are developed on conditions that are given as
assumptions. Wrong assumptions may lead to inappropriate methods for data analysis. In cases
where information can be found to limit the use of assumptions, this should be done. However, many
cases provide inadequate information, leaving us with no choice but to depend on assumptions.
Analysis should depend on reasonable assumptions. If in actual practice assumptions are made
for the sake of doing something, decisions and results reached may lead to improper actions. The
analyst should learn the art of making appropriate or reasonable assumptions.
In the case of the example/scenario given, the details were missing, such as that the two boxers were
of different weights. If we knew, this would have helped in our analysis. Sometimes in predicting
about forthcoming games, one needs to also know the quality of opposition that the two opponents
have met in the accumulation of their records. This was also missing in the example. We will insist
on the use of valid assumptions because, as we saw, wrong or invalid assumptions are likely to give
inaccurate predictions. The paragraph after the last bullet on page 3 of the prescribed book explains
possible repercussions that come with wrong assumptions (Bowerman et al., 2005: 3).
Types of data that are common in real life are cross-sectional data and time series data. Study
the definition of cross-sectional data in the rectangle on page 3. Cross-sectional data refers to data
collected by observing many subjects (such as individuals, firms or countries/regions) at the same
point of time, or without regard to differences in time. Analysis of cross-sectional data usually consists
of comparing the differences among the subjects. For example, we want to measure current obesity
levels in a population. We could draw a sample of 1,000 people randomly from that population (also
known as a cross section of that population), measure their weight and height, and calculate what
percentage of that sample is categorized as obese. Even though we may analyse cross-sectional
data for quality forecasts, in this module we use time series data.
Study the definition of time series on page 4.
We will have to be careful when we collect time series data. If the data are listed without time
specification, then we should not consider the data to be time series.
SCENARIO
Read the following scenario carefully and make notes as we will keep on referring back to it.
Suppose that Jabulani is a milk salesperson during the week, serving the Florida, Muckleneuk and
VUDEC UNISA campuses. Very fortunately for Jabulani, his milk cows increased and his market
in these campuses also increased from year to year. Jabulani’s business runs from Mondays to
Sundays. (In a time series analysis a typical question would be: what can we say about the trend
of the sales?) Asked differently: should we believe that the sales have a decreasing or increasing
trend? It will be clear later on that the sales levels differ according to days, high on some days and
low on others. The pattern of low sales or high sales on different days has an important connotation
in time series analysis. This will be discussed.
ACTIVITY 1.4
You have done some first-year statistics modules/courses and some of you did mathematics modules
as well. Let us consider the following data sets and look at them quite closely.
Data set 1.1
16 14 19 26 11 24 10
18 15 21 24 12 21  9
21 15 20 27 13 25 11
24 17 24 31 14 27 13

Data set 1.2
16 18 21 24
14 15 15 17
19 21 20 24
26 24 27 31
11 12 13 14
24 21 25 27
10  9 11 13
(a) The two data sets have exactly the same numbers. There is something strange about their
appearances though. Compare the two data sets.
(b) Can these two data sets be classified as time series data sets? Explain.
DISCUSSION OF ACTIVITY 1.4
On whether data are time series or not
When information about the data presented is limited, there also tends to be a limited feedback from
an analysis made from them. You probably realised that the rows of data set 1.1 are the same as the
columns of data set 1.2 and vice versa. Or, in short, that the data sets are transposes of each other.
The data in their current form cannot be classified as time series data since no chronological pattern
of the time at which they were collected is given. This will become clearer as we proceed.
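The claim that the two data sets are transposes of each other can be verified mechanically. The sketch below assumes that data set 1.1 is read as four rows of seven values and data set 1.2 as seven rows of four values; the helper name `transpose` is our own.

```python
# Data set 1.1 read as 4 rows of 7 values (an assumption about the layout).
data_set_1_1 = [
    [16, 14, 19, 26, 11, 24, 10],
    [18, 15, 21, 24, 12, 21, 9],
    [21, 15, 20, 27, 13, 25, 11],
    [24, 17, 24, 31, 14, 27, 13],
]

# Data set 1.2 read as 7 rows of 4 values (same assumption).
data_set_1_2 = [
    [16, 18, 21, 24],
    [14, 15, 15, 17],
    [19, 21, 20, 24],
    [26, 24, 27, 31],
    [11, 12, 13, 14],
    [24, 21, 25, 27],
    [10, 9, 11, 13],
]

def transpose(table):
    """Turn the rows of a rectangular table into columns."""
    return [list(column) for column in zip(*table)]

is_transpose = transpose(data_set_1_1) == data_set_1_2
print(is_transpose)
```

The check confirms only that the numbers agree. It cannot tell us in which order the values were collected, which is why neither layout on its own qualifies as a time series.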
Discussion
The data above do not necessarily represent time series data, but they can be presented in another way
to form time series data - provided they were collected chronologically over regular time intervals.
Suppose data set 1.1 represents the sales of milk sold by Jabulani from Monday to Sunday for four
weeks. Let 1 = Monday, 2 = Tuesday, ..., 7 = Sunday as given in data set 1.3. The data sets should
therefore be presented as follows:
Data set 1.3 Litres of milk sold by Jabulani
            Day
Week    1   2   3   4   5   6   7
1      16  14  19  26  11  24  10
2      18  15  21  24  12  21   9
3      21  15  20  27  13  25  11
4      24  17  24  31  14  27  13
We emphasise that in the initial presentation there was simply no information to explain or
demonstrate the chronological sequence with respect to time and that the data were therefore not
time series data.
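To make the role of the time index concrete, the following sketch attaches a week and a day to every value of data set 1.3 and lists the observations in chronological order. The variable names and the running index `t` are our own illustrative choices.

```python
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Data set 1.3: rows are weeks 1-4, columns are days 1-7 (1 = Monday).
litres = [
    [16, 14, 19, 26, 11, 24, 10],
    [18, 15, 21, 24, 12, 21, 9],
    [21, 15, 20, 27, 13, 25, 11],
    [24, 17, 24, 31, 14, 27, 13],
]

# Chronological order: all days of week 1, then all days of week 2, and so on.
series = [
    (week, day, value)
    for week, row in enumerate(litres, start=1)
    for day, value in enumerate(row, start=1)
]

# t = 1, 2, ..., 28 is the time index that turns the numbers into a time series.
for t, (week, day, value) in enumerate(series, start=1):
    print(f"t={t:2d}  week {week}  {DAY_NAMES[day - 1]:<9}  {value} litres")
```

Without the week and day labels the same 28 numbers could be ordered in many ways. With them, there is exactly one chronological sequence.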
ACTIVITY 1.5
You are required to use graphs in addition to other methods to detect patterns in time series data.
Graphical plots reveal information visually, but cannot always be drawn with ease. The example
that follows is one of the easy cases where we can draw graphical plots. Analyse the data about
Jabulani’s business by answering the following questions. Make any comments that you believe are
relevant.
(a) Are they time series data? Justify your answer.
(b) Plot the data to reveal the pattern using the following approaches:
(i) Plot the data for each week separately.
(ii) Plot the data of all the weeks in one graphical display.
(iii) Compare the shapes of the graphs.
(c) Which plot provides us with a better idea of comparison?
DISCUSSION OF ACTIVITY 1.5
Whether a data set forms a time series or not depends entirely on its form, that is, on the
chronological order in which the various data points are presented. Did you answer
"yes" in question (a)? If not, what did you find? How did you find it?
(b) Graphs of the activity
(i) Graphs for separate weeks
[Figure: four separate line graphs, one each for Week 1, Week 2, Week 3 and Week 4, each plotting litres of milk (vertical axis) against days 1 to 7 (horizontal axis).]
(ii) Graph for data of all the weeks
[Figure: one line graph of litres of milk (vertical axis) against days 1 to 7 (horizontal axis), with a separate line for each of Week 1, Week 2, Week 3 and Week 4.]
(iii) In terms of the pattern, the graphs reveal that milk sales were highest on Thursdays, Saturdays
and Wednesdays (in order from highest to lowest). The lowest sales were revealed for
Sundays, Fridays, Tuesdays and Mondays (in the order from lowest to highest).
(c) The graphs can be difficult to compare when they are on separate systems of axes. The last
graph makes comparison very easy, revealing that the patterns for all four weeks are similar.
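The ranking of the days in (iii) can also be reproduced numerically. The sketch below adds up the litres sold on each day of the week over the four weeks of data set 1.3 and sorts the days from highest to lowest. Using totals rather than some other summary is our own choice for illustration.

```python
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Data set 1.3: rows are weeks 1-4, columns are days 1-7 (1 = Monday).
litres = [
    [16, 14, 19, 26, 11, 24, 10],
    [18, 15, 21, 24, 12, 21, 9],
    [21, 15, 20, 27, 13, 25, 11],
    [24, 17, 24, 31, 14, 27, 13],
]

# Total litres per day of the week, summed over the four weeks.
day_totals = {name: sum(column) for name, column in zip(DAY_NAMES, zip(*litres))}

# Days ordered from the highest total to the lowest.
ranked_days = sorted(day_totals, key=day_totals.get, reverse=True)

for name in ranked_days:
    print(f"{name:<9} {day_totals[name]}")
```

The first three days in the ranking are Thursday, Saturday and Wednesday, and the last is Sunday, which agrees with what the graphs show.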
The patterns of the highest activity and lowest activity about a phenomenon are important in time
series. Jabulani will easily know when he does more business, when he does least business and he
can plan to find better ways to improve business. Let us start formalising these patterns.
1.1.3 Components of a time series
The components of a time series serve as the building blocks of a time series and describe its pattern
(study pp. 5-7 of the textbook up to the end of section 1.2).
Components are important because they enable us to see the salient features of a structure. Through
them we can make descriptions of what we need to analyse. When we deal with something that we
can describe, we are better able to know the requirements for dealing with it. A time series also has
components that need to be considered and taken care of in its analysis.
Trend
The first component we discuss is trend. The term “trend” is about long-term decline or growth of an
activity. It is defined formally as the upward or downward movements that characterise a time
series over a period of time.
Time series data may show upward trend or downward trend for a period of years. This may be
due to factors such as increase in population, change in technological progress, large scale shift in
consumers’ demands, and so on. For example, population increases over a period of time, price
increases over a period of years, production of goods on the capital market of the country increases
over a period of years. These are the examples of upward trend. The sales of a commodity may
decrease over a period of time because of better products coming to the market. This is an example
of declining trend or downward trend. The increase or decrease in the movements of a time series
is called trend.
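As a rough numerical illustration of an upward trend, the sketch below uses Jabulani's milk data (data set 1.3). It adds the sales of each week, so that the pattern within the week is smoothed out, and then fits a straight line to the four weekly totals by least squares. Both steps are our own simplifying assumptions, and four points are far too few for a serious analysis; modelling trend with regression is taken up in Unit 3.

```python
# Data set 1.3: rows are weeks 1-4, columns are days 1-7 (1 = Monday).
litres = [
    [16, 14, 19, 26, 11, 24, 10],
    [18, 15, 21, 24, 12, 21, 9],
    [21, 15, 20, 27, 13, 25, 11],
    [24, 17, 24, 31, 14, 27, 13],
]

# Weekly totals smooth out the day-to-day pattern within each week.
weekly_totals = [sum(week) for week in litres]

def least_squares_line(y):
    """Fit y = intercept + slope * t for t = 1, 2, ..., len(y)."""
    n = len(y)
    t = list(range(1, n + 1))
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope

intercept, slope = least_squares_line(weekly_totals)

# A positive slope suggests an upward trend, a negative slope a downward trend.
direction = "upward" if slope > 0 else "downward" if slope < 0 else "no"

print("Weekly totals:", weekly_totals)
print(f"Fitted line: total = {intercept:.1f} + {slope:.1f} * week")
print(f"This suggests an {direction} trend." if direction != "no" else "No trend.")
```

The weekly totals are 120, 120, 132 and 150 litres, and the fitted slope is positive, which is consistent with the statement in the scenario that Jabulani's market increased.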
Usually one would not be able to determine from looking at the data whether there is a decreasing
or increasing trend. There are times (but rarely) when we can see the pattern by inspection. Often a
graphical plot clearly shows the trend. The trend may be given in shapes such as linear, exponential,
logarithmic, polynomial, power function, quadratic, and other forms. In general, we use the graphical
displays to find out if there is a decline or increase in the activity. Some examples of trend applications
that we must look at are given on page 5 of Bowerman et al. (2005). Study them.
- Technological changes in the industry
Currently, companies increase ICT usage in their activities for competitive edge over those that do
not incorporate it. Institutions of higher learning have aggressively incorporated ICT in facilitating
learning, especially the distance education ones.
- Changes in consumer tastes
Housing is very expensive and scarce, but for obvious reasons remains a priority for households.
Recently, cities such as Cape Town, Durban, East London, Johannesburg, Port Elizabeth and
Pretoria have experienced a high influx of people from other areas, and employment is biased
towards the youth. As a result housing in these cities is biased towards townhouses and flats.
- Increases in total population
There is an increase since there are more births than deaths. In SA, there is also an influx of
people from other countries. In other countries, natural deaths and deaths resulting from
holocausts, wars, terrorism and natural disasters such as tsunamis have been numerous,
but far fewer than the births that have occurred over the years. That is
why there is an increase in the world’s population.
- Market growth
In Gauteng, the market of umbrellas decreases in the period April to July. During the rainy season,
which in Gauteng happens to be the summer season, the sales of umbrellas increase.
- Inflation or deflation (price changes)
If we consider one item for simplicity, maize is produced in the period October to May,
approximately. During the entry period, the price of maize is high because there are more people
looking for a less available commodity. During the periods November to January, maize is in
abundance and the prices drop. As the production level declines, the prices start increasing
again.
ACTIVITY 1.6
Discuss what a time series is, and discuss the meaning of trend effects, seasonal variations, cyclical
variations, and irregular effects.
DISCUSSION OF ACTIVITY 1.6
You should mention a sequence of observations of a variable presented in chronological form
when you describe a time series. Trend should imply a long-term tendency of that time series.
Seasonality should include a periodic pattern in the data. Describing cycles should imply up and
down movements of observations around trend levels. Irregular pattern is the portion of the time
series which cannot be accounted for by the three patterns discussed above.
Exploration data set
The next data set is important for exploration. ENJOY IT. It represents the litres of milk that were
demanded from Jabulani. Whether there was stock or not is not an issue here. The data set will be
revisited time and again.
Data set 1.4
            Day
Week    1   2   3   4   5   6   7
1      16  14  19  26  11  24  10
2      18  15  21  24  12  21   9
3      21  15  20  27  13  25  11
4      24  17  24  31  14  27  13
In general, methods of forecasting that depend on non-numeric information are qualitative forecasting
methods. (Do you remember this from first-year Statistics?) Qualitative data are nominal/words data.
Quantitative forecasting methods on the other hand depend on numerical data.
Bowerman et al. (2005: 7) present a graphical plot Figure 1.1 (a) to display an example of a trend
in a time series. There is no trend line to describe the trend, but can you explain whether there is a
decreasing or increasing trend in the plot to which we are referring?
Cycle
The next component of time series that we discuss is “cycle”. When trends have been identified, there
may be some recurring up and down movements visible around trend levels. These movements are
called cycles. Cycles occur over long and medium terms. Page 5 of Bowerman et al. (2005) presents
this component.
Some interesting explanation is presented by Bowerman et al. (2005: 5) about business cycles.
Study it in detail. Bowerman et al. (2005: 7) present Figure 1.1 (c) to display an example of a cycle
in a time series. We need to note that generally, natural occurrences have shown some cyclical
patterns over the years.
The impact of cycles on a time series is either to stimulate or depress its activity, but in general,
their causes are difficult to identify and explain. Certain actions by institutions such as government,
trade unions, world organisations, and so on, can induce levels of pessimism and optimism into the
economy which are reflected in changes in the time series levels. Economic indices are usually used
to describe cyclical fluctuations.
Cyclical variations are recurrent upward or downward movements in a time series, but the period of
a cycle is greater than a year. This restriction makes it different from trend. Also, cyclical variations
are not as regular as seasonal variations. There are different types of cycles, varying in length and
size. The ups and downs in business activities are the effects of cyclical variation. A business
cycle showing these oscillatory movements passes through four phases: prosperity, recession,
depression and recovery. In a business, these four phases are completed by passing from one to the next
in this order. Together, they form a cycle.
Cycles are useful in long-term forecasting. Usually this means centuries and millennia. Our
capabilities and interest in this module do not require us to look beyond a decade. Hence, methods
for developing forecasts that include cycles (or cyclical components) are not in the scope of this
module. However, you still need to understand when cycles are discussed or implied in a forecasting
situation.
Seasonality
The example about milk is given over weekly periods. The definition given by Bowerman et al. (2005:
6) is somewhat misleading! The impression it gives is that observations being investigated, must run
over a year. This is simply not the case. Even the values occurring within a day can be seen to be
seasonal, as you will soon see. First, we provide a more useful and realistic definition of seasonality,
which will be used in the module. The definition given in Bowerman et al. works when the pattern
repeats over yearly periods. Let us define the concept in the next line:
Seasonal variations are systematic variations that occur within a period and which are tied to some
properties of that period. They are repeated within the period. They are indeed periodic patterns in a
time series that complete themselves within a calendar period and are repeated on the basis of that
period.
Seasonal variations are short-term fluctuations in a time series which occur periodically in a period,
such as a year. In this case it would continue to be repeated year after year. The major factors that
are responsible for the repetitive pattern of seasonal variations are weather conditions and customs
of people. More woolen clothes are sold in winter than in summer. Regardless of the
trend we can observe that in each year more ice cream is sold in summer and very little in the winter
season. Sales in department stores are higher during festive seasons than on normal
days.
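In Jabulani's milk data the period is a week and the "seasons" are the days of the week. A crude way to quantify such a pattern is to divide the average sales on each day by the overall average. A ratio above 1 marks a day that is busier than average. This ratio is our own illustrative device and ignores the trend from week to week. It is not the seasonal adjustment procedure of the prescribed book.

```python
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Data set 1.3: rows are weeks 1-4, columns are days 1-7 (1 = Monday).
litres = [
    [16, 14, 19, 26, 11, 24, 10],
    [18, 15, 21, 24, 12, 21, 9],
    [21, 15, 20, 27, 13, 25, 11],
    [24, 17, 24, 31, 14, 27, 13],
]

n_weeks = len(litres)
n_days = len(DAY_NAMES)

# Average litres sold on each day of the week, taken over the four weeks.
day_means = [sum(column) / n_weeks for column in zip(*litres)]

# Average litres sold per day over all 28 observations.
overall_mean = sum(sum(week) for week in litres) / (n_weeks * n_days)

# Ratio of each day's average to the overall average (illustrative index).
seasonal_index = {
    name: mean / overall_mean for name, mean in zip(DAY_NAMES, day_means)
}

for name, index in seasonal_index.items():
    print(f"{name:<9} {index:.2f}")
```

Thursday has the largest ratio and Sunday the smallest, and the pattern repeats every week. This is what makes it seasonal in the sense defined above, even though the period is much shorter than a year.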
Irregular fluctuations
We have not mentioned whether Jabulani was ever robbed of his revenue or stock for his business.
Now we are giving you bad news.
Irregular fluctuations are variations in time series that are short in duration, erratic in nature and
follow no regularity in the occurrence pattern. These variations are also referred to as residual
variations since by definition they represent what is left out in a time series after trend, cyclical and
seasonal variations have been accounted for. Irregular fluctuations result from the occurrence of
unforeseen events like floods, earthquakes, wars, famines, and so on.
Remember that Jabulani was a smart entrepreneur who would make some estimations of revenue
each morning he left for work. One Tuesday afternoon after he had counted what he thought was his
revenue for the day, he was robbed by two thugs. Fortunately he was neither hurt nor discouraged
to continue with his business. It was happening for the first time. Could he have anticipated being
robbed on that day? We also could not have predicted that event.
The point is, that irregular event changed what could have been the revenue and/or profit for that
day. In time series, irregular fluctuations, which are also called irregular variations, refer to random
fluctuations that are attributed to unpredictable occurrences. Bowerman et al. (2005: 6) appropriately
define them as erratic movements in a time series that follow no recognisable or regular pattern. The
presentation about this concept simply implies that these patterns cannot be accounted for. They
are once-off events. Examples are natural disasters (such as fires, droughts, floods) or man-made
disasters (strikes, boycotts, accidents, acts of violence and so on).
Note that all the components of a time series influence the time series and can occur in any
combination. The most important problem to be solved in forecasting is trying to match the
appropriate model to the pattern of the time series data.
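The idea that the irregular component is "what is left" can be sketched with Jabulani's milk data. The sketch assumes that the components simply add up, uses the mean of each week as a rough stand-in for the level of the series in that week, and uses the mean of each day of the week for the pattern within the week. None of these assumptions comes from the prescribed book, and the robbery described above is not reflected in these sales figures. The sketch only shows the mechanics of removing what can be explained and looking at the remainder.

```python
# Data set 1.3: rows are weeks 1-4, columns are days 1-7 (1 = Monday).
litres = [
    [16, 14, 19, 26, 11, 24, 10],
    [18, 15, 21, 24, 12, 21, 9],
    [21, 15, 20, 27, 13, 25, 11],
    [24, 17, 24, 31, 14, 27, 13],
]

n_weeks = len(litres)
n_days = len(litres[0])

grand_mean = sum(sum(week) for week in litres) / (n_weeks * n_days)
week_means = [sum(week) / n_days for week in litres]             # level per week
day_means = [sum(column) / n_weeks for column in zip(*litres)]   # within-week pattern

# Assumed additive fit: week level plus day effect (day mean minus grand mean).
# Whatever the fit does not explain is treated as the irregular part.
irregular = [
    [
        litres[w][d] - (week_means[w] + day_means[d] - grand_mean)
        for d in range(n_days)
    ]
    for w in range(n_weeks)
]

for w, row in enumerate(irregular, start=1):
    print(f"Week {w}:", "  ".join(f"{value:6.2f}" for value in row))
```

Under these assumptions the leftover values add up to zero and show no obvious pattern of their own, which is what we expect of an irregular component.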
1.1.4 Applications of forecasting
Forecasting has application in many situations. Among others, it can be applied in:
• Supply chain management - Forecasting can be used in Supply Chain Management to make sure
that the right product is at the right place at the right time. Accurate forecasting will help retailers
reduce excess inventory and therefore increase profit margin. Accurate forecasting will also help
them meet consumer demand.
• Weather forecasting, Flood forecasting and Meteorology
• Transport planning and Transportation forecasting
• Economic forecasting
• Technology forecasting
• Earthquake prediction
• Land use forecasting
• Product forecasting
• Player and team performance in sports
• Telecommunications forecasting
• Political forecasting
1.2 Forecasting Methods
This topic is discussed on pages 7 to 12. Study these pages. On page 7 there is a reminder that
there is no single best forecasting method. There are, however, appropriate methods for any time
series situation. The forecasting methods are described along the same line as types of data that you
dealt with in your Statistics courses/modules at first year level. They are qualitative and quantitative
in nature.
1.2.1 Qualitative methods
Study this topic from page 8 to page 11.
The textbook explains on page 8 that generally, qualitative forecasting methods become an option
to develop forecasts in situations where there are no historical numeric data or where statisticians
trained in time series are not available. Opinions of experts are generally used to make predictions
in such cases. Predictions are necessary in all situations, even where there are no data. When this
occurs, qualitative methods are used.
Common examples of qualitative forecasting methods are judgemental methods. Judgemental
forecasting methods incorporate intuitive judgements, opinions and subjective probability estimates.
Examples are:
• Composite forecasts
• Surveys
• Delphi method
• Scenario building
• Technology forecasting
• Forecast by analogy
You do not need to learn more about these for the requirements of this module. However, you
may come across them in applications. Hence, your encounter with them may be of help in future
applications.
1.2.2 Quantitative methods
Quantitative forecasting methods are used (and only possible) when historical data that occur in
numeric form are available. These methods may occur as univariate forecasting methods or as
causal methods (Bowerman et al., 2005: 11).
Univariate forecasting methods depend only on past values of the time series to predict future
values. In this method, data patterns are identified from historical data, the assumption is made
that the patterns will continue in the future and then the pattern is extrapolated in order to develop
forecasts. Study this topic on page 11.
Causal forecasting models start by identifying variables that are related to the one to be predicted.
This is followed by forming a statistical model that describes the relationship between these
variables and the variable to be forecast. The common ones are regression models and ordinary
polynomials. Study this topic on page 11.
In the causal forecasting method, the variable of interest, which is the one whose forecasts are
required, depends on other variables. It is thus the dependent variable. The ones on which the
variable of interest depends are known as the independent variables.
Discussion about dependence/independence
Note that Jabulani’s customers are mostly people who receive wages on a weekly basis. Some are
paid on Saturday afternoon, but an overwhelming majority is paid on Friday afternoon. In addition,
on Saturday afternoon, there is an item P that is also liked by many milk buyers. If item P is available
before milk arrives, then this item is bought in large quantities, leaving limited disposable income for
the milk purchases. Fortunately for Jabulani, he has, in the past four weeks, managed to deliver milk
before item P was delivered. However, most of the buyers who are paid on Saturday tend to meet
the P seller before their milk purchases on Sunday morning.
It is necessary to understand dependencies and correlations when dealing with forecasting. If you fail
to understand them, you may fall into the trap of making wrong assumptions: ignoring influences that
affect the variable being forecast, or the constraints that come with correlated variables, can lead to
inaccurate models and hence to wrong forecasts.
Useful common examples are time series and causal methods. There are others as well, but the
following may be of help in your development.
Time series methods
Time series methods use historical data as the basis of estimating future outcomes.
• Rolling forecast: a projection into the future based on past performance, routinely updated
on a regular schedule to incorporate new data.
• Moving average
• Exponential smoothing
• Extrapolation
• Linear prediction
• Trend estimation
• Growth curve
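Two of the methods listed above, the moving average and exponential smoothing, can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the procedure prescribed by the textbook: the data are the week-3 actual milk sales used later in this unit, while the function names, the window length of 3, the smoothing constant of 0.3 and the choice to start the smoothed level at the first observation are all assumptions made for this example.

```python
# Week-3 actual milk sales (litres) from the Jabulani example in this unit.
sales = [21, 15, 20, 27, 13, 25, 11]


def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    recent = series[-window:]
    return sum(recent) / window


def exponential_smoothing_forecast(series, alpha):
    """Simple exponential smoothing.

    The level starts at the first observation (an assumption of this sketch)
    and is updated as alpha * observation + (1 - alpha) * previous level.
    The final level serves as the forecast for the next period.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


# Illustrative parameter values only (assumptions, not from the study guide).
ma3 = moving_average_forecast(sales, window=3)
ses = exponential_smoothing_forecast(sales, alpha=0.3)
print(f"3-day moving average forecast: {ma3:.2f} litres")
print(f"Exponential smoothing forecast (alpha=0.3): {ses:.2f} litres")
```

A larger window or a smaller alpha gives a smoother forecast that reacts more slowly to recent changes; a smaller window or a larger alpha follows the latest observations more closely.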
Causal / econometric methods
Some forecasting methods use the assumption that it is possible to identify the underlying factors
that might influence the variable that is being forecasted. For example, sales of umbrellas might
be associated with weather conditions. If the causes are understood, projections of the influencing
variables can be made and used in the forecast.
• Regression analysis using linear regression or non-linear regression
• Autoregressive moving average (ARMA)
• Autoregressive integrated moving average (ARIMA), e.g. Box-Jenkins
• Econometrics
Other methods
• Simulation
• Prediction market
• Probabilistic forecasting and ensemble forecasting
• Reference class forecasting
These methods are given to you so that when you make references from other forecasting sources,
you will be able to understand where they belong in your module. However, they are not necessarily
required to the extent that is presented in those other sources.
ACTIVITY 1.7
• Do you see any dependence of the variables?
Hint: Focus on milk purchases and disposable income.
DISCUSSION OF ACTIVITY 1.7
Keeping to the hint, the purchase of an item that is in high demand depends on the availability of
disposable income.
ACTIVITY 1.8
(a) Classify the milk sales in the latest scenario as a dependent or independent variable.
(b) Explain your choice in (a) above. Here confine your response to milk purchases and disposable
income.
(c) Identify the dependent variable and the independent variable.
DISCUSSION OF ACTIVITY 1.8
(a) Milk sales are a dependent variable. (b) This is because milk sales depend on the availability
of disposable income. (c) Milk sales are therefore the dependent variable and disposable income
is the independent variable.
1.3 Errors in forecasting and forecast accuracy
When it was said that the pattern of information given, such as Jabulani’s milk sales, can help you
make future predictions, no one said your predictions would be perfect.
It is time to note that if the forecasts developed are not accurate, they may be useless, since
they are likely to mislead the user. When we insisted on a scientific method in forecasting, it
was to ensure that we can monitor the methods and test the models so that the inaccuracies in them
are reduced or, ideally, eliminated.
It is important to know the likely errors when you attempt to make predictions or develop forecasts.
If you know them, you can avoid or minimise them. Error is as simple as when you thought Jabulani
was going to sell 500 litres in a specific week and he ends up selling 520 litres. (Note that you could
make an error in litres of milk by overestimating as well.)
The next sections require your learned skill of drawing graphs and interpreting them. The
most common ones you should expect to encounter (draw and interpret) are scatter diagram (or
scatterplot) and time plot. Revise them if you have already forgotten how they are drawn.
Further, you are soon going to engage in a number of calculations. Thus, ensure that you are
ready to perform them, and that you remember the descriptive statistics you learnt in your early years
of Statistics. It is also very important to be able to know why the calculations are necessary in any
exercise of building a forecast model.
Bowerman et al. (2005: 12) name two types of forecasts, the point forecast and the prediction
interval. A point forecast is a single number that estimates the actual observation. A prediction
interval is a range of values that gives us some confidence that the actual value is contained in the
interval.
The forecast error as defined in Bowerman et al. (2005: 13) requires that the estimate be found and
be “paired” with the actual observation.
In statistics, a forecast error is the difference between the actual or real and the predicted or forecast
value of a time series or any other phenomenon of interest. In simple cases, a forecast is compared
with an outcome at a single time-point and a summary of forecast errors is constructed over a
collection of such time-points. Here the forecast may be assessed using the difference or using
a proportional error. By convention, the error is defined using the value of the outcome minus the
value of the forecast. In other cases, a forecast may consist of predicted values over a number of
lead-times; in this case an assessment of forecast error may need to consider more general ways of
assessing the match between the time-profiles of the forecast and the outcome. If a main application
of the forecast is to predict when certain thresholds will be crossed, one possible way of assessing
the forecast is to use the timing-error—the difference in time between when the outcome crosses
the threshold and when the forecast does so. When there is interest in the maximum value being
reached, assessment of forecasts can be done using any of:
· the difference of times of the peaks;
· the difference in the peak values in the forecast and outcome;
· the difference between the peak value of the outcome and the value forecast for that time point.
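The three peak-based assessments listed above can be sketched in Python as follows. This is a hedged illustration only: it borrows the week-3 actual and estimated milk sales that appear later in this unit, and the variable names and the sign convention for the differences (outcome minus forecast, following the convention for forecast errors stated above) are assumptions of this sketch.

```python
# Week-3 data from the Jabulani example (days 1 to 7).
actual = [21, 15, 20, 27, 13, 25, 11]    # outcomes
forecast = [27, 11, 20, 26, 14, 22, 9]   # estimates

# Day (1-based) on which each series reaches its maximum.
peak_day_actual = max(range(len(actual)), key=lambda i: actual[i]) + 1
peak_day_forecast = max(range(len(forecast)), key=lambda i: forecast[i]) + 1

# 1. Difference of the times of the peaks (in days).
peak_timing_difference = peak_day_actual - peak_day_forecast

# 2. Difference between the peak values of the outcome and the forecast.
peak_value_difference = max(actual) - max(forecast)

# 3. Difference between the peak value of the outcome and the value
#    forecast for that same time point.
value_at_peak_difference = max(actual) - forecast[peak_day_actual - 1]

print(peak_timing_difference, peak_value_difference, value_at_peak_difference)
```

For these data the outcome peaks on Day 4 and the forecast on Day 1, so the forecast matches the height of the peak but misses its timing by three days.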
Forecast error can be a calendar forecast error or a cross-sectional forecast error, when we want to
summarize the forecast error over a group of units. If we observe the average forecast error for a
time-series of forecasts for the same product or phenomenon, then we call this a calendar forecast
error or time-series forecast error. If we observe this for multiple products for the same period, then
this is a cross-sectional forecast error.
To calculate the forecast errors we subtract the estimates (ŷi) from the actual observations (yi). The
difference is the forecast error. Can you tell what the values of the forecast errors imply? For
example, some may be smaller than others, some negative and others positive!
When Jabulani plans his sales, he makes some estimation of litres of milk that he hopes to sell. In
Week 3 prior to getting to the market, he had made the following estimations (ŷi):
Week  Day  Litres of milk estimated (ŷ)
3     1    27
      2    11
      3    20
      4    26
      5    14
      6    22
      7     9
Remember to refer to the appropriate week of the table of Data set 1.4 for observed values (yi).
ACTIVITY 1.9
(a) On which days was there overestimation?
(b) On which days was there underestimation?
(c) Calculate the forecast errors for these estimates.
(d) Identify the day on which the milk sales were most disappointing! Explain.
(e) On which day did he make the best prediction? Why?
DISCUSSION OF ACTIVITY 1.9
We have not defined the terms overestimation and underestimation formally. They have been
defined in other modules, but a reminder is in order. If you make a prediction and the actual
observation turns out to be smaller, you have overestimated. What is the sign of the forecast
error? Can you now define the term “underestimation”? What about the sign of the forecast error?
Let us get into the questions of the activity. The setup of week 3 is as follows:
Actual observations (yi)     21 15 20 27 13 25 11
Estimated observations (ŷi)  27 11 20 26 14 22  9
(a) Overestimations are visible after pairing by observing the pairs in which the actual observations
are lower than the estimates. These were on Day 1 and Day 5.
(b) Underestimations occurred on Day 2, Day 4, Day 6 and Day 7.
(c) The forecast errors are −6, 4, 0, 1, −1, 3 and 2 for the seven days, respectively.
(d) Day 1 was the most disappointing. This is because Jabulani expected to sell 27 litres but only
sold 21 litres. It is the day with the largest shortfall, that is, the largest negative error.
(e) He made the best prediction on Day 3, where the sales were equal to the estimate.
If there had been no day on which the sales and estimates were equal, then the day with the smallest
forecast error in absolute value would have been the one on which the best prediction was made. This means
that Day 4 and Day 5 are the days on which good predictions were made. However, we note that
(b) The plot looks almost random. This means that the forecasting technique provides a good fit to
the data.
1.3.1 Absolute deviation
Forecast errors are used to calculate absolute deviations. The absolute deviation (Bowerman et al.,
2005: 15) requires the forecast errors in absolute terms, i.e., a matter of “how far is the estimate from
the actual observation”.
ACTIVITY 1.11
Calculate the absolute deviations for the estimates in Activity 1.9.
DISCUSSION OF ACTIVITY 1.11
The calculation is fairly straightforward. We need the forecast errors, which were calculated as
Forecast errors (ei) −6 4 0 1 −1 3 2
The absolute deviations are the absolute values of the forecast errors, which we can recall from our
high-school days. The absolute deviations are thus
Absolute deviations (|ei|) 6 4 0 1 1 3 2
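The calculations in Activities 1.9 and 1.11 can be checked with a short Python sketch. It is an illustration only, using the week-3 figures given above; the variable names and the wording of the labels are choices made for this example.

```python
# Week-3 actual observations and estimates (litres of milk), days 1 to 7.
actual = [21, 15, 20, 27, 13, 25, 11]
estimates = [27, 11, 20, 26, 14, 22, 9]

# Forecast error = actual observation minus estimate.
errors = [y - y_hat for y, y_hat in zip(actual, estimates)]

# Absolute deviation = forecast error without its sign.
abs_deviations = [abs(e) for e in errors]

for day, e in enumerate(errors, start=1):
    if e < 0:
        label = "overestimation"    # the estimate exceeded the actual sales
    elif e > 0:
        label = "underestimation"   # the estimate fell short of the actual sales
    else:
        label = "exact"
    print(f"Day {day}: error = {e:+d}, absolute deviation = {abs(e)}, {label}")

print(errors)          # [-6, 4, 0, 1, -1, 3, 2]
print(abs_deviations)  # [6, 4, 0, 1, 1, 3, 2]
```

A negative error therefore signals overestimation and a positive error signals underestimation.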
1.3.2 Mean absolute deviation
The absolute deviations give us the mean absolute deviation (MAD) when we obtain their average in
the usual way. The MAD (Bowerman et al., 2005: 15) requires the following steps: take the absolute
deviations, add them and divide the sum by their number. The result is the MAD.
ACTIVITY 1.12
Calculate the MAD for the estimates in Activity 1.9.
DISCUSSION OF ACTIVITY 1.12
Absolute deviations (|ei|) 6 4 0 1 1 3 2
The MAD is therefore
\[ \text{MAD} = \frac{\sum_{i=1}^{7} |e_i|}{n} = \frac{17}{7} = 2.42857. \]
1.3.3 Squared error
Another way to remove the signs of the errors is to square them, giving the squared errors
(Bowerman et al., 2005: 15).
ACTIVITY 1.13
Calculate the squared errors for the estimates in Activity 1.9.
DISCUSSION OF ACTIVITY 1.13
Forecast errors (ei) −6 4 0 1 −1 3 2
The squared errors are therefore
Squared errors (ei²) 36 16 0 1 1 9 4
1.3.4 Mean squared error
The MSE is the average of the squared errors.
ACTIVITY 1.14
Calculate the MSE for the estimates in Activity 1.9.
DISCUSSION OF ACTIVITY 1.14
To calculate the MSE we need the squared errors, which were calculated as
Squared errors (ei²) 36 16 0 1 1 9 4
The MSE is therefore
\[ \text{MSE} = \frac{\sum_{i=1}^{7} e_i^2}{n} = \frac{67}{7} = 9.57143. \]
Now, let us pause a little. We have done a few useful calculations. We have also answered a few
questions about errors.
Do you recall the value of the forecast error on the day that the estimate was perfect? Do you also
see what is meant by a poor estimate? Now can you say what is meant by a good estimate? You
will recall that the errors need to be as small as possible. So far it is not absolutely clear what small
entails.
The MAD and MSE are the measures that we will use to determine whether the errors are small, which
will indicate a good model. The objective is to select a good forecast model. The model that will be
selected must produce forecasts that are close to the actual observations. The MAD and the MSE
will serve as our tools to select a forecast model.
We need to understand the MAD and the MSE as they relate to the forecast model. The steps are
as follows:
MAD steps                       MSE steps
Calculate forecast errors       Calculate forecast errors
Determine absolute deviations   Determine squared errors
Add the absolute deviations     Add the squared errors
Divide by their number          Divide by their number
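The two columns of steps above can be sketched in Python as follows. This is a minimal illustration using the week-3 figures from Activity 1.9; the function names are assumptions made for this example.

```python
def mean_absolute_deviation(actual, forecast):
    """MAD: average of the absolute forecast errors."""
    errors = [y - f for y, f in zip(actual, forecast)]   # forecast errors
    abs_deviations = [abs(e) for e in errors]            # absolute deviations
    return sum(abs_deviations) / len(abs_deviations)     # add, then divide


def mean_squared_error(actual, forecast):
    """MSE: average of the squared forecast errors."""
    errors = [y - f for y, f in zip(actual, forecast)]   # forecast errors
    squared_errors = [e ** 2 for e in errors]            # squared errors
    return sum(squared_errors) / len(squared_errors)     # add, then divide


# Week-3 actual observations and estimates (litres of milk).
actual = [21, 15, 20, 27, 13, 25, 11]
estimates = [27, 11, 20, 26, 14, 22, 9]

mad = mean_absolute_deviation(actual, estimates)
mse = mean_squared_error(actual, estimates)

# Absolute deviations sum to 6+4+0+1+1+3+2 = 17.
# Squared errors sum to 36+16+0+1+1+9+4 = 67.
print(f"MAD = {mad:.5f}")
print(f"MSE = {mse:.5f}")
```

Because the errors are squared, the MSE penalises one large error (such as the −6 on Day 1) far more heavily than several small ones, whereas the MAD weighs all errors in proportion to their size.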
MAD is not in any way “mad”. It is an objective route to good forecasting. The MSE serves the same
purpose.
Sometimes the effectiveness of a model is measured in percentages. Such measures are the
absolute percentage error (APE) and the mean absolute percentage error (MAPE) (Bowerman et
al., 2005: 18).
1.3.5 Absolute percentage error (APE)
APE is the absolute error divided by the corresponding actual observation multiplied by 100.
ACTIVITY 1.15
Calculate the APE for the estimates in Activity 1.9.
DISCUSSION OF ACTIVITY 1.15
To calculate the APE we need the absolute errors and the actual observations, which are
Absolute deviations (|ei|) 6 4 0 1 1 3 2
Actual observations (yi) 21 15 20 27 13 25 11
The APEs (in per cent) are therefore
APE_i 28.5714 26.6667 0.00 3.7037 7.6923 12.00 18.1818
1.3.6 Mean absolute percentage error (MAPE)
MAPE is the mean of the APEs. It is defined as
\[ \text{MAPE} = \frac{\sum_{i=1}^{n} \text{APE}_i}{n}. \]
ACTIVITY 1.16
Calculate the MAPE corresponding to the estimates in Activity 1.9.
DISCUSSION OF ACTIVITY 1.16
To calculate the MAPE we need the APEs, which are
APE_i 28.5714 26.6667 0.00 3.7037 7.6923 12.00 18.1818
We obtain
\[ \sum_{i=1}^{7} \text{APE}_i = 96.8159. \]
The MAPE is therefore
\[ \text{MAPE} = \frac{96.8159}{7} = 13.8308. \]
The intention when measuring the error is to monitor and control it, and ultimately to reduce it, so
that the accuracy of the forecasting methods increases.
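The APE and MAPE calculations of Activities 1.15 and 1.16 can be reproduced with the following Python sketch. It is an illustration under stated assumptions: the function names are chosen for this example, and the check for a zero actual observation is an addition of this sketch, since the APE is undefined when the actual value is zero.

```python
def absolute_percentage_errors(actual, forecast):
    """APE for each period: |actual - forecast| / actual * 100."""
    apes = []
    for y, f in zip(actual, forecast):
        if y == 0:
            # Division by zero: the APE cannot be computed for this period.
            raise ValueError("APE is undefined when an actual observation is zero")
        apes.append(abs(y - f) / y * 100)
    return apes


def mean_absolute_percentage_error(actual, forecast):
    """MAPE: the mean of the APEs."""
    apes = absolute_percentage_errors(actual, forecast)
    return sum(apes) / len(apes)


# Week-3 actual observations and estimates (litres of milk).
actual = [21, 15, 20, 27, 13, 25, 11]
estimates = [27, 11, 20, 26, 14, 22, 9]

apes = absolute_percentage_errors(actual, estimates)
mape = mean_absolute_percentage_error(actual, estimates)

print([round(a, 4) for a in apes])
print(f"MAPE = {mape:.4f} %")
```

Because each error is scaled by the actual observation, the same error of 2 litres counts for more on a day with low sales (Day 7) than it would on a day with high sales.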
1.3.7 Forecasting accuracy
This section summarises the measures of forecast error presented above and presents them as
measures of the level of accuracy achieved. It is important to know that forecast accuracy starts
with the forecast error. As you have seen, the forecast error is the difference between the actual
value and the forecast value for the corresponding period:
\[ e_t = y_t - F_t \]
where e_t is the forecast error at period t, y_t is the actual value at period t, and F_t is the forecast
for period t. A summary of the measures is given in the next table.
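Measures of aggregate error:
Mean absolute deviation (MAD): \( \text{MAD} = \frac{\sum_{t=1}^{n} |e_t|}{n} \)
Mean absolute percentage error (MAPE): \( \text{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{e_t}{y_t} \right| \)
Mean squared error (MSE): \( \text{MSE} = \frac{\sum_{t=1}^{n} e_t^2}{n} \)
Root mean squared error (RMSE): \( \text{RMSE} = \sqrt{\frac{\sum_{t=1}^{n} e_t^2}{n}} \)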
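The RMSE is the only measure in the table that has not been worked through in an activity. The following Python sketch applies it to the week-3 figures from Activity 1.9. It is an illustration only; the function name is an assumption made for this example.

```python
import math


def root_mean_squared_error(actual, forecast):
    """RMSE: square root of the mean of the squared forecast errors."""
    squared_errors = [(y - f) ** 2 for y, f in zip(actual, forecast)]
    mse = sum(squared_errors) / len(squared_errors)
    return math.sqrt(mse)


# Week-3 actual observations and estimates (litres of milk).
actual = [21, 15, 20, 27, 13, 25, 11]
estimates = [27, 11, 20, 26, 14, 22, 9]

rmse = root_mean_squared_error(actual, estimates)
print(f"RMSE = {rmse:.4f} litres")
```

Taking the square root returns the measure to the units of the data (litres here), which makes the RMSE easier to compare with the MAD than the MSE is.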
Please note that business forecasters and practitioners sometimes use different terminology in the
industry. They refer to the PMAD as the MAPE, although they compute it as a volume-weighted MAPE.
Please stick to the textbook notation.
1.4 Choosing a forecasting technique
We have to le