sta2604_2012_-_studyguide_-001_2012_4_b

Upload: shaun-neville

Post on 01-Jun-2018

  • 8/9/2019 sta2604_2012_-_studyguide_-001_2012_4_b

    1/127

    STA2604/1

    Department of Statistics

    STA2604

    Forecasting

    Study guide for STA2604

    Table of contents

    UNIT 1: An Introduction to Forecasting

    1.1 Introduction
    1.1.1 Forecasting
    1.1.2 Data
    1.1.3 Components of a time series
    1.1.4 Applications of forecasting
    1.2 Forecasting methods
    1.2.1 Qualitative methods
    1.2.2 Quantitative methods
    1.3 Errors in forecasting and forecast accuracy
    1.3.1 Absolute deviation
    1.3.2 Mean absolute deviation
    1.3.3 Squared error
    1.3.4 Mean squared error
    1.3.5 Absolute percentage error (APE)
    1.3.6 Mean absolute percentage error (MAPE)
    1.3.7 Forecasting accuracy
    1.4 Choosing a forecasting technique
    1.4.1 Factors to consider
    1.4.2 Strike the balance
    1.5 An overview of quantitative forecasting techniques
    1.6 Conclusion

    UNIT 2: Model Building and Residual Analysis

    2.1 Introduction
    2.2 Multicollinearity
    2.2.1 Clarification of multicollinearity
    2.2.2 The variance inflation factor (VIF)
    2.2.3 Comparing regression models
    2.3 Basic residual analysis
    2.3.1 Residual plots
    2.3.2 Constant variation assumption
    2.3.3 Correct functional form assumption
    2.3.4 Normality assumption
    2.3.5 Independence assumption
    2.3.6 Remedy for violations of assumptions
    2.4 Outliers and influential observations
    2.4.1 Leverage values
    2.4.2 Residuals
    2.4.3 Studentised residuals
    2.4.3.1 Deleted residuals
    2.4.4 Cook’s distance
    2.4.5 Dealing with outliers and influential observations
    2.5 Conclusion

    UNIT 3: Time Series Regression

    3.1 Introduction
    3.2 Modeling trend by using polynomial functions
    3.2.1 No trend
    3.2.2 Linear trend
    3.2.3 Quadratic and higher order polynomial trend
    3.3 Detecting autocorrelation
    3.3.1 Residual plot inspection
    3.3.2 First-order autocorrelation
    3.3.2.1 Durbin-Watson test for positive autocorrelation
    3.3.2.2 Durbin-Watson test for negative autocorrelation
    3.3.2.3 Durbin-Watson test for autocorrelation
    3.4 Seasonal variation types
    3.4.1 Constant and increasing seasonal variation
    3.5 Use of dummy variables and trigonometric functions
    3.5.1 Time series with constant seasonal variation
    3.5.2 Use of dummy variables
    3.5.3 High season and low season
    3.5.4 Use of trigonometric functions on a model with a linear trend
    3.6 Growth curve models
    3.7 AR(1) and AR(p)
    3.8 Use of trend and seasonality and forecast development
    3.9 Conclusion

    UNIT 4: Decomposition of a Time Series

    4.1 Introduction
    4.2 Multiplicative decomposition
    4.2.1 Trend analysis
    4.2.2 Seasonal analysis
    4.2.3 Analysis of random variations in a time series
    4.2.4 Obtaining a forecast
    4.3 Additive decomposition
    4.5 Conclusion

    UNIT 5: Exponential Smoothing

    5.1 Introduction
    5.2 Simple exponential smoothing
    5.3 Tracking signals
    5.4 Holt’s trend corrected exponential smoothing
    5.5 Holt-Winters methods
    5.5.1 Additive Holt-Winters method
    5.5.2 Multiplicative Holt-Winters method
    5.6 Damped trend exponential
    5.7 Conclusion

    ABOUT THIS MODULE

    Prologue

    Forecasting is the process of making statements about events whose actual outcomes (typically)

    have not yet been observed. A commonplace example might be estimation of the expected value for 

    some variable of interest at some specified future date. Prediction is a similar but more general term.

    Both might refer to formal statistical methods employing time series, cross sectional or longitudinal

    data, or alternatively to less formal judgemental methods. More will be seen at various parts of the

    presentation of the module.

    The module is about Forecasting, which deals with the methods used to predict the future, i.e. to

    forecast. Can you think of a situation where predictions of the future are needed or cases where

    forecasting is done? By its nature it is a quantitative method that uses numeric data. There are

    various forecasting methods, some of them being qualitative because they are based on non-numeric

    data. Even though qualitative methods feature in some of our discussions, they are not dealt with in

    depth in this module.

    This module presents fundamental aspects of Time Series analysis used in forecasting. The

    prescribed textbook for this module is Bowerman, O’Connell and Koehler (2005). We will not study

    all the chapters in the book for this module, but will focus on Chapters 1, 5, 6, 7 and 8.

    The module is done in one semester. Make sure that you are registered for the right semester and

    the material you receive is the correct one.

    About the book

    The prescribed book is reader-friendly and contains limited mathematical theory. It is geared towards

    the practice of forecasting. The authors are experienced practitioners in the field of time series. The

    book will assist you in understanding concepts and methodology, and in applying these in practice

    (i.e. in real-life situations).

    The computer and the calculator 

    We recommend that you acquire a non-programmable scientific calculator of your own. It is

    imperative to have your own calculator in the examination. It is important, although not compulsory,

    to have access to a computer in order to undertake the tasks in this module. You may visit a Regional

    Centre to use a computer. The text contains output from Excel, MINITAB, JMP IN and SAS. However,

    we encourage the use of any software to which you may have access. The above list of computer 

    software/packages may be used, as well as R, SPSS, Stata, S-Plus and EViews. Your ability to use

    such software will increase your marketability in the workplace. You are encouraged to experiment

    with the packages at your disposal.

    REFERENCES

    The prescribed book must be purchased. Refer to the study guide regularly. We shall also refer to a

    number of user-friendly textbooks on Time Series that are available in the Unisa library. You do not

    need to buy the recommended books for this module.

    PRESCRIBED BOOK

    Bowerman, B. L., O’Connell, R. T. & Koehler, A. B. (2005). Forecasting, time series and regression:

    an applied approach, 4th edition. Singapore: Thomson Brooks/Cole.

    ADDITIONAL USEFUL BOOKS FOR THIS MODULE

    Crosby, J. V. (2000). Cycles, trends, and turning points: practical marketing and sales forecasting

    techniques. Lincolnwood, IL: NTC Business Books.

    Chapter 4 of this book deals specifically with Time Series, while chapters 1, 2, 3, 7, 10 and 20 deal

    with other topics that are very relevant in this module. The remaining chapters illustrate applications

    that may expose you even more to time series. It is useful.

    Curwin, J. & Slater, R. (2002). Quantitative methods for business decisions (Chapter 14). London:

    Thomson Learning.

    This book also presents measures that we use in statistics and in time series applications. It can be

    used for other modules as well. Find time to read it.

    Dexter, B. (1996). Business mathematics (Chapter 15). London: Macdonald and Evans.

    Only chapter 15 presents Time Series, and in not more than 12 pages. “Production planning and 

    forecasting” are presented in Chapter 4 of this book to expose you to real-life applications. I seriously 

    advise you to look at these two chapters.

    Hair, J. F., Anderson, R. E., Tatham, R. L. & Black, W. C. (1998). Multivariate data analysis, 5th

    edition. Prentice-Hall, Inc.

    Appendix 4A of this book presents some distance measures that are useful in this module. Cook’s

    distance is presented on pages 225 and 234 of this appendix. You are urged to read them. This

    book is very useful in exposing various applications of multivariate statistics. Read and enjoy it.

    Kendall, M. G. (1990). Time series, 3rd edition. London: Edward Arnold.

    Simply the best! Kendall exposes us to time series. His is one of the greatest names remembered 

    when Time Series are mentioned. Even his previous editions still present good information about the

    topic. Why not cash in on time series from the horse’s mouth! 

    THE PRESENTATION OF THE MODULE

    This study guide summarises the five prescribed chapters of the textbook.

    Prior knowledge

    It is important that you are familiar with a section before moving to the next one. This will serve

    as a foundation for the forthcoming work. Leaving out work without understanding it can only add

    to the accumulation of problems during the examination. This is also true about the prerequisites

    from first-year statistics and the knowledge you have acquired through the years. Sensible or smart

    application is based on the use of the accumulated techniques, experiences and knowledge. Plotting

    of graphs, fitting a linear model, and so on, are needed in some places. You are urged, therefore,

    to incorporate all the useful techniques in the solutions to exercises. We advise you to revisit these

    topics in your first-year module.

    It is necessary to realise that numbers alone do not provide all the answers. It should be clear to

    you that aspects of a qualitative nature add value to the predictions made so that the data context is

    clear.

    This study guide

    In this study guide we attempt to present explanations of the concepts in the textbook. It contains

    easy examples as well as activities for you to practise. You are encouraged to do the activities

    in order to learn effectively. Reading of feedback alone leaves gaps in your learning. There are

    discussions following the activities so that the feedback is immediate. Do not just read through them;

    try to explore them by testing that you can do them as well, even if you use alternative methods.

    The exercises selected for assignments are important in reinforcing what you need to understand in

    this module. Take time to understand the aspects that go with them. Analyse the postulates in the

    given statements and thereafter the requirements so that it becomes easy to recall what is necessary

    in compiling a solution. In that way you do not only solve the problem, you understand it and enjoy

    solving it. At the end of the semester there is a two-hour closed-book examination. The discussions

    in the study guide and the textbook prepare you for that examination.

    This study guide is prepared to guide you through the prescribed book. Therefore, we will always

    use it together with the prescribed book. Read them together. The textbook presents the concepts;

    the study guide attempts to bring the concepts closer to you.

    Each study unit starts with the outcomes in order to show you what you need to know and to evaluate

    yourself. The table of outcomes also gives each outcome together with the way the outcome will

    be assessed, the content needed for that outcome, the activities that will be used to support the

    understanding of the content and the way feedback will be given. Your input in the form of positive

    criticism to improve the presentation will be of importance in the review of this study guide. You are

    therefore encouraged to suggest ways that you believe can improve the presentation of this module.

    Module position in the curriculum

    We have been offering a postgraduate module on Time Series at Unisa, but have become aware of 

    the need to introduce the module at undergraduate level due to its necessity in the workplace and in

    order to fill the gap that is evident when students attempt the postgraduate time series module.

    This module is part of the whole Statistics curriculum at Unisa. Its position on the curriculum structure

    is as follows:

    1st year   STA1501  STA1502  STA1503
    2nd year   STA2601  STA2602  STA2603  STA2604 (Forecasting: we are here)  STA2610
    3rd year   STA3701  STA3702  STA3703  STA3704  STA3705  STA3710

    You should already be familiar with some of the modules mentioned above. Knowledge from

    STA2604 will help you in STA3704 (Forecasting III).

    ASSIGNMENTS

    There are two assignments for this module, which are intended to help you learn through various

    activities. They also serve as tests to prepare you for the examination. As you do the assignments,

    study the reading texts, consult other resources, discuss the work with fellow students or tutors or 

    do research, you are actively engaged in learning. Looking at the assessment criteria given for 

    each assignment will help you to understand better what is required of you. The two assignments

    per semester prescribed for this module form part of the learning process. The typical assignment

    question is a reflection of a typical examination question. There are fixed submission dates for the

    assignments and each assignment is based on specific chapters (or sections) in the prescribed book.

    You have to adhere to these dates as assignments are only marked if they are received on or before

    the due dates.

    Both assignments are compulsory as

    •  they are the sole contributors towards your year mark and

    •  they form an integral part of the learning process and indicate the form and nature of the

    questions you can expect in the examination.

    Please note that the submission of assignment 01 is the guarantee for examination entry. If you

    do not submit assignment 01, Unisa (not the Department of Statistics) will deny you examination

    entry.

    You are urged to communicate with your lecturer(s) whenever you encounter difficulties in this

    module. Do not wait until the assignment due date or the examination to make contact with lecturers.

    It is helpful to be ready long in advance. You are also encouraged to work with your own peers,

    colleagues, friends, etc. Details about the assignments will be given in Tutorial letter 101.

    Time series has its own useful terminology that should be understood. In order to familiarise yourself 

    with it, let us start with an easy activity. Activities help in the creation of a mind map of the module.

    The more you attempt these activities, the better you will understand the work.

    GLOSSARY OF TERMS

    ACTIVITY 0.1

    (a) Make a list of all the concepts that are printed in bold type in Chapters 1, 5, 6, 7 and 8 of the

    prescribed book. They serve as your glossary.

    (b) Attempt to give the meanings of these concepts before you deal with the various sections so that you have

    an idea before we get there.

    DISCUSSION OF ACTIVITY 0.1

    (a) There is a missing concept/term among the ones you listed, which is absolutely fundamental. It

    appears with other terms or phrases. The term is “data”. You came across the term many times

    when you studied other modules and in some other contexts. It is emphasised that it is a useful

    aspect in forecasting. If you do not have data, you will not be able to make forecasts.

    (b) Do not worry if the meanings you gave do not match the content in the tutorial letter or textbook.

    The intention was to make you aware of aspects on which to focus in your learning. What is

    required from you is a step-by-step journey through the prescribed material.

    ACTIVITY 0.2

    What is the meaning of the word data?

    DISCUSSION OF ACTIVITY 0.2

    There is a general misconception that data and information are the same concepts. This is not

    necessarily the case. Data are records of occurrences from which we obtain information. They are not

    necessarily information on their own, but may sometimes be information. The truth is, data possess

    information that is seen after some analysis. They are often the raw answers we receive from an

    investigation.
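
    The distinction between data and information can be illustrated with a minimal sketch. The sales figures below are hypothetical values invented purely for illustration; they are not taken from the prescribed book. The raw list is the data, and the summary values obtained by analysing it are the information.

```python
from statistics import mean

# Hypothetical raw data: six monthly sales records (units sold), invented for illustration.
monthly_sales = [120, 135, 128, 150, 162, 171]

# Analysis turns the raw records into information.
average_sales = mean(monthly_sales)                   # typical monthly level
total_change = monthly_sales[-1] - monthly_sales[0]   # change from first to last month
direction = "upward" if total_change > 0 else "downward or flat"

print(f"Average monthly sales: {average_sales:.1f}")
print(f"Change over the period: {total_change} ({direction})")
```

    The list of numbers on its own says little; the average and the direction of change are what a decision maker would actually use.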

    WHAT TO EXPECT IN THE MODULE

    In this module we use a scientific calculator to perform calculations. We will also draw graphs, form

    mathematical models (equations) that are used to develop forecasts and make decisions based on

    time series data. Most of these aspects stated were taught at first-year level. The new topic is the

    pattern of time series data. The way time series data appear is unique because without this form

    they cannot qualify to be time series data.

    PREREQUISITES

    •  The ability to use a scientific calculator.

    • Access to a computer package and the ability to use it are highly recommended.

    •  First-year statistics. These topics appear below and there will be a quick reminder whenever we

    need them. We will need

    - Simple linear regression

    - Correlation measures

    - Polynomials

    - Graph plotting

    When you draw plots required for statistical analysis, these plots should be accurate. Hence, use

    a ruler and a lead pencil (not a pen) to construct plots. If you have access to a computer, you are

    also encouraged to practise using any statistical package of your choice. Assignments may also be

    prepared by means of a computer. Just make sure that you use the correct notation. Avoid using a

    computer if you cannot write the correct notation. Remember that you are always welcome to contact

    the lecturers whenever you have problems with any aspect of the module.

    OUTCOMES

    At the end of the module you should be able to do the following:

    •  Define and apply components of time series.

    •  Apply time series methods to develop forecasts.

    •  Specify a prototype forecast model, estimate its parameters and then validate it.

    •  Use the specified model to derive forecasts.

    TABLE OF OUTCOMES

    Outcomes (at the end of the module you should be able to) | Assessment | Content | Activities | Feedback
    ----------------------------------------------------------------------------------------------------
    explain and expose time series components | analyse data; plot graphs | trend; seasonality; cycles; irregularity | examine data visually; plot graphs | discuss likely errors
    select a model | balance factors | choosing a technique | analyse errors; plot graphs | scrutinise models
    develop a model | forming an equation | regression; exponential smoothing | small build-up exercises | emphasise aptness
    estimate parameters | perform estimations | estimation methods | perform calculations | discuss alternatives
    validate a model | statistical tests | hypothesis testing | test hypotheses | peruse the various tests
    develop forecasts | demonstrate patterns | model building | form equations | visit various alternatives

    You will know that you understand this module once you understand the above issues.

    Feedback is not just a follow-up of the preceding concepts. It is an opportunity to reinforce some

    concepts and revise others. Make use of this opportunity. Feedback is given after every activity,

    sometimes with some discussion after the activity, but in many instances, it follows immediately after 

    the activity.

    OVERVIEW

    Two of the five study units comprising this module are presented in this study guide.

    Unit 1: Narration of the forecasting domain and support elements

    (Chapter 1 of Bowerman et al.)

    In this unit we will learn more about

    •  Situations requiring forecasts and forecasting

    •   Issues about useful data and use of data in developing forecasts

    •  Basic types of data and approaches (quantitative and qualitative methods)

    •  Errors, problems and pitfalls in forecasting, as well as depiction of good forecasts

    •  Factors useful in choosing a forecast technique

    •  More about quantitative methods

    Do the above issues raise some response from you? Do you have any idea of what they mean or 

    imply? Think and chat with your colleagues, peers or family members. Remember that learning

    becomes real and effective only when sharing is involved.

    Unit 2: Building a forecast model and examining / verifying its strength

    (Chapter 5 of Bowerman et al.)

    In this study unit we will learn about

    •  Multicollinearity of variables:

    - Variance inflation factors

    - R²

    - adjusted R²

    - standard error 

    - interval length

    - C-statistic

    •  Residual analysis:

    - residual plots

    - the constant variance assumption

    - assumption of correct functional form

    - normality assumption

    - the independence assumption

    •  Outliers and influential observations:

    - outliers

    - influential data

    - diagnostic methods to detect outliers and influential observations

    - leverage points

    - residuals

    - Cook’s distance measure

    The measures dealt with in this unit ensure that the model built for use in forecasting has the desirable

    properties of limited error and is influenced as little as possible, if at all, by unusual observations. Also, it is

    necessary to make a distinction between outliers and seasonal variations. Sometimes a mistake is

    made with an effect of seasonality being misinterpreted as an outlier.

    We hope you have come across some of the concepts or issues above. Discuss these with your 

    colleagues, peers, friends or family members.

    DIFFICULTIES IN FORECASTING TECHNOLOGY

    Nearly all futurists describe the past as unchangeable, consisting of a collection of knowable facts.

    We generally perceive the existence of only one past. When two people give conflicting stories of 

    the past, we tend to believe that one of them must be lying or mistaken.

    This widely accepted view of the past might not be correct. Historians often interject their own beliefs

    and biases when they write about the past. Facts become distorted and altered over time. It may

    be that the past is a reflection of our current conceptual reference. In the most extreme viewpoint, the

    concept of time itself comes into question.

    The future, on the other hand, is filled with uncertainty. Facts give way to opinions. The facts of the

    past provide the raw materials from which the mind makes estimates of the future. All forecasts are

    opinions of the future (some more carefully formulated than others). The act of making a forecast is

    the expression of an opinion. The future consists of a range of possible future phenomena or events.

    DEFINING A USEFUL FORECAST

    The usefulness of a forecast is not something that lends itself readily to quantification along any

    specific dimension (such as accuracy). It involves complex relationships between many things,

    including the type of information being forecast, our confidence in the accuracy of the forecast, the

    magnitude of our dissatisfaction with the forecast, and the versatility of ways that we can adapt to or 

    modify the forecast. In other words, the usefulness of a forecast is an application-sensitive construct.

    Each forecasting situation must be evaluated individually regarding its usefulness.

    One of the first rules is to consider how the forecast results will be used. It is important to consider

    who the readers of the final report will be during the initial planning stages of a project. It is wasteful

    to apply resources on an analysis that has little or no use. The same rule applies to forecasting. We

    must strive to develop forecasts that are of maximum usefulness to planners. This means that each

    situation must be evaluated individually as to the methodology and type of forecasts that are most

    appropriate to the particular application.

    FORECASTS CREATE THE FUTURE

    Often the way we contemplate the future is an expression of our desire to create that future.

    It is argued that the future is invented, not predicted. The implication is that the future is an

    expression of our present thoughts. The idea that we create our own reality is not a new concept. It

    is easy to imagine how thoughts might translate into actions that affect the future.

    Forecasting can, and often does, contribute to the creation of the future, but it is clear that other 

    factors are also operating. A holographic theory would stress the interconnectedness of all elements

    in the system. At some level, everything contributes to the creation of the future. The degree to

    which a forecast can shape the future (or our perception of the future) has yet to be determined

    experimentally and experientially.

    Sometimes forecasts become part of a creative process, and sometimes they do not. When two

    people make mutually exclusive forecasts, they cannot both be true. At least one forecast is

    wrong. Does one person’s forecast create the future, and the other does not? The mechanisms

    involved in the construction of the future are not well understood on an individual or social level.

    ETHICS IN FORECASTING

    Are predictions of the future a form of propaganda, designed to evoke a particular set of behaviours?

    Note that the desire for control is implicit in all forecasts. Decisions made today are based on

    forecasts, which may or may not come to pass. The forecast is a way to control today’s decisions.

    The purpose of forecasting is to control the present. In fact, one of the assumptions of forecasting

    is that the forecasts will be used by policy-makers to make decisions. It is therefore important to

    discuss the ethics of forecasting. Since forecasts can and often do take on a creative role, no one

    has the absolute right to make forecasts that involve other people’s futures.

    Nearly everyone would agree that we have the right to create our own future. Goal setting is a form

    of personal forecasting. It is one way to organize and invent our personal future. Each person has

    the right to create their own future. On the other hand, a social forecast might alter the course of an

    entire society. Such power can only be accompanied by equivalent responsibility.

    There are no clear rules involving the ethics of forecasting. Value impact is important in forecasting,

    i.e. the idea that social forecasting must involve physical, cultural and societal values. However,

    forecasters cannot leave their own personal biases out of the forecasting process. Even the most

    mathematically rigorous techniques involve judgmental inputs that can dramatically alter the forecast.

    Many futurists have pointed out our obligation to create socially desirable futures. Unfortunately, a

    socially desirable future for one person might be another person’s nightmare. For example, modern

    ecological theory says that we should think of our planet in terms of sustainable futures. The finite

    supply of natural resources forces us to reconsider the desirability of unlimited growth. An optimistic

    forecast is that we achieve and maintain an ecologically balanced future. That same forecast, the

    idea of zero growth, is a catastrophic nightmare for the corporate and financial institutions of the free

    world. The system of profit depends on continual growth for the well-being of individuals, groups,

    and institutions.

    ‘Desirable futures’ is a subjective concept. It can only be understood relative to other information.

    The ethics of forecasting certainly involves the obligation to create desirable futures for the person(s)

    that might be affected by the forecast. If a goal of forecasting is to create desirable futures, then the

    forecaster must ask the ethical question of “desirable for whom?”

    To embrace the idea of liberty is to recognise that each person has the right to create their own

    future. Forecasters can promote libertarian beliefs by empowering people that might be affected by

    the forecast. Involving these people in the forecasting process gives them the power to become

    co-creators in their futures.

    BENEFITS OF FORECASTING

    Forecasting can help you make the right decisions, and earn/save money. Here are a few examples.

    • Define better sales strategies

    If a product is declining, maybe it is a good idea to consider stopping its production. But maybe not:

    maybe it is just your sales that are declining, and not your competitors’?

    In this case, is there a chance that you can get your market share back?

    Forecasting techniques provide answers to these questions – vital questions to your business.

    • Size your inventories optimally

    Time is money. Room is money. So what you want to do is use all means at your disposal in order 

    to reduce your stocks – without experiencing any shortages, of course.

    How? By forecasting!

    Forecasting is designed to help decision making and planning in the present. Forecasts empower 

    people because their use implies that we can modify variables now to alter (or be prepared for)

    the future. A prediction is an invitation to introduce change into a system. There are several

    assumptions about forecasting:

    •  There is no way to state what the future will be with complete certainty. Regardless of the

    methods that we use there will always be an element of uncertainty until the forecast horizon

    has come to pass.

    •  There will always be blind spots in forecasts. We cannot, for example, forecast completely new

    technologies for which there are no existing paradigms.

    •  Providing forecasts to policy-makers will help them formulate social policy. The new social

    policy, in turn, will affect the future, thus changing the accuracy of the forecast.

    STUDY UNIT 1: An Introduction to Forecasting

    1.1 Introduction

    Table of outcomes for the study unit

    Outcomes (at the end of the module you should be able to) | Assessment | Content | Activities | Feedback
    ----------------------------------------------------------------------------------------------------
    define time series terms | data plots and measures | time series word list | experiment with data | discuss each activity
    decompose time series | graph, visual | time series components | plot graphs | critique the graphs
    calculate time series measures | stepwise exercises | errors in forecasting | various calculations |

    If you understand the above outcomes, it will be an indication that you understand this study unit. It

    is based on Chapter 1 of the prescribed book.

    Forecasting is the scientific process of estimating some aspects of the future in usually unknown

    situations. Prediction is a similar, but more general, term. Both can refer to estimation of time

    series, cross-sectional or longitudinal data. Usage can differ between areas of application: for

    example in hydrology, the terms "forecast" and "forecasting" are sometimes reserved for estimates of

    values at certain specific future times, while the term "prediction" is used for more general estimates,

    such as the number of times floods will occur over a long period. It is essential to note

    that in this module forecasting is understood to be scientific. This is to ensure

    that we do not consider subjective predictions and spiritual prophecies as part of our scope for this

    forecasting module. Risk and uncertainty are central to forecasting and prediction. Forecasting

    is used in the practice of Customer Demand Planning in everyday business forecasting for

    manufacturing companies. The discipline of demand planning, also sometimes referred to as supply

    chain forecasting, embraces both statistical forecasting and a consensus process. Forecasting is

    commonly used in discussion of time-series data. In this module the terms are used as defined

    in the prescribed book.

    Forecasting has application in many situations:

    •  Supply chain management - Forecasting can be used in Supply Chain Management to make sure

    that the right product is at the right place at the right time. Accurate forecasting will help retailers

    reduce excess inventory and therefore increase profit margin. Accurate forecasting will also help

    them meet consumer demand.

    •  Weather forecasting, Flood forecasting, and Meteorology

    •  Transport planning and Transport forecasting

    •  Economic forecasting

    •  Egain forecasting

    •  Technology forecasting

    •  Earthquake forecasting

    •  Land use forecasting

    •  Product forecasting

    •  Player and team performance in sports

    •  Telecommunications forecasting

    •  Political forecasting

    •  Sales forecasting

     ACTIVITY 1.1

    Consider the terms “forecasting”, “cross-sectional data” and “time series”, which are the main focus

    of this study unit.

    (a) Attempt to define these terms.

    (b) Check the definitions in the book and compare your answers in (a).

    Before we discuss the above activity, start by reading slowly through the following discussion. Make

    sure you follow the discussion.

    1.1.1 Forecasting

    Study section 1.1 on page 2 up to the second bullet on page 3.

    The few people with whom we discussed the term “forecasting” seemed to have an understanding

    of the concept only “in a nutshell”. Many of them made reference to the weather forecast that

    was presented on radio, television and the internet. A gap existed in the main understanding of 

    forecasting.

    Accounts from many backgrounds show that, throughout history, people have always been

    interested in the future. There are stories from history that inform us that when people dreamed,

    there were experts to explain the meanings of these dreams in terms of the future. When signs of 

    future drought arose, the implications of the drought were noted and plans were made to offset the

    impacts that were anticipated. Drought led to hunger. Thus, when predictions were made that there

    was drought coming, preparations were made that at the time of the drought, there would be enough

    food for every member of the community during the duration of the drought. Predicting the future

    even as it was done during those days can be referred to as forecasting. The predicted future was

    then used to plan for the future as explained above.

    Modern practice has encouraged that the "anticipation of the future" practice be conceptualised.

    It was then formally termed “forecasting”. The current approaches are scientific in order to ensure

    that forecasting is practised systematically. The predictions made are now called forecasts. In other 

    terms, forecasts are future expectations based on scientific guidelines.

    DISCUSSION OF ACTIVITY 1.1

    The first term we listed in Activity 1.1 was “forecasting”. Did you get that? Forecasting is a

    “natural” operation. We have always done it, sometimes unconsciously. As was explained, predicting

    activities has always been practised, even in ancient times. For self-evaluation in terms of the time

    series concept, did you define the term forecasting in line with “predicting the future”?

    Forecasting indicates more or less what to expect in the future. Once the future is known, preparation

    for equitable allocation of resources can be made. Wastages can thus be reduced or eliminated and

    gains can be enhanced (or increased).

    FURTHER DISCUSSION ON FORECASTING

    Forecasting is applied in various real-life situations. Six examples of applications are listed on pages

    2 and 3 of the prescribed book. We are close to them at different levels. But what about something

    that we as students of the University of South Africa can appreciate?

    The number of student enrolments at Unisa is the starting point. The trend pattern will give an

    indication of whether there has been a decline or growth in the student numbers over the years. If 

    you are observant, you will realise that there has been an increase in student numbers over the past

    few years. Our “forecast” for next year (2013) is that there will be more students than in 2012.

     ACTIVITY 1.2

    Weather forecasting was mentioned as a known example where forecasting is used abundantly.

    There are many others.

    (a) Provide an easy example of a situation where forecasting is needed.

    (b) Attempt to explain the details of the example you provided in (a).

    DISCUSSION OF ACTIVITY 1.2

    We discussed the Unisa example. If you are interested in Southern African politics and elections you

    will be interested in making predictions about political parties that are going to be in the forefront in

    the next election. We might anticipate extreme growth of one party (MDC) and decline of others in

    Zimbabwe, based on the trends in the previous elections and developments that prevail. Therefore,

    (a) one can for example predict how the political parties will perform in the next election; and

    (b) recent performance of the various parties in previous elections may be revisited and analysed,

    the current activities of the parties may be analysed closely and one may interact with people to

    determine their impressions about various parties.

    N.B.: Here we assume normal election conditions where no intimidation or harassment takes place.

    1.1.2 Data

    For this topic you need to study from the middle paragraph of page 3 to the end of page 4.

    Data are important for forecasting. Quality data, which loosely refer to reliable and valid data, are the

    ones needed for forecasting. We may be misled if we use data of poor quality because results are

    likely to be poor as well, even if best methods are used by a proficient analyst. The term data refers

    to groups of information that represent the qualitative or quantitative attributes of a variable or set of 

    variables. Data (plural of "datum", which is seldom used) are typically the results of measurements

    and can be the basis of graphs, images, or observations of a set of variables. Data are often viewed

    as the lowest level of abstraction from which information and knowledge are derived.  Raw data refers

    to a collection of numbers, characters, images or other outputs from devices that collect information

    to convert physical quantities into symbols, that are unprocessed.

    Without data there will not be forecasting. However, it is important that data be correct (reliable, valid,

    realistic, etc). Data need to be both valid for the exercise and reliable. If one of these is missed,

    then be warned that your forecasts may mislead you or any user. Also, collection of data may

    be inadequate to help in supporting the reasoning behind some findings. Experience shows that

    when data are collected under certain contexts, explanations and contexts become clearer when

    findings are associated with those contexts. Thus, if you assist in data collection of time series or 

    any statistical data, whenever possible, advise on the inclusion of details of the occurrences of the

    data. Giving details around happenings assists in reducing the extent of making assumptions which

    may sometimes be incorrect.

    The type of information used in forecasting determines the quality of the forecasts. Not all of us like

    boxing, but let us discuss the next scenario. Imagine that two boxers were going to fight on the next

    Saturday. We were required to make a prediction in order to win a million rand competition. Many

    participants looked at the past records of these boxers. They were informed that in the previous

    seven years boxer Kangaroo Gumbu had won 25 out of 27 fights while boxer Boetie Blood had won

    22 of the 30 fights he had in the same period. Gumbu was known for winning well while Blood had

    lost dismally in a recent fight. Let us pause and enjoy the predictions (forecasts) made, just to make

    a good point.

     ACTIVITY 1.3

    Either as a person interested in boxing or someone hoping to win the money, you may be tempted to

    take a chance at the answer. Make a prediction of the outcome of the fight based on the explanation

    given.

    DISCUSSION OF ACTIVITY 1.3

    Let us determine the odds as statisticians. Using relative frequencies, Gumbu had a probability of 0.93 of

    winning the fight while Blood had a probability of 0.73 of winning the fight. On the basis of these odds,

    many participants predicted that Gumbu was going to win.

    Do you know how the probabilities 0.93 and 0.73 have been obtained? If it is not clear, divide the

    number of successes (wins) of each boxer by the total number of fights that each boxer had fought:

    25/27 ≈ 0.93 for Gumbu and 22/30 ≈ 0.73 for Blood.
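
    The relative-frequency calculation can be sketched in Python as follows. This is only an illustration: the helper name win_probability is hypothetical and does not come from the prescribed book, while the win and fight counts are those given in the scenario.

```python
def win_probability(wins, fights):
    """Estimate a win probability as the relative frequency wins / fights."""
    if fights <= 0:
        raise ValueError("fights must be a positive number")
    return wins / fights


# Records from the scenario: Gumbu won 25 of 27 fights, Blood won 22 of 30.
gumbu = win_probability(25, 27)
blood = win_probability(22, 30)

print(f"Gumbu: {gumbu:.2f}")  # Gumbu: 0.93
print(f"Blood: {blood:.2f}")  # Blood: 0.73
```

    Note that these figures summarise past records only. They say nothing about the quality of the opponents or the weight divisions involved.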

    The data given were based on certain assumptions. Among others, there was the impression that

    the opponents of the two boxers were of the same quality. If they were not, then the prediction would

    be carrying some “inaccuracies”. Among other omissions, we were not told that the boxing bout

    was going to be held in the catchweight division, where boxers came from different weight divisions

    and could not both fall within a single previously defined weight division. Blood had fought only

    world-class opponents and came from two weight divisions heavier than the weight to which Gumbu

    belonged. That is, there was a difference between the original weights of the two boxers. Gumbu,

    on the other hand, was a boxer who talked too much. He had fought some mediocre opponents and

    wanted to pretend he was an excellent boxer. He had asked for the fight. In insisting on the fight,

    he had called Blood a coward until the bout was sanctioned. At the time he was preparing for an

    elimination bout in his weight division after which he was going to fight for a world title if he won.

    The planned elimination bout was probably going to be the first real test for Gumbu as a professional

    fighter. It was going to come “after I am done with Blood,” boasted Gumbu.

    In the street some people were predicting that Gumbu was going to lose, but they did not bet as

    money was required. None of those who paid to enter the competition predicted correctly. The fight

    ended with a first-round knockout. Blood was the winner. Gumbu was no match.

    DISCUSSION OF THE BOXING SCENARIO

    The records given were correct, but not complete. Records are past data. We need complete

    data and the exact context in which they occurred in order to be able to make accurate forecasts.

    The analyses that were made about the boxers were correct, but some assumptions were wrong.

     Assumptions are used to build cases, and methods are developed on conditions that are given as

    assumptions. Wrong assumptions may lead to inappropriate methods for data analysis. In cases

    where information can be found to limit the use of assumptions, this should be done. However, many

    cases provide inadequate information, leaving us with no choice but to depend on assumptions.

     Analysis should depend on reasonable assumptions. If in actual practice assumptions are made

    for the sake of doing something, decisions and results reached may lead to improper actions. The

    analyst should learn the art of making appropriate or reasonable assumptions.

    In the case of the scenario given, some details were missing, such as that the two boxers were

    of different weights. Had we known this, it would have helped in our analysis. Sometimes in predicting

    the outcome of forthcoming games, one also needs to know the quality of opposition that the two opponents

    have met in the accumulation of their records. This was also missing in the example. We will insist

    on the use of valid assumptions because, as we saw, wrong or invalid assumptions are likely to give

    inaccurate predictions. The paragraph after the last bullet on page 3 of the prescribed book explains

    possible repercussions that come with the wrong assumptions (Bowerman, 2005: 3).

    Types of data that are common in real life are cross-sectional data and time series data. Study

    the definition of cross-sectional data in the rectangle on page 3. Cross-sectional data refers to data

    collected by observing many subjects (such as individuals, firms or countries/regions) at the same

    point of time, or without regard to differences in time. Analysis of cross-sectional data usually consists

    of comparing the differences among the subjects. For example, we want to measure current obesity

    levels in a population. We could draw a sample of 1,000 people randomly from that population (also

    known as a cross section of that population), measure their weight and height, and calculate what

    percentage of that sample is categorized as obese. Even though we may analyse cross-sectional

    data for quality forecasts, in this module we use time series data.

    Study the definition of time series on page 4.

    We will have to be careful when we collect time series data. If the data are listed without time

    specification, then we should not consider the data to be time series.

    SCENARIO

    Read the following scenario carefully and make notes as we will keep on referring back to it.

    Suppose that Jabulani is a milk salesperson during the week, serving the Florida, Muckleneuk and

    VUDEC UNISA campuses. Very fortunately for Jabulani, his milk cows increased and his market

    in these campuses also increased from year to year. Jabulani’s business runs from Mondays to

    Sundays. (In a time series analysis a typical question would be: what can we say about the trend

    of the sales?) Asked differently: should we believe that the sales have a decreasing or increasing

    trend? It will be clear later on that the sales levels differ according to days, high on some days and

    low on others. The pattern of low sales or high sales on different days has an important connotation

    in time series analysis. This will be discussed.

     ACTIVITY 1.4

    You have done some first-year statistics modules/courses and some of you did mathematics modules

    as well. Let us consider the following data sets and look at them quite closely.

    Data set 1.1

    16  14  19  26  11  24  10
    18  15  21  24  12  21   9
    21  15  20  27  13  25  11
    24  17  24  31  14  27  13

    Data set 1.2

    16  18  21  24
    14  15  15  17
    19  21  20  24
    26  24  27  31
    11  12  13  14
    24  21  25  27
    10   9  11  13

    (a) The two data sets have exactly the same numbers. There is something strange about their 

    appearances though. Compare the two data sets.

    (b) Can these two data sets be classified as time series data sets? Explain.

    DISCUSSION OF ACTIVITY 1.4

    On whether data are time series or not

    When information about the data presented is limited, there also tends to be limited feedback from

    an analysis made from them. You probably realised that the rows of data set 1.1 are the same as the

    columns of data set 1.2 and vice versa. Or, in short, that the data sets are transposes of each other.

    The data in their current form cannot be classified as time series data since no chronological pattern

    of the time at which they were collected is given. This will become clearer as we proceed.

    Discussion

    The data above do not necessarily represent time series data, but they can be presented in another way

    to form time series data - provided they were collected chronologically over regular time intervals.

    Suppose data set 1.1 represents the litres of milk sold by Jabulani from Monday to Sunday for four

    weeks. Let 1 = Monday, 2 = Tuesday, ..., 7 = Sunday as given in data set 1.3. The data set should

    therefore be presented as follows:

    Data set 1.3 Litres of milk sold by Jabulani

                      Day
    Week    1   2   3   4   5   6   7
    1      16  14  19  26  11  24  10
    2      18  15  21  24  12  21   9
    3      21  15  20  27  13  25  11
    4      24  17  24  31  14  27  13

    We emphasise that in the initial presentation there was simply no information to explain or 

    demonstrate the chronological sequence with respect to time and that the data were therefore not

    time series data.
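
    The point about chronological order can be illustrated with a short Python sketch. It assumes, as in the discussion above, that each row of the table is one week and each column one day. The variable names are hypothetical and chosen for illustration only.

```python
# Litres of milk sold; one row per week, columns are days 1 (Monday) to 7 (Sunday).
milk_by_week = [
    [16, 14, 19, 26, 11, 24, 10],  # week 1
    [18, 15, 21, 24, 12, 21, 9],   # week 2
    [21, 15, 20, 27, 13, 25, 11],  # week 3
    [24, 17, 24, 31, 14, 27, 13],  # week 4
]

# Attach an explicit time index (week, day) to every observation and
# list the observations in the order in which they occurred.
series = [
    (week, day, litres)
    for week, row in enumerate(milk_by_week, start=1)
    for day, litres in enumerate(row, start=1)
]

# Transposing the table gives the other layout (one row per day of the week).
milk_by_day = [list(column) for column in zip(*milk_by_week)]

print(len(series))     # 28
print(series[:3])      # [(1, 1, 16), (1, 2, 14), (1, 3, 19)]
print(milk_by_day[0])  # [16, 18, 21, 24]
```

    Only once every value carries its time index can the 28 numbers be read as a time series. The bare numbers alone do not tell us which layout is the chronological one.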

     ACTIVITY 1.5

    You are required to use graphs in addition to other methods to detect patterns in time series data.

    Graphical plots reveal information visually, but cannot always be done with ease. The example

    that follows is one of the easy cases where we can draw graphical plots. Analyse the data about

    Jabulani’s business by answering the following questions. Make any comments that you believe are

    relevant.

    (a) Are they time series data? Justify your answer.

    (b) Plot the data to reveal the pattern using the following approaches:

    (i) Plot the data for each week separately.

    (ii) Plot the data of all the weeks in one graphical display.

    (iii) Compare the shapes of the graphs.

    (c) Which plot provides us with a better idea of comparison?

    DISCUSSION OF ACTIVITY 1.5

    Whether a data set forms a time series depends entirely on its form, that is, on the

    chronological order in which the data points are presented. Did you answer

    "yes" in question (a)? If not, what did you reveal? How did you reveal it?

    (b) Graphs of the activity

    (i) Graphs for separate weeks

     

    [Figure: four line plots, one per week (Week 1 to Week 4). Horizontal axis: Days (1 to 7); vertical axis: Litres of milk.]

    (ii) Graph for data of all the weeks

     

    [Figure: one line plot with a separate line for each of Week 1 to Week 4. Horizontal axis: Days (1 to 7); vertical axis: Litres of milk.]

    (iii) In terms of the pattern, the graphs reveal that milk sales were highest on Thursdays, Saturdays

    and Wednesdays (in order from highest to lowest). The lowest sales were revealed for 

    Sundays, Fridays, Tuesdays and Mondays (in the order from lowest to highest).

    (c) The graphs can be difficult to compare when they are on separate systems of axes. The last

    graph makes comparison very easy, revealing that the patterns for all four weeks are similar.
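
    The ranking of days given in (iii) can also be checked numerically. The sketch below is one possible way of doing so, assuming the milk data are laid out with one row per week and one column per day; the variable names are hypothetical.

```python
# Litres of milk sold; one row per week, columns are Monday to Sunday.
milk_by_week = [
    [16, 14, 19, 26, 11, 24, 10],
    [18, 15, 21, 24, 12, 21, 9],
    [21, 15, 20, 27, 13, 25, 11],
    [24, 17, 24, 31, 14, 27, 13],
]
day_names = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Total litres sold on each day of the week, summed over the four weeks.
day_totals = {
    name: sum(column)
    for name, column in zip(day_names, zip(*milk_by_week))
}

# Days ordered from highest to lowest total sales.
ranking = sorted(day_totals, key=day_totals.get, reverse=True)

for name in ranking:
    print(f"{name:<9} {day_totals[name]:>4}")
# Thursday   108
# Saturday    97
# Wednesday   84
# Monday      79
# Tuesday     61
# Friday      50
# Sunday      43
```

    The totals agree with the visual impression from the graphs: Thursdays, Saturdays and Wednesdays are the strongest days and Sundays the weakest.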

    The patterns of the highest activity and lowest activity about a phenomenon are important in time

    series. Jabulani will easily know when he does more business, when he does least business and he

    can plan to find better ways to improve business. Let us start formalising these patterns.

    1.1.3 Components of a time series

    The components of a time series serve as the building blocks of a time series and describe its pattern

    (study p. 5-7 of textbook up to the end of section 1.2).

    Components are important because they enable us to see the salient features of a structure. Through

    them we can make descriptions of what we need to analyse. When we deal with something that we

    can describe, we are better able to know the requirements for dealing with it. A time series also has

    components that need to be considered and taken care of in its analysis.

    Trend

    The first component we discuss is trend. The term “trend” is about long-term decline or growth of

    an activity. It is defined formally as the upward and downward movements that characterise a time

    series over a period of time.

    Time series data may show upward trend or downward trend for a period of years. This may be

    due to factors such as increase in population, change in technological progress, large scale shift in

    consumers’ demands, and so on. For example, population increases over a period of time, price

    increases over a period of years, production of goods on the capital market of the country increases

    over a period of years. These are the examples of upward trend. The sales of a commodity may

    decrease over a period of time because of better products coming to the market. This is an example

    of declining trend or downward trend. The increase or decrease in the movements of a time series

    is called trend.

    Usually one would not be able to determine from looking at the data whether there is a decreasing

    or increasing trend. There are times (but rarely) when we can see the pattern by inspection. Often a

    graphical plot clearly shows the trend. The trend may be given in shapes such as linear, exponential,

    logarithmic, polynomial, power function, quadratic, and other forms. In general, we use the graphical

    displays to find out if there is a decline or increase in the activity. Some examples of trend applications

    that we must look at are given on page 5 of Bowerman et al. (2005). Study them.

    - Technological changes in the industry

    Currently, companies increase ICT usage in their activities for competitive edge over those that do

    not incorporate it. Institutions of higher learning have aggressively incorporated ICT in facilitating

    learning, especially the distance education ones.

    - Changes in consumer tastes

    Housing is very expensive and scarce, but for obvious reasons remains a priority for households.

    Recently, cities such as Cape Town, Durban, East London, Johannesburg, Port Elizabeth and

    Pretoria have experienced a high influx of people from other areas, and employment is biased

    towards the youth. As a result housing in these cities is biased towards townhouses and flats.

    - Increases in total population

    There is an increase since there are more births than deaths. In SA, there is also an influx of 

    people from other countries. Worldwide, natural deaths and deaths resulting from

    holocausts, wars, terrorism and natural disasters such as tsunamis have been

    numerous, but far fewer than the births that have occurred over the years. That is

    why there is an increase in the world’s population.

    - Market growth

    In Gauteng, the market of umbrellas decreases in the period April to July. During the rainy season,

    which in Gauteng happens to be the summer season, the sales of umbrellas increase.

    - Inflation or deflation (price changes)

    If we consider one item for simplicity, maize is produced in the period October to May,

    approximately. During the entry period, the price of maize is high because there are more people

    looking for a less available commodity. During the periods November to January, maize is in

    abundance and the prices drop. As the production level declines, the prices start increasing

    again.

     ACTIVITY 1.6

    Discuss what a time series is, and discuss the meaning of trend effects, seasonal variations, cyclical

    variations, and irregular effects.

    DISCUSSION OF ACTIVITY 1.6

    You should mention a sequence of observations of a variable presented in chronological form

    when you describe a time series. Trend should imply a long-term tendency of that time series.

    Seasonality should include a periodic pattern in the data. Describing cycles should imply up and

    down movements of observations around trend levels. Irregular pattern is the portion of the time

    series which cannot be accounted for by the three patterns discussed above.

    Exploration data set

    The next data set is important for exploration. ENJOY IT. It represents the litres of milk that were

    demanded from Jabulani. Whether there was stock or not is not an issue here. The data set will be

    revisited time and again.

    Data set 1.4

                      Day
    Week    1   2   3   4   5   6   7
    1      16  14  19  26  11  24  10
    2      18  15  21  24  12  21   9
    3      21  15  20  27  13  25  11
    4      24  17  24  31  14  27  13
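
    To explore whether this data set shows an upward or downward trend, one can fit a straight line to the weekly totals. The sketch below uses ordinary least squares. This is an assumption made for illustration, not necessarily the method of the prescribed book, and four weekly totals are far too few for a reliable trend estimate. All variable names are hypothetical.

```python
# Litres of milk demanded; one row per week, columns are days 1 to 7.
milk_by_week = [
    [16, 14, 19, 26, 11, 24, 10],
    [18, 15, 21, 24, 12, 21, 9],
    [21, 15, 20, 27, 13, 25, 11],
    [24, 17, 24, 31, 14, 27, 13],
]

# Summing over a full week removes the day-of-week pattern.
weekly_totals = [sum(row) for row in milk_by_week]
weeks = list(range(1, len(weekly_totals) + 1))

n = len(weeks)
mean_t = sum(weeks) / n
mean_y = sum(weekly_totals) / n

# Least-squares slope and intercept of the line: total = intercept + slope * week.
slope = (
    sum((t - mean_t) * (y - mean_y) for t, y in zip(weeks, weekly_totals))
    / sum((t - mean_t) ** 2 for t in weeks)
)
intercept = mean_y - slope * mean_t

print(weekly_totals)  # [120, 120, 132, 150]
print(f"trend: {intercept:.1f} + {slope:.1f} * week")  # trend: 105.0 + 10.2 * week
```

    A positive slope suggests an increasing trend in weekly demand over the four weeks.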

    In general, methods of forecasting that depend on non-numeric information are qualitative forecasting

    methods. (Do you remember this from first-year Statistics?) Qualitative data are nominal/words data.

    Quantitative forecasting methods on the other hand depend on numerical data.

    Bowerman et al. (2005: 7) present a graphical plot Figure 1.1 (a) to display an example of a trend

    in a time series. There is no trend line to describe the trend, but can you explain whether there is a

    decreasing or increasing trend in the plot to which we are referring?

    Cycle

    The next component of time series that we discuss is “cycle”. When trends have been identified, there

    may be some recurring up and down movements visible around trend levels. These movements are

    called cycles. Cycles occur over long and medium terms. Page 5 of Bowerman et al. (2005) presents

    this component.

    Some interesting explanation is presented by Bowerman et al. (2005: 5) about business cycles.

    Study it in detail. Bowerman et al. (2005: 7) present Figure 1.1 (c) to display an example of a cycle

    in a time series. We need to note that generally, natural occurrences have shown some cyclical

    patterns over the years.

    The impact of cycles on a time series is either to stimulate or depress its activity, but in general,

    their causes are difficult to identify and explain. Certain actions by institutions such as government,

    trade unions, world organisations, and so on, can induce levels of pessimism and optimism into the

    economy which are reflected in changes in the time series levels. Economic indices are usually used

    to describe cyclical fluctuations.

    Cyclical variations are recurrent upward or downward movements in a time series, but the period of a

    cycle is greater than a year. This restriction makes it different from trend. Also, cyclical variations

    are not as regular as seasonal variations. There are different types of cycles, varying in length and

    size. The ups and downs in business activities are the effects of cyclical variation. A business

    cycle showing these oscillatory movements has to pass through four phases: prosperity, recession,

    depression and recovery. In a business, these four phases are completed by passing from one to another

    in this order. Together, they form a cycle.

    Cycles are useful in long-term forecasting. Usually it means centuries and millenniums. Our 

    capabilities and interest in this module do not require us to look beyond a decade. Hence, methods

    for developing forecasts that include cycles (or cyclical components) are not in the scope of this

    module. However, you still need to understand when cycles are discussed or implied in a forecasting

    situation.

    Seasonality

    The example about milk is given over weekly periods. The definition given by Bowerman et al. (2005:

    6) is somewhat misleading! The impression it gives is that the observations being investigated must run

    over a year. This is simply not the case. Even the values occurring within a day can be seen to be

    seasonal, as you will soon see. First, we provide a more useful and realistic definition of seasonality,

    which will be used in the module. The one given in Bowerman et al. works when the observations run

    over yearly periods. Let us define the concept in the next line:

    Seasonal variations are systematic variations that occur within a period and which are tied to some

    properties of that period. They are repeated within the period. They are indeed periodic patterns in a

    time series that complete themselves within a calendar period and are repeated on the basis of that

    period.
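
    For Jabulani's milk data the period is a week, and the seasonal pattern can be summarised by one index per day of the week. The sketch below uses a simple ratio-to-overall-mean index. This choice is an assumption made for illustration and is not necessarily the procedure of the prescribed book; the variable names are hypothetical.

```python
# Litres of milk; one row per week, columns are Monday to Sunday.
milk_by_week = [
    [16, 14, 19, 26, 11, 24, 10],
    [18, 15, 21, 24, 12, 21, 9],
    [21, 15, 20, 27, 13, 25, 11],
    [24, 17, 24, 31, 14, 27, 13],
]
day_names = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

all_values = [litres for row in milk_by_week for litres in row]
overall_mean = sum(all_values) / len(all_values)

# Seasonal index of a day = average for that day / overall average.
# An index above 1 marks a day that is busier than average.
seasonal_index = {
    name: (sum(column) / len(column)) / overall_mean
    for name, column in zip(day_names, zip(*milk_by_week))
}

for name, index in seasonal_index.items():
    print(f"{name:<9} {index:.2f}")
```

    By construction the seven indices average to 1, so each index can be read as a day's typical sales relative to an average day.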

    Seasonal variations are short-term fluctuations in a time series which occur periodically in a period,

    such as a year. In this case it would continue to be repeated year after year. The major factors that

    are responsible for the repetitive pattern of seasonal variations are weather conditions and customs

    of people. More woollen clothes are sold in winter than in summer. Regardless of the

    trend we can observe that in each year more ice cream is sold in summer and very little in the winter

    season. Sales in department stores are higher during festive seasons than on normal

    days.

    Irregular fluctuations

    We have not mentioned whether Jabulani was ever robbed of his revenue or stock for his business.

    Now we are giving you bad news.

    Irregular   fluctuations are variations in time series that are short in duration, erratic in nature and

    follow no regularity in the occurrence pattern. These variations are also referred to as residual

    variations since by definition they represent what is left out in a time series after trend, cyclical and

    seasonal variations have been accounted for. Irregular fluctuations result from the occurrence of

    unforeseen events like  floods, earthquakes, wars, famines, and so on.

    Remember that Jabulani was a smart entrepreneur who would make some estimations of revenue

    each morning he left for work. One Tuesday afternoon after he had counted what he thought was his

    revenue for the day, he was robbed by two thugs. Fortunately he was neither hurt nor discouraged

    to continue with his business. It was happening for the  first time. Could he have anticipated being

    robbed on that day? We also could not have predicted that event.

    The point is that this irregular event changed what could have been the revenue and/or profit for that

    day. In time series, irregular  fluctuations, which are also called irregular variations, refer to random

    fluctuations that are attributed to unpredictable occurrences. Bowerman et al. (2005: 6) appropriately

    define them as erratic movements in a time series that follow no recognisable or regular pattern. The

    presentation about this concept simply implies that these patterns cannot be accounted for. They

    are once-off events. Examples are natural disasters (such as  fires, droughts,  floods) or man-made

    disasters (strikes, boycotts, accidents, acts of violence and so on).

    Note that all the components of a time series influence the time series and can occur in any

    combination. The most important problem to be solved in forecasting is trying to match the

    appropriate model to the pattern of the time series data.

    1.1.4 Applications of forecasting

    Forecasting has application in many situations. Among others, it can be applied in:

    •  Supply chain management - Forecasting can be used in Supply Chain Management to make sure

    that the right product is at the right place at the right time. Accurate forecasting will help retailers

    reduce excess inventory and therefore increase profit margin. Accurate forecasting will also help

    them meet consumer demand.

    •  Weather forecasting, Flood forecasting and Meteorology


    •  Transport planning and Transportation forecasting

    •  Economic forecasting

    •  Technology forecasting

    •  Earthquake prediction

    •  Land use forecasting

    •  Product forecasting

    •  Player and team performance in sports

    •  Telecommunications forecasting

    •  Political Forecasting

    1.2 Forecasting Methods

    This topic is discussed on pages 7 to 12. Study these pages. On page 7 there is a reminder that

    there is no single best forecasting method. There are, however, appropriate methods for any time

    series situation. The forecasting methods are described along the same line as types of data that you

    dealt with in your Statistics courses/modules at  first year level. They are qualitative and quantitative

    in nature.

    1.2.1 Qualitative methods

    Study this topic from page 8 to page 11.

    The textbook explains on page 8 that generally, qualitative forecasting methods become an option

    to develop forecasts in situations where there are no historical numeric data or where statisticians

    trained in time series analysis are not available. Opinions of experts are generally used to make predictions

    in such cases. Predictions are necessary in all situations, even where there are no data. When this

    occurs, qualitative methods are used.

    Common examples of qualitative forecasting methods are judgemental methods. Judgemental

    forecasting methods incorporate intuitive judgements, opinions and subjective probability estimates.

    Examples include:

    •  Composite forecasts

    •   Surveys

    •  Delphi method

    •  Scenario building

    •  Technology forecasting

    •  Forecast by analogy


    You do not need to learn more about these for the requirements of this module. However, you

    may come across them in applications. Hence, your encounter with them may be of help in future

    applications.

    1.2.2 Quantitative methods

    Quantitative forecasting methods are used (and are only possible) when historical data that occur in

    numeric form are available. These methods may occur as univariate forecasting methods or as

    causal methods (Bowerman et al., 2005: 11).

    Univariate forecasting methods depend only on past values of the time series to predict future

    values. In this method, data patterns are identified from historical data, the assumption is made

    that the patterns will continue in the future and then the pattern is extrapolated in order to develop

    forecasts. Study this topic on page 11.
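    As a minimal sketch of the univariate idea, the Python snippet below forecasts the next value of a series as the average of its last few observations (a simple moving average). The series and the window length of three are hypothetical choices made purely for illustration; the textbook may use other methods and settings.

```python
def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    recent = series[-window:]
    return sum(recent) / window

# Hypothetical weekly milk sales in litres (illustrative values only).
weekly_sales = [500, 520, 510, 530, 525]

# The pattern in the past values is assumed to continue into the next week.
forecast = moving_average_forecast(weekly_sales, window=3)
print(f"Forecast for next week: {forecast:.1f} litres")
```

    Only past values of the series itself are used here; no other variable enters the forecast, which is what makes the method univariate.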

    Causal forecasting methods start by identifying variables that are related to the one to be predicted.

    This is followed by forming a statistical model that describes the relationship between these

    variables and the variable to be forecasted. The common ones are regression models and ordinary

    polynomials. Study this topic on page 11.

    In the causal forecasting method, the variable of interest, which is the one whose forecasts are

    required, depends on other variables. It is thus the dependent variable. The ones on which the

    variable of interest depends are known as the independent variables.
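    The roles of the dependent and independent variables can be sketched with a simple least-squares line. The paired values below are hypothetical (they are not module data), and a straight-line relationship is assumed only for the sake of the example.

```python
# Hypothetical paired observations (illustrative values only):
# x = disposable income (independent variable)
# y = milk sales in litres (dependent variable)
income = [100, 120, 140, 160, 180]
sales = [18, 21, 25, 27, 31]

n = len(income)
mean_x = sum(income) / n
mean_y = sum(sales) / n

# Least-squares estimates of the slope and intercept of y = b0 + b1 * x.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(income, sales))
sxx = sum((x - mean_x) ** 2 for x in income)
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

# Forecast the dependent variable for a new (assumed) value of the independent variable.
new_income = 200
predicted_sales = b0 + b1 * new_income
print(f"sales = {b0:.2f} + {b1:.3f} * income; forecast at {new_income}: {predicted_sales:.1f}")
```

    The forecast of the dependent variable is only as good as the value supplied for the independent variable, which is why the influencing variables must themselves be understood or projected.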

    Discussion about dependence/independence

    Note that Jabulani’s customers are mostly people who receive wages on a weekly basis. Some are

    paid on Saturday afternoon, but an overwhelming majority are paid on Friday afternoon. In addition,

    on Saturday afternoon, there is an item P that is also liked by many milk buyers. If item P is available

    before milk arrives, then this item is bought in large quantities, leaving limited disposable income for

    milk purchases. Fortunately for Jabulani, he has, in the past four weeks, managed to deliver milk

    before item P was delivered. However, most of the buyers who are paid on Saturday tend to meet

    the P seller before their milk purchases on Sunday morning.

    It is necessary to understand dependencies and correlations when dealing with forecasting. If you fail

    to understand them, you may fall into the trap of making wrong assumptions: overlooked influences on

    the variable being forecast, and constraints that come with correlated variables, can lead to

    inaccurate models and thus to wrong forecasts.

    Useful common examples are time series and causal methods. There are others as well, but the

    following may be of help in your development.


    Time series methods

    Time series methods use historical data as the basis of estimating future outcomes.

    •   Rolling forecast is a projection into the future based on past performances, routinely updated

    on a regular schedule to incorporate data.

    •   Moving average

    •   Exponential smoothing

    •   Extrapolation

    •   Linear prediction

    •   Trend estimation

    •   Growth curve

    Causal / econometric methods

    Some forecasting methods use the assumption that it is possible to identify the underlying factors

    that might influence the variable that is being forecasted. For example, sales of umbrellas might

    be associated with weather conditions. If the causes are understood, projections of the influencing

    variables can be made and used in the forecast.

    •   Regression analysis using linear regression or non-linear regression

    •   Autoregressive moving average (ARMA)

    •   Autoregressive integrated moving average (ARIMA), e.g. Box-Jenkins

    •  Econometrics

    Other methods

    •   Simulation

    •   Prediction market

    •   Probabilistic forecasting and ensemble forecasting

    •   Reference class forecasting

    These methods are listed so that when you consult other forecasting sources,

    you will be able to understand where they belong in your module. However, they are not necessarily

    required to the extent that they are presented in those other sources.

     ACTIVITY 1.7

    •  Do you see any dependence of the variables?

    Hint: Focus on milk purchases and disposable income.


    DISCUSSION OF ACTIVITY 1.7

    Keeping to the hint, the purchase of an item that is in high demand depends on the availability of 

    disposable income.

     ACTIVITY 1.8

    (a) Classify the milk sales in the latest scenario as a dependent or independent variable.

    (b) Explain your choice in (a) above. Here confine your response to milk purchases and disposable

    income.

    (c) Identify the dependent variable and the independent variable.

    DISCUSSION OF ACTIVITY 1.8

    Regarding (a), milk sales are a dependent variable. For (b), the reason is that milk sales depend on

    the availability of disposable income. This leads to (c): milk sales are the dependent variable and

    disposable income is the independent variable.

    1.3 Errors in forecasting and forecast accuracy

    When it was said that the pattern of information given, such as Jabulani’s milk sales, can help you

    make future predictions, no one said your predictions would be perfect.

    It is time to note that if the forecasts prepared/developed are not accurate, they may be useless since

    they are probably going to mislead the user. When we insist on a scientific method in forecasting, it

    is to ensure that we can monitor the methods and test the models so that the inaccuracies in them

    are reduced or, ideally, eliminated.

    It is important to know the likely errors when you attempt to make predictions or develop forecasts.

    If you know them, you can avoid or minimise them. An error is as simple as predicting that Jabulani

    will sell 500 litres in a specific week when he ends up selling 520 litres. (Note that you could

    also make an error by overestimating the litres of milk.)

    The next sections require your skill in drawing graphs and interpreting them. The

    most common ones you should expect to encounter (draw and interpret) are the scatter diagram (or

    scatterplot) and the time plot. Revise them if you have forgotten how they are drawn.

    Further, you are soon going to engage in a number of calculations. Thus, ensure that you are

    ready to perform them, and that you remember the descriptive statistics you learnt in your early years


    of Statistics. It is also very important to know why the calculations are necessary in any

    exercise of building a forecast model.

    Bowerman et al. (2005: 12) name two types of forecasts, the point forecast and the prediction

    interval. A point forecast is a single number that estimates the actual observation. A prediction

    interval is a range of values that gives us some confidence that the actual value is contained in the

    interval.

    The forecast error as defined in Bowerman et al. (2005: 13) requires that the estimate be found and

    be “paired” with the actual observation.

    In statistics, a forecast error is the difference between the actual or real and the predicted or forecast

    value of a time series or any other phenomenon of interest. In simple cases, a forecast is compared

    with an outcome at a single time-point and a summary of forecast errors is constructed over a

    collection of such time-points. Here the forecast may be assessed using the difference or using

    a proportional error. By convention, the error is defined using the value of the outcome minus the

    value of the forecast. In other cases, a forecast may consist of predicted values over a number of 

    lead-times; in this case an assessment of forecast error may need to consider more general ways of 

    assessing the match between the time-profiles of the forecast and the outcome. If a main application

    of the forecast is to predict when certain thresholds will be crossed, one possible way of assessing

    the forecast is to use the timing-error—the difference in time between when the outcome crosses

    the threshold and when the forecast does so. When there is interest in the maximum value being

    reached, assessment of forecasts can be done using any of:

    ·   the difference of times of the peaks;

    ·   the difference in the peak values in the forecast and outcome;

    ·   the difference between the peak value of the outcome and the value forecast for that time point.
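    The three peak-based assessments listed above can be sketched in Python as follows. The outcome and forecast values are hypothetical, and each difference is taken as outcome minus forecast, in line with the sign convention stated above.

```python
# Hypothetical outcome and forecast over seven time points (illustrative values only).
outcome = [10, 14, 19, 25, 22, 16, 12]
forecast = [11, 16, 23, 21, 18, 15, 11]

# Time index (starting at 0) at which each series reaches its peak.
outcome_peak_time = outcome.index(max(outcome))
forecast_peak_time = forecast.index(max(forecast))

# 1. Difference of the times of the peaks (outcome minus forecast).
peak_time_difference = outcome_peak_time - forecast_peak_time

# 2. Difference in the peak values of the outcome and the forecast.
peak_value_difference = max(outcome) - max(forecast)

# 3. Peak value of the outcome minus the value forecast for that time point.
error_at_outcome_peak = outcome[outcome_peak_time] - forecast[outcome_peak_time]

print(peak_time_difference, peak_value_difference, error_at_outcome_peak)
```

    In this made-up case the forecast peaks one period too early and two units too low, and it is four units too low at the moment the outcome actually peaks. The three measures answer different questions, so they need not agree.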

    Forecast error can be a calendar forecast error or a cross-sectional forecast error, when we want to

    summarize the forecast error over a group of units. If we observe the average forecast error for a

    time-series of forecasts for the same product or phenomenon, then we call this a calendar forecast

    error or time-series forecast error. If we observe this for multiple products for the same period, then

    this is a cross-sectional performance error.

    To calculate the forecast errors we subtract each estimate (ŷi) from the corresponding actual observation (yi). The

    difference is the forecast error. Can you tell what the values of the forecast errors imply? For 

    example, some may be smaller than others, some negative and others positive!

    When Jabulani plans his sales, he makes some estimation of litres of milk that he hopes to sell. In

    Week 3 prior to getting to the market, he had made the following estimations  (ŷi):


    Week   Day   Litres of milk estimated (ŷ)

    3      1     27
           2     11
           3     20
           4     26
           5     14
           6     22
           7     9

    Remember to refer to the appropriate week of the table of Data set 1.4 for observed values  (yi).

     ACTIVITY 1.9

    (a) On which days were there overestimation?

    (b) On which days were there underestimation?

    (c) Calculate the forecast errors for these estimates.

    (d) Identify the day on which the milk sales were most disappointing! Explain.

    (e) On which day did he make the best prediction? Why?

    DISCUSSION OF ACTIVITY 1.9

    We have not defined the terms overestimation and underestimation formally. They have been

    defined in other modules, but here is a reminder. If you make a prediction and the actual

    observation turns out to be smaller, you have overestimated. What is the sign of the forecast

    error? Can you now define the term “underestimation”? What about the sign of the forecast error?

    Let us get into the questions of the activity. The setup of week 3 is as follows:

    Actual observations (yi)      21   15   20   27   13   25   11
    Estimated observations (ŷi)   27   11   20   26   14   22    9

    (a) Overestimations are visible after pairing by observing the pairs in which the actual observations

    are lower than the estimates. These were on Day 1 and Day 5.

    (b) Underestimations occurred on Day 2, Day 4, Day 6 and Day 7.

    (c) The forecast errors are −6, 4, 0, 1, −1, 3 and 2 for the seven days, respectively.

    (d) Day 1 was the most disappointing. This is because Jabulani expected to sell 27 litres but only

    sold 21 litres. It is the day with the largest shortfall, that is, the largest negative error.

    (e) He made the best prediction on Day 3, where the sales were equal to the estimates.
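    The answers to (a) to (c) can be checked with a short Python sketch. It uses the Week 3 values given above; the variable names are our own.

```python
# Week 3 values from the discussion above.
actual = [21, 15, 20, 27, 13, 25, 11]      # observed litres (y)
estimated = [27, 11, 20, 26, 14, 22, 9]    # Jabulani's estimates (y-hat)

# Forecast error = actual observation minus estimate.
errors = [y - y_hat for y, y_hat in zip(actual, estimated)]

# A negative error means the estimate was too high (overestimation);
# a positive error means it was too low (underestimation).
over_days = [day for day, e in enumerate(errors, start=1) if e < 0]
under_days = [day for day, e in enumerate(errors, start=1) if e > 0]

print("Forecast errors:", errors)
print("Overestimated on days:", over_days)
print("Underestimated on days:", under_days)
```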

    If there had been no day when the sales and estimates were equal, then the day with the smallest forecast

    error in absolute value would have been the one on which the best prediction was made. This means

    that Day 4 and Day 5 are the days on which good predictions were made. However, we note that


    (b) The plot looks almost random. This means that the forecasting technique provides a good  fit to

    the data.

    1.3.1 Absolute deviation

    Forecast errors are used to calculate absolute deviations. The absolute deviation (Bowerman et al.,

    2005: 15) requires the forecast errors in absolute terms, i.e., a matter of “how far is the estimate from

    the actual observation”.

     ACTIVITY 1.11

    Calculate the absolute deviations for the estimates in Activity 1.9.

    DISCUSSION OF ACTIVITY 1.11

    The calculation is fairly straightforward. We need the forecast errors, which were calculated as

    Forecast errors (ei)   −6   4   0   1   −1   3   2

    The absolute deviations are the absolute values of the forecast errors, which we can recall from our

    high-school days. The absolute deviations are thus

    Absolute deviations (|ei|)   6   4   0   1   1   3   2

    1.3.2 Mean absolute deviation

    The absolute deviations give us the mean absolute deviation (MAD) when we obtain their average in

    the usual way. The MAD (Bowerman et al., 2005: 15) requires the following steps: take the absolute

    deviations, add them and divide the sum by their number; the result is the MAD.

     ACTIVITY 1.12

    Calculate the MAD for the estimates in Activity 1.9.

    DISCUSSION OF ACTIVITY 1.12

     Absolute deviations (|ei|) 6 4 0 1 1 3 2

    The MAD is therefore

    \[ \mathrm{MAD} = \frac{\sum_{i=1}^{7} |e_i|}{n} = \frac{17}{7} = 2.42857. \]
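    The same calculation can be sketched in Python, using the forecast errors from Activity 1.9 (the variable names are our own):

```python
errors = [-6, 4, 0, 1, -1, 3, 2]  # Week 3 forecast errors from Activity 1.9

# Step 1: take absolute values; Step 2: add them; Step 3: divide by their number.
absolute_deviations = [abs(e) for e in errors]
mad = sum(absolute_deviations) / len(absolute_deviations)

print(absolute_deviations)   # [6, 4, 0, 1, 1, 3, 2]
print(round(mad, 5))         # 2.42857
```

    The MAD is in the same units as the data, so on average Jabulani's estimates were about 2.4 litres away from the actual sales.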


    1.3.3 Squared error 

    Another way to prevent positive and negative errors from cancelling each other is to square them (Bowerman et al., 2005: 15).

     ACTIVITY 1.13

    Calculate the squared errors for the estimates in Activity 1.9.

    DISCUSSION OF ACTIVITY 1.13

    Forecast errors (ei)   −6 4 0 1   −1 3 2

    The squared errors are therefore

    Squared errors (ei²)   36   16   0   1   1   9   4

    1.3.4 Mean squared error 

    The MSE is the average of the squared errors.

     ACTIVITY 1.14

    Calculate the MSE for the estimates in Activity 1.9.

    DISCUSSION OF ACTIVITY 1.14

    To calculate the MSE we need the squared errors, which were calculated as

    Squared errors (ei²)   36   16   0   1   1   9   4

    The MSE is therefore

    \[ \mathrm{MSE} = \frac{\sum_{i=1}^{7} e_i^2}{n} = \frac{67}{7} = 9.57143. \]

    Now, let us pause a little. We have done a few useful calculations. We have also answered a few

    questions about errors.


    Do you recall the value of the forecast error on the day that the estimate was perfect? Do you also

    see what is meant by a poor estimate? Now can you say what is meant by a good estimate? You

    will recall that the errors need to be as small as possible. So far it is not absolutely clear what small

    entails.

    The MAD and MSE are the measures that we will use to determine whether the errors are small; small errors

    indicate a good model. The objective is to select a good forecast model. The model that will be

    selected must produce forecasts that are close to the actual observations. The MAD and the MSE

    will serve as our tools to select a forecast model.

    We need to understand the MAD and the MSE as they relate to the forecast model. The steps are

    as follows:

    MAD steps                        MSE steps

    Calculate forecast errors        Calculate forecast errors
    Determine absolute deviations    Determine squared errors
    Add the absolute deviations      Add the squared errors
    Divide by their number           Divide by their number

    MAD is not in any way “mad”. It is an objective route to good forecasting. The MSE serves the same

    purpose.
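    To illustrate how the MAD and the MSE can serve as selection tools, the sketch below compares two sets of forecasts for the Week 3 observations. Model A is Jabulani's set of estimates from Activity 1.9. Model B is a hypothetical rival, invented here for illustration, that forecasts 19 litres for every day. The function names are our own.

```python
def mad(actual, forecasts):
    """Mean absolute deviation of the forecast errors."""
    return sum(abs(y - f) for y, f in zip(actual, forecasts)) / len(actual)

def mse(actual, forecasts):
    """Mean squared error of the forecast errors."""
    return sum((y - f) ** 2 for y, f in zip(actual, forecasts)) / len(actual)

actual = [21, 15, 20, 27, 13, 25, 11]    # Week 3 observations
model_a = [27, 11, 20, 26, 14, 22, 9]    # Jabulani's estimates
model_b = [19] * 7                       # hypothetical rival: same forecast every day

for name, forecasts in [("A", model_a), ("B", model_b)]:
    print(f"Model {name}: MAD = {mad(actual, forecasts):.3f}, "
          f"MSE = {mse(actual, forecasts):.3f}")
```

    The model with the smaller MAD and MSE is preferred. Here both measures favour Model A, but in general they can disagree, because the MSE penalises a few large errors more heavily than the MAD does.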

    Sometimes the effectiveness of a model is measured in percentages. Such measures are the

    absolute percentage error (APE) and the mean absolute percentage error (MAPE) (Bowerman et

    al., 2005: 18).

    1.3.5 Absolute percentage error (APE)

    APE is the absolute error divided by the corresponding actual observation, multiplied by 100 to express it as a percentage.

     ACTIVITY 1.15

    Calculate the APE for the estimates in Activity 1.9.

    DISCUSSION OF ACTIVITY 1.15

    To calculate the APE we need the absolute errors and the actual observations, which are

     Absolute deviations (|ei|) 6 4 0 1 1 3 2

     Actual observations (yi) 21 15 20 27 13 25 11

    The APE is therefore

    APEi   28.5714   26.6667   0.00   3.7037   7.6923   12.00   18.1818


    1.3.6 Mean absolute percentage error (MAPE)

    MAPE is the mean of the APEs. It is defined as

    \[ \mathrm{MAPE} = \frac{\sum_{i=1}^{n} \mathrm{APE}_i}{n}. \]

     ACTIVITY 1.16

    Calculate the MAPE corresponding to the estimates in Activity 1.9.

    DISCUSSION OF ACTIVITY 1.16

    To calculate the MAPE we need the APE, which are

    APEi   28.5714   26.6667   0.00   3.7037   7.6923   12.00   18.1818

    We obtain

    \[ \sum_{i=1}^{7} \mathrm{APE}_i = 96.8159. \]

    The MAPE is therefore

    \[ \mathrm{MAPE} = \frac{96.8159}{7} = 13.8308. \]
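    The APE and MAPE calculations can be sketched together in Python, using the Week 3 observations and estimates (the variable names are our own):

```python
actual = [21, 15, 20, 27, 13, 25, 11]      # Week 3 observations
estimated = [27, 11, 20, 26, 14, 22, 9]    # Jabulani's estimates

# APE: absolute error divided by the actual observation, times 100.
# Note: the APE is undefined when an actual observation is zero.
ape = [abs(y - y_hat) / y * 100 for y, y_hat in zip(actual, estimated)]

# MAPE: the mean of the APEs.
mape = sum(ape) / len(ape)

print([round(a, 4) for a in ape])
print(round(mape, 4))
```

    Because each error is scaled by its own actual value, the same error of a few litres counts for more on a day with low sales than on a day with high sales.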

    The intention when measuring the error is to monitor and control it, and to reduce it so as to increase the accuracy

    of these methods.

    1.3.7 Forecasting accuracy

    This section summarises the ‘errors in forecasting’ measures presented above and presents them as

    the level of accuracy achieved. It is important to know that forecast accuracy starts with the forecast

    error. As you have seen, the forecast error is the difference between the actual value and the forecast

    value for the corresponding period:

    \[ e_t = y_t - F_t \]

    where e is the forecast error at period t, y is the actual value at period t, and F is the forecast for

    period t. A summary of the measures is given in the next table.


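    Measures of aggregate error:

    Mean absolute deviation (MAD)           \( \mathrm{MAD} = \frac{\sum_{t=1}^{n} |e_t|}{n} \)

    Mean absolute percentage error (MAPE)   \( \mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{e_t}{y_t} \right| \)

    Mean squared error (MSE)                \( \mathrm{MSE} = \frac{\sum_{t=1}^{n} e_t^2}{n} \)

    Root mean squared error (RMSE)          \( \mathrm{RMSE} = \sqrt{\frac{\sum_{t=1}^{n} e_t^2}{n}} \)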
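    The RMSE is the only measure in the table that has not been worked through in an activity. As a brief sketch, it can be obtained from the Week 3 forecast errors by taking the square root of the MSE (the variable names are our own):

```python
import math

errors = [-6, 4, 0, 1, -1, 3, 2]  # Week 3 forecast errors (actual minus estimate)

# MSE: the average of the squared errors.
mse = sum(e ** 2 for e in errors) / len(errors)

# RMSE: the square root of the MSE, which brings the measure
# back to the original units of the data (litres).
rmse = math.sqrt(mse)

print(f"MSE = {mse:.5f}, RMSE = {rmse:.5f} litres")
```

    Since the RMSE is expressed in litres, it can be compared directly with the MAD, whereas the MSE is in squared litres.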

    Please note that business forecasters and practitioners sometimes use different terminology in the

    industry. They refer to the PMAD as the MAPE, although what they actually compute is a volume-weighted MAPE.

    Please stick to the textbook notation.
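    The difference between the two quantities can be sketched as follows. The sketch assumes the commonly used definition of the PMAD as the sum of the absolute errors divided by the sum of the actual values. This definition is an assumption on our part, since it is not given in this guide, and the PMAD is not required for this module.

```python
actual = [21, 15, 20, 27, 13, 25, 11]      # Week 3 observations
estimated = [27, 11, 20, 26, 14, 22, 9]    # Jabulani's estimates

abs_errors = [abs(y - y_hat) for y, y_hat in zip(actual, estimated)]

# MAPE as defined in this module: the mean of the individual percentage errors.
mape = sum(e / y for e, y in zip(abs_errors, actual)) / len(actual) * 100

# PMAD (assumed definition): total absolute error relative to total volume,
# so days with larger sales carry more weight.
pmad = sum(abs_errors) / sum(actual) * 100

print(f"MAPE = {mape:.2f}%, PMAD = {pmad:.2f}%")
```

    The two figures differ for the same data, which is why it matters to know which one is meant when a practitioner quotes a "MAPE".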

    1.4 Choosing a forecasting technique

    We have to le