Laboratory Work #3-4 Purpose : The study of Time series and forecasting. Elaborated: Carolina Postica, EMREI-125 Controlled: Natalia Şîşcan, prof. catedrei de Informatica si Statistica 1

Upload: carolina-postica

Post on 24-Oct-2015


Laboratory Work #3-4

Purpose: The study of Time series and forecasting.

Elaborated: Carolina Postica, EMREI-125

Controlled: Natalia Şîşcan, prof. catedrei de Informatica si Statistica

ASEM

2013


Theory:

Time series analysis is a quantitative method used to detect patterns of change in statistical information collected at regular intervals of time. We project these patterns to obtain estimates of the variable for the future. Thus, time series analysis helps us cope with uncertainty about the future.

Time series analysis aims to describe past evolutions, to identify types of variation in the historical dataset and to extend it with a certain level of significance to the future. We can determine four kinds of change, or variation, involved in time series analysis; they are:
• trend or tendency
• cyclical fluctuations
• seasonal fluctuations
• irregular variation

Trend Analysis
The secular trend is one of the most studied of the four components of a time series; it represents the long-term tendency of the series. The trend is analyzed and described either to extend it for a long-term forecast or to eliminate its influence in order to obtain short-term forecasts. One way to describe the trend component is to fit a line visually to a set of points on a graph. Any given graph, however, is subject to slightly different interpretations by different analysts. We may also estimate the tendency with a mathematical model, by fitting a trend line by the method of least squares, or by smoothing the data using moving averages. In this section we focus on the method of least squares, since visually fitting a line to a time series is not a completely dependable process.

Adjusting Linear Trend by Least Squares Method
Most trends are described by a first degree equation. Some trends are better described by a curve, such as a 2nd degree equation or an exponential or logarithmic function. The trends described by a 1st degree equation are called linear trends. Before developing the equation for a linear trend, we need to review the general form of a first degree equation:

$\hat{Y} = a + bX$

where $\hat{Y}$ is the estimated value of the dependent variable, $X$ is the independent variable (time, in trend analysis), $a$ is the Y-intercept and $b$ is the slope of the trend line.


The least squares equations take a simple form, $a = \bar{Y}$ and $b = \sum tY / \sum t^2$, when the array of periods is coded (translated into numbers) so that the coded values $t$ sum to zero.

8.3.3 Marking or Coding Time
Normally, we measure the variable time in terms such as weeks, months, and years. In order to make predictions, we convert these traditional measures of time to an array of numbers that simplifies the computation. The process is called time coding. To code time, we find the mean time and then subtract that value from each of the sample times. Suppose our time series consists of only three points: 1996, 1997, and 1998. If we had to place these numbers in the equations, we would find the calculations monotonous and complicated. Instead, we can transform the values 1996, 1997 and 1998 into the corresponding values -1, 0 and 1, where 0 represents the mean (1997), -1 represents the first year (1996 - 1997 = -1) and 1 represents the last year (1998 - 1997 = 1).

We need to consider two cases when we are coding time values. The first is a time series with an odd number of elements, as in the previous example. The second is a series with an even number of elements. Consider Table 8-3. In part a, on the left, we have an odd number of years, so the process is the same as the one we just described using the years 1996, 1997, and 1998. In part b, on the right, we have an even number of elements. In cases like this, when we find the mean and subtract it from each element, the fraction ½ becomes part of the answer. To simplify the coding process and to remove the ½, we multiply each coded time element by 2. We will denote the coded or translated time by $t_i$.

We have two reasons for this translation of time. First, it eliminates the need to square numbers as large as 1993, 1994, 1995 and so on. Second, it sets the mean year equal to zero, which simplifies the equations.
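The coding rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `code_time` is hypothetical, and the sketch assumes consecutive, equally spaced periods such as calendar years.

```python
def code_time(periods):
    """Return coded time values centered on the mean period.

    Assumes consecutive, equally spaced periods (e.g. calendar years).
    Odd number of periods: deviations from the mean (..., -1, 0, 1, ...).
    Even number of periods: deviations are doubled to remove the 1/2
    fraction (..., -3, -1, 1, 3, ...), so one unit is half a period.
    """
    n = len(periods)
    mean = sum(periods) / n
    factor = 1 if n % 2 == 1 else 2
    return [round(factor * (p - mean)) for p in periods]


# Odd number of elements: the three-year example from the text.
print(code_time([1996, 1997, 1998]))

# Even number of elements: 1992-1999, coded in half-year units.
print(code_time(list(range(1992, 2000))))
```

In both cases the coded values sum to zero, which is what allows the least squares equations to be simplified.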


a) when there is an odd number of elements in the time series
b) when there is an even number of elements in the time series

Example: Using the least squares method in a time series with an even number of elements, in order to measure the trend. Consider the data in Table 8-4, illustrating the number of ships loaded at Constanta City between 1992 and 1999. In this problem, we want to find the equation that describes the secular trend of loadings. The values needed for the equations are calculated in Table 8-4. With these values, we can substitute into the equations to find the slope and the Y-intercept of the line describing the trend in ship loadings.

Thus, the general linear equation describing the secular trend in ship loadings is:

$\hat{Y} = a + bt = 139.25 + 7.536t$

where
$\hat{Y}$: estimated annual number of ships loaded
$t$: coded time value representing the number of half-year intervals (a minus sign indicates half-year intervals before 1995½; a plus sign indicates half-year intervals after 1995½)

Projecting the trend equation
Once we have developed the trend equation, we can project it to forecast the variable in question. In the problem of finding the secular trend in ship loadings, for instance, we determined that the appropriate secular trend equation was:

$\hat{Y} = 139.25 + 7.536t$

Now suppose we want to estimate ship loadings for 2000. First, we must convert 2000 to the value of the coded time (in half-year intervals):

$t_i$ = 2000 - 1995½ = 4.5 years = 9 half-year intervals

Substituting this value into the equation for the secular trend, we get:

$\hat{Y} = 139.25 + 7.536 \times 9 = 139.25 + 67.82 \approx 207$ ships loaded

Therefore, we estimate that 207 ships will be loaded in 2000. If the number of elements in our time series had been odd, not even, our procedure would have been the same, except that we would have dealt with one-year intervals, not half-year intervals.
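As a quick check of the projection above, the trend equation can be evaluated in Python. The coefficients 139.25 and 7.536 and the midpoint 1995½ come from the text; the function and constant names are hypothetical and chosen only for this sketch.

```python
# Coefficients of the fitted trend line, taken from the text.
INTERCEPT_A = 139.25
SLOPE_B = 7.536
# Midpoint of the 1992-1999 series, used as the origin of coded time.
SERIES_MIDPOINT = 1995.5


def coded_time(year):
    """Coded time in half-year intervals relative to 1995 1/2."""
    return 2 * (year - SERIES_MIDPOINT)


def projected_loadings(year):
    """Project the linear trend Y = a + b*t to the given year."""
    return INTERCEPT_A + SLOPE_B * coded_time(year)


t_2000 = coded_time(2000)
print(t_2000)                           # coded time for the year 2000
print(round(projected_loadings(2000)))  # projected ships loaded in 2000
```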

Forecasting Techniques
Forecasting is essential to business success and plays an important role in running a business efficiently. It covers marketing, financial planning, production planning and other areas. It is extremely rare for a managerial decision to be taken without some form of forecasting. The forecast is not a product in itself, but a tool that supports the managerial decision.

Forecasted values can be obtained using qualitative or quantitative techniques. If the forecast results from the personal judgment or opinion of one or more experts, the technique is called judgmental. The quantitative forecasting techniques are time series analysis and regression analysis. Time series analysis uses techniques that estimate one or more future values of the time series; the forecast depends upon the model of behavior of the time series. In regression analysis, the variable to be forecasted (the dependent variable) is expressed as a mathematical function of the independent variable, as presented in Chapter 7. All three types of forecast can be used in conjunction with one another. Judgmental techniques are often used together with an appropriate time series or regression analysis.

1 Judgmental Techniques
Judgmental forecasting techniques are by their very nature subjective and involve qualities such as intuition, experience, attitudes and opinions. They usually lead to forecasts that are based upon qualitative criteria. The most important judgmental technique is the Delphi method. The method uses a group of experts located in different places. In addition to these experts, there are one or more decision makers, responsible for taking the decision based on the forecast. There should also be a support group in charge of running the procedure, preparing the questionnaires and analyzing their results.

The first step of the method is to send a first questionnaire to the panel of experts and analyze the answers. Based upon the results of the first questionnaire, the experts receive a second, revised questionnaire together with the results of the first one. Based on the results of both questionnaires and using their own expertise, the decision makers ultimately come forth with a forecast. The key to the Delphi method is the feedback of the information obtained from the first questionnaire to the group of experts, so that each member of the group has access to the others' conclusions. Success also depends upon the quality of the questionnaires. Occasionally, if the experts' opinions still diverge after two iterations, a third iteration is used to achieve an effective forecast.

Another judgmental technique is the Group of Experts technique: several experts are brought together and interact with each other to produce a consensus forecast.

2 Quantitative Techniques
We can identify two categories of quantitative techniques: those based upon time series analysis and those based upon regression equations between an independent and a dependent variable.

Because a time series is a description of the past, a logical procedure for forecasting is to make use of the historical data. If history is to repeat itself, so that past data indicate what to expect in the future, we can postulate a mathematical model that is representative of the process. If this model is known, except possibly for certain parameters, we can generate the forecast. If the model is not known, the past data may suggest its form. In most practical situations the exact form of the model that generates the time series is unknown. Usually the model is chosen by observing the outcomes of the time series over a period of time and by plotting the historical data. Once the form of the model is chosen, an appropriate mathematical model of the process generating the time series is identified, except possibly for the unknown values of certain parameters.

We will denote by:
- $X_i$ the random variable that is observed at time i,
- $a$ the constant value of the model,
- $e_i$ the random error occurring at time i, frequently assumed to have expected value equal to zero and constant variance,
- $F_{t+1}$ the forecast of the value of the time series at time t + 1.

It is reasonable to expect that $F_{t+1}$ will be a function of some, or all, of the observed values of the time series prior to time t + 1.

A. Forecasting Techniques for Constant Level Models
Suppose that the generating process of the time series is identified as a constant level model with random fluctuations. An appropriate model, using the previous notation, is given by:

$X_i = a + e_i$

The following four techniques are often used in practice.

Last value forecasting procedure
Let $x_t$ be the value that the random variable $X_t$ takes on. The last value forecasting procedure assumes that the forecast at time t + 1, that is $F_{t+1}$, equals the value of the time series observed at time t:

$F_{t+1} = x_t$

Average forecasting procedure
The forecast at time t + 1 is the mean value of all the historical data:

$F_{t+1} = \frac{1}{t} \sum_{i=1}^{t} x_i$

This estimate is an excellent one if the process is entirely stable, that is, if the assumptions about the model are correct. With large masses of data, however, the estimate becomes unwieldy. Also, some decision makers do not want to use very old data, because they doubt that the same model has held for so long.

Moving average forecasting procedure
This procedure uses only the last n observations:

$F_{t+1} = \frac{1}{n} \sum_{i=t-n+1}^{t} x_i$

This method uses only the relevant data from the past n periods and is easily updated from period to period, because the oldest observation is dropped and the newest one is added. The moving average estimator combines the advantages of the previous two estimators in that it uses recent history and is based on multiple observations.
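The three procedures described so far can be compared on a small example. The sketch below is illustrative only: the function names and the sample series are hypothetical and do not come from the laboratory data.

```python
def last_value_forecast(x):
    """Forecast for t+1 equals the last observed value."""
    return x[-1]


def average_forecast(x):
    """Forecast for t+1 equals the mean of all observations."""
    return sum(x) / len(x)


def moving_average_forecast(x, n):
    """Forecast for t+1 equals the mean of the last n observations."""
    if not 1 <= n <= len(x):
        raise ValueError("n must be between 1 and the series length")
    return sum(x[-n:]) / n


# Hypothetical observations of a roughly constant-level series.
series = [10, 12, 11, 13, 14]

print(last_value_forecast(series))
print(average_forecast(series))
print(moving_average_forecast(series, 3))
```

With n = 1 the moving average reduces to the last value procedure, and with n equal to the series length it reduces to the average procedure.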

A disadvantage of the moving average method is that it assigns the same weight to $x_{t-n+1}$ as to $x_t$, whereas intuitively one would expect to put more weight on the most recent observations.

Exponential Smoothing Forecasting Procedure
In exponential smoothing the forecasted value is a weighted sum of the last observation and the previous forecast:

$F_{t+1} = \alpha x_t + (1 - \alpha) F_t$

where $0 < \alpha < 1$ is called the smoothing constant. The exponential smoothing technique is a recursive relationship and can be expanded as:

$F_{t+1} = \alpha x_t + \alpha (1 - \alpha) x_{t-1} + \alpha (1 - \alpha)^2 x_{t-2} + \dots$

In this form the largest weight goes to $x_t$ and the weights decrease for earlier observations. An alternative form is given by:

$F_{t+1} = F_t + \alpha (x_t - F_t)$

which gives a simpler justification for the procedure: the new forecast is the previous forecast corrected by a fraction $\alpha$ of the last forecast error. This alternative form is often easier to use.

A measure of the efficiency of exponential smoothing can be obtained under the assumption that the process is stable, that is, $X_1, X_2, \dots, X_t$ are independent, identically distributed random variables with variance $\sigma^2$. For large t, it follows that:

$\operatorname{var}(F_{t+1}) \approx \frac{\alpha \sigma^2}{2 - \alpha} = \frac{\sigma^2}{(2 - \alpha)/\alpha}$

so that the variance is equivalent to that of a moving average computed from $(2 - \alpha)/\alpha$ observations.

Suppose now that the time series can be represented by a linear trend superimposed with random fluctuations. If the linear trend has the slope b, then b is called the trend factor. The model can be represented by the trend equation:

$Y_t = a + bt + e_t$

where
$Y_t$: the random variable observed at time t
$a$: the intercept, the constant value
$b$: the trend factor
$e_t$: the random error occurring at time t

For the linear trend model we introduce the concept of smoothed level. If $y_t$ is the observed value of the time series at time t, then the smoothed level at time t, $S_t$, is a linear combination of $y_t$ and the smoothed value at the preceding time t - 1, corrected by adding the trend factor (the slope) to account for the passage of one unit of time:

$S_t = \alpha y_t + (1 - \alpha)(S_{t-1} + b)$

The forecast for time t + 1 will be:

$F_{t+1} = S_t + b$

The slope is unknown and must be estimated. For this we can again use exponential smoothing:

$b_t = \beta (S_t - S_{t-1}) + (1 - \beta) b_{t-1}$

where
$b_t$: the smoothed value of the trend at the end of period t
$0 < \beta < 1$: another smoothing constant, possibly different from $\alpha$

Now we can express the smoothed level $S_t$ as:

$S_t = \alpha y_t + (1 - \alpha)(S_{t-1} + b_{t-1})$

The forecast for m periods ahead is given by:

$F_{t+m} = S_t + m b_t$

As in the case of exponential smoothing for a constant level model, an initial value is required to start the smoothing process for the linear trend model. This initialization is frequently obtained by fitting a straight line to some past data, using the least squares method. The line can be used to obtain an initial value of the smoothed level of the time series, $S_0$, and an initial value of the smoothed trend, $b_0$. Thus:

$F_1 = S_0 + b_0$

If a forecast for period t = 2 is desired, then it can be obtained from:

$S_1 = \alpha y_1 + (1 - \alpha)(S_0 + b_0)$
$b_1 = \beta (S_1 - S_0) + (1 - \beta) b_0$
$F_2 = S_1 + b_1$
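A minimal Python sketch of the two smoothing procedures follows. It assumes the standard update rules (the new forecast is the old forecast plus a fraction alpha of the last error; the trend is smoothed with a second constant beta). The function names, the initial values and the sample series are hypothetical.

```python
def exponential_smoothing(x, alpha, initial_forecast):
    """One-step-ahead forecast for a constant level model.

    Applies F_new = F_old + alpha * (x_t - F_old) to every observation
    and returns the forecast for the period after the last observation.
    """
    forecast = initial_forecast
    for obs in x:
        forecast = forecast + alpha * (obs - forecast)
    return forecast


def trend_smoothing(y, alpha, beta, s0, b0, m=1):
    """Forecast m periods ahead for a linear trend model.

    s0 and b0 are the initial smoothed level and trend, e.g. taken
    from a least squares line fitted to past data (an assumption here).
    """
    level, trend = s0, b0
    for obs in y:
        new_level = alpha * obs + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    return level + m * trend


# Constant level example: the forecast moves halfway toward each observation.
print(exponential_smoothing([10, 12], alpha=0.5, initial_forecast=10))

# Trend example: an exactly linear series y = 2t + 1 is tracked exactly
# when the initial level and trend match the line.
print(trend_smoothing([3, 5, 7, 9], alpha=0.3, beta=0.2, s0=1, b0=2, m=1))
```

Larger values of alpha and beta make the forecasts react faster to recent observations, at the cost of passing more of the random fluctuation through to the forecast.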


Working Mode:

1) Compute the table of time series indicators for the following data sets and describe the evolution of the GDP.

Dataset: 1. Gross domestic product (GDP) of Moldova

Gross Domestic Product and Gross Value Added by Economic Activities

Prices: Current prices

Years                    1995    1996    1997    1998    1999    2000    2001    2002
Gross Domestic Product   6,48    7,80    8,92    9,12    12,32   16,02   19,05   22,56

Years                    2003    2004    2005    2006    2007    2008    2009    2010
Gross Domestic Product   27,62   32,03   37,65   44,75   53,43   62,92   60,43   71,89

MOLDOVA: time series indicators of GDP ("current" = compared with the previous year; "basic" = compared with the base year 2004)

Year   GDP,     Abs. modification    Coefficient of growth     Rate of growth, %         Modification rate, %
       bil.$    current   basic      current     basic         current      basic        current       basic
2004   2,6
2005   3        0,4       0,4        1,1538462   1,15384615    115,384615   115,384615   15,3846154    15,3846154
2006   3,4      0,4       0,8        1,1333333   1,30769231    113,333333   130,769231   13,3333333    30,7692308
2007   4,4      1         1,8        1,2941176   1,69230769    129,411765   169,230769   29,4117647    69,2307692
2008   6,1      1,7       3,5        1,3863636   2,34615385    138,636364   234,615385   38,6363636    134,615385
2009   5,4      -0,7      2,8        0,8852459   2,07692308    88,5245902   207,692308   -11,4754098   107,692308
2010   5,8      0,4       3,2        1,0740741   2,23076923    107,407407   223,076923   7,40740741    123,076923
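The indicators in the table above can be recomputed with a short Python sketch. The GDP values are the ones listed for Moldova (2004-2010, bil. $); the function name and the dictionary keys are hypothetical. "Current" indicators compare each year with the previous one, and "basic" indicators compare it with the first year of the series.

```python
def time_series_indicators(values):
    """Return one dict of indicators per period, starting with the second."""
    base = values[0]
    rows = []
    for prev, cur in zip(values, values[1:]):
        rows.append({
            "abs_current": cur - prev,                # absolute modification vs previous year
            "abs_basic": cur - base,                  # absolute modification vs base year
            "coef_current": cur / prev,               # coefficient of growth vs previous year
            "coef_basic": cur / base,                 # coefficient of growth vs base year
            "rate_current": cur / prev * 100,         # rate of growth, %
            "rate_basic": cur / base * 100,
            "mod_rate_current": (cur / prev - 1) * 100,  # modification rate, %
            "mod_rate_basic": (cur / base - 1) * 100,
        })
    return rows


moldova_gdp = [2.6, 3.0, 3.4, 4.4, 6.1, 5.4, 5.8]  # 2004-2010, bil. $
rows = time_series_indicators(moldova_gdp)
for year, row in zip(range(2005, 2011), rows):
    print(year, round(row["abs_current"], 1), round(row["coef_current"], 7),
          round(row["mod_rate_current"], 7))
```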

Dataset: 2. Gross domestic product (GDP) of Mexico


Transaction: Gross domestic product (expenditure approach)

Measure: US $, constant prices, constant PPPs, OECD base year, billions

Frequency: Annual

Country: Mexico

Time     2002     2003     2004     2005     2006     2007     2008     2009     2010     2011     2012
Mexico   1187,3   1203,8   1252,8   1293,8   1359,2   1405,0   1422,1   1336,8   1407,9   1463,1   1520,5

MEXICO: time series indicators of GDP ("current" = compared with the previous year; "basic" = compared with the base year 2004)

Year   GDP,       Abs. modification        Coefficient of growth     Rate of growth, %         Modification rate, %
       bil.$      current      basic       current     basic         current      basic        current       basic
2004   1252,756
2005   1293,788   41,032246    41,032246   1,0327536   1,03275358    103,275358   103,275358   3,27535846    3,27535846
2006   1359,238   65,449802    106,48205   1,0505877   1,08499824    105,058773   108,499824   5,05877284    8,49982425
2007   1404,955   45,71728     152,19933   1,0336345   1,12149161    103,363449   112,149161   3,3634494     12,1491609
2008   1422,065   17,109368    169,3087    1,0121779   1,13514899    101,217787   113,514899   1,21778746    13,5148994
2009   1336,777   -85,287473   84,021222   0,9400256   1,06706911    94,0025597   106,706911   -5,99744029   6,70691105
2010   1407,942   71,16498     155,1862    1,0532362   1,12387585    105,323624   112,387585   5,32362351    12,3875853

Conclusion: In both countries GDP growth turned negative in 2009, when the world economy was hit by the crisis.

2) Construct the time linear regression.

Consider the X values as the numbers 1, 2, …, and the Y values as the GDP values for the years 2004-2010. Use the INTERCEPT and SLOPE functions to determine the parameters of the regression. Alternatively, use the Regression tool in Excel: choose Tools/Add-ins, select Analysis ToolPak and press OK.

Then choose Tools/Data Analysis… and select Regression. Construct the linear regression equation, where Y is GDP and X is the time variable.
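For readers without Excel, the same parameters can be computed with the least squares formulas directly. The sketch below is an illustration in Python; the functions `slope` and `intercept` are hypothetical helpers that only mirror the argument order of the Excel functions (Y values first, X values second). It uses the Moldova GDP values for 2004-2010 from this work and shows that coding time as 1, 2, …, 7 or using the calendar years changes the intercept but not the slope.

```python
def slope(known_ys, known_xs):
    """Least squares slope b = Sxy / Sxx."""
    n = len(known_xs)
    mean_x = sum(known_xs) / n
    mean_y = sum(known_ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(known_xs, known_ys))
    sxx = sum((x - mean_x) ** 2 for x in known_xs)
    return sxy / sxx


def intercept(known_ys, known_xs):
    """Least squares intercept a = mean(y) - b * mean(x)."""
    n = len(known_xs)
    return sum(known_ys) / n - slope(known_ys, known_xs) * sum(known_xs) / n


moldova_gdp = [2.6, 3.0, 3.4, 4.4, 6.1, 5.4, 5.8]  # 2004-2010, bil. $
t = list(range(1, 8))            # time coded as 1..7
years = list(range(2004, 2011))  # calendar years as X

print(slope(moldova_gdp, t), intercept(moldova_gdp, t))
print(slope(moldova_gdp, years), intercept(moldova_gdp, years))
```

The intercept obtained with calendar years as X is a large negative number because it is the value of the line extrapolated back to year 0.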


Year   GDP, bil.$ (Y)   t
2004                    1
2005                    2
2006                    3
2007                    4
2008                    5
2009                    6
2010                    7

The time linear regression for Mexico.

SUMMARY OUTPUT

Regression Statistics
Multiple R          0,744211
R Square            0,55385
Adjusted R Square   0,46462
Standard Error      46,60207
Observations        7

ANOVA
             df   SS         MS         F          Significance F
Regression   1    13480,08   13480,08   6,207003   0,055068
Residual     5    10858,76   2171,753
Total        6    24338,84

               Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept      -42682,7       17675,58         -2,41479   0,060506   -88119,3    2753,789
X Variable 1   21,94154       8,806963         2,491386   0,055068   -0,69748    44,58056

RESIDUAL OUTPUT

Observation   Predicted Y   Residuals
1             1288,107      -35,351
2             1310,048      -16,2603
3             1331,99       27,24791
4             1353,932      51,02365
5             1375,873      46,19148
6             1397,815      -61,0375
7             1419,756      -11,8141

[Figure: GDP Fluctuations of Mexico in period of 2004-2010 years. X axis: Years; Y axis: GDP, bln $. Series: GDP, Linear (GDP), Polynomial (GDP).
Linear trend: f(x) = 21.94154309023 x − 42682.7454316695, R² = 0.553850403861616
Polynomial trend: f(x) = − 7.85998248348219 x² + 31571.9112317863 x − 31703045.8881069, R² = 0.767067759936526]

The time linear regression for Moldova.

SUMMARY OUTPUT

Regression Statistics
Multiple R          0,930999
R Square            0,86676
Adjusted R Square   0,840111
Standard Error      0,566632
Observations        7

ANOVA
             df   SS         MS         F          Significance F
Regression   1    10,44321   10,44321   32,52614   0,002314
Residual     5    1,605357   0,321071
Total        6    12,04857

               Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept      -1221,32       214,9163         -5,68276   0,002351   -1773,78    -668,858
X Variable 1   0,610714       0,107083         5,703169   0,002314   0,335448    0,885981

RESIDUAL OUTPUT

Observation   Predicted Y   Residuals
1             2,553571      0,046429
2             3,164286      -0,16429
3             3,775         -0,375
4             4,385714      0,014286
5             4,996429      1,103571
6             5,607143      -0,20714
7             6,217857      -0,41786

[Figure: GDP Fluctuations of Moldova in period of 2004-2010 years. X axis: Years; Y axis: GDP, bln $. Series: GDP, Linear (GDP), Polynomial (GDP).
Linear trend: f(x) = 0.610714285714286 x − 1221.31785714286, R² = 0.866759544700042
Polynomial trend: f(x) = − 0.0488095237988071 x² + 196.532142814117 x − 197828.276147291, R² = 0.883368903639794]

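3) Calculate the forecast of the Mexican GDP for the next 2 years (2011-2012).

I) 1st method of forecasting GDP by regression formula:

y = 21,942x - 42683
R² = 0,5539

This equation was fitted with the calendar year as x, so it must be evaluated at x = 2011 and x = 2012. The forecasts in the table below were computed with the time variable t instead (t = 8 for 2011), which corresponds to the equivalent equation y = 21,942t + 1266,2. In the cell for the year 2011 I wrote this equation, putting in place of the variable the time value of the year 2011, that is 8. I did the same for the year 2012, with the time value 9.

MEXICO
Year   GDP, bil.$   t
2004   1252,756     1
2005   1293,788     2
2006   1359,238     3
2007   1404,955     4
2008   1422,065     5
2009   1336,777     6
2010   1407,942     7
2011   1441,736     8
2012   1463,678     9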
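The regression forecast can be reproduced outside the spreadsheet. The following Python sketch fits the line on the coded time t = 1..7 and evaluates it at t = 8 and t = 9; the function name `fit_trend` is hypothetical. Because it uses unrounded coefficients, its results differ from the table values (which were computed with rounded coefficients) by a few hundredths.

```python
def fit_trend(values):
    """Fit y = a + b*t by least squares, with t = 1, 2, ..., n."""
    n = len(values)
    ts = range(1, n + 1)
    mean_t = sum(ts) / n
    mean_y = sum(values) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, values))
    sxx = sum((t - mean_t) ** 2 for t in ts)
    b = sxy / sxx
    a = mean_y - b * mean_t
    return a, b


mexico_gdp = [1252.756, 1293.788, 1359.238, 1404.955,
              1422.065, 1336.777, 1407.942]  # 2004-2010, bil. $

a, b = fit_trend(mexico_gdp)
forecast = {year: a + b * t for year, t in ((2011, 8), (2012, 9))}
print(round(a, 4), round(b, 4))
print({year: round(value, 3) for year, value in forecast.items()})
```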


II) 2nd method of forecasting GDP using average time indicators:

Calculating the average absolute modification by the formula Δ = (GDP last year - GDP first year) / (number of years - 1)

Calculating the average coefficient of growth by the formula $I = \sqrt[n-1]{GDP_{last\ year} / GDP_{first\ year}}$, where n is the number of years

Calculating the forecasted GDP by the formula of average absolute modification = GDP last year + Δ (average absolute modification)

Calculating the forecasted GDP by the formula of average coefficient of growth = GDP last year * I (average coefficient of growth)

Total years: 7
Average absolute modification: 25,864367
Average coefficient of growth: 1,01965454

Year   Forecast by average absolute modification   Forecast by average coefficient of growth
2011   1433,80644                                  1435,615
2012   1459,67081                                  1463,831
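The average indicators and the two forecasts can be checked with a short Python sketch. The function names are hypothetical; the data are the Mexican GDP values for 2004-2010 used above. Small differences from the table in the fourth decimal place come from the rounding of the input data.

```python
def average_absolute_modification(values):
    """Delta = (last - first) / (n - 1)."""
    return (values[-1] - values[0]) / (len(values) - 1)


def average_growth_coefficient(values):
    """I = (last / first) ** (1 / (n - 1)), the (n-1)-th root of the ratio."""
    return (values[-1] / values[0]) ** (1 / (len(values) - 1))


def forecast_by_delta(values, steps):
    """Add the average absolute modification once per forecast step."""
    delta = average_absolute_modification(values)
    return [values[-1] + k * delta for k in range(1, steps + 1)]


def forecast_by_coefficient(values, steps):
    """Multiply by the average growth coefficient once per forecast step."""
    coef = average_growth_coefficient(values)
    return [values[-1] * coef ** k for k in range(1, steps + 1)]


mexico_gdp = [1252.756, 1293.788, 1359.238, 1404.955,
              1422.065, 1336.777, 1407.942]  # 2004-2010, bil. $

delta = average_absolute_modification(mexico_gdp)
coef = average_growth_coefficient(mexico_gdp)
print(round(delta, 4), round(coef, 6))
print([round(v, 3) for v in forecast_by_delta(mexico_gdp, 2)])        # 2011, 2012
print([round(v, 3) for v in forecast_by_coefficient(mexico_gdp, 2)])  # 2011, 2012
```

The additive forecast assumes a constant yearly increase in absolute terms, while the multiplicative forecast assumes a constant yearly growth rate, which is why the two diverge as the horizon grows.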

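4) Calculate the forecast of the Moldovan GDP for the next 2 years (2011-2012).

I) 1st method of forecasting GDP by regression formula:

y = 0,6107x - 1221,3
R² = 0,8668

This equation was fitted with the calendar year as x, so it must be evaluated at x = 2011 and x = 2012. The forecasts in the table below were computed with the time variable t instead (t = 8 for 2011), which corresponds to the equivalent equation y = 0,6107t + 1,9429. In the cell for the year 2011 I wrote this equation, putting in place of the variable the time value of the year 2011, that is 8. I did the same for the year 2012, with the time value 9.

MOLDOVA
Year   GDP, bil.$   t
2004   2,6          1
2005   3            2
2006   3,4          3
2007   4,4          4
2008   6,1          5
2009   5,4          6
2010   5,8          7
2011   6,8285       8
2012   7,4392       9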


II) 2nd method of forecasting GDP using average time indicators:

Calculating the average absolute modification by the formula Δ = (GDP last year - GDP first year) / (number of years - 1)

Calculating the average coefficient of growth by the formula $I = \sqrt[n-1]{GDP_{last\ year} / GDP_{first\ year}}$, where n is the number of years

Calculating the forecasted GDP by the formula of average absolute modification = GDP last year + Δ (average absolute modification)

Calculating the forecasted GDP by the formula of average coefficient of growth = GDP last year * I (average coefficient of growth)

Total years: 7
Average absolute modification: 0,533333
Average coefficient of growth: 1,143078

Year   Forecast by average absolute modification   Forecast by average coefficient of growth
2011   6,333333333                                 6,629851
2012   6,866666667                                 7,57843521

5) Draw the line plot for GDP

Select the line "GDP" in the plot and then use the menu Chart/Add Trendline. Construct the linear equation, the power equation and the exponential equation. Which trend line describes the tendency best?
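One possible way to compare the three trend forms numerically is sketched below in Python. It is an illustration under stated assumptions, not a reproduction of the spreadsheet: time is coded as t = 1..7 (the power form needs positive x), the exponential and power curves are fitted by least squares on log-transformed data, and R² for those two forms is computed on the transformed scale. The function names are hypothetical; the data are the Moldova GDP values for 2004-2010.

```python
import math


def fit_line(xs, ys):
    """Least squares fit y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b, sxy ** 2 / (sxx * syy)


def compare_trends(ts, ys):
    """Fit linear, exponential and power trends; R^2 on the fitted scale."""
    log_t = [math.log(t) for t in ts]
    log_y = [math.log(y) for y in ys]
    return {
        "linear": fit_line(ts, ys),            # y = a + b*t
        "exponential": fit_line(ts, log_y),    # ln y = ln A + b*t
        "power": fit_line(log_t, log_y),       # ln y = ln A + b*ln t
    }


moldova_gdp = [2.6, 3.0, 3.4, 4.4, 6.1, 5.4, 5.8]  # 2004-2010, bil. $
results = compare_trends(list(range(1, 8)), moldova_gdp)
for name, (a, b, r2) in results.items():
    print(name, round(a, 4), round(b, 4), round(r2, 4))
best = max(results, key=lambda name: results[name][2])
print("highest R^2:", best)
```

A higher R² indicates a closer fit to the observed points, but with only seven observations the differences between the forms should be interpreted with caution.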


6) Compare the evolution of the two countries (Moldova and the selected country). Use the average absolute modification and average modification rate.

Moldova
Total years: 7
Average absolute modification: 0,533333
Average modification rate: 15,4496791

Mexico
Total years: 7
Average absolute modification: 25,864367
Average modification rate: 2,04025856

Conclusion: The average growth rate shows by how much, in relative terms, the phenomenon increased on average from one time unit to the next over the analyzed period. Over 2004-2010 Moldova's GDP grew much faster in relative terms than Mexico's, while Mexico's average absolute increase was far larger because of the size of its economy.

The average growth rate (or average modification rate) is calculated from the average dynamics index (the average coefficient of growth I), expressed as a percentage, minus 100: R = (I * 100) - 100. With the average coefficients of growth computed in steps 3 and 4, this formula gives about 14,31% for Moldova and 1,97% for Mexico; the rates reported above equal the arithmetic mean of the annual current modification rates.

The average absolute modification can be calculated by the formula: Δ = (GDP last year - GDP first year) / (number of years - 1)

7) Conclusions about the evolution and possibilities of the time series analysis.

A chronological series is composed of two parallel rows of data: the first row shows the time periods and the second row shows the variation of the phenomenon or characteristic studied over time. A chronological series has the following features: variability, homogeneity, and interdependence of its terms.

Variability arises because each term is obtained by centralizing individual data that differ in their level of development. These individual differences exist because social phenomena are influenced not only by essential, determining causes but also by a considerable number of nonessential causes. When analyzing time series, we have to take into account the fact that they are compiled for complex units. For them, the degree of variation of the indicators includes structural variations from one time unit to another.

Interdependence of terms: the indicators are successive values of the same phenomenon recorded in the same territorial or administrative unit. Because of this interdependence, in the case of chronological series it is necessary to know the trend line (curve) which is specific to each development stage and which expresses, statistically and quantitatively, the action of the laws that determine it.

Taking into account all these particularities, the statistical analysis of chronological series must be based on a system of indicators that characterize the quantitative relations inside the series and over the period to which the data refer. The indicators obtained by processing a time series can be organized in a system in which each indicator emphasizes one aspect of the development of the phenomenon studied. The usefulness of these indicators depends on the manner in which the time series was constructed, on the significance of the period chosen for the evolution of the phenomenon studied, on the homogeneity of the empirical data used and on the length of the time series. The number of terms must be large enough to satisfy the law of large numbers, so that the statistical regularities in the evolution of the phenomenon can be interpreted.

When working with heterogeneous time series, with various development trends, it is necessary to calculate these indicators for each stage as partial indicators. Otherwise the indicators obtained are not realistic, the practical and theoretical conclusions lack a proper basis, and the calculations cannot be used for forecasting.

Time series analysis comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data. Time series forecasting is the use of a model to predict future values based on previously observed values. While regression analysis is often employed in such a way as to test theories that the current values of one or more independent time series affect the current value of another time series, this type of analysis of time series is not called "time series analysis".

Time series data often arise when monitoring industrial processes or tracking corporate business metrics. The essential difference between modelling data via time series methods or using the process monitoring methods chapter is the following: Time series analysis accounts for the fact that data points taken over time may have an internal structure (such as autocorrelation, trend or seasonal variation) that should be accounted for.

In this work, time series analysis made it possible to observe the evolution of the GDP series of two countries. In both analyses we can see the evolution trends and the influence of external factors on the economy of each country (e.g. the 2009 world crisis, which affected the GDP of both countries). Also, thanks to the time series indicators we can analyse the evolution of each country's GDP volume, determining the absolute modifications and average modifications that help us to characterize the development of the country's economy.
