Introduction to Regression, Lecture 2.1

Diploma in Statistics: Introduction to Regression

1. Review of Lecture 1.1

2. Correlation

3. Pitfalls with Regression and Correlation

4. Introducing Multiple Linear Regression

– Job times case study

– Stamp sales case study

5. Homework

Review of Lecture 1.1

[Figure: Scatter plot of US mail handling data, exceptions deleted; Manhours against Volume.]

Always look at your data!

"Although regression can be done without ever looking at a scatter plot, that is the statistical equivalent of flying blind"

Amy Lap Mui Choi, JF MSISS, 1993/94.

"Decision-making under risk is when you know what will probably happen and decision-making under uncertainty is when you probably know what will happen."

Anon., JF MSISS 1995/96

Simple linear regression model with Normal model for chance variation

[Figure: Manhours against Volume.]

$$Y = \alpha + \beta X + \varepsilon$$

The prediction formula

Prediction equation:

$$\hat Y = \hat\alpha + \hat\beta X$$

Prediction equation allowing for chance variation:

$$\hat Y = \hat\alpha + \hat\beta X \pm 2\hat\sigma$$

Homework

Use the prediction formula

$$\hat Y = 50 + 3.3X \pm 40$$

to predict the extra manpower requirement during the Christmas period, based on the experience of Period 7, Fiscal 1963, when Y was 1,070 and X was 270.

Compare with the actual value.

Comment.
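A quick numerical check of this homework, as a minimal Python sketch: it assumes the rounded formula Ŷ = 50 + 3.3X ± 40 above and treats ±40 as the allowance for chance variation. The function name `predict_with_band` is ours, for illustration only. Note that X = 270 lies far beyond the Volume values of roughly 150 to 190 in the earlier scatter plot, so this is an extrapolation.

```python
def predict_with_band(x, intercept=50.0, slope=3.3, half_width=40.0):
    """Return (fit, lower, upper) for the rounded formula Y = 50 + 3.3X +/- 40."""
    fit = intercept + slope * x
    return fit, fit - half_width, fit + half_width

fit, lower, upper = predict_with_band(270)   # Period 7, Fiscal 1963: X = 270
actual = 1070                                # observed Y
print(f"predicted {fit:.0f} manhours, band ({lower:.0f}, {upper:.0f}), actual {actual}")
print("actual within band:", lower <= actual <= upper)
```

Under these assumptions the prediction is 941 manhours with a band of 901 to 981, so the actual 1,070 falls well outside it; the comment should weigh whether the straight line can be trusted that far from the data.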

Application 1: Confidence interval for marginal change

Recall the confidence interval for μ: $\bar X \pm 2\,SE(\bar X)$.

Confidence interval for β:

$$\hat\beta \pm 2\,SE(\hat\beta)$$

Small sample:

$$\hat\beta \pm t_{n-2,\,.05}\,SE(\hat\beta)$$
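To make the two recipes concrete, the sketch below applies them to the Volume coefficient from the Minitab output for the mail-handling regression (β̂ = 3.3454, SE(β̂) = 0.3401), using the multiplier 2 and the tabulated value t(21, .05) = 2.08 quoted in Application 2. The helper name `conf_int` is ours, for illustration.

```python
def conf_int(estimate, std_error, multiplier):
    """Interval: estimate +/- multiplier * standard error."""
    half_width = multiplier * std_error
    return estimate - half_width, estimate + half_width

beta_hat, se_beta = 3.3454, 0.3401         # Volume row of the Minitab output
rough = conf_int(beta_hat, se_beta, 2)     # large-sample rule: +/- 2 SE
small = conf_int(beta_hat, se_beta, 2.08)  # t with n - 2 = 21 df, .05 level
print("rough:   (%.3f, %.3f)" % rough)
print("t-based: (%.3f, %.3f)" % small)
```

Either way, each unit increase in Volume adds roughly 2.6 to 4.1 manhours; the t-based interval is only slightly wider here.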

Selected critical values for the t-distribution (column headings: two-tailed probability; rows: degrees of freedom ν)

ν       .25    .10    .05     .02     .01     .002     .001
1       2.41   6.31   12.71   31.82   63.66   318.32   636.61
2       1.60   2.92   4.30    6.96    9.92    22.33    31.60
3       1.42   2.35   3.18    4.54    5.84    10.22    12.92
4       1.34   2.13   2.78    3.75    4.60    7.17     8.61
5       1.30   2.02   2.57    3.36    4.03    5.89     6.87
6       1.27   1.94   2.45    3.14    3.71    5.21     5.96
7       1.25   1.89   2.36    3.00    3.50    4.79     5.41
8       1.24   1.86   2.31    2.90    3.36    4.50     5.04
9       1.23   1.83   2.26    2.82    3.25    4.30     4.78
10      1.22   1.81   2.23    2.76    3.17    4.14     4.59
12      1.21   1.78   2.18    2.68    3.05    3.93     4.32
15      1.20   1.75   2.13    2.60    2.95    3.73     4.07
20      1.18   1.72   2.09    2.53    2.85    3.55     3.85
24      1.18   1.71   2.06    2.49    2.80    3.47     3.75
30      1.17   1.70   2.04    2.46    2.75    3.39     3.65
40      1.17   1.68   2.02    2.42    2.70    3.31     3.55
60      1.16   1.67   2.00    2.39    2.66    3.23     3.46
120     1.16   1.66   1.98    2.36    2.62    3.16     3.37
∞       1.15   1.64   1.96    2.33    2.58    3.09     3.29

Application 2: Testing the statistical significance of the intercept

Formal test:

H0: α = 0

Test statistic:

$$Z = \frac{\hat\alpha - 0}{SE(\hat\alpha)} = \frac{\hat\alpha}{SE(\hat\alpha)}$$

Critical value: 2 (or $t_{21,\,.05} = 2.08$)

Calculated value: 0.848

Comparison: Z < 2 (or t < 2.08)

Conclusion: Accept H0
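The formal test is a one-line calculation. This sketch recomputes the calculated value from the Constant row of the Minitab output (α̂ = 50.44, SE(α̂) = 59.46); Minitab's T column reports the same quantity rounded to 0.85.

```python
alpha_hat, se_alpha = 50.44, 59.46   # Constant row of the Minitab output
z = (alpha_hat - 0) / se_alpha       # test statistic for H0: alpha = 0
print(f"Z = {z:.3f}")
print("reject H0 using 2:", abs(z) > 2)
print("reject H0 using t(21, .05) = 2.08:", abs(z) > 2.08)
```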

Testing the statistical significance of the intercept

Informal test: $\hat\alpha = 50.4394$ is less than its standard error, $SE(\hat\alpha) = 59.46$.

Draw a picture!

More on Minitab results

Regression Analysis: Manhours versus Volume

The regression equation is
Manhours = 50.4 + 3.35 Volume

Predictor   Coef     SE Coef   T      P
Constant    50.44    59.46     0.85   0.406
Volume      3.3454   0.3401    9.84   0.000

S = 18.9300

Homework

In a study of a wholesaler's distribution costs, undertaken with a view to cost control, the volume of goods handled and the overall costs were recorded for one month in each of ten depots in a distribution network. The results are presented in the following table.

Depot                  1    2    3    4    5    6    7    8    9    10
Volume (£ thousands)   48   57   49   45   50   62   58   55   38   51
Costs (£ hundreds)     20   22   19   18   20   24   21   21   15   20

Homework

The simple linear regression of costs (Y) on volume (X) was calculated, and resulted in the following numerical summary.

Regression Analysis: Costs versus Volume

The regression equation is
Costs = 2.98 + 0.332 Volume

Predictor   Coef      SE Coef   T       P
Constant    2.982     1.646     1.81    0.108
Volume      0.33174   0.03182   10.42   0.000

S = 0.667603
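The numerical summary can be checked directly from the depot table. This plain-Python sketch uses the standard least squares formulas β̂ = Σ(X − X̄)(Y − Ȳ)/Σ(X − X̄)², α̂ = Ȳ − β̂X̄ and S = √(Σ residual²/(n − 2)); it should agree with the Minitab figures up to rounding.

```python
import math

volume = [48, 57, 49, 45, 50, 62, 58, 55, 38, 51]  # X, £ thousands, depots 1-10
costs = [20, 22, 19, 18, 20, 24, 21, 21, 15, 20]   # Y, £ hundreds

n = len(volume)
x_bar = sum(volume) / n
y_bar = sum(costs) / n
s_xx = sum((x - x_bar) ** 2 for x in volume)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(volume, costs))

slope = s_xy / s_xx                  # least squares estimate of beta
intercept = y_bar - slope * x_bar    # least squares estimate of alpha
residuals = [y - (intercept + slope * x) for x, y in zip(volume, costs)]
s = math.sqrt(sum(e * e for e in residuals) / (n - 2))  # residual SD, n - 2 df
se_slope = s / math.sqrt(s_xx)       # standard error of the slope

print(f"Costs = {intercept:.3f} + {slope:.5f} Volume")
print(f"SE(slope) = {se_slope:.5f}, S = {s:.6f}")
```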

Homework

(i) Draw a scatter plot for these data. Comment. Interpret the numerical summary in context.

(ii) Calculate a prediction interval for costs next month when Volume in Depot 1 is planned to be £40,000, and Volume in Depot 2 is planned to be £51,000.

(iii) Next month, when the two depots recorded volumes of £40,000 and £51,000 as planned, costs were £1,700 and £2,300 respectively. Comment on each case. Illustrate with an enhancement of your scatter plot.

Homework Solution (i)

[Figure: Scatterplot of Costs vs Volume.]

There appears to be a strong positive relationship between Costs and Volume.

Homework Solution (i)

Costs increase approximately linearly with Volume, by around £33.20 for every £1,000 increase in Volume, from a base of around £300.

(Costs = 2.98 + 0.332 Volume)

The cost for a given volume is subject to chance variation with a standard deviation of around £67.

(S = 0.667603)

Homework Solution

(ii) Volume = £40,000, Costs (£1,491, £1,759)

Volume = £51,000, Costs (£1,857, £2,124)

(iii) £1,700 is within the corresponding prediction interval, satisfactory.

£2,300 is outside the corresponding prediction interval, too high. An investigation is needed.

Illustrate
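The rough intervals in (ii) and the comparisons in (iii) follow from fitted value ± 2S. A sketch, assuming the Minitab coefficients and S above, with Volume in £ thousands and Costs in £ hundreds (hence the ×100 when printing pounds). Rounding can move a limit by £1; for example, the lower limit at £40,000 prints as £1,492.

```python
INTERCEPT, SLOPE, S = 2.982, 0.33174, 0.667603  # Minitab fit; Costs in £ hundreds

def rough_prediction_interval(volume):
    """Fitted cost +/- 2S for a volume in £ thousands (result in £ hundreds)."""
    fit = INTERCEPT + SLOPE * volume
    return fit - 2 * S, fit + 2 * S

for volume, actual in [(40, 17.0), (51, 23.0)]:  # next month's recorded costs
    lower, upper = rough_prediction_interval(volume)
    verdict = "within" if lower <= actual <= upper else "OUTSIDE"
    print(f"Volume £{volume},000: interval (£{lower * 100:,.0f}, £{upper * 100:,.0f}), "
          f"actual £{actual * 100:,.0f} is {verdict}")
```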

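More precise formulas

Confidence interval for mean response:

$$\hat\mu = \hat\alpha + \hat\beta X \pm 2s\sqrt{\frac{1}{n} + \frac{1}{n-1}\left(\frac{X-\bar X}{s_X}\right)^2}$$

Prediction interval for next response:

$$\hat Y = \hat\alpha + \hat\beta X \pm 2s\sqrt{1 + \frac{1}{n} + \frac{1}{n-1}\left(\frac{X-\bar X}{s_X}\right)^2}$$

(ii) Volume = £40,000, Costs (£1,444, £1,807)

Volume = £51,000, Costs (£1,829, £2,151)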
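A sketch of the more precise prediction intervals for the depot data. Note that (n − 1)s_X² = Σ(X_i − X̄)², so the last term under the square root can be computed as (X − X̄)²/Σ(X_i − X̄)². The interval figures quoted in (ii) appear to use the t multiplier for n − 2 = 8 degrees of freedom (2.31 in the t table) rather than 2; with that multiplier this sketch reproduces them to within about £1. The function name `interval` is illustrative.

```python
import math

volume = [48, 57, 49, 45, 50, 62, 58, 55, 38, 51]  # £ thousands
costs = [20, 22, 19, 18, 20, 24, 21, 21, 15, 20]   # £ hundreds
n = len(volume)
x_bar, y_bar = sum(volume) / n, sum(costs) / n
s_xx = sum((x - x_bar) ** 2 for x in volume)  # equals (n - 1) * s_X**2
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(volume, costs)) / s_xx
intercept = y_bar - slope * x_bar
s = math.sqrt(sum((y - intercept - slope * x) ** 2
                  for x, y in zip(volume, costs)) / (n - 2))

def interval(x0, multiplier, new_response=True):
    """Prediction interval (new_response=True) or confidence interval for the mean."""
    extra = 1.0 if new_response else 0.0   # the "1 +" term for a new response
    se = s * math.sqrt(extra + 1 / n + (x0 - x_bar) ** 2 / s_xx)
    fit = intercept + slope * x0
    return fit - multiplier * se, fit + multiplier * se

T_8 = 2.31  # t table: 8 degrees of freedom, .05 column
for x0 in (40, 51):
    lower, upper = interval(x0, T_8)
    print(f"Volume £{x0},000: prediction interval (£{lower * 100:,.0f}, £{upper * 100:,.0f})")
```

The intervals are narrowest near X̄ = 51.3 and widen as X moves away from it, which is why the £40,000 interval is wider than the £51,000 one.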

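Standard error

• of prediction: $s\sqrt{1 + \frac{1}{n} + \frac{1}{n-1}\left(\frac{X-\bar X}{s_X}\right)^2}$

• of estimation: $s\sqrt{\frac{1}{n} + \frac{1}{n-1}\left(\frac{X-\bar X}{s_X}\right)^2}$

Ref: "The Standard Error of Prediction", Extra Notes folder in mstuart/get or Diploma webpage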

Homework Solution

[Figure: Fitted Line Plot, Costs = 2.982 + 0.3317 Volume, showing the regression line and 95% PI bands; Costs against Volume. S = 0.667603, R-Sq = 93.1%, R-Sq(adj) = 92.3%.]

2. Correlation

• The correlation coefficient formula

• r and reduction of prediction error

• Positive and negative correlation

• Perfect correlation

• Conventional interpretations of r

The correlation coefficient formula

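Recall

$$\hat\beta = \frac{\sum_{i=1}^{n}(X_i-\bar X)(Y_i-\bar Y)}{\sum_{i=1}^{n}(X_i-\bar X)^2} = \frac{s_{XY}}{s_X^2}$$

$$r = \frac{\frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar X)(Y_i-\bar Y)}{\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar X)^2}\,\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(Y_i-\bar Y)^2}} = \frac{s_{XY}}{s_X s_Y}$$

or, equivalently,

$$r = \hat\beta\,\frac{s_X}{s_Y}, \qquad \hat\beta = r\,\frac{s_Y}{s_X}$$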
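To see these identities numerically, the sketch below computes r for the depot data from the homework (Costs against Volume), using the n − 1 divisor throughout, and confirms that r = β̂ s_X / s_Y. Squaring r gives about 0.931, the R-Sq of 93.1% reported in the fitted line plot for the same data.

```python
import math

volume = [48, 57, 49, 45, 50, 62, 58, 55, 38, 51]  # X: £ thousands
costs = [20, 22, 19, 18, 20, 24, 21, 21, 15, 20]   # Y: £ hundreds
n = len(volume)
x_bar, y_bar = sum(volume) / n, sum(costs) / n

s_x = math.sqrt(sum((x - x_bar) ** 2 for x in volume) / (n - 1))
s_y = math.sqrt(sum((y - y_bar) ** 2 for y in costs) / (n - 1))
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(volume, costs)) / (n - 1)

r = s_xy / (s_x * s_y)        # correlation coefficient
beta_hat = s_xy / s_x ** 2    # least squares slope
print(f"r = {r:.4f}, beta_hat = {beta_hat:.5f}")
print(f"beta_hat * s_X / s_Y = {beta_hat * s_x / s_y:.4f}")  # equals r
print(f"r squared = {r ** 2:.3f}")                           # Minitab's R-Sq
```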

Scatter plot showing zero correlation

Correlation r = 0.1 to r = 0.9

Data Desk

r and reduction in prediction error

r and reduction in prediction error

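$$\hat Y = \hat\alpha + \hat\beta X \pm 2\hat\sigma$$

$$\hat\sigma \approx \sqrt{1 - r^2}\; s_Y$$

$$r = 0:\quad \hat\beta = 0,\quad \hat Y = \bar Y \pm 2 s_Y$$

$$r = 1:\quad \hat\sigma = 0$$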
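Why σ̂ ≈ √(1 − r²) s_Y is only approximate: the residual sum of squares equals (1 − r²)Σ(Y_i − Ȳ)² exactly, but S divides by n − 2 while s_Y divides by n − 1. The sketch below, again on the depot data, prints the approximation next to S and shows that the exact factor is √((n − 1)/(n − 2)).

```python
import math

volume = [48, 57, 49, 45, 50, 62, 58, 55, 38, 51]
costs = [20, 22, 19, 18, 20, 24, 21, 21, 15, 20]
n = len(volume)
x_bar, y_bar = sum(volume) / n, sum(costs) / n
s_xx = sum((x - x_bar) ** 2 for x in volume)
s_yy = sum((y - y_bar) ** 2 for y in costs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(volume, costs))

r = s_xy / math.sqrt(s_xx * s_yy)
s_y = math.sqrt(s_yy / (n - 1))                     # spread of Y ignoring X
s = math.sqrt((s_yy - s_xy ** 2 / s_xx) / (n - 2))  # residual SD after the fit
approx = math.sqrt(1 - r ** 2) * s_y                # the slide's approximation

print(f"s_Y = {s_y:.4f}, S = {s:.4f}, sqrt(1 - r^2) * s_Y = {approx:.4f}")
# Multiplying by the degrees-of-freedom factor recovers S exactly.
print(f"exact: {approx * math.sqrt((n - 1) / (n - 2)):.4f}")
```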

Positive and negative correlation

Perfect correlation, positive and negative

Conventional interpretations of r

Science / Engineering: r > 0.9 is "interesting"

Econometrics: r > 0.7 is "interesting",

otherwise, r > 0.5 is "interesting"

Sociology: r > 0.3 is "interesting"

Recommendation: compare s with $s_Y$

3. Pitfalls with regression and correlation

Anscombe's data

X1    Y1      X2    Y2     X3    Y3      X4    Y4
10    8.04    10    9.14   10    7.46    8     6.58
8     6.95    8     8.14   8     6.77    8     5.76
13    7.58    13    8.74   13    12.74   8     7.71
9     8.81    9     8.77   9     7.11    8     8.84
11    8.33    11    9.26   11    7.81    8     8.47
14    9.96    14    8.10   14    8.84    8     7.04
6     7.24    6     6.13   6     6.08    8     5.25
4     4.26    4     3.10   4     5.39    19    12.50
12    10.84   12    9.13   12    8.15    8     5.56
7     4.82    7     7.26   7     6.42    8     7.91
5     5.68    5     4.74   5     5.73    8     6.89

Anscombe's data summary

                           Set 1          Set 2          Set 3          Set 4
                           X1     Y1      X2     Y2      X3     Y3      X4     Y4
Count                      11     11      11     11      11     11      11     11
Mean                       9.0    7.5     9.0    7.5     9.0    7.5     9.0    7.5
Standard deviation         3.32   2.03    3.32   2.03    3.32   2.03    3.32   2.03

                           a      b       a      b       a      b       a      b
Simple linear regression   3.0    0.5     3.0    0.5     3.0    0.5     3.0    0.5

Correlation                0.82           0.82           0.82           0.82
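Anscombe's point can be reproduced in a few lines. This sketch computes the summary rows above for all four sets; each set gives essentially the same means, standard deviations, fitted line (a ≈ 3.0, b ≈ 0.5) and correlation (≈ 0.82), which is exactly why the scatter plots are needed. The function name `summarise` is ours.

```python
import math

X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared X for sets 1 to 3
SETS = {
    1: (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    2: (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    3: (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    4: ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summarise(xs, ys):
    """Means, SDs (n - 1 divisor), least squares a and b, and correlation r."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    return {
        "mean_x": x_bar, "mean_y": y_bar,
        "sd_x": math.sqrt(s_xx / (n - 1)), "sd_y": math.sqrt(s_yy / (n - 1)),
        "a": y_bar - b * x_bar, "b": b, "r": s_xy / math.sqrt(s_xx * s_yy),
    }

results = {k: summarise(xs, ys) for k, (xs, ys) in SETS.items()}
for k, res in results.items():
    print(k, {name: round(v, 2) for name, v in res.items()})
```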

Anscombe's scatter plots

Homework

The shelf life of packaged foods depends on many factors. Dry cereal (such as corn flakes) is considered to be a moisture-sensitive product, with the shelf life determined primarily by moisture. In a study of the shelf life of one brand of cereal, packets of cereal were stored in controlled conditions (23°C and 50% relative humidity) for a range of times, and moisture content was measured. The results were as follows.

Draw a scatter diagram. Comment. What action is suggested? Why?

Storage time (days)   0     3     6     8     10    13    16    20    24    27    30    34    37    41
Moisture content      2.8   3.0   3.1   3.2   3.4   3.4   3.5   3.1   3.8   4.0   4.1   4.3   4.4   4.9

Following appropriate action, the following regression was computed.

The regression equation is
Moisture Content = 2.86 + 0.0417 Storage Time

Predictor      Coef       SE Coef    T        P
Constant       2.86122    0.02488    115.01   0.000
Storage Time   0.041660   0.001177   35.40    0.000

S = 0.0493475

Calculate a 95% confidence interval for the daily change in moisture content; show details.

Was the action you suggested on studying the scatter diagram in part (a) justified? Explain.

Predict the moisture content of a packet of cereal stored under these conditions for 3 weeks; calculate a prediction interval.

What would be the effect on your interval of not taking the action you suggested on studying the scatter diagram? Why?

Taste tests indicate that this brand of cereal is unacceptably soggy when the moisture content exceeds 4. Based on your prediction interval, do you think that a box of cereal that has been on the shelf for 3 weeks will be acceptable? Explain.

What about 4 weeks? 5 weeks? What is acceptable?

Reading

SA Sections 6.4, 6.5

4. Introducing Multiple Linear Regression

• SLR: explaining variation in Y in terms of variation in X

• MLR: explaining variation in Y in terms of variation in several X's

Example 1: What determines the taste of mature cheese?

• X1 = Acetic Acid

• X2 = Hydrogen Sulphide

• X3 = Lactic Acid

• Y = Taste Score

Example 2: Explaining crime rates

Variable   Description
M          percentage of males aged 14–24
So         indicator variable for a southern state
Ed         mean years of schooling
Po1        police expenditure in 1960
Po2        police expenditure in 1959
LF         labour force participation rate
M.F        number of males per 1000 females
Pop        state population
NW         number of nonwhites per 1000 people
U1         unemployment rate of urban males 14–24
U2         unemployment rate of urban males 35–39
GDP        gross domestic product per head
Ineq       income inequality
Prob       probability of imprisonment
Time       average time served in state prisons
Crime      rate of crimes in a particular category per head of population

Example 3: Estimating tree volume / timber yield

For a sample of 31 black cherry trees in the Allegheny National Forest, Pennsylvania, measure

• Y = volume (cubic feet)

• X1 = height (feet)

• X2 = diameter (inches), at 54 inches above ground

Example 4: The Stamp Sales Case Study

The problem

• January 1984, An Post established

• New business plan; sales forecasts required

• Historical sales data available

bring in a consultant!

Example 5: A production prediction problem

• The problem

• The data

• Initial data analysis

– dotplots

– lineplots (time series plots)

– scatterplot matrix

• Model fitting / estimation

• Model criticism

• Application

Erie Metal Products: The problem

Metal products fabrication:

customers order varying quantities of products of varying complexity;

customers demand accurate and precise order delivery times.

Stephan Clark Metal Products

[Photos: A specially designed cabinet; Rear view]

Stephan Clark Metal Products

[Photos: Instrument casing; Another view]

Stephan Clark Metal Products

[Photos: Instrument casing, oblique view; Lockers]

Stephan Clark Metal Products

• "One customer is an international manufacturer of petrochemical equipment."

• "Stephen Clark supplies painted metalwork components, panels and fabrications, which are used throughout the customer's product range."

• "Stephen Clark plays an important part in them being able to cope with frequent scheduling changes."

• "Through careful program management, we are able to offer excellent flexibility of supply, delivering finished product against weekly call-offs."

Erie Metal Products: The data

Table 8.1 Times, in hours, to complete jobs with varying numbers of units, numbers of operations per unit and priority status (normal or rushed)

Order    Jobtime    Units    Operations    Normal (0) or
number   (hours)             per unit      Rushed (1)?
1        153        100      6             0
2        192        35       11            0
3        162        127      7             1
4        240        64       12            0
5        339        600      5             1
6        185        14       16            1
7        235        96       11            1
8        506        257      13            0
9        260        21       9             1
10       161        39       8             0
11       835        426      14            0
12       586        843      6             0
13       444        391      8             0
14       240        84       13            1
15       303        235      9             1
16       775        520      12            0
17       136        76       8             1
18       271        139      11            1
19       385        165      14            1
20       451        304      10            0

The variables

• Response:

– Jobtime, time (hours) to complete an order

• Explanatory:

– Units, the number of units ordered

– Operations per Unit, the number of operations involved in manufacturing a unit

– Rushed, indicator of "rushed" priority status

– Total Operations, Units × Operations per Unit

Initial data analysis, dotplots

Initial data analysis, lineplots

[Figure: time series plots against Job number of Job times, Units, Operations per Unit and Total Operations.]

Initial data analysis, scatterplot matrix

The multiple linear regression model

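$$\text{Jobtime} = \alpha + \beta_{\text{Units}} \times \text{Units} + \beta_{\text{Ops}} \times \text{Ops} + \beta_{\text{T\_Ops}} \times \text{T\_Ops} + \beta_{\text{Rushed}} \times \text{Rushed} + \varepsilon$$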

Model parameters

The regression coefficients:

$\alpha,\ \beta_{\text{Units}},\ \beta_{\text{Ops}},\ \beta_{\text{T\_Ops}},\ \beta_{\text{Rushed}}$

The "uncertainty" parameter:

σ, the standard deviation of ε

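Parameter estimates

$$\hat\alpha = 44,\quad \hat\beta_{\text{Units}} = -0.07,\quad \hat\beta_{\text{Ops}} = 9.8,\quad \hat\beta_{\text{T\_Ops}} = 0.10,\quad \hat\beta_{\text{Rushed}} = -38,\quad \hat\sigma = 7.4$$

Prediction formula

Jobtime = 44 – 0.07×Units + 9.8×Ops + 0.1×T_Ops – 38×Rushed ± 15

Exercise

Job 9, a rushed job with 21 units and 9 operations per unit, took 260 hours to complete.

Was this reasonable?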
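A sketch of the exercise, assuming the rounded prediction formula above (with T_Ops = Units × Operations per unit) and reading ± 15 as about 2σ̂. The function name `predict_jobtime` is illustrative.

```python
def predict_jobtime(units, ops, rushed):
    """Rounded prediction formula from the slide; returns (fit, lower, upper)."""
    t_ops = units * ops                  # Total Operations = Units x Ops
    fit = 44 - 0.07 * units + 9.8 * ops + 0.1 * t_ops - 38 * rushed
    return fit, fit - 15, fit + 15

fit, lower, upper = predict_jobtime(units=21, ops=9, rushed=1)  # Job 9
actual = 260
print(f"predicted {fit:.1f} hours, range ({lower:.1f}, {upper:.1f}), actual {actual}")
print("actual within range:", lower <= actual <= upper)
```

The formula predicts about 112 hours, with a range of roughly 97 to 127, against an actual 260 hours.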

Choosing values for the regression coefficients, SLR

Find values for α and β that minimise the deviations

$$Y_1 - \alpha - \beta X_1,\quad Y_2 - \alpha - \beta X_2,\quad Y_3 - \alpha - \beta X_3,\quad \ldots,\quad Y_n - \alpha - \beta X_n$$

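The method of least squares, SLR

Find values for α and β that minimise the sum of the squared deviations:

$$\begin{aligned}
&(Y_1 - \alpha - \beta X_1)^2 \\
+\;&(Y_2 - \alpha - \beta X_2)^2 \\
+\;&(Y_3 - \alpha - \beta X_3)^2 \\
+\;&\cdots \\
+\;&(Y_n - \alpha - \beta X_n)^2
\end{aligned}$$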
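For SLR the least squares solution has the closed form β̂ = Σ(X_i − X̄)(Y_i − Ȳ)/Σ(X_i − X̄)², α̂ = Ȳ − β̂X̄. The sketch below, using the depot data from the earlier homework as an assumed example, evaluates the sum of squared deviations at that solution and at a few nudged values of α and β; because the criterion is a bowl-shaped quadratic, every nudge increases it.

```python
volume = [48, 57, 49, 45, 50, 62, 58, 55, 38, 51]  # X
costs = [20, 22, 19, 18, 20, 24, 21, 21, 15, 20]   # Y

def sum_sq(alpha, beta):
    """Sum of squared deviations (Y_i - alpha - beta * X_i)^2."""
    return sum((y - alpha - beta * x) ** 2 for x, y in zip(volume, costs))

n = len(volume)
x_bar, y_bar = sum(volume) / n, sum(costs) / n
beta_hat = (sum((x - x_bar) * (y - y_bar) for x, y in zip(volume, costs))
            / sum((x - x_bar) ** 2 for x in volume))
alpha_hat = y_bar - beta_hat * x_bar

best = sum_sq(alpha_hat, beta_hat)
print(f"alpha_hat = {alpha_hat:.3f}, beta_hat = {beta_hat:.5f}, minimum = {best:.4f}")
NUDGES = [(0.1, 0), (-0.1, 0), (0, 0.01), (0, -0.01), (0.5, -0.01)]
for d_alpha, d_beta in NUDGES:
    increase = sum_sq(alpha_hat + d_alpha, beta_hat + d_beta) - best
    print(f"nudge ({d_alpha:+}, {d_beta:+}): increase {increase:.4f}")
```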

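The method of least squares, MLR

Find values for α and β1, β2, β3, … that minimise the sum of the squared deviations:

$$\begin{aligned}
&(Y_1 - \alpha - \beta_1 X_{11} - \beta_2 X_{21} - \beta_3 X_{31} - \cdots)^2 \\
+\;&(Y_2 - \alpha - \beta_1 X_{12} - \beta_2 X_{22} - \beta_3 X_{32} - \cdots)^2 \\
+\;&(Y_3 - \alpha - \beta_1 X_{13} - \beta_2 X_{23} - \beta_3 X_{33} - \cdots)^2 \\
+\;&\cdots \\
+\;&(Y_n - \alpha - \beta_1 X_{1n} - \beta_2 X_{2n} - \beta_3 X_{3n} - \cdots)^2
\end{aligned}$$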

Minitab!

Regression of Jobtime on other variables

Predictor   Coef      SE Coef   T       P
Constant    77.24     44.76     1.73    0.105
Units       -0.1507   0.1121    -1.34   0.199
Ops         7.152     4.305     1.66    0.117
T_Ops       0.11460   0.01322   8.67    0.000
Rushed      -24.94    19.11     -1.31   0.211

S = 37.4612
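For readers without Minitab, this plain-Python sketch applies the least squares criterion to the Table 8.1 data by solving the normal equations (X′X)b = X′y with Gaussian elimination. The helper names `design_row` and `solve` are ours; in practice a library routine such as NumPy's `numpy.linalg.lstsq` would be preferred for numerical stability. Run on the table as transcribed, it should reproduce the Coef column and S above up to rounding.

```python
import math

# Table 8.1 rows: (jobtime in hours, units, operations per unit, rushed 0/1)
JOBS = [
    (153, 100, 6, 0), (192, 35, 11, 0), (162, 127, 7, 1), (240, 64, 12, 0),
    (339, 600, 5, 1), (185, 14, 16, 1), (235, 96, 11, 1), (506, 257, 13, 0),
    (260, 21, 9, 1), (161, 39, 8, 0), (835, 426, 14, 0), (586, 843, 6, 0),
    (444, 391, 8, 0), (240, 84, 13, 1), (303, 235, 9, 1), (775, 520, 12, 0),
    (136, 76, 8, 1), (271, 139, 11, 1), (385, 165, 14, 1), (451, 304, 10, 0),
]

def design_row(units, ops, rushed):
    """Explanatory values for one job: Constant, Units, Ops, T_Ops, Rushed."""
    return [1.0, float(units), float(ops), float(units * ops), float(rushed)]

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * size
    for r in range(size - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, size))
        x[r] = (aug[r][size] - tail) / aug[r][r]
    return x

X = [design_row(u, o, rush) for _, u, o, rush in JOBS]
y = [float(job[0]) for job in JOBS]
p = len(X[0])
# Normal equations (X'X) b = X'y: their solution minimises the sum of squared deviations.
xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
coef = solve(xtx, xty)
resid = [yi - sum(c * v for c, v in zip(coef, row)) for row, yi in zip(X, y)]
s = math.sqrt(sum(e * e for e in resid) / (len(y) - p))  # n - p degrees of freedom

for name, c in zip(["Constant", "Units", "Ops", "T_Ops", "Rushed"], coef):
    print(f"{name:9s}{c:12.4f}")
print(f"S = {s:.4f}")
```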

Exercise

From the computer output, write down the parameter estimates and the prediction formula.

Predict job times for a typical job, say 300 units requiring 10 operations per unit, both normal and rushed.
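A sketch of the prediction step, assuming the Minitab coefficients from the regression output (not the rounded formula with 44, −0.07, … shown earlier) and reading ±2S as a rough allowance for chance variation. The name `predict` is illustrative.

```python
# Minitab estimates from the regression of Jobtime on the other variables
COEF = {"Constant": 77.24, "Units": -0.1507, "Ops": 7.152, "T_Ops": 0.11460, "Rushed": -24.94}
S = 37.4612

def predict(units, ops, rushed):
    """Fitted job time in hours; T_Ops is Units x Ops."""
    return (COEF["Constant"] + COEF["Units"] * units + COEF["Ops"] * ops
            + COEF["T_Ops"] * units * ops + COEF["Rushed"] * rushed)

for rushed, label in [(0, "normal"), (1, "rushed")]:
    fit = predict(300, 10, rushed)
    print(f"{label}: {fit:.0f} hours, roughly {fit - 2 * S:.0f} to {fit + 2 * S:.0f}")
```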

Exercise (continued)

Is this a useful prediction?

What is S?

What is 2S?

When will my order arrive?

NEXT

Diagnostics; analysis of residuals

Homework

Predict job times for

small (U=100, O=5),

medium (U=300, O=10) and

large (U=500, O=15) jobs,

both normal and rushed.

Present the results in tabular form.
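One way to produce the table, as a sketch that assumes the Minitab coefficients from the regression output; predictions are in hours. The large job has Total Operations 500 × 15 = 7,500, more than any job in Table 8.1 (the largest is 520 × 12 = 6,240), so that row is an extrapolation.

```python
COEF = (77.24, -0.1507, 7.152, 0.11460, -24.94)  # Constant, Units, Ops, T_Ops, Rushed

def predict(units, ops, rushed):
    """Fitted job time in hours from the Minitab coefficients."""
    const, b_units, b_ops, b_tops, b_rushed = COEF
    return const + b_units * units + b_ops * ops + b_tops * units * ops + b_rushed * rushed

jobs = [("small", 100, 5), ("medium", 300, 10), ("large", 500, 15)]
print(f"{'Job':8s}{'Units':>7s}{'Ops':>5s}{'Normal':>9s}{'Rushed':>9s}")
for name, units, ops in jobs:
    print(f"{name:8s}{units:7d}{ops:5d}"
          f"{predict(units, ops, 0):9.0f}{predict(units, ops, 1):9.0f}")
```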

Return to the Stamp Sales Case Study


Historical data

Table 1.4 Annual sales of stamps and metered mail, 1949–1983

Year   Stamp Sales¹   Meter Sales   Total Sales
1949   245.2          42.0          287.2
1950   224.4          48.6          273.0
1951   241.3          52.1          293.4
1952   251.3          60.9          312.3
1953   236.7          65.8          302.5
1954   231.6          69.1          300.7
1955   235.8          75.1          310.8
1956   253.0          90.4          343.4
1957   262.6          98.1          360.7
1958   265.4          104.6         370.0
1959   266.0          107.5         373.4
1960   278.4          112.4         390.8
1961   277.7          116.9         394.6
1962   235.9          105.0         340.9
1963   230.0          105.2         335.2
1964   234.8          121.3         356.1
1965   228.8          149.0         377.8
1966   230.1          153.7         383.8
1967   234.3          162.8         397.1
1968   238.6          169.3         407.9
1969   242.7          186.5         429.3
1970   226.4          197.5         423.9
1971   199.4          172.2         371.6
1972   205.4          192.8         398.2
1973   201.6          195.9         397.4
1974   191.1          199.6         390.8
1975   181.0          213.3         394.3
1976   174.9          240.9         415.8
1977   181.0          258.4         439.3
1978   188.2          240.8         429.0
1979   112.5          163.5         276.0
1980   163.7          211.5         375.2
1981   162.1          195.3         357.4
1982   148.9          228.5         377.4
1983   151.2          259.7         410.9

¹ Sales are recorded as millions of standard stamp equivalents, that is, total revenue in a year divided by the price of a stamp for a standard sealed letter for internal delivery, and divided by 1,000,000.

Trend projection?

[Figure: Stamp Sales, Meter Sales and Total Sales plotted against Year.]

Factors influencing sales

• Economic growth

• Stamp prices

• Alternative product prices

measurement problems!

Project: develop a sales forecasting system for An Post

Terms of reference

1. Identify and collect the relevant macro-economic data.

2. Establish a data base containing the data needed for model building;

3. Identify, estimate and check a dynamic regression model suitable for the purposes outlined below:

(a) medium-term (one to five years) forecasting of aggregate demand for postal services;

(b) analysis of the effects of levels of general economic activity, postal prices and the prices of competing services, on aggregate demand for postal services;

(c) use as a benchmark for the analysis of the effects of demand stimulation activities.

Reading

SA Sections 1.6, 8.1, 8.2,