fcas final project report - galit shmueli...artika . 61910321 vividha arabandi 61910249 ... 1....

11
FCAS FINAL PROJECT REPORT Forecasting Temperature Using Weather Time Series Data of US & Canadian Cities for Effective Inventory Stock Plan at Retail Stores Section B- Group 10 Rahul Sharma 61910137 Himanshu Jain 61910148 Rahul Garg 61910740 Artika . 61910321 Vividha Arabandi 61910249 Varghese Zachariah 61910374

Upload: others

Post on 29-Jul-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: FCAS FINAL PROJECT REPORT - Galit Shmueli...Artika . 61910321 Vividha Arabandi 61910249 ... 1. Executive Summary..... 1 1.1 Problem Summary..... 1 1.2 Description of data, its source,

FCAS FINAL PROJECT REPORT Forecasting Temperature Using Weather Time

Series Data of US & Canadian Cities for Effective Inventory Stock Plan at Retail Stores

Section B- Group 10

Rahul Sharma 61910137

Himanshu Jain 61910148

Rahul Garg 61910740

Artika . 61910321

Vividha Arabandi 61910249

Varghese Zachariah 61910374

Page 2: FCAS FINAL PROJECT REPORT - Galit Shmueli...Artika . 61910321 Vividha Arabandi 61910249 ... 1. Executive Summary..... 1 1.1 Problem Summary..... 1 1.2 Description of data, its source,

Table of Contents

1. Executive Summary...............................................................

1

1.1 Problem

Summary..............................................................

1

1.2 Description of data, its source, key characteristics, &

chart(s)....................................................................................

1

1.3 High level description of the final forecasting method

and performance on meaningful performance

metrics...........................

2

1.4 Conclusions and

recommendations........................................

3

2. Technical Summary...............................................................

4

2.1 Data

preparation/issues......................................................

4

2.2 Forecasting

methods............................................................

5

2.3 Performance compared to

benchmarks....................................

5

3. Appendix………………...............................................................

6

Page 3: FCAS FINAL PROJECT REPORT - Galit Shmueli...Artika . 61910321 Vividha Arabandi 61910249 ... 1. Executive Summary..... 1 1.1 Problem Summary..... 1 1.2 Description of data, its source,

1

1. Executive Summary

1.1| Problem Summary

Several cities in the continental US and Canada see temperature well below freezing. This

has several consequences for local business as mobility is restricted and transport cost

increase. Thus, for six cities, that show extreme weather condition (Chicago, Denver,

Detroit, Vancouver, Toronto, Montreal), we have collected weather data for the past five

years and intend to prepare a model that predicts the possible temperature range for the

coming year. This will help Retail owners like Walmart, Kmart, Weston, Costco along with

various mom-pop retails can plan their inventory stock up of various crucial SKUs. Inventory

planning is critical as the extreme weather these cities see lead to costly and difficult

inventory replenishment. There is also a decrease in sales as movement becomes more

difficult and customer tend to avoid venturing out. This could lead to a potential loss of

millions of dollars in overstocking and understocking costs.

1.2| Description of data, its source, key characteristics, & chart(s)

The data series contains ~5 years of high temporal resolution (hourly measurements) data of

temperature in Kelvin in 3 US cities and 3 Canadian cities (Chicago, Denver, Detroit,

Vancouver, Toronto, Montreal). The data has been downloaded from kaggle.com1

The data has the following key characteristics in terms of the time series components:

▪ Noise- This is the non-systematic component and is present in all the time series. It

shows the random variation that results from measurement errors or other causes that

are not accounted for.

▪ Level- This is a systematic component and a point estimate that shows the average

value of the series. This is also present in all kinds of time series.

▪ Trend- All the six time-series have a linear trend.

▪ Seasonality- In all the six time-series we can observe additive seasonality.

Charts showing the raw data vs time for all the 6 cities can be seen in Exhibit 1

Exhibit 1: Time-Series chart for Chicago, Denver, Detroit, Vancouver, Toronto & Montreal from Dec 1, 2012 to Nov 30, 2017

1 https://www.kaggle.com/selfishgene/historical-hourly-weather-data/home

Page 4: FCAS FINAL PROJECT REPORT - Galit Shmueli...Artika . 61910321 Vividha Arabandi 61910249 ... 1. Executive Summary..... 1 1.1 Problem Summary..... 1 1.2 Description of data, its source,

2

1.3| High level description of the final forecasting method and

performance on meaningful performance metrics

The following methods were tried on the time series data for the six cities:

A) Holt Winter (Triple Exponential) with additive seasonality - This method of

smoothing was chosen because the time series had both a linear trend and seasonality. The

seasonality was a 365 period. We chose additive seasonality based on the raw data analysis.

Period 365

Alpha (Level) Optimise [Refer Technical Summary]

Beta (Trend) Optimise [Refer Technical Summary]

Gamma (seasonality) Optimise [Refer Technical Summary]

B) Regression with Linear Trend – In some cities, Holt Winter’s model performance was

not satisfactory, wherein the RMSE was quite high. We then tried linear regression for

forecasting.

Categorical Variables: Season_Index_1 …..Season_index_12 , Predictor Variable : t

Output variable: City temperature

We had tried with Quadratic trend; however, the trend portion is so flat, that the coefficient of

t2 was insignificant.

C) ARIMA – In linear regression model, we saw that the forecasts were very staggered, i.e.

in a window of few nearby forecasts were same value. This was a strong indication for

autocorrelation. On performing the lag analysis, it was confirmed that there was

Page 5: FCAS FINAL PROJECT REPORT - Galit Shmueli...Artika . 61910321 Vividha Arabandi 61910249 ... 1. Executive Summary..... 1 1.1 Problem Summary..... 1 1.2 Description of data, its source,

3

autocorrelation of lag-1. Therefore, we performed ARIMA and adjusted the linear regression

forecasts by adding the latest forecasted errors/residuals.

Results:

City Forecasting Method - Recommendation

Denver Linear Regression

Chicago Holt-Winter Smoothing

Detroit Linear Regression

Montreal Linear Regression

Toronto Holt-Winter Smoothing

Vancouver Holt-Winter Smoothing

1.4| Conclusions and recommendations

From the project, we have the following conclusions and recommendations for the project:

• Forecasting over short horizon is better: We learnt that forecasting over a shorter

horizon is better than forecasting over a longer horizon as local market intelligence

can be incorporated.

• Need to check autocorrelation: We saw in the linear regression models, that the

‘binned’ forecasts can exist – where same forecasts are given for 2-3 days window.

Thus, a lag-1 or lag-2 autocorrelation can exist, and lag analysis should be done.

• Avoid overcomplication: It is advisable to not invoice many variables to forecast.

• Use Control Charts for model performance. Review and adjust the model after a

fixed period (every 6 months) or when control chart indication of error.

2. Technical Summary

2.1| Data preparation/issues

Data has been taken from Kaggle (https://www.kaggle.com/selfishgene/historical-hourly-

weather-data#temperature.csv). Various data attributes are timestamp, city name and

temperature (in kelvin). The time series has components like

Level | Noise | Trend – Linear (very flat) | Seasonality

For preparing the data for use, following steps were taken

Page 6: FCAS FINAL PROJECT REPORT - Galit Shmueli...Artika . 61910321 Vividha Arabandi 61910249 ... 1. Executive Summary..... 1 1.1 Problem Summary..... 1 1.2 Description of data, its source,

4

1. Missing Value Treatment – There were entries missing in the data we downloaded.

As we have monthly seasonality in the data, we used Mode within a month for

imputing the missing values

2. Daily aggregation from hourly – The data we downloaded from Kaggle had hourly

temperature values. As our business goal is to help retailers use the weather forecast

in planning their inventories better and inventories are managed at the daily level not

hourly, temperature variations within the day are not relevant. Furthermore, we want

to concentrate on the effect of cold weather, hence, we used minimum temperature of

the day while aggregating the temperature to a daily level

3. For model preparation, columns ‘t’ & ‘t2’ were added and categorical variables were

created for Linear Regression for Season Index

4. Data was partitioned - Training: 4 years (1461 years); Validation: 1 year (365

records)

2.2| Forecasting methods

Following are the details of the Forecasting methods for Denver. All other cities are detailed

in the Appendix.

Holt’s Winter Smoothing - Alpha(Level) – 0.5849 , Beta(Trend) – 0.0038 , Gamma(Season) – 0.00817

Error Measures: Training

Record ID Value

SSE 17660.14

MSE 12.08771

MAPE 0.977481

MAD 2.674989

CFE -7.96718

MFE -0.00545

TSE -2.9784

Error Measures: Validation

Record ID Value

SSE 28687.15

MSE 78.59492

MAPE 2.777486

MAD 7.751918

CFE 2540.677

MFE 6.960758

TSE 327.7481

240

260

280

300

320

1

52

10

3

15

4

20

5

25

6

30

7

35

8

40

9

46

0

51

1

56

2

61

3

66

4

71

5

76

6

81

7

86

8

91

9

97

0

10

21

10

72

11

23

11

74

12

25

12

76

13

27

13

78

14

29

Training: Actual Vs Fitted

Denver Fitted: Denver

240

260

280

300

Validation: Actual vs Forecast

Denver Forecast -Holt Winter

Page 7: FCAS FINAL PROJECT REPORT - Galit Shmueli...Artika . 61910321 Vividha Arabandi 61910249 ... 1. Executive Summary..... 1 1.1 Problem Summary..... 1 1.2 Description of data, its source,

5

Linear Regression -

ARIMA – Autocorrelation Adjustment

2.3| Performance compared to benchmarks

We have used Naive forecast (lag 365 value) for benchmarking. Ie 01-April, 2017 was forecasted

using 01-April,2016.

We see that in some cities where Holt Winter smoothing didn’t perform well, RMSE was greater

than Naïve model, thus for those cities we have recommended Linear Regression that has

consistently performed better.

240

290

340

1

54

10

7

16

02

13

26

6

31

9

37

24

25

47

85

31

58

4

63

76

90

74

37

96

84

9

90

2

95

51

00

8

10

61

11

14

11

67

12

20

12

73

13

26

13

79

14

32

Training: Actual vs Fitted

Denver Prediction: Denver

Training: Prediction Summary

Metric Value

SSE 34266.26

MSE 23.45398

RMSE 4.842931

MAD 3.714774

R2 0.721353

Validation: Prediction Summary

Metric Value

SSE 7883.837

MSE 21.59955

RMSE 4.647532

MAD 3.440248

R2 0.709864240

260

280

300

Validation: Actual vs Forecast

Denver Prediction: Denver - Regression

Page 8: FCAS FINAL PROJECT REPORT - Galit Shmueli...Artika . 61910321 Vividha Arabandi 61910249 ... 1. Executive Summary..... 1 1.1 Problem Summary..... 1 1.2 Description of data, its source,

6

3. Appendix

Chicago

Holt Winter Smoothing with additive seasonality

0.881039

0.001831

0.082461

Alpha (Level)

Beta (Trend)

Gamma (seasonality)

Error Measures: Training Error Measures: Validation

Record ID Value Record ID Value Column1

SSE 17683.87 SSE 13263.69 115.1681

MSE 12.10395 MSE 36.33887 6.028173

MAPE 0.988919 MAPE 1.684337 1.29782

MAD 2.740549 MAD 4.688776 2.165358

CFE -3.5127 CFE 316.2392 17.78312

MFE -0.0024 MFE 0.866409 0.930811

TSE -1.28175 TSE 67.446 8.212551

Page 9: FCAS FINAL PROJECT REPORT - Galit Shmueli...Artika . 61910321 Vividha Arabandi 61910249 ... 1. Executive Summary..... 1 1.1 Problem Summary..... 1 1.2 Description of data, its source,

7

Detroit

Linear Regression

Montreal

Linear Regression

-30

-20

-10

0

10

204

27

05

42

71

7

42

72

9

42

74

1

42

75

3

42

76

5

42

77

7

42

78

9

42

80

1

42

81

3

42

82

5

42

83

7

42

84

9

42

86

1

42

87

3

42

88

5

42

89

7

42

90

9

42

92

1

42

93

3

42

94

5

42

95

7

42

96

9

42

98

1

42

99

3

43

00

5

43

01

7

43

02

9

43

04

1

43

05

3

43

06

5

Residual

Training: Prediction Summary

Metric Value

SSE 36252.1001

MSE 24.8132102

RMSE 4.981285998

MAD 3.921526299

R2 0.782243802

240

290

340

1

75

14

9

22

3

29

7

37

1

44

5

51

9

59

3

66

7

74

1

81

5

88

9

96

3

10

37

11

11

11

85

12

59

13

33

14

07

Training: Actual vs Fitted

Detroit Prediction: Detroit

240

260

280

300

1

21

41

61

81

10

1

12

1

14

1

16

1

18

1

20

1

22

1

24

1

26

1

28

1

30

1

32

1

34

1

36

1

Validation: Actual vs Forecast

Detroit Prediction: Detroit

Validation: Prediction Summary

Metric Value

SSE 10167.56

MSE 27.85632

RMSE 5.277908

MAD 4.194072

R2 0.670782

-30

-10

10

30

1

21

41

61

81

10

1

12

1

14

1

16

1

18

1

20

1

22

1

24

1

26

1

28

1

30

1

32

1

34

1

36

1

Validation: Residual

-40

-20

0

20

1

71

14

1

21

1

28

1

35

1

42

1

49

1

56

1

63

1

70

1

77

1

84

1

91

1

98

1

10

51

11

21

11

91

12

61

13

31

14

01

Residual

Training: Prediction Summary

Metric Value

SSE 42051.7681

MSE 28.7828666

RMSE 5.3649666

MAD 4.1825682

R2 0.79321145

Validation: Prediction Summary

Metric Value

SSE 11189.4193

MSE 30.6559432

RMSE 5.53678095

MAD 4.20776217

R2 0.73692502

Page 10: FCAS FINAL PROJECT REPORT - Galit Shmueli...Artika . 61910321 Vividha Arabandi 61910249 ... 1. Executive Summary..... 1 1.1 Problem Summary..... 1 1.2 Description of data, its source,

8

Toronto

Holt Winter Smoothing with additive seasonality

Alpha (Level) 0.881039

Beta (Trend) 0.001831

Gamma (seasonality) 0.082461

240

260

280

300

320

17

11

41

21

12

81

35

14

21

49

15

61

63

1

70

17

71

84

19

11

98

11

05

11

12

1

11

91

12

61

13

31

14

01

Training: Actual vs Fitted

Montreal Prediction: Montreal

-20

-10

0

10

20

30

1

71

14

1

21

1

28

1

35

1

42

1

49

1

56

1

63

1

70

1

77

1

84

1

91

1

98

1

10

51

11

21

11

91

12

61

13

31

14

01

Training: Residual

220

240

260

280

300

1

20

39

58

77

96

11

5

13

4

15

3

17

2

19

1

21

0

22

9

24

8

26

7

28

6

30

5

32

4

34

3

36

2

Validation: Actual vs Forecasted

Montreal Prediction: Montreal

-20

-10

0

10

20

1

20

39

58

77

96

11

5

13

4

15

3

17

2

19

1

21

0

22

9

24

8

26

7

28

6

30

5

32

4

34

3

36

2

Validation:Residual

Error Measures: Training Error Measures: Validation

Record ID Value Record ID Value

SSE 16628.13 SSE 20712.86

MSE 11.38134 MSE 56.74755

MAPE 0.937095 MAPE 2.291616

MAD 2.584219 MAD 6.360996

CFE 2.469481 CFE -2026.17

MFE 0.00169 MFE -5.55116

TSE 0.9556 TSE -318.531

Page 11: FCAS FINAL PROJECT REPORT - Galit Shmueli...Artika . 61910321 Vividha Arabandi 61910249 ... 1. Executive Summary..... 1 1.1 Problem Summary..... 1 1.2 Description of data, its source,

9

Vancouver

Holt Winter Smoothing with additive seasonality

Alpha (Level) 0.584918

Beta (Trend) 0.003815

Gamma (seasonality) 0.008179

-30

-20

-10

0

10

201

15

29

43

57

71

85

99

11

3

12

7

14

1

15

5

16

9

18

3

19

7

21

1

22

5

23

9

25

3

26

7

28

1

29

5

30

9

32

3

33

7

35

1

36

5

Residual

Error Measures: Training Error Measures: Validation

Record ID Value Record ID Value

SSE 6362.835 SSE 16076.25

MSE 4.355123 MSE 44.04452

MAPE 0.529988 MAPE 2.134632

MAD 1.478359 MAD 5.938589

CFE 3.152989 CFE -1815.87

MFE 0.002158 MFE -4.97499

TSE 2.132763 TSE -305.775