walmart sales forecast

28
(Praxis Business School) Data Mining Assignment A report on Sales forecasting for Walmart Submitted to Prof. Suman K Mazumdar In partial fulfillment of the requirements of the subject (iSAS) On (26 th September, 2015) By Anurag Mukherjee 1

Upload: anurag-mukhopadhyay

Post on 15-Apr-2017

517 views

Category:

Data & Analytics


0 download

TRANSCRIPT

Page 1: Walmart sales forecast

(Praxis Business School)

Data Mining Assignment

A report on

Sales forecasting for Walmart

Submitted to

Prof. Suman K Mazumdar

In partial fulfillment of the requirements of the subject

(iSAS)

On (26th September, 2015)

By

Anurag Mukherjee

1

Page 2: Walmart sales forecast

Sales forecasting for Walmart

2

Page 3: Walmart sales forecast

Table of Content

Sl No Topic Page

1 Cover Page 12 Title Page 23 Executive Summary 34 Background 35 Business Problem 36 Data Overview 47 Exploratory Analysis 58 Examining the final features dataset : 19

9Merging of train and features for the final data set creation 20

10 Model Building 23

Executive Summary :

3

Page 4: Walmart sales forecast

Walmart is the world's largest company by revenue, according to the Fortune Global 500 list in 2014, as well as the biggest private employer in the world with 2.2 million employees. Walmart is a family-owned business, as the company is controlled by the Walton family. Sam Walton's heirs own over 50 percent of Walmart through their holding company, Walton Enterprises, and through their individual holdings. It is also one of the world's most valuable companies by market value,[10]and is also the largest grocery retailer in the U.S. In 2009, it generated 51 percent of its US$258 billion (equivalent to $284 billion in 2015) sales in the U.S. from its grocery business.

We are provided with datasets containing sales per store,per department on weekly basis.We are are about to forecast sales for Walmart to help the company in taking much better data driven decisions for inventory planning and channel optimization.

Background :

Wal-Mart Stores, Inc. is an American multinational retail corporation that operates a chain ofdiscount department stores and warehouse stores. Headquartered in Bentonville, Arkansas, United States, the company was founded by Sam Walton in 1962 and incorporated on October 31, 1969. It has over 11,000 stores in 28 countries, under a total of 65 banners. The company operates under the Walmart name in the United States and Canada. It operates asWalmart de México y Centroamérica in Mexico, as Asda in the United Kingdom, as Seiyu in Japan, and as Best Price in India. It has wholly owned operations in Argentina, Brazil, and Canada. It also owns and operates the Sam's Club retail warehouses.

Business Problem :

With historical sales data for 45 Walmart stores located in different regions. Each store contains many departments, and the aim is to project the sales for each department in each store. To add to the challenge, selected holiday markdown events are included in the dataset. These markdowns are known to affect sales.

Data Overview :

train.csv

4

Page 5: Walmart sales forecast

This is the historical training data, which covers to 2010-02-05 to 2012-11-01. Within this file you will find the following fields:

Store - the store number

Dept - the department number

Date - the week

Weekly_Sales - sales for the given department in the given store

IsHoliday - whether the week is a special holiday week

features.csv

This file contains additional data related to the store, department, and regional activity for the given dates. It contains the following fields:

Store - the store number

Date - the week

Temperature - average temperature in the region

Fuel_Price - cost of fuel in the region

MarkDown1-5 - anonymized data related to promotional markdowns that Walmart is running. MarkDown data is only available after Nov 2011, and is not available for all stores all the time. Any missing value is marked with an NA.

CPI - the consumer price index

Unemployment - the unemployment rate

IsHoliday - whether the week is a special holiday week

Exploratory Analysis :

5

Page 6: Walmart sales forecast

1.train.csv

1.1 Importing the raw dataset :

proc import out=walmart_train datafile='/folders/myshortcuts/myfolder/train_walmart.csv' dbms=csv replace;

getnames=yes;

run;

1.2 Checking the contents of train.csv :

proc contents data=walmart_train;

run;

Alphabetic List of Variables and Attributes

# Variable Type Len Format Informat

3 Date Num 8 DDMMYY10. DDMMYY10.

2 Dept Num 8 BEST12. BEST32.

6 IsHoliday Char 5 $5. $5.

4 Month_Year Num 8 DATETIME. ANYDTDTM40.

1 Store Num 8 BEST12. BEST32.

5 Weekly_Sales

Num 8 BEST12. BEST32.

1.3 Checking the basic statistical measures

proc means data=walmart_train;

6

Page 7: Walmart sales forecast

var Weekly_Sales;

run;

Analysis Variable : Weekly_Sales

N Mean Std Dev Minimum Maximum

421570

15981.26

22711.18

-4988.94 693099.36

Negative Sales Indicate Returns.

1.4 Plot of Weekly_Sales Vs Date :

1.5 Plotting Sales Year Wise :

7

Page 8: Walmart sales forecast

proc sql;

create table walmart_train_data as

select Date,sum(Weekly_Sales) as Sales

from walmart_train

group by Date;

run;

2010 Sales Report :

data Sales_2010;

set walmart_train_data(keep=Sales Date where=(Date between '05Feb2010'd and '31Dec2010'd));

run;

*plotting 2010 Sales by Date;

ods graphics / reset imagemap;

proc sgplot data=WORK.SALES_2010;

vbar Date / response=Sales stat=Mean name='Bar';

yaxis grid;

run;

ods graphics / reset;

proc print data=Sales_2010;

run;

8

Page 9: Walmart sales forecast

0

20000000

40000000

60000000

80000000Sa

les

(Mea

n)

05/02/2010

12/02/2010

19/02/2010

26/02/2010

05/03/2010

12/03/2010

19/03/2010

26/03/2010

02/04/2010

09/04/2010

16/04/2010

23/04/2010

30/04/2010

07/05/2010

14/05/2010

21/05/2010

28/05/2010

04/06/2010

11/06/2010

18/06/2010

25/06/2010

02/07/2010

09/07/2010

16/07/2010

23/07/2010

30/07/2010

06/08/2010

13/08/2010

20/08/2010

27/08/2010

03/09/2010

10/09/2010

17/09/2010

24/09/2010

01/10/2010

08/10/2010

15/10/2010

22/10/2010

29/10/2010

05/11/2010

12/11/2010

19/11/2010

26/11/2010

03/12/2010

10/12/2010

17/12/2010

24/12/2010

31/12/2010

Date

Obs Date Sales

1 05/02/2010 49750740.50

2 12/02/2010 48336677.63

3 19/02/2010 48276993.78

4 26/02/2010 43968571.13

5 05/03/2010 46871470.30

6 12/03/2010 45925396.51

7 19/03/2010 44988974.64

8 26/03/2010 44133961.05

(First 8 Sales figures for 2010 for convenience)

9

Page 10: Walmart sales forecast

2011 Sales Report :

data Sales_2011;

set walmart_train_data(keep=Sales Date where=(Date between '07Jan2011'd and '30Dec2011'd));

run;

*plotting 2011 Sales by Date;

ods graphics / reset imagemap;

proc sgplot data=WORK.SALES_2011;

vbar Date / response=Sales stat=Mean name='Bar';

yaxis grid;

run;

ods graphics / reset;

proc print data=Sales_2011;

run;

0

20000000

40000000

60000000

80000000

Sale

s (M

ean)

07/01/2011

14/01/2011

21/01/2011

28/01/2011

04/02/2011

11/02/2011

18/02/2011

25/02/2011

04/03/2011

11/03/2011

18/03/2011

25/03/2011

01/04/2011

08/04/2011

15/04/2011

22/04/2011

29/04/2011

06/05/2011

13/05/2011

20/05/2011

27/05/2011

03/06/2011

10/06/2011

17/06/2011

24/06/2011

01/07/2011

08/07/2011

15/07/2011

22/07/2011

29/07/2011

05/08/2011

12/08/2011

19/08/2011

26/08/2011

02/09/2011

09/09/2011

16/09/2011

23/09/2011

30/09/2011

07/10/2011

14/10/2011

21/10/2011

28/10/2011

04/11/2011

11/11/2011

18/11/2011

25/11/2011

02/12/2011

09/12/2011

16/12/2011

23/12/2011

30/12/2011

Date

10

Page 11: Walmart sales forecast

Sales in tabular Form :

Obs Date Sales

1 07/01/2011

42775787.77

2 14/01/2011

40673678.04

3 21/01/2011

40654648.03

4 28/01/2011

39599852.99

5 04/02/2011

46153111.12

6 11/02/2011

47336192.79

7 18/02/2011

48716164.12

(First 8 Sales figures for 2011 for convenience)

11

Page 12: Walmart sales forecast

2012 Sales Report :

data Sales_2012;

set walmart_train_data(keep=Sales Date where=(Date between '06Jan2012'd and '26Oct2012'd));

run;

*plotting 2012 Sales by Date;

ods graphics / reset imagemap;

proc sgplot data=WORK.SALES_2012;

vbar Date / response=Sales stat=Mean name='Bar';

yaxis grid;

run;

ods graphics / reset;

proc print data=Sales_2012; run;

;

0

10000000

20000000

30000000

40000000

50000000

Sales

(Mea

n)

Date

12

Page 13: Walmart sales forecast

Sales in tabular form - 2012

Obs Date Sales

1 06/01/2012

44955421.95

2 13/01/2012

42023078.48

3 20/01/2012

42080996.56

4 27/01/2012

39834974.67

5 03/02/2012

46085608.09

6 10/02/2012

50009407.92

7 17/02/2012

50197056.96

8 24/02/2012

45771506.57

13

Page 14: Walmart sales forecast

1.6.Outlier Treatment for train.csv :

The data being a time series record have some seasonalities .During the month of December there’s a sales spike.This can be explained further by Markdowns.

Markdown 1,2,4,5 doesnt seem to be that effective as compared to Markdown 3.

14

Page 15: Walmart sales forecast

15

Page 16: Walmart sales forecast

As the spike in the sales would affect the entire model,the difference of excess sales has been distributed across all the records.

data wal;

set walmart_train_data;

where Sales > 50000000;

sales_diff=Sales-46243899.58;

run;

proc sql;

create table mapper as

select sum(Sales_diff) from

wal;

run;

*total excess sales from weeks having > 50000000 = 181638262.18;

data walmart_final;

set walmart_train;

if Weekly_Sales > 50000000 then Weekly_Sales=46243899.58;

Weekly_Sales_new=Weekly_Sales+(181638262.18/421570);

run;

proc univariate data=walmart_final;

var Weekly_Sales;

run;

16

Page 17: Walmart sales forecast

2.features.csv

2.1 Importing raw data set :

proc import out=walmart_features datafile='/folders/myshortcuts/myfolder/features.csv' dbms=csv replace;

getnames=yes;

guessingrows=200;

run;

2.2 Checking the contents of features.csv :

Alphabetic List of Variables and Attributes

# Variable Type Len Format Informat

4 CPI Char 11 $11. $11.

2 Date Num 8 YYMMDD10. YYMMDD10.

6 Fuel_Price Num 8 BEST12. BEST32.

13 IsHoliday Char 5 $5. $5.

7 MarkDown1 Char 8 $8. $8.

8 MarkDown2 Char 8 $8. $8.

9 MarkDown3 Char 8 $8. $8.

10 MarkDown4 Char 8 $8. $8.

11 MarkDown5 Char 8 $8. $8.

1 Store Num 8 BEST12. BEST32.

5 Temperature Num 8 BEST12. BEST32.

12 Unemployment

Char 5 $5. $5.

14 VAR14 Char 1 $1. $1.

3 Weekly_Sales Char 8 $8. $8.

17

Page 18: Walmart sales forecast

2.3 Checking the basic statistical measures of features.csv :

proc means data=walmart_features;

run;

2.4 Outlier Treatment :

data walmart_f;

set walmart_features;

format Date DDMMYY10.;

if MarkDown1="NA" or MarkDown1="#N/A" then MarkDown1=0;

if MarkDown2="NA" or MarkDown2="#N/A" then MarkDown2=0;

if MarkDown3="NA" or MarkDown3="#N/A" then MarkDown3=0;

if MarkDown4="NA" or MarkDown4="#N/A" then MarkDown4=0;

if MarkDown5="NA" or MarkDown5="#N/A" then MarkDown5=0;

if IsHoliday="TRUE" then IsHoliday_Yes=1;

else IsHoliday_Yes=0;

if Weekly_Sales="#N/A" then Weekly_Sales=0;

run;

18

Page 19: Walmart sales forecast

data walmart_features_1 (keep=Store Date Weekly_Sales_n Fuel_Price IsHoliday_Yes MarkDown1_n MarkDown1_n MarkDown2_n MarkDown3_n MarkDown4_n MarkDown5_n Temperature Unemployment CPI) ;

set walmart_f;

MarkDown1_n=MarkDown1*1;

MarkDown2_n=MarkDown2*1;

MarkDown3_n=MarkDown3*1;

MarkDown4_n=MarkDown4*1;

MarkDown5_n=MarkDown5*1;

Weekly_Sales_n=Weekly_Sales*1;

run;

19

Page 20: Walmart sales forecast

Examining the final features dataset :

proc contents data=walmart_features_1;

run;

Alphabetic List of Variables and Attributes

# Variable Type Len Format Informat

3 CPI Char 11 $11. $11.

2 Date Num 8 DDMMYY10. YYMMDD10.

5 Fuel_Price Num 8 BEST12. BEST32.

7 IsHoliday_Yes Num 8

8 MarkDown1_n Num 8

9 MarkDown2_n Num 8

10 MarkDown3_n Num 8

11 MarkDown4_n Num 8

12 MarkDown5_n Num 8

1 Store Num 8 BEST12. BEST32.

4 Temperature Num 8 BEST12. BEST32.

6 Unemployment Char 5 $5. $5.

13 Weekly_Sales_n

Num 8

20

Page 21: Walmart sales forecast

Merging of train and features for the final data set creation :

proc sql;

create table walmart_final_1 as

select a.*,b.CPI,b.Temperature,b.Fuel_Price,b.MarkDown1_n,b.MarkDown2_n,b.MarkDown3_n,b.MarkDown4_n,b.MarkDown5_n,b.Unemployment,b.IsHoliday_Yes

from walmart_final as a left join walmart_features_1 as b on

a.Date=b.Date and a.Store=b.Store;

run;

data walmart_final_2 (drop=IsHoliday Month_Year Unemployment Weekly_Sales);

set walmart_final_1;

run;

21

Page 22: Walmart sales forecast

proc contents data=walmart_final_2;

run;

Alphabetic List of Variables and Attributes

# Variable Type Len Format Informat

5 CPI Char 11 $11. $11.

3 Date Num 8 DDMMYY10. DDMMYY10.

2 Dept Num 8 BEST12. BEST32.

7 Fuel_Price Num 8 BEST12. BEST32.

13 IsHoliday_Yes Num 8

8 MarkDown1_n Num 8

9 MarkDown2_n Num 8

10 MarkDown3_n Num 8

11 MarkDown4_n Num 8

12 MarkDown5_n Num 8

1 Store Num 8 BEST12. BEST32.

6 Temperature Num 8 BEST12. BEST32.

4 Weekly_Sales_new

Num 8

22

Page 23: Walmart sales forecast

Printing the final dataset after merge :

proc print data=walmart_final_2(obs=10);

run;

Obs

Store

De

pt

Date Weekly_Sales_ne

w

CPI Temperatur

e

Fuel_Pric

e

MarkDown1_

n

MarkDown2_

n

MarkDown3_

n

MarkDown4_

n

MarkDown5_

n

IsHoliday_Y

es

1 1 45

05/02/20

10

468.30 211.0963582

42.31 2.572

0 0 0 0 0 0

2 1 5 05/02/20

10

32660.24 211.0963582

42.31 2.572

0 0 0 0 0 0

3 1 9 05/02/20

10

17361.85 211.0963582

42.31 2.572

0 0 0 0 0 0

4 1 29

05/02/20

10

7455.81 211.0963582

42.31 2.572

0 0 0 0 0 0

5 1 92

05/02/20

10

140315.80

211.0963582

42.31 2.572

0 0 0 0 0 0

6 1 42

05/02/20

10

8797.57 211.0963582

42.31 2.572

0 0 0 0 0 0

7 1 80

05/02/20

10

16125.03 211.0963582

42.31 2.572

0 0 0 0 0 0

8 1 19

05/02/20

10

2377.91 211.0963582

42.31 2.572

0 0 0 0 0 0

9 1 32

05/02/20

10

12306.70 211.0963582

42.31 2.572

0 0 0 0 0 0

1 1 4 05/0 67211.49 211.0 42.31 2.57 0 0 0 0 0 0

23

Page 24: Walmart sales forecast

Obs

Store

De

pt

Date Weekly_Sales_ne

w

CPI Temperatur

e

Fuel_Pric

e

MarkDown1_

n

MarkDown2_

n

MarkDown3_

n

MarkDown4_

n

MarkDown5_

n

IsHoliday_Y

es

0 0 2/2010

963582

2

Model Building :

proc reg data=walmart_final_2;

model Weekly_Sales_new= Fuel_Price MarkDown3_n Temperature ;

run;

24