isqs6347 team5 proposal_032513

17
ISQS 6347- Data &Text Mining Spring 2013 Team 5 Project title Data Analysis For Abuelo’s Class number / Semester ISQS6347 Spring 2013 Section 1 Student names Preeti Prajapati Neha Soam Ming Kuo Hui The type of this project Data Mining Academic Project The nature and source of the dataset Nature - Available in SAS file format Source Abuelo’s Restaurant Completion date May 16 2013 2013

Upload: texas-tech-university

Post on 16-Jul-2015

93 views

Category:

Technology


0 download

TRANSCRIPT

Page 1: Isqs6347 team5 proposal_032513

ISQS 6347- Data &Text Mining

Spring 2013 Team 5

Project title Data Analysis For Abuelo’s

Class number / Semester ISQS6347 Spring 2013 – Section 1 Student names Preeti Prajapati

Neha Soam Ming Kuo Hui

The type of this project Data Mining Academic Project The nature and source of the dataset Nature - Available in SAS file format

Source – Abuelo’s Restaurant Completion date May 16 2013

2013

Page 2: Isqs6347 team5 proposal_032513

May 15, 2013 [ISQS 6347 – Final Project Report]

2 | Abuelo's

Table of Contents Introduction ............................................................................................................................................................ 3

Business Background ...................................................................................................................................... 3

Objective............................................................................................................................................................... 3

Project Overview ................................................................................................................................................... 3

Dataset Availability and Description ......................................................................................................... 3

Table 1 : Attributes & their Description................................................................................................... 4

Data Quality and Preparation ...................................................................................................................... 4

Table 2................................................................................................................................................................... 6

Data Exploration & Preprocessing ................................................................................................................. 7

Data Preparation ............................................................................................................................................... 7

Figure 3: Non-Unique UID and Its Number of Missing Values (via SAS Enterprise Guide) . 8

Figure 4: Output Result from SAS Enterprise Miner ........................................................................... 8

Preprocessing Tasks ........................................................................................................................................ 9

Data Mining Methodologies ........................................................................................................................... 10

Primitive Results and Findings ..................................................................................................................... 11

Data Filtration & Addition of New Variables ...................................................................................... 13

Refined Data’s Exploration ........................................................................................................................ 13

Page 3: Isqs6347 team5 proposal_032513

May 15, 2013 [ISQS 6347 – Final Project Report]

3 | Abuelo's

Introduction The purpose of this project is to analyze a restaurant’s sales data and to generate a model that would aid at restaurant’s management decisions. The restaurant would be examined in this project, Abuelo’s, is a real restaurant and all data collected are real data. By collecting, exploring, processing, and analyzing the real life data via the data mining techniques, we learned from lecture, we are able to generate a model that is useful and can be applied to restaurant’s decision making.

Business Background The Abuelo’s is a Mexican restaurant that has established stores in several cities since 1989. Abuelo’s has consistently been on the leading edge of Mexican cuisine, combining menu creativity, outstanding food and beverage quality, colorful plate presentations and superior service in an impressive Mexican courtyard-themed atmosphere. Every dish is made to order from scratch using only the freshest premium ingredients.

Objective Recently Abuelo’s is planning to adopt a new menu to replace the old one. The restaurant has been conducting trails of new Value Items. Value item has a lower cost as well as a lower profit margin compared to its full version (i.e. Chicken Zucchini and Chicken Zucchini Lite). But value items are more frequently ordered than other items. The new menu differs from the old one in that it is extended with Value Items and some other new items which are not treated as value items in the list. The main objective of this project is to analyze the effect of value items on the total profit return. The result of this project is expected to aid at decisions of what value items should be deleted or stayed on the menu.

Project Overview

Dataset Availability and Description The data for Abuelo's is available for year 2011 and 2012 in excel and SAS files. The attributes and descriptions of the available data are listed in table below:

Page 4: Isqs6347 team5 proposal_032513

May 15, 2013 [ISQS 6347 – Final Project Report]

4 | Abuelo's

Attribute Name Attribute Description

UID Unique ID representing combination of item number and store ID

Store ID Unique ID assigned to each store

Item Number Number assigned to an item

Minor Category Category of item

Product Description Description of item

Quantity Quantity sold for each item in different stores

Avg Unit Price Average unit price of an item

Avg Unit Cost Average unit cost of an item

Guest Count Sum of customer visits in one stores in a particular week

Week IND Number assigned to each week in one year

Number Item Number Number assigned to an item

Table 1 : Attributes & their Description

Note: The dataset has approximately 1,827,700 rows and has minimal missing values.

Data Quality and Preparation The dataset used comes from previous student project; therefore, many data preparation tasks have been done and the dataset has already been transformed into SAS file format. However, after exploring the dataset, we observed some issues that may require further considerations and adjustments before the data analysis and mining stage:

» UID is not a unique identifier, and it has no value for 2406 records.

Page 5: Isqs6347 team5 proposal_032513

May 15, 2013 [ISQS 6347 – Final Project Report]

5 | Abuelo's

Figure 1: Non-Unique UID and Its Number of Missing Values (via SAS Enterprise Guide)

» Purpose of Num_Item_Number attribute is unclear – the value contained is same to that of Item_Number, but their data types are different. In addition, Num_Item_Number has 187 missing value (but there are no missing value in Item_Number).

Page 6: Isqs6347 team5 proposal_032513

May 15, 2013 [ISQS 6347 – Final Project Report]

6 | Abuelo's

» Output Result from SAS Enterprise Miner

Figure 2: Output Result from SAS EnterpriseMiner

» Unclear variable values: Some Avg_Unit_Price contain 0, indicating the price of item is $0. Some Avg_Unit_Cost contain 0 and negative value.

» There are 28 Item_Number having duplicate values but with different

Product_Description. Table of Items that Have Same Number but Different Description (Show First Two)

Item_Number Minor_Category Product_Description

101090 Sub Cooked Taco Meat BF 2.5 oz - Sub

101090 Sub Cooked Taco Meat CK 2.5 oz - Sub

12067 Margaritas Patron Shaken Margarita

12067 Margaritas Shaken Margarita

Table 2

Page 7: Isqs6347 team5 proposal_032513

May 15, 2013 [ISQS 6347 – Final Project Report]

7 | Abuelo's

Data Exploration & Preprocessing The tasks of preliminary data mining include data preparation, data exploration, data model selection, and discussion of primitive findings. By performing preliminary data mining, we are able to examine data quality and observe the issues such as missing data and duplicated or erroneous data. The appropriate data methodologies are chosen and applied based on nature of dataset and objective of project – to analyze the effect of valued items on the total profit return. .

Data Preparation The dataset, available in SAS file format, contains data and information as shown in Table 1.

Attribute Name Attribute Description

UID Unique ID representing combination of item number and store ID

StoreID Unique ID assigned to each store

ItemNumber Number assigned to an item

MinorCategory Category of item

ProductDescription Description of item

Quantity Quantity sold for each item in different stores

AvgUnitPrice Average unit price of an item

AvgUnitCost Average unit cost of an item

GuestCount Sum of customer visits in one stores in a particular week

WeekIND Number assigned to each week in one year

NumItemNumber Number assigned to an item

Table 1: Initial Data from Dataset

Because the dataset is already cleansed and is well prepared, at this stage we focused on data exploration and examination. We found several issues that may affect the analysis of project. Four major issues observed are listed as followed:

UID is not a unique identifier, and 2406 of the records have no value (see Figure 1). Purpose of NumItemNumber attribute is unclear – the value contained is same to

that of ItemNumber, but their data types are different. In addition, NumItemNumber

has 187 missing value (see Figure 2).

Unclear variable values: o Some AvgUnitPrice contain 0, indicating the price of item is $0. o Some AvgUnitCost contain 0 and negative value.

There are 28 ItemNumbers having duplicate values but with different ProductDescription (see Table 2)

Page 8: Isqs6347 team5 proposal_032513

May 15, 2013 [ISQS 6347 – Final Project Report]

8 | Abuelo's

Figure 3: Non-Unique UID and Its Number of Missing Values (via SAS Enterprise Guide)

Figure 4: Output Result from SAS Enterprise Miner

Page 9: Isqs6347 team5 proposal_032513

May 15, 2013 [ISQS 6347 – Final Project Report]

9 | Abuelo's

Table 2: Items that Have Same Number but Different Description (Show First Two)

Preprocessing Tasks The objective of this project is to determine whether the valued item has created any effects on the profit generated. Therefore, we decided to add additional data attributes,Profit, Valued_Item_Flag, and New_Item_Flag, to represent sales profit, valued menu item, and new menu item, respectively, by combining the information of menu items. One thing needs to be noted for the newly added attributes is that majority of data are missing for the new item flag and valued item flag. The reason is because not all stores of Abuelo’s participated in this research of new valued menu. Therefore, which data should be chosen for our project analysis is a very important concern. Figure 3 below is the screenshot of modified dataset, All_Profit_Flag. Table 3 lists the three newly added attributes in dataset.

Figure 3: Table of Modified Dataset All_Profit_Flag

Item_Number Minor_Category Product_Description

101090 Sub Cooked Taco Meat BF 2.5 oz - Sub

101090 Sub Cooked Taco Meat CK 2.5 oz - Sub

12067 Margaritas Patron Shaken Margarita

12067 Margaritas Shaken Margarita

Page 10: Isqs6347 team5 proposal_032513

May 15, 2013 [ISQS 6347 – Final Project Report]

10 | Abuelo's

Attribute Name Attribute Description

Profit Sales profit of an item at a store during a week

NewItemFlag Flag for indicating the new menu item

ValuedItemFlag Flag for indicating the valued menu item

Table 3: Newly Added Attributes in Dataset

Data Mining Methodologies The data mining models chosen for our project must meet two important criteria: the nature of dataset and the objective of this business analysis project. Since our objective is to determine whether the valued menu item increases sales profit of a store, at this preliminary data mining stage we decided to use a Regression model to analyze the importance of valued item in terms of profits generated. Figure 5 and 6 are variable configuration and design of data process flow. The configuration shown in Figure 5 and 6 are subject to be changed and modified later.

Figure 5: Variable Configuration for Regression

Figure 6: Data Process Flow for Regression

Initially we only included two input variables, New_Item_Flag and Valued_Item_Flag, and one target variable, Profit, for the regression analysis. As we mentioned earlier in report, there are many data missing for the flags of new item and valued item. As a result, the data

Page 11: Isqs6347 team5 proposal_032513

May 15, 2013 [ISQS 6347 – Final Project Report]

11 | Abuelo's

must go through a filtering step to exclude the data rows which have no information about new/valued item flags. Below is the result of Filter. About 90% of observations are excluded after filtering.

Figure 7

Primitive Results and Findings Figure 8 shows the result of Regression node. According to Type 3 Analysis of Effects, if we only analyzed the effects of new item and valued item on the profit, new item seems to have a significant effect on profit (Pr< .0001). On the other hand, the valued item does not have any significant effect on the change of profit. At this preliminary data mining stage, we concluded that regression analysis indicated that the valued item has no significant impact on sales profit.

Page 12: Isqs6347 team5 proposal_032513

May 15, 2013 [ISQS 6347 – Final Project Report]

12 | Abuelo's

Figure 8: Output of Regression Model

Page 13: Isqs6347 team5 proposal_032513

May 15, 2013 [ISQS 6347 – Final Project Report]

13 | Abuelo's

Data Filtration & Addition of New Variables We used Enterprise Guide to filter out the missing data and to add new variables like Profit,

New_Item_Flag&Valued_Item_Flag. Then we exported this refined dataset to use it in

Enterprise Miner.

Figure 9: Enterprise Guide showing newly introduce variables

Refined Data’s Exploration After filtering & adding “Profit” column in the existing dataset using Enterprise Guide, we used that dataset for further analysis. Figure 9 shows the variable settings for this dataset.

Page 14: Isqs6347 team5 proposal_032513

May 15, 2013 [ISQS 6347 – Final Project Report]

14 | Abuelo's

Figure 10

In Explore Window, Actions -> Plot, use 3D bar charts which will show dialog in Figure 10 & Figure 11 shows the same dialog enlarged.

Figure 11

Page 15: Isqs6347 team5 proposal_032513

May 15, 2013 [ISQS 6347 – Final Project Report]

15 | Abuelo's

Figure 12

Figure 12 shows 3D Bar Chart Plot with Profit as Response, year as Series &Valued_Item_Flag as Category.

Page 16: Isqs6347 team5 proposal_032513

May 15, 2013 [ISQS 6347 – Final Project Report]

16 | Abuelo's

Figure 13

Figure 14

Figure 16 shows result of Segment Profile node with the variables settings shown in Figure

15

Page 17: Isqs6347 team5 proposal_032513

May 15, 2013 [ISQS 6347 – Final Project Report]

17 | Abuelo's

Figure 15

Figure 16