become a citizen data scientist

Become a Citizen Data Scientist

Marketing Perspective

©uluumy, 2016

Understand your customer:

Profiling, Segmentation, Targeting and Recommendation using Microsoft Azure ML, SQL, Power BI

©uluumy, 2016 2

Take a look to our course:50% Off


©uluumy, 2016 3

https://www.udemy.com/become-a-citizen-data-scientist-marketing-perspective/?couponCode=EBOOK50

Syllabus

▪ Introduction

▪ Lay the foundation

▪ Explore

▪ Segment

▪ Target

▪ Recommend

©uluumy, 2016 4

Introduction

©uluumy, 2016 5

Citizen Data Scientist

©uluumy, 2016 6

According to a Mckinsey Study, demand for Data

Scientists is projected to exceed supply by more than

50% by 2018.

Source: MCKINSEY, "Big data: The next frontier for innovation, competition, and productivity", 2011

©uluumy, 2016 7

The term Citizen Data Scientist was introduced by Gartner in its 2015 Hype Cycle for Emerging Technologies which we’re going present later in this lecture.Here is the definition given by Gartner :

“A person who creates or generates models that leverage predictive or prescriptive analytics but whose primary job function is outside of the field of statistics”

Source: GARTNER, "Hype Cycle for Emerging Technologies", 2015

©uluumy, 2016 8

Gartner Hype Cycle

Gartner Hype Cycle provides a graphic representation of the maturity and adoption of technologies and applications, and how they are potentially relevant to solving real business problems and exploiting new

opportunities.

©uluumy, 2016 9

Gartner’s Emerging Technologies Hype Cycle contains a representative set of still-maturing technologies that receive interest from clients, and technologies that Gartner feels are significant and should be monitored.“

Gartner is predicting that Citizen Data Scientist and Advanced analytics with Self-service delivery to reach the Plateau of productivity in 2 to 5 years.

©uluumy, 2016 10

©uluumy, 2016 11

Source: GARTNER, "Hype Cycle for Emerging Technologies", 2015

Why do we need a citizen data scientist?

▪ First, the shortage of data scientists (because of the highly specialized skills needed in: computer science, coding, mathematics, machine learning, statistics) .... and still they need to have in-depth knowledge of the business

▪ Second, the rise of self-service data preparation

▪ Third, the development of advanced analytics platforms (Microsoft Machine Learning, IBM Watson, ...)

©uluumy, 2016 12

The need is so, that according to Gartner, by 2017, the

number of citizen data scientists will grow 5 times faster

than the number of highly skilled data scientists.

©uluumy, 2016 13

Who could be a Citizen Data Scientist?

In most companies, they’re already there. Here who they are. They Have:

▪ Solid business domain knowledge (marketing, finance, sales, operations, ...),

▪ Analytical mindset,

▪ The willingness to learn new methods and use new tools

©uluumy, 2016 14

Benefits

The rise of Citizen Data Scientist is a great opportunity for every organization. Because business people will bring with them:

▪ Contextual knowledge

▪ the Democratization of analytics in every department

©uluumy, 2016 15

Lay the Foundations: Definitions

Data Science is not equal to Big Data…

©uluumy, 2016 16

“You don’t have to have a petabyte of data and the

expenses that come along with it in order to predict the

interest of your customer base”

Source: John W. Foreman : Data Smart: Using Data Science to Transform Information into Insight

©uluumy, 2016 17

Data Science:

“Data Science is the transformation of data using

mathematics and statistics into valuable insight,

decisions, and products”

Source: Introduction to Machine Learning, 2nd Edition, MIT Press

©uluumy, 2016 18

Machine learning:

“The goal of machine learning is to program computers

to use example data or past experience to solve a given

problem.”

Source: Introduction to Machine Learning, 2nd Edition, MIT Press

©uluumy, 2016 19

Machine Learning : 2 (main) categories

1- Supervised Learning: Prediction. You want to predict unknown answers from answers you already have.

2- Unsupervised Learning: Categorization. You want to find unknown answers mostly grouping- directly from data.

©uluumy, 2016 20

Supervised Learning..

Supervised learning can be separated into to general categories of algorithms:

• Classification algorithms: are used to predict categorical responses. As example we can cite:

▪ Credit card fraud detection

▪ Customer likely to churn

▪ Customer targeting

• Regression: used to predict continuous variable.

• Example: Predict the future sales of a product©uluumy, 2016 21

Unsupervised Learning..

Unsupervised Learning: Categorization. You want to find unknown answers mostly grouping- directly from data.

• Customer segmentation

• Recommendation system

©uluumy, 2016 22

Data Science Process

According to a KDNuggets poll, 43% of the advanced analytics projects use the CRISP-DM methodology

©uluumy, 2016 23

CRIS-DM Methodology

Source Wikipedia©uluumy, 2016 24Source : Wikipedia

The process is composed of 5 steps

The key point to note is that the Process is circular rather

than linear. It means that we can and should go back and

forth between the steps.

Source: CRISP-DM : Cross Industry Standard Process for Data Mininghttp://spss.ch/upload/1107356429_CrispDM1.0.pdf

©uluumy, 2016 25

http://spss.ch/upload/1107356429_CrispDM1.0.pdf

1- Business Understanding

The first step is BUSINESS UDERSTANDING. It’s the most critical step of the process. You need to frame the problem. At the end of this stage you should have a deep understanding of the problem you want to resolve and a clear idea about the data you will use

©uluumy, 2016 26

2- Data Understanding

The second step is DATA UNDERSTANDING. Your business knowledge will help you to contextualize your data. You notice that the steps Business understanding and Data Understanding are linked together with a double arrow.

©uluumy, 2016 27

3- Data Preparation

The third step is DATA PREPARATION. In this stage you will check for the common issues like missing values and outliers. Also doing operating like filtering merging and transformationAlso you run some data exploration using graphics and tables.

©uluumy, 2016 28

4- Modeling

The fourth step is the MODELING step. In this stage you build your model (for example a regression or a classification). You notice that this step is linked to the previous one with a double arrow which mean that you will often need to step back to Data Preparation.

©uluumy, 2016 29

5- Evaluation

The next step is EVALUATION. Every model you build has to be evaluated in term of accuracy, robustness and deployability. You notice that at this stage you may have to step back to the Business Understanding stage if the model you have built could not be deployed

©uluumy, 2016 30

6- Deployment

The last step is DEPLOYMENT. The final purpose of any data science project is to give actionable insight.

©uluumy, 2016 31

Data Science Toolbox

©uluumy, 2016 32

©uluumy, 2016 33

•Data preparation : SQL. a must know tool• 60% of data scientists said they spent the most time

cleaning and organizing data *• SQL : first among the top 10 in-demand skills for data

scientists*

•Data visualization : Power BI and Excel.•Data analysis : Azure Machine Learning

• 55% of data scientists think that Machine Learning had significant importance for their companies *

* source: CrowdFlower, 2016 Data Science Report

©uluumy, 2016 34

CRIS-DM & Data Science Toolbox

Source Wikipedia©uluumy, 2016 35

Data Preparation

The first tool is SQL.According to the CrowdFlower 2016 Data Science Report, 60% of data scientists said they spent the most time cleaning and organizing data.It’s THE language of database.You will have to use SQL in order to process the data preparation step.SQL is a must KNOW tool.

©uluumy, 2016 36

Data Visualization & Data Analysis

• Data VisualizationFor Data Visualization, we will use Power BI

• Data AnalysisMicrosoft Azure ML is one of the most relevant tool to use for citizen data scientist because of its ability to quickly create machine learning experiments and because its slight learning curve.

©uluumy, 2016 37

Microsoft Azure Machine Learning

« While machine learning has been around for a long time, usage was primarily restricted to people with deep skills and deep pockets. The cloud changes this dynamic completely »

Joseph Sirosh Corporate Vice President, Data Group at Microsoft

©uluumy, 2016 38

Azure Machine Learning Workflow

At high level we can divide it into 3 blocks: Data, Machine Learning Services, and Visualisation

You can clearly see how this workflow ca be embedded into the CRISP Data science we have presented in a previous lecture

©uluumy, 2016 39

Source: Microsoft

Marketing Framework Analysis

About 90% of the data collected by companies

today are related to customer actions and

marketing activities

©uluumy, 2016 40

“Marketing thinking is shifting from trying to maximize the company’s profit from each transaction to maximize the long-run profit from each relationship”, To rephrase it, Companies has shifted from being product-centric to CUSTOMER centric”.

Philip Kotler

©uluumy, 2016 41

That’s why it’s of vital importance to know as much as possible about our customer and to customize as much as possible our offer to each.

Despite the huge amount of data, we now have on each of our customer (from CRM, Web Site, Social Media) and the complexity to have a 360 view of the customer; still, we can frame the relationship with our customer with these 4 questions:

©uluumy, 2016 42

1- Who are my customers?

2- How to reach and interact with them?

3- Which customer should I target?

4- What is the best next-offer?

These 4 questions lead as to this 4 blocks Marketing Framework Analysis:

©uluumy, 2016 43

©uluumy, 2016 44

Philip Kotler, the founder of Modern Marketing said “Marketing’s future lies in database marketing where we know enough about each customer to make relevant and customized offers to each”

Explore

©uluumy, 2016 45

AdventureWorks is a sample database created for use in demos and training on each version of Microsoft SQL Server. A company which manufactures and sells metal and composite bicycles to North American, European and Asian.two categories of customers:B2B : team of salesB2C : E-commerce

Case Study : Adventure Works

©uluumy, 2016 46

According to the CrowdFlower 2016 Data Science Report, SQL is first among the top 10 in-demand skills for data scientists.

SQL Basics

©uluumy, 2016 47

SQL stands for Structured Query LanguageSQL is the language of databases : creation, access, manipulationRelational Database : a software to offer access to stored information and their manipulation.Information are stored in tables

SQL Basics

©uluumy, 2016 48

Tables : A set of data arranged in columns and rows. The columns represent characteristics of stored data and the rows represent actual data entries. Tables for Database is what's spreadsheet for Excel.SQL Server Express (a free "lite" version of SQL Server)

SQL Basics

©uluumy, 2016 49

Table Relationship:

On fundamental concept of database is the tables relationship.Let’s take an example from our database AdventureWorksWe took 2 tables: Product and Sales. Each table must have one Primary Key.

SQL Basics

©uluumy, 2016 50

A primary key is a field in a table which uniquely identifies each row in the table. That means that Primary keys must contain unique values. In our example: ProductKey is the primary Key of the table DimProduct.

SQL Basics

©uluumy, 2016 51

A FOREIGN KEY in one table is a column which points to a PRIMARY KEY in another table.Let’s look to our example. ProducKey is a foreign key column of the table FactInternetSales which refer to the primary key of the table DimProduct. It allows to identify the relationship between the two tables.

SQL Basics

©uluumy, 2016 52

©uluumy, 2016 53

SQL main operations

SQL contains 4 main operations. We can• Insert of new data into a table • SELECT data from a table• Update data already existing in a table• Delete data from table

©uluumy, 2016 54

SQL main operations

The insert, update and delete operation are usually restricted to the Database administrator. As a Citizen Data Scientist you will essentially need to select data from the database.

Let’s look how to select data from a table...

©uluumy, 2016 55

SQL main operations

Selection:

Here is the general syntax of a data selection

SELECT <Column List> FROM <Table Name>WHERE <Search Condition>

©uluumy, 2016 56

SQL main operations

Aggregation

How to group data and use aggregates...

SELECT <Column List>, <Aggregate Function> (<Column Name>) FROM <Table Name>WHERE <Search Condition>GROUP BY <Column List>

©uluumy, 2016 57

SQL main operations

Selection from 2 tables

How to select data from more than one table...

SELECT <Column List> FROM <Table1>JOIN <Table2> ON <Table1>. <Column1> = <Table2>. <Column1>

©uluumy, 2016 58

Customer Dashboard using Power BI

©uluumy, 2016 59

Segment

“Customer segmentation is the process of diving a customer into groups of individuals who are similar in specific ways relevant to marketing”

Source “A Marketer’s Guide to Analytics”, SAS

©uluumy, 2016 60

Segmentation: Types

“The literature about types of segmentation is very diverse.The best I could find is the one given in the SS paper “A Marketer’s Guide to Analytics” It distinguishes between two main types of segmentation:• Foundation segmentation• Targeting segmentation

©uluumy, 2016 61

Segmentation: Types

Foundation segmentation: Core segmentsIt has these proprieties:• All customers are included• Each customer falls into only one segment• Each segment can be subdivided into clusters• Attributes: value, profit, attrition, risk, demographics,

firmographics, etc.

©uluumy, 2016 62

Segmentation: Types

The second type is Targeting Segmentation It identifies customers with specific needs and preferences. Useful for specific marketing programs and campaigns identifies customers with specific needs and preferences.It has these features:• Not all customers can be included• Each customer may fall into many different segments

©uluumy, 2016 63

Good segmentation

A good segmentation must have these three features: • Relevant to the business objective• Simple: understandable and easy to characterize• Actionable

©uluumy, 2016 64

Managerial Segmentation

RFM method

We will use a very simple yet insightful method to build a customer segmentation which is relevant, simple and most importantly actionableRFM method has been around for decade. Yet it’s is still very useful

©uluumy, 2016 65

RFM in a nutshell

RFM is an acronym for Recency, Frequency and Monetary

• Recency: number of days since last purchase/Use/visit• Frequency: number of purchase/use/visit• M: Amount of purchase / time spent

©uluumy, 2016 66

RFM in a nutshell

Based on each of these 3 factors, all the customers are ranked and given a score from 1 to 4 (depending on which quartile they are). 1 being the best score.Now for each customer we have a composite score R-F-MAs each factor could have 4 different values (1,2,3, or 4)We can in theory divide our customer into until 64 segments!!

©uluumy, 2016 67

RFM in a nutshell

It’s a good first step …but we cannot stop here because we want to have simple but ACTIONABLE segmentationThat’s why we have used the term Managerial SegmentationAs a managerial decision we can decide that we need to have let’s say 9 different segments based on the RFM score we have already computed

©uluumy, 2016 68

RFM: 9 segments

Here the description of each of the 9 segments:

Best: R (1) AND F (1) AND M (1): it’s simple they have the highest score...Novice: R (1) AND F (3-4)Active High Value: R (1) AND M (1,2)Active: R (1)

©uluumy, 2016 69

RFM: 9 segments

Warm High Value: R (2) AND M (1)Warm: R (2)Win-back: R (3,4) AND {F (1) OR M (1)}Cold: R (3)Almost lost: R (4)

©uluumy, 2016 70

RFM: actions

Now that we have our customer segmentation.What Action can we take based on this segmentationHere are some ideas/Examples:

• Best Customer: • “Thank you” gift• “Exclusive preview” of new service/product

©uluumy, 2016 71

RFM: actions

• Novice:• Connection on social media• Personal greeting message• Free shipping

• Warm High-Value: • Next best offer “Get $50 in “ZZZZ” Dollars for every

$50 you spend”• Almost Lost

• “Last chance” special offer©uluumy, 2016 72

Power BI TreeMap visualization of the 9 resulting segments

©uluumy, 2016 73

Target

©uluumy, 2016 74

Classification model : basics

Here is a basic data flow for any classification model

Data training is the input of the classification algorithm. The purpose is to “train” the algorithm with historical data which contain the labels (target) variable.

For example, say we want to create a model to predict which customer is likely to respond positively to specific marketing campaign.

Training data contains a list of customers who were targeted in the past for the same kind of campaign. The labels variable is a Yes/NO variable

Source: http://www.cs.princeton.edu/~schapire/talks/picasso-minicourse.pdf©uluumy, 2016 75

Classification Model : Evaluation

To evaluate and chose which model is the best fitted for our problem we can use several measures. Here are the most widely used: • Accuracy: the proportion of the total number of

predictions that were correct.• Positive Predictive Value or Precision: the proportion

of positive cases that were correctly identified.

©uluumy, 2016 76

Classification Model : Evaluation

• Negative Predictive Value: the proportion of negative cases that were correctly identified.

• Sensitivity or Recall: the proportion of actual positive cases which are correctly identified.

• Specificity: the proportion of actual negative cases which are correctly identified.

• ROC curve: It is created by plotting the recall against the false positive rate

©uluumy, 2016 77

Confusion Matrix

Source: http://www.analyticsvidhya.com/blog/2016/02/7-important-model-evaluation-error-metrics/

©uluumy, 2016 78

Classification Fundamental concept : Bias-Variance Tradeoff

Google's Research Director Peter Norvig claimed that

"We don’t have better algorithms. We just have moredata."

©uluumy, 2016 79

Prediction Error

▪ You can never have a prediction model without error.

▪ Without going further with the maths behind it, predictionerror is mainly divided into 2 elements: Bias and Variance.

▪ Error due to Bias is the difference between the predictedvalue and the correct value.

▪ Error due to Variance is defined as the variability of a modelprediction for a given data point.

▪ As “a picture's worth a thousand words”, let’s look to thisgraphic taken from one of the best article I found on thissubject “Understanding the Bias-Variance Trade-off”.

©uluumy, 2016 80

Bias vs Variance

Bulls-eye represents the graphical visualization of bias and Variance. Each point is the result of one iteration of the model building.

The center of the target is a model that predicts perfectly the actual values.

Source : scott.fortmann-roe.com/docs/BiasVariance.html©uluumy, 2016 81

Bias vs Variance

We have mainly four cases:

▪ Low Bias and Low Variance: That’s wherewe want to be! We have here a goodmodel

▪ High Bias and Low Variance: that’s whatwe call an under-fitted model. It meansthat our model lacks some information.It’s too simple Maybe we have to addvariables to our training data. Alsoevaluating models using other methodscould be a good option too.

©uluumy, 2016 82

Bias vs Variance

We have mainly four cases:

▪ Low Bias and High Variance: We have anOver-Fitted Model. It means that the modelis too complicated for the data we have. Putsimply, the model cannot be generalized. Thesolution is to add more data into our trainingset and/or to reduce the number of features(the complexity), we use Ensemble methodlike random forest, bagging and boosting

▪ High Bias and High Variance: we still need towork on our model. My suggestion is totackle first the Bias error by using othermethods and adding variables if you can

©uluumy, 2016 83

Bias –Variance TradeOff

Here is another way to sum-up the bias-variance trade-off:

Prediction Error is plotted against Model complexity twice: the green line is the result using the training data. The red line is the result using the test Data

Source: Hastie, Tibshirani, Friedman “Elements of Statistical Learning” 2001©uluumy, 2016 84

Under-Fitted Model

When the model is too simple (lowcomplexity):

▪ The gap between the two plots is narrow.That’s an indication for low variance

▪ The prediction error is high for trainingand test data. It means a High-Bias

▪ Hence we have an under-fitted model

©uluumy, 2016 85

Over-Fitted Model

▪ Higher the complexity is, higher the gapis between the two plots

▪ When The prediction error between thetraining and the test data become toowide.

▪ It means that the model reached theover-fitting mode

©uluumy, 2016 86

Overview diagram of Azure Machine Learning Studio

Microsoft Azure Machine Learning Studio is a drag-and-drop cloud-based service you can use to build, test, and deploy predictive analytics solutions on your data. Machine Learning Studio publishes models as web services that can easily be consumed by custom apps or BI tools such as Excel or Power BI.This Figure (source) summarizes the basic high-level steps that are required to create, test, and deploy a new Azure Machine Learning prediction model

©uluumy, 2016 87

Source : https://azure.microsoft.com/en-us/documentation/articles/machine-learning-studio-overview-diagram/

©uluumy, 2016 88

Recommend

Recommendation systems are a subclass of information filtering system that seek to predict the 'rating' or 'preference' that a user would give to an item.

Source : Wikipedia

©uluumy, 2016 89

Two primary methodologies

• Collaborative Filtering : the item recommended to the user is based on the past purchase and preference of similar users

• Content-based filtering : Based on the attributes of items purchased by the user, suggest items with similar properties.

Best examples : Amazon, Netflix

©uluumy, 2016 90

Example

©uluumy, 2016 91

Overview diagram of Azure Machine Learning Studio

Let’s look at how these two methods works using this very simple example of movies ratingHere is the example: we have a list of 6 movies (items) and 7 users. Each has rated the movies that she watched (from 1 to 5 stars). Daniel has not seen the movie "The Notebook"We want to decide if we will recommend this movie to him or not based on a prediction of his rating for the movie.So let’s start with a collaborating filtering approach

©uluumy, 2016 92

Collaborative Filtering

Daniel has not seen the movie "The Notebook"• We select the subgroup of users who watched the same movies as Daniel

and also who watched "The Notebook".• Among this group, we select the users who are "similar" to Daniel in term

of rating (for example using KNN algorithm). • We compute the average rating that Daniel's "neighbors" gave to "The

Notebook". • It gives as the predicted rating of Daniel for the movie "The Notebook"• We repeat the steps 1 to for 4 for all movies that Daniel haven't seen• We recommend Daniel the best predicted rated movies.

©uluumy, 2016 93

Content-based filtering:

We want to predict the Daniel rating for the movie "The Notebook"using the similarity between items (in our example movies), and not users, to make predictions• We select the movies that are similar to "The Notebook". Based on the

genre we can divide movies into to groups "Action" (Skyfall, Star War, X-Men) and "Romance" ("P. S I Love You", "Titanic", "The Notebook")

• Daniel have rated "P. S I Love You" and "Titanic" which are similar to "The Notebook". Based on his rating of these two movies, we give a predicted rating of "The Notebook"

• We repeat the steps 1 and 2 for all movies that Daniel haven't seen• We recommend Daniel the best predicted rated movies.

©uluumy, 2016 94

Labs

©uluumy, 2016 95

Case Study : Adventure Works

▪ AdventureWorks is a sample database created for use in demos and training on each version of Microsoft SQL Server.

▪ A company which manufactures and sells metal and composite bicycles to North American, European and Asian.

▪ two categories of customers:

▪ B2B : team of sales

▪ B2C : E-commerce©uluumy, 2016 96

Setup the Lab Environment : Tools

• Office 2016:Free Trial: products.office.com/en-us/try

• Power BIFree : powerbi.microsoft.com/en-us/desktop/

• Install SQL Server 2014 ExpressFree : microsoft.com/en-us/server-cloud/Products/sql-server-editions/sql-server-express.aspx

• MICROSOFT AZURE MLFree : studio.azureml.net/

©uluumy, 2016 97

https://products.office.com/en-us/try

https://powerbi.microsoft.com/en-us/desktop/

https://www.microsoft.com/en-us/server-cloud/Products/sql-server-editions/sql-server-express.aspx

https://studio.azureml.net/

SQL Server 2014 Expressstep 1

microsoft.com/en-us/server-cloud/Products/sql-server-editions/sql-server-express.aspx

©uluumy, 2016 98

https://www.microsoft.com/en-us/server-cloud/Products/sql-server-editions/sql-server-express.aspx

SQL Server 2014 ExpressStep 2

©uluumy, 2016 99

SQL Server 2014 ExpressStep 3 : Sign in

©uluumy, 2016 100

SQL Server 2014 ExpressStep 4a:

1- Chose SQL Server 2014 Express 64 Bit.

2- Choose your language

3- Scroll down click continue

©uluumy, 2016 101

SQL Server 2014 ExpressStep 4b:

1- Chose SQL Server Management Studio Express 64 Bit.

2- Choose your language

3- Scroll down click continue

©uluumy, 2016 102

SQL Server 2014 ExpressStep 5

When the files are downloaded : Execute

Choose the first option as shown here

©uluumy, 2016 103

SQL Server 2014 Express

Finally… you should have Microsoft SQL Server management Studio installed

©uluumy, 2016 104

Microsoft Azure Machine Learning

• Microsoft Azure ML is could-based service. So you don’t have to install anything. All you need is to have a Microsoft account ID

• Here is the address: studio.azureml.net/

©uluumy, 2016 105

https://studio.azureml.net/

DATA

©uluumy, 2016 106

Adventure Works 2014 Warehouse

Download the database : Adventure Works 2014 Warehouse

(Adventure Works 2014 Warehouse Script.zip)

from this address (the official Microsoft examples):

msftdbprodsamples.codeplex.com/releases

©uluumy, 2016 107

http://msftdbprodsamples.codeplex.com/releases

How to install the Database AdventureWorks in SQL Server Management Studio

©uluumy, 2016 108

Step 1 :

Open Microsoft SQL Server Management Studio

Server Name : YourLocalHost\SQLExpressOn my Laptop: ULUUMY\SQLExpress

©uluumy, 2016 109

Step 2 :

If you have this Error message…

©uluumy, 2016 110

Step 2 :

Open SQL Server 2014 Configuration Manager

©uluumy, 2016 111

Step 2 :

Then, Start SQL Server Service

©uluumy, 2016 112

Step 3 :

Open the file instawdbdw.sql

(from the Adventure Works 2014 Warehouse Script you have already downloaded)

©uluumy, 2016 113

Step 4 :

Put Management Studio into SQLCMD mode

Tools > Options > Query Execution and selecting By default, open new queries in SQLCMD mode

©uluumy, 2016 114

Step 5 :

Change it to the path of the Adventure Works database you have already downloaded

©uluumy, 2016 115

Step 6 :

Execute

©uluumy, 2016 116

Final result

©uluumy, 2016 117

Lab 1 : Data preparation using SQL

▪ The first lab is kind of the foundation for the following labs.

▪ We will use SQL to extract information about our customers:

▪ First we will create two tables:– Customers: socioeconomics and geographic data like, gender,

income, education, number of children, postal code and province.

– Sales : All the internet orders made by customers including quantity, amount, date and products features such as model, category and subcategory.

©uluumy, 2016 118

▪ These tasks should already have been done:

▪ Install SQL Server Express 2014.

▪ Download the database : Adventure Works 2014 Warehouse.

▪ Install the database in SQL Server Management Studio.

▪ You can download the SQL code here (Explore.sql)

©uluumy, 2016 119

https://github.com/uluumy/Become-a-Citizen-Data-Scientist/tree/master/SQL Scripts

Lab 2 : Customer Dashboard using Power BI

▪ You should already have installed Power BI.

▪ If not please go back to Lay the Foundation Section, Setup the Lab Environment Lecture.

▪ Also you should have finished lab 1 before, because you will need the two tables that we have created during that lab.

▪ If not, please, go back to lab 1.

▪ We will build a customer dashboard using the two tables we have already created during Lab1 : Customers and Sales.

▪ Here is an example:

©uluumy, 2016 120

©uluumy, 2016 121

Lab 3 : Managerial Segmentation using SQL

▪ In this lab we’re going to segment our customer based on their purchases.

▪ We will use the method we have presented in the previous lecture (managerial Segmentation).

▪ At the end we will divide our customers into 9 segments.

▪ Marketing team will be able to tailor a specific strategy for each segment.

©uluumy, 2016 122

▪ You should have finished lab 1 before, because you will need the two tables we have created during that lab.

▪ If not, please, go back to lab 1.

▪ Here is a Power BI TreeMap visualization of the 9 resulting segments.

©uluumy, 2016 123

©uluumy, 2016 124

You can downloand from GitHub the SQL script

ManagerialSegmentation.sql which implement the solution.

©uluumy, 2016 125


Lab 4 : Bike Buyers targeting using Azure ML

▪ During lab 2 we have noticed that only 50% of our customers have already bought a bike from Adventure Works.

▪ The management team, decided the by the end of the fiscal year, the company should have increased this percentage by 5%.

▪ In order to achieve this objective the marketing team will launch a telemarketing campaign to reach those among our customers who have never bought a bike.

©uluumy, 2016 126

▪ As it's not possible to contact all of them (around 9,000 customers) due to time and budget constraints, the marketing team wants to target only the most likely among them to be interested in our offer.

▪ So the VP Marketing asked, you , as the team citizen data scientist, to build a model in order to achieve this goal.

©uluumy, 2016 127

Our Challenge:

Target customers who have never purchased a bike (from

AdventureWorks) and who are the most likely to be

interested in buying one.

©uluumy, 2016 128

First.. let's get the dataset

You have two options:

▪ Create the table in SQL Server using the script BikeBuyerTargeting.sql (to download here ) and then export it to a csv file.

▪ Or download here the dataset BikeBuyerTargeting.csv.

▪ Now let's go to Azure ML Studio.

©uluumy, 2016 129


https://github.com/uluumy/Become-a-Citizen-Data-Scientist/tree/master/CSV Files

Part 1 : Predictive ModelFinal Experiment

©uluumy, 2016 130

Step 1 : studio.azureml.net/Sign In

©uluumy, 2016 131

Step 2 : DATASETSNew

©uluumy, 2016 132

Step 2 :DATASETfrom local file

©uluumy, 2016 133

Step 2 : DATSETUpload the file BikeBuyerTargeting.csv

©uluumy, 2016 134

Step 2 :DATASETthe file BikeBuyerTargeting is uploaded

©uluumy, 2016 135

Step 3 :Create New EXPERIMENT

©uluumy, 2016 136

Step 3 : Blank Experiment

©uluumy, 2016 137

Step 4 : 1- Select BikeBuyerTargeting dataset2- Drag and drop the dataset

©uluumy, 2016 138

Step 4 :Insert the module Select Columns in Dataset(Data Transformation / Manipulation)

©uluumy, 2016 139

Step 4 :

1-On the Properties lane of module “Select Columns in Dataset” Click Launch Column

2- Remove these columns as shown here

3-Click Check button

12

3

©uluumy, 2016 142

Step 5 : Search for Split Data (Data Transformation / Sample and Split)1- Drag and drop it2-Connect it to “Select Columns in DataSet3- In Properties, change Fraction of Rows to 0.7

©uluumy, 2016 143

Step 6 :Add the module Two-Class Boosted Decision Tree (Machine Learning / Initialize Model /Classification)

©uluumy, 2016 144

Step 6 :

3-Connect it to Two-Class Boosted Decision Tree (the connector on the left)

4-Connect it to Split Model (the connector on the right)©uluumy, 2016 146

Step 6 :

Train Model Proprieties1- From the Proprieties Pane, click Launch column selector

2- Select the label Variable BikeBuyer

1 2

©uluumy, 2016 147

Step 7 :

3-Connect it to Train Model (the connector on the left)

4-Connect it to Split Model (the connector on the right)

©uluumy, 2016 149

Step 9 :Let’s visualize the result2 variables have been added : Scored Labels and Scored Probabilities

©uluumy, 2016 151

Step 1Here is the result… 3 modules have been added by Azure ML1-Web Service Input2-Web Service Output3- Bike Buyer Targeting (trained model)

©uluumy, 2016 157

Step 2

1- Delete the connection between “Web Service Input” and the module “Select Column..”2-Connect “Web Service Input” to “Score Model”

1

2

©uluumy, 2016 158

Step 3

1- Select the module Select Column..” 2- Launch column selector 3-Add “BikeBuyer” to the excluded variables

1

23

©uluumy, 2016 159

Lab 5 : Next best offer using Azure ML

▪ Following the brilliant success of the previous targeted campaign, the VP marketing has asked you to work on a way to improve our customers’ retention and loyalty.

▪ In fact 50 % of our customers belong to the 3 segments "Win-Back", "Cold" and "Almost Lost". Customers in these segments have not bought a product for at least 1 year... That's something to be addressed.

©uluumy, 2016 165

Your challenge :

Build a recommendation system which suggests to each of our 18,484 customers, 3 items that she/he could be interested in.

©uluumy, 2016 166

First.. let's get the dataset

You have two options:

▪ Create the tables in SQL Server using the script Recommend.sql (to download here) and then export them to csv files.

▪ Or download here the dataset Rating.csv, User.csv and Item.csv

▪ Now let's go to Azure ML Studio

©uluumy, 2016 167


https://github.com/uluumy/Become-a-Citizen-Data-Scientist/tree/master/CSV Files

Step 2 :

DATASET from local file Upload the files RatingRecommendation.csv, UserRecommendation.csv, ItemRecommendation.csv

©uluumy, 2016 171

Step 4 :

Insert the module Edit Metatdata(Data Transformation / Manipulation)Connect it to the dataset

©uluumy, 2016 175

Step 4 :

1-On the Properties lane of module “Edit Metadata” Click Launch Column2- Include the variable “ImpliciteRating”3-Click Check button

1

2

3

©uluumy, 2016 176

Step 5 : Search for Split Data (Data Transformation / Sample and Split)1- Drag and drop it2-Connect it to “Edit Metadata”

©uluumy, 2016 178

Step 6 :

1-Add the module Train Matchbox Recommender (Machine Learning / Train)2-Connect it to Split Data (the left connector)3-Change Number of Recommendation to 3

©uluumy, 2016 179

Step 8 :Add the module “Remove Duplicate Rows “ Connect it to UserRecommendation datasetand Launch column selector

©uluumy, 2016 181

Step 9 :

Add the dataset ItemRecommendation.csvThen connect it to Train Matchbox (the right connector)

©uluumy, 2016 184

Step 10 :

1 Select Score Matchbox Recommendation 2- Drag and drop it2-Connect it from left to right to “Train Matchbox”, “Split Data”, “Remove Duplicate” and “ItemRecommendation”

©uluumy, 2016 185

Part 2 : Web Service

7 O'clock Shadow by Christopher (CC BY-SA©uluumy, 2016 187

Step 1 :

Click on Set Up Web Service and choose Predictive Web Service

Step 1

Here is the result… 3 modules have been added by Azure ML1-Web Service Input2-Web Service Output3- Lab 5 Recommendation System

©uluumy, 2016 188

Step 2

We want to keep only the variable CustomerKey from the module “Edit Metadata”Add “Select Columns in Dataset”Connect it with “Edit Metadat” and “Score Matchbox”Launch column selector and include CustomerKey

©uluumy, 2016 189

Step 2

1- Delete the connection between “Web Service Input” and the module “Edit Metadata”2-Connect “Web Service Input” to “Score Matchbox”

©uluumy, 2016 190

Resources

▪ SQL– Querying with Transact-SQLmva.microsoft.com/en-US/training-courses/querying-with-transactsql-10530?l=TjT07f87_9804984382

▪ MICROSOFT AZURE ML– Microsoft Azure Essentials: Fundamentals of Azure:

https://mva.microsoft.com/ebooks

– Building Recommendation Systems in Azure:

mva.microsoft.com/en-US/training-courses/building-recommendation-systems-in-azure-13765?l=j6AfbmlXB_4505513172

(was a very helpful resource to find the good data example to use throughout

the course and also to create the last lab)

©uluumy, 2016 194

https://mva.microsoft.com/en-US/training-courses/querying-with-transactsql-10530?l=TjT07f87_9804984382

https://mva.microsoft.com/ebooks

https://mva.microsoft.com/en-US/training-courses/building-recommendation-systems-in-azure-13765?l=j6AfbmlXB_4505513172

▪ POWER BI– Documentation

https://powerbi.microsoft.com/en-us/documentation/powerbi-service-get-started

▪ Book– Data Mining Techniques: For Marketing, Sales, and Customer Relationship

Management,Gordon S. Linoff (Author), Michael J. A. Berry (Author)

▪ Blog– http://www.kdnuggets.com/

©uluumy, 2016 195

https://powerbi.microsoft.com/en-us/documentation/powerbi-service-get-started

http://www.kdnuggets.com/

Image Credits

Bike Hängärtner CC BY

London Bike Show 2013 by Jon Arm CC BY

bikes in malacca by CC BY-ND

IA030694a simonsimages CC BY

Ines Njers CC BY-ND

The wheels Saku Takakusaki CC BY-ND

Cruisers... by micadew CC BY-SA

Rear end by Craig Sunter CC BY-ND

©uluumy, 2016 196

Image Credits

elite classes-2528 by jim simonson CC BY

corner by eflon CC BY

The oneJesus del Toro Garcia CC BY

'Phantom' bicycle JamesGardinerCollection CC0

Drahtesel mike goehler CC BY-ND

rotating shadow by rippchenmitkraut66 CC BY-ND

©uluumy, 2016 197

Take a look to our course:50% Off


©uluumy, 2016 198

https://www.udemy.com/become-a-citizen-data-scientist-marketing-perspective/?couponCode=EBOOK50

Keep in touch..

Uluumy.com

©uluumy, 2016 199

uluumy.com

https://github.com/uluumy

https://twitter.com/UluumyTweet

https://www.facebook.com/uluumy/

https://www.youtube.com/channel/UCwj4T5ONmv7rmlWoUnXYqVw

become a citizen data scientist

Data & Analytics