become a citizen data scientist

199
Become a Citizen Data Scientist Marketing Perspective ©uluumy, 2016

Upload: mohamed-jendoubi

Post on 20-Feb-2017

153 views

Category:

Data & Analytics


4 download

TRANSCRIPT

Page 1: Become a citizen data scientist

Become a Citizen Data Scientist

Marketing Perspective

©uluumy, 2016

Page 2: Become a citizen data scientist

Understand your customer:

Profiling, Segmentation, Targeting and Recommendation using Microsoft Azure ML, SQL, Power BI

©uluumy, 2016 2

Page 3: Become a citizen data scientist

Take a look to our course:50% Off

Become a Citizen Data Scientist

©uluumy, 2016 3

Page 4: Become a citizen data scientist

Syllabus

▪ Introduction

▪ Lay the foundation

▪ Explore

▪ Segment

▪ Target

▪ Recommend

©uluumy, 2016 4

Page 5: Become a citizen data scientist

Introduction

©uluumy, 2016 5

Page 6: Become a citizen data scientist

Citizen Data Scientist

©uluumy, 2016 6

Page 7: Become a citizen data scientist

According to a Mckinsey Study, demand for Data

Scientists is projected to exceed supply by more than

50% by 2018.

Source: MCKINSEY, "Big data: The next frontier for innovation, competition, and productivity", 2011

©uluumy, 2016 7

Page 8: Become a citizen data scientist

The term Citizen Data Scientist was introduced by Gartner in its 2015 Hype Cycle for Emerging Technologies which we’re going present later in this lecture.Here is the definition given by Gartner :

“A person who creates or generates models that leverage predictive or prescriptive analytics but whose primary job function is outside of the field of statistics”

Source: GARTNER, "Hype Cycle for Emerging Technologies", 2015

©uluumy, 2016 8

Page 9: Become a citizen data scientist

Gartner Hype Cycle

Gartner Hype Cycle provides a graphic representation of the maturity and adoption of technologies and applications, and how they are potentially relevant to solving real business problems and exploiting new

opportunities.

©uluumy, 2016 9

Page 10: Become a citizen data scientist

Gartner’s Emerging Technologies Hype Cycle contains a representative set of still-maturing technologies that receive interest from clients, and technologies that Gartner feels are significant and should be monitored.“

Gartner is predicting that Citizen Data Scientist and Advanced analytics with Self-service delivery to reach the Plateau of productivity in 2 to 5 years.

©uluumy, 2016 10

Page 11: Become a citizen data scientist

©uluumy, 2016 11

Source: GARTNER, "Hype Cycle for Emerging Technologies", 2015

Page 12: Become a citizen data scientist

Why do we need a citizen data scientist?

▪ First, the shortage of data scientists (because of the highly specialized skills needed in: computer science, coding, mathematics, machine learning, statistics) .... and still they need to have in-depth knowledge of the business

▪ Second, the rise of self-service data preparation

▪ Third, the development of advanced analytics platforms (Microsoft Machine Learning, IBM Watson, ...)

©uluumy, 2016 12

Page 13: Become a citizen data scientist

The need is so, that according to Gartner, by 2017, the

number of citizen data scientists will grow 5 times faster

than the number of highly skilled data scientists.

©uluumy, 2016 13

Page 14: Become a citizen data scientist

Who could be a Citizen Data Scientist?

In most companies, they’re already there. Here who they are. They Have:

▪ Solid business domain knowledge (marketing, finance, sales, operations, ...),

▪ Analytical mindset,

▪ The willingness to learn new methods and use new tools

©uluumy, 2016 14

Page 15: Become a citizen data scientist

Benefits

The rise of Citizen Data Scientist is a great opportunity for every organization. Because business people will bring with them:

▪ Contextual knowledge

▪ the Democratization of analytics in every department

©uluumy, 2016 15

Page 16: Become a citizen data scientist

Lay the Foundations: Definitions

Data Science is not equal to Big Data…

©uluumy, 2016 16

Page 17: Become a citizen data scientist

“You don’t have to have a petabyte of data and the

expenses that come along with it in order to predict the

interest of your customer base”

Source: John W. Foreman : Data Smart: Using Data Science to Transform Information into Insight

©uluumy, 2016 17

Page 18: Become a citizen data scientist

Data Science:

“Data Science is the transformation of data using

mathematics and statistics into valuable insight,

decisions, and products”

Source: Introduction to Machine Learning, 2nd Edition, MIT Press

©uluumy, 2016 18

Page 19: Become a citizen data scientist

Machine learning:

“The goal of machine learning is to program computers

to use example data or past experience to solve a given

problem.”

Source: Introduction to Machine Learning, 2nd Edition, MIT Press

©uluumy, 2016 19

Page 20: Become a citizen data scientist

Machine Learning : 2 (main) categories

1- Supervised Learning: Prediction. You want to predict unknown answers from answers you already have.

2- Unsupervised Learning: Categorization. You want to find unknown answers mostly grouping- directly from data.

©uluumy, 2016 20

Page 21: Become a citizen data scientist

Supervised Learning..

Supervised learning can be separated into to general categories of algorithms:

• Classification algorithms: are used to predict categorical responses. As example we can cite:

▪ Credit card fraud detection

▪ Customer likely to churn

▪ Customer targeting

• Regression: used to predict continuous variable.

• Example: Predict the future sales of a product©uluumy, 2016 21

Page 22: Become a citizen data scientist

Unsupervised Learning..

Unsupervised Learning: Categorization. You want to find unknown answers mostly grouping- directly from data.

• Customer segmentation

• Recommendation system

©uluumy, 2016 22

Page 23: Become a citizen data scientist

Data Science Process

According to a KDNuggets poll, 43% of the advanced analytics projects use the CRISP-DM methodology

©uluumy, 2016 23

Page 24: Become a citizen data scientist

CRIS-DM Methodology

Source Wikipedia©uluumy, 2016 24Source : Wikipedia

Page 25: Become a citizen data scientist

The process is composed of 5 steps

The key point to note is that the Process is circular rather

than linear. It means that we can and should go back and

forth between the steps.

Source: CRISP-DM : Cross Industry Standard Process for Data Mininghttp://spss.ch/upload/1107356429_CrispDM1.0.pdf

©uluumy, 2016 25

Page 26: Become a citizen data scientist

1- Business Understanding

The first step is BUSINESS UDERSTANDING. It’s the most critical step of the process. You need to frame the problem. At the end of this stage you should have a deep understanding of the problem you want to resolve and a clear idea about the data you will use

©uluumy, 2016 26

Page 27: Become a citizen data scientist

2- Data Understanding

The second step is DATA UNDERSTANDING. Your business knowledge will help you to contextualize your data. You notice that the steps Business understanding and Data Understanding are linked together with a double arrow.

©uluumy, 2016 27

Page 28: Become a citizen data scientist

3- Data Preparation

The third step is DATA PREPARATION. In this stage you will check for the common issues like missing values and outliers. Also doing operating like filtering merging and transformationAlso you run some data exploration using graphics and tables.

©uluumy, 2016 28

Page 29: Become a citizen data scientist

4- Modeling

The fourth step is the MODELING step. In this stage you build your model (for example a regression or a classification). You notice that this step is linked to the previous one with a double arrow which mean that you will often need to step back to Data Preparation.

©uluumy, 2016 29

Page 30: Become a citizen data scientist

5- Evaluation

The next step is EVALUATION. Every model you build has to be evaluated in term of accuracy, robustness and deployability. You notice that at this stage you may have to step back to the Business Understanding stage if the model you have built could not be deployed

©uluumy, 2016 30

Page 31: Become a citizen data scientist

6- Deployment

The last step is DEPLOYMENT. The final purpose of any data science project is to give actionable insight.

©uluumy, 2016 31

Page 32: Become a citizen data scientist

Data Science Toolbox

©uluumy, 2016 32

Page 33: Become a citizen data scientist

©uluumy, 2016 33

Page 34: Become a citizen data scientist

•Data preparation : SQL. a must know tool• 60% of data scientists said they spent the most time

cleaning and organizing data *• SQL : first among the top 10 in-demand skills for data

scientists*

•Data visualization : Power BI and Excel.•Data analysis : Azure Machine Learning

• 55% of data scientists think that Machine Learning had significant importance for their companies *

* source: CrowdFlower, 2016 Data Science Report

©uluumy, 2016 34

Page 35: Become a citizen data scientist

CRIS-DM & Data Science Toolbox

Source Wikipedia©uluumy, 2016 35

Page 36: Become a citizen data scientist

Data Preparation

The first tool is SQL.According to the CrowdFlower 2016 Data Science Report, 60% of data scientists said they spent the most time cleaning and organizing data.It’s THE language of database.You will have to use SQL in order to process the data preparation step.SQL is a must KNOW tool.

©uluumy, 2016 36

Page 37: Become a citizen data scientist

Data Visualization & Data Analysis

• Data VisualizationFor Data Visualization, we will use Power BI

• Data AnalysisMicrosoft Azure ML is one of the most relevant tool to use for citizen data scientist because of its ability to quickly create machine learning experiments and because its slight learning curve.

©uluumy, 2016 37

Page 38: Become a citizen data scientist

Microsoft Azure Machine Learning

« While machine learning has been around for a long time, usage was primarily restricted to people with deep skills and deep pockets. The cloud changes this dynamic completely »

Joseph Sirosh Corporate Vice President, Data Group at Microsoft

©uluumy, 2016 38

Page 39: Become a citizen data scientist

Azure Machine Learning Workflow

At high level we can divide it into 3 blocks: Data, Machine Learning Services, and Visualisation

You can clearly see how this workflow ca be embedded into the CRISP Data science we have presented in a previous lecture

©uluumy, 2016 39

Source: Microsoft

Page 40: Become a citizen data scientist

Marketing Framework Analysis

About 90% of the data collected by companies

today are related to customer actions and

marketing activities

©uluumy, 2016 40

Page 41: Become a citizen data scientist

“Marketing thinking is shifting from trying to maximize the company’s profit from each transaction to maximize the long-run profit from each relationship”, To rephrase it, Companies has shifted from being product-centric to CUSTOMER centric”.

Philip Kotler

©uluumy, 2016 41

Page 42: Become a citizen data scientist

That’s why it’s of vital importance to know as much as possible about our customer and to customize as much as possible our offer to each.

Despite the huge amount of data, we now have on each of our customer (from CRM, Web Site, Social Media) and the complexity to have a 360 view of the customer; still, we can frame the relationship with our customer with these 4 questions:

©uluumy, 2016 42

Page 43: Become a citizen data scientist

1- Who are my customers?

2- How to reach and interact with them?

3- Which customer should I target?

4- What is the best next-offer?

These 4 questions lead as to this 4 blocks Marketing Framework Analysis:

©uluumy, 2016 43

Page 44: Become a citizen data scientist

©uluumy, 2016 44

Page 45: Become a citizen data scientist

Philip Kotler, the founder of Modern Marketing said “Marketing’s future lies in database marketing where we know enough about each customer to make relevant and customized offers to each”

Explore

©uluumy, 2016 45

Page 46: Become a citizen data scientist

AdventureWorks is a sample database created for use in demos and training on each version of Microsoft SQL Server. A company which manufactures and sells metal and composite bicycles to North American, European and Asian.two categories of customers:B2B : team of salesB2C : E-commerce

Case Study : Adventure Works

©uluumy, 2016 46

Page 47: Become a citizen data scientist

According to the CrowdFlower 2016 Data Science Report, SQL is first among the top 10 in-demand skills for data scientists.

SQL Basics

©uluumy, 2016 47

Page 48: Become a citizen data scientist

SQL stands for Structured Query LanguageSQL is the language of databases : creation, access, manipulationRelational Database : a software to offer access to stored information and their manipulation.Information are stored in tables

SQL Basics

©uluumy, 2016 48

Page 49: Become a citizen data scientist

Tables : A set of data arranged in columns and rows. The columns represent characteristics of stored data and the rows represent actual data entries. Tables for Database is what's spreadsheet for Excel.SQL Server Express (a free "lite" version of SQL Server)

SQL Basics

©uluumy, 2016 49

Page 50: Become a citizen data scientist

Table Relationship:

On fundamental concept of database is the tables relationship.Let’s take an example from our database AdventureWorksWe took 2 tables: Product and Sales. Each table must have one Primary Key.

SQL Basics

©uluumy, 2016 50

Page 51: Become a citizen data scientist

A primary key is a field in a table which uniquely identifies each row in the table. That means that Primary keys must contain unique values. In our example: ProductKey is the primary Key of the table DimProduct.

SQL Basics

©uluumy, 2016 51

Page 52: Become a citizen data scientist

A FOREIGN KEY in one table is a column which points to a PRIMARY KEY in another table.Let’s look to our example. ProducKey is a foreign key column of the table FactInternetSales which refer to the primary key of the table DimProduct. It allows to identify the relationship between the two tables.

SQL Basics

©uluumy, 2016 52

Page 53: Become a citizen data scientist

©uluumy, 2016 53

Page 54: Become a citizen data scientist

SQL main operations

SQL contains 4 main operations. We can• Insert of new data into a table • SELECT data from a table• Update data already existing in a table• Delete data from table

©uluumy, 2016 54

Page 55: Become a citizen data scientist

SQL main operations

The insert, update and delete operation are usually restricted to the Database administrator. As a Citizen Data Scientist you will essentially need to select data from the database.

Let’s look how to select data from a table...

©uluumy, 2016 55

Page 56: Become a citizen data scientist

SQL main operations

Selection:

Here is the general syntax of a data selection

SELECT <Column List> FROM <Table Name>WHERE <Search Condition>

©uluumy, 2016 56

Page 57: Become a citizen data scientist

SQL main operations

Aggregation

How to group data and use aggregates...

SELECT <Column List>, <Aggregate Function> (<Column Name>) FROM <Table Name>WHERE <Search Condition>GROUP BY <Column List>

©uluumy, 2016 57

Page 58: Become a citizen data scientist

SQL main operations

Selection from 2 tables

How to select data from more than one table...

SELECT <Column List> FROM <Table1>JOIN <Table2> ON <Table1>. <Column1> = <Table2>. <Column1>

©uluumy, 2016 58

Page 59: Become a citizen data scientist

Customer Dashboard using Power BI

©uluumy, 2016 59

Page 60: Become a citizen data scientist

Segment

“Customer segmentation is the process of diving a customer into groups of individuals who are similar in specific ways relevant to marketing”

Source “A Marketer’s Guide to Analytics”, SAS

©uluumy, 2016 60

Page 61: Become a citizen data scientist

Segmentation: Types

“The literature about types of segmentation is very diverse.The best I could find is the one given in the SS paper “A Marketer’s Guide to Analytics” It distinguishes between two main types of segmentation:• Foundation segmentation• Targeting segmentation

©uluumy, 2016 61

Page 62: Become a citizen data scientist

Segmentation: Types

Foundation segmentation: Core segmentsIt has these proprieties:• All customers are included• Each customer falls into only one segment• Each segment can be subdivided into clusters• Attributes: value, profit, attrition, risk, demographics,

firmographics, etc.

©uluumy, 2016 62

Page 63: Become a citizen data scientist

Segmentation: Types

The second type is Targeting Segmentation It identifies customers with specific needs and preferences. Useful for specific marketing programs and campaigns identifies customers with specific needs and preferences.It has these features:• Not all customers can be included• Each customer may fall into many different segments

©uluumy, 2016 63

Page 64: Become a citizen data scientist

Good segmentation

A good segmentation must have these three features: • Relevant to the business objective• Simple: understandable and easy to characterize• Actionable

©uluumy, 2016 64

Page 65: Become a citizen data scientist

Managerial Segmentation

RFM method

We will use a very simple yet insightful method to build a customer segmentation which is relevant, simple and most importantly actionableRFM method has been around for decade. Yet it’s is still very useful

©uluumy, 2016 65

Page 66: Become a citizen data scientist

RFM in a nutshell

RFM is an acronym for Recency, Frequency and Monetary

• Recency: number of days since last purchase/Use/visit• Frequency: number of purchase/use/visit• M: Amount of purchase / time spent

©uluumy, 2016 66

Page 67: Become a citizen data scientist

RFM in a nutshell

Based on each of these 3 factors, all the customers are ranked and given a score from 1 to 4 (depending on which quartile they are). 1 being the best score.Now for each customer we have a composite score R-F-MAs each factor could have 4 different values (1,2,3, or 4)We can in theory divide our customer into until 64 segments!!

©uluumy, 2016 67

Page 68: Become a citizen data scientist

RFM in a nutshell

It’s a good first step …but we cannot stop here because we want to have simple but ACTIONABLE segmentationThat’s why we have used the term Managerial SegmentationAs a managerial decision we can decide that we need to have let’s say 9 different segments based on the RFM score we have already computed

©uluumy, 2016 68

Page 69: Become a citizen data scientist

RFM: 9 segments

Here the description of each of the 9 segments:

Best: R (1) AND F (1) AND M (1): it’s simple they have the highest score...Novice: R (1) AND F (3-4)Active High Value: R (1) AND M (1,2)Active: R (1)

©uluumy, 2016 69

Page 70: Become a citizen data scientist

RFM: 9 segments

Warm High Value: R (2) AND M (1)Warm: R (2)Win-back: R (3,4) AND {F (1) OR M (1)}Cold: R (3)Almost lost: R (4)

©uluumy, 2016 70

Page 71: Become a citizen data scientist

RFM: actions

Now that we have our customer segmentation.What Action can we take based on this segmentationHere are some ideas/Examples:

• Best Customer: • “Thank you” gift• “Exclusive preview” of new service/product

©uluumy, 2016 71

Page 72: Become a citizen data scientist

RFM: actions

• Novice:• Connection on social media• Personal greeting message• Free shipping

• Warm High-Value: • Next best offer “Get $50 in “ZZZZ” Dollars for every

$50 you spend”• Almost Lost

• “Last chance” special offer©uluumy, 2016 72

Page 73: Become a citizen data scientist

Power BI TreeMap visualization of the 9 resulting segments

©uluumy, 2016 73

Page 74: Become a citizen data scientist

Target

©uluumy, 2016 74

Page 75: Become a citizen data scientist

Classification model : basics

Here is a basic data flow for any classification model

Data training is the input of the classification algorithm. The purpose is to “train” the algorithm with historical data which contain the labels (target) variable.

For example, say we want to create a model to predict which customer is likely to respond positively to specific marketing campaign.

Training data contains a list of customers who were targeted in the past for the same kind of campaign. The labels variable is a Yes/NO variable

Source: http://www.cs.princeton.edu/~schapire/talks/picasso-minicourse.pdf©uluumy, 2016 75

Page 76: Become a citizen data scientist

Classification Model : Evaluation

To evaluate and chose which model is the best fitted for our problem we can use several measures. Here are the most widely used: • Accuracy: the proportion of the total number of

predictions that were correct.• Positive Predictive Value or Precision: the proportion

of positive cases that were correctly identified.

©uluumy, 2016 76

Page 77: Become a citizen data scientist

Classification Model : Evaluation

• Negative Predictive Value: the proportion of negative cases that were correctly identified.

• Sensitivity or Recall: the proportion of actual positive cases which are correctly identified.

• Specificity: the proportion of actual negative cases which are correctly identified.

• ROC curve: It is created by plotting the recall against the false positive rate

©uluumy, 2016 77

Page 78: Become a citizen data scientist

Confusion Matrix

Source: http://www.analyticsvidhya.com/blog/2016/02/7-important-model-evaluation-error-metrics/

©uluumy, 2016 78

Page 79: Become a citizen data scientist

Classification Fundamental concept : Bias-Variance Tradeoff

Google's Research Director Peter Norvig claimed that

"We don’t have better algorithms. We just have moredata."

©uluumy, 2016 79

Page 80: Become a citizen data scientist

Prediction Error

▪ You can never have a prediction model without error.

▪ Without going further with the maths behind it, predictionerror is mainly divided into 2 elements: Bias and Variance.

▪ Error due to Bias is the difference between the predictedvalue and the correct value.

▪ Error due to Variance is defined as the variability of a modelprediction for a given data point.

▪ As “a picture's worth a thousand words”, let’s look to thisgraphic taken from one of the best article I found on thissubject “Understanding the Bias-Variance Trade-off”.

©uluumy, 2016 80

Page 81: Become a citizen data scientist

Bias vs Variance

Bulls-eye represents the graphical visualization of bias and Variance. Each point is the result of one iteration of the model building.

The center of the target is a model that predicts perfectly the actual values.

Source : scott.fortmann-roe.com/docs/BiasVariance.html©uluumy, 2016 81

Page 82: Become a citizen data scientist

Bias vs Variance

We have mainly four cases:

▪ Low Bias and Low Variance: That’s wherewe want to be! We have here a goodmodel

▪ High Bias and Low Variance: that’s whatwe call an under-fitted model. It meansthat our model lacks some information.It’s too simple Maybe we have to addvariables to our training data. Alsoevaluating models using other methodscould be a good option too.

©uluumy, 2016 82

Page 83: Become a citizen data scientist

Bias vs Variance

We have mainly four cases:

▪ Low Bias and High Variance: We have anOver-Fitted Model. It means that the modelis too complicated for the data we have. Putsimply, the model cannot be generalized. Thesolution is to add more data into our trainingset and/or to reduce the number of features(the complexity), we use Ensemble methodlike random forest, bagging and boosting

▪ High Bias and High Variance: we still need towork on our model. My suggestion is totackle first the Bias error by using othermethods and adding variables if you can

©uluumy, 2016 83

Page 84: Become a citizen data scientist

Bias –Variance TradeOff

Here is another way to sum-up the bias-variance trade-off:

Prediction Error is plotted against Model complexity twice: the green line is the result using the training data. The red line is the result using the test Data

Source: Hastie, Tibshirani, Friedman “Elements of Statistical Learning” 2001©uluumy, 2016 84

Page 85: Become a citizen data scientist

Under-Fitted Model

When the model is too simple (lowcomplexity):

▪ The gap between the two plots is narrow.That’s an indication for low variance

▪ The prediction error is high for trainingand test data. It means a High-Bias

▪ Hence we have an under-fitted model

©uluumy, 2016 85

Page 86: Become a citizen data scientist

Over-Fitted Model

▪ Higher the complexity is, higher the gapis between the two plots

▪ When The prediction error between thetraining and the test data become toowide.

▪ It means that the model reached theover-fitting mode

©uluumy, 2016 86

Page 87: Become a citizen data scientist

Overview diagram of Azure Machine Learning Studio

Microsoft Azure Machine Learning Studio is a drag-and-drop cloud-based service you can use to build, test, and deploy predictive analytics solutions on your data. Machine Learning Studio publishes models as web services that can easily be consumed by custom apps or BI tools such as Excel or Power BI.This Figure (source) summarizes the basic high-level steps that are required to create, test, and deploy a new Azure Machine Learning prediction model

©uluumy, 2016 87

Page 88: Become a citizen data scientist

Source : https://azure.microsoft.com/en-us/documentation/articles/machine-learning-studio-overview-diagram/

©uluumy, 2016 88

Page 89: Become a citizen data scientist

Recommend

Recommendation systems are a subclass of information filtering system that seek to predict the 'rating' or 'preference' that a user would give to an item.

Source : Wikipedia

©uluumy, 2016 89

Page 90: Become a citizen data scientist

Two primary methodologies

• Collaborative Filtering : the item recommended to the user is based on the past purchase and preference of similar users

• Content-based filtering : Based on the attributes of items purchased by the user, suggest items with similar properties.

Best examples : Amazon, Netflix

©uluumy, 2016 90

Page 91: Become a citizen data scientist

Example

©uluumy, 2016 91

Page 92: Become a citizen data scientist

Overview diagram of Azure Machine Learning Studio

Let’s look at how these two methods works using this very simple example of movies ratingHere is the example: we have a list of 6 movies (items) and 7 users. Each has rated the movies that she watched (from 1 to 5 stars). Daniel has not seen the movie "The Notebook"We want to decide if we will recommend this movie to him or not based on a prediction of his rating for the movie.So let’s start with a collaborating filtering approach

©uluumy, 2016 92

Page 93: Become a citizen data scientist

Collaborative Filtering

Daniel has not seen the movie "The Notebook"• We select the subgroup of users who watched the same movies as Daniel

and also who watched "The Notebook".• Among this group, we select the users who are "similar" to Daniel in term

of rating (for example using KNN algorithm). • We compute the average rating that Daniel's "neighbors" gave to "The

Notebook". • It gives as the predicted rating of Daniel for the movie "The Notebook"• We repeat the steps 1 to for 4 for all movies that Daniel haven't seen• We recommend Daniel the best predicted rated movies.

©uluumy, 2016 93

Page 94: Become a citizen data scientist

Content-based filtering:

We want to predict the Daniel rating for the movie "The Notebook"using the similarity between items (in our example movies), and not users, to make predictions• We select the movies that are similar to "The Notebook". Based on the

genre we can divide movies into to groups "Action" (Skyfall, Star War, X-Men) and "Romance" ("P. S I Love You", "Titanic", "The Notebook")

• Daniel have rated "P. S I Love You" and "Titanic" which are similar to "The Notebook". Based on his rating of these two movies, we give a predicted rating of "The Notebook"

• We repeat the steps 1 and 2 for all movies that Daniel haven't seen• We recommend Daniel the best predicted rated movies.

©uluumy, 2016 94

Page 95: Become a citizen data scientist

Labs

©uluumy, 2016 95

Page 96: Become a citizen data scientist

Case Study : Adventure Works

▪ AdventureWorks is a sample database created for use in demos and training on each version of Microsoft SQL Server.

▪ A company which manufactures and sells metal and composite bicycles to North American, European and Asian.

▪ two categories of customers:

▪ B2B : team of sales

▪ B2C : E-commerce©uluumy, 2016 96

Page 97: Become a citizen data scientist

Setup the Lab Environment : Tools

• Office 2016:Free Trial: products.office.com/en-us/try

• Power BIFree : powerbi.microsoft.com/en-us/desktop/

• Install SQL Server 2014 ExpressFree : microsoft.com/en-us/server-cloud/Products/sql-server-editions/sql-server-express.aspx

• MICROSOFT AZURE MLFree : studio.azureml.net/

©uluumy, 2016 97

Page 98: Become a citizen data scientist

SQL Server 2014 Expressstep 1

microsoft.com/en-us/server-cloud/Products/sql-server-editions/sql-server-express.aspx

©uluumy, 2016 98

Page 99: Become a citizen data scientist

SQL Server 2014 ExpressStep 2

©uluumy, 2016 99

Page 100: Become a citizen data scientist

SQL Server 2014 ExpressStep 3 : Sign in

©uluumy, 2016 100

Page 101: Become a citizen data scientist

SQL Server 2014 ExpressStep 4a:

1- Chose SQL Server 2014 Express 64 Bit.

2- Choose your language

3- Scroll down click continue

©uluumy, 2016 101

Page 102: Become a citizen data scientist

SQL Server 2014 ExpressStep 4b:

1- Chose SQL Server Management Studio Express 64 Bit.

2- Choose your language

3- Scroll down click continue

©uluumy, 2016 102

Page 103: Become a citizen data scientist

SQL Server 2014 ExpressStep 5

When the files are downloaded : Execute

Choose the first option as shown here

©uluumy, 2016 103

Page 104: Become a citizen data scientist

SQL Server 2014 Express

Finally… you should have Microsoft SQL Server management Studio installed

©uluumy, 2016 104

Page 105: Become a citizen data scientist

Microsoft Azure Machine Learning

• Microsoft Azure ML is could-based service. So you don’t have to install anything. All you need is to have a Microsoft account ID

• Here is the address: studio.azureml.net/

©uluumy, 2016 105

Page 106: Become a citizen data scientist

DATA

©uluumy, 2016 106

Page 107: Become a citizen data scientist

Adventure Works 2014 Warehouse

Download the database : Adventure Works 2014 Warehouse

(Adventure Works 2014 Warehouse Script.zip)

from this address (the official Microsoft examples):

msftdbprodsamples.codeplex.com/releases

©uluumy, 2016 107

Page 108: Become a citizen data scientist

How to install the Database AdventureWorks in SQL Server Management Studio

©uluumy, 2016 108

Page 109: Become a citizen data scientist

Step 1 :

Open Microsoft SQL Server Management Studio

Server Name : YourLocalHost\SQLExpressOn my Laptop: ULUUMY\SQLExpress

©uluumy, 2016 109

Page 110: Become a citizen data scientist

Step 2 :

If you have this Error message…

©uluumy, 2016 110

Page 111: Become a citizen data scientist

Step 2 :

Open SQL Server 2014 Configuration Manager

©uluumy, 2016 111

Page 112: Become a citizen data scientist

Step 2 :

Then, Start SQL Server Service

©uluumy, 2016 112

Page 113: Become a citizen data scientist

Step 3 :

Open the file instawdbdw.sql

(from the Adventure Works 2014 Warehouse Script you have already downloaded)

©uluumy, 2016 113

Page 114: Become a citizen data scientist

Step 4 :

Put Management Studio into SQLCMD mode

Tools > Options > Query Execution and selecting By default, open new queries in SQLCMD mode

©uluumy, 2016 114

Page 115: Become a citizen data scientist

Step 5 :

Change it to the path of the Adventure Works database you have already downloaded

©uluumy, 2016 115

Page 116: Become a citizen data scientist

Step 6 :

Execute

©uluumy, 2016 116

Page 117: Become a citizen data scientist

Final result

©uluumy, 2016 117

Page 118: Become a citizen data scientist

Lab 1 : Data preparation using SQL

▪ The first lab is kind of the foundation for the following labs.

▪ We will use SQL to extract information about our customers:

▪ First we will create two tables:– Customers: socioeconomics and geographic data like, gender,

income, education, number of children, postal code and province.

– Sales : All the internet orders made by customers including quantity, amount, date and products features such as model, category and subcategory.

©uluumy, 2016 118

Page 119: Become a citizen data scientist

▪ These tasks should already have been done:

▪ Install SQL Server Express 2014.

▪ Download the database : Adventure Works 2014 Warehouse.

▪ Install the database in SQL Server Management Studio.

▪ You can download the SQL code here (Explore.sql)

©uluumy, 2016 119

Page 120: Become a citizen data scientist

Lab 2 : Customer Dashboard using Power BI

▪ You should already have installed Power BI.

▪ If not please go back to Lay the Foundation Section, Setup the Lab Environment Lecture.

▪ Also you should have finished lab 1 before, because you will need the two tables that we have created during that lab.

▪ If not, please, go back to lab 1.

▪ We will build a customer dashboard using the two tables we have already created during Lab1 : Customers and Sales.

▪ Here is an example:

©uluumy, 2016 120

Page 121: Become a citizen data scientist

©uluumy, 2016 121

Page 122: Become a citizen data scientist

Lab 3 : Managerial Segmentation using SQL

▪ In this lab we’re going to segment our customer based on their purchases.

▪ We will use the method we have presented in the previous lecture (managerial Segmentation).

▪ At the end we will divide our customers into 9 segments.

▪ Marketing team will be able to tailor a specific strategy for each segment.

©uluumy, 2016 122

Page 123: Become a citizen data scientist

▪ You should have finished lab 1 before, because you will need the two tables we have created during that lab.

▪ If not, please, go back to lab 1.

▪ Here is a Power BI TreeMap visualization of the 9 resulting segments.

©uluumy, 2016 123

Page 124: Become a citizen data scientist

©uluumy, 2016 124

Page 125: Become a citizen data scientist

You can downloand from GitHub the SQL script

ManagerialSegmentation.sql which implement the solution.

©uluumy, 2016 125

Page 126: Become a citizen data scientist

Lab 4 : Bike Buyers targeting using Azure ML

▪ During lab 2 we have noticed that only 50% of our customers have already bought a bike from Adventure Works.

▪ The management team, decided the by the end of the fiscal year, the company should have increased this percentage by 5%.

▪ In order to achieve this objective the marketing team will launch a telemarketing campaign to reach those among our customers who have never bought a bike.

©uluumy, 2016 126

Page 127: Become a citizen data scientist

▪ As it's not possible to contact all of them (around 9,000 customers) due to time and budget constraints, the marketing team wants to target only the most likely among them to be interested in our offer.

▪ So the VP Marketing asked, you , as the team citizen data scientist, to build a model in order to achieve this goal.

©uluumy, 2016 127

Page 128: Become a citizen data scientist

Our Challenge:

Target customers who have never purchased a bike (from

AdventureWorks) and who are the most likely to be

interested in buying one.

©uluumy, 2016 128

Page 129: Become a citizen data scientist

First.. let's get the dataset

You have two options:

▪ Create the table in SQL Server using the script BikeBuyerTargeting.sql (to download here ) and then export it to a csv file.

▪ Or download here the dataset BikeBuyerTargeting.csv.

▪ Now let's go to Azure ML Studio.

©uluumy, 2016 129

Page 130: Become a citizen data scientist

Part 1 : Predictive ModelFinal Experiment

©uluumy, 2016 130

Page 131: Become a citizen data scientist

Step 1 : studio.azureml.net/Sign In

©uluumy, 2016 131

Page 132: Become a citizen data scientist

Step 2 : DATASETSNew

©uluumy, 2016 132

Page 133: Become a citizen data scientist

Step 2 :DATASETfrom local file

©uluumy, 2016 133

Page 134: Become a citizen data scientist

Step 2 : DATSETUpload the file BikeBuyerTargeting.csv

©uluumy, 2016 134

Page 135: Become a citizen data scientist

Step 2 :DATASETthe file BikeBuyerTargeting is uploaded

©uluumy, 2016 135

Page 136: Become a citizen data scientist

Step 3 :Create New EXPERIMENT

©uluumy, 2016 136

Page 137: Become a citizen data scientist

Step 3 : Blank Experiment

©uluumy, 2016 137

Page 138: Become a citizen data scientist

Step 4 : 1- Select BikeBuyerTargeting dataset2- Drag and drop the dataset

©uluumy, 2016 138

Page 139: Become a citizen data scientist

Step 4 :Insert the module Select Columns in Dataset(Data Transformation / Manipulation)

©uluumy, 2016 139

Page 140: Become a citizen data scientist

Step 4 : Drag and drop it

©uluumy, 2016 140

Page 141: Become a citizen data scientist

Step 4 :Connect it to the dataset

©uluumy, 2016 141

Page 142: Become a citizen data scientist

Step 4 :

1-On the Properties lane of module “Select Columns in Dataset” Click Launch Column

2- Remove these columns as shown here

3-Click Check button

12

3

©uluumy, 2016 142

Page 143: Become a citizen data scientist

Step 5 : Search for Split Data (Data Transformation / Sample and Split)1- Drag and drop it2-Connect it to “Select Columns in DataSet3- In Properties, change Fraction of Rows to 0.7

©uluumy, 2016 143

Page 144: Become a citizen data scientist

Step 6 :Add the module Two-Class Boosted Decision Tree (Machine Learning / Initialize Model /Classification)

©uluumy, 2016 144

Page 145: Become a citizen data scientist

Step 6 :

1- Select the module Train Model (Machine Learning / Train)

2-Drag and drop it

©uluumy, 2016 145

Page 146: Become a citizen data scientist

Step 6 :

3-Connect it to Two-Class Boosted Decision Tree (the connector on the left)

4-Connect it to Split Model (the connector on the right)©uluumy, 2016 146

Page 147: Become a citizen data scientist

Step 6 :

Train Model Proprieties1- From the Proprieties Pane, click Launch column selector

2- Select the label Variable BikeBuyer

1 2

©uluumy, 2016 147

Page 148: Become a citizen data scientist

Step 7 :

1 Select Score Model (Machine Learning / Score)

2- Drag and drop it

©uluumy, 2016 148

Page 149: Become a citizen data scientist

Step 7 :

3-Connect it to Train Model (the connector on the left)

4-Connect it to Split Model (the connector on the right)

©uluumy, 2016 149

Page 150: Become a citizen data scientist

Step 8 :Run the experiment

©uluumy, 2016 150

Page 151: Become a citizen data scientist

Step 9 :Let’s visualize the result2 variables have been added : Scored Labels and Scored Probabilities

©uluumy, 2016 151

Page 152: Become a citizen data scientist

Step 10 :Add Evaluate Model (Machine Learning / Evaluate)

©uluumy, 2016 152

Page 153: Become a citizen data scientist

Step 11 :Run… and visualize the result

©uluumy, 2016 153

Page 154: Become a citizen data scientist

Step 11 :ROC…

©uluumy, 2016 154

Page 155: Become a citizen data scientist

Step 11 :Measures…

©uluumy, 2016 155

Page 156: Become a citizen data scientist

Part 2 : Web Service

Step 1 : Click on Set Up Web Service and choose Predictive Web Service

©uluumy, 2016 156

Page 157: Become a citizen data scientist

Step 1Here is the result… 3 modules have been added by Azure ML1-Web Service Input2-Web Service Output3- Bike Buyer Targeting (trained model)

©uluumy, 2016 157

Page 158: Become a citizen data scientist

Step 2

1- Delete the connection between “Web Service Input” and the module “Select Column..”2-Connect “Web Service Input” to “Score Model”

1

2

©uluumy, 2016 158

Page 159: Become a citizen data scientist

Step 3

1- Select the module Select Column..” 2- Launch column selector 3-Add “BikeBuyer” to the excluded variables

1

23

©uluumy, 2016 159

Page 160: Become a citizen data scientist

Step 4

Run …Then Click Deploy Web Service

©uluumy, 2016 160

Page 161: Become a citizen data scientist

Step 4

Here is the result….Click on Test

©uluumy, 2016 161

Page 162: Become a citizen data scientist

Step 5

Let’s test the web service

©uluumy, 2016 162

Page 163: Become a citizen data scientist

Step 5

The prediction (“yes” with a probability equal to 0.831

©uluumy, 2016 163

Page 164: Become a citizen data scientist

Try to improve the model in term of precision (0.836).

Optional Challenge:

©uluumy, 2016 164

Page 165: Become a citizen data scientist

Lab 5 : Next best offer using Azure ML

▪ Following the brilliant success of the previous targeted campaign, the VP marketing has asked you to work on a way to improve our customers’ retention and loyalty.

▪ In fact 50 % of our customers belong to the 3 segments "Win-Back", "Cold" and "Almost Lost". Customers in these segments have not bought a product for at least 1 year... That's something to be addressed.

©uluumy, 2016 165

Page 166: Become a citizen data scientist

Your challenge :

Build a recommendation system which suggests to each of our 18,484 customers, 3 items that she/he could be interested in.

©uluumy, 2016 166

Page 167: Become a citizen data scientist

First.. let's get the dataset

You have two options:

▪ Create the tables in SQL Server using the script Recommend.sql (to download here) and then export them to csv files.

▪ Or download here the dataset Rating.csv, User.csv and Item.csv

▪ Now let's go to Azure ML Studio

©uluumy, 2016 167

Page 168: Become a citizen data scientist

Part 1 : Recommendation ModelFinal Experiment

O'clock Shadow by Christopher (CC BY-SA©uluumy, 2016 168

Page 169: Become a citizen data scientist

Step 1 :

studio.azureml.net/Sign In

©uluumy, 2016 169

Page 170: Become a citizen data scientist

Step 2 :

DATASETSNew

©uluumy, 2016 170

Page 171: Become a citizen data scientist

Step 2 :

DATASET from local file Upload the files RatingRecommendation.csv, UserRecommendation.csv, ItemRecommendation.csv

©uluumy, 2016 171

Page 172: Become a citizen data scientist

Step 3 :

Create New EXPERIMENT

©uluumy, 2016 172

Page 173: Become a citizen data scientist

Step 3 :

Blank Experiment

©uluumy, 2016 173

Page 174: Become a citizen data scientist

Step 4 :

1- Select RatingRecommendation dataset2- Drag and drop the dataset

©uluumy, 2016 174

Page 175: Become a citizen data scientist

Step 4 :

Insert the module Edit Metatdata(Data Transformation / Manipulation)Connect it to the dataset

©uluumy, 2016 175

Page 176: Become a citizen data scientist

Step 4 :

1-On the Properties lane of module “Edit Metadata” Click Launch Column2- Include the variable “ImpliciteRating”3-Click Check button

1

2

3

©uluumy, 2016 176

Page 177: Become a citizen data scientist

Step 4 :

Choose Integer as Data Type

©uluumy, 2016 177

Page 178: Become a citizen data scientist

Step 5 : Search for Split Data (Data Transformation / Sample and Split)1- Drag and drop it2-Connect it to “Edit Metadata”

©uluumy, 2016 178

Page 179: Become a citizen data scientist

Step 6 :

1-Add the module Train Matchbox Recommender (Machine Learning / Train)2-Connect it to Split Data (the left connector)3-Change Number of Recommendation to 3

©uluumy, 2016 179

Page 180: Become a citizen data scientist

Step 7 :

Add the dataset UserRecommendaion.csv

©uluumy, 2016 180

Page 181: Become a citizen data scientist

Step 8 :Add the module “Remove Duplicate Rows “ Connect it to UserRecommendation datasetand Launch column selector

©uluumy, 2016 181

Page 182: Become a citizen data scientist

Step 8 :

Include he variable CustomerKey

©uluumy, 2016 182

Page 183: Become a citizen data scientist

Step 8 :

Then connect it to Train Matchbox (the middle connector)

©uluumy, 2016 183

Page 184: Become a citizen data scientist

Step 9 :

Add the dataset ItemRecommendation.csvThen connect it to Train Matchbox (the right connector)

©uluumy, 2016 184

Page 185: Become a citizen data scientist

Step 10 :

1 Select Score Matchbox Recommendation 2- Drag and drop it2-Connect it from left to right to “Train Matchbox”, “Split Data”, “Remove Duplicate” and “ItemRecommendation”

©uluumy, 2016 185

Page 186: Become a citizen data scientist

Step 11 :

Run the experimentThe model recommends 3 items for each user

©uluumy, 2016 186

Page 187: Become a citizen data scientist

Part 2 : Web Service

7 O'clock Shadow by Christopher (CC BY-SA©uluumy, 2016 187

Step 1 :

Click on Set Up Web Service and choose Predictive Web Service

Page 188: Become a citizen data scientist

Step 1

Here is the result… 3 modules have been added by Azure ML1-Web Service Input2-Web Service Output3- Lab 5 Recommendation System

©uluumy, 2016 188

Page 189: Become a citizen data scientist

Step 2

We want to keep only the variable CustomerKey from the module “Edit Metadata”Add “Select Columns in Dataset”Connect it with “Edit Metadat” and “Score Matchbox”Launch column selector and include CustomerKey

©uluumy, 2016 189

Page 190: Become a citizen data scientist

Step 2

1- Delete the connection between “Web Service Input” and the module “Edit Metadata”2-Connect “Web Service Input” to “Score Matchbox”

©uluumy, 2016 190

Page 191: Become a citizen data scientist

Step 4

Run …Then Click Deploy Web Service

©uluumy, 2016 191

Page 192: Become a citizen data scientist

Step 4

Here is the result….Click on Test

©uluumy, 2016 192

Page 193: Become a citizen data scientist

Include the segment factor in your

recommender system.

Optional Challenge

©uluumy, 2016 193

Page 194: Become a citizen data scientist

Resources

▪ SQL– Querying with Transact-SQLmva.microsoft.com/en-US/training-courses/querying-with-transactsql-10530?l=TjT07f87_9804984382

▪ MICROSOFT AZURE ML– Microsoft Azure Essentials: Fundamentals of Azure:

https://mva.microsoft.com/ebooks

– Building Recommendation Systems in Azure:

mva.microsoft.com/en-US/training-courses/building-recommendation-systems-in-azure-13765?l=j6AfbmlXB_4505513172

(was a very helpful resource to find the good data example to use throughout

the course and also to create the last lab)

©uluumy, 2016 194

Page 195: Become a citizen data scientist

▪ POWER BI– Documentation

https://powerbi.microsoft.com/en-us/documentation/powerbi-service-get-started

▪ Book– Data Mining Techniques: For Marketing, Sales, and Customer Relationship

Management,Gordon S. Linoff (Author), Michael J. A. Berry (Author)

▪ Blog– http://www.kdnuggets.com/

©uluumy, 2016 195

Page 196: Become a citizen data scientist

Image Credits

Bike Hängärtner CC BY

London Bike Show 2013 by Jon Arm CC BY

bikes in malacca by CC BY-ND

IA030694a simonsimages CC BY

Ines Njers CC BY-ND

The wheels Saku Takakusaki CC BY-ND

Cruisers... by micadew CC BY-SA

Rear end by Craig Sunter CC BY-ND

©uluumy, 2016 196

Page 197: Become a citizen data scientist

Image Credits

elite classes-2528 by jim simonson CC BY

corner by eflon CC BY

The oneJesus del Toro Garcia CC BY

'Phantom' bicycle JamesGardinerCollection CC0

Drahtesel mike goehler CC BY-ND

rotating shadow by rippchenmitkraut66 CC BY-ND

©uluumy, 2016 197

Page 198: Become a citizen data scientist

Take a look to our course:50% Off

Become a Citizen Data Scientist

©uluumy, 2016 198