become a citizen data scientist
TRANSCRIPT
Become a Citizen Data Scientist
Marketing Perspective
©uluumy, 2016
Understand your customer:
Profiling, Segmentation, Targeting and Recommendation using Microsoft Azure ML, SQL, Power BI
©uluumy, 2016 2
Take a look to our course:50% Off
Become a Citizen Data Scientist
©uluumy, 2016 3
Syllabus
▪ Introduction
▪ Lay the foundation
▪ Explore
▪ Segment
▪ Target
▪ Recommend
©uluumy, 2016 4
Introduction
©uluumy, 2016 5
Citizen Data Scientist
©uluumy, 2016 6
According to a Mckinsey Study, demand for Data
Scientists is projected to exceed supply by more than
50% by 2018.
Source: MCKINSEY, "Big data: The next frontier for innovation, competition, and productivity", 2011
©uluumy, 2016 7
The term Citizen Data Scientist was introduced by Gartner in its 2015 Hype Cycle for Emerging Technologies which we’re going present later in this lecture.Here is the definition given by Gartner :
“A person who creates or generates models that leverage predictive or prescriptive analytics but whose primary job function is outside of the field of statistics”
Source: GARTNER, "Hype Cycle for Emerging Technologies", 2015
©uluumy, 2016 8
Gartner Hype Cycle
Gartner Hype Cycle provides a graphic representation of the maturity and adoption of technologies and applications, and how they are potentially relevant to solving real business problems and exploiting new
opportunities.
©uluumy, 2016 9
Gartner’s Emerging Technologies Hype Cycle contains a representative set of still-maturing technologies that receive interest from clients, and technologies that Gartner feels are significant and should be monitored.“
Gartner is predicting that Citizen Data Scientist and Advanced analytics with Self-service delivery to reach the Plateau of productivity in 2 to 5 years.
©uluumy, 2016 10
©uluumy, 2016 11
Source: GARTNER, "Hype Cycle for Emerging Technologies", 2015
Why do we need a citizen data scientist?
▪ First, the shortage of data scientists (because of the highly specialized skills needed in: computer science, coding, mathematics, machine learning, statistics) .... and still they need to have in-depth knowledge of the business
▪ Second, the rise of self-service data preparation
▪ Third, the development of advanced analytics platforms (Microsoft Machine Learning, IBM Watson, ...)
©uluumy, 2016 12
The need is so, that according to Gartner, by 2017, the
number of citizen data scientists will grow 5 times faster
than the number of highly skilled data scientists.
©uluumy, 2016 13
Who could be a Citizen Data Scientist?
In most companies, they’re already there. Here who they are. They Have:
▪ Solid business domain knowledge (marketing, finance, sales, operations, ...),
▪ Analytical mindset,
▪ The willingness to learn new methods and use new tools
©uluumy, 2016 14
Benefits
The rise of Citizen Data Scientist is a great opportunity for every organization. Because business people will bring with them:
▪ Contextual knowledge
▪ the Democratization of analytics in every department
©uluumy, 2016 15
Lay the Foundations: Definitions
Data Science is not equal to Big Data…
©uluumy, 2016 16
“You don’t have to have a petabyte of data and the
expenses that come along with it in order to predict the
interest of your customer base”
Source: John W. Foreman : Data Smart: Using Data Science to Transform Information into Insight
©uluumy, 2016 17
Data Science:
“Data Science is the transformation of data using
mathematics and statistics into valuable insight,
decisions, and products”
Source: Introduction to Machine Learning, 2nd Edition, MIT Press
©uluumy, 2016 18
Machine learning:
“The goal of machine learning is to program computers
to use example data or past experience to solve a given
problem.”
Source: Introduction to Machine Learning, 2nd Edition, MIT Press
©uluumy, 2016 19
Machine Learning : 2 (main) categories
1- Supervised Learning: Prediction. You want to predict unknown answers from answers you already have.
2- Unsupervised Learning: Categorization. You want to find unknown answers mostly grouping- directly from data.
©uluumy, 2016 20
Supervised Learning..
Supervised learning can be separated into to general categories of algorithms:
• Classification algorithms: are used to predict categorical responses. As example we can cite:
▪ Credit card fraud detection
▪ Customer likely to churn
▪ Customer targeting
• Regression: used to predict continuous variable.
• Example: Predict the future sales of a product©uluumy, 2016 21
Unsupervised Learning..
Unsupervised Learning: Categorization. You want to find unknown answers mostly grouping- directly from data.
• Customer segmentation
• Recommendation system
©uluumy, 2016 22
Data Science Process
According to a KDNuggets poll, 43% of the advanced analytics projects use the CRISP-DM methodology
©uluumy, 2016 23
CRIS-DM Methodology
Source Wikipedia©uluumy, 2016 24Source : Wikipedia
The process is composed of 5 steps
The key point to note is that the Process is circular rather
than linear. It means that we can and should go back and
forth between the steps.
Source: CRISP-DM : Cross Industry Standard Process for Data Mininghttp://spss.ch/upload/1107356429_CrispDM1.0.pdf
©uluumy, 2016 25
1- Business Understanding
The first step is BUSINESS UDERSTANDING. It’s the most critical step of the process. You need to frame the problem. At the end of this stage you should have a deep understanding of the problem you want to resolve and a clear idea about the data you will use
©uluumy, 2016 26
2- Data Understanding
The second step is DATA UNDERSTANDING. Your business knowledge will help you to contextualize your data. You notice that the steps Business understanding and Data Understanding are linked together with a double arrow.
©uluumy, 2016 27
3- Data Preparation
The third step is DATA PREPARATION. In this stage you will check for the common issues like missing values and outliers. Also doing operating like filtering merging and transformationAlso you run some data exploration using graphics and tables.
©uluumy, 2016 28
4- Modeling
The fourth step is the MODELING step. In this stage you build your model (for example a regression or a classification). You notice that this step is linked to the previous one with a double arrow which mean that you will often need to step back to Data Preparation.
©uluumy, 2016 29
5- Evaluation
The next step is EVALUATION. Every model you build has to be evaluated in term of accuracy, robustness and deployability. You notice that at this stage you may have to step back to the Business Understanding stage if the model you have built could not be deployed
©uluumy, 2016 30
6- Deployment
The last step is DEPLOYMENT. The final purpose of any data science project is to give actionable insight.
©uluumy, 2016 31
Data Science Toolbox
©uluumy, 2016 32
©uluumy, 2016 33
•Data preparation : SQL. a must know tool• 60% of data scientists said they spent the most time
cleaning and organizing data *• SQL : first among the top 10 in-demand skills for data
scientists*
•Data visualization : Power BI and Excel.•Data analysis : Azure Machine Learning
• 55% of data scientists think that Machine Learning had significant importance for their companies *
* source: CrowdFlower, 2016 Data Science Report
©uluumy, 2016 34
CRIS-DM & Data Science Toolbox
Source Wikipedia©uluumy, 2016 35
Data Preparation
The first tool is SQL.According to the CrowdFlower 2016 Data Science Report, 60% of data scientists said they spent the most time cleaning and organizing data.It’s THE language of database.You will have to use SQL in order to process the data preparation step.SQL is a must KNOW tool.
©uluumy, 2016 36
Data Visualization & Data Analysis
• Data VisualizationFor Data Visualization, we will use Power BI
• Data AnalysisMicrosoft Azure ML is one of the most relevant tool to use for citizen data scientist because of its ability to quickly create machine learning experiments and because its slight learning curve.
©uluumy, 2016 37
Microsoft Azure Machine Learning
« While machine learning has been around for a long time, usage was primarily restricted to people with deep skills and deep pockets. The cloud changes this dynamic completely »
Joseph Sirosh Corporate Vice President, Data Group at Microsoft
©uluumy, 2016 38
Azure Machine Learning Workflow
At high level we can divide it into 3 blocks: Data, Machine Learning Services, and Visualisation
You can clearly see how this workflow ca be embedded into the CRISP Data science we have presented in a previous lecture
©uluumy, 2016 39
Source: Microsoft
Marketing Framework Analysis
About 90% of the data collected by companies
today are related to customer actions and
marketing activities
©uluumy, 2016 40
“Marketing thinking is shifting from trying to maximize the company’s profit from each transaction to maximize the long-run profit from each relationship”, To rephrase it, Companies has shifted from being product-centric to CUSTOMER centric”.
Philip Kotler
©uluumy, 2016 41
That’s why it’s of vital importance to know as much as possible about our customer and to customize as much as possible our offer to each.
Despite the huge amount of data, we now have on each of our customer (from CRM, Web Site, Social Media) and the complexity to have a 360 view of the customer; still, we can frame the relationship with our customer with these 4 questions:
©uluumy, 2016 42
1- Who are my customers?
2- How to reach and interact with them?
3- Which customer should I target?
4- What is the best next-offer?
These 4 questions lead as to this 4 blocks Marketing Framework Analysis:
©uluumy, 2016 43
©uluumy, 2016 44
Philip Kotler, the founder of Modern Marketing said “Marketing’s future lies in database marketing where we know enough about each customer to make relevant and customized offers to each”
Explore
©uluumy, 2016 45
AdventureWorks is a sample database created for use in demos and training on each version of Microsoft SQL Server. A company which manufactures and sells metal and composite bicycles to North American, European and Asian.two categories of customers:B2B : team of salesB2C : E-commerce
Case Study : Adventure Works
©uluumy, 2016 46
According to the CrowdFlower 2016 Data Science Report, SQL is first among the top 10 in-demand skills for data scientists.
SQL Basics
©uluumy, 2016 47
SQL stands for Structured Query LanguageSQL is the language of databases : creation, access, manipulationRelational Database : a software to offer access to stored information and their manipulation.Information are stored in tables
SQL Basics
©uluumy, 2016 48
Tables : A set of data arranged in columns and rows. The columns represent characteristics of stored data and the rows represent actual data entries. Tables for Database is what's spreadsheet for Excel.SQL Server Express (a free "lite" version of SQL Server)
SQL Basics
©uluumy, 2016 49
Table Relationship:
On fundamental concept of database is the tables relationship.Let’s take an example from our database AdventureWorksWe took 2 tables: Product and Sales. Each table must have one Primary Key.
SQL Basics
©uluumy, 2016 50
A primary key is a field in a table which uniquely identifies each row in the table. That means that Primary keys must contain unique values. In our example: ProductKey is the primary Key of the table DimProduct.
SQL Basics
©uluumy, 2016 51
A FOREIGN KEY in one table is a column which points to a PRIMARY KEY in another table.Let’s look to our example. ProducKey is a foreign key column of the table FactInternetSales which refer to the primary key of the table DimProduct. It allows to identify the relationship between the two tables.
SQL Basics
©uluumy, 2016 52
©uluumy, 2016 53
SQL main operations
SQL contains 4 main operations. We can• Insert of new data into a table • SELECT data from a table• Update data already existing in a table• Delete data from table
©uluumy, 2016 54
SQL main operations
The insert, update and delete operation are usually restricted to the Database administrator. As a Citizen Data Scientist you will essentially need to select data from the database.
Let’s look how to select data from a table...
©uluumy, 2016 55
SQL main operations
Selection:
Here is the general syntax of a data selection
SELECT <Column List> FROM <Table Name>WHERE <Search Condition>
©uluumy, 2016 56
SQL main operations
Aggregation
How to group data and use aggregates...
SELECT <Column List>, <Aggregate Function> (<Column Name>) FROM <Table Name>WHERE <Search Condition>GROUP BY <Column List>
©uluumy, 2016 57
SQL main operations
Selection from 2 tables
How to select data from more than one table...
SELECT <Column List> FROM <Table1>JOIN <Table2> ON <Table1>. <Column1> = <Table2>. <Column1>
©uluumy, 2016 58
Customer Dashboard using Power BI
©uluumy, 2016 59
Segment
“Customer segmentation is the process of diving a customer into groups of individuals who are similar in specific ways relevant to marketing”
Source “A Marketer’s Guide to Analytics”, SAS
©uluumy, 2016 60
Segmentation: Types
“The literature about types of segmentation is very diverse.The best I could find is the one given in the SS paper “A Marketer’s Guide to Analytics” It distinguishes between two main types of segmentation:• Foundation segmentation• Targeting segmentation
©uluumy, 2016 61
Segmentation: Types
Foundation segmentation: Core segmentsIt has these proprieties:• All customers are included• Each customer falls into only one segment• Each segment can be subdivided into clusters• Attributes: value, profit, attrition, risk, demographics,
firmographics, etc.
©uluumy, 2016 62
Segmentation: Types
The second type is Targeting Segmentation It identifies customers with specific needs and preferences. Useful for specific marketing programs and campaigns identifies customers with specific needs and preferences.It has these features:• Not all customers can be included• Each customer may fall into many different segments
©uluumy, 2016 63
Good segmentation
A good segmentation must have these three features: • Relevant to the business objective• Simple: understandable and easy to characterize• Actionable
©uluumy, 2016 64
Managerial Segmentation
RFM method
We will use a very simple yet insightful method to build a customer segmentation which is relevant, simple and most importantly actionableRFM method has been around for decade. Yet it’s is still very useful
©uluumy, 2016 65
RFM in a nutshell
RFM is an acronym for Recency, Frequency and Monetary
• Recency: number of days since last purchase/Use/visit• Frequency: number of purchase/use/visit• M: Amount of purchase / time spent
©uluumy, 2016 66
RFM in a nutshell
Based on each of these 3 factors, all the customers are ranked and given a score from 1 to 4 (depending on which quartile they are). 1 being the best score.Now for each customer we have a composite score R-F-MAs each factor could have 4 different values (1,2,3, or 4)We can in theory divide our customer into until 64 segments!!
©uluumy, 2016 67
RFM in a nutshell
It’s a good first step …but we cannot stop here because we want to have simple but ACTIONABLE segmentationThat’s why we have used the term Managerial SegmentationAs a managerial decision we can decide that we need to have let’s say 9 different segments based on the RFM score we have already computed
©uluumy, 2016 68
RFM: 9 segments
Here the description of each of the 9 segments:
Best: R (1) AND F (1) AND M (1): it’s simple they have the highest score...Novice: R (1) AND F (3-4)Active High Value: R (1) AND M (1,2)Active: R (1)
©uluumy, 2016 69
RFM: 9 segments
Warm High Value: R (2) AND M (1)Warm: R (2)Win-back: R (3,4) AND {F (1) OR M (1)}Cold: R (3)Almost lost: R (4)
©uluumy, 2016 70
RFM: actions
Now that we have our customer segmentation.What Action can we take based on this segmentationHere are some ideas/Examples:
• Best Customer: • “Thank you” gift• “Exclusive preview” of new service/product
©uluumy, 2016 71
RFM: actions
• Novice:• Connection on social media• Personal greeting message• Free shipping
• Warm High-Value: • Next best offer “Get $50 in “ZZZZ” Dollars for every
$50 you spend”• Almost Lost
• “Last chance” special offer©uluumy, 2016 72
Power BI TreeMap visualization of the 9 resulting segments
©uluumy, 2016 73
Target
©uluumy, 2016 74
Classification model : basics
Here is a basic data flow for any classification model
Data training is the input of the classification algorithm. The purpose is to “train” the algorithm with historical data which contain the labels (target) variable.
For example, say we want to create a model to predict which customer is likely to respond positively to specific marketing campaign.
Training data contains a list of customers who were targeted in the past for the same kind of campaign. The labels variable is a Yes/NO variable
Source: http://www.cs.princeton.edu/~schapire/talks/picasso-minicourse.pdf©uluumy, 2016 75
Classification Model : Evaluation
To evaluate and chose which model is the best fitted for our problem we can use several measures. Here are the most widely used: • Accuracy: the proportion of the total number of
predictions that were correct.• Positive Predictive Value or Precision: the proportion
of positive cases that were correctly identified.
©uluumy, 2016 76
Classification Model : Evaluation
• Negative Predictive Value: the proportion of negative cases that were correctly identified.
• Sensitivity or Recall: the proportion of actual positive cases which are correctly identified.
• Specificity: the proportion of actual negative cases which are correctly identified.
• ROC curve: It is created by plotting the recall against the false positive rate
©uluumy, 2016 77
Confusion Matrix
Source: http://www.analyticsvidhya.com/blog/2016/02/7-important-model-evaluation-error-metrics/
©uluumy, 2016 78
Classification Fundamental concept : Bias-Variance Tradeoff
Google's Research Director Peter Norvig claimed that
"We don’t have better algorithms. We just have moredata."
©uluumy, 2016 79
Prediction Error
▪ You can never have a prediction model without error.
▪ Without going further with the maths behind it, predictionerror is mainly divided into 2 elements: Bias and Variance.
▪ Error due to Bias is the difference between the predictedvalue and the correct value.
▪ Error due to Variance is defined as the variability of a modelprediction for a given data point.
▪ As “a picture's worth a thousand words”, let’s look to thisgraphic taken from one of the best article I found on thissubject “Understanding the Bias-Variance Trade-off”.
©uluumy, 2016 80
Bias vs Variance
Bulls-eye represents the graphical visualization of bias and Variance. Each point is the result of one iteration of the model building.
The center of the target is a model that predicts perfectly the actual values.
Source : scott.fortmann-roe.com/docs/BiasVariance.html©uluumy, 2016 81
Bias vs Variance
We have mainly four cases:
▪ Low Bias and Low Variance: That’s wherewe want to be! We have here a goodmodel
▪ High Bias and Low Variance: that’s whatwe call an under-fitted model. It meansthat our model lacks some information.It’s too simple Maybe we have to addvariables to our training data. Alsoevaluating models using other methodscould be a good option too.
©uluumy, 2016 82
Bias vs Variance
We have mainly four cases:
▪ Low Bias and High Variance: We have anOver-Fitted Model. It means that the modelis too complicated for the data we have. Putsimply, the model cannot be generalized. Thesolution is to add more data into our trainingset and/or to reduce the number of features(the complexity), we use Ensemble methodlike random forest, bagging and boosting
▪ High Bias and High Variance: we still need towork on our model. My suggestion is totackle first the Bias error by using othermethods and adding variables if you can
©uluumy, 2016 83
Bias –Variance TradeOff
Here is another way to sum-up the bias-variance trade-off:
Prediction Error is plotted against Model complexity twice: the green line is the result using the training data. The red line is the result using the test Data
Source: Hastie, Tibshirani, Friedman “Elements of Statistical Learning” 2001©uluumy, 2016 84
Under-Fitted Model
When the model is too simple (lowcomplexity):
▪ The gap between the two plots is narrow.That’s an indication for low variance
▪ The prediction error is high for trainingand test data. It means a High-Bias
▪ Hence we have an under-fitted model
©uluumy, 2016 85
Over-Fitted Model
▪ Higher the complexity is, higher the gapis between the two plots
▪ When The prediction error between thetraining and the test data become toowide.
▪ It means that the model reached theover-fitting mode
©uluumy, 2016 86
Overview diagram of Azure Machine Learning Studio
Microsoft Azure Machine Learning Studio is a drag-and-drop cloud-based service you can use to build, test, and deploy predictive analytics solutions on your data. Machine Learning Studio publishes models as web services that can easily be consumed by custom apps or BI tools such as Excel or Power BI.This Figure (source) summarizes the basic high-level steps that are required to create, test, and deploy a new Azure Machine Learning prediction model
©uluumy, 2016 87
Source : https://azure.microsoft.com/en-us/documentation/articles/machine-learning-studio-overview-diagram/
©uluumy, 2016 88
Recommend
Recommendation systems are a subclass of information filtering system that seek to predict the 'rating' or 'preference' that a user would give to an item.
Source : Wikipedia
©uluumy, 2016 89
Two primary methodologies
• Collaborative Filtering : the item recommended to the user is based on the past purchase and preference of similar users
• Content-based filtering : Based on the attributes of items purchased by the user, suggest items with similar properties.
Best examples : Amazon, Netflix
©uluumy, 2016 90
Example
©uluumy, 2016 91
Overview diagram of Azure Machine Learning Studio
Let’s look at how these two methods works using this very simple example of movies ratingHere is the example: we have a list of 6 movies (items) and 7 users. Each has rated the movies that she watched (from 1 to 5 stars). Daniel has not seen the movie "The Notebook"We want to decide if we will recommend this movie to him or not based on a prediction of his rating for the movie.So let’s start with a collaborating filtering approach
©uluumy, 2016 92
Collaborative Filtering
Daniel has not seen the movie "The Notebook"• We select the subgroup of users who watched the same movies as Daniel
and also who watched "The Notebook".• Among this group, we select the users who are "similar" to Daniel in term
of rating (for example using KNN algorithm). • We compute the average rating that Daniel's "neighbors" gave to "The
Notebook". • It gives as the predicted rating of Daniel for the movie "The Notebook"• We repeat the steps 1 to for 4 for all movies that Daniel haven't seen• We recommend Daniel the best predicted rated movies.
©uluumy, 2016 93
Content-based filtering:
We want to predict the Daniel rating for the movie "The Notebook"using the similarity between items (in our example movies), and not users, to make predictions• We select the movies that are similar to "The Notebook". Based on the
genre we can divide movies into to groups "Action" (Skyfall, Star War, X-Men) and "Romance" ("P. S I Love You", "Titanic", "The Notebook")
• Daniel have rated "P. S I Love You" and "Titanic" which are similar to "The Notebook". Based on his rating of these two movies, we give a predicted rating of "The Notebook"
• We repeat the steps 1 and 2 for all movies that Daniel haven't seen• We recommend Daniel the best predicted rated movies.
©uluumy, 2016 94
Labs
©uluumy, 2016 95
Case Study : Adventure Works
▪ AdventureWorks is a sample database created for use in demos and training on each version of Microsoft SQL Server.
▪ A company which manufactures and sells metal and composite bicycles to North American, European and Asian.
▪ two categories of customers:
▪ B2B : team of sales
▪ B2C : E-commerce©uluumy, 2016 96
Setup the Lab Environment : Tools
• Office 2016:Free Trial: products.office.com/en-us/try
• Power BIFree : powerbi.microsoft.com/en-us/desktop/
• Install SQL Server 2014 ExpressFree : microsoft.com/en-us/server-cloud/Products/sql-server-editions/sql-server-express.aspx
• MICROSOFT AZURE MLFree : studio.azureml.net/
©uluumy, 2016 97
SQL Server 2014 Expressstep 1
microsoft.com/en-us/server-cloud/Products/sql-server-editions/sql-server-express.aspx
©uluumy, 2016 98
SQL Server 2014 ExpressStep 2
©uluumy, 2016 99
SQL Server 2014 ExpressStep 3 : Sign in
©uluumy, 2016 100
SQL Server 2014 ExpressStep 4a:
1- Chose SQL Server 2014 Express 64 Bit.
2- Choose your language
3- Scroll down click continue
©uluumy, 2016 101
SQL Server 2014 ExpressStep 4b:
1- Chose SQL Server Management Studio Express 64 Bit.
2- Choose your language
3- Scroll down click continue
©uluumy, 2016 102
SQL Server 2014 ExpressStep 5
When the files are downloaded : Execute
Choose the first option as shown here
©uluumy, 2016 103
SQL Server 2014 Express
Finally… you should have Microsoft SQL Server management Studio installed
©uluumy, 2016 104
Microsoft Azure Machine Learning
• Microsoft Azure ML is could-based service. So you don’t have to install anything. All you need is to have a Microsoft account ID
• Here is the address: studio.azureml.net/
©uluumy, 2016 105
DATA
©uluumy, 2016 106
Adventure Works 2014 Warehouse
Download the database : Adventure Works 2014 Warehouse
(Adventure Works 2014 Warehouse Script.zip)
from this address (the official Microsoft examples):
msftdbprodsamples.codeplex.com/releases
©uluumy, 2016 107
How to install the Database AdventureWorks in SQL Server Management Studio
©uluumy, 2016 108
Step 1 :
Open Microsoft SQL Server Management Studio
Server Name : YourLocalHost\SQLExpressOn my Laptop: ULUUMY\SQLExpress
©uluumy, 2016 109
Step 2 :
If you have this Error message…
©uluumy, 2016 110
Step 2 :
Open SQL Server 2014 Configuration Manager
©uluumy, 2016 111
Step 2 :
Then, Start SQL Server Service
©uluumy, 2016 112
Step 3 :
Open the file instawdbdw.sql
(from the Adventure Works 2014 Warehouse Script you have already downloaded)
©uluumy, 2016 113
Step 4 :
Put Management Studio into SQLCMD mode
Tools > Options > Query Execution and selecting By default, open new queries in SQLCMD mode
©uluumy, 2016 114
Step 5 :
Change it to the path of the Adventure Works database you have already downloaded
©uluumy, 2016 115
Step 6 :
Execute
©uluumy, 2016 116
Final result
©uluumy, 2016 117
Lab 1 : Data preparation using SQL
▪ The first lab is kind of the foundation for the following labs.
▪ We will use SQL to extract information about our customers:
▪ First we will create two tables:– Customers: socioeconomics and geographic data like, gender,
income, education, number of children, postal code and province.
– Sales : All the internet orders made by customers including quantity, amount, date and products features such as model, category and subcategory.
©uluumy, 2016 118
▪ These tasks should already have been done:
▪ Install SQL Server Express 2014.
▪ Download the database : Adventure Works 2014 Warehouse.
▪ Install the database in SQL Server Management Studio.
▪ You can download the SQL code here (Explore.sql)
©uluumy, 2016 119
Lab 2 : Customer Dashboard using Power BI
▪ You should already have installed Power BI.
▪ If not please go back to Lay the Foundation Section, Setup the Lab Environment Lecture.
▪ Also you should have finished lab 1 before, because you will need the two tables that we have created during that lab.
▪ If not, please, go back to lab 1.
▪ We will build a customer dashboard using the two tables we have already created during Lab1 : Customers and Sales.
▪ Here is an example:
©uluumy, 2016 120
©uluumy, 2016 121
Lab 3 : Managerial Segmentation using SQL
▪ In this lab we’re going to segment our customer based on their purchases.
▪ We will use the method we have presented in the previous lecture (managerial Segmentation).
▪ At the end we will divide our customers into 9 segments.
▪ Marketing team will be able to tailor a specific strategy for each segment.
©uluumy, 2016 122
▪ You should have finished lab 1 before, because you will need the two tables we have created during that lab.
▪ If not, please, go back to lab 1.
▪ Here is a Power BI TreeMap visualization of the 9 resulting segments.
©uluumy, 2016 123
©uluumy, 2016 124
You can downloand from GitHub the SQL script
ManagerialSegmentation.sql which implement the solution.
©uluumy, 2016 125
Lab 4 : Bike Buyers targeting using Azure ML
▪ During lab 2 we have noticed that only 50% of our customers have already bought a bike from Adventure Works.
▪ The management team, decided the by the end of the fiscal year, the company should have increased this percentage by 5%.
▪ In order to achieve this objective the marketing team will launch a telemarketing campaign to reach those among our customers who have never bought a bike.
©uluumy, 2016 126
▪ As it's not possible to contact all of them (around 9,000 customers) due to time and budget constraints, the marketing team wants to target only the most likely among them to be interested in our offer.
▪ So the VP Marketing asked, you , as the team citizen data scientist, to build a model in order to achieve this goal.
©uluumy, 2016 127
Our Challenge:
Target customers who have never purchased a bike (from
AdventureWorks) and who are the most likely to be
interested in buying one.
©uluumy, 2016 128
First.. let's get the dataset
You have two options:
▪ Create the table in SQL Server using the script BikeBuyerTargeting.sql (to download here ) and then export it to a csv file.
▪ Or download here the dataset BikeBuyerTargeting.csv.
▪ Now let's go to Azure ML Studio.
©uluumy, 2016 129
Part 1 : Predictive ModelFinal Experiment
©uluumy, 2016 130
Step 1 : studio.azureml.net/Sign In
©uluumy, 2016 131
Step 2 : DATASETSNew
©uluumy, 2016 132
Step 2 :DATASETfrom local file
©uluumy, 2016 133
Step 2 : DATSETUpload the file BikeBuyerTargeting.csv
©uluumy, 2016 134
Step 2 :DATASETthe file BikeBuyerTargeting is uploaded
©uluumy, 2016 135
Step 3 :Create New EXPERIMENT
©uluumy, 2016 136
Step 3 : Blank Experiment
©uluumy, 2016 137
Step 4 : 1- Select BikeBuyerTargeting dataset2- Drag and drop the dataset
©uluumy, 2016 138
Step 4 :Insert the module Select Columns in Dataset(Data Transformation / Manipulation)
©uluumy, 2016 139
Step 4 : Drag and drop it
©uluumy, 2016 140
Step 4 :Connect it to the dataset
©uluumy, 2016 141
Step 4 :
1-On the Properties lane of module “Select Columns in Dataset” Click Launch Column
2- Remove these columns as shown here
3-Click Check button
12
3
©uluumy, 2016 142
Step 5 : Search for Split Data (Data Transformation / Sample and Split)1- Drag and drop it2-Connect it to “Select Columns in DataSet3- In Properties, change Fraction of Rows to 0.7
©uluumy, 2016 143
Step 6 :Add the module Two-Class Boosted Decision Tree (Machine Learning / Initialize Model /Classification)
©uluumy, 2016 144
Step 6 :
1- Select the module Train Model (Machine Learning / Train)
2-Drag and drop it
©uluumy, 2016 145
Step 6 :
3-Connect it to Two-Class Boosted Decision Tree (the connector on the left)
4-Connect it to Split Model (the connector on the right)©uluumy, 2016 146
Step 6 :
Train Model Proprieties1- From the Proprieties Pane, click Launch column selector
2- Select the label Variable BikeBuyer
1 2
©uluumy, 2016 147
Step 7 :
1 Select Score Model (Machine Learning / Score)
2- Drag and drop it
©uluumy, 2016 148
Step 7 :
3-Connect it to Train Model (the connector on the left)
4-Connect it to Split Model (the connector on the right)
©uluumy, 2016 149
Step 8 :Run the experiment
©uluumy, 2016 150
Step 9 :Let’s visualize the result2 variables have been added : Scored Labels and Scored Probabilities
©uluumy, 2016 151
Step 10 :Add Evaluate Model (Machine Learning / Evaluate)
©uluumy, 2016 152
Step 11 :Run… and visualize the result
©uluumy, 2016 153
Step 11 :ROC…
©uluumy, 2016 154
Step 11 :Measures…
©uluumy, 2016 155
Part 2 : Web Service
Step 1 : Click on Set Up Web Service and choose Predictive Web Service
©uluumy, 2016 156
Step 1Here is the result… 3 modules have been added by Azure ML1-Web Service Input2-Web Service Output3- Bike Buyer Targeting (trained model)
©uluumy, 2016 157
Step 2
1- Delete the connection between “Web Service Input” and the module “Select Column..”2-Connect “Web Service Input” to “Score Model”
1
2
©uluumy, 2016 158
Step 3
1- Select the module Select Column..” 2- Launch column selector 3-Add “BikeBuyer” to the excluded variables
1
23
©uluumy, 2016 159
Step 4
Run …Then Click Deploy Web Service
©uluumy, 2016 160
Step 4
Here is the result….Click on Test
©uluumy, 2016 161
Step 5
Let’s test the web service
©uluumy, 2016 162
Step 5
The prediction (“yes” with a probability equal to 0.831
©uluumy, 2016 163
Try to improve the model in term of precision (0.836).
Optional Challenge:
©uluumy, 2016 164
Lab 5 : Next best offer using Azure ML
▪ Following the brilliant success of the previous targeted campaign, the VP marketing has asked you to work on a way to improve our customers’ retention and loyalty.
▪ In fact 50 % of our customers belong to the 3 segments "Win-Back", "Cold" and "Almost Lost". Customers in these segments have not bought a product for at least 1 year... That's something to be addressed.
©uluumy, 2016 165
Your challenge :
Build a recommendation system which suggests to each of our 18,484 customers, 3 items that she/he could be interested in.
©uluumy, 2016 166
First.. let's get the dataset
You have two options:
▪ Create the tables in SQL Server using the script Recommend.sql (to download here) and then export them to csv files.
▪ Or download here the dataset Rating.csv, User.csv and Item.csv
▪ Now let's go to Azure ML Studio
©uluumy, 2016 167
Part 1 : Recommendation ModelFinal Experiment
O'clock Shadow by Christopher (CC BY-SA©uluumy, 2016 168
Step 1 :
studio.azureml.net/Sign In
©uluumy, 2016 169
Step 2 :
DATASETSNew
©uluumy, 2016 170
Step 2 :
DATASET from local file Upload the files RatingRecommendation.csv, UserRecommendation.csv, ItemRecommendation.csv
©uluumy, 2016 171
Step 3 :
Create New EXPERIMENT
©uluumy, 2016 172
Step 3 :
Blank Experiment
©uluumy, 2016 173
Step 4 :
1- Select RatingRecommendation dataset2- Drag and drop the dataset
©uluumy, 2016 174
Step 4 :
Insert the module Edit Metatdata(Data Transformation / Manipulation)Connect it to the dataset
©uluumy, 2016 175
Step 4 :
1-On the Properties lane of module “Edit Metadata” Click Launch Column2- Include the variable “ImpliciteRating”3-Click Check button
1
2
3
©uluumy, 2016 176
Step 4 :
Choose Integer as Data Type
©uluumy, 2016 177
Step 5 : Search for Split Data (Data Transformation / Sample and Split)1- Drag and drop it2-Connect it to “Edit Metadata”
©uluumy, 2016 178
Step 6 :
1-Add the module Train Matchbox Recommender (Machine Learning / Train)2-Connect it to Split Data (the left connector)3-Change Number of Recommendation to 3
©uluumy, 2016 179
Step 7 :
Add the dataset UserRecommendaion.csv
©uluumy, 2016 180
Step 8 :Add the module “Remove Duplicate Rows “ Connect it to UserRecommendation datasetand Launch column selector
©uluumy, 2016 181
Step 8 :
Include he variable CustomerKey
©uluumy, 2016 182
Step 8 :
Then connect it to Train Matchbox (the middle connector)
©uluumy, 2016 183
Step 9 :
Add the dataset ItemRecommendation.csvThen connect it to Train Matchbox (the right connector)
©uluumy, 2016 184
Step 10 :
1 Select Score Matchbox Recommendation 2- Drag and drop it2-Connect it from left to right to “Train Matchbox”, “Split Data”, “Remove Duplicate” and “ItemRecommendation”
©uluumy, 2016 185
Step 11 :
Run the experimentThe model recommends 3 items for each user
©uluumy, 2016 186
Part 2 : Web Service
7 O'clock Shadow by Christopher (CC BY-SA©uluumy, 2016 187
Step 1 :
Click on Set Up Web Service and choose Predictive Web Service
Step 1
Here is the result… 3 modules have been added by Azure ML1-Web Service Input2-Web Service Output3- Lab 5 Recommendation System
©uluumy, 2016 188
Step 2
We want to keep only the variable CustomerKey from the module “Edit Metadata”Add “Select Columns in Dataset”Connect it with “Edit Metadat” and “Score Matchbox”Launch column selector and include CustomerKey
©uluumy, 2016 189
Step 2
1- Delete the connection between “Web Service Input” and the module “Edit Metadata”2-Connect “Web Service Input” to “Score Matchbox”
©uluumy, 2016 190
Step 4
Run …Then Click Deploy Web Service
©uluumy, 2016 191
Step 4
Here is the result….Click on Test
©uluumy, 2016 192
Include the segment factor in your
recommender system.
Optional Challenge
©uluumy, 2016 193
Resources
▪ SQL– Querying with Transact-SQLmva.microsoft.com/en-US/training-courses/querying-with-transactsql-10530?l=TjT07f87_9804984382
▪ MICROSOFT AZURE ML– Microsoft Azure Essentials: Fundamentals of Azure:
https://mva.microsoft.com/ebooks
– Building Recommendation Systems in Azure:
mva.microsoft.com/en-US/training-courses/building-recommendation-systems-in-azure-13765?l=j6AfbmlXB_4505513172
(was a very helpful resource to find the good data example to use throughout
the course and also to create the last lab)
©uluumy, 2016 194
▪ POWER BI– Documentation
https://powerbi.microsoft.com/en-us/documentation/powerbi-service-get-started
▪ Book– Data Mining Techniques: For Marketing, Sales, and Customer Relationship
Management,Gordon S. Linoff (Author), Michael J. A. Berry (Author)
▪ Blog– http://www.kdnuggets.com/
©uluumy, 2016 195
Image Credits
Bike Hängärtner CC BY
London Bike Show 2013 by Jon Arm CC BY
bikes in malacca by CC BY-ND
IA030694a simonsimages CC BY
Ines Njers CC BY-ND
The wheels Saku Takakusaki CC BY-ND
Cruisers... by micadew CC BY-SA
Rear end by Craig Sunter CC BY-ND
©uluumy, 2016 196
Image Credits
elite classes-2528 by jim simonson CC BY
corner by eflon CC BY
The oneJesus del Toro Garcia CC BY
'Phantom' bicycle JamesGardinerCollection CC0
Drahtesel mike goehler CC BY-ND
rotating shadow by rippchenmitkraut66 CC BY-ND
©uluumy, 2016 197
Take a look to our course:50% Off
Become a Citizen Data Scientist
©uluumy, 2016 198
Keep in touch..
Uluumy.com
©uluumy, 2016 199