the information herein is for informational purposes only and represents the opinions and views of...
TRANSCRIPT
![Page 1: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649cdf5503460f949a8c39/html5/thumbnails/1.jpg)
Advanced Analytics with Excel and Data Mining using SQL Server 2008 R2Rafal Lukawiecki
Strategic Consultant, Project Botticelli [email protected], @rafaldotnet
![Page 2: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649cdf5503460f949a8c39/html5/thumbnails/2.jpg)
2
Acknowledgments & Notes
The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material presented is not certain and may vary based on several factors. Microsoft makes no warranties, express, implied or statutory, as to the information in this presentation.
Portions © 2011 Project Botticelli Ltd & entire material © 2011 Microsoft Corp unless noted otherwise. Some slides contain quotations from copyrighted materials by other authors, as individually attributed or as already covered by Microsoft Copyright ownerships. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Project Botticelli Ltd as of the date of this presentation. Because Project Botticelli & Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft and Project Botticelli cannot guarantee the accuracy of any information provided after the date of this presentation. Project Botticelli makes no warranties, express, implied or statutory, as to the information in this presentation. E&OE.
This seminar is based on a number of sources including a few dozen of Microsoft-owned presentations, used with permission. Thank you: Michael Tejedor, Chris Volckerick, Tara Seppa, Morris Novello, Bryan Bredehoeft, Chris Dial, Donald Farmer, Sabrena McBride, Peter Sprague, and Diwakar Rajagopal.
![Page 3: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649cdf5503460f949a8c39/html5/thumbnails/3.jpg)
3
• Introduce Data Mining• Demonstrate several DM techniques using
Excel
Objectives
![Page 4: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649cdf5503460f949a8c39/html5/thumbnails/4.jpg)
4
What does Data Mining Do?
Explores Your Data
Finds Patterns
Performs Prediction
s
![Page 5: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649cdf5503460f949a8c39/html5/thumbnails/5.jpg)
5
• Technologies for analysis of data and discovery of (very) hidden patterns
• Fairly young (<20 years old) but clever algorithms developed through database research
• Uses a combination of statistics, probability, artificial intelligence, machine learning, and database technologies
Data Mining
![Page 6: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649cdf5503460f949a8c39/html5/thumbnails/6.jpg)
6
Typical Uses
Predictive
Analysis
Seek Profitable Customers
Understand
Customer Needs
Anticipate Customer
Churn
Predict Sales &
Inventory
Build Effective Marketing Campaign
s
Detect and
Prevent Fraud
Correct Data
During ETL
![Page 7: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649cdf5503460f949a8c39/html5/thumbnails/7.jpg)
7
DM – Part Of Microsoft SQL Server
![Page 8: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649cdf5503460f949a8c39/html5/thumbnails/8.jpg)
9
Mining Model Mining ModelMining Model
Mining Process
DM EngineDM Engine
Training data
Data to be predictedMining Model
With predictions
![Page 9: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649cdf5503460f949a8c39/html5/thumbnails/9.jpg)
10
• Free add-in for Excel 2007 (and Visio)• Works with 32 bit edition of Office 2010• 64 bit version in the future
• Requires SQL Server Analysis Services
• Analyze Tab – simpler to use• Data Mining Tab – full power
Data Mining Add-Ins for Excel
![Page 10: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649cdf5503460f949a8c39/html5/thumbnails/10.jpg)
11
Clustering
• Applied to • Segmentation:
Customer grouping, mailing campaign
• Also: classification and regression
• Anomaly detection• Discrete and
continuous data
![Page 11: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649cdf5503460f949a8c39/html5/thumbnails/11.jpg)
1212
Demo1. Finding Outliers with
Data Mining
![Page 12: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649cdf5503460f949a8c39/html5/thumbnails/12.jpg)
13
ClusteringOutlier/Anomaly Detection
Male Female
Son
Daughter
Parent
Age
![Page 13: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649cdf5503460f949a8c39/html5/thumbnails/13.jpg)
14
Time Series
• Uses:• Forecast sales• Inventory prediction• Web hits prediction• Stock value
estimation• Regression trees
with extras
![Page 14: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649cdf5503460f949a8c39/html5/thumbnails/14.jpg)
1515
Demo1. Forecasting Sales with
Time Series
![Page 15: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649cdf5503460f949a8c39/html5/thumbnails/15.jpg)
16
Association Rules
• Use for:• Market basket
analysis• Cross selling and
recommendations• Advanced data
exploration• Finds frequent
itemsets and rules• Sensitive to
parameters
![Page 16: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649cdf5503460f949a8c39/html5/thumbnails/16.jpg)
1717
Demo1. Market Basket
Analysis
![Page 17: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649cdf5503460f949a8c39/html5/thumbnails/17.jpg)
18
Sequence Clustering
• Analysis of:• Customer behaviour• Transaction patterns• Click stream• Customer
segmentation• Sequence prediction
![Page 18: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649cdf5503460f949a8c39/html5/thumbnails/18.jpg)
1919
Demo1. Predicting Customer
Behaviour
![Page 19: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649cdf5503460f949a8c39/html5/thumbnails/19.jpg)
22
AlgorithmsAlgorithm Description
Decision Trees Finds the odds of an outcome based on values in a training set
Association Rules
Identifies relationships between cases
Clustering Classifies cases into distinctive groups based on any attribute sets
Naïve Bayes Clearly shows the differences in a particular variable for various data elements
Sequence Clustering
Groups or clusters data based on a sequence of previous events
Time Series Analyzes and forecasts time-based data combining the powerof ARTXP (developed by Microsoft Research) for short-term predictionswith ARIMA (in SQL 2008) for long-term accuracy.
Neural Nets Seeks to uncover non-intuitive relationships in data
Linear Regression
Determines the relationship between columns in order to predict an outcome
Logistic Regression
Determines the relationship between columns in order to evaluate the probability that a column will contain a specific state
![Page 20: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649cdf5503460f949a8c39/html5/thumbnails/20.jpg)
23
Time Series
Sequence Clustering
Neural Nets
Naïve Bayes
Logistic Regression
Linear Regression
Decision Trees
Clustering
Association Rules
Classifica
tion
Estimatio
n
Segmentation
Associa
tion
Forecasti
ng
Text Analysis
Advanced Data
Exploration
![Page 21: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649cdf5503460f949a8c39/html5/thumbnails/21.jpg)
24
• Data Mining is one of several advanced analytical tools of SQL Server 2008 R2
• Excel makes it easier to use• See what is hiding in your data
Summary
![Page 22: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649cdf5503460f949a8c39/html5/thumbnails/22.jpg)
25
© 2011 Microsoft Corporation & Project Botticelli Ltd. All rights reserved.
The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material presented is not certain and may vary based on several factors. Microsoft makes no warranties, express, implied or statutory, as to the information in this presentation.
Portions © 2011 Project Botticelli Ltd & entire material © 2011 Microsoft Corp unless noted otherwise. Some slides contain quotations from copyrighted materials by other authors, as individually attributed or as already covered by Microsoft Copyright ownerships. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Project Botticelli Ltd as of the date of this presentation. Because Project Botticelli & Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft and Project Botticelli cannot guarantee the accuracy of any information provided after the date of this presentation. Project Botticelli makes no warranties, express, implied or statutory, as to the information in this presentation. E&OE.