data mining 2008
DESCRIPTION
This is the deck I used at San Diego SQL UG in May 2008TRANSCRIPT
![Page 1: Data Mining 2008](https://reader033.vdocuments.us/reader033/viewer/2022061110/5452aff3af7959ed5f8b5c98/html5/thumbnails/1.jpg)
SSAS 2008 Data Mining
Lynn Langit/MSDN Developer Evangelist Southern California
http://blogs.msdn.com/SoCalDevGal
![Page 2: Data Mining 2008](https://reader033.vdocuments.us/reader033/viewer/2022061110/5452aff3af7959ed5f8b5c98/html5/thumbnails/2.jpg)
Session Prerequisites Working SQL Server 2005 Developer Understanding of OLAP concepts Working SQL Server Analysis Server
2005 Developer Interest in or basic knowledge of Data
Mining concepts
![Page 3: Data Mining 2008](https://reader033.vdocuments.us/reader033/viewer/2022061110/5452aff3af7959ed5f8b5c98/html5/thumbnails/3.jpg)
Session Objectives and Agenda Understand what, why, when & how of
SQL Server 2008 Data Mining Examine the core functionality of the Data
Mining Extensions Hear about the new and/or advanced
functionality of Data Mining
![Page 4: Data Mining 2008](https://reader033.vdocuments.us/reader033/viewer/2022061110/5452aff3af7959ed5f8b5c98/html5/thumbnails/4.jpg)
Predictive AnalyticsPredictive Analytics
PresentatioPresentationn
ExplorationExploration DiscoveryDiscovery
PassivePassive
InteractiInteractiveve
ProactivProactivee
Role of SoftwareRole of Software
Business Business InsightInsight
Canned reportingCanned reporting
Ad-hoc reportingAd-hoc reporting
OLAPOLAP
Data miningData mining
What and Why Data Mining?
![Page 5: Data Mining 2008](https://reader033.vdocuments.us/reader033/viewer/2022061110/5452aff3af7959ed5f8b5c98/html5/thumbnails/5.jpg)
Cubes vs. Data Mining
![Page 6: Data Mining 2008](https://reader033.vdocuments.us/reader033/viewer/2022061110/5452aff3af7959ed5f8b5c98/html5/thumbnails/6.jpg)
DM - From Scenarios to Tasks
![Page 7: Data Mining 2008](https://reader033.vdocuments.us/reader033/viewer/2022061110/5452aff3af7959ed5f8b5c98/html5/thumbnails/7.jpg)
From Tasks to Techniques
![Page 8: Data Mining 2008](https://reader033.vdocuments.us/reader033/viewer/2022061110/5452aff3af7959ed5f8b5c98/html5/thumbnails/8.jpg)
BI for Everyone
Enterprise – Performance PointEnterprise – Performance Point
Individual – Excel Individual – Excel
Project – Share PointProject – Share Point
![Page 9: Data Mining 2008](https://reader033.vdocuments.us/reader033/viewer/2022061110/5452aff3af7959ed5f8b5c98/html5/thumbnails/9.jpg)
Microsoft’s Predictive Analytics
Data Mining SQL extensionsData Mining SQL extensions(DMX)(DMX)
Application Application DeveloperDeveloper
Data Mining Data Mining SpecialistSpecialist
Microsoft Dynamics CRMMicrosoft Dynamics CRMAnalytics FoundationAnalytics Foundation
SQL Server 2005 SQL Server 2005 Business Intelligence Development StudioBusiness Intelligence Development Studio
Microsoft SQL Server 2008 Analysis ServicesMicrosoft SQL Server 2008 Analysis Services
Information Information WorkerWorker
Data Mining Add-ins for Data Mining Add-ins for the 2007 Microsoft Office systemthe 2007 Microsoft Office system
Microsoft SQL Server 2008 Data MiningMicrosoft SQL Server 2008 Data Mining
BI AnalystBI Analyst
Custom Custom AlgorithmsAlgorithms
![Page 10: Data Mining 2008](https://reader033.vdocuments.us/reader033/viewer/2022061110/5452aff3af7959ed5f8b5c98/html5/thumbnails/10.jpg)
Data Mining Add-ins for Office 2007
Table Analysis Tools for Excel 2007Table Analysis Tools for Excel 2007
Data Mining Template for Visio 2007Data Mining Template for Visio 2007
Data Mining Client for Excel 2007Data Mining Client for Excel 2007
Information Information WorkerWorker
BI AnalystBI Analyst
Data Mining Data Mining SpecialistSpecialist
![Page 11: Data Mining 2008](https://reader033.vdocuments.us/reader033/viewer/2022061110/5452aff3af7959ed5f8b5c98/html5/thumbnails/11.jpg)
SSASSSAS(Data(Data
Mining)Mining)ExcelExcel
SSAS SSAS (DSV)(DSV)QueryQueryExcelExcel
SSISSSISSSASSSASSSRSSSRSExcelExcelYour AppsYour Apps
SSISSSISSSASSSASExcelExcel
Business Business UnderstandiUnderstandi
ngng
Data Data UnderstandiUnderstandi
ngng
Data Data PreparationPreparation
ModelingModeling
EvaluationEvaluation
DeploymentDeployment
DataData
Microsoft Data Mining Lifecycle CRISP-DM
www.crisp-dm.org
![Page 12: Data Mining 2008](https://reader033.vdocuments.us/reader033/viewer/2022061110/5452aff3af7959ed5f8b5c98/html5/thumbnails/12.jpg)
Understand & Prepare specifics
![Page 13: Data Mining 2008](https://reader033.vdocuments.us/reader033/viewer/2022061110/5452aff3af7959ed5f8b5c98/html5/thumbnails/13.jpg)
Demo 1 – Explore / Clean / Partition Data
![Page 14: Data Mining 2008](https://reader033.vdocuments.us/reader033/viewer/2022061110/5452aff3af7959ed5f8b5c98/html5/thumbnails/14.jpg)
Modeling Specifics
![Page 15: Data Mining 2008](https://reader033.vdocuments.us/reader033/viewer/2022061110/5452aff3af7959ed5f8b5c98/html5/thumbnails/15.jpg)
Demo 3 – Modeling
![Page 16: Data Mining 2008](https://reader033.vdocuments.us/reader033/viewer/2022061110/5452aff3af7959ed5f8b5c98/html5/thumbnails/16.jpg)
Evaluation Specifics
![Page 17: Data Mining 2008](https://reader033.vdocuments.us/reader033/viewer/2022061110/5452aff3af7959ed5f8b5c98/html5/thumbnails/17.jpg)
Demo 4 – Evaluation
![Page 18: Data Mining 2008](https://reader033.vdocuments.us/reader033/viewer/2022061110/5452aff3af7959ed5f8b5c98/html5/thumbnails/18.jpg)
Data Mining – Logical Model
Mining ModelMining Model
Mining ModelMining Model
Training DataTraining Data
DB dataDB dataClient dataClient dataApplication dataApplication data
Data MiningData MiningEngineEngine
To To PredictPredict
Predicted DataPredicted Data
Mining ModelMining ModelDB dataDB dataClient dataClient dataApplication dataApplication data““Just one rowJust one row””
Data MiningData MiningEngineEngine
algorithmalgorithm
![Page 19: Data Mining 2008](https://reader033.vdocuments.us/reader033/viewer/2022061110/5452aff3af7959ed5f8b5c98/html5/thumbnails/19.jpg)
Analysis ServicesAnalysis ServicesServerServer
Mining ModelMining Model
Data Mining AlgorithmData Mining Algorithm DataDataSourceSource
Data Mining - Physical Model
Your ApplicationYour Application
OLE DB/ ADOMD/ XMLAOLE DB/ ADOMD/ XMLA
DeploDeployy
BI Dev BI Dev StudioStudio (Visual (Visual Studio)Studio)
App DataApp Data
![Page 20: Data Mining 2008](https://reader033.vdocuments.us/reader033/viewer/2022061110/5452aff3af7959ed5f8b5c98/html5/thumbnails/20.jpg)
Data Mining Interfaces – APIs
Analysis Server (msmdsrv.exe)
OLAP Data Mining
Server ADOMD.NET
.Net Stored Procedures Microsoft Algorithms Third Party Algorithms
XMLAXMLAOver TCP/IPOver TCP/IP
OLEDB for OLAP/DM ADO/DSO
XMLAXMLAOver HTTPOver HTTP
Any Platform, Any Device
C++ App VB App .Net App
AMO
Any App
ADOMD.NET
WANWAN
DM Interfaces
![Page 21: Data Mining 2008](https://reader033.vdocuments.us/reader033/viewer/2022061110/5452aff3af7959ed5f8b5c98/html5/thumbnails/21.jpg)
Configuration Model Creation/Management
Database Administrators Session Mining Models
Model Application Permissions on models Permissions on data sources
![Page 22: Data Mining 2008](https://reader033.vdocuments.us/reader033/viewer/2022061110/5452aff3af7959ed5f8b5c98/html5/thumbnails/22.jpg)
Deployment Browse
Copy to Excel Drillthrough
Query Default Advanced
Excel Services Manage models and structures
Export/Import Rename
Connection Database Trace
![Page 23: Data Mining 2008](https://reader033.vdocuments.us/reader033/viewer/2022061110/5452aff3af7959ed5f8b5c98/html5/thumbnails/23.jpg)
Excel Functions*
DMPREDICTTABLEROW ( Connection, ModelName, PredictionResult, TableRowRange[, string CommaSeparatedColumnNames])
DMPREDICT ( Connection, Model, PredictionResult,
Value1, Name1, [...,Value32, Name32])
DMCONTENTQUERY (Connection, Model, PredictionResult[, WhereClause])
![Page 24: Data Mining 2008](https://reader033.vdocuments.us/reader033/viewer/2022061110/5452aff3af7959ed5f8b5c98/html5/thumbnails/24.jpg)
Data Mining Extensions (DMX)
CREATE MINING MODELCREATE MINING MODEL CreditRiskCreditRisk
(CustID(CustID LONG KEY, LONG KEY,
Gender TEXT DISCRETE,Gender TEXT DISCRETE,
Income Income LONG LONG CONTINUOUS,CONTINUOUS,
Profession TEXT DISCRETE,Profession TEXT DISCRETE,
RiskRisk TEXT DISCRETE PREDICT) TEXT DISCRETE PREDICT)
USINGUSING Microsoft_Decision_Trees Microsoft_Decision_Trees
INSERT INTOINSERT INTO CreditRisk CreditRisk
(CustId, Gender, Income, (CustId, Gender, Income, Profession, Risk)Profession, Risk)
Select Select
CustomerID, Gender, Income, CustomerID, Gender, Income, Profession,RiskProfession,Risk
From CustomersFrom Customers
SelectSelect NewCustomers.CustomerID, NewCustomers.CustomerID, CreditRisk.Risk, CreditRisk.Risk, PredictProbability(CreditRisk.Risk)PredictProbability(CreditRisk.Risk)
FROMFROM CreditRisk CreditRisk PREDICTION JOINPREDICTION JOIN NewCustomersNewCustomers
ONON CreditRisk.Gender=NewCustomer.GenderCreditRisk.Gender=NewCustomer.Gender
ANDAND CreditRisk.Income=NewCustomer.Income CreditRisk.Income=NewCustomer.Income
AND AND CreditRisk.Profession=NewCustomer.ProfessionCreditRisk.Profession=NewCustomer.Profession
![Page 25: Data Mining 2008](https://reader033.vdocuments.us/reader033/viewer/2022061110/5452aff3af7959ed5f8b5c98/html5/thumbnails/25.jpg)
DMX Column Expressions Predictable Columns Source Data Columns Functions - Predict
“Workhorse”Discrete scalar valuesContinuous scalar valuesAssociative nested tablesSequence nested tablesTime SeriesOverloaded to
PredictAssociationPredictSequencePredictTimeSeries
PredictProbability PredictSupport PredictHistogram Cluster ClusterProbability GetNodeId IsInNode
Arithmetic operators Stored Procedure Subselect
Select from nested tables
![Page 26: Data Mining 2008](https://reader033.vdocuments.us/reader033/viewer/2022061110/5452aff3af7959ed5f8b5c98/html5/thumbnails/26.jpg)
Data Mining Interfaces – XMLA ++
Analysis Server (msmdsrv.exe)
OLAP Data Mining
Server ADOMD.NET
.Net Stored Procedures Microsoft Algorithms Third Party Algorithms
XMLAXMLAOver TCP/IPOver TCP/IP
OLEDB for OLAP/DM ADO/DSO
XMLAXMLAOver HTTPOver HTTP
Any Platform, Any Device
C++ App VB App .Net App
AMO
Any App
ADOMD.NET
WANWAN
DM Interfaces
![Page 27: Data Mining 2008](https://reader033.vdocuments.us/reader033/viewer/2022061110/5452aff3af7959ed5f8b5c98/html5/thumbnails/27.jpg)
New to SQL Server 2008 Microsoft Time Series algorithm improved
ARIMA plus ARTxp method, and a blending algorithm = better results
New prediction mode allows adding new data to time series models
Holdout Support added Easily partition data into training and test sets that are stored in mining structure & available to
query after processing
Ability to build mining models based on filtered subsets added Results in less structures, i.e. can just filter existing
Drillthrough functionality extended makes all mining structure columns available, not just columns included in the model
allows you to build more compact models
Cross-validation added allows users to quickly validate their modeling approach by automatically building temporary
models and evaluating accuracy measures across K folds. The feature is available through a new cross-validation tab under Accuracy Charts in BIDS, in addition to being accessible programmatically via a stored procedure call.
![Page 28: Data Mining 2008](https://reader033.vdocuments.us/reader033/viewer/2022061110/5452aff3af7959ed5f8b5c98/html5/thumbnails/28.jpg)
Summary
Data Mining in SQL Server 2008 is mature, powerful and accessible
Can use Excel 2007 Familiar client for BI – OLAP cubes AND Data Mining
models Model Creators / Users Excel Data or Server Data
SSAS and Excel both support the full DM Cycle Data Understanding Data Preparation Modeling Validation Deployment
![Page 29: Data Mining 2008](https://reader033.vdocuments.us/reader033/viewer/2022061110/5452aff3af7959ed5f8b5c98/html5/thumbnails/29.jpg)
DM Webcasts
Fri, 02 Nov 2007MSDN Webcast: Build Smart Web Applications with SQL Server Data Mining (Level 200)Thu, 08 Nov 2007MSDN Webcast: Building Adaptive Applications with SQL Server Data Mining (Level 300)Mon, 19 Nov 2007MSDN Webcast: Extending and Customizing SQL Server Data Mining (Level 300)Fri, 30 Nov 2007MSDN Webcast: Creating Visualizations for SQL Server Data Mining (Level 300)Thu, 01 Nov 2007TechNet Webcast: Deliver Actionable Insight Throughout Your Organization with Data Mining (Part 1 of 3): Your First Project with SQL Server Data Mining (Level 200)Thu, 15 Nov 2007TechNet Webcast: Deliver Actionable Insight Throughout Your Organization with Data Mining (Part 2 of 3): Understand SQL Server Data Mining Add-ins for the 2007 Office System (Level 200)Thu, 29 Nov 2007TechNet Webcast: Deliver Actionable Insight Throughout Your Organization with Data Mining (Part 3 of 3): Use Predictive Intelligence to Create Smarter KPIs (Level 200)
![Page 30: Data Mining 2008](https://reader033.vdocuments.us/reader033/viewer/2022061110/5452aff3af7959ed5f8b5c98/html5/thumbnails/30.jpg)
BI Resources from Lynn Langit
1. “Foundations of SQL Server 2005 Business Intelligence”(published by APress in April 2007)
2.2. http://blogs.msdn.com/SoCalDevGal
3.3. “Building Business Intelligence “Building Business Intelligence Solutions with SQL Server 2008” Solutions with SQL Server 2008” (MSPress Fall 2008)(MSPress Fall 2008)
![Page 31: Data Mining 2008](https://reader033.vdocuments.us/reader033/viewer/2022061110/5452aff3af7959ed5f8b5c98/html5/thumbnails/31.jpg)
DM Resources
Technical Communities, Webcasts, Blogs, Chats & User Groupshttp://www.microsoft.com/communities/default.mspx
Microsoft Developer Network (MSDN) & TechNet http://microsoft.com/msdn http://microsoft.com/technet
Trial Software and Virtual Labshttp://www.microsoft.com/technet/downloads/trials/default.mspx
Microsoft Learning and Certificationhttp://www.microsoft.com/learning/default.mspx
SQL Server Data Mininghttp://www.sqlserverdatamining.comhttp://www.microsoft.com/bi/bicapabilities/data-mining.aspxhttp://www.microsoft.com/bi/bicapabilities/data-mining.aspx