data mining for developers

29
BIN06-IS Understanding the Data Mining Add-Ins for Excel 2007 Lynn Langit MSDN Developer Evangelist – Southern California http://blogs.msdn.com/SoCalDevGal

Upload: llangit

Post on 07-Nov-2014

5.492 views

Category:

Business


1 download

DESCRIPTION

 

TRANSCRIPT

Page 1: Data Mining for Developers

BIN06-IS Understanding the Data Mining Add-Ins for Excel 2007Lynn LangitMSDN Developer Evangelist – Southern Californiahttp://blogs.msdn.com/SoCalDevGal

Page 2: Data Mining for Developers

Session Prerequisites

• Working SQL Server 2005 Developer• Understanding of OLAP concepts• Working SQL Server Analysis Server 2005

Developer• Interest in or basic knowledge of Data Mining

concepts

Page 3: Data Mining for Developers

Session Objectives and Agenda• Understand how to set up a development

environment for working with Excel 2007 Data Mining Extensions

• Understand the core functionality of the Data Mining extensions

• Understand the advanced functionality of the Data Mining extensions.

Page 4: Data Mining for Developers

Predictive AnalyticsPredictive AnalyticsPredictive AnalyticsPredictive Analytics

PresentatioPresentationn

ExplorationExploration DiscoveryDiscovery

PassivePassive

InteractiInteractiveve

ProactivProactivee

Role of SoftwareRole of Software

Business Business InsightInsight

Canned reportingCanned reporting

Ad-hoc reportingAd-hoc reporting

OLAPOLAP

Data miningData mining

What and Why Data Mining?

Page 5: Data Mining for Developers

Data Mining Problems

Page 6: Data Mining for Developers

From Scenarios to Tasks

Page 7: Data Mining for Developers

From Tasks to Techniques

Page 8: Data Mining for Developers

Microsoft’s Predictive Analytics

Data Mining SQL extensionsData Mining SQL extensions(DMX)(DMX)

Application Application DeveloperDeveloper

Data Mining Data Mining SpecialistSpecialist

Microsoft Dynamics CRMMicrosoft Dynamics CRMAnalytics FoundationAnalytics Foundation

SQL Server 2005 SQL Server 2005 Business Intelligence Development StudioBusiness Intelligence Development Studio

Microsoft SQL Server 2005 Analysis ServicesMicrosoft SQL Server 2005 Analysis Services

Information Information WorkerWorker

Data Mining Add-ins for Data Mining Add-ins for the 2007 Microsoft Office systemthe 2007 Microsoft Office system

Microsoft SQL Server 2005 Data MiningMicrosoft SQL Server 2005 Data Mining

BI AnalystBI Analyst

Custom Custom AlgorithmsAlgorithms

Page 9: Data Mining for Developers

Data Mining Add-ins for Office 2007

Table Analysis Tools for Excel 2007Table Analysis Tools for Excel 2007

Data Mining Template for Visio 2007Data Mining Template for Visio 2007

Data Mining Client for Excel 2007Data Mining Client for Excel 2007

Page 10: Data Mining for Developers

SSASSSAS(Data(Data

Mining)Mining)ExcelExcel

SSAS SSAS (DSV)(DSV)QueryQueryExcelExcel

SSISSSISSSASSSASSSRSSSRSExcelExcelYour AppsYour Apps

SSISSSISSSASSSASExcelExcel

Business Business UnderstandiUnderstandi

ngng

Data Data UnderstandiUnderstandi

ngng

Data Data PreparationPreparation

ModelingModeling

EvaluationEvaluation

DeploymentDeployment

DataData

Microsoft Data Mining Lifecycle CRISP-DM

www.crisp-dm.org

Page 11: Data Mining for Developers

Understand & Prepare specifics

Page 12: Data Mining for Developers

Demo 1 – Explore / Clean / Partition Data

Page 13: Data Mining for Developers

Modeling Specifics

Page 14: Data Mining for Developers

Demo 3 – Modeling

Page 15: Data Mining for Developers

Evaluation Specifics

Page 16: Data Mining for Developers

Demo 4 – Evaluation

Page 17: Data Mining for Developers

Configuration

• Model Creation/Management• Database Administrators• Session Mining Models

• Model Application• Permissions on models• Permissions on data sources

Page 18: Data Mining for Developers

Deployment• Browse

• Copy to Excel• Drillthrough

• Query• Default• Advanced

• Excel Services• Manage models and structures

• Export/Import• Rename

• Connection• Database• Trace

Page 19: Data Mining for Developers

Advanced Techniques - DMX

Page 20: Data Mining for Developers

Excel Functions*

DMPREDICTTABLEROW ( Connection, ModelName, PredictionResult, TableRowRange[, string CommaSeparatedColumnNames])

DMPREDICT ( Connection, Model, PredictionResult,

Value1, Name1, [...,Value32, Name32])

DMCONTENTQUERY (Connection, Model, PredictionResult[, WhereClause])

Page 21: Data Mining for Developers

Data Mining Extensions (DMX) CREATE MINING MODELCREATE MINING MODEL CreditRiskCreditRisk

(CustID(CustID LONG KEY, LONG KEY,

Gender TEXT DISCRETE,Gender TEXT DISCRETE,

Income Income LONG LONG CONTINUOUS,CONTINUOUS,

Profession TEXT DISCRETE,Profession TEXT DISCRETE,

RiskRisk TEXT DISCRETE PREDICT) TEXT DISCRETE PREDICT)

USINGUSING Microsoft_Decision_Trees Microsoft_Decision_Trees

INSERT INTOINSERT INTO CreditRisk CreditRisk

(CustId, Gender, Income, (CustId, Gender, Income, Profession, Risk)Profession, Risk)

Select Select

CustomerID, Gender, Income, CustomerID, Gender, Income, Profession,RiskProfession,Risk

From CustomersFrom Customers

SelectSelect NewCustomers.CustomerID, NewCustomers.CustomerID, CreditRisk.Risk, CreditRisk.Risk, PredictProbability(CreditRisk.Risk)PredictProbability(CreditRisk.Risk)

FROMFROM CreditRisk CreditRisk PREDICTION JOINPREDICTION JOIN NewCustomersNewCustomers

ONON CreditRisk.Gender=NewCustomer.GenderCreditRisk.Gender=NewCustomer.Gender

ANDAND CreditRisk.Income=NewCustomer.Income CreditRisk.Income=NewCustomer.Income

AND AND CreditRisk.Profession=NewCustomer.ProfessionCreditRisk.Profession=NewCustomer.Profession

Page 22: Data Mining for Developers

DMX Column Expressions

• Predictable Columns• Source Data Columns• Functions

Predict“Workhorse”Discrete scalar valuesContinuous scalar valuesAssociative nested tablesSequence nested tablesTime SeriesOverloaded to

PredictAssociationPredictSequencePredictTimeSeries

• PredictProbability• PredictSupport• PredictHistogram• Cluster• ClusterProbability• GetNodeId• IsInNode

• Arithmetic operators• Stored Procedure• Subselect

• Select from nested tables

Page 23: Data Mining for Developers

Data Mining Interfaces – XMLA ++

Analysis Server (msmdsrv.exe)

OLAP Data Mining

Server ADOMD.NET

.Net Stored Procedures Microsoft Algorithms Third Party Algorithms

XMLAXMLAOver TCP/IPOver TCP/IP

OLEDB for OLAP/DM ADO/DSO

XMLAXMLAOver HTTPOver HTTP

Any Platform, Any Device

C++ App VB App .Net App

AMO

Any App

ADOMD.NET

WANWAN

DM Interfaces

Page 24: Data Mining for Developers

Summary

• Familiar client for SQL Server Data Mining• Model Creators• Model Users• Excel Data or Server Data

• Implement the full DM Cycle• Data Understanding• Data Preparation• Modeling• Validation• Deployment

Page 25: Data Mining for Developers

Resources

Technical Communities, Webcasts, Blogs, Chats & User Groupshttp://www.microsoft.com/communities/default.mspx

Microsoft Developer Network (MSDN) & TechNet http://microsoft.com/msdn http://microsoft.com/technet

Trial Software and Virtual Labshttp://www.microsoft.com/technet/downloads/trials/default.mspx

Microsoft Learning and Certificationhttp://www.microsoft.com/learning/default.mspx

SQL Server Data Mininghttp://www.sqlserverdatamining.comhttp://www.microsoft.com/bi/bicapabilities/data-mining.aspxhttp://www.microsoft.com/bi/bicapabilities/data-mining.aspx

Page 26: Data Mining for Developers

BI Resources from Lynn Langit

Foundations of SQL Server 2005 Business Intelligencepublished by Apress in April 2007

Blog: http://blogs.msdn.com/SoCalDevGalBlog: http://blogs.msdn.com/SoCalDevGal

Page 27: Data Mining for Developers

Q&A

Page 28: Data Mining for Developers

• BIN302 Microsoft Office Excel and Analysis Services: An In-Depth Look at Integration

• OFF312 Using Data in Excel Solutions Built with Visual Studio Tools for the Office System

Related Content

Page 29: Data Mining for Developers

© 2007 Microsoft Corporation. All rights reserved.This presentation is for informational purposes only.MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.