Forecast Anything! The Seven Data Mining Models Andy Cheung ISV Developer Evangelist Microsoft Hong Kong


Page 1

Forecast Anything! The Seven Data Mining Models

Andy Cheung, ISV Developer Evangelist, Microsoft Hong Kong

Page 2

Agenda

Announcement

Overview

Microsoft Mining Model Algorithms

Lucky Draw!!!

Page 3

Announcement

Learn Microsoft Technologies and Win Some Prizes!

To make it easier for you to learn Microsoft technologies, we have changed the way we deliver seminar content by offering you Offline Webcast CDs.

  • 3 CDs in 6 months – 3 topics and assessment
  • If you pass the assessment criteria, you will receive a $150 Park’n Shop cash coupon!

Since this is a trial offer, the maximum number of participants will be limited to 50 (on a first-come, first-served basis). Register now by sending email to the Microsoft Macau Team at [email protected]!

Page 4

Data Mining Overview

Microsoft Data Mining Algorithms

Explores Your Data

Finds Patterns

Performs Predictions

Page 5

Microsoft Mining Model Algorithms

Decision Trees

Naive Bayes

Cluster Analysis

Sequence Clustering

Association Rules

Time Series

Neural Networks

Page 6

Decision Trees

Classify each case into one of a few discrete, broad categories of the selected attributes

The tree is built by recursive partitioning – splitting the data into partitions and then splitting each partition further

Initially all cases are in one big box

Page 7

Decision Trees

The algorithm tries all possible breaks in classes using all possible values of each input attribute; it then selects the split that partitions the data into the purest classes of the searched variable

Several measures of purity

Then it repeats the splitting for each new class
  • Again testing all possible breaks

Branches of the tree that are not useful can be pre-pruned or post-pruned
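One step of the split search described above can be sketched as follows. This is a minimal illustration only: it assumes Gini impurity as the purity measure (the slide does not say which measure is used), handles only discrete attributes, and the churn cases and column names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 for a pure group, larger for mixed groups."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def best_split(cases, inputs, target):
    """Try every (attribute, value) break and return the purest one."""
    best = None
    for attr in inputs:
        for value in sorted({case[attr] for case in cases}):
            left = [c[target] for c in cases if c[attr] == value]
            right = [c[target] for c in cases if c[attr] != value]
            if not left or not right:
                continue  # everything fell on one side, so this is not a split
            # Size-weighted impurity of the two partitions
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(cases)
            if best is None or score < best[0]:
                best = (score, attr, value)
    return best

# Hypothetical cases: predict "churn" from two input attributes
cases = [
    {"age": "young", "contract": "monthly", "churn": "yes"},
    {"age": "young", "contract": "yearly", "churn": "no"},
    {"age": "old", "contract": "monthly", "churn": "yes"},
    {"age": "old", "contract": "yearly", "churn": "no"},
]
score, attr, value = best_split(cases, ["age", "contract"], "churn")
print(attr, value, score)
```

A full tree builder would call `best_split` again on each resulting partition until the partitions are pure or a pruning rule stops it.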

Page 8

Decision Trees

Decision trees are used for classification and prediction

Typical questions:
  • Predict which customers will leave
  • Help in mailing and promotion campaigns
  • Explain reasons for a decision
  • What are the movies young female customers like to buy?

Page 9

Microsoft Mining Models

Page 10

Naïve Bayes

Classification and Prediction Model

Calculates probabilities for each possible state of the input attribute given each state of the predictable attribute
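The counting behind those conditional probabilities can be sketched as below. This is an illustrative sketch, not the Microsoft implementation: it uses no smoothing, and the loan data, column names, and function names are hypothetical.

```python
from collections import Counter, defaultdict

def train(cases, inputs, target):
    """Count target states and input states per target state."""
    priors = Counter(case[target] for case in cases)
    cond = defaultdict(Counter)  # (attribute, target state) -> input state counts
    for case in cases:
        for attr in inputs:
            cond[(attr, case[target])][case[attr]] += 1
    return priors, cond

def predict(case, priors, cond, inputs):
    """Score each target state as P(state) * product of P(input | state)."""
    total = sum(priors.values())
    scores = {}
    for state, count in priors.items():
        score = count / total
        for attr in inputs:
            score *= cond[(attr, state)][case[attr]] / count
        scores[state] = score
    return max(scores, key=scores.get), scores

# Hypothetical loan applications
cases = [
    {"income": "high", "owns_home": "yes", "approve": "yes"},
    {"income": "high", "owns_home": "no", "approve": "yes"},
    {"income": "low", "owns_home": "yes", "approve": "yes"},
    {"income": "low", "owns_home": "no", "approve": "no"},
    {"income": "low", "owns_home": "no", "approve": "no"},
]
inputs = ["income", "owns_home"]
priors, cond = train(cases, inputs, "approve")
label, scores = predict({"income": "low", "owns_home": "no"}, priors, cond, inputs)
print(label, scores)
```

The "naive" part is the multiplication inside `predict`: each input attribute is treated as independent of the others once the target state is known.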

Page 11

Naïve Bayes

Used for classification

Assign new cases to predefined classes

Some typical questions:
  • Categorize bank loan applications
  • Determine which home telephone lines are used for Internet access
  • Assign customers to predefined segments
  • Quickly gain a basic understanding of the data

Page 12

Cluster Analysis

Grouping data into clusters

Objects within a cluster have high similarity based on the attribute values

The class label of each object is not known

Several techniques:
  • Partitioning methods
  • Hierarchical methods
  • Density-based methods
  • Model-based methods, more…
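As an illustration of the first family in the list, here is a minimal k-means sketch, which is one well-known partitioning method. The slide does not name a specific algorithm, so this choice is an assumption; the 2-D points and the starting centers are hypothetical, and the starting centers are supplied by the caller to keep the run deterministic.

```python
def kmeans(points, centers, iterations=10):
    """Plain k-means on 2-D points with caller-supplied starting centers."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda i: (p[0] - centers[i][0]) ** 2 + (p[1] - centers[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members
        new_centers = []
        for center, members in zip(centers, clusters):
            if members:
                new_centers.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                new_centers.append(center)  # leave an empty cluster's center in place
        if new_centers == centers:
            break  # assignments can no longer change
        centers = new_centers
    return centers, clusters

# Hypothetical data: two visibly separate groups, no class labels given
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, clusters = kmeans(points, [(1, 1), (8, 8)])
print(centers)
print(clusters)
```

Note that no label column is used anywhere: the groups come only from similarity of attribute values.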

Page 13

Cluster Analysis

Segments a heterogeneous population into a number of more homogeneous subgroups or clusters

Some typical questions:
  • Discover distinct groups of customers
  • Identify groups of houses in a city
  • In biology, derive animal and plant taxonomies

Page 14

Sequence Clustering

Analyzes sequence-oriented data that contains discrete-valued series

The sequence attribute in the series holds a set of events in a specific order that can be considered as a model

Typically used for Web customer analysis

Can be used for any other sequential data

Page 15

Sequence Clustering

User  Sequence
1     frontpage news travel travel
2     news news news news news
3     frontpage news frontpage news frontpage
4     news news
5     frontpage news news travel travel travel
6     news weather weather weather weather
7     news health health business business business
8     frontpage sports sports sports weather
9     weather

Click-Stream Analysis
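One common way to summarize ordered click data like this is a first-order transition table: for each page, how often each next page follows it. The sketch below shows only that counting step on the nine sequences above. It does not perform any clustering, and it is not presented as the algorithm the product uses; the function name is hypothetical.

```python
from collections import Counter, defaultdict

sequences = [
    "frontpage news travel travel",
    "news news news news news",
    "frontpage news frontpage news frontpage",
    "news news",
    "frontpage news news travel travel travel",
    "news weather weather weather weather",
    "news health health business business business",
    "frontpage sports sports sports weather",
    "weather",
]

def transition_probabilities(seqs):
    """Estimate P(next page | current page) from ordered page visits."""
    counts = defaultdict(Counter)
    for seq in seqs:
        pages = seq.split()
        # Pair each page with the page that follows it
        for current, following in zip(pages, pages[1:]):
            counts[current][following] += 1
    return {
        page: {nxt: n / sum(nexts.values()) for nxt, n in nexts.items()}
        for page, nexts in counts.items()
    }

probs = transition_probabilities(sequences)
print(probs["frontpage"])
```

In this data, four of the five clicks that leave `frontpage` go to `news`, so the estimated probability of that transition is 0.8.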

Page 16

Microsoft Mining Models

Page 17

Association Rules

For market basket analyses

Identify cross-selling opportunities

Arrange attractive packages

Considers each attribute/value pair as an item

An item set is a combination of items in a single transaction

The algorithm scans through the dataset trying to find item sets that tend to appear in many transactions

Page 18

Association Rules – Support

Support is the percentage of rows containing the item combination compared to the total number of rows:

  Transaction 1: Frozen pizza, cola, milk
  Transaction 2: Milk, potato chips
  Transaction 3: Cola, frozen pizza
  Transaction 4: Milk, pretzels
  Transaction 5: Cola, pretzels

The support for the rule “If a customer purchases Cola, then they will purchase Frozen Pizza” is 40%, because 2 of the 5 transactions contain both items
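The 40% figure can be checked with a few lines of code. This is a minimal sketch using the five transactions above; the function name `support` is just an illustrative choice.

```python
transactions = [
    {"frozen pizza", "cola", "milk"},
    {"milk", "potato chips"},
    {"cola", "frozen pizza"},
    {"milk", "pretzels"},
    {"cola", "pretzels"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

cola_and_pizza = support({"cola", "frozen pizza"}, transactions)
print(cola_and_pizza)  # 0.4
```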

Page 19

Association Rules – Confidence

What if 100% of customers buy milk and only 20% of those buy potato chips?

The confidence of an association rule is the support for the combination divided by the support for the condition

In the five transactions on the previous slide, milk appears in 60% of rows and milk together with potato chips in 20%, so the rule “If a customer purchases Milk, they will purchase Potato Chips” has a confidence of (20% / 60%) = 33%
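The same calculation as a minimal sketch, using the five transactions from the support slide. The function name `confidence` and the error handling are illustrative assumptions.

```python
transactions = [
    {"frozen pizza", "cola", "milk"},
    {"milk", "potato chips"},
    {"cola", "frozen pizza"},
    {"milk", "pretzels"},
    {"cola", "pretzels"},
]

def confidence(condition, consequence, transactions):
    """Of the transactions containing condition, the share that also contain consequence."""
    with_condition = [t for t in transactions if condition <= t]
    if not with_condition:
        raise ValueError("condition never occurs, so confidence is undefined")
    with_both = [t for t in with_condition if consequence <= t]
    return len(with_both) / len(with_condition)

milk_then_chips = confidence({"milk"}, {"potato chips"}, transactions)
print(round(milk_then_chips, 2))  # 0.33
```

Dividing the two counts is equivalent to dividing the two supports, because both supports share the same denominator (the total number of rows).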

Page 20

Time Series

Predict continuous columns, such as product sales or stock performance, in a forecasting scenario

Builds a model in two stages:
  • First stage creates a list of optimal candidate input columns
  • Second stage investigates each candidate input column and determines if it improves the model
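To make "predicting a continuous column from its own history" concrete, here is a deliberately simple stand-in: a lag-1 autoregression fitted by least squares. It is not the two-stage procedure described above and it does no candidate-column selection; the sales figures and function names are hypothetical.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = a + b * y[t-1]."""
    x = series[:-1]  # previous values
    y = series[1:]   # values to predict
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def forecast(series, steps):
    """Roll the fitted equation forward, feeding each prediction back in."""
    a, b = fit_ar1(series)
    last = series[-1]
    predictions = []
    for _ in range(steps):
        last = a + b * last
        predictions.append(last)
    return predictions

# Hypothetical monthly sales growing by 10% per month
sales = [100.0, 110.0, 121.0, 133.1, 146.41]
a, b = fit_ar1(sales)
next_month = forecast(sales, 1)[0]
print(round(b, 3), round(next_month, 3))
```

A real forecasting model would typically consider several lags and other input columns, which is where a candidate-selection stage like the one on the slide comes in.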

Page 21

Microsoft Mining Models

Page 22

Neural Network

Data modeling tool that is able to capture and represent complex input/output relationships

Neural networks resemble the human brain in the following two ways:
  • A neural network acquires knowledge through learning
  • A neural network's knowledge is stored within inter-neuron connection strengths known as synaptic weights

It explores all possible data relationships

It is slow

Page 23

Back-Propagation

Training a neural network means finding the best weights on the inputs of each of the units

The back-propagation process:
  • Get a training example and calculate outputs
  • Calculate the error – the difference between the calculated and the expected (known) result
  • Adjust the weights to minimize the error
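The three steps above can be sketched for the simplest possible case: a single sigmoid unit trained on logical OR. This is an assumption-laden illustration rather than a full network. With only one unit there are no hidden layers to propagate the error back through, and the learning rate, epoch count, and training data are hypothetical choices.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, inputs):
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train(examples, epochs=5000, rate=0.5):
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, expected in examples:
            # Step 1: take a training example and calculate the output
            output = predict(weights, bias, inputs)
            # Step 2: error = calculated result minus expected result
            error = output - expected
            # Step 3: adjust weights along the gradient of the squared error
            delta = error * output * (1.0 - output)
            weights = [w - rate * delta * x for w, x in zip(weights, inputs)]
            bias -= rate * delta
    return weights, bias

# Logical OR as a tiny training set
examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train(examples)
for inputs, expected in examples:
    print(inputs, expected, round(predict(weights, bias, inputs), 3))
```

In a multi-layer network the same `delta` idea is applied layer by layer, passing each unit's share of the error backwards, which is where the name comes from.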

Page 24

Conclusion: When To Use What

Classification: Assign cases to predefined classes
  Examples: Credit risk analysis, Churn analysis, Customer retention
  Algorithms: Decision Trees, Naive Bayes, Neural Nets

Segmentation: Taxonomy for grouping similar cases
  Examples: Customer profile analysis, Mailing campaign
  Algorithms: Clustering, Sequence Clustering

Association: Advanced counting for correlations
  Examples: Market basket analysis, Advanced data exploration
  Algorithms: Decision Trees, Association

Time Series Forecasting: Predict the future
  Examples: Forecast sales, Predict stock prices
  Algorithms: Time Series

Prediction: Predict a value for a new case based on values for similar cases
  Examples: Quote insurance rates, Predict customer income
  Algorithms: All

Deviation analysis: Discover how a case or segment differs from others
  Examples: Credit card fraud detection, Network intrusion analysis
  Algorithms: All

Page 25

Page 26

© 2004 Microsoft Corporation. All rights reserved.
This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.