Data Mining as a BI Tool

Upload: shannon-george

Post on 26-Dec-2015


TRANSCRIPT

Page 1: Data Mining as a BI Tool

[Figure: overview of Business Intelligence. Labels: Data Analysis; Data Extraction; Visualisation; Exploration; Discovery; Reporting / EIS / MIS; OLAP; Data Mining; Collecting / Transforming; Data Storage (Storing / Aggregating / Historising).]

Page 2: OLAP vs. Data Mining

• OLAP verifies hypotheses: the analyst intuits the result and guides the process.
• Data Mining discovers hypotheses: the data determine the results.

[Figure: flowchart of the two approaches around a shared DB. OLAP: Formulate hypothesis H; Formulate query for H; Query result; H valid? (Y/N). Data Mining: Formulate business problem; Select DM method; Discovered hypotheses; Useful? (Y/N). Outcome: Actionable business knowledge.]

Page 3: Input-Output View

[Figure: input-output diagram of Data Mining. Labels: Business Knowledge; Data (internal & external); Objective(s); Decision Models; Reports; New Knowledge.]

Page 4: What Kind of Output?

• Decision trees
• Rules
• Web

[Figure: web diagram linking Product A, Product B, Product C, Product D, Product E, Product F and Product G.]

Page 5: Data Mining

Operationalization of Machine Learning, with two specific emphases:
• Emphasis on process
• Emphasis on action

Page 6: From Data to Action

Data (raw)
• Lifestyle
• Transactions
• Socio-demographics

Information
• Mrs X buys product Y
• Product X costs Y francs
• Mr X drives a car of type Y
• Dr X performed Y operations of type T

Knowledge
• People who buy product X also buy product Y, P% of the time
• Doctors who perform in excess of N operations of type T per month may be fraudulent
• Molecules of class X are most likely carcinogenic

Actions
• Offer product Y to owners of product X
• Investigate potential frauds

Page 7: Process View

Raw Data → Selected Data → Pre-processed Data → Patterns / Models

Steps (with the credit example from the slide):
• Business Problem Formulation: determine credit worthiness
• Understanding Domain & Data: learn about loans, repayments, etc.; collect data about past performance
• Data Pre-processing: aggregate individual incomes into household income
• Model Building: build a decision tree
• Interpretation & Evaluation: check against hold-out set
• Dissemination & Deployment
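The credit example in the process view can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the records are invented toy data, the function names are hypothetical, and the "decision tree" is reduced to a single split on household income (a one-level tree, or stump) so that the sketch needs nothing beyond the standard library.

```python
from collections import defaultdict

# Invented toy records: (household_id, individual_income, household_repaid_loan)
RECORDS = [
    ("h1", 20, True), ("h1", 25, True),
    ("h2", 12, False),
    ("h3", 30, True),
    ("h4", 8, False), ("h4", 9, False),
    ("h5", 40, True),
    ("h6", 15, False),
    ("h7", 22, True), ("h7", 10, True),
    ("h8", 18, False),
]

def aggregate_households(records):
    """Pre-processing: sum individual incomes into one household income."""
    income = defaultdict(int)
    repaid = {}
    for household, amount, label in records:
        income[household] += amount
        repaid[household] = label
    return [(income[h], repaid[h]) for h in sorted(income)]

def accuracy(rows, threshold):
    """Share of rows where 'income >= threshold' agrees with the repaid label."""
    hits = sum((inc >= threshold) == label for inc, label in rows)
    return hits / len(rows)

def fit_stump(rows):
    """Model building: pick the income threshold with the best training accuracy."""
    candidates = sorted({inc for inc, _ in rows})
    return max(candidates, key=lambda t: accuracy(rows, t))

households = aggregate_households(RECORDS)
train, holdout = households[:6], households[6:]   # hold-out set kept aside
threshold = fit_stump(train)
holdout_accuracy = accuracy(holdout, threshold)   # evaluation step
print(threshold, holdout_accuracy)
```

The point of the split into `train` and `holdout` is the one the slide makes: the model is judged on households it did not see while the threshold was chosen.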

Page 8: Key Success Factors

• Have a clearly articulated business problem that needs to be solved and for which Data Mining is the adequate technology
• Ensure that the problem being pursued is supported by the right type of data of sufficient quality and in sufficient quantity
• Recognise that Data Mining is a process with many components and dependencies
• Plan to learn from the Data Mining process whatever the outcome

Page 9: Myths (I)

Myth: Data Mining produces surprising results that will utterly transform your business
Reality:
• Early results = scientific confirmation of human intuition.
• Beyond = steady improvement to an already successful organisation.
• Occasionally = discovery of one of those rare « breakthrough » facts.

Myth: Data Mining techniques are so sophisticated that they can substitute for domain knowledge or for experience in analysis and model building
Reality:
• Data Mining = joint venture.
• Close cooperation between experts in modeling and using the associated techniques, and people who understand the business.

Page 10: Myths (II)

Myth: Data Mining is useful only in certain areas, such as marketing, sales, and fraud detection
Reality:
• Data mining is useful wherever data can be collected.
• All that is really needed is data and a willingness to « give it a try. »
• There is little to lose…

Myth: Only massive databases are worth mining
Reality:
• A moderately-sized or small data set can also yield valuable information.
• It is not only the quantity, but also the quality of the data that matters (e.g. characterising mutagenic compounds)

Page 11: Myths (III)

Myth: The methods used in Data Mining are fundamentally different from the older quantitative model-building techniques
Reality:
• All methods now used in data mining are natural extensions and generalisations of analytical methods known for decades.
• What is new in data mining is that we are now applying these techniques to more general business problems.

Myth: Data Mining is an extremely complex process
Reality:
• The algorithms of data mining may be complex, but new tools and well-defined methodologies have made those algorithms easier to apply.
• Much of the difficulty in applying data mining comes from the same data organisation issues that arise when using any modeling techniques.

Page 12: OLAP vs. DM Illustration

Page 13: Data Mining with OLAP (I)

• Formulate hypothesis
  – Beer and fish sell well together
• Issue corresponding queries
  – TC = select COUNT of all baskets containing both beer and fish
• Decide on validity
  – Ratio of TC over baskets containing only beer or only fish, AND other possible associations
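The hypothesis-driven check for beer and fish can be sketched as follows. The baskets are invented toy data, `check_pair` is a hypothetical helper, and the ratio shown (TC divided by the number of baskets holding exactly one of the two products) is just one reading of the slide's validity test.

```python
# Invented toy baskets; products other than beer and fish are placeholders.
BASKETS = [
    {"beer", "fish"},
    {"beer", "fish", "bread"},
    {"beer"},
    {"fish", "milk"},
    {"bread", "milk"},
    {"beer", "fish", "milk"},
]

def check_pair(baskets, a, b):
    """Count baskets with both products and with only one of them."""
    both = sum(1 for basket in baskets if a in basket and b in basket)      # TC
    only_a = sum(1 for basket in baskets if a in basket and b not in basket)
    only_b = sum(1 for basket in baskets if b in basket and a not in basket)
    singles = only_a + only_b
    # Guard against division by zero when the products never appear alone.
    ratio = both / singles if singles else float("inf")
    return both, only_a, only_b, ratio

tc, only_beer, only_fish, ratio = check_pair(BASKETS, "beer", "fish")
print(tc, only_beer, only_fish, ratio)
```

Each call answers one hypothesis only; the analyst still has to decide which pair to ask about next.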

Page 14: Data Mining with OLAP (II)

Assume 11 possible products in any one basket and restrict to associations of at most 4 products:
• 55 possible associations of 2 products
• 165 possible associations of 3 products
• 330 possible associations of 4 products

Must issue 550 queries and compare the results!
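The counts on this slide are binomial coefficients, which a short check confirms. The variable names are illustrative; `math.comb` requires Python 3.8 or later.

```python
from math import comb

N_PRODUCTS = 11  # number of possible products, as assumed on the slide

# Number of distinct associations (unordered product sets) of size 2, 3 and 4.
counts = {k: comb(N_PRODUCTS, k) for k in (2, 3, 4)}
total_queries = sum(counts.values())

print(counts, total_queries)
```

Raising the number of products or the maximum association size makes the total grow quickly, which is why exhaustive querying does not scale.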

Page 15: Data Mining Instead of OLAP

Only two alternatives with OLAP:
• Brute force: prohibitive!
• Intuition: speculative!

Data Mining strikes a balance:
• Try most associations
• Use heuristics to guide the search

DM increases chances of useful discovery!
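The slides do not name a search method. As a stand-in illustration, the sketch below uses support-based pruning in the style of the Apriori algorithm: an association is only extended if all of its smaller sub-associations already occur often enough. The baskets, the `min_support` value and the function name are assumptions made for the example.

```python
from itertools import combinations

# Invented toy baskets for illustration only.
BASKETS = [
    {"beer", "fish", "bread"},
    {"beer", "fish"},
    {"beer", "fish", "milk"},
    {"bread", "milk"},
    {"beer", "bread"},
    {"fish", "milk"},
]

def frequent_itemsets(baskets, min_support, max_size):
    """Level-wise search that only extends itemsets that are themselves frequent."""
    def support(itemset):
        return sum(1 for basket in baskets if itemset <= basket)

    items = sorted({item for basket in baskets for item in basket})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    result = {s: support(s) for s in level}
    size = 1
    while level and size < max_size:
        size += 1
        previous = set(level)
        # Join frequent itemsets of the previous level into candidates of the new size.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Prune: every (size - 1)-subset of a candidate must already be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in previous for sub in combinations(c, size - 1))
        }
        level = [c for c in candidates if support(c) >= min_support]
        result.update({c: support(c) for c in level})
    return result

found = frequent_itemsets(BASKETS, min_support=3, max_size=4)
for itemset, count in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

On this toy data the search never counts a 3-product association, because only one pair survives the support threshold; that is the sense in which pruning avoids the brute-force query load.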