Data Mining as a BI Tool
[Figure: Business Intelligence stack. Labels: Business Intelligence; Data Storage (storing / aggregating / historising); Data Extraction; Collecting / Transforming; Data Analysis; Reporting / EIS / MIS; OLAP; Data Mining; Visualisation; Exploration; Discovery.]
OLAP vs. Data Mining
• OLAP verifies hypotheses: the analyst guesses at the result and guides the process.
• Data Mining discovers hypotheses: the data determine the results.
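[Figure: two flowcharts over the same DB, both ending in actionable business knowledge.
OLAP: formulate hypothesis H, formulate query for H, query result, then the test "H valid?" (Y / N).
Data Mining: formulate business problem, select DM method, discovered hypotheses, then the test "Useful?" (Y / N).]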
Input-Output View
[Figure: Data Mining as a box. Labels around the box: Objective(s); Data (internal & external); Business Knowledge; Decision Models; Reports; New Knowledge.]
What Kind of Output?
• Decision trees
• Rules
• Web
[Figure: web diagram linking Product A through Product G.]
Data Mining
Operationalization of Machine Learning, with two specific emphases:
• Emphasis on process
• Emphasis on action
From Data to Action
Data (raw)
• Lifestyle
• Transactions
• Socio-demographics
Information
• Mrs X buys product Y
• Product X costs Y francs
• Mr X drives a car of type Y
• Dr X performed Y operations of type T
Knowledge
• People who buy product X also buy product Y, P% of the time
• Doctors who perform in excess of N operations of type T per month may be fraudulent
• Molecules of class X are most likely carcinogenic
Actions
• Offer product Y to owners of product X
• Investigate potential frauds
Process View
[Figure: the Data Mining process, with a credit example attached to each step.]
• Business Problem Formulation: determine creditworthiness
• Understanding Domain & Data: learn about loans, repayments, etc.; collect data about past performance
• Data Pre-processing (raw data, selected data, pre-processed data): aggregate individual incomes into household income
• Model Building (patterns, models): build a decision tree
• Interpretation & Evaluation: check against hold-out set
• Dissemination & Deployment
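The credit example in the process view can be sketched end to end in a few lines of Python. This is only an illustration under loud assumptions: all records are made up, the names (`incomes`, `repaid`, `fit_stump`) are hypothetical, and a one-split "decision stump" stands in for the full decision tree mentioned on the slide.

```python
from collections import defaultdict

# Made-up records: (household id, individual income). Illustrative only.
incomes = [
    ("h1", 20), ("h1", 25), ("h2", 15), ("h3", 30), ("h3", 40),
    ("h4", 10), ("h5", 35), ("h5", 20), ("h6", 12), ("h7", 50), ("h8", 18),
]
# Made-up past performance: True means the loan was repaid.
repaid = {"h1": True, "h2": False, "h3": True, "h4": False,
          "h5": True, "h6": False, "h7": True, "h8": False}

# Pre-processing: aggregate individual incomes into household income.
household_income = defaultdict(int)
for household, income in incomes:
    household_income[household] += income

# Split into a training set and a hold-out set.
train_ids = ["h1", "h2", "h3", "h4", "h5", "h6"]
holdout_ids = ["h7", "h8"]
train = [(household_income[h], repaid[h]) for h in train_ids]
holdout = [(household_income[h], repaid[h]) for h in holdout_ids]


def fit_stump(rows):
    """Pick the income threshold that classifies the most rows correctly.

    The rule is: predict "repaid" when income >= threshold.
    """
    best_threshold, best_correct = None, -1
    for threshold in sorted({income for income, _ in rows}):
        correct = sum((income >= threshold) == label for income, label in rows)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


# Model building on the training set only.
threshold = fit_stump(train)

# Evaluation: check the learned rule against the hold-out set.
hits = sum((income >= threshold) == label for income, label in holdout)
accuracy = hits / len(holdout)
print(threshold, accuracy)
```

The point of the hold-out step is that `threshold` is chosen without ever looking at households `h7` and `h8`, so the accuracy measured on them is an honest check of the rule.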
Key Success Factors
• Have a clearly articulated business problem that needs to be solved and for which Data Mining is the adequate technology
• Ensure that the problem being pursued is supported by the right type of data of sufficient quality and in sufficient quantity
• Recognise that Data Mining is a process with many components and dependencies
• Plan to learn from the Data Mining process whatever the outcome
Myths (I)
• Data Mining produces surprising results that will utterly transform your business
  Reality:
  • Early results = scientific confirmation of human intuition.
  • Beyond = steady improvement to an already successful organisation.
  • Occasionally = discovery of one of those rare « breakthrough » facts.
• Data Mining techniques are so sophisticated that they can substitute for domain knowledge or for experience in analysis and model building
  Reality:
  • Data Mining = joint venture.
  • Close cooperation between experts in modeling and using the associated techniques, and people who understand the business.
Myths (II)
• Data Mining is useful only in certain areas, such as marketing, sales, and fraud detection
  Reality:
  • Data mining is useful wherever data can be collected.
  • All that is really needed is data and a willingness to « give it a try. »
  • There is little to lose…
• Only massive databases are worth mining
  Reality:
  • A moderately-sized or small data set can also yield valuable information.
  • It is not only the quantity, but also the quality of the data that matters (e.g. characterising mutagenic compounds)
Myths (III)
• The methods used in Data Mining are fundamentally different from the older quantitative model-building techniques
  Reality:
  • All methods now used in data mining are natural extensions and generalisations of analytical methods known for decades.
  • What is new in data mining is that we are now applying these techniques to more general business problems.
• Data Mining is an extremely complex process
  Reality:
  • The algorithms of data mining may be complex, but new tools and well-defined methodologies have made those algorithms easier to apply.
  • Much of the difficulty in applying data mining comes from the same data organisation issues that arise when using any modeling technique.
OLAP vs. DM Illustration
Data Mining with OLAP (I)
• Formulate hypothesis
  • Beer and fish sell well together
• Issue corresponding queries
  • TC = select COUNT of all baskets containing both beer and fish
• Decide on validity
  • Ratio of TC over baskets containing only beer or only fish, AND other possible associations
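The hypothesis check above can be sketched in Python. The baskets are made-up data and the function names are hypothetical; the ratio computed here is one plausible reading of the slide, namely TC divided by the number of baskets that contain exactly one of the two products.

```python
def count_both(baskets, a, b):
    """TC: number of baskets containing both product a and product b."""
    return sum(1 for basket in baskets if a in basket and b in basket)


def count_only_one(baskets, a, b):
    """Number of baskets containing exactly one of product a and product b."""
    return sum(1 for basket in baskets if (a in basket) != (b in basket))


# Made-up baskets, for illustration only.
baskets = [
    {"beer", "fish"},
    {"beer", "fish", "bread"},
    {"beer", "chips"},
    {"fish", "lemon"},
    {"bread", "milk"},
]

tc = count_both(baskets, "beer", "fish")
only_one = count_only_one(baskets, "beer", "fish")
# Guard against division by zero when the two products never appear apart.
ratio = tc / only_one if only_one else float("inf")
print(tc, only_one, ratio)
```

Each hypothesis needs its own pair of counts, which is why the analyst, not the data, drives the process in this approach.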
Data Mining with OLAP (II)
• Assume 11 possible products in any one basket and restrict to associations of at most 4 products
  • 55 possible associations of 2 products
  • 165 possible associations of 3 products
  • 330 possible associations of 4 products
• Must issue 550 queries and compare the results!
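The counts above are binomial coefficients, "11 choose k" for k = 2, 3, 4. A short check in Python, using the standard-library function `math.comb`:

```python
from math import comb

n_products = 11

# Number of distinct associations (unordered product sets) of each size.
counts = {k: comb(n_products, k) for k in (2, 3, 4)}
total = sum(counts.values())

print(counts)  # {2: 55, 3: 165, 4: 330}
print(total)   # 550
```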
Data Mining Instead of OLAP
• Only two alternatives with OLAP:
  • Brute force: prohibitive!
  • Intuition: speculative!
• Data Mining strikes a balance:
  • Try most associations
  • Use heuristics to guide the search
• DM increases chances of useful discovery!
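The slides do not say which heuristic is meant. As one well-known possibility, an Apriori-style level-wise search only extends associations whose smaller parts already occur often enough, which avoids most of the 550 queries. The sketch below is an assumption-laden illustration: the baskets are made up, and `frequent_itemsets`, `min_count`, and `max_size` are hypothetical names.

```python
from itertools import combinations


def frequent_itemsets(baskets, min_count, max_size):
    """Level-wise search: grow itemsets only from frequent smaller ones."""
    items = sorted({item for basket in baskets for item in basket})
    frequent = {}
    candidates = [frozenset([item]) for item in items]
    size = 1
    while candidates and size <= max_size:
        # Count how many baskets contain each candidate itemset.
        counts = {c: sum(1 for b in baskets if c <= b) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Build candidates one item larger from pairs of frequent itemsets.
        size += 1
        unions = {a | b for a in level for b in level if len(a | b) == size}
        # Heuristic pruning: keep a candidate only if every subset that is
        # one item smaller was itself frequent.
        candidates = [
            c for c in unions
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        ]
    return frequent


# Made-up baskets, for illustration only.
baskets = [
    {"beer", "fish", "bread"},
    {"beer", "fish"},
    {"beer", "fish", "milk"},
    {"bread", "milk"},
    {"beer", "bread"},
]

result = frequent_itemsets(baskets, min_count=2, max_size=4)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

On this toy data the candidate {beer, bread, fish} is never counted against the baskets, because {bread, fish} already failed the `min_count` test; that is the sense in which the search is guided rather than exhaustive.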