Some working definitions….
• ‘Data Mining’ and ‘Knowledge Discovery in Databases’ (KDD) are used interchangeably
• Data mining = the discovery of interesting, meaningful and actionable patterns hidden in large amounts of data
• Multidisciplinary field originating from artificial intelligence, pattern recognition, statistics, machine learning, econometrics, ….
Data mining is a process…
• Business objectives
• Model development
  – Model objective
  – Data collection & preparation
  – Model construction
  – Model evaluation
  – Combining models with business knowledge into decision logic
• Model / decision logic deployment
• Model / decision logic monitoring
Data mining is a process…a marketing example
• Business objectives
  – Cross-sell MMS bundle to lapsed users / non-users
• Model development
  – Model objective
    • For consumers with no MMS bundle in the past 6 months, predict MMS bundle ownership (yes/no) in the next three months
  – Data collection & preparation
    • All fields for all active customers as of end APR05; remove all customers with an MMS bundle in NOV04-APR05; left join the MMS Bundle field from MAY05, JUNE05, JULY05
  – Model construction
    • Build various models to predict MMS Bundle MAY or JUNE or JULY = ‘N’ on 70% of the data
  – Model evaluation
    • Evaluate predictive power on the 70% of the data used for model development and on the 30% test set
  – Combining models with business knowledge into decision logic
    • Target the top 30% and randomly test two propositions (50 MMS for 5 Euro; 100 MMS for 7.50 Euro) across two channels (direct mail and SMS)
• Model / decision logic deployment
  – Run the campaign
• Model / decision logic monitoring
  – Compare predictions against actual response to evaluate model quality and robustness
  – What propositions / channels work best
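The 70% / 30% split, the top-30% targeting and the random test of propositions and channels can be sketched in a few lines of Python. This is only an illustration under assumptions: the customer ids and the scores are made up, and the function names (`split_train_test`, `top_fraction`, `assign_test_cells`) are hypothetical, not part of the campaign described above.

```python
import random

def split_train_test(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

def top_fraction(scored, fraction=0.3):
    """Return the ids of the customers with the highest scores."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    keep = int(round(len(ranked) * fraction))
    return [customer_id for customer_id, _ in ranked[:keep]]

def assign_test_cells(customer_ids, seed=0):
    """Randomly give each targeted customer one proposition and one channel."""
    rng = random.Random(seed)
    propositions = ["50 MMS for 5 Euro", "100 MMS for 7.50 Euro"]
    channels = ["Direct mail", "SMS"]
    return {cid: (rng.choice(propositions), rng.choice(channels))
            for cid in customer_ids}

customers = list(range(1000))  # hypothetical customer ids
train, test = split_train_test(customers)

# Hypothetical scores; in practice these come from the fitted model.
scores = [(cid, random.Random(cid).random()) for cid in customers]
targets = top_fraction(scores)
cells = assign_test_cells(targets)
```

Fixing the seed keeps the split and the cell assignment reproducible, which helps when predictions are later compared against actual response.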
Data mining tasks
• Undirected, explorative, descriptive, ‘unsupervised’ data mining
  – Matching & search
  – Profile & rule extraction
  – Clustering & segmentation; dimension reduction
• Directed, predictive, ‘supervised’ data mining
  – Predictive modeling
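[Figure: a model built from data on past experience (good and bad behaviour) places each case on a score scale running from 10 to 1, with the ends labeled ‘Worse business’ and ‘Better business’; Case A scores 7 and Case B scores 4.]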
Data mining task example: predictive modeling
Income Age Children
60K 38 2
30K 23 1
30K 29 0
... ... ...
120K 55 2
Collected data
score = (0 x Income) + (-1 x Age) + (25 x Children)
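As a quick check, the score formula can be applied to the listed rows. The sketch below assumes incomes are given in plain currency units (60K = 60000); the function name `score` is only illustrative.

```python
def score(income, age, children):
    """Linear score from the slide: (0 x Income) + (-1 x Age) + (25 x Children)."""
    return 0 * income + (-1) * age + 25 * children

# The four rows that are spelled out in the collected data.
collected = [
    (60_000, 38, 2),
    (30_000, 23, 1),
    (30_000, 29, 0),
    (120_000, 55, 2),
]
scores = [score(*row) for row in collected]
print(scores)  # [12, 2, -29, -5]
```

Because the income weight is 0, income has no influence on this particular score.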
Data mining task example: predictive modeling
Income Age Children Status Value Score
60K 38 2 Good 100 12
30K 23 1 Good 45 2
30K 29 0 Bad -80 -29
... ... ... ... ... ...
120K 55 2 Bad -40 -5
Data mining techniques for predictive modeling
• Linear and logistic regression
• Decision trees
• Neural Networks
• Nearest Neighbor
• Genetic Algorithms
• ….
Regression in pattern space
[Figure: pattern space with axes age and income, showing class ‘circle’ and class ‘square’]
Only a single line is available in pattern space to separate the classes
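A minimal sketch of why regression yields a single separating line: logistic regression fits one weight per attribute plus a bias, and the class changes where the weighted sum crosses zero. The data below are made-up, mean-centred age and income values, and plain batch gradient descent is an assumption chosen for brevity, not necessarily how a production tool fits the model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(points, labels, lr=0.5, epochs=500):
    """Fit [w_age, w_income, bias] by batch gradient descent on the log-loss."""
    w = [0.0, 0.0, 0.0]
    n = len(points)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for (x1, x2), y in zip(points, labels):
            err = sigmoid(w[0] * x1 + w[1] * x2 + w[2]) - y
            grad[0] += err * x1
            grad[1] += err * x2
            grad[2] += err
        w = [wi - lr * gi / n for wi, gi in zip(w, grad)]
    return w

def predict(w, x1, x2):
    """Class 1 on one side of the line w0*x1 + w1*x2 + w2 = 0, class 0 on the other."""
    return 1 if w[0] * x1 + w[1] * x2 + w[2] > 0 else 0

# Hypothetical standardized (age, income) pairs: 0 = 'circle', 1 = 'square'.
points = [(-1, -1), (-2, -1), (-1, -2), (1, 1), (2, 1), (1, 2)]
labels = [0, 0, 0, 1, 1, 1]
w = fit_logistic(points, labels)
predictions = [predict(w, x1, x2) for x1, x2 in points]
```

Whatever the fitted weights are, the boundary `w[0]*x1 + w[1]*x2 + w[2] = 0` is a straight line in the age / income plane.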
Decision Trees
[Figure: decision tree; branches are labeled yes / no]
20000 customers, response 1%: Income > 150000?
  – 18800 customers: Purchases > 10? (etc.)
  – 1200 customers: balance > 50000?
    • 800 customers, response 1.8%
    • 400 customers, response 0.1%
Decision Trees in Pattern Space
[Figure: pattern space with axes age and income]
Line pieces perpendicular to the axes
Each line is a split in the tree: two answers to a question
Decision Trees in Pattern Space
[Figure: pattern space with axes age and weight]
The goal of the classifier is to separate the classes (circle, square) on the basis of the attributes age and income
Each line corresponds to a split in the tree
Decision areas are ‘tiles’ in pattern space
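A single split is a threshold on one attribute, which is why the resulting line is perpendicular to that attribute's axis. The sketch below searches for the best threshold on one attribute by counting misclassifications; that criterion is an assumption made for simplicity (tree learners commonly use measures such as Gini impurity or entropy instead), and the ages and classes are made up.

```python
def best_split(values, labels):
    """Return (threshold, errors) for the split `value > threshold` that
    misclassifies the fewest instances when each side predicts its majority class."""
    best = None
    distinct = sorted(set(values))
    # Candidate thresholds lie midway between neighbouring distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        errors = sum(
            len(side) - max(side.count(c) for c in set(side))
            for side in (left, right)
        )
        if best is None or errors < best[1]:
            best = (threshold, errors)
    return best

# Hypothetical ages with their classes.
ages = [23, 29, 38, 41, 55, 60]
classes = ["circle", "circle", "circle", "square", "square", "square"]
split = best_split(ages, classes)
print(split)  # (39.5, 0)
```

Repeating the same search inside each resulting region, possibly on another attribute, produces the ‘tiles’ in pattern space.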
Nearest Neighbour
• Data itself is the classification model, so no abstraction like a tree etc.
• For a given instance x, search for the k instances that are most similar to x
• Classify x as the most frequent class among the k most similar instances
[Figure legend: marked point = new instance]
Any decision area possible
Condition: enough data available
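The two steps (find the k most similar instances, then take the most frequent class) can be sketched as follows. Euclidean distance as the similarity measure, k = 3 and the toy (age, weight) data are assumptions for illustration only.

```python
import math
from collections import Counter

def knn_classify(x, training, k=3):
    """Classify x by majority vote among its k nearest training instances.

    `training` is a list of (features, label) pairs; distance is Euclidean.
    """
    nearest = sorted(training, key=lambda item: math.dist(x, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical (age, weight) instances with their classes.
training = [
    ((25, 60), "circle"), ((30, 65), "circle"), ((28, 70), "circle"),
    ((50, 90), "square"), ((55, 85), "square"), ((60, 95), "square"),
]
print(knn_classify((27, 66), training))  # circle
print(knn_classify((58, 88), training))  # square
```

Note that there is no training step: the stored instances are the model, so every classification scans the data.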
Nearest Neighbor in Pattern Space
[Figure: classification in pattern space; axes: e.g. age and e.g. weight]
Nearest Neighbor in Pattern Space
[Figure: prediction in pattern space; axes: e.g. age and e.g. weight]
Any decision area possible
Condition: enough data available
Example classification algorithm 3: Neural Networks
• Inspired by neuronal computation in the brain (McCulloch & Pitts 1943 (!))
• Input (attributes) is coded as activation on the input layer neurons; activation feeds forward through a network of weighted links between neurons and causes activations on the output neurons (for instance diabetic yes/no)
• The algorithm learns to find optimal weights using the training instances and a general learning rule
input: e.g. customer characteristics
output: e.g. response
• Example simple network (2 layers)
• Probability of being diabetic = f(age * weight_age + body_mass_index * weight_body_mass_index)
Neural Networks
[Figure: two input neurons, age and body_mass_index, connected through the links weight_age and weight_body_mass_index to one output neuron, probability of being diabetic]
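The forward pass of this simple two-layer network fits in one function. The slides do not say what f is; the sketch assumes the logistic sigmoid, and the weight values are hypothetical rather than learned.

```python
import math

def diabetic_probability(age, body_mass_index, weight_age, weight_body_mass_index):
    """Feed the two inputs forward through the weighted links to the output neuron."""
    activation = age * weight_age + body_mass_index * weight_body_mass_index
    # Assumption: f is the logistic sigmoid, which maps any activation into (0, 1).
    return 1.0 / (1.0 + math.exp(-activation))

# Hypothetical weights; a learning rule would tune these on training instances.
p = diabetic_probability(age=50, body_mass_index=30,
                         weight_age=0.02, weight_body_mass_index=0.05)
```

Learning then amounts to adjusting `weight_age` and `weight_body_mass_index` so that the output matches the known outcomes of the training instances as closely as possible.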
Neural Networks in Pattern Space
[Figure: classification in pattern space; axes: e.g. age and e.g. weight]
Simple network: only a line is available (why?) to separate the classes
Multilayer network: any classification boundary possible
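One way to see why the simple network only has a line available, assuming f is the logistic sigmoid and an instance is assigned to the positive class when the output is at least 1/2 (both are assumptions, since the slides leave f and the cut-off open; x_1, x_2 stand for the two attributes and w_1, w_2 for their weights):

```latex
\begin{align*}
p &= f(w_1 x_1 + w_2 x_2), \qquad f(z) = \frac{1}{1 + e^{-z}} \\
p \ge \tfrac{1}{2} &\iff w_1 x_1 + w_2 x_2 \ge 0
\end{align*}
```

The boundary w_1 x_1 + w_2 x_2 = 0 is a straight line in pattern space. Hidden layers combine several such lines, which is what allows more general boundaries.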