A Microeconomic View of Data Mining
Author: Jon et al.
Advisor: Dr. Hsu
Graduate: ZenJohn Huang
IDSL seminar 2001/12/4
Outline
•Motivation
•Objective
•Three examples
•Market segmentation
•Data mining as sensitivity analysis
•Segmentation in a model of competition
•Conclusions
•Personal opinion
Motivation
Data mining is about extracting interesting patterns from raw data, but there has been only disjointed discussion of what “interesting” means.
Patterns are often deemed “interesting” on the basis of their confidence and support.
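Both classical measures are simple frequency counts. The minimal Python sketch below computes them for a rule; the basket data and function names are hypothetical. Neither measure refers to any decision the enterprise might make, and that gap is what the framework in this talk addresses.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, lhs, rhs):
    """Estimated P(rhs | lhs) = support(lhs ∪ rhs) / support(lhs)."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)


# Hypothetical toy baskets, made up for illustration only.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"beer", "chips"},
    {"diapers", "milk"},
]
print(support(baskets, {"beer", "diapers"}))      # 0.5
print(confidence(baskets, {"beer"}, {"diapers"}))  # about 0.667
```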
Objective
•Present a rigorous framework, based on optimization, for evaluating data mining operations by their utility in decision-making
•Study certain aspects of data mining as economically motivated optimization problems over a large volume of unaggregated data
•Microeconomic framework: the enterprise's decision is posed as an optimization problem
Introduction (1/6)
$\max_{x \in D} f(x)$
D is the domain of all possible decisions
f(x) is the utility or value of decision x
Mathematical programming and microeconomics: Lagrange multipliers and penalty functions [Avriel, 1976]
This paper:
•The feasible region D is basically endogenous to the enterprise
•The objective function f(x) reflects the enterprise's interaction with other agents
Introduction (2/6)
Introduction (3/6)
$f(x) = \sum_{i \in C} f_i(x)$
C is a set of agents or other factors influencing the utility of the enterprise
•Concrete level
•Abstract level
Introduction (4/6)
$y_i$ denotes the data we have on customer i
$g(x, y_i)$ is some fixed function of the decision and the data, so that $f_i(x) = g(x, y_i)$ and the problem becomes
$\max_{x \in D} \sum_{i \in C} g(x, y_i)$
Introduction (5/6)
$\max_{x \in D} \sum_{i \in C} g(x, y_i)$
Aggregation is traditionally used instead, because:
•The computational requirements otherwise would be enormous
•It is difficult to obtain the data $y_i$
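When the per-customer utility is linear in the data, aggregation loses nothing: the sum over customers can be moved inside the function. When the utility is nonlinear, aggregation can mislead. The Python check below assumes vector data $c_i$ and two hypothetical utilities, a linear one $c_i \cdot x$ and a thresholded one $\max(0, c_i \cdot x)$. All names and numbers are illustrative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def g_linear(x, c):
    # Linear utility c_i . x: summing over customers equals using the aggregate.
    return dot(c, x)


def g_threshold(x, c):
    # Hypothetical nonlinear utility: customer i contributes only if c_i . x > 0.
    return max(0.0, dot(c, x))


# Hypothetical per-customer data vectors c_i and one candidate decision x.
cs = [(1.0, -2.0), (3.0, 1.0), (-2.0, 0.5)]
x = (1.0, 1.0)
C_total = tuple(map(sum, zip(*cs)))  # aggregated data: the sum of the c_i

print(sum(g_linear(x, c) for c in cs), g_linear(x, C_total))        # 1.5 1.5
print(sum(g_threshold(x, c) for c in cs), g_threshold(x, C_total))  # 4.0 1.5
```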
Introduction (6/6)
Fundamental issues:
•Optimization
•Linear programming
•Game theory
Three Examples (1/3)
Beer and diapers
•A retailer stocks two products in quantities $x_1, x_2$, with $x_1 + x_2 \le c$
•The profit margins on the two products are $m_1, m_2$
•Part
•All-or-nothing
$Y_1 = \sum_{i \in C} y_{i,1}$
$Y_2 = \sum_{i \in C} y_{i,2}$
2,,132.2.11 iiii yyByByB
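The "Part" case can be read in one simple way, though this is only an assumption because the slide does not spell it out. Suppose each product's demand is independent and can be partly served, and both margins are nonnegative. Then only the aggregates $Y_1, Y_2$ matter. The retailer maximizes $m_1 \min(x_1, Y_1) + m_2 \min(x_2, Y_2)$ subject to $x_1 + x_2 \le c$, and the greedy rule "fill the higher-margin product first" is optimal. The function name is hypothetical. The sketch does not cover the all-or-nothing case, where in general the per-customer demands can no longer be summed away.

```python
def stock_plan(m1, m2, Y1, Y2, c):
    """Maximize m1*min(x1, Y1) + m2*min(x2, Y2) subject to x1 + x2 <= c, x >= 0.

    Assumes independent, partially satisfiable demands and m1, m2 >= 0,
    so the problem depends only on the aggregates Y1, Y2.
    """
    if m1 >= m2:
        x1 = min(Y1, c)           # shelf space goes to the better margin first
        x2 = min(Y2, c - x1)      # the remainder goes to the other product
    else:
        x2 = min(Y2, c)
        x1 = min(Y1, c - x2)
    profit = m1 * min(x1, Y1) + m2 * min(x2, Y2)
    return x1, x2, profit


print(stock_plan(m1=3, m2=2, Y1=5, Y2=10, c=8))  # (5, 3, 21)
```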
Three Examples (2/3)
Market segmentation
•Residential customers
•Business customers
$f_i(x) = \max\{c_i \cdot x_1,\ c_i \cdot x_2\}$
$\max_{x_1, x_2 \in D} \sum_{i \in C} \max\{c_i \cdot x_1,\ c_i \cdot x_2\}$
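The two-plan objective can be evaluated directly, and each customer takes the better of the two plans. The brute-force sketch below assumes D is a small finite list of candidate plans. That is a hypothetical stand-in, since D may in general be a continuous set. The data and function names are also illustrative.

```python
from itertools import product


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def two_plan_value(cs, x1, x2):
    """sum_i max(c_i . x1, c_i . x2): each customer takes the better plan."""
    return sum(max(dot(c, x1), dot(c, x2)) for c in cs)


def best_two_plans(cs, D):
    """Brute force over all ordered pairs (x1, x2) drawn from a finite set D."""
    return max(product(D, repeat=2), key=lambda pair: two_plan_value(cs, *pair))


# Hypothetical customer vectors and candidate plans.
cs = [(1, 0), (2, -1), (0, 3), (-1, 2)]
D = [(1, 0), (0, 1), (1, 1)]
pair = best_two_plans(cs, D)
print(pair, two_plan_value(cs, *pair))  # ((1, 0), (0, 1)) 8
```

In this toy instance the best single plan, (1, 1), scores only 6. Offering two plans therefore raises the value.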
Three Examples (3/3)
Beer and diapers, revisited
•Transaction(location, dd, mm, yy, item1, item2, …, itemn)
•Transaction[location=‘Palo Alto’]
•Transaction[location=‘Palo Alto’ and 12<tt]
•Transaction[location=‘Palo Alto’ and day-of-the-week(dd,mm,yy)=‘Monday’]
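The selections listed above are ordinary relational selections, each followed by the same co-occurrence count. The sketch below is minimal and makes several hypothetical choices: the records, the field order, and the helper names, plus a four-digit year. The predicate 12<tt is left out because the slide does not define tt.

```python
from datetime import date

# Hypothetical records mirroring Transaction(location, dd, mm, yy, item1, ..., itemn).
transactions = [
    ("Palo Alto", 3, 12, 2001, {"beer", "diapers"}),
    ("Palo Alto", 4, 12, 2001, {"beer"}),
    ("Palo Alto", 10, 12, 2001, {"beer", "diapers", "chips"}),
    ("Berkeley", 3, 12, 2001, {"diapers"}),
]


def select(rows, predicate):
    """Relational selection: keep the rows for which predicate(row) is true."""
    return [r for r in rows if predicate(r)]


def co_occurrence(rows, a, b):
    """Number of selected transactions that contain both items a and b."""
    return sum(1 for *_, items in rows if {a, b} <= items)


palo_alto = select(transactions, lambda r: r[0] == "Palo Alto")
# date.weekday() returns 0 for Monday.
mondays = select(palo_alto, lambda r: date(r[3], r[2], r[1]).weekday() == 0)
print(len(palo_alto), co_occurrence(palo_alto, "beer", "diapers"))  # 3 2
print(len(mondays), co_occurrence(mondays, "beer", "diapers"))      # 2 2
```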
Goal: segment customers into k clusters, each with
•a different marketing strategy
•a different advertising campaign
Market Segmentation
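Each segment $C_j$ of a partition $C = C_1 \cup \dots \cup C_k$ receives its own decision $x_j$, and we maximize over the partition and the decisions $x_1, \dots, x_k \in D$:
$\max \sum_{j=1}^{k} \sum_{i \in C_j} c_i \cdot x_j$
For fixed $x_1, \dots, x_k$, the best partition puts each customer in the segment whose decision serves it best, so the double sum becomes
$\sum_{i \in C} \max\{c_i \cdot x_j : j = 1, \dots, k\}$
and the segmentation problem is
$\max_{x_1, \dots, x_k \in D} \sum_{i \in C} \max\{c_i \cdot x_j : j = 1, \dots, k\}$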
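The step from the partition form to the per-customer maximum can be checked numerically. For fixed decisions, enumerate every assignment of customers to segments and compare the best result with letting each customer pick its own best segment. The brute-force Python sketch below uses hypothetical vectors and function names, and it only scales to tiny n.

```python
from itertools import product


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def best_partition_value(cs, xs):
    """Max over all k**n assignments of sum_j sum_{i in C_j} c_i . x_j."""
    k = len(xs)
    return max(
        sum(dot(c, xs[j]) for c, j in zip(cs, assignment))
        for assignment in product(range(k), repeat=len(cs))
    )


def per_customer_max(cs, xs):
    """sum_i max_j c_i . x_j: each customer simply joins its best segment."""
    return sum(max(dot(c, x) for x in xs) for c in cs)


# Hypothetical customer vectors c_i and fixed segment decisions x_j.
cs = [(1, -1), (2, 3), (-1, 0)]
xs = [(1, 0), (0, 1)]
print(best_partition_value(cs, xs), per_customer_max(cs, xs))  # 4 4
```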
Specific Problems
•Given n vectors $c_1, \dots, c_n \in \{-1, 1\}^d$
•and an integer k,
•find a set of k vectors $x_1, \dots, x_k \in \{-1, 1\}^d$
•that maximize the sum
$\sum_{i=1}^{n} \max_{1 \le j \le k} x_j \cdot c_i$
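For tiny instances the hypercube problem can be solved by exhaustive search over all multisets of k corners. The sketch below does exactly that. Its cost grows exponentially in d and k, which fits the hardness statement on the next slide (NP-complete even for k = 2). The vectors and function names are hypothetical.

```python
from itertools import combinations_with_replacement, product


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def hypercube_segmentation(cs, k):
    """Exhaustively maximize sum_i max_j x_j . c_i over x_1..x_k in {-1,1}^d.

    Tries every multiset of k corners (the order of the x_j is irrelevant),
    so it is exponential in d and k and only usable on toy inputs.
    """
    d = len(cs[0])
    cube = list(product((-1, 1), repeat=d))
    best_val, best_xs = None, None
    for xs in combinations_with_replacement(cube, k):
        val = sum(max(dot(x, c) for x in xs) for c in cs)
        if best_val is None or val > best_val:
            best_val, best_xs = val, xs
    return best_val, best_xs


cs = [(1, 1, 1), (1, 1, -1), (-1, -1, -1)]  # hypothetical data
print(hypercube_segmentation(cs, 1)[0], hypercube_segmentation(cs, 2)[0])  # 3 7
```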
Complexity
1. The segmentation problems corresponding to the following feasible sets D are NP-complete
2. Segmentation problems in the previous theorem can be solved in linear time when the number of dimensions
Complexity (cont’d)
Theorem
1. The d-dimensional unit ball, even with k = 2
2. The d-dimensional unit L1 ball
3. The r-slice of the d-dimensional hypercube
4. The d-dimensional hypercube, even with k = 2
5. The set of all spanning trees of a graph G, even with k = 2
Complexity (cont’d)
Sketch
1. Can be solved by aligning the solution with the cost vector
2. Has only 2d vertices
3. Can be solved by choosing the r most popular elements
4. Can be solved by simply picking the vertex that agrees coordinate-wise in sign with the cost vector
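The sketch can be read as the single-decision case. Here one aggregated cost vector $c = \sum_i c_i$ is maximized as $c \cdot x$ over D, and this reading is an interpretation of the slide. Under it, each item is a one-line maximizer. The r-slice is assumed to be the 0/1 vectors with exactly r ones. The r-slice function sorts for brevity, which costs O(d log d); a linear-time selection algorithm would match a linear bound. All function names are hypothetical.

```python
import math


def best_on_unit_ball(c):
    """max c.x over ||x||_2 <= 1: align x with c (value ||c||_2)."""
    norm = math.sqrt(sum(v * v for v in c))
    return [v / norm for v in c] if norm > 0 else [0.0] * len(c)


def best_on_l1_ball(c):
    """max c.x over ||x||_1 <= 1: best of the 2d vertices +/-e_i (value max |c_i|)."""
    i = max(range(len(c)), key=lambda j: abs(c[j]))
    x = [0] * len(c)
    x[i] = 1 if c[i] >= 0 else -1
    return x


def best_on_r_slice(c, r):
    """max c.x over 0/1 vectors with exactly r ones: take the r largest coordinates."""
    top = sorted(range(len(c)), key=lambda j: c[j], reverse=True)[:r]
    x = [0] * len(c)
    for j in top:
        x[j] = 1
    return x


def best_on_hypercube(c):
    """max c.x over {-1, 1}^d: match the sign of each coordinate (value ||c||_1)."""
    return [1 if v >= 0 else -1 for v in c]


c = (3.0, -4.0, 0.0)  # hypothetical aggregated cost vector
print(best_on_unit_ball(c))   # [0.6, -0.8, 0.0]
print(best_on_l1_ball(c))     # [0, -1, 0]
print(best_on_r_slice(c, 2))  # [1, 0, 1]
print(best_on_hypercube(c))   # [1, -1, 1]
```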
Data Mining As Sensitivity Analysis (1/3)
$\max \{ c \cdot x : Ax \le b,\ x \ge 0 \}$
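The sensitivity idea can be made concrete as follows, as an illustration and not necessarily the paper's exact definition of interestingness. Solve the LP with the aggregated cost vector $c = \sum_i c_i$. Then re-solve it with one customer's contribution removed, and check whether the optimal decision moves. The sketch assumes two decision variables and a nonempty, bounded feasible region, so a tiny vertex enumeration suffices; a real instance would use an LP solver. The data and names are hypothetical.

```python
from itertools import combinations


def solve_lp_2d(c, A, b, eps=1e-9):
    """Maximize c.x subject to A x <= b and x >= 0, with x in R^2.

    Vertex enumeration: assumes a nonempty, bounded feasible region,
    so some vertex is optimal. Only for tiny illustrative instances.
    """
    rows = [tuple(a) for a in A] + [(-1.0, 0.0), (0.0, -1.0)]  # x >= 0 as -x <= 0
    rhs = list(b) + [0.0, 0.0]
    best_val, best_x = None, None
    for (a1, r1), (a2, r2) in combinations(list(zip(rows, rhs)), 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if abs(det) < eps:
            continue  # parallel boundary lines: no unique vertex
        # Cramer's rule for a1 . x = r1, a2 . x = r2
        x = ((r1 * a2[1] - a1[1] * r2) / det, (a1[0] * r2 - r1 * a2[0]) / det)
        if all(a[0] * x[0] + a[1] * x[1] <= r + eps for a, r in zip(rows, rhs)):
            val = c[0] * x[0] + c[1] * x[1]
            if best_val is None or val > best_val:
                best_val, best_x = val, x
    return best_val, best_x


# Hypothetical per-customer cost vectors; the enterprise aggregates c = sum_i c_i.
customers = [(2.0, 1.0), (1.0, 1.0), (0.0, 3.0)]
A, b = [(1.0, 1.0), (1.0, 0.0)], [4.0, 3.0]

c_all = tuple(map(sum, zip(*customers)))       # (3.0, 5.0)
c_drop = tuple(map(sum, zip(*customers[:2])))  # third customer removed: (3.0, 2.0)
print(solve_lp_2d(c_all, A, b))   # optimum 20 at x = (0, 4)
print(solve_lp_2d(c_drop, A, b))  # optimum moves: 11 at x = (3, 1)
```

Here removing a single customer moves the optimal decision. In this sensitivity sense, that customer's data matters to the decision.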
Data Mining As Sensitivity Analysis (2/3)
linear is , 0yy
f satisfies ),...,(
lki
21
ir
i fyyf
nonlinear is , 0yy
f satisfies ),...,(
lki
21
ir
i fyyf
$y_i$ is the table captured from $c_i$
Data Mining As Sensitivity Analysis (3/3)
Ci
X
cDI ij
Xi
j
jj ij
max0,
1
Segmentation in a Model of Competition
•Two-player games
•Probability distribution
Conclusions
•Presenting a rigorous framework for the automatic evaluation of data mining operations
•Data mining viewed as an activity of a revenue-maximizing enterprise
Personal Opinion
•Using k independent decisions (one per segment) relates segmentation to k-means clustering
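The k-means analogy can be made concrete for the hypercube segmentation problem $\max \sum_i \max_j x_j \cdot c_i$. A Lloyd-style loop alternates two steps. First, it assigns each customer to its best current decision. Second, it replaces each segment's decision with that segment's best response, which is sign matching on the hypercube. Neither step lowers the objective, so the loop reaches a local optimum, not necessarily a global one. This is a heuristic sketch and not a method from the paper; the names and data are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sign_vector(v):
    """Best single decision on {-1, 1}^d for cost vector v: match its signs."""
    return tuple(1 if a >= 0 else -1 for a in v)


def lloyd_style_segmentation(cs, xs, max_iter=100):
    """k-means-like local search for max over x_j in {-1,1}^d of sum_i max_j x_j . c_i."""
    xs = [tuple(x) for x in xs]
    for _ in range(max_iter):
        # Assignment step: index of the best current decision for each customer.
        labels = [max(range(len(xs)), key=lambda j: dot(c, xs[j])) for c in cs]
        new_xs = []
        for j, x in enumerate(xs):
            members = [c for c, lab in zip(cs, labels) if lab == j]
            if not members:
                new_xs.append(x)  # empty segment: keep its old decision
                continue
            total = [sum(col) for col in zip(*members)]
            new_xs.append(sign_vector(total))  # update step: segment's best response
        if new_xs == xs:
            break  # no decision changed: local optimum reached
        xs = new_xs
    value = sum(max(dot(c, x) for x in xs) for c in cs)
    return value, xs


cs = [(1, 1, 1), (1, 1, -1), (-1, -1, -1)]  # hypothetical data
print(lloyd_style_segmentation(cs, [(1, 1, 1), (1, 1, 1)])[0])    # 5 (local optimum)
print(lloyd_style_segmentation(cs, [(1, 1, 1), (-1, -1, -1)])[0])  # 7
```

As with k-means, the result depends on the starting decisions. On this toy instance the global optimum for k = 2 is 7.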