A Microeconomic View of Data Mining

Author: Jon et al.

Advisor: Dr. Hsu

Graduate: ZenJohn Huang

IDSL seminar 2001/12/4

Outline

• Motivation
• Objective
• Three examples
• Market segmentation
• Data mining as sensitivity analysis
• Segmentation in a model of competition
• Conclusions
• Personal opinion

Motivation

• Data mining is about extracting interesting patterns from raw data, but there has been only disjointed discussion of what “interesting” means.
• Patterns are often deemed “interesting” on the basis of their confidence and support.

Objective

• Presenting a rigorous framework
  • Based on optimization
  • For evaluating data mining operations
  • Utility in decision-making

• Studying certain aspects of data mining
  • Economically motivated optimization problems
  • With a large volume of unaggregated data

Introduction (1/6)

• Microeconomic framework
• Optimization problem

$\max_{x \in D} f(x)$

D is the domain of all possible decisions

f(x) is the utility or value of decision x

Introduction (2/6)

• Mathematical programming and microeconomics
  • Lagrange multipliers and penalty functions [Avriel, 1976]
• This paper
  • Feasible region D is basically endogenous
  • Objective function f(x)

Introduction (3/6)

$f(x) = \sum_{i \in C} f_i(x)$

C is a set of agents or other factors influencing the utility of the enterprise

• Concrete level

• Abstract level

Introduction (4/6)

y_i denotes the data we have on customer i

g(x, y_i) is some fixed function of the decision and the data

$\max_{x \in D} \sum_{i \in C} g(x, y_i)$

Introduction (5/6)

$\max_{x \in D} \sum_{i \in C} g(x, y_i)$

Aggregation

• The computational requirements otherwise would be enormous

• It is difficult to obtain the data y_i
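The trade-off behind aggregation can be illustrated with a small sketch. It assumes a hypothetical two-column 0/1 customer table and two made-up choices of g: one linear in y_i, where the column totals carry all the information, and one nonlinear, where they do not. None of these names or numbers come from the slides.

```python
from typing import Callable, Sequence, Tuple

Decision = Tuple[int, int]  # hypothetical decision x = (x1, x2)
Row = Tuple[int, int]       # hypothetical customer data y_i = (y_i1, y_i2)


def total_utility(x: Decision, customers: Sequence[Row],
                  g: Callable[[Decision, Row], int]) -> int:
    """Evaluate the sum over i in C of g(x, y_i) on unaggregated data."""
    return sum(g(x, y) for y in customers)


def aggregate(customers: Sequence[Row]) -> Row:
    """Collapse the customer table to its column totals (Y1, Y2)."""
    return (sum(y[0] for y in customers), sum(y[1] for y in customers))


def g_linear(x: Decision, y: Row) -> int:
    # Linear in y: summing over customers equals g applied to the totals.
    return x[0] * y[0] + x[1] * y[1]


def g_basket(x: Decision, y: Row) -> int:
    # Nonlinear in y: only customers who want both items contribute.
    return x[0] * y[0] * y[1]


x = (2, 3)
customers_a = [(1, 1), (0, 0)]  # one customer wants both items
customers_b = [(1, 0), (0, 1)]  # two customers want one item each

same_totals = aggregate(customers_a) == aggregate(customers_b)
linear_a = total_utility(x, customers_a, g_linear)
linear_b = total_utility(x, customers_b, g_linear)
basket_a = total_utility(x, customers_a, g_basket)
basket_b = total_utility(x, customers_b, g_basket)
```

Both tables have the same totals, so the linear objective cannot tell them apart, while the basket objective can. In this toy setting, that gap is what the unaggregated data is needed for.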

Introduction (6/6)

• Fundamental issues
  • Optimization
  • Linear programming
  • Game theory

Three Examples (1/3)

• Beer and diapers
• Retailer stocks two products in quantities x1, x2, with x1 + x2 ≤ c
• The profit margins on the two products are m1, m2
• Part
• All-or-nothing

$Y_1 = \sum_{i \in C} y_{i,1}$

$Y_2 = \sum_{i \in C} y_{i,2}$

$B_1 y_{i,1} + B_2 y_{i,2} + B_3\, y_{i,1}\, y_{i,2}$

Three Examples (2/3)

• Market segmentation
• Residence
• Business customers

$f_i(x) = \max\{c_i \cdot x_1,\; c_i \cdot x_2\}$

$\max_{(x_1, x_2) \in D^2} \sum_{i \in C} \max\{c_i \cdot x_1,\; c_i \cdot x_2\}$
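A minimal sketch of the two-plan objective follows, assuming D is a small finite set that can be searched exhaustively. The customer vectors and candidate plans are invented for illustration, and `best_two_plans` is a hypothetical helper, not something from the paper.

```python
from itertools import combinations_with_replacement
from typing import Sequence, Tuple

Vector = Tuple[int, ...]


def dot(u: Vector, v: Vector) -> int:
    """Inner product c_i . x."""
    return sum(a * b for a, b in zip(u, v))


def two_plan_value(customers: Sequence[Vector], x1: Vector, x2: Vector) -> int:
    """Sum over customers of max(c_i . x1, c_i . x2): each picks the better plan."""
    return sum(max(dot(c, x1), dot(c, x2)) for c in customers)


def best_two_plans(customers: Sequence[Vector], D: Sequence[Vector]):
    """Exhaustively search all unordered pairs (x1, x2) drawn from D."""
    return max(combinations_with_replacement(D, 2),
               key=lambda pair: two_plan_value(customers, *pair))


# Hypothetical data: two residence-like customers and one business-like one.
customers = [(1, 0), (1, 0), (0, 2)]
D = [(1, 0), (0, 1)]

plans = best_two_plans(customers, D)
value = two_plan_value(customers, *plans)
single_plan_value = max(two_plan_value(customers, x, x) for x in D)
```

On this toy data, offering two distinct plans is worth 4, against 2 for the best single plan.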

Three Examples (3/3)

• Beer and diapers, revisited
• Transaction(location, dd, mm, yy, item1, item2, …, itemn)
• Transaction[location=‘Palo Alto’]
• Transaction[location=‘Palo Alto’ and 12<tt]
• Transaction[location=‘Palo Alto’ and day-of-the-week(dd,mm,yy)=‘Monday’]
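The three selections can be sketched over an in-memory list. The sketch makes several assumptions: a `tt` field (hour of day) is added because the second selection refers to it, `yy` is stored as a four-digit year, and the sample rows are invented.

```python
from datetime import date
from typing import List, NamedTuple, Tuple

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


class Transaction(NamedTuple):
    location: str
    dd: int
    mm: int
    yy: int                  # assumed to be a four-digit year
    tt: int                  # assumed: hour of day, used by the 12 < tt selection
    items: Tuple[str, ...]


def day_of_the_week(dd: int, mm: int, yy: int) -> str:
    """Name of the weekday; date.weekday() returns 0 for Monday."""
    return DAYS[date(yy, mm, dd).weekday()]


# Hypothetical sample rows.
transactions: List[Transaction] = [
    Transaction("Palo Alto", 3, 12, 2001, 14, ("beer", "diapers")),
    Transaction("Palo Alto", 4, 12, 2001, 9, ("beer",)),
    Transaction("San Jose", 3, 12, 2001, 15, ("diapers",)),
]

palo_alto = [t for t in transactions if t.location == "Palo Alto"]
palo_alto_afternoon = [t for t in palo_alto if 12 < t.tt]
palo_alto_monday = [t for t in palo_alto
                    if day_of_the_week(t.dd, t.mm, t.yy) == "Monday"]
```

Each selection defines a different subset of customers, and so a different version of the same stocking problem.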

Market Segmentation

• To segment customers into k clusters
  • Different marketing strategy
  • Different advertising campaign

$\sum_{j=1}^{k} \max_{x \in D} \sum_{i \in C_j} c_i \cdot x$

$\sum_{i \in C} \max\{c_i \cdot x_j : j = 1, \ldots, k\}$

$\sum_{i \in C} \left[\max\{c_i \cdot x_j : j = 1, \ldots, k\}\right] - \gamma k$

Specific Problems

• n vectors $c_1, \ldots, c_n \in \{-1, 1\}^d$

• k is an integer

• Find a set of k vectors $x_1, \ldots, x_k \in \{-1, 1\}^d$

• Maximize the sum

$\sum_{i=1}^{n} \max_{1 \le j \le k} x_j \cdot c_i$
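For very small d and k, this problem can be solved by exhaustive search, which makes the objective concrete. The sketch below is a brute-force illustration only, with invented customer vectors. Its running time grows exponentially in d, so it is not a practical algorithm.

```python
from itertools import combinations, product
from typing import List, Sequence, Tuple

Vector = Tuple[int, ...]


def dot(u: Vector, v: Vector) -> int:
    return sum(a * b for a, b in zip(u, v))


def segmentation_value(cs: Sequence[Vector], xs: Sequence[Vector]) -> int:
    """Sum over customers of the best inner product with any chosen x_j."""
    return sum(max(dot(x, c) for x in xs) for c in cs)


def brute_force_segmentation(cs: Sequence[Vector], k: int):
    """Try every set of k distinct vertices of {-1, 1}^d (needs k <= 2**d)."""
    d = len(cs[0])
    vertices: List[Vector] = list(product((-1, 1), repeat=d))
    best = max(combinations(vertices, k),
               key=lambda xs: segmentation_value(cs, xs))
    return best, segmentation_value(cs, best)


# Hypothetical customers in {-1, 1}^3.
cs = [(1, 1, 1), (1, 1, -1), (-1, -1, -1)]

_, value_k1 = brute_force_segmentation(cs, 1)
_, value_k2 = brute_force_segmentation(cs, 2)
```

On this data one segment yields 3 and two segments yield 7, so a second segment more than doubles the objective.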

Complexity

1. The segmentation problems corresponding to the following feasible sets D are NP-complete

2. Segmentation problems in the previous theorem can be solved in linear time when the number of dimensions is fixed

Complexity (cont’d)

Theorem

1. The d-dimensional unit ball, even with k=2
2. The d-dimensional unit L1 ball
3. The r-slice of the d-dimensional hypercube
4. The d-dimensional hypercube, even with k=2
5. The set of all spanning trees of a graph G, even with k=2

Complexity (cont’d)

Sketch

1. Can be solved by aligning the solution with the cost vector
2. Has only 2d vertices
3. Can be solved by choosing the r most popular elements
4. By simply picking the vertex that coordinate-wise agrees in sign with the cost vector
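The four sketch items can be read as describing how to maximize c · x over each feasible set when there is a single segment. The code below follows that reading, which is an interpretation rather than something the slide states. It also assumes the r-slice means 0/1 vectors with exactly r ones, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def argmax_unit_ball(c: Sequence[float]) -> List[float]:
    """Item 1: align x with c, i.e. x = c / ||c||."""
    norm = math.sqrt(sum(v * v for v in c))
    return [v / norm for v in c] if norm else [0.0] * len(c)


def argmax_l1_ball(c: Sequence[float]) -> List[int]:
    """Item 2: only 2d vertices (+e_j, -e_j); pick the largest |c_j|."""
    j = max(range(len(c)), key=lambda t: abs(c[t]))
    x = [0] * len(c)
    x[j] = 1 if c[j] >= 0 else -1
    return x


def argmax_r_slice(c: Sequence[float], r: int) -> List[int]:
    """Item 3: set the r coordinates with the largest c_j to 1."""
    top = sorted(range(len(c)), key=lambda t: c[t], reverse=True)[:r]
    x = [0] * len(c)
    for j in top:
        x[j] = 1
    return x


def argmax_hypercube(c: Sequence[float]) -> List[int]:
    """Item 4: the vertex of {-1, 1}^d agreeing in sign with c (ties -> +1)."""
    return [1 if v >= 0 else -1 for v in c]


c = [3, -4, 0]  # hypothetical cost vector
ball = argmax_unit_ball(c)
l1 = argmax_l1_ball(c)
slice2 = argmax_r_slice(c, 2)
cube = argmax_hypercube(c)
```

Each of these runs in time linear or near-linear in d. On this reading, the difficulty claimed in the theorem comes from choosing the segments, not from optimizing within one.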

Data Mining As Sensitivity Analysis (1/3)

$\max_{x \ge 0,\; Ax \le b} c \cdot x$

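Data Mining As Sensitivity Analysis (2/3)

$f_i(y_1, \ldots, y_r)$ satisfies $\frac{\partial^2 f_i}{\partial y_k \, \partial y_l} = 0$: $f_i$ is linear

$f_i(y_1, \ldots, y_r)$ satisfies $\frac{\partial^2 f_i}{\partial y_k \, \partial y_l} \neq 0$: $f_i$ is nonlinear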

y_i is the table captured from c_i
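For 0/1 purchase data, one way to probe the linear versus nonlinear distinction is a discrete mixed difference, the finite analogue of a mixed second partial derivative. The sketch below uses that reading as an assumption, and the two profit functions with their coefficients are invented.

```python
from typing import Callable


def mixed_difference(f: Callable[[int, int], int]) -> int:
    """f(1,1) - f(1,0) - f(0,1) + f(0,0): zero when f has no interaction term."""
    return f(1, 1) - f(1, 0) - f(0, 1) + f(0, 0)


def linear_profit(y1: int, y2: int) -> int:
    # Hypothetical per-customer profit with independent demand.
    return 2 * y1 + 3 * y2


def basket_profit(y1: int, y2: int) -> int:
    # Hypothetical profit with an extra term when both items are bought.
    return 2 * y1 + 3 * y2 + 5 * y1 * y2


linear_interaction = mixed_difference(linear_profit)
basket_interaction = mixed_difference(basket_profit)
```

A zero result suggests column totals are enough for that pair of attributes. A nonzero result, 5 here, is the coefficient of the interaction term and signals that the joint distribution of the two attributes matters.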

Data Mining As Sensitivity Analysis (3/3)

Ci

X

cDI ij

Xi

j

jj ij

max0,

1

Segmentation in a Model of Competition

• Two-player games
• Probability distribution

Conclusions

• Presenting a rigorous framework for the automatic evaluation of data mining operations
• Data mining as an activity by a revenue-maximizing enterprise

Personal Opinion

• Applying independent decisions to K-means