Data Mining for Management and E-commerce

By Johnny Lee

Department of Accounting and Information Systems

University of Utah

Agenda

1. Microeconomic view of Data Mining

2. A Survey of recommendation systems in E-commerce

3. Turning Data Mining into a management science tool

A Microeconomic View of Data Mining

• Kleinberg et al. 1998

• Research question: What is the economic utility of data mining? How do we determine whether a DM result is interesting?


• “Interesting pattern”

– Confidence and support

• (High balance → High income)

– Information content

• ?

– Unexpectedness

• (Super Bowl result → stock price)

– Actionability

• $, $, $ …
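Support and confidence can be made concrete with a small sketch. The customer records and the attribute names `high_balance` and `high_income` below are hypothetical, chosen only to mirror the rule on the slide; the definitions are the standard ones for association rules.

```python
from typing import FrozenSet, Iterable, List


def support(records: List[FrozenSet[str]], items: Iterable[str]) -> float:
    """Fraction of records that contain every item in `items`."""
    wanted = frozenset(items)
    if not records:
        return 0.0
    return sum(1 for r in records if wanted <= r) / len(records)


def confidence(records: List[FrozenSet[str]],
               antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """Support of (antecedent and consequent) divided by support of antecedent."""
    left = frozenset(antecedent)
    base = support(records, left)
    if base == 0.0:
        return 0.0
    return support(records, left | frozenset(consequent)) / base


# Hypothetical customer records (illustrative data, not from the paper).
customers = [
    frozenset({"high_balance", "high_income"}),
    frozenset({"high_balance", "high_income"}),
    frozenset({"high_balance"}),
    frozenset({"high_income"}),
    frozenset(),
]

rule_support = support(customers, {"high_balance", "high_income"})
rule_confidence = confidence(customers, {"high_balance"}, {"high_income"})
```

Here two of five customers satisfy both attributes, and two of the three high-balance customers also have high income. Neither number says anything about actionability, which is the gap the microeconomic view addresses.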


• Value of data mining

– Computing power and data make it possible to un-aggregate the optimization

– Study of the intricate ways (correlations and clusters in data) that affect the enterprise’s optimal DECISION

A Microeconomic View of Data Mining: Value of DM

Firm: $\max_{x} f(x)$

$f(x) = \sum_{i} f_i(x)$

$f_i(x) = g(x, y_i)$

$y_i$ = customer data

$\max_{x} \sum_{i} g(x, y_i)$
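One worked step may help show when individual customer data matters. Assume, purely for illustration, that g is linear in the customer data y_i. Then the sum over customers collapses onto the aggregate:

```latex
\max_{x} \sum_{i} g(x, y_i)
  \;=\; \max_{x} \; g\Bigl(x, \sum_{i} y_i\Bigr)
  \qquad \text{if } g(x, \cdot) \text{ is linear.}
```

Under that assumption the firm needs only the total \sum_i y_i, so mining individual records adds nothing to the decision. When g is nonlinear in y_i the equality generally fails, and the individual y_i can change the optimal x.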

Example one

If (demand for beer) is not related to (demand for diapers), then NO DM is needed

If (demand of beer +demand of diaper)=(supply of beer-demand of beer)+*(supply of diaper- demand of diaper)+

then DM is needed

Example 2

Phone rates and users

Without data mining: experimenting with arbitrary clusters

With data mining: optimize profit by best matching customers and strategies

$\max_{(X_1, X_2) \in D^2} \sum_{i \in C} \max\{c_i \cdot X_1,\; c_i \cdot X_2\}$
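The two-strategy matching problem can be sketched by brute force on a tiny instance. Everything concrete here is an assumption for illustration: customers are hypothetical usage vectors c_i (daytime, evening), candidate strategies in D are hypothetical per-unit rate vectors, and profit from a customer is the dot product c_i · X.

```python
from itertools import combinations
from typing import Sequence, Tuple

Vector = Tuple[float, ...]


def dot(a: Vector, b: Vector) -> float:
    return sum(p * q for p, q in zip(a, b))


def profit(customers: Sequence[Vector], strategies: Sequence[Vector]) -> float:
    """Each customer is matched to whichever offered strategy yields the most."""
    return sum(max(dot(c, x) for x in strategies) for c in customers)


def best_pair(customers: Sequence[Vector],
              candidates: Sequence[Vector]) -> Tuple[Vector, Vector]:
    """Exhaustive search over pairs of distinct candidates.

    Offering the same strategy twice can never beat a pair of distinct ones,
    so identical pairs are skipped.
    """
    return max(combinations(candidates, 2), key=lambda pair: profit(customers, pair))


# Hypothetical data: (daytime usage, evening usage) per customer.
customers = [(10.0, 1.0), (9.0, 2.0), (1.0, 8.0), (2.0, 10.0)]
# Hypothetical candidate strategies: (daytime rate, evening rate).
candidates = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6)]

pair = best_pair(customers, candidates)
segmented_profit = profit(customers, pair)
single_profit = max(profit(customers, [x]) for x in candidates)
```

On this instance the best pair serves daytime-heavy and evening-heavy customers separately and earns more than any single strategy applied to everyone. Exhaustive search is only workable for tiny candidate sets; it is used here to keep the sketch short.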

Example 3

• Beer and diapers again

• Mining to decide how to jointly promote items.

• Mining data in rows or columns

• Goal oriented

What is the goal? Generated revenue

• Conflict in action space, what to do?

Contribution

• Automatic pattern filtering system based on economic value

• Rules for manual pattern filtering system

• Rules for determining the trigger point of data mining

A survey of recommendation systems in electronic commerce

• Wei et al. 2001

• Research question: What are the types of e-commerce recommendation systems and how do they work?

E-commerce recommendation Systems

• Suggest items that are of interest to users based on something.

• Something:

– Customer characteristics (demographics)

– Features of items

– User preferences: rating/purchasing history

Framework for Recommendation

[Diagram: features of items, user's preferences, and user demographics feed the recommendation system, which produces a recommendation]

Types of Recommendation

• Prediction of customers' preferences: personalized and non-personalized

• Top-N recommended items for customers: personalized and non-personalized

• Top-M users who are most likely to purchase an item

Classification of Recommendation Systems

• Popularity-based: best sellers

• Content-based: items with similar features

• Collaborative filtering: users with similar tastes

• Association-based: related items

• Demographic-based: user’s age, gender…

• Reputation-based: Represent individual

• Hybrid

Popularity-based

Procedures of Content-based

1. Feature extraction and Selection

2. Representation of the item pool by the selected features

3. User profile learning

4. Recommendation

Content-based

User Profile Learning

• $p_{im}$ = preference score of user i on item m

• $w_j$ = coefficient associated with feature j

• $f_{mj}$ = value of the j-th feature for item m

• $b$ = bias

$p_{im} = \sum_{j=1}^{k} w_j f_{mj} + b$
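A minimal sketch of the linear preference score and one way to learn a user's profile from it. The slides do not say how the coefficients are fitted; least-squares fitting by batch gradient descent is an assumption made here, as are the function names, the learning rate, and the toy data.

```python
from typing import List, Sequence, Tuple


def predict(weights: Sequence[float], bias: float, features: Sequence[float]) -> float:
    """Preference score: weighted sum of the item's feature values plus the bias."""
    return sum(w * f for w, f in zip(weights, features)) + bias


def learn_profile(items: Sequence[Sequence[float]],
                  scores: Sequence[float],
                  lr: float = 0.1,
                  epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and bias for one user by minimizing mean squared error.

    Assumed method (batch gradient descent); not taken from the source.
    """
    n, k = len(items), len(items[0])
    weights, bias = [0.0] * k, 0.0
    for _ in range(epochs):
        # Errors are computed once per epoch, before any parameter changes.
        errors = [predict(weights, bias, f) - p for f, p in zip(items, scores)]
        for j in range(k):
            grad_j = 2.0 / n * sum(e * f[j] for e, f in zip(errors, items))
            weights[j] -= lr * grad_j
        bias -= lr * 2.0 / n * sum(errors)
    return weights, bias


# Hypothetical items described by two binary features, with one user's scores.
item_features = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
user_scores = [1.0, 3.0, 2.0, 4.0]

w, b = learn_profile(item_features, user_scores)
new_item_score = predict(w, b, (1.0, 1.0))
```

Once the profile is learned, recommendation amounts to scoring the items the user has not rated and returning the highest-scoring ones.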

Collaborative Filtering

• Recommend items based on opinions of other similar users

1. Dimension reduction by trimming preference matrix

2. Neighborhood formation for most similar user(s)

3. Recommendation generation

Collaborative filtering

Neighborhood Formation

• Pearson correlation coefficient

• Constrained Pearson correlation coefficient

• Spearman rank correlation coefficient

• Cosine similarity

• Mean-square
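Two of the listed similarity measures, sketched for a pair of users. The rating dictionaries are hypothetical. Computing Pearson correlation over co-rated items only, and treating unrated items as zero for cosine similarity, are common conventions assumed here rather than details given on the slide.

```python
from math import sqrt
from typing import Dict

Ratings = Dict[str, float]


def pearson(a: Ratings, b: Ratings) -> float:
    """Pearson correlation over the items both users have rated."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = (sqrt(sum((a[i] - mean_a) ** 2 for i in common))
           * sqrt(sum((b[i] - mean_b) ** 2 for i in common)))
    return num / den if den else 0.0


def cosine(a: Ratings, b: Ratings) -> float:
    """Cosine of the angle between rating vectors; unrated items count as 0."""
    num = sum(a[i] * b[i] for i in set(a) & set(b))
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0


# Hypothetical ratings on a 1-5 scale.
alice = {"A": 5.0, "B": 3.0, "C": 4.0}
bob = {"A": 4.0, "B": 2.0, "C": 3.0}
carol = {"A": 1.0, "B": 5.0, "C": 2.0}

sim_alice_bob = pearson(alice, bob)
sim_alice_carol = pearson(alice, carol)
cos_alice_bob = cosine(alice, bob)
```

Bob rates every item exactly one point below Alice, so their Pearson correlation is 1 even though the raw scores differ, while Carol's ordering is nearly opposite and gives a negative correlation.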

Neighborhood Selection

• Weight threshold

• Center-based best-k neighbors

• Aggregate-based best-k neighbors
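The first two selection schemes can be sketched given precomputed similarities to the target user. The similarity values are hypothetical, and "center-based best-k" is read here as taking the k users most similar to the target user, which is an interpretation rather than something the slide spells out. The aggregate-based scheme is not sketched.

```python
from typing import Dict, List


def by_threshold(sims: Dict[str, float], threshold: float) -> List[str]:
    """Weight threshold: keep every user whose similarity meets the threshold."""
    kept = [u for u, s in sims.items() if s >= threshold]
    return sorted(kept, key=lambda u: sims[u], reverse=True)


def best_k(sims: Dict[str, float], k: int) -> List[str]:
    """Center-based best-k (as interpreted here): the k most similar users."""
    return sorted(sims, key=lambda u: sims[u], reverse=True)[:k]


# Hypothetical similarities between the target user and other users.
similarities = {"bob": 0.9, "carol": -0.4, "dave": 0.6, "erin": 0.2}

threshold_neighbors = by_threshold(similarities, 0.5)
top_three = best_k(similarities, 3)
```

A threshold can return an empty or very large neighborhood depending on the data, while best-k fixes the size but may admit weakly similar users such as `erin` above.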

Recommendation Generation

• Weighted average

• Deviation-from-mean

• Z-score average
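A sketch of the first two generation methods, assuming a neighborhood has already been formed. The neighbor similarities and ratings are hypothetical, and normalizing by the sum of absolute similarities is one common convention assumed here.

```python
from typing import Dict, List, Optional, Tuple

Ratings = Dict[str, float]
Neighbor = Tuple[float, Ratings]  # (similarity to target user, neighbor's ratings)


def mean(ratings: Ratings) -> float:
    return sum(ratings.values()) / len(ratings)


def weighted_average(item: str, neighbors: List[Neighbor]) -> Optional[float]:
    """Similarity-weighted average of the neighbors' ratings for `item`."""
    rated = [(s, r[item]) for s, r in neighbors if item in r]
    total = sum(abs(s) for s, _ in rated)
    if not total:
        return None
    return sum(s * p for s, p in rated) / total


def deviation_from_mean(target: Ratings, item: str, neighbors: List[Neighbor]) -> float:
    """Target user's mean plus the weighted deviation of neighbors from their own means."""
    rated = [(s, r[item] - mean(r)) for s, r in neighbors if item in r]
    total = sum(abs(s) for s, _ in rated)
    base = mean(target)
    if not total:
        return base
    return base + sum(s * d for s, d in rated) / total


# Hypothetical data: the target user has not rated item "C".
target_user = {"A": 4.0, "B": 2.0}
neighborhood = [
    (1.0, {"A": 5.0, "B": 3.0, "C": 5.0}),
    (0.5, {"A": 3.0, "B": 1.0, "C": 2.0}),
]

prediction_wa = weighted_average("C", neighborhood)
prediction_dfm = deviation_from_mean(target_user, "C", neighborhood)
```

Deviation-from-mean corrects for users who rate systematically high or low, which is why its prediction here sits closer to the target user's own average than the plain weighted average does.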

Association-based

• Item correlation for individual users

1. Similarity computing

• Association rules

– Guns and ammunition

– Cigarette and lighter

– Paper plate and soda

Theory: Complementary goods?

No theory: Co-occurrence?

Association-based

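$sim(i, j) = \frac{\sum_{u \in U} (p_{ui} - \bar{p}_i)(p_{uj} - \bar{p}_j)}{\sqrt{\sum_{u \in U} (p_{ui} - \bar{p}_i)^2}\,\sqrt{\sum_{u \in U} (p_{uj} - \bar{p}_j)^2}}$

Adjusted variant: replace $\bar{p}_i$ and $\bar{p}_j$ with $\bar{p}_u$.

$p_{ui}$ = preference score of user u on item i

$\bar{p}_i$ = average preference score of the i-th item over the set of co-rating users U

$\bar{p}_u$ = average of the u-th user’s preference scores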
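Item-to-item similarity can be sketched from the preference scores and item averages defined above. This version uses the correlation form with item means taken over co-rating users; the preference data and item names are hypothetical.

```python
from math import sqrt
from typing import Dict

Preferences = Dict[str, Dict[str, float]]  # user -> item -> preference score


def item_similarity(prefs: Preferences, i: str, j: str) -> float:
    """Correlation between items i and j over users who rated both."""
    users = [u for u, scores in prefs.items() if i in scores and j in scores]
    if len(users) < 2:
        return 0.0
    mean_i = sum(prefs[u][i] for u in users) / len(users)
    mean_j = sum(prefs[u][j] for u in users) / len(users)
    num = sum((prefs[u][i] - mean_i) * (prefs[u][j] - mean_j) for u in users)
    den = (sqrt(sum((prefs[u][i] - mean_i) ** 2 for u in users))
           * sqrt(sum((prefs[u][j] - mean_j) ** 2 for u in users)))
    return num / den if den else 0.0


# Hypothetical preference scores on a 1-5 scale.
preferences = {
    "u1": {"cigarette": 5.0, "lighter": 4.0, "soda": 1.0},
    "u2": {"cigarette": 3.0, "lighter": 3.0, "soda": 3.0},
    "u3": {"cigarette": 1.0, "lighter": 2.0, "soda": 5.0},
}

sim_related = item_similarity(preferences, "cigarette", "lighter")
sim_unrelated = item_similarity(preferences, "cigarette", "soda")
```

Items whose scores rise and fall together across users come out as similar, which is what lets the system recommend a lighter to someone who rated cigarettes highly without any theory of why the two go together.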

Association Based

Demographics-based

• Items that customers with similar demographic characteristics have bought

– Teens marketing

1. Data transformation: counting-based, expected-value-based (Exp(# of items)), statistics-based

2. Category Preference model learning


Demographics-based

Methods:

1. Counting-based (frequency threshold)

2. Expected-value-based method

3. Statistics-based method

Comparison of Recommendation Approaches

Approach | Input information | Types of recommendation | Degree of personalization
Popularity-based | User preferences | Top-N | Non-personalized
Content-based | Features of items and individual user preferences | Prediction, top-N, and top-M users | Personalized
Collaborative filtering | User preferences | Prediction, top-N recommendation | Personalized
Association-based | User preferences | Prediction, top-N recommendation | Personalized
Demographics-based | User demographics and preferences, features of items | Prediction, top-N, and top-M | Personalized
Reputation-based | User preferences and reputation matrix | Top-N and possibly prediction | Personalized

Contribution

• Provides practitioners with a systematic way to choose among e-commerce recommendation systems

• Lays out the existing approaches

Turning Datamining into a Management Science Tool: New Algorithms and Empirical Results

• Cooper & Giuffrida 2000

• Research question: How can we improve the performance of PromoCast (or another market forecast system) by adding some local adjustment parameters?

Terminology

• SKU: Stock keeping unit

• KDS: Knowledge discovery using SQL

• Management science: ??????????????

KDS

[Flowchart: Start → rule generation phase, taking sales records (errors from the sales forecast) and location data → bottom-up rule generation → entropy-based rule ranking → rule filtering → do entropy and confidence satisfy the decided level? (Yes/No) → corrective action]

Rule network example

Activated Nodes example

Corrective Action

U_12= 0

U_4_11= 58

U_3= 221

U_2= 1149

U_1= 3583

Ok= 1115

O_1= 7

O_2= 1

O_3= 0

O_4_11= 0

O_12= 0

KDS

• Bottom-up: start from the input database

• No Memory-Bound processing

• Minimal data preprocessing

• Separates the learning phase from the action phase

• Evaluation: for 10117 cases 8.9% ($?)

KDS

• Is this research? Is this a case study?

• Is this management research?

• Why should I know about it as a researcher/manager/engineer?

Acknowledgment

• All rights to trademarks and website contents belong to their lawful owners
