2002/4/10 IDSL seminar
Estimating Business Targets
Advisor: Dr. Hsu
Graduate: Yung-Chu Lin
Data Source: Datta et al., KDD01, pp. 420-425.
Abstract
Propose a new solution to the classical econometric task of frontier analysis
Combine nearest neighbor methods and classical statistical methods
Identify under-marketed customers
Benchmark regional directory divisions
Outline
Motivation
Objective
Historical approaches
Target estimation methodology
Case study
Conclusion
Personal opinion
Motivation
Setting targets is a critical task
Traditionally, the target of each entity is set to the average among the entities
Two challenges
– The characteristics of the entities will have a heavy influence on the outcome
– The inherent unsupervised nature of the problem
Objective
Provide a methodology for estimating unsupervised maximal or minimal targets
Setting revenue target expectations for individual customers
Revenue target setting for regional yellow page directories
Mathematical Programming
$y_i = g(x_i)$
where $y_i$ is the target for $x_i$, a vector for the ith observation
Sensitive to errors or outliers, since it assumes that all observed targets define the possible space
Economics
$y_i = g(x_i) - \varepsilon_i$
where $\varepsilon_i$ is a non-negative error term
Requires a model for the error term and for g
Target Estimation Methodology
Nearest neighbor vs. clustering
The neighborhoods
The distance function
Target estimation from the neighborhoods
A heuristic for comparing neighborhoods
Nearest Neighbor vs. Clustering
Time complexity
– Clustering has lower time complexity than nearest neighbor
Problem of clustering
– Two similar entities can fall into different clusters
– The higher the dimension, the more serious this effect becomes
– Nearest neighbor does not have this problem
The Neighborhoods
xi: the ith observation
yi: the variable containing its target value
ni: the neighborhood for xi, where ni is a set of observations {xi, xj, …}
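The neighborhood definition above can be sketched in a few lines. This is a minimal brute-force illustration, not the authors' implementation: the function name `k_nearest_neighborhood`, the sample points, and the use of plain Euclidean distance are all assumptions made for the example, as is the choice to include xi itself in addition to its k nearest neighbors.

```python
import math

def k_nearest_neighborhood(i, observations, k, distance=math.dist):
    """Return the indices forming n_i: x_i itself plus its k nearest neighbors.

    Assumption: x_i is included in its own neighborhood, as the set
    {xi, xj, ...} on the slide suggests, and k counts the other members.
    """
    others = [j for j in range(len(observations)) if j != i]
    # Sort the remaining observations by their distance to x_i.
    others.sort(key=lambda j: distance(observations[i], observations[j]))
    return [i] + others[:k]

# Hypothetical two-dimensional observations, for illustration only.
observations = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (0.0, 0.2), (5.1, 5.0)]
neighborhood = k_nearest_neighborhood(0, observations, k=2)
print(neighborhood)
```

The `distance` parameter is left pluggable so that a weighted distance over mixed continuous and nominal variables could be substituted.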
The Distance Function
Continuous variables are standardized
e.g.
Continuous: (2,1) and (3,4) give distance $\sqrt{(1 \cdot w_1)^2 + (3 \cdot w_2)^2}$
Nominal: (a,b) and (a,c) give distance $\sqrt{0 + w_2^2}$
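The two examples on this slide can be reproduced with a small weighted distance function. This is a sketch under stated assumptions: the name `weighted_distance`, the `is_nominal` flags, and the unit weights are hypothetical, and the rule that a nominal mismatch contributes a difference of 1 (and a match contributes 0) is a reading of the slide's nominal example rather than something the slide states in words.

```python
import math

def weighted_distance(a, b, weights, is_nominal):
    """Weighted Euclidean-style distance over mixed variables.

    Continuous variables contribute (w * (a - b))^2.
    Nominal variables contribute w^2 on a mismatch and 0 on a match
    (an assumption inferred from the slide's example).
    """
    total = 0.0
    for u, v, w, nominal in zip(a, b, weights, is_nominal):
        if nominal:
            diff = 0.0 if u == v else 1.0
        else:
            diff = u - v
        total += (w * diff) ** 2
    return math.sqrt(total)

# Continuous example from the slide: (2,1) vs (3,4), with w1 = w2 = 1 assumed.
d_continuous = weighted_distance((2, 1), (3, 4), (1.0, 1.0), (False, False))
# Nominal example from the slide: (a,b) vs (a,c), with w1 = w2 = 1 assumed.
d_nominal = weighted_distance(("a", "b"), ("a", "c"), (1.0, 1.0), (True, True))
print(d_continuous, d_nominal)
```

With unit weights the continuous example evaluates to the square root of 10 and the nominal example to 1; lowering a weight shrinks that variable's influence on the neighborhood.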
Target Estimation From the Neighborhoods
Let yi(1), yi(2), …, yi(k) be the order statistics, so that yi(1) is the largest
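As a small illustration of the ordering step only, the target values in one neighborhood can be sorted in descending order so that the first element plays the role of yi(1). The helper name `order_statistics_desc` and the sample values are hypothetical; the slide does not show how the order statistics are combined into a target estimate, so no estimator is sketched here.

```python
def order_statistics_desc(targets):
    """Return the neighborhood's target values sorted so index 0 is the largest."""
    return sorted(targets, reverse=True)

# Hypothetical target values for the members of one neighborhood.
neighborhood_targets = [120.0, 340.0, 95.0, 210.0]
ordered = order_statistics_desc(neighborhood_targets)
largest = ordered[0]  # corresponds to yi(1)
print(ordered, largest)
```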
A Heuristic for Comparing Neighborhoods
Maximal frontier: E(xi) ranges from 0 to 1
Minimal frontier: E(xi) >= 1
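The slide gives the ranges of E(xi) but not its formula. One reading consistent with both ranges is the ratio of the observed value to the estimated target; the sketch below uses that ratio purely as an assumption. The function name `efficiency` and the numbers are hypothetical.

```python
def efficiency(observed, target):
    """Ratio of observed value to estimated target.

    Assumption: E(x_i) = y_i / estimated target. This matches the stated
    ranges (at most 1 under a maximal frontier, at least 1 under a minimal
    frontier) but is not confirmed by the slide.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    return observed / target

# Maximal frontier: the target lies at or above the observed value.
e_maximal = efficiency(observed=60.0, target=80.0)
# Minimal frontier: the target lies at or below the observed value.
e_minimal = efficiency(observed=60.0, target=40.0)
print(e_maximal, e_minimal)
```

Because the ratio is unit-free, it can be compared across neighborhoods whose targets are on very different scales.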
Case Study
Target revenues for directory book advertisers
Target revenue for regional directories
(1) Target Revenues for Directory Book Advertisers
Goal
– Find businesses that have low spending relative to those with otherwise similar characteristics
Three categories of data available
– Advertiser: e.g. number of employees
– Directory: e.g. distribution size
– Market: e.g. median household income
Calculating Nearest Neighbors
Standardize continuous data: natural log
K=4
Weight the variables equally
– But decrease the weights for many of the directory and market variables
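The preprocessing step might look like the following sketch. Only the natural log comes from the slide; the additional z-scoring (subtracting the mean and dividing by the population standard deviation), the function name `log_standardize`, and the sample employee counts are assumptions added for illustration.

```python
import math
import statistics

def log_standardize(values):
    """Take natural logs, then z-score the result.

    Assumption: the z-scoring step is not stated on the slide; it is one
    common way to put log-transformed variables on a comparable scale.
    Values must be positive and not all identical.
    """
    logs = [math.log(v) for v in values]
    mean = statistics.fmean(logs)
    spread = statistics.pstdev(logs)
    return [(x - mean) / spread for x in logs]

# Hypothetical employee counts for five advertisers.
employees = [2, 5, 12, 40, 150]
scaled = log_standardize(employees)
print(scaled)
```

The log transform compresses heavy right tails such as employee counts or distribution sizes, so a few very large entities do not dominate the distance calculation.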
(2) Target Revenue for Regional Directories
Goal
– Benchmark regional directory divisions
Separate the data into two sets
– Training set: 80%
– Test set: 20%
K=4
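An 80/20 split of this kind can be sketched as follows. The slide does not say how the records were assigned to each set, so the seeded random shuffle, the function name `split_train_test`, and the placeholder directory names are assumptions.

```python
import random

def split_train_test(records, train_fraction=0.8, seed=0):
    """Shuffle a copy of the records and cut it into training and test sets.

    Assumption: random assignment with a fixed seed for repeatability.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

# Placeholder identifiers standing in for regional directories.
directories = [f"directory_{n}" for n in range(10)]
train, test = split_train_test(directories)
print(len(train), len(test))
```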
Book Type
System book
– An entire serving area
System-neighborhood book
– A smaller number of geographic areas in the franchise area
Neighborhood book
– Areas outside of the telephone company’s franchise area
Figure panels: Neighborhood books, System books, Non-system books
The x-axis shows log(distribution) and the y-axis shows E(x)
Conclusion
Present a general data mining methodology for estimating business targets by frontier analysis
First case
– Increase sales focus on the under-marketed customers
– Increase the potential revenue by several million
Second case
– Estimate optimal revenue performance targets for directory divisions
– The potential increase for directory books is a minimum of several million dollars