Sponsored Search Seminar Group Project: Approach and Initial Results
Edi Bice, Kuzman Ganchev, Alex Kulesza, Qian Liu, Jinsong Tan, Qiuye Zhao
Initial Ideas
• Analyze keyword markets related by use of modifiers
– ‘lexus’ vs. ‘used lexus’
– ‘electrician’ vs. ‘chicago electrician’
• Develop better models for real-world bidding behavior
– Not “uniformly at random from [a,b]”
Combined Approach
• Develop a parameterized model for bids on a single query
– E.g., parameters are first bid and rate of falloff
• Collect bids for a wide array of keyword/modifier pairs
– Fit the model to each bid viewer result
• Use model parameters as analysis quantities
– E.g., the ‘free’ modifier lowers the first bid and decreases rate of falloff
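The slides do not give the functional form of the model, so the following is only a sketch under an assumed form: bids fall off exponentially with position, bid_k ≈ first_bid * exp(-falloff * k). The function name `fit_bid_model` and the sample bids are hypothetical. The fit is ordinary least squares on the logarithm of the bids.

```python
import math

def fit_bid_model(bids):
    """Fit bid_k ~ first_bid * exp(-falloff * k) for k = 0, 1, 2, ...

    The exponential form is an assumption, not the authors' model.
    Uses least squares on log(bid), so every bid must be positive.
    Returns (first_bid, falloff).
    """
    n = len(bids)
    if n < 2:
        raise ValueError("need at least two bids")
    xs = list(range(n))
    ys = [math.log(b) for b in bids]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Made-up bids for one query, highest first (illustration only).
example_bids = [2.0, 1.0, 0.5, 0.25]
first_bid, falloff = fit_bid_model(example_bids)
print(first_bid, falloff)
```

With such a fit, each keyword/modifier pair is summarized by two numbers, which is what makes a comparison like "the ‘free’ modifier lowers the first bid" possible.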
Analysis
• Noise is an issue
– Consider keywords and modifiers in groups
• Generate matrices showing effect of each modifier group on each base keyword group
– Preliminary results today
• So far, groups created by hand
– Can we automate this? More later…
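One way to build such a matrix is to average a per-pair quantity over every (modifier group, base group) cell. The sketch below assumes the quantity is a fitted first bid; the record layout, the function name `effect_matrix`, and the numbers are all hypothetical.

```python
from collections import defaultdict

def effect_matrix(records):
    """Average a value per (modifier_group, base_group) cell.

    records: iterable of (base_group, modifier_group, value) tuples.
    Cells with no records are simply absent from the result.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for base_group, modifier_group, value in records:
        key = (modifier_group, base_group)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

# Made-up records for illustration only.
records = [
    ("cars", "price", 0.4),
    ("cars", "price", 0.6),
    ("travel", "price", 0.8),
    ("travel", "location", 0.7),
]
matrix = effect_matrix(records)
print(matrix[("price", "cars")])
```

Averaging over groups rather than single keywords is what dampens the noise mentioned above.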
Outline
1. Keywords and modifiers (Edi)
2. Initial results (Qian and Qiuye)
3. Ongoing work: Modeling bids (Kuzman)
4. Ongoing work: Modeling values (Jinsong)
5. Ongoing work: Automatic clustering (Alex)
Keyword Base Groups and Modifiers
relevant, popular, diverse, and interesting
− what some people search for
− affected differently by modifiers
− differ in several aspects (spatial, temporal, expense)
eight groups with ~600 keywords total
− one or two groups per person
six modifier groups, ~50 modifiers
− modeling phases of consumer interaction
− not necessarily applicable to all base keywords
32K base-modifier pairs
− sparsity
− data collection (Tales from the (s)Crypt)
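The base-modifier pairs can be enumerated mechanically. A minimal sketch, assuming each query places the modifier before the base keyword (as in ‘used lexus’); the short lists are samples taken from the slides, and `make_queries` is a hypothetical helper name.

```python
from itertools import product

def make_queries(base_keywords, modifiers):
    """Return every 'modifier base' query string.

    Modifier-first word order is an assumption based on the
    examples 'used lexus' and 'chicago electrician'.
    """
    return [f"{mod} {base}" for base, mod in product(base_keywords, modifiers)]

base_keywords = ["lexus", "electrician", "zoloft"]
modifiers = ["used", "cheap", "chicago"]
queries = make_queries(base_keywords, modifiers)
print(len(queries))  # 3 base keywords x 3 modifiers
```

Many of the generated pairs will attract no bids at all, which is the sparsity problem noted above.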
Base Keywords
Cars (Alex)
− toyota camry, chevy, ford suv, porsche 911
Drugs, medical (Edi)
− zoloft, cialis, psoriasis, sciatica, liposuction
Electronics, software (Jinsong)
− xbox, mp3, pda, oracle, world of warcraft
Travel (Kuzman)
− airfare, cruise, safari, sailing, vacation
Local and non-local services (Qian)
− electrician, locksmith, ** insurance, ** loan
Subscription services (Qiuye)
− cable tv, gym membership, magazine subscription
Keyword Modifiers
INFO:
− info, information
− specs, specifications
− reviews, ratings
− prices
− coupon, rebate
− guide, news
QUALITY:
− best
− luxury
− favorite
− inclusive, exclusive
− preferred
− used, new
LOCATION:
− 20 U.S. States
− 20 U.S. Cities
PRICE:
− cheap, free
− bargain, discount, deal
− special, sale
− budget, affordable
− expensive
ACTION:
− buy, sell, purchase
− lease, rent, hire
POST:
− support
− parts
− repair
− mechanic
− manufacturer
− warranty
Base groups vs. modifier groups
modifier    drugs     electronics  local service  medical   non-local service  software  subscription  travel
null        0.878548  1.026176     1.264545       1.742041  4.6692             0.662973  0.7644633     0.815219
action      -1        0.665185     0.498667       0.3125    1.409231           0.297857  0.4692941     0.323721
info        0.285461  0.259721     0.487416       1.008542  1.341488           0.223793  0.2439355     0.303883
location    -1        0.38551      1.58505        1.651934  2.829279           0.264792  0.5842331     0.729871
post        -1        0.385122     0.713784       -1        0.730417           0.233529  0.3529545     0.11
price       -1        0.554332     0.509529       0.558636  1.644848           0.186486  0.4626552     0.777972
quality     0.2425    0.522697     0.454516       0.763077  1.38907            0.262778  0.3745283     0.718875
Base groups vs. price modifier sub-groups
modifier    drugs     electronics  local service  medical   non-local service  software  subscription  travel
null        0.878548  1.026176     1.264545       1.742041  4.6692             0.662973  0.7644633     0.815219
cheap       -1        0.806774     0.589722       0.781818  3.125263           0.166667  0.6505769     0.723011
free        -1        0.297895     0.4605         0.221538  1.115909           0.14      0.3020408     0.235
discount    -1        0.833214     0.550303       0.68      2.51               0.132     0.6111864     0.964681
deal        -1        0.6528       0.321538       0.185     1.572308           0.34      0.5138235     1.119375
budget      -1        0.236667     0.908571       -1        1                  -1        0.173         0.618889
special     -1        0.213        0.346364       0.1       1.1                0.15      0.3727778     0.824286
bargain     -1        0.314737     0.214286       -1        0.893846           0.1       0.4636364     0.896842
affordable  -1        0.414615     0.682          0.87375   2.181333           -1        0.3394737     0.481333
sale        -1        0.63037      0.395238       0.2       0.795625           0.2525    0.3460606     0.538889
expensive   -1        0.155556     0.115          -1        0.82               -1        0.12          0.124
Base groups vs. location modifier sub-groups
modifier    drugs     electronics  local service  medical   non-local service  software  subscription  travel
null        0.87855   1.02618      1.26455        1.74204   4.6692             0.66297   0.76446       0.81522
eastern     -1        0.590606     1.718561       1.961528  2.767216           0.188571  0.710263      0.77661
central     -1        0.344068     1.631514       1.684847  2.793631           0.217391  0.562696      0.68305
mountain    -1        0.231905     1.449853       1.376667  2.850753           0.575     0.520769      0.633766
pacific     -1        0.365        1.481158       1.494248  3.099717           0.179     0.553864      0.875042
Bid Price Analysis (Ganchev)
[Figure: Number of Bids vs Top Price. x-axis: Number of Bids (0 to 40); y-axis: Price (0 to 3)]
Bid Price Analysis (Ganchev)
[Figure: Number of Bids vs Mean Price. x-axis: Number of Bids (0 to 40); y-axis: Price (0 to 1.6); series: 1st Price, 2nd Price, 3rd Price]
Bid Price Analysis (Ganchev)
Prices in Travel
price from  price to  2nd Bid  3rd Bid  4th Bid
0.1         0.49      15       34       62
0.49        0.88      119      144      144
0.88        1.27      137      130      116
1.27        1.66      40       34       28
1.66        2.05      38       26       21
2.05        2.44      2        3        5
2.44        2.83      17       5        1
2.83        3.22      7        1        0
3.22        3.61      0        0        0
3.61        4         2        0        0
[Figure: Prices in Travel. x-axis: price; y-axis: count; series: 2nd Bid, 3rd Bid, 4th Bid]
Bid Price Analysis (Ganchev)
[Figure: Mean Prices after normalization. x-axis: Bid position (2 to 20); y-axis: Normalized Value (0 to 1); series: Mean above median, Mean, Mean below median]
Bid Price Analysis (Ganchev)
[Figure: Median Prices after normalization. x-axis: Bid position (2 to 20); y-axis: Normalized Value (0 to 1); series: Quartile 1, Median, Quartile 3]
Bid Price Analysis (Ganchev)
Bid differences
price difference  2nd - 3rd  3rd - 4th  4th - 5th
0.1               230        284        290
0.2               59         47         44
0.3               31         24         30
0.4               12         6          5
0.5               6          7          6
0.6               3          2          0
0.7               8          3          2
0.8               4          2          0
0.9               2          1          0
1                 22         1          0
[Figure: Bid differences. x-axis: price difference; y-axis: count; series: 2nd - 3rd, 3rd - 4th, 4th - 5th]
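Bid-difference counts of this kind can be produced from ranked bid lists. The sketch below is illustrative only: the slides do not say how bins were defined, so it assumes each bin is labeled by its upper edge and that anything above the last edge falls into the last bin. The function names and sample bids are hypothetical.

```python
from bisect import bisect_left

def rank_gap(bids, rank):
    """Gap between the bid at `rank` and the next lower bid (ranks are 1-indexed)."""
    ranked = sorted(bids, reverse=True)
    return ranked[rank - 1] - ranked[rank]

def histogram(values, upper_edges):
    """Count values per bin, where bin i covers values up to upper_edges[i].

    Values above the last edge are counted in the last bin (an assumption).
    """
    counts = [0] * len(upper_edges)
    for v in values:
        i = min(bisect_left(upper_edges, v), len(upper_edges) - 1)
        counts[i] += 1
    return counts

# Made-up bid lists, one per keyword (illustration only).
auctions = [
    [1.50, 1.25, 1.20, 0.70],
    [0.90, 0.80, 0.45, 0.40],
]
edges = [0.1, 0.2, 0.3, 0.4, 0.5]
gaps_2_3 = [rank_gap(bids, 2) for bids in auctions]  # 2nd - 3rd differences
hist = histogram(gaps_2_3, edges)
print(hist)
```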
Bid Price Analysis (Ganchev)
Bid Differences
price difference  2nd - 3rd  3rd - 4th  4th - 5th
0.01              92         107        153
0.02              24         20         15
0.03              10         22         16
0.04              15         17         19
0.05              20         18         17
0.06              11         1          8
0.07              4          6          9
0.08              8          6          10
0.09              6          8          13
0.1               187        172        117
[Figure: Bid Differences. x-axis: price difference; y-axis: count; series: 2nd - 3rd, 3rd - 4th, 4th - 5th]
Bid Price Analysis (Ganchev)
Normalized price differences
bid difference  2nd - 3rd  3rd - 4th  4th - 5th
(unlabeled)     190        241        230
0.1             35         44         63
(unlabeled)     43         24         30
0.2             32         23         19
(unlabeled)     16         15         16
0.3             14         13         9
(unlabeled)     12         8          3
0.4             9          3          3
(unlabeled)     4          5          0
0.5             22         1          4
[Figure: Normalized price differences. x-axis: bid difference; y-axis: count; series: 2nd - 3rd, 3rd - 4th, 4th - 5th]
Modeling Bidder Values
• The questions:
– What is the distribution of values (of all potential bidders) for a random keyword?
– What is the distribution of values for a keyword from a specific category?
– How does a modifier affect the distribution of values?
• In the current literature, such values are often assumed to be uniformly distributed over some interval [a,b]
– An oversimplification
• Our experiments set out to answer these questions
Modeling bidder values
• The problem:
– Bidder values are never directly observable
• Estimate bidder b_i’s value with the maximum bid ever observed during some period of time
• Assumptions:
– 1. The max bid is highly (and positively) correlated with the bidder’s value
– 2. The bid of any bidder does not vary too much over the period of time we observe
Experiment Setup
• 1. Sample a set of keywords
• 2. Observe the bids over, say, a few weeks for each keyword X
• 3. Record the max bid, max(b_i, X), for each bidder b_i
• 4. Normalize these data according to some criterion
– E.g., by dividing by the highest max bid for X among all bidders
• max(b_i, X) → max(b_i, X) / max_j{ max(b_j, X) }
– Or by further taking into consideration n_X, the average number of bidders for X
• max(b_i, X) → [ (n_X + 1) / n_X ] * [ max(b_i, X) / max_j{ max(b_j, X) } ]
– Now each (keyword, bidder value) pair maps to a point in [0,1]
• 5. Plotting all such data points in [0,1] will give us a rough idea of the "prior" distribution of bidder values for a random keyword
• 6. Come up with some statistical model that fits the data
– Hopefully also come up with a theory that explains it
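Steps 3 and 4 can be sketched as follows. The observation format (a list of (bidder, bid) records for one keyword), the function names, and the sample numbers are assumptions made for illustration. Note that with the (n_X + 1)/n_X factor the largest normalized value comes out slightly above 1, so the second variant would need clipping or rescaling to stay inside [0,1].

```python
def max_bid_per_bidder(observations):
    """Step 3: highest bid observed for each bidder on one keyword."""
    best = {}
    for bidder, bid in observations:
        best[bidder] = max(bid, best.get(bidder, bid))
    return best

def normalize_max_bids(max_bids, avg_num_bidders=None):
    """Step 4: divide by the highest max bid among all bidders.

    If avg_num_bidders (n_X) is given, also multiply by (n_X + 1) / n_X.
    """
    top = max(max_bids.values())
    scale = 1.0
    if avg_num_bidders is not None:
        scale = (avg_num_bidders + 1) / avg_num_bidders
    return {bidder: scale * bid / top for bidder, bid in max_bids.items()}

# Made-up observations for one keyword X (illustration only).
observations = [("b1", 1.0), ("b1", 2.0), ("b2", 0.5), ("b2", 1.0)]
max_bids = max_bid_per_bidder(observations)
plain = normalize_max_bids(max_bids)
adjusted = normalize_max_bids(max_bids, avg_num_bidders=2)
print(plain, adjusted)
```

Pooling the normalized values over all sampled keywords gives the point cloud described in step 5.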
Automatic Clustering
• Can we choose keyword/modifier groups automatically?
• Idea: use data to guide clustering
– Modifiers that have similar effects should be grouped together
– Base keywords that are affected similarly should be grouped together
• Might rediscover original groups, or find interesting new ones (or garbage)
Algorithmic Ideas
• Suppose l fixed base keyword groups
• Compute vector of length l per modifier
– ith dimension is average effect of modifier on keyword group i
• Can run k-means (or something else…)
– Should produce clusters with desired property
• Now suppose k fixed modifier groups
– Can do the same thing for base keywords
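A minimal k-means sketch for the modifier side, assuming each modifier already has a vector of l average effects. The vectors below are made up (l = 2), ‘boston’ is a hypothetical modifier, and the implementation is plain Lloyd-style iteration with randomly chosen initial prototypes rather than anything the slides specify.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20, seed=0):
    """Cluster vectors into k groups; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest prototype.
        for i, v in enumerate(vectors):
            labels[i] = min(range(k), key=lambda c: sq_dist(v, centers[c]))
        # Update step: each prototype moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its old prototype
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Made-up effect vectors: one entry per base keyword group.
modifier_vectors = {
    "cheap": [0.70, 0.60],
    "discount": [0.75, 0.62],
    "chicago": [1.60, 2.80],
    "boston": [1.65, 2.75],
}
names = list(modifier_vectors)
labels, centers = kmeans([modifier_vectors[n] for n in names], k=2)
cluster_of = dict(zip(names, labels))
print(cluster_of)
```

The same function applies unchanged to base keywords once each keyword has a vector of k average effects, one per modifier group.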
Algorithmic Ideas
• Idea: alternate k/l-means steps for base keywords and modifiers
– Recompute vectors at each step
[Diagram: Randomly initialize prototypes → Assign modifiers to clusters → Re-center modifier prototypes → Assign keywords to clusters → Re-center keyword prototypes]