Ed Snelson. Counterfactual Analysis
![Page 1: Ed Snelson. Counterfactual Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062319/554f433eb4c905423f8b46c9/html5/thumbnails/1.jpg)
Counterfactual analysis: a Big Data case-study using Cosmos/SCOPE
Ed Snelson
Work by Jonas Peters, Joaquin Quiñonero Candela, Denis Xavier Charles, D. Max Chickering, Elon Portugaly, Dipankar Ray, Patrice Simard, Ed Snelson, and Léon Bottou
http://jmlr.org/papers/v14/bottou13a.html
I. MOTIVATION
Search ads
The search ads ecosystem
[Diagram: User, Advertiser, and Search-engine exchange Queries, Ads & Bids, Ads, Prices, and Clicks (and consequences), which feed Learning. Three loops are marked: the user feedback loop, the advertiser feedback loop, and the learning feedback loop.]
Learning to run a marketplace
• The learning machine is not a machine but is an organization with lots of people doing stuff!
How can we help?
• Goal: improve marketplace machinery such that its long term revenue is maximal
• Approximate goal by improving multiple performance measures (KPIs) related to all players
• Provide data for decision making
• Automatically optimize parts of the system
Outline from here on
II. Online Experimentation
III. Counterfactual measurements
IV. Cosmos/SCOPE
V. Implementation details
II. ONLINE EXPERIMENTATION
How do parameters affect KPIs?
• We want to determine how certain auction parameters affect KPIs
• Three options:
1. Offline log analysis – “correlational”
2. Auction simulation
3. Online experimentation – “causal”
The problem with correlation analysis (Simpson’s paradox)
Trying to decide whether a drug helps or not
• Historical data:

All patients (treatment rate 50%)

| | All | Survived | Died | Survival rate |
|---|---|---|---|---|
| Treated | 5,000 | 2,100 | 2,900 | 42% |
| Not treated | 5,000 | 2,900 | 2,100 | 58% |

• Conclusion: don’t give the drug
But what if the doctors were saving the drug for the severe cases?

Severe cases (treatment rate 80%)

| | All | Survived | Died | Survival rate |
|---|---|---|---|---|
| Treated | 4,000 | 1,200 | 2,800 | 30% |
| Not treated | 1,000 | 100 | 900 | 10% |

Mild cases (treatment rate 20%)

| | All | Survived | Died | Survival rate |
|---|---|---|---|---|
| Treated | 1,000 | 900 | 100 | 90% |
| Not treated | 4,000 | 2,800 | 1,200 | 70% |

• Conclusion reversed: the drug helps for both severe and mild cases
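The reversal is easiest to see by recomputing the rates. Below is a small C# sketch that uses only the survived and total counts from the severe and mild tables. It derives the rates per stratum and then pools the strata. The class and member names are illustrative, not from the talk.

```csharp
using System.Linq;

// Drug example: survival counts per severity stratum.
public static class SimpsonsParadox
{
    public record Stratum(int TreatedSurvived, int TreatedTotal,
                          int UntreatedSurvived, int UntreatedTotal)
    {
        public double TreatedRate => (double)TreatedSurvived / TreatedTotal;
        public double UntreatedRate => (double)UntreatedSurvived / UntreatedTotal;
    }

    public static readonly Stratum Severe = new(1200, 4000, 100, 1000);
    public static readonly Stratum Mild = new(900, 1000, 2800, 4000);

    // Pooling adds up the counts and ignores severity, the confounder that
    // drives both who gets treated and who survives.
    public static Stratum Pool(params Stratum[] strata) => new(
        strata.Sum(s => s.TreatedSurvived), strata.Sum(s => s.TreatedTotal),
        strata.Sum(s => s.UntreatedSurvived), strata.Sum(s => s.UntreatedTotal));
}
```

When pooled, the treated group survives at 42% against 58% for the untreated group. Within each stratum the ordering flips: 30% vs 10% for severe cases and 90% vs 70% for mild cases.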
Overkill?
Pervasive causation paradoxes in ad data!
Example:
– Logged data shows a positive correlation between event A “First mainline ad gets a high quality score” and event B “Second mainline ad receives a click”.
– Do high-quality ads encourage clicking below?
– Controlling for event C “Query categorized as commercial” reverses the correlation for both commercial and non-commercial queries.
Randomized experiments
Randomly select who to treat
• Selection is independent of all confounding factors
• It therefore eliminates Simpson’s paradox and allows:
Counterfactual estimates
• If we had given the drug to a fraction $x$ of the patients, the success rate would have been $60\% \times x + 40\% \times (1 - x)$

Whole population (treatment rate 30%)

| | All | Survived | Died | Survival rate |
|---|---|---|---|---|
| Treated | 3,000 | 1,800 | 1,200 | 60% |
| Not treated | 7,000 | 2,800 | 4,200 | 40% |
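Treatment in this table was assigned at random, so each arm's survival rate can be reused for any treatment fraction. Here is a minimal C# sketch of the counterfactual estimate from the randomized counts. The names are hypothetical.

```csharp
using System;

public static class RandomizedTrial
{
    // Counts from the randomized trial (30% of patients treated at random).
    const int TreatedTotal = 3000, TreatedSurvived = 1800;
    const int UntreatedTotal = 7000, UntreatedSurvived = 2800;

    public static double TreatedRate => (double)TreatedSurvived / TreatedTotal;       // 60%
    public static double UntreatedRate => (double)UntreatedSurvived / UntreatedTotal; // 40%

    // Randomization makes each arm representative of the whole population,
    // so the success rate for any treatment fraction x is a mixture of the arms.
    public static double CounterfactualSuccess(double x)
    {
        if (x < 0 || x > 1) throw new ArgumentOutOfRangeException(nameof(x));
        return TreatedRate * x + UntreatedRate * (1 - x);
    }
}
```

At x = 0.3 it gives about 0.46. That matches the overall survival actually observed, (1,800 + 2,800) / 10,000.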
Experiments in the online world
• A/B tests are used throughout the online world to compare different versions of the system
– A random fraction of the traffic (a flight) uses click-prediction system A
– Another random fraction uses click-prediction system B
• Wait for a week, measure KPIs, choose best!
• Our framework takes this one step further…
III. COUNTERFACTUAL MEASUREMENTS
Counterfactuals
Measuring something that did not happen
“How would the system have performed if, when the data was collected, we had used 𝑠𝑦𝑠𝑡𝑒𝑚∗ instead of 𝑠𝑦𝑠𝑡𝑒𝑚?”
Replaying past data
Classification example
• Collect labeled data in existing setup
• Replay the past data to evaluate what the performance would have been if we had used classifier θ.
• Requires knowledge of all functions connecting the point of change to the point of measurement.
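Replay works when the measurement is a known function of the changed component's output. Here is a hedged C# sketch. It assumes the log holds one score and one true label per example, and that the candidate classifier θ is a function of that score. All names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReplayEvaluation
{
    // A logged example: the input the system saw and its true label.
    public record LoggedExample(double Score, bool Label);

    // Replays past data through a candidate classifier and returns its accuracy.
    // Valid only because accuracy depends on nothing but the classifier's output
    // and the logged label; the decision does not change the label itself.
    public static double ReplayAccuracy(IEnumerable<LoggedExample> log,
                                        Func<double, bool> classifier)
    {
        var examples = log.ToList();
        if (examples.Count == 0) throw new ArgumentException("empty log", nameof(log));
        return examples.Count(e => classifier(e.Score) == e.Label) / (double)examples.Count;
    }
}
```

Swapping in a different threshold, for example `s => s > 0.3`, evaluates another θ on the same log without collecting new data.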
Concrete example: mainline reserve (MLR)
[Diagram: results page with Mainline and Sidebar ad areas; an ad is placed in the mainline only if Ad Score > MLR.]
Online randomization
Q: Can we estimate the results of a change counterfactually (without actually performing the change)?
A: Yes, if 𝑠𝑦𝑠𝑡𝑒𝑚∗ and 𝑠𝑦𝑠𝑡𝑒𝑚 are non-deterministic (and close enough)
[Figure: Deterministic vs Randomized, both over the MLR axis. Deterministic: fixed reserves MLR and MLR∗. Randomized: distributions P(MLR) and P∗(MLR).]
For each auction, a random MLR is used online, drawn from the data-collection distribution 𝑃(𝑀𝐿𝑅)
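The slides do not say what form P(MLR) takes. Purely as an assumption for illustration, the sketch below multiplies a production reserve by log-normal noise for each auction, using a Box-Muller normal draw. It also exposes the matching density, since the reweighting needs P(MLR_i). `MlrRandomizer`, `BaseMlr` and `Sigma` are hypothetical names.

```csharp
using System;

// Hypothetical data-collection distribution: MLR = BaseMlr * exp(Sigma * z), z ~ N(0, 1).
public sealed class MlrRandomizer
{
    private readonly Random rng;
    public double BaseMlr { get; }
    public double Sigma { get; }

    public MlrRandomizer(double baseMlr, double sigma, int seed)
    {
        BaseMlr = baseMlr;
        Sigma = sigma;
        rng = new Random(seed);
    }

    // Standard normal draw via the Box-Muller transform.
    private double NextGaussian()
    {
        double u1 = 1.0 - rng.NextDouble(); // in (0, 1], avoids Log(0)
        double u2 = rng.NextDouble();
        return Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);
    }

    // Draws the reserve used for one auction.
    public double Next() => BaseMlr * Math.Exp(Sigma * NextGaussian());

    // Log-normal density with median baseMlr and log-scale sigma.
    public static double Density(double mlr, double baseMlr, double sigma)
    {
        double z = (Math.Log(mlr) - Math.Log(baseMlr)) / sigma;
        return Math.Exp(-0.5 * z * z) / (mlr * sigma * Math.Sqrt(2 * Math.PI));
    }
}
```

With this choice, the logged reserve's density under both the data-collection and a counterfactual distribution is available in closed form.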
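Estimating counterfactual KPIs
Usual additive KPI:
$$\mathrm{Clicks}_{\text{total}} = \sum_i \mathrm{Clicks}(\text{auction}_i)$$
Counterfactual KPI:
$$\mathrm{Clicks}^*_{\text{total}} \approx \sum_i w_i^*\, \mathrm{Clicks}(\text{auction}_i), \qquad w_i^* = \frac{P^*(\mathrm{MLR}_i)}{P(\mathrm{MLR}_i)}$$
• Weighted sum: auctions whose MLR is more likely under the counterfactual distribution $P^*$ than under the data-collection distribution $P$ get higher weight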
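The estimator is a plain importance-weighted sum. Here is a minimal C# sketch. It assumes each logged auction records the reserve actually drawn and its click count, and that both densities can be evaluated at that reserve. The names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class CounterfactualKpi
{
    // One logged auction: the randomized reserve actually used, and its clicks.
    public record Auction(double Mlr, double Clicks);

    // Estimates Clicks*_total = sum_i w_i* Clicks(auction_i),
    // with w_i* = P*(MLR_i) / P(MLR_i).
    public static double WeightedTotal(IEnumerable<Auction> log,
                                       Func<double, double> p,
                                       Func<double, double> pStar)
    {
        double total = 0;
        foreach (var a in log)
        {
            double pi = p(a.Mlr);
            // The logged reserve was drawn from P, so P(MLR_i) must be positive.
            if (pi <= 0) throw new ArgumentException("P(MLR) must be positive at logged reserves");
            total += pStar(a.Mlr) / pi * a.Clicks;
        }
        return total;
    }
}
```

With P* = P every weight is 1 and the estimate reduces to the ordinary total. Shifting probability toward reserves that were logged with many clicks raises the estimate.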
Exploration
[Figure: data-collection distribution P(ω) vs counterfactual distribution P∗(ω), annotated “Quality of the estimation”.]
• Confidence intervals reveal whether the data-collection distribution P(ω) performs sufficient exploration to answer the counterfactual question of interest.
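The talk's inner "exploration" and outer "sample-size" intervals are not spelled out on these slides. The sketch below therefore swaps in two standard, simpler diagnostics. The first is a textbook normal-approximation interval for Σ w_i y_i. The second is Kish's effective sample size of the weights, which falls when P(ω) covers P∗(ω) poorly. Treat it as an illustration of the idea, not as the method behind the intervals in the talk.

```csharp
using System;
using System.Linq;

public static class WeightedInterval
{
    // 95% normal-approximation interval for T = sum_i w_i * y_i, treating the
    // terms w_i * y_i as i.i.d. draws (a simplification, not the talk's intervals).
    public static (double Estimate, double Low, double High) Interval(double[] w, double[] y)
    {
        if (w.Length != y.Length || w.Length < 2)
            throw new ArgumentException("need matching arrays of length >= 2");
        int n = w.Length;
        double[] terms = w.Zip(y, (wi, yi) => wi * yi).ToArray();
        double mean = terms.Average();
        double variance = terms.Sum(t => (t - mean) * (t - mean)) / (n - 1);
        double estimate = n * mean;
        double halfWidth = 1.96 * Math.Sqrt(n * variance); // sd of a sum of n terms
        return (estimate, estimate - halfWidth, estimate + halfWidth);
    }

    // Kish effective sample size: n when all weights are equal, near 1 when
    // a single auction carries almost all the weight (poor exploration).
    public static double EffectiveSampleSize(double[] w) =>
        Math.Pow(w.Sum(), 2) / w.Sum(wi => wi * wi);
}
```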
Clicks vs MLR
[Figure: inner “exploration” interval and outer “sample-size” interval; marked references: control with no randomization, control with 18% lower MLR.]
Number of Mainline Ads vs MLR
This is easy to estimate
Revenue vs MLR
Revenue always has high sample variance
More with the same data
How is this related to A/B testing?
• A/B testing tests 2 specific settings against each other
• Need to know what questions you want to ask beforehand!
Big advantage of more general randomization:
• Collect data first, choose question(s) later
• Randomizing more stuff increases opportunities
But…
• Requires more sophisticated offline log processing
IV. COSMOS/SCOPE
Ad Auction Logs
• ≈ 10 TB of ad-auction logs per day
• Cooked and joined from various raw logs
• Stored in Cosmos, queried via SCOPE
• Small fraction of total Bing logs and jobs:
– Tens of thousands of SCOPE jobs daily
– Tens of PBs read/write daily
Cosmos/SCOPE
SCOPE ≈ Pig/Hive; Cosmos ≈ HDFS
http://research.microsoft.com/en-us/um/people/jrzhou/pub/Scope.pdf
http://research.microsoft.com/en-us/um/people/jrzhou/pub/scope-vldbj.pdf
Cosmos
• Microsoft’s internal distributed data store
• Tens of thousands of commodity servers
• ≈ HDFS, GFS
• Append-only file system, optimized for sequential I/O
• Data replication and compression
Data Representation
1. Unstructured streams
– Custom extractors convert a sequence of bytes into a RowSet, specifying a schema for the columns
2. Structured streams
– Data stored alongside metadata: a well-defined schema and structural properties (e.g. partitioning and sorting information)
– Can be horizontally partitioned into tens of thousands of partitions, e.g. by hash or range partitioning
– Indexes for random access and index-based joins
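Hash and range partitioning take only a few lines to express. The C# below is not Cosmos code. The helper names are hypothetical, and an FNV-1a-style hash over UTF-16 characters stands in for whatever hash the real system uses. `string.GetHashCode()` is avoided because .NET randomizes it per process.

```csharp
using System;

public static class Partitioning
{
    // Hash partitioning: equal keys always land in the same partition.
    public static int HashPartition(string key, int partitions)
    {
        if (partitions <= 0) throw new ArgumentOutOfRangeException(nameof(partitions));
        uint hash = 2166136261u; // FNV-1a offset basis
        unchecked
        {
            foreach (char c in key)
            {
                hash ^= c;
                hash *= 16777619u; // FNV-1a prime
            }
        }
        return (int)(hash % (uint)partitions);
    }

    // Range partitioning: 'bounds' are sorted, exclusive upper bounds, so
    // partition i holds keys in [bounds[i-1], bounds[i]) and there are
    // bounds.Length + 1 partitions in total.
    public static int RangePartition(long key, long[] bounds)
    {
        int idx = Array.BinarySearch(bounds, key);
        // An exact hit on a bound belongs to the next partition; otherwise
        // ~idx is the index of the first bound greater than key.
        return idx >= 0 ? idx + 1 : ~idx;
    }
}
```

Range partitioning keeps sorted neighbours together, which suits range scans. Hash partitioning spreads skewed key ranges more evenly.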
SCOPE scripting language
• SQL-like (in syntax) declarative language specifying a data transformation pipeline
• Each SCOPE statement takes one or more RowSets as input and outputs another RowSet
• Highly extensible with C# expressions, custom operators and data types
• The SCOPE compiler and optimizer are responsible for generating a data-flow DAG for efficient parallel execution
C# Expressions and functions
R1 = SELECT A+C AS ac, B.Trim() AS B1
FROM R
WHERE StringOccurs(C, "xyz") > 2;
#CS
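public static int StringOccurs(string str, string ptrn)
{
    // Counts (possibly overlapping) occurrences of ptrn in str.
    // An empty pattern would otherwise "match" at every position.
    if (string.IsNullOrEmpty(str) || string.IsNullOrEmpty(ptrn)) return 0;
    int cnt = 0;
    int pos = -1;
    while (pos + 1 < str.Length)
    {
        // Ordinal comparison: exact character match, independent of culture.
        pos = str.IndexOf(ptrn, pos + 1, System.StringComparison.Ordinal);
        if (pos < 0) break;
        cnt++;
    }
    return cnt;
}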
#ENDCS
C# String method
C# String expression
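To show what the SCOPE statement above computes, here is a single-machine LINQ analogue over a hypothetical row type. It assumes A, B and C are strings. It reuses the StringOccurs counting logic with an ordinal comparison. SCOPE would compile the same logic into a parallel plan.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ScopeSelectAnalogue
{
    // Hypothetical row type for the input RowSet R.
    public record Row(string A, string B, string C);

    // Same counting logic as StringOccurs: overlapping, ordinal matches.
    public static int StringOccurs(string str, string ptrn)
    {
        if (string.IsNullOrEmpty(str) || string.IsNullOrEmpty(ptrn)) return 0;
        int cnt = 0, pos = -1;
        while (pos + 1 < str.Length)
        {
            pos = str.IndexOf(ptrn, pos + 1, StringComparison.Ordinal);
            if (pos < 0) break;
            cnt++;
        }
        return cnt;
    }

    // In-memory analogue of:
    //   R1 = SELECT A+C AS ac, B.Trim() AS B1 FROM R WHERE StringOccurs(C, "xyz") > 2;
    public static List<(string ac, string B1)> SelectR1(IEnumerable<Row> r) =>
        r.Where(row => StringOccurs(row.C, "xyz") > 2)
         .Select(row => (ac: row.A + row.C, B1: row.B.Trim()))
         .ToList();
}
```

Because the search restarts one character after each match, `StringOccurs("aaaa", "aa")` counts 3.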
C# User-defined types (UDTs)
– Arbitrary C# classes can be used as column types in scripts
– Extremely convenient for easy serialization/deserialization
– Can be referenced in external dlls, C# backing files, and in-script (#CS … #ENDCS)
SELECT UserId, SessionId,
new RequestInfo(binaryData)
AS Request
FROM InputStream
WHERE Request.Browser.IsIE();
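The slide shows only the call sites, `new RequestInfo(binaryData)` and `Request.Browser.IsIE()`. A hypothetical C# UDT behind them might look like the sketch below, which uses BinaryReader/BinaryWriter for the serialization round trip. The fields and the user-agent test are assumptions, not the real Bing types.

```csharp
using System;
using System.IO;

// Hypothetical nested type exposing the IsIE() check used in the WHERE clause.
public sealed class BrowserInfo
{
    public string UserAgent { get; }
    public BrowserInfo(string userAgent) => UserAgent = userAgent ?? "";

    // Crude substring test, for illustration only.
    public bool IsIE() =>
        UserAgent.Contains("MSIE", StringComparison.Ordinal) ||
        UserAgent.Contains("Trident/", StringComparison.Ordinal);
}

// Hypothetical UDT: deserialized from a binary column, queried via properties.
public sealed class RequestInfo
{
    public string Query { get; }
    public BrowserInfo Browser { get; }

    public RequestInfo(string query, string userAgent)
    {
        Query = query ?? "";
        Browser = new BrowserInfo(userAgent);
    }

    // Mirrors new RequestInfo(binaryData) in the script.
    public RequestInfo(byte[] binaryData)
    {
        using var reader = new BinaryReader(new MemoryStream(binaryData));
        Query = reader.ReadString();
        Browser = new BrowserInfo(reader.ReadString());
    }

    public byte[] Serialize()
    {
        var ms = new MemoryStream();
        using (var writer = new BinaryWriter(ms))
        {
            writer.Write(Query);
            writer.Write(Browser.UserAgent);
        }
        return ms.ToArray(); // ToArray still works after the stream is closed
    }
}
```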
C# User-defined operators
– User defined aggregates
• Aggregate interface: Initialize, Accumulate, Finalize
• Can be declared recursive: allows partial aggregation
– MapReduce-like extensions
• PROCESS
• REDUCE: can be declared recursive
• COMBINE
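A recursive aggregate can be thought of as an accumulator whose partial states can be merged. The C# class below is a hypothetical sketch of that pattern, not the SCOPE aggregate interface. It keeps Σw_i and Σw_i·KPI_i. The last step is named FinalizeResult to avoid confusion with .NET's Object.Finalize finalizer.

```csharp
// Hypothetical recursive aggregator: Initialize / Accumulate / Merge / FinalizeResult.
public sealed class WeightedKpiAggregator
{
    private double sumW, sumWKpi;
    private long count;

    public void Initialize() { sumW = 0; sumWKpi = 0; count = 0; }

    public void Accumulate(double w, double kpi)
    {
        sumW += w;
        sumWKpi += w * kpi;
        count++;
    }

    // Merging two partial states gives the same result (up to floating-point
    // rounding) as one pass over all their rows, so partial aggregation can
    // run close to the data before a final combine.
    public void Merge(WeightedKpiAggregator other)
    {
        sumW += other.sumW;
        sumWKpi += other.sumWKpi;
        count += other.count;
    }

    // Weighted total, self-normalized weighted mean, and row count.
    public (double Total, double Mean, long Count) FinalizeResult() =>
        (sumWKpi, sumW > 0 ? sumWKpi / sumW : double.NaN, count);
}
```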
SCOPE compilation and execution
SELECT query, COUNT() AS count FROM "search.log"
USING LogExtractor
GROUP BY query
HAVING count > 1000
ORDER BY count DESC;
OUTPUT TO "qcount.result";
Runtime cost-based optimizer
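The same query can be written in memory with LINQ, which makes the GROUP BY / HAVING / ORDER BY semantics explicit. This is a single-machine analogue only. The log extractor and the cost-based parallel plan are what SCOPE adds on top.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class QueryCount
{
    // Count each query, keep those seen more than minCount times,
    // most frequent first.
    public static List<(string Query, int Count)> FrequentQueries(
        IEnumerable<string> queries, int minCount = 1000) =>
        queries.GroupBy(q => q)                                 // GROUP BY query
               .Select(g => (Query: g.Key, Count: g.Count()))   // COUNT() AS count
               .Where(x => x.Count > minCount)                  // HAVING count > minCount
               .OrderByDescending(x => x.Count)                 // ORDER BY count DESC
               .ToList();
}
```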
SCOPE: Pros/Cons (an opinion)
• Pros:
– Very quick to write simple queries without thinking about parallelization and execution
– Highly extensible with deep C# integration
– UDT columns and C# functions
– Easy development and debugging from Visual Studio (IntelliSense)
• Cons:
– No loop/iteration support, which makes it a poor fit for many ML algorithms
– Batch, rather than interactive
V. IMPLEMENTATION
Counterfactual computation
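• Ideal for a Map-Reduce setting
• Map: $\text{auction}_i \mapsto \mathrm{KPI}(\text{auction}_i)$
• Reduce: $\sum_i w_i^* \ldots$
$$\mathrm{KPI}^*_{\text{total}} = \sum_i w_i^*\, \mathrm{KPI}(\text{auction}_i)$$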
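The map step touches one auction at a time and the reduce step is a sum, which is why the computation parallelizes so easily. Below is a hedged C# sketch that uses PLINQ on one machine as a stand-in for a cluster. The names are hypothetical.

```csharp
using System;
using System.Linq;

public static class CounterfactualMapReduce
{
    public record Auction(double Mlr, double Clicks);

    // Map: each auction independently yields w_i* * KPI(auction_i).
    // Reduce: a sum, so partitions can be combined in any order
    // (exactly so for integers, up to rounding for general doubles).
    public static double Total(Auction[] log, Func<double, double> weight) =>
        log.AsParallel()
           .Select(a => weight(a.Mlr) * a.Clicks) // map
           .Sum();                                // reduce
}
```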
Counterfactual grid
SCOPE pseudo-code for counterfactuals
AuctionLogs = VIEW CosmosLogPath;
SELECT Auction
FROM AuctionLogs;
SELECT ComputeKPIs(Auction) AS KPIs,
ComputeWeightGrid(Auction) AS WeightGrid;
SELECT GridPoint, ComputeWeightedKPIs(KPIs, GridPoint) AS wKPIs
CROSS APPLY WeightGrid AS GridPoint;
SELECT GridPoint, AggregateKPIs(wKPIs) AS TotalKPIs
GROUP BY GridPoint;
SELECT GridPoint, TotalKPIs.Finalize() AS FinalKPIs;
OUTPUT TO "Results.tsv";
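Notes on the pseudo-code:
• Auction: a C# UDT that wraps all logged info about a single auction
• ComputeKPIs, ComputeWeightGrid, ComputeWeightedKPIs: C# UDFs
• CROSS APPLY: unrolls the weight grid
• AggregateKPIs: a recursive aggregator accumulating $w_i$, $w_i\,\mathrm{KPI}_i$, etc.
• TotalKPIs.Finalize(): calls an instance method on the “TotalKPIs” UDT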
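The same dataflow can be sketched in C# with LINQ. Here SelectMany plays the role of CROSS APPLY by unrolling each auction's weight grid, GroupBy plays GROUP BY GridPoint, and Sum plays the aggregator. As a hypothetical assumption not stated on the slides, the data-collection and counterfactual distributions are log-normal and differ only in their median. Each grid point is a multiplier on the base reserve.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CounterfactualGrid
{
    public record Auction(double Mlr, double Clicks);

    // Log-normal density with median m and log-scale sigma.
    static double LogNormal(double x, double m, double sigma)
    {
        double z = (Math.Log(x) - Math.Log(m)) / sigma;
        return Math.Exp(-0.5 * z * z) / (x * sigma * Math.Sqrt(2 * Math.PI));
    }

    // For each grid point g, returns sum_i w_i* Clicks_i with
    // w_i* = P*_g(MLR_i) / P(MLR_i), P* having median baseMlr * g.
    public static Dictionary<double, double> Evaluate(IEnumerable<Auction> log,
        double baseMlr, double sigma, double[] multipliers) =>
        log.SelectMany(a => multipliers.Select(g => (                 // CROSS APPLY
                GridPoint: g,
                WKpi: LogNormal(a.Mlr, baseMlr * g, sigma)
                      / LogNormal(a.Mlr, baseMlr, sigma) * a.Clicks)))
           .GroupBy(x => x.GridPoint)                                 // GROUP BY GridPoint
           .ToDictionary(grp => grp.Key, grp => grp.Sum(x => x.WKpi)); // aggregate
}
```

One pass over the logs answers every grid point at once. That is the "randomize in advance, ask later" idea applied to a whole curve, such as clicks vs MLR.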
Conclusions
• There are systems in the real world that are too complex to easily formalize
• Causal inference clarifies many problems
– Ignoring causality => Simpson’s paradox
– Randomness allows inferring causality
• The counterfactual framework is modular
– Randomize in advance, ask later
• Counterfactual analysis is ideally suited to batch map-reduce