Post on 30-Sep-2020

Computer Mediated Transactions

Hal Varian, Google, April 7

Outline -- what does CMT enable?

1. Data extraction and analysis
2. Personalization and customization
3. Experimentation and continuous improvement
4. Contractual innovation

There is now a computer in the middle of most economic transactions. What does this enable?

Data extraction and analysis

Initial claims: good leading indicator for recessions

Grey bars indicate recessions

Google Correlate with initial claims data

Initial claims and [unemployment filing]

Nowcasting initial claims

Predict NSA (not seasonally adjusted) initial claims (y_t), using lagged values of initial claims and contemporaneous queries on [unemployment filing] (x_t)

Base: y_t = a_0 + a_1 y_{t-1} + a_{52} y_{t-52} + e_t

Trends: y_t = a_0 + a_1 y_{t-1} + a_{52} y_{t-52} + b x_t + e_t

Result: R^2 goes from 80.8% to 87.6%
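The base and trends regressions above can be sketched in Python roughly as follows. This is a minimal illustration on synthetic data, not the real claims or query series: the coefficients, the noise level, and the helper name r_squared are all assumptions made for the example. It only shows the mechanics of adding a contemporaneous query index to a lagged regression and comparing in-sample R^2.

```python
import numpy as np

def r_squared(X, y):
    """In-sample R^2 of an OLS fit of y on the columns of X (X includes a constant)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - resid.var() / y.var()

# Synthetic weekly data (assumption: illustrative only, not real claims data).
rng = np.random.default_rng(0)
n = 400
x = rng.normal(size=n)                      # stand-in for the query index x_t
y = np.zeros(n)
for week in range(52, n):
    y[week] = (0.5 * y[week - 1] + 0.3 * y[week - 52]
               + 0.8 * x[week] + rng.normal(scale=0.5))

t = np.arange(52, n)                        # weeks where both lags exist
X_base = np.column_stack([np.ones(len(t)), y[t - 1], y[t - 52]])
X_trends = np.column_stack([X_base, x[t]])  # base regressors plus b * x_t

r2_base = r_squared(X_base, y[t])
r2_trends = r_squared(X_trends, y[t])
print(f"base R^2 = {r2_base:.3f}, trends R^2 = {r2_trends:.3f}")
```

Because the trends model nests the base model, its in-sample R^2 can never be lower; an honest comparison would also look at out-of-sample forecast error.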

How can we make variable selection easier?

Big data

Rows or columns?

How to choose best predictors?
● Simple correlation?
● Judgment?
● Stepwise regression?
● Lasso, LARS, Elastic Net?

Spike-and-slab regression
Kalman filter for trend and seasonality

George-McCulloch [1997]; Madigan-Raftery [1994] for regression
● Prior probability that a variable is included (spike)
● Prior probability distribution over coefficient value (slab)
● Sample from simulated posterior, average to get prediction
● See Scott and Varian (2012, 2013) for details
● Download R package from CRAN (BoomSpikeSlab, bsts)
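To give a feel for what a prior inclusion probability does, here is a deliberately simplified Python sketch. It is not the sampler described by Scott and Varian and not the bsts or BoomSpikeSlab API: instead of simulating the posterior, it enumerates every subset of a few candidate predictors and weights each subset by a BIC approximation to its marginal likelihood times an independent inclusion prior. The function name inclusion_probabilities, the prior value 0.5, and the synthetic data are assumptions for illustration.

```python
import itertools
import numpy as np

def inclusion_probabilities(X, y, prior_inclusion=0.5):
    """Approximate posterior inclusion probability for each column of X.

    Enumerates all 2^k subsets, so it is only practical for small k.
    """
    n, k = X.shape
    log_weights, subsets = [], []
    for subset in itertools.product([0, 1], repeat=k):
        cols = [j for j in range(k) if subset[j]]
        design = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        rss = float(np.sum((y - design @ coef) ** 2))
        bic = n * np.log(rss / n) + design.shape[1] * np.log(n)
        # "Spike": independent prior probability that each variable is included.
        log_prior = (len(cols) * np.log(prior_inclusion)
                     + (k - len(cols)) * np.log(1.0 - prior_inclusion))
        log_weights.append(log_prior - 0.5 * bic)
        subsets.append(subset)
    log_weights = np.array(log_weights)
    weights = np.exp(log_weights - log_weights.max())  # subtract max to avoid underflow
    weights /= weights.sum()
    return weights @ np.array(subsets, dtype=float)

# Synthetic example (assumption): only the first of four predictors matters.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] + rng.normal(size=200)
probs = inclusion_probabilities(X, y)
print(np.round(probs, 3))
```

With hundreds of candidate queries, enumeration is infeasible, which is where sampling from the posterior comes in.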

New Home Sales in US

Raw correlation

Predictors chosen by model

model: y_t = trend_t + seasonal_t + b_1 x_{1t} + b_2 x_{2t}

plot1: y_t = trend_t

plot2: y_t = trend_t + seasonal_t

plot3: y_t = trend_t + seasonal_t + b_1 x_{1t}

plot4: y_t = trend_t + seasonal_t + b_1 x_{1t} + b_2 x_{2t}

Incremental fit plots

Visualize how much each predictor contributes to model fit

Panels: Trend, Seasonal, [appreciation rate], [irs 1031], [century 21 realtors], [real estate purchase], [80-20 mortgage]

One month ahead forecast

Does 23% better than a simple AR(1) model

Geo-amplification

You can do the same thing for any geographically distributed variable

Find queries or query categories that are predictive of that variable

Make predictions/extrapolations to other geographies

Many applications

Social science

Policy

Marketing

Politics

Example: New York Times index of “hard places” (June 26, 2014)

Where are the hardest places to live in the U.S.?

What queries are associated with “hard places”?

Based on state level data and Google Correlate

What queries are associated with “easy places”?

Based on state level data and Google Correlate

Customization and personalization

Assembled in America

Predictors of survey response

Top and bottom cities' predicted score

Top:
Kershaw, SC: 83.2%
Summersville, WV: 82.8%
Grundy, VA: 82.8%
Chesnee, SC: 82.7%
Duffield, VA: 82.5%
Norton, VA: 82.3%
Jonesville, VA: 82.2%
Walnut Cove, NC: 82.2%
Weston, WV: 82.2%
Ennice, NC: 82.1%

Bottom:
Calipatria, CA: 40.2%
Fremont, CA: 40.2%
Mountain View, CA: 40.8%
San Jose, CA: 41.4%
Berkeley, CA: 41.4%
Redmond, WA: 41.5%
Glendale, CA: 41.5%
Cupertino, CA: 41.6%
Palo Alto, CA: 41.7%
Daggett, CA: 41.9%

Assembled in America by DMA

Experimentation and continuous improvement

“To find out what happens when you change something, it is necessary to change it.”

George Box

Causal inference

Experiments: gold standard for causality

What goes wrong with observational data?

y_t = x_t b + e_t = observed + unobserved

Correlation: if you observe x, what is a good prediction for y?

Causality: what happens to y if you change x?

Confounder: something unobserved that affects both x and y

Advertising

Q: How do you know your advertising works?
A: Every December I increase my ad spend... and every December my sales go up!

“Christmas holidays” are a confounding variable. Here the solution is obvious, but what happens if you can’t observe the confounders?

Train, test, treat, compare

1. Train a model on historical data
2. Test the model on a holdout
3. Apply treatment at some time
4. Compare observed outcome with the treatment to the counterfactual prediction of model
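The four steps might look something like the following in Python. This is a minimal sketch on simulated data, assuming a simple linear model with one control series; the variable names, the sample sizes, and the true lift of 5.0 are invented for illustration, and a real application would use a richer time series model.

```python
import numpy as np

rng = np.random.default_rng(42)
n, treat_start, true_lift = 150, 120, 5.0

# Simulated data (assumption): outcome y depends on a control series x.
x = rng.normal(loc=10.0, scale=2.0, size=n)
y = 3.0 + 2.0 * x + rng.normal(scale=1.0, size=n)
y[treat_start:] += true_lift          # 3. Treatment applied from treat_start onward

def design(v):
    """Design matrix with a constant and one regressor."""
    return np.column_stack([np.ones(len(v)), v])

# 1. Train a model on historical (pre-treatment) data.
train = slice(0, 80)
coef, *_ = np.linalg.lstsq(design(x[train]), y[train], rcond=None)

# 2. Test the model on a pre-treatment holdout.
test = slice(80, treat_start)
holdout_rmse = np.sqrt(np.mean((y[test] - design(x[test]) @ coef) ** 2))

# 4. Compare the observed outcome with the model's counterfactual prediction.
post = slice(treat_start, n)
counterfactual = design(x[post]) @ coef
estimated_lift = np.mean(y[post] - counterfactual)

print(f"holdout RMSE = {holdout_rmse:.2f}, estimated lift = {estimated_lift:.2f}")
```

The holdout step matters: if the model predicts poorly before the treatment, the gap after the treatment says little about the treatment itself.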

Compare outcome to counterfactual

Actual and natural experiments

You want randomized experiments to reduce systematic effects. Sometimes you get randomization “for free”.

Impact of class size on performance
● Why are classes larger in some schools than others?
● In Israel maximum class size is 40. Classes with 41 are split in two.
● Can identify causal effect of class size on performance

Impact of ad impressions on movie revenue
Super Bowl facts
● Ads are bought long before teams are chosen
● Home cities of participating teams see elevated viewership
● Natural randomization

Experimentation capability should be coded in

static code:
    const threshold = 3.14
    if (x > threshold) do something

learning code:
    param threshold = {3.13, 3.14, 3.15}
    performance = (num_right, num_wrong)
    if (x > threshold) do something
    report performance
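One way the "learning code" idea could be expressed in Python is sketched below. Everything beyond the three candidate thresholds and the (num_right, num_wrong) record is an assumption: the class name LearnedThreshold, the 10% exploration rate, and the simulated rule that the right decision is x > 3.15 are made up so the example can run.

```python
import random

class LearnedThreshold:
    """Tries several candidate thresholds and tracks how each one performs."""

    def __init__(self, candidates, explore=0.1, seed=0):
        self.candidates = list(candidates)
        self.explore = explore
        self.rng = random.Random(seed)
        # performance[c] = [num_right, num_wrong]
        self.performance = {c: [0, 0] for c in self.candidates}

    def accuracy(self, c):
        right, wrong = self.performance[c]
        return right / (right + wrong) if right + wrong else 0.0

    def choose(self):
        """Mostly use the best threshold so far; occasionally try another."""
        if self.rng.random() < self.explore:
            return self.rng.choice(self.candidates)
        return max(self.candidates, key=self.accuracy)

    def report(self, threshold, was_right):
        self.performance[threshold][0 if was_right else 1] += 1

# Simulated usage (assumption: the "right" decision is x > 3.15).
learner = LearnedThreshold([3.13, 3.14, 3.15])
data_rng = random.Random(1)
for _ in range(2000):
    x = data_rng.uniform(3.0, 3.3)
    threshold = learner.choose()
    did_something = x > threshold
    learner.report(threshold, was_right=(did_something == (x > 3.15)))

best = max(learner.candidates, key=learner.accuracy)
print(best, learner.performance)
```

The point of the design is that the parameter is no longer a constant buried in the code: the program itself runs the experiment and reports which setting works.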

Research challenge: How to turn legacy code into learning code?

Nice example: Keith Winstein et al, An Experimental Study of the Learnability of Congestion Control

Contractual innovation

What is a contract?

“If you do this, I’ll do that.”

But how do you verify “this” and “that”?

Can only contract on things that can be observed and verified…

But with a computer in the middle of the transaction, lots more can be verified.

Examples of contracts

● “You take me to my hotel on the best route, I will pay you.”

● “You use the car and send me a monthly payment.”

● “You drive this rental car safely, I will give you a discount.”

● “You display an ad that brings someone to my store, I will pay you.”

Summary

1. Data extraction and analysis
   a. Can use searches to nowcast economic activity
2. Personalization and customization
   a. Can customize ads to different geos
3. Experimentation and continuous improvement
   a. Can use ML to estimate causal impact via train-test-treat-compare cycle
4. Contractual innovation
   a. As more things become observable, more contracts become viable

Appendix

Advertise a movie about surfing

Honolulu: $1 ad spend, $10 ticket sales
Fargo: $0.10 ad spend, $1 ticket sales

Ticket sales = 10 x ad spend

fits the data perfectly...

But do you really believe that if you increased spend to $1 in Fargo, you would get 10 times the ticket sales?

Ads and confounders

“Interest in surfing” is a confounding variable
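A small simulation can make the confounding concrete. The sketch below is illustrative only: the numbers (an assumed true ad effect of 2.0, an interest effect of 8.0, 2000 simulated cities) are invented so that the observational slope comes out near 10, as in the surfing example. It compares a regression on data where spend follows interest with one where spend is assigned at random.

```python
import numpy as np

rng = np.random.default_rng(7)
n_cities = 2000
true_ad_effect = 2.0     # assumed causal effect of $1 of ads on ticket sales

# Confounder: local interest in surfing, unobserved by the analyst.
interest = rng.uniform(0.1, 1.0, size=n_cities)

def sales(ad_spend):
    """Simulated ticket sales: driven by interest and, more weakly, by ads."""
    noise = rng.normal(scale=0.1, size=n_cities)
    return 8.0 * interest + true_ad_effect * ad_spend + noise

def slope(x, y):
    """Slope of a least-squares line of y on x (polyfit returns [slope, intercept])."""
    return np.polyfit(x, y, 1)[0]

# Observational data: the advertiser spends more where interest is high.
targeted_spend = interest.copy()
naive = slope(targeted_spend, sales(targeted_spend))

# Experiment: spend is assigned at random, independent of interest.
random_spend = rng.uniform(0.1, 1.0, size=n_cities)
experimental = slope(random_spend, sales(random_spend))

print(f"observational slope = {naive:.2f}, experimental slope = {experimental:.2f}")
```

The observational fit is excellent and still wrong as a guide to what happens when spend is changed; only the randomized assignment recovers something close to the assumed effect.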

This happens all the time in economics, since people choose x based on things they observe but you don’t.

Causal effect of college on education?
Causal effect of fertilizer on yield?
Causal effect of health care on income?

Super Bowl as a natural ad experiment

1. Viewership in home cities of teams that are playing is about 10-15% higher than elsewhere.

2. Ads are purchased long before it is known who is playing.

The advertiser buys an ad slot, then 2-3 months later two “random” cities get 10-15% more ad exposure.

Regression discontinuity

Impact of class size on performance
● Why are classes larger in some schools than others?
● In Israel maximum class size is 40. Classes with 41 are split in two.
● Can identify causal effect of class size on performance

What would happen to auto fatalities if you changed the minimum drinking age?
● 20.5 year olds are a lot like 21.5 year olds
● So looking at people on each side of the threshold can give estimate of causal effect
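The idea of comparing people just on each side of a threshold can be sketched in Python as below. This is a bare-bones illustration on simulated data, assuming a cutoff at age 21, a bandwidth of one year, and an invented jump of 1.5 in the outcome; the function name rd_estimate is hypothetical, and practical regression discontinuity work involves bandwidth selection and standard errors that are omitted here.

```python
import numpy as np

def rd_estimate(running, outcome, cutoff, bandwidth):
    """Jump in the outcome at the cutoff, from separate linear fits on each side."""
    centered = running - cutoff
    left = (centered < 0) & (centered >= -bandwidth)
    right = (centered >= 0) & (centered <= bandwidth)
    # polyfit returns [slope, intercept]; with a centered running variable,
    # the intercept is the fitted value exactly at the cutoff.
    left_at_cutoff = np.polyfit(centered[left], outcome[left], 1)[1]
    right_at_cutoff = np.polyfit(centered[right], outcome[right], 1)[1]
    return right_at_cutoff - left_at_cutoff

# Simulated data (assumption): a smooth age trend plus a jump at 21.
rng = np.random.default_rng(3)
age = rng.uniform(19.0, 23.0, size=4000)
true_jump = 1.5
outcome = (10.0 - 0.4 * (age - 21.0)
           + true_jump * (age >= 21.0)
           + rng.normal(scale=1.0, size=age.size))

estimate = rd_estimate(age, outcome, cutoff=21.0, bandwidth=1.0)
print(f"estimated jump at cutoff = {estimate:.2f}")
```

Fitting a line on each side, rather than just comparing means, keeps the smooth age trend from being mistaken for part of the jump.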
