![Page 1: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/1.jpg)
Query-Based Data Pricing
Dan Suciu – U. of Washington
Joint with M. Balazinska, B. Howe, P. Koutris, Daniel Li, Chao Li, G. Miklau, P. Upadhyaya
1 EPFL, 2013
![Page 2: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/2.jpg)
Data Has Value
And it is increasingly being sold/bought on the Web • Big data vendors • Data Markets • Private data
EPFL, 2013 2
Pricing digital goods is challenging [Shapiro&Varian]
![Page 3: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/3.jpg)
Pricing Data
Pricing data lies at the intersection of several areas:
• Data management
• Mechanism design
• Economics
EPFL, 2013 3
This talk
![Page 4: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/4.jpg)
1. Big Data Vendors
High value data • Gartner report: $5k, even if you need only one
chart • Navteq Maps • Factual • A few others [Muschalle]:
– Thomson Reuters, Mendeley Ltd., DataMarket Inc, Vico Research & Consulting GmbH, TEMIS S.A., Neofonie GmbH, Inovex GmbH
4 EPFL, 2013 Expensive datasets, available only to major customers
![Page 5: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/5.jpg)
2. Data Markets
• Azure DataMarkets – 100+ data sources • Infochimps – 15,000 data sets • Xignite – financial data • Aggdata • Gnip – social media data • PatientsLikeMe
5 EPFL, 2013
These datasets are available to the little guy. The markets themselves are struggling, because they are just facilitators; no innovation
![Page 6: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/6.jpg)
3. Private Data • Private data has value
– A unique user: $4 at FB, $24 at Google [JPMorgan] • Today’s common practice:
– Companies profit from private data without compensating users
• New trend: allow users to profit financially – Industry: personal data locker
https://www.personal.com/ , http://lockerproject.org/ – Academia: mechanisms for selling private data
[Ghosh11,Gkatzelis12,Aperjis11,Roth12,Riederer12]
DIMACS - 10/2012 6
![Page 7: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/7.jpg)
Sample Data Markets
EPFL, 2013 7
![Page 8: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/8.jpg)
Different price by business type
8
![Page 9: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/9.jpg)
EPFL, 2013 9
$699 for 885976 teacher
names & emails!
![Page 10: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/10.jpg)
EPFL, 2013 10
Cheaper just for Washington
![Page 11: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/11.jpg)
A Criticism of Today’s Pricing Schemes
• Small buyers want to purchase only a tiny amount of data: if they can’t, they give up
• Large buyers have specific needs: price is often negotiated in a room-full-of-lawyers
• Sellers can’t easily anticipate all possible queries that buyers might ask
11 Needed: more flexible pricing scheme, parameterized by queries
![Page 12: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/12.jpg)
Outline
• Framework and examples • Results so far
• Conclusions
12 EPFL, 2013
![Page 13: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/13.jpg)
Query-based Pricing • Seller defines price-points:
(V1,p1), (V2, p2), … Meaning: price(Vi)=pi .
• Buyer may buy any query Q
• System will determine priceD(Q) based on: – The price points – The current database instance D – The query Q
13 EPFL, 2013 How should a “good“ price function be?
![Page 14: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/14.jpg)
Arbitrage Freeness
14
Arbitrage-free Axiom: For all queries Q1, …, Qk , Q, if Q1, …, Qk determine Q, then: priceD(Q) ≤ priceD(Q1) + … + priceD(Qk)
“Q1,…, Qk determine Q” means that Q(D) can be answered from Q1(D), …, Qk(D), without accessing the database instance D
![Page 15: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/15.jpg)
Example 1: Pricing Relational Data
15
S(Shape,Color,Picture) Shape Color Picture
Swan White
Swan Yellow . . . . .
Dragon Yellow
Car Yellow . . . . .
Fish White . . . . .
Picture credits: http://www.toysperiod.com/blog/uncategorized/the-modern-art-and-science-of-origami/
Price list Price
V1 = σShape=‘Swan’(S) $2
V2 = σShape=‘Dragon’ (S) $2
V3 = σShape= ‘Car’ (S) $2
V4 = σShape= ‘Fish’ (S) $2
W1 = σColor=‘White’(S) $3
W2 = σColor=‘Yellow’(S) $3
W3 = σColor=‘Red’(S) $3 Price(σColor)=$3 Price(σShape)=$2
![Page 16: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/16.jpg)
Example 1: Pricing Relational Data
16
Shape Color Picture
Swan White
Swan Yellow . . . . .
Dragon Yellow
Car Yellow . . . . .
Fish White . . . . .
Picture credits: http://www.toysperiod.com/blog/uncategorized/the-modern-art-and-science-of-origami/
S(Shape,Color,Picture)
Price(σColor)=$3 Price(σShape)=$2
Get all Dragons
for $2
Get all Red Origami
for $3
Price list Price
V1 = σShape=‘Swan’(S) $2
V2 = σShape=‘Dragon’ (S) $2
V3 = σShape= ‘Car’ (S) $2
V4 = σShape= ‘Fish’ (S) $2
W1 = σColor=‘White’(S) $3
W2 = σColor=‘Yellow’(S) $3
W3 = σColor=‘Red’(S) $3
![Page 17: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/17.jpg)
Example 1: Pricing Relational Data
17
Shape Color Picture
Swan White
Swan Yellow . . . . .
Dragon Yellow
Car Yellow . . . . .
Fish White . . . . .
Picture credits: http://www.toysperiod.com/blog/uncategorized/the-modern-art-and-science-of-origami/
Find the price of the entire db
S(Shape,Color,Picture)
$1? $4? $8?
$20?
Price(σColor)=$3 Price(σShape)=$2
Get all Dragons
for $2
Get all Red Origami
for $3
Price list Price
V1 = σShape=‘Swan’(S) $2
V2 = σShape=‘Dragon’ (S) $2
V3 = σShape= ‘Car’ (S) $2
V4 = σShape= ‘Fish’ (S) $2
W1 = σColor=‘White’(S) $3
W2 = σColor=‘Yellow’(S) $3
W3 = σColor=‘Red’(S) $3
![Page 18: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/18.jpg)
Example 1: Pricing Relational Data
Shape Color Picture
Swan White
Swan Yellow . . . . .
Dragon Yellow
Car Yellow . . . . .
Fish White . . . . .
Picture credits: http://www.toysperiod.com/blog/uncategorized/the-modern-art-and-science-of-origami/
Find the price of the entire db
V1, V2, V3, V4 determine Q, price(Q) ≤ $8 W1, W2, W3 determine Q, price(Q) ≤ $9
S(Shape,Color,Picture)
$1? $4? $8
$20? To ensure aribitrage-freeness, we can charge only $8 for the entire database.
Price(σColor)=$3 Price(σShape)=$2
Get all Dragons
for $2
Get all Red Origami
for $3
Price list Price
V1 = σShape=‘Swan’(S) $2
V2 = σShape=‘Dragon’ (S) $2
V3 = σShape= ‘Car’ (S) $2
V4 = σShape= ‘Fish’ (S) $2
W1 = σColor=‘White’(S) $3
W2 = σColor=‘Yellow’(S) $3
W3 = σColor=‘Red’(S) $3
![Page 19: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/19.jpg)
Example 1: Pricing Relational Data
19
R Shape Instructions Swan Fold,fold,fold… Dragon Cut,cut,cut,…
Shape Color Picture
Swan White
Swan Yellow . . . . .
Dragon Yellow
Car Yellow . . . . .
Fish White . . . . .
Color PaperSpecs White 15g/100 Black 20g/100
Pictures credits: http://www.toysperiod.com/blog/uncategorized/the-modern-art-and-science-of-origami/
S T
Find the price of the full join: Q = R ⋈ S ⋈ T
Price(σColor)=$3 Price(σShape)=$2 Price(σShape)=$99 Price(σColor)=$55
![Page 20: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/20.jpg)
Example 1: Pricing Relational Data
20
R Shape Instructions Swan Fold,fold,fold… Dragon Cut,cut,cut,…
Shape Color Picture
Swan White
Swan Yellow . . . . .
Dragon Yellow
Car Yellow . . . . .
Fish White . . . . .
Color PaperSpecs White 15g/100 Black 20g/100
Pictures credits: http://www.toysperiod.com/blog/uncategorized/the-modern-art-and-science-of-origami/
S T
Find the price of the full join: Q = R ⋈ S ⋈ T
Price(σColor)=$3 Price(σShape)=$2 Price(σShape)=$99 Price(σColor)=$55
Shape Instructions Color Picture PaperSpecs
Swan Fold,fold,fold… White 15g/100
![Page 21: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/21.jpg)
Example 1: Pricing Relational Data
21
R Shape Instructions Swan Fold,fold,fold… Dragon Cut,cut,cut,…
Shape Color Picture
Swan White
Swan Yellow . . . . .
Dragon Yellow
Car Yellow . . . . .
Fish White . . . . .
Color PaperSpecs White 15g/100 Black 20g/100
Pictures credits: http://www.toysperiod.com/blog/uncategorized/the-modern-art-and-science-of-origami/
S T
Find the price of the full join: Q = R ⋈ S ⋈ T
Not obvious! E.g. no Yellow Cars in the join. What to pay for? σShape=‘car’(R) or σColor=‘yellow’(T)
Price(σColor)=$3 Price(σShape)=$2 Price(σShape)=$99 Price(σColor)=$55
Shape Instructions Color Picture PaperSpecs
Swan Fold,fold,fold… White 15g/100
![Page 22: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/22.jpg)
Discussion
Why not charge per row in the answer? • Q1(x,y) = Fortune500(x,y)
Q(x,y) = Fortune500(x,y),StrongBuyRec(x) • Q ⊆Q1, yet Price(Q) >> Price(Q1) • “Containment” is unrelated to pricing • “Determinacy” is the right concept for
studying pricing
22 EPFL, 2013
![Page 23: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/23.jpg)
Example 2: Pricing Private Data
• Buyer: query c = x1+x2+…+x1000 • User compensation: $10 • Price for the buyer: $10,000
DIMACS - 10/2012 23
1. Raw data is too expensive!
UID User Rating (0..5) 1 Alice 3 $10 2 Bob 0 $10 3 Carol 1 $10 4 Dan 0 $10 … … … 1000 Zoran 2 $10
![Page 24: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/24.jpg)
Example 2: Pricing Private Data Differential privacy • Perturbation is necessary for privacy [Dwork’2011]
Selling private data • Perturbation is a cost saving feature • Two extremes:
– Raw data = no perturbation = high price – Differentially private = high perturbation = low price
![Page 25: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/25.jpg)
Example 2: Pricing Private Data
• Buyer: c = x1+x2+…+x1000 – Tolerates error ±300 – Equivalently: variance v = 5000*
• Answer: ĉ = c + Lap(√(v/2)) • User compensation: $10 $0.001 (query is 0.1-DP**) • Price for the buyer: $10,000 $1
*Probability(|ĉ – c| ≥ 3 √2 σ) < 1/18=0.056 (Chebyshev), where σ=√v =50√2 ** ε = √2 sensitivity(q)/σ = 5√2 / 50√2 = 0.1
2. Perturbation lowers the price
UID User Rating (0..5) 1 Alice 3 $10 2 Bob 0 $10 3 Carol 1 $10 4 Dan 0 $10 … … … 1000 Zoran 2 $10
![Page 26: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/26.jpg)
Example 2: Pricing Private Data
• Another buyer: c = x1+x2+…+x1000 – Zero error, error ±300 error ±30 – Variance = 0, variance = 5000 variance = 50
• User compensation: $10/item,$0.001/item $0.1/item? $1/item? • Price for the buyer: $10000, $1 $100? $1000?
– If price > $100 à arbitrage! Buy100 × queries with variance 5000, take average. Cost = 100 × $1.
3. Multiple queries: must be arbitrage-free.
UID User Rating (0..5) 1 Alice 3 $10 2 Bob 0 $10 3 Carol 1 $10 4 Dan 0 $10 … … … 1000 Zoran 2 $10
![Page 27: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/27.jpg)
Outline
• Framework and examples • Results so far
• Conclusions
27 EPFL, 2013
![Page 28: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/28.jpg)
Price of Relational Queries
28
Given: Price points (V1,p1), …, (Vk, pk) Database D Arbitrary query Q. Compute: PriceD(Q)
Arbitrage-freeness: For all queries, if Q1, …, Qk determine Q then priceD(Q) ≤ priceD(Q1) + … + priceD(Qk)
Must ensure this:
![Page 29: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/29.jpg)
Price of Relational Queries • Simple algorithm for computing priceD(Q) given an
oracle for checking deteminacy • Two options for determinacy
– Instance-independent: used by RDBMS today in query-answering using views; undecidable!
– Instance-dependent: seems more natural for pricing; Πp
2 in the database
• If (a) price-points (Vi,pi) are selection queries, and (b) Q is a Union of Conjunctive Queries then priceD(Q) is NP-complete in the database
• Reduction to ILP makes pricing (almost) practical
EPFL, 2013 29
![Page 30: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/30.jpg)
Price of Relational Queries
EPFL, 2013 30
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
0
50
100
150
200
250
300
ILP construction time (100) Total time (100)
ILP construction time (1000) Total time (1000)
query
tim
e in
se
c
![Page 31: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/31.jpg)
Compensation for Private Data UID User Rating (0..5) 1 Alice 3 $10 2 Bob 0 $10 3 Carol 1 $10 4 Dan 0 $10 … … … 1000 Zoran 2 $10
How much should we pay Carol?
Query c = x1+x2+…+x1000 Variance v = 50
![Page 32: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/32.jpg)
Compensation for Private Data UID User Rating (0..5) 1 Alice 3 $10 2 Bob 0 $10 3 Carol 1 $10 4 Dan 0 $10 … … … 1000 Zoran 2 $10
How much should we pay Carol?
Query c = x1+x2+…+x1000 Variance v = 50
Differential Privacy
Def. [Dwork’11] Fix ε. Mechanism ĉ is called ε-differential private, if for all D, D’ that differ in one item, and any set S P[ĉ(D) ∈S] ≤ exp(ε) × P[ĉ(D’) ∈S]
![Page 33: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/33.jpg)
Compensation for Private Data UID User Rating (0..5) 1 Alice 3 $10 2 Bob 0 $10 3 Carol 1 $10 4 Dan 0 $10 … … … 1000 Zoran 2 $10
How much should we pay Carol?
Def. [Dwork’11] Fix ε. Mechanism ĉ is called ε-differential private, if for all D, D’ that differ in one item, and any set S P[ĉ(D) ∈S] ≤ exp(ε) × P[ĉ(D’) ∈S]
Thm. The mechanism ĉ(D) = c(D) + Lap(Δc/ε) is ε-differential private
Query c = x1+x2+…+x1000 Variance v = 50
Variance v=2(Δc/ε)2
Carol gets no money!
Differential Privacy
![Page 34: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/34.jpg)
Compensation for Private Data UID User Rating (0..5) 1 Alice 3 $10 2 Bob 0 $10 3 Carol 1 $10 4 Dan 0 $10 … … … 1000 Zoran 2 $10
How much should we pay Carol?
Thm. The mechanism ĉ(D) = c(D) + Lap(Δc/ε) is ε-differential private
Def. Carol’s privacy loss is ε(v) = supS log(P[ĉ(D) ∈S]/P[ĉ(D’) ∈S])
Fix variance v
Query c = x1+x2+…+x1000 Variance v = 50
Variance v=2(Δc/ε)2
W(ε) = Carol’s valuation function Carol gets no money!
Differential Privacy Data Pricing Carol’s compensation W
depends on ε which depends on v
Def. [Dwork’11] Fix ε. Mechanism ĉ is called ε-differential private, if for all D, D’ that differ in one item, and any set S P[ĉ(D) ∈S] ≤ exp(ε) × P[ĉ(D’) ∈S]
![Page 35: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/35.jpg)
Compensation for Private Data
• Option A: risk neutral • Option B: risk averse • Option C: opt-out
0 5 10 15 20
02
46
8
ε
$10
$5
W(ε) – Option A
W(ε) – Option B
Incentivizing Carol to reveal her valuation W(ε) is difficult! [Ghosh’11,Gkatzelis’12,Riederer’12] We use an idea from [Aperjis&Huberman’11]:
![Page 36: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/36.jpg)
Compensation for Private Data
• Option A: risk neutral • Option B: risk averse • Option C: opt-out
0 5 10 15 20
02
46
8
ε
$10
Risk-averse users count on
the fact that most queries will have low privacy leak
$5
W(ε) – Option A
W(ε) – Option B
Incentivizing Carol to reveal her valuation W(ε) is difficult! [Ghosh’11,Gkatzelis’12,Riederer’12] We use an idea from [Aperjis&Huberman’11]:
![Page 37: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/37.jpg)
Compensation for Private Data
• Option A: risk neutral • Option B: risk averse • Option C: opt-out
0 5 10 15 20
02
46
8
ε
$10
Risk-averse users count on
the fact that most queries will have low privacy leak
$5
W(ε) – Option A
W(ε) – Option B
Incentivizing Carol to reveal her valuation W(ε) is difficult! [Ghosh’11,Gkatzelis’12,Riederer’12] We use an idea from [Aperjis&Huberman’11]:
Risk-neutral users want full
compensation at the risk of never
being paid
![Page 38: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/38.jpg)
Outline
• Framework and examples • Results so far
• Conclusions
38 EPFL, 2013
![Page 39: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/39.jpg)
The Third Wave of Computing • First wave = hardware
– IBM, DEC, Sun, … – 1950 – 1980
• Second wave = software – Microsoft, Borland, Fox Software, Oracle, … – 1980 -- 2010
• Third wave = data! – Google maps v.s. IOS maps – Facebook’s users
39 EPFL, 2013
![Page 40: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/40.jpg)
Conclusions
• Data has (lots of) value! • Pricing data: at the intersection of three
areas: – Data management – Mechanism design – Economics
• Key concepts: – Arbitrage-free – Compensation = function of privacy loss
EPFL, 2013 40
This talk
![Page 41: Query-Based Data Pricing - Rutgers University](https://reader031.vdocuments.us/reader031/viewer/2022012916/61c688e71f18603f184af45c/html5/thumbnails/41.jpg)
References
• Koutris et al., PODS, 2012
• Li et al., ICDT, 2013
• Koutris et al, under review
EPFL, 2013 41