1
Competitive Privacy: Secure Analysis on Integrated Sequence Data
Raymond Chi-Wing Wong (1), Eric Lo (2)
(1) The Hong Kong University of Science and Technology
(2) Hong Kong Polytechnic University
Prepared by Raymond Chi-Wing Wong
Presented by Raymond Chi-Wing Wong
2
Outline
1. Introduction
2. Problem
3. Algorithm
4. Conclusion
3
1. Introduction
In this talk, we introduce “competitive privacy”, which arises when two datasets from two different sources are integrated. We illustrate this concept with a transportation application, explain why the two datasets should be integrated, and show that this integration raises a privacy issue.
4
1. Introduction
Transportation Application
Bus Company B: passenger travel history in the bus company
Metro Company M: passenger travel history in the metro company
Both companies have implemented RFID-based electronic transportation payment systems (e.g., Washington DC’s SmarTrip system and Hong Kong’s Octopus system).
5
Bus Company B: RFID No. = 222, “Airport Bus Stop” (9:00am), “Downtown Bus Stop” (10:00am)
Metro Company M: RFID No. = 222, “Downtown Station” (10:15am), “Uptown Station” (11:00am)
These two sequences are stored separately.
Suppose that the bus company and the metro company want to collaborate and offer discounts to passengers who traveled from the airport to uptown using a combination of bus and metro.
We need to integrate these two datasets to know the total number of such passengers.
6
Bus Company B: RFID No. = 222, “Airport Bus Stop” (9:00am), “Downtown Bus Stop” (10:00am)
Metro Company M: RFID No. = 222, “Downtown Station” (10:15am), “Uptown Station” (11:00am)
Integrated: RFID No. = 222, “Airport Bus Stop” (9:00am), “Downtown Bus Stop” (10:00am), “Downtown Station” (10:15am), “Uptown Station” (11:00am)
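To make the integration step concrete, here is a minimal Python sketch, assuming each company exports hypothetical `(rfid, location, time)` records and that one RFID card corresponds to one trip. It joins the two datasets on the RFID number, orders each card's visits by time, and then counts the airport-to-uptown passengers that neither company could count on its own.

```python
from collections import defaultdict

# Hypothetical record layout: (rfid, location, time as zero-padded 24-hour "HH:MM").
bus_records = [
    (222, "Airport Bus Stop", "09:00"),
    (222, "Downtown Bus Stop", "10:00"),
]
metro_records = [
    (222, "Downtown Station", "10:15"),
    (222, "Uptown Station", "11:00"),
]

def integrate(*datasets):
    """Group records from all companies by RFID number and order each card's visits by time."""
    visits = defaultdict(list)
    for dataset in datasets:
        for rfid, location, when in dataset:
            visits[rfid].append((when, location))
    # Zero-padded "HH:MM" strings sort in chronological order.
    return {rfid: [loc for _, loc in sorted(v)] for rfid, v in visits.items()}

def count_airport_to_uptown(integrated):
    """Count cards whose integrated sequence starts at the airport and ends uptown
    (assumes one trip per card, as in the example)."""
    return sum(
        1 for seq in integrated.values()
        if seq[0] == "Airport Bus Stop" and seq[-1] == "Uptown Station"
    )

integrated = integrate(bus_records, metro_records)
print(integrated[222])
print(count_airport_to_uptown(integrated))
```

Neither `bus_records` nor `metro_records` alone contains both the airport and uptown, so the count is only possible after integration.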
7
Bus Company B: RFID No. = 222, “Airport Bus Stop”, “Downtown Bus Stop”
Metro Company M: RFID No. = 222, “Downtown Station”, “Uptown Station”
Integrated: RFID No. = 222, “Airport Bus Stop”, “Downtown Bus Stop”, “Downtown Station”, “Uptown Station”
8
1. Introduction
In this talk, we introduce “competitive privacy”, which arises when two datasets from two different sources are merged. We illustrate this concept with a transportation application, explain why the two datasets should be integrated, and show that this integration raises a privacy issue.
10
RFID No. = 222 “Airport Bus Stop”, “Downtown Bus Stop”, “Downtown Station”, “Uptown Station”
Data integration may cause privacy issues.
Bus Company B: service sB = “Downtown Bus Stop”, “Bay Bus Stop”; no. of passengers = 80,000
Metro Company M: service sM = “Downtown Station”, “Bay Station”; no. of passengers = 10,000
These two services are competitive.
If the metro company knows that 80,000 passengers use sB, it may offer discounts on its own service sM to attract more passengers. The bus company's service sB would then very likely be affected.
This statistical information about the competitive services constitutes the “competitive privacy” of the bus company.
11
2. Problem
Given: two companies, the bus company and the metro company.
Objective: after the datasets from these two companies are integrated, neither company can infer any statistical information about the competitive services of the other company.
12
2. Problem
Contribution: we are the first to propose the concept of “competitive privacy”, a privacy model for when sequence datasets are integrated.
Previous works: privacy models for when relational datasets are integrated.
13
3. Algorithm
14
Trusted Third Party: maintains the integrated database built from the datasets of Bus Company B and Metro Company M.
Query 1: determine whether answering this query would allow the metro company to infer any statistical information about the competitive services of the bus company.
If yes, we reject the query. If no, we return answer 1.
15
3. Algorithm
Idea: we reject any query related to the statistical information about any of the competitive services.
We skip the details.
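The slides skip the auditing details. As a purely illustrative sketch, not the authors' algorithm, a naive trusted third party could reject a query whenever two consecutive steps of its pattern spell out a competitive service owned by the other company. The service encoding and the names `COMPETITIVE_SERVICES`, `mentions_rival_service`, and `audit` are hypothetical.

```python
# Hypothetical encoding of the competitive services from the example:
# owner -> list of (origin, destination) pairs.
COMPETITIVE_SERVICES = {
    "bus": [("Downtown Bus Stop", "Bay Bus Stop")],
    "metro": [("Downtown Station", "Bay Station")],
}

def mentions_rival_service(pattern, querier):
    """True if two consecutive pattern steps spell out a rival's competitive service."""
    steps = set(zip(pattern, pattern[1:]))
    for owner, services in COMPETITIVE_SERVICES.items():
        if owner != querier and steps.intersection(services):
            return True
    return False

def audit(pattern, querier, answer_fn):
    """Naive trusted-third-party check: return None to reject, else the query answer."""
    if mentions_rival_service(pattern, querier):
        return None  # rejected
    return answer_fn(pattern)

# The metro company asks directly about the bus service sB: rejected.
print(audit(["Downtown Bus Stop", "Bay Bus Stop"], "metro", lambda p: 80_000))
```

A direct check like this ignores what the querier can derive by combining several answers or by subtracting counts it already knows from its own data, so it is at best a starting point.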
16
4. Conclusion
Privacy model for data integration: competitive privacy
Algorithm
17
Q&A
18
4. Empirical Studies
Real dataset: Hong Kong local transportation
Metro data: 63 stations, 6 transfer stations, 4 railway lines
19
4. Empirical Studies
Varied parameters: the number of tuples in the integrated dataset; the pattern size in a query
Measurements: audit time (the time to determine whether a query should be answered or rejected); ratio of rejected (restricted) queries
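The two measurements can be computed with a small harness such as the following Python sketch. `evaluate_auditor` and the toy rule that rejects patterns longer than three steps are hypothetical; they only illustrate how per-query audit time and the ratio of rejected queries are defined, not the paper's experimental setup.

```python
import time

def evaluate_auditor(queries, audit):
    """Return (average audit time in seconds per query, ratio of rejected queries).

    `audit(query)` is a hypothetical callable returning True to answer, False to reject.
    """
    if not queries:
        raise ValueError("need at least one query")
    rejected = 0
    start = time.perf_counter()
    for query in queries:
        if not audit(query):
            rejected += 1
    elapsed = time.perf_counter() - start
    return elapsed / len(queries), rejected / len(queries)

# Toy run: reject every query whose pattern has more than three steps.
queries = [["A", "B"], ["A", "B", "C"], ["A", "B", "C", "D"], ["A", "B", "C", "D", "E"]]
avg_time, reject_ratio = evaluate_auditor(queries, lambda q: len(q) <= 3)
print(reject_ratio)  # 0.5
```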
20
4. Empirical Studies
The audit time is small. The ratio of restricted queries is small.
22
Trusted Third Party: maintains the integrated database built from the datasets of Bus Company B and Metro Company M.
Query 1, e.g., the total number of passengers who have the travel pattern “Airport Bus Stop”, “Downtown Bus Stop”, “Downtown Station”, “Uptown Station” (pattern size = 4).
Determine whether answering this query would allow the bus company to infer any statistical information about the competitive services of the metro company.
If yes, we reject the query. If no, we return answer 1 (here, 20,000).
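The example query counts passengers whose integrated sequence matches a pattern of size 4. Here is a minimal Python sketch, assuming a passenger "has" a travel pattern when the pattern's locations appear in the passenger's sequence in the same order, with gaps allowed; the slides do not specify the matching semantics, and the toy database is hypothetical.

```python
def contains_pattern(sequence, pattern):
    """True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    # `step in remaining` advances the iterator, so later steps must appear later.
    return all(step in remaining for step in pattern)

def count_pattern(database, pattern):
    """Number of RFID cards whose integrated sequence contains the pattern."""
    return sum(contains_pattern(seq, pattern) for seq in database.values())

# Toy integrated database (hypothetical cards 222 and 223).
database = {
    222: ["Airport Bus Stop", "Downtown Bus Stop", "Downtown Station", "Uptown Station"],
    223: ["Airport Bus Stop", "Downtown Bus Stop"],
}
query = ["Airport Bus Stop", "Downtown Bus Stop", "Downtown Station", "Uptown Station"]
print(len(query), count_pattern(database, query))  # 4 1
```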
23
Trusted Third Party: maintains the integrated database built from the datasets of Bus Company B and Metro Company M.
Query 2: determine whether answering this query would allow the bus company to infer any statistical information about the competitive services of the metro company.
If yes, we reject the query. If no, we return answer 2.
24
Trusted Third Party: maintains the integrated database built from the datasets of Bus Company B and Metro Company M.
Query 3: determine whether answering this query would allow the bus company to infer any statistical information about the competitive services of the metro company.
If yes, we reject the query. If no, we return answer 3.
25
Each query alone may not reveal any statistical information about the competitive services.
However, the combination of all the query answers may allow the metro company to infer the statistical information about the competitive services.
26
Trusted Third Party: maintains the integrated database built from the datasets of Bus Company B and Metro Company M.
Knowledge 1 (query answer): the total number of passengers who have the travel pattern “Downtown District”, “Bay District” is 90,000.
Knowledge 2: there are two services from “Downtown District” to “Bay District”: (1) the service provided by the bus company (“Downtown Bus Stop” to “Bay Bus Stop”) and (2) the service provided by the metro company (“Downtown Station” to “Bay Station”).
Knowledge 3: the total number of passengers who have the travel pattern “Downtown Station” to “Bay Station” is 10,000 (known to the metro company from its own data).
Conclusion: the total number of passengers who have the travel pattern “Downtown Bus Stop” to “Bay Bus Stop” is 90,000 − 10,000 = 80,000. This is exactly the statistical information about the competitive service of the bus company.
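The differencing step above can be written out directly. The following Python sketch is an illustration under stated assumptions: the stop-to-district mapping is hypothetical, and, as in Knowledge 2, the district-level count is assumed to cover exactly the two competitive services. `leaks_rival_count` is a hypothetical check, not the paper's algorithm, that flags a district-level pattern step served both by the querier and by exactly one rival.

```python
# Hypothetical roll-up from stops/stations to districts, and the two competitive
# services from the example as (owner, origin, destination).
DISTRICT = {
    "Downtown Bus Stop": "Downtown District",
    "Bay Bus Stop": "Bay District",
    "Downtown Station": "Downtown District",
    "Bay Station": "Bay District",
}
SERVICES = [
    ("bus", "Downtown Bus Stop", "Bay Bus Stop"),
    ("metro", "Downtown Station", "Bay Station"),
]

def infer_rival_count(district_total, own_count):
    """Knowledge 1 minus Knowledge 3: the rival's share of the district-level total."""
    return district_total - own_count

def leaks_rival_count(pattern, querier):
    """True if a consecutive district-level step is served by the querier and by
    exactly one rival, so the querier can subtract its own count from the answer."""
    for a, b in zip(pattern, pattern[1:]):
        owners = [owner for owner, origin, dest in SERVICES
                  if DISTRICT[origin] == a and DISTRICT[dest] == b]
        rivals = [o for o in owners if o != querier]
        if querier in owners and len(rivals) == 1:
            return True
    return False

print(infer_rival_count(90_000, 10_000))  # 80000
print(leaks_rival_count(["Downtown District", "Bay District"], "metro"))  # True
```

The query never names “Downtown Bus Stop” or “Bay Bus Stop”, which is why an auditor must reason about roll-ups and the querier's own knowledge, not just the literal pattern.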
27
Bus Company B, Metro Company M
Integrated: RFID No. = 222, “Airport Bus Stop”, “Downtown Bus Stop”, “Downtown Station”, “Uptown Station”
Both companies want to know the total number of passengers traveling from “Airport Bus Stop” to “Uptown Station”.
Roll-up: both companies want to know the total number of passengers traveling from “Airport District” to “Uptown District”.
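A roll-up replaces each stop or station with a coarser location. The following minimal Python sketch assumes the hypothetical hierarchy below (the slides do not give the actual mapping) and shows that the same integrated sequence answers both the station-level and the district-level question.

```python
# Hypothetical location hierarchy (the slides do not give the real mapping).
DISTRICT_OF = {
    "Airport Bus Stop": "Airport District",
    "Downtown Bus Stop": "Downtown District",
    "Downtown Station": "Downtown District",
    "Uptown Station": "Uptown District",
}

def roll_up(sequence, hierarchy=DISTRICT_OF):
    """Replace each stop/station with its district, keeping the visit order."""
    return [hierarchy[location] for location in sequence]

def travels_between(sequence, origin, destination):
    """True if the sequence visits `origin` and later visits `destination`."""
    if origin not in sequence:
        return False
    return destination in sequence[sequence.index(origin) + 1:]

card_222 = ["Airport Bus Stop", "Downtown Bus Stop", "Downtown Station", "Uptown Station"]
districts = roll_up(card_222)
print(districts)
print(travels_between(card_222, "Airport Bus Stop", "Uptown Station"))    # station level
print(travels_between(districts, "Airport District", "Uptown District"))  # district level
```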
28
Trusted Third Party: maintains the integrated database built from the datasets of Bus Company B and Metro Company M.
Query 1: determine whether answering this query would allow the metro company to infer any statistical information about the competitive services of the bus company.
If yes, we reject the query. If no, we return answer 1.
29
Trusted Third Party: maintains the integrated database built from the datasets of Bus Company B and Metro Company M.
Query 2: determine whether answering this query would allow the metro company to infer any statistical information about the competitive services of the bus company.
If yes, we reject the query. If no, we return answer 2.
30
Trusted Third Party: maintains the integrated database built from the datasets of Bus Company B and Metro Company M.
Query 3; answer 3.