Intent-aware Query Obfuscation for Privacy Protection in Personalized Web Search
Wasi Uddin Ahmad, University of California, Los Angeles
Kai-Wei Chang, University of California, Los Angeles
Hongning Wang, University of Virginia
CS@UVA Privacy Preserving Personalization
Motivation
● Personalization is everywhere
Previous Solutions
● Identifiability aspect of privacy
  ○ Secured communication, encrypted data storage
● Linkability aspect of privacy
  ○ Plausible deniable search
    ■ Submit proxy query instead of the true query
  ○ Obfuscation-based private web search
    ■ Submit cover-up queries along with the true query
Motivation
Do users submit isolated queries during web search?
Assumption
● Topics of search queries are sensitive
  ○ Indicate a user’s (private) search intent
● All search query topics are sensitive
  ○ Leads to stronger privacy protection
Definitions
● User profile - a hierarchically organized tree where
  ○ Each node represents a topic (a.k.a. intent)
  ○ Each topic contains N-gram language models (LMs)
    ■ LMs are approximated based on submitted queries and clicked documents
● Search task - a sequence of queries submitted in the same search session
  ○ Assumption: associated topics must form a subtree in the original topic tree
Main Idea
Intent-aware Query-obfuscation for Privacy-protection (IQP)
● Obfuscate search tasks to achieve task-level privacy
● Map a search task to a subtree of the intent tree
  ○ Intent tree: a predefined tree of topics
● Maintain the difference in prior and posterior belief of a search engine for true and cover search tasks
IQP Framework
Step 1: Query Intent Inference
● Query intent (a.k.a. topic) is approximated using a hierarchical language model
  ○ Hierarchical Dirichlet prior smoothing is performed
● Search intent is predicted by maximum a posteriori (MAP) inference
● The prior of a topic is proportional to the number of nodes in the subtree rooted at the topic node
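
The inference step above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes unigram models instead of general N-gram models, and the names `TopicNode`, `smoothed_prob`, `infer_intent`, the smoothing weight `mu`, and `vocab_size` are all hypothetical. Each node backs off to its parent's smoothed estimate (the root backs off to a uniform distribution), and the prior is taken proportional to the subtree size.

```python
import math
from collections import Counter


class TopicNode:
    """A topic in the intent tree with unigram counts (hypothetical structure)."""

    def __init__(self, name, text="", children=None):
        self.name = name
        self.counts = Counter(text.split())
        self.children = children or []
        self.parent = None
        for child in self.children:
            child.parent = self

    def subtree_size(self):
        # Number of nodes in the subtree rooted at this node.
        return 1 + sum(c.subtree_size() for c in self.children)

    def nodes(self):
        yield self
        for c in self.children:
            yield from c.nodes()


def smoothed_prob(node, word, mu=10.0, vocab_size=1000):
    # Dirichlet prior smoothing applied up the tree: the parent's smoothed
    # estimate is the prior for the child; the root backs off to uniform.
    if node.parent is None:
        prior = 1.0 / vocab_size
    else:
        prior = smoothed_prob(node.parent, word, mu, vocab_size)
    total = sum(node.counts.values())
    return (node.counts[word] + mu * prior) / (total + mu)


def infer_intent(root, query):
    # MAP inference: argmax over topics of log prior + log likelihood.
    # The prior is only needed up to a constant, so it is left unnormalized.
    best, best_score = None, -math.inf
    for node in root.nodes():
        log_prior = math.log(node.subtree_size())
        log_lik = sum(math.log(smoothed_prob(node, w)) for w in query.split())
        score = log_prior + log_lik
        if score > best_score:
            best, best_score = node, score
    return best
```

With a toy tree of one root and two leaf topics, a query made of words seen only under one leaf is assigned to that leaf, because the likelihood term outweighs the root's larger prior.
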
Step 2: Intent-aware Cover Query and Click Generation
1. Select cover query topics
  a. Specificity of the true query intent
  b. Transition between previous and current query intent
2. Generate cover query
  a. Rejection sampling is utilized
  b. Conditioned on entropy difference between true and cover queries
3. Trained positional click model is employed to generate cover clicks
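
The rejection sampling in item 2 can be illustrated with the sketch below. The slides do not spell out how the entropy of a query is computed, so this example makes an explicit assumption: a query's entropy is approximated by the average negative log2-probability of its terms under a unigram language model of its topic. The function names, the tolerance `tol`, and the retry limit `max_tries` are hypothetical.

```python
import math
import random


def query_entropy(terms, lm):
    # Assumed proxy: average negative log2-probability of the query terms
    # under a unigram language model given as {word: probability}.
    return -sum(math.log2(lm[t]) for t in terms) / len(terms)


def sample_cover_query(true_terms, true_lm, cover_lm,
                       tol=0.5, max_tries=1000, rng=None):
    """Draw a cover query from cover_lm whose entropy is close to the true query's."""
    rng = rng or random.Random()
    target = query_entropy(true_terms, true_lm)
    words = list(cover_lm)
    weights = [cover_lm[w] for w in words]
    for _ in range(max_tries):
        # Propose a candidate with the same length as the true query.
        candidate = rng.choices(words, weights=weights, k=len(true_terms))
        # Accept only if the entropy difference is within the tolerance.
        if abs(query_entropy(candidate, cover_lm) - target) <= tol:
            return candidate
    return None  # no acceptable candidate within the retry budget
```

Rejected candidates are simply discarded and a new one is proposed, so the accepted cover query matches the true query in length and in specificity under the assumed proxy.
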
Step 3: Client-side Personalization
● Client-side reranking using an uncontaminated user profile
● Borda’s method for rank aggregation
● Personalization score is computed based on client-side user profile
  ○ An estimated language model is utilized
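
A small sketch of how Borda's method could combine the server's ranking with a client-side ranking. This is an illustration under assumptions: `profile_score` stands in for the personalization score from the client-side language model, and the function names and tie-breaking by document id are hypothetical choices.

```python
def borda_aggregate(rankings):
    """Fuse rankings (lists of doc ids, best first) with Borda counts."""
    docs = set().union(*rankings)
    n = len(docs)
    scores = {d: 0 for d in docs}
    for ranking in rankings:
        for pos, doc in enumerate(ranking):
            # A document at position pos (0-based) earns n - pos points.
            scores[doc] += n - pos
    # Higher total first; ties are broken by document id for determinism.
    return sorted(docs, key=lambda d: (-scores[d], d))


def personalize(server_ranking, profile_score):
    # Rerank the server's results by the client-side profile score, then
    # aggregate the original and the reranked lists with Borda's method.
    client_ranking = sorted(server_ranking, key=profile_score, reverse=True)
    return borda_aggregate([server_ranking, client_ranking])
```

For example, with a server ranking `["d1", "d2", "d3"]` and profile scores of 0.1, 0.9, and 0.5 respectively, `d2` collects the most points and moves to the top of the fused list.
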
Example
● Session is sampled from AOL search log, 2006
Measuring Task-level Privacy
● Prior works focused on query-level privacy evaluation metrics
  ○ KL-divergence, normalized mutual information, etc.
● Proposed two new metrics to evaluate task-level privacy protection
  ○ Transition index (tIndex)
  ○ Confusion index (cIndex)
Confusion Index (cIndex)
● Measures search engine’s belief of a user’s search task
  ○ Search tasks are represented as a subtree
● Follows the entropy l-diversity principle
  ○ Quantifies the difference in prior and posterior distributions of the subtrees associated with true and cover tasks
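
The entropy l-diversity principle mentioned above requires that the entropy of a distribution be at least log(l). The sketch below only illustrates that principle on a posterior over candidate tasks (true plus cover); it is not the cIndex formula, which the slides do not give. The function names and the numerical tolerance are assumptions.

```python
import math


def entropy(dist):
    # Shannon entropy (natural log) of a probability distribution.
    return -sum(p * math.log(p) for p in dist if p > 0)


def satisfies_entropy_l_diversity(posterior, l, tol=1e-12):
    # Entropy l-diversity: the entropy must be at least log(l), i.e. the
    # observer is roughly as uncertain as with l equally likely candidates.
    return entropy(posterior) >= math.log(l) - tol
```

A posterior that is nearly uniform over the true and cover tasks passes the check, while one concentrated on a single task fails it.
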
Transition Index (tIndex)
● Measures task plausibility based on queries’ concentration on the intent tree
● A predefined matrix representing transition of intents against the intent tree structure
  ○ States: {UP1, UP2, DOWN1, DOWN2, SA, MB, Others}
  ○ Estimated based on a reference search log
● Counts how many cover tasks are ranked ahead of true tasks
  ○ Score based on intent transition likelihood
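
The counting step can be sketched as below. This is a simplified illustration: the slides describe a transition matrix estimated from a reference search log, whereas this sketch collapses it to one probability per transition state, which is an assumption made for brevity. The probabilities in the usage note, the `floor` value for unseen states, and the function names are hypothetical.

```python
import math


def task_log_likelihood(transitions, trans_prob, floor=1e-6):
    # A task is represented by the sequence of intent-transition states
    # between its consecutive queries; unseen states get a small floor.
    return sum(math.log(trans_prob.get(t, floor)) for t in transitions)


def covers_ranked_ahead(true_task, cover_tasks, trans_prob):
    # Count the cover tasks whose transition likelihood exceeds that of
    # the true task, i.e. those ranked ahead of it.
    true_score = task_log_likelihood(true_task, trans_prob)
    return sum(
        1 for cover in cover_tasks
        if task_log_likelihood(cover, trans_prob) > true_score
    )
```

With made-up probabilities such as `{"SA": 0.5, "DOWN1": 0.2, "Others": 0.1}`, a cover task `["SA", "SA"]` outranks a true task `["SA", "DOWN1"]`, while `["Others", "Others"]` does not, so the count is 1.
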
Experiments
Data Sources
● Open Directory Project
  ○ 7,600 topic nodes up to level four
  ○ 82,020 web documents belonging to the nodes
● AOL search log, 2006
  ○ 1,000 most active users
  ○ 318,023 testing queries
  ○ 0.96M web documents indexed
  ○ Clicked documents are considered as relevant
Experimental Setup
● Apache Lucene-based search engine
  ○ Ranking algorithm - Okapi BM25
● Server personalizes search results
  ○ Using language model estimated based on user profiles
  ○ Borda’s method for rank aggregation
● Server returns the top 100 relevant documents
● Sessions are segmented based on a 30-minute inactivity threshold
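
The session segmentation rule can be sketched as follows. It assumes one user's log is available as a time-sorted list of `(timestamp, query)` pairs and that a new session starts when the gap exceeds 30 minutes; the treatment of a gap of exactly 30 minutes and the function name are assumptions.

```python
from datetime import datetime, timedelta


def segment_sessions(events, gap=timedelta(minutes=30)):
    """Split one user's time-sorted (timestamp, query) events into sessions."""
    sessions = []
    for ts, query in events:
        # Continue the current session if the previous event is recent enough.
        if sessions and ts - sessions[-1][-1][0] <= gap:
            sessions[-1].append((ts, query))
        else:
            sessions.append([(ts, query)])
    return sessions
```

For instance, queries issued at 10:00, 10:10, and 11:00 form two sessions: the first two queries together, and the third on its own.
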
Evaluation Metrics
● Mean Average Precision (MAP@100)
  ○ To evaluate ranking quality
● Kullback-Leibler (KL) Divergence
  ○ Computed between the true and noisy user profiles
  ○ Measures the effectiveness of privacy protection
● Normalized Mutual Information (NMI)
  ○ Computed between true and cover query pairs
  ○ Measures information disclosure by the cover queries
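
The first two metrics can be sketched with their standard definitions. The exact normalization used in the paper is not stated on the slides, so this example assumes average precision is normalized by `min(number of relevant documents, k)` and that profiles are word distributions stored as dictionaries; the function names and the `eps` floor are hypothetical.

```python
import math


def average_precision_at_k(ranked_docs, relevant, k=100):
    # Sum precision at each rank where a relevant document appears.
    hits, total = 0, 0.0
    for i, doc in enumerate(ranked_docs[:k], start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    denom = min(len(relevant), k)
    return total / denom if denom else 0.0


def mean_average_precision(runs, k=100):
    # runs: list of (ranked_docs, relevant_set) pairs, one per query.
    return sum(average_precision_at_k(r, rel, k) for r, rel in runs) / len(runs)


def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for distributions given as {word: probability};
    # eps avoids division by zero for words missing from q.
    return sum(
        pv * math.log(pv / max(q.get(w, 0.0), eps))
        for w, pv in p.items() if pv > 0
    )
```

A larger KL divergence between the true and noisy profiles means the profile seen by the server is further from the real one, which is the sense in which the slide uses it as a privacy measure.
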
Baseline Details
● Plausible Deniable Search (PDS)
  ○ Latent semantic indexing to generate cover queries
● Knowledge-based Scheme (KBS)
  ○ Cover queries from lexical ontology (WordNet, ODP tree)
● Topic-based Privacy Protection (TPP)
  ○ Sample cover query terms using LDA-based topic models
● Embellishing Search Queries (ESQ)
  ○ Embellish user query by adding decoy terms
● Anonymizing User Profiles (AUP)
  ○ Hide individual user identity inside groups’ identities
Comparison with Baselines
| Model | MAP@100 | MAP@100 [client-side personalization] | KL Divergence | NMI |
| --- | --- | --- | --- | --- |
| No cover queries | | | | |
| AUP | 0.1088 | 0.1171 | 0.9636 | - |
| ESQ | 0.1161 | 0.1090 | 0.0912 | - |
| Number of cover queries = 2 | | | | |
| IQP | 0.1387 | 0.1486 | 0.6866 | 0.2156 |
| TPP | 0.1158 | 0.1174 | 0.7558 | 0.3922 |
| PDS | 0.1307 | 0.1391 | 0.4467 | 0.4308 |
| KBS | 0.1255 | 0.1474 | 0.7001 | 0.2914 |

* Detailed results can be found in the paper.
Measuring Task-Level Privacy Protection
● Compares in-session true task and cover task
Statistical Query Plausibility
● Measures the ratio of search result hits for a query pair
  ○ Microsoft Bing API to get the hit count
Statistical Query Plausibility
● Compare true and cover queries at web-scale
  ○ Microsoft Web Language Model API
Conclusion and Future Works
● Intent-aware query obfuscation solution
  ○ Handles sequentially developed intents in search tasks
● Proposed two new metrics measuring task-level privacy disclosure
● Future Works
  ○ Adaptively adjust the number of cover queries
    ■ Relaxing the assumption that all queries are equally sensitive
  ○ Perform user studies
    ■ Understanding real users’ satisfaction with privacy protection solutions
Thank You