DEWS: A Decentralized Engine for Web Search · mfbari/files/c16s.pdf · 2019-11-14
DEWS: A Decentralized Engine for Web Search
Presented by
Prof. Raouf Boutaba
Web Search: Today
• Contemporary Web Search:
– Logically centralized
– Company controlled
• Problems
– Censorship
– Biased ranking
– Privacy
Web Search: Decentralization
• Using P2P networks – YaCy, FAROO
– Search overhead
– Churn
• DEWS:
– P2P network between web servers, not end hosts
– Both decentralized and stable
Challenges
• Indexing the voluminous Web
• Resolving Web queries
• Ranking search results
• Incremental retrieval
DEWS addresses the first three challenges.
Conceptual Overview
[Figure: web servers (WS) connected to one another through a DHT; crawl arrows; a search portal]
• Web Server (WS):
– Hosted contents
– Content index
– Links to other WS
• DHT:
– Pros: very stable; 1 or 2 hop lookup via link cache
– Cons: additional overhead on WS
Plexus DHT
• Why Plexus [1]?
– Efficient routing with dynamic load-balancing
– Supports approximate matching
• How Plexus works:
– Generates a bit-pattern from advertisement/query keywords
– Decodes this pattern to codewords using a Linear Binary Code (LBC)
– Routes using the generator matrix of the LBC
• Modification to Plexus routing
– DEWS aggregates routing messages and packs multiple queries in one message
[1] R. Ahmed and R. Boutaba. Plexus: A Scalable Peer-to-peer Protocol Enabling Efficient Subset Search. IEEE/ACM Transactions on Networking (TON), Vol. 17(1), pp. 130-143, February 2009.
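As a rough illustration of decoding a bit-pattern to codewords, the sketch below list-decodes a pattern against a small linear binary code. It assumes the (7,4) Hamming code as a stand-in (the slides do not name the code used), takes the 7-bit pattern as given, and does not model routing. All names (`G`, `encode`, `list_decode`) are hypothetical.

```python
from itertools import product

# Generator matrix of the (7,4) Hamming code: a small stand-in LBC chosen
# only for illustration (an assumption, not taken from the slides).
G = [
    (1, 0, 0, 0, 1, 1, 0),
    (0, 1, 0, 0, 1, 0, 1),
    (0, 0, 1, 0, 0, 1, 1),
    (0, 0, 0, 1, 1, 1, 1),
]


def encode(message):
    """Multiply a 4-bit message by G over GF(2) to get a 7-bit codeword."""
    word = [0] * len(G[0])
    for bit, row in zip(message, G):
        if bit:
            word = [w ^ r for w, r in zip(word, row)]
    return tuple(word)


# Every codeword of the code: one per possible 4-bit message.
CODEWORDS = [encode(m) for m in product((0, 1), repeat=len(G))]


def hamming_distance(a, b):
    """Number of bit positions in which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def list_decode(pattern, radius):
    """Return every codeword within `radius` bit flips of `pattern`."""
    return [c for c in CODEWORDS if hamming_distance(c, pattern) <= radius]


# Hypothetical 7-bit pattern standing in for one derived from keywords.
pattern = (1, 0, 1, 0, 0, 0, 1)
targets = list_decode(pattern, radius=2)
```

In this toy setting each returned codeword would stand for a set of nodes to contact; a larger radius returns more codewords, which is what makes approximate matching possible at the cost of more messages.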
Indexing Mechanism
• Website index (used for decentralized PageRank):
– Website → base URL → hash → codeword → Plexus routing → website index node
• Inverted index (used for keyword relevance):
– Keywords → DMP, n-gram → Bloom-filter → pattern → list decoding → codewords → Plexus routing → inverted index nodes
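The n-gram and Bloom-filter steps of the keyword pipeline can be sketched as follows. This is a minimal illustration under stated assumptions: trigrams with `$` padding, a 24-bit filter, two SHA-256-derived hash functions, and no DMP step. None of these parameters come from the slides, and the function names are hypothetical.

```python
import hashlib


def ngrams(word, n=3):
    """Split a keyword into overlapping n-grams, padded with '$' markers."""
    padded = f"${word.lower()}$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def bloom_pattern(grams, num_bits=24, num_hashes=2):
    """Set num_hashes bits per n-gram in an integer used as a bit vector."""
    bits = 0
    for gram in grams:
        for seed in range(num_hashes):
            digest = hashlib.sha256(f"{seed}:{gram}".encode("utf-8")).digest()
            bits |= 1 << (int.from_bytes(digest[:4], "big") % num_bits)
    return bits


def covers(advertised, query):
    """True if every bit set in the query pattern is set in the advertised one."""
    return advertised & query == query


advertised = bloom_pattern(ngrams("searching"))
shared = ngrams("searching") & ngrams("search")
partial_query = bloom_pattern(shared)
```

Because a pattern built from a subset of the n-grams only sets a subset of the bits, `covers(advertised, partial_query)` holds, which is the property that lets a partially matching keyword still reach the advertised pattern.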
Decentralized PageRank
[Figure: the hyperlink structure among URLs/websites (ui, vi1, vi2 and other nodes in the graph) is mapped by a hash-map onto the Plexus overlay, where the corresponding web servers (index nodes) (ui), (vi1), (vi2) are connected by soft-links. Legend: URL/website, hyperlink, web server (index node), overlay link.]
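To make the idea of computing PageRank through messages between index nodes concrete, here is a minimal sketch. It assumes the textbook synchronous PageRank update with a damping factor of 0.85; the slides do not state the update rule DEWS uses, so this should not be read as the DEWS algorithm. The `links` map and all names are hypothetical.

```python
def pagerank(links, damping=0.85, rounds=50):
    """Textbook PageRank where each URL 'sends' its rank share along hyperlinks.

    links maps a URL to the list of URLs it links to. In a decentralized
    setting each share would travel as a message between index nodes; here
    the messages are simulated with an in-memory inbox.
    """
    nodes = set(links) | {v for targets in links.values() for v in targets}
    rank = {u: 1.0 / len(nodes) for u in nodes}
    for _ in range(rounds):
        inbox = {u: 0.0 for u in nodes}
        for u in nodes:
            targets = links.get(u, [])
            if targets:
                share = rank[u] / len(targets)
                for v in targets:
                    inbox[v] += share
            else:
                # A URL without outgoing links spreads its rank evenly.
                for v in nodes:
                    inbox[v] += rank[u] / len(nodes)
        rank = {u: (1 - damping) / len(nodes) + damping * inbox[u] for u in nodes}
    return rank


# Hypothetical three-URL graph: u links to v1 and v2, v1 to v2, v2 back to u.
ranks = pagerank({"u": ["v1", "v2"], "v1": ["v2"], "v2": ["u"]})
```

In this example `v2` receives rank from both `u` and `v1`, so it ends up ranked above `v1`.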
Distributed Inverted Index
[Figure: overlay with hash-map and soft-links connecting the index nodes (ui), (vi1), (vi2), …, (vit) and the keyword nodes (ki1^rep), (ki2^rep), …, (kig^rep)]
• Record at node (ui): ui, {vi1, (vi1)}, {<ki1, ri1>, …, <kig, rig>}
• Record at node (kij^rep): <kij, ui, rij, (ui)>
Resolving Web Query
• For each query keyword (Keyword-1, Keyword-2, …):
– Query keyword → DMP, n-gram → Bloom-filter → pattern → list decoding → codewords → Plexus routing → inverted index nodes
• Terms of the ranking formula, for a URL ui and a query keyword ql:
– Indicator: 1 if ql is in ui; 0 otherwise
– PageRank weight of ui
– Relevance of ui to ql
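The slide names three ingredients of the ranking score but the formula itself is not legible here. The sketch below assumes one plausible combination, a sum over query keywords of indicator × PageRank weight × relevance; that combination is an assumption for illustration, not the formula from the slides. The dictionaries and function names are hypothetical.

```python
def score(url, query_keywords, pagerank_weight, relevance):
    """Assumed score: sum over keywords of indicator * PageRank * relevance."""
    total = 0.0
    for keyword in query_keywords:
        # Indicator: 1 if the keyword occurs in the URL's indexed content.
        present = 1 if (url, keyword) in relevance else 0
        total += present * pagerank_weight[url] * relevance.get((url, keyword), 0.0)
    return total


def rank_results(urls, query_keywords, pagerank_weight, relevance):
    """Order candidate URLs by descending assumed score."""
    return sorted(
        urls,
        key=lambda u: score(u, query_keywords, pagerank_weight, relevance),
        reverse=True,
    )


# Hypothetical index contents for two URLs and the query "web search".
pagerank_weight = {"a": 0.5, "b": 0.25}
relevance = {("a", "web"): 0.5, ("b", "web"): 1.0, ("b", "search"): 1.0}
ordered = rank_results(["a", "b"], ["web", "search"], pagerank_weight, relevance)
```

Here `b` matches both keywords and outranks `a` even though `a` has the higher PageRank weight.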
Evaluation
• Simulation Setup
– Web Track dataset from LETOR 3.0
• ~1 million webpages and ~11 million hyperlinks
– WS network size – up to 100,000 nodes.
• Measurements
– Routing performance: scalability & overheads
– Ranking performance: accuracy & convergence rate
– Search performance: flexibility & accuracy
• Here we present two important results
Routing Performance
Advertisement Scalability
[Figures: advertisement scalability plots; legend entries: Indexing ui, Indexing kij^rep, Original Plexus, Modified Plexus in DEWS]
Observations:
• Advertisement hops do not increase significantly with network size
• URL advertisement requires more hops than keyword advertisement
• Route aggregation in DEWS significantly reduces advertisement overhead
Ranking Accuracy
[Figure: two example rankings σ1 and σ2, with σ1(3)=3 and σ2(3)=1]
Observations:
• Spearman’s footrule distance decays rapidly with simulation time, which indicates fast convergence of our distributed ranking algorithm
• Variation in the Top-20 and Top-100 elements is low, so the DEWS ranking is close to the centralized ranking
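Spearman's footrule distance between two rankings is the sum, over all items, of the absolute difference in the item's position: F(σ1, σ2) = Σ |σ1(i) − σ2(i)|. The sketch below computes it for a three-item example consistent with σ1(3)=3 and σ2(3)=1; the positions of the other two items are assumed for illustration.

```python
def positions(ordering):
    """Map each item to its 1-based position in an ordered list."""
    return {item: pos for pos, item in enumerate(ordering, start=1)}


def footrule_distance(sigma1, sigma2):
    """Sum of absolute position differences between two rankings."""
    if set(sigma1) != set(sigma2):
        raise ValueError("both rankings must cover the same items")
    return sum(abs(sigma1[item] - sigma2[item]) for item in sigma1)


# sigma1 ranks item 3 third; sigma2 ranks item 3 first.
sigma1 = positions([1, 2, 3])
sigma2 = positions([3, 1, 2])
distance = footrule_distance(sigma1, sigma2)  # |1-2| + |2-3| + |3-1| = 4
```

A distance of 0 means the two rankings are identical, so a distance that shrinks over simulation time indicates the distributed ranking is approaching the reference ranking.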
Summary
• DEWS is a self-indexing architecture for the Web
– provides censorship resistance
– delivers unbiased ranking of search results
– makes it hard to track users’ search history
• Future Research:
– Support for incremental retrieval in DEWS
• Can be achieved by gradually increasing decoding radius in Plexus routing.
– Develop a working prototype of DEWS and deploy it on the Web
Questions?