Taxonomy Caching: A Scalable Low-Cost Mechanism for Indexing Remote Contents in Peer-to-Peer Systems
Kjetil Nørvåg
Norwegian University of Science and Technology
Trondheim, Norway
Christos Doulkeridis and Michalis Vazirgiannis
Athens University of Economics and Business
Athens, Greece
June 28, 2006 ICPS'2006 2
Outline
Motivation and example application
Taxonomies and taxonomy-based querying
Taxonomy-based query routing
Taxonomy caching: architecture and maintenance
Experimental results
Summary and further work
Motivation
Mobile devices: high storage capacity & wireless support
Contain multimedia documents that can be shared
Possibly other data/services:
– Temperature or other environmental data
Important challenge: find the files & services!
Problem:
– Dynamic contents, location, and visibility
– Limited bandwidth
Centralized indexing/search engines not applicable
P2P network & search
Example application: MobiShare
Devices share resources by hosting web services
Device connected to a CAS
CASs connected P2P
[More details in Valavanis et al., Web Intelligence’2003]
[Figure: Cells A, B, and C, each with a wireless network access point and a CAS; the CASs are linked through the Internet.]
Outline of basic idea
1) Describe contents according to taxonomy
2) Taxonomy info cached at remote peers
3) Use cached knowledge to route queries to appropriate peers
Why?
1) Should reduce latency
2) Increase recall with same cost
Resource description
[Figure: example taxonomy rooted at Travel, with categories including Transportation (Air, Train, Boat), Accommodation (Camping, Hotel, Motel), and Food (Restaurant, Grocery store); Package tours, Scheduled flights, and Helicopter appear at the lowest level shown.]
Taxonomy-based resource description
Also applicable for audio/video
More than one taxonomy might exist in system
Resource description: Taxonomy ID and set of categories
Taxonomy-based querying
Query:
1) Request for all resources belonging to category Cj
or
2) Request for all resources belonging to category Cj and satisfying some additional property
Example properties: Text contents, metadata
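The two query forms can be sketched as a filter over resource descriptions. This is only an illustration: the names `Resource` and `query`, the `metadata` field, and the exact-membership test (no subcategory expansion) are assumptions, not part of the presented system.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Resource:
    """Hypothetical resource description: taxonomy ID plus set of categories."""
    name: str
    taxonomy_id: str
    categories: frozenset
    metadata: dict = field(default_factory=dict)


def query(resources, taxonomy_id: str, category: str,
          prop: Optional[Callable[[Resource], bool]] = None):
    """Form 1: all resources in `category`. Form 2: additionally satisfying `prop`."""
    return [
        r for r in resources
        if r.taxonomy_id == taxonomy_id
        and category in r.categories          # assumption: exact category match
        and (prop is None or prop(r))
    ]


resources = [
    Resource("hotel-list.xml", "travel", frozenset({"Hotel"}), {"city": "Athens"}),
    Resource("trains.xml", "travel", frozenset({"Train"}), {"city": "Trondheim"}),
    Resource("hotels-no.xml", "travel", frozenset({"Hotel", "Motel"}), {"city": "Trondheim"}),
]

all_hotels = query(resources, "travel", "Hotel")
trondheim_hotels = query(resources, "travel", "Hotel",
                         prop=lambda r: r.metadata.get("city") == "Trondheim")
```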
Searching in unstructured P2P networks
Basic search technique: local execution of query, then forwarding if TTL > 0
– Naïve flooding (all neighbors)
– Normalized flooding (only K neighbors)
– Random walks: only one random neighbor, but W walks initiated
Problem: only a limited # of peers can be searched (query horizon)
Possible improvements:
– Routing indices
– Summary indexing (Bloom filters etc.)
– Result caching
However: still limited scalability and coverage
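A minimal sketch of TTL-limited flooding, to make the query horizon concrete. The graph representation, the function name `flood`, and the duplicate suppression via a `seen` set are assumptions made for this illustration; `k=None` gives naïve flooding, an integer `k` gives normalized flooding.

```python
import random
from collections import deque


def flood(graph, start, ttl, k=None, rng=None):
    """Return the set of peers that execute the query (the query horizon).

    graph: dict mapping peer -> list of neighbor peers (hypothetical format).
    ttl:   number of forwarding hops allowed from `start`.
    k:     if given, forward to at most k not-yet-reached neighbors.
    """
    rng = rng or random.Random(0)
    seen = {start}
    queue = deque([(start, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        # Local execution of the query at `peer` would happen here.
        if remaining <= 0:
            continue                      # TTL exhausted: do not forward
        targets = [n for n in graph[peer] if n not in seen]
        if k is not None and len(targets) > k:
            targets = rng.sample(targets, k)
        for n in targets:
            seen.add(n)
            queue.append((n, remaining - 1))
    return seen


# A chain of five peers: with TTL=2, peers 3 and 4 lie beyond the horizon.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
horizon = flood(line, start=0, ttl=2)
```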
Taxonomy caching
Basic idea:
– Maintain taxonomic information about remote contents in a taxonomy cache (TCache)
Mapping from taxonomic concept to set of peers
Advantages:
– Cheaper to maintain than full-text index
– More applicable to multimedia data
– More robust wrt. changes in contents
Used to improve query routing
Higher recall and reduced latency
Query routing using taxonomy cache (TCache)
1) Basis: one of traditional routing strategies
2) Query forward peers: PF
3) Starting point: PF = PN = {PN1,…,PNn} (the neighbors)
4) Lookup in TCache: Lookup(category) → PC = {PC1,…,PCm}
5) PF = PN+PC
6) Query forwarded to (subset of) PF
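Steps 3 to 5 can be sketched as follows, assuming the TCache is a plain dictionary from category to a list of peers. The names `lookup` and `query_forward_peers` are hypothetical, and "+" is read here as a union that keeps first occurrences.

```python
def lookup(tcache, category):
    """Lookup(category) -> PC: peers cached for this category (may be empty)."""
    return list(tcache.get(category, []))


def query_forward_peers(neighbors, tcache, category):
    """Build PF = PN + PC, dropping peers that appear in both sets."""
    pn = list(neighbors)              # step 3: start from the neighbors
    pc = lookup(tcache, category)     # step 4: TCache lookup
    pf = []
    for peer in pn + pc:              # step 5: combine, preserving order
        if peer not in pf:
            pf.append(peer)
    return pf


tcache = {"Air": ["p7", "p2"]}
pf = query_forward_peers(["p1", "p2", "p3"], tcache, "Air")
```

Step 6 (choosing which subset of PF actually receives the query) is left to the forwarding strategy.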
Query forwarding alternatives (1)
Query forward peers: PF
# of neighbors (excl. previous): Nn
# matches from lookup: Nc
Ranking of peers in PC:
– Based on # of resources within a category
– High # of resources: considered experts
TCB:
– Highest ranked in PC + the Nn neighbors in {PN1,…,PNn}
– Forwarding to peer in PC called jump
– Jump can be to peer beyond query horizon!
TCA:
– If Nc ≥ Nn: forward to Nn highest ranked peers in PC
– If Nc < Nn: forward to all Nc peers in PC + (Nn-Nc) randomly selected neighbors
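A rough sketch of TCB and TCA under stated assumptions: PC is given as a dictionary from peer to its # of resources in the category, `neighbors` already excludes the previous peer, "highest ranked" in TCB is read as a single peer, and ties in ranking are broken by peer name. All function names are hypothetical.

```python
import random


def rank(pc):
    """Order peers in PC by # of resources, experts first (ties by name)."""
    return sorted(pc, key=lambda p: (-pc[p], p))


def tcb(neighbors, pc):
    """TCB: jump to the highest-ranked peer in PC, plus all Nn neighbors."""
    jump = rank(pc)[:1]
    return jump + [n for n in neighbors if n not in jump]


def tca(neighbors, pc, rng=None):
    """TCA: Nn best peers from PC, padded with random neighbors if Nc < Nn."""
    rng = rng or random.Random(0)
    nn, nc = len(neighbors), len(pc)
    ranked = rank(pc)
    if nc >= nn:
        return ranked[:nn]
    candidates = [n for n in neighbors if n not in pc]
    return ranked + rng.sample(candidates, min(nn - nc, len(candidates)))


neighbors = ["a", "b", "c"]
pc = {"x": 5, "y": 9, "a": 1, "z": 2}
tcb_targets = tcb(neighbors, pc)        # jump to "y", then the neighbors
tca_targets = tca(neighbors, pc)        # Nc=4 >= Nn=3: three best of PC
tca_padded = tca(neighbors, {"x": 5})   # Nc=1 < Nn=3: "x" + 2 random neighbors
```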
Query forwarding alternatives (2)
TCCN:
– If Nc ≥ Nn: forward to all Nc peers in PC
– If Nc < Nn: forward to all Nc peers in PC + (Nn-Nc) neighbors
TCDN:
– If Nc ≥ Nn: forward to Nn/2 highest ranked peers in PC + random selection of Nn/2 other peers in PC
– If Nc < Nn: forward to all Nc peers in PC + (Nn-Nc) neighbors
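TCCN and TCDN can be sketched in the same style. Assumptions in this illustration: PC is a dictionary from peer to # of resources, Nn/2 is taken as integer division, and the (Nn-Nc) padding neighbors are picked at random (the slide does not say how they are chosen). Function names are hypothetical.

```python
import random


def rank(pc):
    """Order peers in PC by # of resources, experts first (ties by name)."""
    return sorted(pc, key=lambda p: (-pc[p], p))


def pad_with_neighbors(neighbors, chosen, count, rng):
    """Pick up to `count` neighbors not already chosen (random: an assumption)."""
    candidates = [n for n in neighbors if n not in chosen]
    return rng.sample(candidates, min(count, len(candidates)))


def tccn(neighbors, pc, rng=None):
    """TCCN: all peers in PC; pad with neighbors only if Nc < Nn."""
    rng = rng or random.Random(0)
    nn, nc = len(neighbors), len(pc)
    ranked = rank(pc)
    if nc >= nn:
        return ranked
    return ranked + pad_with_neighbors(neighbors, ranked, nn - nc, rng)


def tcdn(neighbors, pc, rng=None):
    """TCDN: Nn/2 best peers in PC + Nn/2 random other peers in PC."""
    rng = rng or random.Random(0)
    nn, nc = len(neighbors), len(pc)
    ranked = rank(pc)
    if nc >= nn:
        half = nn // 2                       # assumption: integer division
        top, rest = ranked[:half], ranked[half:]
        return top + rng.sample(rest, min(half, len(rest)))
    return ranked + pad_with_neighbors(neighbors, ranked, nn - nc, rng)


neighbors = ["a", "b", "c", "d"]
pc = {"p1": 10, "p2": 8, "p3": 6, "p4": 4, "p5": 2, "p6": 1}
tccn_targets = tccn(neighbors, pc)
tcdn_targets = tcdn(neighbors, pc)
```

With Nc ≥ Nn, TCCN sends more messages than there are neighbors, while TCDN keeps the fan-out at Nn and mixes experts with randomly chosen cached peers.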
Distributing taxonomic information
Basic mechanism: piggyback matching category with query result
– Result returned through original path, possibly involving jumps
– Makes revalidation of contents in intermediate TCaches possible
– Coverage will be gradually extended (beyond query horizon)
Lazy distribution by gossiping also possible
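The piggybacking mechanism might look roughly like this. The message layout, the nested-dictionary cache format, the `START_TTL` value, and the function name `piggyback_result` are all assumptions made for illustration.

```python
START_TTL = 3  # hypothetical start value for cache entries


def piggyback_result(path, tcaches, category, answering_peer, n_resources):
    """Return a result along `path` and update every TCache on the way back.

    path:    peers the query travelled through, querying peer first,
             answering peer last (may include jumps).
    tcaches: dict peer -> {category -> {remote peer -> entry}}.
    """
    for peer in reversed(path):
        if peer == answering_peer:
            continue                      # a peer does not cache itself
        cache = tcaches.setdefault(peer, {})
        # Insert a new entry, or revalidate an existing one (TTL reset).
        cache.setdefault(category, {})[answering_peer] = {
            "resources": n_resources,
            "ttl": START_TTL,
        }


tcaches = {}
piggyback_result(path=["q", "r", "s"], tcaches=tcaches,
                 category="Air", answering_peer="s", n_resources=4)
```

After this call both the querying peer `q` and the intermediate peer `r` know that `s` holds resources in category Air, which is how coverage can extend beyond the query horizon over time.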
TCache architecture and maintenance
Aim: provide efficient mapping C → {PC1,…,PCm}
For each category: peers, # of resources, and TTL
TTL:
– Regularly decremented
– Reset to start value at revalidation
Caching policy: aggressive vs. selective
Compacting techniques: peer upgrade & non-expert pruning
[Figure: TCache organized along the taxonomy; each category (Transportation, Air, Train, Boat, Package tours, Scheduled flights, Helicopter) holds a set of (P,#,T) entries, i.e. peer, # of resources, and TTL.]
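A minimal sketch of the per-category (P,#,T) entries and the TTL handling. The class layout, the method names, and the choice to evict an entry once its TTL reaches zero are assumptions; caching policies and compacting techniques are not modeled.

```python
class TCache:
    """Hypothetical TCache: category -> {peer: [# of resources, TTL]}."""

    def __init__(self, start_ttl=3):
        self.start_ttl = start_ttl
        self.entries = {}

    def revalidate(self, category, peer, n_resources):
        """Insert or refresh an entry; the TTL is reset to its start value."""
        self.entries.setdefault(category, {})[peer] = [n_resources, self.start_ttl]

    def tick(self):
        """Regular TTL decrement; expired entries are dropped (assumption)."""
        for category in list(self.entries):
            peers = self.entries[category]
            for peer in list(peers):
                peers[peer][1] -= 1
                if peers[peer][1] <= 0:
                    del peers[peer]
            if not peers:
                del self.entries[category]

    def lookup(self, category):
        """Return peers for a category, ranked by # of resources."""
        peers = self.entries.get(category, {})
        return sorted(peers, key=lambda p: (-peers[p][0], p))


cache = TCache(start_ttl=2)
cache.revalidate("Air", "p1", 5)
cache.revalidate("Air", "p2", 9)
ranked_before = cache.lookup("Air")   # both peers cached, expert first
cache.tick()
cache.revalidate("Air", "p1", 5)      # p1 seen again: TTL reset
cache.tick()                          # p2 expires, p1 survives
ranked_after = cache.lookup("Air")
```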
Experimental setup
Simulations
Excerpts of DMOZ taxonomy
Synthetic network topologies
Resource allocation: 80/20 rule
Queries are taxonomic categories
A number of peers have role as querying peers
Measured: contacted peers, messages, recall and latency
In this presentation: results using flooding and TCDN query routing
Improvements in recall
         NM (F)    NM (TC)   Recall (F)   Recall (TC)
TTL=1    7.8       7.0       0.0022       0.0019
TTL=3    166.7     166.0     0.0117       0.0149
TTL=5    524.7     523.9     0.0282       0.0717
TTL=7    1058.6    1057.7    0.0506       0.1835
TTL=9    1721.0    1719.6    0.0773       0.2930
TTL=11   2566.3    2566.0    0.1104       0.4012
TTL=13   3536.5    3535.8    0.1477       0.4891
TTL=15   4560.2    4558.7    0.1864       0.5755
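As a reading aid, the recall columns can be turned into a ratio per TTL. The snippet only re-uses the numbers from the table; reading (F) as flooding and (TC) as TCache-based routing is an assumption based on the experimental setup slide. At TTL=15 the ratio is roughly 3.1 at nearly identical NM values, while at TTL=1 it is below 1.

```python
# Recall values copied from the table above, keyed by TTL.
recall_f = {1: 0.0022, 3: 0.0117, 5: 0.0282, 7: 0.0506,
            9: 0.0773, 11: 0.1104, 13: 0.1477, 15: 0.1864}
recall_tc = {1: 0.0019, 3: 0.0149, 5: 0.0717, 7: 0.1835,
             9: 0.2930, 11: 0.4012, 13: 0.4891, 15: 0.5755}

# Ratio of recall with TCache routing to recall with flooding.
ratio = {ttl: recall_tc[ttl] / recall_f[ttl] for ttl in recall_f}

for ttl in sorted(ratio):
    print(f"TTL={ttl:2d}  Recall (TC) / Recall (F) = {ratio[ttl]:.2f}")
```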
Primary reason for improvement: more intelligent query forwarding
         NC (F)    NC (TC)   Recall (F)   Recall (TC)
TTL=1    7.8       6.7       0.0022       0.0019
TTL=3    45.3      53.4      0.0117       0.0149
TTL=5    110.6     158.0     0.0282       0.0717
TTL=7    199.9     346.8     0.0506       0.1835
TTL=9    305.6     583.1     0.0773       0.2930
TTL=11   437.7     840.3     0.1104       0.4012
TTL=13   586.7     1120.6    0.1477       0.4891
TTL=15   741.6     1372.4    0.1864       0.5755
Improvement and scalability
[Figure: % improvement (y-axis, 0 to 400) against TTL (x-axis, 5 to 20), with one curve per series N1000, N2000, and N3000.]
Latency reduction
TCache results in very fast retrieval of first results
Finding all results: approximately similar performance, because flooding is used in both techniques
Summary and further work
Presented motivation and context
Taxonomy-based querying and query routing
TCache architecture and maintenance
Experimental results proving our claims
Future/ongoing work:
– Employing the techniques for XML/XPath querying in P2P context (to appear at IEEE P2P’2006)
– Integration of different taxonomies