Routing Indices For P-to-P Systems ICDCS 2002

Upload: piers-richard

Post on 01-Jan-2016



Routing Indices For P-to-P Systems

ICDCS 2002


Introduction

• Search in a P2P system
  – Mechanisms without an index
  – Mechanisms with specialized index nodes (centralized search)
  – Mechanisms with indices at each node

• Structured P2P network

• Unstructured P2P network

• Parallel vs. sequential search
  – Response time
  – Network traffic


Routing indices (RI)

• Query
  – Documents are on zero or more “topics”, and queries request documents on particular topics.
  – Document topics are independent

• Local index

• RI
  – Each node has a local routing index, which contains the following information
    • The number of documents along each path
    • The number of documents on each topic of interest
  – Allows a node to select the “best” neighbors to send a query to
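
A minimal sketch of how such a routing index might be laid out, assuming one row per neighbor holding a total document count and per-topic counts. The class name `RoutingIndex`, its methods, and the sample numbers are illustrative assumptions, not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class RoutingIndex:
    """Hypothetical per-node routing index: one row of counts per neighbor."""
    # rows[neighbor] = (total documents along that path, {topic: count})
    rows: dict = field(default_factory=dict)

    def set_row(self, neighbor, total, by_topic):
        self.rows[neighbor] = (total, dict(by_topic))

    def count(self, neighbor, topic):
        """Documents on `topic` reachable through `neighbor` (0 if unknown)."""
        _, by_topic = self.rows[neighbor]
        return by_topic.get(topic, 0)

    def best_neighbor(self, topic):
        """Neighbor whose path holds the most documents on `topic`."""
        return max(self.rows, key=lambda n: self.count(n, topic))


# Illustrative numbers only.
ri = RoutingIndex()
ri.set_row("B", total=100, by_topic={"databases": 20, "networks": 0})
ri.set_row("C", total=1000, by_topic={"databases": 0, "networks": 300})
print(ri.best_neighbor("databases"))
```

Keeping only counts, rather than document identifiers, is what lets the index stay small while still ranking neighbors.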


• The RI may be “coarser” than the local indices
  – Overcounts
  – Undercounts


• Goodness measure
  – Number of results in a path

• Using Routing indices
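
One way to estimate the number of results in a path, sketched under the independence assumption stated earlier: scale the path's total document count by the fraction of documents on each query topic. The slides do not spell out the estimator, so the formula, the function names, and the sample rows below are assumptions.

```python
def goodness(total, topic_counts, query_topics):
    """Estimated results along a path for a query on several topics.

    Assumed estimator: total * product(count_i / total), which treats
    the topics as independent.
    """
    if total == 0:
        return 0.0
    estimate = float(total)
    for topic in query_topics:
        estimate *= topic_counts.get(topic, 0) / total
    return estimate


def rank_neighbors(rows, query_topics):
    """Neighbors ordered from best to worst estimated goodness."""
    return sorted(rows,
                  key=lambda n: goodness(rows[n][0], rows[n][1], query_topics),
                  reverse=True)


# Illustrative rows: neighbor -> (total documents, per-topic counts).
rows = {
    "B": (100, {"databases": 20, "languages": 30}),
    "C": (1000, {"databases": 0, "languages": 50}),
    "D": (200, {"databases": 100, "languages": 150}),
}
print(rank_neighbors(rows, ["databases", "languages"]))
```

A node would forward the query to the first neighbor in this ranking and fall back to the next one if too few results come back.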


– Storage space
  • N: number of nodes in the P2P network
  • b: branching factor
  • c: number of categories
  • s: counter size in bytes

Centralized index: s*(c+1)*N

Distributed system: s*(c+1)*b (each node)
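
The two storage formulas above can be compared with a short calculation. The parameter values are made up for illustration.

```python
def centralized_index_bytes(s, c, n):
    """Storage for a centralized index: s * (c + 1) * N."""
    return s * (c + 1) * n


def distributed_index_bytes_per_node(s, c, b):
    """Storage for the routing index at each node: s * (c + 1) * b."""
    return s * (c + 1) * b


# Hypothetical setting: 4-byte counters, 10 categories,
# 10,000 nodes, branching factor 4.
print(centralized_index_bytes(s=4, c=10, n=10_000))        # 440000
print(distributed_index_bytes_per_node(s=4, c=10, b=4))    # 176
```

The per-node cost depends on the branching factor b rather than on the network size N.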


• Creating routing indices


• Maintaining Routing Indices
  – Trade-off between RI freshness and update cost
  – Does not require the participation of a disconnecting node

• Discussion
  – What if the search topics are dependent?
  – Can the number of “hops” necessary to reach a document be estimated?


Alternative Routing Indices

• Hop-count RI
  – Aggregated RIs for each “hop” up to a maximum number of hops are stored


– Search cost
  • Number of messages

– The goodness of a neighbor
  • The ratio between the number of documents available through that neighbor and the number of messages required to get those documents

– Regular tree with fanout F

– It takes F^h messages to find all documents at hop h

– Storage cost?
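
A sketch of the hop-count goodness under the regular-tree cost model: documents found at each hop are divided by the messages needed to reach that hop. It assumes hops are indexed from 0, so the nearest hop costs F^0 = 1 message; the slides do not say whether counting starts at 0 or 1. Function name and numbers are illustrative.

```python
def hop_count_goodness(docs_per_hop, fanout):
    """Sum of documents at each hop divided by the F^j messages to reach it.

    docs_per_hop[j] is the number of matching documents at hop index j
    (assumed to start at 0 for the nearest hop).
    """
    return sum(docs / fanout ** j for j, docs in enumerate(docs_per_hop))


# Illustrative comparison of two neighbors with fanout F = 3.
x = hop_count_goodness([13, 10], fanout=3)   # many documents close by
y = hop_count_goodness([0, 31], fanout=3)    # more documents, but farther away
print(x > y)
```

Although neighbor y leads to more documents overall, x scores higher because its documents are cheaper to reach.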


• Exponentially aggregated RI
  – Store the result of applying the regular-tree cost formula to a hop-count RI

– How to compute the goodness of a path for the query containing several topics?
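
A sketch of how a hop-count RI row might be collapsed into a single exponentially aggregated row, by applying the regular-tree cost formula per topic. Hop indexing from 0, the function name, and the sample counts are assumptions.

```python
def exponential_aggregate(hop_rows, fanout):
    """Collapse per-hop topic counts into one weighted count per topic.

    hop_rows[j] maps topic -> document count at hop index j; each count
    is discounted by fanout ** j before being summed.
    """
    aggregated = {}
    for j, row in enumerate(hop_rows):
        for topic, count in row.items():
            aggregated[topic] = aggregated.get(topic, 0.0) + count / fanout ** j
    return aggregated


# Illustrative hop-count RI row for one neighbor, fanout F = 2.
hop_rows = [{"db": 13, "net": 2}, {"db": 10, "net": 6}]
print(exponential_aggregate(hop_rows, fanout=2))
```

The aggregated row needs one counter per topic instead of one per topic per hop, at the price of losing the exact hop information.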


Cycles in the P2P network (HW)


Improving Search in Peer-to-Peer Networks

ICDCS 2002

Beverly Yang, Hector Garcia-Molina


Outline

• Introduction

• Techniques

• Experiment


Introduction

• We present three techniques for efficient search in P2P systems.
  – The basic idea is to reduce the number of nodes that process a query


Current Techniques

• Gnutella
  – BFS with depth limit D.
  – Wastes bandwidth and processing resources

• Freenet
  – DFS with depth limit D.
  – Poor response time.


Iterative Deepening

• Under policy P = {a, b, c}; waiting time W

• See example.
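
A simplified simulation of iterative deepening, assuming the policy lists successive BFS depth limits and the search stops as soon as enough results are found. The waiting time W between iterations is omitted because the simulation is synchronous. The graph, the function names, and the `wanted` threshold are illustrative assumptions.

```python
from collections import deque


def bfs_within(graph, source, depth_limit):
    """Nodes reachable from source in at most depth_limit hops (source excluded)."""
    seen = {source}
    frontier = deque([(source, 0)])
    reached = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == depth_limit:
            continue  # do not expand beyond the depth limit
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                reached.append(neighbor)
                frontier.append((neighbor, depth + 1))
    return reached


def iterative_deepening(graph, source, has_result, policy, wanted):
    """Try each depth in the policy in order; stop once `wanted` hits are found."""
    hits = []
    for depth in policy:
        hits = [n for n in bfs_within(graph, source, depth) if has_result(n)]
        if len(hits) >= wanted:
            return depth, hits
    return policy[-1], hits


# Hypothetical overlay: S is the query source, C and D hold matching data.
graph = {"S": ["A", "B"], "A": ["S", "C"], "B": ["S"], "C": ["A", "D"], "D": ["C"]}
holders = {"C", "D"}
print(iterative_deepening(graph, "S", lambda n: n in holders, [1, 2, 3], wanted=1))
```

With `wanted=1` the search ends at depth 2 and never contacts D, which is the saving over a single BFS to the maximum depth.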


Directed BFS

• A source sends query messages to just a subset of its neighbors

• A node maintains simple statistics on its neighbors
  – Number of results received from each neighbor
  – Latency of connection


Candidate nodes

• Returned the highest number of results

• Low hop-count

• High message count
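
A sketch of how a source might pick the subset of neighbors for Directed BFS from per-neighbor statistics. The record fields, the helper `pick_neighbors`, and the sample values are hypothetical; the slides only name the kinds of statistics kept.

```python
from dataclasses import dataclass


@dataclass
class NeighborStats:
    results_returned: int    # results received through this neighbor so far
    avg_hops: float          # average hop count of its responses
    messages_forwarded: int  # messages this neighbor has forwarded
    latency_ms: float        # latency of the connection


def pick_neighbors(stats, k, key):
    """Return the k neighbors that score highest under `key`."""
    return sorted(stats, key=lambda n: key(stats[n]), reverse=True)[:k]


# Illustrative statistics for three neighbors.
stats = {
    "A": NeighborStats(50, 3.0, 900, 40.0),
    "B": NeighborStats(5, 1.5, 1200, 20.0),
    "C": NeighborStats(20, 2.0, 300, 80.0),
}
most_results = pick_neighbors(stats, 2, key=lambda s: s.results_returned)
lowest_hops = pick_neighbors(stats, 2, key=lambda s: -s.avg_hops)
print(most_results, lowest_hops)
```

Passing the heuristic as a `key` function keeps the selection code the same for every candidate criterion.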


Local Indices

• Each node n maintains an index over the data of all nodes within a radius of r hops.

• All nodes at depths not listed in the policy simply forward the query.

• Example: policy P = {1, 5}
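
A small sketch of how the policy could be read: nodes at listed depths process the query using their local index, and all others forward it. The helper `depths_covered` assumes a tree-like neighborhood in which a node at depth d indexes data at depths d - r through d + r; that assumption and both function names are not from the slides.

```python
def action_at_depth(depth, policy):
    """What a node at the given depth does with an incoming query."""
    return "process" if depth in policy else "forward"


def depths_covered(policy, r):
    """Depths whose data may be indexed by some processing node (assumed model)."""
    covered = set()
    for depth in policy:
        covered.update(range(max(0, depth - r), depth + r + 1))
    return sorted(covered)


policy = {1, 5}
print([action_at_depth(d, policy) for d in range(1, 6)])
print(depths_covered(policy, r=2))
```

Under this reading, a larger radius r lets the policy skip more depths while still covering the same part of the network.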


Experimental Setup

• For each response, we log:
  – Number of hops taken
  – IP address from which the Response message came
  – Response time
  – Individual results


Experimental result


Efficient Content Location Using Interest-Based Locality in Peer-to-Peer Systems

Kunwadee Sripanidkulchai

Bruce Maggs

Hui Zhang

IEEE INFOCOM 2003


Motivation

• Although flooding is simple and robust, it is not scalable.

• A content location solution in which peers are organized into an interest-based structure on top of Gnutella.

• The algorithm is called interest-based shortcuts.


Interest-based locality


Shortcuts Architecture and Design Goals

• To create additional links on top of a peer-to-peer system’s overlay

• As a separate performance enhancement layer on top of existing content location mechanisms


Content location paths


Shortcut Discovery

• The first lookup returns a set of peers that store the content

• These are potential candidates.

• One peer is selected at random from the set and added as a shortcut

• For scalability, each peer allocates a fixed-size amount of storage to implement shortcuts.


Shortcut selection

• We rank shortcuts based on their perceived utility

• A peer sequentially asks all of the shortcuts on its list.
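
A sketch of a fixed-size shortcut list with random discovery and sequential lookup. Ranking by the number of past successful lookups is an assumed stand-in for "perceived utility", and evicting the lowest-ranked entry when the list is full is likewise an assumption. The class and method names are hypothetical.

```python
import random


class ShortcutList:
    """Hypothetical fixed-capacity shortcut list, kept ordered best-first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.shortcuts = []   # peers, highest-ranked first
        self.successes = {}   # peer -> number of successful lookups

    def discover(self, providers, rng=random):
        """After a regular lookup, add one provider chosen at random."""
        if not providers:
            return None
        peer = rng.choice(sorted(providers))
        if peer not in self.successes:
            if len(self.shortcuts) >= self.capacity:
                evicted = self.shortcuts.pop()   # drop the lowest-ranked entry
                del self.successes[evicted]
            self.shortcuts.append(peer)
            self.successes[peer] = 0
        return peer

    def lookup(self, item, has_item):
        """Ask shortcuts in rank order; return the first peer holding the item."""
        for peer in self.shortcuts:
            if has_item(peer, item):
                self.successes[peer] += 1
                self.shortcuts.sort(key=lambda p: self.successes[p], reverse=True)
                return peer
        return None   # caller falls back to the underlying search, e.g. flooding
```

When `lookup` returns `None`, the peer would fall back to the underlying content location mechanism and pass the peers it finds to `discover`.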


Ranking metrics

• Probability of providing content

• Latency of the path to the shortcut

• Load at the shortcut

• A combination of metrics can be used based on each peer’s preference


Performance indices

• Success rate

• Load characteristics

• Query scope

• Minimum reply path lengths

• Additional state


Potential and Limitations

• Adding 5 shortcuts at a time produces success rates that are close to the best possible.

• Slightly increasing the shortest path length from 1 to 2 hops yields a better success rate.


Conclusion

• A simple and practical mechanism was proposed.


Similarity Discovery in structured P2P Overlays

ICPP


Introduction

• Structured P2P network
  – Only supports search with a single keyword

• Similarity between two documents
  – Keyword sets
  – Vector space
  – Measure

• Problems
  – Search problem
  – New keyword?

$\theta_{ab} = \cos^{-1}\left(\dfrac{a \cdot b}{|a|\,|b|}\right)$
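
Reading the similarity measure on this slide as the angle between two keyword vectors (the arccosine of their normalized dot product, which is an interpretation of the garbled formula), it can be computed as follows. The function name and sample vectors are illustrative.

```python
import math


def angle_between(a, b):
    """Angle in radians between keyword vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("angle is undefined for a zero vector")
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


# Two documents over a three-keyword vocabulary (illustrative).
print(angle_between([1, 1, 0], [1, 1, 0]))   # identical keyword sets, angle near 0
print(angle_between([1, 0, 0], [0, 1, 0]))   # no shared keywords, angle pi/2
```

A smaller angle means the two documents share more of their keyword weight.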


Meteorograph

• Absolute angle


Publishing and Searching

• Publish
  – Hash
  – Publish the item to a node np with the hash key closest to the hash value


• Search problem
  – Nearest answers
  – k-nearest answers

• Partial

• Comprehensive

• Search strategy

• Discussions

• What happens when the keyword vector is represented by ?


Other issues

• Load balance

• Changes of vector space
  – Republished?
  – Comprehensive set of keywords
  – Other methods?


SWAM: A Family of Access Methods for Similarity-Search in Peer-to-Peer Data Networks

Farnoush Banaei-Kashani, Cyrus Shahabi

(CIKM04)


PDN access method

• Defines

• How to organize the PDN topology into an index-like structure

• How to use the index structure


Hilbert space

• Hilbert space (V, Lp)

• Key k = (a1, a2, …, ad)
  – d: the dimension of the vector space
  – The domain is a contiguous and finite interval of R

• The Lp norm, with p in Z+
  – The distance function to measure the dissimilarity
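
The Lp distance between two d-dimensional keys can be computed as below. The function name and sample keys are illustrative.

```python
def lp_distance(k1, k2, p):
    """Lp distance between two keys of equal dimension, for integer p >= 1."""
    if len(k1) != len(k2):
        raise ValueError("keys must have the same dimension d")
    if p < 1:
        raise ValueError("p must be a positive integer")
    return sum(abs(x - y) ** p for x, y in zip(k1, k2)) ** (1.0 / p)


# Illustrative 2-dimensional keys.
print(lp_distance((0, 0), (3, 4), p=1))   # Manhattan distance
print(lp_distance((0, 0), (3, 4), p=2))   # Euclidean distance
```

A smaller distance means the two tuples are more similar, so range and kNN queries can be phrased in terms of this function.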


Topology

• Topology of a PDN can be modelled as a directed graph G(N, E)

• A(n) is the set of neighbors for node n

• A node maintains
  – A limited amount of information about its neighbors, which includes
    • The keys of the tuples maintained at neighbors
    • The physical addresses of neighbors


• The processing of the query is completed when all expected tuples in the relevant result set are visited

• Access methods
  – Join, leave: for virtual nodes
  – Forward: for using local information to process queries and make forwarding decisions


The small world example

• Grid component

• Random graph component

• Processing of queries (exact, range, kNN) in the high-locality topology


Flat partitioning

• SWAM also employs the space partitioning idea: flat partitioning


Query Processing

• Exact-Match query processing

• Range query processing

• kNN Query processing


Data Indexing in Peer-to-Peer DHT Networks

ICDCS 2004


• Locating data using incomplete information.
  – How to search data in a DHT

• Data descriptors and queries
  – Semi-structured XML data


– Query
  • Most specific query for d

• Relationship between queries


• Given the most specific query, finding the location of the file is simple

• How about less specific queries?

• Solution
  – Provide a query-to-query service
    • For a given query q, the index service returns a list of more specific queries covered by q
  – The DHT storage system must be extended
    • Insert(q, qi), q -> qi, adds a mapping (q; qi) to the index of the node responsible for key q.
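
A toy sketch of the query-to-query service: each node keeps an index of mappings from a query q to more specific queries qi, and a lookup follows those mappings until it reaches most specific queries that name a file. The `ToyDHT` class, the modulo-based key placement, and the string encoding of queries are simplifying assumptions, not the mechanism of any particular DHT.

```python
import hashlib


class ToyDHT:
    """In-memory stand-in for a DHT; each node holds a query-to-query index."""

    def __init__(self, node_ids):
        self.nodes = {n: {} for n in sorted(node_ids)}

    def _responsible(self, key):
        # Assumed placement rule: hash of the key modulo the node count.
        digest = int(hashlib.sha1(key.encode()).hexdigest(), 16)
        ids = list(self.nodes)
        return ids[digest % len(ids)]

    def insert(self, q, qi):
        """Add the mapping q -> qi at the node responsible for key q."""
        index = self.nodes[self._responsible(q)]
        if qi not in index.setdefault(q, []):
            index[q].append(qi)

    def lookup(self, q):
        """More specific queries covered by q, as stored in the index."""
        return list(self.nodes[self._responsible(q)].get(q, []))


def resolve(dht, q, files):
    """Follow q -> qi mappings down to most specific queries that name files."""
    found, stack, seen = [], [q], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in files:
            found.append(files[current])
        stack.extend(dht.lookup(current))
    return sorted(found)


# Hypothetical queries written as plain strings.
dht = ToyDHT(["n1", "n2", "n3"])
dht.insert("author=Smith", "author=Smith&title=TCP")
dht.insert("author=Smith", "author=Smith&title=IP")
files = {"author=Smith&title=TCP": "tcp.pdf", "author=Smith&title=IP": "ip.pdf"}
print(resolve(dht, "author=Smith", files))
```

Each step of `resolve` costs one DHT lookup, so a less specific query pays for its generality with extra round trips.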