distance join processing in a p2p world

Upload: michael0066

Post on 31-May-2018


  • 8/14/2019 Distance Join Processing in a P2P World

    1/25

    Distance Join Processing

    in a P2P World

    Xiaoqi Zhang

    [email protected]

    Student ID: 261273

    8/7/2008

    Supervisor: Dr. Egemen Tanin


    Distance Join Processing in a P2P World

    Abstract

    P2P networks have expanded their use to the area of distributed database

    systems. The P2P paradigm is famous for its various advantages over the conventional

    client-server paradigm in that it provides excellent scalability both in computation and

    bandwidth as well as no single point of failure due to decentralization. Spatial data is

    widely used today in P2P applications. By exploiting the features of the P2P paradigm,

    efficient spatial data retrieval becomes possible. A large body of work has been done

    in spatial data retrieval over P2P networks, which focuses on the classic query

    operations of range query and nearest neighbor query. However, to the best of my

    knowledge, no work has been done on spatial distance join operations in the

    context of the P2P paradigm. This report gives a detailed review of the first distance join

    algorithm for P2P networks along with its implementation. A comprehensive

    experiment is carried out at the end to examine different aspects of the algorithm.

    Keywords: P2P, client-server, spatial data, GIS

    1. Introduction

    Spatial data has become a critical ingredient in various applications and

    databases including location-based services [1], public transportation services,

    scientific data management [2,3,4] and digital government [5]. Spatial data is not only

    widely used in scientific and government organizations but is also used by the general

    public, for example in in-car GPS systems and by real-estate agencies.

    2D worlds and their representations are the most frequently used spatial data in
    the spatial data processing domain. A 2D representation of a virtual or real world in an
    application contains many spatial objects that have positional values. One solution
    to the bottleneck problem that the conventional client-server architecture
    may bring to such applications is to distribute the spatial objects among the machines of
    a P2P network, so that operations on the spatial data are carried out in a P2P
    paradigm rather than a client-server paradigm. New P2P applications, e.g.,
    job-employee seeker networks, buyer-seller networks and event/location finders for a city,
    follow this solution. For example, in a buyer-seller P2P network, information about
    sellers and products is distributed over the network. A potential buyer may supply
    his/her location, an area of the map where sellers may be located and some
    information about the product to a search system, and the system returns a list of
    the sellers who are selling the related products. This type of operation can be done by
    simply clicking on a 2D map to choose the location and area. Another similar type of
    query yields the distance join result, which contains ordered pairs of spatial objects.
    The order depends on the distance between the two spatial objects. Finding the


    closest bar-restaurant pair is one example of such an application. One
    straightforward approach to this type of operation is to simply forward
    messages among the available nodes in the network to locate the desired data. Such an
    approach is clearly not feasible, because it makes a large number of peers that
    do not hold the desired data participate in the operation. In the unpublished paper [6],
    Tanin et al. proposed an elegant way that exploits the features of P2P networks.
    They used a data structure called quadtree [7] to partition the underlying spatial data in
    the 2D worlds on which distance join queries are carried out. The content of this report is
    based on [6, 8]. It gives a detailed explanation of the proposed distance join algorithm,
    and the results of a comprehensive experiment are presented at the end.

    The rest of this report is organized as follows. Section 2 gives a brief review of
    related work, focusing on sequential distance join algorithms and the distributed quadtree
    index; section 3 discusses two other types of query on the distributed quadtree index;
    section 4 explains the distance join algorithm and my implementation of it;
    section 5 gives the details of the experiments and the results; section 6 gives the
    conclusion and future work.

    2. Related Work

    2.1. Base Sequential Algorithm

    Several works have addressed distance join algorithms. Hjaltason
    and Samet examined various similarity search algorithms in metric spaces in [9], with
    the main contribution being a priority queue based ranking algorithm for
    spatial data. This algorithm can find the results of a ranking query in an incremental
    fashion. In [10], they proposed a distance join algorithm that works on hierarchical
    spatial data structures. In that paper, the authors use a data structure called the R-tree
    to store the spatial data in R-tree blocks. A priority queue based approach is adopted
    to drive the ranking algorithm. Pairs of spatial objects and R-tree
    blocks are inserted into the priority queue. The distance between the two items of each
    pair is used as the criterion for ordering the queue. At each step of the algorithm, the
    pair at the head of the priority queue, i.e., the pair with the smallest distance, is
    retrieved and processed. If the pair is formed by two data objects, then the pair is
    reported as the next closest pair. If one of the items in the dequeued pair is a node of
    the R-tree, then the R-tree node in the pair is substituted by its descendants, i.e.,
    objects or sub-nodes, to form new pairs. This method works in an incremental fashion.
    Their algorithm has a drawback: pairs in the priority queue are processed sequentially.
    Thus, in a P2P network, the algorithm works inefficiently due to the accumulated
    communication delay. The algorithm examined in this report employs a similar priority
    queue based approach, but it is carefully designed so that it works efficiently in P2P
    networks by utilizing the parallelism in the network.
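
    The sequential, priority queue based join described above can be sketched in a few
    lines of Python. This is only an illustration under simplifying assumptions of mine:
    objects are named 2D points, a generic bounding-box tree stands in for the R-tree,
    and the names Node, box_dist and distance_join are hypothetical rather than taken
    from [10].

```python
import heapq
import itertools
import math

class Node:
    """Inner tree node; children are Node instances or (name, (x, y)) objects."""
    def __init__(self, children):
        self.children = children
        boxes = [box(c) for c in children]
        self.box = (min(b[0] for b in boxes), min(b[1] for b in boxes),
                    max(b[2] for b in boxes), max(b[3] for b in boxes))

def box(item):
    """Bounding box (x0, y0, x1, y1) of a node or of a point object."""
    if isinstance(item, Node):
        return item.box
    _, (x, y) = item
    return (x, y, x, y)

def box_dist(a, b):
    """Smallest distance between two boxes (0 if they overlap)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return math.hypot(dx, dy)

def distance_join(root1, root2):
    """Yield (distance, name1, name2) in increasing order of distance."""
    tie = itertools.count()  # tie-breaker so the heap never compares items
    heap = [(box_dist(box(root1), box(root2)), next(tie), root1, root2)]
    while heap:
        d, _, a, b = heapq.heappop(heap)
        if isinstance(a, Node):        # substitute the node by its descendants
            pairs = [(c, b) for c in a.children]
        elif isinstance(b, Node):
            pairs = [(a, c) for c in b.children]
        else:                          # two data objects: next closest pair
            yield d, a[0], b[0]
            continue
        for x, y in pairs:
            heapq.heappush(heap, (box_dist(box(x), box(y)), next(tie), x, y))

restaurants = Node([Node([("r1", (0, 0)), ("r2", (1, 1))]), Node([("r3", (10, 10))])])
cinemas = Node([("c1", (2, 1)), ("c2", (10, 12))])
print(next(distance_join(restaurants, cinemas)))
```

    Because every pair is popped and expanded one at a time, each expansion would cost
    one network round trip if the nodes lived on different peers, which is the drawback
    noted above.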

    2.2. Distributed Quadtree Index


    2.2.1. Partition Spatial Data Using Quad-CIF Tree

    The distance join algorithm examined in this report is based on the distributed
    quadtree index proposed in [11]. In [11], a data structure called the quad-CIF
    tree [12] is used for partitioning spatial data. A quad-CIF tree is a variation of the
    quadtree [13] and was originally used for speeding up algorithms in the computer-aided
    design of integrated circuits [12]. A quadtree is a tree data structure in which each node
    can have at most 4 sub nodes. The quadtree can represent a 2D space in the
    following way: at the beginning, a root node in the quadtree represents the entire 2D
    space. The space is then divided into 4 identical sub regions, which corresponds to the
    root node splitting itself into 4 sub nodes, each corresponding to one sub region. For
    each of the sub regions, the same process proceeds recursively until a certain criterion
    is met. Figure 1 shows this process. The quad-CIF tree extends the quadtree definition in
    that it specifies when to start and when to stop the subdivision, given the distribution
    of spatial data in the 2D space. The start and stop rules are defined as follows:

    For any spatial object within a certain 2D region, the region that completely
    contains the spatial object splits itself into 4 identical sub regions; any one of the
    4 sub regions that completely contains the spatial object splits itself again, until no
    sub region can contain the spatial object in its entirety. The spatial object is then
    inserted into the node that corresponds to the smallest region containing the spatial
    object in its entirety. The process is depicted in figure 2.

    Figure 1. Quad tree demo

    Figure 2. Quad-CIF tree partitions spatial data

    In [11], the authors introduce the concept of a control point for each
    region and sub region, which is simply the centroid of the region. As shown in figure
    2, each node in the quadtree maintains the information about its corresponding control
    point u, denoted as D(u), which can be represented by the following formula:

    D(u) = ({d1, d2, d3, d4}, (x, y), list)

    Basically, these are 3 pieces of information: first, the information about the 4 children
    of the node, denoted as d1, d2, d3, d4, which are integers indicating how many spatial
    objects the corresponding child has; (x, y) is the 2D Cartesian point of the control point
    in the 2D region; and list contains all the spatial objects that are inserted
    into this quadtree node. This information is crucial for the search algorithms (range
    query, nearest neighbor query, distance join query). It makes it possible to decide
    whether to forward a query further down the quadtree. Details are given in
    section 3.
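
    The start and stop rules can be illustrated with a short Python sketch. It assumes
    axis-aligned rectangles written as (x0, y0, x1, y1), a y axis pointing up, and the
    A/B/C/D quadrant layout suggested by the figures (A and B on top, C and D below); the
    function names and the max_depth guard are mine and are not part of [11].

```python
def quadrants(region):
    """Split a region into its 4 identical sub regions."""
    x0, y0, x1, y1 = region
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    return {
        "A": (x0, cy, cx, y1),  # top-left (assumed layout)
        "B": (cx, cy, x1, y1),  # top-right
        "C": (x0, y0, cx, cy),  # bottom-left
        "D": (cx, y0, x1, cy),  # bottom-right
    }

def contains(region, rect):
    """True if rect lies completely inside region."""
    return (region[0] <= rect[0] and region[1] <= rect[1]
            and rect[2] <= region[2] and rect[3] <= region[3])

def control_point(region):
    """The control point is the centroid of the region."""
    x0, y0, x1, y1 = region
    return ((x0 + x1) / 2, (y0 + y1) / 2)

def locate(rect, world, max_depth=16):
    """Return (node name, region) of the smallest region containing rect."""
    name, region = "O", world
    for _ in range(max_depth):      # guard against zero-size objects
        for child, sub in quadrants(region).items():
            if contains(sub, rect):
                name = child if name == "O" else name + child
                region = sub
                break
        else:
            break                   # no sub region contains rect: stop here
    return name, region

name, region = locate((1, 1, 3, 3), (0, 0, 8, 8))
print(name, region, control_point(region))
```

    A rectangle that straddles the centre of the world stays at the root O, while a
    small rectangle away from the split lines sinks several levels down.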

    2.2.2. Routing Desired Data Using Chord

    The P2P distance join algorithm employs the distributed quadtree
    index as well as the well-known DHT (distributed hash table) protocol Chord [14] as
    the application-level routing protocol.

    There are 2 major reasons for choosing Chord as the application-level
    routing protocol. Firstly, the hash function that Chord employs provides uniformly random
    key-location mappings, so that, with high probability, keys are distributed nearly
    uniformly among the peers in the P2P network. In other words, no peer is allocated
    significantly more keys than the others. This is good for load balancing, because no peer
    in the network becomes overloaded by having more queries forwarded to it. Secondly,
    Chord uses consistent hashing, with SHA-1 [15] as the base hash function, which suits an
    unstable network such as a P2P network where peers leave and join frequently.
    Without consistent hashing, most of the existing keys would have to be rehashed whenever
    peers join or leave the network, and most of the network bandwidth would be taken up by
    the messages used for rehashing.

    As mentioned previously, every node in the quadtree stores a control point
    which controls the underlying region. To distribute the quadtree among the
    available machines in the P2P network, the string representation of the x and y
    coordinates of the control point stored at each quadtree node is used as the key for the
    SHA-1 hash function. It is in the format (x, y). In practice, no two control points are
    hashed to the same location, because no two control points are exactly the
    same. With the Chord protocol, a control point and the information about it (D(u),
    described previously) are hashed into the Chord virtual circle space. With a string
    representation of a control point, one can easily find the desired data just by following


    the Chord specification. Figure 3 shows one possible result of hashing the control point
    of each quadtree node into the Chord virtual circle space. As depicted in the figure, peer1
    holds the control points C, O and CB along with the spatial data stored at them; peer345
    holds the control points D and CD; etc.
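
    The mapping from a control point to a peer can be sketched as follows. The 16-bit
    identifier circle, the exact "(x, y)" string formatting and the peer identifiers are
    assumptions chosen for readability; Chord itself uses the full 160-bit SHA-1 output
    and locates the successor through finger tables rather than a sorted list.

```python
import hashlib
from bisect import bisect_left

M = 16  # assumed identifier length in bits, for illustration only

def chord_key(x, y, m=M):
    """Hash the "(x, y)" string of a control point onto a 2**m identifier circle."""
    digest = hashlib.sha1(f"({x}, {y})".encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

def successor(peer_ids, key):
    """First peer identifier clockwise from key; peer_ids must be sorted."""
    i = bisect_left(peer_ids, key)
    return peer_ids[i % len(peer_ids)]  # wrap around the circle

peers = sorted([1, 345, 1567, 40000])   # hypothetical peer identifiers
for name, (x, y) in {"O": (4.0, 4.0), "C": (2.0, 2.0), "CD": (3.0, 1.0)}.items():
    key = chord_key(x, y)
    print(name, "key", key, "-> peer", successor(peers, key))
```

    Only the keys that fall between a joining or leaving peer and its neighbour change
    owner, which is the property of consistent hashing the text relies on.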

    When partitioning the spatial data, smaller objects tend to be inserted at
    deeper levels of the quadtree, which may cause a query to be passed down
    many levels before a spatial object is found. A major impact of
    this is that more messages are needed to find the smaller spatial objects, which
    causes longer communication delay. A variable Fmax is proposed to specify the
    maximum level in the quadtree into which a spatial object can be inserted. Fmax
    prevents the quadtree generated by the partitioning process from becoming too deep,
    which would result in long traversals of the quadtree when processing queries.

    Note that all queries start processing from the peer that holds the
    information about the root quadtree node, which may create a single point of failure. A
    new variable similar to Fmax, namely Fmin, is therefore defined. Fmin specifies the
    minimum level in the quadtree into which a spatial object can be inserted. When
    spatial objects are inserted into the quadtree, they are inserted at least as deep as the
    Fmin level nodes. When no Fmin level node can contain the spatial object in
    its entirety, the spatial object is inserted into those Fmin level nodes whose
    controlled regions intersect with the spatial object. By doing this, every query now
    starts processing from the nodes at level Fmin of the quadtree rather than from a single
    root node.
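
    The effect of Fmin and Fmax on where an object is stored can be sketched as below.
    The sketch identifies a quadtree node by (level, column, row) on a square world of
    side `size` instead of by its A/B/C/D name, and treats rectangles that only touch a
    cell edge as not intersecting that cell; both choices are simplifications of mine.

```python
import math

def cells_at_level(rect, level, size):
    """Cells (level, i, j) at the given level that overlap rect."""
    x0, y0, x1, y1 = rect
    w = size / 2 ** level                     # side length of a cell at this level
    i0, j0 = math.floor(x0 / w), math.floor(y0 / w)
    i1 = max(i0, math.ceil(x1 / w) - 1)
    j1 = max(j0, math.ceil(y1 / w) - 1)
    return [(level, i, j) for i in range(i0, i1 + 1) for j in range(j0, j1 + 1)]

def storage_cells(rect, size, fmin, fmax):
    """Quadtree nodes in which rect is stored, honouring Fmin and Fmax."""
    level, cells = fmin, cells_at_level(rect, fmin, size)
    if len(cells) > 1:
        return cells          # no Fmin node contains rect: store it in every
                              # intersecting Fmin node
    while level < fmax:       # descend while one node still contains rect
        finer = cells_at_level(rect, level + 1, size)
        if len(finer) != 1:
            break
        level, cells = level + 1, finer
    return cells

print(storage_cells((1, 1, 3, 3), 8, 0, 3))   # stored once, at level 1
print(storage_cells((1, 1, 3, 3), 8, 2, 3))   # replicated in four level 2 nodes
```

    With Fmin=0 the rectangle is stored once in the level 1 node that contains it; with
    Fmin=2 no single level 2 node contains it, so it is replicated in the four level 2
    nodes it intersects.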

    3. Algorithms for Basic Spatial Query

    3.1. Range Query

    3.1.1. High Level Description

    Range query, nearest neighbor query and distance join query are all based on the
    distributed quadtree index [11]. Figure 4 shows the pseudo code of the range query. In
    figure 4, the procedure D(u) returns a reference to control point u; C(u, i) returns the
    i-th child control point of control point u; R() returns the range that the specified
    control point controls. A range query is initiated from one peer in a P2P network by
    calling the InitiateRangeQuery procedure with a parameter Q, the 2D rectangle
    within which one wants to check whether there are spatial objects located.

    Figure 3. Hashing result of quadtree in Chord circle space


    InitiateRangeQuery(query Q)
    {
        control point list G = {}
        Subdivide(Q, root, G)
        for each u in G do
            Delegate(u) -> DoRangeQuery(Q, u)
    }

    DoRangeQuery(query Q, control point u)
    {
        intersect objects in D(u).list with Q
        send results
        for i = 1 to 4 do
            if (Ints(R(C(u, i)), Q) is not empty) and (D(u).di > 0) then
                Delegate(C(u, i)) -> DoRangeQuery(Q, C(u, i))
    }

    Figure 4. Algorithm for range query

    Firstly, the procedure Subdivide is called to get the control points at level Fmin
    whose controlled ranges intersect with the query Q. Then, for each such control
    point, the range query is forwarded to the peer that holds the control point by
    following the Chord protocol (denoted as Delegate(u) -> DoRangeQuery(Q, u) in figure
    4). Upon arrival, a peer that receives the forwarded range query returns any spatial
    objects that intersect with the query range Q and then, for each child of the queried
    control point, forwards the query Q to those children that have spatial objects and whose
    controlled range intersects with the query. The range query process is shown in figure 5
    with Fmin=0. Peer1567 initiates the range query. The translucent rectangle (denoted as
    query Q in the figure) is the query rectangle. In a P2P network with a distributed
    quadtree index, every query starts processing from level Fmin of the quadtree. In this
    case Fmin=0, so the query starts from the root node and is passed down the quadtree.
    Initially, the result of Subdivide contains only control point O, which controls the
    entire region. Peer1567 then passes the query to the peer in the network that holds the
    data about control point O. This process is depicted as the curve marked 1 in figure 5.
    With the help of Chord, the query reaches peer1, which holds the information about control
    point O. When the query arrives at peer1, peer1 first examines whether it has any
    spatial data (in this simplified example, rectangles) that intersects the query
    rectangle; it then checks whether there are any children of node O whose controlled
    range intersects the query rectangle Q and which have spatial data. After examining,
    peer1 finds that C, a child of O, meets these requirements. Peer1 then forwards
    the query to the peer that holds the information about control point C. With Chord, we
    find that this peer is again peer1. This process is depicted by the curve marked 2. Peer1
    repeats the process of step 1 and finds that sub region CD intersects the query Q and has
    spatial data in it. Peer1 then forwards the query to the peer that holds the information
    about control point CD, namely peer345. The routing process is depicted by curve 3.
    When the query arrives at peer345, peer345 finds that it has the spatial object
    rectangle 1 and that no sub regions have spatial objects. After the result is sent back,
    the range query stops. As described, the query starts at the root node and is passed down
    the quadtree in the order O->C->CD.

    Figure 5. Chord, quad tree and spatial data
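
    The O->C->CD walkthrough can be reproduced with a small in-memory simulation. The
    sketch below is not the distributed implementation: a recursive call stands in for
    Delegate(u), the child counters are recomputed on the fly instead of being stored as
    d1 to d4, and the coordinates of the regions and of rectangle 1 are made up.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    region: tuple                                  # (x0, y0, x1, y1) controlled range
    objects: dict = field(default_factory=dict)    # name -> rectangle stored here
    children: dict = field(default_factory=dict)   # child name -> Node

def intersects(a, b):
    """True if rectangles a and b overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def count(node):
    """Number of objects in node and its descendants (plays the role of d_i)."""
    return len(node.objects) + sum(count(c) for c in node.children.values())

def do_range_query(q, node, name, visited):
    visited.append(name)                           # one routed message per visit
    hits = [k for k, r in node.objects.items() if intersects(r, q)]
    for child_name, child in node.children.items():
        if intersects(child.region, q) and count(child) > 0:
            hits += do_range_query(q, child, child_name, visited)
    return hits

cd = Node((2, 0, 4, 2), objects={"rectangle 1": (2.5, 0.5, 3.5, 1.5)})
c = Node((0, 0, 4, 4), children={"CA": Node((0, 2, 2, 4)), "CB": Node((2, 2, 4, 4)),
                                 "CC": Node((0, 0, 2, 2)), "CD": cd})
root = Node((0, 0, 8, 8), children={"A": Node((0, 4, 4, 8)), "B": Node((4, 4, 8, 8)),
                                    "C": c, "D": Node((4, 0, 8, 4))})
visited = []
print(do_range_query((3, 1, 5, 3), root, "O", visited), visited)
```

    Empty sub regions such as D and CB intersect the query but are never contacted,
    because their object counter is zero.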

    3.1.2. Implementation

    For the implementation part, I use tables to show the features that I implemented;
    in the Extra column, I add special notes and key points that deserve attention.

    Table 1 shows the implementation details.

    Item             Implemented              Extra
    Routing (Chord)  Basic data structures,   This project does not deal with the issues that
                     find_predecessor,        arise when nodes join or leave the Chord network;
                     find_successor           only routing is dealt with. The caching mechanism
                                              in Chord is NOT implemented.
    Indexing         Basic data structures    Quadtree, control point, quadtree node,
                                              rectangle, Fmin, Fmax, etc.
    Algorithm        Basic data structures,   Implementation strictly follows the protocol
                     InitiateRangeQuery(),    defined in the original paper [8].
                     Subdivide(Q, root, G),
                     Delegate(u),
                     DoRangeQuery(Q, u)

    Table 1. Implementation details for range query

    3.2. Nearest Neighbor Query

    3.2.1. High Level Description

    Hjaltason and Samet [9] gave a comprehensive analysis of various similarity search
    algorithms in metric spaces. Their main contribution was a priority queue based ranking
    algorithm that can find the results of a ranking query in an incremental fashion.
    Ranking is a more general form of NN query in which all the spatial objects are
    eventually retrieved in increasing order of their distance from a query point. In the
    first iteration of the algorithm, the root node of the data structure is inserted into
    the priority queue. The priority is measured by the distance between the element and the
    query point. In the next iteration, the root is removed and all its children are in turn
    added to the priority queue. In this fashion, at each iteration of the algorithm, the
    element with the smallest distance is removed and visited, and its children are inserted
    into the queue. Eventually, there will be an object at the head of the queue, which is
    the object with the shortest distance to the query point. Note that their algorithm
    works in an incremental fashion. Elements in the priority queue are contacted
    sequentially, which is clearly not suitable for the P2P paradigm, where the power of
    parallelism must be fully exploited.

    Tanin et al. proposed an elegant way of doing nearest neighbor query in
    [8]. Their algorithm is also based on the priority queue approach. Figure 6
    shows the pseudo code of the nearest neighbor query. The peer that initiates the nearest
    neighbor query maintains the priority queue. At the beginning, instead of inserting just
    the root node into the priority queue, all the control points at level Fmin are inserted.
    There is a new variable called WCDist, which is the worst case distance from the query
    point to the controlled range of a control point. WCDist is used as the criterion to
    decide which peers are contacted in parallel during one iteration of the algorithm.
    This is the most remarkable difference between this algorithm and the algorithm
    proposed by Hjaltason and Samet in [10]. During each iteration, WCDist is
    updated as follows: let d be the distance between the first spatial object (if any) in the
    priority queue and the query point, and let D be the maximum distance between the
    query point and the top element of the queue (which cannot be a spatial object, because
    spatial objects at the top are removed as soon as they are found). Then WCDist = Min(d, D).
    Next, all control points in the priority queue whose controlled ranges lie at a distance
    less than or equal to WCDist from the query point are contacted in
    parallel. The entire process is depicted in figure 7 with Fmin=1. Peer345 initiates the
    nearest neighbor query by calling InitiateNNQuery. GetSortedControlPoints
    returns a priority queue which contains the level 1 control points, namely A, B, C and D.
    The status of the priority queue is denoted as priority queue status 1 in the figure.
    The first WCDist and the range it covers are denoted by the quadrant marked as
    Wcdist1 in the figure. Therefore, SendMessagesWithin forwards the query in
    parallel to the peers that hold control points C, A and D respectively. As shown in
    the figure, peer345, peer1 and peer m get this message. The DoNNQuery procedure
    is then called at each one of them. Each of them creates a reply message, puts into it any
    spatial objects it has along with any child control points that have spatial objects, and
    sends it back to the query initiating peer, in this case peer345. Assume that the reply
    message corresponding to control point C arrives at peer345 first (the arrival order
    may vary due to message delay; however, this doesn't affect the correctness of the
    algorithm). ReceiveNNMessage is called at peer345. After all the control
    points and spatial objects are inserted into the priority queue, the status of the
    priority queue is denoted as priority queue status 2 in figure 7. Control point C is
    deleted from the priority queue after the reply message corresponding to it is handled.
    Then UpdateWCDist is called to update WCDist. The updated WCDist is shown as the
    smaller quadrant in figure 7 denoted as Wcdist2, for which the SendMessagesWithin
    procedure sends the query only to the peer that holds control point CD (because
    control points A and D have been contacted previously). This time peer345 is contacted.
    Assume that, before peer345 returns its reply message, the reply message about control
    point A arrives at the query initiating peer, which is peer345. According to the
    algorithm, the spatial objects and control points are inserted into the priority queue.
    Priority queue status 3 in the figure shows the status of the priority queue after the
    insertion. Note that control points D, B and CD are closer to the query point
    than rectangle 0; thus, rectangle 0 is at the end of the priority queue. Now
    peer345 sends the reply message, along with the spatial object rectangle 1, back to the
    query initiating peer. After this iteration, the status of the priority queue is shown as
    priority queue status 4. Now there is a spatial object at the head of the
    queue, so it is the nearest spatial object with respect to query point q. The
    algorithm can now stop or proceed as needed. Because neither control points
    B and D nor their children possess any spatial objects, B and D are simply deleted when
    the reply messages corresponding to them are returned. The nearest neighbor
    query stops automatically when the priority queue is empty.

    InitiateNNQuery(query q)
    {
        priority queue pqueue = GetSortedControlPoints(q, fmin)
        control point c = FindControlPoint(q, fmin)
        WCDist = MaxDist(q, c)
        SendMessagesWithin(WCDist)
    }

    DoNNQuery(control point u)
    {
        msg = CreateReplyMessage()
        msg.Put(D(u).list)
        for i = 1 to 4 do
            if (D(u).di > 0) then
                msg.Put(C(u, i))
        SendMessageBack(msg)
    }

    Synchronized ReceiveNNMessage(message msg)
    {
        for each object X in msg.list do
            pqueue.Add(X)
        for each control point u in msg do
            pqueue.Add(u)
        pqueue.Remove(SenderOf(msg))
        WCDist = UpdateWCDist()
        SendMessagesWithin(WCDist)
    }

    Figure 6. Algorithm for nearest neighbor query

    Figure 7. Process of nearest neighbor query
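
    The WCDist rule (WCDist = Min(d, D)) and the choice of peers to contact can be
    sketched as follows. The queue is modelled as a plain sorted list of
    (kind, rectangle) entries, and the helper names are illustrative only; returning
    infinity for an empty queue, or when the head is a spatial object, is an assumption
    of mine for cases the description above does not cover.

```python
import math

def min_dist(q, r):
    """Smallest distance from point q to rectangle r = (x0, y0, x1, y1)."""
    dx = max(r[0] - q[0], 0, q[0] - r[2])
    dy = max(r[1] - q[1], 0, q[1] - r[3])
    return math.hypot(dx, dy)

def max_dist(q, r):
    """Largest distance from point q to any point of rectangle r."""
    dx = max(abs(q[0] - r[0]), abs(q[0] - r[2]))
    dy = max(abs(q[1] - r[1]), abs(q[1] - r[3]))
    return math.hypot(dx, dy)

def sort_queue(q, items):
    """Order (kind, rect) entries by their minimum distance to q."""
    return sorted(items, key=lambda e: min_dist(q, e[1]))

def update_wcdist(q, pqueue):
    """WCDist = Min(d, D) for a queue sorted by sort_queue."""
    if not pqueue:
        return math.inf
    # d: distance of the first spatial object in the queue, if any
    d = next((min_dist(q, r) for kind, r in pqueue if kind == "object"), math.inf)
    # D: worst case distance to the controlled range of the top control point
    head_kind, head_rect = pqueue[0]
    D = max_dist(q, head_rect) if head_kind == "control" else math.inf
    return min(d, D)

def to_contact(q, pqueue, wcdist):
    """Controlled ranges whose control points are contacted in parallel."""
    return [r for kind, r in pqueue
            if kind == "control" and min_dist(q, r) <= wcdist]

q = (1, 1)
pq = sort_queue(q, [("control", (0, 4, 4, 8)), ("object", (1, 3, 2, 3.5)),
                    ("control", (0, 0, 4, 4))])
w = update_wcdist(q, pq)
print(w, to_contact(q, pq, w))
```

    In this example the known object at distance 2 tightens WCDist, so only the control
    point whose range contains q is contacted and the farther range is pruned.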

    3.2.2. Implementation

    Table 2 shows the implementation details of algorithm for nearest neighbor

    query.

    Item             Implemented                          Extra
    Routing (Chord)  Basic data structures,               This project does not deal with the issues
                     find_predecessor,                    that arise when nodes join or leave the
                     find_successor                       Chord network; only routing is dealt with.
                                                          The caching mechanism in Chord is NOT
                                                          implemented.
    Indexing         Basic data structures                Quadtree, control point, quadtree node,
                                                          rectangle, Fmin, Fmax, etc.
    Algorithm        Basic data structures,               Implementation strictly follows the
                     InitiateNNQuery(query q),            protocol defined in the original paper
                     GetSortedControlPoints(q, fmin),     [8]. Data structures include priority
                     FindControlPoint(q, fmin),           queue, two types of queue elements, etc.
                     SendMessagesWithin(WCDist),
                     DoNNQuery(control point u),
                     CreateReplyMessage,
                     SendMessageBack(msg),
                     Synchronized ReceiveNNMessage(message msg),
                     UpdateWCDist()

    Table 2. Implementation details for nearest neighbor query

    4. Distance Join Algorithm for P2P Networks

    4.1. High Level Description

    The distance join algorithm works on two sets of spatial data. The goal of the
    algorithm is to find the closest pair of spatial objects from the two spatial data sets.
    This type of search has great potential in real life. Imagine that, on a weekend, one
    wants to go out for dinner and then watch a great movie. The first thing that comes to
    his/her mind is to find a restaurant with a cinema nearby. The shorter the distance
    between the two the better (no one wants to drive a long way to watch a movie after
    having dinner). Finding the closest cinema-restaurant pair is one possible
    application of the distance join algorithm.

    One straightforward approach is to retrieve all the spatial objects in data
    set 1 and data set 2, compute the Cartesian product of the two sets and order the
    result by increasing distance. The first pair in the ordered result is the closest pair.
    This is clearly not suitable for a large P2P network with a huge number of spatial
    objects distributed among the machines in the network. Several works have addressed
    distance join algorithms [9, 10]. However, the algorithms proposed only work in a
    centralized environment and proceed sequentially. To fully exploit the
    advantages of P2P networks, extra work has to be done.
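
    For reference, the straightforward Cartesian-product approach looks like this in
    Python, assuming point objects and made-up restaurant and cinema coordinates. It needs
    every object of both data sets at one machine and compares all pairs, which is what
    makes it unsuitable for a large P2P network.

```python
import math
from itertools import product

def naive_distance_join(set_x, set_y):
    """All (distance, x_name, y_name) pairs, ordered by increasing distance."""
    pairs = [(math.dist(px, py), nx, ny)
             for (nx, px), (ny, py) in product(set_x.items(), set_y.items())]
    return sorted(pairs)

restaurants = {"r1": (0, 0), "r2": (5, 5)}   # hypothetical data set 1
cinemas = {"c1": (3, 4), "c2": (6, 5)}       # hypothetical data set 2
print(naive_distance_join(restaurants, cinemas)[0])   # the closest pair
```

    The number of pairs grows with the product of the two set sizes, even when only the
    first few closest pairs are wanted.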

    JoinInit(QuadTreeNode root1, QuadTreeNode root2)
    {
        PQueue = new PriorityQueue()
        MessageCacheList = new List()
        controlpoint1 = GetRootControlPoint(root1)
        controlpoint2 = GetRootControlPoint(root2)
        SendMessageTo(controlpoint1, id)
        SendMessageTo(controlpoint2, id)
    }

    ProcessReply(ControlPoint u, id)
    {
        msg = CreateReplyMessage(id)
        msg.Put(D(u).list)
        for i = 1 to 4 do
            if (D(u).di > 0) then msg.Put(C(u, i))
        SendMessageBack(msg)
    }

    Synchronized RecvMessage(Message msg)
    {
        if MessageCacheList.contains(msg.id) then
            doCombine(msg, MessageCacheList.get(id))
            PQueue.deque(msg, MessageCacheList.get(id))
        else
            MessageCacheList.add(msg)
            return
        for each new pair P generated from doCombine do
            PQueue.add(P)
        WCDist = UpdateWCDist()
        for each element pair E in PQueue do in parallel
        { if E.Dist

    Figure 8. Pseudo code for the P2P distance join algorithm

    Chord, the distributed quadtree index and the priority queue based approach together
    form the essence of the newly proposed distance join algorithm for P2P networks. Similar
    to the proposal in [10], the query initiating peer maintains the priority queue and acts
    as a query processing front. Two pieces of information are crucial for forwarding a
    distance join query at the query initiating peer. One is the information about how the
    quadtree partitions the underlying spatial data. The other is the information about the 4
    children of a control point. The former
    is implicitly known by every peer in the P2P network, thus no communication is
    required. The latter is automatically obtained after distributing the quad-CIF tree
    among the machines in the P2P network (as mentioned in section 2.2.1, each control
    point contains information in the form D(u) = ({d1, d2, d3, d4}, (x, y), list)).
    Therefore, it is very easy for a query initiating peer to forward the distance join query
    from the root node down the quadtree. Figure 8 is the pseudo code for the P2P distance
    join algorithm. Initially, there is only one pair in the priority queue, namely, the pair
    of root control points of the two quadtrees. As the algorithm proceeds, pairs of control
    points and spatial objects are inserted into the priority queue. Thus, four types of queue
    element exist: (spatial object, spatial object), (spatial object, control point),
    (control point, spatial object) and (control point, control point). The processing of a
    pair at the query initiating peer must be strictly synchronized in the sense that the
    replies to messages that were sent as a pair must be processed together. In the P2P
    distance join algorithm, the elements in the priority queue are pairs of control points
    and objects, so messages are sent in pairs as the algorithm proceeds, and the
    reply messages corresponding to previously sent paired messages must be handled
    together. However, due to the uncertainty in communication delay, reply messages
    may arrive at the query initiating peer at arbitrary times. Therefore, extra work has to
    be done to handle reply messages pairwise. My solution is to give the messages that
    are sent as a pair a unique ID and to cache a reply message whose paired reply message
    has not yet been received. Whenever a reply message with the same ID as a
    cached one is received, the two reply messages belong to one pair and
    can be handled together. This strict synchronization of pairwise
    message processing guarantees that the new pairs generated from doCombine will not
    contain redundant pairs. As shown in the algorithm, pairs in the priority queue are
    contacted in parallel rather than sequentially. The newly defined variable WCDist is
    used as the criterion to determine which pairs are contacted. The procedure
    UpdateWCDist updates WCDist in the following way: let D be the maximum

    [Figure 9: quadtree partition of data sets X and Y, and the priority queue contents (head to
    tail) at Status 1, Status 2, Status 3 and Status 4]

    Figure 9. Process of distributed distance join algorithm


    distance between the items of the pair at the head of the priority queue, which is a
    non-object-object pair. Let d be the maximum distance between the spatial objects of the first
    object-object pair (if any) found in the priority queue (this pair cannot be at the head, because
    an object-object pair at the head is retrieved immediately as the next closest pair). Then
    WCDist = min(D, d). All pairs in the priority queue whose distance between the two items is less
    than or equal to WCDist are contacted in parallel, which distinguishes this algorithm from the
    traditional sequential algorithm.
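
    The WCDist rule above can be sketched in a few lines of Python. This is only an illustration
    under stated assumptions, not the implementation used in this project: the class QueuePair and
    the functions update_wc_dist and pairs_to_contact are hypothetical names, the queue is assumed
    to be a list already sorted by minimum distance, and "distance between the two items" is
    assumed to mean that minimum distance.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class QueuePair:
    """Hypothetical queue element: a pair of items with distance bounds."""
    name: str
    min_dist: float        # assumed queue key: minimum distance between the items
    max_dist: float        # maximum distance between the items
    is_object_pair: bool   # True only for (spatial object, spatial object)


def update_wc_dist(queue: List[QueuePair]) -> float:
    """Return WCDist = min(D, d) for a queue sorted by ascending min_dist."""
    if not queue:
        return math.inf
    head = queue[0]
    # D: maximum distance of the head pair, which is expected to be a
    # non-object-object pair (an object-object head is reported at once).
    big_d = math.inf if head.is_object_pair else head.max_dist
    # d: maximum distance of the first object-object pair in the queue, if any.
    small_d = next((p.max_dist for p in queue if p.is_object_pair), math.inf)
    return min(big_d, small_d)


def pairs_to_contact(queue: List[QueuePair]) -> List[QueuePair]:
    """Pairs whose distance is within WCDist; object-object pairs need no contact."""
    wc_dist = update_wc_dist(queue)
    return [p for p in queue if not p.is_object_pair and p.min_dist <= wc_dist]


# Example loosely modelled on Status 1 of Figure 9 (distances are made up).
status1 = [
    QueuePair("(C,B)", 1.0, 5.0, False),
    QueuePair("(A,B)", 1.0, 6.0, False),
    QueuePair("(C,D)", 7.0, 9.0, False),
]
contacted = [p.name for p in pairs_to_contact(status1)]
```

    In this made-up example WCDist equals the maximum distance of the head pair (C,B), so (C,B)
    and (A,B) are contacted in the same round while (C,D) waits.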

    Figure 9 shows a simple case that demonstrates the distance join algorithm. There are two data
    sets, depicted in two different colors. Rectangles X0 and X1 belong to data set X; rectangles Y0
    and Y1 belong to data set Y. At the beginning, procedure JoinInit is called at the query
    initiating peer. As shown in the pseudocode, the peers that own the root control point of each
    data set are contacted first; in this case, these are the control points O of the two data sets.
    Two distance join initialization messages are sent with the same unique ID (so that the replies
    can be processed as a pair). Whenever a peer receives a distance join message, procedure
    ProcessReply is called: the peer puts its spatial objects, along with any child control points
    that contain spatial objects, into a reply message and sends it back to the query initiating
    peer. Procedure RecvMessage is called at the query initiating peer upon receiving a reply
    message. Because the replies to a pair of messages can be delayed arbitrarily, a message cache
    temporarily stores the reply that arrives first (the unique ID is used to pair messages). Assume
    that the reply from the peer that owns control point O of data set X arrives first and that of
    data set Y arrives second. The algorithm then finds the paired reply messages and calls
    procedure doCombine to generate new pairs from them. After processing the messages, it deletes
    the processed element from the queue. One possible state of the priority queue is now the one
    denoted Status 1 in Figure 9 (the order could also be (A,B), (C,B), because the distance between
    control blocks A and B equals that between C and B). Then the worst-case distance WCDist is
    calculated; the result is denoted in the figure as WCDist1, which is the maximum distance between
    control blocks C and B. Pairs in the priority queue whose distance is less than or equal to
    WCDist1 are then contacted. Thus the peer that holds control point C in data set X and the peer
    that holds control point B in data set Y are contacted, and the same is done for pair (A,B).
    This completes the first iteration of the algorithm. Note that the same control point of a data
    set may appear in more than one pair in the priority queue and could therefore be contacted
    multiple times, which causes communication overhead. To overcome this problem, the results of
    previously contacted control points are stored locally at the query initiating peer, which
    eliminates unnecessary communication. In the next iteration, assume that the paired reply
    messages for (C,B) arrive first (the algorithm also works correctly if the replies for (A,B)
    arrive first). Status 2 in Figure 9 shows the content of the priority queue after receiving the
    reply messages for (C,B). Status 3 shows the content after receiving the reply messages for
    (A,B). Note that a new iteration may begin when the queue is in Status 2, in which case the
    previously contacted pair (A,B) will not be contacted again. Assume that the new iteration begins
    after Status 3. The corresponding updated WCDist is denoted as WCDist2 in the


    figure, which is the maximum distance between rectangle X0 and control block BA. Again, pairs in
    the priority queue that satisfy the worst-case criterion are contacted. In this case, all 4 pairs
    are contacted. For clarity and simplicity, we only look at pair (rectX0, BA). When the reply
    message for control point BA has been received and procedure doCombine has been called, the
    content of the queue is as denoted in the figure as Status 4. As shown in the figure, an
    object-object pair appears at the head of the queue; this is the closest pair across the two
    data sets. Once such a pair is found, it is retrieved immediately, and the algorithm should allow
    the user to decide whether to continue or stop the distance join.
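
    The pairing of reply messages by unique ID used in this walk-through can be sketched as
    follows. This is a minimal illustration, not the code of this project: the class ReplyPairer
    and its methods are hypothetical names, payloads are plain Python values, and a lock stands in
    for the Synchronized keyword of the pseudocode.

```python
import threading
from typing import Any, Dict, Optional, Tuple


class ReplyPairer:
    """Caches the first reply of each pair until its partner arrives."""

    def __init__(self) -> None:
        self._cache: Dict[int, Any] = {}   # message ID -> early reply
        self._lock = threading.Lock()      # replies may arrive concurrently

    def receive(self, msg_id: int, payload: Any) -> Optional[Tuple[Any, Any]]:
        """Return (first_reply, second_reply) once both replies with this ID
        have arrived; return None while the partner is still missing."""
        with self._lock:
            if msg_id in self._cache:
                first = self._cache.pop(msg_id)
                return (first, payload)
            self._cache[msg_id] = payload
            return None

    def pending(self) -> int:
        """Number of replies still waiting for their partner."""
        with self._lock:
            return len(self._cache)


# Two query pairs are outstanding (IDs 1 and 2); replies arrive interleaved.
pairer = ReplyPairer()
events = [
    pairer.receive(1, "reply from O of data set X"),
    pairer.receive(2, "reply from C"),
    pairer.receive(1, "reply from O of data set Y"),
    pairer.receive(2, "reply from B"),
]
```

    Only the third and fourth calls return a pair; at that point the caller would hand the two
    replies to doCombine together, which is what keeps the generated pairs free of duplicates.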

    The simple example above started the query from the root control point of each data set. The
    distributed quadtree index allows spatial data to be inserted from level Fmin of the quadtree
    rather than from the root level (which corresponds to Fmin=0). Therefore, a slight modification
    of the algorithm is needed to allow a query to start from level Fmin rather than the root level,
    which avoids the communication overhead of forwarding the query from level 0 to level Fmin.

    4.2. Implementation

    The table below shows the implementation details of the P2P distance join algorithm.

    Item: Routing (Chord)
        Implemented: Basic data structures; find_predecessor; find_successor
        Extra: This project does not deal with the issues that arise when nodes join or leave the
        Chord network; only routing is dealt with. The caching mechanism in Chord is NOT
        implemented.

    Item: Indexing
        Implemented: Basic data structures
        Extra: Quadtree, control point, quadtree node, rectangle, Fmin, Fmax, etc.

    Item: Algorithm
        Implemented: Basic data structures; JoinInit(QuadTreeNode root1, QuadTreeNode root2);
        MessageCacheList; SendMessageTo(controlpoint, id); ProcessReply(ControlPoint u, id);
        CreateReplyMessage(id); SendMessageBack(msg); Synchronized RecvMessage(Message msg);
        doCombine(msg1, msg2); PQueue.deque(msg.id); UpdateWCDist()
        Extra: The implementation strictly follows the protocol defined in the original paper [6].
        Data structures include the priority queue, four types of queue element, queue operations,
        etc. The priority queue only allows sequential access, but the implementation allows
        contacting multiple peers in parallel. The implementation only allows a distance join query
        to start from the root level rather than from level Fmin. The implementation allows caching
        the results of previously contacted control points.

    Table 3. Implementation details of P2P distance join algorithm

    5. Experiments


    5.1. Experimental Environment

    Figure 10. Example of transit-stub model

    For the experiments, J-Sim (www.j-sim.org) is used as the simulation environment. Because there
    are no random factors that could make the results differ between runs of the same test case, I
    run each test case only once.

    My experiments are based on several assumptions: 1. no packets are lost during communication;
    2. query response time is caused mainly by message propagation delay; 3. the P2P network is
    completely stable, so that during the experiments no node leaves or joins the network and no
    node crashes. With these assumptions, I create an ideal world in which to measure the
    performance of the algorithm.

    Before conducting experiments, the network topology and the test data sets must be prepared. For
    the network topology, I create a static topology for each test case, similar to the
    Transit-Stub model [16] shown in Figure 10, where intermediate nodes can be regarded as transit
    nodes and nodes on the edge can be regarded as stub nodes. In real life, transit domains can be
    thought of as metropolitan area networks, with transit nodes playing the role of Internet
    service providers. Stub domains resemble the networks within different organizations, companies,
    campuses, etc. Table 4 gives the physical characteristics of the underlying network used in
    J-Sim. All of the test parameters are chosen to closely reflect real-world scenarios. Some of
    them are statistics obtained from Rogers Communications Inc [17].
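
    The test data sets for these experiments are not uniform: the 2D region is divided into 8
    concentric square rings, and each ring holds roughly twice as many spatial objects as the ring
    immediately outside it. A minimal Python sketch of such a generator is given here for
    illustration. It is not the generator used in this project; the function names, the equal ring
    widths, the region side of 1000 and the use of points instead of rectangles are all
    assumptions made for the sketch.

```python
import random
from typing import List, Tuple


def ring_counts(total: int, rings: int = 8) -> List[int]:
    """Objects per ring, innermost first; each ring gets about twice as many
    objects as the ring immediately outside it."""
    weights = [2 ** (rings - 1 - k) for k in range(rings)]
    counts = [total * w // sum(weights) for w in weights]
    counts[0] += total - sum(counts)  # rounding remainder goes to the centre
    return counts


def sample_in_ring(k: int, rings: int, side: float,
                   rng: random.Random) -> Tuple[float, float]:
    """Uniform random point in ring k (0 = innermost square) of a
    side x side region. Rings of equal width are an assumption."""
    step = side / (2 * rings)
    inner, outer = k * step, (k + 1) * step
    while True:  # rejection sampling: redraw points that fall inside the hole
        dx = rng.uniform(-outer, outer)
        dy = rng.uniform(-outer, outer)
        if max(abs(dx), abs(dy)) >= inner:
            return (side / 2 + dx, side / 2 + dy)


def generate_points(total: int, side: float = 1000.0, rings: int = 8,
                    seed: int = 0) -> List[Tuple[float, float]]:
    """Generate `total` object centres, dense in the middle, sparse outside."""
    rng = random.Random(seed)
    points: List[Tuple[float, float]] = []
    for k, n in enumerate(ring_counts(total, rings)):
        points.extend(sample_in_ring(k, rings, side, rng) for _ in range(n))
    return points


counts = ring_counts(400)
points = generate_points(400)
```

    Seeding the generator keeps a test case reproducible, which matches the decision to run each
    test case only once.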


    Parameter                                           Value   Unit
    Network delay in local area network                 10      ms
    Network delay between stub nodes                    40      ms
    Network delay between stub node and transit node    200     ms
    Network delay between transit nodes                 200     ms
    Bandwidth in local area network                     54      Mbps
    Bandwidth between stub nodes                        100     Mbps
    Bandwidth between stub node and transit node        100     Mbps
    Bandwidth between transit nodes                     1000    Mbps

    Table 4. Physical parameters for underlying network

    Figure 11. Sample test data with 400 spatial objects

    For test data sets, obtaining real-life data can be tricky. Thus a solution must be found to
    generate near-real-life test data sets, for example, the distribution of all the restaurants in
    an urban region of Melbourne, or of all the 7-Eleven convenience stores in an urban region of
    Melbourne. Merely adopting the random functions provided by the API can only yield uniformly
    distributed data, which cannot reflect the genuine real-world performance of this algorithm.
    According to Zipf's law [18], many types of data studied in the physical and social sciences can
    be approximated with a Zipfian distribution [18]. My test data sets are generated roughly
    following the Zipfian distribution. A 2D region is divided into 8 square rings that share a
    centroid (the innermost one becomes a square). A fixed number of spatial objects are distributed
    in the following manner: the number of spatial objects in an inner square ring is roughly twice
    that in its immediate outer

    square ring; within each square ring, a random function from the API is used to generate the
    spatial data. As a result, spatial objects are densely distributed in the central area of the 2D
    region and sparsely distributed in the outer region, which simulates a real-life data
    distribution. Figure 11 shows an example distribution of 400 spatial objects that follows the
    Zipfian distribution.

    Generally speaking, the experiments are conducted by changing the following parameters: Fmin;
    the number of peers in the P2P network; the number of queries initiated simultaneously; and the
    number of spatial objects in each data set. A query is said to be finished when the top 10
    closest pairs are found.

    In addition, peers are allocated almost equally to stub nodes, and the number of queries from
    each stub domain is roughly the same.

    5.2. Results

    5.2.1. Different Fmin:

    The first experiment examines how Fmin affects the algorithm. There are 200 peers in the
    network, uniformly distributed over the stub domains. Each data set contains 200 spatial
    objects. The number of simultaneous client requests is set to 10 and Fmax is set to 9. The
    philosophy behind the variable Fmin is to avoid a single point of failure. With Fmin, spatial
    objects are forced to be inserted at level Fmin or deeper in the quadtree. Therefore, queries
    are no longer processed from the root node, and multiple peers in the network are contacted as
    soon as the queries start. One effect of increasing Fmin is that bigger spatial objects are
    split into smaller pieces that fall deeper down the quadtree, increasing the height of the
    quadtree, which in turn increases the complexity of the algorithm. Another effect is that more
    messages have to be sent before actual spatial data is retrieved, which causes communication
    overhead.

    [Figure 12 chart, "Changing Fmin": average response time (seconds) for Fmin = 0 to 8: 24.033,
    24.677, 24.365, 24.200, 23.392, 24.959, 26.554, 29.006, 31.060]

    Figure 12. Average query response time as Fmin increases

    As can be observed in Figure 12, different values of Fmin do not affect the average processing
    time much: as Fmin increases, the average response time curve remains roughly steady. However,
    as Fmin reaches its maximum, a slight increase is observed. This is due to the longer
    propagation delay of query messages introduced when queries are forwarded from the root level to
    level Fmin in the distributed quadtree, where the spatial objects are actually located. For the
    first few values of Fmin, there is no significant

    difference in average response time, for three reasons: 1. finding the first 10 closest pairs is
    quite different from finding all the pairs; 2. Fmin does not affect the test data set
    significantly until it reaches a certain value, because the test data set contains many small
    spatial objects; 3. even though splitting spatial objects into smaller pieces causes
    communication overhead (shown in Figure 13), the parallel communication of the algorithm
    compensates for this overhead with regard to average response time.

    Figure 13. Average number of messages for finishing one query as Fmin increases

    Figure 13 shows how the number of messages per query (each query finds the first 10 closest
    pairs) varies as Fmin increases. As expected, the number of messages increases with Fmin. For
    the first few values, Fmin does not affect the number of messages much. However, once it reaches
    5, there is a relatively steep increase, because the underlying 2D space is divided into many
    tiny squares and the height of the distributed quadtree increases accordingly.

    For different values of Fmin, Figure 14 shows the load distribution in terms of the standard
    deviation. As can be observed, the standard deviation drops gradually as Fmin increases, which
    means that the load among peers in the network tends to become more balanced.

    [Figure 13 chart, "Average Number of Messages": messages per request for Fmin = 0 to 8: 5,975,
    7,020, 7,782, 9,403, 13,550, 26,361, 46,691, 71,769, 120,302]

    [Figure 14 chart, "Standard Deviation for Fmin": standard deviation in load for Fmin = 0 to 8]

    Figure 14. Standard deviation of number of messages for finishing one query as Fmin increases

    Figure 15 shows the actual load for peers in the network. There are 15 slots on the x-axis, each
    representing a range for the number of messages that a certain number of peers have received
    while finishing 10 queries. Each slot potentially has 9 bars, one per value of Fmin. For
    example, to see the load distribution for Fmin=0, one looks at the first bar in every slot. As
    shown in the figure, around 80 peers in the network receive at most 10 messages, around 7 peers
    receive more than 10 but at most 20 messages, and so on. A general trend can be seen: as Fmin
    increases, more and more peers in the network handle more messages. When Fmin=0, 81 out of 200
    peers handle fewer than 10 messages, and no peer handles more than 5120 messages. When Fmin
    reaches 8, only 14 peers in the network handle fewer than 10 messages, and 47 out of 200 peers
    handle more than 5120 messages in total. Load increases along with Fmin; however, it is roughly
    uniformly distributed across the network.

    Figure 15. Load distribution for finishing 10 queries with different Fmins

    5.2.2. Distributed VS Sequential:

    The most prominent advantage of the P2P distance join algorithm over the traditional distance
    join algorithm is that it contacts the relevant peers in parallel rather than sequentially,
    which enables it to exploit the parallelism of the P2P network. Figure 16 compares the
    experimental results of the parallel and sequential algorithms. As shown, the parallel algorithm
    gives a steady curve: its average response time is not affected significantly by increasing
    Fmin. The sequential algorithm fluctuates severely, because the elements in the priority queue
    are handled one by one. In addition, different values of Fmin change how the distributed
    quadtree partitions the spatial objects, which introduces uncertainty in the average response
    time. Unsurprisingly, the parallel algorithm performs much better than the sequential one in
    terms of response time.

    The next several experiments examine how well the P2P distance join algorithm scales with
    respect to the number of peers, the number of simultaneous queries and the number of spatial
    objects.


    [Figure 16 chart, "Parallel One VS Sequential One": average response time (seconds) against
    Fmin = 0 to 8 for the P2P distance join algorithm and the sequential distance join algorithm]

    Figure 16. Average response time per query for P2P distance join algorithm in comparison to
    centralized sequential algorithm

    5.2.3. Different Number of Peers:

    The first experiment examines how the algorithm scales with an increasing number of peers in the
    network. Fmin is set to 2; Fmax is set to 9; there are 200 spatial objects in the region; the
    number of simultaneous queries is set to 10; and a query counts as finished once the first 10
    closest pairs are found. The result is shown in Figure 17.

    As shown in the figure, the average response time remains roughly steady as the number of peers
    increases. There is a small increase in average response time because, with more peers in the
    network, the 200 spatial objects are located at more peers, and therefore more hops in the Chord
    network are needed for a query to finish.

    [Figure 17 chart, "Changing Number of Peers": average response time (seconds) for 200, 400,
    600, 800 and 1000 peers in the network: 21.007, 22.796, 25.014, 25.538, 27.498]

    Figure 17. Average response time per query as number of peers increases

    5.2.4. Different Number of Simultaneous Queries:

    The second scalability experiment examines how well the algorithm scales as the number of
    simultaneous queries increases. Again, Fmin is set to 2; Fmax is set to 9; there are 200 spatial
    objects in the 2D space; the number of peers in the network is set to 200; and a query counts as
    finished once the first 10 closest pairs are found. The result is shown in Figure 18. In the
    figure, there is a drop at the beginning. One possible reason for this drop in average response
    time is that most of the queries are forwarded to the same peers that previously forwarded the
    same messages. The rest of the curve remains steady.

    Figure 18. Average response time per query as number of query increases

    5.2.5. Different Number of Spatial Objects

    The last experiment examines how well the algorithm performs with an increasing number of
    spatial objects. With a fixed number of peers in the network, as more and more spatial objects
    are inserted, the number of spatial objects allocated to each single peer must increase, which
    reduces the number of hops a query needs to be forwarded in the Chord network to fetch the
    spatial objects needed before the first 10 closest pairs are returned. In this experiment, Fmin
    is set to 2; Fmax is set to 9; the number of peers in the network is set to 200; the number of
    simultaneous queries is set to 10; and a query counts as finished once the first 10 closest
    pairs are found.

    Figure 19. Average response time per query as number of objects increases

    [Figure 18 chart, "Changing Number of Queries": average response time for 5, 10, 20, 40 and 80
    simultaneous queries: 26.153, 24.365, 24.703, 24.966, 24.928]

    [Figure 19 chart, "Changing Number of Spatial Objects (response time)": average response time
    (seconds) for 200, 400, 600, 800 and 1000 spatial objects: 25.497, 22.717, 23.555, 21.813,
    21.404]

    Figure 19 shows the result. As expected, the general trend in average response time is
    decreasing as the number of spatial objects increases, apart from a sudden increase when the
    number of objects is set to 600, which may be due to randomness in the distribution of spatial
    objects among the machines in the P2P network.

    Although the average response time decreases as more and more spatial objects are inserted into
    the network, the number of messages generated to finish one query increases (shown in Figure
    20). The reason is intuitive. As more spatial objects are inserted, more quadtree blocks
    (control points) need to be inserted into the network, including both the blocks that contain
    spatial objects and those whose children contain spatial objects. Therefore, either the
    distributed quadtree becomes fuller or its height increases. In either case, more messages are
    needed to finish one query.

    Figure 20. Average messages per query as number of objects increases

    6. Conclusion and Future Work

    The P2P paradigm is clearly a trend in today's network development. More and more people use
    applications that employ P2P technology. However, complex queries on spatial data over P2P
    networks can be difficult to achieve. The P2P distance join algorithm examined in this report
    fully exploits the advantages of P2P networks. In this project, I studied the unpublished P2P
    distance join algorithm in depth and implemented it, along with two other algorithms, range
    query and nearest neighbour query. Finally, several experiments were conducted to examine
    different aspects of the P2P distance join algorithm. The results show that the distance join
    algorithm works well in a 2D environment with respect to average response time. The variable
    Fmin proposed in the original paper [8] is very important to this algorithm. Finding an
    appropriate Fmin, such that a single point of failure is unlikely while the number of messages
    generated to finish a single query is not overwhelming, is not a trivial task. However, Fmin and
    Fmax do give a lot of flexibility to the applications built on top of it.

    The P2P distance join algorithm implemented for the experiments always starts the query from the
    root control points of the two data sets, which causes communication overhead from passing the
    query down from level 0 to level Fmin in the distributed quadtree. This problem can be solved by
    allowing the query to start from level Fmin rather than level 0. In real-life applications,
    other query criteria can be applied, such as giving a query range within which to find the
    closest pairs, or allowing users to specify the two types of data sets they are interested in.

    [Figure 20 chart, "Changing Number of Spatial Objects (messages/request)": average number of
    messages per request for 200, 400, 600, 800 and 1000 spatial objects: 7,480, 16,471, 27,648,
    32,512, 45,652]


    References

    [1]. Front Page of Business Link. Business Link Web Site. [Online]
    http://www.businesslink.gov.uk.

    [2]. Wilson, Jim. Front Page of National Aeronautics and Space Administration. NASA Official
    Web Site. [Online] http://www.nasa.gov.

    [3]. Front Page of National Institutes of Health. Official Web Site of National Institutes of
    Health. [Online] http://www.nih.gov.

    [4]. Front Page of National Geospatial Intelligence Agency. Official Web Site of National
    Geospatial Intelligence Agency. [Online] http://www.nga.mil.

    [5]. Front Page of National Institute of Justice. Official Web Site of National Institute of
    Justice. [Online] http://www.ojp.usdoj.gov/nij.

    [6]. Egemen Tanin and Deepa Nayar. An Efficient Distributed Distance Join Algorithm for
    Peer-to-Peer Networks.

    [7]. Raphael Finkel and J.L. Bentley. Quad Trees: A Data Structure for Retrieval on Composite
    Keys. Acta Informatica 4(1): 1-9.

    [8]. E. Tanin, A. Harwood, H. Samet, D. Nayar, and S. Nutanong. Building and querying a P2P
    virtual world, Geoinformatica, 2006, 10(1):91-116.

    [9]. G.R. Hjaltason and H. Samet. Index-Driven Similarity Search in Metric Spaces, ACM Trans.
    on Database Systems, Dec 2003, Vol. 28, No. 4, pp. 517-580.

    [10]. G.R. Hjaltason and H. Samet. Incremental Distance Join Algorithms for Spatial Databases,
    Proc. of the ACM SIGMOD Conference, Seattle, WA, 1998, pp. 237-248.

    [11]. E. Tanin, A. Harwood and H. Samet. A distributed quadtree index for peer-to-peer
    settings, in Proceedings of the IEEE International Conference on Data Engineering, Tokyo,
    Japan, April 2005, pp. 254-255.

    [12]. Gershon Kedem. The Quad-CIF Tree: A Data Structure for Hierarchical On-Line Algorithms,
    University of Rochester, Rochester, New York 14627.

    [13]. Raphael Finkel and J.L. Bentley. Quad Trees: A Data Structure for Retrieval on Composite
    Keys, Acta Informatica 4(1): 1-9.

    [14]. Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek and Hari Balakrishnan. A
    scalable peer-to-peer lookup service for Internet applications, in Proceedings of the ACM
    SIGCOMM '01, San Diego, CA, August 2001, pp. 149-160.

    [15]. Secure Hash Standard, FIPS PUB 180, by US government standards agency NIST (National
    Institute of Standards and Technology).

    [16]. Zegura EW, Calvert KL and Donahoo MJ. A quantitative comparison of graph-based models for
    Internet topology. IEEE/ACM Trans. on Networking, 1997, 5(6):770-783.

    [17]. Looking Glass and Network Information. Rogers Communications Inc. [Online]
    https://supernoc.rogerstelecom.net/ops/.

    [18]. G.K. Zipf. Human Behavior and the Principle of Least-Effort, Addison-Wesley, MA, 1965.