Department of Computer Science - State University of New York, Binghamton

1

Non-Uniform Information Dissemination for Dynamic Grid Resource Discovery

Vishal Iyengar, Sameer Tilak, Michael J. Lewis, Nael B. Abu-Ghazaleh

2

Outline

Motivations
Resource Discovery and Information Dissemination
Approach
Non-uniformity
Non-Uniform Protocols
New Dissemination Protocols
Results
Directions for the future

3

Motivations

Information is at a premium in Grids

The scale of the Grid is on the way up: potentially millions of hosts, each with several services/objects

Each service/object needs resources
    The requirement can be specific or general
    Critical parameters are decided by the service/object

How do we make information and/or meta-information available across the Grid?

4

Resource Discovery

Problem: How can we efficiently find resources that conform to the specifications provided by a Grid client?

Harchol-Balter et al. [1] suggested resource discovery algorithms (Name-Dropper)
    Location service for nodes in a network

Condor Matchmaking [2] uses class ads to match requirements with resources
    Class Ads are name-value pairs
    Sent and matched passively
    Our work is similar to their flocking mechanism

Maheswaran et al. [3] suggested protocols for information dispersal, similar to the ones we propose
    Use data dissemination efficiency to tune the system
    We feel that including topology and resource variations could be useful

5

Synopsis

Designed and studied 4 information dissemination protocols
    Two are implementations of randomized protocols from Sensor Networks, applied to the Grid context
    Two others build on these randomized protocols, bringing some intuition to the dissemination

Simulations involve different topologies and different data models to get a broader perspective

Emphasis is on reducing Network Overhead while keeping Error within reasonable limits

Salient feature is the configurability of each protocol to system needs

6

Information Dissemination

So, in order to find resources, we need to
    Know about them
    Allow every node in the Grid to participate equally in this process
    Do so without utilizing too much bandwidth

Iamnitchi et al. [4] suggested that a peer-to-peer approach might be useful
    Reactive approach driven by queries

We agree. But there may be other ways to use these P2P networks to get information to where it may be used
    Proactive approach based on data and its dispersal
    Eventually, hope for an equilibrium between reactive and proactive

7

The Approach

Our approach borrows from a related problem domain – Sensor Networks

We think of Grid environments as having some resemblance to Sensor Networks

Both have a large number of nodes that need to behave in a P2P fashion

Each node gathers information that needs to be known to the others

Tilak et al. [4] proposed the use of non-uniformity to disseminate data among nodes in a Sensor Network

8

Non-Uniformity

In our context, information has 2 main dimensions
    Temporal: How old is it?
    Spatial: How far is the source?

We try to relate these two by using the following premise
    Any application to be scheduled should be as close to its origin as possible
    This reduces the overhead of sending data to a remote location

We propose to
    Let neighbors know about each other more frequently and/or with more detail
    Let far-off nodes know less about each other and/or with less frequency

9

Non-Uniform Protocols

Randomized protocols that take the decision to forward probabilistically. We incorporated these in the Grid context

Unbiased – Each data item is forwarded with a probability X, and discarded with a probability 1-X

Biased – Each data item is forwarded with a probability inversely proportional to its distance from source

Based on these protocols and their probabilistic forwarding policies we built 2 more dissemination protocols
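The two randomized forwarding rules above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the hop-count measure of distance, and the `scale` constant of proportionality are assumptions introduced here.

```python
import random


def unbiased_forward_probability(x: float) -> float:
    """Unbiased rule: every data item is forwarded with the same probability X."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be in [0, 1]")
    return x


def biased_forward_probability(hops_from_source: int, scale: float = 1.0) -> float:
    """Biased rule: probability inversely proportional to distance from the source.

    `scale` is an assumed constant of proportionality; the result is capped at 1.
    Distance is assumed to be measured in hops.
    """
    if hops_from_source < 1:
        return 1.0  # the item has not left its source yet
    return min(1.0, scale / hops_from_source)


def should_forward(probability: float, rng: random.Random) -> bool:
    """Draw once: forward with `probability`, discard with 1 - `probability`."""
    return rng.random() < probability
```

Under these assumptions an item four hops from its source is relayed with probability 0.25 by the biased rule, while the unbiased rule relays it with the same X regardless of distance.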

10

New Dissemination Protocols

Change Sensitive Protocol (CSP)

Resources and their availability change constantly

Those that change too rapidly or too slowly are not useful farther away from the source
    If they change too fast, far-off nodes will hear about them too late: too unstable for scheduling purposes
    If they change slowly, sending this information wastes bandwidth

So, aggressively propagate only information that changes moderately
    Propagate fast and slow changing information less often

The trick here is in defining what is slow and fast changing
    Different data models simulated
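One way to picture the CSP idea is a forwarding probability that depends on how often a resource value has changed recently. The slides do not say how "slow" and "fast" are defined, so everything below is an assumption made for illustration: the change-rate measure, the thresholds `slow` and `fast`, and the two probabilities.

```python
def change_rate(history):
    """Fraction of consecutive samples that differ (0.0 = static, 1.0 = always changing)."""
    if len(history) < 2:
        return 0.0
    changes = sum(1 for prev, cur in zip(history, history[1:]) if prev != cur)
    return changes / (len(history) - 1)


def csp_forward_probability(history, slow=0.2, fast=0.8,
                            p_moderate=0.9, p_extreme=0.2):
    """Forward moderately changing values aggressively, others less often.

    All four keyword parameters are hypothetical tuning knobs, not values
    taken from the slides.
    """
    rate = change_rate(history)
    if slow <= rate <= fast:
        return p_moderate
    return p_extreme
```

With these hypothetical settings, a value that never changes and a value that changes at every sample both get the low probability, while a value that changes in about half of the samples gets the high one.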

11

New Dissemination Protocols (Contd.)

Priority Dissemination Protocol (PDP)

In the preceding protocols, the intermediate nodes make forwarding decisions for every data item they process

But what about Site Autonomy?
    Providers should be able to decide where information about their resources is seen

So, we suggest a protocol in which each source decides the priority of its information
    Intermediate nodes abide by these priorities
    Certain high-end/unique resources may need more coverage, otherwise requests for them will not be satisfied
    Others are easily available and can do with less coverage
    Useful in commercial Grids with accounting capability?
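The division of labor in PDP (the source sets the priority, intermediate nodes only obey it) can be sketched as follows. The priority labels, the mapping from priority to forwarding probability, and the `Advertisement` record are hypothetical; the slides do not specify how priorities are encoded or enforced.

```python
import random
from dataclasses import dataclass

# Assumed mapping from a source-chosen priority to a relay probability.
PRIORITY_TO_PROBABILITY = {"high": 1.0, "medium": 0.6, "low": 0.2}


@dataclass(frozen=True)
class Advertisement:
    source: str      # host that owns the resource
    resource: str    # name of the advertised resource
    value: float     # current resource value
    priority: str    # chosen by the source, never changed by relays


def relay_decision(ad: Advertisement, rng: random.Random) -> bool:
    """An intermediate node does not judge the item; it applies the source's priority."""
    return rng.random() < PRIORITY_TO_PROBABILITY[ad.priority]
```

In this sketch a provider of a unique resource would tag its advertisements "high" so that every relay forwards them, while commonly available resources tagged "low" spread to fewer nodes.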

12

Factors affecting Dissemination

Protocols decide the dissemination policy

Topology (connectivity) of hosts makes a difference
    Used different topologies to see interaction with the protocols and effect on dissemination
    Topologies used included Waxman, Locality-Based, Pure Random etc., created using the GT-ITM tool

Variation in resource information representative of different resources
    Faster vs. slower changing models to test CSP
    Models used a simple Monotonic Step function, a Gaussian distribution, Poisson distribution etc.

13

Experimental Setup

A prototype implementation in Java that has been tested on smaller clusters (16 to 32 nodes)

SSFNet simulation testbed
    Used to design large scale networks
    In our context, 100-150 nodes may represent a few thousand end-hosts

Base case implementation: Flooding
    Compare our protocols against it

Probabilistic protocols were implemented along with CSP and PDP
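To make the comparison against the flooding base case concrete, here is a toy simulation of one data item spreading through a network. It is only a sketch under stated assumptions: the ring topology helper, the rule that the source always sends, and counting messages instead of bytes are simplifications chosen here, and it is unrelated to the SSFNet setup used in the experiments.

```python
import random
from collections import deque


def ring_topology(n):
    """Hypothetical toy topology: node i is linked to i-1 and i+1 (mod n)."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def disseminate(adjacency, source, forward_probability, rng):
    """Spread one item from `source`; return (nodes reached, messages sent).

    forward_probability = 1.0 corresponds to flooding; smaller values
    correspond to the unbiased probabilistic protocol.
    """
    reached = {source}
    messages = 0
    queue = deque([source])
    while queue:
        node = queue.popleft()
        # The source always sends; every other node relays the first copy
        # it receives with the given probability.
        if node != source and rng.random() >= forward_probability:
            continue
        for neighbor in adjacency[node]:
            messages += 1
            if neighbor not in reached:
                reached.add(neighbor)
                queue.append(neighbor)
    return reached, messages
```

On a six-node ring, flooding reaches every node at the cost of 12 messages, whereas a forwarding probability of 0 stops after the source's own 2 messages and reaches only its neighbors. Intermediate probabilities land between these two extremes, which is the trade-off the protocols are designed to expose.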

14

Performance Metrics

Error: Absolute and Weighted
    Host A’s local view of Host B compared to actual value at Host B
    Weighted with inverse of distance between them

Network Overhead
    Total number of bytes exchanged over the network: the sum of the bytes sent by each host
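The metrics above translate directly into small helper functions. This is a sketch under assumptions: resource values are treated as plain numbers, distance is taken to be a positive hop count, and the function and parameter names are invented for illustration.

```python
def absolute_error(local_view, actual):
    """Difference between Host A's view of Host B and B's actual value."""
    return abs(local_view - actual)


def weighted_error(local_view, actual, distance_hops):
    """Absolute error weighted with the inverse of the distance between hosts.

    Assumes distance is a positive hop count, so errors about far-off
    hosts count for less than errors about neighbors.
    """
    if distance_hops <= 0:
        raise ValueError("distance_hops must be positive")
    return abs(local_view - actual) / distance_hops


def network_overhead(bytes_sent_by_host):
    """Total bytes exchanged: the sum of the bytes sent by each host."""
    return sum(bytes_sent_by_host.values())
```

For example, if Host A believes Host B's value is 70 while the actual value is 50 and the hosts are 4 hops apart, the absolute error is 20 and the weighted error is 5.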

15

Results

Topology: Waxman, 100 Nodes
RIVM: Monotonic Step

Flooding has the lowest Error values but pays for it with a very high network overhead

Unbiased with p = 0.5 cuts the overhead by a third but has a correspondingly higher Error

16

Results

Topology: Waxman, 100 Nodes
RIVM: Poisson Distribution

A similar trend with the Poisson Distribution as with the previous data model

Flooding doesn’t outperform the others in terms of Error here because of the difference in data models: here values don’t change as often

17

Results

Topology: Pure Random, 150 Nodes
RIVM: Uniform Distribution

The protocols provide a range of trade-off points between Error and Overhead

Another point to note – Biased works well for the first few hops while Unbiased 0.8 works well as the number of hops increases

18

Results

Overhead comparison for each RIVM across different topologies

Panels: Monotonic Step Function, Uniform Distribution

The protocols follow similar trends across different topologies and different data models

19

Results

Overhead comparison for each RIVM across different topologies

Panels: Gaussian Distribution, Poisson Distribution

20

To Wrap Up …

The idea of non-uniform dissemination is new to the area of Grid computing

The P2P approach is useful and looks to the future of Grids

Results are promising and motivate further research in this field

21

Directions for the Future

Design a framework that will use the information disseminated to make scheduling decisions more efficient – Query Forwarding

Protocols that will use the feedback from the Query Forwarding framework to disseminate data smartly – Adaptive Protocols

(Hierarchical) Protocols that can do neighbor discovery in addition to information dissemination

Aggregation of data – forward one item which compacts the data from multiple sources … (we found this non-trivial)

22

References

1. M. Harchol-Balter, P. Leighton, and D. Lewin. Resource discovery in distributed networks. In Proc. of ACM PODS 1999, pages 229-237, 1999.

2. A. R. Butt, R. Zhang, and Y. C. Hu. A Self-Organizing Flock of Condors. SC '03, November 15-21, 2003, Phoenix, AZ.

3. M. Maheswaran and K. Krauter. A parameter-based approach to resource discovery in grid computing system. In GRID, pages 181-190, 2000.

4. S. Tilak, A. Murphy, and W. Heinzelman. Non-uniform information dissemination for sensor networks. In 11th IEEE International Conference on Network Protocols (ICNP'03), 2003.

5. V. Iyengar, S. Tilak, M. J. Lewis, and N. B. Abu-Ghazaleh. Non-uniform information dissemination for Dynamic Grid Resource Discovery. 2004.
