Internet Networking Spring 2006 Tutorial 12 Web Caching Protocols ICP, CARP

Post on 19-Dec-2015


Internet Networking Spring 2006

Tutorial 12 Web Caching Protocols

ICP, CARP


ICP - Internet Caching Protocol

ICP is a Web caching protocol

ICP version 2 is defined in RFC 2186

Message format used for communicating among Web caches

Used to exchange hints about the existence of URLs in neighbor caches.

Caches exchange ICP queries and replies to gather information for selecting the most appropriate location from which to retrieve an object


ICPv2 Protocol specification

Generally, Web caches use HTTP for the transfer of object data

However, caches can benefit from a simpler, lighter communication protocol.

ICP is primarily used in a cache mesh to locate specific Web objects in neighboring caches.

One cache sends an ICP query to its neighbors.

The neighbors send back ICP replies indicating a "HIT" or a "MISS."


ICP Implementation

In current practice, ICP is implemented on top of UDP

There is no requirement that it be limited to UDP.

ICP over UDP offers features important to Web caching applications.

Query/reply exchange needs to occur quickly, typically within a second or two.

A cache cannot wait longer than that before beginning to retrieve an object.

Failure to receive a reply message means the network path is either congested or broken.

In either case we would not want to select that neighbor.
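The UDP behaviour described above can be sketched as follows. This is a minimal illustration rather than a full ICP client: the function name `ask_neighbor`, the 2-second default timeout, and treating the query and reply as opaque bytes are all assumptions made for the example.

```python
import socket
from typing import Optional, Tuple


def ask_neighbor(query: bytes, neighbor: Tuple[str, int],
                 timeout: float = 2.0) -> Optional[bytes]:
    """Send one datagram to a neighbor cache and wait briefly for a reply.

    Returns the raw reply bytes, or None if nothing usable arrives in time.
    None means the path is congested or broken (or the neighbor is not
    running), so the caller should not select this neighbor right now.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)  # hypothetical default; tune per deployment
        try:
            sock.sendto(query, neighbor)
            reply, _sender = sock.recvfrom(65535)
            return reply
        except OSError:  # covers socket.timeout and other socket errors
            return None
```

Because UDP needs no connection setup, a lost or late reply costs only the timeout, and the cache can then go ahead and retrieve the object from another source.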


Cache selection

ICP messages can also be used for cache selection

Failure to receive a reply from a cache indicates a network or system failure.

The ICP reply may include extra information that can assist in selecting the most appropriate source from which to retrieve an object.


ICPv2 application specification RFC 2187

A single Web cache will reduce the amount of traffic generated by the clients behind it

Similarly, a group of Web caches can benefit by sharing another cache in much the same way

In a cache hierarchy (or mesh) one cache establishes peering relationships with its neighbor caches


Web Cache Hierarchies

Two types of cache relationship:

Parent

A parent cache is essentially one level up in a cache hierarchy

Sibling

A sibling cache is on the same level

Neighbor (peer)

Either a parent or a sibling that is a single "cache-hop" away


A Simple Web Cache Hierarchy

[Figure: diagram of a simple Web cache hierarchy. Nodes: Internet, Parent Cache, Local Cache, Sibling Cache, Cache Clients. Arrow labels: "Hits Resolved", "Hits and Misses Resolved", "Direct Retrievals".]


Levels

The general flow of document requests is up the hierarchy

When a cache does not hold a requested object, it may ask via ICP whether any of its neighbor caches has the object.

If there is a HIT, the cache requests the object from the neighbor that reported it.

Otherwise the cache must forward the request either to a parent or directly to the origin server.


Parent and Sibling Caches

A "neighbor hit" may be fetched from either a parent or a sibling cache

A "neighbor miss" may NOT be fetched from a sibling.

In other words:

Sibling relationship: the cache can retrieve only objects the sibling already has cached.

Parent relationship: the cache can retrieve any object, regardless of whether or not it is cached.
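The parent/sibling rule fits in a single predicate. The sketch below is illustrative only; the names `Relationship` and `may_fetch_from` are made up for this example and do not come from any cache implementation.

```python
from enum import Enum


class Relationship(Enum):
    PARENT = "parent"
    SIBLING = "sibling"


def may_fetch_from(relationship: Relationship, icp_hit: bool) -> bool:
    """Return True if an object may be requested from this neighbor.

    A neighbor hit may be fetched from a parent or a sibling.
    A neighbor miss may only be forwarded to a parent, never to a sibling.
    """
    if icp_hit:
        return True
    return relationship is Relationship.PARENT
```

For example, `may_fetch_from(Relationship.SIBLING, icp_hit=False)` is false, while the same miss from a parent still allows the request to be forwarded.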


ICP Additional Delay

Caches are designed to answer ICP requests quickly.

The application does minimal processing of the ICP request

Most ICP-related delay is due to transmission on the network.

ICP serves to provide an indication of neighbor reachability.

If ICP replies from a neighbor fail to arrive, that neighbor should not be used at this time

Either the network path is congested (or down), or the cache application is not running on the ICP-queried neighbor machine

ICP also provides some form of load balancing, because an idle cache can reply faster than a busy one.


Determine whether to use ICP

Not every HTTP request requires an ICP query to be sent

Obviously, cache hits will not need ICP because the request is satisfied immediately

For origin servers very close to the cache, we do not want to use any neighbor caches

The cache (or the administrator) may prefer to forward some classes of requests directly to the origin server:

all non-GET request methods

URLs containing certain strings (e.g. “cgi_bin”)
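The checks on this slide can be collected into one decision function. This is a sketch under stated assumptions: the function name, the `origin_is_nearby` flag (how "very close" is measured is left to the caller), and the default substring list are all hypothetical.

```python
from typing import Sequence


def should_send_icp_query(method: str, url: str, in_local_cache: bool,
                          origin_is_nearby: bool,
                          direct_substrings: Sequence[str] = ("cgi_bin",)) -> bool:
    """Decide whether a request is worth an ICP query to the neighbors."""
    if in_local_cache:
        return False  # local hit: the request is satisfied immediately
    if origin_is_nearby:
        return False  # origin is close: going via a neighbor gains nothing
    if method.upper() != "GET":
        return False  # non-GET methods are forwarded directly
    if any(fragment in url for fragment in direct_substrings):
        return False  # administrator-configured "go direct" patterns
    return True
```

Ordering the checks from cheapest to most configurable keeps the common case (a local hit) fast.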


Source Selection

The cache sends queries to each peer.

In order to maximize the chance of getting a HIT reply from one of the peers, the cache waits for all ICP replies to be received (a query timeout is applied).

HIT reply: object retrieval commences immediately from the replying peer.

When all peers reply MISS, either a parent cache or the origin server is selected.
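A minimal sketch of the selection logic above. It assumes the replies that arrived before the timeout are available as `(peer, answer)` pairs in arrival order, and it picks the first parent that answered MISS when there is no HIT. That tie-break, the `ORIGIN` marker, and the function name are assumptions of this example; a real cache may weigh other information carried in the replies.

```python
from typing import Iterable, Optional, Tuple

ORIGIN = "origin-server"  # hypothetical marker meaning "fetch directly"


def select_source(replies: Iterable[Tuple[str, str]],
                  parents: Iterable[str]) -> str:
    """Choose where to fetch an object from, given the ICP replies received.

    replies: (peer_name, "HIT" or "MISS") pairs in arrival order; peers
             whose replies were lost or late are simply absent.
    parents: names of the peers that are parents (the rest are siblings).
    """
    parent_set = set(parents)
    first_parent_miss: Optional[str] = None
    for peer, answer in replies:
        if answer == "HIT":
            return peer  # retrieval starts from the replying peer
        if answer == "MISS" and peer in parent_set and first_parent_miss is None:
            first_parent_miss = peer
    # No HIT: use a parent that is known to be reachable, else go direct.
    return first_parent_miss if first_parent_miss is not None else ORIGIN
```

A parent that never replied is not chosen, which matches the rule that an unresponsive neighbor should not be used for the current request.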


Multicast for Efficient Distribution

A cache may deliver ICP queries to a multicast address.

Neighbor caches may join the multicast group to receive such queries.

But with multicast we have no way to know exactly how many replies to expect

ICP replies are sent to a unicast address:

Multicasting ICP replies would not reduce the number of packets sent.

It prevents other group members from receiving unexpected replies.

The reply should follow unicast routing path to indicate connectivity between the receiver and the sender since the subsequent HTTP request will be unicast routed.


Differences Between ICP and HTTP

HTTP supports a rich and sophisticated set of features.

ICP was designed to be simple, small, and efficient.

HTTP request and reply headers consist of lines of ASCII text.

ICP uses a fixed size header and represents numbers in binary.
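To make the fixed-size binary header concrete, here is a sketch that packs and unpacks an ICP query with Python's `struct` module. The field order (opcode, version, message length, request number, options, option data, sender host address) and the opcode values follow RFC 2186 as best understood here; the helper names are made up, and the address fields are left as zeros for simplicity.

```python
import struct

# 1-byte opcode, 1-byte version, 2-byte length, then four 32-bit fields,
# all in network byte order: 20 bytes in total.
ICP_HEADER = struct.Struct("!BBHIII4s")
ICP_VERSION = 2
ICP_OP_QUERY, ICP_OP_HIT, ICP_OP_MISS = 1, 2, 3


def build_query(request_number: int, url: str) -> bytes:
    """Build an ICP query packet for `url` (address fields zeroed)."""
    # Query payload: 4-byte requester address followed by a NUL-terminated URL.
    payload = b"\x00" * 4 + url.encode("ascii") + b"\x00"
    length = ICP_HEADER.size + len(payload)
    header = ICP_HEADER.pack(ICP_OP_QUERY, ICP_VERSION, length,
                             request_number, 0, 0, b"\x00" * 4)
    return header + payload


def parse_header(packet: bytes) -> dict:
    """Decode the fixed 20-byte header at the start of an ICP packet."""
    opcode, version, length, request_number, options, option_data, sender = \
        ICP_HEADER.unpack_from(packet)
    return {"opcode": opcode, "version": version, "length": length,
            "request_number": request_number, "options": options,
            "option_data": option_data, "sender": sender}
```

Unlike an HTTP header, nothing here needs text parsing: every field sits at a known offset, which is part of why ICP messages are cheap to process.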


CARP - Cache Array Routing Protocol

Microsoft® Proxy Server 2.0 uses the Cache Array Routing Protocol (CARP)

A series of algorithms applied on top of HTTP

Multiple proxy servers are arrayed as a single logical cache

Does not require a new wire protocol

Uses HTTP, compatible with existing firewalls and proxy servers


Hash-based Routing

Provides a deterministic "request resolution path" through an array of proxies

The request resolution path is based on hashing of proxy array member identities and URLs

For any given URL request, the proxy server will know exactly where in the proxy array the information will be stored (whether or not it has been cached yet)


Benefits

Deterministic request resolution path:

No query messaging between proxy servers, unlike ICP

Eliminates the duplication of contents that otherwise occurs on an array of proxy servers

Scales positively: the array becomes faster and more efficient as more proxy servers are added


How CARP works

A hash value is computed for the name of each proxy server

A hash value is computed for each requested URL

The hash value of the URL is combined with the hash value for each proxy

The proxy server whose combined URL+proxy hash has the highest value becomes the "owner" of the cached information

If a server fails, its URLs are automatically rerouted to the server with the next highest score
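The steps above can be sketched as a "highest score wins" selection. Note the assumptions: `hash32` is a stand-in built from SHA-256, not the hash function CARP itself specifies, and the way the two hashes are combined is likewise illustrative. The proxy names in the usage note are hypothetical.

```python
import hashlib
import struct
from typing import List


def hash32(data: bytes) -> int:
    """Stand-in 32-bit hash (NOT the hash function defined by CARP)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")


def score(url: str, proxy: str) -> int:
    """Combine the URL hash with one proxy's hash into a single score."""
    url_hash = hash32(url.encode("utf-8"))
    proxy_hash = hash32(proxy.encode("utf-8"))
    return hash32(struct.pack("!II", url_hash, proxy_hash))


def rank_proxies(url: str, proxies: List[str]) -> List[str]:
    """Proxies ordered from highest to lowest score for this URL."""
    # The proxy name breaks the (very unlikely) tie between equal scores.
    return sorted(proxies, key=lambda p: (score(url, p), p), reverse=True)


def route(url: str, live_proxies: List[str]) -> str:
    """The owner of a URL is the live proxy with the highest score."""
    return rank_proxies(url, live_proxies)[0]
```

With `proxies = ["proxy-a", "proxy-b", "proxy-c"]`, calling `route(url, proxies)` always returns the same member for the same URL. If that member is removed from the list, the URL goes to whichever member ranked second, and the assignments of all other URLs stay where they were.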


How CARP works (cont.)

The result:

Deterministic location for all cached information

A Web browser or downstream proxy server can know exactly where a requested URL either already is stored locally, or will be located after caching

Because the range of hash values is so large (2^32 = 4294967296 possible values), the result is a statistically even distribution of load across the array
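The load-balancing claim can be checked empirically with a small experiment. As before, this is a sketch with a stand-in hash (SHA-256 truncated to 32 bits) rather than CARP's own function, and the proxy names and URLs are invented for the test.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def owner(url: str, proxies: List[str]) -> str:
    """Return the proxy with the highest combined score for this URL."""
    def score(proxy: str) -> int:
        digest = hashlib.sha256(f"{proxy}|{url}".encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big")  # value in [0, 2**32)
    return max(proxies, key=score)


def load_share(urls: List[str], proxies: List[str]) -> Dict[str, float]:
    """Fraction of the given URLs owned by each proxy."""
    counts = Counter(owner(u, proxies) for u in urls)
    return {p: counts[p] / len(urls) for p in proxies}


proxies = ["proxy-a", "proxy-b", "proxy-c"]
urls = [f"http://example.com/page/{i}" for i in range(3000)]
shares = load_share(urls, proxies)
```

With three proxies, each entry in `shares` should come out close to one third. The spread narrows as the number of distinct URLs grows.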


Updating Membership List

The array manager maintains a current list of members of a particular proxy array

All proxy servers in the array store their own local copies of the array list and periodically send requests for updates to the array manager

They also watch all HTTP requests to any array member; if a request fails, they mark that member as down until the next update from the array manager
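The bookkeeping each proxy does can be sketched as a small class. The class and method names are hypothetical, and how updates are actually fetched from the array manager is outside the scope of the sketch.

```python
from typing import Iterable, List, Set


class ArrayMemberView:
    """One proxy's local copy of the array membership list."""

    def __init__(self, members: Iterable[str]) -> None:
        self.members: List[str] = list(members)
        self.down: Set[str] = set()

    def report_failure(self, member: str) -> None:
        """An HTTP request to `member` failed: mark it down locally."""
        if member in self.members:
            self.down.add(member)

    def apply_update(self, members: Iterable[str]) -> None:
        """Replace the local list with a fresh copy from the array manager."""
        self.members = list(members)
        self.down.clear()  # the manager's list supersedes local down marks

    def live_members(self) -> List[str]:
        """Members that requests may currently be routed to."""
        return [m for m in self.members if m not in self.down]
```

Requests would be routed only over `live_members()`, so a failed proxy drops out of the hash computation until the next update arrives.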