

Design, Implementation, and Evaluation of Differentiated Caching Services

Ying Lu, Student Member, IEEE, Tarek F. Abdelzaher, Member, IEEE, and Avneesh Saxena

Abstract—With the dramatic explosion of online information, the Internet is undergoing a transition from a data communication infrastructure to a global information utility. PDAs, wireless phones, Web-enabled vehicles, modem PCs, and high-end workstations can be viewed as appliances that “plug-in” to this utility for information. The increasing diversity of such appliances calls for an architecture for performance differentiation of information access. The key performance accelerator on the Internet is the caching and content distribution infrastructure. While many research efforts addressed performance differentiation in the network and on Web servers, providing multiple levels of service in the caching system has received much less attention. This paper has two main contributions. First, we describe, implement, and evaluate an architecture for differentiated content caching services as a key element of the Internet content distribution architecture. Second, we describe a control-theoretical approach that lays well-understood theoretical foundations for resource management to achieve performance differentiation in proxy caches. An experimental study using the Squid proxy cache shows that differentiated caching services provide significantly better performance to the premium content classes.

Index Terms—Web caching, control theory, content distribution, differentiated services, QoS.

1 INTRODUCTION

THE phenomenal growth of the Internet as an information source makes Web content distribution and retrieval one of its most important applications today. Internet clients are becoming increasingly heterogeneous, ranging from high-end workstations to low-end PDAs. A corresponding heterogeneity is observed in Internet content. In the near future, a much greater diversification of clients and content is envisioned as traffic sensors, smart buildings, and various home appliances become Web-enabled, representing new data sources and sinks of the information backbone. This trend toward heterogeneity calls for customizable content delivery architectures with a capability for performance differentiation.

In this paper, we design and implement a resource management architecture for Web proxy caches that allows controlled hit rate differentiation among content classes. The desired relation between the hit rates of different content classes is enforced via per-class feedback control loops. The architecture separates policy from mechanism. While the policy describes how the hit rates of different content classes are related, the performance differentiation mechanism enforces that relation. Of particular interest, in this context, is the proportional hit rate differentiation model. Applying this model to caching, “tuning knobs” are provided to adjust the quality spacing between classes, independently of the class loads. The two unique features of the proportional differentiated service model [14], [13] are its guarantees on both predictable and controllable relative differentiation. It is predictable in the sense that the differentiation is consistent (i.e., higher classes are better, or at least no worse) regardless of variations in the class loads. It is controllable, meaning that network operators are able to adjust the quality spacing between classes based on their selected criteria. While we illustrate the use of our performance differentiation architecture in the context of hit rate control, it is straightforward to extend it to directly control other performance metrics that depend on cache hit rate, such as average client-perceived page access latency.

One significantly novel aspect of this paper is that we use a control-theoretical approach for resource allocation to achieve the desired performance differentiation. Digital feedback control theory offers techniques for developing controllers that use feedback from measurements to adjust the controlled performance variable so that it reaches a given set point. The theory offers analytic guarantees on the convergence time of the resulting feedback control loop. It is the authors' belief that feedback control theory holds significant promise for predictable performance control of computing systems operating in uncertain, unpredictable environments. By casting cache resource allocation as a controller design problem, we are able to leverage control theory to arrive at an allocation algorithm that converges to the desired performance differentiation in the shortest time, even in the presence of a very bursty, self-similar cache load.

The rest of this paper is organized as follows: Section 2 presents the case for differentiated caching services. Section 3 describes the architecture of a cache that supports service differentiation; a control-theoretical approach is proposed to achieve the desired distance between performance levels of different classes. In Section 4, the implementation of this architecture on Squid, a very popular proxy cache in today's Web infrastructure, is presented. Section 5 gives experimental evaluation results of our architecture, obtained from performance measurements on our modified Squid prototype. Section 6 discusses related work and Section 7 concludes the paper.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 15, NO. 5, MAY 2004 1

. Y. Lu and T.F. Abdelzaher are with the Department of Computer Science, School of Engineering and Applied Science, University of Virginia, 151 Engineer's Way, PO Box 400740, Charlottesville, VA 22904-4740. E-mail: {ying, zaher}@cs.virginia.edu.

. A. Saxena is with Knight Trading Group, 130 Cheshire Lane, Suite # 102, Minnetonka, MN 55305. E-mail: [email protected].

Manuscript received 6 Aug. 2002; revised 3 Oct. 2003; accepted 6 Oct. 2003. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number 117082.

1045-9219/04/$20.00 © 2004 IEEE Published by the IEEE Computer Society

2 THE CASE FOR DIFFERENTIATED CACHING SERVICES

While a significant amount of research went into implementing differentiated services at the network layer, the proliferation of application-layer components that affect client-perceived network performance, such as proxy caches and content distribution networks (CDNs), motivates investigating application-layer QoS. In this section, we present a case for differentiated caching services as a fundamental building block of an architecture for Web performance differentiation. Our argument is based on three main premises. First, we show that storage in proxy caches is a scarce resource that requires better allocation. Second, we argue for the importance of Web proxy caches in providing performance improvements beyond those achievable by push-based content distribution networks. Third, we explain why there are inherently different returns for providing a given caching benefit to different content types. Thus, improved storage resource management calls for performance differentiation in proxy caches.

Let us first illustrate the scarcity of network storage relative to Web workloads. As reported by AOL, the daily traffic on their proxy caches is in excess of 8 Terabytes of data. With a hit rate of 60 percent, common to AOL caches, the cache has to fetch 8 × 40% = 3.2 Terabytes of new content a day. Similarly, the advent of content distribution networks that distribute documents on behalf of heavily accessed sites may require large storage sizes because they have a large actively accessed working set. It is therefore important to allocate storage resources appropriately such that the maximum perceived benefit is achieved.

Second, consider the argument for employing proxy caches in our storage resource allocation framework. Web proxy caching and CDNs are the key performance acceleration mechanisms in the Web infrastructure. While demand-side (i.e., pull-based) proxy caches wait for surfers to request information, supply-side (i.e., push-based) proxies in CDNs let delivery organizations or content providers proactively push the information closer to the users. Research [30], [23] indicates that the combination of the two mechanisms leads to better performance than either of them alone. Gadde et al. [17] used the Zipf-based caching model from Wolman et al. [36] to investigate the effectiveness of content distribution networks. They find that, although supply-side caches in CDNs may yield good local hit rates, they contribute little to the overall effectiveness of the caching system as a whole when the populations served by the demand-side caches are reasonably large. These results are consistent with what Koletsou and Voelker [23] conclude in their paper. In [23], Koletsou and Voelker compare the speedups achieved by the NLANR proxy caches to those achieved by the Akamai content distribution servers. They found that the NLANR (pull-based) cache hierarchy served 63 percent of HTTP requests in their workload at least as fast as the origin servers, resulting in a decrease of average latency of 15 percent. In contrast, while Akamai edge servers were able to serve HTTP requests an average of 5.7 times as fast as the origin servers, using Akamai reduced overall mean latency by only 2 percent, because requests to Akamai edge servers were only 6 percent of the total workload. The aforementioned research results indicate that demand-side Web proxy caching remains the main contributor to reducing overall mean latency, while the supply-side proxies optimize only a small portion of the total content space. Hence, in this paper, we present an architecture geared for pull-based proxy caches. Providing QoS control for push-based proxies in CDNs will be investigated in future work, where the storage reallocation is triggered actively by the content providers' requirements instead of passively by the content consumers' requests. The proactive nature of storage reallocation introduces additional degrees of freedom in QoS management that are not explored in traditional proxy caching.

Finally, consider the argument for performance differentiation as a way to increase the global utility of the caching service. First, let us illustrate the end-users' perspective. It is easy to see that caching is more important to faster clients. If the performance bottleneck is in the backbone, caching has an important effect on reducing average user service time as hit rate increases. Conversely, if the performance bottleneck is on the side of the client, user-perceived performance is not affected significantly by saving backbone trips when the caching hit rate is increased. An example where user-speed-motivated differentiation may be implementable is the preferential treatment of regular Web content over wireless content. Web appliances such as PDAs and Web-enabled wireless phones require new content types for which a new language, the Wireless Markup Language (WML), was designed. Proxy caching will have a lower impact on user-perceived performance of wireless clients because saving backbone round trips does not avoid the wireless bottleneck. Proxies that support performance differentiation can get away with a lower hit rate on WML traffic to give more resources to faster clients (that are more susceptible to network delays), thus optimizing aggregate resource usage. This is especially true of caches higher in the caching hierarchy, where multiple content types are likely to be intermixed.1

Another argument for differentiation is a content-centric one. In particular, it may be possible to improve client-perceived performance by caching the most “noticeable” content more often. It has been observed that different classes of Web content contribute differently to the user's perception of network performance. For example, user-perceived performance depends more on the download latency of HTML pages than on the download latency of their dependent objects (such as images). This is because, while a user has to wait explicitly for the HTML pages to download, their embedded objects can be downloaded in the background, incurring less disruption to the user's session. Treating HTML text as a premium class in a cache would improve the experience of the clients for the same network load conditions and overall cache hit rate. Later, in the evaluation section, we show (by replaying real proxy cache traces) that a differentiated caching service can substantially decrease the average client wait time on HTML files at the expense of only a moderate increase in wait times for embedded objects.

Finally, a Web proxy cache may choose to classify content by the identity of the requested URL. For instance, an ISP (such as AOL) can have agreements with preferred content providers or CDN service providers to give their sites better service for a negotiated price. Our architecture


1. Currently, ISP caches closest to the client are usually dedicated to one type of clients, e.g., “all wireless” or “all wired.”

would enable such differentiation to take place, although there are better ways to achieve provider-centric differentiation, such as using a push-based approach. We conclude that there are important practical applications for differentiated caching services in pull-based proxy caches. This paper addresses this need by presenting a resource management framework and theoretical foundations for such differentiation.

3 A DIFFERENTIATED CACHING SERVICES ARCHITECTURE

In this section, we present our architecture for service differentiation among multiple classes of content cached in a proxy cache. Intuitively, if we assign more storage space to a class, its hit rate will increase and the average response time of client accesses to this type of content will decrease.2 If we knew future access patterns, we could tell the amount of disk space that needs to be allocated to each class ahead of time to achieve their performance objectives. In the absence of such knowledge, we need a feedback mechanism to adjust space allocation based on the difference between actual system performance and desired performance. This feedback mechanism is depicted in Fig. 1, which illustrates a feedback loop that controls the performance of a single class. One such loop is needed for each class.

In the figure, the reference (i.e., the desired performance level) for the class is determined by a service differentiation policy. Assume there are N content classes. To provide proportional hit rate differentiation, the policy should specify that the hit rates (H_i) of the N classes be related by the expression:

H_1 : H_2 : ... : H_N = c_1 : c_2 : ... : c_N,    (1)

where c_i is a constant weighting factor representing the QoS specification for class_i. To satisfy the above constraint, it is enough that the relative hit ratio of each class_i, defined as R_i = H_i / (H_1 + H_2 + ... + H_N), be equal to the relative hit ratio computed from the specification (i.e., R_i^desired = c_i / (c_1 + c_2 + ... + c_N)). Thus, relative hit ratios are used as the performance metrics of the feedback control loop. In Fig. 1, the actual system performance measured by the output sensor is the relative hit ratio R_i, which is compared with the reference R_i^desired; their difference, error_i = R_i^desired − R_i, is used by the cache space controller to decide the space allocation adjustment online.

An appealing property of this model is that the aggregate performance error of the system is always zero because:

Σ_{1≤i≤N} e_i = Σ_{1≤i≤N} (R_i^desired − R_i)
             = Σ_{1≤i≤N} c_i / (c_1 + c_2 + ... + c_N) − Σ_{1≤i≤N} H_i / (H_1 + H_2 + ... + H_N)
             = 1 − 1 = 0.    (2)

As we show in the next section, this property allows us to develop resource allocation algorithms in which the resources of each class are heuristically adjusted independently of the adjustments of other classes, yet the total amount of allocated resources remains constant, equal to the total size of the cache.
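The zero-sum property of (2) can be checked with a short numerical sketch; the class weights and hit rates below are illustrative values, not measurements from our experiments:

```python
# Sketch: per-class relative hit ratios and the zero-sum error property of (2).
# The weights c and hit rates H are hypothetical, chosen only for illustration.

def relative_ratios(values):
    """Normalize per-class values so they sum to 1."""
    total = sum(values)
    return [v / total for v in values]

c = [4.0, 2.0, 1.0]        # QoS weights c_1 : c_2 : c_3 (hypothetical)
H = [0.30, 0.20, 0.15]     # measured per-class hit rates (hypothetical)

R_desired = relative_ratios(c)   # R_i^desired = c_i / (c_1 + ... + c_N)
R = relative_ratios(H)           # R_i = H_i / (H_1 + ... + H_N)
errors = [rd - r for rd, r in zip(R_desired, R)]

# The aggregate error is zero regardless of the measured hit rates:
assert abs(sum(errors)) < 1e-12
```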

Note that the success of the feedback loop in achieving its QoS goal is contingent on the feasibility of the specification. That is, the constraints stated by (1) should be achievable. Assume the average hit rate of the unmodified cache is H. In general, when space is divided equally among classes, the maximum multiplicative increase in space that any one class can get is upper-bounded by the number of classes N. It is well-known that hit rate increases logarithmically with cache size [35], [4], [18], [7]. Thus, in a cache of total size S, the maximum increase in hit rate for the highest priority class is upper-bounded by ln N. After some algebraic manipulation, this leads to H_max ≤ H ln S / (ln S − ln N). If the relative hit ratio between the top and bottom classes is q = c_max : c_min, the hit rate of the bottom class is upper-bounded by H_max / q. This gives some orientation for choosing the specification c_1 : c_2 : ... : c_N.

3.1 The Performance Differentiation Problem

We cast the proportional hit rate differentiation into a closed-loop control problem. Each content class_i is assigned a certain amount of cache storage s_i, such that Σ_i s_i is the total size of the cache. The objective of the system is to achieve the desired relative hit ratio. This objective is achieved using a resource allocation heuristic, which refers to the policy that adjusts the cache storage space allocation among the classes such that a desired relative hit ratio is reached. We need to show that 1) our resource allocation heuristic makes the system converge to the relative hit ratio specification, and that 2) the convergence is bounded by a finite constant that is a design parameter. To provide these guarantees, we rely on feedback control theory in designing the resource allocation heuristic. The heuristic is invoked at fixed time intervals, at which it corrects resource allocation based on the measured performance error. Let the measured performance error at the kth invocation of the heuristic be e_i[k]. To compute the correction Δs_i[k] in resource allocation, we choose a linear function f(e_i)


2. This presumes that the request traffic on the cache is not enough to overload its CPU and I/O bandwidth.

Fig. 1. The hit rate control loop.

so that f(0) = 0 (no correction unless there is an error). At the kth invocation, the heuristic computes:

∀i : Δs_i[k] = f(e_i[k])    (3)

and the space allocation is then adjusted:

∀i : s_i[k] = s_i[k−1] + Δs_i[k].    (4)

If the computed correction Δs_i[k] is positive, the space allocated to class_i is increased by |Δs_i[k]|. Otherwise, it is decreased by that amount. Since the function f is linear, Σ_i f(e_i[k]) = f(Σ_i e_i[k]). From (2), Σ_i e_i[k] = 0. Thus, Σ_i f(e_i[k]) = f(0) = 0. It follows that the sum of corrections across all classes is zero. This property is desirable since it ensures that, while the resource adjustment can be computed independently for each class based on its own error e_i, the aggregate amount of allocated resources does not change after the adjustment and is always equal to the total size of the cache. Next, we show how to design the function f in a way that guarantees convergence of the cache to the specified performance differentiation within a single sampling period.
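The conservation argument above can be sketched numerically; the linear gain and the allocations below are assumptions made only for the illustration:

```python
# Sketch: per-class corrections Δs_i[k] = f(e_i[k]) with a linear f, f(0) = 0.
# Because f is linear and the errors sum to zero, the corrections sum to zero
# and the total allocation is conserved. The gain K is hypothetical.

K = 50_000  # linear gain of f (space units per unit of error); illustrative

def f(e):
    """A linear correction function with f(0) = 0."""
    return K * e

s = [400_000, 300_000, 300_000]   # current allocations s_i[k-1] (hypothetical)
e = [0.05, -0.02, -0.03]          # per-class errors e_i[k]; they sum to zero

assert abs(sum(e)) < 1e-12
delta = [f(ei) for ei in e]                     # Eq. (3)
s_next = [si + d for si, d in zip(s, delta)]    # Eq. (4)

# Total cache size is unchanged by the adjustment (up to float round-off):
assert abs(sum(s_next) - sum(s)) < 1e-6
```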

3.2 Control Loop Design

To design the function f, a mathematical model of the control loop is needed. The cache system is essentially nonlinear. We approximate it by a linear model in order to simplify the design of the control mechanism. Such linearization is a well-known technique in control theory that facilitates the analysis of nonlinear problems. The relevant observation is that nonlinear systems are well approximated by their linear counterparts in the neighborhood of linearization (e.g., the slope of a nonlinear curve does not deviate too far from the curve itself in the neighborhood of the point at which the slope was taken). Observe that the feasibility of control loop design based on a linear approximation of the system does not imply that cache behavior is linear. It merely signifies that the designed controller is robust enough to deal gracefully with any modeling errors introduced by this approximation. Such robustness, common to many control schemes, is one reason for the great popularity of linear control theory, despite the predominantly nonlinear nature of most realistic control loops.

Approximating the nonlinear cache behavior, a change Δs_i[k] in space allocation is assumed to result in a proportional change in the probability of a class_i hit, Δp_i[k] ≈ K_c Δs_i[k]. While we cannot measure the probability p_i[k] directly, we can infer it from the measured hit rate. The expected hit rate at the end of a sampling interval (where expectation is used in a mathematical sense) is determined by the space allocation and the resulting hit probability that took place at the beginning of the interval. Hence,

E(ΔH_i[k]) ≈ K_c Δs_i[k−1]    (5)

(and E(H_i[k]) = H_i[k−1] + E(ΔH_i[k])).

Remember that the relative hit ratio (the controlled performance variable) is defined as R_i = H_i[k] / Σ_i H_i[k]. Unfortunately, the measured H_i[k] might have a large standard deviation around the expected value unless the sampling period is sufficiently large. Thus, using H_i[k] for feedback to the controller will introduce a significant random noise component into the feedback loop. Instead, the measured H_i[k] is smoothed first using a low-pass filter. Let the smoothed H_i[k] be called M_i[k]. It is computed as a moving average as follows:

M_i[k] = a M_i[k−1] + (1 − a) H_i[k].    (6)

In this computation, older values of hit rate are exponentially attenuated with a factor a, where 0 < a < 1. Values of a closer to 1 will increase the horizon over which H_i is averaged, and vice versa. The corresponding smoothed relative hit ratio is M_i[k] / Σ_i M_i[k]. This value is compared to the set point for this class and the error is used for space allocation adjustment in the next sampling interval, thereby closing the loop.
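The smoothing filter of (6) can be sketched in a few lines; the noisy hit-rate samples below are illustrative, not trace data:

```python
# Sketch: the low-pass filter of Eq. (6), M_i[k] = a*M_i[k-1] + (1-a)*H_i[k].
# The hit-rate samples are made up to show the smoothing effect.

def smooth(samples, a, m0=0.0):
    """Exponentially weighted moving average with attenuation factor a."""
    m = m0
    out = []
    for h in samples:
        m = a * m + (1 - a) * h
        out.append(m)
    return out

noisy_H = [0.30, 0.55, 0.25, 0.50, 0.28, 0.52]  # raw per-interval hit rates
M = smooth(noisy_H, a=0.8)

# The smoothed series varies less than the raw samples:
assert max(M) - min(M) < max(noisy_H) - min(noisy_H)
```

Larger a widens the averaging horizon at the cost of slower reaction to genuine workload shifts, which is the trade-off the controller design in the next paragraphs accounts for.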

Next, we take the z-transform of (3), (4), (5), and (6) and draw a block diagram that describes the flow of signals in the hit rate control loop. The z-transform is a widely used technique in the digital control literature that transforms difference equations into equivalent algebraic equations that are easier to manipulate. Fig. 2 depicts the control loop, showing the flow of signals and their mathematical relationships in the z-transform. The z-transform of the heuristic resource reallocation function f is denoted by F(z). We can now derive the relation between R_i and R_i^desired. From Fig. 2, R_i = e_i F(z) G(z), where:

G(z) = z^−1 K_c (1 − a) / [(1 − z^−1)(1 − a z^−1) Σ_i M_i].    (7)

Substituting for e_i, we get R_i = (R_i^desired − R_i) F(z) G(z). Using simple algebraic manipulation,


Fig. 2. z-Transform of the control loop.

R_i = F(z) G(z) / (1 + F(z) G(z)) · R_i^desired.    (8)

To design the allocation heuristic, F(z), we specify the desired behavior of the closed loop, namely, that R_i follows R_i^desired within one sampling time, or R_i[k] = R_i^desired[k−1]. In the z-transform, this requirement translates to:

R_i = z^−1 R_i^desired.    (9)

Hence, from (8) and (9), we get the design equation:

F(z) G(z) / (1 + F(z) G(z)) = z^−1.    (10)

From (10), it follows that F(z) = z^−1 / [(1 − z^−1) G(z)]. Substituting for G(z) from (7), we arrive at the z-transform of the desired heuristic function, namely, F(z) = (1 − a z^−1) Σ_i M_i / [K_c (1 − a)]. The corresponding difference equation is:

Δs_i[k] = f(e_i) = (Σ_i M_i / (K_c (1 − a))) (e_i[k] − a e_i[k−1]).    (11)

The above equation gives the adjustment of the disk space allocated to class_i, given the performance error e_i of that class and the aggregate Σ_i M_i of smoothed hit rates. The resulting closed loop is stable because the closed-loop transfer function, z^−1, is stable and the open-loop transfer function does not contain unstable poles or zeros [33].

4 IMPLEMENTATION OF THE DIFFERENTIATION HEURISTIC IN SQUID

We modified Squid, a widely used real-world proxy cache, to validate and evaluate our QoS-based resource allocation architecture. Squid is an open-source, high-performance Internet proxy cache [12] that services HTTP requests on behalf of clients (browsers and other caches). It acts as an intermediary, accepting requests from clients and contacting Web servers to service those requests. Squid maintains a cache of the documents that are requested, to avoid refetching from the Web server if another client makes the same request. The efficiency of the cache is measured by its hit rate, H: the rate at which valid requests can be satisfied without contacting the Web server. The least-recently-used (LRU) replacement policy is used in Squid to evict from the cache the objects that have not been used for the longest time. Squid maintains entries for all objects currently residing in its cache. The entries are linked in the order of their last access times using a doubly linked list. On getting a request for an object residing in the cache, the corresponding entry is moved to the top of the list; if the request is for an object not in the cache, a new entry is created for it. The LRU policy is implemented by manipulating the entries in this list.

To provide service differentiation, one possible implementation of our control algorithm in Squid would have been to create a linked list for each class, containing entries for all objects belonging to it. Freeing up disk space would have involved scanning the lists of all the classes that were overusing resources and releasing entries in their lists. The multiple-list implementation provides the most efficient solution; however, to minimize changes to Squid, we chose to use a single-list implementation that achieves the same goal. The single linked-list implementation was used to simulate the multiple-list implementation. It links all the entries in a single list, and each entry contains a record of all the classes it belongs to. In our special case, where the different classes of content are nonoverlapping, each entry belongs to a single class only. This allows us to use Squid's original implementation of the linked lists without modifications.

In LRU, an entry is moved to the top of the list when accessed. With separate lists, the entry would be moved to the top of the list for its class. In the single-list implementation, the entry is moved ahead of every other entry, which implies that it is moved ahead of all other entries of its class. Hence, moving an entry preserves the per-class order; the effect is the same as if we had separate lists for each class.
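This order-preservation argument can be checked mechanically. The following sketch (illustrative only, not Squid code) replays a sequence of accesses against both a single class-tagged list and separate per-class lists, and verifies that filtering the single list by class yields the same order:

```python
from collections import deque

def replay(accesses):
    """accesses: list of (key, class) pairs in request order."""
    global_list = deque()            # single list; front = most recently used
    per_class = {}                   # class -> its own LRU list
    for key, cls in accesses:
        if key in global_list:
            global_list.remove(key)
        global_list.appendleft(key)  # move-to-front in the single list
        lst = per_class.setdefault(cls, deque())
        if key in lst:
            lst.remove(key)
        lst.appendleft(key)          # move-to-front in the class's own list
    # Filtering the global list by class must reproduce each per-class list.
    classof = {key: cls for key, cls in accesses}
    for cls, lst in per_class.items():
        filtered = [k for k in global_list if classof[k] == cls]
        assert filtered == list(lst)
    return global_list

replay([("a", 0), ("x", 1), ("b", 0), ("a", 0), ("y", 1)])
```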

In Squid, a number of factors determine whether or not any given object can be removed. If the time since the last access is less than the LRU threshold, the object will not be removed. In our implementation, we removed this threshold check for two reasons. First, we want to implement LRU in the strict sense: we replace the least recently used object regardless of its age. Second, this reduces the time required to run our experiments, since we do not have to wait for objects to expire. For a fair evaluation, in Section 5 we compare our QoS Squid with the original Squid with threshold checking disabled.

Our implementation closely corresponds to the control loop design; we implemented five modules in the QoS cache: timer, output sensor, cache space controller, classifier, and actuator. The timer signals the output sensor and the cache space controller to update their outputs periodically. The classifier is responsible for request classification, and the actuator is in charge of cache space deallocation and allocation. Section 3 detailed the functions of the cache space controller and the output sensor; their implementation is straightforward. Hence, below we focus on the other three modules:

. Timer: To make the control loops run at a fixed time interval, we added a module to Squid that regulates the control loop execution frequency. Through a configuration parameter, the loops can be set to execute periodically, for example, once every 30 seconds. In each period, the output sensor measures the smoothed relative hit ratio, and the cache space controller calculates the space allocations, which are then used to adjust the cache space assignments for the classes.

. Classifier: This module identifies the class of each request. On receiving a request, the module is invoked and determines the request's class. The classification policy is application-specific and should be easily configurable. We classified requests based on the requested site or content type. In general, classification policies based on other criteria, such as the service provider or IP address, are possible. For example, an ISP might have separate IP blocks allocated to low-bandwidth wireless clients requesting WML documents and high-bandwidth ADSL clients requesting regular HTML content.

LU ET AL.: DESIGN, IMPLEMENTATION, AND EVALUATION OF DIFFERENTIATED CACHING SERVICES 5

. Actuator: As described in Section 3, at each sampling time the cache space controller performs the computation s_i[k] = s_i[k-1] + Δs_i[k] and outputs the new desired space s_i[k] for each class. In Squid, cache space deallocation and allocation are two separate processes. The actuator uses the controller output to guide both. Let realSpace_i be a running counter of the actual amount of cache space used by class_i. In the deallocation process, the cache scans entries from the bottom of the LRU list. For each scanned entry, the cache first determines which class it belongs to and then, based on the space currently used by that class, decides whether to remove the entry. If the space used is less than the desired cache space for the class (realSpace_i < s_i[k]), the entry is not removed; otherwise, it is. Similarly, in the allocation process, whenever a page is fetched from a Web server, the cache decides whether to save it to disk based on which class requested the page and the current cache space of that class. If the space used is greater than the desired cache space for the class (realSpace_i > s_i[k]), the page is not saved; that is, we mark the page as not cachable. In our current implementation, only when a page is saved to disk is the disk space it occupies counted as part of the cache space of the corresponding class. Ideally, as a result of this enforcement of the desired cache allocation, the space realSpace_i occupied by each class i at the end of the kth sampling interval should be exactly the desired value s_i[k] set at the beginning of the interval. In reality, a discrepancy may arise for at least two reasons. First, we may want to give a class more cache space while, during the sampling period, the class does not send enough requests to fill that space with requested pages. Second, some pages that are waiting to be saved to disk have not yet been counted toward the cache space of their class.
To remedy this problem, we include the difference s_i[k] - realSpace_i at the end of the kth sampling interval in our computation of the desired cache space for the (k+1)th sampling period. That is, s_i[k+1] = s_i[k] + Δs_i[k+1] = realSpace_i + (s_i[k] - realSpace_i) + Δs_i[k+1]. From this formula, we can see that the difference between the old real space and the new desired space for the class is (s_i[k] - realSpace_i) + Δs_i[k+1]. If the difference is positive, we want to give the class more cache space; otherwise, we want to release space from the class. We use a single actuator to realize cache space deallocation and allocation for all classes.
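The classifier and actuator decisions above can be sketched as follows (a simplified model with hypothetical names; the real modules are C code inside Squid, and the space units here are arbitrary):

```python
def classify(url: str) -> str:
    """Classify a request by content type: HTML versus everything else."""
    return "html" if url.endswith((".html", ".htm")) else "non_html"

def may_evict(cls: str, real_space: dict, desired: dict) -> bool:
    """Deallocation: an entry scanned from the LRU bottom is removable
    only if its class is using at least its desired space."""
    return real_space[cls] >= desired[cls]

def may_cache(cls: str, real_space: dict, desired: dict) -> bool:
    """Allocation: a freshly fetched page is saved to disk only if its
    class is not above its desired space."""
    return real_space[cls] <= desired[cls]

def next_target(s_k: float, real_space: float, delta_next: float) -> float:
    """Corrected target: s_i[k+1] = s_i[k] + Δs_i[k+1], which folds the
    discrepancy (s_i[k] - realSpace_i) into the next period."""
    return real_space + (s_k - real_space) + delta_next

desired = {"html": 40.0, "non_html": 60.0}
used = {"html": 35.0, "non_html": 70.0}
assert classify("/index.html") == "html"
assert not may_evict("html", used, desired)      # under budget: keep entry
assert may_evict("non_html", used, desired)      # over budget: evict
assert next_target(40.0, 35.0, 5.0) == 45.0
```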

5 EVALUATION

We tested the performance of the feedback control architecture using both synthetic and empirical traces. We used a synthetic workload in Section 5.1 to show that our design makes the cache converge efficiently to the specified performance differentiation under representative cache load conditions. In Section 5.2, we evaluated the practical impact of our architecture from the user's perspective. To do so, we used one of the applications described in Section 2, namely, improving the hit rate on HTML content at the expense of embedded objects to reduce user waiting times. We chose this application for two reasons. First, HTML content is known to be less cachable than other content types. This is due to the inherent uniqueness of text pages compared, for example, with generic GIF icons (which tend to appear on several Web pages simultaneously, thus generating a higher hit rate if cached). Consequently, improving the cache hit rate of HTML is a more difficult goal than improving the hit rate of other content mixes. The second reason for choosing this application is that content type (such as HTML, GIF, JPG) is explicitly indicated in proxy cache traces. Hence, it is easy to assess the performance improvement due to differentiated caching services by inspecting existing cache traces. In Section 5.2, we rerun those traces against our instrumented Squid prototype and experimentally measure the performance improvement. We show that the average client wait time for HTML content can be substantially reduced at the expense of only a moderate increase in wait times for embedded objects.

5.1 Synthetic Trace Experiments

The first part of our experiments tests the efficacy of our control-theoretical resource allocation heuristic. We verified that the heuristic indeed achieves the desired relative differentiation for a realistic load. The experiments were conducted on a testbed of seven AMD-based Linux PCs connected by 100 Mbps Ethernet. The QoS Web cache and three Apache [16] Web servers were started on four of the machines. To emulate a large number of real clients accessing the three Web servers, we used three copies of Surge (Scalable URL Reference Generator) [6] running on different machines that sent URL requests to the cache. The main advantage of Surge is that it generates Web references matching empirical measurements of

1. server file size distribution,
2. request size distribution,
3. relative file popularity,
4. embedded file references,
5. temporal locality of reference, and
6. idle periods of individual users.

In this part of the experiments, the traffic was divided into three client classes based on the requested site. We configured Surge such that the total traffic volumes of the three classes were the same, and all very large. To test the performance of the cache under saturation, we configured the file population to be 34 times the cache size. As mentioned in Section 3, in order to apply the proportional differentiation model in practice, we must make feasible QoS specifications. Hence, we set the reference ratio to H0 : H1 : H2 = 6 : 5 : 4 in all synthetic trace experiments. By analyzing the average hit rates of the undifferentiated cache, we concluded that this specification is feasible and should improve the performance of the high-priority class with only a small sacrifice in the performance of the low-priority class.

6 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 15, NO. 5, MAY 2004

To develop a baseline against which our control-theoretical heuristic (11) could be compared, we first used a simple linear controller f(e_i) = K e_i in the control loop (Fig. 3) to determine the best cache performance over all values of K. In this case, the system reacts to performance errors simply by adjusting the space allocation by an amount proportional to the error, where K is the proportionality constant. Second, we implemented the control function (11) designed using the theoretical analysis in Section 3. By comparison, we found that the theoretically designed function produced better performance than the linear function with the best empirically found K, thus guaranteeing the best convergence of the cache. In this context, by performance we mean the efficiency of convergence of the relative hit ratio to the desired differentiation. This convergence is expressed as the aggregate of the squared errors between the desired and actual relative hit ratio achieved for each class over the duration of the experiment. The smaller the aggregate error, the better the convergence.

Fig. 4 depicts the aggregate error for the proportional controller and our theoretically designed controller when the specified performance differentiation is H0 : H1 : H2 = 6 : 5 : 4. The horizontal axis indicates the base-10 logarithm of the gain K for the proportional controller. The vertical axis is the sum of the squared errors (R_i^desired - R_i, where R_i is the relative hit ratio) over all classes, collected over 20 sampling periods (each sampling period is 30 seconds long). The smaller the sum, the better the convergence of the cache. We can see from the aggregate error plot that different values of K for the proportional control function f(e_i) = K e_i result in different convergence performance. In particular, small values of K are too sluggish in adjusting the space allocation, resulting in slow convergence and large aggregate error. Similarly, large values of K tend to overcompensate the space adjustment, causing the space allocation (and the resulting relative hit ratio) to oscillate permanently, also increasing the aggregate error. Between the two extremes, there is a value of K that results in a global minimum of aggregate error. This K corresponds to the best convergence achievable with the proportional controller. We compare this best performance of the simple heuristic f(e_i) = K e_i with that of our heuristic function (11) designed using digital feedback control theory. The aggregate error computed for the latter heuristic is depicted by the straight line at the bottom of Fig. 4. The aggregate error using the designed function is even smaller than the smallest error achieved with the simple linear heuristic, which means that the designed function produces very good performance and successfully converges the cache.
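The qualitative shape of Fig. 4 can be reproduced with a toy first-order cache model (our assumption for illustration only; it is not the plant model of Section 3, and the gain and sensitivity values are made up):

```python
# Toy model: the relative hit ratio moves in proportion to the space
# adjustment Δs = K·e. It reproduces the qualitative K-sweep of Fig. 4:
# sluggish for small K, oscillatory for large K, best in between.

def aggregate_error(K, gain=1e-5, target=0.55, r0=0.40, periods=20):
    r, total = r0, 0.0
    for _ in range(periods):
        e = target - r            # performance error for this period
        total += e * e            # accumulate squared error
        r += gain * K * e         # space adjustment shifts the hit ratio
    return total

errs = {K: aggregate_error(K) for K in (2_000, 100_000, 190_000)}
# Small K converges too slowly; very large K overshoots and oscillates.
assert errs[100_000] < errs[2_000]
assert errs[100_000] < errs[190_000]
```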

To appreciate the quality of convergence for different controller settings, Figs. 5a, 5c, and 5e show plots of the relative hit ratio of different classes versus time in representative experiments with the proportional controller f(e_i) = K e_i. Every point in those plots shows the data collected in one sampling period. In the figures, curve goal_i is the desired performance of class_i,

R_i^desired = c_i / Σ_j c_j,

and curve class_i is the corresponding relative hit ratio,

R_i = M_i / Σ_j M_j.

Since the difference R_i^desired - R_i reflects the performance error e_i of class_i, we can tell how well the control loop performs by comparing the two curves class_i and goal_i. The closer the two curves, the better the control loop performs and the better the convergence of the cache.
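As a small worked example of these two quantities for the 6:5:4 specification (the per-class hit counts M_i below are hypothetical):

```python
def relative(values):
    """Normalize a list of weights or counts to fractions of their sum."""
    total = sum(values)
    return [v / total for v in values]

# Desired: R_i^desired = c_i / Σ_j c_j for weights c = (6, 5, 4).
goals = relative([6, 5, 4])
assert goals == [6 / 15, 5 / 15, 4 / 15]

# Measured: R_i = M_i / Σ_j M_j for per-class hit counts M (hypothetical).
measured = relative([120, 110, 70])
errors = [g - m for g, m in zip(goals, measured)]   # e_i per class
assert abs(sum(errors)) < 1e-12                     # errors sum to zero
```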

Fig. 5a depicts the relative hit ratio using a small value of K for the controller. The figure shows that curve class_i approaches curve goal_i; however, the convergence is too slow. The controller is too conservative in reacting to the performance error. Fig. 5c plots the relative hit ratio for the best possible K. The figure shows that the cache converges quickly to the specified performance differentiation. Fig. 5e depicts the relative hit ratio for a large value of K. It shows that, with a large K, the cache space adaptation is so large that the relative hit ratio overshoots the desired value. This overcompensation causes the relative hit ratio to keep changing in an oscillatory fashion, making the system unstable.

Figs. 5b, 5d, and 5f plot the allocated space for each class versus time. We observe that, when K is small, the space allocation converges very slowly. Similarly, when K is large, the space allocation oscillates permanently due to overcompensation. Space oscillation is undesirable, since it means that documents are repeatedly evicted and then refetched into the cache. Such cyclic eviction and refetching increases the backbone traffic generated by the cache, which is an undesirable effect. The optimal value of K results in a more stable space allocation that successfully maintains the specified relative performance differentiation.

The above experiments show that controller tuning has a dramatic effect on the convergence rate and, consequently, on the success of performance differentiation. One of the main contributions of this paper lies in deriving a technique for controller tuning that avoids the need for an ad hoc trial-and-error design of the cache resource allocation heuristic to effect proper service differentiation. We have demonstrated the effect of changing a single parameter, K, on the resulting performance. In reality, the controller design space is much larger than that of tuning a single parameter. For example, the


Fig. 3. The control loop with a linear controller.

Fig. 4. The aggregate error versus controller gain K.

controller function may have two constants, in which case two variables must be tuned. We presented a design technique in Section 3 that computes the structure and parameters of the best heuristic function. The convergence of the cache when this function is used with the analytically computed parameters is depicted in Fig. 6, which shows that the performance compares favorably to the best performance we can achieve by experimental tuning (Figs. 5c and 5d). In Fig. 7, we present the absolute hit rates of the three classes for Squid with and without service differentiation. We can see that the QoS Squid increases the hit rate of the high-priority class at the expense of the hit rate of the low-priority class. The hit rate differentiation among classes implies that less popular content of a "favored" class may displace more popular content of less favored classes. Such displacement is suboptimal from the perspective of maximizing hit rate. This consequence is acceptable, however, since the favored classes are presumably more important. Observe that, in real-life situations, the number of high-paying "first-class" customers is typically smaller than the number of "economy" customers; a situation present in many application domains, from airline seating to gas pumps. Hence, a small resource reallocation from economy to first-class customers is likely to cause a larger relative benefit to the latter at the expense of a less noticeable performance change to the former. Thus, it is possible to improve the hit rate of premium clients without significantly impacting other customers.


Fig. 5. Performance of the proportional controller with different gain. (a) The relative hit ratio for K = 2,000. (b) Space allocation for K = 2,000. (c) The relative hit ratio for K = 8,000. (d) Space allocation for K = 8,000. (e) The relative hit ratio for K = 100,000. (f) Space allocation for K = 100,000.

5.2 Empirical Trace Experiments

In the second part of our experiments, we developed a URL reference generator that reads URL references from a proxy trace, generates the corresponding requests, and sends them to our instrumented proxy that implements service differentiation. We randomly picked one of the NLANR (National Laboratory for Applied Network Research) sanitized access logs, uc.sanitized-access.20030922.gz, available at the time from ftp://ircache.nlanr.net/Traces/. To serve requests generated from the trace, the proxy cache contacts the real Web servers on the Internet. For example, if a request for "www.cnn.com" is found in the data set, our proxy cache actually contacts the CNN Web server. We set up our testbed this way because we wanted to measure the backbone latency (the time needed to download a file from the origin server to the proxy cache) in real life. Only using such "real" data can we show that our architecture works well on the Internet. The requests are divided into two classes based on whether or not they are HTML file requests. (Among the first 372,020 references we generated from the trace file, there are a total of 4,431 requests for HTML files.) HTML is one of the least cachable types; we challenge our QoS architecture with this difficult case. We then rerun the experiment using an unmodified proxy cache. The results of the two cases are compared to isolate the effect of performance differentiation. From the experiments, we determined that the ratio of the hit rates of the non_HTML and HTML classes is normally roughly 7 : 5 (i.e., non_HTML content has 1.4 times the hit rate of HTML in the absence of differentiation, which confirms that HTML is less cachable). By specifying that the relative hit ratio of the two classes be H0 : H1 = 6 : 5, where H0 represents the hit rate of the non_HTML class and H1 represents the hit rate of the HTML class, our differentiation policy favors the HTML class while still serving the non_HTML class well.
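A minimal sketch of the reference generator's classification step, assuming the common Squid native access-log layout (URL in the seventh whitespace-separated field; the sanitized NLANR logs may differ in detail):

```python
def classify_trace(lines):
    """Tally HTML versus non-HTML requests in a Squid-style access log."""
    counts = {"html": 0, "non_html": 0}
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue                      # skip malformed lines
        url = fields[6]                   # assumed field position of the URL
        cls = "html" if url.endswith((".html", ".htm")) else "non_html"
        counts[cls] += 1
    return counts

sample = [
    "1064224317.123 45 10.0.0.1 TCP_MISS/200 1024 GET http://example.com/a.html - DIRECT/1.2.3.4 text/html",
    "1064224318.456 12 10.0.0.2 TCP_HIT/200 2048 GET http://example.com/logo.gif - NONE/- image/gif",
]
assert classify_trace(sample) == {"html": 1, "non_html": 1}
```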

One concern about the accuracy of our experiments lies in how the generator actually replays the trace. To expedite the experiments, we replay the log faster than real time. This means that documents on the servers have less opportunity to become stale during the expedited experiment, which leads to potentially inflated hit rates. This effect, however, applies equally to both the QoS cache experiment and the ordinary Squid experiment. Hence, the relative performance improvement we measure is still meaningful. Moreover, we claim that the impact of expedited replay in our experiments is limited to begin with. To validate this claim, we measured the rate of change of content during the original duration of the log (12 hours) and demonstrated that this change was minimal. More specifically, we played the trace through a very large cache that could store all the files requested in the trace. After 12 hours (the length of the log), we played the trace again against the same cache. In the second run, 93.5 percent of the Squid result codes were either TCP_HIT (i.e., a valid copy of the requested object was in the cache), TCP_MEM_HIT (i.e., a valid copy of the requested object was in the cache memory), or TCP_REFRESH_HIT (i.e., the requested object was cached but stale, and the IMS query for the object resulted in "304 not


Fig. 6. Performance of the analytically designed controller (for a synthetic log). (a) Relative hit ratio. (b) Space allocation.

Fig. 7. Absolute hit rates (for a synthetic log). (a) Original Squid without differentiation. (b) QoS Squid.

modified"). An additional 5.4 percent of the Squid result codes were TCP_MISS (i.e., the requested object was not in the cache) with the DIRECT hierarchy code, meaning the request was required to go straight to the source. Only 0.8 percent of the Squid result codes were TCP_REFRESH_MISS (i.e., the requested object was cached but stale, and the IMS query returned new content). Therefore, we conclude that the maximum difference between our measured hit rates and those we would get had we played the trace for its original duration (12 hours) is only 0.8 percent.
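The bound follows directly from the result-code tally; restated as a check:

```python
# Of the result codes in the second run, only TCP_REFRESH_MISS indicates
# content that actually changed, so it bounds the hit-rate distortion
# introduced by expedited replay.
fractions = {
    "TCP_HIT/TCP_MEM_HIT/TCP_REFRESH_HIT": 93.5,  # valid or revalidated
    "TCP_MISS (DIRECT)": 5.4,                      # never cached
    "TCP_REFRESH_MISS": 0.8,                       # cached but changed
}
changed = fractions["TCP_REFRESH_MISS"]
assert changed == 0.8           # at most 0.8% of requests saw changed content
assert sum(fractions.values()) <= 100.0
```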

We carried out two experiments, with the regular Squid and the QoS Squid, respectively. The performance metrics we considered were hit rate and backbone latency reduction. To reflect the backbone latency reduction, we used both raw latency reduction and relative latency reduction. By raw latency reduction, we mean the average backbone latency (in seconds) saved per request. By relative latency reduction, we mean the percentage of the sum of the downloading latencies of the pages that hit in the cache over the sum of all downloading latencies. Here, downloading is from the origin server to the proxy cache.
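The two metrics can be stated precisely as follows (the per-request latencies below are made up for illustration):

```python
def latency_metrics(requests):
    """requests: list of (hit: bool, backbone_latency_seconds)."""
    saved = sum(lat for hit, lat in requests if hit)
    total = sum(lat for _, lat in requests)
    raw = saved / len(requests)        # avg backbone latency saved per request
    relative = 100.0 * saved / total   # % of total download latency avoided
    return raw, relative

reqs = [(True, 0.25), (False, 0.5), (True, 0.25), (False, 1.0)]
raw, rel = latency_metrics(reqs)
assert raw == 0.125                    # (0.25 + 0.25) / 4
assert rel == 25.0                     # 0.5 / 2.0 × 100
```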

Figs. 8a and 8c depict the relative hit ratio in the two experiments. Fig. 8a shows the relative hit ratio achieved for each content type in the case of the regular Squid; Fig. 8c plots the case of the QoS Squid. The non_HTML and HTML curves represent the relative hit ratios of the two classes, respectively. Since the differentiation policy specifies H0 : H1 = 6 : 5 as the goal, the desired relative hit ratio R_i = M_i / Σ_j M_j for non_HTML and HTML is 6/11 and 5/11, respectively. For comparison, we plot the targets 6/11 and 5/11 in both graphs, although Fig. 8a depicts the data for the regular Squid, which does not use the differentiation policy. Comparing Fig. 8a with Fig. 8c, we can see that our QoS proxy cache pulls the relative hit ratios of the two classes to the goals after spending some time reaching the steady state. That initial transient occurs because the cache has not run long enough for unpopular content to percolate to the bottom of the LRU queue and be replaced.

The big gap in population between non_HTML requests and HTML requests makes our choice of sampling interval harder. The huge number of non_HTML requests calls for a small interval, to make the cache more responsive, while the small number of HTML requests calls for a large interval, because too small an interval introduces noise into the system. Our proxy cache balances the two cases and chooses a reasonably small sampling interval (30 seconds). The smoothed hit rate is calculated with a large enough "a" (5/6) in (6). A large "a" increases the horizon over which the hit rate is averaged and decreases the influence of noise. As seen from Fig. 8c, our QoS cache works well and makes the relative hit ratio converge to the goals within a reasonable timescale.
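Assuming the common exponentially weighted form for the smoothing in (6) (an assumption here; the exact formula is given in Section 3), the effect of a = 5/6 can be sketched as:

```python
def smooth(samples, a=5/6):
    """Exponentially weighted smoothing: s[k] = a*s[k-1] + (1-a)*h[k]."""
    s = samples[0]
    out = [s]
    for h in samples[1:]:
        s = a * s + (1 - a) * h
        out.append(s)
    return out

# A noisy per-interval hit-rate series (hypothetical values).
noisy = [0.5, 0.9, 0.1, 0.9, 0.1, 0.9, 0.1]
smoothed = smooth(noisy)
spread = max(smoothed) - min(smoothed)
assert spread < max(noisy) - min(noisy)   # smoothing shrinks the swings
```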

Figs. 8b and 8d plot the allocated space for each class, which shows how the QoS cache changes the space allocation in order to achieve the desired differentiation. Absolute hit rates are presented in Fig. 9, indicating a big


Fig. 8. Performance of the analytically designed controller (for a real log). (a) Relative hit ratio for the original Squid. (b) Space allocation for the original Squid. (c) Relative hit ratio for the QoS Squid. (d) Space allocation for the QoS Squid.

performance improvement for the HTML class with only a slight decrease in non_HTML class performance.

Figs. 10 and 11 depict the backbone latency reduction due to caching, both in the case of the regular Squid and in the case of the QoS Squid. From Fig. 10, we observe that the average latency saved per request for the HTML class is above 0.2 second for the QoS Squid, compared with 0.08 second for the regular Squid, while the numbers for the non_HTML class are around 0.1 second in both cases. Because transient network conditions at the time of the experiments could have a big effect on the measured latency, the raw latencies measured in the two experiments may not be directly comparable. Therefore, we also use relative latency reduction (i.e., the percentage of the sum of the downloading latencies of the pages that hit in the cache over the sum of all downloading latencies of the cache) as a metric to evaluate and compare the performance of the two systems. The average relative latency reduction per request is presented in Fig. 11. Consistent with the results shown by the raw latency reduction, the data further shows that our QoS architecture can significantly improve the latency reduction for the HTML class while incurring only a moderate cost for the non_HTML class. On one hand, the uniqueness of HTML content makes it less cachable than other content types, so improving its cache hit rate is a difficult task. On the other hand, the small request volume and small file size of HTML content favor caching it, as HTML files do not consume much cache space. To generalize our results: if we assign high priority to only a small portion of Internet content, significant performance improvement can be expected for it at only a very small cost to the performance of the low-priority classes. Thus, we consider our differentiated caching services a serious candidate for the future heterogeneous, QoS-aware Web infrastructure.

6 RELATED WORK

Service differentiation and QoS control at the network layer have been studied extensively in the IETF [20] community. In order not to negate the network's efforts, it is important to extend QoS support to endpoint systems. Recent research efforts focused on QoS control in Web servers include [3], [10], [5], [34], [1], [9], [8], [26]. By CPU scheduling and accept-queue scheduling, respectively, Almeida et al. [3] and Abdelzaher et al. [1] successfully provide differentiated levels of service to Web server clients. Demonstrating the need to manage different resources in the system depending on the workload characteristics, Pradhan et al. [26] developed an adaptation technique for controlling multiple resources dynamically. Like [26], Banga et al. [5] and Voigt et al. [34] provide Web server QoS support at the OS kernel level. In [10], [9], [8], session-based QoS mechanisms are proposed, which utilize the session-based relationship among HTTP requests.

The above efforts addressed performance differentiation at the origin server. Differentiation techniques developed for origin servers may not be applicable to proxy caches because the cache introduces a crucial additional degree of complexity to performance differentiation; namely, it introduces the ability to import and offload selected items depending on their popularity and resource requirements. On an origin server, all requests are served locally (i.e., a 100 percent hit rate is achieved on valid requests). Thus, the perceived performance of the server depends primarily on the order in which clients are served. A priority queue, for example, will give high-priority clients shorter response times. The performance speedup due to a cache, on the other hand, depends primarily on whether or not the requested content is cached. Thus, hit rate, rather than service order, is the significant performance factor. Performance differentiation in caches, therefore, requires a new approach.


Fig. 9. Absolute hit rates (for a real log). (a) Original Squid without differentiation. (b) QoS Squid.

Fig. 10. Raw backbone latency reduction.

Web caching research has traditionally focused on replacement policies. In [7], the authors introduced the GreedyDualSize algorithm, which incorporates locality together with cost and size concerns into the replacement policy. Rizzo and Vicisano [29] proposed LRV, which selects for replacement the document with the Lowest Relative Value among those in the cache. In [24], a number of techniques were surveyed for better exploiting the bits in HTTP caches. Aiming at optimally allocating disk storage and giving clients better performance, these schemes do not provide QoS guarantees.

In [22], a weighted replacement policy was proposed that provides differential quality of service. However, their differentiation model does not provide a "tuning knob" to control the performance distance between different classes. Fixed weights are given to each server or URL, but higher weights alone do not guarantee user-perceived service improvement. For instance, the hit rate for a high-weight URL may be very low because the proxy cache is over-occupied by many popular low-weight URLs. Although the scheme is good in the sense that it saves backbone traffic by caching popular files, there is no predictability or controllability in the differentiated service. In contrast, the proportional differentiated caching services described in this paper provide application-layer QoS "tuning knobs" that are useful in a practical setting for network operators to adjust the quality spacing between classes depending on pricing and policy objectives.

Like caching, content distribution networks (CDNs), such as Akamai [2], Digital Island [19], or Speedera [31], are targeted at speeding up the delivery of Web content. As another form of content distribution, peer-to-peer (P2P) networks are mainly used to share individual files among users, which also improves content availability and response time. Examples of peer-to-peer networks include Napster [25], Gnutella [27], and Freenet [11], which are intended for the large-scale sharing of music; Chord [32], CAN [28], and Tapestry [37], which provide solutions for efficient data retrieval and routing; and PAST [15], which focuses on data availability and load balancing in P2P environments. In this paper, we address the performance differentiation problem in demand-side Web proxy caching. The complementary problem of how to provide QoS control mechanisms in content distribution networks will be considered in a forthcoming paper. Research on Internet storage management [21], [28] has also focused on resolving the conflict between increasing storage requirements and the finite storage capacity of every node in the storage system. Instead of finding an optimal storage management scheme for all content, in this paper we address the QoS-aware storage allocation problem, in which resources are allocated preferentially to content classes that are more important or more sensitive to delays.

7 CONCLUSIONS

In this paper, we argued for differentiated caching services in future caches in order to cope with the increasing heterogeneity in Internet clients and content classes. We proposed a relative differentiated caching services model that achieves differentiation of cache hit rates between different classes. The specified differentiation is carried out via a feedback-based cache resource allocation heuristic that adjusts the amount of cache space allocated to each class based on the difference between its specified and actual performance. We described a control-theoretical approach for designing the resource allocation heuristic. It treats the problem as one of controller design and leverages principles of digital control theory to achieve an efficient solution. We implemented our results in a real-life cache and performed performance tests. The evaluation suggests that the control-theoretical approach results in a very good controller design. Compared to manual parameter tuning approaches, the resulting space controller has superior convergence properties and is successful in maintaining the desired performance differentiation for a realistic cache load.

ACKNOWLEDGMENTS

The work reported in this paper was supported in part by US National Science Foundation grants CCR-0093144, ANI-0105873, and CCR-0208769.

REFERENCES

[1] T.F. Abdelzaher, K.G. Shin, and N. Bhatti, "Performance Guarantees for Web Server End-Systems: A Control-Theoretical Approach," IEEE Trans. Parallel and Distributed Systems, vol. 13, no. 1, pp. 80-96, Jan. 2002.

[2] Akamai, http://www.akamai.com, 2003.

[3] J. Almeida, M. Dabu, A. Manikutty, and P. Cao, "Providing Differentiated Levels of Service in Web Content Hosting," Proc. First Workshop Internet Server Performance, June 1998.


Fig. 11. Relative backbone latency reduction.

[4] V. Almeida, A. Bestavros, M. Crovella, and A. de Oliveira, "Characterizing Reference Locality in the WWW," Proc. IEEE Conf. Parallel and Distributed Information Systems, 1996.

[5] G. Banga, P. Druschel, and J.C. Mogul, "Resource Containers: A New Facility for Resource Management in Server Systems," Operating Systems Design and Implementation, pp. 45-58, 1999.

[6] P. Barford and M.E. Crovella, "Generating Representative Web Workloads for Network and Server Performance Evaluation," Proc. Performance '98/ACM SIGMETRICS '98, pp. 151-160, 1998.

[7] P. Cao and S. Irani, "Cost-Aware WWW Proxy Caching Algorithms," Proc. USENIX Symp. Internet Technology and Systems, pp. 193-206, Dec. 1997.

[8] J. Carlstrom and R. Rom, "Application-Aware Admission Control and Scheduling in Web Servers," Proc. IEEE Infocom, June 2002.

[9] H. Chen and P. Mohapatra, "Session-Based Overload Control in QoS-Aware Web Servers," Proc. IEEE Infocom, June 2002.

[10] L. Cherkasova and P. Phaal, Session Based Admission Control: A Mechanism for Improving the Performance of an Overloaded Web Server. 1998.

[11] I. Clarke, O. Sandberg, B. Wiley, and T.W. Hong, "Freenet: A Distributed Anonymous Information Storage and Retrieval System," Proc. Workshop Design Issues in Anonymity and Unobservability, pp. 311-320, July 2001.

[12] J. Dilley, M. Arlitt, and S. Perret, "Enhancement and Validation of the Squid Cache Replacement Policy," Proc. Fourth Int'l Web Caching Workshop, Mar. 1999.

[13] C. Dovrolis and P. Ramanathan, "Proportional Differentiated Services, Part II: Loss Rate Differentiation and Packet Dropping," Proc. Int'l Workshop Quality of Service, June 2000.

[14] C. Dovrolis, D. Stiliadis, and P. Ramanathan, "Proportional Differentiated Services: Delay Differentiation and Packet Scheduling," Proc. SIGCOMM, pp. 109-120, 1999.

[15] P. Druschel and A. Rowstron, "PAST: A Large-Scale, Persistent Peer-to-Peer Storage Utility," Proc. Eighth Workshop Hot Topics in Operating Systems (HotOS VIII), May 2001.

[16] R.T. Fielding and G. Kaiser, "The Apache HTTP Server Project," IEEE Internet Computing, vol. 1, no. 4, pp. 88-90, July 1997.

[17] S. Gadde, J.S. Chase, and M. Rabinovich, "Web Caching and Content Distribution: A View from the Interior," Computer Comm., vol. 24, no. 2, pp. 222-231, 2001.

[18] S. Glassman, "A Caching Relay for the World-Wide Web," Computer Networks and ISDN Systems, vol. 27, no. 2, pp. 165-173, 1994.

[19] Digital Island Inc., http://www.sandpiper.net, 2003.

[20] Internet Engineering Task Force, http://www.ietf.org, 2003.

[21] J. Kangasharju, J. Roberts, and K. Ross, "Object Replication Strategies in Content Distribution Networks," Proc. Web Caching and Content Distribution Workshop, June 2001.

[22] T.P. Kelly, Y.M. Chan, S. Jamin, and J.K. MacKie-Mason, "Biased Replacement Policies for Web Caches: Differential Quality-of-Service and Aggregate User Value," Proc. Fourth Int'l Web Caching Workshop, Mar. 1999.

[23] M. Koletsou and G. Voelker, "The Medusa Proxy: A Tool for Exploring User-Perceived Web Performance," Proc. Sixth Int'l Web Caching Workshop and Content Delivery Workshop, June 2001.

[24] J. Mogul, "Squeezing More Bits out of HTTP Caches," IEEE Network, pp. 6-14, May/June 2000.

[25] Napster, http://www.napster.com, 2003.

[26] P. Pradhan, R. Tewari, S. Sahu, A. Chandra, and P. Shenoy, "An Observation-Based Approach Toward Self-Managing Web Servers," Proc. Int'l Workshop Quality of Service, May 2002.

[27] The Gnutella Protocol Specification, http://dss.clip2.com/gnutellaprotocol04.pdf, 2000.

[28] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker, "A Scalable Content-Addressable Network," Proc. ACM SIGCOMM, Aug. 2001.

[29] L. Rizzo and L. Vicisano, "Replacement Policies for a Proxy Cache," IEEE/ACM Trans. Networking, vol. 8, no. 2, pp. 158-170, 2000.

[30] A.I.T. Rowstron and P. Druschel, "Storage Management and Caching in PAST, a Large-Scale, Persistent Peer-to-Peer Storage Utility," Proc. Symp. Operating Systems Principles, pp. 188-201, 2001.

[31] Speedera, http://www.speedera.com, 2003.

[32] I. Stoica, R. Morris, D. Karger, M.F. Kaashoek, and H. Balakrishnan, "Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications," Proc. ACM SIGCOMM, Aug. 2001.

[33] S.G. Tzafestas, Applied Digital Control. North-Holland Systems and Control Series, 1986.

[34] T. Voigt, R. Tewari, D. Freimuth, and A. Mehra, Kernel Mechanisms for Service Differentiation in Overloaded Web Servers. 2001.

[35] S. Williams, M. Abrams, C.R. Standridge, G. Abdulla, and E.A. Fox, "Removal Policies in Network Caches for World-Wide Web Documents," Proc. ACM SIGCOMM Conf., 1996.

[36] A. Wolman, G.M. Voelker, N. Sharma, N. Cardwell, A.R. Karlin, and H.M. Levy, "On the Scale and Performance of Cooperative Web Proxy Caching," Proc. Symp. Operating Systems Principles, pp. 16-31, 1999.

[37] B.Y. Zhao, J.D. Kubiatowicz, and A.D. Joseph, "Tapestry: An Infrastructure for Fault-Resilient Wide-Area Location and Routing," Technical Report UCB//CSD-01-1141, Apr. 2001.

Ying Lu received the BS degree in computer science from Southwest Jiaotong University, Chengdu, China, in 1996, and the MS degree in computer science from Jinan University, Guangzhou, China, in 1999. Since 1999, she has been working on her PhD degree in computer science at the University of Virginia. Her research interests include applying control theory to Internet-based services, self-tuning QoS-architecture design, and performance management in complex computing environments. She is a student member of the IEEE and the IEEE Computer Society.

Tarek F. Abdelzaher received the BSc and MSc degrees in electrical and computer engineering from Ain Shams University, Cairo, Egypt, in 1990 and 1994, respectively. He received the PhD degree from the University of Michigan in 1999. Since 1999, he has been an assistant professor at the University of Virginia, where he founded the Quality of Service (QoS) Laboratory. He has authored or coauthored more than 60 refereed publications. He was guest editor for the Journal of Computer Communications and the Journal of Real-Time Systems, and is coeditor of IEEE Distributed Systems Online. He served on numerous program committees and held several steering positions, including poster chair of ICDCS 2003, work-in-progress chair of RTSS 2003, and program chair of RTAS 2004. He is a patent holder on Adaptive Web Systems and a US National Science Foundation CAREER Award recipient. His research interests include QoS provisioning, real-time computing, operating systems, computer networking, and sensor networks. He is particularly interested in developing and experimentally validating new foundations for performance guarantees in highly dynamic, unpredictable software systems, ranging from open high-performance Internet applications to low-power sensor networks and embedded systems. He is a member of the IEEE and the IEEE Computer Society.

Avneesh Saxena received the BS degree in computer science from the Indian Institute of Technology, Guwahati, India, in 1999, and the MS degree in computer science from the University of Virginia in 2001. He is currently working for Knight Trading Group, designing and implementing financial systems.

