Probabilistic Models for Web Caching. David Starobinski, David Tse (UC Berkeley). Conference and Workshop on Stochastic Networks, Madison, Wisconsin, June 2000


Probabilistic Models for Web Caching

David Starobinski, David Tse

UC Berkeley

Conference and Workshop on Stochastic Networks, Madison, Wisconsin, June 2000


Overview

• Web caching goals
• Caching levels
• Classical caching algorithms and the Independent Reference (IR) model
• Web caching issues
• New algorithms and analysis for Web caches
• Discussion


Web Caching Goals

• Reduce response latency
• Reduce bandwidth consumption
• Reduce server load

Exploit the locality of reference


Web Caching Levels

[Figure: Clients, Browser cache, Proxy cache, Internet, Reverse proxy, Server]


Caching: Performance

• Cache buffers have finite capacity

• Goal: Maximize the proportion of requests served by the cache (hit ratio)

• Need to devise algorithms that keep the “hot” documents in the cache


Caching Algorithms

• LRU

• FIFO

• CLIMB (Transpose)


LRU (Least Recently Used)

The buffer is arranged as a stack

[Figures, LRU (i) to (iv): stack diagrams with pages labeled 1 to 5]


CLIMB (Transpose)

[Figures, CLIMB (i) and (ii): the stack 1, 2, 3, 4 becomes 1, 3, 2, 4; a page labeled 5 is also shown]
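The three classical policies can be sketched as update rules on a list whose index 0 is the top of the stack. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the miss handling for CLIMB (the new page replaces the bottom entry) follows the usual textbook definition and is an assumption here.

```python
def lru_request(stack, page, capacity):
    """LRU: the requested page always moves to the top. Returns True on a hit."""
    hit = page in stack
    if hit:
        stack.remove(page)
    elif len(stack) >= capacity:
        stack.pop()                 # evict the least recently used page (bottom)
    stack.insert(0, page)
    return hit


def fifo_request(stack, page, capacity):
    """FIFO: hits leave the order untouched; misses push in at the top."""
    hit = page in stack
    if not hit:
        if len(stack) >= capacity:
            stack.pop()             # evict the oldest arrival (bottom)
        stack.insert(0, page)
    return hit


def climb_request(stack, page, capacity):
    """CLIMB (transpose): a hit swaps the page with the one just above it."""
    hit = page in stack
    if hit:
        pos = stack.index(page)
        if pos > 0:
            stack[pos - 1], stack[pos] = stack[pos], stack[pos - 1]
    elif len(stack) >= capacity:
        stack[-1] = page            # assumed miss rule: replace the bottom page
    else:
        stack.append(page)
    return hit
```

For example, calling `climb_request` with page 3 on the stack `[1, 2, 3, 4]` swaps 3 with its upper neighbour, while `lru_request` would move it straight to the top.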


Analysis: The IR model

• N: total number of pages
• $p_i$: the probability that page i (i = 1, 2, …, N) is requested
• Independent of previous requests
• Remarks:
  – Model mostly justified for proxy caches
  – Studies show that Web page popularity follows a Zipf law
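An IR request stream is simply a sequence of independent draws from the popularity distribution. The sketch below builds a Zipf-like popularity vector and samples such a stream; the function names, the `exponent` parameter, and the fixed seed are assumptions made for illustration.

```python
import random


def zipf_popularity(n, exponent=1.0):
    """Return p_1..p_n with p_i proportional to 1 / i**exponent."""
    weights = [1.0 / (i ** exponent) for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def ir_requests(p, length, seed=0):
    """Draw `length` independent page requests; pages are numbered 1..N."""
    rng = random.Random(seed)
    pages = range(1, len(p) + 1)
    return rng.choices(pages, weights=p, k=length)
```

Because every draw ignores the history, this stream has no temporal locality beyond what the popularity skew itself produces.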


Cache algorithms

• K: storage capacity of the cache (in pages)

• Ideally, place the K pages with the greatest values of $p_i$ into the cache

• Problem: the values $p_i$ are unknown a priori
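If the probabilities were known, the best static cache would hold the K most popular pages, and its hit ratio would be the sum of their probabilities. A small sketch of that benchmark, with a hypothetical function name:

```python
def ideal_hit_ratio(p, K):
    """Hit ratio of a static cache holding the K most popular pages."""
    return sum(sorted(p, reverse=True)[:K])
```

This value is a useful yardstick when judging on-line policies that must learn the ordering from the request stream.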


LRU, FIFO, CLIMB analysis

• Under the IR model, the cache dynamics can be described by a Markov chain

• Each state {$I_1, I_2, \ldots, I_K$} represents the identity (URL) and ordering of the pages within the cache


LRU – Stationary Probabilities

\[
\pi_{1,2,\ldots,K} = \prod_{i=1}^{K} \frac{p_i}{\sum_{j=i}^{N} p_j}
\]

• Allows the hit ratio to be computed
• Similar results for FIFO and CLIMB
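For small N the LRU hit ratio can be evaluated by brute force. The sketch below assumes the product form in which each cached page contributes its request probability divided by the probability mass not yet used by the pages above it, and sums over all ordered cache states. The function names are hypothetical and pages are indexed from 0.

```python
from itertools import permutations


def lru_state_probability(state, p):
    """Stationary probability of the ordered cache state (top page first)."""
    prob, remaining = 1.0, 1.0
    for page in state:
        prob *= p[page] / remaining
        remaining -= p[page]        # mass left for the pages further down
    return prob


def lru_hit_ratio(p, K):
    """Sum over states of P(state) times the probability of requesting a cached page."""
    total = 0.0
    for state in permutations(range(len(p)), K):
        total += lru_state_probability(state, p) * sum(p[i] for i in state)
    return total
```

The enumeration visits N!/(N-K)! states, so it is only practical for toy examples.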


Analysis - Summary

• Best hit ratio for CLIMB followed by LRU followed by FIFO

• Convergence rate much faster for LRU and FIFO than CLIMB

• Some mathematical issues still open


New Issues

• Non-uniform page size

• Non-uniform access costs
  – Nearby vs. distant servers
  – Underloaded vs. overloaded servers

• Page updates


The Extended IR model (Size)

• Same assumptions as in the IR model, plus:

• The size of page i is $s_i$

• The cache size is K


Off-Line Problem

Find a subset $I \subseteq \{1, 2, \ldots, N\}$ such that

1) $\sum_{i \in I} p_i$ is as large as possible

2) $\sum_{i \in I} s_i \le K$

Knapsack Problem!


Heuristics

• Place the documents with the greatest $p_i/s_i$ values in the cache

• Performs at most a factor of two worse than the optimal solution (except in extreme cases)

• Goal: Devise new on-line algorithms that learn to order documents according to their $p_i/s_i$ values
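The density heuristic can be compared with the exact knapsack optimum on a toy instance. In this sketch the greedy routine skips any document that no longer fits and keeps scanning, which is one possible reading of the heuristic; the function names and the zero-based indexing are assumptions.

```python
from itertools import combinations


def greedy_by_density(p, s, K):
    """Fill the cache in decreasing order of p_i / s_i, skipping pages that do not fit."""
    order = sorted(range(len(p)), key=lambda i: p[i] / s[i], reverse=True)
    chosen, used = [], 0
    for i in order:
        if used + s[i] <= K:
            chosen.append(i)
            used += s[i]
    return sorted(chosen)


def optimal_subset(p, s, K):
    """Exhaustive search over all subsets; exponential in N, for small examples only."""
    best, best_value = (), 0.0
    n = len(p)
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            if sum(s[i] for i in subset) <= K:
                value = sum(p[i] for i in subset)
                if value > best_value:
                    best, best_value = subset, value
    return list(best), best_value
```

With `p = [0.4, 0.3, 0.2, 0.1]`, `s = [4, 2, 1, 3]` and `K = 5`, the greedy choice is pages 1 and 2 (total probability 0.5), while the exhaustive search finds pages 0 and 2 (total probability 0.6).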


Size-LRU algorithm

• Set $s_{\min} = \min\{s_1, s_2, \ldots, s_N\}$

• A randomized algorithm

• When page i is requested:
  – Act like LRU with probability $s_{\min}/s_i$
  – Otherwise, do not change the cache ordering
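A minimal sketch of the randomized rule follows. The slides do not say how evictions work when page sizes differ, so this version assumes that pages are dropped from the bottom of the stack until the total size fits within the capacity; that eviction rule, the function name, and the `rng` parameter are all assumptions.

```python
import random


def size_lru_request(order, page, sizes, capacity, s_min, rng=random):
    """Apply one Size-LRU step to `order` (index 0 is the top). Returns True on a hit."""
    hit = page in order
    # Act like LRU only with probability s_min / s_i.
    if rng.random() < s_min / sizes[page]:
        if hit:
            order.remove(page)
        order.insert(0, page)
        # Assumed eviction rule: drop bottom pages until the contents fit.
        while sum(sizes[q] for q in order) > capacity:
            order.pop()
    return hit
```

When all sizes are equal the probability is 1 and the routine reduces to plain LRU, which is a quick sanity check on the implementation.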


Result

• IR model: LRU, $p_i$

• Extended IR model: Size-LRU, $p_i/s_i$

Size-LRU is dual to LRU


Example: Size-LRU Stationary Probabilities

\[
\pi_{1,2,\ldots,N} = \prod_{i=1}^{N} \frac{p_i/s_i}{\sum_{j=i}^{N} p_j/s_j}
\]
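The duality can be checked numerically: replacing each $p_i$ in the LRU product form by the weight $p_i/s_i$ should again give a probability distribution over orderings, and it should collapse to the LRU value when all sizes are equal. The sketch below assumes that product form; the function name is hypothetical and pages are indexed from 0.

```python
from itertools import permutations


def size_lru_state_probability(order, p, s):
    """Stationary probability of a full ordering under the p_i / s_i product form."""
    weights = [p[i] / s[i] for i in order]
    prob = 1.0
    remaining = sum(weights)
    for w in weights:
        prob *= w / remaining
        remaining -= w              # weight left for the pages further down
    return prob


def total_probability(p, s):
    """Sum of the product form over all N! orderings; should be close to 1."""
    n = len(p)
    return sum(size_lru_state_probability(o, p, s) for o in permutations(range(n)))
```

The normalising constant of the weights cancels in every factor, which is why the formula can be written directly in terms of $p_i/s_i$.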


Numerical Example

• N = 100 documents

• Page popularity: $p_i \propto 1/i^{0.8}$

• Heavy-tailed document size: $\Pr(s > x) = \frac{1}{1+x}$




Summary

• New issues in Web caching

• Size-LRU algorithm

• Dual to LRU

• Extensions for cost issue

• On-going research

The End