selected techniques in content distribution networks pei cao cisco systems, inc
Post on 21-Dec-2015
Selected Techniques in Content Distribution Networks
Pei Cao
Cisco Systems, Inc.
Enterprise WAN Today
Data Center
Regional Hubs
Branch Offices
Internet
T1
56Kbps,128kbps,DSL …
Why Enterprise CDN (ECDN)
• Overcome bandwidth limitations for video applications to branches
• Distribute very large files to branches
• Cache and police web contents
• Consolidate data storage
• ...
Components of ECDN
WAN
Edge Content Engine(CE)
Edge CE
Data Center
Web Servers
Content Injection Devices
Content Distribution Manager(CDM)
Branches
IOS Router with WCCP
Content Delivery
Internet Or
WAN
Internet or Intranet Web Server
HTTP Proxy & Server
Filtering module
Windows Media Proxy & Server
RealNetwork Proxy
MPEG streaming server
CDM Agent Content Dist. Module
RealNetwork Server
Content Distribution
WAN
Edge Content Engine(CE)
Edge CE
Data Center
Web Servers
Root CE(s)
Content Distribution Manager(CDM)
Branches
Challenges in Building CDNs
• Network interoperability
• System scalability
• Content engine performance
• System usability
Outline
• Protocol highlight: Content-based WCCP
• Algorithm highlight: TPUT: Scalable Top-k Algorithm
• Kernel mechanism highlight: Stream Engine
Request Interception
• “Web Cache Communication Protocol” (WCCP) on port 80
Internet Or
WAN
TCP SYN TCP SYN
TCP SYN_ACK
ACK ACK
GET … HTTP/1.1 GET … HTTP/1.1
Cache Hit: HTTP/1.1 200 OK …
Cache Miss
WCCP Bypass
Internet Or
WAN
TCP SYN TCP SYN TCP SYN
TCP SYN
Dealing with Client Transparency
Internet Or
WAN
GET … HTTP/1.1
GET … HTTP/1.1
HTTP/1.0 200 OK … <META HTTP-EQUIV="REFRESH" …
Cache Miss
404 Not Found …
XXX
TCP SYN TCP SYN TCP SYN
TCP SYN
Content-Based Interception
• Problem: how to intercept all HTTP traffic from client browsers?
• Possible solutions:
– Send all traffic through content engine (CE)
• Issues with per-packet latency and CE throughput
– Send traffic to CE but CE tells router which flows to bypass
• High overhead for short flows
Algorithm Highlight
Scalable Top-k Algorithm
Top-k Queries in CDNs
Example queries:
• List top 10 URLs accessed most often among all CEs
• List top 10 domains that consume the most storage among all CEs
• etc.
Definitions
• a network of m nodes, connected to a central manager (CM)
• each node i has a list of (x, V_i(x)) pairs, sorted in descending order of value
• an object’s sum V(x) = V_1(x) + V_2(x) + … + V_m(x)
• Problem: find the k objects with highest sums
A generic problem in distributed systems
Existing Methods
• “Naïve” Algorithm
– Each node sends the full list of objects and their values to the Central Manager
• Threshold Algorithm (TA)
– Proposed by multiple groups in the database research community
The Threshold Algorithm (TA)
• Example: find top 2 objects with max sums in three columns
Node1: (A, 10) (C, 8) (E, 8) (F, 8) (B, 7) (D, 5) (J, 1) (K, 1) ...
Node2: (B, 10) (D, 9) (F, 8) (H, 6) (G, 5) (C, 1) (A, 1) ...
Node3: (C, 10) (A, 9) (G, 8) (J, 7) (F, 6) (D, 4) (B, 1) ...
Central Manager (CM):
T = 30; V(A)=20, V(C)=19, V(B)=18
T = 26; V(A)=20, V(C)=19, …
T = 24; V(F)=22, V(A)=20, …
T = 21; V(F)=22, V(A)=20, …
T = 18; V(F)=22, V(A)=20, …
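The rounds in this example can be replayed with a small simulation. This is only a sketch of the threshold idea, not the authors' code: it assumes an object missing from a node's list has value 0 there, reads one row per round (B = 1), and uses hypothetical names (`NODES`, `threshold_algorithm`).

```python
# Example lists from the slide, one per node, sorted by value (descending).
NODES = [
    [("A", 10), ("C", 8), ("E", 8), ("F", 8), ("B", 7), ("D", 5), ("J", 1), ("K", 1)],
    [("B", 10), ("D", 9), ("F", 8), ("H", 6), ("G", 5), ("C", 1), ("A", 1)],
    [("C", 10), ("A", 9), ("G", 8), ("J", 7), ("F", 6), ("D", 4), ("B", 1)],
]


def threshold_algorithm(lists, k):
    """Return (top-k list of (object, sum), thresholds seen in each round)."""
    lookup = [dict(node) for node in lists]  # random-access view of each node
    sums = {}                                # object -> full sum V(x)
    thresholds = []
    depth = max(len(node) for node in lists)
    for row in range(depth):
        t = 0
        for node in lists:
            if row < len(node):
                obj, value = node[row]       # "sorted access"
                t += value
                if obj not in sums:          # "random lookup" at every node
                    sums[obj] = sum(d.get(obj, 0) for d in lookup)
        thresholds.append(t)
        top = sorted(sums.items(), key=lambda kv: kv[1], reverse=True)[:k]
        # Stop once the k-th best known sum reaches the threshold T.
        if len(top) == k and top[-1][1] >= t:
            return top, thresholds
    top = sorted(sums.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return top, thresholds


top, thresholds = threshold_algorithm(NODES, 2)
print(top, thresholds)
```

On the three lists above the thresholds fall from 30 to 18 over five rounds before F and A are confirmed, which is the unpredictability in round count that the following slides address.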
Adapting TA for Distributed Environments
• Consists of multiple “rounds”, each round having two round trips
– Round-trip #1 “sorted access”: CM asks for the next B objects on the lists and nodes respond
– Round-trip #2 “random lookup”: CM sends a list of object names to nodes and nodes supply values
– B = k
• Issues
– # of rounds unpredictable
– O(m^2) network traffic on average
New Algorithm: Three-Phase Uniform Threshold (TPUT)
• Motivation: terminate in a fixed number of round trips regardless of input
• Operates in three phases
1. Lower-bound estimation
2. Pruning
3. Final lookup
Partial Sums and Upper Bounds
• Partial sum: PS(x) = ∑ V_i’(x), summed over all nodes i
• Upper bound: U(x) = ∑ U_i’(x), summed over all nodes i
V_i’(x) = V_i(x), if x has been reported by node i to CM
          0, otherwise
U_i’(x) = V_i(x), if x has been reported by node i to CM
          T_i, otherwise
T_i: Node i has sent all objects with values > T_i, so any object it has not reported has V_i(x) ≤ T_i
Examples
Node1: (A, 10) (C, 8) (E, 8) (F, 8) (B, 7) (D, 5) (J, 1) ...
Node2: (B, 10) (D, 9) (F, 8) (H, 6) (G, 5) (C, 1) (A, 1) ...
Node3: (C, 10) (A, 9) (G, 8) (J, 7) (F, 6) (D, 4) (B, 1) ...
CM, after each node has reported its top 2 objects (T1 = 8, T2 = 9, T3 = 9):
PS(A) = 10 + 0 + 9 = 19
U(A) = 10 + 9 + 9 = 28
PS(B) = 0 + 10 + 0 = 10
U(B) = 8 + 10 + 9 = 27
…
For any object O, PS(O) ≤ V(O) ≤ U(O)
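The arithmetic in this example can be reproduced in a few lines. The sketch below assumes each node has reported its first two entries and takes T_i to be the smallest value node i has reported so far, which is what the slide's numbers imply; the function and variable names are made up for illustration.

```python
NODES = [
    [("A", 10), ("C", 8), ("E", 8), ("F", 8), ("B", 7), ("D", 5), ("J", 1)],
    [("B", 10), ("D", 9), ("F", 8), ("H", 6), ("G", 5), ("C", 1), ("A", 1)],
    [("C", 10), ("A", 9), ("G", 8), ("J", 7), ("F", 6), ("D", 4), ("B", 1)],
]


def partial_sum_and_upper_bound(lists, n_reported):
    """Return (PS, U) after every node has reported its first n_reported entries."""
    reported = [dict(node[:n_reported]) for node in lists]
    # Assumption: T_i is the smallest value node i has reported so far.
    thresholds = [min(r.values()) for r in reported]
    objects = set().union(*reported)
    # PS counts 0 for a node that has not reported x ...
    ps = {x: sum(r.get(x, 0) for r in reported) for x in objects}
    # ... while U counts that node's threshold T_i instead.
    ub = {x: sum(r.get(x, t) for r, t in zip(reported, thresholds)) for x in objects}
    return ps, ub


ps, ub = partial_sum_and_upper_bound(NODES, 2)
print(ps["A"], ub["A"], ps["B"], ub["B"])
```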
Steps in TPUT
Phase 1:
• Manager gets top k objects from each node
• Manager:
– Calculate partial sums of all objects
– Take the k’th partial sum E1 (E1 ≤ E); set t = E1/m
Phase 2:
• Manager gets all objects with value ≥ t from each node
• Manager:
– Calculate partial sums again; take the k’th partial sum E2 (E1 ≤ E2 ≤ E)
– Calculate upper bounds of all objects
– S = {objects whose upper bounds are ≥ E2}
Phase 3:
• Manager → Nodes: here is S; send me all objects in S
• Nodes → Manager: here they are
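The three phases can be sketched as a single-process simulation. This is an illustrative reading of the steps above, not the authors' implementation: nodes are plain Python lists, an object missing from a list is assumed to have value 0 there, at least k distinct objects are assumed to exist, and the `alpha` parameter (default 1.0, i.e. t = E1/m) and all function names are hypothetical.

```python
NODES = [
    [("A", 10), ("C", 8), ("E", 8), ("F", 8), ("B", 7), ("D", 5), ("J", 1)],
    [("B", 10), ("D", 9), ("F", 8), ("H", 6), ("G", 5), ("C", 1), ("A", 1)],
    [("C", 10), ("A", 9), ("G", 8), ("J", 7), ("F", 6), ("D", 4), ("B", 1)],
]


def partial_sums(reported):
    """Sum, per object, the values the manager has received so far."""
    totals = {}
    for node in reported:
        for obj, value in node.items():
            totals[obj] = totals.get(obj, 0) + value
    return totals


def kth_largest(values, k):
    return sorted(values, reverse=True)[k - 1]


def tput(lists, k, alpha=1.0):
    m = len(lists)

    # Phase 1: top k from every node gives the lower bound E1 and threshold t.
    reported = [dict(node[:k]) for node in lists]
    e1 = kth_largest(partial_sums(reported).values(), k)
    t = alpha * e1 / m

    # Phase 2: every node also sends all objects with value >= t.
    for seen, node in zip(reported, lists):
        seen.update((obj, v) for obj, v in node if v >= t)
    ps = partial_sums(reported)
    e2 = kth_largest(ps.values(), k)
    # An unreported value at a node is bounded by t; keep x if U(x) >= E2.
    candidates = {x for x in ps if sum(seen.get(x, t) for seen in reported) >= e2}

    # Phase 3: fetch exact values for the candidate set S only.
    local = [dict(node) for node in lists]
    exact = {x: sum(node.get(x, 0) for node in local) for x in candidates}
    top = sorted(exact.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return top, {"E1": e1, "t": t, "E2": e2, "S": candidates}


top, info = tput(NODES, 2)
print(top, info)
```

With k = 2 on the three example lists used in these slides, the sketch yields E1 = 18, t = 6, E2 = 19 and the top two objects F (22) and A (20). Because it keeps every object whose upper bound is ≥ E2, object J (upper bound 19) stays in S in this sketch; only H is pruned.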
Example
Node1: (A, 10) (C, 8) (E, 8) (F, 8) (B, 7) (D, 5) (J, 1) ...
Node2: (B, 10) (D, 9) (F, 8) (H, 6) (G, 5) (C, 1) (A, 1) ...
Node3: (C, 10) (A, 9) (G, 8) (J, 7) (F, 6) (D, 4) (B, 1) ...
CM:
Phase 1: PS(A) = 19; PS(C) = 18; E1 = 18; t = 6
Phase 2: PS(F) = 22; PS(A) = 19; E2 = 19; U(H) = 18, U(J) = 19; H and J are out! S = (A, B, C, D, E, F, G)
Phase 3: V(F) = 22; V(A) = 20; V(C) = 19; … Top 2 objects are F and A.
Improving the Pruning Power
• Set t = (E1/m) * α, where 0<α<1
[Figure: sorted lists (x1, …), (y1, …), (z1, …) at Node1, Node2, …, Noden, annotated with the levels E2/m and t and the upper bound U(o)]
Compression via Hashing
• Problem: reducing traffic in phase 2
• Solution: send hashed keys of object IDs
– Nodes report (hash(o), V(o)) to CM
– Hashed keys are short
– If hash(o1)==hash(o2), then V = max(V(o1), V(o2))
– Candidate set S is a set of hashed keys
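A node-side report along these lines might look as follows. It is a sketch under assumptions: the slides do not say which hash function or key length is used, so a 2-byte BLAKE2b digest stands in here, and `short_key` and `phase2_report` are hypothetical names.

```python
import hashlib


def short_key(object_id, n_bytes=2):
    """Short hashed key for an object ID (hash choice is an assumption)."""
    digest = hashlib.blake2b(object_id.encode("utf-8"), digest_size=n_bytes)
    return digest.hexdigest()


def phase2_report(node_values, t, key_fn=short_key):
    """Map each object with value >= t to its hashed key.

    When two objects collide on the same key, the larger value is kept,
    as on the slide: V = max(V(o1), V(o2)).
    """
    report = {}
    for obj, value in node_values.items():
        if value >= t:
            key = key_fn(obj)
            report[key] = max(report.get(key, value), value)
    return report


# Toy key function (first letter) to force a collision between two objects.
values = {"apple": 7, "avocado": 9, "banana": 6, "cherry": 2}
print(phase2_report(values, t=6, key_fn=lambda o: o[0]))
```

Keeping the maximum means the value reported for a key never underestimates any object that hashes to it, so upper bounds computed from the report remain valid at the cost of some extra candidates.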
Evaluating TPUT Algorithm
• Trace-driven simulation
• Optimality analysis
Trace Data for Simulations
NLANR-10: daily web access from 10 NLANR proxies
WorldCup-30: 2-hr logs from 30 WorldCup web servers
DEC-64: split 1-day DEC proxy traces into 64 sub-traces by client IP
DEC-128: split 2-day DEC proxy traces into 128 sub-traces by client IP
NLANR-203: split NLANR traces into 203 sub-traces by client IP
Berkeley-512: split one-week UCB traces into 512 sub-traces by client IP
Results on Unicast-Bytes
[Chart: unicast bytes per trace, with panels for m=10, m=30, m=64, m=128, m=203, m=512]
Number of Objects Looked-Up
Trace          K=10: TA   K=10: TPUT/0.5
NLANR-10       166        18
WorldCup-30    46         12
DEC-64         3164       31
DEC-128        6928       28
NLANR-203      5576       28
Berkeley-512   47899      41
Results on Multicast-Bytes
[Chart: multicast bytes per trace, with panels for m=10, m=30, m=64, m=128, m=203, m=512]
Optimality Analysis
Main results:
• TPUT is instance optimal for data sets with a log-log slope function C(n)
– Zipf distribution: C(n) = n
– Zipf distribution: opt-ratio = (m-1)*2m + k*m
• Setting α<1 reduces cost qualitatively.
– Zipf distribution: opt-ratio = (m-1)*O(√m) + k*m/α
General Instance Optimality
• Definition: An algorithm R is instance-optimal with optimality ratio C1 if there exists C2 such that, for any data series D and any algorithm A,
cost(R, D) ≤ C1 * cost(A, D) + C2
– cost is amount of network traffic
– TA is instance optimal with opt-ratio = O(m^2)
Worst Cases for Fixed Number Round-Trip Algorithms
• TPUT is not general instance optimal
• Nor can any algorithm that terminates in a fixed number of round trips
Node 1: (A, 1) (C, 1) (X1, 0.6) (X2, 0.6) ... (Xn, 0.6) (B, 0.5) ...
Node 2: (B, 1) (D, 0.2) ...
Task: find the object with the highest sum
Log-Log Slope Function C(n)
• L(j) is the value at position j in a reverse-sorted list
• The list satisfies log-log slope function C(n), if, for all j≤k, L(j*C(n)) < L(j)/n
• For Zipf-like distribution L(j) ~ 1/j^λ, C(n) = n^(1/λ).
[Figure: a list with positions 1, …, j, …, j*C(n); the value at position j is L(j) and the value at position j*C(n) is < L(j)/n]
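The Zipf case can be checked by direct substitution. The short derivation below assumes the exact form L(j) = c·j^(−λ) for some constant c > 0, whereas the slide only states proportionality:

```latex
\[
L\bigl(j\,C(n)\bigr)
  = c\,\bigl(j\,n^{1/\lambda}\bigr)^{-\lambda}
  = c\,j^{-\lambda}\,n^{-1}
  = \frac{L(j)}{n}.
\]
```

Under that assumption C(n) = n^(1/λ) meets the bound with equality, so it is the boundary case of the condition, and any larger C(n) satisfies the strict inequality. For λ = 1 this gives C(n) = n, the value quoted for the Zipf distribution in the optimality results.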
Properties of the Two Lower Bounds
• Let E be the “true bottom”, i.e. the k’th highest true sum
• E1 ≥ E/m
• E2 > E/2
– E2 ≥ E1
– E2 > E – E1*(m-1)/m
– Hence E2 > (m/(2m-1))*E
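The last bound follows from the two sub-bullets by substituting E1 ≤ E2. A worked version, assuming E > 0 and m ≥ 1:

```latex
\begin{align*}
E_2 &> E - \frac{m-1}{m}\,E_1 \\
    &\ge E - \frac{m-1}{m}\,E_2
       && \text{since } E_1 \le E_2 \\
\Longrightarrow\quad \frac{2m-1}{m}\,E_2 &> E \\
\Longrightarrow\quad E_2 &> \frac{m}{2m-1}\,E \;>\; \frac{E}{2}.
\end{align*}
```

In the three-node example from the earlier slides (m = 3, E1 = 18, E2 = 19, E = 20), these bounds read E2 > 20 − 12 = 8 and E2 > (3/5)·20 = 12, both of which hold.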
Restricted Instance Optimality of TPUT (α=1)
• Assume D is a collection of m lists all following log-log slope function C(n), then for any algorithm A,
cost(TPUT,D) ≤ cost(A,D) * ((m-1)*C(2m)+C(m)*k)
Effect of α<1
• Property:
– If object x appears in n nodes in Phase 2 and U(x) ≥ E2, then its average value in those nodes R(x) ≥ E2 * (1-α)/n
• Let li = the num of objects in S that appear in exactly i nodes in Phase 2, then:
– 1*l1 + 2*l2 + 3*l3 + … + m*lm ≤ C(m * (1+α)/α) * ∑bi
– For each i, l1 + l2 + … + li ≤ C( i * (1+ α)/(1-α)) * ∑bi
– Size of S is l1 + l2 + … + lm
Effect of α<1 (Cont’)
• Opt-ratio ≤ (m-1) * C(d*β) + m*k/α, where d satisfies
d * C(d*β) - ∑_{i=1..d} C(i*β) ≤ C(m * (1+α)/α)
• For Zipf distribution, TPUT w. α<1 has opt-ratio ≤ (m-1) * c * √m + m*k/α
Top-k Query Calculation in CDNs
• # of objects small → naïve alg.
• # of objects large → TPUT w. α<1
– Optimal α depends on # of nodes
– Limit max # of objects sent in phase 2
• TPUT extends to hierarchical networks easily
Kernel Mechanism Highlight
Stream Engine
Building High Performance Internet Streaming Server
• Basic characteristics of streaming protocols
– Control channel (TCP): Start/Stop, FF/Rew, Seek, Change bit rate…
– Data channel (UDP or TCP): Paced sending of streaming data
• What makes Linux inefficient
– Data copies
– Context switches
Observations on Per Stream Flow
Process control command
New Req
Process data
Send data
Exit/ Cleanup
Sleep
Check control channel
Read data
setup
Observations on Per Stream Flow
[Same per-stream flow diagram as on the previous slide, annotated:]
< 1% runtime, > 98% code
> 99% runtime, < 2% code
Where Stream Engine Fits
Process control command
New Req
Process data
Send data
Exit/ Cleanup
Sleep
Check control channel
Read data
Stream Engine
setup
Streaming File and Data Packets
file header, packet1, packet2, . . ., packetn, indices
ts
Sending time SubBlock1 SubBlock2 Padding
TCP header
Stream Engine
• In-kernel event driven module to deliver streaming data
• Similar to sendfile() but has streaming logic
– Method to assemble data packet
– Timed send
– Control channel monitoring
Stream Engine Interface
• client_data_fd
• client_control_fd
• source_fd & offset
• packet_timing_and_assembly
– Example 1: fixed_rate_fixed_block
– Example 2: asf_packet_parse
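To make the interface concrete, here is a user-space sketch of what a fixed-rate, fixed-block timing policy could compute. The real Stream Engine is an in-kernel module whose actual signatures are not given in the slides, so the `StreamRequest` record, the `Packet` tuple layout, and the parameters of `fixed_rate_fixed_block` below are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Tuple

# Hypothetical packet description: (send_time_us, file_offset, length_bytes)
Packet = Tuple[int, int, int]


def fixed_rate_fixed_block(offset: int, end: int, block_size: int,
                           bitrate_bps: int) -> Iterator[Packet]:
    """Yield equal-sized blocks paced so the stream averages bitrate_bps."""
    sent = 0
    while offset < end:
        length = min(block_size, end - offset)  # last block may be short
        # Time at which `sent` bytes should already be on the wire.
        send_time_us = sent * 8 * 1_000_000 // bitrate_bps
        yield send_time_us, offset, length
        offset += length
        sent += length


@dataclass
class StreamRequest:
    """Fields mirror the interface listed on the slide."""
    client_data_fd: int
    client_control_fd: int
    source_fd: int
    offset: int
    packet_timing_and_assembly: Callable[..., Iterator[Packet]]


# 3000 bytes in 1250-byte blocks at 100 kbps: one block every 100 ms.
for packet in fixed_rate_fixed_block(0, 3000, 1250, 100_000):
    print(packet)
```

Passing the timing and assembly policy as a callable is one way to let the same delivery loop serve both a fixed-rate stream and a format-aware parser such as the slide's asf_packet_parse example.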
Performance Comparison
[Bar chart, axis in Mbps (0 to 400), labeled 100 Kbps: Darwin Streaming Server (event driven); without Stream Engine (process based); with Stream Engine (process based)]
Based on PC: 1 Xeon 2.8 GHz, 2 GB mem, 2 Gigabit interfaces
Stream Engine Future
• Put it in hardware
– TCP-Offloading Engines
– Special blades in Cat6K switches
• To be used by a highly popular Internet radio station
Summary
• Techniques
– Content-based WCCP
• Patent pending
– TPUT as a top-k algorithm
• Submitted for publication
– Stream Engine
• Published in WCW’2003
• Open research questions