
  • Towards a Scalable, Adaptive and Network-aware Content Distribution Network

    Yan Chen
    EECS Department, UC Berkeley

  • Outline

    Motivation and challenges
    Our contributions: the SCAN system
    Case study: tomography-based overlay network monitoring system
    Conclusions

  • Motivation

    The Internet has evolved into a commercial infrastructure for service delivery: Web delivery, VoIP, streaming media
    Challenges for Internet-scale services
      Scalability: 600M users, 35M Web sites, 2.1 Tb/s of traffic
      Efficiency: bandwidth, storage, management
      Agility: dynamic clients/network/servers
      Security, etc.
    Focus on content delivery - the Content Distribution Network (CDN)
      About 4 billion Web pages in total, with a daily growth of 7M pages
      Annual traffic growth of 200% predicted for the next 4 years

  • How a CDN Works

  • Challenges for CDN

    Replica location: find nearby replicas with good DoS-attack resilience
    Replica deployment: dynamics and efficiency; client QoS and server capacity constraints
    Replica management: scalability of replica index state maintenance
    Adaptation to network congestion/failures: overlay monitoring scalability and accuracy

  • SCAN: Scalable Content Access Network

    Provisioning: dynamic replication + update multicast tree building
    Replica management: (incremental) content clustering
    Network end-to-end distance monitoring: Internet Iso-bar (latency), TOM (loss rate)
    Network-DoS-resilient replica location: Tapestry

  • Replica Location

    Existing work and problems
      Centralized, replicated, and distributed directory services
      No security benchmarking: which one has the best DoS-attack resilience?
    Solution
      Proposed the first simulation-based network DoS resilience benchmark
      Applied it to compare the three directory services
      DHT-based distributed directory services have the best resilience in practice
    Publication: 3rd Int. Conf. on Information and Communications Security (ICICS), 2001

  • Replica Placement/Maintenance

    Existing work and problems
      Static placement
      Dynamic but inefficient placement
      No coherence support
    Solution
      Dynamically place a close-to-optimal # of replicas under client QoS (latency) and server capacity constraints (an illustrative placement sketch follows this slide)
      Self-organize replicas into a scalable application-level multicast tree for disseminating updates
      Works with the overlay network topology only
    Publication: IPTPS 2002, Pervasive Computing 2002
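
    To make the constraints concrete, here is a hypothetical greedy baseline (not the thesis's placement algorithm) that places replicas under a client latency bound and per-server capacity; the inputs latency, capacity and the function name are illustrative assumptions:

      def place_replicas(clients, servers, latency, capacity, latency_bound):
          # Greedy baseline: repeatedly place a replica on the server that covers
          # the most still-unserved clients within the latency bound, then assign
          # clients to it up to its capacity.
          placed, unserved = [], set(clients)
          load = {s: 0 for s in servers}
          while unserved:
              def coverage(s):
                  return sum(1 for c in unserved if latency[c][s] <= latency_bound)
              candidates = [s for s in servers if s not in placed]
              best = max(candidates, key=coverage, default=None)
              if best is None or coverage(best) == 0:
                  break                          # remaining clients cannot be covered
              placed.append(best)
              for c in sorted(unserved, key=lambda c: latency[c][best]):
                  if latency[c][best] <= latency_bound and load[best] < capacity[best]:
                      load[best] += 1
                      unserved.discard(c)
          return placed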

  • Replica Management

    Existing work and problems
      Cooperative access for good efficiency requires maintaining replica indices
      Per-Website replication: scalable, but poor performance
      Per-URL replication: good performance, but unscalable
    Solution
      Clustering-based replication reduces the overhead significantly without sacrificing much performance
      Proposed a unique online Web object popularity prediction scheme based on hyperlink structures
      Online incremental clustering and replication to push replicas before they are accessed
    Publication: ICNP 2002, IEEE J-SAC 2003

  • Adaptation to Network Congestion/Failures

    Existing work and problems: latency estimation
      Clustering-based: network-proximity based, inaccurate
      Coordinate-based: symmetric distances, unscalable to update
      General metrics: n^2 measurements for n end hosts
    Solution
      Latency: Internet Iso-bar - clustering based on latency similarity to a small number of landmarks
      Loss rate: Tomography-based Overlay Monitoring (TOM) - selectively monitor a basis set of O(n log n) paths to infer the loss rates of all other paths
    Publication
      Internet Iso-bar: SIGMETRICS PER 2002
      TOM: SIGCOMM IMC 2003

  • SCAN Architecture

    Leverage a Distributed Hash Table (Tapestry) for
      Distributed, scalable location with guaranteed success
      Search with locality

    (Architecture figure: data plane and network plane; data source, Web servers, SCAN servers; replica location, dynamic replication/update and replica management, overlay network monitoring)

  • Methodology

    Network topology
    Web workload
    Network end-to-end latency measurement
    Analytical evaluation
    PlanetLab tests

  • Case Study: Tomography-based Overlay Network Monitoring

  • TOM Outline

    Goal and problem formulation
    Algebraic modeling and basic algorithms
    Scalability analysis
    Practical issues
    Evaluation
    Application: adaptive overlay streaming media
    Conclusions

  • Existing Work

    General metrics: RON (n^2 measurements)
    Latency estimation
      Clustering-based: IDMaps, Internet Iso-bar, etc.
      Coordinate-based: GNP, ICS, Virtual Landmarks
    Network tomography
      Focuses on inferring the characteristics of physical links rather than E2E paths
      Limited measurements -> under-constrained system, unidentifiable links
    Goal: a scalable, adaptive and accurate overlay monitoring system to detect E2E congestion/failures

  • Problem Formulation

    Given an overlay of n end hosts and O(n^2) paths, how do we select a minimal subset of paths to monitor so that the loss rates/latencies of all other paths can be inferred?

    Assumptions:
      Topology is measurable
      We can only measure the E2E path, not the individual links

  • Our Approach

    Select a basis set of k paths that fully describe all O(n^2) paths (k << O(n^2))
    Monitor the loss rates of the k paths, and infer the loss rates of all other paths
    Applicable to any additive metric, such as latency

  • Algebraic Model

    Path loss rate p, link loss rate l
    (Figure: example overlay with end hosts A, B, C, D and links 1, 2, 3)

  • Putting All Paths Together

    In total r = O(n^2) paths and s links, with s << r
    (Figure: the r x s routing matrix G over the example topology; the reconstructed equations follow below)
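
    The equations on the two modeling slides above are lost in the transcript; reconstructed from the speaker notes at the end of this document (taking logarithms turns multiplicative path losses into a linear system, with b the vector of log(1 - p) values), the model is:

      % Link j has loss rate l_j; path i, with loss rate p_i, traverses a subset of the s links.
      % Success probabilities multiply along a path, so logarithms give a linear system.
      \[
      1 - p_i \;=\; \prod_{j \,\in\, \text{path } i} (1 - l_j)
      \quad\Longrightarrow\quad
      b_i \;=\; \sum_{j=1}^{s} G_{ij}\, x_j ,
      \]
      \[
      b_i = \log(1 - p_i), \qquad x_j = \log(1 - l_j), \qquad
      G_{ij} = \begin{cases} 1 & \text{if link } j \text{ is on path } i,\\ 0 & \text{otherwise,} \end{cases}
      \]
      \[
      \text{so stacking all } r = O(n^2) \text{ paths gives } \; b = G\,x, \qquad G \in \{0,1\}^{r \times s}.
      \]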

  • Sample Path Matrix

    x1 - x2 is unknown => we cannot compute x1 and x2 individually
    The unidentifiable directions (multiples of x1 - x2) form the null space
    To separate identifiable vs. unidentifiable components: x = xG + xN

  • Intuition through Topology Virtualization

    Virtual links:
      Minimal path segments whose loss rates are uniquely identifiable
      Can fully describe all paths
    xG is composed of virtual links
    All E2E paths lie in the path space, i.e., G xN = 0 (see the decomposition below)
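
    In symbols (reconstructed; consistent with the speaker notes at the end of this transcript), the decomposition this slide relies on is:

      \[
      x = x_G + x_N, \qquad G\,x_N = 0 \ \ (x_N \in \mathrm{null}(G)), \qquad x_G \perp x_N ,
      \]
      \[
      b = G\,x = G\,(x_G + x_N) = G\,x_G ,
      \]

    so only x_G, the projection of x onto the row (path) space of G, spanned by the virtual links, is needed; once k = rank(G) linearly independent paths have been measured, x_G is determined and every other path's loss rate follows from b = G x_G.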

  • More Examples

    (Figure: real links (solid) and all of the overlay paths (dotted) traversing them; after virtualization, the corresponding virtual links)

  • Basic Algorithms

    Select k = rank(G) linearly independent paths to monitor
      Use QR decomposition
      Leverage sparse matrices: O(rk^2) time and O(k^2) memory
      E.g., 79 sec for n = 300 (r = 44,850) and k = 2,541
    Compute the loss rates of the other paths
      O(k^2) time and O(k^2) memory
      E.g., 1.89 sec for the example above
    (A NumPy sketch of both steps follows below)
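
    A minimal NumPy/SciPy sketch of the two steps named on this slide, under the b = G x model reconstructed above; the matrix G, the loss rates, and the helper names are illustrative, not the paper's code (which uses sparse, incremental QR):

      import numpy as np
      from scipy.linalg import qr

      def select_basis_paths(G, tol=1e-10):
          # Rank-revealing QR on G^T: the column pivots of G^T are the row
          # indices of G (i.e., the paths) that form a basis of its row space.
          _, R, piv = qr(G.T, pivoting=True, mode='economic')
          k = int(np.sum(np.abs(np.diag(R)) > tol * np.abs(R[0, 0])))
          return piv[:k]                      # k = rank(G) paths to monitor

      def infer_all_paths(G, basis, b_measured):
          # Least-squares solve of G_k x_G = b_k gives the row-space component
          # x_G, which is all we need to reconstruct b for every path.
          x_G, *_ = np.linalg.lstsq(G[basis], b_measured, rcond=None)
          return G @ x_G

      # Toy example: 4 overlay paths over 3 links, b_i = log(1 - p_i), x_j = log(1 - l_j).
      G = np.array([[1, 1, 0],
                    [0, 1, 1],
                    [1, 0, 1],
                    [1, 1, 1]], dtype=float)
      link_loss = np.array([0.01, 0.05, 0.0])       # hypothetical link loss rates
      b_all = G @ np.log(1.0 - link_loss)           # what perfect measurements would give
      basis = select_basis_paths(G)                 # paths actually monitored
      b_hat = infer_all_paths(G, basis, b_all[basis])
      path_loss = 1.0 - np.exp(b_hat)               # back to per-path loss rates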

  • Scalability Analysis

    Is k << O(n^2)? For a power-law Internet topology:
      When the majority of end hosts are on the overlay: k = O(n) (with proof)
      When only a small portion of end hosts are on the overlay:
        If the Internet were a pure hierarchy (a tree): k = O(n)
        If the Internet had no hierarchy at all (worst case, a clique): k = O(n^2)
        The Internet has a moderately hierarchical structure [TGJ+02]
        For reasonably large n (e.g., 100), k = O(n log n) (extensive linear regression tests on both synthetic and real topologies)

  • TOM Outline

    Goal and problem formulation
    Algebraic modeling and basic algorithms
    Scalability analysis
    Practical issues
    Evaluation
    Application: adaptive overlay streaming media
    Summary

  • Practical Issues

    Tolerance of topology measurement errors
      Router aliases
      Incomplete routing information
    Measurement load balancing
      Randomly order the paths for scanning and selection
    Adaptation to topology changes
      Efficient algorithms for incremental updates
      Add/remove a path: O(k^2) time (vs. O(n^2 k^2) to re-initialize); see the sketch below
      Add/remove end hosts and routing changes
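
    A hedged sketch of the "add a path" case, assuming the monitored basis rows are kept as a dense matrix G_k; the paper does this with an O(k^2) incremental QR update, while the least-squares check below is just the simplest way to express the same membership test:

      import numpy as np

      def add_path(G_k, new_row, tol=1e-10):
          # If the new path's routing vector already lies in the row space of the
          # monitored paths, its loss rate can be inferred; otherwise it must be
          # added to the monitored (basis) set.
          coeffs, *_ = np.linalg.lstsq(G_k.T, new_row, rcond=None)
          if np.linalg.norm(G_k.T @ coeffs - new_row) > tol:
              return np.vstack([G_k, new_row]), True    # monitor the new path
          return G_k, False                             # infer it instead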

  • Evaluation Metrics

    Path loss rate estimation accuracy
      Absolute error |p - p̂|
      Error factor [BDPT02]
      (both reconstructed below)
    Lossy path inference: coverage and false positive ratio
    Measurement load balancing
      Coefficient of variation (CV)
      Maximum-to-mean ratio (MMR)
    Speed of setup, update and adaptation
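
    The two accuracy metrics, written out (the absolute-error expression is garbled in the transcript; the error-factor form below is the commonly cited definition from [BDPT02] and should be checked against that paper):

      \[
      \text{absolute error} \;=\; |\,p - \hat{p}\,| ,
      \qquad
      F_\varepsilon(p,\hat{p}) \;=\; \max\!\left(\frac{\hat{p}(\varepsilon)}{p(\varepsilon)},\;
                                                  \frac{p(\varepsilon)}{\hat{p}(\varepsilon)}\right),
      \qquad
      p(\varepsilon) = \max(\varepsilon, p),\;\; \hat{p}(\varepsilon) = \max(\varepsilon, \hat{p}),
      \]

    where p is the real loss rate, p̂ the inferred one, and ε a small floor that keeps near-zero loss rates from inflating the ratio.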

  • Evaluation

    Extensive simulations
    Experiments on PlanetLab
      51 hosts, each from a different organization
      51 x 50 = 2,550 paths
      On average k = 872
    Results on accuracy
      Average real loss rate: 0.023
      Absolute error: mean 0.0027, 90% < 0.014
      Error factor: mean 1.1, 90% < 2.0
      On average 248 out of 2,550 paths have no or incomplete routing information
      No router aliases resolved

    Areas and domains (# of hosts):
      US (40): .edu 33, .org 3, .net 2, .gov 1, .us 1
      International (11):
        Europe (6): France 1, Sweden 1, Denmark 1, Germany 1, UK 2
        Asia (2): Taiwan 1, Hong Kong 1
        Canada 2, Australia 1

  • Evaluation (cont'd)

    Results on speed
      Path selection (setup): 0.75 sec
      Path loss rate calculation: 0.16 sec for all 2,550 paths
    Results on load balancing
      Significantly reduces CV and MMR, by up to a factor of 7.3

  • TOM Outline

    Goal and problem formulation
    Algebraic modeling and basic algorithms
    Scalability analysis
    Practical issues
    Evaluation
    Application: adaptive overlay streaming media
    Conclusions

  • Motivation

    Traditional streaming media systems treat the network as a black box
      Adaptation is only performed at the transmission end points

    Overlay relay can effectively bypass congestion/failures
    We built an adaptive streaming media system that leverages
      TOM for real-time path information
      An overlay network for adaptive packet buffering and relay

  • Adaptive Overlay Streaming Media

    Implemented with a Winamp client and a SHOUTcast server
    Congestion introduced with a Packet Shaper
    Skip-free playback: server buffering and rewinding
    Total adaptation time < 4 seconds
    (Demo figure: UC Berkeley, UC San Diego, Stanford and HP Labs nodes; server, overlay relay node, overlay network operation center, and client)

    Adaptation sequence (an illustrative sketch of the monitoring loop follows this slide):
      1. Set up connection
      2. Register trigger
      3. Network congestion/failure
      4. Detect congestion/failure
      5. Alert + new overlay path
      6. Set up new path
      7. Skip-free streaming media recovery
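
    A hypothetical sketch of the operation-center side of steps 2 through 6 above; the names (Trigger, tom_loss_rate, best_relay_path, alert) are illustrative placeholders, not the system's actual API:

      import time

      LOSS_THRESHOLD = 0.05   # assumed threshold for declaring a path congested

      class Trigger:
          """A client's registered interest in a server, with its current overlay path."""
          def __init__(self, client, server, path):
              self.client, self.server, self.path = client, server, path

      def monitor_loop(triggers, tom_loss_rate, best_relay_path, alert, period=1.0):
          # Poll TOM's inferred loss rates; on congestion, pick a relay route and
          # push it to the client (steps 4, 5 and 6 of the slide above).
          while True:
              for t in triggers:
                  if tom_loss_rate(t.path) > LOSS_THRESHOLD:     # 4. detect congestion/failure
                      new_path = best_relay_path(t.client, t.server)
                      alert(t.client, new_path)                  # 5. alert + new overlay path
                      t.path = new_path                          # 6. set up the new path
              time.sleep(period)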

  • Adaptive Streaming Media Architecture

    (Architecture figure: the media source server runs the SHOUTcast server plus a buffering layer (per-client buffers with a calculated concatenation point) and an overlay layer (byte counter, path management) above TCP/IP; overlay relay nodes run an overlay layer with path management above TCP/IP; the client runs the Winamp client with a video/audio filter, byte counter and buffering layer; the overlay network operation center sends the triggering/alert message and the new overlay path across the Internet.)

  • Summary

    A tomography-based overlay network monitoring system
      Selectively monitor a basis set of O(n log n) paths to infer the loss rates of O(n^2) paths
      Works in real time, adapts to topology changes, has good load balancing, and tolerates topology errors
      Both simulation and real Internet experiments are promising
    Built an adaptive overlay streaming media system on top of TOM
      Bypasses congestion/failures for smooth playback within seconds

  • Tie Back to SCAN

    Provisioning: dynamic replication + update multicast tree building
    Replica management: (incremental) content clustering
    Network end-to-end distance monitoring: Internet Iso-bar (latency), TOM (loss rate)
    Network-DoS-resilient replica location: Tapestry

  • Contribution of My Thesis

    Replica location: proposed the first simulation-based network DoS resilience benchmark and quantified three types of directory services
    Dynamically place a close-to-optimal # of replicas
    Self-organize replicas into a scalable app-level multicast tree for disseminating updates
    Cluster objects to significantly reduce the management overhead with little performance sacrifice
    Online incremental clustering and replication to adapt to changes in users' access patterns
    Scalable overlay network monitoring

  • Thank you !

  • Backup Materials

  • Existing CDNs Fail to Address these Challenges

    Non-cooperative replication is inefficient
    No coherence for dynamic content
    Unscalable network monitoring: O(M x N)
      M: # of client groups, N: # of server farms

  • Network Topology and Web Workload

    Network topology
      Pure-random, Waxman and transit-stub synthetic topologies
      An AS-level topology from 7 widely-dispersed BGP peers
    Web workload
      Aggregate MSNBC Web clients by BGP prefix (BGP tables from a BBNPlanet router)
      Aggregate NASA Web clients by domain name
      Map the client groups onto the topology

  • Network E2E Latency Measurement

    NLANR Active Measurement Project data set
      111 sites in America, Asia, Australia and Europe
      Round-trip time (RTT) between every pair of hosts, every minute
      17M measurements daily
      Raw data: Jun.-Dec. 2001, Nov. 2002
    Keynote measurement data
      Measures TCP performance from about 100 worldwide agents
      Heterogeneous core network: various ISPs
      Heterogeneous access networks: 56K dial-up, DSL and high-bandwidth business connections
      Targets: 40 most popular Web servers + 27 Internet Data Centers
      Raw data: Nov.-Dec. 2001, Mar.-May 2002

  • Internet Content Delivery Systems

  • Absolute and Relative Errors

    For each experiment, take the 95th-percentile absolute and relative errors over the estimates for all 2,550 paths

  • Lossy Path Inference Accuracy

    90 out of 100 runs have coverage over 85% and a false positive ratio below 10%
    Many of the misses are caused by boundary effects around the 5% lossy-path threshold

  • PlanetLab Experiment Results

    Metrics
      Absolute error |p - p̂|: average 0.0027 for all paths, 0.0058 for lossy paths
      Relative error (error factor) [BDPT02]
      Lossy path inference: coverage and false positive ratio
    On average k = 872 out of 2,550 paths are monitored

    Loss rate distribution (lossy paths, [0.05, 1.0], are 4.1% of all paths; the last five columns are percentages of the lossy paths):
      loss rate:    [0, 0.05)  [0.05, 0.1)  [0.1, 0.3)  [0.3, 0.5)  [0.5, 1.0)  1.0
      % of paths:     95.9%       15.2%       31.0%       23.9%        4.3%     25.6%

  • Experiments on PlanetLab

    51 hosts, each from a different organization
    51 x 50 = 2,550 paths
    Simultaneous loss rate measurement
      300 trials, 300 ms each
      In each trial, send a 40-byte UDP packet to every other host
    Simultaneous topology measurement with traceroute
    Experiments ran 6/24-6/27, 100 experiments in peak hours

    Areas and domains (# of hosts):
      US (40): .edu 33, .org 3, .net 2, .gov 1, .us 1
      International (11):
        Europe (6): France 1, Sweden 1, Denmark 1, Germany 1, UK 2
        Asia (2): Taiwan 1, Hong Kong 1
        Canada 2, Australia 1

  • Motivation

    With single-node relay
    Loss rate improvement
      Among 10,980 lossy paths:
        5,705 paths (52.0%) have their loss rate reduced by 0.05 or more
        3,084 paths (28.1%) change from lossy to non-lossy

    Throughput improvement
      Estimated with a TCP throughput model (the formula on the slide is not recoverable from the transcript; a hedged example follows below)
      60,320 paths (24%) have a non-zero loss rate, so their throughput is computable
      Among them, 32,939 paths (54.6%) have improved throughput, and 13,734 paths (22.8%) have their throughput doubled or more

    Implication: use overlay paths to bypass congestion or failures
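
    The throughput formula referred to on this slide did not survive the transcript. A commonly used TCP-throughput model of this kind (stated here only as an example, not necessarily the one on the slide) is the Mathis et al. approximation:

      \[
      \text{throughput} \;\approx\; \frac{\mathrm{MSS}}{\mathrm{RTT}} \cdot \sqrt{\frac{3}{2\,p}} ,
      \]

    which is only defined for paths with a non-zero loss rate p, consistent with the restriction to the 24% of paths above.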

  • SCAN

    Coherence for dynamic content
    Cooperative clustering-based replication
    Scalable network monitoring: O(M + N)

  • Problem Formulation

    Subject to a certain total replication cost (e.g., # of URL replicas), find a scalable, adaptive replication strategy that reduces the average access cost (a formalization sketch follows below)
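
    A hedged formalization of this statement (the notation is mine, not the thesis's): with C the set of client groups, R the replication cost budget, and cost(c, r) the access cost from client group c to replica r,

      \[
      \min_{\text{placement } P}\;\; \frac{1}{|C|}\sum_{c \in C}\; \min_{r \in P}\; \mathrm{cost}(c, r)
      \qquad \text{subject to} \qquad |P| \le R .
      \]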

  • SCAN: Scalable Content Access Network

    CDN applications (e.g., streaming media)
    Provisioning: cooperative clustering-based replication
    User behavior/workload monitoring
    Coherence: update multicast tree construction
    Network performance monitoring
    Network distance/congestion/failure estimation
    (Red: my work; black: out of scope)

  • Evaluation of Internet-scale Systems

    Analytical evaluation
    Realistic simulation
      Network topology
      Web workload
      Network end-to-end latency measurement
    Network topology
      Pure-random, Waxman and transit-stub synthetic topologies
      A real AS-level topology from 7 widely-dispersed BGP peers

  • Web Workload

    Aggregate MSNBC Web clients by BGP prefix (BGP tables from a BBNPlanet router)
    Aggregate NASA Web clients by domain name
    Map the client groups onto the topology

  • Simulation Methodology

    Network topology
      Pure-random, Waxman and transit-stub synthetic topologies
      An AS-level topology from 7 widely-dispersed BGP peers
    Web workload
      Aggregate MSNBC Web clients by BGP prefix (BGP tables from a BBNPlanet router)
      Aggregate NASA Web clients by domain name
      Map the client groups onto the topology

  • Online Incremental Clustering

    Predict access patterns based on semantics
    Simplified to popularity prediction
    How to find groups of URLs with similar popularity? Use hyperlink structures!
      Groups of siblings
      Groups at the same hyperlink depth: the smallest # of links from the root
    (A grouping sketch follows below)
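
    A minimal sketch (not the thesis code) of the two hyperlink-based groupings listed above, assuming the site is given as an adjacency list mapping each page URL to the URLs it links to:

      from collections import deque, defaultdict

      def hyperlink_depth_groups(links, root):
          # Group URLs by hyperlink depth: the smallest # of links from the root (BFS).
          depth = {root: 0}
          queue = deque([root])
          while queue:
              url = queue.popleft()
              for child in links.get(url, []):
                  if child not in depth:
                      depth[child] = depth[url] + 1
                      queue.append(child)
          groups = defaultdict(list)
          for url, d in depth.items():
              groups[d].append(url)
          return dict(groups)

      def sibling_groups(links):
          # Group URLs that share the same parent page ("siblings").
          return {parent: list(children) for parent, children in links.items()}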

  • Challenges for CDN

    Over-provisioning for replication
      Provide good QoS to clients (e.g., latency bound, coherence)
      Small # of replicas with small delay and bandwidth consumption for updates
    Replica management
      Scalability: billions of replicas if replicating per URL
        O(10^4) URLs/server, O(10^5) CDN edge servers in O(10^3) networks
      Adaptation to the dynamics of content providers and customers
    Monitoring
      User workload monitoring
      End-to-end network distance/congestion/failure monitoring
        Measurement scalability
        Inference accuracy and stability

  • SCAN Architecture

    Leverage Decentralized Object Location and Routing (DOLR) - Tapestry - for
      Distributed, scalable location with guaranteed success
      Search with locality
      Soft-state maintenance of the dissemination tree (for each object)

    (Architecture figure: data plane and network plane; data source, Web servers, SCAN servers; request location, dynamic replication/update and content management)

  • Wide-area Network Measurement and Monitoring System (WNMMS)

    Select a subset of SCAN servers to be monitors
    E2E estimation of
      Distance
      Congestion
      Failures
    (Network-plane figure: clusters A, B and C of clients, monitors and SCAN edge servers)

  • Dynamic Provisioning

    Dynamic replica placement
      Meets clients' latency and servers' capacity constraints
      Close-to-minimal # of replicas
    Self-organizes replicas into an app-level multicast tree
      Small delay and bandwidth consumption for update multicast
      Each node only maintains state for its parent and direct children
    Evaluated with simulations of
      Synthetic traces with various sensitivity analyses
      Real traces from NASA and MSNBC
    Publication: IPTPS 2002, Pervasive Computing 2002

  • Effects of the Non-Uniform Size of URLs

    Replication cost constraint: bytes
    Similar trends exist
      Per-URL replication outperforms per-Website replication dramatically
      Spatial clustering with Euclidean distance and popularity-based clustering are very cost-effective

  • SCAN: Scalable Content Access Network

  • Web Proxy Caching

    (Figure: client, ISP 1, ISP 2)

  • Conventional CDN: Non-cooperative Pull

    (Figure: client 1, Web content server, ISP 1, ISP 2)
    Inefficient replication

  • SCAN: Cooperative Push

    (Figure: CDN name server, client 1, ISP 1, ISP 2)
    Significantly reduces the # of replicas and the update cost

  • Internet Content Delivery Systems

    Properties compared across: Web caching (client initiated), Web caching (server initiated), pull-based CDNs (Akamai), push-based CDNs, SCAN

    Scalability of request redirection:
      Pre-configured in the browser; Bloom filters to exchange replica locations; centralized CDN name server; centralized CDN name server; decentralized P2P location
    Efficiency (# of caches or replicas):
      No cache sharing among proxies; cache sharing; no replica sharing among edge servers; replica sharing; replica sharing
    Network-awareness:
      No; No; Yes, with an unscalable monitoring system; No; Yes, with a scalable monitoring system
    Coherence support:
      No; No; Yes; No; Yes

  • Previous Work: Update Dissemination

    No inter-domain IP multicast
    Application-level multicast (ALM) is unscalable
      The root maintains state for all children (Narada, Overcast, ALMI, RMX)
      The root handles all join requests (Bayeux)
      Root splitting is the common solution, but it suffers consistency overhead

  • Comparison of Content Delivery Systems (cont'd)

    Properties compared across: Web caching (client initiated), Web caching (server initiated), pull-based CDNs (Akamai), push-based CDNs, SCAN

    Distributed load balancing:
      No; Yes; Yes; No; Yes
    Dynamic replica placement:
      Yes; Yes; Yes; No; Yes
    Network-awareness:
      No; No; Yes, with an unscalable monitoring system; No; Yes, with a scalable monitoring system
    No global network topology assumption:
      Yes; Yes; Yes; No; Yes

  • Speaker Notes

    The architecture comparison will contrast uncooperative pull-based vs. cooperative push-based replication.
    # of Internet users: http://www.usabilitynews.com/news/article637.asp - will reach one billion by 2005.
    Total Internet traffic: http://www.cs.columbia.edu/~hgs/internet/traffic.html
    Info source: matrix source.
    Efficiency: design systems with growth potential.
    All the entities in the system (clients, servers and the network) change their status continuously.

    Amazing growth in WWW traffic: a daily growth of roughly 7M Web pages, and an annual growth of 200% predicted for the next 4 years.

    1M page growth: Scientific American, June 1999 issue.

    7M page growth rate: http://cyberatlas.internet.com/big_picture/traffic_patterns/article/0,,5931_413691,00.html - about 4 billion pages in total.

    The convergence in the digital world of voice, data and video is expected to lead to a compound annual growth rate of 200% in Web traffic over the next four years -- http://www.skybridgesatellite.com/l21_mark/cont_22.htm

    Define the term "replica".
    There are large and popular Web servers, such as CNN.com and msnbc.com, which need to improve performance and scalability. Imagine a Web server in Cambridge and a client at W&M accessing some content; later a nearby client from U. of Virginia accesses similar content and is served directly from the CDN server (UGA, GIT, U. Tennessee). So the CDN reduces latency for the client, reduces bandwidth for the Web server, and improves the scalability and availability of the Web content server; it also helps the Internet as a whole by reducing long-haul traffic. How do we deploy replicas in a dynamic and efficient manner under client QoS (latency) and server capacity constraints?

    CDN applications are not addressed in the thesis. What if there is no Tapestry? With a DHT, from the operator's point of view, we divide the problems; the other problem is solved separately. Why choose Tapestry? Introduce replicas, caches, etc., and updates; explain who is involved in Tapestry and its functionality; introduce the dissemination tree and its soft-state maintenance. How do we design and evaluate Internet-scale services? Analytical evaluation, such as a back-of-the-envelope calculation, is good for algorithm design but not for real evaluation.

    We can't deploy the system on thousands of nodes to test its scalability or performance. Instead, our solution is realistic simulation: we collect real-world traces and measurements to understand Internet behavior, and apply them to evaluate the algorithms/architecture. To obtain these traces and measurements, I have collaborated widely with industry and research labs. The measurement data often provide valuable insights that led to changes in the original design choices. Take the network monitoring design, for instance: although there are M*N possible paths among all client groups and server farms, in the underlying network topology there are only a small number of lossy links. We just need to identify the lossy links, monitor them, and then infer the congestion/failures of all the paths.

    The measurement data fall into three categories: network topology, Web workload and E2E network distance measurement. The first two are used to evaluate the provisioning and coherence parts, while the last one is used for network monitoring.

    Coordinates cannot reflect congestion/failures, and latency does not imply congestion/failures; IDMaps, especially, only looks at the # of hops. An under-constrained system implies there are links whose characteristics are unidentifiable; existing systems use various heuristics and statistical methods to infer as much as they can. For example, assuming symmetric routing, the example system has 6 paths but only 4 links; monitoring 4 independent paths can solve for the loss rates of all links, and then we can compute the loss rates of the other 2 paths. Basically, there is an NOC: the end hosts measure the topology and send it to the NOC, which selects 4 paths to measure, instruments certain end hosts to do the measurement, and collects the results. It then computes the loss rates of the basis set and infers the loss rates of all other paths.

    Assumptions: break this into 2 slides. Now let's look at what the basis set is and how to select paths to monitor.

    Take the logarithm on both sides. Assume the overlay spans s links in total; for each path, we use a vector of size s to represent the links on the path: 1 means the link is on the path, 0 otherwise. Thus for one path we get one linear equation (v^T x); putting all the n^2 paths together, we get a system of equations written in matrix form. When we put all the r = O(n^2) paths together, we have the matrix G: G has r rows, corresponding to the r paths, and s columns, corresponding to the s links; b is the path loss rate vector of log(1 - p). Note that G is rank-deficient because there are two links whose loss rates we cannot identify.

    Notice that x_1 and x_2 cannot be uniquely identified, because we only know their sum and nothing about their difference; they are the unidentifiable links of network tomography. We would like to separate the identifiable and unidentifiable components to see what we can actually measure and infer. Viewing the linear system in a high-dimensional space (each x variable is a dimension), the multiples of (x_1 - x_2) form the null space, which cannot be measured, and there is an orthogonal path space, which is completely measured; here the path space is a plane, with one axis x_1 + x_2 and the other x_3.

    For any vector x in the high-dimensional space, we can always decompose it as x_G + x_N, where x_G is its projection onto the path space (thus completely known) and x_N is its projection onto the null space.

    Given this decomposition, we can rewrite the formula. Unlike x, x_G can be uniquely identified from the E2E measurements; the transformation filters out the unidentifiable components by dimension reduction. Notice that x_G has two components, x_1 + x_2 and x_3, which give two virtual links; these virtual links form a basis of the path space of G and thus can fully describe all E2E paths.

    The point is that once x_G is computable, we can calculate b from x_G, so all we need to do is monitor enough paths to compute x_G. More examples of x_G and virtual links follow (only talk about the bottom one); in each case the virtual links form x_G, so x_G can be uniquely identified. We just need to monitor the virtual links to get x_G, which is sufficient to describe all the paths in the path space. Since G x_G = b, we only need to monitor enough paths to solve for x_G, and then we can compute all other entries of b.

    QR decomposition. The architecture comparison will contrast uncooperative pull-based vs. cooperative push-based replication. An upcoming paper will address the practical issues and the full evaluation. We did a sensitivity test on the trial period. Sending rate: 40 bytes * 50 hosts * 8 bits / 0.3 s ≈ 53.3 Kbps. Relative errors: 1.1 for all paths, 1.7 for lossy paths; on average about 2,900 links in the topology. For the load-balancing plot, we put the load value of each end host into 10 equally spaced bins and count the number of nodes in each bin as the y-axis. Number of lossy paths: 10,980; paths improved: 9,385; lossy-to-non-lossy: 3,084; paths with >5% improvement: 5,705; paths with >5% improvement or turned non-lossy: 2,787.

    This slide shows how the application interacts with the monitoring service: a path-management layer between the application and the TCP layer sets up the overlay path for overlay transmission. CDN applications are not addressed in the thesis. Compared with per-URL-based replication, ... What if the client from NYU tries to access it again? Even when widely distributed clients have very sparse requests, it may end up placing replicas on every CDN server. Inefficient replication has two effects: (1) it wastes a lot of replication bandwidth and, consequently, update bandwidth; (2) the replicas are not fully utilized - CDN servers have limited storage, so inefficient replication causes old content replicas to be constantly replaced by new ones before serving more clients.

    Questions on consistent caching: recently, consistent hashing and even more sophisticated mechanisms have been proposed to locate replicas with high probability; they can be regarded as probabilistic replica directories. However, there are serious problems: (1) they operate on a per-object basis - given millions of objects, it is very expensive to compute the optimal redirection choice, and there is very large management overhead for expanding/shrinking the replica directory as access patterns change; (2) they don't know for sure where the replicas are, so they can't update them.

    Throughout our study, we use measurement-based simulation and analysis. The method has two parts: topology and workload. MSNBC is consistently ranked among the top news sites, thus highly popular with dynamic content; in comparison, the NASA '95 traces are more static and much less accessed. We use various network topologies and Web workloads for simulation; all of our results are based on this methodology. The traces are applied throughout our experiments, except for the online incremental clustering, which requires the full Web contents.

    For MSNBC, 10K client groups are left; we choose the top 10%, which cover >70% of the requests.

    Emphasize that this is an abstract cost: it could be latency or # of hops, depending on the simulation. We collected several months of data to study not only the performance but also its stability.

    Next, I will show how these measurement data help design and evaluate SCAN; in particular, I will focus on one component: clustering of Web content for efficient replication. The fundamental differences between server-initiated Web caching and SCAN are the infrastructure and the more intelligent, close-to-optimal choice of objects and replica locations. SCAN puts objects into clusters; within each cluster, the objects are likely to be accessed by clients that are topologically close (for example, the red and yellow URLs). Given millions of URLs, clustering-based replication can dramatically reduce the amount of replica-location state, so we can afford to build replica directories. Further, SCAN pushes the cluster replicas to strategic locations and then uses the replica directories to forward clients' requests to these replicas; we call this cooperative push, which significantly reduces the # of replicas and, consequently, the update cost.

    Define the term "replica". With M client groups and N servers, the monitoring cost is O(M+N) instead of O(M*N). IDCs charge CDNs by the amount of bandwidth used, which translates into the bytes replicated; here we simplify it to the # of URL replicas, assuming the URLs are of the same size (as shown later, the non-uniform sizes do not really affect the results). CDN applications are not addressed in the thesis.

    How do we evaluate architectures/algorithms designed for Internet-scale services? We can't deploy them on thousands of nodes to test scalability or performance. The analytical way is a back-of-the-envelope calculation; realistic simulation uses real-world traces and measurements to understand Internet behavior and applies them to evaluate the algorithms/architecture. In my thesis, I developed wide collaborations with industry and research labs to obtain these real-world traces and measurements, including the network topology, Web workload and E2E network distance measurements.


    Replicate the URLs before they are accessed. The challenge is how to predict access patterns from semantic information only. From the previous study, we know that popularity-based clustering performs very well, i.e., URLs with similar popularity have similar aggregated access patterns. How do we find groups of URLs with similar popularity, so that we can infer a new URL's popularity from the popularity of the old URLs in the same group? We explored two simple options based on hyperlink structures: one is groups of siblings, the other is groups at the same hyperlink depth. To compare the two methods, we measure the divergence of ... Every line starts with "How to". World Cup log peak time: 209K requests/min, 580 MB transferred per minute. MSNBC SIGCOMM 2000 paper: the numbers for the order of URLs and clients.

    The CDN operators set the latency constraints based on the different classes of clients and on human perception of latency.

    Ongoing work: better clustering methods and a dynamic service model. One reason is that the sizes don't differ much among the top 1,000 URLs (from several hundred bytes to tens of thousands of bytes). Talk about the design space of SCAN and my contributions! Existing CDNs use non-cooperative access, as we have shown. In contrast, SCAN builds the replica directory and forwards clients' requests to the closest CDN server holding the replica; we call this cooperative access. Further, SCAN pushes the replicas instead of pulling them, for high performance and availability. We also examine the replication granularity across the full spectrum, from per-site to per-object, and find that clustering of objects is the most cost-effective.

    Mention contributions clearly!!!

    Coherence support: unicast is neither efficient nor scalable, but wide-area IP multicast does not exist in today's Internet; in contrast, we use application-level multicast.

    What is new in our application-level multicast? We use a P2P DHT (Tapestry), so each node only has to maintain its parent and direct children. We can take that discussion offline.

    The key observation is that although there are M*N possible paths, in the underlying IP network a few lossy links dominate the E2E loss. So all we need to do is identify the lossy links, monitor a few paths to calculate the loss rates on these links, and then infer the loss rates of all other paths.

    [NOTES: NWS (token passing) won't really solve the scalability problem, because it essentially takes the same approach as Akamai.] There are many ISPs, each with its own CDN server; there are also CDN name servers and a Web content server, which hosts the Web content denoted by the green box. When client 1 requests the green URL, the hostname resolution goes through the CDN name server, which returns the IP address of the local CDN server in ISP 1. Client 1 then sends the request to its local CDN server, which essentially acts as a cache. The problem is that the CDN name servers don't track where the content has been replicated: when client 2 requests the green URL, the CDN name server still replies with the IP address of the CDN server in ISP 2, even though the content has already been replicated on CDN server 1, which could be quite close to client 2. The 4% figure: the comparison is made under similar average latency for the two schemes. Scalability of summary cache: it targets O(100) proxies and O(1M) pages, which is not as good as we need. The key problem is that proxy servers are usually installed by ISPs, which may not accept the cooperative model. Distributed load balancing is implemented through request-redirection mechanisms, such as server-initiated Web caching, pull-based CDNs and SCAN.