enabling flow-level latency measurements across routers in data centers_ppt

Upload: ziwen-zhang

Post on 07-Apr-2018


TRANSCRIPT

  • 8/6/2019 Enabling Flow-Level Latency Measurements Across Routers in Data Centers_ppt

    1/15

Enabling Flow-level Latency Measurements across Routers in Data Centers

Parmjeet Singh, Myungjin Lee, Sagar Kumar, Ramana Rao Kompella

2/15

    Latency-critical applications in data centers

Guaranteeing low end-to-end latency is important for:

Web search (e.g., Google's Instant search)

    Retail advertising

    Recommendation systems

    High-frequency trading in financial data centers

    Operators want to troubleshoot latency anomalies

    End-host latencies can be monitored locally

Detection, diagnosis, and localization across the network are hard: routers/switches have no native support for latency measurements

3/15

    Prior solutions

Lossy Difference Aggregator (LDA), Kompella et al. [SIGCOMM '09]

    Aggregate latency statistics

Reference Latency Interpolation (RLI), Lee et al. [SIGCOMM '10]

    Per-flow latency measurements

RLI is more suitable because its measurements are finer-grained

4/15

    Deployment scenario of RLI

Upgrading all switches/routers in a data center network

Pros

Provides the finest granularity of latency anomaly localization

    Cons

    Significant deployment cost

    Possible downtime of entire production data centers

In this work, we consider partial deployment of RLI

    Our approach: RLI across Routers (RLIR)

5/15

    Overview of RLI architecture

    Goal

Latency statistics on a per-flow basis between interfaces

    Problem setting

Storing a timestamp for each packet at ingress and egress is infeasible due to high storage and communication cost

    Regular packets do not carry timestamps

[Diagram: a router with ingress interface I and egress interface E]

6/15

    Overview of RLI architecture

    Premise of RLI: delay locality

    Approach

    1) The injector sends reference packets regularly

    2) Reference packet carries ingress timestamp

3) Linear interpolation: compute per-packet latency estimates at the latency estimator

    4) Per-flow estimates by aggregating per-packet estimates

[Diagram: a Reference Packet Injector at ingress I and a Latency Estimator at egress E. The measured delays of reference packets R1 and R2 define a linear interpolation line over time; a regular packet's delay L is read off that line as its interpolated delay]
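The interpolation in steps 3 and 4 can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation; the function names and sample values are assumptions.

```python
# Sketch: estimate a regular packet's delay by linear interpolation between
# the two reference packets that bracket it in time, then aggregate the
# per-packet estimates into a per-flow statistic (here, the mean).

def interpolate_delay(t_pkt, ref_before, ref_after):
    """Each reference is (ingress_time, measured_delay) as observed at the
    latency estimator; t_pkt is the regular packet's ingress time."""
    t1, d1 = ref_before
    t2, d2 = ref_after
    if t2 == t1:                   # references coincide; no slope to use
        return d1
    slope = (d2 - d1) / (t2 - t1)  # the linear-interpolation line
    return d1 + slope * (t_pkt - t1)

def per_flow_mean(packet_times, references):
    """Step 4: aggregate per-packet estimates into a per-flow estimate."""
    estimates = []
    for t in packet_times:
        before = max((r for r in references if r[0] <= t), key=lambda r: r[0])
        after = min((r for r in references if r[0] >= t), key=lambda r: r[0])
        estimates.append(interpolate_delay(t, before, after))
    return sum(estimates) / len(estimates)
```

For example, a packet arriving halfway between references with delays of 10 and 20 time units gets an interpolated delay of 15.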

7/15

    Full vs. Partial deployment

    Full deployment: 16 RLI sender-receiver pairs

    Partial deployment: 4 RLI senders + 2 RLI receivers

    81.25 % deployment cost reduction

[Topology: Switches 1–6. Legend: RLI Sender = Reference Packet Injector; RLI Receiver = Latency Estimator]
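A quick sanity check of the 81.25% figure, assuming one deployed component per sender and per receiver (so a sender-receiver pair counts as two components):

```python
# Full deployment: 16 RLI sender-receiver pairs = 32 components
# (assumed counting: one component per sender and one per receiver).
full = 16 * 2
# Partial deployment: 4 RLI senders + 2 RLI receivers = 6 components.
partial = 4 + 2
reduction = 1 - partial / full
print(f"{reduction:.2%}")  # prints 81.25%
```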

8/15

    Case 1: Presence of cross traffic

Issue: Inaccurate link utilization estimation at the sender leads to a high reference packet injection rate

Approach: do not actively address the issue

Evaluation shows little impact on the packet loss rate

Details in the paper

[Topology: Switches 1–6, with cross traffic joining the bottleneck link; Switch 1 performs link utilization estimation. Legend: RLI Sender = Reference Packet Injector; RLI Receiver = Latency Estimator]
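One way to see why the utilization estimate matters: the injection rate is capped by the sender's view of spare capacity, so underestimating utilization (because cross traffic joins downstream) inflates the rate. The back-off policy below is purely illustrative; it is not the paper's adaptation formula.

```python
def injection_rate(estimated_utilization, max_rate=0.10, min_rate=0.001):
    """Fraction of link capacity spent on reference packets, under a
    hypothetical policy: scale the maximum rate by estimated headroom."""
    headroom = max(0.0, 1.0 - estimated_utilization)
    return max(min_rate, min(max_rate, max_rate * headroom))

# If cross traffic makes the bottleneck 90% utilized but the sender only
# sees 40%, it injects at 6% of capacity instead of the 1% the true load
# would allow.
```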

9/15

    Case 2: RLI Sender side

Issue: Traffic may take different routes at an intermediate switch

Approach: the sender sends reference packets to all receivers

[Topology: Switches 1–6. Legend: RLI Sender = Reference Packet Injector; RLI Receiver = Latency Estimator]

10/15

    Case 3: RLI Receiver side

Issue: Hard to associate reference packets and regular packets that traversed the same path

Approaches:

Packet marking: requires native support from routers

Reverse ECMP computation: reverse-engineer intermediate routes using the ECMP hash function

IP prefix matching: applicable only in limited situations

[Topology: Switches 1–6. Legend: RLI Sender = Reference Packet Injector; RLI Receiver = Latency Estimator]
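The reverse ECMP idea can be sketched as follows, assuming the receiver knows each intermediate switch's hash function and its number of equal-cost links. `hash_flow` is a stand-in (real switches typically use vendor-specific CRC-based hashes over header fields), and all names here are illustrative.

```python
import zlib

def hash_flow(five_tuple, num_links):
    # Stand-in ECMP hash over the flow 5-tuple; NOT a real router's hash.
    key = "|".join(map(str, five_tuple)).encode()
    return zlib.crc32(key) % num_links

def path_taken(five_tuple, switches):
    """Re-run the assumed hash at each intermediate switch to recover which
    equal-cost link a flow took. switches: list of (name, num_links)."""
    return [(name, hash_flow(five_tuple, n)) for name, n in switches]

flow = ("10.0.0.1", "10.0.0.2", 1234, 80, "tcp")
path = path_taken(flow, [("Switch2", 2), ("Switch3", 2)])
# A reference packet and a regular packet are associated only when this
# recomputed path agrees for both.
```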

11/15

    Deployment example in fat-tree topology

[Fat-tree topology annotated with RLI Senders (Reference Packet Injectors) and RLI Receivers (Latency Estimators); depending on position, receivers use IP prefix matching, or Reverse ECMP computation / IP prefix matching]

12/15

    Evaluation

Simulation setup

Trace: regular traffic (22.4M pkts) + cross traffic (70M pkts)

    Simulator

    Results

    Accuracy of per-flow latency estimates

[Simulation topology: a packet trace is split by a traffic divider across Switch 1 and Switch 2; a cross traffic injector adds cross traffic; the RLI Sender injects reference packets at a 10% or 1% injection rate, and the RLI Receiver estimates latencies]

13/15

Accuracy of per-flow latency estimates

[CDF of relative error of per-flow latency estimates at 93% bottleneck link utilization, comparing 10% and 1% injection rates; values marked on the curves include 1.2%, 4.5%, 18%, 31%, and 67%]
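The accuracy metric behind the CDF can be made concrete as below; the sample numbers are made up for illustration and are not from the evaluation.

```python
def relative_error(estimate, truth):
    # Relative error of a per-flow latency estimate against ground truth.
    return abs(estimate - truth) / truth

def cdf_at(errors, threshold):
    # Fraction of flows whose relative error is at or below the threshold.
    return sum(e <= threshold for e in errors) / len(errors)

# Three hypothetical flows: (estimated latency, true latency).
errors = [relative_error(est, tru)
          for est, tru in [(10.2, 10.0), (9.0, 10.0), (13.0, 10.0)]]
```

Evaluating `cdf_at(errors, 0.1)` then gives the fraction of flows whose estimate is within 10% relative error, which is one point on the plotted CDF.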

14/15

    Summary

Low-latency applications in data centers: localizing latency anomalies is important

RLI provides flow-level latency statistics, but full deployment (i.e., at all routers/switches) is expensive

    Proposed a solution enabling partial deployment of RLI

Little loss in localization granularity (i.e., anomalies are localized to every other router)

15/15

    Thank you! Questions?