Profiling Network Performance in Multi-tier Datacenter Applications Jori Hardman Carly Ho Paper by Minlan Yu, Albert Greenberg, Dave Maltz, Jennifer Rexford, Lihua Yuan, Srikanth Kandula, Changhoon Kim

Post on 15-Jan-2016

Applications inside Data Centers

[Figure: a Front end Server at the top, layers of Aggregators below it, and Workers at the bottom]

Challenges of Datacenter Diagnosis

• Multi-tier applications
  o Tens of hundreds of application components
  o Tens of thousands of servers

• Evolving applications
  o Add new features, fix bugs
  o Change components while app is still in operation

• Human factors
  o Developers may not understand network well
  o Nagle’s algorithm, delayed ACK, etc.

Where are the Performance Problems?

• Network or application?
  o App team: Why low throughput, high delay?
  o Net team: No equipment failure or congestion

• Network and application! (their interactions)
  o Network stack is not configured correctly
  o Small application writes delayed by TCP
  o TCP incast: synchronized writes cause packet loss

Goal: a diagnosis tool to understand network-application interactions

Diagnosis in Today’s Data Center

[Figure: a host running an app on top of the OS, with a packet sniffer attached]

• App logs: #reqs/sec, response time (e.g., 1% of requests see >200 ms delay). Application-specific
• Switch logs: #bytes/#pkts per minute. Too coarse-grained
• Packet trace: filter out the trace for long-delay requests. Too expensive
• SNAP: diagnose net-app interactions. Generic, fine-grained, and lightweight

Full Knowledge of Data Centers

• Direct access to network stack
  o Directly measure rather than relying on inference
  o E.g., # of fast retransmission packets

• Application-server mapping
  o Know which application runs on which servers
  o E.g., which app to blame for sending a lot of traffic

• Network topology and routing
  o Know which application uses which resource
  o E.g., which app is affected if a link is congested

SNAP: Scalable Net-App Profiler

Outline

• SNAP architecture
  o Passively measure real-time network stack info
  o Systematically identify performance problems
  o Correlate across connections to pinpoint problems

• SNAP deployment
  o Operators: Characterize performance problems
  o Developers: Identify problems for applications

• SNAP validation and overhead

SNAP Architecture

Step 1: Network-stack measurements

What Data to Collect?

• Goals:
  o Fine-grained: in milliseconds or seconds
  o Low overhead: low CPU overhead and data volume
  o Generic across applications

• Two types of data:
  o Poll TCP statistics → network performance
  o Event-driven socket logging → app expectation
  o Both exist in today’s Linux and Windows systems

TCP statistics

• Instantaneous snapshots
  o #Bytes in the send buffer
  o Congestion window size, receiver window size
  o Snapshots based on Poisson sampling

• Cumulative counters
  o #FastRetrans, #Timeout
  o RTT estimation: #SampleRTT, #SumRTT
  o RwinLimitTime
  o Calculate difference between two polls
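The two collection styles above can be sketched in a few lines. This is only an illustration, not SNAP's code: the helper names, the mean polling interval, and the counter values are made up, and the counters are assumed to be plain per-connection dictionaries.

```python
import random

def poisson_poll_times(mean_interval_s, duration_s, seed=0):
    """Poll timestamps whose gaps are exponentially distributed (Poisson sampling)."""
    rng = random.Random(seed)
    times, t = [], 0.0
    while True:
        t += rng.expovariate(1.0 / mean_interval_s)
        if t >= duration_s:
            return times
        times.append(t)

def counter_delta(prev, curr):
    """Difference of cumulative counters between two consecutive polls."""
    return {name: curr[name] - prev[name] for name in curr}

# Hypothetical counter values read at two consecutive polls.
prev = {"FastRetrans": 3, "Timeout": 1, "SampleRTT": 100, "SumRTT": 5000}
curr = {"FastRetrans": 5, "Timeout": 1, "SampleRTT": 140, "SumRTT": 9000}

times = poisson_poll_times(mean_interval_s=0.5, duration_s=10.0)
delta = counter_delta(prev, curr)
# Mean RTT over this interval only, in whatever unit SumRTT is kept in.
avg_rtt = delta["SumRTT"] / delta["SampleRTT"]
```

Differencing makes each interval self-contained, so a connection's long history does not mask a recent burst of retransmissions.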

SNAP Architecture

Step 2: Performance problem classification

Life of Data Transfer

• Application generates the data

• Copy data to send buffer

• TCP sends data to the network

• Receiver receives the data and sends an ACK

[Figure: stages of a data transfer: Sender App, Send Buffer, Network, Receiver]

Classifying Socket Performance

• Sender app
  o Bottlenecked by CPU, disk, etc.
  o Slow due to app design (small writes)

• Send buffer
  o Send buffer not large enough

• Network
  o Fast retransmission
  o Timeout

• Receiver
  o Not reading fast enough (CPU, disk, etc.)
  o Not ACKing fast enough (delayed ACK)

Identifying Performance Problems

• Sender app: none of the other problems apply (inference)

• Send buffer: send buffer is almost full (sampling)

• Network: #Fast retransmission, #Timeout (direct measure)

• Receiver: RwinLimitTime (direct measure)

• Receiver, delayed ACK: diff(SumRTT) > diff(SampleRTT) * MaxQueuingDelay (inference)
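The rules on this slide can be read as a small classifier over one polling interval. The sketch below is hypothetical: the function name, the 90% "almost full" threshold, and the 20 ms MaxQueuingDelay are assumptions chosen for illustration, not values from the slides.

```python
def classify_interval(delta, send_buffer_samples,
                      max_queuing_delay_ms=20.0, full_fraction=0.9):
    """Label one polling interval of a connection.

    delta: counter differences between two polls.
    send_buffer_samples: (bytes_in_buffer, buffer_size) snapshots.
    """
    problems = []
    if delta["FastRetrans"] > 0 or delta["Timeout"] > 0:
        problems.append("network")            # direct measure
    if delta["RwinLimitTime"] > 0:
        problems.append("receiver window")    # direct measure
    if delta["SumRTT"] > delta["SampleRTT"] * max_queuing_delay_ms:
        problems.append("delayed ACK")        # inference from inflated RTTs
    if any(used >= full_fraction * size for used, size in send_buffer_samples):
        problems.append("send buffer")        # sampling
    # Sender app is inferred only when nothing else explains the slowdown.
    return problems or ["sender app"]

quiet = {"FastRetrans": 0, "Timeout": 0, "RwinLimitTime": 0,
         "SampleRTT": 10, "SumRTT": 50}
lossy = dict(quiet, Timeout=2)
slow_ack = dict(quiet, SumRTT=2100)

labels_quiet = classify_interval(quiet, [(1000, 8192)])
labels_lossy = classify_interval(lossy, [(8000, 8192)])
labels_slow_ack = classify_interval(slow_ack, [(1000, 8192)])
```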

SNAP Architecture

Step 3: Correlation across connections

Pinpoint Problems via Correlation

• Correlation over shared switch/link/host
  o Packet loss for all the connections going through one switch/host
  o Pinpoint the problematic switch

Pinpoint Problems via Correlation

• Correlation over application
  o Same application has problem on all machines
  o Report aggregated application behavior
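A simplified way to picture this step is to group connections by the resources they share and flag a resource when most of its connections see the same problem. This is a sketch and not SNAP's actual correlation algorithm; the record layout, names, and thresholds are invented. The same grouping works with an application name in place of a switch.

```python
from collections import defaultdict

def suspicious_resources(connections, min_fraction=0.8, min_connections=3):
    """Flag shared resources where most connections report packet loss."""
    by_resource = defaultdict(list)
    for conn in connections:
        for resource in conn["path"]:
            by_resource[resource].append(conn["has_loss"])
    flagged = {}
    for resource, flags in by_resource.items():
        fraction = sum(flags) / len(flags)
        if len(flags) >= min_connections and fraction >= min_fraction:
            flagged[resource] = fraction
    return flagged

# Hypothetical connections: every one crossing sw1 sees loss, sw2 is mixed.
conns = [
    {"path": ["hostA", "sw1"], "has_loss": True},
    {"path": ["hostB", "sw1"], "has_loss": True},
    {"path": ["hostC", "sw1"], "has_loss": True},
    {"path": ["hostD", "sw2"], "has_loss": False},
    {"path": ["hostE", "sw2"], "has_loss": True},
    {"path": ["hostF", "sw2"], "has_loss": False},
]
flagged = suspicious_resources(conns)
```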

SNAP Architecture

SNAP Deployment

• Production data center
  o 8K machines, 700 applications
  o Ran SNAP for a week, collected petabytes of data

• Operators: Profiling the whole data center
  o Characterize the sources of performance problems
  o Key problems in the data center

• Developers: Profiling individual applications
  o Pinpoint problems in app software, network stack, and their interactions

Performance Problem Overview

• A small number of apps suffer from significant performance problems

Number of apps limited by each problem:

Problem           >5% of the time   >50% of the time
Sender app        567               551
Send buffer       1                 1
Network           30                6
Recv win limit    22                8
Delayed ACK       154               144

SNAP diagnosis

• SNAP diagnosis steps:
  o Correlate connection performance to pinpoint applications with problems
  o Expose socket and TCP stats
  o Find out root cause with operators and developers
  o Propose potential solutions

Send Buffer and Recv Window

• Problems on a single connection
  o Send buffer: some apps use the default 8KB
  o Recv buffer: fixed max size of 64KB is not enough for some apps

[Figure: sender app process (WriteBytes) to TCP Send Buffer, then TCP Recv Buffer to receiver app process (ReadBytes)]
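For an application stuck on a small default, the standard socket option SO_SNDBUF lets it ask for a larger send buffer. The sketch below is a generic illustration and not something the slides prescribe; the helper name and the 64 KB request are assumptions, and the operating system may round, double, or cap the value it actually applies.

```python
import socket

def request_send_buffer(sock, nbytes):
    """Ask the OS for a send buffer of nbytes and return what it reports back."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, nbytes)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    # The effective size is OS-dependent, so read it back rather than assume it.
    effective = request_send_buffer(s, 64 * 1024)
```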

Need Buffer Autotuning

• Problems of sharing buffer at a single host
  o More send buffer problems on machines with more connections
  o How to set buffer size cooperatively?

• Auto-tuning send buffer and recv window
  o Dynamically allocate buffer across applications
  o Based on congestion window of each app
  o Tune send buffer and recv window together
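One possible reading of "based on congestion window" is a proportional split of a host-wide buffer budget. The policy below is a hypothetical sketch, not the autotuning scheme the authors propose; the function name, the 8 KB floor, and the 96 KB budget are assumptions.

```python
def allocate_send_buffers(cwnd_bytes, total_budget, floor=8 * 1024):
    """Give each connection a floor, then split the rest in proportion to cwnd."""
    n = len(cwnd_bytes)
    if n * floor > total_budget:
        raise ValueError("budget too small to give every connection the floor")
    spare = total_budget - n * floor
    total_cwnd = sum(cwnd_bytes.values())
    allocation = {}
    for conn, cwnd in cwnd_bytes.items():
        # Integer division keeps the total within budget.
        share = spare * cwnd // total_cwnd if total_cwnd else spare // n
        allocation[conn] = floor + share
    return allocation

# Two hypothetical connections, one with a congestion window three times larger.
allocation = allocate_send_buffers({"a": 10_000, "b": 30_000},
                                   total_budget=96 * 1024)
```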

Packet Loss in a Day in the Datacenter

• Packet loss burst every hour
• 2-4 am is the backup time

Spread Writes over Multiple Connections

• SNAP diagnosis:
  o More timeouts than fast retransmissions
  o Small packet sending rate

• Root cause:
  o Two connections to avoid head-of-line blocking
  o Low-rate small requests get more timeouts

• Solution:
  o Use one connection; assign an ID to each request
  o Combine data to reduce timeouts

[Figure: requests (Req) and a response (Response) exchanged over the connections]
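The "one connection, ID per request" idea amounts to framing each message so that responses can be matched to requests out of order. The wire format below (a 4-byte request ID and a 4-byte length, both in network byte order) is an assumption made for this sketch, not the application's real protocol.

```python
import struct

HEADER = struct.Struct("!II")  # request id, payload length

def encode_frame(request_id, payload):
    """Prefix a payload with its request id and length."""
    return HEADER.pack(request_id, len(payload)) + payload

def decode_frames(buffer):
    """Return the complete (request_id, payload) frames and any leftover bytes."""
    frames, offset = [], 0
    while offset + HEADER.size <= len(buffer):
        request_id, length = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # incomplete frame, wait for more bytes
        frames.append((request_id, buffer[offset + HEADER.size:end]))
        offset = end
    return frames, buffer[offset:]

# Two small requests combined into a single write on one connection.
stream = encode_frame(1, b"get a") + encode_frame(2, b"get b")
frames, rest = decode_frames(stream)
```

Sending several frames in one write also keeps more data in flight on a single connection, which is the "combine data" part of the proposed fix.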

Congestion Window Allows Sudden Bursts

• SNAP diagnosis:
  o Significant packet loss
  o Congestion window is too large after an idle period

• Root cause:
  o Slow start restart is disabled

Slow Start Restart

• Slow start restart
  o Reduce congestion window size if the connection is idle to prevent sudden burst

[Figure: congestion window over time t; the window drops after an idle time]
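The rule can be written as a one-line check, following the behavior described in RFC 5681: after an idle period longer than one retransmission timeout, the congestion window is capped at the restart window. The function and parameter names are made up for this sketch, and windows are counted in segments for simplicity.

```python
def cwnd_after_idle(cwnd, idle_time_s, rto_s, restart_window, ssr_enabled=True):
    """Congestion window (in segments) to use when sending resumes after idling."""
    if ssr_enabled and idle_time_s > rto_s:
        return min(cwnd, restart_window)
    return cwnd

with_ssr = cwnd_after_idle(cwnd=45, idle_time_s=2.0, rto_s=0.3, restart_window=3)
without_ssr = cwnd_after_idle(cwnd=45, idle_time_s=2.0, rto_s=0.3,
                              restart_window=3, ssr_enabled=False)
short_idle = cwnd_after_idle(cwnd=45, idle_time_s=0.1, rto_s=0.3, restart_window=3)
```

With slow start restart disabled, the stale window of 45 segments is kept, which is what allows the sudden burst after an idle period.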

Slow Start Restart

• However, developers disabled it because:
  o Intentionally increase congestion window over a persistent connection to reduce delay
  o E.g., if congestion window is large, it just takes 1 RTT to send 64 KB data

• Potential solution:
  o New congestion control for delay sensitive traffic
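A rough calculation shows why developers value the large window. The model below is idealized and rests on assumptions not in the slides: a 1460-byte MSS, a window that doubles every RTT, and no loss or delayed ACKs.

```python
import math

def rtts_to_send(nbytes, cwnd_segments, mss=1460):
    """Round trips needed to send nbytes when the window doubles each RTT."""
    if cwnd_segments < 1:
        raise ValueError("cwnd_segments must be at least 1")
    segments = math.ceil(nbytes / mss)
    rtts, sent = 0, 0
    while sent < segments:
        sent += cwnd_segments
        cwnd_segments *= 2
        rtts += 1
    return rtts

large_window = rtts_to_send(64 * 1024, cwnd_segments=45)  # window already open
small_window = rtts_to_send(64 * 1024, cwnd_segments=2)   # window reset after idle
```

Under these assumptions, 64 KB is 45 segments: one RTT with the window already open, five RTTs when it restarts at two segments.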

Timeout and Delayed ACK

• SNAP diagnosis
  o Congestion window drops to one after a timeout
  o Followed by a delayed ACK

• Solution:
  o Congestion window drops to two
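The reason a window of two helps is that a delayed-ACK receiver acknowledges immediately once a second full segment arrives, while a lone segment waits for the delayed-ACK timer. A toy model, where the function name and the 200 ms timer are assumptions of this sketch:

```python
def ack_wait_ms(segments_arriving, delayed_ack_timer_ms=200):
    """Extra wait before the receiver ACKs a burst of full-sized segments."""
    # A second segment triggers an immediate ACK; one segment waits for the timer.
    return 0 if segments_arriving >= 2 else delayed_ack_timer_ms

wait_with_cwnd_one = ack_wait_ms(1)
wait_with_cwnd_two = ack_wait_ms(2)
```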

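Nagle and Delayed ACK

[Figure: sequence diagram between a sender (App, TCP/IP), the network, and a receiver (TCP/IP, App). The sender app issues W1: write() less than MSS, then W2: write() less than MSS. The TCP segment with W1 is sent and the receiver does read() W1, but the ACK for W1 is held back by the 200 ms ACK delay. The TCP segment with W2 is sent only after that ACK arrives, followed by read() W2.]

• SNAP diagnosis
  o Delayed ACK and small writes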
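The interaction can be put into numbers with a toy timeline. This is a simplified model and not a measurement: it assumes both writes are issued at time zero, a fixed one-way delay, a 200 ms delayed-ACK timer, and no reverse data that could carry the ACK earlier. The function name and parameters are invented.

```python
def second_write_arrival_ms(one_way_delay_ms, delayed_ack_ms=200, nagle=True):
    """Time at which W2 reaches the receiver when W1 and W2 are written at t=0."""
    if not nagle:
        return one_way_delay_ms              # W2 leaves right behind W1
    w1_arrives = one_way_delay_ms
    ack_sent = w1_arrives + delayed_ack_ms   # receiver holds the ACK for W1
    ack_arrives = ack_sent + one_way_delay_ms
    return ack_arrives + one_way_delay_ms    # Nagle releases W2 only now

with_nagle = second_write_arrival_ms(one_way_delay_ms=1)
without_nagle = second_write_arrival_ms(one_way_delay_ms=1, nagle=False)
```

With a 1 ms one-way delay, nearly all of the latency of the second write comes from the delayed-ACK timer.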

Send Buffer and Delayed ACK

[Figure: two cases of an application sending from its application buffer through the network stack to a receiver]

• With send buffer: 1. Send complete, 2. ACK
• Send buffer set to zero: 1. ACK, 2. Send complete

• SNAP diagnosis: Delayed ACK and send buffer = 0

SNAP Validation and Overhead

Correlation Accuracy

• Inject two real problems
• Mix labeled data with real production data
• Correlation over shared machine
• Successfully identified those labeled machines

2.7% of machines have ACC > 0.4
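Assuming ACC denotes an average of pairwise correlation coefficients between the connections on a machine (the slide does not define it), the score could be computed roughly as follows. The per-connection series and all names are hypothetical.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def average_correlation(series_by_connection):
    """Mean Pearson correlation over all pairs of connections."""
    pairs = list(combinations(series_by_connection.values(), 2))
    if not pairs:
        return 0.0
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)

# 1 = the connection had the problem in that interval, 0 = it did not.
shared_problem = {"c1": [0, 1, 1, 0, 1], "c2": [0, 1, 1, 0, 1], "c3": [0, 1, 1, 0, 1]}
unrelated = {"c1": [1, 0, 1, 0], "c2": [1, 1, 0, 0]}

acc_shared = average_correlation(shared_problem)
acc_unrelated = average_correlation(unrelated)
```

A machine whose connections misbehave in the same intervals scores near 1 and would clear the 0.4 threshold; unrelated problems score near 0.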

SNAP Overhead

• Data volume
  o Socket logs: 20 Bytes per socket
  o TCP statistics: 120 Bytes per connection per poll

• CPU overhead
  o Log socket calls: event-driven, < 5%
  o Read TCP table
  o Poll TCP statistics
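The per-poll record size turns into a data rate once a connection count and polling rate are fixed. The counts and rates below are illustrative inputs, and the function name is made up; only the 120-byte record size comes from the slide.

```python
def tcp_stats_bytes_per_second(connections, polls_per_second, bytes_per_record=120):
    """Raw TCP-statistics volume generated by one machine, before any compression."""
    return connections * polls_per_second * bytes_per_record

light_load = tcp_stats_bytes_per_second(connections=1_000, polls_per_second=2)
heavy_load = tcp_stats_bytes_per_second(connections=5_000, polls_per_second=20)
```

That is 240 KB/s for 1,000 connections polled twice a second and 12 MB/s for 5,000 connections polled twenty times a second.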

Reducing CPU Overhead

• CPU overhead
  o Polling TCP statistics and reading TCP table
  o Increase with number of connections and polling freq.
  o E.g., 35% for polling 5K connections with 50 ms interval
  o E.g., 5% for polling 1K connections with 500 ms interval

• Adaptive tuning of polling frequency
  o Reduce polling frequency to stay within a target CPU
  o Devote more polling to more problematic connections
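A simple controller for "stay within a target CPU" could back off when over budget and speed up when well under it. The multiplicative policy, the bounds, and the names below are assumptions made for illustration; the slides do not describe SNAP's actual tuning rule.

```python
def next_poll_interval(interval_ms, cpu_percent, target_cpu_percent,
                       min_ms=50, max_ms=5000):
    """Pick the next polling interval from the CPU cost of the last one."""
    if cpu_percent > target_cpu_percent:
        interval_ms *= 2          # over budget: poll half as often
    elif cpu_percent < 0.5 * target_cpu_percent:
        interval_ms //= 2         # well under budget: poll twice as often
    return max(min_ms, min(max_ms, interval_ms))

backed_off = next_poll_interval(50, cpu_percent=35, target_cpu_percent=5)
sped_up = next_poll_interval(500, cpu_percent=1, target_cpu_percent=5)
unchanged = next_poll_interval(500, cpu_percent=4, target_cpu_percent=5)
clamped = next_poll_interval(50, cpu_percent=1, target_cpu_percent=5)
```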

Conclusion

• A simple, efficient way to profile data centers
  o Passively measure real-time network stack information
  o Systematically identify components with problems
  o Correlate problems across connections

• Deploying SNAP in production data center
  o Characterize data center performance problems: help operators improve platform and tune network
  o Discover app-net interactions: help developers to pinpoint app problems