Internet Performance Dynamics
Paul Barford
Boston University Computer Science Department
http://cs-people.bu.edu/barford/
Fall, 2000
Motivation
What are the root causes of long response times in wide area services like the Web? Servers? Networks? Server/network interaction?
A Challenge
Histograms of file transfer latency for 500KB files transferred between Denver and Boston
Day 1: HS mean = 8.3 sec., LS mean = 13.0 sec. Day 2: HS mean = 5.8 sec., LS mean = 3.4 sec.
Precise separation of server effects from network effects is difficult
[Figure: two histograms (Day 1, Day 2) of transfer latency in seconds under high server load (HS) and low server load (LS)]
What is needed?
A laboratory enabling detailed examination of Web transactions (a Web "microscope"): the Wide Area Web Measurement (WAWM) project testbed
A technique for analyzing transactions to separate and identify causes of delay: critical path analysis of TCP
Web Transactions “under a microscope”
[Diagram: distributed clients reach a Web server across the global Internet]
Generating Realistic Server Workloads
Approaches:
Trace-based:
Pros: exactly mimics a known workload
Cons: "black box" approach; can't easily change parameters of interest
Analytic: synthetically create a workload
Pros: explicit models can be inspected and parameters can be varied
Cons: difficult to identify, collect, model, and generate workload components
SURGE: Scalable URL Reference Generator
Analytic Web workload generator
Based on 12 empirically derived distributions
Explicit, parameterized models
Captures "heavy-tailed" (highly variable) properties of Web workloads
SURGE components:
Statistical distribution generator
Hyper Text Transfer Protocol (HTTP) request generator
Currently in use at over 130 academic and industrial sites worldwide
Adopted by W3C for the HTTP-NG testbed
Seven workload characteristics captured in SURGE
Characteristic        Component          Model              System Impact
File Size             Base file (body)   Lognormal *        File system
                      Base file (tail)   Pareto *
                      Embedded file      Lognormal *
                      Single file 1      Lognormal *
                      Single file 2      Lognormal *
Request Size          Body               Lognormal *        Network
                      Tail               Pareto *
Document Popularity                      Zipf               Caches, buffers
Temporal Locality                        Lognormal          Caches, buffers
OFF Times                                Pareto *
Embedded References                      Pareto *           ON times
Session Lengths                          Inverse Gaussian   Connection times
* Model developed during the SURGE project
[ON/OFF timeline: BF, EF1, EF2 | OFF time | SF | OFF time | BF, EF1, ... where BF = base file, EF = embedded file, SF = single file]
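As a concrete illustration of the hybrid body/tail models in the table above, the sketch below samples file sizes from a lognormal body with a Pareto tail. It is a minimal sketch, not SURGE code; the tail probability and distribution parameters are illustrative assumptions, not SURGE's fitted values.

```python
import random

def sample_file_size(p_tail=0.07, mu=9.357, sigma=1.318, k=133000, alpha=1.1):
    """Hybrid file-size model: lognormal body with a Pareto tail.
    All parameters here are illustrative placeholders."""
    if random.random() < p_tail:
        # Heavy tail: P[X > x] ~ (k/x)^alpha, so very large files occur
        # rarely but account for a large share of the bytes transferred
        return int(k * random.paretovariate(alpha))
    # Body: most files are moderately sized
    return int(random.lognormvariate(mu, sigma))

sizes = sorted(sample_file_size() for _ in range(100000))
print("median:", sizes[len(sizes) // 2], "max:", sizes[-1])  # max >> median
```

The heavy tail is the point: the maximum drawn is typically orders of magnitude above the median, which is what stresses file systems and networks in ways a light-tailed model would not.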
HTTP request generator
Supports both HTTP/1.0 and HTTP/1.1
Each ON/OFF thread is a "user equivalent" (see the sketch below)
[Diagram: multiple SURGE client systems, each running many ON/OFF threads, drive requests across the network to the Web server system]
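A minimal sketch of one such "user equivalent" follows, assuming a hypothetical server URL and file naming scheme; the Pareto parameters and counts are invented for illustration and are not SURGE's fitted values.

```python
import random
import threading
import time
import urllib.request

def on_off_thread(base_url, n_pages=100):
    """One 'user equivalent': ON periods fetch a base file plus its
    embedded files; OFF periods are heavy-tailed think times."""
    for _ in range(n_pages):
        # ON period: a base file followed by its embedded files
        n_embedded = int(random.paretovariate(1.5)) - 1  # illustrative model
        for _ in range(1 + n_embedded):
            name = f"{base_url}/file{random.randrange(2000)}.html"  # hypothetical names
            urllib.request.urlopen(name).read()
        # OFF period: Pareto think time, capped to keep the demo bounded
        time.sleep(min(random.paretovariate(1.5) - 1.0, 60.0))

# 50 concurrent user equivalents against a placeholder server
threads = [threading.Thread(target=on_off_thread, args=("http://server.example",))
           for _ in range(50)]
for t in threads:
    t.start()
```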
SURGE and SPECWeb96 exercise servers very differently
[Figure: percent CPU utilization vs. packets per second (0-600) for SURGE and SPECWeb96]
SURGE’s flexibility allows easy experimentation
[Figures: server behavior under HTTP/1.0 vs. HTTP/1.1 workloads]
Web Transactions “under a microscope”
[Diagram: distributed clients reach a Web server across the global Internet]
WAWM Infrastructure
13 clients distributed around the global Internet execute transactions of interest
One server cluster at BU; local load generators running SURGE enable the server to be placed under any load condition
Active and passive measurements from both server and clients
Packet capture via "tcpdump"
GPS timers
WAWM client systems
Harvard University, MA
Purdue University, IN
University of Denver, CO
ACIRI, Berkeley, CA
HP, Palo Alto, CA
University of Saskatchewan, Canada
Universidade Federal de Minas Gerais, Brazil
Universidad Simon Bolivar, Venezuela
EpicRealm - Dallas, TX
EpicRealm - Atlanta, GA
EpicRealm - London, England
EpicRealm - Tokyo, Japan
Internet2/Surveyor
Others??
What is needed?
A laboratory enabling detailed examination of Web transactions (a Web "microscope"): the Wide Area Web Measurement (WAWM) project testbed
A technique for analyzing transactions to separate and identify causes of delay: critical path analysis of TCP
Identifying root causes of response time
Delays can occur at many points along the end-to-end path simultaneously
Pinpointing where delays occur and which delays matter is difficult
Our goal is to precisely identify the determinants of response time in TCP transactions
[Diagram: end-to-end path from client through routers 1, 2, and 3 to server]
Critical path analysis (CPA) for TCP transactions
CPA identifies the precise set of events that determines the execution time of a distributed application, e.g., Web transaction response time
Decreasing the duration of any event on the CP decreases response time; this is not true for events off the CP
Profiling the CP for TCP enables accurate assignment of delays to: server delay, client delay, network delay (propagation, network variance, and drops)
Applied here to HTTP/1.0; could apply to other applications (e.g., FTP)
Window-based flow control in TCP
[System diagram and timeline graph: client and server exchange data packets (D) and ACK packets (A); the server sends one or more data packets per window, and each returning ACK releases more data, so windows grow over successive rounds]
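The round structure this implies can be seen with a toy calculation: under idealized slow start (the window doubles each round and nothing is lost), the number of window rounds grows only logarithmically with transfer size. This is a sketch under those stated assumptions, not a model of any particular TCP stack.

```python
def rounds_to_send(total_packets):
    """Rounds needed under idealized slow start: send a full window,
    wait one RTT for the ACKs, then double the window."""
    window, sent, rounds = 1, 0, 0
    while sent < total_packets:
        sent += window
        window *= 2
        rounds += 1
    return rounds

# Roughly 1KB, 20KB, and 500KB files at 1460-byte segments
for packets in (1, 14, 350):
    print(packets, "packets ->", rounds_to_send(packets), "rounds")
```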
TCP flows as a graph
Vertices are packet departures or arrivals: data, ACK, SYN, FIN
Directed edges reflect Lamport's "happens before" relation, on the client, on the server, or over the network
Weights are elapsed time (assumes global clock synchronization)
Profile associates categories with edge types; assignment is based on logical flow
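A minimal sketch of this graph representation, and of the backward walk that extracts the critical path and its profile, is shown below. The events, timestamps, and category names are invented for illustration; tcpeval's actual data structures are not shown.

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    """A packet departure or arrival, timestamped by a synchronized clock."""
    name: str
    time: float                                 # seconds since transfer start
    preds: list = field(default_factory=list)   # (predecessor event, category)

# Toy happens-before graph for a short exchange (timestamps are invented)
syn_out  = Event("client sends SYN", 0.000)
syn_in   = Event("server receives SYN", 0.040, [(syn_out, "network")])
data_out = Event("server sends DATA", 0.055, [(syn_in, "server")])
data_in  = Event("client receives DATA", 0.095, [(data_out, "network")])
ack_out  = Event("client sends ACK", 0.097, [(data_in, "client")])

def critical_path(last):
    """Walk backward from the last event; at each step follow the predecessor
    that completed latest, i.e. the edge that actually determined the timing."""
    path, ev = [], last
    while ev.preds:
        pred, category = max(ev.preds, key=lambda pc: pc[0].time)
        path.append((category, ev.time - pred.time))
        ev = pred
    return list(reversed(path))

# Profile: total critical-path time per delay category
profile = {}
for category, dt in critical_path(ack_out):
    profile[category] = profile.get(category, 0.0) + dt
print(profile)   # e.g. {'network': 0.08..., 'server': 0.015..., 'client': 0.002...}
```

Shortening any edge on the returned path shortens the transfer, which is exactly the property the CPA definition above requires.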
Original Data Flow
[Sequence diagram, client and server: server data segments (1461:2921, 5841:7301, 11681:13141, 16061:17521, 17521:20441, 20441:24821, 24821:27741, 27741:29201) flow to the client; ACKs (2921, 7301, 10221, three duplicate ACKs for 16061, then 24821, 27741) flow back; the segment at 17521 is dropped in transit]

Round Number   Bytes Liberated
1              1:2920
2              2921:7300
3              7301:13140
4              13141:17520
5              17521:24820
6              24821:27740
7              27741:30660
Critical Path
[Sequence diagram: only the events on the critical path remain, including the drop at 17521 and the retransmission 16061:17521]
Profile
[Each critical-path edge is assigned a delay category; in this example: 11 network delay edges, 3 server delay edges, 2 client delay edges, and 1 drop delay edge]
tcpeval
Inputs are "tcpdump" packet traces taken at end points of transactions
Generates a variety of statistics for file transactions: file and packet transfer latencies, packet drop characteristics, packet and byte counts per unit time
Generates both timeline and sequence plots for transactions
Generates critical path profiles and statistics for transactions
Freely distributed
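tcpeval itself is not reproduced here, but one of its simplest statistics, whole-transfer latency, can be sketched from a tcpdump trace using scapy. The trace filename and port are assumptions for the sake of the example.

```python
from scapy.all import rdpcap, TCP   # pip install scapy

# Load an endpoint trace (filename is a placeholder) and keep HTTP packets
packets = [p for p in rdpcap("server_side.pcap")
           if TCP in p and 80 in (p[TCP].sport, p[TCP].dport)]

# Transfer latency: first SYN to last FIN seen in the trace
syn = next(p for p in packets if p[TCP].flags.S and not p[TCP].flags.A)
fin = next(p for p in reversed(packets) if p[TCP].flags.F)
print(f"transfer latency: {float(fin.time) - float(syn.time):.3f} s")
```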
Implementation Issues
tcpeval must recreate TCP state at end points as packets arrive
Capturing packets at end points makes timer simulation unnecessary
The "active round" must be maintained
Packet filter problems must be addressed: dropped packets, added packets, out-of-order packets (see the sketch below)
tcpeval works across platforms for RFC 2001 compliant TCP stacks
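As one example of the packet-filter problems above, a trace may contain capture-level duplicates and reordering that must be normalized before TCP state can be replayed. This is a minimal sketch under the assumption that packets are (timestamp, ip_id, seq) tuples; it is not tcpeval's algorithm.

```python
def normalize_trace(packets):
    """packets: list of (timestamp, ip_id, seq) tuples from one direction.
    Restore capture-time order and drop exact duplicates added by the packet
    filter. A true TCP retransmission carries a fresh IP ID, so keying on
    (ip_id, seq) drops filter artifacts while keeping real retransmissions."""
    seen, cleaned = set(), []
    for ts, ip_id, seq in sorted(packets):
        if (ip_id, seq) in seen:
            continue            # filter-added duplicate, not a retransmission
        seen.add((ip_id, seq))
        cleaned.append((ts, ip_id, seq))
    return cleaned
```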
CPA results for 1KB file
Latency is dominated by server load for the BU to Denver path
6 packets are typically on the critical path
[Stacked bar chart: critical-path delay in seconds (0-1) for four conditions (LN/LS, LN/HS, HN/LS, HN/HS), broken into net variance, propagation, server, timeout, client, and fast retransmit components]
CP time line diagrams for 1KB file
[Timeline diagrams of the critical path under low server load and high server load]
CPA results for 20KB file
Both server load and network effects are significant
[Stacked bar chart: critical-path delay in seconds (0-0.9) for LN/LS, LN/HS, HN/LS, HN/HS, broken into net variance, propagation, server, timeout, client, and fast retransmit components]
14 packets are typically on the critical path
The Challenge
Histograms of file transfer latency for 500KB files transferred between Denver and Boston
Day 1: HS mean = 8.3 sec., LS mean = 13.0 sec. Day 2: HS mean = 5.8 sec., LS mean = 3.4 sec.
[Figure: the same two histograms of transfer latency in seconds under high server load (HS) and low server load (LS)]
CPA results for 500KB file
Latency is dominated by network effects
[Stacked bar charts for Day 1 and Day 2: critical-path delay in seconds (0-4.5) for LN/LS, LN/HS, HN/LS, HN/HS, broken into net variance, propagation, server, timeout, client, and fast retransmit components]
56 packets are typically on the critical path
Active versus Passive Measurements
Understanding active (Zing) versus passive (tcpdump) network measurements
The figure shows that active measures are a poor predictor of TCP performance
The goal is to be able to predict TCP performance using active measurements
[Scatter plot: % packet loss from Zing vs. % packet loss from tcpdump, 0-10% on each axis]
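One reason for the mismatch can be illustrated with a toy simulation: if loss is concentrated in short bursts, low-rate Poisson probes (as an active measure like Zing sends) sample the path at random instants, while a TCP connection's packets arrive back-to-back and can land entirely inside a burst. The loss process and all parameters below are invented for illustration only.

```python
import random

def lost(t, burst_len=0.05, period=1.0, p=0.5):
    """Toy loss process: packets are dropped with probability p only during
    a short burst at the start of each one-second period."""
    return (t % period) < burst_len and random.random() < p

# Active view: Poisson probes at ~10 per second over ~1000 seconds
t, probes, dropped = 0.0, 10000, 0
for _ in range(probes):
    t += random.expovariate(10.0)
    dropped += lost(t)
print("active loss estimate: ", dropped / probes)       # ~2.5%

# Passive view: a 100-packet window sent back-to-back inside a burst
window = [lost(0.01 + i * 0.0002) for i in range(100)]
print("passive loss in window:", sum(window) / len(window))  # ~50%
```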
Related work
Web performance characterization: client studies [Catledge95, Crovella96]; server studies [Mogul95, Arlitt96]
Wide area measurements: NPD [Paxson97], Internet QoS [Huitema00], Keynote Systems Inc.
TCP analysis: TCP modeling [Mathis97, Padhye98, Cardwell00]; graphical TCP analysis [Jacobson88, Brakmo96]; automated TCP analysis [Paxson97]
Critical path analysis: parallel program execution [Yang88, Miller90]; RPC performance evaluation [Schroeder89]
Conclusions
Using SURGE, WAWM can put realistic Web transactions “under a microscope”
Complex interactions between clients, the network and servers in the wide area can lead to surprising performance
Complex packet transactions can be effectively understood using CPA
CP profiling of BU to Denver transactions allowed precise assignment of delays: latency for small files is dominated by server load, while latency for large files is dominated by network effects
Relationship between active and passive measurement is not well understood
Future work – lots of things to do!
Acknowledgements
Mark Crovella
Vern Paxson, Anja Feldmann, Jim Pitkow, Drue Coles, Bob Carter, Erich Nahum, John Byers, Azer Bestavros, Lars Kellogg-Stedman, David Martin
Xerox, Inc., EpicRealm Inc., Internet2
Michael Mitzenmacher, Kihong Park, Carey Williamson, Virgilio Almeida, Martin Arlitt