evaluating the running time of a communication round over the internet omar bakr idit keidar mit...

Evaluating the Running Time of a Communication Round over the Internet

Omar Bakr Idit Keidar

MIT MIT/Technion

PODC 2002

© Omar Bakr and Idit Keidar; PODC July 2002

Communication Round

• Exchange of information from all hosts to all hosts

• Part of many distributed algorithms, systems– consensus, atomic commit, replication, ...


Common Metric for Evaluating Algorithms

• Number of rounds (or steps) they require


Questions

• What is the best way to implement a communication round over the Internet– decentralized vs. centralized

• How long is a communication round over the Internet?


Prediction is Hard

• Internet is unpredictable, diverse, …• Different answers for different topologies,

different times• Different performance metrics

– local running time one host is engaged in algorithm

– overall running time from when first host starts to when last host finishes


“Communication Round” Primitive

• Initiated by some host• Propagates data from every host to every other host

connected to it


Example Implementations

All-to-allLeader

Secondary Leader


Experiment I

• 10 hosts: Taiwan, Korea, US academia, ISPs• TCP/IP (connections always up)• Algorithms:

– All-to-all– Leader (initiator)– Secondary leader (not initiator)

periodically initiated at each host

- 650 times over 3.5 days


Computing Overall Running Time

• Elapsed time from initiation (at initiator) until all hosts terminate

• Requires estimating clock differences– Clocks not synchronized, drift– We compute difference over short intervals– Compute 3 different ways – Achieve accuracy within 20 ms. on 90% of runs


Teaser: Comparing Performance Based on Number of Steps

All-to-all: 2

Leader: 3

Secondary Leader: 4


Predicting Overall Runnig Times From MIT

• Ping-measured latencies (IP):– Longest link latency 240 milliseconds– Longest link to MIT 150 milliseconds

150+240 = 390 150+150+150 = 450


Measured Running Times Runs Initiated at MIT

All-to-All Leader

Overall Local Overall LocalPrediction 390 300 450 300Average (runs under 2 sec)

811 295 541 335

% runs over

2 seconds55% 3% 13% 6%

Running times in milliseconds


What’s Going On?

• Loss rates on two links are very high– 42% and 37%– Taiwan to two ISPs in the US

• Loss rates on other links up to 8%• Upon loss, TCP’s timeout is big

– More than round-trip-time

• All-to-all sends messages on lossy links– Often delayed by loss


Distribution of Running Times Up to 1.3 sec. at MIT

MIT

0

100

200

300

400

milliseconds

num

ber

of r

uns

Leader

Sleader

All2All


Running Times Runs Initiated at Taiwan

% runs over

2 seconds

Average (runs under 2 sec)

Sec. Leader

overall local

Leader

overall local

All-to-all

overall local

7%13%43%64%24%54%

6076798441120645866

Running times in milliseconds


Distribution of Running Timesin Taiwan

Taiwan

050

100150200250300350400

milliseconds

num

ber

of r

uns

Leader

Sleader

All2All


What’s Going On?

All-to-all Leader Secondary Leader

Taiwan

MIT

Hosts with bad links to Taiwan

Other Hosts

Good link

Lossy link


Experiment II: Removing Taiwan

• Overall running times much better– For every initiator and algorithm, less than 10%

over 2 seconds (as opposed to 55% previously)

• All-to-all overall still worse than others!– either Leader or Secondary Leader best,

depending on initiator– loss rates of 2% - 8% are not negligible– all-to-all sends O(n2) messages; suffers

• But, all-to-all has best local running times


Probability of Delay due to Loss

• If all links would have same latency– assume 1% loss on all links; 10 hosts (n=10)– Leader sends 3(n-1) = 27 messages

• probability of at least one loss: 1 -.9927 24%

– All-2-all sends n(n-1) = 90 messages• probability of at least one loss: 1 -.9990 60%

• In reality, links don’t have same latency– only loss on long links matters


Conclusions• Message loss causes high variation in TCP link latencies

– latency distribution has high variance, heavy tail

• Latency distribution determines expected time for receiving O(n) concurrent messages

• Secondary leader helps – No triangle inequality, especially for loss

• Different for overall vs. local running times

• Number of rounds/steps not sufficient metric– One-to-all and all-to-all have different costs

evaluating the running time of a communication round over the internet omar bakr idit keidar mit...

Documents

idit keidar podc

loss omar bakr

milliseconds omar bakr

days omar bakr

host finishes omar bakr

number of steps omar

measured running times

mitrunning times