UH SWARM: Dense perfSONAR Deployment With Small, Inexpensive Devices Alan Whinery U. Hawaii ITS I2 GS 2015 April 28, 2015

Page 1: UH SWARM: Dense perfSONAR Deployment With Small ... · Dense perfSONAR Deployment With Small, Inexpensive Devices Alan Whinery U. Hawaii ITS ... SanDisk Ultra – 8GB Class 10 –

UH SWARM: Dense perfSONAR Deployment With Small, Inexpensive Devices

Alan Whinery, U. Hawaii ITS, I2 GS 2015

April 28, 2015

The Swarm

● Wrote a paragraph into our CC-NIE campus networking proposal about using the recently available ~$50 computers to “sense” the network with elements of perfSONAR.

● Funded a project to deploy 100 nodes on one campus over 2 years, exploiting a ~$50 price point to deploy many nodes on campus as a dense mesh.

Goals/Challenges

● Finding nodes to buy in the face of market exhaustion

● Getting node deployment work-flow down to nil

– Getting recoveries of off-line nodes to a minimum

● Tracking assets and reliability, generating metrics

● Evaluating capabilities of the whole set-up

● Developing a test program for many nodes

● Slicing/Dicing data to see what it has to tell us

● Developing visualizations and distillations to put tools in hands of network maintainers, merging into pS Toolkit

Devices We Have/Are Getting

● Raspberry Pi – famous, $50, med-perf, file system on SD card, 100 Mb Ethernet, USB 2.0

● BeagleBone Black – $50, more perf, FS on internal flash, and/or SD card, 100 Mb, USB 2.0

Honorable mention:

● CuBox i4 – $147, more perf, FS on SD, GigE, WiFi, USB 2.0

● MiraBox – $149, most perf, FS on SD, dual GigE, WiFi, USB 3.0

Reliability

● Raspberry Pi (July 2014)

– UH ITS owns 47 – 1 has failed

– 22 SD card hard failures

– 10 file-system failures

● BeagleBone Black Rev. A (December 2013)

– UH ITS owns 10 (+50 NIB), 1 has corrupted firmware

– 9 in production; one had to be power-cycled once

● CuBox – one deployed, 6 months of service, zero problems (using SD card from OEM).

SD Cards

● DANE ELEC 8 GB Class 4

– 10 cards, 2 failures in light duty

● SanDisk Ultra 8 GB Class 10

– 10 cards, 0 failures, 3 FS corrupted in 42k hours

● Kingston 8 GB Class 10

– 10 cards, 0 failures, 7 FS corrupted in 42k hours

● Kingston 4 GB Class 4

– 20 hard failures in less than 20k hours

– (100% across 6 weeks, < 1000 hr MTBF)

● SanDisk Ultra 8 GB Class 10

– Most recent batch of replacements
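
The MTBF figure on this slide can be checked with simple arithmetic. The sketch below is illustrative only: it assumes the hour counts are cumulative device-hours summed across all cards of a type, which the slide does not state explicitly, and the function name is hypothetical.

```python
def mean_hours_between_events(total_device_hours, events):
    """Cumulative operating hours divided by the number of observed events.

    Assumption: hours are summed over every card of the same type.
    """
    if events <= 0:
        raise ValueError("need at least one event to estimate a mean interval")
    return total_device_hours / events

# Kingston 4 GB Class 4: 20 hard failures in (less than) 20k hours,
# so 1000 hours is an upper bound on the MTBF.
kingston_4gb_mtbf_bound = mean_hours_between_events(20_000, 20)

# The same arithmetic applied to file-system corruption events in 42k hours.
sandisk_fs_interval = mean_hours_between_events(42_000, 3)
kingston_fs_interval = mean_hours_between_events(42_000, 7)

print(kingston_4gb_mtbf_bound, sandisk_fs_interval, kingston_fs_interval)
```

With so few events per card type these are rough indicators rather than statistically firm estimates.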

Year 1

● Tried 10 BeagleBones, liked them

– And a few Raspberries Pi

● The market vacuum around the release of BBB Rev. C made BBB impossible to obtain

● Bought 43 Raspberries

● Although we are going with BeagleBone Black for the completion, we could make Raspberries work if necessary.

● Bought 2 Dell rack servers as test facilitators, data archives.

2nd Year Completion

● 50 BeagleBone Black Rev. C (4 GB internal flash)

– BBB internal flash is more reliable than SD

– Internal + SD card enables separating system/data partitions

– Better 100 Mb Ethernet performance

● 5 Raspberry Pi 2 Model B

● As number deployed approaches 100, we will be placing nodes in new/special roles.

Management

● Puppet/The Foreman

– https://puppetlabs.com/

– http://theforeman.org/

– Easy to push changes, updates out to the swarm.

– Just as easy to push errors out to the swarm, which then require 50 SSH sessions to fix.

● Work-flow

– Try to minimize per-node actions and attended setup

– RPi – ua-netinstall with tweaks for Puppetization

– BBB – custom SD that auto-images the internal flash

Characteristics Of Dense Sensor Deployment

● Having many observations makes the loss of a single one less important.

● You can correlate topology and test results to “triangulate” on the source of a problem.

Test Programs: powstream (owamp)

● powstream from pS Toolkit node to/from each sensor node

– Really, really, really boring at first glance. All loss appears to be about zero. Always one or two nodes losing a packet per day (1 in 864000)

– Standard deviation in latency groups is somewhat interesting: it may reflect queuing, and flares in latency std dev may precede loss events

– Longitudinal analysis reveals damaging loss rates that would otherwise be invisible

– Higher packet rates might expose low loss probabilities in shorter time
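
The last point can be made concrete with a little arithmetic. The sketch below assumes a 10 pps stream (which is what "1 in 864000" per day implies) and treats losses as independent events, a simplification the slides do not claim. The function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400

def packets_per_day(pps):
    """Packets sent in one day at a constant packet rate."""
    return pps * SECONDS_PER_DAY

def expected_hours_to_first_loss(loss_probability, pps):
    """Mean wait before the first lost packet is observed.

    Assumption: each packet is lost independently with the same probability,
    so the expected number of packets until a loss is 1 / loss_probability.
    """
    expected_packets = 1.0 / loss_probability
    return expected_packets / pps / 3600.0

one_per_day = 1 / packets_per_day(10)  # one loss in 864000 packets
print(packets_per_day(10))
print(expected_hours_to_first_loss(one_per_day, 10))   # about a day at 10 pps
print(expected_hours_to_first_loss(one_per_day, 100))  # ten times sooner at 100 pps
```

Under these assumptions, raising the packet rate shortens the observation time needed to see a given loss probability in direct proportion.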

30 nodes, in/out

Mathis, Semke, Mahdavi, Ott, “The Macroscopic Behavior of the TCP Congestion Avoidance Algorithm”, ACM SIGCOMM Computer Communication Review, Vol 27, Number 3, July 1997

Slide: Phil Dykstra, SC2006. Used with permission.

Slide: Used with permission

Speed Limits You Can't See

For 45 milliseconds RTT, typical minimum to get onto continental US from Hawaii

Loss Rate    Packets Lost Per Day    TCP AIMD Coastal Limit     TCP AIMD Coastal Limit
             (10 pps powstream)      @1460 MSS (Mbits/sec),     @8960 MSS (Mbits/sec),
                                     45 ms RTT                  45 ms RTT
1.82E-005    15.75                   42.56                      261.18
2.25E-006    1.94                    121.11                     743.23
1.87E-006    1.62                    132.76                     814.72
9.38E-007    0.81                    187.58                     1151.16
6.05E-007    0.52                    233.55                     1433.28
5.93E-007    0.51                    236.03                     1448.52
3.35E-007    0.29                    314.03                     1927.21
2.51E-007    0.22                    362.49                     2224.57
1.74E-007    0.15                    435.64                     2673.49
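
The table's columns appear to follow the Mathis et al. relation, rate ≈ C · MSS / (RTT · sqrt(p)). The sketch below is a reconstruction, not the author's script: the constant C = 0.7 is an assumption, chosen because it roughly reproduces the table's figures, and the function names are hypothetical.

```python
import math

def mathis_limit_mbps(loss_rate, mss_bytes, rtt_seconds, c=0.7):
    """Approximate TCP AIMD throughput ceiling in Mbit/s.

    Assumption: c=0.7 is inferred from the table, not stated on the slide.
    """
    bits_per_rtt = mss_bytes * 8 / rtt_seconds
    return c * bits_per_rtt / math.sqrt(loss_rate) / 1e6

def packets_lost_per_day(loss_rate, pps=10):
    """Expected lost packets per day for a constant-rate probe stream."""
    return loss_rate * pps * 86_400

# First table row: loss rate 1.82E-005 at 45 ms RTT.
p, rtt = 1.82e-5, 0.045
print(round(packets_lost_per_day(p), 2))
print(round(mathis_limit_mbps(p, 1460, rtt), 2))
print(round(mathis_limit_mbps(p, 8960, rtt), 2))
```

The results land within a fraction of a percent of the first row; the small differences are consistent with the loss rates having been rounded for display.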

Test Programs: 50 Node Full Mesh TCP Throughput

● <= 100 Mbps RPi, BBB throughput tests resemble real-life user flows

– Unlike a high performance iperf tester which “punches the network in the face”

● I run a 50x50 full mesh iperf matrix (2450 tests) in about 7 hours (5 second tests).

● Full-mesh traceroute is collected concurrently

● By scoring every hop encountered on the average performance for paths it appears in, “per-hop confidence” can be derived.

● Using multi-rate UDP vs. TCP is worth investigating.
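
One way to read the per-hop scoring idea is sketched below. This is an illustrative interpretation, not the project's code: the input format (a list of traceroute hops paired with a measured throughput) and the function names are assumptions, and a plain mean stands in for whatever scoring the author used.

```python
from collections import defaultdict
from itertools import permutations

def full_mesh_pairs(nodes):
    """Every ordered (source, destination) pair, excluding self-tests."""
    return list(permutations(nodes, 2))

def per_hop_confidence(results):
    """Average the throughput of every path each hop appears in.

    results: iterable of (hops, throughput_mbps), where hops is the list of
    interfaces reported by traceroute for that test (hypothetical format).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for hops, mbps in results:
        for hop in set(hops):  # count a hop once per path
            totals[hop] += mbps
            counts[hop] += 1
    return {hop: totals[hop] / counts[hop] for hop in totals}

print(len(full_mesh_pairs(range(50))))  # 50 nodes give 50 * 49 ordered pairs

sample = [
    (["a", "core", "b"], 90.0),
    (["a", "core", "c"], 30.0),
    (["b", "core", "c"], 60.0),
]
print(per_hop_confidence(sample))
```

In the sample, hop "c" scores lowest because both paths through it performed poorly, which is the kind of signal that points toward a suspect interface.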

The Matrix

● Cut-out view of iperf3 tests to/from a chosen node...

● This row/column represents all tests to/from that chosen node.

● Leaves one wondering what the correlation is between the pink squares showing retransmissions

[Figure: matrix of iperf3 test results; axes labeled Sources and Destinations]

Correlating Full Mesh Throughput And Traceroute Results For Fault Isolation

Graph of per-hop “confidence” with colored links where retransmissions were observed (names/addresses obfuscated)

This graph shows hops involved in in-bound throughput testing between a chosen node and all partners.

Each oval represents an IP interface as reported in traceroute output.

Graph rendered from test data with GraphViz (GraphViz.org).
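
A graph like this can be produced by writing traceroute paths out in GraphViz's DOT language. The sketch below only generates the DOT text. The input format, the function name, and the red/black coloring are assumptions for illustration and may differ from what the original figure used.

```python
def paths_to_dot(paths, retransmit_links=frozenset()):
    """Build a DOT digraph from traceroute paths.

    paths: list of hop lists, each ordered from source to destination.
    retransmit_links: set of (hop_a, hop_b) pairs to highlight (hypothetical).
    """
    edges = []
    seen = set()
    for hops in paths:
        for a, b in zip(hops, hops[1:]):
            if (a, b) not in seen:  # draw each link only once
                seen.add((a, b))
                edges.append((a, b))

    lines = ["digraph hops {", "  node [shape=ellipse];"]
    for a, b in edges:
        color = "red" if (a, b) in retransmit_links else "black"
        lines.append(f'  "{a}" -> "{b}" [color={color}];')
    lines.append("}")
    return "\n".join(lines)

dot_text = paths_to_dot(
    [["h1", "r1", "dst"], ["h2", "r1", "dst"]],
    retransmit_links={("r1", "dst")},
)
print(dot_text)
```

The resulting text can be saved to a file and rendered with the `dot` command from GraphViz.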

Ongoing

● perfSONAR toolkit integration

– Not so much new development as making some pieces fit together

● Correlation of other sources to zero in on a fault

– NetDot

– Flows/MRTG

● Ancillary programs

– Log collection (honeypot-ish info)

– Name resolution tests

● v6/v4 precedence

PerfClub

● http://perfclub.org

● Monthly conference call for perfSONAR deployers

3rd Monday 22:00 GMT

● Send email to [email protected] or [email protected] to join the mailing list.