perfSONAR: getting telemetry on your network
![Page 1: perfSONAR: getting telemetry on your network](https://reader034.vdocuments.us/reader034/viewer/2022042706/5872eb811a28abfa548b71af/html5/thumbnails/1.jpg)
Duncan Rand, Jisc and Imperial College London
perfSONAR: getting telemetry on your network
![Page 2: perfSONAR: getting telemetry on your network](https://reader034.vdocuments.us/reader034/viewer/2022042706/5872eb811a28abfa548b71af/html5/thumbnails/2.jpg)
WLCG/GridPP as an example community
19 Oct 2016
»The Worldwide Large Hadron Collider Computing Grid (WLCG) is a global collaboration of more than 170 computing centres in 42 countries
» Its mission is to provide global computing resources to store, distribute and analyse the ~30 petabytes of data generated per year by the LHC experiments
»GridPP is a collaboration providing data-intensive distributed computing resources for the UK HEP community and the UK contribution to the WLCG
»Hierarchically arranged with four tiers:
› Tier-0 at CERN (and Wigner in Hungary)
› 13 Tier-1s (mainly national physics laboratories)
› 149 Tier-2s (generally university physics laboratories)
› Tier-3s
![Page 3: perfSONAR: getting telemetry on your network](https://reader034.vdocuments.us/reader034/viewer/2022042706/5872eb811a28abfa548b71af/html5/thumbnails/3.jpg)
» Initial modelling of LHC computing requirements suggested a hierarchical tier-based data management and transfer model
»Data exported from Tier-0 at CERN to each Tier-1 and then on to Tier-2s
»However, better-than-expected network bandwidth means that the LHC experiments have been able to relax this hierarchy
»Now data is transferred in an all-to-all mesh configuration
»Data often transferred across multiple domains
› e.g. a CMS transfer to Imperial College London might come predominantly from Fermilab near Chicago along with other CMS sites
»So a good network is crucial to the operation of the WLCG, and that means good monitoring
![Page 4: perfSONAR: getting telemetry on your network](https://reader034.vdocuments.us/reader034/viewer/2022042706/5872eb811a28abfa548b71af/html5/thumbnails/4.jpg)
![Page 5: perfSONAR: getting telemetry on your network](https://reader034.vdocuments.us/reader034/viewer/2022042706/5872eb811a28abfa548b71af/html5/thumbnails/5.jpg)
perfSONAR
»Network monitoring tool developed by ESnet, GEANT, Indiana University and Internet2
»‘perfSONAR is a widely-deployed test and measurement infrastructure that is used by science networks and facilities around the world to monitor and ensure network performance.’
»‘perfSONAR’s purpose is to aid in network diagnosis by allowing users to characterize and isolate problems. It provides measurements of network performance metrics over time as well as “on-demand” tests’
»http://www.perfsonar.net/about/what-is-perfsonar/
![Page 6: perfSONAR: getting telemetry on your network](https://reader034.vdocuments.us/reader034/viewer/2022042706/5872eb811a28abfa548b71af/html5/thumbnails/6.jpg)
Worldwide perfSONAR host locations
![Page 7: perfSONAR: getting telemetry on your network](https://reader034.vdocuments.us/reader034/viewer/2022042706/5872eb811a28abfa548b71af/html5/thumbnails/7.jpg)
![Page 8: perfSONAR: getting telemetry on your network](https://reader034.vdocuments.us/reader034/viewer/2022042706/5872eb811a28abfa548b71af/html5/thumbnails/8.jpg)
![Page 9: perfSONAR: getting telemetry on your network](https://reader034.vdocuments.us/reader034/viewer/2022042706/5872eb811a28abfa548b71af/html5/thumbnails/9.jpg)
![Page 10: perfSONAR: getting telemetry on your network](https://reader034.vdocuments.us/reader034/viewer/2022042706/5872eb811a28abfa548b71af/html5/thumbnails/10.jpg)
Latency
Loss
![Page 11: perfSONAR: getting telemetry on your network](https://reader034.vdocuments.us/reader034/viewer/2022042706/5872eb811a28abfa548b71af/html5/thumbnails/11.jpg)
Reverse throughput
Throughput
![Page 12: perfSONAR: getting telemetry on your network](https://reader034.vdocuments.us/reader034/viewer/2022042706/5872eb811a28abfa548b71af/html5/thumbnails/12.jpg)
Reverse throughput
Throughput
![Page 13: perfSONAR: getting telemetry on your network](https://reader034.vdocuments.us/reader034/viewer/2022042706/5872eb811a28abfa548b71af/html5/thumbnails/13.jpg)
![Page 14: perfSONAR: getting telemetry on your network](https://reader034.vdocuments.us/reader034/viewer/2022042706/5872eb811a28abfa548b71af/html5/thumbnails/14.jpg)
Durham University GridPP site
Replaced perfSONAR host motherboard
![Page 15: perfSONAR: getting telemetry on your network](https://reader034.vdocuments.us/reader034/viewer/2022042706/5872eb811a28abfa548b71af/html5/thumbnails/15.jpg)
Lancaster University GridPP site
“a number of major tweaks to our network configuration”
![Page 16: perfSONAR: getting telemetry on your network](https://reader034.vdocuments.us/reader034/viewer/2022042706/5872eb811a28abfa548b71af/html5/thumbnails/16.jpg)
Oxford University GridPP site
Reconfiguration of site core network
![Page 17: perfSONAR: getting telemetry on your network](https://reader034.vdocuments.us/reader034/viewer/2022042706/5872eb811a28abfa548b71af/html5/thumbnails/17.jpg)
MaDDash visualisation dashboard
»With large meshes it is difficult to check all hosts
»Centralised dashboards really help visualise overall performance
»MaDDash (Monitoring and Debugging Dashboard) displays meshes of perfSONAR hosts
»Many examples of MaDDash dashboards, e.g. ICNRG, WLCG
»WLCG dashboard has two aspects
› Open Monitoring Distribution (Nagios monitoring)
› MaDDash
»http://psmad.grid.iu.edu/maddash-webui/
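To illustrate why large meshes are hard to check by hand, the sketch below counts the directed source-to-destination tests in a full (all-to-all) mesh. The host names are hypothetical placeholders, and the function is only an illustration of the combinatorics, not part of perfSONAR or MaDDash.

```python
from itertools import permutations


def mesh_pairs(hosts):
    """Return every ordered (source, destination) pair in a full mesh."""
    return list(permutations(hosts, 2))


# Hypothetical host names, for illustration only.
hosts = ["ps-a.example.org", "ps-b.example.org", "ps-c.example.org"]
pairs = mesh_pairs(hosts)
print(len(pairs))  # 3 hosts give 3 * 2 = 6 directed tests

# The number of directed pairs grows as n * (n - 1).
for n in (10, 50, 100):
    print(n, n * (n - 1))
```

With 100 hosts there are already 9900 directed pairs per test type, which is the scale at which a grid-style dashboard becomes much easier to read than individual host pages.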
![Page 18: perfSONAR: getting telemetry on your network](https://reader034.vdocuments.us/reader034/viewer/2022042706/5872eb811a28abfa548b71af/html5/thumbnails/18.jpg)
perfSONAR configuration interface
»A perfSONAR host can participate in multiple meshes
»Configuration interface and auto-URL enables dynamic configuration of entire network
McKee et al., CHEP 2015
![Page 19: perfSONAR: getting telemetry on your network](https://reader034.vdocuments.us/reader034/viewer/2022042706/5872eb811a28abfa548b71af/html5/thumbnails/19.jpg)
»Adding and removing hosts from the mesh configuration is very simple
»Makes use of a WLCG database of hosts
»Version of GUI developed by OSG to be included in perfSONAR toolkit
![Page 20: perfSONAR: getting telemetry on your network](https://reader034.vdocuments.us/reader034/viewer/2022042706/5872eb811a28abfa548b71af/html5/thumbnails/20.jpg)
Initial WLCG meshes based around countries, e.g. UK/GridPP
![Page 21: perfSONAR: getting telemetry on your network](https://reader034.vdocuments.us/reader034/viewer/2022042706/5872eb811a28abfa548b71af/html5/thumbnails/21.jpg)
MaDDash
![Page 22: perfSONAR: getting telemetry on your network](https://reader034.vdocuments.us/reader034/viewer/2022042706/5872eb811a28abfa548b71af/html5/thumbnails/22.jpg)
![Page 23: perfSONAR: getting telemetry on your network](https://reader034.vdocuments.us/reader034/viewer/2022042706/5872eb811a28abfa548b71af/html5/thumbnails/23.jpg)
![Page 24: perfSONAR: getting telemetry on your network](https://reader034.vdocuments.us/reader034/viewer/2022042706/5872eb811a28abfa548b71af/html5/thumbnails/24.jpg)
Dual-stack perfSONAR measurements
»IPv6 rollout is slow but steady
»Assumption (hope) that future campus upgrades will include provision of IPv6
»perfSONAR supports IPv4 and IPv6 measurements
»Can leave perfSONAR hosts to default to using IPv6 if it exists, but then it is not always clear which is in use
»Otherwise can force with "ipv6_only": "1" parameter
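As a rough sketch of how the forcing parameter might be applied, the snippet below adds the "ipv6_only": "1" setting to a test-parameter block and prints it as JSON. Only the "ipv6_only" key and its value come from the slide; the other keys and values in the example block, and the helper name force_ipv6, are hypothetical placeholders rather than a statement of the real mesh configuration schema.

```python
import json


def force_ipv6(parameters):
    """Return a copy of a test-parameter dict with IPv6 forced on."""
    forced = dict(parameters)
    forced["ipv6_only"] = "1"  # the setting quoted on the slide
    return forced


# Hypothetical parameter block; only "ipv6_only" comes from the slide.
base = {"type": "example-throughput-test", "interval": "21600"}

print(json.dumps(force_ipv6(base), indent=2))
```

Copying the dictionary rather than editing it in place keeps the default (unforced) definition available, so the same base parameters could feed both a default test and an IPv6-only test.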
![Page 25: perfSONAR: getting telemetry on your network](https://reader034.vdocuments.us/reader034/viewer/2022042706/5872eb811a28abfa548b71af/html5/thumbnails/25.jpg)
WLCG/HEPiX IPv6 Working Groups
»The WLCG has an ongoing effort to promote the adoption of IPv6
»Aim to be able to allow sites to offer IPv6-only computing resources to the WLCG by April 2017
»HEPiX/WLCG IPv6 working groups looking into issues
»Developed mesh to track roll-out of IPv6 capable perfSONAR hosts within WLCG
»Currently twenty-one WLCG perfSONAR dual-stack nodes are in the mesh
![Page 26: perfSONAR: getting telemetry on your network](https://reader034.vdocuments.us/reader034/viewer/2022042706/5872eb811a28abfa548b71af/html5/thumbnails/26.jpg)
Dual-stack bandwidth measurements
![Page 27: perfSONAR: getting telemetry on your network](https://reader034.vdocuments.us/reader034/viewer/2022042706/5872eb811a28abfa548b71af/html5/thumbnails/27.jpg)
Dual-stack Traceroute
![Page 28: perfSONAR: getting telemetry on your network](https://reader034.vdocuments.us/reader034/viewer/2022042706/5872eb811a28abfa548b71af/html5/thumbnails/28.jpg)
Oxford Oct 2015
IPv4 ~ 5Gbps
IPv6 ~ 0.5Gbps
![Page 29: perfSONAR: getting telemetry on your network](https://reader034.vdocuments.us/reader034/viewer/2022042706/5872eb811a28abfa548b71af/html5/thumbnails/29.jpg)
Oxford Sept 2016
IPv4 ~ 1.3Gbps
IPv6 ~1.3Gbps
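The two Oxford snapshots can be compared with a simple ratio check, sketched below. The throughput figures are the approximate values quoted on the slides; the 0.8 threshold and the function names are assumptions chosen for illustration, not anything perfSONAR defines.

```python
def ipv6_to_ipv4_ratio(ipv4_gbps, ipv6_gbps):
    """Ratio of IPv6 to IPv4 throughput for one dual-stack measurement."""
    return ipv6_gbps / ipv4_gbps


def lagging(snapshots, threshold=0.8):
    """Labels whose IPv6 throughput falls below threshold * IPv4 throughput."""
    return sorted(
        label
        for label, (ipv4, ipv6) in snapshots.items()
        if ipv6_to_ipv4_ratio(ipv4, ipv6) < threshold
    )


# Approximate (IPv4, IPv6) throughput in Gbps, as quoted on the slides.
snapshots = {
    "Oxford Oct 2015": (5.0, 0.5),
    "Oxford Sept 2016": (1.3, 1.3),
}

print(lagging(snapshots))  # only the Oct 2015 snapshot is flagged
```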
![Page 30: perfSONAR: getting telemetry on your network](https://reader034.vdocuments.us/reader034/viewer/2022042706/5872eb811a28abfa548b71af/html5/thumbnails/30.jpg)
Small perfSONAR node projects
»Data Transfer Zones need well-specified, dedicated hardware to run perfSONAR hosts
»Requires some investment of time and money
»Would be nice to have an easier way to get an idea of network performance
»GÉANT have developed a small perfSONAR node using Gigabyte Brix devices costing about £150-200 each
»Using these in a short, time-limited small perfSONAR node project
»IPv6 included from the start
GÉANT
![Page 31: perfSONAR: getting telemetry on your network](https://reader034.vdocuments.us/reader034/viewer/2022042706/5872eb811a28abfa548b71af/html5/thumbnails/31.jpg)
![Page 32: perfSONAR: getting telemetry on your network](https://reader034.vdocuments.us/reader034/viewer/2022042706/5872eb811a28abfa548b71af/html5/thumbnails/32.jpg)
![Page 33: perfSONAR: getting telemetry on your network](https://reader034.vdocuments.us/reader034/viewer/2022042706/5872eb811a28abfa548b71af/html5/thumbnails/33.jpg)
Small perfSONAR node projects
»Jisc would like to take this project forward
»Will probably use existing image
»Send out small perfSONAR node to users who wish to get a rapid and easy idea of their network performance
»For example, a scientist in a UK institute with slow download of a data set from e.g. Diamond or JASMIN
»Also plan to produce a UK mesh into which these small nodes could be added more or less temporarily
»Training course on how to set up such a mesh being run by GÉANT in Zurich on 4th November 2016
› https://eventr.geant.org/events/2496
![Page 34: perfSONAR: getting telemetry on your network](https://reader034.vdocuments.us/reader034/viewer/2022042706/5872eb811a28abfa548b71af/html5/thumbnails/34.jpg)
Improving diagnostics: Pundit
»A large mesh such as those in use by WLCG contains a lot of useful data
»Should be possible to use network tomography to, for example, identify problematic routers by correlating traceroute and performance data
»PUNDIT project in US aimed at this
»Additional executable installed on perfSONAR host
»More details: http://pundit.gatech.edu and https://indico.cern.ch/event/505613/contributions/2227428/
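The idea of correlating traceroute and performance data can be sketched with a very simple form of boolean tomography: a router that appears on poorly performing paths but on no well-performing path is a suspect. This is an illustrative toy under that stated assumption, not PUNDIT's actual algorithm; the router names and the good/bad flags are made up.

```python
from collections import defaultdict


def suspect_routers(paths):
    """Rank routers seen on bad paths but on no good path.

    paths: iterable of (hops, is_bad), where hops is a list of router
    names from a traceroute and is_bad marks poor measured performance.
    Returns (router, bad_path_count) tuples, most implicated first.
    """
    bad_counts = defaultdict(int)
    seen_on_good = set()
    for hops, is_bad in paths:
        for hop in set(hops):  # count each router once per path
            if is_bad:
                bad_counts[hop] += 1
            else:
                seen_on_good.add(hop)
    suspects = {h: n for h, n in bad_counts.items() if h not in seen_on_good}
    return sorted(suspects.items(), key=lambda item: (-item[1], item[0]))


# Made-up traceroutes: both bad paths share r2, both good paths avoid it.
paths = [
    (["r1", "r2", "r3"], True),
    (["r4", "r2", "r5"], True),
    (["r1", "r6", "r3"], False),
    (["r4", "r6", "r5"], False),
]

print(suspect_routers(paths))  # [('r2', 2)]
```

The more paths a mesh provides, the more routers can be cleared by appearing on a good path, which is why a large mesh makes this kind of inference more useful.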
![Page 35: perfSONAR: getting telemetry on your network](https://reader034.vdocuments.us/reader034/viewer/2022042706/5872eb811a28abfa548b71af/html5/thumbnails/35.jpg)
Summary
»perfSONAR is a valuable resource for characterising and diagnosing network performance
»Bandwidth nodes typically record throughput and traceroute data; latency nodes record latency and loss
»Network administrators should consider installing several at pertinent places, e.g. at the border, next to storage, etc.
»Meshes together with MaDDash dashboards allow relatively easy monitoring of groups of hosts
»Future perfSONAR meshes should include IPv6
»Development work is ongoing to improve the automatic notification and diagnosis of network faults