perfSONAR-based Network Research
Brian Tierney, ESnet, [email protected], 2016
What is perfSONAR?
• perfSONAR is a tool to:
  – Set (hopefully raise) network performance expectations
  – Find network problems (“soft failures”)
  – Help fix these problems
• All in multi-domain environments
• These problems are all harder when multiple networks are involved
  – Focus on Research and Education (R&E) networking, 1 Gbps links or higher
• perfSONAR provides a standard way to publish active and passive monitoring data
  – This data is interesting to network researchers as well as network operators
Current perfSONAR Components
• Measurement tools
  – iperf3, bwctl, owamp, traceroute, paris-traceroute, etc.
• Measurement archive
• Central test mesh management tools
• Host management tools
  – Configure tests, configure NTP, etc.
• Data analysis tools
  – Plot data from the archive
  – Dashboard tools
• Lookup Service
February 10, 2016
Hard vs. Soft Failures
• “Hard failures” are the kind of problems every organization understands
  – Fiber cut
  – Power failure takes down routers
  – Hardware ceases to function
• Classic monitoring systems are good at alerting hard failures
  – i.e., NOC sees something turn red on their screen
  – Engineers paged by monitoring systems
• “Soft failures” are different and often go undetected
  – Basic connectivity (ping, traceroute, web pages, email) works
  – Performance is just poor
• How much should we care about soft failures?
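One way to make the hard/soft distinction concrete: a soft failure shows up as throughput that stays well below its own history while connectivity checks still pass. The sketch below is a minimal illustration, not perfSONAR code. The function name, the 7-sample windows, and the 50% threshold are all assumptions chosen for the example.

```python
from statistics import median

def detect_soft_failure(throughput_gbps, baseline_n=7, recent_n=7, drop_ratio=0.5):
    """Return True if recent throughput has dropped well below the baseline.

    throughput_gbps: chronological list of test results in Gb/s.
    baseline_n, recent_n and drop_ratio are illustrative assumptions.
    """
    if len(throughput_gbps) < baseline_n + recent_n:
        return False  # not enough history to judge
    baseline = median(throughput_gbps[:baseline_n])   # oldest samples
    recent = median(throughput_gbps[-recent_n:])      # newest samples
    return recent < drop_ratio * baseline

# Hypothetical daily results: one healthy path, one slowly degrading path.
healthy = [9.4, 9.3, 9.5, 9.4, 9.2, 9.4, 9.3, 9.4, 9.3, 9.5, 9.4, 9.3, 9.4, 9.2]
degrading = [9.4, 9.3, 9.5, 9.4, 9.2, 9.4, 9.3, 8.1, 6.5, 5.0, 3.9, 3.1, 2.4, 1.8]

print(detect_soft_failure(healthy))
print(detect_soft_failure(degrading))
```

Medians are used instead of means so that a single bad test run does not trigger an alert; a real deployment would tune the window and threshold per path.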
6/2/15
Main perfSONAR role: Find “Soft Failures”
[Figure: throughput (Gb/s) over one month, annotated with normal performance, degrading performance, and repair]
Examples: gradually failing optics; under-powered firewall device
perfSONAR History
• perfSONAR can trace its origin to the Internet2 “End2End Performance Initiative” from the year 2000.
• What has changed since 2000?
  – The Good News:
    • TCP is much less fragile; Cubic is the default CC algorithm, and autotuning and larger TCP buffers are everywhere
    • Reliable parallel transfers via tools like Globus Online
    • High-performance UDP-based commercial tools like Aspera
    • More good news in the latest Linux kernel, but it will take 3-4 years before this is widely deployed
  – The Bad News:
    • The wizard gap is still large
    • Under-buffered switches and routers are still common
    • Under-powered/misconfigured firewalls are common
    • Soft failures still go undetected for months
    • User performance expectations are still too low
The perfSONAR Collaboration
• The perfSONAR collaboration is an open source project led by ESnet, Internet2, Indiana University, and GEANT.
  – Each organization has committed 1.5 FTE effort to the project
  – Plus additional help from many others in the community (OSG, RNP, SLAC, and more)
• The perfSONAR roadmap is influenced by
  – requests on the project issue tracker
  – annual user surveys sent to everyone on the user list
  – regular meetings with VOs using perfSONAR, such as the WLCG and OSG
  – discussions at various perfSONAR-related workshops
• Based on the above, every 6-12 months the perfSONAR governance group meets to prioritize features based on:
  – impact to the community
  – level of effort required to implement and support
  – availability of someone with the right skill set for the task
Public perfSONAR Servers (Jan 2016)
• Total of around 1700 publicly registered servers
  – Equal number of non-registered servers?
• ESnet: 70
  – mostly 10G, includes a 40G host in Boston
• GEANT: 22
• Internet2: 3
• Some other top deployments:
  – Onenet (24), AMPATH (8), bc.net (10), RNP (8), Canarie (13), kreonet (14), NERO (12), AARnet (19), JGN (17), CENIC (5), KANREN (5)
© 2016, http://www.perfsonar.net
perfSONAR Hardware
• These days you can get a good 1U host capable of pushing 10 Gbps TCP for around $500 (+ 10G NIC cost, $750?).
  – See the perfSONAR user list
• And you can get a host capable of 1G for around $150!
  – Get a multi-core Intel Celeron-based host
    • ARM is not fast enough
  – e.g.: ZBOX by ZOTAC: https://www.zotac.com/us/product/mini_pcs/zbox-ci323-nano
• VMs are not recommended
  – Tools are more accurate if you can guarantee NIC isolation
perfSONAR 3.5 Update
• perfSONAR 3.5 released October 2015
  – Modernize the GUIs
  – Support for central host management and node auto-configuration
  – Support for Debian, VMs, and other installation options
  – Support for low cost ($150), 1 Gbps nodes
Expanded perfSONAR Use Cases
• Previous Use Case: perfSONAR Toolkit
  – Includes CentOS 6 and all perfSONAR components
• New Use Cases
  – perfSONAR tools only
    • Support for both RHEL-based and Debian-based hosts
  – perfSONAR hosts that are centrally managed
    • Central manager package
    • Test point package
perfSONAR for Network Researchers
• Active measurement data is interesting for network researchers
  – Traceroute data automatically collected along with bwctl/owamp results
  – TCP retransmits as measured by iperf3
• Data is easy to download for analysis
  – esmond-ps-get-bulk
    • Output CSV or JSON
    • See: https://pypi.python.org/pypi/esmond_client
• Additional information at:
  – http://docs.perfsonar.net/client_apis.html
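Once data has been downloaded as JSON, a few lines of standard-library Python are enough for a first look. The sketch below is illustrative only: it assumes each throughput series is a JSON list of points with a "ts" (epoch seconds) and a "val" (bits per second) field, and the sample values are made up. Check the actual files written by esmond-ps-get-bulk before relying on this shape.

```python
import json
from statistics import median

# Hypothetical sample in the assumed shape; real files may differ.
SAMPLE_JSON = """
[
  {"ts": 1455062400, "val": 9410000000},
  {"ts": 1455084000, "val": 9380000000},
  {"ts": 1455105600, "val": 4120000000}
]
"""

def summarize_throughput(json_text):
    """Summarize points given as [{"ts": epoch_s, "val": bits_per_s}, ...]."""
    points = json.loads(json_text)
    gbps = [p["val"] / 1e9 for p in points]  # convert bits/s to Gb/s
    return {
        "samples": len(gbps),
        "min_gbps": min(gbps),
        "median_gbps": median(gbps),
        "max_gbps": max(gbps),
    }

summary = summarize_throughput(SAMPLE_JSON)
print(summary)
```

A large gap between the minimum and the median, as in this made-up sample, is the kind of signal worth correlating with the traceroute data collected alongside it.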
perfSONAR on Low Cost Hardware
• Motivation: make perfSONAR affordable enough to deploy on all subnets
• Assumptions:
  – 1 Gbps test nodes
  – Centralized measurement archive
  – Centralized configuration management
  – Debian Linux
Current perfSONAR Development
• One of the themes for v3.6 will be “Control and Scalability”
  – perfSONAR is successful because of the ‘default open’ model.
  – BUT, as the number of perfSONAR hosts worldwide grows, we need a way to control
    • Who is running tests
    • How often they are allowed to run tests
    • What hosts can I run tests to? How do I get my host added to someone else’s list of allowed hosts?
• Working on a new test scheduler (psScheduler):
  – Shared by all tests and aware of the resources each uses
  – Containing finer grained controls about who can run tests and what tests they are allowed to run.
  – Increased visibility and control as to when tests will be run
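To illustrate what "aware of the resources each uses" and "who can run tests" could mean in practice, here is a toy scheduler. It is not psScheduler's design or API: the class, its fields, and the rule that throughput tests need the link exclusively while other tests may overlap are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field

@dataclass
class Scheduler:
    """Toy scheduler: 'exclusive' test types may not overlap any other test."""
    allowed: dict                                    # requester -> set of permitted test types
    exclusive: set = field(default_factory=lambda: {"throughput"})
    booked: list = field(default_factory=list)       # (start, end, test_type)

    def _conflicting(self, start, end, test_type):
        """Bookings that overlap [start, end) and cannot share the host with test_type."""
        return [
            (b_start, b_end, b_type)
            for b_start, b_end, b_type in self.booked
            if start < b_end and b_start < end
            and (test_type in self.exclusive or b_type in self.exclusive)
        ]

    def request(self, requester, test_type, earliest, duration):
        """Return the granted start time, or None if the requester is not permitted."""
        if test_type not in self.allowed.get(requester, set()):
            return None
        start = earliest
        while True:
            clashes = self._conflicting(start, start + duration, test_type)
            if not clashes:
                break
            # Push the start past every clashing booking and check again.
            start = max(b_end for _, b_end, _ in clashes)
        self.booked.append((start, start + duration, test_type))
        return start

# Hypothetical requesters and limits.
sched = Scheduler(allowed={"alice": {"throughput", "latency"}, "guest": {"latency"}})
print(sched.request("alice", "throughput", 0, 20))
print(sched.request("guest", "latency", 10, 30))
print(sched.request("guest", "throughput", 0, 20))
```

The third request is refused because the hypothetical "guest" account is only allowed latency tests; the second is delayed until the exclusive throughput test has finished.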
Roadmap for v3.6
• A test scheduler (psScheduler):
  – Shared by all tests and aware of the resources each uses
  – Containing finer grained controls about who can run tests and what tests they are allowed to run.
  – Increased visibility and control as to when tests will be run
• New graphs that allow for easier comparison of multiple metrics
  – based on the ESnet Tools team's React-based plotting tools
• A web interface for creating test meshes
• Easier selection of endpoints based on topology location, geographic location, accessibility and/or custom searches
• Dashboards that support alerting based on patterns across an entire mesh
• Debian 8 support
• CentOS 7 versions of the tools, testpoint, core, and central management bundles
  – Full CentOS 7 Toolkit will be in the next release
• Pre-packaged perfSONAR VM images
Example perfSONAR Research Projects

perfSONAR Control Plane (PSCP) Project
Prof. Yan Luo ([email protected])
perfSONAR Control Plane (PSCP)
• Objectives
  – Measurement Archive Data Analysis
    • What are the measurement results? What can we learn?
  – Automatic perfSONAR Peer Selection
    • Quickly identify the most suitable PS node(s) on the routes in question
  – Programmable Measurement and Troubleshooting
    • Define measurement tasks and conditions with software
• The Design of the perfSONAR Control Plane
  – Path Discovery
  – Measurement Task Control
https://github.com/ACANETS/pscp
Prof. Yan Luo ([email protected])
Operation and Use Case of PSCP
• Obtain traceroute information from 95 perfSONAR Measurement Archives
• Build a traceroute graph based on the 1831 records
• Find a set of perfSONAR node pairs to start bandwidth tests and monitor the results
• Use Case: Diagnostic analysis and troubleshooting of a soft network link failure
  – <= 300 LOC of Python code, <= 15 minutes
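The graph-building and pair-selection steps above can be sketched in a few lines. This is not the PSCP code (see the project repository for that); the record shape of (source, destination, hop list), the host names, and the router labels are all hypothetical.

```python
from collections import defaultdict

def build_link_index(records):
    """Map each directed link (hop_a, hop_b) to the (src, dst) pairs whose path crosses it.

    records: iterable of (src, dst, [hop, ...]) tuples; this shape is an assumption.
    """
    index = defaultdict(set)
    for src, dst, hops in records:
        path = [src] + list(hops) + [dst]
        for a, b in zip(path, path[1:]):   # consecutive hops form a link
            index[(a, b)].add((src, dst))
    return index

def pairs_covering(index, link):
    """Return the host pairs whose measured path crosses the link, sorted for stable output."""
    return sorted(index.get(link, set()))

# Hypothetical traceroute records between perfSONAR hosts.
records = [
    ("ps-a.example.org", "ps-b.example.org", ["r1", "r2", "r3"]),
    ("ps-a.example.org", "ps-c.example.org", ["r1", "r2", "r4"]),
    ("ps-d.example.org", "ps-b.example.org", ["r5", "r3"]),
]
index = build_link_index(records)
print(pairs_covering(index, ("r1", "r2")))
print(pairs_covering(index, ("r2", "r3")))
```

Given a suspect link, the pairs returned are candidates for starting bandwidth tests across it.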
Pythia Network Diagnosis Infrastructure (PuNDIT)
PIs: Shawn McKee ([email protected]) and Constantine Dovrolis ([email protected])
About PuNDIT
• PuNDIT is an NSF SSI project which uses perfSONAR data to identify and localize network problems (2014-2016)
  – The goal is to automate watching/analyzing perfSONAR metrics
    • Inform users/site admins when there are real network problems they should address.
• See further details at http://pundit.gatech.edu
• User GUI mock-up: http://punditui.aglt2.org/
PuNDIT Architecture
• perfSONAR provides the base measurement infrastructure
• Collects network metrics like latency, loss and reordering
• Collects topological information
• Adds scamper support to perfSONAR: Multipath Detection Algorithm (MDA) from the paris-traceroute team to handle load balanced paths
• A lightweight PuNDIT process on each host performs detection
• The central server holds the event repository and runs a localization algorithm
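The slides do not describe the localization algorithm itself. As a rough intuition for what a central localization step can do with per-host detections plus topology, the sketch below uses a simple set-based heuristic: a link is suspect if it lies on every path flagged as problematic and on no healthy path. This is a stand-in technique, not PuNDIT's algorithm, and the path and link names are hypothetical.

```python
def localize(paths, problem_paths):
    """Return links common to all problem paths and absent from all healthy paths.

    paths: dict mapping path_id -> list of link names (from topology data).
    problem_paths: set of path_ids flagged by per-host detection.
    """
    bad = [set(links) for pid, links in paths.items() if pid in problem_paths]
    good = [set(links) for pid, links in paths.items() if pid not in problem_paths]
    if not bad:
        return set()
    suspects = set.intersection(*bad)   # links shared by every problem path
    for links in good:
        suspects -= links               # a link on a healthy path is exonerated
    return suspects

# Hypothetical topology: three measured paths over five links.
paths = {
    "A-B": ["l1", "l2", "l3"],
    "A-C": ["l1", "l4"],
    "D-B": ["l5", "l3"],
}
print(localize(paths, {"A-B", "D-B"}))
```

The heuristic assumes a single faulty link and reliable detections; handling multiple simultaneous faults or load balanced paths needs more than this.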
PuNDIT Details
[Figure: PuNDIT pipeline stages]
• Gather and Analyze Network Topologies
• Collect Network Metrics
• Detect Problem Signatures
• Localize Problematic Links
Email Lists and Reference Materials
Active and Growing perfSONAR Community
• Active email lists and forums provide:
  – Instant access to advice and expertise from the community.
  – Ability to share metrics, experience and findings with others to help debug issues on a global scale.
• Joining the community automatically increases the reach and power of perfSONAR
  – More endpoints mean exponentially more ways to test and discover issues and to compare metrics
perfSONAR Community
• The perfSONAR collaboration is working to build a strong user community to support the use and development of the software.
• perfSONAR Mailing Lists
  – Announcement List:
    • https://mail.internet2.edu/wws/subrequest/perfsonar-announce
  – Users List:
    • https://mail.internet2.edu/wws/subrequest/perfsonar-users
Useful URLs
• http://docs.perfsonar.net/
• http://www.perfsonar.net/
• http://fasterdata.es.net/
  – http://fasterdata.es.net/performance-testing/network-troubleshooting-tools/
• https://github.com/perfsonar
  – https://github.com/perfsonar/project/wiki
Extra Slides
bwctl Features
• BWCTL lets you run any of the following between any 2 perfSONAR nodes:
  – iperf3, iperf, nuttcp, ping, owping, traceroute, and tracepath
• Sample Commands:
  – bwctl -c psmsu02.aglt2.org -s elpa-pt1.es.net -T iperf3
  – bwping -s atla-pt1.es.net -c ga-pt1.es.net
  – bwping -E -c www.google.com
  – bwtraceroute -T tracepath -c lbl-pt1.es.net -l 8192 -s atla-pt1.es.net
  – bwping -T owamp -s atla-pt1.es.net -c ga-pt1.es.net -N 1000 -i .01
A small amount of packet loss makes a huge difference in TCP performance
[Figure: TCP performance by path distance category (Local (LAN), Metro Area, Regional, Continental, International). Series: Measured (TCP Reno), Measured (HTCP), Theoretical (TCP Reno), Measured (no loss)]
With loss, high performance beyond metro distances is essentially impossible
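The theoretical TCP Reno behavior is commonly approximated with the Mathis et al. bound, throughput <= (MSS / RTT) * (C / sqrt(p)), where p is the packet loss rate. The sketch below evaluates that bound to show why loss hurts more as distance (round-trip time) grows. The constant C = 1, the 1460-byte MSS, the loss rate of 0.01%, and the round-trip time per distance category are assumptions for illustration, not values taken from the figure.

```python
from math import sqrt

def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate, c=1.0):
    """Upper bound on TCP Reno throughput in bits/s: (MSS / RTT) * (C / sqrt(p))."""
    return (mss_bytes * 8 / rtt_s) * (c / sqrt(loss_rate))

# Illustrative round-trip times per distance category (assumed, not measured).
rtt_ms = {
    "Local (LAN)": 1,
    "Metro Area": 5,
    "Regional": 20,
    "Continental": 50,
    "International": 150,
}

for name, ms in rtt_ms.items():
    bps = mathis_throughput_bps(mss_bytes=1460, rtt_s=ms / 1000, loss_rate=1e-4)
    print(f"{name:15s} {bps / 1e6:8.1f} Mb/s")
```

With these assumed values the bound falls from roughly 1.17 Gb/s at 1 ms to under 8 Mb/s at 150 ms, because the bound scales with 1/RTT for a fixed loss rate.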
Improved Support for Central Management
• Goals:
  – Make it easy to incorporate perfSONAR hosts into existing host management systems (puppet, chef, SaltStack, cfengine, etc.)
    • Include sample puppet config files
  – Make it easy to manage many perfSONAR hosts at a single institution
  – New rpm and debian bundles to support this
New perfSONAR Installation Options
• In addition to the traditional “Toolkit” install, you now have these additional options:
  – perfSONAR-Tools:
    • iperf3, bwctl, owamp, nuttcp, etc.
    • Install this on DTNs, etc. to help with troubleshooting
    • Does not support scheduled testing
    • CentOS and Debian support
  – perfSONAR-TestPoint:
    • Tools plus Lookup Service registration and ‘mesh agent’
    • For use in environments with a central measurement archive
    • For use on low end/older hardware (e.g.: $100 nodes)
    • Supports scheduled testing
    • CentOS and Debian support
• See: http://docs.perfsonar.net/install_options.html
New perfSONAR Installation Options (cont.)
• perfSONAR-Core:
  – Includes everything except the web interface
  – Use this in environments where your site sysadmins want to fully manage the host configuration, but don’t want to set up a central measurement archive
  – CentOS only
• perfSONAR-CentralManagement:
  – Includes measurement archive, test mesh manager, dashboard
  – Use this to manage a collection of perfSONAR hosts at your site/campus
  – CentOS only
New perfSONAR Installation Options (cont.)
• perfSONAR-Complete
  – All perfSONAR packages
  – Use this in environments where your sysadmins want to manage the install, but still use the toolkit web interface, system settings, etc.
    • The toolkit install will override certain changes on every update.
  – CentOS only
• Other packages to note:
  – Separate rpms/debs for iptables config, sysctl config, and ntp packages, so you can add them on top of perfSONAR-Core as desired.