Troubleshooting AmLight - Internet2 · Troubleshooting AmLight: Handling Network Events in a...

Post on 22-May-2020


Jeronimo Bezerra, Florida International University

<jab@amlight.net>

Internet2 Technology Exchange, Miami, Sep 26th 2016

Troubleshooting AmLight: Handling Network Events in a Production SDN Environment

Marcos Schwarz, Rede Nacional de Ensino e Pesquisa

<marcos.schwarz@rnp.br>

Outline

•  Introduction to AmLight
•  RFC 7426: SDN Terminology
•  Tests pre-production
•  SDN Topologies
•  What should be monitored?
   –  Control Plane Monitoring
   –  Data Plane Monitoring
•  Future

AmLight is a Distributed Academic Exchange Point
•  Production SDN infrastructure since Aug 2014
•  Partnership involving FIU, NSF, ANSP, RNP, RedClara and AURA
•  Connects two academic exchange points: AMPATH/Miami and SouthernLight/Brazil
•  Carries academic and non-academic/commercial traffic
   –  L2VPN, IPv4, IPv6, Multicast
•  Supports network programmability/slicing
   –  OpenFlow 1.0
   –  FlowSpace Firewall for network programmability/slicing
   –  OESS for L2VPNs
   –  OGF Network Service Interface (NSI) enabled
   –  ONOS/SDN-IP for academic IPv4
   –  Currently 5 slices for experimentation (including Global ONOS SDN-IP)
•  Currently operating with more than 800 flows (production and experimentation)
•  Website: www.sdn.amlight.net

AmLight SDN Stack

[Figure: AmLight SDN stack. Layers shown, bottom to top: Physical Layer (SouthernLight, Ampath1, Ampath2, Andes1, Andes2); Southbound API: OpenFlow 1.0; Virtualization/Slices (FlowSpace Firewall); Northbound: Users' APIs. Other labels in the diagram: NOX, OESS, OSCARS, OpenNSA, ONOS, SDN-IP, NSI, IDCP, FIBRE, AmLight's NRENs, Other NRENs, Internet2, Univ. Twente, Other Testbeds.]

SDN: Layers and Architecture Terminology
•  This presentation will use the SDN terminology standardized through IETF RFC 7426:
   –  Four planes:
      •  Application, Control, Forwarding and Management planes
   –  Interfaces:
      •  Service, Control Plane Southbound and Management Plane Southbound interfaces
   –  Services and Applications

[Figure: RFC 7426 layer architecture. Application Plane (applications and services); Service Interface; Network Services Abstraction Layer (NSAL); Control Plane with the Control Abstraction Layer (CAL); Management Plane with the Management Abstraction Layer (MAL); CP Southbound Interface and MP Southbound Interface; Device and Resource Abstraction Layer (DAL); Forwarding Plane and Operational Plane inside the forwarding device.]

Tests pre-production
•  Before applying any change to the SDN environment, all planes, apps and services need to be validated in a controlled environment
   –  The same software and devices used in production need to be available for tests
•  Many tools and approaches are available, for example OFTest, Ryu Switch Test, Cbench and some commercial options
   –  Some tests might cause instability in the SDN stack (don't try these tests in production)
•  Special attention is required for the Control and Data planes
   –  Many publications with different methodologies and tests

[Figure: the RFC 7426 layer architecture annotated with test tools. Labels: "OFTest, Ryu Switch Test, Cbench, ..."; "OFTest, Ryu Switch Test, ..."; "Unittest, ...".]

Troubleshooting a production SDN network
•  Troubleshooting a production environment has different requirements
   –  It needs to be agile and as non-disruptive as possible
   –  It might need historical information and an understanding of the traffic going through the network
   –  Tools have to be handy
•  Legacy troubleshooting tools are partially useful or completely useless
   –  OAM (Operations, Administration and Maintenance) is not supported by OpenFlow (yet)
   –  Ping, traceroute, SNMP and Wireshark/tcpdump are somewhat compromised
•  Deep knowledge of the hardware and software platform is required:
   –  Using the "hidden" commands becomes part of your routine
•  Suggestion: get a "premium" support contract
   –  Going through the level 2 TAC team will increase your stress and the network recovery time

SDN Topologies: Starting Simple
•  Usually, with just one SDN App, troubleshooting is less complex
   –  One SDN App is connected through an out-of-band network to multiple OF switches
   –  Usually, the SDN App has full control of ports and VLANs
•  A good network sniffer and a Syslog server are the keys to success here
   –  They help validate the OpenFlow messages sent and received
   –  They ease access to error messages

[Figure: a single SDN App in the application layer controlling four forwarding devices over OpenFlow 1.x; User A and User B attach to the forwarding devices.]

SDN Topologies: Adding Complexity
•  Different control planes in parallel tend to be a consequence of slicing
   –  More applications to understand and track
   –  Different levels of software stability and debugging
   –  Higher chances of network outages
•  Slicing/partitioning adds complexity:
   –  OpenFlow communication between the OpenFlow switch and the SDN App is not end-to-end:
      •  OF Switch -> Slicer or Slicer -> OF App
   –  It is complex to track which switch is talking to which SDN App and vice versa
      •  OF doesn't carry the DPID on each OF message
•  "Traditional" sniffers are not enough to track indirect OpenFlow messages

[Figure: sliced topology. OESS, ONOS/SDN-IP and a testbed in the application layer speak OpenFlow 1.0 to FlowSpace Firewall, which speaks OpenFlow 1.0 to four forwarding devices; User A and User B attach to the forwarding devices.]
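Because the DPID is not repeated on every OpenFlow message, a sniffer has to learn it once per TCP connection and remember it. The sketch below shows one way to do that for OpenFlow 1.0, where the DPID is the first 8 bytes after the header of a FEATURES_REPLY (type 6). The function names, the connection key layout and the sample addresses are assumptions made for illustration; this is not the code of any tool named in the talk.

```python
import struct

OFP_HEADER = struct.Struct("!BBHI")  # version, type, length, xid
OFPT_FEATURES_REPLY = 6              # OpenFlow 1.0 message type


def learn_dpid(connections, conn_key, payload):
    """Record the DPID carried by a FEATURES_REPLY seen on conn_key.

    connections maps a hypothetical connection key (for example a TCP
    4-tuple) to the DPID learned on that connection.
    """
    if len(payload) < OFP_HEADER.size + 8:
        return None
    version, msg_type, _length, _xid = OFP_HEADER.unpack_from(payload)
    if version != 0x01 or msg_type != OFPT_FEATURES_REPLY:
        return None
    (dpid,) = struct.unpack_from("!Q", payload, OFP_HEADER.size)
    connections[conn_key] = dpid
    return dpid


def format_dpid(dpid):
    """Render a 64-bit DPID as colon-separated hex bytes."""
    raw = f"{dpid:016x}"
    return ":".join(raw[i:i + 2] for i in range(0, 16, 2))


# Demo with a hand-built FEATURES_REPLY (8-byte header, 8-byte DPID,
# 16 bytes of remaining fixed fields left as zeros).
connections = {}
key = ("10.0.0.1", 45000, "10.0.0.254", 6633)  # made-up addresses
reply = OFP_HEADER.pack(0x01, OFPT_FEATURES_REPLY, 32, 7)
reply += struct.pack("!Q", 0x0000000000000A01) + bytes(16)
learn_dpid(connections, key, reply)
print(format_dpid(connections[key]))
```

With a slicer in the path, the same table would be kept for both legs (switch to slicer, slicer to app), so that later messages on either connection can be attributed to a switch.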

Control Plane: what should be monitored?
•  Everything related to the OpenFlow communication:
   –  # of flows installed
      •  Avoid getting close to the documented limits (weird stuff might happen)
   –  Rate of FlowMods, PacketOut/PacketIn & Stats requests per second:
      •  The switch's CPU is directly affected by these rates
   –  # of OFP_FLOW_ERROR messages:
      •  Some messages might indicate that a crash is about to happen (FULL_TABLE)
   –  Flow duration:
      •  Helps to understand traffic disruption due to flows being reinstalled
   –  Flow and port counters (bps and pps)
      •  If slicing is a reality, collect counters per slice
•  Most SDN apps don't provide such data; some provide it through REST interfaces
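The rate and table-size checks listed above can be sketched as follows. It assumes that cumulative message counters are already being sampled from somewhere (a sniffer, a REST interface); the `Sample` fields, the 80% warning ratio and the numbers in the demo are illustrative assumptions, not values from the talk.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    timestamp: float     # seconds
    flow_mods: int       # cumulative counters since monitoring started
    packet_ins: int
    packet_outs: int
    stats_requests: int


COUNTERS = ("flow_mods", "packet_ins", "packet_outs", "stats_requests")


def per_second_rates(prev, curr):
    """Turn two cumulative samples into per-second message rates."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in chronological order")
    return {
        name: (getattr(curr, name) - getattr(prev, name)) / elapsed
        for name in COUNTERS
    }


def near_table_limit(flows_installed, documented_limit, warn_ratio=0.8):
    """True once the flow table reaches warn_ratio of its documented limit."""
    return flows_installed >= warn_ratio * documented_limit


prev = Sample(0.0, 100, 50, 20, 10)
curr = Sample(10.0, 150, 80, 40, 20)
print(per_second_rates(prev, curr))
print(near_table_limit(flows_installed=820, documented_limit=1000))
```

Keeping the warning ratio well below 100% reflects the advice to stay away from the documented limits rather than discover what happens at them.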

Data Plane: what should be monitored?
Data Plane Monitoring:
•  In some cases, everything looks OK, but traffic is not flowing
•  Some possible data plane black holes:
   –  A specific line card or interface discarding all traffic
      •  Due to an interface memory issue, flows are installed but traffic is discarded
   –  Interface down on one side but up on the remote side, and the SDN App doesn't understand that
      •  For instance: 10G LAN-PHY, Ethernet circuits and 100G long-haul circuits
      •  In this case, depending on the side, the SDN App installs the circuits pointing to the affected link, discarding all traffic
   –  A specific installed flow entry crashed
      •  Due to an interface memory issue, one specific flow is compromised and traffic is discarded
      •  Depending on the number of OpenFlow switches and flow entries, finding the problem might be extremely time-consuming
•  In these cases, in-band tests are required:
   –  Only a very few SDN Apps test in-band per link
   –  No SDN Apps test in-band per flow
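One of the black holes above, a link that is down on one side but up on the other, can at least be flagged by comparing the port state reported by both ends. The sketch below assumes the link list and the per-port state have been collected elsewhere (for example from port status messages or SNMP); the data layout and switch names are hypothetical. It does not replace in-band tests, which are still needed for the other cases.

```python
def find_half_down_links(links, port_is_up):
    """Return the links whose two endpoints disagree about being up.

    links:      iterable of (end_a, end_b), each end a (switch, port) tuple
    port_is_up: dict mapping (switch, port) to True/False
    """
    suspects = []
    for end_a, end_b in links:
        if port_is_up[end_a] != port_is_up[end_b]:
            suspects.append((end_a, end_b))
    return suspects


links = [
    (("sw1", 1), ("sw2", 1)),
    (("sw2", 2), ("sw3", 1)),
]
port_is_up = {
    ("sw1", 1): True, ("sw2", 1): True,
    ("sw2", 2): True, ("sw3", 1): False,  # remote side sees the link down
}
print(find_half_down_links(links, port_is_up))
```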

Disclaimer:

What you are about to see and hear is AmLight's experience. We are not saying these are the best or recommended methods (they probably are not). Don't try them on your network!

Control Plane Monitoring
•  Monitoring the OpenFlow messages with passive packet capture:
   –  Non-intrusive
   –  Almost risk-free
•  Few tools available:
   –  Wireshark/tshark/tcpdump
   –  OpenFlow Flight Recorder
   –  AmLight OpenFlow Sniffer
•  AmLight OpenFlow Sniffer was created to be CLI-based, with support for environments with slicers:
   –  Dissects 100% of OpenFlow 1.0
   –  Doesn't require a GUI or X Window
   –  End-to-end communication visualization
   –  Colors to highlight important fields
   –  Many filters available to optimize troubleshooting!
   –  Source: github.com/jab1982/ofp_sniffer

[Figure: sliced topology (OESS, ONOS/SDN-IP and a testbed above FlowSpace Firewall, OpenFlow 1.0 on both legs, four forwarding devices, Users A and B), annotated "Monitor msgs: OpenFlow Sniffer, OFFR" and "libpcap".]
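To make the passive-capture idea concrete, here is a minimal sketch of the first step any OpenFlow 1.0 dissector performs: splitting a captured TCP payload into messages using the 8-byte header and counting them by type. It is not the AmLight OpenFlow Sniffer's code. It assumes the payloads have already been extracted from a capture, and it ignores messages that span several TCP segments.

```python
import struct
from collections import Counter

OFP_HEADER = struct.Struct("!BBHI")  # version, type, length, xid

# Message types defined by OpenFlow 1.0
OFP10_TYPES = {
    0: "HELLO", 1: "ERROR", 2: "ECHO_REQUEST", 3: "ECHO_REPLY",
    4: "VENDOR", 5: "FEATURES_REQUEST", 6: "FEATURES_REPLY",
    7: "GET_CONFIG_REQUEST", 8: "GET_CONFIG_REPLY", 9: "SET_CONFIG",
    10: "PACKET_IN", 11: "FLOW_REMOVED", 12: "PORT_STATUS",
    13: "PACKET_OUT", 14: "FLOW_MOD", 15: "PORT_MOD",
    16: "STATS_REQUEST", 17: "STATS_REPLY",
    18: "BARRIER_REQUEST", 19: "BARRIER_REPLY",
    20: "QUEUE_GET_CONFIG_REQUEST", 21: "QUEUE_GET_CONFIG_REPLY",
}


def iter_messages(tcp_payload):
    """Yield (type_name, xid, length) for each message in one payload."""
    offset = 0
    while offset + OFP_HEADER.size <= len(tcp_payload):
        version, msg_type, length, xid = OFP_HEADER.unpack_from(
            tcp_payload, offset)
        if version != 0x01 or length < OFP_HEADER.size:
            break  # not OpenFlow 1.0, or a corrupt length field
        yield OFP10_TYPES.get(msg_type, f"UNKNOWN_{msg_type}"), xid, length
        offset += length


def count_types(tcp_payloads):
    """Count message types over an iterable of TCP payloads."""
    counts = Counter()
    for payload in tcp_payloads:
        counts.update(name for name, _xid, _length in iter_messages(payload))
    return counts


# One payload carrying two header-only messages back to back
payload = OFP_HEADER.pack(1, 0, 8, 1) + OFP_HEADER.pack(1, 2, 8, 2)
print(count_types([payload]))
```

Counts like these, sampled over time, are one way to obtain the FlowMod and PacketIn/PacketOut rates discussed earlier without touching the switches or the SDN Apps.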

Control Plane Monitoring [2]
•  Monitoring all applications and counters in a centralized NMS:
   –  Scripts collect info from the SDN Apps' REST interfaces and export it via JSON
   –  Zabbix imports the JSON data and saves it into a MySQL database
   –  Currently collecting data from OESS, ONOS, FSFW and switches
   –  Examples:

[Figure: sliced topology (OESS, ONOS/SDN-IP and a testbed above FlowSpace Firewall, OpenFlow 1.0 on both legs, four forwarding devices, Users A and B), annotated "SNMP, REST, Java API, etc." and "Monitoring: Zabbix + customized scripts".]
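A collection script of the kind described above might look like the following sketch. It is not AmLight's script: the JSON layout, the `sdn.` key prefix, the host name and the sample statistics are assumptions, and no real SDN App endpoint is named. `fetch_json` uses only the standard library and is not called in the demo, so the example runs without a network.

```python
import json
import urllib.request


def fetch_json(url, timeout=5):
    """GET a REST endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def flatten(stats, prefix="sdn"):
    """Turn nested dicts into dotted keys: {'a': {'b': 1}} -> {'sdn.a.b': 1}."""
    items = {}
    for key, value in stats.items():
        name = f"{prefix}.{key}"
        if isinstance(value, dict):
            items.update(flatten(value, name))
        else:
            items[name] = value
    return items


def export_for_nms(host, stats):
    """Serialize flattened stats as JSON rows an NMS importer could read."""
    rows = [
        {"host": host, "key": key, "value": value}
        for key, value in sorted(flatten(stats).items())
    ]
    return json.dumps({"data": rows})


# Stats as they might come back from a (hypothetical) REST call
stats = {"flows": {"installed": 812}, "packet_in_rate": 3}
print(export_for_nms("sw1", stats))
```

Flattening to dotted keys keeps one NMS item per counter, which makes it straightforward to graph and alert on each value separately.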

Data Plane Monitoring
•  Most SDN Apps use LLDP or BDDP for topology discovery
   –  Once the topology is discovered, these protocols are not used to monitor the topology
   –  Also, the interval between LLDP/BDDP packets is not appropriate for link monitoring
•  An in-band testing approach is needed to validate the Data Plane
   –  OESS does this through its Forwarding Verification module
   –  Most other SDN Apps don't have anything equivalent
•  Even though OESS/FVD validates the data path, it doesn't validate users' flows
   –  A full port issue is detected, but a single flow issue is not

[Figure: sliced topology (OESS, ONOS/SDN-IP and a testbed above FlowSpace Firewall, OpenFlow 1.0 on both legs, four forwarding devices, Users A and B), annotated "Monitoring Data plane: Trunk ports: OESS FWD".]
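The per-link, in-band idea can be illustrated with a small state tracker: probes are sent over each link at a short interval, and a link is declared down after several consecutive probes go missing. This is a generic sketch, not the OESS Forwarding Verification implementation; the class name, the threshold of three missed probes and the link identifiers are assumptions, and sending or receiving the probes is left out.

```python
class LinkProbeMonitor:
    """Track consecutive missed in-band probes per link."""

    def __init__(self, max_missed=3):
        self.max_missed = max_missed
        self.missed = {}  # link -> consecutive probes not received

    def record(self, link, probe_received):
        """Record one probe result and return whether the link is up."""
        if probe_received:
            self.missed[link] = 0
        else:
            self.missed[link] = self.missed.get(link, 0) + 1
        return self.is_up(link)

    def is_up(self, link):
        return self.missed.get(link, 0) < self.max_missed


monitor = LinkProbeMonitor(max_missed=3)
link = ("sw1:1", "sw2:1")  # hypothetical link identifier
for received in (True, False, False, False):
    state = monitor.record(link, received)
print("up" if state else "down")
```

As the slide notes, a check like this only validates the link or port; a single broken flow entry on a healthy link would pass unnoticed.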

Data Plane Monitoring [2]
•  Monitoring individual flows is important but extremely complex
   –  Being proactive with all flows is desirable, but the interval between tests and the number of flows need to be taken into consideration
   –  Using a reactive approach is the best suggestion
      •  Users won't be happy, but your switches won't crash
•  Approaches to test users' flows are still considered experimental
   –  An SDNTrace protocol was proposed:
   –  http://sdntrace-protocol.readthedocs.io/en/latest/

[Figure: sliced topology (OESS, ONOS/SDN-IP and a testbed above FlowSpace Firewall, OpenFlow 1.0 on both legs, four forwarding devices, Users A and B), annotated "Monitoring User Flows: SDNTrace".]
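The trade-off between test interval and number of flows is simple arithmetic, sketched below. The 800-flow figure comes from the AmLight description earlier in the talk; the 10-second interval, the one-probe-per-flow assumption and the 20 probes/s budget are made-up values for illustration.

```python
def probes_per_second(num_flows, interval_seconds, probes_per_flow=1):
    """Probe rate needed to test every flow once per interval."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return num_flows * probes_per_flow / interval_seconds


def min_interval(num_flows, max_probe_rate, probes_per_flow=1):
    """Shortest interval that keeps the probe rate within a budget."""
    if max_probe_rate <= 0:
        raise ValueError("probe rate budget must be positive")
    return num_flows * probes_per_flow / max_probe_rate


# Testing 800 flows every 10 seconds
print(probes_per_second(800, 10))   # probes per second
# Staying within a budget of 20 probes per second
print(min_interval(800, 20))        # seconds between full test rounds
```

Since each probe typically costs the switch at least one PacketOut and one PacketIn, the probe rate adds directly to the control plane rates that load the switch CPU, which is why testing all flows proactively has to be weighed against a reactive approach.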

Data Plane Monitoring [3]
•  AmLight developed its own SDNTrace to test users' flows without changing them
   –  Works through a GUI or REST
   –  Very lightweight
   –  Very "cheap": only two to four flow entries needed
   –  Traces L2 and L3 flows
   –  Still under evaluation at AmLight
   –  Developed in collaboration with the Academic Network of Sao Paulo/Brazil
•  Tracing a circuit is done in seconds instead of many minutes, and it can work with both Zabbix and Nagios

GitHub: github.com/amlight/SDNTrace
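To show what a hop-by-hop trace reports, here is a toy, offline walk over a hypothetical model of flow tables and links. It is not SDNTrace's algorithm and it sends no packets; every name and data structure in it is an assumption made for illustration. It follows an (in_port, VLAN) match from switch to switch and reports where the path ends.

```python
def trace(flow_tables, links, start_switch, in_port, vlan, max_hops=32):
    """Follow (in_port, vlan) matches hop by hop.

    flow_tables: {switch: {(in_port, vlan): out_port}}
    links:       {(switch, out_port): (next_switch, next_in_port)}
    Returns (path, outcome), where path lists (switch, in_port, out_port).
    """
    path = []
    switch = start_switch
    for _ in range(max_hops):
        out_port = flow_tables.get(switch, {}).get((in_port, vlan))
        if out_port is None:
            return path, f"no matching flow on {switch}"
        path.append((switch, in_port, out_port))
        next_hop = links.get((switch, out_port))
        if next_hop is None:
            return path, "egress"  # port is not a known inter-switch link
        switch, in_port = next_hop
    return path, "hop limit reached (possible loop)"


flow_tables = {
    "sw1": {(1, 100): 2},
    "sw2": {(3, 100): 4},
}
links = {("sw1", 2): ("sw2", 3)}
print(trace(flow_tables, links, "sw1", in_port=1, vlan=100))
```

A walk over the controller's view like this can only find missing entries; detecting an entry that is installed but not forwarding requires an in-band test in the data plane.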

Future
•  New tools/scripts/protocols are still needed
   –  Still a long and painful journey ahead
   –  OpenFlow-OAM?
•  Improvements to OpenFlow agents are constantly being released
   –  But new bugs are coming with them :(
•  Some SDN monitoring-only applications are being proposed and developed
   –  AmLight is developing its own SDN Looking Glass to consolidate all passive and active monitoring activities associated with the SDN environment (to be released by January)
   –  But side applications are not ideal: it is important that all SDN Applications incorporate troubleshooting capabilities in their core!

Off-topic: Suggestions to Network Engineers
•  What is/will be our position description?
   –  Network Engineers? SDN Engineers? Research Network Engineers?
   –  Maybe Network Engineers 2.0?
   –  The description doesn't matter; what matters is that we have to evolve!
•  With SDN, troubleshooting is very different: instead of using CLI and sniffers, we need to read code and applications' logs
•  Most of us hate software development, but it is time to change our mentality
   –  At AmLight, I don't remember the last time I created a VLAN using a CLI
•  If SDN becomes the next de facto standard, it will happen in a few years
   –  We all still have time to learn and get prepared for this new reality
•  Recommendations:
   –  Learn Python or Java (JavaScript is a plus)
      •  Ryu is a very interesting OpenFlow controller to start with
   –  Join the Ryu or ONOS mailing lists
   –  Mininet is your friend!


Questions?
