next-gen network telemetry is within your packets: in-band oam
TRANSCRIPT
PowerPoint Presentation
Next-gen Network Telemetry is Within Your Packets: In-band OAMFrank Brockners, Shwetha Bhandari
2016 Cisco and/or its affiliates. All rights reserved. Cisco Public
19/11/2016Cisco Live 2014
Continous & Always-OnOn DemandChecking Health and Compliance2
2016 Cisco and/or its affiliates. All rights reserved. Cisco Public
2016 Cisco and/or its affiliates. All rights reserved. Cisco Public
2Cisco Live 20149/11/2016
Continous & Always-OnOn DemandChecking Health and Compliance
3
2016 Cisco and/or its affiliates. All rights reserved. Cisco Public
2016 Cisco and/or its affiliates. All rights reserved. Cisco Public
3Cisco Live 20149/11/2016
In-Band OAM Solution EcosystemA Peek at the ImplementationIn-Band OAM The TechnologyWhy In-Band OAM? Use-Case ExamplesPrologue: Does your traffic comply?
4
2016 Cisco and/or its affiliates. All rights reserved. Cisco Public
Cisco Live 20169/11/20164
Prolog
Consider TE, Service Chaining, Policy Based Routing, etc...:How do you prove that traffic follows the suggested path?5
2016 Cisco and/or its affiliates. All rights reserved. Cisco Public
2016 Cisco and/or its affiliates. All rights reserved. Cisco Public
5Cisco Live 20149/11/2016
Willis says the first issue is connecting VNFs to the infrastructure. OpenStack does this in a sequential manner, with the sequence serially numbered in the VNF, but the difficulty comes when trying to verify that the LAN has been connected to the correct LAN port, the WAN has been connected to the correct WAN port and so on. "If we get this wrong for a firewall function it could be the end of a CIO's career," says Willis.
http://www.lightreading.com/nfv/nfv-specs-open-source/bt-threatens-to-ditch-openstack/d/d-id/718735October 14, 2015Light reading citing Peter Willis, Chief researcher for data networks, British Telecom
6
2016 Cisco and/or its affiliates. All rights reserved. Cisco PublicEnsuring Path and/or Service Chain Integrity Approach Meta-data added to all user trafficBased on Share of a secretProvisioned by controller over secure channelUpdated at every service hopVerifier checks whether collected meta-data allows retrieval of secretPath verified
ControllerSecret
X
BCA
Verifier7
2016 Cisco and/or its affiliates. All rights reserved. Cisco PublicInitial Idea: Combine multiple secretsCompose the OnionApproachA service is described by a set of secrets, where each secret is associated with a service function. Service functions encrypt portions of the meta-data as part of their packet processing.Only the verifying node has access to all secrets. The verifying nodes re-encrypts the meta-data to validate whether the packet correctly traversed the service chain. Notes To be used only when hardware assisted encryption is available. i.e. AES-NI instructions or equivalent. Otherwise this could be very costly operation to verify at line speed.
S1S2S3Service-Secrets are nested like layers of an onion8
2016 Cisco and/or its affiliates. All rights reserved. Cisco PublicAdvanced Encryption Standard(AES) NI (New Instructions)8
Solution Approach: Leveraging Shamirs Secret Sharing Polynomials 101- Line: Min 2 points- Parabola: Min 3 pointsGeneral: It takes k+1points to defines apolynomialofdegree k.
9- Cubic function: Min 4 points
Adi Shamir
2016 Cisco and/or its affiliates. All rights reserved. Cisco PublicSolution Approach: Leverage Shamirs Secret SharingA polynomial as secretEach service is given a point on the curve When the packet travels through each service it collects these pointsA verifier can reconstruct the curve using the collected pointsOperations done over a finite field (mod prime) to protect against differential analysis
(3,46)(2,28)(1,16)
X
BCASecret:
Verifier10
2016 Cisco and/or its affiliates. All rights reserved. Cisco PublicOperationalizing the SolutionLeverage two polynomials: POLY-1 secret, constant: Each hop gets a point on POLY-1Only the verifier knows POLY-1POLY-2 public, random and per packet.Each hop generates a point on POLY-2 each time a packet crosses it.Each service function calculates (Point on POLY-1 + Point on POLY-2) to get (Point on POLY-3) and passes it to verifier by adding it to each packet. The verifier constructs POLY-3 from the points given by all the services and cross checks whether POLY-3 = POLY-1 + POLY-2Computationally efficient: 2 additons, 1 multiplication, mod prime per hopPOLY-1Secret ConstantPOLY-2Public Per Packet+=POLY-3Secret Per Packet11
2016 Cisco and/or its affiliates. All rights reserved. Cisco PublicMeta Data for Service/Path VerificationVerification secret is the independent coefficient of POLY-1Computation/retrieval through a cumulative computation at every hop (cumulative)For POLY-2 the independent coefficient is carried within the packet (typically a combination of timestamp and random number)n bits can service a maximum of 2n packetsVerification secret and POLY-2 coefficient (random) are of the same sizeSecret size is bound by prime numberTransferRateRND/SecretSizeMax # of packets(assuming 64 byte packets)Time that random lasts at maximum1 Gbps6410 Gbps64100 Gbps6410 Gbps5610 Gbps4810 Gbps401 Gbps322200 seconds, 36 minutes10 Gbps32220 seconds, 3.5 minutes100 GBps3222 seconds
12
2016 Cisco and/or its affiliates. All rights reserved. Cisco PublicMeta-Data ProvisioningMeta-Data for Service Chain Verification (POT) provisioned through a controller (OpenDaylight App)Netconf/YANG based protocolProvisioned information from Controller to Service Function / VerifierService-Chain-Identifier (to be mapped to service chaining technology specific identifier by network element)Service count (number of services in the chain)2 x POT-key-set (even and odd set)Secret (in case of communication to the verifier)Share of a secret, service index2nd polynomial coefficientsPrime number
Service Chain Verification App
S3Verifier
S2
S1POT-key-sets:Primesecret sharepoly-2 POT-key-sets:Primesecret sharepoly-2 POT-key-sets:Primesecret sharepoly-2 POT-key-sets:SecretPrimesecret sharepoly-2 Verification request for a particular service chain13
2016 Cisco and/or its affiliates. All rights reserved. Cisco Public
16* Bytes of Meta-Data for POTRandom Unique random number (e.g. Timestamp or combination of Timestamp and Sequence number)Cumulative (algorithm dependent)Transport options for different protocolsSegment Routing: New TLV in SRH headerNetwork Service Header: Type-2 Meta-DataIn-band OAM for IPv6: Proof-of-transit extension headerVXLAN-GPEProof-of-transit embedded-telemetry header... more to be added (incl. IPv4)
Proof of Transit: Meta-Data Transport Options
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Random | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Random (contd) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Cumulative | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Cumulative (contd) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*Note: Smaller numbers are feasible, but require a more frequent renewal of the polynomials/secrets. 2nd PolynomialInterative computation of secret
14
2016 Cisco and/or its affiliates. All rights reserved. Cisco Public
15enter: In-Band OAM
2016 Cisco and/or its affiliates. All rights reserved. Cisco Public
2016 Cisco and/or its affiliates. All rights reserved. Cisco Public
15Cisco Live 20149/11/2016
In-band OAMOAM traffic embedded in the data traffic but not part of the payload of the packetOAM effected by data trafficExample: IPv4 route recordingOut-of-band OAMOAM traffic is sent as dedicated traffic, independent from the data traffic (probe traffic)OAM not effected by data trafficExamples: Ethernet CFM (802.1ag), Ping, TracerouteHow to send OAM information in packet networks?On-Board Unit
Speed control by police car
16
2016 Cisco and/or its affiliates. All rights reserved. Cisco PublicNote the difference to transport OAM (SDH/SONET): In SONET/SDH there is a constant flow of framesOAM is a bit-stream/data between the data encoding sublayer and the physical media and present in every frame, but not part of the data payloadThis mode of OAM transport is often referred to as out of band, because OAM has its own channel but still resides side by side with the payload data (if present)
16
Remember RFC 791?
17
2016 Cisco and/or its affiliates. All rights reserved. Cisco PublicMore recently: In-band Network Telemetry for P4
http://p4.org/wp-content/uploads/fixed/INT/INT-current-spec.pdf 18
2016 Cisco and/or its affiliates. All rights reserved. Cisco Public
In-Band/Passive OAM - MotivationMultipath Forwarding debug ECMP networksService/Path Verification prove that traffic follows a pre-defined pathService/Quality Assurance Prove traffic SLAs, as opposed to probe-traffic SLAs; Overlay/UnderlayDerive Traffic MatrixCustom/Service Level TelemetryMost large ISP's prioritize Speedtest traffic and I would even go as far to say they probably route it faster as well to keep ping times low.Source: https://www.reddit.com/r/AskTechnology/comments/2i1nxc/can_i_trust_my_speedtestnet_results_when_my_isp/
19
2016 Cisco and/or its affiliates. All rights reserved. Cisco Public
Example use-cases...Path Tracing for ECMP networksService/Path VerificationDerive Traffic MatrixSLA proof: Delay, Jitter, LossCustom data: Geo-Location,..
Meta-data required...Node-ID, ingress i/f, egress i/fProof of Transit (random, cumulative)Node-IDSequence numbers, TimestampsCustom meta-data
What if you could collect operational meta-data within your traffic?
20
2016 Cisco and/or its affiliates. All rights reserved. Cisco PublicExample: Flow Tracing in EMCP NetworksProbe packet (ping) tests the wrong pathTrace all pathsand detect the one with issues
21
2016 Cisco and/or its affiliates. All rights reserved. Cisco PublicExample: Derive the Network Traffic Matrix
22
2016 Cisco and/or its affiliates. All rights reserved. Cisco Public23Further Examples
Loss/Re-ordering Detection
Add sequence number Check sequence number Multicast
Delay measurements andtrend analysis
Link 1Link 2Link 4Delay Link1Delay Link2Delay Link3Delay Link4t = 11.2ms14.8ms3.8ms24.8mst = 21.2ms14.8ms3.8ms24.8mst = 31.2ms14.7ms3.8ms24.7mst = 41.2ms14.8ms3.8ms24.8mst = 51.2ms14.8ms3.8ms24.8mst = 61.2ms17.8ms3.8ms24.7mst = 71.2ms17.9ms3.8ms24.7mst = 81.2ms18.1ms3.8ms24.8ms
Link 3
2016 Cisco and/or its affiliates. All rights reserved. Cisco PublicIOT examples Industrial automation where networks are delay sensitive and can employ bicasting of traffic when delay / errors are detected in a give pathTimestamp 64 bit NTP timestamp seconds and pico seconds since epoch 1st jan 1970 0:0 UTC23
Example Generic customer meta-data: Geo-Location within your packets
24
2016 Cisco and/or its affiliates. All rights reserved. Cisco PublicAmending the OAM capabilities of IP
CapabilityExisting ToolsAdditionsContinuity CheckBFDLight-weight continuity check (check without sending extra traffic)Connectivity VerificationPing / ICMP echoEMCP supportAcknowledge different packet forwarding paths in routersPath Discovery andVerificationTracerouteEMCP supportAcknowledge different packet forwarding paths in routersProve correctness/integrity of forwarding pathDefect IndicationsIndicate if a forwarding policy (service chain) has not been metPerformance MonitoringIPPM (delay/packet loss)Metrics for live data traffic (delay, packet loss)
25
2016 Cisco and/or its affiliates. All rights reserved. Cisco PublicIn-Band OAM (iOAM)26
Gather telemetry and OAM information along the path within the data packet, as part of an existing/additional headerNo extra probe-traffic (as with ping, trace, ipsla)Transport optionsIPv6: Native v6 HbyH extension header or double-encapVXLAN-GPE: Embedded telemetry protocol headerSRv6: Policy-Element (proof-of-transit only)NSH: Type-2 Meta-Data (proof-of-transit only)... additional encapsulations being considered (incl. IPv4, MPLS)DeploymentDomain-ingress, domain-egress, and select devices within a domaininsert/remove/update the extension headerInformation export via IPFIX/Flexible-Netflow/publish into KafkaFast-path implementationHdriOAMPayload
iOAM domainNode-ID, Node-ID,..Ingress/Egress I/F, ..Timestamp, Timestamp,..Proof of TransitSequence numberGeneric customer data
2016 Cisco and/or its affiliates. All rights reserved. Cisco Public
IPFIX13456
2ACBDPayloadHdrPayloadHdrPayloadHdrr=45/c=17
A
1C4PayloadHdrr=45/c=39BA61C4
Insert POTmeta-dataPayloadHdrr=45/c=0
A
1
POT meta-dataPath-tracing data
Update POTmeta-data
Update POT meta-data
Update POT meta-dataPOT Verifier27
2016 Cisco and/or its affiliates. All rights reserved. Cisco PublicPer node scopeHop-by-Hop information processingDevice_Hop_LNode_IDIngress Interface IDEgress Interface IDTime-StampApplication Meta DataSet of nodes scopeHop-by-Hop information processingService Chain Validation (Random, Cumulative)Edge to Edge scopeEdge-to-Edge information processingSequence Number
iOAM: Information carried28
2016 Cisco and/or its affiliates. All rights reserved. Cisco PublicTracing Option 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Option Type | Opt Data Len |IOAM-trace-type| Elements-left | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+