Cisco Ultra Packet Core
TRANSCRIPT
Aeneas Dodd-Noble, Principal Engineer
Daniel Walton, Director of Engineering
October 18, 2018
High Performance AND Features
Cisco Ultra Packet Core
© 2018 Cisco and/or its affiliates. All rights reserved.
The World’s Top Networks Rely On Cisco Ultra
• 90+ deployments
• 200 Gbps / system (at par with physical)
• 600M total sessions
• 300M total subscribers
Market Evolution
2016 2020
4G 5G
Cloud Native
Ultra Services Platform
UGP USF UPP
• ASR 5500 Ultra
• Performance parity
• Functional parity
• Automated Lifecycle Management
• Dynamic Distributed Slices
• Micro-Services Architecture
Ultra
• Ultra Platform with CUPS
• 5G Network Functions
• Massive IoT
• Multi-Access
Delivering the 1's
• 1 Gbps at edge
• 1 ms latency
• 1 billion connections
Transition to Virtual Performance
Scale | Distributed Architecture | Slicing
Low Latency | Gig-Speed
Automation | Containers | Micro-Services
Capacity Growth - Fact
• Peak hour throughput, N/A Tier 1: 2.5 Tbps
• Total daily volume, N/A Tier 1: 8 PB, 16 PB (chart years: 2016, 2017)
• Peak hour throughput YoY, APJC Tier 2 and APJC Tier 1: x2.5, x1.8
5G Core is Distributed by Design
5G Logical Layout
SMF, SMF, SMF
Actual Layout
Centralized Services and Connectivity
Highly Distributed and Fragmented Network and Services
Red Slice
Orange Slice
Performance
Background
• Despite being best known for hardware forwarding, Cisco has always built high-performance packet forwarding software:
  • Exception path processing
  • CPU-centric products
• VPP (Vector Packet Processor) began ~2002
• VPP has been incorporated into many hardware and software products and, more recently, has been open-sourced as part of FD.io
What is VPP/FD.io?
FD.io is…
• Project at Linux Foundation
• Multi-party
• Multi-project
• Software Dataplane
• High throughput
• Low Latency
• Feature Rich
• Resource Efficient
• Bare Metal/VM/Container
• Multiplatform
• FD.io Scope:
• Dataplane Management Agents - Control Plane
• Packet Processing - Classify/Transform/Prioritize/Forward/Terminate
• Network IO - NIC/vNIC ↔ cores/threads
Multiparty: FD.io Members
Service Providers | Network Vendors | Chip Vendors | Integrators
Multiparty: Contributor/Committer Diversity
Universitat Politècnica de Catalunya (UPC)
Yandex
Qiniu
Read more at http://fd.io
How does VPP work?
Memory is the enemy
• A single 10GbE port is capable of about 14 Mpps (14.88 Mpps with 64-byte frames)
• On a 3.5GHz CPU core, that leaves roughly 250 cycles per packet
• Each packet must be processed in 67ns
• Main (DDR) memory is 70ns away
  • This is >100 clock cycles away
• On Intel Sandy Bridge CPUs, caches are 4/12/30 clock cycles away
• If we are serious about performance, we must optimize the code for cache and memory operations
• Programming paradigm shift: scalar to vector
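The per-packet budget above can be checked with a little arithmetic. The sketch below is a back-of-envelope calculation only: it assumes minimum-size 64-byte frames plus the standard 20 bytes of per-frame wire overhead (preamble and inter-frame gap), and the function and constant names are illustrative rather than taken from the talk.

```python
# Back-of-envelope packet budget for one 10GbE port (illustrative sketch).
LINK_BPS = 10e9            # 10GbE line rate in bits per second
FRAME_BYTES = 64           # minimum Ethernet frame (assumed worst case)
WIRE_OVERHEAD_BYTES = 20   # preamble + start delimiter (8) and inter-frame gap (12)
CPU_HZ = 3.5e9             # clock rate quoted on the slide
DRAM_NS = 70               # main-memory latency quoted on the slide


def packet_budget(link_bps=LINK_BPS, frame_bytes=FRAME_BYTES, cpu_hz=CPU_HZ):
    """Return (packets per second, ns per packet, CPU cycles per packet)."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    pps = link_bps / bits_on_wire
    ns_per_packet = 1e9 / pps
    cycles_per_packet = cpu_hz / pps
    return pps, ns_per_packet, cycles_per_packet


pps, ns_pp, cycles_pp = packet_budget()
dram_cycles = DRAM_NS * 1e-9 * CPU_HZ  # cost of one trip to main memory, in cycles

print(f"{pps / 1e6:.2f} Mpps, {ns_pp:.1f} ns/packet, {cycles_pp:.0f} cycles/packet")
print(f"one DRAM access costs about {dram_cycles:.0f} cycles")
```

With these assumptions the budget comes out slightly under the rounded 250 cycles on the slide, and a single main-memory access costs more than the whole per-packet budget, which is the point of the slide title.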
Primer: Instruction and Data caches
• Instruction cache (I-Cache)
  • Stores only CPU instructions
  • Holds branch prediction information
  • Helps pre-fetch incoming instructions
• Data cache (D-Cache)
  • Fast buffer that contains application data
  • The processor operates on data loaded from memory into the data cache, then from the cache into the CPU registers
  • Results are stored into registers, then to the cache, and finally to main memory
[Diagram labels: Registers, Instruction pipeline]
Scalar Packet Processing
• Packet processing
  • Ethernet-Input
  • IPv4 Input
  • IPv4 lookup
  • IPv4 transmit
  • ECMP processing
  • LAG processing
  • Transmit
• Processes only a single packet at a time.
• In scalar processing, the whole code cannot fit into the instruction cache.
• The modules that process a packet are loaded into the instruction cache one at a time.
• E.g., 7 modules process a single packet, so 4 packets cause 7*4=28 cache misses.
• This is a high performance hit; the workaround is bigger caches.
Vector Packet Processing
• Process more than one packet at a time.
• Grabs all available packets from Rx device.
• Form a vector of packets (“frame”)
• Process “frame” (vector) using a directed graph of “nodes”
VPP Architecture
[Diagram: a vector of n packets (0, 1, 2, 3, … n) enters the Packet Processing Graph]
• Input graph nodes: dpdk-input, vhost-user-input, af-packet-input
• Graph nodes: ethernet-input, ip4-input, ip6-input, arp-input, mpls-input, ip4-lookup, ip6-lookup, ip4-local, ip4-rewrite, ip6-local, ip6-rewrite, …
• Plugin nodes: custom-1, custom-2, custom-3
Plugins are first-class citizens that can:
• Add graph nodes
• Add API
• Rearrange the graph
• Be built independently of the VPP source tree
Hardware plugin (hw-accel-input):
• Skip software nodes where the work is already done by hardware
How does vector packet processing work?
• Exploits the probability that most packets will follow the same path through the graph
• Fixes I-cache thrashing
• The I-cache is reloaded only when all packets have finished a node
How does vector packet processing work? (continued)
• For example, here 4 packets cause only 7 I-cache reloads, compared with 28 in scalar packet processing.
• Primary problems VPP is solving:
  • Reducing I-cache misses
  • Reducing D-cache misses
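The 28-versus-7 comparison can be reproduced with a toy model. The sketch below assumes an instruction cache that holds the code of exactly one graph node at a time, which is a deliberate simplification (real caches are larger and set-associative). The node labels stand in for the seven modules listed on the scalar slide, and all function names are hypothetical.

```python
# Toy model: the I-cache holds the code of exactly one node at a time
# (an assumption made for illustration, not a description of real hardware).
NODES = ["ethernet-input", "ip4-input", "ip4-lookup", "ip4-transmit",
         "ecmp", "lag", "transmit"]  # 7 modules, as in the scalar example


def count_icache_loads(schedule):
    """Count how often the node code held in the cache has to be replaced."""
    loads, cached = 0, None
    for node, _packet in schedule:
        if node != cached:
            loads += 1
            cached = node
    return loads


def scalar_schedule(packets):
    # Each packet runs through every node before the next packet starts.
    return [(node, p) for p in packets for node in NODES]


def vector_schedule(packets):
    # Every packet in the frame visits a node before the frame moves on.
    return [(node, p) for node in NODES for p in packets]


frame = [0, 1, 2, 3]
scalar_loads = count_icache_loads(scalar_schedule(frame))
vector_loads = count_icache_loads(vector_schedule(frame))
print(f"scalar: {scalar_loads} loads, vector: {vector_loads} loads")
```

In this model the scalar cost grows with the number of packets while the vector cost depends only on the number of nodes, so larger frames amortize each reload over more packets.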
What happens when processing diverges?
• The same process applies, but to a subset of the packets.
• Each node still processes the set of packets that “belong” to that node.
• The scheduler takes care of node execution.
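A minimal sketch of that divergence, assuming a simple first-in-first-out scheduler and an acyclic graph: each node function returns the name of the next node for a packet, and the scheduler hands each node only the packets queued for it. This is not VPP's actual dispatcher; `run_graph`, the packet dictionaries, and the classification rules are hypothetical, and only the node names come from the graph on the earlier slide.

```python
def run_graph(frame, nodes, first_node):
    """Dispatch a frame through a node graph; each node sees only its own packets."""
    pending = {first_node: list(frame)}   # node name -> packets waiting for it
    trace = []                            # (node name, ids of packets it handled)
    while pending:
        name = next(iter(pending))        # oldest pending node (dicts keep insertion order)
        packets = pending.pop(name)
        trace.append((name, [p["id"] for p in packets]))
        for pkt in packets:
            next_node = nodes[name](pkt)  # the node decides where each packet goes
            if next_node is not None:
                pending.setdefault(next_node, []).append(pkt)
    return trace


# Hypothetical node behaviour: classify by protocol, then look up and finish.
nodes = {
    "ethernet-input": lambda p: p["proto"] + "-input",
    "ip4-input": lambda p: "ip4-lookup",
    "ip6-input": lambda p: "ip6-lookup",
    "ip4-lookup": lambda p: None,
    "ip6-lookup": lambda p: None,
}

frame = [{"id": 0, "proto": "ip4"}, {"id": 1, "proto": "ip6"},
         {"id": 2, "proto": "ip4"}, {"id": 3, "proto": "ip4"}]

trace = run_graph(frame, nodes, "ethernet-input")
for name, ids in trace:
    print(name, ids)
```

The frame splits after ethernet-input, yet each node still runs once over its own sub-vector, so the instruction-cache benefit is kept for the majority path.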
Sounds good. How fast?
VPP Performance at Scale (Phy-VS-Phy)
Zero-packet-loss throughput for 12 port 40GE:
• IPv6, 24 of 72 cores
  • 480Gbps zero frame loss
  • 200Mpps zero frame loss
• IPv4 + 2k Whitelist, 36 of 72 cores
  • IMIX => 342 Gbps, 1518B => 462 Gbps
  • 64B => 238 Mpps
[Charts: throughput in Gbps and in Mpps by frame size, 64B to 1518B]
Hardware: Cisco UCS C460 M4
• Intel® C610 series chipset
• 4 x Intel® Xeon® Processor E7-8890v3 (18 cores, 2.5GHz, 45MB Cache)
• 2133 MHz, 512 GB Total
• 9 x 2p40GE Intel XL710
• 18 x 40GE = 720GE !!
Latency
• 18 x 7.7 trillion packets soak test
• Average latency: <23 usec
• Min latency: 7…10 usec
• Max latency: 3.5 ms
Headroom
• Average vector size ~24-27
• Max vector size 255
• Headroom for much more throughput/features
• NIC/PCI bus is the limit, not VPP
Regular performance characterizations online: https://docs.fd.io/csit/rls1807/report/
FD.io Software + Intel® Xeon® Hardware = Terabit Service Platform
VPP Benefits from Intel® Xeon® Processor Developments
Increased Processor I/O Improves Packet Forwarding Rates
YESTERDAY: Intel® Xeon® E5-2699v4 (Broadwell Server CPU), 22 Cores, 2.2 GHz, 55MB Cache
• Network I/O: 160 Gbps
• Core ALU: 4-wide parallel µops
• Memory: 4-channels 2400 MHz
• Max power: 145W (TDP)
TODAY: Intel® Xeon® Platinum 8168 (Skylake Server CPU), 24 Cores, 2.7 GHz, 33MB Cache
• Network I/O: 280 Gbps
• Core ALU: 5-wide parallel µops
• Memory: 6-channels 2666 MHz
• Max power: 205W (TDP)
[Diagrams: two-socket Broadwell (QPI, PCH) and Skylake (UPI, Lewisburg PCH) servers, showing DDR4 memory channels, SATA, BIOS, and PCIe x8/x16 slots feeding 40GE, 50GE and 100GE Ethernet ports]
PCIe Packet Forwarding Rate [Gbps]
(Intel® Xeon® v3, v4 Processors vs Intel® Xeon® Platinum 8180 Processors)
• Server [1 Socket]: 160 vs 280 (+75%)
• Server [2 Sockets]: 320 vs 560 (+75%)
• Server 2x [2 Sockets]: 640 vs 1,120* (+75%)
* On compute platforms with all PCIe lanes from the Processors routed to PCIe slots.
Breaking the Barrier of Software Defined Network Services
1 Terabit Services on a Single Intel® Xeon® Server !
FD.io Takes Full Advantage of Faster Intel® Xeon® Scalable Processors
No Code Change Required
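The forwarding-rate figures follow from the per-socket network I/O numbers on the slide. The sketch below assumes the rate scales linearly with socket count and reads the "2x [2 Sockets]" bar as four sockets in total. Both are assumptions made for illustration, and the names are hypothetical.

```python
# Per-socket network I/O quoted on the slide, in Gbps.
BROADWELL_GBPS_PER_SOCKET = 160   # "yesterday" figure
SKYLAKE_GBPS_PER_SOCKET = 280     # "today" figure


def forwarding_rate(gbps_per_socket, sockets):
    """Aggregate PCIe forwarding rate, assuming linear scaling with sockets."""
    return gbps_per_socket * sockets


def gain_percent(old, new):
    """Relative improvement of new over old, in percent."""
    return (new - old) / old * 100


for sockets in (1, 2, 4):
    old = forwarding_rate(BROADWELL_GBPS_PER_SOCKET, sockets)
    new = forwarding_rate(SKYLAKE_GBPS_PER_SOCKET, sockets)
    print(f"{sockets} socket(s): {old} vs {new} Gbps (+{gain_percent(old, new):.0f}%)")
```

Under these assumptions the four-socket case is the one that crosses 1 Tbps, matching the footnoted 1,120 Gbps figure.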
Features…
• …define your customers’ experience,
• …define how you charge/monetize,
• …identify fraud, protect your RAN assets
• …provide visibility into your mobile network.
• Do 80–120% of features follow from 4G to 5G?
• How, and when, does Slicing change this?
Is hardware still needed?
Hardware
• Cisco has unrivaled expertise in packet forwarding in silicon
  • Cisco ASICs
  • Merchant silicon (NPUs, ASICs)
  • FPGAs
  • SmartNICs
  • GPUs
• Feature / Performance tradeoffs are limiting
• Experience with new software architecture is making software stronger than ever before
• Continue to investigate/prototype
Summary
Packet core user planes are changing/adapting
• Demand for cost / performance
• New form factor (physical, virtual, containers)
4G data is growing fast
• (1.5-2.5x per year)
• CUPS User Plane opportunity for SW deployments
5G is coming
• Much higher data rates demand multithreaded solutions
• Needs small, distributed and public cloud UPF
Cisco Ultra Packet Core Feature Rich
• Unmatched for feature/performance
• Many IP services beyond standards
Thank You!