Cisco Ultra Packet Core

Aeneas Dodd-Noble, Principal Engineer
Daniel Walton, Director of Engineering
October 18, 2018

High Performance AND Features
Cisco Ultra Packet Core



Page 2

© 2018 Cisco and/or its affiliates. All rights reserved.

The World’s Top Networks Rely On Cisco Ultra

• 90+ Deployments
• 200 Gbps / System (at par with physical)
• 600M Total Sessions
• 300M Total Subscribers

Page 3

Market Evolution

2016 → 2020

4G → 5G

Cloud Native

Ultra Services Platform

UGP USF UPP

• ASR 5500 Ultra
• Performance parity
• Functional parity

• Automated Lifecycle Management
• Dynamic Distributed Slices
• Micro-Services Architecture

Ultra

• Ultra Platform with CUPS
• 5G Network Functions
• Massive IoT
• Multi-Access

Delivering the 1’s: 1 Gbps at edge, 1 ms latency, 1 billion connections

Transition to Virtual Performance

Scale | Distributed Architecture | Slicing

Low Latency | Gig-Speed

Automation | Containers | Micro-Services

Page 4

Capacity Growth - Fact

[Chart panels: Peak hour throughput, N/A Tier 1 · Total daily volume, N/A Tier 1 (2016, 2017) · Peak hour throughput YoY, APJC Tier 2 · Peak hour throughput YoY, APJC Tier 1]

[Values shown on the charts: 2.5 Tbps · 8 PB, 16 PB · X2.5, X1.8]

Page 5

5G Core is Distributed by Design

5G Logical Layout

SMF, SMF, SMF

Actual Layout

Centralized Services and Connectivity

Highly Distributed and Fragmented Network and Services

Red Slice

Orange Slice

Page 6

Performance

Page 7

Background

• Despite being best known for hardware forwarding, Cisco has always built high-performance packet forwarding software

• Exception path processing

• CPU-centric products

• VPP (Vector Packet Processor) began ~2002

• VPP has been incorporated into many hardware and software products and, more recently, has been open-sourced as part of FD.io

Page 8

What is VPP/FD.io?

Page 9

FD.io is…

• Project at Linux Foundation

• Multi-party

• Multi-project

• Software Dataplane

• High throughput

• Low Latency

• Feature Rich

• Resource Efficient

• Bare Metal/VM/Container

• Multiplatform


• FD.io Scope:

• Dataplane Management Agents - Control Plane

• Packet Processing - Classify/Transform/Prioritize/Forward/Terminate

• Network IO - NIC/vNIC ↔ cores/threads


Page 10

Multiparty: FD.io Members

Service Providers Network Vendors Chip Vendors

Integrators

Page 11

Multiparty: Contributor/Committer Diversity

Universitat Politècnica de Catalunya (UPC)

Yandex

Qiniu

Page 12

Read more at http://fd.io

Page 13

How does VPP work?

Page 14

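Memory is the enemy

• A single 10GbE port is capable of roughly 14 Mpps (14.88 Mpps at line rate with minimum-size 64-byte frames)

• Each packet must therefore be processed in about 67 ns

• On a 3.5 GHz CPU core, that leaves roughly 235-250 cycles per packet

• Main (DDR) memory is about 70 ns away

• This is >100 clock cycles away (about 245 cycles at 3.5 GHz), so a single miss to main memory can consume the whole per-packet budget

• On Intel Sandy Bridge CPUs, the L1/L2/L3 caches are about 4/12/30 clock cycles away

• If we are serious about performance, we must optimize the code for cache and memory operations

• Programming paradigm shift: Scalar to Vector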
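The budget figures on this slide can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: it assumes minimum-size 64-byte frames plus 20 bytes of per-frame wire overhead (preamble, start delimiter and inter-frame gap), which the slide does not state explicitly, and all constant and function names are made up for this example.

```python
# Per-packet time and cycle budget for one 10GbE port (illustrative arithmetic).

LINK_BPS = 10e9            # 10GbE line rate, bits per second
FRAME_BYTES = 64           # minimum Ethernet frame size (assumed worst case)
WIRE_OVERHEAD_BYTES = 20   # assumed: preamble + start delimiter (8) + inter-frame gap (12)
CPU_HZ = 3.5e9             # core clock quoted on the slide
DRAM_LATENCY_NS = 70       # main-memory latency quoted on the slide


def packets_per_second(link_bps, frame_bytes, overhead_bytes=WIRE_OVERHEAD_BYTES):
    """Maximum frame rate on the wire for a given frame size."""
    bits_per_frame = (frame_bytes + overhead_bytes) * 8
    return link_bps / bits_per_frame


pps = packets_per_second(LINK_BPS, FRAME_BYTES)
ns_per_packet = 1e9 / pps                       # time budget per packet
cycles_per_packet = CPU_HZ / pps                # cycle budget per packet
dram_cycles = DRAM_LATENCY_NS * 1e-9 * CPU_HZ   # cost of one main-memory access

print(f"line rate:        {pps / 1e6:.2f} Mpps")
print(f"time per packet:  {ns_per_packet:.1f} ns")
print(f"cycle budget:     {cycles_per_packet:.0f} cycles")
print(f"one DRAM access:  {dram_cycles:.0f} cycles")
```

Under these assumptions a single trip to main memory costs about as many cycles as the entire per-packet budget, which is the point of the slide.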

Page 15

Primer: Instruction and Data caches

• Instruction cache (I-Cache)

• Stores only CPU instructions.

• Holds branch prediction information

• Helps pre-fetch the incoming instructions

• Data cache (D-Cache)

• Fast buffer that contains application data

• The processor operates on data loaded from memory into the data cache, then from the cache into the CPU registers

• Results are stored in a register, then written to the cache and finally to main memory

[Diagram labels: Registers, Instruction pipeline]

Page 16

Scalar Packet Processing

• Packet processing

• Ethernet-Input

• IPv4 Input

• IPv4 lookup

• IPv4 transmit

• ECMP processing

• LAG processing

• Transmit

• Processes only a single packet at a time.

• In scalar processing, the whole code path cannot fit into the instruction cache.

• Each module that processes the packet is loaded into the instruction cache in turn.

• E.g., with 7 modules processing each packet, 4 packets cause 7*4 = 28 I-cache misses.

• This is a high performance hit; the usual workaround is bigger caches.

Page 17

Vector Packet Processing

• Processes more than one packet at a time.

• Grabs all available packets from the Rx device.

• Forms a vector of packets (“frame”)

• Processes the “frame” (vector) using a directed graph of “nodes”
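The loop described on this slide can be sketched in a few lines of Python. This is an illustrative model, not VPP code: `form_frame`, `run_graph`, the two node functions and the `MAX_FRAME` cap are hypothetical names and values chosen for the example, and the graph is simplified to a straight line of nodes.

```python
from collections import deque

MAX_FRAME = 256  # hypothetical cap on packets per frame, chosen for illustration


def form_frame(rx_queue, max_frame=MAX_FRAME):
    """Grab every packet currently waiting on the Rx queue, up to max_frame."""
    frame = []
    while rx_queue and len(frame) < max_frame:
        frame.append(rx_queue.popleft())
    return frame


def run_graph(frame, nodes):
    """Run the whole frame through one node before moving on to the next."""
    for node in nodes:
        frame = [node(pkt) for pkt in frame]
    return frame


# Hypothetical nodes: each records the stage the packet has passed through.
def ethernet_input(pkt):
    return pkt + ["eth"]


def ip4_input(pkt):
    return pkt + ["ip4"]


rx = deque([[i] for i in range(5)])      # five packets waiting on the Rx device
frame = form_frame(rx)
out = run_graph(frame, [ethernet_input, ip4_input])
print(len(out), out[0])
```

The key design point is the loop order in `run_graph`: the outer loop is over nodes and the inner loop over packets, so each node's code is executed for the whole frame before the next node runs.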

Page 18

VPP Architecture

Vector of n packets: Packet 0, 1, 2, 3, … n

Packet Processing Graph

• Input graph nodes: dpdk-input, vhost-user-input, af-packet-input
• Graph nodes: ethernet-input, ip4-input, ip6-input, arp-input, mpls-input, ip4-lookup, ip6-lookup, ip4-local, ip4-rewrite, ip6-local, ip6-rewrite
• Plugin nodes: custom-1, custom-2, custom-3
• Hardware plugin: hw-accel-input; skips software nodes where the work is already done by hardware

Plugins are first-class citizens that can:

• Add graph nodes
• Add API
• Rearrange the graph

Plugins can be built independently of the VPP source tree

Page 19

How vector packet processing works

• Exploits the probability that most packets will follow the same path through the graph

• Fixes I-cache thrashing

• The I-cache is reloaded only when all packets in the vector have finished a node

Page 20

How vector packet processing works

• For example, 4 packets cause only 7 I-cache reloads here, compared with 28 in scalar packet processing.

• Primary problems VPP solves:

• Reducing i-cache misses

• Reducing d-cache misses
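The 28-versus-7 comparison follows from a very simple cache model. The sketch below assumes, purely for illustration, an instruction cache that can hold exactly one module at a time, so every switch to a different module counts as one reload; real caches are more forgiving, and the function name and module labels are hypothetical.

```python
def icache_loads(num_packets, modules, vector=False):
    """Count module loads into a toy I-cache that holds one module at a time."""
    if vector:
        # Vector order: every packet passes through a module before the next module runs.
        order = [m for m in modules for _ in range(num_packets)]
    else:
        # Scalar order: each packet runs through all modules before the next packet starts.
        order = [m for _ in range(num_packets) for m in modules]

    loads, resident = 0, None
    for module in order:
        if module != resident:   # module not in cache: reload
            loads += 1
            resident = module
    return loads


# Seven processing stages, labeled after the scalar-processing slide.
MODULES = ["ethernet-input", "ip4-input", "ip4-lookup", "ip4-transmit",
           "ecmp", "lag", "transmit"]

print("scalar:", icache_loads(4, MODULES))
print("vector:", icache_loads(4, MODULES, vector=True))
```

In this model the vector count stays at 7 no matter how many packets are in the frame, while the scalar count grows linearly with the number of packets.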

Page 21

What happens when processing diverges?

• The same process applies, but to a subset of packets.

• Each node still processes the set of packets that “belong” to that node.

• The scheduler takes care of node execution.
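One way to picture divergence is a dispatcher that keeps a pending batch per node and runs each node on whatever subset of packets has been queued for it. The sketch below is a simplified model under that assumption, not the actual VPP scheduler; `dispatch`, the graph dictionary and the packet fields are hypothetical.

```python
from collections import defaultdict, deque


def dispatch(frame, graph, start):
    """Run a frame through a node graph, batching packets per node.

    graph maps a node name to a function(pkt) that returns the name of the
    next node for that packet, or None when processing is finished.
    Returns a trace of (node name, batch size) in execution order.
    """
    pending = defaultdict(list)
    pending[start] = list(frame)
    ready = deque([start])
    trace = []
    while ready:
        name = ready.popleft()
        batch = pending.pop(name, [])
        if not batch:
            continue
        trace.append((name, len(batch)))
        for pkt in batch:
            nxt = graph[name](pkt)
            if nxt is None:
                continue
            if not pending[nxt]:       # first packet for this node: schedule it
                ready.append(nxt)
            pending[nxt].append(pkt)
    return trace


# Hypothetical graph: packets diverge by IP version after ethernet-input.
graph = {
    "ethernet-input": lambda p: "ip4-input" if p["v"] == 4 else "ip6-input",
    "ip4-input": lambda p: "ip4-lookup",
    "ip6-input": lambda p: "ip6-lookup",
    "ip4-lookup": lambda p: None,
    "ip6-lookup": lambda p: None,
}
frame = [{"v": 4}, {"v": 4}, {"v": 6}, {"v": 4}]
for node, size in dispatch(frame, graph, "ethernet-input"):
    print(node, size)
```

Each node still runs once per frame; the only change is that the batch it sees is smaller, so the cache benefit is reduced in proportion rather than lost.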

Page 22

Sounds good. How fast?

Page 23

VPP Performance at Scale (Phy-VS-Phy)

Zero-packet-loss throughput for 12 port 40GE

• IPv6, 24 of 72 cores: 480 Gbps zero frame loss; 200 Mpps zero frame loss
• IPv4 + 2k Whitelist, 36 of 72 cores: IMIX => 342 Gbps, 1518B => 462 Gbps; 64B => 238 Mpps

[Charts: throughput in Gbps (axis 0 to 600) and in Mpps (axis 0 to 300) by frame size, 64B to 1518B]

Hardware:

• Cisco UCS C460 M4
• Intel® C610 series chipset
• 4 x Intel® Xeon® Processor E7-8890v3 (18 cores, 2.5GHz, 45MB Cache)
• 2133 MHz, 512 GB Total
• 9 x 2p40GE Intel XL710
• 18 x 40GE = 720GE !!

Latency

• 18 x 7.7 trillion packets soak test
• Average latency: <23 usec
• Min Latency: 7…10 usec
• Max Latency: 3.5 ms

Headroom

• Average vector size ~24-27
• Max vector size 255
• Headroom for much more throughput/features
• NIC/PCI bus is the limit, not VPP

Regular performance characterizations online: https://docs.fd.io/csit/rls1807/report/

Page 24

FD.io Software + Intel® Xeon® Hardware = Terabit Service Platform

VPP Benefits from Intel® Xeon® Processor Developments

Increased Processor I/O Improves Packet Forwarding Rates

YESTERDAY

Intel® Xeon® E5-2699v4: 22 Cores, 2.2 GHz, 55MB Cache

• Network I/O: 160 Gbps
• Core ALU: 4-wide parallel µops
• Memory: 4-channels 2400 MHz
• Max power: 145W (TDP)

[Diagram labels: Socket 0 and Socket 1 (Broadwell Server CPU), QPI, DDR4, PCIe x8 50GE, PCIe x16 100GE, Ethernet, SATA, BIOS, PCH]

TODAY

Intel® Xeon® Platinum 8168: 24 Cores, 2.7 GHz, 33MB Cache

• Network I/O: 280 Gbps
• Core ALU: 5-wide parallel µops
• Memory: 6-channels 2666 MHz
• Max power: 205W (TDP)

[Diagram labels: Socket 0 and Socket 1 (Skylake Server CPU), UPI, DDR4, PCIe x8 50GE, PCIe x16 100GE, PCIe x8 40GE, SATA, BIOS, Lewisburg PCH]

PCIe Packet Forwarding Rate [Gbps]

Intel® Xeon® v3, v4 Processors compared with Intel® Xeon® Platinum 8180 Processors:

• Server [1 Socket]: 160 vs. 280 (+75%)
• Server [2 Sockets]: 320 vs. 560 (+75%)
• Server 2x [2 Sockets]: 640 vs. 1,120* (+75%)

* On compute platforms with all PCIe lanes from the Processors routed to PCIe slots.
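The chart values are consistent with a simple per-socket scaling model. The sketch below only re-derives the numbers shown on the slide, assuming forwarding rate scales linearly with socket count; the dictionary and function names are made up for this example.

```python
# Per-socket PCIe forwarding rate in Gbps, as read off the chart.
PER_SOCKET_GBPS = {"Xeon v3/v4": 160, "Xeon Platinum 8180": 280}

# Assumption: total rate scales linearly with the number of sockets.
SOCKET_COUNTS = {
    "Server [1 Socket]": 1,
    "Server [2 Sockets]": 2,
    "Server 2x [2 Sockets]": 4,
}


def forwarding_rate(per_socket_gbps, sockets):
    """Aggregate forwarding rate for a given number of sockets."""
    return per_socket_gbps * sockets


def gain_percent(old, new):
    """Relative improvement of new over old, in percent."""
    return (new - old) / old * 100


for label, sockets in SOCKET_COUNTS.items():
    old = forwarding_rate(PER_SOCKET_GBPS["Xeon v3/v4"], sockets)
    new = forwarding_rate(PER_SOCKET_GBPS["Xeon Platinum 8180"], sockets)
    print(f"{label}: {old} -> {new} Gbps (+{gain_percent(old, new):.0f}%)")
```

The four-socket case is where the 1,120 Gbps figure, and hence the terabit claim, comes from.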

Breaking the Barrier of Software Defined Network Services

1 Terabit Services on a Single Intel® Xeon® Server !

FD.io Takes Full Advantage of Faster Intel® Xeon® Scalable Processors

No Code Change Required

Page 25

Features…

• …define your customers’ experience,

• …define how you charge/monetize,

• …identify fraud, protect your RAN assets

• …provide visibility into your mobile network.

• Do 80–120% of features follow from 4G to 5G?

• How, and when, does Slicing change this?

Page 26

Is hardware still needed?

Page 27

Hardware

• Cisco has unrivaled expertise in packet forwarding in silicon

• Cisco ASICs

• Merchant silicon (NPUs, ASICs)

• FPGAs

• SmartNICs

• GPUs

• Feature / Performance tradeoffs are limiting

• Experience with new software architecture is making software stronger than ever before

• Continue to investigate/prototype

Page 28

Summary

Page 29

Summary

4G data is growing fast

5G is coming

Packet core user planes are changing/adapting

Cisco Ultra Packet Core Feature Rich

• Demand for cost / performance

• New form factor (physical, virtual, containers)

• (1.5-2.5x per year)

• CUPS User Plane opportunity for SW deployments

• Much higher data rates demand multithreaded solutions

• Needs small, distributed and public cloud UPF

• Unmatched for feature/performance

• Many IP services beyond standards

Page 30

Thank You!
