dynamiqprocessor designs using cortex-a75 & … : ericsson review, 3gpp tr 38.801. 6 © 2017 arm...
TRANSCRIPT
© 2017 Arm Limited
David Koenen
Sr. Product Manager, Arm
Arm Tech Symposia 2017, Taipei
DynamIQ Processor Designs Using Cortex-A75 & Cortex-
A55 for 5G Networks
2 © 2017 Arm Limited
Agenda
5G networks
Ecosystem software to support Arm based deployments of network hardware
Arm Cortex-A75 and Cortex-A55
Conclusions
© 2017 Arm Limited
5G Networks
4 © 2017 Arm Limited
5G Dynamics Across the Network: Edge to Cloud
Devices
ACC
ACS ion
Storage
ion
Storage
Packet Flows Packet Flows
Acceleration
Storage
Compute
Packet Flows
S
ACS
C
AC
A
S
SA
CIoT, Automotive,
Industrial, Medical
Latency30-50x
Throughput100x
Connections100x
Mobility1.5x
5G vs. LTE
Core / DCAccess/Edge EdgeCompute
5 © 2017 Arm Limited
4G logical RAN architecture
RU
RU
RU
RU
RU
BBU
EPC
BBU
BBU
BBU
RU
RU
RU
RU
BBU
RU
RU
BBU
RU
eNB
eNB
eNB eNB
Radio
RadioL1 / Phy
MAC RLC PDCPRadio
Radio
Management and control
CPRI
Baseband Unit (BBU)Remote Unit (RU)
eNodeB
MME
PGWSGW
Evolved Packet Core (EPC)
S11
S5
HSS
S6a
PCRF
Gx
RU : Radio Unit
BBU : Baseband Unit
EPC : Evolved Packet Core
Source : Ericsson review, 3GPP TR 38.801
6 © 2017 Arm Limited
RURU
RU
RU
RU
RU
5G logical RAN architecture
RU
DU
CUEPC/NGC
DU
DU
DU
DU
DU
RU
RU
RU
RURU
RU : Radio Unit
DU : Distributed Unit
EPC : Evolved Packet Core
CU: Central Unit
NGC : Next Generation Core
gNBPDCP
Central Unit
(CU)
Switch
Management
L1 / Phy
MAC RLC
Management
Distributed Unit (DU)
Source : Ericsson review, 3GPP TR 38.801
7 © 2017 Arm Limited
5G Antenna Mobile Edge Compute Core Network/Data Centre
Server
NetworkOffload
Server
NetworkAnd RAN
Offload
CPU
Air Interface
Processing
Typically standardized platform designs
Typically specialized platform designs
Base Station
Network/Wireless Infrastructure:Three Distinct Segments – Two Distinct Approaches
Edge Network Architecture
5G Air Interface Management• Antenna, RRH, DFE,
Distributed Base Band, Multi-Access Compute Acceleration.
• IoT Gateway and End applications.
• Optimized power, performance, scalability, latency
Server-like Architecture – more efficient on Arm
5G Network Hierarchy Design • Multi-Access Edge Compute,
Distributed Evolved Packet Core, Distributed Applications Processing, Distributed Content Delivery.
• Application of virtualized/containerized workloads.
Mobile Edge Compute
Architecture
• Performance critical functions to the edge
• Application of virtualized / containerized workloads
8 © 2017 Arm Limited
Arm SoC
Firmware & Network Boot ACPI
Virtualization and HAL
Operating System and VirtualInfrastructure Managers
Key Applications, VNFs, and Middleware
OEMs and ODMs
Networking-centric Ecosystem Enabling Deployments
9 © 2017 Arm Limited
Linaro Segment Groups - Infrastructure
ODP cross-platform
support for SoCaccelerators
Reduces fragmentation,
cost, accelerates time to market
for servers based on Arm
OSS for networking
• Real Time Support
• Virtualization, Core isolation
• OpenDataPlane (ODP)
• OPNFV integration
OSS for Arm servers
• ATF/UEFI/ACPI
• KVM/Xen
• Armv8 optimization
• OpenJDK, Hadoop, OpenStack, Docker
Networking - LNG Enterprise - LEG
10 © 2017 Arm Limited
• Initiated and hosted by Arm• Public vehicle to engage
infrastructure developer community (H/W & S/W)
AIDC
WorksOnArm, and AIDC
• Initiated and hosted by Packet.net
• Public vehicle to engage open source developers and share software readiness
WorksOnArm
11 © 2017 Arm Limited
Demonstrating Scalability: the PicoPod
The World’s Tiniest Pharos Pod: The NFV PicoPod
Fully Functional OPNFV Pharos Pod @ ~1 cu. ft.
1 Jump Server, 3 Controllers & 2 Compute
12 port Switch, Power Supply, Fans
Marvell MACCHIATObin boards w/ Marvell Armada 8040 SoC (Quad Arm Cortex-A72)
Dual 10G NICs, 16Gb RAM, PCIe, SATA & more
Runs Danube release NFVi – sample VNFs
© 2017 Arm Limited
Cortex-A75 and Cortex-A55
© 2017 Arm Limited 13
Networking requires diverse platforms
GPP
Mem
GPP
GPP
DSP NPUGPP
GPP NPU
DSP PLD
PLDAccel
Networking
GPP
Storage
Acceleration
NPU
…Wireless-access-point / vCPE …Switch / router blade
…Base station / modem …Scale out / server
GPP Accel
DSP
Networking
GPP
Storage
Acceleration
…Edge content caching
& optimization
…vCPE
Networking
GPP
Acceleration
…NFV applianceStorage
GPP GPP GPP GPP
GPP GPP GPP GPP
GPP GPP GPP GPP
GPP GPP GPP GPP
14 © 2017 Arm Limited
Scalable solutions from edge to core
Access point Data center compute
System cache
AcceleratorCortex-A
CoreLink CMN-600
DMC-620
IO
NIC-450
IO
DMC-620100GbEPCIe
DMC-620 DMC-620
DMC-620
DM
C-6
20
DM
C-6
20
DM
C-6
20
DM
C-6
20
CPU CPU CPU CPU
CPU CPU CPU CPU
CPU CPU CPU CPU
CPU CPU CPU CPU
CoreLink CMN-600Bandwidth 1 TB/s20 GB/s
System cache 128MB0MB
DDR channels 81
Cortex-A CPUs 2564
Scalable system configurations
DMC620 : dynamic memory controller
CMN: mesh network interconnect
15 © 2017 Arm Limited
New DynamIQ-based CPUs for new possibilities
All comparisons at ISO process and frequency
Baseline to Cortex-A72 Baseline to Cortex-A53
All comparisons at ISO process and frequency
1.97x
1.42x
1.21x
LMBench memcpy
SPECfp2006
SPECint2006
1.30x
1.39x
1.23x
LMBench memcpy
SPECfp2006
SPECint2006
Arm Cortex-A55Arm Cortex-A75
16 © 2017 Arm Limited
Next-generation features
Dot product and half-precision float for AI/ML processing.
Virtualized Host Extensions (VHE) offering Type-2 hypervisor (KVM) performance improvements.
Cache stashing and atomic operations improves multicore networking performance and improves latency.
Cache clean to persistence to support storage class memory.
Infrastructure class RAS enhancement including data poisoning and improved error management.
17 © 2017 Arm Limited
DynamIQ cluster
0 - 7 CoresCore 0
Snoop filter
PowerManagement
L3 Cache
BusI/F
ACP andperipheral
port I/F
Core 7
Asynchronous bridges
DynamIQ Shared Unit (DSU)
DynamIQ Shared Unit (DSU)
Streamlines traffic across bridges
Advanced power management features
Latency and bandwidth optimizations
Support for multiple performance domains
Scalable interfaces for edge to cloud applications
Supports large amounts of local memory
Low latency interfaces for closely coupled accelerators
18 © 2017 Arm Limited
Level 3 cache partition in a networking application
Infrastructure
• Process 1 = data plane
• Process 2 = control plane
• Packet processing data sent through low latency ACP interface
Sensors or I/O agents ACP
Process 2
Core group 2
Example configuration with two Core groups in a DynamIQ cluster
Group 1 Group 2
Core group 1
Process 1
L3 cache
Core0 Core1 Core2 Core3 Core4 Core5 Core6 Core7
Reserved for external accelerators via ACP Group 2
19 © 2017 Arm Limited
Increasing performance through cache stashing
Enables reads/writes into the shared L3 cache or per-core L2 cache.
Allows closely coupled accelerators and I/O agents to gain access to core memory.
AMBA 5 CHI and Accelerator Coherency Port (ACP) can be used for cache stashing.
More throughput with Peripheral Port (PP) for acceleration, network, storage use-cases.
Acceleratoror I/O
CoreLink CMN-600
DMC-620 DMC-620
Agile System Cache
DDR4 DDR4
L3 Cache
L2 Cache
CPU
L2 Cache
CPU
L2 Cache
CPU
L2 Cache
Cortex-A
Agile System Cache
L3 Cache
L2 Cache
CPU
L2 Cache
CPU
L2 Cache
CPU
L2 Cache
Cortex-A
Stash critical data to any cache level
DynamIQ cluster
0–7 Cores
Core0
Snoop filter
PowerMngmt
L3 Cache
BusI/F
ACP andperipheral
port I/F
Core7
Asynchronous bridges
DynamIQ Shared Unit (DSU)
L1 cache
L2 cache
L1 cache
L2 cache
20 © 2017 Arm Limited
Networking: Compute and packet-processing solution
Build the right mix of compute
• High-performance Cortex-A75
• High-efficiency throughput Cortex-A55
• DSP, accelerators
Scale CoreLink CMN-600 for application
• Access points: 2~5W
• Macro BTS: 20~30W
• Telecom server: 60~100W
DSP
DMC-620
Cortex-A75
Level-3 Cache
DMC-620
Data plane processing• Throughput compute• Optional local accelerators
Scalable coherent SoC memory system
Control plane processing• Performance compute• Timing critical processing
IO IO
Ethernet
Accelerator
Agile System Cache Agile System Cache
Local
AcceleratorCortex-A55
Level-3 Cache
CoreLink CMN-600
21 © 2017 Arm Limited
Data center server blade: Maximize compute density
Cortex-A75 can deliver 1.4x to 2.9x higher performance .
• >1.4x performance with 32 core Cortex-A75 + CMN-600 vs. 32 core Cortex-A72 + CCN-512
• 2.9x performance with 64 core Cortex-A75 +CMN-600
• Higher rate performance than competitive architectures per socket
• 1-1.5W/core enables sustained operation at maximum performance
Power: 60W compute.
Cache Cache Cache Cache
Cache Cache Cache Cache
Cache Cache Cache Cache
Cache Cache Cache CacheD
MC
-60
0
DD
R4
-32
00
DM
C-6
00
DD
R4
-32
00
DMC-600
DDR4-3200
DMC-600
DDR4-3200
DM
C-6
00
DD
R4
-32
00
DM
C-6
00
DD
R4
-32
00
DMC-600
DDR4-3200
DMC-600
DDR4-3200100GbE
CM
L
PCIe
CM
L
PC
IeP
CIe
22 © 2017 Arm Limited
Enabling 5G transformation
Transition to 5G networks and new use cases is transforming network architectures. • Requires efficient compute closer to the edge
• Mobile Edge Compute (MEC) nodes combine server-like design (software driven, high aggregate performance) with edge network needs (latency, throughput, power optimized)
Software investments from Arm and ecosystem partners to enable real 5G deployment.
Cortex-A75 & Cortex-A55 together with CMN and other Arm IP enable scalable design. • Address the compute requirements across the network in a range of power budgets
For more information go to:
https://developer.arm.com
2323
Thank You!Danke!Merci!謝謝!ありがとう!Gracias!Kiitos!감사합니다धन्यवाद
© 2017 Arm Limited
2424 © 2017 Arm Limited
The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners.
www.arm.com/company/policies/trademarks