High-Performance Reconfigurable Computing Group
Department of Electrical and Computer Engineering
University of Toronto
Building the Reconfigurable Cloud Ecosystem
Paul Chow
April 8, 2017
The Ecosystem
• What is it? What do we have now?
• Why do we need one?
• What do we need?
• What are we doing about it at UofT?
• What next?
April 8, 2017 ETCD 2017
Systems we can learn from
• Microsoft Catapult – two excellent papers
• Amazon EC2 F1 – online blogs and documentation
• Baidu – Hot Chips presentations, other online
• Others?
• Hard for academia to do research at scale
Characteristics
• Cool hardware, but …
• Microsoft still uses Verilog, as far as I know
• Amazon just gives you Vivado and some IP
• Baidu has built accelerators accessed via APIs
• How do mortals use these?
• It’s the dark ages compared to software development!
What about High-Level Synthesis?
• Raises the level of abstraction – more software-like
• Lots of great research
• Absolutely necessary
• Tremendous progress recently
• Can describe complex computations and functions and create hardware
But!!! We are still building custom hardware.
HLS is not sufficient, only a part of the big picture…
What about OpenCL?
• Vendor tools using a high-level language standard
• Getting closer
• Can almost write code, <CR>, Run
– Closer to software environment
– Abstracts the hardware
– Data movers, run time, scheduling, etc. are transparent to user
Clouds are about scaling
• And elasticity
• And resilience
• And sharing and virtualization
• And security and privacy
• And accessibility by users, devices, applications
What’s your architectural perspective?
[Figure: CPU and FPGA blocks arranged in two configurations, labeled "Accelerator" and "Peers"]
Accelerator Model
• Amazon EC2 F1, Intel x86 + Arria
• Virtual machine (VM) + accelerator
• Many cloud issues are handled via VM
– Still need to manage the communication between the host and the FPGA, resource allocation
– Easier if FPGA is not network-connected
Peer Model
• Microsoft Catapult 2
• Pools of computing resources
– Pick the appropriate one
[Figure, from Derek Chiou: ToR and CS switches connecting resource pools labeled Bing Ranking SW, HPC, Bing Ranking HW, Speech to text, and Large-scale deep learning]
• No equivalent cloud infrastructure for FPGA part
Why the Peer Model?
• Easier development: SW Prototyping → Migration
• Model makes no distinction between CPUs and FPGAs (in terms of data communication, synchronization)
• Heterogeneous or FPGA-only systems are easier to program – keep a uniform programming model
• As performance requirements increase, e.g., video + 5G, CPUs will not keep up, leaving FPGA-only solutions for these tasks
Cloud Challenges of Peer Model
• Very little infrastructure to deal with heterogeneity
• FPGAs are very different from CPUs
• Focus of our research at UofT
– Building the cloud ecosystem for the reconfigurable cloud
Ecosystem: www.dictionary.com
noun
1. a system, or a group of interconnected elements, formed by the interaction of a community of organisms with their environment.
2. any system or network of interconnecting and interacting parts
What parts are needed?
• Almost everything!
• Learn from software
• Linux, software portability, scalability, platforms
• Networking, management, security, resilience
• Etc. …
Why do we need this ecosystem?
• Already have a software ecosystem
– Continues to grow
• World is becoming heterogeneous
– Expand the current software ecosystem to encompass heterogeneity
• User should just see a collection of processing elements and pick appropriate one to use when required
– Cannot continue to treat accelerators as a special case augmenting a current programming model
– Scales better
The Parts (Before the Cloud)
[Figure: the software stack, with an MPI system as the example]
• SW Application: MPI Rank 0 and Software Ranks
• SW Middleware: MPI Library
• SW OS: Linux
• BIOS, Processor Hardware: Xeon Processor Motherboard
[Figure: the software stack (now including BSPS) beside a hardware stack, with the MPI example continued]
• HW Application: Hardware MPI Ranks
• HW Middleware: Message Passing Engine (MPE)
• HW OS: MPI Network Infrastructure
• BSPH: FPGA BSP
[Figure: the same software and hardware stacks with an Interconnect layer: PCIe, QPI, AXI, Network]
Plus Cloud
[Figure: the same software and hardware stacks with cloud functions added: Resource Management, Resource Allocation, Deployment, Task Scheduling, Cloud Management, Networking, Interconnect]
Start with Software Ecosystem
• Build from software as much as possible
– Already lots of knowledge and infrastructure
• OpenStack is starting point for several groups
– Cloud resource management
– IBM, Huawei, UofT
• Virtualization
– Means many things!
– Sharing, abstraction
ENABLING FLEXIBLE NETWORK FPGA CLUSTERS IN A HETEROGENEOUS CLOUD DATA CENTER
Naif Tarafdar, Thomas Lin, Eric Fukuda, Hadi Bannazadeh, Alberto Leon-Garcia, Paul Chow
University of Toronto
FPGA 2017
Problems We Target
• Large multi-FPGA systems
– Create abstraction between FPGAs in multi-FPGA systems
– Easy scalability of system
• Network capabilities
– FPGA cluster directly accessible by any other network device in the datacenter
Overall System View
[Figure: the FPGA Cluster Generator with its inputs and outputs]
• Input from user: Logical Cluster Description, FPGA Mapping File
• Output to VM with FPGA tools: individual FPGA projects
• Output to cloud manager: command for resource allocation, commands for connecting FPGAs to network
• Output to user: MAC addresses of FPGAs in multi-FPGA cluster
Baseline Infrastructure
• SAVI (Smart Applications on Virtualized Infrastructure)
• OpenStack (Cloud Managing Software)
• Xilinx SDAccel (FPGA Hypervisor)
FPGA Hypervisor: Xilinx SDAccel
• Abstracts physical hardware on FPGA and provides software interface for these modules
• Part of Xilinx SDAccel
• No network interface
Logical Cluster Description
FPGA Mapping File
Kernel A → FPGA 1
Kernel B → FPGA 1
Kernel C → FPGA 2
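The mapping file can be read as a table from kernel to FPGA. A minimal Python sketch follows, assuming a plain-text format with one `kernel, FPGA` pair per line. The real file format is not shown in the slides, and `parse_mapping` and `MAPPING_TEXT` are hypothetical names. It groups kernels by the FPGA that hosts them, which is the information needed to emit one project per FPGA.

```python
from collections import defaultdict

# Hypothetical text form of the mapping file; the slides only show the pairs.
MAPPING_TEXT = """\
Kernel A, FPGA 1
Kernel B, FPGA 1
Kernel C, FPGA 2
"""


def parse_mapping(text):
    """Return {fpga: [kernels]} from 'kernel, fpga' lines."""
    by_fpga = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        kernel, fpga = (field.strip() for field in line.split(","))
        by_fpga[fpga].append(kernel)
    return dict(by_fpga)


if __name__ == "__main__":
    for fpga, kernels in sorted(parse_mapping(MAPPING_TEXT).items()):
        print(fpga, "hosts", ", ".join(kernels))
```

Grouping by FPGA rather than by kernel mirrors the generator's output of individual FPGA projects.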
Networking Backend
[Figure: exchange between the FPGA Cluster Generator and OpenStack / SAVI Network Manager, shown in three steps]
• Step 1: network port request
• Step 2: network MAC address
• Step 3: network MAC address, FPGA port on physical switch
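One way to picture the exchange is with a mock network manager. The sketch below is illustrative only: `MockNetworkManager`, `request_port`, `bind` and `connect_fpgas` are hypothetical names, not the OpenStack or SAVI API, and the direction of each message is an assumption drawn from the slide labels.

```python
import itertools


class MockNetworkManager:
    """Stand-in for the cloud network manager; NOT the OpenStack or SAVI API."""

    def __init__(self):
        self._counter = itertools.count(1)
        self.switch_table = {}  # physical switch port -> MAC address

    def request_port(self):
        """Steps 1 and 2: a port request is answered with a MAC address."""
        n = next(self._counter)
        # Locally administered MAC, made up for illustration only.
        return "02:00:00:00:00:%02x" % n

    def bind(self, mac, switch_port):
        """Step 3: record which physical switch port carries this MAC."""
        self.switch_table[switch_port] = mac


def connect_fpgas(manager, fpga_switch_ports):
    """Return {fpga: mac} after registering every FPGA with the manager."""
    macs = {}
    for fpga, switch_port in fpga_switch_ports.items():
        mac = manager.request_port()
        manager.bind(mac, switch_port)
        macs[fpga] = mac
    return macs
```

The returned dictionary corresponds to the MAC addresses that the cluster generator hands back to the user.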
Case Study: Scalability of Query Processing Engine
• Representative case study: Database Streaming Query Processing Engine
– Size
– Streaming
• Scalable
[Figure: a single Query Processing Engine, then a Scheduler with replicated Query Processing Engines]
• Query Processing Engine replicated 6 times: 3 FPGAs, 2 units/FPGA
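The replication in the case study (6 units, 2 per FPGA, 3 FPGAs) can be sketched as a placement function. The names `place_replicas` and `round_robin` are hypothetical, and round-robin dispatch is an assumed policy, since the slides do not say how the scheduler assigns queries.

```python
def place_replicas(total_units, units_per_fpga):
    """Return {fpga_name: [unit indices]} filling one FPGA before the next."""
    if units_per_fpga <= 0:
        raise ValueError("units_per_fpga must be positive")
    placement = {}
    for unit in range(total_units):
        fpga = "FPGA %d" % (unit // units_per_fpga + 1)
        placement.setdefault(fpga, []).append(unit)
    return placement


def round_robin(placement, num_queries):
    """Assign query i to unit i mod N; the real scheduler policy is not given."""
    units = [u for fpga in placement for u in placement[fpga]]
    return [units[i % len(units)] for i in range(num_queries)]
```

Calling `place_replicas(6, 2)` yields three FPGAs with two engine units each, matching the configuration on the slide.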
Software VNF Chaining
• Initially VM1 and VM2 talk through the switch
– Could be software VNFs
• Automate adding a VNF between VM1 and VM2
– Example: Signature matching
• Input: VM1, VM2, type of VNF desired
• Result: traffic flows through VNF
[Figure: VM1 and VM2 attached to a Switch, with a Controller]
Software VNF Chaining
• Modify flows in the switch with the controller
• VM1's traffic is routed to VNF1
• VNF1's traffic is routed to VM2
• No modification to VM1, VM2, or VNF1
• No user action required
• Any VNF can be software (VM) or hardware (FPGA)
[Figure: VNF1, VM1 and VM2 attached to a Switch, with a Controller]
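The flow changes can be sketched with a toy flow table. This is a model under stated assumptions, not a controller or OpenFlow API: the table maps a (sender, final destination) pair to the next hop, and `insert_vnf` and `trace` are hypothetical helpers.

```python
def insert_vnf(flows, src, dst, vnf):
    """Return a new flow table with `vnf` spliced between `src` and `dst`.

    `flows` maps (sender, final destination) -> next hop chosen by the switch.
    """
    if (src, dst) not in flows:
        raise KeyError("no existing flow from %s to %s" % (src, dst))
    updated = dict(flows)
    updated[(src, dst)] = vnf   # VM1's traffic is routed to the VNF
    updated[(vnf, dst)] = dst   # the VNF's traffic is routed to VM2
    return updated


def trace(flows, src, dst):
    """Follow the switch's decisions and return the hops a packet takes."""
    path, here = [src], src
    while here != dst:
        here = flows[(here, dst)]
        if here in path:
            raise RuntimeError("forwarding loop")
        path.append(here)
    return path
```

Only the table held by the switch changes, which is why neither VM nor the VNF needs modification.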
VNFs with FPGAs
• Use same SDI interface but with FPGAs
• OpenStack (Neutron) used to create Network Port
• FPGA VNF: Partial bitstream → requires FPGA “hypervisor”
FPGA Hypervisor
• Hypervisor contains PCIe module which is the master for
– An off-chip DRAM Controller
• PCIe passthrough used to connect VM with FPGA
• Processor on FPGA receives VNF application, gates the FPGA (to ensure no corruption) and then programs the FPGA with the partial bitstream
Gating Partial Region
What if Ethernet is in the middle of a transaction during the partial reconfiguration?
• The ready signal of Ethernet never goes up again!
The solution is gating the partial region. These gates make the reconfiguration safe!
[Figure: Input Stream, Gate, VNF, Gate, Output Stream, with the Bitstream loaded into the VNF region]
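A toy software model can show why gating matters. It assumes a packetized stream whose beats carry a `last` flag, and a gate that lets the in-flight packet finish before isolating the region. That draining behaviour is an assumption, since the slides only state that the gates make reconfiguration safe. `StreamGate` and `safe_reconfigure` are hypothetical names.

```python
class StreamGate:
    """Toy model of a gate on a packetized stream (beats carry a `last` flag)."""

    def __init__(self):
        self.close_requested = False
        self.mid_packet = False   # True while a packet is partly transferred
        self.closed = False

    def request_close(self):
        self.close_requested = True
        if not self.mid_packet:
            self.closed = True    # nothing in flight, safe to isolate now

    def open(self):
        self.close_requested = False
        self.closed = False

    def push(self, data, last):
        """Offer one beat; return True if it passed, False if it was held."""
        if self.closed:
            return False
        self.mid_packet = not last
        if last and self.close_requested:
            self.closed = True    # packet finished, isolate before the next one
        return True


def safe_reconfigure(gates, program):
    """Gate, program, ungate. `program` is a caller-supplied callable."""
    for gate in gates:
        gate.request_close()
    if not all(gate.closed for gate in gates):
        return False              # a packet is still in flight; retry later
    program()
    for gate in gates:
        gate.open()
    return True
```

In hardware the held beats would simply see backpressure; the model returns `False` from `push` to represent that.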
Partial Bitstream Generation
• The VNF implemented has a standardized interface
[Figure: VNF block with ports Reset, Clock, Input Stream, Output Stream]
• Script takes a VNF using above ports and places it in the application region, automatically creating the bitstreams needed.
Partial Bitstream Generation
The steps that the scripts do are as follows:
• Put the VNF IP into the static region and make the connections.
• Synthesize the VNF hardware
• Load the netlist of VNF into static region with locked place and route
• Place and route the VNF
• Generate bitstream of VNF
What is the Hypervisor?
• Microsoft calls it the “Shell”
• An API that presents the I/O of the devices
• Provides services to application region (MS “Role”)
• Protection, security
[Figure: stack layers BSPS, HW OS, HW Middleware, HW Application, BSPH, with BSPH marked as the Hypervisor]
Requirements
• Support virtualization
• Multi-tenant/multi-user/multi-task on FPGAs
• Abstracted Peripheral I/O
More thoughts →
What is Virtualization?
• SW Analogues – Server Virt. (i.e. Hypervisor)
– Emulates existing physical architecture (of PCs/Servers) for its I/O abstraction
– Multiple “emulated systems” on a single host system for multi-tenant/multi-user
– Includes both data isolation and performance isolation
– Often tightly coupled with orchestration SW
What is Virtualization?
• SW Analogues – Operating System
– Not often thought of as virtualization
– Creates a multi-user/multi-tasking environment
– Provides an “invented” abstraction layer for I/O access
• Virtual Memory (in HW), and the OS ABI & APIs (in SW)
– Generally no performance isolation
What is Virtualization?
• SW Analogues – Containerization
– Server Virtualization “light”
– Creates and manages multiple separate instances of the “invented” abstraction layer of an OS
• Each container looks like it has exclusive access to the entire OS
– Adds some device emulation (e.g. vNIC, vSwitch)
– Adds support for performance isolation
System Requirements
• Can be implemented on multiple vendors’ FPGAs
• Multiple applications on same FPGA (either temporally or spatially)
• Abstraction layer for peripheral I/O Access
• External management ability
Implications of Vendor Portability
• Recompilation for different vendors/PR regions will always be required, therefore:
– Lowest level for portable executable in such a system is source code (maybe netlist?)
– Some abstractions/features can be implemented during static compilation at no cost, if need be (i.e. as HDL IP)
Implications of Multi-Tasking
• Requires Data Security/Isolation
– Page Tables/MMU for MM resources
– Secure channels for stream-based resources
• Performance Isolation
– To prevent DDoS-like attacks
– Interesting research area (not aware of any research)
• Partial Reconfig Necessary
– Some standardized interface required
– Signal decoupling during reconfig
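The page-table point can be illustrated with a small software model of per-tenant address translation. This is a sketch under assumptions, not a description of any FPGA MMU: the 4 KiB page size, the `TenantMMU` class and its methods are all hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration


class TenantMMU:
    """Per-tenant page tables: virtual page number -> physical page number."""

    def __init__(self):
        self.tables = {}

    def map_page(self, tenant, vpage, ppage):
        """Grant `tenant` a page, refusing pages another tenant already owns."""
        for other, table in self.tables.items():
            if other != tenant and ppage in table.values():
                raise ValueError("physical page already owned by another tenant")
        self.tables.setdefault(tenant, {})[vpage] = ppage

    def translate(self, tenant, vaddr):
        """Translate a virtual address, rejecting unmapped pages."""
        vpage, offset = divmod(vaddr, PAGE_SIZE)
        table = self.tables.get(tenant, {})
        if vpage not in table:
            raise PermissionError(
                "tenant %r has no mapping for page %d" % (tenant, vpage)
            )
        return table[vpage] * PAGE_SIZE + offset
```

Because every access goes through the tenant's own table, two applications sharing one FPGA can use the same virtual addresses without reaching each other's memory.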
Implications of Multi-Tasking
• Decisions to Make
– Hierarchical Resource Management? (i.e. user groups)
– Local vs. Remote hosted CAD tools?
– Different-sized PR Regions?
Implications of External Mgmt.
• Need to expose either:
– Some management protocol connected to an accessible common network
– On-chip CPU running a lightweight SW OS with remote access (e.g. SSH)
• Note – “on-chip CPU” can be modelled by a CPU connected to a PCIe FPGA card for our purposes (to use the infrastructure that we already have in place)
Another Challenge -- Migration
[Figure: block diagram with labels Application, HW Hypervisor, Standard API to Memory and Network, Migration Signals, Migration Controller, Standard API to Network, Standard API to Memory, MMU, RAM, Mem Controller, FPGA, Control from Host CPU, Host CPU, Ethernet Controller, Ethernet Port, and a "?"]
Live Migration Controller
What is it?
• Need to preserve the hypervisor abstraction across platforms
– Different vendors, devices, board configurations
• Hypervisor supports the rest of the “stack” so that higher layers can remain isolated from low-level platform changes
– Software has been very successful at this
– Must do this for hardware
• Avoid building it by hand every time
– Idea only about 2 weeks old, so needs more thinking!
Conclusions
• Lots of focus on HLS today – it’s needed, not sufficient
• Some working now on other layers – need identified
• To achieve a cloud ecosystem for using FPGAs, much more is needed – it’s a big stack
• Need a coordinated effort to enable cloud computing with FPGAs – cannot be haphazard → need a plan
– Open source is only way to harness enough resources
– How do we do this?
Acknowledgements
Stuart Byma, Naif Tarafdar, Eric Fukuda, Daniel Ly-Ma, Daniel Rozhko, Roberto DiCecco, Nariman Eskandari
SAVI – Prof. Alberto Leon-Garcia, Hadi Bannazadeh, Thomas Lin
emSYSCAN