
Page 1: Cloud on steroids Accelerating your cloud via cyborg

2018 Lenovo Internal. All rights reserved.

Cloud on steroids

Accelerating your cloud via cyborg

Jinghua Gao, Zhenghao Wang (Staff Researcher, Lenovo Research), 2018-05-23

OpenStack Vancouver Summit, May 2018

Page 2

Agenda

01 Necessity of Acceleration Management
02 Cyborg Introduction
03 Lenovo’s Contribution to Cyborg
04 Demo
05 Summary

Page 3

1. Necessity of Acceleration Management

Page 4

Prevalence of Accelerations

Usage scenarios: AI, NFV, Blockchain, Genetic Sequencing & Big Data

VM/App layer
• Compute acceleration
– vBRAS, HQoS, multicast offloading
– vRAN, cipher/decipher offloading
– SBC, media codec offloading
– TensorFlow, model training acceleration
– Cryptocurrency mining acceleration
– Next-Generation Firewall (NGFW) acceleration
• Storage acceleration
– NVMe over Fabrics (NVMe-oF) enabled acceleration
– High-performance persistent memory
• Network acceleration
– Virtual networking offloading
– Dynamic optimization of packet flow routing
– Load balancing and NAT
– Open vSwitch, HTTPS offloading

Infrastructure layer: the accelerators that provide the acceleration
• Hardware accelerators: ASIC, GPU, FPGA
• Software accelerators: DPDK/SPDK

Page 5

Challenges
• Difficult to standardize the various acceleration technologies
– Software accelerators: DPDK, SPDK.
– Multi-vendor hardware accelerators with different architectures, such as GPU, ASIC, FPGA, etc.
• Complex
– Different contexts and usage scenarios.
– Different forms: virtualized, time-shared, pass-through, etc.
• Expensive
– Non-trivial management effort.
– High hardware prices.

Cyborg Project
A unified acceleration management framework is needed to enable acceleration as a service.

Page 6

2. Cyborg Introduction

Page 7

• General management framework
– Software accelerators: DPDK/SPDK, PMEM, XDP/eBPF, ...
– Hardware accelerators: FPGA, GPU, QAT, NVMe SSD, CCIX-based caches, ...
• Lifecycle management of accelerators
– Discovery, Program, Attach, Detach, Remove

Timeline and Definition
• Feb 2016: Nomad repo established
• Apr 2016: First BOF session at Austin
• Oct 2016: First design session in Barcelona
• Feb 2017: Renamed to cyborg (Pike PTG)
• Sep 2017: Becomes an official project (Queens PTG)
• Feb 2018: Queens Release (API-DB, Conductor-Agent, Generic Driver)
• Sep 2018: Rocky Release (os-acc, Xilinx FPGA driver, pythonclient)
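The accelerator lifecycle above can be modelled as a small state machine. This is a minimal sketch under assumed state names and transition rules (for instance, that a device must be detached before it is removed, and that devices needing no programming, such as GPUs, are usable right after discovery); it is not cyborg's actual data model.

```python
from enum import Enum


class State(Enum):
    """Hypothetical lifecycle states for a single accelerator."""
    DISCOVERED = "discovered"
    READY = "ready"          # programmed, or needs no programming
    ATTACHED = "attached"
    REMOVED = "removed"


# (action, current state) -> next state. These rules are assumptions based on the
# Discovery, Program, Attach, Detach, Remove sequence, not taken from cyborg itself.
TRANSITIONS = {
    ("program", State.DISCOVERED): State.READY,
    ("program", State.READY): State.READY,        # reprogram with another function
    ("attach", State.READY): State.ATTACHED,
    ("detach", State.ATTACHED): State.READY,      # idle again, still usable
    ("remove", State.DISCOVERED): State.REMOVED,
    ("remove", State.READY): State.REMOVED,
}


class Accelerator:
    def __init__(self, name: str, needs_programming: bool = True) -> None:
        self.name = name
        # Devices such as GPUs need no programming and are usable right after discovery.
        self.state = State.DISCOVERED if needs_programming else State.READY

    def apply(self, action: str) -> State:
        """Perform an action, or raise ValueError if it is not allowed in the current state."""
        key = (action, self.state)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {action} {self.name} while {self.state.value}")
        self.state = TRANSITIONS[key]
        return self.state
```

Keeping the rules in one table makes an illegal step, such as removing a device that is still attached to a VM, fail loudly instead of silently corrupting state.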

Page 8

Architecture
• controller-node: cyborg-api, cyborg-conductor, cyborg-db
• compute-node: cyborg-agent, with its drivers
– fpga-driver: vendor-a-fpga-driver, vendor-b-fpga-driver
– gpu-driver: vendor-c-gpu-driver
– spdk-driver…
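The split between cyborg-agent and per-vendor drivers suggests a plugin pattern: a generic driver interface that each vendor driver implements, with the agent polling whichever drivers are loaded. The sketch below assumes that shape; the class names, the `discover()` method and the hard-coded PCI addresses are hypothetical, not the upstream cyborg driver API.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Device:
    """A discovered accelerator as an agent might report it (fields are hypothetical)."""
    vendor: str
    device_type: str  # e.g. "fpga" or "gpu"
    pci_address: str


class GenericDriver(ABC):
    """Hypothetical base class: every vendor driver implements discovery for its hardware."""
    device_type = "generic"

    @abstractmethod
    def discover(self) -> list[Device]:
        ...


class VendorAFpgaDriver(GenericDriver):
    device_type = "fpga"

    def discover(self) -> list[Device]:
        # A real driver would query sysfs or a vendor tool; this result is hard-coded.
        return [Device("vendor-a", self.device_type, "0000:3b:00.0")]


class VendorCGpuDriver(GenericDriver):
    device_type = "gpu"

    def discover(self) -> list[Device]:
        return [Device("vendor-c", self.device_type, "0000:af:00.0")]


class Agent:
    """Sketch of a compute-node agent that asks every loaded driver for its devices."""

    def __init__(self, drivers: list[GenericDriver]) -> None:
        self.drivers = drivers

    def collect(self) -> list[Device]:
        found: list[Device] = []
        for driver in self.drivers:
            found.extend(driver.discover())
        return found
```

Adding support for new hardware then means adding one driver class; the agent code does not change.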

Page 9

Interaction with Other Projects

Two main use case groups:
• Attached to the VM whose workload demands acceleration.
– Accelerator examples: FPGA (Intel & Xilinx), GPU, QAT…
• Used by the infrastructure, and then utilized via the appropriate service.
– Accelerator examples: DPDK/SPDK

Other projects: Nova; Nova & Glance

Page 10

Interaction with Nova
• Work with Nova through three steps:
1. Representation at discovery
2. Instance placement/scheduling
3. Attaching accelerators to instances

Upstream design (step 1 highlighted):
• Controller: nova-api, nova-conductor, nova-scheduler, nova-placement-api, cyborg-api, cyborg-conductor, cyborg-db
• Compute: nova-compute, hypervisor, cyborg-agent with Driver A, Driver B and Driver C managing the accelerators
• Step 1: accelerator updates are recorded in cyborg-db and reported to nova-placement-api
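For step 1, one way to represent accelerators in nova-placement-api is to expose each device type as a custom resource class in a resource provider's inventory. The sketch below only builds the JSON body for Placement's `PUT /resource_providers/{uuid}/inventories` call and makes no HTTP request; the `CUSTOM_ACCELERATOR_*` naming scheme is an assumption.

```python
from __future__ import annotations

import re
from collections import Counter


def resource_class(device_type: str) -> str:
    """Map a device type such as "fpga" to a custom resource class name.

    Placement requires custom classes to match CUSTOM_[A-Z0-9_]+, so any other
    character is replaced with "_". The CUSTOM_ACCELERATOR_ prefix is hypothetical.
    """
    return "CUSTOM_ACCELERATOR_" + re.sub(r"[^A-Z0-9]", "_", device_type.upper())


def inventory_payload(device_types: list[str], generation: int) -> dict:
    """Build a request body for PUT /resource_providers/{uuid}/inventories.

    `generation` must be the provider's current generation; Placement uses it
    to reject concurrent updates.
    """
    counts = Counter(resource_class(t) for t in device_types)
    return {
        "resource_provider_generation": generation,
        "inventories": {
            rc: {
                "total": n,
                "reserved": 0,
                "min_unit": 1,
                "max_unit": n,
                "step_size": 1,
                "allocation_ratio": 1.0,  # physical accelerators are not overcommitted
            }
            for rc, n in sorted(counts.items())
        },
    }
```

Custom resource classes must already exist in Placement before an inventory can reference them.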

Page 11

Interaction with Nova (step 2: instance placement/scheduling)
• Same three steps and components as the previous slide, with a filter/weigher added to nova-scheduler for step 2.

Page 12

Interaction with Nova (step 3: attaching accelerators to instances)
• Same three steps and components as the previous slides, with the filter/weigher from step 2 and os-acc added for step 3.

Page 13

3. Lenovo’s Contribution to Cyborg

Page 14

Real-World Requirements

Usage scenarios: AI, NFV, Blockchain, Big Data

Accelerators: GPU, FPGA, NVMe SSD, Netronome SmartNIC, Cavium SmartNIC, Intel QAT
Software stack: hypervisor, DPDK; OpenStack (Nova, Neutron); cyborg (API, Conductor, Agent, Driver, ...)

NFV: VNF (vRAN, vBRAS, SBC…) / Infrastructure (NGFW, OVS…)
• High performance: 10~100 Gbps and up
• High reliability: uptime of 99.999%
• Low latency: usually less than 100 ms
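The requirement figures translate into concrete budgets. The short calculation below assumes a 365.25-day year for the availability figure and, for the line-rate figure, minimum-size 64-byte Ethernet frames that each occupy 84 bytes on the wire once the 8-byte preamble and 12-byte inter-frame gap are counted.

```python
def downtime_minutes_per_year(availability_pct: float) -> float:
    """Maximum downtime per year allowed by an availability percentage."""
    minutes_per_year = 365.25 * 24 * 60  # 525,960 minutes
    return minutes_per_year * (1 - availability_pct / 100)


def packets_per_second(link_bps: float, frame_bytes: int = 64) -> float:
    """Worst-case packet rate at line rate, adding 20 bytes of preamble and inter-frame gap."""
    return link_bps / ((frame_bytes + 20) * 8)


if __name__ == "__main__":
    print(f"99.999% uptime: {downtime_minutes_per_year(99.999):.2f} min/year")
    for gbps in (10, 100):
        pps = packets_per_second(gbps * 1e9)
        print(f"{gbps} Gbps: {pps / 1e6:.2f} Mpps, {1e9 / pps:.2f} ns per packet")
```

Five nines leaves about 5.26 minutes of downtime a year, and at 100 Gbps a minimum-size packet arrives roughly every 6.72 ns, a per-packet budget that is hard to meet in software alone.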

Page 15

Lenovo’s Efforts on Cyborg
• Integrate with Nova
– Provide an acceleration solution that works without nova-placement.
– Provide the accelerator at VM boot time or via a separate attach/detach action.
• Extend drivers
– Use the upstream FPGA driver.
– Add GPU, Netronome drivers, etc.

Reasons:
• Some products are still based on releases before Newton, which do not have nova-placement.
• To use accelerators dynamically.
• To accelerate different workloads.

Page 16

Cyborg Use Case: GPU 1/2

Boot Time Attachment

Resource updating at discovery
• Accelerator resources are periodically updated in cyborg-db.

Instance scheduling
1. Create VMs with specific image properties.
2. Schedule using acc_filter.
3. Cyborg returns the list of compute nodes.

Attaching accelerators to instances
1. Call cyborg to claim the required GPU resource.
2. Define the VM XML with the GPU's pci_address.
3. Run the VM; if this fails, call cyborg to release the allocated GPU resource.

Diagram: nova-api, nova-conductor and nova-scheduler (with acc_filter) plus cyborg-api, cyborg-conductor and cyborg-db on the controller; nova-compute, the hypervisor, cyborg-agent (Driver A/B/C) and the accelerators on the compute node; labels "image_properties", "periodically retrieve" and "claim resources".
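The boot-time workflow above can be sketched end to end with an in-memory stand-in for cyborg. Everything here is hypothetical: `FakeCyborg`, the `accel:gpu_count` image property, and the `spawn` callback, which in a real deployment would define the VM XML with the claimed pci_address and start the VM. The point is the ordering: filter hosts, claim, spawn, and release the claim if spawning fails.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCyborg:
    """In-memory stand-in for the cyborg service; not the real client API."""
    free_gpus: dict  # host name -> list of free GPU PCI addresses
    claimed: list = field(default_factory=list)

    def hosts_with(self, count: int) -> list:
        """Return the compute nodes that have at least `count` free GPUs."""
        return [host for host, gpus in self.free_gpus.items() if len(gpus) >= count]

    def claim(self, host: str) -> str:
        pci = self.free_gpus[host].pop(0)
        self.claimed.append((host, pci))
        return pci

    def release(self, host: str, pci: str) -> None:
        self.claimed.remove((host, pci))
        self.free_gpus[host].append(pci)


def acc_filter(hosts: list, image_props: dict, cyborg: FakeCyborg) -> list:
    """Keep only candidate hosts that cyborg reports as able to serve the request."""
    wanted = int(image_props.get("accel:gpu_count", 0))  # hypothetical property name
    if wanted == 0:
        return hosts  # no accelerator requested, so every host passes
    capable = set(cyborg.hosts_with(wanted))
    return [host for host in hosts if host in capable]


def boot_with_gpu(host: str, cyborg: FakeCyborg, spawn) -> str:
    """Claim one GPU on `host` (one only, for brevity), spawn the VM, undo the claim on failure."""
    pci = cyborg.claim(host)
    try:
        spawn(host, pci)
    except Exception:
        cyborg.release(host, pci)  # release so the GPU is not leaked
        raise
    return pci
```

Releasing inside the `except` branch and re-raising keeps the failure visible to the caller while returning the GPU to the free pool.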

Page 17

Cyborg Use Case: GPU 2/2

Run-time Attachment (Hot-plug)

Command:
nova accelerator-attach instance_id --type GPU

Differences from boot time attachment:
1. Query nova-db to get the instance location.
2. Call cyborg to get the accelerator list.
3. Add a new XML file and attach it to the VM.

Diagram: nova-api and cyborg-api, cyborg-conductor, cyborg-db on the controller; nova-compute, the hypervisor, cyborg-agent (Driver A/B/C) and the accelerators on the compute node.
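Step 3 of the hot-plug path amounts to generating a device description and handing it to the hypervisor. Assuming a libvirt/KVM host, PCI pass-through uses a `<hostdev>` element, which libvirt-python could apply to a running domain with `virDomain.attachDeviceFlags(xml, libvirt.VIR_DOMAIN_AFFECT_LIVE)`. The sketch keeps the nova-db lookup, the cyborg query and the attach call as hypothetical callbacks so it runs without any of those services.

```python
import re
import xml.etree.ElementTree as ET

# Domain:bus:slot.function, e.g. "0000:af:00.0".
PCI_ADDRESS = re.compile(
    r"^([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def hostdev_xml(pci_address: str) -> str:
    """Build a libvirt <hostdev> element that passes one PCI device through to a VM."""
    match = PCI_ADDRESS.match(pci_address)
    if not match:
        raise ValueError(f"not a PCI address: {pci_address!r}")
    domain, bus, slot, function = match.groups()
    hostdev = ET.Element("hostdev", mode="subsystem", type="pci", managed="yes")
    source = ET.SubElement(hostdev, "source")
    ET.SubElement(
        source, "address",
        domain="0x" + domain, bus="0x" + bus, slot="0x" + slot, function="0x" + function,
    )
    return ET.tostring(hostdev, encoding="unicode")


def hot_attach(instance_id: str, locate, free_gpus, attach_device) -> str:
    """Run-time attachment following the three steps above (callbacks are hypothetical)."""
    host = locate(instance_id)        # 1. look up the instance's compute node (nova-db)
    candidates = free_gpus(host)      # 2. ask cyborg for free accelerators on that node
    if not candidates:
        raise LookupError(f"no free GPU on {host}")
    xml = hostdev_xml(candidates[0])  # 3. build the device XML ...
    attach_device(instance_id, xml)   # ... and attach it to the running VM
    return xml
```

With `managed="yes"`, libvirt detaches the device from its host driver before assigning it to the guest.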

Page 18

Cyborg Use Case: FPGA (Request-time Programming)
1. Use image properties to define the accelerator type and FPGA function.
2. Use the existing Glance table for FPGA bitstreams.

Differences from the GPU attachment workflow:
1. nova-compute calls cyborg and periodically checks the status of bitstream programming.
2. Cyborg gets the bitstream from Glance, then programs it to the FPGA.
3. Change the "type" of the FPGA PF/VF. The type is changed because the resource to be attached at the hypervisor level may differ after programming; e.g. if the FPGA PF/VF is programmed with a given NIC bitstream, cyborg should change its type from fpga to smartnic.

Diagram: the same components as the GPU case, plus glance; the "type" field is updated in cyborg-db.
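Two details of the FPGA workflow lend themselves to a sketch: polling the programming status with a timeout, and updating the device type once programming succeeds. The status strings, the bitstream function names and the mapping below are assumptions; only the NIC-to-smartnic example comes from the slide.

```python
import time

# Function implemented by a bitstream -> device type recorded after programming.
# Only "nic" -> "smartnic" comes from the slide; the mapping itself is hypothetical.
TYPE_AFTER_PROGRAM = {"nic": "smartnic"}


def device_type_after(function: str) -> str:
    """Type to record for a PF/VF once it is programmed with `function`."""
    return TYPE_AFTER_PROGRAM.get(function, "fpga")


def wait_for_program(get_status, timeout: float = 60.0, interval: float = 1.0) -> None:
    """Poll get_status() until it returns "done".

    Raises RuntimeError if programming reports "failed", and TimeoutError if it
    has not finished within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == "done":
            return
        if status == "failed":
            raise RuntimeError("bitstream programming failed")
        if time.monotonic() >= deadline:
            raise TimeoutError("bitstream programming did not finish in time")
        time.sleep(interval)
```

`time.monotonic()` is used for the deadline so that wall-clock adjustments on the compute node cannot shorten or extend the wait.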

Page 19

4. Demo: VM provisioning with GPU pass-through

Page 20

Environment
• Lenovo ThinkCloud OpenStack 4.2
– 3 nodes: 1 controller node and 2 compute nodes.
– One compute node with an NVIDIA GPU.
• Demo: VM provisioning with GPU pass-through

Topology: node-4 (controller), node-5 (compute) and node-6 (compute, with GPU), connected through a switch to the Internet and running ThinkCloud OpenStack 4.2.

Page 21

5. Summary

Page 22

Summary
• Achievements
– Use cyborg to manage different accelerators in Lenovo products.
– Integrate with Nova, forming a standard workflow for creating VMs with GPU/FPGA… pass-through.
• Future Work
– Support sharing accelerator hardware among VMs.
  - Cyborg driver support for discovering and storing shared accelerators.
– An application plugin mechanism for cyborg-api, etc.

Page 23

Q&A
• Jinghua Gao
– Twitter: @Miss_Coco_Gao
– IRC: coco
– Network acceleration & datacenter traffic analysis
• Zhenghao Wang
– IRC: wangzhh
– OpenStack Zun & Cyborg contributor
– Cloud computing researcher at Lenovo
