Open-source microservices for GPU-accelerated processing of UHD/HDR material

Richard Cartwright – CTO
work with Simon Rogers – CRO
Streampunk Media Ltd.

IABM Copyright 2018 www.theiabm.org @THEIABM




Introduction

First appearing at NAB 2016, Streampunk Media Ltd is a new company based in Scotland, UK, aiming to democratize professional content production with commodity infrastructure. We develop open-source software for stream-based media production, supporting a transition to dynamic software infrastructure in dematerialized facilities.

Dr Richard Cartwright is CTO and founder of Streampunk Media. Richard holds a PhD in Computer Science from the University of Warwick. He has previously worked at the BBC and SAM, and as Technical Steering Committee Chair of the AMWA. Richard is a co-author of the JT-NM Reference Architecture.


Problem

pro video

streams

microservices

dynamic software infrastructure

Internet of Things (IoT)

commodity, non media-specific


Software-only solution

• Stream rather than batch
  • Efficiently stream and process media much faster than real time
  • In RAM, no going back to baseband

• Independent of the underlying system
  • CPU – Intel Core/Xeon, AMD Ryzen/Zen, ARM-based
  • GPU – Nvidia, Radeon, Intel embedded, other
  • OS & location – Windows, macOS and Linux, locally / hybrid / cloud

• Ease of development for media applications
  • Open-source, free at the point of use
  • Self-contained JavaScript programs with simple memory management

• Reusable set of IoT microservices
  • User does not need in-depth knowledge of colour spaces, interlacing etc.


Stream rather than batch

• IT systems are non-real time.
  • Not slow.
  • Go parallel!

• IT systems pause:
  • CPU speed step
  • Virtualisation
  • Garbage collection
  • Threads


Using multi-core CPU & GPU

• Prize
  • Massive parallel processing resource – intended for 3D
  • Software only & virtual

• Issues (with e.g. OpenGL)
  • GPU has separate RAM
  • Takes time to copy
  • Different image from broadcast
  • YUV to RGB, int to float, 8/10 bit

• OpenCL language
  • Device independent C-like lang.
  • Driver for big data, blockchain etc.
  • Shared virtual memory option

• Apply to media processing
  • nodencl - open source project
  • OpenCL for video in JavaScript
  • Core: floats RGB up to UHD/WCG
  • Edge: V210, ST 2110 etc.
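The "YUV to RGB, int to float, 8/10 bit" step can be illustrated for a single sample. This is a minimal sketch, not code from nodencl: it assumes 10-bit narrow-range Y'CbCr (luma 64 to 940, chroma 64 to 960) and the BT.709 luma coefficients, and the function name `ycbcr10ToRgbFloat` is hypothetical.

```javascript
// BT.709 luma coefficients; KG follows from KR + KG + KB = 1.
const KR = 0.2126;
const KB = 0.0722;
const KG = 1 - KR - KB;

// Convert one 10-bit narrow-range Y'CbCr sample (assumed input format)
// to non-linear R'G'B' floats, nominally in the range 0..1.
function ycbcr10ToRgbFloat(y, cb, cr) {
  const yn = (y - 64) / 876;   // luma: 64..940 maps to 0..1
  const pb = (cb - 512) / 896; // chroma: 64..960 maps to -0.5..0.5
  const pr = (cr - 512) / 896;
  const r = yn + 2 * (1 - KR) * pr;
  const b = yn + 2 * (1 - KB) * pb;
  const g = (yn - KR * r - KB * b) / KG;
  return [r, g, b];
}

// Nominal white and black, for illustration.
console.log(ycbcr10ToRgbFloat(940, 512, 512));
console.log(ycbcr10ToRgbFloat(64, 512, 512));
```

On a GPU the same arithmetic would run once per pixel inside a kernel, so that the core of the pipeline only ever sees floating-point RGB.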


Intel Gen9 processor graphics architecture

https://software.intel.com/sites/default/files/managed/c5/9a/The-Compute-Architecture-of-Intel-Processor-Graphics-Gen9-v1d0.pdf

5376 concurrent instances
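One way to arrive at the 5376 figure, assuming the slide refers to a 24-EU Gen9 configuration with 7 hardware threads per execution unit and kernels compiled as SIMD-32 (an assumption about which configuration the slide means, not something the slide states):

```javascript
// Assumed Gen9 configuration; the variable names are illustrative.
const executionUnits = 24; // EUs in the assumed configuration
const threadsPerEu = 7;    // hardware threads per EU
const simdWidth = 32;      // kernel instances per thread at SIMD-32

const concurrentInstances = executionUnits * threadsPerEu * simdWidth;
console.log(concurrentInstances); // 5376
```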


Ease of development

• Lots of media software is C++, often hardware control APIs
  • Not so widely taught or understood, complex

• Node.js introduced server-side JavaScript
  • Single threaded programs – based on GUI event-loop
  • Multi-threaded asynchronous APIs – efficient resource management
  • Easy to learn … massive pool of talent, test infrastructure, knowledge

• Node Package Manager npm – a software ecosystem
  • Modular software building blocks
  • Largest software repo in the world – 700,000 packages, 30%+ native accelerated
  • Many packages are plug-in microservices

• AIM: Add professional media capability into the Node ecosystem


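NodenCL microservice

    const nodencl = require('nodencl');

    // OpenCL C kernel: squares each byte of the input buffer.
    const kernel = `__kernel void square(
      __global unsigned char* input,
      __global unsigned char* output) {
        int i = get_global_id(0);
        output[i] = input[i] * input[i];
    }`;

    // await is only valid inside an async function in a CommonJS module.
    async function run() {
      let context = await nodencl.createContext();
      let program = await context.createProgram(kernel);
      // Allocate the input and output buffers (sizes in bytes) in parallel.
      let [input, output] = await Promise.all([
        context.createBuffer(65536, 'readonly'),
        context.createBuffer(9000, 'writeonly', 'fine')
      ]);
      await input.hostAccess(); // then copy into buffer
      let execTimings = await program.run({input, output});
      return execTimings;
    }

    run().catch(console.error);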
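When experimenting with a kernel like `square`, a plain CPU version is handy for checking the GPU output. The sketch below is not part of nodencl and `squareReference` is a hypothetical helper. It relies on `Uint8Array` storing values modulo 256, which matches what happens when the kernel writes the product back to an `unsigned char`.

```javascript
// CPU reference for the `square` kernel: squares each byte, wrapping modulo 256.
function squareReference(input) {
  const output = new Uint8Array(input.length);
  for (let i = 0; i < input.length; i++) {
    output[i] = input[i] * input[i]; // Uint8Array keeps only the low 8 bits
  }
  return output;
}

console.log(Array.from(squareReference(Uint8Array.from([15, 16, 17, 255]))));
// prints [ 225, 0, 33, 1 ]
```

The wrap-around is worth noticing: any input of 16 or more overflows a byte, which is one reason the core of the pipeline works in floating point.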


Deconstructed services

4:2:2 YUV BT.709

4:2:2 YUV BT.2020 HLG

4:2:2 YUV BT.2020 HDR10

4:4:4 RGB-Int BT.2020

1920i 10-bit V210

3180p 10-bit V210

3180p 10-bit SMPTE-2110

X

accelerated multi-core


Television colorimetry elements

https://www.itu.int/dms_pub/itu-r/opb/rep/R-REP-BT.2380-2015-PDF-E.pdf

[Figure: CIE 1931 xy chromaticity diagram (axes x and y, spectral locus marked from 380 to 700), labelled with D65, BT.709 and BT.2020.]

By CIExy1931.svg: Sakurambo, derivative work: GrandDrake (talk) - CIExy1931.svg, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=21864671
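Moving a colour from the BT.709 gamut to the BT.2020 gamut is a 3x3 matrix multiply on linear-light RGB. The sketch below uses the coefficients published in ITU-R BT.2087, rounded to four decimal places. It is an illustration rather than code from the Streampunk libraries, and `bt709ToBt2020` is a hypothetical name. Transfer functions (gamma, HLG, PQ) have to be removed before this step and reapplied after it, and that is not shown.

```javascript
// Linear-light BT.709 RGB to linear-light BT.2020 RGB (ITU-R BT.2087 coefficients).
const BT709_TO_BT2020 = [
  [0.6274, 0.3293, 0.0433],
  [0.0691, 0.9195, 0.0114],
  [0.0164, 0.0880, 0.8956],
];

function bt709ToBt2020(rgb) {
  return BT709_TO_BT2020.map(
    (row) => row[0] * rgb[0] + row[1] * rgb[1] + row[2] * rgb[2]
  );
}

// Both systems share the D65 white point, so each row sums to 1 and white stays white.
console.log(bt709ToBt2020([1, 1, 1]));
```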


Memory management

• At the edge – Node.js byte buffers in CPU-accessible RAM
  • 8- or 10-bit YUV integers
  • Sub-sampled 4:2:2 or 4:2:0
  • Packaged as V210, SMPTE 2110 pgroup – transport/storage optimised

• In the middle – GPU-accessible mapped arrays
  • 16- or 32-bit RGB or RGBA floating point values
  • No sub-sampling … 4:4:4
  • From 3.2 to 6.4 times more RAM required

• Expensive to move between the two for composed services
  • Cost of always going back to baseband and/or transporting
  • NodenCL keeps the data where it is, moving it only when required
  • Combined un/pack YUV/RGB conversion
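The "3.2 to 6.4 times" range is consistent with comparing tightly packed 10-bit 4:2:2 (2.5 bytes per pixel, as in a 2110 pgroup) against four-channel RGBA at 16 and 32 bits per channel. That pairing is an assumption made here to reproduce the numbers, since the slide does not say which formats were compared. V210 pads six pixels to 16 bytes, so its ratios come out slightly lower.

```javascript
// 10-bit 4:2:2: a pixel pair carries Y0, Y1, Cb, Cr = 4 samples x 10 bits = 5 bytes.
const packedBytesPerPixel = (4 * 10) / 8 / 2; // 2.5 bytes per pixel

// Bytes per pixel for an unpacked 4:4:4 format.
function floatBytesPerPixel(channels, bitsPerChannel) {
  return (channels * bitsPerChannel) / 8;
}

const ratioHalf = floatBytesPerPixel(4, 16) / packedBytesPerPixel;  // RGBA, 16-bit
const ratioFloat = floatBytesPerPixel(4, 32) / packedBytesPerPixel; // RGBA, 32-bit
console.log(ratioHalf, ratioFloat);

// Per-frame cost at UHD resolution (3840 x 2160), in bytes.
const uhdPixels = 3840 * 2160;
console.log(uhdPixels * packedBytesPerPixel);       // packed 10-bit 4:2:2
console.log(uhdPixels * floatBytesPerPixel(4, 32)); // RGBA 32-bit float
```

Under these assumptions a UHD frame grows from about 20.7 MB packed to about 132.7 MB as 32-bit float RGBA, which is why avoiding unnecessary copies between CPU and GPU memory matters.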


Conclusion and next steps

• Key takeaways
  • Demonstrating that fast, software-only UHD/HDR processing is possible
  • Faster than real time, frame-based microservices in reactive streams pipelines
  • OpenCL - no need for media-industry-specific hardware – e.g. Matrox
  • Enable professional media processing with simple (ish) JavaScript, hiding:
    • CPU or GPU type
    • OS and platform
    • memory management
  • Open-source library of TV services for colour, transitions, scalers …

• Next steps
  • Scaling using OpenCL images and samplers
  • Complete the integration into Streampunk dynamorse, simplified pipelines
  • Looking for PoCs, field trials & feedback


Links, thanks, questions

Links

https://github.com/Streampunk/nodencl

https://github.com/Streampunk/node-red-contrib-dynamorse-opencl

Thank you!

Simon Rogers

Ben Russell

Roland Rodgers & BBC NI

Questions?

[email protected]

www.streampunk.media/