Cloud, Distributed, Embedded: Erlang in the Heterogeneous Computing World

Posted on 30-Aug-2014

Omer Kilic || @OmerK

omer@erlang-solutions.com

Slide 2 of 46

Outline

• Challenges in modern computing systems

• Heterogeneous computing

• Co-processors and accelerators

• Programming models and tools

• Alternate architectures

• Parallella Vision System

• Erlang Embedded Project

• Q&A

10/12/2013 Build Stuff 2013

Slide 3 of 46

Challenges: Software

• Frequency wall

• Memory bottlenecks

• Software complexity

Slide 4 of 46

Amdahl’s Law

• “…the maximum speed-up through parallel processing is set by the amount of code which has to run serial”
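
The law is usually written as follows, with p the fraction of the work that can run in parallel and N the number of processors. The numbers in the second line are an illustrative choice (p = 0.95, N = 64), not figures from the talk:

```latex
\begin{aligned}
S(N) &= \frac{1}{(1 - p) + p/N}, \qquad \lim_{N \to \infty} S(N) = \frac{1}{1 - p} \\
p = 0.95,\ N = 64: \quad S &= \frac{1}{0.05 + 0.95/64} = \frac{1}{0.06484375} \approx 15.4, \qquad S_{\max} = \frac{1}{0.05} = 20
\end{aligned}
```

Even with 95% of the code parallelised, 64 cores give only about a 15x speed-up. No number of cores can push it past 20x, because the serial 5% sets the ceiling.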

Slide 5 of 46

Challenges: Hardware

• Yield issues

• Wiring and interconnect

• Thermal density

• Power consumption

End of Moore’s law imminent…

Slide 6 of 46

Challenges

“With nearly 10 billion devices connected to the internet and predictions for exponential growth, we’ve reached a point where the space, power, and cost demands of traditional technology are no longer sustainable.”

Meg Whitman, President and CEO, HP

Slide 7 of 46

Internet of Things

Slide 8 of 46

Device Architectures (I)

Slide 9 of 46

Device Architectures (II)

Slide 10 of 46

Heterogeneous Computing (I)

• Special-purpose, highly specialised architectures will outperform general-purpose processing devices

– Possibly by orders of magnitude

– In terms of energy efficiency as well as raw speed

– Parallel execution is key

• Non-programmable/pseudo-programmable accelerators: ASIC, DSP, GPU, …

• Fully programmable accelerators: FPGAs

Slide 11 of 46

Open Compute Project

Slide 12 of 46

Heterogeneous Computing (II)

Slide 13 of 46

GPUs

Slide 14 of 46

Anatomy of a GPU

Slide 15 of 46

Co-processors: NetFPGA 10G

Slide 16 of 46

Co-processors: Generic COTS devices

Slide 17 of 46

Landscape of accelerator programming

| Interface      | CUDA                   | OpenCL                         | DirectCompute      | RenderScript     |
|----------------|------------------------|--------------------------------|--------------------|------------------|
| Originator     | NVIDIA                 | Khronos (Apple)                | Microsoft          | Google           |
| Year           | 2007                   | 2008                           | 2009               | 2011             |
| Area           | HPC, desktop           | Desktop, mobile, embedded, HPC | Desktop            | Mobile           |
| OS             | Windows, Linux, Mac OS | Windows, Linux, Mac OS (10.6+) | Windows (Vista+)   | Android (3.0+)   |
| Devices        | GPUs (NVIDIA)          | CPUs, GPUs, custom             | GPUs (NVIDIA, AMD) | CPUs, GPUs, DSPs |
| Work unit      | Kernel                 | Kernel                         | Compute shader     | Compute script   |
| Language       | CUDA C/C++             | OpenCL C                       | HLSL               | Script C         |
| Distributed as | Source, PTX            | Source                         | Source, bytecode   | LLVM bitcode     |

From: “The landscape of accelerator programming: a view from ARM”, Lokhmotov, A., 3rd UK GPU Computing Conference, London

Slide 18 of 46

Accelerator types

• Programmable accelerators

– CPU Vector extensions: x86/SSE/AVX, PowerPC/VMX, ARM/NEON

– GPUs supporting general-purpose computing (GPGPUs)

– Sony/Toshiba/IBM Cell (Sony PlayStation 3, HPC)

– ClearSpeed CSX (HPC, embedded)

– Adapteva Epiphany (HPC, mobile)

– Intel MIC (HPC)

Slide 19 of 46

Programming accelerators

• Proprietary low-level APIs, typically C-based:

– Vector intrinsics

– NVIDIA CUDA

– ATI Brook+

– ClearSpeed Cn

• No software portability; risk of obsolescence.

Slide 20 of 46

OpenCL (I)

“OpenCL (Open Computing Language) is an open, royalty-free standard for general-purpose parallel programming of heterogeneous systems. OpenCL provides a uniform programming environment for software developers to write efficient, portable code for high-performance compute servers, desktop computer systems and handheld devices using a diverse mix of multi-core CPUs, GPUs, Cell-type architectures and other parallel processors such as DSPs.”

Slide 21 of 46

OpenCL (II)

• Lets you write C-like code that executes on GPUs and many other devices

– CPUs, FPGAs, various other architectures

• The key point is data parallelism: applying the same function to a large amount of data

• Lets us drive devices such as GPUs from Erlang easily, with only a minimal wrapper
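
The data-parallel idea can be illustrated in plain Erlang, with one process per data item applying the same function. This is only an analogy for what an OpenCL kernel does at a much finer grain. It does not use OpenCL, and the module name pmap_demo is hypothetical:

```erlang
-module(pmap_demo).
-export([pmap/2]).

%% Apply F to every element of List in parallel, one process per element,
%% and return the results in the same order as the input.
%% Workers are linked, so a crash in F also takes down the caller.
pmap(F, List) ->
    Parent = self(),
    %% Spawn one worker per element; each sends {Ref, F(X)} to the parent.
    Refs = [spawn_worker(Parent, F, X) || X <- List],
    %% Collect in spawn order so results line up with the input list.
    [receive {Ref, Result} -> Result end || Ref <- Refs].

spawn_worker(Parent, F, X) ->
    Ref = make_ref(),
    spawn_link(fun() -> Parent ! {Ref, F(X)} end),
    Ref.
```

An Erlang process is much heavier than a GPU work-item, so this pattern suits coarse units of work. Per-pixel arithmetic is the part worth handing to OpenCL.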

Slide 22 of 46

The Parallella Board

Slide 23 of 46

Shiny prototype!

Slide 24 of 46

The Parallella Board

Slide 25 of 46

Epiphany Architecture

Slide 26 of 46

Epiphany-IV 64-core 28nm (E64G401)

• 64 High Performance RISC CPU Cores

• 800 MHz Operating Frequency

• 100 GFLOPS Peak Performance

• 1.6 TB/s Local Memory Bandwidth

• 102 GB/s Network-On-Chip Bisection Bandwidth

• 6.4 GB/s Off-Chip Bandwidth

• 2 MB On-Chip Distributed Shared Memory

• 2 Watt Maximum Chip Power Consumption

• IEEE Floating Point Instruction Set

• Fully-featured ANSI-C/C++ programmable

• GNU/Eclipse based tool chain

• Source synchronous LVDS off chip links for host or direct chip-to-chip interfacing.

• Chip to chip links for integrating up to 64 chips on a single board
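
The peak figure is consistent with the core count and clock if each core completes one fused multiply-add (two floating-point operations) per cycle. That per-cycle rate is an assumption here, not something stated on the slide. Dividing the shared memory evenly across the cores gives the per-core local store:

```latex
\begin{aligned}
\text{Peak} &= 64 \times 0.8\,\text{GHz} \times 2\,\tfrac{\text{FLOP}}{\text{cycle}} = 102.4\,\text{GFLOPS} \approx 100\,\text{GFLOPS} \\
\text{Memory per core} &= \frac{2\,\text{MB}}{64} = 32\,\text{KB}
\end{aligned}
```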

Slide 27 of 46

Parallella Vision Demo - Overview

Slide 28 of 46

Parallella Vision Demo - Cameras

Slide 29 of 46

Parallella Vision Demo - Architecture

Slide 30 of 46

OpenCL and Erlang

• Erlang is not well suited to crunching image data.

– This is where OpenCL fits in.

• Erlang provides an environment around OpenCL. Our server implementation collects frames, offloads processing to the Epiphany and sends results back.

– Low latency distributed communications and message passing between processes and nodes

– Monitoring and supervision facilities

– “Glue” between heterogeneous nodes
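
The server role described above maps naturally onto an OTP gen_server. The sketch below is not the parcv implementation. The module name frame_server, its submit/2 API and the OffloadFun argument are all hypothetical; OffloadFun is a plain fun standing in for the OpenCL call that would run on the Epiphany:

```erlang
-module(frame_server).
-behaviour(gen_server).

-export([start_link/1, submit/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

%% Start a server that hands every frame to OffloadFun, a stand-in for the
%% (hypothetical) call that runs the OpenCL kernel on the accelerator.
start_link(OffloadFun) when is_function(OffloadFun, 1) ->
    gen_server:start_link(?MODULE, OffloadFun, []).

%% Submit a frame and block until the processed result comes back.
submit(Server, Frame) ->
    gen_server:call(Server, {frame, Frame}).

init(OffloadFun) ->
    {ok, #{offload => OffloadFun, processed => 0}}.

%% Offload the frame, reply with the result and count processed frames.
handle_call({frame, Frame}, _From, State = #{offload := Offload, processed := N}) ->
    Result = Offload(Frame),
    {reply, {ok, Result}, State#{processed := N + 1}};
handle_call(_Other, _From, State) ->
    {reply, {error, unknown_request}, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(_Info, State) ->
    {noreply, State}.
```

Running such a server under a supervisor provides the monitoring and restart behaviour listed above. If an offload call crashes the server, the supervisor starts a fresh one.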

Slide 31 of 46

OpenCL on the Parallella

• The Parallella is a little different from standard GPUs

– Work sizes are different (far fewer cores than a GPU)

– Requires some forethought in structuring your kernels
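
With tens of cores rather than thousands of GPU threads, one option is to give each core a contiguous block of image rows instead of one work-item per pixel. The helper below only computes such a partition. The module and function names are hypothetical, and the code is not taken from the parcv sources:

```erlang
-module(work_split).
-export([row_chunks/2]).

%% Split Height image rows into Cores contiguous {FirstRow, LastRow} ranges
%% (1-based, inclusive), one per core. When Height does not divide evenly,
%% the first (Height rem Cores) ranges get one extra row. If there are more
%% cores than rows, the surplus ranges are empty (LastRow < FirstRow).
row_chunks(Height, Cores)
  when is_integer(Height), Height >= 0, is_integer(Cores), Cores > 0 ->
    chunks(1, 0, Cores, Height div Cores, Height rem Cores).

%% Stop once every core (index 0 .. Cores-1) has a range.
chunks(_Start, Cores, Cores, _Base, _Extra) ->
    [];
chunks(Start, I, Cores, Base, Extra) ->
    Size = case I < Extra of
               true -> Base + 1;
               false -> Base
           end,
    [{Start, Start + Size - 1} | chunks(Start + Size, I + 1, Cores, Base, Extra)].
```

For example, 480 rows on 16 cores gives each core 30 rows. The first two ranges are {1,30} and {31,60}.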

Slide 32 of 46

Parallella and Erlang

• Ubuntu armhf packages up and running

– Will be included in the standard distro image

• Vision Demo code available now

– https://github.com/esl/parcv

Slide 34 of 46

Embedded Landscape

Slide 36 of 46

External Interfaces in Erlang

Slide 37 of 46

Accessing hardware

• Peripherals are memory-mapped

• Access via /dev/mem…

– Faster, needs root, potentially dangerous!

• …or by kernel modules/sysfs

– Slower, doesn’t need root, easier, relatively safer

Generally very messy…
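
The sysfs route needs nothing more than file I/O. The sketch below drives the Linux GPIO sysfs files (export, then gpioN/direction and gpioN/value under /sys/class/gpio) using Erlang's standard file module. The module and function names are hypothetical, and this is not Erlang/ALE's API. The base directory is a parameter so the code can be tried against an ordinary directory instead of real hardware:

```erlang
-module(sysfs_gpio).
-export([export/2, set_direction/3, write/3, read/2]).

%% Ask the kernel to expose pin Pin as Base/gpio<Pin>/.
export(Base, Pin) ->
    file:write_file(filename:join(Base, "export"), integer_to_list(Pin)).

%% Direction is "in" or "out".
set_direction(Base, Pin, Direction) when Direction =:= "in"; Direction =:= "out" ->
    file:write_file(pin_file(Base, Pin, "direction"), Direction).

%% Drive an output pin low (0) or high (1).
write(Base, Pin, Value) when Value =:= 0; Value =:= 1 ->
    file:write_file(pin_file(Base, Pin, "value"), integer_to_list(Value)).

%% Returns {ok, 0 | 1} or {error, Reason}.
read(Base, Pin) ->
    case file:read_file(pin_file(Base, Pin, "value")) of
        {ok, <<C, _/binary>>} when C =:= $0; C =:= $1 -> {ok, C - $0};
        {ok, _Other} -> {error, bad_value};
        {error, _} = Err -> Err
    end.

%% Path of an attribute file such as Base/gpio17/value.
pin_file(Base, Pin, Name) ->
    filename:join([Base, "gpio" ++ integer_to_list(Pin), Name]).
```

On a real board the base would be "/sys/class/gpio", and the caller needs write permission to those files. Newer Linux kernels deprecate this sysfs interface in favour of the GPIO character device, but it remains the simplest route to illustrate.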

Slide 38 of 46

Introducing…

Erlang/ALE: Actor Library for Embedded

http://github.com/esl/erlang-ale

Slide 39 of 46

Erlang/ALE

• Brings embedded peripheral interfaces into the Erlang domain

• Provides easy-to-use, familiar abstractions for Erlang programmers

• Uses the Raspberry Pi as its reference platform; easy to port to other embedded platforms

• Open source (Apache License, Version 2.0)

Slide 40 of 46

Beta release

• Based on pihwm

– http://omerk.github.io/pihwm

• GPIO and GPIO interrupts, SPI, I2C and PWM peripherals supported

• Documentation, supporting material and educational package under development

Slide 41 of 46

ALE Example: Blink!

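%% Configure the LED pin as an output, then blink once.
start() ->
    {ok, _} = gpio:start_link(?LED_PIN, output),
    blink().

%% LED on for one second, then off for one second.
blink() ->
    gpio:write(?LED_PIN, 1),
    timer:sleep(1000),
    gpio:write(?LED_PIN, 0),
    timer:sleep(1000).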

Slide 42 of 46

ALE Example: Interrupts

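%% gen_server init: configure the pin as an input and enable
%% interrupts on rising edges (they arrive as messages).
init(_Args) ->
    {ok, _} = gpio:start_link(?IN_PIN, input),
    ok = gpio:set_int(?IN_PIN, rising),
    {ok, undefined}.

%% Blink the LED on each interrupt; handle_info must return {noreply, State}.
handle_info({gpio_interrupt, _Pin, _Condition}, State) ->
    blink(),
    {noreply, State}.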

Slide 43 of 46

Hardware Projects – Demo Board

Slide 45 of 46

Erlang

Slide 46 of 46

Thank you

• http://erlang-embedded.com

• embedded@erlang-solutions.com

• @ErlangEmbedded

The world is concurrent. Things in the world don't share data. Things communicate with messages. Things fail.

- Joe Armstrong, father of Erlang
