What is SIMD? - Arun Raghavan

Photo: necosky on Flickr


Post on 01-Aug-2020


TRANSCRIPT

Page 1:

Photo: necosky on Flickr

Page 2:

Or,

Vector Optimisation For SIMD Newbies

… by a SIMD newbie

Page 3:

What is SIMD?

Page 4:

Image: Wikipedia

Page 5:

x86: MMX/3DNow/SSE*

ARM: NEON

GPGPU: CUDA/OpenCL

Page 6:

Works well for a very specific subset of apps

Page 7:

3D Graphics

Page 8:

Signal processing

Page 9:

Other (specific) forms of data crunching

Page 10:

But writing SIMD assembly is hard

Page 11:

Need to write once per architecture/processor
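
To make the per-architecture cost concrete, here is a minimal sketch of one trivial kernel (adding two buffers of 16-bit samples) written once with x86 SSE2 intrinsics, once with ARM NEON intrinsics, and once as a portable fallback. The function name `add_s16` is hypothetical and chosen only for illustration; it is not taken from the talk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* x86 SSE2 intrinsics */
#elif defined(__ARM_NEON)
#include <arm_neon.h>    /* ARM NEON intrinsics */
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i], wrapping on overflow. */
static void add_s16(int16_t *dst, const int16_t *a, const int16_t *b, size_t n)
{
    size_t i = 0;
    assert(dst && a && b);

#if defined(__SSE2__)
    /* x86 version: 8 samples per 128-bit register. */
    for (; i + 8 <= n; i += 8) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_add_epi16(va, vb));
    }
#elif defined(__ARM_NEON)
    /* ARM version: same idea, entirely different instructions. */
    for (; i + 8 <= n; i += 8) {
        int16x8_t va = vld1q_s16(a + i);
        int16x8_t vb = vld1q_s16(b + i);
        vst1q_s16(dst + i, vaddq_s16(va, vb));
    }
#endif

    /* Scalar tail, and the whole buffer on other architectures. */
    for (; i < n; i++)
        dst[i] = (int16_t)((uint16_t)a[i] + (uint16_t)b[i]);
}
```

Even for a one-line operation, each target needs its own loads, stores and arithmetic calls, and each variant has to be tested separately.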

Page 12:

Enter: Orc

Page 13:

Project started by David Schleef

Page 14:

Write “programs” in a simple ASM-like language

Page 15:

Runtime compiler: “programs” → native assembly

Page 16:

Library: extend Orc for your purposes

Page 17:

Supports:

MMX, SSE (1 to 4.2)

ARM, NEON

Altivec (PPC)

C64x (TI DSP)

Page 18:
Page 19:

Getting started

Page 20:

A simple example: PulseAudio echo canceller

Page 21:

That alone provided a >20% speedup

Page 22:

Another one: PulseAudio volume scaling

Page 23:

Sample s: 16-bit signed int (usually)

Volume v: 32-bit unsigned int

Operation: (s * v) >> 16
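
As a rough illustration of the operation above, the following is a minimal scalar sketch in C. It is not the PulseAudio source: the function name `scale_s16_c` is hypothetical, the volume is assumed to be a 16.16 fixed-point factor (so 0x10000 means "unchanged"), and clamping the result to the 16-bit range is an added assumption for volumes above that value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: scale 16-bit samples in place by a 16.16
 * fixed-point volume v, i.e. compute (s * v) >> 16 per sample. */
static void scale_s16_c(int16_t *samples, size_t n, uint32_t v)
{
    assert(samples || n == 0);

    for (size_t i = 0; i < n; i++) {
        /* Widen to 64 bits so that s * v cannot overflow.  The shift of a
         * negative value relies on arithmetic right shift, which common
         * compilers provide. */
        int64_t t = ((int64_t)samples[i] * (int64_t)v) >> 16;

        /* Assumption: saturate rather than wrap when v amplifies. */
        if (t > INT16_MAX) t = INT16_MAX;
        if (t < INT16_MIN) t = INT16_MIN;

        samples[i] = (int16_t)t;
    }
}
```

For example, scaling the samples 1000 and -1000 by v = 0x8000 (half volume) gives 500 and -500.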

Page 24:

The C code

Page 25:

The SSE code
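
The code shown on this slide is not part of the transcript. As a stand-in, here is a hedged sketch of how (s * v) >> 16 might be vectorised with SSE2 intrinsics, eight samples at a time. It is not the PulseAudio implementation. The name `scale_s16_sse2` and the constant `VOLUME_UNITY` are hypothetical, and the sketch assumes the volume never exceeds 0x10000 (no amplification), which keeps every result inside the 16-bit range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#define VOLUME_UNITY 0x10000u  /* hypothetical name: 1.0 in 16.16 fixed point */

/* Hypothetical helper: samples[i] = (samples[i] * v) >> 16, v <= VOLUME_UNITY. */
static void scale_s16_sse2(int16_t *samples, size_t n, uint32_t v)
{
    size_t i = 0;
    assert(v <= VOLUME_UNITY);

#if defined(__SSE2__)
    /* Split v into halves: (s * v) >> 16 == s * vh + ((s * vl) >> 16). */
    const __m128i vl = _mm_set1_epi16((short)(uint16_t)(v & 0xffff));
    const __m128i vh = _mm_set1_epi16((short)(v >> 16)); /* 0 or 1 here */
    const __m128i zero = _mm_setzero_si128();

    for (; i + 8 <= n; i += 8) {
        __m128i s = _mm_loadu_si128((const __m128i *)(samples + i));

        /* High 16 bits of s * vl, computed as unsigned x unsigned... */
        __m128i hi = _mm_mulhi_epu16(s, vl);
        /* ...then corrected for negative s, which was read as s + 65536. */
        __m128i neg = _mm_cmplt_epi16(s, zero);
        hi = _mm_sub_epi16(hi, _mm_and_si128(neg, vl));

        /* Add s * vh; with vh in {0, 1} this cannot overflow. */
        __m128i r = _mm_add_epi16(hi, _mm_mullo_epi16(s, vh));
        _mm_storeu_si128((__m128i *)(samples + i), r);
    }
#endif

    /* Scalar tail, and the whole buffer on builds without SSE2. */
    for (; i < n; i++)
        samples[i] = (int16_t)(((int64_t)samples[i] * (int64_t)v) >> 16);
}
```

The sign correction is the part that makes hand-written SIMD awkward here: SSE2 has no 16-bit "signed times unsigned, high half" multiply, so the unsigned result has to be patched up for negative samples.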

Page 26:

The Orc code

Page 27:

TBD: More AEC optimisation (dot product)
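
For readers unfamiliar with the kernel mentioned here, the following is a small sketch of a dot product in plain C, arranged with four independent partial sums to mirror how a 4-lane SIMD version would accumulate. The name `dot_f32` is hypothetical, and the use of `float` data is an assumption; the slide does not say which sample type the echo canceller uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sum of a[i] * b[i] for i in [0, n). */
static float dot_f32(const float *a, const float *b, size_t n)
{
    /* Four independent accumulators, one per notional SIMD lane. */
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    size_t i = 0;
    assert((a && b) || n == 0);

    for (; i + 4 <= n; i += 4) {
        acc0 += a[i]     * b[i];
        acc1 += a[i + 1] * b[i + 1];
        acc2 += a[i + 2] * b[i + 2];
        acc3 += a[i + 3] * b[i + 3];
    }

    /* Leftover elements that do not fill a group of four. */
    for (; i < n; i++)
        acc0 += a[i] * b[i];

    /* Horizontal reduction of the partial sums. */
    return (acc0 + acc1) + (acc2 + acc3);
}
```

The final reduction across accumulators is the step that does not map onto purely element-wise SIMD operations.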

Page 28:

Limitations

Page 29:

Future work

Page 30:

Questions?

Page 31:

IRC: #orc on FreeNode