
Post on 21-Dec-2015


The Return of the SIMD Computers:
UCSC Kestrel and Beyond

Andrea Di Blas

School of Engineering, University of California, Santa Cruz

Outline

Introduction: UCSC Kestrel
“Synchronous” applications
“Asynchronous” applications

INTRODUCTION

In a not-too-distant past…

A long time ago (late 1980s), the computing community had high hopes and expectations for a new kind of architecture: the “Single Instruction-Multiple Data” (SIMD) parallel computers.

Many started building “massively parallel” SIMD computers with thousands of processors.

However, almost all were short-lived. Their high cost, the ever-increasing power of the evil serial CPUs and, above all, the effort required to program such an unfamiliar architecture forced big SIMD machines into early retirement.

By the mid-90s, SIMD machines were already disappearing from the Top500, the list of the world’s largest supercomputers.

But in late 1998, a small group at UC Santa Cruz finally had the first working prototype of a new kind of high-performance, low-cost SIMD co-processor.

Originally designed for computational biology, it proved extremely powerful in a variety of other applications.

In the computing galaxy, a new SIMD star was born…

Kestrel

MIMD and SIMD

MIMD: Multiple Instruction-Multiple Data
SIMD: Single Instruction-Multiple Data

Image Filters on Kestrel

2D Gaussian filter
Edge detector

2D Gaussian convolution

[Image slides: “Red Rocks Canyon”]

2D Gaussian convolution

The 2D Gaussian kernel is separable
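As an illustration of why separability matters, the sketch below (plain Python, not Kestrel code) builds a 2D Gaussian kernel as the outer product of a 1D kernel with itself and checks that one row pass followed by one column pass gives the same result as the direct 2D convolution. For a KxK kernel this reduces the multiplies per pixel from K*K to 2K. The function names, the `sigma` value, and the zero-padded borders are assumptions made for the example; the slides do not say how Kestrel handles any of them.

```python
import math


def gaussian_1d(size, sigma):
    """Normalized 1D Gaussian kernel with an odd number of taps."""
    half = size // 2
    taps = [math.exp(-(i * i) / (2.0 * sigma * sigma))
            for i in range(-half, half + 1)]
    total = sum(taps)
    return [t / total for t in taps]


def convolve_2d(image, kernel2d):
    """Direct 2D filtering with zero padding: K*K multiplies per pixel.
    The Gaussian is symmetric, so correlation and convolution coincide."""
    h, w = len(image), len(image[0])
    half = len(kernel2d) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel2d[dy + half][dx + half]
            out[y][x] = acc
    return out


def convolve_rows(image, k):
    """1D pass along each row (zero padding): K multiplies per pixel."""
    h, w = len(image), len(image[0])
    half = len(k) // 2
    return [[sum(image[y][x + d] * k[d + half]
                 for d in range(-half, half + 1) if 0 <= x + d < w)
             for x in range(w)] for y in range(h)]


def convolve_cols(image, k):
    """1D pass along each column (zero padding): K multiplies per pixel."""
    h, w = len(image), len(image[0])
    half = len(k) // 2
    return [[sum(image[y + d][x] * k[d + half]
                 for d in range(-half, half + 1) if 0 <= y + d < h)
             for x in range(w)] for y in range(h)]


def separable_gaussian(image, size, sigma):
    """Row pass followed by column pass with the same 1D kernel."""
    k = gaussian_1d(size, sigma)
    return convolve_cols(convolve_rows(image, k), k)


# Small synthetic test image (hypothetical data, not from the slides).
image = [[(3 * x + 5 * y) % 11 for x in range(7)] for y in range(6)]
k1 = gaussian_1d(5, 1.0)
k2 = [[a * b for b in k1] for a in k1]  # 2D kernel = outer product of 1D kernels
direct = convolve_2d(image, k2)
separable = separable_gaussian(image, 5, 1.0)
max_diff = max(abs(direct[y][x] - separable[y][x])
               for y in range(6) for x in range(7))
print("largest difference between the two methods:", max_diff)
```

The two 1D passes are also a natural fit for a lock-step array, since every pixel in a row or column performs the same short sequence of multiply-accumulate operations.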

2D Gaussian convolution

512x512-pixel image (8 bpp); time in s by kernel size

Kernel size     5x5     7x7     9x9     11x11
CPU time        0.050   0.070   0.070   0.080
Kestrel time    0.016   0.017   0.018   0.019
Speedup         3.12    4.12    3.89    4.21

CPU: 1 GHz Pentium-III, 256 MB RAM, cc -O2

Kestrel runs at 20 MHz!
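The speedup row is CPU time divided by Kestrel time, and the clock rates on the slide allow a rough per-cycle comparison. The sketch below recomputes both from the table. The cycle-count ratio is an illustrative normalization introduced here, not a figure from the slides, and it ignores every architectural difference other than clock frequency.

```python
# Times in seconds, copied from the table above (512x512 image, 8 bpp).
KERNEL_SIZES = ["5x5", "7x7", "9x9", "11x11"]
CPU_TIME_S = [0.050, 0.070, 0.070, 0.080]
KESTREL_TIME_S = [0.016, 0.017, 0.018, 0.019]

CPU_CLOCK_HZ = 1_000_000_000   # 1 GHz Pentium-III
KESTREL_CLOCK_HZ = 20_000_000  # 20 MHz Kestrel


def speedup(cpu_s, kestrel_s):
    """Wall-clock speedup: how many times sooner Kestrel finished."""
    return cpu_s / kestrel_s


def cycle_ratio(cpu_s, kestrel_s):
    """Illustrative metric (an assumption, not from the slides):
    CPU clock cycles spent divided by Kestrel clock cycles spent."""
    return (cpu_s * CPU_CLOCK_HZ) / (kestrel_s * KESTREL_CLOCK_HZ)


for size, cpu_s, kes_s in zip(KERNEL_SIZES, CPU_TIME_S, KESTREL_TIME_S):
    print(f"{size:>5}: speedup {speedup(cpu_s, kes_s):.2f}, "
          f"cycle ratio {cycle_ratio(cpu_s, kes_s):.0f}")
```

Because the CPU clock is 50 times faster, each cycle ratio is 50 times the corresponding wall-clock speedup.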

Edge detector

[Image slides: “Big Sur”]

Edge detector

512x512-pixel image (8 bpp)

            Time [s]
CPU         0.040
Kestrel     0.018
Speedup     2.22

CPU: 1 GHz Pentium-III, 256 MB RAM, cc -O2

Asynchronous applications

Mandelbrot set
2D median filter

Mandelbrot set

Mandelbrot set (synchronous)

“SIMD Phase Programming Model” (SPPM)

Simple methodology to turn a sequential, data-dependent algorithm into a SIMD-parallel one
Can be used with “partitionable” problems
Provides dynamic load balancing without the need for a high-level support system
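The slides do not detail the mechanics of SPPM, so the following is only a toy cost model of the general idea, under stated assumptions: each processing element (PE) owns one partition of the problem (here, one column of Mandelbrot pixels), the controller broadcasts one iteration step at a time, and a PE that finishes a pixel starts its next pixel immediately instead of idling until the slowest PE is done. The names `synchronous_steps` and `phased_steps`, the column-per-PE partitioning, and the omission of any phase-switching overhead are all assumptions for illustration.

```python
def escape_count(c, max_iter):
    """Number of z -> z*z + c iterations before |z| exceeds 2 (capped)."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def make_work(width, height, max_iter):
    """work[x][y] = iterations PE x needs for the pixel in row y."""
    work = []
    for x in range(width):
        re = -2.0 + 3.0 * x / (width - 1)
        column = []
        for y in range(height):
            im = -1.5 + 3.0 * y / (height - 1)
            column.append(escape_count(complex(re, im), max_iter))
        work.append(column)
    return work


def synchronous_steps(work):
    """Lock-step model: all PEs handle row y together, so every row
    costs as many broadcast steps as its slowest pixel."""
    height = len(work[0])
    return sum(max(column[y] for column in work) for y in range(height))


def phased_steps(work):
    """Phase-style model: in each broadcast step, busy PEs iterate on
    their current pixel; a PE that has finished loads its next one."""
    pending = [list(reversed(column)) for column in work]  # per-PE queues
    remaining = [0] * len(work)  # iterations left on each PE's current pixel
    steps = 0
    while True:
        # "Load" phase: only PEs with no work in hand fetch a new pixel.
        for pe in range(len(work)):
            while remaining[pe] == 0 and pending[pe]:
                remaining[pe] = pending[pe].pop()
        if not any(remaining):
            return steps
        # "Compute" phase: one broadcast iteration, executed by busy PEs only.
        for pe in range(len(work)):
            if remaining[pe] > 0:
                remaining[pe] -= 1
        steps += 1


work = make_work(16, 16, 200)
print("synchronous steps:", synchronous_steps(work))
print("phased steps:     ", phased_steps(work))
```

In this model the synchronous cost is the sum over rows of the per-row maximum, while the phased cost is the total load of the busiest PE, which can never be larger. The gap widens as the per-pixel iteration counts become more uneven.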

Mandelbrot set (SPPM)

Mandelbrot set

512x512-pixel image (16 bpp); time in s by maximum number of iterations

Max # of iterations        1000    5000    10000
CPU time                   4.88    22.21   44.37
Kestrel time (synch)       3.65    17.18   34.79
Kestrel time (SPPM)        3.55    8.73    15.11
Speedup (SPPM vs CPU)      1.37    2.54    2.94
Speedup (SPPM vs synch)    1.03    1.97    2.30

CPU: 500 MHz UltraSPARC-II, 640 MB RAM, cc -xO3

2D Median filter

[Image slides: “Office Hours”]

2D Median filter

512x512-pixel image (8 bpp)

Window size     5x5     7x7     9x9     11x11
CPU time        0.190   0.370   0.540   0.760
Kestrel time    0.054   0.076   0.105   0.141
Speedup         3.52    4.97    5.14    5.39

CPU: 1 GHz Pentium-III, 256 MB RAM, cc -O2

The end