BERKELEY PAR LAB SEJITS: High Performance for Productivity Applications Shoaib Kamil & many others Parlab End of Project Celebration May 30, 2013


Page 1: SEJITS: High Performance for Productivity Applications (parlab.eecs.berkeley.edu/sites/all/parlab/files...)

Shoaib Kamil & many others

Parlab End of Project Celebration

May 30, 2013

Page 2: Where Did We Start?

[Figure: layered stack diagram. Box labels: Personal Health; Image Retrieval; Hearing, Music; Speech; Parallel Browser; Dwarfs; Sketching; Legacy Code; Composition & Coordination Language (C&CL); C&CL Compiler/Interpreter; Parallel Libraries; Parallel Frameworks; Efficiency Languages; Efficiency Language Compilers; Type Systems; Autotuners; Schedulers; Communication & Synch. Primitives; Correctness; Static Verification; Dynamic Checking; Debugging with Replay; Directed Testing; Legacy OS; OS Libraries & Services; Hypervisor; Intel Multicore/GPGPU; RAMP Manycore.]

Page 3: The Big Problems

Knew how to coordinate serial caller with serial or parallel callee

How can we do the other two, in a way understandable to non-expert programmers?

[Figure: grid of Serial Caller and Parallel Caller against Serial Callee and Parallel Callee, with a "?" marking the open cases.]
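A minimal sketch of the case the slide describes as already understood: a serial caller invoking a callee that is parallel internally. The names `parallel_sum_of_squares` and `serial_caller` are hypothetical, and Python's standard `concurrent.futures` thread pool is only a stand-in for whatever parallel runtime a real callee would use. It shows the structure of the call, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum_of_squares(values, workers=4):
    """Parallel callee: splits the work across a thread pool internally."""
    chunks = [values[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda chunk: sum(v * v for v in chunk), chunks)
        return sum(partials)


def serial_caller(values):
    """Serial caller: sees an ordinary blocking function call."""
    return parallel_sum_of_squares(values)


total = serial_caller(list(range(10)))
```

The caller never sees the pool: the parallelism is fully contained in the callee, which is why this case composes easily. The open cases on the slide are the ones where the caller is itself already parallel.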

Page 4: The Big Problems

Layering is good for engineering, bad for performance

[Figure: layered diagram with rows of boxes labeled App 1, App 2, App 3; Dense, Sparse, Graph Trav.; Multicore, GPU, “Cloud”.]

Page 5: Inspiration: Auto-tuned DSL for Stencils

Began with a large productivity-programmer-created codebase: new climate code

Use code transformation to go from productive to efficient code

[Figure: auto-tuning framework, captioned "Transformation & Code Generation with high-level knowledge". Inputs and outputs: Reference Implementation; Myriad of equivalent, optimized implementations (plus test harness); Best performing implementation and configuration parameters. Stages: Parse; Internal Abstract Syntax Tree Representation; Strategy Engines (Serial, Parallel, GTX280, Victoria Falls); Code Generators (C with pthreads, CUDA, FORTRAN); Search Engines (in context of specific problem). File-type labels: .f95, .c, .cu, .h.]

[Figure: performance chart. Labels: Expected Peak; OpenMP Comparison; serial reference.]

[IPDPS10]
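The generate, test, and search loop named in the figure can be sketched in a few lines. This is a toy illustration under stated assumptions, not the framework from the slide: the only tuning parameter is a hypothetical loop block size, the variants are Python closures rather than generated C, CUDA, or Fortran, and all function names are invented for the example.

```python
import time


def reference(grid):
    """Reference implementation: average of left and right neighbors."""
    out = [0.0] * len(grid)
    for i in range(1, len(grid) - 1):
        out[i] = 0.5 * (grid[i - 1] + grid[i + 1])
    return out


def make_blocked_variant(block):
    """Generate an equivalent variant that walks the interior in blocks."""
    def variant(grid):
        n = len(grid)
        out = [0.0] * n
        for start in range(1, n - 1, block):
            for i in range(start, min(start + block, n - 1)):
                out[i] = 0.5 * (grid[i - 1] + grid[i + 1])
        return out
    return variant


def autotune(candidates, grid, repeats=3):
    """Test harness plus search: check each variant, keep the fastest."""
    expected = reference(grid)
    best_params, best_time = None, float("inf")
    for params, fn in candidates.items():
        if fn(grid) != expected:  # reject variants that are not equivalent
            continue
        start = time.perf_counter()
        for _ in range(repeats):
            fn(grid)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_params, best_time = params, elapsed
    return best_params, candidates[best_params]


grid = [float(i % 7) for i in range(1000)]
candidates = {block: make_blocked_variant(block) for block in (8, 64, 512)}
best_block, best_fn = autotune(candidates, grid)
```

Which block size wins depends on the machine and the run, which is the point of searching empirically instead of fixing a choice ahead of time.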

Page 6: Inspiration: ActiveRecord

Ruby on Rails ORM

Complex queries generated from simple chains of method invocations

Productivity programmer doesn’t need to know SQL

Product.find_by_price(95)
Product.all.where("quantity > ?", 0).where(:color => "red")
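To make the idea concrete, here is a minimal Python sketch of how a chain of method calls can accumulate into one SQL statement. The `Query` class is hypothetical and is not how ActiveRecord is implemented. It only illustrates the pattern of building a query lazily from chained calls, and it interpolates table and column names without validation, which real code should not do.

```python
class Query:
    """Hypothetical chainable query builder that emits parameterized SQL."""

    def __init__(self, table, conditions=(), params=()):
        self.table = table
        self.conditions = tuple(conditions)
        self.params = tuple(params)

    def where(self, condition=None, *values, **equals):
        """Return a new Query with extra conditions; the original is unchanged."""
        conditions = list(self.conditions)
        params = list(self.params)
        if condition is not None:
            conditions.append(condition)
            params.extend(values)
        for column, value in equals.items():
            conditions.append(f"{column} = ?")
            params.append(value)
        return Query(self.table, conditions, params)

    def to_sql(self):
        """Emit the SQL text and its bound parameters."""
        sql = f"SELECT * FROM {self.table}"
        if self.conditions:
            sql += " WHERE " + " AND ".join(self.conditions)
        return sql, self.params


sql, params = Query("products").where("quantity > ?", 0).where(color="red").to_sql()
```

No SQL is produced until `to_sql` is called, so each `where` is cheap and the full query can be generated with knowledge of the whole chain. SEJITS borrows the same idea: high-level calls carry enough structure for a code generator to emit efficient low-level code.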

Page 7: SEJITS In A Nutshell

[Figure: SEJITS flow diagram. Program: Non-DSL Code, Code in DSL A, Code in DSL B; Interpreter; Data. Compile Phase: Code in DSL A and Data; DSL Codegen; External Compiler; Dynamic Link Library. Execute Phase: Result.]

Page 8: Case Study: Sepya

Stencils Embedded in Python with Auto-tuning

Simple, declarative source for fast stencil computations on multicore/multisocket CPUs

class My1DKernel(StencilKernel):
    def __init__(self):
        super(My1DKernel, self).__init__()
        # set neighbor set 1 to the point
        # before and the point after
        self.set_neighbors(1, [[1], [-1]])

    # the actual kernel function
    def kernel(self, out_grid, in_grid):
        for x in out_grid.interior_points():
            for y in self.neighbors(in_grid, x, 1):
                out_grid[x] += 0.5 * in_grid[y]
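For readers unfamiliar with stencils, the following plain-Python function is one reading of what the kernel above computes. It assumes that the interior points exclude one boundary cell on each side and that the output grid starts at zero. Neither assumption is stated on the slide, and `stencil_1d` is a hypothetical name, not part of Sepya.

```python
def stencil_1d(in_grid):
    """Each interior point accumulates half of the point after it
    and half of the point before it."""
    out_grid = [0.0] * len(in_grid)
    neighbor_offsets = [1, -1]  # mirrors set_neighbors(1, [[1], [-1]])
    for x in range(1, len(in_grid) - 1):  # interior points only (assumed)
        for offset in neighbor_offsets:
            out_grid[x] += 0.5 * in_grid[x + offset]
    return out_grid


result = stencil_1d([0.0, 2.0, 4.0, 6.0, 8.0])
# result == [0.0, 2.0, 4.0, 6.0, 0.0]
```

The declarative form on the slide carries the same information, but it leaves loop order, blocking, and parallelization to the specializer.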

Page 9: Case Study: Sepya

93% of Roofline Peak

2.2x faster than state-of-the-art
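"Roofline Peak" refers to the bound given by the Roofline model: attainable performance is the smaller of the machine's peak compute rate and its memory bandwidth multiplied by the kernel's arithmetic intensity. The sketch below states that bound. The function names are hypothetical, and every number is invented for illustration, not taken from the Sepya measurements.

```python
def roofline_bound(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Attainable GFLOP/s under the Roofline model."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


def fraction_of_roofline(measured_gflops, peak_gflops, bandwidth_gbs, flops_per_byte):
    """Measured performance as a fraction of the Roofline bound."""
    bound = roofline_bound(peak_gflops, bandwidth_gbs, flops_per_byte)
    return measured_gflops / bound


# Hypothetical machine and kernel numbers, for illustration only.
bound = roofline_bound(peak_gflops=100.0, bandwidth_gbs=40.0, flops_per_byte=0.5)
fraction = fraction_of_roofline(15.0, 100.0, 40.0, 0.5)
```

With these made-up numbers the kernel is bandwidth-bound, so the meaningful target is the bandwidth-limited bound, not the compute peak.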

Page 10: Asp is SEJITS for Python

Library for building auto-tuned DSELs

Introspection of Python

Code generation via templates or tree transformations

Auto-tuning support

Data structure interoperability

Automatic compiler detection & demand-driven compilation with caching

Ongoing development & support

http://sejits.org

Screencasts available

[SciPy11, PPoPP12]
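Two of the features listed above, template-based code generation and demand-driven compilation with caching, can be illustrated with the standard library alone. This is not Asp's API: `render`, `compile_on_demand`, and `fake_compiler` are hypothetical names, and the "compiler" is a stub that records its calls so the caching behavior is visible without a C toolchain.

```python
import hashlib
from string import Template

# C source template with one hole filled in at specialization time.
KERNEL_TEMPLATE = Template("""\
void scale(double *out, const double *in, int n) {
    for (int i = 0; i < n; i++) {
        out[i] = $factor * in[i];
    }
}
""")

_compiled = {}  # cache: hash of generated source -> compiled artifact


def render(factor):
    """Code generation via a template."""
    return KERNEL_TEMPLATE.substitute(factor=repr(float(factor)))


def compile_on_demand(source, compiler):
    """Compile only when this exact source has not been seen before."""
    key = hashlib.sha256(source.encode("utf-8")).hexdigest()
    if key not in _compiled:
        _compiled[key] = compiler(source)
    return _compiled[key]


calls = []


def fake_compiler(source):
    """Hypothetical stand-in for invoking a real C compiler."""
    calls.append(source)
    return f"module-{len(calls)}"


first = compile_on_demand(render(0.5), fake_compiler)
second = compile_on_demand(render(0.5), fake_compiler)  # served from the cache
third = compile_on_demand(render(2.0), fake_compiler)   # new source, new compile
```

Keying the cache on the generated source means any change to the template or its parameters triggers a fresh compile, while repeated calls with the same specialization pay the compile cost once.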

Page 11: Copperhead

Data parallel compiler for Python subset

Bryan Catanzaro et al (NVIDIA Research)

http://copperhead.github.io

from copperhead import *
import numpy as np

@cu
def axpy(a, x, y):
    return [a * xi + yi for xi, yi in zip(x, y)]

x = np.arange(100, dtype=np.float64)
y = np.arange(100, dtype=np.float64)

with places.gpu0:
    gpu = axpy(2.0, x, y)

with places.openmp:
    cpu = axpy(2.0, x, y)

[GTC12]

Page 12: Ongoing Work

Knowledge Discovery Toolbox (John Gilbert, Adam Lugowski, and Aydin Buluc, UCSB & LBNL) [IPDPS13]

PyCASP: Python-based Content Analysis Using Specialization (Katya Gonina) [ASRU11]

Bag of Little Bootstraps (Peter Birsinger, Richard Xia, et al) [SciPy12]

Communication-Avoiding Recursive Algorithms (David Eliahu, Omer Spillinger, Oded Schwartz, Ben Lipshitz, Jim Demmel) [IPDPS13]

Communication-Avoiding Solvers (Jeffrey Morlan) [iWAPT12]

Three Fingered Jack (David Sheffield) [HotPar13]

ASPIRE Project

Page 13: SEJITS Participants

Bryan Catanzaro, Kurt Keutzer, Katya Gonina, Jeffrey Morlan, Armando Fox, Richard Xia, Henry Cook, Peter Birsinger, David Patterson, Katherine Yelick, James Demmel, Tim Mattson, Henry Gabb, David Sheffield, Michael Anderson, Juan Vargas, Jessica Miller, Scott Beamer, Omer Spillinger, David Eliahu, Aakash Prasad, Oded Schwartz, Ben Lipshitz, David Howard, Derrick Coetzee, Tayfun Elmas, Koushik Sen

Page 14: Demo & Testimonial

Demo: Derrick Coetzee showing a simple image processing kernel

Testimonial: Tim Mattson, Intel