Exploiting Domain-Specific High-level Runtime Support for Parallel Code Generation

Xiaogang Li, Ruoming Jin, Gagan Agrawal
Department of Computer and Information Sciences
Ohio State University

Motivation

- Languages, compilers, and runtime systems for high-end computing
  - Typically focus on scientific applications
  - Can commercial applications benefit?
- A majority of top 500 parallel configurations are used as database servers
- Is there a role for parallel systems research?
  - Parallel relational databases: probably not
  - Data mining, OLAP, decision support: quite likely

Data Mining

- Extracting useful models or patterns from large datasets
- Includes a variety of tasks: mining associations and sequences, clustering data, building decision trees and predictive models. Several algorithms have been proposed for each.
- Both compute and data intensive
- Algorithms are well suited for parallel execution
- High-level interfaces can be useful for application development

Project Overview

Project Components

- A middleware system called FREERIDE (Framework for Rapid Implementation of Datamining Engines) (SDM 01, SDM 02)
- Performance modeling and prediction, for parallelization strategy selection (SIGMETRICS 2002)
- Runtime and compiler support for shared memory parallelization (LCPC 02)
- Translation from mining operators (not yet)
- Focus on language and compiler support for distributed memory parallelization in this talk

Common Processing Structure

Structure of Common Data Mining Algorithms

    {* Outer Sequential Loop *}
    While () {
        {* Reduction Loop *}
        Foreach (element e) {
            (i, val) = process(e);
            Reduc(i) = Reduc(i) op val;
        }
    }

Applies to major association mining, clustering and decision tree construction algorithms

Parallelization approach
- Compute local copy of reduction objects
- Perform global reduction
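The reduction loop and its parallelization can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `process`, the bucket count, the use of `+` as the operator `op`, and the simulated four-node partitioning are all hypothetical and do not come from the slides.

```python
from collections import defaultdict


def process(e):
    """Hypothetical per-element work: map a number to (index, value)."""
    return e % 4, e


def local_reduction(chunk):
    """Reduction loop over one node's share of the data."""
    reduc = defaultdict(int)            # local copy of the reduction object
    for e in chunk:
        i, val = process(e)
        reduc[i] = reduc[i] + val       # Reduc(i) = Reduc(i) op val, with op = +
    return reduc


def global_reduction(local_copies):
    """Combine the per-node copies with the same operator."""
    total = defaultdict(int)
    for copy in local_copies:
        for i, val in copy.items():
            total[i] = total[i] + val
    return total


data = list(range(100))
num_nodes = 4                           # simulated nodes, run one after another
size = len(data) // num_nodes
chunks = [data[n * size:(n + 1) * size] for n in range(num_nodes)]

parallel = global_reduction([local_reduction(c) for c in chunks])
sequential = local_reduction(data)
```

Because `+` is associative and commutative, the order in which elements and local copies are combined does not matter, so `parallel` and `sequential` hold the same values.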

Middleware Support for Distributed Memory Parallelization

Interface requires:
- Specification of an iterator and termination condition
- Local reduction for each parallel loop
- Global reduction for each loop

Functionality:
- Fetch data elements chunk by chunk, apply local reduction
- Broadcast the reduction object after finishing one pass on data
- Perform global reduction, broadcast the results
- Check termination condition, move to next iteration
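A rough sketch of how such an interface and driver loop might fit together is given below. It is not FREERIDE's actual API: the class `KMeans1D`, the method names, and the `run` driver are hypothetical, and the nodes are simulated by iterating over in-memory partitions, with a plain Python list standing in for the communication of reduction objects.

```python
def chunked(seq, size):
    """Yield consecutive chunks of a partition."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


class KMeans1D:
    """Hypothetical application: 1-D k-means expressed through the interface."""

    def __init__(self, centers):
        self.centers = list(centers)

    def new_reduction_object(self):
        return [[0.0, 0] for _ in self.centers]    # per-center [sum, count]

    def local_reduce(self, obj, x):
        nearest = min(range(len(self.centers)),
                      key=lambda c: abs(x - self.centers[c]))
        obj[nearest][0] += x
        obj[nearest][1] += 1

    def global_reduce(self, objs):
        merged = self.new_reduction_object()
        for obj in objs:
            for c, (s, n) in enumerate(obj):
                merged[c][0] += s
                merged[c][1] += n
        return merged

    def finished(self, merged, iteration):
        new_centers = [s / n if n else old
                       for (s, n), old in zip(merged, self.centers)]
        converged = new_centers == self.centers
        self.centers = new_centers
        return converged or iteration >= 20        # termination condition


def run(app, partitions, chunk_size=2):
    """Driver loop: one pass over the data per iteration."""
    iteration = 0
    while True:
        local_objects = []
        for partition in partitions:               # one partition per simulated node
            obj = app.new_reduction_object()
            for chunk in chunked(partition, chunk_size):
                for element in chunk:
                    app.local_reduce(obj, element)
            local_objects.append(obj)              # stands in for the broadcast
        result = app.global_reduce(local_objects)
        iteration += 1
        if app.finished(result, iteration):
            return iteration


app = KMeans1D(centers=[0.0, 5.0])
partitions = [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0], [2.0, 11.0]]
iterations = run(app, partitions)
```

The application only supplies the reduction functions and the termination test; chunking, the pass structure, and the exchange of reduction objects stay in the driver.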

Compilation Approach

- Support a general high-level language
- Use middleware functionality in compilation
- Exploit the domain-specific common structure
  - Reduction loop with associative and commutative operations
  - Disk-resident input datasets, smaller output

Language Support

- A data parallel dialect of Java, to give the compiler information about independent collections of objects, parallel loops and reduction operations
  - domain & rectdomain
  - foreach loop
  - reduction variables:
    - can only be updated inside a foreach loop by operations that are associative & commutative
    - intermediate value of the reduction variables may not be used within the loop, except for self-updates

Example code

    public class kNN {
        static buffer kbuffer;

        public static void main(String[] args) {
            double dis;
            Point<3> lowend = …
            Point<3> hiend = …
            Point<3> p;
            RectDomain<3> InputDomain = [lowend:hiend];
            kPoint[3d] Input = new kPoint[InputDomain];

            foreach (p in InputDomain) {
                if (Input[p].inRange(R)) {
                    dis = Input[p].distance(W);
                    kbuffer.insert(Input[p], dis);
                }
            }
        }
    }

Compilation Task

- Extract local reduction function
  - Simple from body of data parallel loop
- Extract an iterator and termination condition
  - Simple from the overall code
- Extract a global reduction function
  - Can be quite challenging in the presence of complex control flow and data structures
  - A new algorithm developed

Extracting Global Reduction from Local Reduction : Motivating Example

    I = k - 1;
    while (newdis < distance && I >= 0) {
        if (I > 0) {
            x1[I] = x1[I-1];
            x2[I] = x2[I-1];
            …
        }
        I = I - 1;
    }
    if (I < k-1) {
        x1[I+1] = kpoint.x1;
        x2[I+1] = kpoint.x2;
        …
    }

    I = k - 1;
    while (kpoint.dis < distance && I >= 0) {
        if (I > 0) {
            x1[I] = x1[I-1];
            x2[I] = x2[I-1];
            …
        }
        I = I - 1;
    }
    if (I < k-1) {
        x1[I+1] = kpoint.x1;
        x2[I+1] = kpoint.x2;
        …
    }

    for (j = 0; j < k; j++) {
        I = k - 1;
        while (buf.dis[j] < distance && I >= 0) {
            if (I > 0) {
                x1[I] = x1[I-1];
                x2[I] = x2[I-1];
                …
            }
            I = I - 1;
        }
        if (I < k-1) {
            x1[I+1] = buf.x1[j];
            x2[I+1] = buf.x2[j];
            …
        }
    }
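The idea behind the three fragments can be shown as a small runnable Python sketch. It is a simplified stand-in, not the generated code: the buffer is a sorted list of `(distance, point)` pairs rather than parallel arrays, and the names `insert`, `merge`, `local_buffer`, the query value, and the sample points are all hypothetical.

```python
def insert(buf, k, dis, point):
    """Local reduction step: keep the k nearest entries, sorted by distance."""
    i = len(buf)
    if i < k:
        buf.append(None)                 # room for one more entry at index i
    else:
        i = k - 1
        if dis >= buf[i][0]:
            return                       # not nearer than the farthest kept entry
    while i > 0 and buf[i - 1][0] > dis:
        buf[i] = buf[i - 1]              # shift farther entries one slot right
        i -= 1
    buf[i] = (dis, point)


def merge(buf, other, k):
    """Global reduction: apply the same insertion to every entry of another buffer."""
    for dis, point in other:
        insert(buf, k, dis, point)
    return buf


def local_buffer(points, query, k):
    """Run the local reduction over one node's points."""
    buf = []
    for p in points:
        insert(buf, k, abs(p - query), p)
    return buf


query, k = 50, 3
node_a = [10, 45, 90, 52]
node_b = [60, 49, 200]

merged = merge(local_buffer(node_a, query, k), local_buffer(node_b, query, k), k)
sequential = local_buffer(node_a + node_b, query, k)
```

The global reduction reuses the local insertion logic unchanged; only the source of the inserted value changes, from the input element to an entry of the other node's buffer, wrapped in a loop over that buffer.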

Overall Approach

- Classify each assignment to a data member of the reduction object into the following types:
  - O.x = g(e), where e is the input element
  - O.x = O.x op g(e), where op is an associative and commutative operator
  - Expression involving loop constants and other members of the reduction object
- Classify control dependence on any of the above assignment statements as:
  - Loop constant
  - Non-loop constant

Code Generation: Handling Different Types of Assignment Statements

Three types of assignment statements:

- O.x = g(e) (Type a)
  - If x can represent many fields, iterate over all of them
- O.x = O.x op g(e) (Type b)
  - Replace by O.x = O.x op O1.x
  - If x can represent many fields, iterate over all of them
- Expression involving loop constants and other data members (Type c)
  - Keep as it is
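As an illustration of the Type (b) rule, the Python sketch below writes a local reduction by hand and derives the global reduction by replacing `g(e)` with the matching field of a second reduction object `O1`, iterating over all fields. The `Stats` class, the `OPS` table, and the sample data are assumptions made for this example; the slides do not show how the compiler represents operators.

```python
from dataclasses import dataclass, fields
from operator import add


@dataclass
class Stats:
    """Hypothetical reduction object with several Type (b) fields."""
    count: int = 0
    total: float = 0.0
    lo: float = float("inf")
    hi: float = float("-inf")


# Associative and commutative operator used for each field (assumed table).
OPS = {"count": add, "total": add, "lo": min, "hi": max}


def local_reduce(o, e):
    # Each statement has the form O.x = O.x op g(e).
    o.count = o.count + 1
    o.total = o.total + e
    o.lo = min(o.lo, e)
    o.hi = max(o.hi, e)


def global_reduce(o, o1):
    # Each statement becomes O.x = O.x op O1.x, iterating over all fields.
    for f in fields(o):
        op = OPS[f.name]
        setattr(o, f.name, op(getattr(o, f.name), getattr(o1, f.name)))
    return o


def reduce_partition(values):
    o = Stats()
    for e in values:
        local_reduce(o, e)
    return o


merged = global_reduce(reduce_partition([3.0, 7.0]),
                       reduce_partition([1.0, 9.0, 5.0]))
```

The rewrite is mechanical because the operator stays the same; only its second operand changes from a value computed from the input element to the other object's field.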

Handling Control Flow

- Control predicates for Type (b) assignments:
  - Remove non-loop constant control predicates
  - Keep loop constant control predicates
- Control predicates for Type (a) and Type (c) statements:
  - Keep loop constant control predicates
  - Classify non-loop constant predicates into two types:
    - Predicate involves a value that is assigned to a data member: replace that value by the data member
    - Other predicates: simply remove
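One possible reading of these rules is sketched below in Python. The `Nearest` class, its fields, and the sample data are hypothetical, and the global reduction is derived by hand following the rules as interpreted here, including the assumption that a Type (a) assignment `O.x = g(e)` becomes `O.x = O1.x`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Nearest:
    """Hypothetical reduction object."""
    in_range: int = 0                    # elements within the radius
    best_dis: float = float("inf")       # distance of the nearest element so far
    best_point: Optional[int] = None     # the nearest element itself


def local_reduce(o, e, query, radius):
    dis = abs(e - query)
    if dis <= radius:                    # depends on e: not loop constant
        o.in_range = o.in_range + 1      # Type (b)
    if dis < o.best_dis:                 # dis is the value assigned to o.best_dis
        o.best_dis = dis                 # Type (a)
        o.best_point = e                 # Type (a)


def global_reduce(o, o1):
    # Type (b): the non-loop-constant predicate is removed.
    o.in_range = o.in_range + o1.in_range
    # Type (a): dis is replaced by the data member it was assigned to.
    if o1.best_dis < o.best_dis:
        o.best_dis = o1.best_dis
        o.best_point = o1.best_point
    return o


def reduce_partition(values, query=50, radius=10):
    o = Nearest()
    for e in values:
        local_reduce(o, e, query, radius)
    return o


merged = global_reduce(reduce_partition([10, 45, 90]),
                       reduce_partition([60, 49, 200]))
sequential = reduce_partition([10, 45, 90, 60, 49, 200])
```

The predicate guarding the counter can be dropped because each local copy already counted only its qualifying elements, while the predicate guarding the nearest point must be kept in rewritten form so the global step still picks the smaller distance.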

Experimental Platform

- Cluster of workstations
  - Sun Ultra Enterprise 450
  - 250 MHz Ultra-II processors
  - 1 GB of 4-way interleaved main memory
  - Myrinet as the interconnect

Results from k-means clustering

[Figure: bar chart over 1, 2, 4, and 8 nodes; vertical axis from 0 to 160; series: comp, inline, comp + inline]

1 GB dataset with 3-dimensional points, k = 3

Results from Apriori Association Mining

[Figure: bar chart over 1, 2, 4, and 8 nodes; vertical axis from 0 to 12000; series: comp, manual]

3 GB dataset

Results from k-nearest neighbors

[Figure: bar chart over 1, 2, 4, and 8 nodes; vertical axis from 0 to 160; series: comp, manual]

1 GB dataset with 3-dimensional points, k = 100

Summary

- Focus on a new class of applications
- Exploit the common structure within the class
- Develop a runtime system supporting this structure
- Use it as a compiler target
- Very simple compiler implementation (< 1000 lines of code)
- A new algorithm for synthesizing global reduction functions
- Performance of compiler generated code is very competitive