
Leonid Glimcher P. 1 Computer Science and Engineering ipdps’05 Parallelizing Defect Detection and Categorization Using FREERIDE Scaling and Parallelizing a Scientific Feature Detection and Categorization Application Using a Cluster Middleware. L. Glimcher, G. Agrawal, S. Mehta, R. Jin, R. Machiraju The Ohio State University


Page 1: Computer Science and Engineering Parallelizing Defect Detection and Categorization Using FREERIDE Leonid Glimcher P. 1 ipdps’05 Scaling and Parallelizing

Leonid Glimcher P. 1

Computer Science and Engineering

ipdps’05 Parallelizing Defect Detection and Categorization Using FREERIDE

Scaling and Parallelizing a Scientific Feature Detection and Categorization Application Using a Cluster Middleware.

L. Glimcher, G. Agrawal, S. Mehta, R. Jin, R. Machiraju

The Ohio State University


Presentation Road Map

• Motivation for scalable datamining.
• Description of middleware and functionality.
• Description of defect detection and categorization algorithm.
• Parallelization challenges and solutions.
• Experimental results.
• Conclusions.


Motivation for FREERIDE

• Problem:
  – Simulation data from engineering and scientific applications is growing larger,
  – Analysis models are more complex,
  – Drawing knowledge becomes increasingly complicated.
• Solution:
  – Parallel datamining, but …
• Catch: application development effort.


FREERIDE

Key observation: most data mining algorithms follow a canonical loop.

Middleware API:

• Subset of data to be processed,

• Reduction object,

• Local and global reduction operations,

• Iterator.

Supports:

• Disk resident datasets

• Shared and distributed memory

while ( ) {
    forall (data instances d) {
        I = process(d)
        R(I) = R(I) op d
    }
    …….
}
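The canonical loop above can be sketched in Python as a generalized reduction. This is a minimal sketch, assuming a 1-D k-means instance (k-means is one of the algorithms FREERIDE has been used for); the names `local_reduction`, `global_reduction` and `kmeans_1d` are hypothetical and are not FREERIDE's actual interface. Each chunk stands in for the data subset assigned to one node; in a cluster the per-chunk reductions would run on separate nodes, while here they simply run one after another.

```python
from typing import Dict, List, Sequence, Tuple

# Reduction object: cluster index -> (running sum, count).
Reduction = Dict[int, Tuple[float, int]]


def local_reduction(chunk: Sequence[float], centers: List[float]) -> Reduction:
    """Local reduction over one node's data subset."""
    r: Reduction = {}
    for d in chunk:
        # I = process(d): index of the nearest center.
        i = min(range(len(centers)), key=lambda k: abs(d - centers[k]))
        # R(I) = R(I) op d, where op adds d to the sum and 1 to the count.
        s, n = r.get(i, (0.0, 0))
        r[i] = (s + d, n + 1)
    return r


def global_reduction(parts: List[Reduction]) -> Reduction:
    """Global reduction: combine per-node reduction objects.

    Any merge order works because op (element-wise addition) is
    associative and commutative.
    """
    total: Reduction = {}
    for part in parts:
        for i, (s, n) in part.items():
            ts, tn = total.get(i, (0.0, 0))
            total[i] = (ts + s, tn + n)
    return total


def kmeans_1d(chunks: List[List[float]], centers: List[float],
              max_iter: int = 100) -> List[float]:
    """Outer while( ) loop; max_iter and the convergence test are assumptions."""
    for _ in range(max_iter):
        # In a cluster, each chunk would be reduced on its own node.
        parts = [local_reduction(c, centers) for c in chunks]
        total = global_reduction(parts)
        new_centers = [total[i][0] / total[i][1] if i in total else centers[i]
                       for i in range(len(centers))]
        if new_centers == centers:  # stop once the centers no longer move
            break
        centers = new_centers
    return centers
```

For example, `kmeans_1d([[1.0, 2.0, 10.0], [11.0, 1.5, 9.0]], [0.0, 5.0])` returns `[1.5, 10.0]`. The design point the sketch is meant to show is that only the small reduction object crosses node boundaries, not the data itself.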


Previously on FREERIDE

• FREERIDE has been used for:
  – Apriori and FP-tree frequent item set mining,
  – KNN classification and decision tree construction,
  – K-means and EM clustering,
  – Vortex Detection (IPDPS 2004).

• Will it work for a scientific mining task with a more complex processing structure?


Presentation Road Map

• Motivation for scalable datamining.
• Description of middleware and functionality.
• Description of defect detection and categorization algorithm.
• Parallelization challenges and solutions.
• Experimental results.
• Conclusions.


Overview of Sequential Algorithm

• Goal: understand the properties of the materials
  – How do defects affect the materials?
• Data generated by a molecular dynamics simulation
  – Simulator by the Physics Department (OSU)
• Main tasks
  – Phase 1: defect detection
  – Phase 2: defect categorization


Example – Different shades represent different detected defects


Mapping detection/categorization to FREERIDE

FREERIDE processing stages: local processing (node processing) and global combination (post-processing).

Detection phase: rule discovery, defect segmentation.

Categorization phase: moment pruning, LCS matching.


Presentation Road Map

• Motivation for scalable datamining.
• Description of middleware and functionality.
• Description of sequential defect detection and categorization algorithm.
• Parallelization challenges and solutions.
• Experimental results.
• Conclusions.


Key Parallelization Issues

• Challenges in the detection phase stem from partitioning the data into chunks:
  – detection on chunk boundaries,
  – joining multi-chunk defects.

• Categorization phase:
  1. Load balancing is necessary for scalability.
  2. Updating the catalog with new classes needs to be efficient.
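Joining multi-chunk defects during global combination can be illustrated with union-find. This is a minimal sketch under simplifying assumptions: the data is a 1-D array of cells already flagged as defect or non-defect (the real data is a 3-D set of atoms), and `local_fragments`, `join_fragments` and `UnionFind` are hypothetical names; the slides do not say that the paper uses union-find. In 1-D a sorted scan would suffice, but union-find carries over to 3-D, where one defect can cross several chunk faces.

```python
from typing import Dict, List, Optional, Tuple

Fragment = Tuple[int, int]  # (first cell, last cell), global indices, inclusive


def local_fragments(flags: List[bool], offset: int) -> List[Fragment]:
    """Local processing: find runs of defect cells inside one chunk."""
    frags: List[Fragment] = []
    start: Optional[int] = None
    for i, is_defect in enumerate(flags):
        if is_defect and start is None:
            start = i
        elif not is_defect and start is not None:
            frags.append((offset + start, offset + i - 1))
            start = None
    if start is not None:  # run reaches the chunk boundary
        frags.append((offset + start, offset + len(flags) - 1))
    return frags


class UnionFind:
    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        self.parent[self.find(a)] = self.find(b)


def join_fragments(frags: List[Fragment]) -> List[Fragment]:
    """Global combination: merge fragments that touch across chunk boundaries."""
    frags = sorted(frags)
    uf = UnionFind(len(frags))
    for i in range(len(frags) - 1):
        # Runs inside one chunk are separated by a non-defect cell,
        # so adjacent fragments must come from neighbouring chunks.
        if frags[i][1] + 1 == frags[i + 1][0]:
            uf.union(i, i + 1)
    groups: Dict[int, List[Fragment]] = {}
    for i, frag in enumerate(frags):
        groups.setdefault(uf.find(i), []).append(frag)
    return sorted((min(s for s, _ in g), max(e for _, e in g))
                  for g in groups.values())
```

With chunks of 3 cells over `[False, True, True, True, False, True, True, True, False]`, the local fragments `(1, 2)`, `(3, 3)`, `(5, 5)` and `(6, 7)` are joined into the two defects `(1, 3)` and `(5, 7)`.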


Detection Challenges


Intuitive (un-balanced) Categorization


Increasing the number of nodes increases the “sequential” fraction.
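Why a growing sequential fraction limits scalability can be quantified with Amdahl's law. This is a minimal sketch, assuming the work handled sequentially (for example, multi-node defects categorized on a single node) is a fraction `s` of the total; `amdahl_speedup` is a hypothetical helper, and the fraction used below is illustrative, not measured.

```python
def amdahl_speedup(seq_fraction: float, nodes: int) -> float:
    """Upper bound on speedup when seq_fraction of the work stays sequential."""
    if not 0.0 <= seq_fraction <= 1.0 or nodes < 1:
        raise ValueError("seq_fraction must be in [0, 1] and nodes >= 1")
    # Sequential part is not sped up; the rest is divided evenly among nodes.
    return 1.0 / (seq_fraction + (1.0 - seq_fraction) / nodes)
```

For example, `amdahl_speedup(0.2, 8)` is about 3.33: if 20% of the work stays sequential, 8 nodes deliver at most about 3.3x. In the intuitive scheme the fraction itself grows with the node count, so the bound tightens further as nodes are added.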


Load Balanced Categorization

The approach has been tested with a varying number of multi-node defects.
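One common way to balance categorization work across nodes is the greedy longest-processing-time (LPT) heuristic: hand out defects largest first, each to the currently least-loaded node. This is a minimal sketch, assuming each defect has a known cost estimate (for example, its atom count); `assign_defects` and the cost model are hypothetical, and the slides do not state that the paper uses LPT.

```python
import heapq
from typing import List, Tuple


def assign_defects(costs: List[float], nodes: int) -> List[List[int]]:
    """Greedy LPT: give each defect (largest first) to the least-loaded node."""
    heap: List[Tuple[float, int]] = [(0.0, n) for n in range(nodes)]  # (load, node)
    heapq.heapify(heap)
    plan: List[List[int]] = [[] for _ in range(nodes)]
    for d in sorted(range(len(costs)), key=lambda i: costs[i], reverse=True):
        load, n = heapq.heappop(heap)  # least-loaded node so far
        plan[n].append(d)
        heapq.heappush(heap, (load + costs[d], n))
    return plan
```

For example, `assign_defects([5, 4, 3, 3, 2, 1], 2)` returns `[[0, 3, 5], [1, 2, 4]]`, giving both nodes a load of 9.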


Intuitive (sequential) Catalog Updates

“Catalog completeness” has a direct effect on scalability.


Parallel Catalog Updates

Tested with different levels of “catalog completeness”.


Presentation Road Map

• Motivation for scalable datamining.
• Description of middleware and functionality.
• Description of sequential defect detection and categorization algorithm.
• Parallelization challenges and solutions.
• Experimental results.
• Conclusions.


Experimental Results: Demonstrating Scalability

• Experimental results for up to 8 processing nodes.
• Experimental platform:
  – Cluster (1-8 nodes) of 700 MHz Pentium machines
  – Connected through Myrinet LANai 7.0
  – 1 GB of memory per node
  – Datasets ranging in size from 133 MB to 1.8 GB

[Figure: Breakdown of total execution time (1.8 GB): execution time (sec) of detection and categorization vs. processing nodes (1, 2, 4, 8).]


More Scalability Experiments

[Figure: execution time (sec) vs. processing nodes (1, 2, 4, 8) with catalog completeness of 0/3, 1/3, 2/3, and 3/3 in db.]

• 480 MB Dataset, 1-8 nodes

• Catalog completeness varies, but speedups remain near linear.

• More scalability experiments are in the paper.
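“Near linear” can be stated precisely with the standard definitions of speedup and parallel efficiency, where $T_1$ is the execution time on one node and $T_p$ the time on $p$ nodes; no measured values are implied here.

```latex
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p} = \frac{T_1}{p\,T_p}
```

Near-linear speedup means $S_p \approx p$, that is, an efficiency $E_p$ close to 1, for $p = 1, 2, 4, 8$.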


Experimental Results: Evaluating Load Balancing

[Figure: two panels, “480 MB, 2/3 in db” and “480 MB, 0/3 in db”, each plotting execution time (sec) vs. processing nodes (1, 2, 4, 8) for parallel detection, un-optimized categorization, and optimized categorization.]

Optimized scales better!


Experimental Results: Parallel Matching Approach

Default implementation performs sequential categorization of the non-matching defects.

Optimized implementation:

1. parallel local catalog update,

2. merging of local catalogs on Master node,

3. finalizing local catalogs in parallel.
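The three steps of the optimized implementation can be sketched as follows. This is a minimal sketch, assuming each non-matching defect is reduced to a single float shape descriptor and that two defects belong to the same class when their descriptors differ by at most `TOL`; `same_class`, `TOL`, `local_catalog`, `merge_catalogs` and `finalize` are hypothetical stand-ins for the real matcher (moment pruning followed by LCS matching). Steps 1 and 3 would run on every node in parallel and step 2 on the master node. Step 3 remaps local labels instead of re-matching, because tolerance-based matching is not transitive.

```python
from typing import List, Optional, Tuple

TOL = 0.1  # hypothetical matching tolerance


def same_class(a: float, b: float) -> bool:
    """Hypothetical stand-in for the real shape matcher."""
    return abs(a - b) <= TOL


def local_catalog(unmatched: List[float]) -> Tuple[List[float], List[int]]:
    """Step 1 (per node): group this node's non-matching defects into new classes."""
    reps: List[float] = []
    labels: List[int] = []
    for d in unmatched:
        k: Optional[int] = next((i for i, r in enumerate(reps) if same_class(d, r)), None)
        if k is None:
            reps.append(d)
            k = len(reps) - 1
        labels.append(k)
    return reps, labels


def merge_catalogs(local_reps: List[List[float]]) -> Tuple[List[float], List[List[int]]]:
    """Step 2 (master): merge local catalogs, collapsing classes found on several nodes."""
    merged: List[float] = []
    remap: List[List[int]] = []  # remap[node][local class] -> global class
    for reps in local_reps:
        node_map: List[int] = []
        for r in reps:
            g: Optional[int] = next((i for i, m in enumerate(merged) if same_class(r, m)), None)
            if g is None:
                merged.append(r)
                g = len(merged) - 1
            node_map.append(g)
        remap.append(node_map)
    return merged, remap


def finalize(labels: List[int], node_map: List[int]) -> List[int]:
    """Step 3 (per node): translate local class labels into global catalog IDs."""
    return [node_map[k] for k in labels]
```

For example, with one node holding descriptors `[1.0, 1.05, 3.0]` and another holding `[3.02, 5.0]`, the merged catalog is `[1.0, 3.0, 5.0]`; the first node's defects finalize to classes `[0, 0, 1]` and the second node's to `[1, 2]`.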

[Figure: two panels plotting execution time (sec) vs. processing nodes (1, 2, 4, 8) with catalog completeness of 0/3, 1/3, 2/3, and 3/3 in db.]


Presentation Road Map

• Motivation for scalable datamining.
• Description of middleware and functionality.
• Description of sequential defect detection and categorization algorithm.
• Parallelization challenges and solutions.
• Experimental results.
• Conclusions.


Conclusions

FREERIDE can be used to parallelize scientific mining algorithms with a more complex processing structure.

Scalability can be achieved with less programming effort than hand-coding the parallel application.

Parallel applications built with FREERIDE can work efficiently with disk-resident datasets.

Our approaches to load balancing and to parallel categorization of non-matching defects outperform the naïve approaches to these problems.


Questions?