Leonid Glimcher
Computer Science and Engineering
IPDPS’05: Parallelizing Defect Detection and Categorization Using FREERIDE
Scaling and Parallelizing a Scientific Feature Detection and Categorization Application Using a Cluster Middleware
L. Glimcher, G. Agrawal, S. Mehta, R. Jin, R. Machiraju
The Ohio State University
Presentation Road Map
• Motivation for scalable data mining.
• Description of middleware and functionality.
• Description of defect detection and categorization algorithm.
• Parallelization challenges and solutions.
• Experimental results.
• Conclusions.
Motivation for FREERIDE
• Problem:
– Simulation data from engineering and scientific applications is growing larger,
– Analysis models are more complex,
– Drawing knowledge from the data becomes increasingly complicated.
• Solution:
– Parallel data mining, but …
• Catch: application development effort.
FREERIDE
Key observation: most algorithms follow a canonical loop:
While( ) {
    forall( data instances d ) {
        I = process(d)
        R(I) = R(I) op d
    }
    …
}
Middleware API:
• Subset of data to be processed,
• Reduction object,
• Local and global reduction operations,
• Iterator.
Supports:
• Disk-resident datasets,
• Shared and distributed memory.
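The canonical loop and the API elements on this slide can be illustrated with a minimal Python sketch. This is not the FREERIDE API: the function names `local_reduction` and `global_reduction`, the dictionary used as a reduction object, and the toy data are all assumptions made for illustration. The sketch also assumes `op` is associative and commutative, so that per-node partial results can be combined in any order.

```python
from collections import defaultdict

# Hypothetical stand-ins for the middleware concepts on the slide.
# A plain dict plays the role of the reduction object R.


def local_reduction(chunk, process, op):
    """Run the canonical loop body over one node's subset of the data."""
    reduction = defaultdict(float)
    for d in chunk:
        i = process(d)                      # I = process(d)
        reduction[i] = op(reduction[i], d)  # R(I) = R(I) op d
    return dict(reduction)


def global_reduction(partials, op):
    """Combine the per-node reduction objects into one result."""
    merged = defaultdict(float)
    for partial in partials:
        for key, value in partial.items():
            merged[key] = op(merged[key], value)
    return dict(merged)


# Toy usage: two "nodes", each holding a chunk; sum the values per parity.
chunks = [[1, 2, 3, 4], [5, 6, 7]]
partials = [
    local_reduction(c, process=lambda d: d % 2, op=lambda acc, d: acc + d)
    for c in chunks
]
result = global_reduction(partials, op=lambda a, b: a + b)
print(result)
```

An iterative algorithm such as k-means would wrap both steps in the outer loop and recompute `process` from the previous pass's result.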
Previously on FREERIDE
• FREERIDE has been used for:
– Apriori and FP-tree frequent itemset mining,
– kNN classification and decision tree construction,
– K-means and EM clustering,
– Vortex detection (IPDPS 2004).
• Will it work for a scientific mining task with a more complex processing structure?
Overview of Sequential Algorithm
• Goal: to understand the properties of the materials
– How do defects affect the materials?
• Data generated by a molecular dynamics simulation
– Simulator by Physics Department (OSU)
• Main tasks
– Phase 1: Defect detection
– Phase 2: Defect categorization
Example – Different shades represent different detected defects
Mapping detection/categorization to FREERIDE
FREERIDE processing stage → algorithm phase:
• Local Processing → Node Processing
• Global Combination → Post Processing
Algorithm steps:
• Detection phase: Rule Discovery, Defect Segmentation
• Categorization phase: Moment Pruning, LCS Matching
Key Parallelization Issues
• Challenges in the detection phase stem from partitioning data into chunks:
– detection on chunk boundaries,
– joining multi-chunk defects.
• Categorization phase:
1. Load balancing is necessary for scalability.
2. Updating the catalog with new classes needs to be efficient.
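To make "joining multi-chunk defects" concrete, here is a minimal sketch of one generic way to merge defect fragments that touch across chunk boundaries, using a disjoint-set (union-find) structure. The slides do not say which technique the authors use, so this is an assumption; the fragment labels and the `touching_pairs` input (pairs of fragments found to be adjacent across a boundary) are also hypothetical.

```python
class DisjointSet:
    """Minimal union-find over hashable fragment labels."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving: point x at its grandparent as we walk up.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def join_fragments(fragments, touching_pairs):
    """Group per-chunk fragments into whole defects."""
    ds = DisjointSet()
    for fragment in fragments:
        ds.find(fragment)  # register every fragment, even isolated ones
    for a, b in touching_pairs:
        ds.union(a, b)
    groups = {}
    for fragment in fragments:
        groups.setdefault(ds.find(fragment), set()).add(fragment)
    return list(groups.values())


# Toy usage: fragments are (chunk id, local label) pairs.
fragments = [("c0", 1), ("c0", 2), ("c1", 1), ("c2", 1)]
touching_pairs = [(("c0", 2), ("c1", 1)), (("c1", 1), ("c2", 1))]
defects = join_fragments(fragments, touching_pairs)
print(defects)
```

In this toy input, one defect spans three chunks and one stays inside a single chunk.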
Detection Challenges
Intuitive (un-balanced) Categorization
Increasing the number of nodes will increase the “sequential” fraction.
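Why a growing sequential fraction hurts scalability can be seen with Amdahl's law, which bounds the speedup on p nodes by 1 / (s + (1 - s) / p) for a sequential fraction s. The law is standard, but it does not appear on the slides, and the fractions below are made-up values rather than measurements from this work.

```python
def amdahl_speedup(sequential_fraction: float, nodes: int) -> float:
    """Upper bound on speedup when a fixed fraction of the work is sequential."""
    if not 0.0 <= sequential_fraction <= 1.0:
        raise ValueError("sequential_fraction must be in [0, 1]")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return 1.0 / (sequential_fraction + (1.0 - sequential_fraction) / nodes)


# Illustrative fractions only; not taken from the experiments.
for s in (0.05, 0.20):
    bounds = [round(amdahl_speedup(s, p), 2) for p in (1, 2, 4, 8)]
    print(f"sequential fraction {s:.2f}: speedup bounds {bounds}")
```

If the sequential fraction itself grows with the node count, as the slide warns, the achievable speedup falls further below this fixed-fraction bound.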
Load Balanced Categorization
The approach has been tested with a variable number of multi-node defects.
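The slides do not spell out the balancing scheme. As a minimal sketch of what load-balanced assignment can look like, the code below uses a generic greedy heuristic (largest cost first, always to the least-loaded node). The function name `balance`, the per-defect cost estimates, and the defect names are assumptions for illustration, not the authors' method.

```python
import heapq


def balance(costs, nodes):
    """Assign defects to nodes greedily: largest cost first, least-loaded node."""
    heap = [(0.0, n) for n in range(nodes)]  # (current load, node id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(nodes)]
    by_cost = sorted(costs.items(), key=lambda kv: kv[1], reverse=True)
    for defect, cost in by_cost:
        load, node = heapq.heappop(heap)     # least-loaded node so far
        assignment[node].append(defect)
        heapq.heappush(heap, (load + cost, node))
    return assignment


# Toy usage: hypothetical categorization costs per defect.
costs = {"d0": 8, "d1": 7, "d2": 3, "d3": 2, "d4": 2}
parts = balance(costs, nodes=2)
loads = [sum(costs[d] for d in part) for part in parts]
print(parts, loads)
```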
Intuitive (sequential) Catalog Updates
“Catalog completeness” has a direct effect on scalability.
Parallel Catalog Updates
Tested with different levels of “catalog completeness”.
Experimental Results: Demonstrating Scalability
• Experimental results for up to 8 processing nodes.
• Experimental platform:
– Cluster (1-8 nodes) of 700 MHz Pentium machines
– Connected through Myrinet LANai 7.0
– 1 GB memory per node
– Datasets ranging in size from 133 MB to 1.8 GB
[Figure: Breakdown of total execution time (1.8 GB). Execution time (sec) versus processing nodes (1, 2, 4, 8), split into detection and categorization.]
More Scalability Experiments
[Figure: Execution time (sec) versus processing nodes (1, 2, 4, 8), with one series each for 0/3, 1/3, 2/3, and 3/3 in db.]
• 480 MB Dataset, 1-8 nodes
• Catalog completeness varies, but speedups remain near linear.
• More scalability experiments in paper.
Experimental Results: Evaluating Load Balancing
[Figure, two panels: Execution time (sec) versus processing nodes (1, 2, 4, 8), comparing parallel detection, un-optimized categorization, and optimized categorization. Panel titles: “480 MB, 2/3 in db” and “480 MB, 0/3 in db”.]
Optimized scales better!
Experimental Results: Parallel Matching Approach
Default implementation performs sequential categorization of the non-matching defects.
Optimized implementation:
1. parallel local catalog update,
2. merging of local catalogs on Master node,
3. finalizing local catalogs in parallel.
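The three steps of the optimized implementation can be sketched as follows. This is an assumed, simplified model: each defect is reduced to a hashable "signature" standing in for its class (the real matching is shape-based), nodes are simulated by list elements, and all function names and the starting id are hypothetical.

```python
def local_catalog_update(defect_signatures):
    """Step 1 (per node): group non-matching defects into local classes."""
    local = {}
    for signature in defect_signatures:
        local[signature] = local.get(signature, 0) + 1
    return local


def merge_on_master(local_catalogs, first_new_id):
    """Step 2 (master): give each distinct new class a single global id."""
    global_ids = {}
    for local in local_catalogs:
        for signature in local:
            if signature not in global_ids:
                global_ids[signature] = first_new_id + len(global_ids)
    return global_ids


def finalize_local(local, global_ids):
    """Step 3 (per node): rewrite the local catalog using global ids."""
    return {global_ids[sig]: count for sig, count in local.items()}


# Toy usage: two nodes, each with its own non-matching defects.
node_defects = [["ring", "ring", "void"], ["void", "chain"]]
local_catalogs = [local_catalog_update(d) for d in node_defects]   # parallel in practice
global_ids = merge_on_master(local_catalogs, first_new_id=100)     # master only
finalized = [finalize_local(c, global_ids) for c in local_catalogs]  # parallel in practice
print(global_ids, finalized)
```

Only the merge in step 2 runs on a single node, and in this model its input is proportional to the number of new classes rather than the number of defects.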
[Figure, two panels: Execution time (sec) versus processing nodes (1, 2, 4, 8), each with one series for 0/3, 1/3, 2/3, and 3/3 in db.]
Conclusions
FREERIDE can be used to parallelize scientific mining algorithms with a more complex processing structure.
Scalability can be achieved with less programming effort than if the parallel application were “hand-coded”.
Parallel applications created using FREERIDE allow working efficiently with disk-resident datasets.
Our approaches to load balancing and to parallel categorization of non-matching defects perform better than naïve approaches to solving the posed problem.
Questions?