Paradyn Project
Paradyn / Dyninst Week, Madison, Wisconsin
April 29-May 3, 2013
Mr. Scan: Efficient Clustering with MRNet and GPUs
Evan Samanas and Ben Welton
• Density-based clustering
    o Discovers the number of clusters
    o Finds oddly-shaped clusters
Clustering Example (DBSCAN [1])
Goal: Find regions that meet minimum density and spatial distance characteristics.
The two parameters that determine whether a point is in a cluster are Epsilon (Eps) and MinPts. If the Eps-neighborhood of a point contains at least MinPts points (the definition used in [1]), the point is a core point. The same calculation is performed for every newly discovered point until the cluster is fully expanded.
[Figure: example clustering showing the Eps radius around a point, with MinPts: 3]
[1] M. Ester et al., A density-based algorithm for discovering clusters in large spatial databases with noise (1996)
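The core-point test and cluster expansion described above can be sketched as a minimal, single-node DBSCAN. This is a brute-force O(n²) illustration, not Mr. Scan's GPU code; the names `region_query` and `dbscan` and the use of -1 for noise are our own choices. Following [1], a point's Eps-neighborhood includes the point itself.

```python
import math
from collections import deque

NOISE = -1
UNVISITED = None


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each 2-D point with a cluster id (0, 1, ...) or NOISE."""
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # not core; may later become a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = deque(neighbors)
        while queue:  # expand the cluster
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also core: keep expanding
                queue.extend(j_neighbors)
        cluster += 1
    return labels
```

For example, `dbscan([(0, 0), (0, 1), (1, 0), (50, 50)], eps=1.5, min_pts=3)` labels the first three points as cluster 0 and the isolated point as noise (-1).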
Scaling DBSCAN
• PDBSCAN (1999) [2]
    o Quality equivalent to single-node DBSCAN
    o Linear speedup up to 8 nodes
• DBDC (2004) [3]
    o Sacrifices quality
    o ~30x speedup on 15 nodes
• PDSDBSCAN (2012) [4]
    o Quality equivalent to single-node DBSCAN
    o 5675x speedup on 8192 nodes (72 million points)
• 2 Map/Reduce attempts (2011, 2012)
    o Quality equivalent to single-node DBSCAN
    o 6x speedup on 12 nodes
[2] X. Xu et al., A Fast Parallel Clustering Algorithm for Large Spatial Databases (1999)
[3] E. Januzaj et al., DBDC: Density Based Distributed Clustering (2004)
[4] M. Patwary et al., A new scalable parallel DBSCAN algorithm using the disjoint-set data structure (2012)
Challenges of scaling DBSCAN
• Data distribution
    o How do we effectively take an input file and create partitions that can be clustered by DBSCAN?
    o Approach: a distributed 2-D partitioner reading from a distributed file system
• Load balancing
    o How do we keep the variance in clustering times across nodes to a minimum?
    o Approach: Dense Box
• Merge
    o How do we reduce the amount of data needed for the merge while keeping accuracy high?
    o Approach: representative points
MRNet – Multicast / Reduction Network
• General-purpose TBON (tree-based overlay network) API
    o Network: user-defined topology
    o Stream: logical data channel to a set of back-ends
        o multicast, gather, and custom reduction
    o Packet: collection of data
    o Filter: stream data operator
        o synchronization
        o transformation
• Widely adopted by HPC tools
    o CEPBA toolkit
    o Cray ATP & CCDB
    o Open|SpeedShop & CBTF
    o STAT
    o TAU
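[Figure: MRNet tree with the front-end (FE) at the root, layers of communication processes (CP) in the middle, and back-ends (BE) at the leaves, each BE attached to application processes (app); a filter F(x1,…,xn) combines the packets flowing up the tree.]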
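To make the stream/filter model concrete, here is a small sketch of a tree in which packets flow up from the back-ends and a filter F(x1,…,xn) combines the children's packets at each internal node, while multicast sends one packet down to every back-end. It does not use MRNet's actual C++ API; `Node`, `reduce_up`, and `multicast_down` are hypothetical names for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """A TBON node: the FE (root), a CP (internal), or a BE (leaf)."""
    name: str
    children: List["Node"] = field(default_factory=list)
    value: object = None  # data produced by a back-end (leaves only)


def reduce_up(node: Node, filt: Callable[[list], object]):
    """Send packets from the BEs toward the FE, applying filt at every internal node."""
    if not node.children:
        return node.value
    return filt([reduce_up(child, filt) for child in node.children])


def multicast_down(node: Node, packet, deliver: Callable[[str, object], None]):
    """Send the same packet from this node to every BE below it."""
    if not node.children:
        deliver(node.name, packet)
        return
    for child in node.children:
        multicast_down(child, packet, deliver)
```

With four BEs holding 3, 5, 7, and 1 under two CPs, `reduce_up(fe, sum)` yields 16 at the FE, and each CP only ever handles two packets.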
TBON Computation
Ideal characteristics:
• Filter output size constant or decreasing
• Computation rate similar across levels
• Adjustable for load balance
[Figure: a TBON with BEs, CPs, and an FE; 10 MB of data per BE and packets of ≤10 MB between levels. With ~10 sec per level the total time is ~30 sec; when one level takes ~40 sec (4x the work), the total grows to ~60 sec.]
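The timing annotations in the figure follow from simple arithmetic, sketched below under the assumption that levels do not overlap in time, so one reduction wave's total time is the sum of its per-level times. The per-level numbers are read off the figure; placing the slow level in the middle is our assumption, and the function names are hypothetical.

```python
def tbon_total_time(level_times_sec):
    """End-to-end time of one reduction wave up the tree.

    A level cannot finish before its inputs arrive from the level below, so
    with no overlap between levels the total is the sum of per-level times.
    """
    return sum(level_times_sec)


def slowest_level(level_times_sec):
    """Index of the level that dominates the total (0 = back-ends)."""
    return max(range(len(level_times_sec)), key=level_times_sec.__getitem__)


balanced = [10, 10, 10]    # BE, CP, FE levels, ~10 sec each
unbalanced = [10, 40, 10]  # one level does ~4x the work

print(tbon_total_time(balanced))    # 30
print(tbon_total_time(unbalanced))  # 60
print(slowest_level(unbalanced))    # 1
```

A single level that does not shrink its data doubles the total here, which is why the filter output size should stay constant or decrease.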
Intro to Mr. Scan
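Mr. Scan phases
• Partition: distributed
• DBSCAN: GPU (at each BE)
• Merge: CPU (once per tree level)
• Sweep: CPU (once per tree level)
[Figure: DBSCAN runs at the BEs; Merge runs up through the CPs to the FE; Sweep runs back down to the BEs, which read from and write to the file system (FS).]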
Mr. Scan Architecture
Clustering 6.5 billion points (timeline from 0 to 18.2 min):
• Partitioner
    o FS read: 224 sec
    o FS write: 489 sec
• MRNet startup: 130 sec
• DBSCAN
    o FS read: 24 sec
    o DBSCAN: 168 sec
• Merge & Sweep
    o Merge: 6 sec
    o Sweep: 4 sec
    o Write output: 19 sec
Partition Phase
• Goal: partitions computationally equivalent to DBSCAN
• Algorithm:
    o Form initial partitions
    o Add shadow regions
    o Rebalance
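The first two partitioning steps can be illustrated with a minimal sketch that assumes a uniform 2-D grid whose cells are at least Eps wide (Mr. Scan's actual distributed partitioner is not shown in these slides). Each cell keeps the points it owns plus a shadow region of points from neighboring cells near its border, so points on a partition boundary still see their full Eps-neighborhood. Rebalancing is omitted, and `grid_partition` is a hypothetical name.

```python
import math
from collections import defaultdict


def grid_partition(points, cell, eps):
    """Split 2-D points into square grid cells and attach each cell's shadow region.

    Returns {(cx, cy): (owned_points, shadow_points)}. A shadow point belongs to a
    neighboring cell but lies inside this cell's box grown by eps on every side
    (a superset of the true Eps-distance halo).
    """
    if cell < eps:
        raise ValueError("cell side must be at least eps")
    owned = defaultdict(list)
    for p in points:
        owned[(math.floor(p[0] / cell), math.floor(p[1] / cell))].append(p)
    partitions = {}
    for (cx, cy), pts in owned.items():
        x0, y0 = cx * cell, cy * cell
        x1, y1 = x0 + cell, y0 + cell
        shadow = []
        # Because cell >= eps, shadow points can only live in the 8 adjacent cells.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if dx == 0 and dy == 0:
                    continue
                for q in owned.get((cx + dx, cy + dy), []):
                    if x0 - eps <= q[0] <= x1 + eps and y0 - eps <= q[1] <= y1 + eps:
                        shadow.append(q)
        partitions[(cx, cy)] = (pts, shadow)
    return partitions
```

Shadow points are duplicated across partitions on purpose: each node clusters them, and the merge phase later reconciles clusters that meet in these overlaps.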
Distributed Partitioner
GPU DBSCAN Filter
DBSCAN is performed in two distinct steps:
1. Detect core points
2. Expand core points and color (label) them with cluster IDs
[Figure: in each step the work is spread over GPU thread blocks (Block 1, Block 2, …, Block 900), each with threads T1, T2, …, T512.]
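The two-step structure maps naturally onto data-parallel hardware. The sketch below mimics it on a CPU: step 1 computes each point's core flag independently (conceptually one GPU thread per point), and step 2 colors clusters by iterative minimum-label propagation among neighboring core points. That propagation scheme is a common data-parallel formulation we assume here; Mr. Scan's CUDA kernels may expand clusters differently. Each cluster's color is the smallest point index in it, and -1 marks noise.

```python
import math


def eps_neighbors(points, eps):
    """Brute-force Eps-neighborhoods (each includes the point itself)."""
    return [[j for j, q in enumerate(points) if math.dist(p, q) <= eps] for p in points]


def two_step_dbscan(points, eps, min_pts):
    nbrs = eps_neighbors(points, eps)
    n = len(points)

    # Step 1: detect core points. Iterations are independent of each other,
    # which is what lets a GPU assign one thread per point.
    core = [len(nbrs[i]) >= min_pts for i in range(n)]

    # Step 2: expand core points and color them. Each core point starts with its
    # own index as its color and repeatedly adopts the smallest color among its
    # core neighbors; each sweep reads only the previous sweep's colors.
    color = [i if core[i] else -1 for i in range(n)]
    changed = True
    while changed:
        changed = False
        nxt = color[:]
        for i in range(n):
            if core[i]:
                smallest = min(color[j] for j in nbrs[i] if core[j])
                if smallest < nxt[i]:
                    nxt[i] = smallest
                    changed = True
        color = nxt

    # Non-core points join a neighboring core point's cluster (border) or stay -1 (noise).
    for i in range(n):
        if not core[i]:
            core_colors = [color[j] for j in nbrs[i] if core[j]]
            if core_colors:
                color[i] = min(core_colors)
    return core, color
```

Each sweep of step 2 touches every point independently, at the cost of repeating sweeps until no color changes.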
Dense Box
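• One significant scalability issue is dealing with dense regions of data.
• Density increases the computation cost of DBSCAN.
[Figure: regions R1 and R2; the denser region R2 requires more comparison operations.]
• We reduce the computation cost of high-density regions by pre-clustering them:
    o Build a KD-tree and examine each leaf's bounding box, looking for boxes with point count > MinPts and size < 0.35 * Eps.
    o Such a box is small enough that every pair of its points lies within Eps of each other, so all of its points are core points of the same cluster.
    o DBSCAN no longer needs to expand these regions.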
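A minimal sketch of dense-box detection follows, assuming a simple median-split KD-tree and interpreting "size" as the side length of a leaf's bounding box in each dimension. The leaf size and the names `kd_leaves`, `bounding_box`, and `dense_boxes` are hypothetical; only the > MinPts and < 0.35 * Eps thresholds come from the slide.

```python
def kd_leaves(points, leaf_size, depth=0):
    """Split at the median, alternating x and y, until each leaf has <= leaf_size points."""
    if len(points) <= leaf_size:
        return [points]
    axis = depth % 2
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return (kd_leaves(ordered[:mid], leaf_size, depth + 1)
            + kd_leaves(ordered[mid:], leaf_size, depth + 1))


def bounding_box(points):
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


def dense_boxes(points, eps, min_pts, leaf_size=64):
    """Return (box, leaf_points) for each leaf with > min_pts points and both sides < 0.35 * eps.

    Sides below 0.35 * eps give a diagonal below 0.5 * eps, so every pair of
    points in the box is within eps: all of them are core points of one cluster.
    """
    if leaf_size < 1:
        raise ValueError("leaf_size must be at least 1")
    found = []
    for leaf in kd_leaves(points, leaf_size):
        if len(leaf) <= min_pts:
            continue
        x0, y0, x1, y1 = bounding_box(leaf)
        if x1 - x0 < 0.35 * eps and y1 - y0 < 0.35 * eps:
            found.append(((x0, y0, x1, y1), leaf))
    return found
```

Points in a returned box can be given one cluster label up front and treated as a single unit during expansion.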
Merge Algorithm
• Merge overlapping clusters found on different nodes.
• Two steps in the merge operation:
    1. Select representative points (BE)
    2. Merge operation
Representative Points
• These are points that represent the core points in the dataset.
• They create a boundary within which at least one core point shared between overlapping clusters must be contained.
Representative points are the points closest to the corners and to the midpoints of the sides of the Eps box.
These points create a boundary (shaded region) in which a point must fall for overlapping clusters to be merged.
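Selecting representative points could look like the following sketch, which assumes the caller supplies the core points of one cluster inside one Eps box, with the box's bounds given as `(x0, y0, x1, y1)`. For each of the eight targets (four corners, four side midpoints) it keeps the closest core point and drops duplicates. The function name and input format are hypothetical.

```python
import math


def representative_points(core_points, box):
    """Closest core point to each corner and side midpoint of box = (x0, y0, x1, y1)."""
    if not core_points:
        return []
    x0, y0, x1, y1 = box
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    targets = [(x0, y0), (x1, y0), (x0, y1), (x1, y1),  # corners
               (xm, y0), (xm, y1), (x0, ym), (x1, ym)]  # side midpoints
    reps = []
    for t in targets:
        best = min(core_points, key=lambda p: math.dist(p, t))
        if best not in reps:  # one point may be closest to several targets
            reps.append(best)
    return reps
```

At most eight points per box leave the BE, regardless of how many core points the box holds, which is what keeps the merge data small.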
Merge Algorithm
• The merge algorithm is responsible for merging overlapping clusters detected on different DBSCAN nodes.
• The merge must have low overhead and work without the full dataset.
[Figure: overlapping clusters found by Node 1 and Node 2; legend: core point, non-core point]
1. Core/Core overlap: the clusters have a core point in common. 64 operations to detect.
2. Non-core/Core overlap: a core point is seen as non-core by one node. MinPts * 2 operations are required to detect it.
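Both overlap cases reduce to one rule: clusters from different nodes belong together when they share a point that at least one node sees as core. A point that is non-core on both sides, such as a shared border point, does not connect clusters. The sketch below applies that rule with a union-find structure. It is a simplification: it compares full point-ID sets rather than representative points, so it does not model the 64 or MinPts * 2 detection costs, and `merge_clusters` and its input format are hypothetical.

```python
class DisjointSet:
    """Union-find with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def merge_clusters(node_clusters):
    """Group per-node clusters that overlap.

    node_clusters maps (node, local_cluster_id) -> {point_id: is_core_on_that_node}.
    Returns the merged groups as sorted lists of cluster keys.
    """
    ds = DisjointSet()
    seen = {}  # point_id -> [(cluster_key, is_core), ...] from clusters already processed
    for key, members in node_clusters.items():
        ds.find(key)  # register the cluster even if it overlaps nothing
        for pid, is_core in members.items():
            for other_key, other_core in seen.get(pid, []):
                # Core/core or non-core/core overlap: same global cluster.
                if is_core or other_core:
                    ds.union(other_key, key)
            seen.setdefault(pid, []).append((key, is_core))
    groups = {}
    for key in node_clusters:
        groups.setdefault(ds.find(key), []).append(key)
    return sorted(sorted(g) for g in groups.values())
```

Union-find suits the tree-based merge because unions are order-independent, so partial results from different CPs can be combined in any order.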
Sweep Step
• Get cluster identifiers and file offsets down to the BEs so they can write the final clusters.
• The FE gives each cluster a unique ID and a file offset.
• This data is passed back down to the BE that holds the data in the cluster.
• The data is written out to disk by the BE.
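The FE's part of the sweep amounts to numbering clusters and computing an exclusive prefix sum of their output sizes. The sketch below assumes fixed-size output records and that a merged cluster may be split across several BEs, each receiving its own offset so that a cluster's pieces land next to each other in the file. `plan_sweep`, its input tuples, and `record_bytes` are hypothetical.

```python
def plan_sweep(pieces, record_bytes):
    """FE side of the sweep.

    pieces: (be, cluster_key, n_points) for every part of a merged cluster held by a BE.
    Returns {(be, cluster_key): (cluster_id, byte_offset)}. Offsets are an exclusive
    prefix sum over piece sizes, so pieces never overlap and each cluster's pieces
    are adjacent in the output file.
    """
    ids = {}
    plan = {}
    offset = 0
    for be, cluster, n in sorted(pieces, key=lambda p: (p[1], p[0])):
        cid = ids.setdefault(cluster, len(ids))  # next unused id on first sight
        plan[(be, cluster)] = (cid, offset)
        offset += n * record_bytes
    return plan
```

Each BE would then write its points for a cluster starting at its assigned byte offset, with no coordination with other BEs.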
Experiment Setup
• Dataset: generated data with a distribution taken from real Twitter data
• Measuring:
    o Weak scaling up to 8192 GPUs
    o Strong scaling
    o Quality compared to single-threaded DBSCAN
Results
Weak scaling: a 4096x increase in data and compute led to an 18.48x to 31.68x increase in time.
Results Breakdown – Partition Phase
At 6.5 billion points:
• 65.9% of Mr. Scan’s time
• 94.6% I/O time
Results Breakdown – GPU Cluster Time
Strong Scaling
Quality
Future Work
• Remove the partitioner’s I/O bottleneck
• Multiple dimensions
Conclusion
• Clustered 6.5 billion points with DBSCAN in 18.2 minutes
• Controlled the computational variance of DBSCAN
• Partitioner I/O is the main obstacle to scaling
Questions?
Summary of previous Mr. Scan implementation
[Figure: MRNet tree (FE, CPs, BEs) annotated with the DBSCAN and MergeCluster steps]
Algorithm steps:
• SpatialDecomp: CPU (at FE)
• DBSCAN: CPU or GPU (at BE)
• DrawBoundBox: CPU or GPU
• MergeCluster: CPU (once per tree level)