mr. scan: efficient clustering with mrnet and gpus
Post on 24-Feb-2016
36 Views
Preview:
DESCRIPTION
TRANSCRIPT
Paradyn Project
Paradyn / Dyninst WeekMadison, Wisconsin
April 29-May 3, 2013
Mr. Scan: Efficient Clustering with MRNet and GPUs
Evan Samanas and Ben Welton
Density-based clusteringo Discovers the number of clusterso Finds oddly-shaped clusters
2Mr. Scan: Efficient Clustering with MRNet and GPUs
Goal: Find regions that meet minimum density and spatial
distance characteristics
The two parameters that determine if a point is in a cluster
is Epsilon (Eps), and MinPtsIf the number of points in Eps is > MinPts, the point is a core point.For every discovered point, this
same calculation is performed until the cluster is fully expanded
Clustering Example (DBSCAN[1])
3Mr. Scan: Efficient Clustering with MRNet and GPUs
EpsMinPts
MinPts: 3
[1] M. Ester et. al., A density-based algorithm for discovering clusters in large spatial databases with noise, (1996)
Scaling DBSCANo PDBSCAN (1999)[2]
o Quality equivalent to single DBSCANo Linear speedup up to 8 nodes
o DBDC (2004)[3]
o Sacrifices qualityo ~30x speedup on 15 nodes
o PDSDBSCAN (2012) [4]
o Quality equivalent to single node DBSCANo 5675x Speedup on 8192 nodes (72 Million Points)
o 2 Map/Reduce attempts (2011, 2012)o Quality equivalent to single node DBSCANo 6x speedup on 12 nodes
4Mr. Scan: Efficient Clustering with MRNet and GPUs
[2] X. Xu et. al., A fast Parallel Clustering Algorithm for Large Spatial Databases (1999)[3] E. Januzaj et. al., DBDC: Density Based Distributed Clustering (2004)[4] M Patwary et. al., A new scalable parallel DBSCAN algorithm using the disjoint-set data structure (2012)
Challenges of scaling DBSCANo Data distribution
o How do we effectively take an input file and create partitions that can be clustered by DBSCAN?oDistributed 2-D partitioner reading from a distributed file
systemo Load balancing
o How to keep variance in clustering times across nodes to a minimum?oDense Box
o Mergeo How do we reduce the amount of data needed for
the merge while keeping accuracy high?oRepresentative points
5Mr. Scan: Efficient Clustering with MRNet and GPUs
6Mr. Scan: Efficient Clustering with MRNet and GPUs
MRNet – Multicast / Reduction Networko General-purpose
TBON APIo Network: user-defined
topologyo Stream: logical data channel
o to a set of back-endso multicast, gather, and custom
reductiono Packet: collection of datao Filter: stream data operator
o synchronizationo transformation
o Widely adopted by HPC toolso CEPBA toolkit o Cray ATP & CCDBo Open|SpeedShop & CBTFo STATo TAU
FE
…… …BE
appappappapp
BE
appappappapp
BE
appappappapp
BE
appappappapp
CP CP
CP CP CP CP
F(x1,…,xn)
TBON Computation
7Mr. Scan: Efficient Clustering with MRNet and GPUs
FE
BE
appappappapp
BE
appappappapp
BE
appappappapp
CP CP
BE
appappappapp
Ideal Characteristics:o Filter output size constant or decreasing
o Computation rate similar across levels
o Adjustable for load balance
Data Size: 10MB per BE
Packet Size: ≤10 MB
Packet Size:≤10 MB
~10 sec
~40 sec
…4x
~10 sec
~10 sec
~10 sec
Total Time: ~30 sec
Total Time: ~60 sec
Intro to Mr. Scan
8Mr. Scan: Efficient Clustering with MRNet and GPUs
BE BE BE
CP CP
BE
DBSCAN
Merge
FE
Mr. Scan Phases
Partition: Distributed
DBSCAN: GPU(@ BE)
Merge: CPU (x #levels)
Sweep: CPU (x #levels)FE
BE BE BE BE
Merge
FS
Sweep
Sweep
Mr. Scan Architecture
9Mr. Scan: Efficient Clustering with MRNet and GPUs
Time: 0 Time: 18.2 Min
Partitioner
DB
SC
AN
Merge &
Sw
eep
Clustering 6.5 Billion Points
FS Read 224 Secs
FS Write489 Secs
MRNetStartup
130 Secs
FS Read: 24 Secs
DBSCAN168 Secs
Merge Time: 6 Secs
Sweep Time: 4 Secs
Write Output: 19 Secs
Partition Phaseo Goal: Partitions computationally equivalent to
DBSCANo Algorithm:
o Form initial partitionso Add shadow regionso Rebalance
10Mr. Scan: Efficient Clustering with MRNet and GPUs
Distributed Partitioner
11Mr. Scan: Efficient Clustering with MRNet and GPUs
GPU DBSCAN Filter
12Mr. Scan: Efficient Clustering with MRNet and GPUs
DBSCAN is performed in two distinct steps
Step 1: Detect Core Points
Block 1
Block 2
Block 900
T1
T2
T512
T1
T2
T512
T1
T2
T512
Block 1T1
T2
T512
Block 2T1
T2
T512
Block 900T1
T2
T512
Step 2: Expand core points and color
Dense Box
13Mr. Scan: Efficient Clustering with MRNet and GPUs
• One significant scalability issue is dealing with dense regions of data
• Density increases the computation cost of DBSCAN
R2 Requires morecomparison operations
R1 R2
• We reduce the computation cost of high density regions by pre-clustering these regions
KD-Tree
Look at each leaf bounding box looking for boxes with point count > minpts and size < 0.35 * eps
DBSCAN no longer needs to expand these regions
`
Merge Algorithmo Merge overlapping clusters found on
different nodes. o Two steps in the merge operation
1. Select Representative points (BE)2. Merge operation
14Mr. Scan: Efficient Clustering with MRNet and GPUs
Representative Pointso These are points that represent the core
points in the dataset. o Create a boundary which at least one
core point shared between overlapping clusters must be contained.
15Mr. Scan: Efficient Clustering with MRNet and GPUs
Representative points are the points closest to the corners and middle of the side of the eps box
These points create a boundary (shaded region) which a point must fall in to merge overlapping clusters
Merge Algorithm
16Mr. Scan: Efficient Clustering with MRNet and GPUs
• Merge algorithm is responsible for merging overlapping clusters detected on different DBSCAN nodes.
• Need to handle the merge with low overhead and without the full dataset
Node 1 Node 2
Core Point
Non-Core Point
1. Core/Core overlap
Core Point in common. 64 operations to detect.
Node 1 Node 2
Core Point
Non-Core Point
2. Non-core/Core overlap
Core point seen as non-core by one node. MinPts * 2 operations required to detect
Sweep Stepo Get cluster identifiers and file offsets
down to BE’s to write final clusters. o FE gives each cluster a unique ID and a
file offset. o This data is passed back down to the BE
that holds the data in the cluster.o Data is written out to disk by the BE.
17Mr. Scan: Efficient Clustering with MRNet and GPUs
Experiment Setupo Dataset: Generated data with
distribution from real Twitter datao Measuring:
oWeak Scaling up to 8192 GPUso Strong ScalingoQuality compared to single-threaded
DBSCAN
18Mr. Scan: Efficient Clustering with MRNet and GPUs
Results
19Mr. Scan: Efficient Clustering with MRNet and GPUs
Weak Scaling: 4096x data/compute increase 18.48x-31.68x time increase
Results Breakdown – Partition Phase@ 6.5 Billion Points: 65.9% of Mr. Scan’s time
94.6% I/O time
20Mr. Scan: Efficient Clustering with MRNet and GPUs
Results Breakdown – GPU Cluster Time
21Mr. Scan: Efficient Clustering with MRNet and GPUs
Strong Scaling
22Mr. Scan: Efficient Clustering with MRNet and GPUs
Quality
23Mr. Scan: Efficient Clustering with MRNet and GPUs
Future Worko Remove partitioner’s I/O bottlenecko Multiple dimensions
24Mr. Scan: Efficient Clustering with MRNet and GPUs
Conclusiono Clustered 6.5 billion points with
DBSCAN in 18.2 minuteso Controlled computational variance of
DBSCANo Partitioner I/O = scaling enemy
25Mr. Scan: Efficient Clustering with MRNet and GPUs
Questions?
26A Brief Discussion of Ways and Means
Summary of previous Mr. Scan implementation
27Mr. Scan: Efficient Clustering with MRNet and GPUs
FE
BE BE BE
CP CP
BE
DBSCAN
Algorithm Steps
SpatialDecomp: CPU(@ FE)
DBSCAN: CPU or GPU(@ BE)
DrawBoundBox: CPU or GPU
MergeCluster: CPU (x #levels)
MergeCluster
top related