![Page 1: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/1.jpg)
Fast N-Body Algorithms for Massive Datasets
Alexander Gray
Georgia Institute of Technology
![Page 2: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/2.jpg)
Is science in 2007 different from science in 1907?
Instruments
[Science, Szalay & J. Gray, 2001]
![Page 3: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/3.jpg)
Is science in 2007 different from science in 1907?

Data: CMB Maps
1990 COBE 1,000
2000 Boomerang 10,000
2002 CBI 50,000
2003 WMAP 1 Million
2008 Planck 10 Million

Data: Local Redshift Surveys
1986 CfA 3,500
1996 LCRS 23,000
2003 2dF 250,000
2005 SDSS 800,000

Data: Angular Surveys
1970 Lick 1M
1990 APM 2M
2005 SDSS 200M
2008 LSST 2B
Instruments
[Plot: log-scale vertical axis from 1.0E+02 to 1.0E+07, horizontal axis years 1985 to 2010]
[Science, Szalay & J. Gray, 2001]
![Page 4: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/4.jpg)
Sloan Digital Sky Survey (SDSS)
![Page 5: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/5.jpg)
Size matters! Now possible:
• low noise: subtle patterns
• global properties and patterns
• rare objects and patterns
• more info: 3d, deeper/earlier, bands
• in parallel: more accurate simulations
• 2008: LSST – time-varying phenomena

1 billion objects, 144 dimensions
(~250M galaxies in 5 colors, ~1M 2000-D spectra)
![Page 6: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/6.jpg)
Happening everywhere!
• Molecular biology: microarray chips
• Earth sciences: satellite topography
• Neuroscience: functional MRI
• Physical simulation: microprocessor
• Drug discovery: nuclear mag. resonance
• Internet: fiber optics
![Page 7: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/7.jpg)
1. How did galaxies evolve?
2. What was the early universe like?
3. Does dark energy exist?
4. Is our model (GR+inflation) right?

Astrophysicist
Machine learning/statistics guy

R. Nichol, Inst. Cosmol. Gravitation
A. Connolly, U. Pitt Physics
C. Miller, NOAO
R. Brunner, NCSA
G. Kulkarni, Inst. Cosmol. Gravitation
D. Wake, Inst. Cosmol. Gravitation
R. Scranton, U. Pitt Physics
M. Balogh, U. Waterloo Physics
I. Szapudi, U. Hawaii Inst. Astronomy
G. Richards, Princeton Physics
A. Szalay, Johns Hopkins Physics
![Page 8: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/8.jpg)
Astrophysicist
1. How did galaxies evolve?
2. What was the early universe like?
3. Does dark energy exist?
4. Is our model (GR+inflation) right?

Machine learning/statistics guy
• Kernel density estimator: O(N^2)
• n-point spatial statistics: O(N^n)
• Nonparametric Bayes classifier: O(N^2)
• Support vector machine: O(N^2)
• Nearest-neighbor statistics: O(N^2)
• Gaussian process regression: O(N^3)
• Bayesian inference: O(c^D T(N))
![Page 9: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/9.jpg)
![Page 10: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/10.jpg)
But I have 1 million points
![Page 11: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/11.jpg)
![Page 12: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/12.jpg)
Data: The Stack
• Apps (User, Science)
• Perception: Computer Vision, NLP, Machine Translation, Bibleome, Autonomous vehicles
• ML / Opt: Machine Learning / Optimization / Linear Algebra / Privacy
• Data Abstractions: DBMS, MapReduce, VOTables, Data structures
• Clustering / Threading: Programming with 1000s of powerful compute nodes
• O/S
• Network
• Motherboards / Datacenter
• ICs
![Page 13: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/13.jpg)
Making fast algorithms
• There are many large datasets. There are many questions we want to ask them.
– Why we must not get obsessed with one specific dataset.
– Why we must not get obsessed with one specific question.
• The activity I’ll describe is about accelerating computations which occur commonly across many ML methods.
![Page 14: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/14.jpg)
Scope
• Nearest neighbor
• K-means
• Hierarchical clustering
• N-point correlation functions
• Kernel density estimation
• Locally-weighted regression
• Mean shift tracking
• Mixtures of Gaussians
• Gaussian process regression
• Manifold learning
• Support vector machines
• Affinity propagation
• PCA
• ….
![Page 15: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/15.jpg)
Scope
• ML methods with distances underneath
– Distances only
– Continuous kernel functions
• ML methods with counting underneath
![Page 16: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/16.jpg)
Scope
• Computational ideas in this tutorial:
– Data structures
– Monte Carlo
– Series expansions
– Problem/solution abstractions
• Challenges
– Don’t introduce error, if possible
– Don’t introduce tweak parameters, if possible
![Page 17: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/17.jpg)
Two canonical problems
• Nearest-neighbor search

$$NN(x_q) = \arg\min_{r} \|x_q - x_r\|$$

• Kernel density estimation

$$\hat{f}(x_q) = \frac{1}{N} \sum_{r \neq q}^{N} K_h(\|x_q - x_r\|)$$
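As a baseline, the two canonical problems can be written directly from their definitions. This is only a minimal sketch: the function names are illustrative, and the Gaussian form of K_h (with its normalizing constant omitted) is an assumption, since the slides do not fix a kernel.

```python
import numpy as np

def nearest_neighbor(xq, refs):
    """Index of the reference point closest to xq, by a full linear scan."""
    dists = np.linalg.norm(refs - xq, axis=1)
    return int(np.argmin(dists))

def gaussian_kernel(dist, h):
    """Assumed kernel K_h: unnormalized Gaussian with bandwidth h."""
    return np.exp(-0.5 * (dist / h) ** 2)

def kde_leave_one_out(points, q, h):
    """(1/N) * sum over r != q of K_h(||x_q - x_r||), with x_q = points[q]."""
    dists = np.linalg.norm(points - points[q], axis=1)
    k = gaussian_kernel(dists, h)
    return float((k.sum() - k[q]) / len(points))  # drop the r == q term
```

Both functions touch every reference point, so answering N queries this way costs O(N^2) distance computations.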
![Page 18: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/18.jpg)
Ideas
1. Data structures and how to use them
2. Monte Carlo
3. Series expansions
4. Problem/solution abstractions
![Page 19: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/19.jpg)
Slides by Jeremy Kubica
Nearest Neighbor - Naïve Approach
• Given a query point X.
• Scan through each point Y:
– Calculate the distance d(X,Y)
– If d(X,Y) < best_seen then Y is the new nearest neighbor.
• Takes O(N) time for each query!
33 Distance Computations
![Page 20: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/20.jpg)
Speeding Up Nearest Neighbor
• We can speed up the search for the nearest neighbor:
– Examine nearby points first.
– Ignore any points that are farther than the nearest point found so far.
• Do this using a KD-tree:
– Tree-based data structure
– Recursively partitions points into axis-aligned boxes.
![Page 21: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/21.jpg)
KD-Tree Construction
Pt X Y
1 0.00 0.00
2 1.00 4.31
3 0.13 2.85
… … …
We start with a list of n-dimensional points.
![Page 22: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/22.jpg)
KD-Tree Construction
We can split the points into 2 groups by choosing a dimension X and value V and separating the points into X > V and X <= V.

Split on X > .5:
• NO branch: Pt 1 (X = 0.00, Y = 0.00), Pt 3 (X = 0.13, Y = 2.85), …
• YES branch: Pt 2 (X = 1.00, Y = 4.31), …
![Page 23: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/23.jpg)
KD-Tree Construction
We can then consider each group separately and possibly split again (along same/different dimension).

Split on X > .5:
• NO branch: Pt 1 (X = 0.00, Y = 0.00), Pt 3 (X = 0.13, Y = 2.85), …
• YES branch: Pt 2 (X = 1.00, Y = 4.31), …
![Page 24: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/24.jpg)
KD-Tree Construction
We can then consider each group separately and possibly split again (along same/different dimension).

Split on X > .5:
• NO branch, split again on Y > .1:
– NO branch: Pt 1 (X = 0.00, Y = 0.00), …
– YES branch: Pt 3 (X = 0.13, Y = 2.85), …
• YES branch: Pt 2 (X = 1.00, Y = 4.31), …
![Page 25: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/25.jpg)
KD-Tree Construction
We can keep splitting the points in each set to create a tree structure. Each node with no children (leaf node) contains a list of points.
![Page 26: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/26.jpg)
KD-Tree Construction
We will keep around one additional piece of information at each node. The (tight) bounds of the points at or below this node.
![Page 27: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/27.jpg)
KD-Tree Construction
Use heuristics to make splitting decisions:
• Which dimension do we split along? Widest
• Which value do we split at? Median of the values along that split dimension for the points.
• When do we stop? When there are fewer than m points left OR the box has hit some minimum width.
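The construction steps and heuristics above can be sketched as follows. This is an illustrative sketch rather than the slides' own code: the names `KDNode`, `build_kdtree`, and `leaves`, the default values of `m` and `min_width`, and the fallback to a leaf when the median split leaves one side empty are all assumptions.

```python
import numpy as np

class KDNode:
    """One kd-tree node with tight bounds; `points` is set only on leaves."""
    def __init__(self, lo, hi, points=None, left=None, right=None):
        self.lo, self.hi = lo, hi
        self.points, self.left, self.right = points, left, right

def build_kdtree(points, m=2, min_width=1e-9):
    # Tight bounding box of the points at or below this node.
    lo, hi = points.min(axis=0), points.max(axis=0)
    widths = hi - lo
    dim = int(np.argmax(widths))              # split along the widest dimension
    # Stop: fewer than m points left, or the box has hit the minimum width.
    if len(points) < m or widths[dim] < min_width:
        return KDNode(lo, hi, points=points)
    value = np.median(points[:, dim])         # split at the median value
    above = points[:, dim] > value            # X > V versus X <= V
    if not above.any():                       # ties at the maximum: keep a leaf
        return KDNode(lo, hi, points=points)
    return KDNode(lo, hi,
                  left=build_kdtree(points[~above], m, min_width),
                  right=build_kdtree(points[above], m, min_width))

def leaves(node):
    """Collect the leaf nodes, left to right."""
    if node.points is not None:
        return [node]
    return leaves(node.left) + leaves(node.right)
```

For example, `build_kdtree(np.array([[0.0, 0.0], [1.0, 4.31], [0.13, 2.85]]))` first splits on Y, because Y is the wider dimension for these three points.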
![Page 28: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/28.jpg)
Exclusion and inclusion, using point-node kd-tree bounds.
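O(D) bounds on distance minima/maxima, for a point x and a node with lower and upper bounds l_d and u_d in each dimension d:

$$\min_i \|x - x_i\|^2 \;\ge\; \sum_{d}^{D} \left[ \max\{l_d - x_d,\, 0\}^2 + \max\{x_d - u_d,\, 0\}^2 \right]$$

$$\max_i \|x - x_i\|^2 \;\le\; \sum_{d}^{D} \max\{(u_d - x_d)^2,\, (x_d - l_d)^2\}$$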
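A small sketch of these point-node bounds, assuming the node's box is stored as two arrays `lo` and `hi` (illustrative names for the per-dimension lower and upper bounds). Each function is a single O(D) pass over the dimensions.

```python
import numpy as np

def min_dist_sq(x, lo, hi):
    """Lower bound on the squared distance from x to any point in the box."""
    below = np.maximum(lo - x, 0.0)   # how far x lies below the box, per dimension
    above = np.maximum(x - hi, 0.0)   # how far x lies above the box, per dimension
    return float(np.sum(below ** 2 + above ** 2))

def max_dist_sq(x, lo, hi):
    """Upper bound on the squared distance from x to any point in the box."""
    return float(np.sum(np.maximum((hi - x) ** 2, (x - lo) ** 2)))
```

Exclusion uses the lower bound (the whole node is too far away) and inclusion uses the upper bound (the whole node is close enough).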
![Page 29: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/29.jpg)
![Page 30: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/30.jpg)
Nearest Neighbor with KD Trees
We traverse the tree looking for the nearest neighbor of the query point.
![Page 31: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/31.jpg)
Nearest Neighbor with KD Trees
Examine nearby points first: Explore the branch of the tree that is closest to the query point first.
![Page 32: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/32.jpg)
![Page 33: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/33.jpg)
Nearest Neighbor with KD Trees
When we reach a leaf node: compute the distance to each point in the node.
![Page 34: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/34.jpg)
![Page 35: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/35.jpg)
Nearest Neighbor with KD Trees
Then we can backtrack and try the other branch at each node visited.
![Page 36: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/36.jpg)
Nearest Neighbor with KD Trees
Each time a new closest node is found, we can update the distance bounds.
![Page 37: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/37.jpg)
Nearest Neighbor with KD Trees
Using the distance bounds and the bounds of the data below each node, we can prune parts of the tree that could NOT include the nearest neighbor.
![Page 38: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/38.jpg)
![Page 39: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/39.jpg)
![Page 40: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/40.jpg)
Simple recursive algorithm (k=1 case)

NN(xq,R,dlo,xsofar,dsofar)
{
  if dlo > dsofar, return.
  if leaf(R),
    [xsofar,dsofar]=NNBase(xq,R,dsofar).
  else,
    [R1,d1,R2,d2]=orderByDist(xq,R.l,R.r).
    NN(xq,R1,d1,xsofar,dsofar).
    NN(xq,R2,d2,xsofar,dsofar).
}
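One way the recursive NN pseudocode might look as runnable code is sketched below. The dict-based tree, the `leaf_size` parameter, and the helper names `build`, `dlo`, and `nearest` are assumptions made to keep the example self-contained; `best` plays the role of the pair (xsofar, dsofar).

```python
import numpy as np

def build(points, leaf_size=2):
    """Tiny kd-tree as nested dicts: widest-dimension median split, tight bounds."""
    node = {"lo": points.min(axis=0), "hi": points.max(axis=0), "pts": points}
    dim = int(np.argmax(node["hi"] - node["lo"]))
    above = points[:, dim] > np.median(points[:, dim])
    if len(points) > leaf_size and above.any():
        node["l"] = build(points[~above], leaf_size)
        node["r"] = build(points[above], leaf_size)
    return node

def dlo(xq, node):
    """Lower bound on the distance from xq to any point inside the node's box."""
    gap = np.maximum(node["lo"] - xq, 0.0) + np.maximum(xq - node["hi"], 0.0)
    return float(np.linalg.norm(gap))

def nn(xq, node, d_lo, best):
    """Recursive search; best = [xsofar, dsofar] is updated in place."""
    if d_lo > best[1]:
        return                                   # prune: node cannot hold a closer point
    if "l" not in node:                          # leaf: exhaustive base case (NNBase)
        d = np.linalg.norm(node["pts"] - xq, axis=1)
        i = int(np.argmin(d))
        if d[i] < best[1]:
            best[0], best[1] = node["pts"][i], float(d[i])
        return
    # orderByDist: visit the closer child first so the bound tightens early.
    for child in sorted((node["l"], node["r"]), key=lambda c: dlo(xq, c)):
        nn(xq, child, dlo(xq, child), best)

def nearest(xq, root):
    best = [None, np.inf]
    nn(xq, root, dlo(xq, root), best)
    return best
```

Visiting the closer child first is what makes the `d_lo > best[1]` test effective: a good candidate is usually found before the farther branch is considered.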
![Page 41: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/41.jpg)
Nearest Neighbor with KD Trees
Instead, some animations showing real data…
1. kd-tree with cached sufficient statistics
2. nearest-neighbor with kd-trees
3. range-count with kd-trees

For animations, see: http://www.cs.cmu.edu/~awm/animations/kdtree
![Page 42: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/42.jpg)
Range-count example
![Page 43: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/43.jpg)
Range-count example
![Page 44: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/44.jpg)
Range-count example
![Page 45: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/45.jpg)
Range-count example
![Page 46: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/46.jpg)
Range-count example
Pruned! (inclusion)
![Page 47: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/47.jpg)
Range-count example
![Page 48: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/48.jpg)
Range-count example
![Page 49: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/49.jpg)
Range-count example
![Page 50: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/50.jpg)
Range-count example
![Page 51: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/51.jpg)
Range-count example
![Page 52: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/52.jpg)
Range-count example
![Page 53: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/53.jpg)
Range-count example
![Page 54: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/54.jpg)
Range-count example
Pruned! (exclusion)
![Page 55: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/55.jpg)
Range-count example
![Page 56: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/56.jpg)
Range-count example
![Page 57: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/57.jpg)
Range-count example
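The range-count example, with its inclusion and exclusion prunes, can be sketched in code. This is a hedged illustration: the dict-based tree, the cached `count` field (standing in for the cached sufficient statistics), and the names `build` and `range_count` are assumptions, not the code behind the animations.

```python
import numpy as np

def build(points, leaf_size=4):
    """kd-tree as nested dicts, caching the number of points under each node."""
    node = {"lo": points.min(axis=0), "hi": points.max(axis=0),
            "pts": points, "count": len(points)}
    dim = int(np.argmax(node["hi"] - node["lo"]))
    above = points[:, dim] > np.median(points[:, dim])
    if len(points) > leaf_size and above.any():
        node["l"] = build(points[~above], leaf_size)
        node["r"] = build(points[above], leaf_size)
    return node

def range_count(xq, radius, node):
    """Number of points under `node` within `radius` of xq."""
    r2 = radius ** 2
    gap = np.maximum(node["lo"] - xq, 0.0) + np.maximum(xq - node["hi"], 0.0)
    dmin2 = float(np.sum(gap ** 2))
    dmax2 = float(np.sum(np.maximum((node["hi"] - xq) ** 2, (xq - node["lo"]) ** 2)))
    if dmin2 > r2:
        return 0                      # exclusion: the whole box is out of range
    if dmax2 <= r2:
        return node["count"]          # inclusion: the whole box is in range
    if "l" not in node:               # leaf: test each point directly
        return int(np.sum(np.sum((node["pts"] - xq) ** 2, axis=1) <= r2))
    return range_count(xq, radius, node["l"]) + range_count(xq, radius, node["r"])
```

The inclusion prune is the reason for caching counts: a node that lies entirely inside the query ball contributes its count without visiting any of its points.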
![Page 58: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/58.jpg)
Some questions
• Asymptotic runtime analysis?
– In a rough sense, O(logN)
– But only under some regularity conditions
• How high in dimension can we go?
– Roughly exponential in intrinsic dimension
– In practice, in less than 100 dimensions, still big speedups
![Page 59: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/59.jpg)
Another kind of tree
• Ball-trees, metric trees
– Use balls instead of hyperrectangles
– Can often be more efficient in high dimension (though not always)
– Can work with non-Euclidean metric (you only need to respect the triangle inequality)
– Many non-metric similarity measures can be bounded by metric quantities.
![Page 60: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/60.jpg)
A Set of Points in a metric space
![Page 61: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/61.jpg)
Ball Tree root node
![Page 62: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/62.jpg)
A Ball Tree
![Page 63: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/63.jpg)
A Ball Tree
![Page 64: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/64.jpg)
A Ball Tree
![Page 65: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/65.jpg)
A Ball Tree
![Page 66: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/66.jpg)
A Ball Tree
• J. Uhlmann, 1991
• S. Omohundro, NIPS 1991
![Page 67: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/67.jpg)
Ball-trees: properties
Let Q be any query point and let x be a point inside ball B
|x-Q| ≥ |Q - B.center| - B.radius
|x-Q| ≤ |Q - B.center| + B.radius

[Figure: query point Q, ball center B.center, and a point x inside the ball]
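The two ball-tree bounds follow from the triangle inequality and can be sketched as one small function. The name `ball_bounds` is illustrative, and clamping the lower bound at zero (for a query that falls inside the ball) is an added assumption that the slide does not state.

```python
import numpy as np

def ball_bounds(q, center, radius):
    """(lower, upper) bounds on |x - q| for any point x inside the ball."""
    d = float(np.linalg.norm(q - center))
    lower = max(d - radius, 0.0)   # clamp: a distance cannot be negative
    upper = d + radius
    return lower, upper
```

Only the distance to the center is needed, so the same bounds hold for any metric that respects the triangle inequality once `np.linalg.norm` is swapped for that metric.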
![Page 68: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/68.jpg)
How to build a metric tree, exactly?
• Must balance quality vs. build-time
• ‘Anchors hierarchy’ (farthest-points heuristic, 2-approx used in OR)
• Omohundro: ‘Five ways to build a ball-tree’
• Which is the best? A research topic…
![Page 69: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/69.jpg)
Some other trees
• Cover-tree
– Provable worst-case O(logN) under an assumption (bounded expansion constant)
– Like a non-binary ball-tree
• Learning trees
– In this conference
![Page 70: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/70.jpg)
‘All’-type problems
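• Nearest-neighbor search

$$NN(x_q) = \arg\min_{r} \|x_q - x_r\|$$

All-nearest neighbor (bichromatic):

$$\forall x_q:\; NN(x_q) = \arg\min_{r} \|x_q - x_r\|$$

• Kernel density estimation

$$\hat{f}(x_q) = \frac{1}{N} \sum_{r \neq q}^{N} K_h(\|x_q - x_r\|)$$

‘All’ version (bichromatic):

$$\forall x_q:\; \hat{f}(x_q) = \frac{1}{N} \sum_{r \neq q}^{N} K_h(\|x_q - x_r\|)$$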
![Page 71: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/71.jpg)
Almost always ‘all’-type problems
• Kernel density estimation
• Nadaraya-Watson & locally-wgtd regression
• Gaussian process prediction
• Radial basis function networks

Always ‘all’-type problems
• Monochromatic all-nearest neighbor (e.g. LLE)
• n-point correlation (n-tuples)
![Page 72: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/72.jpg)
Dual-tree idea
If all the queries are available simultaneously, then it is faster to:

1. Build a tree on the queries as well
2. Effectively process the queries in chunks rather than individually → work is shared between similar query points
![Page 73: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/73.jpg)
Single-tree:
![Page 74: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/74.jpg)
Single-tree:
Dual-tree (symmetric):
![Page 75: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/75.jpg)
Exclusion and inclusion, using point-node kd-tree bounds.
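O(D) bounds on distance minima/maxima, for a point x and a node with lower and upper bounds l_d and u_d in each dimension d:

$$\min_i \|x - x_i\|^2 \;\ge\; \sum_{d}^{D} \left[ \max\{l_d - x_d,\, 0\}^2 + \max\{x_d - u_d,\, 0\}^2 \right]$$

$$\max_i \|x - x_i\|^2 \;\le\; \sum_{d}^{D} \max\{(u_d - x_d)^2,\, (x_d - l_d)^2\}$$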
![Page 76: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/76.jpg)
![Page 77: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/77.jpg)
Exclusion and inclusion, using kd-tree node-node bounds.
O(D) bounds on distance minima/maxima (analogous to point-node bounds).

Also needed: nodewise bounds.
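The node-node formulas are not reproduced on the slide, so the following is a sketch of one standard way to write them for two axis-aligned boxes, by analogy with the point-node bounds. The function names and the (`lo`, `hi`) array representation are assumptions.

```python
import numpy as np

def min_dist_sq_boxes(lo1, hi1, lo2, hi2):
    """Lower bound on the squared distance between any point of box 1 and any of box 2."""
    # Per dimension, at most one of the two gaps is positive; overlap gives zero.
    gap = np.maximum(lo1 - hi2, 0.0) + np.maximum(lo2 - hi1, 0.0)
    return float(np.sum(gap ** 2))

def max_dist_sq_boxes(lo1, hi1, lo2, hi2):
    """Upper bound on the squared distance between any point of box 1 and any of box 2."""
    # Per dimension, the farthest pair of coordinates sits at opposite box faces.
    span = np.maximum(hi1 - lo2, hi2 - lo1)
    return float(np.sum(span ** 2))
```

Like the point-node case, both bounds cost O(D), which keeps a pruning test between a query node and a reference node cheap.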
![Page 78: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/78.jpg)
![Page 79: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/79.jpg)
Single-tree: simple recursive algorithm (k=1 case)

NN(xq,R,dlo,xsofar,dsofar)
{
  if dlo > dsofar, return.
  if leaf(R),
    [xsofar,dsofar]=NNBase(xq,R,dsofar).
  else,
    [R1,d1,R2,d2]=orderByDist(xq,R.l,R.r).
    NN(xq,R1,d1,xsofar,dsofar).
    NN(xq,R2,d2,xsofar,dsofar).
}
![Page 80: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/80.jpg)
Single-tree → Dual-tree
• xq → Q
• dlo(xq,R) → dlo(Q,R)
• xsofar → xsofar, dsofar → dsofar (now one entry per query point)
• store Q.dsofar = max_Q dsofar
• 2-way recursion → 4-way recursion
• N x O(logN) → O(N)
![Page 81: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/81.jpg)
Dual-tree: simple recursive algorithm (k=1)

AllNN(Q,R,dlo,xsofar,dsofar)
{
  if dlo > Q.dsofar, return.
  if leaf(Q) & leaf(R),
    [xsofar,dsofar]=AllNNBase(Q,R,dsofar).
    Q.dsofar = max_Q dsofar.
  else if !leaf(Q) & leaf(R), …
  else if leaf(Q) & !leaf(R), …
  else if !leaf(Q) & !leaf(R),
    [R1,d1,R2,d2]=orderByDist(Q.l,R.l,R.r).
    AllNN(Q.l,R1,d1,xsofar,dsofar).
    AllNN(Q.l,R2,d2,xsofar,dsofar).
    [R1,d1,R2,d2]=orderByDist(Q.r,R.l,R.r).
    AllNN(Q.r,R1,d1,xsofar,dsofar).
    AllNN(Q.r,R2,d2,xsofar,dsofar).
    Q.dsofar = max(Q.l.dsofar,Q.r.dsofar).
}
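A runnable sketch of the dual-tree recursion for the bichromatic all-nearest-neighbor problem is given below. It is an illustration under stated assumptions, not the author's implementation: the dict-based trees, `leaf_size`, and the helper names are hypothetical, and the two mixed leaf/non-leaf cases that the pseudocode elides are handled here by simply recursing on whichever side still has children.

```python
import numpy as np

def build(points, idx=None, leaf_size=8):
    """kd-tree over point indices, with tight bounds and a per-node dsofar."""
    idx = np.arange(len(points)) if idx is None else idx
    p = points[idx]
    node = {"idx": idx, "lo": p.min(axis=0), "hi": p.max(axis=0), "dsofar": np.inf}
    dim = int(np.argmax(node["hi"] - node["lo"]))
    above = p[:, dim] > np.median(p[:, dim])
    if len(idx) > leaf_size and above.any():
        node["l"] = build(points, idx[~above], leaf_size)
        node["r"] = build(points, idx[above], leaf_size)
    return node

def dlo(a, b):
    """Lower bound on the distance between any point of node a and any of node b."""
    gap = np.maximum(a["lo"] - b["hi"], 0.0) + np.maximum(b["lo"] - a["hi"], 0.0)
    return float(np.linalg.norm(gap))

def all_nn(Q, R, qpts, rpts, nn_idx, nn_dist):
    if dlo(Q, R) > Q["dsofar"]:
        return                                    # R cannot improve any query in Q
    q_leaf, r_leaf = "l" not in Q, "l" not in R
    if q_leaf and r_leaf:                         # base case (AllNNBase)
        diff = qpts[Q["idx"]][:, None, :] - rpts[R["idx"]][None, :, :]
        d = np.linalg.norm(diff, axis=2)
        j = d.argmin(axis=1)
        dbest = d[np.arange(len(j)), j]
        better = dbest < nn_dist[Q["idx"]]
        qi = Q["idx"][better]
        nn_dist[qi], nn_idx[qi] = dbest[better], R["idx"][j[better]]
        Q["dsofar"] = nn_dist[Q["idx"]].max()     # Q.dsofar = max over Q of dsofar
        return
    q_kids = [Q] if q_leaf else [Q["l"], Q["r"]]
    r_kids = [R] if r_leaf else [R["l"], R["r"]]
    for qc in q_kids:                             # up to 4-way recursion
        for rc in sorted(r_kids, key=lambda r: dlo(qc, r)):
            all_nn(qc, rc, qpts, rpts, nn_idx, nn_dist)
    if not q_leaf:
        Q["dsofar"] = max(Q["l"]["dsofar"], Q["r"]["dsofar"])

def all_nearest_neighbors(qpts, rpts):
    nn_idx = np.full(len(qpts), -1)
    nn_dist = np.full(len(qpts), np.inf)
    all_nn(build(qpts), build(rpts), qpts, rpts, nn_idx, nn_dist)
    return nn_idx, nn_dist
```

The stored `dsofar` on each query node is only ever an upper bound on the worst current nearest-neighbor distance among its queries, so pruning a reference node against it never discards a true nearest neighbor.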
![Page 82: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/82.jpg)
Query points Reference points
Dual-tree traversal(depth-first)
![Page 83: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/83.jpg)
Query points Reference points
Dual-tree traversal
![Page 84: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/84.jpg)
Query points Reference points
Dual-tree traversal
![Page 85: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/85.jpg)
Query points Reference points
Dual-tree traversal
![Page 86: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/86.jpg)
Query points Reference points
Dual-tree traversal
![Page 87: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/87.jpg)
Query points Reference points
Dual-tree traversal
![Page 88: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/88.jpg)
Query points Reference points
Dual-tree traversal
![Page 89: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/89.jpg)
Query points Reference points
Dual-tree traversal
![Page 90: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/90.jpg)
Query points Reference points
Dual-tree traversal
![Page 91: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/91.jpg)
Query points Reference points
Dual-tree traversal
![Page 92: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/92.jpg)
Query points Reference points
Dual-tree traversal
![Page 93: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/93.jpg)
![Page 94: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/94.jpg)
![Page 95: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/95.jpg)
![Page 96: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/96.jpg)
![Page 97: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/97.jpg)
![Page 98: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/98.jpg)
![Page 99: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/99.jpg)
![Page 100: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/100.jpg)
![Page 101: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/101.jpg)
![Page 102: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/102.jpg)
![Page 103: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/103.jpg)
Meta-idea: Higher-order divide-and-conquer
Break each set into pieces.
Solving the sub-parts of the problem and combining these sub-solutions appropriately might be easier than doing this over only one set.
Generalizes divide-and-conquer of a single set to divide-and-conquer of multiple sets.
![Page 104: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/104.jpg)
Ideas
1. Data structures and how to use them
2. Monte Carlo
3. Series expansions
4. Problem/solution abstractions
![Page 105: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/105.jpg)
2-point correlation
“How many pairs have distance < r ?”
$\sum_{i}^{N} \sum_{j \neq i}^{N} I(\|x_i - x_j\| < r)$
Characterization of an entire distribution?
2-point correlation function
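As a baseline for the pair count on this slide, the double sum can be evaluated directly in O(N²) time. The sketch below is a minimal NumPy version, not the talk's implementation; the function name `two_point_count_naive` is illustrative, and it counts ordered pairs (i, j) with j ≠ i, which is an assumption about how the sum is meant to be read.

```python
import numpy as np

def two_point_count_naive(points: np.ndarray, r: float) -> int:
    """Count ordered pairs (i, j), i != j, with ||x_i - x_j|| < r."""
    # All pairwise differences: shape (N, N, D)
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    within = dists < r
    # Exclude the i == j terms, which the double sum skips
    np.fill_diagonal(within, False)
    return int(within.sum())

pts = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
print(two_point_count_naive(pts, 1.5))
```

This version needs O(N²) memory as well as time, which is exactly what the tree-based methods in the following slides are designed to avoid.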
![Page 106: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/106.jpg)
The n-point correlation functions
• Spatial inferences: filaments, clusters, voids, homogeneity, isotropy, 2-sample testing, …
• Foundation for theory of point processes [Daley, Vere-Jones 1972], unifies spatial statistics [Ripley 1976]
• Used heavily in biostatistics, cosmology, particle physics, statistical physics
2pcf definition: $dP = \lambda^2 \, dV_1 \, dV_2 \, [1 + \xi(r)]$
3pcf definition: $dP = \lambda^3 \, dV_1 \, dV_2 \, dV_3 \, [1 + \xi(r_{12}) + \xi(r_{23}) + \xi(r_{13}) + \zeta(r_{12}, r_{23}, r_{13})]$
![Page 107: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/107.jpg)
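3-point correlation
“How many triples have pairwise distances < r ?”
$\sum_{i}^{N} \sum_{j \neq i}^{N} \sum_{k \neq i,\, k \neq j}^{N} I(\delta_{ij} < r_1)\, I(\delta_{jk} < r_2)\, I(\delta_{ki} < r_3)$, where $\delta_{ij} = \|x_i - x_j\|$
[Figure: triangle with side thresholds r1, r2, r3]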
Standard model: n>0 terms should be zero!
![Page 108: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/108.jpg)
How can we count n-tuples efficiently?
“How many triples have pairwise distances < r ?”
![Page 109: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/109.jpg)
Use n trees! [Gray & Moore, NIPS 2000]
![Page 110: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/110.jpg)
“How many valid triangles a-b-c (where $a \in A$, $b \in B$, $c \in C$) could there be?”
count{A,B,C} = ?
[Figure: nodes A, B, C; radius r]
![Page 111: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/111.jpg)
“How many valid triangles a-b-c (where $a \in A$, $b \in B$, $c \in C$) could there be?”
count{A,B,C} = count{A,B,C.left} + count{A,B,C.right}
[Figure: nodes A, B, C; radius r]
![Page 112: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/112.jpg)
“How many valid triangles a-b-c (where $a \in A$, $b \in B$, $c \in C$) could there be?”
count{A,B,C} = count{A,B,C.left} + count{A,B,C.right}
[Figure: nodes A, B, C; radius r]
![Page 113: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/113.jpg)
“How many valid triangles a-b-c (where $a \in A$, $b \in B$, $c \in C$) could there be?”
count{A,B,C} = ?
[Figure: nodes A, B, C; radius r]
![Page 114: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/114.jpg)
“How many valid triangles a-b-c (where $a \in A$, $b \in B$, $c \in C$) could there be?”
Exclusion: count{A,B,C} = 0!
[Figure: nodes A, B, C; radius r]
![Page 115: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/115.jpg)
“How many valid triangles a-b-c (where $a \in A$, $b \in B$, $c \in C$) could there be?”
count{A,B,C} = ?
[Figure: nodes A, B, C; radius r]
![Page 116: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/116.jpg)
“How many valid triangles a-b-c (where $a \in A$, $b \in B$, $c \in C$) could there be?”
Inclusion: count{A,B,C} = |A| × |B| × |C|
[Figure: nodes A, B, C; radius r; each of the three node pairs labeled "Inclusion"]
![Page 117: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/117.jpg)
Key idea (combinatorial proximity problems):
for n-tuples: n-tree recursion
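The recursion in the preceding slides (split a node, prune by exclusion, prune by inclusion) can be sketched for n = 3 as follows. This is a minimal illustration, not the implementation from [Gray & Moore, NIPS 2000]: the `Node` class, the helper names, the kd-tree with bounding boxes, and the "split the largest node" heuristic are all assumptions made for the example, and it counts triples across three separate point sets A, B, C so that no i ≠ j ≠ k bookkeeping is needed.

```python
import numpy as np

class Node:
    """kd-tree node holding its points and bounding box (hypothetical helper)."""
    def __init__(self, pts, leaf_size=8):
        self.pts, self.n = pts, len(pts)
        self.lo, self.hi = pts.min(axis=0), pts.max(axis=0)
        self.left = self.right = None
        if self.n > leaf_size:
            dim = int(np.argmax(self.hi - self.lo))  # split the widest dimension
            order = np.argsort(pts[:, dim])
            mid = self.n // 2
            self.left = Node(pts[order[:mid]], leaf_size)
            self.right = Node(pts[order[mid:]], leaf_size)

def min_dist(X, Y):
    """Smallest possible distance between points in the two bounding boxes."""
    gap = np.maximum(0.0, np.maximum(X.lo - Y.hi, Y.lo - X.hi))
    return float(np.linalg.norm(gap))

def max_dist(X, Y):
    """Largest possible distance between points in the two bounding boxes."""
    span = np.maximum(X.hi - Y.lo, Y.hi - X.lo)
    return float(np.linalg.norm(span))

def within(X, Y, r):
    """0/1 matrix: entry (i, j) is 1 if point i of X is within r of point j of Y."""
    d = np.linalg.norm(X.pts[:, None, :] - Y.pts[None, :, :], axis=-1)
    return (d < r).astype(np.int64)

def brute(A, B, C, r):
    """Exhaustive count of triples (a, b, c) with all three distances < r."""
    return int(np.einsum("ab,bc,ac->", within(A, B, r), within(B, C, r), within(A, C, r)))

def count(A, B, C, r):
    pairs = ((A, B), (B, C), (A, C))
    if any(min_dist(X, Y) >= r for X, Y in pairs):
        return 0                      # exclusion: some pair of nodes is too far apart
    if all(max_dist(X, Y) < r for X, Y in pairs):
        return A.n * B.n * C.n        # inclusion: every triple is valid
    nodes = [A, B, C]
    splittable = [i for i in range(3) if nodes[i].left is not None]
    if not splittable:
        return brute(A, B, C, r)      # all leaves: count directly
    i = max(splittable, key=lambda j: nodes[j].n)
    total = 0
    for child in (nodes[i].left, nodes[i].right):
        trial = list(nodes)
        trial[i] = child
        total += count(trial[0], trial[1], trial[2], r)
    return total

rng = np.random.default_rng(0)
A, B, C = (rng.random((60, 2)) for _ in range(3))
print(count(Node(A), Node(B), Node(C), 0.3))
```

The result is exact: pruning only ever skips work whose answer is already determined by the bounding boxes.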
![Page 118: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/118.jpg)
3-point runtime (biggest previous: 20K)
VIRGO simulation data, N = 75,000,000
naïve: $5 \times 10^9$ sec. (~150 years)
multi-tree: 55 sec. (exact)
n=2: $O(N)$
n=3: $O(N^{\log 3})$
n=4: $O(N^2)$
![Page 119: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/119.jpg)
But…
Depends on $r^{D-1}$. Slow for large radii.
VIRGO simulation data, N = 75,000,000
naïve: ~150 years
multi-tree: large h: 24 hrs
Let’s develop a method for large radii.
![Page 120: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/120.jpg)
c = p T
c: hard. T: known. p: EASIER?
$\hat{p} \pm z_{\alpha/2} \sqrt{\hat{p}(1 - \hat{p})/S}$
no dependence on N! but it does depend on p
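The idea on this slide is to estimate the fraction p of matching tuples from S random samples and scale by the known total T. The sketch below assumes the interval is the usual normal-approximation (Wald) interval for a proportion; `estimate_fraction`, `is_hit`, and the 1-D toy data are illustrative names and choices, not taken from the talk.

```python
import math
import random
from statistics import NormalDist

def estimate_fraction(is_hit, num_samples, alpha=0.01, seed=0):
    """Estimate p = c / T from num_samples random tuples.

    is_hit(rng) draws one random tuple using rng and returns True if the
    tuple satisfies the matching criterion. Returns (p_hat, half_width),
    where half_width is the normal-approximation interval half-width.
    """
    rng = random.Random(seed)
    hits = sum(1 for _ in range(num_samples) if is_hit(rng))
    p_hat = hits / num_samples
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / num_samples)
    return p_hat, half_width

# Toy usage: ordered pairs of 1-D points closer than r.
data_rng = random.Random(1)
points = [data_rng.random() for _ in range(1000)]
r = 0.1

def random_pair_is_close(rng):
    i, j = rng.sample(range(len(points)), 2)  # two distinct indices
    return abs(points[i] - points[j]) < r

p_hat, half = estimate_fraction(random_pair_is_close, num_samples=5000)
T = len(points) * (len(points) - 1)           # total ordered pairs, known exactly
print(f"c is roughly {p_hat * T:.0f} +/- {half * T:.0f}")
```

The half-width depends only on the sample count and on p, not on N. The relative error, half-width divided by p, grows as p shrinks, which is why small p is the hard case.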
![Page 121: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/121.jpg)
![Page 122: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/122.jpg)
![Page 123: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/123.jpg)
![Page 124: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/124.jpg)
![Page 125: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/125.jpg)
This is junk: don’t bother
c = p T
![Page 126: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/126.jpg)
This is promising
c = p T
![Page 127: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/127.jpg)
Basic idea:
1. Remove some junk (run exact algorithm for a while): make p larger
2. Sample from the rest
![Page 128: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/128.jpg)
Get disjoint sets from the recursion tree
… … … [prune]
all possible n-tuples
nN
![Page 129: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/129.jpg)
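$T_1 + T_2 + T_3 = T$
$\frac{T_1}{T}\hat{p}_1 + \frac{T_2}{T}\hat{p}_2 + \frac{T_3}{T}\hat{p}_3 = \hat{p}$
$\frac{T_1^2}{T^2}\hat{\sigma}_1^2 + \frac{T_2^2}{T^2}\hat{\sigma}_2^2 + \frac{T_3^2}{T^2}\hat{\sigma}_3^2 = \hat{\sigma}^2$
Now do stratified sampling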
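Combining per-stratum estimates follows the standard stratified-sampling rule: each stratum's estimate is weighted by its share T_k / T of the tuples, and each stratum's variance by the square of that share. The sketch below assumes that reading of the slide; `combine_strata` and its argument names are illustrative.

```python
def combine_strata(sizes, p_hats, variances):
    """Combine per-stratum estimates into an overall estimate.

    sizes[k]     : number of tuples T_k in stratum k (disjoint strata)
    p_hats[k]    : estimated matching fraction in stratum k
    variances[k] : estimated variance of p_hats[k]
    Returns (p_hat, variance) for the union of all strata.
    """
    total = sum(sizes)
    weights = [t / total for t in sizes]
    p_hat = sum(w * p for w, p in zip(weights, p_hats))
    variance = sum(w * w * v for w, v in zip(weights, variances))
    return p_hat, variance

p_hat, var = combine_strata([10, 30, 60], [0.5, 0.1, 0.0], [0.01, 0.04, 0.0])
print(p_hat, var)
```

A stratum that the exact algorithm has already resolved contributes zero variance, so only the unresolved strata need samples.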
![Page 130: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/130.jpg)
Speedup Results
VIRGO simulation data, N = 75,000,000
naïve: ~150 years
multi-tree: large h: 24 hrs
multi-tree Monte Carlo: 99% confidence: 96 sec
![Page 131: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/131.jpg)
Ideas
1. Data structures and how to use them
2. Monte Carlo
3. Multipole methods
4. Problem/solution abstractions
![Page 132: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/132.jpg)
Kernel density estimation
$\forall x_q: \hat{f}(x_q) = \frac{1}{N} \sum_{r \neq q}^{N} K_h(\|x_q - x_r\|)$
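For reference, the estimate on this slide can be computed naively at every point in O(N²) time. The sketch below assumes a Gaussian kernel with bandwidth h and a leave-one-out sum over r ≠ q; the kernel choice and the name `kde_naive` are assumptions for illustration.

```python
import numpy as np

def kde_naive(points: np.ndarray, h: float) -> np.ndarray:
    """Leave-one-out Gaussian KDE evaluated at every data point.

    Returns an array f with f[q] = (1/N) * sum over r != q of K_h(||x_q - x_r||).
    """
    n, d = points.shape
    sq_dists = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    norm_const = (2.0 * np.pi) ** (d / 2.0) * h ** d
    k = np.exp(-0.5 * sq_dists / h ** 2) / norm_const
    np.fill_diagonal(k, 0.0)          # drop the r == q terms
    return k.sum(axis=1) / n

rng = np.random.default_rng(0)
print(kde_naive(rng.normal(size=(5, 2)), h=0.5))
```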
![Page 133: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/133.jpg)
How to use a tree…
1. How to approximate?
2. When to approximate?
[Barnes and Hut, Nature, 1986]
$\sum_i K(q, x_i) \approx N_R K(q, \mu_R)$ if $s > r/\theta$
[Figure: query point q and reference node R, with labels s and r]
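A single-tree kernel sum in this style can be sketched as below. It assumes the criterion reads "approximate node R by N_R K(q, μ_R) when s > r/θ", with s the distance from q to the node centroid μ_R, r the node radius, and θ a user-chosen parameter. The `BallNode` class, the Gaussian kernel, and the median split are illustrative choices, not the original Barnes-Hut octree.

```python
import numpy as np

def gaussian(d, h):
    """Unnormalized Gaussian kernel of distance d with bandwidth h."""
    return np.exp(-0.5 * (d / h) ** 2)

class BallNode:
    """Tree node with centroid mu and radius r (hypothetical helper)."""
    def __init__(self, pts, leaf_size=16):
        self.pts, self.n = pts, len(pts)
        self.mu = pts.mean(axis=0)
        self.r = float(np.linalg.norm(pts - self.mu, axis=1).max())
        self.left = self.right = None
        if self.n > leaf_size:
            dim = int(np.argmax(pts.max(axis=0) - pts.min(axis=0)))
            order = np.argsort(pts[:, dim])
            mid = self.n // 2
            self.left = BallNode(pts[order[:mid]], leaf_size)
            self.right = BallNode(pts[order[mid:]], leaf_size)

def kernel_sum(q, node, h, theta):
    """Approximate sum_i K(q, x_i) over the points in node."""
    s = float(np.linalg.norm(q - node.mu))
    if s > node.r / theta:
        # Node is far away relative to its size: treat all points as if at mu
        return node.n * float(gaussian(s, h))
    if node.left is None:
        # Leaf that is too close to approximate: sum exactly
        return float(gaussian(np.linalg.norm(node.pts - q, axis=1), h).sum())
    return kernel_sum(q, node.left, h, theta) + kernel_sum(q, node.right, h, theta)

rng = np.random.default_rng(0)
pts = rng.random((500, 2))
tree = BallNode(pts)
q = np.array([0.5, 0.5])
print(kernel_sum(q, tree, h=0.2, theta=0.5))
```

Smaller θ means fewer nodes qualify for approximation, so the result approaches the exact sum at higher cost. Nothing in this rule reports how large the error actually is.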
![Page 134: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/134.jpg)
How to use a tree…
3. How to know potential error?
Let’s maintain bounds on the true kernel sum $\Phi(q) = \sum_i K(q, x_i)$
At the beginning: $\Phi^{lo}(q) = N K^{lo}$, $\Phi^{hi}(q) = N K^{hi}$
[Bound-update equations for $\Phi^{lo}(q)$ and $\Phi^{hi}(q)$ in terms of node R: garbled in extraction]
![Page 135: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/135.jpg)
How to use a tree…
4. How to do ‘all’ problem?
$\forall x_q: \hat{f}(x_q) = \frac{1}{N} \sum_{r \neq q}^{N} K_h(\|x_q - x_r\|)$
Single-tree:
Dual-tree (symmetric): [Gray & Moore 2000]
![Page 136: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/136.jpg)
How to use a tree…
4. How to do ‘all’ problem?
$\forall q \in Q: \sum_i K(q, x_i) \approx N_R K(q, \mu_R)$ if $s > \max(r_Q, r_R)/\theta$
Generalizes Barnes-Hut to dual-tree
[Figure: nodes Q and R, with labels $r_Q$, $r_R$, and s]
![Page 137: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/137.jpg)
Case 1 – alg. gives no error bounds
Case 2 – alg. gives error bounds, but must be rerun
Case 3 – alg. automatically achieves error tolerance
BUT:
We have a tweak parameter: $\theta$
So far we have case 2; let’s try for case 3
Let’s try to make an automatic stopping rule
![Page 138: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/138.jpg)
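Finite-difference function approximation.
Taylor expansion: $f(x) \approx f(a) + f'(a)(x - a)$
Gregory-Newton finite form: $f(x) \approx f(x_i) + \frac{1}{2} \frac{f(x_{i+1}) - f(x_i)}{x_{i+1} - x_i} (x - x_i)$
$K(\delta) \approx K(\delta^{lo}) + \frac{1}{2} \frac{K(\delta^{hi}) - K(\delta^{lo})}{\delta^{hi} - \delta^{lo}} (\delta - \delta^{lo})$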
![Page 139: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/139.jpg)
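Finite-difference function approximation.
$\bar{K} = \frac{1}{2}\left[K(\delta^{lo}_{QR}) + K(\delta^{hi}_{QR})\right]$
$err_{qR} = \left|\sum_{r}^{N_R} K(\delta_{qr}) - N_R \bar{K}\right| \le \frac{N_R}{2}\left[K(\delta^{lo}_{QR}) - K(\delta^{hi}_{QR})\right]$
assumes monotonic decreasing kernel
approximate {Q,R} if $K(\delta^{lo}) - K(\delta^{hi}) \le \frac{2\epsilon}{N} \Phi^{lo}(Q)$
$\forall q, R: \frac{err_{qR}}{\Phi(x_q)} \le \epsilon \frac{N_R}{N} \;\Rightarrow\; \forall q: \frac{err_q}{\Phi(x_q)} \le \epsilon$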
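The error bound on this slide can be checked numerically for one query-node/reference-node pair. The sketch assumes the approximation replaces every kernel value between nodes Q and R by the midpoint of K at the smallest and largest possible node-to-node distances, and that the pruning rule compares the kernel spread against (2ε/N) times a running lower bound on the kernel sum for Q. The Gaussian kernel, bounding-box distances, and all function names are illustrative; `phi_lo_Q` is taken as an input rather than maintained by a full traversal.

```python
import numpy as np

def gaussian(d, h):
    """Unnormalized Gaussian kernel: monotonically decreasing in distance d."""
    return np.exp(-0.5 * (d / h) ** 2)

def box_distance_bounds(Q, R):
    """(min, max) distance between the bounding boxes of point sets Q and R."""
    q_lo, q_hi = Q.min(axis=0), Q.max(axis=0)
    r_lo, r_hi = R.min(axis=0), R.max(axis=0)
    gap = np.maximum(0.0, np.maximum(q_lo - r_hi, r_lo - q_hi))
    span = np.maximum(q_hi - r_lo, r_hi - q_lo)
    return float(np.linalg.norm(gap)), float(np.linalg.norm(span))

def midpoint_approximation(Q, R, h):
    """Return (approx, bound) for the contribution of R to any q in Q.

    approx = N_R * K_bar, bound = (N_R / 2) * (K(d_lo) - K(d_hi)).
    """
    d_lo, d_hi = box_distance_bounds(Q, R)
    k_max, k_min = gaussian(d_lo, h), gaussian(d_hi, h)  # closest pair gives largest K
    approx = len(R) * 0.5 * (k_max + k_min)
    bound = len(R) * 0.5 * (k_max - k_min)
    return float(approx), float(bound)

def should_approximate(Q, R, h, eps, n_total, phi_lo_Q):
    """Pruning rule: kernel spread small relative to the lower bound for Q."""
    d_lo, d_hi = box_distance_bounds(Q, R)
    spread = gaussian(d_lo, h) - gaussian(d_hi, h)
    return bool(spread <= 2.0 * eps / n_total * phi_lo_Q)

rng = np.random.default_rng(0)
Q = rng.random((20, 2)) * 0.2
R = rng.random((30, 2)) * 0.2 + 1.0
approx, bound = midpoint_approximation(Q, R, h=0.5)
worst = max(abs(gaussian(np.linalg.norm(R - q, axis=1), 0.5).sum() - approx) for q in Q)
print(worst <= bound)
```

Because every true kernel value lies between K at the largest distance and K at the smallest, no single term can be further than half the spread from the midpoint, which gives the bound.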
![Page 140: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/140.jpg)
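Speedup Results (KDE)
One order-of-magnitude speedup over single-tree at ~2M points

| N     | naïve  | dual-tree |
|-------|--------|-----------|
| 12.5K | 7      | .12       |
| 25K   | 31     | .31       |
| 50K   | 123    | .46       |
| 100K  | 494    | 1.0       |
| 200K  | 1976*  | 2         |
| 400K  | 7904*  | 5         |
| 800K  | 31616* | 10        |
| 1.6M  | 35 hrs | 23        |

5500x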
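The headline speedup can be checked with a little arithmetic. The sketch assumes the unlabeled timings are in seconds, which the slide does not state. Under that assumption the starred naïve entries are consistent with quadratic scaling (each doubling of N multiplies the time by 4), and the last row gives a ratio close to the quoted 5500x.

```python
# Naive timings from the table, assumed to be in seconds.
naive_seconds = {100_000: 494, 200_000: 1976, 400_000: 7904, 800_000: 31616}

sizes = sorted(naive_seconds)
# Ratio of consecutive timings as N doubles.
ratios = [naive_seconds[b] / naive_seconds[a] for a, b in zip(sizes, sizes[1:])]

naive_1p6m = 35 * 3600      # 35 hrs converted to seconds
dual_tree_1p6m = 23         # assumed seconds
speedup = naive_1p6m / dual_tree_1p6m

print(ratios, round(speedup))
```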
![Page 141: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/141.jpg)
Ideas
1. Data structures and how to use them
2. Monte Carlo
3. Multipole methods
4. Problem/solution abstractions
![Page 142: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/142.jpg)
These are all examples of…
Generalized N-body problems
General theory and toolkit for designing algorithms for such problems
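All-NN, 2-point, 3-point, KDE, SPH:
[Each problem is written as a tuple of operators, a kernel function (I or K, with a bandwidth subscript), and weights w; the tuple notation is garbled in extraction]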
![Page 143: Fast N-Body Algorithms for Massive Datasets](https://reader035.vdocuments.us/reader035/viewer/2022062315/56815dd2550346895dcbfd01/html5/thumbnails/143.jpg)
For more…
In this conference:
• Learning trees
• Monte Carlo for statistical summations
• Large-scale learning workshop
• EMST
• GNPs and MapReduce-like parallelization
• Monte Carlo SVD