I/O Streaming Evaluation of Batch Queries for Data-Intensive Computational Turbulence
Kalin Kanov, Eric Perlman, Randal Burns, Yanif Ahmad, and Alexander Szalay
Johns Hopkins University
I/O Streaming For Batch Queries
- Based on partial sums
- Allows access to the underlying data in any order and in parts
- Data streamed from disk in a single pass
- Eliminates redundant I/O
- Over an order of magnitude improvement in performance over direct evaluation of queries
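A minimal sketch of the partial-sum idea, assuming each query's result is a weighted sum over the data atoms it touches (as with Lagrange interpolation); the function names and data structures here are illustrative assumptions, not the paper's code:

```python
# Sketch of partial-sum evaluation: because each result is a sum of
# per-atom contributions, atoms may arrive in any order and in parts,
# and each atom needs to be read only once for the whole batch.
from collections import defaultdict

def evaluate_batch(atom_stream, query_weights):
    """atom_stream yields (atom_id, value) in any order; query_weights
    maps query_id -> {atom_id: weight}."""
    partial = defaultdict(float)            # running partial sums
    for atom_id, value in atom_stream:      # single pass over the data
        for qid, weights in query_weights.items():
            if atom_id in weights:
                # accumulate this atom's contribution to the query
                partial[qid] += weights[atom_id] * value
    return dict(partial)
```

Note that two queries sharing an atom both consume it from the same read, which is the data-sharing effect the slides describe.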
Introduction
- Data-intensive computing breakthroughs have allowed for new ways of interacting with scientific numerical simulations
- Formerly, analysis was performed during the computation
  - No data stored for subsequent examination
Turbulence Database Cluster
- Stores the entire space-time evolution of the simulation
- Two datasets totaling 70TB; part of the 1.1PB GrayWulf cluster
- Provides public access to world-class simulation
- Implements the “immersive turbulence”* approach

*E. Perlman, R. Burns, Y. Li, and C. Meneveau. Data exploration of turbulence simulations using a database cluster. In Supercomputing, 2007.
Turbulence Database Cluster
Motivation
- Without I/O streaming:
  - Heavy DB usage slows down the service by a factor of 10 to 20
  - Query evaluation techniques adapted from simulation code do not access data coherently
  - Substantial storage overhead (~42%) incurred to localize each computation
- Turbulence queries:
  - 95% of queries perform Lagrange Polynomial interpolation
  - Can be evaluated in parts
Processing a Batch Query
[Figure: a 4x4 grid of data atoms, numbered 0-15, stored on disk in Z-order.]
Processing a Batch Query
[Figure: three batch queries (query 1, query 2, query 3) overlap the grid of data atoms; evaluating them one at a time re-reads the atoms they share.]
- Redundant I/O
- Multiple disk seeks
Streaming Evaluation Method
- Linear data requirements of the computation allow for:
  - Incremental evaluation
  - Streaming over the data
  - Concurrent evaluation of batch queries
Processing a Batch Query
[Figure: the data atoms are read sequentially in Z-order; each atom is read once, and every query (q1, q2, q3) that needs it updates its partial sums as the atom streams past.]
- I/O Streaming:
  - Sequential I/O
  - Single pass
Lagrange Polynomial Interpolation

$$f(x',y') = \sum_{j=1}^{N} l^{y}_{p-\frac{N}{2}+j}(y') \sum_{i=1}^{N} l^{x}_{n-\frac{N}{2}+i}(x') \cdot f\!\left(x_{n-\frac{N}{2}+i},\; y_{p-\frac{N}{2}+j}\right)$$

The $l^{x}$ and $l^{y}$ terms are the Lagrange coefficients; the $f$ terms are the stored data values.
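As a concrete illustration, a pure-Python sketch of this N-point tensor-product interpolation; the function names and the node-selection logic are my assumptions (boundary handling is omitted), not the paper's code:

```python
def lagrange_coeffs(x, nodes):
    """Lagrange basis weights l_i(x) for the given 1-D nodes:
    l_i(x) = prod_{j != i} (x - x_j) / (x_i - x_j)."""
    out = []
    for i, xi in enumerate(nodes):
        c = 1.0
        for j, xj in enumerate(nodes):
            if i != j:
                c *= (x - xj) / (xi - xj)
        out.append(c)
    return out

def lagrange_interp2d(f, xs, ys, xp, yp, N=4):
    """Tensor-product Lagrange interpolation of grid values f[i][j]
    at the off-grid point (xp, yp), using the N grid nodes around the
    target along each axis (N/2 on either side; interior points only)."""
    n = next(k for k, v in enumerate(xs) if v >= xp)  # first node >= xp
    p = next(k for k, v in enumerate(ys) if v >= yp)
    ix = range(n - N // 2, n - N // 2 + N)
    iy = range(p - N // 2, p - N // 2 + N)
    lx = lagrange_coeffs(xp, [xs[i] for i in ix])
    ly = lagrange_coeffs(yp, [ys[j] for j in iy])
    # f(x', y') = sum_j ly_j(y') * sum_i lx_i(x') * f(x_i, y_j)
    return sum(ly[b] * sum(lx[a] * f[i][j] for a, i in enumerate(ix))
               for b, j in enumerate(iy))
```

Because the result is a double sum, the inner terms are exactly the per-atom partial sums that the streaming method accumulates.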
Processing a Batch Query
- Input queries pre-processed into a key-value dictionary
  - Keys are Z-index values of data atoms stored in the DB
  - Entries are lists of queries
- Temp table is created out of the dictionary keys
- Execute a join between the temp table and the data table
- When a data atom is read in, all queries that need data from it are processed and their partial sums updated
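The steps above can be sketched as follows. This is a minimal sketch: the in-database temp-table/data-table join is stood in for by an ordered read callback, and all names are illustrative, not the paper's code.

```python
# Pre-process the batch into a key-value dictionary keyed by atom
# Z-index, then make one ordered pass over the atoms, updating the
# partial sum of every query that needs the atom just read.
from collections import defaultdict

def build_dictionary(batch):
    """batch: iterable of (query_id, z_indices the query needs).
    Returns the dictionary: Z-index -> list of query ids."""
    atom_to_queries = defaultdict(list)
    for qid, z_indices in batch:
        for z in z_indices:
            atom_to_queries[z].append(qid)
    return atom_to_queries

def stream_evaluate(read_atoms, batch, contribution):
    """read_atoms(keys) yields (z, data) for the requested atoms in
    disk order (the role of the temp-table join); contribution()
    computes one query's partial-sum term from a single atom."""
    atom_to_queries = build_dictionary(batch)
    totals = defaultdict(float)
    for z, data in read_atoms(sorted(atom_to_queries)):   # single pass
        for qid in atom_to_queries[z]:    # all queries needing atom z
            totals[qid] += contribution(qid, z, data)
    return dict(totals)
```

Sorting the dictionary keys gives the sequential, single-pass access pattern; each atom is fetched once no matter how many queries reference it.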
Experimental Evaluation
- Random workloads:
  - across the entire cube space
  - a 128³ subset of the entire space
- Workload from the usage log of the Turbulence cluster
- Compare with direct methods of evaluation: Direct, Sorting, Join/Order By

3D Workload
- Used for generating global statistics

128³ Workload
- Used for: examining regions of interest (ROI), creating visualizations
Setup
- Experimental version of the MHD database
- ~300 timesteps of the velocity fields of the MHD simulation
- Two 2.33 GHz dual quad-core Windows 2003 servers with SQL Server 2008 and 8GB of memory
- Part of the 1.1PB GrayWulf cluster with aggregate low-level throughput of 70 GB/sec
- Data tables striped across 7 disks per node
3D Workload
- Over an order of magnitude improvement
- Sorting leads to more sequential access
- Join/Order By executes the entire batch as a join
- I/O Streaming:
  - Each atom is read only once
  - Effective cache usage
128³ Workload
- Less I/O
- More data sharing
- I/O Streaming alleviates the I/O bottleneck; computation emerges as the more costly operation
Future Work
- Extend the I/O streaming technique to other decomposable kernel computations:
  - Differentiation
  - Temporal interpolation
  - Filtering
- Multi-job batch scheduling:
  - Integrate into a batch scheduling framework such as JAWS*

*X. Wang, E. Perlman, R. Burns, T. Malik, T. Budavari, C. Meneveau, and A. Szalay. JAWS: Job-aware workload scheduling for the exploration of turbulence simulations. In Supercomputing, 2010.
Summary
- I/O Streaming method for data-intensive batch queries
- Single pass by means of partial sums
- Effective exploitation of data sharing
- Improved cache locality
- Over an order of magnitude improvement in performance
Questions
Images courtesy of Kai Buerger ([email protected])