I/O Streaming Evaluation of Batch Queries for Data-Intensive Computational Turbulence
Kalin Kanov, Eric Perlman, Randal Burns, Yanif Ahmad, and Alexander Szalay
Johns Hopkins University
I/O Streaming For Batch Queries
- Based on partial sums
- Allows access to the underlying data in any order and in parts
- Data streamed from disk in a single pass
- Eliminates redundant I/O
- Over an order of magnitude improvement in performance over direct evaluation of queries
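A minimal sketch of the partial-sum idea, assuming each query's result is a weighted sum over the data atoms it touches (as with Lagrange interpolation); the function names and data structures here are illustrative assumptions, not the paper's code:

```python
# Sketch of partial-sum evaluation: because each result is a sum of
# per-atom contributions, atoms may arrive in any order and in parts,
# and each atom needs to be read only once for the whole batch.
from collections import defaultdict

def evaluate_batch(atom_stream, query_weights):
    """atom_stream yields (atom_id, value) in any order; query_weights
    maps query_id -> {atom_id: weight}."""
    partial = defaultdict(float)            # running partial sums
    for atom_id, value in atom_stream:      # single pass over the data
        for qid, weights in query_weights.items():
            if atom_id in weights:
                # accumulate this atom's contribution to the query
                partial[qid] += weights[atom_id] * value
    return dict(partial)
```

Note that two queries sharing an atom both consume it from the same read, which is the data-sharing effect the slides describe.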
Introduction
- Data-intensive computing breakthroughs have allowed for new ways of interacting with scientific numerical simulations
- Formerly, analysis was performed during the computation
  - No data stored for subsequent examination
Turbulence Database Cluster
- Stores the entire space-time evolution of the simulation
- Two datasets totaling 70TB; part of the 1.1PB GrayWulf cluster
- Provides public access to world-class simulation
- Implements the “immersive turbulence”* approach

*E. Perlman, R. Burns, Y. Li, and C. Meneveau. Data exploration of turbulence simulations using a database cluster. In Supercomputing, 2007.
Turbulence Database Cluster
Motivation
- Without I/O streaming:
  - Heavy DB usage slows down the service by a factor of 10 to 20
  - Query evaluation techniques adapted from simulation code do not access data coherently
  - Substantial storage overhead (~42%) incurred to localize each computation
- Turbulence queries:
  - 95% of queries perform Lagrange Polynomial interpolation
  - Can be evaluated in parts
Processing a Batch Query
[Figure: a 4x4 grid of data atoms, numbered 0-15, stored on disk in Z-order.]
Processing a Batch Query
[Figure: three batch queries (query 1, query 2, query 3) overlap the grid of data atoms; evaluating them one at a time re-reads the atoms they share.]
- Redundant I/O
- Multiple disk seeks
Streaming Evaluation Method
- Linear data requirements of the computation allow for:
  - Incremental evaluation
  - Streaming over the data
  - Concurrent evaluation of batch queries
Processing a Batch Query
[Figure: the data atoms are read sequentially in Z-order; each atom is read once, and every query (q1, q2, q3) that needs it updates its partial sums as the atom streams past.]
- I/O Streaming:
  - Sequential I/O
  - Single pass
Lagrange Polynomial Interpolation

$$f(x',y') = \sum_{j=1}^{N} l^{y}_{p-\frac{N}{2}+j}(y') \sum_{i=1}^{N} l^{x}_{n-\frac{N}{2}+i}(x') \cdot f\!\left(x_{n-\frac{N}{2}+i},\; y_{p-\frac{N}{2}+j}\right)$$

The $l^{x}$ and $l^{y}$ terms are the Lagrange coefficients; the $f$ terms are the stored data values.
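As a concrete illustration, a pure-Python sketch of this N-point tensor-product interpolation; the function names and the node-selection logic are my assumptions (boundary handling is omitted), not the paper's code:

```python
def lagrange_coeffs(x, nodes):
    """Lagrange basis weights l_i(x) for the given 1-D nodes:
    l_i(x) = prod_{j != i} (x - x_j) / (x_i - x_j)."""
    out = []
    for i, xi in enumerate(nodes):
        c = 1.0
        for j, xj in enumerate(nodes):
            if i != j:
                c *= (x - xj) / (xi - xj)
        out.append(c)
    return out

def lagrange_interp2d(f, xs, ys, xp, yp, N=4):
    """Tensor-product Lagrange interpolation of grid values f[i][j]
    at the off-grid point (xp, yp), using the N grid nodes around the
    target along each axis (N/2 on either side; interior points only)."""
    n = next(k for k, v in enumerate(xs) if v >= xp)  # first node >= xp
    p = next(k for k, v in enumerate(ys) if v >= yp)
    ix = range(n - N // 2, n - N // 2 + N)
    iy = range(p - N // 2, p - N // 2 + N)
    lx = lagrange_coeffs(xp, [xs[i] for i in ix])
    ly = lagrange_coeffs(yp, [ys[j] for j in iy])
    # f(x', y') = sum_j ly_j(y') * sum_i lx_i(x') * f(x_i, y_j)
    return sum(ly[b] * sum(lx[a] * f[i][j] for a, i in enumerate(ix))
               for b, j in enumerate(iy))
```

Because the result is a double sum, the inner terms are exactly the per-atom partial sums that the streaming method accumulates.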
Processing a Batch Query
- Input queries pre-processed into a key-value dictionary
  - Keys are Z-index values of data atoms stored in the DB
  - Entries are lists of queries
- Temp table is created out of the dictionary keys
- Execute a join between the temp table and the data table
- When a data atom is read in, all queries that need data from it are processed and their partial sums updated
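The steps above can be sketched as follows. This is a minimal sketch: the in-database temp-table/data-table join is stood in for by an ordered read callback, and all names are illustrative, not the paper's code.

```python
# Pre-process the batch into a key-value dictionary keyed by atom
# Z-index, then make one ordered pass over the atoms, updating the
# partial sum of every query that needs the atom just read.
from collections import defaultdict

def build_dictionary(batch):
    """batch: iterable of (query_id, z_indices the query needs).
    Returns the dictionary: Z-index -> list of query ids."""
    atom_to_queries = defaultdict(list)
    for qid, z_indices in batch:
        for z in z_indices:
            atom_to_queries[z].append(qid)
    return atom_to_queries

def stream_evaluate(read_atoms, batch, contribution):
    """read_atoms(keys) yields (z, data) for the requested atoms in
    disk order (the role of the temp-table join); contribution()
    computes one query's partial-sum term from a single atom."""
    atom_to_queries = build_dictionary(batch)
    totals = defaultdict(float)
    for z, data in read_atoms(sorted(atom_to_queries)):   # single pass
        for qid in atom_to_queries[z]:    # all queries needing atom z
            totals[qid] += contribution(qid, z, data)
    return dict(totals)
```

Sorting the dictionary keys gives the sequential, single-pass access pattern; each atom is fetched once no matter how many queries reference it.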
Experimental Evaluation
- Random workloads:
  - across the entire cube space
  - a 128³ subset of the entire space
- Workload from the usage log of the Turbulence cluster
- Compare with direct methods of evaluation: Direct, Sorting, Join/Order By

3D Workload
- Used for generating global statistics

128³ Workload
- Used for: examining regions of interest (ROI), creating visualizations
Setup
- Experimental version of the MHD database
- ~300 timesteps of the velocity fields of the MHD simulation
- Two 2.33 GHz dual quad-core Windows 2003 servers with SQL Server 2008 and 8GB of memory
- Part of the 1.1PB GrayWulf cluster with aggregate low-level throughput of 70 GB/sec
- Data tables striped across 7 disks per node
3D Workload
- Over an order of magnitude improvement
- Sorting leads to more sequential access
- Join/Order By executes the entire batch as a join
- I/O Streaming:
  - Each atom is read only once
  - Effective cache usage
128³ Workload
- Less I/O
- More data sharing
- I/O Streaming alleviates the I/O bottleneck; computation emerges as the more costly operation
Future Work
- Extend the I/O streaming technique to other decomposable kernel computations:
  - Differentiation
  - Temporal interpolation
  - Filtering
- Multi-job batch scheduling:
  - Integrate into a batch scheduling framework such as JAWS*

*X. Wang, E. Perlman, R. Burns, T. Malik, T. Budavari, C. Meneveau, and A. Szalay. JAWS: Job-aware workload scheduling for the exploration of turbulence simulations. In Supercomputing, 2010.
Summary
- I/O Streaming method for data-intensive batch queries
- Single pass by means of partial sums
- Effective exploitation of data sharing
- Improved cache locality
- Over an order of magnitude improvement in performance
Questions
Images courtesy of Kai Buerger ([email protected])