Query Compilation of Dataflow Programs for Heterogeneous Platforms
Felix Beier & Kai-Uwe Sattler
DBIS @ TU Ilmenau
www.tu-ilmenau.de/dbis
Workshop on System Software Support for Big Data, 25.09.2014, Felix Beier
Outline
1. Requirements of Engineering Applications
2. Dataflow Programming in PipeFlow
3. Query Compilation
4. Outlook
Motivation
• Data-intensive engineering applications
  – Improved sensing technology
  – Complex models
  – "Smart" devices
• Relying on database technology?
  – Not really, but Matlab & friends
Nanopositioning Machine
• Find defects in electronic components
• 3D surface measurement (x, y position; z measure)
• Resolution: nm (100s of MB to 10s of TB per scan)
• Static data: CAD circuit models (if available)
[Figures: image sources (1), (2), (3)]
Nanopositioning Machine: Processing Pipeline
[Figure: processing pipeline. Stage labels: Camera Head, Stitching, Rasterization, Normalization, Separation, Error Detection, Postprocessing. Auxiliary inputs: Error Model, External Info, Element Model]
Online Source Localization
• Identification of neurophysiologically active areas
• N EEG/MEG sensors, data rate: 600-5000 Hz
• Static brain model with K to M (thousands to millions of) vertices, brain atlas
• Complex analysis pipeline: signal filtering, decomposition, matrix inversion, time-based folding
• Indexing for brain model & decomposition dictionary
[Figures: image sources (4), (5)]
Particle Simulation
• Blood simulation for medical filters
• N particles of different kinds (blood, water, …)
• Static scene with dynamic particles
• kNN queries with index rebuild in each step
[Figures: image source (6)]
DB Technology Inside?
• Database technology = providing abstractions
  – Data manipulation (query language etc.)
  – Hardware (storage, storage hierarchies, …)
  – Scalability (indexing, parallel processing, MapReduce)
• Simplifies development
  – Code reuse
  – Providing correctness and performance guarantees
But are these still the right abstractions?
Lessons Learned
• Very large, spatiotemporal data sets
• Large gaps between computer science and engineering contexts
  – Domain-specific languages
  – Tooling ecosystem
• Reinventing the wheel
  – Data management algorithms
  – Specializations for parallel hardware
  – Optimizations
Requirements
• Extensibility by user-defined operations
  – Optimization, parallelization, recovery, …
• Complex data types
  – Spatio-temporal data, time series, matrices, …
• Low latency, online processing
  – Data stream processing, CEP
• Dealing with uncertainty in measurements and models
Dataflow Programming in PipeFlow
Big Dynamic Engineering Data
Dataflow Programming
• Model flow of data between transformation steps
• Inject data management & processing primitives
  – Partitioning & merging steps
  – Parallelization
  – Mapping to (specialized) hardware
  – Distributed workload management at cluster scale
  – Flow optimizations
  – Fault tolerance
• Keeping framework usable for engineers
  – No new languages / paradigms
  – Reusable programs
  – Integration of domain-specific data types & libraries
Programming Model
[Figure: DBMS (store & process) contrasted with DSMS (continuous query processing). Queries are built on the publish-subscribe pattern: Source, Operator, and Sink connected by typed pipes]
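The publish-subscribe pattern with typed pipes can be illustrated with a minimal C++ sketch. The names `Pipe`, `subscribe`, `publish`, and `runDoublingFlow` are hypothetical and chosen for illustration only; they are not taken from PipeFlow or PipeFabric. The element type of a pipe is fixed at compile time, so connecting a subscriber with the wrong tuple type fails to compile.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical typed pipe: every subscriber receives values of type T.
template <typename T>
class Pipe {
public:
    using Subscriber = std::function<void(const T&)>;

    // Register a downstream consumer.
    void subscribe(Subscriber s) { subscribers_.push_back(std::move(s)); }

    // Push one value to all registered consumers.
    void publish(const T& value) const {
        for (const auto& s : subscribers_) s(value);
    }

private:
    std::vector<Subscriber> subscribers_;
};

// Source -> operator -> sink: the operator doubles each value,
// the sink collects the results in arrival order.
inline std::vector<int> runDoublingFlow(const std::vector<int>& input) {
    Pipe<int> sourceOut;    // pipe between source and operator
    Pipe<int> operatorOut;  // pipe between operator and sink
    std::vector<int> sink;

    sourceOut.subscribe([&](const int& v) { operatorOut.publish(v * 2); });
    operatorOut.subscribe([&](const int& v) { sink.push_back(v); });

    for (int v : input) sourceOut.publish(v);  // the source emits its tuples
    return sink;
}
```

Calling `runDoublingFlow` with the values 1, 2, 3 pushes each value through both pipes as soon as it is published, which mirrors continuous query processing rather than store-and-process.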
PipeFlow
[Figure: a dataflow specification is passed to the PipeFlow compiler (graph checking and rewriting, C++ code generation). The result runs embedded in an application, as a standalone process, or as distributed processes in clusters]
Operator Model
[Figure: operator with input queue (future tuples), operator states, and output queue (processed tuples); symbols I(t), P(t), O(t), S(t), λ, τi, τp, T]
• Encapsulated functionality
  – Computation (possibly stateful)
  – Implementation in domain-specific languages
  – Specialization for hardware platforms (CPU/GPU)
  – Wrapper for library functions
• Meta-information
  – Typed input, output and parameter channels
  – State handling
  – Operator location
  – Profiling information
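As a rough illustration of the operator model, the following C++ sketch combines a typed input queue, operator state, a typed output queue, and a simple profiling counter. The `Operator` struct, its members, and `runningSum` are assumptions made for this example and do not reflect the actual PipeFabric classes.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stateful operator with typed input and output queues.
template <typename In, typename Out, typename State>
struct Operator {
    std::deque<In> inputQueue;    // tuples that still have to be processed
    std::deque<Out> outputQueue;  // processed tuples awaiting the consumer
    State state{};                // state carried from tuple to tuple
    std::size_t processed = 0;    // simple profiling information

    // Process one tuple with a user-defined function fn(state, in) -> Out.
    // Returns false once the input queue is empty.
    template <typename Fn>
    bool step(Fn fn) {
        if (inputQueue.empty()) return false;
        In in = inputQueue.front();
        inputQueue.pop_front();
        outputQueue.push_back(fn(state, in));
        ++processed;
        return true;
    }
};

// Example of a stateful computation: a running sum over integers.
inline std::deque<long> runningSum(const std::vector<int>& values) {
    Operator<int, long, long> op;
    for (int v : values) op.inputQueue.push_back(v);
    while (op.step([](long& sum, int v) { sum += v; return sum; })) {
    }
    return op.outputQueue;
}
```

Keeping the state inside the operator object is one possible design; it makes state handling visible to the framework, which is what the meta-information bullets ask for.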
PipeFlow: Overview
• Dataflow specification language inspired by Pig
• Specification = sequence of operators connected by typed "pipes"
• Large set of predefined operators (sources, joins, aggregation, windows, CEP, …)
• Supported by the PipeFabric engine: C++ library of operator templates & utility functions

$pipe1 := operator1(…) params;
$pipe2 := operator2(…) params;
$out := operator3($pipe1, $pipe2, …) params;
Query Compilation
PipeFlow: Operators & Expressions
• Type-specific template instantiation for operators
• Expressions are compiled into native code

PipeFlow statement:
$o := filter($in) by i1 > 10;

Corresponding C++:
typedef Tuple<int, double, MatrixXd> MyTuple;
auto op = new Filter<MyTuple>([&](MyTuplePtr tp) {
  return std::get<0>(*tp) > 10;
});
Dataflow Parallelization
• Providing abstraction for data parallelism
• Partitioning of input data stream: tuple-wise, batch-wise, column-wise
• Execution environment: threads for multi-core CPU, threads for GPU, distributed processes for compute cluster
• Result merging
• Supporting user-defined operators!
[Figure: Split, parallel operator instances Op1 … Opn, Merge; label: Semantics]
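The three partitioning schemes named above can be sketched in C++ as follows. The function names are hypothetical, and the round-robin assignment used for the tuple-wise case is only one possible routing policy; the slides do not specify how PipeFlow assigns tuples to partitions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Tuple-wise: tuple i goes to partition i % n (round robin, an assumed policy).
template <typename T>
std::vector<std::vector<T>> splitTupleWise(const std::vector<T>& in, std::size_t n) {
    assert(n > 0);
    std::vector<std::vector<T>> parts(n);
    for (std::size_t i = 0; i < in.size(); ++i) parts[i % n].push_back(in[i]);
    return parts;
}

// Batch-wise: contiguous batches of at most batchSize tuples each.
template <typename T>
std::vector<std::vector<T>> splitBatchWise(const std::vector<T>& in, std::size_t batchSize) {
    assert(batchSize > 0);
    std::vector<std::vector<T>> parts;
    for (std::size_t i = 0; i < in.size(); i += batchSize) {
        const std::size_t end = std::min(in.size(), i + batchSize);
        parts.emplace_back(in.begin() + static_cast<std::ptrdiff_t>(i),
                           in.begin() + static_cast<std::ptrdiff_t>(end));
    }
    return parts;
}

// Column-wise: separate the two fields of each tuple into two columns.
template <typename A, typename B>
std::pair<std::vector<A>, std::vector<B>> splitColumnWise(
        const std::vector<std::pair<A, B>>& in) {
    std::pair<std::vector<A>, std::vector<B>> columns;
    for (const auto& tuple : in) {
        columns.first.push_back(tuple.first);
        columns.second.push_back(tuple.second);
    }
    return columns;
}
```

Which scheme fits depends on the operator: tuple-wise and batch-wise splitting keep tuples whole, while column-wise splitting changes the tuple layout and therefore needs a merge step that reassembles the components.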
Parallelization in PipeFlow
• Make parallelization explicit but hide the implementation details behind a parallelize operator

define calc_statistics ($in) returns $out {
  $x := myOp($in);
  $out := mySecondOp($x) ...;
};

$res := parallelize($in) on slice(x) do calc_statistics
        using (mode = "thread", partitions = 10);
Slice, Split & Merge
• Physical algebra operators
• Slice := split a single tuple or tuple value into multiple instances, i.e. vertical partitioning, vector/matrix decomposition
• Scatter := route tuples to subqueries based on PartitionID
• Gather := collect partial results from parallel subqueries
• Merge := combine partial results, i.e. merge streams, join tuple components or even values, final aggregation
[Figure: Slice, Scatter, parallel subqueries Op1 … Opn, Gather, Merge, with Exchange operators around the subqueries; the stages are parameterized by SliceFunc, ScatterFn, GatherFn, MergeFn]
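The four stages can be made concrete for the vector case R = (V1 + V2) * c with a small CPU-only C++ sketch. It is an assumption-laden illustration, not the PipeFlow implementation: `parallelVecAddMul` is a hypothetical name, slicing is a one-dimensional decomposition into contiguous ranges, `std::async` stands in for the exchange operators, and the merge step simply concatenates partial results in slice order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Computes c * (v1 + v2) element-wise using slice, scatter, gather, merge.
inline std::vector<double> parallelVecAddMul(double c,
                                             const std::vector<double>& v1,
                                             const std::vector<double>& v2,
                                             std::size_t partitions) {
    assert(v1.size() == v2.size());
    const std::size_t n = v1.size();
    if (partitions == 0) partitions = 1;
    const std::size_t chunk = (n + partitions - 1) / partitions;  // ceil(n / partitions)

    // Slice: 1-dim decomposition into ranges [begin, end).
    // Scatter: start one subquery per slice; its position is the slice ID.
    std::vector<std::future<std::vector<double>>> subqueries;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(n, begin + chunk);
        subqueries.push_back(std::async(std::launch::async, [&v1, &v2, c, begin, end] {
            std::vector<double> partial;
            partial.reserve(end - begin);
            for (std::size_t i = begin; i < end; ++i) {
                partial.push_back(c * (v1[i] + v2[i]));
            }
            return partial;
        }));
    }

    // Gather: collect partial results in slice-ID order.
    // Merge: compose them into one result vector.
    std::vector<double> result;
    result.reserve(n);
    for (auto& subquery : subqueries) {
        const std::vector<double> partial = subquery.get();
        result.insert(result.end(), partial.begin(), partial.end());
    }
    return result;
}
```

Because each subquery writes only to its own partial vector, no locking is needed; the ordering guarantee comes entirely from gathering in slice-ID order.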
GPU Processing
• GPU: vector processor attached to host system
• SIMD / SIMT operations
• Input copy -> compute -> output copy
[Figure: GPU attached to the host system via the PCIe bus; image source (7)]
GPU Processing Example
• Vector addition and scalar multiplication
• R = (V1 + V2) * c
• Generate parallel GPU kernel
• Parallelize for multiple GPUs
[Figure: dataflow graph with inputs V1, V2, and c, operators + and *, and result R]
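GPU Processing Example
• Fine granular parallelism: SIMD
• Vectorize on element index
• Mapping to thread ID

// Parameter types are left generic; calculateThreadID() is the helper
// named on the slide and is assumed to be a __device__ function.
template <typename E, typename V, typename Scatter, typename Gather>
__global__ void gpuVecAddMul(E c, V v1, V v2, Scatter scatter, Gather gather) {
  int t = calculateThreadID();  // global thread index, taking the grid into account
  E e1 = scatter(v1, t);        // read v1[t]
  E e2 = scatter(v2, t);        // read v2[t]
  E r = c * (e1 + e2);          // one result element per thread
  gather(r, t);                 // write to result[t]
}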
GPU Processing Example
• Coarse granular parallelism: multi-GPU
• Partition input vectors
• Mapping (disjoint) partitions to each GPU
• Collect & merge (partial) results

template< typename E >
void multiGpuVecAddMul(c, v1, v2, slice, scatter, proc, gather, merge) {
  slices = slice(c, v1, v2);                    // partition input vectors
  scatter(slices, gpus);                        // host-to-device copy of partitions
  results = in_parallel_do(gpuVecAddMul(...));  // launch kernels
  collected = gather(results);                  // device-to-host copy
  merge(collected);                             // combine in result vector and publish
}
Query Compilation: Rewriting

| datatype       | operation               | slice               | scatter   | gather | merge            |
|----------------|-------------------------|---------------------|-----------|--------|------------------|
| atomic         | filter, projection, ... | stream              | hash, key | union  | -                |
| atomic         | aggregate               | stream              | hash, key | union  | post-aggregation |
| vector, matrix | +, scalar mult.         | 1-dim decomposition | slice-id  | union  | composition      |
| matrix         | advanced                | decomposition       | slice-id  | union  | problem-specific |

• Option 1: determine functions automatically based on datatype + operation + X?
• Option 2: user-provided functions
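One way to picture the two options is a rule table keyed by datatype and operation, with a user-provided plan taking precedence when present. The sketch below is purely illustrative: `PartitioningPlan`, `defaultRules`, and `lookupPlan` are hypothetical names, the table entries are copied from the slide as descriptive strings rather than executable functions, and the open "+ X?" criterion is not modeled.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical description of the four functions chosen for an operator.
struct PartitioningPlan {
    std::string slice;
    std::string scatter;
    std::string gather;
    std::string merge;
};

using RuleKey = std::pair<std::string, std::string>;  // (datatype, operation)

// Option 1: built-in rules, taken from the table on the slide.
inline const std::map<RuleKey, PartitioningPlan>& defaultRules() {
    static const std::map<RuleKey, PartitioningPlan> rules = {
        {{"atomic", "filter"}, {"stream", "hash, key", "union", "-"}},
        {{"atomic", "projection"}, {"stream", "hash, key", "union", "-"}},
        {{"atomic", "aggregate"}, {"stream", "hash, key", "union", "post-aggregation"}},
        {{"vector", "+"}, {"1-dim decomposition", "slice-id", "union", "composition"}},
        {{"matrix", "+"}, {"1-dim decomposition", "slice-id", "union", "composition"}},
        {{"matrix", "advanced"}, {"decomposition", "slice-id", "union", "problem-specific"}},
    };
    return rules;
}

// Option 2 takes precedence: a user-provided plan overrides the rule table.
// Returns nullptr if neither a user plan nor a matching rule exists.
inline const PartitioningPlan* lookupPlan(const std::string& datatype,
                                          const std::string& operation,
                                          const PartitioningPlan* userPlan = nullptr) {
    if (userPlan != nullptr) return userPlan;
    const auto& rules = defaultRules();
    const auto it = rules.find(RuleKey(datatype, operation));
    return it == rules.end() ? nullptr : &it->second;
}
```

A `nullptr` result marks the case where automatic derivation fails and the user has to supply the functions, which is where the "problem-specific" entry in the table points as well.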
Outlook
What's next?
• Modules for domain-specific types
  – First-class types, e.g., events, signals, images, matrices, tensors, graphs, …
  – Library integration, e.g., OpenCV, Pregel, …
  – Modeling uncertainty
• Aspect-orientation for injecting data management routines
  – Functional models as monads, arrows
  – Parallelization primitives
  – Automatic & manual partitioning
  – Elastic scaling
• Multi-level optimizations
  – Rule-based graph optimizations
  – Domain-specific optimization rules
  – Machine-specific optimizations for hardware
Discussion
References
(1) http://www.tu-ilmenau.de/en/institute-of-process-measurement-and-sensor-technology/research/nanopositionier-und-nanomesstechnik/
(2) http://www.tu-ilmenau.de/cc-npmm/projekte/
(3) http://www.itwissen.info/definition/lexikon/Chip-chip.html
(4) Dinh, C.; Rühle, J.; Bollmann, S.; Haueisen, J.; Güllmar, D.: A GPU-accelerated Performance Optimized RAP-MUSIC Algorithm for Real-Time Source Localization. In: Biomedizinische Technik (Berl.), 2012, 57
(5) http://people.ee.ethz.ch/~cattin/MIA-ETH/02-IntensityBasedRegistration-media/figs/labelled-brain.png
(6) Färber, M.: Molecular dynamics simulation, 2013.
(7) http://www.nvidia.de/object/tesla_c1060_de.html