compilation of socialite, dedalus & webdamlog a spontaneous talk

67
Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Upload: onawa

Post on 06-Feb-2016

26 views

Category:

Documents


1 download

DESCRIPTION

Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk. Christoph Koch EPFL DATA Lab [email protected] http://data.epfl.ch. Warning: I decided to prepare this talk today at 1:30pm !!!111!!. This talk. Some of you know the DBToaster project: - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Compilation of Socialite, Dedalus & WebDAMLog

A Spontaneous Talk

Christoph KochEPFL DATA Lab

[email protected]://data.epfl.ch

Page 2: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

This talk

Warning: I decided to prepare this talk today at 1:30pm !!!111!!

2

Page 3: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Goal

• Some of you know the DBToaster project:

• Aggressive incremental view maintenance + compilation.

• Recently, we have made our compiled code much faster.

• New goal: compile languages with recursion:

• Socialite (Lam/Stanford): graph analytics

• Dynamic languages: WebDAMLog, Bloom, Dedalus

• But: this work is not ready. I will only sketch challenges and tell you about our status.

• A talk to friends on work in progress is probably better than utterly boring them with something unrelated to this workshop.

3

Page 4: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Talk Overview

• Classic DBToaster

• The new backend: Scala & LMS

• Towards a compiler for WebDAMLog

4

Page 5: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Use Cases of DBToaster

• Online/real-time analytics

• Real-time data warehousing, network/financial policy monitoring, clickstream analysis, spyware , order-book trading, …

• Stream/CEP abstractions (window semantics, “finite” automata) often not appropriate.• Combine stream with historical data• E.g. order books are not windows

5

Page 6: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Incremental View Maintenance

• Given a DB update, do not recompute the view from scratch.

• Perform only the work needed to update the view.

• Compute delta query:• Delta queries work on less data; also slightly simpler.• But: we still need a classical query engine for processing the delta

queries.

CREATE MATERIALIZED VIEW empdepREFRESH FAST ON COMMIT ASSELECT empno, ename, dnameFROM emp e, dept dWHERE e.deptno = d.deptno;

(Example in Oracle)

[Roussopoulos, 1991; Yan and Larson, 1995;Colby et al, 1996; Kotidis and Roussopoulos, 2001; Zhou et al, 2007]

Page 7: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Recursive incremental processing

Page 8: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Compiling incremental view maintenance

Page 9: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

A ring of relations

Page 10: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Aggregation Calculus (AGCA)

Page 11: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Deltas of AGCA queries; closure

AGCA is closed under taking deltas!

Page 12: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Degrees of deltas; high deltas are independent of the database

Page 13: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Compilation Example

q[] = select sum(LI.P * O.XCH) from Order O, LineItem LI where O.OK = LI.OK;

Page 14: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Compilation Example

q[] = select sum(LI.P * O.XCH) from Order O, LineItem LI where O.OK = LI.OK;

+O(xOK, xCK, xD, xXCH) q[] +=

select sum(LI.P * O.XCH) from {<xOK, xCK, xD, xXCH>} O, LineItem LI where O.OK = LI.OK;

+LI(yOK, yPK, yP) q[] += ...

Page 15: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Compilation Example

q[] = select sum(LI.P * O.XCH) from Order O, LineItem LI where O.OK = LI.OK;

+O(xOK, xCK, xD, xXCH) q[] +=

select sum(LI.P * xXCH) from LineItem LI where xOK = LI.OK;

+LI(yOK, yPK, yP) q[] += ...

Page 16: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Compilation Example

q[] = select sum(LI.P * O.XCH) from Order O, LineItem LI where O.OK = LI.OK;

+O(xOK, xCK, xD, xXCH) q[] += xXCH *

select sum(LI.P) from LineItem LI qO[xOK] where xOK = LI.OK;

+LI(yOK, yPK, yP) q[] += ...

Page 17: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Compilation Example

q[] = select sum(LI.P * O.XCH) from Order O, LineItem LI where O.OK = LI.OK;

+O(xOK, xCK, xD, xXCH) q[] += xXCH * qO[xOK]; foreach xOK: qO[xOK] = select sum(LI.P) from LineItem LI where xOK = LI.OK;

+LI(yOK, yPK, yP) q[] += ...

Page 18: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Compilation Example

q[] = select sum(LI.P * O.XCH) from Order O, LineItem LI where O.OK = LI.OK;

+O(xOK, xCK, xD, xXCH) q[] += xXCH * qO[xOK];+LI(yOK, yPK, yP) foreach xOK: qO[xOK] += select sum(LI.P) from {<yOK, yPK, yP>} LI where xOK = LI.OK;

+LI(yOK, yPK, yP) q[] += ...

Page 19: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Compilation Example

q[] = select sum(LI.P * O.XCH) from Order O, LineItem LI where O.OK = LI.OK;

+O(xOK, xCK, xD, xXCH) q[] += xXCH * qO[xOK];+LI(yOK, yPK, yP) foreach xOK: qO[xOK] += select yP

where xOK = yOK;

+LI(yOK, yPK, yP) q[] += ...

Page 20: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Compilation Example

q[] = select sum(LI.P * O.XCH) from Order O, LineItem LI where O.OK = LI.OK;

+O(xOK, xCK, xD, xXCH) q[] += xXCH * qO[xOK];+LI(yOK, yPK, yP) qO[yOK] += yP;

+LI(yOK, yPK, yP) q[] += ...

Page 21: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Compilation Example

q[] = select sum(LI.P * O.XCH) from Order O, LineItem LI where O.OK = LI.OK;

+O(xOK, xCK, xD, xXCH) q[] += xXCH * qO[xOK];+LI(yOK, yPK, yP) qO[yOK] += yP;+LI(yOK, yPK, yP) q[] +=

select sum(LI.P * O.XCH) from Order O, {<yOK, yPK, yP>} LI where O.OK = LI.OK;

Page 22: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Compilation Example

q[] = select sum(LI.P * O.XCH) from Order O, LineItem LI where O.OK = LI.OK;

+O(xOK, xCK, xD, xXCH) q[] += xXCH * qO[xOK];+LI(yOK, yPK, yP) qO[yOK] += yP;+LI(yOK, yPK, yP) q[] +=

select sum( yP * O.XCH) from Order O where O.OK = yOK;

Page 23: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Compilation Example

q[] = select sum(LI.P * O.XCH) from Order O, LineItem LI where O.OK = LI.OK;

+O(xOK, xCK, xD, xXCH) q[] += xXCH * qO[xOK];+LI(yOK, yPK, yP) qO[yOK] += yP;+LI(yOK, yPK, yP) q[] += yP *

select sum( O.XCH) from Order O where O.OK = yOK;

Page 24: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Compilation Example

q[] = select sum(LI.P * O.XCH) from Order O, LineItem LI where O.OK = LI.OK;

+O(xOK, xCK, xD, xXCH) q[] += xXCH * qO[xOK];+LI(yOK, yPK, yP) qO[yOK] += yP;+LI(yOK, yPK, yP) q[] += yP * qLI[yOK];

select sum( O.XCH) from Order O qLI[yOK] where O.OK = yOK;

Page 25: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Compilation Example

q[] = select sum(LI.P * O.XCH) from Order O, LineItem LI where O.OK = LI.OK;

+O(xOK, xCK, xD, xXCH) q[] += xXCH * qO[xOK];+LI(yOK, yPK, yP) qO[yOK] += yP;+LI(yOK, yPK, yP) q[] += yP * qLI[yOK];+O(xOK, xCK, xD, xXCH) foreach yOK: qLI[yOK] += select sum( O.XCH) from {<xOK, xCK, xD, xXCH>} O where O.OK = yOK;

Page 26: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Compilation Example

q[] = select sum(LI.P * O.XCH) from Order O, LineItem LI where O.OK = LI.OK;

+O(xOK, xCK, xD, xXCH) q[] += xXCH * qO[xOK];+LI(yOK, yPK, yP) qO[yOK] += yP;+LI(yOK, yPK, yP) q[] += yP * qLI[yOK];+O(xOK, xCK, xD, xXCH) foreach yOK: qLI[yOK] += select xXCH

where xOK = yOK;

Page 27: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Compilation Example

q[] = select sum(LI.P * O.XCH) from Order O, LineItem LI where O.OK = LI.OK;

+O(xOK, xCK, xD, xXCH) q[] += xXCH * qO[xOK];+LI(yOK, yPK, yP) qO[yOK] += yP;+LI(yOK, yPK, yP) q[] += yP * qLI[yOK];+O(xOK, xCK, xD, xXCH) qLI[xOK] += xXCH;

• The triggers for incrementally maintaining all the maps run in constant time!

• No nonincremental algorithm can do that!

Page 28: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Parallel complexity of compiled queries

Page 29: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

The DBToaster compiler: status

• We are able to maintain queries with nested aggregates!

• Query optimization problem interesting

• Nesting, correlated variables, maintaining their domains.

• See our VLDB 2012 paper.

• Scala and C code generators (Scala is faster!)

• Released at www.dbtoaster.org

• Three parallel runtime systems:• Using Spark (bad), Storm/Squall (data flow parallelism), Cumulus (message

passing)

• Released soon.

29

Page 30: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

DBToaster Experiments

Both classical view maintenance and compilation of main-memory operators 1-4 orders of magnitude slower.

Page 31: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Talk Overview

• Classic DBToaster

• The new backend: Scala & LMS

• Towards a compiler for WebDAMLog

31

Page 32: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

DSL Compiler Construction

• We will live in a fast-growing ecosystem of DSLs• Quickly developed and revised

• We need tools for quickly creating programming environments.• Compilers, VMs, …

• Lightweight Modular Staging (LMS)

32

Page 33: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Lightweight Modular Staging (LMS)• A compiler as a library. Easy to extend.

• Leverages Scala virtualization.

• Staging: build (JIT) compilers, interpreters, static analysis systems, etc.

[Odersky, Rompf. Lightweight Modular Staging, CACM]

Page 34: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

In the LMS Library

• Duplicate expression elimination.

• Dead code elimination

• Loop fusion

• Loop and recursion unrolling.

• Code generators.

• All this for general Scala (!)

• Example: D/FFT: turn recursive def into circuit automatically.

Delite (Stanford PPL/EPFL): built on top of LMS• further code generators (C/CUDA, …)• example DSLs (OptiML, OptiQl, OptiGraph, GreenMarl, …)

Page 35: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Current work on LMS

• Automatic lifting of collection types, functions, classes

• Fusion in the presence of side-effects

• Global, cost-based optimization

• Lightweight embeddings (YinYang)

• Joint work of Oracle Labs, Odersky’s Lab (Scala), and us• PhD students V. Jovanovich, A. Shaikhha, and M. Dashti.• Staff researchers A. Noetzli, T. Coppey

Page 36: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

javac, Graal, and LMS

• Oracle is working on replacing the current javac by a completely new compiler.• Codename: Graal

• Graal is based on LMS: it will be able to stage compilation in various ways.• Typically it will JIT to native and won’t need a VM.• Adaptivity as in the HotSpot VM.

Page 37: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

DBToaster backend in LMS (prelim)

• Matches or beats our handwritten backend

• One order of magnitude less code (15kLOC vs 2kLOC)

37

Page 38: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

We observed that Scala is faster than C++, why

• Boost/C++:

• Add     : 0.309

• Clear   : 0.400

• Aggr    : 0.090

• Update  : 0.107

38

JVM/Oracle Java 7:

•Add     : 0.215

•Clear   : 0.001

•Aggr    : 0.021

•Update  : 0.072

Hashmap costs

Page 39: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Talk Overview

• Classic DBToaster

• The new backend: Scala & LMS

• Towards a compiler for WebDAMLog

39

Page 40: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Domain Maintenance in DBToaster: a Datalog problem

• Domain maintenance: deciding what to have and keep in the domains of the hashmaps.

• Why important: window queries, where the window of relevant values moves:• e.g. sliding k-day median price of a stock.

• IVM is about memoization, but sometimes it is important to remove data because it is costly to keep maintaining them.• “View caches”

40

Page 41: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Domain Maintenance in DBToaster: a Datalog problem

What are the domains of these maps?•In general there are binding patterns on the domains.

Example: Q[x,y] <- R(x), (S(y), x<y). Map m[x^b, y^f] = S(y), x<y.

•The programs have been recursively decomposed by the DBToaster rewriting algo.•Determining the the domains of maps is essentially magic sets/QSQ, where we look for the supplementary relations!

Page 42: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

IVM and datalog

• We can rewrite datalog programs using our techniques rule by rule. No new ideas needed.• Seminaive evaluation is classical IVM on datalog rules.• This does everything seminaive eval does, and (much)

more.• Can we do better? How?

• If we want to support deletion, we get to bag semantics.• But we don’t know how to deal with bags and recursion.

42

Page 43: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

IVM and datalog

• Joint work with V. Kuncak: synthesis of incremental programs.• Search for programs achieving our goal, rather than

syntactically rewriting them.• Already works for insertions into various (balanced) tree

data structures (from invariants!).

• Efficient runtime verification (in Java, for instance).• Incrementally maintaining assertions.• This is getting hot in PL.

43

Page 44: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Notions of recursion

• Datalog recursion – additional querying power• E.g. computing graph queries -- Graphlog, Socialite

• Modeling Dynamics – database in each iteration corresponds to a different time point/state.• Bloom, WebDAMLog: capturing distributed computation, nw

protocols• Triggers that call triggers? cf. Active databases.

44

Page 45: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Challenges

• Triggers that call triggers: which notion of consistency, if you want to capture both classical recursion for its expressiveness and distributed dynamic computations?

• Aggregates(bag semantics) and fixpoints.• You need bag semantics and aggregates for interesting

analytics.• But then we generally don’t have fixpoints (see e.g. Val’s

work) and it’s too hard to tell when we are not in trouble.

45

Page 46: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Challenges

• We can think of it as a relational programming language and leave all worries about consistency and termination to the programmer.• It would be important to make guarantees though.

• Argh I am running out of time making these sli

46

Page 47: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Thank you!

Page 48: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Use case: Algorithmic trading

• High-frequency algorithmic trading is getting huge:• Q1/2009 in the US: 73% of all trades in equities.• ~15% annual profit margin across the industry!

• Algo execution practices:• Extremely low latencies: stock exchanges make 3ms guarantees• Algos run in data centers close to exchanges

• Algo development practices:• Algos and underlying models are constantly monitored, improved, and

experimented with in large simulations• Analytics have to be performed at very high data rates, beyond DBMS or

stream engines (expressiveness!). The fastest algo makes money.• Currently, algos written on a low level (C). Constant backtesting and

improvement means coding bottleneck: Many programmers needed, software crisis.

Page 49: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

A ring of relations

Page 50: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Parameterized GMRs

• A parameterized GMR is a function that takes an environment to a GMR.

• Can be used to model binding passing in query calculi.

• Conditions etc. do not have finite support/are not finite relations. Valuations of variables can be passed from the left.

• Still a ring!

• + and * strictly generalize monoid rings (GMRs) and thus union and join, resp.

50

Page 51: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Modeling binding passing in queries

Strictly generalizesmonoid rings andthus union& join!

Page 52: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Use case 1: Algorithmic trading

• Trading strategies can be implemented with most of the number crunching expressed as embedded SQL.

• DBToaster compiles the embedded SQL queries down to machine code and links it into the strategy code.

• DBToaster incrementally maintains the views at the update rate. Strategies can subscribe to changes in the views.

• Strategies perform buy/sell actions.

Page 53: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Use case: Algorithmic trading

• Trading input: order book data

• Classes of strategies: arbitrage, market making, frontrunning.

Static order book imbalance (SOBI)

Bids (buyers) and asks (sellers) order book data

Page 54: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

NC0C programs

Page 55: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Parallel complexity consequences

• In an NC0C statement, there is NO overlap among the variables occurring in distinct atomic rhs terms.

• Statements with for-loops are relational products, not joins: no filtering.

• Distributed implementation: no communication is needed to determine where changes have to be sent.

• Purely push-based implementation sends minimal amount of data.

• No need for bloom-joins or synchronization: lightweight eventual consistency approach!

Page 56: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

DBToaster Experiments

Page 57: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

DBToaster Experiments

Page 58: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

DBToaster Experiments

Page 59: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Systems efforts in the DBToaster Project

• The DBToaster engine for update streams• Applications: Low-latency algorithmic trading (order books),

clickstream analysis, …

• Cumulus:• OLAP using a distributed key-value store plus a very simple

message passing protocol for view maintenance.• Exact aggregate view maintenance in realtime!

• Squall:• A distributed online aggregation system based on Storm.

Page 60: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Use Case 2: OLAP with DBToaster

• Cumulus is a distributed shared-nothing substrate for running DBToaster code, used for an online main-memory OLAP system.

• Exploits the fact that NC0C programs admit embarrassing parallelism• Each map value to be written requires only a constant amount of work to

update!

Page 61: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Use case 3: Embedded systems

• Declarative programming for productivity

• Compilation to very efficient, low footprint code.

• Embedded devices:• Sensors; 3G phones, …

• Example: Android OS

Page 62: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Conclusions

• Work at the intersection of databases and compilers is hot!

• A clean foundation for incremental view maintenance (IVM) based on algebra.

• An aggressive IVM technique based on compilation• Can eliminate all joins• Embarassing parallelism (NC0)!

• Systems efforts towards ultra-high frequency view refreshment (tens of kHz), real-time analytics and data warehousing.

• The system has been released at www.dbtoaster.org

Page 63: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

What does Scala buy us (vs Java)?• Functional programming, pattern matching convenient

for compiler construction.

• Language virtualization• no need to write parsers if your language is a superset of

a subset of Scala.

• Mix-in composition• Easy to refine behavior in a clean, fine-grained way.

• Scala can be used as a scripting language.

+ All the benefits of Java (libs, linkability, …).

Page 64: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

DSL Embeddings• Shallow embedding: DSL programs are programs of the host

language that use a library of DSL operators.

• Deep embedding (e.g. LMS): create AST, manipulate AST.• Pros: Necessary for optimization, custom code generation• Cons: Compilation slower, error messages nearly unreadable.

• Yin-Yang: transplants/rewires shall DSL embeddings at compile time into to deep embeddings.• Faster program type checking, prototyping, and debugging.• Error messages much more readable.

V. Jovanovic, N. Pham, V. Nikolaev, V. Ureche, S. Stucki, K, M. Odersky, “Yin-Yang: Transparent Deep Embedding of DSLs”, under submission.

Page 65: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

LMS Example

Page 66: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

LMS Example

Page 67: Compilation of Socialite, Dedalus & WebDAMLog A Spontaneous Talk

Lifting Collection Libraries in LMS• Develop lift-friendly data structure (hashmaps, trees)

libraries.

• Extend LMS to be able to lift them whole.

• Inline library code completely.

Current work on the DBToaster compiler:

• Rewrite backend using LMS (currently: OCAML)

• Extend LMS to match the performance of our own codebase.

• Inline collections using LMS, row-store/column-store pivot.

Joint work with Y. Ahmad, M. Dashti, V. Jovanovich, O. Kennedy, A. Noetzli, T. Rompf.