
1

The History of Datalog

Origins

Failure

Resurrection

2

An Odd Encounter

Several years ago, I met a colleague, Monica Lam, in the hallway at Stanford.

“I hear you were involved in the early work on Datalog.”

She had discovered this work and used it in her system for large-scale data-flow analysis.

3

Odd Encounter – (2)

The application is naturally recursive.

Very large scale (the analyzed code was 800K lines).

They (Monica and her student John Whaley) had an implementation, bddbddb, that compiled Datalog rules into BDDs (binary decision diagrams).

4

Where Did Datalog Come From?

1. Codd’s tuple and domain calculus (1972).

2. Gallaire and Minker’s “Logic and Databases” (1978).

3. Prolog (1976).

5

Codd’s Logics

TRC (tuple relational calculus): { t | R(r) and S(s) and t.A = r.A and r.B = s.B and t.C = s.C }. Implemented by Stonebraker as QUEL.

DRC (domain relational calculus): { ac | R(ab) and S(bc) }. Implemented by Zloof as Query-by-Example.
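As a minimal sketch, the two calculus queries above can be evaluated directly as Python set comprehensions over small relations R(A, B) and S(B, C). The data are invented for illustration. Both queries express the same join, which in Datalog would be the single rule answer(A,C) :- R(A,B), S(B,C) (the predicate name answer is likewise illustrative).

```python
# Hypothetical instances: R has attributes (A, B), S has attributes (B, C).
R = {(1, "x"), (2, "y"), (3, "z")}
S = {("x", 10), ("y", 20), ("w", 30)}


def drc_query(R, S):
    """{ ac | R(ab) and S(bc) }: variables range over single values, and
    using the same variable b in both atoms is what forces the join."""
    return {(a, c) for (a, b) in R for (b2, c) in S if b == b2}


def trc_query(R, S):
    """{ t | R(r) and S(s) and t.A = r.A and r.B = s.B and t.C = s.C }:
    variables r and s range over whole tuples, accessed by attribute name."""
    rs = [{"A": a, "B": b} for (a, b) in R]
    ss = [{"B": b, "C": c} for (b, c) in S]
    return {(r["A"], s["C"]) for r in rs for s in ss if r["B"] == s["B"]}


if __name__ == "__main__":
    print(sorted(drc_query(R, S)))   # [(1, 10), (2, 20)]
    print(sorted(trc_query(R, S)))   # [(1, 10), (2, 20)]
```

The TRC version keeps whole tuples and names attributes explicitly, as QUEL does; the DRC version binds one variable per column, closer in spirit to Query-by-Example.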

6

“Logic and Databases”

Viewed queries as the result of an entire logical theory.

Thus it allows recursion, negation, and theories with multiple minimal models.

Closed-world and open-world evaluations.

7

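Prolog

A conventional programming language with predicates as function calls.

Bizarre execution rule. Example: you have to write TC (transitive closure) as:

path(X,Y) :- arc(X,Y).
path(X,Y) :- arc(X,Z),path(Z,Y).

The logically equivalent left-recursive rule path(X,Y) :- path(X,Z),arc(Z,Y) never terminates when all answers are requested, because Prolog’s top-down, depth-first, left-to-right search keeps calling path(X,Z) again with the same X before consuming any arc.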
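The following is a toy model, not Prolog itself, of why the rule order matters. Python generators stand in for Prolog's backtracking, clauses are tried in order, and subgoals are solved left to right. The graph ARCS is hypothetical and acyclic (on a cyclic graph even the right-recursive version would loop in plain Prolog), and DEPTH_LIMIT is an illustrative guard added so the non-terminating case stops with an exception instead of running forever.

```python
# Toy model of Prolog's top-down, depth-first, left-to-right search.
# ARCS is a hypothetical acyclic graph; DEPTH_LIMIT is an illustrative guard,
# not a Prolog feature.
ARCS = [("a", "b"), ("b", "c"), ("c", "d")]
DEPTH_LIMIT = 50


class DepthLimitExceeded(Exception):
    """Raised when the search nests deeper than DEPTH_LIMIT calls."""


def path_right(x, depth=0):
    """All Y with path(x, Y), for the right-recursive rules
         path(X,Y) :- arc(X,Y).
         path(X,Y) :- arc(X,Z), path(Z,Y).
    """
    if depth > DEPTH_LIMIT:
        raise DepthLimitExceeded
    for u, y in ARCS:                 # clause 1: arc(X,Y)
        if u == x:
            yield y
    for u, z in ARCS:                 # clause 2: arc(X,Z) is solved first...
        if u == x:
            yield from path_right(z, depth + 1)   # ...then path(Z,Y)


def path_left(x, depth=0):
    """Same search, but clause 2 is left-recursive:
         path(X,Y) :- path(X,Z), arc(Z,Y).
    """
    if depth > DEPTH_LIMIT:
        raise DepthLimitExceeded
    for u, y in ARCS:                 # clause 1: arc(X,Y)
        if u == x:
            yield y
    # Clause 2: path(X,Z) is called first, with the same X and no arc consumed,
    # so backtracking into it keeps opening yet another identical subgoal.
    for z in path_left(x, depth + 1):
        for u, y in ARCS:
            if u == z:
                yield y


if __name__ == "__main__":
    print(list(path_right("a")))      # ['b', 'c', 'd']
    try:
        list(path_left("a"))
    except DepthLimitExceeded:
        print("left-recursive version: no termination within the depth limit")
```

Note that path_left does produce b, c, and d first; it is only the demand for all answers that drives it into ever-deeper identical subgoals, which mirrors what Prolog does on backtracking.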

8

Implementation of Logical Query Languages for Databases

In 1984 I took a sabbatical at Hebrew University and wrote a paper with the above title.

It has some crazy stuff that makes me wonder “what was I thinking?”

Much was fixed by others, later.

Published in SIGMOD (no real theorems!).

9

Implementation – (2)

Key idea: Prolog notation + Horn-clause, unique-fixedpoint semantics.

Key idea: It’s about algorithms for query execution, not logical models. The original thinking in that direction was really due to Henschen and Naqvi.
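To illustrate the fixedpoint semantics, here is a minimal sketch of naive bottom-up evaluation of transitive closure. It assumes set semantics and a finite arc relation given as Python pairs. Because the rules are applied to the whole database until nothing changes, the result is the unique least fixedpoint no matter how the rules or subgoals are ordered, so the left-recursive form that troubles Prolog causes no difficulty here, even on a cyclic graph.

```python
def naive_tc(arcs):
    """Naive bottom-up evaluation of
         path(X,Y) :- arc(X,Y).
         path(X,Y) :- path(X,Z), arc(Z,Y).
    Every round re-applies both rules to everything derived so far and stops
    when a round adds nothing, i.e. at the least fixedpoint."""
    path = set()
    while True:
        new = set(arcs)                                    # rule 1
        new |= {(x, y) for (x, z) in path                  # rule 2: join
                for (z2, y) in arcs if z == z2}            # path with arc
        if new == path:       # nothing changed: fixedpoint reached
            return path
        path = new


if __name__ == "__main__":
    cycle = {("a", "b"), ("b", "c"), ("c", "a")}           # hypothetical graph
    print(sorted(naive_tc(cycle)))   # all 9 pairs over {a, b, c}
```

Termination follows from finiteness: each round can only add pairs of existing constants, so on n constants there are at most n² pairs to derive.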

10

Enter “Datalog”

The term “Datalog”, for positive Horn clauses without function symbols, was first proposed by Dave Maier and David S. (“the other”) Warren.

It appears in their book Computing with Logic (1988), but was in common use before that.

11

Good Implementation Ideas

1. Seminaive evaluation (Bancilhon and Ramakrishnan, 1986 – also in SIGMOD).

2. Specialized linear-recursion implementations (many people including Naughton, Ramakrishnan, Sagiv, Vardi,…).

3. Magic sets (Beeri and Ramakrishnan, 1987 – finally something got into PODS).
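Seminaive evaluation, the first idea in the list above, can be sketched for the same transitive-closure rules as follows. This is a simplified illustration for a single linear recursive rule, with relations as Python sets of pairs. Instead of re-joining all of path with arc in every round, only the facts that were new in the previous round (the "delta") are joined, so no pair of facts is joined twice.

```python
def seminaive_tc(arcs):
    """Seminaive evaluation of
         path(X,Y) :- arc(X,Y).
         path(X,Y) :- path(X,Z), arc(Z,Y).
    Each round joins only the newly derived path facts (delta) with arc."""
    path = set(arcs)          # rule 1 needs to fire only once
    delta = set(arcs)         # facts not yet used in a join
    while delta:
        derived = {(x, y) for (x, z) in delta
                   for (z2, y) in arcs if z == z2}
        delta = derived - path    # keep only genuinely new facts
        path |= delta
    return path


if __name__ == "__main__":
    chain = {("a", "b"), ("b", "c"), ("c", "d")}           # hypothetical graph
    print(sorted(seminaive_tc(chain)))
```

For a linear rule like this one, joining delta with arc is enough; with nonlinear rules (two recursive subgoals), each recursive subgoal in turn must take the delta role.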

12

Magic Sets

A query-rewriting scheme.

Similar in effect to a number of query-execution ideas, such as:

1. Query-Subquery (Rohmer, Lescoeur, and Kerisit, 1986).

2. Memoing (Dietrich and Warren, 1985).
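As a rough sketch of what the rewriting buys, consider the query path(a, Y) over the right-recursive transitive-closure rules, with the first argument bound. A hand-derived, magic-sets style rewrite adds a hypothetical "magic" predicate m_path holding the only X values the query can ever ask about, and guards each rule with it. This is a simplified illustration, not the output of any particular system's rewriter.

```python
def magic_path_query(arcs, source):
    """Answers to ?- path(source, Y) by bottom-up evaluation of the rewrite
        m_path(source).
        m_path(Z)  :- m_path(X), arc(X,Z).
        path(X,Y)  :- m_path(X), arc(X,Y).
        path(X,Y)  :- m_path(X), arc(X,Z), path(Z,Y).
    (m_path is a hypothetical name for the magic predicate.)"""
    # Step 1: the magic set, i.e. every X that a subgoal path(X, _) can reach.
    magic = {source}
    frontier = {source}
    while frontier:
        frontier = {z for (x, z) in arcs if x in frontier} - magic
        magic |= frontier
    # Step 2: evaluate path only for X in the magic set (naive fixedpoint).
    path = set()
    while True:
        new = {(x, y) for (x, y) in arcs if x in magic}
        new |= {(x, y) for (x, z) in arcs if x in magic
                for (z2, y) in path if z == z2}
        if new == path:
            return {y for (x, y) in path if x == source}
        path = new


if __name__ == "__main__":
    arcs = {("a", "b"), ("b", "c"), ("c", "d"), ("p", "q")}  # hypothetical
    print(sorted(magic_path_query(arcs, "a")))   # ['b', 'c', 'd']
```

Facts about p and q are never derived, because p is not in the magic set; that pruning is the same effect a top-down, goal-directed method such as Query-Subquery or memoing achieves.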

13

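Negation

With negated subgoals in Datalog, for example

bachelor(X) :- male(X), NOT married(X,Y)

you run the risk of multiple minimal models. Given only male(joe), both {male(joe), bachelor(joe)} and {male(joe), married(joe,c)}, for any constant c, are minimal models, and neither contains the other.

Stratified model (Chandra and Harel, 1982; Apt, Blair, and Walker, 1985).

Well-founded semantics (Van Gelder, Ross, and Schlipf, 1988).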
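A minimal sketch of stratified evaluation follows, under illustrative assumptions: married is derived from a hypothetical base relation wed and made symmetric by a recursive rule, and the negated subgoal NOT married(X,Y) is read as "there is no Y with married(X,Y)". Stratification means the predicate under negation is computed to its fixedpoint before any rule that negates it is applied.

```python
def stratified_bachelors(male, wed):
    """Stratified evaluation of a small program with negation:
         married(X,Y) :- wed(X,Y).                      (stratum 1)
         married(X,Y) :- married(Y,X).                  (stratum 1)
         bachelor(X)  :- male(X), NOT married(X,Y).     (stratum 2)
    wed is a hypothetical base relation used only for this illustration."""
    # Stratum 1: run married to its fixedpoint before anything negates it.
    married = set(wed)
    while True:
        new = married | {(y, x) for (x, y) in married}
        if new == married:
            break
        married = new
    # Stratum 2: married is now complete, so the negation is a plain test.
    has_spouse = {x for (x, _y) in married}
    return {x for x in male if x not in has_spouse}


if __name__ == "__main__":
    print(stratified_bachelors({"joe", "sam"}, {("sue", "sam")}))   # {'joe'}
```

Testing NOT married against wed alone, before stratum 1 finished, would wrongly report sam as a bachelor; that premature use of negation is exactly what stratification rules out.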

14

The Death of Datalog

Recursion turned out not to be all that important in the world of the 1980s.

In the AI community, where logic was taken more seriously than in the database community, the emphasis was on expressiveness, not tractability.

15

The Rebirth

Datalog slept, but nothing could take away its important virtues: simplicity and declarativeness, tractability, and a simple execution engine.

While “rule-based systems” were long an AI staple, they never acquired these features of Datalog.

16

bddbddb

Why did Monica Lam think of Datalog for data-flow analysis?

Classical data-flow analysis (DFA) was for code optimization. Only inner loops are important there, so the data never needed to get really large.

17

bddbddb – (2)

Monica was looking at a different application: software security. Example: can a string read at one point be passed to a SQL call without first being the argument of a function that checks safety?

The entire program is analyzed as a whole. Example: 800K lines of Apache. Now it’s a database problem.
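The security question above can be phrased as a small recursive Datalog program. The following is a toy sketch, far simpler than bddbddb (which handles pointer analysis over whole programs using BDDs). The relation names read_input, assign, and sql_arg are hypothetical. As a modeling assumption, the result of a safety check is treated as a fresh variable with no assign edge from its input, so taint does not flow through the check.

```python
def unsafe_sql_args(read_input, assign, sql_arg):
    """Toy taint analysis written as Datalog rules, evaluated bottom-up:
         tainted(V) :- read_input(V).
         tainted(W) :- tainted(V), assign(V, W).
         unsafe(V)  :- tainted(V), sql_arg(V).
    Relation names are hypothetical; a checked value is modeled as a fresh
    variable with no assign edge from its input."""
    tainted = set(read_input)
    delta = set(read_input)
    while delta:                                  # seminaive loop
        delta = {w for (v, w) in assign if v in delta} - tainted
        tainted |= delta
    return {v for v in sql_arg if v in tainted}


if __name__ == "__main__":
    # s is read from input; t and q1 are copies of it; u = check(s), so there
    # is no assign edge from s to u, and q2 is a copy of u.
    assign = {("s", "t"), ("t", "q1"), ("u", "q2")}
    print(unsafe_sql_args({"s"}, assign, {"q1", "q2"}))   # {'q1'}
```

Scaled up to every assignment in an 800K-line program, the assign relation becomes a large table and the closure becomes a query over it, which is the sense in which this is "a database problem".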

18

Overlog and Dedalus

At about the same time, Joe Hellerstein was experimenting with Datalog, first for prototyping and later for the real implementation.

General direction: protocols for distributed systems.

19

Overlog and Dedalus – (2)

Two important additions: time and space as first-class concepts.

Example (space): Assume each node has a table of its outgoing arcs. arc(@n, h) means the table at node n contains an arc to node h.

20

Example – Continued

Each node n computes the set of nodes it can reach by consulting the reach sets of the nodes to which n has arcs.

reach(@n, m) :- arc(@n, h), reach(@h, m).
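To make the location specifier concrete, here is a round-based simulation in a single Python process. It is not the Overlog or Dedalus runtime. Each node keeps its own arc and reach tables, and reading a neighbor's table from a start-of-round snapshot stands in for node h shipping its reach(@h, m) facts to node n. The base rule reach(@n, h) :- arc(@n, h) is an assumption, since a recursive rule alone derives nothing.

```python
def distributed_reach(arcs_at):
    """Round-based simulation of
         reach(@n, h) :- arc(@n, h).                 (assumed base rule)
         reach(@n, m) :- arc(@n, h), reach(@h, m).
    arcs_at maps each node to its local set of outgoing arcs."""
    reach_at = {n: set(hs) for n, hs in arcs_at.items()}   # base rule, locally
    changed = True
    while changed:
        changed = False
        # State at the start of the round: what each node can "send".
        snapshot = {n: set(r) for n, r in reach_at.items()}
        for n, hs in arcs_at.items():
            for h in hs:
                incoming = snapshot.get(h, set())   # reach(@h, m) facts sent to n
                if not incoming <= reach_at[n]:
                    reach_at[n] |= incoming
                    changed = True
    return reach_at


if __name__ == "__main__":
    tables = {"a": {"b"}, "b": {"c"}, "c": set()}   # hypothetical network
    print(distributed_reach(tables))
```

The loop stops in the first round in which no node learns anything new, which is the distributed analogue of reaching the fixedpoint.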

21

Some Other Datalog Directions

1. Webdamlog (Abiteboul et al., these proceedings). Adds creation of rules at remote sites.

2. PrPl (Lam et al.). Social networking in Datalog.

3. SecPAL (Becker et al.). Microsoft authorization language translated to Datalog.

22

Other Directions – (2)

4. LogicBlox (Molham Aref, CEO). A startup in Atlanta, GA, and one of several Datalog-based startups.

Uses Datalog for customized decision-support systems.

Many extensions, including controlled second-order predicates.

Still has a tractable, straightforward execution model.

23

Conclusions

It is too early to tell how important Datalog will be.

Will simplicity and tractability beat expressiveness?

But things are moving in the right direction(s) now.

From the Datalog 2.0 Workshop: Datalog needs an open-source standard, like MySQL.
