University of Washington Database Group

The Complexity of Causality and Responsibility for Query Answers and non-Answers

Alexandra Meliou, Wolfgang Gatterbauer, Katherine Moore, and Dan Suciu

http://db.cs.washington.edu/causality/



Page 1: The Complexity of Causality and Responsibility for Query Answers and non-Answers

Alexandra Meliou, Wolfgang Gatterbauer, Katherine Moore, and Dan Suciu

University of Washington Database Group

http://db.cs.washington.edu/causality/

Page 2: Motivating Example: Explanations

Query: “What genres does Tim Burton direct?”

[Figure: IMDB database schema]

Relevant lineage: 137 tuples!

Page 3: Example cont. (Musicals)

Ranking Provenance

[Figure annotations: important tuples; unimportant tuple]

Goal: Rank tuples in order of importance

Page 4: Solution: Causality

The fundamental question of causality: “What is the cause of an effect?”

Causality theory has long been studied in AI and philosophy [Lewis73, EiterLukasiewicz02, HalpernPearl05, Menzies08].

It offers a metric, responsibility, for measuring the contribution of a variable to an outcome [ChocklerHalpern04]. This metric can be used for ranking.

Page 5: Contributions

• We suggest responsibility as an effective measure for ranking provenance, with applications to explanations and error tracing.

• We define causality and responsibility in a database context.

• We give a complete complexity analysis of computing causality and responsibility for conjunctive queries without self-joins. This includes an interesting dichotomy result and a non-trivial algorithm for computing responsibility in the PTIME cases.

Page 6: Endogenous/exogenous tuples

Partition the data into 2 groups:

• Exogenous tuples (denoted by [symbol]): tuples that we consider correct, verified, or trusted. They are not candidate causes. E.g. the Genre and Movie_Director tables.

• Endogenous tuples (denoted by [symbol]): tuples that are untrusted, or simply of interest to the user. They are potential causes. E.g. the Director and Movie tables.

Page 7: Counterfactuals

A variable is a counterfactual cause if a change in its value changes the value of the result.

E.g. [formula]: A and B are both counterfactual causes of C.

Limitations: disjunctive causes. E.g. [formula]
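The counterfactual test can be sketched in a few lines. The two formulas below are assumptions chosen for illustration (C = A AND B for the first case, C = A OR B for the disjunctive case), and `is_counterfactual` is a hypothetical helper name; neither comes from the slide.

```python
def is_counterfactual(f, world, var):
    """Return True if flipping `var` alone changes the value of f."""
    flipped = dict(world)
    flipped[var] = not flipped[var]
    return f(world) != f(flipped)

# Assumed illustrative formulas over the Boolean variables A and B.
def conj(w):
    return w["A"] and w["B"]

def disj(w):
    return w["A"] or w["B"]

world = {"A": True, "B": True}
conj_causes = [v for v in world if is_counterfactual(conj, world, v)]
disj_causes = [v for v in world if is_counterfactual(disj, world, v)]
print(conj_causes)  # ['A', 'B']
print(disj_causes)  # []
```

Under these assumptions the conjunction makes both variables counterfactual, while the disjunction makes neither counterfactual, which is the limitation the slide points at.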

Page 8: Contingencies

Contingencies generalize counterfactual causes.

A contingency is a hypothetical setting of the endogenous variables that makes a tuple counterfactual.

E.g. A is a cause under the contingency B=0.

Page 9: Responsibility (intuition)

Responsibility measures the degree of causality, that is, the contribution of a tuple.

A larger contingency means that a tuple has a smaller degree of causality.

Counterfactual causes contribute the most (empty contingency set).
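The intuition can be made concrete with a brute-force search for the smallest contingency. This is a sketch under stated assumptions: the lineage is given as a Python predicate over the set of tuples that are present, a contingency is a set of removed tuples, and responsibility is taken to be 1/(1 + size of the smallest contingency), with 0 for tuples that are not causes. The function names are hypothetical, and the search is exponential, so it is only meant for tiny examples.

```python
from fractions import Fraction
from itertools import combinations

def responsibility(lineage, present, t):
    """Try contingencies in order of size; the first one that makes t
    counterfactual is a smallest one."""
    others = sorted(present - {t})
    for k in range(len(others) + 1):
        for gamma in combinations(others, k):
            rest = present - set(gamma)
            # t is counterfactual once gamma is removed: the result still
            # holds with t present and fails when t is removed as well.
            if lineage(rest) and not lineage(rest - {t}):
                return Fraction(1, 1 + k)
    return Fraction(0)  # no contingency works: t is not a cause

# Assumed illustrative lineages over two tuples named A and B.
def disj(s):
    return "A" in s or "B" in s

def conj(s):
    return "A" in s and "B" in s

present = {"A", "B"}
print(responsibility(conj, present, "A"))  # 1
print(responsibility(disj, present, "A"))  # 1/2
```

In the disjunctive case A needs the contingency {B}, so its responsibility drops to 1/2, while in the conjunctive case A is counterfactual with the empty contingency and gets responsibility 1.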

Page 10: Causality for Conjunctive Queries

Definition: Causality [formula, with annotated terms: an answer to q, an endogenous tuple, the database, the endogenous tuples, the contingency]

Intuition: If the removal of t removes the answer, then t is counterfactual. If there is a set of tuples whose removal makes t counterfactual, then t is a cause.

Definition: Responsibility [formula]

Intuition: The more tuples that need to be removed, the less important t is.
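The formulas of these two definitions are not preserved in this transcript. A reconstruction that is consistent with the two intuitions might read as follows. The symbols are notational assumptions: $D$ is the database, $D^n \subseteq D$ the endogenous tuples, $a$ an answer to $q$, $t \in D^n$ the candidate cause, $\Gamma$ a contingency, and $\rho_t$ the responsibility of $t$.

```latex
% t is a counterfactual cause for the answer a:
a \in q(D) \quad\text{and}\quad a \notin q(D \setminus \{t\})

% t is a cause for a if some contingency makes it counterfactual:
\exists\, \Gamma \subseteq D^n \setminus \{t\}: \quad
  a \in q(D \setminus \Gamma) \quad\text{and}\quad
  a \notin q\bigl(D \setminus (\Gamma \cup \{t\})\bigr)

% Responsibility of t, minimizing over all contingencies for t
% (taken to be 0 if t is not a cause):
\rho_t = \frac{1}{1 + \min_{\Gamma} |\Gamma|}
```

With this reading, a counterfactual cause has $\Gamma = \emptyset$ and therefore $\rho_t = 1$, and every additional tuple that has to be removed lowers $\rho_t$.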

Page 11: Example

[Slide content: a query (Datalog notation), a database instance, the lineage expression of the answer, and the resulting responsibility values]

Assume all tuples are endogenous.

NOTE: If [tuple] is exogenous, [tuple] is not a cause.
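Since the slide's own query and data are not preserved, here is a stand-in example that follows the same steps. Everything in it is an assumption made for illustration: the Boolean query q :- R(x, y), S(y), the tiny database, and the helper names. It computes the lineage as a list of witnesses, then searches contingencies among the endogenous tuples only, which also shows how declaring tuples exogenous can stop another tuple from being a cause.

```python
from fractions import Fraction
from itertools import combinations

def lineage(R, S):
    """Witnesses of q :- R(x, y), S(y); each witness is a set of tuples."""
    return [frozenset({("R", x, y), ("S", y)}) for (x, y) in R if y in S]

def holds(witnesses, removed):
    """q is true if some witness survives the removal."""
    return any(not (w & removed) for w in witnesses)

def responsibility(witnesses, endogenous, t):
    """Smallest contingency drawn from the endogenous tuples only."""
    if t not in endogenous:
        return Fraction(0)  # exogenous tuples are never causes
    candidates = sorted(endogenous - {t})
    for k in range(len(candidates) + 1):
        for gamma in combinations(candidates, k):
            g = frozenset(gamma)
            if holds(witnesses, g) and not holds(witnesses, g | {t}):
                return Fraction(1, 1 + k)
    return Fraction(0)

# Assumed toy database.
R = {("a", "b"), ("a", "c")}
S = {"b", "c"}
W = lineage(R, S)  # R(a,b)S(b) OR R(a,c)S(c)
all_tuples = frozenset().union(*W)
t = ("R", "a", "b")

all_endogenous = responsibility(W, all_tuples, t)
partly_exogenous = responsibility(W, {t, ("S", "b")}, t)
print(all_endogenous)    # 1/2
print(partly_exogenous)  # 0
```

When all tuples are endogenous, removing R(a,c) or S(c) makes R(a,b) counterfactual. When R(a,c) and S(c) are exogenous, the second witness can never be removed, so R(a,b) is not a cause.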

Page 12: Complexity Results (Data Complexity)

[Table: complexity results, with columns for answers and non-answers; one entry is marked as a dichotomy]

Page 13: Responsibility: PTIME Queries

Assume conjunctive queries with no self-joins.

A simple case: [query]

The lineage of q will be of the form: [lineage expression]

What is the responsibility of [tuple]? It can be computed in PTIME.

Page 14: Responsibility: PTIME Queries

More interesting: [query]

[Figure: flow graph with parts labeled "R tuples" and "S tuples"]

Intuition: a cut in the graph interrupts the s-t flow. Adding the tuple t back restores the flow, so t becomes counterfactual.
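The cut intuition can be sketched for one concrete query. The code below assumes the Boolean query q :- R(x, y), S(y, z) with all tuples endogenous, and it only handles the responsibility of R tuples. It is a sketch of the idea and not necessarily the authors' exact algorithm: for each S tuple that joins with t, that S tuple is protected with infinite capacity, t is left out of the network, and the minimum cut gives the size of a contingency. Edmonds-Karp is used as a generic max-flow routine, and all names are hypothetical.

```python
from collections import defaultdict, deque
from fractions import Fraction

INF = 10**9  # stands in for an uncuttable (infinite-capacity) edge

def max_flow(cap, src, sink):
    """Edmonds-Karp max flow; equals the min-cut value by max-flow/min-cut."""
    res = defaultdict(lambda: defaultdict(int))  # residual capacities
    for u, edges in cap.items():
        for v, c in edges.items():
            res[u][v] += c
    flow = 0
    while True:
        parent = {src: None}
        queue = deque([src])
        while queue and sink not in parent:  # BFS for an augmenting path
            u = queue.popleft()
            for v, c in list(res[u].items()):
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
        flow += push

def responsibility_R(R, S, t):
    """Responsibility of the R tuple t for q :- R(x, y), S(y, z)."""
    if t not in R:
        return Fraction(0)
    best = None
    for keep in S:
        if keep[0] != t[1]:
            continue  # this S tuple does not join with t
        cap = defaultdict(dict)
        for (x, y) in R:
            if (x, y) != t:  # t is removed from the network
                cap["src"][("x", x)] = INF
                cap[("x", x)][("y", y)] = 1
        for (y, z) in S:
            # The witness partner of t is protected; other S tuples cost 1.
            cap[("y", y)][("z", z)] = INF if (y, z) == keep else 1
            cap[("z", z)]["sink"] = INF
        size = max_flow(cap, "src", "sink")
        best = size if best is None else min(best, size)
    return Fraction(0) if best is None else Fraction(1, 1 + best)

R = {(1, 1), (2, 1), (3, 2)}
S = {(1, 1), (2, 1)}
print(responsibility_R(R, S, (1, 1)))  # 1/3
print(responsibility_R(R, S, (3, 2)))  # 1/2
```

For R(1,1), the partner S(1,1) must stay, so the contingency has to remove R(2,1) plus one tuple of the witness R(3,2), S(2,1), giving 1/3. For R(3,2), removing S(1,1) alone cuts both remaining witnesses, giving 1/2.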

Page 15: Responsibility: Hard Queries

Theorem: The following queries are NP-hard: [queries]

[Annotation: relations are marked as endogenous or exogenous; if unspecified, a relation could be either]

Page 16: Query Dual Hypergraph

[Figures: query hypergraph; query dual hypergraph]

Definition: Linear Queries. A query is linear if there exists an ordering of the nodes of the dual hypergraph such that every hyperedge is a consecutive subsequence.

Theorem: Computing responsibility for all linear queries is in PTIME.

None of these are linear
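The linearity definition can be checked by brute force on small queries. The encoding below is an assumption made for this sketch: a query is a dict from atom names to their variables, the nodes of the dual hypergraph are the atoms, and each variable contributes one hyperedge containing the atoms it occurs in. The example queries, including the triangle query, are illustrative choices and are not taken from the slide.

```python
from itertools import permutations

def is_linear(query):
    """query maps each atom name to the set of variables it uses."""
    atoms = list(query)
    variables = set().union(*query.values())
    for order in permutations(atoms):
        pos = {atom: i for i, atom in enumerate(order)}
        consecutive = True
        for v in variables:
            # Positions of the atoms in the hyperedge of variable v.
            idx = sorted(pos[a] for a in atoms if v in query[a])
            if idx[-1] - idx[0] + 1 != len(idx):
                consecutive = False
                break
        if consecutive:
            return True
    return False

chain = {"R": {"x", "y"}, "S": {"y", "z"}, "T": {"z", "u"}}
triangle = {"R": {"x", "y"}, "S": {"y", "z"}, "T": {"z", "x"}}
print(is_linear(chain))     # True
print(is_linear(triangle))  # False
```

In the triangle query every pair of atoms shares a variable, so whichever atom is placed in the middle separates two atoms that must be adjacent, and no valid ordering exists. Trying all permutations is factorial in the number of atoms, which is acceptable here because the check runs on the query and not on the data.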

Page 17: Weakenings

Dissociation: R is exogenous, and therefore its tuples cannot be part of the contingency set. Expand R with the domain of z. The responsibility of T tuples is not affected!

[Figure labels: PTIME; NP-hard]

Page 18: Responsibility Dichotomy

Dichotomy Theorem (data complexity):

• If q is weakly linear, then computing responsibility for q is in PTIME.

• If q is not weakly linear, then computing responsibility for q is NP-hard.

Definition: Weakly Linear Queries. A query is weakly linear if there exists a set of weakenings that leads to a linear query.

Page 19: Conclusions

• Defined causality and responsibility for conjunctive queries

• Complete complexity analysis for CQ without self-joins: an interesting dichotomy result and a non-trivial algorithm for the PTIME cases

• Open problem: Self-joins