Full Disjunctions: Polynomial-Delay Iterators in Action

Sara Cohen (Technion, Israel)
Yaron Kanza (University of Toronto, Canada)
Benny Kimelfeld (Hebrew University, Israel)
Yehoshua Sagiv (Hebrew University, Israel)
Itzhak Fadida (Technion, Israel)

VLDB 2006, Seoul, Korea



Page 2: Computing Full Disjunctions

The full disjunction is a relational operator that maximally combines data from several relations.
– It extends the natural join by allowing incompleteness
– It extends the binary outerjoin to many relations

This paper presents algorithms and optimizations for computing full disjunctions.
– Theoretically, full disjunctions are more tractable than previously known
– Practically, a significant improvement over the state of the art, with an iterator-like evaluation

Page 3: Contents

Full Disjunctions
− Complexity
Contributions
Algorithms
− Algorithm NLOJ for Tree-Structured Schemes
− Algorithm PDelayFD for General Schemes
− Algorithm BiComNLOJ − Main Algorithm
Experimental Results
Conclusion

Page 4: Contents (section: Full Disjunctions)

Page 5: The Natural Join Operator

Climates
Country | Climate
Canada | diverse
Bahamas | tropical
UK | temperate

Accommodations
Country | City | Hotel | Stars
Canada | Toronto | Plaza | 4
Canada | London | Ramada | 3
Bahamas | Nassau | Hilton |

Sites
Country | City | Site
Canada | London | Air Show
Canada | | Mount Logan
UK | London | Buckingham
UK | London | Hyde Park

Climates ⋈ Accommodations ⋈ Sites
Country | Climate | City | Hotel | Stars | Site
Canada | diverse | London | Ramada | 3 | Air Show

Page 6: The Natural Join Misses Information

[Relations Climates, Accommodations and Sites and their natural join, as on Page 5]

Bahamas is not in Sites, so the natural join misses it.

Page 7: The Natural Join Misses Information

[Relations Climates, Accommodations and Sites and their natural join, as on Page 5]

Bahamas is not in Sites, so the natural join misses it.
Mount Logan is not in a city, hence it is missed.
Empty space means null value.

Page 8: The Natural Join Misses Information

[Relations Climates, Accommodations and Sites and their natural join, as on Page 5]

Bahamas is not in Sites, so the natural join misses it.
Mount Logan is not in a city, hence it is missed.

A looser notion of join is needed: one that enables joining tuples from some of the tables.

Page 9: The Natural Join Operator

[Relations Climates, Accommodations and Sites and their natural join, as on Page 5]

A tuple of the join corresponds to a set of tuples from the source relations that is:
– Join consistent
– Connected (no Cartesian product)
– Complete (one tuple from each relation)

Page 10: Join-Consistent Sets of Tuples

A set T of tuples is join-consistent if every two tuples of T are join-consistent.

Two tuples t1 and t2 are join-consistent if for every common attribute A:
1. t1[A] and t2[A] are non-null
2. t1[A] = t2[A]

Example:
Country | City | Hotel | Stars
Canada | London | Ramada | 3

Country | City | Site
Canada | London | Air Show
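The definition of join consistency can be sketched in a few lines of Python. This is an illustration only: tuples are modelled as dictionaries from attribute names to values, `None` stands for null, and the function names `join_consistent` and `set_join_consistent` are made up for this sketch.

```python
from itertools import combinations

def join_consistent(t1, t2):
    """True if every common attribute is non-null in both tuples and equal."""
    common = t1.keys() & t2.keys()
    return all(t1[a] is not None and t1[a] == t2[a] for a in common)

def set_join_consistent(tuples):
    """A set of tuples is join-consistent if every two of its tuples are."""
    return all(join_consistent(a, b) for a, b in combinations(tuples, 2))

ramada = {"Country": "Canada", "City": "London", "Hotel": "Ramada", "Stars": 3}
air_show = {"Country": "Canada", "City": "London", "Site": "Air Show"}
mount_logan = {"Country": "Canada", "City": None, "Site": "Mount Logan"}

print(join_consistent(ramada, air_show))     # True
print(join_consistent(ramada, mount_logan))  # False: City is null in one tuple
```

Note that two tuples without any common attribute are vacuously join-consistent; connectivity is what rules such combinations out.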

Page 11: Connected Sets of Tuples

The join graph of a set T of tuples:
– The nodes are the tuples of T
– There is an edge between every two tuples with a common attribute

A set of tuples is connected if its join graph is connected.

Example:
Country | Climate
Canada | diverse

Country | City | Site
UK | London | Buckingham

City | Hotel | Stars
Toronto | Plaza | 4
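As a rough illustration of the join graph, the following Python sketch checks connectivity with a depth-first search. Tuples are again dictionaries from attribute names to values, and the function name `connected` is an assumption of this sketch, not something taken from the slides.

```python
def connected(tuples):
    """True if the join graph of the given list of tuples is connected.

    Two tuples are adjacent when they share at least one attribute name;
    attribute values play no role here.
    """
    if not tuples:
        return True
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j in range(len(tuples)):
            if j not in seen and tuples[i].keys() & tuples[j].keys():
                seen.add(j)
                stack.append(j)
    return len(seen) == len(tuples)

climate = {"Country": "Canada", "Climate": "diverse"}
site = {"Country": "UK", "City": "London", "Site": "Buckingham"}
hotel = {"City": "Toronto", "Hotel": "Plaza", "Stars": 4}

print(connected([climate, site, hotel]))  # True: linked via Country and City
print(connected([climate, hotel]))        # False: no common attribute
```

The three example tuples are connected even though they are not join-consistent; the two properties are checked independently.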

Page 12: Natural Join (w/o Cartesian Product)

Each tuple of the result corresponds to a set T of tuples from the source relations such that:
1. T is join consistent
2. T is connected (no Cartesian product)
3. T is complete (one tuple from each relation)

JCC: a set that is join consistent and connected (conditions 1 and 2).

Page 13: Full Disjunction (Galindo-Legaria 1994)

Each tuple of the result corresponds to a set T of tuples from the source relations such that:
1. T is join consistent
2. T is connected (no Cartesian product)
3. T is maximal: not properly contained in any JCC set (this replaces the requirement that T is complete, i.e., has one tuple from each relation)

JCC: a set that is join consistent and connected (conditions 1 and 2).


Page 18: An Example of a Full Disjunction

R:

Climates
Country | Climate
Canada | diverse
UK | temperate

Accommodations
Country | City | Hotel | Stars
Canada | Toronto | Plaza | 4
Canada | London | Ramada | 3

Sites
Country | City | Site
Canada | London | Air Show
Canada | | Mount Logan
UK | London | Buckingham

FD(R):
Country | Climate | City | Hotel | Stars | Site
Canada | diverse | Toronto | Plaza | 4 |
Canada | diverse | London | Ramada | 3 | Air Show
Canada | diverse | | | | Mount Logan
UK | temperate | London | | | Buckingham


Page 20: Padding Joined Tuple Sets with Nulls

Country | City | Site
Canada | | Mount Logan

Country | Climate
Canada | diverse

Joined and padded with nulls:
Country | Climate | City | Hotel | Stars | Site
Canada | diverse | | | | Mount Logan
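To make the definition concrete, here is a brute-force Python sketch that computes the full disjunction of the example relations directly from the definition: enumerate all tuple sets, keep the JCC ones, keep the maximal ones, and pad each with nulls. It is deliberately naive (exponential in the number of tuples) and is not one of the algorithms of the talk; all function names are assumptions of this sketch, and `None` stands for null.

```python
from itertools import combinations

SCHEMA = ["Country", "Climate", "City", "Hotel", "Stars", "Site"]

# The relations R of the example (Climates, Accommodations, Sites).
DB = [
    {"Country": "Canada", "Climate": "diverse"},
    {"Country": "UK", "Climate": "temperate"},
    {"Country": "Canada", "City": "Toronto", "Hotel": "Plaza", "Stars": 4},
    {"Country": "Canada", "City": "London", "Hotel": "Ramada", "Stars": 3},
    {"Country": "Canada", "City": "London", "Site": "Air Show"},
    {"Country": "Canada", "City": None, "Site": "Mount Logan"},
    {"Country": "UK", "City": "London", "Site": "Buckingham"},
]

def join_consistent(t1, t2):
    return all(t1[a] is not None and t1[a] == t2[a]
               for a in t1.keys() & t2.keys())

def connected(tuples):
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j in range(len(tuples)):
            if j not in seen and tuples[i].keys() & tuples[j].keys():
                seen.add(j)
                stack.append(j)
    return len(seen) == len(tuples)

def is_jcc(tuples):
    pairs_ok = all(join_consistent(a, b) for a, b in combinations(tuples, 2))
    return pairs_ok and connected(tuples)

def pad(tuples):
    """Join the tuples of a JCC set and pad missing attributes with nulls."""
    row = dict.fromkeys(SCHEMA)
    for t in tuples:
        row.update({a: v for a, v in t.items() if v is not None})
    return tuple(row[a] for a in SCHEMA)

def full_disjunction(db):
    ids = range(len(db))
    jcc = [frozenset(c) for k in range(1, len(db) + 1)
           for c in combinations(ids, k) if is_jcc([db[i] for i in c])]
    # Keep only the JCC sets that are not properly contained in another one.
    maximal = [s for s in jcc if not any(s < other for other in jcc)]
    return [pad([db[i] for i in sorted(s)]) for s in maximal]

for row in full_disjunction(DB):
    print(row)
```

On this input the sketch prints the four rows of FD(R) from the example, though not necessarily in the order used on the slide.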

Page 21: The Outerjoin Operator

The outerjoin of two relations R1 and R2, written R1 ⟗ R2, consists of the natural join R1 ⋈ R2 and, in addition, all dangling tuples padded with nulls.

Page 22: Example of an Outerjoin

Climates
Country | Climate
Canada | diverse
Bahamas | tropical
UK | temperate

Accommodations
Country | City | Hotel | Stars
Canada | Toronto | Plaza | 4
France | Paris | Atala | 4
Bahamas | Nassau | Hilton |

Climates ⟗ Accommodations
Country | Climate | City | Hotel | Stars
Canada | diverse | Toronto | Plaza | 4
Bahamas | tropical | Nassau | Hilton |
UK | temperate | | |
France | | Paris | Atala | 4
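The outerjoin of this example can be reproduced with a small nested-loop sketch in Python. The function `outerjoin` and its signature are hypothetical, chosen for illustration; relations are lists of dictionaries that have a key for every attribute of their schema, and `None` stands for null.

```python
def outerjoin(r1, schema1, r2, schema2):
    """Natural full outerjoin of two relations given as lists of dicts."""
    common = [a for a in schema1 if a in schema2]
    schema = schema1 + [a for a in schema2 if a not in schema1]
    joined, matched = [], set()
    for t1 in r1:
        hits = [j for j, t2 in enumerate(r2)
                if all(t1[a] is not None and t1[a] == t2[a] for a in common)]
        matched.update(hits)
        # A dangling t1 (no hits) is kept on its own and padded below.
        joined += [{**r2[j], **t1} for j in hits] or [t1]
    # Dangling tuples of r2 are kept as well.
    joined += [t2 for j, t2 in enumerate(r2) if j not in matched]
    # t.get(a) yields None for missing attributes, i.e. pads with nulls.
    return [tuple(t.get(a) for a in schema) for t in joined]

climates = [
    {"Country": "Canada", "Climate": "diverse"},
    {"Country": "Bahamas", "Climate": "tropical"},
    {"Country": "UK", "Climate": "temperate"},
]
accommodations = [
    {"Country": "Canada", "City": "Toronto", "Hotel": "Plaza", "Stars": 4},
    {"Country": "France", "City": "Paris", "Hotel": "Atala", "Stars": 4},
    {"Country": "Bahamas", "City": "Nassau", "Hotel": "Hilton", "Stars": None},
]

result = outerjoin(climates, ["Country", "Climate"],
                   accommodations, ["Country", "City", "Hotel", "Stars"])
for row in result:
    print(row)
```

The UK tuple of Climates and the France tuple of Accommodations are dangling, so each appears once in the result, padded with nulls.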

Page 23: Combining Relations using Outerjoins

The outerjoin operator is not associative: for more than two relations, the result depends on the order in which the outerjoins are applied.

In general, outerjoins cannot maximally combine relations (no matter what order is used).

Outerjoin is not suitable for combining more than two relations!
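A tiny example, sketched in Python, shows the non-associativity. The relations R1(A, B), R2(B, C) and R3(A, C) and their single tuples are invented for this illustration, as is the helper `outerjoin`; a relation is a pair of a schema and a list of dictionaries, with `None` for null.

```python
def outerjoin(left, right):
    """Natural full outerjoin of two (schema, rows) pairs."""
    (ls, lrows), (rs, rrows) = left, right
    common = [a for a in rs if a in ls]
    schema = ls + [a for a in rs if a not in ls]
    out, matched = [], set()
    for t in lrows:
        hits = [j for j, r in enumerate(rrows)
                if all(t[a] is not None and t[a] == r[a] for a in common)]
        matched.update(hits)
        for j in hits:
            out.append({**rrows[j], **t})
        if not hits:  # dangling left tuple, padded with nulls
            out.append({**dict.fromkeys(schema), **t})
    for j, r in enumerate(rrows):
        if j not in matched:  # dangling right tuple, padded with nulls
            out.append({**dict.fromkeys(schema), **r})
    return schema, out

def as_set(rel):
    return {tuple(t[a] for a in ("A", "B", "C")) for t in rel[1]}

R1 = (["A", "B"], [{"A": "a1", "B": "b1"}])
R2 = (["B", "C"], [{"B": "b1", "C": "c1"}])
R3 = (["A", "C"], [{"A": "a1", "C": "c2"}])

left_first = as_set(outerjoin(outerjoin(R1, R2), R3))
right_first = as_set(outerjoin(R1, outerjoin(R2, R3)))
print(left_first)
print(right_first)
print(left_first == right_first)  # False
```

Here (R1 ⟗ R2) ⟗ R3 yields (a1, b1, c1) and (a1, null, c2), while R1 ⟗ (R2 ⟗ R3) yields (a1, b1, null), (null, b1, c1) and (a1, null, c2). Neither contains (a1, b1, c2), the combination of the R1 and R3 tuples, which is a maximal JCC set and therefore belongs to the full disjunction.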

Page 24: Contents (section: Complexity)

Page 25: Efficiency of Evaluation

The full-disjunction operator (as well as other operators, like the Cartesian product or the natural join) can generate a number of tuples that is exponential in the input size.

Polynomial running time is therefore not a suitable yardstick.

The usual notion: polynomial time in the combined size of the input and the output.

Page 26: History of Algorithms for Full Disjunctions

Source | Time | Databases
RU96 | O(n + F^2) | γ-acyclic
KS03 | O(n^5 N^2 F^2) | general
CS05 | O(n^3 N F^2), "incremental polynomial" | general

n: number of relations
N: number of tuples in the DB
F: number of tuples in the FD

F is typically very large; it can be exponential in the size of the database.

This paper: linear dependence on F.

Page 27: Polynomial Delay

One way to obtain an evaluation with a running time linear in the output is to devise an algorithm that acts as an iterator with an efficient next() operator, that is, an enumeration algorithm that runs with polynomial delay.

An enumeration algorithm runs with polynomial delay if the time between every two successive answers is polynomial in the size of the input.
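The idea of an iterator whose output is huge but whose next() is cheap can be illustrated with a Cartesian product, one of the operators mentioned earlier as having exponential output. The Python sketch below uses the standard `itertools.product`, which enumerates combinations lazily; the relation contents are made up for the illustration and this is not the full-disjunction algorithm.

```python
from itertools import islice, product

n = 40
# 40 hypothetical relations with two tuples each.
relations = [[(i, 0), (i, 1)] for i in range(n)]

# The product has 2**40 answers, but they are produced one at a time.
answers = product(*relations)

# The consumer can stop after the first few answers.
first_three = list(islice(answers, 3))
print(len(first_three))  # 3
```

The first answers arrive immediately even though the complete output could never be materialized in practice, which is the behaviour a polynomial-delay algorithm guarantees for every successive answer.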

Page 28: Other Benefits of Polynomial Delay

Incremental evaluation
– First tuples are generated quickly
– Full disjunctions are large, yet the user need not wait for the whole result to be generated
– Suitable for Web applications, where users expect to get the first few pages quickly
– In addition, the user can decide at any time that enough information has been shown

Enables parallel query processing
– While one processor generates the FD tuples, other processors apply further processing

Page 29: Contents (section: Contributions)

Page 30: Main Contributions

1. First algorithm for computing full disjunctions with polynomial delay
2. First algorithm for computing full disjunctions in time linear in the output
3. A general optimization technique for computing full disjunctions: division into biconnected components

A substantial improvement over the state of the art is shown theoretically and experimentally.

Page 31: Contents (section: Algorithms)

Page 32: Our Algorithms

– Algorithm NLOJ: tree schemes
– Algorithm PDelayFD: general schemes
– Optimization: division into biconnected components

Combined, these yield Algorithm BiComNLOJ, the main algorithm, for general schemes.

Page 33: Contents (section: Algorithm NLOJ for Tree-Structured Schemes)

Page 34: Tree Schemes

Tree schemes: scheme graphs without cycles.

In the scheme graph, the relation schemes are the nodes, and there is an edge between every two schemes with one or more common attributes.

[Figure: a tree-shaped scheme graph over relations R1 to R7]

Page 35: Left-Deep Sequence of Outerjoins

R: a set of relations with a tree scheme
R1,…,Rn: a connected-prefix order of R

Proposition: FD(R) = (…((R1 ⟗ R2) ⟗ R3) …) ⟗ Rn

Algorithm NLOJ (Nested Loop OuterJoin):
1. Compute a connected-prefix order of R
2. Apply outerjoins in a left-deep order

Page 36: Connected-Prefix Order of Relations

A connected-prefix order of relations: each prefix forms a (connected) subtree.

[Figure: the tree-shaped scheme graph over relations R1 to R7]

Example order: R1 R3 R2 R7 R4 R5 R6
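One simple way to obtain a connected-prefix order is a breadth-first traversal of the scheme graph, since every visited node is adjacent to a node visited before it. The Python sketch below assumes that approach (the slides do not say how the order is computed); the scheme names, attributes and function names are hypothetical.

```python
from collections import deque

def scheme_graph(schemes):
    """schemes: dict mapping a relation name to its set of attributes.

    Two schemes are adjacent when they share at least one attribute.
    """
    names = list(schemes)
    return {r: [s for s in names if s != r and schemes[r] & schemes[s]]
            for r in names}

def connected_prefix_order(schemes, root=None):
    """Order the relations so that every prefix is connected (BFS order)."""
    graph = scheme_graph(schemes)
    root = root or next(iter(schemes))
    order, seen, queue = [], {root}, deque([root])
    while queue:
        r = queue.popleft()
        order.append(r)
        for s in graph[r]:
            if s not in seen:
                seen.add(s)
                queue.append(s)
    if len(order) != len(schemes):
        raise ValueError("scheme graph is not connected")
    return order

# A small tree scheme: R1-R2 share A, R1-R3 share B, R3-R4 share D.
SCHEMES = {
    "R1": {"A", "B"},
    "R2": {"A", "C"},
    "R3": {"B", "D"},
    "R4": {"D", "E"},
}
print(connected_prefix_order(SCHEMES))  # ['R1', 'R2', 'R3', 'R4']
```

Other traversals, such as depth-first search, would give different but equally valid connected-prefix orders.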

Page 37: Achieving Polynomial Delay

Algorithm NLOJ (Nested Loop OuterJoin):
1. Compute a connected-prefix order of R
2. Apply outerjoins in a left-deep order

(…((R1 ⟗ R2) ⟗ R3) … ⟗ Rn-1) ⟗ Rn

Problem: an intermediate result can already have exponential size, which leads to exponential delay.
Solution: use iterators.

Page 38: Iterators

To obtain polynomial delay, we use iterators.

An iterator:
– Operates on top of an enumeration algorithm
– Implements next() by controlling the execution of that algorithm

Page 39: Using Iterators for Outerjoins

(…((R1 ⟗ R2) ⟗ R3) … ⟗ Rn-1) ⟗ Rn

[Figure: a chain of iterators (Iterator 1, Iterator 2, …, Iterator n-1, Iterator n) laid over the left-deep sequence of outerjoins]
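The chain of iterators can be imitated with Python generators: each outerjoin pulls tuples from the iterator below it one at a time, so no intermediate result is materialized. This is a simplified sketch under stated assumptions, not the implementation from the paper: the relations R1(A, B), R2(B, C), R3(C, D) and the names `outerjoin_iter` and `nloj` are invented, base relations are held in memory as lists of dictionaries, and `None` stands for null.

```python
def outerjoin_iter(left, left_schema, right, right_schema):
    """Lazily outerjoin a stream of padded tuples with a base relation."""
    common = [a for a in right_schema if a in left_schema]
    new = [a for a in right_schema if a not in left_schema]
    matched = set()
    for t in left:
        hit = False
        for j, r in enumerate(right):
            if all(t[a] is not None and t[a] == r[a] for a in common):
                matched.add(j)
                hit = True
                yield {**t, **{a: r[a] for a in new}}
        if not hit:  # dangling left tuple, padded with nulls
            yield {**t, **dict.fromkeys(new)}
    for j, r in enumerate(right):
        if j not in matched:  # dangling right tuple, padded with nulls
            yield {**dict.fromkeys(left_schema), **r}

def nloj(relations):
    """relations: list of (schema, rows) pairs in a connected-prefix order."""
    schema, rows = relations[0]
    stream, schema = iter(rows), list(schema)
    for s, r in relations[1:]:
        stream = outerjoin_iter(stream, schema, r, s)
        schema = schema + [a for a in s if a not in schema]
    return stream

R1 = (["A", "B"], [{"A": 1, "B": 1}, {"A": 2, "B": 2}])
R2 = (["B", "C"], [{"B": 1, "C": 1}, {"B": 3, "C": 3}])
R3 = (["C", "D"], [{"C": 1, "D": 1}, {"C": 3, "D": 3}, {"C": 5, "D": 5}])

stream = nloj([R1, R2, R3])
first = next(stream)   # produced without computing the rest
rest = list(stream)
print(first)
print(rest)
```

Calling next() on the outermost generator drives the inner ones only as far as needed to produce one more answer, which is the role the iterators play in the figure.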

Page 40: Outerjoins are not Always Applicable

It is not always possible to formulate a full disjunction as a left-deep sequence of outerjoins.

Rajaraman and Ullman [PODS 96]: some full disjunctions cannot be formulated as expressions of outerjoins (i.e., with arbitrary placement of parentheses).

Page 41: Contents (section: Algorithm PDelayFD for General Schemes)

Page 42: About the Algorithm

Unlike NLOJ, the next algorithm, PDelayFD, is applicable to all schemes (and not just trees).

Algorithm PDelayFD has a polynomial delay, but the delay is larger than that of NLOJ.

Nevertheless, PDelayFD by itself is a significant improvement over the state of the art.

Page 43: Shifting a Maximal JCC Tuple Set T

t-shifting T, for a tuple t, produces the t-shift of T:
1. Add t to T
2. Extract the maximal JCC subset containing t
3. Extend to a maximal JCC set

Page 44: Algorithm PDelayFD

1. Generate a maximal JCC set T0
2. Insert T0 into Q

Repeat until Q is empty:
1. Move some T from Q to C
2. Print the join of T, padded with nulls
3. Insert into Q a t-shift of T for all tuples t in the database, after validating that the t-shift is not already in Q or C

Theorem: PDelayFD(R) computes FD(R) with polynomial delay.
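The steps above can be sketched in Python as follows. This is a simplified reading of the slide, not the authors' implementation: it makes no claim about the delay it achieves, it yields the maximal JCC sets themselves rather than their padded joins, and the tuple encoding and the helper names (`row`, `jc`, `linked`, `extend`, `shift`, `pdelay_fd`) are assumptions made for illustration. `None` stands for null.

```python
from collections import deque

def row(rel, **attrs):
    """A database tuple: (relation name, sorted attribute/value pairs)."""
    return (rel, tuple(sorted(attrs.items())))

def jc(t1, t2):
    """Join-consistent tuples of different relations (assumption of this
    sketch: a set holds at most one tuple per relation)."""
    d1, d2 = dict(t1[1]), dict(t2[1])
    return t1[0] != t2[0] and all(
        d1[a] is not None and d1[a] == d2[a] for a in d1.keys() & d2.keys())

def linked(t1, t2):
    """Edge of the join graph: the two tuples share an attribute."""
    return bool(dict(t1[1]).keys() & dict(t2[1]).keys())

def extend(T, db):
    """Greedily grow a JCC set into a maximal JCC set."""
    T, grown = set(T), True
    while grown:
        grown = False
        for t in db:
            if (t not in T and all(jc(t, s) for s in T)
                    and any(linked(t, s) for s in T)):
                T.add(t)
                grown = True
    return frozenset(T)

def shift(T, t, db):
    """The t-shift of T: add t, keep the JCC part containing t, extend."""
    keep = {s for s in T if jc(s, t)}
    comp, stack = {t}, [t]
    while stack:  # connected component of t among the kept tuples
        u = stack.pop()
        for s in keep - comp:
            if linked(u, s):
                comp.add(s)
                stack.append(s)
    return extend(comp, db)

def pdelay_fd(db):
    if not db:
        return
    queue = deque([extend({db[0]}, db)])  # Q
    seen = set(queue)                     # everything in Q or C
    while queue:
        T = queue.popleft()               # move T from Q to C
        yield T
        for t in db:
            S = shift(T, t, db)
            if S not in seen:
                seen.add(S)
                queue.append(S)

DB = [
    row("Climates", Country="Canada", Climate="diverse"),
    row("Climates", Country="UK", Climate="temperate"),
    row("Accommodations", Country="Canada", City="Toronto", Hotel="Plaza", Stars=4),
    row("Accommodations", Country="Canada", City="London", Hotel="Ramada", Stars=3),
    row("Sites", Country="Canada", City="London", Site="Air Show"),
    row("Sites", Country="Canada", City=None, Site="Mount Logan"),
    row("Sites", Country="UK", City="London", Site="Buckingham"),
]

for T in pdelay_fd(DB):
    print(sorted(dict(t[1]).get("Site") or dict(t[1]).get("Hotel")
                 or dict(t[1])["Climate"] for t in T))
```

On the example relations used earlier in the talk, the generator yields four tuple sets, one for each row of FD(R).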

Page 45: Contents (section: Algorithm BiComNLOJ − Main Algorithm)

Page 46: NLOJ vs. PDelayFD

[Figure: scheme graphs over relations R1 to R10, labeled NLOJ, PDelayFD and "?"]

Our approach: divide and conquer

Shorter delays, less space, simpler to implement

Page 47: Biconnected Components

Biconnected component: a maximal subset B of relations, such that the scheme graph has two (or more) disjoint paths between every two relations of B.

[Figure: a scheme graph over relations R1 to R9 and its division into biconnected components]

Page 48: Left-Deep Sequence of Outerjoins

R: a set of relations

Theorem: There exists an (efficiently computable) order B1,…,Bk of the biconnected components of R, such that
FD(R) = (…(FD(B1) ⟗ FD(B2)) …) ⟗ FD(Bk)

Optimized algorithm:
1. Compute the biconnected components of R
2. Compute the full disjunction of each component
3. Apply outerjoins in a suitable order

Page 49: BiComNLOJ: a Naïve Attempt

1. Divide R into biconnected components B1,…,Bk, in a suitable order
2. Compute FD(B1),…,FD(Bk) using PDelayFD
3. Using NLOJ, compute (…(FD(B1) ⟗ FD(B2)) …) ⟗ FD(Bk)

Problem: each FD(Bi) can be exponential in the input, so the delay is non-polynomial!
Solution: use iterators.

Page 50: Retaining Polynomial Delay: 1st Problem

For simplification, assume only two components.

[Figure: a scheme graph over relations R1 to R8, divided into components B1 and B2]

• After generating a tuple t of FD(B1), we need to generate all tuples of FD(B2) that can join t
• Non-polynomial delay if all of FD(B2) is computed for finding these tuples!
• Solution: PDelayFD can be modified so that it generates only those tuples of FD(B2) that can join t

Details in the proceedings…

Page 51: Retaining Polynomial Delay: 2nd Problem

For simplification, assume only two components.

[Figure: a scheme graph over relations R1 to R8, divided into components B1 and B2]

• The last step is to generate all tuples of FD(B2) that cannot be joined with tuples of FD(B1)
• However, this task is by itself NP-hard!
• Solution: When generating all tuples of FD(B2) that can be joined with some tuple of FD(B1), we collect enough information for generating the remaining tuples of FD(B2)

Details in the proceedings…

Page 52: Full Disjunctions: Polynomial-Delay Iterators in Action Sara Cohen Technion Israel Yaron Kanza University of Toronto Canada Benny Kimelfeld Hebrew University

Contents

Full Disjunctions
− Complexity

Contributions

Algorithms
− Algorithm NLOJ for Tree-Structured Schemes
− Algorithm PDelayFD for General Schemes
− Algorithm BiComNLOJ − Main Algorithm

→ Experimental Results

Conclusion

Page 53: Full Disjunctions: Polynomial-Delay Iterators in Action Sara Cohen Technion Israel Yaron Kanza University of Toronto Canada Benny Kimelfeld Hebrew University

Experimental Setting

Algorithms: PDelayFD, BiComNLOJ (main), IncrementalFD (CS05, state-of-art)

Implementation: PostgreSQL (open source)

HW: Pentium 4, 1.6 GHz, 512 MB RAM

• Synthetic data (randomly generated)

• Fixed schemes

[Figure: the three fixed scheme graphs, Scheme S1, Scheme S2 and Scheme S3, each over relations R1–R10]

Page 54: Full Disjunctions: Polynomial-Delay Iterators in Action Sara Cohen Technion Israel Yaron Kanza University of Toronto Canada Benny Kimelfeld Hebrew University

State-of-Art vs. Main Algorithm

[Figure: three plots (Scheme 1, Scheme 2, Scheme 3) of average delay (msec, axis 0–150) against the number of tuples in each relation (1000–5000), comparing IncrementalFD (state of art, CS05) with BiComNLOJ (our main algorithm)]

BiComNLOJ is a substantial improvement over the state-of-art

Page 55: Full Disjunctions: Polynomial-Delay Iterators in Action Sara Cohen Technion Israel Yaron Kanza University of Toronto Canada Benny Kimelfeld Hebrew University

Division into Biconnected Components

[Figure: three plots (Scheme 1, Scheme 2, Scheme 3) of average delay (msec, axis 0–100) against the number of tuples in each relation (1000–5000), comparing PDelayFD (no division into biconnected components) with BiComNLOJ (our main algorithm)]

Division reduces delays (amount depends on the scheme)

Page 56: Full Disjunctions: Polynomial-Delay Iterators in Action Sara Cohen Technion Israel Yaron Kanza University of Toronto Canada Benny Kimelfeld Hebrew University

Behavior of Delay

Measure the delay before each generated tuple

[Figure: plot of delay (msec, axis 0–200) against tuple number (0–15000), comparing IncrementalFD (state of art, CS05) with BiComNLOJ (our main algorithm)]

While IncrementalFD has a slowdown, the delay of BiComNLOJ remains almost constant
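The measurement described on this slide, the delay before each generated tuple, can be sketched in Python as follows. This is not the authors' benchmarking code: `measure_delays` and `toy_iterator` are hypothetical names, and the toy workload merely stands in for an iterator over a full disjunction.

```python
import time

def measure_delays(iterator):
    """Return the delay in msec observed before each item the iterator yields."""
    delays = []
    last = time.perf_counter()
    for _ in iterator:
        now = time.perf_counter()
        delays.append((now - last) * 1000.0)
        last = now
    return delays

def toy_iterator(n):
    # Hypothetical workload: a fixed amount of work before every result.
    for i in range(n):
        sum(range(1000))
        yield i

delays = measure_delays(toy_iterator(50))
average_delay = sum(delays) / len(delays)
```

Plotting `delays` against the item index gives a curve of the kind shown on this slide, while `average_delay` corresponds to the quantity on the y-axis of the two preceding slides.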

Page 57: Full Disjunctions: Polynomial-Delay Iterators in Action Sara Cohen Technion Israel Yaron Kanza University of Toronto Canada Benny Kimelfeld Hebrew University

Contents

Full Disjunctions
− Complexity

Contributions

Algorithms
− Algorithm NLOJ for Tree-Structured Schemes
− Algorithm PDelayFD for General Schemes
− Algorithm BiComNLOJ − Main Algorithm

Experimental Results

→ Conclusion

Page 58: Full Disjunctions: Polynomial-Delay Iterators in Action Sara Cohen Technion Israel Yaron Kanza University of Toronto Canada Benny Kimelfeld Hebrew University

Summary

Full Disjunction: An associative extension of the outerjoin operator to an arbitrary number of relations

3 algorithms for computing FD:

• NLOJ (Nested-Loop Outerjoin): Tree-Structured Schemes

• PDelayFD (Polynomial-Delay Full Disjunction): General Schemes

• BiComNLOJ (combines the first 2, deploys division into biconnected components): General Schemes
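To make the notion concrete, here is a brute-force Python sketch of a full disjunction over tiny relations. It assumes the definition commonly used in the full-disjunction literature, which this slide does not spell out: the result consists of the maximal sets of tuples, at most one per relation, that are pairwise join-consistent and whose relations are connected through shared attributes. The function, the dict-based tuples and the travel data are assumptions for illustration, the inputs are assumed to contain no nulls, and the enumeration is exponential, so it is not a stand-in for NLOJ, PDelayFD or BiComNLOJ.

```python
from itertools import product

def full_disjunction(relations):
    """relations: dict mapping a relation name to a list of dict tuples (no nulls)."""
    names = sorted(relations)
    all_attrs = sorted({a for rows in relations.values() for t in rows for a in t})

    def consistent(chosen):
        # Every pair of tuples agrees on all shared attributes.
        return all(t[a] == s[a]
                   for _, t in chosen for _, s in chosen
                   for a in set(t) & set(s))

    def connected(chosen):
        # Tuples are adjacent when their relations share an attribute.
        if not chosen:
            return False
        seen, frontier = {0}, [0]
        while frontier:
            i = frontier.pop()
            for j in range(len(chosen)):
                if j not in seen and set(chosen[i][1]) & set(chosen[j][1]):
                    seen.add(j)
                    frontier.append(j)
        return len(seen) == len(chosen)

    candidates = set()
    for combo in product(*[[None] + list(relations[n]) for n in names]):
        chosen = [(n, t) for n, t in zip(names, combo) if t is not None]
        if connected(chosen) and consistent(chosen):
            candidates.add(frozenset((n, tuple(sorted(t.items()))) for n, t in chosen))

    result = []
    for c in candidates:
        if any(c < d for d in candidates):
            continue  # not maximal: a strictly larger tuple set exists
        row = dict.fromkeys(all_attrs)  # pad missing attributes with None
        for _, items in c:
            row.update(items)
        result.append(row)
    return result

# Hypothetical travel data.
relations = {
    "Climates": [{"country": "Canada", "climate": "diverse"},
                 {"country": "Bahamas", "climate": "tropical"}],
    "Accommodations": [{"country": "Canada", "city": "Toronto", "hotel": "Plaza"}],
    "Sites": [{"country": "Canada", "city": "London", "site": "Air Show"}],
}
fd = full_disjunction(relations)
```

Here the Canada climate tuple combines once with the hotel and once with the site, because the hotel and the site disagree on the city, while the Bahamas tuple survives on its own, padded with nulls.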

Page 59: Full Disjunctions: Polynomial-Delay Iterators in Action Sara Cohen Technion Israel Yaron Kanza University of Toronto Canada Benny Kimelfeld Hebrew University

Contributions

• Substantial improvement of evaluation time over the state-of-art
  − Proved theoretically and experimentally

• Full disjunctions can be computed with polynomial delay and in time linear in the output size

• Optimization techniques for computing FDs

• Implementation within PostgreSQL (ongoing…)

• Incorporating our algorithms into an SQL optimizer
  − E.g., some operators can be pushed through the FD
  − Not discussed here, appears in the proceedings…

Page 60: Full Disjunctions: Polynomial-Delay Iterators in Action Sara Cohen Technion Israel Yaron Kanza University of Toronto Canada Benny Kimelfeld Hebrew University

Thank you.

Questions?