on the use of query-driven xml auto-indexing€¦ · isp: 10 tables each having 8 columns and max 3...

55
University of Kaiserslautern On the Use of Query-driven XML Auto-Indexing Karsten Schmidt and Theo Härder SMDB'10 (ICDE), Long Beach – March, 1

Upload: others

Post on 20-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

On the Use of Query-driven XML Auto-Indexing

Karsten Schmidt and Theo Härder

SMDB'10 (ICDE), Long Beach – March, 1

Page 2: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

Motivation – Self-Tuning '10● The last 10+ years

– Index tuning

What-if

Wizards, Guides, Druids

– Monitoring

– Workload analysis

– Static resources

– Single-user optimization

– …● Today: dynamic environments (queries, data, resources)

– Self-tuning has to be done Permanently and Online1

Page 3: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

Motivation – Index Tuning● Which column(s) to index?

2

ID Name Age DepID Location

1 Norman 28 12A North

2 Charles 58 98 West

Page 4: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

Motivation – Index Tuning● Which column(s) to index?

2

PK?

ID Name Age DepID Location

1 Norman 28 12A North

2 Charles 58 98 West

Page 5: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

Motivation – Index Tuning● Which column(s) to index?

2

ID Name Age DepID Location

1 Norman 28 12A North

2 Charles 58 98 West

query-supportingcandidate?

Page 6: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

Motivation – Index Tuning● Which column(s) to index?

2

ID Name Age DepID Location

1 Norman 28 12A North

2 Charles 58 98 West

multiple candidates?

Page 7: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

Motivation – Index Tuning● Which column(s) to index?

2

ID Name Age DepID Location

1 Norman 28 12A North

2 Charles 58 98 West

multi-column indexor

multiple indexes ?

Page 8: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

ID Name Age DepID Location

1 Norman 28 12A North

2 Charles 58 98 West

ID Name Age DepID Location

1 Norman 28 12A North

2 Charles 58 98 West

Motivation – Index Tuning● Which column(s) to index?

2

ID Name Age DepID Location

1 Norman 28 12A North

2 Charles 58 98 West

ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes

redundancyand

maintenance!

Page 9: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

ID Name Age DepID Location

1 Norman 28 12A North

2 Charles 58 98 West

ID Name Age DepID Location

1 Norman 28 12A North

2 Charles 58 98 West

Motivation – Index Tuning● Which column(s) to index?

2

ID Name Age DepID Location

1 Norman 28 12A North

2 Charles 58 98 West

ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes

● XML specifics on top of the ISP

– Semi-structured and schema changes

– Indexing may focus on structure and/or content

– Flexible path expressions (axes, wildcard, name)

→ containment problem and generalized indexes

→ descendant axis is evil, but wildcards too

– Node types (elements, attributes, text, …)

Page 10: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

Outline

● XML Index Types● Statistics for Cost-based Index Selection● Index Configuration Management● Evaluation● Summary & Outlook

Page 11: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

Outline

● XML Index Types● Statistics for Cost-based Index Selection● Index Configuration Management● Evaluation● Summary & Outlook

Page 12: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

XML Index Types - Storage● XML Mapping

– Native

– No Shredding or Blob

● Fundamentals– Node labeling

– B-tree

4

Page 13: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

XML Index Types - Storage

5

bib

publication

book book book . . .

year id title

1994 1 TCP/IP

author

last first

Stevens W.

price

65.95

Structure

content

. . .

● XML Mapping Elementless

Page 14: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

XML Index Types - Storage

5

1

1.3

bib

publication

1.3.3 book 1.3.5 book 1.3.7 book . . .

1.3.3.1year

1.3.3.2.1id

1.3.3.3title

1994 1 TCP/IP

1.3.3.5author

1.3.3.5.3last

1.3.5.5first

Stevens W.

1.3.3.7price

65.95

1.3.3.1.3 1.3.3.2.1.3 1.3.3.3.3

1.3.3.5.3.3 1.3.3.5.5.3

1.3.3.7.3

Structure

content

. . .

● XML Mapping Elementless

Page 15: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

● XML Mapping Elementless 1

1.3

bib

publication

1.3.3 book 1.3.5 book 1.3.7 book . . .

1.3.3.1year

1.3.3.2.1id

1.3.3.3title

1994 1 TCP/IP

1.3.3.5author

1.3.3.5.3last

1.3.5.5first

Stevens W.

1.3.3.7price

65.95

1.3.3.1.3 1.3.3.2.1.3 1.3.3.3.3

1.3.3.5.3.3 1.3.3.5.5.3

1.3.3.7.3

Structure

content

. . .

XML Index Types - Storage

5

Page 16: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

● XML Mapping Elementless

Path Synopsis

bib

last

id titleyear price

book

publication

type author paperpublisher

name

journal

first loc titlename

1

1817

16

1514

13

12

11

109

87654

3

2

1

1.3

bib

publication

1.3.3 book 1.3.5 book 1.3.7 book . . .

1.3.3.1year

1.3.3.2.1id

1.3.3.3title

1994 1 TCP/IP

1.3.3.5author

1.3.3.5.3last

1.3.5.5first

Stevens W.

1.3.3.7price

65.95

1.3.3.1.3 1.3.3.2.1.3 1.3.3.3.3

1.3.3.5.3.3 1.3.3.5.5.3

1.3.3.7.3

Structure

content

. . .

XML Index Types - Storage

5

Page 17: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

● XML Mapping Elementless

content

4

1.3.3.5.3.3

. . .

1994

1.3.3.7.3

1.3.3.3.3 TCP/IP10

1.3.3.2.1.3

W.

DeweyID PCR

1.3.3.1.3 1.3.3.9.3

1.3.7.1.3 . . .

document container

documentindex1.3.9

1.3.3.9.3 1.3.11.1.3

1.3.11

. . .

9 Stevens

1.3.3.1.3

5 1 . . .7

1.3.3.5.5.311

. . .13

65.95 4

. . .

Path Synopsis + B-tree layout

bib

last

id titleyear price

book

publication

type author paperpublisher

name

journal

first loc titlename

1

1817

16

1514

13

12

11

109

87654

3

2

bib1

1994 1 TCP/IP

Stevens W.

65.95

1.3.3.1.3 1.3.3.2.1.3 1.3.3.3.3

1.3.3.5.3.3 1.3.3.5.5.3

1.3.3.7.3

Structure

content

XML Index Types - Storage

5

Page 18: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

● XML Mapping Elementless

content

4

1.3.3.5.3.3

. . .

1994

1.3.3.7.3

1.3.3.3.3 TCP/IP10

1.3.3.2.1.3

W.

1.3.3.1.3 1.3.3.9.3

1.3.7.1.3 . . .

documentindex1.3.9

1.3.3.9.3 1.3.11.1.3

1.3.11

. . .

9 Stevens

1.3.3.1.3

5 1 . . .7

1.3.3.5.5.311

. . .13

65.95 4

. . .

Path Synopsis + B-tree layout

bib

last

id titleyear price

book

publication

type author paperpublisher

name

journal

first loc titlename

1

1817

16

1514

13

12

11

109

87654

3

2

bib1

1994 1 TCP/IP

Stevens W.

65.95

1.3.3.1.3 1.3.3.2.1.3 1.3.3.3.3

1.3.3.5.3.3 1.3.3.5.5.3

1.3.3.7.3 content

DeweyID PCR

document container

Primary target: space efficiencySecondary: more indexing options for free!

XML Index Types - Storage

5

Page 19: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

XML Index Types● Content

6

● Element

Page 20: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

XML Index Types

● Content

● Path

6

● Element

● Content And Structure

● Content

● Path

● Element

● Content And StructurePath: /bib//title PCRs: → 5,11,33 Path: /bib//title PCRs: → 5,11,33

Page 21: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

XML Index Types

● Content

● Path

● Element

● Content And Structure

Path: /bib//title PCRs: → 5,11,33 Path: /bib//title PCRs: → 5,11,33

● Content

● Path

● Element

● Content And Structure

●Easy to define●Generic●Large●Maintenance costs

●Easy to define●Generic●False positive filtering●Large●Maintenance costs

●Specific●Clustering●Small●Hard to define

●Specific●Clustering●Medium size●No document order●Type support●Hard to define

6

Page 22: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

XML Index Types – Usage Sample● Simplified Query Graph Model for XMark 01

7

Query: let $auction := doc("auction.xml") return for $b in $auction/site/people/person[@id = "person0"] return $b/name/text()

No Indexes: 3 x document scans 1 x document navigational access

Query: let $auction := doc("auction.xml") return for $b in $auction/site/people/person[@id = "person0"] return $b/name/text()

Page 23: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

XML Index Types – Usage Sample

7

Query: let $auction := doc("auction.xml") return for $b in $auction/site/people/person[@id = "person0"] return $b/name/text()

● Simplified Query Graph Model for XMark 01

Page 24: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

XML Index Types – Usage Sample

7

Query: let $auction := doc("auction.xml") return for $b in $auction/site/people/person[@id = "person0"] return $b/name/text()

No Indexes: 3 x document scans 1 x document navigational access

Query: let $auction := doc("auction.xml") return for $b in $auction/site/people/person[@id = "person0"] return $b/name/text()

● Simplified Query Graph Model for XMark 01

Page 25: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

XML Index Types – Usage Sample

7

Query: let $auction := doc("auction.xml") return for $b in $auction/site/people/person[@id = "person0"] return $b/name/text()

Query: let $auction := doc("auction.xml") return for $b in $auction/site/people/person[@id = "person0"] return $b/name/text()

● Simplified Query Graph Model for XMark 01

Page 26: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

XML Index Types – Usage Sample

7

Query: let $auction := doc("auction.xml") return for $b in $auction/site/people/person[@id = "person0"] return $b/name/text()

● Simplified Query Graph Model for XMark 01

Page 27: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

XML Index Types – Usage Sample

7

Query: let $auction := doc("auction.xml") return for $b in $auction/site/people/person[@id = "person0"] return $b/name/text()

● Simplified Query Graph Model for XMark 01

Page 28: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

XML Index Types – Usage Sample

Query: let $auction := doc("auction.xml") return for $b in $auction/site/people/person[@id = "person0"] return $b/name/text()

● Simplified Query Graph Model for XMark 01

Query: let $auction := doc("auction.xml") return for $b in $auction/site/people/person[@id = "person0"] return $b/name/text()

Besides element and content indexMore than 280 path or CAS indexes possible

by varying depth, axes (“/”,”//”), clustering, and wildcards

7

Page 29: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

Outline

● XML Index Types● Statistics for Cost-based Index Selection● Index Configuration Management● Evaluation● Summary & Outlook

Page 30: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

Statistics● Extended Path Synopsis

9

bib

last

id titleyear price

book

publication

type author paperpublisher

name

journal

first loc titlename

1

1817

16

1514

13

12

11

109

87654

3

2

bib1

Each node is extended:

● Index Statistics

Gathered during storage:●B-tree height●# leave pages●Cardinality●Index size

… h

pages, size

Instance counter | o content length | IUD counter

●Extended PS typically <0.1% of XML document●No optimization yet (e.g., recursion)●Processing overhead less than 5-6%

Overhead:

journal12

… …

Page 31: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

Statistics● Cost Estimation for Index Candidates

● Estimation Accuracy

10

1.Evaluate Index Expression on Path Synopsis → PCR set2.For each PCR add node's statistics (cardinality, width, IUD counter)3.Index type-dependent estimation of height, no. of leaves, and size

Page 32: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

Outline

● XML Index Types● Statistics for Cost-based Index Selection● Index Configuration Management● Evaluation● Summary & Outlook

Page 33: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

Index Configuration Management● Integrated into XTC

backend

● Asynchronous jobs for

– Candidate search

– Index building● Flexible configuration

– Type support

– Pruning thresholds

– Size limits

– Job schedule

12

Page 34: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

Index Configuration Management1. Record query processing costs

2. If cost > threshold do Auto Indexing

3. Feedback Auto Indexing costs

Auto Indexing

1. Traverse query plan bottom-up (access operators)

2. Generate index candidates

3. Rerun optimization including candidates

4. Analyze plan(s) for selected candidates

5. Update candidate set

Cost/benefit calculation → schedule index materialization jobs13

Page 35: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

Candidate Generation

14

Auto Indexing – Traverse

Candidates:QGM representation

Query: let $doc:=doc("sample.xml") return for $book in $doc//book[@xtc:id="book1"] return $book/title

Query: let $doc:=doc("sample.xml") return for $book in $doc//book[@xtc:id="book1"] return $book/title

Page 36: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

14

Auto Indexing – Traverse

Candidates:QGM representation

Query: let $doc:=doc("sample.xml") return for $book in $doc//book[@xtc:id="book1"] return $book/title

1. //

Query: let $doc:=doc("sample.xml") return for $book in $doc//book[@xtc:id="book1"] return $book/title

Candidate Generation

Page 37: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

1. Record query processing costs

2. If cost < threshold do Auto Indexing

3. Feedback Auto Indexing costs

Auto Indexing

1. Traverse query plan bottom-up (access operators)

2. Create index candidates

3. Rerun optimization including candidates

4. Analyze plan(s) for selected candidates

5. Update candidate set

14

Auto Indexing – Traverse

Candidates:QGM representation

Query: let $doc:=doc("sample.xml") return for $book in $doc//book[@xtc:id="book1"] return $book/title

1. //2. book

Candidate Generation

Page 38: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

14

Auto Indexing – Traverse

Candidates:QGM representation

Query: let $doc:=doc("sample.xml") return for $book in $doc//book[@xtc:id="book1"] return $book/title

1. //2. book3. @id

Query: let $doc:=doc("sample.xml") return for $book in $doc//book[@xtc:id="book1"] return $book/title

Candidate Generation

Page 39: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

14

Auto Indexing – Traverse

Candidates:QGM representation

1. //2. book3. @id4. “book1”

Query: let $doc:=doc("sample.xml") return for $book in $doc//book[@xtc:id="book1"] return $book/title

Query: let $doc:=doc("sample.xml") return for $book in $doc//book[@xtc:id="book1"] return $book/title

Candidate Generation

Page 40: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

14

Auto Indexing – Traverse

Candidates:QGM representation

1. //2. book3. @id4. “book1”…

Query: let $doc:=doc("sample.xml") return for $book in $doc//book[@xtc:id="book1"] return $book/title

Query: let $doc:=doc("sample.xml") return for $book in $doc//book[@xtc:id="book1"] return $book/title

Candidate Generation

Page 41: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

14

Auto Indexing – Traverse

Candidates:QGM representation

1. //2. book3. @id4. “book1”…

Query: let $doc:=doc("sample.xml") return for $book in $doc//book[@xtc:id="book1"] return $book/title

Candidate Generation

Query: let $doc:=doc("sample.xml") return for $book in $doc//book[@xtc:id="book1"] return $book/title

Over 18 enumeration rules and optimization rules

Page 42: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

Cost-Benefit Calculation

● Rank candidates by benefit

● Observe space limitation (Greedy algorithm)

● Search space pruning (merge indexes)

Containment Problem:

● Comparing PCR sets, but index types and semantics may differ

Reconfigure Index Set

● Schedule materialization and deletion jobs

Candidate Selection

Query: let $doc:=doc("sample.xml") return for $book in $doc//book[@xtc:id="book1"] return $book/title

15

Page 43: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

Outline

● XML Index Types● Statistics for Cost-based Index Selection● Index Configuration Management● Evaluation● Summary & Outlook

Page 44: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

Evaluation● Workload Documents

– XMark documents (12MB and 112MB)

– TPoX collection (~250MB)

– Standard XML documents (dblp, treebank, psd, …)

● Benchmark Scenarios– No Indexes

– Manual (element + content index)

– Manual + Self-tuning

– Self-tuning

17

Page 45: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

Evaluation

18

Page 46: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

Evaluation

18

Page 47: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

Evaluation

18

(2)

Page 48: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

Evaluation

18

(2)

Page 49: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

Evaluation

18

(2)(2 + 23)

Page 50: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

Evaluation

18

(2)(2 + 23)

Page 51: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

Evaluation

18

(2)(2 + 23)

(23)

Page 52: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

Evaluation

Impact of parallel indexing

19

Impact of aggressive indexing

Page 53: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

Evaluation

20

Overhead in %

% of workload

Overhead / Pruning effects Workload shifts

Page 54: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

Summary● Self-tuning of XML indexes causes new challenges to the ISP

● Path Synopsis use for storage, indexing, and managing

● Overhead of statistics and management (mostly) pays off

● Self-tuning of tuning “frequency” and pruning are effective

● Integrate Update workload

● Analyze workload shift reaction (stability vs. effectiveness)

● Evaluate XML warehouse scenarios

● Integrate data placement decision

● Combine with buffer tuning

Outlook

21

Page 55: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured

University of Kaiserslautern

<www.xtc-project.de />

Finish