
Page 1: A Transducer-Based XML Query Processor

A Transducer-Based XML Query Processor

Bertram Ludäscher, SDSC/CSE UCSD

Pratik Mukhopadhyay, CSE UCSD

Yannis Papakonstantinou, CSE UCSD

Page 2: A Transducer-Based XML Query Processor

Overview

Motivation
Architecture
Framework: Streams + XQuery
XSM (XML Stream Machine)
XSM Networks
Network Composition
Conclusions

Page 3: A Transducer-Based XML Query Processor

Efficient Processing of Sequentially Accessed XML Data

Web Service Implementations & RMI

[Diagram: XML message → XML Message Transformer → transformed XML message; Web Service]

Page 4: A Transducer-Based XML Query Processor

Efficient Processing of Sequentially Accessed XML Data

Web Development

[Diagram: XML file → XML-to-XHTML Transformer → XHTML page; Web Front-End]

Page 5: A Transducer-Based XML Query Processor

Efficient Processing of Sequentially Accessed XML Data

Archive Transformation & ETL (Extraction Transformation & Loading) Applications

[Diagram: XML archive file → XML Processor → XML target file]

Page 6: A Transducer-Based XML Query Processor

Efficient Processing of Sequentially Accessed XML Data

Sensor Data Analysis

[Diagram labels: Sensor, XML Stream, Data Processor, Stream Acting/Mining Software]

Page 7: A Transducer-Based XML Query Processor

Bandwidth & Connectivity will Increase the Amount of Data …

[Diagram labels: Sensor, Data Processor, and several XML streams]

Page 8: A Transducer-Based XML Query Processor

… Hardware Advances do not Favor Conventional Architectures

[Chart: axes Magnitude and Year; curves for CPU Speed, CPU-to-Memory Speed, and Bandwidth]

Page 10: A Transducer-Based XML Query Processor

Transducer-Based Processing: On-the-Fly & Minimal Memory

[Diagram: XML Stream Machine with Input buffer, Output buffer, Buffers, and transitions labeled "Condition | Action"]

Page 11: A Transducer-Based XML Query Processor

XML Stream Machine (XSM) High-Level Architecture

[Diagram: XQuery (with optional input DTD) → XQuery Compiler → XSM → XSM-to-C Compiler → C program]

Page 12: A Transducer-Based XML Query Processor

Components of the XQuery Compiler

[Diagram: XQuery → XQuery-to-Network Translation → XSM Network → XSM Composition → Single XSM; Schema Optimization, with the optional input DTD]

Page 14: A Transducer-Based XML Query Processor

XQuery Subset

for-where-return Expressions
Path Expressions
Element Construction
Concatenation

Example:

    for $X in $R/a return
      for $Y in $X/b return
        <res> $Y, $X </res>
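To make the meaning of the example query concrete, here is a minimal, non-streaming reference evaluation in Python. It is only a sketch of what the query computes, not of how an XSM processes it: it assumes `$R` is bound to the root element `<r>`, loads the whole document into memory with the standard `xml.etree.ElementTree` module, and uses the hypothetical function name `evaluate`.

```python
import copy
import xml.etree.ElementTree as ET


def evaluate(root):
    """Reference (in-memory) evaluation of the nested for-return query."""
    results = []
    for x in root.findall("a"):            # for $X in $R/a
        for y in x.findall("b"):           # for $Y in $X/b
            res = ET.Element("res")        # <res> $Y, $X </res>
            res.append(copy.deepcopy(y))   # copy so the input tree is untouched
            res.append(copy.deepcopy(x))
            results.append(res)
    return results


root = ET.fromstring("<r><a><b>5</b><b>1</b></a></r>")
for res in evaluate(root):
    print(ET.tostring(res, encoding="unicode"))
```

Each `<b>` child yields one `<res>` element that contains that `<b>` followed by a copy of its whole enclosing `<a>`. The enclosing `<a>` must stay available until all of its `<b>` children have been handled, which is the buffering concern a streaming processor has to address.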

Page 15: A Transducer-Based XML Query Processor

XML Stream: Tags, Data & Control Tokens

… <r> <a> <b> 5 </b> <b> 1 </b> </a>

XML Stream is Sequence of
Data, Open Tag & Close Tag Tokens
Control Tokens: S$R, E$R
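The token view of a stream can be illustrated with a small tokenizer. This is a hedged sketch, not the processor's actual lexer: the function name `tokenize`, the regular expression, and the choice to wrap the whole fragment between one start and one end control token are assumptions made for illustration.

```python
import re

# A tag token is "<...>"; a data token is a run of text between tags.
TOKEN = re.compile(r"<[^>]+>|[^<\s][^<]*")


def tokenize(xml_text, var):
    """Turn an XML fragment into a token list delimited by control tokens.

    `var` is the variable name (e.g. "$R"); "S" + var and "E" + var stand
    for the start and end control tokens (assumed placement).
    """
    tokens = ["S" + var]
    for match in TOKEN.finditer(xml_text):
        tokens.append(match.group().strip())
    tokens.append("E" + var)
    return tokens


print(tokenize("<r> <a> <b> 5 </b> <b> 1 </b> </a> </r>", "$R"))
```

The sketch ignores attributes, comments, and entity handling; it only shows the three kinds of tokens named on the slide.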

Page 17: A Transducer-Based XML Query Processor

XML Stream Machine (XSM)

Concatenation of bindings of Y, X into bindings of Z

States: 0, 1, 2, 3

Transitions (condition | action):
*y=Sy | y++
*x=Sx | w(z,Sz), x++
*y=Ey | y++
*y!=Ey | w(z,*y), y++
*x!=Ex | w(z,*x), x++
*x=Ex | w(z,Ez), x++

Input Buffer X (pointer x): Sx <a> <b> 5 </b> <b> 1 </b> </a> Ex Sx …
Input Buffer Y (pointer y): Sy <b> 5 </b> Ey Sy <b> 1 </b> Ey Sy …
Output Buffer Z (pointer z): Sz <b> 5 </b> <a> <b> 5 </b> <b> 1 </b> </a> Ez
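The concatenation machine on this slide can be traced with a short simulation. The sketch below is an illustration under stated assumptions, not the authors' implementation: the transcript does not preserve which state each transition leaves or enters, so the state numbering here is an assumption chosen to reproduce the output buffer shown on the slide. The name `run_concat_xsm` and the list-based buffers are hypothetical, and the inputs are assumed to contain complete, well-formed bindings.

```python
def run_concat_xsm(x_buf, y_buf):
    """Simulate the concatenation XSM: Z binding = Y binding, then X binding."""
    z_buf = []     # output buffer Z; w(z, t) is modelled as z_buf.append(t)
    x = y = 0      # read pointers into input buffers X and Y
    state = 0      # state numbering is an assumption (see lead-in)
    while True:
        if state == 0:
            if x == len(x_buf):            # no further X binding: stop
                return z_buf
            if x_buf[x] != "Sx":
                raise ValueError("expected Sx")
            z_buf.append("Sz")             # *x=Sx | w(z,Sz), x++
            x += 1
            state = 1
        elif state == 1:
            if y_buf[y] != "Sy":
                raise ValueError("expected Sy")
            y += 1                         # *y=Sy | y++
            state = 2
        elif state == 2:
            if y_buf[y] != "Ey":
                z_buf.append(y_buf[y])     # *y!=Ey | w(z,*y), y++
                y += 1
            else:
                y += 1                     # *y=Ey | y++
                state = 3
        else:
            if x_buf[x] != "Ex":
                z_buf.append(x_buf[x])     # *x!=Ex | w(z,*x), x++
                x += 1
            else:
                z_buf.append("Ez")         # *x=Ex | w(z,Ez), x++
                x += 1
                state = 0


x_tokens = "Sx <a> <b> 5 </b> <b> 1 </b> </a> Ex".split()
y_tokens = "Sy <b> 5 </b> Ey".split()
print(" ".join(run_concat_xsm(x_tokens, y_tokens)))
# prints: Sz <b> 5 </b> <a> <b> 5 </b> <b> 1 </b> </a> Ez
```

The machine keeps no memory beyond its state and the two read pointers; every token is copied to the output as soon as it is read.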

Page 18: A Transducer-Based XML Query Processor

XML Stream Machine (XSM)

[Animation frame: same state diagram and input buffers X and Y as on Page 17. Output Buffer Z: empty]

Page 19: A Transducer-Based XML Query Processor

XML Stream Machine (XSM)

[Animation frame: same state diagram and input buffers X and Y as on Page 17. Output Buffer Z: empty]

Page 20: A Transducer-Based XML Query Processor

XML Stream Machine (XSM)

[Animation frame: same state diagram and input buffers X and Y as on Page 17. Output Buffer Z: Sz]

Page 21: A Transducer-Based XML Query Processor

XML Stream Machine (XSM)

[Animation frame: same state diagram and input buffers X and Y as on Page 17. Output Buffer Z: Sz <b>]

Page 22: A Transducer-Based XML Query Processor

XML Stream Machine (XSM)

[Animation frame: same state diagram and input buffers X and Y as on Page 17. Output Buffer Z: Sz <b> 5 </b>]

Page 23: A Transducer-Based XML Query Processor

XML Stream Machine (XSM)

[Animation frame: same state diagram and input buffers X and Y as on Page 17. Output Buffer Z: Sz <b> 5 </b>]

Page 24: A Transducer-Based XML Query Processor

XML Stream Machine (XSM)

[Animation frame: same state diagram and input buffers X and Y as on Page 17. Output Buffer Z: Sz <b> 5 </b> <a>]

Page 25: A Transducer-Based XML Query Processor

XML Stream Machine (XSM)

[Animation frame: same state diagram and input buffers X and Y as on Page 17. Output Buffer Z: Sz <b> 5 </b> <a> <b> 5 </b> <b> 1 </b> </a>]

Page 26: A Transducer-Based XML Query Processor

XML Stream Machine (XSM)

[Animation frame: same state diagram and input buffers X and Y as on Page 17. Output Buffer Z: Sz <b> 5 </b> <a> <b> 5 </b> <b> 1 </b> </a> Ez]

Page 27: A Transducer-Based XML Query Processor

Comparison of XSM against State Automata & Transducers

State Automata: Do not construct; Do not store intermediate results; Sufficient for XPath only
Transducers: Finite alphabets; State is the only memory; No reset of input pointers
XSM: Unbounded alphabet; Buffers; Pointer reset

Page 29: A Transducer-Based XML Query Processor

XSM Networks: Intermediate Step in Translating Queries to XSMs

[Diagram: XQuery → XQuery-to-Network Translation → XSM Network → XSM Composition → Single XSM]

Page 30: A Transducer-Based XML Query Processor

XSM Network

    for $X in $R/a return
      for $Y in $X/b return
        <res> $Y, $X </res>

[Network diagram: $R → ($R/a) → $X; $X → ($X/b) → $Y; (For $Y): [$Y, $X] → [$Y', $X']; ($Y', $X') → $Z; (<res> $Z </res>) → $O]
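The shape of such a network can be sketched as a chain of small operators, each consuming the stream produced by the previous one. The code below is a rough illustration only: it uses Python generators over in-memory `xml.etree.ElementTree` elements rather than token buffers, it folds the `$X/b` path step and the For operator into one function, and all function names (`path`, `for_each`, `concat`, `construct`, `run_network`) are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def path(bindings, tag):
    """$V/tag: emit every child named `tag` of each incoming binding."""
    for element in bindings:
        yield from element.findall(tag)


def for_each(x_bindings, tag):
    """$X/b plus For $Y: emit one ($Y', $X') pair per b child of each $X."""
    for x in x_bindings:
        for y in x.findall(tag):
            yield y, x


def concat(pairs):
    """$Y', $X' -> $Z: concatenate the two bindings into one sequence."""
    for y, x in pairs:
        yield [y, x]


def construct(tag, sequences):
    """<tag> $Z </tag> -> $O: wrap each sequence in a new element."""
    for items in sequences:
        element = ET.Element(tag)
        element.extend([copy.deepcopy(item) for item in items])
        yield element


def run_network(root):
    xs = path([root], "a")          # $R/a -> $X
    pairs = for_each(xs, "b")       # -> [$Y', $X']
    zs = concat(pairs)              # -> $Z
    return construct("res", zs)     # -> $O


root = ET.fromstring("<r><a><b>5</b><b>1</b></a></r>")
for out in run_network(root):
    print(ET.tostring(out, encoding="unicode"))
```

Because each stage is a generator, a result is produced as soon as its inputs are available, which loosely mirrors the producer-consumer relationship between neighbouring machines in the network.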

Page 31: A Transducer-Based XML Query Processor

From XQueries to XSM Networks: Non-FLWR Expressions

    <res> $Y, $X </res>

[Network diagram: inputs $Y, $X → ($Y, $X) → $Z → (<res> $Z </res>) → $O]

Page 32: A Transducer-Based XML Query Processor

From XQueries to XSM Networks: FLWRs without Free Variables

    for $X in G return
      expr($X)

[Network diagram: $R → (G) → $X → (expr($X)) → $O]

Page 33: A Transducer-Based XML Query Processor

From XQueries to XSM Networks: FLWRs with Free Variables

    for $Y in $X/b return
      <res> $Y, $X </res>

free variable $X

[Network diagram: $X → ($X/b) → $Y; (For $Y): [$Y, $X] → [$Y', $X']; (<res> $Y', $X' </res>) → $O]

Page 35: A Transducer-Based XML Query Processor

Composition Merges Two XSMs into One

[Network diagram: $R → ($R/a) → $X; $X → ($X/b) → $Y; (For $Y): [$Y, $X] → [$Y', $X']; then two separate XSMs, ($Y', $X') → $Z and (<res> $Z </res>) → $O]

Page 36: A Transducer-Based XML Query Processor

Composition Merges Two XSMs into One

[Network diagram: $R → ($R/a) → $X; $X → ($X/b) → $Y; (For $Y): [$Y, $X] → [$Y', $X']; then a single merged XSM, (<res> $Y', $X' </res>) → $O]

Page 37: A Transducer-Based XML Query Processor

XSM Composition: “State Product” Emulates Producer-Consumer

Producer M1 (state q1), Consumer M2 (state q2)

“State Product” M3 = (M2 o M1), with states (q1, q2)

Page 38: A Transducer-Based XML Query Processor

Naive Composition

M1: q1 → q1' on cond1 | A1
M2: q2 → q2' on cond2 | A2

M3 = (M2 o M1):
(q1, q2) → (q1, q2') on cond2 | A2: M2 step, taken if the test on q2 holds
(q1, q2) → (q1', q2) on cond1 | A1: M1 step, taken if the test on q2 fails

Test on q2: ¬AE(r1) ∧ ... ∧ ¬AE(rn) = “no shared read-pointer ri of q2 is At End”
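The scheduling rule of naive composition (the consumer steps whenever its shared read pointer is not at the end, otherwise the producer steps) can be emulated at runtime as follows. This is a simplified sketch, not the compile-time state-product construction itself: each machine is reduced to a hypothetical one-token-per-step function, and the names `Machine`, `at_end`, and `run_composed` are assumptions.

```python
class Machine:
    """Hypothetical machine: one step reads one token and writes zero or more."""

    def __init__(self, fn, inp, out):
        self.fn, self.inp, self.out = fn, inp, out
        self.read = 0                      # read pointer into self.inp

    def at_end(self):
        # AE(r): the read pointer has caught up with what has been written
        return self.read >= len(self.inp)

    def step(self):
        token = self.inp[self.read]
        self.read += 1
        self.out.extend(self.fn(token))


def run_composed(source, f1, f2):
    """Run producer M1 and consumer M2 over a shared buffer."""
    shared, result, trace = [], [], []
    m1 = Machine(f1, source, shared)       # producer writes the shared buffer
    m2 = Machine(f2, shared, result)       # consumer reads the shared buffer
    while True:
        if not m2.at_end():                # test holds: M2 step
            m2.step()
            trace.append("M2")
        elif not m1.at_end():              # test fails: M1 step
            m1.step()
            trace.append("M1")
        else:                              # no input left anywhere
            return result, trace


def drop_blank(token):
    return [] if token.isspace() else [token]


def upper(token):
    return [token.upper()]


print(run_composed(["a", " ", "b"], drop_blank, upper))
# prints: (['A', 'B'], ['M1', 'M2', 'M1', 'M1', 'M2'])
```

Every loop iteration performs the At-End test at runtime; this is the cost that a compile-time treatment of the test would aim to avoid.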

Page 39: A Transducer-Based XML Query Processor

Smart Composition

Normalization Assumptions:
#( read-pointers-into-shared-buffer(q2) ) ≤ 1
Atomic actions only

Basic idea: avoid runtime tests (“At-End”) whenever outcome can be determined at compile-time

Different “modes”:
go: consumer M2 proceeds (full buffer)
no: producer M1 proceeds (empty buffer); maybe the consumer can follow immediately
ae: do runtime check AE

Page 40: A Transducer-Based XML Query Processor

Smart Composition: no Case (shared buffer is empty)

M1: q1 → q1' on cond1 | A1
M2: q2 → q2' on cond2 | A2

Case: A1 does not write to the shared buffer
Transition inserted: (q1, q2, no) → (q1', q2, no) on cond1 | A1

Case: M2 does not wait on shared buffer
Transition inserted: (q1, q2, no) → (q1, q2', no) on cond2 | A2

Page 41: A Transducer-Based XML Query Processor

Smart Composition: Producer fills buffer

Case: A1 writes token to the shared buffer and M2 consumes token
Transition inserted: (q1, q2, no) → (q1', q2', no) on cond12 | A12

Case: A1 writes to the shared buffer, but M2 doesn't advance its read pointer
Transition inserted: (q1, q2, no) → (q1', q2', go) on cond12 | A12

A12: combination of A1 with A2
cond12: combination of cond1 with cond2

Page 42: A Transducer-Based XML Query Processor

Smart Composition: go - ae - no

in go mode:
if A2 advances the read pointer into shared buffer: (q1, q2, go) → (q1, q2', no) on cond2 | A2
if A2 does not advance read pointer into shared buffer: (q1, q2, go) → (q1, q2', go) on cond2 | A2

Page 43: A Transducer-Based XML Query Processor

Performance Datapoint (Transformation Query on DBLP)

Data Size (KB)   Xalan (ms)   XSM Java   XSM C
4                663          266        30
5000             7031         2360       312
20000            102710       8266       1156
80000            (none)       32078      4640
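The relative speedups implied by these figures can be worked out directly. The snippet below is a small arithmetic sketch under two assumptions that the slide text does not state outright: all three timing columns are in milliseconds, and the two values in the 80000 KB row belong to XSM Java and XSM C (no Xalan value). The names `ROWS` and `speedups` are hypothetical.

```python
# (data size in KB, Xalan, XSM Java, XSM C); None marks the missing Xalan value
ROWS = [
    (4, 663, 266, 30),
    (5000, 7031, 2360, 312),
    (20000, 102710, 8266, 1156),
    (80000, None, 32078, 4640),
]


def speedups(rows):
    """Return {size_kb: (Xalan / XSM C, XSM Java / XSM C)}, rounded to 0.1."""
    result = {}
    for size_kb, xalan, xsm_java, xsm_c in rows:
        xalan_ratio = None if xalan is None else round(xalan / xsm_c, 1)
        result[size_kb] = (xalan_ratio, round(xsm_java / xsm_c, 1))
    return result


for size_kb, (vs_xalan, vs_java) in speedups(ROWS).items():
    print(size_kb, vs_xalan, vs_java)
```

Under these assumptions the XSM C figures are roughly 22 times lower than Xalan's for the two smaller inputs and roughly 89 times lower at 20000 KB, while the XSM Java to XSM C ratio stays between about 7 and 9 across all four sizes.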

Page 44: A Transducer-Based XML Query Processor

Conclusions & Future Work

Novel query processor model
Success in filtering & transformation
To be extended for joins & aggregations
Memory footprint questions
Facilitated by model’s simplicity

Page 45: A Transducer-Based XML Query Processor

Related Work

Relational Data Streams & Sequence Data Models
Pipelined Join Operators
Aggregates & Approximations
Fast XPath on streams
Memory requirements of validating XML

Page 46: A Transducer-Based XML Query Processor

Smart Composition: go - ae - no

in no mode: execute M1 step ... AND possibly M2 step (simplified composed cond12 and (A1;A2))

Transitions from (q1, q2, no):
if A1 does not advance shared write pointer: → (q1', q2, no) on cond1 | A1
if A1 does advance shared write pointer and A2 advances shared read pointer: → (q1', q2', no) on cond12 | A12
if A1 does advance shared write pointer and A2 does not advance shared read pointer: → (q1', q2', go) on cond12 | A12
→ (q1', q2, ae) on cond1 | A1