Post on 23-Aug-2020

Regular Expressions for

Processing Data on the Web

Wim Martens

What will we be doing?

It's very simple:

• You may have heard of regular expressions
• Processing Data on the Web is a hot topic

So, let's

• put both of them in a pot, stir around a bit
• aim for teaching you something new
• and see what happens

Why Regular Expressions?

They are available in tools

Your father was on holiday. He made 37,000,000 photos, with several cameras. Some store the photos as .jpg, some as .jpeg, and some as .JPG.

He doesn't like that. In his opinion, .jpg is the only true JPEG file extension. He is prepared to pay you one week's salary to fix this.

How do you do it? (Without developing repetitive strain injury)

Answer: rename 's/\.(JPG|jpeg)$/.jpg/' *.*

(these are Perl regular expressions)
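For readers without the Perl-based rename tool, the same substitution can be sketched in Python. This is only an illustration: the function names `normalized` and `rename_all` are made up for this example, and the sketch assumes all photos sit in a single directory.

```python
import re
from pathlib import Path


def normalized(name):
    """Map a trailing .JPG or .jpeg to .jpg; leave every other name unchanged."""
    return re.sub(r"\.(JPG|jpeg)$", ".jpg", name)


def rename_all(directory):
    """Rename every file in `directory` whose extension is .JPG or .jpeg."""
    for path in Path(directory).iterdir():
        target = path.with_name(normalized(path.name))
        # Skip files that are already fine, and never overwrite an existing file.
        if target != path and not target.exists():
            path.rename(target)
```

Note that the anchor `$` belongs in the pattern only. In the replacement it would not mean "end of name".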

Regular Expression Recap

Basics

• An alphabet is a non-empty, finite set
• It contains letters, which we denote by a, b, c, ...

We usually denote an alphabet by the letter Σ

Regular Expression Recap

Regular Expressions

The set of regular expressions over alphabet Σ is inductively defined as:
• the symbol ∅ is a regular expression
• the symbol 𝜀 is a regular expression
• every symbol from Σ is a regular expression
• if A and B are regular expressions, then so are
  • (A.B) (concatenation)
  • (A + B) (disjunction)
  • (A*) (Kleene star)

We abbreviate (A.B) by (AB)

Regular Expression Recap

Regular Expressions: Semantics

The language L(r) of regular expression r is a set of words / sequences over alphabet Σ and is inductively defined as:
• L(∅) = ∅
• L(𝜀) = {𝜀}
• L(a) = {a}, for every a in Σ
• If r = (r1 + r2) then L(r) = L(r1) ∪ L(r2)
• If r = (r1 . r2) then L(r) = L(r1) . L(r2) (= {w1 w2 | w1 ∈ L(r1), w2 ∈ L(r2)})
• If r = (r1*) then L(r) = L(r1)* (= {w1 ... wk | k ∈ 𝐍, every wi ∈ L(r1)})
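The inductive definition can be followed almost line by line in code. Below is a minimal sketch that computes the words of L(r) up to a given length. The tuple encoding of expressions ("empty", "eps", "sym", "alt", "cat", "star") and the function name `language` are assumptions made for this illustration, and letters are assumed to be single characters.

```python
def language(r, max_len):
    """All words of L(r) of length at most max_len."""
    kind = r[0]
    if kind == "empty":   # L(∅) = ∅
        return set()
    if kind == "eps":     # L(ε) = {ε}
        return {""}
    if kind == "sym":     # L(a) = {a}
        return {r[1]} if max_len >= 1 else set()
    if kind == "alt":     # L(r1 + r2) = L(r1) ∪ L(r2)
        return language(r[1], max_len) | language(r[2], max_len)
    if kind == "cat":     # L(r1 . r2) = L(r1) . L(r2)
        left, right = language(r[1], max_len), language(r[2], max_len)
        return {u + v for u in left for v in right if len(u + v) <= max_len}
    if kind == "star":    # L(r1*) = {w1 ... wk | every wi in L(r1)}
        base = language(r[1], max_len) - {""}
        words, frontier = {""}, {""}
        while frontier:
            # Append one more factor to the words found in the previous round.
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= max_len} - words
            words |= frontier
        return words
    raise ValueError(f"unknown node kind: {kind}")


a, b = ("sym", "a"), ("sym", "b")
ends_in_a = ("cat", ("star", ("alt", a, b)), a)   # (a + b)* a
even_as = ("star", ("cat", a, a))                 # (aa)*
```

The length bound is only there to keep the sets finite. The definition itself has no such bound.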

Regular Expression Recap

Regular Expressions: Remarks

Expressions are not very readable when strictly adhering to the definition:

(((((ab)*)c)+(de))*)

So we use
• associativity of concatenation and disjunction
• priorities between operators

to make them more readable

Priorities: first (), then *, then ., then +

The above expression becomes ((ab)*c+de)*

Regular Expression Recap

Regular Expressions: Examples

(aa)*

(a + b)* a

(a + b)* a (a + b)

(a+b)* abb (a+b)*

{(ab)^n a (ba)^n | n ∈ 𝐍}
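The last example is written as a set, not as an expression. Since (ab)^n a = a (ba)^n, each of its words can be rewritten as a (baba)^n, which suggests the expression a(baba)*. The sketch below checks this for small n with Python's `re` module, where disjunction is written `|` instead of `+`. The names `word` and `pattern` are made up for this illustration.

```python
import re

# Candidate expression for the set {(ab)^n a (ba)^n | n in N}.
pattern = re.compile(r"a(baba)*")


def word(n):
    """The word (ab)^n a (ba)^n."""
    return "ab" * n + "a" + "ba" * n


# Every word of the set matches the expression ...
all_match = all(pattern.fullmatch(word(n)) is not None for n in range(20))

# ... and the words a (baba)^k are exactly the words of the set.
same_words = {"a" + "baba" * k for k in range(20)} == {word(n) for n in range(20)}
```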

Regular Expression Recap

Syntactic sugar

r? abbreviates (r + 𝜀)

r+ abbreviates r . r*

Expressions vs Automata

Automata:

[Figure: a finite automaton for the expression aa(bb)*ab]

L(A): language of automaton A

deterministic versus non-deterministic automata

Notation

Automaton A = (Q, Σ, 𝛅, I, F) with
• Q: finite set of states
• Σ: alphabet
• 𝛅: transition function
• I ⊆ Q: initial states
• F ⊆ Q: accepting states
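To make the notation concrete, here is a small sketch of an automaton for aa(bb)*ab written as a tuple (Q, Σ, 𝛅, I, F), with a function that decides whether a word is accepted. The state numbering and the name `accepts` are assumptions made for this illustration, not necessarily the automaton drawn on the slide.

```python
def accepts(automaton, word):
    """Return True if the (possibly non-deterministic) automaton accepts the word."""
    states, alphabet, delta, initial, accepting = automaton
    current = set(initial)
    for letter in word:
        # Follow every transition that reads this letter from a current state.
        current = {q for p in current for q in delta.get((p, letter), set())}
    return bool(current & accepting)


A = (
    {0, 1, 2, 3, 4, 5},                       # Q
    {"a", "b"},                               # Σ
    {                                         # δ: (state, letter) -> set of states
        (0, "a"): {1}, (1, "a"): {2},         # aa
        (2, "b"): {3}, (3, "b"): {2},         # (bb)*
        (2, "a"): {4}, (4, "b"): {5},         # ab
    },
    {0},                                      # I
    {5},                                      # F
)
```

Because 𝛅 maps to sets of states, the same function also handles non-deterministic automata.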

Expressions vs Automata

Theorem

Regular expressions and finite automata define the same languages: the regular languages

More precisely, for each regular expression r, there is a finite automaton A such that L(A) = L(r) and vice versa

But what about the blow-up?

Expression to automaton: O(n)

Automaton to expression: exponential

So, these are the basics

What do we do now?

Well...

This could surprise you, but

• Regular expressions are still used a lot in research
  We are still discovering new fundamental properties

• Research also uses many variants of regular expressions

This is common in research:

Often you're solving a problem and the standard tools you have are not really what you want / need

So you have to tweak them a little bit

We will be looking at both kinds of cases

Expressions vs the Web

Who uses expressions on the Web?

You can play golf with them ("regex golf")

Schema languages for XML

Query languages for XML

Query languages for graph data

XML

eXtensible Markup Language

Developed by World Wide Web Consortium (W3C)

A widely used standard for exchanging data on the Web

"Stores data in a tree"

XML by Example

<store>
  <general>
    <name>The excellent guitar shop</name>
    <url>www.theexcellentguitarshop.com</url>
  </general>
  <catalog>
    <guitar>
      <maker>Gibson</maker>
      <type>Les Paul</type>
      <year>1959</year>
      <price>2000</price>
    </guitar>
    <guitar>
      ...
    </guitar>
  </catalog>
</store>

[Figure: the same document drawn as a tree]

store
  general
    name: The excellent...
    url: www...
  catalog
    guitar
      maker: Gibson
      type: Les Paul
      year: 1959
      price: 2000
    guitar
      ...

Exchanging XML

[Figure sequence: A and B exchange data over the Web]

• A message in Japanese ("Belgian beer is the best in the world") gets the reaction "???"
• An exchange in XML gets the reaction "Aha!"
• Both A and B hold XS

XS is a schema

XML Schemas by Example

store
  general
    name: The excellent...
    url: www...
  catalog
    guitar
      maker: Gibson
      type: Les Paul
      year: 1959
      price: 2000
    guitar
      maker: Fender
      type: Stratocaster
      year: 1954
      price: 2500
      discount: 400

Schemas describe trees:

store   -> general, catalog
general -> name, url
catalog -> guitar*
guitar  -> maker, type, year, price, discount?

Aha! Schemas are based on regular expressions!

XML Schemas by Example

Definition: Schema for XML

A schema for XML is a tuple D = (Σ, S, R), where
• Σ is the alphabet
• S ⊆ Σ is a set of start symbols
• R is a set of rules of the form a → r, with a ∈ Σ and r a regular expression over Σ

Remarks and conventions

• We assume that, for every a ∈ Σ, there is at most one rule with left-hand side a
• In examples, S is always the singleton containing the left symbol of the first rule (unless stated otherwise)
• When we don't write a rule for a, we implicitly assume a → 𝜀

An (XML) tree t is in the language L(D) if
• its root is labeled by an element in S
• for every node u, labeled a, the word formed by the labels of its children is in L(r), where a → r is a rule in R
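The membership condition above can be sketched as a short recursive check. The encoding is an assumption made for this illustration: a tree is a pair (label, list of children), text content is ignored, and each rule is a Python regular expression over child labels in which every label is followed by a single space. The names `RULES`, `START`, `in_language` and `leaf` are made up for this example.

```python
import re

# Rules a -> r of the guitar shop schema. Each label carries a trailing space
# so that neighbouring labels cannot run together inside the Python regex.
RULES = {
    "store":   r"general catalog ",
    "general": r"name url ",
    "catalog": r"(guitar )*",
    "guitar":  r"maker type year price (discount )?",
}
START = {"store"}


def _valid(tree, rules):
    label, children = tree
    # The word formed by the labels of the children, left to right.
    word = "".join(child[0] + " " for child in children)
    rule = rules.get(label, "")  # no rule written for a means a -> ε
    return (re.fullmatch(rule, word) is not None
            and all(_valid(child, rules) for child in children))


def in_language(tree, rules=RULES, start=START):
    """True if the root label is a start symbol and every node obeys its rule."""
    return tree[0] in start and _valid(tree, rules)


def leaf(label):
    return (label, [])


gibson = ("guitar", [leaf("maker"), leaf("type"), leaf("year"), leaf("price")])
fender = ("guitar", [leaf("maker"), leaf("type"), leaf("year"), leaf("price"),
                     leaf("discount")])
shop = ("store", [("general", [leaf("name"), leaf("url")]),
                  ("catalog", [gibson, fender])])
```

Here `in_language(shop)` holds, while swapping general and catalog under store makes the check fail, because the rule for store fixes their order.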

XML Schemas by Example

[Figure: the guitar shop tree, in which the second guitar has a discount child]

Schemas describe trees:

store   -> general, catalog
general -> name, url
catalog -> guitar*
guitar  -> maker, type, year, price, discount?

With these definitions, the tree is in the language of the schema

XML Schemas by Example

Theory vs Practice:

I just showed you a formal definition of a Document Type Definition (DTD)

DTDs are part of the specification for XML

But their description in the spec is much longer, which means that I'm not saying some things

The biggest thing that I'm not saying: In DTDs, regular expressions must be deterministic

Deterministic regular expressions

What does this mean?

Let's have a look

Deterministic regular expressions

aaa(b+c) is deterministic

aa(ab + ac) is not deterministic

Deterministic regular expressionsA regular expression is deterministic, if its Glushkov automaton is deterministic

Glushkov automaton by example a a (a b + a c)

a a a b c

Its states are positions in the regular expression

a

Page 95: Regular Expressions for Processing Data on the Web · • You may have heard of regular expressions • Processing Data on the Web is a hot topic So, let's • put both of them in

Deterministic regular expressions

aaa(b+c) is deterministic

aa(ab + ac) is not deterministic

Deterministic regular expressionsA regular expression is deterministic, if its Glushkov automaton is deterministic

Glushkov automaton by example a a (a b + a c)*

a a a

a

b c

Its states are positions in the regular expression

Page 96: Regular Expressions for Processing Data on the Web · • You may have heard of regular expressions • Processing Data on the Web is a hot topic So, let's • put both of them in

Deterministic regular expressions

aaa(b+c) is deterministic

aa(ab + ac) is not deterministic

Deterministic regular expressionsA regular expression is deterministic, if its Glushkov automaton is deterministic

Glushkov automaton by example a a (a b + a c)*

a a a

a

b c

Its states are positions in the regular expressiona

a

Page 97: Regular Expressions for Processing Data on the Web · • You may have heard of regular expressions • Processing Data on the Web is a hot topic So, let's • put both of them in

Deterministic regular expressions

aaa(b+c) is deterministic

aa(ab + ac) is not deterministic

Deterministic regular expressionsA regular expression is deterministic, if its Glushkov automaton is deterministic

Glushkov automaton by example a a (a b + a c)*

a a a

a

b c

Its states are positions in the regular expressiona

a

a

a

Page 98: Regular Expressions for Processing Data on the Web · • You may have heard of regular expressions • Processing Data on the Web is a hot topic So, let's • put both of them in

Deterministic regular expressions

aaa(b+c) is deterministic

aa(ab + ac) is not deterministic

(a+b)*a is not deterministic

b*a(b*a)* is deterministic

(a + b)* a (a + b) is not deterministic

An equivalent deterministic expression seems not so easy to find?

Deterministic regular expressions

A few words about deterministic expressions

• Testing if an expression is deterministic is easy (build the Glushkov automaton and check it)

• Not every regular expression can be determinized [Brüggemann-Klein, Wood, 1998]; (a+b)*a(a+b) is an example

• It can be tested if a given expression can be determinized [Brüggemann-Klein, Wood, 1998] (it's complicated)

• This test is PSPACE-complete [Czerwinski et al., 2013]
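The "build the Glushkov automaton and check it" test can be sketched in a few lines. This is a minimal illustration, not code from the slides: it assumes the slide syntax (single-letter symbols, + for union, juxtaposition for concatenation, postfix * and ?), and the function names glushkov and is_deterministic are made up for this example. It computes, for every position, the set of positions that may follow it, and reports the expression as deterministic when no such set contains two positions with the same symbol.

```python
def glushkov(expr):
    """Return (symbols, first, follow); positions are 0-based integers."""
    tokens = [c for c in expr if not c.isspace()]
    symbols = []   # symbols[p] is the letter at position p
    follow = {}    # follow[p] is the set of positions that may come right after p
    pos = 0        # cursor into tokens

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    # Every parse function returns (nullable, first positions, last positions).
    def parse_union():
        nonlocal pos
        nullable, first, last = parse_concat()
        while peek() == "+":
            pos += 1
            n2, f2, l2 = parse_concat()
            nullable, first, last = nullable or n2, first | f2, last | l2
        return nullable, first, last

    def parse_concat():
        nullable, first, last = parse_postfix()
        while peek() is not None and peek() not in "+)":
            n2, f2, l2 = parse_postfix()
            for p in last:              # last of the left part reaches first of the right part
                follow[p] |= f2
            first = first | f2 if nullable else first
            last = last | l2 if n2 else l2
            nullable = nullable and n2
        return nullable, first, last

    def parse_postfix():
        nonlocal pos
        nullable, first, last = parse_atom()
        while peek() in ("*", "?"):
            if tokens[pos] == "*":      # a star loops from last back to first
                for p in last:
                    follow[p] |= first
            nullable = True
            pos += 1
        return nullable, first, last

    def parse_atom():
        nonlocal pos
        tok = peek()
        if tok is None or tok in "+)*?":
            raise ValueError("unexpected token in expression")
        pos += 1
        if tok == "(":
            result = parse_union()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        p = len(symbols)                # a fresh position for this symbol occurrence
        symbols.append(tok)
        follow[p] = set()
        return False, {p}, {p}

    _, first, _ = parse_union()
    if pos != len(tokens):
        raise ValueError("unexpected token in expression")
    return symbols, first, follow


def is_deterministic(expr):
    """True iff no state of the Glushkov automaton has two successors with the same symbol."""
    symbols, first, follow = glushkov(expr)
    for successors in [first, *follow.values()]:
        letters = [symbols[p] for p in successors]
        if len(letters) != len(set(letters)):
            return False
    return True
```

Calling is_deterministic on the expressions from the slides ("aaa(b+c)", "aa(ab+ac)", "(a+b)*a", "b*a(b*a)*", "(a+b)*a(a+b)") is a quick way to check the classifications given there.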

Back to Schemas

What do people do with schemas?

• Validate trees
• Construct new schemas by combining others
• Redesign or optimize them

XML Validation

The XML validation problem
Input: A tree t and a schema D
Question: Is t ∈ L(D)?

Solution: Test, for every node u of the tree, whether the word formed by its children is in the language of the relevant expression

This was easy!

XML Validation

The incremental XML validation problem

Schema:
store   -> general, catalog
general -> name, url
catalog -> guitar*
guitar  -> maker, type, year, price, discount?

[Figure: XML tree with root store, whose children are general and catalog; general has children name and url; catalog has two guitar children, the first with children maker, type, year, price and the second with children maker, type, year, price, discount]

[Figure: the same tree after an update; the children of the first guitar are now maker, type, year, discount]

Oh no, it's broken!
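The per-node check from the validation slide can be tried on the store example. The sketch below is only an illustration under stated assumptions: trees are encoded as (label, children) pairs, each schema rule is written as a Python regular expression over child labels where every label is followed by one space, and labels without a rule must be leaves. None of these encodings or names (SCHEMA, is_valid, guitar, store) come from the slides.

```python
import re

# Assumed encoding of the store schema: label -> regex over "label " tokens.
SCHEMA = {
    "store":   r"general catalog ",
    "general": r"name url ",
    "catalog": r"(guitar )*",
    "guitar":  r"maker type year price (discount )?",
}


def validate(node, schema):
    """Check every node: the word of child labels must match the node's rule."""
    label, children = node
    word = "".join(child[0] + " " for child in children)
    rule = schema.get(label, "")          # no rule: only the empty word is allowed
    if re.fullmatch(rule, word) is None:
        return False
    return all(validate(child, schema) for child in children)


def is_valid(tree, schema, root="store"):
    """A tree is valid if its root has the expected label and every node passes."""
    return tree[0] == root and validate(tree, schema)


def leaf(label):
    return (label, [])


def guitar(*fields):
    return ("guitar", [leaf(f) for f in fields])


def store(*guitars):
    general = ("general", [leaf("name"), leaf("url")])
    return ("store", [general, ("catalog", list(guitars))])


valid_tree = store(
    guitar("maker", "type", "year", "price"),
    guitar("maker", "type", "year", "price", "discount"),
)
broken_tree = store(
    guitar("maker", "type", "year", "discount"),   # price is missing
    guitar("maker", "type", "year", "price", "discount"),
)
```

Here is_valid(valid_tree, SCHEMA) holds, while is_valid(broken_tree, SCHEMA) fails at the first guitar node. Note that this revalidates the whole tree from scratch, which is exactly what the incremental setting tries to avoid.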

XML Validation

The incremental XML validation problem

So, the setting is as follows:
• We have a schema D and a (possibly huge) XML tree t
• t ∈ L(D)
• But t is updated to u(t)

Question: is u(t) in L(D)?

This boils down to the same problem for regular expressions

Just take the above scenario and replace tree t and schema D by word w and expression r

Incremental Evaluation

Say that we have word w = a1 ... an and expression r

We can do the updates:
• replace(i,b): replace ai by b
• insert(i,b): insert a new symbol b after position i
• delete(i): delete position i

We want to deal with an update quickly

Say, time logarithmic in n and polynomial in |r|

To achieve this, we're allowed to store some auxiliary data

Can we do it?

Incremental Evaluation: How to do it

Take A = (Q,Σ,𝛅,I,F), a non-deterministic automaton for r

[Figure: a tree built over the positions of w, with a relation over Q stored at each node]

• At a position of w carrying symbol a, store all pairs (q1,q2) such that q2 ∈ 𝛅(q1,a)
• At a node whose children store R1 and R2, store R = {(q1,q3) | ∃ q2 with (q1,q2) in R1 and (q2,q3) in R2}
• At the root, we have w ∈ L(r) iff the stored relation contains a (qi,qf) with qi ∈ I and qf ∈ F

Incremental Evaluation

This shows that we can do incremental evaluation in time O((log n) · |r|^3) per update, while maintaining an auxiliary structure of size O(n · |r|^2)

[Patnaik and Immerman, 1997]
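The relation-per-node idea can be sketched with a segment tree. This is a simplified illustration, not the construction from the cited paper: it handles only replace(i,b) on a word of fixed length with 0-based positions, while insert and delete would need a balanced tree that can grow and shrink. The class name IncrementalMatcher and the encoding of the automaton (a dict from (state, symbol) to a set of states) are assumptions made for this example.

```python
class IncrementalMatcher:
    """Keeps, for each tree node, the pairs (q1, q2) such that the automaton
    can go from q1 to q2 by reading the part of the word below that node."""

    def __init__(self, states, delta, initial, final, word):
        self.states = list(states)
        self.delta = delta                  # dict: (state, symbol) -> set of states
        self.initial = set(initial)
        self.final = set(final)
        self.size = 1
        while self.size < max(len(word), 1):
            self.size *= 2
        # Padding leaves read no symbol, so they hold the identity relation.
        identity = frozenset((q, q) for q in self.states)
        self.tree = [identity] * (2 * self.size)
        for i, a in enumerate(word):
            self.tree[self.size + i] = self._leaf(a)
        for v in range(self.size - 1, 0, -1):
            self.tree[v] = self._compose(self.tree[2 * v], self.tree[2 * v + 1])

    def _leaf(self, a):
        # All (q1, q2) with q2 in delta(q1, a).
        return frozenset(
            (q1, q2) for q1 in self.states for q2 in self.delta.get((q1, a), ())
        )

    @staticmethod
    def _compose(r1, r2):
        # {(q1, q3) | there is q2 with (q1, q2) in r1 and (q2, q3) in r2}
        successors = {}
        for q2, q3 in r2:
            successors.setdefault(q2, set()).add(q3)
        return frozenset(
            (q1, q3) for q1, q2 in r1 for q3 in successors.get(q2, ())
        )

    def replace(self, i, b):
        """Replace the symbol at position i by b and repair the path to the root."""
        v = self.size + i
        self.tree[v] = self._leaf(b)
        v //= 2
        while v >= 1:
            self.tree[v] = self._compose(self.tree[2 * v], self.tree[2 * v + 1])
            v //= 2

    def accepts(self):
        """The word is accepted iff the root relation links an initial to a final state."""
        return any(q1 in self.initial and q2 in self.final for q1, q2 in self.tree[1])


# Example automaton for (a+b)*a: state 0 loops, state 1 is reached by a final a.
example_delta = {(0, "a"): {0, 1}, (0, "b"): {0}}
matcher = IncrementalMatcher({0, 1}, example_delta, {0}, {1}, "abba")
```

Each replace touches one leaf and the nodes on its path to the root, so the number of relation compositions per update is logarithmic in the length of the word.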

Back to Schemas

What do people do with schemas?

• Validate trees ✓
• Construct new schemas by combining others
• Redesign or optimize them

Combine Schemas: Why?

[Figure: on the Web, A and B exchange XML that conforms to schema XS]

[Figure: A1 with schema XS1 and A2 with schema XS2 both exchange XML with B, whose schema is marked "?"]

B needs the union of XS1 and XS2

[Figure: A, whose schema is marked "?", exchanges XML with B1 with schema XS1 and with B2 with schema XS2]

A needs the intersection of XS1 and XS2

Combine Schemas: Why?

• You have a schema S
• You update it to S'

Can you make a schema for the trees that are now invalid?

This requires taking the difference of schemas...

...which requires taking the difference of regular expressions

Actually, many fundamental questions regarding regular expressions regained interest through schema-related problems

Some Regular Expression Questions

Union

Given regular expressions r1 and r2, what is the worst-case blow-up for their union?

Theorem [AMW School 2015]
Dude, this is trivial: linear. You just write (r1 + r2)

Some Regular Expression Questions

Intersection

Given regular expressions r1 and r2, how large, in general, is the smallest expression for their intersection?

Quick Thoughts:

• We can convert them to NFAs N1 and N2 (linear)
• We construct an NFA for their intersection (O(|r1| x |r2|))
• Convert this NFA back to an expression (exponential)

What? Can't we do better?

Theorem [Gelade, Neven, 2012 (STACS 2008)]
For every n ∈ 𝗡, there are deterministic regular expressions r1 and r2 such that
• r1 and r2 have size O(n) and
• every regular expression for L(r1) ⋂ L(r2) has size at least 2^n
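The second step of the quick thoughts, building an NFA for the intersection, is the standard product construction. The sketch below assumes NFAs without ε-transitions, encoded as tuples (states, delta, initial, final) with delta a dict from (state, symbol) to a set of states. This encoding, the function names and the two example automata are illustrative choices, not taken from the slides.

```python
from itertools import product


def intersect(nfa1, nfa2):
    """Product NFA: it runs both automata in lockstep on the same word."""
    states1, delta1, init1, final1 = nfa1
    states2, delta2, init2, final2 = nfa2
    # Only symbols that both automata can read are useful in the product.
    symbols = {a for (_, a) in delta1} & {a for (_, a) in delta2}
    delta = {}
    for (p, q), a in product(product(states1, states2), symbols):
        targets = set(product(delta1.get((p, a), ()), delta2.get((q, a), ())))
        if targets:
            delta[((p, q), a)] = targets
    return (
        set(product(states1, states2)),   # |Q1| x |Q2| states
        delta,
        set(product(init1, init2)),
        set(product(final1, final2)),     # accept only if both accept
    )


def accepts(nfa, word):
    """Simulate the NFA by tracking the set of states reachable so far."""
    _, delta, initial, final = nfa
    current = set(initial)
    for a in word:
        current = {q2 for q1 in current for q2 in delta.get((q1, a), ())}
    return bool(current & set(final))


# N1 accepts (a+b)*a, the words ending in a.
N1 = ({0, 1}, {(0, "a"): {0, 1}, (0, "b"): {0}}, {0}, {1})
# N2 accepts a(a+b)*, the words starting with a.
N2 = ({0, 1}, {(0, "a"): {1}, (1, "a"): {1}, (1, "b"): {1}}, {0}, {1})
N12 = intersect(N1, N2)
```

The product has |Q1| · |Q2| states, which matches the O(|r1| x |r2|) bound on the slide. The expensive part is the last step, turning the product back into an expression.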

Some Regular Expression QuestionsComplementationGiven a regular expression r, how large, in general, is the smallest expression for its complement, that is, for Σ* - L(r)?

Page 167: Regular Expressions for Processing Data on the Web · • You may have heard of regular expressions • Processing Data on the Web is a hot topic So, let's • put both of them in

Some Regular Expression QuestionsComplementationGiven a regular expression r, how large, in general, is the smallest expression for its complement, that is, for Σ* - L(r)?

Quick Thoughts:

Page 168: Regular Expressions for Processing Data on the Web · • You may have heard of regular expressions • Processing Data on the Web is a hot topic So, let's • put both of them in

Some Regular Expression QuestionsComplementationGiven a regular expression r, how large, in general, is the smallest expression for its complement, that is, for Σ* - L(r)?

Quick Thoughts:

• We can convert r to NFA Nr (linear)• Determinize Nr and obtain DFA Dr (exponential)• Complement Dr and obtain DFA D¬r (linear)• Convert D¬r to regular expression ¬r (exponential)

Page 169: Regular Expressions for Processing Data on the Web · • You may have heard of regular expressions • Processing Data on the Web is a hot topic So, let's • put both of them in

Some Regular Expression QuestionsComplementationGiven a regular expression r, how large, in general, is the smallest expression for its complement, that is, for Σ* - L(r)?

Quick Thoughts:

• We can convert r to NFA Nr (linear)• Determinize Nr and obtain DFA Dr (exponential)• Complement Dr and obtain DFA D¬r (linear)• Convert D¬r to regular expression ¬r (exponential)

That's double exponential!

Page 170: Regular Expressions for Processing Data on the Web · • You may have heard of regular expressions • Processing Data on the Web is a hot topic So, let's • put both of them in

Some Regular Expression QuestionsComplementationGiven a regular expression r, how large, in general, is the smallest expression for its complement, that is, for Σ* - L(r)?

Quick Thoughts:

• We can convert r to NFA Nr (linear)• Determinize Nr and obtain DFA Dr (exponential)• Complement Dr and obtain DFA D¬r (linear)• Convert D¬r to regular expression ¬r (exponential)

That's double exponential!

Can we do better?

Page 171: Regular Expressions for Processing Data on the Web · • You may have heard of regular expressions • Processing Data on the Web is a hot topic So, let's • put both of them in

Some Regular Expression QuestionsComplementationGiven a regular expression r, what is the smallest expression for its complement, that is, for Σ* - L(r)?

Page 172: Regular Expressions for Processing Data on the Web · • You may have heard of regular expressions • Processing Data on the Web is a hot topic So, let's • put both of them in

Some Regular Expression QuestionsComplementationGiven a regular expression r, what is the smallest expression for its complement, that is, for Σ* - L(r)?

Theorem [Gelade, Neven, 2012 (STACS 2008)]

Page 173: Regular Expressions for Processing Data on the Web · • You may have heard of regular expressions • Processing Data on the Web is a hot topic So, let's • put both of them in

Some Regular Expression Questions

Complementation
Given a regular expression r, what is the smallest expression for its complement, that is, for Σ* - L(r)?

Theorem [Gelade, Neven, 2012 (STACS 2008)]
There exist regular expressions (rn)n∈ℕ such that

• each rn has size O(n) and
• every regular expression for Σ* - L(rn) has size at least 2^(2^n)

Open Problem

Complementation
Given a regular expression r, what is the smallest expression for its complement, that is, for Σ* - L(r)?

• We know that the smallest expression for the complement can be double exponentially large
• But what if r comes from a schema (i.e., r is deterministic)?
• What if we want to find a deterministic expression for the complement?

Nobody knows!

[Losemann et al., 2012]

Back to Schemas

What do people do with schemas?

• Validate trees ✓
• Construct new schemas by combining others ✓
• Redesign or optimize them

The problems:

Say that we redesign, rewrite, or optimize a schema.
Does it still accept all the data it accepted before?
This is called containment: is L(S1) ⊆ L(S2)?

Say that we want to keep the language the same.
Can we find a good algorithm for optimizing the schema?
This is called minimization: find the smallest equivalent schema

Fact
Both the containment and minimization problems boil down to the same problems for regular expressions

For minimization, this is easy to see:
If a schema has some non-minimal regular expression, it is obviously not minimal

For containment, it is easy to see that L(D1) ⊆ L(D2) if and only if, for every a ∈ Σ, we have L(r1) ⊆ L(r2), where a → ri is the rule for a in Di (i = 1,2)

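Containment and Minimization

We will show: Containment of regular expressions is PSPACE-complete

Containment:
Given regular expressions r1 and r2, is L(r1) ⊆ L(r2)?

Step 1: Containment is in PSPACE
This is not so difficult:

• Construct the NFAs N1 and N2
• Guess a word w symbol by symbol
• Test if w is in L(N1) but not in L(N2)

While guessing, it suffices to store the sets of states that N1 and N2 can currently be in, which takes polynomial space. This nondeterministic test finds a counterexample to containment; since PSPACE is closed under nondeterminism and complement, containment is in PSPACE.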
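The test from Step 1 can be sketched as a search over pairs of state sets. This is only an illustration under assumptions that are not in the slides: NFAs are given directly as dictionaries without ε-transitions (a hypothetical encoding), and the nondeterministic guessing is replaced by a breadth-first search that remembers visited pairs, so it may use exponential space where the PSPACE procedure stores just one pair at a time.

```python
from collections import deque

def step(nfa, states, symbol):
    """All states reachable from `states` by reading `symbol`."""
    return frozenset(t for q in states for t in nfa["delta"].get((q, symbol), ()))

def counterexample(n1, n2, alphabet):
    """Return a shortest word in L(n1) - L(n2), or None if L(n1) is contained in L(n2)."""
    start = (frozenset({n1["start"]}), frozenset({n2["start"]}))
    queue, seen = deque([(start, "")]), {start}
    while queue:
        (s1, s2), word = queue.popleft()
        # w is in L(N1) but not in L(N2): containment fails
        if s1 & n1["accept"] and not (s2 & n2["accept"]):
            return word
        for symbol in alphabet:
            nxt = (step(n1, s1, symbol), step(n2, s2, symbol))
            if nxt[0] and nxt not in seen:  # dead ends of N1 cannot yield a witness
                seen.add(nxt)
                queue.append((nxt, word + symbol))
    return None

# Hand-built NFAs for a* and (a+b)*
a_star = {"start": 0, "accept": {0}, "delta": {(0, "a"): {0}}}
ab_star = {"start": 0, "accept": {0}, "delta": {(0, "a"): {0}, (0, "b"): {0}}}
print(counterexample(a_star, ab_star, "ab"))  # None: a* is contained in (a+b)*
print(counterexample(ab_star, a_star, "ab"))  # a word of (a+b)* that is not in a*
```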

Step 2: Containment is PSPACE-hard

This is the interesting direction

We will reduce from corridor tiling

Corridor Tiling

A tiling system S consists of:

• finite set T of tile types [figure: four example tiles]
• the top row of tiles [figure]
• the bottom row of tiles [figure]

Question: Can we make a correct corridor tiling?

Corridor Tiling

Question: Can we make a correct corridor tiling?

[Figure: a corridor to be tiled, with the given bottom row at the bottom and the given top row at the top]

Corridor Tiling

Correct Tilings:
A corridor tiling is correct iff:

• the bottom row is correct
• the top row is correct
• in between are only tile types from T
• adjacent sides on tiles have the same color

Theorem
Given a tiling system, deciding if there is a correct corridor tiling is PSPACE-complete
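The four conditions for a correct tiling can be written down as a small checker. This is a sketch under an assumption: the tile pictures are not reproduced in this transcript, so tiles are modeled here as a hypothetical `Tile` record holding the colors of its four sides, and a tiling is a list of rows ordered from bottom to top.

```python
from typing import NamedTuple

class Tile(NamedTuple):
    # Hypothetical tile model: one color name per side
    left: str
    top: str
    right: str
    bottom: str

def is_correct(rows, bottom_row, top_row, tile_types):
    """rows[0] is the bottom row of the corridor and rows[-1] its top row."""
    if not rows or list(rows[0]) != list(bottom_row) or list(rows[-1]) != list(top_row):
        return False
    width = len(bottom_row)
    # every row has the corridor width and uses only tile types from T
    if any(len(row) != width or any(t not in tile_types for t in row) for row in rows):
        return False
    horizontal_ok = all(a.right == b.left
                        for row in rows for a, b in zip(row, row[1:]))
    vertical_ok = all(low.top == high.bottom
                      for below, above in zip(rows, rows[1:])
                      for low, high in zip(below, above))
    return horizontal_ok and vertical_ok

p = Tile(left="red", top="green", right="red", bottom="blue")
q = Tile(left="red", top="blue", right="red", bottom="green")
print(is_correct([[p, p], [q, q]], [p, p], [q, q], {p, q}))
```

Checking a given tiling is easy; the hardness in the theorem comes from the number of rows, which is not bounded in advance.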

Given a tiling system S, we construct regular expressions r1, r2 such that L(r1) ⊆ L(r2) if and only if there is no correct corridor tiling for S

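Encoding Tilings

To this end, we must encode tilings as words over alphabet T ∪ {#}:

[Figure: an example tiling] is encoded as the word

# [row] # [row] # ... # [row] #

[tile images not reproduced; each row of tiles is delimited by #]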
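As a small illustration of the encoding, here is a sketch that assumes each tile type has a single-character name and that rows are listed from bottom to top (the order in which r1 later lists the bottom and top rows). The function name `encode` is hypothetical.

```python
def encode(rows):
    """Encode a tiling (list of rows, bottom to top) as a word over T ∪ {#}."""
    return "#" + "#".join("".join(row) for row in rows) + "#"

print(encode(["ab", "ba"]))
```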

Corridor Tiling: Notation

T: set of tiles

Matching Relations H and V

• H ⊆ T × T: horizontal matching relation
  H := {(x1,x2) | right of x1 has same color as left of x2}
• V ⊆ T × T: vertical matching relation
  V := {(x1,x2) | top of x1 has same color as bottom of x2}

Reminder: Goal
Expressions r1 and r2 such that
L(r1) ⊆ L(r2)
iff S has no correct tiling
iff all tilings violate H or V
iff all encodings of tilings (r1) ⊆ words that violate H or V (r2)
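The matching relations H and V can be computed directly from the tile set. The sketch below assumes a hypothetical `NamedTile` record with a name and one color per side, since the slides only show tiles as pictures.

```python
from typing import NamedTuple

class NamedTile(NamedTuple):
    # Hypothetical tile model: a name plus one color per side
    name: str
    left: str
    top: str
    right: str
    bottom: str

def matching_relations(tiles):
    """Return (H, V) as sets of pairs of tile names."""
    H = {(x1.name, x2.name) for x1 in tiles for x2 in tiles if x1.right == x2.left}
    V = {(x1.name, x2.name) for x1 in tiles for x2 in tiles if x1.top == x2.bottom}
    return H, V

tiles = [NamedTile("p", "red", "green", "red", "blue"),
         NamedTile("q", "red", "blue", "red", "green")]
H, V = matching_relations(tiles)
print(sorted(H), sorted(V))
```

In a pair (x1,x2) of V, the tile x1 sits directly below x2, which is the order in which the two tiles appear in a bottom-to-top encoding.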

Construction

Encoding: [figure: the rows of the tiling, each delimited by #]

Notation
T abbreviates x1 + ... + xk, where T = {x1,...,xk}
T^n abbreviates T ... T (n times)

r1 (encodings of tilings):

• Tiling: (# T^n)* #
• Bot. row: # b1 ... bn # (T+#)*
• Top row: (T+#)* # t1 ... tn #

r2 (words that violate H or V):

• For all (x1,x2) not in H: (T+#)* x1 x2 (T+#)*
• For all (x1,x2) not in V: (T+#)* x1 (T+#)^n x2 (T+#)*

Construction

r1 = # b1 ... bn (# T^n)* # t1 ... tn #

r2 = ⨁(x1,x2) ∉ H [(T+#)* x1 x2 (T+#)*] + ⨁(x1,x2) ∉ V [(T+#)* x1 (T+#)^n x2 (T+#)*]

Here ⨁ denotes the union (+) over all listed pairs, and (T+#)^n stands for n copies of (T+#).

So, the tiling system has no correct tiling iff L(r1) ⊆ L(r2)

...which means that regular expression containment is PSPACE-complete
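The two expressions can be assembled mechanically from a tiling system. The following sketch uses Python's `re` syntax as a stand-in for the slide notation (`|` for +, a character class for T and for T+#) and assumes single-letter tile names; the function name `build_expressions` and the toy tiling system are hypothetical.

```python
import re

def build_expressions(tiles, H, V, bottom, top):
    """Build r1 and r2 for a tiling system; tiles are single characters."""
    n = len(bottom)
    T = "[" + "".join(re.escape(t) for t in tiles) + "]"        # T
    any_sym = "[" + "".join(re.escape(t) for t in tiles) + "#]"  # T + #
    r1 = f"#{re.escape(bottom)}(?:#{T}{{{n}}})*#{re.escape(top)}#"
    bad_h = [f"{any_sym}*{re.escape(x1)}{re.escape(x2)}{any_sym}*"
             for x1 in tiles for x2 in tiles if (x1, x2) not in H]
    bad_v = [f"{any_sym}*{re.escape(x1)}{any_sym}{{{n}}}{re.escape(x2)}{any_sym}*"
             for x1 in tiles for x2 in tiles if (x1, x2) not in V]
    # "(?!)" never matches; it represents the empty language when nothing is forbidden
    r2 = "|".join(bad_h + bad_v) or "(?!)"
    return re.compile(r1), re.compile(f"(?:{r2})")

# Toy system: two tiles that must alternate both horizontally and vertically
H = {("a", "b"), ("b", "a")}
V = {("a", "b"), ("b", "a")}
r1, r2 = build_expressions("ab", H, V, bottom="ab", top="ba")

good = "#ab#ba#"     # encodes a correct tiling
bad = "#ab#ab#ba#"   # second row repeats the first, violating V
print(bool(r1.fullmatch(good)), bool(r2.fullmatch(good)))
print(bool(r1.fullmatch(bad)), bool(r2.fullmatch(bad)))
```

A word in L(r1) that is not in L(r2) is exactly the encoding of a correct tiling, so such a word witnesses that L(r1) ⊆ L(r2) fails.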

Actually, this whole construction can be changed such that r1 = Σ*

So, it is also hard to decide if a given expression defines Σ*

This, in turn, can be used to show that regular expression minimization is also PSPACE-complete!

Now you've seen some basics of regular expressions

...and how new applications lead to the discovery of fundamental results about them

References

• [Brüggemann-Klein, Wood, 1998] Anne Brüggemann-Klein, Derick Wood. One-Unambiguous Regular Languages. Inf. Comput. 140(2): 229-253 (1998)

• [Czerwinski et al., 2013] Wojciech Czerwinski, Claire David, Katja Losemann, Wim Martens. Deciding Definability by Deterministic Regular Expressions. FoSSaCS 2013: 289-304

• [Gelade and Neven, 2012] Wouter Gelade, Frank Neven. Succinctness of the Complement and Intersection of Regular Expressions. ACM Trans. Comput. Log. 13(1): 4 (2012)

• [Losemann et al., 2012] Katja Losemann, Wim Martens, Matthias Niewerth. Descriptional Complexity of Deterministic Regular Expressions. MFCS 2012: 643-654

• [Patnaik and Immerman, 1997] Sushant Patnaik, Neil Immerman. Dyn-FO: A Parallel, Dynamic Complexity Class. J. Comput. Syst. Sci. 55(2): 199-209 (1997)

End of Part 1