Function Matching. Amihood Amir, Yonatan Aumann, Moshe Lewenstein, Ely Porat. Bar Ilan University.


Post on 12-Jan-2016


Page 1: Function Matching

Function Matching

Amihood Amir

Yonatan Aumann

Moshe Lewenstein

Ely Porat

Bar Ilan University

Page 2: Function Matching

Baker’s Parameterized Matching

Prog.c

int a,b;

a=1;
a = g(a)*5+f(a);
b=2;
a = func(a,b);
a = a*g(b);
b=1;
b = g(b)*5+f(b);
…

Page 3: Function Matching

Baker’s Parameterized Matching

Prog.c

int a,b;

a=1;
a = g(a)*5+f(a);
b=2;
a = func(a,b);
a = a*g(b);
b=1;
b = g(b)*5+f(b);
…

Pattern

c=1;
c = g(c)*5+f(c);

Baker’s work

pdup, dupstat, psearch

SICOMP 1997, JCSS 1996

Page 4: Function Matching

Two dimensional parameterized matching

pattern

‘A horse is a horse, it ain’t make a difference what color it is’ John Wayne

Page 5: Function Matching

Parameterized Matching

Input: $P = p_1 \ldots p_m$ over alphabet $\Sigma_P$; $T = t_1 \ldots t_n$ over alphabet $\Sigma_T$

Output: locations $i$ of $T$ for which a bijection $\Pi : \Sigma_P \to \Sigma_T$ exists s.t.

$\Pi(P) = \Pi(p_1)\Pi(p_2) \ldots \Pi(p_m) = t_i \ldots t_{i+m-1}$

Page 6: Function Matching

Parameterized Matching

• One dimensional

• Baker 1996, JCSS - Suffix Trees

• Baker 1997, SICOMP - Boyer Moore

• Amir, Farach, Muthu 1995, IPL - Knuth-Morris-Pratt

• Two dimensional

Regular methods fail !!

Page 7: Function Matching

Function Matching

Input: $P = p_1 \ldots p_m$ over alphabet $\Sigma_P$; $T = t_1 \ldots t_n$ over alphabet $\Sigma_T$

Output: locations $i$ of $T$ where $f : \Sigma_P \to \Sigma_T$ exists s.t.

$f(P) = f(p_1)f(p_2) \ldots f(p_m) = t_i \ldots t_{i+m-1}$

Page 8: Function Matching

Function Matching

Input: $P = p_1 \ldots p_m$ over alphabet $\Sigma_P$; $T = t_1 \ldots t_n$ over alphabet $\Sigma_T$

Output: locations $i$ of $T$ where $f : \Sigma_P \to \Sigma_T$ exists s.t. $f(P) = f(p_1)f(p_2) \ldots f(p_m) = t_i \ldots t_{i+m-1}$

P = h e h a e h
T = a b c b a c b a d a b d a d d a d

Page 9: Function Matching

Function Matching

P = h e h a e h
T = a b c b a c b a d a b d a d d a d

f(h) = b, f(e) = c, f(a) = a

Match at location i = 2: f(P) = b c b a c b

Page 10: Function Matching

Function Matching

P = h e h a e h
T = a b c b a c b a d a b d a d d a d

f(h) = a, f(e) = d, f(a) = b

Match at location i = 8: f(P) = a d a b d a

Page 11: Function Matching

Function Matching

P = h e h a e h
T = a b c b a c b a d a b d a d d a d

f(h) = d, f(e) = a, f(a) = d

Match at location i = 12: f(P) = d a d d a d

Page 12: Function Matching

Function Matching

P = h e h a e h
T = a b c b a c b a d a b d a d d a d

f(h) = ??

No match!

Page 13: Function Matching

Function Matching vs. Parameterized Matching

P p-matches $t_i \ldots t_{i+m-1}$ iff

1. P f-matches $t_i \ldots t_{i+m-1}$, and
2. the number of distinct symbols in $t_i \ldots t_{i+m-1}$ equals the number of distinct symbols in P.

P = h e h a e h
T = a b c b a c b a d a b d a d d a d

f(h) = d, f(e) = a, f(a) = d: an f-match only (location 12; two pattern symbols map to d)

f(h) = b, f(e) = c, f(a) = a: an f-match that is also a p-match (location 2)
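The two conditions above can be checked directly on a single window. The following is a minimal sketch, not code from the talk; the function names `f_matches` and `p_matches` are made up for illustration.

```python
def f_matches(pattern, window):
    """True if some function f maps pattern onto window, symbol by symbol."""
    if len(pattern) != len(window):
        return False
    f = {}
    for p, t in zip(pattern, window):
        # setdefault records f(p) the first time p is seen,
        # afterwards it returns the recorded image for the consistency check
        if f.setdefault(p, t) != t:
            return False
    return True


def p_matches(pattern, window):
    """f-match plus condition 2: equal numbers of distinct symbols."""
    return f_matches(pattern, window) and len(set(window)) == len(set(pattern))


P = "hehaeh"
T = "abcbacbadabdaddad"
print(f_matches(P, T[11:17]), p_matches(P, T[11:17]))  # window "daddad": True False
print(f_matches(P, T[1:7]), p_matches(P, T[1:7]))      # window "bcbacb": True True
```

The first window is the f-match-only case from the slide (h and a both map to d); the second is also a p-match.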

Page 14: Function Matching

Naïve Algorithm

At each location i of text T, check whether the pattern f-matches.

Check:

• For each letter ‘a’ in the pattern: are the text elements aligned with the pattern’s ‘a’s all the same? If not, declare ‘no match’.

• All letters “OK”: declare ‘match’.

Running time: O(nm), where m = |P| and n = |T|
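A minimal sketch of the naïve check, run on the example pattern and text used earlier in the talk. The function name and the 1-based output convention are choices made here for illustration.

```python
def naive_function_matching(pattern, text):
    """Return the 1-based locations i where pattern f-matches text[i..i+m-1]."""
    m, n = len(pattern), len(text)
    matches = []
    for i in range(n - m + 1):
        image = {}  # pattern letter -> the text symbol it must map to
        ok = True
        for j in range(m):
            p, t = pattern[j], text[i + j]
            if image.setdefault(p, t) != t:
                ok = False  # two occurrences of p face different symbols
                break
        if ok:
            matches.append(i + 1)
    return matches


print(naive_function_matching("hehaeh", "abcbacbadabdaddad"))  # [2, 8, 12]
```

Each location costs at most m dictionary operations, which gives the O(nm) bound.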

Page 15: Function Matching

Function Matching with Don’t Cares

Input: $P = p_1 \ldots p_m$ over alphabet $\Sigma_P \cup \{?\}$; $T = t_1 \ldots t_n$ over alphabet $\Sigma_T$

P = h e ? ? e h
T = a b c b a c b c d b c d a d d a d

Output: locations $i$ of $T$ where $f : \Sigma_P \to \Sigma_T$ exists s.t.

$f(P) = f(p_1)f(p_2) \ldots f(p_m) = t_i \ldots t_{i+m-1}$,

where f(?) is a wildcard.

Page 16: Function Matching

Why do we need don’t cares?

[Figure: Pattern and Text]

Page 17: Function Matching

Linearize Text and Pattern

[Figure: the lines of the Text (Line 1, Line 2, …) are laid out one after another to form T]

Page 18: Function Matching

Linearize Text and Pattern

[Figure: the Text (n by n) is laid out line by line (Line 1, Line 2, …) to form T; the Pattern (m by m) is laid out line by line to form P, with runs of n-m don’t cares ‘?’ between its lines]

Page 19: Function Matching

Polynomial Multiplication - Convolutions

Multiply the text by the reversed pattern, as in long multiplication:

t1 t2 t3 t4 . . . tn-2 tn-1 tn
× pm pm-1 . . . p2 p1

p1t1 p1t2 . . . p1tn-2 p1tn-1 p1tn
p2t1 p2t2 p2t3 . . . p2tn-2 p2tn-1 p2tn
p3t1 p3t2 p3t3 p3t4 . . . p3tn-1 p3tn
. . .
pmt1 . . . pmtm pmtm+1 . . . pmtn-1 pmtn

Running time: O(n log m)

Page 20: Function Matching

Convolutions: Fischer-Paterson [1974]

t1 t2 t3 t4 . . . tn-2 tn-1 tn
× pm pm-1 . . . p2 p1

p1t1 p1t2 . . . p1tn-2 p1tn-1 p1tn
p2t1 p2t2 p2t3 . . . p2tn-2 p2tn-1 p2tn
p3t1 p3t2 p3t3 p3t4 . . . p3tn-1 p3tn
. . .
pmt1 . . . pmtm pmtm+1 . . . pmtn-1 pmtn

With p1 p2 p3 p4 . . . pm aligned at text location 1, the corresponding column of the table sums to

$\sum_{i=1}^{m} p_i t_i$

Page 21: Function Matching

Convolutions: Fischer-Paterson [1974]

t1 t2 t3 t4 . . . tn-2 tn-1 tn
× pm pm-1 . . . p2 p1

p1t1 p1t2 . . . p1tn-2 p1tn-1 p1tn
p2t1 p2t2 p2t3 . . . p2tn-2 p2tn-1 p2tn
p3t1 p3t2 p3t3 p3t4 . . . p3tn-1 p3tn
. . .
pmt1 . . . pmtm pmtm+1 . . . pmtn-1 pmtn

With p1 p2 p3 p4 . . . pm aligned at text location 2, the corresponding column of the table sums to

$\sum_{i=1}^{m} p_i t_{i+1}$
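The link between the multiplication table and the aligned sums can be sketched as follows. This is an illustration under stated assumptions, not the talk's implementation: `poly_mul` is a schoolbook O(nm) product standing in for the FFT-based multiplication that the O(n log m) bound relies on, and both function names are made up here.

```python
def poly_mul(a, b):
    """Schoolbook product of two coefficient lists (quadratic stand-in for an FFT)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def aligned_sums(text, pattern):
    """For each 0-based location i, return sum_j pattern[j] * text[i + j]."""
    m = len(pattern)
    prod = poly_mul(text, pattern[::-1])  # multiply by the reversed pattern
    # entry i + m - 1 of the product is the sum for the pattern placed at i
    return prod[m - 1:len(text)]


print(poly_mul([1, 2, 3], [4, 5]))            # [4, 13, 22, 15]
print(aligned_sums([1, 2, 3, 4], [1, 0, 2]))  # [7, 10]
```

Reversing the pattern turns the product's column sums into the sums over aligned positions, which is why the later slides read entry i+m-1 of the product.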

Page 22: Function Matching

How does this help for Function Matching?

The property that needs to be checked is: beneath each symbol from the pattern alphabet, all text characters must be the same.

Page 23: Function Matching

Example

T = a b c b a c b a c a b d a d d a d e a
P = h e h a e h ? e
$P^R$ = e ? h e a h e h

Page 24: Function Matching

Example - h in P vs. a in T

T = a b c b a c b a c a b d a d d a d e a
P = h e h a e h ? e
$P^R$ = e ? h e a h e h

$T_a$ = 1 0 0 0 1 0 0 1 0 1 0 0 1 0 0 1 0 0 1
$P^R_h$ = 0 0 1 0 0 1 0 1

Page 25: Function Matching

Example - h vs. a

T = a b c b a c b a c a b d a d d a d e a
P = h e h a e h ? e
$P^R$ = e ? h e a h e h

$T_a$ = 1 0 0 0 1 0 0 1 0 1 0 0 1 0 0 1 0 0 1
$P^R_h$ = 0 0 1 0 0 1 0 1

Long multiplication of $T_a$ by $P^R_h$: each 1 in $P^R_h$ contributes a shifted copy of $T_a$, each 0 a row of zeros. Summing the columns gives

$T_a * P^R_h$ = 0 0 1 0 0 1 1 1 0 2 0 2 1 0 3 0 1 2 0 1 2 0 1 1 0 1

Page 26: Function Matching

Example - h vs. a

T = a b c b a c b a c a b d a d d a d e a
P = h e h a e h ? e
$P^R$ = e ? h e a h e h

$T_a$ = 1 0 0 0 1 0 0 1 0 1 0 0 1 0 0 1 0 0 1
$P^R_h$ = 0 0 1 0 0 1 0 1

$T_a * P^R_h$ = 0 0 1 0 0 1 1 1 0 2 0 2 1 0 3 0 1 2 0 1 2 0 1 1 0 1

Entry i+m-1 of the product counts the a’s of T that lie beneath an h when P = h e h a e h ? e is placed at location i.

Page 27: Function Matching

Example - h vs. a

T = a b c b a c b a c a b d a d d a d e a
P = h e h a e h ? e
$P^R$ = e ? h e a h e h

$T_a$ = 1 0 0 0 1 0 0 1 0 1 0 0 1 0 0 1 0 0 1
$P^R_h$ = 0 0 1 0 0 1 0 1

h - a: 0 0 1 0 0 1 1 1 0 2 0 2 1 0 3 0 1 2 0 1 2 0 1 1 0 1

=> in O(n log m) time!!

Page 28: Function Matching

Example

T = a b c b a c b a c a b d a d d a d e a
P = h e h a e h ? e
$P^R$ = e ? h e a h e h

h - a: 1 0 2 0 2 1 0 3 0 1 2 0
h - b: 0 3 0 1 1 1 1 0 1 0 1 0
h - c: 2 0 1 2 0 1 1 0 1 0 0 0
h - d: 0 0 0 0 0 0 1 0 1 2 0 3

Match(h): 0 1 0 0 0 0 0 1 0 0 0 1

=> in $O(|\Sigma_T| \, n \log m)$ time!!

Page 29: Function Matching

In general - the Algorithm

• For each character ‘a’ in $\Sigma_P$ create $P_a$

• For each character ‘b’ in $\Sigma_T$ create $T_b$

• For all $P_a$ and $T_b$ multiply them and construct Match(a) for each ‘a’ in $\Sigma_P$

• Announce each location i of T as a ‘match’ if Match(a)[i] = 1 for all a’s in P

=> in $O(|\Sigma_T| |\Sigma_P| \, n \log m)$ time.
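The steps above can be sketched as follows, using the example text and pattern from the preceding slides. This is an illustrative sketch only: the helper names are made up, `poly_mul` is a schoolbook product standing in for an FFT (so the code does not achieve the stated running time), and treating `?` as a letter that imposes no constraint is an assumption based on the don't-care slides.

```python
def poly_mul(a, b):
    """Schoolbook product of two coefficient lists (quadratic stand-in for an FFT)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def indicator(seq, symbol):
    """0/1 vector marking where symbol occurs in seq (the slides' T_b and P_a)."""
    return [1 if s == symbol else 0 for s in seq]


def match_vector(pattern, text, a):
    """Match(a)[i] = 1 iff every occurrence of a in pattern faces one text symbol."""
    m, n = len(pattern), len(text)
    occurrences = pattern.count(a)
    p_rev = indicator(pattern, a)[::-1]
    match = [0] * (n - m + 1)
    for b in sorted(set(text)):
        prod = poly_mul(indicator(text, b), p_rev)
        for i in range(n - m + 1):
            # prod[i + m - 1] counts the b's lying beneath the a's at location i
            if prod[i + m - 1] == occurrences:
                match[i] = 1
    return match


def function_matching(pattern, text, wildcard="?"):
    """1-based locations where Match(a)[i] = 1 for every non-wildcard letter a."""
    letters = sorted(set(pattern) - {wildcard})
    vectors = [match_vector(pattern, text, a) for a in letters]
    return [i + 1 for i in range(len(text) - len(pattern) + 1)
            if all(v[i] for v in vectors)]


T = "abcbacbacabdaddadea"
P = "hehaeh?e"
print(match_vector(P, T, "h"))  # [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1]
print(function_matching(P, T))  # [2, 12]
```

The printed Match(h) vector agrees with the one shown in the example.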

Page 30: Function Matching

Improvement

Lemma: Let $a_1, \ldots, a_k \in \mathbb{N}$. Then

$k \sum_{h=1}^{k} a_h^2 = \left( \sum_{h=1}^{k} a_h \right)^2$ iff for all $i, j$: $a_i = a_j$

Idea: Let’s encode the text with numbers for symbols, and encode the pattern so as to compute their sum and, separately, their sum of squares.

Page 31: Function Matching

Improvement

Lemma: Let $a_1, \ldots, a_k \in \mathbb{N}$. Then

$k \sum_{h=1}^{k} a_h^2 = \left( \sum_{h=1}^{k} a_h \right)^2$ iff for all $i, j$: $a_i = a_j$

Example: Compute the sum of the text characters beneath “e”

T# = 1 2 3 2 1 3 2 1 3 1 2 4 1 4 4 1 4 5 1
T = a b c b a c b a c a b d a d d a d e a
P = h e h a e h ? e
$P_e$ = 0 1 0 0 1 0 0 1

Page 32: Function Matching

Improvement

Lemma: Let $a_1, \ldots, a_k \in \mathbb{N}$. Then

$k \sum_{h=1}^{k} a_h^2 = \left( \sum_{h=1}^{k} a_h \right)^2$ iff for all $i, j$: $a_i = a_j$

Example: Compute the sum of squares beneath “e”

T#² = 1 4 9 4 1 9 4 1 9 1 4 16 1 16 16 1 16 25 1
T# = 1 2 3 2 1 3 2 1 3 1 2 4 1 4 4 1 4 5 1
T = a b c b a c b a c a b d a d d a d e a
P = h e h a e h ? e
$P_e$ = 0 1 0 0 1 0 0 1

Page 33: Function Matching

Improvement

Lemma: Let $a_1, \ldots, a_k \in \mathbb{N}$. Then

$k \sum_{h=1}^{k} a_h^2 = \left( \sum_{h=1}^{k} a_h \right)^2$ iff for all $i, j$: $a_i = a_j$

Running Time:

Two convolutions for each pattern character.

$O(|\Sigma_P| \, n \log m)$
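The sum and sum-of-squares test can be sketched on the example above. This is a minimal illustration with made-up helper names; the two aligned sums are computed directly here, whereas the talk obtains each of them with one convolution.

```python
def encode(text):
    """Number the symbols 1, 2, 3, ... in order of first appearance (the slides' T#)."""
    codes = {}
    return [codes.setdefault(s, len(codes) + 1) for s in text]


def aligned_sums(values, mask):
    """For each location i, sum the values lying beneath the 1s of mask."""
    m = len(mask)
    return [sum(v for v, bit in zip(values[i:i + m], mask) if bit)
            for i in range(len(values) - m + 1)]


def match_vector(pattern, text, a):
    """Match(a)[i] = 1 iff k * (sum of squares) == (sum)^2 beneath the a's."""
    t_num = encode(text)
    t_sq = [v * v for v in t_num]
    mask = [1 if p == a else 0 for p in pattern]  # the slides' P_a
    k = sum(mask)
    s1 = aligned_sums(t_num, mask)
    s2 = aligned_sums(t_sq, mask)
    return [1 if k * q == s * s else 0 for s, q in zip(s1, s2)]


T = "abcbacbacabdaddadea"
P = "hehaeh?e"
print(encode(T))                # [1, 2, 3, 2, 1, 3, 2, 1, 3, 1, 2, 4, 1, 4, 4, 1, 4, 5, 1]
print(match_vector(P, T, "e"))  # [0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1]
```

By the lemma, the equality holds exactly when all numbers beneath the e's are the same, so no per-text-symbol products are needed.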

Page 34: Function Matching

Can we do better for big alphabets?

We have seen 2 algorithms for Function Matching:

1. O(nm) - naïve algorithm

2. $O(|\Sigma_P| \, n \log m)$ - convolution based

We will see:

1. $O(n \log^2 m)$ - randomized, convolution based

2. A lower bound of $\Omega(nm)$ for deterministic convolution-based methods

Page 35: Function Matching

Def: A pattern is 2-charactered if every character appears at most twice in the pattern.

Example:
P = a b c b c c b b
P1 = a1 b1 c1 b1 c1 c2 b2 b2 (even pairs)
P2 = a1 b1 c1 b2 c2 c2 b2 b3 (odd pairs)

Lemma: Let P be a pattern and T a text. There exist 2-charactered patterns P1 and P2 s.t. at location i of T, P f-matches iff P1 and P2 both f-match.
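One way to build P1 and P2 that reproduces the example above is to relabel the j-th occurrence of each character. This is a sketch: the labelling rule is inferred from the example rather than stated in the talk, and the function name is made up.

```python
from collections import defaultdict


def split_two_charactered(pattern):
    """Return (P1, P2) as lists of (character, label) pairs.

    The j-th occurrence of a character gets label ceil(j/2) in P1
    (pairing occurrences 1-2, 3-4, ...) and label floor(j/2)+1 in P2
    (pairing occurrences 2-3, 4-5, ...), so each labelled symbol
    appears at most twice.
    """
    seen = defaultdict(int)
    p1, p2 = [], []
    for ch in pattern:
        seen[ch] += 1
        j = seen[ch]
        p1.append((ch, (j + 1) // 2))
        p2.append((ch, j // 2 + 1))
    return p1, p2


p1, p2 = split_two_charactered("abcbccbb")
print(" ".join(f"{c}{k}" for c, k in p1))  # a1 b1 c1 b1 c1 c2 b2 b2
print(" ".join(f"{c}{k}" for c, k in p2))  # a1 b1 c1 b2 c2 c2 b2 b3
```

Together the two patterns link every pair of consecutive occurrences of a character, which is what forces all occurrences to lie over the same text symbol.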

Page 36: Function Matching

Situation: An algorithm for Function Matching with 2-charactered patterns ⇒ a general algorithm for Function Matching.

So, all that needs to be checked is that each pair in P has equal text symbols beneath it.

Page 37: Function Matching

New Randomized Algorithm

1. For each character:
 - a in T: randomly choose $r_a$ in {0, 1}; replace all a’s in T with $r_a$; get T’
 - b in P: randomly choose $s_b$ in {1, 2}; set the first b to be $s_b$ and the second b to be $-s_b$; get P’

2. Convolve T’ and $P'^R$

3. For each location i for which the convolution $T' * P'^R[i]$ equals 0, declare a ‘match’
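The three steps can be sketched as follows. This is an illustration under stated assumptions rather than the talk's code: the function names, the `rounds` and `seed` parameters, and the direct computation of the aligned sums (instead of a convolution) are choices made here, and mapping `?` and single-occurrence characters to 0 follows the worked example in the talk.

```python
import random


def one_round(pattern, text, rng, wildcard="?"):
    """Steps 1-2: random encodings, then the aligned sum at every location."""
    r = {c: rng.randint(0, 1) for c in set(text)}
    t_num = [r[c] for c in text]
    positions = {}
    for j, c in enumerate(pattern):
        positions.setdefault(c, []).append(j)
    p_num = [0] * len(pattern)
    for c, pos in positions.items():
        if c == wildcard or len(pos) == 1:
            continue  # no pair to compare: the entry stays 0
        if len(pos) > 2:
            raise ValueError("pattern is not 2-charactered")
        s = rng.choice([1, 2])
        p_num[pos[0]], p_num[pos[1]] = s, -s
    m = len(pattern)
    return [sum(p * t for p, t in zip(p_num, t_num[i:i + m]))
            for i in range(len(text) - m + 1)]


def randomized_candidates(pattern, text, rounds=10, seed=0):
    """Step 3, repeated: keep the 1-based locations whose sum is 0 in every round."""
    rng = random.Random(seed)
    alive = set(range(len(text) - len(pattern) + 1))
    for _ in range(rounds):
        sums = one_round(pattern, text, rng)
        alive = {i for i in alive if sums[i] == 0}
    return sorted(i + 1 for i in alive)


print(randomized_candidates("vqvuqu?s", "abaababacabdabcbdba"))
```

A true f-match always yields a sum of 0, so it is never discarded; the reported list may still contain locations that are not matches, which is why several rounds are used.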

Page 38: Function Matching

Example:

P = v q v u q u ? s
T = a b a a b a b a c a b d a b c b d b a

f(a) = 1, f(b) = 0, f(c) = 0, f(d) = 1

g(v) = 2, g(q) = 6, g(u) = 8

f(T) = 1 0 1 1 0 1 0 1 0 1 0 1 1 0 0 0 1 0 1

g(P) = 2 6 –2 8 –6 –8 0 0

Location 1: 2+0–2+8+0–8+0+0 = 0

A true match: h(v) = a, h(q) = b, h(u) = a, h(s) = a

Page 39: Function Matching

Example:

P = v q v u q u ? s
T = a b a a b a b a c a b d a b c b d b a

f(a) = 1, f(b) = 0, f(c) = 0, f(d) = 1

g(v) = 2, g(q) = 6, g(u) = 8

f(T) = 1 0 1 1 0 1 0 1 0 1 0 1 1 0 0 0 1 0 1

g(P) = 2 6 –2 8 –6 –8 0 0

Location 2: 0+6–2+0–6+0+0+0 = –2, so no match is declared.

Page 40: Function Matching

Example:

P = v q v u q u ? s
T = a b a a b a b a c a b d a b c b d b a

f(a) = 1, f(b) = 0, f(c) = 0, f(d) = 1

g(v) = 2, g(q) = 6, g(u) = 8

f(T) = 1 0 1 1 0 1 0 1 0 1 0 1 1 0 0 0 1 0 1

g(P) = 2 6 –2 8 –6 –8 0 0

Location 12: 2+6+0+0+0–8+0+0 = 0, although P does not f-match there (the two v’s lie over d and b).

Page 41: Function Matching

Correctness:

If P f-matches at location i of T, then $(f(T) * g(P)^R)[i+m-1]$ is trivially always equal to 0.

If P does not f-match at location i of T, then for each convolution <f,g>, $(f(T) * g(P)^R)[i+m-1]$ equals 0 with probability ½; with k rounds of amplification the probability is $(½)^k$.

Running Time: $O(nk \log m)$ with error probability $2^{-k}$

$O(n \log^2 m)$ with error probability $1/m$
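The step from the first bound to the second can be spelled out as follows. It takes the per-round probability of ½ stated on the slide as given and assumes the k rounds use independent random choices.

```latex
\Pr[\text{non-matching location } i \text{ survives } k \text{ rounds}]
  = \left(\tfrac{1}{2}\right)^{k} = 2^{-k},
\qquad
k = \log_2 m \;\Longrightarrow\; 2^{-k} = \frac{1}{m},
\qquad
O(nk \log m) = O(n \log^2 m).
```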

Page 42: Function Matching

Limitation of the Convolutions Model

Can we do the same deterministically? No!

To show this we use the model of communication complexity.

[Figure: Alice holds x, Bob holds y; they compute f(x,y)]

Page 43: Function Matching

Limitation of the Convolutions Model

Known: for $x, y \in \{0,1\}^k$ the communication complexity of equals(x,y) is $\Omega(k)$.

Take pattern $P = a_1 a_2 a_3 \ldots a_m a_1 a_2 a_3 \ldots a_m$, where $\forall i \neq j$: $a_i \neq a_j$.

Given a collection of convolutions $\{\langle g(P), f(T) \rangle\}$, the convolution at location i is

$(g(P) * f(T))[i+m-1] = \sum_{j=1}^{m} g(a_j) f(t_{i+j-1}) + \sum_{j=1}^{m} g(a_j) f(t_{i+j+m-1})$

Since we are in essence comparing $t_i \ldots t_{i+m-1}$ to $t_{i+m} \ldots t_{i+2m-1}$, we get the equality information from the convolutions. This is lower bounded by $\Omega(m)$ for each location; in general, $\Omega(nm)$.

Page 44: Function Matching

Another Application for Function Matching

Protein Folding detection:

[Figure: a chain of numbered positions (1, 2, 3, …, 10) folded back on itself]

P = 1 2 3 4 5 6 7 8 9 10 10 9 8 7 6 5 4 11 12 … 12 11 3 2 1

Page 45: Function Matching

Questions

1. Can Function Matching be solved deterministically in o(nm) time for big alphabets?

2. Are there special cases of Function Matching that are easier (other than Parameterized Matching and other trivial ones)?

3. Does 2-dimensional Parameterized Matching need to be solved with function matching?