
Program Transformation for Optimization of Parsing Algorithms and other Weighted Logic Programs

Jason Eisner and John Blatz
29 July 2006


If you only see one slide…

• Weighted logic programming
– Convenient language for expressing a wide range of practical problems
– Dyna: a working system for evaluating (a class of) weighted logic programs: www.dyna.org
• Program transformations
– Formalize relationships between programs
– Generalizable to discover new algorithms
– Useful transformations advertised in this talk

• Please read the paper for more details


Logic Programming

• CKY parsing algorithm: constit(X,I,J) is true iff the words from position I to position J in a sentence form a grammatical constituent of type X

• Grammar (rewrite(…)) specified as input to parsing engine

• Sentence (word(…), length(…)) also given as input
• Performs recognition: “is this sentence grammatical?”

goal :- constit(s,0,N), end(N).
constit(X,I,J) :- rewrite(X,W), word(W,I,J).
constit(X,I,K) :- rewrite(X,Y,Z), constit(Y,I,J), constit(Z,J,K).


Logic Programming - CKY Algorithm

goal :- constit(s,0,N), end(N).
constit(X,I,J) :- rewrite(X,W), word(W,I,J).
constit(X,I,K) :- rewrite(X,Y,Z), constit(Y,I,J), constit(Z,J,K).

[Figure: proof tree for "John loves Mary" with word positions 0 1 2 3]

word(‘John’,0,1) word(‘loves’,1,2) word(‘Mary’,2,3)
constit(‘np’,0,1) constit(‘v’,1,2) constit(‘np’,2,3)
constit(‘vp’,1,3)
constit(‘s’,0,3)
goal
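The three recognition rules can be sketched in Python as a bottom-up chart filler. This is only an illustration, not how Dyna evaluates the program: the function name `cky_recognize`, the tuple encoding of `rewrite` facts, and the toy grammar are assumptions made for the example.

```python
def cky_recognize(words, unary, binary, start="s"):
    """Return (chart, goal): the proved constit items and whether goal holds."""
    n = len(words)
    chart = set()
    # constit(X,I,J) :- rewrite(X,W), word(W,I,J).
    for i, w in enumerate(words):
        for x, rhs in unary:
            if rhs == w:
                chart.add((x, i, i + 1))
    # constit(X,I,K) :- rewrite(X,Y,Z), constit(Y,I,J), constit(Z,J,K).
    # Narrower spans are completed before the wider spans that use them.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            k = i + width
            for j in range(i + 1, k):
                for x, y, z in binary:
                    if (y, i, j) in chart and (z, j, k) in chart:
                        chart.add((x, i, k))
    # goal :- constit(s,0,N), end(N).
    return chart, (start, 0, n) in chart


# Hypothetical toy grammar for the example sentence.
UNARY = [("np", "John"), ("v", "loves"), ("np", "Mary")]
BINARY = [("s", "np", "vp"), ("vp", "v", "np")]

chart, goal = cky_recognize(["John", "loves", "Mary"], UNARY, BINARY)
```

With this grammar the chart contains the same items as the proof tree for "John loves Mary", and `goal` is true.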


Weighted logic programming

• Items have (numeric) values in addition to being provable/unprovable

• Computes total inside probability of all parses of a sentence

• Dynamic programming is a special case:
– Only ground items are provable
– Computation is stratified aggregation
– Intermediate subresults are stored for reuse in computation

goal += constit(s,0,N) if length(N).
constit(X,I,J) += rewrite(X,W) if word(W,I,J).
constit(X,I,K) += rewrite(X,Y,Z) * constit(Y,I,J) * constit(Z,J,K).
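As a rough illustration of the weighted program, the sketch below computes the total inside probability with ordinary `+` and `*`. The function name `inside`, the dictionary encoding of weighted `rewrite` facts, and the toy probabilities are assumptions for the example, not part of the talk.

```python
from collections import defaultdict


def inside(words, unary, binary, start="s"):
    """Total inside probability of all parses of `words` rooted in `start`."""
    n = len(words)
    constit = defaultdict(float)
    # constit(X,I,J) += rewrite(X,W) if word(W,I,J).
    for i, w in enumerate(words):
        for (x, rhs), p in unary.items():
            if rhs == w:
                constit[x, i, i + 1] += p
    # constit(X,I,K) += rewrite(X,Y,Z) * constit(Y,I,J) * constit(Z,J,K).
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            k = i + width
            for j in range(i + 1, k):
                for (x, y, z), p in binary.items():
                    left = constit.get((y, i, j), 0.0)
                    right = constit.get((z, j, k), 0.0)
                    if left and right:
                        constit[x, i, k] += p * left * right
    # goal += constit(s,0,N) if length(N).
    return constit.get((start, 0, n), 0.0)


# Hypothetical weights: each np word has probability 0.5.
UNARY = {("np", "John"): 0.5, ("np", "Mary"): 0.5, ("v", "loves"): 1.0}
BINARY = {("s", "np", "vp"): 1.0, ("vp", "v", "np"): 1.0}

prob = inside(["John", "loves", "Mary"], UNARY, BINARY)
```

Here the single parse has probability 0.5 * 0.5 = 0.25, so that is the value of `goal`.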


Semantics of weighted logic programs

• Every item has a unique aggregation operator
• Free variables on the right-hand side of a rule are aggregated over
• Examples in this talk assume that we are operating in a semiring

program:
constit(X,I,K) += rewrite(X,Y,Z) * constit(Y,I,J) * constit(Z,J,K)
constit(X,I,K) += rewrite(X,W) * word(W,I,K)

semantics:
$$[\mathrm{constit}(X,I,K)] = \sum_{Y,Z,J} [\mathrm{rewrite}(X,Y,Z)] \cdot [\mathrm{constit}(Y,I,J)] \cdot [\mathrm{constit}(Z,J,K)] + \sum_{W} [\mathrm{rewrite}(X,W)] \cdot [\mathrm{word}(W,I,K)]$$


Evaluation of weighted logic programs

• Mostly beyond the scope of this paper, but options include…

• Backward-chaining (à la Prolog)
• Forward-chaining (à la Dyna [Eisner et al. EMNLP ‘05], [Goodman ‘99] for semirings)

• Hybrid of backward-chaining and forward-chaining

• Something entirely different…


Runtime complexity of WLP evaluation

• Complexity of a rule = number of ways to instantiate it:

– k grammar symbols (X, Y, Z)
– n words in sentence (I, J, K)
– O(k^3 n^3)

• Sparsity or use of default values may make actual runtime faster than it appears

constit(X,I,K) += rewrite(X,Y,Z) * constit(Y,I,J) * constit(Z,J,K)


Program Transformations

• Optimize performance of a program
– consolidate or avoid work
– especially useful for a declarative language

• Demonstrate semantic equivalence of pairs of programs

• Generalize techniques discovered for particular problems to seemingly unrelated problems


Folding

constit(X,I,K) += constit(Y,I,J) * constit(Z,J,K) * rewrite(X,Y,Z)

constit(‘s’,0,3)

constit(‘vp’,1,3)constit(‘np’,0,1)

rewrite(‘s’,‘np’,‘vp’)


Folding

constit(X,I,K) += constit(Y,I,J) * constit(Z,J,K) * rewrite(X,Y,Z)
temp(X,Y,Z,J,K) += constit(Z,J,K) * rewrite(X,Y,Z)




temp(‘s’,‘np’,‘vp’,1,3)


Folding

constit(X,I,K) += constit(Y,I,J) * temp(X,Y,Z,J,K)
temp(X,Y,Z,J,K) += constit(Z,J,K) * rewrite(X,Y,Z)




temp(‘s’,‘np’,‘vp’,1,3)



Folding + distributive law

• Using the distributive property can reduce complexity

– N multiplications, N-1 additions → 1 multiplication, N-1 additions
– N^2 multiplications, (N-1)^2 additions → N multiplications, 2·(N-1) additions
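The first of these savings can be checked with a small sketch that counts multiplications when a common factor is pulled out of a sum. The function names and the counting convention are assumptions for illustration only.

```python
def unfactored(a, bs):
    """Compute sum_i a * b_i, counting one multiplication per term."""
    total, mults = 0, 0
    for b in bs:
        total += a * b
        mults += 1
    return total, mults


def factored(a, bs):
    """Compute a * (sum_i b_i): same value, one multiplication."""
    return a * sum(bs), 1


before = unfactored(3, [1, 2, 4])
after = factored(3, [1, 2, 4])
```

Both versions return the same total. Folding plays the role of naming the inner sum so that the common factor is multiplied in only once.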


Folding as categorial grammar

constit(X,I,K) += constit(Y,I,J) * temp(X,Y,J,K)
temp(X,Y,J,K) += constit(Z,J,K) * rewrite(X,Y,Z)




temp(‘s’,‘np’,1,3)

constit(‘s’,I,3)/constit(‘np’,I,1)
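The folded program, with Z summed out of temp, might be evaluated along the following lines. This is a sketch under assumptions: the name `inside_folded`, the nested-dictionary layout of `temp`, and the toy grammar are all hypothetical, and the loop order is just one schedule that fills each temp item after the constit items it depends on.

```python
from collections import defaultdict


def inside_folded(words, unary, binary, start="s"):
    """Inside probability using the folded rules with temp(X,Y,J,K)."""
    n = len(words)
    constit = defaultdict(float)                    # constit[x, i, k]
    temp = defaultdict(lambda: defaultdict(float))  # temp[j, k][x, y]

    def fill_temp(j, k):
        # temp(X,Y,J,K) += constit(Z,J,K) * rewrite(X,Y,Z)
        for (x, y, z), p in binary.items():
            right = constit.get((z, j, k), 0.0)
            if right:
                temp[j, k][x, y] += right * p

    for i, w in enumerate(words):
        for (x, rhs), p in unary.items():
            if rhs == w:
                constit[x, i, i + 1] += p
        fill_temp(i, i + 1)  # all constit items for this span are final

    for width in range(2, n + 1):
        for i in range(n - width + 1):
            k = i + width
            # constit(X,I,K) += constit(Y,I,J) * temp(X,Y,J,K)
            for j in range(i + 1, k):
                for (x, y), t in temp[j, k].items():
                    left = constit.get((y, i, j), 0.0)
                    if left:
                        constit[x, i, k] += left * t
            fill_temp(i, k)
    return constit.get((start, 0, n), 0.0)


# Hypothetical toy grammar, chosen so the answer is easy to check by hand.
UNARY = {("np", "John"): 0.5, ("np", "Mary"): 0.5, ("v", "loves"): 1.0}
BINARY = {("s", "np", "vp"): 1.0, ("vp", "v", "np"): 1.0}

prob = inside_folded(["John", "loves", "Mary"], UNARY, BINARY)
```

The inner loop that combines a left child with a temp item no longer mentions Z, which is where the folded program saves work on larger grammars.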


Folding as categorial grammar

constit(X,I,K) += constit(Y,I,J) * constit(X,I,K)/constit(Y,I,J)
constit(X,I,K)/constit(Y,I,J) += constit(Z,J,K) * rewrite(X,Y,Z)




constit(‘s’,I,3)/constit(‘np’,I,1)


Var. elimination in graphical models as folding

marginal(A,B) += p1(X,Y,A) * p2(A,Y,Z,W) * p3(B,Z,W)

• Marginalize out Z,W:

temp(A,Y,B) += p2(A,Y,Z,W) * p3(B,Z,W)
marginal(A,B) += p1(X,Y,A) * temp(A,Y,B)

[Figure: graphical model over variables X, A, Y, W, Z, B with factors p1, p2, p3]
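A quick numerical check of this fold, assuming binary variables and made-up integer factor tables (the definitions of `p1`, `p2`, `p3` below are hypothetical and exist only to make the comparison exact):

```python
from itertools import product

DOM = (0, 1)  # assumed domain of every variable


# Hypothetical factor tables, written as functions for brevity.
def p1(x, y, a):
    return 1 + x + 2 * y + a


def p2(a, y, z, w):
    return 1 + a * y + z + 3 * w


def p3(b, z, w):
    return 2 + b + z * w


def marginal_direct():
    # marginal(A,B) += p1(X,Y,A) * p2(A,Y,Z,W) * p3(B,Z,W)
    return {
        (a, b): sum(
            p1(x, y, a) * p2(a, y, z, w) * p3(b, z, w)
            for x, y, z, w in product(DOM, repeat=4)
        )
        for a, b in product(DOM, repeat=2)
    }


def marginal_folded():
    # temp(A,Y,B) += p2(A,Y,Z,W) * p3(B,Z,W)
    temp = {
        (a, y, b): sum(p2(a, y, z, w) * p3(b, z, w) for z, w in product(DOM, repeat=2))
        for a, y, b in product(DOM, repeat=3)
    }
    # marginal(A,B) += p1(X,Y,A) * temp(A,Y,B)
    return {
        (a, b): sum(p1(x, y, a) * temp[a, y, b] for x, y in product(DOM, repeat=2))
        for a, b in product(DOM, repeat=2)
    }
```

The direct version touches every joint assignment of six variables, while the folded version never enumerates more than five at a time.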


Unfolding

• Inline computation:

constit(X,I,K) += constit(Y,I,J) * constit(Z,J,K) * rewrite(X,Y,Z)
rewrite(‘np’,‘np’,‘pp’) := 0.4
rewrite(‘np’,‘det’,‘n’) := 0.3

constit(‘np’,I,K) += constit(‘np’,I,J) * constit(‘pp’,J,K) * 0.4

constit(‘np’,I,K) += constit(‘det’,I,J) * constit(‘n’,J,K) * 0.3


Folding + Unfolding

• Folding: introduces an intermediate item
– Increases or does not affect asymptotic space complexity
– Decreases or does not affect asymptotic time complexity
• Unfolding: removes an intermediate item
– Decreases or does not affect asymptotic space complexity
– Increases or does not affect asymptotic time complexity
• Folding + unfolding together: can improve both time and space complexity


“Middle-up” computation

• Retirement savings problem:

• deposit(X): amount invested per year by person X
• rate(Prev, Year): rate of investment growth from year Prev to year Year
• value(X, Year): total value of investment in year Year
• Will iterate over the full term of the investment for every possible person X!

value(X, Year) += deposit(X) if contributed(Year).
value(X, Year) += value(X, Prev) * rate(Prev, Year).


“Middle-up” computation

• growth(Year): total percentage growth up to Year

• growth is a “speculative” version of value:

value(X, Year) += deposit(X) if contributed(Year).
value(X, Year) += value(X, Prev) * rate(Prev, Year).

growth(Year) += 1 if contributed(Year).
growth(Year) += growth(Prev) * rate(Prev, Year).
value(X, Year) += deposit(X) * growth(Year).
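The equivalence of the two programs can be checked on a small example. The sketch assumes that `rate` only links consecutive years, and the names `accumulate`, `values_direct`, and `values_speculative` as well as the sample data are hypothetical.

```python
def accumulate(seed, rates, contributed, years):
    """t(Year) += seed if contributed(Year); t(Year) += t(Prev) * rate(Prev, Year)."""
    table, prev = {}, None
    for year in years:
        total = table[prev] * rates[prev, year] if prev is not None else 0
        if year in contributed:
            total += seed
        table[year] = total
        prev = year
    return table


def values_direct(deposits, rates, contributed, years):
    # Original program: rerun the whole recurrence for every person X.
    return {x: accumulate(d, rates, contributed, years) for x, d in deposits.items()}


def values_speculative(deposits, rates, contributed, years):
    # Transformed program: compute growth(Year) once, then scale by deposit(X).
    growth = accumulate(1, rates, contributed, years)
    return {x: {y: d * g for y, g in growth.items()} for x, d in deposits.items()}


YEARS = [2000, 2001, 2002]
RATES = {(2000, 2001): 2, (2001, 2002): 3}  # made-up integer rates
CONTRIBUTED = {2000, 2001}
DEPOSITS = {"ann": 10, "bob": 5}

direct = values_direct(DEPOSITS, RATES, CONTRIBUTED, YEARS)
speculative = values_speculative(DEPOSITS, RATES, CONTRIBUTED, YEARS)
```

The direct version walks the years once per person, while the speculative version walks them once in total and then does one multiplication per person and year.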


Speculation transformation

• Original in this work, as far as we know
• Build items “speculatively”:
– Instead of using bar to build foo, build an item called foo/bar and attach bar to it later
• Generalization of classical folding:
– Allows folding from recursive rules
– Operates on a set of items in the program, rather than a single rule
• Generalization of gap-passing in combinatorial grammar


From folding to speculation

[Figure: derivation tree in which a is built from b and c, b from r, and c from d, e and x]

a += b * c
b += r
c += d * e * x

folding:

a += b * c
b += r
c += c/x * x
c/x += d * e

For each x, we must recompute all of these items…
…unless we compute them speculatively before attaching x


Gap-passing as speculation

a += b * c
b += r
c += d * e * x

speculation:

a += a/x * x
a/x += b * c/x
b += r
c/x += d * e * x/x
x/x += 1

[Figure: derivation tree of a before and after speculation. In the transformed tree, x/x has value 1, x is attached at the root, and part of the tree is marked "independent of x"]


Speculation on recursive rules

a += b * c
b += r
c += d * e * x
c += f * c

Even if a recursive rule depends on x, we can still build speculatively!

[Figure: derivation tree of a, with c built from d, e, x and recursively from f and c]


Speculation on recursive rules

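a += b * c
b += r
c += d * e * x
c += f * c

speculation:

a += a/x * x
a/x += b * c/x
b += r
c/x += d * e * x/x
c/x += f * c/x
x/x += 1

[Figure: derivation tree of a before and after speculation; in the transformed tree, x/x has value 1 and x is attached at the root]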


Why does this work?

a += b * c
b += r
c += d * e * x
c += f * c

unfold:

a += b * c
b += r
c += d * e * x
c += f * d * e * x
c += f * f * c

unfold again:

a += b * c
b += r
c += d * e * x
c += f * d * e * x
c += f * f * d * e * x
c += f * f * f * d * e * x
c += f * f * f * f * c

always an x present to fold out
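A numerical sanity check of this argument, assuming real-valued weights with |f| < 1 so that the recursive rule converges; the helper `fixpoint` and the sample values are hypothetical.

```python
def fixpoint(base, f, iters=200):
    """Iterate c = base + f * c from c = 0, i.e. sum the series base * (1 + f + f^2 + ...)."""
    c = 0.0
    for _ in range(iters):
        c = base + f * c
    return c


def c_direct(d, e, f, x):
    # c += d * e * x ;  c += f * c   (recomputed from scratch for each x)
    return fixpoint(d * e * x, f)


def c_speculative(d, e, f, xs):
    # c/x += d * e * x/x ;  c/x += f * c/x ;  x/x += 1   (computed once)
    c_over_x = fixpoint(d * e * 1.0, f)
    # c += c/x * x   (x attached afterwards)
    return {x: c_over_x * x for x in xs}


spec = c_speculative(2.0, 3.0, 0.5, [1.0, 4.0])
```

Every term of the unfolded series contains exactly one x, so dividing it out leaves a quantity that is shared by all values of x.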


Difficulties of recursive speculation

a += b * c
b += r
c += d * e * x
c += f * c
c += g

[Figure: derivation tree of a, with c built from d, e, x, from f and c, and from g]


Difficulties of recursive speculation

a += b * c
b += r
c += d * e * x
c += f * c
c += g

After unfolding, the rules for c are:

c += d * e * x
c += f * d * e * x
c += f * f * c
c += f * g
c += g

Part of c is derived from x, part from g

Need to separate parts of c that x contributes to from parts that it doesn’t


Folding into other() items

a += b * c
b += r
c += d * e * x
c += f * c
c += g

Need to separate parts of c that x contributes to from parts that it doesn’t

[Figure: derivation tree in which c is split into other(c) and c’]

Now we can slash x out of c’ as before


Split head-automaton grammars

• Bilexical CFG which attaches all children right of the head before all children left of the head

• Eisner & Satta ‘99 improve runtime from O(n^5) to O(n^3) using the speculation transformation

r-constit(X:H,I,K) += word(H,I,K).
r-constit(X:H,I,K) += rewrite(X:H,Y:H,Z:H2) * r-constit(Y:H,I,J) * constit(Z:H2,J,K).
constit(X:H,I,K) += r-constit(X:H,I,K).
constit(X:H,I,K) += rewrite(X:H,Y:H2,Z:H) * constit(Y:H2,I,J) * constit(Z:H,J,K).
goal += constit(s:H,0,N) * length(N).




[Figure: diagrams of the rules above and of the transformed program]

l-constit(X:H,X0,I,I0) = constit(X:H,I,J)/r-constit(X:H,I,K)


SHAG transformed program

r-constit(X:H,I,K) += word(H,I,K)
r-constit(X:H,I,K) += rewrite(X:H,Y:H,Z:H2) * r-constit(Y:H,I,J) * constit(Z:H2,J,K)
l-constit(X0:H0,X0,J0,J0) += 1
l-constit(X:H0,X0,I,J0) += rewrite(X:H0,Y:H2,Z:H0) * constit(Y:H2,I,J) * l-constit(Z:H0,X0,J,J0)
constit(X:H,I,K) += l-constit(X:H,X0,I,I0) * r-constit(X0:H,I0,K)
goal += constit(s:H,0,N) * length(N)

• Reduces runtime to O(n^4)
• Subsequent unfolding and folding can reduce runtime to O(n^3)


Forward-chaining

• Greedy algorithm:
– Given a set of proven items and their values
– Update the values of items occurring on the left-hand side of a rule so that they match the current set

• Forward-chaining can produce (possibly infinitely) many items that we are not interested in

• We can use program transformation to control the evaluation order.


Forward-chaining example

goal += a(X) * b(X)
a(X) += c(X,Y) * d(Y)
d(Y) += e(Y)
f(X) += c(X,X) * e(X)
g(X) += f(X) * a(X)

e(2) := 2
c(1,2) := 1
c(3,3) := 2
e(3) := 4
b(1) := 1
b(2) := 5
b(3) := 3

[Figure: dependency graph over goal; a(1), a(2), a(3); b(1), b(2), b(3); f(1), f(2), f(3); g(1), g(2), g(3); c(1,2), c(2,2), c(3,3); d(1), d(2), d(3); e(1), e(2), e(3), animated over several steps]

• green = derived by forward chaining
• pink derived values were irrelevant to goal
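For this particular non-recursive program, forward chaining can be sketched by applying the rules in dependency order. The function `forward_chain` and the dictionary encoding of the facts are assumptions for illustration; a real forward-chaining engine would use an agenda rather than a fixed order.

```python
from collections import defaultdict

# Facts from the example.
E = {2: 2, 3: 4}
C = {(1, 2): 1, (3, 3): 2}
B = {1: 1, 2: 5, 3: 3}


def forward_chain(e, c, b):
    """Derive every provable item bottom-up, whether or not goal needs it."""
    d = dict(e)  # d(Y) += e(Y)
    a = defaultdict(int)
    for (x, y), cv in c.items():  # a(X) += c(X,Y) * d(Y)
        if y in d:
            a[x] += cv * d[y]
    # f(X) += c(X,X) * e(X)
    f = {x: cv * e[x] for (x, y), cv in c.items() if x == y and x in e}
    # g(X) += f(X) * a(X)
    g = {x: f[x] * a[x] for x in f if x in a}
    # goal += a(X) * b(X)
    goal = sum(a[x] * b[x] for x in a if x in b)
    return {"d": d, "a": dict(a), "f": f, "g": g, "goal": goal}


items = forward_chain(E, C, B)
```

The items f(3) and g(3) receive values even though goal never uses them, which is the wasted work that the pink nodes illustrate.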


Backward-chaining

• Starting from goal, recursively compute values for all items that could be used to derive goal

• Forward-chaining: may prove items that are irrelevant to goal

• Backward-chaining: may investigate items that cannot be proven


Backward example


[Figure: the dependency graph and facts from the forward-chaining example, animated over several steps of backward chaining from goal]

• blue nodes visited by backward chaining
• backtrack up to goal += a(1) * b(1)
• no way to derive c(2,2)!
• pink nodes visited but not provable
• etc…
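A memoized top-down evaluator gives a rough picture of which items backward chaining touches on this example. It is a sketch under assumptions: arguments are assumed to range over the domain {1, 2, 3}, the `visited` bookkeeping is added for illustration, and the exact set of visited items depends on the order in which subgoals are tried, so it need not match the figure node for node.

```python
from functools import lru_cache

# Facts from the example.
E = {2: 2, 3: 4}
C = {(1, 2): 1, (3, 3): 2}
B = {1: 1, 2: 5, 3: 3}
DOMAIN = (1, 2, 3)  # assumed range of X and Y

visited = set()  # items examined while proving goal


@lru_cache(maxsize=None)
def d(y):
    # d(Y) += e(Y)
    visited.add(("d", y))
    visited.add(("e", y))
    return E.get(y, 0)


@lru_cache(maxsize=None)
def a(x):
    # a(X) += c(X,Y) * d(Y)
    visited.add(("a", x))
    total = 0
    for y in DOMAIN:
        visited.add(("c", x, y))
        if (x, y) in C:
            total += C[x, y] * d(y)
    return total


def goal():
    # goal += a(X) * b(X)
    visited.add(("goal",))
    total = 0
    for x in DOMAIN:
        visited.add(("b", x))
        total += a(x) * B.get(x, 0)
    return total


result = goal()
```

The evaluator never looks at f or g, but it does examine items such as a(2) and c(2,2) that turn out to have no derivation.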


Summary


[Figure: the same dependency graph and facts, comparing the two strategies]

• light blue nodes visited by both algorithms


Simulating backward chaining

• “Magic templates” transformation [Ramakrishnan 91] simulates backward-chaining evaluation using forward-chaining strategy

• Derive “magic” filters on forward-chaining
• “magic_X” means “backward-chaining would examine X”


Magic templates transformation

• “magic_X” means “backward-chaining would examine X”

goal += a(X) * b(X) if magic_goal.
a(X) += c(X,Y) * d(Y) if magic_a(X).
d(Y) += e(Y) if magic_d(Y).
f(X) += c(X,X) * e(X) if magic_f(X).
g(X) += f(X) * a(X) if magic_g(X).
magic_goal.
magic_a(X) :- magic_goal.
magic_b(X) :- magic_goal, a(X).
magic_c(X,Y) :- magic_a(X).
magic_d(Y) :- magic_a(X), c(X,Y).
etc…


Magic evaluation

[Figure: the magic-transformed program evaluated by forward chaining on the example dependency graph, animated step by step]

• blue: derived magic filter
• magic_a(X) is a single default value, which filters all three instantiations
• c(1,2) := 1 if magic_c(X,Y). Now we can actually build c(1,2)!
• magic_d(Y) :- magic_a(X), c(X,Y).
• magic_e(Y) :- magic_d(Y).
• magic filtering avoided generating irrelevant items
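The effect of the magic filters on this example can be imitated by first computing which items are demanded and then forward chaining only on those. This is a hand-specialized sketch for the example program, not the general transformation; the function name and the extra fact e(9) in the usage example are hypothetical.

```python
def magic_forward_chain(e, c, b):
    """Forward chaining restricted to items that goal could need."""
    # magic_goal holds, and magic_a(X) holds for every X (unbound argument).
    # magic_d(Y) :- magic_a(X), c(X,Y).
    magic_d = {y for (_, y) in c}
    # d(Y) += e(Y) if magic_d(Y).
    d = {y: e[y] for y in magic_d if y in e}
    # a(X) += c(X,Y) * d(Y) if magic_a(X).
    a = {}
    for (x, y), cv in c.items():
        if y in d:
            a[x] = a.get(x, 0) + cv * d[y]
    # goal += a(X) * b(X) if magic_goal.
    goal = sum(av * b[x] for x, av in a.items() if x in b)
    # magic_f and magic_g are never derived, so no f or g item is built.
    built = {("d", y) for y in d} | {("a", x) for x in a} | {("goal",)}
    return goal, built


# Example facts plus a hypothetical irrelevant fact e(9).
E = {2: 2, 3: 4, 9: 1}
C = {(1, 2): 1, (3, 3): 2}
B = {1: 1, 2: 5, 3: 3}

goal, built = magic_forward_chain(E, C, B)
```

The value of goal is the same as with unfiltered forward chaining, but d(9), f(3), and g(3) are never built.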


Magic Templates Transformation

• 3rd argument of magic_constit always appears unbound in the head, so a single “default” value is derived for it

magic_constit(Y,I,J) :- magic_constit(X,I,K), rewrite(X,Y,Z)
magic_constit(Z,J,K) :- magic_constit(X,I,K), rewrite(X,Y,Z), constit(Y,I,J)
constit(X,I,K) += rewrite(X,Y,Z) * constit(Y,I,J) * constit(Z,J,K) needed_only_if magic_constit(X,I,K)


Earley’s Algorithm

• constit(X,I,K) is built only if it can attach to something that is accessible from goal.

• Same operation as Earley’s algorithm [Minnen 96]
• Applying the magic templates transformation again yields the left-corner filter
– magic items are built only if they will be provable bottom-up

• Magic templates can also be used to compute values on-demand


Conclusions

• Program transformation on weighted logic programs is a useful generalization of the relationships between algorithms

• Described four useful transformations, including novel speculation transformation

• See paper for a more formal treatment
• Future work:
– more transformations
– automatic search for optimal transformation sequence
– forward-chaining algorithm for non-ground items


Thank you!
