
A Generalization of Forward-backward Algorithm

Ai Azuma, Yuji Matsumoto

Nara Institute of Science and Technology

Forward-backward algorithm

• Allows efficient calculation of sums (e.g. expectations, ...) over all paths in a trellis.

• Plays an important role in sequence modeling:
  • HMMs (Hidden Markov Models)
  • CRFs (Conditional Random Fields) [Lafferty et al., 2001]
  • ...

A sequential labeling example: part-of-speech tagging

[Figure: trellis for "Time flies like an arrow": a SOURCE node, then one node per word and candidate tag (noun, verb, prep., indef. art.) for each of "Time", "flies", "like", "an", "arrow", then a SINK node. Each SOURCE-to-SINK path is one tagging of the sentence.]

In CRFs and HMMs, we need to compute the "sum" of the probabilities (or scores) of all paths.

It is intractable to enumerate all paths in the trellis because their number grows exponentially with sentence length: with four candidate tags per word, this five-word sentence already has up to 4^5 = 1024 paths.

The forward-backward algorithm efficiently computes sums over all paths in the trellis by dynamic programming.

The forward-backward algorithm recursively computes the sum from SOURCE to SINK (forward) and from SINK to SOURCE (backward), keeping intermediate results on each node and arc.
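To make the dynamic programming concrete, here is a minimal Python sketch of the forward pass on a layered trellis. It assumes a hypothetical layout in which `phi_node[t][s]` scores tag `s` at word `t` and `phi_arc[t][r][s]` scores the arc from tag `r` at word `t-1` to tag `s` at word `t`; these names are illustrative, not from the slides. The recursion costs O(T·S²), while the brute-force reference enumerates all S^T paths.

```python
import itertools


def forward_sum(phi_node, phi_arc):
    """Sum over all SOURCE-to-SINK paths of the product of node and arc scores.

    phi_node[t][s]   : score of tag s at position t (hypothetical layout)
    phi_arc[t][r][s] : score of the arc from tag r at t-1 to tag s at t
                       (phi_arc[0] is unused)
    """
    T, S = len(phi_node), len(phi_node[0])
    # alpha[s]: summed score of all partial paths from SOURCE ending in tag s at position t
    alpha = list(phi_node[0])
    for t in range(1, T):
        alpha = [phi_node[t][s] * sum(alpha[r] * phi_arc[t][r][s] for r in range(S))
                 for s in range(S)]
    # Every node at the last position is a direct ancestor of SINK.
    return sum(alpha)


def brute_force_sum(phi_node, phi_arc):
    """Same quantity by enumerating all S**T tag sequences (exponential time)."""
    T, S = len(phi_node), len(phi_node[0])
    total = 0.0
    for y in itertools.product(range(S), repeat=T):
        score = phi_node[0][y[0]]
        for t in range(1, T):
            score *= phi_arc[t][y[t - 1]][y[t]] * phi_node[t][y[t]]
        total += score
    return total
```

With every score set to 1.0, both functions simply count paths, so five words with four tags each give 1024.0; the forward pass gets there with a few dozen multiplications instead of 1024 path products.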

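The forward-backward algorithm is applicable to:

• Normalization constant of CRFs:
$$Z(\mathbf{x}) = \sum_{\mathbf{y}\in\mathcal{Y}} \prod_{c\in C_{\mathbf{y}}} \exp\big(\boldsymbol{\lambda}\cdot\mathbf{F}(c,\mathbf{x})\big)$$

• E-step for HMMs:
$$\sum_{\mathbf{y}\in\mathcal{Y}} \prod_{c\in C_{\mathbf{y}}} P(c,\mathbf{x}) \sum_{c\in C_{\mathbf{y}}} \delta(c,t)$$

• Feature expectation on CRFs:
$$E_P[f_k] = \frac{1}{Z(\mathbf{x})} \sum_{\mathbf{y}\in\mathcal{Y}} \prod_{c\in C_{\mathbf{y}}} \exp\big(\boldsymbol{\lambda}\cdot\mathbf{F}(c,\mathbf{x})\big) \sum_{c\in C_{\mathbf{y}}} f_k(c,\mathbf{x})$$

where $t$ = type of node/node pair ($\delta(c,t) = 1$ if clique $c$ is of type $t$, else 0), $f_k$ = $k$-th feature, $C_{\mathbf{y}}$ = set of nodes and arcs (cliques) in path $\mathbf{y}$, $\mathcal{Y}$ = set of paths.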
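Both CRF quantities above can be read off forward and backward variables. The Python sketch below assumes the same hypothetical layered layout (`phi_node`, `phi_arc`) with scores already exponentiated, i.e. equal to $\exp(\boldsymbol{\lambda}\cdot\mathbf{F}(c,\mathbf{x}))$, and, for brevity, a single feature `f_node[t][s]` defined on nodes only. Node marginals are $\alpha\beta/Z$, so the feature expectation is a marginal-weighted sum; a brute-force reference is included for checking.

```python
import itertools


def forward_backward(phi_node, phi_arc):
    """Return (alpha, beta, Z) for a layered trellis (hypothetical layout).

    alpha[t][s]: summed score of partial paths SOURCE..(t, s), including phi_node[t][s]
    beta[t][s] : summed score of partial paths (t, s)..SINK, excluding phi_node[t][s]
    """
    T, S = len(phi_node), len(phi_node[0])
    alpha = [[0.0] * S for _ in range(T)]
    beta = [[0.0] * S for _ in range(T)]
    alpha[0] = list(phi_node[0])
    for t in range(1, T):
        for s in range(S):
            alpha[t][s] = phi_node[t][s] * sum(alpha[t - 1][r] * phi_arc[t][r][s] for r in range(S))
    beta[T - 1] = [1.0] * S
    for t in range(T - 2, -1, -1):
        for r in range(S):
            beta[t][r] = sum(phi_arc[t + 1][r][s] * phi_node[t + 1][s] * beta[t + 1][s]
                             for s in range(S))
    Z = sum(alpha[T - 1])  # normalization constant
    return alpha, beta, Z


def expected_feature(phi_node, phi_arc, f_node):
    """E_P[sum_t f(t, y_t)] using node marginals P(y_t = s | x) = alpha * beta / Z."""
    alpha, beta, Z = forward_backward(phi_node, phi_arc)
    T, S = len(phi_node), len(phi_node[0])
    return sum(alpha[t][s] * beta[t][s] / Z * f_node[t][s] for t in range(T) for s in range(S))


def brute_force_expected_feature(phi_node, phi_arc, f_node):
    """Reference value by enumerating every tag sequence."""
    T, S = len(phi_node), len(phi_node[0])
    Z = total = 0.0
    for y in itertools.product(range(S), repeat=T):
        score = phi_node[0][y[0]]
        for t in range(1, T):
            score *= phi_arc[t][y[t - 1]][y[t]] * phi_node[t][y[t]]
        Z += score
        total += score * sum(f_node[t][y[t]] for t in range(T))
    return total / Z
```

Arc features would be handled the same way with arc marginals $\alpha_{t-1}(r)\,\phi_{\text{arc}}\,\phi_{\text{node}}\,\beta_t(s)/Z$; they are left out here to keep the sketch short.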

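Types of sums computable with the forward-backward algorithm:

• 0th-order moment (normalization constant):
$$\sum_{\mathbf{y}\in\mathcal{Y}} \prod_{c\in C_{\mathbf{y}}} \phi(c)$$

• 1st-order moment:
$$\sum_{\mathbf{y}\in\mathcal{Y}} \prod_{c\in C_{\mathbf{y}}} \phi(c) \sum_{c\in C_{\mathbf{y}}} f(c)$$

where $\phi(c)$ = score of clique $c$ (e.g. $\exp(\boldsymbol{\lambda}\cdot\mathbf{F}(c,\mathbf{x}))$ in CRFs), $C_{\mathbf{y}}$ = set of nodes and arcs (cliques) in path $\mathbf{y}$, $\mathcal{Y}$ = set of paths.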

But sometimes we need higher-order multivariate moments...

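$$\sum_{\mathbf{y}\in\mathcal{Y}} \prod_{c\in C_{\mathbf{y}}} \phi(c) \Big(\sum_{c\in C_{\mathbf{y}}} f_1(c)\Big)^{n_1} \cdots \Big(\sum_{c\in C_{\mathbf{y}}} f_K(c)\Big)^{n_K}$$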

To name a few examples:
• Correlations between features
• Objectives more complex than the log-likelihood
• Parameter derivatives of these
• ...
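As a reference point, the general form can be evaluated by brute force on a tiny trellis. The Python sketch below is exponential in sentence length and only pins down the definition. It assumes, for brevity, that every clique is a node (an arc clique can be modeled by inserting an extra node on the arc); the toy trellis and the names `prev`, `phi`, `feats` are hypothetical. It then uses the moments to compute the covariance between two features, the first example in the list above.

```python
import math


def all_paths(node, prev):
    """Yield every path (list of nodes) from SOURCE, the node without ancestors, to `node`."""
    if not prev[node]:
        yield [node]
        return
    for x in prev[node]:
        for path in all_paths(x, prev):
            yield path + [node]


def brute_force_moment(prev, phi, feats, powers, sink="SINK"):
    """sum_y prod_c phi(c) * prod_k (sum_c f_k(c)) ** n_k, by enumerating all paths."""
    total = 0.0
    for path in all_paths(sink, prev):
        term = math.prod(phi[c] for c in path)
        for k, n_k in enumerate(powers):
            term *= sum(feats[c][k] for c in path) ** n_k
        total += term
    return total


def feature_covariance(prev, phi, feats):
    """Cov(F_1, F_2) under P(y) = prod_c phi(c) / Z, where F_k = sum_c f_k(c)."""
    m = {p: brute_force_moment(prev, phi, feats, p) for p in [(0, 0), (1, 0), (0, 1), (1, 1)]}
    mean1, mean2 = m[(1, 0)] / m[(0, 0)], m[(0, 1)] / m[(0, 0)]
    return m[(1, 1)] / m[(0, 0)] - mean1 * mean2


# Hypothetical toy trellis: two positions with labels a/b; f_1 counts a's, f_2 counts b's.
toy_prev = {"SOURCE": [], "a1": ["SOURCE"], "b1": ["SOURCE"],
            "a2": ["a1", "b1"], "b2": ["a1", "b1"], "SINK": ["a2", "b2"]}
toy_phi = {v: 1.0 for v in toy_prev}
toy_feats = {"SOURCE": (0, 0), "a1": (1, 0), "b1": (0, 1),
             "a2": (1, 0), "b2": (0, 1), "SINK": (0, 0)}
```

With uniform scores the four paths are equally likely, and since $f_1 + f_2 = 2$ on every path, the covariance equals minus the variance of the a-count, i.e. -0.5.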

Our goal: to generalize the forward-backward algorithm to higher-order multivariate moments!

Can we derive a dynamic programming algorithm for this formula?

Answer: record multiple forward/backward variables for each clique, and combine all the previously calculated values by the binomial theorem.


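Forward variables of a node $u$, where $\mathcal{Y}_{\mathrm{src}}^{u}$ = set of paths from SOURCE to $u$:

$$\alpha_0(u) = \sum_{\mathbf{y}\in\mathcal{Y}_{\mathrm{src}}^{u}} \prod_{c\in C_{\mathbf{y}}} \phi(c)$$
$$\alpha_1(u) = \sum_{\mathbf{y}\in\mathcal{Y}_{\mathrm{src}}^{u}} \prod_{c\in C_{\mathbf{y}}} \phi(c) \sum_{c\in C_{\mathbf{y}}} f(c)$$
$$\vdots$$
$$\alpha_n(u) = \sum_{\mathbf{y}\in\mathcal{Y}_{\mathrm{src}}^{u}} \prod_{c\in C_{\mathbf{y}}} \phi(c) \Big(\sum_{c\in C_{\mathbf{y}}} f(c)\Big)^{n}$$

Ordinary forward-backward records only $\alpha_0(u)$.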

Forward recursion for a node $v$, where $\mathrm{prev}(v)$ = set of direct ancestors of $v$:

$$\alpha_0(v) = \phi(v) \sum_{x\in\mathrm{prev}(v)} \alpha_0(x)$$
$$\alpha_1(v) = \phi(v) \sum_{x\in\mathrm{prev}(v)} \big(\alpha_1(x) + f(v)\,\alpha_0(x)\big)$$
$$\vdots$$
$$\alpha_i(v) = \phi(v) \sum_{x\in\mathrm{prev}(v)} \sum_{j=0}^{i} \binom{i}{j} f(v)^{i-j}\,\alpha_j(x), \qquad i = 0, \ldots, n$$

These are derived from the binomial theorem.
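The binomial step can be spelled out in a few lines. This is a derivation sketch, assuming (as in the node recursion) that extending a path by node $v$ adds exactly one clique $v$. Write $\mathcal{Y}_{\mathrm{src}}^{u}$ for the set of paths from SOURCE to $u$ and $\alpha_j(x) = \sum_{\mathbf{y}\in\mathcal{Y}_{\mathrm{src}}^{x}} \prod_{c\in C_{\mathbf{y}}}\phi(c)\big(\sum_{c\in C_{\mathbf{y}}} f(c)\big)^{j}$. Every path $\mathbf{y}'$ from SOURCE to $v$ is some $\mathbf{y}\in\mathcal{Y}_{\mathrm{src}}^{x}$ with $x\in\mathrm{prev}(v)$, followed by $v$, so:

```latex
\begin{align*}
\prod_{c\in C_{\mathbf{y}'}}\phi(c) &= \phi(v)\prod_{c\in C_{\mathbf{y}}}\phi(c),
\qquad
\sum_{c\in C_{\mathbf{y}'}} f(c) = \sum_{c\in C_{\mathbf{y}}} f(c) + f(v),\\
\Big(\sum_{c\in C_{\mathbf{y}}} f(c) + f(v)\Big)^{i}
  &= \sum_{j=0}^{i}\binom{i}{j} f(v)^{i-j}\Big(\sum_{c\in C_{\mathbf{y}}} f(c)\Big)^{j},\\
\alpha_i(v) &= \sum_{x\in\mathrm{prev}(v)}\ \sum_{\mathbf{y}\in\mathcal{Y}_{\mathrm{src}}^{x}}
  \phi(v)\prod_{c\in C_{\mathbf{y}}}\phi(c)\sum_{j=0}^{i}\binom{i}{j} f(v)^{i-j}\Big(\sum_{c\in C_{\mathbf{y}}} f(c)\Big)^{j}\\
  &= \phi(v)\sum_{x\in\mathrm{prev}(v)}\sum_{j=0}^{i}\binom{i}{j} f(v)^{i-j}\,\alpha_j(x).
\end{align*}
```

The last step only swaps the finite sums and applies the definition of $\alpha_j(x)$; nothing beyond the binomial theorem is needed.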

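At SINK, with $\mathrm{prev}(\mathrm{SINK})$ = set of direct ancestors of SINK:

$$\alpha_0(\mathrm{SINK}) = \phi(\mathrm{SINK}) \sum_{x\in\mathrm{prev}(\mathrm{SINK})} \alpha_0(x)$$
$$\vdots$$
$$\alpha_i(\mathrm{SINK}) = \phi(\mathrm{SINK}) \sum_{x\in\mathrm{prev}(\mathrm{SINK})} \sum_{j=0}^{i} \binom{i}{j} f(\mathrm{SINK})^{i-j}\,\alpha_j(x), \qquad i = 0, \ldots, n$$

Desired values: since every path from SOURCE to SINK is a complete path, $\alpha_i(\mathrm{SINK}) = \sum_{\mathbf{y}\in\mathcal{Y}} \prod_{c\in C_{\mathbf{y}}}\phi(c)\big(\sum_{c\in C_{\mathbf{y}}} f(c)\big)^{i}$ for $i = 0, \ldots, n$.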
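The univariate recursion fits in a few lines of Python. This is a minimal sketch under stated assumptions: the trellis is a DAG in which every clique is a node (arc cliques can be handled by inserting a node on each arc), nodes are listed in topological order starting at SOURCE, and `prev`, `phi`, `f` and the toy data are hypothetical. Because $\phi(v)$ and $f(v)$ do not depend on $x$, the sketch sums the ancestors' variables first and applies the binomial combination once per node, which equals the recursion by linearity. From the desired values it also derives the mean and variance of $\sum_c f(c)$ under $P(\mathbf{y}) \propto \prod_c \phi(c)$.

```python
from math import comb


def generalized_forward(order, prev, phi, f, n):
    """Return alpha[v] = [alpha_0(v), ..., alpha_n(v)] for every node v.

    order : nodes in topological order, SOURCE first (hypothetical input format)
    prev  : prev[v] = list of direct ancestors of v (empty for SOURCE)
    phi   : phi[v] = non-negative score of clique v
    f     : f[v] = feature value of clique v
    """
    alpha = {}
    for v in order:
        if not prev[v]:
            # The only path to SOURCE is SOURCE itself.
            alpha[v] = [phi[v] * f[v] ** i for i in range(n + 1)]
            continue
        # s[j] = sum of alpha_j(x) over the direct ancestors x of v.
        s = [sum(alpha[x][j] for x in prev[v]) for j in range(n + 1)]
        # Binomial theorem: (S + f(v))**i = sum_j C(i, j) f(v)**(i - j) S**j.
        alpha[v] = [phi[v] * sum(comb(i, j) * f[v] ** (i - j) * s[j] for j in range(i + 1))
                    for i in range(n + 1)]
    return alpha


def mean_and_variance(alpha_sink):
    """Mean and variance of sum_c f(c) from alpha_0..alpha_2 at SINK (needs n >= 2)."""
    m0, m1, m2 = alpha_sink[0], alpha_sink[1], alpha_sink[2]
    mean = m1 / m0
    return mean, m2 / m0 - mean ** 2


# Hypothetical toy trellis: two positions with labels a/b; f counts the a's on a path.
toy_order = ["SOURCE", "a1", "b1", "a2", "b2", "SINK"]
toy_prev = {"SOURCE": [], "a1": ["SOURCE"], "b1": ["SOURCE"],
            "a2": ["a1", "b1"], "b2": ["a1", "b1"], "SINK": ["a2", "b2"]}
toy_phi = {v: 1 for v in toy_order}
toy_f = {"SOURCE": 0, "a1": 1, "b1": 0, "a2": 1, "b2": 0, "SINK": 0}
```

On the toy trellis, `generalized_forward(toy_order, toy_prev, toy_phi, toy_f, 2)["SINK"]` gives `[4, 4, 6]`: four paths, total a-count 4 and total squared a-count 6, hence mean 1.0 and variance 0.5.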

Summary of Our Ideas

[Figure: nodes u and v on paths from SOURCE; u holds variables u_0, u_1, ..., u_n and v holds variables v_0, v_1, ..., v_n.]

• Multiple variables for each clique
• Dependency between variables in a step, which is derived from the binomial theorem

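For multivariate cases, forward/backward variables have multiple indices: $\alpha_{0,\ldots,0}(u)$, $\alpha_{1,0,\ldots,0}(u)$, ..., $\alpha_{n_1,\ldots,n_K}(u)$. Their values at SINK are the sums

$$\sum_{\mathbf{y}\in\mathcal{Y}} \prod_{c\in C_{\mathbf{y}}} \phi(c) \Big(\sum_{c\in C_{\mathbf{y}}} f_1(c)\Big)^{0} \cdots \Big(\sum_{c\in C_{\mathbf{y}}} f_K(c)\Big)^{0}$$
$$\sum_{\mathbf{y}\in\mathcal{Y}} \prod_{c\in C_{\mathbf{y}}} \phi(c) \Big(\sum_{c\in C_{\mathbf{y}}} f_1(c)\Big)^{1} \cdots \Big(\sum_{c\in C_{\mathbf{y}}} f_K(c)\Big)^{0}$$
$$\vdots$$
$$\sum_{\mathbf{y}\in\mathcal{Y}} \prod_{c\in C_{\mathbf{y}}} \phi(c) \Big(\sum_{c\in C_{\mathbf{y}}} f_1(c)\Big)^{n_1} \cdots \Big(\sum_{c\in C_{\mathbf{y}}} f_K(c)\Big)^{n_K}$$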
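The same idea carries over to multi-indices. The Python sketch below keeps the assumptions of the univariate case (cliques are nodes of a DAG given in topological order, SOURCE first); `feats[v]` is a hypothetical tuple $(f_1(v), \ldots, f_K(v))$ and `powers` is $(n_1, \ldots, n_K)$. Each factor $(\sum_c f_k(c))^{i_k}$ is expanded independently with the binomial theorem, so the update for index $\mathbf{i}$ sums over all $\mathbf{j} \le \mathbf{i}$ componentwise.

```python
import itertools
from math import comb, prod


def multivariate_forward(order, prev, phi, feats, powers):
    """alpha[v][i] for every multi-index i = (i_1, ..., i_K) with 0 <= i_k <= n_k."""
    indices = list(itertools.product(*(range(n + 1) for n in powers)))
    alpha = {}
    for v in order:
        fv = feats[v]
        if not prev[v]:
            # Path consisting of SOURCE alone.
            alpha[v] = {i: phi[v] * prod(fk ** ik for fk, ik in zip(fv, i)) for i in indices}
            continue
        # s[j] = sum of alpha_j(x) over the direct ancestors x of v.
        s = {j: sum(alpha[x][j] for x in prev[v]) for j in indices}
        alpha[v] = {}
        for i in indices:
            total = 0
            # Expand each (S_k + f_k(v)) ** i_k with the binomial theorem.
            for j in itertools.product(*(range(ik + 1) for ik in i)):
                coeff = prod(comb(ik, jk) * fk ** (ik - jk) for ik, jk, fk in zip(i, j, fv))
                total += coeff * s[j]
            alpha[v][i] = phi[v] * total
    return alpha


# Hypothetical toy trellis: f_1 counts a-labels, f_2 counts b-labels on a path.
toy_order = ["SOURCE", "a1", "b1", "a2", "b2", "SINK"]
toy_prev = {"SOURCE": [], "a1": ["SOURCE"], "b1": ["SOURCE"],
            "a2": ["a1", "b1"], "b2": ["a1", "b1"], "SINK": ["a2", "b2"]}
toy_phi = {v: 1 for v in toy_order}
toy_feats = {"SOURCE": (0, 0), "a1": (1, 0), "b1": (0, 1),
             "a2": (1, 0), "b2": (0, 1), "SINK": (0, 0)}
```

For instance, with `powers=(2, 1)` the SINK values include $\alpha_{0,0} = 4$ and $\alpha_{1,1} = 2$, so the covariance of the two counts is $2/4 - 1 \cdot 1 = -0.5$ under uniform scores.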

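To calculate the following form

$$\sum_{\mathbf{y}\in\mathcal{Y}} \prod_{c\in C_{\mathbf{y}}} \phi(c) \Big(\sum_{c\in C_{\mathbf{y}}} f_1(c)\Big)^{n_1} \cdots \Big(\sum_{c\in C_{\mathbf{y}}} f_K(c)\Big)^{n_K}$$

the computational cost of the generalized forward-backward algorithm is proportional to

$$(n_1+1)^2\,(n_2+1)^2 \cdots (n_K+1)^2\,\big(|V|+|E|\big)$$

The computational cost is only linear in the number of nodes $|V|$ and arcs $|E|$ in the trellis.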
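A rough count shows where the squared factors come from, assuming the node-wise binomial recursion: for each multi-index $\mathbf{i}$, a clique combines all $\mathbf{j} \le \mathbf{i}$ componentwise, so the number of $(\mathbf{i}, \mathbf{j})$ pairs per clique is

```latex
\prod_{k=1}^{K}\sum_{i_k=0}^{n_k}(i_k+1)
  = \prod_{k=1}^{K}\frac{(n_k+1)(n_k+2)}{2}
  \le \prod_{k=1}^{K}(n_k+1)^2 .
```

Whether this combination is done once per node (after summing the ancestors' variables) or once per arc, the total stays within $\prod_k (n_k+1)^2\,(|V|+|E|)$ operations, which matches the linear dependence on the trellis size.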

Merits of the generalized forward-backward algorithm

1. The generalized forward-backward subsumes many existing task-specific algorithms

2. For some tasks, it leads to a solution more efficient than the existing ones

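Merit 1. The generalized forward-backward subsumes many existing task-specific algorithms:

Task: sum to compute

• Parameter derivatives of the Hamming loss for CRFs [Kakade et al., 2002]:
$$\sum_{\mathbf{y}\in\mathcal{Y}} \prod_{c\in C_{\mathbf{y}}} \exp\big(\boldsymbol{\lambda}\cdot\mathbf{F}(c,\mathbf{x})\big) \sum_{c\in C_{\mathbf{y}}} \ell(c) \sum_{c\in C_{\mathbf{y}}} f_k(c,\mathbf{x})$$

• Parameter derivatives of the entropy for CRFs [Mann et al., 2007]:
$$\sum_{\mathbf{y}\in\mathcal{Y}} \prod_{c\in C_{\mathbf{y}}} \exp\big(\boldsymbol{\lambda}\cdot\mathbf{F}(c,\mathbf{x})\big) \sum_{c\in C_{\mathbf{y}}} \boldsymbol{\lambda}\cdot\mathbf{F}(c,\mathbf{x}) \sum_{c\in C_{\mathbf{y}}} \mathbf{F}(c,\mathbf{x}),$$
$$\Big(\sum_{\mathbf{y}\in\mathcal{Y}} \prod_{c\in C_{\mathbf{y}}} \exp\big(\boldsymbol{\lambda}\cdot\mathbf{F}(c,\mathbf{x})\big) \sum_{c\in C_{\mathbf{y}}} \boldsymbol{\lambda}\cdot\mathbf{F}(c,\mathbf{x})\Big) \Big(\sum_{\mathbf{y}\in\mathcal{Y}} \prod_{c\in C_{\mathbf{y}}} \exp\big(\boldsymbol{\lambda}\cdot\mathbf{F}(c,\mathbf{x})\big) \sum_{c\in C_{\mathbf{y}}} \mathbf{F}(c,\mathbf{x})\Big)$$

• Hessian-vector product for CRFs [Vishwanathan et al., 2006]:
$$\sum_{\mathbf{y}\in\mathcal{Y}} \prod_{c\in C_{\mathbf{y}}} \exp\big(\boldsymbol{\lambda}\cdot\mathbf{F}(c,\mathbf{x})\big) \sum_{c\in C_{\mathbf{y}}} \mathbf{v}\cdot\mathbf{F}(c,\mathbf{x}) \sum_{c\in C_{\mathbf{y}}} f_k(c,\mathbf{x})$$

where $\ell(c)$ = Hamming loss of clique $c$ and $\mathbf{v}$ = the vector multiplied by the Hessian.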


All these formulas have a form computable with our proposed method.

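The previously proposed algorithms for these tasks are task-specific.

The generalized forward-backward is a task-independent algorithm applicable to formulae of the form

$$\sum_{\mathbf{y}\in\mathcal{Y}} \prod_{c\in C_{\mathbf{y}}} \phi(c) \Big(\sum_{c\in C_{\mathbf{y}}} f_1(c)\Big)^{n_1} \cdots \Big(\sum_{c\in C_{\mathbf{y}}} f_K(c)\Big)^{n_K}$$

If a problem involves this form, the generalized forward-backward immediately offers an efficient solution.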
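As one small instance of this, the entropy of a CRF's path distribution needs only the 0th- and 1st-order moments with $f(c) = \log\phi(c) = \boldsymbol{\lambda}\cdot\mathbf{F}(c,\mathbf{x})$, because $H = \log Z - E_P\big[\sum_c \boldsymbol{\lambda}\cdot\mathbf{F}(c,\mathbf{x})\big]$; the entropy derivative in the Merit 1 list additionally needs mixed moments with $f_k$. The Python sketch below assumes a hypothetical DAG input (cliques are nodes, `log_phi[v]` holds $\boldsymbol{\lambda}\cdot\mathbf{F}(v,\mathbf{x})$, nodes in topological order with SINK last) and a toy trellis for illustration.

```python
import math


def entropy_via_moments(order, prev, log_phi):
    """Entropy of P(y) proportional to prod_c exp(log_phi[c]) over SOURCE-to-SINK paths.

    order   : nodes in topological order, SOURCE first and SINK last (hypothetical format)
    prev    : prev[v] = list of direct ancestors of v
    log_phi : log_phi[v] = lambda . F(v, x), used both as log-score and as feature f(v)
    """
    a0, a1 = {}, {}  # 0th- and 1st-order forward variables
    for v in order:
        phi_v, f_v = math.exp(log_phi[v]), log_phi[v]
        if not prev[v]:
            a0[v], a1[v] = phi_v, phi_v * f_v
            continue
        s0 = sum(a0[x] for x in prev[v])
        s1 = sum(a1[x] for x in prev[v])
        a0[v] = phi_v * s0
        # n = 1 case of the binomial recursion: alpha_1 = phi * (s1 + f * s0)
        a1[v] = phi_v * (s1 + f_v * s0)
    z, m1 = a0[order[-1]], a1[order[-1]]
    # H = log Z - E_P[sum_c lambda . F(c, x)]
    return math.log(z) - m1 / z


# Hypothetical toy trellis: two positions with labels a/b.
toy_order = ["SOURCE", "a1", "b1", "a2", "b2", "SINK"]
toy_prev = {"SOURCE": [], "a1": ["SOURCE"], "b1": ["SOURCE"],
            "a2": ["a1", "b1"], "b2": ["a1", "b1"], "SINK": ["a2", "b2"]}
toy_log_phi = {v: 0.0 for v in toy_order}
```

With all log-scores zero the four paths are equiprobable and the result is $\log 4$; doubling the score of `a1` gives path probabilities 1/3, 1/3, 1/6, 1/6 and entropy $\log 3 + \tfrac{1}{3}\log 2$. A production version would work in log space to avoid overflow.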


Merit 2. Efficient optimization procedure with respect to Generalized Expectation Criteria for CRFs [Mann et al., 2008]

• Algorithm proposed in [Mann et al., 2008]: computational cost is proportional to $L\,(|V|+|E|)$
• By a specialization of the generalization: computational cost is proportional to $|V|+|E|$

($L$ = number of nodes labeled as answers)

Future tasks

• Explore other tasks to which our generalized forward-backward algorithm is applicable

• Extend the generalized forward-backward to trees and general graphs containing cycles

Summary
• We have generalized the forward-backward algorithm to allow for higher-order multivariate moments
• The generalization offers an efficient way to compute quantities in complex sequence models that involve higher-order multivariate moments
• Many existing task-specific algorithms are instances of this generalization
• It leads to a faster algorithm for computing Generalized Expectation Criteria for CRFs


Thank you for your attention!