

Page 1: Margin-based Decomposed Amortized Inference Gourab Kundu, Vivek Srikumar, Dan Roth

Margin-based Decomposed Amortized Inference

Gourab Kundu, Vivek Srikumar, Dan Roth

This research is sponsored by the Army Research Laboratory (ARL) under agreement W911NF-09-2-0053, Defense Advanced Research Projects Agency (DARPA) Machine Reading Program under Air Force Research Laboratory (AFRL) prime contract no. FA8750-09-C-0181, DARPA under agreement number FA8750-13-2-0008, Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center contract number D11PC20155.

1. Margin-based Amortization
2. Decomposed Amortized Inference using Lagrangian Relaxation

Experimental Setup

Amortized Inference

Research Question: Can we solve the k-th inference instance much faster than the 1st? Amortized inference (Srikumar et al., 2012) shows how computation from earlier inference instances can be used to speed up inference for new, previously unseen instances.

Key Observations: (1) In NLP, we solve a lot of inference problems, at least one per sentence. (2) Redundancy of structures: the number of observed structures (blue solid line) is much smaller than the number of inputs (red dotted line). Moreover, the distribution of observed structures is highly skewed (inset). (E.g., for POS tagging, a small number of tag sequences are much more frequent than the rest.) The pigeonhole principle applies.
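The redundancy observation can be illustrated with a toy count over synthetic tag sequences (the sentences and tags below are invented purely for illustration):

```python
from collections import Counter

# Synthetic illustration of structure redundancy: many inputs map to
# far fewer distinct output structures, and the distribution is skewed.
tag_sequences = [
    ("DT", "NN", "VBZ"),  # "The dog barks"
    ("DT", "NN", "VBZ"),  # "A cat sleeps"
    ("DT", "NN", "VBZ"),  # "The sun shines"
    ("PRP", "VBZ"),       # "She runs"
    ("DT", "JJ", "NN"),   # "The red car"
]

counts = Counter(tag_sequences)
print(len(tag_sequences))     # 5 inputs
print(len(counts))            # only 3 distinct structures
print(counts.most_common(1))  # the most frequent structure covers 3 of 5 inputs
```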

General Recipe

If CONDITION(problem cache, new problem) then
    (no need to call the solver)
    SOLUTION(new problem) = old solution
Else
    Call base solver and update cache
End

+ A theorem guaranteeing correctness
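A minimal Python sketch of this recipe; the names condition, base_solver, and the list-based cache layout are illustrative assumptions, not the system's actual interface:

```python
# Sketch of the amortized-inference recipe: reuse a cached solution
# whenever the correctness condition holds, otherwise fall back to the
# base solver and grow the cache.

def amortized_solve(problem, cache, condition, base_solver):
    for cached_problem, cached_solution in cache:
        if condition(cached_problem, problem):
            # The theorem guarantees the old solution is still optimal,
            # so the base solver is not called.
            return cached_solution
    solution = base_solver(problem)
    cache.append((problem, solution))
    return solution
```

With a toy condition (exact problem equality), the base solver is invoked only once for repeated problems; the real condition is the margin-based test described on this poster.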

We simulate a long-running NLP process by caching problems and solutions from the Gigaword corpus. We used a database engine to cache ILPs and their solutions along with their structured margins. We compare our approaches to a state-of-the-art ILP solver (Gurobi) and also to Theorem 1 from (Srikumar et al., 2012).

Semantic Role Labeling Entities and Relations

Let B be the solution y_p for inference problem p with objective c_p, denoted by the red hyperplane, and let A be the second-best assignment. For a new objective function c_q (blue), if the margin δ is greater than the sum of the decrease in the objective value of y_p and the maximum increase in the objective of another solution (Δ), then the solution to the new inference problem will still be y_p. Formally, the condition is:

-(c_q - c_p)^T y_p + Δ ≤ δ

At the caching stage, δ is computed for each ILP and stored in the cache. At test time, computing Δ exactly requires solving an ILP. Instead, we compute an approximate Δ by solving a relaxed problem, which can be done efficiently.
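The reuse test can be sketched in a few lines; here delta_cap stands for the (approximate) Δ supplied by the relaxation, an assumption of this sketch:

```python
import numpy as np

# Sketch of the margin-based reuse test: keep the cached solution y_p
# for a new objective c_q when -(c_q - c_p)^T y_p + delta_cap <= margin,
# where delta_cap upper-bounds the increase of any competing solution.

def can_reuse(c_p, c_q, y_p, margin, delta_cap):
    decrease = -(c_q - c_p) @ y_p  # drop in the cached solution's value
    return decrease + delta_cap <= margin
```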

More redundancy among smaller structures

Redundancy in components of structures: We extend amortization to cases where the full structured output is not repeated by storing partial computation for future inference problems.

Given a problem: max c_q^T y s.t. M^T y ≤ b:

1. Partition the constraints into two sets, say C1 and C2 (with constraints M_2^T y ≤ b_2)
2. Define L(λ) = max_{y ∈ C1} c_q^T y - λ^T (M_2^T y - b_2), solved using any amortized algorithm
3. Dual problem: min_{λ ≥ 0} L(λ), solved using gradient descent over the dual variables λ.

The dual can be decomposed into multiple smaller problems if no constraint in C1 has active variables from two different smaller problems. Even when it does not decompose, the dual has fewer constraints than the original problem. Considering the dual can have two advantages from the perspective of amortization: (1) smaller problems; (2) higher chance of cache hits.
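The Lagrangian relaxation steps can be sketched with a projected subgradient loop on λ; enumerating a small feasible set stands in for the amortized inner solver, an illustrative assumption rather than the paper's implementation:

```python
import numpy as np

# Dualize the C2 constraints (rows of M2 give the dualized constraints,
# M2 @ y <= b2) and run projected subgradient descent on lambda >= 0.
# Enumerating `feasible_C1` replaces the amortized inner solver here.

def lagrangian_dual(c, M2, b2, feasible_C1, steps=100, lr=0.1):
    lam = np.zeros(len(b2))
    y = None
    for _ in range(steps):
        # Inner maximization over C1 with the penalized objective.
        y = max(feasible_C1, key=lambda v: c @ v - lam @ (M2 @ v - b2))
        # Subgradient step on lambda, projected onto lambda >= 0.
        lam = np.maximum(0.0, lam + lr * (M2 @ y - b2))
    return y, lam
```

On a toy problem (maximize 2y_1 + 3y_2 over binary vectors with the constraint y_1 + y_2 ≤ 1 dualized), the loop drives λ up until the constrained optimum (0, 1) wins the inner maximization.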

[Punyakanok, et al 2008][Roth & Yih 2004]

Solve only one in six problems

Solve only one in four problems

See also: This work continues the original work on amortization: Srikumar, Kundu and Roth. On Amortizing Inference Cost for Structured Prediction. EMNLP, 2012.

Wall clock improvements too

ILP formulations

ILP (Integer Linear Programming) is a general formulation for inference in structured prediction tasks [Roth & Yih, 04, 07]

Inference using ILP has been successful in NLP tasks e.g. SRL, Dep. Parsing, Information extraction and more.

ILP can be expressed as:

    max cx          e.g.  max 2x_1 + 3x_2
    s.t. Ax ≤ b           s.t. x_1 + x_2 ≤ 1
    x integer             x_1, x_2 integer

Amortized inference depends on having an ILP formulation; but multiple solvers might be used.
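The toy example can be checked by brute force over the feasible integer points, assuming the variables are binary as is typical in these NLP formulations (a sketch for illustration only; real systems use an ILP solver such as Gurobi):

```python
from itertools import product

# Enumerate all binary assignments, keep those satisfying x1 + x2 <= 1,
# and maximize the objective 2*x1 + 3*x2 from the example above.
best = max(
    (x for x in product([0, 1], repeat=2) if x[0] + x[1] <= 1),
    key=lambda x: 2 * x[0] + 3 * x[1],
)
print(best)  # (0, 1): setting x2 = 1 gives objective value 3
```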