

Indirect Supervision Protocols for Learning in Natural Language Processing

Ming-Wei Chang, James Clarke, Dan Goldwasser, Dan Roth and Vivek Srikumar

This work is supported by DARPA funding under the Bootstrap Learning and the Machine Reading Programs.

Abstract

Many natural language processing (NLP) tasks require making decisions with respect to interdependent output variables. Current approaches to this problem, known as structured prediction, rely on annotated data to learn a model mapping inputs to a set of output variables. Unfortunately, providing this form of supervision is difficult and often requires highly specialized knowledge that is not commonly available. Alleviating the supervision effort is therefore a major challenge in scaling machine learning techniques to real-world natural language processing problems. We investigate methods for learning structured prediction models from an indirect supervision signal that is considerably easier to obtain, and suggest three indirect learning protocols for common NLP learning scenarios. We further demonstrate how to obtain the indirect supervision signal for several common NLP tasks, such as semantic parsing, textual entailment, paraphrase identification, transliteration, POS tagging and information extraction, and show empirically that this signal can be used effectively for these learning tasks.

I. Learning over Constrained Latent Representations (LCLR)

Many NLP tasks can be phrased as decision problems over complex linguistic structures which are not expressed directly in the input. Successful learning depends on correctly encoding these, often latent, structures as features for the learning system. For example, deciding whether one sentence paraphrases another depends on aligning the two sentences' constituents and extracting features from that alignment.

Learning an Intermediate Representation: Most existing approaches take a two-stage approach, separating the representation decisions from the classification decision. We follow the intuition that good representations are those that facilitate the learning task, and suggest a joint learning framework that learns both together. We present a generalized learning framework applicable to a wide range of applications, and show how to encode domain-specific knowledge into the learning process using an Integer Linear Programming (ILP) formulation.

Learning Framework: We aim to learn a binary classification function f(x), defined over a weight vector u and a latent representation h constrained by application-specific semantics C. In our formulation h is a binary vector indicating the features used to represent a given example. Learning in this framework corresponds to solving an optimization problem that minimizes the loss over the binary training data.
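The poster's equations did not survive extraction; the following is a plausible reconstruction of the LCLR decision function and training objective in the standard large-margin form. The regularization weight \lambda and the choice of loss \ell (e.g., the squared hinge) are assumptions, not taken from the poster.

    f_u(x) = \max_{h \in \mathcal{C}(x)} u^{\top} \Phi(x, h), \qquad \hat{y} = \mathrm{sign}\big(f_u(x)\big)

    \min_{u} \;\; \frac{\lambda}{2} \lVert u \rVert^{2} \;+\; \sum_{i} \ell\big(1 - y_i \, f_u(x_i)\big)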

However, since the prediction is itself defined over a maximization problem, this results in a non-convex optimization problem. We present a novel iterative optimization algorithm for solving it; a sketch follows.
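A minimal sketch of one plausible alternating scheme for this non-convex problem, assuming the common latent-variable recipe: repeatedly commit to the highest-scoring feasible latent structure for each example, then take convex (here, squared-hinge subgradient) steps on the weights. The brute-force search over a small candidate set stands in for the ILP inference the poster describes; all function and variable names are illustrative.

    import numpy as np

    def infer(u, candidates):
        """Pick the feasible latent structure h (a binary feature vector)
        that maximizes u . h. The poster solves this step as an ILP over
        the constraints C; here we brute-force a small candidate set."""
        scores = [u @ h for h in candidates]
        return candidates[int(np.argmax(scores))]

    def train_lclr(data, dim, rounds=10, lam=0.1, lr=0.1):
        """data: list of (candidates, y) pairs with y in {-1, +1}, where
        candidates holds the feasible binary representations of an example."""
        u = np.zeros(dim)
        for _ in range(rounds):
            # Step 1: infer the best constrained representation per example.
            reps = [infer(u, cands) for cands, _ in data]
            # Step 2: update u with the representations fixed
            # (squared-hinge loss plus L2 regularization).
            for (_, y), h in zip(data, reps):
                margin = y * (u @ h)
                if margin < 1.0:
                    u += lr * (2.0 * (1.0 - margin) * y * h - lam * u)
        return u

    # Toy usage: two 3-dimensional examples, two feasible structures each.
    data = [
        ([np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])], +1),
        ([np.array([0.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])], -1),
    ]
    print(train_lclr(data, dim=3))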

Empirical Evaluation: To demonstrate the effectiveness of our learning framework we evaluate it on three NLP tasks: textual entailment, transliteration identification and paraphrase identification. Results show that our framework outperforms existing systems taking both two-stage and joint approaches.

II. Learning by Inventing Binary Labels

We present a novel approach for structured prediction that exploits the observation that structured output problems often have a companion learning problem: determining whether a given input possesses a good structure. For example, transliteration identification and phonetic alignment are companion problems, since only positive binary examples have good phonetic alignments. While obtaining direct supervision for structures is difficult, it is often very easy to obtain indirect supervision for the companion binary decision problem.

[Figure: the character-level phonetic alignment between "I T A L Y" and "א י ט ל י ה" is the structured problem; the binary decision f(איטליה, Italy) ∈ {Yes, No} is its companion transliteration-identification problem.]

We develop a large-margin framework that jointly learns from both direct and indirect forms of supervision. Our framework exploits the intuition that negative binary examples cannot have a good structure, and uses it to push the structured decisions towards better structures.

Learning Protocol: Our setting extends the standard structured learning setting (e.g., structured SVM), where learning is defined as a minimization problem over the structured loss function L_S. We are interested in adding the binary information to this optimization process. Following the intuition that negative examples cannot have good structures, we formalize the connection between the binary decision and the structured one, define a loss function L_B for the binary classification problem, and finally formulate the joint optimization problem over both losses; a reconstruction is given below.
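The objectives themselves were images on the poster and did not survive extraction. The following is a plausible reconstruction in the standard structured-SVM style: S is the structure-labeled set, B the binary-labeled set with labels z_i ∈ {−1, +1}, and C_1, C_2 are assumed trade-off constants. The binary loss scores an input by its best available structure, so a negative example is penalized whenever any structure scores well.

    \min_{w} \;\; \frac{1}{2} \lVert w \rVert^{2}
      \;+\; C_1 \sum_{i \in S} L_S(x_i, y_i, w)
      \;+\; C_2 \sum_{i \in B} L_B(x_i, z_i, w),

    \text{where } L_B(x_i, z_i, w) \;=\;
      \ell\Big(1 - z_i \max_{h \in \mathcal{H}(x_i)} w^{\top} \Phi(x_i, h)\Big).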

Our experimental results exhibit the significant contribution of this easy-to-get indirect binary supervision on three NLP structure learning problems: phonetic alignment, information extraction and POS tagging.
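A sketch of how the "negative examples cannot have good structures" intuition might drive an update, using a perceptron-style approximation rather than the poster's exact large-margin solver: score the best feasible structure for a binary-labeled example, and push it down for negatives (up for positives) whenever it violates the margin. The helper names and margin value are illustrative.

    import numpy as np

    def best_structure(w, feasible):
        """Return the feasible structure (feature vector) maximizing w . phi."""
        return max(feasible, key=lambda phi: w @ phi)

    def binary_update(w, feasible, z, lr=0.1, margin=1.0):
        """Perceptron-style use of a companion binary label z in {-1, +1}.

        z = +1: some structure should score high, so push the best one up.
        z = -1: no structure can be good, so push the best one down whenever
                it still scores above -margin."""
        phi = best_structure(w, feasible)
        if z * (w @ phi) < margin:
            w = w + lr * z * phi
        return w

    # Toy usage: two feasible structures as 3-dim feature vectors.
    w = np.zeros(3)
    feasible = [np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 0.0])]
    print(binary_update(w, feasible, z=-1))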

III. Driving Semantic Parsing from the World’s Response

In this work we aim to alleviate the supervision burden by using a new form of supervision, derived from the world's response to the learned model's actions. Consider, for example, the task of converting natural language questions into logical statements used to query a database. Current approaches to semantic parsing rely on annotated training data mapping sentences to logical forms. We instead exploit the ability to query the database to get supervision, reducing the supervision effort to providing the correct responses to the text queries, rather than working at the logical-form level.

[Figure: the NL query "What is the largest state that borders Texas?" is mapped to the logical query largest(state(next_to(const(texas)))), which an interactive computer environment executes to produce the query result "New Mexico". Previous systems learn using the logical forms; our approach uses only the responses.]

In this setting the learner has access to a feedback function providing a weak binary supervision signal: it indicates whether the predicted query's response is identical to the expected one. However, our learning task requires a stronger supervision signal, one indicating the correctness of the hypothesized query's components rather than of its result. We define the prediction task as mapping the input text (denoted x) to a logical representation (denoted z) by aligning their constituents (denoted y). Our key challenge is therefore: how can this weak binary signal be used to learn such a complex structured prediction task?
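A plausible rendering of the prediction rule and feedback signal this paragraph describes; the weight vector w and the feature function Φ over (text, alignment, logical form) triples follow the conventions of the other sections and are assumptions here.

    \hat{z} = \arg\max_{y,\, z} \; w^{\top} \Phi(x, y, z), \qquad
    \mathrm{Feedback}(x, \hat{z}) =
      \begin{cases}
        +1 & \text{if } \mathrm{execute}(\hat{z}) = \text{expected response} \\
        -1 & \text{otherwise}
      \end{cases}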

We propose two algorithms for this task: (1) Direct approach: uses the binary supervision signal directly, by formulating the decision as a binary task. (2) Aggressive approach: iteratively learns a structured model using its positively labeled predictions (sketched below).

Our experimental results show that our algorithms can use this supervision to recover the correct queries (R250). Moreover, given new data (Q250), our learned model generates the correct queries with high accuracy compared to fully supervised models.
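A minimal sketch of the aggressive protocol as this section describes it: parse, execute, keep only the predictions the feedback function approves, and retrain a structured model on those. The helper names (parse_best, execute, train_structured) are illustrative stand-ins, not the authors' API; the structured learner itself is left abstract.

    def aggressive_round(questions, expected, model,
                         parse_best, execute, train_structured):
        """One round of response-driven learning for semantic parsing.

        questions: list of NL questions x
        expected:  list of gold responses (e.g., "New Mexico"), NOT logical forms
        model:     current structured model mapping x -> logical form z"""
        positives = []
        for x, gold_response in zip(questions, expected):
            z = parse_best(model, x)                 # predicted logical query
            feedback = execute(z) == gold_response   # weak binary signal
            if feedback:
                # Aggressive approach: trust approved predictions as if they
                # were gold logical forms for the next structured round.
                positives.append((x, z))
        # Retrain the structured model on self-labeled positive pairs only.
        return train_structured(positives)

    def learn(questions, expected, model, parse_best, execute,
              train_structured, rounds=5):
        for _ in range(rounds):
            model = aggressive_round(questions, expected, model,
                                     parse_best, execute, train_structured)
        return model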
