
Page 1: Natural Language Generation - SAILab

Natural Language Generation

Andrea Zugarini

SAILab

December 5th, 2019


Page 2: Natural Language Generation - SAILab

Natural Language Generation

Natural Language Generation (NLG) is the problem of generating text automatically.

Machine Translation, Text Summarization and Paraphrasing are all instances of NLG.

Language generation is a very challenging problem: it requires not only text understanding, but also typically human skills, such as creativity.

Word representations and Recurrent Neural Networks (RNNs) are the basic tools for NLG models, usually called end-to-end since they learn directly from data.


Page 3: Natural Language Generation - SAILab

From Language Modelling to NLG

Recap: given a sequence of words y_1, ..., y_m, a language model is characterized by a probability distribution:

P(y_1, \dots, y_m) = P(y_m \mid y_1, \dots, y_{m-1}) \cdots P(y_2 \mid y_1)\, P(y_1)

that can be equivalently expressed as:

P(y_1, \dots, y_m) = \prod_{i=1}^{m} P(y_i \mid y_{<i})

Language Modelling is closely related to NLG.
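To make the chain-rule factorization concrete, here is a minimal sketch in plain Python that scores a sentence as a product of conditional probabilities. The bigram table is a made-up, hypothetical example, and, unlike the general formula above, a bigram model truncates the history y_{<i} to just the previous word:

```python
import math

# Hypothetical bigram table: P(next word | previous word).
# "<s>" marks the start of a sequence, so P(y_1) = P(y_1 | "<s>").
bigram_prob = {
    ("<s>", "the"): 0.4, ("the", "cat"): 0.2,
    ("cat", "sleeps"): 0.3, ("sleeps", "</s>"): 0.5,
}

def sequence_log_prob(words):
    """log P(y_1, ..., y_m) = sum_i log P(y_i | y_{i-1}) under the bigram model."""
    log_p, prev = 0.0, "<s>"
    for w in words:
        log_p += math.log(bigram_prob.get((prev, w), 1e-8))  # crude smoothing for unseen pairs
        prev = w
    return log_p

print(sequence_log_prob(["the", "cat", "sleeps", "</s>"]))
```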


Page 4: Natural Language Generation - SAILab

From Language Modelling to NLG

Many NLG problems are conditioned on some given context.

In Machine Translation, the generated text strictly depends on the input text to translate.

Hence, we can add to the equation another sequence x of size n to condition the probability distribution,

P(y_1, \dots, y_m) = \prod_{i=1}^{m} P(y_i \mid y_{<i}, x_1, \dots, x_n)

obtaining a general formulation for any Language Generation problem.


Page 5: Natural Language Generation - SAILab

Natural Language Generation

A Machine-Learning model can then be used to learn P (·).

P(y \mid x, \theta) = \prod_{i=1}^{m} P(y_i \mid y_{<i}, x_1, \dots, x_n, \theta)

\max_{\theta} P(y \mid x, \theta)

where P(·) is the model parametrized by θ that is trained to maximize the likelihood of y on a dataset of (x, y) sequence pairs.

Note: when x is empty, we fall back to Language Modelling.
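As a rough illustration of this maximum-likelihood setup, here is a minimal sketch of a conditional model trained with teacher forcing and cross-entropy (i.e., minimizing -log P(y|x, θ)). The encoder-decoder architecture, vocabulary size and dimensions are purely illustrative assumptions, not the models discussed in the talk:

```python
import torch
import torch.nn as nn

class TinyConditionalLM(nn.Module):
    """Illustrative encoder-decoder: encodes x, then predicts y_i from y_<i and x."""
    def __init__(self, vocab_size=1000, emb=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True)
        self.decoder = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, x, y_in):
        _, state = self.encoder(self.embed(x))        # summarize the condition x
        h, _ = self.decoder(self.embed(y_in), state)  # decode given y_<i (teacher forcing)
        return self.out(h)                            # unnormalized scores for P(y_i | y_<i, x)

model = TinyConditionalLM()
opt = torch.optim.Adam(model.parameters())
loss_fn = nn.CrossEntropyLoss()

# One training step on a toy random batch: x (condition), y_in = y shifted right, y_out = targets.
x = torch.randint(0, 1000, (8, 12))
y_in = torch.randint(0, 1000, (8, 10))
y_out = torch.randint(0, 1000, (8, 10))
logits = model(x, y_in)
loss = loss_fn(logits.reshape(-1, 1000), y_out.reshape(-1))  # -log P(y | x, theta)
loss.backward(); opt.step(); opt.zero_grad()
```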


Page 6: Natural Language Generation - SAILab

Natural Language Generation: Open-ended vs. non open-ended generation

Depending on how much x conditions P, we distinguish between two kinds of text generation:

Open-ended

- Story Generation
- Text Continuation
- Poem Generation
- Lyrics Generation

Non open-ended

- Machine Translation
- Text Summarization
- Text Paraphrasing
- Data-to-text generation

There is no neat separation between these kinds of problems.


Page 7: Natural Language Generation - SAILab

Decoding: Likelihood maximization

Once these models are trained, how do we exploit them at inference time to generate new tokens?

Straightforward approach: pick the sequence with maximum probability.

y = \operatorname*{argmax}_{y_1, \dots, y_m} \prod_{i=1}^{m} P(y_i \mid y_{<i}, x_1, \dots, x_n, \theta)

Finding the optimal y is not tractable, since the number of candidate sequences grows exponentially with their length.

Two popular approximate methods are greedy search and beam search, both successful in non open-ended domains.
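A minimal sketch of greedy decoding follows; next_token_logits is a hypothetical stand-in for the trained model's next-token scores, and VOCAB_SIZE and EOS are illustrative assumptions:

```python
import numpy as np

VOCAB_SIZE, EOS = 50, 0

def next_token_logits(prefix, x):
    """Hypothetical stand-in for the trained model's scores over the vocabulary."""
    rng = np.random.default_rng(len(prefix))
    return rng.normal(size=VOCAB_SIZE)

def greedy_decode(x, max_len=20):
    prefix = []
    for _ in range(max_len):
        token = int(np.argmax(next_token_logits(prefix, x)))  # locally most probable token
        prefix.append(token)
        if token == EOS:
            break
    return prefix

print(greedy_decode(x="some source sentence"))
```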



Page 9: Natural Language Generation - SAILab

Decoding: Likelihood maximization

Beam search is a search algorithm that explores k^2 candidate nodes at each time step (the top-k continuations of each of the k current hypotheses) and keeps the best k paths.

Greedy search is a special case of beam search, where the beam width k is set to 1.
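A minimal beam-search sketch under the same assumptions, reusing the hypothetical next_token_logits stand-in from the greedy sketch above:

```python
import numpy as np

def beam_search(x, k=3, max_len=20):
    """Expand each of the k best prefixes with its top-k continuations
    (k*k candidates per step, as in the slide), then prune back to k."""
    beams = [([], 0.0)]                                   # (prefix, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            logits = next_token_logits(prefix, x)         # stand-in from the greedy sketch
            z = logits - logits.max()
            log_probs = z - np.log(np.exp(z).sum())       # log-softmax over the vocabulary
            for token in np.argsort(log_probs)[-k:]:
                candidates.append((prefix + [int(token)], score + log_probs[token]))
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:k]
    return beams[0][0]                                    # best-scoring sequence found
```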


Page 10: Natural Language Generation - SAILab

Decoding: Likelihood maximization issues

Unfortunately, likelihood maximization is only effective in non open-ended problems, where there is a strong correlation between input x and output y.

In open-ended domains, instead, it ends up producing repetitive, meaningless generations.

To overcome this issue, sampling approaches better explore the entire learnt distribution P.


Page 11: Natural Language Generation - SAILab

Decoding: Sampling strategies

The most common sampling strategy is multinomial sampling.

At each step i, a token y_i is sampled from P:

y_i \sim P(y_i \mid y_{<i}, x_1, \dots, x_n)

The higher P(y_i | y_{<i}, x_1, ..., x_n) is, the more likely y_i is to be sampled.
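A minimal sketch of multinomial sampling, again relying on the hypothetical next_token_logits stand-in and the EOS symbol assumed in the decoding sketches above:

```python
import numpy as np

def sample_next(prefix, x):
    """Draw y_i ~ P(y_i | y_<i, x): higher-probability tokens are sampled more often."""
    logits = next_token_logits(prefix, x)     # stand-in model scores, as in the sketches above
    z = logits - logits.max()
    probs = np.exp(z) / np.exp(z).sum()       # softmax over the vocabulary
    return int(np.random.default_rng().choice(len(probs), p=probs))

def sample_sequence(x, max_len=20):
    prefix = []
    for _ in range(max_len):
        token = sample_next(prefix, x)
        prefix.append(token)
        if token == EOS:                      # EOS as defined in the greedy sketch
            break
    return prefix
```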


Page 12: Natural Language Generation - SAILab

Poem Generation

Project reference: sailab.diism.unisi.it/poem-gen/


Page 13: Natural Language Generation - SAILab

Poem Generation

Poem Generation is an instance of Natural Language Generation (NLG).

Goal: Design an end-to-end poet-based poem generator.

Issue: a poet's production is rarely enough to train a neural model.

We will describe a general model to learn poet-based poem generators.

We experimented with it in the case of Italian poetry.


Page 14: Natural Language Generation - SAILab

Poem Generation

The text sequence is processed by a recurrent neural network (LSTM) that has to predict the next word at each time step.

Note: <EOV> and <EOT> are special tokens indicating the end of a verse or of a tercet, respectively.
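A minimal sketch of how a tercet could be linearized with the <EOV>/<EOT> special tokens and fed to a word-level LSTM that predicts the next word at every position; the toy vocabulary, dimensions and training details are illustrative assumptions, not the setup used in the project:

```python
import torch
import torch.nn as nn

# Linearize a tercet: each verse ends with <EOV>, the tercet ends with <EOT>.
tercet = [["nel", "mezzo", "del", "cammin", "di", "nostra", "vita", "<EOV>"],
          ["mi", "ritrovai", "per", "una", "selva", "oscura", "<EOV>"],
          ["che", "la", "diritta", "via", "era", "smarrita", "<EOT>"]]
tokens = [w for verse in tercet for w in verse]
vocab = {w: i for i, w in enumerate(sorted(set(tokens)))}
ids = torch.tensor([[vocab[w] for w in tokens]])

class WordLSTM(nn.Module):
    """Word-level LSTM that predicts the next token at every position."""
    def __init__(self, vocab_size, emb=32, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, x):
        h, _ = self.lstm(self.embed(x))
        return self.out(h)

model = WordLSTM(len(vocab))
logits = model(ids[:, :-1])                   # predict token i+1 from tokens up to i
loss = nn.functional.cross_entropy(logits.reshape(-1, len(vocab)), ids[:, 1:].reshape(-1))
print(loss.item())
```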


Page 15: Natural Language Generation - SAILab

Corpora

We considered poetry from Dante and Petrarca.

Divine Comedy

- 4811 tercets
- 108k words
- ABA rhyme scheme (enforced through rule-based post-processing)

Canzoniere

- 7780 tercets
- 63k words

Note: 100k words is 4 orders of magnitude less data than traditional corpora!


Page 16: Natural Language Generation - SAILab

Let’s look at the Demo:

www.dantepetrarca.it
