
Page 1: Natural Language Generation - SAILab

Natural Language Generation

Andrea Zugarini

SAILab

December 5th, 2019


Page 2: Natural Language Generation - SAILab

Natural Language Generation

Natural Language Generation (NLG) is the problem of generating text automatically.

Machine Translation, Text Summarization and Paraphrasing are all instances of NLG.

Language generation is a very challenging problem: it requires not only text understanding, but also typically human skills, such as creativity.

Word representations and Recurrent Neural Networks (RNNs) are the basic tools for NLG models, usually called end-to-end since they learn directly from data.


Page 3: Natural Language Generation - SAILab

From Language Modelling to NLG

Recap: given a sequence of words y_1, ..., y_m, a language model is characterized by a probability distribution:

P(y_1, \dots, y_m) = P(y_m \mid y_1, \dots, y_{m-1}) \cdots P(y_2 \mid y_1)\, P(y_1)

that can be equivalently expressed as:

P(y_1, \dots, y_m) = \prod_{i=1}^{m} P(y_i \mid y_{<i})

Language Modelling is closely related to NLG.
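To make the chain-rule factorization concrete, here is a minimal sketch in plain Python that scores a sentence as a product of conditional probabilities. The bigram table is a made-up, hypothetical example, and, unlike the general formula above, a bigram model truncates the history y_{<i} to just the previous word:

```python
import math

# Hypothetical bigram table: P(next word | previous word).
# "<s>" marks the start of a sequence, so P(y_1) = P(y_1 | "<s>").
bigram_prob = {
    ("<s>", "the"): 0.4, ("the", "cat"): 0.2,
    ("cat", "sleeps"): 0.3, ("sleeps", "</s>"): 0.5,
}

def sequence_log_prob(words):
    """log P(y_1, ..., y_m) = sum_i log P(y_i | y_{i-1}) under the bigram model."""
    log_p, prev = 0.0, "<s>"
    for w in words:
        log_p += math.log(bigram_prob.get((prev, w), 1e-8))  # crude smoothing for unseen pairs
        prev = w
    return log_p

print(sequence_log_prob(["the", "cat", "sleeps", "</s>"]))
```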


Page 4: Natural Language Generation - SAILab

From Language Modelling to NLG

Many NLG problems are conditioned on some given context.

In Machine Translation, the generated text strictly depends on the input text to translate.

Hence, we can add to the equation another sequence x of size n to condition the probability distribution,

P(y_1, \dots, y_m) = \prod_{i=1}^{m} P(y_i \mid y_{<i}, x_1, \dots, x_n)

obtaining a general formulation for any Language Generation problem.


Page 5: Natural Language Generation - SAILab

Natural Language Generation

A Machine-Learning model can then be used to learn P (·).

P(y \mid x, \theta) = \prod_{i=1}^{m} P(y_i \mid y_{<i}, x_1, \dots, x_n, \theta)

\max_{\theta} P(y \mid x, \theta)

where P(·) is the model parametrized by θ that is trained to maximize the likelihood of y on a dataset of (x, y) sequence pairs.

Note: when x is empty, we fall back to Language Modelling.
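As a rough illustration of this maximum-likelihood setup, here is a minimal sketch of a conditional model trained with teacher forcing and cross-entropy (i.e., minimizing -log P(y|x, θ)). The encoder-decoder architecture, vocabulary size and dimensions are purely illustrative assumptions, not the models discussed in the talk:

```python
import torch
import torch.nn as nn

class TinyConditionalLM(nn.Module):
    """Illustrative encoder-decoder: encodes x, then predicts y_i from y_<i and x."""
    def __init__(self, vocab_size=1000, emb=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True)
        self.decoder = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, x, y_in):
        _, state = self.encoder(self.embed(x))        # summarize the condition x
        h, _ = self.decoder(self.embed(y_in), state)  # decode given y_<i (teacher forcing)
        return self.out(h)                            # unnormalized scores for P(y_i | y_<i, x)

model = TinyConditionalLM()
opt = torch.optim.Adam(model.parameters())
loss_fn = nn.CrossEntropyLoss()

# One training step on a toy random batch: x (condition), y_in = y shifted right, y_out = targets.
x = torch.randint(0, 1000, (8, 12))
y_in = torch.randint(0, 1000, (8, 10))
y_out = torch.randint(0, 1000, (8, 10))
logits = model(x, y_in)
loss = loss_fn(logits.reshape(-1, 1000), y_out.reshape(-1))  # -log P(y | x, theta)
loss.backward(); opt.step(); opt.zero_grad()
```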


Page 6: Natural Language Generation - SAILab

Natural Language Generation: Open-ended vs. non open-ended generation

Depending on how much x conditions P, we distinguish between two kinds of text generation:

Open-ended

- Story Generation
- Text Continuation
- Poem Generation
- Lyrics Generation

Non open-ended

- Machine Translation
- Text Summarization
- Text Paraphrasing
- Data-to-text generation

There is no neat separation between these kinds of problems.


Page 7: Natural Language Generation - SAILab

Decoding: Likelihood maximization

Once these models are trained, how do we exploit them at inference time to generate new tokens?

Straightforward approach: pick the sequence with maximum probability.

y = \operatorname*{argmax}_{y_1, \dots, y_m} \prod_{i=1}^{m} P(y_i \mid y_{<i}, x_1, \dots, x_n, \theta)

Finding the optimal y is not tractable, since the number of candidate sequences grows exponentially with their length.

Two popular approximate methods are greedy search and beam search, both successful in non open-ended domains.
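A minimal sketch of greedy decoding follows; next_token_logits is a hypothetical stand-in for the trained model's next-token scores, and VOCAB_SIZE and EOS are illustrative assumptions:

```python
import numpy as np

VOCAB_SIZE, EOS = 50, 0

def next_token_logits(prefix, x):
    """Hypothetical stand-in for the trained model's scores over the vocabulary."""
    rng = np.random.default_rng(len(prefix))
    return rng.normal(size=VOCAB_SIZE)

def greedy_decode(x, max_len=20):
    prefix = []
    for _ in range(max_len):
        token = int(np.argmax(next_token_logits(prefix, x)))  # locally most probable token
        prefix.append(token)
        if token == EOS:
            break
    return prefix

print(greedy_decode(x="some source sentence"))
```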



Page 9: Natural Language Generation - SAILab

Decoding: Likelihood maximization

Beam search is a search algorithm that explores k^2 candidate nodes at each time step (the top-k continuations of each of the k current hypotheses) and keeps the best k paths.

Greedy search is a special case of beam search, where the beam width k is set to 1.
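A minimal beam-search sketch under the same assumptions, reusing the hypothetical next_token_logits stand-in from the greedy sketch above:

```python
import numpy as np

def beam_search(x, k=3, max_len=20):
    """Expand each of the k best prefixes with its top-k continuations
    (k*k candidates per step, as in the slide), then prune back to k."""
    beams = [([], 0.0)]                                   # (prefix, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            logits = next_token_logits(prefix, x)         # stand-in from the greedy sketch
            z = logits - logits.max()
            log_probs = z - np.log(np.exp(z).sum())       # log-softmax over the vocabulary
            for token in np.argsort(log_probs)[-k:]:
                candidates.append((prefix + [int(token)], score + log_probs[token]))
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:k]
    return beams[0][0]                                    # best-scoring sequence found
```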


Page 10: Natural Language Generation - SAILab

Decoding: Likelihood maximization issues

Unfortunately, likelihood maximization is only effective in non open-ended problems, where there is a strong correlation between input x and output y.

In open-ended domains, instead, it ends up producing repetitive, meaningless generations.

To overcome this issue, sampling approaches better explore the entire learnt distribution P.


Page 11: Natural Language Generation - SAILab

Decoding: Sampling strategies

The most common sampling strategy is multinomial sampling.

At each step i, a token y_i is sampled from P:

y_i \sim P(y_i \mid y_{<i}, x_1, \dots, x_n)

The higher P(y_i | y_{<i}, x_1, ..., x_n) is, the more likely y_i is to be sampled.
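A minimal sketch of multinomial sampling, again relying on the hypothetical next_token_logits stand-in and the EOS symbol assumed in the decoding sketches above:

```python
import numpy as np

def sample_next(prefix, x):
    """Draw y_i ~ P(y_i | y_<i, x): higher-probability tokens are sampled more often."""
    logits = next_token_logits(prefix, x)     # stand-in model scores, as in the sketches above
    z = logits - logits.max()
    probs = np.exp(z) / np.exp(z).sum()       # softmax over the vocabulary
    return int(np.random.default_rng().choice(len(probs), p=probs))

def sample_sequence(x, max_len=20):
    prefix = []
    for _ in range(max_len):
        token = sample_next(prefix, x)
        prefix.append(token)
        if token == EOS:                      # EOS as defined in the greedy sketch
            break
    return prefix
```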


Page 12: Natural Language Generation - SAILab

Poem Generation

Project reference: sailab.diism.unisi.it/poem-gen/


Page 13: Natural Language Generation - SAILab

Poem Generation

Poem Generation is an instance of Natural Language Generation (NLG).

Goal: Design an end-to-end poet-based poem generator.

Issue: a poet's production is rarely enough to train a neural model.

We will describe a general model to learn poet-based poem generators.

We experimented with it in the case of Italian poetry.


Page 14: Natural Language Generation - SAILab

Poem Generation

The text sequence is processed by a recurrent neural network (LSTM) that has to predict the next word at each time step.

Note: <EOV> and <EOT> are special tokens indicating the end of a verse or of a tercet, respectively.
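A minimal sketch of how a tercet could be linearized with the <EOV>/<EOT> special tokens and fed to a word-level LSTM that predicts the next word at every position; the toy vocabulary, dimensions and training details are illustrative assumptions, not the setup used in the project:

```python
import torch
import torch.nn as nn

# Linearize a tercet: each verse ends with <EOV>, the tercet ends with <EOT>.
tercet = [["nel", "mezzo", "del", "cammin", "di", "nostra", "vita", "<EOV>"],
          ["mi", "ritrovai", "per", "una", "selva", "oscura", "<EOV>"],
          ["che", "la", "diritta", "via", "era", "smarrita", "<EOT>"]]
tokens = [w for verse in tercet for w in verse]
vocab = {w: i for i, w in enumerate(sorted(set(tokens)))}
ids = torch.tensor([[vocab[w] for w in tokens]])

class WordLSTM(nn.Module):
    """Word-level LSTM that predicts the next token at every position."""
    def __init__(self, vocab_size, emb=32, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, x):
        h, _ = self.lstm(self.embed(x))
        return self.out(h)

model = WordLSTM(len(vocab))
logits = model(ids[:, :-1])                   # predict token i+1 from tokens up to i
loss = nn.functional.cross_entropy(logits.reshape(-1, len(vocab)), ids[:, 1:].reshape(-1))
print(loss.item())
```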


Page 15: Natural Language Generation - SAILab

Corpora

We considered poetry from Dante and Petrarca.

Divine Comedy

- 4811 tercets
- 108k words
- ABA rhyme scheme (enforced through rule-based post-processing)

Canzoniere

- 7780 tercets
- 63k words

Note: 100k words is 4 orders of magnitude less data than traditional corpora!


Page 16: Natural Language Generation - SAILab

Let’s look at the Demo:

www.dantepetrarca.it
