Microplanning (Sentence planning)
Part 1
Kees van Deemter
Natural Language Generation
• Taking some computer-readable gibberish
• “Translating” it into proper English
• Applications include:
– dialogue/chat systems
– on-line help
– summarisation
– document authoring
NLG Tasks (as explained by Anja):
1. Content determination: decide what to say; construct set of messages
2. Discourse planning: ordering, structuring concepts; rhetorical relationships
3. Sentence aggregation: divide content into sentences; construct sentence plans
4. Lexicalisation: map concepts and relations to lexemes (= words)
5. Referring expression generation: decide how to refer to objects
6. Linguistic realisation: put it all together in acceptable words and sentences
Modular structure of NLG systems (in theory!):
TEXT PLANNER
– Content determination
– Discourse planning
SENTENCE PLANNER/MICROPLANNER
– Sentence aggregation
– Lexicalisation
– Referring expressions
REALISER
– Realisation
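As a very rough illustration of this modularity, the pipeline could be wired up as three functions (a toy sketch; every name and behaviour below is hypothetical, just to show how data flows from facts to text):

def text_planner(facts, goal):
    # Content determination + discourse planning: select and order messages
    return [f for f in facts if goal in f.get("topics", [])]

def microplanner(messages):
    # Aggregation, lexicalisation, referring expressions: build sentence plans
    return [{"template": "{entity} departs at {time}", "slots": m} for m in messages]

def realiser(sentence_plans):
    # Linguistic realisation: sentence plans -> surface text
    return " ".join(p["template"].format(**p["slots"]) for p in sentence_plans)

facts = [{"topics": ["departure"], "entity": "The Caledonian express", "time": "10:00"}]
print(realiser(microplanner(text_planner(facts, "departure"))))
# -> The Caledonian express departs at 10:00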
Last week: Input to realisation
message-id: msg02
relation: C_DEPARTURE
args:
    departing-entity: C_CALEDON-EXPRESS
    departure-location: C_ABERDEEN
    departure-time: C_1000
    departure-platform: C_7
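For concreteness, such a message could be written down in code as a nested structure (a sketch: the dict encoding is an assumption, the field names and values are the slide's):

msg02 = {
    "message-id": "msg02",
    "relation": "C_DEPARTURE",
    "args": {
        "departing-entity": "C_CALEDON-EXPRESS",
        "departure-location": "C_ABERDEEN",
        "departure-time": "C_1000",
        "departure-platform": "C_7",
    },
}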
Microplanning 1: Aggregation
• Distributing information over different sentences. Example:
a. The Caledonian express departs Aberdeen at 10:00, from platform 7
b. The Caledonian express departs Aberdeen at 10:00. The Caledonian express departs from platform 7
Microplanning 2: GRE
GRE = Generation of Referring Expressions
Explaining which objects you’re talking about
a. The Caledonian express departs Aberdeen at 10:00, from platform 7
b. The Caledonian express departs -- at 10:00. The train departs from this platform
Microplanning 3: lexical choice
Using different words for the same concept
a. The Caledonian express departs Aberdeen at ten o’clock, from platform 7
b. The Caledonian express departs Aberdeen at ten. The Caledonian express leaves from platform 7
In practice: the tasks can be performed in different orders
• Example: aggregation can be performed on messages (a code sketch follows at the end of this slide):
message-id: msg02
relation: C_DEPARTURE_1
args:
    departing-entity: C_CALEDON-EXPRESS
    departure-location: C_ABERDEEN
    departure-time: C_1000

message-id: msg03
relation: C_DEPARTURE_2
args:
    departure-entity: C_CALEDON-EXPRESS
    departure-platform: C_7
• Aggregation can also be performed later:
[The Caledonian express] departs Aberdeen [at 10:00] [from platform 7]
===> [The Caledonian express] departs Aberdeen [at 10:00]. [The Caledonian express] departs [from platform 7]
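A sketch of message-level aggregation, using the dict encoding assumed earlier (the merge rule, a simple union of the args, is a simplifying assumption):

msg02 = {"message-id": "msg02", "relation": "C_DEPARTURE_1",
         "args": {"departing-entity": "C_CALEDON-EXPRESS",
                  "departure-location": "C_ABERDEEN",
                  "departure-time": "C_1000"}}
msg03 = {"message-id": "msg03", "relation": "C_DEPARTURE_2",
         "args": {"departure-entity": "C_CALEDON-EXPRESS",
                  "departure-platform": "C_7"}}

def aggregate(m1, m2):
    # Merge two messages about the same entity into one departure message
    return {"message-id": m1["message-id"],
            "relation": "C_DEPARTURE",
            "args": {**m1["args"], **m2["args"]}}

print(aggregate(msg02, msg03))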
Let’s focus on GRE, but ...
• A little detour: NLG systems do not always work as you’ve been told
• Some practically deployed systems combine “canned text” with NLG
• One possibility: system has a library of language “templates”, with gaps that need to be filled. E.g.,
[TRAIN] departs [TOWN] at [TIME]
[TRAIN] departs [TOWN] from [PLATFORM]
We apologise for the fact that [TRAIN] is delayed by [AMOUNT]
Gap filling: using canned text or GRE.
Question: which of the other tasks are still relevant?
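As a rough sketch of how such a template library might work (assuming gaps are filled by plain string substitution; the template keys are hypothetical):

TEMPLATES = {
    "departure-time": "{TRAIN} departs {TOWN} at {TIME}",
    "departure-platform": "{TRAIN} departs {TOWN} from {PLATFORM}",
    "delay-apology": "We apologise for the fact that {TRAIN} is delayed by {AMOUNT}",
}

def fill(key, **gaps):
    # Gap filling: each gap gets canned text, or the output of GRE
    return TEMPLATES[key].format(**gaps)

print(fill("departure-time",
           TRAIN="The Caledonian express", TOWN="Aberdeen", TIME="10:00"))
# -> The Caledonian express departs Aberdeen at 10:00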
Let’s move on to GRE
• Why/when is GRE useful?
1. The referent has a familiar name, but it’s not unique, e.g., ‘John Smith’
2. The referent has no familiar name: trains, furniture, trees, atomic particles, …
(Databases use keys, e.g., ‘Smith$73527$’, ‘TRAIN-3821’)
3. Similar: sets of objects
4. NL is too economical to have names for everything
Last week: Input to realisation
message-id: msg02
relation: C_DEPARTURE
args:
    departing-entity: C_CALEDON-EXPRESS
    departure-location: C_ABERDEEN
    departure-time: C_1000
This week: more realistic input
message-id: msg02
relation: C_DEPARTURE
args:
    departing-entity: C_34435
    departure-location: .....
    departure-time: .....
Possible referring expressions for C_34435:
“the caledonian (express)”, “the Aberdeen-Glasgow express”, “the blue train on your left”, “the train”
• Communication is about telling the truth ...
• but that’s not all there is to it
• Paul Grice (around 1970): principles of rational, cooperative communication
• GRE is a good case study (R. Dale and E. Reiter, Cognitive Science, 1995)
Grice: maxims of conversation
• Quality: only say what you know to be true
• Quantity: give enough but not too much information
• Relevance: be relevant
• Manner: be clear and brief
(There is overlap between these four)
Maxims are a two-edged sword:
1. They say how one should normally speak/write. Example:
“Yes, there’s a gasoline station around the corner” (when it’s no longer operational)
quality: yes, it’s true
quantity: probably yes
relevance: no, not relevant to hearer’s intentions
manner: it’s brief, clear, etc.
Maxims are a two-edged sword:
2. They can also be exploited. Example:
Asked to write academic reference: “Kees always came to my lectures and he’s a nice guy”
quality: yes, it’s true (let’s assume)
quantity: No -- How about academic achievements?
relevance: yes
manner: yes
Application to GRE
Dale & Reiter: the best description of an object fulfils the Gricean maxims. E.g.,
• (Quality:) list properties truthfully
• (Quantity:) use properties that allow identification, without containing more info
• (Relevance:) use properties that are of interest in the situation
• (Manner:) be brief
D&R’s expectation:
• Violation of a maxim leads to implicatures. For example,
– [Quantity] ‘the pitbull’ (when there is only one dog).
– [Manner] ‘Get the cordless drill that’s in the toolbox’ (Appelt).
• There’s just one problem: …
people don’t always speak this way
For example,
– [Manner] ‘the red chair’ (when there is only one red object in the domain).
– [Manner/Quantity] ‘I broke my arm’ (when I have two).
General point: empirical work shows much redundancy.
Similar for other maxims, e.g.,
– [Quality] ‘the man with the martini’ (Donnellan)
Example Situation
[Figure: five objects: a (£100), b (£150), c (£100), d (£150), e (£?); some Swedish, some Italian]
Formalized in a KB
• Type: furniture (abcde), desk (ab), chair (cde)
• Origin: Sweden (ac), Italy (bde)
• Colours: dark (ade), light (bc), grey (a)
• Price: 100 (ac), 150 (bd), 250 ({})
• Contains: wood ({}), metal (abcde), cotton (d)
Assumption: all this is shared knowledge.
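For later reference, the same KB can be encoded in code as a mapping from (attribute, value) properties to their extensions, i.e. the sets of objects they are true of (the encoding is an assumption; the content is exactly the KB above):

DOMAIN = {"a", "b", "c", "d", "e"}
KB = {
    ("type", "furniture"):  {"a", "b", "c", "d", "e"},
    ("type", "desk"):       {"a", "b"},
    ("type", "chair"):      {"c", "d", "e"},
    ("origin", "sweden"):   {"a", "c"},
    ("origin", "italy"):    {"b", "d", "e"},
    ("colour", "dark"):     {"a", "d", "e"},
    ("colour", "light"):    {"b", "c"},
    ("colour", "grey"):     {"a"},
    ("price", "100"):       {"a", "c"},
    ("price", "150"):       {"b", "d"},
    ("price", "250"):       set(),
    ("contains", "wood"):   set(),
    ("contains", "metal"):  {"a", "b", "c", "d", "e"},
    ("contains", "cotton"): {"d"},
}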
Game
1. Describe object a.
2. Describe object e.
3. Describe object d.
Game
1. Describe object a: {desk, sweden} or {grey}
2. Describe object e: no solution
3. Describe object d: {cotton} (note that {Italy, 150} would not do: it also fits b)
Violations of …
• Manner:
* ‘The £100 grey Swedish desk which is made of metal’ (description of a)
• Relevance:
‘The cotton chair is a fire hazard.’
? ‘Then why not buy the Swedish chair?’
(descriptions of d and c respectively)
• In fact, there is a second problem with Quantity/Manner. Consider the following formalization:
Full Brevity: Never use more than the minimal number of properties required for identification (Dale 1989)
An algorithm (Dale 1989):
1. Check whether 1 property is enough
2. Check whether 2 properties are enough
…
Etc., until
success {minimal description is generated} or
failure {no description is possible}
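This search can be sketched directly in code, using the KB encoding assumed earlier (each property maps to its extension):

from itertools import combinations

def full_brevity(r, domain, kb):
    props = [p for p, ext in kb.items() if r in ext]   # properties true of r
    for k in range(1, len(props) + 1):                 # try 1 property, then 2, ...
        for combo in combinations(props, k):
            ext = set(domain)
            for p in combo:
                ext &= kb[p]
            if ext == {r}:
                return combo    # success: a minimal description
    return None                 # failure: no description is possible

On the example KB, full_brevity("a", DOMAIN, KB) finds the one-property description {grey}, as in the game above.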
Problem: exponential complexity
• Worst case, this algorithm would have to inspect all combinations of properties: with n properties, that is 2^n combinations.
• Recall: one grain of rice on square one; twice as many on each subsequent square.
• Some algorithms may be faster, but …
• Theoretical result: any such algorithm must be exponential in the number of properties.
• D&R conclude that Full Brevity cannot be achieved in practice.
• They designed an algorithm that only approximates Full Brevity:
the Incremental Algorithm.
Incremental Algorithm (informal):
• Properties are considered in a fixed order: P = <P1, P2, P3, …, Pn>
• A property is included if it is ‘useful’: true of target; false of some distractors
• Stop when done; so earlier properties have a greater chance of being included. (E.g., a perceptually salient property)
• Therefore P is called the preference order.
• r = individual to be described
• P = list of properties, in preference order
• P is a property
• L = properties in generated description
(Recall: we’re not worried about realization today)
L := ∅
C := Domain
For all P ∈ P do:
    If r ∈ [[P]] & C ⊄ [[P]] then do:
        L := L ∪ {P}
        C := C ∩ [[P]]
        If C = {r} then Return L
Return Failure
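The same algorithm in runnable form, under the earlier KB-encoding assumption ([[P]] is looked up in kb, and pref is the preference order, most preferred first):

def incremental(r, domain, pref, kb):
    L = []                                  # L := ∅
    C = set(domain)                         # C := Domain
    for P in pref:                          # for all P ∈ P do
        ext = kb[P]                         # [[P]]
        if r in ext and not C <= ext:       # true of r, and rules out a distractor
            L.append(P)                     # L := L ∪ {P}
            C &= ext                        # C := C ∩ [[P]]
            if C == {r}:
                return L                    # success
    return None                             # failure

With the example KB and a type-origin-colour-... preference order, incremental("a", DOMAIN, pref, KB) selects {desk, sweden}, even though {grey} alone would have sufficed.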
Back to the KB
• Type: furniture (abcde), desk (ab), chair (cde)
• Origin: Sweden (ac), Italy (bde)
• Colours: dark (ade), light (bc), grey (a)
• Price: 100 (ac), 150 (bd), 250 ({})
• Contains: wood ({}), metal (abcde), cotton (d)
Assumption: all this is shared knowledge.
Back to our game
1. Describe object a.
2. Describe object e.
3. Describe object d.
Can you see room for improvement?