74.406 Natural Language Processing

Christel Kemke

Department of Computer Science

University of Manitoba

74.406 Natural Language Processing, 1st term 2004/5


Evolution of Human Language

• communication for "work"

• social interaction

• basis of cognition and thinking

(Whorf & Sapir)

Communication

"Communication is the intentional exchange of information brought about by the production and perception of signs drawn from a shared system of conventional signs."

[Russell & Norvig, p.651]

Natural Language - General

Natural language is characterized by

• a common or shared set of signs: alphabet; lexicon

• a systematic procedure to produce combinations of signs: syntax

• a shared meaning of signs and combinations of signs (constructive): semantics

Natural Language Processing Overview

• Speech Recognition

• Natural Language Processing

• Syntax

• Semantics

• Pragmatics

• Spoken Language

Natural Language and Speech

Speech Recognition

• acoustic signal as input

• conversion into phonemes and written words

Natural Language Processing

• written text as input; sentences (or 'utterances')

• syntactic analysis: parsing; grammar

• semantic analysis: "meaning", semantic representation

• pragmatics: dialogue; discourse; metaphors

Spoken Language Processing

• transcribed utterances

• phenomena of spontaneous speech

Words

Morphology

A morphological analyzer determines (at least) the stem and ending of a word, and usually delivers related information, such as the word class, the number, and the person of the word. Morphology can be part of the lexicon or implemented as a separate component, for example as a rule-based system.

eats → eat + s: verb, singular, 3rd person

dog → dog: noun, singular
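The analysis above can be sketched as a tiny rule-based suffix stripper. This is a minimal illustration, assuming a hypothetical four-word stem list (`STEMS`) and only the endings "-s" and "-es"; a realistic analyzer needs many more rules and exceptions (e.g. for spelling changes as in "flies").

```python
# Hypothetical mini stem list: stem -> word class (illustration only)
STEMS = {"eat": "verb", "go": "verb", "dog": "noun", "bone": "noun"}

def analyze(word):
    """Return (stem, ending, features) for word, or None if no rule applies."""
    if word in STEMS:
        feats = {"class": STEMS[word]}
        if STEMS[word] == "noun":
            feats["number"] = "singular"   # a bare noun stem is singular
        return (word, "", feats)
    # try the longer ending first, so that "goes" -> go + es
    for ending in ("es", "s"):
        stem = word[: -len(ending)]
        if word.endswith(ending) and stem in STEMS:
            if STEMS[stem] == "verb":
                feats = {"class": "verb", "number": "singular", "person": 3}
            else:
                feats = {"class": "noun", "number": "plural"}
            return (stem, ending, feats)
    return None                            # unknown word

print(analyze("eats"))  # ('eat', 's', {'class': 'verb', 'number': 'singular', 'person': 3})
print(analyze("dog"))   # ('dog', '', {'class': 'noun', 'number': 'singular'})
```

The word class comes from the stem's entry, which is why morphology and lexicon are so closely linked.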

Lexicon

The lexicon contains information on words, either as inflected forms (e.g. goes, eats) or as word stems (e.g. go, eat).

The lexicon usually assigns a syntactic category: the word class or part-of-speech (POS) category.

Sometimes it also contains further syntactic information (see Morphology), semantic information (e.g. semantic classifications like 'agent'), and syntactic-semantic information, e.g. on verb complements, such as that 'give' requires a direct object.

Lexicon

Example contents:

eats: verb; singular, 3rd person; can have a direct object

dog: noun, singular; animal (semantic annotation)
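The example entries can be held in a simple data structure. The `LexEntry` fields below are an assumed layout (one field per kind of information listed on the Lexicon slides), not a standard format.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class LexEntry:
    pos: str                                       # word class / POS category
    features: dict = field(default_factory=dict)   # morpho-syntactic features
    semantic: Optional[str] = None                 # semantic annotation, e.g. "animal"
    complements: tuple = ()                        # complements the word can take

# Hypothetical two-entry lexicon mirroring the example contents above
LEXICON = {
    "eats": LexEntry("verb", {"number": "singular", "person": 3},
                     complements=("direct object",)),
    "dog": LexEntry("noun", {"number": "singular"}, semantic="animal"),
}

def lookup(word):
    """Return the lexicon entry for word, or None if it is not listed."""
    return LEXICON.get(word)

print(lookup("dog").semantic)  # animal
```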

POS (Part-of-Speech) Tagging

POS Tagging determines word class or ‘part-of-speech’ category (basic syntactic categories) of single words or word-stems.

the: det (determiner)

dog: noun

eats (stem eat): verb, 3rd person singular

the: det

bone: noun
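When each word has only one category, POS tagging reduces to dictionary lookup. The sketch below assumes a hypothetical tag dictionary (`TAG_DICT`) covering just this example; ambiguous words such as "chase" would need context, which this lookup ignores.

```python
# Hypothetical tag dictionary for the example sentence only
TAG_DICT = {"the": "det", "dog": "noun", "eat": "verb", "eats": "verb", "bone": "noun"}

def pos_tag(sentence):
    """Tag each whitespace-separated token by lookup; unknown words get 'UNK'."""
    return [(tok, TAG_DICT.get(tok.lower(), "UNK")) for tok in sentence.split()]

print(pos_tag("The dog eats the bone"))
# [('The', 'det'), ('dog', 'noun'), ('eats', 'verb'), ('the', 'det'), ('bone', 'noun')]
```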

[Figure: components Morphological Analyzer, Lexicon, Part-of-Speech (POS) Tagging, Grammar Rules, Parser]

NLP - Syntactic Analysis

[Figure: "eats" is analyzed as eat + s (3rd sing); eat is a verb; the rule VP → Verb Noun is applied and a VP is recognized; shown as a parse tree with VP over Verb and Noun]

Syntax

Language and Grammar

Natural Language described as Formal Language L using a Formal Grammar G:

• start symbol S ≡ sentence

• non-terminals NT ≡ syntactic constituents

• terminals T ≡ lexical entries / words

• production rules P ≡ grammar rules

Generate sentences or recognize sentences (Parsing) of the language L through the application of grammar rules from G.

Overgeneration / undergeneration: the grammar accepts or generates sentences that are not in L / does not accept or generate all sentences of L.

Grammar

• Terminals can be words, part-of-speech categories, or more complex lexical items (including additional syntactic/semantic information related to the word):

– dog

– noun

– dog: noun, singular; animal

• Non-terminals represent (higher-level) 'syntactic categories':

– noun

– NP (noun phrase)

– S (sentence)

Grammar

Most often we deal with context-free grammars, with a distinguished start symbol S (sentence).

det → the

noun → dog | bone

verb → eat | eats

NP → det noun (NP: noun phrase)

VP → verb (VP: verb phrase)

VP → verb NP

S → NP VP (S: sentence)

Here, POS Tagging is included in the grammar.
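Because this grammar has no recursive rules, its language L is finite and all of its sentences can be enumerated. The sketch below (`GRAMMAR` and `generate` are illustrative names) encodes the rules as a Python dictionary; the output also shows overgeneration, since nothing in these rules enforces subject-verb agreement.

```python
from itertools import product

# The toy grammar above; the lexical rules double as POS tagging
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["det", "noun"]],
    "VP": [["verb"], ["verb", "NP"]],
    "det": [["the"]],
    "noun": [["dog"], ["bone"]],
    "verb": [["eat"], ["eats"]],
}

def generate(symbol):
    """All word sequences derivable from symbol (assumes a non-recursive grammar)."""
    if symbol not in GRAMMAR:              # terminal: a word
        return [[symbol]]
    sequences = []
    for rhs in GRAMMAR[symbol]:
        # combine every expansion of each right-hand-side symbol
        for parts in product(*(generate(s) for s in rhs)):
            sequences.append([w for part in parts for w in part])
    return sequences

sentences = [" ".join(seq) for seq in generate("S")]
print(len(sentences))                # 12
print("the dog eat" in sentences)    # True: overgeneration, no agreement check
```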

Parsing (here: LR, bottom-up)

Determine the syntactic structure of the sentence:

“the dog eats the bone”

the → det (POS tagging)

dog → noun

det noun → NP (rule application)

eats → verb

the → det

bone → noun

det noun → NP

verb NP → VP

NP VP → S
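The trace above can be reproduced by a bottom-up shift-reduce parser. Note that this sketch swaps in a naive depth-first search with backtracking instead of a table-driven LR parser: a purely greedy parser would reduce "eats" to VP via VP → verb too early and then get stuck. The rule list and function names are illustrative assumptions.

```python
# The toy grammar as (lhs, rhs) pairs; lexical rules act as POS tagging
RULES = [
    ("det", ("the",)),
    ("noun", ("dog",)), ("noun", ("bone",)),
    ("verb", ("eat",)), ("verb", ("eats",)),
    ("NP", ("det", "noun")),
    ("VP", ("verb", "NP")),
    ("VP", ("verb",)),
    ("S", ("NP", "VP")),
]

def shift_reduce(stack, words, trace):
    """Depth-first shift-reduce search; returns the reductions used, or None."""
    if not words and stack == ["S"]:
        return trace
    # try each reduction whose right-hand side matches the top of the stack
    for lhs, rhs in RULES:
        n = len(rhs)
        if len(stack) >= n and tuple(stack[-n:]) == rhs:
            step = f"{' '.join(rhs)} -> {lhs}"
            result = shift_reduce(stack[:-n] + [lhs], words, trace + [step])
            if result is not None:
                return result
    # no reduction led to a parse: shift the next word, if any
    if words:
        return shift_reduce(stack + [words[0]], words[1:], trace)
    return None

def parse(sentence):
    return shift_reduce([], sentence.split(), [])

for step in parse("the dog eats the bone"):
    print(step)
```

Reductions are tried before shifts, so the first successful path found applies the rules in the same order as the listing above.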

Syntax Analysis / Parsing

Syntactic Structure often represented as Parse Tree.

Connect symbols according to applied grammar rules (like Rewrite Systems).

Parse Tree

[Figure: parse tree with root S over NP VP; the first NP over det noun; VP over verb NP; the second NP over det noun]
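A parse tree can be represented as nested tuples `(label, child, ...)`, with a word as the only child of a POS node. The helpers below (hypothetical names) read the sentence off the leaves, print the tree in bracketed notation, and list the grammar rules whose application connects the symbols; the tree is the one for "the dog eats the bone".

```python
TREE = ("S",
        ("NP", ("det", "the"), ("noun", "dog")),
        ("VP", ("verb", "eats"),
               ("NP", ("det", "the"), ("noun", "bone"))))

def is_pos_node(tree):
    """A POS node has exactly one child, and that child is a word."""
    return len(tree) == 2 and isinstance(tree[1], str)

def leaves(tree):
    if is_pos_node(tree):
        return [tree[1]]
    return [w for child in tree[1:] for w in leaves(child)]

def bracket(tree):
    if is_pos_node(tree):
        return f"[{tree[0]} {tree[1]}]"
    return f"[{tree[0]} " + " ".join(bracket(c) for c in tree[1:]) + "]"

def rules(tree):
    """Grammar rules (lhs -> rhs) used in the tree, listed top-down."""
    if is_pos_node(tree):
        return [f"{tree[0]} -> {tree[1]}"]
    out = [f"{tree[0]} -> " + " ".join(c[0] for c in tree[1:])]
    for child in tree[1:]:
        out.extend(rules(child))
    return out

print(bracket(TREE))
# [S [NP [det the] [noun dog]] [VP [verb eats] [NP [det the] [noun bone]]]]
```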

Lexical Ambiguity

Several word senses or word categories

e.g. chase – noun or verb

e.g. plant - ????

Syntactic Ambiguity

Several parse trees:

1) “The dog eats the bone in the park.”

2) “The dog eats the bone in the package.”

Who/what is in the park and who/what is in the package?

Syntactically speaking:

How do we attach the prepositional phrase "in the ..."?
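One way to confirm that both sentences have several parse trees is to count them with a chart parser. The sketch below uses the CYK algorithm on a small hypothetical variant of the grammar above, in Chomsky normal form, that adds the PP rules NP → NP PP, VP → VP PP and PP → prep NP; each sentence receives two parses, one per attachment of the PP.

```python
from collections import defaultdict

# Hypothetical CNF grammar: lexical categories and binary rules (lhs, left, right)
LEXICAL = {"the": {"det"}, "dog": {"noun"}, "bone": {"noun"}, "park": {"noun"},
           "package": {"noun"}, "eats": {"verb"}, "in": {"prep"}}
BINARY = [("S", "NP", "VP"), ("NP", "det", "noun"), ("NP", "NP", "PP"),
          ("VP", "verb", "NP"), ("VP", "VP", "PP"), ("PP", "prep", "NP")]

def count_parses(words, start="S"):
    """CYK chart: chart[i][j][A] = number of ways A derives words[i:j]."""
    n = len(words)
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for cat in LEXICAL.get(w, ()):
            chart[i][i + 1][cat] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # split point
                for lhs, left, right in BINARY:
                    chart[i][j][lhs] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n][start]

print(count_parses("the dog eats the bone in the park".split()))  # 2
print(count_parses("the dog eats the bone".split()))              # 1
```

The parser itself cannot choose between the two trees; preferring "in the park" as the location of eating and "in the package" as part of the object needs semantic information about parks and packages.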

Semantics

Semantic Representation

Represent the meaning of a sentence. Generate, e.g.

• a logic-based representation or

• a frame-based representation

Fillmore’s case frames

based on the syntactic structure, lexical entries, and particularly the head-verb, which determines how to arrange parts of the sentence and relate them to each other in the semantic representation.

Semantic Representation

Verb-centered representation:

The verb (action, head) is regarded as the center of the verbal expression and determines the case frame with its possible case roles; the other parts of the sentence are described in relation to the action, as fillers of case slots. (cf. also Schank’s CD Theory)

Case roles can be typed (e.g. 'agent' refers to a specific sort or concept, like “humans”).

General Frame for eat

Agent: animate

Action: eat

Patiens: food

Manner: {e.g. fast}

Location: {e.g. in the yard}

Time: {e.g. at noon}

Frame with fillers for sample sentence

Agent: the dog

Action: eat

Patiens: the bone / the bone in the package

Location: in the park
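Filling the frame and checking the typed case roles can be sketched as follows. The sort table (`SORTS`) and the representation of fillers as (phrase, head word) pairs are assumptions for illustration; in a full system the head word and the slot would come from the syntactic analysis.

```python
# Hypothetical sort assignments for head words (illustration only)
SORTS = {"dog": {"animate"}, "she": {"animate"}, "bone": {"food"},
         "convertible": {"vehicle"}}

# General frame for "eat": slot -> required sort (None = unrestricted)
EAT_FRAME = {"Agent": "animate", "Action": None, "Patiens": "food",
             "Manner": None, "Location": None, "Time": None}

def fill_frame(frame, fillers):
    """fillers maps slot -> (phrase, head word); checks each slot's type restriction."""
    filled = {}
    for slot, (phrase, head) in fillers.items():
        if slot not in frame:
            raise ValueError(f"{slot} is not a slot of this frame")
        required = frame[slot]
        if required is not None and required not in SORTS.get(head, set()):
            raise ValueError(f"{head!r} is not of sort {required!r} ({slot})")
        filled[slot] = phrase
    return filled

print(fill_frame(EAT_FRAME, {"Agent": ("the dog", "dog"), "Action": ("eat", "eat"),
                             "Patiens": ("the bone", "bone"),
                             "Location": ("in the park", "park")}))
```

A sentence such as "the bone eats the dog" would be rejected, because "bone" is not of sort animate.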

Slot | General frame for drive | Frame with fillers
Agent | animate | she
Action | drive | drives
Patiens | vehicle | the convertible
Manner | {the way it is done} | fast
Location | Location-spec | [in the] Rocky Mountains
Source | Location-spec | [from] home
Destination | Location-spec | [to the] ASIC conference
Time | Time-spec | [in the] summer holiday

Pragmatics

Pragmatics

Pragmatics includes context-related aspects of NL expressions (utterances).

These are in particular anaphoric references, elliptic expressions, deictic expressions, …

anaphoric references – refer to items mentioned before

deictic expressions – simulate pointing gestures

elliptic expressions – incomplete expressions; relate to an item mentioned before


Pragmatics

“I put the box on the top shelf.”

“I know that. But I can’t find it there.”

“The candy-box?”

elliptic expression

deictic expression, anaphoric reference
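A deliberately naive resolution heuristic for "it" is sketched below, assuming each previously mentioned entity carries a sort annotation (`ENTITIES` is hypothetical, with "the top shelf" marked as a location because it is where the box was put). It picks the most recent entity of a compatible sort; pure recency alone would pick "the top shelf", which is wrong here.

```python
# Discourse entities in order of mention, with assumed sort annotations
ENTITIES = [("I", "person"), ("the box", "object"), ("the top shelf", "location")]

def resolve_it(entities):
    """Naive heuristic: 'it' refers to the most recent entity of sort 'object'."""
    for phrase, sort in reversed(entities):      # most recent first
        if sort == "object":
            return phrase
    return None

def most_recent(entities):
    """Pure recency, for comparison."""
    return entities[-1][0] if entities else None

print(resolve_it(ENTITIES))   # the box
print(most_recent(ENTITIES))  # the top shelf
```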


Intentions

One philosophical assumption is that natural language is used to achieve something:

“Do things with words.”

The meaning of an utterance is essentially determined by the intention of the speaker.

Intentionality - Examples

What was said | What was meant
“There is a terrible draft here.” | "Can you please close the window."
“How does it look here?” | "I am really mad; clean up your room."
"Will this ever end?" | "I would prefer to be with my friends than to sit in class now."

Metaphors

With metaphors, the meaning of a sentence or expression is not directly inferable from the sentence structure and the word meanings. Metaphors transfer concepts and relations from one area of discourse into another, for example seeing time as a line (in space), or seeing friendship or life as a journey.

Metaphors - Examples

“This car eats a lot of gas.”

“She devoured the book.”

“He was tied up with his clients.”

“Marriage is like a journey.”

“Their marriage was a one-way road into hell.”

(see also George Lakoff, e.g. Women, Fire and Dangerous Things)

Dialogue and Discourse

Discourse / Dialogue Structure

Grammar for various sentence types (speech acts): dialogue, discourse, story grammar

Distinguish questions, commands, and statements:

• Where is the remote-control? (question)

• Bring the remote-control! (command)

• The remote-control is on the brown table. (statement)

Dialogue Grammars describe possible sequences of Speech Acts in communication, e.g. that a question is followed by an answer/statement.

The same holds for discourse (e.g. continuous texts).
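A dialogue grammar of this kind can be sketched as a table of allowed transitions between speech acts. Both the punctuation-based classifier and the `FOLLOWS` table below are simplifying assumptions for illustration; only the rule that a question is followed by an answer/statement comes from the slide.

```python
def speech_act(utterance):
    """Crude classification by final punctuation (an assumption, not a real classifier)."""
    u = utterance.strip()
    if u.endswith("?"):
        return "question"
    if u.endswith("!"):
        return "command"
    return "statement"

# Hypothetical dialogue grammar: speech acts allowed after each act (None = start)
FOLLOWS = {
    None: {"question", "command", "statement"},
    "question": {"statement"},                     # a question is followed by an answer
    "command": {"statement", "question", "command"},
    "statement": {"statement", "question", "command"},
}

def well_formed(dialogue):
    """Check that each utterance's speech act may follow the previous one."""
    prev = None
    for utterance in dialogue:
        act = speech_act(utterance)
        if act not in FOLLOWS[prev]:
            return False
        prev = act
    return True

print(well_formed(["Where is the remote-control?",
                   "The remote-control is on the brown table."]))  # True
```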

Speech

Speech Processing Systems - Types and Characteristics

Speech Recognition vs. Speaker Recognition (Voice Recognition; Speaker Identification )

• speaker-dependent vs. speaker-independent (training?)

• unlimited vs. large vs. small vocabulary

• single word vs. continuous speech

Speech Recognition Phases

• acoustic signal as input

• signal analysis - spectrogram

• feature extraction

• phoneme recognition

• word recognition

• conversion into written words
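The signal analysis step can be illustrated by computing a magnitude spectrogram: the signal is cut into overlapping frames, each frame is multiplied by a window, and a discrete Fourier transform gives the magnitude per frequency bin. This sketch uses a naive DFT to stay self-contained (production code would use an FFT), and the frame length, hop size, and test tone are arbitrary assumptions.

```python
import cmath
import math

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram: Hann-windowed frames, naive O(N^2) DFT per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        x = [signal[start + n] * window[n] for n in range(frame_len)]
        # keep bins 0 .. frame_len/2; for a real signal the rest mirror them
        mags = [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len)))
                for k in range(frame_len // 2 + 1)]
        frames.append(mags)
    return frames

sr = 8000                                           # assumed sampling rate in Hz
tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(1024)]
spec = spectrogram(tone)
print([max(range(len(f)), key=f.__getitem__) for f in spec])  # peak bin per frame
```

With an 8000 Hz sampling rate and 256-sample frames, each bin spans 31.25 Hz, so the 1000 Hz tone peaks in bin 32 of every frame.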

Speech Recognizer Architecture

Video of glottis and speech signal in lingWAVES (from http://www.lingcom.de)

Spoken Language

Spoken Language

The output of a speech recognition system serves as input "text".

It can be associated with probabilities for different word sequences.

It contains ungrammatical structures, so-called "disfluencies", e.g. repetitions and corrections.

Spoken Language - Examples

1. no [s-] straight southwest

2. right to [my] my left

3. [that is] that is correct

Robin J. Lickley. HCRC Disfluency Coding Manual. http://www.ling.ed.ac.uk/~robin/maptask/HCRCdsm-01.html

Spoken Language - Disfluency

Reparandum and Repair

Reparandum: the part in brackets; Repair: the part that follows it after the interruption ("...").

[come to] ... walk right to [the] ... the right-hand side of the page
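When, as in these examples, reparanda are marked with brackets and the interruption point with "...", the fluent version of an utterance can be recovered by deleting the marked parts. The sketch below assumes exactly that annotation; detecting reparanda in raw recognizer output is a much harder problem that it does not address.

```python
import re

def remove_reparanda(utterance):
    """Delete [bracketed] reparanda and '...' interruption markers, then tidy spaces."""
    text = re.sub(r"\[[^\]]*\]", " ", utterance)        # drop each [reparandum]
    text = text.replace("...", " ").replace("…", " ")   # drop interruption points
    return " ".join(text.split())

print(remove_reparanda("[come to] ... walk right to [the] ... the right-hand side of the page"))
# walk right to the right-hand side of the page
```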

Spoken Language - Example

1. we're going to [g-- ]... turn straight back around for testing.

2. [come to] ... walk right to the ... right-hand side of the page.

3. right [up ... past] ... up on the left of the ... white mountain walk ... right up past.

4. [i'm still] ... i've still gone halfway back round the lake again.

Spoken Language - Example

1. [I’d] [d if] I need to go

2. [it’s basi--] see if you go over the old mill

3. [you are going] make a gradual slope … to your right

4. [I’ve got one] I don’t realize why it is there