Oct 2009 HLT 1 Human Language Technology Overview


Oct 2009 HLT 1

Human Language Technology

Overview

Oct 2009 HLT 2

Acknowledgement

Material for some of these slides is taken from J. Nivre (University of Gothenburg, Sweden) and from D. Jurafsky & J. Martin.

Oct 2009 HLT 3

Human Language Technology

HLT is sometimes referred to as:
Natural Language Processing: focus on linguistic processing
Computational Linguistics: focus on understanding language
Language Engineering: focus on practical tasks and results

Oct 2009 HLT 4

HLT – Engineering v. Science

Engineering: NLP is concerned with the design and implementation of effective NL input and output components for computational systems (Robert Dale 2000)

Science: The use of computers for linguistic research and applications

Oct 2009 HLT 5

HLT is Interdisciplinary

Linguistics: Theoretical, Applied
Computer Science: Algorithms, Compiling Techniques
Artificial Intelligence: Understanding, Reasoning, Intelligent Action

Oct 2009 HLT 6

HLT is Commercial

Lots of exciting stuff going on…

Powerset

Oct 2009 HLT 7

Google Translate

Oct 2009 HLT 8

Google Translate

Oct 2009 HLT 9

Web Q/A

Oct 2009 HLT 10

Web Analytics

Data-mining of social media: weblogs, discussion forums, message boards, user groups, and other forms of user generated media

Sentiment analysis, social network analysis
Product marketing information
Opinion tracking over space and time
Social network analysis
Buzz analysis (what’s hot, what topics are people talking about right now)

Oct 2009 HLT 11

HLT can help with

Understanding how language works by implementing complex theories directly

More Natural Communication
Development of multimodal M/M communication: language, speech, gesture
Development of multilingual applications

Knowledge Management
Language is the fabric of the web

Oct 2009 HLT 12

Language Enabled Applications

What makes an application a language processing application (as opposed to any other piece of software)?
An application that requires the use of knowledge about human languages.

Example: Is Unix wc (word count) an example of a language processing application?

Oct 2009 HLT 13

Language Enabled Applications

Word count?

When it counts words: Yes
To count words you need to know what a word is. That’s knowledge of language.

When it counts lines and bytes: No
Lines and bytes are computer artifacts, not linguistic entities.
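The distinction can be made concrete with a minimal sketch of a wc-like function. This is an illustration only, not the actual Unix implementation: the function name `wc` and the sample text are assumptions, and "word" is taken to mean a whitespace-delimited token, which is itself a (crude) piece of knowledge about language.

```python
def wc(text: str) -> dict:
    """Return line, word and byte counts, roughly in the spirit of Unix wc."""
    data = text.encode("utf-8")
    return {
        # Newline characters: an artifact of how files are stored.
        "lines": text.count("\n"),
        # Whitespace-delimited tokens: a simple assumption about what a word is.
        "words": len(text.split()),
        # Size of the UTF-8 encoding: another computer artifact.
        "bytes": len(data),
    }


sample = "Open the pod bay doors, Hal.\nI'm sorry Dave, I'm afraid I can't do that.\n"
counts = wc(sample)
print(counts)
```

Only the "words" entry depends on a decision about language: whether "can't" is one word or two, or whether "doors," includes its comma, is a linguistic question that the whitespace rule simply sidesteps.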

Oct 2009 HLT 14

Topics: Applications

Small: Spelling correction, Hyphenation

Medium: Word-sense disambiguation, Named entity recognition, Information retrieval

Big: Question answering, Conversational agents, Automatic Summarisation, Machine translation

Stand-alone

Enabling applications

Funding/Business plans

Oct 2009 HLT 15

Big Applications

These kinds of applications require a tremendous amount of knowledge of language.

Consider the following interaction with HAL the computer from 2001: A Space Odyssey

Oct 2009 HLT 16

HAL from 2001

Dave: Open the pod bay doors, Hal.
HAL: I’m sorry Dave, I’m afraid I can’t do that.

http://www.youtube.com/watch?v=kkyUMmNl4hk

Oct 2009 HLT 17

What’s needed?

Speech recognition and synthesis
Knowledge of the English words involved
What they mean
How words fit together into groups
What the groups mean
How the groups relate to each other

Oct 2009 HLT 18

What’s needed?

Dialog
It is polite to respond, even if you’re planning to kill someone.
It is polite to pretend to want to be cooperative (I’m afraid I can’t…)

Oct 2009 HLT 19

Summary of Application Areas

Document Processing: Classification, Summarisation, Information Extraction
Question Answering
Information Retrieval
Dialogue
Multilinguality: Machine Translation, Translation tools
Multimodality: speech, intonation, image

Oct 2009 HLT 20

Basic Problems

Analysis: Conversion of NL input to internal representations

Generation: Conversion of internal representations to NL output

Issues:
What kind of input/output/representations?
Role of learning: supervised v. unsupervised
What training data is available?
System Evaluation

Oct 2009 HLT 21

Levels of Linguistic Knowledge

Phonetics/Phonology: sound structure
Morphology: word structure
Syntax: sentence structure
Semantics: meanings
Pragmatics: use of language in context
Discourse: paragraphs, texts, dialogues

Oct 2009 HLT 22

Processing Pipelines

Each level of knowledge is associated with an encapsulated set of processes.

Interfaces are defined that allow the various levels to communicate.

This often leads to a pipeline architecture.
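A pipeline of this kind can be sketched as a chain of functions in which each stage consumes the output of the previous one. The sketch below is illustrative only: the stage names (`tokenize`, `tag`, `chunk`), the tiny lexicon, and the DET + N rule are all assumptions made for the example, not components of any real system.

```python
from functools import reduce
from typing import Any, Callable, List, Tuple


def tokenize(text: str) -> List[str]:
    """Word level: detach full stops, then split on whitespace."""
    return text.replace(".", " .").split()


def tag(tokens: List[str]) -> List[Tuple[str, str]]:
    """Toy lexical stage: look each token up in a hand-made lexicon."""
    lexicon = {"the": "DET", "duck": "N", "is": "V", "ready": "ADJ", ".": "PUNCT"}
    return [(tok, lexicon.get(tok.lower(), "UNK")) for tok in tokens]


def chunk(tagged: List[Tuple[str, str]]) -> list:
    """Toy syntactic stage: group each DET + N pair into a noun phrase."""
    out, i = [], 0
    while i < len(tagged):
        if i + 1 < len(tagged) and tagged[i][1] == "DET" and tagged[i + 1][1] == "N":
            out.append(("NP", [tagged[i][0], tagged[i + 1][0]]))
            i += 2
        else:
            out.append(tagged[i])
            i += 1
    return out


def run_pipeline(stages: List[Callable[[Any], Any]], data: Any) -> Any:
    """Feed the output of each stage into the next one."""
    return reduce(lambda acc, stage: stage(acc), stages, data)


result = run_pipeline([tokenize, tag, chunk], "The duck is ready.")
print(result)
```

The interface between levels here is simply the data type each function returns, so a stage can be replaced without touching the others as long as it keeps the same input and output shape.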

Oct 2009 HLT 23

Ambiguity

Computational linguists are obsessed with ambiguity

Ambiguity is a fundamental problem of computational linguistics

Resolving ambiguity is a crucial goal

Ambiguity arises at different levels of analysis

Oct 2009 HLT 24

Ambiguity – different flavours

Lexical: I made her duck

Syntactic: Young men and women

Referential: She did it

Pragmatic: Can you pass the salt?

Oct 2009 HLT 25

Ambiguity

Find at least 5 meanings of this sentence: I made her duck

I cooked waterfowl for her benefit (to eat)
I cooked waterfowl belonging to her
I created the (plaster?) duck she owns
I caused her to quickly lower her head or body
I waved my magic wand and turned her into undifferentiated waterfowl

Oct 2009 HLT 26

Ambiguity is Pervasive

I made her duck

I caused her to quickly lower her head or body
Lexical category: “duck” can be a N or V

I cooked waterfowl belonging to her.
Lexical category: “her” can be a possessive (“of her”) or dative (“for her”) pronoun

I made the (plaster) duck statue she owns
Lexical semantics: “make” can mean “create” or “cook”
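How quickly lexical category ambiguity multiplies can be shown with a small sketch that enumerates every combination of categories for the sentence. The mini-lexicon and the tag names (PRON, POSS, N, V) are assumptions chosen for illustration; a real lexicon would be far larger and use a standard tagset.

```python
from itertools import product

# Hypothetical mini-lexicon: each word maps to its possible categories.
LEXICON = {
    "I": ["PRON"],
    "made": ["V"],
    "her": ["POSS", "PRON"],  # possessive vs. dative/object pronoun
    "duck": ["N", "V"],       # noun vs. verb
}


def tag_sequences(words):
    """Return every way of assigning one category to each word."""
    options = [LEXICON[w] for w in words]
    return [list(zip(words, tags)) for tags in product(*options)]


readings = tag_sequences("I made her duck".split())
for reading in readings:
    print(reading)
```

Two ambiguous words with two categories each already give four candidate sequences. Not all of them are grammatical (a possessive followed by a verb, for instance), which is exactly the kind of filtering later processing levels are expected to do.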

Oct 2009 HLT 27

Ambiguity is Pervasive

Grammar: “make” can be:

Transitive (verb has a noun direct object):
I cooked [waterfowl belonging to her]

Ditransitive (verb has 2 noun objects):
I made [her] (into) [undifferentiated waterfowl]

Action-transitive (verb has a direct object and another verb):
I caused [her] [to move her body]

Oct 2009 HLT 28

Ambiguity is Pervasive

Phonetics!

I mate or duck
I’m eight or duck
Eye maid; her duck
Aye mate, her duck
I maid her duck
I’m aid her duck
I mate her duck
I’m ate her duck
I’m ate or duck
I mate or duck

Oct 2009 HLT 29

Dealing with Ambiguity

Four possible approaches:

1. Tightly coupled interaction among processing levels; knowledge from other levels can help decide among choices at ambiguous levels.

2. Pipeline processing that ignores ambiguity as it occurs and hopes that other levels can eliminate incorrect structures.

Oct 2009 HLT 30

Dealing with Ambiguity

3. Probabilistic approaches based on making the most likely choices

4. Don’t do anything, maybe it won’t matter
   1. We’ll leave when the duck is ready to eat.
   2. The duck is ready to eat now.
   Does the “duck” ambiguity matter with respect to whether we can leave?
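The probabilistic approach (3) can be sketched in its simplest form: pick the category a word has most often been seen with. The counts below are invented purely for illustration; in practice they would be estimated from a tagged corpus, and the function name `most_likely_tag` is an assumption of this sketch.

```python
# Invented counts for illustration only; a real system would estimate
# these from annotated training data.
TAG_COUNTS = {
    "duck": {"N": 9, "V": 3},
    "her": {"POSS": 6, "PRON": 4},
}


def most_likely_tag(word: str):
    """Return the most frequent tag for a word and its relative frequency."""
    counts = TAG_COUNTS[word]
    total = sum(counts.values())
    best = max(counts, key=counts.get)
    return best, counts[best] / total


print(most_likely_tag("duck"))
print(most_likely_tag("her"))
```

This baseline ignores context entirely, so it will always read "duck" the same way; more realistic probabilistic models condition the choice on neighbouring words or tags.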

Oct 2009 HLT 31

Ways of Studying NLP

By Application: MT, IE, IR etc.

By Approach: rational vs. empirical

By Linguistic Level: morphology, syntax etc.

By Algorithm

Oct 2009 HLT 32

Algorithms

State Machines: automata and transducers

Rule Systems: regular and context-free grammars

Search: top-down/bottom-up parsing

Probabilistic algorithms
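As a flavour of the first family, here is a minimal sketch of a deterministic finite-state automaton. It recognises strings of the form "baa!", "baaa!", "baaaa!" and so on (the pattern baa+!); the state numbering and the function name `accepts` are assumptions made for this example.

```python
# Transition table: state -> {input symbol: next state}.
TRANSITIONS = {
    0: {"b": 1},
    1: {"a": 2},
    2: {"a": 3},
    3: {"a": 3, "!": 4},
}
START = 0
ACCEPTING = {4}


def accepts(string: str) -> bool:
    """Run the automaton over the string and report whether it ends in an accepting state."""
    state = START
    for symbol in string:
        state = TRANSITIONS.get(state, {}).get(symbol)
        if state is None:  # no transition defined: reject immediately
            return False
    return state in ACCEPTING


for s in ["baa!", "baaaa!", "ba!", "baa"]:
    print(s, accepts(s))
```

The whole "grammar" lives in the transition table, so changing the language recognised means editing data rather than code; transducers extend the same idea by attaching an output symbol to each transition.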

Oct 2009 HLT 33

Organisation of Course

Module 1: Words
Linguistics: Morphological Structure
Morphological Processing
LAB + Assignment I

Module 2: Sentences
Linguistics: Syntactic Structure
NL Parsing Algorithms
LAB + Assignment II

Module 3: Texts
Statistics
Text Classification
LAB + Assignment III

Oct 2009 HLT 34

Course Information

Course Website: http://staff.um.edu.mt/mros1/hlt

Reference Texts:
D. Jurafsky and J. Martin, Speech and Language Processing, 2nd Edition, Prentice-Hall
S. Bird, E. Klein and E. Loper, Natural Language Processing with Python, http://www.nltk.org

Thank you