Natural Language Processing

Introduction to Language Technology

Potsdam, 12 April 2012

Saeedeh Momtazi

Information Systems Group

Outline

1 Administrative Information

2 Introduction

3 NLP Applications

4 NLP Techniques

5 Linguistic Knowledge

6 Challenges

7 Course Materials

Saeedeh Momtazi | NLP | 12.04.2012

1 Administrative Information

NLP Course

Lecture
  Thursdays 9:15-10:45
  Location: HS 3
  Credit points: 3 LP

Assessment
  Regular attendance in class
  Completing the exercises
  Final exam

Contact
  Saeedeh Momtazi
  Email: saeedeh.momtazi@hpi.uni-potsdam.de
  Office: A-1.7

Course Home Page
  http://www.hpi.uni-potsdam.de/naumann/teaching/ss_12/natural_language_processing.html
  Administrative information
  Slides
  Exercises

Mailing List
  To be set up later

2 Introduction

Different Types of Languages

Natural languages
  English
  German
  French
  Spanish
  ...

Formal languages
  Java
  Python
  LaTeX
  ...

Descriptive languages
  Biology: DNA
  Chemistry: chemical formulas
  ...

Natural Language

A vocabulary is a set of words (w_i)

A text is a sequence of words from the vocabulary

A language is the set of all possible texts
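The three definitions above can be sketched in Python. This is only an illustration: the five-word vocabulary and the names `VOCABULARY`, `is_text` and `texts_of_length` are assumptions, and the enumeration is limited to a fixed length because the set of all possible texts is infinite.

```python
from itertools import product

# Toy vocabulary (assumption: five English words chosen for illustration).
VOCABULARY = {"the", "and", "eat", "you", "book"}

def is_text(words):
    """A text is a sequence of words that all come from the vocabulary."""
    return all(w in VOCABULARY for w in words)

def texts_of_length(n):
    """Enumerate every text of exactly n words (a finite slice of the language)."""
    return product(sorted(VOCABULARY), repeat=n)

print(is_text(["you", "eat", "the", "book"]))   # True
print(is_text(["you", "read", "the", "book"]))  # False: "read" is not in the vocabulary
print(sum(1 for _ in texts_of_length(2)))       # 25 = 5 * 5
```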

Examples of Vocabularies

English: the, and, eat, you, book, ...

German: das, und, essen, du, buch, ...

3 NLP Applications: Text Technologies

Spell and Grammar Checking

Checking the spelling and the grammar of a text, and suggesting correct alternatives for the errors
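One simple way to suggest alternatives for a misspelled word is to look for dictionary entries with a similar character sequence. The sketch below uses `difflib.get_close_matches` from the Python standard library; the six-word `DICTIONARY` and the function name `suggest` are assumptions for illustration, and real spell checkers rely on much larger word lists and on context.

```python
import difflib

# Tiny word list (assumption: stands in for a real dictionary).
DICTIONARY = ["grammar", "spelling", "checking", "language", "correct", "alternative"]

def suggest(word, max_suggestions=3):
    """Return the word itself if it is known, else similar dictionary words."""
    if word in DICTIONARY:
        return [word]
    return difflib.get_close_matches(word, DICTIONARY, n=max_suggestions, cutoff=0.6)

print(suggest("grammer"))
print(suggest("langauge"))
```

Grammar checking needs more than this, because a sentence can be wrong even when every word in it is spelled correctly.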

Text Categorization

Assigning each text to a category

Information Retrieval

Finding information relevant to the user’s query
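A minimal sketch of the idea, assuming relevance is approximated by counting how often the query terms occur in each document. The three documents and the names `DOCUMENTS`, `score` and `retrieve` are invented for illustration; practical systems use weighted scoring and an index instead of scanning every document.

```python
from collections import Counter

# Toy document collection (assumption: invented for illustration).
DOCUMENTS = {
    "d1": "machine translation translates text from one language to another",
    "d2": "speech recognition turns spoken language into text",
    "d3": "spam detection is a text categorization task",
}

def score(query, document):
    """Count how often the query terms occur in the document."""
    term_counts = Counter(document.lower().split())
    return sum(term_counts[term] for term in query.lower().split())

def retrieve(query):
    """Return the ids of documents with a non-zero score, best first."""
    scored = [(score(query, text), doc_id) for doc_id, text in DOCUMENTS.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for s, doc_id in ranked if s > 0]

print(retrieve("spoken language"))  # ['d2', 'd1']
```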

Summarization

Finding the most relevant part of a document based on the user’s information need

Information Extraction

Extracting the important items of a text and structuring them

Question Answering

Answering natural language questions asked by the user

Machine Translation

Translating a text from one language into another

Data Fusion

Combining extracted information from several text files into a database or an ontology

Sentiment Analysis

Identifying positive and negative opinions stated in a text
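As a rough illustration, opinions can be detected by counting words from a positive and a negative word list. The lexicons and the function name `polarity` below are assumptions, and the approach ignores negation, sarcasm and context.

```python
# Tiny opinion lexicons (assumption: hand-picked words for illustration).
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}

def polarity(text):
    """Label a text by comparing its counts of positive and negative words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity("I love this phone!"))                                 # positive
print(polarity("The camera is great, but the battery is terrible."))  # neutral
```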

Optical Character Recognition

Recognizing printed or handwritten texts and converting them to computer-readable texts

Word Prediction

Predicting the word that the user is most likely to type next
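A minimal sketch of word prediction, assuming the next word is chosen by counting which word most often followed the current one in some training text (a bigram count). The training text `CORPUS` and the function names are invented for illustration.

```python
from collections import Counter, defaultdict

# Toy training text (assumption: invented for illustration).
CORPUS = "i read a book . i read a paper . i wrote a book ."

def train_bigrams(text):
    """Count, for every word, which words follow it and how often."""
    followers = defaultdict(Counter)
    tokens = text.split()
    for current, following in zip(tokens, tokens[1:]):
        followers[current][following] += 1
    return followers

def predict_next(followers, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    if word not in followers:
        return None
    return followers[word].most_common(1)[0][0]

model = train_bigrams(CORPUS)
print(predict_next(model, "i"))  # read (follows "i" in 2 of 3 cases)
print(predict_next(model, "a"))  # book (follows "a" in 2 of 3 cases)
```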

3 NLP Applications: Speech Technologies

Speech Recognition

Recognizing spoken language and transforming it into text

Speech Synthesis

Producing spoken language from text

Spoken Dialog Systems

Running a dialog between the user and the system

Application Difficulty Levels

Easy (mostly solved)
  Spell and grammar checking
  Spam detection

Intermediate (good progress)
  Information retrieval
  Sentiment analysis
  Machine translation
  Information extraction

Difficult (still hard)
  Question answering
  Summarization
  Dialog systems

4 NLP Techniques

Part-of-Speech Tagging

“I saw the man on the roof.”

“I[PRON] saw[V] the[DET] man[N] on[PREP] the[DET] roof[N].”

[PRON] Pronoun
[PREP] Preposition
[DET] Determiner
[V] Verb
[N] Noun
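The tagged sentence above can be reproduced with a simple dictionary lookup. This is a sketch under strong assumptions: `LEXICON` covers only the words of the example, and `tag` is an illustrative name.

```python
# Word-to-tag lexicon covering only the example sentence (assumption:
# a real tagger chooses among several possible tags using the context).
LEXICON = {
    "i": "PRON", "saw": "V", "the": "DET",
    "man": "N", "on": "PREP", "roof": "N",
}

def tag(sentence):
    """Tag each word by dictionary lookup; unknown words get 'UNK'."""
    words = sentence.rstrip(".").split()
    return [(word, LEXICON.get(word.lower(), "UNK")) for word in words]

tagged = tag("I saw the man on the roof.")
print(" ".join(f"{word}[{pos}]" for word, pos in tagged))
```

Lookup alone is not enough in general: "saw", for example, can also be a noun, so the correct tag depends on the surrounding words.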

Parsing

“I saw the man on the roof.”

Named Entity Recognition

“Steven Paul Jobs, co-founder of Apple Inc, was born in California.”

“[Steven Paul Jobs]Person, co-founder of [Apple Inc]Organization, was born in [California]Location.”

Word Sense Disambiguation

“Jim flew his plane to Texas.”

“Alice destroys the item with a plane.”

Semantic Role Labeling

“John grills a fish on an open fire.”

Roles: Cook, Food, Heating-Instrument

“[John]Cook grills [a fish]Food on [an open fire]Heating-Instrument.”

5 Linguistic Knowledge

Phonetics and Phonology
Morphology
Syntax
Semantics
Pragmatics
Discourse

Phonetics and Phonology

The study of linguistic sounds and their relations to words

Morphology

The study of the internal structure of words and how they can be modified

Parsing complex words into their components

unbelievable: un + believ + able

  prefix: un
  root: believe
  suffix: able
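A minimal sketch of parsing a complex word into its components by stripping known affixes. The affix lists and the function name `segment` are assumptions for illustration; mapping the stem "believ" back to the root "believe" would need additional spelling rules that are not shown.

```python
# Small affix inventories (assumption: illustrative, far from complete).
PREFIXES = ["un", "re", "dis"]
SUFFIXES = ["able", "ness", "ize"]

def segment(word):
    """Split off at most one known prefix and one known suffix."""
    prefix = next((p for p in PREFIXES if word.startswith(p)), "")
    rest = word[len(prefix):]
    suffix = next(
        (s for s in SUFFIXES if rest.endswith(s) and len(rest) > len(s)), ""
    )
    stem = rest[:len(rest) - len(suffix)] if suffix else rest
    return prefix, stem, suffix

print(segment("unbelievable"))  # ('un', 'believ', 'able')
```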

Syntax

The study of the structural relationships between words in a sentence

I saw the man on the roof.

Semantics

The study of the meaning of words, and how these combine to form the meanings of sentences

Recognizing lexical relations among words

  Hyponymy (is a)      dog & animal
  Meronymy (part of)   arm & body
  Synonymy             fall & autumn
  Antonymy             tall & short

Pragmatics

The study of how language is used to accomplish goals, and the influence of context on meaning

Understanding the aspects of a language that depend on situation and world knowledge

Do you know what time it is?

Discourse

The study of linguistic units larger than a single utterance

John reads a book. He borrowed it from his friend.

6 Challenges

Paraphrasing: different words/sentences express the same meaning
  The third season of the year
    Fall
    Autumn
  Book delivery time
    When will my book arrive?
    When will I receive my book?

Ambiguity: one word/sentence can have different meanings
  Fall
    The third season of the year
    Moving down towards the ground or towards a lower position
  The door is open.
    Expressing a fact
    A request to close the door

Fall: moving down, or the third season of the year?

What about autumn?

Phonetics and Phonology

Computers can recognize speech

Computers can wreck a nice peach

Morphology

Unionize

Union + ize

Un + ion + ize
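The two analyses above fall out of any procedure that splits a word into known morphemes without further knowledge. The sketch below enumerates all such splits; the morpheme inventory `MORPHEMES` and the function name `segmentations` are assumptions for illustration.

```python
# Morpheme inventory (assumption: just enough entries for this example).
MORPHEMES = {"un", "ion", "union", "ize"}

def segmentations(word):
    """Return every way to split `word` into known morphemes."""
    if not word:
        return [[]]
    results = []
    for end in range(1, len(word) + 1):
        head = word[:end]
        if head in MORPHEMES:
            results.extend([head] + tail for tail in segmentations(word[end:]))
    return results

for parts in segmentations("unionize"):
    print(" + ".join(parts))
```

Both splits are built from valid pieces, so choosing between them requires knowledge beyond the morpheme list itself.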

Syntax

I saw the man with a telescope.

Semantics

The astronomer loves the star.

Movie Star (Celebrity)

Sky Star

Every man loves a woman.

∀x : (man(x) → ∃y : (woman(y) & love(x, y)))

∃y : (woman(y) & ∀x : (man(x) → love(x, y)))
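The two readings can be checked against a small model to see that they really differ. In this sketch the sets `MEN`, `WOMEN` and `LOVES` are invented so that each man loves a different woman; the function names are illustrative.

```python
# A tiny world (assumption: invented so that the two readings disagree).
MEN = {"adam", "bob"}
WOMEN = {"carol", "dana"}
LOVES = {("adam", "carol"), ("bob", "dana")}

def each_man_loves_some_woman():
    """Reading 1: for every man x there is a woman y with love(x, y)."""
    return all(any((x, y) in LOVES for y in WOMEN) for x in MEN)

def one_woman_loved_by_every_man():
    """Reading 2: there is a woman y such that love(x, y) holds for every man x."""
    return any(all((x, y) in LOVES for x in MEN) for y in WOMEN)

print(each_man_loves_some_woman())     # True
print(one_woman_loved_by_every_man())  # False
```

The second reading implies the first whenever there is at least one woman, but not the other way round, which is why the sentence is ambiguous.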

Pragmatics

Can you give me the salt?

Would you please give me the salt?

Do you have the ability to give me the salt?

Discourse

Alice understands that you like your mother, but she ...

Does “she” refer to Alice or to your mother?

Ambiguity

I made her duck.

7 Course Materials

Topics

Introduction
  Introduction to Language Technology
  Language Modeling

Machine Learning for NLP
  Learning Techniques
  Classification Algorithms
  Clustering Algorithms

NLP Techniques
  Part-of-Speech Tagging
  Parsing
  Named Entity Recognition
  Word Relations
  Word Sense Disambiguation
  Semantic Role Labeling

NLP Applications
  Text Categorization
  Information Retrieval
  Information Extraction
  Question Answering
  Sentiment Analysis
  Summarization
  Machine Translation

Course Book

SPEECH AND LANGUAGE PROCESSING

An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition

by Daniel Jurafsky and James H. Martin

Second Edition

Relevant Journal & Conferences

Journal
  Computational Linguistics

Conferences
  ACL: Association for Computational Linguistics
    NAACL: North American Chapter
    EACL: European Chapter
  HLT: Human Language Technology
  EMNLP: Empirical Methods in Natural Language Processing
  COLING: International Conference on Computational Linguistics
  LREC: Language Resources and Evaluation
