natural language processing (nlp)
![Page 1: Natural Language Processing (NLP)](https://reader033.vdocuments.us/reader033/viewer/2022061504/5681605b550346895dcf8601/html5/thumbnails/1.jpg)
Natural Language Processing (NLP)
Overview and history of the field
Knowledge of language
The role of ambiguity
Models and Algorithms
Eliza, Turing, and conversational agents
History of speech and language processing
Regular expressions
![Page 2: Natural Language Processing (NLP)](https://reader033.vdocuments.us/reader033/viewer/2022061504/5681605b550346895dcf8601/html5/thumbnails/2.jpg)
Computer Speech and Language Processing
What is it?
Getting computers to perform useful tasks involving human languages, whether for:
– Enabling human-machine communication
– Improving human-human communication
– Doing stuff with language objects
Examples:
– Question Answering
– Machine Translation
– Spoken Conversational Agents
![Page 3: Natural Language Processing (NLP)](https://reader033.vdocuments.us/reader033/viewer/2022061504/5681605b550346895dcf8601/html5/thumbnails/3.jpg)
Knowledge needed to build speaking computer
Speech recognition and synthesis
Dictionaries (how words are pronounced)
Phonetics (how to recognize/produce each sound of English)
Natural language understanding
Knowledge of the English words involved
– What they mean
– How they combine (what is a ‘pod bay door’?)
Knowledge of syntactic structure
– I’m I do, Sorry that afraid Dave I’m can’t
![Page 4: Natural Language Processing (NLP)](https://reader033.vdocuments.us/reader033/viewer/2022061504/5681605b550346895dcf8601/html5/thumbnails/4.jpg)
Eliza: Weizenbaum (1966)
User: You are like my father in some ways.
ELIZA: What resemblance do you see
User: You are not very aggressive but I think you don’t want me to notice that.
ELIZA: What makes you think I am not aggressive
User: You don’t argue with me
ELIZA: Why do you think I don’t argue with you
User: You are afraid of me
ELIZA: Does it please you to believe I am afraid of you
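ELIZA is usually described as working by keyword pattern matching and substitution rather than by any understanding of the input. The sketch below illustrates that idea only; the two rules, the pronoun table, and the fallback reply are hypothetical and are not Weizenbaum's original script.

```python
import re

# Hypothetical rules (pattern, response template); not Weizenbaum's script.
RULES = [
    (re.compile(r"\byou are (.*)", re.IGNORECASE),
     "Does it please you to believe I am {0}"),
    (re.compile(r"\byou (.*) me\b", re.IGNORECASE),
     "Why do you think I {0} you"),
]

# Assumed pronoun swaps so the reply is phrased from the program's side.
PRONOUNS = {"me": "you", "my": "your", "i": "you"}

def reflect(fragment):
    """Swap first-person words for second-person ones, word by word."""
    return " ".join(PRONOUNS.get(w.lower(), w) for w in fragment.split())

def respond(utterance):
    """Return the response for the first rule whose pattern matches."""
    text = utterance.rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(reflect(match.group(1)))
    return "Please go on"  # fallback when no rule applies

print(respond("You don't argue with me"))
print(respond("You are afraid of me"))
```

Rule order matters here: the more specific "you are" rule is tried before the general "you ... me" rule.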
![Page 5: Natural Language Processing (NLP)](https://reader033.vdocuments.us/reader033/viewer/2022061504/5681605b550346895dcf8601/html5/thumbnails/5.jpg)
Ambiguity
Computational linguists are obsessed with ambiguity
Ambiguity is a fundamental problem of computational linguistics
Resolving ambiguity is a crucial goal
![Page 6: Natural Language Processing (NLP)](https://reader033.vdocuments.us/reader033/viewer/2022061504/5681605b550346895dcf8601/html5/thumbnails/6.jpg)
Ambiguity
Find at least 5 meanings of this sentence:
I made her duck
![Page 7: Natural Language Processing (NLP)](https://reader033.vdocuments.us/reader033/viewer/2022061504/5681605b550346895dcf8601/html5/thumbnails/7.jpg)
Ambiguity
Find at least 5 meanings of this sentence: I made her duck
I cooked waterfowl for her benefit (to eat)
I cooked waterfowl belonging to her
I created the (plaster?) duck she owns
I caused her to quickly lower her head or body
I waved my magic wand and turned her into undifferentiated waterfowl
![Page 8: Natural Language Processing (NLP)](https://reader033.vdocuments.us/reader033/viewer/2022061504/5681605b550346895dcf8601/html5/thumbnails/8.jpg)
Ambiguity is Pervasive
I caused her to quickly lower her head or body
– Lexical category: “duck” can be a N or V
I cooked waterfowl belonging to her.
– Lexical category: “her” can be a possessive (“of her”) or dative (“for her”) pronoun
I made the (plaster) duck statue she owns
– Lexical Semantics: “make” can mean “create” or “cook”
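To see how lexical-category ambiguity multiplies, the category choices for each word can be enumerated. This is a minimal sketch with a toy lexicon; the tag names and the lexicon entries are assumptions made for illustration, not a real tagset.

```python
from itertools import product

# Toy lexicon (assumed): each word maps to its possible lexical categories.
LEXICON = {
    "I": ["PRON"],
    "made": ["VERB"],
    "her": ["POSSESSIVE", "DATIVE"],
    "duck": ["NOUN", "VERB"],
}

def tag_sequences(sentence):
    """Return every combination of category choices for the sentence."""
    words = sentence.split()
    options = [LEXICON[word] for word in words]
    return [list(zip(words, tags)) for tags in product(*options)]

readings = tag_sequences("I made her duck")
for reading in readings:
    print(reading)
```

Two ambiguous words with two categories each already give four candidate analyses, before the different senses of “make” are considered.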
![Page 9: Natural Language Processing (NLP)](https://reader033.vdocuments.us/reader033/viewer/2022061504/5681605b550346895dcf8601/html5/thumbnails/9.jpg)
Ambiguity is Pervasive
Grammar: Make can be:
Transitive: (verb has a noun direct object)
– I cooked [waterfowl belonging to her]
Ditransitive: (verb has 2 noun objects)
– I made [her] (into) [undifferentiated waterfowl]
Action-transitive (verb has a direct object and another verb)
– I caused [her] [to move her body]
![Page 10: Natural Language Processing (NLP)](https://reader033.vdocuments.us/reader033/viewer/2022061504/5681605b550346895dcf8601/html5/thumbnails/10.jpg)
Ambiguity is Pervasive
Phonetics!
I mate or duck
I’m eight or duck
Eye maid; her duck
Aye mate, her duck
I maid her duck
I’m aid her duck
I mate her duck
I’m ate her duck
I’m ate or duck
I mate or duck
![Page 11: Natural Language Processing (NLP)](https://reader033.vdocuments.us/reader033/viewer/2022061504/5681605b550346895dcf8601/html5/thumbnails/11.jpg)
Models and Algorithms
Models: formalisms used to capture the various kinds of linguistic structure.
– State machines (FSAs, transducers, Markov models)
– Formal rule systems (context-free grammars, feature systems)
– Logic (predicate calculus, inference)
– Probabilistic versions of all of these + others (Gaussian mixture models, probabilistic relational models, etc.)
Algorithms: used to manipulate representations to create structure.
– Search (A*, dynamic programming)
– Supervised learning, etc.
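As a concrete instance of the state-machine family, a deterministic finite-state automaton can be written as a transition table plus a loop. The sketch below is illustrative only; the toy language (a "b", then two or more "a"s, then "!") and the state numbering are assumptions, not taken from the slides.

```python
# Transition table for an assumed toy language: b, then aa or more a's, then !
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,  # loop: any number of further a's
    (3, "!"): 4,
}
ACCEPTING = {4}

def accepts(string):
    """Run the automaton; reject on a missing transition or a non-final state."""
    state = 0
    for symbol in string:
        state = TRANSITIONS.get((state, symbol))
        if state is None:
            return False
    return state in ACCEPTING

print(accepts("baaa!"), accepts("ba!"))
```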
![Page 12: Natural Language Processing (NLP)](https://reader033.vdocuments.us/reader033/viewer/2022061504/5681605b550346895dcf8601/html5/thumbnails/12.jpg)
Language, Thought, Understanding
A Gedanken Experiment: Turing Test
Question “can a machine think” is not operational.
Operational version:
2 people and a computer
Interrogator talks to contestant and computer via teletype
Task of machine is to convince interrogator it is human
Task of contestant is to convince interrogator she and not machine is human.
![Page 13: Natural Language Processing (NLP)](https://reader033.vdocuments.us/reader033/viewer/2022061504/5681605b550346895dcf8601/html5/thumbnails/13.jpg)
History: foundational insights 1940s-1950s
Automaton:
Turing 1936
McCulloch-Pitts neuron (1943)
– http://diwww.epfl.ch/mantra/tutorial/english/mcpits/html/
Kleene (1951/1956)
Shannon (1948) link between automata and Markov models
Chomsky (1956)/Backus (1959)/Naur (1960): CFG
Probabilistic/Information-theoretic models
Shannon (1948)
Bell Labs speech recognition (1952)
![Page 14: Natural Language Processing (NLP)](https://reader033.vdocuments.us/reader033/viewer/2022061504/5681605b550346895dcf8601/html5/thumbnails/14.jpg)
History: the two camps: 1957-1970
Symbolic
Zellig Harris 1958 TDAP first parser
– Cascade of finite-state transducers
Chomsky
AI workshop at Dartmouth (McCarthy, Minsky, Shannon, Rochester)
Newell and Simon: Logic Theorist, General Problem Solver
Statistical
Bledsoe and Browning (1959): Bayesian OCR
Mosteller and Wallace (1964): Bayesian authorship attribution
Denes (1959): ASR combining grammar and acoustic probability
![Page 15: Natural Language Processing (NLP)](https://reader033.vdocuments.us/reader033/viewer/2022061504/5681605b550346895dcf8601/html5/thumbnails/15.jpg)
Four paradigms: 1970-1983
Stochastic
Hidden Markov Model 1972
– Independent application of Baker (CMU) and Jelinek/Bahl/Mercer lab (IBM) following work of Baum and colleagues at IDA
Logic-based
Colmerauer (1970, 1975) Q-systems
Definite Clause Grammars (Pereira and Warren 1980)
Kay (1979) functional grammar, Bresnan and Kaplan (1982) unification
Natural language understanding
Winograd (1972) SHRDLU
Schank and Abelson (1977) scripts, story understanding
Influence of case-role work of Fillmore (1968) via Simmons (1973), Schank.
Discourse Modeling
Grosz and colleagues: discourse structure and focus
Perrault and Allen (1980) BDI model
![Page 16: Natural Language Processing (NLP)](https://reader033.vdocuments.us/reader033/viewer/2022061504/5681605b550346895dcf8601/html5/thumbnails/16.jpg)
Finite State Approach: 1983-1993
Finite State Models
Kaplan and Kay (1981): Phonology/Morphology
Church (1980): Syntax
Return of Probabilistic Models:
Corpora created for language tasks
Early statistical versions of NLP applications (parsing, tagging, machine translation)
Increased focus on methodological rigor:
– Can’t test your hypothesis on the data you used to build it!
– Training sets and test sets
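The training/test discipline can be sketched as a simple held-out split: shuffle the data once, set a fraction aside, and never use that fraction while building the model. The function name, the 20% default, and the fixed seed below are assumptions chosen for illustration.

```python
import random

def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the data and hold out a fraction for testing."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Placeholder corpus (assumed) standing in for real annotated sentences.
sentences = [f"sentence {i}" for i in range(10)]
train, test = train_test_split(sentences)
print(len(train), len(test))
```

The important property is that the two sets are disjoint, so results on the test set are not inflated by data the model has already seen.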
![Page 17: Natural Language Processing (NLP)](https://reader033.vdocuments.us/reader033/viewer/2022061504/5681605b550346895dcf8601/html5/thumbnails/17.jpg)
The field comes together: 1994-2007
NLP has borrowed statistical modeling from speech recognition; it is now standard:
ACL conference:
– 1990: 39 articles, 1 statistical
– 2003: 62 articles, 48 statistical
Machine learning techniques key
NLP has borrowed focus on web and search and “bag of words models” from information retrieval
Unified field:
NLP, MT, ASR, TTS, Dialog, IR
![Page 18: Natural Language Processing (NLP)](https://reader033.vdocuments.us/reader033/viewer/2022061504/5681605b550346895dcf8601/html5/thumbnails/18.jpg)
Regular expressions
A formal language for specifying text strings
How can we search for any of these?
woodchuck
woodchucks
Woodchuck
Woodchucks
![Page 19: Natural Language Processing (NLP)](https://reader033.vdocuments.us/reader033/viewer/2022061504/5681605b550346895dcf8601/html5/thumbnails/19.jpg)
Regular Expressions
Basic regular expression patterns
Perl-based syntax (slightly different from other notations for regular expressions)
Disjunctions /[wW]oodchuck/
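The bracket disjunction can be tried out with Python's `re` module, whose syntax is close to the Perl-style notation used here (the surrounding slashes are dropped). The word list is an assumed example.

```python
import re

# [wW] accepts either case for the first letter.
WOODCHUCK = re.compile(r"[wW]oodchuck")

candidates = ["woodchuck", "woodchucks", "Woodchuck", "Woodchucks", "groundhog"]
# search() looks for the pattern anywhere in the string,
# so the plural forms are found as well.
hits = [word for word in candidates if WOODCHUCK.search(word)]
print(hits)
```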
![Page 20: Natural Language Processing (NLP)](https://reader033.vdocuments.us/reader033/viewer/2022061504/5681605b550346895dcf8601/html5/thumbnails/20.jpg)
Regular Expressions
Ranges [A-Z]
Negations [^Ss]
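A short sketch of ranges and negated classes in Python; the sample strings are assumptions picked so that the effect of each class is easy to see.

```python
import re

# A range matches any single character between its endpoints.
capitals = re.findall(r"[A-Z]", "Drenched Blossoms")

# A caret as the first character inside brackets negates the class:
# any single character that is neither "S" nor "s".
not_s = re.findall(r"[^Ss]", "Sssh")

print(capitals, not_s)
```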
![Page 21: Natural Language Processing (NLP)](https://reader033.vdocuments.us/reader033/viewer/2022061504/5681605b550346895dcf8601/html5/thumbnails/21.jpg)
Regular Expressions
Optional characters: ?, * and +
? (0 or 1)
– /colou?r/ color or colour
* (0 or more)
– /oo*h!/ oh! or ooh! or ooooh!
+ (1 or more)
– /o+h!/ oh! or ooh! or ooooh!
Wild cards: .
– /beg.n/ begin or began or begun
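The counters and the wildcard can be checked against small candidate lists. This sketch uses `re.fullmatch` so that a pattern has to cover the whole string; the helper name and the candidate lists are assumptions.

```python
import re

def matches(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

exclamations = ["h!", "oh!", "ooh!", "ooooh!"]

optional_u = matches(r"colou?r", ["color", "colour", "colouur"])
star = matches(r"oo*h!", exclamations)   # one o, then zero or more o's
plus = matches(r"o+h!", exclamations)    # one or more o's
wildcard = matches(r"beg.n", ["begin", "began", "begun", "beg3n", "bgn"])

print(optional_u, star, plus, wildcard)
```

Note that `/oo*h!/` and `/o+h!/` accept the same strings, and that the wildcard also accepts non-letters such as the digit in "beg3n".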
![Page 22: Natural Language Processing (NLP)](https://reader033.vdocuments.us/reader033/viewer/2022061504/5681605b550346895dcf8601/html5/thumbnails/22.jpg)
Regular Expressions
Anchors ^ and $
/^[A-Z]/ “Ramallah, Palestine”
/^[^A-Z]/ “¿verdad?” “really?”
/\.$/ “It is over.”
/.$/ ?
Boundaries \b and \B
/\bon\b/ matches “on my way” but not “Monday”
/\Bon\b/ “automaton”
Disjunction |
/yours|mine/ “it is either yours or mine”
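The anchor, boundary, and disjunction examples can be verified in Python, where `^`, `$`, `\b`, `\B`, and `|` behave the same way for these cases. The helper name `found` is an assumption.

```python
import re

def found(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None

# Anchors: ^ ties the match to the start, $ to the end.
starts_upper = found(r"^[A-Z]", "Ramallah, Palestine")
starts_non_upper = found(r"^[^A-Z]", "¿verdad?")
ends_with_period = found(r"\.$", "It is over.")
# An unescaped dot before $ matches whatever the last character is.
last_char = re.search(r".$", "really?").group()

# Boundaries: \b is a word boundary, \B a non-boundary.
on_as_word = found(r"\bon\b", "on my way")
on_inside_monday = found(r"\bon\b", "Monday")
on_as_suffix = found(r"\Bon\b", "automaton")

# Disjunction: either alternative may match.
pronouns = re.findall(r"yours|mine", "it is either yours or mine")

print(on_as_word, on_inside_monday, on_as_suffix, pronouns)
```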
![Page 23: Natural Language Processing (NLP)](https://reader033.vdocuments.us/reader033/viewer/2022061504/5681605b550346895dcf8601/html5/thumbnails/23.jpg)
Disjunction, Grouping, Precedence
Column 1 Column 2 Column 3 …
How do we express this?
/Column [0-9]+ */
/(Column [0-9]+ +)*/
Precedence (highest to lowest)
Parenthesis ()
Counters * + ? {}
Sequences and anchors the ^my end$
Disjunction |
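The effect of grouping and precedence shows up when the two column patterns are run on the same header. In this sketch the header string, with a trailing space after each column, is an assumed example.

```python
import re

header = "Column 1 Column 2 Column 3 "

# Without parentheses the * applies only to the preceding space,
# so a single column is matched.
one_column = re.match(r"Column [0-9]+ *", header).group()

# With parentheses the * repeats the whole group.
all_columns = re.match(r"(Column [0-9]+ +)*", header).group()

# Counters bind more tightly than sequences: in "the*" only the "e" repeats.
star_on_letter = re.fullmatch(r"the*", "thethe") is not None
star_on_group = re.fullmatch(r"(the)*", "thethe") is not None

print(repr(one_column), repr(all_columns), star_on_letter, star_on_group)
```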
![Page 24: Natural Language Processing (NLP)](https://reader033.vdocuments.us/reader033/viewer/2022061504/5681605b550346895dcf8601/html5/thumbnails/24.jpg)
Example
Find me all instances of the word “the” in a text.
/the/
– Misses capitalized examples
/[tT]he/
– Returns other or theology
/\b[tT]he\b/
/[^a-zA-Z][tT]he[^a-zA-Z]/
/(^|[^a-zA-Z])[tT]he[^a-zA-Z]/
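Counting the matches of each successive pattern on one sentence makes the refinements visible. The sentence and the helper name `count` are assumptions; the sentence was chosen to contain “The”, “the”, “other”, and “theology”.

```python
import re

text = "The other one saw the theology book."

def count(pattern):
    """Number of non-overlapping matches of the pattern in the sample text."""
    return sum(1 for _ in re.finditer(pattern, text))

plain = count(r"the")                                  # misses "The"
either_case = count(r"[tT]he")                         # also hits "other", "theology"
word_bounded = count(r"\b[tT]he\b")                    # only the two real instances
needs_neighbors = count(r"[^a-zA-Z][tT]he[^a-zA-Z]")   # misses "The" at the start
start_allowed = count(r"(^|[^a-zA-Z])[tT]he[^a-zA-Z]")

print(plain, either_case, word_bounded, needs_neighbors, start_allowed)
```

The last pattern still requires a non-letter after the word, so an instance at the very end of the text would be missed.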
![Page 25: Natural Language Processing (NLP)](https://reader033.vdocuments.us/reader033/viewer/2022061504/5681605b550346895dcf8601/html5/thumbnails/25.jpg)
Errors
The process we just went through was based on fixing two kinds of errors:
Matching strings that we should not have matched (there, then, other)
– False positives
Not matching things that we should have matched (The)
– False negatives
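The two error types can be listed mechanically once each test string is labeled with the desired outcome. In this sketch the labeled set and the function name `errors` are assumptions; the strings are the ones named on the slide.

```python
import re

# Assumed labels: True means the string really is the word "the".
LABELED = {"The": True, "the": True, "there": False, "then": False, "other": False}

def errors(pattern):
    """Return (false positives, false negatives) for a pattern."""
    false_pos = [w for w, wanted in LABELED.items()
                 if not wanted and re.search(pattern, w)]
    false_neg = [w for w, wanted in LABELED.items()
                 if wanted and not re.search(pattern, w)]
    return false_pos, false_neg

print(errors(r"the"))
print(errors(r"[tT]he"))
print(errors(r"\b[tT]he\b"))
```

Fixing false negatives (adding `[tT]`) leaves the false positives in place, and the word boundaries then remove those.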
![Page 26: Natural Language Processing (NLP)](https://reader033.vdocuments.us/reader033/viewer/2022061504/5681605b550346895dcf8601/html5/thumbnails/26.jpg)
More complex RE example
Regular expressions for prices
/\$[0-9]+/
– Doesn’t deal with fractions of dollars
/\$[0-9]+\.[0-9][0-9]/
– Doesn’t allow $199, not word-aligned
/(^|\W)\$[0-9]+(\.[0-9][0-9])?\b/
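A sketch of the final price pattern in Python, under two assumptions: the dollar sign is escaped as `\$` so that it is read literally rather than as the end-of-line anchor, and the leading boundary is written as `(^|\W)`, because a `\b` placed directly before `$` only holds when the preceding character is a word character. The function name and sample sentence are illustrative.

```python
import re

PRICE = re.compile(r"(^|\W)\$[0-9]+(\.[0-9][0-9])?\b")

def find_prices(text):
    """Return each price, dropping any leading non-word character."""
    prices = []
    for match in PRICE.finditer(text):
        matched = match.group()
        prices.append(matched[matched.index("$"):])
    return prices

print(find_prices("It costs $199 or $24.99, not 5 dollars."))

# A leading \b fails on a price at the start of a string,
# since there is no word character before the "$".
print(re.search(r"\b\$[0-9]+", "$199"))
```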
![Page 27: Natural Language Processing (NLP)](https://reader033.vdocuments.us/reader033/viewer/2022061504/5681605b550346895dcf8601/html5/thumbnails/27.jpg)
Advanced operators