CS460/449: Speech, Natural Language Processing and the Web / Topics in AI Programming (Lecture 2: Introduction + ML and NLP)

Pushpak Bhattacharyya, CSE Dept., IIT Bombay
Persons involved

- Faculty instructor: Dr. Pushpak Bhattacharyya (www.cse.iitb.ac.in/~pb); areas of expertise: Natural Language Processing, Machine Learning
- TAs: Prithviraj (prithviraj@cse) and Debraj (debraj@cse)
- Course home page (to be created): www.cse.iitb.ac.in/~cs626-449-2009, mirrored at www.cse.iitb.ac.in/~pb/cs626-449-2009
Time and Venue
- Slot 3, Old CSE: S9 (top floor)
- Mo 10.30, Tu 11.30, Th 8.30
Perspectivising NLP: Areas of AI and their inter-dependencies
- Search
- Vision
- Planning
- Machine Learning
- Knowledge Representation
- Logic
- Expert Systems
- Robotics
- NLP

AI is the forcing function for Computer Science.
Stages of language processing
- Phonetics and phonology
- Morphology
- Lexical analysis
- Syntactic analysis
- Semantic analysis
- Pragmatics
- Discourse
Two Views of NLP
1. Classical View: layered processing; various ambiguities (already discussed)
2. Statistical/Machine Learning View
Uncertainty in classification: Ambiguity
"Visiting aunts can be a nuisance"

- Visiting: adjective or gerund (POS tag ambiguity)
- Role of aunts:
  - agent of visit (aunts are visitors)
  - object of visit (aunts are being visited)

Minimize the uncertainty of classification with cues from the sentence.
What cues?

- Position with respect to the verb: France to the left of "beat" and Brazil to the right: agent-object role marking (English)
- Case marking: France ne (Hindi), ne (Marathi): agent role; Brazil ko (Hindi), laa (Marathi): object role
- Morphology: haraayaa (Hindi), haravlaa (Marathi): verb POS tag as indicated by the distinctive suffixes
Cues are like attribute-value pairs, prompting machine learning from NL data

Constituent ML tasks:
- Goal: classification or clustering
- Features/attributes (word position, morphology, word label, etc.)
- Values of features
- Training data (corpus: annotated or unannotated)
- Test data (test corpus)
- Accuracy of decision (precision, recall, F-value, MAP, etc.)
- Test of significance (sample space to generality)
What is the output of an ML-NLP System (1/2)
Option 1: a set of rules, e.g.,

If the word to the left of the verb is a noun and has the animacy feature, then it is the likely agent of the action denoted by the verb.

- The child broke the toy (child is the agent)
- The window broke (window is not the agent; inanimate)
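Such a rule can be sketched as a predicate over token features. The token dictionaries below (with `pos` and `animate` keys) are hypothetical, invented for illustration, not from any real library:

```python
# A minimal sketch of the Option-1 rule, using hypothetical token
# dictionaries carrying "pos" and "animate" features.
def likely_agent(word_left_of_verb):
    """A noun with the animacy feature to the left of the verb is the likely agent."""
    return word_left_of_verb["pos"] == "NN" and word_left_of_verb["animate"]

child = {"word": "child", "pos": "NN", "animate": True}     # "The child broke the toy"
window = {"word": "window", "pos": "NN", "animate": False}  # "The window broke"
print(likely_agent(child), likely_agent(window))  # True False
```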
What is the output of an ML-NLP System (2/2)
Option 2: a set of probability values

P(agent | word is to the left of verb and has animacy)
  > P(object | word is to the left of verb and has animacy)
  > P(instrument | word is to the left of verb and has animacy), etc.
How is this different from classical NLP?

- Classical NLP: text data → linguist → rules → computer
- Statistical NLP: corpus → computer (machine learning) → rules/probabilities
Classification appears as sequence labeling
A set of sequence labeling tasks: smaller to larger units

- Words: part of speech tagging, named entity tagging, sense marking
- Phrases: chunking
- Sentences: parsing
- Paragraphs: co-reference annotation
Example of word labeling: POS Tagging
<s> Come September, and the UJF campus is abuzz with new and returning students. </s>

<s> Come_VB September_NNP ,_, and_CC the_DT UJF_NNP campus_NN is_VBZ abuzz_JJ with_IN new_JJ and_CC returning_VBG students_NNS ._. </s>
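The simplest form of POS tagging can be sketched as dictionary lookup over a toy lexicon (the lexicon below is invented; real taggers such as HMM or neural taggers disambiguate from context, which lookup cannot):

```python
# A toy dictionary-based POS tagger. Unknown words default to NNP
# (proper noun), which happens to be right for "UJF" here; this is a
# gross simplification of real, context-sensitive taggers.
TOY_LEXICON = {
    "come": "VB", "september": "NNP", "and": "CC", "the": "DT",
    "campus": "NN", "is": "VBZ", "abuzz": "JJ", "with": "IN",
    "new": "JJ", "returning": "VBG", "students": "NNS",
}

def tag(tokens):
    """Attach word_TAG labels in the format used on the slide."""
    return [f"{w}_{TOY_LEXICON.get(w.lower(), 'NNP')}" for w in tokens]

print(" ".join(tag("Come September and the UJF campus is abuzz".split())))
```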
Example of word labeling: Named Entity Tagging
<month_name> September </month_name>
<org_name> UJF </org_name>
Example of word labeling: Sense Marking
| Word  | Synset                    | WN-synset-no |
|-------|---------------------------|--------------|
| come  | {arrive, get, come}       | 01947900     |
| ...   | ...                       | ...          |
| abuzz | {abuzz, buzzing, droning} | 01859419     |
Example of phrase labeling: Chunking
Come July, and [the UJF campus] is abuzz with [new and returning students].
Example of Sentence labeling: Parsing
[S1 [S [S [VP [VB Come] [NP [NNP July]]]]
  [, ,]
  [CC and]
  [S [NP [DT the] [JJ UJF] [NN campus]]
    [VP [AUX is]
      [ADJP [JJ abuzz]
        [PP [IN with]
          [NP [ADJP [JJ new] [CC and] [VBG returning]]
            [NNS students]]]]]]
  [. .]]]
Handling labeling through the Noisy Channel Model
w = (w_n, w_{n-1}, ..., w_1)  →  Noisy Channel  →  t = (t_m, t_{m-1}, ..., t_1)

Sequence w is transformed into sequence t.
Bayesian Decision Theory and Noisy Channel Model are close to each other
Bayes' Theorem: given the random variables A and B,

P(A|B) = P(A) P(B|A) / P(B)

where
- P(A|B) is the posterior probability
- P(A) is the prior probability
- P(B|A) is the likelihood
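A worked numeric instance of Bayes' theorem, tied back to the agent/object ambiguity from earlier slides. All priors and likelihoods below are invented for illustration:

```python
# Bayes' theorem on made-up numbers: classify a word's role as "agent"
# vs "object" given the cue "left of the verb and animate".
p_role = {"agent": 0.5, "object": 0.5}            # prior P(A)
p_cue_given_role = {"agent": 0.8, "object": 0.2}  # likelihood P(B|A)

# P(B) by total probability, then the posterior P(A|B) for each role.
p_cue = sum(p_role[r] * p_cue_given_role[r] for r in p_role)
posterior = {r: p_role[r] * p_cue_given_role[r] / p_cue for r in p_role}

print(posterior)  # agent: 0.8, object: 0.2
```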
Corpus
A collection of text, called a corpus, is used for collecting various language data.

- With annotation: more information, but manual-labor intensive
- Practice: label automatically, then correct manually
- The famous Brown Corpus contains 1 million tagged words.
- Switchboard: a very famous corpus of 2400 conversations, 543 speakers, many US dialects, annotated with orthography and phonetics
Example 1 of application of the Noisy Channel Model: Probabilistic Speech Recognition (Isolated Word) [8]

Problem definition: given a sequence of speech signals, identify the words.

Two steps:
- Segmentation (word boundary detection)
- Identify the word

Isolated word recognition: identify W given SS (the speech signal)

W* = argmax_W P(W | SS)
Identifying the word

W* = argmax_W P(W | SS)
   = argmax_W P(W) P(SS | W)

- P(SS|W): the likelihood, called the "phonological model"; intuitively more tractable!
- P(W): the prior probability, called the "language model"

P(W) = (# times W appears in the corpus) / (# words in the corpus)
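The language-model estimate P(W) above is just a relative frequency; a minimal sketch over a made-up toy corpus:

```python
# Unigram language model P(W) = count(W) / total words, estimated
# from a toy, made-up corpus.
from collections import Counter

corpus = "the cat sat on the mat the cat ran".split()
counts = Counter(corpus)

def p_w(word):
    """Relative frequency of `word` in the corpus."""
    return counts[word] / len(corpus)

print(p_w("the"))  # 3 occurrences out of 9 words
```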
Pronunciation Dictionary
P(SS|W) is maintained in this way:

P(t o m ae t o | Word is "tomato") = product of arc probabilities

[Figure: pronunciation automaton for "tomato" with states s1 to s7 and an end state; each arc emits a phone. All arcs carry probability 1.0 except the vowel after "m", which branches to "ae" with probability 0.73 and "aa" with probability 0.27.]
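The product-of-arc-probabilities computation for this automaton can be sketched as follows (the position-indexed table is a simplification of the state machine, using the 0.73/0.27 branch from the figure):

```python
# P(SS|W) for "tomato" as a product of arc probabilities. Arcs are
# indexed by position in the phone sequence; every arc is 1.0 except
# the branching vowel at position 3 ("ae" 0.73 vs "aa" 0.27).
ARC_PROB = {0: {"t": 1.0}, 1: {"o": 1.0}, 2: {"m": 1.0},
            3: {"ae": 0.73, "aa": 0.27},
            4: {"t": 1.0}, 5: {"o": 1.0}}

def p_ss_given_w(phones):
    """Multiply arc probabilities along the path; 0.0 if a phone has no arc."""
    p = 1.0
    for i, ph in enumerate(phones):
        p *= ARC_PROB[i].get(ph, 0.0)
    return p

print(p_ss_given_w(["t", "o", "m", "ae", "t", "o"]))  # 0.73
```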
Discriminative vs. Generative Model
W* = argmax_W P(W | SS)

- Discriminative model: compute P(W | SS) directly
- Generative model: compute P(W) · P(SS | W)