Final Assignment Demo, 11th Nov, 2012
Final Assignment Demo
11th Nov, 2012
Deepak Suyel
Geetanjali Rakshit
Sachin Pawar
CS 626 – Speech, NLP and the Web
Assignments
• POS Tagger
  – Bigram Viterbi
  – Trigram Viterbi
  – A-Star
  – Bigram Discriminative Viterbi
• Language Model (Word Prediction)
  – Bigram
  – Trigram
• Yago Explorer
• Parser Projection and NLTK
POS Tagger
Viterbi: Generative Model
• Most probable tag sequence given word sequence:
• Bigram Model:
• Trigram Model:
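In the standard HMM formulation, these three quantities can be written as follows. The notation is an assumption and is not taken from the slide: $W = w_1 \dots w_n$ is the word sequence, $T = t_1 \dots t_n$ is the tag sequence, and $t_0$, $t_{-1}$ stand for sentence-start markers.

```latex
\begin{align*}
T^{*} &= \arg\max_{T} P(T \mid W) = \arg\max_{T} P(W \mid T)\, P(T) \\
\text{Bigram:}\quad T^{*} &\approx \arg\max_{T} \prod_{i=1}^{n} P(t_i \mid t_{i-1})\, P(w_i \mid t_i) \\
\text{Trigram:}\quad T^{*} &\approx \arg\max_{T} \prod_{i=1}^{n} P(t_i \mid t_{i-2}, t_{i-1})\, P(w_i \mid t_i)
\end{align*}
```

The model is called generative because it scores the words given the tags, $P(w_i \mid t_i)$, and reaches $P(T \mid W)$ through Bayes' rule.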
Discriminative Bigram Model
• Most probable tag sequence given word sequence:
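A minimal sketch of bigram Viterbi decoding for a discriminative tagger, assuming each step is scored directly as P(t_i | t_{i-1}, w_i). That factorisation is one common choice and is an assumption here, as are the `cond_prob` table, the `<s>` start tag, the probability floor for unseen events and the toy numbers.

```python
import math

def viterbi_discriminative(words, tags, cond_prob, start="<s>", floor=1e-12):
    """Return the tag sequence maximising prod_i P(t_i | t_{i-1}, w_i).

    cond_prob maps (previous_tag, word, tag) -> probability. Unseen triples
    fall back to a small floor value (an assumption of this sketch).
    words must be non-empty.
    """
    def logp(prev, word, tag):
        return math.log(cond_prob.get((prev, word, tag), floor))

    # best[i][t] = (log-score of the best path ending in tag t at word i, backpointer)
    best = [{t: (logp(start, words[0], t), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            column[t] = max(
                (best[i - 1][p][0] + logp(p, words[i], t), p) for p in tags
            )
        best.append(column)

    # Trace back from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]

# Toy, made-up probabilities for a two-word sentence.
toy = {
    ("<s>", "dogs", "NN"): 0.9, ("<s>", "dogs", "VB"): 0.1,
    ("NN", "bark", "VB"): 0.8, ("NN", "bark", "NN"): 0.2,
    ("VB", "bark", "VB"): 0.3, ("VB", "bark", "NN"): 0.7,
}
decoded = viterbi_discriminative(["dogs", "bark"], ["NN", "VB"], toy)
print(decoded)
```

Working in log space turns the product into a sum and avoids underflow on long sentences.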
A-star Heuristic
• A : Highest transition probability
  – Static score which can be found directly from the learned model
• B : Highest lexical probability in the given sentence
  – Dynamic score
• Min_cost = -log(A) - log(B)
• h(n) = Min_cost * (no. of hops till goal state)
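The heuristic above can be sketched in a few lines. The dictionary layouts, the function name and the toy probabilities are hypothetical, and the sketch assumes that each untagged word counts as one hop to the goal state.

```python
import math

def astar_heuristic(transition, lexical, sentence, position):
    """h(n) = Min_cost * (hops left), with Min_cost = -log(A) - log(B).

    transition: {(prev_tag, tag): P(tag | prev_tag)}; A is its maximum (static).
    lexical:    {(word, tag): P(word | tag)}; B is the maximum over the words
                of this sentence only (dynamic, recomputed per sentence).
    position:   number of words already tagged at node n.
    """
    a = max(transition.values())
    b = max(p for (word, _tag), p in lexical.items() if word in sentence)
    min_cost = -math.log(a) - math.log(b)
    hops_left = len(sentence) - position
    return min_cost * hops_left

# Toy, made-up model probabilities.
transition = {("DT", "NN"): 0.5, ("NN", "VB"): 0.25}
lexical = {("the", "DT"): 0.5, ("dog", "NN"): 0.125, ("cat", "NN"): 0.9}
sentence = ["the", "dog"]

h_start = astar_heuristic(transition, lexical, sentence, position=0)
h_goal = astar_heuristic(transition, lexical, sentence, position=2)
```

Provided every hop pays both a transition cost and a lexical cost, no hop can be cheaper than Min_cost, so h(n) never overestimates the remaining cost.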
Comparison of different flavours of POS Taggers
| POS Tagger | Correct | Total | Accuracy (%) |
| --- | --- | --- | --- |
| Bigram Generative Viterbi | 812188 | 862785 | 94.14 |
| Trigram Generative Viterbi | 814505 | 862785 | 94.40 |
| A-Star | 793441 | 862785 | 91.96 |
| Bigram Discriminative Viterbi | 796890 | 862785 | 92.36 |
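The accuracy column is the share of correctly tagged tokens, 100 * Correct / Total. A short check that recomputes it from the counts reported above (the variable names are illustrative):

```python
# (correct, total) token counts as reported in the comparison table.
results = {
    "Bigram Generative Viterbi": (812188, 862785),
    "Trigram Generative Viterbi": (814505, 862785),
    "A-Star": (793441, 862785),
    "Bigram Discriminative Viterbi": (796890, 862785),
}

accuracy = {
    name: round(100 * correct / total, 2)
    for name, (correct, total) in results.items()
}
for name, value in accuracy.items():
    print(f"{name}: {value:.2f}%")
```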
Language Model
Next word prediction : Bigram Model
• Using language model on raw text
• Using language model on POS tagged text
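A minimal sketch of count-based next-word prediction that works on both kinds of input. It assumes tagged tokens are written as `TAG_word` (as in the examples later in the deck) and that raw words contain no underscore. The function names and the toy corpus are made up for illustration.

```python
from collections import Counter, defaultdict

def train(sentences, order=2):
    """Count next-word frequencies for every (order - 1)-token context."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i in range(order - 1, len(tokens)):
            context = tuple(tokens[i - order + 1:i])
            # Drop a leading "TAG_" so the predicted item is always a bare word.
            counts[context][tokens[i].split("_", 1)[-1]] += 1
    return counts

def predict(counts, context):
    """Return the most frequent continuation of the context, or None if unseen."""
    following = counts.get(tuple(context))
    return following.most_common(1)[0][0] if following else None

# Toy corpus: "can" is a noun once and a modal verb twice.
raw = [["the", "can", "is", "full"],
       ["you", "can", "swim"],
       ["we", "can", "swim"]]
tagged = [["AT0_the", "NN1_can", "VBZ_is", "AJ0_full"],
          ["PNP_you", "VM0_can", "VVI_swim"],
          ["PNP_we", "VM0_can", "VVI_swim"]]

raw_guess = predict(train(raw), ["can"])
tagged_guess = predict(train(tagged), ["NN1_can"])
print(raw_guess, tagged_guess)
```

On raw text the context "can" is dominated by the modal reading, while the tagged context `NN1_can` keeps the noun reading separate. Passing `order=3` to `train` gives the two-token contexts of a trigram model.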
Next word prediction : Trigram Model
• Using language model on raw text
• Using language model on POS tagged text
Metrics: Comparing Language Models
• We have used “Perplexity” for comparing two language models.
  – Language model using only the previous word
  – Language model using the previous word as well as the POS tag of the previous word
• Perplexity is the weighted average branching factor, which is calculated as:
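In its standard form (the notation is assumed here rather than taken from the slide), the perplexity of a bigram model on a test sequence $W = w_1 \dots w_N$ is:

```latex
\[
PP(W) = P(w_1 w_2 \dots w_N)^{-\frac{1}{N}}
      = \left( \prod_{i=1}^{N} \frac{1}{P(w_i \mid w_{i-1})} \right)^{\frac{1}{N}}
      = 2^{-\frac{1}{N} \sum_{i=1}^{N} \log_2 P(w_i \mid w_{i-1})}
\]
```

For the POS-tagged model the conditioning context would also include the tag of the previous word, $P(w_i \mid w_{i-1}, t_{i-1})$. Lower perplexity means the model is less surprised by the test text.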
Results
• Raw text LM:
  – Word Prediction Accuracy: 12.97%
  – Perplexity: 5451
• POS tagged text LM:
  – Word Prediction Accuracy: 13.24%
  – Perplexity: 5002
Examples

| Raw Text - Incorrect | POS tagged Text - Correct |
| --- | --- |
| porridgy liquid is : fertiliser | AJ0_porridgy NN1_liquid is : is |
| malt dissolve into : terms | NN1_malt VVB_dissolve into : into |
| also act as : of | AV0_also VVB_act as : as |
Examples (Contd.)

| Raw Text - Incorrect | POS tagged Text - Correct |
| --- | --- |
| about english literature : and | PRP_about AJ0_english literature : literature |
| spoken english was : literature | AJ0_spoken NN1_english was : was |
Yago Explorer
• Made use of:
  – WikipediaCategories
  – WordNetCategories, and
  – YagoFacts.
• Modified Breadth First Search (BFS).
Algorithm
• Input: Entities E1, E2
• Output: Paths between E1 and E2
• Procedure:
  1. Find WikipediaCategories for E1 and E2. If any category matches, return.
  2. Find WordNetCategories for E1 and E2. If any match found, return.
  3. Find YagoFacts for E1 and E2. If any match found, return.
  4. Expand YagoFacts for E1 and E2. For each pair of entities from E1 and E2, repeat steps 1-4.
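The procedure above can be sketched as a level-by-level search from both entities. This is a sketch under stated assumptions, not the authors' implementation: the three knowledge sources are plain dictionaries, only outgoing facts are followed, a "match" in step 3 is taken to mean that both sides reach the same entity, and `max_depth` is a hypothetical cut-off. All function names are illustrative.

```python
from itertools import product

def shared_category(a, b, wiki_categories, wordnet_categories):
    """Steps 1-2: return a category common to both entities, Wikipedia first."""
    for source in (wiki_categories, wordnet_categories):
        common = source.get(a, set()) & source.get(b, set())
        if common:
            return sorted(common)[0]
    return None

def expand(reached, facts):
    """Step 4: follow every outgoing fact one hop from the entities reached so far."""
    grown = dict(reached)
    for entity, path in reached.items():
        for relation, obj in facts.get(entity, []):
            if obj not in grown:
                grown[obj] = path + [(entity, relation, obj)]
    return grown

def find_paths(e1, e2, wiki_categories, wordnet_categories, facts, max_depth=3):
    """Return (path_from_e1, path_from_e2) pairs that end in a common node."""
    reached1, reached2 = {e1: []}, {e2: []}
    for _ in range(max_depth + 1):
        results = []
        for (a, path_a), (b, path_b) in product(reached1.items(), reached2.items()):
            if a == b:  # step 3: both sides reached the same entity through facts
                results.append((path_a, path_b))
                continue
            category = shared_category(a, b, wiki_categories, wordnet_categories)
            if category:
                results.append((path_a + [(a, "category", category)],
                                path_b + [(b, "category", category)]))
        if results:
            return results
        reached1, reached2 = expand(reached1, facts), expand(reached2, facts)
    return []

# Toy data modelled on the Narendra Modi / Indian National Congress example.
wiki = {
    "Gandhinagar": {"Indian_capital_cities"},
    "New_Delhi": {"Indian_capital_cities"},
    "Bharatiya_Janata_Party": {"Political_parties_in_India"},
    "Indian_National_Congress": {"Political_parties_in_India"},
}
facts = {
    "Narendra_Modi": [("livesIn", "Gandhinagar"),
                      ("isAffiliatedTo", "Bharatiya_Janata_Party")],
    "Indian_National_Congress": [("isLocatedIn", "New_Delhi")],
}
paths = find_paths("Narendra_Modi", "Indian_National_Congress", wiki, {}, facts)
for from_e1, from_e2 in paths:
    print(from_e1, from_e2)
```

For simplicity the sketch re-checks every reached pair at each level; a real implementation would compare only the newly reached entities and would also follow incoming facts.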
Ex 1: Narendra Modi and Indian National Congress
• Path from E1: Narendra_Modi--livesIn--> Gandhinagar; Gandhinagar--category--> Indian_capital_cities;
• Path from E2: Indian_National_Congress--isLocatedIn--> New_Delhi; New_Delhi--category--> Indian_capital_cities;
• Path from E1: Narendra_Modi--isAffiliatedTo--> Bharatiya_Janata_Party; Bharatiya_Janata_Party--category--> Political_parties_in_India;
• Path from E2: Indian_National_Congress--category--> Political_parties_in_India;
Ex 2: Mahesh Bhupathi and Mother Teresa
• Path from E1: Mahesh_Bhupathi--livesIn--> Bangalore; Bangalore--category--> Metropolitan_cities_in_India;
• Path from E2: Mother_Teresa--diedIn--> Kolkata; Kolkata--category--> Metropolitan_cities_in_India;
• Path from E1: Mahesh_Bhupathi--hasWonPrize--> Padma_Shri; Padma_Shri--category--> Civil_awards_and_decorations_of_India;
• Path from E2: Mother_Teresa--hasWonPrize--> Bharat_Ratna; Bharat_Ratna--category--> Civil_awards_and_decorations_of_India;
Ex 3: Michelle Obama and Frederick Jelinek
• Path from E1: Michelle_Obama--graduatedFrom--> Princeton_University; Princeton_University--category--> university_108286569;
• Path from E2: Frederick_Jelinek--graduatedFrom--> Massachusetts_Institute_of_Technology; Massachusetts_Institute_of_Technology--category--> university_108286569;
Ex 4: Sonia Gandhi and Benito Mussolini
• Path from E1: Sonia_Gandhi--isCitizenOf--> Italy; Italy--dealsWith--> Germany; Germany--isLocatedIn--> Europe;
• Path from E2: Benito_Mussolini--isAffiliatedTo--> National_Fascist_Party; National_Fascist_Party--isLocatedIn--> Rome; Rome--isLocatedIn--> Europe;
Ex 5: Narendra Modi and Mohan Bhagwat
• Path from E1:
  – Narendra_Modi--isAffiliatedTo--> Bharatiya_Janata_Party; Bharatiya_Janata_Party<--isAffiliatedTo--Hansraj_Gangaram_Ahir;
• Path from E2:
  – Mohan_Bhagwat--wasBornIn--> Chandrapur; Chandrapur<--livesIn--Hansraj_Gangaram_Ahir;
Parser Projection
Example
E: Delhi is the capital of India
H: dillii bhaarat kii raajdhaanii hai
E-parse: [ [ [Delhi]NN ]NP
           [ [is]VBZ [ [the]ART [capital]NN ]NP [ [of]P [ [India]NNP ]NP ]PP ]VP
         ]S
H-parse: [ [ [dillii]NN ]NP
           [ [ [ [ [bhaarat]NNP ]NP [kii]P ]PP [raajdhaanii]NN ]NP [hai]VBZ ]VP
         ]S
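The word-level part of projection for this example can be sketched as copying tags across a word alignment. This is a sketch only: the alignment dictionary is hand-written for the sentence pair rather than learned, `project_tags` is a hypothetical helper, and projecting the phrase brackets (which needs reordering) is not covered.

```python
def project_tags(source_tagged, alignment):
    """Copy each source word's POS tag onto the target word it is aligned to.

    source_tagged: list of (source_word, tag) pairs from the source-side parse.
    alignment:     {source_word: target_word}; unaligned source words are skipped.
    """
    return {alignment[word]: tag for word, tag in source_tagged if word in alignment}

english = [("Delhi", "NN"), ("is", "VBZ"), ("the", "ART"),
           ("capital", "NN"), ("of", "P"), ("India", "NNP")]
# Hand-written alignment; "the" has no Hindi counterpart in this sentence.
alignment = {"Delhi": "dillii", "India": "bhaarat", "of": "kii",
             "capital": "raajdhaanii", "is": "hai"}

hindi = "dillii bhaarat kii raajdhaanii hai".split()
projected = project_tags(english, alignment)
hindi_tagged = [(word, projected[word]) for word in hindi]
print(hindi_tagged)
```

The projected tags agree with the leaf labels of the H-parse above. In practice the alignment would come from a word translation model trained on the parallel corpus.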
Resources and Tools
• Parallel corpora in two languages L1 and L2
• Parser for language L1
• Word translation model
• A statistical model of the relationship between the syntactic structures of two different languages (can be effectively learned from a bilingual corpus by an unsupervised learning technique)
Challenges
• Conflation across languages
  – “goes” → “जाता है”
• Phrase to phrase translation required; some phrases are opaque to translation
  – E.g. phrases like “piece of cake”
• Noise introduced by misalignments
Natural Language Toolkit (NLTK)
• It is a platform for building Python programs to work with human language data.
• It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet.
• It has a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning.
NLTK Modules

| Language processing task | NLTK modules | Functionality |
| --- | --- | --- |
| Collocation discovery | nltk.collocations | t-test, chi-squared, point-wise mutual information |
| Part-of-speech tagging | nltk.tag | n-gram, backoff, Brill, HMM, TnT |
| Classification | nltk.classify, nltk.cluster | decision tree, maximum entropy, naive Bayes, EM, k-means |
| Chunking | nltk.chunk | regular expression, n-gram, named-entity |
| Parsing | nltk.parse | chart, feature-based, unification, probabilistic, dependency |
NLTK Modules (Contd)
| Language processing task | NLTK modules | Functionality |
| --- | --- | --- |
| Semantic interpretation | nltk.sem, nltk.inference | lambda calculus, first-order logic, model checking |
| Evaluation metrics | nltk.metrics | precision, recall, agreement coefficients |
| Probability and estimation | nltk.probability | frequency distributions, smoothed probability distributions |
| Applications | nltk.app, nltk.chat | graphical concordancer, parsers, WordNet browser, chatbots |
| Linguistic fieldwork | nltk.toolbox | manipulate data in SIL Toolbox format |