CS626/449 : Speech, NLP and the Web/Topics in AI Programming
(Lecture 5: Wordnet; Application in Query Expansion)
Pushpak Bhattacharyya, CSE Dept., IIT Bombay
Lexical Matrix
Wordnet - Lexical Matrix (with examples)
Word Meanings | Word Forms
              | F1            | F2          | F3          | …                 | Fn
M1            | (depend) E1,1 | (bank) E1,2 | (rely) E1,3 |                   |
M2            |               | (bank) E2,2 |             | (embankment) E2,… |
M3            |               | (bank) E3,2 | E3,3        |                   |
…             |               |             |             | …                 |
Mm            |               |             |             |                   | Em,n
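The lexical matrix can be read in two directions: along a row to get the word forms that share a meaning (synonymy), and down a column to get the meanings that share a word form (polysemy). The sketch below illustrates this with the entries named on the slide. The names ENTRIES, senses and synonyms are hypothetical, and only the cells that carry a word on the slide are included.

```python
from collections import defaultdict

# (meaning, word form) pairs for the cells that are named on the slide.
# Cells without a word (for example E3,3) are left out.
ENTRIES = [
    ("M1", "depend"), ("M1", "bank"), ("M1", "rely"),
    ("M2", "bank"), ("M2", "embankment"),
    ("M3", "bank"),
]

rows = defaultdict(set)     # meaning -> word forms (one row of the matrix)
columns = defaultdict(set)  # word form -> meanings (one column of the matrix)
for meaning, form in ENTRIES:
    rows[meaning].add(form)
    columns[form].add(meaning)


def senses(form):
    """Meanings that a word form can express (read down a column)."""
    return set(columns.get(form, set()))


def synonyms(form):
    """Other word forms that share at least one meaning with form."""
    result = set()
    for meaning in senses(form):
        result |= rows[meaning]
    result.discard(form)
    return result
```

Here "bank" is polysemous (it appears in three rows), and it is a synonym of "depend" and "rely" only in meaning M1.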
Psycholinguistic Theory
• Human lexical memory for nouns is organized as a hierarchy.
• Can a canary sing? - Pretty fast response.
• Can a canary fly? - Slower response.
• Does a canary have skin? - Slowest response.
Wordnet - a lexical reference system based on psycholinguistic theories of human lexical memory.
Hierarchy (each property is stored at the most general level where it holds):
Animal (can move, has skin)
Bird (can fly)
canary (can sing)
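The response-time pattern above can be pictured as a count of how many links must be climbed before a property is found. The following is a minimal sketch under that reading; the dictionary layout and the function name levels_to_property are assumptions made for illustration, not part of any wordnet API.

```python
# Each concept stores only its own properties and a pointer to its parent.
HIERARCHY = {
    "canary": {"parent": "bird", "properties": {"can sing"}},
    "bird": {"parent": "animal", "properties": {"can fly"}},
    "animal": {"parent": None, "properties": {"can move", "has skin"}},
}


def levels_to_property(concept, prop):
    """Return how many parent links are climbed before prop is found, or None."""
    steps = 0
    node = concept
    while node is not None:
        if prop in HIERARCHY[node]["properties"]:
            return steps
        node = HIERARCHY[node]["parent"]
        steps += 1
    return None
```

With this toy data, "can sing" is found at the canary itself, "can fly" one level up, and "has skin" two levels up, matching the fast, slower and slowest responses.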
Linked Wordnets in India
• Hindi Wordnet
• English Wordnet
• Marathi Wordnet
• Sanskrit Wordnet
• Bengali Wordnet
• Punjabi Wordnet
• Konkani Wordnet
• Dravidian Language Wordnets
• North East Language Wordnet
Semantic relations in wordnet
1. Synonymy
2. Hypernymy / Hyponymy
3. Antonymy
4. Meronymy / Holonymy
5. Gradation
6. Entailment
7. Troponymy
1, 3 and 5 are lexical (word to word); the rest are semantic (synset to synset).
Synset: the foundation (house)
1. house -- (a dwelling that serves as living quarters for one or more families; "he has a house on Cape Cod"; "she felt she had to get out of the house")
2. house -- (an official assembly having legislative powers; "the legislature has two houses")
3. house -- (a building in which something is sheltered or located; "they had a large carriage house")
4. family, household, house, home, menage -- (a social unit living together; "he moved his family to Virginia"; "It was a good Christian household"; "I waited until the whole house was asleep"; "the teacher asked how many people made up his home")
5. theater, theatre, house -- (a building where theatrical performances or motion-picture shows can be presented; "the house was full")
6. firm, house, business firm -- (members of a business organization that owns or operates one or more establishments; "he worked for a brokerage house")
7. house -- (aristocratic family line; "the House of York")
8. house -- (the members of a religious community living together)
9. house -- (the audience gathered together in a theatre or cinema; "the house applauded"; "he counted the house")
10. house -- (play in which children take the roles of father or mother or children and pretend to interact like adults; "the children were playing house")
11. sign of the zodiac, star sign, sign, mansion, house, planetary house -- ((astrology) one of 12 equal areas into which the zodiac is divided)
12. house -- (the management of a gambling house or casino; "the house gets a percentage of every bet")
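Synset - DSF format (2/2)
ID :: 121
CATEGORY :: NOUN
CONCEPT :: अपने से छोटों के प्रति हृदय में उठनेवाला प्रेम (the love that arises in the heart for those younger than oneself)
EXAMPLE :: “चाचा नेहरू को बच्चों से बहुत ही स्नेह था” ("Chacha Nehru had great affection for children")
SYNSET :: स्नेह, नेह, लगाव, ममता (affection, fondness, attachment, tenderness)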
Creation of Synsets
Three principles:
• Minimality
• Coverage
• Replaceability
Gloss and Example
Crucially needed for concept explication, for building a wordnet using another wordnet, and for wordnet linking.
{earthquake, quake, temblor, seism} -- (shaking and vibration at the surface of the earth resulting from underground movement along a fault plane or from volcanic activity)
Semantic Relations
• Hypernymy and Hyponymy
  – Relation between word senses (synsets)
  – X is a hyponym of Y if X is a kind of Y
  – Hyponymy is transitive and asymmetrical
  – Hypernymy is the inverse of Hyponymy
(lion->animal->animate entity->entity)
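Transitivity and asymmetry of hyponymy can be checked by walking the chain in the example. This is a small sketch over that single chain; the table HYPERNYM and both function names are hypothetical, and a real wordnet allows a synset to have more than one hypernym.

```python
# Direct hypernym links taken from the chain on the slide.
HYPERNYM = {
    "lion": "animal",
    "animal": "animate entity",
    "animate entity": "entity",
}


def hypernym_chain(word):
    """All hypernyms of word, nearest first, found by following direct links."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain


def is_hyponym_of(x, y):
    """True if x is a kind of y, directly or through intermediate links."""
    return y in hypernym_chain(x)
```

Because the check follows every link upward, "lion" counts as a hyponym of "entity" (transitivity), while "entity" is not a hyponym of "lion" (asymmetry).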
Semantic Relations (continued)
• Meronymy and Holonymy
  – Part-whole relation: a branch is a part of a tree
  – X is a meronym of Y if X is a part of Y
  – Holonymy is the inverse relation of Meronymy
{kitchen} ………………………. {house}
Lexical Relation
• Antonymy
  – Oppositeness in meaning
  – Relation between word forms
  – Often determined by phonetics, word length etc.
({rise, ascend} vs. {fall, descend})
Gradation
State       | Childhood, Youth, Old age
Temperature | Hot, Warm, Cold
Action      | Sleep, Doze, Wake
WordNet Sub-Graph (English)
Central synset: house, home
Gloss: A place that serves as the living quarters of one or more families
Hypernymy: dwelling, abode
Hyponymy: hermitage, cottage
Meronymy: bedroom, kitchen, guestroom, veranda, backyard, study
WordNet Sub-Graph: Hindi
Central synset: गाय, गऊ (gaaya, gauu) cow
Gloss: सींगवाला एक शाकाहारी मादा चौपाया (siingwaalaa eka shaakaahaarii maadaa chaupaayaa), a horned, herbivorous, four-legged female animal
Hypernym: चौपाया, पशु (chaupaayaa, pashu) four-legged animal
Hyponym: कामधेनु (kaamadhenu) a kind of cow; मैनी गाय (mainii gaaya) a kind of cow
Meronym: थन (thana) udder; पूँछ (puunchh) tail
Attribute: शाकाहारी (shaakaahaarii) herbivorous
Ability Verb: पगुराना (paguraanaa) ruminate
Antonym: बैल (baila) ox
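Wordnet Subgraph (Marathi)
Central synset: झाड, वृक्ष, तरू (tree)
GLOSS: मुळे, खोड, फांद्या, पाने इत्यादींनी युक्त असा वनस्पतिविशेष: "झाडे पर्यावरण शुद्ध करण्याचे काम करतात" (a kind of plant that has roots, a trunk, branches, leaves, etc.: "trees do the work of purifying the environment")
HYPERNYMY: वनस्पती (plant)
HYPONYMY: आंबा (mango), लिंबू (lemon)
MERONYMY: मूळ (root), खोड (trunk)
HOLONYMY: रान (forest), बाग (garden)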
Pan-India Dictionary Standard
Columns: Senses | Hindi | Marathi | Bengali | Oriya | Tamil
Each cell lists the words of that language for the sense, e.g. (W1, W2, W3) or (W1, W2, W3, W4, W5, W6).
Sense: (sun)
  Hindi: (सूर्य, सूरज, भानु, भास्कर, प्रभाकर, दिनकर, अंशुमान, अंशुमाली)
  Marathi: (सूर्य, भानु, दिवाकर, भास्कर, रवि, दिनेश, दिनमणी)
  Bengali, Oriya, Tamil: ... ... ...
Sense: (cub, lad, laddie, sonny, sonny boy)
  Hindi: (लड़का, बालक, बच्चा, छोकड़ा, छोरा, छोकरा, लौंडा)
  Marathi: (मुलगा, पोरगा, पोर, पोरगे)
  Bengali, Oriya, Tamil: … … …
Sense: (son, boy)
  Hindi: (पुत्र, बेटा, लड़का, लाल, सुत, बच्चा, नंदन, पूत, चिरंजीव, चिरंजी)
  Marathi: (मुलगा, पुत्र, लेक, चिरंजीव, तनय)
  Bengali, Oriya, Tamil: … … …
Sanskrit Wordnet: a new effort - a column in the concept-based multilingual dictionary
Concepts | L1 (English) | L2 (Hindi) | L3 (Sanskrit)
Concept ID: Concept description | (W1, W2, W3, ..) | (W4, W5, W6, ..) | (W7, W8, W9, ..)
4066: any of various long-tailed primates (excluding the prosimians)
  L1 (English): (monkey)
  L2 (Hindi): (बंदर, बन्दर, बानर, वानर, कीश, कपि, मर्कट, ..)
  L3 (Sanskrit): (वानरः, कपिः, प्लवङ्गः, प्लवगः, शाखामृगः, वलीमुखः, मर्कटः, ..)
2186: a typical star that is the source of light and heat for the planets in the solar system
  L1 (English): (sun)
  L2 (Hindi): (सूर्य, सूरज, भानु, दिवाकर, भास्कर, प्रभाकर, दिनकर, रवि, ..)
  L3 (Sanskrit): (सूर्यः, सविता, आदित्यः, मित्रः, अरुणः, भानुः, पूषा, अर्कः, ..)
Query Expansion
Acknowledgement: part of the slides borrowed from my Dual Degree Project Student Nishikant Dhanuka
Problem with Keywords
• May not retrieve relevant documents that include synonymous terms
  – “restaurant” vs. “cafe”
  – “India” vs. “Bharat”
• May retrieve irrelevant documents that include ambiguous terms
  – “bat” (baseball vs. mammal)
  – “Apple” (company vs. fruit)
  – “bit” (unit of data vs. act of eating)
Why Search Engines Fail to Search Relevant Documents
Query Expansion
Definition
• adding more terms (keyword spices) to a user’s basic query
Goal
• to improve Precision and/or Recall
Example
• User Query: car
• Expanded Query: car, cars, automobile, automobiles, auto, .. etc
Naïve Methods
• Finding synonyms of query terms and searching for synonyms as well
• Finding various morphological forms of words by stemming each word in the query
• Fixing spelling errors and automatically searching for the corrected form
• Re-weighting the terms in original query
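The first two naïve methods can be combined in a few lines. The sketch below assumes a hand-made synonym table (SYNONYMS) and a deliberately crude plural rule standing in for real morphological analysis; both are hypothetical stand-ins, and a real system would use a thesaurus and a stemmer or lemmatizer instead.

```python
# Hypothetical synonym table; in practice this would come from a thesaurus.
SYNONYMS = {"car": ["automobile", "auto"]}


def plural(word):
    """Crude morphology: append "s". Only meant for this toy example."""
    return word + "s"


def expand(query):
    """Expand each query term with its synonyms and their plural forms."""
    expanded = []
    for term in query.lower().split():
        for base in [term] + SYNONYMS.get(term, []):
            for form in (base, plural(base)):
                if form not in expanded:  # keep first occurrence, preserve order
                    expanded.append(form)
    return expanded
```

For the query "car" this yields car, cars, automobile, automobiles, auto, autos, which mirrors the expanded query in the earlier example.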
Query Expansion Issues
• Two major issues
  – Which terms to include?
  – Which terms to weight more?
• Concept based versus term based QE
  – Is it better to expand based upon the individual terms in the query, or the overall concept of the query?
Objective
To find a suitable set of words that, when added to the basic search query, improves precision without a considerable loss of recall.
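Precision and recall are used throughout without being spelled out on the slides. Under the standard definitions (precision is the fraction of retrieved documents that are relevant, recall is the fraction of relevant documents that are retrieved), they can be computed as in this small sketch. The function name and the document identifiers in the usage note are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, if documents 1 to 4 are retrieved and documents 2, 4 and 5 are relevant, two of the four retrieved documents are relevant (precision 0.5) and two of the three relevant documents are found (recall 2/3). Expanding the query tends to retrieve more documents, which can raise recall while lowering precision.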
Existing QE techniques
• Global methods (static; of all documents in collection)
  – Query expansion
    • Thesauri (or WordNet)
    • Automatic thesaurus generation
• Local methods (dynamic; analysis of documents in result set)
  – Relevance feedback
  – Pseudo relevance feedback
Global Analysis
Thesaurus based QE
• For each term, t, in a query, expand the query with synonyms and related words of t from the thesaurus
  – feline → feline cat
• May weight added terms less than original query terms.
• Generally increases recall.
• May significantly decrease precision, particularly with ambiguous terms.
  – “interest rate” → “interest rate fascinate evaluate”
• There is a high cost of manually producing a thesaurus
  – And of updating it to keep up with scientific changes
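Thesaurus-based expansion with down-weighted added terms might look like the following sketch. The table THESAURUS holds only the two examples from the slide, and the default weight of 0.5 for added terms is an arbitrary assumption, not a recommended value.

```python
# Toy thesaurus built from the examples above.
THESAURUS = {
    "feline": ["cat"],
    "interest": ["fascinate"],
    "rate": ["evaluate"],
}


def expand_weighted(query, added_weight=0.5):
    """Return {term: weight}; original terms get 1.0, added terms get less."""
    terms = query.lower().split()
    weights = {term: 1.0 for term in terms}
    for term in terms:
        for related in THESAURUS.get(term, []):
            # setdefault keeps the full weight if the word was already in the query
            weights.setdefault(related, added_weight)
    return weights
```

The "interest rate" case shows the precision problem: each term is expanded in isolation, so senses that are wrong for the phrase ("fascinate", "evaluate") enter the query. Giving them a lower weight limits the damage but does not remove it.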
Automatically Generated Thesauri
• Attempt to generate a thesaurus automatically by analyzing the collection of documents
• Two main approaches
  – Co-occurrence based (co-occurring words are more likely to be similar)
  – Shallow analysis of grammatical relations
    • Entities that are grown, cooked, eaten, and digested are more likely to be food items.
• Co-occurrence based is more robust, grammatical relations are more accurate.
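A co-occurrence based thesaurus can be sketched by counting how often two terms appear in the same document and ranking neighbours by that count. The four-document collection DOCS is invented for illustration, and raw document-level counts are a simplification; practical systems usually normalize the counts.

```python
from collections import Counter
from itertools import combinations

# Tiny made-up collection, one string per document.
DOCS = [
    "car engine repair",
    "automobile engine repair",
    "car automobile dealer",
    "fruit apple juice",
]


def cooccurrence(docs):
    """Count, for every ordered pair of distinct terms, the documents containing both."""
    counts = Counter()
    for doc in docs:
        terms = sorted(set(doc.split()))
        for a, b in combinations(terms, 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def related(term, docs, k=3):
    """The k terms that co-occur most often with term (ties broken alphabetically)."""
    counts = cooccurrence(docs)
    scored = [(n, other) for (t, other), n in counts.items() if t == term]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:k]]
```

Note that the top neighbour of "engine" here is "repair", a related word rather than a synonym, which is typical of co-occurrence statistics.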
Semantic Network/ Wordnet
• To expand a query, find the word in the semantic network and follow the various arcs to other related words.
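Following arcs in a semantic network amounts to a bounded graph traversal. The sketch below uses a tiny hand-built network; NETWORK, its arc labels and the function expand_via_network are hypothetical, and a real system would read the arcs from WordNet rather than from a literal.

```python
from collections import deque

# word -> list of (relation, neighbour) arcs; toy data for illustration only.
NETWORK = {
    "house": [("hypernym", "dwelling"), ("meronym", "kitchen"), ("hyponym", "cottage")],
    "dwelling": [("hypernym", "housing")],
    "kitchen": [],
    "cottage": [],
    "housing": [],
}


def expand_via_network(word, max_hops=1, relations=None):
    """Collect words reachable from word within max_hops arcs.

    If relations is given, only arcs with those labels are followed.
    """
    seen = {word}
    queue = deque([(word, 0)])
    found = []
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not follow arcs beyond the hop limit
        for relation, neighbour in NETWORK.get(node, []):
            if relations is not None and relation not in relations:
                continue
            if neighbour not in seen:
                seen.add(neighbour)
                found.append(neighbour)
                queue.append((neighbour, hops + 1))
    return found
```

The hop limit and the relation filter are the two knobs that decide which terms to include: more hops or more relation types add more words and usually more noise.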
Global Methods: Summary
• Pros
  – Thesauri and Semantic Networks (WordNet) can be used to find good words for users “more like this”
• Cons
  – Little improvement has been found with automatic techniques to expand query without user intervention
  – Overall, not as useful as Relevance Feedback, may be as good as Pseudo Relevance Feedback
Local Analysis
Relevance Feedback
• Relevance feedback: user feedback on relevance of docs in initial set of results
  – User issues a (short, simple) query
  – The user marks returned documents as relevant or non-relevant.
  – The system computes a better representation of the information need based on feedback.
  – Relevance feedback can go through one or more iterations.
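The slides do not say how the "better representation" is computed. One standard choice is Rocchio's formula, which moves the query vector toward the centroid of the relevant documents and away from the centroid of the non-relevant ones. The sketch below assumes that formula; the function names and the default values of alpha, beta and gamma are illustrative assumptions.

```python
def centroid_weight(term, docs):
    """Average weight of term over docs (0.0 for an empty list)."""
    if not docs:
        return 0.0
    return sum(doc.get(term, 0.0) for doc in docs) / len(docs)


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return an updated {term: weight} query using Rocchio's formula.

    query and every document are sparse vectors stored as {term: weight} dicts.
    Terms whose updated weight is not positive are dropped.
    """
    terms = set(query)
    for doc in relevant + non_relevant:
        terms.update(doc)
    updated = {}
    for term in terms:
        weight = (alpha * query.get(term, 0.0)
                  + beta * centroid_weight(term, relevant)
                  - gamma * centroid_weight(term, non_relevant))
        if weight > 0:
            updated[term] = weight
    return updated
```

Terms that occur in documents marked relevant but not in the original query (such as "nasa" for a satellite query) acquire positive weight, which is how feedback ends up expanding the query.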
Relevance Feedback Example: Initial Query and Top 8 Results
• Query: New space satellite applications
(+ marks a document the user judged relevant)
• + 1. 0.539, 08/13/91, NASA Hasn't Scrapped Imaging Spectrometer
• + 2. 0.533, 07/09/91, NASA Scratches Environment Gear From Satellite Plan
•   3. 0.528, 04/04/90, Science Panel Backs NASA Satellite Plan, But Urges Launches of Smaller Probes
•   4. 0.526, 09/09/91, A NASA Satellite Project Accomplishes Incredible Feat: Staying Within Budget
•   5. 0.525, 07/24/90, Scientist Who Exposed Global Warming Proposes Satellites for Climate Research
•   6. 0.524, 08/22/90, Report Provides Support for the Critics Of Using Big Satellites to Study Climate
•   7. 0.516, 04/13/87, Arianespace Receives Satellite Launch Pact From Telesat Canada
• + 8. 0.509, 12/02/87, Telecommunications Tale of Two Companies
Relevance Feedback Example: Expanded Query
(term weights in the expanded query)
• 2.074 new          15.106 space
• 30.816 satellite   5.660 application
• 5.991 nasa         5.196 eos
• 4.196 launch       3.972 aster
• 3.516 instrument   3.446 arianespace
• 3.004 bundespost   2.806 ss
• 2.790 rocket       2.053 scientist
• 2.003 broadcast    1.172 earth
• 0.836 oil          0.646 measure
Top 8 Results After Relevance Feedback
• + 1. 0.513, 07/09/91, NASA Scratches Environment Gear From Satellite Plan
• + 2. 0.500, 08/13/91, NASA Hasn't Scrapped Imaging Spectrometer
•   3. 0.493, 08/07/89, When the Pentagon Launches a Secret Satellite, Space Sleuths Do Some Spy Work of Their Own
•   4. 0.493, 07/31/89, NASA Uses 'Warm' Superconductors For Fast Circuit
• + 5. 0.492, 12/02/87, Telecommunications Tale of Two Companies
•   6. 0.491, 07/09/91, Soviets May Adapt Parts of SS-20 Missile For Commercial Use
•   7. 0.490, 07/12/88, Gaping Gap: Pentagon Lags in Race To Match the Soviets In Rocket Launchers
•   8. 0.490, 06/14/90, Rescue of Satellite By Space Agency To Cost $90 Million
Relevance Feedback: Problems
Why do most search engines not use relevance feedback?
• Users are often reluctant to provide explicit feedback
• It’s often harder to understand why a particular document was retrieved after applying relevance feedback
Pseudo Relevance Feedback
• Automatic local analysis
• Pseudo relevance feedback attempts to automate the manual part of relevance feedback.
• Retrieve an initial set of relevant documents.
• Assume that top m ranked documents are relevant.
• Do relevance feedback
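The three steps above can be sketched as follows, with the feedback step reduced to adding the most frequent new terms from the top m documents. That simplification, the function name and the parameters m and n_terms are assumptions for illustration; the ranked list is supplied by the caller, since the initial retrieval step is outside this sketch.

```python
from collections import Counter


def pseudo_relevance_feedback(query, ranked_docs, m=2, n_terms=2):
    """Expand query with the n_terms most frequent new terms in the top m documents.

    ranked_docs is the initial result list, best first, one string per document.
    """
    query_terms = query.lower().split()
    counts = Counter()
    for doc in ranked_docs[:m]:  # assume the top m documents are relevant
        counts.update(t for t in doc.lower().split() if t not in query_terms)
    # Most frequent first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return query_terms + [term for term, _ in ranked[:n_terms]]
```

No user input is needed, but if the top m documents are in fact off-topic, their terms are added anyway and the query can drift away from the original information need.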