Microsoft Research India’s Participation in FIRE2008
Raghavendra Udupa
![Page 2: Microsoft Research India’s Participation in FIRE2008](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148f3550346895db61204/html5/thumbnails/2.jpg)
CLIR System
[Figure: CLIR system architecture. Components: Query Translator, Dictionary, Document Ranker, Inverted Index, document collection (LA Times 2002 articles).]
Example: CLEF’07 Query #10.2452/447-AH
- Query: ऐसे दस्तावेज खोजिए जिनमें पिम फोरत्यून के राजनैतिक विचारों पर चर्चा की गई हो। ("Find documents that discuss the political views of Pim Fortuyn.")
- Title: पिम फोरत्यून की राजनीति ("Pim Fortuyn's politics")
- Translated query: Pim Fortuyn politics
![Page 3: Microsoft Research India’s Participation in FIRE2008](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148f3550346895db61204/html5/thumbnails/3.jpg)
[Figure: the CLIR system (Query Translator, Dictionary, Document Ranker, Inverted Index, Document Collection), annotated with research directions.]
- Domain Adaptation
- Mining Translation Lexicon from Comparable Corpora
- Mining transliterations of OOV words
- Cross-Language Ranking Model
- Mining NETE Transliterations from Comparable Corpora
![Page 4: Microsoft Research India’s Participation in FIRE2008](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148f3550346895db61204/html5/thumbnails/4.jpg)
[Figure: the CLIR system (Query Translator, Dictionary, Document Ranker, Inverted Index, Document Collection), annotated with research directions and publication venues.]
- Domain Adaptation
- Mining transliterations of OOV terms (ECIR 2009)
- Cross-Language Ranking Models
- Mining NETE Transliterations from Comparable Corpora (CIKM’08)
- Mining Translation Lexicon from Comparable Corpora (MT Summit 2007)
![Page 5: Microsoft Research India’s Participation in FIRE2008](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148f3550346895db61204/html5/thumbnails/5.jpg)
Baseline Retrieval System
Language Model-Based Retrieval
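$$
\begin{aligned}
\mathrm{Score}(q_S, d_T) &= \sum_{w_T} P(w_T \mid q_S)\, \log P(w_T \mid d_T) \\
&= \sum_{w_T} \sum_{w_S} P(w_S \mid q_S)\, P(w_T \mid w_S)\, \log P(w_T \mid d_T)
\end{aligned}
$$
Here $q_S$ is the source-language query, $d_T$ is a target-language document, and $w_S$, $w_T$ range over source and target terms.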
Probabilistic Translation Lexicon
- ~100K parallel sentences
- IBM Model 3 alignment (GIZA++)
J. Jagarlamudi and A. Kumaran. Cross-Lingual Information Retrieval System for Indian Languages. Working Notes for the CLEF 2007 Workshop.
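A minimal sketch of how the baseline score could be computed, assuming a toy probabilistic lexicon stored as nested dictionaries. The function names, the `background` collection model, and the Jelinek-Mercer smoothing with weight `lam` are illustrative assumptions; the slide does not say how P(w_T | d_T) is smoothed.

```python
import math
from collections import Counter, defaultdict


def translate_query(query_terms, lexicon):
    """P(w_T | q_S) = sum over w_S of P(w_T | w_S) * P(w_S | q_S)."""
    counts = Counter(query_terms)
    total = sum(counts.values())
    p_target = defaultdict(float)
    for w_s, c in counts.items():
        p_ws = c / total  # maximum-likelihood estimate of P(w_S | q_S)
        # Source terms missing from the lexicon (OOV terms) contribute nothing.
        for w_t, p_trans in lexicon.get(w_s, {}).items():
            p_target[w_t] += p_trans * p_ws
    return dict(p_target)


def score(query_terms, doc_terms, lexicon, background, lam=0.5, floor=1e-9):
    """Score(q_S, d_T) = sum over w_T of P(w_T | q_S) * log P(w_T | d_T)."""
    doc_counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    result = 0.0
    for w_t, p_q in translate_query(query_terms, lexicon).items():
        p_ml = doc_counts[w_t] / doc_len if doc_len else 0.0
        # Assumed smoothing: mix the document model with a background model.
        p_d = (1.0 - lam) * p_ml + lam * background.get(w_t, floor)
        result += p_q * math.log(p_d)
    return result
```

Note that a query term absent from the lexicon drops out of the score entirely, which is the out-of-vocabulary problem the rest of the talk addresses.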
![Page 6: Microsoft Research India’s Participation in FIRE2008](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148f3550346895db61204/html5/thumbnails/6.jpg)
FIRE Fighting
Mining Transliterations of Out-Of-Vocabulary Query Terms.
Date-Based Document Restriction.
![Page 7: Microsoft Research India’s Participation in FIRE2008](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148f3550346895db61204/html5/thumbnails/7.jpg)
Mining Transliterations of Out-Of-Vocabulary Query Terms
Raghavendra Udupa
![Page 8: Microsoft Research India’s Participation in FIRE2008](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148f3550346895db61204/html5/thumbnails/8.jpg)
OOV Query Terms
- Many OOV query terms are NEs (named entities).
  - NEs are often the focus of a query.
  - NEs form an open class of terms in all languages.
  - Getting their transliterations right is extremely important.
- Many OOV query terms are not NEs but transliterations of English words, e.g. सेमिनार (seminar), कॉर्पोरेशन (corporation), चैम्पियन (champion), फिल्म (film).
![Page 9: Microsoft Research India’s Participation in FIRE2008](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148f3550346895db61204/html5/thumbnails/9.jpg)
A Hypothesis
The transliterations of most of the transliteratable OOV terms of a query can be found in documents relevant to the query.
![Page 10: Microsoft Research India’s Participation in FIRE2008](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148f3550346895db61204/html5/thumbnails/10.jpg)
Empirical Validation

| Collection | Transliteratable OOV terms | Terms with transliterations in at least one relevant document | Terms with transliterations in at least 50% of relevant documents |
|---|---|---|---|
| CLEF 2006 (Hindi) | 62 | 58 (94%) | 49 (79%) |
| CLEF 2007 (Hindi) | 47 | 42 (89%) | 34 (72%) |
| CLEF 2007 (Tamil) | 43 | 42 (98%) | 39 (89%) |
![Page 11: Microsoft Research India’s Participation in FIRE2008](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148f3550346895db61204/html5/thumbnails/11.jpg)
A Practical Hypothesis
The transliterations of many of the transliteratable OOV terms of a query can be found in the top results of the CLIR system for the query.
![Page 12: Microsoft Research India’s Participation in FIRE2008](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148f3550346895db61204/html5/thumbnails/12.jpg)
Mining OOV Transliteration Equivalents
Basic idea:
- Pair the query with each of the top N results.
- Treat each pair as a comparable document pair.
- Mine transliteration equivalents from the comparable document pairs.
"They are out there, if you know where to look": Mining Transliterations of OOV Query Terms for Cross-Language Information Retrieval. ECIR 2009, Toulouse.
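The basic idea can be sketched as a loop over the top N results. This is only an illustration: the `similarity` function below is a placeholder (character overlap on romanized strings via `difflib`), not the transliteration similarity model of the ECIR 2009 paper, and the function names and `threshold` value are assumptions.

```python
from difflib import SequenceMatcher


def similarity(term, word):
    """Placeholder similarity on romanized strings (an assumption, not the paper's model)."""
    return SequenceMatcher(None, term.lower(), word.lower()).ratio()


def mine_transliterations(oov_terms, top_docs, threshold=0.7):
    """Map each OOV query term to its best-matching word in the top N results.

    oov_terms: romanized OOV terms from the query.
    top_docs:  the top N retrieved documents, each a list of tokens.
    """
    mined = {}
    for term in oov_terms:
        best_word, best_score = None, 0.0
        # Each (query, document) pair is treated as a comparable document pair.
        for doc in top_docs:
            for word in doc:
                s = similarity(term, word)
                if s > best_score:
                    best_word, best_score = word, s
        if best_word is not None and best_score >= threshold:
            mined[term] = best_word
    return mined
```

The mined pairs would then be added to the query translation and the query re-run, so that previously untranslated terms take part in ranking.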
![Page 13: Microsoft Research India’s Participation in FIRE2008](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148f3550346895db61204/html5/thumbnails/13.jpg)
Long Queries: MAP

| Collection | Baseline | Transliterations Mining | % change over baseline |
|---|---|---|---|
| CLEF 2006 (Hindi) | 0.1463 | 0.2476 | +69.24* |
| CLEF 2007 (Hindi) | 0.2521 | 0.3389 | +34.43* |
| CLEF 2007 (Tamil) | 0.1848 | 0.2270 | +22.84* |
![Page 14: Microsoft Research India’s Participation in FIRE2008](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148f3550346895db61204/html5/thumbnails/14.jpg)
Short Queries: MAP

| Collection | Baseline | Transliterations Mining | % change over baseline |
|---|---|---|---|
| CLEF 2006 (Hindi) | 0.0877 | 0.1467 | 67.3 |
| CLEF 2007 (Hindi) | 0.1829 | 0.2323 | 27.0 |
| CLEF 2007 (Tamil) | 0.1024 | 0.1265 | 23.5 |
![Page 15: Microsoft Research India’s Participation in FIRE2008](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148f3550346895db61204/html5/thumbnails/15.jpg)
FIRE 2008: MAP

| Queries | Baseline | Transliterations Mining | % change over baseline |
|---|---|---|---|
| Short (unofficial) | 0.2616 | 0.3191 | 22 |
| Long (unofficial) | 0.4351 | 0.4871 | 12 |
| Long (official) | 0.4140 | 0.4526 | 9 |
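The "% change over baseline" column is the relative MAP improvement. A small sketch that recomputes it from the FIRE 2008 figures above (the function and variable names are illustrative):

```python
def percent_change(baseline, improved):
    """Relative change of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


# (baseline MAP, MAP with transliterations mining), taken from the table above
fire2008_map = {
    "Short (unofficial)": (0.2616, 0.3191),
    "Long (unofficial)": (0.4351, 0.4871),
    "Long (official)": (0.4140, 0.4526),
}

for run, (baseline, mined) in fire2008_map.items():
    print(f"{run}: {percent_change(baseline, mined):.0f}%")
```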
![Page 16: Microsoft Research India’s Participation in FIRE2008](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148f3550346895db61204/html5/thumbnails/16.jpg)
FIRE2008: MAP Difference (Long, official)
[Figure: per-query MAP difference, HE0121 - HE0120. x-axis: Query Number (26 to 74); y-axis: MAP Difference (-0.10 to 0.70).]
![Page 17: Microsoft Research India’s Participation in FIRE2008](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148f3550346895db61204/html5/thumbnails/17.jpg)
FIRE 2008: Num_Rel_Ret

| Queries | Baseline | Transliterations Mining |
|---|---|---|
| Short (unofficial) | 70.60 | 80.0 |
| Long (unofficial) | 84.55% | 88.54% |
| Long (official) | 79.68% | 82.11% |
![Page 18: Microsoft Research India’s Participation in FIRE2008](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148f3550346895db61204/html5/thumbnails/18.jpg)
FIRE 2008: P@10

| Queries | Baseline | Transliterations Mining |
|---|---|---|
| Short (unofficial) | 0.1000 | 0.4320 |
| Long (unofficial) | 0.6260 | 0.6540 |
| Long (official) | 0.6120 | 0.6480 |
![Page 19: Microsoft Research India’s Participation in FIRE2008](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148f3550346895db61204/html5/thumbnails/19.jpg)
Mining Transliterations @ FIRE2008 Worked.
![Page 20: Microsoft Research India’s Participation in FIRE2008](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148f3550346895db61204/html5/thumbnails/20.jpg)
Date-Based Document Restriction
Raghavendra Udupa
![Page 21: Microsoft Research India’s Participation in FIRE2008](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148f3550346895db61204/html5/thumbnails/21.jpg)
Dates
Some queries contain dates:
- CLEF 2007, Topic 407: Who was the Australian Prime Minister in 2002?
- CLEF 2007, Topic 411: …terrorist car bomb in Bali, Indonesia, in 2002.
- CLEF 2006, Topic 326: …winners in any category of the 1995 Emmy Awards.
- CLEF 2006, Topic 327: …earthquakes in Mexico City in 1995.
![Page 22: Microsoft Research India’s Participation in FIRE2008](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148f3550346895db61204/html5/thumbnails/22.jpg)
Hypothesis
If a query contains a date then the relevant documents for the query are likely to be from the same time period.
![Page 23: Microsoft Research India’s Participation in FIRE2008](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148f3550346895db61204/html5/thumbnails/23.jpg)
Empirical Validation
- CLEF’07: LA Times 2002
- CLEF’06: GH 95 (Glasgow Herald 1995), LA Times 1994
![Page 24: Microsoft Research India’s Participation in FIRE2008](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148f3550346895db61204/html5/thumbnails/24.jpg)
CLEF’06: C327
Title: Earthquakes in Mexico City
Description: Find documents that provide details on the impact of or the damage caused by earthquakes in Mexico City in 1995.
Narrative: Relevant document should contain some information on earthquakes in Mexico City in 1995, such as their magnitude, damages caused, panic of the inhabitants, etc. Documents on earthquakes in other places in Mexico are not relevant unless the seismic impact was also felt in Mexico City.
![Page 25: Microsoft Research India’s Participation in FIRE2008](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148f3550346895db61204/html5/thumbnails/25.jpg)
Relevant Document
<DOCNO> LA121194-0313 </DOCNO>
<DOCID> 107228 </DOCID>
December 11, 1994, Sunday, Home Edition
A magnitude 6.3 earthquake rocked Mexico City, causing people to flee their homes in fear. There were no immediate reports of injuries or severe damage. The U.S. Geological Survey's National Earthquake Information Center in Golden, Colo., said the quake's epicenter was in Petatlan in the southwestern state of Guerrero.
![Page 26: Microsoft Research India’s Participation in FIRE2008](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148f3550346895db61204/html5/thumbnails/26.jpg)
Date-Based Document Restriction
- Identify dates (if any) in the query.
- Restrict candidate documents to the set of documents coming from the same time period.
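A minimal sketch of the two steps, assuming that dates in the query are four-digit years and that "same time period" means the same calendar year. Both are simplifications made for illustration; the function names and the second document ID in the usage example are hypothetical.

```python
import re
from datetime import date

# Assumption: only four-digit years (1900-2099) are recognized as dates.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def query_years(query):
    """Step 1: identify the years mentioned in the query, if any."""
    return {int(match.group(0)) for match in YEAR_RE.finditer(query)}


def restrict_by_date(query, docs):
    """Step 2: keep only documents published in a year named in the query.

    docs: list of (doc_id, publication_date) pairs.
    Queries without a date leave the candidate set unchanged.
    """
    years = query_years(query)
    if not years:
        return list(docs)
    return [(doc_id, d) for doc_id, d in docs if d.year in years]
```

Usage with the CLEF’06 topic C327 example:

```python
docs = [
    ("LA121194-0313", date(1994, 12, 11)),  # the relevant document shown earlier
    ("doc-1995", date(1995, 9, 15)),        # hypothetical document ID
]
kept = restrict_by_date("earthquakes in Mexico City in 1995", docs)
```

Under these assumptions the December 1994 document is filtered out even though it was judged relevant, which shows how a strict restriction can remove relevant documents.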
![Page 27: Microsoft Research India’s Participation in FIRE2008](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148f3550346895db61204/html5/thumbnails/27.jpg)
FIRE 2008: Relevant Docs

| Topic | Relevant docs from different time period |
|---|---|
| 44 | 11/56 |
| 47 | 23/32 |
| 48 | 70/76 |
| 50 | 18/61 |
| 52 | 2/38 |
| 73 | 10/53 |
![Page 28: Microsoft Research India’s Participation in FIRE2008](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148f3550346895db61204/html5/thumbnails/28.jpg)
FIRE 2008: Hindi-English MAP

| Queries | Without DR | With DR |
|---|---|---|
| Short | 0.2616 (unofficial) | 0.2601 (unofficial) |
| Long | 0.4351 (unofficial) | 0.4140 (official) |
![Page 29: Microsoft Research India’s Participation in FIRE2008](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148f3550346895db61204/html5/thumbnails/29.jpg)
Date-Based Document Restriction @ FIRE2008 Hurt us. Deeper investigation needed.