The Domain-Specific Track at CLEF 2007
Vivien Petras, Stefan Baerisch & Max Stempfhuber
GESIS Social Science Information Centre, Bonn, Germany
Budapest, September 19, 2007
Outline
• The Domain-Specific Task
• Collections & Controlled Vocabularies
• Topics
• Participants, Runs & Relevance Assessments
• Themes
• Summary & Outlook
The Domain-Specific Task
Cross-language information retrieval (CLIR) on structured scientific document collections:
• social science domain
• bibliographic metadata
• controlled vocabularies for subject description

Leverage bibliographic metadata & controlled vocabularies for:
• search
• translation
The Domain-Specific Task
Tasks:
• Monolingual against German, English or Russian
• Bilingual against German, English or Russian
• Multilingual against combined collection
Collections
| | GIRT-DE | GIRT-EN | CSA-SA | ISISS |
|---|---|---|---|---|
| Language | German | English | English | Russian |
| Description | German social science literature & projects | GIRT-DE translated | Sociolog. Abstracts | Inst. of Scientific Inf. for Soc. Sc. of the Ru. Acad. of Science |
| Coverage | 1990-2000 | 1990-2000 | 1994-1996 | |
| Docs | 151,319 | 151,319 | 20,000 | 145,802 |
| Abstracts | 96% | 17% | 94% | 27% |
Controlled Vocabularies
| | GIRT | CSA-SA | ISISS |
|---|---|---|---|
| Descriptors / doc | 10 | 6.4 | 3.9 |
| Class. codes / doc | 2 | 1.3 | n/a |

5 different subject-describing terminologies:
• Thesaurus for the Social Sciences (GIRT-DE, -EN)
• Thesaurus of Sociological Indexing Terms (CSA-SA)
• INION Thesaurus (ISISS)
• Social Sciences Classification (GIRT-DE, -EN)
• Sociological Abstracts Classification (CSA-SA)
Controlled Vocabularies – Mapping Tools
Translation:
• GIRT German – GIRT English

Intellectual term mappings (cross-walks):
• equivalent terms in vocabularies
• GIRT German – CSA-SA English
• GIRT English – CSA-SA English

Example:
original-term: agricultural area
mapped-term: Rural areas
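A cross-walk of this kind can be read as a lookup table from terms in one vocabulary to equivalent terms in another. The following is a minimal sketch under that assumption; the dictionary layout and the function name `map_terms` are hypothetical, and only the "agricultural area" to "Rural areas" pair comes from the slide.

```python
# Hypothetical cross-walk from GIRT English terms to CSA-SA English terms.
# Only this single entry is taken from the slide; the layout is an assumption.
GIRT_EN_TO_CSA = {
    "agricultural area": ["Rural areas"],
}


def map_terms(query_terms, crosswalk):
    """Replace each term by its mapped terms; keep unmapped terms unchanged."""
    mapped = []
    for term in query_terms:
        # Lookup is case-insensitive on the source side.
        mapped.extend(crosswalk.get(term.lower(), [term]))
    return mapped


print(map_terms(["Agricultural area", "migration"], GIRT_EN_TO_CSA))
```

Terms without a mapping are passed through unchanged here, which is one possible fallback; dropping them or translating them by other means would be equally plausible choices.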
Topics
25 topics in standard TREC format (title, desc, narr):
• 15 volunteers (social scientists)
• 2-5 suggestions from 28 subject specialties
• checked for:
  • coverage in collections
  • variance from previous years
• translated into English, Russian
Participants
5 groups:

| Group | Institution | Country |
|---|---|---|
| Chemnitz | Media Informatics, Chemnitz University of Technology | Germany |
| Cheshire | School of Information, UC Berkeley | USA |
| Moscow | Moscow State University | Russia |
| Unine | Computer Science Department, University of Neuchatel | Switzerland |
| Xerox | Data Mining Group, Xerox Research Centre Europe | France |
Runs
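| Task | Runs 2007 | Runs 2006 | Runs 2005 |
|---|---|---|---|
| Monolingual against German | 13 | 13 | 17 |
| Monolingual against English | 15 | 8 | 15 |
| Monolingual against Russian | 11 | 1 | 8 |
| Bilingual against German | 14 | 6 | 15 |
| Bilingual against English | 15 | 3 | 13 |
| Bilingual against Russian | 9 | 3 | 5 |
| Multilingual | 9 | 2 | 3 |
| Total | 86 | 36 | 76 |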
Relevance Assessments
| | German | English | Russian |
|---|---|---|---|
| Pool size | 16,288 | 17,867 | 14,473 |
| Rel. docs 2007 | 22% | 25% | 10%* |
| Rel. docs 2006 | 39% | 26% | n/a |
| Rel. docs 2005 | 20% | 21% | 9% (RSSC) |

* In the Russian collection, 3 topics had no relevant documents.

All assessments were done with the University of Padova's DIRECT system.
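The pool sizes above count the documents handed to assessors. In pooled evaluations the pool is usually the union of the top-ranked documents from all submitted runs. The sketch below illustrates that general technique only; the pool depth, the run layout and the name `build_pool` are assumptions, since the slides do not state how the pools were built.

```python
def build_pool(runs, depth):
    """Union of the top-`depth` documents per topic over all runs.

    runs: list of dicts mapping topic id -> ranked list of document ids.
    Returns a dict mapping topic id -> set of document ids to assess.
    """
    pool = {}
    for run in runs:
        for topic, ranking in run.items():
            pool.setdefault(topic, set()).update(ranking[:depth])
    return pool


# Two made-up runs for one made-up topic.
run_a = {"T1": ["d1", "d2", "d3"]}
run_b = {"T1": ["d2", "d4", "d5"]}
pool = build_pool([run_a, run_b], depth=2)
pool_size = sum(len(docs) for docs in pool.values())
print(pool, pool_size)
```

Because documents retrieved by several runs are counted once, the pool grows more slowly than the number of runs.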
Relevance Assessments – Best MAP
| Task | MAP 2007 | MAP 2006 | MAP 2005 |
|---|---|---|---|
| Monolingual against German | 0.5051 | 0.5454 | 0.4936 |
| Monolingual against English | 0.3534 | 0.4576 | 0.5065 |
| Monolingual against Russian | 0.1971 | 0.2542 | 0.3038 |
| Bilingual against German | 0.4568 (90%) | 0.2448 (45%) | 0.4201 (85%) |
| Bilingual against English | 0.3341 (95%) | 0.3301 (72%) | 0.4743 (94%) |
| Bilingual against Russian | 0.1348 (68%) | 0.1648 (62%) | 0.2331 (77%) |
| Multilingual | 0.0884 | 0.0753 | 0.0532 |
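The percentages next to the bilingual scores appear to give the best bilingual MAP as a share of the best monolingual MAP for the same target language. This reading is an assumption, but it reproduces the 2007 column. A short sketch of that arithmetic, using the 2007 values from the table (the names `best_map_2007` and `bilingual_share` are illustrative):

```python
# Best MAP values copied from the 2007 column of the table.
best_map_2007 = {
    "German": {"monolingual": 0.5051, "bilingual": 0.4568},
    "English": {"monolingual": 0.3534, "bilingual": 0.3341},
    "Russian": {"monolingual": 0.1971, "bilingual": 0.1348},
}


def bilingual_share(scores):
    """Bilingual MAP as a rounded percentage of monolingual MAP."""
    return round(100 * scores["bilingual"] / scores["monolingual"])


shares = {lang: bilingual_share(s) for lang, s in best_map_2007.items()}
print(shares)
```

The same function can be applied to the 2006 and 2005 columns to compare the recomputed shares with the percentages shown on the slide.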
Themes – Retrieval Models
• Lucene
• Language Modelling
• Logistic Regression
• Comparison: Vector Space, LM, Probabilistic (Okapi, DFR)
• Data fusion
• Russian
  • word-based vs. N-gram retrieval
  • new light-weight stemmer
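N-gram retrieval indexes overlapping character sequences instead of whole words, which avoids the need for a language-specific stemmer. The sketch below shows one simple way to produce such n-grams; the n-gram length of 4 and the handling of short words are assumptions, not the settings used by any participant.

```python
def char_ngrams(text, n=4):
    """Split each whitespace-separated word into overlapping character n-grams.

    Words of length n or shorter are kept whole.
    """
    grams = []
    for word in text.lower().split():
        if len(word) <= n:
            grams.append(word)
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


# Works on any script, including Cyrillic.
print(char_ngrams("социальная политика", n=4))
print(char_ngrams("social policy", n=4))
```

Word-based retrieval would index the two words as they are (or their stems); the n-gram variant indexes every window of four characters inside each word.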
Themes – Query Expansion
Entry Vocabulary Modules
• query terms associated with thesaurus terms from documents

Thesaurus Lookup
• combined thesaurus from all CVs
• GIRT Thesaurus Index

Lexical Entailment
• find document terms in relation to query terms

Blind Feedback
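Blind (pseudo-relevance) feedback treats the top-ranked documents of a first retrieval pass as relevant and adds terms from them to the query. The following is a minimal sketch of that idea, assuming raw term frequency as the selection criterion; the function name, parameters and sample documents are made up, and participants will have used their own term weighting.

```python
from collections import Counter


def blind_feedback(query_terms, ranked_docs, top_docs=2, new_terms=2):
    """Expand a query with frequent terms from the top-ranked documents.

    ranked_docs: list of token lists, best-ranked document first.
    """
    counts = Counter()
    for doc in ranked_docs[:top_docs]:
        # Count only terms that are not already part of the query.
        counts.update(t for t in doc if t not in query_terms)
    expansion = [term for term, _ in counts.most_common(new_terms)]
    return list(query_terms) + expansion


# Made-up first-pass result list, already tokenized.
docs = [
    ["migration", "labour", "market", "labour"],
    ["migration", "labour", "market", "policy"],
    ["sports", "clubs"],
]
print(blind_feedback(["migration"], docs))
```

Only the first two documents contribute here, so terms from the third, lower-ranked document never enter the expanded query.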
Themes – Translation
Lucene plug-in
• Babelfish, Google, PROMT, Reverso

Bilingual thesaurus mapping

Dictionary adaptation
• disambiguate term translation given language context of feedback documents

Statistical machine translation
• MATRAX

Commercial Software
Summary & Outlook
Extension of Russian materials
• Translation table DE-EN-RU for GIRT Thesaurus
• Translation table RU-EN for INION Thesaurus
• Mapping between GIRT – INION Thesaurus

More tools for terminology mapping
• different relationships (0T, SYN, BT, NT, RT)
• GESIS-IZ project: > 40 mappings
  • 25 controlled vocabularies / 11 disciplines
  • ~ 125,000 terms & phrases
  • ~ 400,000 relations
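Mappings with different relationship types can be stored as typed triples, so that a search system can decide which relations to follow when expanding or translating a query. The sketch below assumes the usual thesaurus readings of SYN (synonym), BT (broader term), NT (narrower term) and RT (related term). The sample entries, the relation types assigned to them and the name `targets` are hypothetical and do not come from the GESIS-IZ mapping tables.

```python
from collections import namedtuple

# One typed relation between a term in a source vocabulary and a term in a
# target vocabulary.
Mapping = namedtuple("Mapping", ["source", "relation", "target"])

# Illustrative entries only; the relation types assigned here are assumptions.
MAPPINGS = [
    Mapping("agricultural area", "SYN", "Rural areas"),
    Mapping("agricultural area", "BT", "Areas"),
    Mapping("agricultural area", "RT", "Agriculture"),
]


def targets(term, mappings, relations=("SYN",)):
    """Return target terms for `term`, restricted to the given relation types."""
    return [m.target for m in mappings
            if m.source == term and m.relation in relations]


print(targets("agricultural area", MAPPINGS))
print(targets("agricultural area", MAPPINGS, relations=("SYN", "RT")))
```

Restricting expansion to SYN keeps the query precise, while adding RT or BT trades precision for recall.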
Domain-Specific Track: http://www.gesis.org/en/research/information_technology/clef_ds_2007.htm
Vocabulary Mappings: http://www.gesis.org/en/research/information_technology/komohe.htm
Email: [email protected]