![Page 1: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/1.jpg)
XML Retrieval: A content-oriented perspective
Mounia Lalmas
Department of Computer Science
Queen Mary, University of London
![Page 2: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/2.jpg)
Outline
Part I - Content-oriented XML retrieval
Part II - Evaluating content-oriented XML retrieval
![Page 3: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/3.jpg)
XML Retrieval: Motivation
XML is able to represent a mixture of “structured” and text (“unstructured”) information.
XML applications: digital libraries, content management.
XML repositories: IEEE INEX collection, LexisNexis, the Library of Congress collection.
![Page 4: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/4.jpg)
XML Retrieval: DB and IR views
Data-centric view (DB)—XML as exchange format for structured data
Document-centric view (IR)—XML as format for representing the logical structure of documents
Now increasingly both views (DB+IR)
![Page 5: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/5.jpg)
Data Centric XML Documents: Example
<CLASS name="DCS317" num_of_std="100">
  <LECTURER lecid="111">Thomas</LECTURER>
  <STUDENT marks="70" origin="Oversea">
    <NAME>Tassos</NAME>
  </STUDENT>
  <STUDENT marks="30" origin="EU">
    <NAME>Christof</NAME>
  </STUDENT>
</CLASS>
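A minimal sketch of why this document is "data-centric": every value sits in a well-defined attribute or element, so it can be queried like a database record. The function name `students_with_marks_at_least` and the threshold are illustrative assumptions, not part of the original slides.

```python
import xml.etree.ElementTree as ET

# Data-centric example from the slide, written with plain ASCII quotes.
DATA_CENTRIC = """
<CLASS name="DCS317" num_of_std="100">
  <LECTURER lecid="111">Thomas</LECTURER>
  <STUDENT marks="70" origin="Oversea"><NAME>Tassos</NAME></STUDENT>
  <STUDENT marks="30" origin="EU"><NAME>Christof</NAME></STUDENT>
</CLASS>
"""


def students_with_marks_at_least(xml_text, threshold):
    """Return the names of students whose 'marks' attribute is >= threshold."""
    root = ET.fromstring(xml_text)
    return [
        student.findtext("NAME")
        for student in root.findall("STUDENT")
        if int(student.get("marks")) >= threshold
    ]


print(students_with_marks_at_least(DATA_CENTRIC, 50))
```

The query is exact (a numeric comparison on an attribute), which is the DB view of XML rather than the IR view.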
![Page 6: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/6.jpg)
Document Centric XML Documents: Example
<CLASS name="DCS317" num_of_std="100">
  <LECTURER lecid="111">Thomas</LECTURER>
  <STUDENT studid="007">
    <NAME>James Bond</NAME> is the best student in the class. He scored <INTERM>95</INTERM> points out of <MAX>100</MAX>. His presentation of <ARTICLE>Using Materialized Views in Data Warehouse</ARTICLE> was brilliant.
  </STUDENT>
  <STUDENT studid="131">
    <NAME>Donald Duck</NAME> is not a very good student. He scored <INTERM>20</INTERM> points…
  </STUDENT>
</CLASS>
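A small sketch, assuming Python's standard `xml.etree.ElementTree`, of what "document-centric" means in practice: the element mixes running text with inline mark-up, so the same component can be read as free text (for IR) or probed for individual tagged values. The helper `element_text` and the shortened `STUDENT` snippet are illustrative only.

```python
import xml.etree.ElementTree as ET

# Shortened version of the first STUDENT element from the slide.
STUDENT = (
    '<STUDENT studid="007"><NAME>James Bond</NAME> is the best student in the '
    'class. He scored <INTERM>95</INTERM> points out of <MAX>100</MAX>.</STUDENT>'
)


def element_text(elem):
    """Concatenate all text inside an element, ignoring the mark-up."""
    return " ".join("".join(elem.itertext()).split())


student = ET.fromstring(STUDENT)
print(element_text(student))        # the component as plain text
print(student.findtext("INTERM"))   # one tagged value inside the text
```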
![Page 7: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/7.jpg)
Content-oriented XML retrieval
Traditional IR is about finding documents relevant to a user’s information need, e.g. an entire book.
XML retrieval allows users to retrieve document components (elements) that are more focussed on their information needs, e.g. a chapter, a page, or several paragraphs of a book instead of the entire book.
The structure of documents is exploited to identify which document components to retrieve.
• Structure improves precision
• Exploit visual memory
![Page 8: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/8.jpg)
[Figure: a book decomposed into chapters, sections and subsections, illustrated with a sample page of text about the World Wide Web]
XML retrieval allows users to retrieve document components that are more focussed, e.g. a subsection of a book instead of an entire book.
SEARCHING = QUERYING + BROWSING
Content-oriented XML retrieval
![Page 9: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/9.jpg)
Focussed retrieval: Scientific Collection
Query: model checking aviation systems
Answer: one section in a workshop report
![Page 10: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/10.jpg)
Focussed Retrieval: Encyclopedia
Information need: volcanic eruption prediction
Answer: relatively small portion of the volcano topic
![Page 11: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/11.jpg)
Focussed retrieval: Technical Manual
Query: segmentation fault windows services for unix
Answer: only a single paragraph in a long manual
![Page 12: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/12.jpg)
XML: eXtensible Mark-up Language
Meta-language (user-defined tags) currently being adopted as the document format language by W3C
Used to describe content and structure (and not layout)
Grammar described in a DTD (used for validation)
<lecture>
  <title> Structured Document Retrieval </title>
  <author> <fnm> Smith </fnm> <snm> John </snm> </author>
  <chapter>
    <title> Introduction into SDR </title>
    <paragraph> …. </paragraph>
    …
  </chapter>
  …
</lecture>

<!ELEMENT lecture (title, author+, chapter+)>
<!ELEMENT author (fnm*, snm)>
<!ELEMENT fnm (#PCDATA)>
…
![Page 13: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/13.jpg)
XML: eXtensible Mark-up Language
Use of XPath notation to refer to the XML structure
• chapter/title: title is a direct sub-component of chapter
• //title: any title
• chapter//title: title is a direct or indirect sub-component of chapter
• chapter/paragraph[2]: the second direct paragraph of any chapter
• chapter/*: all direct sub-components of a chapter

<lecture>
  <title> Structured Document Retrieval </title>
  <author> <fnm> Smith </fnm> <snm> John </snm> </author>
  <chapter>
    <title> Introduction into SDR </title>
    <paragraph> …. </paragraph>
    …
  </chapter>
  …
</lecture>
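The path expressions can be tried out with the XPath subset supported by Python's standard `xml.etree.ElementTree`. This is only a sketch: the paragraph texts and the nested `section` element are invented to fill the parts elided in the slide, and paths are evaluated relative to the `lecture` root (so "any title" is written `.//title`).

```python
import xml.etree.ElementTree as ET

# Lecture example with hypothetical content in place of the elided parts.
LECTURE = """
<lecture>
  <title>Structured Document Retrieval</title>
  <chapter>
    <title>Introduction into SDR</title>
    <paragraph>First paragraph</paragraph>
    <paragraph>Second paragraph</paragraph>
    <section><title>A nested title</title></section>
  </chapter>
</lecture>
"""
root = ET.fromstring(LECTURE)


def texts(path):
    """Return the text of every element matched by a path from the root."""
    return [elem.text for elem in root.findall(path)]


print(texts("chapter/title"))         # direct title of a chapter
print(texts(".//title"))              # any title
print(texts("chapter//title"))        # direct or indirect title of a chapter
print(texts("chapter/paragraph[2]"))  # second direct paragraph of a chapter
print(len(root.findall("chapter/*")))  # number of direct sub-components
```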
![Page 14: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/14.jpg)
XML Queries
Content-only (CO) queries
• Standard IR queries, but here we are retrieving document components
• E.g. “Wine tasting in Granada”
Structure-only queries
• Usually not that useful from an IR perspective
• E.g. “Paragraph containing a diagram next to a table”
Content-and-structure (CAS) queries
• Put constraints on which types of components are to be retrieved
• E.g. “Articles that contain sections about hotels in Granada, and that contain a picture of Alhambra, and return titles of these articles”
• Where to look (support elements), what to return (target elements)
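A toy sketch of the support/target distinction in a CAS query. The collection, the `about` test (all query terms must occur in the element text) and the function `cas_titles` are assumptions made for illustration; real systems rank elements rather than applying a Boolean test.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-article collection.
ARTICLES = """
<collection>
  <article><title>Visiting Granada</title>
    <sec>Hotels in Granada near the Alhambra</sec>
    <sec>Getting there by train</sec></article>
  <article><title>Wine regions</title>
    <sec>Wine tasting in Rioja</sec></article>
</collection>
"""


def about(elem, query):
    """Toy aboutness test: every query term occurs in the element text."""
    words = set("".join(elem.itertext()).lower().split())
    return all(term in words for term in query.lower().split())


def cas_titles(xml_text, support_path, query, target_path):
    """Look in support elements, return the target element of matching articles."""
    root = ET.fromstring(xml_text)
    return [
        article.findtext(target_path)
        for article in root.findall("article")
        if any(about(s, query) for s in article.findall(support_path))
    ]


print(cas_titles(ARTICLES, "sec", "hotels granada", "title"))
```

Here `sec` plays the role of the support element (where to look) and `title` the target element (what to return).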
![Page 15: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/15.jpg)
Content-oriented XML retrieval
Return document components at the right level of granularity (e.g. a book, a chapter, a section, a paragraph, a table, a figure, etc.), relevant to the user’s information need with regard to content and structure.
SEARCHING = QUERYING + BROWSING
![Page 16: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/16.jpg)
Right level of granularity: The challenge
Query: wordnet information retrieval
![Page 17: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/17.jpg)
(Simplified) Conceptual model
[Figure: conceptual model of retrieval]
• Documents (structured documents) → Indexing (tf, idf, …) → Document representation (inverted file + structure index)
• Query (content + structure) → Formulation → Query representation
• Retrieval function: matching content + structure
• Retrieval results: presentation of related components
![Page 18: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/18.jpg)
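Challenge 1: term weights
[Figure: an Article node whose weights for XML, retrieval and authoring are unknown (?XML, ?retrieval, ?authoring), with three components: Title (0.9 XML), Section 1 (0.5 XML, 0.4 retrieval), Section 2 (0.2 XML, 0.7 authoring)]
How to obtain document and collection statistics (e.g. tf, idf)?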
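One possible answer, sketched under strong assumptions: treat every element as its own retrieval unit and compute tf and idf over elements instead of whole documents. The element texts and the plain tf × log(N/ef) weighting below are illustrative choices, not the method of any particular INEX system.

```python
import math
from collections import Counter

# Hypothetical element texts, keyed by element path.
ELEMENTS = {
    "article/title": "XML retrieval",
    "article/sec[1]": "XML retrieval models for XML documents",
    "article/sec[2]": "authoring tools",
}


def tf_idf(elements):
    """Weight each term per element: relative tf times log(N / element frequency)."""
    tokens = {path: text.lower().split() for path, text in elements.items()}
    n = len(tokens)
    # Element frequency: in how many elements does each term occur?
    ef = Counter(term for words in tokens.values() for term in set(words))
    weights = {}
    for path, words in tokens.items():
        counts = Counter(words)
        weights[path] = {
            term: (count / len(words)) * math.log(n / ef[term])
            for term, count in counts.items()
        }
    return weights


weights = tf_idf(ELEMENTS)
print(weights["article/sec[2]"]["authoring"])
```

Whether statistics should be gathered per element, per element type or per document is exactly the open question the slide raises.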
![Page 19: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/19.jpg)
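Challenge 2: augmentation weights
[Figure: the same Article node (?XML, ?retrieval, ?authoring) with components Title (0.9 XML), Section 1 (0.5 XML, 0.4 retrieval) and Section 2 (0.2 XML, 0.7 authoring); the links from the components to the article carry the augmentation weights 0.5, 0.8 and 0.2]
Which components contribute best to the content of “article”?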
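A sketch of how augmentation weights could be used to derive the article's term weights. Two assumptions are made here that the slide does not state: the weights 0.5, 0.8 and 0.2 belong to Title, Section 1 and Section 2 in that order, and the down-weighted evidence is combined with a probabilistic OR.

```python
def augment(component_weights, augmentation):
    """Propagate term weights from components up to their parent.

    Each component weight is multiplied by its augmentation factor, then the
    contributions are combined as 1 - prod(1 - contribution).
    """
    miss = {}  # per term: probability that no component contributes it
    for name, terms in component_weights.items():
        for term, weight in terms.items():
            miss[term] = miss.get(term, 1.0) * (1.0 - augmentation[name] * weight)
    return {term: 1.0 - m for term, m in miss.items()}


# Term weights from the figure.
COMPONENTS = {
    "title": {"XML": 0.9},
    "section1": {"XML": 0.5, "retrieval": 0.4},
    "section2": {"XML": 0.2, "authoring": 0.7},
}
# Assumed assignment of the augmentation weights to components.
AUGMENTATION = {"title": 0.5, "section1": 0.8, "section2": 0.2}

article = augment(COMPONENTS, AUGMENTATION)
print(article)
```

Under these assumptions the article's weight for "XML" is 1 - (1 - 0.45)(1 - 0.40)(1 - 0.04) = 0.6832, lower than the title's own 0.9, which is what makes the smaller component a competitive answer.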
![Page 20: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/20.jpg)
Challenge 3: component weights
[Figure: the same Article node (?XML, ?retrieval, ?authoring) with components Title (0.9 XML), Section 1 (0.5 XML, 0.4 retrieval) and Section 2 (0.2 XML, 0.7 authoring); the nodes carry the component weights 0.6, 0.4, 0.4 and 0.5]
Which component type (tag) is a good retrieval unit?
![Page 21: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/21.jpg)
Challenge 4: overlapping elements
[Figure: an Article about XML, retrieval and authoring, with components Title (XML), Section 1 (XML, retrieval) and Section 2 (XML, authoring)]
“section 1” and “article” are both relevant to “XML retrieval”, so which one to return?
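One simple way to answer the question, sketched here as an assumption rather than a recommended method: walk the ranked list from the top and drop any element that is an ancestor or descendant of an element already kept. The paths and scores are invented.

```python
def is_ancestor(a, b):
    """True if element path a is a proper ancestor of element path b."""
    return b.startswith(a + "/")


def remove_overlap(ranked):
    """Keep elements in score order, skipping any that overlaps a kept one."""
    kept = []
    for path, score in sorted(ranked, key=lambda r: r[1], reverse=True):
        overlaps = any(is_ancestor(path, k) or is_ancestor(k, path) for k, _ in kept)
        if not overlaps:
            kept.append((path, score))
    return kept


# Hypothetical scored elements.
RANKED = [
    ("/article[1]", 0.6),
    ("/article[1]/sec[1]", 0.8),
    ("/article[1]/sec[2]", 0.3),
    ("/article[2]/sec[1]", 0.5),
]
print(remove_overlap(RANKED))
```

With these scores, section 1 is returned and its enclosing article is suppressed; section 2 survives because it does not overlap section 1.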
![Page 22: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/22.jpg)
Approaches …
vector space model
probabilistic model
Bayesian network
language model
extending DB model
Boolean model
natural language processing
cognitive model
ontology
parameter estimation
tuning
smoothing
fusion
phrase
term statistics
collection statistics
component statistics
proximity search
logistic regression
belief model
relevance feedback
divergence from randomness
machine learning
![Page 23: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/23.jpg)
Content-oriented XML retrieval: Conclusion
Efficiency: not just documents, but all their elements
Models: statistics to be adapted or redefined; combination of evidence
Users: what is focussed retrieval? Do users really want elements?
Interface and presentation issues
![Page 24: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/24.jpg)
Outline
Part I - Content-oriented XML retrieval
Part II - Evaluating content-oriented XML retrieval
![Page 25: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/25.jpg)
Evaluation of XML retrieval: INEX
Promote research and stimulate development of XML information access and retrieval, through:
• Creation of an evaluation infrastructure and organisation of regular evaluation campaigns for system testing
• Building of an XML information access and retrieval research community
• Construction of test suites
Collaborative effort: participants contribute to the development of the collection
Ends with a yearly workshop, in December, in Dagstuhl, Germany
INEX has allowed a new community in XML information access to emerge, as shown by the number of publications (64 in 2005, not final; 37 in 2004; 13 in 2003).
![Page 26: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/26.jpg)
INEX: Background
Sponsored by the DELOS Network of Excellence for Digital Libraries under the FP6 IST programme
Mainly dependent on voluntary efforts
Coordination is distributed for tasks and tracks
Main institutions involved in coordination for 2005:
• University of Amsterdam, NL
• University of Otago, NZ
• University of Chile, CL
• CWI, NL
• Carnegie Mellon University, USA
• IBM Research Lab, IL
• University of Minnesota Duluth, USA
• University of Paris 6, FR
• Queensland University of Technology, AUS
• University of California, Berkeley, USA
• Royal School of LIS, DK
• Queen Mary, University of London, UK
• University of Duisburg-Essen, DE
• INRIA-Rocquencourt, FR
• Utrecht University, NL
![Page 27: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/27.jpg)
INEX 2005 Participants
64 participants: 32 Europe; 12 N. America; 10 Asia; 5 Oceania; 5 other
+3000 e-mails in 2005!
• Max-Planck-Institut fuer Informatik, Germany
• Information Studies, Royal School of LIS, Denmark
• University of California, Berkeley, USA
• Peking University, China
• University of Granada, Spain
• University of Amsterdam, The Netherlands
• University of Otago, New Zealand
• Queen Mary University of London, UK
• University of Toronto, Canada
• Utrecht University, The Netherlands
• City University London, UK
• University of Kaiserslautern, Germany
• INRIA-Rocquencourt, France
• University of Wollongong in Dubai
• IRIT - Toulouse, France
• RMIT University, Australia
• Ecoles des Mines de Saint-Etienne, France
• Queensland University of Technology, Australia
• University of Klagenfurt, Austria
• Fondazione Ugo Bordoni, Italy
• University of Tampere, Finland
• Carnegie Mellon University, USA
• Cornell University, USA
• University of Illinois at Urbana-Champaign, USA
• IBM Haifa Research Lab, Israel
• Ochanomizu University, Japan
• The Hebrew University of Jerusalem, Israel
• Laboratoire d’Informatique de Paris 6, France
• University of Minnesota Duluth, USA
• University of Rostock, Germany
• University of California, Los Angeles, USA
• University of Udine, Italy
• University of South-Brittany, France
• Nagoya University, Japan
• University of Waterloo, Canada
• Rutgers University, USA
• Kyungpook National University, Korea
• University of Chile, Chile
• Hiroshima City University, Japan
• University of Helsinki, Finland
• AT&T Labs-Research, USA
• Microsoft Research Lab Cambridge, UK
• University of Twente, The Netherlands
• Centre for Mathematics & Computer Science (CWI), NL
• University of Utah, USA
• University Duisburg-Essen, Germany
• University of Ostrava, Czech Republic
• Hong Kong Baptist University, Hong Kong
• University of Sheffield, UK
• Oslo University College, Norway
• L3S Research Center, Germany
• University of Michigan, USA
• CLIPS-IMAG Grenoble, France
• Wuhan University, China
• Nara Institute of Science and Technology, Japan
• Ritsumeikan University, Japan
• University of Tsukuba, Japan
• State University of Montes Claros, Montes Claros (MG), Brazil
• INRIA Sophia Antipolis
• Charles de Gaulle University - Lille 3
• University of Siena, Italy
• Australian Research Council, Canberra, Australia
• University of Wollongong, Wollongong, Australia
• University of Padova, Italy
![Page 28: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/28.jpg)
Test suite for evaluating retrieval performance
Is your XML engine retrieving the relevant information, while at the same time avoiding returning irrelevant information?
Document collection
Topics reflecting realistic information needs
Retrieval tasks, stating what the XML search engine should return as answers
Relevance assessments, stating which elements are relevant to which topics
Metrics to measure retrieval effectiveness
![Page 29: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/29.jpg)
INEX test suites
Documents: ~500 MB (+ 241 MB), 12,107 (16,819) articles in XML format from IEEE Computer Society journals and magazines; 8 million elements!
INEX 2002: 60 topics, inex_eval metric
INEX 2003: 66 topics, use subset of XPath, inex_eval and inex_eval_ng metrics
INEX 2004: 75 topics, subset of the 2003 XPath subset (NEXI). Official metric: inex_eval. Others: inex_eval_ng, XCG, t2i, ERR, PRUM, …
INEX 2005: 87 topics, NEXI. Official metric: XCG
![Page 30: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/30.jpg)
INEX Topics
CO topic: open standards for digital video in distance learning
CAS topic: //article[about(.,'formal methods verify correctness aviation systems')]//sec//*[about(.,'case study application model checking theorem proving')]
• Candidate topics are submitted by participants; each must have some relevant elements, not too few and not too many
• Selection process performed by INEX organisers
![Page 31: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/31.jpg)
Retrieval tasks I
CO retrieval task: same as standard IR, but return elements
open standards for digital video in distance learning
+S retrieval task: the user adds structural hints to the query to narrow down the number of returned elements
//article//sec[about(.,open standards for digital video in distance learning)]
Three strategies:
• Focussed strategy: assume that the user prefers a single element that is the most relevant.
• Thorough strategy: assume that the user prefers all highly relevant elements.
• Fetch and browse strategy: assume that the user is interested in highly relevant elements that are contained only within highly relevant articles.
![Page 32: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/32.jpg)
Retrieval tasks II
CAS retrieval task
• where to look for the relevant elements (i.e. support elements)
• what type of elements to return (i.e. target elements)
• strict and vague interpretations applied to both support and target elements
//article[about(.,'formal methods verify correctness aviation systems')]//sec//*[about(.,'case study application model checking theorem proving')]
![Page 33: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/33.jpg)
Relevance in XML retrieval
Smallest component (specificity) that is highly relevant (exhaustivity)
• Specificity: extent to which a document component is focused on the information need, while being an informative unit.
• Exhaustivity: extent to which the information contained in a document component satisfies the information need.
[Figure: an article with sections s1, s2, s3 and subsections ss1, ss2, whose components are about “XML retrieval evaluation”, “XML retrieval” and “XML evaluation”]
![Page 34: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/34.jpg)
Relevance assessment task
Topics are assessed by the INEX participants
Use of an on-line interface
Completeness
• Rules that force assessors to assess related elements
• E.g. if an element is assessed relevant, its parent element and child elements must also be assessed
• …
Consistency
• Rules to enforce consistent assessments
• E.g. the parent of a relevant element must also be relevant, although to a different extent
• E.g. exhaustivity increases going up; specificity increases going down
• …
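A sketch of how such rules could be checked automatically. It covers only two of them (a relevant element needs an assessed, relevant parent, and exhaustivity must not decrease going up); the path format, the integer exhaustivity scale and the function names are assumptions for illustration, not the INEX assessment tool.

```python
def parent_of(path):
    """Return the parent path, or None for a root element."""
    head, _, _ = path.rpartition("/")
    return head or None


def violations(exhaustivity):
    """List (element, parent) pairs where the parent is less exhaustive.

    `exhaustivity` maps element paths to an integer level (0 = not relevant).
    A parent that was not assessed counts as level 0, so it is also reported.
    """
    problems = []
    for path, level in exhaustivity.items():
        parent = parent_of(path)
        if level > 0 and parent is not None and exhaustivity.get(parent, 0) < level:
            problems.append((path, parent))
    return problems


# Hypothetical assessments: sec[1] is judged more exhaustive than its article.
ASSESSED = {"article": 2, "article/sec[1]": 3, "article/sec[2]": 1}
print(violations(ASSESSED))
```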
Assessing a topic takes a week!
On average 2 topics per participant
Duplicate assessments (12 topics) in INEX 2004
![Page 35: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/35.jpg)
% Agreement
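| Topic | % | Type |
|---|---|---|
| 1 | 12,59 | CAS |
| 2 | 2,95 | CAS |
| 3 | 22,85 | CAS |
| 4 | 8,60 | CAS |
| 5 | 60,87 | CAS |
| 6 | 0,00 | CAS |
| 7 | 27,53 | CAS |
| 8 | 7,63 | CO |
| 9 | 25,22 | CO |
| 10 | 9,89 | CO |
| 11 | 5,65 | CO |
| 12 | 9,08 | CO |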
12,19
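| Tag | % |
|---|---|
| Abs | 7,53 |
| App | 13,64 |
| Art | 2,44 |
| Article | 21,70 |
| Atl | 1,95 |
| B | 16,45 |
| Bb | 15,37 |
| Bdy | 20,33 |
| Bib | 14,84 |
| Bm | 15,79 |
| Fig | 20,25 |
| Fm | 6,06 |
| Index-entry | 0,00 |
| Ip1 | 10,11 |
| Item | 10,16 |
| Lists (sum) | 5,14 |
| P | 9,51 |
| P2 | 10,84 |
| Ref | 5,00 |
| Sec | 15,90 |
| Ss1 | 14,01 |
| Ss2 | 10,45 |
| St | 5,94 |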
![Page 36: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/36.jpg)
Measuring effectiveness: Metrics
A research problem in itself!
Metrics
• inex_eval - official INEX metric until 2004
• inex_eval_ng
• ERR (expected ratio of relevant units)
• XCG (XML cumulative gain) - official INEX metric 2005
• t2i (tolerance to irrelevance)
• PRUM (Precision Recall with User Modelling)
• HiXEval
• …
![Page 37: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/37.jpg)
What is the problem? Relevance propagates up!
~26,000 relevant elements on ~14,000 relevant paths
Propagated assessments: ~45%
Increase in size of relevant elements: ~182%
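A minimal sketch of why relevance propagates up: once an element is judged relevant, every ancestor on its path contains that relevant text and so becomes (at least partly) relevant too. The path format and the function `propagate_up` are illustrative assumptions.

```python
def propagate_up(relevant_paths):
    """Return the given element paths together with all of their ancestors."""
    closed = set()
    for path in relevant_paths:
        parts = path.strip("/").split("/")
        # Every prefix of the path is an ancestor (or the element itself).
        for i in range(1, len(parts) + 1):
            closed.add("/" + "/".join(parts[:i]))
    return closed


# One relevant paragraph makes four elements relevant.
print(sorted(propagate_up({"/article[1]/bdy[1]/sec[2]/p[3]"})))
```

A single relevant paragraph therefore yields several relevant elements of growing size, which is the effect that inflates precision-recall figures when overlap is ignored.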
![Page 38: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/38.jpg)
Precision-Recall-based metric and Overlap
Simulated runs
![Page 39: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/39.jpg)
Overlap in results
| Rank | System (run) | Avg Prec | % Overlap |
|---|---|---|---|
| 1 | IBM Haifa Research Lab (CO-0.5-LAREFIENMENT) | 0.1437 | 80.89 |
| 2 | IBM Haifa Research Lab (CO-0.5) | 0.1340 | 81.46 |
| 3 | University of Waterloo (Waterloo-Baseline) | 0.1267 | 76.32 |
| 4 | University of Amsterdam (UAms-CO-T-FBack) | 0.1174 | 81.85 |
| 5 | University of Waterloo (Waterloo-Expanded) | 0.1173 | 75.62 |
| 6 | Queensland University of Technology (CO_PS_Stop50K) | 0.1073 | 75.89 |
| 7 | Queensland University of Technology (CO_PS_099_049) | 0.1072 | 76.81 |
| 8 | IBM Haifa Research Lab (CO-0.5-Clustering) | 0.1043 | 81.10 |
| 9 | University of Amsterdam (UAms-CO-T) | 0.1030 | 71.96 |
| 10 | LIP6 (simple) | 0.0921 | 64.29 |
Official INEX 2004 Results for CO topics (1500 retrieved elements)
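A sketch of how an overlap percentage for a run could be computed. The definition used here, the share of retrieved elements that contain or are contained in another retrieved element, is an assumption for illustration; the exact INEX definition may differ.

```python
def overlap_percentage(paths):
    """Percentage of elements nested with (inside or around) another element."""
    if not paths:
        return 0.0

    def nested(a, b):
        # True if one path is a proper ancestor of the other.
        return a.startswith(b + "/") or b.startswith(a + "/")

    overlapping = sum(1 for p in paths if any(nested(p, q) for q in paths))
    return 100.0 * overlapping / len(paths)


# Hypothetical result list: the first two elements overlap each other.
RUN = ["/article[1]", "/article[1]/sec[1]", "/article[2]/sec[1]", "/article[3]"]
print(overlap_percentage(RUN))
```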
![Page 40: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/40.jpg)
Final words
Challenging research issues in XML retrieval are not ‘just’ about the effective retrieval of XML documents, but also about what and how to evaluate!
INEX 2006 document collection
• Wikipedia (XML English) document collection: full texts, marked up in XML, of about 1,900,000 articles
• 228,546 categories, totalling +100 Gigabytes (10 Gigabytes without pictures)
• 3000 different tags; an article has on average 500 XML nodes, with average depth 5
Additional tracks in 2006
• interactive, heterogeneous collection, document mining, relevance feedback, natural language query processing, multimedia, XML entity search
![Page 41: XML Retrieval: A content-oriented perspective Mounia Lalmas Department of Computer Science Queen Mary, University of London](https://reader035.vdocuments.us/reader035/viewer/2022062500/56649e4a5503460f94b3dfb9/html5/thumbnails/41.jpg)
Thank you