TIJAH @ INEX 2003
The Cirquid Team
CWI and University of Twente
Overview
Introduction
Content-Only (CO)
(Pattern-Based) Structured Querying
Conclusions and Future Work
Questions/Discussion
Content-Only (CO)
Same model as for INEX 2002
Exhaustivity (content-based relevance)
– Statistical Language Model [Hiemstra 2000]
Specificity
– Log-normal distribution
– Component size, mean at ~2500 words
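A minimal sketch of the two CO ingredients, kept as separate functions because the slide does not say how they are combined. The linear interpolation form, the smoothing weight `lam`, the spread `sigma`, and all function names are assumptions for illustration; only the ~2500-word mean comes from the slide.

```python
import math
from collections import Counter

def lm_score(query_terms, element_terms, collection_terms, lam=0.5):
    """Log-probability of the query under a linearly interpolated
    language model (element model mixed with collection model).
    lam is an assumed smoothing weight, not a value from the slides."""
    elem = Counter(element_terms)
    coll = Counter(collection_terms)
    elem_len = len(element_terms)
    coll_len = len(collection_terms)
    log_p = 0.0
    for term in query_terms:
        p_elem = elem[term] / elem_len if elem_len else 0.0
        p_coll = coll[term] / coll_len if coll_len else 0.0
        p = lam * p_elem + (1.0 - lam) * p_coll
        if p == 0.0:
            return float("-inf")  # term unseen in the whole collection
        log_p += math.log(p)
    return log_p

def size_prior(num_words, mean_words=2500.0, sigma=1.0):
    """Log-normal density over component size in words.
    mu is chosen so the distribution's mean equals mean_words;
    sigma is an assumed spread."""
    if num_words <= 0:
        return 0.0
    mu = math.log(mean_words) - sigma ** 2 / 2.0  # E[X] = exp(mu + sigma^2/2)
    z = (math.log(num_words) - mu) / sigma
    return math.exp(-0.5 * z * z) / (num_words * sigma * math.sqrt(2.0 * math.pi))
```

Under these assumptions, an element that contains the query term scores higher than one that does not, and a 2500-word component receives far more prior mass than a 10-word one.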
Structured Querying (SCAS/VCAS)
Pattern-Based Structured Querying
Collection of 3 patterns
– Base pattern for determining a single subtree (pattern 1)
– More complex combinations of pattern 1 instances (patterns 2 and 3)
About Function
Use of ALL of the topic's keywords to process EACH OF the about clauses
//article[about(./,'IR') AND about(.//sec,'XML')]
is processed as
//article[about(./,'IR XML') AND about(.//sec,'IR XML')]
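The keyword expansion of the about clauses can be sketched as a string rewrite. This is an illustration only: the regular expression, the function name `expand_about_clauses`, and the handling of a leading `+` on a term are assumptions, and the sketch expects straight single quotes around the keywords.

```python
import re

# Matches about(<path>, '<terms>') and captures the path and the terms.
ABOUT = re.compile(r"about\(\s*([^,]+?)\s*,\s*'([^']*)'\s*\)")

def expand_about_clauses(query):
    """Give every about clause the union of all keywords in the topic."""
    keywords = []
    for _path, terms in ABOUT.findall(query):
        for term in terms.split():
            term = term.lstrip("+")  # assumed: drop the '+' emphasis marker
            if term and term not in keywords:
                keywords.append(term)
    all_terms = " ".join(keywords)
    return ABOUT.sub(lambda m: f"about({m.group(1)},'{all_terms}')", query)

rewritten = expand_about_clauses("//article[about(./,'IR') AND about(.//sec,'XML')]")
```

Here `rewritten` holds the query with `'IR XML'` in both about clauses.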
Pattern 1
Simplest pattern instance
Topic 69
VCAS and SCAS
/article/bdy/sec[about(.//st, '…')]
Pattern 1 – VCAS and SCAS
[Figure: tree article > bdy > sec > st, annotated with the steps nodeset selections, containment, relevance, containment]
Average ranked containing
Previous operation is ranked containing
Multiple subtrees within the target element are averaged
Example: a sec containing two st elements scored 0.2 and 0.1 receives the score 0.15
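The averaging step can be sketched as a group-by over (target, score) pairs. The function name and the input shape are assumptions chosen for illustration.

```python
from collections import defaultdict

def average_ranked_containing(scored_subtrees):
    """Average the scores of all subtrees that fall inside the same target.

    scored_subtrees: iterable of (target_id, score) pairs, one pair per
    scored subtree (for example, one per st inside a sec).
    """
    grouped = defaultdict(list)
    for target_id, score in scored_subtrees:
        grouped[target_id].append(score)
    return {target: sum(scores) / len(scores) for target, scores in grouped.items()}

# The slide's example: two st elements inside one sec.
sec_scores = average_ranked_containing([("sec", 0.2), ("sec", 0.1)])
```

Up to floating-point rounding, `sec_scores["sec"]` is 0.15, as on the slide.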
Pattern 2
Topic 73
VCAS
– Absence of a subtree does not render the target completely irrelevant
SCAS
– All specified subtrees need to be present for relevance
//article[about(.//st, '…') AND about(.//bib, '…')]
Pattern 2 – VCAS
[Figure: article with descendant subtrees st and bib]
Split up into a set of pattern 1 instances
Combine result sets
– OR -> max
– AND -> min (non zero)
Pattern 2 – SCAS
[Figure: article with descendant subtrees st and bib]
Split up into a set of pattern 1 instances
Combine result sets
– AND -> min
– OR -> max
Pattern 2 - Example
//article[about(.//st, '+comparison') AND about(.//bib, 'machine learning')]
1.- Execution of 2 pattern 1 instances
//article[about(.//st, 'comparison machine learning')]
//article[about(.//bib, 'comparison machine learning')]
2.- Combining results (AND)
        Instance 1   Instance 2   VCAS   SCAS
Art 1   0.2          0.1          0.1    0.1
Art 2   0            0.3          0.3    0
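The combination rules for pattern 2 can be sketched as follows, using the scores from the example. The function names and the dictionary representation of a result set are assumptions for illustration.

```python
def combine_and(result_sets, strict):
    """Combine pattern 1 result sets for an AND of about clauses.

    strict=True  (SCAS): plain minimum, so a missing subtree gives 0.
    strict=False (VCAS): minimum over the non-zero scores only.
    """
    targets = set().union(*result_sets)  # union of the dictionaries' keys
    combined = {}
    for target in targets:
        scores = [rs.get(target, 0.0) for rs in result_sets]
        if strict:
            combined[target] = min(scores)
        else:
            nonzero = [s for s in scores if s > 0]
            combined[target] = min(nonzero) if nonzero else 0.0
    return combined

def combine_or(result_sets):
    """Combine result sets for an OR: maximum score per target."""
    targets = set().union(*result_sets)
    return {t: max(rs.get(t, 0.0) for rs in result_sets) for t in targets}

first = {"Art 1": 0.2, "Art 2": 0.0}   # result of pattern 1 instance 1
second = {"Art 1": 0.1, "Art 2": 0.3}  # result of pattern 1 instance 2
vcas = combine_and([first, second], strict=False)
scas = combine_and([first, second], strict=True)
```

With these inputs `vcas` keeps Art 2 at 0.3, while `scas` drops it to 0, matching the example.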
Pattern 3
Topic 64 CAS
//article[about(., '…')]//sec[about(., '…')]
VCAS
– What does the first about mean?
– Drop all about-calls, except those specified for the target element
SCAS
– Split up into a set of pattern 1 instances
– Top-down structural correlation to the correct nodeset
Pattern 3 – VCAS
[Figure: article (about 1) with descendant sec (about 2)]
//article//sec[about(., '…')]
Pattern 3 – SCAS
[Figure: article with descendant sec]
1.- about 1
2.- about 2
3.- containment
Ranked by the scores of the target element
Pattern 3 - Example
//article[about(./, 'hollerith')]//sec[about(., 'DEHOMAG')]
1.- Execution of 1 or 2 pattern 1 instances
//article[about(./, 'hollerith DEHOMAG')]
//article//sec[about(./, 'hollerith DEHOMAG')]
2.- Ranked containing
Article scores: Art 1 0.2, Art 2 0
Section scores: sec 1 0.1, sec 2 0.3
VCAS (only second about): sec 1 0.1, sec 2 0.3
SCAS: sec 1 0.1, in case sec 1 belongs to Art 1 and sec 2 does not
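The pattern 3 example can be sketched as a filter on the target sections. The function name, the data layout, and the placement of sec 2 inside Art 2 are assumptions for illustration; the slide only says that sec 2 does not belong to Art 1.

```python
def pattern3(article_scores, section_scores, section_parent, strict):
    """Rank target sections by their own about score.

    strict=False (VCAS): only the about on the target element is used.
    strict=True  (SCAS): a section is kept only if its containing
    article also has a non-zero score.
    """
    ranked = []
    for sec, score in section_scores.items():
        parent = section_parent[sec]
        if strict and article_scores.get(parent, 0.0) <= 0:
            continue  # containing article did not match the first about
        ranked.append((sec, score))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

article_scores = {"Art 1": 0.2, "Art 2": 0.0}
section_scores = {"sec 1": 0.1, "sec 2": 0.3}
section_parent = {"sec 1": "Art 1", "sec 2": "Art 2"}  # assumed containment

vcas = pattern3(article_scores, section_scores, section_parent, strict=False)
scas = pattern3(article_scores, section_scores, section_parent, strict=True)
```

Under this assumed containment, `vcas` ranks sec 2 ahead of sec 1, and `scas` returns only sec 1.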
Physical Query Plan - Pattern 1
/article/bdy/sec[about(.//st, '…')]
[Figure: physical query plan with the labels /article/bdy/sec//st, W, Q, about, /article/bdy/sec, avg-groupby]
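One way to read the Pattern 1 plan is: select the st nodes under each sec, score them with about, then average per sec. The sketch below follows that reading with Python's standard `xml.etree.ElementTree`. The toy `about` scoring (fraction of words that are query terms), the sample document, and the function names are assumptions, not the TIJAH operators.

```python
import xml.etree.ElementTree as ET

def about(element, query_terms):
    """Toy stand-in for the about operator: the fraction of the
    element's words that are query terms."""
    words = "".join(element.itertext()).lower().split()
    if not words:
        return 0.0
    wanted = {t.lower() for t in query_terms}
    return sum(w in wanted for w in words) / len(words)

def run_pattern1(article, query_terms):
    """Evaluate /article/bdy/sec[about(.//st, ...)] on one article element."""
    results = []
    for sec in article.findall("./bdy/sec"):                        # /article/bdy/sec
        scores = [about(st, query_terms) for st in sec.iter("st")]  # /article/bdy/sec//st
        if scores:
            results.append((sec, sum(scores) / len(scores)))        # avg-groupby
    return sorted(results, key=lambda r: r[1], reverse=True)

doc = ET.fromstring(
    "<article><bdy>"
    "<sec id='s1'><st>XML retrieval</st><p>text</p></sec>"
    "<sec id='s2'><st>Databases</st></sec>"
    "</bdy></article>"
)
ranked = run_pattern1(doc, ["xml"])
```

In this sample, section `s1` ranks first with a score of 0.5 and `s2` follows with 0.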
Conclusions
CO model works pretty well
– Article run still
‘Keep it simple’ approach
Any questions?