TIJAH @ INEX 2003
The Cirquid Team
CWI and University of Twente

Overview
Introduction
Content-Only (CO)
(Pattern-Based) Structured Querying
Conclusions and Future Work
Questions/Discussion

Content-Only (CO)
Same model as for INEX 2002
Exhaustivity (content-based relevance)
– Statistical Language Model [Hiemstra 2000]
Specificity
– Log-normal distribution
– Component size, mean at ~2500 words
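
The two ingredients of the CO model can be sketched in a few lines. This is a minimal illustration, not the TIJAH implementation: the language model uses the linear-interpolation smoothing associated with [Hiemstra 2000], and the smoothing weight `lam`, the spread `sigma`, and the function names are assumptions. How the two scores are combined is not stated on the slide, so they are kept separate here.

```python
import math
from collections import Counter


def lm_log_score(query, element, collection, lam=0.15):
    """Log-probability of the query under a smoothed element language model.

    Each query term is scored as lam * P(t|element) + (1 - lam) * P(t|collection).
    The value of lam is an assumption, not taken from the slides.
    """
    elem_tf, coll_tf = Counter(element), Counter(collection)
    score = 0.0
    for term in query:
        p_elem = elem_tf[term] / len(element) if element else 0.0
        p_coll = coll_tf[term] / len(collection) if collection else 0.0
        p = lam * p_elem + (1.0 - lam) * p_coll
        if p == 0.0:
            return float("-inf")  # term unseen everywhere
        score += math.log(p)
    return score


def lognormal_log_prior(size, mean_size=2500.0, sigma=1.0):
    """Log-density of a log-normal size prior whose mean is mean_size words.

    sigma is an assumed spread; mu is derived so that exp(mu + sigma^2 / 2)
    equals mean_size.
    """
    if size <= 0:
        return float("-inf")
    mu = math.log(mean_size) - sigma ** 2 / 2.0
    z = (math.log(size) - mu) / sigma
    return -0.5 * z * z - math.log(size * sigma * math.sqrt(2.0 * math.pi))
```

With these defaults the prior penalizes both very small and very large components, which is the role the slide assigns to specificity.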

Structured Querying (SCAS/VCAS)
Pattern-Based Structured Querying
Collection of 3 patterns
– Base pattern for determining a single subtree (pattern 1)
– More complex combinations of pattern 1 instances (patterns 2 and 3)

About Function
All of the topic's keywords are used to process each of the about clauses
Original query:
//article[about(./,'IR') AND about(.//sec,'XML')]
Processed as:
//article[about(./,'IR XML') AND about(.//sec,'IR XML')]
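
The keyword expansion above can be sketched as a small string rewrite. This is only an illustration under assumptions: the regular expression handles straight-quoted about clauses of the form shown on the slide, and the function name `expand_about_clauses` is hypothetical.

```python
import re

# Matches about(<path>,'<terms>') with optional whitespace around the comma.
ABOUT = re.compile(r"about\(\s*([^,]+?)\s*,\s*'([^']*)'\s*\)")


def expand_about_clauses(query: str) -> str:
    """Give every about clause the union of all keywords in the topic."""
    keywords = []
    for _path, terms in ABOUT.findall(query):
        for term in terms.split():
            if term not in keywords:  # keep first-seen order, no duplicates
                keywords.append(term)
    all_terms = " ".join(keywords)
    return ABOUT.sub(lambda m: f"about({m.group(1)},'{all_terms}')", query)


rewritten = expand_about_clauses("//article[about(./,'IR') AND about(.//sec,'XML')]")
```

The paths of the about clauses are left untouched; only the keyword lists change.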

Pattern 1
Simplest pattern instance
Topic 69
VCAS and SCAS
/article/bdy/sec[about(.//st, '…')]

Pattern 1 – VCAS and SCAS
[Figure: document tree article > bdy > sec > st, annotated with the processing steps: nodeset selections, containment, relevance, containment]

Average ranked containing
Previous operation is ranked containing
Multiple subtrees within the target element are averaged
Example: a sec containing two st subtrees with scores 0.2 and 0.1 gets the score 0.15
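
A minimal sketch of the averaging step, assuming subtree scores and a child-to-target mapping are available as dictionaries (the function name and data layout are illustrative, not TIJAH's):

```python
from collections import defaultdict


def average_ranked_containing(subtree_scores, target_of):
    """Average the scores of all scored subtrees inside each target element."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for node, score in subtree_scores.items():
        target = target_of[node]
        totals[target] += score
        counts[target] += 1
    return {target: totals[target] / counts[target] for target in totals}


# The slide's example: two st subtrees inside one sec.
sec_scores = average_ranked_containing(
    {"st1": 0.2, "st2": 0.1},
    {"st1": "sec", "st2": "sec"},
)
# sec_scores["sec"] is approximately 0.15
```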

Pattern 2
Topic 73
VCAS
– Absence of a subtree does not render the target completely irrelevant
SCAS
– All specified subtrees need to be present for relevance
//article[about(.//st, '…') AND about(.//bib, '…')]

Pattern 2 – VCAS
[Figure: article tree with st and bib subtrees]
Split up into a set of pattern 1 instances
Combine result sets
– OR -> max
– AND -> min (non-zero)

Pattern 2 – SCAS
[Figure: article tree with st and bib subtrees]
Split up into a set of pattern 1 instances
Combine result sets
– AND -> min
– OR -> max

Pattern 2 - Example
//article[about(.//st, '+comparison') AND about(.//bib, 'machine learning')]
1.- Execution of 2 pattern 1 instances
//article[about(.//st, 'comparison machine learning')]
//article[about(.//bib, 'comparison machine learning')]
– First instance: Art 1 0.2, Art 2 0
– Second instance: Art 1 0.1, Art 2 0.3
2.- Combining results (AND)
– VCAS: Art 1 0.1, Art 2 0.3
– SCAS: Art 1 0.1, Art 2 0
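
The combination rules for pattern 2 can be sketched as follows. The function names are illustrative, and two details are assumptions not stated on the slides: an article missing from a result set is treated as scoring 0, and the VCAS AND returns 0 when every score is zero.

```python
def combine_and(scores, strict):
    """AND: plain minimum for SCAS (strict), minimum of non-zero scores for VCAS."""
    if strict:
        return min(scores)
    non_zero = [s for s in scores if s > 0]
    return min(non_zero) if non_zero else 0.0  # all-zero case is an assumption


def combine_or(scores):
    """OR: maximum, for both VCAS and SCAS."""
    return max(scores)


def combine_resultsets(resultsets, operator, strict):
    """Combine per-article scores from several pattern 1 instances."""
    articles = set().union(*resultsets)
    combined = {}
    for article in articles:
        scores = [rs.get(article, 0.0) for rs in resultsets]
        if operator == "AND":
            combined[article] = combine_and(scores, strict)
        else:
            combined[article] = combine_or(scores)
    return combined


# Scores from the slide's example.
first = {"Art 1": 0.2, "Art 2": 0.0}
second = {"Art 1": 0.1, "Art 2": 0.3}
vcas = combine_resultsets([first, second], "AND", strict=False)
scas = combine_resultsets([first, second], "AND", strict=True)
```

On these inputs the two interpretations differ only for Art 2, which lacks one of the requested subtrees: it keeps 0.3 under VCAS and drops to 0 under SCAS.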

Pattern 3
Topic 64 CAS
//article[about(., '…')]//sec[about(., '…')]
VCAS
– What does the first about mean?
– Drop all about-calls, except those specified for the target element
SCAS
– Split up into a set of pattern 1 instances
– Top-down structural correlation to the correct nodeset

Pattern 3 – VCAS
[Figure: article tree with about 1 on article and about 2 on sec; only about 2 is kept]
//article//sec[about(., '…')]

Pattern 3 – SCAS
[Figure: article tree with a sec subtree]
1.- about 1
2.- about 2
3.- containment
Ranked by the scores of the target element

Pattern 3 - Example
//article[about(./, 'hollerith')]//sec[about(., 'DEHOMAG')]
1.- Execution of 1 or 2 pattern 1 instances
//article[about(./, 'hollerith DEHOMAG')]
//article//sec[about(./, 'hollerith DEHOMAG')]
– Articles: Art 1 0.2, Art 2 0
– Sections: sec 1 0.1, sec 2 0.3
2.- Ranked containing
– VCAS (only the second about): sec 1 0.1, sec 2 0.3
– SCAS: sec 1 0.1, in case sec 1 belongs to Art 1 and sec 2 does not
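
A minimal sketch of the two interpretations of pattern 3, using the scores from the example. The function names and the `article_of` mapping are hypothetical; in particular, placing sec 2 inside Art 2 is an assumption, since the slide only says that sec 2 does not belong to Art 1.

```python
def rank(scores):
    """Sort (element, score) pairs by descending score, dropping zero scores."""
    hits = [(element, score) for element, score in scores.items() if score > 0]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


def pattern3_vcas(section_scores):
    """VCAS: only the about clause on the target element is used."""
    return rank(section_scores)


def pattern3_scas(article_scores, section_scores, article_of):
    """SCAS: keep sections contained in an article matching the first about,
    then rank by the score of the target element (the section)."""
    kept = {
        sec: score
        for sec, score in section_scores.items()
        if article_scores.get(article_of[sec], 0.0) > 0
    }
    return rank(kept)


article_scores = {"Art 1": 0.2, "Art 2": 0.0}
section_scores = {"sec 1": 0.1, "sec 2": 0.3}
article_of = {"sec 1": "Art 1", "sec 2": "Art 2"}  # assumed containment

vcas_result = pattern3_vcas(section_scores)
scas_result = pattern3_scas(article_scores, section_scores, article_of)
```

The article score only acts as a filter under SCAS; the final ranking still uses the section scores, as stated on the Pattern 3 – SCAS slide.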

Physical Query Plan - Pattern 1
/article/bdy/sec[about(.//st, '…')]
[Figure: operator tree. The nodeset /article/bdy/sec//st and the inputs W and Q feed the about operator; its result and the nodeset /article/bdy/sec feed avg-groupby]

Conclusions
CO model works pretty well
– Article run still
‘Keep it simple’ approach
Any questions?