TIJAH @ INEX 2003 The Cirquid Team CWI and University of Twente


Page 1: TIJAH @ INEX 2003

TIJAH @ INEX 2003

The Cirquid Team

CWI and University of Twente

Page 2: TIJAH @ INEX 2003

Overview

– Introduction
– Content-Only (CO)
– (Pattern-Based) Structured Querying
– Conclusions and Future Work
– Questions/Discussion

Page 3: TIJAH @ INEX 2003

Content-Only (CO)

Same model as for INEX 2002

Exhaustivity (content-based relevance)
– Statistical Language Model [Hiemstra 2000]

Specificity
– Log-normal distribution
– Component size, mean at ~2500 words
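The combination of the two components above can be illustrated with a minimal sketch. It assumes a query-likelihood language model with linear interpolation smoothing for exhaustivity and a log-normal density over component size for specificity, added in log space. The function names, the smoothing weight `lam`, and the shape parameter `sigma` are hypothetical and not taken from the slides; only the mean of about 2500 words comes from the source.

```python
import math
from collections import Counter


def lm_score(query_terms, component_terms, collection_terms, lam=0.15):
    """Log query likelihood with linear interpolation smoothing.

    lam is a hypothetical mixing weight, not a value from the slides.
    """
    comp = Counter(component_terms)
    coll = Counter(collection_terms)
    comp_len = len(component_terms)
    coll_len = len(collection_terms)
    log_p = 0.0
    for term in query_terms:
        p_comp = comp[term] / comp_len if comp_len else 0.0
        p_coll = coll[term] / coll_len if coll_len else 0.0
        p = lam * p_comp + (1 - lam) * p_coll
        if p == 0.0:
            return float("-inf")  # term unseen in the whole collection
        log_p += math.log(p)
    return log_p


def log_size_prior(size_in_words, mean=2500.0, sigma=1.0):
    """Log of a log-normal density over component size.

    mu is chosen so that the distribution mean equals `mean`;
    sigma is an assumed shape parameter.
    """
    if size_in_words <= 0:
        return float("-inf")
    mu = math.log(mean) - sigma ** 2 / 2
    x = float(size_in_words)
    return (-math.log(x * sigma * math.sqrt(2 * math.pi))
            - (math.log(x) - mu) ** 2 / (2 * sigma ** 2))


def co_score(query_terms, component_terms, collection_terms):
    """Exhaustivity (language model) plus specificity (size prior), in log space."""
    return (lm_score(query_terms, component_terms, collection_terms)
            + log_size_prior(len(component_terms)))
```

With these assumed parameters, very short components receive a much lower prior than components near the stated mean size.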

Page 4: TIJAH @ INEX 2003

Structured Querying (SCAS/VCAS)

Pattern-Based Structured Querying

Collection of 3 patterns
– Base pattern for determining a single subtree (pattern 1)
– More complex combinations of pattern 1 instances (patterns 2 and 3)

Page 5: TIJAH @ INEX 2003

About Function

Use ALL of the topic's keywords to process EACH of the about clauses

//article[about(./,'IR') AND about(.//sec,'XML')]

Page 6: TIJAH @ INEX 2003

About Function

Use ALL of the topic's keywords to process EACH of the about clauses

//article[about(./,'IR') AND about(.//sec,'XML')]

is processed as

//article[about(./,'IR XML') AND about(.//sec,'IR XML')]
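The keyword expansion shown on this slide can be sketched as a simple string rewrite. This is a minimal illustration, not the TIJAH implementation: the function name `expand_about_clauses` and the regular expression are hypothetical, the queries are assumed to use straight single quotes, and a leading `+` on a term is dropped because the Pattern 2 example in these slides shows it that way.

```python
import re

# Matches about(<path>, '<terms>') with straight single quotes (an assumption).
ABOUT = re.compile(r"about\(\s*([^,]+?)\s*,\s*'([^']*)'\s*\)")


def expand_about_clauses(query):
    """Give every about clause the union of all keywords in the query."""
    keywords = []
    for _path, terms in ABOUT.findall(query):
        for term in terms.split():
            term = term.lstrip("+")  # '+' prefix dropped, as in the slide example
            if term and term not in keywords:
                keywords.append(term)
    all_terms = " ".join(keywords)
    # Keep each clause's path, replace only its keyword list.
    return ABOUT.sub(lambda m: f"about({m.group(1)},'{all_terms}')", query)


expanded = expand_about_clauses("//article[about(./,'IR') AND about(.//sec,'XML')]")
```

Here `expanded` holds the rewritten query in which both about clauses carry the keywords `IR XML`.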

Page 7: TIJAH @ INEX 2003

Pattern 1

Simplest pattern instance

Topic 69

VCAS and SCAS

/article/bdy/sec[about(.//st, '…')]

Page 8: TIJAH @ INEX 2003

Pattern 1 – VCAS and SCAS

[Diagram: tree article > bdy > sec > st, annotated with: nodeset selections, containment, relevance, containment]

Page 9: TIJAH @ INEX 2003

Average ranked containing

Previous operation is ranked containing

Multiple subtrees within the target element are averaged

sec
– st 0.2
– st 0.1

Page 10: TIJAH @ INEX 2003

Average ranked containing

Previous operation is ranked containing

Multiple subtrees within the target element are averaged

sec
– st 0.2
– st 0.1

sec 0.15
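The averaging step on this slide can be sketched as a group-by over the containing element. The function name `average_ranked_containing` and the node identifiers `st1`, `st2`, `sec1` are hypothetical; the scores 0.2 and 0.1 are the ones shown on the slide.

```python
from collections import defaultdict


def average_ranked_containing(child_scores, parent_of):
    """Score each containing element by the mean score of its ranked subtrees."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for child, score in child_scores.items():
        parent = parent_of[child]
        totals[parent] += score
        counts[parent] += 1
    return {parent: totals[parent] / counts[parent] for parent in totals}


# Two st subtrees inside one sec, scored 0.2 and 0.1 as on the slide.
scores = average_ranked_containing(
    {"st1": 0.2, "st2": 0.1},
    {"st1": "sec1", "st2": "sec1"},
)
```

For the slide's example, the section score comes out at 0.15 up to floating-point rounding.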

Page 11: TIJAH @ INEX 2003

Pattern 2

Topic 73

//article[about(.//st, '…') AND about(.//bib, '…')]

VCAS
– Absence of a subtree does not render the target completely irrelevant

SCAS
– All specified subtrees need to be present for relevance

Page 12: TIJAH @ INEX 2003

Pattern 2 – VCAS

[Diagram: article with st and bib subtrees]

Split up into a set of pattern 1 instances

Combine result sets
– OR -> max
– AND -> min (non-zero)

Page 13: TIJAH @ INEX 2003

Pattern 2 – SCAS

[Diagram: article with st and bib subtrees]

Split up into a set of pattern 1 instances

Combine result sets
– AND -> min
– OR -> max

Page 14: TIJAH @ INEX 2003

Pattern 2 - Example

//article[about(.//st, '+comparison') AND about(.//bib, 'machine learning')]

1.- Execution of 2 pattern 1 instances

//article[about(.//st, 'comparison machine learning')]
//article[about(.//bib, 'comparison machine learning')]

2.- Combining results

              Art 1   Art 2
Result 1      0.2     0
Result 2      0.1     0.3
AND (VCAS)    0.1     0.3
AND (SCAS)    0.1     0
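The combination rules for Pattern 2 can be sketched as follows, using the scores from this example. The function names and the `strict` flag are hypothetical; the rules themselves (OR as max, AND as min for SCAS, AND as min over non-zero scores for VCAS) are the ones stated on the Pattern 2 slides.

```python
def combine_and(result_sets, strict):
    """AND-combine per-article scores from several pattern 1 runs.

    strict=True  (SCAS): plain min, so a missing subtree zeroes the article.
    strict=False (VCAS): min over non-zero scores, so a missing subtree
                         does not make the article irrelevant.
    """
    articles = set().union(*result_sets)
    combined = {}
    for art in articles:
        scores = [rs.get(art, 0.0) for rs in result_sets]
        if strict:
            combined[art] = min(scores)
        else:
            nonzero = [s for s in scores if s > 0]
            combined[art] = min(nonzero) if nonzero else 0.0
    return combined


def combine_or(result_sets):
    """OR-combine per-article scores: take the maximum."""
    articles = set().union(*result_sets)
    return {art: max(rs.get(art, 0.0) for rs in result_sets) for art in articles}


result_1 = {"Art 1": 0.2, "Art 2": 0.0}
result_2 = {"Art 1": 0.1, "Art 2": 0.3}

vcas = combine_and([result_1, result_2], strict=False)
scas = combine_and([result_1, result_2], strict=True)
```

With these inputs, `vcas` keeps Art 2 at 0.3 while `scas` sets it to 0, matching the two result tables on the slide.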

Page 15: TIJAH @ INEX 2003

Pattern 3

Topic 64 CAS

//article[about(., '…')]//sec[about(., '…')]

VCAS
– What does the first about mean?
– Drop all about-calls, except those specified for the target element

SCAS
– Split up into a set of pattern 1 instances
– Top-down structural correlation to the correct nodeset

Page 16: TIJAH @ INEX 2003

Pattern 3 – VCAS

[Diagram: article (about 1) containing sec (about 2)]

//article//sec[about(., '…')]

Page 17: TIJAH @ INEX 2003

Pattern 3 – SCAS

[Diagram: article containing sec]

1.- about 1

2.- about 2

3.- containment

Ranked by the scores of the target element

Page 18: TIJAH @ INEX 2003

Pattern 3 - Example

//article[about(./, 'hollerith')]//sec[about(., 'DEHOMAG')]

1.- Execution of 1 or 2 pattern 1 instances

//article[about(./, 'hollerith DEHOMAG')]
//article//sec[about(./, 'hollerith DEHOMAG')]

2.- Ranked containing

Articles: Art 1 0.2, Art 2 0
Sections: sec 1 0.1, sec 2 0.3

VCAS (only the second about): sec 1 0.1, sec 2 0.3

SCAS: sec 1 0.1 (in case sec 1 belongs to Art 1 and sec 2 does not)
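The difference between the two Pattern 3 interpretations can be sketched with the scores from this example. The function names are hypothetical, and the mapping of sec 2 to Art 2 is an assumption: the slide only states that sec 2 does not belong to Art 1.

```python
def pattern3_vcas(sec_scores):
    """VCAS: only the about clause on the target element (sec) is used."""
    hits = [(sec, s) for sec, s in sec_scores.items() if s > 0]
    return sorted(hits, key=lambda kv: kv[1], reverse=True)


def pattern3_scas(art_scores, sec_scores, article_of):
    """SCAS: keep sections whose containing article also matched,
    ranked by the score of the target element (sec)."""
    hits = [
        (sec, s)
        for sec, s in sec_scores.items()
        if s > 0 and art_scores.get(article_of[sec], 0.0) > 0
    ]
    return sorted(hits, key=lambda kv: kv[1], reverse=True)


art_scores = {"Art 1": 0.2, "Art 2": 0.0}
sec_scores = {"sec 1": 0.1, "sec 2": 0.3}
# Assumption: sec 2 sits in Art 2; the slide only says it is not in Art 1.
article_of = {"sec 1": "Art 1", "sec 2": "Art 2"}

vcas = pattern3_vcas(sec_scores)
scas = pattern3_scas(art_scores, sec_scores, article_of)
```

Under that assumption, `vcas` returns both sections ranked by their own score, while `scas` retains only sec 1.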

Page 19: TIJAH @ INEX 2003

Physical Query Plan - Pattern 1

/article/bdy/sec[about(.//st, '…')]

[Query plan diagram; nodes: /article/bdy/sec//st, W, Q, about, /article/bdy/sec, avg-groupby]

Page 20: TIJAH @ INEX 2003

Conclusions

CO model works pretty well
– Article run still

'Keep it simple' approach

Page 21: TIJAH @ INEX 2003

Any questions?