purl.org/net/retrieval 1
A Theory of Retrieval Using Structured Vocabularies
(SKOS: Preparation for Standardization)
Alistair Miles
CCLRC Rutherford Appleton Laboratory
NKOS Workshop, September 2006, Alicante
What Am I Presenting?
● A formal theory of retrieval using structured vocabularies.
● The main body of my master's dissertation, which is entitled “Retrieval and the Semantic Web”.
● N.B. This presentation is intended to give an overview; for the full text go to ...
purl.org/net/retrieval
Why?
● How do you maximize the utility and minimize the cost of vocabulary control ... ?
● Support standardization initiatives ...
    – SKOS to W3C Recommendation,
    – BS 8723 parts 3, 4 and 5.
● Check our working assumptions!
● See also “SKOS: Requirements for Standardization” to be presented at DC 2006.
How?
● Use a formal notation (“Z”) to express underlying ideas with mathematical precision.
● Support formal specification with explanatory prose.
● N.B. This presentation is strictly informal!
Overview of the Theory
● Foundations (Chapter 3)
● Composite Queries (Chapter 4)
● Limited Cost Expansion (Chapter 5)
● Coordination (Chapter 6)
● Translation (Chapter 7)
General Scenario (1)
[Diagram: a Controlled Structured Vocabulary, a Collection, an Index, a Query and Results, with the Results produced by “evaluation”.]
General Scenario (2)
[Diagram: Vocabulary A, an Index, a Query and Vocabulary B, with “???” marking the unresolved link between them.]
Lightning Tour (1) – Foundations
● Structured vocabulary.
● Index.
● Atomic query.
● Direct evaluation (of atomic queries).
● Naïve expansion (of an index).
Lightning Tour (2) – Composite Queries
● Query expressions ...
    – “and”, “or”, “not”, “required-optional-prohibited”.
● Composition and decomposition of expressions.
● Direct evaluation (composite queries).
● Naïve expansion (of composite queries).
● Scoring and ranking of results.
Lightning Tour (3) – Limited Cost Expansion
● Beyond naïve expansion.
● Approximating numerical “relevance cost” of expansion.
● Limited cost expansion (of an index or query).
● Expansion weight and result scoring.
Lightning Tour (4) – Coordination
● Using vocabulary units in combination.
● Ordered and unordered coordination.
● Coordinated indexes and queries.
● Naïve expansion (of a coordinated index or query).
● Limited cost expansion (of a coordinated index or query).
Lightning Tour (5) – Translation
● Structural mapping.
● Query expression mapping.
● Naïve translation using a structural mapping.
● Naïve translation using a query expression mapping.
● Limited cost translation using a structural mapping.
Caveats
● Much of the prose was written in haste!
● I'm no mathematician or logician!
● My review of the literature is woefully incomplete!
● The chapter on RDF representations (chapter 8) is rather incomplete and at best only suggestive!
● Use cases need further development.
A Theory of Retrieval Using Structured Vocabularies
Foundations
Foundations – The Conceptual Basis of Controlled Vocabularies (1)
● The fundamental purpose of a controlled vocabulary is to establish a set of distinct meanings or “concepts” and to provide a means of referring unambiguously to those concepts.
Foundations – The Conceptual Basis of Controlled Vocabularies (2)
● I have modelled this means of reference as a set of “names”, which I have called “concept names”.
● A controlled vocabulary provides a set of “concept names” which constitutes an artificial language for use in constructing an “index”. (I.e. a controlled indexing language.)
Foundations – Structure Relations (1)
● A controlled vocabulary may provide one or more binary relations on the set of concept names, which I refer to as “structure relations”.
● The structure relations of a controlled vocabulary together constitute the “structure graph”.
Foundations – Structure Relations (2)
● The theory considers only vocabularies that provide three structure relations, which I have called “broader”, “narrower” and “associated”.
● N.B. No attempt is made to define “broader”, “narrower” or “associated”!
● Their meaning is defined entirely in terms of operational assumptions that may be used to derive retrieval operations.
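As an illustration only, the structure graph can be sketched in Python as three sets of ordered pairs over concept names. The class name `StructureGraph`, the orientation of the pairs and the example concept names are assumptions made for this sketch and are not taken from the dissertation. Nothing here assumes that “narrower” is the inverse of “broader”.

```python
from dataclasses import dataclass, field

Pair = tuple[str, str]  # (concept name, related concept name)


@dataclass
class StructureGraph:
    """Hypothetical container for the three structure relations."""
    broader: set[Pair] = field(default_factory=set)
    narrower: set[Pair] = field(default_factory=set)
    associated: set[Pair] = field(default_factory=set)

    def concept_names(self) -> set[str]:
        """Every concept name that appears in at least one relation."""
        pairs = self.broader | self.narrower | self.associated
        return {name for pair in pairs for name in pair}


# Assumed orientation: ("cats", "mammals") in `broader` reads
# "cats" has the broader concept "mammals".
graph = StructureGraph(
    broader={("cats", "mammals"), ("mammals", "animals")},
    narrower={("mammals", "cats"), ("animals", "mammals")},
    associated={("cats", "pet care")},
)
```

Keeping the relations as plain sets of pairs mirrors the slide's description of them as binary relations; their meaning still comes only from the retrieval operations built on top of them.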
Foundations – A Structure Graph
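[Diagram: a structure graph over concept names (CNAME), with edges labelled B, N and A for the “broader”, “narrower” and “associated” relations.]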
Foundations – The Structure of an Index
● An “index” consists of one or more “fields”.
● A “field” is a binary relation between “document names” and “concept names”.
● (N.B. I use “document” to refer to any object we are interested in retrieving.)
● An index also provides a name for each field, so we can target particular fields in a query.
Foundations – A Field
[Diagram: a field as a relation between document names (DNAME) and concept names (CNAME).]
Foundations – Types of Index
● An index can have single or multiple fields.
● A field can be functional or relational.
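A minimal sketch of an index with named fields, where each field is a set of (document name, concept name) pairs. The field names, document names and the helper `is_functional` are hypothetical. Reading “functional” as “each document name is paired with at most one concept name” is an assumption of this sketch.

```python
Field = set[tuple[str, str]]  # (document name, concept name) pairs
Index = dict[str, Field]      # field name -> field

# A multiple-field index with two named fields (example data only).
index: Index = {
    "subject": {("doc1", "cats"), ("doc1", "dogs"), ("doc2", "mammals")},
    "place": {("doc1", "Spain"), ("doc2", "France")},
}


def is_functional(field: Field) -> bool:
    """True if no document name is paired with more than one concept name."""
    documents = [doc for doc, _ in field]
    return len(documents) == len(set(documents))
```

Under that reading, the `subject` field above is relational (doc1 has two concept names) and the `place` field is functional.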
Foundations – Atomic Queries
● An “atomic query expression” comprises a single field name and a single concept name.
Foundations – Direct Evaluation of Atomic Queries
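[Diagram: a query (QUERY) evaluated directly against a field relating document names (DNAME) to concept names (CNAME).]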
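Direct evaluation of an atomic query can be sketched as a lookup in the named field: return every document name paired with the queried concept name. The names `AtomicQuery` and `evaluate` and the example data are assumptions of this sketch.

```python
from typing import NamedTuple


class AtomicQuery(NamedTuple):
    """An atomic query expression: one field name and one concept name."""
    field_name: str
    concept_name: str


def evaluate(index, query):
    """Return the document names indexed with the queried concept name."""
    field = index.get(query.field_name, set())
    return {doc for doc, concept in field if concept == query.concept_name}


index = {
    "subject": {("doc1", "cats"), ("doc2", "mammals"), ("doc3", "cats")},
}
results = evaluate(index, AtomicQuery("subject", "cats"))
```

Here `results` holds doc1 and doc3; doc2 is not returned because direct evaluation ignores the vocabulary structure.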
Foundations – Naïve Assumption of Ideal Indexing
● All documents indexed with a given concept name in a given field are relevant to an atomic query for that concept name in that field.
Foundations – Naïve Assumption of Ideal Indexing
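[Diagram: a query (QUERY) against a field (DNAME, CNAME); the documents indexed with the queried concept name are marked “All Relevant”.]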
Foundations – Naïve Assumption of Broadening Relevance
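[Diagram: a query (QUERY) against a field (DNAME, CNAME) in which concept names are connected by a “B” (broader) link; two groups of documents are marked “All Relevant”.]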
Foundations – Naïve Expansion of a Field
[Diagram: a field (DNAME, CNAME) expanded along a “B” (broader) link between concept names.]
Foundations – Naïve Expansion
● By including documents in a result set that are also relevant to the query, recall is increased at no cost to precision.
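One way to picture naïve expansion of a field: every document indexed with a concept name is additionally indexed with each concept name reachable through “broader” links. This sketch assumes the links are followed transitively and that `broader` maps a concept name to its directly broader names; the function names and data are hypothetical.

```python
def broader_closure(broader, concept):
    """Concept names reachable from `concept` by one or more broader links."""
    seen, stack = set(), [concept]
    while stack:
        current = stack.pop()
        for parent in broader.get(current, set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def expand_field(field, broader):
    """Add a (document, ancestor) pair for every broader ancestor."""
    expanded = set(field)
    for doc, concept in field:
        for ancestor in broader_closure(broader, concept):
            expanded.add((doc, ancestor))
    return expanded


broader = {"cats": {"mammals"}, "mammals": {"animals"}}
field = {("doc1", "cats"), ("doc2", "mammals")}
expanded = expand_field(field, broader)
```

After expansion, an atomic query for “animals” matches both documents, which is the recall gain the naïve assumptions are meant to justify.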
Foundations – Key Ideas
● Assumptions ...
    – Naïve assumption of ideal indexing.
    – Naïve assumption of broadening relevance.
● Operational definition for “broader/narrower”.
● Naïve expansion of an index to improve recall.
● N.B. This framework is probably sufficient to cover the majority of applications!
A Theory of Retrieval Using Structured Vocabularies
Composite Queries
Composite Queries – Query Expressions (1)
● Composite query expression – has one or more “component” (or “child”) query expressions.
● Four types of composite query expression ...
    – and
    – or
    – not
    – rop (“required-optional-prohibited”)
Composite Queries – Composition
● A child of a composite query expression can be an atomic expression or another composite expression.
● I.e. expressions can be arbitrarily nested.
Composite Queries – Direct Evaluation
● Results of “and” expression ... set intersection of results of child expressions.
● Results of “or” expression ... set union of results of child expressions.
● Results of “not” expression ... set complement of results of child expression.
● Results of “rop” expression ... set intersection of results of “required” children minus set union of results of “prohibited” children ... N.B. “optional” children are truly optional.
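The four evaluation rules above can be sketched as a recursive function over nested tuples. The tuple encoding, the `universe` argument (all document names, needed for the complement) and the treatment of a “rop” expression with no required children are assumptions of this sketch, not definitions from the theory.

```python
def evaluate(expr, index, universe):
    """Directly evaluate a nested query expression to a set of document names."""
    kind = expr[0]
    if kind == "atom":
        _, field_name, concept = expr
        return {d for d, c in index.get(field_name, set()) if c == concept}
    if kind == "not":
        return universe - evaluate(expr[1], index, universe)
    if kind == "rop":
        _, required, _optional, prohibited = expr
        req = [evaluate(e, index, universe) for e in required]
        pro = [evaluate(e, index, universe) for e in prohibited]
        # Assumption: with no required children, start from all documents.
        base = set.intersection(*req) if req else set(universe)
        return base - set().union(*pro)
    children = [evaluate(e, index, universe) for e in expr[1]]
    if kind == "and":
        return set.intersection(*children)
    if kind == "or":
        return set().union(*children)
    raise ValueError(f"unknown expression type: {kind}")


index = {"subject": {("doc1", "cats"), ("doc1", "dogs"),
                     ("doc2", "cats"), ("doc3", "dogs")}}
universe = {"doc1", "doc2", "doc3"}
cats = ("atom", "subject", "cats")
dogs = ("atom", "subject", "dogs")
```

Optional children are ignored when computing the result set, matching the slide's remark that they are truly optional; they would only matter for scoring.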
Composite Queries – Direct Evaluation
[Diagram: direct evaluation of a query (QUERY), illustrated for “and”, “or” and “not” expressions.]
Composite Queries – Decomposition
● Decompose an arbitrarily nested composite query into “positive” and “negative” atoms.
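A possible reading of decomposition, sketched with nested tuples: walk the expression and collect atoms, flipping polarity under “not” and under the “prohibited” children of a “rop” expression. The tuple encoding and these polarity rules are assumptions of this sketch; the dissertation may define decomposition differently.

```python
def decompose(expr, positive=True):
    """Return (positive_atoms, negative_atoms) for a nested expression."""
    kind = expr[0]
    if kind == "atom":
        atom = expr[1:]  # (field name, concept name)
        return ({atom}, set()) if positive else (set(), {atom})
    if kind == "not":
        return decompose(expr[1], not positive)
    if kind in ("and", "or"):
        children = [(child, positive) for child in expr[1]]
    else:  # "rop": required and optional keep polarity, prohibited flips it
        _, required, optional, prohibited = expr
        children = [(child, positive) for child in required + optional]
        children += [(child, not positive) for child in prohibited]
    pos, neg = set(), set()
    for child, polarity in children:
        child_pos, child_neg = decompose(child, polarity)
        pos |= child_pos
        neg |= child_neg
    return pos, neg


cats = ("atom", "subject", "cats")
dogs = ("atom", "subject", "dogs")
birds = ("atom", "subject", "birds")
pos, neg = decompose(("and", [cats, ("not", dogs)]))
```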
Composite Queries – Scoring Results
● Two metrics for scoring results of composite queries ...
    – Unweighted scoring (number of positive atoms matching the document).
    – IDF weighted scoring (take into account inverse document frequency of concept names in the index – greater weight to more “discriminating” atoms).
● Use scores to rank results (we assume in order of greatest relevance).
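The two metrics can be sketched as follows. The unweighted score counts matching positive atoms; the IDF-weighted score sums a weight per matching atom. The formula log(N / df) is the common textbook form and is an assumption here, since the slide does not state which IDF variant the dissertation uses. All names and data are hypothetical.

```python
import math


def matches(index, atom, doc):
    """True if `doc` is indexed with the atom's concept name in its field."""
    field_name, concept = atom
    return (doc, concept) in index.get(field_name, set())


def unweighted_score(index, positive_atoms, doc):
    """Number of positive atoms matching the document."""
    return sum(1 for atom in positive_atoms if matches(index, atom, doc))


def idf(index, atom, total_docs):
    """Assumed IDF form: log(total documents / documents matching the atom)."""
    field_name, concept = atom
    df = len({d for d, c in index.get(field_name, set()) if c == concept})
    return math.log(total_docs / df) if df else 0.0


def idf_score(index, positive_atoms, doc, total_docs):
    """Sum of IDF weights over the positive atoms matching the document."""
    return sum(idf(index, atom, total_docs)
               for atom in positive_atoms if matches(index, atom, doc))


index = {"subject": {("doc1", "cats"), ("doc1", "siamese"),
                     ("doc2", "cats"), ("doc3", "cats")}}
docs = ["doc1", "doc2", "doc3", "doc4"]
atoms = [("subject", "cats"), ("subject", "siamese")]
ranked = sorted(docs, key=lambda d: idf_score(index, atoms, d, len(docs)),
                reverse=True)
```

In this example “siamese” appears on one document and “cats” on three, so the rarer, more discriminating atom contributes more to doc1's score.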
Composite Queries – Naïve Query/Index Expansion
[Diagram: two routes from the Controlled Structured Vocabulary to the Results: expand the Query and evaluate it against the Index, or expand the Index and evaluate the Query against it.]
Composite Queries – Naïve Query Expansion
● Expand arbitrarily nested query expressions.
● Mathematically equivalent to naïve index expansion (but not computationally equivalent).
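A sketch of naïve query expansion over nested tuples: each atom is replaced by an “or” over its concept name and every concept name reachable through “narrower” links, so the index itself is left untouched. Transitive following of the links, the tuple encoding and the omission of “rop” expressions are simplifying assumptions of this sketch.

```python
def narrower_closure(narrower, concept):
    """Concept names reachable from `concept` by one or more narrower links."""
    seen, stack = set(), [concept]
    while stack:
        for child in narrower.get(stack.pop(), set()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def expand(expr, narrower):
    """Rewrite every atom as an "or" over itself and its narrower concepts."""
    kind = expr[0]
    if kind == "atom":
        _, field_name, concept = expr
        names = sorted({concept} | narrower_closure(narrower, concept))
        return ("or", [("atom", field_name, name) for name in names])
    if kind == "not":
        return ("not", expand(expr[1], narrower))
    return (kind, [expand(child, narrower) for child in expr[1]])  # and / or


narrower = {"animals": {"mammals"}, "mammals": {"cats"}}
query = ("and", [("atom", "subject", "animals"),
                 ("not", ("atom", "subject", "cats"))])
expanded = expand(query, narrower)
```

The expanded query grows with the size of the narrower subtree and is rebuilt per query, whereas index expansion pays its cost once at indexing time; that is one way to read “not computationally equivalent”.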
A Theory of Retrieval Using Structured Vocabularies
Limited Cost Expansion
Limited Cost Expansion – Naïve Assumptions
● Likely to break down, especially for “deep” hierarchies (does not account for specificity).
● Does not take advantage of associative links.
● Expansion cannot be “tuned”, no possibility for dynamic functionality (“all or nothing”).
● Structure is not utilised for ranking of expanded result set.
Limited Cost Expansion – Quantitative Assumptions
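[Diagram: a query (QUERY) against a field (DNAME, CNAME); the probability of relevance, P(relevance), is marked “?” for documents reached via N, A and B links.]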
Limited Cost Expansion – Relevance Cost
● Use a numerical function to model the accumulated “relevance cost” of expansion.
● Use a “cost limit” to provide a cut-off.
● Invert the minimum cost value to obtain an “expansion weight” between 0 and 1 (high weight suggests high probability of relevance).
● Factor expansion weight into result scoring and therefore ranking.
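A rough sketch of the idea, assuming the accumulated cost is the sum of non-negative per-link costs along a path and that the minimum is found with a Dijkstra-style search cut off at the cost limit. The per-relation costs, the inversion formula 1 / (1 + cost) and all names are hypothetical choices for illustration; the slides do not give the actual cost function or inversion.

```python
import heapq


def limited_cost_expansion(graph, start, step_cost, cost_limit):
    """Minimum accumulated cost from `start` to each concept within the limit."""
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        cost, concept = heapq.heappop(queue)
        if cost > best.get(concept, float("inf")):
            continue  # stale queue entry
        for relation, target in graph.get(concept, []):
            new_cost = cost + step_cost[relation]
            if new_cost <= cost_limit and new_cost < best.get(target, float("inf")):
                best[target] = new_cost
                heapq.heappush(queue, (new_cost, target))
    return best


def expansion_weight(cost):
    """Hypothetical inversion: cost 0 gives weight 1, higher cost tends to 0."""
    return 1.0 / (1.0 + cost)


graph = {
    "mammals": [("narrower", "cats"), ("broader", "animals"),
                ("associated", "zoology")],
    "cats": [("narrower", "siamese")],
}
step_cost = {"narrower": 0.25, "associated": 0.5, "broader": 1.0}  # assumed values
costs = limited_cost_expansion(graph, "mammals", step_cost, cost_limit=0.5)
weights = {concept: expansion_weight(cost) for concept, cost in costs.items()}
```

Raising or lowering `cost_limit` tunes how far the expansion reaches, and the weights can be multiplied into a result score so that documents found through costlier paths rank lower.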
Limited Cost Expansion – Query/Index Expansion
● Limited cost expansion of either query or index.
A Theory of Retrieval Using Structured Vocabularies
Coordination
Coordination – Ordered and Unordered (1)
● Coordination is the act of combining concept names.
● Ordered – order of coordination is significant to meaning.
● Unordered – order of coordination is not significant to meaning.
Coordination – Ordered and Unordered (2)
Coordination – A Coordinated Field
[Diagram: a coordinated field relating document names (DNAME) to coordinated concept names (CNAME).]
Coordination – A Coordinated Query
[Diagram: a coordinated query combined using “and”.]
Coordination – Decomposition
{a, b, c}
{a, b} {b, c} {a, c}
a b c
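The three levels shown above, for an unordered coordination, correspond to the non-empty subsets of the coordinated concept names grouped by size. A minimal sketch, with the function name `decompose` chosen for illustration:

```python
from itertools import combinations


def decompose(coordination):
    """Non-empty subsets of an unordered coordination, largest level first."""
    items = sorted(coordination)
    return [
        [frozenset(combo) for combo in combinations(items, size)]
        for size in range(len(items), 0, -1)
    ]


levels = decompose({"a", "b", "c"})
```

`levels[0]` holds the full coordination, `levels[1]` the three pairs and `levels[2]` the three single concept names.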
Coordination – Structure Relations
{a1, b1}
{a1, b2} {a2, b1}
{a1, b3} {a2, b2} {a3, b1}
{a2, b3} {a3, b2}
{a3, b3}
a1 b1
a2 b2
a3 b3
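One reading of the layout above is that a coordination sits directly above those obtained by narrowing exactly one of its components by one step, given the chains a1, a2, a3 and b1, b2, b3. The sketch below encodes that reading. It is an interpretation of the diagram rather than a definition from the dissertation, and it uses ordered pairs (a, b) purely for convenience.

```python
def narrower_coordinations(pair, narrower):
    """Coordinations obtained by narrowing exactly one component by one step."""
    a, b = pair
    result = {(na, b) for na in narrower.get(a, set())}
    result |= {(a, nb) for nb in narrower.get(b, set())}
    return result


# Assumed chains: a1 is broader than a2, which is broader than a3 (same for b).
narrower = {"a1": {"a2"}, "a2": {"a3"}, "b1": {"b2"}, "b2": {"b3"}}
top = narrower_coordinations(("a1", "b1"), narrower)
```

Applying the function repeatedly from `("a1", "b1")` reproduces the rows of the diagram down to `("a3", "b3")`, which has nothing narrower.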
Coordination – Expansion
● Naïve expansion of coordinated queries or indexes.
● Limited cost expansion of coordinated queries or indexes.
A Theory of Retrieval Using Structured Vocabularies
Translation
General Scenario (2)
[Diagram: Vocabulary A, an Index, a Query and Vocabulary B, with “???” marking the unresolved link between them.]
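Translation
[Diagram: given a Mapping between Vocabulary A and Vocabulary B, either the Query is translated and evaluated against the Index, or the Index is translated and the Query evaluated against it, to produce Results.]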
Translation – Goals
● Automated translation.
● Understand consequences for precision and recall.
● Minimise loss of precision and recall.
Translation – Mapping
● Structural mapping ...
    – Use “broader”, “narrower”, “associated” and “equivalent” mapping relations.
● Query expression mapping ...
    – Use composite query expression as the target of the mapping.
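The two kinds of mapping might be represented as below. A structural mapping is a set of (source concept, mapping relation, target concept) triples; a query expression mapping sends a source concept to a composite expression in the target vocabulary. The `A:` and `B:` prefixes, the example concepts and the rule used in `translate_atom` (follow only “equivalent” and “narrower” links, by analogy with naïve expansion) are all assumptions of this sketch.

```python
structural_mapping = {
    ("A:felines", "equivalent", "B:cats"),
    ("A:felines", "narrower", "B:siamese"),  # assumed reading: B:siamese is narrower
    ("A:felines", "broader", "B:animals"),
    ("A:felines", "associated", "B:veterinary care"),
}

expression_mapping = {
    "A:cat care": ("and", [("atom", "subject", "B:cats"),
                           ("atom", "subject", "B:care")]),
}


def translate_atom(atom, mapping, relations=("equivalent", "narrower")):
    """Translate an atomic query into an "or" over mapped target concepts."""
    field_name, concept = atom
    targets = sorted(target for source, relation, target in mapping
                     if source == concept and relation in relations)
    return ("or", [("atom", field_name, target) for target in targets])


translated = translate_atom(("subject", "A:felines"), structural_mapping)
```

With a query expression mapping, translation is a direct lookup: `expression_mapping["A:cat care"]` is already a query over vocabulary B.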
Translation – Methods
● Naïve translation.
● Limited cost translation ...
    – Translation weight.
● N.B. Limited cost translation is much less demanding on the completeness of the mapping!
A Theory of Retrieval Using Structured Vocabularies
Next Steps ...
Adaptation and Change
● Use mappings to express change in vocabularies.
● Use translations to adapt indexes and/or queries.
● N.B. Requires vocabulary management tools that capture change information at the point of change!
Summary
● Pragmatic, operational approach to describing the use of structured vocabularies for retrieval.
● Formalise the underlying assumptions.
● Support standardization, especially of representations for index, vocabulary and mapping data.