RELEVANCE in information science

Tefko Saracevic, PhD
[email protected]
http://comminfo.rutgers.edu/~tefko/articles.htm

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 United States License


Fundamental concepts

Relevance is a fundamental concept or notion in information science

Every scholarly field has a fundamental, basic notion, concept, idea ... or a few


Two large questions*

Why? (Part I)

Why did relevance become a central notion of information science?

What? (Part II)

What did we learn about relevance through research in information science?

* URLs and references are in Notes – accessible after download

Relevance definitions

“1 a: relation to the matter at hand (emphasis added)

b: practical and especially social applicability: pertinence <giving relevance to college courses>

2: the ability (as of an information retrieval system) to retrieve material that satisfies the needs of the user.”


What is “matter at hand”?

Context in relation to which:
a question is asked
an information need is expressed as a query
a problem is addressed
interaction is conducted

No such thing as relevance without a context

Axiom: One cannot not have a context in information interaction.

Relevance is ALWAYS contextual

Relevance – by any other name...

Many names connote relevance e.g.:

pertinent; useful; applicable; significant; germane; material; bearing; proper; related; important; fitting; suited; apropos; ... & nowadays even truthful

Connotations may differ, but the concept is still relevance

“A rose by any other name would smell as sweet”, Shakespeare, Romeo and Juliet


Two worlds in information science

IR systems offer as answers their version of what may be relevant, by ever-improving algorithms

People go their own way & assess relevance by their problem at hand, context & criteria

The two worlds interact

Covered here: human world of relevance

NOT covered: how IR deals with relevance

WHY RELEVANCE?

Part I

Bit of history

Vannevar Bush: article “As We May Think”, 1945

Defined the problem as “... the massive task of making more accessible of a bewildering store of knowledge.”

problem still with us & growing

Suggested a solution, a machine: “Memex ... association of ideas ... duplicate mental processes artificially.”

Technological fix to problem

Vannevar Bush: 1890-1974

Information Retrieval (IR) – definition


Term “information retrieval” coined & defined by Calvin Mooers, 1951

“IR: ... intellectual aspects of description of information, ... and its specification for search ... and systems, technique, or machines ... [to provide information] useful to user”

Calvin Mooers: 1919-1994

Technological determinant

In IR, the emphasis was not only on organization but even more on searching; technology was suitable for searching

in the beginning information organization was done by people & searching by machines

nowadays information organization mostly by machines (sometimes by humans as well) & searching almost exclusively by machines


Two important pioneers

Hans Peter Luhn (1896-1964), at IBM: pioneered many IR computer applications; first to describe searching using Venn diagrams

Mortimer Taube (1910-1965), at Documentation Inc.: pioneered coordinate indexing; first to describe searching as Boolean algebra

Searching & relevance

Searching became a key component of information retrieval

extensive theoretical & practical concern with searching

technology uniquely suitable for searching

And searching is about retrieval of relevant answers


Thus RELEVANCE emerged as a key notion


Aboutness in librarianship

Key notion for bibliographic classifications, subject headings, indexing languages used in organizing information records; goes back centuries

choice of a given classification code, subject heading, index term ... denotes what a document (or part) is all about

Searching is assumed but not addressed: a given, taken for granted

A bit of history – assumptions related to searching

IFLA 1998, 2009, defined FRBR (Functional Requirements for Bibliographic Records): “four generic user tasks ... in relation to the elementary uses that are made of the data by the user: ...Find, Identify, Select, Obtain”, essentially the same as Cutter’s

Charles Ammi Cutter (1837-1903), in “Rules for Dictionary Catalog” (1876, 1904), defined “Objects” – objectives of a catalog – “to enable a person to find ... to show what a library has ... to assist in choice ...”

Why relevance?

Aboutness

A fundamental notion related to organization of information

Relates to subject & in a broader sense to epistemology

Relevance

A fundamental notion related to searching for information

Relates to problem-at-hand and context & in a broader sense to pragmatism


Relevance emerged as a central notion in information science because of practical & theoretical concerns with searching


WHAT HAVE WE LEARNED ABOUT RELEVANCE?

Part II


Claims & counterclaims in IR

Historically from the outset: “My system is better than your system!”

Well, which one is it? A: Let’s test it. But what criterion to use? What measure(s) based on the criterion?

Things got settled by the end of the 1950s and have remained mostly the same to this day

Relevance & IR testing

In 1955 Allen Kent & James W. Perry were the first to propose two measures for testing IR systems: “relevance” (later renamed “precision”) & “recall”

A scientific & engineering approach to testing

Allen Kent: 1921 -

James W. Perry: 1907-1971

Relevance as criterion for measures

Precision
Probability that what is retrieved is relevant
conversely: how much junk is retrieved?

Recall
Probability that what is relevant in a file is retrieved
conversely: how much relevant stuff is missed?

Both are probabilities of agreement between what the system retrieved/did not retrieve as relevant (systems relevance) & what the user assessed as relevant (user relevance), where user relevance is the gold standard for comparison
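Both measures reduce to counting the overlap between two sets. A minimal Python sketch, assuming the retrieved documents and the user-judged relevant documents for one query are given as sets of hypothetical document IDs; returning 0.0 for an empty denominator is a convention chosen here, not one stated in the presentation:

```python
def precision_recall(retrieved, relevant):
    """Precision and recall for one query.

    retrieved: IDs the system returned (systems relevance)
    relevant:  IDs the user judged relevant (user relevance, the gold standard)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # retrieved AND judged relevant
    # assumption: an empty denominator yields 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# hypothetical query: 4 documents retrieved, 3 of them relevant; 6 relevant in the file
p, r = precision_recall({"d1", "d2", "d3", "d4"},
                        {"d1", "d2", "d3", "d7", "d8", "d9"})
print(p, r)  # 0.75 0.5
```

Here 1 of the 4 retrieved documents is “junk” (precision 0.75), and 3 of the 6 relevant documents are missed (recall 0.5).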


First test – law of unintended consequences

Mid-1950s test of two competing systems:
subject headings by the Armed Services Technical Information Agency
uniterms (keywords) by Documentation Inc.

15,000 documents indexed by each group, 98 questions searched, but relevance judged by each group separately

Results:
First group: 2,200 relevant; second group: 1,998 relevant, but low agreement
Then peace talks, but even after these talks agreement came to only 30.9%
Test collapsed on relevance disagreements

Learned: Never, ever use more than a single judge per query.
Since then, to this day, IR tests don’t
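The presentation does not say how the 30.9% agreement figure was computed. Purely as an illustration, one common way to quantify agreement between two judges is overlap: documents both judged relevant, divided by documents either judged relevant. A minimal Python sketch with hypothetical judgments; treating two empty judgments as full agreement is an assumption made here:

```python
def overlap_agreement(judge_a, judge_b):
    """Documents judged relevant by both judges, out of those judged
    relevant by either judge (intersection over union)."""
    a, b = set(judge_a), set(judge_b)
    either = a | b
    if not either:
        return 1.0  # assumption: two empty judgments count as full agreement
    return len(a & b) / len(either)

# hypothetical relevance judgments for one query by two judges
judge_a = {"d1", "d2", "d3", "d4"}
judge_b = {"d3", "d4", "d5"}
print(overlap_agreement(judge_a, judge_b))  # 0.4
```

Each judge here finds a similar number of relevant documents (4 vs. 3), yet they agree on only 2 of the 5 documents either marked relevant, so similar totals can hide low agreement.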


Cranfield tests 1957-1967, led by Cyril Cleverdon (1914-1997)

Funded by NSF

Controlled testing: different indexing languages, same documents, same relevance judgments

Used traditional IR model – non-interactive

Many results, some surprising, e.g. simple keywords “high ranks on many counts”

Developed Cranfield methodology for testing

Still in use today, incl. in TREC, which started in 1992 and is still strong in 2013

Tradeoff in recall vs. precision

Example from TREC: [figure not reproduced]

Generally, there is a tradeoff:
recall can be increased by retrieving more, but precision decreases
precision can be increased by being more specific, but recall decreases

Some users want high precision, others high recall

Cleverdon’s law
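The tradeoff becomes visible when a ranked output is cut off at increasing depths: retrieving more raises recall while precision tends to fall. A minimal Python sketch, assuming a hypothetical ranking and hypothetical user judgments; the function name `precision_recall_at_k` is illustrative:

```python
def precision_recall_at_k(ranking, relevant, k):
    """Precision and recall when only the top-k ranked documents
    are retrieved (assumes k >= 1 and a non-empty relevant set)."""
    hits = sum(1 for doc in ranking[:k] if doc in relevant)
    return hits / k, hits / len(relevant)

# hypothetical ranked output of a system and hypothetical user judgments
ranking = ["d1", "d2", "d5", "d3", "d8", "d9", "d6", "d4"]
relevant = {"d1", "d2", "d3", "d4"}

for k in (1, 2, 4, 8):
    p, r = precision_recall_at_k(ranking, relevant, k)
    print(f"k={k}: precision={p:.2f} recall={r:.2f}")
```

This prints precision 1.00, 1.00, 0.75, 0.50 against recall 0.25, 0.50, 0.75, 1.00. With real rankings precision need not drop at every step; the tradeoff is a general tendency.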


Relevance experiments

First experiments, reported in 1960 & 1961 by an IBM group, compared the effects of various representations on relevance judgments

Over the years, some 300 experiments

Little funding: only two funded by a US agency (1967)

A variety of factors in human judgments of relevance addressed

Assumptions in Cranfield methodology

IR and thus relevance is static (traditional IR model)

Further, relevance is:
topical
binary
independent
stable
consistent
if pooling: complete

Inspired relevance experimentation on every one of these assumptions

Main finding: none of them holds

but these simplified assumptions enabled rich IR tests and many improvements

IR & relevance: static vs. dynamic

Q: Do relevance inferences & criteria change over time for the same user & task?
A: They do

For a given task, the user’s inferences depend on the stage of the task:
Different stages = differing selections
but different stages = similar criteria = different weights
Increased focus = increased discrimination = more stringent relevance inferences

IR & relevance inferences are highly dynamic processes

Experimental results

Topical
Topicality: very important but not an exclusive role.
Cognitive, situational, affective variables: play a role, e.g. user background (cognitive); task complexity (situational); intent, motivation (affective)

Binary
Continuum: users judge not only binarily (relevant – not relevant) but on a continuum & comparatively.
Bi-modality: it seems that assessments have high peaks at the end points of the range (not relevant, relevant), with smaller peaks in the middle range

Independent
Order: the order in which documents are presented to users seems to have an effect.
Near beginning: it seems that documents presented early have a higher probability of being inferred as relevant.

Experimental results (cont.)


Stable
Time: relevance judgments are not completely stable; they change over time as tasks progress & learning advances
Criteria: for judging relevance are fairly stable

Consistent
Expertise: higher = higher agreement, fewer differences; lower = lower agreement, more leniency.
Individual differences: the most prominent feature & factor in relevance inferences. Experts agree up to 80%; others around 30%
Number of judges: more judges = less agreement

If pooling: Complete
(if only a sample of the collection or a pool from several searches is evaluated)
Additions: with more pools or increased sampling, more relevant objects are found
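Pooling, as used in TREC-style tests, sends only the union of the top-ranked documents from several searches to the judges, so relevant documents that no search ranks highly are never found. A minimal Python sketch, assuming hypothetical runs and a hypothetical set standing in for what a judge would mark relevant if shown every document; `build_pool` and `depth` are illustrative names:

```python
def build_pool(runs, depth):
    """Union of the top-`depth` documents from each ranked run."""
    pool = set()
    for ranking in runs:
        pool.update(ranking[:depth])
    return pool

# hypothetical rankings from three searches
run_a = ["d1", "d2", "d3", "d4"]
run_b = ["d2", "d5", "d1", "d6"]
run_c = ["d7", "d3", "d8", "d2"]
# hypothetical: what a judge would mark relevant if shown every document
would_be_relevant = {"d1", "d3", "d5", "d7", "d9"}

for runs in ([run_a], [run_a, run_b], [run_a, run_b, run_c]):
    pool = build_pool(runs, depth=2)
    found = pool & would_be_relevant  # only pooled documents get judged
    print(len(runs), sorted(pool), sorted(found))
```

Each added run enlarges the pool and uncovers more relevant documents (1, then 2, then 4 found), while "d9", ranked highly by no run, is never judged, so the relevant set stays incomplete.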


Other experiments: Clues – on what basis & by what criteria do users make relevance judgments?

Content: topic, quality, depth, scope, currency, treatment, clarity

Object: characteristics of information objects, e.g., type, organization, representation, format, availability, accessibility, costs

Validity: accuracy of information provided, authority, trustworthiness of sources, verifiability

Matching – on what basis & by what criteria do users make relevance judgments to match their context?

Use or situational match
appropriateness to situation or tasks, usability, urgency; value in use

Cognitive match
understanding, novelty, mental effort

Affective match
emotional responses to information, fun, frustration, uncertainty

Belief match
personal credence given to information, confidence

Major general finding & conclusion from relevance experiments


Relevance is measurable

became part of general experimentation related to human information behavior


In conclusion

Information technology & systems will change dramatically even in the short run and in unforeseeable directions

But relevance is here to stay!


and relevance has many faces – some unusual


... different technology ...

and relevance in its use

Unusual [relevant] services: Library therapy dogs


U Michigan, Ann Arbor, Shapiro Library


Seed lending at public libraries

Thank you for inviting me!

Presentation in Wordle