WIRED Week 3
• Syllabus Update (next week)
• Readings Overview
- Quick Review of Last Week’s IR Models (if time)
- Evaluating IR Systems
- Understanding Queries
• Assignment Overview & Scheduling
- Leading WIRED Topic Discussions
- Web Information Retrieval System Evaluation & Presentation
• Projects and/or Papers Discussion
- Initial Ideas
- Evaluation
- Revise & Present
Evaluating IR Systems
• Recall and Precision
• Alternative Measures
• Reference Collections
- What
- Why
• Trends
Why Evaluate IR Systems?
• Leave it to the developers?
- No bugs
- Fully functional
• Let the market (users) decide?
- Speed
- (Perceived) accuracy
• Relevance is relevant
- Different types of searches, data and users
• “How precise is the answer set?” (p. 73)
Retrieval Performance Evaluation
• Task
- Batch or Interactive
- Each needs a specific interface
• Setting
• Context
- New search
- Monitoring
• Usability
- Lab tests
- Real world (search log) analysis
Recall and Precision
• Basic evaluation measurements for IR system performance
• Recall: the fraction of relevant documents that are retrieved
- 100% is perfect recall
- Every document that is relevant is found
• Precision: the fraction of retrieved documents that are relevant (see the sketch after this slide)
- 100% relevancy is perfect precision
- Every document that is found is relevant
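A minimal sketch (not from the slides) of how these two fractions can be computed from sets of document IDs; the relevant_docs and retrieved_docs sets below are made-up examples.

```python
# Recall and precision over sets of document IDs (illustrative sketch).

def recall(retrieved: set, relevant: set) -> float:
    """Fraction of the relevant documents that were retrieved."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)

def precision(retrieved: set, relevant: set) -> float:
    """Fraction of the retrieved documents that are relevant."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)

relevant_docs = {"d1", "d2", "d3", "d4"}   # made-up relevance judgments
retrieved_docs = {"d2", "d3", "d9"}        # made-up answer set
print(recall(retrieved_docs, relevant_docs))     # 2/4 = 0.5
print(precision(retrieved_docs, relevant_docs))  # 2/3 ≈ 0.67
```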
Recall and Precision Goals
• Everything is found (recall)
• The right set of documents is pulled from the found set (precision)
• What about ranking?
- Ranking is an absolute measure of relevance for the query
- Ranking is ordinal in almost all cases
Recall and Precision Considered
• 100 documents have been analyzed
• 10 documents in the set are relevant to the query
- 4 documents are found and all are relevant
• ??% recall, ??% precision
- 8 documents are found, but only 4 are relevant
• ??% recall, ??% precision (both cases worked out below)
• Which is more important?
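A hedged worked check of the two scenarios above, using only the slide’s own numbers (10 relevant documents in the analyzed set):

```python
# Worked check of the two scenarios on the slide above.
relevant_total = 10

# Scenario 1: 4 documents retrieved, all 4 relevant.
print(4 / relevant_total, 4 / 4)  # recall = 0.4 (40%), precision = 1.0 (100%)

# Scenario 2: 8 documents retrieved, only 4 relevant.
print(4 / relevant_total, 4 / 8)  # recall = 0.4 (40%), precision = 0.5 (50%)
```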
Recall and Precision Appropriate?
• Disagreements over perfect sets
• User errors in using results
• Redundancy of results
- Result diversity
- Metadata
• Dynamic data
- Indexable
- Recency of information may be key
• A single measure is better
- Combinatory
- User evaluation
Back to the User
• User evaluation
• Is one answer good enough? Rankings
• Satisficing
• Studies of Relevance are key
Other Evaluation Measures
• Harmonic Mean
- Single, combined measure
- Between 0 (none) & 1 (all)
- Only high when both P & R are high
- Still a percentage
• E measure (see the sketch after this slide)
- User determines (parameter) value of R & P
- Different tasks (legal, academic)
- An interactive search?
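A minimal sketch of both combined measures, assuming the standard textbook formulas F = 2PR/(P+R) and E = 1 - (1+b²)PR/(b²P + R); b is the user-set parameter, and in this form b > 1 emphasizes recall while b < 1 emphasizes precision.

```python
# Harmonic mean (F) and E measure, assuming the standard textbook definitions.

def harmonic_mean(precision: float, recall: float) -> float:
    """Single measure in [0, 1]; high only when both P and R are high."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def e_measure(precision: float, recall: float, b: float = 1.0) -> float:
    """E measure: lower is better; b encodes the user's recall/precision preference."""
    denom = b ** 2 * precision + recall
    if denom == 0:
        return 1.0
    return 1 - (1 + b ** 2) * precision * recall / denom

print(harmonic_mean(0.5, 0.4))     # ~0.44
print(e_measure(0.5, 0.4, b=2.0))  # ~0.58, weighting recall more heavily
```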
Coverage and Novelty
• System effects
- Relative recall
- Relative effort
• More natural, user-understandable measures
• User knows some % of documents are relevant
• Coverage = % of the documents the user expected that are found
• Novelty = % of documents the user didn’t know of (see the sketch after this slide)
- Content of document
- Document itself
- Author of document
- Purpose of document
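A minimal sketch, assuming the usual textbook reading of these bullets: both measures are judged against the relevant documents the user already knew about before searching. The sets below are made up.

```python
# Coverage and novelty relative to the user's prior knowledge (illustrative sketch).

def coverage(retrieved_relevant: set, relevant_known: set) -> float:
    """Fraction of the relevant documents the user already knew that were found."""
    if not relevant_known:
        return 0.0
    return len(retrieved_relevant & relevant_known) / len(relevant_known)

def novelty(retrieved_relevant: set, relevant_known: set) -> float:
    """Fraction of the relevant retrieved documents that were new to the user."""
    if not retrieved_relevant:
        return 0.0
    return len(retrieved_relevant - relevant_known) / len(retrieved_relevant)

known = {"d1", "d2", "d3"}                 # relevant documents the user expected
found_relevant = {"d2", "d3", "d7", "d9"}  # relevant documents the system returned
print(coverage(found_relevant, known))     # 2/3
print(novelty(found_relevant, known))      # 2/4 = 0.5
```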
Reference Collections
• Testbeds for IR evaluation
• TREC (Text Retrieval Conference) set
- Industry focus
- Topic-based or General
- Summary tables for tasks (queries)
- R & P averages
- Document analysis
- Measures for each topic
• CACM (general CS)
• ISI (academic, indexed, industrial)
Trends in IR Evaluation
• Personalization
• Dynamic Data
• Multimedia
• User Modeling
• Machine Learning (CPU/$)
Understanding Queries
• Types of Queries:
- Keyword
- Context
- Boolean
- Natural Language
• Pattern Matching
- More like this…
- Metadata
• Structural Environments
Boolean
• AND, OR, NOT
• Combination or individually
• Decision tree parsing for the system (see the sketch after this slide)
• Not so easy for the user with advanced queries
• Hard to backtrack and see differences in results
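A minimal sketch (not from the slides) of how a system can evaluate parsed Boolean queries with set operations over a tiny, made-up inverted index.

```python
# Boolean query evaluation over document-ID sets (illustrative sketch).

index = {
    "web":       {1, 2, 4, 5},
    "retrieval": {2, 3, 5},
    "cat":       {4},
}
all_docs = {1, 2, 3, 4, 5}

def posting(term: str) -> set:
    """Documents containing the term; empty set if unseen."""
    return index.get(term, set())

# "web AND retrieval"
print(posting("web") & posting("retrieval"))  # {2, 5}
# "web OR cat"
print(posting("web") | posting("cat"))        # {1, 2, 4, 5}
# "retrieval AND NOT web"
print(posting("retrieval") - posting("web"))  # {3}
# "NOT cat" (relative to the whole collection)
print(all_docs - posting("cat"))              # {1, 2, 3, 5}
```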
Keyword
• Single word (most common)
- Sets
- “Phrases”
• Context
- “Phrases”
- Near (# value in characters, words, documents, links; see the sketch after this slide)
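A minimal sketch (not from the slides) of a NEAR-style context query, here counted in words: the two keywords must occur within k words of each other in the same document.

```python
# NEAR/k proximity check within one document (illustrative sketch).

def near(doc_text: str, w1: str, w2: str, k: int) -> bool:
    words = doc_text.lower().split()
    pos1 = [i for i, w in enumerate(words) if w == w1]
    pos2 = [i for i, w in enumerate(words) if w == w2]
    return any(abs(i - j) <= k for i in pos1 for j in pos2)

print(near("Information retrieval on the Web", "retrieval", "web", 3))    # True
print(near("Information retrieval on the Web", "information", "web", 2))  # False
```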
Natural Language
• Asking
• Quoting
• Fuzzy matches
• Different evaluation methods might be needed
• Dynamic data “indexing” problematic
• Multimedia challenges
Pattern Matching
• Words
• Prefixes “comput*”
• Suffixes “*ology”
• Substrings “*exas*”
• Ranges “four ?? years ago”
• Regular Expressions (GREP; see the sketch after this slide)
• Error threshold
• User errors
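A minimal sketch (not from the slides) of how the pattern types above can be expressed as regular expressions; the sample vocabulary is made up, and “??” is read here as two single-character wildcards.

```python
# Wildcard-style patterns expressed as Python regular expressions (illustrative sketch).
import re

terms = ["computer", "computing", "biology", "texas", "four 20 years ago"]

patterns = {
    "prefix    comput*":           r"^comput",             # prefix match
    "suffix    *ology":            r"ology$",              # suffix match
    "substring *exas*":            r"exas",                # substring match
    "wildcard  four ?? years ago": r"^four .. years ago$", # '??' as two wildcard chars
}

for label, pattern in patterns.items():
    matches = [t for t in terms if re.search(pattern, t)]
    print(label, "->", matches)
```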
Query Protocols
• HTTP
• Z39.50
- Client – Server API
• WAIS
- Information/database connection
• ODBC
• JDBC
• P2P
Assignment Overview & Scheduling
• Leading WIRED Topic Discussions
- # in class = # of weeks left?
• Web Information Retrieval System Evaluation & Presentation
- 5-page written evaluation of a Web IR system
- Technology overview (how it works)
- A brief history of the development of this type of system (why it works better)
- Intended uses for the system (who, when, why)
- (Your) examples or case studies of the system in use and its overall effectiveness
• How can (Web) IR be better?
- Better IR models
- Better User Interfaces
• More to find vs. easier to find
• Scriptable applications
• New interfaces for applications
• New datasets for applications
Projects and/or Papers Overview
Project Idea #1 – simple HTML
• Graphical Google
• What kind of document?
• When was the document created?
Project Ideas
• Google History: keeps track of what I’ve seen and not seen
• Searching when it counts: Financial and Health information requires guided, quality search