WIRED Week 3
• Syllabus Update (next week)
• Readings Overview
- Quick Review of Last Week’s IR Models (if time)
- Evaluating IR Systems
- Understanding Queries
• Assignment Overview & Scheduling
- Leading WIRED Topic Discussions
- Web Information Retrieval System Evaluation & Presentation
• Projects and/or Papers Discussion
- Initial Ideas
- Evaluation
- Revise & Present
Evaluating IR Systems
• Recall and Precision
• Alternative Measures
• Reference Collections
- What
- Why
• Trends
Why Evaluate IR Systems?
• Leave it to the developers?
- No bugs
- Fully functional
• Let the market (users) decide?
- Speed
- (Perceived) accuracy
• Relevance is relevant
- Different types of searches, data and users
• “How precise is the answer set?” (p. 73)
Retrieval Performance Evaluation
• Task
- Batch or Interactive
- Each needs a specific interface
• Setting
• Context
- New search
- Monitoring
• Usability
- Lab tests
- Real world (search log) analysis
Recall and Precision
• Basic evaluation measurements for IR system performance
• Recall: the fraction of relevant documents that are retrieved
- 100% is perfect recall
- Every document that is relevant is found
• Precision: the fraction of retrieved documents that are relevant (see the sketch after this slide)
- 100% relevancy is perfect precision
- Every document that is found is relevant
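A minimal sketch (not from the slides) of how these two fractions can be computed from sets of document IDs; the relevant_docs and retrieved_docs sets below are made-up examples.

```python
# Recall and precision over sets of document IDs (illustrative sketch).

def recall(retrieved: set, relevant: set) -> float:
    """Fraction of the relevant documents that were retrieved."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)

def precision(retrieved: set, relevant: set) -> float:
    """Fraction of the retrieved documents that are relevant."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)

relevant_docs = {"d1", "d2", "d3", "d4"}   # made-up relevance judgments
retrieved_docs = {"d2", "d3", "d9"}        # made-up answer set
print(recall(retrieved_docs, relevant_docs))     # 2/4 = 0.5
print(precision(retrieved_docs, relevant_docs))  # 2/3 ≈ 0.67
```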
Recall and Precision Goals
• Everything is found (recall)
• The right set of documents is pulled from the found set (precision)
• What about ranking?
- Ranking is an absolute measure of relevance for the query
- Ranking is ordinal in almost all cases
Recall and Precision Considered
• 100 documents have been analyzed
• 10 documents in the set are relevant to the query
- 4 documents are found and all are relevant
• ??% recall, ??% precision
- 8 documents are found, but only 4 are relevant
• ??% recall, ??% precision (both cases worked out below)
• Which is more important?
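A hedged worked check of the two scenarios above, using only the slide’s own numbers (10 relevant documents in the analyzed set):

```python
# Worked check of the two scenarios on the slide above.
relevant_total = 10

# Scenario 1: 4 documents retrieved, all 4 relevant.
print(4 / relevant_total, 4 / 4)  # recall = 0.4 (40%), precision = 1.0 (100%)

# Scenario 2: 8 documents retrieved, only 4 relevant.
print(4 / relevant_total, 4 / 8)  # recall = 0.4 (40%), precision = 0.5 (50%)
```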
Recall and Precision Appropriate?
• Disagreements over perfect sets
• User errors in using results
• Redundancy of results
- Result diversity
- Metadata
• Dynamic data
- Indexable
- Recency of information may be key
• A single measure is better
- Combinatory
- User evaluation
Back to the User
• User evaluation
• Is one answer good enough? Rankings
• Satisficing
• Studies of Relevance are key
Other Evaluation Measures
• Harmonic Mean
- Single, combined measure
- Between 0 (none) & 1 (all)
- Only high when both P & R are high
- Still a percentage
• E measure (see the sketch after this slide)
- User determines (parameter) value of R & P
- Different tasks (legal, academic)
- An interactive search?
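A minimal sketch of both combined measures, assuming the standard textbook formulas F = 2PR/(P+R) and E = 1 - (1+b²)PR/(b²P + R); b is the user-set parameter, and in this form b > 1 emphasizes recall while b < 1 emphasizes precision.

```python
# Harmonic mean (F) and E measure, assuming the standard textbook definitions.

def harmonic_mean(precision: float, recall: float) -> float:
    """Single measure in [0, 1]; high only when both P and R are high."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def e_measure(precision: float, recall: float, b: float = 1.0) -> float:
    """E measure: lower is better; b encodes the user's recall/precision preference."""
    denom = b ** 2 * precision + recall
    if denom == 0:
        return 1.0
    return 1 - (1 + b ** 2) * precision * recall / denom

print(harmonic_mean(0.5, 0.4))     # ~0.44
print(e_measure(0.5, 0.4, b=2.0))  # ~0.58, weighting recall more heavily
```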
Coverage and Novelty
• System effects
- Relative recall
- Relative effort
• More natural, user-understandable measures
• User knows some % of documents are relevant
• Coverage = % of the documents the user expected that are found
• Novelty = % of documents the user didn’t know of (see the sketch after this slide)
- Content of document
- Document itself
- Author of document
- Purpose of document
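A minimal sketch, assuming the usual textbook reading of these bullets: both measures are judged against the relevant documents the user already knew about before searching. The sets below are made up.

```python
# Coverage and novelty relative to the user's prior knowledge (illustrative sketch).

def coverage(retrieved_relevant: set, relevant_known: set) -> float:
    """Fraction of the relevant documents the user already knew that were found."""
    if not relevant_known:
        return 0.0
    return len(retrieved_relevant & relevant_known) / len(relevant_known)

def novelty(retrieved_relevant: set, relevant_known: set) -> float:
    """Fraction of the relevant retrieved documents that were new to the user."""
    if not retrieved_relevant:
        return 0.0
    return len(retrieved_relevant - relevant_known) / len(retrieved_relevant)

known = {"d1", "d2", "d3"}                 # relevant documents the user expected
found_relevant = {"d2", "d3", "d7", "d9"}  # relevant documents the system returned
print(coverage(found_relevant, known))     # 2/3
print(novelty(found_relevant, known))      # 2/4 = 0.5
```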
Reference Collections
• Testbeds for IR evaluation
• TREC (Text Retrieval Conference) set
- Industry focus
- Topic-based or General
- Summary tables for tasks (queries)
- R & P averages
- Document analysis
- Measures for each topic
• CACM (general CS)
• ISI (academic, indexed, industrial)
Trends in IR Evaluation
• Personalization
• Dynamic Data
• Multimedia
• User Modeling
• Machine Learning (CPU/$)
Understanding Queries
• Types of Queries:
- Keyword
- Context
- Boolean
- Natural Language
• Pattern Matching
- More like this…
- Metadata
• Structural Environments
Boolean
• AND, OR, NOT
• Combination or individually
• Decision tree parsing for the system (see the sketch after this slide)
• Not so easy for the user with advanced queries
• Hard to backtrack and see differences in results
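A minimal sketch (not from the slides) of how a system can evaluate parsed Boolean queries with set operations over a tiny, made-up inverted index.

```python
# Boolean query evaluation over document-ID sets (illustrative sketch).

index = {
    "web":       {1, 2, 4, 5},
    "retrieval": {2, 3, 5},
    "cat":       {4},
}
all_docs = {1, 2, 3, 4, 5}

def posting(term: str) -> set:
    """Documents containing the term; empty set if unseen."""
    return index.get(term, set())

# "web AND retrieval"
print(posting("web") & posting("retrieval"))  # {2, 5}
# "web OR cat"
print(posting("web") | posting("cat"))        # {1, 2, 4, 5}
# "retrieval AND NOT web"
print(posting("retrieval") - posting("web"))  # {3}
# "NOT cat" (relative to the whole collection)
print(all_docs - posting("cat"))              # {1, 2, 3, 5}
```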
Keyword
• Single word (most common)
- Sets
- “Phrases”
• Context
- “Phrases”
- Near (# value in characters, words, documents, links; see the sketch after this slide)
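A minimal sketch (not from the slides) of a NEAR-style context query, here counted in words: the two keywords must occur within k words of each other in the same document.

```python
# NEAR/k proximity check within one document (illustrative sketch).

def near(doc_text: str, w1: str, w2: str, k: int) -> bool:
    words = doc_text.lower().split()
    pos1 = [i for i, w in enumerate(words) if w == w1]
    pos2 = [i for i, w in enumerate(words) if w == w2]
    return any(abs(i - j) <= k for i in pos1 for j in pos2)

print(near("Information retrieval on the Web", "retrieval", "web", 3))    # True
print(near("Information retrieval on the Web", "information", "web", 2))  # False
```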
Natural Language
• Asking
• Quoting
• Fuzzy matches
• Different evaluation methods might be needed
• Dynamic data “indexing” problematic
• Multimedia challenges
Pattern Matching
• Words
• Prefixes “comput*”
• Suffixes “*ology”
• Substrings “*exas*”
• Ranges “four ?? years ago”
• Regular Expressions (GREP; see the sketch after this slide)
• Error threshold
• User errors
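A minimal sketch (not from the slides) of how the pattern types above can be expressed as regular expressions; the sample vocabulary is made up, and “??” is read here as two single-character wildcards.

```python
# Wildcard-style patterns expressed as Python regular expressions (illustrative sketch).
import re

terms = ["computer", "computing", "biology", "texas", "four 20 years ago"]

patterns = {
    "prefix    comput*":           r"^comput",             # prefix match
    "suffix    *ology":            r"ology$",              # suffix match
    "substring *exas*":            r"exas",                # substring match
    "wildcard  four ?? years ago": r"^four .. years ago$", # '??' as two wildcard chars
}

for label, pattern in patterns.items():
    matches = [t for t in terms if re.search(pattern, t)]
    print(label, "->", matches)
```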
Query Protocols
• HTTP
• Z39.50
- Client – Server API
• WAIS
- Information/database connection
• ODBC
• JDBC
• P2P
Assignment Overview & Scheduling
• Leading WIRED Topic Discussions
- # in class = # of weeks left?
• Web Information Retrieval System Evaluation & Presentation
- 5-page written evaluation of a Web IR system
- Technology overview (how it works)
- A brief history of the development of this type of system (why it works better)
- Intended uses for the system (who, when, why)
- (Your) examples or case studies of the system in use and its overall effectiveness
• How can (Web) IR be better?
- Better IR models
- Better User Interfaces
• More to find vs. easier to find
• Scriptable applications
• New interfaces for applications
• New datasets for applications
Projects and/or Papers Overview
Project Idea #1 – simple HTML
• Graphical Google
• What kind of document?
• When was the document created?
Project Ideas
• Google History: keeps track of what I’ve seen and not seen
• Searching when it counts: Financial and Health information requires guided, quality search