2004.12.09 - SLIDE 1 - IS 202 – FALL 2004
Lecture 29: Final Review
Prof. Ray Larson & Prof. Marc Davis
UC Berkeley SIMS
Tuesday and Thursday 10:30 am - 12:00 pm
Fall 2004
http://www.sims.berkeley.edu/academics/courses/is202/f04/
SIMS 202:
Information Organization
and Retrieval
2004.12.09 - SLIDE 2 - IS 202 – FALL 2004
Lecture Overview
• Final Exam
• Final Review
• Course Evaluations
• Phone Details
• Next Steps
2004.12.09 - SLIDE 4 - IS 202 – FALL 2004
Final Exam Details
• Date: December 14 Time: 9:30-12:30
• The exam is open-book, open-note, AND open-computer
• You may use your own laptop or one of the computers in the lab
• The results will need to be printed
• It can be handwritten if you wish; if so, be sure to bring pens, pencils, and erasers
• It is essential that you have access to and/or bring your final facetted classification so that you can analyze it and use it
2004.12.09 - SLIDE 5 - IS 202 – FALL 2004
Final Exam Details
• There will be 8 questions on the exam
– Some questions have multiple parts
• One of the questions will be taken from the Discussion Questions you submitted in class
• Questions will be worth a specific number of points and these will be stated on the exam itself
• Partial credit will be awarded for partial answers, so we advise that you do not skip any questions
• In your answers, please balance conciseness with illustration of all of the requested information
– In other words, don't write a lot of things that aren't asked for, but try to address all of what is asked for
2004.12.09 - SLIDE 6 - IS 202 – FALL 2004
Final Exam Details
• The exam will be comprehensive, covering both the Information Organization and Retrieval parts of the course
– The emphasis will be on the last half (Organization) (about 70/30 bias towards the last half)
• Each person will work individually
• The exam period is three hours
– You will likely need the entire time
• If you use network-accessed material for any part of the exam, be sure to cite your sources
2004.12.09 - SLIDE 7 - IS 202 – FALL 2004
Study Guide
• Be sure you understand the material that was covered in lectures and have read and understood the assigned readings
• Be sure you can do activities similar to what was done in the homework assignments
2004.12.09 - SLIDE 8 - IS 202 – FALL 2004
Study Guide
• We will have questions that require you to generalize from what you've learned and synthesize ideas
– So be sure you have thought about the ideas covered in lectures, readings, and homework assignments
• These ideas and abilities should be at your fingertips
– There won't be time during the exam to do a lot of catch-up reading on topics you haven't studied
2004.12.09 - SLIDE 9 - IS 202 – FALL 2004
Example Questions
• These are available on the Class Web site
• Note that these examples are NOT the exact questions that will be on the exam but are similar to questions that have been used in the past
• If you have actively participated in the phone project assignments from the last part of the course and are familiar with the facetted classification you designed and built, this will greatly help you on at least 30% of the final exam
2004.12.09 - SLIDE 10 - IS 202 – FALL 2004
Review of Course Content
• We can draw on:
– All sets of slides (including this one)
– The Course Readers
– Textbooks
– Handout papers
– Assignments
– Discussion questions and issues
2004.12.09 - SLIDE 11 - IS 202 – FALL 2004
Lecture Overview
• Final Exam
• Final Review
• Course Evaluations
• Phone Details
• Next Steps
2004.12.09 - SLIDE 12 - IS 202 – FALL 2004
Course Schedule
• Organization
– Categorization
– Knowledge Representation
– Lexical Relations and WordNet
– Controlled Vocabularies Introduction
– Phone Project Introduction
– Semantic Web and RDF
– Facetted Classification
– Thesaurus Design and Construction
– Metadata Standards
– Multimedia Information Organization and Retrieval
– Metadata for Media
– Mobile and Context-Aware Multimedia Systems
– Phone Project Presentations
– Future of Information Systems
• Retrieval
– Overview
– What is Information?
– History of Information Systems
– Introduction to the Search Process
– Boolean Queries and Text Processing
– Web Search Issues and Architecture
– Implementing Web Site Search Engines
– Statistical Properties of Text and Vector Representation
– Probabilistic Ranking & Relevance Feedback
– Evaluation
– Database Design
2004.12.09 - SLIDE 13 - IS 202 – FALL 2004
Your Questions
• What topics and/or questions would you like to discuss today?
2004.12.09 - SLIDE 14 - IS 202 – FALL 2004
Information Retrieval Topics
• Information
• Document Representation and Statistical Properties of Text
• Queries, Ranking, and the Vector Space Model
• IR systems and Implementation
• Evaluation of IR Systems
• The Search Process and User Interfaces
• Relevance Feedback
• Database Design
2004.12.09 - SLIDE 15 - IS 202 – FALL 2004
Information Retrieval Topics
• Information
– What is the information life cycle?
– What are different ways of measuring information? What are different ways of defining information?
• Document Representation and Statistical Properties of Text
– What is the significance of Zipf's law for weighting of terms in information retrieval?
– What kinds of errors can a stemming algorithm produce?
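As one way to think about the stemming question, here is a minimal sketch in Python of the two usual error types. The suffix list and the function `toy_stem` are hypothetical, invented for this illustration; this is not the Porter algorithm or any stemmer used in the course.

```python
# Toy suffix stripper (hypothetical; for illustration only).
SUFFIXES = ("ing", "ity", "ed", "es", "e", "s")  # checked in this order


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Overstemming: words with different meanings collapse to one stem.
over = (toy_stem("university"), toy_stem("universe"))

# Understemming: related forms of one word keep different stems.
under = (toy_stem("alumnus"), toy_stem("alumni"))

print(over)
print(under)
```

Overstemming tends to hurt precision (unrelated documents match), while understemming tends to hurt recall (related documents are missed).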
2004.12.09 - SLIDE 16 - IS 202 – FALL 2004
Information Retrieval Topics
• Queries, Ranking, and the Vector Space Model
– What is the difference between a search engine that uses the vector space ranking algorithm on natural language queries and a system that uses Boolean queries?
– What is the role of coordination level ranking in a facetted Boolean system?
– Describe the following information need in terms of a faceted Boolean query. What kinds of weighting algorithms can be applied to a faceted query like this? “I would like to find articles about the effects of the passage of the independent investigator statute by Congress on how the U.S. president chooses an attorney general.”
– Why do different web search engines return different sets of documents for the same query?
– Redo the computations of Assignment 3 part 3 using different values for TF.
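To practice the effect of different TF values on vector space ranking, the sketch below may help. It is written under stated assumptions: the three documents, the query, and the two TF variants (`raw_tf`, `log_tf`) are invented here and are not the data or the weighting scheme from Assignment 3; IDF is taken as log(N/df) and similarity as cosine.

```python
import math
from collections import Counter

# Hypothetical toy collection (not the Assignment 3 data).
docs = {
    "d1": "boolean query retrieval",
    "d2": "vector space ranking retrieval retrieval",
    "d3": "boolean logic",
}


def raw_tf(count):
    """Raw term frequency."""
    return float(count)


def log_tf(count):
    """Log-dampened term frequency: 1 + ln(count) for count > 0."""
    return 1.0 + math.log(count) if count > 0 else 0.0


def build_vectors(texts, tf):
    """Return tf*idf vectors per document and the idf table."""
    counts = {name: Counter(text.split()) for name, text in texts.items()}
    n_docs = len(counts)
    df = Counter(term for c in counts.values() for term in c)
    idf = {term: math.log(n_docs / df[term]) for term in df}
    vectors = {
        name: {t: tf(c) * idf[t] for t, c in cnt.items()}
        for name, cnt in counts.items()
    }
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, tf):
    """Rank all documents against the query, best first."""
    vectors, idf = build_vectors(docs, tf)
    q_counts = Counter(query.split())
    q_vec = {t: tf(c) * idf.get(t, 0.0) for t, c in q_counts.items()}
    scores = {name: cosine(q_vec, vec) for name, vec in vectors.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


raw_ranking = rank("retrieval ranking", raw_tf)
log_ranking = rank("retrieval ranking", log_tf)
print(raw_ranking)
print(log_ranking)
```

In this toy collection the order of the documents stays the same under both TF variants, but the score of d2 changes because "retrieval" occurs twice in it. A Boolean system, by contrast, would return an unranked set.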
2004.12.09 - SLIDE 17 - IS 202 – FALL 2004
Information Retrieval Topics
• IR systems and Implementation
– Draw and label a diagram that shows the major components of an IR system.
– What are the special features of the Cheshire II information access system?
– What is the purpose of an inverted index? How is it used to generate answers to Boolean queries?
– Convert the contents of a set of documents (short texts) into an inverted index representation.
• Evaluation of IR Systems
– Define precision. Define recall. Define relevance. How are the three interrelated?
– Under what circumstances is high recall desirable? Under what circumstances is high precision?
– What is the main purpose of TREC? How does it differ from earlier evaluation efforts?
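The inverted index and precision/recall questions above can be rehearsed with a small Python sketch. The documents, the query terms, and the relevance judgments are all hypothetical; a real system would also handle tokenization, stopwords, and stemming.

```python
from collections import defaultdict

# Hypothetical short documents keyed by document id.
docs = {
    1: "information retrieval systems",
    2: "boolean retrieval",
    3: "information organization",
}


def build_inverted_index(texts):
    """Map each term to the set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in texts.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


index = build_inverted_index(docs)

# Boolean operators map onto set operations over the postings.
and_result = index["information"] & index["retrieval"]  # AND
or_result = index["information"] | index["retrieval"]   # OR
not_result = index["retrieval"] - index["boolean"]      # AND NOT


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


relevant = {1, 2}  # assumed relevance judgments for this illustration
p, r = precision_recall(or_result, relevant)
print(sorted(and_result), sorted(or_result), sorted(not_result), p, r)
```

Here the OR query retrieves every relevant document (recall 1.0) but also one non-relevant document, so precision drops to 2/3. This is the usual trade-off between the two measures.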
2004.12.09 - SLIDE 18 - IS 202 – FALL 2004
Information Retrieval Topics
• The Search Process and User Interfaces
– Search and retrieval is part of a larger process. Name some other components of that process.
– How/why doesn't the Bates berry-picking model fit with the standard information retrieval model?
– How (fundamentally) does search on a directory system like Yahoo differ from search on Altavista or Google?
2004.12.09 - SLIDE 19 - IS 202 – FALL 2004
Information Retrieval Topics
• Relevance Feedback
– What is the main difference between relevance feedback as defined in the literature and the more current web-based notion of "more like this"?
– Given a query, three documents marked as relevant, and the Rocchio formula for relevance feedback given in class, compute the vector for the new query that results.
– The Koenemann & Belkin study found results in three conditions for relevance feedback: opaque, transparent, and penetrable. Consider the different ways people have recently implemented systems for predicting which web page to show the user next. How do the differences in these systems correspond to the different relevance feedback conditions?
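For the Rocchio computation, a minimal Python sketch follows. It assumes the standard textbook form q' = alpha*q + beta*(centroid of relevant documents) - gamma*(centroid of non-relevant documents); the formula given in class is not reproduced in these slides and may differ. The weights alpha=1.0, beta=0.75, gamma=0.15 and the example vectors are assumptions chosen for illustration.

```python
def rocchio(query, relevant, nonrelevant=(), alpha=1.0, beta=0.75, gamma=0.15):
    """Return the reweighted query vector under the standard Rocchio form.

    The default weights are common textbook values, assumed here.
    """
    new_query = [alpha * w for w in query]
    for doc_set, weight in ((relevant, beta), (nonrelevant, -gamma)):
        if not doc_set:
            continue  # an empty set contributes nothing
        for i in range(len(new_query)):
            centroid_i = sum(d[i] for d in doc_set) / len(doc_set)
            new_query[i] += weight * centroid_i
    # Negative term weights are clamped to zero, a common convention.
    return [max(0.0, w) for w in new_query]


# Hypothetical four-term vocabulary; one query and three relevant documents.
query = [1.0, 0.0, 1.0, 0.0]
relevant_docs = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 1.0, 0.0],
]

new_query = rocchio(query, relevant_docs)
print(new_query)
```

With only relevant documents marked, the gamma term drops out, and the new query is the old query plus 0.75 times the centroid of the three relevant vectors, which works out to approximately [1.75, 0.5, 1.5, 0.0].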
2004.12.09 - SLIDE 20 - IS 202 – FALL 2004
Information Retrieval Topics
• Database Design
– How is a database different from a file system?
– What are the benefits of a database system?
– What do we mean by data independence?
– What are the benefits/drawbacks of the primary database models?
– Entity-Relationship Diagrams -- what are they for, how do you create them?
– How do you normalize a relational model database?
– What is a join?
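To make the join question concrete, here is a small sketch using Python's built-in sqlite3 module. The `author` and `book` tables and their rows are hypothetical; they stand in for a normalized design where author names are stored once and books refer to them by key.

```python
import sqlite3

# In-memory database with two normalized tables (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE author (
        author_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE book (
        book_id   INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES author(author_id)
    );
    INSERT INTO author VALUES (1, 'Author One'), (2, 'Author Two');
    INSERT INTO book VALUES
        (10, 'Title A', 1),
        (11, 'Title B', 1),
        (12, 'Title C', 2);
    """
)

# The join recombines rows from both tables wherever the keys match.
rows = conn.execute(
    """
    SELECT author.name, book.title
    FROM book
    JOIN author ON book.author_id = author.author_id
    ORDER BY book.book_id
    """
).fetchall()

for name, title in rows:
    print(name, "-", title)

conn.close()
```

Normalization removes the repeated author name from the book rows; the join is the operation that puts the pieces back together at query time.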
2004.12.09 - SLIDE 21 - IS 202 – FALL 2004
Information Organization Topics
• Categorization
• Knowledge Representation
• Lexical Relations and WordNet
• Controlled Vocabularies
• Semantic Web and RDF
• Facetted Classification and Thesaurus Design and Construction
• Metadata Standards
• Multimedia Information Organization and Retrieval
• Metadata for Motion Pictures: Media Streams and MPEG-7
• Mobile and Context-Aware Multimedia Information Systems
• Looking Backward, Looking Forward: Future of Information Systems
• Project Presentations
2004.12.09 - SLIDE 22 - IS 202 – FALL 2004
Information Organization Topics
• Categorization
– What is the definition of class membership in traditional categorization? How does traditional categorization have difficulty describing certain phenomena, like games (give 1 other example besides games)?
– What is the “basic level” in categorization and how is it psychologically primary? How might the use of basic level categorization affect the design and use of information systems?
• Knowledge Representation
– What limitations in standard information retrieval do knowledge representation technologies try to overcome? What challenges do they face in the attempt?
– What are the similarities and differences between commonsense knowledge representation systems like CYC and facetted metadata classifications like the Art and Architecture Thesaurus or the facetted classification you built (give three examples)?
2004.12.09 - SLIDE 23 - IS 202 – FALL 2004
Information Organization Topics
• Lexical Relations and WordNet
– What are three lexical relations in WordNet that would be useful in an information retrieval task (explain how and give examples)?
– Where are the meanings of the words in WordNet? How would assuming the conduit metaphor vs. the toolmakers’ paradigm of communication lead you to different answers to this question?
• Controlled Vocabularies
– What does Svenonius consider to be the primary difficulties with using controlled vocabularies?
– What is the purpose of authority control? Is this a type of controlled vocabulary? Why or why not?
2004.12.09 - SLIDE 24 - IS 202 – FALL 2004
Information Organization Topics
• Semantic Web and RDF
– What are the different basic topological structures of XML and RDF? What benefits and problems do these respective structures offer for information organization and retrieval?
– What is the Semantic Web effort trying to accomplish? What challenges does that effort face and how might they be overcome?
• Facetted Classification and Thesaurus Design and Construction
– What are the differences between classical and faceted classification and how do these differences affect the design and use of information systems?
– How is a classification scheme or a thesaurus designed?
Information Organization Topics• Metadata Standards
– What are the motivations behind creating and using metadata systems like Dublin Core, MARC, AACR II, etc.?
– How do metadata standards come about and how might their provenance affect their adoption?
• Multimedia Information Organization and Retrieval– What is the “Kuleshov Effect” and how might it affect
the design of metadata for multimedia data?– What are the “semantic gap” and the “sensory gap”
and what challenges do they present for the design of information systems for multimedia data?
2004.12.09 - SLIDE 26 - IS 202 – FALL 2004
Information Organization Topics
• Metadata for Motion Pictures: Media Streams and MPEG-7
– What limitations do keywords pose for multimedia information retrieval and how might those limitations be addressed?
– What aspects of multimedia content description is MPEG-7 attempting to standardize?
• Mobile and Context-Aware Multimedia Information Systems
– How are cameraphones distinguished from traditional digital cameras in their technological capabilities and use (give 5 examples)?
– What contextual metadata could be useful in describing and retrieving information, and how (give 4 examples)?
2004.12.09 - SLIDE 27 - IS 202 – FALL 2004
Information Organization Topics
• Looking Backward, Looking Forward: Future of Information Systems
– How are Bush’s vision of the Memex and the current World Wide Web similar and different (explain two similarities and two differences)?
• Project Presentations
– In revising your facetted metadata ontology how did you increase its expressiveness and reusability (give 3 examples)?
– How well would the ontology you and your partner group designed support one of the other mobile media metadata applications presented by your classmates?
2004.12.09 - SLIDE 28 - IS 202 – FALL 2004
Lecture Overview
• Final Exam
• Final Review
• Course Evaluations
• Phone Details
• Next Steps
2004.12.09 - SLIDE 29 - IS 202 – FALL 2004
Course Evaluations
• Please take these seriously
• We and your colleagues really benefit from these in many ways
– They affect our promotion and tenure
– They give us helpful feedback on what worked and what didn't to help us for next year and beyond
– They in no way affect your grade
2004.12.09 - SLIDE 30 - IS 202 – FALL 2004
Lecture Overview
• Final Exam
• Final Review
• Course Evaluations
• Phone Details
• Next Steps
2004.12.09 - SLIDE 31 - IS 202 – FALL 2004
Phone Details
• Use over break?
• Need roaming?
• Want GPS unit?
• Want to still get photos off the phone?
• Want to switch to primary cell phone number?
• Can bring in on Friday?
• Can bring in on Monday?
2004.12.09 - SLIDE 32 - IS 202 – FALL 2004
Lecture Overview
• Final Exam
• Final Review
• Course Evaluations
• Phone Details
• Next Steps
2004.12.09 - SLIDE 33 - IS 202 – FALL 2004
Study hard, and good luck!
Thank you for all the great work!