CS523 INFORMATION RETRIEVAL
COURSE INTRODUCTION
• YÜCEL SAYGIN
• SABANCI UNIVERSITY
Contact Info
http://people.sabanciuniv.edu/~ysaygin
Tel: 9576
No specific office hours. You can drop by any time you like. Email or call me to make sure I am in the office.
Course Info
Reference Book: Introduction to Information Retrieval
Authors: Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze
Publisher: Cambridge University Press, 2008
Course Info
Grading:
• Homework: 10%
• Project: 40%
• Paper presentation: 20%
• Term paper: 20%
• Attendance during paper presentations: 10%
Topics that will be covered
• Document Retrieval Techniques
• Information Retrieval on the Web
• Data Mining for Information Retrieval
Aim of the course
Knowledge: to introduce information retrieval techniques
Skills: reading and presenting papers; research and/or project work
A Rough Schedule
October, November: lectures on various information retrieval techniques
Remaining weeks: Paper and research project presentations
What I will do
• Give the basics of information retrieval
• Project supervision
• Give directions and advice on the projects
• Coordination of the presentations
What I expect you to do
Understand the basic concepts of Information Retrieval
Choose a specific area and two related papers on the same topic to present in class
Attendance is required for paper presentations; you will lose 2% of your overall grade for each presentation you miss.
Write a term paper on the two papers presented.
Do a project and a final report describing what you learned or achieved in the scope of the project.
Sources
• TREC Conference http://trec.nist.gov/
• SIGIR Conference http://www.sigir.org/
• WWW Conference http://www2004.org/
• ACM TOIS Journal
• SIGMOD, VLDB, ICDE Conferences (database perspective)
• SIGKDD, ICDM Conferences (data mining perspective)
Tools
SMART IR (Cornell Univ.) http://www.cs.cornell.edu/Info/Projects/NLP/
Glimpse from Univ. Arizona http://webglimpse.net/
Google, AltaVista, Yahoo
Information Retrieval
Refers to the retrieval of any type of information, such as:
• Structured data (e.g., relational databases)
• Text (we will focus on this)
• Video
• Images, sound
• DNA
Document Retrieval
[Figure: User Query → Static Document Collection → Ranked Result]
• The document collection is indexed in advance
• The user query is ad hoc
• Results are ranked with respect to their similarity to the user query
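The three properties above (offline indexing, ad hoc queries, similarity ranking) can be illustrated with a small sketch. The slide does not name a similarity measure; this example assumes TF-IDF term weighting with cosine similarity, one common choice covered in the reference book. The function names and the tiny collection are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    # Real systems also strip punctuation, stem, and remove stop words.
    return text.lower().split()


def build_index(docs):
    """Offline step: precompute a TF-IDF vector for every document in the static collection."""
    n = len(docs)
    tfs = [Counter(tokenize(d)) for d in docs]
    df = Counter(term for tf in tfs for term in tf)  # document frequency per term
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [{t: c * idf[t] for t, c in tf.items()} for tf in tfs]
    return idf, vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def search(query, idf, vectors, k=3):
    """Online step: weight the ad hoc query, score every document, return the top k."""
    q_tf = Counter(tokenize(query))
    # Query terms that never occur in the collection get weight 0.
    q_vec = {t: c * idf.get(t, 0.0) for t, c in q_tf.items()}
    scored = sorted(((cosine(q_vec, v), i) for i, v in enumerate(vectors)), reverse=True)
    # Keep only documents with a non-zero score, as (doc_id, score) pairs.
    return [(i, round(s, 3)) for s, i in scored[:k] if s > 0]


docs = [
    "web search engine ranking",
    "database query optimization",
    "search engine index",
]
idf, vectors = build_index(docs)
print(search("search engine", idf, vectors))  # → [(2, 0.463), (0, 0.346)]
```

Document 2 ranks above document 0 because it is shorter, so the two matching terms make up a larger share of its vector; document 1 shares no terms with the query and is not returned.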
Document Routing
• User profiles are set in advance
• Incoming documents are directed to the relevant users
• Useful for redirecting corporate emails to the relevant departments (sales, marketing, support, etc.)
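Routing reverses the roles of retrieval: the profiles are fixed and the documents arrive one at a time. A minimal sketch, assuming each profile is a hand-picked keyword set and a document goes to every profile that shares at least `min_hits` keywords with it. The department names, keywords, and helper names are hypothetical; practical routing systems often use learned classifiers or the same similarity measures as retrieval instead.

```python
import re

# Hypothetical profiles, set in advance: department -> keywords of interest.
PROFILES = {
    "sales": {"price", "quote", "order", "discount"},
    "marketing": {"campaign", "brand", "advertising", "newsletter"},
    "support": {"error", "crash", "bug", "install", "refund"},
}


def terms(text):
    # Lowercase and keep alphabetic tokens only. No stemming, so
    # "crashes" does NOT match the keyword "crash".
    return set(re.findall(r"[a-z]+", text.lower()))


def route(document, profiles, min_hits=1):
    """Return the profiles whose keywords occur in the document, best match first."""
    doc_terms = terms(document)
    scored = []
    for name, keywords in profiles.items():
        hits = len(doc_terms & keywords)
        if hits >= min_hits:
            scored.append((hits, name))
    # Sort by number of matching keywords (descending), then by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


email = "The installer shows an error and then crashes. Can I get a refund on my order?"
print(route(email, PROFILES))  # → ['support', 'sales']
```

The email reaches support (matches "error" and "refund") and sales (matches "order"); an unmatched email returns an empty list and could fall back to a default inbox.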
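Performance Metrics for IR
• Precision
• Recall
• It is usually not practical to get both high precision and high recall; improving one tends to lower the other
[Figure: within the whole document space, the set of relevant documents and the set of retrieved documents overlap; the overlap is the set of relevant and retrieved documents]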
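In terms of the regions in the diagram, precision = |Relevant ∩ Retrieved| / |Retrieved| (the fraction of retrieved documents that are relevant) and recall = |Relevant ∩ Retrieved| / |Relevant| (the fraction of relevant documents that were retrieved). A minimal sketch of both, with hypothetical document IDs, showing the trade-off: retrieving more documents raises recall here but lowers precision.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) from collections of document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant AND retrieved
    # Convention used here (an assumption): an empty set yields 0.0.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


relevant = {1, 2, 3, 4}
small = [1, 2]                    # a short, careful result list
large = [1, 2, 3, 5, 6, 7, 8, 9]  # a longer list that finds one more relevant doc

print(precision_recall(small, relevant))  # → (1.0, 0.5)
print(precision_recall(large, relevant))  # → (0.375, 0.75)
```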
First Reading for Tomorrow
The Anatomy of a Large-Scale Hypertextual Web Search Engine (WWW Conference 1998)
Paper by Sergey Brin and Lawrence Page, www-db.stanford.edu/~backrub/google.html
Web Information Retrieval
Two possible ways:
• Browse the web structure, starting from a location like Yahoo where pages are categorized
• Use search engines
Web Information Retrieval
Challenges
• Scale:
  • Hundreds of millions of queries per day
  • The Web keeps growing, so continuous crawling is needed
  • Operating-system limits and disk seek times are obstacles
  • Google handles large data sets through indexing and compression
• Search quality is important:
  • Completeness of the index matters
  • But ranking is also of utmost importance because of the size of the Web
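The slide only says that large data sets are handled by indexing and compression. One standard technique, described in the course's reference book (Manning, Raghavan and Schütze, Chapter 5), is an inverted index whose sorted docID lists are stored as gaps and encoded with variable-byte codes. The sketch below shows that generic technique; it is not a claim about Google's actual index format, and all function names are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to the sorted list of IDs of documents containing it."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for term in set(text.lower().split()):
            index[term].append(doc_id)  # doc_id only increases, so lists stay sorted
    return dict(index)


def vb_encode_number(n):
    """Variable-byte code: 7 payload bits per byte; the high bit marks the last byte."""
    out = []
    while True:
        out.insert(0, n % 128)
        if n < 128:
            break
        n //= 128
    out[-1] += 128
    return bytes(out)


def vb_encode(numbers):
    return b"".join(vb_encode_number(n) for n in numbers)


def vb_decode(data):
    numbers, n = [], 0
    for byte in data:  # iterating over bytes yields ints in 0..255
        if byte < 128:
            n = 128 * n + byte
        else:
            numbers.append(128 * n + byte - 128)
            n = 0
    return numbers


def compress(postings):
    """Store docID gaps instead of docIDs: gaps are small, so they need fewer bytes."""
    if not postings:
        return b""
    gaps = [postings[0]] + [b - a for a, b in zip(postings, postings[1:])]
    return vb_encode(gaps)


def decompress(data):
    doc_ids, total = [], 0
    for gap in vb_decode(data):
        total += gap
        doc_ids.append(total)
    return doc_ids


index = build_index(["web search", "search engine", "web crawler", "search index"])
print(index["search"])  # → [0, 1, 3]

postings = [824, 829, 215406]
packed = compress(postings)
print(len(packed), decompress(packed) == postings)  # → 6 True
```

The three docIDs above take 6 bytes instead of 12 as fixed 32-bit integers, because the gaps 824, 5 and 214577 need 2, 1 and 3 bytes respectively.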
Web Information Retrieval
Ranking (of Google)
• The idea is to give importance to pages that have many back links
• Similar to the notion of citations in academia
• A link graph of the web was formed and maintained (518 million links in 1998 for the prototype)
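Brin and Page formalized this citation-style idea as PageRank in the paper assigned as the first reading. A minimal power-iteration sketch on a hypothetical four-page link graph, assuming a damping factor of 0.85 (the value the paper reports usually using). Two further assumptions: the sketch uses the normalized variant with (1 - d)/N so that the ranks sum to 1, and it spreads the rank of pages without out-links evenly over all pages.

```python
def pagerank(links, damping=0.85, iterations=100):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page gets the "random jump" share (1 - d) / N.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank in equal parts along its out-links,
                # so pages with many (important) back links accumulate rank.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no out-links): spread its rank over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(LINKS)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Page C, which has the most back links (from A, B and D), ends up with the highest rank, while D, which nothing links to, keeps only the baseline (1 - d)/N.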
Web Mining
• (Focused) Crawling and Indexing
• Topic Directories
• Clustering and Classification
• Hyperlink Analysis
• Personalization (profiles, preferences)