CS523 INFORMATION RETRIEVAL COURSE INTRODUCTION • YÜCEL SAYGIN SABANCI UNIVERSITY

Post on 28-Dec-2015

Page 1: CS523 INFORMATION RETRIEVAL COURSE INTRODUCTION YÜCEL SAYGIN SABANCI UNIVERSITY

CS523 INFORMATION RETRIEVAL

COURSE INTRODUCTION

YÜCEL SAYGIN

SABANCI UNIVERSITY

Page 2:

Contact Info

[email protected]

http://people.sabanciuniv.edu/~ysaygin

Tel: 9576

No specific office hours. You can drop by anytime you like. Email or call me to make sure I am at the office.

Page 3:

Course Info

Reference Book: Introduction to Information Retrieval

Authors: Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze

Publisher: Cambridge University Press, 2008.

Page 4:

Course Info

Grading:

• Homework: 10%
• Project: 40%
• Paper presentation: 20%
• Term paper: 20%
• Attendance during paper presentations: 10%

Page 5:

Topics that will be covered

• Document Retrieval Techniques
• Information Retrieval on the Web
• Data Mining for Information Retrieval

Page 6:

Aim of the course

Knowledge: To introduce information retrieval techniques

Skills:

• Paper reading and presentation
• Research and/or project work

Page 7:

A Rough Schedule

October, November: Lectures on various information retrieval techniques

Remaining weeks: Paper and research project presentations

Page 8:

What I will do

• Give the basics of information retrieval
• Supervise projects
• Give directions and advice on the projects
• Coordinate the presentations

Page 9:

What I expect you to do

Understand the basic concepts of Information Retrieval

Choose a specific area and two related papers on the same topic for presentation in class

Attendance is required for paper presentations, and you will lose 2% of your overall grade for each presentation you miss.

Write a term paper on the two papers presented.

Do a project and a final report describing what you learned or achieved in the scope of the project.

Page 10:

Sources

• TREC Conference http://trec.nist.gov/
• SIGIR Conference http://www.sigir.org/
• WWW Conference http://www2004.org/
• ACM TOIS Journal
• SIGMOD, VLDB, ICDE Conferences (database perspective)
• SIGKDD, ICDM Conferences (data mining perspective)

Page 11:

Tools

SMART IR (Cornell Univ.) http://www.cs.cornell.edu/Info/Projects/NLP/

Glimpse from Univ. Arizona http://webglimpse.net/

Google

AltaVista

Yahoo

Page 12:

Information Retrieval

Refers to the retrieval of any type of information such as

• Structured data (e.g. relational database)
• Text (we will focus on this)
• Video
• Image, sound
• DNA

Page 13:

Document Retrieval

[Figure: a user query is run against a static document collection to produce a ranked result]

• The document collection is indexed in advance
• The user query is ad hoc
• Results are ranked by their similarity to the user query
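
The three points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not say which similarity measure is used, so TF-IDF weighting with cosine similarity is assumed here, and the whitespace tokenizer and the function names (`build_index`, `rank`) are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(docs):
    """Index the static collection once: compute IDF and TF-IDF vectors."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return idf, vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, idf, vectors):
    """Score every document against an ad hoc query, best match first."""
    q_tf = Counter(tokenize(query))
    q_vec = {t: tf * idf[t] for t, tf in q_tf.items() if t in idf}
    scores = [(cosine(q_vec, vec), doc_id) for doc_id, vec in enumerate(vectors)]
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [(doc_id, score) for score, doc_id in scores]


docs = [
    "information retrieval on the web",
    "relational database systems",
    "web search engines and web crawling",
]
idf, vectors = build_index(docs)
result = rank("web search", idf, vectors)
```

Here `build_index` runs once over the collection, while `rank` is called per query, which mirrors the split between a pre-indexed collection and ad hoc queries.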

Page 14:

Document Routing

User profiles are set in advance

Incoming documents are directed to relevant users

Useful for redirecting corporate emails to relevant departments (sales, marketing, support, etc.)
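
A minimal sketch of routing, assuming each profile is just a set of keywords and a document is sent to every profile it shares at least `min_overlap` words with. The profile contents, the `route` function and the overlap rule are all hypothetical; real routing systems typically use weighted similarity rather than raw keyword overlap.

```python
def tokenize(text):
    """Lowercase and split on whitespace, returning the set of distinct words."""
    return set(text.lower().split())


def route(document, profiles, min_overlap=1):
    """Return the profile names that match the document, best match first."""
    words = tokenize(document)
    matches = []
    for name, keywords in profiles.items():
        overlap = len(words & keywords)
        if overlap >= min_overlap:
            matches.append((name, overlap))
    # Larger overlap first; ties broken alphabetically for a stable order.
    matches.sort(key=lambda m: (-m[1], m[0]))
    return [name for name, _ in matches]


# Profiles are fixed in advance; documents arrive later, one at a time.
profiles = {
    "sales": {"price", "quote", "order"},
    "support": {"error", "crash", "help"},
    "marketing": {"campaign", "brand"},
}
```

Note that the roles are reversed compared with document retrieval: here the profiles play the part of standing queries and the documents are the ad hoc input.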

Page 15:

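Performance Metrics for IR

• Precision
• Recall
• In practice it is hard to achieve both high precision and high recall at the same time

[Figure: Venn diagram of the whole document space, showing the relevant documents, the retrieved documents, and their intersection (relevant and retrieved documents)]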
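
In terms of the sets in the diagram, precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. A small sketch with made-up document IDs (the function name `precision_recall` is hypothetical):

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) from two collections of document IDs."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant AND retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


relevant = {2, 4, 6, 8}

# A cautious system returns few documents.
narrow = precision_recall([1, 2, 3, 4], relevant)

# A greedy system returns the whole collection of ten documents.
broad = precision_recall(range(1, 11), relevant)
```

Returning everything drives recall to 1.0 but lowers precision, which illustrates why the two metrics tend to pull against each other.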

Page 16:

First Reading for Tomorrow

The Anatomy of a Large-Scale Hypertextual Web Search Engine (WWW Conference 1998)

Paper by Sergey Brin and Lawrence Page

www-db.stanford.edu/~backrub/google.html

Page 17:

Web Information Retrieval

Two possible ways:

• Use the web structure, starting from a location like Yahoo where things are categorized
• Use search engines

Page 18:

Web Information Retrieval

Challenges

Scale:

• Hundreds of millions of queries per day
• The Web grows, so continuous crawling is needed
• Obstacles due to the OS and disk seek time
• Google handles large data sets by indexing and compression

Search quality is important:

• Completeness of the index is important
• But ranking is also of utmost importance due to the size of the Web
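
To make "indexing and compression" concrete, here is a small sketch of an inverted index whose postings lists are stored as gaps and compressed with variable-byte codes. This scheme is an assumption chosen for illustration, following the reference book; the slide does not say which encoding Google used, and all function names are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def to_gaps(postings):
    """Replace each doc ID by its difference from the previous one."""
    gaps, previous = [], 0
    for doc_id in postings:
        gaps.append(doc_id - previous)
        previous = doc_id
    return gaps


def from_gaps(gaps):
    """Inverse of to_gaps: running sum of the gaps."""
    postings, total = [], 0
    for gap in gaps:
        total += gap
        postings.append(total)
    return postings


def vb_encode(numbers):
    """Variable-byte encode: 7 payload bits per byte, high bit marks the last byte."""
    out = bytearray()
    for n in numbers:
        chunk = []
        while True:
            chunk.insert(0, n % 128)
            if n < 128:
                break
            n //= 128
        chunk[-1] += 128  # set the high bit on the final byte of this number
        out.extend(chunk)
    return bytes(out)


def vb_decode(data):
    """Decode a variable-byte stream back into a list of integers."""
    numbers, n = [], 0
    for byte in data:
        if byte < 128:
            n = 128 * n + byte
        else:
            numbers.append(128 * n + (byte - 128))
            n = 0
    return numbers


index = build_inverted_index(["web search", "web crawling", "search quality"])

postings = [824, 829, 215406]
encoded = vb_encode(to_gaps(postings))
decoded = from_gaps(vb_decode(encoded))
```

Gaps between neighbouring document IDs are usually much smaller than the IDs themselves, so most of them fit in a single byte; in this example three postings take six bytes.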

Page 19:

Web Information Retrieval

Ranking (of Google)

• The idea is to give importance to pages that have a lot of back links
• Similar to the notion of citations in academia
• A link graph of the web was formed and maintained (518 million links in 1998 for the prototype)
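
The back-link idea can be sketched on a toy link graph. The first function simply counts back links, as described above. The second is a simplified PageRank-style iteration in which a link from an important page counts for more; it uses a normalized variant with a damping factor of 0.85 and spreads the rank of pages without out-links evenly, which are assumptions of this sketch rather than details given on the slide. The graph and the function names are made up.

```python
def backlink_counts(links):
    """Count incoming links per page; links maps a page to the pages it links to."""
    counts = {page: 0 for page in links}
    for targets in links.values():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts


def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank along links."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # No out-links: spread this page's rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy graph: a, b and d all link to c; c links back to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
counts = backlink_counts(links)
ranks = pagerank(links)
```

In this graph page c has the most back links and also the highest rank, while page a ranks well above b and d because its single back link comes from the important page c.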

Page 20:

Web Mining

• (Focused) Crawling and Indexing
• Topic Directories
• Clustering and Classification
• Hyperlink Analysis
• Personalization (profiles, preferences)