ir tutorial

Information Retrieval Systems By: Hussein Hazimeh Lebanese University.

Upload: hussein-hazimeh

Post on 11-May-2015



Page 1: IR tutorial

Information Retrieval Systems

By: Hussein Hazimeh

Lebanese University.

Page 2: IR tutorial

Main points

Introduction

Text operations and Indexing

Performance evaluation

Search engines as IR tools

Metasearch engines

IR Applications

Some current research topics in IRS

Current conferences in information retrieval

Page 3: IR tutorial

Introduction

Information Retrieval (IR) is the discipline that deals with the retrieval of unstructured data, especially textual documents, in response to a query.

Figure: architecture of an IR system. Components: User Interface, Text Operations, Indexing, Similarity Computation (Searching), Ranking, Index. Data-flow labels: User need, Inverted file, Documents, Retrieved docs, Ranked docs.

Page 4: IR tutorial

Text operations and Indexing

Text operations: reduce the complexity of the document representation.

Q = List of the European countries → List, Europe, country

Indexing: builds a data structure over the text (the inverted file) to speed up searching. The simple alternative is to search the whole text sequentially.

Vocabulary: beautiful, flowers, garden, house

Occurrences: 7045, 5818, 296
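The two steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the method used by any particular system: the stopword list, the sample sentence, and the function names are assumptions made for the example, and no stemming is performed. Each vocabulary term is mapped to the character offsets of its occurrences.

```python
import re
from collections import defaultdict

# Hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = {"a", "an", "and", "in", "of", "the", "with"}

def text_operations(text):
    """Lowercase, tokenize, and drop stopwords; yields (term, char_offset)."""
    for match in re.finditer(r"[a-z0-9]+", text.lower()):
        term = match.group()
        if term not in STOPWORDS:
            yield term, match.start()

def build_inverted_file(text):
    """Map each vocabulary term to the list of offsets where it occurs."""
    index = defaultdict(list)
    for term, offset in text_operations(text):
        index[term].append(offset)
    return dict(index)

# Sample text invented for this sketch.
text = "A house with a garden and beautiful flowers in the garden"
inverted_file = build_inverted_file(text)
for term in sorted(inverted_file):
    print(term, inverted_file[term])
```

A query term is then answered by a dictionary lookup in the vocabulary instead of a sequential scan of the text.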

Page 5: IR tutorial

Retrieval Performance Evaluation

Figure: the document collection, containing the relevant docs |R|, the answer set |A|, and their intersection, the relevant docs in the answer set |Ra|.

Recall = |Ra| / |R|

Precision = |Ra| / |A|
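As a small worked example of the two measures, the sketch below computes them from sets of document identifiers. The identifiers and the function name are made up for illustration.

```python
def precision_recall(relevant, answer_set):
    """Return (precision, recall) for a set R of relevant docs and an answer set A."""
    relevant, answer_set = set(relevant), set(answer_set)
    ra = relevant & answer_set  # relevant docs in the answer set
    precision = len(ra) / len(answer_set) if answer_set else 0.0
    recall = len(ra) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical collection: 5 relevant docs, 4 retrieved, 2 in common.
relevant_docs = {"d1", "d2", "d3", "d4", "d5"}
answer = ["d1", "d2", "d6", "d7"]
precision, recall = precision_recall(relevant_docs, answer)
print(precision, recall)
```

Here |Ra| = 2, |A| = 4, and |R| = 5, so precision is 2/4 and recall is 2/5.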

Page 6: IR tutorial

Popular search engines

Google, Yahoo, Bing, …

Google search engine

Google search orders results by priority.

The priority rank is computed with the “PageRank” algorithm.

Google queries can use Boolean operators, such as exclusion (-aa) and alternatives (aa OR bb).
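To make the two operators concrete, here is a rough sketch of Boolean matching over whitespace-separated words. It is not how Google evaluates queries; the parsing rules and function names are assumptions chosen to keep the example short.

```python
def parse_query(query):
    """Split a query into groups of alternatives and a set of excluded terms."""
    groups, excluded = [], set()
    join_next = False
    for token in query.split():
        if token == "OR":
            join_next = bool(groups)         # next term joins the previous group
        elif token.startswith("-") and len(token) > 1:
            excluded.add(token[1:].lower())  # exclusion: -aa
        elif join_next:
            groups[-1].add(token.lower())    # alternative: aa OR bb
            join_next = False
        else:
            groups.append({token.lower()})
    return groups, excluded

def matches(document, query):
    """True if the document has a term from every group and no excluded term."""
    words = set(document.lower().split())
    groups, excluded = parse_query(query)
    return all(group & words for group in groups) and not (excluded & words)

# Hypothetical documents for the example.
docs = ["aa cc", "bb cc", "aa bb dd", "cc"]
hits = [d for d in docs if matches(d, "aa OR bb -dd")]
print(hits)
```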

Page 7: IR tutorial

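PageRank algorithm

PageRank is an algorithm used by the Google search engine to rank websites in its search results.

In simplified form, the rank of page B is the sum of the ranks of the pages that link to it:

PR(B) = PR(E) + PR(F) + PR(D) + PR(C)

The full algorithm divides each term by the number of outbound links of the linking page and applies a damping factor.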
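A minimal sketch of the iterative computation follows, assuming the standard formulation in which each page splits its rank evenly among its outbound links and a damping factor (0.85 here, a conventional choice) is applied. The link graph is hypothetical: pages C, D, E, and F all link to B, and B links back to C.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Page without outbound links: spread its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical link graph for the example.
toy_links = {"C": ["B"], "D": ["B"], "E": ["B"], "F": ["B"], "B": ["C"]}
ranks = pagerank(toy_links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

B receives links from four pages and ends up with the highest rank; C comes second because it is the only page B links to.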

Page 8: IR tutorial

Googlebot: Google’s Web Crawler

Googlebot is Google’s web crawling robot, which finds and retrieves pages on the web and hands them off to the Google indexer.

Googlebot finds pages in two ways:

Through an add URL form, www.google.com/addurl.html

Finding links by crawling the web.
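The second way, following links, can be sketched as a breadth-first traversal. The example below is not Googlebot's implementation: it crawls a small in-memory "web" (the `TOY_WEB` dictionary, invented for the example) instead of fetching real pages, and the seed list plays the role of submitted URLs.

```python
from collections import deque
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Hypothetical pages standing in for the web.
TOY_WEB = {
    "/home": '<a href="/about">About</a> <a href="/news">News</a>',
    "/about": '<a href="/home">Home</a>',
    "/news": '<a href="/about">About</a> <a href="/archive">Archive</a>',
    "/archive": "",
    "/orphan": '<a href="/home">Home</a>',
}

def crawl(seed_urls, fetch):
    """Breadth-first crawl; fetch(url) returns the page HTML or None."""
    frontier = deque(seed_urls)
    seen = set(seed_urls)
    visited = []
    while frontier:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue  # page could not be retrieved
        visited.append(url)  # a real crawler would hand the page to the indexer
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited

crawl_order = crawl(["/home"], TOY_WEB.get)
print(crawl_order)
```

The page "/orphan" has no inbound links, so it is only found when it is supplied as a seed, which mirrors the role of the add URL form.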

Page 9: IR tutorial

How Google processes a query

Page 10: IR tutorial

Facebook as an intelligent IR tool (Graph Search)

Google vs. Facebook

Page 11: IR tutorial

Facebook as an intelligent IR tool (continued)

Google vs. Facebook

Page 12: IR tutorial

Metasearch engines

A metasearch engine is a search tool that sends user requests to several other search engines and/or databases and either aggregates the results into a single list or displays them according to their source.

Metasearch engines enable users to enter search criteria once and access several search engines simultaneously.
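One possible way to aggregate the results into a single list is sketched below. It uses reciprocal rank fusion, a technique chosen for this example rather than one named in the slides; the engine names, URLs, and the constant k = 60 are assumptions.

```python
from collections import defaultdict

def merge_results(results_by_engine, k=60):
    """Fuse ranked lists: each URL scores 1 / (k + position) per engine."""
    scores = defaultdict(float)
    for ranked_urls in results_by_engine.values():
        for position, url in enumerate(ranked_urls, start=1):
            scores[url] += 1.0 / (k + position)
    # Highest fused score first; ties broken alphabetically.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical ranked results returned by three underlying engines.
results = {
    "engine_a": ["u1", "u2", "u3"],
    "engine_b": ["u2", "u4"],
    "engine_c": ["u2", "u1"],
}
merged = merge_results(results)
print(merged)
```

A URL returned by several engines accumulates score from each of them, so "u2" ranks first here even though only two engines placed it at the top.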

Page 13: IR tutorial

Metasearch engine

Page 14: IR tutorial

IR Applications

Desktop Search (Puggle)

Digital Libraries

Mobile IR

Enterprise Search

Page 15: IR tutorial

Some current research topics in IRS

Visual Indexing

Indexing of (video, images, audio). Visual content extraction

Machine learning in information retrieval

Web information retrieval (including blogs)

Mobile computing related information retrieval issues

Performance measures

Query languages and optimization

Page 16: IR tutorial

What is MapReduce?

MapReduce is a programming model for processing large data sets. A MapReduce program consists of two jobs.

The first is the map job, which takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs).

The second is the reduce job, which takes the output from a map as input and combines those data tuples into a smaller set of tuples.

Page 17: IR tutorial

Motivations of MapReduce

Data processing > 1 TB

Massively parallel

Easy to use

Page 18: IR tutorial

Programming Model

Map(k1, v1) → list(k2, v2)

Reduce(k2, list(v2)) → list(v3)

Ex: 5 files

File 1: Toronto, 20; Whitby, 25; New York, 22; Rome, 32; Toronto, 4; Rome, 33; New York, 18

Page 19: IR tutorial

Programming Model (continued)

We want to find the maximum temperature for each city across all of the data files.

Break this into 5 Map tasks.

Each mapper works on 1 file and returns the maximum temperature for each city in that file.

All five of these output streams would be fed into the reduce tasks, which combine the input results and output a single value for each city, producing a final result.

Page 20: IR tutorial

Programming Model (continued)

Map (output):

(Toronto, 18) (Whitby, 27) (New York, 32) (Rome, 37)

(Toronto, 32) (Whitby, 20) (New York, 33) (Rome, 38)

(Toronto, 22) (Whitby, 19) (New York, 20) (Rome, 31)

(Toronto, 31) (Whitby, 22) (New York, 19) (Rome, 30)

Reduce (output): (Toronto, 32) (Whitby, 27) (New York, 33) (Rome, 38)
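The example can be sketched in plain Python, without a MapReduce framework. The function names and the explicit shuffle step are assumptions made for illustration; the map task is run on the File 1 records, and the reduce step is run on the mapper outputs listed on this slide.

```python
from collections import defaultdict

def map_max_temperature(records):
    """Map task: emit one (city, max temperature) pair per city in one file."""
    best = {}
    for city, temperature in records:
        best[city] = max(temperature, best.get(city, temperature))
    return list(best.items())

def shuffle(map_outputs):
    """Group the values emitted by all mappers by key (city)."""
    grouped = defaultdict(list)
    for output in map_outputs:
        for city, temperature in output:
            grouped[city].append(temperature)
    return grouped

def reduce_max_temperature(city, temperatures):
    """Reduce task: combine all values for one city into a single maximum."""
    return city, max(temperatures)

# Records of File 1 as given in the example.
file_1 = [("Toronto", 20), ("Whitby", 25), ("New York", 22), ("Rome", 32),
          ("Toronto", 4), ("Rome", 33), ("New York", 18)]
file_1_output = map_max_temperature(file_1)

# Mapper outputs as listed on the slide.
slide_map_outputs = [
    [("Toronto", 18), ("Whitby", 27), ("New York", 32), ("Rome", 37)],
    [("Toronto", 32), ("Whitby", 20), ("New York", 33), ("Rome", 38)],
    [("Toronto", 22), ("Whitby", 19), ("New York", 20), ("Rome", 31)],
    [("Toronto", 31), ("Whitby", 22), ("New York", 19), ("Rome", 30)],
]
final = dict(reduce_max_temperature(city, temps)
             for city, temps in shuffle(slide_map_outputs).items())
print(file_1_output)
print(final)
```

In a real deployment the map tasks run in parallel on different machines, and the framework performs the grouping between the two phases.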

Page 21: IR tutorial

MapReduce uses

MapReduce is useful in a wide range of applications, including distributed pattern-based searching, distributed sorting, web link-graph reversal, term-vector per host, web access log stats, inverted index construction, document clustering, and machine learning.

Moreover, the MapReduce model has been adapted to several computing environments like multi-core systems, desktop grids, dynamic cloud environments, and mobile environments.

At Google, MapReduce was used to completely regenerate Google's index of the World Wide Web. It replaced the old ad hoc programs that updated the index and ran the various analyses.

Page 22: IR tutorial

Current conferences in information retrieval

3rd Spanish Conference on Information Retrieval: 2014, June 20, Spain

The European Conference on Information Retrieval: 2014, April 17, Netherlands

7th International Workshop on Information Filtering and Retrieval: 2013, Dec 6, Italy

Page 23: IR tutorial
Page 24: IR tutorial
Page 25: IR tutorial
Page 26: IR tutorial
Page 27: IR tutorial
Page 28: IR tutorial


Query Operations

graph theories