ir tutorial

Information Retrieval Systems

By: Hussein Hazimeh

Lebanese University.

Main points

Introduction

Text operations and Indexing

Performance evaluation

Search engines as IR tools

Metasearch engines

IR Applications

Some current research topics in IRS

Current conferences in information retrieval

Introduction

Information Retrieval (IR) is the discipline that deals with the retrieval of unstructured data, especially textual documents, in response to a query.

[Figure: IR system architecture. Components: User Interface, Text Operations, Indexing, Similarity Computation (Searching), Ranking. Data: user need, documents, index (inverted file), retrieved docs, ranked docs.]

Text operations and Indexing

Text operations: reduce the complexity of the document representation.

Indexing: builds an index over the text in advance; the simple alternative is to search the whole text sequentially for every query.

Example of text operations: Q = "List of the European countries" → list, Europe, country
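The query example above can be sketched in a few lines of Python. This is a minimal illustration, not the method used by any particular system: the stop-word list and the `naive_stem` helper are hypothetical stand-ins for real stop-word lists and stemmers, and this crude stemmer does not reduce "European" to "Europe" as the slide does.

```python
import re

# Hypothetical, tiny stop-word list; real systems use much larger ones.
STOP_WORDS = {"a", "an", "the", "of", "in", "to", "and", "for"}


def naive_stem(token: str) -> str:
    """Strip a few plural endings (a crude stand-in for a real stemmer)."""
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def text_operations(text: str) -> list[str]:
    """Lowercase, tokenize, drop stop words, and stem the remaining tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


terms = text_operations("List of the European countries")
print(terms)
```

The same function would be applied to both documents and queries, so that their terms can be compared during searching.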

[Figure: inverted file. Vocabulary: beautiful, flowers, garden, house. Occurrences (as extracted): 7045, 5818, 296]
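An inverted file pairs a vocabulary with the occurrences of each term. The sketch below builds one for a hypothetical two-document collection; the documents, the `build_inverted_index` name, and the choice to record occurrences as (document id, word position) pairs are assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[tuple[int, int]]]:
    """Map each term to a list of (doc_id, word_position) occurrences."""
    index: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for doc_id in sorted(docs):
        for position, term in enumerate(docs[doc_id].lower().split()):
            index[term].append((doc_id, position))
    return dict(index)


# Hypothetical two-document collection.
docs = {
    1: "beautiful flowers garden",
    2: "garden house flowers",
}
index = build_inverted_index(docs)
vocabulary = sorted(index)
print(vocabulary)
print(index["flowers"])
```

Looking up a query term is then a dictionary access rather than a sequential scan of every document.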

Retrieval Performance Evaluation

[Figure: Venn diagram of the collection, showing the relevant docs |R|, the answer set |A|, and their overlap, the relevant docs in the answer set |Ra|]

Recall=|Ra|/|R|

Precision=|Ra|/|A|
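As a small worked example of the two measures, the sketch below computes them from sets of document identifiers. The identifiers and the `precision_recall` name are hypothetical; returning 0.0 for an empty set is a convention chosen here, not something the slides specify.

```python
def precision_recall(relevant: set[str], answer: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for an answer set against the relevant set."""
    ra = relevant & answer  # relevant docs in the answer set, Ra
    precision = len(ra) / len(answer) if answer else 0.0
    recall = len(ra) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments: 4 relevant docs, 3 retrieved, 2 in common.
relevant = {"d1", "d2", "d3", "d4"}
answer = {"d2", "d4", "d7"}
precision, recall = precision_recall(relevant, answer)
print(precision, recall)
```

Here |Ra| = 2, |A| = 3, and |R| = 4, so precision is 2/3 and recall is 2/4.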

Popular search engines

Google

Yahoo

Bing

…

Google search engine

Google search ranks results by priority.

The priority rank is computed with the “PageRank” algorithm.

Google searches can use Boolean operators such as exclusion ( -aa ) and alternatives ( aa OR bb ).
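To make the semantics of the two operators concrete, here is a minimal sketch of a filter over a hypothetical list of documents. It only illustrates what exclusion and alternatives mean; it is not how Google evaluates queries, and the `matches` helper and its parameters are invented for this example.

```python
def matches(doc_text: str, any_of: tuple[str, ...] = (), exclude: tuple[str, ...] = ()) -> bool:
    """True if the doc contains at least one alternative and no excluded term."""
    words = set(doc_text.lower().split())
    has_alternative = not any_of or any(term in words for term in any_of)
    has_excluded = any(term in words for term in exclude)
    return has_alternative and not has_excluded


# Hypothetical documents; the query corresponds to: aa OR bb -zz
docs = ["aa cc", "bb dd", "aa bb zz", "cc dd"]
hits = [d for d in docs if matches(d, any_of=("aa", "bb"), exclude=("zz",))]
print(hits)
```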

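PageRank algorithm

PageRank is an algorithm used by the Google search engine to rank websites in its search results.

PR(B) = PR(E) + PR(F) + PR(D) + PR(C)

This is a simplified form for a page B that is linked from C, D, E, and F; the full algorithm divides each term by the linking page's number of outbound links.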
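The sum above is a simplification. In the commonly published formulation, each linking page passes on its rank divided by its number of outbound links, and a damping factor d is applied. The sketch below iterates that formulation on a hypothetical five-page graph in which C, D, E, and F all link to B; the graph, the damping value 0.85, and the iteration count are assumptions for illustration, not Google's production settings.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85, iterations: int = 50) -> dict[str, float]:
    """Iteratively compute PageRank for a graph given as page -> outbound links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without outbound links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: C, D, E and F all link to B; B links back to C.
links = {"B": ["C"], "C": ["B"], "D": ["B"], "E": ["B"], "F": ["B"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

Pages D, E, and F receive no inbound links, so they keep only the baseline share (1 - d) / N, while B accumulates rank from all four linking pages.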

Googlebot: Google’s Web Crawler

Googlebot is Google’s web crawling robot, which finds and retrieves pages on the web and hands them off to the Google indexer.

Googlebot finds pages in two ways:

Through an add URL form, http://www.google.com/addurl.html

By finding links while crawling the web.
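The two discovery paths can be sketched as a breadth-first crawl: submitted URLs act as seeds, and further pages are found by following links. This is a generic illustration, not Googlebot's implementation. The `crawl` function, the `FAKE_WEB` dictionary, and the example.* URLs are hypothetical, and an in-memory dictionary stands in for real HTTP fetches so the example runs offline.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_urls, fetch):
    """Breadth-first crawl: fetch each page once and follow its links."""
    visited = []
    seen = set(seed_urls)
    queue = deque(seed_urls)
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page could not be retrieved
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical in-memory "web" standing in for real HTTP fetches.
FAKE_WEB = {
    "http://a.example/": '<a href="http://b.example/">B</a> <a href="http://c.example/">C</a>',
    "http://b.example/": '<a href="http://a.example/">A</a>',
    "http://c.example/": '<a href="http://d.example/">D</a>',
}

crawled = crawl(["http://a.example/"], FAKE_WEB.get)
print(crawled)
```

In a real crawler the retrieved pages would be handed to the indexer, and `fetch` would also need to respect politeness rules such as robots.txt.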

How Google processes a query

Facebook as an intelligent IR tool (Graph Search)

Google vs. Facebook


Metasearch engines

A metasearch engine is a search tool that sends user requests to several other search engines and/or databases and aggregates the results into a single list or displays them according to their source.

Metasearch engines enable users to enter search criteria once and access several search engines simultaneously.
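The aggregation step can be sketched as follows. The slides do not say how results are merged, so this example uses reciprocal rank fusion, one common merging technique, as an assumption. The engine names, result identifiers, and the constant k = 60 are hypothetical.

```python
from collections import defaultdict


def fuse_results(result_lists: dict[str, list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists with reciprocal rank fusion; duplicates are combined."""
    scores: dict[str, float] = defaultdict(float)
    for ranked_urls in result_lists.values():
        for rank, url in enumerate(ranked_urls, start=1):
            scores[url] += 1.0 / (k + rank)
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical ranked results returned by three hypothetical engines.
results = {
    "engine_a": ["u1", "u2", "u3"],
    "engine_b": ["u2", "u1", "u4"],
    "engine_c": ["u2", "u5"],
}
merged = fuse_results(results)
print(merged)
```

A result returned by several engines accumulates score from each list, so it rises in the single merged list. Displaying results by source instead would simply keep the per-engine lists separate.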


IR Applications

Desktop Search (Puggle)

Digital Libraries

Mobile IR

Enterprise Search

Some current research topics in IRS

Visual Indexing

Indexing of (video, images, audio). Visual content extraction

Machine learning in information retrieval

Web information retrieval (including blogs)

Mobile computing related information retrieval issues

Performance measures

Query languages and optimization

What is MapReduce?

MapReduce is a programming model for processing large data sets. A MapReduce program consists of two jobs.

The first is the map job, which takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs).

The second is the reduce job, which takes the output from a map as input and combines those data tuples into a smaller set of tuples.

Motivations of MapReduce

Data processing > 1 TB

Massively parallel

Easy to use

Programming Model

Map(k1, v1) → list(k2, v2)

Reduce(k2, list(v2)) → list(v3)
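The two signatures can be sketched as a single-process driver that applies the mapper, groups values by key, and then applies the reducer. This is a toy illustration of the model only: real MapReduce frameworks distribute these steps across machines. The `map_reduce` driver, the word-count functions, and the file names are hypothetical.

```python
from collections import defaultdict


def map_reduce(inputs, mapper, reducer):
    """Run mapper on each (k1, v1), group values by k2, then run reducer per key."""
    groups = defaultdict(list)
    for k1, v1 in inputs:
        for k2, v2 in mapper(k1, v1):
            groups[k2].append(v2)
    output = {}
    for k2 in sorted(groups):
        output[k2] = reducer(k2, groups[k2])
    return output


def word_count_mapper(filename, text):
    # Map(k1, v1) -> list(k2, v2): emit (word, 1) for each word in the text.
    return [(word, 1) for word in text.lower().split()]


def word_count_reducer(word, counts):
    # Reduce(k2, list(v2)) -> list(v3): a single total per word.
    return [sum(counts)]


# Hypothetical input files as (name, contents) pairs.
files = [
    ("doc1.txt", "search engines index the web"),
    ("doc2.txt", "the web is large"),
]
counts = map_reduce(files, word_count_mapper, word_count_reducer)
print(counts)
```

The grouping loop in the middle plays the role of the shuffle phase, which brings all values for the same key k2 to one reducer call.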

Ex: 5 files

File 1:

Toronto, 20
Whitby, 25
New York, 22
Rome, 32
Toronto, 4
Rome, 33
New York, 18

Programming Model (continued)

We want to find the maximum temperature for each city across all of the data files.

Break this into 5 map tasks.

Each mapper works on one file and returns the maximum temperature for each city in that file.

All five of these output streams would be fed into the reduce tasks, which combine the input results and output a single value for each city, producing a final result.

Programming Model (continued)

Map (output), one group per mapper as listed:

(Toronto, 18) (Whitby, 27) (New York, 32) (Rome, 37)

(Toronto, 32) (Whitby, 20) (New York, 33) (Rome, 38)

(Toronto, 22) (Whitby, 19) (New York, 20) (Rome, 31)

(Toronto, 31) (Whitby, 22) (New York, 19) (Rome, 30)

Reduce (output): (Toronto, 32) (Whitby, 27) (New York, 33) (Rome, 38)
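The temperature example can be reproduced with a short sketch. The mapper is run on the File 1 records shown earlier, and the reducer is run on the four mapper outputs listed above; the slides list only four outputs for the five files, so only those four are used here. The function names are hypothetical.

```python
def map_max_temperature(records):
    """Mapper: return the maximum temperature per city within one file."""
    best = {}
    for city, temp in records:
        if city not in best or temp > best[city]:
            best[city] = temp
    return list(best.items())


def reduce_max_temperature(mapper_outputs):
    """Reducer: combine all mapper outputs into one maximum per city."""
    result = {}
    for output in mapper_outputs:
        for city, temp in output:
            result[city] = max(temp, result.get(city, temp))
    return result


# Records of File 1 as given in the example.
file_1 = [
    ("Toronto", 20), ("Whitby", 25), ("New York", 22), ("Rome", 32),
    ("Toronto", 4), ("Rome", 33), ("New York", 18),
]
file_1_output = map_max_temperature(file_1)
print(file_1_output)

# The four mapper outputs listed in the slides.
mapper_outputs = [
    [("Toronto", 18), ("Whitby", 27), ("New York", 32), ("Rome", 37)],
    [("Toronto", 32), ("Whitby", 20), ("New York", 33), ("Rome", 38)],
    [("Toronto", 22), ("Whitby", 19), ("New York", 20), ("Rome", 31)],
    [("Toronto", 31), ("Whitby", 22), ("New York", 19), ("Rome", 30)],
]
final = reduce_max_temperature(mapper_outputs)
print(final)
```

Because taking a maximum is associative, each mapper can pre-aggregate its own file and the reducer still produces the same final answer as a single pass over all records.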

MapReduce uses

MapReduce is useful in a wide range of applications, including distributed pattern-based searching, distributed sorting, web link-graph reversal, term-vector per host, web access log stats, inverted index construction, document clustering, and machine learning.

Moreover, the MapReduce model has been adapted to several computing environments like multi-core systems, desktop grids, dynamic cloud environments, and mobile environments.

At Google, MapReduce was used to completely regenerate Google's index of the World Wide Web. It replaced the old ad hoc programs that updated the index and ran the various analyses.

Current conferences in information retrieval

3rd Spanish Conference on Information Retrieval: 2014, June 20, Spain

The European Conference on Information Retrieval: 2014, April 17, Netherlands

7th International Workshop on Information Filtering and Retrieval: 2013, Dec 6, Italy
