ir tutorial
Information Retrieval Systems
By: Hussein Hazimeh
Lebanese University.
Main points
Introduction
Text operations and Indexing
Performance evaluation
Search engines as IR tools
Metasearch engines
IR Applications
Some current research topics in IRS
Current conferences in information retrieval
Introduction
Information Retrieval (IR) is the discipline that deals with the retrieval of unstructured data, especially textual documents, in response to a query.
[Figure: architecture of an IR system. Components: User Interface, Text Operations, Indexing, Similarity Computation (Searching), Ranking, Index (inverted file). Data flow: user need, documents, retrieved docs, ranked docs.]
Text operations and Indexing
Text operations: reduce the complexity of the document representation.
Indexing: builds an index over the text to speed up searching. A simple alternative is to search the whole text sequentially.
Example: Q = "List of the European countries" → list, Europe, country
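The query example above can be sketched in a few lines of Python. This is only a toy illustration: the stopword list and the normalization table are assumptions made for this example, and a real system would use a proper stemmer or lemmatizer instead of a lookup table.

```python
import re

# Toy stopword list and normalization table (both hypothetical, chosen
# only so that the slide's example query can be reproduced).
STOPWORDS = {"of", "the", "a", "an", "in", "and"}
NORMALIZE = {"european": "europe", "countries": "country"}


def text_operations(text):
    """Lowercase, tokenize, drop stopwords, and map words to a base form."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [NORMALIZE.get(t, t) for t in tokens if t not in STOPWORDS]


terms = text_operations("List of the European countries")
print(terms)  # ['list', 'europe', 'country']
```

The point is that the query shrinks from five words to three index terms, which is the "reduced complexity" that text operations aim for.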
[Figure: inverted file example]
Vocabulary: beautiful, flowers, garden, house
Occurrences: 7045, 5818, 296
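A minimal sketch of how a vocabulary and its occurrences can be built, assuming that occurrences are stored as character offsets into the text. The sample text and the function name `build_inverted_file` are hypothetical and are not taken from the slides.

```python
import re
from collections import defaultdict


def build_inverted_file(text):
    """Map each word (vocabulary) to the character offsets where it occurs."""
    index = defaultdict(list)
    for match in re.finditer(r"[a-z]+", text.lower()):
        index[match.group()].append(match.start())
    return dict(index)


# Hypothetical sample text, used only to show the structure.
text = "beautiful flowers in the garden, a garden by the house"
index = build_inverted_file(text)
print(index["garden"])  # [25, 35]
print(index["house"])   # [49]
```

Looking up a word is then a dictionary access, instead of a sequential scan over the whole text.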
Retrieval Performance Evaluation
For a query over a document collection:
|R|: number of relevant docs
|A|: number of docs in the answer set
|Ra|: number of relevant docs in the answer set
Recall = |Ra| / |R|
Precision = |Ra| / |A|
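The two measures can be computed directly from sets of document ids. The sketch below uses hypothetical document ids; the function name `recall_precision` is also an assumption for illustration.

```python
def recall_precision(relevant, answer):
    """Return (recall, precision) for a relevant set R and an answer set A."""
    relevant, answer = set(relevant), set(answer)
    ra = relevant & answer  # relevant docs that were actually retrieved
    recall = len(ra) / len(relevant) if relevant else 0.0
    precision = len(ra) / len(answer) if answer else 0.0
    return recall, precision


# Hypothetical document ids.
R = {"d1", "d2", "d3", "d4", "d5"}  # relevant docs, |R| = 5
A = {"d2", "d4", "d6", "d7"}        # answer set, |A| = 4
recall, precision = recall_precision(R, A)
print(recall, precision)  # 0.4 0.5
```

Here |Ra| = 2, so recall is 2/5 and precision is 2/4.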
Popular search engines
Google
Yahoo
Bing
…
Google search engine
Google search ranks results by priority.
The priority ranking uses the “PageRank” algorithm.
Google searches can use Boolean operators such as exclusion (-aa) and alternatives (aa OR bb).
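PageRank algorithm
PageRank is an algorithm used by the Google search engine to rank websites in its search results.
PR(B) = PR(E) + PR(F) + PR(D) + PR(C)
This is a simplified form. In the full algorithm, each linking page's rank is divided by its number of outgoing links, and a damping factor is applied.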
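A minimal sketch of PageRank by repeated iteration, assuming the usual damping factor of 0.85 and dividing each page's rank by its number of outgoing links. The link graph is hypothetical: it only mirrors the formula above in that C, D, E and F all link to B.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Iteratively compute PageRank for a dict {page: [pages it links to]}."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: C, D, E and F link to B; B links to C.
links = {"B": ["C"], "C": ["B"], "D": ["B"], "E": ["B"], "F": ["B"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

With this graph B ends up with the highest rank, because it collects rank from four pages, and C comes second because it receives all of B's rank.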
Googlebot: Google’s Web Crawler
Googlebot is Google’s web crawling robot, which finds and retrieves pages on the web and hands them off to the Google indexer.
Googlebot finds pages in two ways:
Through an add URL form, www.google.com/addurl.html
By finding links while crawling the web
How Google processes a query
Facebook as an intelligent IR tool (Graph Search)
Google vs. Facebook
Metasearch engines
A metasearch engine is a search tool that sends user requests to several other search engines and/or databases and either aggregates the results into a single list or displays them according to their source.
Metasearch engines enable users to enter search criteria once and access several search engines simultaneously.
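The idea of entering a query once and merging the answers can be sketched as follows. The two engine functions are hypothetical stand-ins that return fixed lists, and the round-robin merge with duplicate removal is just one simple aggregation strategy chosen for this example, not a description of how any particular metasearch engine works.

```python
def metasearch(query, engines):
    """Send the query to every engine and merge the ranked lists into one."""
    result_lists = [engine(query) for engine in engines]
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    # Round-robin: take the 1st hit of each engine, then the 2nd, and so on,
    # skipping anything that another engine already contributed.
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged


# Hypothetical stand-ins for real search engines.
def engine_a(query):
    return ["a.example/1", "shared.example", "a.example/2"]


def engine_b(query):
    return ["shared.example", "b.example/1"]


merged = metasearch("information retrieval", [engine_a, engine_b])
print(merged)
# ['a.example/1', 'shared.example', 'b.example/1', 'a.example/2']
```

To display results according to their source instead, the function could simply return `result_lists` without merging.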
Metasearch engine
IR Applications
Desktop Search (Puggle)
Digital Libraries
Mobile IR
Enterprise Search
Some current research topics in IRS
Visual Indexing
Indexing of video, images, and audio. Visual content extraction.
Machine learning in information retrieval
Web information retrieval (including blogs)
Mobile computing related information retrieval issues
Performance measures
Query languages and optimization
What is MapReduce?
MapReduce is a programming model for processing large data sets. It consists of two jobs.
The first is the map job, which takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs).
The reduce job takes the output from a map as input and combines those data tuples into a smaller set of tuples.
Motivations of MapReduce
Data processing > 1 TB
Massively parallel
Easy to use
Programming Model
Map(k1, v1) → list(k2, v2)
Reduce(k2, list(v2)) → list(v3)
Example: 5 files, each holding (city, temperature) records
File 1:
Toronto, 20
Whitby, 25
New York, 22
Rome, 32
Toronto, 4
Rome, 33
New York, 18
Programming Model (continued)
We want to find the maximum temperature for each city across all of the data files.
Break this into 5 map tasks.
Each mapper works on one file and returns the maximum temperature for each city in that file.
All five of these output streams are fed into the reduce tasks, which combine the input results and output a single value for each city, producing the final result.
Programming Model (continued)
Map output for File 1:
(Toronto, 20) (Whitby, 25) (New York, 22) (Rome, 33)
Map output for the other four files:
(Toronto, 18) (Whitby, 27) (New York, 32) (Rome, 37)
(Toronto, 32) (Whitby, 20) (New York, 33) (Rome, 38)
(Toronto, 22) (Whitby, 19) (New York, 20) (Rome, 31)
(Toronto, 31) (Whitby, 22) (New York, 19) (Rome, 30)
Reduce output:
(Toronto, 32) (Whitby, 27) (New York, 33) (Rome, 38)
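The temperature example can be simulated on a single machine with plain Python. This sketch only imitates the map and reduce steps; it does not use a real MapReduce framework, and the function names `map_max` and `reduce_max` are assumptions. The contents of files 2 to 5 are not shown in the slides, so the code runs the map step on File 1 only and takes the four mapper outputs listed above as given.

```python
from collections import defaultdict


def map_max(records):
    """Map task: return the maximum temperature per city within one file."""
    best = {}
    for city, temp in records:
        best[city] = max(temp, best.get(city, temp))
    return list(best.items())


def reduce_max(mapper_outputs):
    """Reduce task: group values by city, then keep the maximum per city."""
    grouped = defaultdict(list)
    for output in mapper_outputs:
        for city, temp in output:
            grouped[city].append(temp)
    return {city: max(temps) for city, temps in grouped.items()}


# File 1 as given in the example.
file_1 = [("Toronto", 20), ("Whitby", 25), ("New York", 22), ("Rome", 32),
          ("Toronto", 4), ("Rome", 33), ("New York", 18)]

# Mapper outputs for the other four files, copied from the listing above.
other_outputs = [
    [("Toronto", 18), ("Whitby", 27), ("New York", 32), ("Rome", 37)],
    [("Toronto", 32), ("Whitby", 20), ("New York", 33), ("Rome", 38)],
    [("Toronto", 22), ("Whitby", 19), ("New York", 20), ("Rome", 31)],
    [("Toronto", 31), ("Whitby", 22), ("New York", 19), ("Rome", 30)],
]

result = reduce_max([map_max(file_1)] + other_outputs)
print(result)
# {'Toronto': 32, 'Whitby': 27, 'New York': 33, 'Rome': 38}
```

In a real deployment each `map_max` call would run on a different machine, and the grouping by city would be done by the framework between the map and reduce phases.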
MapReduce uses
MapReduce is useful in a wide range of applications, including distributed pattern-based searching, distributed sorting, web link-graph reversal, term-vector per host, web access log stats, inverted index construction, document clustering, and machine learning.
Moreover, the MapReduce model has been adapted to several computing environments like multi-core systems, desktop grids, dynamic cloud environments, and mobile environments.
At Google, MapReduce was used to completely regenerate Google's index of the World Wide Web. It replaced the old ad hoc programs that updated the index and ran the various analyses.
Current conferences in information retrieval
3rd Spanish Conference on Information Retrieval: June 20, 2014, Spain
The European Conference on Information Retrieval: April 17, 2014, Netherlands
7th International Workshop on Information Filtering and Retrieval: Dec 6, 2013, Italy
Query Operations
Graph theories