Personalization in Information Retrieval, Extraction and Access
Workshop on Ontology, NLP, Personalization and IE/IR - IIT Bombay, Mumbai, 15-17 July 2008
Vasudeva Varma
www.iiit.ac.in/~vasu
Search Engine Heat is On!
IR and IE Technologies and Personalization (C) Vasudeva Varma IIIT-H
• Applications of Search Technologies
    ◦ Web search
    ◦ Product search
    ◦ Service search
    ◦ Domain search
• Already a BIG Market
• HUGE Opportunity
5/30/2008
Agenda
• Evolution of Search Engines
• Information Retrieval Vs. Extraction Vs. Access
• Personalization in IR, IE and IA
• Applications in Personalized IA
• Conclusions
Evolution of Search Engines
• Crawling and Indexing
• Topic directories
• Clustering and Classification
• Hyperlink analysis
• Resource discovery and vertical portals
• Semantic Web
• ???
Current IR engines fail – why?
• Wide variation in retrieval results
    ◦ User topic
    ◦ Retrieval system
• Different approaches work for different systems.
• No way to determine which approach will work for a particular query.
Solution:
• Deeper analysis of the content and query
Motivation for Deeper Analysis
• Texts are one of the major sources of information and knowledge.
• However, they are not transparent.
• They have to be systematically integrated with other sources such as databases, numerical data, etc.
• NLP/IR/IE for better analysis
• IA for better presentation
IR vs. IE vs. IA
• IR: to search and retrieve documents in response to queries for information
Vs.
• IE: to extract information that fits pre-defined database schemas or templates, specifying the output formats
Vs.
• IA: to make the required information accessible to the user in their choice of language, mode, level of detail and format
[Diagram, built up across four slides. Labels: Collection of Texts, Queries, IR System, Characterization of Texts, Passage, Interpretation, Knowledge, IE System, Texts, Templates, Structures of Sentences, NLP]
[Diagram. Labels: Passage, IR System, IE System, Interpretation, Knowledge; Information Access Technologies: Machine Translation, Summarization, Visualization Tools, Snippet Generation, NL Generation]
Limitations of Current IR Systems
• All users get the same results for a given query, independent of:
    ◦ Previous search history
    ◦ Current search context
• All users are treated the same
• Does one size fit all?
Personalized Web Search
• Automatic adjustment of information content, structure, and presentation tailored to an individual user.
• Characteristics: Age, Gender, Special Interest Groups, Topic
• Personalize search results using
    ◦ Personal content
    ◦ Past activities (long term and short term)
• Variations:
    ◦ Explicit or implicit profile setup
    ◦ Explicit or implicit relevance feedback
    ◦ Client side or server side storage of information (privacy implications)
    ◦ User control over amount of personalization
Overview of Personalized Search
Typically a 3-step process:
1. Obtain results (n >> 10)
2. Compute similarity (results, user)
3. Re-rank the results
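The three steps above can be sketched as follows. This is a minimal illustration rather than the system described in the talk: `fetch_results` is a hypothetical stand-in for any search backend, the user profile is assumed to be a bag of words, and cosine similarity is only one possible choice for step 2.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def personalize(query, user_profile, fetch_results, n=50, top_k=10):
    """fetch_results(query, n) is assumed to return a list of (doc_id, text)."""
    # Step 1: obtain many more results than will be shown (n >> 10).
    results = fetch_results(query, n)
    # Step 2: compute the similarity between each result and the user.
    scored = [(cosine(Counter(text.lower().split()), user_profile), doc_id)
              for doc_id, text in results]
    # Step 3: re-rank by similarity; the stable sort keeps engine order on ties.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:top_k]]
```

Fetching well over ten results matters because re-ranking can only promote documents that were retrieved in the first place.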
Techniques
• Co-active Techniques
• Pro-active Techniques
• Collaborative Filtering
• User Profile based Result Pruning
• User Profile based Query Expansion
Problem Description
• Personalized Search: Issues
    ◦ What to use to personalize?
    ◦ How to personalize?
    ◦ When not to personalize?
    ◦ How to know whether personalization helped?
Problem Description
• We focus on the issue “How to personalize?”
• Problem Statement
    ◦ How to learn to personalize future searches using past search history
    ◦ How to model and represent past search contexts
    ◦ How to use them to improve search results
Solution - Outline
• Model and represent past user feedback: learning the user profile
    ◦ Use implicit feedback
    ◦ Long term learning
    ◦ User contexts as triples: {user, query, {relevant documents}}
• Improve search results: reranking
    ◦ Get initial search results
    ◦ Take the top few, rescore them using the user profile, and rearrange
Contributions
• I Search: A suite of approaches for Personalized Web Search
• Proposed Personalized search approaches
• Baseline
• Basic Retrieval methods
• Automatic Evaluation
• Analysis of Query Log
Review of Personalized Search
[Diagram: Personalized Search, with branches Query logs, Machine learning, Language modeling, Community based, Others]
I Search: A Suite of Techniques for Personalized IR
• Suite of approaches?
• Statistical language modeling based approaches
    ◦ Simple N-gram based methods
    ◦ Noisy Channel Model based method
• Machine learning based approach
    ◦ Ranking SVM based method
• Personalization without relevance feedback
    ◦ Simple N-gram based method
Statistical Language Modeling based Approaches: Overview
• From user contexts, capture statistical properties of texts
• Use the same to improve search results
• Different contexts
    ◦ Unigrams and bigrams: simple N-gram based approaches
    ◦ Relationship between query and document words: noisy channel based approach
Simple N-gram based approaches
• N-gram: general term for a sequence of N adjacent words
    ◦ 1-gram: unigram, 2-gram: bigram
• Capture statistical properties of text
    ◦ Single words (unigrams)
    ◦ Two adjacent words (bigrams)
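As a small illustration of the unigram and bigram statistics mentioned above, the sketch below counts them for a sample string. The whitespace tokenizer and the sample text are assumptions made for the example, not part of the talk.

```python
from collections import Counter


def ngrams(text: str, n: int) -> list[tuple[str, ...]]:
    """Return every run of n adjacent words in text (whitespace tokenization)."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


sample = "personalized web search improves web search"
unigram_counts = Counter(ngrams(sample, 1))  # single words
bigram_counts = Counter(ngrams(sample, 2))   # two adjacent words
```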
Learning user profile
Given past search history
Hu = {(q1, rf1), (q2, rf2), …, (qn, rfn)}
• rfall = concatenation of all rf
• For each unigram wi
• User profile
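The per-unigram formula on this slide did not survive extraction. One plausible reading, sketched below, is a relative-frequency estimate over rfall. That estimate, the function name and the whitespace tokenizer are assumptions for illustration, not the formula from the talk.

```python
from collections import Counter


def learn_unigram_profile(history):
    """history: list of (query, relevant_text) pairs, i.e. Hu.

    Returns a dict mapping each unigram wi to its relative frequency in
    rfall. The maximum-likelihood estimate is an assumption of this sketch.
    """
    rf_all = " ".join(rf for _, rf in history)  # concatenation of all rf
    counts = Counter(rf_all.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}
```

A profile built this way is a probability distribution over the words the user has found relevant, so it can be used directly as a unigram language model.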
Sample user profile
Reranking
• In general LM for IR
• Our approach
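The scoring formulas for both bullets are missing from the transcript. As a hedged sketch of the general idea, language-model reranking can score each of the top few results by how likely its words are under a unigram user profile and then reorder them. The averaging, the `epsilon` floor used in place of proper smoothing, and the function names are all assumptions, not the talk's method.

```python
import math


def profile_log_likelihood(doc_text, profile, epsilon=1e-6):
    """Average log-probability of a document's words under a unigram profile.

    Words missing from the profile get a small floor probability (epsilon),
    a crude stand-in for proper smoothing.
    """
    words = doc_text.lower().split()
    if not words:
        return float("-inf")
    return sum(math.log(profile.get(w, epsilon)) for w in words) / len(words)


def rerank(results, profile, top_k=10):
    """results: list of (doc_id, text) in engine order; only the top_k are rescored."""
    head, tail = results[:top_k], results[top_k:]
    head = sorted(head,
                  key=lambda r: profile_log_likelihood(r[1], profile),
                  reverse=True)
    return head + tail
```

Rescoring only the head of the list keeps the cost low and leaves the engine's ordering intact for everything below the cut-off.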
Noisy Channel based Approach
• Documents and queries belong to different information spaces
    ◦ Queries: short, concise
    ◦ Documents: more descriptive
• Most approaches to retrieval or personalized web search do not model this
• Capture the relationship between query and document words
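One standard way to capture the relationship between query words and document words is a word-translation model such as IBM Model 1, which the experiment results later in the deck name as a training scheme. The sketch below is textbook Model 1 EM over (document words, query words) pairs, without a NULL word. It is an illustration only, and the pair format is an assumption.

```python
from collections import defaultdict


def train_model1(pairs, iterations=10):
    """pairs: list of (doc_words, query_words). Returns t[(q, d)] ~ P(q | d).

    Plain IBM Model 1 EM, without a NULL word, for illustration only.
    """
    query_vocab = {q for _, qs in pairs for q in qs}
    # Uniform initialization over the query vocabulary.
    t = defaultdict(lambda: 1.0 / len(query_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of d generating q
        total = defaultdict(float)  # expected count of d generating anything
        for doc_words, query_words in pairs:
            for q in query_words:
                # E-step: share q among the document words that could generate it.
                norm = sum(t[(q, d)] for d in doc_words)
                for d in doc_words:
                    frac = t[(q, d)] / norm
                    count[(q, d)] += frac
                    total[d] += frac
        # M-step: renormalize so that P(q | d) sums to 1 over q for each d.
        t = defaultdict(float,
                        {(q, d): c / total[d] for (q, d), c in count.items()})
    return t
```

After training, `t[(q, d)]` is high when query word `q` tends to appear with document word `d`, which lets short queries be matched to more descriptive documents.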
Machine Learning based Approaches: Introduction
• Most machine learning for IR treats ranking as a binary classification problem: “relevant” vs. “non-relevant”
• Clickthrough data
    ◦ A click indicates relative relevance, not absolute relevance
    ◦ i.e., assuming that clicked means relevant and unclicked means irrelevant is wrong
    ◦ Clicks are biased
• Partial relative relevance: clicked documents are more relevant than the unclicked documents
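Relative relevance of this kind is usually turned into pairwise preferences, which is the training input a ranking SVM expects. The sketch below uses the common "click > skip above" heuristic, in which a clicked document is preferred over every unclicked document ranked above it. That heuristic and the function name are assumptions, since the talk does not say how its pairs were built.

```python
def preference_pairs(ranking, clicked):
    """ranking: doc ids in the order shown; clicked: set of clicked doc ids.

    Returns (preferred, less_preferred) pairs: each clicked document beats
    every unclicked document that was ranked above it.
    """
    pairs = []
    for pos, doc in enumerate(ranking):
        if doc in clicked:
            pairs.extend((doc, skipped) for skipped in ranking[:pos]
                         if skipped not in clicked)
    return pairs
```

Comparing a click only against results shown above it partly offsets position bias, because the user must have seen those results before clicking.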
Personalized Search without Relevance Feedback: Introduction
• Can personalization be done without relevance feedback about which documents are relevant?
• How informative are the queries posed by users?
• Is the information contained in the queries enough to personalize?
Approach
• Past queries of the user available
• Make effective use of past queries
• Simple N-gram based approach
Experiment Results
• Language Modeling: best results!
    ◦ Interesting framework for personalized search
    ◦ Simple N-gram based approaches also worked well
    ◦ Noisy Channel model worked best
        ▪ Extracting synthetic queries helped
        ▪ Different training schemes: IBM Model 1 vs. GIZA++, snippet vs. document
• Machine Learning: competitive results
    ◦ Different features and weights
• Without Relevance Feedback: very encouraging results
    ◦ Simple approach worked well
    ◦ Sparsity: query log was useful
Agenda
• Evolution of Search Engines
• Information Retrieval Vs. Extraction Vs. Access
• Personalization in IR, IE and IA
• Applications in Personalized IA
    ◦ Personalized Search Engine for Mobile Phones
    ◦ Personalized Summarization (for Mobile Devices)
• Conclusions
(C) Vasudeva Varma, IIIT Hyderabad, India
“Personalized” Search Engine for mobile devices
• To develop a “personalized” search engine for mobile devices that will produce more relevant results based on the query and the “context”
• What do we mean by “personalized” search?
    ◦ The user will be able to configure the search interfaces (explicit feedback)
    ◦ The system will observe user behavior and customize itself to suit the user’s needs (implicit feedback)
• What do we mean by “context”?
    ◦ User, time, location, …
The goal is to make search accessible on Nokia mobile devices and to make use of the mobile aspects for personalization.
Scope of the Application
[Diagram. Labels: Client Side, Server Side]
Problem Re-Definition
• Dynamic user behavior tracking
    ◦ An observer that keeps track of all “relevant” user actions
    ◦ Client module
• Analysis of user actions
    ◦ Interpret the user actions to derive user interests (categories of interests) so that more relevant results are displayed
• Construction of user profile implicitly
    ◦ Implicit supervised learning
• Personalization
    ◦ Based on query
    ◦ Based on user profile
    ◦ Based on other parameters such as time, location
Solution Overview
Personalized Summarization: Motivation
• The success that search engine providers have found on the PC has failed to translate to the mobile phone. Why?
• Because trying to force a PC-based search experience into a mobile device falls short on a key area of usability
• Search queries typically return hundreds of potential hits.
• Making sense of such output is difficult.
• The results may or may not be of interest to the user.
• We are looking for a faster and easier way to access precise information on our mobile devices.
Challenges
• Can we offer users a simpler, friendlier and more intuitive experience?
• We aim to provide more information with less payload, in the form of a summary that takes into account:
    ◦ context
    ◦ history
    ◦ preferences
    ◦ device capabilities
    ◦ social network
System Model
[Diagram. Surviving label: Search Engine]
Summary
• Current search engines are inadequate, and current know-how is only the tip of the iceberg
• The IR, IE and IA areas have enjoyed huge commercial success and have huge growth potential
• Personalization is perhaps the next big wave
• Various personalization techniques are available, yet this is a very fertile research field
• The two personalization applications shown are just examples of many possibilities.
Vasudeva Varma, IIIT Hyderabad
[email protected] or www.iiit.ac.in/~vasu
Thank You – Questions?