web personalization
WEB PERSONALIZATION
NLP Course Seminar
Group 14
Vishaal Jatav (04d05013)
Varun Garg (04d05015)
Roadmap
  Motivation
  Introduction
  The Personalization Process
  Personalization Approaches
  Personalization Techniques
  Issues
  Conclusion
Motivation
Some facts
  Overwhelming amount of information on the web
  Not all documents are relevant to the user
  Users cannot convey their information needs
  Users never find any document 100% relevant
Users expect more personal behavior
  "I don't want results for Delhi when I am in Bombay."
  "I was looking for crane (the bird), not crane (the machine)."
Google Customization
[Screenshots: Google results without personalization; Google results with personalization; Google Search History]
Introduction
Personalization
  React differently to different users
  The system reacts in the way the users want it to
  Ultimately, bring the user back to the system
Web Personalization
  Apply machine learning and data mining
  Build models of user behavior (called profiles)
  Predict the user's needs and expectations
  Adaptively estimate better models
The Personalization Process
Consider the following pieces of information
  Geographical location
  Age, gender, ethnicity, religion, etc.
  Interests
  Previous reviews on products
  ...
How could these pieces of information help?
How can this information be collected?
The Personalization Process (Contd.)
Collect lots of information on user behavior
  Information must be attributable to a single user
Decide on a user model
  Featuring user needs, lifestyle, situations, etc.
Create a user profile for each user of the system
  The profile captures the individuality of the user
  Habits, browsing behavior, lifestyle, etc.
With every interaction, modify the user profile
The Personalization Process More Formally
The Web is a collection of $n$ items, $I = \{i_1, i_2, \dots, i_n\}$
Users come from a set $U = \{u_1, u_2, \dots, u_m\}$
Each user $u_k$ rates items through a function $r_{u_k} : I \to [0,1] \cup \{\bot\}$, where $r_{u_k}(i_j) = \bot$ means that $i_j$ is not rated by the user
$I_k^{(u)}$ is the set of items not yet rated by user $u_k$
$I_k^{(r)}$ is the set of items rated by user $u_k$
GOAL: recommend to the active user $u_a$ items $i_j$ that are present in $I_a^{(u)}$ and might be of interest to that user
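The formal setup above can be mirrored in a few lines of Python. This is only a minimal sketch: the toy `ratings` table and the helper names `rated_items` and `unrated_items` are hypothetical, and `None` stands in for the "not rated" value.

```python
from typing import Dict, Optional, Set

# Hypothetical toy data: ratings[user][item] is a rating in [0, 1],
# or None when the user has not rated the item.
Ratings = Dict[str, Dict[str, Optional[float]]]

ratings: Ratings = {
    "u1": {"i1": 0.9, "i2": None, "i3": 0.4},
    "u2": {"i1": None, "i2": 0.7, "i3": None},
}


def rated_items(ratings: Ratings, user: str) -> Set[str]:
    """Items the user has already rated."""
    return {item for item, r in ratings[user].items() if r is not None}


def unrated_items(ratings: Ratings, user: str) -> Set[str]:
    """Items the user has not rated yet: the pool of recommendation candidates."""
    return {item for item, r in ratings[user].items() if r is None}
```

A recommender for the active user only ever picks from `unrated_items(ratings, active_user)`; how it orders those candidates is what the later techniques differ on.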
Classification of Personalization Approaches
Individual Vs Collaborative
Reactive Vs Proactive
User Vs Item Information
Individual Vs Collaborative
  Individual approach (Google Personalized Search)
    Use only the individual user's data
    Generate the user profile by analyzing
      the user's browsing behavior
      the user's active feedback on the system
    Advantage
      Can be implemented on the client side, so no privacy violation
    Disadvantage
      Based only on past interactions, so it lacks serendipity
Individual Vs Collaborative (Contd.)
  Collaborative approach (Amazon recommendations)
    Find the neighborhood of the active user
    React according to an assumption
      If A is like B, then B likes the same things as A likes
    Disadvantages
      New item rating problem
      New user problem
    Advantage
      Better than the individual approach, once the two problems are solved
Reactive Vs Proactive
  Reactive approach
    Explicitly ask the user for preferences
    Either in the form of a query or feedback
  Proactive approach
    Learn user preferences from user behavior
    No explicit preference demand from the user
    Behavior is extracted from
      Click-through rates
      Navigational patterns
User Vs Item Information
  User information
    Geographic location (from IP address)
    Age, gender, marital status, etc. (explicit query)
    Lifestyle, etc. (inference from past behavior)
  Item information
    Content of topics: movie genre, etc.
    Product/domain ontology
Personalization Techniques
Content-Based Filtering
Collaborative Filtering
Model Based Personalization
  Rule based
  Graph theoretic
  Language model
Content-Based Filtering
Syskill and Webert use explicit feedback
  Individual, reactive, item information
  Uses naïve Bayes to distinguish likes from dislikes
  Initial probabilities are updated with new interactions
  Uses the 128 most informative words from each item
Letizia uses implicit feedback
  Individual, proactive, item information
  Finds likes/dislikes based on tf-idf similarity
Others use nearest-neighbor methods for similarity
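As an illustration of the tf-idf similarity idea, here is a minimal sketch in plain Python. It is not Letizia's actual implementation: the weighting scheme (relative term frequency times log inverse document frequency), the function names, and the toy pages are all assumptions made for this example.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """docs: list of non-empty token lists -> list of {term: weight} dicts."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical pages: one the user liked, and two candidates.
liked = ["crane", "bird", "wetland"]
candidate_bird = ["crane", "bird", "migration"]
candidate_machine = ["crane", "machine", "construction"]

vecs = tf_idf_vectors([liked, candidate_bird, candidate_machine])
sim_bird = cosine(vecs[0], vecs[1])
sim_machine = cosine(vecs[0], vecs[2])
```

Because "crane" occurs in every page, its idf weight is zero, so the candidate about birds ends up closer to the liked page than the one about machines.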
Collaborative Filtering
Found successful in recommendation systems
General technique
  For every user, a user neighborhood is computed
    The neighborhood contains users who have rated several items almost equally
  Get candidate items for recommendation
    Items seen by the neighborhood but not by the active user $u_a$
  Data is stored in the form of a rating matrix
    Items as rows and users as columns
Collaborative Filtering (Contd.)
The system must provide the following algorithms
  Measure similarity between users
    For creation of the neighborhood
    Pearson and Spearman correlation, cosine similarity, etc.
  Predict the rank of an item not rated by the user
    To decide the order in which these items will be presented
    Weighted sum of ranks is most common
  Select a neighborhood subset for prediction
    To reduce the large amount of computation
    A threshold on the similarity value is most common
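The three algorithms listed above can be sketched together as follows. This is a hedged illustration rather than a reference implementation: the toy `ratings` data, the 0.5 similarity threshold, and the function names are assumptions, and the ratings are stored per user (a dict of dicts) instead of as an items-by-users matrix for brevity.

```python
import math

# Hypothetical ratings in [0, 1]; a missing key means "not rated".
ratings = {
    "alice": {"i1": 0.9, "i2": 0.8, "i3": 0.1},
    "bob": {"i1": 0.8, "i2": 0.9, "i3": 0.2, "i4": 0.9},
    "carol": {"i1": 0.1, "i2": 0.2, "i3": 0.9, "i4": 0.1},
}


def pearson(a, b):
    """Pearson correlation over the items both users have rated."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = math.sqrt(sum((a[i] - mean_a) ** 2 for i in common)) * math.sqrt(
        sum((b[i] - mean_b) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict(ratings, active, item, threshold=0.5):
    """Similarity-weighted average over neighbors above the threshold."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == active or item not in other_ratings:
            continue
        sim = pearson(ratings[active], other_ratings)
        if sim >= threshold:  # neighborhood selection by similarity threshold
            num += sim * other_ratings[item]
            den += sim
    return num / den if den else None


prediction = predict(ratings, "alice", "i4")
```

In this toy data only `bob` passes the threshold for `alice` (`carol` is negatively correlated), so the prediction for `i4` is driven by `bob`'s rating alone.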
Model Based Personalization Approaches
Executed in two stages
  Offline process: create the actual model
  Online process: use the model and the interaction
Common data used for model generation
  Web usage data (web history, click-through rates, etc.)
  Item structure and content data
Examples
  Rule-based models
  Graph-theoretic models
  Language models
Model Based Personalization
Rule Based Models
Association rule-based
  Item $i_a$ is in unordered association with $i_b$
  If the user considers $i_b$, then $i_a$ is a good recommendation
Sequence rule-based
  Item $i_a$ is in sequential association with $i_b$
  If the user considers $i_a$, then $i_b$ is a good recommendation
Associations between items can be stored as a dependency graph
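One way to obtain such a dependency graph is to mine co-occurrence rules from usage sessions. The sketch below assumes a rule "b implies a" is kept when its confidence, count(a and b) / count(b), reaches a threshold; the confidence measure, the 0.6 threshold, and the session data are assumptions for illustration and are not specified in the slides.

```python
from collections import Counter
from itertools import permutations


def association_rules(sessions, min_conf=0.6):
    """Return a dependency graph {b: {a: confidence}} for rules 'b implies a'."""
    item_count = Counter()
    pair_count = Counter()
    for session in sessions:
        items = set(session)
        item_count.update(items)
        # Unordered association: count both (a, b) and (b, a).
        pair_count.update(permutations(items, 2))
    rules = {}
    for (b, a), together in pair_count.items():
        confidence = together / item_count[b]
        if confidence >= min_conf:
            rules.setdefault(b, {})[a] = confidence
    return rules


# Hypothetical browsing sessions.
sessions = [
    ["camera", "sd_card"],
    ["camera", "sd_card", "tripod"],
    ["camera", "tripod"],
    ["sd_card"],
]
rules = association_rules(sessions)
```

Here every session containing `tripod` also contains `camera`, so a user looking at a tripod would be recommended a camera with confidence 1.0. A sequence-rule variant would count only ordered pairs in which one item precedes the other within a session.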
Graph Theoretic Model
  Ratings data is transformed into a directed graph
    Nodes are users
    An edge from $u_i$ to $u_j$ means that $u_i$ predicts $u_j$
    Weights on edges represent the predictability
  To predict whether an item $i_k$ will be of interest to $u_i$
    Calculate the shortest path from $u_i$ to any user $u_r$, where $u_r$ has rated $i_k$
    The predicted rating is calculated as a function of the path between $u_i$ and $u_r$
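The shortest-path step can be sketched with Dijkstra's algorithm. The sketch assumes that edge weights have already been converted to non-negative costs (lower cost meaning higher predictability); that conversion, the toy graph, and the function name `nearest_rater` are assumptions, and the function that maps the path to a predicted rating is left open, as in the slides.

```python
import heapq


def nearest_rater(graph, ratings, source, item):
    """Find the closest user (by path cost) who has rated `item`.

    graph:   {user: {neighbor: cost}} with non-negative costs
    ratings: {user: {item: rating}}
    Returns (user, path_cost), or None if no rater is reachable.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        if u != source and item in ratings.get(u, {}):
            return u, d  # first rater popped is the nearest one
        for v, cost in graph.get(u, {}).items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return None


# Hypothetical predictability graph and ratings.
graph = {"u1": {"u2": 1.0, "u3": 4.0}, "u2": {"u3": 1.0}, "u3": {}}
ratings = {"u3": {"ik": 0.8}}
result = nearest_rater(graph, ratings, "u1", "ik")
```

The path through `u2` (cost 2.0) is cheaper than the direct edge to `u3` (cost 4.0), so the prediction for `u1` would be derived from `u3`'s rating along the two-hop path.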
Language Modeling Approaches
  Without using the user's relevance feedback
    Simple language modeling
  Using the user's relevance feedback
    N-gram based methods
    Noisy channel model based method
Language Model Approach
Simple Language Modeling
  Without using the user's feedback
  History consists of all the words in the past queries
  Learn the user profile as $\{(w_1, P(w_1)), \dots, (w_n, P(w_n))\}$
  where [definition of $P(w_i)$ not preserved in the transcript]
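Since the definition of the word probabilities did not survive in the transcript, the sketch below assumes the simplest choice, a relative-frequency (maximum likelihood) estimate over all words in the past queries. The function name and the sample queries are hypothetical.

```python
from collections import Counter


def unigram_profile(past_queries):
    """Build {word: P(word)} from past queries by relative frequency (assumed)."""
    counts = Counter(w for query in past_queries for w in query.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


profile = unigram_profile(["crane bird", "crane habitat"])
```

With these two queries the profile gives "crane" probability 0.5 and the other two words 0.25 each.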
[Figure: sample user profile for simple language modeling]
Simple Language Modeling (Contd.)
  Re-ranking of unpersonalized results
    Re-ranking is done according to $P(Q \mid D, u)$
    $\alpha$ is a weighting parameter between 0 and 1
    UP is the user profile
N-gram based approach
  Using the user's relevance feedback
  Learn the user profile
    Let $H_u$ represent the search history of user $u$
    $H_u = \{(q_1, rf_1), (q_2, rf_2), (q_3, rf_3), \dots, (q_n, rf_n)\}$
  Unigram
    The user profile now consists of
    $\{(w_1, P(w_1)), (w_2, P(w_2)), (w_3, P(w_3)), \dots, (w_n, P(w_n))\}$
[Figure: sample unigram user profile]
N-gram based approach (Contd.)
  Bigram
    The user profile consists of
    $\{(w_1 w_2, P(w_2 \mid w_1)), (w_2 w_3, P(w_3 \mid w_2)), \dots, (w_{n-1} w_n, P(w_n \mid w_{n-1}))\}$
[Figure: sample bigram user profile]
N-gram based approach (Contd.)
  Re-ranking unpersonalized results
  Based on unigrams ($\alpha$ = weighting parameter)
    $Q = q_1 q_2 q_3 \dots q_n$
    $P(q_1 q_2 q_3 \dots q_n) = P(q_1)\,P(q_2)\,P(q_3) \cdots P(q_n)$
N-gram based approach (Contd.)
  Based on bigrams
    $Q = q_1 q_2 q_3 \dots q_n$
    $P(q_1 q_2 q_3 \dots q_n) \approx P(q_1)\,P(q_2 \mid q_1)\,P(q_3 \mid q_2) \cdots P(q_n \mid q_{n-1})$
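A bigram profile and the corresponding query probability can be sketched as below. Each word is conditioned on the word before it, matching the bigram profile entries of the form $P(w_n \mid w_{n-1})$. The relative-frequency estimate, the use of a unigram probability for the first word, the function names, and the toy history are assumptions made for this example.

```python
from collections import Counter


def bigram_profile(history):
    """Estimate P(w2 | w1) by relative frequency over past queries (assumed)."""
    pair_counts = Counter()
    first_counts = Counter()
    for query in history:
        words = query.lower().split()
        for w1, w2 in zip(words, words[1:]):
            pair_counts[(w1, w2)] += 1
            first_counts[w1] += 1
    return {pair: c / first_counts[pair[0]] for pair, c in pair_counts.items()}


def bigram_query_prob(query, unigram, bigram):
    """P(q1) * P(q2 | q1) * ... * P(qn | qn-1); unseen events get probability 0."""
    if not query:
        return 1.0
    p = unigram.get(query[0], 0.0)
    for w1, w2 in zip(query, query[1:]):
        p *= bigram.get((w1, w2), 0.0)
    return p


# Hypothetical search history and a toy unigram table for the first word.
history = ["crane bird", "crane bird habitat", "crane machine"]
bigram = bigram_profile(history)
unigram = {"crane": 0.5}
p_bird = bigram_query_prob(["crane", "bird"], unigram, bigram)
p_machine = bigram_query_prob(["crane", "machine"], unigram, bigram)
```

For this user, "crane bird" is twice as probable as "crane machine", which is the kind of signal a re-ranker could use to prefer documents about the bird. A practical system would also smooth the zero probabilities of unseen word pairs.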
Noisy Channel based approach
  Uses the user's implicit feedback
  User history is represented as
    $H_i = \{(Q_1, D_1), (Q_2, D_2), \dots, (Q_N, D_N)\}$
    $D_i$ is the document visited for $Q_i$
    $D$ consists of words $w_1, w_2, \dots, w_m$
  Basic idea: statistical machine translation
    Given parallel text in languages $S$ and $T$
    We get $P(t_i \mid s_i)$ for all $s_i \in S$ and $t_i \in T$
    Using EM, we get the optimized model $P(T \mid S)$
Noisy Channel based approach (Contd.)
  Similarly
    $T$ = past queries $Q_1, Q_2, \dots, Q_K$
    $S$ = text of the relevant documents for the queries in $T$
    We learn the model $P(Q \mid D)$, or more precisely $P(q_i \mid w_j)$
  Assumption
    Translate the ideal (information-containing) document into a query
    Document: a verbose language
    Query: a compact language
  User profile is stored as
    Tuples $\langle q_i, w_j, P(q_i \mid w_j) \rangle$
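The EM step can be illustrated with an IBM Model 1 style estimator, treating document words as the source language and query words as the target. This is a sketch under stated assumptions, not the method of the cited paper: it omits the NULL word and any smoothing, and the function name, iteration count, and toy (query, document) pairs are hypothetical.

```python
from collections import defaultdict


def train_translation_model(pairs, iterations=10):
    """pairs: list of (query_tokens, doc_tokens). Returns t[(q, w)] ~ P(q | w)."""
    query_vocab = {q for qs, _ in pairs for q in qs}
    # Uniform initialisation over the query vocabulary.
    t = defaultdict(lambda: 1.0 / len(query_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (q, w) alignments
        total = defaultdict(float)  # expected count of w being aligned at all
        for qs, ws in pairs:
            for q in qs:
                # E-step: distribute q over the document words that could explain it.
                norm = sum(t[(q, w)] for w in ws)
                for w in ws:
                    frac = t[(q, w)] / norm
                    count[(q, w)] += frac
                    total[w] += frac
        # M-step: renormalise expected counts into probabilities.
        t = defaultdict(
            float, {(q, w): c / total[w] for (q, w), c in count.items()}
        )
    return t


# Hypothetical user history: (query, words of the visited document).
pairs = [
    (["crane", "habitat"], ["bird", "wetland"]),
    (["crane"], ["bird"]),
]
t = train_translation_model(pairs)
profile = [(q, w, p) for (q, w), p in sorted(t.items())]
```

The resulting `profile` list has the tuple shape described above. Because "crane" also appears alone with "bird", EM shifts probability mass so that "bird" explains "crane" and "wetland" explains "habitat".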
[Figure: sample noisy channel user profile]
Noisy Channel based approach (Contd.)
  Re-ranking
    Re-rank the documents using $P(Q \mid D, u)$
    $\alpha$ = weighting parameter
    $P(q_i \mid GE)$ is the lexical probability of $q_i$
Issues in Personalization
Cold Start Problem (new user problem)
Latency Problem (new item problem)
Data sparseness
Scalability
Privacy
Recommendation list diversity
Robustness
Conclusion
Web personalization is the need of the hour for e-businesses
A relatively new research topic
  Several issues are yet to be solved effectively
Data should be collected without invading user privacy
Creating user models effectively and scaling them to a large number of users and items is at the core of personalization
Bibliography
  Rohini U, Vamshi Ambati and Vasudeva Varma. Statistical Machine Translation Models for Personalized Search. In Proceedings of the 3rd International Joint Conference on Natural Language Processing (IJCNLP 2008), January 7-12, 2008, Hyderabad, India.
  Sarabjot S. Anand and Bamshad Mobasher. Intelligent Techniques for Web Personalization. In Intelligent Techniques for Web Personalization, pages 1-36. Springer, 2005.
  Vasudeva Varma. Personalization in Information Retrieval, Extraction and Access. In Workshop on Ontology, NLP, Personalization and IE/IR, IIT Bombay, Mumbai, 15-17 July 2008.
  http://en.wikipedia.org/wiki/Personalisation
  Snapshots from Google Inc.
Questions