Web Personalization and Recommender Systems
Bamshad Mobasher, School of Computing, DePaul University
2
What is Web Personalization?
Web Personalization: “personalizing the browsing experience of a user by dynamically tailoring the look, feel, and content of a Web site to the user’s needs and interests.”
Related phrases: mass customization, one-to-one marketing, site customization, target marketing
Why Personalize?
- broaden and deepen customer relationships
- provide continuous relationship marketing to build customer loyalty
- help automate the process of proactively marketing products to customers (“lights-out” marketing; cross-sell/up-sell products)
- provide the ability to measure customer behavior and track how well customers are responding to marketing efforts
3
Personalization vs. Customization
It’s a question of who controls the user’s browsing experience.
- Customization: the user controls and customizes the site or the product based on his/her preferences; usually manual, but sometimes semi-automatic based on a given user profile
- Personalization: done automatically based on the user’s actions, the user’s profile, and (possibly) the profiles of others with “similar” profiles
4
5
6
Challenges and Pitfalls
- Technical challenges: data collection and preprocessing; discovering actionable knowledge from the data; which personalization algorithms to use
- Implementation/deployment challenges: what to personalize; when to personalize; degree of personalization or customization; how to target information without being intrusive
7
Web Personalization & Recommender Systems
Dynamically serve customized content (pages, products, recommendations, etc.) to users based on their profiles, preferences, or expected interests
Most common type of personalization: recommender systems
[Diagram: user profile → recommendation algorithm → recommendations]
8
Common Recommendation Techniques
- Collaborative Filtering: give recommendations to a user based on preferences of “similar” users; preferences on items may be explicit or implicit
- Content-Based Filtering: give recommendations to a user based on items with “similar” content to those in the user’s profile
- Rule-Based (Knowledge-Based) Filtering: provide recommendations to users based on predefined (or learned) rules, e.g.,
  age(x, 25-35) and income(x, 70-100K) and children(x, >=3) => recommend(x, Minivan)
9
The Recommendation Task
Basic formulation as a prediction problem:
Given a profile Pu for a user u, and a target item it, predict the preference score of user u on item it.
- Typically, the profile Pu contains preference scores by u on some other items, {i1, …, ik}, different from it
- preference scores on i1, …, ik may have been obtained explicitly (e.g., movie ratings) or implicitly (e.g., time spent on a product page or a news article)
10
Notes on User Profiling
Utilizing user profiles for personalization assumes that 1) past behavior is a useful predictor of future behavior, and 2) there is a wide variety of behaviors amongst users.
Basic task in user profiling: preference elicitation
- may be based on explicit judgments from users (e.g., ratings)
- may be based on implicit measures of user interest
Automatic user profiling
- use machine learning or data mining techniques to learn models of user behavior and preferences
- may build a model for each specific user, or build group profiles
- usually based on passive observation of user behavior
- advantages: less work for user and application writer; adaptive behavior; user and system build a trust relationship gradually
11
Consequences of passiveness
- Weak heuristics
  - example: clicking through multiple uninteresting pages en route to interestingness
  - example: user browses to an uninteresting page, then goes for a coffee
  - example: hierarchies tend to get more hits near the root
- Cold start
- No ability to fine-tune the profile or express interest without visiting “appropriate” pages
Some possible alternatives/extensions to internally maintained profiles:
- expose to the user (e.g., fine-tune profile)?
- expose to other users/agents (e.g., collaborative filtering)?
- expose to web server (e.g., cnn.com custom news)?
12
Content-Based Filtering Systems
Track which pages/items the user visits and give as recommendations other pages with similar content.
- Often involves the use of client-side learning interface agents
- May require the user to enter a profile or to rate pages/objects as “interesting” or “uninteresting”
Advantages:
- useful for large information-based sites (e.g., portals) or for domains where items have content-rich features
- can be easily integrated with “content servers”
Disadvantages:
- may miss important pragmatic relationships among items (based on usage)
- not effective in small, specific sites or sites which are not content-oriented
13
Content-Based Recommenders
Predictions for unseen (target) items are computed based on their similarity (in terms of content) to items in the user profile.
E.g., user profile Pu contains
recommend highly: and recommend “mildly”:
14
Content-Based Recommender Systems
15
Content-Based Recommenders: Personalized Search
How can the search engine determine the “user’s context”?
Query: “Madonna and Child”
Need to “learn” the user profile: Is the user an art historian? A pop music fan?
16
Content-Based Recommenders
Music recommendations Play list generation
Example: Pandora
17
Example: Recommender Systems
Collaborative filtering recommenders
- Predictions for unseen (target) items are computed based on other users with similar interest scores on items in user u’s profile, i.e., users with similar tastes (aka “nearest neighbors”)
- requires computing correlations between user u and other users according to interest scores or ratings
18
Collaborative Recommender Systems
21
Basic Collaborative Filtering Process
[Diagram: historical user records and the current user record <user, item1, item2, …> feed a neighborhood formation step (producing the nearest neighbors), which feeds a recommendation engine (applying a combination function) to produce recommendations; the two stages are the neighborhood formation phase and the recommendation phase]
Both the neighborhood formation and the recommendation phases are real-time components.
22
Collaborative Filtering: Measuring Similarities
Pearson Correlation: weight by degree of correlation between user U and user J:

r_{UJ} = \frac{\sum_i (U_i - \bar{U})(J_i - \bar{J})}{\sqrt{\sum_i (U_i - \bar{U})^2}\,\sqrt{\sum_i (J_i - \bar{J})^2}}

where \bar{J} is the average rating of user J on all items (likewise \bar{U}), and the sums run over the co-rated items.
- 1 means very similar, 0 means no correlation, -1 means dissimilar
- Works well in the case of user ratings (where there is at least a range, e.g., 1-5)
- Not always possible (in some situations we may only have implicit binary values, e.g., whether a user did or did not select a document); alternatively, a variety of distance or similarity measures can be used
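The correlation formula above can be sketched in Python (a minimal illustration, not from the slides; the dict-of-ratings input format is an assumption):

```python
def pearson(ratings_u, ratings_j):
    """Pearson correlation between two users over their co-rated items.

    ratings_u, ratings_j: dicts mapping item -> rating (hypothetical format).
    Returns a value in [-1, 1], or 0.0 when undefined (no co-rated items,
    or no rating variance for one of the users).
    """
    common = set(ratings_u) & set(ratings_j)
    if not common:
        return 0.0
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    mean_j = sum(ratings_j[i] for i in common) / len(common)
    num = sum((ratings_u[i] - mean_u) * (ratings_j[i] - mean_j) for i in common)
    den_u = sum((ratings_u[i] - mean_u) ** 2 for i in common) ** 0.5
    den_j = sum((ratings_j[i] - mean_j) ** 2 for i in common) ** 0.5
    if den_u == 0 or den_j == 0:
        return 0.0
    return num / (den_u * den_j)
```

For example, two users whose ratings rise and fall together get a correlation near 1, and users with opposite tastes get a value near -1.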
23
Collaborative Filtering: Making Predictions
When generating predictions from the nearest neighbors, neighbors can be weighted based on their distance to the target user.
To generate a prediction for a target user a on an item i:

p_{a,i} = \bar{r}_a + \frac{\sum_{u=1}^{k} \mathrm{sim}(a,u)\,(r_{u,i} - \bar{r}_u)}{\sum_{u=1}^{k} \mathrm{sim}(a,u)}

where
- \bar{r}_a = mean rating for user a
- u_1, …, u_k are the k nearest neighbors to a
- r_{u,i} = rating of user u on item i
- sim(a,u) = Pearson correlation between a and u
This is a weighted average of deviations from the neighbors’ mean ratings (and closer neighbors count more).
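The prediction formula on this slide can be sketched as follows (a minimal illustration; the triple format for neighbors is an assumption, and the denominator uses |sim| so that negative correlations do not flip the sign, a common practical variant of the formula):

```python
def predict(r_a_mean, neighbors):
    """Predict target user a's rating on item i from its k nearest neighbors.

    r_a_mean: mean rating of the target user a.
    neighbors: list of (sim_au, r_ui, r_u_mean) triples, one per neighbor
    who rated item i. Implements the weighted average of deviations from
    each neighbor's mean rating.
    """
    num = sum(sim * (r_ui - r_u_mean) for sim, r_ui, r_u_mean in neighbors)
    den = sum(abs(sim) for sim, _, _ in neighbors)
    return r_a_mean if den == 0 else r_a_mean + num / den
```

With a single perfectly correlated neighbor who rated the item one point above their own mean, the prediction is one point above the target user's mean.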
24
Example Collaborative System (using k-nearest neighbor with k = 1)

         Item1  Item2  Item3  Item4  Item5  Item6   Correlation with Alice
Alice      5      2      3      3      ?
User 1     2      4      4      1                    -1.00
User 2     2      1      3      1      2              0.33
User 3     4      2      3      2      1              0.90   <- best match
User 4     3      3      2      3      1              0.19
User 5     3      2      2      2                    -1.00
User 6     5      3      1      3      2              0.65
User 7     5      1      5      1                    -1.00

Prediction: with k = 1, Alice’s missing rating is predicted from the best match (User 3).
25
Item-based Collaborative Filtering
Find similarities among the items based on ratings across users; often measured using a variation of the cosine measure.
Prediction of item i for user a is based on the past ratings of user a on items similar to i.
Suppose: sim(Star Wars, Indep. Day) > sim(Jur. Park, Indep. Day) > sim(Termin., Indep. Day)
Then the predicted rating for Karen on Indep. Day will be 7, because she rated Star Wars 7. That is if we only use the most similar item; otherwise, we can use the k most similar items and again use a weighted average.
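The item-based scheme can be sketched as follows (a minimal illustration; the movie names, ratings, and similarity values below are stand-ins, and the weighted average over the k most similar items follows the description above):

```python
import math

def cosine(v, w):
    """Cosine similarity between two item rating vectors."""
    dot = sum(a * b for a, b in zip(v, w))
    nv = math.sqrt(sum(a * a for a in v))
    nw = math.sqrt(sum(b * b for b in w))
    return 0.0 if nv == 0 or nw == 0 else dot / (nv * nw)

def predict_item_based(user_ratings, item_sims, k=3):
    """Weighted average of the user's ratings on the k items most
    similar to the target item.

    user_ratings: {item: rating} for the target user.
    item_sims: {item: similarity to the target item}.
    """
    rated = sorted(
        ((item_sims[i], r) for i, r in user_ratings.items() if i in item_sims),
        reverse=True)[:k]
    den = sum(s for s, _ in rated)
    return 0.0 if den == 0 else sum(s * r for s, r in rated) / den
```

With k = 1 and Star Wars as the most similar rated item, the prediction is simply Karen's rating on Star Wars, matching the example on the slide.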
26
Item-Based Collaborative Filtering
Item1 Item 2 Item 3 Item 4 Item 5 Item 6
Alice 5 2 3 3 ?
User 1 2 4 4 1
User 2 2 1 3 1 2
User 3 4 2 3 2 1
User 4 3 3 2 3 1
User 5 3 2 2 2
User 6 5 3 1 3 2
User 7 5 1 5 1
Item similarity (cosine similarity to the target item): 0.76  0.79  0.60  0.71  0.75
Best match: the item with similarity 0.79; the prediction for Alice on the target item is based on her ratings on the most similar item(s).
27
Collaborative Filtering: Evaluation
- split users into train/test sets
- for each user a in the test set:
  - split a’s votes into observed (I) and to-predict (P)
  - measure the average absolute deviation between predicted and actual votes in P (MAE = mean absolute error)
- average over all test users
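The MAE computation is straightforward (a minimal sketch, assuming parallel lists of predicted and actual votes):

```python
def mae(predicted, actual):
    """Mean absolute error between parallel lists of predicted and actual votes."""
    assert len(predicted) == len(actual) and predicted
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)
```

Per the procedure above, this is computed per test user on the to-predict set P, then averaged over all test users.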
28
Other Forms of Collaborative and Social Filtering
Social Tagging (Folksonomy)
- people add free-text tags to their content
- where people happen to use the same terms, their content is linked
- frequently used terms float to the top, creating a kind of positive feedback loop for popular tags
Examples: Del.icio.us, Flickr, Last.fm
29
Social Tagging
- By allowing loose coordination, tagging systems allow social exchange of conceptual information.
- Facilitates a similar but richer information exchange than collaborative filtering: I comment that a movie is "romantic" or "a good holiday movie", and everyone who overhears me has access to this metadata about the movie.
- The social exchange goes beyond collaborative filtering, facilitating the transfer of more abstract, conceptual information about the movie.
- Note: the preference information is transferred implicitly; we are more likely to tag items we like than items we don't.
- No algorithm mediates the connection between individuals: when we navigate by tags, we are directly connecting with others.
30
Social Tagging: Deviating from Standard Mental Models
- No browsing of topical, categorized navigation, or searching for an explicit term or phrase
- Instead, I use the language I use to define my world (tagging)
- Sharing my language and contexts will create community
  - tagging creates community through the overlap of perspectives
  - this leads to the creation of social networks, which may further develop and evolve
- But does this lead to dynamic evolution of complex concepts or knowledge? Collective intelligence?
31
Folksonomies
32
Hybrid Recommender Systems
33
Semantically Enhanced Collaborative Filtering
Basic idea: extend item-based collaborative filtering to incorporate both similarity based on ratings (or usage) and semantic similarity based on domain knowledge.
- Semantic knowledge about items
  - can be extracted automatically from the Web based on domain-specific reference ontologies
  - used in conjunction with user-item mappings to create a combined similarity measure for item comparisons
  - singular value decomposition used to reduce noise in the semantic data
- Semantic combination threshold
  - used to determine the proportion of semantic and rating (or usage) similarities in the combined measure
34
Semantically Enhanced Hybrid Recommendation
An extension of the item-based algorithm: use a combined similarity measure to compute item similarities:

CombinedSim(i_p, i_q) = α · RateSim(i_p, i_q) + (1 − α) · SemSim(i_p, i_q)

where SemSim is the similarity of items i_p and i_q based on semantic features (e.g., keywords, attributes, etc.), and RateSim is the similarity of items i_p and i_q based on user ratings (as in standard item-based CF).
α is the semantic combination parameter:
- α = 1: only user ratings; no semantic similarity
- α = 0: only semantic features; no collaborative similarity
35
Semantically Enhanced CF
Movie data set
- Movie ratings from the MovieLens data set
- Semantic info extracted from IMDB based on the following ontology:
[Ontology diagram: Movie class with attributes Name, Year, Genre, Actor, Director; Genre hierarchy: Genre-All → Romance, Comedy, Kids & Family, Action; Comedy → Romantic Comedy, Black Comedy; Actor class with attributes Name, Movie, Nationality; Director class with attributes Name, Movie, Nationality]
36
Semantically Enhanced CF
Used 10-fold x-validation on randomly selected test and training data sets Each user in training set has at least 20 ratings (scale 1-5)
[Chart: Movie data set, rating prediction accuracy: MAE (≈0.71-0.80) vs. no. of neighbors, for enhanced vs. standard item-based CF]
[Chart: Movie data set, impact of SVD and semantic threshold: MAE (≈0.725-0.750) vs. alpha (0-1), for SVD-100 vs. No-SVD]
37
Semantically Enhanced CF
Dealing with new items and sparse data sets For new items, select all movies with only one rating as the test data Degrees of sparsity simulated using different ratios for training data
[Chart: Movie data set, prediction accuracy for new items: MAE (≈0.72-0.88) vs. no. of neighbors (5-120), for average-rating-as-prediction vs. semantic prediction]
[Chart: Movie data set, % improvement in MAE (0-25%) vs. train/test ratio (0.9-0.1)]
38
Collaborative Filtering: Problems
Problems with standard CF
- the major problem with CF is scalability: neighborhood formation is done in real-time
- a small number of users relative to items may result in poor performance: the data become too sparse to provide accurate predictions
- the “new item” problem
- vulnerability to attacks (we will come back to this later)
Problems in the context of clickstream / e-commerce data
- explicit user ratings are not available; features are binary (a visit or non-visit for a particular item) or a function of the time spent on a particular item
- a visit to a page is not necessarily an indication of interest in that item
- the number of user records (and items) is far larger than in the standard domains for CF, where users are limited to purchasers or people who rated items
- need to rely on very short user histories
39
Web Mining Approach to Personalization
Basic idea
- generate aggregate user models (usage profiles) by discovering user access patterns through Web usage mining (offline process): clustering user transactions; clustering items/pageviews; association rule mining; sequential pattern discovery
- match a user’s active session against the discovered models to provide dynamic content (online process)
Advantages
- no explicit user ratings or interaction with users
- helps preserve user privacy by making effective use of anonymous data
- enhances the effectiveness and scalability of collaborative filtering
- more accurate and broader recommendations than content-only approaches
40
Automatic Web Personalization: Offline Process
[Diagram, data preparation phase: Web & application server logs → data preprocessing (data cleaning, pageview identification, sessionization, data integration, data transformation) → user transaction database; site content & structure and domain knowledge feed into preprocessing]
[Diagram, pattern discovery phase: usage mining (transaction clustering, pageview clustering, correlation analysis, association rule mining, sequential pattern mining) → patterns → pattern analysis (pattern filtering, aggregation, characterization) → aggregate usage profiles]
41
Automatic Web Personalization: Online Process
[Diagram: the client browser’s active session <user, item1, item2, …> reaches the recommendation engine via the Web server; the engine matches the integrated user profile (active session plus stored user profile) against the aggregate usage profiles and domain knowledge, and recommendations are returned with the requested page]
42
Conceptual Representation of User Transactions or Sessions

Session/user data (rows = users/sessions, columns = pageviews/objects):

        A    B    C    D    E    F
user0  15    5    0    0    0  185
user1   0    0   32    4    0    0
user2  12    0    0   56  236    0
user3   9   47    0    0    0  134
user4   0    0   23   15    0    0
user5  17    0    0  157   69    0
user6  24   89    0    0    0  354
user7   0    0   78   27    0    0
user8   7    0   45   20  127    0
user9   0   38   57    0    0   15

Raw weights are usually based on time spent on a page, but in practice they need to be normalized and transformed.
43
Real-Time Recommendation Engine
- Keep track of users’ navigational history through the site: a fixed-size sliding window over the active session captures the current user’s “short-term” history depth
- Match the current user’s activity against the discovered profiles: profiles either can be based on aggregate usage profiles, or are obtained directly from association rules or sequential patterns
- Dynamically generated recommendations are added to the returned page: each pageview can be assigned a recommendation score based on
  - matching score to user profiles (e.g., aggregate usage profiles)
  - “information value” of the pageview based on domain knowledge (e.g., link distance of the candidate recommendation to the active session)
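The fixed-size sliding window over the active session can be sketched with a bounded deque (class and method names here are illustrative, not from the slides):

```python
from collections import deque

class SessionWindow:
    """Fixed-size sliding window over the active session.

    Keeps only the n most recent pageviews as the user's short-term history."""
    def __init__(self, n):
        self.window = deque(maxlen=n)

    def visit(self, pageview):
        self.window.append(pageview)  # oldest pageview drops off automatically

    def history(self):
        return list(self.window)
```

With n = 3, after visiting pages A, B, C, D in order, the short-term history is B, C, D.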
44
Recommendations Based on Aggregate Profiles
- Matching score computed using cosine similarity: the user’s active session (pageviews in the current window) is compared to each aggregate profile (both are viewed as pageview vectors)
  - the weight of items in the profile vector is the significance weight of the item for that profile
  - the weight of items in the session vector can be all 1’s, or based on some method for determining their significance in the current session
- Generating recommendations based on matching profiles
  - from each matching profile, recommend the items not already in the user session window and not directly linked from the pages in the current session window
  - the recommendation score for an item is based on a combination of the profile matching score (similarity to the session window) and the weight of the item in that profile
  - additionally, we can weight items farther away from the user’s current location higher (i.e., consider them better recommendations)
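The matching and recommendation steps above can be sketched as follows (a minimal illustration; the sqrt(weight × match) combination is one plausible choice for combining the profile match score with the item weight, and the link-distance filtering and weighting are omitted):

```python
import math

def match_score(session, profile):
    """Cosine match between an active-session window and an aggregate profile.

    session: set of pageviews in the current window (weights all 1's, one of
    the options described on the slide); profile: {pageview: significance weight}.
    """
    dot = sum(profile.get(p, 0.0) for p in session)
    ns = math.sqrt(len(session))
    npr = math.sqrt(sum(w * w for w in profile.values()))
    return 0.0 if ns == 0 or npr == 0 else dot / (ns * npr)

def recommend(session, profiles):
    """Score items from matching profiles that are not already in the session.

    Recommendation score = sqrt(item weight * profile match score); each item
    keeps its best score across profiles. Returns (item, score) pairs sorted
    by descending score.
    """
    scores = {}
    for profile in profiles:
        m = match_score(session, profile)
        for item, w in profile.items():
            if item not in session:
                scores[item] = max(scores.get(item, 0.0), math.sqrt(w * m))
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

With the toy profiles from the clustering example later in the deck, an active session {A, B} matches the profile containing A, B, and F best, so F gets the top recommendation score.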
45
Discovery of Aggregate Profiles
Transaction clusters as aggregate profiles
- Each transaction is viewed as a pageview vector: t = <w(p1,t), w(p2,t), …, w(pn,t)>
- Each cluster contains a set of transaction vectors, with a centroid c = <u1^c, u2^c, …, un^c>
- Each centroid acts as an aggregate profile, with ui^c representing the weight of pageview pi in the profile
- Personalization involves computing similarity between a current user’s profile (or the active user session) and the cluster centroids
46
Web Usage Mining: clustering example Transaction Clusters:
Clustering similar user transactions and using centroid of each cluster as a usage profile (representative for a user segment)
Support URL Pageview Description
1.00 /courses/syllabus.asp?course=450-96-303&q=3&y=2002&id=290
SE 450 Object-Oriented Development class syllabus
0.97 /people/facultyinfo.asp?id=290 Web page of the lecturer who taught the above course
0.88 /programs/ Current Degree Descriptions 2002
0.85 /programs/courses.asp?depcode=96&deptmne=se&courseid=450
SE 450 course description in SE program
0.82 /programs/2002/gradds2002.asp M.S. in Distributed Systems program description
Sample cluster centroid from dept. Web site (cluster size =330)
47
Using Clusters for Personalization

Original session/user data:
        A.html  B.html  C.html  D.html  E.html  F.html
user0     1       1       0       0       0       1
user1     0       0       1       1       0       0
user2     1       0       0       1       1       0
user3     1       1       0       0       0       1
user4     0       0       1       1       0       0
user5     1       0       0       1       1       0
user6     1       1       0       0       0       1
user7     0       0       1       1       0       0
user8     1       0       1       1       1       0
user9     0       1       1       0       0       1

Result of clustering:
- Cluster 0: user1, user4, user7
- Cluster 1: user0, user3, user6, user9
- Cluster 2: user2, user5, user8

PROFILE 0 (cluster size = 3): 1.00 C.html, 1.00 D.html
PROFILE 1 (cluster size = 4): 1.00 B.html, 1.00 F.html, 0.75 A.html, 0.25 C.html
PROFILE 2 (cluster size = 3): 1.00 A.html, 1.00 D.html, 1.00 E.html, 0.33 C.html

Given an active session A → B, the best matching profile is Profile 1. This may result in a recommendation for page F.html, since it appears with high weight in that profile.
48
Association Rules & Personalization
Approach of Fu, Budzik, Hammond, 2000
- proposed a solution to the problem of reduced coverage due to sparse data: rank all discovered rules by the degree of intersection between the left-hand side of the rule and a user's active session, then generate the top-k recommendations
- problem: requires the generation of all association rules, and a search in the full space of rules during the recommendation process
Approach of Lin, Alvarez, Ruiz, 2000
- basic approach: find an appropriate number of rules for each target user by automatically selecting the minimum support; the recommendation engine generates association rules among both users and articles
- problem: requires online generation of relevant rules for each user
49
Association Rules & Personalization
Approach of Mobasher, et al., 2001
- discovered frequent itemsets are stored in an “itemset graph” (an extension of the lexicographic tree structure of Agrawal, et al., 1999): each node at depth d in the graph corresponds to an itemset I of size d, and is linked to the itemsets of size d+1 that contain I at level d+1; the single root node at level 0 corresponds to the empty itemset
- frequent itemsets are matched against a user's active session S by performing a search of the graph to depth |S|
- a recommendation r is an item at level |S|+1 whose recommendation score is the confidence of the rule S => r
- recommendation generation can be done in constant time, and does not require a priori generation of association rules from the frequent itemsets
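The itemset-graph lookup can be sketched with a flat dictionary of frequent itemsets standing in for the graph structure (a simplification; the support counts used below are hypothetical):

```python
def recommend_from_itemsets(freq, session, conf_threshold=0.0):
    """Recommendation from frequent itemsets, per the itemset-graph idea.

    freq: {frozenset(itemset): support count} for all frequent itemsets (this
    flat dict stands in for the itemset graph). For an active session S, each
    frequent itemset of size |S|+1 containing S yields a candidate item r with
    score = confidence of S => r = support(S ∪ {r}) / support(S).
    """
    s = frozenset(session)
    if s not in freq:
        return {}
    recs = {}
    for itemset, support in freq.items():
        if len(itemset) == len(s) + 1 and s < itemset:
            (r,) = itemset - s  # the single item extending S
            conf = support / freq[s]
            if conf >= conf_threshold:
                recs[r] = max(recs.get(r, 0.0), conf)
    return recs
```

For example, with supports 5 for {B,E}, 5 for {A,B,E}, and 4 for {B,C,E}, an active session <B,E> yields A with score 1 and C with score 4/5, as in the itemset-graph example later in the deck.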
50
Sequential Patterns & Personalization
Sequential / navigational patterns as aggregate profiles: similar to association rules, but the ordering of accessed items is taken into account.
Two basic approaches:
- use contiguous sequential patterns (CSP) (e.g., Web navigational patterns)
- use general sequential patterns (SP)
Contiguous sequential patterns are often modeled as Markov chains and used for prefetching (i.e., predicting the immediate next user access based on previously accessed pages). In the context of recommendations, they can achieve high accuracy, but it may be difficult to obtain reasonable coverage.
51
Sequential Patterns & Personalization
Sequential / navigational patterns (continued)
- representation as Markov chains often leads to high space complexity due to model sizes; some approaches have focused on reducing model size
  - selective Markov models (Deshpande, Karypis, 2000): use various pruning strategies to reduce the number of states (e.g., support or confidence pruning, error pruning)
  - longest repeating subsequences (Pitkow, Pirolli, 1999): similar to support pruning; used to focus only on significant navigational paths
- increased coverage can be achieved by using all-Kth-order models (i.e., using all possible sizes for user histories)
52
Sequential Patterns & Personalization (Mobasher, et al. 2002)
A Frequent Sequence Trie (FST) is used to store both the sequential and contiguous sequential patterns
- organized into levels from 0 to k, where k is the maximal size among all sequential (respectively, contiguous sequential) patterns
- each non-root node N at depth d contains an item sd, representing a frequent sequence <s1, s2, ..., sd>
- along with each node, the support (or frequency) value of the corresponding pattern is stored
For each active session window w = <w1, w2, ..., wn>, perform a depth-first search of the FST to level n
- if a match is found, the children of the matching node N are used to generate candidate recommendations
- given a sequence S = <w1, w2, ..., wn, p>, the item p is added to the recommendation set if the confidence of S is greater than or equal to the confidence threshold
53
Example: Frequent Itemsets
Sample Transactions
Frequent itemsets (using min. support frequency = 4)
54
Example: Sequential Patterns
Sample Transactions
SP (min. support frequency = 4) CSP (min. support frequency = 4)
55
Example: An Itemset Graph
Frequent Itemset Graph for the Example
Given an active session window <B,E>, the algorithm finds items A and C with recommendation scores of 1 and 4/5 (corresponding to confidences of the rules {B,E } => {A } and {B,E } => {C} ).
56
Example: Frequent Sequence Trie
Frequent Sequence Trie for the Example
Given an active session window <A,B>, the algorithm finds item E with a recommendation score of 1 (corresponding to the confidence of the rule {A,B} => {E}).
57
Quantitative Evaluation of Recommendation Effectiveness
Two important factors in evaluating recommendations:
- Precision: measures the ratio of “correct” recommendations to all recommendations produced by the system; low precision would result in angry or frustrated users
- Coverage: measures the ratio of “correct” recommendations to all pages/items that will be accessed by the user; low coverage would inhibit the ability of the system to give relevant recommendations at critical points in user navigation
Transactions divided into training & evaluation sets
- the training set is used to build models (generation of aggregate profiles, neighborhood formation)
- the evaluation set is used to measure precision & coverage
- 10-fold cross-validation generally used in the experiments
58
Evaluation Methodology
Each transaction t in the evaluation set is divided into two parts:
- as_t: the portion with the first n items in t, used as the user session to generate recommendations (n is the maximum allowable window size)
- Eval_t: the remaining portion of t, used to evaluate the recommendations (|Eval_t| = |t| - n)
- R(as_t, θ): the recommendation set, which contains all items whose recommendation score is greater than or equal to the threshold θ
Example: t = <A,B,C,D,E,F,G,H>
- use A,B,C,D to generate recommendations, say: E,G,K
- match E,G,K with E,F,G,H
- no. of matches = 2; size of Eval_t = 4; size of recommendation set = 3
- Coverage = 2/4 = 50%; Precision = 2/3 = 67%
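The precision and coverage computation for a single evaluation transaction can be sketched as:

```python
def precision_coverage(recommended, evaluation):
    """Precision and coverage for one evaluation transaction.

    recommended: the recommendation set R(as_t, θ);
    evaluation: the items in Eval_t (the held-out remainder of t).
    """
    hits = len(set(recommended) & set(evaluation))
    precision = hits / len(recommended) if recommended else 0.0
    coverage = hits / len(evaluation) if evaluation else 0.0
    return precision, coverage
```

Applied to the worked example (recommendations {E,G,K}, held-out items {E,F,G,H}), this gives precision 2/3 and coverage 2/4.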
59
Increasing window sizes (using larger portion of user’s history) generally leads to improvement in precision
Impact of Window Size
[Chart: impact of window size (support = 0.04): precision (≈0.2-0.8) vs. recommendation threshold (0.1-1.0), for window sizes W = 1 to 4]
[Chart: impact of window size (support = 0.04): coverage (≈0-0.8) vs. recommendation threshold (0.1-1.0), for window sizes W = 1 to 4]
This example is based on the association rule approach.
60
Associations vs. Sequences
[Chart: precision (window = 3) vs. recommendation threshold (0.1-1.0), for kNN, association rules, SP, and CSP]
[Chart: coverage (window = 3) vs. recommendation threshold (0.1-1.0), for kNN, association rules, SP, and CSP]
Comparison of recommendations based on association rules, sequential patterns (SP), contiguous sequential patterns (CSP), and standard k-nearest neighbor (kNN). Support threshold for Association, SP, and CSP = 0.04.
61
Problems with Web Usage Mining
- New item problem: patterns will not capture recently added items; bad for dynamic Web sites
- Poor machine interpretability: hard to generalize and reason about patterns; no domain knowledge used to enhance results (e.g., knowing a user is interested in a program, we could recommend the prerequisites, core, or popular courses in this program to the user)
- Poor insight into the patterns themselves: the nature of the relationships among items or users in a pattern is not directly available
62
Solution: Integrate Semantic Knowledge with Web Usage Mining
Information retrieval/extraction approach
- represent semantic knowledge in pageviews as keyword vectors; keywords extracted from text or meta-data
- text mining can be used to capture higher-level concepts or associations among concepts
- cannot capture deeper relationships among objects based on their inherent properties or attributes
Ontology-based approach
- represent domain knowledge using a relational model or ontology representation languages
- process Web usage data with the structured domain knowledge
- requires the extraction of ontology instances from Web pages
- challenge: performing the underlying mining operations on structured objects (e.g., computing similarities or performing aggregations)
63
Integration of Content Features
Pre-mining
- initial transaction vector: t = <weight(p1,t), …, weight(pn,t)>
- transform into a content-enhanced transaction: t = <weight(w1,t), …, weight(wk,t)>
- now transaction clustering can be performed based on content similarity among user transactions
Post-mining
- first perform mining operations on usage and content data independently
- integrate usage and content patterns in the recommendation phase
- example: content profiles: perform clustering on the term-pageview matrix; each cluster centroid represents pages with some similar content; use both content and usage profiles to generate recommendations
64
A.html B.html C.html D.html E.html
user1 1 0 1 0 1
user2 1 1 0 0 1
user3 0 1 1 1 0
user4 1 0 1 1 1
user5 1 1 0 0 1
user6 1 0 1 1 1
A.html B.html C.html D.html E.html
web 0 0 1 1 1
data 0 1 1 1 0
mining 0 1 1 1 0
business 1 1 0 0 0
intelligence 1 1 0 0 1
marketing 1 1 0 0 1
ecommerce 0 1 1 0 0
search 1 0 1 0 0
information 1 0 1 1 1
retrieval 1 0 1 1 1
User transaction matrix UT
Feature-Pageview Matrix FP
65
Content Enhanced Transactions
web data mining business intelligence marketing ecommerce search information retrieval
user1 2 1 1 1 2 2 1 2 3 3
user2 1 1 1 2 3 3 1 1 2 2
user3 2 3 3 1 1 1 2 1 2 2
user4 3 2 2 1 2 2 1 2 4 4
user5 1 1 1 2 3 3 1 1 2 2
user6 3 2 2 1 2 2 1 2 4 4
User-Feature Matrix UF. Note that UF = UT × FPᵀ.
Example: users 4 and 6 are more interested in concepts related to Web information retrieval, while user 3 is more interested in data mining.
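The UF = UT × FPᵀ transformation is a plain matrix product, which can be sketched without any libraries (the function name is illustrative):

```python
def matmul_T(UT, FP):
    """Compute UF = UT · FPᵀ with plain nested lists.

    UT: user-transaction matrix (users × pageviews);
    FP: feature-pageview matrix (features × pageviews).
    Entry UF[u][f] accumulates feature f's weight over every page user u visited.
    """
    return [[sum(ut * fp for ut, fp in zip(user_row, feat_row))
             for feat_row in FP]
            for user_row in UT]
```

Applied to the UT and FP matrices above, this reproduces the UF rows shown here; for instance, user1 (who visited A.html, C.html, and E.html) gets the feature vector <2, 1, 1, 1, 2, 2, 1, 2, 3, 3>.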
66
Integrating Content and Usage for Personalization
67
Example: Content Profiles
Examples of feature (word) clusters (Association for Consumer Research Web site); cluster centroids:
- CLUSTER 0: anthropologi, associ, behavior, ...
- CLUSTER 4: consum, journal, market, psychologi, ...
- CLUSTER 10: ballot, result, vote, ...
- CLUSTER 11: advisori, appoint, committe, council, ...
68
Example: Usage Profiles
Example usage profiles from the ACR site:

Profile 1:
1.00 Call for Papers
0.67 ACR News Special Topics
0.67 CFP: Journal of Psychology and Marketing I
0.67 CFP: Journal of Psychology and Marketing II
0.67 CFP: Journal of Consumer Psychology II
0.67 CFP: Journal of Consumer Psychology I

Profile 2:
1.00 CFP: Winter 2000 SCP Conference
1.00 Call for Papers
0.36 CFP: ACR 1999 Asia-Pacific Conference
0.30 ACR 1999 Annual Conference
0.25 ACR News Updates
0.24 Conference Update

Generated by clustering user transactions directly. Usage profiles represent groups of users commonly accessing certain pages together; content profiles represent groups of pages with similar content.
69
Comparison of Recommendations
[Chart: precision (≈0-0.7) vs. recommendation threshold (0.1-1.0), for content-only vs. combined usage & content profiles]
[Chart: coverage (≈0-1.0) vs. recommendation threshold (0.1-1.0), for content-only vs. combined usage & content profiles]
70
Ontology-Based Usage Mining

Approach 1: Ontology-Enhanced Transactions
Initial transaction vector: t = <weight(p1,t), ..., weight(pn,t)>
Transform into a content-enhanced transaction: t = <weight(o1,t), ..., weight(or,t)>
The structured objects o1, ..., or are instances of ontology entities extracted from pages p1, ..., pn in the transaction
Mining tasks can then be performed based on ontological similarity among user transactions

Approach 2: Ontology-Enhanced Patterns
Discover usage patterns in the standard way
Transform patterns by creating an aggregate representation of the patterns based on the ontology
Requires the categorization of similar objects into ontology classes
Also requires the specification of a different aggregation/combination function for each attribute of each class in the ontology
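Approach 1 amounts to redistributing each page's transaction weight over the ontology entities extracted from it. The sketch below is illustrative, not the authors' code; the `page_entities` mapping (pages to weighted ontology instances) is a hypothetical input.

```python
def enhance_transaction(page_weights, page_entities):
    """page_weights: {page: weight(p, t)}, the original transaction vector.
    page_entities: {page: {entity: membership}}, ontology instances per page.
    Returns {entity: weight(o, t)} by distributing page weight over entities."""
    entity_weights = {}
    for page, w in page_weights.items():
        for entity, m in page_entities.get(page, {}).items():
            entity_weights[entity] = entity_weights.get(entity, 0.0) + w * m
    return entity_weights

# Hypothetical example: two pages mapped onto two ontology entities
t = enhance_transaction(
    {"movie1.html": 1.0, "movie2.html": 0.5},
    {"movie1.html": {"Comedy": 0.7, "Romance": 0.3},
     "movie2.html": {"Comedy": 1.0}},
)
```

Transactions expressed over entities rather than pages can then be compared with ontological similarity measures.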
71
Example: Ontology for a Movie Site
An example of a Movie object instance
72
Ontology-Based Pattern Aggregation

Usage profile: 0.50 Movie1.html, 0.35 Movie2.html, 0.15 Movie3.html

Object extraction yields one Movie instance per page (Genre values are nodes in a genre hierarchy: Genre-All > {Romance, Comedy, Kid&Family}, with Romantic Comedy under both Romance and Comedy):

Movie 1:  Name {A}   Actor {S: 0.7; T: 0.2; U: 0.1}   Genre: Romantic Comedy   Year {2000}
Movie 2:  Name {B}   Actor {S: 0.5; T: 0.5}           Genre: Comedy            Year {1999}
Movie 3:  Name {C}   Actor {S: 0.6; W: 0.4}           Genre: Comedy            Year {2002}

Ontology-based aggregation combines the instances into a single semantic usage pattern:

Name {A: 0.5; B: 0.35; C: 0.15}   Actor {S: 0.58; T: 0.27; W: 0.09; U: 0.05}   Genre: Comedy   Year [1999, 2002]
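The aggregation step can be sketched with one combination function per attribute type: a usage-weighted union for actor distributions and an interval for years. This is an illustrative sketch with made-up structure; the exact combination functions (and hence the exact aggregate numbers) in the original work may differ.

```python
def aggregate_actors(movies, usage_weights):
    """Usage-weighted combination of per-movie actor distributions."""
    agg = {}
    for m, w in zip(movies, usage_weights):
        for actor, score in m["actors"].items():
            agg[actor] = agg.get(actor, 0.0) + w * score
    return agg

def aggregate_years(movies):
    """Year attributes are aggregated into an interval."""
    years = [m["year"] for m in movies]
    return (min(years), max(years))

movies = [
    {"actors": {"S": 0.7, "T": 0.2, "U": 0.1}, "year": 2000},
    {"actors": {"S": 0.5, "T": 0.5}, "year": 1999},
    {"actors": {"S": 0.6, "W": 0.4}, "year": 2002},
]
usage_weights = [0.50, 0.35, 0.15]  # from the usage profile

agg = aggregate_actors(movies, usage_weights)
years = aggregate_years(movies)  # (1999, 2002)
```

A genre hierarchy would similarly need its own function, e.g., taking the deepest common ancestor of the instances' genre nodes.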
73
Personalization with Semantic Usage Patterns

Current User Profile + Aggregate Semantic Usage Patterns -> Match Profiles -> Extended User Profile -> Instantiate to Real Web Objects -> Recommendations of Items

Note that matching the semantic representation of the user's profile against the patterns requires computing similarities at the ontological level (which may be defined based on domain-specific characteristics).
74
Profile Injection Attacks

Consist of a number of "attack profiles" added to the system by providing ratings for various items, engineered to bias the system's recommendations
Two basic types:
"Push attack" ("shilling"): designed to promote an item
"Nuke attack": designed to demote an item
Prior work has shown that CF recommender systems are highly vulnerable to such attacks

Attack models: strategies for assigning ratings to items based on knowledge of the system, products, or users
Examples of attack models: "random", "average", "bandwagon", "segment", "love-hate"
75
A Successful Push Attack

("user-based" algorithm using k-nearest neighbor with k = 1; Item 6 is the pushed item; ratings are listed in item order with unrated items omitted)

User      Ratings (Items 1-6)   Correlation with Alice
Alice     5 2 3 3 ?
User 1    2 4 4 1               -1.00
User 2    2 1 3 1 2              0.33
User 3    4 2 3 2 1              0.90
User 4    3 3 2 3 1              0.19
User 5    3 2 2 2               -1.00
User 6    5 3 1 3 2              0.65
User 7    5 1 5 1               -1.00
Attack 1  2 3 2 5               -1.00
Attack 2  3 2 3 2 5              0.76
Attack 3  3 2 2 2 5              0.93

With k = 1, Attack 3 (correlation 0.93) becomes Alice's best match, so the predicted rating for Item 6 is the attacker's pushed rating of 5.
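The mechanics can be shown with a tiny k = 1 user-based predictor. This is a toy illustration with hypothetical ratings (not the table above): an attack profile that simply mimics the target user's ratings achieves perfect correlation and hijacks the prediction.

```python
from math import sqrt

def pearson(u, v):
    """Pearson correlation over the items both users rated."""
    common = [i for i in u if i in v]
    if len(common) < 2:
        return 0.0
    mu = sum(u[i] for i in common) / len(common)
    mv = sum(v[i] for i in common) / len(common)
    num = sum((u[i] - mu) * (v[i] - mv) for i in common)
    den = sqrt(sum((u[i] - mu) ** 2 for i in common)) * \
          sqrt(sum((v[i] - mv) ** 2 for i in common))
    return num / den if den else 0.0

def predict_k1(target, profiles, item):
    """k = 1: the prediction is the rating of the single most correlated neighbor."""
    rated = [p for p in profiles if item in p]
    return max(rated, key=lambda p: pearson(target, p))[item]

alice = {"i1": 5, "i2": 2, "i3": 3, "i4": 3}
genuine = [{"i1": 1, "i2": 4, "i3": 2, "i4": 4, "i6": 1}]
attack = [{"i1": 5, "i2": 2, "i3": 3, "i4": 3, "i6": 5}]  # mimics Alice, pushes i6

before = predict_k1(alice, genuine, "i6")           # -> 1
after = predict_k1(alice, genuine + attack, "i6")   # -> 5
```

With larger k the effect is diluted but not eliminated, which is why attack size (number of injected profiles) matters in the experiments below.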
76
A Generic Attack Profile

An attack profile partitions the item set into four parts:
  I_S: k selected items, given ratings σ(i_1^S), ..., σ(i_k^S)
  I_F: l filler items, given ratings δ(i_1^F), ..., δ(i_l^F)
  the remaining items, left unrated (null) in the attack profile
  i_t: the target item, given rating γ(i_t)

Attack models differ based on how ratings are assigned to the filler and selected items.
77
Average and Random Attack Models

Random Attack: filler items are assigned random ratings drawn from the overall distribution of ratings on all items across the whole DB
Average Attack: the rating for each filler item is drawn from a distribution defined by the average rating for that item in the DB
The percentage of filler items determines the amount of knowledge (and effort) required by the attacker

Profile structure: no selected items; l filler items with random/average ratings; remaining items unrated (null); target item i_t rated r_max.
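The two models can be sketched as profile generators. This is an illustrative sketch (not production attack code): a 1-5 rating scale is assumed, and the average attack is simplified to use each item's mean directly rather than a draw from the per-item distribution.

```python
import random

R_MIN, R_MAX = 1, 5  # assumed rating scale

def clamp(r):
    """Round a sampled rating and keep it on the rating scale."""
    return min(R_MAX, max(R_MIN, round(r)))

def random_attack(filler_items, target, global_mean, global_std):
    """Filler ratings drawn from the overall system rating distribution."""
    profile = {i: clamp(random.gauss(global_mean, global_std)) for i in filler_items}
    profile[target] = R_MAX  # push attack: the target gets the maximum rating
    return profile

def average_attack(filler_items, target, item_means):
    """Filler ratings centered on each item's own mean (more knowledge needed)."""
    profile = {i: clamp(item_means[i]) for i in filler_items}
    profile[target] = R_MAX
    return profile

p = average_attack(["i1", "i2"], "i9", {"i1": 3.4, "i2": 4.6})
# p == {"i1": 3, "i2": 5, "i9": 5}
```

The extra input required by `average_attack` (per-item means) is exactly the knowledge gap between the two models.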
78
Bandwagon Attack Model

What if the system's rating distribution is unknown?
Identify products that are frequently rated (e.g., "blockbuster" movies) and associate the pushed product with them
Ratings for the filler items are centered on the overall system average rating (similar to the Random attack)
Frequently rated items can be guessed or obtained externally

Profile structure: k frequently rated items rated r_max; l filler items with random ratings; remaining items unrated (null); target item i_t rated r_max.
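The bandwagon generator only adds a set of popular items to the random attack's profile. Again an illustrative sketch on an assumed 1-5 scale:

```python
import random

R_MIN, R_MAX = 1, 5  # assumed rating scale

def bandwagon_attack(popular_items, filler_items, target, global_mean, global_std):
    # frequently rated "bandwagon" items are given the maximum rating
    profile = {i: R_MAX for i in popular_items}
    # filler ratings centered on the system average, as in the random attack
    for i in filler_items:
        profile[i] = min(R_MAX, max(R_MIN, round(random.gauss(global_mean, global_std))))
    profile[target] = R_MAX  # the pushed item
    return profile

prof = bandwagon_attack(["blockbuster1", "blockbuster2"], ["f1", "f2"],
                        "pushed", 3.6, 1.1)
```

Because the popular items are rated by many genuine users, the attack profile correlates with a large fraction of the user base without any knowledge of the rating database.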
79
Segment Attack Model

Assume the attacker wants to push a product to a target segment of users: those with a preference for similar products
  e.g., fans of Harrison Ford, fans of horror movies
Like bandwagon, but uses semantically similar items; originally designed for attacking item-based CF algorithms
  maximize sim(target item, segment items)
  minimize sim(target item, non-segment items)

Profile structure: k favorite items of the user segment rated r_max; l filler items rated r_min; remaining items unrated (null); target item i_t rated r_max.
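The segment profile is fully deterministic, which makes it the simplest to sketch (assumed 1-5 scale, illustrative names):

```python
R_MIN, R_MAX = 1, 5  # assumed rating scale

def segment_attack(segment_items, filler_items, target):
    """r_max for the segment's favorite items, r_min for filler, r_max for target."""
    profile = {i: R_MAX for i in segment_items}       # maximize sim(target, segment)
    profile.update({i: R_MIN for i in filler_items})  # minimize sim(target, the rest)
    profile[target] = R_MAX
    return profile

prof = segment_attack(["horror1", "horror2"], ["f1", "f2"], "pushed")
# prof == {"horror1": 5, "horror2": 5, "f1": 1, "f2": 1, "pushed": 5}
```

Co-rating the target high with the segment items and low with everything else is what skews item-item similarity toward the segment.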
80
Nuke Attacks: Love/Hate Attack Model

A limited-knowledge attack in its simplest form:
  The target item is given the minimum rating value
  All ratings in the filler item set are given the maximum rating value
  Remaining items are unrated (null)
Note: variations of this (and the other models) can also be used as push or nuke attacks, essentially by switching the roles of r_min and r_max.
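A sketch of the love/hate profile, with the push variant obtained exactly as the note says, by swapping r_min and r_max (assumed 1-5 scale):

```python
R_MIN, R_MAX = 1, 5  # assumed rating scale

def love_hate_attack(filler_items, target, nuke=True):
    """Nuke: target gets r_min, filler gets r_max. Swapping the roles gives a push."""
    target_rating, filler_rating = (R_MIN, R_MAX) if nuke else (R_MAX, R_MIN)
    profile = {i: filler_rating for i in filler_items}
    profile[target] = target_rating
    return profile

prof = love_hate_attack(["f1", "f2"], "victim")
# prof == {"f1": 5, "f2": 5, "victim": 1}
```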
81
How Effective Can Attacks Be?

First, a methodological note:
  Using the MovieLens 100K data set
  50 different "pushed" movies, selected randomly but mirroring the overall distribution
  50 users randomly pre-selected
  Results were averaged over all runs for each movie-user pair
  k = 20 in all experiments
Evaluating results:
  Prediction shift: how much the rating of the pushed movie differs before and after the attack
  Hit ratio: how often the pushed movie appears in a recommendation list before and after the attack
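Both metrics are straightforward to compute; this sketch uses my own variable names and made-up numbers:

```python
def prediction_shift(pre, post):
    """Mean change in the pushed item's predicted rating across test users."""
    return sum(b - a for a, b in zip(pre, post)) / len(pre)

def hit_ratio(rec_lists, pushed_item):
    """Fraction of top-N recommendation lists that contain the pushed item."""
    return sum(1 for lst in rec_lists if pushed_item in lst) / len(rec_lists)

ps = prediction_shift([3.1, 2.8], [4.9, 4.6])       # large shift after a push
hr = hit_ratio([["m1", "m7"], ["m2", "m3"]], "m7")  # 0.5
```

Prediction shift measures the raw bias an attack injects; hit ratio measures whether that bias actually changes what users are shown.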
82
Example Results: Average Attack

The average attack is very effective against the user-based algorithm (the random attack is not as effective)
Item-based CF is more robust (but vulnerable to other attack types such as the "segment attack" [Burke & Mobasher, 2005])

[Chart: prediction shift vs. attack size (0%-15%) under the average attack, for user-based and item-based algorithms.]
83
Example Results: Bandwagon Attack

Only a small profile is needed (3%-7%)
Only a few (< 10) popular movies are needed
As effective as the more data-intensive average attack (but still not effective against item-based algorithms)

[Charts: prediction shift vs. attack size (0%-15%) for Average (10% filler) and Bandwagon (6% filler) attacks; hit ratio vs. number of recommendations (0-60) at 10% attack size for the average attack, bandwagon attack, and baseline.]
84
Results: Impact of Profile Size

Only a small number of filler items need to be assigned ratings; an attacker therefore only needs to use part of the product space to make the attack effective.
In the item-based algorithm we don't see the same drop-off, but prediction shift shows logarithmic behavior, reaching near maximum at about 7% filler size.
85
Example Results: Segmented Attack Against Item-Based CF

Very effective against the targeted group
Best against item-based, but also effective against user-based
Requires low knowledge

[Charts: hit ratio vs. number of recommendations (0-50) for a 1% attack against the horror movie segment (in-segment, all-user, pre-attack); prediction shift vs. attack size (0%-15%) for in-segment and all-user groups.]
86
Possible Solutions

Explicit trust calculation?
  select peers through a network of trust relationships
  law of large numbers: hard to achieve the numbers needed for CF to work well
Hybrid recommendation
  some indications that certain hybrids may be more robust
Model-based recommenders
  certain recommenders using clustering are more robust, but generally at the cost of lower accuracy
  a probabilistic approach, however, has been shown to be relatively accurate
Detection and response
87
Results: Semantically Enhanced Hybrid

Alpha 0.0 = 100% semantic item-based similarity; Alpha 1.0 = 100% collaborative item-based similarity
Semantic features extracted for movies: top actors, director, genre, synopsis (top keywords), etc.

[Charts: hit ratio vs. number of recommendations (0-50) for the hybrid vs. pure item-based algorithm under a 10% horror-segment attack at alpha = 0.4; MAE vs. alpha (0-1), showing the impact of the semantic/collaborative combination on prediction accuracy.]
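The alpha parameter suggests a simple linear combination of the two item-item similarity measures. A minimal sketch (the function name and the interpolation form are my assumption of how the combination works):

```python
def hybrid_similarity(semantic_sim, collaborative_sim, alpha):
    """alpha = 0.0 -> purely semantic; alpha = 1.0 -> purely collaborative."""
    return alpha * collaborative_sim + (1 - alpha) * semantic_sim

# An attacked item may look similar collaboratively (injected co-ratings)
# but not semantically, so the semantic component dampens the attack.
s = hybrid_similarity(semantic_sim=0.9, collaborative_sim=0.2, alpha=0.4)
```

Attack profiles can inflate collaborative similarity but cannot alter an item's semantic features, which is the intuition behind the hybrid's added robustness.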
88
Approaches to Detection & Response

Profile Classification
  a classification model to identify attack profiles and exclude them when computing predictions
  uses the characteristic features of the most successful attack models
  designed to increase the cost of attacks by detecting the most effective ones
  but what if the attack does not closely correspond to a known attack signature?
Anomaly Detection
  classify items (as being possibly under attack)
  not dependent on known attack models
  can shed some light on which types of items are most vulnerable to which types of attacks
In practice: a comprehensive framework combining both approaches is needed
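One widely used profile-classification feature, which I am assuming fits the detection approach described here, is RDMA (Rating Deviation from Mean Agreement): attack profiles tend to deviate from item means across many items at once, especially on sparsely rated items.

```python
def rdma(profile, item_means, item_counts):
    """Rating Deviation from Mean Agreement for one profile:
    mean absolute deviation from each item's average rating,
    inverse-weighted by the item's number of ratings."""
    total = sum(abs(r - item_means[i]) / item_counts[i] for i, r in profile.items())
    return total / len(profile)

# Hypothetical profile deviating strongly from both item means:
score = rdma({"i1": 5, "i2": 1}, {"i1": 3.0, "i2": 3.0}, {"i1": 4, "i2": 2})
# (|5-3|/4 + |1-3|/2) / 2 = 0.75
```

Profiles with unusually high RDMA (relative to the genuine population) are candidates for exclusion from prediction computation.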
89
Conclusions

Why recommender systems?
  Many algorithmic advances: more accurate and reliable systems, and more confidence by users
  Assist users in finding more relevant information, items, and products; give users alternatives, broadening their knowledge; help build communities
  Help companies better engage users and customers (building loyalty) and increase sales (on average 5-10%)
Problems and challenges
  More complex Web-based applications mean more complex user interactions, which need more sophisticated models
  Need to further explore the impact of recommendations on (a) user behavior and (b) the evolution of Web communities
  Privacy, security, trust