
Hybrid Recommendation Technologies

Francesco Ricci
eCommerce and Tourism Research Laboratory
ITC-irst, Trento
http://ectrl.itc.it

2

Content

Recommender systems

Collaborative-based filtering (CF)

Limitations of CF

Motivations for the proposed research

Case-Based Reasoning and Interactive Query Management:

– Case/Session Model

– Similarity for tree-based models

– Query relaxation

Empirical evaluation

Discussion

3

Recommender Systems

A recommender system helps users make choices when they lack sufficient personal experience of the alternatives

– To suggest products to customers - PUSH

– To provide consumers with information that helps them decide which products to purchase - PULL

Some examples found on the Web:

1. Amazon.com – looks at the user's past buying history and recommends products bought by users with similar buying behavior

2. Tripadvisor.com - quotes product reviews from a community of users

3. Activebuyersguide.com – asks questions about the desired benefits to reduce the number of candidate products

They are based on a number of technologies: information filtering, machine learning, adaptive and personalized systems, user modeling, …

4

Recommendation “Core” Techniques

[Burke, 2002]

U is a set of users

I is a set of items/products

5

Nearest Neighbor Collaborative-Based Filtering

[Figure: binary user-item rating matrix (0 = Dislike, 1 = Like, ? = Unknown). The user model is the interaction history. The current user's ratings of the 1st through 14th items are 1?011011011110. The Hamming distances between the current user and the other users are 5, 6, 6, 5, 4, 8.]
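A minimal Python sketch of the nearest-neighbor idea on this slide. The slide does not say how unknown ("?") ratings enter the Hamming distance, so this sketch assumes those positions are skipped. The two other users' rows and the helper names `parse`, `hamming` and `predict` are hypothetical.

```python
from typing import List, Optional, Tuple

Rating = Optional[int]  # 1 = like, 0 = dislike, None = unknown ("?")


def parse(row: str) -> List[Rating]:
    """Turn a row such as '1?0110' into a list of ratings."""
    return [None if ch == "?" else int(ch) for ch in row]


def hamming(a: List[Rating], b: List[Rating]) -> int:
    """Count items rated by both users on which they disagree.

    Items unknown to either user are skipped (an assumption)."""
    return sum(1 for x, y in zip(a, b)
               if x is not None and y is not None and x != y)


def predict(current: List[Rating], others: List[List[Rating]],
            item: int) -> Tuple[int, Rating]:
    """Copy the rating of `item` from the nearest user who rated it.

    Returns (index of that user, predicted rating); raises ValueError
    if nobody rated the item."""
    raters = [k for k, u in enumerate(others) if u[item] is not None]
    nearest = min(raters, key=lambda k: hamming(current, others[k]))
    return nearest, others[nearest][item]


current = parse("1?011011011110")         # the current user's 14 ratings
others = [parse("00011011011110"),        # hypothetical user 0
          parse("11100100100001")]        # hypothetical user 1
print([hamming(current, u) for u in others])  # [1, 12]
print(predict(current, others, item=1))       # (0, 0)
```

The user closest in Hamming distance decides the prediction for the unknown 2nd item. With several equally close users, `min` keeps the first one, which is an arbitrary tie-break.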

6

Collaborative-Based Filtering

A collection of users u_i, i = 1, …, n and a collection of products p_j, j = 1, …, m

An n × m matrix of ratings v_ij, with v_ij = ? if user i did not rate product j

Prediction is computed as

$$ v^*_{ij} = \bar{v}_i + K \sum_{k:\, v_{kj} \neq ?} u_{ik}\,(v_{kj} - \bar{v}_k) $$

where v̄_i is the average rating of user i and K is a normalization factor such that the weights K·u_ik sum to 1. The similarity of users i and k is

$$ u_{ik} = \frac{\sum_j (v_{ij} - \bar{v}_i)(v_{kj} - \bar{v}_k)}{\sqrt{\sum_j (v_{ij} - \bar{v}_i)^2 \, \sum_j (v_{kj} - \bar{v}_k)^2}} $$

where the sums are over the j such that v_ij and v_kj are not “?”.
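A minimal Python sketch of the prediction and similarity formulas, assuming `None` stands for "?". One deliberate deviation is labeled in the code: K is taken as 1 / Σ|u_ik| rather than 1 / Σu_ik. With correlations of mixed sign, the plain sum can be zero, as it is in this toy matrix. The function names and ratings are hypothetical.

```python
from math import sqrt
from typing import List, Optional

Matrix = List[List[Optional[float]]]  # None plays the role of "?"


def mean(row: List[Optional[float]]) -> float:
    """Average rating of a user over the items they rated."""
    rated = [v for v in row if v is not None]
    return sum(rated) / len(rated)


def similarity(V: Matrix, i: int, k: int) -> float:
    """u_ik: correlation of users i and k over items both have rated."""
    common = [j for j in range(len(V[i]))
              if V[i][j] is not None and V[k][j] is not None]
    mi, mk = mean(V[i]), mean(V[k])
    num = sum((V[i][j] - mi) * (V[k][j] - mk) for j in common)
    den = sqrt(sum((V[i][j] - mi) ** 2 for j in common)
               * sum((V[k][j] - mk) ** 2 for j in common))
    return num / den if den > 0 else 0.0


def predict(V: Matrix, i: int, j: int) -> float:
    """v*_ij = mean_i + K * sum_k u_ik (v_kj - mean_k) over users who rated j."""
    raters = [k for k in range(len(V)) if k != i and V[k][j] is not None]
    u = {k: similarity(V, i, k) for k in raters}
    norm = sum(abs(w) for w in u.values())  # assumption: K = 1 / sum |u_ik|
    if norm == 0:
        return mean(V[i])  # no informative neighbours: fall back to the user's mean
    return mean(V[i]) + sum(u[k] * (V[k][j] - mean(V[k])) for k in raters) / norm


V = [[4, 2, None, None],   # target user 0: item 2 unknown
     [6, 2, 6, 2],         # user 1
     [2, 4, 2, 4]]         # user 2
print(similarity(V, 0, 1), similarity(V, 0, 2))  # 1.0 -1.0
print(predict(V, 0, 2))                          # 4.5
```

User 1 agrees with the target and rates item 2 above their own mean. User 2 disagrees and rates it below theirs. Both therefore push the prediction above the target user's mean of 3.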

7

Collaborative-Based Filtering

Pros: requires minimal knowledge-engineering effort (knowledge-poor)

– Users and products are symbols without any internal structure or characteristics

Cons:

– Requires a large number of explicit ratings to bootstrap

– Requires products to be standardized (users should have bought exactly the same product)

– Assumes that prior behavior determines current behavior without taking into account “contextual” knowledge (session-level)

– Does not provide information about products or explanations for the recommendations

– Does not support sequential decision making or recommendation of “good bundling”, e.g., a travel package.

8

Requirements and Issues

Recommendation Process

– Recommendation requires information search – not only filtering

– Human/computer dialogues should be supported – e.g. the user criticizes a suggested product or refines a query definition

Input/Output

– Products and services may have complex structures

– The final recommendation is a bundle of elementary components

– Allow system bootstrapping without an initial memory of rating interactions

– Generalize the definition of ratings (implicit ratings)

Users

– Both short-term (goal-oriented) and long-term (stable) preferences must influence the recommendation

– Unregistered users should be allowed to get recommendations

– Account for user variability in preferred decision style

– The structure and language of users' needs and wants may not match those of the products.

9

Hybrid Case-Based/Collaborative Ranking

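[Figure: hybrid ranking pipeline. Input: the user's query Q and the current case (a tree with nodes twc, tb, u, r).
1. Search the catalogue of travel components with Q, retrieving candidate locations (loc1, loc2, loc3).
2. Search the case base for cases similar to the current case.
3. Output the reference set of similar cases.
4. Sort the locations loc_i by their similarity to the locations in the reference cases.
Output: the ranked items, or suggested changes to Q.]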
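The four steps in the figure can be sketched as one Python function. This is a toy under stated assumptions, not the system's code. The function and parameter names are hypothetical, and similarities are passed in as plain functions. The reference set is taken to be the k most similar cases. Step 4 uses the max-product score from the scoring example, Score(Di) = max_j {Sim(CC, Cj) · Sim(Di, CDj)}.

```python
from typing import Callable, Dict, List, Optional, Tuple

Case = Tuple[str, List[str]]  # (case id, items contained in the case)


def hybrid_rank(catalogue: List[str],
                matches_query: Callable[[str], bool],
                case_base: List[Case],
                case_sim: Callable[[str], float],
                item_sim: Callable[[str, str], float],
                k: int = 2) -> Optional[List[str]]:
    """Rank catalogue items that satisfy the query, or return None when
    the query fails (the point where query changes should be suggested)."""
    # 1. Search the catalogue with the user's query Q.
    candidates = [x for x in catalogue if matches_query(x)]
    if not candidates:
        return None
    # 2-3. Retrieve the k cases most similar to the current case (reference set).
    reference = sorted(case_base, key=lambda c: case_sim(c[0]), reverse=True)[:k]

    # 4. Score each candidate by its best match with an item in a reference case.
    def score(x: str) -> float:
        return max((case_sim(cid) * item_sim(x, y)
                    for cid, items in reference for y in items), default=0.0)

    return sorted(candidates, key=score, reverse=True)


sims: Dict[str, float] = {"c1": 0.2, "c2": 0.6, "c3": 0.1}  # Sim(current case, ci)
ranked = hybrid_rank(
    catalogue=["canazei", "moena", "rome"],
    matches_query=lambda x: x != "rome",  # stands in for a mountain-only query
    case_base=[("c1", ["moena"]), ("c2", ["canazei"]), ("c3", ["rome"])],
    case_sim=lambda cid: sims[cid],
    item_sim=lambda x, y: 1.0 if x == y else 0.5,
)
print(ranked)  # ['canazei', 'moena']
```

The catalogue query decides *which* items are shown. The case base only decides their *order*, so a well-liked item that violates Q never appears.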

10

Case/Session Model

11

A case is a rooted tree in which each node has:

– node-type: similarity between two nodes in two cases is defined only for nodes with the same node-type

– metric-type: the structure of the node content, which determines how the node's similarity to a node in a second case is measured

Tree-based Case Representation

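[Figure: case c1 (nt: case, mt: vector) with children clf1, cnq1 and cart1 (nt: cart, mt: vector). cart1 has children dests1 (nt: destinations, mt: set), accs1 and acts1. dests1 contains dest1 and dest2 (nt: destination, mt: vector). Each destination is an ITEM with features X1 (nt: location, mt: hierarchical), X2, X3 and X4.]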

12

Item Representation

Node Type | Metric Type | Example: Canazei

X1 LOCATION | Set of hierarchically related symbols | Country=ITALY, Region=TRENTINO, TouristArea=FASSA, Village=CANAZEI

X2 INTERESTS | Array of Booleans | Hiking=1, Trekking=1, Biking=1

X3 ALTITUDE | Numeric | 1400

X4 LOCTYPE | Array of Booleans | Urban=0, Mountain=1, Riverside=0

TRAVELDESTINATION=(X1,X2,X3,X4)

dest1

X1 = (Italy, Trentino, Fassa, Canazei)

X2 = (1,1,1)

X3 = 1400

X4 = (0, 1, 0)

13

Item Query Language

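For querying purposes, items x are represented as simple feature vectors x = (x_1, …, x_n)

A query is a conjunction of constraints over features, q = c_1 ∧ c_2 ∧ … ∧ c_m, where m ≤ n and

$$ c_k = \begin{cases} l_k \le x_{i_k} \le u_k & \text{if } x_{i_k} \text{ is numerical} \\ x_{i_k} = v_k & \text{if } x_{i_k} \text{ is nominal} \\ x_{i_k} = \text{true} & \text{if } x_{i_k} \text{ is boolean} \end{cases} $$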

dest1

X1 = (Italy, Trentino, Fassa, Canazei)

X2 = (1,1,1)

X3 = 1400

X4 = (0, 1, 0)

(Italy, Trentino, Fassa, Canazei, 1, 1, 1, 1400, 0, 1, 0)
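A small Python sketch of the flattening and of the three constraint kinds. The class names (`TravelDestination`, `Between`, `Equals`, `IsTrue`) and the feature indices are illustrative assumptions, not the system's API.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class TravelDestination:
    location: Tuple[str, str, str, str]  # X1: country, region, tourist area, village
    interests: Tuple[int, int, int]      # X2: hiking, trekking, biking
    altitude: float                      # X3
    loc_type: Tuple[int, int, int]       # X4: urban, mountain, riverside

    def to_vector(self) -> list:
        """Flatten the item into x = (x1, ..., xn) for querying."""
        return [*self.location, *self.interests, self.altitude, *self.loc_type]


@dataclass
class Between:   # numerical: l_k <= x_i <= u_k
    index: int
    low: float
    high: float

    def holds(self, x: list) -> bool:
        return self.low <= x[self.index] <= self.high


@dataclass
class Equals:    # nominal: x_i = v_k
    index: int
    value: str

    def holds(self, x: list) -> bool:
        return x[self.index] == self.value


@dataclass
class IsTrue:    # boolean: x_i = true
    index: int

    def holds(self, x: list) -> bool:
        return bool(x[self.index])


Constraint = Union[Between, Equals, IsTrue]


def satisfies(x: list, query: List[Constraint]) -> bool:
    """A query is a conjunction: every constraint must hold."""
    return all(c.holds(x) for c in query)


dest1 = TravelDestination(("Italy", "Trentino", "Fassa", "Canazei"), (1, 1, 1), 1400, (0, 1, 0))
x = dest1.to_vector()
print(x)  # ['Italy', 'Trentino', 'Fassa', 'Canazei', 1, 1, 1, 1400, 0, 1, 0]
q = [Equals(1, "Trentino"), Between(7, 1000, 2000), IsTrue(9)]  # mountain resort in Trentino
print(satisfies(x, q))  # True
```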

14

Query Relaxation

The goal of the relaxation process is to solve the empty-result-set problem by finding “maximal” succeeding sub-queries

For Boolean queries, one succeeding sub-query can be found in quadratic time, whereas finding all of them requires exponential time

Our approach:

– Look for all relaxed sub-queries that change a single constraint and produce some results

– Present all these relaxed queries to the user without sorting them

– Two or more constraints are relaxed together only if they belong to the same abstraction hierarchy (i.e., they refer to the same concept from the user's point of view)
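A minimal Python sketch of this strategy. As a simplifying assumption, "changing" a constraint is modeled as dropping it. The `hierarchy` tags, constraint names and data are hypothetical. The number of query evaluations is linear in the number of constraints plus hierarchies, rather than exponential.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Item = Dict[str, object]


@dataclass(frozen=True)
class Constraint:
    name: str
    test: Callable[[Item], bool]
    hierarchy: Optional[str] = None  # abstraction hierarchy the feature belongs to


def run(query: List[Constraint], items: List[Item]) -> List[Item]:
    """Items satisfying every constraint of the (conjunctive) query."""
    return [x for x in items if all(c.test(x) for c in query)]


def relax(query: List[Constraint], items: List[Item]) -> List[Tuple[List[str], List[Item]]]:
    """Return every succeeding sub-query that removes one constraint, or
    removes all constraints of one abstraction hierarchy together."""
    removals: List[List[Constraint]] = [[c] for c in query]
    groups: Dict[str, List[Constraint]] = {}
    for c in query:
        if c.hierarchy is not None:
            groups.setdefault(c.hierarchy, []).append(c)
    removals += [g for g in groups.values() if len(g) > 1]

    suggestions = []
    for removed in removals:
        results = run([c for c in query if c not in removed], items)
        if results:  # keep only relaxations that produce some results
            suggestions.append(([c.name for c in removed], results))
    return suggestions  # presented to the user unsorted


items = [{"village": "Canazei", "region": "Trentino", "altitude": 1400},
         {"village": "Cortina", "region": "Veneto", "altitude": 1200}]
query = [Constraint("region", lambda x: x["region"] == "Trentino", "location"),
         Constraint("village", lambda x: x["village"] == "Cortina", "location"),
         Constraint("altitude", lambda x: x["altitude"] >= 1300)]
print(run(query, items))  # []
for removed, results in relax(query, items):
    print(removed, [x["village"] for x in results])
# ['village'] ['Canazei']
# ['region', 'village'] ['Canazei']
```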

15

Relaxation Applicability: Accommodation Search

[Chart: all queries, failing queries, and failing queries repaired by the algorithm in accommodation search.]

16

Example: Scoring Two Destinations

Destinations matching the user's query: D1 and D2, each a candidate for the current case CC

Similar cases in the case base: C1, containing destination CD1, and C2, containing destination CD2

Sim(CC, C1) = 0.2    Sim(CC, C2) = 0.6

Sim(D1, CD1) = 0.4    Sim(D1, CD2) = 0.7

Sim(D2, CD1) = 0.5    Sim(D2, CD2) = 0.3

Score(Di) = max_j {Sim(CC, Cj) · Sim(Di, CDj)}

Score(D1) = max{0.2 · 0.4, 0.6 · 0.7} = 0.42

Score(D2) = max{0.2 · 0.5, 0.6 · 0.3} = 0.18

17

Scoring

A collection of case sessions s_i, i = 1, …, n and a collection of items/products p_j, j = 1, …, m

An n × n session similarity matrix S = {s_ij} and an m × m item/product similarity matrix P = {p_ij}

An n × m session vs. product incidence matrix A = {a_ij}, where a_ij = 1 (a_ij = 0) if session i does (does not) include product j

$$ \mathrm{Score}(s_i, p_j) = \max_{k,l} \{ s_{ik}\, a_{kl}\, p_{lj} \} $$

A product pj gets a high score if it is very similar (plj close to 1) to a product pl that is contained in a session sk (akl = 1) that is very similar to the target session (sik close to 1)

A particular product can be scored (high) even if it is not already present in other sessions/cases, provided that it is similar to other products contained in other similar sessions.
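The matrix formulation can be sketched in a few lines of Python. On the numbers of the two-destination example, it reproduces Score(D1) = 0.42 and Score(D2) = 0.18. The session/product indexing is an assumption of this sketch. So are the entries of P that the example does not give (CD1 vs CD2, D1 vs D2), which are set to 0.

```python
from typing import List


def score(s_row: List[float], A: List[List[int]], P: List[List[float]], j: int) -> float:
    """Score(s_i, p_j) = max over k, l of s_ik * a_kl * p_lj.

    s_row is row i of S, A is the n x m session/product incidence matrix
    and P is the m x m product similarity matrix."""
    return max((s_row[k] * A[k][l] * P[l][j]
                for k in range(len(A)) for l in range(len(P))), default=0.0)


# Sessions: 0 = current case CC, 1 = C1, 2 = C2.
# Products: 0 = CD1, 1 = CD2, 2 = D1, 3 = D2.
s_cc = [1.0, 0.2, 0.6]           # Sim(CC, CC), Sim(CC, C1), Sim(CC, C2)
A = [[0, 0, 0, 0],               # CC contains no products yet
     [1, 0, 0, 0],               # C1 contains CD1
     [0, 1, 0, 0]]               # C2 contains CD2
P = [[1.0, 0.0, 0.4, 0.5],       # CD1 vs CD1, CD2, D1, D2 (0.0 is a placeholder)
     [0.0, 1.0, 0.7, 0.3],       # CD2
     [0.4, 0.7, 1.0, 0.0],       # D1
     [0.5, 0.3, 0.0, 1.0]]       # D2
print(round(score(s_cc, A, P, 2), 2), round(score(s_cc, A, P, 3), 2))  # 0.42 0.18
```

Because A is 0/1, only pairs (k, l) where session k actually contains product l contribute. A sparse implementation would iterate over those pairs only, instead of all n·m of them.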

18

Item Similarity

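If X and Y are two items with the same node-type:

$$ d(X, Y) = \left(\frac{1}{\sum_{i=1}^{n} w_i}\right)^{1/2} \left[\sum_{i=1}^{n} w_i\, d_i(X_i, Y_i)^2\right]^{1/2}, \qquad 0 \le w_i \le 1 $$

$$ d_i(X_i, Y_i) = \begin{cases} 1 & \text{if } X_i \text{ or } Y_i \text{ is unknown} \\ \mathrm{overlap}(X_i, Y_i) & \text{if } X_i \text{ is symbolic} \\ |X_i - Y_i| / \mathrm{range}_i & \text{if } X_i \text{ is a finite integer or real} \\ \mathrm{Jaccard}(X_i, Y_i) & \text{if } X_i \text{ is an array of Booleans} \\ \mathrm{Hierarchical}(X_i, Y_i) & \text{if } X_i \text{ is a hierarchy} \\ \mathrm{Modulo}(X_i, Y_i) & \text{if } X_i \text{ is a circular feature (month)} \\ \mathrm{Date}(X_i, Y_i) & \text{if } X_i \text{ is a date} \end{cases} $$

$$ \mathrm{Sim}(X, Y) = 1 - d(X, Y) $$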


19

Item Similarity Example

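dest1: X1 = (I, TN, Fassa, Canazei), X2 = (1, 1, 1), X3 = 1400, X4 = (0, 1, 0)

dest2: Y1 = (I, TN, Fassa, ?), Y2 = (1, 0, 1), Y3 = 1200, Y4 = (1, 1, 0)

With all weights w_i = 1:

$$ \mathrm{Sim}(dest_1, dest_2) = 1 - \sqrt{\tfrac{1}{4}\sum_{i=1}^{4} d(X_i, Y_i)^2} = 1 - \sqrt{\tfrac{1}{4}\left[\left(\tfrac{1}{4}\right)^2 + \left(\tfrac{1}{3}\right)^2 + \left(\tfrac{1400-1200}{2000}\right)^2 + \left(\tfrac{1}{2}\right)^2\right]} = 1 - \sqrt{\tfrac{0.4336}{4}} \approx 1 - 0.3292 = 0.6708 $$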
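A Python sketch of the heterogeneous distance applied to this example. The slides do not define Hierarchical(). Here it is assumed to be the fraction of levels outside the longest common prefix, which gives the 1/4 above. Jaccard is used as a distance, 1 − |X∩Y|/|X∪Y|, which gives 1/3 and 1/2. Function names are hypothetical.

```python
from math import sqrt
from typing import Optional, Sequence


def hierarchical(x: Sequence[Optional[str]], y: Sequence[Optional[str]]) -> float:
    """Assumed hierarchy distance: share of levels outside the common prefix."""
    common = 0
    for a, b in zip(x, y):
        if a is None or b is None or a != b:
            break
        common += 1
    return 1 - common / len(x)


def jaccard_distance(x: Sequence[int], y: Sequence[int]) -> float:
    """1 - |X and Y| / |X or Y| for Boolean arrays."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return 0.0 if either == 0 else 1 - both / either


def numeric(x: float, y: float, value_range: float) -> float:
    return abs(x - y) / value_range


def similarity(local: Sequence[Optional[float]], weights: Sequence[float]) -> float:
    """Sim = 1 - sqrt(sum w_i d_i^2 / sum w_i); a None d_i (unknown value) counts as 1."""
    ds = [1.0 if d is None else d for d in local]
    return 1 - sqrt(sum(w * d * d for w, d in zip(weights, ds)) / sum(weights))


local = [hierarchical(("I", "TN", "Fassa", "Canazei"), ("I", "TN", "Fassa", None)),  # 1/4
         jaccard_distance((1, 1, 1), (1, 0, 1)),                                     # 1/3
         numeric(1400, 1200, 2000),                                                  # 0.1
         jaccard_distance((0, 1, 0), (1, 1, 0))]                                     # 1/2
print(round(similarity(local, [1, 1, 1, 1]), 4))  # 0.6708
```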

20

Case Distance

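[Figure: two case trees. Case c1 (nt: case, mt: vector) has children clf1, cnq1 and cart1 (nt: cart, mt: vector). cart1 has dests1 (nt: destinations, mt: set), accs1 and acts1, and dests1 contains dest1 and dest2 (nt: destination, mt: vector) with features X1 (nt: location, mt: hierarchical) to X4. Case c2 has children clf2, cnq2 and cart2. cart2 has dests2, accs2 and acts2, and dests2 contains dest3, dest4 and dest5 with features Y1 to Y4.]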

21

Case Distance

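$$ d(c_1, c_2) = \frac{1}{\sqrt{\sum_{i=1}^{3} W_i}} \sqrt{W_1\, d(cart_1, cart_2)^2 + W_2\, d(clf_1, clf_2)^2 + W_3\, d(cnq_1, cnq_2)^2} $$

[Figure: the top levels of case c1 (nt: case, mt: vector; children clf1, cnq1, cart1) and case c2 (children clf2, cnq2, cart2).]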

22

[Figure: the trees of c1 and c2 with their cart nodes (nt: cart, mt: vector) expanded: cart1 into dests1, accs1, acts1 and cart2 into dests2, accs2, acts2. The figure is annotated with the distance terms d(cart1, cart2), d(dests1, dests2), d(clf1, clf2) and d(cnq1, cnq2).]

23

[Figure: in case c1, cart1 has dests1 = {dest1, dest2} (nt: destinations, mt: set), accs1 and acts1. In case c2, cart2 has dests2 = {dest3, dest4, dest5}, accs2 and acts2.]

$$ d(dests_1, dests_2) = \frac{1}{2 \cdot 3}\big[d(dest_1, dest_3) + d(dest_1, dest_4) + d(dest_1, dest_5) + d(dest_2, dest_3) + d(dest_2, dest_4) + d(dest_2, dest_5)\big] $$
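A recursive Python sketch of the tree distance. Vector nodes combine their children with the weighted Euclidean form used for d(c1, c2). Set nodes average all pairwise distances, as in d(dests1, dests2). The `Node` class, the single numeric leaf type and the shared `value_range` are simplifying assumptions. Real item leaves would use the local distances from the item-similarity slide.

```python
from dataclasses import dataclass, field
from math import sqrt
from typing import List, Optional


@dataclass
class Node:
    nt: str                                  # node-type
    mt: str                                  # metric-type: "vector", "set" or "numeric"
    children: List["Node"] = field(default_factory=list)
    value: Optional[float] = None            # numeric leaves only
    weights: Optional[List[float]] = None    # vector nodes: one weight per child


def distance(a: Node, b: Node, value_range: float = 1.0) -> float:
    """Recursive case distance driven by node-type and metric-type."""
    if a.nt != b.nt:
        raise ValueError("distance is defined only for nodes with the same node-type")
    if a.mt == "numeric":
        return abs(a.value - b.value) / value_range
    if a.mt == "set":  # average over all pairs of elements
        pairs = [distance(x, y, value_range) for x in a.children for y in b.children]
        if not pairs:  # assumption: two empty sets are equal, empty vs non-empty is maximal
            return 0.0 if not a.children and not b.children else 1.0
        return sum(pairs) / len(pairs)
    if a.mt == "vector":  # weighted Euclidean over corresponding children
        w = a.weights or [1.0] * len(a.children)
        total = sum(wi * distance(x, y, value_range) ** 2
                    for wi, x, y in zip(w, a.children, b.children))
        return sqrt(total / sum(w))
    raise ValueError(f"unknown metric-type {a.mt!r}")


def dest(altitude: float) -> Node:
    """Toy destination with a single numeric feature (hypothetical)."""
    return Node("destination", "vector", [Node("altitude", "numeric", value=altitude)])


def case(*altitudes: float) -> Node:
    """Toy case: case -> cart -> destinations set (other branches omitted)."""
    dests = Node("destinations", "set", [dest(a) for a in altitudes])
    return Node("case", "vector", [Node("cart", "vector", [dests])])


c1 = case(1400, 1000)              # dests1 = {dest1, dest2}
c2 = case(1400, 1200, 200)         # dests2 = {dest3, dest4, dest5}
print(round(distance(c1, c2, value_range=2000), 4))  # 0.2333
```

One property of the set metric is worth knowing: averaging all pairs means a set is not at distance 0 from itself unless all its elements are identical. Here `distance(c1, c1, 2000)` is 0.1.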

24

Empirical Evaluation

Measure | NutKing- | NutKing+

Queries issued by the user | 20.1 ± 19.1 | 13.4 ± 9.3

General travel wishes provided | 12.3 ± 1.6 | 11.5 ± 2.0

Constraints per query | 4.7 ± 1.2 | 4.4 ± 1.0

Results per query | 42.0 ± 61.2 | 9.8 ± 14.3

Pages displayed | 93.4 ± 44.3 | 71.3 ± 35.4

Items in the final travel plan | 5.8 ± 3.9 | 4.1 ± 3.4

Session duration | 28.5 ± 9.5 | 27.3 ± 13.0

Calls to query relaxation | n.a. | 6.3 ± 3.6

User accepted relax suggestion | n.a. | 2.8 ± 2.1

Calls to query tightening | n.a. | 2.1 ± 2.5

User accepted tightening suggestion | n.a. | 0.6 ± 0.9

Position of the selected item in the result list | 3.2 ± 4.8 | 2.2 ± 2.9

Bold face means significantly different (t-test, p<0.05)

25

Research Areas Intersecting with RS

User Modeling: product ratings; user-dependent product classifier; product preferences; etc.

Information Retrieval: an RS may be evaluated in terms of precision and recall; an RS retrieves content that is relevant to the user's information needs

Personalization and Adaptive Hypermedia: recommendations are one-to-one and presentation is adapted to the user (e.g. explanations)

Mixed Initiative and Conversational Systems: system suggestions or questions interleave with user input and information browsing

Decision Making: the ultimate goal is to support a purchase decision; utility-based ranking has been exploited.

26

Contribution

User Model: a collection of cases (recommendation sessions of the user); attribute weights

Information Retrieval: interactive query management; query relaxation; query tightening

Personalization: query refinement suggestions; product ranking; explanations

Conversational: the user initiates the interaction; the system suggests ways to escape from interaction dead-ends

Decision Making: model derived from literature on consumer behavior.

27

Thank you!
