Hybrid Recommendation Technologies
Francesco Ricci
eCommerce and Tourism Research Laboratory
ITC-irst, Trento – Italy
[email protected]
http://ectrl.itc.it
Content
Recommender systems
Collaborative-based filtering (CF)
Limitations of CF
Motivations for the proposed research
Case Based Reasoning and Interactive Query Management:
– Case/Session Model
– Similarity for tree-based models
– Query relaxation
Empirical evaluation
Discussion
Recommender Systems
A recommender system helps users make choices without sufficient personal experience of the alternatives
– To suggest products to customers - PUSH
– To provide consumers with information that helps them decide which products to purchase - PULL
Some examples found on the Web:
1. Amazon.com – looks at the user's past buying history and recommends products bought by users with similar buying behavior
2. Tripadvisor.com – quotes product reviews from a community of users
3. Activebuyersguide.com – asks questions about sought benefits to reduce the number of candidate products
They are based on a number of technologies: information filtering, machine learning, adaptive and personalized systems, user modeling, …
Nearest Neighbor Collaborative-Based Filtering

[Figure: a users × items matrix of ratings (1 = like, 0 = dislike, ? = unknown); the user model is the interaction history. The current user's rating vector over items 1–14 (1?011011011110) is compared with every other user's vector via the Hamming distance (here 5, 6, 6, 5, 4, 8); the nearest neighbors supply the predictions.]
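The nearest-neighbor comparison in the figure can be sketched as follows; this is a minimal sketch, and treating unknown ('?') positions as "skip" is my assumption.

```python
# Sketch of the nearest-neighbor step: compare two like/dislike vectors
# with the Hamming distance, skipping positions either user left unrated.
def hamming(u: str, v: str) -> int:
    """Count positions where the two rating vectors disagree,
    ignoring positions marked '?' in either vector."""
    return sum(1 for a, b in zip(u, v) if '?' not in (a, b) and a != b)

current = "1?011011011110"   # the current user's vector from the figure
other   = "10111001011010"  # a made-up neighbor vector for illustration
print(hamming(current, other))  # → 3
```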
Collaborative-Based Filtering

A collection of users u_i, i = 1, …, n and a collection of products p_j, j = 1, …, m

An n × m matrix of ratings v_ij, with v_ij = ? if user i did not rate product j

The prediction for user i on product j is computed as

v*_ij = v̄_i + K Σ_k u_ik (v_kj − v̄_k)

where v̄_i is the average rating of user i, K is a normalization factor such that the sum of |u_ik| is 1, and

u_ik = Σ_j (v_ij − v̄_i)(v_kj − v̄_k) / sqrt( Σ_j (v_ij − v̄_i)² · Σ_j (v_kj − v̄_k)² )

where the sums run over the j such that v_ij and v_kj are not "?". u_ik is the similarity of users i and k.
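The prediction formula above can be sketched in a few lines; this is a minimal sketch, with None standing for the unknown rating "?" and the toy rating matrix being made-up illustrative data.

```python
import math

# Sketch of the CF prediction: v*_ij = v̄_i + K * sum_k u_ik (v_kj - v̄_k),
# with u_ik the Pearson correlation over co-rated products.
def pearson(v, i, k):
    """Similarity u_ik of users i and k over co-rated products."""
    common = [j for j in range(len(v[i]))
              if v[i][j] is not None and v[k][j] is not None]
    if not common:
        return 0.0
    vi = sum(v[i][j] for j in common) / len(common)
    vk = sum(v[k][j] for j in common) / len(common)
    num = sum((v[i][j] - vi) * (v[k][j] - vk) for j in common)
    den = math.sqrt(sum((v[i][j] - vi) ** 2 for j in common)
                    * sum((v[k][j] - vk) ** 2 for j in common))
    return num / den if den else 0.0

def predict(v, i, j):
    """Predict user i's rating of product j from the neighbors' ratings."""
    vbar = [sum(r for r in row if r is not None)
            / sum(1 for r in row if r is not None) for row in v]
    neighbours = [k for k in range(len(v)) if k != i and v[k][j] is not None]
    weights = {k: pearson(v, i, k) for k in neighbours}
    K = sum(abs(w) for w in weights.values())  # normalization factor
    if K == 0:
        return vbar[i]
    return vbar[i] + sum(w * (v[k][j] - vbar[k]) for k, w in weights.items()) / K

# Toy matrix: 3 users, 3 products, None = "?".
v = [[5, 4, None],
     [5, 4, 3],
     [1, 2, 5]]
print(predict(v, 0, 2))  # ≈ 2.83
```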
Collaborative-Based Filtering
Pros: requires minimal knowledge-engineering effort (knowledge poor)
– Users and products are symbols without any internal structure or characteristics
Cons:
– Requires a large number of explicit ratings to bootstrap
– Requires products to be standardized (users should have bought exactly the same product)
– Assumes that prior behavior determines current behavior without taking "contextual" (session-level) knowledge into account
– Does not provide information about products or explanations for the recommendations
– Does not support sequential decision making or the recommendation of "good bundles", e.g., a travel package.
Requirements and Issues

Recommendation Process
– Recommendation requires information search – not only filtering
– Human/Computer dialogues should be supported – e.g. user criticizes a suggested product or refine a query definition
Input/Output
– Products and services may have complex structures
– The final recommendation is a bundling of elementary components
– Allow system bootstrapping without an initial memory of ratings/interactions
– Generalize the definition of ratings (implicit ratings)
Users
– Both short term (goal oriented) preferences and long term (stable) preferences must influence the recommendation
– Unregistered users should be allowed to get recommendations
– Account for user variability in preferred decision style
– Users' needs-and-wants structure/language may not match that of the products.
Hybrid Case-Based/Collaborative Ranking
[Figure: the hybrid ranking process. Given the input query Q and the current case:
1. Search the catalogue → candidate locations loc1, loc2, loc3 (travel components)
2. Search the case base for cases similar to the current case
3. Output the reference set of retrieved cases
4. Sort the locations loc_i by similarity to the locations contained in the reference cases
The result is the ranked list of items; the system may additionally suggest changes to Q.]
Tree-based Case Representation

A case is a rooted tree in which each node has:
– a node-type: similarity between two nodes in two cases is defined only for nodes with the same node-type
– a metric-type: the structure of the node content, i.e., how to measure the node's similarity with the corresponding node of a second case
[Figure: example case tree. The root c1 (node-type: case, metric-type: vector) has children clf1, cnq1 and cart1 (node-type: cart, metric-type: vector); cart1 contains dests1 (node-type: destinations, metric-type: set), accs1 and acts1; dests1 contains the items dest1 and dest2 (node-type: destination, metric-type: vector), each with features X1 (node-type: location, metric-type: hierarchical), X2, X3, X4.]
Item Representation

     Node Type  Metric Type                            Example: Canazei
X1   LOCATION   Set of hierarchically related symbols  Country=ITALY, Region=TRENTINO, TouristArea=FASSA, Village=CANAZEI
X2   INTERESTS  Array of booleans                      Hiking=1, Trekking=1, Biking=1
X3   ALTITUDE   Numeric                                1400
X4   LOCTYPE    Array of booleans                      Urban=0, Mountain=1, Riverside=0

TRAVELDESTINATION = (X1, X2, X3, X4)

dest1:
X1 = (Italy, Trentino, Fassa, Canazei)
X2 = (1,1,1)
X3 = 1400
X4 = (0, 1, 0)
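The item structure above could be encoded as a simple record; this is a sketch, where the class layout and the lower-case field names are my own, while the values are dest1 from the slide.

```python
from dataclasses import dataclass
from typing import List

# Sketch of the TRAVELDESTINATION item (X1, X2, X3, X4);
# the field names and class layout are illustrative, not from the system.
@dataclass
class TravelDestination:
    location: List[str]   # X1, hierarchical: (Country, Region, TouristArea, Village)
    interests: List[int]  # X2, array of booleans: (Hiking, Trekking, Biking)
    altitude: float       # X3, numeric
    loctype: List[int]    # X4, array of booleans: (Urban, Mountain, Riverside)

dest1 = TravelDestination(
    location=["Italy", "Trentino", "Fassa", "Canazei"],
    interests=[1, 1, 1],
    altitude=1400,
    loctype=[0, 1, 0],
)
```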
Item Query Language

For querying purposes, items are represented as simple feature vectors x = (x1, …, xn)

A query is a conjunction of constraints over features: q = c1 ∧ c2 ∧ … ∧ cm, where m ≤ n and each constraint c_k has one of the forms

c_k:  x_i = true       if x_i is boolean
      x_i = v          if x_i is nominal
      l ≤ x_i ≤ u      if x_i is numerical

Example — dest1 with
X1 = (Italy, Trentino, Fassa, Canazei)
X2 = (1,1,1)
X3 = 1400
X4 = (0, 1, 0)
is represented as the vector (Italy, Trentino, Fassa, Canazei, 1, 1, 1, 1400, 0, 1, 0)
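The conjunctive query language can be sketched as a per-feature constraint checker; the constraint encoding (True / value / range pair) and the feature names below are my own illustration.

```python
# Sketch of the conjunctive query language: an item satisfies the query
# iff every constraint holds. Constraint encoding (illustrative):
#   True        -> boolean feature must hold (x_i = true)
#   (l, u) pair -> numeric range (l <= x_i <= u)
#   other value -> nominal equality (x_i = v)
def satisfies(item, query):
    for feat, c in query.items():
        x = item[feat]
        if c is True:                  # boolean constraint
            if not x:
                return False
        elif isinstance(c, tuple):     # numeric range constraint
            l, u = c
            if not (l <= x <= u):
                return False
        elif x != c:                   # nominal constraint
            return False
    return True

dest1 = {"Region": "Trentino", "Hiking": 1, "Altitude": 1400}
q = {"Region": "Trentino", "Hiking": True, "Altitude": (1000, 2000)}
print(satisfies(dest1, q))  # → True
```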
Query Relaxation

The goal of the relaxation process is to solve the empty-result-set problem by finding "maximal" succeeding sub-queries
For boolean queries: one succeeding sub-query can be found in quadratic time; finding all of them requires exponential time
Our approach:
– Look for all relaxed sub-queries that change one single constraint and produce some results
– Present all these relaxed queries to the user without sorting them
– If two or more constraints must be relaxed, this is done only if they belong to the same Abstraction Hierarchy (i.e., they refer to the same concept from the user's point of view)
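The single-constraint relaxation step can be sketched as follows; this is a toy sketch with an equality-only matcher inlined for brevity, and the catalogue data is made up for illustration.

```python
# Sketch of single-constraint relaxation: try dropping each constraint
# in turn and keep the sub-queries that yield a non-empty result set.
def satisfies(item, query):
    """Toy conjunctive matcher: every constraint is an equality test."""
    return all(item.get(f) == v for f, v in query.items())

def relax(query, catalogue):
    """Return (dropped_constraint, sub_query, results) for each
    one-constraint relaxation that produces some results."""
    repairs = []
    for feat in query:
        sub = {f: v for f, v in query.items() if f != feat}
        results = [x for x in catalogue if satisfies(x, sub)]
        if results:
            repairs.append((feat, sub, results))
    return repairs

catalogue = [{"Region": "Trentino", "Pool": 0},
             {"Region": "Veneto", "Pool": 1}]
q = {"Region": "Trentino", "Pool": 1}   # fails: empty result set
for dropped, sub, res in relax(q, catalogue):
    print("relax", dropped, "->", len(res), "result(s)")
```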
Relaxation Applicability: Accommodation Search

[Figure: Venn diagram over all queries, showing the subset of failing queries and, within it, the failing queries repaired by the algorithm.]
Example: Scoring Two Destinations

[Figure: destinations D1 and D2 both match the user's query; the current case CC is compared with the similar cases C1 and C2 in the case base, which contain the destinations CD1 and CD2 respectively.]

Sim(CC, C1) = 0.2
Sim(CC, C2) = 0.6
Sim(D1, CD1) = 0.4
Sim(D1, CD2) = 0.7
Sim(D2, CD1) = 0.5
Sim(D2, CD2) = 0.3

Score(D_i) = Max_j { Sim(CC, C_j) · Sim(D_i, CD_j) }

Score(D1) = Max{0.2·0.4, 0.6·0.7} = 0.42
Score(D2) = Max{0.2·0.5, 0.6·0.3} = 0.18
Scoring
A collection of case sessions s_i, i = 1, …, n and a collection of items/products p_j, j = 1, …, m
An n × n session similarity matrix S = {s_ik} and an m × m item/product similarity matrix P = {p_lj}
An n × m session-vs-product incidence matrix A = {a_kl}, where a_kl = 1 (a_kl = 0) if session k does (not) include product l
Score(s_i, p_j) = MAX_{k,l} { s_ik · a_kl · p_lj }
A product p_j gets a high score if it is very similar (p_lj close to 1) to a product p_l that is contained in a session s_k (a_kl = 1) that is very similar to the target session (s_ik close to 1)
A particular product can be scored (high) even if it is not already present in other sessions/cases, provided that it is similar to other products contained in other similar sessions.
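The double maximization above translates directly into code; this is a toy sketch, and the matrices S, A, P below are made-up illustrative data.

```python
# Sketch of the hybrid score: Score(s_i, p_j) = max over sessions k and
# products l of s_ik * a_kl * p_lj.
def score(S, A, P, i, j):
    n, m = len(A), len(A[0])
    return max(S[i][k] * A[k][l] * P[l][j]
               for k in range(n) for l in range(m))

# Toy data: 2 sessions, 3 products.
S = [[1.0, 0.6],          # session similarity
     [0.6, 1.0]]
A = [[1, 0, 0],           # session 0 contains product 0
     [0, 1, 0]]           # session 1 contains product 1
P = [[1.0, 0.2, 0.9],     # product similarity
     [0.2, 1.0, 0.1],
     [0.9, 0.1, 1.0]]

# Product 2 never occurs in any session, yet it scores high for
# session 0 because it is similar to product 0 (contained in session 0).
print(score(S, A, P, 0, 2))  # → 0.9
```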
Item Similarity

If X and Y are two items with the same node-type:

d(X,Y) = (1 / Σ_{i=1..n} w_i) · [ Σ_{i=1..n} w_i d_i(X_i,Y_i)² ]^{1/2}

where 0 ≤ w_i ≤ 1, and

d_i(X_i,Y_i) =
  1                        if X_i or Y_i is unknown
  overlap(X_i,Y_i)         if X_i is symbolic
  |X_i − Y_i| / range_i    if X_i is a finite integer or real
  Jaccard(X_i,Y_i)         if X_i is an array of booleans
  Hierarchical(X_i,Y_i)    if X_i is a hierarchy
  Modulo(X_i,Y_i)          if X_i is a circular feature (e.g., month)
  Date(X_i,Y_i)            if X_i is a date

Sim(X,Y) = 1 − d(X,Y)
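The item distance can be sketched with a few of its local metrics (Jaccard on boolean arrays, range-normalized numeric distance); the hierarchical, circular and date metrics are omitted, the function names are mine, and the hierarchical distance of 0.25 for the location feature is an assumption taken from the worked example on the next slide.

```python
import math

# Sketch of the weighted item distance d(X,Y) and two local metrics.
def jaccard(x, y):
    """Distance between boolean arrays: 1 - |intersection| / |union|."""
    inter = sum(1 for a, b in zip(x, y) if a and b)
    union = sum(1 for a, b in zip(x, y) if a or b)
    return 1 - inter / union if union else 0.0

def numeric(x, y, rng):
    """Range-normalized numeric distance |x - y| / range."""
    return abs(x - y) / rng

def item_distance(local_dists, weights):
    """d(X,Y) = (1 / sum w_i) * sqrt(sum w_i * d_i^2)."""
    return math.sqrt(sum(w * d * d for w, d in zip(weights, local_dists))) / sum(weights)

def sim(local_dists, weights):
    return 1 - item_distance(local_dists, weights)

# dest1 vs dest2 from the slides; 0.25 is the assumed hierarchical
# location distance (one of four levels unknown).
d_local = [0.25,
           jaccard([1, 1, 1], [1, 0, 1]),
           numeric(1400, 1200, 2000),
           jaccard([0, 1, 0], [1, 1, 0])]
print(sim(d_local, [1, 1, 1, 1]))  # ≈ 0.835 (the slides round to 0.8349)
```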
Item Similarity Example

dest1:
X1 = (I, TN, Fassa, Canazei)
X2 = (1,1,1)
X3 = 1400
X4 = (0, 1, 0)

dest2:
Y1 = (I, TN, Fassa, ?)
Y2 = (1,0,1)
Y3 = 1200
Y4 = (1, 1, 0)

Sim(dest1, dest2) = 1 − (1/4) √( d_1(X1,Y1)² + d_2(X2,Y2)² + d_3(X3,Y3)² + d_4(X4,Y4)² )
                  = 1 − (1/4) √( (1/4)² + (1/3)² + ((1400 − 1200)/2000)² + (1/2)² )
                  = 1 − (1/4) √0.4361 = 1 − 0.1651 = 0.8349
Case Distance

[Figure: two case trees compared side by side. Case c1 (node-type: case, metric-type: vector) has children clf1, cnq1 and cart1 (node-type: cart, metric-type: vector); cart1 contains dests1 (node-type: destinations, metric-type: set), accs1 and acts1; dests1 contains dest1 and dest2 (node-type: destination, metric-type: vector), each with features X1 (node-type: location, metric-type: hierarchical), X2, X3, X4. Case c2 mirrors this structure with clf2, cnq2, cart2, dests2 (containing dest3, dest4 and dest5, with features Y1–Y4), accs2 and acts2.]
Case Distance

The distance between two cases is computed top-down. At the root (node-type: case, metric-type: vector) it is a weighted combination of the distances between the corresponding children:

d(c1, c2) = (1 / Σ_{i=1..3} W_i) √( W_1 d(clf1, clf2)² + W_2 d(cnq1, cnq2)² + W_3 d(cart1, cart2)² )
At the cart node (node-type: cart, metric-type: vector), the distance combines the distances of the corresponding components:

d(cart1, cart2) = √( d(dests1, dests2)² + d(accs1, accs2)² + d(acts1, acts2)² )
At the dests node (metric-type: set), with dests1 = {dest1, dest2} and dests2 = {dest3, dest4, dest5}, the distance is the average over all cross pairs:

d(dests1, dests2) = 1/(2·3) ( d(dest1,dest3) + d(dest1,dest4) + d(dest1,dest5) + d(dest2,dest3) + d(dest2,dest4) + d(dest2,dest5) )
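The recursive case distance on these slides (vector nodes combine child distances, set nodes average cross-pair distances) might look as follows; the helper names are mine and this is only a sketch of the two node metrics, not the full tree traversal.

```python
import math

# Sketch of the two node metrics used by the case distance.
def vector_distance(child_dists, weights=None):
    """Vector node: weighted Euclidean combination of the children's
    distances, normalised by the sum of the weights."""
    if weights is None:
        weights = [1.0] * len(child_dists)
    return math.sqrt(sum(w * d * d for w, d in zip(weights, child_dists))) / sum(weights)

def set_distance(items1, items2, item_dist):
    """Set node: average distance over all cross pairs of the two sets,
    e.g. 1/(2*3) * sum over the 6 pairs for a 2-item vs 3-item set."""
    if not items1 or not items2:
        return 1.0  # assumption: maximal distance for an empty set
    return (sum(item_dist(a, b) for a in items1 for b in items2)
            / (len(items1) * len(items2)))
```

For example, `set_distance` over a 2-element and a 3-element set divides the sum of the six pairwise distances by 2·3, exactly as in the dests formula above.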
Empirical Evaluation
Metric                                            NutKing-     NutKing+
Queries issued by the user                        20.1±19.1    13.4±9.3
General travel wishes provided                    12.3±1.6     11.5±2.0
Constraints per query                             4.7±1.2      4.4±1.0
Results per query                                 42.0±61.2    9.8±14.3
Pages displayed                                   93.4±44.3    71.3±35.4
Items in the final travel plan                    5.8±3.9      4.1±3.4
Session duration                                  28.5±9.5     27.3±13.0
Calls to query relaxation                         n.a.         6.3±3.6
User-accepted relaxation suggestions              n.a.         2.8±2.1
Calls to query tightening                         n.a.         2.1±2.5
User-accepted tightening suggestions              n.a.         0.6±0.9
Position of the selected item in the result list  3.2±4.8      2.2±2.9

Bold face means significantly different (t-test, p < 0.05)
Research Areas Intersecting with RS
User Modeling: product ratings; user-dependent product classifiers; product preferences; etc.
Information Retrieval: a RS may be evaluated in terms of precision and recall; a RS retrieves content that is relevant to the user's information needs
Personalization and Adaptive Hypermedia: recommendations are one-to-one and presentation is adapted to the user (e.g. explanations)
Mixed Initiative and Conversational Systems: system suggestions or questions interleave with user input and information browsing
Decision Making: the ultimate goal is to support a purchase decision; utility-based ranking has been exploited.
Contribution
User Model: a collection of cases (recommendation sessions of the user); attribute weights
Information Retrieval: interactive query management; query relaxation; query tightening
Personalization: query refinement suggestions; product ranking; explanations
Conversational: the user initiates the interaction; the system suggests ways to escape from interaction dead-ends
Decision Making: model derived from literature on consumer behavior.