
ELIXIR: Learning from User Feedback on Explanations to Improve Recommender Models

Azin Ghazimatin, Soumajit Pramanik, Rishiraj Saha Roy, Gerhard Weikum

Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany

The Web Conference, 2021

Motivation

[Figure: user u receives the recommendation (rec) Fight Club, explained (exp) by items from u's history: 7 Years in Tibet, The Prestige, Pulp Fiction.]

So far, recommenders collect item-level ratings: "I want to see more items like this." or "I want to stop seeing this item and the like."

But what is it that the user (dis)likes about Fight Club? Brad Pitt? The surprise ending? The violence?

The idea: leverage feedback on pairs of explanations and recommendations. For each (rec, exp) pair, the user gives feedback on their similar aspect, e.g., Cast: Brad Pitt, Storyline: Surprise Ending, Content: Violence.

ELIXIR

Efficient Learning from Item-level eXplanations In Recommenders (improving future recommendations through feedback on pairs of recommendation and explanation items).

[Figure: at time T, user u receives the recommendation (rec) Fight Club with explanation items (exp) 7 Years in Tibet, The Prestige, and Pulp Fiction, and gives feedback on the similar aspect of each (rec, exp) pair (Cast: Brad Pitt, Storyline: Surprise Ending, Content: Violence); at time T+1, the updated model recommends Ocean's Eleven.]

ELIXIR proceeds in two steps:

★ Collect and densify feedback

★ Incorporate feedback

ELIXIR: Feedback Matrix

For each (rec, exp) pair shown at time T, the user is asked: "Do you like the similarity between (rec, exp)?" The answers label each pair with +1 (like), -1 (dislike), or 0 otherwise, collected in a feedback matrix.

[Figure: user u's feedback matrix over (rec, exp) pairs, with entries in {-1, 0, +1}.]
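One simple way to hold this signal (a minimal sketch, not the authors' code; `N_ITEMS`, the IDs, and the helper are illustrative) is a sparse per-user matrix indexed by (rec, exp):

```python
# Sketch: storing one user's pair-level feedback as a sparse matrix over
# (rec, exp) item pairs. +1 = likes the similarity, -1 = dislikes it,
# absent entry = unlabeled pair.
from scipy.sparse import dok_matrix

N_ITEMS = 1000                        # assumed catalogue size (illustrative)
F_u = dok_matrix((N_ITEMS, N_ITEMS), dtype=float)

def record_feedback(rec_id: int, exp_id: int, answer: int) -> None:
    """Store the answer to 'Do you like the similarity between (rec, exp)?'"""
    assert answer in (-1, +1)
    F_u[rec_id, exp_id] = answer

record_feedback(rec_id=17, exp_id=42, answer=+1)   # e.g. liked the shared cast
record_feedback(rec_id=17, exp_id=99, answer=-1)   # e.g. disliked the shared violence
```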

ELIXIR: Feedback Densification

★ Label propagation [Zhu and Ghahramani, 2002] on pairs of items to mitigate sparsity.

★ Represent each pair of items with a pseudo-item (the geometric mean of the two item vectors).

★ The affinity matrix encodes pair-pair similarity, which makes it huge: the number of pairs is quadratic in the size of the item set.

★ Propagate labels only in the neighborhood of the labeled pairs (a minimal sketch follows below):
○ naive approach: compare each labeled pair against all other pairs.
○ our approach:
■ find the nearest neighbors of each item using locality-sensitive hashing (LSH);
■ search for a pair's neighbors only among pairs formed from these item neighborhoods;
■ this reduces the time complexity substantially for small neighborhood sizes.

Zhu and Ghahramani, Learning from labeled and unlabeled data with label propagation, CMU Technical Report (2002)
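Below is a minimal, illustrative sketch of this densification step rather than the authors' implementation: pseudo-items are taken as the element-wise geometric mean of (assumed non-negative) item vectors, the affinity graph is restricted to brute-force k-nearest neighbors (where the paper uses LSH to scale), and labels are propagated in the Zhu-Ghahramani style with observed labels clamped after each iteration.

```python
# Sketch of feedback densification via label propagation over pseudo-items.
# Assumptions: non-negative item vectors; brute-force kNN instead of LSH.
import numpy as np

def pseudo_item(V, i, j):
    """Pseudo-item for the pair (i, j): element-wise geometric mean (assumed)."""
    return np.sqrt(V[i] * V[j])

def densify(P, y, k=10, iters=50):
    """
    P: (n_pairs x d) pseudo-item vectors; y: numpy array of labels in
    {-1, 0, +1}, where 0 = unlabeled. Returns soft labels for all pairs.
    """
    norms = np.linalg.norm(P, axis=1, keepdims=True) + 1e-12
    sims = (P / norms) @ (P / norms).T                # cosine affinities
    np.fill_diagonal(sims, 0.0)
    W = np.zeros_like(sims)
    for idx in range(len(P)):                         # keep k nearest neighbors only
        nn = np.argsort(sims[idx])[-k:]
        W[idx, nn] = sims[idx, nn]
    W = np.maximum(W, W.T)                            # symmetrize the affinity graph
    T = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)   # row-stochastic
    f = y.astype(float).copy()
    labeled = y != 0
    for _ in range(iters):
        f = T @ f                                     # propagate labels to neighbors
        f[labeled] = y[labeled]                       # clamp the observed feedback
    return f
```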

ELIXIR: Feedback Incorporation

The idea: learn user-specific parameters of a mapping function to produce user-specific item representations, such that the similarity of items in negative (positive) feedback pairs decreases (increases). A sketch of one such objective follows below.

[Figure: distance of items to user u's profile before and after feedback incorporation.]
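As one concrete, hypothetical instantiation of this idea (the paper's exact mapping and objective may differ), the sketch below fits an element-wise scaling vector `w_u` over the global item vectors so that cosine similarity rises for +1 pairs and falls for -1 pairs, averaging over the number of feedback pairs; the function names and the regularizer are illustrative.

```python
# Sketch: learn user-specific item representations V_u = w_u * V (element-wise
# scaling of every item vector), driven by the densified pair feedback.
import numpy as np
from scipy.optimize import minimize

def _cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def fit_user_scaling(V, pairs, labels, reg=0.1):
    """
    V: (n_items x d) global item vectors; pairs: list of (i, j) index tuples;
    labels: feedback in {-1, +1} per pair. Returns the scaling vector w_u.
    Objective: maximize the mean of label * cos(w*v_i, w*v_j), with an L2
    pull of w_u towards all-ones (illustrative regularizer).
    """
    d = V.shape[1]

    def neg_objective(w):
        score = np.mean([y * _cos(w * V[i], w * V[j])
                         for (i, j), y in zip(pairs, labels)])
        return -score + reg * np.sum((w - 1.0) ** 2)

    res = minimize(neg_objective, x0=np.ones(d), method="L-BFGS-B")
    return res.x

# Usage: V_u = fit_user_scaling(V, pairs, labels) * V gives the user-specific
# item matrix whose pairwise similarities feed the next recommendation round.
```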

Instantiation of ELIXIR

★ We instantiated our framework with RecWalkPR [Nikolakopoulos and Karypis, WSDM'19].

★ Explanations are generated by PRINCE [Ghazimatin et al., WSDM'20].

★ Learning: fit the user-specific mapping parameters from the collected pair feedback.

★ Updating the model: the user-specific item representations yield an updated item-item similarity matrix S, which is plugged back into the recommender's graph (user-item edges A, item-item edges S). A random-walk sketch follows below.

[Figure: RecWalk graph over users and items, combining the interaction matrix A with the item-item similarity matrix S.]

Nikolakopoulos and Karypis, RecWalk: Nearly uncoupled random walks for top-n recommendation, WSDM'19
Ghazimatin et al., PRINCE: Provider-side Interpretability with Counterfactual Explanations in Recommender Systems, WSDM'20
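To show how the updated, user-specific similarities would enter a random-walk recommender, here is a plain random-walk-with-restart (personalized PageRank) stand-in. It is not RecWalk itself (RecWalk's nearly uncoupled construction is more involved), and the mixing weight `beta`, damping `alpha`, and iteration count are illustrative.

```python
# Sketch: random walk with restart over a user-item graph whose item-item
# edges come from the user-specific similarity matrix S_u. Not RecWalk itself.
import numpy as np

def recommend(A, S_u, user, alpha=0.15, beta=0.5, iters=100, k=10):
    """
    A: (n_users x n_items) binary interaction matrix; S_u: (n_items x n_items)
    non-negative, user-specific item similarities. Returns top-k unseen items.
    """
    n_users, n_items = A.shape
    n = n_users + n_items
    M = np.zeros((n, n))
    M[:n_users, n_users:] = A                      # user -> item edges
    M[n_users:, :n_users] = A.T                    # item -> user edges
    M[n_users:, n_users:] = beta * S_u             # item -> item edges (user-specific)
    M /= np.maximum(M.sum(axis=1, keepdims=True), 1e-12)   # row-stochastic
    restart = np.zeros(n)
    restart[user] = 1.0                            # personalize on the target user
    p = restart.copy()
    for _ in range(iters):
        p = (1 - alpha) * p @ M + alpha * restart  # power iteration with restart
    scores = p[n_users:] * (A[user] == 0)          # score items, drop already-seen ones
    return np.argsort(scores)[::-1][:k]
```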

User Study for Data Collection

★ User studies in two domains: movies and books.

★ Phase 1: Building users' profiles
○ 50 liked movies selected from the MovieLens website
○ 50 liked books selected from the Goodreads website

★ Phase 2: Collecting feedback on items and pairs
○ 30 recommendations (generated at time T)
■ generated by RecWalk
■ S: cosine similarity of item representations learned by applying NMF to the item-feature matrix (see the sketch below)
○ 150 pairs corresponding to recommendations and their top 5 explanations
○ 150 pairs corresponding to recommendations and their bottom 5 explanations

★ Phase 3: Collecting feedback on final recommendations (evaluation at time T+1)

Scenario            #Item feedback   #Pair feedback   Sessions   Hours
Phase 1             50               -                1          2
Phase 2             30               300              5          10
Phase 3             72               -                1          2
Total (per user)    152              300              7          14
Total (all users)   ≃4000            7500             175        350
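The item-item similarity matrix S mentioned in Phase 2 can be built roughly as follows; this is a sketch with a random placeholder item-feature matrix and illustrative dimensions, not the study's actual data.

```python
# Sketch: item representations via NMF on an item-feature matrix, then
# S = cosine similarity of those representations. Data and sizes are dummies.
import numpy as np
from sklearn.decomposition import NMF
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
X = rng.random((500, 2000))                 # items x features (placeholder data)
V = NMF(n_components=50, init="nndsvda",    # non-negative item representations
        random_state=0, max_iter=500).fit_transform(X)
S = cosine_similarity(V)                    # (n_items x n_items) similarity matrix
np.fill_diagonal(S, 0.0)                    # ignore self-similarity in the graph
```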

Evaluation Configurations

★ Item-level feedback (baseline)

★ Pair-level feedback with explanations

★ Pair-level feedback with random items
○ explanation items are replaced by the least relevant items from the user's history

★ Item+pair-level feedback with explanations

★ Item+pair-level feedback with random items
○ explanation items are replaced by the least relevant items from the user's history

Results

★ Metrics: P@k, MAP@k, nDCG@k at time T+1 (definitions sketched below)

★ Key findings:

○ Pair-level feedback improves recommendation.

○ Pair-level feedback is more discriminative than item-level.

○ Using explanations for pair-level feedback is essential.

Setup [Movies]             P@3      P@5      P@10
Item-level                 0.253    0.368    0.484
Pair-level (Exp-1)         0.506*   0.592*   0.58*
Pair-level (Exp-3)         0.506*   0.568*   0.596*
Pair-level (Exp-5)         0.48*    0.512*   0.568*
Pair-level (Rand-1)        0.453*   0.504*   0.56*
Item+Pair-level (Exp-5)    0.533*   0.544*   0.596*
Item+Pair-level (Rand-5)   0.453*   0.488*   0.572*

Setup [Books]              P@3      P@5      P@10
Item-level                 0.253    0.336    0.436
Pair-level (Exp-1)         0.506*   0.528*   0.54*
Pair-level (Exp-3)         0.506*   0.536*   0.58*
Pair-level (Exp-5)         0.586*   0.6*     0.656*
Pair-level (Rand-1)        0.48*    0.504    0.504
Item+Pair-level (Exp-5)    0.493*   0.456*   0.56*
Item+Pair-level (Rand-5)   0.386*   0.416*   0.516*
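For reference, a sketch of the reported metrics at a cutoff k (MAP@k is the mean of this average precision over users); `ranked` is an ordered recommendation list, `relevant` the set of items the user approved at time T+1, and binary relevance is assumed for nDCG.

```python
# Sketch of P@k, AP@k (averaged over users -> MAP@k) and nDCG@k with binary
# relevance. `ranked`: ordered item IDs; `relevant`: set of liked item IDs.
import numpy as np

def precision_at_k(ranked, relevant, k):
    return sum(item in relevant for item in ranked[:k]) / k

def average_precision_at_k(ranked, relevant, k):
    hits, score = 0, 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank                  # precision at each hit position
    return score / min(len(relevant), k) if relevant else 0.0

def ndcg_at_k(ranked, relevant, k):
    dcg = sum(1.0 / np.log2(rank + 1)
              for rank, item in enumerate(ranked[:k], start=1) if item in relevant)
    idcg = sum(1.0 / np.log2(rank + 1) for rank in range(1, min(len(relevant), k) + 1))
    return dcg / idcg if idcg > 0 else 0.0
```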

Some insights

★ Users give much more positive feedback than negative.

★ User behavior carries over from items to pairs (Pearson correlation ≥ 0.5).

★ Users mostly give feedback on shared genres/content between rec and exp.

[Figure: feedback statistics for movies and books.]

Summary

ELIXIR

★ A human-in-the-loop framework that elicits lightweight user feedback on pairs of recommendation and explanation items.

★ Learns user-specific item representations based on user feedback.

★ Instantiated with recommenders based on random walks with restart.

★ Major gains in recommendation quality over item-level feedback, shown with real user studies.


Thanks!


Explanations in Action

★ The role of explanations has mostly been limited to improving user trust and satisfaction.

★ Critiquing-based recommenders [Jin et al., CIKM'19], [Luo et al., SIGIR'20]
○ feedback on explicit and coarse-grained item properties
○ negative feedback

★ ELIXIR
○ elicits lightweight user feedback on the similarity of recommendation and explanation items
○ supports both positive and negative feedback

Jin et al., MusicBot: Evaluating Critiquing-Based Music Recommenders with Conversational Interaction, CIKM'19
Luo et al., Deep Critiquing for VAE-based Recommender Systems, SIGIR'20

Anecdotal Examples

[Figure: anecdotal example recommendations for movies and books.]




Translation vs. Scaling

[Figure: translation vs. scaling as the user-specific mapping function.]

Diversity

[Figure: diversity of the top related recommendations.]