
ELIXIR: Learning from User Feedback on Explanations to Improve Recommender Models

Azin Ghazimatin, Soumajit Pramanik, Rishiraj Saha Roy, Gerhard Weikum

Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany

The Web Conference, 2021

Motivation

[Figure: user u receives the recommendation (rec) Fight Club, explained (exp) by items from u's history: 7 Years in Tibet, The Prestige, Pulp Fiction.]

So far, recommenders collect item-level ratings: "I want to see more items like this." or "I want to stop seeing this item and the like."

But what is it that the user (dis)likes about Fight Club? Brad Pitt? The surprise ending? The violence?

The idea: leverage feedback on pairs of explanations and recommendations. For each (rec, exp) pair, the user gives feedback on their similar aspect, e.g., Cast: Brad Pitt, Storyline: Surprise Ending, Content: Violence.

ELIXIR

Efficient Learning from Item-level eXplanations In Recommenders (improving future recommendations through feedback on pairs of recommendation and explanation items).

[Figure: at time T, user u receives the recommendation (rec) Fight Club with explanation items (exp) 7 Years in Tibet, The Prestige, and Pulp Fiction, and gives feedback on the similar aspect of each (rec, exp) pair (Cast: Brad Pitt, Storyline: Surprise Ending, Content: Violence); at time T+1, the updated model recommends Ocean's Eleven.]

ELIXIR proceeds in two steps:

★ Collect and densify feedback

★ Incorporate feedback

ELIXIR: Feedback Matrix

For each (rec, exp) pair shown at time T, the user is asked: "Do you like the similarity between (rec, exp)?" The answers label each pair with +1 (like), -1 (dislike), or 0 otherwise, collected in a feedback matrix.

[Figure: user u's feedback matrix over (rec, exp) pairs, with entries in {-1, 0, +1}.]
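One simple way to hold this signal (a minimal sketch, not the authors' code; `N_ITEMS`, the IDs, and the helper are illustrative) is a sparse per-user matrix indexed by (rec, exp):

```python
# Sketch: storing one user's pair-level feedback as a sparse matrix over
# (rec, exp) item pairs. +1 = likes the similarity, -1 = dislikes it,
# absent entry = unlabeled pair.
from scipy.sparse import dok_matrix

N_ITEMS = 1000                        # assumed catalogue size (illustrative)
F_u = dok_matrix((N_ITEMS, N_ITEMS), dtype=float)

def record_feedback(rec_id: int, exp_id: int, answer: int) -> None:
    """Store the answer to 'Do you like the similarity between (rec, exp)?'"""
    assert answer in (-1, +1)
    F_u[rec_id, exp_id] = answer

record_feedback(rec_id=17, exp_id=42, answer=+1)   # e.g. liked the shared cast
record_feedback(rec_id=17, exp_id=99, answer=-1)   # e.g. disliked the shared violence
```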

ELIXIR: Feedback Densification

★ Label propagation [Zhu and Ghahramani, 2002] on pairs of items to mitigate sparsity.

★ Represent each pair of items with a pseudo-item (the geometric mean of the two item vectors).

★ The affinity matrix encodes pair-pair similarity, which makes it huge: the number of pairs is quadratic in the size of the item set.

★ Propagate labels only in the neighborhood of the labeled pairs (a minimal sketch follows below):
○ naive approach: compare each labeled pair against all other pairs.
○ our approach:
■ find the nearest neighbors of each item using locality-sensitive hashing (LSH);
■ search for a pair's neighbors only among pairs formed from these item neighborhoods;
■ this reduces the time complexity substantially for small neighborhood sizes.

Zhu and Ghahramani, Learning from labeled and unlabeled data with label propagation, CMU Technical Report (2002)
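Below is a minimal, illustrative sketch of this densification step rather than the authors' implementation: pseudo-items are taken as the element-wise geometric mean of (assumed non-negative) item vectors, the affinity graph is restricted to brute-force k-nearest neighbors (where the paper uses LSH to scale), and labels are propagated in the Zhu-Ghahramani style with observed labels clamped after each iteration.

```python
# Sketch of feedback densification via label propagation over pseudo-items.
# Assumptions: non-negative item vectors; brute-force kNN instead of LSH.
import numpy as np

def pseudo_item(V, i, j):
    """Pseudo-item for the pair (i, j): element-wise geometric mean (assumed)."""
    return np.sqrt(V[i] * V[j])

def densify(P, y, k=10, iters=50):
    """
    P: (n_pairs x d) pseudo-item vectors; y: numpy array of labels in
    {-1, 0, +1}, where 0 = unlabeled. Returns soft labels for all pairs.
    """
    norms = np.linalg.norm(P, axis=1, keepdims=True) + 1e-12
    sims = (P / norms) @ (P / norms).T                # cosine affinities
    np.fill_diagonal(sims, 0.0)
    W = np.zeros_like(sims)
    for idx in range(len(P)):                         # keep k nearest neighbors only
        nn = np.argsort(sims[idx])[-k:]
        W[idx, nn] = sims[idx, nn]
    W = np.maximum(W, W.T)                            # symmetrize the affinity graph
    T = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)   # row-stochastic
    f = y.astype(float).copy()
    labeled = y != 0
    for _ in range(iters):
        f = T @ f                                     # propagate labels to neighbors
        f[labeled] = y[labeled]                       # clamp the observed feedback
    return f
```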

ELIXIR: Feedback Incorporation

The idea: learn user-specific parameters of a mapping function to produce user-specific item representations, such that the similarity of items in negative (positive) feedback pairs decreases (increases). A sketch of one such objective follows below.

[Figure: distance of items to user u's profile before and after feedback incorporation.]
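As one concrete, hypothetical instantiation of this idea (the paper's exact mapping and objective may differ), the sketch below fits an element-wise scaling vector `w_u` over the global item vectors so that cosine similarity rises for +1 pairs and falls for -1 pairs, averaging over the number of feedback pairs; the function names and the regularizer are illustrative.

```python
# Sketch: learn user-specific item representations V_u = w_u * V (element-wise
# scaling of every item vector), driven by the densified pair feedback.
import numpy as np
from scipy.optimize import minimize

def _cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def fit_user_scaling(V, pairs, labels, reg=0.1):
    """
    V: (n_items x d) global item vectors; pairs: list of (i, j) index tuples;
    labels: feedback in {-1, +1} per pair. Returns the scaling vector w_u.
    Objective: maximize the mean of label * cos(w*v_i, w*v_j), with an L2
    pull of w_u towards all-ones (illustrative regularizer).
    """
    d = V.shape[1]

    def neg_objective(w):
        score = np.mean([y * _cos(w * V[i], w * V[j])
                         for (i, j), y in zip(pairs, labels)])
        return -score + reg * np.sum((w - 1.0) ** 2)

    res = minimize(neg_objective, x0=np.ones(d), method="L-BFGS-B")
    return res.x

# Usage: V_u = fit_user_scaling(V, pairs, labels) * V gives the user-specific
# item matrix whose pairwise similarities feed the next recommendation round.
```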

Instantiation of ELIXIR

★ We instantiated our framework with RecWalkPR [Nikolakopoulos and Karypis, WSDM'19].

★ Explanations are generated by PRINCE [Ghazimatin et al., WSDM'20].

★ Learning: fit the user-specific mapping parameters from the collected pair feedback.

★ Updating the model: the user-specific item representations yield an updated item-item similarity matrix S, which is plugged back into the recommender's graph (user-item edges A, item-item edges S). A random-walk sketch follows below.

[Figure: RecWalk graph over users and items, combining the interaction matrix A with the item-item similarity matrix S.]

Nikolakopoulos and Karypis, RecWalk: Nearly uncoupled random walks for top-n recommendation, WSDM'19
Ghazimatin et al., PRINCE: Provider-side Interpretability with Counterfactual Explanations in Recommender Systems, WSDM'20
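To show how the updated, user-specific similarities would enter a random-walk recommender, here is a plain random-walk-with-restart (personalized PageRank) stand-in. It is not RecWalk itself (RecWalk's nearly uncoupled construction is more involved), and the mixing weight `beta`, damping `alpha`, and iteration count are illustrative.

```python
# Sketch: random walk with restart over a user-item graph whose item-item
# edges come from the user-specific similarity matrix S_u. Not RecWalk itself.
import numpy as np

def recommend(A, S_u, user, alpha=0.15, beta=0.5, iters=100, k=10):
    """
    A: (n_users x n_items) binary interaction matrix; S_u: (n_items x n_items)
    non-negative, user-specific item similarities. Returns top-k unseen items.
    """
    n_users, n_items = A.shape
    n = n_users + n_items
    M = np.zeros((n, n))
    M[:n_users, n_users:] = A                      # user -> item edges
    M[n_users:, :n_users] = A.T                    # item -> user edges
    M[n_users:, n_users:] = beta * S_u             # item -> item edges (user-specific)
    M /= np.maximum(M.sum(axis=1, keepdims=True), 1e-12)   # row-stochastic
    restart = np.zeros(n)
    restart[user] = 1.0                            # personalize on the target user
    p = restart.copy()
    for _ in range(iters):
        p = (1 - alpha) * p @ M + alpha * restart  # power iteration with restart
    scores = p[n_users:] * (A[user] == 0)          # score items, drop already-seen ones
    return np.argsort(scores)[::-1][:k]
```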

User Study for Data Collection

★ User studies in two domains: movies and books.

★ Phase 1: Building users' profiles
○ 50 liked movies selected from the MovieLens website
○ 50 liked books selected from the Goodreads website

★ Phase 2: Collecting feedback on items and pairs
○ 30 recommendations (generated at time T)
■ generated by RecWalk
■ S: cosine similarity of item representations learned by applying NMF to the item-feature matrix (see the sketch below)
○ 150 pairs corresponding to recommendations and their top 5 explanations
○ 150 pairs corresponding to recommendations and their bottom 5 explanations

★ Phase 3: Collecting feedback on final recommendations (evaluation at time T+1)

Scenario            #Item feedback   #Pair feedback   Sessions   Hours
Phase 1             50               -                1          2
Phase 2             30               300              5          10
Phase 3             72               -                1          2
Total (per user)    152              300              7          14
Total (all users)   ≃4000            7500             175        350
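The item-item similarity matrix S mentioned in Phase 2 can be built roughly as follows; this is a sketch with a random placeholder item-feature matrix and illustrative dimensions, not the study's actual data.

```python
# Sketch: item representations via NMF on an item-feature matrix, then
# S = cosine similarity of those representations. Data and sizes are dummies.
import numpy as np
from sklearn.decomposition import NMF
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
X = rng.random((500, 2000))                 # items x features (placeholder data)
V = NMF(n_components=50, init="nndsvda",    # non-negative item representations
        random_state=0, max_iter=500).fit_transform(X)
S = cosine_similarity(V)                    # (n_items x n_items) similarity matrix
np.fill_diagonal(S, 0.0)                    # ignore self-similarity in the graph
```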

Evaluation Configurations

★ Item-level feedback (baseline)

★ Pair-level feedback with explanations

★ Pair-level feedback with random items
○ explanation items are replaced by the least relevant items from the user's history

★ Item+pair-level feedback with explanations

★ Item+pair-level feedback with random items
○ explanation items are replaced by the least relevant items from the user's history

Results

★ Metrics: P@k, MAP@k, nDCG@k at time T+1 (definitions sketched below)

★ Key findings:

○ Pair-level feedback improves recommendation.

○ Pair-level feedback is more discriminative than item-level.

○ Using explanations for pair-level feedback is essential.

Setup [Movies]             P@3      P@5      P@10
Item-level                 0.253    0.368    0.484
Pair-level (Exp-1)         0.506*   0.592*   0.58*
Pair-level (Exp-3)         0.506*   0.568*   0.596*
Pair-level (Exp-5)         0.48*    0.512*   0.568*
Pair-level (Rand-1)        0.453*   0.504*   0.56*
Item+Pair-level (Exp-5)    0.533*   0.544*   0.596*
Item+Pair-level (Rand-5)   0.453*   0.488*   0.572*

Setup [Books]              P@3      P@5      P@10
Item-level                 0.253    0.336    0.436
Pair-level (Exp-1)         0.506*   0.528*   0.54*
Pair-level (Exp-3)         0.506*   0.536*   0.58*
Pair-level (Exp-5)         0.586*   0.6*     0.656*
Pair-level (Rand-1)        0.48*    0.504    0.504
Item+Pair-level (Exp-5)    0.493*   0.456*   0.56*
Item+Pair-level (Rand-5)   0.386*   0.416*   0.516*
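For reference, a sketch of the reported metrics at a cutoff k (MAP@k is the mean of this average precision over users); `ranked` is an ordered recommendation list, `relevant` the set of items the user approved at time T+1, and binary relevance is assumed for nDCG.

```python
# Sketch of P@k, AP@k (averaged over users -> MAP@k) and nDCG@k with binary
# relevance. `ranked`: ordered item IDs; `relevant`: set of liked item IDs.
import numpy as np

def precision_at_k(ranked, relevant, k):
    return sum(item in relevant for item in ranked[:k]) / k

def average_precision_at_k(ranked, relevant, k):
    hits, score = 0, 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank                  # precision at each hit position
    return score / min(len(relevant), k) if relevant else 0.0

def ndcg_at_k(ranked, relevant, k):
    dcg = sum(1.0 / np.log2(rank + 1)
              for rank, item in enumerate(ranked[:k], start=1) if item in relevant)
    idcg = sum(1.0 / np.log2(rank + 1) for rank in range(1, min(len(relevant), k) + 1))
    return dcg / idcg if idcg > 0 else 0.0
```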

Some insights

★ Users give much more positive feedback than negative.

★ User behavior carries over from items to pairs (Pearson correlation ≥ 0.5).

★ Users mostly give feedback on shared genres/content between rec and exp.

[Figure: feedback statistics for movies and books.]

Summary

ELIXIR

★ A human-in-the-loop framework that elicits lightweight user feedback on pairs of recommendation and explanation items.

★ Learns user-specific item representations based on user feedback.

★ Instantiated with recommenders based on random walks with restart.

★ Major gains in recommendation quality over item-level feedback, shown with real user studies.


Thanks!


Explanations in Action

★ The role of explanations has mostly been limited to improving user trust and satisfaction.

★ Critiquing-based recommenders [Jin et al., CIKM'19], [Luo et al., SIGIR'20]
○ feedback on explicit and coarse-grained item properties
○ negative feedback

★ ELIXIR
○ elicits lightweight user feedback on the similarity of recommendation and explanation items
○ supports both positive and negative feedback

Jin et al., MusicBot: Evaluating Critiquing-Based Music Recommenders with Conversational Interaction, CIKM'19
Luo et al., Deep Critiquing for VAE-based Recommender Systems, SIGIR'20

Anecdotal Examples

[Figure: anecdotal example recommendations for movies and books.]




Translation vs. Scaling

[Figure: translation vs. scaling as the user-specific mapping function.]

Diversity

[Figure: diversity of the top related recommendations.]