discrete content-aware matrix factorizationa lot of machines. 2017/8/17 discrete content-aware...

26
Discrete Content-aware Matrix Factorization Rui Liu, University of Electronic Science and Technology of China Joint work with Defu Lian (UESTC), Yong Ge (UA), Kai Zheng (UESTC), Xing Xie (Microsoft Research) and Longbing Cao (UTS) KDD 2017 ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2017/8/17 Discrete Content-Aware Matrix Factorization 1

Upload: others

Post on 30-May-2020

12 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Discrete Content-aware Matrix FactorizationA lot of machines. 2017/8/17 Discrete Content-Aware Matrix Factorization 6 Economic and Effective Way 1 -1 1-1 1 1-1 1 1 1 1 1-1 1 -1-1 1

Discrete Content-aware Matrix Factorization

Rui Liu, University of Electronic Science and Technology of China

Joint work with Defu Lian (UESTC), Yong Ge (UA), Kai Zheng (UESTC), Xing Xie (Microsoft Research) and Longbing Cao (UTS)

KDD 2017ACM SIGKDD Conference on Knowledge Discovery and Data Mining

2017/8/17 Discrete Content-Aware Matrix Factorization 1

Page 2: Discrete Content-aware Matrix FactorizationA lot of machines. 2017/8/17 Discrete Content-Aware Matrix Factorization 6 Economic and Effective Way 1 -1 1-1 1 1-1 1 1 1 1 1-1 1 -1-1 1

Outline

2017/8/17 Discrete Content-Aware Matrix Factorization 2

•Motivation

•Proposed framework

• Experiment

•Conclusion

Page 3: Discrete Content-aware Matrix FactorizationA lot of machines. 2017/8/17 Discrete Content-Aware Matrix Factorization 6 Economic and Effective Way 1 -1 1-1 1 1-1 1 1 1 1 1-1 1 -1-1 1

2017/8/17 Discrete Content-Aware Matrix Factorization 3

Motivation

Google News Recommendation

Tasks: recommends new articles based on click and search History

Scale:

• millions of users• millions of articles

Page 4: Discrete Content-aware Matrix FactorizationA lot of machines. 2017/8/17 Discrete Content-Aware Matrix Factorization 6 Economic and Effective Way 1 -1 1-1 1 1-1 1 1 1 1 1-1 1 -1-1 1

2017/8/17 Discrete Content-Aware Matrix Factorization 4

Amazon Product RecommendationTasks: recommends new articles based on click and search History

Scale: • 300 million users• 480 million products

Motivation

Page 5: Discrete Content-aware Matrix FactorizationA lot of machines. 2017/8/17 Discrete Content-Aware Matrix Factorization 6 Economic and Effective Way 1 -1 1-1 1 1-1 1 1 1 1 1-1 1 -1-1 1

2017/8/17 Discrete Content-Aware Matrix Factorization 5

Motivation

• How to generate immediate recommendation?• Pre-computing top-k preferred items for users?

• Distributed/parallel computing

User interests are evolving

New items are emerging

A lot of machines

Page 6: Discrete Content-aware Matrix FactorizationA lot of machines. 2017/8/17 Discrete Content-Aware Matrix Factorization 6 Economic and Effective Way 1 -1 1-1 1 1-1 1 1 1 1 1-1 1 -1-1 1

2017/8/17 Discrete Content-Aware Matrix Factorization 6

Economic and Effective Way

1 -1 1

-1 1 1

-1 1 1

1 1 1

-1 1 -1

-1 1 1

1 -1 1 1 -1 -1 1 -1 1 1 1

-1 1 1 1 1 -1 1 1 -1 1 -1

1 -1 1 1 1 1 -1 1 1 1 -1

sign(𝑥) = ቊ1, 𝑥 > 0

−1, 𝑥 < 0

Page 7: Discrete Content-aware Matrix FactorizationA lot of machines. 2017/8/17 Discrete Content-Aware Matrix Factorization 6 Economic and Effective Way 1 -1 1-1 1 1-1 1 1 1 1 1-1 1 -1-1 1

2017/8/17 Discrete Content-Aware Matrix Factorization 7

Economic and Effective Way

< 𝑝𝑢, 𝑞𝑖 >=

𝑑=1

𝐷

𝑝𝑢𝑑𝑞𝑖𝑑

< 𝜙𝑢, 𝜓𝑖 >= 2𝐻 𝜙𝑢, 𝜓𝑖 − 𝐷𝐻 𝜙𝑢, 𝜓𝑖

𝜙𝑢𝜓𝑖

2KB

64B

10 Million products

20G 640M

D=512

𝑝𝑢, 𝑞𝑖 ∈ 𝑅𝐷

𝜙𝑢, 𝜓𝑖 ∈ ±1 𝐷

1

10

100

1000

10000

16 24 32 48

Ru

nn

ing

tim

e

D

Continous Hamming Ranking Multi-Index Hashing

Page 8: Discrete Content-aware Matrix FactorizationA lot of machines. 2017/8/17 Discrete Content-Aware Matrix Factorization 6 Economic and Effective Way 1 -1 1-1 1 1-1 1 1 1 1 1-1 1 -1-1 1

2017/8/17 Discrete Content-Aware Matrix Factorization 8

Related Work

• Dimension Reduction Techniques for Recommendation Systems• Matrix Factorization (KDD’11)

• Map users and items into low-dimension latent space

• Content-aware Matrix Factorization (RecSys’13, ICDM’15)• Take content information into account, help solve cold-start problem

• Weighted Regularized Matrix Factorization• Work well in collaborative filtering for implicit feedback

• Distributed Computing Techniques for RS• Large-scale Parallel Collaborative Filtering

• Parallel update to improve efficiency

Page 9: Discrete Content-aware Matrix FactorizationA lot of machines. 2017/8/17 Discrete Content-Aware Matrix Factorization 6 Economic and Effective Way 1 -1 1-1 1 1-1 1 1 1 1 1-1 1 -1-1 1

2017/8/17 Discrete Content-Aware Matrix Factorization 9

Related Work

• Learning Hashing Codes for Recommendation Systems• Binary Code learning method for Collaborative Filtering (KDD’12)

• Obtained binary codes preserve preference of users

• Preference Preserving Hashing (SIGIR’14)• Come up with a novel quantization algorithm

• Discrete Collaborative Filtering (SIGIR’16)• Tackle discrete optimization directly and efficiently

1. Cold-start problem2. Implicit feedback3. Classification

Page 10: Discrete Content-aware Matrix FactorizationA lot of machines. 2017/8/17 Discrete Content-Aware Matrix Factorization 6 Economic and Effective Way 1 -1 1-1 1 1-1 1 1 1 1 1-1 1 -1-1 1

2017/8/17 Discrete Content-Aware Matrix Factorization 10

Task

• DCMF: a joint framework• Incorporate content information into the total framework

• Solve cold-start problem

• Combine the Weighted Regularized Matrix Factorization• Solve for implicit feedback

• Find a solution for logit loss• Solve for classification

• A direct discrete optimization model• Update efficiently

Page 11: Discrete Content-aware Matrix FactorizationA lot of machines. 2017/8/17 Discrete Content-Aware Matrix Factorization 6 Economic and Effective Way 1 -1 1-1 1 1-1 1 1 1 1 1-1 1 -1-1 1

Outline

2017/8/17 Discrete Content-Aware Matrix Factorization 11

•Motivation

•Proposed framework

• Experiment

•Conclusion

Page 12: Discrete Content-aware Matrix FactorizationA lot of machines. 2017/8/17 Discrete Content-Aware Matrix Factorization 6 Economic and Effective Way 1 -1 1-1 1 1-1 1 1 1 1 1-1 1 -1-1 1

Total Framework

2017/8/17 Discrete Content-Aware Matrix Factorization 12

min𝚽,𝚿,𝐏,𝐐

𝑖,𝑗∈Ω

ℓ 𝑟𝑖𝑗 , 𝜙𝑖′𝜓𝑗

𝑠. 𝑡.𝚽 ∈ ±1 𝑀×𝐷 , 𝚿 ∈ ±1 𝑁×𝐷

𝟏′𝐏 = 0, 𝟏′𝐐 = 0, 𝐏′𝐏 = 𝑀𝐈, 𝐐′𝐐 = 𝑁𝐈

+𝛼1 𝚽− 𝐏 𝐹2 + 𝛼2 𝚿−𝐐 𝐹

2

+λ1 𝚽− 𝐗𝐔 𝐹2 + λ2 𝚿− 𝐘𝐕 𝐹

2 +𝛾1 𝐔 𝐹2 + 𝛾2 𝐕 𝐹

2 +𝛽

𝑖,𝑗

(𝜙𝑖′𝜓𝑗)

2𝐔, 𝐕

balance and decorrelation constraints

Interaction regularization

Make use of implicit feedback(challenge 2)

Content-aware term

Take content information of users and items into account(challenge 1)

vRegularization

·Regression task:

ℓ 𝑟𝑖𝑗 , 𝜙𝑖′𝜓𝑗 = 𝑟𝑖𝑗 − 𝜙𝑖

′𝜓𝑗2

·Classification task:

ℓ 𝑟𝑖𝑗 , 𝜙𝑖′𝜓𝑗 = log 1 + 𝑒−𝑟𝑖𝑗𝜙𝑖

′𝜓𝑗

(challenge 3)

Page 13: Discrete Content-aware Matrix FactorizationA lot of machines. 2017/8/17 Discrete Content-Aware Matrix Factorization 6 Economic and Effective Way 1 -1 1-1 1 1-1 1 1 1 1 1-1 1 -1-1 1

Alternative Update Rule

2017/8/17 Discrete Content-Aware Matrix Factorization 13

𝐔 = 𝐗′𝐗 +γ1λ1𝐈𝐹

−1

𝐗′𝚽 𝐕 = 𝐘′𝐘 +γ2λ2𝐈𝐿

−1

𝐘′𝚿

𝐏 = 𝑀 𝑺𝑷, 𝑺𝑷 𝑻𝑷, 𝑻𝑷 ′ 𝐐 = 𝑁 𝑺𝑸, 𝑺𝑸 𝑻𝑸, 𝑻𝑸 ′

𝑺𝑷 ∈ ℝ𝑀×෩𝐷 𝑺𝑷 ∈ ℝ𝑀×(𝐷−෩𝐷)

𝑻𝑷 ∈ ℝ𝐷×෩𝐷 𝑻𝑷 ∈ ℝ𝐷×(𝐷−෩𝐷)

𝑺𝑸 ∈ ℝ𝑁×෩𝐷 𝑺𝑸 ∈ ℝ𝑁×(𝐷−෩𝐷)

𝑻𝑸 ∈ ℝ𝐷×෩𝐷 𝑻𝑸 ∈ ℝ𝐷×(𝐷−෩𝐷)

𝜙𝑖𝑑∗ = 𝑠𝑔𝑛 𝐾 𝜙𝑖𝑑 , 𝜙𝑖𝑑

𝜙𝑖𝑑 =

𝑗∈𝕝𝑖

(𝑟𝑖𝑗 − Ƹ𝑟𝑖𝑗 + 𝜙𝑖𝑑𝜓𝑗𝑑)𝜓𝑗𝑑 + 𝛼1𝑝𝑖𝑑

+λ1𝒖𝑑′ 𝒙𝑖 − 𝛽𝜙𝑖

′𝚿′𝜓𝑑 + 𝛽𝑁𝜙𝑖𝑑

𝜓𝑗𝑑∗ = 𝑠𝑔𝑛 𝐾 𝜓𝑗𝑑 , 𝜓𝑗𝑑

𝜓𝑗𝑑 =

𝑖∈𝕝𝑗

(𝑟𝑖𝑗 − Ƹ𝑟𝑖𝑗 + 𝜙𝑖𝑑𝜓𝑗𝑑)𝜙𝑖𝑑 + 𝛼2𝑞𝑗𝑑

+λ2𝒗𝑑′ 𝒚𝑗 − 𝛽𝜓𝑗

′𝚽′𝜙𝑑 + 𝛽𝑀𝜓𝑗𝑑

𝐾 𝑥, 𝑦 = ቊ𝑥, 𝑥 ≠ 0𝑦, 𝑥 = 0

(𝑟𝑖𝑗/4 − λ( Ƹ𝑟𝑖𝑗) Ƹ𝑟𝑖𝑗 + λ( Ƹ𝑟𝑖𝑗)𝜙𝑖𝑑𝜓𝑗𝑑)𝜓𝑗𝑑 (𝑟𝑖𝑗/4 − λ( Ƹ𝑟𝑖𝑗) Ƹ𝑟𝑖𝑗 + λ( Ƹ𝑟𝑖𝑗)𝜙𝑖𝑑𝜓𝑗𝑑)𝜙𝑖𝑑

Page 14: Discrete Content-aware Matrix FactorizationA lot of machines. 2017/8/17 Discrete Content-Aware Matrix Factorization 6 Economic and Effective Way 1 -1 1-1 1 1-1 1 1 1 1 1-1 1 -1-1 1

Outline

2017/8/17 Discrete Content-Aware Matrix Factorization 14

•Motivation

•Proposed framework

• Experiment

•Conclusion

Page 15: Discrete Content-aware Matrix FactorizationA lot of machines. 2017/8/17 Discrete Content-Aware Matrix Factorization 6 Economic and Effective Way 1 -1 1-1 1 1-1 1 1 1 1 1-1 1 -1-1 1

Dataset and Metric

MovieLens, classic MovieLens 10M dataset Yelp, the latest Yelp Challenge dataset Amazon, a subset of product reviews and metadata for Amazon books

2017/8/17 Discrete Content-Aware Matrix Factorization 15

Metric: NDCG for regressionMPR for classification

Dataset #users #items #ratings Density

MovieLens 69,838 8,940 9,983,758 1.60%

Yelp 13,679 12,922 640,143 0.36%

Amazon 35,151 33,195 1,732,060 0.15%

Data statistics

Page 16: Discrete Content-aware Matrix FactorizationA lot of machines. 2017/8/17 Discrete Content-Aware Matrix Factorization 6 Economic and Effective Way 1 -1 1-1 1 1-1 1 1 1 1 1-1 1 -1-1 1

Baselines

2017/8/17 Discrete Content-Aware Matrix Factorization 16

• DCF• Hashing-based collaborative filtering.

• Outperforms almost all two-stage binary code learning methods for collaborative filtering including BCCF, PPH, CH.

• libFM: • Feature-based recommendation system.

• Has achieved the best sole-model for the track I challenge-link prediction-of KDD-Cup 2012.

• Supports both regression and classification tasks of recommendation.

Page 17: Discrete Content-aware Matrix FactorizationA lot of machines. 2017/8/17 Discrete Content-Aware Matrix Factorization 6 Economic and Effective Way 1 -1 1-1 1 1-1 1 1 1 1 1-1 1 -1-1 1

Comparison with baselines - Regression

2017/8/17 Discrete Content-Aware Matrix Factorization 17

• Add content information

→ performance ↑

• Effective discrete optimization algorithm

Page 18: Discrete Content-aware Matrix FactorizationA lot of machines. 2017/8/17 Discrete Content-Aware Matrix Factorization 6 Economic and Effective Way 1 -1 1-1 1 1-1 1 1 1 1 1-1 1 -1-1 1

Comparison with baselines - Regression

2017/8/17 Discrete Content-Aware Matrix Factorization 18

• Add content information

→ performance ↑

• Effective discrete optimization algorithm

Page 19: Discrete Content-aware Matrix FactorizationA lot of machines. 2017/8/17 Discrete Content-Aware Matrix Factorization 6 Economic and Effective Way 1 -1 1-1 1 1-1 1 1 1 1 1-1 1 -1-1 1

Comparison with baselines - Regression

2017/8/17 Discrete Content-Aware Matrix Factorization 19

• Add content information

→ performance ↑

• Effective discrete optimization algorithm

Page 20: Discrete Content-aware Matrix FactorizationA lot of machines. 2017/8/17 Discrete Content-Aware Matrix Factorization 6 Economic and Effective Way 1 -1 1-1 1 1-1 1 1 1 1 1-1 1 -1-1 1

Comparison with baselines - Regression

2017/8/17 Discrete Content-Aware Matrix Factorization 20

• Address cold-start problem well

• Effective discrete optimization algorithm

Page 21: Discrete Content-aware Matrix FactorizationA lot of machines. 2017/8/17 Discrete Content-Aware Matrix Factorization 6 Economic and Effective Way 1 -1 1-1 1 1-1 1 1 1 1 1-1 1 -1-1 1

Comparison with baselines - Regression

2017/8/17 Discrete Content-Aware Matrix Factorization 21

• Address cold-start problem well

• Effective discrete optimization algorithm

Page 22: Discrete Content-aware Matrix FactorizationA lot of machines. 2017/8/17 Discrete Content-Aware Matrix Factorization 6 Economic and Effective Way 1 -1 1-1 1 1-1 1 1 1 1 1-1 1 -1-1 1

Comparison with baselines - Regression

2017/8/17 Discrete Content-Aware Matrix Factorization 22

• Address cold-start problem well

• Effective discrete optimization algorithm

Page 23: Discrete Content-aware Matrix FactorizationA lot of machines. 2017/8/17 Discrete Content-Aware Matrix Factorization 6 Economic and Effective Way 1 -1 1-1 1 1-1 1 1 1 1 1-1 1 -1-1 1

Comparison with baselines - Classification

2017/8/17 Discrete Content-Aware Matrix Factorization 23

Effectiveness of content information and iteration regularization Benefit of the use of logit loss Superiority of DCMF to DCF Validity of the proposed discrete optimization algorithm Superiority of DCMF to libFM on relatively sparse datasets

Page 24: Discrete Content-aware Matrix FactorizationA lot of machines. 2017/8/17 Discrete Content-Aware Matrix Factorization 6 Economic and Effective Way 1 -1 1-1 1 1-1 1 1 1 1 1-1 1 -1-1 1

Convergence Study

2017/8/17 Discrete Content-Aware Matrix Factorization 24

Page 25: Discrete Content-aware Matrix FactorizationA lot of machines. 2017/8/17 Discrete Content-Aware Matrix Factorization 6 Economic and Effective Way 1 -1 1-1 1 1-1 1 1 1 1 1-1 1 -1-1 1

Outline

2017/8/17 Discrete Content-Aware Matrix Factorization 25

•Motivation

•Proposed framework

• Experiment

•Conclusion

Page 26: Discrete Content-aware Matrix FactorizationA lot of machines. 2017/8/17 Discrete Content-Aware Matrix Factorization 6 Economic and Effective Way 1 -1 1-1 1 1-1 1 1 1 1 1-1 1 -1-1 1

Conclusion

• Proposed a new framework called DCMF to hash users and items with content information in both regression and classification tasks.

Thank you!Rui Liu, UESTC

[email protected]

2017/8/17 Discrete Content-Aware Matrix Factorization 26

• Developed an efficient discrete optimization algorithm for tackling discretized constraints as well as interaction regularization.

• Proved superiority in 3 public datasets