AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. November 30, 2016 Using MXNet for Recommendation Modeling at Scale MAC306 Leo Dirac, Principal Engineer, AWS Deep Learning

Upload: amazon-web-services

Post on 16-Apr-2017


TRANSCRIPT

Page 1: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

November 30, 2016

Using MXNet for Recommendation Modeling at Scale

MAC306

Leo Dirac, Principal Engineer, AWS Deep Learning

Page 2: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

What to Expect from the Session

Background on recommender systems and machine learning.

Learn how to implement them on MXNet using p2 instances and the AWS Deep Learning AMI.

Explore several types of recommender systems, including advanced deep learning ideas.

Learn tricks for handling sparse data in MXNet.

Page 3: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

Background: Recommender Systems & Machine Learning

Page 4: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

Netflix Prize: 2006-2009

$1,000,000


Page 5: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

Recommending Movies

/* Predict what star rating user u will give movie m */
float predictRating(User u, Movie m) {
    // How???
}

Page 6: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

Q: How???

A: Machine Learning: Learn code from data

float predictRating(User u, Movie m) {
    return mlModel.run(u, m);
}

Page 7: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

[Diagram: Training Data -> Training -> Model; Input Data -> Model -> Predictions]

Page 8: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

[Diagram: All Labelled Data split 75% / 25%; Training Data]

Page 9: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

[Diagram: All Labelled Data split 75% / 25%; Training Data -> Training -> Trial Model]

Page 10: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

[Diagram: All Labelled Data split 75% / 25%; Training Data -> Training -> Trial Model; Test Data]

Page 11: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

[Diagram: All Labelled Data split 75% / 25%; Training Data -> Training -> Trial Model; Test Data -> Evaluation Result]

Page 12: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

[Diagram: All Labelled Data split 75% / 25%; Training Data -> Training -> Trial Model; Test Data -> Evaluation Result -> Accuracy]
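The holdout flow in these diagrams (split the labelled data, train a trial model on one part, evaluate on the other) can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the helper names `split_ratings` and `rmse`, the toy ratings, and the mean-rating "trial model" are all assumptions, and RMSE stands in for whatever accuracy measure the evaluation actually used.

```python
import random

def split_ratings(ratings, train_fraction=0.75, seed=0):
    """Shuffle labelled (user, movie, rating) rows and split them into train/test."""
    rows = list(ratings)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

def rmse(model, rows):
    """Root-mean-square error of model(user, movie) against held-out ratings."""
    errors = [(model(u, m) - r) ** 2 for u, m, r in rows]
    return (sum(errors) / len(errors)) ** 0.5

# Toy labelled data (hypothetical): (user_id, movie_id, star_rating).
ratings = [(u, m, 1 + (u * 7 + m * 3) % 5) for u in range(20) for m in range(10)]
train, test = split_ratings(ratings)

# Hypothetical trial model: always predict the mean rating seen in training.
mean_rating = sum(r for _, _, r in train) / len(train)

def trial_model(user, movie):
    return mean_rating

# Evaluation result: error on data the model never saw during training.
test_rmse = rmse(trial_model, test)
print(len(train), len(test), round(test_rmse, 3))
```

Evaluating on the held-out 25% rather than on the training rows is what makes the resulting number an honest estimate of how the model behaves on unseen ratings.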

Page 13: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

Sparse Data

Page 14: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

User-Item Ratings Matrix

14

Page 15: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

Size of user-item ratings matrix


Sample dataset: MovieLens 20M

(27,000 movies) * (138,000 users)

= 3,700,000,000 possible ratings

But only 20,000,000 ratings available.

99.5% of ratings are unknown.

http://grouplens.org/datasets/movielens/20m/
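The arithmetic on this slide can be checked directly. The snippet below is a small sketch using the rounded movie and user counts quoted above; the variable names are made up for the example.

```python
movies = 27_000
users = 138_000
known_ratings = 20_000_000

possible_ratings = movies * users            # one cell per (user, movie) pair
density = known_ratings / possible_ratings   # fraction of cells that hold a rating
unknown_pct = 100 * (1 - density)

print(f"{possible_ratings:,} possible ratings")
print(f"{unknown_pct:.1f}% of ratings are unknown")
```

The exact product is 3,726,000,000, which the slide rounds to 3.7 billion.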

Page 16: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

Storing the matrix

Dense: 3.7B entries
• Each entry: Rating (1 byte)
• Total: 3.7 GB

Sparse: 20M non-zero entries
• Each entry: Rating (1 byte), Movie_id (32-bit integer), User_id (32-bit integer)
• Total: 180 MB

Sparse is 20x smaller
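The storage figures follow from the per-entry sizes. A quick sketch of the calculation, using the slide's rounded entry counts (variable names are illustrative):

```python
dense_entries = 3_700_000_000   # rounded cell count from the slide
sparse_entries = 20_000_000     # ratings actually observed

rating_bytes = 1
id_bytes = 4                    # one 32-bit integer

dense_bytes = dense_entries * rating_bytes
# Each sparse entry stores the rating plus its movie_id and user_id coordinates.
sparse_bytes = sparse_entries * (rating_bytes + 2 * id_bytes)
ratio = dense_bytes / sparse_bytes

print(dense_bytes / 1e9, "GB dense")
print(sparse_bytes / 1e6, "MB sparse")
print(round(ratio, 1), "x smaller")
```

Each sparse entry costs 9 bytes instead of 1, but there are about 185 times fewer of them, which is where the roughly 20x saving comes from.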

Page 17: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

Matrix Factorization

Page 18: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

MF as Math

Sparse Behavior Matrix (Items x Users, IxU) ≈ Item Embeddings (IxD) x User Embeddings (DxU)
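To make the factorization concrete, here is a minimal pure-Python sketch that fits item and user embeddings to a handful of known ratings with stochastic gradient descent and then multiplies them back out. It is not the talk's implementation: the toy ratings, the embedding size D = 2, the learning rate, and all function names are assumptions chosen for illustration.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mse(known, item_emb, user_emb):
    """Mean squared error over the known (item, user, rating) entries only."""
    return sum((dot(item_emb[i], user_emb[u]) - r) ** 2 for i, u, r in known) / len(known)

def init(rows, dim, rng):
    """Small random starting values for an embedding table with `rows` rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]

def sgd(known, item_emb, user_emb, steps=5000, lr=0.02, seed=1):
    """Stochastic gradient descent on squared error, touching only observed cells."""
    rng = random.Random(seed)
    for _ in range(steps):
        i, u, r = rng.choice(known)
        err = dot(item_emb[i], user_emb[u]) - r
        for d in range(len(item_emb[i])):
            item_d, user_d = item_emb[i][d], user_emb[u][d]
            item_emb[i][d] -= lr * err * user_d
            user_emb[u][d] -= lr * err * item_d

# Sparse behavior matrix as (item, user, rating) triples: 3 items x 4 users, 7 known cells.
known = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 2, 2), (2, 1, 1), (2, 3, 5), (0, 3, 2)]
rng = random.Random(0)
item_emb = init(3, 2, rng)   # I x D
user_emb = init(4, 2, rng)   # U x D, i.e. the transpose of the slide's D x U

error_before = mse(known, item_emb, user_emb)
sgd(known, item_emb, user_emb)
error_after = mse(known, item_emb, user_emb)

# The product fills in every cell, including those with no observed rating.
predicted = [[round(dot(item_emb[i], user_emb[u]), 2) for u in range(4)] for i in range(3)]
print(error_before, error_after)
```

Only the known cells drive the updates, so the cost of training scales with the 20M observed ratings rather than the 3.7B possible ones.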

Page 19: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

Embeddings

Emb(“The Karate Kid”) = [-3.168, -0.136, 3.770, 4.767, 3.558, -4.168, 0.464, 2.034, 3.411, 0.866]

Page 20: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

Embeddings

Emb(“The Karate Kid”) = [-3.168, -0.136, 3.770, 4.767, 3.558, -4.168, 0.464, 2.034, 3.411, 0.866]

Emb(“Ferris Bueller”) = [-3.101, -0.057, 3.800, 4.862, 3.632, -4.157, 0.549, 2.064, 3.428, 0.884]

D(Emb(“K.Kid”) – Emb(“Ferris”)) = 0.138

D(Emb(“K.Kid”) – Emb(“My Little Pony”)) = 1.572
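The slide does not say which distance function D is. As an illustration only, the sketch below compares the two embeddings shown with Euclidean distance and cosine similarity; both are assumptions, so the printed values need not match the 0.138 quoted above. The qualitative point is the same: similar movies end up close together in embedding space.

```python
import math

karate_kid = [-3.168, -0.136, 3.770, 4.767, 3.558, -4.168, 0.464, 2.034, 3.411, 0.866]
ferris_bueller = [-3.101, -0.057, 3.800, 4.862, 3.632, -4.157, 0.549, 2.064, 3.428, 0.884]

def euclidean(a, b):
    """Straight-line distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

distance = euclidean(karate_kid, ferris_bueller)
similarity = cosine_similarity(karate_kid, ferris_bueller)
print(round(distance, 3), round(similarity, 5))
```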

Page 21: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

MXNet

Page 22: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

22

Page 23: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

p2.xlarge

4,300,000,000,000 32-bit floating point operations/second

Page 24: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

GPUs: Feeding the beast

[Diagram: GPU Cores <-> GPU RAM at 240 GB/s; CPU <-> GPU over PCI at ~10 GB/s; Ethernet at 2.5 GB/s]
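To get a feel for why the slower links matter, the sketch below estimates how long it would take to move the 180 MB sparse ratings matrix from the earlier slide across each link. It assumes the 240 GB/s figure is the GPU RAM to GPU core bandwidth and ignores latency and protocol overhead, so treat the results as rough lower bounds.

```python
# Bandwidths quoted on the slide, in bytes per second.
links = {
    "GPU RAM -> GPU cores": 240e9,   # assumed reading of the 240 GB/s label
    "CPU -> GPU over PCI": 10e9,
    "Ethernet": 2.5e9,
}
payload_bytes = 180e6  # the sparse MovieLens 20M matrix from the storage slide

# Transfer time in milliseconds = bytes / (bytes per second) * 1000.
transfer_ms = {name: 1000 * payload_bytes / bandwidth for name, bandwidth in links.items()}

for name, ms in transfer_ms.items():
    print(f"{name}: {ms:.2f} ms")
```

The same data that the GPU can stream from its own memory in under a millisecond takes tens of milliseconds to arrive over PCI or the network, so keeping data on the GPU is what keeps the cores busy.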

Page 25: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

p2.16xlarge

[Diagram: one CPU connected to 16 GPUs over PCIx at ~10 GB/s; Ethernet at 2.5 GB/s]

Page 26: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

MXNet scaling

26

Page 27: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

MF as a neural network (NN)

[Diagram: User (1-hot) -> embed -> User Embedding; Item (1-hot) -> embed -> Item Embedding; User Embedding and Item Embedding -> Dot Product -> Rating]
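The network above is small enough to write out by hand. The following pure-Python sketch (not MXNet code; the tables and function names are invented for the example) shows that multiplying a 1-hot vector by an embedding table is the same as looking up one row, and that the predicted rating is the dot product of the two looked-up rows.

```python
def one_hot(index, size):
    """Vector of zeros with a single 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def embed(one_hot_vec, table):
    """Multiply a 1-hot row vector by an embedding table of shape (size x dim)."""
    dim = len(table[0])
    return [sum(one_hot_vec[i] * table[i][d] for i in range(len(table))) for d in range(dim)]

def predict_rating(user_id, item_id, user_table, item_table):
    """Embed both ids, then take the dot product of the two embeddings."""
    user_vec = embed(one_hot(user_id, len(user_table)), user_table)
    item_vec = embed(one_hot(item_id, len(item_table)), item_table)
    return sum(a * b for a, b in zip(user_vec, item_vec))

# Hypothetical learned tables: 2 users and 3 items, embedding size 2.
user_table = [[1.0, 0.5], [0.2, 2.0]]
item_table = [[2.0, 1.0], [0.5, 1.5], [1.0, 0.0]]

rating = predict_rating(1, 0, user_table, item_table)  # 0.2*2.0 + 2.0*1.0
print(rating)
```

In practice a framework's embedding layer (in MXNet, `mx.sym.Embedding`) performs the row lookup directly from the integer id, so the 1-hot vector is never materialized.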

Page 28: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

Deep Learning AMI with p2

Pre-installed:

• MXNet & other popular deep learning frameworks

• GPU Drivers, CUDA, cuDNN

• Jupyter notebook & python libraries

Page 30: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

Binary Predictions

Page 31: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

Why binary?

31

Page 32: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

Binary user-item matrix

32

Page 33: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

Original data

33

Page 34: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

Predicting binary

float predictScore(User u, Movie m) {
    return 1.0;
}

Page 35: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

Original data

35

Page 36: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

Negative sampling

36

Page 37: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

Negative sampling

from mxreco import NegativeSamplingDataIter

train_data = NegativeSamplingDataIter(
    train_data,
    sample_ratio=5)

More details: BlackOut: Speeding up RNNLM w/ Very Large Vocabularies

Shihao Ji, S. V. N. Vishwanathan, Nadathur Satish, Michael J. Anderson, Pradeep Dubey
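The internals of `NegativeSamplingDataIter` are not shown in the slides. As a rough sketch of the general idea, the function below pairs each observed (user, item) interaction with `sample_ratio` randomly chosen items the user did not interact with, labelling positives 1 and sampled negatives 0. The function name, the uniform sampling, and the toy data are assumptions for illustration.

```python
import random

def add_negative_samples(positives, num_items, sample_ratio=5, seed=0):
    """Return (user, item, label) rows: each positive plus sample_ratio random negatives.

    Assumes every user has far fewer positives than num_items; otherwise the
    rejection loop below could run for a long time.
    """
    rng = random.Random(seed)
    seen = set(positives)
    rows = []
    for user, item in positives:
        rows.append((user, item, 1))
        added = 0
        while added < sample_ratio:
            candidate = rng.randrange(num_items)
            if (user, candidate) not in seen:   # skip items the user did interact with
                rows.append((user, candidate, 0))
                added += 1
    return rows

# Toy binary interactions (hypothetical): user 0 watched items 1 and 2, user 1 watched item 3.
positives = [(0, 1), (0, 2), (1, 3)]
rows = add_negative_samples(positives, num_items=50)
print(len(rows), sum(label for _, _, label in rows))
```

Without the sampled zeros, a model trained only on observed interactions could score perfectly by always returning 1.0, which is the degenerate predictor shown on the "Predicting binary" slide.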

Page 39: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

Content Features

Page 40: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

What do we know?

Behavioral interactions between users & items

Names of items

Pictures of items

What users searched for

40

Page 41: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

How to represent these in NN?

Unique Identifier: Embedding

Images: ConvNet (a.k.a. CNN)

Text: LSTM

Text: Bag of Words

41

Page 42: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

Deep Structured Semantic Model (DSSM)

[Diagram: Left Object -> Deep Net -> Embedding; Right Object -> Deep Net -> Embedding; both Embeddings -> Similarity; Label]

Page 43: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

CosineLoss layer

import mxreco

pred = mxreco.CosineLoss(a=user, b=item,
                         label=label)

[Illustration: three cases labelled L ≈ 0, L ≈ 1, L ≈ 2]
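The three loss values on this slide (about 0, 1, and 2) are consistent with a loss of 1 minus the cosine similarity for a pair that should match. The exact definition inside `mxreco.CosineLoss`, including how it uses `label`, is not given, so the following is only a sketch of that positive-pair case with made-up function names.

```python
import math

def cosine(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cosine_loss_positive(a, b):
    """Assumed loss for a pair that should be similar: 0 when aligned, 2 when opposite."""
    return 1.0 - cosine(a, b)

aligned = cosine_loss_positive([1.0, 0.0], [2.0, 0.0])      # same direction
orthogonal = cosine_loss_positive([1.0, 0.0], [0.0, 3.0])   # 90 degrees apart
opposite = cosine_loss_positive([1.0, 0.0], [-1.0, 0.0])    # opposite directions
print(aligned, orthogonal, opposite)
```

Because cosine similarity ignores vector length, this kind of loss only cares about the direction of the user and item embeddings, not their magnitude.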

Page 45: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

Inspirational References

Learning Deep Structured Semantic Models for Web Search using Clickthrough Data
• Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, Larry Heck, October 2013

Deep Neural Networks for YouTube Recommendations
• Paul Covington, Jay Adams, Emre Sargin, 2016

Order-Embeddings of Images and Language
• Ivan Vendrov, Ryan Kiros, Sanja Fidler, Raquel Urtasun, March 2016

Page 46: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

User-Level Models

Page 47: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

Predicting with embeddings

def movies_for_user(u):
    scores = {}
    for m in movies:
        scores[m.id] = predictScore(u, m)
    top_movies = sorted(scores.items()…)
    return top_movies
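The slide elides the arguments to `sorted`, so the version below is one possible runnable reading rather than the speaker's code. It assumes scores should be ranked from highest to lowest, adds a hypothetical `top_n` cutoff, and uses a dot product over toy embeddings as the scoring function.

```python
def movies_for_user(u, movies, predict_score, top_n=2):
    """Score every movie for user u and return the ids of the top_n highest scores."""
    scores = {m: predict_score(u, m) for m in movies}
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return [movie for movie, _ in ranked[:top_n]]

# Toy embeddings (hypothetical values) for one user and four movies.
user_emb = {"u1": [1.0, 0.0]}
movie_emb = {"a": [0.9, 0.1], "b": [0.1, 0.9], "c": [0.5, 0.5], "d": [-1.0, 0.0]}

def predict_score(u, m):
    """Dot product of the user and movie embeddings."""
    return sum(x * y for x, y in zip(user_emb[u], movie_emb[m]))

top_movies = movies_for_user("u1", movie_emb.keys(), predict_score)
print(top_movies)
```

This loop calls the model once per movie, which is the cost the next slide avoids by predicting all movies at once.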

Page 48: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

All content at once

def movies_for_user(u):
    scores = userModel.predict(u)
    top_movies = sorted(scores.items()…)
    return top_movies

[Diagram: GPU Cores <-> GPU RAM at 240 GB/s; CPU <-> GPU over PCI at ~10 GB/s; Ethernet at 2.5 GB/s]

Page 49: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

Multi-label neural network

[Diagram: Input Bag of Movies (sparse input) -> Hidden Units -> Movie Probabilities -> Loss & Gradient, computed against the Output Bag of Movies (sparse output). Weight matrices are labelled UxN and NxU.]
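A forward pass through a network of this shape can be sketched without any framework. The code below is an illustration under several assumptions that the slide does not state: N is the number of movies, U is the number of hidden units, the hidden layer uses ReLU, the output uses a sigmoid per movie, and the loss is binary cross-entropy. All names and sizes are hypothetical.

```python
import math
import random

def forward(input_bag, w_in, w_out):
    """input_bag: movie indexes the user has seen; w_in is N x U; w_out is U x N."""
    hidden_size = len(w_in[0])
    num_movies = len(w_out[0])
    # Sparse input: summing the rows of w_in for the movies present gives the
    # same result as multiplying a multi-hot vector by w_in.
    hidden = [max(0.0, sum(w_in[m][h] for m in input_bag)) for h in range(hidden_size)]
    logits = [sum(hidden[h] * w_out[h][n] for h in range(hidden_size)) for n in range(num_movies)]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]   # one probability per movie

def multilabel_loss(probs, output_bag):
    """Binary cross-entropy summed over all movies; output_bag lists the positives."""
    targets = set(output_bag)
    return -sum(math.log(p if n in targets else 1.0 - p) for n, p in enumerate(probs))

rng = random.Random(0)
num_movies, hidden_units = 8, 3
w_in = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_units)] for _ in range(num_movies)]
w_out = [[rng.uniform(-0.5, 0.5) for _ in range(num_movies)] for _ in range(hidden_units)]

probs = forward(input_bag=[0, 3, 5], w_in=w_in, w_out=w_out)
loss = multilabel_loss(probs, output_bag=[1, 6])
print([round(p, 3) for p in probs], round(loss, 3))
```

One forward pass yields a probability for every movie, which is what lets `userModel.predict(u)` replace the per-movie scoring loop.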

Page 50: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

Storing indexes

Conceptually:

• Predict: 1882, 2808, 24, 160, 1831, 2668

• Inputs: 2986, 329, 2012, 442, 512, 1544, 2615, 1037, 1876, 1917, 2532, 196, 1375, 1779, 2054, 2530, 2628, 1909, 2407, 316, 1356, 1603, 2046, 2428

Page 51: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

Storing sparse data

Simpler if fixed width

Pad to end with “-1”

• Predict: 1882, 2808, 24, 160, 1831, 2668, -1, -1, -1, -1

• Inputs: 2986, 329, 2012, 442, 512, 1544, 2615, 1037, 1876, 1917, 2532, 196, 1375, 1779, 2054, 2530, 2628, 1909, 2407, 316, 1356, 1603, 2046, 2428, -1, -1, -1, -1, -1, -1, -1, -1
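The padding scheme on this slide is straightforward to reproduce. The helpers below are a sketch with invented names; the widths of 10 and 32 are simply what the slide's padded rows work out to.

```python
PAD = -1  # sentinel value; safe because real movie indexes are never negative

def pad_indexes(indexes, width, pad_value=PAD):
    """Right-pad a variable-length list of indexes to a fixed width."""
    if len(indexes) > width:
        raise ValueError(f"{len(indexes)} indexes do not fit in width {width}")
    return list(indexes) + [pad_value] * (width - len(indexes))

def unpad(row, pad_value=PAD):
    """Recover the original indexes by dropping the padding."""
    return [i for i in row if i != pad_value]

predict = [1882, 2808, 24, 160, 1831, 2668]
inputs = [2986, 329, 2012, 442, 512, 1544, 2615, 1037, 1876, 1917, 2532, 196,
          1375, 1779, 2054, 2530, 2628, 1909, 2407, 316, 1356, 1603, 2046, 2428]

predict_row = pad_indexes(predict, width=10)
inputs_row = pad_indexes(inputs, width=32)
print(predict_row)
print(len(inputs_row))
```

Fixed-width rows can be stacked into ordinary dense arrays for batching, while the model treats -1 as "no movie here".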

Page 52: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

Trying It Yourself

Page 53: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

Trying it yourself

Launch Deep Learning AMI

https://aws.amazon.com/marketplace/pp/B01M0AXXQB

Try examples in

https://github.com/dmlc/mxnet/example/recommender

Page 54: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

Thank you!

Page 55: AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

Remember to complete your evaluations!