readinggroup xiang 24112016

Reading Group — Deep Crossing: Web-Scale Modeling without Manually Crafted Combinatorial Features. Presenter: Xiang Zhang. 05/07/2022

Upload: xiang-zhang

Post on 20-Mar-2017


TRANSCRIPT

Page 1: Readinggroup xiang 24112016


Reading Group

Deep Crossing: Web-Scale Modeling without Manually Crafted Combinatorial Features

Presenter: Xiang Zhang

02/05/2023

Page 2

1. Background
2. Abstract
3. Why choose it
4. What's the main idea
5. How it works
6. Implementation
7. Experimentation
8. Conclusions
9. Experience

Main Content


Page 3

1. Background


Deep Crossing: Web-Scale Modeling without Manually Crafted Combinatorial Features

Authors: Microsoft Research

Keywords: Neural Networks, DL, CNN, etc.

Conference: KDD 2016

Page 4

1. Background


KDD (A*): Conference on Knowledge Discovery and Data Mining

SIGKDD: Special Interest Group on Knowledge Discovery and Data Mining

KDD 2017: Aug. 13-17, Halifax, Canada; submission deadline (DDL): December 9, 2016

Page 5

2. Abstract

Combining features is important and useful, but crafting them manually is time-consuming, requires experience, and offers no accuracy guarantee, especially with features of large scale, variety, and volume.

Contribution: a deep neural network that combines features automatically.

Tool: Computational Network Toolkit, on a multi-GPU platform.


Page 6

3. Why choose it

Automatically combines features (reduces dimensionality)

Web-scale (massive data)

Features of different types and dimensions

Better performance


Page 7

4. What's the main idea

What is feature extraction?

Individual features

An individual measurable property of a phenomenon being observed. (Representation of the data)

Combinatorial features

Defined in the joint space of individual features; they make the model simpler, shorten training time, and improve generalization.


Page 8

Manually:
• Time
• Experience

Automatically

4. What's the main idea

Page 9

5.How it works


Model Architecture

Page 10

5.How it works

5.1 Embedding layers

X_j^O = max(0, W_j X_j^I + b_j)

where X_j^I ∈ R^(n_j) is the input feature, W_j ∈ R^(m_j × n_j) the weight matrix, b_j ∈ R^(m_j) the bias, and X_j^O ∈ R^(m_j) the embedded output. Choosing m_j < n_j reduces the dimensionality of feature j.

Page 11

5.1 Embedding layers

X_j^O = max(0, W_j X_j^I + b_j)

Rectified linear unit (ReLU): all output elements are non-negative.

Activation function: defines the output of a node given its input. Common choices:
• ReLU
• Logistic
• Tanh
• Sigmoid

5.How it works
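As a sketch, the embedding layer above can be written in a few lines of NumPy; the layer sizes below are illustrative, not taken from the paper:

```python
import numpy as np

def embedding_layer(x, W, b):
    # X_j^O = max(0, W_j X_j^I + b_j): a ReLU-activated linear map
    return np.maximum(0.0, W @ x + b)

# Illustrative sizes: embed a 10,000-dim one-hot feature down to 256 dims.
rng = np.random.default_rng(0)
n_j, m_j = 10_000, 256
W = rng.normal(scale=0.01, size=(m_j, n_j))
b = np.zeros(m_j)
x = np.zeros(n_j)
x[42] = 1.0                      # one-hot input feature
out = embedding_layer(x, W, b)   # 256-dim, all elements non-negative (ReLU)
```

The ReLU guarantees every element of the output is non-negative, as the slide notes.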

Page 12

5.How it works


5.2 Stacking layers

X^O = [X_0^O, X_1^O, …, X_K^O]

Page 13

5.2 Stacking layers

X^O = [X_0^O, X_1^O, …, X_K^O]

Stacking rules (threshold = 256):
• n_j > 256: set m_j = 256, embed, then stack
• n_j ≤ 256: stack directly, without embedding

Feature | Number
Inputs | K
Embedded then stacked | n
Stacked without embedding | K − n
Stacked in total | K

5.How it works
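A minimal sketch of the stacking rule: the 256 threshold comes from the slide, while the embedding parameters and feature sizes here are illustrative:

```python
import numpy as np

THRESHOLD = 256  # features wider than this are embedded first (per the slide)

def stack_features(features, embed_params):
    """Embed high-dimensional features, pass low-dimensional ones through,
    then concatenate everything into one stacked vector X^O."""
    outs = []
    for j, x in enumerate(features):
        if x.shape[0] > THRESHOLD:
            W, b = embed_params[j]                  # embedding params for feature j
            outs.append(np.maximum(0.0, W @ x + b))
        else:
            outs.append(x)                          # stacked without embedding
    return np.concatenate(outs)

# Illustrative: one 1,000-dim feature (embedded) and one 10-dim feature (not).
rng = np.random.default_rng(0)
f0 = rng.random(1_000)
f1 = rng.random(10)
params = {0: (rng.normal(scale=0.01, size=(256, 1_000)), np.zeros(256))}
stacked = stack_features([f0, f1], params)   # length 256 + 10 = 266
```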

Page 14

5.3 Residual layers

X^O = F(X^I, {W_0, W_1}, {b_0, b_1}) + X^I

• Inputs and outputs have the same size
• This is the first use of the Residual Unit beyond image recognition

5.How it works
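A sketch of one residual unit, assuming F is two ReLU-activated linear layers with the input added back before the final activation; the sizes and weights are illustrative:

```python
import numpy as np

def residual_unit(x, W0, b0, W1, b1):
    # F(X^I) is two linear layers with a ReLU in between; the input x is
    # added back (skip connection), so input and output share the same size.
    h = np.maximum(0.0, W0 @ x + b0)          # inner ReLU layer
    return np.maximum(0.0, W1 @ h + b1 + x)   # ReLU after the skip connection

# Illustrative: a 266-dim stacked vector through one residual unit.
rng = np.random.default_rng(0)
d, hidden = 266, 512
x = rng.random(d)
W0, b0 = rng.normal(scale=0.01, size=(hidden, d)), np.zeros(hidden)
W1, b1 = rng.normal(scale=0.01, size=(d, hidden)), np.zeros(d)
y = residual_unit(x, W0, b0, W1, b1)          # same size as the input
```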

Page 15

5.How it works


5.4 Scoring layers

Sigmoid function: σ(z) = 1 / (1 + e^(−z)), mapping the final score into (0, 1).
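The scoring layer's sigmoid squashes a real-valued score into a probability; a one-line sketch:

```python
import math

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^(-z)): maps any real score into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

p = sigmoid(0.0)   # a score of 0 maps to probability 0.5
```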

Page 16

5.How it works


5.5 Objective function

logloss = −(1/N) Σ_{i=1}^{N} [ y_i log(p_i) + (1 − y_i) log(1 − p_i) ]

Objective function: the loss function or its negative

N: number of samples; y_i: sample label; p_i: model output (prediction)
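The log loss above is a direct average over samples; a minimal NumPy sketch:

```python
import numpy as np

def log_loss(y, p):
    # -(1/N) * sum( y_i*log(p_i) + (1 - y_i)*log(1 - p_i) )
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

# A maximally unsure model (p = 0.5 everywhere) scores log(2) ≈ 0.693.
loss = log_loss([1, 0, 1], [0.5, 0.5, 0.5])
```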

Page 17

5.6 Early Crossing vs. Late Crossing

Deep Crossing vs. DSSM (Deep Semantic Similarity Model)

5.How it works

Page 18

6.Implementation


Software: Computational Network Toolkit (CNTK)

Same theoretical foundation as TensorFlow

Hardware: multi-GPU platform

Training time: from 24 days (1 GPU) down to 20 hours (32 GPUs)

Page 19

7.Experimentation

7.1 Dataset

Page 20

7.Experimentation

7.2 Performance on a Pair of Text Inputs

DSSM: late crossing

DC: early crossing

DC > DSSM

Page 21

7.Experimentation


7.2 Performance on a Pair of Text Inputs

Production model: a model already used in sponsored search, serving as the baseline

DSSM < DC < Production

DC's main advantage: handling many individual features

Page 22

7.Experimentation


7.3 Beyond Text Input

Using all features works best

Page 23

7.Experimentation


7.3 Beyond Text Input

The counting feature alone is weak

Page 24

7.Experimentation


7.3 Beyond Text Input

The counting feature is useful

Page 25

7.Experimentation


7.3 Beyond Text Input

Performance varies considerably as the number of features changes, and log loss fluctuates widely across different feature combinations, so feature selection is meaningful.

Page 26

7.Experimentation


7.4 Comparison with Production Models

2.2 billion samples

DC performs better with a much smaller dataset

DC is easier to build and maintain

Page 27

8.Conclusions

Deep Crossing works well for automatic feature combination at large scale

It requires less time and less hand-crafting experience


Page 28

9.Experience


• Deep learning models (LSTM, CNN, etc.) can also extract features automatically; we could compare their efficiency against this model

• We could train on raw data instead of hand-picked individual features

• The approach could be applied in other domains, such as mobile sensing and recommender systems

Page 29

Thanks!

Page 30

Questions?