
Source: corprel.iitd.ac.in/id2018/assets/file/posters/AI ML...

Abstract

• Extreme classification deals with problems involving an extremely large number of labels, of the order of 10^6.

• The SwiftXML algorithm is developed to tackle such warm-start applications by leveraging label features.

• SwiftXML improves upon the state-of-the-art tree based extreme classifiers by partitioning tree nodes using two hyperplanes learnt jointly in the label and data point feature spaces.

Introduction

• Existing Extreme Classification methods do not leverage label feature data.

• Existing warm start methods do not scale to extreme settings.

Materials and Methods

Conclusions

• SwiftXML can leverage item features, which provide a rich and complementary source of information to the user features relied on by traditional extreme classifiers.

• SwiftXML improves over existing warm-start classifiers as well as XML classifiers, and scales to a large number of labels.

School of Information Technology, IITD

Industrial Significance

• Applications in ads, tagging, recommendation, etc.

• In a live deployment for sponsored search on Bing, SwiftXML could increase the relative CTR (click-through rate) by 10% while simultaneously reducing the relative BR (bounce rate) by 30%.

Technology Readiness Level: Ready!

(Y. Prabhu, A. Kag, S. Gopinath, K. Dahiya, S. Harsola, R. Agrawal and M. Varma*)

Kunal Dahiya (Ph.D. Scholar)

Experiments/Results

• Word2Vec for label features.

• Up to 14% more accurate predictions.

• Performs better than early/late fusion.
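The label features above come from Word2Vec-style word embeddings. As a minimal sketch of the idea (the toy vectors and the `label_feature` helper are illustrative, not from the poster): a label's feature vector can be formed by averaging the embeddings of its words.

```python
import numpy as np

# Toy word-embedding table standing in for a trained Word2Vec model.
word_vecs = {
    "machine":  np.array([0.9, 0.1, 0.0]),
    "learning": np.array([0.8, 0.2, 0.1]),
    "cricket":  np.array([0.0, 0.9, 0.8]),
}

def label_feature(label_text, vecs, dim=3):
    """Represent a label by the mean of its words' embeddings."""
    words = [w for w in label_text.lower().split() if w in vecs]
    if not words:
        return np.zeros(dim)
    return np.mean([vecs[w] for w in words], axis=0)

z = label_feature("Machine Learning", word_vecs)  # mean of two word vectors
```

Any embedding that places semantically similar labels nearby would serve the same role as the item feature data Z.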

Bing Sponsored Search Deployment

SwiftXML predictions generated up to 8% higher click-through rates and 9% lower bounce rates.

Industry Day Theme #3: Artificial Intelligence, Machine

Learning and Blockchain Technologies (AMB)

[Figure: SwiftXML setup. Users are described by user feature data (X), labels (items) by label feature data (Z), and the two are linked through the rating matrix (R). Each tree node partitions users with two hyperplanes, one in the user feature space and one in the item feature space.]

Node objective:

$$\min_{w_x, w_z, \delta, r^{\pm}} \; \|w_x\|_1 + \|w_z\|_1 + C \sum_i \log\!\left(1 + e^{-\delta_i\left(\alpha\, w_x^{t} x_i + (1-\alpha)\, w_z^{t} z_i\right)}\right) - C_r \sum_i \mathcal{L}_{\mathrm{nDCG}@L}\!\left(r_{\delta_i}, y_i\right)$$

where $w_x \in R^{D_x}$, $w_z \in R^{D_z}$, $\delta \in \{-1, +1\}^L$, $r^{+}, r^{-} \in \Pi(1, L)$.

• $w_x$: user feature hyperplane
• $w_z$: item feature hyperplane
• $\delta$: partition assignment of users
• $r^{+}, r^{-}$: label ranking in each partition
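The logistic term in the node objective blends the user-feature and item-feature hyperplanes through the mixing weight α. A minimal numpy sketch of evaluating that term for a candidate partition assignment (toy data; the `logistic_partition_loss` helper and all values are illustrative, not the authors' implementation):

```python
import numpy as np

def logistic_partition_loss(delta, wx, wz, X, Z, alpha=0.5, C=1.0):
    """C * sum_i log(1 + exp(-delta_i * (alpha*wx.x_i + (1-alpha)*wz.z_i)))."""
    margins = alpha * (X @ wx) + (1 - alpha) * (Z @ wz)
    return C * np.sum(np.log1p(np.exp(-delta * margins)))

# Toy node with two users; z_i aggregates item features of user i's labels.
X = np.array([[1.0, 0.0], [0.0, 1.0]])   # user features x_i
Z = np.array([[1.0, 0.0], [0.0, 1.0]])   # aggregated item features z_i
wx = np.array([1.0, -1.0])               # user feature hyperplane
wz = np.array([1.0, -1.0])               # item feature hyperplane

# A partition that agrees with both hyperplanes costs less than one that does not.
good = logistic_partition_loss(np.array([+1.0, -1.0]), wx, wz, X, Z)
bad = logistic_partition_loss(np.array([-1.0, +1.0]), wx, wz, X, Z)
```

Minimizing this term pushes each user to the side of the node whose sign matches the blended margin; the ranking-loss term then refines the split by label relevance.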

[Figure: Precision@5 on the Wiki10 dataset at 20%, 40%, 60% and 80% labels revealed, comparing WRMF, PfastreXML, SLEEC, PDSparse, DiSMEC, IMC and SwiftXML.]
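Precision@5, the metric reported throughout these plots, is the fraction of the top-5 predicted labels that are actually relevant. A minimal sketch (the `precision_at_k` helper name and toy inputs are illustrative):

```python
def precision_at_k(predicted_ranking, relevant_labels, k=5):
    """Fraction of the top-k predicted labels that are truly relevant."""
    top_k = predicted_ranking[:k]
    return sum(1 for label in top_k if label in relevant_labels) / k

# Two of the top five predictions are in the relevant set -> 0.4.
p5 = precision_at_k(["a", "b", "c", "d", "e"], {"a", "c", "x"}, k=5)
```

Precision@k is the standard metric in extreme classification because only a handful of the O(10^6) labels can be shown to a user.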

[Figure: Precision@5 on the Wikipedia-500K dataset at 20%, 40%, 60% and 80% labels revealed, comparing PfastreXML, PfastreXML-early, PfastreXML-late and SwiftXML.]

[Figure: Precision@5 on the Wikipedia-500K dataset at 20%, 40%, 60% and 80% labels revealed, comparing PfastreXML and SwiftXML.]

[Figure: Warm-start evaluation setup. The user feature data and rating matrix are split into train and test data, with N = 20, 40, 60, 80 percent of each test user's labels revealed at prediction time. Evaluation on datasets.]
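In this warm-start protocol, only N% of each test user's labels are revealed to the model and the rest must be predicted. A minimal sketch of such a split (the `reveal_split` helper is illustrative, not the authors' evaluation code):

```python
import random

def reveal_split(labels, revealed_fraction, seed=0):
    """Split a user's labels into a revealed part and a held-out part."""
    rng = random.Random(seed)        # fixed seed for a reproducible split
    shuffled = labels[:]
    rng.shuffle(shuffled)
    n_reveal = int(round(revealed_fraction * len(shuffled)))
    return shuffled[:n_reveal], shuffled[n_reveal:]

# Reveal 40% of a toy user's 10 labels; predict the remaining 60%.
revealed, held_out = reveal_split(list(range(10)), 0.4)
```

Precision@5 is then computed on the held-out portion, which is why accuracy rises as a larger fraction is revealed.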

Algorithm       Relative CTR   Relative QOA   Relative BR
Bing-ensemble   100            100            100
PfastreXML      102            103            76
SwiftXML        110            112            69


References

• Y. Prabhu, A. Kag, S. Gopinath, K. Dahiya, S. Harsola, R. Agrawal and M. Varma. Extreme multi-label learning with label features for warm-start tagging, ranking and recommendation. In WSDM 2018.

• H. Jain, Y. Prabhu and M. Varma. Extreme multi-label loss functions for recommendation, tagging, ranking & other missing label applications. In KDD 2016.

• T. Mikolov, I. Sutskever, K. Chen, G. Corrado and J. Dean. Distributed representations of words and phrases and their compositionality. In CoRR 2013.

• R. Babbar and B. Schölkopf. DiSMEC: Distributed sparse machines for extreme multi-label classification. In WSDM 2017.

Acknowledgement

Joint work with Y. Prabhu, A. Kag, S. Gopinath, S. Harsola, R. Agrawal and M. Varma*

[Figure: Precision@5 on the Amazon-670K dataset at 20%, 40%, 60% and 80% labels revealed, comparing PfastreXML, SLEEC, DiSMEC and SwiftXML.]