
Source: corprel.iitd.ac.in/id2018/assets/file/posters/AI ML...

Abstract

• Extreme classification deals with problems involving an extremely large number of labels, of the order of 10^6.

• The SwiftXML algorithm is developed to tackle such warm-start applications by leveraging label features.

• SwiftXML improves upon the state-of-the-art tree based extreme classifiers by partitioning tree nodes using two hyperplanes learnt jointly in the label and data point feature spaces.

Introduction

• Existing Extreme Classification methods do not leverage label feature data.

• Existing warm start methods do not scale to extreme settings.

Materials and Methods

Conclusions

• SwiftXML can leverage item features, which provide a rich and complementary source of information to the user features relied on by traditional extreme classifiers.

• SwiftXML improves over existing warm-start classifiers as well as XML classifiers, and scales to a large number of labels.

School of Information Technology, IITD

Industrial Significance

• Applications in ads, tagging, recommendation, etc.

• In a live deployment for sponsored search on Bing, SwiftXML could increase the relative CTR (click-through rate) by 10% while simultaneously reducing the relative BR (bounce rate) by 30%.

Technology Readiness Level: Ready!

(Y. Prabhu, A. Kag, S. Gopinath, K. Dahiya, S. Harsola, R. Agrawal and M. Varma*)

Kunal Dahiya (Ph.D. Scholar)

Experiments/Results

• Word2Vec for label features.

• Up to 14% more accurate predictions.

• Performs better than early/late fusion.
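The label features above come from Word2Vec-style word embeddings. As a minimal sketch of the idea (the toy vectors and the `label_feature` helper are illustrative, not from the poster): a label's feature vector can be formed by averaging the embeddings of its words.

```python
import numpy as np

# Toy word-embedding table standing in for a trained Word2Vec model.
word_vecs = {
    "machine":  np.array([0.9, 0.1, 0.0]),
    "learning": np.array([0.8, 0.2, 0.1]),
    "cricket":  np.array([0.0, 0.9, 0.8]),
}

def label_feature(label_text, vecs, dim=3):
    """Represent a label by the mean of its words' embeddings."""
    words = [w for w in label_text.lower().split() if w in vecs]
    if not words:
        return np.zeros(dim)
    return np.mean([vecs[w] for w in words], axis=0)

z = label_feature("Machine Learning", word_vecs)  # mean of two word vectors
```

Any embedding that places semantically similar labels nearby would serve the same role as the item feature data Z.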

Bing Sponsored Search Deployment

SwiftXML predictions generated up to 8% higher click-through rates and 9% lower bounce rates.

Industry Day Theme #3: Artificial Intelligence, Machine

Learning and Blockchain Technologies (AMB)

[Figure: SwiftXML setup. Users are described by user feature data (X), labels (items) by label feature data (Z), and the two are linked through the rating matrix (R). Each tree node partitions users with two hyperplanes, one in the user feature space and one in the item feature space.]

Node objective:

$$\min_{w_x, w_z, \delta, r^{\pm}} \; \|w_x\|_1 + \|w_z\|_1 + C \sum_i \log\!\left(1 + e^{-\delta_i\left(\alpha\, w_x^{t} x_i + (1-\alpha)\, w_z^{t} z_i\right)}\right) - C_r \sum_i \mathcal{L}_{\mathrm{nDCG}@L}\!\left(r_{\delta_i}, y_i\right)$$

where $w_x \in R^{D_x}$, $w_z \in R^{D_z}$, $\delta \in \{-1, +1\}^L$, $r^{+}, r^{-} \in \Pi(1, L)$.

• $w_x$: user feature hyperplane
• $w_z$: item feature hyperplane
• $\delta$: partition assignment of users
• $r^{+}, r^{-}$: label ranking in each partition
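The logistic term in the node objective blends the user-feature and item-feature hyperplanes through the mixing weight α. A minimal numpy sketch of evaluating that term for a candidate partition assignment (toy data; the `logistic_partition_loss` helper and all values are illustrative, not the authors' implementation):

```python
import numpy as np

def logistic_partition_loss(delta, wx, wz, X, Z, alpha=0.5, C=1.0):
    """C * sum_i log(1 + exp(-delta_i * (alpha*wx.x_i + (1-alpha)*wz.z_i)))."""
    margins = alpha * (X @ wx) + (1 - alpha) * (Z @ wz)
    return C * np.sum(np.log1p(np.exp(-delta * margins)))

# Toy node with two users; z_i aggregates item features of user i's labels.
X = np.array([[1.0, 0.0], [0.0, 1.0]])   # user features x_i
Z = np.array([[1.0, 0.0], [0.0, 1.0]])   # aggregated item features z_i
wx = np.array([1.0, -1.0])               # user feature hyperplane
wz = np.array([1.0, -1.0])               # item feature hyperplane

# A partition that agrees with both hyperplanes costs less than one that does not.
good = logistic_partition_loss(np.array([+1.0, -1.0]), wx, wz, X, Z)
bad = logistic_partition_loss(np.array([-1.0, +1.0]), wx, wz, X, Z)
```

Minimizing this term pushes each user to the side of the node whose sign matches the blended margin; the ranking-loss term then refines the split by label relevance.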

[Figure: Precision@5 on the Wiki10 dataset at 20%, 40%, 60% and 80% labels revealed, comparing WRMF, PfastreXML, SLEEC, PDSparse, DiSMEC, IMC and SwiftXML.]
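Precision@5, the metric reported throughout these plots, is the fraction of the top-5 predicted labels that are actually relevant. A minimal sketch (the `precision_at_k` helper name and toy inputs are illustrative):

```python
def precision_at_k(predicted_ranking, relevant_labels, k=5):
    """Fraction of the top-k predicted labels that are truly relevant."""
    top_k = predicted_ranking[:k]
    return sum(1 for label in top_k if label in relevant_labels) / k

# Two of the top five predictions are in the relevant set -> 0.4.
p5 = precision_at_k(["a", "b", "c", "d", "e"], {"a", "c", "x"}, k=5)
```

Precision@k is the standard metric in extreme classification because only a handful of the O(10^6) labels can be shown to a user.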

[Figure: Precision@5 on the Wikipedia-500K dataset at 20%, 40%, 60% and 80% labels revealed, comparing PfastreXML, PfastreXML-early, PfastreXML-late and SwiftXML.]

[Figure: Precision@5 on the Wikipedia-500K dataset at 20%, 40%, 60% and 80% labels revealed, comparing PfastreXML and SwiftXML.]

[Figure: Warm-start evaluation setup. The user feature data and rating matrix are split into train and test data, with N = 20, 40, 60, 80 percent of each test user's labels revealed at prediction time. Evaluation on datasets.]
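In this warm-start protocol, only N% of each test user's labels are revealed to the model and the rest must be predicted. A minimal sketch of such a split (the `reveal_split` helper is illustrative, not the authors' evaluation code):

```python
import random

def reveal_split(labels, revealed_fraction, seed=0):
    """Split a user's labels into a revealed part and a held-out part."""
    rng = random.Random(seed)        # fixed seed for a reproducible split
    shuffled = labels[:]
    rng.shuffle(shuffled)
    n_reveal = int(round(revealed_fraction * len(shuffled)))
    return shuffled[:n_reveal], shuffled[n_reveal:]

# Reveal 40% of a toy user's 10 labels; predict the remaining 60%.
revealed, held_out = reveal_split(list(range(10)), 0.4)
```

Precision@5 is then computed on the held-out portion, which is why accuracy rises as a larger fraction is revealed.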

Algorithm       Relative CTR   Relative QOA   Relative BR
Bing-ensemble   100            100            100
PfastreXML      102            103            76
SwiftXML        110            112            69


References

• Y. Prabhu, A. Kag, S. Gopinath, K. Dahiya, S. Harsola, R. Agrawal and M. Varma. Extreme multi-label learning with label features for warm-start tagging, ranking and recommendation. In WSDM 2018.

• H. Jain, Y. Prabhu and M. Varma. Extreme multi-label loss functions for recommendation, tagging, ranking & other missing label applications. In KDD 2016.

• T. Mikolov, I. Sutskever, K. Chen, G. Corrado and J. Dean. Distributed representations of words and phrases and their compositionality. In CoRR 2013.

• R. Babbar and B. Schölkopf. DiSMEC: Distributed sparse machines for extreme multi-label classification. In WSDM 2017.

Acknowledgement

Joint work with Y. Prabhu, A. Kag, S. Gopinath, S. Harsola, R. Agrawal and M. Varma*

[Figure: Precision@5 on the Amazon-670K dataset at 20%, 40%, 60% and 80% labels revealed, comparing PfastreXML, SLEEC, DiSMEC and SwiftXML.]