online passive-aggressive...

38
Online Passive-Aggressive Algorithms Tirgul 11

Upload: others

Post on 23-Jul-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels

Online Passive-Aggressive Algorithms

Tirgul 11

Page 2: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels

Multi-Label Classification

2

Page 3: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels

Multilabel Problem: Example

• Mapping Apps to smart folders:

3

Assign an installed app to one or more folders Candy Crush

Saga

Page 4: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels

Goal

• Given 𝐴, an installed app:

• Assign it to 𝒚 ⊆ 𝒴 the set of relevant folders:

• Where 𝒴 is the set of all folders:

4

Farm Story

SocialGames

GamesSocial Music Photography Shopping

Page 5: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels

Multilabel Classification

• A variant of the classification problem:• Multiple target labels must by assigned to each instance.

• Setting:• There are 𝑘 different possible labels: 𝒴 = 1,… , 𝑘 .

• Every instance 𝒙𝑖 is associated with a set of relevant labels 𝒚𝑖.

• Special case: • There is a single relevant label for each instance: multiclass (single-label) classification.

5

Page 6: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels

Multilabel Classification

• Other usage:• Text categorization:

• 𝒙𝑖 represents a document.

• 𝒚𝑖 is the set of topics which are relevant to the document• Chosen from a predefined collection of topics.

• E.g.: A text might be about any of religion, politics, finance or education at the same time or none of these.

6

Page 7: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels

Task

7

Assign user’s installed applications to one ore

more smart folders in an automatic way

Page 8: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels

Task

• Training set of examples:

• Each example (𝐴𝑖 , 𝒚𝑖) contains an application and a set of one or more smart folders:

8

Farm Story SocialGames

Page 9: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels

Model

9

Find a function 𝒚 = 𝑓𝒘 𝐴 with parameters 𝒘

that assigns a set of folders 𝒚 to an installed

app 𝐴.

Page 10: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels

Model and Inference

10

weight vectorsfeature maps

𝑦 = 𝑓𝑤 𝐴 = 𝑎𝑟𝑔𝑠𝑜𝑟𝑡𝒚𝒘𝜙(𝐴, 𝒚)

Page 11: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels

Feature maps

TF-IDF of title

Google Play’s category

TF-IDF of description

Representation of

related apps

Page 12: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels

Inference

• Algorithm’s output upon receiving an instance 𝒙𝑖:• A score for each of the 𝑘 labels in 𝒴.

12

GamesSocial Music Photography Shopping

𝒴 =

0.6, 0.3, 0.7, 0.1, 0.01

𝒙𝑖 =Farm Story

𝑦𝑖 = 𝑎𝑟𝑔𝑠𝑜𝑟𝑡𝒚𝒘𝜙(𝐴𝑖 , 𝒚)

Page 13: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels

Inference

• The algorithm’s prediction is a vector in ℝ𝑘 where each element in the vector corresponds to the score assigned to the respective label.

• This form of prediction is often referred to as label ranking.

13

GamesSocial Music Photography Shopping

𝒴 =

0.6, 0.3, 0.7, 0.1, 0.01

𝑦 = 𝑎𝑟𝑔𝑠𝑜𝑟𝑡𝒚𝒘𝜙(𝐴, 𝒚)

∈ ℝ𝑘

Page 14: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels

Inference

• For a pair of labels 𝑟, 𝑠 ∈ 𝒴:• If score 𝑟 > 𝑠𝑐𝑜𝑟𝑒(𝑠):

• Label 𝑟 is ranked higher than label 𝑠.

• Goal of Algorithm:• Rank every relevant label above every

irrelevant label.

14

Social

Music

Games

Photography

Shopping

0.6

0.3

Page 15: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels

The Margin

• Example: 𝐴𝑖 , 𝒚𝑖 = , .

• After making predictions 𝒚𝒊, the algorithm receives the correct set 𝒚𝑖 .

1. Find the least probable correct folder:

2. Find the most probable wrong folder:

15

Games

Music

Social

Photography

Shopping

Games Social

Games

Music

max

Page 16: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels

Iterate over examples

• Example: 𝐴𝑖 , 𝒚𝑖 = , .

• After making predictions 𝒚𝒊, the algorithm receives the correct set 𝒚𝑖 .

• Update:

16

Games

Music

Social

Photography

Shopping

Games Social

max

Page 17: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels

The Margin

• We define the margin attained by the algorithm on round 𝑖 for example 𝒙𝑖 , 𝒚𝑖 ,

17

𝛾 𝒘𝑖 , 𝒙𝑖 , 𝑦𝑖 = min𝑟∈𝑦𝑖

𝒘𝑖 𝜙 𝒙𝑖 , 𝑟 − max𝑠 ∉𝑦𝑖

𝒘𝑖𝜙 𝒙𝑖 , 𝑠 .Games

Music

Social

Photography

Shopping

Page 18: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels

The Margin

• The margin is positive if all relevant labels are ranked higher than all irrelevant labels.

• We are not satisfied with only a positive margin; we require the margin of every prediction to be at least 1.

18

𝛾 𝒘𝑖 , 𝒙𝑖 , 𝑦𝑖 = min𝑟∈𝑦𝑖

𝒘𝑖 𝜙 𝒙𝑖 , 𝑟 − max𝑠 ∉𝑦𝑖

𝒘𝑖𝜙 𝒙𝑖 , 𝑠 .

Page 19: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels

The Loss

• Define a hinge loss:

19

𝛾 𝒘𝑖 , 𝒙𝑖 , 𝑦𝑖 = min𝑟∈𝑦𝑖

𝒘𝑖 𝜙 𝒙𝑖 , 𝑟 − max𝑠 ∉𝑦𝑖

𝒘𝑖𝜙 𝒙𝑖 , 𝑠 .

ℓ 𝑤𝑖 , 𝑥𝑖 , 𝑦𝑖 = 0 𝛾 𝒘𝑖 , 𝒙𝑖 , 𝑦𝑖 ≥ 1

1 − 𝛾 𝒘𝑖 , 𝒙𝑖 , 𝑦𝑖 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒

𝕝 min𝑟∈𝑦𝑖

𝒘𝑖 𝜙 𝒙𝑖,𝑟 −max𝑠 ∉𝑦𝑖

𝒘𝑖𝜙 𝑥𝑖,𝑠 <0

Page 20: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels

The Loss

• Could also be written as follows:

• where 𝑎 + = max(0, 𝑎)

20

ℓ 𝑤𝑖 , 𝑥𝑖 , 𝑦𝑖 = 1 − 𝛾 𝒘𝑖 , 𝒙𝑖 , 𝑦𝑖 +

ℓ 𝑤𝑖 , 𝑥𝑖 , 𝑦𝑖 = 0 𝛾 𝒘𝑖 , 𝒙𝑖 , 𝑦𝑖 ≥ 1

1 − 𝛾 𝒘𝑖 , 𝒙𝑖 , 𝑦𝑖 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒

Page 21: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels

Learning

• First approach:

• Goal :

21

Page 22: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels

Multilabel PA Optimization Problem

• An alternative approach:

22

𝑟𝑖 = 𝑎𝑟𝑔𝑚𝑖𝑛𝑟∈𝑦𝑖𝒘𝑖𝜙(𝒙𝑖 , 𝑟) 𝑠𝑖 = 𝑎𝑟𝑔𝑚𝑎𝑥𝑠∉𝑦𝑖𝒘𝑖𝜙(𝒙𝑖 , 𝑠)

𝒘𝑖+1 = 𝑎𝑟𝑔𝑚𝑖𝑛𝑤1

2𝒘 −𝒘𝑖

2

𝑠. 𝑡. ℓ 𝒘, 𝒙𝑖 , 𝒚𝑖 = 1 − 𝛾 𝒘, 𝒙𝑖, 𝑦𝑖+= 0

i.e., the margin is greater than 1.

𝛾 𝒘𝑖 , 𝒙𝑖 , 𝑦𝑖 = 𝒘𝑖 𝜙 𝒙𝑖 , 𝑟𝑖 −𝒘𝑖𝜙 𝒙𝑖 , 𝑠𝑖

Page 23: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels

Passive Aggressive (PA)

• The algorithm is passive whenever the hinge-loss is zero:• ℓ𝑖 = 0𝑤𝑖+1 = 𝑤𝑖

• When the loss is positive, the algorithm aggressively forces 𝑤𝑖+1 to satisfy the constraint ℓ 𝒘𝑖 , 𝒙𝑖 , 𝒚𝑖 = 0, regardless of the step size required.

23

Page 24: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels

Passive Aggressive

• Solving with Lagrange Multipliers, we get the following update rule:

24

𝑤𝑖+1 = 𝑤𝑖 + 𝜏𝑖 𝜙 𝑥𝑖 , 𝑟𝑖 − 𝜙 𝑥𝑖 , 𝑠𝑖

𝜏𝑖 =ℓ𝑖

𝜙 𝑥𝑖 , 𝑟𝑖 − 𝜙 𝑥𝑖 , 𝑠𝑖2

Page 25: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels

Passive Aggressive

• The updated vector 𝑤𝑖+1 will classify example 𝑥𝑖 with ℓ 𝒘, 𝒙𝑖 , 𝒚𝑖 =0.

25

Page 26: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels

Ranking

26

Page 27: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels

Ranking Problem: Example

• A Prediction Bar:

27

Predict the apps the user is most likely to use at any given time and location.

Page 28: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels

the most likely app

the 4th likely app

...

assume the user is at a context

...

at the office

12:47

today is Wednesday

Page 29: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels

Goal

29

Find a function 𝑓 that gets as input the

current context 𝒙 and predicts the apps the

user is likely to click at a given context.

Page 30: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels

Features

time of day

day of week

location

Page 31: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels

Discriminative Model

• Prediction:

31

Parameters 𝒘𝐴 ∈ ℝ𝑑

of app A Context𝒙 ∈ ℝ𝑑

...

Page 32: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels

New Evaluation

• The performance of the prediction system is measured by Receiver operating Characteristics (ROC) curve.

true positive rate=

app was in the prediction bar and was clicked

total contexts where was clicked

false positive rate=

app was in the prediction bar and wasn’t clicked

total contexts where wasn’t clicked

Page 33: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels

Maximizing AUC

• By definition of the AUC (Bamber, 1975; Hanley and McNeil,1982):

33

𝐴𝑈𝐶

Page 34: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels

Pairwise Dataset

• For an application define two sets of context:

34Waze Waze

Page 35: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels

Maximizing AUC

• By definition of the AUC (Bamber, 1975; Hanley and McNeil,1982):

35

Page 36: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels
Page 37: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels

Solve using PA

• Set the next 𝑤𝑖 to be the minimizer of the following optimization problem

37

min𝑤∈ℝ𝑑,𝜉≥0

1

2𝑤 − 𝑤𝑖−1

2

𝑠. 𝑡. 𝑓𝑤 𝑥𝑖+ , 𝐴𝑖 − 𝑓𝑤 𝑥𝑖

−, 𝐴𝑖 ≥ 1

Page 38: Online Passive-Aggressive Algorithmsu.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul11.pdf · Photography Shopping. The Margin •The margin is positive if all relevant labels

Implementation

• Online algorithm to solve the optimization problem efficiency on huge data.

• Theoretical guarantees of convergence.

38