
Page 1:

Exponential Computational Improvement by Reduction

John Langford @ Microsoft Research

Computational Challenges Workshop, May 2

Page 2:

The Empirical Age

Speech Recognition
ImageNet
Deep Learning
Neural Machine Translation

What is a theorist to do?

Page 3:

The Contextual Bandit Setting

For t = 1, ..., T:

1. The world produces some context x ∈ X
2. The learner chooses an action a ∈ A
3. The world reacts with reward r_a ∈ [0, 1]

Goal: Learn a good policy for choosing actions given context.
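A minimal sketch of this protocol as code, with ε-greedy exploration; the tabular learner and toy reward below are illustrative stand-ins, not anything from the talk:

```python
import random

def run_contextual_bandit(T=1000, K=3, epsilon=0.1, seed=0):
    """Sketch of the contextual bandit protocol with epsilon-greedy
    exploration over K actions and a toy binary context."""
    rng = random.Random(seed)
    value = {0: [0.0] * K, 1: [0.0] * K}   # toy tabular reward estimates
    count = {0: [0] * K, 1: [0] * K}
    total = 0.0
    for t in range(T):
        x = rng.randint(0, 1)                    # 1. world produces context x
        if rng.random() < epsilon:
            a = rng.randrange(K)                 # 2a. explore uniformly
        else:
            a = max(range(K), key=lambda i: value[x][i])  # 2b. exploit
        r = 1.0 if a == x else rng.random() * 0.5  # 3. toy reward r_a in [0, 1]
        count[x][a] += 1                         # incremental mean update
        value[x][a] += (r - value[x][a]) / count[x][a]
        total += r
    return total / T

print(run_contextual_bandit())
```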

Page 4:

Reduction Results

Algo   ε-greedy   Bagging   LinUCB      Online Cover   Supervised
Loss   0.095      0.059     0.128       0.053          0.051
Time   22         339       212 × 10³   17             6.9

Progressive validation loss on RCV1.

Page 5:

The Offline Contextual Bandit Setting

Given exploration data (x, a, r, p)*

Learn a good policy for choosing actions given context.
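The logged propensity p is what makes learning from exploration data possible offline. A minimal sketch of the standard inverse-propensity-score value estimate for a candidate policy, on toy logged tuples:

```python
def ips_value(logged, policy):
    """Inverse-propensity-score estimate of a policy's expected reward
    from logged (x, a, r, p) exploration tuples: average r/p over the
    events where the policy agrees with the logged action."""
    total = 0.0
    for x, a, r, p in logged:
        if policy(x) == a:        # policy would have chosen the logged action
            total += r / p        # reweight by the inverse logging probability
    return total / len(logged)

# Toy usage: uniform logging over 2 actions (p = 0.5 each).
logged = [(0, 0, 1.0, 0.5), (0, 1, 0.2, 0.5), (1, 1, 0.9, 0.5), (1, 0, 0.1, 0.5)]
print(ips_value(logged, policy=lambda x: x))  # policy that plays a = x -> 0.95
```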

Page 6:

Offline Reduction Results

[Figure: classification error of the Offset Tree on UCI datasets: ecoli, glass, letter, optdigits, page-blocks, pendigits, satimage, vehicle, yeast.]

Page 7:

Agnostic Active Learning

For t = 1, ..., T:

1. The world produces some context x ∈ X
2. The learner predicts a label y ∈ Y
3. The learner chooses to request a label or not. If a label is requested:
   1. observe y
   2. update the learning algorithm

Goal: Compete with supervised learning using all labels while requesting as few as possible.
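A code sketch of the query decision; a simple margin test stands in here for the disagreement-based, importance-weighted query rules of the actual agnostic algorithms:

```python
def active_learning_round(x, predict_proba, query_oracle, update, threshold=0.1):
    """One round of a simplified active-learning loop: predict, then
    request the label only when the current model is uncertain."""
    p = predict_proba(x)            # model's probability that y = 1
    y_hat = int(p >= 0.5)           # the learner predicts a label
    if abs(p - 0.5) < threshold:    # uncertain, so request the label
        y = query_oracle(x)         # observe y
        update(x, y)                # update the learning algorithm
        return y_hat, True
    return y_hat, False             # confident, so no label requested

# Toy usage with fixed callables:
pred = lambda x: 0.5 + 0.02 * x     # a nearly uninformative model
print(active_learning_round(1, pred, lambda x: 1, lambda x, y: None))  # (1, True)
```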

Page 8:

AAL Reduction Results

[Figure: test error vs. number of labels queried (0 to 5000) for Passive and Active learning.]

Page 9:

Logarithmic Time Prediction

Repeatedly:

1. See x
2. Predict y ∈ {1, ..., K}
3. See y

Goal: Find h(x) minimizing the error rate

Pr_{(x,y)∼D}(h(x) ≠ y)

with h(x) in time O(log K).
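A minimal sketch of what an O(log K) predictor looks like, in the spirit of the LOMTree / Recall Tree results on the next page: a heap-ordered binary tree of node classifiers, here hypothetical callables keyed on a scalar x:

```python
def tree_predict(x, nodes, K):
    """Predict one of K labels in O(log K) time by descending a balanced
    binary tree: nodes[i](x) in {0, 1} routes left or right. With K leaves
    laid out heap-style, internal nodes are indices 1..K-1 and leaf j
    corresponds to index K + j."""
    i = 1                          # root of a heap-ordered complete tree
    while i < K:                   # ~log2(K) node evaluations
        i = 2 * i + nodes[i](x)    # go to left (2i) or right (2i+1) child
    return i - K                   # leaf index -> class label in {0..K-1}

# Toy usage with K = 4 and node rules keyed on a scalar x:
nodes = {1: lambda x: int(x >= 2), 2: lambda x: int(x >= 1), 3: lambda x: int(x >= 3)}
print([tree_predict(x, nodes, K=4) for x in [0, 1, 2, 3]])  # -> [0, 1, 2, 3]
```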


Page 10:

Log-time prediction results

[Figure: delta test error (%) relative to OAA on ALOI, ImageNet, and ODP for LOMTree and Recall Tree.]

Page 11:

Summary

Problem           Learning Reductions   OCO      PAC
CB Explore        Yes                   Sorta?   No
CB Learn          Yes                   Sorta?   No
Agnostic Active   Yes                   Sorta?   No
Log-time          Yes                   No       No

Page 12:

Outline

1. Why Reductions
2. What is a learning reduction?
3. Exponential Improvements

Page 13:

Learning Reduction Basics

Goal: minimize ℓ on D

Transform D into D_R and run an ℓ′ optimizer on it to obtain h.
Transform h with small ℓ′(h, D_R) into R_h with small ℓ(R_h, D),
such that if h does well on (D_R, ℓ′), R_h is guaranteed to do well on (D, ℓ).
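In code, a reduction is a pair of transforms wrapped around a base learner. A sketch of that contract (names illustrative, not Vowpal Wabbit's actual interface):

```python
class Reduction:
    """Skeleton of a learning reduction: transform the dataset for the
    base (l'-) learner, then wrap the returned hypothesis h into R_h.
    The guarantee a real reduction proves is that small l'-loss of h on
    the transformed data implies small l-loss of R_h on the original."""
    def transform_data(self, D):
        raise NotImplementedError   # D -> D_R

    def wrap_hypothesis(self, h):
        raise NotImplementedError   # h -> R_h

    def learn(self, D, base_learner):
        D_R = self.transform_data(D)        # reduce the problem
        h = base_learner(D_R)               # run the l' optimizer
        return self.wrap_hypothesis(h)      # lift h back to the original problem
```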

Page 14:

Error Reductions: the simplest possible

Prove: Small ℓ′ error ⇒ small ℓ error.

An issue: If R introduces noise, small ℓ′ is not possible.

⇒ Must prove small ℓ′ possible for a nonvacuous statement.
⇒ Error reductions weak for noisy problems.
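The classic instance of an error reduction is one-against-all: K-class classification reduces to K binary "is the label c?" problems, and the standard analysis bounds multiclass error by (K − 1) times the binary error rate. A sketch reusing the Reduction skeleton above (the base binary learner is an assumed callable):

```python
class OneAgainstAll(Reduction):
    """One-against-all as an error reduction: K-class classification
    becomes K binary problems 'is the label c?'; predict the argmax
    score. Classic analysis bounds multiclass error by (K - 1) times
    the binary error rate."""
    def __init__(self, K):
        self.K = K

    def transform_data(self, D):
        # D: list of (x, y) with y in {0..K-1} -> one binary dataset per class
        return [[(x, int(y == c)) for x, y in D] for c in range(self.K)]

    def wrap_hypothesis(self, h):
        # h: list of K binary scorers -> R_h predicts the highest-scoring class
        return lambda x: max(range(self.K), key=lambda c: h[c](x))

    def learn(self, D, base_learner):
        h = [base_learner(D_c) for D_c in self.transform_data(D)]
        return self.wrap_hypothesis(h)
```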

Page 15:

Regret Reductions: Dealing with noise

Let reg_{ℓ,D}(h) = ℓ(h, D) − min_{h′} ℓ(h′, D)

Prove: Small reg_{ℓ′,D′} ⇒ small reg_{ℓ,D}.

Note: min_{h′} is over all functions.
⇒ The user is responsible for choosing the right hypothesis space.
⇒ Unable to address information gathering.
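This is also why the reduction diagram later in the talk labels each edge with a regret multiplier: when reductions are chained, the multipliers compose multiplicatively. A one-line sketch of the composition (suppressing how D′ is induced from D):

```latex
% Regret multipliers compose along a chain of reductions:
% if R^1 carries regret from (l', D') to (l, D) with multiplier c_1,
% and R^2 carries regret from (l'', D'') to (l', D') with multiplier c_2,
\mathrm{reg}_{\ell,D}\big(R^{1}_{h}\big) \le c_1\,\mathrm{reg}_{\ell',D'}(h)
\;\wedge\;
\mathrm{reg}_{\ell',D'}\big(R^{2}_{g}\big) \le c_2\,\mathrm{reg}_{\ell'',D''}(g)
\;\Longrightarrow\;
\mathrm{reg}_{\ell,D}\big(R^{1}_{R^{2}_{g}}\big) \le c_1 c_2\,\mathrm{reg}_{\ell'',D''}(g)
```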

Page 16:

Oracle Reductions: Information gathering

Assume an oracle which, given samples S, returns argmin_{h∈H} ℓ′(h, S).

Prove: Oracle (approximately) works ⇒ computationally efficient, small online regret on the original problem.

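A sketch of what the assumed oracle looks like in code: a brute-force empirical risk minimizer over a finite class, applied here to bandit data with the same IPS trick as in the offline section. The toy policies and induced costs are illustrative only; real oracle-based algorithms (the Online Cover entry in the earlier table is one) construct their oracle datasets much more carefully:

```python
def erm_oracle(H, S, loss):
    """The assumed oracle: exact empirical risk minimization over a
    hypothesis class H on a sample S (brute force over a finite H)."""
    return min(H, key=lambda h: sum(loss(h, s) for s in S) / len(S))

# Toy usage: pick the best constant-action policy from logged CB data.
# Each tuple is (x, a, r, p); the induced loss of a policy on a tuple is
# -r/p when the policy matches the logged action, else 0.
logged = [(0, 0, 1.0, 0.5), (1, 1, 0.9, 0.5), (0, 1, 0.2, 0.5)]
H = [lambda x, c=c: c for c in range(2)]     # constant policies a(x) = c
loss = lambda h, s: -(s[2] / s[3]) if h(s[0]) == s[1] else 0.0
print(erm_oracle(H, logged, loss)(0))        # best constant action -> 1
```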

Page 17:

Regret Transform Reductions

[Diagram: a map of regret-transform reductions. Problems: Mean Regression, Quantile Regression, IW Classification, Classification, AUC Ranking, k-cost Classification, k-Partial Label, k-way Regression, T-step RL with State Visitation, T-step RL with Demonstration Policy, Dynamic Models, Unsupervised by Self Prediction. Each edge carries an algorithm name and a regret multiplier: Probing, Quanting, Costing, PECOC, Filter Tree, ECT, Offset Tree, Quicksort, Searn, PSDP, with multipliers including 1, 4, k−1, k/2, Tk, and Tk ln T; "??" marks open edges.]

Page 18:

Programming

Reductions ⇒ modularity, code reuse ⇒ good news for programming!

Vowpal Wabbit (http://hunch.net/~vw) uses this systematically.

Page 19:

An Open Problem: $1K reward!

Conditional Probability Estimation
Distribution D over X × Y, where Y = {1, ..., k}.
Find a probability estimator h : X × Y → [0, 1] minimizing squared loss

ℓ(h, D) = E_{(x,y)∼D}[(h(y|x) − y)²]

The problem: How can you do this in time O(log(k)) with a constant regret ratio?

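For contrast, the straightforward baseline is O(k): one squared-loss regressor per class, with every example updating all k of them. A toy sketch of that baseline (purely illustrative; x is ignored), which makes concrete the per-example cost the prize asks to cut to O(log k):

```python
class OneVsAllProbEstimator:
    """O(k)-per-example baseline: k running squared-loss regressors,
    regressor c fit to the indicator 1[y == c]. Every update touches
    all k regressors; the open problem asks for O(log k) work per
    example with a constant regret ratio."""
    def __init__(self, k, lr=0.1):
        self.w = [1.0 / k] * k        # one scalar predictor per class (toy)
        self.lr = lr

    def update(self, x, y):
        for c in range(len(self.w)):  # O(k) work per example
            target = 1.0 if y == c else 0.0
            self.w[c] -= self.lr * (self.w[c] - target)  # squared-loss step

    def prob(self, x, c):
        return self.w[c]              # estimate of h(c | x)

est = OneVsAllProbEstimator(k=4)
for y in [1, 1, 3, 1]:
    est.update(x=None, y=y)
print([round(est.prob(None, c), 3) for c in range(4)])
```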

Page 20:

More Details

Beygelzimer, Langford, Daumé, Mineiro, “Learning Reductions that Really Work”, Proceedings of the IEEE 104(1), 2016. https://arxiv.org/abs/1502.02704