Strategy-Proof Classification
Reshef Meir, School of Computer Science and Engineering, Hebrew University
Joint work with Ariel D. Procaccia and Jeffrey S. Rosenschein
Strategy-Proof Classification
• An Example of Strategic Labels in Classification
• Motivation
• Our Model
• Previous work (positive results)
• An impossibility theorem
• More results (if there is time)
(~12 minutes)
Introduction Motivation Model Results
Strategic labeling: an example
[Figure: the ERM classifier on the reported data makes 5 errors]
There is a better classifier! (for me…)
If I only change the labels…
[Figure: with the manipulated labels, the new ERM makes 2+4 = 6 errors]
Classification
The Supervised Classification problem:
– Input: a set of labeled data points (xi, yi), i = 1,…,m
– Output: a classifier c from some predefined concept class C (functions of the form f : X → {–,+})
– We usually want c not only to classify the sample correctly, but to generalize well, i.e. to minimize
R(c) ≡ E(x,y)~D[ c(x) ≠ y ],
the expected number of errors w.r.t. the distribution D
Classification (cont.)
• A common approach is to return the ERM, i.e. the concept in C that is best w.r.t. the given samples (has the lowest number of errors)
• Generalizes well under some assumptions on the concept class C
With multiple experts, we can’t trust our ERM!
Where do we find “experts” with incentives?
Example 1: A firm learning purchase patterns
– Information gathered from local retailers
– The resulting policy affects them
– “The best policy is the policy that fits my pattern”
[Diagram: Users → Reported Dataset → Classification Algorithm → Classifier]
Example 2: Internet polls / expert systems
Related work
• A study of SP mechanisms in regression learning
– O. Dekel, F. Fischer and A. D. Procaccia, Incentive Compatible Regression Learning, SODA 2008
• No SP mechanisms for clustering
– J. Perote-Peña and J. Perote, The Impossibility of Strategy-Proof Clustering, Economics Bulletin, 2003
A problem instance is defined by
• A set of agents I = {1,…,n}
• A partial dataset for each agent i ∈ I: Xi = {xi1,…,xi,m(i)} ⊆ X
• For each xik ∈ Xi, agent i has a label yik
– Each pair sik = ⟨xik, yik⟩ is an example
– All examples of a single agent compose the labeled dataset Si = {si1,…,si,m(i)}
• The joint dataset S = ⟨S1, S2,…, Sn⟩ is our input
– m = |S|
• We denote the dataset with the reported labels by S′
Input: Example
[Figure: three agents’ labeled point sets]
Xi ∈ X^mi, Yi ∈ {–,+}^mi for each agent i
S = ⟨S1, S2,…, Sn⟩ = ⟨(X1,Y1),…, (Xn,Yn)⟩
Incentives and Mechanisms
• A Mechanism M receives a labeled dataset S′ and outputs c ∈ C
• Private risk of i: Ri(c,S) = |{k : c(xik) ≠ yik}| / mi
• Global risk: R(c,S) = |{(i,k) : c(xik) ≠ yik}| / m
• We allow non-deterministic mechanisms
– The outcome is a random variable
– We measure the expected risk
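The two risk definitions above translate directly into code. A minimal sketch, assuming each agent's dataset is a list of (x, y) pairs with y ∈ {−1, +1} and a classifier is any function from x to {−1, +1} (the data layout is an assumption, not from the slides):

```python
def private_risk(c, S, i):
    """R_i(c, S): the fraction of agent i's own examples that c mislabels."""
    errors = sum(1 for x, y in S[i] if c(x) != y)
    return errors / len(S[i])

def global_risk(c, S):
    """R(c, S): the fraction of all m examples (across agents) that c mislabels."""
    examples = [ex for S_i in S for ex in S_i]
    errors = sum(1 for x, y in examples if c(x) != y)
    return errors / len(examples)
```

For example, with S = [[(0, -1), (1, 1)], [(2, 1)]] and the constant classifier c(x) = +1, the first agent's private risk is 1/2 while the global risk is 1/3.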
ERM
We compare the outcome of M to the ERM:
c* = ERM(S) = argmin_{c ∈ C} R(c, S)
r* = R(c*, S)
Can our mechanism simply compute and return the ERM?
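For a finite concept class, c* = argmin_{c ∈ C} R(c, S) can be computed by brute force. A sketch (the threshold class and the data below are hypothetical, only to make the snippet runnable):

```python
def erm(C, S):
    """Return the concept in C minimizing the global empirical risk on S."""
    examples = [ex for S_i in S for ex in S_i]
    return min(C, key=lambda c: sum(1 for x, y in examples if c(x) != y))

# Hypothetical finite class: threshold classifiers on the line, +1 iff x >= t.
def threshold(t):
    return lambda x: 1 if x >= t else -1

C = [threshold(t) for t in (0.0, 1.5, 3.0)]
S = [[(0.0, -1), (1.0, -1)], [(2.0, 1), (4.0, 1)]]
c_star = erm(C, S)  # the threshold at 1.5 separates this sample perfectly
```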
Requirements
1. Good approximation: for all S, R(M(S), S) ≤ β·r*
2. Strategy-proofness (SP): for all i, S, Si′: Ri(M(S–i, Si′), S) ≥ Ri(M(S), S)
• ERM(S) is 1-approximating but not SP
• ERM(S1) is SP but gives a bad approximation
Are there any mechanisms that guarantee both SP and good approximation?
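The claim that ERM(S) is not SP can be checked on a toy instance (the data below are made up for illustration). With two constant classifiers, an agent whose labels are mixed can "exaggerate" to flip the ERM outcome in its favor:

```python
ALL_PLUS = lambda x: 1    # |C| = 2: the two constant classifiers
ALL_MINUS = lambda x: -1
C = [ALL_PLUS, ALL_MINUS]

def erm(C, S):
    examples = [ex for S_i in S for ex in S_i]
    return min(C, key=lambda c: sum(1 for x, y in examples if c(x) != y))

def private_risk(c, S_i):
    return sum(1 for x, y in S_i if c(x) != y) / len(S_i)

# Agent 1's true data: two + points, one - point; agent 2 reports two - points.
truth_1 = [(0, 1), (1, 1), (2, -1)]
S_2 = [(3, -1), (4, -1)]

c_truth = erm(C, [truth_1, S_2])   # ALL_MINUS wins: 2 errors vs. 3
lie_1 = [(0, 1), (1, 1), (2, 1)]   # agent 1 exaggerates: flips its - label
c_lie = erm(C, [lie_1, S_2])       # now ALL_PLUS wins: 2 errors vs. 3

# Agent 1's TRUE private risk drops from 2/3 to 1/3 -- ERM is manipulable.
```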
MOST IMPORTANT SLIDE
Restricted settings
• A very small concept class: |C| = 2
– There is a deterministic SP mechanism that obtains a 3-approximation ratio
– This bound is tight
– Randomization can improve the bound to 2
R. Meir, A. D. Procaccia and J. S. Rosenschein, Incentive Compatible Classification under Constant Hypotheses: A Tale of Two Functions, AAAI 2008
Restricted settings (cont.)
• Agents with similar interests:
– There is a randomized SP 3-approximation mechanism (works for any class C)
R. Meir, A. D. Procaccia and J. S. Rosenschein, Incentive Compatible Classification with Shared Inputs, IJCAI 2009
But not everything shines
• Without restrictions on the input, we cannot guarantee a constant approximation ratio
Our main result:
Theorem: There is a concept class C for which there are no deterministic SP mechanisms with an o(m)-approximation ratio
Deterministic lower bound
Proof idea:
– First, construct a classification problem that is equivalent to a voting problem with 3 candidates
– Then, use the Gibbard–Satterthwaite theorem to prove that there must be a dictator
– Finally, the dictator’s opinion might be very far from the optimal classification
Proof (1)
Construction: We have X = {a, b}, and 3 classifiers: ca, cb, and cab
The dataset contains two types of agents, with samples distributed unevenly over a and b
We do not set the labels. Instead, we denote by Y all the possible labelings of an agent’s dataset.
Proof (2)
Let P be the set of all 6 orders over C. A voting rule is a function of the form f : P^n → C. But our mechanism is a function M : Y^n → C! (its input is labels, not orders)
Lemma 1: there is a valid mapping g : P^n → Y^n, s.t. M∘g is a voting rule
Proof (3)
Lemma 2: If M is SP and guarantees any bounded approximation ratio, then f = M∘g is dictatorial
Proof:
– (f is onto) any profile that c classifies perfectly must induce the selection of c
– (f is SP) suppose there is a manipulation; by mapping this profile to labels with g, we find a manipulation of M, in contradiction to its SP
– From the G–S theorem, f must be dictatorial
Proof (4)
Finally, f (and thus M) can only be dictatorial. We assume w.l.o.g. that the dictator is agent 1 of type Ia. We now label the data points as follows:
– The optimal classifier is cab, which makes 2 errors
– The dictator selects ca, which makes m/2 errors
Real concept classes
• We managed to show that there are no good (deterministic) SP mechanisms, but only for a synthetically constructed class.
• We are interested in more common classes that are actually used in machine learning. For example:
– Linear classifiers
– Boolean conjunctions
Linear classifiers
[Figure: a dataset over two regions, “a” and “b”, with classifiers ca, cb, cab; the optimal classifier makes only 2 errors, while the dictator’s choice makes Ω(√m) errors]
A lower bound for randomized SP mechanisms
• A lottery over dictatorships is still bad
– Ω(k) instead of Ω(m), where k is the size of the largest dataset controlled by an agent (m ≈ k·n)
• However, it is not clear how to eliminate other mechanisms
– G–S works only for deterministic mechanisms
– Another theorem by Gibbard [’79] can help
• But only under additional assumptions
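A lottery over dictatorships is easy to sketch: pick one agent uniformly at random and return the ERM on that agent's reported data alone. It is SP for the same reason ERM(S1) is: an agent's report only matters when it is chosen, and then it can do no better than reporting truthfully. (The function and data layout below are assumptions for illustration.)

```python
import random

def random_dictator(C, S, rng=None):
    """Pick one agent uniformly at random; return the ERM on its data alone."""
    rng = rng or random.Random()
    S_i = S[rng.randrange(len(S))]
    return min(C, key=lambda c: sum(1 for x, y in S_i if c(x) != y))
```

Per the slide above, the approximation guarantee of any such lottery degrades with k, the size of the largest single-agent dataset.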
Upper bounds
• So, our lower bounds do not leave much hope for good SP mechanisms
• We would still like to know whether they are tight
• A deterministic SP O(m)-approximation is easy:
– break ties iteratively according to dictators
• What about randomized SP O(k) mechanisms?
The iterative random dictator (IRD)
(example with linear classifiers on R1)
[Figure animation: Iteration 1: 2 errors; Iteration 2: 5 errors; Iteration 3: 0 errors; Iteration 4: 0 errors; Iteration 5: 1 error]
Theorem: The IRD is O(k²)-approximating for linear classifiers in R1
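The transcript does not spell out the IRD's definition, so the following is only one plausible reading of "break ties iteratively according to dictators", with a random agent order: each visited agent narrows the surviving candidate set to the classifiers that minimize its own private risk. The threshold class and data are hypothetical.

```python
import random

def iterative_random_dictator(C, S, rng=None):
    """Visit agents in random order; each narrows the candidate set to its
    privately optimal classifiers. Return any survivor."""
    rng = rng or random.Random()
    def errors(c, S_i):
        return sum(1 for x, y in S_i if c(x) != y)
    order = list(range(len(S)))
    rng.shuffle(order)
    candidates = list(C)
    for i in order:
        best = min(errors(c, S[i]) for c in candidates)
        candidates = [c for c in candidates if errors(c, S[i]) == best]
    return candidates[0]

def threshold(t):  # hypothetical class: +1 iff x >= t
    return lambda x: 1 if x >= t else -1

C = [threshold(t) for t in (0.0, 1.5, 3.0)]
S = [[(1.0, -1), (2.0, 1)], [(0.5, -1), (4.0, 1)]]
c = iterative_random_dictator(C, S)
```

Here the threshold at 1.5 is the unique private optimum of every agent, so it survives regardless of the random order.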
Future work
• Other concept classes
• Other loss functions
• Alternative assumptions on the structure of the data
• Other models of strategic behavior
• …