The Independence of Fairness-aware Classifiers
DESCRIPTION
The Independence of Fairness-aware Classifiers
IEEE International Workshop on Privacy Aspects of Data Mining (PADM), in conjunction with ICDM2013
Article @ Official Site:
Article @ Personal Site: http://www.kamishima.net/archive/2013-ws-icdm-print.pdf
Handnote: http://www.kamishima.net/archive/2013-ws-icdm-HN.pdf
Program codes: http://www.kamishima.net/fadm/
Workshop Homepage: http://www.cs.cf.ac.uk/padm2013/
Abstract: Due to the spread of data mining technologies, such technologies are being used for determinations that seriously affect individuals' lives. For example, credit scoring is frequently determined based on records of past credit data together with statistical prediction techniques. Needless to say, such determinations must be nondiscriminatory and fair with respect to sensitive features, such as race, gender, and religion. The goal of fairness-aware classifiers is to classify data while taking into account potential issues of fairness, discrimination, neutrality, and/or independence. In this paper, after reviewing fairness-aware classification methods, we focus on one such method, Calders and Verwer's two-naive-Bayes method. This method has been shown to be superior to other classifiers in terms of fairness, which is formalized as the statistical independence between a class and a sensitive feature. However, the cause of this superiority is unclear, because the method utilizes a somewhat heuristic post-processing technique rather than an explicitly formalized model. We clarify the cause by comparing the method with an alternative naive Bayes classifier, which is modified by a modeling technique called "hypothetical fair-factorization." This investigation reveals the theoretical background of the two-naive-Bayes method and its connections with other methods. Based on these findings, we develop another naive Bayes method with an "actual fair-factorization" technique and empirically show that this new method can achieve a level of fairness equal to that of the two-naive-Bayes classifier.
TRANSCRIPT
The Independence of Fairness-Aware Classifiers
Toshihiro Kamishima*, Shotaro Akaho*, Hideki Asoh*, and Jun Sakuma**
*National Institute of Advanced Industrial Science and Technology (AIST), Japan; **University of Tsukuba, Japan; and Japan Science and Technology Agency
IEEE International Workshop on Privacy Aspects of Data Mining (PADM) @ Dallas, USA, Dec. 7, 2013
1
Introduction
2
Fairness-aware Data Mining: data analysis taking into account potential issues of fairness, discrimination, neutrality, or independence
Fairness-aware Classification: one of the major tasks of fairness-aware data mining; the problem of learning a classifier that predicts a class as accurately as possible under fairness constraints from potentially unfair data
✤ We use the term "fairness-aware" instead of "discrimination-aware," because the word "discrimination" means classification in the ML context, and this technique is applicable to tasks other than avoiding discriminative decisions
[Romei+ 13]
Why the CV2NB method performed better: model bias and the deterministic decision rule
Introduction
3
Calders & Verwer's 2-naive-Bayes (CV2NB) is very simple, but highly effective
Other fairness-aware classifiers are equally accurate, but less fair
Based on our findings, we discuss how to improve our method
Outline
4
Applications of Fairness-aware Data Mining: prevention of unfairness, information-neutral recommendation, ignoring uninteresting information
Fairness-aware Classification: basic notations, fairness in data mining, fairness-aware classification, connection with PPDM, Calders & Verwer's 2-naive-Bayes
Hypothetical Fair-factorization Naive Bayes: hypothetical fair-factorization naive Bayes (HFFNB), connection with other methods, experimental results
Why Did the HFFNB Method Fail? model bias, deterministic decision rule
How to Modify the HFFNB Method: actual fair-factorization method, experimental results
Conclusion
Prevention of Unfairness
6
A suspicious placement of keyword-matching advertisements [Sweeney 13]
[Figure: ads served for searches of African-descent names vs. European-descent names; the former trigger ad copy reading "Arrested?", the latter "Located:"]
Advertisements indicating arrest records were displayed more frequently for names that are more popular among individuals of African descent than for those of European descent
This situation arose simply from optimizing the click-through rate, and no information about users' race was used. Such unfair decisions can be prevented by FADM techniques
Recommendation Neutrality
7
The Filter Bubble Problem: Pariser raised a concern that personalization technologies narrow and bias the topics of information provided to people
[TED Talk by Eli Pariser, http://www.filterbubble.com/]
Friend recommendation lists on Facebook: to fit Pariser's preferences, conservative people were eliminated from his recommendation list, and he was not notified of this fact
FADM technologies are useful for providing neutral information
Ignoring Uninteresting Information
8
[Gondek+ 04]
Simple clustering methods find two clusters: one contains only faces, and the other contains faces with shoulders. Data analysts consider this clustering useless and uninteresting. By ignoring this uninteresting information, more useful male and female clusters can be obtained
non-redundant clustering: find clusters that are as independent as possible from a given uninteresting partition
a conditional information bottleneck method, which is a variant of the information bottleneck method
clustering facial images
Uninteresting information can be excluded by FADM techniques
Basic Notations
10
X: non-sensitive feature vector; all features other than the sensitive feature (non-sensitive, but may correlate with S)
S: sensitive feature; socially sensitive information, e.g., gender or religion
Y: objective variable; the result of a serious decision, e.g., whether or not to grant credit
red-lining effect: one is inclined simply to eliminate a sensitive feature from the calculation, but this action is insufficient
Non-sensitive features that correlate with sensitive features also contain sensitive information
Fairness in Data Mining
11
Fairness in Data Mining: sensitive information does not influence the target value
A sensitive feature and a target variable must be unconditionally independent: Y ⊥⊥ S
It does not suffice that Y and S are merely conditionally independent: Y ⊥⊥ S | X
Fairness-aware Classification
12
True Distribution: Pr[Y, X, S] = Pr[Y | X, S] Pr[X, S]
sample → Data Set: D = {y_i, x_i, s_i}
learn → Estimated Distribution: P̂r[Y, X, S; Θ] = P̂r[Y | X, S; Θ] P̂r[X, S], which approximates the true distribution
fairness constraint → Fair True Distribution: Pr†[Y, X, S] = Pr†[Y | X, S] Pr[X, S]
fairness constraint → Fair Estimated Distribution: P̂r†[Y, X, S; Θ] = P̂r†[Y | X, S; Θ] P̂r[X, S], which approximates the fair true distribution
Fairness-aware Classification
13
We want to approximate the fair true distribution, but samples from this distribution cannot be obtained, because samples from the real world are potentially unfair
fairness-aware classification: find a fair model that approximates the true distribution instead of the fair true distribution, under the fairness constraints
[Figure: the space of distributions, showing the model sub-space containing P̂r[Y, X, S; Θ] and the fair sub-space where Y ⊥⊥ S; the fair estimate P̂r†[Y, X, S; Θ*] approximates the true distribution Pr[Y, X, S] rather than the fair true distribution Pr†[Y, X, S], while P̂r[Y, X, S; Θ*] is the unconstrained estimate]
Privacy-Preserving Data Mining
14
Fairness in Data Mining: the independence between an objective variable Y and a sensitive feature S
From the information-theoretic perspective, the mutual information between Y and S is zero
From the viewpoint of privacy preservation, it is the protection of sensitive information when an objective variable is exposed
Differences from PPDM: introducing randomness is occasionally inappropriate for severe decisions, such as job applications; the disclosure of identity isn't generally problematic in FADM
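Since fairness here is formalized as zero mutual information between Y and S, it can be checked directly from data. Below is a minimal sketch (the function name and sample data are hypothetical) that estimates I(Y; S) from paired observations:

```python
from collections import Counter
from math import log

def mutual_information(ys, ss):
    """Mutual information I(Y; S) in nats, estimated from paired samples."""
    n = len(ys)
    count_y = Counter(ys)              # marginal counts of Y
    count_s = Counter(ss)              # marginal counts of S
    count_ys = Counter(zip(ys, ss))    # joint counts of (Y, S)
    mi = 0.0
    for (y, s), c in count_ys.items():
        p_joint = c / n
        # p_joint * log( p(y, s) / (p(y) * p(s)) )
        mi += p_joint * log(p_joint * n * n / (count_y[y] * count_s[s]))
    return mi

print(mutual_information([1, 1, 0, 0], [1, 1, 0, 0]))  # Y depends on S: positive
print(mutual_information([1, 0, 1, 0], [1, 1, 0, 0]))  # Y independent of S: zero
```

A perfectly fair classifier drives this quantity to zero; the Normalized Prejudice Index used later in the experiments is a normalized variant of the same idea.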
Calders-Verwer’s 2-Naive-Bayes
15
Naive Bayes vs. Calders-Verwer Two Naive Bayes (CV2NB) [Calders+ 10]
Naive Bayes: S and X are conditionally independent given Y
CV2NB: the non-sensitive features in X are conditionally independent given Y and S
Unfair decisions are modeled by introducing the dependence of X on S as well as on Y
[Graphical models: in naive Bayes, Y points to both S and X; in CV2NB, Y and S are dependent and both point to X]
✤ It is as if two naive Bayes classifiers are learned, one for each value of the sensitive feature; that is why this method is named the 2-naive-Bayes
Post-processing keeps the updated marginal distribution close to the sample distribution:
while Pr[Y=1 | S=1] − Pr[Y=1 | S=0] > 0:
    if # of data classified as "1" < # of "1" samples in the original data:
        increase Pr[Y=1, S=0] and decrease Pr[Y=0, S=0]
    else:
        increase Pr[Y=0, S=1] and decrease Pr[Y=1, S=1]
    reclassify samples using the updated model Pr[Y, S]
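The post-processing loop can be sketched in code. This is a simplified illustration, not the authors' released implementation: the step size `delta`, the iteration cap, and the `classify` callback are all assumptions.

```python
import numpy as np

def cv2nb_postprocess(joint_ys, classify, data, n_pos_orig,
                      delta=0.01, max_iter=1000):
    """CV2NB-style fairness enhancement of the joint Pr[Y, S].

    joint_ys   : 2x2 array, joint_ys[y, s] = Pr[Y=y, S=s]
    classify   : function(joint, data) -> array of predicted labels
    n_pos_orig : number of "1" samples in the original data
    """
    joint = joint_ys.copy()
    for _ in range(max_iter):
        p_y1_s1 = joint[1, 1] / joint[:, 1].sum()
        p_y1_s0 = joint[1, 0] / joint[:, 0].sum()
        if p_y1_s1 - p_y1_s0 <= 0:        # discrimination removed: stop
            break
        labels = classify(joint, data)
        if labels.sum() < n_pos_orig:     # too few positives: promote S=0
            joint[1, 0] += delta
            joint[0, 0] = max(joint[0, 0] - delta, 0.0)
        else:                             # too many positives: demote S=1
            joint[0, 1] += delta
            joint[1, 1] = max(joint[1, 1] - delta, 0.0)
        joint /= joint.sum()              # keep it a proper distribution
    return joint
```

Shifting mass between the Pr[Y, S] cells while re-counting positives is exactly how the heuristic keeps the class marginal close to the sample distribution while closing the gap between the two groups.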
Calders-Verwer’s 2-Naive-Bayes
16
update the joint distribution so that its fairness is enhanced [Calders+ 10]
P̂r[Y, X, S] = P̂r[Y, S] ∏_i P̂r[X_i | Y, S]
estimated model: parameters are initialized by the corresponding sample distributions
fair estimated model: P̂r[Y, S] is modified into P̂r†[Y, S] so as to improve the fairness, while the marginal P̂r[Y] is kept close to the sample distribution
Hypothetical Fair-factorization
18
Hypothetical Fair-factorization: a modeling technique to make a classifier fair
In a classification model, the sensitive feature and the target variable are decoupled; by this technique, they become statistically independent
[Graphical model of a fair-factorized classifier: Y and S are disconnected, and both point to X]
Hypothetical Fair-factorization Naive Bayes (HFFNB): the fair-factorization technique applied to a naive Bayes model
P̂r†[Y, X, S] = P̂r†[Y] P̂r†[S] ∏_k P̂r†[X^(k) | Y, S]
Under the ML or MAP principle, model parameters can be derived by simply counting training examples
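Parameter estimation by counting can be sketched as follows. This is an illustrative reading of the factorization, assuming binary labels and binary features with Laplace smoothing (the smoothing choice and all names are assumptions, not the authors' code):

```python
def fit_hffnb(Y, S, X):
    """Estimate HFFNB parameters by counting (ML with Laplace smoothing).

    Y, S : lists of binary values; X : list of binary feature vectors.
    Returns (pr_y, pr_s, pr_x) for the factorization
    Pr†[Y, X, S] = Pr†[Y] Pr†[S] Π_k Pr†[X(k) | Y, S].
    """
    n, d = len(Y), len(X[0])
    # independent marginals for Y and S (this is what makes the model "fair")
    pr_y = {y: (sum(1 for yi in Y if yi == y) + 1) / (n + 2) for y in (0, 1)}
    pr_s = {s: (sum(1 for si in S if si == s) + 1) / (n + 2) for s in (0, 1)}
    # Pr[X(k)=1 | Y=y, S=s] for each feature k, by counting within each cell
    pr_x = {}
    for y in (0, 1):
        for s in (0, 1):
            idx = [i for i in range(n) if Y[i] == y and S[i] == s]
            pr_x[(y, s)] = [(sum(X[i][k] for i in idx) + 1) / (len(idx) + 2)
                            for k in range(d)]
    return pr_y, pr_s, pr_x
```

The only difference from an ordinary naive Bayes fit is that the joint P̂r[Y, S] is replaced by the product of the two marginals, so Y and S are independent by construction over the estimated distribution.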
Connection with the ROC Decision Rule
19
[Kamiran+ 12]
In a non-fairized case, a new object (x, s) is classified into class 1 if P̂r[Y=1 | x, s] ≥ 1/2 ≡ p
Kamiran et al.'s ROC decision rule: a fair classifier is built by changing the decision boundary, p, according to the value of the sensitive feature
The HFFNB method is equivalent to changing the decision boundary, p, to
p′ = P̂r[Y | S] (1 − P̂r[Y]) / (P̂r[Y] + P̂r[Y | S] − 2 P̂r[Y] P̂r[Y | S])
(by Elkan's theorem regarding cost-sensitive learning)
Thus the HFFNB can be considered a special case of the ROC method
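The modified boundary p′ depends only on the two estimates P̂r[Y] and P̂r[Y | S], so it can be computed directly. A minimal sketch (function name hypothetical):

```python
def hffnb_threshold(pr_y, pr_y_given_s):
    """Decision boundary p' to which HFFNB is equivalent (via Elkan's
    cost-sensitive theorem): classify (x, s) as 1 iff Pr[Y=1|x,s] >= p'."""
    num = pr_y_given_s * (1.0 - pr_y)
    den = pr_y + pr_y_given_s - 2.0 * pr_y * pr_y_given_s
    return num / den

print(hffnb_threshold(0.3, 0.3))  # P̂r[Y|S] = P̂r[Y]: boundary stays at 1/2
print(hffnb_threshold(0.5, 0.3))  # disadvantaged group: boundary drops below 1/2
```

When P̂r[Y | S] equals the overall P̂r[Y], p′ reduces to 1/2 and the rule coincides with the ordinary decision rule; for a group with P̂r[Y | S] < P̂r[Y], the boundary is lowered, which is exactly the per-group threshold shift of the ROC method.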
CV2NB vs HFFNB
20
The CV2NB and HFFNB methods are compared in their accuracy and fairness

Method   Accuracy   Unfairness (NPI)
HFFNB    0.828      1.52×10⁻²
CV2NB    0.828      6.89×10⁻⁶

Accuracy: the larger the value, the more accurate the prediction
Unfairness (Normalized Prejudice Index): the larger the value, the unfairer the prediction
The HFFNB method is as accurate as the CV2NB method, but its predictions are much unfairer
WHY?
Why Did the HFFNB Method Fail?
22
HFFNB method: the fairness constraint is explicitly imposed on the model
CV2NB method: the fairness of the model is enhanced by post-processing
Though both models are designed to enhance fairness, the CV2NB method consistently learns a much fairer model
Two reasons why the modeled independence is damaged:
Model Bias: the difference between the model and the true distribution
Deterministic Decision Rule: class labels are not probabilistically generated, but are deterministically chosen by the decision rule
Model Bias
23
In a hypothetically fair-factorized model, data are assumed to be generated according to the estimated distribution: P̂r[Y] P̂r[S] P̂r[X | Y, S]
In reality, input objects are first generated from the true distribution, and then each object is labeled according to the estimated distribution: P̂r[Y | X, S] (estimated) × Pr[X, S] (true)
These two distributions diverge, especially if the model bias is high
The divergence between the estimated and true distributions due to model bias damages fairness in classification
Model Bias
24
Synthetic data: data are truly generated from a naive Bayes model, and the model bias is controlled by the number of features
Changes of the NPI (fairness):

Method   high bias   →           low bias
HFFNB    1.02×10⁻¹   1.10×10⁻¹   1.28×10⁻¹
CV2NB    5.68×10⁻⁴   9.60×10⁻²   1.28×10⁻¹

As the model bias decreases, the difference in fairness between the two methods decreases
Deterministic Decision Rule
25
In a hypothetically fair-factorized model, labels are assumed to be generated probabilistically according to the distribution P̂r[Y | X, S]
Predicted labels are instead generated by the deterministic decision rule: y* = argmax_{y ∈ Dom(Y)} P̂r[Y=y | X, S]
Labels generated by these two processes do not agree in general
26
simple classification model: one binary label and one binary feature
class distribution is uniform: P̂r[Y=1] = 0.5
Y* is deterministically determined: Y* = argmax_y Pr[Y=y | X]
changing parameters: Pr[X=1 | Y=1] and Pr[X=1 | Y=0]
[Figure: surface of E[Y*] over Pr[X=1 | Y=1] and Pr[X=1 | Y=0], each ranging from 0.0 to 1.0; E[Y*] = E[Y] holds only on a narrow subset of parameter settings]
E[Y*] and E[Y] do not agree in general
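This toy setting can be checked numerically. The sketch below (names hypothetical) evaluates E[Y] and E[Y*] exactly for one binary feature with Pr[Y=1] = 0.5:

```python
def expectations(p_x1_y1, p_x1_y0, p_y1=0.5):
    """E[Y] vs E[Y*] for one binary feature under the MAP decision rule."""
    e_y = p_y1                     # labels generated probabilistically
    e_ystar = 0.0                  # labels chosen by argmax Pr[Y | X]
    for x in (0, 1):
        px_y1 = p_x1_y1 if x == 1 else 1 - p_x1_y1
        px_y0 = p_x1_y0 if x == 1 else 1 - p_x1_y0
        p_x = p_y1 * px_y1 + (1 - p_y1) * px_y0   # Pr[X=x]
        posterior = p_y1 * px_y1 / p_x            # Pr[Y=1 | X=x]
        y_star = 1 if posterior >= 0.5 else 0     # deterministic decision rule
        e_ystar += p_x * y_star
    return e_y, e_ystar

# E[Y] stays at 0.5, while E[Y*] shifts with the feature distribution
print(expectations(0.9, 0.2))
```

For instance, with Pr[X=1 | Y=1] = 0.9 and Pr[X=1 | Y=0] = 0.2 the deterministic rule predicts 1 exactly when X = 1, so E[Y*] = Pr[X=1] ≠ 0.5 = E[Y], illustrating the disagreement on the slide.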
Actual Fair-factorization
28
The reason the HFFNB method failed is that it ignores the influence of the model bias and the deterministic decision rule
Actual Fair-factorization: a class and a sensitive feature are decoupled not over the estimated distribution, but over the actual distribution
As in the hypothetical fair-factorization, a class label and a sensitive feature are made statistically independent; however, we consider not the estimated distribution, P̂r[Y, X, S], but the actual distribution, P̂r[Y | X, S] Pr[X, S], and as class labels we adopt deterministically decided labels
The multiplication by the true distribution, Pr[X, S], is approximated by a sample mean
Actual Fair-factorization naive Bayes
29
Actual Fair-factorization Naive Bayes (AFFNB): the actual fair-factorization technique applied to a naive Bayes model
model bias: the actual distribution P̂r[Y | X, S] Pr[X, S] is approximated by the sample mean (1/|D|) Σ_{(x,s) ∈ D} P̂r[Y | X=x, S=s]
deterministic decision rule: instead of using a distribution of class labels, we count the deterministically decided class labels
Y* and S are made independent under the constraint that the marginal distributions of Y* and S equal the corresponding sample distributions
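The decoupling over the actual distribution can be illustrated on the joint of the deterministic labels Y* and S: estimate the empirical joint and replace it by the product of its sample marginals, which enforces Y* ⊥⊥ S while both marginals match the sample distribution. This is a simplified sketch, not the authors' exact estimator:

```python
import numpy as np

def actual_fair_joint(y_star, s):
    """Empirical joint of deterministic labels Y* and S, together with its
    fair-factorized version: the product of the sample marginals, which
    makes Y* independent of S while preserving both marginals."""
    y_star = np.asarray(y_star)
    s = np.asarray(s)
    joint = np.zeros((2, 2))
    for yv in (0, 1):
        for sv in (0, 1):
            # sample mean over D of the indicator for (Y*=yv, S=sv)
            joint[yv, sv] = np.mean((y_star == yv) & (s == sv))
    marg_y = joint.sum(axis=1)            # sample marginal of Y*
    marg_s = joint.sum(axis=0)            # sample marginal of S
    fair = np.outer(marg_y, marg_s)       # independent joint, same marginals
    return joint, fair
```

Counting deterministically decided labels (rather than averaging P̂r[Y | X, S]) is what addresses the deterministic-decision-rule problem; using the sample distribution addresses the model bias.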
CV2NB and AFFNB
30
The CV2NB and AFFNB methods are compared in their accuracy and fairness

Method   Accuracy   Unfairness (NPI)
AFFNB    0.828      5.43×10⁻⁶
CV2NB    0.828      6.89×10⁻⁶

CV2NB and AFFNB are equally accurate and equally fair
The superiority of the CV2NB method lies in considering the independence of a class label and a sensitive feature not over the estimated distribution, but over the actual distribution
Conclusion
31
Contributions
After reviewing the fairness-aware classification task, we focused on why the CV2NB method attains fairer results than other methods
We showed the reason theoretically and empirically by comparing it with a simple alternative naive Bayes modified by a hypothetical fair-factorization technique
Based on these findings, we developed a modified version, the actual fair-factorization technique, and showed that it drastically improves the fairness
Future Work
We plan to apply our actual fair-factorization technique to modify other classification methods, such as logistic regression or support vector machines
Program Codes and Data Sets
Fairness-aware Data Mining: http://www.kamishima.net/fadm
Information-neutral Recommender System: http://www.kamishima.net/inrs
Acknowledgements
This work is supported by MEXT/JSPS KAKENHI Grant Numbers 16700157, 21500154, 23240043, 24500194, and 25540094.