The Independence of Fairness-aware Classifiers
DESCRIPTION
The Independence of Fairness-aware Classifiers
IEEE International Workshop on Privacy Aspects of Data Mining (PADM), in conjunction with ICDM2013
Article @ Official Site:
Article @ Personal Site: http://www.kamishima.net/archive/2013-ws-icdm-print.pdf
Handnote: http://www.kamishima.net/archive/2013-ws-icdm-HN.pdf
Program codes: http://www.kamishima.net/fadm/
Workshop Homepage: http://www.cs.cf.ac.uk/padm2013/
Abstract: Due to the spread of data mining technologies, such technologies are being used for determinations that seriously affect individuals' lives. For example, credit scoring is frequently determined based on records of past credit data together with statistical prediction techniques. Needless to say, such determinations must be nondiscriminatory and fair with respect to sensitive features, such as race, gender, and religion. The goal of fairness-aware classifiers is to classify data while taking into account potential issues of fairness, discrimination, neutrality, and/or independence. In this paper, after reviewing fairness-aware classification methods, we focus on one such method, Calders and Verwer's two-naive-Bayes method. This method has been shown to be superior to other classifiers in terms of fairness, which is formalized as the statistical independence between a class and a sensitive feature. However, the cause of this superiority is unclear, because the method utilizes a somewhat heuristic post-processing technique rather than an explicitly formalized model. We clarify the cause by comparing the method with an alternative naive Bayes classifier, which is modified by a modeling technique called "hypothetical fair-factorization." This investigation reveals the theoretical background of the two-naive-Bayes method and its connections with other methods. Based on these findings, we develop another naive Bayes method with an "actual fair-factorization" technique and empirically show that this new method can achieve a level of fairness equal to that of the two-naive-Bayes classifier.
TRANSCRIPT
The Independence of Fairness-Aware Classifiers
Toshihiro Kamishima*, Shotaro Akaho*, Hideki Asoh*, and Jun Sakuma**
*National Institute of Advanced Industrial Science and Technology (AIST), Japan; **University of Tsukuba, Japan; and Japan Science and Technology Agency
IEEE International Workshop on Privacy Aspects of Data Mining (PADM) @ Dallas, USA, Dec. 7, 2013
1
Introduction
2
Fairness-aware Data Mining: data analysis taking into account potential issues of fairness, discrimination, neutrality, or independence
Fairness-aware Classification: one of the major tasks of fairness-aware data mining; the problem of learning a classifier that predicts a class as accurately as possible under fairness constraints from potentially unfair data
✤ We use the term "fairness-aware" instead of "discrimination-aware," because the word "discrimination" means classification in the ML context, and this technique is applicable to tasks other than avoiding discriminative decisions
[Romei+ 13]
Why the CV2NB method performed better: model bias and the deterministic decision rule
Introduction
3
Calders & Verwer's 2-naive-Bayes (CV2NB) is very simple, but highly effective
Other fairness-aware classifiers are equally accurate, but less fair
Based on our findings, we discuss how to improve our method
Outline
4
Applications of Fairness-aware Data Mining: prevention of unfairness, information-neutral recommendation, ignoring uninteresting information
Fairness-aware Classification: basic notations, fairness in data mining, fairness-aware classification, connection with PPDM, Calders & Verwer's 2-naive-Bayes
Hypothetical Fair-factorization Naive Bayes: hypothetical fair-factorization naive Bayes (HFFNB), connection with other methods, experimental results
Why Did the HFFNB Method Fail? model bias, deterministic decision rule
How to Modify the HFFNB Method: actual fair-factorization method, experimental results
Conclusion
Prevention of Unfairness
6
A suspicious placement of keyword-matching advertisements [Sweeney 13]
[Figure: ads served for searches of African-descent names vs. European-descent names; the former trigger ad copy reading "Arrested?", the latter "Located:"]
Advertisements indicating arrest records were displayed more frequently for names that are more popular among individuals of African descent than for those of European descent
This situation arose simply from optimizing the click-through rate, and no information about users' race was used. Such unfair decisions can be prevented by FADM techniques
Recommendation Neutrality
7
The Filter Bubble Problem: Pariser raised a concern that personalization technologies narrow and bias the topics of information provided to people
[TED Talk by Eli Pariser, http://www.filterbubble.com/]
Friend recommendation lists on Facebook: to fit Pariser's preferences, conservative people were eliminated from his recommendation list, and he was not notified of this fact
FADM technologies are useful for providing neutral information
Ignoring Uninteresting Information
8
[Gondek+ 04]
Simple clustering methods find two clusters: one contains only faces, and the other contains faces with shoulders. Data analysts consider this clustering useless and uninteresting. By ignoring this uninteresting information, more useful male and female clusters can be obtained
non-redundant clustering: find clusters that are as independent as possible from a given uninteresting partition
a conditional information bottleneck method, which is a variant of the information bottleneck method
clustering facial images
Uninteresting information can be excluded by FADM techniques
Basic Notations
10
X: non-sensitive feature vector; all features other than the sensitive feature (non-sensitive, but may correlate with S)
S: sensitive feature; socially sensitive information, e.g., gender or religion
Y: objective variable; the result of a serious decision, e.g., whether or not to grant credit
red-lining effect: one is inclined simply to eliminate a sensitive feature from the calculation, but this action is insufficient
Non-sensitive features that correlate with sensitive features also contain sensitive information
Fairness in Data Mining
11
Fairness in Data Mining: sensitive information does not influence the target value
A sensitive feature and a target variable must be unconditionally independent: Y ⊥⊥ S
It does not suffice that Y and S are merely conditionally independent: Y ⊥⊥ S | X
Fairness-aware Classification
12
True Distribution: Pr[Y, X, S] = Pr[Y | X, S] Pr[X, S]
sample → Data Set: D = {y_i, x_i, s_i}
learn → Estimated Distribution: P̂r[Y, X, S; Θ] = P̂r[Y | X, S; Θ] P̂r[X, S], which approximates the true distribution
fairness constraint → Fair True Distribution: Pr†[Y, X, S] = Pr†[Y | X, S] Pr[X, S]
fairness constraint → Fair Estimated Distribution: P̂r†[Y, X, S; Θ] = P̂r†[Y | X, S; Θ] P̂r[X, S], which approximates the fair true distribution
Fairness-aware Classification
13
We want to approximate the fair true distribution, but samples from this distribution cannot be obtained, because samples from the real world are potentially unfair
fairness-aware classification: find a fair model that approximates the true distribution instead of the fair true distribution, under the fairness constraints
[Figure: the space of distributions, showing the model sub-space containing P̂r[Y, X, S; Θ] and the fair sub-space where Y ⊥⊥ S; the fair estimate P̂r†[Y, X, S; Θ*] approximates the true distribution Pr[Y, X, S] rather than the fair true distribution Pr†[Y, X, S], while P̂r[Y, X, S; Θ*] is the unconstrained estimate]
Privacy-Preserving Data Mining
14
Fairness in Data Mining: the independence between an objective variable Y and a sensitive feature S
From the information-theoretic perspective, the mutual information between Y and S is zero
From the viewpoint of privacy preservation, it is the protection of sensitive information when an objective variable is exposed
Differences from PPDM: introducing randomness is occasionally inappropriate for severe decisions, such as job applications; the disclosure of identity isn't generally problematic in FADM
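Since fairness here is formalized as zero mutual information between Y and S, it can be checked directly from data. Below is a minimal sketch (the function name and sample data are hypothetical) that estimates I(Y; S) from paired observations:

```python
from collections import Counter
from math import log

def mutual_information(ys, ss):
    """Mutual information I(Y; S) in nats, estimated from paired samples."""
    n = len(ys)
    count_y = Counter(ys)              # marginal counts of Y
    count_s = Counter(ss)              # marginal counts of S
    count_ys = Counter(zip(ys, ss))    # joint counts of (Y, S)
    mi = 0.0
    for (y, s), c in count_ys.items():
        p_joint = c / n
        # p_joint * log( p(y, s) / (p(y) * p(s)) )
        mi += p_joint * log(p_joint * n * n / (count_y[y] * count_s[s]))
    return mi

print(mutual_information([1, 1, 0, 0], [1, 1, 0, 0]))  # Y depends on S: positive
print(mutual_information([1, 0, 1, 0], [1, 1, 0, 0]))  # Y independent of S: zero
```

A perfectly fair classifier drives this quantity to zero; the Normalized Prejudice Index used later in the experiments is a normalized variant of the same idea.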
Calders-Verwer’s 2-Naive-Bayes
15
Naive Bayes vs. Calders-Verwer Two Naive Bayes (CV2NB) [Calders+ 10]
Naive Bayes: S and X are conditionally independent given Y
CV2NB: the non-sensitive features in X are conditionally independent given Y and S
Unfair decisions are modeled by introducing the dependence of X on S as well as on Y
[Graphical models: in naive Bayes, Y points to both S and X; in CV2NB, Y and S are dependent and both point to X]
✤ It is as if two naive Bayes classifiers are learned, one for each value of the sensitive feature; that is why this method is named the 2-naive-Bayes
Post-processing keeps the updated marginal distribution close to the sample distribution:
while Pr[Y=1 | S=1] − Pr[Y=1 | S=0] > 0:
    if # of data classified as "1" < # of "1" samples in the original data:
        increase Pr[Y=1, S=0] and decrease Pr[Y=0, S=0]
    else:
        increase Pr[Y=0, S=1] and decrease Pr[Y=1, S=1]
    reclassify samples using the updated model Pr[Y, S]
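The post-processing loop can be sketched in code. This is a simplified illustration, not the authors' released implementation: the step size `delta`, the iteration cap, and the `classify` callback are all assumptions.

```python
import numpy as np

def cv2nb_postprocess(joint_ys, classify, data, n_pos_orig,
                      delta=0.01, max_iter=1000):
    """CV2NB-style fairness enhancement of the joint Pr[Y, S].

    joint_ys   : 2x2 array, joint_ys[y, s] = Pr[Y=y, S=s]
    classify   : function(joint, data) -> array of predicted labels
    n_pos_orig : number of "1" samples in the original data
    """
    joint = joint_ys.copy()
    for _ in range(max_iter):
        p_y1_s1 = joint[1, 1] / joint[:, 1].sum()
        p_y1_s0 = joint[1, 0] / joint[:, 0].sum()
        if p_y1_s1 - p_y1_s0 <= 0:        # discrimination removed: stop
            break
        labels = classify(joint, data)
        if labels.sum() < n_pos_orig:     # too few positives: promote S=0
            joint[1, 0] += delta
            joint[0, 0] = max(joint[0, 0] - delta, 0.0)
        else:                             # too many positives: demote S=1
            joint[0, 1] += delta
            joint[1, 1] = max(joint[1, 1] - delta, 0.0)
        joint /= joint.sum()              # keep it a proper distribution
    return joint
```

Shifting mass between the Pr[Y, S] cells while re-counting positives is exactly how the heuristic keeps the class marginal close to the sample distribution while closing the gap between the two groups.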
Calders-Verwer’s 2-Naive-Bayes
16
update the joint distribution so that its fairness is enhanced [Calders+ 10]
P̂r[Y, X, S] = P̂r[Y, S] ∏_i P̂r[X_i | Y, S]
estimated model: parameters are initialized by the corresponding sample distributions
fair estimated model: P̂r[Y, S] is modified into P̂r†[Y, S] so as to improve the fairness, while the marginal P̂r[Y] is kept close to the sample distribution
Hypothetical Fair-factorization
18
Hypothetical Fair-factorization: a modeling technique to make a classifier fair
In a classification model, the sensitive feature and the target variable are decoupled; by this technique, they become statistically independent
[Graphical model of a fair-factorized classifier: Y and S are disconnected, and both point to X]
Hypothetical Fair-factorization Naive Bayes (HFFNB): the fair-factorization technique applied to a naive Bayes model
P̂r†[Y, X, S] = P̂r†[Y] P̂r†[S] ∏_k P̂r†[X^(k) | Y, S]
Under the ML or MAP principle, model parameters can be derived by simply counting training examples
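Parameter estimation by counting can be sketched as follows. This is an illustrative reading of the factorization, assuming binary labels and binary features with Laplace smoothing (the smoothing choice and all names are assumptions, not the authors' code):

```python
def fit_hffnb(Y, S, X):
    """Estimate HFFNB parameters by counting (ML with Laplace smoothing).

    Y, S : lists of binary values; X : list of binary feature vectors.
    Returns (pr_y, pr_s, pr_x) for the factorization
    Pr†[Y, X, S] = Pr†[Y] Pr†[S] Π_k Pr†[X(k) | Y, S].
    """
    n, d = len(Y), len(X[0])
    # independent marginals for Y and S (this is what makes the model "fair")
    pr_y = {y: (sum(1 for yi in Y if yi == y) + 1) / (n + 2) for y in (0, 1)}
    pr_s = {s: (sum(1 for si in S if si == s) + 1) / (n + 2) for s in (0, 1)}
    # Pr[X(k)=1 | Y=y, S=s] for each feature k, by counting within each cell
    pr_x = {}
    for y in (0, 1):
        for s in (0, 1):
            idx = [i for i in range(n) if Y[i] == y and S[i] == s]
            pr_x[(y, s)] = [(sum(X[i][k] for i in idx) + 1) / (len(idx) + 2)
                            for k in range(d)]
    return pr_y, pr_s, pr_x
```

The only difference from an ordinary naive Bayes fit is that the joint P̂r[Y, S] is replaced by the product of the two marginals, so Y and S are independent by construction over the estimated distribution.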
Connection with the ROC Decision Rule
19
[Kamiran+ 12]
In a non-fairized case, a new object (x, s) is classified into class 1 if P̂r[Y=1 | x, s] ≥ 1/2 ≡ p
Kamiran et al.'s ROC decision rule: a fair classifier is built by changing the decision boundary, p, according to the value of the sensitive feature
The HFFNB method is equivalent to changing the decision boundary, p, to
p′ = P̂r[Y | S] (1 − P̂r[Y]) / (P̂r[Y] + P̂r[Y | S] − 2 P̂r[Y] P̂r[Y | S])
(by Elkan's theorem regarding cost-sensitive learning)
Thus the HFFNB can be considered a special case of the ROC method
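The modified boundary p′ depends only on the two estimates P̂r[Y] and P̂r[Y | S], so it can be computed directly. A minimal sketch (function name hypothetical):

```python
def hffnb_threshold(pr_y, pr_y_given_s):
    """Decision boundary p' to which HFFNB is equivalent (via Elkan's
    cost-sensitive theorem): classify (x, s) as 1 iff Pr[Y=1|x,s] >= p'."""
    num = pr_y_given_s * (1.0 - pr_y)
    den = pr_y + pr_y_given_s - 2.0 * pr_y * pr_y_given_s
    return num / den

print(hffnb_threshold(0.3, 0.3))  # P̂r[Y|S] = P̂r[Y]: boundary stays at 1/2
print(hffnb_threshold(0.5, 0.3))  # disadvantaged group: boundary drops below 1/2
```

When P̂r[Y | S] equals the overall P̂r[Y], p′ reduces to 1/2 and the rule coincides with the ordinary decision rule; for a group with P̂r[Y | S] < P̂r[Y], the boundary is lowered, which is exactly the per-group threshold shift of the ROC method.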
CV2NB vs HFFNB
20
The CV2NB and HFFNB methods are compared in their accuracy and fairness

Method   Accuracy   Unfairness (NPI)
HFFNB    0.828      1.52×10⁻²
CV2NB    0.828      6.89×10⁻⁶

Accuracy: the larger the value, the more accurate the prediction
Unfairness (Normalized Prejudice Index): the larger the value, the unfairer the prediction
The HFFNB method is as accurate as the CV2NB method, but its predictions are much unfairer
WHY?
Why Did the HFFNB Method Fail?
22
HFFNB method: the fairness constraint is explicitly imposed on the model
CV2NB method: the fairness of the model is enhanced by post-processing
Though both models are designed to enhance fairness, the CV2NB method consistently learns a much fairer model
Two reasons why the modeled independence is damaged:
Model Bias: the difference between the model and the true distribution
Deterministic Decision Rule: class labels are not probabilistically generated, but are deterministically chosen by the decision rule
Model Bias
23
In a hypothetically fair-factorized model, data are assumed to be generated according to the estimated distribution: P̂r[Y] P̂r[S] P̂r[X | Y, S]
In reality, input objects are first generated from the true distribution, and then each object is labeled according to the estimated distribution: P̂r[Y | X, S] (estimated) × Pr[X, S] (true)
These two distributions diverge, especially if the model bias is high
The divergence between the estimated and true distributions due to model bias damages fairness in classification
Model Bias
24
Synthetic data: data are truly generated from a naive Bayes model, and the model bias is controlled by the number of features
Changes of the NPI (fairness):

Method   high bias   →           low bias
HFFNB    1.02×10⁻¹   1.10×10⁻¹   1.28×10⁻¹
CV2NB    5.68×10⁻⁴   9.60×10⁻²   1.28×10⁻¹

As the model bias decreases, the difference in fairness between the two methods decreases
Deterministic Decision Rule
25
In a hypothetically fair-factorized model, labels are assumed to be generated probabilistically according to the distribution P̂r[Y | X, S]
Predicted labels are instead generated by the deterministic decision rule: y* = argmax_{y ∈ Dom(Y)} P̂r[Y=y | X, S]
Labels generated by these two processes do not agree in general
26
simple classification model: one binary label and one binary feature
class distribution is uniform: P̂r[Y=1] = 0.5
Y* is deterministically determined: Y* = argmax_y Pr[Y=y | X]
changing parameters: Pr[X=1 | Y=1] and Pr[X=1 | Y=0]
[Figure: surface of E[Y*] over Pr[X=1 | Y=1] and Pr[X=1 | Y=0], each ranging from 0.0 to 1.0; E[Y*] = E[Y] holds only on a narrow subset of parameter settings]
E[Y*] and E[Y] do not agree in general
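This toy setting can be checked numerically. The sketch below (names hypothetical) evaluates E[Y] and E[Y*] exactly for one binary feature with Pr[Y=1] = 0.5:

```python
def expectations(p_x1_y1, p_x1_y0, p_y1=0.5):
    """E[Y] vs E[Y*] for one binary feature under the MAP decision rule."""
    e_y = p_y1                     # labels generated probabilistically
    e_ystar = 0.0                  # labels chosen by argmax Pr[Y | X]
    for x in (0, 1):
        px_y1 = p_x1_y1 if x == 1 else 1 - p_x1_y1
        px_y0 = p_x1_y0 if x == 1 else 1 - p_x1_y0
        p_x = p_y1 * px_y1 + (1 - p_y1) * px_y0   # Pr[X=x]
        posterior = p_y1 * px_y1 / p_x            # Pr[Y=1 | X=x]
        y_star = 1 if posterior >= 0.5 else 0     # deterministic decision rule
        e_ystar += p_x * y_star
    return e_y, e_ystar

# E[Y] stays at 0.5, while E[Y*] shifts with the feature distribution
print(expectations(0.9, 0.2))
```

For instance, with Pr[X=1 | Y=1] = 0.9 and Pr[X=1 | Y=0] = 0.2 the deterministic rule predicts 1 exactly when X = 1, so E[Y*] = Pr[X=1] ≠ 0.5 = E[Y], illustrating the disagreement on the slide.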
Actual Fair-factorization
28
The reason the HFFNB method failed is that it ignores the influence of the model bias and the deterministic decision rule
Actual Fair-factorization: a class and a sensitive feature are decoupled not over the estimated distribution, but over the actual distribution
As in the hypothetical fair-factorization, a class label and a sensitive feature are made statistically independent; however, we consider not the estimated distribution, P̂r[Y, X, S], but the actual distribution, P̂r[Y | X, S] Pr[X, S], and as class labels we adopt deterministically decided labels
The multiplication by the true distribution, Pr[X, S], is approximated by a sample mean
Actual Fair-factorization naive Bayes
29
Actual Fair-factorization Naive Bayes (AFFNB): the actual fair-factorization technique applied to a naive Bayes model
model bias: the actual distribution P̂r[Y | X, S] Pr[X, S] is approximated by the sample mean (1/|D|) Σ_{(x,s) ∈ D} P̂r[Y | X=x, S=s]
deterministic decision rule: instead of using a distribution of class labels, we count the deterministically decided class labels
Y* and S are made independent under the constraint that the marginal distributions of Y* and S equal the corresponding sample distributions
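The decoupling over the actual distribution can be illustrated on the joint of the deterministic labels Y* and S: estimate the empirical joint and replace it by the product of its sample marginals, which enforces Y* ⊥⊥ S while both marginals match the sample distribution. This is a simplified sketch, not the authors' exact estimator:

```python
import numpy as np

def actual_fair_joint(y_star, s):
    """Empirical joint of deterministic labels Y* and S, together with its
    fair-factorized version: the product of the sample marginals, which
    makes Y* independent of S while preserving both marginals."""
    y_star = np.asarray(y_star)
    s = np.asarray(s)
    joint = np.zeros((2, 2))
    for yv in (0, 1):
        for sv in (0, 1):
            # sample mean over D of the indicator for (Y*=yv, S=sv)
            joint[yv, sv] = np.mean((y_star == yv) & (s == sv))
    marg_y = joint.sum(axis=1)            # sample marginal of Y*
    marg_s = joint.sum(axis=0)            # sample marginal of S
    fair = np.outer(marg_y, marg_s)       # independent joint, same marginals
    return joint, fair
```

Counting deterministically decided labels (rather than averaging P̂r[Y | X, S]) is what addresses the deterministic-decision-rule problem; using the sample distribution addresses the model bias.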
CV2NB and AFFNB
30
The CV2NB and AFFNB methods are compared in their accuracy and fairness

Method   Accuracy   Unfairness (NPI)
AFFNB    0.828      5.43×10⁻⁶
CV2NB    0.828      6.89×10⁻⁶

CV2NB and AFFNB are equally accurate and equally fair
The superiority of the CV2NB method lies in considering the independence of a class label and a sensitive feature not over the estimated distribution, but over the actual distribution
Conclusion
31
Contributions
After reviewing the fairness-aware classification task, we focused on why the CV2NB method attains fairer results than other methods
We showed the reason theoretically and empirically by comparing it with a simple alternative naive Bayes modified by a hypothetical fair-factorization technique
Based on these findings, we developed a modified version, the actual fair-factorization technique, and showed that it drastically improves the fairness
Future Work
We plan to apply our actual fair-factorization technique to modify other classification methods, such as logistic regression or support vector machines
Program Codes and Data Sets
Fairness-aware Data Mining: http://www.kamishima.net/fadm
Information-neutral Recommender System: http://www.kamishima.net/inrs
Acknowledgements
This work is supported by MEXT/JSPS KAKENHI Grant Numbers 16700157, 21500154, 23240043, 24500194, and 25540094.