
Page 1: Fairness and Transparency in Machine Learning

Fairness and transparency in machine learning

Tools and techniques

PyData Berlin – 2017
Andreas Dewes (@japh44)

Page 2: Fairness and Transparency in Machine Learning

Introduction: Why think about this?

Page 3: Fairness and Transparency in Machine Learning

Fairness in Machine Learning

• Fairness is not a technological problem, but unfair behavior can be replicated / automated using technology.

• Machine learning systems are not per se fair or unfair, but have the potential to be either depending on how we use them.

• We have a chance to eliminate unfairness by using machine learning and data analysis to make personal biases explicit and to design systems that eliminate them!

Page 4: Fairness and Transparency in Machine Learning

Discrimination

Discrimination is treatment or consideration of, or making a distinction in favor of or against, a person or thing based on the group, class, or category to which that person or thing is perceived to belong, rather than on individual merit.

Protected attributes (examples):

Ethnicity, Gender, Sexual Orientation, ...

Page 5: Fairness and Transparency in Machine Learning

When is a process discriminating?

Disparate Impact: adverse impact of a process C on a given group X:

$$\frac{P(C = \text{YES} \mid X = 0)}{P(C = \text{YES} \mid X = 1)} < \tau$$

see e.g. "Certifying and Removing Disparate Impact", M. Feldman et al. (arxiv.org)

Page 6: Fairness and Transparency in Machine Learning

Estimating τ with real-world data

$$\hat{\tau} = \frac{c \,/\, (a + c)}{d \,/\, (b + d)}$$

where a, b, c, d are the cell counts of the 2×2 table of group membership versus decision outcome (c and d count the positive decisions in each group).
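As a minimal sketch (assuming a pandas DataFrame with hypothetical column names such as "race" and "frisked", and hypothetical value encodings), the ratio above can be estimated directly from the two groups' positive-decision rates:

```python
import pandas as pd

def disparate_impact(df, group_col, outcome_col, protected_value, reference_value):
    """Empirical disparate-impact ratio: positive-decision rate of the protected
    group divided by the rate of the reference group (c/(a+c) over d/(b+d))."""
    protected = df[df[group_col] == protected_value]
    reference = df[df[group_col] == reference_value]
    rate_protected = (protected[outcome_col] == 1).mean()
    rate_reference = (reference[outcome_col] == 1).mean()
    return rate_protected / rate_reference

# Hypothetical usage: a ratio well below 1 indicates adverse impact on the protected group.
# tau = disparate_impact(stops, "race", "frisked", protected_value="B", reference_value="W")
```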

Page 7: Fairness and Transparency in Machine Learning

Alternative Approaches: Individual-Based Fairness

$$\lvert f(x_1) - f(x_2) \rvert \le L \cdot \lVert x_1 - x_2 \rVert$$

Similar individuals => similar treatment!
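One way to probe this condition empirically (a sketch, assuming a scikit-learn-style binary classifier, a Euclidean metric on the feature space, and a hypothetical Lipschitz constant L):

```python
import numpy as np

def lipschitz_violation_rate(model, X, L=1.0, n_pairs=10_000, seed=0):
    """Sample random pairs of individuals and report the fraction of pairs whose
    score difference exceeds L times their distance in feature space."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(X), n_pairs)
    j = rng.integers(0, len(X), n_pairs)
    scores = model.predict_proba(X)[:, 1]        # assumes a classifier with predict_proba
    score_diff = np.abs(scores[i] - scores[j])
    distance = np.linalg.norm(X[i] - X[j], axis=1)
    return float(np.mean(score_diff > L * distance))
```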

Page 8: Fairness and Transparency in Machine Learning

Let's try to design a fair & transparent algorithm

Page 9: Fairness and Transparency in Machine Learning

NYC Stop & Frisk Dataset

https://gist.github.com/dannguyen/67ece10c6132282b1da2

• Design a stop & frisk algorithm that is as fair as possible

• Ensure it fulfills the other goals that we have for it

Page 10: Fairness and Transparency in Machine Learning

Input Variables

Appearance-related attributes:

Age – SUSPECT'S AGE
Weight – SUSPECT'S WEIGHT
Ht_feet – SUSPECT'S HEIGHT (FEET)
Eyecolor – SUSPECT'S EYE COLOR
Haircolor – SUSPECT'S HAIRCOLOR
Race – SUSPECT'S RACE
Sex – SUSPECT'S SEX
Build – SUSPECT'S BUILD
CS_Cloth – WEARING CLOTHES COMMONLY USED IN A CRIME
CS_Objcs – CARRYING SUSPICIOUS OBJECT
CS_Bulge – SUSPICIOUS BULGE
CS_Descr – FITS A RELEVANT DESCRIPTION
RF_Attir – INAPPROPRIATE ATTIRE FOR SEASON

Behavior-related attributes:

ac_evasv – EVASIVE RESPONSE TO QUESTIONING
ac_assoc – ASSOCIATING WITH KNOWN CRIMINALS
cs_lkout – SUSPECT ACTING AS A LOOKOUT
cs_drgtr – ACTIONS INDICATIVE OF A DRUG TRANSACTION
cs_casng – CASING A VICTIM OR LOCATION
cs_vcrim – VIOLENT CRIME SUSPECTED
ac_cgdir – CHANGE DIRECTION AT SIGHT OF OFFICER
cs_furtv – FURTIVE MOVEMENTS
ac_stsnd – SIGHTS OR SOUNDS OF CRIMINAL ACTIVITY
rf_othsw – OTHER SUSPICION OF WEAPONS
rf_knowl – KNOWLEDGE OF SUSPECT'S PRIOR CRIMINAL BEHAVIOR
rf_vcact – ACTIONS OF ENGAGING IN A VIOLENT CRIME
rf_verbl – VERBAL THREATS BY SUSPECT

Circumstance-related attributes:

inout – WAS STOP INSIDE OR OUTSIDE?
trhsloc – WAS LOCATION HOUSING OR TRANSIT AUTHORITY?
timestop – TIME OF STOP (HH:MM)
pct – PRECINCT OF STOP (FROM 1 TO 123)
ac_proxm – PROXIMITY TO SCENE OF OFFENSE
cs_other – OTHER
ac_rept – REPORT BY VICTIM / WITNESS / OFFICER
ac_inves – ONGOING INVESTIGATION
ac_incid – AREA HAS HIGH CRIME INCIDENCE
ac_time – TIME OF DAY FITS CRIME INCIDENCE
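For the experiments below it is convenient to keep these groups as Python lists. A minimal sketch; the spellings follow the slide, and the exact casing of the column names in the raw CSV is an assumption:

```python
# Attribute groups as listed on this slide (column-name casing is an assumption).
APPEARANCE = ["Age", "Weight", "Ht_feet", "Eyecolor", "Haircolor", "Race", "Sex", "Build",
              "CS_Cloth", "CS_Objcs", "CS_Bulge", "CS_Descr", "RF_Attir"]
BEHAVIOR = ["ac_evasv", "ac_assoc", "cs_lkout", "cs_drgtr", "cs_casng", "cs_vcrim",
            "ac_cgdir", "cs_furtv", "ac_stsnd", "rf_othsw", "rf_knowl", "rf_vcact", "rf_verbl"]
CIRCUMSTANCE = ["inout", "trhsloc", "timestop", "pct", "ac_proxm", "cs_other",
                "ac_rept", "ac_inves", "ac_incid", "ac_time"]
```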

Page 11: Fairness and Transparency in Machine Learning

Process Model

Possible goals:

• Build a system that decides whether to frisk someone or not.

• Try to maximize discovery of criminals while not bothering law-abiding citizens.

• Do not discriminate against individual groups of people.

Page 12: Fairness and Transparency in Machine Learning

Choosing A Loss Function

Give a reward α if our algorithm correctly identifies a person to frisk.

Give a penalty −1 if our algorithm wrongly identifies a person to frisk.

α (weight parameter): „It's okay to frisk α + 1 people to find one criminal."
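A minimal sketch of this loss as a scoring function (the function name and the default value of α are assumptions, not the talk's exact code):

```python
import numpy as np

def frisk_score(y_true, y_frisk, alpha=10.0):
    """Reward alpha for every correctly frisked criminal (true positive),
    penalty 1 for every wrongly frisked person (false positive)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_frisk = np.asarray(y_frisk, dtype=bool)
    true_positives = np.sum(y_frisk & y_true)
    false_positives = np.sum(y_frisk & ~y_true)
    return alpha * true_positives - false_positives
```

Frisking a person whose probability of being a criminal is p has positive expected score only if α·p − (1 − p) > 0, i.e. if p > 1/(α + 1); this is exactly the trade-off quoted above.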

Page 13: Fairness and Transparency in Machine Learning

Measuring Fairness via Disparate Treatment

Page 14: Fairness and Transparency in Machine Learning

Building a First Model

• Clean input data
• Select attributes
• Convert to binary values using the „one hot" method
• Train a classifier on the target value
• Measure the score and discrimination metrics

In practice:

• Load the CSV data into a dataframe, discretize all attributes, clean the input data.
• Use a logistic regression classifier to predict the target attribute.
• Split the data into training/test sets using a 70/30 split.
• Generate models for a range of α values and compare their performance.
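A minimal sketch of this pipeline (the file name, the target column "frisked", and its value encoding are assumptions; it reuses the attribute lists and frisk_score from the sketches above):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("sqf_data.csv", low_memory=False)           # hypothetical file name

# "One hot" encoding: treat every attribute as categorical and binarize it.
features = pd.get_dummies(df[APPEARANCE + BEHAVIOR + CIRCUMSTANCE].astype(str))
target = (df["frisked"] == "Y").astype(int)                   # assumed target encoding

X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.3, random_state=0)          # 70/30 split

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

print(frisk_score(y_test, y_pred, alpha=10.0))
```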

Page 15: Fairness and Transparency in Machine Learning

First Attempt: To Frisk Or Not To Frisk…

Input attributes → was this person frisked?

Page 16: Fairness and Transparency in Machine Learning

How To Judge The Success Rate of The Algorithm

Our algorithm should at least be as good as a random algorithm at picking people to frisk.

It can „buy" true positives by accepting false positives. The higher α is, the more profitable this trade becomes.

Eventually we will have frisked all people, which is a solution to the problem (but not a good one…)

Page 17: Fairness and Transparency in Machine Learning

Example: Predicting Only With Noise (No Information)

We give no useful information to the algorithm at all.

It will therefore pick the action (frisk / not frisk) that will globally maximize the score when chosen for all people.
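A sketch of this baseline (reusing the variables from the pipeline sketch above): with a single noise feature the predicted probability is roughly the overall base rate for everyone, so thresholding it at 1/(α + 1) yields the globally best constant action.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X_noise = rng.normal(size=(len(target), 1))                   # pure noise, no information

Xn_train, Xn_test, yn_train, yn_test = train_test_split(
    X_noise, target, test_size=0.3, random_state=0)

baseline = LogisticRegression().fit(Xn_train, yn_train)
alpha = 10.0
p = baseline.predict_proba(Xn_test)[:, 1]                     # ~ base rate for every person
frisk_all_or_none = p > 1.0 / (alpha + 1.0)                   # frisk only if expected score is positive
print(frisk_score(yn_test, frisk_all_or_none, alpha=alpha))
```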

Page 18: Fairness and Transparency in Machine Learning

Predicting „frisk" with all available input attributes

Now we give it all the input attributes that we got.

It will make a prediction that is much better than randomly choosing a person to frisk.

Page 19: Fairness and Transparency in Machine Learning

What does it mean for individual groups?

There is strong mistreatment of individual groups.

The algorithm learned to be just as biased as the training data.
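To make the group-wise mistreatment visible, one can compare the model's frisk rate and false-positive rate per group on the test set. A sketch; the "Race" column name is an assumption and the variables come from the pipeline sketch above:

```python
import pandas as pd

report = pd.DataFrame({
    "race": df.loc[X_test.index, "Race"].values,   # group labels for the test rows
    "model_frisk": y_pred,                         # the algorithm's decisions
    "target": y_test.values,                       # what actually happened
})

frisk_rate = report.groupby("race")["model_frisk"].mean()
false_positive_rate = report[report["target"] == 0].groupby("race")["model_frisk"].mean()
print(pd.DataFrame({"frisk_rate": frisk_rate, "false_positive_rate": false_positive_rate}))
```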

Page 20: Fairness and Transparency in Machine Learning

Where does the bias come from?

Let's see!

Predict „black" from the available attributes.

The algorithm can easily differentiate between „white" and „black".
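A sketch of this check (the value encoding of the race column is an assumption): train a classifier to predict the protected attribute from the remaining inputs and look at how well it separates the groups.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

race_target = (df["Race"] == "B").astype(int)                 # assumed encoding for „black"
non_race_features = features.drop(
    columns=[c for c in features.columns if c.startswith("Race")])

Xr_train, Xr_test, r_train, r_test = train_test_split(
    non_race_features, race_target, test_size=0.3, random_state=0)

race_clf = LogisticRegression(max_iter=1000).fit(Xr_train, r_train)
auc = roc_auc_score(r_test, race_clf.predict_proba(Xr_test)[:, 1])
print(auc)   # an AUC far above 0.5 means race is easy to reconstruct from the other inputs
```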

Page 21: Fairness and Transparency in Machine Learning

Eliminating The Bias: Different Approaches

1. Remove / modify data points

2. Remove attributes

3. Constrain the algorithm (restrict it to allowed solutions, ruling out forbidden ones)

…or change the target attribute!

Page 22: Fairness and Transparency in Machine Learning

Trying Different Attribute Sets To Predict „Black"

With only behavior-based attributes: almost no prediction possible!

With only circumstance-based attributes: prediction still possible (probably due to the „pct" attribute).

Page 23: Fairness and Transparency in Machine Learning

Let's Try Reducing the Features: Use Only Behavior

(for comparison: previous model with all features)

Page 24: Fairness and Transparency in Machine Learning

Disparate Treatment Is Reduced (But So Is Usefulness)

(for comparison: previous model with all features)

Page 25: Fairness and Transparency in Machine Learning

Let's Try Using A Different Target Attribute

Input attributes → was this person arrested / summoned?

Page 26: Fairness and Transparency in Machine Learning

Training with A Different Target: Arrests + Summons (only using circumstance-based attributes)

There should be less bias in the arrests, as it is harder (but still possible) to arrest someone who is innocent.
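A sketch of the target swap (the "arstmade" and "sumissue" column names and their value encodings are assumptions): predict „arrested or summoned" from the circumstance-based attributes only, using the same pipeline as before.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# New target: was the person arrested or issued a summons? (assumed columns/encodings)
arrest_target = ((df["arstmade"] == "Y") | (df["sumissue"] == "Y")).astype(int)
circ_features = pd.get_dummies(df[CIRCUMSTANCE].astype(str))

Xc_train, Xc_test, a_train, a_test = train_test_split(
    circ_features, arrest_target, test_size=0.3, random_state=0)
arrest_clf = LogisticRegression(max_iter=1000).fit(Xc_train, a_train)
print(frisk_score(a_test, arrest_clf.predict(Xc_test), alpha=10.0))
```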

Page 27: Fairness and Transparency in Machine Learning

As Expected, Bias Is Reduced

No „preferential" treatment evident for white people in the data (on the contrary)

Much better!

Page 28: Fairness and Transparency in Machine Learning

Better (But Still Imperfect) Treatment By The Algorithm

Page 29: Fairness and Transparency in Machine Learning

Take-Aways

• Most training data that we use contains biases

• Some of these biases are implicit and not easy to recognize (if we don't look)

• To protect people from discrimination, we need to record and analyze their sensitive data (in a secure way)

• Machine learning and data analysis can uncover hidden biases in processes (if we're transparent about the methods)

• Algorithmic systems can improve the fairness of manual processes by ensuring no biases are present

Page 30: Fairness and Transparency in Machine Learning

Outlook: What Future ML Systems Could Look Like

[Architecture diagram: an ML Algorithm that receives sanitized, non-protected input data and produces results; an Explainer that produces explanations; and an Auditor that receives the protected input data and computes fairness metrics.]
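A very rough sketch of such a split (all class and method names are hypothetical; only the auditor path is shown): the model itself only ever sees the sanitized inputs, while the protected data is used solely to audit the results.

```python
import pandas as pd

class Auditor:
    """Computes fairness metrics from the decisions and the protected attributes,
    which never reach the model itself."""
    def __init__(self, protected):
        self.protected = protected                    # e.g. a pandas Series of group labels

    def fairness_metrics(self, decisions):
        rates = decisions.groupby(self.protected).mean()
        return {"disparate_impact": rates.min() / rates.max()}

class FairPipeline:
    def __init__(self, model, auditor):
        self.model = model
        self.auditor = auditor

    def run(self, sanitized_inputs):
        decisions = pd.Series(self.model.predict(sanitized_inputs),
                              index=sanitized_inputs.index)
        return decisions, self.auditor.fairness_metrics(decisions)
```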

Page 31: Fairness and Transparency in Machine Learning

Thanks!

Slides, code, literature and data will be made available here:

https://github.com/adewes/fatml-pydata

Contact me: [email protected] (@japh44)

Image Credits: https://gist.github.com/dannguyen/67ece10c6132282b1da2

https://commons.wikimedia.org/wiki/File:Deadpool_and_Predator_at_Big_Apple_Con_2009.jpg