data and ethics: why data science needs one


Upload: tim-rich

Post on 11-Apr-2017


Category:

Data & Analytics



TRANSCRIPT

DATA AND ETHICS

Tim Rich, Director of Data Science, Publicis Worldwide

WHAT IS THIS?

‣ Advertisers and ethics… WTF!

‣ What, me ethical?

‣ Mapping the code.

‣ Why do this at all?

WHAT IS THIS NOT?

‣ An attempt to get you to Tweet about something

‣ A vision for Tim’s perfect future

‣ A shameless plug for any association, business or way of thinking

THAT BEING SAID, STICK AROUND AND GET YOUR MIND BLOWN

WHY DOES ADVERTISING CARE?

ADVERTISING SPENDS THE MONEY

“Follow the money.” -Karl Marx

AND IT’S A LOT...

579 billion = 2015 GDP of Portugal (198 billion) + Vietnam (199 billion) + Czech Republic (182 billion)

IMF - World Bank

https://blog.pagefair.com/2015/ad-blocking-report/

BUT WE HAVE A LOT TO LOSE

Brad Frost - Death to Bullshit

AND WE ALSO NEED TO RETHINK OUR METHODS

BUT DON’T FEAR – WE HAVE DATA AND DATA SCIENTISTS!

WHAT IS A DATA SCIENTIST?

‣ Statistics

‣ Data Strategy

‣ Social Science

‣ Coding chops

‣ Good Looks

AND WE SEEM TO HAVE MORE AND MORE OF THEM IN THE WORLD IN GENERAL

O’Reilly 2015 Data Science Survey http://duu86o6n09pv.cloudfront.net/reports/2015-data-science-salary-survey.pdf

[Bar chart: percent of respondents by reported age, of +/- 600 respondents. <21: 1%; 21-25: 9%; 26-30: 23%; 31-35: 25%; 36-40: 14%; 41-45: 13%; 46-50: 6%; 51-55: 5%; 56+: 4%]

THEY ARE ALSO A YOUNG BUNCH

AND THAT MAKES SENSE AS IT IS A YOUNG PROFESSION

1996: Members of the International Federation of Classification Societies (IFCS) meet in Kobe, Japan.

2001: William S. Cleveland publishes “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics.”

FIRST USE OF “DATA SCIENCE”

THE PAPER THAT LAUNCHED 1,000 NERDS

MOREOVER, NEW ENTRANTS INTO THE FIELD ARE NOT GIVEN VERY MUCH ETHICAL TRAINING

Surveyed Syllabi from 13 Intro to Data Science Courses

ONLY THREE HAVE AT LEAST ONE MENTION OF AN “ETHICS” COMPONENT IN THE SYLLABUS

REGARDLESS, DATA SCIENCE IS AFFECTING ALL OF OUR EVERYDAY LIVES…

OUR ONLINE LIVES…

OUR MOVEMENT…

OUR MEDIA…

OUR MILITARY…

OUR POLITICS…

EVEN OUR IPHONES...

– Tim Cook

Earl, I think Data Science needs a code of ethics.

Yup.

A CODE OF ETHICS WOULD

‣ Establish credibility and responsibility outside of nerd-dom

‣ Provide a starting point to act as technology changes

‣ Galvanize the disparate data practitioner community

ALL THAT’S FINE… BUT TO BUILD ANYTHING YOU FIRST HAVE TO UNDERSTAND WHAT YOU ARE WORKING WITH

A crash course in codes of ethics: THAT SHIT HUMANS DO

A TIMELINE OF ETHICAL CODES

‣ ~2300 bce: EGYPTIAN CODE OF MA’AT

‣ ~1200 bce: JEWISH TORAH

‣ ~500 bce: HIPPOCRATIC OATH

‣ ~1000: BUSHIDO WARRIOR CODE

‣ ~1600: PIRATE’S CODE OF THE BRETHREN

‣ 1831: FRENCH FOREIGN LEGION CODE D'HONNEUR

‣ 1914: JOURNALIST’S CREED

‣ 1946: NUREMBERG CODE

‣ 1981: I.R.B. - EXEMPT COMMON RULE

‣ 1985: INTERNATIONAL STATISTICAL INSTITUTE

‣ 1992: ASSOCIATION FOR COMPUTING MACHINERY

‣ 1999: AMERICAN STATISTICAL ASSOCIATION

‣ 2005: DRAFT MODEL BIOETHICISTS CODE

increase of professional codes

ETHICAL CODES ARE NOT ALL THE SAME BUT THEY HAVE TWO CLASSES OF CHARACTERISTICS

Inward facing goals

Outward facing goals

INWARD FACING GOALS

‣ Provide guidance when norms are not explicit

‣ Reduce internal conflicts and build a common purpose

‣ Establish professional behavior

‣ Deter unethical behavior with sanctions and internal reporting structures

OUTWARD FACING GOALS

‣ Protect vulnerable populations who could be harmed by the profession’s activities

‣ Establish the profession as a distinct moral community worthy of autonomy

‣ Serve as tool for disputes between member and non-member parties

‣ Create institutions resilient to external pressures

PROMOTE POSITIVE ENFORCEMENT

‣ Accept that the distributed nature of professional communities creates too many judicial problems for active regulation

‣ Construct the code with consensus allowing for broad buy-in

‣ Set boundaries and expectations of the practicing community, allowing for self-affirming social control mechanisms

OVERALL A PROFESSIONAL CODE OF ETHICS SHOULD:

‣ Mediate internal group needs and external community interactions

‣ Adapt to future unknown circumstances

‣ Inspire collective identity supporting adherence and adoption

OKAY PROFESSOR, SO WHAT IS THE REAL REASON DATA SCIENCE NEEDS AN ETHICAL CODE?

MORAL HAZARD

"In economics, moral hazard occurs when one person takes more risks because someone else bears the burden of those risks." – Wikipedia

https://en.wikipedia.org/wiki/Moral_hazard

MORAL HAZARD IN LENDING

http://www.pnhp.org/facts/single-payer-resources

MORAL HAZARD IN HEALTH CARE

http://www.economist.com/news/world-week/21569742-kals-cartoon

MORAL HAZARD IN ARMAMENTS

‣ Connections between data and the people it represents are very abstracted

‣ Digital creations affect people we never see

‣ Unintended algorithmic consequences are almost never known or explored

‣ When was the last time an algorithm ever “hurt” anybody?

DATA SCIENCE IS STEEPED IN MORAL HAZARD

Well, shit.

HOW A DATA SCIENCE CODE MAY BEGIN TO LOOK

“Data can be useful or anonymous, but never both.”

– Paul Ohm, “Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization,” UCLA Law Review 57, p. 1702
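Ohm's point can be made concrete with a toy example. The sketch below is not from the talk: the records, the field names (`zip`, `birth_year`, `sex`) and the helpers `smallest_group` and `coarsen` are all hypothetical, chosen only to illustrate that a table with names removed can still single people out once the remaining fields are combined, and that blurring those fields buys anonymity at the cost of detail.

```python
from collections import Counter

# Hypothetical "anonymized" records: names are gone, but three
# quasi-identifiers were kept because they are useful for analysis.
records = [
    {"zip": "10001", "birth_year": 1980, "sex": "F"},
    {"zip": "10001", "birth_year": 1982, "sex": "M"},
    {"zip": "10002", "birth_year": 1985, "sex": "F"},
    {"zip": "10002", "birth_year": 1988, "sex": "M"},
    {"zip": "10003", "birth_year": 1981, "sex": "F"},
    {"zip": "10003", "birth_year": 1987, "sex": "M"},
]

FIELDS = ["zip", "birth_year", "sex"]


def smallest_group(rows, fields):
    """Size of the smallest group of rows sharing the same values for `fields`.

    A result of 1 means at least one person is unique on those fields.
    """
    groups = Counter(tuple(row[f] for f in fields) for row in rows)
    return min(groups.values())


def coarsen(row):
    """Trade utility for anonymity: keep 3 zip digits and the decade of birth."""
    return {
        "zip": row["zip"][:3],
        "birth_year": row["birth_year"] // 10 * 10,
        "sex": row["sex"],
    }


detailed = smallest_group(records, FIELDS)
coarse = smallest_group([coarsen(r) for r in records], FIELDS)
print(detailed, coarse)
```

In this made-up table every detailed record is unique (smallest group of 1), while the coarsened version hides each person among three, at the price of losing the exact zip code and birth year.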

THUS A CODE WOULD NEED TO MAINTAIN THE UTILITY OF DATA WHILE BALANCING CONTROL OF THAT DATA

A FRAMEWORK FOR A CODE IS COMPOSED OF THREE CLUSTERS

Data Ethics Code

‣ Privacy: 3rd party usage, Business applications, Bio-information

‣ Community: Protection of subjects, Mathematical responsibility, Safety of used data & analysis

‣ Identity: Ownership, Verification, Right to be forgotten, Incorrect data correction

PRIVACY

‣ 3rd party data: Once you buy or sell data, what are the ethics around using it? You did ‘buy it’, right?

‣ Business applications: What is the relationship between privacy of internet exploration and advertisement of relevant products?

‣ Bio-information: Is data generated from your body owned differently?

COMMUNITY

‣ Protection of subjects: How do we protect the people our analysis affects from negative consequences?

‣ Mathematical responsibility: Is there a system for correct use of professional tools and continuing education?

‣ Safety of used data & analysis: Once data is used, how is it discarded and how is sensitive analysis protected?

IDENTITY

‣ Ownership: Is there a need for a centralized personal data safe?

‣ Validation: How do means of validation affect access, privacy and safety?

‣ Incorrect data correction: What are the mechanisms to correct bad data?

THESE COMPONENTS PROVIDE THE BASIS FOR CONVERSATION NOT A HARD STRUCTURE

Data Ethics Code

‣ Privacy: 3rd party usage, Business applications, Bio-information

‣ Community: Protection of subjects, Mathematical responsibility, Safety of used data & analysis

‣ Identity: Ownership, Verification, Right to be forgotten, Incorrect data correction

ARE THERE OTHER THINGS WE SHOULD THINK ABOUT?

The code cannot be built on personal conceptions of right and wrong. It must be general enough to span cultures, companies and continents.

THE CODE SHOULD EXIST OUTSIDE ANY FORMAL BUSINESS.

YOU SHOULD NOT MAKE MONEY OFF THE CODE.

The code should not be created by a small group; rather, it presents a chance for a more radical form of democracy

Whatever the combination, the code will have to be built by data scientists to have any chance at adoption

Ethical codes often come up after social disasters. Can we get out in front of this?

Other than it could be good for people, why do this at all?

IT MAKES GOOD BUSINESS SENSE

More ethical data treatment lowers liability and reduces corporate risk

It’s not a matter of if you get hacked, it is a matter of when (and frankly, if you find out)

http://www.techrepublic.com/article/data-breaches-may-cost-less-than-the-security-to-prevent-them/

$252 MILLION DOLLARS

2013 - data breach

ESTIMATED $100 MILLION - $500 MILLION

2006 - data theft

http://www.lifehealthpro.com/2015/06/18/the-10-most-expensive-data-breaches?t=regulatory&slreturn=1456110972&page=5

HIGH ESTIMATES $4 BILLION DOLLARS

2011 - data breach of 75 client companies

http://www.eweek.com/c/a/Security/Epsilon-Data-Breach-to-Cost-Billions-in-WorstCase-Scenario-459480

marketing data

THE MORAL HIGH GROUND ALSO SELLS MORE SHIT

COVER YOUR ASS

PEOPLE WHO ARE CAUGHT UP IN UNETHICAL BEHAVIOR ARE USUALLY SACKED

THEIR PROJECTS ARE SCRAPPED

AND IT GETS UGLY FROM A PROFESSIONAL POINT OF VIEW

OK, SO WHAT’S NEXT?

Some folks working on this:

‣ The Council for Big Data, Ethics and Society

‣ Certified Analytics Professionals

‣ Michael McFarland, S.J. - Computer Scientist

‣ Cynthia Dwork - Microsoft Research

‣ Kord Davis - Digital Strategist

READ MORE HERE

TALK AMONGST YOUR FRIENDS

I’LL GIVE YOU A TOPIC

The right to be forgotten: an ideal or practically achievable?

It seems data is a commodity: does that make the data we create a personal asset?

Edward Snowden: ethical in a data decision-making sense?

WHO IS LOOKING AFTER YOUR DATA?

THANK YOU