rapidminer wisdom 2016 - vijay kotu - keynote

Biases in Data Interpretation Democratizing Data Science

Uploaded by rapidminer on 23-Jan-2017


Page 1: RapidMiner Wisdom 2016 - Vijay Kotu - Keynote

Biases in Data Interpretation

Democratizing Data Science


http://swiked.tumblr.com/


OPTICAL ILLUSIONS

The Müller-Lyer illusion is one of hundreds of known optical illusions.

The effects of optical illusions vary significantly depending on the beholder's interpretation.

Most optical illusions are the effect of an (advantageous) heuristic shortcut.


Property                         Value
Mean of x                        9
Variance of x                    11
Mean of y                        7.5
Variance of y                    4.1
Correlation between x and y      0.8
Linear regression line           y = 3 + 0.5x


Kotu, Vijay, and Bala Deshpande. Predictive Analytics and Data Mining: Concepts and Practice with RapidMiner. Morgan Kaufmann, 2014.
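The property table matches the well-known Anscombe's quartet: four small datasets that share nearly identical summary statistics yet look completely different when plotted. As a sketch, assuming the standard published quartet values typed in below, this Python recomputes each property for all four datasets; rounded to the table's precision, every dataset gives the same row of numbers.

```python
from statistics import mean, variance

# Anscombe's quartet (standard published values); datasets I-III share the same x.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (X4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summarize(x, y):
    """Return the table's properties for one dataset, rounded to one decimal place."""
    mx, my = mean(x), mean(y)
    # Sample covariance (n - 1 denominator, matching statistics.variance).
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    vx, vy = variance(x), variance(y)
    slope = sxy / vx                 # least-squares slope of y on x
    intercept = my - slope * mx      # line passes through (mean x, mean y)
    corr = sxy / (vx * vy) ** 0.5    # Pearson correlation
    return {
        "mean_x": round(mx, 1), "var_x": round(vx, 1),
        "mean_y": round(my, 1), "var_y": round(vy, 1),
        "corr": round(corr, 1),
        "intercept": round(intercept, 1), "slope": round(slope, 1),
    }

for name, (x, y) in QUARTET.items():
    print(name, summarize(x, y))
```

Identical summaries do not imply similar data; plotting each dataset is what reveals the difference.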


Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in antinuclear demonstrations.

Which is more probable?

1. Linda is a bank teller.

2. Linda is a bank teller and is active in the feminist movement.

Kahneman, Daniel. Thinking, fast and slow. Macmillan, 2011.
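Option 1 is always at least as probable as option 2, by the conjunction rule: P(A and B) ≤ P(A), since every feminist bank teller is also a bank teller. A minimal sketch checks this by counting over a hypothetical population; the records, trait frequencies and seed are invented for illustration and are not data from the talk.

```python
import random

def conjunction_counts(people):
    """Count bank tellers, and bank tellers who are also active feminists."""
    tellers = sum(1 for p in people if p["teller"])
    feminist_tellers = sum(1 for p in people if p["teller"] and p["feminist"])
    return tellers, feminist_tellers

# Hypothetical population: each person independently gets two random traits.
rng = random.Random(42)
people = [{"teller": rng.random() < 0.05, "feminist": rng.random() < 0.6}
          for _ in range(10_000)]

tellers, feminist_tellers = conjunction_counts(people)
# The conjunction can never be more frequent than either of its parts.
print(tellers, feminist_tellers, feminist_tellers <= tellers)
```

However the traits are distributed, the second count is a subset of the first, so the inequality cannot fail.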


SEEING DATA

Our understanding of probability and statistics is NOT intuitive or perfectly rational

Unconscious Conclusion

Kahneman, Daniel. Thinking, fast and slow. Macmillan, 2011.


TRENDS IN ANALYTICS

[Figure: analytics stages (Capturing Data; Processing & Organizing Data; Analyzing Data; Using Data), with data sources and techniques (Telemetry, Instrumentation and Point of transaction; Logs; Data Stores; Structured Databases; OLAP; Reporting; Exploratory Data Analysis; Experimentation; Data Mining; Hypothesis Testing; Simulation; User / Product Query) and roles (Programmers, Database Engineers, Business Intelligence, Statisticians, Data Analysts, Business Users)]


DATA MINING

[Figure: data mining in relation to Statistics, Computing, Machine Learning, and Quantitative / Operations Research; labels: Data Stores, Computation; Machine Learning, Optimization, Algorithms]

In simpler terms, data mining is finding useful patterns in data.


ANALYTICAL TECHNIQUES


Analytics: Esoteric to Mainstream

More than before: more users, more access to data, more tools and techniques

The barrier to entry is lower

Objective of Analytics: Communication of meaningful patterns from data

MAINSTREAM ANALYTICS

BIASES


A / B Experiment - Clicks Conversion Rate

                   Day 1             Day 2            Total
Control (A)        20 / 990 = 2%     5 / 500 = 1%     25 / 1490 = 1.7%
Alternative (B)    1 / 10 = 10%      6 / 500 = 1.2%   7 / 510 = 1.4%

Crook, Thomas, et al. "Seven pitfalls to avoid when running controlled experiments on the web." Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2009.


1. SIMPSON'S PARADOX


Paradox: A trend appears in different groups of data but disappears or reverses when the groups are combined.

Bickel, Peter J., Eugene A. Hammel, and J. William O’Connell. "Sex bias in graduate admissions: Data from Berkeley." Science 187.4175 (1975): 398-404.

Prevalence:

Not uncommon. It appears in multidimensional data with many groupings.

Watch out:

When sample sizes and responses vary widely between groups.
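The A / B click table shows the reversal concretely. This sketch uses exactly the click and visitor counts from that table (exact fractions avoid rounding questions): B has the higher conversion rate on each day, yet A has the higher rate once the two days are pooled, because almost all of B's visitors arrived on Day 2, when rates were lower for both variants.

```python
from fractions import Fraction

# (clicks, visitors) per day, taken from the A / B table (Crook et al., 2009).
DATA = {
    "A": [(20, 990), (5, 500)],
    "B": [(1, 10), (6, 500)],
}

def daily_rates(variant):
    """Conversion rate for each day separately."""
    return [Fraction(c, v) for c, v in DATA[variant]]

def pooled_rate(variant):
    """Conversion rate after aggregating clicks and visitors over all days."""
    clicks = sum(c for c, _ in DATA[variant])
    visitors = sum(v for _, v in DATA[variant])
    return Fraction(clicks, visitors)

# B wins on every single day...
b_wins_each_day = all(b > a for a, b in zip(daily_rates("A"), daily_rates("B")))
# ...but A wins once the days are pooled.
a_wins_pooled = pooled_rate("A") > pooled_rate("B")

print(b_wins_each_day, a_wins_pooled)
for v in DATA:
    print(v, [f"{float(r):.1%}" for r in daily_rates(v)], f"{float(pooled_rate(v)):.1%}")
```

Which comparison is right depends on the question; the safe habit is to check whether group sizes differ sharply before trusting an aggregate.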


2. CLUSTERING ILLUSION

Human Tendency: Interpreting random streaks as clusters.

Driven by:

A tendency to underpredict the amount of variability likely to appear in a sample of random data

The representativeness heuristic
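One way to calibrate intuition is to measure streaks in genuinely random data. This sketch flips a simulated fair coin 100 times and reports the longest run of identical outcomes; for 100 flips the longest run is typically around six or seven, which many people would read as a meaningful cluster. The flip count and seed are arbitrary illustrative choices.

```python
import random

def longest_run(seq):
    """Length of the longest run of identical consecutive items."""
    best = current = 0
    previous = object()  # sentinel that compares unequal to every item
    for item in seq:
        current = current + 1 if item == previous else 1
        previous = item
        best = max(best, current)
    return best

rng = random.Random(7)
flips = "".join(rng.choice("HT") for _ in range(100))
print(flips)
print("longest streak:", longest_run(flips))
```

Rerunning with different seeds shows that long streaks are a normal feature of randomness, not evidence of a cause.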


A Study of Kidney Cancer in the 3,141 Counties of the US

Counties in which the incidence of kidney cancer is lowest are: rural, sparsely populated and in Republican states*

Counties in which the incidence of kidney cancer is highest are: rural, sparsely populated and in Republican states*

* in the Midwest, the South and the West. Kahneman, Daniel. Thinking, fast and slow. Macmillan, 2011.


Person A: 4 balls for each trial. Chance of an extreme result: 12.5%

Person B: 7 balls for each trial. Chance of an extreme result: 1.6%
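These percentages follow Kahneman's urn example, assuming a large urn that is half red and half white and calling a trial extreme when every ball drawn is the same color. The chance is then 2 × 0.5^n: 2/16 = 12.5% for n = 4 and 2/128 ≈ 1.6% for n = 7. The sketch computes the exact value and cross-checks it with a simulation (trial count and seed are arbitrary).

```python
import random

def p_extreme(n, p_red=0.5):
    """Exact chance that n independent draws are all red or all white."""
    return p_red ** n + (1 - p_red) ** n

def simulate_extreme(n, trials=100_000, seed=0):
    """Monte Carlo estimate; a large urn makes draws effectively independent."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(trials):
        reds = sum(rng.random() < 0.5 for _ in range(n))
        extreme += reds in (0, n)  # all white or all red
    return extreme / trials

for n in (4, 7):  # Person A draws 4 balls per trial, Person B draws 7
    print(n, f"exact {p_extreme(n):.2%}", f"simulated {simulate_extreme(n):.2%}")
```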


3. LAW OF SMALL NUMBERS

Fact: Extreme outcomes are more likely in smaller samples. Ignoring this fact is a fallacy.

We normally focus on the statement and its causality, not on the reliability of the results. Statistics present the information but do not explain the causality.

Prevalent in surveys, e.g., manager 360 surveys.
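The kidney-cancer pattern can be reproduced with no causal story at all. In this hypothetical simulation (county counts, population sizes, incidence rate and seed are illustrative assumptions, not the study's data), every county has the same true incidence rate, yet both the lowest and the highest observed rates come from small counties, simply because small samples swing further.

```python
import random

def simulate_counties(n_each=50, small_pop=100, large_pop=10_000, rate=0.01, seed=1):
    """Hypothetical counties that all share the SAME true incidence rate."""
    rng = random.Random(seed)
    counties = []
    for size, pop in (("small", small_pop), ("large", large_pop)):
        for _ in range(n_each):
            # Binomial draw: each resident independently has the condition.
            cases = sum(rng.random() < rate for _ in range(pop))
            counties.append((size, cases / pop))
    return counties

counties = simulate_counties()
lowest = min(counties, key=lambda c: c[1])
highest = max(counties, key=lambda c: c[1])
# Both extremes land in small counties, although every county has identical risk.
print("lowest:", lowest, "highest:", highest)
```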


Success = Talent + Luck

Normal Distribution

An Experiment

1. 100 students.
2. Pick the worst performers in a test.
3. Punish them.
4. Administer a second test.

Finding: Their test scores improved

Conclusion: Punishment worked


REGRESSION TO THE MEAN


4. REGRESSION FALLACY

Fact: If a variable is extreme on its first measurement, it will tend to be closer to the average on its second measurement.

Regression Fallacy: Failing to account for natural variation.

Marketing Manager 1: Campaign A, Campaign B; ROI +323%

Marketing Manager 2: Campaign C, Campaign D; ROI +230%

...

Marketing Manager 20: ROI -256%
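The punishment experiment can be replayed with no punishment at all. In this sketch each score is modelled as talent plus luck, following "Success = Talent + Luck"; the normal distributions, their parameters, the cohort of 1,000 students (larger than the slide's 100 so noise does not hide the effect), the bottom-10% cut and the seed are all illustrative assumptions. The worst performers improve on the retest simply because their bad luck does not repeat.

```python
import random

def retest_worst_performers(n_students=1000, worst_fraction=0.1, seed=3):
    """Score = talent + luck; retest the worst performers with NO intervention."""
    rng = random.Random(seed)
    talent = [rng.gauss(70, 10) for _ in range(n_students)]
    test1 = [t + rng.gauss(0, 10) for t in talent]
    test2 = [t + rng.gauss(0, 10) for t in talent]  # same talent, fresh luck
    k = int(n_students * worst_fraction)
    worst = sorted(range(n_students), key=lambda i: test1[i])[:k]
    before = sum(test1[i] for i in worst) / k
    after = sum(test2[i] for i in worst) / k
    return before, after

before, after = retest_worst_performers()
print(f"worst group: first test {before:.1f}, second test {after:.1f}")
```

The same logic applies to the marketing managers: ranking people by one noisy ROI figure and rewarding or punishing the extremes will make the intervention look effective.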


5. CAUSATION & CORRELATION

Tendency: Perceiving a relationship between two unrelated events and, moreover, perceiving that one caused the other.

Hamilton, David L., and Robert K. Gifford. "Illusory correlation in interpersonal perception: A cognitive basis of stereotypic judgments." Journal of Experimental Social Psychology 12.4 (1976): 392-407.

[Figure: Ice Cream and Shark attacks appear related; both are driven by a third factor, more people at the beach]
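A common cause can produce a strong correlation between two variables that do not affect each other. In this hypothetical simulation (all variable names, coefficients, noise levels and the seed are illustrative assumptions), daily beach visitors drive both ice cream sales and a shark-attack index. The raw correlation is high, but after removing the visitors' influence from each variable (correlating the least-squares residuals, i.e. a partial correlation) it is close to zero.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def residuals(y, x):
    """Residuals of y after a least-squares fit on x (removes x's linear influence)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]

rng = random.Random(5)
visitors = [rng.uniform(100, 1000) for _ in range(365)]       # the common cause
ice_cream = [0.5 * v + rng.gauss(0, 30) for v in visitors]    # depends only on visitors
shark_index = [0.01 * v + rng.gauss(0, 1) for v in visitors]  # depends only on visitors

raw = pearson(ice_cream, shark_index)
partial = pearson(residuals(ice_cream, visitors), residuals(shark_index, visitors))
print(f"raw correlation {raw:.2f}, after controlling for visitors {partial:.2f}")
```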


6. SELECTION & SELF SELECTION

Selection Bias: Sample used is not representative of the population

Self Selection Bias: Sample consists of volunteers; this is a problem particularly when volunteers bear good news

Example: Customer Care > Satisfaction Survey

[Chart: Satisfaction Rate and Response Rate (axis from 5% to 60%) by Offer Channel: Email, Phone, Chat]
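Self-selection can inflate a satisfaction survey even when nothing is wrong with the questions. This sketch computes the satisfaction rate among respondents when the chance of responding depends on whether the customer is satisfied; the 60% true satisfaction and the 30% vs 10% response rates are hypothetical numbers chosen for illustration.

```python
def observed_satisfaction(true_rate, resp_satisfied, resp_unsatisfied):
    """Satisfaction rate among respondents when response depends on satisfaction."""
    satisfied_responses = true_rate * resp_satisfied
    unsatisfied_responses = (1 - true_rate) * resp_unsatisfied
    return satisfied_responses / (satisfied_responses + unsatisfied_responses)

# Hypothetical: 60% of customers are satisfied, but satisfied customers are
# three times as likely to answer the survey (30% vs 10%).
print(f"{observed_satisfaction(0.6, 0.30, 0.10):.1%}")
```

With equal response rates for both groups the function returns the true 60%; the gap between the two numbers is the self-selection bias.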


7. FORECASTING -> POSITIVE BIAS

Planning Fallacy: Underestimating the time needed to finish a future task

Optimism Bias: Believing oneself less prone to negative events than one actually is

Due to: Self-enhancement and perceived control

Leads to: Time and cost overruns, benefit shortfalls


8. ILLUSION OF CONTROL

Tendency: Overestimating our ability to control events; feeling a sense of control over outcomes we do not actually control.


Meeting room Thermostats


9. DUNNING-KRUGER EFFECT

Tendency: Unskilled individuals overestimate their own ability; the corollary, that experts tend to underestimate their own ability, is also true.

Kruger, Justin, and David Dunning. "Unskilled and unaware of it: how difficulties in recognizing one's own incompetence lead to inflated self-assessments." Journal of personality and social psychology 77.6 (1999): 1121.


10. CONFIRMATION BIAS

Human Tendency: To search for, process, interpret and favour data in a way that confirms one's hypothesis or beliefs.

1. Remember and present information selectively.

2. Cherry picking data

3. “Case Studies”

“Let’s find data to prove our point of view”

Snyder, M., and Cantor, N. (1979). "Testing Hypotheses about Other People: The Use of Historical Knowledge." Journal of Experimental Social Psychology, 15, 330-342.


BIASES WITH INTERPRETATION

Observed Human Tendency:

More than 85% believed they were less biased than the average person

Bias = Systematic Error


RECOMMENDED READING


BIASED OPINIONS BY...

Vijay Kotu

linkedin.com/in/vkotu

@VijayKotu