Thinking About Data:

A Simple Principle to Help You Improve Your Scientific Data Analysis

Scott A. Venners, Ph.D., MPH

November 13, 2003

1

PowerPoint slides available at:

www.artima.com/AMU/lecture.ppt

(Try tomorrow)

2

Classes First Data Set

?

3

Y = Outcome Variable

X = Predictor of Interest

Cov1 … CovN = Potential Confounders (Covariates)

Y = X + Cov1 + Cov2 + Cov3 + … + CovN

X p-value <0.05?

Yes - Write a paper.

4

Simple Principle:

1. Your model only represents one possible explanation of data.

2. You must actively think of all possible alternative explanations and test them.

3. Those that are not testable define the uncertainty of your analysis.

5

[Diagram, built up over slides 6 to 10: "Possible Explanations of Data", divided into "(Can Test)" and "(Cannot Test)", with "Model" marked among them.]

Do not stop here!

10

Skills you need:

1. Thinking of possible explanations

2. Knowing how to test them.

11

Example 1: Simple model.

Skill: Visualizing Confounding

12

Example 1: Does an inactive lifestyle increase the risk of low bone density?

[Figure, slides 13 to 15: a population grouped into Inactive Lifestyle and Active Lifestyle, with the people who have Low Bone Density marked.]

15


What else could cause this result?

Female, Smoking, Excessive Alcohol, Old Age…

16

               Active Lifestyle   Inactive Lifestyle
Female         49%                51%
Smoking        21%                19%
Ex Alcohol     1%                 1%
Old Age        30%                50%

17

Is the association between inactive lifestyle and low bone density confounded by old age?

Old Age: 30% of Active Lifestyle, 50% of Inactive Lifestyle

18

No

19

[Figure for the answer "No": % Low Bone Density by lifestyle within each age group]

               Active   Inactive
Older Age      30%      50%
Younger Age    30%      50%

20

Is the association between inactive lifestyle and low bone density confounded by old age?

Old Age: 30% of Active Lifestyle, 50% of Inactive Lifestyle

Yes

21

[Figure for the answer "Yes": % Low Bone Density by lifestyle within each age group]

               Active   Inactive
Younger Age    0%       0%
Older Age      100%     100%

22
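The "No" and "Yes" pictures can be checked with a little arithmetic. The sketch below assumes that "30% Old Age 50%" means 30% of the active group and 50% of the inactive group are older, and it recombines the age-specific rates from the two pictures into a crude rate for each lifestyle group. The names `P_OLD` and `crude_rate` are made up for this illustration.

```python
# Share of older people in each lifestyle group (assumed reading of
# "30% Old Age 50%": 30% of active, 50% of inactive).
P_OLD = {"active": 0.30, "inactive": 0.50}


def crude_rate(p_old, rate_young, rate_old):
    """Weighted average of the age-specific low-bone-density rates."""
    return p_old * rate_old + (1 - p_old) * rate_young


# Answer "No": within each age group, 30% (active) vs 50% (inactive).
no_confounding = {
    "active": crude_rate(P_OLD["active"], rate_young=0.30, rate_old=0.30),
    "inactive": crude_rate(P_OLD["inactive"], rate_young=0.50, rate_old=0.50),
}

# Answer "Yes": 0% in the young, 100% in the old, whatever the lifestyle.
confounded = {
    group: crude_rate(p_old, rate_young=0.0, rate_old=1.0)
    for group, p_old in P_OLD.items()
}

for answer, result in [("No", no_confounding), ("Yes", confounded)]:
    print(answer, {g: f"{100 * r:.0f}%" for g, r in result.items()})
```

Under these assumptions both situations give the same crude comparison, 30% low bone density in the active group against 50% in the inactive group. The crude numbers alone cannot tell the two explanations apart; only the age-stratified view can.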

Independent Effect(s)

% Low Bone Density, with Younger Age = 0, Older Age = 1, Active = 0, Inactive = 1:

                                        Younger Age (0)            Older Age (1)
                                        Active (0)  Inactive (1)   Active (0)  Inactive (1)

Inactive Only                           10%         30%            10%         30%
  10 + 0(Old) + 20(Inactive)

Older Age Only                          10%         10%            30%         30%
  10 + 20(Old) + 0(Inactive)

Both Older Age and Inactive             10%         30%            30%         50%
  10 + 20(Old) + 20(Inactive)

Older Age and Inactive Interaction      10%         30%            30%         60%
  10 + 20(Old) + 20(Inactive)
  + 10(Old*Inactive)

26

27
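The four models on the "Independent Effect(s)" slides can be evaluated cell by cell. This is a minimal sketch: the function `low_bone_density_pct` and the `MODELS` dictionary are hypothetical names, while the coefficients are the ones printed on the slides, with Old and Inactive coded 0/1.

```python
def low_bone_density_pct(old, inactive, b0, b_old, b_inactive, b_interaction=0):
    """Predicted % low bone density from a linear model with 0/1 predictors."""
    return b0 + b_old * old + b_inactive * inactive + b_interaction * old * inactive


# (intercept, Old, Inactive, Old*Inactive) as printed on the slides.
MODELS = {
    "Inactive only": (10, 0, 20, 0),
    "Older age only": (10, 20, 0, 0),
    "Both": (10, 20, 20, 0),
    "Both + interaction": (10, 20, 20, 10),
}

# (old, inactive): younger/active, younger/inactive, older/active, older/inactive
CELLS = [(0, 0), (0, 1), (1, 0), (1, 1)]

table = {
    name: [low_bone_density_pct(old, inactive, *coefs) for old, inactive in CELLS]
    for name, coefs in MODELS.items()
}

for name, row in table.items():
    print(f"{name:20s}", row)
```

Without the interaction term, inactivity adds the same 20 points in both age groups. With the `Old*Inactive` term it adds 20 points in the younger group and 30 in the older group, which is why the last cell rises from 50% to 60%.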

Example 2:

Sometimes just putting potential confounders into the model is not correct.

28

Example 2: Does passive smoking increase the risk of chronic cough?

[Figure, slides 29 to 31: a population grouped into Passive Smoking and No Passive Smoking, with the people who have Chronic Cough marked.]

Chronic Cough: 25% of Passive Smoking, 25% of No Passive Smoking

31

Chronic Cough: 25% of Passive Smoking, 25% of No Passive Smoking

What else could cause this result?

Active Smoking…

32

Is the association between passive smoking and cough confounded by active smoking?

Active Smoking: 45% of No Passive Smoking, 17% of Passive Smoking

33

% Cough by passive smoking, stratified by active smoking:

                         No Passive (0)   Passive (1)
Active Smoking (1)       47%              47%
No Active Smoking (0)    7%               20%

How to model?

?

Cough% = 7 + 40(Smoke) + 13(Passive) - 13(Smoke*Passive)

No

36
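The cough model can be checked against the stratified percentages, and the strata can be recombined to see where the crude 25% vs 25% comes from. This is a sketch under one assumption that is not spelled out on the slides: that "45% Active Smoking 17%" means 45% of the people without passive smoking and 17% of the people with passive smoking are active smokers. The names `cough_pct` and `P_ACTIVE` are hypothetical.

```python
def cough_pct(smoke, passive):
    """Cough% = 7 + 40(Smoke) + 13(Passive) - 13(Smoke*Passive), 0/1 coding."""
    return 7 + 40 * smoke + 13 * passive - 13 * smoke * passive


# Share of active smokers in each passive-smoking group (assumed reading:
# 45% without passive smoking, 17% with passive smoking).
P_ACTIVE = {0: 0.45, 1: 0.17}

# Effect of passive smoking within each active-smoking stratum.
effect_in_nonsmokers = cough_pct(0, 1) - cough_pct(0, 0)
effect_in_smokers = cough_pct(1, 1) - cough_pct(1, 0)

# Crude cough % in each passive-smoking group, mixing the two strata.
crude = {
    passive: p * cough_pct(1, passive) + (1 - p) * cough_pct(0, passive)
    for passive, p in P_ACTIVE.items()
}

print("effect in non-smokers:", effect_in_nonsmokers)
print("effect in active smokers:", effect_in_smokers)
print("crude cough %:", {k: round(v, 1) for k, v in crude.items()})
```

Under that reading, passive smoking adds 13 points among people who do not smoke and nothing among active smokers, yet the crude cough rate comes out at about 25% in both groups. A single adjusted coefficient for passive smoking would average these two different effects, which is why the model needs the `Smoke*Passive` term.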

Example 3:

Sometimes explanations for data are not so clear.

37

Odds ratios of early pregnancy loss.

Husband’s current smoking    Crude OR   p       Adjusted* OR   p
None                         Ref                Ref
<20 cigs/day                 1.19       .429    1.04           .854
>=20 cigs/day                2.18       .013    1.81           .049

* Adjusted for husband’s and wife’s ages, education, stress, and exposure to dust and noise; husband’s alcohol use, previous smoking, and exposure to toxins; and wife’s body-mass index.

38

If husband’s education is removed from the model:

Husband’s current smoking    Crude OR   p       Adjusted* OR   p
None                         Ref                Ref
<20 cigs/day                 1.19       .429    1.14           .576
>=20 cigs/day                2.18       .013    2.02           .022

39

Husband’s Smoking    None    <20 cigs/day    >=20 cigs/day
High School          79%     59%             50%

40

% Early Pregnancy Loss

Husband’s Smoking    None    <20 cigs/day    >=20 cigs/day
< High School        20%     29%             44%
>= High School       22%     21%             30%

42
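The percentages in Example 3 can be turned into odds ratios to see why the explanation is not so clear. The sketch below rests on two assumptions: that the six chart values read as the three smoking levels for "< High School" followed by the three for ">= High School", and that the "High School" row gives the share with at least high school education at each smoking level. The helper names `odds`, `odds_ratio`, `LOSS` and `P_HIGH_SCHOOL` are made up for this illustration.

```python
def odds(p):
    """Convert a proportion into odds."""
    return p / (1 - p)


def odds_ratio(p_exposed, p_reference):
    """Odds ratio of an exposed group against a reference group."""
    return odds(p_exposed) / odds(p_reference)


# Proportion with early pregnancy loss at smoking levels
# None, <20 cigs/day, >=20 cigs/day (assumed reading order of the chart).
LOSS = {
    "< High School": [0.20, 0.29, 0.44],
    ">= High School": [0.22, 0.21, 0.30],
}

# Assumed share with at least high school education at each smoking level.
P_HIGH_SCHOOL = [0.79, 0.59, 0.50]

# Odds ratios against "None" within each education stratum.
stratum_or = {
    edu: [odds_ratio(p, rates[0]) for p in rates[1:]]
    for edu, rates in LOSS.items()
}

# Crude proportions obtained by mixing the two education strata.
crude_loss = [
    hs * high + (1 - hs) * low
    for hs, low, high in zip(
        P_HIGH_SCHOOL, LOSS["< High School"], LOSS[">= High School"]
    )
]
crude_or = [odds_ratio(p, crude_loss[0]) for p in crude_loss[1:]]

for edu, ors in stratum_or.items():
    print(edu, [round(x, 2) for x in ors])
print("crude", [round(x, 2) for x in crude_or])
```

Under these assumptions the recombined crude odds ratios come out near 1.17 and 2.13, close to the crude 1.19 and 2.18 in the table, while the stratum-specific odds ratios differ: roughly 1.6 and 3.1 below high school against roughly 0.9 and 1.5 at high school or above. Education therefore looks less like a simple confounder and more like something that changes the size of the smoking association.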

Main Points:

Whether your results are good or bad, always think beyond your preferred explanation for the data.

Explore all possibilities before choosing your preferred model.

Acknowledge what you cannot test as your limitations.

43
