
Regression using serial data

Jyoti Sarkar, IUPUI

October 30, 2002 [email protected] 1

The Problem
Given: On n units, (x, y) “before” and (x, y) “after” a treatment
Goal: Regress y on x
• X = a predictor (easy/inexpensive)
• Y = a response (difficult/expensive)
• Assume the n units are independent

Example
Concern: Many Japanese have a lower jaw that is bigger than the upper jaw.
Treatment (for growth of the upper jaw): Children (4-12 years old) wore a mouth gear 8-10 hours daily for 1-2 years.
Questions
• Was the treatment effective? (There is no control group.)
• How did the measurements change with age?

Face Mask Experiment
Sample: 25 boys and 18 girls, measured “before” and “after” treatment
– age (years and days)
From X-ray plates, measure:
– ccorr = corrected C-axis SM (mm)
– theta = angle(C-axis, anterior cranial base SN)
– alpha = angle(C-axis, palatal plane through M)
(angles to the nearest ½ degree)
Objective: Regress y = ccorr on x = age

Face Mask data
patient  gender  age1   theta1  alpha1  ccorr1  age2   theta2  alpha2  ccorr2
1        2       4.99   39.0    35.0    76.076  5.99   40.5    32.0    77.064
2        2       9.90   43.0    32.0    68.666  11.43  47.0    34.0    72.124
etc.

Regress y = ccorr on x = age: (1) “before” data (n = 18)
>Regress ccorr1 on 1 age1
ccorr1 = 66.97 + 0.2530 age1
• r² = .023, r²(adj) = .000
• S = 3.632, SE(b1) = 0.4096, p-value = .545
• t(.975, 16) = 2.120
• 95% CI(β1) = (-0.6153, 1.1214)
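The interval on this slide is b1 ± t(.975, 16)·SE(b1). A minimal Python check of that arithmetic, using the rounded values printed above; the t quantile is copied from the slide rather than computed, and `slope_ci` is a hypothetical helper name:

```python
def slope_ci(b1, se_b1, t_crit):
    """Two-sided confidence interval b1 +/- t_crit * SE(b1) for a regression slope."""
    half_width = t_crit * se_b1
    return b1 - half_width, b1 + half_width


# Values reported for regression (1), "before" data, girls (n = 18, df = 16)
lower, upper = slope_ci(0.2530, 0.4096, 2.120)
print(f"95% CI(beta1) = ({lower:.4f}, {upper:.4f})")
```

With these rounded inputs the limits agree with the slide to within about 0.0001.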

[Figure: "Before" Treatment: 18 Girls. Fitted line plot of ccorr1 vs. age1: ccorr1 = 66.9668 + 0.252973 age1; S = 3.63205, R-Sq = 2.3%, R-Sq(adj) = 0.0%]

(2) “after” data: girls (n = 18)
>Regress ccorr2 on 1 age2
ccorr2 = 71.30 + 0.1142 age2
• r² = .006, r²(adj) = .000
• S = 3.321, SE(b1) = 0.3738, p-value = .764
• t(.975, 16) = 2.120
• 95% CI(β1) = (-0.6782, 0.9066)

[Figure: "After" Treatment: 18 Girls. Fitted line plot of ccorr2 vs. age2: ccorr2 = 71.3030 + 0.114175 age2; S = 3.32165, R-Sq = 0.6%, R-Sq(adj) = 0.0%]

(3) “superimposed” data (n = 36): data size doubled, range expanded
>Stack age1 age2 age
>Stack ccorr1 ccorr2 ccorr
>Regress ccorr on 1 age
ccorr = 67.5636 + 0.3793 age
• r² = .049, r²(adj) = .021
• S = 3.745, SE(b1) = 0.2880, p-value = .197
• t(.975, 34) = 2.032
• 95% CI(β1) = (-0.2060, 0.9646)
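Stacking concatenates the “before” and “after” columns and fits one least-squares line to all 2n points, as if they were independent. A minimal Python sketch of that step; the helper `ols_fit` and the three-subject numbers are hypothetical illustrations, not the Face Mask data:

```python
import math


def ols_fit(x, y):
    """Simple linear regression y = b0 + b1*x; returns b0, b1, S and SE(b1)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard error S
    return b0, b1, s, s / math.sqrt(sxx)


# Hypothetical "before"/"after" values for three subjects (not the Face Mask data)
age1, age2 = [5.0, 7.5, 10.0], [6.0, 8.5, 11.5]
ccorr1, ccorr2 = [70.0, 66.0, 69.0], [71.5, 67.0, 71.0]

# Counterpart of "Stack age1 age2 age" and "Stack ccorr1 ccorr2 ccorr"
age = age1 + age2
ccorr = ccorr1 + ccorr2
b0, b1, s, se_b1 = ols_fit(age, ccorr)
print(f"ccorr = {b0:.4f} + {b1:.4f} age (S = {s:.4f}, SE(b1) = {se_b1:.4f})")
```

This fit ignores the pairing of rows within a subject, which is exactly what the within-subject analysis and the serial correlation models take into account.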

[Figure: "Superimposed" Data: 18 Girls. Fitted line plot of ccorr vs. age: ccorr = 67.5636 + 0.379335 age; S = 3.74537, R-Sq = 4.9%, R-Sq(adj) = 2.1%]

Regress y on x: naïve attempts
All 3 naïve attempts yield
• Low r²
• Large p-value ⇒ cannot reject slope = 0
• CI ∋ 0 (every 95% CI contains 0)
Conclusion:
• Either “ccorr does not depend on age”
• Or “we need a better regression model”

[Figure: Serial Bivariate Data: n=18 girls. Plot of ccorr vs. age]

Serial Bivariate Plot
• ccorr increases with age (for most girls)
• So the regression of ccorr on age should have a positive slope, especially under treatment
Why, then, is r² low? Because between-subject variation is high.
Study the within-subject change to see whether ccorr depends on age.

Within-subject change
• Δage = age2 - age1 = treatment duration
• Δccorr = ccorr2 - ccorr1 = change in ccorr
• Δccorr / Δage = within-subject slope
Means (n = 18 girls):
age2 = 8.39, ccorr2 = 72.26
age1 = 7.26, ccorr1 = 68.80
Δage = 1.13, Δccorr = 3.46, Δccorr/Δage = 3.0251
Recall b1 = (1) 0.2530, (2) 0.1142, (3) 0.3793

[Figure: Within Subject Change: n=18 girls. Plot of dccorr vs. dage]

Regress Δccorr on Δage
>Regress dccorr on 1 dage;
>noconstant.
dccorr = 3.0763 dage
• S = 2.374, SE(b1) = 0.4847, p-value = .000
• t(.975, 17) = 2.110
• 95% CI(β1) = (2.0536, 4.0990)
Conclusion: ccorr increases with age
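The slides report two estimates of the within-subject slope: the mean of the ratios Δccorr/Δage (3.0251) and the no-intercept regression of Δccorr on Δage (3.0763). A Python sketch of both, assuming hypothetical paired data (not the study values); `through_origin` is an illustrative helper:

```python
import math


def through_origin(dx, dy):
    """Least squares dy = b1*dx with no intercept; returns b1, S and SE(b1)."""
    sxx = sum(u * u for u in dx)
    b1 = sum(u * v for u, v in zip(dx, dy)) / sxx
    sse = sum((v - b1 * u) ** 2 for u, v in zip(dx, dy))
    s = math.sqrt(sse / (len(dx) - 1))  # one fitted parameter, so df = n - 1
    return b1, s, s / math.sqrt(sxx)


# Hypothetical paired measurements for four subjects (not the study data)
age1, age2 = [5.0, 7.5, 10.0, 8.0], [6.0, 8.5, 11.5, 9.0]
ccorr1, ccorr2 = [70.0, 66.0, 69.0, 68.0], [73.0, 69.5, 73.0, 71.5]

dage = [a2 - a1 for a1, a2 in zip(age1, age2)]        # treatment duration
dccorr = [c2 - c1 for c1, c2 in zip(ccorr1, ccorr2)]  # change in ccorr
mean_ratio = sum(c / a for c, a in zip(dccorr, dage)) / len(dage)
b1, s, se_b1 = through_origin(dage, dccorr)
print(f"mean of ratios = {mean_ratio:.4f}, no-intercept slope = {b1:.4f}")
```

With one parameter and no intercept, the residual degrees of freedom are n - 1, which matches the slide's use of t(.975, 17) for 18 girls.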

A Paradox:
• Naïve regression slopes are not significantly different from zero
• The within-subject slope is clearly non-zero
What to do?
• Find the proper regression model:
– Repeated Measures/Growth Curves
– Repeated Measures with Covariate
– Serial Correlation

Serial Correlation Model 1
• Regression model: ccorr = β0 + β1 age + error
• Error variables identically distributed N(0, σ²), but dependent
• Between-subject errors uncorrelated
• Within-subject errors have correlation ρ

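Regression Model 1
For j = 1, …, n:
$$
\begin{pmatrix} y_{1j} \\ y_{2j} \end{pmatrix}
= \beta_0 \begin{pmatrix} 1 \\ 1 \end{pmatrix}
+ \beta_1 \begin{pmatrix} x_{1j} \\ x_{2j} \end{pmatrix}
+ \begin{pmatrix} \varepsilon_{1j} \\ \varepsilon_{2j} \end{pmatrix},
\qquad
\begin{pmatrix} \varepsilon_{1j} \\ \varepsilon_{2j} \end{pmatrix} \overset{\text{IID}}{\sim} N(\mathbf{0}, \Sigma),
\qquad
\Sigma = \operatorname{Cov} = \sigma^2 \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}
$$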
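To make Model 1 concrete, the sketch below simulates data from it: each subject's two errors share variance σ² and correlation ρ, while different subjects are independent. It uses the standard construction e2 = σ(ρ·z1 + √(1 - ρ²)·z2) from independent standard normals z1, z2. The function names and the parameter values in the example call (chosen near the Model 1 estimates reported later) are assumptions for illustration:

```python
import math
import random


def simulate_model1(x1, x2, beta0, beta1, sigma, rho, rng):
    """Draw (y1j, y2j) for each subject j from Model 1."""
    y1, y2 = [], []
    for a1, a2 in zip(x1, x2):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        e1 = sigma * z1
        e2 = sigma * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)  # corr(e1, e2) = rho
        y1.append(beta0 + beta1 * a1 + e1)
        y2.append(beta0 + beta1 * a2 + e2)
    return y1, y2


def sample_corr(u, v):
    """Pearson correlation of two equal-length lists."""
    n = len(u)
    ubar, vbar = sum(u) / n, sum(v) / n
    suv = sum((a - ubar) * (b - vbar) for a, b in zip(u, v))
    suu = sum((a - ubar) ** 2 for a in u)
    svv = sum((b - vbar) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


# Hypothetical ages for three subjects; parameters near the later Model 1 estimates
rng = random.Random(1)
y1, y2 = simulate_model1([5.0, 7.5, 10.0], [6.0, 8.5, 11.5],
                         beta0=57.5, beta1=1.66, sigma=5.2, rho=0.86, rng=rng)
```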

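If σ² is unknown, pre-multiply each subject's equations by
$$
\sigma\,\Sigma^{-1/2} = \begin{pmatrix} k_1 & k_2 \\ k_2 & k_1 \end{pmatrix},
\qquad\text{where}\qquad
k_1 = \frac{1}{2}\left(\frac{1}{\sqrt{1+\rho}} + \frac{1}{\sqrt{1-\rho}}\right),
\qquad
k_2 = \frac{1}{2}\left(\frac{1}{\sqrt{1+\rho}} - \frac{1}{\sqrt{1-\rho}}\right).
$$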
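A quick numerical check, in Python, that the matrix [[k1, k2], [k2, k1]] turns the within-subject correlation matrix [[1, ρ], [ρ, 1]] into the identity, with k1 and k2 computed exactly as in the MINITAB code (k3 = ρ). The helper names are hypothetical; ρ = .730 is the initial value used later in the talk:

```python
import math


def whitening_coeffs(rho):
    """Return (k1, k2) as defined in the MINITAB code (k3 = rho)."""
    a = 1.0 / math.sqrt(1.0 + rho)
    b = 1.0 / math.sqrt(1.0 - rho)
    return (a + b) / 2.0, (a - b) / 2.0


def matmul2(p, q):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


rho = 0.730
k1, k2 = whitening_coeffs(rho)
t = [[k1, k2], [k2, k1]]
corr_matrix = [[1.0, rho], [rho, 1.0]]
product = matmul2(matmul2(t, corr_matrix), t)  # numerically the 2x2 identity
print(product)
```

The check works because (1, 1) and (1, -1) are eigenvectors of both matrices: the correlation matrix has eigenvalues 1 + ρ and 1 - ρ there, while [[k1, k2], [k2, k1]] has eigenvalues k1 + k2 = 1/√(1 + ρ) and k1 - k2 = 1/√(1 - ρ).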

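Orthogonalized Model 1
For j = 1, …, n:
$$
\begin{pmatrix} \tilde y_{1j} \\ \tilde y_{2j} \end{pmatrix}
= \beta_0 \begin{pmatrix} \tilde 1 \\ \tilde 1 \end{pmatrix}
+ \beta_1 \begin{pmatrix} \tilde x_{1j} \\ \tilde x_{2j} \end{pmatrix}
+ \begin{pmatrix} \tilde\varepsilon_{1j} \\ \tilde\varepsilon_{2j} \end{pmatrix},
\qquad
\begin{pmatrix} \tilde\varepsilon_{1j} \\ \tilde\varepsilon_{2j} \end{pmatrix} \overset{\text{IID}}{\sim} N(\mathbf{0}, \sigma^2 I_2),
$$
where each tilde vector is the original vector pre-multiplied by $\sigma\Sigma^{-1/2}$, so that $\tilde 1 = k_1 + k_2 = 1/\sqrt{1+\rho}$.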

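Stacking …
$$
\tilde Y = \begin{bmatrix} \tilde{\mathbf{1}} & \tilde X \end{bmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix} + \tilde\varepsilon,
\qquad \operatorname{cov}(\tilde\varepsilon) = \sigma^2 I_{2n},
\qquad \operatorname{corr}(\tilde\varepsilon_{1j}, \tilde\varepsilon_{2j}) = 0,
$$
where $\tilde Y$ and $\tilde X$ stack the $2n$ transformed responses and predictors.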

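If ρ is unknown:
(1) Get $\hat\rho$ using the Algorithm; orthogonalize $y$ and $x$.
(2) Regress $\tilde y$ on $\tilde x$: $\hat{\tilde y} = c_0 + c_1 \tilde x$.
(3) Set $\hat y = b_0 + b_1 x$, where $b_0 = c_0\sqrt{1+\hat\rho}$ and $b_1 = c_1$.
(4) $\hat\sigma = \mathrm{RMSE}\,\sqrt{\dfrac{2n-2}{2n-3}}$ and $\mathrm{SE}(b_1) = \mathrm{SE}(c_1)\,\sqrt{\dfrac{2n-2}{2n-3}}$.
(5) $(1-\alpha)100\%$ $\mathrm{CI}(\beta_1) = b_1 \pm t_{1-\alpha/2,\,2n-3}\,\mathrm{SE}(b_1)$.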
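Steps (3)-(5) are plain arithmetic once the orthogonalized fit is available. A Python sketch, assuming the intercept back-transform b0 = c0·√(1 + ρ̂) and the degrees-of-freedom factor √((2n - 2)/(2n - 3)); with the Model 1 output reported later (c0 = 42.132, c1 = 1.6613, S = 5.1319, SE(c1) = 0.3908, ρ̂ = .8646, n = 18) these reproduce the reported 57.532, 5.2091 and .3967. The function name is hypothetical:

```python
import math


def back_transform(c0, c1, rmse, se_c1, rho_hat, n, t_crit):
    """Map the orthogonalized fit back to the original scale (Steps 3-5)."""
    b0 = c0 * math.sqrt(1.0 + rho_hat)          # undo the 1/sqrt(1 + rho) intercept scaling
    b1 = c1                                      # slope is unchanged
    adj = math.sqrt((2 * n - 2) / (2 * n - 3))   # one extra df lost to estimating rho
    sigma_hat = rmse * adj
    se_b1 = se_c1 * adj
    ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    return b0, b1, sigma_hat, se_b1, ci


# Model 1 output after five iterations: n = 18 girls, t(.975, 33) = 2.0345
b0, b1, sigma_hat, se_b1, ci = back_transform(42.132, 1.6613, 5.1319, 0.3908, 0.8646, 18, 2.0345)
print(f"ccorr = {b0:.3f} + {b1:.4f} age; sigma = {sigma_hat:.4f}; "
      f"SE(b1) = {se_b1:.4f}; CI = ({ci[0]:.4f}, {ci[1]:.4f})")
```

The recomputed upper CI limit matches the reported 2.4683; the lower limit computed from these rounded inputs comes out near 0.854.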

Algorithm: Estimate ρ
0. Begin with ρ = correlation(ccorr1, ccorr2).
1. Orthogonalize age and ccorr using ρ to obtain tage and tccorr.
2. Regress tccorr on 1 tage; save the residuals.
3. If r = corr(tresi1, tresi2) < .001, STOP. Else set ρ = ρ + r/2 and go to Step 1.

MINITAB codes 1
>corr c7 c12 # initial rho
>let k3=.730 # enter above/updated rho
>let k1=(1/sqrt(1+k3)+1/sqrt(1-k3))/2
>let k2=(1/sqrt(1+k3)-1/sqrt(1-k3))/2
# orthogonalize age
>let c21=k1*c3+k2*c8
>let c22=k2*c3+k1*c8
>stack c21 c22 c31
>name c31 'tage'

MINITAB codes 2
>let c23=k1*c7+k2*c12 # orthogonalize ccorr
>let c24=k2*c7+k1*c12
>stack c23 c24 c32
>name c32 'tccorr'
>regress 'tccorr' 1 'tage';
>resi c33;
>coef c34.
>unstack c33 c35 c36;
>subs c18.
>corr c35 c36 # STOP if <.001, else
>let k3=k3+corr(c35,c36)/2
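The MINITAB session above can be mirrored in a short Python sketch of the Model 1 iteration. Assumptions beyond the slides: all function names are hypothetical; the stopping rule uses |r| < .001 (the slides write r < .001); ρ is clamped to [-0.99, 0.99] as a safeguard so the square roots stay defined; and `max_iter` caps the loop. The update ρ ← ρ + r/2 follows `let k3=k3+corr(c35,c36)/2`, and the intercept is back-transformed as b0 = c0·√(1 + ρ):

```python
import math


def sample_corr(u, v):
    """Pearson correlation; returns 0.0 when either input has zero spread."""
    n = len(u)
    ubar, vbar = sum(u) / n, sum(v) / n
    suv = sum((a - ubar) * (b - vbar) for a, b in zip(u, v))
    suu = sum((a - ubar) ** 2 for a in u)
    svv = sum((b - vbar) ** 2 for b in v)
    if suu == 0.0 or svv == 0.0:
        return 0.0
    return suv / math.sqrt(suu * svv)


def ols(x, y):
    """Least-squares intercept c0 and slope c1 for y = c0 + c1*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    c1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
    return ybar - c1 * xbar, c1


def orthogonalize(u1, u2, rho):
    """Map each pair (u1j, u2j) through [[k1, k2], [k2, k1]] (MINITAB k1, k2)."""
    a, b = 1.0 / math.sqrt(1.0 + rho), 1.0 / math.sqrt(1.0 - rho)
    k1, k2 = (a + b) / 2.0, (a - b) / 2.0
    t1 = [k1 * p + k2 * q for p, q in zip(u1, u2)]
    t2 = [k2 * p + k1 * q for p, q in zip(u1, u2)]
    return t1, t2


def estimate_model1(x1, x2, y1, y2, tol=0.001, max_iter=50, rho_cap=0.99):
    """Iterate Steps 0-3 for Model 1; return (rho used in the last fit, b0, b1)."""
    rho = sample_corr(y1, y2)                        # Step 0: corr(ccorr1, ccorr2)
    for _ in range(max_iter):                        # max_iter must be at least 1
        rho = max(-rho_cap, min(rho_cap, rho))       # safeguard, not on the slides
        fit_rho = rho
        tx1, tx2 = orthogonalize(x1, x2, fit_rho)    # Step 1: tage
        ty1, ty2 = orthogonalize(y1, y2, fit_rho)    #         tccorr
        c0, c1 = ols(tx1 + tx2, ty1 + ty2)           # Step 2: stack and regress
        res1 = [b - c0 - c1 * a for a, b in zip(tx1, ty1)]
        res2 = [b - c0 - c1 * a for a, b in zip(tx2, ty2)]
        r = sample_corr(res1, res2)                  # Step 3: corr(tresi1, tresi2)
        if abs(r) < tol:
            break
        rho = fit_rho + r / 2.0                      # update from MINITAB codes 2
    return fit_rho, c0 * math.sqrt(1.0 + fit_rho), c1


# Hypothetical noise-free data (y = 2 + 3x), so b0 = 2 and b1 = 3 are recovered for any rho
x1, x2 = [5.0, 6.0, 7.0, 9.0], [6.0, 7.5, 8.5, 10.0]
y1 = [2.0 + 3.0 * v for v in x1]
y2 = [2.0 + 3.0 * v for v in x2]
rho_hat, b0, b1 = estimate_model1(x1, x2, y1, y2)
```

On real data one would call it as `estimate_model1(age1, age2, ccorr1, ccorr2)`; it is intended to follow the same steps as the MINITAB session, though its iterates need not match the slides digit for digit.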

[Figure: Serial Bivariate Data: n=18 girls. Plot of ccorr vs. age]

[Figure: Serial Correlation: Initially. Plot of ccorr2 vs. ccorr1]

“Orthogonalized” data (n = 36)
First iteration (Model 1):
Initial ρ = .730
• tccorr = 46.9184 + 1.1271 tage
• r² = .216, r²(adj) = .193, p-value = .004
• Corr(tresi1, tresi2) = .191
Revised ρ = .82545

[Figure: Additional Within-Subject Correlation: After 1 Iteration. Plot of tresi2 vs. tresi1]

Iteration History (Model 1)
[Table: results by iteration (Iter)]

[Figure: Additional Within-Subject Correlation: After 5 Iterations. Plot of tresi2 vs. tresi1]

“Orthogonalized” data (n = 36)
After five iterations: ρ = .8646
• Corr(tresi1, tresi2) = .000
• tccorr = 42.132 + 1.6613 tage
• r² = .347, r²(adj) = .328
• S = 5.1319, SE(c1) = 0.3908, p-value = .000

[Figure: Regression with Serial Data: After 5 iterations. Fitted line plot of tccorr vs. tagec: tccorr = 42.1316 + 1.66135 tagec; S = 5.13827, R-Sq = 34.7%, R-Sq(adj) = 32.8%]

Regress y on x (Model 1): ccorr = 57.532 + 1.6613 age
• ρ̂ = 0.8646
• σ̂ = 5.2091, SE(b1) = .3967, p-value = .000
• t(.975, 33) = 2.0345
• 95% CI(β1) = (0.8560, 2.4683)

[Figure: Regression Using Serial Data (Model 1). Plot of ccorr vs. age]

Serial Correlation Model 2
• Regression model: ccorr = β0 + β1 age + error
• Error variables identically distributed N(0, σ²), but dependent
• Between-subject errors uncorrelated
• Within-subject errors have correlation ρ^(age2 - age1)

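Regression Model 2
For j = 1, …, n:
$$
\begin{pmatrix} y_{1j} \\ y_{2j} \end{pmatrix}
= \beta_0 \begin{pmatrix} 1 \\ 1 \end{pmatrix}
+ \beta_1 \begin{pmatrix} x_{1j} \\ x_{2j} \end{pmatrix}
+ \begin{pmatrix} \varepsilon_{1j} \\ \varepsilon_{2j} \end{pmatrix},
\qquad
\begin{pmatrix} \varepsilon_{1j} \\ \varepsilon_{2j} \end{pmatrix} \sim N(\mathbf{0}, \Sigma_j) \text{ independently},
\qquad
\Sigma_j = \operatorname{Cov} = \sigma^2 \begin{pmatrix} 1 & \rho^{\,x_{2j}-x_{1j}} \\ \rho^{\,x_{2j}-x_{1j}} & 1 \end{pmatrix}
$$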

MINITAB Codes 3
>let c19='age2'-'age1'
>name c19 'dage'
>corr c7 c12
>let k3=.730 # enter above/updated correlation
# use dage to orthogonalize
>let c51=(1/sqrt(1+k3**c19)+1/sqrt(1-k3**c19))/2
>let c52=(1/sqrt(1+k3**c19)-1/sqrt(1-k3**c19))/2
>let c21=c51*c3+c52*c8
>let c22=c52*c3+c51*c8
etc.
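For Model 2 the coefficients change from subject to subject because the correlation is ρ^(age2 - age1). A Python sketch of the per-subject transform (MINITAB columns c51 and c52) and of the resulting fit: the transformed column of ones, which appears as “intdage” in the later output, now varies by subject, so the regression is run without a separate constant and its first coefficient estimates β0 directly. Function names are hypothetical, and 0 < ρ < 1 is assumed so the fractional powers stay real:

```python
import math


def model2_design(x1, x2, y1, y2, rho):
    """Whiten each subject's pair using corr = rho ** (x2j - x1j); assumes 0 < rho < 1."""
    ones_t, tx, ty = [], [], []
    for a1, a2, v1, v2 in zip(x1, x2, y1, y2):
        r_j = rho ** (a2 - a1)                     # subject-specific correlation
        p, q = 1.0 / math.sqrt(1.0 + r_j), 1.0 / math.sqrt(1.0 - r_j)
        k1, k2 = (p + q) / 2.0, (p - q) / 2.0      # MINITAB columns c51 and c52
        ones_t += [k1 + k2, k2 + k1]               # transformed column of ones
        tx += [k1 * a1 + k2 * a2, k2 * a1 + k1 * a2]
        ty += [k1 * v1 + k2 * v2, k2 * v1 + k1 * v2]
    return ones_t, tx, ty


def fit_no_constant(u, v, y):
    """Least squares y = c0*u + c1*v with no extra constant (2x2 normal equations)."""
    suu = sum(a * a for a in u)
    svv = sum(b * b for b in v)
    suv = sum(a * b for a, b in zip(u, v))
    suy = sum(a * c for a, c in zip(u, y))
    svy = sum(b * c for b, c in zip(v, y))
    det = suu * svv - suv ** 2
    return (svv * suy - suv * svy) / det, (suu * svy - suv * suy) / det


# Hypothetical noise-free data (y = 2 + 3x); c0 estimates beta0 directly in Model 2
x1, x2 = [5.0, 6.0, 7.0, 9.0], [6.0, 7.5, 8.5, 10.0]
y1 = [2.0 + 3.0 * v for v in x1]
y2 = [2.0 + 3.0 * v for v in x2]
ones_t, tx, ty = model2_design(x1, x2, y1, y2, rho=0.8)
c0, c1 = fit_no_constant(ones_t, tx, ty)
```

The rows here are interleaved subject by subject rather than stacked as all “before” then all “after”, which does not change a least-squares fit.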

Iteration History (Model 2)
[Table: results by iteration (Iter)]

“Orthogonalized” data (n = 36)
After five iterations: ρ = .8781
• Corr(tresi1, tresi2) = .000
• tccorr = 57.935 intdage + 1.6097 tage
• r² = .336, r²(adj) = .316
• S = 5.092, SE(c1) = 0.3912, p-value = .000

Regress y on x (Model 2): ccorr = 57.935 + 1.6098 age
• ρ̂ = 0.8781
• σ̂ = 5.169, SE(b1) = 0.3971, p-value = .000
• t(.975, 33) = 2.0345
• 95% CI(β1) = (0.8018, 2.4176)
[Figure: Regression Using Serial Data (Model 2). Plot of ccorr vs. age]

[Figure: Regression using Serial Data (Models 1 & 2). Plot of ccorr vs. age]

Summary
• Model serial data properly
• Estimate the serial correlation ρ using the iterated algorithm
• Regress the orthogonalized data
• Obtain the regression of y on x
• Adjust σ̂, SE(b1) and CI(β1)
• The approach can be extended to more repeats per subject

Thank you.
