
Post on 27-Mar-2015


A Unified Approach for Assessing Agreement

Lawrence Lin, Baxter Healthcare

A. S. Hedayat, University of Illinois at Chicago

Wenting Wu, Mayo Clinic

Outline

Introduction

Existing approaches

A unified approach

Simulation studies

Examples

Introduction

Different situations for agreement

Two raters, each with single reading

More than two raters, each with single reading

More than two raters, each with multiple readings

• Agreement within a rater
• Agreement among raters based on means
• Agreement among raters based on individual readings

Existing Approaches (1)

Agreement between two raters, each with single reading

Categorical data:

• Kappa and weighted kappa

Continuous data:

• Concordance Correlation Coefficient (CCC)
• Intraclass Correlation Coefficient (ICC)

Existing Approaches (2)

Agreement among more than two raters, each with single reading

Lin (1989): no inference

Barnhart, Haber and Song (2001, 2002): GEE

King and Chinchilli (2001, 2001): U-statistics

Carrasco and Jover (2003): variance components

Existing Approaches (3)

Agreement among more than two raters, each with multiple readings

Barnhart (2005)

• Intra-rater / inter-rater (based on means) / total (based on individual observations) agreement
• GEE method to model the first and second moments

Unified Approach

Agreement among k (k≥2) raters, with each rater measuring each of the n subjects multiple (m) times

Separate intra-rater agreement and inter-rater agreement

Measure relative agreement (with precision and accuracy) and absolute agreement: Total Deviation Index (TDI) and Coverage Probability (CP)

Unified Approach - summary

GEE method used to estimate all agreement indices and their inferences

All agreement indices are expressed as functions of variance components

Data: continuous/binary/ordinal

Most currently popular methods become special cases of this approach

Unified Approach - model

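Set up

$$y_{ijl} = \mu + \alpha_i + \beta_j + \gamma_{ij} + e_{ijl}, \qquad i = 1, 2, \ldots, n; \quad j = 1, 2, \ldots, k; \quad l = 1, 2, \ldots, m$$

• subject effect: $\alpha_i \sim (0, \sigma_\alpha^2)$
• rater effect: $\beta_j$, with $\sum_{j=1}^{k} \beta_j = 0$ and $\sigma_\beta^2 = \sum_{j=1}^{k-1} \sum_{j'=j+1}^{k} (\beta_j - \beta_{j'})^2 / [k(k-1)]$
• subject by rater effect: $\gamma_{ij} \sim (0, \sigma_\gamma^2)$
• error effect: $e_{ijl} \sim (0, \sigma_e^2)$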
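To make the set-up concrete, here is a minimal simulation sketch of the model with subject, rater, subject-by-rater and error effects. The slide only specifies means and variances; drawing the random effects from normal distributions, the function name `simulate`, and the argument names are all assumptions made for illustration.

```python
import random

def simulate(n, k, m, mu, beta, sd_alpha, sd_gamma, sd_e, seed=0):
    """Return y[i][j][l]: reading l of rater j on subject i.

    beta holds the k fixed rater effects (they should sum to zero).
    Random effects are drawn as normals, which is an illustrative
    assumption; the model itself only fixes their means and variances.
    """
    rng = random.Random(seed)
    y = []
    for _ in range(n):
        alpha = rng.gauss(0.0, sd_alpha)          # subject effect
        subject = []
        for j in range(k):
            gamma = rng.gauss(0.0, sd_gamma)      # subject-by-rater effect
            subject.append([mu + alpha + beta[j] + gamma + rng.gauss(0.0, sd_e)
                            for _ in range(m)])   # m replicated readings
        y.append(subject)
    return y

data = simulate(n=20, k=2, m=3, mu=100.0, beta=[-2.0, 2.0],
                sd_alpha=10.0, sd_gamma=1.0, sd_e=2.0)
```

The nested list layout (subjects, then raters, then replicates) mirrors the index order i, j, l of the model.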

Unified Approach - targets

Intra-rater agreement: overall, are the k raters consistent with themselves?

Inter-rater agreement:

• Inter-rater agreement (agreement based on means): overall, do the k raters agree with each other based on the average of the m readings?
• Total agreement (agreement based on individual readings): overall, do the k raters agree with each other based on the individual readings?

Unified Approach – agreement(intra)

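$\rho_{c,Intra}$: over all k raters, how well does each rater reproduce his own readings?

$$\rho_{c,Intra} = 1 - \frac{E\left[\sum_{l=1}^{m} (y_{ijl} - \bar{y}_{ij.})^2 / (m-1)\right]}{E\left[\sum_{l=1}^{m} (y_{ijl} - \bar{y}_{ij.})^2 / (m-1) \mid y_{ij1}, y_{ij2}, \ldots, y_{ijm} \text{ independent}\right]} = \frac{\sigma_\alpha^2 + \sigma_\gamma^2}{\sigma_\alpha^2 + \sigma_\gamma^2 + \sigma_e^2}$$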

Unified Approach – precision(intra) and MSD

$\rho_{Intra}$: for any rater j, the proportion of the variance that is attributable to the subjects (same as $\rho_{c,Intra}$)

Examine the absolute agreement independent of the total data range:

$$MSD_{Intra} = E(y_{ijl} - y_{ijl'})^2 = 2\sigma_e^2$$

Unified Approach – TDI(intra)

$TDI_{Intra}(\pi)$: for each rater j, $(\pi \times 100)\%$ of observations are within $TDI_{Intra}(\pi)$ units of their replicated readings from the same rater.

$$TDI_{Intra}(\pi) = \Phi^{-1}\left(1 - \frac{1-\pi}{2}\right) \sqrt{2\sigma_e^2}$$

$\Phi(\cdot)$ is the cumulative normal distribution.

Unified Approach – CP(intra)

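$CP_{Intra}(\delta)$: for each rater j, $CP_{Intra}(\delta) \times 100\%$ of observations are within $\delta$ units of their replicated readings from the same rater.

$$CP_{Intra}(\delta) = 1 - 2\left[1 - \Phi\left(\delta / \sqrt{2\sigma_e^2}\right)\right]$$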

Unified Approach – agreement(inter)

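$\rho_{c,Inter}$: over all k raters, how well do the raters reproduce each other based on the average of the multiple readings?

$$\rho_{c,Inter} = 1 - \frac{E\left[\sum_{j=1}^{k} (\bar{y}_{ij.} - \bar{y}_{i..})^2 / (k-1)\right]}{E\left[\sum_{j=1}^{k} (\bar{y}_{ij.} - \bar{y}_{i..})^2 / (k-1) \mid \bar{y}_{i1.}, \bar{y}_{i2.}, \ldots, \bar{y}_{ik.} \text{ independent}\right]} = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_\beta^2 + \sigma_\gamma^2 + \sigma_e^2/m}$$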

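Unified Approach – precision(inter)

$\rho_{Inter}$: for any two raters, the proportion of the variance that is attributable to the subjects based on the average of the m readings

$$\rho_{Inter} = \frac{\mathrm{cov}(\bar{y}_{ij.}, \bar{y}_{ij'.})}{\sqrt{\mathrm{var}(\bar{y}_{ij.})\,\mathrm{var}(\bar{y}_{ij'.})}} = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_\gamma^2 + \sigma_e^2/m}$$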

Unified Approach – accuracy(inter)

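$\chi_{a,Inter}$: how close are the means of different raters:

$$\chi_{a,Inter} = \frac{\sigma_\alpha^2 + \sigma_\gamma^2 + \sigma_e^2/m}{\sigma_\alpha^2 + \sigma_\beta^2 + \sigma_\gamma^2 + \sigma_e^2/m}$$

$$\rho_{c,Inter} = \rho_{Inter}\,\chi_{a,Inter}$$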

Unified Approach – TDI(inter)

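$TDI_{Inter}(\pi)$: over all k raters, $(\pi \times 100)\%$ of the averaged readings are within $TDI_{Inter}(\pi)$ units of the replicated averaged readings from the other rater.

$$TDI_{Inter}(\pi) = \Phi^{-1}\left(1 - \frac{1-\pi}{2}\right) \sqrt{2\sigma_\beta^2 + 2\sigma_\gamma^2 + 2\sigma_e^2/m}$$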

Unified Approach – CP(inter)

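$CP_{Inter}(\delta)$: for each rater j, $CP_{Inter}(\delta) \times 100\%$ of averaged readings are within $\delta$ units of replicated averaged readings from the other rater.

$$CP_{Inter}(\delta) = 1 - 2\left[1 - \Phi\left(\delta / \sqrt{2\sigma_\beta^2 + 2\sigma_\gamma^2 + 2\sigma_e^2/m}\right)\right]$$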

Unified Approach – agreement(total)

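$\rho_{c,Total}$: over all k raters, how well do the raters reproduce each other based on the individual readings?

$$\rho_{c,Total} = 1 - \frac{E\left[\sum_{j=1}^{k} (y_{ijl} - \bar{y}_{i.l})^2 / (k-1)\right]}{E\left[\sum_{j=1}^{k} (y_{ijl} - \bar{y}_{i.l})^2 / (k-1) \mid y_{i1l}, y_{i2l}, \ldots, y_{ikl} \text{ independent}\right]} = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_\beta^2 + \sigma_\gamma^2 + \sigma_e^2}$$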

Unified Approach – precision(total)

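$\rho_{Total}$: for any two raters, the proportion of the variance that is attributable to the subjects based on the individual readings

$$\rho_{Total} = \frac{\mathrm{cov}(y_{ijl}, y_{ij'l'})}{\sqrt{\mathrm{var}(y_{ijl})\,\mathrm{var}(y_{ij'l'})}} = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_\gamma^2 + \sigma_e^2}$$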

Unified Approach – accuracy(total)

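$\chi_{a,Total}$: how close are the means of different raters (accuracy)

$$\chi_{a,Total} = \frac{\sigma_\alpha^2 + \sigma_\gamma^2 + \sigma_e^2}{\sigma_\alpha^2 + \sigma_\beta^2 + \sigma_\gamma^2 + \sigma_e^2}$$

$$\rho_{c,Total} = \rho_{Total}\,\chi_{a,Total}$$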

Unified Approach – TDI(total)

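$TDI_{Total}(\pi)$: over all k raters, $(\pi \times 100)\%$ of the readings are within $TDI_{Total}(\pi)$ units of the replicated readings from the other rater.

$$TDI_{Total}(\pi) = \Phi^{-1}\left(1 - \frac{1-\pi}{2}\right) \sqrt{2\sigma_\beta^2 + 2\sigma_\gamma^2 + 2\sigma_e^2}$$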

Unified Approach – CP(total)

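$CP_{Total}(\delta)$: for each rater j, $CP_{Total}(\delta) \times 100\%$ of readings are within $\delta$ units of replicated readings from the other rater.

$$CP_{Total}(\delta) = 1 - 2\left[1 - \Phi\left(\delta / \sqrt{2\sigma_\beta^2 + 2\sigma_\gamma^2 + 2\sigma_e^2}\right)\right]$$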

Unified Approach

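| Statistics | INTRA | INTER | TOTAL | M=1 |
|---|---|---|---|---|
| Agreement | $\frac{\sigma_\alpha^2+\sigma_\gamma^2}{\sigma_\alpha^2+\sigma_\gamma^2+\sigma_e^2}$ | $\frac{\sigma_\alpha^2}{\sigma_\alpha^2+\sigma_\beta^2+\sigma_\gamma^2+\sigma_e^2/m}$ | $\frac{\sigma_\alpha^2}{\sigma_\alpha^2+\sigma_\beta^2+\sigma_\gamma^2+\sigma_e^2}$ | $\frac{\sigma_\alpha^2}{\sigma_\alpha^2+\sigma_\beta^2+\sigma_e^2}$ |
| Precision | $\frac{\sigma_\alpha^2+\sigma_\gamma^2}{\sigma_\alpha^2+\sigma_\gamma^2+\sigma_e^2}$ | $\frac{\sigma_\alpha^2}{\sigma_\alpha^2+\sigma_\gamma^2+\sigma_e^2/m}$ | $\frac{\sigma_\alpha^2}{\sigma_\alpha^2+\sigma_\gamma^2+\sigma_e^2}$ | $\frac{\sigma_\alpha^2}{\sigma_\alpha^2+\sigma_e^2}$ |
| Accuracy | NA | $\frac{\sigma_\alpha^2+\sigma_\gamma^2+\sigma_e^2/m}{\sigma_\alpha^2+\sigma_\beta^2+\sigma_\gamma^2+\sigma_e^2/m}$ | $\frac{\sigma_\alpha^2+\sigma_\gamma^2+\sigma_e^2}{\sigma_\alpha^2+\sigma_\beta^2+\sigma_\gamma^2+\sigma_e^2}$ | $\frac{\sigma_\alpha^2+\sigma_e^2}{\sigma_\alpha^2+\sigma_\beta^2+\sigma_e^2}$ |
| MSD | $2\sigma_e^2$ | $2\sigma_\beta^2+2\sigma_\gamma^2+2\sigma_e^2/m$ | $2\sigma_\beta^2+2\sigma_\gamma^2+2\sigma_e^2$ | $2\sigma_\beta^2+2\sigma_e^2$ |
| TDIπ | $Q\sqrt{MSD_{Intra}}$ | $Q\sqrt{MSD_{Inter}}$ | $Q\sqrt{MSD_{Total}}$ | $Q\sqrt{MSD}$ |
| CPδ | $\chi^2(\delta^2/MSD_{Intra}, 1)$ | $\chi^2(\delta^2/MSD_{Inter}, 1)$ | $\chi^2(\delta^2/MSD_{Total}, 1)$ | $\chi^2(\delta^2/MSD, 1)$ |

$Q = \Phi^{-1}\left(1 - \frac{1-\pi}{2}\right)$, where $\Phi^{-1}(\cdot)$ is the inverse cumulative normal distribution.

$\chi^2(\cdot, 1)$ is the cumulative central chi-square distribution with df = 1.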
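The summary of indices can be sketched as one function of the four variance components. This is an illustrative helper, not the authors' software: the name `agreement_indices`, the argument names and the returned dictionary layout are assumptions. CP is computed through the normal form $1 - 2[1 - \Phi(\delta/\sqrt{MSD})]$, which is equivalent to the chi-square (df = 1) form.

```python
from math import sqrt
from statistics import NormalDist

def agreement_indices(s_a, s_b, s_g, s_e, m, pi=0.9, delta=1.0):
    """Agreement indices from variance components.

    s_a, s_b, s_g, s_e are the subject, rater, subject-by-rater and
    error variances (variances, not standard deviations).
    """
    norm = NormalDist()
    q = norm.inv_cdf(1 - (1 - pi) / 2)          # Q in the TDI formula
    msd = {"intra": 2 * s_e,
           "inter": 2 * (s_b + s_g + s_e / m),
           "total": 2 * (s_b + s_g + s_e)}
    ccc = {"intra": (s_a + s_g) / (s_a + s_g + s_e),
           "inter": s_a / (s_a + s_b + s_g + s_e / m),
           "total": s_a / (s_a + s_b + s_g + s_e)}
    precision = {"intra": ccc["intra"],
                 "inter": s_a / (s_a + s_g + s_e / m),
                 "total": s_a / (s_a + s_g + s_e)}
    accuracy = {"inter": (s_a + s_g + s_e / m) / (s_a + s_b + s_g + s_e / m),
                "total": (s_a + s_g + s_e) / (s_a + s_b + s_g + s_e)}
    tdi = {key: q * sqrt(v) for key, v in msd.items()}
    cp = {key: 2 * norm.cdf(delta / sqrt(v)) - 1 for key, v in msd.items()}
    return {"ccc": ccc, "precision": precision, "accuracy": accuracy,
            "msd": msd, "tdi": tdi, "cp": cp}

result = agreement_indices(s_a=4.0, s_b=1.0, s_g=1.0, s_e=2.0, m=2)
```

Two built-in consistency checks follow from the formulas: the CCC equals precision times accuracy, and evaluating CP at δ = TDI(π) returns π.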

Estimation and Inference

Estimate all means, variance components, and their variances and covariances by the GEE method: $\mu_1, \mu_2, \ldots, \mu_k, \sigma_\alpha^2, \sigma_\beta^2, \sigma_\gamma^2, \sigma_e^2$

Estimate all indices using the above estimates

Estimate variances of all indices using the above estimates and the delta method

Estimation and Inference (2)

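$\sigma_{jj'll'}$: the covariance of two replications, $l$ and $l'$, with $l$ coming from rater $j$ and $l'$ coming from rater $j'$

$$\sigma_\alpha^2 = \frac{2 \sum_{j=1}^{k-1} \sum_{j'=j+1}^{k} \sum_{l=1}^{m} \sum_{l'=1}^{m} \sigma_{jj'll'}}{k(k-1)m^2}$$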

Estimation and Inference (3)

$\sigma_{ij}^2$: the variance from each combination of (i, j), i.e., each cell. Thus $\sigma_e^2$ is the average of all cells' variances.

$$\sigma_e^2 = \frac{\sum_{i=1}^{n} \sum_{j=1}^{k} \sigma_{ij}^2}{nk}$$

Estimation and Inference (4)

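$\sigma_{jl}^2$: the variance of replication $l$ of rater $j$

$\sigma_{jll'}$: the covariance of two replications, $l$ and $l'$, both of them coming from rater $j$

$$A = \frac{\sum_{j=1}^{k} \sum_{l=1}^{m} \sigma_{jl}^2}{km^2}, \qquad B = \frac{2 \sum_{j=1}^{k} \sum_{l=1}^{m-1} \sum_{l'=l+1}^{m} \sigma_{jll'}}{km^2}$$

$$A + B = \sigma_\alpha^2 + \sigma_\gamma^2 + \sigma_e^2/m$$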

Estimation and Inference (5)

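The GEE method is used to estimate all indices through estimating the means and all variance components: $\mu_1, \mu_2, \ldots, \mu_k, \sigma_\alpha^2, \sigma_\beta^2, \sigma_\gamma^2, \sigma_e^2$

$$\sum_{i=1}^{n} F_i' H_i^{-1} \left(YY_i - \mathrm{E}(YY_i)\right) = 0$$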

Estimation and Inference (6)

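$$YY_i = \begin{pmatrix} A_1 \\ \vdots \\ A_j \\ \vdots \\ A_k \\ B \\ C \\ D \\ E \end{pmatrix}, \qquad \mathrm{E}(YY_i) = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_j \\ \vdots \\ \mu_k \\ \sigma_\beta^2 + \sigma_\gamma^2 + \sigma_e^2/m \\ \sigma_\alpha^2 \\ \sigma_e^2 \\ \sigma_\alpha^2 + \sigma_\gamma^2 + \sigma_e^2/m \end{pmatrix}$$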

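Estimation and Inference (7)

$$A_j = (y_{ij1} + y_{ij2} + \cdots + y_{ijm}) / m$$

$$B = \sum_{j=1}^{k-1} \sum_{j'=j+1}^{k} (\bar{y}_{ij.} - \bar{y}_{ij'.})^2 / [k(k-1)]$$

$$C = 2 \sum_{j=1}^{k-1} \sum_{j'=j+1}^{k} \sum_{l=1}^{m} \sum_{l'=1}^{m} \left[(y_{ijl} - \bar{y}_{.jl})(y_{ij'l'} - \bar{y}_{.j'l'})\right] / [k(k-1)m^2]$$

$$D = \sum_{j=1}^{k} \left[\sum_{l=1}^{m} (y_{ijl} - \bar{y}_{ij.})^2 / (m-1)\right] / k$$

$$E = \sum_{j=1}^{k} \sum_{l=1}^{m} (y_{ijl} - \bar{y}_{.jl})^2 / (km^2) + 2 \sum_{j=1}^{k} \sum_{l=1}^{m-1} \sum_{l'=l+1}^{m} \left[(y_{ijl} - \bar{y}_{.jl})(y_{ijl'} - \bar{y}_{.jl'})\right] / (km^2)$$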
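As a rough illustration of how the per-subject statistics A_j, B, C, D and E lead to point estimates, the sketch below averages them over subjects and matches the averages to their expectations. This moment-matching shortcut is an assumption made for illustration: it relies on the data being balanced and on there being exactly as many statistics as parameters, and it does not produce the GEE standard errors. The function name `moment_estimates` and the data layout `y[i][j][l]` are hypothetical.

```python
from itertools import combinations

def moment_estimates(y):
    """Point estimates from balanced data y[i][j][l] (n subjects, k raters, m replicates)."""
    n, k, m = len(y), len(y[0]), len(y[0][0])
    if m < 2:
        raise ValueError("need m >= 2 replicates to separate the error variance")
    # Means over subjects for each (rater, replicate), used to centre C and E.
    col = [[sum(y[i][j][l] for i in range(n)) / n for l in range(m)]
           for j in range(k)]
    A = [0.0] * k
    B = C = D = E = 0.0
    for i in range(n):
        ybar = [sum(y[i][j]) / m for j in range(k)]                  # A_j for subject i
        # Sum over replicates of centred readings; its square expands to
        # sum_l dev^2 + 2 * sum_{l<l'} dev * dev', the bracket in E.
        dev = [sum(y[i][j][l] - col[j][l] for l in range(m)) for j in range(k)]
        for j in range(k):
            A[j] += ybar[j] / n
            D += sum((v - ybar[j]) ** 2 for v in y[i][j]) / (m - 1) / k / n
            E += dev[j] ** 2 / (k * m * m) / n
        for j, jp in combinations(range(k), 2):
            B += (ybar[j] - ybar[jp]) ** 2 / (k * (k - 1)) / n
            C += 2 * dev[j] * dev[jp] / (k * (k - 1) * m * m) / n
    s_a, s_e = C, D                       # E(C) = s_a, E(D) = s_e
    s_g = E - s_a - s_e / m               # E(E) = s_a + s_g + s_e/m
    s_b = B - s_g - s_e / m               # E(B) = s_b + s_g + s_e/m
    return {"mu": A, "s_alpha": s_a, "s_beta": s_b, "s_gamma": s_g, "s_e": s_e}
```

Moment estimates of variance components can come out negative in small samples; the sketch returns them unmodified rather than truncating at zero.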

Estimation and Inference (8)

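$H_i$ is the working variance-covariance structure of $YY_i$; "working" means assuming that $YY_i$ follows a normal distribution

$F_i$ is the derivative matrix of the expectation of $YY_i$ with respect to all the parameters:

$$F_i = \partial\,\mathrm{E}(YY_i) / \partial(\mu_1, \ldots, \mu_k, \sigma_\alpha^2, \sigma_\beta^2, \sigma_\gamma^2, \sigma_e^2)$$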

Estimation and Inference (9)

GEE method provides:

estimates of all means

estimates of all variance components

estimates of variances for all variance components

estimates of covariances between any two variance components

Estimation and Inference (10)

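The delta method is used to estimate the variances of all indices

$$\mathrm{var}(\hat\rho_{c,Intra}) = \frac{(1-\rho_{c,Intra})^2 \left[\mathrm{var}(\hat\sigma_\alpha^2) + \mathrm{var}(\hat\sigma_\gamma^2) + 2\,\mathrm{cov}(\hat\sigma_\alpha^2, \hat\sigma_\gamma^2)\right] + \rho_{c,Intra}^2\,\mathrm{var}(\hat\sigma_e^2)}{(\sigma_\alpha^2 + \sigma_\gamma^2 + \sigma_e^2)^2} - \frac{2(1-\rho_{c,Intra})\,\rho_{c,Intra} \left[\mathrm{cov}(\hat\sigma_\alpha^2, \hat\sigma_e^2) + \mathrm{cov}(\hat\sigma_\gamma^2, \hat\sigma_e^2)\right]}{(\sigma_\alpha^2 + \sigma_\gamma^2 + \sigma_e^2)^2}$$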
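The delta-method variance of the intra-rater CCC can be sketched directly from the partial derivatives of $\rho = (\sigma_\alpha^2 + \sigma_\gamma^2)/(\sigma_\alpha^2 + \sigma_\gamma^2 + \sigma_e^2)$, which are $(1-\rho)/S$ for the subject and subject-by-rater variances and $-\rho/S$ for the error variance, with $S$ the denominator. The function name `var_ccc_intra` and its argument names are assumptions; the variances and covariances of the variance-component estimates are taken as given inputs.

```python
def var_ccc_intra(s_a, s_g, s_e, v_a, v_g, v_e, c_ag=0.0, c_ae=0.0, c_ge=0.0):
    """Delta-method variance of the intra-rater CCC.

    s_* are variance-component estimates; v_* are the variances of those
    estimates and c_* their pairwise covariances (a: subject,
    g: subject-by-rater, e: error).
    """
    total = s_a + s_g + s_e
    rho = (s_a + s_g) / total
    gain = (1 - rho) ** 2 * (v_a + v_g + 2 * c_ag) + rho ** 2 * v_e
    cross = 2 * (1 - rho) * rho * (c_ae + c_ge)
    return (gain - cross) / total ** 2

# Illustrative inputs only, not values from the slides.
variance = var_ccc_intra(1.0, 1.0, 2.0, v_a=1.0, v_g=1.0, v_e=1.0)
```

The same pattern (gradient of the ratio, then a quadratic form in the covariance matrix of the estimates) gives the variances of the inter-rater and total indices.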

Estimation and Inference (12)

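$$\mathrm{var}(\hat\rho_{c,Inter}) = \frac{(1-\rho_{c,Inter})^2\,\mathrm{var}(\hat\sigma_\alpha^2) + \rho_{c,Inter}^2 \left[\mathrm{var}(\hat\sigma_\beta^2) + \mathrm{var}(\hat\sigma_\gamma^2) + \mathrm{var}(\hat\sigma_e^2)/m^2\right]}{(\sigma_\alpha^2 + \sigma_\beta^2 + \sigma_\gamma^2 + \sigma_e^2/m)^2}$$

$$\qquad + \frac{\rho_{c,Inter}^2 \left[2\,\mathrm{cov}(\hat\sigma_\beta^2, \hat\sigma_\gamma^2) + 2\,\mathrm{cov}(\hat\sigma_\beta^2, \hat\sigma_e^2)/m + 2\,\mathrm{cov}(\hat\sigma_\gamma^2, \hat\sigma_e^2)/m\right]}{(\sigma_\alpha^2 + \sigma_\beta^2 + \sigma_\gamma^2 + \sigma_e^2/m)^2}$$

$$\qquad - \frac{2(1-\rho_{c,Inter})\,\rho_{c,Inter} \left[\mathrm{cov}(\hat\sigma_\alpha^2, \hat\sigma_\beta^2) + \mathrm{cov}(\hat\sigma_\alpha^2, \hat\sigma_\gamma^2) + \mathrm{cov}(\hat\sigma_\alpha^2, \hat\sigma_e^2)/m\right]}{(\sigma_\alpha^2 + \sigma_\beta^2 + \sigma_\gamma^2 + \sigma_e^2/m)^2}$$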

Estimation and Inference (13)

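$$\mathrm{var}(\hat\rho_{c,Total}) = \frac{(1-\rho_{c,Total})^2\,\mathrm{var}(\hat\sigma_\alpha^2) + \rho_{c,Total}^2 \left[\mathrm{var}(\hat\sigma_\beta^2) + \mathrm{var}(\hat\sigma_\gamma^2) + \mathrm{var}(\hat\sigma_e^2)\right]}{(\sigma_\alpha^2 + \sigma_\beta^2 + \sigma_\gamma^2 + \sigma_e^2)^2}$$

$$\qquad + \frac{\rho_{c,Total}^2 \left[2\,\mathrm{cov}(\hat\sigma_\beta^2, \hat\sigma_\gamma^2) + 2\,\mathrm{cov}(\hat\sigma_\beta^2, \hat\sigma_e^2) + 2\,\mathrm{cov}(\hat\sigma_\gamma^2, \hat\sigma_e^2)\right]}{(\sigma_\alpha^2 + \sigma_\beta^2 + \sigma_\gamma^2 + \sigma_e^2)^2}$$

$$\qquad - \frac{2(1-\rho_{c,Total})\,\rho_{c,Total} \left[\mathrm{cov}(\hat\sigma_\alpha^2, \hat\sigma_\beta^2) + \mathrm{cov}(\hat\sigma_\alpha^2, \hat\sigma_\gamma^2) + \mathrm{cov}(\hat\sigma_\alpha^2, \hat\sigma_e^2)\right]}{(\sigma_\alpha^2 + \sigma_\beta^2 + \sigma_\gamma^2 + \sigma_e^2)^2}$$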

Estimation and Inference (14)

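$$\mathrm{var}(\hat\rho_{Inter}) = \frac{(1-\rho_{Inter})^2\,\mathrm{var}(\hat\sigma_\alpha^2) + \rho_{Inter}^2 \left[\mathrm{var}(\hat\sigma_\gamma^2) + \mathrm{var}(\hat\sigma_e^2)/m^2 + 2\,\mathrm{cov}(\hat\sigma_\gamma^2, \hat\sigma_e^2)/m\right]}{(\sigma_\alpha^2 + \sigma_\gamma^2 + \sigma_e^2/m)^2} - \frac{2(1-\rho_{Inter})\,\rho_{Inter} \left[\mathrm{cov}(\hat\sigma_\alpha^2, \hat\sigma_\gamma^2) + \mathrm{cov}(\hat\sigma_\alpha^2, \hat\sigma_e^2)/m\right]}{(\sigma_\alpha^2 + \sigma_\gamma^2 + \sigma_e^2/m)^2}$$

$$\mathrm{var}(\hat\rho_{Total}) = \frac{(1-\rho_{Total})^2\,\mathrm{var}(\hat\sigma_\alpha^2) + \rho_{Total}^2 \left[\mathrm{var}(\hat\sigma_\gamma^2) + \mathrm{var}(\hat\sigma_e^2) + 2\,\mathrm{cov}(\hat\sigma_\gamma^2, \hat\sigma_e^2)\right]}{(\sigma_\alpha^2 + \sigma_\gamma^2 + \sigma_e^2)^2} - \frac{2(1-\rho_{Total})\,\rho_{Total} \left[\mathrm{cov}(\hat\sigma_\alpha^2, \hat\sigma_\gamma^2) + \mathrm{cov}(\hat\sigma_\alpha^2, \hat\sigma_e^2)\right]}{(\sigma_\alpha^2 + \sigma_\gamma^2 + \sigma_e^2)^2}$$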

Estimation and Inference (15)

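$$\mathrm{var}(\hat\chi_{a,Inter}) = \frac{(1-\chi_{a,Inter})^2 \left[\mathrm{var}(\hat\sigma_\alpha^2) + \mathrm{var}(\hat\sigma_\gamma^2) + \mathrm{var}(\hat\sigma_e^2)/m^2 + 2\,\mathrm{cov}(\hat\sigma_\alpha^2, \hat\sigma_\gamma^2)\right]}{(\sigma_\alpha^2 + \sigma_\beta^2 + \sigma_\gamma^2 + \sigma_e^2/m)^2}$$

$$\qquad + \frac{(1-\chi_{a,Inter})^2 \left[2\,\mathrm{cov}(\hat\sigma_\alpha^2, \hat\sigma_e^2)/m + 2\,\mathrm{cov}(\hat\sigma_\gamma^2, \hat\sigma_e^2)/m\right] + \chi_{a,Inter}^2\,\mathrm{var}(\hat\sigma_\beta^2)}{(\sigma_\alpha^2 + \sigma_\beta^2 + \sigma_\gamma^2 + \sigma_e^2/m)^2}$$

$$\qquad - \frac{2(1-\chi_{a,Inter})\,\chi_{a,Inter} \left[\mathrm{cov}(\hat\sigma_\alpha^2, \hat\sigma_\beta^2) + \mathrm{cov}(\hat\sigma_\beta^2, \hat\sigma_e^2)/m + \mathrm{cov}(\hat\sigma_\beta^2, \hat\sigma_\gamma^2)\right]}{(\sigma_\alpha^2 + \sigma_\beta^2 + \sigma_\gamma^2 + \sigma_e^2/m)^2}$$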

Estimation and Inference (16)

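$$\mathrm{var}(\hat\chi_{a,Total}) = \frac{(1-\chi_{a,Total})^2 \left[\mathrm{var}(\hat\sigma_\alpha^2) + \mathrm{var}(\hat\sigma_\gamma^2) + \mathrm{var}(\hat\sigma_e^2) + 2\,\mathrm{cov}(\hat\sigma_\alpha^2, \hat\sigma_\gamma^2)\right]}{(\sigma_\alpha^2 + \sigma_\beta^2 + \sigma_\gamma^2 + \sigma_e^2)^2}$$

$$\qquad + \frac{(1-\chi_{a,Total})^2 \left[2\,\mathrm{cov}(\hat\sigma_\alpha^2, \hat\sigma_e^2) + 2\,\mathrm{cov}(\hat\sigma_\gamma^2, \hat\sigma_e^2)\right] + \chi_{a,Total}^2\,\mathrm{var}(\hat\sigma_\beta^2)}{(\sigma_\alpha^2 + \sigma_\beta^2 + \sigma_\gamma^2 + \sigma_e^2)^2}$$

$$\qquad - \frac{2(1-\chi_{a,Total})\,\chi_{a,Total} \left[\mathrm{cov}(\hat\sigma_\alpha^2, \hat\sigma_\beta^2) + \mathrm{cov}(\hat\sigma_\beta^2, \hat\sigma_\gamma^2) + \mathrm{cov}(\hat\sigma_\beta^2, \hat\sigma_e^2)\right]}{(\sigma_\alpha^2 + \sigma_\beta^2 + \sigma_\gamma^2 + \sigma_e^2)^2}$$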

Estimation and Inference (17)

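$$\mathrm{var}(\widehat{MSD}_{Intra}) = 4\,\mathrm{var}(\hat\sigma_e^2)$$

$$\mathrm{var}(\widehat{MSD}_{Inter}) = 4\left[\mathrm{var}(\hat\sigma_\beta^2) + \mathrm{var}(\hat\sigma_\gamma^2) + \mathrm{var}(\hat\sigma_e^2)/m^2 + 2\,\mathrm{cov}(\hat\sigma_\beta^2, \hat\sigma_\gamma^2) + 2\,\mathrm{cov}(\hat\sigma_\beta^2, \hat\sigma_e^2)/m + 2\,\mathrm{cov}(\hat\sigma_\gamma^2, \hat\sigma_e^2)/m\right]$$

$$\mathrm{var}(\widehat{MSD}_{Total}) = 4\left[\mathrm{var}(\hat\sigma_\beta^2) + \mathrm{var}(\hat\sigma_\gamma^2) + \mathrm{var}(\hat\sigma_e^2) + 2\,\mathrm{cov}(\hat\sigma_\beta^2, \hat\sigma_\gamma^2) + 2\,\mathrm{cov}(\hat\sigma_\beta^2, \hat\sigma_e^2) + 2\,\mathrm{cov}(\hat\sigma_\gamma^2, \hat\sigma_e^2)\right]$$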

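Estimation and Inference (18)

Transformations for variances

Z-transformation: CCC indices and precision indices

$$z = \frac{1}{2} \ln\left(\frac{1 + \rho_c}{1 - \rho_c}\right)$$

Logit-transformation: accuracy and CP indices

$$\ln\left(\frac{\chi_a}{1 - \chi_a}\right)$$

Log-transformation: TDI indices

$$\ln(e^2)$$, with $e^2$ the estimated MSD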
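The three transformations are typically used to build confidence limits on the transformed scale and then map them back. The sketch below shows one-sided 95% limits under a normal approximation, with the transformed standard errors obtained from the delta method (derivatives $1/(1-\rho^2)$, $1/[\chi(1-\chi)]$ and $1/x$). The function names are hypothetical, and applying the log transformation directly to a TDI estimate with its own standard error is an assumption made for illustration.

```python
from math import atanh, tanh, log, exp
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.95)   # one-sided 95% normal quantile

def lower_limit_z(rho, se):
    """Lower limit for a CCC or precision index via the Z-transformation."""
    se_z = se / (1 - rho ** 2)
    return tanh(atanh(rho) - Z95 * se_z)

def lower_limit_logit(p, se):
    """Lower limit for an accuracy or CP index via the logit transformation."""
    se_w = se / (p * (1 - p))
    w_low = log(p / (1 - p)) - Z95 * se_w
    return 1 / (1 + exp(-w_low))

def upper_limit_log(x, se):
    """Upper limit for a positive index such as TDI via the log transformation."""
    return x * exp(Z95 * se / x)

# Illustrative inputs only, not values from the slides.
print(lower_limit_z(0.95, 0.01), lower_limit_logit(0.9, 0.02), upper_limit_log(100.0, 10.0))
```

Working on the transformed scale keeps the limits inside the valid range of each index: between -1 and 1 for correlations, between 0 and 1 for proportions, and above 0 for TDI.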

Simulation Study

three types of data: binary/ordinal/normal

three cases for each type of data: k=2, m=1 / k=4, m=1 / k=2, m=3

for each case: 1000 random samples with sample size n=20

for binary and ordinal data: inferences obtained through transformation vs. no transformation

for normal data: transformation

Simulation Study (2)

Conclusions:

The algorithm works well for all three types of data, both in estimates and in inferences

For binary and ordinal data: no need for transformation

For normal data, Carrasco's method is superior to ours, but for categorical data ours is superior. For ordinal data, Carrasco's method and ours perform similarly.

Example One

Sigma method vs. HemoCue method in measuring the DCLHb level in patients' serum

299 samples: each sample collected twice by each method

Range: 50-2000 mg/dL

Example One – HemoCue method

HemoCue method first readings vs. second readings

Example One – Sigma method

Sigma method first readings vs. second readings

Example One – HemoCue vs. Sigma

HemoCue’s averages vs. Sigma’s averages

Example One – analysis result (1)

| Statistics | Estimates | 95% CI* | Allowance |
|---|---|---|---|
| ccc_inter | 0.9866 | 0.9818 | 0.9775 |
| ccc_total | 0.9859 | 0.9809 | |
| precision_intra | 0.9986 | 0.9982 | 0.9943 |
| precision_inter | 0.9866 | 0.9818 | |
| precision_total | 0.9860 | 0.9809 | |
| accuracy_inter | 0.9999 | 0.9974 | |
| accuracy_total | 0.9999 | 0.9974 | |

Example One – analysis result (2)

*: for all CCC, precision, accuracy and CP indices, the 95% lower limits are reported. For all TDI indices, the 95% upper limits are reported.

| Statistics | Estimates | 95% CI* | Allowance |
|---|---|---|---|
| TDIintra(0.9) | 41.0903 | 47.2713 | 75 |
| TDIinter(0.9) | 127.273 | 149.799 | 150 |
| TDItotal(0.9) | 130.548 | 152.678 | |
| CPintra(75) | 0.9973 | 0.9942 | 0.9 |
| CPinter(150) | 0.9475 | 0.9170 | 0.9 |
| CPtotal(150) | 0.9412 | 0.9102 | |
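The TDI and CP rows of this table are linked through the MSD: under the normal approximation, $TDI(\pi) = Q\sqrt{MSD}$ and $CP(\delta) = 1 - 2[1 - \Phi(\delta/\sqrt{MSD})]$. The sketch below recovers $\sqrt{MSD}$ from a reported TDI estimate and recomputes the corresponding CP. The function name `cp_from_tdi` is hypothetical; the numeric inputs are the estimates reported for Example One.

```python
from statistics import NormalDist

def cp_from_tdi(tdi, pi, delta):
    """Coverage probability at delta implied by a TDI(pi) value (normal approximation)."""
    norm = NormalDist()
    root_msd = tdi / norm.inv_cdf(1 - (1 - pi) / 2)   # invert TDI = Q * sqrt(MSD)
    return 2 * norm.cdf(delta / root_msd) - 1

# (label, reported TDI(0.9) estimate, delta used for the CP row)
rows = [("intra", 41.0903, 75), ("inter", 127.273, 150), ("total", 130.548, 150)]
for label, tdi, delta in rows:
    print(label, round(cp_from_tdi(tdi, 0.9, delta), 4))
```

The recomputed values should land close to the reported CP estimates, which is a quick way to confirm that a TDI row and a CP row describe the same MSD.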

Example Two

Hemagglutinin Inhibition (HAI) assay for antibody to Influenza A (H3N2) in rabbit serum samples from two labs

64 rabbit serum samples: measured twice by each lab

Antibody level: negative/positive/highly positive

Example Two – Lab one

Rows: First Reading. Columns: Second Reading.

| First Reading | Negative | Positive | Highly positive |
|---|---|---|---|
| Negative | 6 | 1 | 0 |
| Positive | 0 | 49 | 0 |
| Highly positive | 0 | 0 | 8 |

Example Two – Lab two

Rows: First Reading. Columns: Second Reading.

| First Reading | Negative | Positive | Highly positive |
|---|---|---|---|
| Negative | 2 | 0 | 0 |
| Positive | 0 | 22 | 2 |
| Highly positive | 0 | 5 | 33 |

Example Two: Lab one vs. lab two

Rows: Lab One First Reading. Columns: Lab Two First Reading.

| Lab One First Reading | Negative | Positive | Highly positive |
|---|---|---|---|
| Negative | 2 | 5 | 0 |
| Positive | 0 | 19 | 30 |
| Highly positive | 0 | 0 | 8 |

Example Two: lab one vs. lab two

Rows: Lab One Second Reading. Columns: Lab Two Second Reading.

| Lab One Second Reading | Negative | Positive | Highly positive |
|---|---|---|---|
| Negative | 2 | 4 | 0 |
| Positive | 0 | 23 | 27 |
| Highly positive | 0 | 0 | 8 |

Example Two

| Statistics | Estimates | 95% CI* | Allowance |
|---|---|---|---|
| ccc_inter | 0.37225 | 0.22039 | 0.4375 |
| ccc_total | 0.35776 | 0.20970 | |
| precision_intra | 0.88361 | 0.79692 | 0.75 |
| precision_inter | 0.56795 | 0.4359 | |
| precision_total | 0.53489 | 0.39999 | |
| accuracy_inter | 0.65543 | 0.51586 | |
| accuracy_total | 0.66885 | 0.53561 | |

Conclusions (1)

When data are continuous and m goes to ∞:

agreement indices are the same as those proposed by Barnhart (2005), both in estimates and inferences

Improvements:

• Precision indices, accuracy indices, TDIs and CPs
• Variance components

Conclusions (2)

When m=1:

agreement index degenerates into the OCCC as proposed by King (2002) and Carrasco (2003) for continuous data

Improvements:

• For categorical data:
– King's method approximates kappa and weighted kappa; our estimates (without transformation) are exactly the same as kappa and weighted kappa, both in estimate and in inference.
– Our estimates are superior to Carrasco's estimates when precision and accuracy are high
• Covariate adjustment becomes available

Conclusions (3)

When data are continuous, k=2 and m=1:

agreement index degenerates to the original CCC by Lin (1989)

When data are binary, k=2 and m=1:

agreement index degenerates into kappa, both in estimate and inference

Conclusions (4)

When data are ordinal, k=2 and m=1:

agreement index degenerates into weighted kappa with the weight set below, both in estimate and in inference.

$$w_{ij} = 1 - \frac{(i-j)^2}{(k-1)^2}, \qquad i, j = 1, 2, \ldots, k$$

Here k denotes the number of ordinal categories.
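The point-estimate part of this equivalence can be checked numerically: weighted kappa with weights $1 - (i-j)^2/(k-1)^2$ coincides with the CCC computed on category scores 1, ..., k when the moments use 1/n denominators. The sketch below is a plain two-rater calculation, not the GEE machinery of the unified approach; the function names are hypothetical, and the table is the Lab one first-versus-second-reading table from Example Two.

```python
def weighted_kappa(table):
    """Weighted kappa with weights 1 - (i - j)^2 / (k - 1)^2."""
    k = len(table)
    n = sum(map(sum, table))
    row = [sum(r) / n for r in table]
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    w = [[1 - (i - j) ** 2 / (k - 1) ** 2 for j in range(k)] for i in range(k)]
    p_obs = sum(w[i][j] * table[i][j] / n for i in range(k) for j in range(k))
    p_exp = sum(w[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    return (p_obs - p_exp) / (1 - p_exp)

def ccc_from_table(table):
    """CCC on category scores 1..k, using 1/n moments."""
    k = len(table)
    n = sum(map(sum, table))
    cells = [(i + 1, j + 1, table[i][j] / n) for i in range(k) for j in range(k)]
    mx = sum(x * p for x, _, p in cells)
    my = sum(y * p for _, y, p in cells)
    vx = sum((x - mx) ** 2 * p for x, _, p in cells)
    vy = sum((y - my) ** 2 * p for _, y, p in cells)
    cxy = sum((x - mx) * (y - my) * p for x, y, p in cells)
    return 2 * cxy / (vx + vy + (mx - my) ** 2)

lab_one = [[6, 1, 0], [0, 49, 0], [0, 0, 8]]   # rows: first reading, columns: second reading
print(weighted_kappa(lab_one), ccc_from_table(lab_one))
```

The two printed values agree up to floating-point error because both reduce to one minus the observed mean squared difference divided by its value under independence.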

Conclusions (5)

Unified approach

Relative agreement indices: CCC with precision and accuracy (depend on the data range)

Absolute agreement: Total Deviation Index and Coverage Probability (rely on the normal assumption)

Link function needs more work

Requires balanced data

References

Bartko, J. J. (1966). The intraclass correlation coefficient as a measure of reliability. Psychological Reports 19, 3-11.

Barnhart, H. X. and Williamson, J. M. (2001). Modeling concordance correlation via GEE to evaluate reproducibility. Biometrics 57, 931-940.

Barnhart, H. X., Song, J. and Haber, M. J. (2005). Assessing intra, inter and total agreement with replicated readings. Statistics in Medicine 19, 255-270.

Carrasco, J. L. and Jover, L. (2003). Estimating the generalized concordance correlation coefficient through variance components. Biometrics 59, 849-858.

Fleiss, J., Cohen, J. and Everitt, B. (1969). Large sample standard errors of kappa and weighted kappa. Psychological Bulletin 72, 323-327.

King, T. S. and Chinchilli, V. M. (2001). A generalized concordance correlation coefficient for continuous and categorical data. Statistics in Medicine 20, 2131-2147.

Lin, L. I. (1989). A concordance correlation coefficient to evaluate reproducibility. Biometrics 45, 255-268.

Lin, L. I., Hedayat, A. S., Sinha, B. and Yang, M. (2002). Statistical methods in assessing agreement: models, issues & tools. Journal of American Statistical Association 97(457), 257-270.

Wu, W. (2006). A unified approach for assessing agreement. Ph.D. thesis, UIC.
