
Department of Radiology, MRI Division
Computational Radiology Laboratory

An NCRR National Resource Center

Computational Radiology Laboratory
Brigham and Women’s Hospital
Boston, Massachusetts USA

a teaching affiliate of Harvard Medical School

STAPLE: Simultaneous Truth and Performance Level Estimation

An algorithm for the evaluation of image segmentations.

Simon K. Warfield, Kelly Zou, William M. Wells

Validation of Image Segmentation

• Comparison to digital and physical phantoms:

– Excellent for testing the anatomy, noise and artifact which is modeled.

– Typically lacks range of variability encountered in practice.

• Comparison to expert performance; to other algorithms:

• What is the appropriate measure for such comparisons?

• Our new approach:

• Simultaneous estimation of hidden “ground truth” and expert performance.

• Enables comparison between and to experts.

• Can be easily applied to clinical data exhibiting range of normal and pathological variability.

How to judge segmentations of the peripheral zone?

Figure panels: peripheral zone and segmentations; 0.5T MR of prostate.

Algorithm

• Complete data model:

• Binary ground truth $T_i$ for each voxel $i$.

• Expert $j$ makes segmentation decisions $D_{ij}$.

• Expert performance characterized by sensitivity $p$ and specificity $q$.

• We observe expert decisions $D$. If we knew ground truth $T$, we could construct maximum likelihood estimates for each expert’s sensitivity (true positive fraction) and specificity (true negative fraction) from the complete-data likelihood $f(D, T \mid p, q)$:

$$\hat{p}, \hat{q} = \arg\max_{p,q} \ln f(D, T \mid p, q)$$
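When the truth is known, the maximum likelihood estimates reduce to simple counting. A minimal sketch for a single expert, assuming decisions and truth are given as flat lists of 0/1 values (the function name `ml_performance` and the toy data are illustrative, not from the slides):

```python
def ml_performance(decisions, truth):
    """Sensitivity and specificity of one expert, given known binary truth."""
    if len(decisions) != len(truth):
        raise ValueError("decisions and truth must have the same length")
    n_pos = sum(1 for t in truth if t == 1)
    n_neg = len(truth) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("truth must contain both classes")
    # True positives: expert says 1 where truth is 1.
    tp = sum(1 for d, t in zip(decisions, truth) if t == 1 and d == 1)
    # True negatives: expert says 0 where truth is 0.
    tn = sum(1 for d, t in zip(decisions, truth) if t == 0 and d == 0)
    return tp / n_pos, tn / n_neg


# Toy data (hypothetical): 4 foreground voxels, 6 background voxels.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
decisions = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
p_hat, q_hat = ml_performance(decisions, truth)
print(p_hat, q_hat)
```

Here the expert finds 3 of 4 foreground voxels and 5 of 6 background voxels, so the estimates are 3/4 and 5/6.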

Expectation-Maximization

• Since we don’t know ground truth $T$, treat $T$ as a random variable, and solve for the parameters that maximize:

$$Q(\theta \mid \hat{\theta}) = E_{g(T \mid D, \hat{\theta})}\left[\ln f(D, T \mid \theta)\right]$$

• Parameter values $\theta_j = [p_j \; q_j]^T$ that maximize the conditional expectation of the log-likelihood function are found by iterating two steps:

– Estimate hidden ground truth given a previous estimate of the expert quality parameters.

– Estimate expert performance parameters based on how the expert decisions compared to the current estimate of the ground truth.

To Solve for Expert Parameters:

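$$\hat{p}, \hat{q} = \arg\max_{p,q} E_{g(T \mid D, \hat{p}^{o}, \hat{q}^{o})}\left[\ln f(D, T \mid p, q)\right]$$

$$= \arg\max_{p,q} E_{g(T \mid D, \hat{p}^{o}, \hat{q}^{o})}\left[\ln f(D \mid T, p, q)\, f(T)\right]$$

where $\hat{p}^{o}, \hat{q}^{o}$ are the previous estimates of the parameters of each expert.

$$g(T \mid D, \hat{p}^{o}, \hat{q}^{o}) = \frac{g(D \mid T, \hat{p}^{o}, \hat{q}^{o})\, g(T)}{\sum_{T'} g(D \mid T', \hat{p}^{o}, \hat{q}^{o})\, g(T')}$$

$$= \frac{\prod_i \left[\prod_j g(D_{ij} \mid T_i, \hat{p}^{o}_{j}, \hat{q}^{o}_{j})\right] g(T_i)}{\sum_{T'} \prod_i \left[\prod_j g(D_{ij} \mid T'_i, \hat{p}^{o}_{j}, \hat{q}^{o}_{j})\right] g(T'_i)}$$

where $i$ indexes over voxels and $j$ over experts. For each voxel $i$:

$$g(T_i \mid D_i, \hat{p}^{o}, \hat{q}^{o}) = \frac{\prod_j g(D_{ij} \mid T_i, \hat{p}^{o}_{j}, \hat{q}^{o}_{j})\, g(T_i)}{\sum_{T'_i} \prod_j g(D_{ij} \mid T'_i, \hat{p}^{o}_{j}, \hat{q}^{o}_{j})\, g(T'_i)}$$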

True Segmentation Estimate

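$$W_i \equiv g(T_i = 1 \mid D_i, \hat{p}^{o}, \hat{q}^{o}) = \frac{g(T_i = 1)\, \alpha}{g(T_i = 1)\, \alpha + (1 - g(T_i = 1))\, \beta}$$

$$\alpha = \prod_{j : D_{ij} = 1} \hat{p}_j \prod_{j : D_{ij} = 0} (1 - \hat{p}_j)$$

$$\beta = \prod_{j : D_{ij} = 0} \hat{q}_j \prod_{j : D_{ij} = 1} (1 - \hat{q}_j)$$

$g(T_i = 1)$ = prior probability ground truth is 1

$W_i$ = probability that ground truth is 1

Here $\hat{p}_j, \hat{q}_j$ denote the previous estimates of expert $j$’s sensitivity and specificity.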
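The truth-estimation step can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: it assumes `decisions[i][j]` holds the 0/1 decision of expert `j` at voxel `i`, and that a single global prior g(Ti=1) is supplied by the caller. The name `e_step` is hypothetical.

```python
def e_step(decisions, p, q, prior):
    """Return W_i, the probability that the truth is 1 at each voxel i."""
    weights = []
    for row in decisions:
        alpha = 1.0  # likelihood of this row of decisions if T_i = 1
        beta = 1.0   # likelihood of this row of decisions if T_i = 0
        for d, pj, qj in zip(row, p, q):
            if d == 1:
                alpha *= pj          # true positive
                beta *= 1.0 - qj     # false positive
            else:
                alpha *= 1.0 - pj    # false negative
                beta *= qj           # true negative
        numerator = prior * alpha
        # Assumes 0 < p_j, q_j < 1 so the denominator cannot vanish.
        weights.append(numerator / (numerator + (1.0 - prior) * beta))
    return weights


# Toy example (hypothetical): three voxels, three equally good experts.
decisions = [[1, 1, 1], [1, 1, 0], [0, 0, 0]]
W = e_step(decisions, p=[0.9, 0.9, 0.9], q=[0.9, 0.9, 0.9], prior=0.5)
print(W)
```

With a prior of 0.5 and all experts at p = q = 0.9, a two-to-one vote in favour of foreground gives W = 0.081 / (0.081 + 0.009) = 0.9, while unanimous votes push W very close to 1 or 0.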

Expert Performance Estimate

$$\hat{p}, \hat{q} = \arg\max_{p,q} E_{g(T \mid D, \hat{p}^{o}, \hat{q}^{o})}\left[\ln f(D \mid T, p, q)\, f(T)\right]$$

$$= \arg\max_{p,q} E_{g(T \mid D, \hat{p}^{o}, \hat{q}^{o})}\left[\ln \prod_{ij} f(D_{ij} \mid T_i, p_j, q_j) + \ln \prod_i f(T_i)\right]$$

$$= \arg\max_{p,q} \sum_j E_{g(T \mid D, \hat{p}^{o}, \hat{q}^{o})}\left[\sum_i \ln f(D_{ij} \mid T_i, p_j, q_j)\right]$$

For each expert $j$:

$$\hat{p}_j, \hat{q}_j = \arg\max_{p_j, q_j} \sum_i E_{g(T \mid D, \hat{p}^{o}, \hat{q}^{o})}\left[\ln f(D_{ij} \mid T_i, p_j, q_j)\right]$$

$$= \arg\max_{p_j, q_j} \sum_i \left[ W_i \ln f(D_{ij} \mid T_i = 1, p_j, q_j) + (1 - W_i) \ln f(D_{ij} \mid T_i = 0, p_j, q_j) \right]$$

$$= \arg\max_{p_j, q_j} \left[ \sum_{i : D_{ij} = 1} W_i \ln p_j + \sum_{i : D_{ij} = 1} (1 - W_i) \ln (1 - q_j) + \sum_{i : D_{ij} = 0} W_i \ln (1 - p_j) + \sum_{i : D_{ij} = 0} (1 - W_i) \ln q_j \right]$$
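The maximization for each expert has a closed form. A short worked step, written out as a sketch: the symbol $L_j$ is introduced here only as a name for the per-expert objective, and terms that do not depend on $p_j$ or $q_j$ are dropped.

```latex
% Per-expert objective (L_j is a label introduced for this sketch)
L_j(p_j, q_j) = \sum_{i:D_{ij}=1} W_i \ln p_j + \sum_{i:D_{ij}=0} W_i \ln(1 - p_j)
              + \sum_{i:D_{ij}=1} (1 - W_i) \ln(1 - q_j) + \sum_{i:D_{ij}=0} (1 - W_i) \ln q_j

% Stationarity in p_j
\frac{\partial L_j}{\partial p_j}
  = \frac{1}{p_j} \sum_{i:D_{ij}=1} W_i - \frac{1}{1 - p_j} \sum_{i:D_{ij}=0} W_i = 0
\;\Longrightarrow\;
\hat{p}_j = \frac{\sum_{i:D_{ij}=1} W_i}{\sum_{i:D_{ij}=1} W_i + \sum_{i:D_{ij}=0} W_i}

% Stationarity in q_j
\frac{\partial L_j}{\partial q_j}
  = \frac{1}{q_j} \sum_{i:D_{ij}=0} (1 - W_i) - \frac{1}{1 - q_j} \sum_{i:D_{ij}=1} (1 - W_i) = 0
\;\Longrightarrow\;
\hat{q}_j = \frac{\sum_{i:D_{ij}=0} (1 - W_i)}{\sum_{i:D_{ij}=1} (1 - W_i) + \sum_{i:D_{ij}=0} (1 - W_i)}
```

Both second derivatives are negative for $0 < p_j, q_j < 1$, so these stationary points are maxima.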

Expert Performance Estimators

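$$\hat{p}_j = \frac{\sum_{i : D_{ij} = 1} W_i}{\sum_{i : D_{ij} = 1} W_i + \sum_{i : D_{ij} = 0} W_i}$$

$$\hat{q}_j = \frac{\sum_{i : D_{ij} = 0} (1 - W_i)}{\sum_{i : D_{ij} = 1} (1 - W_i) + \sum_{i : D_{ij} = 0} (1 - W_i)}$$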

p (sensitivity, true positive fraction) : ratio of expert identified class 1 to total class 1 in the image.

q (specificity, true negative fraction) : ratio of expert identified class 0 to total class 0 in the image.
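These estimators are weighted versions of the usual counting ratios, with the voxel weights W_i standing in for the unknown truth. A minimal sketch, assuming `decisions[i][j]` is the 0/1 decision of expert `j` at voxel `i` and `weights[i]` is W_i (the name `m_step` and the toy numbers are illustrative):

```python
def m_step(decisions, weights):
    """Return per-expert sensitivity and specificity estimates from W_i."""
    n_experts = len(decisions[0])
    p, q = [], []
    for j in range(n_experts):
        # Weight of "truth is 1", split by the expert's decision.
        w_d1 = sum(w for row, w in zip(decisions, weights) if row[j] == 1)
        w_d0 = sum(w for row, w in zip(decisions, weights) if row[j] == 0)
        # Weight of "truth is 0", split by the expert's decision.
        u_d1 = sum(1.0 - w for row, w in zip(decisions, weights) if row[j] == 1)
        u_d0 = sum(1.0 - w for row, w in zip(decisions, weights) if row[j] == 0)
        p.append(w_d1 / (w_d1 + w_d0))
        q.append(u_d0 / (u_d1 + u_d0))
    return p, q


# Toy example (hypothetical): four voxels, two experts.
decisions = [[1, 1], [1, 0], [0, 0], [0, 1]]
weights = [0.9, 0.8, 0.1, 0.2]
p, q = m_step(decisions, weights)
print(p, q)
```

In this example the first expert agrees with the high-weight voxels and gets p = q = 0.85, while the second disagrees on two voxels and gets p = q = 0.55. When every W_i is exactly 0 or 1, the same formulas reduce to plain true positive and true negative fractions.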

Results

• Synthetic expert segmentations of known ground truth, specified performance parameters.

• Prostate peripheral zone segmentation evaluation.

• Brain tumor segmentation evaluation.

• Knee femoral cartilage segmentation evaluation.

Synthetic Experts

• Several experiments with known ground truth and known performance parameters.

• Goal:

– Determine if STAPLE accurately identifies known ground truth.

– Determine if STAPLE accurately determines known expert performance parameters.

– Understand sensitivity of STAPLE with respect to changes in prior hyper-parameters; requirements for number of observations to enable good estimation; convergence characteristics.

Synthetic Experts

10 segmentations by experts with p=0.95, q=0.90

STAPLE p,q estimates:

mean p       0.950104
std. dev p   0.001201
mean q       0.900035
std. dev q   0.001685

Figure: Four segmentations of ten shown. STAPLE ground truth.
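An experiment of this kind can be imitated with a small simulation. The sketch below is not the code behind the slide's numbers: the image size (2000 voxels), the prior of 0.3, the random seed, the fixed iteration count and the initial values p = q = 0.9 are all assumptions chosen for illustration, and the function name `staple` is hypothetical.

```python
import random


def staple(decisions, prior, p0=0.9, q0=0.9, n_iter=30):
    """Alternate truth estimation and performance estimation n_iter times."""
    n_experts = len(decisions[0])
    p = [p0] * n_experts
    q = [q0] * n_experts
    weights = []
    for _ in range(n_iter):
        # Estimate W_i from the current p, q.
        weights = []
        for row in decisions:
            alpha, beta = prior, 1.0 - prior
            for d, pj, qj in zip(row, p, q):
                alpha *= pj if d else (1.0 - pj)
                beta *= (1.0 - qj) if d else qj
            weights.append(alpha / (alpha + beta))
        # Re-estimate p_j, q_j from the current W_i.
        sum_w = sum(weights)
        for j in range(n_experts):
            w_d1 = sum(w for row, w in zip(decisions, weights) if row[j])
            u_d0 = sum(1.0 - w for row, w in zip(decisions, weights) if not row[j])
            p[j] = w_d1 / sum_w
            q[j] = u_d0 / (len(weights) - sum_w)
    return weights, p, q


random.seed(0)
true_p, true_q, prior = 0.95, 0.90, 0.3
truth = [1 if random.random() < prior else 0 for _ in range(2000)]
# Each of 10 synthetic experts errs independently at every voxel.
decisions = [
    [(1 if random.random() < true_p else 0) if t
     else (0 if random.random() < true_q else 1)
     for _ in range(10)]
    for t in truth
]
weights, p_hat, q_hat = staple(decisions, prior)
estimate = [1 if w >= 0.5 else 0 for w in weights]
agreement = sum(e == t for e, t in zip(estimate, truth)) / len(truth)
print(agreement, sum(p_hat) / 10, sum(q_hat) / 10)
```

With ten reasonably accurate experts the thresholded estimate should agree with the simulated truth at nearly every voxel, and the recovered parameters should sit close to the specified 0.95 and 0.90, within sampling noise.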

Synthetic Experts

Three segmentations differing by horizontal displacement.

Initialize STAPLE with pi=qi=0.90, two experiments with different global priors.

STAPLE results:

g(Ti=1)   p1, q1       p2, q2       p3, q3
0.12      1.0, 1.0     .88, .99     .88, .99
0.50      0.66, 1.0    0.66, 1.0    0.66, 1.0

Prostate Peripheral Zone

Expert   1      2      3      4      5
pj       .879   .991   .937   .918   .895
qj       .998   .994   .999   .999   .999
Dice     .913   .951   .967   .955   .944

Figure panels: STAPLE truth estimate; frequency of selection by experts.
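The Dice row reports a spatial overlap score. The slide does not say which pair of segmentations is compared; the sketch below assumes the standard Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|), between two binary masks given as flat 0/1 lists, for instance one expert's segmentation and a thresholded truth estimate. The function name `dice` and the toy masks are illustrative.

```python
def dice(a, b):
    """Dice similarity coefficient between two equal-length binary masks."""
    if len(a) != len(b):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    total = sum(a) + sum(b)
    # Two empty masks are treated as identical (a convention chosen here).
    return 2.0 * intersection / total if total else 1.0


expert = [1, 1, 1, 0, 0]
reference = [1, 1, 0, 1, 0]
score = dice(expert, reference)
print(score)
```

Here the masks share 2 foreground voxels out of 3 each, so the score is 4/6. Unlike the pair (p, q), a single Dice value does not separate over-segmentation from under-segmentation.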

Tumor Segmentation Evaluation

Figure panels: MR image; Experts; STAPLE; Tumor region.

Rater   1        2        3        auto
pj      0.8951   0.9993   0.9986   0.9063
qj      0.9999   0.9857   0.9982   0.9990

Knee Femoral Cartilage

26 Experts

Knee MRI STAPLE truth estimate.

Conclusion

• Key advantages of STAPLE

– Estimates “true” segmentation.

– Assesses expert performance.

• Principled mechanism which enables

– Comparison of different experts,

– Comparison of algorithm and experts.

• Extensions

– Non-stationary prior probability g(Ti=1)

– Neighborhood model (MRF) for coherent spatial structure of ground truth.

– Incorporate multiple observations by experts.

– Priors for expert sensitivity, specificity.

Acknowledgements

Data for this study was provided by:

• Peter M. Black.
• Ferenc A. Jolesz.
• Ron Kikinis.
• Lawrence Panych.
• Martha Shenton.
• Clare Tempany.
• Carl Winalski.
• Michael Kaus.

This study was supported by:

The Whitaker Foundation

Center for the Integration of Medicine and Innovative Technology

NIH P41 RR13218, P01 CA67165, R01 RR11747, R01 CA86879, R33 CA99015, R21 CA89449.

Relaxing the voxel independence assumption: MRF model of local coherency.

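$$g(T \mid D, \hat{p}^{o}, \hat{q}^{o}) = \frac{g(D \mid T, \hat{p}^{o}, \hat{q}^{o})\, g(T)}{\sum_{T'} g(D \mid T', \hat{p}^{o}, \hat{q}^{o})\, g(T')}$$

$$= \frac{\prod_i \left[\prod_j g(D_{ij} \mid T_i, \hat{p}^{o}_{j}, \hat{q}^{o}_{j})\right] g(T_i)\, g(T_i \mid T_{\partial i})}{\sum_{T'} \prod_i \left[\prod_j g(D_{ij} \mid T'_i, \hat{p}^{o}_{j}, \hat{q}^{o}_{j})\right] g(T'_i)\, g(T'_i \mid T'_{\partial i})}$$

where $i$ indexes over voxels and $j$ over experts. For each voxel $i$:

$$g(T_i \mid D_i, \hat{p}^{o}, \hat{q}^{o}) = \frac{\prod_j g(D_{ij} \mid T_i, \hat{p}^{o}_{j}, \hat{q}^{o}_{j})\, g(T_i)\, g(T_i \mid T_{\partial i})}{\sum_{T'_i} \prod_j g(D_{ij} \mid T'_i, \hat{p}^{o}_{j}, \hat{q}^{o}_{j})\, g(T'_i)\, g(T'_i \mid T_{\partial i})}$$

where $g(T_i \mid T_{\partial i})$ is the prior probability of $T_i$ given the true segmentation of the neighbors of voxel $i$.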

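$$g(T_i \mid D_i, \hat{p}^{o}, \hat{q}^{o}) \propto \prod_j g(D_{ij} \mid T_i, \hat{p}^{o}_{j}, \hat{q}^{o}_{j})\, g(T_i)\, g(T_i \mid T_{\partial i})$$

$$\log g(T \mid D, \hat{p}^{o}, \hat{q}^{o}) \propto \sum_i \log\left( \prod_j g(D_{ij} \mid T_i, \hat{p}^{o}_{j}, \hat{q}^{o}_{j})\, g(T_i) \right) + \sum_k \sum_l \beta_{kl} \left( T_k T_l + (1 - T_k)(1 - T_l) \right)$$

where $\beta_{kl} > 0$ iff voxels $k$, $l$ are neighbors.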

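MAP estimation with MRF prior

Greig et al. 1989: solve for $T_i \; \forall i$ with Ford-Fulkerson, maximizing

$$\sum_i \lambda_i T_i + \frac{1}{2} \sum_k \sum_l \beta_{kl} \left( T_k T_l + (1 - T_k)(1 - T_l) \right)$$

where

$$\lambda_i = \log \frac{g(D_i \mid T_i = 1, \hat{p}, \hat{q})\, g(T_i = 1)}{g(D_i \mid T_i = 0, \hat{p}, \hat{q})\, g(T_i = 0)}$$

$$= \log \frac{\prod_{j : D_{ij} = 1} \hat{p}_j \prod_{j : D_{ij} = 0} (1 - \hat{p}_j)\; g(T_i = 1)}{\prod_{j : D_{ij} = 0} \hat{q}_j \prod_{j : D_{ij} = 1} (1 - \hat{q}_j)\; g(T_i = 0)}$$

$$= \log\left( W_i / (1 - W_i) \right).$$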
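The effect of the neighborhood term can be seen on a toy problem. The sketch below is not the max-flow (Ford-Fulkerson) solver of Greig et al.: it swaps in exhaustive enumeration over all labelings, which is only feasible for a handful of voxels. It also assumes a 1-D chain of voxels with a single uniform neighbor weight `beta`. The name `map_labeling` and the example weights are illustrative.

```python
import itertools
import math


def map_labeling(weights, beta):
    """Brute-force MAP labeling on a 1-D chain; weights are W_i in (0, 1)."""
    # Data term: lambda_i = log(W_i / (1 - W_i)).
    lam = [math.log(w / (1.0 - w)) for w in weights]
    best, best_score = None, -math.inf
    for labels in itertools.product((0, 1), repeat=len(weights)):
        score = sum(l * t for l, t in zip(lam, labels))
        # T_k T_l + (1 - T_k)(1 - T_l) equals 1 when neighbors agree, else 0.
        # Counting each adjacent pair once matches (1/2) * sum over ordered pairs.
        score += beta * sum(1 for a, b in zip(labels, labels[1:]) if a == b)
        if score > best_score:
            best, best_score = labels, score
    return list(best)


# Toy example (hypothetical): one weakly negative voxel between confident ones.
W = [0.9, 0.9, 0.4, 0.9, 0.9]
print(map_labeling(W, beta=0.0))
print(map_labeling(W, beta=1.0))
```

With `beta=0.0` each voxel is labeled independently, so the middle voxel (W = 0.4) is labeled 0. With `beta=1.0` the gain from agreeing with both neighbors outweighs its small negative data term, and the whole chain is labeled 1.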

Synthetic Experts

Only three segmentations by different quality experts. With MRF prior.

Figure panels: p=0.95, q=0.95; p=0.95, q=0.90; p=0.90, q=0.90; STAPLE ground truth.

p1, q1   0.9505, 0.9494
p2, q2   0.9511, 0.8987
p3, q3   0.9000, 0.8987

Synthetic Experts

10 observations of segmentation by expert with p=q=0.99

mean p       0.990237
std. dev p   0.000616
mean q       0.990121
std. dev q   0.00071

Figure: Four segmentations of ten shown. STAPLE ground truth.
