Some Algorithms for Detecting Changes and Trends in Data... With Applications to Antimicrobial Resistance

Chris Jermaine, Computer Science Department, Rice University, cmj4@cs.rice.edu

Post on 06-Feb-2018

TRANSCRIPT

Page 1: Chris Jermaine Computer Science Department Rice University ...cmj4.web.rice.edu/ForLydia.pdf · 1 Some Algorithms for Detecting Changes and Trends in Data... With Applications to

Some Algorithms for Detecting Changes and Trends in Data...
With Applications to Antimicrobial Resistance

Chris Jermaine
Computer Science Department
Rice University
cmj4@cs.rice.edu

Page 2:

Drug Resistant “Superbugs”

• Nosocomial “superbugs” are a major worry
  - Hosp. res. bugs that have developed antimicrobial resistance
• Are many headline-grabbing examples...
  - MRSA (Methicillin-resistant Staphylococcus aureus); staph resistant to penicillins and cephalosporins

Page 4:

Drug Resistant “Superbugs”

• Nosocomial “superbugs” are a major worry
  - Bacteria that have developed antimicrobial resistance
• Are many headline-grabbing examples...
  - MRSA (Methicillin-resistant Staphylococcus aureus)
  - Resistant Pseudomonas aeruginosa
  - NDM-1 (New Delhi metallo-beta-lactamase); enzyme, not a bug; first found in Klebsiella pneumoniae in India

Page 5:

Drug Resistant “Superbugs”

• Nosocomial “superbugs” are a major worry
  - Bacteria that have developed antimicrobial resistance
• Are many headline-grabbing examples...
  - MRSA (Methicillin-resistant Staphylococcus aureus)
  - Resistant Pseudomonas aeruginosa
  - NDM-1 (New Delhi metallo-beta-lactamase)
  - KPC (Klebsiella pneumoniae carbapenemase); resistance to nearly every drug, requires nasty treatments (polymyxins)

[Figure: States where KPC has been found]

Page 6:

Drug Resistant “Superbugs”

• Nosocomial “superbugs” are a major worry
  - Bacteria that have developed antimicrobial resistance
• Are many headline-grabbing examples...
  - MRSA (Methicillin-resistant Staphylococcus aureus)
  - Resistant Pseudomonas aeruginosa
  - NDM-1 (New Delhi metallo-beta-lactamase)
  - KPC (Klebsiella pneumoniae carbapenemase)
• What happens when 98% of bugs have NDM-1, KPC, ...?
  - Not too far fetched
  - Will it take us back to 1928?

Page 8:

How Do Superbugs Appear?

• Related to mis-/overuse of antimicrobials
• Selective pressure causes bugs to “learn to” cope with toxins
• Most often associated with health-care facilities
  - Microbes/antimicrobials/immune-compromised people/hyper-active doctors all together in small space
  - Often these bugs are not suited to the “real world”; let’s hope it stays that way
• Are CA versions, but even these tend to be centered around facilities not unlike healthcare (ex: locker rooms!)

Page 10:

Tragedy of the Commons

• Everyone in the hospital knows it’s the docs’ fault
• Even the docs
• But that doesn’t mean we can actually blame them...
• No one was ever sued because they crush a tiny bug with the biggest hammer available
  - Or because they (try to) crush a virus with an antibiotic
• And there’s always a small chance that if the doc takes the time to do things right, disaster can result
• Hence the problem gets worse and worse...

Page 12:

How Do Hospitals Cope?

• Typically have an epidemiologist
• Usually an MD trained in infectious diseases
  - May also have a degree in public health, statistics, etc.
• Spends some fraction of time looking at data, connecting dots
• One task is monitoring emerging antimicrobial resistance
• My impression
  - They all agree (bad docs) = (antimicrobial resistance)
  - But they can’t tell docs what to do
  - They see their job as discovering/providing data to docs
  - Hopefully, get docs to think twice before they act

Page 14:

What’s Our Goal?

• Want to give that epidemiologist a set of tools to help out
• Goal is to help him/her “mine” the data
  - Tools to automatically discover important trends
  - Find things not obvious through “traditional methods”... tabulating the data, running regressions, etc.
• DISCLAIMER
  - I’m a computer scientist, not a clinician
  - So my focus from here is mostly on methods, not results!

Page 16:

1st Prob: Spatial Anomaly Detection

• Imagine you are an HE (hospital epidemiologist)
• Have resistance data for your hospital, hundreds of others
  - For each bug/drug pair, for each of last 10 years, you have:
    (a) Number of patients who had bugs cultured
    (b) Number of cultures that came back “susceptible”
• Resistance goes up over time. But you wonder:
  - Are things getting worse faster at my hospital compared to the surrounding hospitals?
  - Are things getting worse faster in my immediate area compared to the region?

Combinatorial problem!
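
The two counts per bug/drug pair and year can be turned into a yearly resistance rate. The following is a minimal sketch, not taken from the slides: the helper name `resistance_rate` and the counts are made up, and every culture that did not come back susceptible is treated as resistant.

```python
def resistance_rate(cultured, susceptible):
    """Fraction of cultures that did NOT come back susceptible."""
    if cultured <= 0:
        raise ValueError("need at least one culture")
    return 1.0 - susceptible / cultured

# Hypothetical counts for one bug/drug pair: (year, cultured, susceptible)
counts = [(2008, 200, 180), (2009, 220, 187), (2010, 250, 200)]
rates = {year: resistance_rate(c, s) for year, c, s in counts}
```

Comparing such yearly rates across hospitals and candidate regions is what makes the search combinatorial.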

Page 20:

Spatial Anomaly Detection

• Long-studied problem in statistics, epidemiology, data mining
• Usual context is finding spatial “hot spots” (disease, sales, etc.)
• Classic example: 1854 Broad Street cholera outbreak

Page 21:

Spatial Anomaly Detection

• Obviously, there is a lot more math these days in SAD
• And there’s a lot more data, too - plots by hand are not practical!
• But the basic idea is unchanged...

Page 22:

Spatial Anomaly Detection

• Underlying statistical model is typically quite simple

Have some spatial region R

[Figure: the region R]

Page 23:

Spatial Anomaly Detection

• Underlying statistical model is typically quite simple

Divided into subregions that are small enough that inside-region variation is not interesting

[Figure: the region R divided into subregions]

Page 24:

Spatial Anomaly Detection

• Underlying statistical model is typically quite simple

Assume data in each cell are generated using simple model (ex: Poisson) with known params

BTW: Poisson widely used here because it gives you (under certain assumptions) the probability that with a very large population, you’d see n events in a time interval (for example, sick people), given that historically there were lambda events expected

[Figure: the region R divided into cells, each labeled with its known Poisson parameter; the λ values shown are 15, 11, 19, 12, 22, 9, 5, 3]
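
To make the Poisson remark concrete, here is a small sketch of how surprising a count is when the expected count λ is known. The function names and the numbers (λ = 5, 12 observed events) are illustrative assumptions, not values from the slides.

```python
import math

def poisson_pmf(n, lam):
    """P(N = n) for a Poisson count with expected value lam."""
    return math.exp(-lam) * lam ** n / math.factorial(n)

def poisson_upper_tail(n, lam):
    """P(N >= n): how surprising is a count of n or more?"""
    return 1.0 - sum(poisson_pmf(k, lam) for k in range(n))

# A cell with expected count 5 in which 12 events were observed (made-up numbers)
p = poisson_upper_tail(12, 5.0)
```

A small tail probability flags a cell whose observed count is hard to explain by its historical rate alone.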

Page 25:

Spatial Anomaly Detection

• Underlying statistical model is typically quite simple

Then when you observe the actual data...

[Figure: the grid over R, each cell showing its expected rate λ together with its observed count n]

Page 26:

Spatial Anomaly Detection

• Underlying statistical model is typically quite simple

...you try to find a contiguous, reasonably-shaped region where the observed data are extremely unlikely

[Figure: the grid over R, each cell showing its expected rate λ together with its observed count n]

Page 27:

Spatial Anomaly Detection

• Most related work in the statistical literature begins with the so-called “spatial scan statistic” (Poisson model)...
  - Idea is to search all possible contiguous regions of a certain shape (usually a circle, square, rectangle)
  - Find top k that reject a Poisson-based LRT (likelihood ratio test)
  - Maybe use simulation to deal with MHT (multiple hypothesis testing)
  - Return those regions that survive to the user
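
The search described above can be sketched as a brute-force scan over axis-aligned rectangles. The slides do not give the scoring formula, so this sketch assumes Poisson counts with known expected values and scores a region by the log-likelihood ratio of "rate inflated by a factor q > 1" against "rate as expected"; the grid values and function names are made up, and no multiple-testing correction is attempted.

```python
import heapq
import math

def poisson_llr(count, expected):
    """Log-likelihood ratio for an inflated rate (MLE q = count / expected) vs. q = 1."""
    if expected <= 0 or count <= expected:
        return 0.0
    return count * math.log(count / expected) - (count - expected)

def top_k_rectangles(counts, expected, k):
    """Score every axis-aligned rectangle (r0, c0, r1, c1) and return the k best."""
    rows, cols = len(counts), len(counts[0])
    scored = []
    for r0 in range(rows):
        for r1 in range(r0, rows):
            for c0 in range(cols):
                for c1 in range(c0, cols):
                    cells = [(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]
                    n = sum(counts[r][c] for r, c in cells)
                    lam = sum(expected[r][c] for r, c in cells)
                    scored.append((poisson_llr(n, lam), (r0, c0, r1, c1)))
    return heapq.nlargest(k, scored)

# Made-up 3 x 3 grid with an elevated block in the lower right
expected = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
counts = [[9, 11, 10], [10, 25, 24], [8, 26, 23]]
best = top_k_rectangles(counts, expected, k=3)
```

In this toy grid the top-scoring rectangle is the 2 x 2 block of elevated cells; a real implementation would also have to assess significance, for example by simulation.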

Page 29:

More Complicated Models

• But what if your problem is more complicated?
• Our motivating problem:
  - Anomalies in nosocomial antimicrobial resistance trends
  - We know trend is generally upward
  - But is the upward trend uniform, or is there spatial variation?
• A reasonable model for a hospital’s resistance trend:

[Figure: plot of resistance rate (0 to 1) against time, starting at $p_0$]

At start of time, resistance probability for a bug in an arbitrary patient is $p_0$

Page 30:

More Complicated Models

• But what if your problem is more complicated?
• Our motivating example:
  - Anomalies in nosocomial antimicrobial resistance trends
  - Due to (mis-)use of antimicrobials, bugs develop resistance
  - But is the upward trend uniform, or is there spatial variation?
• A reasonable model for a hospital’s resistance trend:

[Figure: plot of resistance rate (0 to 1) against time, with $\Delta$ marked as the change over 1 year]

Each year, there is a change in resistance rate $\Delta$

Page 32:

More Complicated Models

• But what if your problem is more complicated?
• Our motivating example:
  - Anomalies in nosocomial antimicrobial resistance trends
  - Due to (mis-)use of antimicrobials, bugs develop resistance
  - But is the upward trend uniform, or is there spatial variation?
• A reasonable model for a hospital’s resistance trend:

[Figure: plot of resistance rate (0 to 1) against time]

An infected patient at time t has a resistant bug if a Bernoulli trial with prob $p_t$ is true, where

$p_t = p_0 + (t - t_0)\Delta$

*Could imagine even more realistic models: allow for correlation across trials (patients), go logistic rather than linear... the point is that a simple Poisson not enough!
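
The linear-trend model can be written down as a likelihood directly. This is a sketch under stated assumptions: patients are independent Bernoulli trials, data arrive as yearly `(t, cultured, resistant)` rows, and the function names and numbers are illustrative rather than the author's code.

```python
import math

def p_at(t, p0, delta, t0=0):
    """Resistance probability at time t under p_t = p0 + (t - t0) * delta."""
    return p0 + (t - t0) * delta

def log_likelihood(p0, delta, data, t0=0):
    """Binomial log-likelihood (up to a constant) of (t, cultured, resistant) rows."""
    total = 0.0
    for t, cultured, resistant in data:
        p = p_at(t, p0, delta, t0)
        if not 0.0 < p < 1.0:
            return float("-inf")  # parameters outside the valid range
        total += resistant * math.log(p) + (cultured - resistant) * math.log(1.0 - p)
    return total

# Made-up data: resistance climbing from 10% to 20% over three years
data = [(0, 100, 10), (1, 100, 15), (2, 100, 20)]
```

With these numbers, parameters that follow the upward trend (`p0 = 0.10`, `delta = 0.05`) score higher than a flat line at 10%.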

Page 33:

Finding a Region of Unique Trends

• Given this, we have many hospitals in a large spatial area

Page 34:

Finding a Region of Unique Trends

• Each has its own set of data associated with it

Page 35:

Finding a Region of Unique Trends

• Data are used to learn a local value for Δ
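
The slides do not say how the local Δ is learned. As a stand-in, the sketch below fits an ordinary least-squares line to yearly resistance rates and reads Δ off as the slope; the function name and the rates are hypothetical, and a likelihood-based fit would be closer to the model on the earlier slides.

```python
def fit_trend(years, rates):
    """Least-squares line through (year, rate) points; returns (p0, delta)."""
    n = len(years)
    mean_t = sum(years) / n
    mean_r = sum(rates) / n
    sxx = sum((t - mean_t) ** 2 for t in years)
    sxy = sum((t - mean_t) * (r - mean_r) for t, r in zip(years, rates))
    delta = sxy / sxx
    p0 = mean_r - delta * (mean_t - years[0])  # fitted rate at the first year
    return p0, delta

# Made-up yearly resistance rates for one hospital
years = [2001, 2002, 2003, 2004]
rates = [0.10, 0.12, 0.14, 0.16]
p0, delta = fit_trend(years, rates)
```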

Page 36:

Finding a Region of Unique Trends

• Then we find a local region with an abnormal Δ
  - This result indicates that yes, things are worse locally than they are in the wider area

Page 37:

Two Key Problems to Address

Gives rise to two key questions:

1. How does one build a generic software that allows spatial anomaly search with virtually any user-specified statistical model?
  - That way, can easily give a whole suite of models to HE
  - Is there a principled way to define the general detection problem?
  - How can a user specify/code his/her specific model?
2. How does one ensure that the search is reasonably fast?
  - We are computer scientists, after all

Page 38:

Building a General Purpose Software - Problem Definition

• Actually quite easy to come up with an appropriate problem definition, based on a generic LRT
• LRT:
  - Given likelihood func. $L(\theta \mid X)$

Think of L as defining your model...
Takes in a set of parameters (e.g. slope, intercept)...
Returns the likelihood they produced the data set X...
Params match data closely? Then L returns large val

Page 39:

Building a General Purpose Software - Problem Definition

• Actually quite easy to come up with an appropriate problem definition, based on a generic LRT
• LRT:
  - Given likelihood func. $L(\theta \mid X)$
  - Let $\Theta$ be the full parameter space, $\Theta_0$ the “null” or uninteresting part of the parameter space (e.g., all $\Delta$s are identical)

That is, theta is chosen from the (infinite) set $\Theta$
Want to see if theta is more likely in or out of $\Theta_0$
If in, then nothing to see here!

Page 41:

Building a General Purpose Software - Problem Definition

• Actually quite easy to come up with an appropriate problem definition, based on a generic LRT
• LRT:
  - Given likelihood func. $L(\theta \mid X)$
  - Let $\Theta$ be the full parameter space, $\Theta_0$ the “null” or uninteresting part of the parameter space (e.g., all $\Delta$s are identical)
  - Want to compare $H_0: \theta \in \Theta_0$ vs. $H_a: \theta \in \Theta - \Theta_0$
  - Can use $\Lambda(X) = -2 \log \frac{\sup_{\theta \in \Theta_0} L(\theta \mid X)}{\sup_{\theta \in \Theta} L(\theta \mid X)}$
  - Classic result; under $H_0$, $\Lambda(X)$ is chi-squared
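
As an illustration of the statistic, the sketch below computes Λ(X) for the example null "all Δs are identical" with the linear binomial trend model. It is only a sketch under assumptions: each hospital keeps its own p0, the two suprema are approximated by a crude grid search (a stand-in for a real optimizer), and all names and data are made up.

```python
import math

def loglik(p0, delta, data):
    """Binomial log-likelihood of (t, cultured, resistant) rows under p_t = p0 + t * delta."""
    total = 0.0
    for t, n, r in data:
        p = p0 + t * delta
        if not 0.0 < p < 1.0:
            return float("-inf")
        total += r * math.log(p) + (n - r) * math.log(1.0 - p)
    return total

P0_GRID = [i / 100 for i in range(1, 100)]       # 0.01 .. 0.99
DELTA_GRID = [i / 200 for i in range(-10, 11)]   # -0.05 .. 0.05

def best_over_p0(delta, data):
    """Best log-likelihood for one hospital at a fixed delta."""
    return max(loglik(p0, delta, data) for p0 in P0_GRID)

def lrt_statistic(hospitals):
    """Lambda(X) = -2 log(sup over Theta_0 / sup over Theta), via grid search."""
    # Full space: every hospital gets its own delta.
    full = sum(max(best_over_p0(d, h) for d in DELTA_GRID) for h in hospitals)
    # Null space: one delta shared by all hospitals.
    null = max(sum(best_over_p0(d, h) for h in hospitals) for d in DELTA_GRID)
    return 2.0 * (full - null)

same_trend = [[(0, 200, 20), (1, 200, 30), (2, 200, 40)],
              [(0, 200, 40), (1, 200, 50), (2, 200, 60)]]
different = [[(0, 200, 20), (1, 200, 30), (2, 200, 40)],
             [(0, 200, 40), (1, 200, 40), (2, 200, 40)]]
```

When both hospitals share the same slope the statistic is essentially zero; when one is flat and the other rising it becomes positive, and it would then be compared against a chi-squared reference or a simulated null distribution.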

Page 43:

Building a General Purpose Software - Problem Definition

• Then, lay out data in a grid:

[Figure: data laid out in a grid; labeled k = 3]

• Out of all regions containing hospital, find the k that have the greatest $\Lambda(X)$ values
• Can use chi-squared dist to determine significance, or else do simulation
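
The "regions containing hospital" step can be sketched as an enumeration of rectangles that cover one grid cell, followed by a top-k selection. This is illustrative only: the `score` callable stands in for Λ(X), and the toy per-cell values are made up.

```python
import heapq

def rectangles_containing(rows, cols, cell):
    """Yield every axis-aligned rectangle (r0, c0, r1, c1) of the grid that covers `cell`."""
    r, c = cell
    for r0 in range(r + 1):
        for r1 in range(r, rows):
            for c0 in range(c + 1):
                for c1 in range(c, cols):
                    yield (r0, c0, r1, c1)

def top_k_regions(rows, cols, cell, score, k=3):
    """Return the k rectangles covering `cell` with the largest score."""
    return heapq.nlargest(k, rectangles_containing(rows, cols, cell), key=score)

# Toy per-cell values; the mean over a rectangle is a placeholder for Lambda(X)
toy = [[0, 1, 0], [1, 5, 4], [0, 4, 3]]

def mean_score(rect):
    r0, c0, r1, c1 = rect
    cells = [toy[r][c] for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]
    return sum(cells) / len(cells)

top = top_k_regions(3, 3, (1, 1), mean_score, k=3)
```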

Page 46:

“Templatizing” the Software

• That’s fine, but we want to build a software that makes it easy to implement such a test for any particular likelihood function
• So that’s what we did!

Page 47:

“Templatizing” the Software

• To apply our software, user need only develop an appropriate generative statistical model (binomial with time-dependent p in our example), load up the data, choose a few options, and press <return>
• We can provide a library of common models

Page 48:

Key Implementation Challenge - Speed

• Depending on exact problem, are $\sim n^4$ rectangles to compute $\Lambda(X_A)$ for
• What’s a realistic but relatively large value of n? Maybe 100?
• Granted, when I’ve got Amazon’s cloud, $100^4$ is not that big
• But naive implementation invokes two MLEs for each region
• MLEs often need non-linear optimization; at ten seconds each, that’s still 32 years of compute time - big even by cloud standards
• Our software uses pre-computation plus a set of tricks to upper-bound $\Lambda(X)$ without ever computing an MLE
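
The 32-year figure can be checked with a few lines of arithmetic. The sketch assumes ten seconds of optimization per region, which is the reading of "ten seconds each" that reproduces the slide's number; the exact rectangle count is included only for comparison with the rough $n^4$.

```python
n = 100
regions = n ** 4                      # the slide's rough count of rectangles
exact = (n * (n + 1) // 2) ** 2       # exact number of axis-aligned rectangles in an n x n grid
seconds_per_region = 10               # assumption: ten seconds of optimization per region
seconds_per_year = 365.25 * 24 * 3600
years = regions * seconds_per_region / seconds_per_year
```

With these assumptions `years` comes out a little under 32, in line with the slide.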

Page 49:

How Well Does it Work?

• We have applied this to many problems
• In general, speedup is in the range of 5 times to 100 times
• On the antimicrobial resistance example model (300+ hospitals):

Grid size   Our time    Pruning rate   Naive time   Speedup
16 x 16     0.15 days   96.5%          2.6 days     17.3
32 x 32     1.1 days    97.6%          36 days      31.8
64 x 64     11.9 days   98.0%          544 days     45.7

Page 50: Chris Jermaine Computer Science Department Rice University ...cmj4.web.rice.edu/ForLydia.pdf · 1 Some Algorithms for Detecting Changes and Trends in Data... With Applications to

50

2nd Prob: Describing Resistance Trends

• Want to allow HE to understand the trend in his/her hospital
• Key data: microbiology lab report

patient: Chris Jermaine
bug: E. coli
date: August 22, 2010

Drug 1        Drug 2        Drug 3      Drug 4      Drug 5
Susceptible   Susceptible   Resistant   Resistant   Undetermined

• Usual report: table w. bug vs. drug, give percentage in each cell
- Does not show trend over time (OK, can fix that)

Page 51: Chris Jermaine Computer Science Department Rice University ...cmj4.web.rice.edu/ForLydia.pdf · 1 Some Algorithms for Detecting Changes and Trends in Data... With Applications to

51

2nd Prob: Describing Resistance Trends

• Bigger issue: does not accurately describe correlations...
- Simple table cannot describe the difference between this

Drug 1        Drug 2        Drug 3        Drug 4        Drug 5
Susceptible   Susceptible   Resistant     Resistant     Susceptible
Resistant     Resistant     Susceptible   Susceptible   Resistant

- And this

Drug 1        Drug 2        Drug 3        Drug 4        Drug 5
Susceptible   Susceptible   Susceptible   Susceptible   Susceptible
Resistant     Resistant     Resistant     Resistant     Resistant

- Both have 50% resistance for everything!
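The point about correlations can be made concrete with a small sketch. It encodes the two pairs of lab results from the slide as strings (R = resistant, S = susceptible), then compares the per-drug percentages with a simple co-resistance count. The helper names are hypothetical and the drug indices are zero-based.

```python
# Two sets of isolates from the slide, each tested against five drugs.
population_a = ["SSRRS", "RRSSR"]   # complementary patterns
population_b = ["SSSSS", "RRRRR"]   # all-or-nothing patterns

def resistance_rates(isolates):
    """Per-drug fraction of resistant isolates (what the usual table reports)."""
    n_drugs = len(isolates[0])
    return [sum(iso[d] == "R" for iso in isolates) / len(isolates)
            for d in range(n_drugs)]

def co_resistance(isolates, d1, d2):
    """Fraction of isolates resistant to both drug d1 and drug d2."""
    both = sum(iso[d1] == "R" and iso[d2] == "R" for iso in isolates)
    return both / len(isolates)

# Identical per-drug tables: 50% resistance for every drug in both cases.
print(resistance_rates(population_a))
print(resistance_rates(population_b))
# Joint behaviour differs: Drug 1 and Drug 3 are never co-resistant in the
# first set, but are co-resistant in half of the second set.
print(co_resistance(population_a, 0, 2), co_resistance(population_b, 0, 2))
```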

Page 52: Chris Jermaine Computer Science Department Rice University ...cmj4.web.rice.edu/ForLydia.pdf · 1 Some Algorithms for Detecting Changes and Trends in Data... With Applications to

52

Mixture Models

• Instead of tabular rep, use a “MM” (clustered representation)
• MM are ubiquitous in stats, data mining, machine learning...
• Statistical model composed of multiple (simple) components
– Classic example: Gaussian Mixture Model
• MM imagines the following generative process for each data point:
(1) Roll a biased die to select component (die determines so-called “mixing prop.”)
(2) Use selected component to generate point
• Popular for several reasons:
– Simple distributions as building block means learned model is easy to understand
– With enough components, can model non-parametric data of arbitrary complexity
– Model provides a natural segmentation/grouping of data
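The two-step generative process can be sketched in a few lines. This is a minimal illustration that assumes a made-up two-component mixture over a single drug's test outcome; the mixing proportions and component probabilities are hypothetical and not taken from the talk.

```python
import random

# Hypothetical two-component mixture over one drug's test outcome.
MIXING = [0.8, 0.2]                 # mixing proportions (the biased die)
COMPONENTS = [
    {"R": 0.1, "S": 0.9},           # component 0: mostly susceptible
    {"R": 0.7, "S": 0.3},           # component 1: mostly resistant
]

def sample_point(rng: random.Random):
    """Draw one (component, outcome) pair from the mixture."""
    # (1) Roll the biased die to select a component.
    z = rng.choices(range(len(MIXING)), weights=MIXING, k=1)[0]
    # (2) Use the selected component to generate the point.
    outcomes = list(COMPONENTS[z])
    probs = [COMPONENTS[z][o] for o in outcomes]
    y = rng.choices(outcomes, weights=probs, k=1)[0]
    return z, y

rng = random.Random(0)
draws = [sample_point(rng) for _ in range(10_000)]
frac_component_0 = sum(z == 0 for z, _ in draws) / len(draws)
print(round(frac_component_0, 2))   # close to the mixing proportion 0.8
```

Learning a mixture model runs this process in reverse: given only the outcomes, recover the die and the components.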

Page 53: Chris Jermaine Computer Science Department Rice University ...cmj4.web.rice.edu/ForLydia.pdf · 1 Some Algorithms for Detecting Changes and Trends in Data... With Applications to

53

Mixture Models

• Variant on multinomial mixture model might give you:

Bug Class ID   Pr (freq)   Drug 1                   Drug 2                   Drug 3                   Drug 4                   Drug 5
1              0.8         R: 0.2 S: 0.6 U: 0.2     R: 0.1 S: 0.9 U: 0.0     R: 0.0 S: 1.0 U: 0.0     R: 0.1 S: 0.9 U: 0.0     R: 0.2 S: 0.7 U: 0.1
2              0.15        R: 0.6 S: 0.2 U: 0.2     R: 0.1 S: 0.9 U: 0.0     R: 0.0 S: 1.0 U: 0.0     R: 0.3 S: 0.6 U: 0.1     R: 0.7 S: 0.2 U: 0.1
3              0.05        R: 1.0 S: 0.0 U: 0.0     R: 1.0 S: 0.0 U: 0.0     R: 0.7 S: 0.3 U: 0.0     R: 0.5 S: 0.4 U: 0.1     R: 0.8 S: 0.1 U: 0.1

Page 54: Chris Jermaine Computer Science Department Rice University ...cmj4.web.rice.edu/ForLydia.pdf · 1 Some Algorithms for Detecting Changes and Trends in Data... With Applications to

54

MM + Evolving Mixing Proportions

• What if we could give the epi. something like this?

Bug Class ID   Drug 1                   Drug 2                   Drug 3                   Drug 4                   Drug 5
1              R: 0.2 S: 0.6 U: 0.2     R: 0.1 S: 0.9 U: 0.0     R: 0.0 S: 1.0 U: 0.0     R: 0.1 S: 0.9 U: 0.0     R: 0.2 S: 0.7 U: 0.1
2              R: 0.6 S: 0.2 U: 0.2     R: 0.1 S: 0.9 U: 0.0     R: 0.0 S: 1.0 U: 0.0     R: 0.3 S: 0.6 U: 0.1     R: 0.7 S: 0.2 U: 0.1
3              R: 1.0 S: 0.0 U: 0.0     R: 0.1 S: 0.9 U: 0.0     R: 0.0 S: 1.0 U: 0.0     R: 0.5 S: 0.4 U: 0.1     R: 0.8 S: 0.1 U: 0.1

[Figure: frequencies of class 1, class 2 and class 3 over time; vertical axis “Freq.” from Pr = 0 to Pr = 1, horizontal axis “Time” from Year = ‘00 to Year = ‘08]

• Shows a clear decrease in “good” subpopulation over time
– Also shows significant increase in worrisome “class 3”
• Far more useful than the static picture via classical MM

Page 55: Chris Jermaine Computer Science Department Rice University ...cmj4.web.rice.edu/ForLydia.pdf · 1 Some Algorithms for Detecting Changes and Trends in Data... With Applications to

55

MM + Evolving Mixing Proportions

• To obtain such a model, imagine a new generative process for each data point:
(1) Check the “clock” to determine time t when the data point is to be generated
(2) Compute $\pi(t) = b + \frac{t - t_b}{t_e - t_b} \times (e - b)$ to obtain mixing proportions at time t
(3) Roll an appropriately biased die to select component
(4) Use selected component to generate point
• PDF is $f(y \mid \Theta, t) = \sum_{i=1}^{k} \pi_i(t) \, f_i(y \mid \theta_i)$
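The interpolation step and the resulting density can be sketched as follows. The sketch assumes that b and e are the vectors of mixing proportions at the begin time t_b and end time t_e; the three classes, their probabilities, and the years used are hypothetical values chosen for illustration.

```python
def mixing_at(t, t_b, t_e, b, e):
    """Linearly interpolate mixing proportions from b (at t_b) to e (at t_e)."""
    w = (t - t_b) / (t_e - t_b)
    return [b_i + w * (e_i - b_i) for b_i, e_i in zip(b, e)]

def mixture_pdf(y, t, t_b, t_e, b, e, components):
    """f(y | Theta, t) = sum_i pi_i(t) * f_i(y | theta_i)."""
    pi_t = mixing_at(t, t_b, t_e, b, e)
    return sum(p * f_i(y) for p, f_i in zip(pi_t, components))

# Hypothetical example: three classes, one drug, outcome y in {"R", "S"}.
b = [0.80, 0.15, 0.05]      # mixing proportions at t_b
e = [0.50, 0.20, 0.30]      # mixing proportions at t_e
components = [
    lambda y: {"R": 0.2, "S": 0.8}[y],
    lambda y: {"R": 0.6, "S": 0.4}[y],
    lambda y: {"R": 1.0, "S": 0.0}[y],
]

pi_mid = mixing_at(2004, 2000, 2008, b, e)
p_resistant = mixture_pdf("R", 2004, 2000, 2008, b, e, components)
print(pi_mid, p_resistant)
```

Because b and e each sum to one, every interpolated vector between them also sums to one, so no renormalization step is needed.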

Page 56: Chris Jermaine Computer Science Department Rice University ...cmj4.web.rice.edu/ForLydia.pdf · 1 Some Algorithms for Detecting Changes and Trends in Data... With Applications to

56

Why Linear Evolution?

• Two main reasons
• First, people (users) love linear models
– Easy to understand, visualize, especially for non-math people
– Let’s face it: linear regression is by FAR the most popular “data mining algorithm”!
• Second, no need to normalize weights under linear model
– Know that $\sum_{i=1}^{k} \pi_i(t)$ is always one for any t
– But use a different function (e.g., allowing for a “peak”) and you may need to norm
– Can distort original function and provide for very strange results

Page 57: Chris Jermaine Computer Science Department Rice University ...cmj4.web.rice.edu/ForLydia.pdf · 1 Some Algorithms for Detecting Changes and Trends in Data... With Applications to

57

No Reason To Evolve Only Mixing

• In fact, ANY linearly interpolable param can be handled in this way
– Means, covariances, hyper-parameters, etc.
• “Linearly interpolable” means that if $\theta_b$ and $\theta_e$ are valid parameter values, then $\theta = \theta_b + \frac{t - t_b}{t_e - t_b} \times (\theta_e - \theta_b)$ is always valid
• Example: linear combinations of two covariance matrices (must be positive semi-definite) with nonnegative weights, as in the interpolation above, are also positive semi-definite
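The covariance example can be checked numerically. The sketch below interpolates between two hypothetical 2x2 covariance matrices and tests each intermediate matrix for positive semi-definiteness. The matrices and helper names are assumptions made for illustration, and the 2x2 test (nonnegative diagonal and determinant) is used only to keep the example free of external libraries.

```python
def interpolate(theta_b, theta_e, t, t_b, t_e):
    """theta = theta_b + (t - t_b) / (t_e - t_b) * (theta_e - theta_b), elementwise."""
    w = (t - t_b) / (t_e - t_b)
    return [[xb + w * (xe - xb) for xb, xe in zip(row_b, row_e)]
            for row_b, row_e in zip(theta_b, theta_e)]

def is_psd_2x2(m, tol=1e-12):
    """A symmetric 2x2 matrix is PSD iff its diagonal and determinant are >= 0."""
    (a, b), (c, d) = m
    return abs(b - c) <= tol and a >= -tol and d >= -tol and a * d - b * c >= -tol

# Hypothetical begin/end covariance matrices (both positive semi-definite).
cov_b = [[2.0, 0.5], [0.5, 1.0]]
cov_e = [[1.0, -0.8], [-0.8, 3.0]]

# Every matrix on the path from t_b = 0 to t_e = 8 stays PSD.
steps = [interpolate(cov_b, cov_e, t, 0.0, 8.0) for t in range(9)]
print(all(is_psd_2x2(m) for m in steps))

# Extrapolating far beyond t_e is not guaranteed to stay valid.
print(is_psd_2x2(interpolate(cov_b, cov_e, 40.0, 0.0, 8.0)))
```

The guarantee holds for t between t_b and t_e, where the interpolation weights are nonnegative; outside that interval the result can leave the valid parameter set.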

Page 58: Chris Jermaine Computer Science Department Rice University ...cmj4.web.rice.edu/ForLydia.pdf · 1 Some Algorithms for Detecting Changes and Trends in Data... With Applications to

58

Can Even Allow Multiple Cuts...

allows more detail in trend information (at cost of extra params)

[Figure: two plots of the frequencies of class 1, class 2 and class 3 over time; in each, vertical axis “Freq.” from Pr = 0 to Pr = 1, horizontal axis “Time” from Year = ‘00 to Year = ‘08]

Page 59: Chris Jermaine Computer Science Department Rice University ...cmj4.web.rice.edu/ForLydia.pdf · 1 Some Algorithms for Detecting Changes and Trends in Data... With Applications to

59

Generative Bayesian MM

• We assume the following generative process for a data point:

[Diagram: generative process with nodes $\alpha$, $\eta_b$, $\eta_e$, $b$, $e$, $\pi(t)$, $z$, $y$; distributions labeled IGR, IGR, Dirichlet, Dirichlet, Multinomial, $f_z$]

– $\alpha$ parameterizes the Inverse Gamma and is user-supplied
– $\eta$ parameterizes a Dirichlet, which produces the mixing params at the beginning and end of time
– $\Theta$, which then includes $\eta_b, \eta_e, b, e$ as well as the mixture component params, is then a random variable

Page 60: Chris Jermaine Computer Science Department Rice University ...cmj4.web.rice.edu/ForLydia.pdf · 1 Some Algorithms for Detecting Changes and Trends in Data... With Applications to

60

Generative Bayesian MM

• We assume the following generative process for a data point:

[Diagram: generative process with nodes $\alpha$, $\eta_b$, $\eta_e$, $b$, $e$, $\pi(t)$, $z$, $y$; distributions labeled IGR, IGR, Dirichlet, Dirichlet, Multinomial, $f_z$]

Callout: In multi-cut case, can add many of these, positions determined by Dirichlet

Page 61: Chris Jermaine Computer Science Department Rice University ...cmj4.web.rice.edu/ForLydia.pdf · 1 Some Algorithms for Detecting Changes and Trends in Data... With Applications to

61

Generative Bayesian MM

• We assume the following generative process for a data point:

[Diagram: generative process with nodes $\alpha$, $\eta_b$, $\eta_e$, $b$, $e$, $\pi(t)$, $z$, $y$; distributions labeled IGR, IGR, Dirichlet, Dirichlet, Multinomial, $f_z$]

Callout: Can have a similar process for any other parameter

Page 62: Chris Jermaine Computer Science Department Rice University ...cmj4.web.rice.edu/ForLydia.pdf · 1 Some Algorithms for Detecting Changes and Trends in Data... With Applications to

62

Generative Bayesian MM

• We assume the following generative process for a data point:

[Diagram: generative process with nodes $\alpha$, $\eta_b$, $\eta_e$, $b$, $e$, $\pi(t)$, $z$, $y$; distributions labeled IGR, IGR, Dirichlet, Dirichlet, Multinomial, $f_z$]

• Then $\Theta$ can be “learned” via a Gibbs sampler
– Gibbs sampler draws samples from the “posterior distribution” for $\Theta$
– That is, samples from $F(\Theta \mid X)$ where $X = \{(z_1, t_1), (z_2, t_2), \ldots\}$
– Technically (somewhat) challenging: use fact Dirichlet relies on Gamma
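The remark that the Dirichlet relies on the Gamma presumably refers to the standard construction in which a Dirichlet draw is a vector of independent Gamma draws divided by their sum. The sketch below shows only that construction, using Python's standard `random.gammavariate`. It is not the sampler used in the talk, and the parameter values are hypothetical.

```python
import random

def sample_dirichlet(alpha, rng: random.Random):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]

rng = random.Random(42)
alpha = [2.0, 1.0, 1.0]           # hypothetical Dirichlet parameters
draws = [sample_dirichlet(alpha, rng) for _ in range(20_000)]

# The empirical mean should approach alpha / sum(alpha) = [0.5, 0.25, 0.25].
mean = [sum(d[i] for d in draws) / len(draws) for i in range(len(alpha))]
print([round(m, 2) for m in mean])
```

Working with the unnormalized Gamma variables is a common way to make conditional updates in a Gibbs sampler tractable when a Dirichlet-distributed vector enters the model in a nonstandard way.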

Page 63: Chris Jermaine Computer Science Department Rice University ...cmj4.web.rice.edu/ForLydia.pdf · 1 Some Algorithms for Detecting Changes and Trends in Data... With Applications to

63

Is This Useful?

• Example app: applied the model/learner to hospital data
– Around 10K different lab results for E. coli; 27 antimicrobials
– Four years of data
– From several different hospitals
– Learned five clusters

Page 64: Chris Jermaine Computer Science Department Rice University ...cmj4.web.rice.edu/ForLydia.pdf · 1 Some Algorithms for Detecting Changes and Trends in Data... With Applications to

64

Is This Useful?

• Example app: applied the model/learner to real-life hospital data

[Figure: five panels (pattern 1 to pattern 5), each with a vertical axis from 0 to 1 and a horizontal axis numbered 1 to 27, one position per antimicrobial; plus a plot of the five clusters by year, vertical axis from 0 to 0.6, with curves labeled cluster 2, cluster 4, cluster 1, cluster 3, cluster 5. Antimicrobials listed: Amp/Sulbactam, Amox/K Clav, Imipenem, Cefepime, Ticar/K Clav, Meropenem, Amikacin, Tobramycin, Cefazolin, Cefotetan, Levofloxacin, Ceftriaxone, Ciprofloxacin, Moxifloxacin, Cefotaxime, Ceftazidime, Cefuroxime, Nitrofurantoin, Pip/Tazo, Gentamicin, Aztreonam, Cefoxitin, Ertapenem, Ampicillin, Trimeth/Sulfa, Tetracycline, Cephalothin]

Page 65: Chris Jermaine Computer Science Department Rice University ...cmj4.web.rice.edu/ForLydia.pdf · 1 Some Algorithms for Detecting Changes and Trends in Data... With Applications to

65

Is This Useful?

• Example app: applied the model/learner to real-life hospital data

[Figure: patterns 1 to 5 across the 27 antimicrobials, and the five clusters by year]

Observations: From a clinical stand-point, reason to suspect “evolution” from cluster 2 to 4 to 1 to 3 to 5. Why?

Page 66: Chris Jermaine Computer Science Department Rice University ...cmj4.web.rice.edu/ForLydia.pdf · 1 Some Algorithms for Detecting Changes and Trends in Data... With Applications to

66

Is This Useful?

• Example app: applied the model/learner to real-life hospital data

[Figure: patterns 1 to 5 across the 27 antimicrobials, and the five clusters by year]

Observations: From a clinical stand-point, reason to suspect “evolution” from cluster 2 to 4 to 1 to 3 to 5. Why?

Callout: Early generation Cephalosporins (key drugs)

Page 67: Chris Jermaine Computer Science Department Rice University ...cmj4.web.rice.edu/ForLydia.pdf · 1 Some Algorithms for Detecting Changes and Trends in Data... With Applications to

67

Is This Useful?

• Example app: applied the model/learner to real-life hospital data

[Figure: patterns 1 to 5 across the 27 antimicrobials, and the five clusters by year]

Observations: From a clinical stand-point, reason to suspect “evolution” from cluster 2 to 4 to 1 to 3 to 5. Why?

Callout: Advanced generation Cephalosporins (resistance is indicator of ESBL)

Page 68: Chris Jermaine Computer Science Department Rice University ...cmj4.web.rice.edu/ForLydia.pdf · 1 Some Algorithms for Detecting Changes and Trends in Data... With Applications to

68

Is This Useful?

• Example app: applied the model/learner to real-life hospital data

[Figure: patterns 1 to 5 across the 27 antimicrobials, and the five clusters by year]

Observations: The prevalence of cluster 2 + 4 (“benign” clusters) decreases significantly

Page 69: Chris Jermaine Computer Science Department Rice University ...cmj4.web.rice.edu/ForLydia.pdf · 1 Some Algorithms for Detecting Changes and Trends in Data... With Applications to

69

Is This Useful?

• Example app: applied the model/learner to real-life hospital data

[Figure: patterns 1 to 5 across the 27 antimicrobials, and the five clusters by year]

Observations: On the other hand, prevalence of cluster 1 (the “bridge” over to ESBL) increases over time

Page 70: Chris Jermaine Computer Science Department Rice University ...cmj4.web.rice.edu/ForLydia.pdf · 1 Some Algorithms for Detecting Changes and Trends in Data... With Applications to

70

Is This Useful?

• Example app: applied the model/learner to real-life hospital data

[Figure: patterns 1 to 5 across the 27 antimicrobials, and the five clusters by year]

Observations: Fortunately, this has not yet led to an increase of the “nasty” bugs in clusters 3 + 5, but the epidemiologist should be worried??

Page 71: Chris Jermaine Computer Science Department Rice University ...cmj4.web.rice.edu/ForLydia.pdf · 1 Some Algorithms for Detecting Changes and Trends in Data... With Applications to

71

Is This Useful?

• Example app: applied the model/learner to real-life hospital data

[Figure: patterns 1 to 5 across the 27 antimicrobials, and the five clusters by year]

Observations: Also interesting that cluster 1 is the biggest “mover”; shows greatest resistance to Fluoroquinolones (powerful, but now widely overused antimicrobials)

Page 72: Chris Jermaine Computer Science Department Rice University ...cmj4.web.rice.edu/ForLydia.pdf · 1 Some Algorithms for Detecting Changes and Trends in Data... With Applications to

72

Conclusions

• We have worked on many other problems in this domain
• Ex: how to automatically detect change of distribution?
• The opportunities for computationally-oriented people to help are wide-ranging
• In practice: the biggest problem is getting the data
• In other domains, IT people accept need for “DW” and “BA”...
• But these are still tough sells in healthcare domain
• Self-assessment: how did our 4-year grant go? C-/D+
• Maybe time will change this? We can only hope!