funnel: automatic mining of spatially coevolving epidemicsyasuko/publications/kdd14-funnel...1930...

Post on 27-Aug-2019

217 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

1930 1940 1950 1960 1970 19800

5x 104

Year

Cou

nt

1930 1940 1950 1960 1970 1980

105

Year

Cou

nt (l

og)

0.1

0.2

0.3

0.4

0.5

February (2)

August (8)

March (3)

September (9)

April (4)

October (10)

May (5)

November (11)

June (6)

December (12)

July (7) January (1)

RubellaMeasles

Mumps

GonorrheaStreptococcal sore throat

ChickenpoxSmallpox

Lymedisease

TyphoidfeverCryptosporidiosis

Rocky mountain spotted fever

Typhus fever

Influenza

1930 1950 19700

5000

10000

Year

1930 1950 1970100

105

Year

OriginalI(t)

OriginalS(t)I(t)V(t)

2000 4000 60000

1000

2000

3000

Number of time−ticks (n)

Wal

l clo

ck ti

me

(s)

10 20 30 40 50

2000

2500

3000

Number of states (l)

Wal

l clo

ck ti

me

(s)

10 20 30 40 500

100020003000

Number of diseases (d)

Wal

l clo

ck ti

me

(s)

States Diseases Time

1930 1940 1950 1960 1970 19800

2

4

6 x 104

Year

Cou

nt

1930 1940 1950 1960 1970 1980

105

Year

Cou

nt (l

og)

OriginalI(t)

OriginalS(t)I(t)V(t)

Vaccination

1930 1935 1940 1945 1950100

105

Year

Coun

t (lo

g)

proposed=8.69e+03, AR(52,26,8)=9.62e+03, 9.76e+03, 9.77e+03

FunnelAR(52)AR(26)AR(8) 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010

0100020003000

Year (per month)

Cou

nt

SircamKlez

Badtrans NetskyMytob

FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Motivation - Given: large set of epidemiological data e.g., Measles cases from 1928 to 1982 (50 states)

Goal: statistically summarize all the epidemic time-series

Conclusions – FUNNEL has following advantages: •  General & Sense-making: it captures all essential aspects (P1-P5) •  Fully-automatic: it needs no training set •  Scalable: it scales linearly with the input size

Yasuko Matsubara

Kumamoto University yasuko@cs.kumamoto-u.ac.jp

Yasushi Sakurai

Kumamoto University yasushi@cs.kumamoto-u.ac.jp

Willem G. van Panhuis University of Pittsburgh

wav10@pitt.edu

Christos Faloutsos

Carnegie Mellon University christos@cs.cmu.edu

Data description – Project Tycho •  56 contagious diseases for U.S. states •  from 1888 to the present (>125 years) 3rd order tensor (diseases x location x Time)

Data: http://www.tycho.pitt.edu/

Code: http://www.cs.kumamoto-u.ac.jp/~yasuko/software.html

Experiments - (a) Sense-making (5 properties) (b) Fitting accuracy (c) Scalability – (linear with data size)

ACM SIGKDD 2014

20th ACM SIGKDD Conference August 24-27, 2014, New York City

Vaccinationeffect

Outbreakse.g.,  1941Yearly  

periodicity

Time disease loc cases 04-01 measles PA 4740 04-01 measles NY 5310 04-01 rubella CA 1923 … … … …

dise

ases

Time

x

l

d

n

# of cases in 1931, …

Element x: # of cases (weekly) e.g., ‘measles’, ‘PA’, ‘April 1-7, 1931’, ‘4740’

Observations - Properties of real epidemic data

P1 P2 P3 P4 P5

yearly periodicity (e.g., flu peaks in the winter) disease reduction effects (e.g., vaccination) area specificity and sensitivity (e.g., correlation) external shock events (e.g., outbreaks) mistakes, incorrect values (e.g., typos)

Proposed model: FUNNEL (a) With a single epidemic: FUNNEL-RE

(b) With multi-evolving epidemics: FUNNEL-full

People of 3 classes •  S(t) : susceptible •  I (t) : Infected •  V(t) : Vigilant

S(t)

I(t)

V(t)

I(t)

e.g., measles in NY

SIRS SKIPS Funnel−R Funnel25

30

35

40Local

RMSE

Discussion - Application & Generality

(a) Forecasting (Influenza) (b) Epidemics on computer networks N

d

l

P1 P2 P3 P4 P5

��

��

���

�(t)

�(t)

�(t)

P1 P2

P3

Measles

1930 1940 1950 1960 1970 19800

500

1000

1500

Year

Cou

nt

Er= 4.26e+01 (1.2e+04, 0.59, 0.48, 0.11) , P=(0.2, 24), (2081, 0.02), k=16

1930 1940 1950 1960 1970 1980100

105

Year

Cou

nt

OriginalI(t)

OriginalS(t)I(t)V(t)

Mistake

1930 1935 1940 19450

500

1000

1500

2000

Year

Coun

t

Er= 1.07e+02 (1.9e+04, 0.58, 0.44, 0.09) , P=(0.1, 47), (1041, 0.03), k=8

1930 1935 1940 1945100

105

Year

Coun

t

OriginalI(t)

OriginalS(t)I(t)V(t)

Smallpox epidemicsin 1937−39

P4 P5

Optimization algorithm: FUNNEL-Fit Idea (1) Model description cost Q. How can we find externals and mistakes?? A. Minimize coding cost! Idea (2) Multi-layer optimization

=X

(step1) Global fitting

(step2) Local fitting X X

Global

P1 P2

B R

P4 P5

MELocal

N

P4 P5

ME

P3

P1 P2 P3 P4 P5

ME

FUNNEL F = =

Model description cost of F

Coding cost of X given F

Dimensions of X

M

E

P4

P5

Linear

Log

: strength of infection (yearly cycle) : healing rate : forgetting rate : disease reduction effect : temporal susceptible rate

β(t)δθ(t)ε(t)

Extra Local Global

Base matrix

Geo-disease matrix

External shock tensor Disease reduction

matrix

Mistake tensor

E= {E(D),E(T ),E(S )}

Disease matrix

Time matrix

State matrix

E =E(S )

E(D)E(T )

X

l

d

n

X

γ

top related