[Figure: count vs. year, 1930 to 1980, on linear and log scales]
[Figure: diseases arranged by month, January (1) to December (12). Diseases shown: Rubella, Measles, Mumps, Gonorrhea, Streptococcal sore throat, Chickenpox, Smallpox, Lyme disease, Typhoid fever, Cryptosporidiosis, Rocky mountain spotted fever, Typhus fever, Influenza]
[Figure: count vs. year, 1930 to 1970, on linear and log scales. Legends: Original, I(t); Original, S(t), I(t), V(t)]
[Figure: scalability. Wall clock time (s) vs. number of time-ticks (n), number of states (l), and number of diseases (d). Panels: Time, States, Diseases]
[Figure: count vs. year, 1930 to 1980, on linear and log scales. Legends: Original, I(t); Original, S(t), I(t), V(t). Annotation: Vaccination]
[Figure: count (log) vs. year, 1930 to 1950. Legend: Funnel, AR(52), AR(26), AR(8). Values printed on the plot: proposed=8.69e+03, AR(52,26,8)=9.62e+03, 9.76e+03, 9.77e+03]
[Figure: count vs. year (per month), 2001 to 2010. Labels: Sircam, Klez, Badtrans, Netsky, Mytob]
FUNNEL: Automatic Mining of Spatially Coevolving Epidemics
Motivation
Given: a large set of epidemiological data, e.g., measles cases from 1928 to 1982 (50 states)
Goal: statistically summarize all the epidemic time-series
Conclusions
FUNNEL has the following advantages:
• General & sense-making: it captures all essential aspects (P1-P5)
• Fully automatic: it needs no training set
• Scalable: it scales linearly with the input size
Yasuko Matsubara
Kumamoto University yasuko@cs.kumamoto-u.ac.jp
Yasushi Sakurai
Kumamoto University yasushi@cs.kumamoto-u.ac.jp
Willem G. van Panhuis
University of Pittsburgh wav10@pitt.edu
Christos Faloutsos
Carnegie Mellon University christos@cs.cmu.edu
Data description – Project Tycho
• 56 contagious diseases for U.S. states
• from 1888 to the present (>125 years)
3rd-order tensor (diseases x locations x time)
Data: http://www.tycho.pitt.edu/
Code: http://www.cs.kumamoto-u.ac.jp/~yasuko/software.html
Experiments
(a) Sense-making (5 properties)
(b) Fitting accuracy
(c) Scalability (linear with data size)
ACM SIGKDD 2014
20th ACM SIGKDD Conference August 24-27, 2014, New York City
[Figure annotations: vaccination effect; outbreaks, e.g., 1941; yearly periodicity]
Time   disease  loc  cases
04-01  measles  PA   4740
04-01  measles  NY   5310
04-01  rubella  CA   1923
…      …        …    …
[Figure: tensor X with dimensions d (diseases) x l (locations) x n (time), e.g., the number of cases in 1931, …]
Element x: number of cases (weekly), e.g., ‘measles’, ‘PA’, ‘April 1-7, 1931’, ‘4740’
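The arrangement of (time, disease, loc, cases) records into a 3rd-order tensor can be sketched as follows. This is only an illustration: the function name `build_tensor`, the nested-list layout, and the choice to store missing entries as 0 are assumptions, not part of the poster or of the Project Tycho data format.

```python
def build_tensor(records):
    """Arrange (time, disease, loc, cases) records into X[disease][loc][time]."""
    diseases = sorted({r[1] for r in records})
    locations = sorted({r[2] for r in records})
    times = sorted({r[0] for r in records})
    d_idx = {name: i for i, name in enumerate(diseases)}
    l_idx = {name: i for i, name in enumerate(locations)}
    n_idx = {name: i for i, name in enumerate(times)}
    # Assumption: combinations with no record are stored as 0.
    X = [[[0] * len(times) for _ in locations] for _ in diseases]
    for time, disease, loc, cases in records:
        X[d_idx[disease]][l_idx[loc]][n_idx[time]] += cases
    return X, diseases, locations, times


# The three sample rows from the poster's table.
records = [
    ("04-01", "measles", "PA", 4740),
    ("04-01", "measles", "NY", 5310),
    ("04-01", "rubella", "CA", 1923),
]
X, diseases, locations, times = build_tensor(records)
```

A dense nested list is enough for a sketch; a real data set with many empty cells would probably call for a sparse or array-based representation.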
Observations - Properties of real epidemic data
P1: yearly periodicity (e.g., flu peaks in the winter)
P2: disease reduction effects (e.g., vaccination)
P3: area specificity and sensitivity (e.g., correlation)
P4: external shock events (e.g., outbreaks)
P5: mistakes, incorrect values (e.g., typos)
Proposed model: FUNNEL
(a) With a single epidemic: FUNNEL-RE
(b) With multi-evolving epidemics: FUNNEL-full
People of 3 classes
• S(t): susceptible
• I(t): infected
• V(t): vigilant
[Figure: diagram of S(t), I(t), and V(t); I(t), e.g., measles in NY]
[Figure: RMSE of SIRS, SKIPS, Funnel-R, and Funnel; axis range 25 to 40]
Discussion - Application & Generality
(a) Forecasting (Influenza)
(b) Epidemics on computer networks
[Figure: Measles (P1, P2, P3). Count vs. year, 1930 to 1980, on linear and log scales. Legends: Original, I(t); Original, S(t), I(t), V(t)]
Er= 4.26e+01 (1.2e+04, 0.59, 0.48, 0.11), P=(0.2, 24), (2081, 0.02), k=16
[Figure: P4, P5. Count vs. year, 1930 to 1945, on linear and log scales. Legends: Original, I(t); Original, S(t), I(t), V(t). Annotations: smallpox epidemics in 1937-39; mistake]
Er= 1.07e+02 (1.9e+04, 0.58, 0.44, 0.09), P=(0.1, 47), (1041, 0.03), k=8
Optimization algorithm: FUNNEL-Fit
Idea (1) Model description cost
Q. How can we find externals and mistakes?
A. Minimize coding cost!
Idea (2) Multi-layer optimization
(step 1) Global fitting of X: B, R (P1, P2) and M, E (P4, P5)
(step 2) Local fitting of X: N (P3) and M, E (P4, P5)
FUNNEL F: total cost = (dimensions of X) + (model description cost of F) + (coding cost of X given F)
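The "minimize coding cost" idea can be illustrated with a toy comparison: an extra component (such as an external shock) is worth keeping only if the bits it saves when coding the data exceed the bits needed to describe it. The sketch below is an assumption-laden illustration and not the FUNNEL-Fit cost function: the Gaussian residual code, the 32-bit parameter cost `FLOAT_BITS`, and the names `coding_cost` and `total_cost` are all hypothetical.

```python
import math

FLOAT_BITS = 32  # assumed cost, in bits, of storing one real-valued parameter


def coding_cost(residuals):
    """Bits to encode residuals with a zero-mean Gaussian fitted to them."""
    n = len(residuals)
    var = max(sum(r * r for r in residuals) / n, 1e-9)
    # Negative log2-likelihood at the maximum-likelihood variance.
    return 0.5 * n * math.log2(2.0 * math.pi * math.e * var)


def total_cost(data, baseline, shocks):
    """Model cost plus coding cost; shocks maps a time index to an extra count."""
    model = [baseline[t] + shocks.get(t, 0.0) for t in range(len(data))]
    residuals = [x - m for x, m in zip(data, model)]
    # Each shock stores one value plus its position in the sequence.
    model_cost = len(shocks) * (FLOAT_BITS + math.log2(len(data)))
    return model_cost + coding_cost(residuals)


baseline = [100.0] * 20
clean = [100.0 + (3.0 if t % 2 == 0 else -3.0) for t in range(20)]
spiky = list(clean)
spiky[10] += 500.0  # one large outbreak-like spike

# A shock at the spike lowers the total cost, so it is kept.
keep_shock = total_cost(spiky, baseline, {10: 500.0}) < total_cost(spiky, baseline, {})
# A shock that explains ordinary noise raises the total cost, so it is rejected.
reject_shock = total_cost(clean, baseline, {5: -3.0}) > total_cost(clean, baseline, {})
```

Because the Gaussian term is a density-based code length, its absolute value is not meaningful on its own; only the difference between two candidate models matters here.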
Model parameters:
β(t): strength of infection (yearly cycle)
δ: healing rate
γ: forgetting rate
θ(t): disease reduction effect
ε(t): temporal susceptible rate
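To make the roles of the three classes and the rates concrete, here is a minimal simulation sketch. The update equations did not survive in this transcript, so the code assumes a standard discrete-time compartmental form (susceptible to infected at rate β(t), infected to vigilant at healing rate δ, vigilant back to susceptible at forgetting rate γ) with a cosine-shaped yearly cycle for β(t). The function name, the parameter names `amplitude`, `period`, and `shift`, and the example values are hypothetical, and the disease reduction effect θ(t) and the temporal susceptible rate ε(t) are left out.

```python
import math


def simulate_siv(n_ticks, beta0, delta, gamma,
                 amplitude=0.2, period=52, shift=0, i0=0.01):
    """Simulate population fractions (S, I, V) for n_ticks weekly steps."""
    s, i, v = 1.0 - i0, i0, 0.0
    trajectory = []
    for t in range(n_ticks):
        # Assumed yearly cycle: infection strength oscillates with a 52-week period.
        beta_t = beta0 * (1.0 + amplitude * math.cos(2.0 * math.pi * (t + shift) / period))
        newly_infected = beta_t * s * i   # S -> I
        healed = delta * i                # I -> V
        forgotten = gamma * v             # V -> S
        s, i, v = (s - newly_infected + forgotten,
                   i + newly_infected - healed,
                   v + healed - forgotten)
        trajectory.append((s, i, v))
    return trajectory


# Two simulated years with illustrative (not fitted) rates.
trajectory = simulate_siv(n_ticks=104, beta0=0.6, delta=0.5, gamma=0.1)
infected = [i for _, i, _ in trajectory]
```

The three fractions always sum to 1 in this form, since every person who leaves one class enters another.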
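FUNNEL components:
Global: base matrix B (P1), disease reduction matrix R (P2)
Local: geo-disease matrix N (P3)
Extra: external shock tensor E (P4), mistake tensor M (P5)
The external shock tensor is described by three matrices, E = {E(D), E(T), E(S)}: disease matrix E(D), time matrix E(T), and state matrix E(S).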
[Figure: tensor X with dimensions d, l, n; parameter label γ]