ryan p. adams - duke electrical and computer engineeringlcarin/sahd_adams.pdf · 2013. 8. 3. ·...
TRANSCRIPT
http://hips.seas.harvard.edu
Ryan P. AdamsSchool of Engineering and Applied Sciences
Harvard University
Joint work with Scott Linderman
DISCOVERING IMPLICIT NETWORKS FROM POINT PROCESS DATA
Saturday, August 3, 13
SOCIAL NETWORK ANALYSIS
Szell et al, Nature 2012Saturday, August 3, 13
PROTEIN-PROTEIN INTERACTIONS
http://mippi.ornl.gov/areas/bioinfo.shtmlSaturday, August 3, 13
COLLABORATIVE FILTERING
Saturday, August 3, 13
EXPLICIT VS. IMPLICIT NETWORKS
Typically, we see edges and reason about the latent properties of the vertices.
Saturday, August 3, 13
EXPLICIT VS. IMPLICIT NETWORKS
What if we don’t observe edges, but only noisy emissions from each vertex?
Saturday, August 3, 13
FUNCTIONAL CONNECTIVITY OF NEURONS
Truccolo, Hochberg & Donoghue, 2010Saturday, August 3, 13
PATTERNS OF GANG VIOLENCE
−87.9 −87.85 −87.8 −87.75 −87.7 −87.65 −87.6 −87.55 −87.541.6
41.65
41.7
41.75
41.8
41.85
41.9
41.95
42
42.05
42.1
La
titu
de
Longitude
Murder and Battery in Chicago (2001−2012)
Battery
Murder
Saturday, August 3, 13
OVERVIEW
‣Mutually-Exciting Point Processes
‣Aldous-Hoover Graph Priors
‣MCMC Inference with Data Augmentation
‣Application Examples
‣Extending for Neural Models with Inhibition
Saturday, August 3, 13
MULTIVARIATE POINT PROCESSES
‣ The point process is a foundational statistical object.‣ Gives us random subsets of a larger space.‣ Many data are well modeled as point processes:‣ Seismology‣ Epidemiology‣ Economics
‣ Modeling dependence is challenging - “beyond Poisson”‣ Strauss and Gibbs Processes‣ Determinantal and Permanental Point Processes‣ TODAY: Mutually-Exciting (Hawkes) Processes
Saturday, August 3, 13
POINT PROCESS
‣ A point process on gives us random subsets .
‣ Formally, a random locally-finite counting measure.
‣ Most of the time we think of them as giving us finite subsets of compact subset of , e.g., time or space.
X {xn}Nn=1
Rd
Saturday, August 3, 13
POISSON PROCESS
‣ The Poisson process is the most basic point process.‣ Disjoint regions are independent.‣ The number of points in a region is determined by
integrating the rate function over that region.
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10
20
40
60
80
100
120
140
Saturday, August 3, 13
DEPENDENCE IN MULTIVARIATE POINT PROCESSES
‣ In the Poisson process, everything is conditionally independent given the rate function.
‣ How can we get spike-driven dynamics?
Saturday, August 3, 13
HAWKES PROCESS DYNAMICS
time
I
II
III
IV
Saturday, August 3, 13
HAWKES PROCESS DYNAMICS
time
I
II
III
IV
Saturday, August 3, 13
HAWKES PROCESS DYNAMICS
time
I
II
III
IV
Saturday, August 3, 13
HAWKES PROCESS DYNAMICS
time
I
II
III
IV
Saturday, August 3, 13
HAWKES PROCESS DYNAMICS
time
I
II
III
IV
Saturday, August 3, 13
HAWKES PROCESS DYNAMICS
time
I
II
III
IV
Saturday, August 3, 13
HAWKES PROCESS DYNAMICS
time
I
II
III
IV
Saturday, August 3, 13
HAWKES PROCESS DYNAMICS
time
I
II
III
IV
Saturday, August 3, 13
HAWKES PROCESS DYNAMICS
time
I
II
III
IV
Saturday, August 3, 13
HAWKES PROCESS DYNAMICS
time
I
II
III
IV
Saturday, August 3, 13
HAWKES PROCESS DYNAMICS
time
I
II
III
IV
Saturday, August 3, 13
HAWKES PROCESS DYNAMICS
time
I
II
III
IV
Saturday, August 3, 13
HAWKES PROCESS DYNAMICS
time
I
II
III
IV
Saturday, August 3, 13
HAWKES PROCESS DYNAMICS
time
I
II
III
IV
Saturday, August 3, 13
HAWKES PROCESS DYNAMICS
time
I
II
III
IV
Saturday, August 3, 13
HAWKES PROCESS DYNAMICS
time
I
II
III
IV
Saturday, August 3, 13
HAWKES PROCESS DYNAMICS
time
I
II
III
IV
Saturday, August 3, 13
HAWKES PROCESS DYNAMICS
time
I
II
III
IV
Saturday, August 3, 13
HAWKES PROCESS DYNAMICS
time
I
II
III
IV
Saturday, August 3, 13
HAWKES PROCESS DYNAMICS
time
I
II
III
IV
Saturday, August 3, 13
HAWKES PROCESS
‣ The Hawkes process specifies conditional Poisson dynamics in causal cascades.‣ Hawkes - Biometrika (1971), J. RSS-B (1971)‣ Hawkes - Stochastic Point Processes (1972)‣ Hawkes & Oakes - J. Applied Prob. (1974)
‣ Purely excitatory: each spike increases the intensity according to a weighted temporal kernel.
‣ Self-excitation: increase your own rate.‣ Mutual-excitation: increase other rates.‣ Spectral conditions ensure stability.
Saturday, August 3, 13
GRAPH STRUCTURE FROM HAWKES DYNAMICS
Saturday, August 3, 13
GRAPH STRUCTURE FROM HAWKES DYNAMICS
Saturday, August 3, 13
GRAPH STRUCTURE FROM HAWKES DYNAMICS
Saturday, August 3, 13
BAYESIAN INFERENCE
‣ The Hawkes process provides a likelihood to connect hypotheses about an unobserved graph to observed event data.
‣ With a Bayesian model, we can manipulate the posterior distribution over graphs, marginalizing out nuisance parameters.
‣ We can infer the temporal kernels that modulate the interaction.
‣ We can perform model comparison between different random graph models and their properties.
Saturday, August 3, 13
EXCHANGEABLE RANDOM GRAPHS
‣ Recent work on exchangeable random graphs provides a rich set of priors for underlying networks, e.g., Diaconis & Janson (2007), Orbanz & Roy (2013).
‣ Aldous-Hoover unifies many existing graph models:
Erdös-Renyi Stochastic BlockModel
Latent DistanceModel
Saturday, August 3, 13
HAWKES PROCESS FORMALISM
‣ nodes with events in .‣ ordered event times . ‣ Node of event n given by . ‣ Base rates ‣ Kernel , such that and .
‣ Binary adjacency matrix‣ Interaction weight matrix
‣ An event on induces expected events on .
K [0, T ]
N sn 2 [0, T ]
cn 2 {1, 2, · · · ,K}
g✓(t) g✓(t < 0) = 0
Z 1
0g✓(t) dt = 1
�0k(t)
A 2 {0, 1}K⇥K
W 2 RK⇥K+
Wk,k0Ak,k0k k0
Saturday, August 3, 13
HAWKES PROCESS WITH DATA AUGMENTATION
‣ Instantaneous Poisson rate:
‣ The superposition property of Poisson processes means that each event is explained by either the background rate or exactly one previous event.
‣ Use to represent these latent variables, where means that event induced event .
�k(t) = �0k(t) +
NX
n=1
I(sn < t)Acn,kWcn,kg✓(t� sn)
Z 2 {0, 1}N⇥N
Zn,n0 = 1 n n0
Saturday, August 3, 13
DATA-AUGMENTED LIKELIHOOD
p({sn, cn}Nn=1,Z |A,W , {�0k(t)}Kk=1, ✓) =
KY
k=1
exp
(�Z T
0�0k(⌧) d⌧
)NY
n=1
�0k(sn)
I(cn=k)I(1�Pn0 Zn0,n)
⇥KY
k0=1
(�Z T
sn
Acn,k0Wcn,k0g✓(⌧ � sn) d⌧
)
⇥NY
n0=1
(Acn,k0Wcn,k0g✓(sn0 � sn))Zn,n0
Saturday, August 3, 13
DATA-AUGMENTED LIKELIHOOD
p({sn, cn}Nn=1,Z |A,W , {�0k(t)}Kk=1, ✓) =
KY
k=1
exp
(�Z T
0�0k(⌧) d⌧
)NY
n=1
�0k(sn)
I(cn=k)I(1�Pn0 Zn0,n)
⇥KY
k0=1
(�Z T
sn
Acn,k0Wcn,k0g✓(⌧ � sn) d⌧
)
⇥NY
n0=1
(Acn,k0Wcn,k0g✓(sn0 � sn))Zn,n0
Poisson process likelihood for eventsfrom the background process.
Saturday, August 3, 13
DATA-AUGMENTED LIKELIHOOD
p({sn, cn}Nn=1,Z |A,W , {�0k(t)}Kk=1, ✓) =
KY
k=1
exp
(�Z T
0�0k(⌧) d⌧
)NY
n=1
�0k(sn)
I(cn=k)I(1�Pn0 Zn0,n)
⇥KY
k0=1
(�Z T
sn
Acn,k0Wcn,k0g✓(⌧ � sn) d⌧
)
⇥NY
n0=1
(Acn,k0Wcn,k0g✓(sn0 � sn))Zn,n0
Poisson process likelihood for eventsinduced by previous events.
Saturday, August 3, 13
DATA-AUGMENTED LIKELIHOOD
p({sn, cn}Nn=1,Z |A,W , {�0k(t)}Kk=1, ✓) =
KY
k=1
exp
(�Z T
0�0k(⌧) d⌧
)NY
n=1
�0k(sn)
I(cn=k)I(1�Pn0 Zn0,n)
⇥KY
k0=1
(�Z T
sn
Acn,k0Wcn,k0g✓(⌧ � sn) d⌧
)
⇥NY
n0=1
(Acn,k0Wcn,k0g✓(sn0 � sn))Zn,n0
Ugly looking, but just normalization constants.
Saturday, August 3, 13
MODELING DETAILS
‣Logistic-normal impulse response
‣Reasonably flexible, with compact support.
‣Gaussian process for log background rate
‣Smoothly-varying or periodic external effects.
‣Conjugate gamma priors for weights
‣Can also be coupled via, e.g., latent block identity.
Saturday, August 3, 13
INFERENCE WITH MCMC
‣ Graph structure: collapsed block Gibbs
‣ Edge weights: Gibbs (conjugate gamma posterior)
‣ Latent parent explanations: parallel Gibbs
‣ Background rates: elliptical slice sampling
‣ Temporal kernels: Gibbs or slice sampling
‣ Many transitions can be efficiently simulated on a GPU, enabling models with hundreds of nodes and millions of events.
Saturday, August 3, 13
FINANCIAL UPTICKS/DOWNTICKS
Uptic
k
BAC
AXP
GS
WFC
JPM
IBM
MSFT
ORCL
QCOM
T
Dow
ntic
k
BA
C
AX
P
GS
WF
C
JPM
IBM
MS
FT
OR
CL
QC
OM T
Uptick
BAC
AXP
GS
WFC
JPM
IBM
MSFT
ORCL
QCOM
T
BA
C
AX
P
GS
WF
C
JPM
IBM
MS
FT
OR
CL
QC
OM T
Downtick
Inferred Block Structure (SBM+GP)
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
BA
C
AX
P
GS
WF
C
JPM
IBM
MS
FT
OR
CL
QC
OM T
BA
C C
GS
WF
C
JPM
IBM
MS
FT
OR
CL
QC
OM T
Interaction between Finance and Technology Stocks
Uptic
ks O
utg
oin
gD
ow
ntic
ks O
utg
oin
g
Upticks Incoming Downticks Incoming
BAC
AXP
GS
WFC
JPM
IBM
MSFT
ORCL
QCOM
T
BAC
C
GS
WFC
JPM
IBM
MSFT
ORCL
QCOM
T0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
10 stocks, finance and tech sectors, intraday price moves over one week in Sept 2009.
Weighted Edges Block MembershipSaturday, August 3, 13
HOMICIDES AND ARMED BATTERIES IN CHICAGO
−87.9 −87.85 −87.8 −87.75 −87.7 −87.65 −87.6 −87.55 −87.541.6
41.65
41.7
41.75
41.8
41.85
41.9
41.95
42
42.05
42.1
Latit
ude
Longitude
Murder and Battery in Chicago (2001−2012)
Battery
Murder
Saturday, August 3, 13
CPD-IDENTIFIED GANNG BOUNDARIES
−87.9 −87.85 −87.8 −87.75 −87.7 −87.65 −87.6 −87.55 −87.541.6
41.65
41.7
41.75
41.8
41.85
41.9
41.95
42
42.05
42.1
Latit
ude
Longitude
Gang territories identified by CPD
Gangster Disciples
Black Disciples
Black P Stone & Mickey Cobras
Mafia Insane Vice Lords
Black P Stone
Four Corner Hustlers
Ambrose
Latin Kings
Conservative Vice Lords
Black Disciples, Black P Stone, Conservative Vice Lords
Four Corner Hustlers, Black P Stone, and CBL
Gangster Disciples and Black Disciples
New Breed
Maniac Latin Disciples
Party People
Satan Disciples
Two−Six
Latin Saints
Black Souls
Two−Two Boys
Traveling Vice Lords
Unknown Vice Lords
Spanish Cobras
Latin Jivers
Orchestra Albany
Saturday, August 3, 13
ALLIANCES: FOLK NATION VS. PEOPLE NATION
−87.9 −87.85 −87.8 −87.75 −87.7 −87.65 −87.6 −87.55 −87.541.6
41.65
41.7
41.75
41.8
41.85
41.9
41.95
42
42.05
42.1
Latit
ude
Longitude
Gang Affiliations as Identified by Chicago PD
Folk
People
Saturday, August 3, 13
SEASONAL VARIATION
Saturday, August 3, 13
INFERRED INTERACTING TERRITORIES
−87.9 −87.85 −87.8 −87.75 −87.7 −87.65 −87.6 −87.55 −87.541.6
41.65
41.7
41.75
41.8
41.85
41.9
41.95
42
42.05
42.1
Latit
ude
Longitude
Inferred Gang Territories and Interactions
We also infer the event sources, using a spatial model and a Dirichlet
process prior.
Saturday, August 3, 13
TRYING TO PREDICT HOTSPOTS
−87.9 −87.85 −87.8 −87.75 −87.7 −87.65 −87.6 −87.55 −87.541.6
41.65
41.7
41.75
41.8
41.85
41.9
41.95
42
42.05
42.1
La
titu
de
Longitude
Predicted Homicide Rate, Memorial Day 2012
Homicide
Battery
Saturday, August 3, 13
NEURAL MODELING WITH INHIBITION
‣ Neural functional connectivity involves both excitation and inhibition, so the pure Hawkes is inappropriate.
‣ We extend the model to allow for negative weights and include a saturating nonlinearity.
‣ This effective becomes a Bayesian variant of the popular generalized linear model (GLM) from the computational neuroscience literature.
‣ We can leverage graph priors within this framework to discover latent neural properties and connectivity.
Saturday, August 3, 13
BAYESIAN GLM FOR NEURAL DATA
!"#$%&'()*+,-
.#/0
1#/)'#/2*& 3(*42*0#'2/205 6(255(*-6'($05505
!!!"# $%
"#&"'$
*78
!!"# "!"#
902&:/0+-30/;('%-----
!"● #
<1$#4#'=
>0?@('#4-A?@)450-
.05@(*50-g!<B/=
Saturday, August 3, 13
RETINAL GANGLION CELLS
324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377
OFF RF Centers
1 2 3 4 5 6 7 8123456789
−0.1
−0.05
0
0.05
0.1
ON RF Centers
1 2 3 4 5 6 7 8123456789
−0.1
−0.05
0
0.05
0.1
(a)
−11
−11
−1
0
1
Spatial stimulus filter (OFF)
−0.4
−0.2
0
−11
−1
1
Spatial stimulus filter (ON)
0
0.2
0.4
(b)
0 0.1 0.20
10
20
t (s)
f(t)
Temporal stimulus filter (OFF)
0 0.1 0.20
2
4
6
t (s)
f(t)
Temporal stimulus filter (ON)
(c)
Figure 4: Inferred stimulus filters under a Gaussian process prior distribution. (a) Spatial component of thetuning curve convolved with the inferred locations of the 27 retinal ganglion cells in the dataset, separatedaccording to their inferred types. Shading indicates the amplitude of the tuning curve at that location. (b)Inferred spatial shape of tuning curves shared by inferred blocks of ON and OFF cells. (c) Amplitude of theinferred filters versus temporal offset.
5 Application to Neural Recordings
Next we applied our model to spike trains simultaneously recorded from a population of 27 retinalganglion cells (RGCs)2, previously analyzed in [16]. This population is comprised of two typesof cells, ON and OFF cells, characterized by their response to visual stimuli. ON cells increasetheir firing when light is shone upon the area of the retina to which they are sensitive; OFF cellsincrease their firing when light is not shone upon their retinal location. These region of the retinato which a cell is sensitive is called its receptive field, and the expected firing rate as a function ofthe stimulus presented in the receptive field is called the “tuning curve” of the neuron. Cell typesare identified by a common tuning curve centered at various locations throughout the retina. Inthis dataset, the population is presented with a Gaussian white noise video. We use one minute ofrecording corresponding to approximately 50K spikes.
We develop a Bayesian model to infer (a) the presence of multiple neuron types sharing the sametuning curve, (b) the location at which each neuron’s tuning curve is centered, (c) the distribution ofweights as a function of these neural types, and (d) the effect of distance between neuron locationson the probability of functional interaction. To do so we introduce a block model with a Gaussianprocess prior over spatiotemporal tuning curves, and Gaussian priors over weights as a function ofblock type. Each neuron has a 2-dimensional location in the retinal plane, also Gaussian distributed,at which its tuning curve is centered. The tuning curve is convolved with the stimulus at this locationto produce a background activation. The adjacency matrix A is given a latent distance prior usingthese same locations. Details of these priors are given in the supplementary material.
Using the labels found in [16] as ground truth, our model correctly infers the neuron types with100% accuracy. The inferred cell locations are shown in Figure 4a in units of pixels of the presentedstimuli, and agree with the results of [16]. The spatial and temporal components of the inferred type-specific tuning curves are shown in Figures 4b and 4c, respectively. These exhibit the canonical formwhich characterizes ON and OFF cells. In comparison to [16], this model has a reduced parameterset due to the sharing of receptive fields.
In addition to these inferred receptive fields, we also infer latent structural patterns of the interactionnetwork as a function of cell type and location. Figure 5a shows the inferred weighted adjacency ma-trix of interactions (ordered first by type and then by location along a 1D projection). As expected,cells excite nearby cells of the same type and inhibit those of the opposite type. These are quantified
2We thank Jonathan Pillow and E.J. Chichilnisky for providing this data
7
27 neurons, 50K spikes, macaque retina(Data from Pillow and Chichilnisky)
Saturday, August 3, 13
INFERRED NETWORK PROPERTIES
Postsynaptic Neuron
Pre
synaptic
Neuro
n
RGC Connectivity
OFF ON
OFF
ON
−2
−1
0
1
2
−2 0 20
1
2
W
Pr(
W)
Outgoing Weights (OFF)
−2 0 20
1
2
W
Pr(
W)
Outgoing Weights (ON)
||xk-xk`||2
P(A
k,k`
)
Pr(A) vs Distance
0 2 40
0.5
1
Saturday, August 3, 13
SUMMARY
‣ We cannot directly observed edges for many networks of interest.
‣ However, these latent graphs can be inferred from vertex emissions.
‣ The purely-excitatory case (Hawkes process) enables an elegant data-augmentation approach to inference.
‣ MCMC is fast and tractable, and lets us reason about different graphs and graph priors.
‣ The inhibitory case is less tractable, but important for neuroscience applications.
Scott Linderman
Saturday, August 3, 13