Multiscale Granger Causality and Information Decomposition
C3S Conference, September 2017

Daniele Marinazzo, Department of Data Analysis, University of Ghent
Luca Faes, FBK and University of Trento
Sebastiano Stramaglia, University of Bari and INFN

http://users.ugent.be/~dmarinaz/
@dan_marinazzo
Outline: Introduction | Information Dynamics: Theory | Synergy and Redundancy | Conclusions | Information Dynamics: Application

DYNAMICAL SYSTEMS AND PROCESSES
[Diagram: systems X1, X2, … and Y inside the overall system S]
• Dynamical System S = {S1, ..., SM} = {X1, ..., XM-1, Y} = {X, Y}, with X = {X1, ..., XM-1}
• Target system Y: process Y = (Y1, Y2, …, Yn, …, YN)
• Process X = (X1, X2, …, Xn, …, XN)
INFORMATION DYNAMICS
Present: $x_n$, $y_n$
Past: $x_n^- = [x_{n-1}, x_{n-2}, \ldots]$, $y_n^- = [y_{n-1}, y_{n-2}, \ldots]$
• Information Storage: $S_x = I(x_n; x_n^-)$ (regularity of x) [JT Lizier et al, Information Sci 2012]
• Information Transfer: $T_{y \to x} = I(x_n; y_n^- \mid x_n^-)$ (influence y → x) [T Schreiber et al, Phys Rev Lett 2000]
STATE SPACE REPRESENTATION
Information-theoretic quantities can be computed from (co)variances; the computation is exact for Gaussian processes
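As a sketch of why (co)variances suffice: for jointly Gaussian variables, entropies depend only on variances, so storage and transfer reduce to log-ratios of prediction error variances. The symbols $\sigma^2_{x}$, $\sigma^2_{x|x^-}$ and $\sigma^2_{x|x^-,y^-}$ are notation assumed here (not taken from the slides) for the variance of $x_n$ and for the variance of its prediction error given the indicated pasts.

```latex
\begin{aligned}
S_x &= I(x_n; x_n^-) = \tfrac{1}{2}\ln\frac{\sigma^2_{x}}{\sigma^2_{x|x^-}} \\
T_{y\to x} &= I(x_n; y_n^- \mid x_n^-) = \tfrac{1}{2}\ln\frac{\sigma^2_{x|x^-}}{\sigma^2_{x|x^-,y^-}}
\end{aligned}
```

Each ratio compares a prediction error variance before and after adding a set of past variables, which is why only partial variances are needed.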
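Vector Autoregressive (VAR) representation of $U = \{X, Y\} = \{X, Z, V\}$:
$$U_n = [X_n\; V_n\; Z_n]^T = \sum_{k=1}^{p} A_k U_{n-k} + E_n$$

State Space (SS) representation:
State eq.: $S_{n+1} = A S_n + K E_n$
Observation eq.: $U_n = C S_n + E_n$

Observation equations of the sub-models:
$[X_n\; V_n]^T = C^{(XV)} S_n + E_n^{(XV)}$
$[X_n\; Z_n]^T = C^{(XZ)} S_n + E_n^{(XZ)}$
$X_n = C^{(X)} S_n + E_n^{(X)}$

Prediction error variances of X, where the subscript 1 selects the innovation component associated with X:
$\sigma^2_{X|XVZ} = \mathbb{E}[E_{1,n}^2]$
$\sigma^2_{X|XV} = \mathbb{E}[(E_{1,n}^{(XV)})^2]$
$\sigma^2_{X|XZ} = \mathbb{E}[(E_{1,n}^{(XZ)})^2]$
$\sigma^2_{X|X} = \mathbb{E}[(E_{n}^{(X)})^2]$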
• Practical computation of prediction and entropy measures
The SS representation is closed under the formation of sub-models [L. Barnett and A.K. Seth, PRE 91(4) 040101, 2015; L. Faes et al, arXiv preprint:1602.06155, 2016]

Estimation of VAR parameters: $A_k$, $\Sigma = \mathrm{cov}(E_n)$ (least squares; BIC for model order selection)

Estimation of SS parameters:
Full model: $(A, C, K, V)$, $V = \mathrm{cov}(E_n)$
SUBMODELS:
Obs. without Z: $(A_2, C_2, K_2, V_2)$, $V_2 = \mathrm{cov}(E_n^{(XV)})$
Obs. without V: $(A_3, C_3, K_3, V_3)$, $V_3 = \mathrm{cov}(E_n^{(XZ)})$
Obs. without V,Z: $(A_4, C_4, K_4, V_4)$, $V_4 = \mathrm{cov}(E_n^{(X)})$

Partial variances: obtained by solving the DARE (discrete algebraic Riccati equation) for each sub-model
• All partial variances can be computed from the VAR parameters estimated only once!
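The procedure above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `var_to_ss` and `submodel_innovation_cov`, the toy VAR(1) coefficients, and the use of a plain fixed-point iteration of the Riccati recursion (instead of a dedicated DARE solver) are all choices made here.

```python
import numpy as np

def var_to_ss(A_list, Sigma):
    """Innovations-form SS (A, C, K, V) of a VAR(p) process.

    The state S_n stacks the p past observations [U_{n-1}; ...; U_{n-p}].
    """
    M, p = Sigma.shape[0], len(A_list)
    C = np.hstack(A_list)                        # observation matrix, M x Mp
    A = np.zeros((M * p, M * p))
    A[:M, :] = C                                 # first block row predicts U_n
    if p > 1:
        A[M:, :-M] = np.eye(M * (p - 1))         # remaining rows shift the past
    K = np.zeros((M * p, M))
    K[:M, :] = np.eye(M)
    return A, C, K, np.array(Sigma, dtype=float)

def submodel_innovation_cov(A, C, K, V, idx, max_iter=10000, tol=1e-13):
    """Innovation covariance of the sub-process U[idx] (other observations removed)."""
    C_r = C[idx, :]                              # reduced observation matrix
    Q = K @ V @ K.T                              # state noise covariance
    R = V[np.ix_(idx, idx)]                      # observation noise covariance
    S = K @ V[:, idx]                            # state/observation noise cross-covariance
    P = np.zeros_like(A)
    for _ in range(max_iter):                    # fixed-point iteration of the DARE
        G = A @ P @ C_r.T + S
        V_r = C_r @ P @ C_r.T + R
        P_next = A @ P @ A.T + Q - G @ np.linalg.solve(V_r, G.T)
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    return C_r @ P @ C_r.T + R

# Hypothetical toy VAR(1) with U = [x, y]:
#   x_n = y_{n-1} + w_n,  y_n = 0.5 y_{n-1} + u_n,  unit-variance innovations
A1 = np.array([[0.0, 1.0], [0.0, 0.5]])
A, C, K, V = var_to_ss([A1], np.eye(2))
var_full = submodel_innovation_cov(A, C, K, V, [0, 1])[0, 0]   # x given past of x and y
var_reduced = submodel_innovation_cov(A, C, K, V, [0])[0, 0]   # x given its own past only
te_y_to_x = 0.5 * np.log(var_reduced / var_full)               # Gaussian transfer entropy y -> x
```

The VAR parameters enter only once, through `var_to_ss`; each partial variance then comes from a different choice of `idx`.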
METHODS
PROBLEMS WHICH WEREN’T
https://arxiv.org/abs/1708.08001
https://arxiv.org/abs/1708.06990
BIAS AND VARIANCE REDUCED IN SS FORMULATION
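METHODS
MULTISCALE ANALYSIS OF TIME SERIES: CHANGE OF TIME SCALE
Original time series: $Y_n = \{x_n, y_n\}$, $n = 1, \ldots, N$
[Diagram: samples $x_1, x_2, \ldots, x_9$ of the original series]
RESCALING (scale factor $\tau$):
1) AVERAGING: $\tilde{Y}_n = \frac{1}{\tau} \sum_{l=0}^{\tau-1} Y_{n-l}$, $n = \tau, \ldots, N$
2) DOWNSAMPLING: $\bar{Y}_n = \tilde{Y}_{n\tau}$, $n = 1, \ldots, N/\tau$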
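• Traditional procedure for rescaling [M Costa et al, Phys. Rev. Lett. 89, 2002]:
$\bar{x}_n = \frac{1}{\tau} \sum_{l=0}^{\tau-1} x_{n\tau-l}$, $\bar{y}_n = \frac{1}{\tau} \sum_{l=0}^{\tau-1} y_{n\tau-l}$, $\bar{Y}_n = \{\bar{x}_n, \bar{y}_n\}$, $n = 1, \ldots, N/\tau$
Example ($\tau = 3$, $N = 9$): $\bar{x}_3 = \frac{x_7 + x_8 + x_9}{3}$
• Rescaling can be seen as a two-step procedure [J. Valencia et al, IEEE Trans. Biomed Eng. 56, 2009]
Example ($\tau = 3$, $N = 9$): averaging gives $\tilde{x}_9 = \frac{x_7 + x_8 + x_9}{3}$, downsampling gives $\bar{x}_3 = \tilde{x}_9$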
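The equivalence between the traditional rescaling and the two-step view can be checked numerically. The sketch below uses hypothetical helper names (`average`, `downsample`, `coarse_grain`) and the toy series 1, ..., 9 with scale factor 3; it assumes the moving average is taken over the current sample and the preceding ones, as in the example above.

```python
import numpy as np

def average(x, tau):
    """Step 1: x_tilde[n] = mean of the tau samples ending at n (n = tau..N, 1-based)."""
    x = np.asarray(x, dtype=float)
    return np.convolve(x, np.ones(tau) / tau, mode="valid")

def downsample(x_tilde, tau):
    """Step 2: keep every tau-th averaged sample, x_bar[n] = x_tilde[n * tau]."""
    return x_tilde[::tau]

def coarse_grain(x, tau):
    """Traditional one-step rescaling: means over non-overlapping windows of length tau."""
    x = np.asarray(x, dtype=float)
    n_win = len(x) // tau
    return x[: n_win * tau].reshape(n_win, tau).mean(axis=1)

x = np.arange(1, 10)              # x_1, ..., x_9 (N = 9)
x_tilde = average(x, 3)           # averaged series, defined for n = 3..9
x_bar = downsample(x_tilde, 3)    # rescaled series, n = 1..3
```

Separating the two steps makes it possible to study the effect of the filter and of the downsampling independently.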
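MULTISCALE REPRESENTATION OF LINEAR PROCESSES USING STATE SPACE MODELS
The State Space model defining the multivariate linear process after rescaling can be obtained from the original VAR parameters and the scale factor.

Observed time series, VAR(A, Σ):
$$Y_n = \sum_{k=1}^{p} A_k Y_{n-k} + U_n, \qquad \Sigma = \mathrm{cov}(U_n)$$

AVERAGING
Averaged time series, VARMA(A, B, Σ):
$$\tilde{Y}_n = \sum_{k=1}^{p} A_k \tilde{Y}_{n-k} + \sum_{l=0}^{\tau-1} B_l U_{n-l}, \qquad B_l = \frac{1}{\tau} I, \quad l = 0, 1, \ldots, \tau-1$$
Equivalent $SS(\tilde{A}, \tilde{C}, \tilde{K}, \tilde{V})$ model [1]:
$$\tilde{Z}_{n+1} = \tilde{A} \tilde{Z}_n + \tilde{K} \tilde{E}_n, \qquad \tilde{Y}_n = \tilde{C} \tilde{Z}_n + \tilde{E}_n$$
with $q = \tau - 1$ and
$$\tilde{A} = \begin{bmatrix}
A_1 & \cdots & A_{p-1} & A_p & B_1 & \cdots & B_{q-1} & B_q \\
I & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 \\
\vdots & & \vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & \cdots & I & 0 & 0 & \cdots & 0 & 0 \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 \\
0 & \cdots & 0 & 0 & I & \cdots & 0 & 0 \\
\vdots & & \vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & \cdots & 0 & 0 & 0 & \cdots & I & 0
\end{bmatrix}$$
$$\tilde{C} = [A_1 \cdots A_p \; B_1 \cdots B_q], \qquad \tilde{K} = [I \; 0 \cdots 0 \; B_0^{-T} \; 0 \cdots 0]^T, \qquad \tilde{V} = \mathrm{cov}(\tilde{E}_n) = B_0 \Sigma B_0^T$$

DOWNSAMPLING
Rescaled time series, $SS(\bar{A}, \bar{C}, \bar{Q}, \bar{R}, \bar{S})$ model [2]:
$$X_{n+1} = \bar{A} X_n + W_n, \qquad \bar{Y}_n = \bar{C} X_n + V_n$$
$$\bar{A} = \tilde{A}^{\tau}, \quad \bar{C} = \tilde{C}, \quad \bar{R} = \tilde{V}, \quad \bar{S} = \tilde{A}^{\tau-1} \tilde{K} \tilde{V}$$
$$\bar{Q} = Q_{\tau}, \qquad Q_{\tau} = \tilde{A} Q_{\tau-1} \tilde{A}^T + \tilde{K} \tilde{V} \tilde{K}^T \;\; (\tau \geq 2), \qquad Q_1 = \tilde{K} \tilde{V} \tilde{K}^T$$
Innovations form $SS(\bar{A}, \bar{C}, \bar{K}, \bar{V})$, obtained through the Discrete Algebraic Riccati Equation [2,3]:
$$\bar{Z}_{n+1} = \bar{A} \bar{Z}_n + \bar{K} \bar{E}_n, \qquad \bar{Y}_n = \bar{C} \bar{Z}_n + \bar{E}_n$$

[1] Aoki & Havenner, Econ. Rev. 10, 1991
[2] Solo, Neural Comp 28, 2016
[3] Barnett & Seth, Phys. Rev. E 91, 2015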
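The chain VAR → averaged SS → downsampled SS → innovation covariance can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the authors' code: all function names are hypothetical, the state is assumed to stack past averaged outputs followed by past VAR innovations, and the Riccati equation is solved by simple fixed-point iteration rather than by a dedicated solver. The demo uses a hypothetical univariate AR(1) process with coefficient 0.5.

```python
import numpy as np

def averaged_ss(A_list, Sigma, tau):
    """SS(A~, C~, K~, V~) of the VARMA process obtained by averaging tau samples."""
    M, p, q = Sigma.shape[0], len(A_list), tau - 1
    B = [np.eye(M) / tau for _ in range(tau)]             # B_l = I / tau, l = 0..tau-1
    C = np.hstack(list(A_list) + B[1:])                   # [A_1 .. A_p  B_1 .. B_q]
    n = M * (p + q)
    A = np.zeros((n, n))
    A[:M, :] = C
    if p > 1:
        A[M:M * p, :M * (p - 1)] = np.eye(M * (p - 1))    # shift past averaged outputs
    if q > 1:
        A[M * (p + 1):, M * p:M * (p + q - 1)] = np.eye(M * (q - 1))  # shift past innovations
    K = np.zeros((n, M))
    K[:M, :] = np.eye(M)
    if q > 0:
        K[M * p:M * (p + 1), :] = np.linalg.inv(B[0])
    V = B[0] @ Sigma @ B[0].T
    return A, C, K, V

def downsampled_ss(A, C, K, V, tau):
    """SS(A-, C-, Q, R, S) of the averaged process observed every tau samples."""
    KVK = K @ V @ K.T
    Q = KVK.copy()
    for _ in range(tau - 1):                              # Q_i = A Q_{i-1} A^T + K V K^T
        Q = A @ Q @ A.T + KVK
    S = np.linalg.matrix_power(A, tau - 1) @ K @ V
    return np.linalg.matrix_power(A, tau), C, Q, V, S

def innovation_cov(A, C, Q, R, S, max_iter=10000, tol=1e-13):
    """Innovation covariance from a fixed-point iteration of the DARE."""
    P = np.zeros_like(A)
    for _ in range(max_iter):
        G = A @ P @ C.T + S
        W = C @ P @ C.T + R
        P_next = A @ P @ A.T + Q - G @ np.linalg.solve(W, G.T)
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    return C @ P @ C.T + R

def multiscale_innovation_cov(A_list, Sigma, tau, idx):
    """Innovation covariance at scale tau of the sub-process selected by idx."""
    A, C, K, V = averaged_ss(A_list, Sigma, tau)
    Ad, Cd, Q, R, S = downsampled_ss(A, C, K, V, tau)
    return innovation_cov(Ad, Cd[idx], Q, R[np.ix_(idx, idx)], S[:, idx])

# Hypothetical AR(1) process x_n = 0.5 x_{n-1} + u_n with unit-variance innovations
ar1 = [np.array([[0.5]])]
v_scale1 = multiscale_innovation_cov(ar1, np.eye(1), 1, [0])[0, 0]
v_scale2 = multiscale_innovation_cov(ar1, np.eye(1), 2, [0])[0, 0]
```

For a multivariate process, calling `multiscale_innovation_cov` with the full index set and with the driver removed gives the two partial variances whose log-ratio is the Granger causality at scale `tau`.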
BRAIN-TO-HEART DYNAMICS
SIMULATIONS
MULTISCALE COMPUTATION OF INFORMATION DYNAMICS FOR VAR PROCESSES
• Unidirectional interaction:
$x_n = 0.25\,x_{n-1} + u_n$
$y_n = 0.5\,x_{n-2} + w_n$
[Diagram: x self-loop 0.25 (lag 1); x → y 0.5 (lag 2)]
[Figure: Information Storage (Sx, Sy) and Information Transfer (Txy, Tyx)]
• Bidirectional interaction:
$x_n = 0.25\,x_{n-1} + 0.75\,y_{n-3} + u_n$
$y_n = 0.25\,y_{n-5} + 0.5\,x_{n-7} + w_n$
[Diagram: x self-loop 0.25 (lag 1); y → x 0.75 (lag 3); y self-loop 0.25 (lag 5); x → y 0.5 (lag 7)]
[Figure: Information Storage (Sx, Sy) and Information Transfer (Txy, Tyx)]
• Averaging step: introduces autocorrelations (Storage); does not alter causal interactions (Transfer)
• Downsampling step: removes autocorrelations; elicits scale-dependent causal interactions
TE peaks at scales compatible with the interaction delay
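For readers who want to reproduce the starting point of such a simulation, the sketch below generates a realization of a unidirectional system of the form x_n = 0.25 x_{n-1} + u_n, y_n = 0.5 x_{n-2} + w_n and recovers the VAR parameters by least squares with BIC order selection. The function names, the series length, the random seed, the unit innovation variances, and the specific BIC expression are assumptions made here for illustration.

```python
import numpy as np

def simulate_unidirectional(N, seed=0):
    """Simulate x_n = 0.25 x_{n-1} + u_n, y_n = 0.5 x_{n-2} + w_n with unit-variance noise."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(N)
    w = rng.standard_normal(N)
    x = np.zeros(N)
    y = np.zeros(N)
    for n in range(2, N):
        x[n] = 0.25 * x[n - 1] + u[n]
        y[n] = 0.5 * x[n - 2] + w[n]
    return np.column_stack([x, y])

def fit_var(Y, p):
    """Least-squares fit of Y_n = sum_k A_k Y_{n-k} + U_n; returns ([A_1..A_p], Sigma)."""
    N, M = Y.shape
    Z = np.hstack([Y[p - k:N - k] for k in range(1, p + 1)])   # lagged regressors
    T = Y[p:]                                                  # targets
    coef, *_ = np.linalg.lstsq(Z, T, rcond=None)
    A = [coef[(k - 1) * M:k * M].T for k in range(1, p + 1)]
    resid = T - Z @ coef
    return A, resid.T @ resid / (N - p)

def bic(Y, p):
    """One common form of the Bayesian Information Criterion for a VAR(p) model."""
    N, M = Y.shape
    _, Sigma = fit_var(Y, p)
    n_eff = N - p
    return n_eff * np.log(np.linalg.det(Sigma)) + np.log(n_eff) * p * M * M

Y = simulate_unidirectional(20000)
best_p = min(range(1, 6), key=lambda p: bic(Y, p))   # model order with the lowest BIC
A_hat, Sigma_hat = fit_var(Y, best_p)
```

The estimated `A_hat` and `Sigma_hat` are the only quantities a state-space multiscale analysis would need as input.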
MULTISCALE INTERACTIONS IN CLIMATOLOGY
a) Modern climate data: global land-ocean temperature index and CO2 concentration measured at monthly resolution from March 1958 to February 2017 (708 data points)
b) Paleoclimate data: GT and CO2 concentration from the Vostok Ice Core data, extended by the EPICA Dome C data, which go back to 800,000 years ago
L Faes, S Stramaglia, G Nollo, D Marinazzo, 'Multiscale Granger causality', arXiv 2017, https://arxiv.org/abs/1703.08487
JOINT INFORMATION
In the presence of two sources Yi and Yk, and a target Yj, we want to quantify the information transferred to Yj from the sources Yi and Yk taken together
TRANSFER ENTROPY
JOINT TRANSFER ENTROPY
Interaction Information Decomposition (IID) Partial Information Decomposition (PID)
L Faes, D Marinazzo, S Stramaglia, 'Multiscale information decomposition: exact computation for multivariate Gaussian processes', Entropy, special issue on Multivariate entropy measures and their applications, 2017, 19(8), 408.
PARTIAL INFORMATION DECOMPOSITION
Synergy and redundancy as mutually exclusive phenomena
PARTIAL INFORMATION DECOMPOSITION
PID provides distinct non-negative measures of redundancy and synergy, thereby accounting for the possibility that redundancy and synergy coexist as separate elements of information modification.
The interaction TE is actually a measure of the ‘net’ synergy manifested in the transfer of information from the two sources to the target.
PID components cannot be obtained through classic information theory by simply subtracting conditional MI terms: one more relation is needed to solve for all the quantities. Shannon information theory does not uniquely determine this decomposition.
Redundancy is defined as the minimum of the information provided by each individual source to the target
This choice satisfies the desirable property that the redundant TE is independent of the correlation between the source processes.
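A small sketch of how the two decompositions relate, given the transfer entropies from each source and from both sources jointly. The function name `decompose` and the numerical values are hypothetical; the relations assumed are the standard ones: joint TE = unique_i + unique_k + redundancy + synergy, each individual TE = its unique part + redundancy, interaction TE = joint TE minus the two individual TEs, and redundancy taken as the minimum of the individual TEs as stated above.

```python
def decompose(te_i, te_k, te_joint):
    """IID and PID terms from the individual and joint transfer entropies."""
    interaction = te_joint - te_i - te_k       # IID: interaction TE (net synergy)
    redundancy = min(te_i, te_k)               # PID: minimum of the individual contributions
    unique_i = te_i - redundancy
    unique_k = te_k - redundancy
    synergy = te_joint - unique_i - unique_k - redundancy
    return {
        "interaction": interaction,
        "redundancy": redundancy,
        "unique_i": unique_i,
        "unique_k": unique_k,
        "synergy": synergy,
    }

# Hypothetical values (in nats), chosen only to illustrate the bookkeeping
terms = decompose(te_i=0.2, te_k=0.3, te_joint=0.6)
```

With this redundancy definition the interaction term always equals synergy minus redundancy, which is the sense in which it measures net synergy.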
MULTISCALE INFORMATION DECOMPOSITION
• Validation on simulated linear stochastic processes
[Figure: simulation scheme and exact profiles of IID and PID measures. Panels: Interaction Information Decomposition (IID), Partial Information Decomposition (PID)]
MULTISCALE ID IN EPILEPSY
We look at 64 cortical electrodes as targets, and two depth hippocampal electrodes (11 and 12) as drivers [M. Kramer et al., Epilepsy Research 79, 173-186, 2008]
http://users.ugent.be/~dmarinaz/
@dan_marinazzo
• Faes et al., Multiscale Granger Causality, arXiv 2017, https://arxiv.org/abs/1703.08487
• Faes et al., Multiscale Information Decomposition: Exact Computation for Multivariate Gaussian Processes, Entropy 2017, 19(8), 408; doi:10.3390/e19080408
• Faes et al., On the interpretability and computational reliability of frequency-domain Granger causality, F1000Research 2017, https://f1000research.com/articles/6-1710/v1
• https://github.com/danielemarinazzo - www.lucafaes.net
THANKS