multiscale granger causality and information decomposition

Multiscale Granger Causality and Information Decomposition

Daniele Marinazzo
Department of Data Analysis – University of Ghent
http://users.ugent.be/~dmarinaz/
[email protected]
@dan_marinazzo

Luca Faes
FBK and University of Trento

Sebastiano Stramaglia
University of Bari and INFN

C3S Conference
September 2017

DYNAMICAL SYSTEMS AND PROCESSES

• Dynamical system $S = \{S_1, \ldots, S_M\} = \{X_1, \ldots, X_{M-1}, Y\} = \{X, Y\}$, with sources $X = \{X_1, \ldots, X_{M-1}\}$

• Target system Y, described by the process $Y = (Y_1, Y_2, \ldots, Y_n, \ldots, Y_N)$; the sources are described by $X = (X_1, X_2, \ldots, X_n, \ldots, X_N)$

[Diagram: sources $X_1$, $X_2$, … and target $Y$ inside the system $S$]

Information Dynamics

Present of the processes: $x_n$, $y_n$. Past: $x_n^- = [x_{n-1}, x_{n-2}, \ldots]$, $y_n^- = [y_{n-1}, y_{n-2}, \ldots]$

• Information Storage: $S_x = I(x_n; x_n^-)$, the regularity of x [JT Lizier et al, Information Sci 2012]

• Information Transfer: $T_{y \to x} = I(x_n; y_n^- \mid x_n^-)$, the influence y → x [T Schreiber et al, Phys Rev Lett 2000]
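For jointly Gaussian processes, storage and transfer reduce to log-ratios of prediction-error variances. The sketch below illustrates this under stated assumptions and is not the authors' implementation: the infinite past is truncated to p lags, the variances come from ordinary least squares, and the function names, the lag order and the simulated coefficients are all hypothetical choices made for the example.

```python
import numpy as np

def lagged(series, p):
    """Matrix whose row for time n holds series[n-1], ..., series[n-p]."""
    N = len(series)
    return np.column_stack([series[p - k:N - k] for k in range(1, p + 1)])

def residual_variance(target, regressors):
    """Variance of the least-squares prediction error of target given regressors."""
    X = np.column_stack([np.ones(len(target)), regressors])
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return np.var(target - X @ beta)

def storage_and_transfer(x, y, p=2):
    """Return (S_x, T_{y->x}) in nats, with the past truncated to p lags (assumption)."""
    x_now = x[p:]
    past_x, past_y = lagged(x, p), lagged(y, p)
    var_x = np.var(x_now)
    var_x_given_x = residual_variance(x_now, past_x)
    var_x_given_xy = residual_variance(x_now, np.column_stack([past_x, past_y]))
    storage = 0.5 * np.log(var_x / var_x_given_x)
    transfer = 0.5 * np.log(var_x_given_x / var_x_given_xy)
    return storage, transfer

# Hypothetical example: y is white noise and drives x with a delay of one sample.
rng = np.random.default_rng(0)
N = 5000
y = rng.standard_normal(N)
u = rng.standard_normal(N)
x = np.zeros(N)
for n in range(1, N):
    x[n] = 0.5 * x[n - 1] + 0.5 * y[n - 1] + u[n]

S_x, T_yx = storage_and_transfer(x, y)
_, T_xy = storage_and_transfer(y, x)
print(f"S_x = {S_x:.3f}, T_y->x = {T_yx:.3f}, T_x->y = {T_xy:.3f}")
```

With these coefficients the theoretical values are $T_{y \to x} = \tfrac{1}{2}\ln 1.25 \approx 0.11$ nats and $T_{x \to y} = 0$; the finite-sample estimates fluctuate around them.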

STATE SPACE REPRESENTATION

Information-theoretic quantities can be computed from (co)variances; the computation is exact for Gaussian processes.

Vector Autoregressive (VAR) representation of $U_n = [X_n\ V_n\ Z_n]^T$, with $U = \{X, Y\} = \{X, V, Z\}$:

$$U_n = \sum_{k=1}^{p} A_k U_{n-k} + \varepsilon_n$$

State Space (SS) representation:

State eq.: $S_{n+1} = A S_n + K E_n$

Observation eq.: $U_n = C S_n + E_n$

The SS representation is closed under the formation of sub-models
[L. Barnett and A.K. Seth, PRE 91(4) 040101, 2015; L Faes et al, arXiv preprint:1602.06155, 2016]

SUBMODELS:

Obs. without Z: $[X_n\ V_n]^T = C^{(XV)} S_n + E_n^{(XV)}$

Obs. without V: $[X_n\ Z_n]^T = C^{(XZ)} S_n + E_n^{(XZ)}$

Obs. without V,Z: $X_n = C^{(X)} S_n + E_n^{(X)}$

Partial variances:

$\sigma^2_{X|XVZ} = \mathbb{E}[E_{1,n}^2]$

$\sigma^2_{X|XV} = \mathbb{E}[(E_{1,n}^{(XV)})^2]$

$\sigma^2_{X|XZ} = \mathbb{E}[(E_{1,n}^{(XZ)})^2]$

$\sigma^2_{X|X} = \mathbb{E}[(E_{1,n}^{(X)})^2]$

• Practical computation of prediction and entropy measures

Estimation of VAR parameters: $A_k$, $\mathrm{cov}(\varepsilon_n)$ (least squares; BIC for model order selection)

Estimation of SS parameters: $(A, C, K, V)$, $V = \mathrm{cov}(E_n)$

Sub-model parameters, obtained by solving a DARE (discrete algebraic Riccati equation):

Obs. without Z: $(A_2, C_2, K_2, V_2)$, $V_2 = \mathrm{cov}(E_n^{(XV)})$

Obs. without V: $(A_3, C_3, K_3, V_3)$, $V_3 = \mathrm{cov}(E_n^{(XZ)})$

Obs. without V,Z: $(A_4, C_4, K_4, V_4)$, with $V_4$ the innovation covariance of this sub-model

• All partial variances can be computed from the VAR parameters estimated only once!
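The sub-model idea can be sketched in a few lines: write the VAR as an innovations-form SS model, keep only some rows of the observation equation, and recover the innovation covariance of the reduced model from a Riccati equation. This is a minimal illustration under stated assumptions, not the authors' code. The function names are hypothetical, the example VAR(1) coefficients are invented, and a plain fixed-point iteration (started from the stationary state covariance) stands in for a dedicated DARE solver.

```python
import numpy as np

def var_to_ss(A_lags, Sigma):
    """Innovations-form SS (A, C, K, V) of a VAR(p); the state stacks the p past samples."""
    p, M = len(A_lags), Sigma.shape[0]
    C = np.hstack(A_lags)
    A = np.zeros((p * M, p * M))
    A[:M, :] = C
    A[M:, :(p - 1) * M] = np.eye((p - 1) * M)   # shift older samples down
    K = np.zeros((p * M, M))
    K[:M, :] = np.eye(M)
    return A, C, K, Sigma

def submodel_innovation_cov(A, C, K, V, keep, tol=1e-13, max_iter=10000):
    """Innovation covariance of the sub-model observing only the components in `keep`."""
    C_s = C[keep, :]
    Q = K @ V @ K.T                  # state-noise covariance
    R = V[np.ix_(keep, keep)]        # observation-noise covariance
    S = K @ V[:, keep]               # state/observation noise cross-covariance
    P = Q.copy()
    for _ in range(max_iter):        # stationary state covariance (Lyapunov iteration)
        P_next = A @ P @ A.T + Q
        done = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if done:
            break
    for _ in range(max_iter):        # Riccati iteration towards the DARE solution
        G = A @ P @ C_s.T + S
        V_s = C_s @ P @ C_s.T + R
        P_next = A @ P @ A.T + Q - G @ np.linalg.solve(V_s, G.T)
        done = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if done:
            break
    return C_s @ P @ C_s.T + R

# Hypothetical bivariate VAR(1): x is autonomous, x drives y with lag 1.
A1 = np.array([[0.5, 0.0],
               [0.5, 0.0]])
Sigma = np.eye(2)
A, C, K, V = var_to_ss([A1], Sigma)

lam_x = submodel_innovation_cov(A, C, K, V, [0])[0, 0]   # variance of x given its own past
lam_y = submodel_innovation_cov(A, C, K, V, [1])[0, 0]   # variance of y given its own past
F_xy = np.log(lam_y / V[1, 1])   # Granger causality x -> y
F_yx = np.log(lam_x / V[0, 0])   # Granger causality y -> x
print(f"F_x->y = {F_xy:.4f}, F_y->x = {F_yx:.4f}")
```

Both restricted variances come from the same VAR parameters; no second model is fitted to the data.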

PROBLEMS WHICH WEREN’T

https://arxiv.org/abs/1708.08001

https://arxiv.org/abs/1708.06990

BIAS AND VARIANCE REDUCED IN SS FORMULATION

MULTISCALE ANALYSIS OF TIME SERIES: CHANGE OF TIME SCALE

Observed time series: $Y_n = \{x_n, y_n\},\ n = 1, \ldots, N$

RESCALING (scale factor $\tau$):

• Traditional procedure for rescaling [M Costa et al, Phys. Rev. Lett. 89, 2002]

$$\bar{x}_n = \frac{1}{\tau}\sum_{l=0}^{\tau-1} x_{n\tau-l}, \quad \bar{y}_n = \frac{1}{\tau}\sum_{l=0}^{\tau-1} y_{n\tau-l}, \quad \bar{Y}_n = \{\bar{x}_n, \bar{y}_n\},\ n = 1, \ldots, N/\tau$$

Example: $N = 9$, $\tau = 3$: $\bar{x}_3 = \frac{x_7 + x_8 + x_9}{3}$

• Rescaling can be seen as a two-step procedure [J. Valencia et al, IEEE Trans. Biomed Eng. 56, 2009]

1) AVERAGING: $\tilde{Y}_n = \frac{1}{\tau}\sum_{l=0}^{\tau-1} Y_{n-l},\ n = \tau, \ldots, N$

2) DOWNSAMPLING: $\bar{Y}_n = \tilde{Y}_{n\tau},\ n = 1, \ldots, N/\tau$

Example: $N = 9$, $\tau = 3$: $\tilde{x}_9 = \frac{x_7 + x_8 + x_9}{3}$, $\bar{x}_3 = \tilde{x}_9$
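The equivalence between the traditional coarse-graining and the two-step view can be checked on the N = 9, scale 3 example. The sketch below is a minimal illustration; the function names `average`, `downsample` and `rescale` are hypothetical and the input series 1, …, 9 is made up.

```python
import numpy as np

def average(x, tau):
    """Step 1: moving average over the current and the tau-1 previous samples.

    Returns the averaged series for n = tau, ..., N (length N - tau + 1).
    """
    kernel = np.ones(tau) / tau
    return np.convolve(x, kernel, mode="valid")

def downsample(x_avg, tau):
    """Step 2: keep one averaged sample every tau (those at n = tau, 2*tau, ...)."""
    return x_avg[::tau]

def rescale(x, tau):
    """Two-step rescaling: averaging followed by downsampling."""
    return downsample(average(x, tau), tau)

x = np.arange(1.0, 10.0)        # x_1, ..., x_9
tau = 3
two_step = rescale(x, tau)
traditional = x.reshape(-1, tau).mean(axis=1)   # means of non-overlapping windows
print(two_step, traditional)
```

For this input both procedures give 2, 5, 8; the last value is the mean of $x_7$, $x_8$, $x_9$. Separating the two steps makes it possible to replace the moving average with another low-pass filter without touching the downsampling.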

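MULTISCALE REPRESENTATION OF LINEAR PROCESSES USING STATE SPACE MODELS

The State Space model defining the multivariate linear process after rescaling can be obtained from the original VAR parameters and the scale factor.

Observed time series: $\mathrm{VAR}(A, \Sigma)$

$$Y_n = \sum_{k=1}^{p} A_k Y_{n-k} + U_n, \quad \Sigma = \mathrm{cov}(U_n)$$

AVERAGING

Averaged time series: $\mathrm{VARMA}(A, B, \Sigma)$

$$\tilde{Y}_n = \sum_{k=1}^{p} A_k \tilde{Y}_{n-k} + \sum_{l=0}^{\tau-1} B_l U_{n-l}, \quad B_l = \frac{1}{\tau} I,\ l = 0, 1, \ldots, \tau-1$$

Equivalent state space model $\mathrm{SS}(\tilde{A}, \tilde{C}, \tilde{K}, \tilde{V})$ [1], with $q = \tau - 1$:

$$\tilde{Z}_{n+1} = \tilde{A}\tilde{Z}_n + \tilde{K}\tilde{E}_n, \qquad \tilde{Y}_n = \tilde{C}\tilde{Z}_n + \tilde{E}_n$$

$$\tilde{A} = \begin{bmatrix}
A_1 & \cdots & A_{p-1} & A_p & B_1 & \cdots & B_{q-1} & B_q \\
I & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 \\
\vdots & \ddots & \vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & \cdots & I & 0 & 0 & \cdots & 0 & 0 \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 \\
0 & \cdots & 0 & 0 & I & \cdots & 0 & 0 \\
\vdots & & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & \cdots & 0 & 0 & 0 & \cdots & I & 0
\end{bmatrix}$$

$$\tilde{C} = [A_1 \cdots A_p\ B_1 \cdots B_q], \qquad \tilde{K} = [I\ 0 \cdots 0\ B_0^{-T}\ 0 \cdots 0]^T, \qquad \tilde{V} = \mathrm{cov}(\tilde{E}_n) = B_0 \Sigma B_0^T$$

DOWNSAMPLING

Rescaled time series: the downsampled process has a general-form state space model $\mathrm{SS}(A, C, Q, R, S)$ [2],

$$X_{n+1} = A X_n + W_n, \qquad Y_n = C X_n + V_n$$

with $Q = \mathrm{cov}(W_n)$, $R = \mathrm{cov}(V_n)$, $S = \mathrm{cov}(W_n, V_n)$ and parameters

$$A_\tau = \tilde{A}^\tau, \quad C_\tau = \tilde{C}, \quad R_\tau = \tilde{V}, \quad S_\tau = \tilde{A}^{\tau-1}\tilde{K}\tilde{V}, \quad Q_\tau = \tilde{A} Q_{\tau-1} \tilde{A}^T + \tilde{K}\tilde{V}\tilde{K}^T$$

The general form is converted to the innovations form $\mathrm{SS}(A, C, K, V)$ by solving a Discrete Algebraic Riccati Equation [2,3]:

$$Z_{n+1} = A Z_n + K E_n, \qquad Y_n = C Z_n + E_n$$

[1] Aoki & Havenner, Econ. Rev. 10, 1991

[2] Solo, Neural Comp 28, 2016

[3] Barnett & Seth, Phys. Rev. E 91, 2015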
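The averaging step can be checked numerically: a state space model built from the VAR coefficients and the scale factor, driven by the rescaled innovations, should reproduce the moving average of the simulated VAR sample by sample. The sketch below covers only this averaging step and rests on assumptions: the function name `averaged_ss`, the VAR(2) coefficients and the zero pre-history used for the simulation are all hypothetical choices.

```python
import numpy as np

def averaged_ss(A_lags, tau):
    """SS matrices (A, C, K) of the averaged VAR, with B_l = I/tau and q = tau - 1.

    Requires tau >= 2. The state stacks p past averaged samples and q past innovations.
    """
    p, M = len(A_lags), A_lags[0].shape[0]
    q = tau - 1
    B0 = np.eye(M) / tau
    C = np.hstack(list(A_lags) + [B0] * q)
    n = M * (p + q)
    A = np.zeros((n, n))
    A[:M, :] = C                                                   # newest averaged sample
    A[M:p * M, :(p - 1) * M] = np.eye((p - 1) * M)                 # shift averaged samples
    A[(p + 1) * M:, p * M:(p + q - 1) * M] = np.eye((q - 1) * M)   # shift innovations
    K = np.zeros((n, M))
    K[:M, :] = np.eye(M)
    K[p * M:(p + 1) * M, :] = np.linalg.inv(B0)                    # recovers U_n from B0 @ U_n
    return A, C, K, B0

# Hypothetical bivariate VAR(2), simulated with zero pre-history.
A_lags = [np.array([[0.5, 0.0], [0.25, 0.2]]),
          np.array([[-0.2, 0.0], [0.0, 0.1]])]
tau, N, M = 3, 200, 2
rng = np.random.default_rng(1)
U = rng.standard_normal((N, M))
Y = np.zeros((N, M))
for n in range(N):
    Y[n] = U[n]
    for k, A_k in enumerate(A_lags, start=1):
        if n - k >= 0:
            Y[n] += A_k @ Y[n - k]

# Direct moving average (samples before the start are taken as zero).
Y_pad = np.vstack([np.zeros((tau - 1, M)), Y])
Y_avg = np.array([Y_pad[n:n + tau].mean(axis=0) for n in range(N)])

# Same series generated by the state space model driven by E_n = B0 @ U_n.
A, C, K, B0 = averaged_ss(A_lags, tau)
Z = np.zeros(A.shape[0])
Y_ss = np.zeros((N, M))
for n in range(N):
    E = B0 @ U[n]
    Y_ss[n] = C @ Z + E
    Z = A @ Z + K @ E

print(np.allclose(Y_avg, Y_ss))
```

The block of K that holds the inverse of $B_0$ is what lets the state keep track of the past innovations $U_{n-1}, \ldots, U_{n-q}$ needed by the moving-average part.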

MULTISCALE COMPUTATION OF INFORMATION DYNAMICS FOR VAR PROCESSES

SIMULATIONS

• Unidirectional interaction: $x_n$ is generated from its own past plus the innovation $u_n$, and $y_n$ from the past of $x$ plus the innovation $w_n$. Diagram labels, coefficient (lag): 0.5 (2), 0.25 (1)

[Plots: Information Storage ($S_x$, $S_y$) and Information Transfer ($T_{xy}$, $T_{yx}$)]

• Bidirectional interaction: $x_n$ is generated from the past of $x$ and $y$ plus $u_n$, and $y_n$ from the past of $y$ and $x$ plus $w_n$. Diagram labels, coefficient (lag): 0.25 (1), 0.75 (3), 0.25 (5), 0.5 (7)

[Plots: Information Storage ($S_x$, $S_y$) and Information Transfer ($T_{xy}$, $T_{yx}$)]

• Averaging step: introduces autocorrelations (Storage); does not alter causal interactions (Transfer)

• Downsampling step: removes autocorrelations; elicits scale-dependent causal interactions

TE peaks at scales compatible with the interaction delay
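The effect of the two steps on autocorrelation can be made concrete in the simplest case. The sketch below assumes unit-variance white noise as input (an assumption made only to keep the result analytic; it is not one of the simulated systems above) and uses a hypothetical function name. It computes the autocorrelation of the moving-averaged series from the filter coefficients, then reads off the lags that survive downsampling.

```python
import numpy as np

def averaged_white_noise_autocorr(tau, max_lag):
    """Autocorrelation (lags 0..max_lag) of a tau-point moving average of white noise."""
    h = np.ones(tau) / tau
    acov = np.correlate(h, h, mode="full")[tau - 1:]          # lags 0 .. tau-1
    acov = np.concatenate([acov, np.zeros(max(0, max_lag + 1 - tau))])
    return acov[:max_lag + 1] / acov[0]

tau = 3
rho_avg = averaged_white_noise_autocorr(tau, max_lag=6)   # after averaging
rho_down = rho_avg[::tau]                                  # lags 0, tau, 2*tau after downsampling
print(rho_avg)
print(rho_down)
```

After averaging, the autocorrelation decays linearly as $(\tau - k)/\tau$ for lags $k < \tau$; after downsampling by $\tau$ only lags that are multiples of $\tau$ remain, and those are zero for white noise.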


MULTISCALE INTERACTIONS IN CLIMATOLOGY

a) Modern climate data: global land-ocean temperature index (GT) and CO2 concentration, measured at monthly resolution from March 1958 to February 2017 (708 data points)

b) Paleoclimate data: GT and CO2 concentration from the Vostok Ice Core data, extended by the EPICA Dome C data, which go back to 800,000 years ago

L Faes, S Stramaglia, G Nollo, D Marinazzo, ‘Multiscale Granger causality’, ArXiv 2017 https://arxiv.org/abs/1703.08487

JOINT INFORMATION

In the presence of two sources Yi and Yk, and a target Yj, we want to quantify the information transferred to Yj from the sources Yi and Yk taken together

TRANSFER ENTROPY

JOINT TRANSFER ENTROPY

Interaction Information Decomposition (IID) Partial Information Decomposition (PID)

L Faes, D Marinazzo, S Stramaglia, 'Multiscale information decomposition: exact computation for multivariate Gaussian processes', Entropy, special issue on Multivariate entropy measures and their applications, 2017, 19(8), 408.

PARTIAL INFORMATION DECOMPOSITION

Synergy and redundancy as mutually exclusive phenomena


PARTIAL INFORMATION DECOMPOSITION

Distinct non-negative measures of redundancy and synergy, thereby accounting for the possibility that redundancy and synergy may coexist as separate elements of information modification.

The interaction TE is actually a measure of the ‘net’ synergy manifested in the transfer of information from the two sources to the target.

PID components cannot be obtained through classic information theory by simply subtracting conditional MI terms: one more relation is needed to solve for all the quantities. Shannon information theory does not uniquely determine this decomposition.

Redundancy is defined as the minimum of the information provided by each individual source to the target

This choice satisfies the desirable property that the redundant TE is independent of the correlation between the source processes.
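The relations described on these slides can be collected in a small helper. The sketch below assumes the standard two-source PID structure, which the slides state only in words: the joint TE is the sum of two unique terms, redundancy and synergy, and each individual TE is its unique term plus redundancy. Redundancy is taken as the minimum of the two individual TEs, as stated above. The function name and the numerical inputs are hypothetical.

```python
def decompose(T_i, T_k, T_ik):
    """IID and PID terms from the individual TEs (T_i, T_k) and the joint TE (T_ik)."""
    interaction = T_ik - T_i - T_k          # IID: interaction TE (net synergy)
    redundancy = min(T_i, T_k)              # PID: minimum-information redundancy
    unique_i = T_i - redundancy
    unique_k = T_k - redundancy
    synergy = T_ik - unique_i - unique_k - redundancy
    return {
        "interaction": interaction,
        "redundancy": redundancy,
        "unique_i": unique_i,
        "unique_k": unique_k,
        "synergy": synergy,
    }

# Made-up TE values in nats, for illustration only.
terms = decompose(T_i=0.2, T_k=0.3, T_ik=0.6)
print(terms)
print(terms["synergy"] - terms["redundancy"], terms["interaction"])
```

Under these assumptions synergy minus redundancy always equals the interaction TE, which is the sense in which the interaction TE measures net synergy.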


MULTISCALE INFORMATION DECOMPOSITION

• Validation on simulated linear stochastic processes

[Figure: simulation scheme and exact profiles of IID and PID measures; panels: Interaction Information Decomposition (IID), Partial Information Decomposition (PID)]

MULTISCALE ID IN EPILEPSY

We look at 64 cortical electrodes as targets, and two depth hippocampal electrodes (11 and 12) as drivers [M. Kramer et al., Epilepsy Research 79, 173-186, 2008]



• Faes et al., Multiscale Granger Causality, ArXiv 2017 https://arxiv.org/abs/1703.08487

• Faes et al., Multiscale Information Decomposition: Exact Computation for Multivariate Gaussian Processes, Entropy 2017, 19(8), 408; doi:10.3390/e19080408

• Faes et al., On the interpretability and computational reliability of frequency-domain Granger causality, F1000Research 2017 https://f1000research.com/articles/6-1710/v1

• https://github.com/danielemarinazzo - www.lucafaes.net

THANKS
