
Big Data & (Financial) Networks

Daniele Tantari

Scuola Normale Superiore, Pisa

October 11, 2017


Motivations

- Big data: big opportunities, together with new technological, methodological and conceptual challenges.

- Extracting information from a sea of noise.

- Complex-network representation of data: the role of interactions, beyond individuality.

- From macro to micro: classification and prediction.


Financial networks: general tasks

- Finance provides a large set of situations where networks are present or can be constructed from data.

- Examples: interbank networks, trading networks, payment networks, shareholder networks, control networks, etc.

- Many other examples come from time series: correlation networks, partial correlation networks, Granger causality networks, etc.

- Prediction on nodes: signal propagation (risk/default), central nodes, network vulnerability.

- Prediction on links: network formation/evolution, reconstruction from partial observations, recommendation systems.
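As an illustration of how a network can be constructed from time series, the sketch below builds a simple correlation network: two assets are linked when the absolute Pearson correlation of their returns exceeds a threshold. The function names, the threshold and the toy return series are assumptions made for this example, not part of the talk.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def correlation_network(series, threshold=0.5):
    """Undirected edges (i, j), i < j, whose |correlation| >= threshold."""
    names = sorted(series)
    edges = set()
    for k, i in enumerate(names):
        for j in names[k + 1:]:
            if abs(pearson(series[i], series[j])) >= threshold:
                edges.add((i, j))
    return edges


# Toy return series (made-up numbers, for illustration only).
returns = {
    "A": [0.01, -0.02, 0.03, 0.00, -0.01],
    "B": [0.02, -0.03, 0.05, 0.01, -0.02],  # moves together with A
    "C": [0.01, 0.01, -0.01, 0.02, -0.02],  # weakly related to A and B
}
edges = correlation_network(returns, threshold=0.8)
```

With these toy series only the pair (A, B) passes the threshold. In practice the threshold, or a statistical validation of each correlation, is a modelling choice.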


Financial networks: outline of the talk

In this presentation I will discuss two financial applications:

- Local risk vs. global network properties, with an application to the study of the Unicredit payment network (prediction on nodes).

- Temporal networks and link forecasting, with an application to preferential trading in interbank networks (prediction on links).


Payment Network and Credit Risk Rating

(E. Letizia and F. Lillo)


THE CORPORATE PAYMENT NETWORK

Source: Unicredit

Large proprietary dataset of payments between Italian firms: 2.4M companies, 47M payments in 2014.

Includes information on risk rating for a large fraction of companies.


Is the position of a company in the payment network informative about its riskiness beyond its local properties?

[Figure: example payment network linking Companies A to I, with a legend for low, medium and high risk.]

- How informative are your neighbours about your risk?

- What is the relation between the macro-organization of the network and the risk distribution?


Degree

Strong relation between degree and risk.

Assortativity

Using the three risk classes,

\[ B = \frac{\sum_i e_{ii} - \sum_i a_i b_i}{1 - \sum_i a_i b_i} \]

where $e_{ij}$ is the fraction of edges connecting vertices of type i and j, $a_i = \sum_j e_{ij}$ and $b_j = \sum_i e_{ij}$.

Homophily of risk. $B > 0$: neighbours of firms in the payment network tend to have a similar risk profile.
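The assortativity coefficient B defined on this slide can be computed directly from an edge list and a risk label per firm. The sketch below is a minimal illustration: the firm names, risk labels and payments are hypothetical toy data, and the helper names `mixing_matrix` and `assortativity` are chosen for this example only.

```python
def mixing_matrix(edges, risk_class, n_classes=3):
    """Fraction e[i][j] of edges going from a class-i firm to a class-j firm."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for payer, payee in edges:
        counts[risk_class[payer]][risk_class[payee]] += 1
    total = len(edges)
    return [[c / total for c in row] for row in counts]


def assortativity(e):
    """B = (sum_i e_ii - sum_i a_i b_i) / (1 - sum_i a_i b_i)."""
    n = len(e)
    a = [sum(e[i][j] for j in range(n)) for i in range(n)]  # row sums a_i
    b = [sum(e[i][j] for i in range(n)) for j in range(n)]  # column sums b_j
    trace = sum(e[i][i] for i in range(n))
    ab = sum(a[i] * b[i] for i in range(n))
    return (trace - ab) / (1.0 - ab)


# Hypothetical toy data: 0 = low, 1 = medium, 2 = high risk.
risk_class = {"A": 0, "B": 0, "C": 1, "D": 1, "E": 2, "F": 2}
edges = [("A", "B"), ("B", "A"), ("C", "D"), ("D", "C"), ("E", "F"), ("A", "C")]
B = assortativity(mixing_matrix(edges, risk_class))  # 17/23 for this toy network
```

Five of the six toy payments stay within a risk class, so B is positive; a network whose payments ignore risk class would give B close to 0.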


Ranking hierarchical organization

- Strong hierarchical structure of the payment network: data-driven identification of supply chains.

- A high concentration of risk in low-class nodes (top of the hierarchy) could trigger a cascade of distress in the higher-rank classes.

Machine learning risk classifier!

- Evaluation of new clients from partial information.

- New measure of risk based on network properties.


Random models for temporal networks and link forecasting

(with P. Mazzarisi, P. Barucca, and F. Lillo)


Research question

Many networks are inherently dynamic, as links are created and destroyed through time.

- Preferential relations between nodes tend to preserve past links (if we were friends yesterday, we will be friends today).

- Node-specific properties can drive the evolution of the network topology (two sociable people are more likely to be friends).

We propose a novel methodology for modelling temporal networks subject to link persistence and time-varying node characteristics, and for disentangling their roles.

- How do node characteristics and preferential trading shape a financial network, and how can the two linking mechanisms be accounted for in a statistical model of temporal networks?

- A proper model of the network dynamics allows short-term link prediction.


Link persistence

- We model the tendency of a link that does (or does not) exist at time $t-1$ to continue existing (or not existing) at time $t$.

- Discrete AutoRegressive DAR(1) model:

\[ P(A^t \mid A^{t-1}, \alpha, \chi) = \prod_{i,\, j>i} \left( \alpha_{ij}\, \mathbb{I}_{A_{ij}^{t} A_{ij}^{t-1}} + (1-\alpha_{ij})\, \chi_{ij}^{A_{ij}^{t}} (1-\chi_{ij})^{1-A_{ij}^{t}} \right) \]

- The larger $\alpha_{ij}$ is, the more persistent the link between i and j.
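To make the copy mechanism concrete, the sketch below simulates a single link under a DAR(1) rule: with probability alpha the previous state is copied, otherwise a fresh Bernoulli draw is made. It reads the indicator in the formula as 1 when the link state is unchanged, and it uses `chi` as the name of the fresh-draw probability; both the function names and the parameter values are assumptions made for illustration.

```python
import random


def dar1_transition_prob(a_t, a_prev, alpha, chi):
    """One factor of the DAR(1) product: P(A_ij^t = a_t | A_ij^{t-1} = a_prev)."""
    copy_term = alpha * (1.0 if a_t == a_prev else 0.0)
    fresh_term = (1.0 - alpha) * (chi if a_t == 1 else 1.0 - chi)
    return copy_term + fresh_term


def simulate_dar1(alpha, chi, T, rng, a0=0):
    """Simulate T steps of one link, starting from state a0."""
    path = [a0]
    for _ in range(T):
        if rng.random() < alpha:
            path.append(path[-1])  # copy the past state
        else:
            path.append(1 if rng.random() < chi else 0)  # fresh Bernoulli(chi) draw
    return path


def persistence(path):
    """Fraction of time steps in which the link state did not change."""
    same = sum(1 for x, y in zip(path, path[1:]) if x == y)
    return same / (len(path) - 1)


rng = random.Random(0)
sticky = simulate_dar1(alpha=0.9, chi=0.3, T=5000, rng=rng)
loose = simulate_dar1(alpha=0.1, chi=0.3, T=5000, rng=rng)
```

Comparing `persistence(sticky)` with `persistence(loose)` shows the effect of alpha: the link with the larger alpha changes state far less often, even though both share the same `chi`.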


Fitness dynamics

- The dynamic parameter $\theta_i^t$ (fitness) of node i is latent and describes its tendency to create links.

- We extend the fitness model to the dynamic case.

- We model it as a hidden Markov chain.


Fitness dynamics (2)

- Each node i is characterized by a quantity $\theta_i^t$, i.e. the node fitness. We assume that it follows an AR(1) process,

\[ \theta_i^t = \phi_{0,i} + \phi_{1,i}\, \theta_i^{t-1} + \epsilon_i^t, \]

with $\phi_{0,i} \in \mathbb{R}$, $|\phi_{1,i}| < 1$ and $\epsilon_i^t \sim \mathrm{NID}(0, \sigma_i)$ with $\sigma_i > 0$.

- We define the link probability at time t as:

\[ P(A_{ij}^t = 1 \mid \theta_i^t, \theta_j^t) = \frac{e^{(\theta_i^t + \theta_j^t)}}{1 + e^{(\theta_i^t + \theta_j^t)}} \]

- The larger $\theta_i^t$ is, the larger the probability of every link incident to node i.
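The two ingredients of this slide, an AR(1) fitness and a logistic link probability, can be sketched in a few lines. This is an illustrative simulation only: the parameter values are made up, and `sigma` is treated here as the standard deviation of the noise, which is an assumption since the slide does not say whether it denotes a variance or a standard deviation.

```python
import math
import random


def simulate_fitness(phi0, phi1, sigma, T, rng):
    """AR(1) path theta^t = phi0 + phi1 * theta^{t-1} + eps^t, with |phi1| < 1.

    sigma is used as the noise standard deviation (assumption).
    The path starts from the stationary mean phi0 / (1 - phi1).
    """
    theta = phi0 / (1.0 - phi1)
    path = []
    for _ in range(T):
        theta = phi0 + phi1 * theta + rng.gauss(0.0, sigma)
        path.append(theta)
    return path


def link_probability(theta_i, theta_j):
    """Logistic link probability e^(ti + tj) / (1 + e^(ti + tj))."""
    return 1.0 / (1.0 + math.exp(-(theta_i + theta_j)))


rng = random.Random(42)
theta_i = simulate_fitness(phi0=0.2, phi1=0.8, sigma=0.1, T=50, rng=rng)
theta_j = simulate_fitness(phi0=-0.3, phi1=0.5, sigma=0.2, T=50, rng=rng)
# Time-varying probability of the link (i, j).
p_ij = [link_probability(a, b) for a, b in zip(theta_i, theta_j)]
```

Because the probability depends only on the sum of the two fitnesses, raising the fitness of node i raises the probability of every link incident to i at once.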


Dynamic fitness + link persistence

We combine the hidden dynamics of $\theta_i^t$ (fitness) with the mechanism of copying from the past (link persistence).
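The slide does not spell out the combined likelihood. One natural reading, assumed here for illustration, keeps the copy-from-the-past mechanism with probability alpha and, otherwise, draws the link from the fitness-based logistic probability. The function names below are hypothetical.

```python
import math
import random


def fitness_link_probability(theta_i, theta_j):
    """Logistic link probability driven by the two node fitnesses."""
    return 1.0 / (1.0 + math.exp(-(theta_i + theta_j)))


def combined_link_probability(a_prev, alpha, theta_i, theta_j):
    """P(A_ij^t = 1 | A_ij^{t-1} = a_prev) under the assumed combination:
    copy the past with probability alpha, else use the fitness probability."""
    p_fit = fitness_link_probability(theta_i, theta_j)
    return alpha * a_prev + (1.0 - alpha) * p_fit


def sample_link(a_prev, alpha, theta_i, theta_j, rng):
    """Draw the next state (0 or 1) of one link."""
    p = combined_link_probability(a_prev, alpha, theta_i, theta_j)
    return 1 if rng.random() < p else 0


rng = random.Random(7)
next_state = sample_link(a_prev=1, alpha=0.6, theta_i=-1.0, theta_j=0.5, rng=rng)
```

Setting alpha to 0 recovers a pure fitness model, and setting it to 1 freezes the link in its previous state, which is what lets the two mechanisms be separated when both are estimated from data.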


Empirical results

We investigate two aspects:

- How strong is preferential trading between two banks once their propensity to trade is accounted for (using fitness)?

- Link prediction.

The investigated database is the electronic Market of Interbank Deposit (e-MID), an electronic segment of the Italian interbank market.

We focus on the time series of weekly aggregated, unweighted and directed adjacency matrices $A^t$: $A_{ij}^t = 1$ if bank i lends money to bank j at least once during week t.
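The weekly aggregation described above can be sketched as follows. The input format (a list of dated lender/borrower records), the bank labels and the use of ISO calendar weeks are assumptions made for this example; the actual e-MID data layout is not described in the slides.

```python
from collections import defaultdict
from datetime import date


def weekly_adjacency(trades):
    """Map (ISO year, ISO week) -> set of directed links (lender, borrower).

    A link (i, j) is present in week t if i lent to j at least once that week;
    repeated trades and amounts are ignored (unweighted network).
    """
    weeks = defaultdict(set)
    for day, lender, borrower in trades:
        iso = day.isocalendar()
        weeks[(iso[0], iso[1])].add((lender, borrower))
    return dict(weeks)


# Hypothetical toy trades: (date, lender, borrower).
trades = [
    (date(2014, 1, 6), "A", "B"),
    (date(2014, 1, 8), "A", "B"),   # second loan in the same week: same link
    (date(2014, 1, 9), "B", "C"),
    (date(2014, 1, 13), "C", "A"),  # falls in the following week
]
snapshots = weekly_adjacency(trades)
```

Each value in `snapshots` plays the role of one binary directed adjacency matrix, stored as an edge set rather than a dense matrix.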


What is fitness measuring?

[Figure: bank exposure for lender '3' in EURO (mln), from Aug 2012 to Feb 2015, compared with $\delta e^{\theta_{3,t}}$.]

- Correlations between $x_i^t \equiv e^{\theta_i^t}$ and the bank's exposure in the weighted network.

- We obtain information on the weighted e-MID network from the binary information alone.

Disentangle random and preferential trading

- The DAR(1) model, which does not account for the time-evolving network topology (fitness), tends to overestimate link stability.

- Statistical validation of preferential trading against the null (fitness only).

[Figure: plot over $\alpha_{ij}$ in [0, 1] comparing DAR-TGRG and DAR(1).]


Out-of-sample link prediction for e-MID

[Figure: ROC curves (sensitivity vs. specificity) for TGRG (AUC ≈ 0.83), DAR-TGRG (AUC ≈ 0.85) and DAR(1) (AUC ≈ 0.80).]

[Figure: AUC as a function of the threshold value for $\alpha_{ij}$, for TGRG, DAR-TGRG and DAR(1).]

- Out-of-sample analysis for the e-MID interbank market.

- DAR-TGRG outperforms the TGRG and DAR(1) network models.

- On average, network topology is more important than link stability for link prediction in e-MID.
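The AUC used to compare the models can be computed from predicted link probabilities and observed links. The sketch below uses the pairwise (rank-based) definition of the AUC, the probability that a randomly chosen existing link receives a higher score than a randomly chosen missing one. The scores and labels are made-up toy values, not results from the talk.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    labels are 1 for links that exist and 0 for links that do not;
    ties between a positive and a negative score count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy out-of-sample check: predicted probabilities vs. observed links.
predicted = [0.9, 0.8, 0.3, 0.6, 0.2]
observed = [1, 0, 1, 0, 0]
auc = roc_auc(predicted, observed)  # 4 of the 6 positive/negative pairs are ordered correctly
```

The double loop is quadratic in the number of candidate links; for a full interbank network a sort-based implementation would be preferable, but the value returned is the same.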


Conclusions

- Financial networks are a powerful tool to study the (dynamic) interaction of a large set of financial agents.

- Financial networks are the channels of risk propagation. There are interesting interplays between idiosyncratic risk and local or global network topology.

- Random network models are a powerful tool for:
  - modeling temporal networks and forecasting links;
  - inferring large-scale network structures.
