Big Data & (Financial) Networks
Daniele Tantari
Scuola Normale Superiore, Pisa
October 11, 2017
Motivations
- Big data: big opportunities together with new technological, methodological, and conceptual challenges.
- Extract information from a sea of noise.
- Complex-network representation of data: the role of interactions, beyond individuality.
- From macro to micro: classification and prediction.
Financial networks: general tasks
- Finance provides a large set of situations where networks are present or can be constructed from data.
- Examples: interbank networks, trading networks, payment networks, shareholder networks, control networks, etc.
- Many other examples come from time series: correlation networks, partial correlation networks, Granger causality networks, etc.
- Prediction on nodes: signal propagation (risk/default), central nodes, network vulnerability.
- Prediction on links: network formation/evolution, reconstruction from partial observations, recommendation systems.
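As an illustration of how a network can be constructed from time series, here is a minimal sketch of a thresholded correlation network. The function name `correlation_network`, the cut-off `threshold=0.5`, and the synthetic data are all assumptions made for this example; they do not come from the talk.

```python
import numpy as np

def correlation_network(returns, threshold=0.5):
    """Binary adjacency matrix linking series whose absolute Pearson
    correlation exceeds `threshold` (a hypothetical cut-off)."""
    corr = np.corrcoef(returns, rowvar=False)   # columns are the series
    adj = (np.abs(corr) > threshold).astype(int)
    np.fill_diagonal(adj, 0)                    # no self-loops
    return adj

# Synthetic data: two series share a common factor, the third is pure noise.
rng = np.random.default_rng(0)
common = rng.normal(size=500)
returns = np.column_stack([
    common + 0.1 * rng.normal(size=500),        # series 0: driven by the factor
    common + 0.1 * rng.normal(size=500),        # series 1: driven by the factor
    rng.normal(size=500),                       # series 2: independent
])
adj = correlation_network(returns)
print(adj)
```

Only series 0 and 1 end up linked. A hard threshold is the simplest possible filter; in practice the choice of cut-off (or of a statistical filtering method) strongly affects the resulting topology.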
Financial networks: outline of the talk
In this presentation I will discuss two financial applications:
- Local risk vs. global network properties, with an application to the study of the Unicredit payment network (prediction on nodes).
- Temporal networks and link forecasting, with an application to preferential trading in interbank networks (prediction on links).
Payment Network and Credit Risk Rating
(E. Letizia and F. Lillo)
THE CORPORATE PAYMENT NETWORK
Source: Unicredit
Large proprietary dataset of payments between Italian firms: 2.4M companies, 47M payments in 2014.
Includes information on risk rating for a large fraction of companies.
Is the position of a company in the payment network informative about its riskiness, beyond its local properties?
[Figure: example payment network among Company A to Company I. Legend: low risk, medium risk, high risk.]
- How informative are your neighbours about your risk?
- What is the relation between the macro-organization of the network and the risk distribution?
Degree
Strong relation between degree and risk.
Assortativity
Using the three risk classes:
$$B = \frac{\sum_i \left( e_{ii} - a_i b_i \right)}{1 - \sum_i a_i b_i}$$
where $e_{ij}$ is the fraction of edges connecting vertices of type $i$ and $j$, $a_i = \sum_j e_{ij}$ and $b_j = \sum_i e_{ij}$.
Homophily of risk. B > 0: neighbours of firms in the payment network tend to have a similar risk profile.
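The assortativity coefficient above can be computed directly from an edge list and node labels. The sketch below assumes directed edges (payer, payee) and integer risk classes 0, 1, 2; the function name `assortativity` and the toy networks are hypothetical and are not the Unicredit data.

```python
import numpy as np

def assortativity(labels, edges, n_classes):
    """Assortativity coefficient B for categorical node labels.

    `edges` is a list of directed pairs (u, v); e[i, j] is the fraction of
    edges going from a class-i node to a class-j node."""
    e = np.zeros((n_classes, n_classes))
    for u, v in edges:
        e[labels[u], labels[v]] += 1
    e /= e.sum()
    a = e.sum(axis=1)                 # a_i = sum_j e_ij
    b = e.sum(axis=0)                 # b_j = sum_i e_ij
    ab = np.sum(a * b)
    return (np.trace(e) - ab) / (1.0 - ab)

# Toy network 1: payments only between firms of the same risk class.
labels_same = [0, 0, 1, 1, 2, 2]
edges_same = [(0, 1), (1, 0), (2, 3), (3, 2), (4, 5), (5, 4)]
b_same = assortativity(labels_same, edges_same, n_classes=3)

# Toy network 2: payments only between firms of different risk classes.
labels_mixed = [0, 0, 1, 1]
edges_mixed = [(0, 2), (2, 0), (1, 3), (3, 1)]
b_mixed = assortativity(labels_mixed, edges_mixed, n_classes=2)

print(b_same, b_mixed)
```

The first toy network is perfectly homophilous (B = 1), the second perfectly disassortative (B = -1); a real payment network with B > 0 sits between 0 and 1.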
Ranking hierarchical organization
- Strong hierarchical structure of the payment network: data-driven identification of supply chains.
- A high concentration of risk in low-class nodes (top of the hierarchy) could trigger a cascade of distress in the higher-rank classes.
Machine Learning risk classifier!
- Evaluation of new clients from partial information.
- New measure of risk based on network properties.
Random models for temporal networks and link forecasting
(with P. Mazzarisi, P. Barucca, and F. Lillo)
Research question
Many networks are inherently dynamic, as links are created and destroyed through time.
- Preferential relations between nodes tend to preserve past links (if we were friends yesterday, we will be friends today).
- Node-specific properties can drive the evolution of the network topology (two sociable people are more likely to be friends).
We propose a novel methodology for modelling temporal networks subject to link persistence and time-varying node characteristics, and for disentangling their roles.
- How do node characteristics and preferential trading shape a financial network, and how can the two linking mechanisms be accounted for in a statistical model of temporal networks?
- Proper modelling of network dynamics allows short-term link prediction.
Link persistence
- We model the tendency of a link that does (or does not) exist at time t − 1 to continue existing (or not existing) at time t.
- Discrete AutoRegressive DAR(1) model:
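$$P(A^t \mid A^{t-1}, \alpha, \chi) = \prod_{i,\, j>i} \left( \alpha_{ij}\, \mathbb{I}_{A^t_{ij} A^{t-1}_{ij}} + (1 - \alpha_{ij})\, \chi_{ij}^{A^t_{ij}} (1 - \chi_{ij})^{1 - A^t_{ij}} \right)$$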
- The larger $\alpha_{ij}$ is, the more persistent the link between i and j.
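The DAR(1) mechanism can be read as a simple rule for each link: with probability α copy yesterday's state, otherwise draw a fresh state with some marginal link probability. The sketch below simulates a single link under that reading. The names `simulate_dar1` and `persistence`, the parameter name `chi` for the marginal link probability, and the numerical values are assumptions made for illustration.

```python
import numpy as np

def simulate_dar1(alpha, chi, n_steps, rng):
    """Simulate one link under DAR(1): with probability `alpha` copy the
    previous state, otherwise draw a fresh Bernoulli(chi) state."""
    a = np.empty(n_steps, dtype=int)
    a[0] = rng.random() < chi                 # start from the marginal distribution
    for t in range(1, n_steps):
        if rng.random() < alpha:
            a[t] = a[t - 1]                   # copy from the past
        else:
            a[t] = rng.random() < chi         # fresh draw, independent of the past
    return a

def persistence(a):
    """Fraction of time steps in which the link state did not change."""
    return np.mean(a[1:] == a[:-1])

rng = np.random.default_rng(1)
low = simulate_dar1(alpha=0.1, chi=0.5, n_steps=5000, rng=rng)
high = simulate_dar1(alpha=0.9, chi=0.5, n_steps=5000, rng=rng)
print(persistence(low), persistence(high))
```

With `chi = 0.5` the probability of an unchanged state is α + (1 − α)/2, so the two runs should give values close to 0.55 and 0.95 respectively.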
Fitness dynamics
- The dynamic parameter $\theta^t_i$ (fitness) of node i is latent and describes its tendency to create links.
- We extend the fitness model to the dynamic case.
- We model it as a hidden Markov chain.
Fitness dynamics (2)
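- Each node i is characterized by a quantity $\theta^t_i$, i.e. the node fitness. We assume that it follows an AR(1) process,
$$\theta^t_i = \phi_{0,i} + \phi_{1,i}\, \theta^{t-1}_i + \epsilon^t_i$$
with $\phi_{0,i} \in \mathbb{R}$, $|\phi_{1,i}| < 1$ and $\epsilon^t_i \sim \mathrm{NID}(0, \sigma_i)$ with $\sigma_i > 0$.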
- We define the link probability at time t as:
$$P(A^t_{ij} = 1 \mid \theta^t_i, \theta^t_j) = \frac{e^{(\theta^t_i + \theta^t_j)}}{1 + e^{(\theta^t_i + \theta^t_j)}}$$
- The larger $\theta^t_i$ is, the larger the probability of all links incident to node i.
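The fitness dynamics and the resulting link probabilities can be sketched in a few lines. The function names, the parameter names `phi0`, `phi1`, `sigma` (AR(1) intercept, autoregressive coefficient, and noise scale), the three-node example, and the choice to treat `sigma` as a standard deviation are all assumptions made for this illustration.

```python
import numpy as np

def simulate_fitness(phi0, phi1, sigma, n_steps, rng):
    """Simulate theta_i^t = phi0_i + phi1_i * theta_i^{t-1} + eps_i^t for all nodes."""
    theta = np.empty((n_steps, len(phi0)))
    theta[0] = phi0 / (1.0 - phi1)            # start at the stationary mean
    for t in range(1, n_steps):
        eps = rng.normal(0.0, sigma)          # sigma used as a standard deviation (assumption)
        theta[t] = phi0 + phi1 * theta[t - 1] + eps
    return theta

def link_probabilities(theta_t):
    """P(A_ij^t = 1) = exp(theta_i + theta_j) / (1 + exp(theta_i + theta_j))."""
    s = theta_t[:, None] + theta_t[None, :]
    p = 1.0 / (1.0 + np.exp(-s))              # algebraically equivalent logistic form
    np.fill_diagonal(p, 0.0)                  # no self-loops
    return p

rng = np.random.default_rng(2)
phi0 = np.array([-0.5, 0.0, 0.5])             # stationary means: -2.5, 0, 2.5
phi1 = np.array([0.8, 0.8, 0.8])
sigma = np.array([0.1, 0.1, 0.1])
theta = simulate_fitness(phi0, phi1, sigma, n_steps=200, rng=rng)
p = link_probabilities(theta[-1])

# Sample one undirected snapshot: draw the upper triangle, then mirror it.
upper = np.triu(rng.random(p.shape) < p, k=1)
adjacency = (upper | upper.T).astype(int)
print(np.round(p, 3))
```

Node 2 has the largest fitness, so every link incident to it is more likely than the corresponding link incident to node 0.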
Dynamic fitness + link persistence
We combine the hidden dynamics of the node fitness with the mechanism of copying from the past (link persistence).
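One way the combination could be written is sketched below. It assumes that the marginal link probability of the DAR(1) model is replaced by the fitness-based link probability; this specific form is an assumption made here, not a formula taken from the slide. The indicator $\mathbb{I}$ equals 1 when $A^t_{ij} = A^{t-1}_{ij}$ and 0 otherwise.

```latex
P(A^t_{ij} \mid A^{t-1}_{ij}, \theta^t_i, \theta^t_j, \alpha_{ij})
  = \alpha_{ij}\, \mathbb{I}_{A^t_{ij} A^{t-1}_{ij}}
  + (1 - \alpha_{ij})\,
    \frac{e^{A^t_{ij} (\theta^t_i + \theta^t_j)}}{1 + e^{(\theta^t_i + \theta^t_j)}}
```

For $A^t_{ij} = 1$ the second term reduces to the fitness link probability, and for $A^t_{ij} = 0$ to its complement, so setting $\alpha_{ij} = 0$ recovers the pure fitness model.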
Empirical results
We investigate two aspects:
- How strong is preferential trading between two banks once their propensity to trade is accounted for (using fitness)?
- Link prediction.
The investigated database is the electronic Market of Interbank Deposit (e-MID), an electronic segment of the Italian interbank market.
We focus on the time series of weekly aggregated, unweighted, and directed adjacency matrices $A^t$: $A^t_{ij} = 1$ if bank i lends money at least once to bank j during week t.
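The weekly aggregation described above can be sketched as follows. The record layout `(day, lender, borrower)`, with days counted from 0 and banks indexed by integers, is hypothetical and is not the actual e-MID format; the function name `weekly_adjacency` is likewise an assumption.

```python
import numpy as np

def weekly_adjacency(trades, n_banks, n_weeks):
    """Build A[t, i, j] = 1 if bank i lent to bank j at least once in week t.

    `trades` is an iterable of (day, lender, borrower) tuples (hypothetical layout)."""
    A = np.zeros((n_weeks, n_banks, n_banks), dtype=int)
    for day, lender, borrower in trades:
        week = day // 7
        if week < n_weeks:
            A[week, lender, borrower] = 1     # unweighted: repeated loans count once
    return A

# Toy data: bank 0 lends to bank 1 twice in week 0, and so on.
trades = [(0, 0, 1), (3, 0, 1), (5, 2, 0), (8, 1, 2)]
A = weekly_adjacency(trades, n_banks=3, n_weeks=2)
print(A[0])
print(A[1])
```

Because the matrices are directed, `A[0, 0, 1]` is set while `A[0, 1, 0]` stays zero, and the two loans from bank 0 to bank 1 in week 0 collapse into a single binary link.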
What is fitness measuring?
[Figure: Bank exposure for lender '3'. x-axis: time, Aug 2012 to Feb 2015; y-axis: EURO (mln), 0 to 400. Series: bank exposure and $\delta\, e^{\theta_{3,t}}$.]
- Correlations between $x^t_i \equiv e^{\theta^t_i}$ and the bank's exposure in the weighted network.
- We obtain information on the weighted e-MID network having only the binary information.
Disentangle random and preferential trading
- The DAR(1) model, which does not account for the time-evolving network topology (fitness), tends to overestimate link stability.
- Statistical validation of preferential trading against the null (fitness only).
[Figure: DAR-TGRG vs. DAR(1). x-axis: $\alpha_{ij}$, 0 to 1; y-axis ticks: 0 to 2.5.]
Out-of-sample link prediction for e-MID
[Figure: ROC curves (x-axis: specificity; y-axis: sensitivity) for TGRG (AUC ≈ 0.83), DAR-TGRG (AUC ≈ 0.85), and DAR(1) (AUC ≈ 0.80).]
[Figure: AUC (y-axis, 0.65 to 0.95) versus the threshold value for $\alpha_{ij}$ (x-axis, 0 to 0.8) for TGRG, DAR-TGRG, and DAR(1).]
- Out-of-sample analysis for the e-MID interbank market.
- DAR-TGRG outperforms the TGRG and DAR(1) network models.
- On average, network topology is more important than link stability for link prediction in e-MID.
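For readers unfamiliar with the AUC used to compare the models, the sketch below computes it for a link-prediction task as the probability that a randomly chosen existing link receives a higher score than a randomly chosen missing link. The function name `auc_score` and the toy labels and scores are assumptions; they are not e-MID results.

```python
import numpy as np

def auc_score(y_true, scores):
    """AUC as the probability that a random existing link is scored higher
    than a random missing link (ties count one half)."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    diff = pos[:, None] - neg[None, :]        # all (existing, missing) pairs
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)

# Toy example: 1 = link present next week, scores = predicted link probabilities.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
auc = auc_score(y_true, scores)
print(auc)
```

An AUC of 0.5 corresponds to random guessing and 1.0 to a perfect ranking of links. The pairwise formulation is quadratic in the number of links, which is fine for a sketch but would need a rank-based implementation on large networks.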
Conclusions
- Financial networks are a powerful tool to study the (dynamic) interaction of a large set of financial agents.
- Financial networks are the channels of propagation of risk. Interesting interplay between idiosyncratic risk and local or global network topology.
- Random network models are a powerful tool for:
  - modelling temporal networks and forecasting links;
  - inference of large-scale network structures.