
Realized Networks

Christian Brownlees† Eulalia Nualart† Yucheng Sun†

October 2014

Abstract

In this work we introduce a lasso based regularization procedure for large dimensional realized covariance estimators of log-prices. The procedure consists of shrinking the off diagonal entries of the inverse realized covariance matrix towards zero. This technique produces covariance estimators that are positive definite and have a sparse inverse. We name the regularized estimator realized network, since estimating a sparse inverse covariance matrix is equivalent to detecting the partial correlation network structure of the log-prices. We focus in particular on applying this technique to the Multivariate Realized Kernel and the Two–Scales Realized Covariance estimators based on refresh time sampling. These are consistent covariance estimators that allow for market microstructure effects and asynchronous trading. The paper studies the large sample properties of the regularized estimators and establishes conditions for consistent estimation of the integrated covariance and for consistent selection of the partial correlation network. As a by–product of the theory, we also establish novel concentration inequalities for the Multivariate Realized Kernel estimator. The methodology is illustrated with an application to a panel of US bluechips throughout 2009. Results show that the realized network estimator outperforms its unregularized counterpart in an out–of–sample global minimum variance portfolio prediction exercise.

Keywords: Networks, Realized Covariance, Lasso

JEL: C13, C33, C52, C58

† Department of Economics and Business, Universitat Pompeu Fabra and Barcelona GSE, e-mail: [email protected], [email protected], [email protected]. We are grateful for useful comments to Peter Hansen, Nour Meddahi and Kevin Sheppard and seminar participants at the “Measuring and Modeling Financial Risk with High Frequency Data” conference (Florence, 19–20 June 2014) and the “Barcelona GSE Summer Forum: Time Series Analysis in Macro and Finance” workshop (Barcelona, 19–20 June 2014). Christian Brownlees acknowledges financial support from the Spanish Ministry of Science and Technology (Grant MTM2012-37195) and from the Spanish Ministry of Economy and Competitiveness, through the Severo Ochoa Programme for Centres of Excellence in R&D (SEV-2011-0075). Eulalia Nualart’s research is supported in part by the European Union programme FP7-PEOPLE-2012-CIG under grant agreement 333938.


1 Introduction

The covariance matrix of the log–prices of financial assets is a fundamental ingredient in many applications ranging from asset allocation to risk management. For more than a decade now the econometric literature has made a number of significant steps forward in the estimation of covariance matrices using financial high frequency data. This new generation of estimators, commonly known as realized covariance estimators, measures the daily covariance of log–prices precisely using intra-daily price information. The literature has proposed an extensive number of procedures that make it possible to estimate the covariance efficiently under general assumptions, such as the presence of market microstructure effects and asynchronous trading in the price data generating process.

Despite these significant leaps forward, carrying out inference on large covariance matrices is inherently challenging. As forcefully put forward by, inter alia, Ledoit and Wolf (2004), it is hard to estimate the covariance matrix precisely when the number of assets is large. Another hurdle that arises in the analysis of large multivariate systems is synthesizing effectively the information contained in the covariance matrix in order to unveil the dependence structure of the assets. In this work we propose a realized covariance estimation strategy that tackles both of these challenges simultaneously. The estimation approach consists of using lasso–type shrinkage to regularize a realized covariance estimator. The lasso procedure identifies nonzero partial correlation relations among the assets and therefore makes it possible to represent the dependence structure of the assets as a network. If the partial correlation structure of the assets is sufficiently sparse, then the regularized estimator can be substantially more precise than its unregularized counterpart.

The framework we work in makes a number of fairly common assumptions on the dynamics of the asset prices (cf. Bandi and Russell, 2006; Ait-Sahalia, Mykland, and Zhang, 2005; Fan, Li, and Yu, 2012). We assume that observed log prices are equal to the efficient log prices, which are Brownian martingales, plus a noise term that is due to market microstructure frictions. Prices are observed according to the realization of a counting process driving the arrival of trades/quotes of each asset and are allowed to be asynchronous. The target estimand of interest is the integrated covariance matrix of the efficient log prices.

We introduce a network definition for a collection of assets induced by the integrated covariance, which we call the integrated partial correlation network. Assets i and j are connected in the integrated partial correlation network iff the partial correlation between i and j implied by the integrated covariance is nonzero. As is well known, the network is entirely characterized by the inverse of the integrated covariance matrix, the integrated concentration matrix. In fact, it has been known since at least Dempster (1972) that if the (i, j)–th entry of the inverse covariance matrix is zero, then variables i and j are partially uncorrelated, that is, they are uncorrelated conditional on all other assets. Thus, the sparsity structure of the integrated concentration matrix determines the partial correlation network dependence structure among assets.

We propose a lasso–type procedure to obtain sparse integrated concentration matrix estimators. The procedure is based on regularizing a consistent realized covariance estimator. Several realized covariance estimators have been introduced in the literature in the presence of market microstructure effects and asynchronous trading. In this work we focus in particular on the Multivariate Realized Kernel (MRK) and the Two–Scales Realized Covariance (TSRC) estimators based on pairwise refresh sampling (Ait-Sahalia et al., 2005; Barndorff-Nielsen, Hansen, Lunde, and Shephard, 2011; Fan et al., 2012). These estimators are regularized using the glasso (Friedman, Hastie, and Tibshirani, 2011), which shrinks the off–diagonal elements of the inverse of the covariance estimators to zero. The procedure makes it possible to detect the nonzero linkages of the integrated partial correlation network. Moreover, the sparse integrated concentration matrix estimator can be inverted to obtain an estimator of the integrated covariance.

We study the large sample properties of the realized network estimator and establish conditions for consistent estimation of the integrated concentration and consistent selection of the integrated partial correlation network. We develop the theory for the MRK and TSRC estimators based on pairwise refresh–sampling, building on the general asymptotic theory developed by Ravikumar, Wainwright, Raskutti, and Yu (2011). The MRK estimator results are obtained by developing a novel concentration inequality, while for the TSRC estimator we apply a concentration inequality derived in Fan et al. (2012). Results are established in a high–dimensional setting, that is, we allow the total number of parameters to be larger than the number of observations available, provided that the proportion of nonzero parameters is small relative to the total. We conjecture that other sufficiently well behaved realized covariance estimators should share analogous properties. We emphasize that the theory is developed under essentially the same assumptions for both estimators. In particular, we assume that the market microstructure noise that contaminates prices is iid. It is important to stress, however, that the MRK can also handle more complex types of market microstructure dependence, but we do not take advantage of this in the current work. A simulation study used to investigate the finite sample properties of the procedure shows that when the underlying network is sparse, the realized network provides substantial gains in terms of estimation precision.

We apply the realized network methodology to analyse the network structure of a panel of US bluechips throughout 2009 using the MRK, TSRC as well as the classic Realized Covariance (RC) estimators. More precisely, we use the realized network to regularize the so called idiosyncratic realized covariance matrix, that is, the residual covariance matrix of the assets after netting out the influence of the market factor. Results show that after controlling for the market factor, assets still exhibit a significant amount of cross–sectional dependence. A number of empirical findings emerge from the analysis of the estimated networks. Interestingly, several network characteristics are fairly stable throughout the year. The number of links in the network is around 5% of the total possible number of linkages. The distribution of the connections of the assets exhibits power law behavior, that is, the number of connections is heterogeneous and the most interconnected stocks have a large number of connections relative to the total number of links. The stocks in the industrials and energy sectors show a high degree of sectoral clustering, that is, there is a large number of connections among firms in these industry groups. Technology companies, and Google in particular, are the most highly interconnected firms throughout the year. We conjecture that this could be driven by the fact that, starting from March 2009 with the beginning of the post–crisis market rally, technology stocks, and Google in particular, were among the most rapidly recovering stocks of the year. We investigate the usefulness of our procedure from a forecasting perspective by carrying out a Markowitz type Global Minimum Variance (GMV) portfolio prediction exercise. We run a horse race among different covariance estimators to assess which estimator produces GMV portfolio weights that deliver the minimum out–of–sample GMV portfolio variance. Results show that the realized network significantly improves prediction accuracy irrespective of the covariance estimator used.

Our contribution is related to different strands of the literature. First, we contribute to the literature on realized volatility and realized covariance estimation (Andersen, Bollerslev, Diebold, and Labys, 2003; Barndorff-Nielsen and Shephard, 2004) in the presence of market microstructure effects and asynchronous trading. Contributions in this strand of the literature include Ait-Sahalia et al. (2005); Bandi and Russell (2006); Barndorff-Nielsen, Hansen, Lunde, and Shephard (2008); Barndorff-Nielsen et al. (2011); Zhang (2011); Fan et al. (2012). More specifically, this work builds on the literature concerned with the estimation and regularization of possibly large realized covariances. One of the first significant contributions in the area is Hautsch, Kyj, and Oomen (2012). Important research in this field also includes Hautsch, Kyj, and Malec (2013), Corsi, Peluso, and Audrino (2015) and Lunde, Shephard, and Sheppard (2011). Last, this paper relates to network modelling in statistics and econometrics, which includes Meinshausen and Buhlmann (2006), Billio, Getmansky, Lo, and Pellizzon (2012), Diebold and Yilmaz (2014), Hautsch, Schaumburg, and Schienle (2014a,b), Barigozzi and Brownlees (2013), and Banerjee and Ghaoui (2008).

The rest of the paper is structured as follows. In Section 2 we present the model and the realized network estimator. The theoretical properties of the estimator are analysed in Section 3. Section 4 contains an illustration of the realized network estimator using simulated data, while Section 5 presents an application to a panel of US bluechips. Concluding remarks follow in Section 6.


2 Realized Networks

2.1 The Efficient Price

Let (y(t), t ≥ 0) denote the n-dimensional log–price process of n assets. We assume that y(t) is a Brownian martingale whose dynamics are given by

y(t) = \int_0^t \Theta(u)\, dB(u), \qquad (1)

where B(u) is an n-dimensional Brownian motion with independent components defined on a filtered probability space (Ω, F, F_t, P). The spot covolatility process Θ(u) is an n × n positive definite random matrix whose elements are almost surely continuous and adapted to the filtration F_t. We consider our process over a fixed time interval of length h > 0, which typically represents a day. To simplify notation we assume that h equals one, and we set y = y(1). Also, we define the i–th entry of y as y_i, for i = 1, ..., n. The ex–post covariance matrix of y is given by

\mathrm{Var}(y) = \mathrm{E}(y y') = \int_0^1 \Sigma(t)\, dt,

where Σ(t) = Θ(t)Θ(t)' = (σ_{ij}(t)) is the spot covariance matrix. We define the integrated covariance as Σ* = \int_0^1 \Sigma(t)\, dt = (σ*_{ij}).

We impose the following regularity assumption on the volatility dynamics in order to derive the main results of this paper.

Assumption 1. For any i, j ∈ {1, ..., n} and t ∈ [0, 1], σ_{ij}(t) is a continuous stochastic process which (1a) either is bounded on [0, 1]; (1b) or satisfies, for all x > 0,

P\left( \sup_{0 \le t \le 1} |\sigma_{ij}(t)| \ge x \right) \le k_\sigma \exp\left(-a x^b\right),

for some positive constants k_σ, a and b.

That is, the spot volatilities are assumed to be uniformly bounded on [0, 1], or to satisfy an exponential tail behavior.


2.2 The Integrated Partial Correlation Network

We introduce a network representation of y that is based on the partial linear dependence structure induced by the integrated covariance matrix. We call this network representation the integrated partial correlation network. A network is defined as an undirected graph G = (V, E), where V = {1, 2, ..., n} is the set of vertices and E ⊂ V × V is the set of edges. In the integrated partial correlation network the set of vertices corresponds to the set of assets, and a pair of assets is connected by an edge iff their daily returns are partially correlated given all other daily returns in the panel, that is,

E = \{ (i, j) \in V \times V : \rho_{ij} \ne 0, \; i \ne j \},

where ρ_{ij} is the integrated partial correlation, a measure of linear cross-sectional conditional dependence defined as

\rho_{ij} = \mathrm{Cor}(y_i, y_j \mid y_k, \; k \ne i, j).

The integrated partial correlations can be characterized by the inverse integrated covariance matrix K* = (Σ*)^{-1} = (k*_{ij}). We call K* the integrated concentration matrix. The integrated partial correlations can be expressed as

\rho_{ij} = \frac{-k^*_{ij}}{\sqrt{k^*_{ii}\, k^*_{jj}}}.

Thus, the integrated partial correlation network can be equivalently defined as

E = \{ (i, j) \in V \times V : k^*_{ij} \ne 0, \; i \ne j \}.

This characterization of the integrated partial correlation network is convenient for estimation. The detection of the network structure among the assets is equivalent to recovering the zeros of the integrated concentration matrix from the data.
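To make this mapping concrete, the following minimal Python sketch (the function name and numerical tolerance are ours) computes the partial correlations ρ_{ij} = −k_{ij}/√(k_{ii} k_{jj}) and the implied edge set from a given concentration matrix:

```python
import numpy as np

def partial_correlation_network(K, tol=1e-10):
    """Partial correlations rho_ij = -k_ij / sqrt(k_ii * k_jj) and the
    edge set implied by the nonzero off-diagonal entries of K."""
    d = np.sqrt(np.diag(K))
    rho = -K / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    n = K.shape[0]
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if abs(K[i, j]) > tol]
    return rho, edges
```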

Also, notice that if the spot volatility is deterministic, then absence of partial correlation implies conditional independence between two assets.

2.3 Market Microstructure Noise & Asynchronous Trading

It is customary to assume that, rather than the efficient price, the econometrician observes the transaction (or midquote) price. This differs from the efficient price because trades (quotes) (i) are affected by an array of market frictions that go under the umbrella term of market microstructure and (ii) are observed at discrete time points. To this end, we assume that for each asset i, for i = 1, ..., n, the econometrician observes at timestamps T_i = {t_{i,1}, t_{i,2}, ..., t_{i,m_i}} the transaction (or midquote) prices x_i(t_{i,k}) defined as

x_i(t_{i,k}) = y_i(t_{i,k}) + u_i(t_{i,k}),

where u_i(t_{i,k}) denotes the microstructure noise associated with the k–th trade. We also use the shorthand notation x_{i,k}, y_{i,k} and u_{i,k} to denote, respectively, x_i(t_{i,k}), y_i(t_{i,k}) and u_i(t_{i,k}).

We assume that u_{i,k} is a discrete process, as we model it only when a new observed price for stock i occurs, and that

u_{i,k} \overset{i.i.d.}{\sim} N(0, \eta_i^2),

where η_i² is some positive constant. The noise process is independent of the efficient price process and, in order to simplify the exposition, we use the same notation (Ω, F, F_t, P) for the product filtered probability space of the two processes. Moreover, the noise processes of different assets are independent of each other.

2.4 Factor Structure

Classical asset pricing theory models like the CAPM or the APT imply that the unexpected rate of return of risky assets can be expressed as a linear function of a few common factors and an idiosyncratic component. Factors induce a fully interconnected partial correlation network structure. In these cases, it is more interesting to carry out network analysis on the partial correlation structure of the assets after netting out the influence of common sources of variation.

We augment the y process with k additional components representing factors. The dynamics of the augmented system are assumed to be the same as the ones described in (1). Moreover, the factors are assumed to be observed, as is commonly done in the empirical finance literature. The integrated covariance of the system can then be partitioned as an (n + k) × (n + k) matrix

\Sigma^* = \begin{pmatrix} \Sigma^*_{AA} & \Sigma^*_{AF} \\ \Sigma^*_{FA} & \Sigma^*_{FF} \end{pmatrix}, \qquad (2)

where A and F denote, respectively, the blocks of assets and factors.

We define the idiosyncratic integrated covariance as the covariance of the residuals obtained by projecting the assets onto the factors, which is given by

\Sigma^*_I = \Sigma^*_{AA} - \Sigma^*_{AF} \left[\Sigma^*_{FF}\right]^{-1} \Sigma^*_{FA} = (\sigma^*_{I\,ij}).

Then we define the idiosyncratic integrated partial correlation network as the network whose set of edges is given by

E_I = \{ (i, j) \in V \times V : k^*_{I\,ij} \ne 0, \; i \ne j \},

where k*_{I ij} is the (i, j) entry of the matrix K*_I = (Σ*_I)^{-1}.

2.5 Realized Covariance Estimation

Our strategy for the estimation of the integrated concentration matrix K* is based on regularizing a consistent estimator of the integrated covariance Σ*. In this section we introduce two well known estimators of the integrated covariance that are robust to microstructure noise: the Multivariate Realized Kernel and the Two–Scales Realized Covariance estimators based on pairwise refresh time.


2.5.1 Pairwise Refresh Time

A challenge in the estimation of realized covariances is dealing with the asynchronicity of the observed prices. A standard technique used to overcome this hurdle is refresh time, introduced by Barndorff-Nielsen et al. (2011). Several variants of this technique exist, like the pairwise and groupwise refresh time approaches used in Fan et al. (2012), Lunde et al. (2011) and Hautsch et al. (2012). In this work we focus on pairwise refresh time. Pairwise refresh time based covariance estimation consists of estimating each entry of the covariance matrix separately. The (i, j) entry of the matrix is computed by first synchronizing the observations of assets i and j using refresh time and then estimating the covariance between assets i and j using the synchronized data. Notice that this approach does not guarantee that the covariance estimator is positive definite, as each covariance entry is estimated using a different subset of observations.

The refresh time method works as follows. Consider two stocks i and j observed at times T_i = {t_{i,1}, ..., t_{i,m_i}} and T_j = {t_{j,1}, ..., t_{j,m_j}} on [0, 1]. The set of pairwise refresh time points is T_{ij} = {τ_0, τ_1, ..., τ_{m−1}, τ_m, τ_{m+1}}, where 0 = τ_0 < τ_1 < ... < τ_{m−1} < τ_m ≤ τ_{m+1} = 1 and m is the number of pairwise refresh time observations for stocks i and j on [0, 1]. For 0 < k ≤ m, τ_k is determined by

\tau_k = \max\big\{ \min\{t \in T_i : t > \tau_{k-1}\}, \; \min\{t \in T_j : t > \tau_{k-1}\} \big\}.

The timestamps for assets i and j that correspond to the refresh time τ_k are, respectively,

t^r_{i,k} = \max\{t \in T_i : t \le \tau_k\} \quad \text{and} \quad t^r_{j,k} = \max\{t \in T_j : t \le \tau_k\}.

Accordingly, we obtain the refresh sample prices, which are x^r_i = {x_i(t^r_{i,1}), ..., x_i(t^r_{i,m})} for stock i and x^r_j = {x_j(t^r_{j,1}), ..., x_j(t^r_{j,m})} for stock j. We will use the shorthand notation x^r_{ℓ,k}, y^r_{ℓ,k} and u^r_{ℓ,k} to denote x_ℓ(t^r_{ℓ,k}), y_ℓ(t^r_{ℓ,k}) and u_ℓ(t^r_{ℓ,k}), respectively.
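The following minimal Python sketch (a stylized illustration under our own naming conventions, not the paper's implementation) makes the pairwise refresh time construction explicit:

```python
import numpy as np

def pairwise_refresh_times(ti, tj):
    """Minimal sketch of pairwise refresh time sampling on [0, 1].

    ti, tj : sorted 1-d arrays of observation times for assets i and j.
    Returns the refresh times tau and, for each tau_k, the index of the
    most recent observation of each asset (for refresh-sampling prices).
    """
    taus, idx_i, idx_j = [], [], []
    tau = 0.0
    while True:
        # first observation of each asset strictly after the previous tau
        pi = np.searchsorted(ti, tau, side="right")
        pj = np.searchsorted(tj, tau, side="right")
        if pi == len(ti) or pj == len(tj):
            break                      # one asset has no further trades
        tau = max(ti[pi], tj[pj])      # tau_k = max of the two next times
        # last observation of each asset at or before tau_k
        idx_i.append(np.searchsorted(ti, tau, side="right") - 1)
        idx_j.append(np.searchsorted(tj, tau, side="right") - 1)
        taus.append(tau)
    return np.array(taus), np.array(idx_i), np.array(idx_j)
```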


2.5.2 Realized Kernel Estimator

The Multivariate Realized Kernel (MRK) is the multivariate extension of the Realized Kernel estimator proposed in Barndorff-Nielsen et al. (2008). The MRK estimator is denoted by Σ_K = (σ_{K ij}), and its (i, j) entry is defined as

\sigma_{K\,ij} = \gamma_0(x^r_i, x^r_j) + \frac{1}{2}\sum_{h=1}^{H} k\!\left(\frac{h-1}{H}\right)\left( \gamma_h(x^r_i, x^r_j) + \gamma_{-h}(x^r_i, x^r_j) + \gamma_h(x^r_j, x^r_i) + \gamma_{-h}(x^r_j, x^r_i) \right), \qquad (3)

where for each h ∈ {−H, −H+1, ..., −1, 0, 1, ..., H−1, H},

\gamma_h(x^r_i, x^r_j) = \sum_{p=1}^{m-1} (x^r_{i,p} - x^r_{i,p-1})(x^r_{j,p-h} - x^r_{j,p-h-1}).

The kernel k(x) is a weighting function satisfying the following assumption.

Assumption 2. The function k : [0, 1] → R is bounded and twice differentiable with bounded derivatives, and satisfies k(1) = k'(0) = k'(1) = 0 and k(0) = 1.

A typical choice of the kernel weighting function, which we also use in this work, is the Parzen kernel, defined as

k(x) = \begin{cases} 1 - 6x^2 + 6x^3 & 0 \le x \le \frac{1}{2} \\ 2(1-x)^3 & \frac{1}{2} < x \le 1. \end{cases}

Notice that when i = j, the estimator in (3) is equivalent to the univariate realized kernel estimator of Barndorff-Nielsen et al. (2008). For a choice of H of O_P(m^{1/2}), this estimator is convergent at rate m^{1/4}.
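To fix ideas, a stylized numpy sketch of the entry formula (3) follows; the function names are ours and, unlike the paper, boundary autocovariance terms that would require prices observed outside [0, 1] are simply truncated:

```python
import numpy as np

def parzen(x):
    """Parzen weight function used in the MRK."""
    x = abs(x)
    if x <= 0.5:
        return 1 - 6 * x**2 + 6 * x**3
    return 2 * (1 - x)**3 if x <= 1 else 0.0

def mrk_entry(xi, xj, H):
    """Sketch of the (i, j) entry of the Multivariate Realized Kernel (3).

    xi, xj : refresh-sampled log prices of assets i and j (equal length).
    H      : kernel bandwidth.
    """
    ri, rj = np.diff(xi), np.diff(xj)

    def gamma(a, b, h):
        # h-th realized autocovariance, truncated at the sample boundaries
        if h >= 0:
            return np.dot(a[h:], b[:len(b) - h]) if h < len(a) else 0.0
        return gamma(b, a, -h)

    out = gamma(ri, rj, 0)
    for h in range(1, H + 1):
        w = parzen((h - 1) / H)
        out += 0.5 * w * (gamma(ri, rj, h) + gamma(ri, rj, -h)
                          + gamma(rj, ri, h) + gamma(rj, ri, -h))
    return out
```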

A number of remarks on the estimator are in order. First, MRK estimation based on pairwise refresh sampling with a pair specific optimal bandwidth has already been put forward in the literature by Lunde et al. (2011). Second, we note that the properties of the estimator depend on the family of kernel functions adopted. There are two main classes put forward in the literature: flat–top and non–flat–top kernels. In this paper we consider the former, as flat–top kernels have better convergence properties. Non–flat–top kernels have a slower rate of convergence but can handle dependent market microstructure noise, which we do not consider in this work; see the discussion on this point also in Barndorff-Nielsen et al. (2011). For simplicity, in this work we do not consider jittering; thus, in order to estimate the realized variance or covariance on [0, 1], we use some prices observed before time 0 and after time 1. In the empirical implementation, for each pair we choose H as the (rounded) average of the optimal bandwidths for the realized kernel volatility estimators of the two assets, following the procedure detailed in Barndorff-Nielsen et al. (2008) and Barndorff-Nielsen, Hansen, Lunde, and Shephard (2009).

Of particular importance for this work is a concentration inequality for the MRK estimator, which establishes the rate at which the entries of Σ_K converge to their counterparts in Σ* as the sample size increases. In order to obtain the inequality, we make the following assumptions:

Assumption 3. The observation times are independent of the y and u processes. For any assets i, j ∈ V, the pairwise refresh times satisfy \sup_{1 < k \le m+1} m(\tau_k - \tau_{k-1}) \le c_\Delta < \infty, where c_Δ is some positive constant and m is the number of refresh times on [0, 1].

Assumption 4. The parameter H of the MRK equals c_1 m^{1/2}, for some c_1 > 0.

Theorem 1. Assume Assumptions 1a, 2, 3 and 4 hold. Then there are positive constants a_1, a_2 and a_3 such that for large m, x ∈ [0, a_1], and i, j ∈ V,

P\left( \left|\sigma_{K\,ij} - \sigma^*_{ij}\right| > x \right) \le a_2\, m\, e^{-a_3 m^{1/4} x}.

Observe that since the function f(m) = a_2\, m\, e^{-a_3 m^{1/4} x} is decreasing on \left(\left(\frac{4}{a_3 x}\right)^4, \infty\right), if we denote by M the minimum pairwise refresh sample size across all pairs of stocks, then for all i, j ∈ V and x ∈ \left(\frac{4}{a_3 M^{1/4}}, a_1\right], we have

P\left( \left|\sigma_{K\,ij} - \sigma^*_{ij}\right| > x \right) \le a_2\, M\, e^{-a_3 M^{1/4} x}.

If x ∈ \left[0, \frac{4}{a_3 M^{1/4}}\right], then a_2\, M\, e^{-a_3 M^{1/4} x} \ge a_2\, M\, e^{-4} \ge 1 when M is large enough. Consequently, for large M, x ∈ [0, a_1], and i, j ∈ V, it holds that

P\left( \left|\sigma_{K\,ij} - \sigma^*_{ij}\right| > x \right) \le a_2\, M\, e^{-a_3 M^{1/4} x}. \qquad (4)

2.5.3 Two–Scale Realized Covariance Estimator

The Two–Scale Realized Covariance (TSRC) is the multivariate extension of the Two–Scales estimator proposed in Ait-Sahalia et al. (2005). The TSRC estimator is denoted by Σ_TS = (σ_{TS,ij}), and its generic (i, j) entry is defined as

\sigma_{TS\,ij} = \frac{1}{K}\sum_{k=K+1}^{m} \left(x^r_{i,k} - x^r_{i,k-K}\right)\left(x^r_{j,k} - x^r_{j,k-K}\right) - \frac{\bar m_K}{\bar m_J}\,\frac{1}{J}\sum_{k=J+1}^{m} \left(x^r_{i,k} - x^r_{i,k-J}\right)\left(x^r_{j,k} - x^r_{j,k-J}\right),

where \bar m_K = \frac{m-K+1}{K} and \bar m_J = \frac{m-J+1}{J}.

Zhang (2011) shows that the optimal choice of K has order K = O_P(m^{2/3}), and J can be taken to be a constant such as 1. Under this choice, the concentration inequality derived by Fan et al. (2012) shows that the TSRC estimator converges at rate m^{1/6}. In the empirical implementation, for each pair, we choose J equal to 1 and K as the (rounded) average of the optimal bandwidths for the two–scale realized volatility estimators of the two assets, following the procedure detailed in Ait-Sahalia et al. (2005).
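A stylized numpy sketch of the TSRC entry formula follows (names ours; xi and xj are assumed to be numpy arrays of pairwise refresh-sampled log prices):

```python
import numpy as np

def tsrc_entry(xi, xj, K, J=1):
    """Sketch of the (i, j) entry of the Two-Scale Realized Covariance.

    xi, xj : refresh-sampled log prices x_{.,1}, ..., x_{.,m} of assets i, j
    K, J   : slow and fast time scales (J = 1 by default, as in the paper)
    """
    m = len(xi)

    def avg_cov(S):
        # (1/S) * sum_{k=S+1}^{m} (x_{i,k} - x_{i,k-S}) (x_{j,k} - x_{j,k-S})
        return np.dot(xi[S:] - xi[:-S], xj[S:] - xj[:-S]) / S

    mK = (m - K + 1) / K                   # bar m_K
    mJ = (m - J + 1) / J                   # bar m_J
    return avg_cov(K) - (mK / mJ) * avg_cov(J)
```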

We next state the concentration inequality proved in Fan et al. (2012), which requires the following assumption:

Assumption 5. The TSRC parameters are J = 1 and \bar m_K is such that \frac{1}{2} m^{1/3} \le \bar m_K \le 2 m^{1/3}.

Theorem 2. Assume Assumptions 1, 3 and 5 hold. Then there are positive constants a_1, a_2 and a_3 such that for all i, j ∈ V:

(i) If Assumption 1a is true, then for large m and x ∈ [0, a_1 m^{1/6}],

P\left( m^{1/6} \left|\sigma_{TS\,ij} - \sigma^*_{ij}\right| > x \right) \le a_2 \exp\left(-a_3 x^2\right).

(ii) If Assumption 1b is true, then for large m and x ∈ \left[0, a_1 m^{\frac{4+b}{6b}}\right],

P\left( m^{1/6} \left|\sigma_{TS\,ij} - \sigma^*_{ij}\right| > x \right) \le a_2 \exp\left(-a_3 x^{\frac{2b}{4+b}}\right),

where b > 0 is the constant from Assumption 1b.

Following the same reasoning as in the previous section, if we define M as the minimum pairwise refresh sample size across all pairs of stocks, then under Assumption 1a, for large M, x ∈ [0, a_1] and i, j ∈ V,

P\left( \left|\sigma_{TS\,ij} - \sigma^*_{ij}\right| > x \right) \le a_2\, e^{-a_3 M^{1/3} x^2}, \qquad (5)

and a similar inequality holds under Assumption 1b.

2.5.4 Idiosyncratic Realized Covariance Estimation

The MRK and TSRC estimators of the integrated covariance can also be used to estimate the idiosyncratic integrated covariance. Let Σ̂ be either the MRK or the TSRC estimator and partition the covariance matrix analogously to equation (2) as

\hat\Sigma = \begin{pmatrix} \hat\Sigma_{AA} & \hat\Sigma_{AF} \\ \hat\Sigma_{FA} & \hat\Sigma_{FF} \end{pmatrix}.

We then define the idiosyncratic realized covariance estimator Σ̂_I = (σ̂_{I ij}) as

\hat\Sigma_I = \hat\Sigma_{AA} - \hat\Sigma_{AF}\left[\hat\Sigma_{FF}\right]^{-1}\hat\Sigma_{FA}. \qquad (6)
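A minimal sketch of the computation in (6), assuming the assets are ordered before the factors in the estimated covariance matrix (names ours):

```python
import numpy as np

def idiosyncratic_covariance(Sigma, n_assets):
    """Sketch of the idiosyncratic realized covariance in (6).

    Sigma    : (n+k, n+k) realized covariance of assets and factors,
               with the n assets ordered first and the k factors last.
    n_assets : number of assets n.
    """
    A = slice(0, n_assets)          # asset block
    F = slice(n_assets, None)       # factor block
    S_AA, S_AF = Sigma[A, A], Sigma[A, F]
    S_FA, S_FF = Sigma[F, A], Sigma[F, F]
    # Sigma_I = Sigma_AA - Sigma_AF * Sigma_FF^{-1} * Sigma_FA
    return S_AA - S_AF @ np.linalg.solve(S_FF, S_FA)
```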

The following corollary establishes the concentration properties of the estimator Σ̂_I, building upon Theorems 1 and 2.

Corollary 1. When the assumptions of Theorem 1 hold and Σ̂ is the MRK estimator, there are positive constants a_1, a_2 and a_3 such that for large M, x ∈ [0, a_1], and i, j ∈ V,

P\left( \left|\hat\sigma_{I\,ij} - \sigma^*_{I\,ij}\right| > x \right) \le a_2\, M\, e^{-a_3 M^{1/4} x}. \qquad (7)

When the assumptions of Theorem 2(i) hold and Σ̂ is the TSRC estimator, there are positive constants a_1, a_2 and a_3 such that for large M, x ∈ [0, a_1], and i, j ∈ V,

P\left( \left|\hat\sigma_{I\,ij} - \sigma^*_{I\,ij}\right| > x \right) \le a_2\, e^{-a_3 M^{1/3} x^2},

and a similar inequality holds under the assumptions of Theorem 2(ii).

2.6 Realized Network Estimator

We use the graphical lasso procedure (glasso) to estimate the integrated concentration matrix K* and the idiosyncratic integrated concentration matrix K*_I. Here we focus on the estimation of the integrated concentration only, since the idiosyncratic integrated concentration case is analogous. Let Σ̂ denote either the MRK or the TSRC covariance estimator. The realized network estimator is defined as

\hat{K} = \arg\min_{K \in \mathcal{S}_n} \left\{ \operatorname{tr}(\hat\Sigma K) - \log\det(K) + \lambda \sum_{i \ne j} |k_{ij}| \right\}, \qquad (8)

where λ ≥ 0 and S_n is the set of n × n symmetric positive definite matrices. We denote by (k̂_{ij}) the entries of the realized network estimator K̂. Notice that any consistent estimator of the integrated covariance Σ* that satisfies an appropriate concentration inequality can be used.

The realized network estimator is a shrinkage type estimator. If we set λ = 0 in (8), we obtain the normal log-likelihood function of the covariance matrix, which is minimized by the inverse realized covariance estimator (Σ̂)^{-1}. If λ is positive, (8) becomes a penalized likelihood function with penalty equal to the sum of the absolute values of the off-diagonal entries of the estimator. The important feature of the absolute value penalty is that for λ > 0 some of the entries of the realized network estimator are set to exact zeros. The highlight of this type of estimator is that it simultaneously estimates and selects the nonzero entries of K*.
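For illustration, off-the-shelf implementations of (8) exist; the sketch below uses scikit-learn's graphical_lasso routine, whose penalty is, like (8), applied to the off-diagonal entries. The covariance input here is a simulated stand-in rather than an actual realized covariance:

```python
import numpy as np
from sklearn.covariance import graphical_lasso

# Stand-in for a realized covariance estimate, e.g. assembled entry by
# entry from the MRK or TSRC sketches above.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 10))
Sigma_hat = np.cov(X, rowvar=False)

lam = 0.1                                  # shrinkage parameter lambda in (8)
Sigma_reg, K_hat = graphical_lasso(Sigma_hat, alpha=lam)

# zero pattern of K_hat = estimated partial correlation network
edges = np.argwhere(np.triu(np.abs(K_hat) > 1e-10, k=1))
```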

Banerjee and Ghaoui (2008) show that the optimization problem in (8) can be solved through a series of lasso regressions. Define

\hat\Sigma = \begin{pmatrix} \hat\Sigma_{11} & \hat\Sigma_{12} \\ \hat\Sigma_{12}' & \hat\Sigma_{22} \end{pmatrix} \quad \text{and} \quad K^{-1} = W = \begin{pmatrix} W_{11} & w_{12} \\ w_{12}' & w_{22} \end{pmatrix}, \qquad (9)

where Σ̂_{11} and W_{11} are (n−1) × (n−1) matrices, and Σ̂_{12} and w_{12} are (n−1) × 1 vectors.

Suppose that the values of W_{11} and w_{22} are known and that we are concerned with finding the w_{12} that minimizes (8). Banerjee and Ghaoui (2008) consider the following minimization problem

\min_{\beta} \left\{ \frac{1}{2}\left(W_{11}^{1/2}\beta - b\right)'\left(W_{11}^{1/2}\beta - b\right) + \lambda ||\beta||_1 \right\}, \qquad (10)

where b = W_{11}^{-1/2}\hat\Sigma_{12} and ||β||_1 is the sum of the absolute values of the entries of β. The authors then show that (10) is the dual problem of (8), in the sense that if β solves (10), then w_{12} = W_{11}β solves (8). Note that the optimization problem in (10) can be cast as a lasso regression estimation problem.

These results motivate the following iterative algorithm to solve (8) (Friedman, Hastie, Hofling, and Tibshirani, 2007). Let W denote an initial estimate of W, for example W = Σ̂ + λI_n, where I_n is the identity matrix of order n. In each iteration of the algorithm, for each asset i we permute the rows and columns of W so that the n–th column/row of the matrix contains the covariances with respect to i. We then partition W as in (9) and update the values of w_{12} and w_{12}' by solving (10). The algorithm is iterated until convergence. Finally, we obtain the estimator K̂ by taking the inverse of W.
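A compact sketch of this block-coordinate scheme is given below, assuming scikit-learn's Lasso solver for the subproblem (10); the rescaling of λ accounts for the 1/(2m) factor in scikit-learn's lasso objective, and the function name is ours:

```python
import numpy as np
from scipy.linalg import sqrtm
from sklearn.linear_model import Lasso

def glasso(S, lam, n_iter=100, tol=1e-6):
    """Sketch of the block-coordinate glasso of Friedman et al. (2007).

    S   : (n, n) realized covariance estimate (symmetric, positive diagonal)
    lam : shrinkage parameter lambda in (8)
    Returns the estimated concentration matrix K = W^{-1}.
    """
    n = S.shape[0]
    W = S + lam * np.eye(n)           # initial estimate W = Sigma + lambda*I
    for _ in range(n_iter):
        W_old = W.copy()
        for i in range(n):
            idx = np.arange(n) != i   # all rows/columns except i
            W11 = W[np.ix_(idx, idx)]
            s12 = S[idx, i]
            # Solve the dual lasso problem (10):
            #   min_b 0.5*||W11^{1/2} b - W11^{-1/2} s12||^2 + lam*||b||_1
            X = np.real(sqrtm(W11))
            y = np.linalg.solve(X, s12)
            # sklearn's Lasso minimizes (1/(2m))||y - Xb||^2 + alpha||b||_1,
            # with m = n-1 rows here, hence alpha = lam/(n-1)
            beta = Lasso(alpha=lam / (n - 1), fit_intercept=False).fit(X, y).coef_
            w12 = W11 @ beta          # update the off-diagonal block of W
            W[idx, i] = w12
            W[i, idx] = w12
        if np.abs(W - W_old).mean() < tol:
            break
    return np.linalg.inv(W)           # K = W^{-1}
```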


3 Theory

In this section, we establish the large sample properties of the realized network estimator defined in (8). In particular, we obtain results on the rate of convergence of the estimator as well as on the probability of correct selection of the network.

In this section we adopt a more detailed notation. Given a matrix U = (u_{ij}) ∈ R^{ℓ×m}, we set ||U||_∞, ||U||_1, ||U||_{1,o}, and |||U|||_∞ to denote, respectively, \max_{i,j} |u_{ij}|, \sum_{i,j} |u_{ij}|, \sum_{i \ne j} |u_{ij}| and \max_j \sum_{k=1}^{m} |u_{jk}|, where i ∈ {1, 2, ..., ℓ} and j ∈ {1, 2, ..., m}. If A = (a_{ij}) is a p × q matrix and B is an m × n matrix, the Kronecker product of the matrices A and B is the pm × qn matrix given by

A \otimes B = \begin{pmatrix} a_{11}B & \cdots & a_{1q}B \\ \vdots & \ddots & \vdots \\ a_{p1}B & \cdots & a_{pq}B \end{pmatrix}.

We index the pm rows of A ⊗ B by

\mathcal{R} = \{(1,1), (2,1), ..., (m,1), (1,2), (2,2), ..., (m,2), ..., (1,p), ..., (m,p)\},

and the qn columns by

\mathcal{C} = \{(1,1), (2,1), ..., (n,1), (1,2), (2,2), ..., (n,2), ..., (1,q), ..., (n,q)\}.

For any two subsets R ⊂ \mathcal{R} and C ⊂ \mathcal{C}, we denote by (A ⊗ B)_{R×C} the matrix such that (A ⊗ B)_{(i,j)(c,d)} is an entry of (A ⊗ B)_{R×C} iff (i, j) ∈ R and (c, d) ∈ C.

Assumption 6. Consider the n² × n² matrix Γ* = Σ* ⊗ Σ*. There exists some α ∈ (0, 1] such that

\max_{e \in S^c} ||\Gamma^*_{eS} (\Gamma^*_{SS})^{-1}||_1 \le 1 - \alpha,

where S = E ∪ \{(i, i) : i ∈ V\} and S^c = \{(i, j) \in V \times V : k^*_{ij} = 0\}.

Assumption 6 limits the amount of dependence between the non-edge terms (indexed by S^c) and the edge-based terms (indexed by S). The limit is controlled by α: the bigger the α, the smaller the dependence. In other words, if we set

X_{(j,k)} = y_j y_k - \mathrm{E}(y_j y_k), \quad \text{for all } j, k \in V,

then the correlation between X_{(j,k)} and X_{(ℓ,m)} is low for any (j, k) ∈ S and (ℓ, m) ∈ S^c.

The following theorems establish the properties of the realized network estimator (cf. Ravikumar et al., 2011). Theorems 3 and 4 are for the MRK estimator, and Theorems 5 and 6 are for the TSRC estimator. Theorems 3 and 5 establish the rate at which the estimator converges to the true value as the sample size M increases. Theorems 4 and 6 establish a lower bound on the probability of correctly detecting the nonzero partial correlations (as well as their signs) as a function of the sample size M. In particular, these theorems show that the estimators are model selection consistent with high probability when n is large.

Theorem 3. Assume that the assumptions of Theorem 1 and Assumption 6 hold, and choose

\lambda = \frac{8}{\alpha}\,\frac{\log(a_2 M n^{\tau})}{a_3 M^{1/4}}

in (8), where τ > 2 is arbitrary, the a_i are the constants in (4), and α is from Assumption 6. Assume also that

M > \left( \frac{\log(c_1 n^{\tau})}{c_2} \right)^4,

where c_1 and c_2 are two positive constants that will be given in the Appendix. Then,

P\left( ||\hat{K} - K^*||_{\infty} \le 2 C_{\Gamma^*}\left(1 + 8\alpha^{-1}\right) M^{-1/4}\, \frac{\log(a_2 M n^{\tau})}{a_3} \right) \ge 1 - \frac{1}{n^{\tau-2}},

where C_{\Gamma^*} = |||(\Gamma^*_{SS})^{-1}|||_\infty.

Observe that Theorem 3 states how large the number of observations needs to be in order to have sufficiently precise estimates of the entries of K*. The theorem allows the number of observations M to be smaller than the total number of entries of the concentration matrix. The rate of growth of M depends on the constants c_1 and c_2, which are explicit in the proof and depend on a number of population parameters, in particular the maximum degree of the network and coefficients proportional to the magnitude of the entries of K* and its inverse. The bound is such that the sparser the network, the weaker the requirement on M.

Observe that if the assumptions of Theorem 3 hold, and if for all i, j ∈ V, |k^*_{ij}| > O_P\left(M^{-1/4} \log(M n^{\tau})\right), then the sign of k̂_{ij} is the same as that of k*_{ij} with large probability when n is large enough. In order to avoid ambiguity, if k*_{ij} = 0 we take sign(k̂_{ij}) = sign(k*_{ij}) to mean k̂_{ij} = 0.

Theorem 4. Under the assumptions of Theorem 3, if

M > \left( \frac{\log(c_3 n^{\tau})}{c_4} \right)^4,

where c_3 and c_4 are two positive constants that will be given in the Appendix, then

P\left( \operatorname{sign}(\hat{k}_{ij}) = \operatorname{sign}(k^*_{ij}), \; \forall i, j \in V \right) \ge 1 - \frac{1}{n^{\tau-2}}.

From Theorems 3 and 4, we get that if M > O_P\left(\log^4(n)\right), the events ||\hat{K} - K^*||_\infty \le O_P\left(M^{-1/4}\log(M n^{\tau})\right) and sign(k̂_{ij}) = sign(k*_{ij}), for all i, j ∈ V, hold with large probability when n is large enough.

Theorem 5. Assume that the assumptions of Theorem 2 and Assumption 6 hold, and choose

\lambda = \frac{8}{\alpha}\, M^{-1/6} \left( \frac{\log(a_2 n^{\tau})}{a_3} \right)^{1/2}

in (8), where τ > 2 is arbitrary, the a_i are the constants in (5), and α is from Assumption 6. Assume also that

M > \left( \frac{\log(a_2 n^{\tau})}{a_3} \right)^3 c_0^6,

where c_0 is a positive constant that will be given in the Appendix. Then

P\left( ||\hat{K} - K^*||_{\infty} \le 2 C_{\Gamma^*}\left(1 + 8\alpha^{-1}\right) M^{-1/6} \left( \frac{\log(a_2 n^{\tau})}{a_3} \right)^{1/2} \right) \ge 1 - \frac{1}{n^{\tau-2}},

where C_{\Gamma^*} = |||(\Gamma^*_{SS})^{-1}|||_\infty.

Theorem 6. Under the assumptions of Theorem 5, if

M > \left( \frac{\log(a_2 n^{\tau})}{a_3} \right)^3 c_1^6,

where c_1 is a positive constant that will be given in the Appendix, then

P\left( \operatorname{sign}(\hat{k}_{ij}) = \operatorname{sign}(k^*_{ij}), \; \forall i, j \in V \right) \ge 1 - \frac{1}{n^{\tau-2}}.

From Theorems 5 and 6, we have that if M > O_P\left(\log^3(n)\right), the events ||\hat{K} - K^*||_\infty \le O_P\left(\log^{1/2}(n) / M^{1/6}\right) and sign(k̂_{ij}) = sign(k*_{ij}), for all i, j ∈ V, hold with large probability when n is large enough.

Detailed proofs of the theorems are given in the Appendix. In what follows, we outline the arguments used to establish the results. Note first that we can rewrite the objective function in (8) as

\operatorname{tr}(K\hat\Sigma) - \log\det(K) + \lambda ||K||_{1,o}.

Recall that the sub-differential of the absolute value function f(x) = |x|, x ∈ R, is given by the function

f^0(x) = \begin{cases} 1 & \text{if } x > 0 \\ -1 & \text{if } x < 0 \\ \in [-1, 1] & \text{if } x = 0. \end{cases}

Then, given the definition of the norm ||·||_{1,o}, the sub-differential of this norm evaluated at some matrix K = (k_{ij}) ∈ R^{n×n} includes the set of all symmetric matrices Z = (z_{ij}) ∈ R^{n×n} such that

z_{ij} = \begin{cases} 0 & \text{if } i = j \\ \operatorname{sign}(k_{ij}) & \text{if } i \ne j \text{ and } k_{ij} \ne 0 \\ \in [-1, 1] & \text{if } i \ne j \text{ and } k_{ij} = 0. \end{cases}

We can now state the following result.

Lemma 1. Assume that Σ̂ has positive diagonal entries. Then for any λ > 0, the ℓ_1-regularized log-determinant problem

\hat{K} = \arg\min_{K \in \mathcal{S}_n} \left\{ \operatorname{tr}(K\hat\Sigma) - \log\det(K) + \lambda ||K||_{1,o} \right\} \qquad (11)

has a unique solution K̂ characterized by

\hat\Sigma - \hat{K}^{-1} + \lambda \hat{Z} = 0, \qquad (12)

where Ẑ is an element of the sub-differential of the norm ||·||_{1,o} evaluated at K̂.

The proof of Lemma 1 will be provided in Appendix A. Based on this lemma, we use the following procedure:

(a) We consider the matrix K̃ = (k̃_{ij}) that solves the restricted log-determinant problem

\tilde{K} = \arg\min_{K \in \mathcal{S}_n,\, K_{S^c} = 0} \left\{ \operatorname{tr}(K\hat\Sigma) - \log\det(K) + \lambda ||K||_{1,o} \right\}, \qquad (13)

where λ is some positive constant and K_{S^c} = \{k_{ij} : (i, j) \in S^c\}.

(b) We define the matrix Z̃ = (z̃_{ij}) by

\tilde{z}_{ij} = \begin{cases} 0 & \text{if } i = j \\ \operatorname{sign}(\tilde{k}_{ij}) & \text{if } i \ne j \text{ and } k^*_{ij} \ne 0 \\ \frac{1}{\lambda}(\tilde\sigma_{ij} - \hat\sigma_{ij}) & \text{if } i \ne j \text{ and } k^*_{ij} = 0, \end{cases}

where (\tilde{K})^{-1} = \tilde\Sigma = (\tilde\sigma_{ij}).

(c) We verify that |z̃_{ij}| ≤ 1 for all (i, j) ∈ S^c.

This procedure is called the construction of the primal-dual witness solution (K̃, Z̃). Observe that in step (a) the specific algorithm used to obtain the minimizer is immaterial. Notice that the objective function in (13) is convex and has a unique minimum in S_n, which we denote by K̃. Moreover, since K̃ is the minimizer, the sub-differential of (13) evaluated at each entry of K̃ is zero. Thus we have that

\hat\sigma_{ij} - \tilde\sigma_{ij} + \operatorname{sign}(\tilde{k}_{ij})\lambda = 0, \quad (i, j) \in E,
\hat\sigma_{ij} - \tilde\sigma_{ij} = 0, \quad (i, j) \in \{(\ell, \ell),\; \ell \in V\}.

Finally, according to the definition of Z̃, we obtain that

\hat\Sigma - \tilde{K}^{-1} + \lambda\tilde{Z} = 0. \qquad (14)

Consequently, based on Lemma 1, if Z̃ is an element of the sub-differential of the norm ||·||_{1,o} evaluated at K̃, which will be checked in step (c), we get that K̃ = K̂.

Now we return to the logic of the proof of the theorems. First we obtain a lower bound on the probability that condition (c) holds. As explained before, this condition implies that K̃ = K̂. We then seek a bound on the ℓ_∞ norm of K̃ − K*; conditional on K̃ = K̂, this bound also applies to ||K̂ − K*||_∞. We thus obtain an upper bound for ||K̂ − K*||_∞ in probability. Define this bound as x for a given probability. For a non-zero entry k*_{ij} of K*, if its absolute value is bigger than x, then sign(k̂_{ij}) = sign(k*_{ij}) with the same probability. Thus, to guarantee sign consistency, we consider the minimum absolute value of the non-zero entries of K*, and show that with the desired probability the maximum of |k̂_{ij} − k*_{ij}| over all i, j ∈ V is still smaller than this minimum.

4 Simulation Study

In this section we carry out a simulation study to assess the finite sample properties of the realized network estimator. The simulation exercise consists of simulating one day of data and applying the realized network estimator to estimate the integrated covariance and the integrated concentration, as well as to detect the integrated partial correlation network. In our numerical implementation, a trading day is 7.5 hours and the simulation of the continuous time process is carried out using the Euler scheme with a discretization step of 5 seconds.


The efficient price follows an n–dimensional zero drift diffusion with time varying spot covariance equal to

\Sigma(t) = V(t)\, R\, V(t),

where V(t) is a diagonal matrix with positive diagonal entries and R is a symmetric positive definite matrix. In this model V(t) drives the time varying volatility of each asset while R determines the (partial) correlation structure of the process. The diagonal entries of V(t) follow a CIR process

dv_{ii}(t) = \kappa(v - v_{ii}(t))\,dt + \tau\sqrt{v_{ii}(t)}\,dW_i(t),

where v is the long run mean of the process, τ > 0 is its volatility, κ > 0 is the mean reversion parameter, and W_i(t) is a Brownian motion independent of B(t) in (1). Also, the W_i(t) for i = 1, ..., n are assumed to be independent. The matrix R is a function of a realization of an Erdos–Renyi random graph. The Erdos–Renyi random graph G = (V, E) is an undirected graph defined over a fixed set of vertices V = {1, ..., n} and a random set of edges, where the existence of an edge between any pair of vertices is determined by an independent Bernoulli trial with probability p. We generate R by first simulating an Erdos–Renyi random graph G and then setting

R = \left[I_n + D - A\right]^{-1},

where I_n is the n dimensional identity matrix, and D and A are, respectively, the degree matrix and the adjacency matrix of the random graph G. The model for R is such that the underlying random graph structure determines the sparsity structure of the spot concentration matrix. Also, note that R is symmetric positive definite by construction. As far as microstructure effects are concerned, we assume that the noise component u_{i,k} is an independent zero mean normal random variable with variance η² for all stocks. Finally, prices of each stock are observed according to a Poisson process with constant intensity μ. Table 1 contains detailed information on the parameter choices made in the simulation exercise. We consider three scenarios in the simulation study that differ in the amount of sparsity of the underlying network, which is governed by p. In the first scenario np is smaller than one, and in this case the Erdos–Renyi graph will almost surely have no components larger than O(log(n)). In the second scenario np is exactly equal to one, and in this case the random graph will almost surely have a largest component of order n^{2/3}. Finally, in the last scenario np is greater than one, and in this case the graph will have a giant component.
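A minimal sketch of the construction of R from an Erdos–Renyi draw (names ours):

```python
import numpy as np

def simulate_R(n, p, rng):
    """Sketch of the correlation-structure matrix R = (I_n + D - A)^{-1}
    built from an Erdos-Renyi graph with edge probability p."""
    A = np.zeros((n, n))
    iu = np.triu_indices(n, k=1)
    A[iu] = rng.random(len(iu[0])) < p     # independent Bernoulli edges
    A = A + A.T                            # symmetric adjacency matrix
    D = np.diag(A.sum(axis=1))             # degree matrix
    return np.linalg.inv(np.eye(n) + D - A)

rng = np.random.default_rng(0)
R = simulate_R(25, 1.0 / 25, rng)          # scenario with np = 1
```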

INSERT TABLE 1 ABOUT HERE

The integrated covariance and the integrated partial correlation network are estimated using the realized network estimator introduced in the previous section. As far as the estimation of the integrated covariance is concerned, we consider three estimators: the MRK and the TSRC based on pairwise refresh time sampling, as well as the “classic” 5–minute realized covariance (RC) estimator. The realized covariance Σ_V = (σ_{V ij}) is defined as

\sigma_{V\,ij} = \sum_{k=2}^{m} (x'_{i,k} - x'_{i,k-1})(x'_{j,k} - x'_{j,k-1}),

where x'_{i,k}, x'_{i,k-1} and x'_{j,k}, x'_{j,k-1} are, respectively, the 5–minute frequency prices of assets i and j. Each estimator is computed for different values of the shrinkage parameter λ.
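In code, the RC estimator amounts to a sum of outer products of synchronized returns; a minimal sketch (names ours):

```python
import numpy as np

def realized_covariance(X):
    """Classic realized covariance from synchronized (e.g. 5-minute) log
    prices. X is a (m, n) array: m price observations on n assets."""
    r = np.diff(X, axis=0)      # intra-daily returns
    return r.T @ r              # sum_k r_k r_k'
```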

We use different metrics to evaluate the performance of the realized network estimator. We use a Root Mean Square Error (RMSE) type loss based on the scaled Frobenius norm of the covariance and concentration matrices to evaluate estimation precision (Hautsch et al., 2012). The selection performance of the estimator is measured with the AUROC. The scaled Frobenius norm of the realized concentration estimator K̂ is defined as

\mathrm{RMSE}(\hat{K}) = \sqrt{\frac{1}{n} \sum_{i,j=1}^{n} (\hat{k}_{ij} - k^*_{ij})^2},

and, analogously, the scaled Frobenius norm of the realized covariance estimator Σ̂ is

\mathrm{RMSE}(\hat\Sigma) = \sqrt{\frac{1}{n} \sum_{i,j=1}^{n} (\hat\sigma_{ij} - \sigma^*_{ij})^2}.


The ROC curve and the AUROC index are standard measures of classification performance. The ROC (Receiver Operating Characteristic) curve is a graphical plot which illustrates the performance of a binary classifier as its discrimination threshold is varied. In our context, the binary classifier is whether or not k̂_{ij} = 0 for i ≠ j and i, j ∈ V, and the discrimination threshold is the value of λ. The ROC consists of the plot of the true positive rate (TPR) against the false positive rate (FPR) of the classifier. These two rates are obtained from the comparison of K* with K̂. The TPR equals the percentage of pairs correctly identified as connected among the truly connected ones, and the FPR equals the percentage of pairs falsely identified as linked among the unlinked ones. Clearly, a larger λ shrinks k̂_{ij} more towards zero, while for λ = 0, k̂_{ij} is rarely exactly zero. For a particular estimate of K*, we take its TPR value as the vertical coordinate and its FPR value as the horizontal coordinate. We then obtain a series of points corresponding to a series of estimates as the value of λ changes, and the ROC curve is generated by connecting those points. Generally, decreasing λ enlarges both the TPR and the FPR, which is why the slope of the ROC curve is positive. The area underneath the ROC curve (AUROC) is then used as an overall normalized index of accuracy of the classifier. The closer the index is to one, the more accurate the classifier.
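A minimal sketch of the TPR/FPR computation and of the trapezoidal AUROC used here (names and tolerance ours):

```python
import numpy as np

def roc_point(K_true, K_hat, tol=1e-10):
    """TPR/FPR of the estimated network against the true zero pattern."""
    off = ~np.eye(K_true.shape[0], dtype=bool)
    true_edge = np.abs(K_true[off]) > tol
    est_edge = np.abs(K_hat[off]) > tol
    tpr = est_edge[true_edge].mean()        # detected among true links
    fpr = est_edge[~true_edge].mean()       # detected among true non-links
    return fpr, tpr

def auroc(points):
    """Area under the ROC curve by trapezoidal integration over the
    (FPR, TPR) points obtained along a grid of lambda values."""
    pts = sorted(points)                    # sort by FPR
    fpr = np.array([p[0] for p in pts])
    tpr = np.array([p[1] for p in pts])
    return np.trapz(tpr, fpr)
```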

For each parameter setting of the model, we perform 100 Monte Carlo replications and report here summary statistics.

Table 2 reports the summary results of the simulation study. For each estimator and simulation setting, the table reports the root mean square error of the plain estimator as well as of the “oracle” network estimator, where oracle refers to the network estimator based on the best possible (yet unfeasible) choice of the shrinkage parameter. The table also reports the AUROC of the network estimator. The table shows that, typically, the sparser the underlying network, the larger the gains from using the realized network estimator. Notice that the performance of the plain TSRC and MRK estimators as estimators of K* is sometimes poorer than that of the RC. This is a consequence of the fact that, as the entries of these estimators are computed pairwise, these estimators are ill–posed when the networks are not sufficiently sparse. On the other hand, the realized network version of these estimators performs much better than the RC. Finally, we note that in this specific simulation setting the results based on the TSRC estimator are almost systematically more accurate than those based on the MRK and RC.

INSERT TABLE 2 ABOUT HERE

We give more detailed insight into the performance of the network estimator for Design 3. Figure 1 displays the RMSE of the realized network estimator, as an estimator of Σ*, as a function of the tuning parameter λ. The picture displays the typical profile of shrinkage type estimators: as the shrinkage parameter increases, the estimator rapidly becomes more efficient. We note that, in this simulation setting, the efficiency gains are quite substantial in relative terms.

INSERT FIGURE 1 ABOUT HERE

Overall, the results convey that the gains from using the realized network estimator when the partial correlation structure is sparse can be substantial.

5 Empirical Application

We use the realized network estimator to analyse the dependence structure of a panel of US bluechips from the NYSE throughout 2009. We then engage in a Markowitz style Global Minimum Variance portfolio forecasting exercise to showcase the advantages of our proposed methodology.
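For reference, the GMV portfolio weights implied by a covariance estimate are the standard w = Σ̂^{-1}ι / (ι'Σ̂^{-1}ι), with ι a vector of ones; a minimal sketch (not the paper's code):

```python
import numpy as np

def gmv_weights(Sigma):
    """Global Minimum Variance weights w = Sigma^{-1} 1 / (1' Sigma^{-1} 1)."""
    ones = np.ones(Sigma.shape[0])
    w = np.linalg.solve(Sigma, ones)
    return w / w.sum()
```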

5.1 Data & Estimation

We consider a sample of 100 liquid US bluechips that have been part of the S&P100 index for most of the 2000s. We also include in the panel the SPY ETF, the ETF tracking the S&P 500 index. We work with tick–by–tick transaction prices obtained from the NYSE–TAQ database. Before proceeding with the econometric analysis, the data are filtered using standard techniques described in Brownlees and Gallo (2006) and Barndorff-Nielsen et al. (2009). The full list of tickers, company names and industry groups is reported in Table 3.

INSERT TABLE 3 ABOUT HERE

We estimate the integrated covariance of the assets throughout 2009. More precisely, for each of the 52 weeks of 2009, we use the data on the last available weekday of each week to construct the realized covariance estimators. On each of these days, we only consider the tickers that have at least 1000 trades.

Exploratory analysis (not reported in the paper) confirms the well documented evidence of a CAPM–type factor structure in the panel. To this end, our realized covariance estimation strategy consists of first decomposing the realized covariance into systematic and idiosyncratic covariance components and then regularizing the idiosyncratic part with the realized network. More precisely, we compute the realized covariance of the assets in the panel together with the SPY ticker (the proxy of the market), and then obtain the systematic and idiosyncratic realized covariances of the assets on the basis of formula (6). Finally, we apply the realized network regularization procedure to the idiosyncratic realized covariance. The regularization parameter λ is chosen by minimizing a BIC–type criterion defined as

\mathrm{BIC}(\lambda) = M \times \left[ -\log\det \hat{K}_{I\,\lambda} + \operatorname{tr}\left( \hat{K}_{I\,\lambda} \hat\Sigma_I \right) \right] + \log M \times \#\{(i, j) : 1 \le i \le j \le n, \; \hat{k}_{I\,ij\,\lambda} \ne 0\},

as suggested in, among others, Yuan and Lin (2007). In each week of 2009, we estimate the realized network using three (idiosyncratic) realized covariance estimators: the MRK, the TSRC, as well as the “classic” RC using data sampled at a 5 minute frequency.
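A minimal sketch of this BIC-based selection over a grid of λ values, assuming scikit-learn's graphical_lasso as the solver and with M denoting the effective sample size (names ours):

```python
import numpy as np
from sklearn.covariance import graphical_lasso

def bic_select(Sigma_I, M, lambdas):
    """Sketch of the BIC-type selection of lambda described above.

    Sigma_I : idiosyncratic realized covariance estimate
    M       : effective sample size (e.g. minimum pairwise refresh size)
    lambdas : candidate grid of shrinkage values
    """
    best = (np.inf, None, None)
    for lam in lambdas:
        _, K = graphical_lasso(Sigma_I, alpha=lam)
        _, logdet = np.linalg.slogdet(K)
        n_par = np.count_nonzero(np.triu(K))          # nonzero k_ij, i <= j
        bic = M * (-logdet + np.trace(K @ Sigma_I)) + np.log(M) * n_par
        if bic < best[0]:
            best = (bic, lam, K)
    return best[1], best[2]
```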

5.2 Realized Network Estimates

In this section we present the realized network estimation results. We first provide details for one specific date, which roughly corresponds to the middle of the sample (June 26, 2009), and then report summaries for all estimated networks in 2009. In the interest of space we report the TSRC estimator results only; the MRK and RC provide essentially analogous conclusions.

We begin by showing in figure 2 the heatmap of the idiosyncratic correlation matrix

associated with the idiosyncratic realized covariance estimator on June 26. Notice that

the heatmap is constructed by sorting stocks by industry group and then by alphabetical

order. The picture clearly shows that after netting out the influence of the market factor,

a fair amount of cross–sectional dependence is still present across stocks. Inspection of the

heatmap reveals that the majority of estimated correlation coefficients are positive. The

correlation matrix exhibits a block–diagonal structure, hinting that correlation is stronger among firms in the same industry. On this date, the intra–industry correlation is particularly strong for energy companies.

INSERT FIGURE 2 ABOUT HERE

We estimate the realized network using the glasso and use the BIC to choose the optimal amount of shrinkage. To give insight into the sensitivity of the estimated network to the shrinkage parameter λ, the top panel of figure 3 reports the so–called trace, which is the graph of the estimated partial correlations as a function of the shrinkage parameter. The bottom panel of the same figure shows the value of the BIC criterion as a function of the shrinkage parameter. The plot highlights that the sparsity of the estimated network varies substantially over the range of λ values considered and that the majority of estimated partial correlations are positive irrespective of the amount of shrinkage imposed on the estimator.

INSERT FIGURE 3 ABOUT HERE

Figure 4 contains the idiosyncratic partial correlation network associated with the optimal λ. The estimated network has 188 edges, which correspond to approximately 5% of the total number of possible edges in the network on this date. The number of companies with at least one connection is 66 (roughly 2/3 of the total) and, interestingly, these companies are all connected in a single giant component.


A number of comments on the empirical characteristics of the network are in order. First, on this date, Google (GOOG) emerges as a particularly highly interconnected firm, with linkages spreading to several other industry groups. To investigate this further, we carry out a centrality analysis of the network using the PageRank algorithm and report the top ten most central companies in table 4. The PageRank algorithm confirms that Google is indeed the most central stock on this date.
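For illustration, these network summaries can be reproduced along the following lines with the networkx package; the precision matrix K and the list tickers are assumed inputs, and the use of absolute partial correlations as edge weights is our illustrative choice rather than the paper's.

import numpy as np
import networkx as nx

def partial_correlation_network(K, tickers):
    """Build the network implied by a sparse precision matrix K."""
    n = K.shape[0]
    G = nx.Graph()
    G.add_nodes_from(tickers)
    for i in range(n):
        for j in range(i + 1, n):
            if K[i, j] != 0.0:
                # partial correlation implied by the precision matrix
                rho = -K[i, j] / np.sqrt(K[i, i] * K[j, j])
                G.add_edge(tickers[i], tickers[j], weight=abs(rho), pcorr=rho)
    return G

# usage: PageRank ranking and the size of the giant component
# G = partial_correlation_network(K, tickers)
# top10 = sorted(nx.pagerank(G).items(), key=lambda kv: -kv[1])[:10]
# giant = max(nx.connected_components(G), key=len)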

The estimated network also shows some degree of industry clustering, that is, linkages are more frequent among firms in the same industry group. To shed further light on industry linkages, in table 5 we report the total number of links across industry groups. The table shows that firms in the industrials, energy, technology and financials groups are particularly

interconnected among each other. Lastly, in figure 5 we report the degree distribution of the estimated network and the distribution of the nonzero partial correlations. As far as the degree distribution is concerned, the network exhibits the typical features of power–law networks, that is, the number of connections is heterogeneous and the most interconnected stocks account for a large share of the total number of links. The histogram of the partial correlations shows that the majority of the partial correlations are positive and that the positive partial correlations are on average larger than the negative ones.

INSERT FIGURES 4, 5 AND TABLES 4, 5 ABOUT HERE

We report a number of summary statistics for the sequence of networks estimated in 2009. First, in figure 6 we report the proportion of active links in the network throughout the year. The picture shows that sparsity is stable at a value slightly below 5% of the total possible number of linkages. The plot also displays the sparsity rate vis–à–vis the VIX volatility index, showing that no particular time series pattern emerges in the network sparsity and that, in particular, sparsity is unrelated to the level of volatility of the market.

INSERT FIGURE 6 ABOUT HERE


Figure 7 shows the total number of links of each industry group divided by the total number of possible edges. The plot omits the series for materials, telecom and utilities due to their small size. Technology, energy, financials and industrials are the most interconnected sectors throughout 2009, and both the level and the cross sectional rankings are fairly stable across the sample. In order to give more insight into the degree of concentration within each group, in figure 8 we report the concentration of links in each industry group measured using the Herfindahl index. Again, materials, telecom and utilities are omitted from the graph. Once again, no particular time series pattern emerges from the plot and the cross sectional concentration rankings are quite stable. The picture shows that the most interlinked sectors have quite different concentration characteristics. Technology is one of the most highly concentrated sectors. Detailed inspection of the results reveals that this is driven by the fact that in 2009 Google is essentially the most interconnected ticker in the sample. On the other hand, industrials have the smallest average concentration, in that the number of links is quite uniformly distributed across firms and no specific “hub” emerges among these tickers.
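As a sketch, the Herfindahl index of the within-group link distribution can be computed as follows, assuming the graph G from the earlier sketch and a hypothetical mapping sectors from tickers to industry groups.

def sector_herfindahl(G, sectors):
    """Herfindahl index of the link distribution within each industry group.

    Values near 1/(group size) mean the links are evenly spread across the
    group's firms; values near one mean a single hub carries most of them.
    """
    hhi = {}
    for sector in set(sectors.values()):
        degrees = [G.degree(v) for v in G if sectors[v] == sector]
        total = sum(degrees)
        if total > 0:
            hhi[sector] = sum((d / total) ** 2 for d in degrees)
    return hhi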

INSERT FIGURES 7 AND 8 ABOUT HERE

Overall, the results convey that, after conditioning on a one–factor structure, the data still exhibit a fair amount of cross–sectional dependence and that networks provide a useful device to synthesize such information. The main empirical features of the network are stable throughout 2009. Firms in the energy and industrials sectors are strongly interconnected. Technology companies, and Google in particular, are the most highly interconnected firms throughout the year. We conjecture that this could be driven by the fact that, starting from March 2009 with the beginning of the post–crisis market rally, technology stocks and Google in particular were among the most rapidly recovering stocks of the year.

5.3 Predictive Analysis

In order to assess the ability of the regularized network methodology to provide more

precise estimates of the integrated covariance we carry out an asset allocation prediction

exercise (Hautsch et al., 2012, 2013; Engle and Colacito, 2006).


The forecasting exercise is designed as follows. For each week of 2009, we construct the Markowitz Global Minimum Variance (GMV) portfolio weights using the formula

w = Σ^{−1} 1 / ( 1′ Σ^{−1} 1 ),

where 1 is an n–dimensional vector of ones and

Σ = σ_m² β β′ + Σ_I.
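The weight formula is the standard solution of the minimum variance program min_w w′Σw subject to w′1 = 1: the first order condition of the Lagrangian w′Σw − κ(w′1 − 1) is 2Σw = κ1, so that w is proportional to Σ^{−1}1, and the full–investment constraint pins down the constant of proportionality as 1/(1′Σ^{−1}1).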

The resulting GMV portfolio weights are used to construct a portfolio of the assets which

is held for the following week. At the end of the week, we compute the daily sample

variance of the portfolio return for that week. We repeat the exercise for all the weeks

of 2009. The performance of the covariance estimators is evaluated by assessing which

estimator delivers the smallest average out–of–sample GMV portfolio variance.

Different estimation strategies for the covariance matrix Σ are employed. First, we estimate the covariance on the basis of the MRK, TSRC and RC estimators. For each

of these estimators we consider three approaches to compute Σ: the simple unconstrained

covariance estimator, the constrained estimator obtained by setting all the off–diagonal

elements of ΣI to zero and the realized network estimator where ΣI is estimated by the

realized network. Notice that the unconstrained realized covariance estimator is not guar-

anteed to be positive definite. When the estimator is close to being singular we regularize

it by adding to the matrix the identity times a positive scalar.
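A minimal sketch of the weight construction, including a ridge-style adjustment of the kind just described, follows; the inputs Sigma and eps and the condition-number threshold are illustrative assumptions rather than the paper's exact choices.

import numpy as np

def gmv_weights(Sigma, eps=1e-6):
    """Global Minimum Variance weights w = Sigma^{-1} 1 / (1' Sigma^{-1} 1)."""
    n = Sigma.shape[0]
    # ridge-style fix: if the estimate is close to singular, add eps * I
    if np.linalg.cond(Sigma) > 1e12:
        Sigma = Sigma + eps * np.eye(n)
    ones = np.ones(n)
    x = np.linalg.solve(Sigma, ones)   # Sigma^{-1} 1
    return x / (ones @ x)              # normalize so the weights sum to one

# usage: weekly out-of-sample variance of the buy-and-hold GMV portfolio,
# with sigma_m2, beta, Sigma_I and returns_next_week as assumed inputs
# w = gmv_weights(sigma_m2 * np.outer(beta, beta) + Sigma_I)
# oos_var = np.var(returns_next_week @ w, ddof=1)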

In figure 9 we report the time series of the weekly out–of–sample GMV annualized

volatility of the constrained, unconstrained and realized network procedures for the TSRC.

The figure shows that the unconstrained approach produces fairly high out–of–sample GMV volatilities. Constraining the idiosyncratic covariance to be diagonal improves average performance but still produces rather noisy estimates. The realized network approach produces stable and low GMV variability, which is almost uniformly more accurate than the unconstrained approach and is beaten by the constrained procedure only on a small number of instances.


INSERT FIGURE 9 ABOUT HERE

We report summary results on the forecasting exercise in Table 6. The table shows the average annualized volatility of the GMV portfolios. Asterisks denote the significance of a test for variance equality between the realized network GMV variance and the other GMV variances. The three covariance estimators deliver analogous inference. The unconstrained procedure delivers large out–of–sample variances. In line with theory, the RC estimator is the one that performs worst. Constraining the idiosyncratic covariance to be diagonal does not generally improve out–of–sample performance. The realized network always performs best and reduces the GMV variance by roughly a third with respect to the unconstrained estimator. Moreover, the variance reduction of the realized network is significant in all instances. Interestingly, the difference in forecasting performance across the different estimators is substantially smaller after carrying out the network regularization.

INSERT TABLE 6 ABOUT HERE

6 Conclusions

In this work we propose a regularization procedure for realized covariance estimators. The regularization consists of shrinking the off–diagonal elements of the inverse realized covariance matrix to zero using a Lasso–type penalty. Since estimating a sparse inverse covariance matrix is equivalent to detecting the partial correlation network structure of the log-prices, we call our procedure realized network. The technique is specifically designed for the Multivariate Realized Kernel and the Two–Scales Realized Covariance estimators based on refresh time sampling, which are state–of–the–art consistent covariance estimators that allow for market microstructure effects and asynchronous trading. We establish the large sample properties of the estimator and show that the realized network consistently estimates the integrated covariance and consistently selects the partial correlation network. We also establish novel concentration inequalities for the Multivariate Realized Kernel estimator. An empirical exercise is used to highlight the usefulness of the procedure, and an out–of–sample GMV portfolio asset allocation exercise is carried out to compare our procedure against a number of benchmarks. Results convey that the realized network enhances the prediction properties of classic realized covariance estimators.


References

Aït-Sahalia, Y., Mykland, P. A., and Zhang, L. (2005). How often to sample a continuous-time process in the presence of market microstructure noise. Review of Financial Studies, 18(2), 351–416.

Andersen, T. G., Bollerslev, T., Diebold, F. X., and Labys, P. (2003). Modeling and forecasting realized volatility. Econometrica, 71, 579–625.

Bandi, F. M. and Russell, J. R. (2006). Separating microstructure noise from volatility. Journal of Financial Economics, 79, 655–692.

Banerjee, O. and Ghaoui, L. E. (2008). Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. Journal of Machine Learning Research, 9, 485–516.

Barigozzi, M. and Brownlees, C. (2013). NETS: Network estimation for time series. Technical report, Barcelona GSE.

Barndorff-Nielsen, O. and Shephard, N. (2004). Econometric analysis of realized covariation: High frequency based covariance, regression, and correlation in financial economics. Econometrica, 72(3), 885–925.

Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A., and Shephard, N. (2008). Designing realized kernels to measure the ex post variation of equity prices in the presence of noise. Econometrica, 76, 1481–1536.

Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A., and Shephard, N. (2009). Realised kernels in practice: Trades and quotes. Econometrics Journal, 12, 1–32.

Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A., and Shephard, N. (2011). Multivariate realised kernels: Consistent positive semi-definite estimators of the covariation of equity prices with noise and non-synchronous trading. Journal of Econometrics, 162(2), 149–169.

Billio, M., Getmansky, M., Lo, A., and Pellizzon, L. (2012). Econometric measures of connectedness and systemic risk in the finance and insurance sectors. Journal of Financial Economics, 104, 535–559.

Brownlees, C. T. and Gallo, G. M. (2006). Financial econometric analysis at ultra-high frequency: Data handling concerns. Computational Statistics and Data Analysis, 51, 2232–2245.

Corsi, F., Peluso, S., and Audrino, F. (2015). Missing in asynchronicity: A Kalman-EM approach for multivariate realized covariance estimation. Journal of Applied Econometrics.

Dempster, A. P. (1972). Covariance selection. Biometrics, 28, 157–175.

Diebold, F. and Yilmaz, K. (2014). On the network topology of variance decompositions: Measuring the connectedness of financial firms. Journal of Econometrics, 182, 119–134.

Engle, R. and Colacito, R. (2006). Testing and valuing dynamic correlations for asset allocation. Journal of Business and Economic Statistics, 24, 238–253.

Fan, J., Li, Y., and Yu, K. (2012). Vast volatility matrix estimation using high-frequency data for portfolio selection. Journal of the American Statistical Association, 107(497), 412–428.

Friedman, J., Hastie, T., Höfling, H., and Tibshirani, R. (2007). Pathwise coordinate optimization. The Annals of Applied Statistics, 1, 302–332.

Friedman, J., Hastie, T., and Tibshirani, R. (2011). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9, 432–441.

Hautsch, N., Kyj, L. M., and Oomen, R. C. A. (2012). A blocking and regularization approach to high dimensional realized covariance estimation. Journal of Applied Econometrics, 27(4), 625–645.

Hautsch, N., Kyj, L. M., and Malec, P. (2013). Do high-frequency data improve high-dimensional portfolio allocations? Journal of Applied Econometrics, forthcoming.

Hautsch, N., Schaumburg, J., and Schienle, M. (2014a). Financial network systemic risk contributions. Review of Finance, forthcoming.

Hautsch, N., Schaumburg, J., and Schienle, M. (2014b). Forecasting systemic impact in financial networks. International Journal of Forecasting, 30(3), 781–794.

Ledoit, O. and Wolf, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88, 365–411.

Lunde, A., Shephard, N., and Sheppard, K. (2011). Econometric analysis of vast covariance matrices using composite realized kernels. Technical report.

Meinshausen, N. and Bühlmann, P. (2006). High dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34, 1436–1462.

Ravikumar, P., Wainwright, M. J., Raskutti, G., and Yu, B. (2011). High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence. Electronic Journal of Statistics, 5, 935–980.

Yuan, M. and Lin, Y. (2007). Model selection and estimation in the Gaussian graphical model. Biometrika, 94(1), 19–35.

Zhang, L. (2011). Estimating covariation: Epps effect, microstructure noise. Journal of Econometrics, 160(1), 33–47.

Table 1: Simulation Settings

Mnemonic    Dimension n    Volatility (v, κ, τ)    Network p    Microstructure (η², µ)
Design 1    50             50, 0.1, 2.75           0.9/50       0.25, 0.5
Design 2    50             50, 0.1, 2.75           1.0/50       0.25, 0.5
Design 3    50             50, 0.1, 2.75           1.1/50       0.25, 0.5

The table reports the parameter settings used in the simulation study: n is the number of stocks, v, κ and τ are the parameters of the volatility process, p is the probability that the partial correlation of any two stocks is not zero, η² is the variance of the microstructure noise and µ is the Poisson intensity driving the trading process.

Table 2: Realized Network Simulation Study

                Σ RMSE     Σ RMSE     K RMSE     K RMSE     AUROC
                (plain)    (net)      (plain)    (net)      (net)

Realized Covariance
Design 1        36.978     22.844     0.076      0.027      0.708
Design 2        37.757     23.536     0.071      0.028      0.644
Design 3        35.700     22.603     0.077      0.031      0.667

Multivariate Realized Kernel
Design 1        34.068     15.899     1.831      0.023      0.764
Design 2        34.535     16.654     1.090      0.022      0.697
Design 3        32.786     16.022     1.065      0.025      0.727

Two–Scales Realized Covariance
Design 1        30.326     12.885     1.137      0.021      0.806
Design 2        31.065     13.167     1.900      0.022      0.719
Design 3        29.110     12.840     0.828      0.024      0.746

The table reports the results of the simulation study on the selection and estimation precision of the realized network estimator. “Plain” stands for the unregularized estimators and “Net” refers to the regularized ones.
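For reference, the selection accuracy (AUROC) in the last column can be computed by casting edge detection as a binary classification problem; the sketch below relies on scikit-learn's roc_auc_score, and the use of off-diagonal magnitudes as edge scores is an illustrative assumption rather than the paper's exact construction.

import numpy as np
from sklearn.metrics import roc_auc_score

def edge_auroc(K_true, K_est):
    """AUROC of edge selection: true edges vs. estimated edge scores."""
    iu = np.triu_indices_from(K_true, k=1)
    labels = (K_true[iu] != 0).astype(int)  # 1 where an edge truly exists
    scores = np.abs(K_est[iu])              # magnitudes used as edge scores
    return roc_auc_score(labels, scores)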


Table 3: Tickers, Company Names and Sectors

Consumer Discretionary: AMZN Amazon.com; CMCSA Comcast; DIS Walt Disney; F Ford Motor; FOXA Twenty-First Century Fox; GM General Motors; HD Home Depot; LOW Lowes; MCD McDonalds; NKE NIKE; SBUX Starbucks; TGT Target; TWX Time Warner Inc.

Consumer Staples: CL Colgate-Palmolive; COST Costco Co.; CVS CVS Caremark; KO The Coca Cola Company; MDLZ Mondelez International; MO Altria Group Inc; PEP PepsiCo Inc.; PG Procter & Gamble; PM Philip Morris; WAG Walgreen; WMT Wal-Mart Stores.

Energy: APA Apache; APC Anadarko Petroleum; COP ConocoPhillips; CVX Chevron; DVN Devon Energy; HAL Halliburton Co.; NOV National Oilwell Varco; OXY Occidental Petroleum; SLB Schlumberger Ltd.; XOM Exxon Mobil.

Financials: AIG AIG; ALL Allstate; AXP American Express Co; BAC Bank of America; BK Bank of New York; BRK.B Berkshire Hathaway; C Citigroup Inc.; COF Capital One Financial; GS Goldman Sachs Group; JPM JP Morgan Chase & Co.; MET MetLife Inc.; MS Morgan Stanley; SPG Simon Property Group; USB U.S. Bancorp; WFC Wells Fargo.

Health Care: ABBV AbbVie; ABT Abbott Laboratories; AMGN Amgen; BAX Baxter International; BMY Bristol-Myers Squibb; GILD Gilead Sciences; JNJ Johnson & Johnson; LLY Lilly & Co.; MDT Medtronic; MRK Merck; PFE Pfizer; UNH United Health Group.

Industrials: BA Boeing Company; CAT Caterpillar; EMR Emerson Electric; FDX FedEx; GD General Dynamics; GE General Electric; HON Honeywell Intl; LMT Lockheed Martin; MMM 3M Company; NSC Norfolk Southern; RTN Raytheon Co.; UNP Union Pacific; UPS United Parcel Service; UTX United Technologies.

Information Technology: AAPL Apple; ACN Accenture; CSCO Cisco Systems; EBAY eBay; EMC EMC; FB Facebook; GOOG Google Inc.; HPQ Hewlett-Packard; IBM IBM; INTC Intel; MA Mastercard; MSFT Microsoft; ORCL Oracle; QCOM QUALCOMM Inc.; TXN Texas Instruments; V Visa Inc.

Materials: DD DuPont; DOW Dow Chemical; FCX Freeport-McMoran; MON Monsanto Co.

Telecommunications: T AT&T Inc; VZ Verizon Communications.

Utilities: AEP American Electric Power; EXC Exelon; SO Southern Co.

The table reports the list of company tickers, company names and industry sectors.

Table 4: Centrality on 2009-06-26

Rank Ticker Sector

1     GOOG    Information Technology
2     MA      Information Technology
3     SLB     Energy
4     FCX     Materials
5     APA     Energy
6     COP     Energy
7     OXY     Energy
8     APC     Energy
9     DVN     Energy
10    CVX     Energy

The table reports the top ten tickers by PageRank centrality on June 26, 2009.

Table 5: Links on 2009-06-26

       Disc  Stap  Ener  Fin  Heal  Ind  Tech  Mat  Tel  Util
Disc      1     2     .    .     .    4     8    1    .     .
Stap      2     1     1    .     1    1     6    1    .     .
Ener      .     1    35    .     2    2    12    6    .     .
Fin       .     .     .   11     .    2    14    3    .     .
Heal      .     1     2    .     3    .     7    3    .     .
Ind       4     1     2    2     .   12    20    4    .     .
Tech      8     6    12   14     7   20    14    6    .     2
Mat       1     1     6    3     3    4     6    3    .     .
Tel       .     .     .    .     .    .     .    .    .     .
Util      .     .     .    .     .    .     2    .    .     .

The table reports the number of estimated links among industry groups on June 26, 2009.

Table 6: GMV Forecasting

        Realized Network    Unconstrained    Constrained
MRK     24.66               31.83***         29.51**
TSRC    23.78               37.43***         41.41***
RV      23.82               39.99***         40.65***

The table reports the results of the GMV forecasting comparison exercise. “Realized Network” means that Σ_I is the realized network estimator, “Unconstrained” means that Σ_I is not regularized, and “Constrained” means that Σ_I is obtained by setting all of its off-diagonal entries to zero.


Figure 1: Realized Network Estimator Precision

The figure displays the Monte Carlo average of the scaled Frobenius norm of the realized network estimator based on the Multivariate Realized Kernel (triangles), Two–Scales Realized Covariance (circles) and Realized Covariance (squares) as a function of the tuning parameter λ.

Figure 2: Idiosyncratic Correlation Heatmap on 2009-06-26

The figure shows the heatmap of the idiosyncratic realized correlation matrix on June 26, 2009, estimated using the TSRC estimator. Darker colors indicate higher correlations in absolute value.


Figure 3: Trace on 2009-06-26

The figure shows the trace and BIC of the realized network estimator on June 26, 2009.


Figure 4: Idiosyncratic Partial Correlation Network on 2009-06-26


The figure shows the optimal realized network estimated on June 26, 2009.

Figure 5: Degree and Partial Correlation Distribution on 2009-06-26

The figure shows the degree distribution and the distribution of partial correlations on June 26, 2009.


Figure 6: Sparsity vs. Volatility

The figure shows the sparsity of the estimated network (squares) vis–à–vis the level of volatility measured by the VIX (circles) for each week of 2009.

Figure 7: Sectorial Links

The figure shows the number of linkages of the different industry groups over the total number of possible linkages for each week of 2009.


Figure 8: Sectorial Concentration

The figure shows the link concentration (measured using the Herfindahl index) of the different industry groups for each week of 2009.

Figure 9: GMV Forecasting

The figure shows the weekly annualized volatilities of the GMV portfolios for each week of 2009. The plot shows the GMV weekly volatility of the realized network based weights (circles), unrestricted covariance weights (squares) and restricted covariance weights (triangles).


A Technical Appendix

In this section we provide the proofs of Theorems 1, 3, 4, 5 and 6, Lemma 1 and Corollary 1.

Proof of Theorem 1. We start by proving the concentration inequality for a generic asset i. By (1), the process of the efficient log-price of this asset is given by

y_i(t) = Σ_{j=1}^n ∫_0^t θ_{ij}(u) dB_j(u),

where θ_{ij}(u) is the (i, j)-entry of Θ(u), and the B_j are the iid one-dimensional components of the n-dimensional Brownian motion B. Recall that asset i is observed at the times T_i = {t_{i1}, ..., t_{im_i}} on [0, 1], which are assumed to satisfy sup_{2 ≤ j ≤ m_i} m_i (t_{ij} − t_{ij−1}) ≤ c_Δ (Assumption 3). We use the notation x_j, y_j, u_j to denote x(t_{ij}), y(t_{ij}), u(t_{ij}), respectively, for j ∈ {1, ..., m_i}, and η² denotes the variance of the centered Gaussian random variables u_j. We also denote m_i by m. In order to simplify the notation we will also write θ(u)dB(u) to denote the scalar product Σ_{j=1}^n θ_{ij}(u) dB_j(u). In all what follows c > 0 denotes a generic constant that may change from line to line.

By definition (3), the realized kernel estimator of σ*_{ii} equals

σ^K_{ii} = γ_0(y, y) + A + B + C,    (15)

where

A = Σ_{h=1}^H k((h−1)/H) ( γ_h(y, y) + γ_{−h}(y, y) ),

B = 2γ_0(y, u) + Σ_{h=1}^H k((h−1)/H) ( γ_h(y, u) + γ_h(u, y) + γ_{−h}(y, u) + γ_{−h}(u, y) ),

C = γ_0(u, u) + Σ_{h=1}^H k((h−1)/H) ( γ_h(u, u) + γ_{−h}(u, u) ),

and, for each h ∈ {−H, ..., H}, γ_h(y, u) = Σ_{j=1}^{m−1} (y_j − y_{j−1})(u_{j−h} − u_{j−h−1}). By Assumption 4, H = c m^{1/2}, where c is some positive constant.

By Lemma 3 in Fan et al. (2012), there exist constants c_1, c_2 > 0 such that for large m and x ∈ [0, c_1],

P( |γ_0(y, y) − σ*_{ii}| ≥ x ) ≤ 2 exp(−c_2 m x²).    (16)

Therefore, by (15) and (16), it suffices to bound the terms A, B and C in probability. For this, we will use the following well-known exponential martingale inequality.

Lemma 2. Let (γ(t), t ∈ [0, 1]) be an adapted process such that there exists a constant c > 0 for which

Σ_{j=1}^K ∫_{t_{j−1}}^{t_j} |γ(u)|² du ≤ c   a.s.,

where 0 ≤ t_0 < t_1 < ··· < t_K ≤ 1 is a partition of [0, 1]. Then for any x > 0,

P( | Σ_{j=1}^K ∫_{t_{j−1}}^{t_j} γ(u) dB(u) | ≥ x ) ≤ 2 exp( −x² / (2c) ).

Bound in probability of the term A in (15). We decompose A as

A = Σ_{j=1}^{m−1} ∫_{t_{j−1}}^{t_j} θ(u)dB(u) Σ_{h=1}^H k((h−1)/H) ( ∫_{t_{j−h−1}}^{t_{j−h}} θ(u)dB(u) + ∫_{t_{j+h−1}}^{t_{j+h}} θ(u)dB(u) ) = A_1 + A_2.

We start bounding A_1. For any x > 0 and α > 0, we have

P( |A_1| ≥ x/2 ) ≤ P( | Σ_{j=1}^{m−1} ∫_{t_{j−1}}^{t_j} θ(u)dB(u) D_j 1{sup_j |D_j| ≤ m^{−1/4+α}} | ≥ x/4 ) + P( sup_{j∈{1,...,m−1}} |D_j| > m^{−1/4+α} ),

where

D_j = Σ_{h=1}^H k((h−1)/H) ∫_{t_{j−h−1}}^{t_{j−h}} θ(u)dB(u).

By Assumptions 2 and 3,

Σ_{h=1}^H k²((h−1)/H) ∫_{t_{j−h−1}}^{t_{j−h}} |θ(u)|² du ≤ c H c_Δ / m = c m^{−1/2}.

Therefore, appealing to Lemma 2, we get that

P( sup_{j∈{1,...,m−1}} |D_j| > m^{−1/4+α} ) ≤ 2m exp(−c m^{2α}).

On the other hand, since

Σ_{j=1}^{m−1} ∫_{t_{j−1}}^{t_j} |θ(u)|² du · D_j² 1{sup_j |D_j| ≤ m^{−1/4+α}} ≤ c m^{2α−1/2},

again using Lemma 2, we obtain that for all x > 0

P( | Σ_{j=1}^{m−1} ∫_{t_{j−1}}^{t_j} θ(u)dB(u) D_j 1{sup_j |D_j| ≤ m^{−1/4+α}} | ≥ x/4 ) ≤ 2 exp(−c m^{1/2−2α} x²).

Therefore, choosing α > 0 such that m^{2α} = m^{1/4} x, we conclude that

P( |A_1| ≥ x/2 ) ≤ 2m exp(−c m^{1/4} x).    (17)

We next bound A_2. We write A_2 = A_{21} + A_{22} + A_{23}, where

A_{21} = Σ_{i=1}^{H−1} ∫_{t_i}^{t_{i+1}} θ(u)dB(u) Σ_{j=0}^{i−1} k((i−j)/H) ∫_{t_j}^{t_{j+1}} θ(u)dB(u),

A_{22} = Σ_{i=H}^{m−1} ∫_{t_i}^{t_{i+1}} θ(u)dB(u) Σ_{j=i−H}^{i−1} k((i−j)/H) ∫_{t_j}^{t_{j+1}} θ(u)dB(u),

A_{23} = Σ_{i=m}^{m+H−1} ∫_{t_i}^{t_{i+1}} θ(u)dB(u) Σ_{j=i−H}^{m−2} k((i−j)/H) ∫_{t_j}^{t_{j+1}} θ(u)dB(u).

We start bounding A_{22}. For a generic i, using Assumptions 2, 3 and 4, we get that

Σ_{j=i−H}^{i−1} ∫_{t_j}^{t_{j+1}} k²((i−j)/H) |θ(u)|² du ≤ c m^{−1/2}.

Thus, according to Lemma 2, for any α > 0, we obtain that

P( sup_{i∈{H,...,m−1}} |F_i| > m^{α−1/4} ) ≤ 2m exp(−c m^{2α}),

where F_i = Σ_{j=i−H}^{i−1} k((i−j)/H) ∫_{t_j}^{t_{j+1}} θ(u)dB(u). Therefore, using again Lemma 2, we conclude that for any x > 0 and α > 0,

P( |A_{22}| ≥ x/6 ) ≤ P( | Σ_{i=H}^{m−1} ∫_{t_i}^{t_{i+1}} θ(u) F_i 1{sup_i |F_i| ≤ m^{α−1/4}} dB(u) | ≥ x/12 ) + P( sup_{i∈{H,...,m−1}} |F_i| > m^{α−1/4} )
  ≤ 2 exp(−c m^{1/2−2α} x²) + 2m exp(−c m^{2α}),

since

Σ_{i=H}^{m−1} ∫_{t_i}^{t_{i+1}} |θ(u)|² F_i² 1{sup_i |F_i| ≤ m^{α−1/4}} du ≤ c m^{2α−1/2}.

Thus, choosing α as in (17), we conclude that A_{22} also satisfies (17). Similarly, so do A_{21}, A_{23}, and hence A.

Bound in probability of the term B in (15). We decompose B as B = B_1 + B_2, where

B_1 = γ_0(y, u) + Σ_{h=1}^H k((h−1)/H) ( γ_h(y, u) + γ_{−h}(y, u) ),

B_2 = γ_0(u, y) + Σ_{h=1}^H k((h−1)/H) ( γ_h(u, y) + γ_{−h}(u, y) ).

We start bounding B_1. We write

B_1 = Σ_{j=1}^{m−1} ∫_{t_{j−1}}^{t_j} θ(u)dB(u) G_j,

where

G_j = Σ_{h=1}^H k((h−1)/H) ( u_{j−h} − u_{j−h−1} + u_{j+h} − u_{j+h−1} ) + u_j − u_{j−1}.    (18)

Observe that since k(0) = 1 and k(1) = 0, we have

G_j ~ N( 0, 2η² Σ_{h=1}^H ( k(h/H) − k((h−1)/H) )² ).

Moreover, there exists β_h ∈ [(h−1)/H, h/H] such that

Σ_{h=1}^H ( k(h/H) − k((h−1)/H) )² = (1/H) Σ_{h=1}^H (1/H) k′(β_h)² ≤ c m^{−1/2}.

Therefore, for all α > 0,

P( sup_{j∈{1,...,m−1}} |G_j| > m^{α−1/4} ) ≤ 2m exp(−c m^{2α}).

Consequently, by Lemma 2, for any x > 0,

P( |B_1| ≥ x ) ≤ P( | Σ_{j=1}^{m−1} ∫_{t_{j−1}}^{t_j} θ(u)dB(u) G_j 1{sup_j |G_j| ≤ m^{α−1/4}} | ≥ x ) + P( sup_{j∈{1,...,m−1}} |G_j| > m^{α−1/4} )
  ≤ 2 exp(−c m^{1/2−2α} x²) + 2m exp(−c m^{2α}),

since

Σ_{j=1}^{m−1} ∫_{t_{j−1}}^{t_j} |θ(u)|² G_j² 1{sup_j |G_j| ≤ m^{α−1/4}} du ≤ c m^{2α−1/2}.

Thus, choosing α as in (17), we conclude that B_1 also satisfies (17).

We next bound B_2. We write

B_2 = Σ_{j=1}^{m−1} (u_j − u_{j−1}) S_j = −S_1 u_0 + Σ_{j=2}^{m−1} (S_{j−1} − S_j) u_{j−1} + S_{m−1} u_{m−1},

where

S_j = Σ_{h=1}^H k((h−1)/H) ( ∫_{t_{j−h−1}}^{t_{j−h}} θ(u)dB(u) + ∫_{t_{j+h−1}}^{t_{j+h}} θ(u)dB(u) ) + ∫_{t_{j−1}}^{t_j} θ(u)dB(u).

Now, for all x > 0 and α > 0, we write

P( |B_2| ≥ x ) ≤ P( |B_2 1{E ≤ m^{2α−1/2}}| ≥ x/2 ) + P( E > m^{2α−1/2} ),

where E = S_1² + Σ_{j=2}^{m−1} (S_{j−1} − S_j)² + S_{m−1}².

Since the random variables u_0, u_1, ..., u_m are iid centered Gaussian with variance η²,

P( |B_2 1{E ≤ m^{2α−1/2}}| ≥ x/2 ) ≤ 2 exp(−c m^{1/2−2α} x²).

In order to bound the second term, we observe that

E = S_1(2S_1 − S_2) + Σ_{j=2}^{m−2} S_j(2S_j − S_{j−1} − S_{j+1}) + S_{m−1}(2S_{m−1} − S_{m−2}).

Thus,

P( E > 7 m^{2α−1/2} ) ≤ P( sup_{j∈{2,...,m−2}} |2S_j − S_{j−1} − S_{j+1}| > m^{α−5/4} ) + P( sup_{j∈{1,...,m−1}} |S_j| > m^{α−1/4} ).

Straightforward computations show that both terms are bounded by cm exp(−c m^{2α}). Therefore, choosing α as in (17), we conclude that B_2, and thus B, also satisfies (17).

Bound in probability of the term C in (15). We write

C = −u_0 e_1 + Σ_{j=1}^{m−2} (e_j − e_{j+1}) u_j + e_{m−1} u_{m−1},

where e_j = Σ_{h=2}^H k((h−1)/H) ( u_{j−h} − u_{j−h−1} + u_{j+h} − u_{j+h−1} ) + u_{j+1} − u_{j−2}.

For all x > 0 and α > 0, we write

P( |C| ≥ x ) ≤ P( |C 1{F ≤ m^{2α−1/2}}| ≥ x/2 ) + P( F > m^{2α−1/2} ),

where F = e_1² + Σ_{j=2}^{m−1} (e_j − e_{j−1})² + e_{m−1}².

Again, straightforward computations show that

P( F > 3 m^{2α−1/2} ) ≤ P( sup_{j∈{2,...,m−1}} (e_j − e_{j−1})² > m^{2α−3/2} ) + P( e_1² > m^{2α−1/2} ) + P( e_{m−1}² > m^{2α−1/2} ) ≤ cm exp(−c m^{2α}).

Hence, the rest of the proof follows as for B_2, which shows that C also satisfies (17).

Conclusion. The bounds obtained for A, B, and C, together with (15) and (16), show that there exist constants c_1, c_2, c_3, c_4 > 0 such that for large m and x ∈ [0, c_1],

P( |σ^K_{ii} − σ*_{ii}| ≥ x ) ≤ 2 exp(−c_2 m x²) + c_3 m exp(−c_4 m^{1/4} x).

Now, when m^{−3/4} < x ≤ c_1, we have x² m ≥ x m^{1/4}, and thus we obtain that

P( |σ^K_{ii} − σ*_{ii}| ≥ x ) ≤ c_5 m exp(−c_6 m^{1/4} x).

On the other hand, when 0 ≤ x ≤ m^{−3/4}, we have that

m exp(−c_4 x m^{1/4}) ≥ m exp(−c_4 m^{−1/2}) > 1,

and thus the same bound holds. This concludes the proof of the theorem for the diagonal terms.

We are now going to prove the concentration inequality for two generic assets i and j. By (1), the integrated covariance of the log-returns of assets i and j on [0, 1] is given by

∫_0^1 Θ_i(u)Θ_j(u)′ du = (1/4) ∫_0^1 (Θ_i(u) + Θ_j(u))(Θ_i(u) + Θ_j(u))′ du − (1/4) ∫_0^1 (Θ_i(u) − Θ_j(u))(Θ_i(u) − Θ_j(u))′ du,    (19)

where Θ_k(u), k ∈ {1, ..., n}, denote the rows of the matrix Θ(u). Therefore, in order to estimate the integrated covariance, it suffices to estimate the two terms in (19).

The MRK estimator of s*_{ij} = ∫_0^1 (Θ_i(u) + Θ_j(u))(Θ_i(u) + Θ_j(u))′ du is given by

s^K_{ij} = γ_0(x_i^r + x_j^r) + Σ_{h=1}^H k((h−1)/H) ( γ_h(x_i^r + x_j^r) + γ_{−h}(x_i^r + x_j^r) ),

where γ_h(x_i^r + x_j^r) = γ_h(x_i^r + x_j^r, x_i^r + x_j^r) for all h ∈ {−H, ..., H}.

Observe that if there were no asynchronicity among the observations, then we could derive the concentration inequality for s^K_{ij} as we did for the case of a single asset. Therefore, it suffices to split the estimator as s^K_{ij} = s̃^K_{ij} + Σ_{q=1}^{10} F_q, where s̃^K_{ij} is the analogue of s^K_{ij} with x_i^r + x_j^r replaced by x_i + x_j, where x_{ℓk} := y_ℓ(τ_k) + u^r_{ℓk}, for ℓ = i, j and k ∈ {1, ..., m}. The terms F_q are those that contain the points Δy_{ℓk} := y_ℓ(τ_k) − y^r_{ℓk}, that is,

F_1 = γ_0(Δy_i, x_i + x_j) + Σ_{h=1}^H k((h−1)/H) ( γ_h(Δy_i, x_i + x_j) + γ_{−h}(Δy_i, x_i + x_j) ),

F_3 = γ_0(x_i + x_j, Δy_i) + Σ_{h=1}^H k((h−1)/H) ( γ_h(x_i + x_j, Δy_i) + γ_{−h}(x_i + x_j, Δy_i) ),

F_5 = γ_0(Δy_i, Δy_j) + Σ_{h=1}^H k((h−1)/H) ( γ_h(Δy_i, Δy_j) + γ_{−h}(Δy_i, Δy_j) ),

F_7 = γ_0(Δy_i, u_i^r + u_j^r) + Σ_{h=1}^H k((h−1)/H) ( γ_h(Δy_i, u_i^r + u_j^r) + γ_{−h}(Δy_i, u_i^r + u_j^r) ),

F_9 = γ_0(u_i^r + u_j^r, Δy_i) + Σ_{h=1}^H k((h−1)/H) ( γ_h(u_i^r + u_j^r, Δy_i) + γ_{−h}(u_i^r + u_j^r, Δy_i) ).

F_2, F_4, F_6, F_8 and F_{10} are obtained from F_1, F_3, F_5, F_7 and F_9, respectively, by replacing Δy_i with Δy_j and vice versa.

By the previous proof of the concentration inequality for a single asset, there exist constants c_1, c_2, c_3 > 0 such that for large m and x ∈ [0, c_1],

P( |s̃^K_{ij} − s*_{ij}| ≥ x ) ≤ c_2 m exp(−c_3 m^{1/4} x).

On the other hand, straightforward computations show that for each q ∈ {1, ..., 10} there exist constants c_1, c_2, c_3 > 0 such that for large m and x ∈ [0, c_1],

P( |F_q| ≥ x ) ≤ c_2 m exp(−c_3 m^{1/4} x).

Proceeding similarly, we obtain the same estimate for the MRK estimator of the second term in (19), which concludes the desired proof.

Proof of Corollary 1. We prove the corollary when Σ is the TSRC estimator. The proof follows similarly for the MRK estimator.

Notice that Σ*_{FF} = σ*_{n+1,n+1} and Σ_{FF} = σ_{n+1,n+1}. Then for all i, j ∈ V,

σ*_{I,ij} = σ*_{ij} − σ*_{n+1,i} σ*_{j,n+1} / σ*_{n+1,n+1}   and   σ_{I,ij} = σ_{ij} − σ_{n+1,i} σ_{j,n+1} / σ_{n+1,n+1}.

Therefore,

P( |σ_{I,ij} − σ*_{I,ij}| > x ) ≤ P( |σ_{ij} − σ*_{ij}| ≥ x/2 ) + P( | σ_{n+1,i}σ_{j,n+1}/σ_{n+1,n+1} − σ*_{n+1,i}σ*_{j,n+1}/σ*_{n+1,n+1} | ≥ x/2 ).

According to (5), for large M, x ∈ [0, a_1] and i, j ∈ V, the first term is bounded by a_2 e^{−a_3 M^{1/3} x²}. In order to bound the second term, set b_1 = σ_{n+1,n+1} − σ*_{n+1,n+1}, b_2 = σ_{n+1,i} − σ*_{n+1,i}, and b_3 = σ_{j,n+1} − σ*_{j,n+1}. Then,

| σ_{n+1,i}σ_{j,n+1}/σ_{n+1,n+1} − σ*_{n+1,i}σ*_{j,n+1}/σ*_{n+1,n+1} |
  = | ( b_3 σ*_{n+1,n+1} σ*_{n+1,i} + b_2 σ*_{n+1,n+1} σ*_{j,n+1} + b_2 b_3 σ*_{n+1,n+1} − b_1 σ*_{n+1,i} σ*_{j,n+1} ) / ( σ*_{n+1,n+1} (σ*_{n+1,n+1} + b_1) ) |.    (20)

Without loss of generality, we assume that σ*_{j,n+1} and σ*_{n+1,i} are non-zero; otherwise the proof follows similarly. Then,

P( | σ_{n+1,i}σ_{j,n+1}/σ_{n+1,n+1} − σ*_{n+1,i}σ*_{j,n+1}/σ*_{n+1,n+1} | ≥ 8x )
  ≤ P( |b_1| ≥ max( σ*_{n+1,n+1}/2, σ*_{n+1,n+1}² x / |σ*_{n+1,i} σ*_{j,n+1}| ) )
  + P( |b_2| ≥ max( σ*_{n+1,n+1} x / |σ*_{j,n+1}|, 1 ) )
  + P( |b_3| ≥ σ*_{n+1,n+1} x / |σ*_{n+1,i}| ).

Observe that in the last inequality we have used the fact that if |b_1| ≤ σ*_{n+1,n+1}/2, then we can lower bound the denominator in (20) by σ*_{n+1,n+1}²/2. Then, appealing again to the concentration inequality (5), for large M and x ∈ [0, c_1], this term is also bounded by c_2 e^{−c_3 M^{1/3} x²}. Thus, the result follows.

Proof of Lemma 1. By Lagrangian duality, for each positive λ there exists a positive constant c(λ) such that the minimization problem (11) can be written in the equivalent constrained form

min_{K ∈ S_n, ||K||_{1,o} ≤ c(λ)} [ tr(KΣ) − log det(K) ].

When ||K||_{1,o} ≤ c(λ), we can find a positive constant c such that

h(K) := tr(KΣ) − log det(K) ≥ Σ_{i=1}^n k_{ii} σ_{ii} − log det(K) − c.

Set S_1 = {K ∈ S_n : ||K||_{1,o} ≤ c(λ)}. For a generic i, the function f(x) = x σ_{ii} − log x is increasing on (1/σ_{ii}, +∞), decreasing on (0, 1/σ_{ii}), and tends to infinity as x → ∞ or x → 0⁺. Thus we can find positive constants a_i, b_i such that for any K_0 = (k⁰_{ij}) ∈ S_1 \ S_2, where S_2 = {K ∈ S_n : ||K||_{1,o} ≤ c(λ), a_i ≤ k_{ii} ≤ b_i for all i}, and any K_1 ∈ S_2, we have

Σ_{i=1}^n ( k⁰_{ii} σ_{ii} − log k⁰_{ii} ) > c + h(K_1).

According to Hadamard's inequality for positive definite matrices, det(K_0) ≤ Π_{i=1}^n k⁰_{ii}. Therefore,

h(K_0) ≥ Σ_{i=1}^n k⁰_{ii} σ_{ii} − log det(K_0) − c ≥ Σ_{i=1}^n ( k⁰_{ii} σ_{ii} − log k⁰_{ii} ) − c > h(K_1).

Consequently, the minimum of h(K) over S_2 is also its minimum over S_1. Because h(K) is a convex function and S_2 is compact, the minimum is unique. By standard optimality conditions for convex programs, the minimizer K satisfies (12).

In order to prove Theorems 3, 4, 5 and 6, we need the following preliminary lemmas. Set W = (w_{ij}) = Σ − Σ*, Δ = K − K*, and R(Δ) = (K* + Δ)^{−1} − K*^{−1} + K*^{−1} Δ K*^{−1}.

Lemma 3. Suppose that for some λ > 0,

max{ ||W||_∞, ||R(Δ)||_∞ } ≤ αλ/8,

where α ∈ (0, 1] is from Assumption 6. Then for all (i, j) ∈ S^c, |z_{ij}| ≤ 1. That is, step (c) in the construction of the primal–dual witness solution succeeds.

Lemma 4. Suppose that ||Δ||_∞ ≤ 1/(3 C_{Σ*} d), where d is the maximum degree of the network, that is, the maximum number of edges that include a vertex, and C_{Σ*} = |||Σ*|||_∞. Then

||R(Δ)||_∞ ≤ (3/2) d ||Δ||²_∞ C_{Σ*}³.

Lemma 5. Suppose that

r = 2 C_{Γ*} ( ||W||_∞ + λ ) ≤ ( 3 d max( C_{Σ*}, C_{Σ*}³ C_{Γ*} ) )^{−1},    (21)

where C_{Γ*} = ||| (Γ*_{SS})^{−1} |||_∞. Then ||Δ||_∞ ≤ r.

Lemma 6. Assume that (21) holds. Assume also that k_m > 2 C_{Γ*} ( ||W||_∞ + λ ), where k_m is the minimum absolute value of the non-zero entries of K*. Then sign(k_{ij}) = sign(k*_{ij}) for any (i, j) ∈ S.

In the following, Lemma 7 covers the case in which Σ is the TSRC estimator, and Lemma 8 the case of the MRK estimator.

Lemma 7. Assume that for some τ > 2, M^{−1/6} [ log(a_2 n^τ) / a_3 ]^{1/2} ≤ a_1, where the a_i are the constants in (5). Then

P( ||W||_∞ > M^{−1/6} [ log(a_2 n^τ) / a_3 ]^{1/2} ) ≤ 1/n^{τ−2}.

Lemma 8. Assume that M ≥ 16 ( log( n^{−τ} a_3⁴ x⁴ / a_2 ) )⁴ / ( a_3⁴ x⁴ ) for some τ > 2, where the a_i are the constants in (4). Then there exists c_0 depending on the a_i such that for all x ∈ [0, c_0],

P( ||W||_∞ ≥ x ) ≤ 1/n^{τ−2}.

Proof of Lemma 3. We first rewrite condition (14) in the equivalent form

K*^{−1} Δ K*^{−1} + W − R(Δ) + λZ = 0.    (22)

In order to transform this matrix equation into a vector equation, we use the following notation: for any matrix A, we write vec(A) for the vector obtained by stacking up the rows of A into a single column vector. In particular,

vec( K*^{−1} Δ K*^{−1} ) = ( K*^{−1} ⊗ K*^{−1} ) vec(Δ) = Γ* vec(Δ).

For any matrix A = (a_{ij}) ∈ R^{n×n} and any subset T of V × V, we define the vector vec(A)_T such that a_{ij} ∈ vec(A)_T if and only if (i, j) ∈ T, and the relative order of any two entries in vec(A)_T is the same as their order in vec(A). Given that vec(Δ)_{S^c} = 0 by construction, equation (22) can be rewritten as two blocks of linear equations:

Γ*_{SS} vec(Δ)_S + vec(W)_S − vec(R(Δ))_S + λ vec(Z)_S = 0    (23)
Γ*_{S^cS} vec(Δ)_S + vec(W)_{S^c} − vec(R(Δ))_{S^c} + λ vec(Z)_{S^c} = 0.    (24)

Since Γ*_{SS} is invertible, we obtain from (23) that

vec(Δ)_S = (Γ*_{SS})^{−1} [ −vec(W)_S + vec(R(Δ))_S − λ vec(Z)_S ].

Substituting this expression into equation (24), we get that

vec(Z)_{S^c} = −(1/λ) Γ*_{S^cS} (Γ*_{SS})^{−1} ( vec(W)_S − vec(R(Δ))_S ) + Γ*_{S^cS} (Γ*_{SS})^{−1} vec(Z)_S − (1/λ) ( vec(W)_{S^c} − vec(R(Δ))_{S^c} ).

Recalling the definitions of ||·|| and |||·|||, we obtain that

||vec(Z)_{S^c}||_∞ ≤ (1/λ) |||Γ*_{S^cS}(Γ*_{SS})^{−1}|||_∞ ( ||vec(W)_S||_∞ + ||vec(R(Δ))_S||_∞ ) + ||Γ*_{S^cS}(Γ*_{SS})^{−1} vec(Z)_S||_∞ + (1/λ) ( ||vec(W)_{S^c}||_∞ + ||vec(R(Δ))_{S^c}||_∞ ).

By Assumption 6 and the fact that ||vec(Z)_S||_∞ ≤ 1, we have that

||Γ*_{S^cS}(Γ*_{SS})^{−1} vec(Z)_S||_∞ ≤ 1 − α,

so that

||vec(Z)_{S^c}||_∞ ≤ ((2 − α)/λ) ( ||vec(W)_S||_∞ + ||vec(R(Δ))_S||_∞ ) + 1 − α.

Therefore, if max{ ||W||_∞, ||R(Δ)||_∞ } ≤ αλ/8, we conclude that ||vec(Z)_{S^c}||_∞ < 1.
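Indeed, in this case the last display yields

||vec(Z)_{S^c}||_∞ ≤ ((2 − α)/λ) · 2 · (αλ/8) + 1 − α = α(2 − α)/4 + 1 − α = 1 − α(2 + α)/4 < 1,

since α ∈ (0, 1].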

Proof of Lemma 4. We write R(Δ) in the form

R(Δ) = (K* + Δ)^{−1} − K*^{−1} + K*^{−1} Δ K*^{−1}.

Using the fact that |||Δ|||_∞ ≤ d ||Δ||_∞ and the assumption of the lemma, we get that

|||K*^{−1} Δ|||_∞ ≤ |||K*^{−1}|||_∞ |||Δ|||_∞ ≤ C_{Σ*} d ||Δ||_∞ < 1/3.    (25)

Consequently, we have the matrix expansion

(K* + Δ)^{−1} = ( K* (I_n + K*^{−1}Δ) )^{−1} = (I_n + K*^{−1}Δ)^{−1} K*^{−1}
  = Σ_{k=0}^∞ (−1)^k (K*^{−1}Δ)^k K*^{−1}
  = K*^{−1} − K*^{−1} Δ K*^{−1} + Σ_{k=2}^∞ (−1)^k (K*^{−1}Δ)^k K*^{−1}
  = K*^{−1} − K*^{−1} Δ K*^{−1} + K*^{−1} Δ K*^{−1} Δ J K*^{−1},

where J = Σ_{k=0}^∞ (−1)^k (K*^{−1}Δ)^k. Therefore, we conclude that

R(Δ) = K*^{−1} Δ K*^{−1} Δ J K*^{−1}.    (26)

Let (e_i)_{i=1}^n denote the Euclidean basis of R^n. Given that |a′b| ≤ ||a||_∞ ||b||_1 for any vectors a, b ∈ R^n, it holds that

||R(Δ)||_∞ = max_{i,j} | e_i′ K*^{−1} Δ K*^{−1} Δ J K*^{−1} e_j | ≤ max_i ||e_i′ K*^{−1} Δ||_∞ · max_j ||K*^{−1} Δ J K*^{−1} e_j||_1.

Since ||u′Δ||_∞ ≤ ||u||_1 ||Δ||_∞ for any vector u ∈ R^n, we get that

||R(Δ)||_∞ ≤ max_i ||e_i′ K*^{−1}||_1 · ||Δ||_∞ · max_j ||K*^{−1} Δ J K*^{−1} e_j||_1.

Continuing on, we obtain that

||R(Δ)||_∞ ≤ |||K*^{−1}|||_∞ ||Δ||_∞ |||K*^{−1} J′ Δ K*^{−1}|||_∞ ≤ C_{Σ*} ||Δ||_∞ |||K*^{−1}|||²_∞ |||J′|||_∞ |||Δ|||_∞.    (27)

Given the definition of J and (25), we have

|||J′|||_∞ ≤ Σ_{k=0}^∞ |||Δ K*^{−1}|||^k_∞ ≤ 1 / ( 1 − |||K*^{−1}|||_∞ |||Δ|||_∞ ) ≤ 3/2.

Substituting this in (27) and using the fact that |||Δ|||_∞ ≤ d ||Δ||_∞, we conclude that

||R(Δ)||_∞ ≤ (3/2) C_{Σ*} ||Δ||_∞ |||K*^{−1}|||²_∞ |||Δ|||_∞ ≤ (3/2) C_{Σ*}³ d ||Δ||²_∞.

Proof of Lemma 5. For any matrix X = (x_{ij}) ∈ R^{n×n}, define X^S = (x^{(s)}_{ij}) by x^{(s)}_{ij} = x_{ij} if (i, j) ∈ S and x^{(s)}_{ij} = 0 if (i, j) ∈ S^c. Let X = (x_{ij}) ∈ R^{n×n} be such that X^S = X. Set

G(vec(X)_S) = vec(Σ)_S − vec(X^{−1})_S + λ vec(Z)_S,

and

F(vec(X)_S) = −(Γ*_{SS})^{−1} G( vec(K*)_S + vec(X)_S ) + vec(X)_S.

If we take partial derivatives of the Lagrangian of the restricted problem (13) with respect to the unconstrained entries of K, these partial derivatives must vanish at the optimum, so we obtain the zero-gradient condition:

vec(Σ)_S − vec(K^{−1})_S + λ vec(Z)_S = 0.

Thus G(vec(X)_S) = 0 holds if and only if vec(X)_S = vec(K)_S. Consequently, F(vec(X)_S) = vec(X)_S holds if and only if vec(X)_S = vec(Δ)_S = vec(K)_S − vec(K*)_S. Notice that vec(Δ)_S determines Δ, since vec(Δ)_{S^c} = 0, so our goal is to prove that vec(Δ)_S ∈ B(r), where B(r) denotes the ball

B(r) = { vec(X)_S : ||vec(X)_S||_∞ ≤ r, X ∈ R^{n×n} },   with r = 2 C_{Γ*} ( ||W||_∞ + λ ).

We have already shown that vec(Δ)_S is the only fixed point of F(vec(X)_S), which is continuous on B(r), a convex and compact set. Therefore, if we show that F(vec(X)_S) ∈ B(r) for any vec(X)_S ∈ B(r), then vec(Δ)_S ∈ B(r) by Brouwer's fixed point theorem. By definition,

G( vec(K*)_S + vec(X)_S ) = vec(Σ)_S − vec((K* + X)^{−1})_S + λ vec(Z)_S
  = ( vec(Σ)_S − vec(K*^{−1})_S ) + ( vec(K*^{−1})_S − vec((K* + X)^{−1})_S ) + λ vec(Z)_S
  = vec(W)_S + ( vec(K*^{−1})_S − vec((K* + X)^{−1})_S ) + λ vec(Z)_S.    (28)

Note that ||X||_∞ ≤ r ≤ 1/(3 C_{Σ*} d), as vec(X)_S ∈ B(r) and the entries of X on S^c are zero. Then, according to (26),

R(X) = (K* + X)^{−1} − K*^{−1} + K*^{−1} X K*^{−1} = (K*^{−1}X)² J(X) K*^{−1},

where J(X) = Σ_{k=0}^∞ (−1)^k (K*^{−1}X)^k. Taking the vectorized form and restricting to the entries in S, we obtain

vec( (K* + X)^{−1} − K*^{−1} )_S + Γ*_{SS} vec(X)_S = vec( (K*^{−1}X)² J(X) K*^{−1} )_S.    (29)

Based on (28) and (29), we have

F(vec(X)_S) = −(Γ*_{SS})^{−1} G( vec(K*)_S + vec(X)_S ) + vec(X)_S = (Γ*_{SS})^{−1} [ vec( (K* + X)^{−1} − K*^{−1} )_S − vec(W)_S − λ vec(Z)_S ] + vec(X)_S.

Therefore,

||F(vec(X)_S)||_∞ ≤ ||T_1||_∞ + ||T_2||_∞,

where T_1 = (Γ*_{SS})^{−1} vec( (K*^{−1}X)² J(X) K*^{−1} )_S and T_2 = (Γ*_{SS})^{−1} ( vec(W)_S + λ vec(Z)_S ).

Using the definition of C_{Γ*}, we get that

||T_2||_∞ ≤ C_{Γ*} ( ||W||_∞ + λ ) = r/2.

It now remains to show that ||T_1||_∞ ≤ r/2. We have

||T_1||_∞ ≤ C_{Γ*} || vec( (K*^{−1}X)² J(X) K*^{−1} )_S ||_∞ ≤ C_{Γ*} ||R(X)||_∞.

As the entries of X on S^c are zero, we get that ||X||_∞ = ||vec(X)_S||_∞ ≤ r ≤ 1/(3 C_{Σ*} d). Then, using Lemma 4, we obtain that

||T_1||_∞ ≤ (3/2) d C_{Σ*}³ C_{Γ*} ||X||²_∞ ≤ (3/2) d C_{Σ*}³ C_{Γ*} r² ≤ r/2.

Proof of Lemma 6. By Lemma 5, we get that |k_{ij} − k*_{ij}| ≤ r < k_m for all (i, j) ∈ S. Thus, the desired result follows.

Proof of Lemma 7. According to (5), for large M, x ∈ [0, a_1] and i, j ∈ V,

P( |w_{ij}| > x ) ≤ a_2 e^{−a_3 M^{1/3} x²}.

Applying this inequality to all n² entries of W, we obtain that

P( max_{i,j} |w_{ij}| > x ) ≤ n² a_2 e^{−a_3 M^{1/3} x²}.

Setting x = M^{−1/6} [ log(a_2 n^τ) / a_3 ]^{1/2} yields the desired result.

Proof of Lemma 8. According to (4), for large M, x ∈ [0, a_1] and i, j ∈ V,

P( |w_{ij}| ≥ x ) ≤ a_2 M e^{−a_3 M^{1/4} x} = ( a_3 M^{1/4} x )⁴ exp( −a_3 M^{1/4} x / 2 ) · ( a_2 / (a_3⁴ x⁴) ) exp( −a_3 M^{1/4} x / 2 ).

We first claim that if M ≥ 16 ( log( n^{−τ} a_3⁴ x⁴ / a_2 ) )⁴ / ( a_3⁴ x⁴ ) and x ≤ (a_2^{1/4} / a_3) e^{−27/8}, then

( a_3 M^{1/4} x )⁴ exp( −a_3 M^{1/4} x / 2 ) ≤ 1.

Indeed, this follows from the fact that f(y) := y⁴ e^{−y/2} ≤ 1 when y ≥ 27, and the fact that if x ≤ (a_2^{1/4} / a_3) e^{−27/8}, then

log( n^{−τ} a_3⁴ x⁴ / a_2 ) ≤ log( a_3⁴ x⁴ / a_2 ) ≤ −27/2.

On the other hand, if M ≥ 16 ( log( n^{−τ} a_3⁴ x⁴ / a_2 ) )⁴ / ( a_3⁴ x⁴ ), then

( a_2 / (a_3⁴ x⁴) ) exp( −a_3 M^{1/4} x / 2 ) ≤ n^{−τ}.

Thus, the desired result follows choosing c_0 = min( a_1, (a_2^{1/4} / a_3) e^{−27/8} ) and summing over all the entries of W.

Proof of Theorem 3. In the proof of Lemma 8 we have shown that if M ≥ 16 ( log( n^{−τ} a_3⁴ c_0⁴ / a_2 ) )⁴ / ( a_3⁴ c_0⁴ ), then

f(c_0) := a_2 M e^{−a_3 M^{1/4} c_0} ≤ n^{−τ}.

On the other hand, f( (α/8)λ ) = n^{−τ}. Because the function f is decreasing, we conclude that (α/8)λ ≤ c_0. Therefore, by Lemma 8, we conclude that

1 − 1/n^{τ−2} ≤ P( ||W||_∞ ≤ (α/8)λ = M^{−1/4} log(a_2 M n^τ) / a_3 ).

Now, set σ_m = min_i σ*_{ii} (observe that σ*_{ii} > 0 for all i ∈ V). By the same argument as above, if M ≥ 16 ( log( n^{−τ} a_3⁴ σ_m⁴ / a_2 ) )⁴ / ( a_3⁴ σ_m⁴ ) and σ_m ≤ c_0, then f(σ_m) ≤ n^{−τ} and thus (α/8)λ ≤ σ_m. When σ_m > c_0, as (α/8)λ ≤ c_0, we also get that (α/8)λ ≤ σ_m. Therefore,

1 − 1/n^{τ−2} ≤ P( ||W||_∞ ≤ (α/8)λ ) ≤ P( ||W||_∞ ≤ σ_m ).

Thus, if the event A := { ||W||_∞ ≤ M^{−1/4} log(a_2 M n^τ) / a_3 } holds, then Σ has positive diagonal entries and, by Lemma 1, K is unique.

Proceeding as above, if

M ≥ 16 ( log( n^{−τ} a_3⁴ c⁴ / a_2 ) )⁴ / ( a_3⁴ c⁴ ),

where c = ( 6 (1 + 8α^{−1})² d max( C_{Σ*} C_{Γ*}, C_{Σ*}³ C_{Γ*}² ) )^{−1}, then c ≥ (α/8)λ. Thus, we get that, if A holds, then

2 C_{Γ*} ( ||W||_∞ + λ ) ≤ 2 C_{Γ*} (1 + 8α^{−1}) M^{−1/4} log(a_2 M n^τ) / a_3 ≤ ( 3 d max( C_{Σ*}, C_{Σ*}³ C_{Γ*} ) )^{−1}.

Therefore, by Lemma 5,

||Δ||_∞ ≤ 2 C_{Γ*} (1 + 8α^{−1}) M^{−1/4} log(a_2 n^τ M) / a_3.

Hence, applying Lemma 4, we obtain that

||R(Δ)||_∞ ≤ (3/2) d ||Δ||²_∞ C_{Σ*}³ ≤ 6 C_{Σ*}³ C_{Γ*}² d (1 + 8α^{−1})² ( M^{−1/4} log(a_2 n^τ M) / a_3 )²
  = 6 C_{Σ*}³ C_{Γ*}² d (1 + 8α^{−1})² ( M^{−1/4} log(a_2 n^τ M) / a_3 ) · (αλ/8) ≤ αλ/8.

By Lemma 3, we conclude that the primal–dual witness solution coincides with the estimator K. Thus, ||K − K*||_∞ satisfies the same bound as ||Δ||_∞ if c_1 = (a_3⁴ / a_2) min(c_0, σ_m, c)⁴ and c_2 = (a_3⁴ / 16) min(c_0, σ_m, c)⁴.

Proof of Theorem 4. If M ≥ 16 ( log( n^{−τ} a_3⁴ k⁴ / a_2 ) )⁴ / ( a_3⁴ k⁴ ), where k = k_m / ( 2 C_{Γ*} (1 + 8α^{−1}) ), then

k_m ≥ 2 C_{Γ*} (1 + 8α^{−1}) M^{−1/4} log(a_2 M n^τ) / a_3 ≥ 2 C_{Γ*} ( ||W||_∞ + λ ).

Thus, choosing c_3 = (a_3⁴ / a_2) min(c_0, σ_m, c, k)⁴ and c_4 = (a_3⁴ / 16) min(c_0, σ_m, c, k)⁴, following the same steps as in the proof of Theorem 3, and appealing to Lemma 6, we conclude the desired result.


Proof of Theorem 5. If M ≥ ( log(a_2 n^τ) / a_3 )³ min(a_1, σ_m)^{−6}, then

min(a_1, σ_m) ≥ M^{−1/6} ( log(a_2 n^τ) / a_3 )^{1/2},

and by Lemma 7,

1 − 1/n^{τ−2} ≤ P( ||W||_∞ ≤ M^{−1/6} ( log(a_2 n^τ) / a_3 )^{1/2} ) ≤ P( ||W||_∞ ≤ σ_m ).

Thus, if the event A := { ||W||_∞ ≤ M^{−1/6} ( log(a_2 n^τ) / a_3 )^{1/2} } holds, then Σ has positive diagonal entries and, by Lemma 1, K is unique.

Now, if

M ≥ ( log(a_2 n^τ) / a_3 )³ c^{−6},

where c = ( 6 (1 + 8α^{−1})² d max( C_{Σ*} C_{Γ*}, C_{Σ*}³ C_{Γ*}² ) )^{−1}, then

c ≥ M^{−1/6} ( log(a_2 n^τ) / a_3 )^{1/2}.

Thus, choosing λ = 8 α^{−1} M^{−1/6} ( log(a_2 n^τ) / a_3 )^{1/2}, we get that, if A holds, then

2 C_{Γ*} ( ||W||_∞ + λ ) ≤ ( 3 d max( C_{Σ*}, C_{Σ*}³ C_{Γ*} ) )^{−1}.

Therefore, by Lemma 5,

||Δ||_∞ ≤ 2 C_{Γ*} (1 + 8α^{−1}) M^{−1/6} ( log(a_2 n^τ) / a_3 )^{1/2}.

Hence, applying Lemma 4, we obtain that

||R(Δ)||_∞ ≤ (3/2) d ||Δ||²_∞ C_{Σ*}³ ≤ 6 C_{Σ*}³ C_{Γ*}² d (1 + 8α^{−1})² ( M^{−1/6} ( log(a_2 n^τ) / a_3 )^{1/2} )²
  = 6 C_{Σ*}³ C_{Γ*}² d (1 + 8α^{−1})² ( M^{−1/6} ( log(a_2 n^τ) / a_3 )^{1/2} ) · (αλ/8) ≤ αλ/8.

As ||W||_∞ ≤ M^{−1/6} ( log(a_2 n^τ) / a_3 )^{1/2} = αλ/8, by Lemma 3, we conclude that the primal–dual witness solution coincides with the estimator K. Thus, ||K − K*||_∞ satisfies the same bound as ||Δ||_∞ if c_0 = min(a_1, σ_m, c)^{−1}.

Proof of Theorem 6. If M > ( log(a_2 n^τ) / a_3 )³ ( 2 C_{Γ*} (1 + 8α^{−1}) / k_m )⁶, then

k_m > 2 C_{Γ*} (1 + 8α^{−1}) M^{−1/6} ( log(a_2 n^τ) / a_3 )^{1/2} ≥ 2 C_{Γ*} ( ||W||_∞ + λ ).

Thus, choosing

c_1 = max( c_0, 2 C_{Γ*} (1 + 8α^{−1}) / k_m ),

following the same steps as in the proof of Theorem 5, and appealing to Lemma 6, we conclude the desired result.