

Physica A xx (xxxx) xxx–xxx

Contents lists available at ScienceDirect

Physica A

journal homepage: www.elsevier.com/locate/physa

Not all that glitters is RMT in the forecasting of risk of portfolios in the Brazilian stock market

Leonidas Sandoval Jr. ∗, Adriana Bruscato Bortoluzzo, Maria Kelly Venezuela

Insper Instituto de Ensino e Pesquisa, Rua Quatá, 300, São Paulo, SP, 04546-2400, Brazil

Highlights

• Use of Random Matrix Theory and the Single Index Model in the building of portfolios.
• We compare combinations of techniques in the cleaning of the correlation matrix.
• Cleaning the correlation matrix is not always advisable for times of high volatility.

Article info

Article history:
Received 6 October 2013
Received in revised form 20 March 2014
Available online xxxx

Keywords:
Portfolio building
Covariance matrix
Random Matrix Theory
Single Index Model
BM&F-Bovespa

Abstract

Using stocks of the Brazilian stock exchange (BM&F-Bovespa), we build portfolios of stocks based on Markowitz’s theory and test the predicted and realized risks. This is done using the correlation matrices between stocks, and also using Random Matrix Theory in order to clean such correlation matrices from noise. We also calculate correlation matrices using a regression model in order to remove the effect of common market movements and their cleaned versions using Random Matrix Theory. This is done for years of both low and high volatility of the Brazilian stock market, from 2004 to 2012. The results show that the use of regression to subtract the market effect on returns greatly increases the accuracy of the prediction of risk, and that, although the cleaning of the correlation matrix often leads to portfolios that better predict risks, in periods of high volatility of the market this procedure may fail to do so. The results may be used in the assessment of the true risks when one builds a portfolio of stocks during periods of crisis.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

Modern portfolio theory is largely based on Markowitz’s ideas, where portfolios of various equities are built on the principle of minimizing risk given some expected returns, allowing one to obtain an efficient frontier of risk and returns of portfolios. Risk is assessed from the volatility of each stock that makes up the portfolio, as well as from their covariances. The covariance matrix is used to predict the risk of a portfolio, and this prediction usually differs from the realized risk of the same portfolio, since the matrix is built using the stock returns of past data.

Three problems arise from this approach. The first one is that past data reflect the market as it was, and not as it will be. The theory thus assumes that future events will mimic past events, which is usually not true, since past data do not incorporate news releases or the current mood of the market. There is not much that can be done about this, but in order to minimize the effects of events that might change the behavior of a market, one should not use data that are too far in the past.

∗ Corresponding author. Tel.: +55 11 46362034; fax: +55 11 986994965.

E-mail addresses: [email protected], [email protected] (L. Sandoval Jr.), [email protected] (A.B. Bortoluzzo), [email protected] (M.K. Venezuela).

http://dx.doi.org/10.1016/j.physa.2014.05.006

0378-4371/© 2014 Elsevier B.V. All rights reserved.


This leads us to the second problem: the deviations associated with the finite sample effect, which arises purely from the fact that the available data are finite. Since one cannot go back in time indefinitely, and even if one could, it would not be advisable given the discussion in the preceding paragraph, there is only a limited amount of data (in our case, price quotations) from which to build a covariance matrix. The problem becomes even more severe if we consider that an efficient portfolio should be built from many and diverse equities while maintaining a fairly recent scope of historical data, since that leads to larger finite sample effects due to a smaller ratio between the number of days in the historical data and the number of stocks in the portfolio.

A third problem is the statistical noise that emerges from the complex interactions between the many elements of a stock market: news, foreign markets, crises, and the very prices of stocks interact in order to guide the price of a stock. Those interactions are usually too complex to be accommodated by any econometric model.

All these effects are incorporated into the covariance matrix that is used to forecast the risk of a particular portfolio; if some of them can be removed from the matrix, better risk predictions become possible. Several authors have studied the influence of noise and other factors on the covariance matrix in the building of portfolios [1–6]. Most approaches to these problems reduce the dimensionality of the covariance matrix by introducing some structure into it, obtained by principal component analysis or by the separation of stocks into economic sectors, among other means [7,8].

A technique that has been applied to a number of complex systems, and particularly to financial markets, is Random Matrix Theory [9]. Among the many results obtained is the building of portfolios whose predicted risk more closely resembles the realized risk of the future market, based on past data [10–12]; this has been successfully applied to stocks [13,14] and to hedge funds [15].

Random Matrix Theory had its origins in 1953, in the work of the Hungarian physicist Eugene Wigner [16,17]. He was studying the energy levels of complex atomic nuclei, such as uranium, and had no means of calculating the distances between those levels. He then assumed that those distances between energy levels should be similar to the ones obtained from a random matrix which expressed the connections between the many energy levels. Surprisingly, he was then able to make sensible predictions about how the energy levels related to one another by removing the results due to a random matrix. The theory was later developed, with many surprising results arising. Of particular importance for our study are the results obtained by Marčenko and Pastur [18] on Random Matrix Theory applied to correlation matrices, better described in the section on methodology.

Today, Random Matrix Theory is applied to quantum physics, nanotechnology, quantum gravity, the study of the structure of crystals, and may have applications in ecology, linguistics, and many other fields where a huge amount of apparently unrelated information may be understood as being somehow connected. The theory has also been applied to finance in a series of works dealing with the correlation matrices of stock prices, as well as with risk management in portfolios [19–23] (for a recent review on the subject, see Ref. [24]).

Another technique that can be used to better estimate the real relations among the components of the correlation matrix is to use a regression model to remove the market effect on the asset returns, i.e., to estimate the relationship between returns and an asset that represents the market (like the BM&F-Bovespa index, in Brazil’s case) and use only the residue of this model, thus eliminating the common variations of all stocks due to market movements. This procedure allows the correlation matrix to be estimated with greater precision, and so generates more reliable forecasts for the risk of a portfolio, since only part of the dependence between stocks is due to the assets themselves, a large part of it being due to the collective responses of the market to news or to other factors. This procedure is standard in many models in finance, most importantly in the CAPM (Capital Asset Pricing Model), and it is called the Single Index Model (SIM), based on the idea that the majority of the systemic risk is captured by a single market index. The use of SIM is similar to the use of one component in the RMT filter, since the highest eigenvalue of the correlation matrix corresponds to the market.
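The removal of the market effect described above can be illustrated with a minimal sketch, assuming an ordinary least squares regression of each stock’s returns on the returns of a market index. The function name `market_residuals` and the synthetic data are hypothetical and are not taken from the article; the regression specification actually used by the authors is described in their methodology.

```python
import numpy as np

def market_residuals(stock_returns, market_returns):
    """Residuals e_i of the regression r_i = a_i + b_i * r_m + e_i, per stock.

    stock_returns has shape (L, N); market_returns has shape (L,).
    """
    # Design matrix with an intercept column and the market returns.
    design = np.column_stack([np.ones_like(market_returns), market_returns])
    coefficients, *_ = np.linalg.lstsq(design, stock_returns, rcond=None)
    return stock_returns - design @ coefficients

# Synthetic one-factor data, for illustration only (not data from the article).
rng = np.random.default_rng(0)
market = rng.normal(0.0, 0.02, size=250)
betas = rng.uniform(0.5, 1.5, size=5)
stocks = np.outer(market, betas) + rng.normal(0.0, 0.01, size=(250, 5))

residuals = market_residuals(stocks, market)
raw_corr = np.corrcoef(stocks, rowvar=False)
residual_corr = np.corrcoef(residuals, rowvar=False)
```

In this toy setting the off-diagonal entries of `residual_corr` are much smaller than those of `raw_corr`, since the only common driver of the synthetic stocks is the market factor.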

Other models can be used to remove internal or external effects: the so-called factor models, which assume that the systemic risk is due to a number of factors, which may include statistical, macroeconomic, or fundamental influences, and which can also be used to remove noise. Ross [25], as an example, presents a model called Arbitrage Pricing Theory (APT), which uses more than one factor to explain systemic risk. According to Campbell, Lo and MacKinlay [26], the APT model provides an approximate relation for expected asset returns with an unknown number of unidentified factors. In the same way as Rosenow [27] selected the number of factors to be used in a MV-GARCH model based on the number of eigenvectors of the correlation matrix outside the Wishart (noise) region, RMT may be used in order to decide how many factors should be used in a multifactor model of the APT type.

Previous works on the stock exchanges of emerging markets using Random Matrix Theory have been conducted for South Africa [28,29], India [30], Sri Lanka [31], and Mexico [32]. Their results show some differences between the stock exchanges of emerging markets and those of more developed ones, such as lower liquidity of the stocks and less integration between different sectors.

Recent results on the application of Random Matrix Theory to financial data are mainly concerned with the actual calculation of optimal portfolios, which involves the inversion of the correlation matrix of the log-returns of the stocks that may take part in the portfolio [33,34], and with a better formalization of the theory [35]; recent results [36] claim to outperform the usual cleaning procedures used in this article.

Following a methodology similar to ours, the authors of Ref. [37] studied the Chilean stock market using Random Matrix Theory, analyzing the eigenvalues and eigenvectors of the correlation matrix and also studying the dynamics of those eigenvectors, which revealed some structure based on some key industrial sectors of the Chilean economy. They also used Vector Autoregressive Analysis in order to pinpoint the main drivers of the Chilean stock market.

The contribution of this article is to combine the RMT and SIM methods, which are capable of improving the risk forecasts of a portfolio built with stock market assets based on past data, and to do so for periods of both low and high volatility. Using Markowitz’s theory, we calculate portfolios of stocks in three different ways: cleaning the correlation matrix using RMT, removing the market effect of the assets (SIM), and combining the two procedures. We then compare all these results with the risk of portfolios built in the usual way, i.e., without using RMT and/or SIM.

In order to analyze the suitability of the proposed methods, we use the daily returns of the BM&F-Bovespa stocks from 2004 to 2012, thus covering years of both low and high volatility. We only use stocks with 100% liquidity, which means that those stocks were traded every day the stock exchange was open, considering pairs of years ranging from 2004 to 2012. For each year being analyzed, we build a portfolio using data from the previous year in order to make a forecast of the risk for the target year, and that forecasted risk is then compared with the realized risk in that year. We use 61 stocks for 2004–2005, 72 stocks for 2005–2006, 86 stocks for 2006–2007, 105 stocks for 2007–2008, 148 stocks for 2008–2009, 153 stocks for 2009–2010, 134 stocks for 2010–2011, and 125 stocks for 2011–2012.

We also analyzed the evolution of the portfolios in time using 32 stocks that were 100% liquid in the years ranging from 2003 to 2012 and studied the differences between predicted and realized portfolios in time. As the data used in this article include periods of both low and high volatility in the BM&F-Bovespa, in particular the data collected during the Subprime Mortgage Crisis of 2007 and 2008, we are able to study how this technique of cleaning the correlation matrix and/or using the Single Index Model applies to times of high volatility.

The article is organized as follows: Section 2 is dedicated to the standard way of building portfolios (according to Markowitz). Section 3 introduces the basic concepts of Random Matrix Theory and the characteristics of the eigenvalues of the correlation matrix, in addition to the building of portfolios by cleaning the correlation matrix when short selling is not allowed, as well as the regression for the removal of the market effect. The measures of how well predicted risk approximates realized risk for equal values of returns, and the discussion of the results, are in Section 4. Section 5 presents the analysis of how the proposed measures evolve in time, and the article ends with final remarks in Section 6.

2. Building portfolios using Markowitz’s theory

In this section, we shall start by building portfolios using the stocks that were 100% liquid during the years 2004 and 2005, as an example, based on the correlation matrix of their returns (we shall refer to log-returns as simply returns) in the year 2004, and then also for the remaining pairs of years. According to the usual portfolio theory, we can obtain $w$, the vector of weights of the portfolio due to each stock, by fixing the portfolio return ($R_E$) and minimizing the risk ($R_I$) of the portfolio [38]. The return of the portfolio is given by

$$R_E = w^{T} R, \qquad (1)$$

where $R$ is the vector of average returns of each stock. The returns, more precisely the log-returns, of each stock are given by

$$R_t = \ln(P_t) - \ln(P_{t-1}) \approx \frac{P_t - P_{t-1}}{P_{t-1}}, \qquad (2)$$

where $P_t$ is the closing price of one stock on trading day $t$.
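As a small illustration of Eq. (2), the sketch below computes log-returns and simple returns from a short series of closing prices and shows that they nearly coincide for small daily price changes. The prices are hypothetical values chosen for the example, not data from the article.

```python
import math

def log_returns(prices):
    """Log-returns R_t = ln(P_t) - ln(P_{t-1}) of a series of closing prices."""
    return [math.log(p1) - math.log(p0) for p0, p1 in zip(prices, prices[1:])]

def simple_returns(prices):
    """Simple returns (P_t - P_{t-1}) / P_{t-1}, the approximation in Eq. (2)."""
    return [(p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]

prices = [20.00, 20.30, 20.10, 20.50]  # hypothetical closing prices
for r_log, r_simple in zip(log_returns(prices), simple_returns(prices)):
    print(f"log-return = {r_log:.5f}, simple return = {r_simple:.5f}")
```

One convenient property of log-returns is that they add up over time: the sum of the daily log-returns equals the log-return over the whole period.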

The risk is defined as the variance of the portfolio,

$$R_I = w^{T} \Sigma_R w, \qquad (3)$$

where $\Sigma_R$ is the estimated covariance matrix of the $N$ stocks. The risk is then minimized with the constraint that the sum of all weights in the portfolio should be equal to one:

$$\sum_{i=1}^{N} w_i = 1. \qquad (4)$$

One can do that for several values of the average returns, leaving the coordinates of $w$ free to assume negative values, as well as positive ones, so that short selling is allowed. Short selling involves the borrowing and selling of stocks not owned by the seller. The stocks are later bought back at the then-current market price and returned to the lender. This makes it possible to raise the returns of a portfolio, but at the cost of also raising its risk. In practice, short selling is not always possible, or is sometimes limited, and so we shall consider the case of no short selling only.
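To make Eqs. (1), (3) and (4) concrete, the following sketch solves the equality-constrained problem (fixed return, weights summing to one) through its Lagrangian linear system. It is only an illustration under simplifying assumptions: it allows short selling, whereas the no-short-selling case considered in this article adds the constraints $w_i \geq 0$ and requires a quadratic programming solver, which is not shown. The function name, the covariance matrix and the average returns are hypothetical.

```python
import numpy as np

def min_variance_weights(cov, mean_returns, target_return):
    """Minimize w^T cov w subject to w^T mean_returns = target_return, sum(w) = 1.

    Short selling is allowed here: the weights may be negative.
    """
    n = len(mean_returns)
    # Stationarity and constraint equations of the Lagrangian as one linear system.
    system = np.zeros((n + 2, n + 2))
    system[:n, :n] = 2.0 * cov
    system[:n, n] = mean_returns
    system[:n, n + 1] = 1.0
    system[n, :n] = mean_returns
    system[n + 1, :n] = 1.0
    rhs = np.zeros(n + 2)
    rhs[n] = target_return
    rhs[n + 1] = 1.0
    return np.linalg.solve(system, rhs)[:n]

# Hypothetical daily covariance matrix and average returns of three stocks.
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.16]])
mean_returns = np.array([0.001, 0.002, 0.003])

weights = min_variance_weights(cov, mean_returns, target_return=0.002)
portfolio_return = weights @ mean_returns   # Eq. (1)
portfolio_risk = weights @ cov @ weights    # Eq. (3)
```

Repeating the calculation for a range of values of `target_return` traces out the efficient frontier for this unconstrained variant of the problem.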

In order to build a portfolio, the covariance matrix of a period of time (usually some months) prior to the period of investment is used together with a forecast of the expected returns. Those returns, which are unknown, may be approximated by many means, with varying degrees of success. There is a vast literature on the forecasting of returns ([39] and bibliography therein), but this does not concern us in our study of how to improve the prediction of risk. So, in order to restrict ourselves to the analysis of the correlation matrix, we shall consider that our prediction of returns is the best one possible, which is a perfect forecast of expected returns. Of course, if we had a perfect forecast of expected returns, and we knew it was a perfect forecast of returns, we would not need to make any portfolio analysis. We use here the perfect forecast of expected returns in order to compare different ways of calculating risk in a way that is independent of the way one tries to forecast returns.

So, as an example, we first use the covariance matrix from 2004, together with the average returns in 2005 of each stock taking part in the data for 2004–2005 (perfect forecast of expected returns), in order to build minimum risk portfolios for 2005. Doing so, we build an efficient frontier, which is a curve whose points are the minimum risk for a given return. We also use the data from 2005, which means perfect forecasts of expected risk and expected returns, in order to build an efficient frontier for the realized risk.

The covariance matrices for both the predicted and the realized risks are calculated using

$$V^{1/2} C V^{1/2}, \qquad (5)$$

where $C$ is the correlation matrix obtained from the data and $V$ is a diagonal matrix with the vector of the variances of the time series for the target year, in this case 2005, on the main diagonal. So, both covariance matrices, the one for predicted risk (based on data from 2004) and the one for realized risk (based on data from 2005), are built on their respective correlation matrices, but with the variances of 2005. This is done so as to isolate the effects of the correlation matrices alone, and not of the difference in volatility between the two years.
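The construction in Eq. (5) can be sketched as follows, assuming the returns of each year are stored as arrays with one column per stock. The function name and the synthetic returns are hypothetical; the point is only that the predicted and realized covariance matrices share the variances of the target year and differ in their correlation matrices.

```python
import numpy as np

def covariance_from_correlation(corr_returns, variance_returns):
    """V^{1/2} C V^{1/2}: correlations from one sample, variances from another."""
    corr = np.corrcoef(corr_returns, rowvar=False)
    std = np.sqrt(np.var(variance_returns, axis=0, ddof=1))
    # Multiplying C elementwise by the outer product of the standard deviations
    # is the same as multiplying it on both sides by the diagonal matrix V^{1/2}.
    return corr * np.outer(std, std)

# Synthetic returns standing in for the previous and the target year.
rng = np.random.default_rng(3)
returns_previous = rng.normal(0.0, 0.02, size=(248, 4))
returns_target = rng.normal(0.0, 0.03, size=(250, 4))

predicted_cov = covariance_from_correlation(returns_previous, returns_target)
realized_cov = covariance_from_correlation(returns_target, returns_target)
```

When correlations and variances come from the same sample, as in `realized_cov`, the result is simply the usual sample covariance matrix.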

The Brazilian stock market is heavily influenced by the major stock markets in the world, particularly by the New York Stock Exchange, so the crises that affected the world economy in general also affected Brazil, with slightly varying strengths. After some years of low volatility, in which the Brazilian stock market grew and received a good amount of international capital, the BM&F-Bovespa was hit by the Subprime mortgage crisis of 2008, like most markets of the world, with an increase of volatility and the flight of international capital, which was directed to safer investments and also used in order to cover deficits in many foreign companies. Brazil recovered quickly from that crisis, since inflation was under control and the country’s banks had followed a policy of responsible lending and so had no cash problems. So, 2009 and part of 2010 were years of relative normalcy for the Brazilian economy, which was seen by other countries as a fast growing one. But with the aggravation of the Sovereign Debt crisis of some European countries, and the return of inflation to the Brazilian economy, high volatility surged once more in the BM&F-Bovespa, fueled by some irresponsible use by the government of the main oil company of the country, Petrobras, and by the fall of the empire of entrepreneur Eike Batista.

Fig. 1 shows the predicted (dashed lines) and realized (continuous lines) returns and risks of portfolios using the correlation matrices from 2004–2005, 2005–2006, 2006–2007, 2007–2008, 2008–2009, 2009–2010, 2010–2011, and 2011–2012. Note that the curves vary both in shape and in the values of risks and returns, and that the results of predicted risks are particularly bad for years of high volatility (see the volatility of the predicted year in Table 2). The graphs are made for positive values of returns only, and go from the minimum possible average returns to the maximum possible average returns. Note that, for a given return, the predicted risk is sometimes smaller and sometimes higher than the realized one. This may lead to a false perception of how risky an investment truly is, and may cause wrong decisions by the portfolio manager.

In particular, the results of predicted and realized risks are bad for the pairs of years 2007–2008 and 2008–2009, when the Subprime crisis occurred. One can observe that, using just Markowitz’s theory, the use of one year of crisis overestimates the risk of the portfolio of a subsequent year without crisis, as in 2008–2009. In the same way, Markowitz’s theory applied to a year without crisis underestimates the risk of the portfolio of a subsequent year with crisis, as in 2007–2008. So, Markowitz’s theory alone cannot build good portfolios when a year without crisis is used to predict a year with crisis, and the same can happen when a crisis year is used to predict a year without crisis.

This is most likely due to the fact that Markowitz’s theory is built on the hypothesis of the stationarity of the time series, which does not hold during the crisis years. So, the failure to predict risk and returns based on previous data is no fault of the theory, but results from regime shifts, during which the hypothesis of stationarity fails.

In the next section, we present the RMT method to clean the correlation matrix and the SIM method to remove the market effect of the assets, and we build portfolios of stocks using just RMT, using just SIM, and combining the two procedures.

3. Methodology

In this section, we briefly describe the method proposed for the construction of portfolios by cleaning the correlation matrix and removing the market effect, aiming at a better forecasting of risk based on the previous behaviors of the assets. We use the year 2004 as an example of the application of such method in this section, and then apply the same methodology to the remaining years.

3.1. Random Matrix Theory

The first result of Random Matrix Theory that we shall mention is that, given an $L \times N$ matrix with random numbers from a Gaussian distribution with zero mean and standard deviation $\sigma$, then, in the limit $L \to \infty$ and $N \to \infty$ such that $Q = L/N$ remains finite and greater than one, the eigenvalues $\lambda$ of the correlation matrix built from this random matrix will have the following probability density function, called a Marčenko–Pastur distribution [18]:

$$\rho(\lambda) = \frac{Q}{2\pi\sigma^{2}} \, \frac{\sqrt{(\lambda_{+} - \lambda)(\lambda - \lambda_{-})}}{\lambda}, \qquad (6)$$

where

$$\lambda_{-} = \sigma^{2}\left(1 + \frac{1}{Q} - 2\sqrt{\frac{1}{Q}}\right), \qquad \lambda_{+} = \sigma^{2}\left(1 + \frac{1}{Q} + 2\sqrt{\frac{1}{Q}}\right), \qquad (7)$$

and $\lambda$ is restricted to the interval $[\lambda_{-}, \lambda_{+}]$.

Fig. 1. Predicted (dashed lines) and realized (continuous lines) returns and risks of portfolios using the correlation matrices from 2004–2005, 2005–2006, 2006–2007, 2007–2008, 2008–2009, 2009–2010, 2010–2011, and 2011–2012.

Since distribution (6) is only valid in the limit $L \to \infty$ and $N \to \infty$, finite distributions will present deviations from this behavior. In the case of this article, $L$ is the number of observations of the time series of the log-returns of each stock and $N$ is the number of stocks. If $L/N$ is small, we have a large finite sample size effect; the larger $L/N$ is, the more reliable the results are, and the fluctuations are more realistically governed by the true population variance. These effects are discussed in detail in Ref. [35].
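The statement above can be checked numerically with a small simulation, sketched here under the assumption of independent Gaussian entries and arbitrary sizes $L = 2000$ and $N = 500$ (so that $Q = 4$). Since the eigenvalues are those of a correlation matrix, $\sigma^{2} = 1$ is used in Eq. (7). The sizes and the random seed are illustrative choices, not values from the article.

```python
import numpy as np

rng = np.random.default_rng(42)
n_obs, n_series = 2000, 500          # L and N in the notation of the text
q = n_obs / n_series

# Pure noise: independent Gaussian series with zero mean and unit variance.
data = rng.normal(0.0, 1.0, size=(n_obs, n_series))
corr = np.corrcoef(data, rowvar=False)
eigenvalues = np.linalg.eigvalsh(corr)

# Bounds of Eq. (7) with sigma^2 = 1.
lam_minus = 1.0 + 1.0 / q - 2.0 * np.sqrt(1.0 / q)
lam_plus = 1.0 + 1.0 / q + 2.0 * np.sqrt(1.0 / q)

inside = np.mean((eigenvalues >= lam_minus) & (eigenvalues <= lam_plus))
print(f"bounds: [{lam_minus:.3f}, {lam_plus:.3f}], fraction inside: {inside:.3f}")
```

For finite $L$ and $N$ a few eigenvalues may fall slightly outside the theoretical interval, which is the kind of finite sample deviation mentioned in the text.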

Another source of deviations is the fact that financial time series are better described by non-Gaussian distributions, such as the Student’s t or the Tsallis distributions. The eigenvalue distribution of correlation matrices derived from time series with Student’s t distributions was studied in Ref. [40]. The authors derived a theoretical probability distribution for the eigenvalues which has no upper limit, but which decays as a power law.

Knowing that the returns of some assets usually have tails heavier than the tails of the normal distribution, in the next section we present the Kolmogorov–Smirnov test to check if the empirical distribution of eigenvalues of Pearson’s correlation matrix is compliant with the Marčenko–Pastur distribution or if we need to use the Student Ensemble distribution or a Maximum Likelihood estimator of correlations in place of Pearson’s estimator [40].

3.1.1. Eigenvalues and eigenvectors of the correlation matrix

We shall now explain why Random Matrix Theory is useful for portfolio building, starting by clarifying how it can be used to remove part of the noise from the correlation matrix. In order to do that we shall consider the data concerning the year 2004, the first of the years considered in our study, as an example. For this period we chose Bovespa stocks (by then, Bovespa had not yet merged with BM&F) which were traded every trading day during the years 2004 and 2005 (2004 will be the past data that will be used to predict the risk in 2005), totaling 61 stocks. Very similar results are obtained for the other pairs of years.

The correlation matrix (a $61 \times 61$ matrix) between the variables $R_t$ as defined in Eq. (2) for the year 2004 was then calculated. The distribution density of the eigenvalues of the correlation matrix thus obtained is shown in Fig. 2 (left picture). Also, the eigenvalues are plotted in order of magnitude in the right picture of Fig. 2. The shaded area (between $\lambda_{-}$ and $\lambda_{+}$) indicates the region predicted by the theory for the data related to a purely random behavior of the normalized returns, which is called the Wishart region.

We have $L = 248$ days of data for each of the $N = 61$ stocks, so that $Q = 248/61 \approx 4.06$. The probability distribution function for a random matrix with $L \to \infty$ and $N \to \infty$ with $Q \approx 4.06$ is also plotted in Fig. 2 (left), so that we may compare the result of pure noise with the one obtained for our data. The minimum ($\lambda_{-}$) and maximum ($\lambda_{+}$) values of the probability distribution function are given by $\lambda_{-} = 0.254$ and $\lambda_{+} = 2.238$.
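The values of $\lambda_{-}$ and $\lambda_{+}$ quoted above follow directly from Eq. (7) with $\sigma^{2} = 1$ and $Q = 248/61$. The short sketch below reproduces them and also evaluates the density of Eq. (6); the function names are hypothetical.

```python
import math

def marchenko_pastur_bounds(q, sigma2=1.0):
    """Bounds (lambda_minus, lambda_plus) of Eq. (7)."""
    root = math.sqrt(1.0 / q)
    return (sigma2 * (1.0 + 1.0 / q - 2.0 * root),
            sigma2 * (1.0 + 1.0 / q + 2.0 * root))

def marchenko_pastur_density(lam, q, sigma2=1.0):
    """Density of Eq. (6); zero outside the interval [lambda_minus, lambda_plus]."""
    lam_minus, lam_plus = marchenko_pastur_bounds(q, sigma2)
    if not lam_minus <= lam <= lam_plus:
        return 0.0
    return (q / (2.0 * math.pi * sigma2)
            * math.sqrt((lam_plus - lam) * (lam - lam_minus)) / lam)

q = 248 / 61  # L = 248 days and N = 61 stocks, as in the text
lam_minus, lam_plus = marchenko_pastur_bounds(q)
print(f"Q = {q:.2f}, lambda- = {lam_minus:.3f}, lambda+ = {lam_plus:.3f}")
```

An eigenvalue such as $\lambda_1 = 23.505$ lies far outside this interval, so the density assigns it zero probability under the pure noise hypothesis.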

The first striking feature is that the largest eigenvalue is more than ten times larger than the maximum value predicted for a purely random correlation matrix. About 72% of the eigenvalues fall within the shaded region associated with pure noise, 15 of them fall below this region, and one more, besides the largest, lies above it.

The eigenvectors $e_1$ and $e_2$ for the two largest eigenvalues, $\lambda_1 = 23.505$ and $\lambda_2 = 2.540$, are represented in Fig. 3 (first two graphs). The white bars represent positive values and the gray bars represent negative ones.

The distribution of individual values of eigenvector $e_1$ is very similar for all the stocks considered, showing that all stocks contribute to this mode, which is considered ‘‘the market mode’’. For eigenvector $e_2$, one can see the prevalence of some stocks over others. In comparison, eigenvectors corresponding to the shaded region (Wishart region) do not show any preference for particular stocks. The correlation between the daily returns of the portfolio $P_1$ built with eigenvector $e_1$ and the log-returns of the Ibovespa, an index that describes the general behavior of the São Paulo Stock Exchange, is 0.9865 for the year 2004, which is a very strong indication that the portfolio $P_1$ corresponds to a combination of stocks that behaves much like the market, although with a much larger volatility: the standard deviation of the returns of $P_1$ is 12.51%, and the standard deviation for the Ibovespa is 1.80%. The difference in volatilities is explained by the different weights that each stock has in $P_1$ and in the Ibovespa.
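The comparison between the portfolio $P_1$ and the market index can be sketched on synthetic data as follows. This is an illustration under stated assumptions, not a reproduction of the article’s calculation: the returns come from a one-factor toy model, the factor plays the role of the Ibovespa, and the portfolio is formed from standardized returns weighted by the components of $e_1$.

```python
import numpy as np

# Synthetic one-factor returns standing in for real stock data.
rng = np.random.default_rng(1)
n_days, n_stocks = 250, 20
market = rng.normal(0.0, 0.02, size=n_days)
returns = market[:, None] + rng.normal(0.0, 0.02, size=(n_days, n_stocks))

corr = np.corrcoef(returns, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(corr)  # eigenvalues in ascending order
e1 = eigenvectors[:, -1]                          # eigenvector of the largest eigenvalue

# Returns of the "eigenportfolio" built with e1 (assumption: standardized returns).
standardized = (returns - returns.mean(axis=0)) / returns.std(axis=0, ddof=1)
portfolio_p1 = standardized @ e1

corr_with_market = np.corrcoef(portfolio_p1, market)[0, 1]
print(f"largest eigenvalue: {eigenvalues[-1]:.2f}")
print(f"|correlation of P1 with the market factor|: {abs(corr_with_market):.3f}")
```

The sign of an eigenvector is arbitrary, so only the absolute value of the correlation is meaningful here.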

The third and fourth graphs of Fig. 3 show the distributions of the eigenvectors associated with two eigenvalues that are inside the Wishart region, $\lambda_{18} = 0.853$ and $\lambda_{37} = 0.393$. Note that there are no clearly defined stock structures. The situation changes if we consider a portfolio built with eigenvector $e_{37}$, which corresponds to the noisy part of the eigenvalue spectrum: the correlation between this portfolio, $P_{37}$, and the Ibovespa is 0.1824, and it has a standard deviation of 1.72%, very close to the standard deviation of the Ibovespa. Although most of the eigenvectors associated with eigenvalues within the Wishart region may be associated with noise, some information about the inner structure of the market may also be hidden in this region. For instance, Ref. [41], analyzing the Polish stock exchange, found eigenvalues whose eigenvectors are associated with sectors of the Polish economy or with companies whose stocks are traded in that stock exchange. Analyzing our own data, we could also find within the Wishart region eigenvalues whose eigenvectors show peaks in some market sectors, like the mining and the metal industries.

We also show the eigenvectors corresponding to the two lowest eigenvalues of the correlation matrix, $\lambda_{60} = 0.046$ and $\lambda_{61} = 0.039$ (last two graphs in Fig. 3). These eigenvectors, corresponding to low eigenvalues, represent “portfolios” of low risk, in contrast to the eigenvectors of the two largest eigenvalues, which represent the oscillations of the market and the common behavior of a cluster of stocks that behave in a similar way. Eigenvector $e_{61}$ represents a portfolio in which the investor buys PETR4 and short-sells PETR3, which are stocks of the same company, Petrobras, and buys ELET3 and short-sells ELET6, which are both stocks of Eletrobras. Eigenvector $e_{60}$, in its turn, represents a portfolio in which the investor buys VALE3 and ELET6 and short-sells VALE5 and ELET3, which again are two pairs of stocks of the same companies, and also buys PETR3 and short-sells PETR4. For the portfolio $P_{61}$, built with eigenvector $e_{61}$, which corresponds to the lowest eigenvalue, the correlation with the Ibovespa is 0.0932, and its standard deviation is 0.44%. This portfolio presents the lowest correlation with the Ibovespa.

3.1.2. Discussing the type of theoretical distribution to be used

We said in Section 1 that the real probability distribution of the eigenvalues of the correlation matrix, with the exception of the abnormally high eigenvalues, may be different from the Marčenko–Pastur probability distribution and more similar to the one derived by Biroli, Bouchaud, and Potters [40]. Here we use some instruments to clarify this point.

First, we use the Kolmogorov–Smirnov test, which in its one-sample form compares a sample with a reference probability distribution (in our case, the Marčenko–Pastur distribution). The statistic quantifies the distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution, and the null distribution of this statistic is calculated under the null hypothesis that the sample is drawn from the reference distribution.
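As a rough illustration of this test, the sketch below computes the one-sample Kolmogorov–Smirnov statistic of a set of eigenvalues against a Marčenko–Pastur law. It is not the authors' implementation: it assumes the standard unit-variance form of the density, with edges $1 + 1/Q \pm 2\sqrt{1/Q}$ and $Q = L/N \geq 1$, integrates the density numerically, and does not compute the p-value. All function names are hypothetical.

```python
import math

def mp_bounds(q):
    """Edges of the Marchenko-Pastur support for unit variance and Q = L/N >= 1."""
    root = 2.0 * math.sqrt(1.0 / q)
    return 1.0 + 1.0 / q - root, 1.0 + 1.0 / q + root

def mp_pdf(lam, q):
    """Marchenko-Pastur density (zero outside the support)."""
    lo, hi = mp_bounds(q)
    if lam <= lo or lam >= hi:
        return 0.0
    return q / (2.0 * math.pi) * math.sqrt((hi - lam) * (lam - lo)) / lam

def mp_cdf(lam, q, steps=2000):
    """Cumulative distribution obtained with a simple midpoint rule."""
    lo, hi = mp_bounds(q)
    if lam <= lo:
        return 0.0
    if lam >= hi:
        return 1.0
    h = (lam - lo) / steps
    area = sum(mp_pdf(lo + (k + 0.5) * h, q) for k in range(steps)) * h
    return min(1.0, area)

def ks_statistic(sample, cdf):
    """Largest distance between the empirical CDF of `sample` and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

# Usage: distance between some eigenvalues and the law for Q = 4.
eigenvalues = [0.3, 0.5, 0.8, 1.1, 1.6, 2.0]
distance = ks_statistic(eigenvalues, lambda x: mp_cdf(x, 4.0))
```

In practice one would pass the eigenvalues of the empirical correlation matrix and the value of $Q$ of the corresponding year, and obtain the p-value from a statistical package.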

Using the test for the eigenvalues obtained from the data from 2004, we obtained an average distance $D = 0.1691$ and a p-value of 0.06108, so that the test fails to reject the null hypothesis at the 1% significance level. This means that, even considering the high eigenvalues, the probability distribution of the eigenvalues obtained from the correlation matrix is statistically compatible with a Marčenko–Pastur distribution at that level.


Fig. 2. Left: histogram of eigenvalues for the correlation matrix of 61 stocks in 2004 and Marčenko–Pastur theoretical distribution (solid line). Right: eigenvalues for the correlation matrix of 61 stocks in 2004 and pure random region.

Fig. 3. Eigenvectors of some fixed eigenvalues: $\lambda_1$, $\lambda_2$ (largest), $\lambda_{18}$, $\lambda_{37}$ (noise region), and $\lambda_{60}$, $\lambda_{61}$ (lowest eigenvalues).

The results for 2004 and for the remaining years are summarized in Table 1. The Kolmogorov–Smirnov test did not reject the hypothesis that the eigenvalues follow the Marčenko–Pastur distribution with a confidence level of 99% for each one of the years 2004–2011. Thus, we consider the Marčenko–Pastur distribution valid for the eigenvalues calculated from the Pearson correlation matrix in the case of assets traded in the São Paulo stock exchange in the period evaluated.

Moreover, we calculate the correlation matrices based on randomized data, obtained by taking each time series and randomly changing the order of its observations, individually, so as to preserve the statistics of each time series but to destroy any possible causality relation between the observations and any common response to external factors. A total of 10,000 simulations of eigenvalues was made using such randomized data, yielding a probability distribution for the eigenvalues which closely resembles the common Marčenko–Pastur distribution, except for small differences at its borders, which now decay like a power law, in a similar way to the Wishart–Student distribution [40], which behaves like $\lambda^{-1-\mu/2}$


Table 1. Results for the Kolmogorov–Smirnov test for the eigenvalue probability distribution of the correlation matrix for the log-returns of the years from 2004 to 2012 and the corresponding Marčenko–Pastur distributions.

| Year | Number of stocks | Average distance | p-value | Rejects the null hypothesis? |
|------|------------------|------------------|---------|------------------------------|
| 2004 | 61  | 0.1691 | 0.0611 | No |
| 2005 | 72  | 0.1508 | 0.0756 | No |
| 2006 | 86  | 0.1272 | 0.1239 | No |
| 2007 | 105 | 0.1058 | 0.1906 | No |
| 2008 | 148 | 0.1258 | 0.0185 | No |
| 2009 | 153 | 0.0846 | 0.2237 | No |
| 2010 | 153 | 0.0800 | 0.2808 | No |
| 2011 | 134 | 0.1050 | 0.1039 | No |
| 2012 | 125 | 0.0972 | 0.1881 | No |

Fig. 4. Left: histogram of eigenvalues for the correlation matrix of 61 stocks in 2004, the Marčenko–Pastur theoretical distribution (dashed line), and the distribution obtained from 10,000 simulations with randomized data (continuous line). Right: the same distributions, but centered around the border $\lambda_+$.

for larger values of the eigenvalue $\lambda$, where $\mu$ is the parameter that leads to the best fit of a Student's t-distribution to the time series of log-returns of each stock.

In Fig. 4 (left graph), data relative to the year 2004 are presented. The real distribution is shown as a histogram, the Marčenko–Pastur distribution as a dashed line, and the distribution resulting from the 10,000 simulations as a continuous black line. As can be seen, the distribution obtained from randomized data and the Marčenko–Pastur distribution are nearly identical, except for a small difference at the latter's borders, shown amplified in Fig. 4 (right).

What we obtained from the 10,000 simulations was the eigenvalue distribution for time series that had exactly the same probability distributions as the original ones, be they closer to a Gaussian or to a Student's t-distribution, but that were unrelated to one another. This is not a theoretical result, and it is independent of the type of distribution associated with the real data. Since what we wish to do is differentiate the real distribution from a distribution of uncorrelated time series, we considered it a fitting result for deciding which eigenvalues were within the noise region. Due to the power-law decay of the Wishart–Student distribution, the results are quite similar to the ones that would be obtained from that distribution.

Following the same procedure described in Ref. [42], we used the simulations to construct confidence bands for the eigenvalues based on the 0.5% and 99.5% percentiles for each of the years evaluated. We then calculated the theoretical eigenvalues according to the Marčenko–Pastur distribution and found that they fall within the confidence bands, so we conclude, with 99% confidence, that we cannot discard the Marčenko–Pastur distribution as appropriate for the smoothed simulation.
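The randomization procedure and the percentile bands described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, the number of simulations is kept small for speed, and the bands are taken rank by rank over the sorted eigenvalues, which is an assumption about how the percentiles in Ref. [42] are applied.

```python
import numpy as np

def shuffled_eigenvalues(returns, n_sim=100, seed=0):
    """Eigenvalues of correlation matrices built from shuffled time series.

    returns : array of shape (L days, N stocks).
    Each column is permuted independently, which keeps the marginal
    distribution of every stock but destroys cross-correlations.
    """
    rng = np.random.default_rng(seed)
    n_days, n_stocks = returns.shape
    eigs = np.empty((n_sim, n_stocks))
    for s in range(n_sim):
        shuffled = np.column_stack(
            [rng.permutation(returns[:, j]) for j in range(n_stocks)]
        )
        corr = np.corrcoef(shuffled, rowvar=False)
        eigs[s] = np.linalg.eigvalsh(corr)      # sorted in ascending order
    return eigs

def confidence_band(eigs, lower=0.5, upper=99.5):
    """Percentile band for each ranked eigenvalue across simulations."""
    return np.percentile(eigs, lower, axis=0), np.percentile(eigs, upper, axis=0)

# Synthetic fat-tailed data standing in for one year of log-returns.
rng = np.random.default_rng(1)
data = rng.standard_t(4, size=(250, 20))
eigs = shuffled_eigenvalues(data, n_sim=20)
band_low, band_high = confidence_band(eigs)
```

An eigenvalue of the real correlation matrix lying above the upper band of the largest simulated eigenvalue would then be a candidate for carrying genuine information.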

3.2. Building portfolios with a clean correlation matrix

The situation may be improved by trying to remove some of the noise from the correlation matrices of the 2004 and 2005 returns. One way this can be done is by building a diagonal matrix $D$ whose diagonal elements are the eigenvalues of the original correlation matrix, but now with all eigenvalues corresponding to noise (those between $\lambda_-$ and $\lambda_+$) replaced by their average [11,12,14,15]. Another way is to set all eigenvalues corresponding to noise to zero and then adjust the remaining eigenvalues so that their sum is still $N$ [13].

We shall follow the first approach here, commenting on the second one in the conclusion. In our present case, this average is $\bar{\lambda} = 0.748$ for the eigenvalues based on data from 2004 and $\bar{\lambda} = 0.790$ for the eigenvalues based on data from 2005. The clean correlation matrix is then built using the formula

$$C_{\text{clean}} = P D P^{-1}, \tag{8}$$

where $P$ is the matrix whose columns are the eigenvectors of the original correlation matrix. The clean covariance matrix is then built from it using the average standard deviation of returns of the realized data.
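A minimal sketch of the first cleaning approach, Eq. (8), is given below. It is an illustration under assumptions, not the authors' implementation: the noise band is taken from the unit-variance Marčenko–Pastur edges $1 + 1/Q \pm 2\sqrt{1/Q}$, the function name `clean_correlation` is hypothetical, and no rescaling of the diagonal is applied afterwards.

```python
import numpy as np

def clean_correlation(corr, q):
    """Replace the eigenvalues inside the noise band by their average.

    corr : symmetric N x N correlation matrix.
    q    : ratio Q = L/N (assumed >= 1).
    """
    lam_minus = 1.0 + 1.0 / q - 2.0 * np.sqrt(1.0 / q)
    lam_plus = 1.0 + 1.0 / q + 2.0 * np.sqrt(1.0 / q)
    eigvals, eigvecs = np.linalg.eigh(corr)      # columns of eigvecs form P
    noise = (eigvals > lam_minus) & (eigvals < lam_plus)
    cleaned = eigvals.copy()
    if noise.any():
        cleaned[noise] = eigvals[noise].mean()   # keeps the trace unchanged
    # For a symmetric matrix P is orthogonal, so P^{-1} equals P transposed.
    return eigvecs @ np.diag(cleaned) @ eigvecs.T

# Synthetic example with one common factor (500 days, 10 stocks, Q = 50).
rng = np.random.default_rng(1)
factor = rng.normal(size=(500, 1))
x = 0.5 * factor + rng.normal(size=(500, 10))
corr = np.corrcoef(x, rowvar=False)
c_clean = clean_correlation(corr, q=50.0)
```

Because the noisy eigenvalues are replaced by their own average, the sum of the eigenvalues is preserved, while the eigenvalues outside the band (such as the market mode) are left untouched.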

Calculating now the efficient frontier built with the covariance matrix obtained from the clean correlation matrix of 2004, together with the average returns of 2005 (perfect forecast of expected returns), shown as a dashed line, and comparing it with the real curve calculated with the covariance matrix obtained from the clean correlation matrix of 2005, shown as a continuous line, we obtain


Fig. 5. Predicted (dashed lines) and realized (continuous lines) returns and risks of portfolios using the clean correlation matrices from 2004–2005, 2005–2006, 2006–2007, 2007–2008, 2008–2009, 2009–2010, 2010–2011, and 2011–2012.

the results represented in the first graph of Fig. 5. In the same figure, we also represent the efficient frontiers for the remaining years.

Looking at Fig. 5, which uses the cleaned correlation matrices, one sees almost no visual improvement when compared with Fig. 1, which uses only Markowitz's theory. In both Figs. 1 and 5, the risk of the portfolio is overestimated when the year before is a year of crisis, and underestimated when the year before is not a year of crisis. Again, the bad results may be explained by the regime shifts that occur during the crises in these periods, which undermine the underlying assumptions of both Markowitz's portfolio theory and Random Matrix Theory, both of which assume the stationarity of the data.

Because of the differences in market volatility, our proposal is to use, besides RMT, a Single Index Model that removes the market factor from the data, the theory of which is presented in the next section, and then to compare the new results with those of Figs. 1 and 5.

3.3. Building portfolios by Single Index Models

When trying to predict the future expected risk of a portfolio, the volatility due to market movements may make the task difficult, since one obtains a structure of dependence between the assets and the market, and not solely the dependence between assets. As an example, the prediction for 2008 using data from 2007 grossly underestimates the risk of 2008, since 2007 was a year of relatively low volatility while 2008 witnessed the height of the USA Subprime Mortgage Crisis. Similarly, the risk prediction for 2009 using data from 2008 overestimates the risk for 2009.

The most common way to remove this so-called systemic risk is to use a Single Index Model, in which the log-returns $R_t$ of each stock are written as a first-degree function of a market index $I_t$ (for example, the Ibovespa) plus an error $E_t$:

$$R_t = a + b I_t + E_t. \tag{9}$$

The coefficients $a$ and $b$ are estimated for each equity using simple linear regression.
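The regression of Eq. (9) for a single stock can be sketched with ordinary least squares as below. This is a minimal illustration, assuming plain Python lists of log-returns of equal length; the function name `single_index_residuals` is hypothetical and the numbers in the usage example are invented.

```python
def single_index_residuals(stock_returns, index_returns):
    """Fit R_t = a + b * I_t + E_t by least squares and return (a, b, residues)."""
    n = len(index_returns)
    if n != len(stock_returns) or n < 2:
        raise ValueError("both series must have the same length (at least 2)")
    mean_i = sum(index_returns) / n
    mean_r = sum(stock_returns) / n
    sxx = sum((x - mean_i) ** 2 for x in index_returns)
    sxy = sum((x - mean_i) * (y - mean_r)
              for x, y in zip(index_returns, stock_returns))
    b = sxy / sxx                      # slope: sensitivity to the market index
    a = mean_r - b * mean_i            # intercept
    residues = [y - (a + b * x) for x, y in zip(index_returns, stock_returns)]
    return a, b, residues

# Invented example: a stock that follows the index exactly, so residues vanish.
index = [0.010, -0.020, 0.015, 0.000, -0.005]
stock = [0.001 + 1.5 * x for x in index]
a, b, residues = single_index_residuals(stock, index)
```

Applying the function to every stock and stacking the residues gives the series from which the correlation matrices of the residues are computed.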


Fig. 6. Predicted (dashed lines) and realized (continuous lines) returns and risks of portfolios using the residues obtained from the single index regression with the uncleaned correlation matrices from 2004–2005, 2005–2006, 2006–2007, 2007–2008, 2008–2009, 2009–2010, 2010–2011, and 2011–2012.

As an alternative to the use of the Ibovespa as the market index, one may use the index given by the log-returns of the portfolio built using the eigenvector corresponding to the highest eigenvalue of the correlation matrix of those same stocks. As we showed for the data concerning the year 2004, this index and the Ibovespa are very highly correlated, so the results should not be substantially altered by the choice between these two indices.

We then calculated the residues for all stocks being considered for each pair of years, using as the market index the portfolio built from the eigenvector of the highest eigenvalue for each time period being studied. We then proceeded to build portfolios using the correlation matrices between those residues. The resulting efficient frontiers for the pairs of years from 2004 to 2012 are drawn in Fig. 6. Once again, the predicted results are in dashed lines and the realized results are in continuous lines (almost indistinguishable in most of the graphs). The results for the cleaned correlation matrices of the residues are not shown, since they are visually nearly identical to the results without the cleaning procedure.

A simple visual inspection reveals that the results for the correlation matrices of the residues of the regression are better than the results previously obtained. Also, by looking at the graphs for the portfolios in Figs. 1, 5 and 6, one may see that the efficient frontiers for the residues of the regression lie at risk values closer to zero.

In the next section, we introduce measurements in order to obtain a quantitative estimate of the performances of RMT and SIM, when compared with the theory of Markowitz (our benchmark). A final remark is that, when using the cleaning with RMT for the residues of the SIM, one must recalculate $\lambda_-$ and $\lambda_+$ using the new formula [10]

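$$\lambda_- = \underbrace{\left(1 - \frac{\lambda_{\max}}{N}\right)}_{\sigma^2}\left(1 + \frac{1}{Q} - 2\sqrt{\frac{1}{Q}}\right), \qquad \lambda_+ = \underbrace{\left(1 - \frac{\lambda_{\max}}{N}\right)}_{\sigma^2}\left(1 + \frac{1}{Q} + 2\sqrt{\frac{1}{Q}}\right), \tag{10}$$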

where $\lambda_{\max}$ is the largest eigenvalue of the correlation matrix, which is removed using the regression.
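To make the rescaling in Eq. (10) concrete, the sketch below evaluates the corrected noise band. It assumes the reading of Eq. (10) in which the factor $\sigma^2 = 1 - \lambda_{\max}/N$ multiplies the usual Marčenko–Pastur edges; the function name `residual_noise_band` is hypothetical, and the inputs ($N = 61$ and $\lambda_1 = 23.505$ from Section 3.1, $Q = 4.07$ from Table 2) are used purely as illustrative values.

```python
import math

def residual_noise_band(q, lambda_max, n_stocks):
    """Noise band (lambda_minus, lambda_plus) after removing the market mode.

    q          : ratio Q = L/N.
    lambda_max : largest eigenvalue of the original correlation matrix.
    n_stocks   : number of stocks N.
    """
    sigma2 = 1.0 - lambda_max / n_stocks        # variance left after removal
    root = 2.0 * math.sqrt(1.0 / q)
    lam_minus = sigma2 * (1.0 + 1.0 / q - root)
    lam_plus = sigma2 * (1.0 + 1.0 / q + root)
    return lam_minus, lam_plus

# Illustrative inputs taken from the 2004 data quoted in the text.
band_2004 = residual_noise_band(q=4.07, lambda_max=23.505, n_stocks=61)
# Setting lambda_max = 0 recovers the uncorrected band.
band_plain = residual_noise_band(q=4.0, lambda_max=0.0, n_stocks=61)
```

The correction shrinks both edges by the same factor, so eigenvalues of the residual correlation matrix are compared against a narrower and lower band than for the original data.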


4. Results

In this section, we present one measure of agreement between the predicted and realized risks calculated with the various combinations of techniques for building portfolios, and one measure of agreement between the correlation matrices from which the risks are calculated.

4.1. Mean Squared Error

The first measure is the Mean Squared Error (MSE), which is defined as

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(RI_i^{\text{real}} - RI_i^{\text{pred}}\right)^2, \tag{11}$$

where $RI_i^{\text{real}}$ is the realized risk and $RI_i^{\text{pred}}$ is the predicted risk, both for $i = 1, \ldots, n$ values of fixed returns. This measures the mean of the squared differences between the two risk values (predicted and realized) at the same expected returns. For the pair of years 2004–2005, the Mean Squared Error is $\mathrm{MSE} = 10.50 \times 10^{-11}$.
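The definition in Eq. (11) translates directly into code. The sketch below assumes that the realized and predicted risks have already been evaluated at the same grid of fixed returns; the function name and the sample numbers are hypothetical.

```python
def mean_squared_error(realized, predicted):
    """Mean of the squared differences between realized and predicted risks."""
    if len(realized) != len(predicted) or not realized:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((r - p) ** 2 for r, p in zip(realized, predicted)) / len(realized)

# Invented risks on a grid of two fixed returns.
mse = mean_squared_error([0.01, 0.02], [0.01, 0.04])
```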

Table 2 shows the results for the Mean Squared Error (MSE) for all pairs of years considering four situations for the removal of the market effect: the Benchmark (that is, without the RMT and SIM methodologies), using only RMT, using only SIM, and using both the RMT and SIM methodologies.

The MSE values show that the differences between predicted and realized risks were larger for the pairs of years 2007–2008 and 2008–2009. This was to be expected, as we saw in Figs. 1 and 5. The MSE for 2007–2008 is high even in the best situation, since data from 2007 are used to forecast a year of high volatility. Nevertheless, using SIM is better than using RMT, and it is not necessary to use them together, probably because 2007 was not a year of crisis and so it was not paramount to filter any high volatility from the data. In contrast, the years 2008 and 2011 were times of crisis, with higher volatility in the first case. So, using RMT and SIM jointly, we obtain better results, since we eliminate more noise due to the crisis, thus obtaining a better forecast for a following year without crisis.

In Table 2, for every pair of years the best forecast was obtained with the use of regression to eliminate the effect of the market movements, either alone (SIM) or combined with cleaning (RMT + SIM). In other words, by removing the market effect and by working with the residues of the SIM, one eliminates much of the noise of the correlation matrix of the stocks' returns and allows the forecasted risk to be closer to the realized risk. This result has been verified for the years from 2004 to 2012, for the Brazilian stock market, independently of the occurrence of a crisis or of the volatility of the market (see last column in Table 2).

In our case, we have different values of $Q$ for each pair of years being analyzed (see Table 2), and in Ref. [20] the authors say that a large discrepancy between predicted and realized risks can be explained by high values of $Q = L/N$. In order to gauge the influence of $Q$ on the results, we considered 61 randomly chosen stocks (the minimum number of stocks over all pairs of years) for each pair of years, thus making the value of $Q$ close to 4 for every pair of years, and calculated all the results again. Our results for fixed $Q$ are very similar to the ones obtained with different values of $Q$.

4.2. Kullback–Leibler distance

Other measures of how well a portfolio is related to another portfolio obtained by some cleaning procedure can be devised, some of them based directly on the distances between the correlation matrices. Such distances avoid the actual building of the portfolios and the issue of whether or not to allow short selling. One measure that can be used as a distance between matrices is the Kullback–Leibler distance [43–46,40]. The discrete version of this measure is based on the probability distributions $S$ and $Q$ of, respectively, the correlation matrices for predicted and realized returns. It is given by

$$D_{KL} = \sum_{i=1}^{N} S_i \ln\frac{S_i}{Q_i}, \tag{12}$$

where $S_i$ and $Q_i$ are the probabilities of bin $i$, $i = 1, \ldots, N$, occurring in state $S$ of the correlation matrix for predicted risk and in state $Q$ of the correlation matrix for realized risk, respectively, and a term of the sum is taken as zero if $S_i$ or $Q_i$ is zero.
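A possible implementation of Eq. (12) is sketched below. How the probability distribution of a correlation matrix is built is not detailed here, so the sketch assumes, as a labeled guess, a histogram of the off-diagonal correlation coefficients over equal-width bins in $[-1, 1]$; the function names and the default of 20 bins are hypothetical.

```python
import math

def histogram_probabilities(corr, n_bins=20):
    """Bin the off-diagonal entries of a correlation matrix (assumed binning)."""
    n = len(corr)
    values = [corr[i][j] for i in range(n) for j in range(i + 1, n)]
    counts = [0] * n_bins
    for v in values:
        k = int((v + 1.0) / 2.0 * n_bins)        # map [-1, 1] onto bin indices
        counts[max(0, min(k, n_bins - 1))] += 1  # keep v = 1 in the last bin
    return [c / len(values) for c in counts]

def kl_distance(s, q):
    """Discrete Kullback-Leibler distance; terms with a zero bin are skipped."""
    return sum(si * math.log(si / qi) for si, qi in zip(s, q) if si > 0 and qi > 0)

# Invented two-bin example.
d_kl = kl_distance([0.5, 0.5], [0.25, 0.75])
```

In use, `histogram_probabilities` would be applied with the same bins to the predicted and to the realized correlation matrices, and the two resulting lists passed to `kl_distance`.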

By applying this measure to the data, we obtain the results given in Table 3 for the Kullback–Leibler distance ($D_{KL}$). The results depend directly on the correlation matrices, which can be obtained in four different situations: from the original data (Benchmark); from cleaned correlation matrices (RMT); from the residues of the regression (SIM); and from the combination of both methodologies (RMT + SIM). This measure depends on the size of the correlation matrices, so that one period of time cannot be truly compared with another, and it also depends on the choice and number of bins used to derive the probability distributions of the correlation matrices.

A brief study with other choices of bins reveals that the results are not significantly altered by the number of bins used, within certain reasonable limits. Table 3 shows that the best results according to $D_{KL}$ are those with the use of SIM but no cleaning, no matter the $Q$ values.


Table 2. Mean Squared Error (MSE) $\times 10^{-11}$ of the curves with no short selling, $Q = L/N$ of the previous year and volatility of the predicted year. The best results for each line are shown in bold face. RMT means Random Matrix Theory and SIM means Single Index Model.

| Previous–predicted | Benchmark | RMT | SIM | RMT + SIM | Q | Vol. (%) predicted year |
|--------------------|-----------|-----|-----|-----------|---|-------------------------|
| 2004–2005 | 10.50  | 24.99  | **0.46**  | 4.35      | 4.07 | 1.57 |
| 2005–2006 | 2.01   | 2.60   | **0.80**  | 1.61      | 3.44 | 1.53 |
| 2006–2007 | 50.47  | 90.03  | 20.04     | **3.83**  | 2.85 | 1.73 |
| 2007–2008 | 105.58 | 151.90 | **51.27** | 90.26     | 2.32 | 3.32 |
| 2008–2009 | 397.95 | 359.41 | 22.85     | **17.91** | 1.68 | 1.93 |
| 2009–2010 | 8.53   | 6.79   | 11.18     | **4.11**  | 1.60 | 1.28 |
| 2010–2011 | 50.14  | 36.73  | 5.87      | **3.93**  | 1.61 | 1.56 |
| 2011–2012 | 46.59  | 53.87  | 3.34      | **1.02**  | 1.84 | 1.36 |

Table 3. Kullback–Leibler distance ($D_{KL}$) between predicted and realized correlation matrices. The best results for each line are shown in bold face. RMT is Random Matrix Theory and SIM is Single Index Model.

| Previous–predicted | Benchmark | RMT | SIM | RMT + SIM |
|--------------------|-----------|-----|-----|-----------|
| 2004–2005 | 0.1014 | 0.0875 | **0.0102** | 0.0387 |
| 2005–2006 | 0.0215 | 0.0411 | **0.0092** | 0.0161 |
| 2006–2007 | 0.2694 | 0.3264 | **0.0077** | 0.0131 |
| 2007–2008 | 0.2982 | 0.3516 | **0.0524** | 0.0895 |
| 2008–2009 | 0.8547 | 0.8494 | **0.0518** | 0.0990 |
| 2009–2010 | 0.0456 | 0.0360 | **0.0032** | 0.0068 |
| 2010–2011 | 0.1520 | 0.1981 | **0.0062** | 0.0157 |
| 2011–2012 | 0.6132 | 0.6118 | **0.0067** | 0.0238 |

Fig. 7. MSE for sliding windows of 100 days with steps of 5 days for original data without short-selling, without cleaning (left graph) and with cleaning (right graph). Time is shown as the last day of the moving window for each point of the graph (only month/year is shown).

5. Evolution in time

Our analysis so far has been based on large windows, with a varying number of stocks for each window, and large jumps from one window to the other. In order to perform a temporal analysis of the evolution of the portfolios, we now consider the 32 stocks that were 100% liquid in the period from 2003 to 2012 in moving windows of 100 days each, with a step of 5 days between consecutive windows. For each of these windows, portfolios are built on efficient frontiers with and without regression, with and without cleaning, and with and without short selling. For each portfolio, we apply the measures Mean Squared Error (MSE) and Kullback–Leibler distance ($D_{KL}$).
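The moving-window scheme can be sketched as a list of index ranges, as below. This is only an illustration: the function name `sliding_windows` is hypothetical, and how each window is paired with the period used to compute the realized risk is not specified here, so the sketch stops at generating the windows themselves.

```python
def sliding_windows(n_obs, size=100, step=5):
    """Return (start, end) index pairs of windows of `size` days every `step` days.

    `end` is exclusive, so returns[start:end] holds exactly `size` observations.
    """
    return [(start, start + size) for start in range(0, n_obs - size + 1, step)]

# With 110 daily observations there are three windows of 100 days.
windows = sliding_windows(110)
```

Each pair can be used to slice the matrix of log-returns before computing the correlation matrix, the efficient frontier and the two measures for that window.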

5.1. Mean Squared Error

We start by comparing the results of the MSE measure for portfolios built without cleaning the correlation matrix and portfolios built with the cleaning of the correlation matrix, in both cases without the use of the Single Index Model. In order to do so, we calculate a portfolio for each window of 100 days from 2003 to 2012 using the 32 stocks that were 100% liquid in the whole period. Fig. 7 shows the results without the cleaning of the correlation matrices (left) and with the cleaning of the correlation matrices (right). One can notice that the MSEs with the use of RMT are usually about half the values obtained without it.


Fig. 8. Left: volatility of the Ibovespa, calculated as the standard deviation of the log-returns of the index in sliding windows of 100 days with a step of five days. Right: volatility of the Ibovespa (dashed line) and MSE without the RMT (solid line). Time is shown as the last day of the moving window for each point of the graph (only month/year is shown).

Fig. 9. MSE for sliding windows of 100 days with steps of 5 days for the residues of the regression without short selling and without cleaning (left graph) and with cleaning (right graph). Time is shown as the last day of the moving window for each point of the graph (only month/year is shown).

The MSE is worse for times of high volatility of the market, which can be shown by comparing the results of the MSE with the volatility of the Ibovespa, the main index of the BM&F-Bovespa. In Fig. 8 (left), we plot the volatility of the Ibovespa, calculated as the standard deviation of the log-returns of the index in sliding windows of 100 days with a step of five days. Note that the times of high volatility during the crisis of 2008 coincide with the period in which the forecasted risk differed most from the realized risk according to the MSE measure. Fig. 8 (right) makes this effect clearer by plotting in the same graph the MSE without RMT and the volatility of the Ibovespa. One may see that periods of high MSE occur when there is a rise in the volatility of the Ibovespa. This is clear for the crisis of 2008, but it can also be seen during other abrupt rises of the volatility of the stock market. We also made calculations with running windows of sizes 50 and 200, and the results are robust to this choice.
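The volatility series of Fig. 8 can be reproduced in outline with the sketch below. It assumes a list of index closing prices as input and uses the sample standard deviation (with $n - 1$ in the denominator), which is a choice made here and not stated in the text; the function names and the prices in the example are hypothetical.

```python
import math
import statistics

def log_returns(prices):
    """Log-returns ln(P_t / P_{t-1}) of a price series."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

def rolling_volatility(returns, size=100, step=5):
    """Sample standard deviation of returns in sliding windows."""
    return [statistics.stdev(returns[start:start + size])
            for start in range(0, len(returns) - size + 1, step)]

# Invented prices growing at a constant rate, so the volatility is zero.
prices = [100.0 * math.exp(0.01 * k) for k in range(12)]
vols = rolling_volatility(log_returns(prices), size=5, step=2)
```

Changing `size` to 50 or 200 gives the alternative window lengths mentioned above.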

Now, we compare the results obtained with and without regression through the Single Index Model (SIM). Fig. 9 shows the MSE for sliding windows of 100 days, with steps of 5 days, for the residues of the regression without (left graph) and with (right graph) cleaning. The MSE for the residues of the regression is, on average, less than half of that obtained without SIM. The cleaning makes the values of the MSE even smaller. We also notice from Fig. 9 that the results are worse in times of high volatility, even though the regression lessens the effects of high volatility.

5.2. Kullback–Leibler distance

Here we present the graphs of the Kullback–Leibler distance given by Eq. (12), which is a measure of the difference between the correlation matrices for predicted and realized risks and not of the efficient frontiers of the portfolios. It is therefore more fundamental, in the sense that it does not depend on the building of portfolios or, for example, on whether short selling is allowed or not. The results for the Kullback–Leibler distance are represented in Fig. 10 for the original data without cleaning (top left graph) and with cleaning (top right graph), and for the residues of the regression without cleaning (bottom left graph) and with cleaning (bottom right graph). There are many peaks now for the $D_{KL}$ based on original data. The peaks are strongly reduced by applying the SIM, and they then tend to concentrate around times of major crisis. One interesting feature is that the cleaning of the correlation matrices does not improve the results, either for correlation matrices based on original data or for those based on the residues of a regression, and sometimes makes them worse.


Fig. 10. Kullback–Leibler distance ($D_{KL}$) between the predicted and realized correlation matrices for sliding windows of 100 days with steps of 5 days, for the original data without cleaning (top left graph) and with cleaning (top right graph), and for the residues of the regression without cleaning (bottom left graph) and with cleaning (bottom right graph). Time is shown as the last day of the moving window for each point of the graph (only month/year is shown).

Fig. 11. Minimum and maximum expected and realized risks for the original data (left) and for the residues of the regression (right), both for no short selling and no cleaning. The predicted risks are in dashed lines and the realized risks are in continuous lines. Time is shown as the last day of the moving window for each point of the graph (only month/year is shown).

5.3. Global riskiness

One final analysis must be made. A portfolio obtained from a cleaning procedure may produce risk predictions that are closer to the realized risk, but at the cost of increasing the global riskiness of the portfolio [22,46]. In order to analyze this, we calculated the minimum and maximum realized risks for each window of 100 days with a step of 5 days for the various procedures we are using. Fig. 11 shows the minimum and maximum expected and realized risks for the original data (left) and also for the residues of the regression with the single index (right), both for no short selling and no cleaning. The graphs with the cleaning procedure are not shown, because they are nearly indistinguishable from their uncleaned counterparts.


The predicted and realized risks for the residues are an order of magnitude smaller than the predicted and realized risks for the original data. This is because, by using SIM, we removed the market risk, which affects all stocks in similar ways.

6. Conclusions

In this article, we used two techniques in order to clean the correlation matrix in the building of portfolios using Markowitz's theory. The first technique is the use of Random Matrix Theory in order to clean the correlation matrix built from the time series data of stocks in the year prior to that for which the portfolio is to be built. The second technique is to use a regression model to remove the market effect due to the common movement of all stocks. Both are used in order to forecast, with better precision, the risk of a portfolio in a particular year using data from the previous year. The data were the time series of returns of the 100% liquid assets of the BM&F-Bovespa covering the years from 2004 to 2012. The aim was to combine these two methods in different configurations, and to compare the results in order to obtain the best risk forecasts for portfolios.

Based on two measures, one of the agreement between the forecasted and the realized risks, the Mean Squared Error (MSE), and one of the agreement between the forecasted and realized correlation matrices, the Kullback–Leibler distance ($D_{KL}$), we concluded that how close the forecasted risk is to the realized risk depends on whether the volatility of the forecasted year is smaller or larger than the volatility of the year used in the forecast.

In general, the cleaning of the correlation matrix did not produce better results than using the original correlation matrix (without cleaning) for the MSE. The use of the regression for the removal of the market effect produced better results than without the use of the regression in 100% of the cases, according to the measures MSE and $D_{KL}$. Also, in all cases in which the use of RMT produced better results, regression was also used, i.e., there is evidence that the joint use of the two methodologies may improve the forecast of realized risk for the portfolios built using Markowitz's criteria. The combination of the regression with the cleaning of the correlation matrix leads to better results in the forecast of the risk of the assets for 62.5% of the cases with the measure MSE.

The differences between forecasted and realized risks for times of crisis may be associated with the fact that regime shifts occur between times of low and high volatility, which undermines the very hypothesis on which both Markowitz's theory and Random Matrix Theory are built, namely that the data are stationary.

In Ref. [37], the authors obtained better results with the cleaning procedure of the correlation matrices for the Chilean stock market. Only for data from 2007 forecasting the results for 2008, and only then for portfolios with the restriction of no short-selling, did they obtain a worse result with the cleaning procedure. A possible reason why the relative failure of RMT in times of crisis has been detected by us and not by others is that no one else performed this same analysis for a period of crisis like that of 2008.

Another approach to removing noise using RMT was developed by Plerou, Gopikrishnan, Rosenow, Amaral, Guhr, and Stanley [13]. Making the calculations with this approach, we obtained results similar to those obtained before, but usually worse for the Mean Squared Error (MSE).

In the temporal analysis, we could see that the difference between forecasted and realized risks, as measured by the MSE, is larger in times of high volatility of the stock market, and that the difference between the predicted and the realized correlation matrices, as measured by DKL, is also larger in times of high volatility. In general, the regression leads to better results in all measures, but there is no significant difference between the results obtained with cleaned and uncleaned correlation matrices. In results not reported here, we verified that a regression model using the eigenvectors corresponding to the first and second largest eigenvalues of the correlation matrix did not lead to better results.
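
For readers who wish to reproduce comparisons of this kind, the two measures can be sketched in Python as below. The sketch assumes that DKL is the Kullback-Leibler divergence between two zero-mean multivariate Gaussian distributions with the given correlation (or covariance) matrices, and that the MSE is the mean squared difference between predicted and realized portfolio risks; the function names are hypothetical and the exact definitions used in the body of the article may differ in detail.

```python
import numpy as np

def kl_distance(sigma_1, sigma_2):
    """KL divergence of N(0, sigma_1) from N(0, sigma_2).

    Formula: 0.5 * [ln(det sigma_2 / det sigma_1) + tr(sigma_2^{-1} sigma_1) - n].
    """
    n = sigma_1.shape[0]
    # slogdet is more stable than log(det(...)) for nearly singular matrices.
    _, logdet_1 = np.linalg.slogdet(sigma_1)
    _, logdet_2 = np.linalg.slogdet(sigma_2)
    # solve(sigma_2, sigma_1) equals sigma_2^{-1} sigma_1 without an explicit inverse.
    trace_term = np.trace(np.linalg.solve(sigma_2, sigma_1))
    return 0.5 * (logdet_2 - logdet_1 + trace_term - n)

def risk_mse(predicted, realized):
    """Mean squared difference between predicted and realized risks."""
    predicted = np.asarray(predicted, dtype=float)
    realized = np.asarray(realized, dtype=float)
    return float(np.mean((predicted - realized) ** 2))
```

Note that this divergence is not symmetric in its two arguments, so the order in which the predicted and the realized matrices are passed matters.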

In doing the calculations, it became clear to us that a crucial step in the calculation of optimal portfolios is the inversion of the covariance matrix, which may introduce substantial numerical errors, particularly in times of crisis. Fortunately, there has been some progress in this field [33,34].
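
One common way to limit such numerical errors, shown here only as a hedged sketch and not as the procedure used in this article, is to avoid forming the inverse explicitly and to solve a linear system instead, while monitoring the condition number of the covariance matrix. The example computes the global minimum-variance weights with short-selling allowed, w = Σ⁻¹1 / (1ᵀΣ⁻¹1); the function names are hypothetical.

```python
import numpy as np

def min_variance_weights(cov):
    """Global minimum-variance weights (short-selling allowed).

    Solves cov @ x = 1 and normalizes x so that the weights sum to one,
    which avoids computing inv(cov) explicitly.
    """
    ones = np.ones(cov.shape[0])
    x = np.linalg.solve(cov, ones)
    return x / x.sum()

def condition_number(cov):
    """2-norm condition number; large values signal an ill-conditioned inversion."""
    return float(np.linalg.cond(cov))

if __name__ == "__main__":
    # Toy covariance matrix of two uncorrelated assets (illustrative values).
    cov = np.diag([1.0, 4.0])
    print(min_variance_weights(cov))  # the lower-variance asset gets the larger weight
    print(condition_number(cov))
```

A very large condition number suggests that small estimation errors in the covariance matrix may be strongly amplified in the resulting weights.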

So, what this work shows is that the use of a regression method with a single index to remove market effects is usually advisable, but the use of Random Matrix Theory to remove noise from the correlation matrices tends to fail in forecasts for years of high volatility, which are precisely the occasions on which a reliable risk forecast is most needed.

Acknowledgments

L. Sandoval Jr. and M.K. Venezuela acknowledge the support of this work by a grant from Insper, Instituto de Ensino e Pesquisa. We are also grateful to Gustavo Curi Amarante, who collected the data, and to Nicolas Eterovic for useful discussions. We also thank the anonymous reviewers for their valuable suggestions and insights, which improved this article immensely. This article was written using LaTeX, all figures were made using PSTricks, and the calculations were made using Matlab, R and Excel. All data are freely available upon request at [email protected]. Supplementary material containing more complete tables and graphs for all the pairs of years can be obtained from the authors.

References

[1] G.M. Frankfurter, H.E. Phillips, J.P. Seagle, Portfolio selection: the effects of uncertain means, variances, and covariances, J. Finan. Quant. Anal. 6 (1971) 1251–1262.

[2] G.M. Frankfurter, H.E. Phillips, J.P. Seagle, Estimation risk in the portfolio selection model: a comment, J. Finan. Quant. Anal. 7 (1972) 1423–1424.

[3] J.P. Dickinson, The reliability of estimation procedures in portfolio analysis, J. Finan. Quant. Anal. 9 (1974) 447–462.

[4] J.D. Jobson, B.M. Korkie, Estimation for Markowitz efficient portfolios, J. Amer. Statist. Assoc. 75 (1980) 544–554.

[5] R.O. Michaud, The Markowitz optimization enigma: is ‘optimized’ optimal? Financ. Anal. J. 45 (1989) 31–42.

[6] V. Chopra, W.T. Ziemba, The effect of errors in mean and co-variance estimates on optimal portfolio choice, J. Portfolio Manage. (1993) 6–11.

[7] P. Jorion, Bayes-Stein estimation for portfolio analysis, J. Finan. Quant. Anal. 21 (1986) 279–292.

[8] V. DeMiguel, L. Garlappi, F.J. Nogales, R. Uppal, A generalized approach to portfolio optimization: improving performance by constraining portfolio norms, Manag. Sci. 55 (2009) 782–812.

[9] M.L. Mehta, Random Matrices, Academic Press, 2004.

[10] L. Laloux, P. Cizeau, J.-P. Bouchaud, M. Potters, Noise dressing of financial correlation matrices, Phys. Rev. Lett. 83 (1999) 1467–1470.

[11] L. Laloux, P. Cizeau, J.-P. Bouchaud, M. Potters, Random matrix theory and financial correlations, Int. J. Theor. Appl. Finance 3 (2000) 391–397.

[12] B. Rosenow, V. Plerou, P. Gopikrishnan, H.E. Stanley, Portfolio optimization and the random magnet problem, Europhys. Lett. 59 (2002) 500.

[13] V. Plerou, P. Gopikrishnan, B. Rosenow, L.A.N. Amaral, T. Guhr, H.E. Stanley, A random matrix theory approach to cross-correlations in financial data, Phys. Rev. E 65 (2002) 066126.

[14] S. Sharifi, M. Crane, A. Shamaie, H. Ruskin, Random matrix theory for portfolio optimization: a stability approach, Physica A 335 (2004) 629–643.

[15] T. Conlon, H.J. Ruskin, M. Crane, Random matrix theory and fund of funds portfolio optimization, Physica A 382 (2007) 565–578.

[16] E.P. Wigner, Characteristic vectors of bordered matrices with infinite dimensions, Ann. of Math. 62 (1955) 548–564.

[17] E.P. Wigner, On the distribution of the roots of certain symmetric matrices, Ann. of Math. 67 (1958) 325–327.

[18] V.A. Marčenko, L.A. Pastur, Distribution of eigenvalues for some sets of random matrices, Math. USSR-Sb. 1 (1967) 457–486.

[19] S. Pafka, I. Kondor, Noisy covariance matrices and portfolio optimization, Eur. Phys. J. B 27 (2002) 277–280.

[20] S. Pafka, I. Kondor, Noisy covariance matrices and portfolio optimization II, Physica A 319 (2003) 487–494.

[21] J.-P. Onnela, A. Chakraborti, K. Kaski, Dynamics of market correlations: taxonomy and portfolio analysis, Phys. Rev. E 68 (2003) 056110.

[22] V. Tola, F. Lillo, M. Gallegati, R.N. Mantegna, Cluster analysis for portfolio optimization, J. Econom. Dynam. Control 32 (2008) 235–258.

[23] E. Pantaleo, M. Tumminello, F. Lillo, R.N. Mantegna, When do improved covariance matrix estimators enhance portfolio optimization? An empirical comparative study of nine estimators, Quant. Finance 11 (2011) 1067–1080.

[24] J.-P. Bouchaud, M. Potters, Financial applications of random matrix theory: a short review, in: G. Akemann, J. Baik, P. Di Francesco (Eds.), The Oxford Handbook of Random Matrix Theory, Oxford University Press, 2011.

[25] S. Ross, The arbitrage theory of capital asset pricing, J. Econom. Theory 50 (1976) 30–49.

[26] J.Y. Campbell, A.W. Lo, A.C. MacKinlay, The Econometrics of Financial Markets, second ed., Princeton University Press, 1997.

[27] B. Rosenow, Determining the optimal dimensionality of multivariate volatility models with tools from random matrix theory, J. Econom. Dynam. Control 32 (2008) 279–302.

[28] E. Wilcox, T. Gebbie, On the analysis of cross-correlations in South African market data, Physica A 344 (2004) 294–298.

[29] E. Wilcox, T. Gebbie, An analysis of cross-correlations in an emerging market, Physica A 375 (2007) 584–598.

[30] R.K. Pan, S. Sinha, Collective behavior of stock price movements in an emerging market, Phys. Rev. E 76 (2007) 046116.

[31] K.G.D.R. Nilantha, M. Ranasinghe, P.K.C. Malmini, Eigenvalue density of cross-correlations in Sri Lankan financial market, Physica A 378 (2007) 345–356.

[32] L.M.H. Medina, R.C. Mansilla, Teoria de matrices aleatorias y correlacion de series financieras. El caso de la Bolsa mexicana de Valores, Revista de Administracion, Finan. Econ. 2 (2008) 125–135.

[33] N. El Karoui, High-dimensionality effects in the Markowitz problem and other quadratic programs with linear equality constraints: risk underestimation, Ann. Statist. 38 (2010) 3487–3566.

[34] S. Matsumoto, General moments of the inverse real Wishart distribution and orthogonal Weingarten functions, J. Theoret. Probab. 25 (2012) 798–822.

[35] Z. Burda, A. Jarosz, M.A. Nowak, J. Jurkiewicz, G. Papp, I. Zahed, Applying free random variables to random matrix analysis of financial data. Part I: the Gaussian case, Quant. Finance 11 (2011) 1103–1124.

[36] B. Collins, D. McDonald, N. Saad, Compound Wishart matrices and noisy covariance matrices: risk underestimation, 2013. arXiv:1306.5510v1.

[37] N.A. Eterovic, D.B. Eterovic, Separating the wheat from the chaff: understanding portfolio returns in an emerging market, Emerg. Mark. Rev. 16 (2013) 145–169.

[38] H.M. Markowitz, Portfolio selection, J. Finance 7 (1) (1952) 77–91.

[39] E.J. Elton, M.J. Gruber, S.J. Brown, W. Goetzmann, Modern Portfolio Theory and Investment Analysis, eighth ed., Wiley, 2009.

[40] G. Biroli, J.-P. Bouchaud, M. Potters, The student ensemble of correlation matrices: eigenvalue spectrum and Kullback–Leibler entropy, Acta Phys. Pol. B 38 (2007) 4009–4026.

[41] M. Snarska, Applying free random variables to the analysis of temporal correlations in real complex systems, Thesis submitted in partial fulfilment of the requirements of the degree of Doctor of Philosophy in Physics, Jagiellonian University, Poland, 2010.

[42] R.S. Tsay, Model checking via parametric bootstraps in time series analysis, Appl. Stat. 41 (1992) 1–15.

[43] S. Kullback, R.A. Leibler, On information and sufficiency, Ann. Math. Stat. 22 (1951) 79–86.

[44] M. Tumminello, F. Lillo, R.N. Mantegna, Kullback–Leibler distance as a measure of the information filtered from multivariate data, Phys. Rev. E 76 (2007) 031123.

[45] M. Tumminello, F. Lillo, R.N. Mantegna, Shrinkage and spectral filtering of correlation matrices: a comparison via the Kullback–Leibler distance, Acta Phys. Pol. B 38 (2007) 4079–4088.

[46] M. Tumminello, F. Lillo, R.N. Mantegna, Correlation, hierarchies, and networks in financial markets, J. Econ. Behav. Organ. 75 (2010) 40–58.