liam_mescall - pca project

8/4/2019 Liam_Mescall - PCA Project


Project Title: Principal Components (PC) Analysis: A Portfolio Risk Analysis Application

Course: MSc in Computational Finance

Module: Empirical Finance (FI6061)

Student: Liam Mescall (I.D. no. 0144126)



TABLE OF CONTENTS

Abstract
I. Introduction
II. Data Summary and Description
III. Methodology
III.I. Fundamentals of PCA
III.II. Application of Fundamentals to Equity Model
IV. MatLab Used in Analysing Data
V. Empirical Analysis
VI. Results Observed from Analysis
VII. Conclusion
Appendices
Bibliography



Principal Components (PC) Analysis A Portfolio Risk Analysis Application

November 25, 2010

Abstract

The ability to accurately quantify the relationship between risk and return has far-reaching consequences. Adopting PCA techniques allows risk to be divided into market (systematic) risk and the specific risk inherent in the nature of the company itself. Separating these two components has allowed for the introduction of investment techniques such as the market-risk-neutral strategies adopted by large hedge funds, which neutralize market risk and identify stocks that are more or less susceptible to market movements, allowing for, in theory, a portfolio that makes relatively riskless profits. This report undertakes a PCA of the ten largest stocks by market capitalisation in the Dow Jones Industrial Average (Appendix 1). This involves generating one portfolio with an equally weighted investment in the ten chosen stocks and another weighted by market capitalisation. These are evaluated and their risk-return relationship assessed for potential investment. Interpretation of concepts such as covariance, correlation, eigenvectors, eigenvalues and the selection of eigenvectors is fundamental to achieving this. These concepts are addressed, the data is processed in MatLab, and the theoretical concepts are then used to develop recommendations as to potential investments in the portfolio. Having completed the analysis, we conclude that a small number of principal component factors is sufficient to explain a satisfactory amount of the variance. We also identified a relatively high degree of collinearity.

Keywords: PCA, risk components, equal weight, market weight, covariance, correlation, eigenvectors, eigenvalues, MatLab.

Introduction

Models for time series data take many forms and represent a variety of stochastic processes, both stationary and integrated. Classifying a process as stationary offers insight into the extent to which the data is subject to trend and changing variance, and the extent to which the joint distributions between variables are similar over time; data is often converted to percentage returns to achieve stationarity. When modelling these variations, three classes of practical importance emerge: autoregressive (AR) models, integrated (I) models and moving average (MA) models, all of which depend on historical data points. These have been combined to produce autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA) models. Limitations in these models' ability to extract more subtle information have led to a rise in the use of Principal Component Analysis (PCA), which offers a tool to extract this information with the minimum amount of noise. It is a powerful statistical tool with numerous practical uses. A key assumption is that the vast array of factors that affect the data can be reduced to a few uncorrelated composite variables, called principal components, which provide a parsimonious description of the data's dynamics. PCA also assumes linearity, that large variances carry the important structure, and that the principal components are orthogonal.

This reduction in dimensionality is particularly useful in finance, as stock prices are affected by a multitude of economic variables that are difficult to translate into one detailed price model. From a practical perspective, the use of PCA in bond markets has revealed that only three principal components (related to the level, the steepness and the curvature of the yield curve) are sufficient to explain almost all the variation in interest rates [1]. Identifying these three factors focuses risk management on their effects on the portfolio value, regardless of the number and characteristics of the bonds included in the portfolio, which saves considerable time and resources. Being able to establish factors such as moneyness and time as principal components with a high degree of confidence allows a volatility surface to be modelled and investment decisions to be based on it. Modelling the forward curve is essential to all forms of risk management, while traders and portfolio managers deal with instruments whose evolution through time must be modelled, e.g. OTC derivatives. Any attempt to price or manage the risk of securities such as OTC derivatives, swaptions or gas storage contracts (or other securities dependent on a specific forward price) requires a model that describes this evolution. The pricing of any forward is further complicated by the addition of forward prices to the curve, which increases the number of parameters and the complexity. The dimension reduction offered by PCA addresses this [3]. Value at risk (VaR) portfolio analysis has been refined through the use of PCA to employ more accurate probabilities of default over specific time horizons. It has led to further precision in hedging bond and equity portfolios, taking duration, twist and bend movements into account. The theme running through all these practical applications is increased accuracy, which has cemented the importance of PCA in modern finance.

Data Summary and Description

The data selected for this study has been compiled from daily returns spanning 26/9/08 to 6/10/10. Historically, series of asset prices are largely uncorrelated over time, as can be seen from figure one, where the rebased values gradually separate and the range of deviation from par grows.

Figure 1

We have also noted that during a three sigma event, correlations in all markets tend towards one [4]. Figure 2 shows values from the days after the collapse of Lehman Brothers and the clustered pattern of return movements over the following months. Studies specific to equity markets have also shown this to be true [7].

Figure 2

This graphical representation of returns over selected periods offers insight into why the entries of the variance-covariance matrix (V) and the correlation matrix take the values they do. The relationships across the complete sample of data over the period in question are described below in figure 3 and figure 4. This data is fundamental to the development of our principal component analysis.

A high degree of collinearity is evident, which is typical of a system where there are only a few important sources of information in the data that are common to many variables [ ]. This will become obvious when the principal components are calculated and the cumulative percentage of variation explained by the first few components is reviewed.

Variance Covariance Matrix (V)

Figure 3

Correlation Matrix

Figure 4

Depending on how the results will be used, PCA can be performed on either the covariance matrix or the correlation matrix. A PCA based on the covariance matrix has the advantage of providing a linear factor model for the returns themselves, rather than for the standardized returns, as is the case when the correlation matrix is used. A PCA on the covariance matrix captures all the movement in the variables, which may be dominated by the differing volatilities of individual variables, while a PCA on the correlation matrix captures only the similarities in the movements of returns, ignoring their individual volatilities. To suit the objectives of this project I have chosen to undertake the PCA on the covariance matrix.

Methodology

Fundamentals of PCA

Principal components can only be established once a series of mathematical concepts has been applied to a data set. These include standard deviation, variance, covariance, correlation and matrix algebra (including eigenvectors and eigenvalues). Having obtained a data set for analysis, the mean is subtracted from each of the data dimensions. A data set X will generate $\bar{X}$, the mean of the X data points. Once this is subtracted, the resulting data set has a mean of zero (as described for the princomp function below). This allows for the development of a covariance matrix. A sample two-stock portfolio covariance matrix can be represented by:


$$V = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix} \qquad \text{Eqtn (1)}$$

where $\sigma_{12} = \mathrm{Cov}(X_1, X_2) = \rho \, \sigma_1 \, \sigma_2$.

Matrices can be multiplied together once their dimensions are compatible. Eigenvectors arise from multiplying a matrix and a vector, and reflect the nature of the transformation that the multiplication represents. Consider a transformation matrix that, when multiplied on the left of a vector, reflects the vector in the line p=q. A vector lying on the line p=q is its own reflection, and is therefore an eigenvector of that transformation matrix. Eigenvectors can only be found for square matrices. Not every square matrix has real eigenvectors, but a symmetric n x n matrix, such as a covariance matrix, has n of them. Scaling an eigenvector before multiplying only changes its length, not its direction, and the result is the same multiple of the scaled vector. Regardless of the dimensions of a symmetric matrix, its eigenvectors can be chosen to be orthogonal (i.e. perpendicular), allowing the data to be expressed in terms of the eigenvectors rather than the usual x and y axes. Because the length of an eigenvector is irrelevant while its direction is not, it is commonplace to scale each eigenvector to a length of one. Extending the matrix beyond 3 x 3 increases the difficulty of finding the eigenvectors and eigenvalues and requires the use of iterative techniques.
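The reflection example can be checked numerically. The following is a minimal sketch in Python (the project itself used MatLab); the helper `matvec` and the three test vectors are illustrative assumptions, not part of the original analysis.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

# Reflection in the line p = q swaps the two coordinates.
REFLECT = [[0, 1],
           [1, 0]]

on_line = [1, 1]     # lies on the line p = q
off_line = [1, -1]   # perpendicular to the line p = q
generic = [2, 5]     # neither on nor perpendicular to the line

print(matvec(REFLECT, on_line))   # [1, 1]: unchanged, an eigenvector with eigenvalue 1
print(matvec(REFLECT, off_line))  # [-1, 1]: sign flipped, an eigenvector with eigenvalue -1
print(matvec(REFLECT, generic))   # [5, 2]: direction changes, so not an eigenvector
```

The two eigenvectors found here are perpendicular to each other, which is what is expected for a symmetric matrix.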

Eigenvalues are a set of scalars associated with a linear system of equations and are also known as characteristic roots or characteristic values. For a p x p matrix A they are calculated by solving the characteristic equation:

$$\det(A - \lambda I) = 0 \qquad \text{Eqtn (2)}$$

where $I$ is the p x p identity matrix. Expanding the determinant gives a polynomial of degree p in $\lambda$, whose roots are the eigenvalues.

Square matrices can be decomposed into eigenvalues and eigenvectors, which means the matrix is re-expressed as a product of simpler matrices. This factorization is required to arrive at the correct correlation relationship between the variables into the future. The decomposition is based on the fact that any symmetric matrix A (p x p) can be written as $A = W \Lambda W^{\top}$, where $\Lambda$ is a diagonal matrix of the eigenvalues of A and $W$ is an orthogonal matrix whose columns are the corresponding standardized eigenvectors [5].
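To make the decomposition concrete, the sketch below factorizes a symmetric 2 x 2 matrix in closed form and rebuilds it from its eigenvalues and unit eigenvectors. It is written in Python rather than MatLab, the function names `eig_sym_2x2` and `reconstruct` are hypothetical, the example matrix is made up, and the closed form assumes a non-zero off-diagonal element.

```python
import math

def eig_sym_2x2(a, b, d):
    """Eigenvalues (largest first) and unit eigenvectors of [[a, b], [b, d]].

    Assumes b != 0 so that (b, lam - a) is a valid eigenvector.
    """
    mid = (a + d) / 2.0
    rad = math.hypot((a - d) / 2.0, b)
    values = [mid + rad, mid - rad]
    vectors = []
    for lam in values:
        vx, vy = b, lam - a          # solves (A - lam * I) v = 0
        norm = math.hypot(vx, vy)
        vectors.append((vx / norm, vy / norm))
    return values, vectors

def reconstruct(values, vectors):
    """Rebuild the matrix as the sum of lam_k * w_k * w_k' over the eigenpairs."""
    return [[sum(lam * w[i] * w[j] for lam, w in zip(values, vectors))
             for j in range(2)] for i in range(2)]

values, vectors = eig_sym_2x2(2.0, 1.0, 2.0)
print(values)                        # eigenvalues, largest first
print(vectors)                       # corresponding unit eigenvectors
print(reconstruct(values, vectors))  # approximately [[2, 1], [1, 2]]
```

For this example the eigenvectors lie along the lines p=q and p=-q, matching the reflection discussion earlier in the section.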

The eigenvector lying along the line p=q shows how the two data sets are related along that line, and the eigenvectors are ordered by eigenvalue, that is, by the strength of the pattern each captures in the data. In short, computing the eigenvectors of the covariance matrix has allowed us to extract lines that characterise the data set. The remaining steps in the process involve transforming the data and expressing it in terms of those lines. For the purposes of this report, we have taken the matrix X as input data and decomposed it using the princomp function in MatLab. Results are described below.

Compressing the data and reducing its dimensionality is undertaken by ordering the eigenvalues from highest to lowest. A feature matrix is then constructed whose columns are the eigenvectors corresponding to the selected eigenvalues. The final data set is obtained by taking the transpose of this matrix of eigenvectors and multiplying it on the left of the transposed, mean-adjusted original data set. The transpose ensures that the most significant eigenvector is in the top row. The original data is now represented in terms of the chosen eigenvectors rather than the x and y axes. This transforms the data so that it is expressed in terms of the patterns within it, where the patterns are the lines that most accurately describe the relationships in the data.
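The full sequence of steps (subtract the mean, form the covariance matrix, find the eigenvectors, re-express the data) can be sketched for two series as follows. This is an illustrative Python sketch under stated assumptions, not the MatLab code used for the project: the data values are made up, `pca_2d` is a hypothetical helper, and the closed-form eigen-decomposition only applies to the 2 x 2 case with non-zero covariance.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(x, y):
    """Sample covariance (divides by T - 1)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def pca_2d(x, y):
    """Return eigenvalues (largest first), unit eigenvectors and scores."""
    # Step 1: subtract the mean from each dimension.
    mx, my = mean(x), mean(y)
    xc = [a - mx for a in x]
    yc = [b - my for b in y]
    # Step 2: elements of the covariance matrix [[a, b], [b, d]].
    a, b, d = covariance(x, x), covariance(x, y), covariance(y, y)
    # Step 3: closed-form eigen-decomposition (assumes b != 0).
    mid, rad = (a + d) / 2.0, math.hypot((a - d) / 2.0, b)
    eigenvalues = [mid + rad, mid - rad]
    eigenvectors = []
    for lam in eigenvalues:
        vx, vy = b, lam - a
        norm = math.hypot(vx, vy)
        eigenvectors.append((vx / norm, vy / norm))
    # Step 4: each row of scores is one eigenvector applied to the
    # mean-adjusted data (eigenvectors as rows times transposed data).
    scores = [[vx * p + vy * q for p, q in zip(xc, yc)]
              for vx, vy in eigenvectors]
    return eigenvalues, eigenvectors, scores

# Made-up illustrative observations for two variables.
x = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]
y = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]

eigenvalues, eigenvectors, scores = pca_2d(x, y)
print(eigenvalues)                        # variance along each new axis
print(covariance(scores[0], scores[1]))   # close to zero: scores are uncorrelated
```

Keeping only `scores[0]` would be the dimension-reduction step: the data is then described by its position along the most significant eigenvector alone.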

Application of Fundamentals to Equity Model

This theory has consequences for the equity markets, as described in this section. For this undertaking I have created a T x n matrix (which we will call X), where T is the number of data points and n is the number of stocks (ten in our case). The covariance matrix of X is denoted by V = V(X), and PCA is performed to investigate how many components satisfactorily explain the variance in a market weighted and an equally weighted portfolio. To begin, an ordinary least squares (OLS) linear regression has been run for each stock's returns on the principal component factors, giving an estimate of alpha for each stock and of its betas with respect to each principal component factor [2]. This model is denoted by:

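$$R_i = \alpha_i + \sum_{j=1}^{k} \beta_{ij} \, P_j + \varepsilon_i \qquad \text{Eqtn (3)}$$

where $R_i$ is the return on stock i, $P_j$ is the jth principal component factor and k represents the number of principal component factors. The regression has also been used to calculate t-values, $R^2$ values and factor sensitivities. The estimated model provides the return on each stock that is explained by the factor model as:

$$\hat{R}_i = \hat{\alpha}_i + \sum_{j=1}^{k} \hat{\beta}_{ij} \, P_j \qquad \text{Eqtn (4)}$$

We know that the principal components are based on a covariance or correlation matrix and have a mean of zero, with $E(P_j) = 0$, which leaves the expected return on the factor model as [2]:

$$E(\hat{R}_i) = \hat{\alpha}_i \qquad \text{Eqtn (5)}$$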

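Taking variances and covariances of Eqtn (4), we can derive the covariance matrix of stock returns that is explained by the model, with the following elements, where $\hat{\beta}_{ij}$ is the estimated beta of stock i on principal component $P_j$:

$$\text{est. } V(R_i) = \sum_{j=1}^{k} \hat{\beta}_{ij}^{2} \, V(P_j) \qquad \text{Eqtn (6)}$$

$$\text{est. } \mathrm{Cov}(R_i, R_m) = \sum_{j=1}^{k} \hat{\beta}_{ij} \, \hat{\beta}_{mj} \, V(P_j) \qquad \text{Eqtn (7)}$$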

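This can be more concisely represented in matrix notation as follows:

$$\text{est. } V = B^{\top} \Omega B \qquad \text{Eqtn (8)}$$

where $B = (\hat{\beta}_{ij})$ is the k x n matrix of OLS-estimated factor betas and takes the form:

$$B = \begin{pmatrix} \hat{\beta}_{11} & \cdots & \hat{\beta}_{n1} \\ \vdots & \ddots & \vdots \\ \hat{\beta}_{1k} & \cdots & \hat{\beta}_{nk} \end{pmatrix} \qquad \text{Eqtn (9)}$$

and $\Omega$ is the k x k covariance matrix of the principal components represented by:

$$\Omega = \mathrm{diag}\big(V(P_1), \ldots, V(P_k)\big) \qquad \text{Eqtn (10)}$$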

As the principal components are orthogonal, there is zero covariance between any two principal components, making their covariance matrix diagonal in structure [2]. Knowing these values for each stock in our portfolio, a variety of different weightings can be applied so as to assess the potential risk-return trade-off and construct the desired portfolio. The risk measure is further broken down into systematic risk, as denoted by:

$$\text{systematic risk} = w^{\top} B^{\top} \Omega B \, w \qquad \text{Eqtn (11)}$$

where $w$ is the vector of portfolio weights and $\Omega$ is the covariance matrix of the principal components, and specific risk, which is arrived at by subtracting the systematic risk from the total risk per:

$$\text{specific risk} = w^{\top} V w - w^{\top} B^{\top} \Omega B \, w \qquad \text{Eqtn (12)}$$

This model has been adapted and undertaken for the purposes of this report. Refer to the results section for further detail.
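The split of total portfolio variance into systematic and specific parts can be illustrated with a small numerical sketch. It assumes the form following [2] in which systematic variance is w'B'ΩBw and specific variance is the remainder of w'Vw. The code is Python rather than MatLab, the function names are hypothetical, and the two-stock, one-factor numbers are invented purely for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

def quad_form(vector, matrix):
    """Return vector' * matrix * vector."""
    return sum(v * mv for v, mv in zip(vector, matvec(matrix, vector)))

def risk_decomposition(weights, cov, betas, factor_cov):
    """Split total portfolio variance into systematic and specific parts.

    betas is the k x n matrix B of factor betas and factor_cov is the
    k x k covariance matrix of the principal components.
    """
    portfolio_betas = matvec(betas, weights)              # B w
    systematic = quad_form(portfolio_betas, factor_cov)   # w' B' Omega B w
    total = quad_form(weights, cov)                       # w' V w
    return total, systematic, total - systematic

# Hypothetical inputs: two stocks, one principal component factor.
B = [[1.0, 0.5]]                 # betas of each stock on the factor
OMEGA = [[0.04]]                 # variance of the factor
V = [[0.05, 0.02],
     [0.02, 0.03]]               # total covariance matrix of returns
w = [0.5, 0.5]                   # equally weighted portfolio

total, systematic, specific = risk_decomposition(w, V, B, OMEGA)
print(total, systematic, specific)
```

Changing `w` to market-capitalisation weights would give the corresponding figures for a market weighted portfolio using the same B, Ω and V.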

MatLab Used in Analysing Data

When running the code developed for this project, the following steps were addressed:

Data made stationary by converting price movements to returns for the individual stocks.

Princomp function (COEFF = princomp(X)) used to perform PCA on the covariance matrix, which is essential to produce a linear factor model. Means are also subtracted during this process.

Using the regstats function, regress the individual stocks against the newly derived uncorrelated risk factors, noting the $R^2$ value at each stage.

Recalculate the value of the T x n matrix times the n x n orthogonal matrix of eigenvectors.

Add back the mean subtracted when performing the princomp function.

Comparison of the variance-covariance matrix for the input stock returns with the PCA-derived version.
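The first step and the mean handling in the steps above can be sketched as follows. This is a Python illustration rather than the project's MatLab code: simple (not log) returns are assumed, the helper names are hypothetical, and the prices are invented.

```python
def simple_returns(prices):
    """Convert a price series to period-on-period simple returns."""
    return [later / earlier - 1.0 for earlier, later in zip(prices, prices[1:])]

def demean(values):
    """Return the mean and the mean-adjusted series."""
    average = sum(values) / len(values)
    return average, [v - average for v in values]

prices = [100.0, 102.0, 99.96, 104.958]     # hypothetical closing prices
returns = simple_returns(prices)             # roughly 2%, -2%, 5%
average, centred = demean(returns)           # what the PCA step works on
restored = [c + average for c in centred]    # add the mean back afterwards

print(returns)
print(centred)
print(restored)
```

Applying `simple_returns` to each stock's price column would give the T x n returns matrix X described in the methodology.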

Empirical Analysis

We must again refer to the data described in Figures 1, 2, 3 and 4 in the Data section, which forms a large part of our empirical analysis. This detail illustrates the extent of the correlation in the ten-stock multi-variable system over the sample period examined, which is fundamental to the establishment of principal components. The analysis performed on the covariance matrix (Figure 3) is detailed below in Figure 5.



Figure 5

Here the data have been projected, by an orthogonal transformation, onto the subspace spanned by the eigenvectors corresponding to the largest eigenvalues; this is where the decorrelation of the data takes place.

For the DJIA data set in question, the three largest principal components were taken, which explained 83.57% of the variance. More or fewer components could have been selected to gain more or less comfort, but the recommended range is between 70% and 90%: less than 70% offers little insight, while more than 90% picks up too much noise. The other eigenvalues were discarded as they were small, and doing so reduced the final dimensions of the data. When interpreting the first principal component of asset changes, the more highly correlated the system, the more similar the values of the elements of the first eigenvector. This tells us that should the first principal component change while the other components remain fixed, the returns will all move by a similar amount. As is evident from figure 5, the majority of stocks have a PC1 value around the 0.2 mark, which appears to capture a common trend in the data. No such pattern appears to be present in the PC2 and PC3 component values. I feel that this percentage still offers a relatively high level of confidence that the three risk factors are representative of the stocks' variance. This leaves the three-component representation as:

$$\begin{aligned}
R_1 &= 0.0251593\,P_1 - 0.02747\,P_2 + 0.052639\,P_3 \\
R_2 &= 0.294086\,P_1 - 0.03233\,P_2 + 0.00757\,P_3 \\
R_3 &= 0.414202\,P_1 + 0.249442\,P_2 + 0.22005\,P_3 \\
R_4 &= 0.4697\,P_1 - 0.78769\,P_2 + 0.282537\,P_3 \\
R_5 &= 0.203308\,P_1 + 0.204774\,P_2 - 0.01967\,P_3 \\
R_6 &= 0.385474\,P_1 - 0.06655\,P_2 - 0.30512\,P_3 \\
R_7 &= 0.365547\,P_1 + 0.339575\,P_2 + 0.146143\,P_3 \\
R_8 &= 0.200306\,P_1 + 0.215904\,P_2 + 0.114223\,P_3 \\
R_9 &= 0.220781\,P_1 + 0.263256\,P_2 + 0.078486\,P_3 \\
R_{10} &= 0.213582\,P_1 + 0.194774\,P_2 + 0.093398\,P_3
\end{aligned}$$

where $R_i$ represents the T x 1 vector of returns on the ith stock (in the order listed in Appendix 1) and $P_1$, $P_2$ and $P_3$ are the first three principal components.

During the period in question, we have noted periods of huge market turmoil during which all correlations have tended towards one, as is evident from figure 2. This is in contrast to the observations from figure 1 over a relatively stable time in stock markets. This affects the eigenvectors by altering their values to reflect the patterns becoming more pronounced as the correlations become stronger; both positive and negative correlations have this impact. Over time it has been seen that bad news has a far stronger effect on equity markets than equivalent positive news. Factors such as the leverage effect and investor fear lead to mass sell-offs and panic, and it is at times like these that we note the strongest correlations in equity markets. From the data in figure 5 we can conclude that there was above-average volatility for the period in the stocks of Chevron (CVX), Boeing (BA) and Exxon Mobil (XOM), which had PC1 factors of 0.414202, 0.385474 and 0.365547 respectively. This compares with the mean PC1 value for the stocks of 0.283560.

Results Observed from Analysis

The full listing of all factor sensitivities and the percentage of variance explained by each of them is detailed below in Figure 6. As discussed previously, I consider it appropriate to use three principal component factors to describe the variance present, as together they account for 83.57% of the variance.

Figure 6
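Choosing the number of components against a target share of variance can be sketched as below. This is a Python illustration, not the project's MatLab code; `components_needed` is a hypothetical helper and the eigenvalues are invented for the example rather than taken from Figure 6.

```python
def components_needed(eigenvalues, target):
    """Smallest number of components whose cumulative share of the
    total variance reaches the target (a fraction between 0 and 1)."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return count
    return len(ordered)

# Hypothetical eigenvalues of a covariance matrix.
eigenvalues = [6.0, 1.5, 1.0, 0.9, 0.6]

print(components_needed(eigenvalues, 0.70))   # components for at least 70%
print(components_needed(eigenvalues, 0.80))   # components for at least 80%
```

With these made-up values the first two components reach 75% of the variance and the first three reach 85%, so the answer depends on where in the 70% to 90% range the target is set.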

Each successive principal component explains less variance, from component one to component ten, with little pattern observable in the figures other than for principal component one. We can draw from this that there are few common trends in the remaining components applicable to each of the stocks. Having selected the first three principal components for further analysis, the individual stocks in question are now regressed against them, as described above under the MatLab work performed. To satisfy Eqtn (8), described by:

$$\text{est. } V = B^{\top} \Omega B \qquad \text{Eqtn (8)}$$

where B is the matrix of factor betas and $\Omega$ is the covariance matrix of the principal components, I have calculated the estimated factor betas in figure 7:



Figure 7

The transpose of these values is also required as per figure 8:

Figure 8

The $\Omega$ element of the equation, the covariance matrix of the principal components, is also required. As the principal components are orthogonal, this is a diagonal matrix, as per figure 9:

Figure 9

With these values we can evaluate the equation, and by doing so we create the systematic covariance matrix of stock returns, as shown in figure 10.



Figure 10

Having compared the initial covariance matrix per figure 3 with the PCA-derived matrix, we have noted the following differences in values:

Figure 11

There is a wide variety of departures here from the initial covariance per figure 3. Covariance is a measure of how much variables change together. When the PCA factors are treated as the sole determinants of variance and all other factors are discarded, the changes in the covariance relationships between the stocks described in figure 11 can be attributed to the reduction in dimensionality and the focus of risk on the principal components identified. There are noticeable changes between KO and IBM, along with MMM and IBM, which may indicate that part of their covariance was attributable to the discarded principal components.

Stemming from our regression we noted the following results per figure 12:

Figure 12

Taking the $R^2$ value to measure how well a regression line approximates the real data points, where the closer it is to 1 the greater the ability of the model to explain the data, we note a range of values spanning from 0.612 for MCD, offering little trend forecasting ability, to 0.998 for BA, providing a clear window into potential movements.

Generally, any t-value greater than +2 or less than -2 is acceptable. The higher the absolute t-value, the greater the confidence we have in the coefficient as a predictor. Low t-values indicate low reliability of the predictive power of that coefficient. This is consistent with the values noted and the example previously mentioned, with MCD and BA having scores of 26.27 and 419.92 respectively. As expected, PC1 has the most predictive power, considering it explains the majority of the variation. PC2 and PC3 offer varying levels of predictive power, and not in the same pattern as the explanatory power noted in the eigenvalues per figure 6.

We have performed our analysis on two portfolios, one equally weighted and the second weighted by market capitalization per figure 14. The market capitalization was derived as at the last day of our sample (6-Oct-2010), based on shares in issue multiplied by the share price on that day.

Figure 13 Figure 14
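The two weighting schemes can be sketched as follows. This is a Python illustration rather than the project's MatLab code; the tickers, share counts and prices are placeholders and not the figures used in the report.

```python
def market_cap_weights(shares, prices):
    """Weight each stock by shares in issue times share price."""
    caps = {ticker: shares[ticker] * prices[ticker] for ticker in shares}
    total = sum(caps.values())
    return {ticker: cap / total for ticker, cap in caps.items()}

def equal_weights(tickers):
    """Give every stock the same weight."""
    return {ticker: 1.0 / len(tickers) for ticker in tickers}

# Placeholder inputs for three hypothetical stocks.
shares = {"AAA": 100, "BBB": 200, "CCC": 50}     # shares in issue
prices = {"AAA": 10.0, "BBB": 5.0, "CCC": 40.0}  # share price on the valuation date

print(market_cap_weights(shares, prices))   # {'AAA': 0.25, 'BBB': 0.25, 'CCC': 0.5}
print(equal_weights(list(shares)))
```

Either set of weights can then be used as the vector w when computing portfolio betas and risk measures.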

Figure 13 provides the breakdown of both portfolios in terms of principal components. The total risk in the sum of the components for the market weighted portfolio (0.5365) is far larger than that of the equally weighted portfolio (0.3643), making the equally weighted portfolio the more attractive potential investment; this is largely the best practice over the short term [ ]. The more prominent stocks such as XOM carry larger beta coefficients, which are explained to a greater extent by PC2 and PC3. This is consistent with figure 7, which details the beta values for each principal component.

Other risk metrics calculated are displayed in figure 15:

Figure 15



From these, our indication based on beta values that the equally weighted portfolio may be the better investment opportunity is supported, as variance and volatility are again higher in the market weighted portfolio; these are a suitable gauge of risk for the portfolios in question. There is minimal difference between the portfolio variances and the systematic variances, with volatility identical for both. As these results have not been run for the index as a whole, we have no benchmark against which to compare the figures. The figures can be represented in dollar terms by multiplying the betas and the volatility by the total value of the portfolio. Upon inspection of the risk results, an intuitive understanding of these concepts tells me that the figures are quite low, in particular the specific variance and specific volatility figures. That said, the figures are not implausible.

From the work undertaken we can conclude that a small number of principal components is needed to explain the variance, with the vast majority (70%) of this in PC1, which is sufficient to offer insight on its own. The three components were chosen to offer extra insight and for the purposes of this project. Once PCA has been applied to a covariance matrix, focusing risk on the principal components identified changes the matrix values, depending on the extent to which each principal component was a factor in the variance in the first place. We also noted that, comparing the equally weighted portfolio with the market weighted portfolio, there is less risk present in the equally weighted portfolio, which would be the preferred investment choice.

Conclusion

Illustrated in this report are the practical groundings of the PCA statistical tool in the areas of risk management and portfolio optimization. The factor model's representation of each series of returns as a linear function of the principal components offers realistic, understandable insight into patterns in the data while limiting the dimensions of that data. The reduction in risk management workload from dimensionality reduction, and the computational efficiencies in the daily measurement of portfolio risk gained by selecting a small number of variables capable of explaining a large portion of risk, are of huge benefit to modern finance.



Appendices

Appendix 1

List of companies making up the top ten stocks in the DJIA:

1) IBM (IBM)
2) 3M (MMM)
3) Chevron Corporation (CVX)
4) Caterpillar (CAT)
5) McDonald's (MCD)
6) Boeing (BA)
7) Exxon Mobil (XOM)
8) Johnson & Johnson (JNJ)
9) Procter & Gamble (PG)
10) Coca-Cola (KO)

Bibliography

[1] Soto, Gloria M. Using Principal Component Analysis to Explain Term Structure Movements: Performance and Stability, 2004. Available at SSRN: http://ssrn.com/abstract=985404

[2] Alexander, C. Practical Financial Econometrics: Market Risk Analysis Vol. 2. John Wiley and Sons, 2009.

[3] Blanco, C. Multi-Factor Models for Forward Curve Analysis: An Introduction to Principal Component Analysis. Financial Engineering Associates, 2002.

[4] Alexander, C. Multi-Factor Models for Forward Curve Analysis: An Introduction to Principal Component Analysis. ISMA Centre, University of Reading, 2001.

[5] Alexander, C. Market Models: A Guide to Financial Data Analysis. John Wiley and Sons, 2001.

[6] Alexander, C. Quantitative Methods in Finance. John Wiley and Sons, 2009.

[7] Meric, I. Co-Movements of European Equity Markets Before and After the 1987 Crash. Multinational Finance Society, no date provided on paper.
