liam_mescall - pca project
8/4/2019 Liam_Mescall - PCA Project
Project Title Principal Components (PC) Analysis A Portfolio Risk Analysis Application
Course MSc in Computational Finance
Module Empirical Finance (FI6061)
Student Liam Mescall (I.D. no 0144126)
TABLE OF CONTENTS
Abstract
I. Introduction
II. Data Summary and Description
III. Methodology
III.I. Fundamentals of PCA
III.II. Application of Fundamentals to Equity Model
IV. MatLab Used in Analysing Data
V. Empirical Analysis
VI. Results Observed from Analysis
VII. Conclusion
Appendices
Bibliography
Principal Components (PC) Analysis A Portfolio Risk Analysis Application
November 25, 2010
Abstract
The ability to accurately quantify the relationship between risk and return has far-reaching consequences.
Adopting PCA techniques allows risk to be divided into market (systematic) risk and the specific risk
inherent in the nature of the company itself. Separating these two risk components has allowed for the
introduction of investment techniques, such as the market-risk-neutral strategies adopted by large hedge funds,
that neutralize market risk and identify stocks more or less susceptible to market movements, allowing for, in
theory, a portfolio which will make relatively riskless profits. This report undertakes a PCA of the ten largest
stocks by market capitalisation in the Dow Jones Industrial Average (Appendix 1). This involves
generating one portfolio with an equally weighted investment in the ten chosen stocks and another weighted
by market capitalisation. These are evaluated and their risk-return relationship assessed for potential investment.
Interpretation of concepts such as covariance, correlation, eigenvectors, eigenvalues and the selection of
eigenvectors is fundamental to achieving this. These concepts are addressed, the data is processed through
MatLab, and an understanding of the theoretical concepts is then used to develop recommendations as to
potential investments in the portfolio. Having completed the analysis, we can conclude that a low
number of principal component factors was sufficient to explain a satisfactory amount of the variance. We
also identified a relatively high degree of collinearity.
Keywords: PCA, risk components, equal weight, market weight, covariance, correlation, eigenvectors,
eigenvalues, MatLab.
Introduction
Models for time series data have many forms and represent a variety of stochastic processes, both stationary
and integrated. Classifying a process as stationary offers insight into the extent to which the data is subject
to trend and changing variance, and the extent to which the joint distributions between variables are similar
over time; data is often converted to percentage returns to achieve stationarity. When modelling these
variations, three classes of practical importance emerge. These are the autoregressive (AR) models, the
integrated (I) models and the moving average (MA) models, all of which depend on historical data points.
These have been combined to produce autoregressive moving average (ARMA) and autoregressive integrated
moving average (ARIMA) models. Limitations in these models' ability to extract more subtle information have
led to a rise in the use of Principal Component Analysis (PCA), which offers a tool to extract this information
with the minimum amount of noise. It is a powerful statistical tool with numerous practical uses. A key
assumption is that the vast array of factors that affect the data can be reduced to a few uncorrelated
composite variables, called principal components, which provide a parsimonious description of the data's
dynamics. PCA also assumes linearity, that large variances carry the important information, and that the
principal components are orthogonal.
This reduction in dimensionality is particularly useful in finance, as stock prices are affected by a multitude of
economic variables that are difficult to translate into one detailed price model. From a practical perspective,
the use of PCA in bond markets has revealed that only three principal components (related to the level, the
steepness and the curvature of the yield curve) are sufficient to explain almost all the variation in interest
rates [1]. Identifying these three factors focuses risk management on managing their effects on the portfolio
value, regardless of the number and characteristics of the bonds included in the portfolio, which saves
considerable time and resources. Being able to establish factors such as moneyness and time as principal
components with a high degree of confidence allows a volatility surface to be modelled and investment
decisions to be based on it. Modelling of the forward curve is essential to all forms of risk management, while
traders and portfolio managers deal with instruments, such as OTC derivatives, whose evolution through time
must be modelled. Any attempt to price or manage the risk of securities such as OTC derivatives, swaptions or
gas storage contracts (or other securities dependent on a specific forward price) will require a model that
describes this evolution. The pricing of any forward is further complicated by the addition of forward prices to
the curve, which increases the number of parameters and the complexity. The dimension reduction offered by
PCA addresses this [3]. Value at risk (VaR) portfolio analysis has been refined through the use of PCA to employ
a more accurate probability of default over specific time horizons. It has led to further precision in hedging
bond and equity portfolios, taking duration, twist and bend movements into
account. The theme running through all these practical applications is the introduction of increased accuracy
which has cemented the importance of PCA in modern finance.
Data Summary and Description
Data selected for this study has been compiled from daily returns spanning 26/9/08 to 6/10/10.
Historically, series of asset prices are largely uncorrelated over time, as we can see from figure one,
where the rebased values gradually separate over time and the range of deviation from par grows.
Figure 1
We have also noted that during a three sigma event, correlations in all markets tend towards one [4].
Figure 2 shows values from the days after the collapse of Lehman Brothers and the clustered pattern of return
movements over the following months. Studies specific to equity markets have also shown this to be
true [7].
Figure 2
This graphical representation of returns over selected periods is an important illustration of, and insight into, why
the figures in the variance-covariance (V) matrix and correlation matrix take the values they do. The relationship
across the complete sample of data over the period in question is described below in figure 3 and figure 4.
This data is fundamental to the development of our principal component analysis.
A high degree of collinearity is evident which is typical of a system where there are only a few important
sources of information in the data, which are common to many variables [ ]. This will become obvious when
the principal components are calculated and the cumulative percentage of variation evident in the first few
components is reviewed.
Variance Covariance Matrix (V)
Figure 3
Correlation Matrix
Figure 4
Depending on how the results will be used, PCA can be performed on either the covariance matrix or the
correlation matrix. A principal component analysis based on the covariance matrix has the advantage of
providing a linear factor model for the returns, and not a linear factor model for the standardized returns, as is
the case when we use the correlation matrix. A PCA on the covariance matrix captures all the movement in the
variables, which may be dominated by the differing volatilities of individual variables, while a PCA on the
correlation matrix only captures the similarities in movements in returns, ignoring their individual volatilities.
To suit the objectives of this project I have chosen to undertake the PCA on the covariance matrix.
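The distinction between the two matrices can be illustrated with a minimal Python sketch. It is not the report's MatLab code: the two return series are hypothetical, and `sample_cov` and `standardise` are helper names introduced here purely for illustration. It shows that the covariance matrix carries the volatility difference between two stocks, while the correlation matrix is simply the covariance matrix of the standardized returns.

```python
from math import sqrt

def sample_cov(x, y):
    """Sample covariance of two equal-length series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def standardise(x):
    """Subtract the mean and divide by the sample standard deviation."""
    mx = sum(x) / len(x)
    sd = sqrt(sample_cov(x, x))
    return [(a - mx) / sd for a in x]

# Hypothetical daily returns: stock_b moves exactly three times as much as stock_a.
stock_a = [0.010, -0.020, 0.015, 0.005, -0.010]
stock_b = [3 * r for r in stock_a]

cov_ab = sample_cov(stock_a, stock_b)
corr_ab = cov_ab / sqrt(sample_cov(stock_a, stock_a) * sample_cov(stock_b, stock_b))

# The covariance matrix keeps the volatility difference (variance ratio of 9) ...
var_ratio = sample_cov(stock_b, stock_b) / sample_cov(stock_a, stock_a)
# ... while the correlation is the covariance of the standardised returns.
corr_from_standardised = sample_cov(standardise(stock_a), standardise(stock_b))
```

In a covariance-based PCA the more volatile series would dominate the first component, which is the behaviour described above; a correlation-based PCA would treat both series identically.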
Methodology
Fundamentals of PCA
Principal components can only be established once a series of mathematical concepts have been applied to a
data set. These include standard deviation, variance, covariance, correlation and matrix algebra (which includes
eigenvectors and eigenvalues). Having obtained a data set for analysis, the mean is subtracted
from each of the data dimensions. One data set X will generate $\bar{X}$, which is the mean of the X data
points. Once subtracted, a dataset is produced with a mean of zero (as described in the princomp function below).
This allows for the development of a covariance matrix. A sample two stock portfolio covariance matrix can be
represented by:
$V = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix}$ Eqtn (1)
where $\sigma_{12} = \mathrm{Cov}(X_1, X_2) = \rho_{12} \, \sigma_1 \, \sigma_2$.
Matrices can be multiplied together once their dimensions are compatible, i.e. the number of columns in the
first equals the number of rows in the second. Eigenvectors arise from the multiplication of a matrix and a
vector, viewed as a transformation of that vector. Consider a transformation matrix that, when multiplied by a
vector, reflects it in the line p=q. If a vector lies on the line p=q, then its reflection is itself. This vector is then an
eigenvector of that transformation matrix. Eigenvectors can only be found for square matrices; not all
square matrices have real eigenvectors, but a symmetric n x n matrix, such as a covariance matrix, has n of them.
Scaling an eigenvector before multiplying only makes it longer or shorter without changing its direction, and
the result is the same multiple of the scaled vector. Regardless of the dimension of a symmetric matrix, its
eigenvectors can be chosen to be orthogonal (i.e. perpendicular), allowing the data to be expressed in terms of
the eigenvectors and not the usual x and y axes. Since the length of an eigenvector is irrelevant but its
direction is not, it is commonplace to scale eigenvectors to a length of one. Extending the matrix beyond 3 x 3
increases the difficulty of finding the eigenvalues and eigenvectors and requires the use of iterative techniques.
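The reflection example can be checked numerically. The following is a small Python sketch, with `mat_vec` being a helper introduced here for illustration: the matrix that reflects vectors in the line p=q simply swaps the two coordinates, so a vector on the line is left unchanged and a vector perpendicular to the line is flipped.

```python
def mat_vec(matrix, vector):
    """Multiply a square matrix (given as a list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

# Reflection in the line p = q swaps the two coordinates.
reflect = [[0, 1],
           [1, 0]]

on_line = [1, 1]        # lies on p = q
across_line = [1, -1]   # perpendicular to p = q

print(mat_vec(reflect, on_line))      # [1, 1]: unchanged, eigenvalue +1
print(mat_vec(reflect, across_line))  # [-1, 1]: flipped, eigenvalue -1

# Scaling an eigenvector changes its length but not its direction.
scaled = [3 * v for v in on_line]
print(mat_vec(reflect, scaled))       # [3, 3]

# The two eigenvectors of this symmetric matrix are orthogonal.
dot = sum(a * b for a, b in zip(on_line, across_line))
```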
Eigenvalues are a set of scalars associated with a linear system of equations and are also known as characteristic
roots or characteristic values. They are calculated by solving the characteristic equation described by:
$x^{p} - Q_1 x^{p-1} - Q_2 x^{p-2} - \dots - Q_{p-1} x - Q_p = 0$ Eqtn (2)
where Q is an autocorrelation coefficient and p is an integer greater than 1.
Square matrices are decomposed into eigenvalues and eigenvectors, which means the matrix is factorized into
a product of simpler matrices. This factorization is required so as
to arrive at the correct correlation relationship between the variables into the future. This decomposition is
based on the fact that any symmetric matrix A (p x p) can be written as $A = C \Lambda C'$, where $\Lambda$ is a diagonal matrix
of eigenvalues of A and $C$ is an orthogonal matrix whose columns consist of the corresponding standardized
eigenvectors [5].
An eigenvector lying along the line p=q shows us how the two data sets are related along that line, and the
eigenvectors are ordered by eigenvalue, in sequence of the strength of the pattern they describe in the data.
In short, the computation of the eigenvectors of the covariance matrix has allowed us to extract lines that
characterise the data set. The remaining steps in the process involve transforming the data and expressing it
in terms of those lines. For the purposes of our report we have taken the matrix X as the input data and
decomposed it using the princomp function in MatLab. Results are described below.
Compressing this data and reducing dimensionality is undertaken by ordering the eigenvalues from highest to
lowest. A matrix is then constructed whose columns are the eigenvectors corresponding to the selected
eigenvalues. The final set of data is arrived at by taking the transpose of this eigenvector matrix and
multiplying it on the left of the transposed original (mean-adjusted) data set. The transpose ensures the
eigenvector of most significance is in the top row. This results in the original data now being represented in
terms of the vectors we chose, as opposed to in terms of the x and y axes. This allows us to transform our data
so that it is expressed in terms of the patterns within it,
where the patterns are the lines that most accurately describe the various relationships between the data.
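The steps of this subsection (subtract the mean, form the covariance matrix, extract and order the eigenvectors, then re-express the data along them) can be sketched for a two-stock case in plain Python. This is only an illustration under stated assumptions: the return series are hypothetical, the helper names are introduced here, and the closed-form 2 x 2 eigen-decomposition stands in for MatLab's princomp, which the report actually used.

```python
from math import hypot, sqrt

def centre(series):
    """Subtract the mean so the series has mean zero."""
    mean = sum(series) / len(series)
    return [value - mean for value in series]

def cov_centred(x, y):
    """Sample covariance of two series that are already mean-centred."""
    return sum(a * b for a, b in zip(x, y)) / (len(x) - 1)

def eig_sym_2x2(a, b, c):
    """Eigenpairs of the symmetric matrix [[a, b], [b, c]], largest eigenvalue first.

    Assumes b != 0, so neither eigenvector collapses to the zero vector.
    """
    midpoint = (a + c) / 2
    half_gap = sqrt(((a - c) / 2) ** 2 + b ** 2)
    pairs = []
    for eigenvalue in (midpoint + half_gap, midpoint - half_gap):
        vx, vy = b, eigenvalue - a      # solves (a - eigenvalue) * vx + b * vy = 0
        length = hypot(vx, vy)          # scale the eigenvector to unit length
        pairs.append((eigenvalue, (vx / length, vy / length)))
    return pairs

# Hypothetical daily returns for two stocks (illustrative only).
stock_a = [0.012, -0.008, 0.005, 0.020, -0.015, 0.003, -0.001, 0.009]
stock_b = [0.010, -0.011, 0.002, 0.024, -0.012, 0.006, -0.004, 0.007]

x, y = centre(stock_a), centre(stock_b)
(lam1, e1), (lam2, e2) = eig_sym_2x2(cov_centred(x, x), cov_centred(x, y), cov_centred(y, y))

# Scores: the centred data re-expressed along the eigenvector directions.
pc1 = [xi * e1[0] + yi * e1[1] for xi, yi in zip(x, y)]
pc2 = [xi * e2[0] + yi * e2[1] for xi, yi in zip(x, y)]

share_pc1 = lam1 / (lam1 + lam2)   # proportion of total variance kept by PC1 alone
print(f"PC1 explains {share_pc1:.1%} of the variance")
```

Keeping only `pc1` is the dimensionality reduction described above: the two scores are uncorrelated, and the variance of each score equals its eigenvalue.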
Application of Fundamentals to Equity Model
This theory has consequences for the equity markets, as described in this section. For this undertaking I
have created a T x n matrix (which we will call X), where T is the number of data points and n is the number of
stocks (ten in our case). Its covariance matrix is denoted by V=V(X), and PCA is performed to investigate
how many components will satisfactorily explain the variances noted in the market weighted and equally
weighted portfolios constructed. To begin, an ordinary least squares (OLS) linear regression has been run on
each of the stocks' returns on the principal component factors, giving us an estimate of alpha for each stock
and the betas with respect to each principal component factor [2]. This model is denoted by:
$R_i = \alpha_i + \sum_{j=1}^{k} \beta_{ij} P_j + \varepsilon_i$ Eqtn (3)
where $R_i$ is the return on the ith stock, $P_j$ is the jth principal component factor and k represents the
number of principal component factors. The regression has also been used to calculate
t-values, $R^2$ values and factor sensitivities. The estimated model provides the return on each stock that is
explained by the factor model as:
$\hat{R}_i = \hat{\alpha}_i + \sum_{j=1}^{k} \hat{\beta}_{ij} P_j$ Eqtn (4)
We know that the principal components are based on a covariance or correlation matrix and have a mean of
zero, with $E(P_j) = 0$, which leaves the expected return on the factor model as [2]:
$E(\hat{R}_i) = \hat{\alpha}_i$ Eqtn (5)
When we take the variance and covariance of the returns in Eqtn 4 we can derive the covariance matrix of stock
returns identified by the model, which has the following elements:
est. $V(R_i) = \sum_{j=1}^{k} \hat{\beta}_{ij}^2 \, V(P_j)$ Eqtn (6)
est. $\mathrm{Cov}(R_i, R_m) = \sum_{j=1}^{k} \hat{\beta}_{ij} \, \hat{\beta}_{mj} \, V(P_j)$ Eqtn (7)
This can be more concisely represented in matrix notation as follows:
est. $V = B' \Lambda B$ Eqtn (8)
where B is the k x n matrix of OLS-estimated factor betas and takes the form:
$B = \begin{pmatrix} \hat{\beta}_{11} & \cdots & \hat{\beta}_{n1} \\ \vdots & \ddots & \vdots \\ \hat{\beta}_{1k} & \cdots & \hat{\beta}_{nk} \end{pmatrix}$ Eqtn (9)
and $\Lambda$ is the k x k covariance matrix of the principal components represented by:
$\Lambda = \mathrm{diag}\left( V(P_1), \dots, V(P_k) \right)$ Eqtn (10)
As the principal components are orthogonal, there is zero covariance between any two principal components,
making their covariance matrix diagonal in structure [2]. Knowing these values for each stock in our portfolio, a
variety of different weightings can be applied so as to assess the potential risk-return trade-off and construct the
desired portfolio. The risk measures are further broken down into systematic risk, as denoted by:
systematic risk $= w' B' \Lambda B w$ Eqtn (11)
and specific risk, which is arrived at by subtracting the systematic risk from the total risk per:
specific risk $= w' V w - w' B' \Lambda B w$ Eqtn (12)
where w is the n x 1 vector of portfolio weights. This model has been adapted and undertaken for the purposes
of this report. Refer to the results section for further detail.
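The split of total risk into systematic and specific parts in Eqtn (11) and Eqtn (12) can be sketched in Python as follows. All numbers are hypothetical (three stocks, two factors) and are not the report's results; the function names are introduced here for illustration. For the sake of the example, the total covariance matrix V is built as the factor part plus an assumed diagonal specific variance, so the specific risk recovered at the end is known in advance.

```python
def systematic_variance(weights, betas, factor_vars):
    """w' B' Lambda B w for diagonal Lambda; betas is k x n (one row per factor)."""
    total = 0.0
    for beta_row, factor_var in zip(betas, factor_vars):
        exposure = sum(w * b for w, b in zip(weights, beta_row))   # j-th entry of B w
        total += exposure ** 2 * factor_var
    return total

def portfolio_variance(weights, cov):
    """w' V w for an n x n covariance matrix V."""
    return sum(wi * cov[i][j] * wj
               for i, wi in enumerate(weights)
               for j, wj in enumerate(weights))

# Hypothetical inputs: k = 2 factors, n = 3 stocks.
betas = [[0.9, 1.1, 1.0],       # sensitivities to the first factor
         [0.3, -0.2, 0.1]]      # sensitivities to the second factor
factor_vars = [4e-4, 1e-4]      # diagonal of Lambda
specific_vars = [1e-5, 2e-5, 1.5e-5]   # assumed stock-specific variances

n, k = 3, 2
# Assumption for the example: V = B' Lambda B + diagonal specific variances.
cov = [[sum(betas[f][i] * factor_vars[f] * betas[f][j] for f in range(k))
        + (specific_vars[i] if i == j else 0.0)
        for j in range(n)]
       for i in range(n)]

weights = [1 / 3, 1 / 3, 1 / 3]   # equally weighted portfolio
total_risk = portfolio_variance(weights, cov)
systematic_risk = systematic_variance(weights, betas, factor_vars)
specific_risk = total_risk - systematic_risk   # Eqtn (12)
```

Swapping `weights` for market capitalisation weights gives the second portfolio's figures from the same `betas` and `factor_vars`.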
MatLab Used in Analysing Data
When running code developed for this project, the following steps were addressed:
1) Data made stationary by converting price movements to returns for the individual stocks.
2) Princomp function (COEFF = princomp(X)) used to perform PCA on the covariance matrix. Essential to
produce a linear factor model. Means are also subtracted during this process.
3) Using the regstats function, regress the individual stocks against the newly arrived at uncorrelated
risk-factors, noting the $R^2$ value at each stage.
4) Recalculate the value of the T x n matrix times the n x n orthogonal matrix of eigenvectors.
5) Add back the mean subtracted when performing the princomp function.
6) Comparison of variance-covariance matrix for the input stock returns with PCA derived version.
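The first and third of these steps can be sketched in Python. This is not the MatLab code used for the project: `simple_returns` and `ols_one_factor` are hypothetical helpers, the prices and factor values are made up, and the single-regressor OLS is only a stand-in for the statistics regstats reports (coefficients, t-value and $R^2$).

```python
from math import sqrt

def simple_returns(prices):
    """Convert a price series to simple period-on-period returns."""
    return [later / earlier - 1 for earlier, later in zip(prices, prices[1:])]

def ols_one_factor(y, x):
    """OLS of y on a constant and one regressor.

    Returns (alpha, beta, t-value of beta, R squared).
    """
    t = len(y)
    mean_x, mean_y = sum(x) / t, sum(y) / t
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    ssr = sum((b - alpha - beta * a) ** 2 for a, b in zip(x, y))   # residual sum of squares
    sst = sum((b - mean_y) ** 2 for b in y)                        # total sum of squares
    t_beta = beta / sqrt(ssr / (t - 2) / sxx)
    return alpha, beta, t_beta, 1 - ssr / sst

# Hypothetical closing prices for one stock and a hypothetical factor series.
prices = [100.0, 101.0, 99.5, 100.2, 102.0, 101.1, 101.9]
factor = [0.008, -0.012, 0.006, 0.015, -0.007, 0.009]

stock_returns = simple_returns(prices)
alpha, beta, t_beta, r_squared = ols_one_factor(stock_returns, factor)
print(f"beta={beta:.3f}, t={t_beta:.2f}, R^2={r_squared:.3f}")
```

With several principal component factors the regression has one beta per factor; the one-factor case is shown only to keep the arithmetic visible.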
Empirical Analysis
We must again refer to the data described in Figures 1, 2, 3 and 4 in the Data section, which form a large part of our
empirical analysis. This detail illustrates the extent of the correlation in the ten stock multi-variable system
over the sample period examined, which is fundamental to the establishment of principal components. The
analysis performed on the covariance matrix (Figure 3) is detailed below in Figure 5.
Figure 5
Here we review how the data have been projected, through an orthogonal transformation, onto the
subspace spanned by the eigenvectors corresponding to the largest eigenvalues; this is where the
decorrelation of the data takes place.
For the DJIA dataset in question, the three largest principal components were taken, which explained 83.57% of the
variance. More or fewer components could have been selected to gain more or less comfort, but the
recommended range is between 70% and 90%, as less than 70% offers little insight while greater than 90% picks
up too much noise. The other eigenvalues were discarded as they were small, and doing so reduced the final
dimensions in the data. When interpreting the first principal component of asset changes, the more highly
correlated the system, the more similar the values of the elements of the first eigenvector. This tells us that
should the first principal component change while the other components remain fixed, then the returns will all move
by a similar amount. As is evident from figure 5, the majority of stocks have a PC1 value around the 0.2 mark,
and PC1 appears to capture a common trend in the data. No such pattern appears to be present in the PC2 and PC3
component values. I feel that this percentage still offers a relatively high level of confidence that the three risk
factors are representative of the stocks' variance. This leaves the three-component representation as:
$R_1 = 0.0251593 \, P_1 - 0.02747 \, P_2 + 0.052639 \, P_3$
$R_2 = 0.294086 \, P_1 - 0.03233 \, P_2 + 0.00757 \, P_3$
$R_3 = 0.414202 \, P_1 + 0.249442 \, P_2 + 0.22005 \, P_3$
$R_4 = 0.4697 \, P_1 - 0.78769 \, P_2 + 0.282537 \, P_3$
$R_5 = 0.203308 \, P_1 + 0.204774 \, P_2 - 0.01967 \, P_3$
$R_6 = 0.385474 \, P_1 - 0.06655 \, P_2 - 0.30512 \, P_3$
$R_7 = 0.365547 \, P_1 + 0.339575 \, P_2 + 0.146143 \, P_3$
$R_8 = 0.200306 \, P_1 + 0.215904 \, P_2 + 0.114223 \, P_3$
$R_9 = 0.220781 \, P_1 + 0.263256 \, P_2 + 0.078486 \, P_3$
$R_{10} = 0.213582 \, P_1 + 0.194774 \, P_2 + 0.093398 \, P_3$
where $R_i$ represents the T x 1 vector of returns on the ith stock.
During the period in question, we have noted periods of huge market turmoil, during which all
correlations tended towards one, as evident from figure 2. This is in contrast to observations from figure 1
over a relatively stable time in stock markets. This affects the uncorrelated eigenvectors by altering their values
to reflect the patterns noted, which become more pronounced as the correlations become stronger; both positive
and negative correlations have this impact. Over time it has been seen that bad news has a far
stronger effect on equity markets than equivalent positive news. Factors such as the leverage effect and
investor fear lead to mass sell-offs and panic, and it is at times like these that we note the strongest correlations in
equity markets. From the data in figure 5 we can conclude that there was above average volatility for the
period in the stocks of Chevron (CVX), Boeing (BA) and Exxon Mobil (XOM), which noted PC1 factors of
0.414202, 0.385474 and 0.365547 respectively. This is in comparison to the mean PC1 value for the stocks of
0.283560.
Results Observed from Analysis
The full listing of all factor sensitivities and the percentage of variance explained by each of them is detailed
below in Figure 6. As discussed previously, I consider it appropriate to use three principal component factors
to describe the variance present as it accounts for 83.57% of the variance.
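The selection rule used here (keep the leading components until the cumulative share of variance falls in the chosen range) can be sketched in Python. The eigenvalues below are hypothetical and are not the values in Figure 6; `components_needed` is a helper name introduced for illustration.

```python
from itertools import accumulate

def components_needed(eigenvalues, target):
    """Smallest number of leading components whose cumulative variance share reaches target."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    for count, running in enumerate(accumulate(ordered), start=1):
        if running / total >= target:
            return count
    return len(ordered)

# Hypothetical eigenvalues for a ten-variable system (not the values in Figure 6).
eigenvalues = [7.0, 0.9, 0.5, 0.4, 0.3, 0.3, 0.2, 0.2, 0.1, 0.1]

print(components_needed(eigenvalues, 0.80))   # components needed to explain 80%
print(components_needed(eigenvalues, 0.90))   # components needed to explain 90%
```

With these made-up eigenvalues the first component alone carries most of the variance, mirroring the dominance of PC1 noted in the report, and each extra component adds progressively less.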
Figure 6
Each successive principal component, from one to ten, explains less variance, with little pattern
observable in any of the figures except those for principal component one. We can draw from this that there are
few common trends in the remaining components applicable to each of the stocks. Having selected the first three
principal components for further analysis, the individual stocks in question are now regressed against
them, as described above in the MatLab work performed. To satisfy Eqtn (8), described by:
est. $V = B' \Lambda B$ Eqtn (8)
I have calculated the estimated factor betas (B) in figure 7:
Figure 7
The transpose of these values is also required as per figure 8:
Figure 8
Along with the $\Lambda$ element of the equation, the covariance matrix of the principal components. As the
principal components are orthogonal, this is represented as a diagonal matrix as per figure 9:
Figure 9
Having these values we can satisfy the equation and by doing so we create the Systematic Covariance of Stock
returns as described by figure 10
Figure 10
Having compared the initial covariance matrix per figure 3 to the PCA-derived matrix, we have noted
the following differences in values:
Figure 11
There is a wide variety of departures here from the initial covariance per figure 3. Covariance is a measure
of how much variables change together. When the PCA factors are treated as the sole determinants of variance
and all other factors are discarded, the changes in the variance relationships between the stocks described in
figure 11 can be attributed to the reduction in dimensionality and the focus of risk on the principal components
identified. There are noticeable changes between KO and IBM, along with MMM and IBM, which may indicate
that part of their covariance was attributable to the discarded principal components.
Stemming from our regression we noted the following results per figure 12:
Figure 12
Taking the $R^2$ value to represent the measure of how well a regression line approximates real data points,
where the closer it is to 1 the greater the ability of the model to predict a trend, we note a range of
values spanning from 0.612 for MCD, offering little trend forecasting ability, to 0.998 for BA, providing a clear
window into potential movements.
Generally, any t-value greater than +2 or less than -2 is acceptable. The higher the absolute t-value, the greater the
confidence we have in the coefficient as a predictor. Low t-values indicate low reliability of the
predictive power of that coefficient. This is consistent with our values and the example previously
mentioned, with MCD and BA noting scores of 26.27 and 419.92 respectively. As expected, PC1 has the most
predictive power, considering it explains the majority of the variation. PC2 and PC3 offer
varying levels of predictive power, and not in the same pattern as the explanatory power noted in the eigenvalues per
figure 6.
When conducting our analysis, we have performed it on two portfolios, one being equally weighted and the
second weighted by market capitalization per figure 14. The market capitalization was derived from the last
day of our sample (6-Oct-2010) based on shares in issue multiplied by share price on that day.
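The two weighting schemes can be sketched in Python as below. The tickers, share counts and prices are placeholders, not the 6-Oct-2010 figures used in the report, and the function names are introduced here for illustration.

```python
def market_cap_weights(shares_in_issue, prices):
    """Weight each stock by shares in issue multiplied by share price."""
    caps = {ticker: shares_in_issue[ticker] * prices[ticker] for ticker in shares_in_issue}
    total_cap = sum(caps.values())
    return {ticker: cap / total_cap for ticker, cap in caps.items()}

def equal_weights(tickers):
    """Give every stock the same share of the portfolio."""
    return {ticker: 1 / len(tickers) for ticker in tickers}

# Placeholder inputs (hypothetical tickers, share counts and prices).
shares_in_issue = {"AAA": 100, "BBB": 50, "CCC": 200}
prices = {"AAA": 10.0, "BBB": 40.0, "CCC": 5.0}

cap_weights = market_cap_weights(shares_in_issue, prices)
eq_weights = equal_weights(list(shares_in_issue))
print(cap_weights)   # {'AAA': 0.25, 'BBB': 0.5, 'CCC': 0.25}
```

Either set of weights can then play the role of the weight vector w when the portfolio risk measures are calculated.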
Figure 13 Figure 14
Figure 13 provides the breakdown of both portfolios constructed in terms of principal components. The total
risk noted in the sum of the components for the market weighted portfolio (0.5365) is far larger than that of the
equally weighted portfolio (0.3643), making the equally weighted portfolio a more attractive potential
investment; this is largely the best practice over the short term [ ]. The more prominent stocks such
as XOM carry larger beta coefficients, which are explained to a greater extent by PC2 and PC3. This is
consistent with figure 7, which details the beta values for each principal component.
Other risk metrics calculated are displayed in figure 15:
Figure 15
From these, our indication based on beta values that the equally weighted portfolio may be a better
investment opportunity is borne out, as variance and volatility are again higher in the market weighted
portfolio; these are a suitable gauge of risk for the portfolios in question. There is minimal difference noted
between the portfolio variances and systematic variances, with volatility identical for both. As these results
have not been run for the index as a whole, we have no benchmark against which to compare the figures.
These figures can be represented in dollar terms by multiplying the betas and the volatility by the total value of
the portfolio. Upon inspection of the risk results, an intuitive understanding of these concepts tells me that
these figures are quite low, in particular the specific variance and specific volatility figures noted. That said, the
figures are not implausible.
From the work undertaken we can conclude that a small number of principal components are needed to
explain the variance, with the vast majority (70%) of this in PC1, which is sufficient to offer insight on its own.
The three components were chosen to offer extra insight and for the purposes of this project.
Once PCA has been applied to a covariance matrix, the focus of risk on the principal components identified
changes the matrix values, depending on the extent to which the principal components were a factor in the
variance in the first place. We also noted that, having compared the equally weighted portfolio to the market
weighted portfolio, there is less risk present in the equally weighted portfolio, which would be the preferred
investment choice.
Conclusion
Illustrated in this report are the practical groundings of the PCA statistical tool in areas of risk management
and portfolio optimization. The factor model's representation of each of the series of returns as a linear
function of the principal components offers realistic, understandable insight into patterns in data while limiting
the dimensions of that data. The reduction in risk management workload from dimensionality reduction, and the
computational efficiencies in the daily measurement of risk in portfolios through the selection of a small
number of variables capable of explaining a large portion of risk, are of huge benefit to modern finance.
Appendices
Appendix 1
List of Companies making up top ten stocks in DJIA:
1) IBM (IBM)
2) 3M (MMM)
3) Chevron Corporation (CVX)
4) Caterpillar (CAT)
5) McDonald's (MCD)
6) Boeing (BA)
7) Exxon Mobil (XOM)
8) Johnson & Johnson (JNJ)
9) Procter & Gamble (P&G)
10) Coca Cola (KO)
Bibliography
[1] Soto, Gloria M. Using Principal Component Analysis to Explain Term Structure Movements: Performance
and Stability, 2004. Available at SSRN: http://ssrn.com/abstract=985404
[2] Alexander, C. Practical Financial Econometrics: Market Risk Analysis Vol. 2. John Wiley and Sons, 2009.
[3] Blanco, C. Multi-Factor Models for Forward Curve Analysis: An Introduction to Principal Component
Analysis. Financial Engineering Associates, 2002.
[4] Alexander, C. Multi-Factor Models for Forward Curve Analysis: An Introduction to Principal Component
Analysis. ISMA Centre, University of Reading, 2001.
[5] Alexander, C. Market Models: A Guide to Financial Data Analysis. John Wiley and Sons, 2001.
[6] Alexander, C. Quantitative Methods in Finance. John Wiley and Sons, 2009.
[7] Meric, I. Co-Movements of European Equity Markets Before and After the 1987 Crash. Multinational
Finance Society, no date provided on paper.