liam_mescall - pca project

Upload: liam-mescall

Post on 07-Apr-2018


  • 8/4/2019 Liam_Mescall - PCA Project


    Project Title: Principal Components (PC) Analysis: A Portfolio Risk Analysis Application

    Course: MSc in Computational Finance

    Module: Empirical Finance (FI6061)

    Student: Liam Mescall (I.D. no. 0144126)


    TABLE OF CONTENTS

    Abstract
    I. Introduction
    II. Data Summary and Description
    III. Methodology
    III.I Fundamentals of PCA
    III.II Application of Fundamentals to Equity Model
    IV. MatLab Used in Analysing Data
    V. Empirical Analysis
    VI. Results Observed from Analysis
    VII. Conclusion
    Appendices
    Bibliography


    Principal Components (PC) Analysis: A Portfolio Risk Analysis Application

    November 25, 2010

    Abstract

    The ability to accurately quantify the relationship between risk and return has far-reaching consequences. Adopting PCA techniques allows risk to be divided into market (systematic) risk and specific risk inherent in the nature of the company itself. Separating these two risk components has enabled investment techniques such as the market-neutral strategies adopted by large hedge funds, which neutralise market risk and identify stocks that are more or less susceptible to market movements, allowing, in theory, a portfolio that makes relatively riskless profits. This report undertakes a PCA analysis of the ten largest stocks by market capitalisation in the Dow Jones Industrial Average (Appendix 1). This involves constructing one portfolio with an equally weighted investment in the ten chosen stocks and another that is market weighted; the risk-return relationship of each is then assessed for potential investment. Interpreting concepts such as covariance, correlation, eigenvectors and eigenvalues, and selecting the eigenvectors to retain, is fundamental to achieving this. These concepts are addressed, the data are processed in MatLab, and an understanding of the theory is then used to develop recommendations on potential investments in the portfolio. Having completed the analysis, we conclude that a small number of principal component factors is sufficient to explain a satisfactory amount of the variance. We also identified a relatively high degree of collinearity.

    Keywords: PCA, risk components, equal weight, market weight, covariance, correlation, eigenvectors, eigenvalues, MatLab.

    Introduction

    Models for time series data take many forms and represent a variety of stochastic processes, both stationary and integrated. Classifying a process as stationary indicates the extent to which the data are subject to trends and changing variance, and whether the joint distributions of the variables remain similar over time; prices are therefore often converted to percentage returns to obtain a stationary series. When modelling these variations, three classes of practical importance emerge: autoregressive (AR) models, integrated (I) models and moving average (MA) models, all of which depend on historical data points. These have been combined to produce autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA) models. The limited ability of these models to extract more subtle information has led to a rise in the use of Principal Component Analysis (PCA), which offers a way to extract this information with a minimum of noise. It is a powerful statistical tool with numerous practical uses. Its key assumption is that the vast array of factors affecting the data can be refined to a few uncorrelated composite variables, called principal components, which provide a parsimonious description of the data's dynamics. PCA also assumes linearity, that large variances carry the important structure, and that the principal components are orthogonal.

    This reduction in dimensionality is particularly useful in finance, as stock prices are affected by a multitude of economic variables that are difficult to translate into one detailed price model. In practice, the use of PCA in bond markets has revealed that only three principal components (related to the level, the steepness and the curvature of the yield curve) are sufficient to explain almost all the variation in interest rates [1]. Identifying these three components focuses risk management on the effects of three risk factors on the portfolio value, regardless of the number and characteristics of the bonds in the portfolio, which saves considerable time and resources. Being able to establish factors such as moneyness and time as principal components with a high degree of confidence allows a volatility surface to be modelled and investment decisions to be based on it. Modelling the forward curve is essential to all forms of risk management, since traders and portfolio managers deal with instruments whose evolution through time must be modelled, e.g. OTC derivatives. Any attempt to price or manage the risk of securities such as OTC derivatives, swaptions or gas storage contracts (or other securities dependent on a specific forward price) requires a model that describes this evolution. The pricing of any forward is further complicated as more forward prices are added to the curve, which increases the number of parameters and the complexity; the dimension reduction offered by PCA addresses this [3]. Value at risk (VaR) portfolio analysis has been refined through the use of PCA to employ more accurate probabilities of default over specific time horizons. PCA has also brought further precision to hedging bond and equity portfolios by taking duration, twist and bend movements into account. The theme running through all these practical applications is increased accuracy, which has cemented the importance of PCA in modern finance.

    Data Summary and Description

    The data selected for this study have been compiled from daily returns spanning 26/9/08 to 6/10/10. Historically, series of asset prices are largely uncorrelated over time, as Figure 1 shows: the rebased values gradually separate and the range of deviation from par grows.
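    For illustration only, the conversion from prices to daily returns and the rebasing used in Figure 1 can be sketched in Python (the report itself used MatLab). The sketch assumes simple rather than log returns, which the report does not specify, and uses hypothetical prices rather than the DJIA data.

```python
def simple_returns(prices):
    """Daily simple returns r_t = P_t / P_{t-1} - 1 for one price series."""
    return [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]


def rebase(prices, par=1.0):
    """Rescale a price series so that it starts at `par`, as in a rebased chart."""
    return [par * p / prices[0] for p in prices]


# Hypothetical closing prices for two stocks (illustration only, not the study's data).
stock_a = [100.0, 102.0, 99.96, 101.9592]
stock_b = [50.0, 50.5, 51.005, 50.49495]

returns_a = simple_returns(stock_a)
returns_b = simple_returns(stock_b)
rebased_a = rebase(stock_a)
rebased_b = rebase(stock_b)

# Deviation of each rebased series from par, the quantity that grows in Figure 1.
print([round(x - 1.0, 6) for x in rebased_a])
print([round(x - 1.0, 6) for x in rebased_b])
```

    Rebasing every series to the same starting value makes stocks with very different price levels directly comparable on one chart.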

    Figure 1

    We have also noted that during three-sigma events, correlations in all markets tend towards one [4]. Figure 2 shows values from the days after the collapse of Lehman Brothers and the clustered pattern of return movements over the following months. Studies specific to equity markets have also shown this to be true [7].

    Figure 2

    This graphical representation of returns over selected periods gives important insight into why the entries of the variance-covariance (V) matrix and the correlation matrix take the values they do. The relationships across the complete sample of data over the period in question are shown below in Figure 3 and Figure 4. These data are fundamental to the development of our principal component analysis.

    A high degree of collinearity is evident, which is typical of a system in which there are only a few important sources of information in the data, common to many variables [ ]. This will become clear when the principal components are calculated and the cumulative percentage of variation explained by the first few components is reviewed.

    Variance Covariance Matrix (V)

    Figure 3

    Correlation Matrix

    Figure 4

    Depending on how the results will be used, PCA can be performed on either the covariance matrix or the correlation matrix. A PCA based on the covariance matrix has the advantage of providing a linear factor model for the returns themselves, rather than for the standardised returns, as is the case when the correlation matrix is used. A PCA on the covariance matrix captures all the movement in the variables, which may be dominated by the differing volatilities of individual variables, while a PCA on the correlation matrix captures only the similarities in the movements of returns, ignoring their individual volatilities.

    To suit the objectives of this project, I have chosen to undertake the PCA on the covariance matrix.
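    The relationship between the two matrices can be seen in a short Python sketch: the correlation matrix is the covariance matrix rescaled by each stock's volatility, which is why a PCA on it ignores individual volatilities. The sketch assumes the sample (T - 1) normalisation, which is also MatLab's `cov` default, and uses a hypothetical two-stock return sample.

```python
def covariance_matrix(X):
    """Sample covariance (T - 1 denominator) of a T x n list of return rows."""
    T, n = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / T for j in range(n)]
    D = [[row[j] - mu[j] for j in range(n)] for row in X]  # mean-centred data
    return [[sum(D[t][i] * D[t][j] for t in range(T)) / (T - 1) for j in range(n)]
            for i in range(n)]


def correlation_from_covariance(V):
    """rho_ij = V_ij / (sigma_i * sigma_j): standardising removes individual volatilities."""
    n = len(V)
    sd = [V[i][i] ** 0.5 for i in range(n)]
    return [[V[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]


# Hypothetical daily returns: stock 2 moves exactly twice as much as stock 1.
X = [[0.01, 0.02], [-0.02, -0.04], [0.03, 0.06], [0.00, 0.00]]
V = covariance_matrix(X)
R = correlation_from_covariance(V)
print(V)
print(R)
```

    In this example stock 2 has four times the variance of stock 1, yet the two are perfectly correlated: a covariance-based PCA sees the volatility difference, a correlation-based one does not.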

    Methodology

    Fundamentals of PCA

    Principal components can only be established once a series of mathematical concepts has been applied to a data set. These include standard deviation, variance, covariance, correlation and matrix algebra (including eigenvectors and eigenvalues). Having obtained a data set for analysis, the mean is subtracted from each of the data dimensions: for a variable X, the mean $\bar{X}$ of its data points is subtracted from each value, producing a data set with a mean of zero (this is done by the princomp function described below). This allows the covariance matrix to be developed. A sample two-stock portfolio covariance matrix can be represented by:

    $$V = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix} \qquad \text{Eqtn (1)}$$

    where $\sigma_i^2$ is the variance of the returns on stock i and $\sigma_{12} = \mathrm{Cov}(R_1, R_2) = \rho_{12}\,\sigma_1\,\sigma_2$, with $\rho_{12}$ the correlation between the two return series. Two matrices can be multiplied together when the number of columns of the first equals the number of rows of the second.

    An eigenvector of a square matrix is a non-zero vector whose direction is unchanged when the matrix multiplies it: the product is simply the same vector scaled by a number, the corresponding eigenvalue. Consider a transformation matrix that reflects vectors in the line p = q. A vector lying on the line p = q is its own reflection, so it is an eigenvector of that transformation matrix. Eigenvectors can only be found for square matrices. Not every square matrix has a full set of n independent eigenvectors, but a real symmetric n x n matrix, such as a covariance matrix, always has n of them. Scaling an eigenvector before multiplying only makes it longer or shorter without changing its direction, and the result is the same multiple of it. For a symmetric matrix, whatever its dimension, the eigenvectors are orthogonal (perpendicular), which allows the data to be expressed in terms of the eigenvectors rather than the usual x and y axes. Because an eigenvector's length is irrelevant and only its direction matters, it is commonplace to scale eigenvectors to length one. Beyond small matrices such as 3 x 3, finding the eigenvalues by hand becomes difficult and iterative numerical techniques are required.
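    The reflection example can be checked numerically. In this minimal Python sketch the matrix `REFLECT` maps a point (p, q) to (q, p); the vector (1, 1) on the line p = q is unchanged (eigenvalue 1), the perpendicular vector (1, -1) is reversed (eigenvalue -1), and the two eigenvectors are orthogonal, as expected for a symmetric matrix. The vectors are illustrative choices, not data from the report.

```python
import math


def matvec(M, v):
    """Multiply a square matrix M (list of rows) by a column vector v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


# Transformation that reflects a point (p, q) in the line p = q, i.e. maps it to (q, p).
REFLECT = [[0.0, 1.0],
           [1.0, 0.0]]

on_line = [1.0, 1.0]    # lies on p = q: unchanged by the reflection (eigenvalue +1)
across = [1.0, -1.0]    # perpendicular to p = q: reversed by the reflection (eigenvalue -1)
off_axis = [2.0, 1.0]   # any other direction is changed, so it is not an eigenvector

# Scaling to unit length changes the length, not the direction.
unit_on_line = [x / math.hypot(*on_line) for x in on_line]

print(matvec(REFLECT, on_line))    # [1.0, 1.0]
print(matvec(REFLECT, across))     # [-1.0, 1.0]
print(matvec(REFLECT, off_axis))   # [1.0, 2.0]
print(sum(a * b for a, b in zip(on_line, across)))  # 0.0: the eigenvectors are orthogonal
```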

    Eigenvalues are the scalars associated with these eigenvectors and are also described as characteristic roots or characteristic values. For a square matrix A, they are the roots $\lambda$ of the characteristic equation $\det(A - \lambda I) = 0$. For an autoregressive process of order p, the characteristic equation takes the form:

    $$\lambda^{p} - Q_1\lambda^{p-1} - Q_2\lambda^{p-2} - \cdots - Q_p = 0 \qquad \text{Eqtn (2)}$$

    where the $Q_i$ are the autoregressive coefficients and p is an integer greater than 1.

    Square matrices are decomposed into eigenvalues and eigenvectors, which means the matrix is factorised and rearranged into a new, more convenient configuration. This factorisation is required in order to arrive at the correct correlation relationship between the variables into the future. The decomposition is based on the fact that any symmetric matrix A (p x p) can be written as $A = \Gamma \Lambda \Gamma'$, where $\Lambda$ is a diagonal matrix of the eigenvalues of A and $\Gamma$ is an orthogonal matrix whose columns are the corresponding standardised eigenvectors [5].
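    Standard software finds this decomposition iteratively. As a sketch only, the Python code below uses the classical Jacobi rotation method, swapped in for illustration and not necessarily the algorithm inside MatLab's princomp, to decompose a hypothetical 3 x 3 covariance matrix. `W` plays the role of the orthogonal eigenvector matrix in the decomposition above, and the code checks that W diag(eigenvalues) W' reproduces the input.

```python
import math


def matmul(A, B):
    """Plain matrix product A B for lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def jacobi_eigen(A, tol=1e-12, max_sweeps=50):
    """Eigen-decomposition of a real symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, W): eigenvalues sorted from largest to smallest and an
    orthogonal W whose columns are the matching unit eigenvectors, so A = W L W'.
    """
    n = len(A)
    A = [[float(x) for x in row] for row in A]            # work on a copy
    W = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol ** 2:                                  # effectively diagonal
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the rotated matrix has a zero at (p, q).
                theta = 0.5 * math.atan2(2.0 * A[p][q], A[q][q] - A[p][p])
                c, s = math.cos(theta), math.sin(theta)
                J = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
                J[p][p] = J[q][q] = c
                J[p][q], J[q][p] = s, -s
                A = matmul(transpose(J), matmul(A, J))     # similarity transform J' A J
                W = matmul(W, J)                            # accumulate the eigenvectors
    order = sorted(range(n), key=lambda i: A[i][i], reverse=True)
    eigenvalues = [A[i][i] for i in order]
    W = [[W[r][i] for i in order] for r in range(n)]
    return eigenvalues, W


# Hypothetical 3 x 3 covariance matrix (illustration only).
V = [[4.0, 2.0, 0.6],
     [2.0, 3.0, 0.4],
     [0.6, 0.4, 1.0]]
lam, W = jacobi_eigen(V)
Lam = [[lam[i] if i == j else 0.0 for j in range(3)] for i in range(3)]
rebuilt = matmul(matmul(W, Lam), transpose(W))             # should reproduce V
print(lam)
```

    Each rotation zeroes one off-diagonal element; repeated sweeps drive all of them towards zero, leaving the eigenvalues on the diagonal.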

    Each eigenvalue measures how much of the data's variation lies along its eigenvector; for two data sets plotted against each other, an eigenvalue attached to an eigenvector along the line p = q shows how strongly they are related along that line. Ordering the eigenvalues therefore ranks the eigenvectors by the strength of the pattern they capture in the data. In short, computing the eigenvalues and eigenvectors of the covariance matrix extracts the lines that characterise the data set. The remaining steps transform the data and express it in terms of those lines. For the purposes of this report, we have taken the input data matrix X and decomposed it using the princomp function in MatLab. The results are described below.

    The data are compressed, and their dimensionality reduced, by ordering the eigenvalues from highest to lowest and keeping only the most significant. A matrix (the feature vector) is then constructed whose columns are the eigenvectors of the selected eigenvalues. The final data set is obtained by taking the transpose of this feature vector and multiplying it on the left of the transposed, mean-adjusted original data set. The transpose ensures that the eigenvector of most significance is in the top row. The original data are now represented in terms of the chosen eigenvectors rather than the x and y axes. This expresses the data in terms of the patterns between them, where the patterns are the lines that most accurately describe the relationships in the data.
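    The ordering, selection and projection steps can be illustrated for two stocks, where the eigenvalues and eigenvectors of a 2 x 2 covariance matrix have a closed form. This is a minimal Python sketch with hypothetical returns, keeping k = 1 component; the transposed eigenvectors (k x n) multiply the transposed centred data (n x T) from the left, as described above, and the variance of the resulting component equals the largest eigenvalue.

```python
import math


def centre(X):
    """Subtract each column's mean, the first step of PCA."""
    T, n = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / T for j in range(n)]
    return [[row[j] - mu[j] for j in range(n)] for row in X]


def covariance_2x2(D):
    """Sample covariance (T - 1 denominator) of mean-centred two-column data."""
    T = len(D)

    def cov(i, j):
        return sum(row[i] * row[j] for row in D) / (T - 1)

    return [[cov(0, 0), cov(0, 1)], [cov(1, 0), cov(1, 1)]]


def eig_sym_2x2(V):
    """Closed-form eigenpairs of a symmetric 2 x 2 matrix, largest eigenvalue first."""
    a, b, d = V[0][0], V[0][1], V[1][1]
    if b == 0.0:  # already diagonal: the axes are the eigenvectors
        pairs = sorted([(a, [1.0, 0.0]), (d, [0.0, 1.0])], key=lambda p: -p[0])
        return [p[0] for p in pairs], [p[1] for p in pairs]
    mid, rad = (a + d) / 2.0, math.hypot((a - d) / 2.0, b)
    values = [mid + rad, mid - rad]                # ordered highest to lowest
    vectors = []
    for lam in values:
        v = (b, lam - a)                           # satisfies (V - lam I) v = 0
        norm = math.hypot(*v)
        vectors.append([v[0] / norm, v[1] / norm]) # scale to unit length
    return values, vectors


# Hypothetical daily returns on two closely related stocks (illustration only).
X = [[0.010, 0.012], [-0.020, -0.018], [0.015, 0.017], [0.000, -0.001], [-0.005, -0.010]]
D = centre(X)
values, vectors = eig_sym_2x2(covariance_2x2(D))

k = 1                                              # keep the most significant component only
feature = vectors[:k]                              # k x n: chosen eigenvectors as rows
# (k x n) on the left of the transposed centred data (n x T) gives the k x T final data.
final_data = [[sum(w[j] * row[j] for j in range(len(w))) for row in D] for w in feature]
print(values)
print(final_data)
```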

    Application of Fundamentals to Equity Model

    This theory has consequences for equity markets, as described in this section. For this undertaking I have created a T x n matrix, X, where T is the number of data points and n is the number of stocks (ten in our case). Its covariance matrix is denoted by V = V(X), and PCA is performed to investigate how many components satisfactorily explain the variance in the market-weighted and equally weighted portfolios constructed. To begin, an ordinary least squares (OLS) linear regression has been run of each stock's returns on the principal component factors, giving an estimate of alpha for each stock and the betas with respect to each principal component factor [2]. This model is denoted by:

    $$R_i = \alpha_i + \sum_{j=1}^{k} \beta_{ij}\,P_j + \varepsilon_i, \qquad i = 1, \ldots, n \qquad \text{Eqtn (3)}$$

    where $R_i$ is the return on the ith stock, $P_j$ is the jth principal component and k represents the number of principal component factors. The regression has also been used to calculate t-values, $R^2$ values and factor sensitivities. The estimated model gives the return on each stock that is explained by the factor model as:

    $$\hat{R}_i = \hat{\alpha}_i + \sum_{j=1}^{k} \hat{\beta}_{ij}\,P_j \qquad \text{Eqtn (4)}$$

    We know that the principal components, whether based on a covariance or a correlation matrix, have a mean of zero, $E(P_j) = 0$, which leaves the expected return from the factor model as [2]:

    $$E(\hat{R}_i) = \hat{\alpha}_i \qquad \text{Eqtn (5)}$$
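    Because principal components have zero mean and are mutually uncorrelated, the OLS regression of Eqtn (3) takes a particularly simple form: each beta is the sum of cross-products of the stock's returns with that component divided by the component's sum of squares, and the intercept is the stock's mean return, consistent with Eqtn (5). The Python sketch below assumes exactly orthogonal, zero-mean factor series, which are hypothetical and constructed for illustration; the report itself used MatLab's regstats, and non-orthogonal regressors would need a general OLS routine.

```python
def factor_regression(r, P):
    """OLS of one stock's returns r (length T) on k principal-component series P.

    Assumes each factor series has zero sample mean and the factors are mutually
    orthogonal, as principal components are. The multiple-regression estimates then
    reduce to beta_j = sum(r_t * P_jt) / sum(P_jt ** 2) and alpha = mean(r).
    """
    T = len(r)
    alpha = sum(r) / T
    betas = [sum(rt * pt for rt, pt in zip(r, Pj)) / sum(pt * pt for pt in Pj) for Pj in P]
    # Fitted returns from the estimated model, as in Eqtn (4).
    fitted = [alpha + sum(b * Pj[t] for b, Pj in zip(betas, P)) for t in range(T)]
    return alpha, betas, fitted


# Hypothetical zero-mean, mutually orthogonal factor series (illustration only).
P1 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
P2 = [1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0]
noise = [0.01 * a * b for a, b in zip(P1, P2)]   # orthogonal to the constant, P1 and P2
r = [0.002 + 0.5 * a - 0.3 * b + e for a, b, e in zip(P1, P2, noise)]

alpha, betas, fitted = factor_regression(r, [P1, P2])
print(alpha, betas)
print(sum(fitted) / len(fitted))   # average fitted return equals alpha, the sample analogue of Eqtn (5)
```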

    Taking the variance and covariance of Eqtn (4), we can derive the covariance matrix of stock returns identified by the model, which has the following elements:

    $$\text{est. } V(R_i) = \sum_{j=1}^{k} \hat{\beta}_{ij}^{2}\, V(P_j) \qquad \text{Eqtn (6)}$$

    $$\text{est. } \mathrm{Cov}(R_i, R_l) = \sum_{j=1}^{k} \hat{\beta}_{ij}\,\hat{\beta}_{lj}\, V(P_j) \qquad \text{Eqtn (7)}$$

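    This can be represented more concisely in matrix notation as follows:

    $$\text{est. } V(R) = B'\,\Omega\,B \qquad \text{Eqtn (8)}$$

    where B is the k x n matrix of OLS-estimated factor betas and takes the form:

    $$B = \begin{pmatrix} \hat{\beta}_{11} & \hat{\beta}_{21} & \cdots & \hat{\beta}_{n1} \\ \hat{\beta}_{12} & \hat{\beta}_{22} & \cdots & \hat{\beta}_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ \hat{\beta}_{1k} & \hat{\beta}_{2k} & \cdots & \hat{\beta}_{nk} \end{pmatrix} \qquad \text{Eqtn (9)}$$

    and $\Omega$ is the k x k covariance matrix of the principal components, represented by:

    $$\Omega = \begin{pmatrix} V(P_1) & 0 & \cdots & 0 \\ 0 & V(P_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & V(P_k) \end{pmatrix} \qquad \text{Eqtn (10)}$$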
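    As a minimal sketch, Eqtn (8) can be evaluated directly once B and the component variances are known. The Python code below uses hypothetical betas for three stocks on two components and checks each element against Eqtn (7); it assumes Omega is diagonal, as it is for principal components.

```python
def matmul(A, B):
    """Plain matrix product A B for lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def systematic_covariance(B, pc_variances):
    """est. V(R) = B' Omega B, with B the k x n beta matrix and Omega = diag(pc_variances)."""
    k = len(pc_variances)
    omega = [[pc_variances[i] if i == j else 0.0 for j in range(k)] for i in range(k)]
    return matmul(matmul(transpose(B), omega), B)


# Hypothetical betas of three stocks (columns) on two principal components (rows).
B = [[0.8, 1.0, 1.2],
     [0.3, -0.1, 0.2]]
pc_variances = [0.0004, 0.0001]          # V(P1), V(P2): hypothetical
est_V = systematic_covariance(B, pc_variances)
print(est_V)
```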

    As the principal components are orthogonal, there is zero covariance between any two principal components, which makes their covariance matrix diagonal [2]. Knowing these values for each stock in our portfolio, a variety of different weightings can be applied in order to assess the potential risk-return trade-off and construct the desired portfolio. For a portfolio with weight vector w, the risk measures are further broken down into systematic risk, denoted by:

    $$\text{systematic risk} = w'\,B'\,\Omega\,B\,w \qquad \text{Eqtn (11)}$$

    and specific risk, which is arrived at by subtracting the systematic risk from the total risk:

    $$\text{specific risk} = w'\,V\,w - w'\,B'\,\Omega\,B\,w \qquad \text{Eqtn (12)}$$

    This model has been adapted and undertaken for the purposes of this report; refer to the Results section for further detail.
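    The decomposition in Eqtns (11) and (12) can be sketched in Python, working in variance terms. The example is hypothetical: a two-stock, one-component model whose total covariance V is built as the systematic part plus stock-specific variances, so the specific risk of the equally weighted portfolio should come out as the weighted specific variances. The intermediate `port_betas` are the portfolio's betas on each component, B w.

```python
def quad_form(w, M):
    """w' M w for a weight vector w and a square matrix M."""
    n = len(w)
    return sum(w[i] * M[i][j] * w[j] for i in range(n) for j in range(n))


def portfolio_risk(w, V, B, pc_variances):
    """Split the portfolio variance w'Vw into systematic (Eqtn 11) and specific (Eqtn 12) parts."""
    # Portfolio beta on each component: B w (one number per principal component).
    port_betas = [sum(Bj[i] * w[i] for i in range(len(w))) for Bj in B]
    # With Omega diagonal, w'B'Omega B w = sum_j (portfolio beta_j)^2 * V(P_j).
    systematic = sum(b * b * v for b, v in zip(port_betas, pc_variances))
    total = quad_form(w, V)
    return port_betas, total, systematic, total - systematic


# Hypothetical two-stock example: one component plus stock-specific variances.
B = [[1.0, 0.8]]
pc_variances = [0.0004]
specific_var = [0.0001, 0.0002]
V = [[1.0 * 1.0 * 0.0004 + specific_var[0], 1.0 * 0.8 * 0.0004],
     [0.8 * 1.0 * 0.0004, 0.8 * 0.8 * 0.0004 + specific_var[1]]]
w = [0.5, 0.5]                           # equally weighted

port_betas, total, systematic, specific = portfolio_risk(w, V, B, pc_variances)
print(port_betas, total, systematic, specific)
```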

    MatLab Used in Analysing Data

    When running the code developed for this project, the following steps were addressed:

    1) The data were made stationary by converting price movements to returns for the individual stocks.
    2) The princomp function (COEFF = princomp(X)) was used to perform PCA on the covariance matrix, which is essential to produce a linear factor model. The means are also subtracted during this process.
    3) Using the regstats function, the individual stocks were regressed on the newly obtained uncorrelated risk factors, noting the $R^2$ value at each stage.
    4) The T x n matrix was recalculated as the T x n matrix of principal components times the (transposed) n x n orthogonal matrix of eigenvectors.
    5) The mean subtracted by the princomp function was added back.
    6) The variance-covariance matrix of the input stock returns was compared with the PCA-derived version.
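    The recalculation and mean add-back steps above rely on the eigenvector matrix W being orthogonal: if the centred returns D are transformed into component scores S = DW, then SW' plus the means reproduces the original returns exactly when all n components are kept. The Python sketch below checks this identity with hypothetical returns; as an assumption, a 2 x 2 rotation matrix stands in for the eigenvector matrix that princomp would return, since the identity holds for any orthogonal W.

```python
import math


def matmul(A, B):
    """Plain matrix product A B for lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


# Hypothetical T x n returns (T = 4 days, n = 2 stocks), illustration only.
X = [[0.010, 0.012], [-0.020, -0.018], [0.015, 0.017], [0.003, -0.001]]
T, n = len(X), len(X[0])
mu = [sum(row[j] for row in X) / T for j in range(n)]
D = [[row[j] - mu[j] for j in range(n)] for row in X]    # means subtracted

# Any orthogonal n x n matrix satisfies the identity; a rotation stands in for W here.
t = math.radians(40.0)
W = [[math.cos(t), -math.sin(t)],
     [math.sin(t), math.cos(t)]]

scores = matmul(D, W)                                    # T x n component scores
rebuilt = [[v + mu[j] for j, v in enumerate(row)]        # add the means back
           for row in matmul(scores, transpose(W))]      # scores times W'
max_error = max(abs(rebuilt[i][j] - X[i][j]) for i in range(T) for j in range(n))
print(max_error)
```

    Keeping only the first k < n columns of W in the same calculation gives the reduced-dimension approximation of the returns instead of an exact reconstruction.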

    Empirical Analysis

    We again refer to the data described in Figures 1 to 4 in the Data section, which form a large part of our empirical analysis. They illustrate the extent of the correlation in the ten-stock multi-variable system over the sample period, which is fundamental to establishing the principal components. The analysis performed on the covariance matrix (Figure 3) is detailed below in Figure 5.

    Figure 5

    Figure 5 shows how the data have been projected, through the orthogonal transformation defined by the eigenvectors, onto the subspace spanned by the eigenvectors corresponding to the largest eigenvalues; this is where the decorrelation of the data takes place.

    For the DJIA dataset in question, the three largest principal components were retained, which together explain 83.57% of the variance. More or fewer components could have been selected to gain more or less comfort, but the recommended range is between 70% and 90%: less than 70% offers little insight, while more than 90% picks up too much noise. The other eigenvalues were discarded because they were small, and doing so reduced the final dimensions of the data. When interpreting the first principal component of asset changes, the more highly correlated the system, the more similar the values of the elements of the first eigenvector. This tells us that if the first principal component changes while the other components remain fixed, the returns will all move by a similar amount. As is evident from Figure 5, the majority of stocks have a PC1 value around the 0.2 mark, so PC1 appears to capture a common trend in the data. No such pattern appears to be present in the PC2 and PC3 values. I feel that this percentage still offers a relatively high level of confidence that the three risk factors are representative of the stocks' variance. This leaves the three-component representation as:

    $$\begin{aligned}
    R_1 &\approx 0.0251593\,P_1 - 0.02747\,P_2 + 0.052639\,P_3 \\
    R_2 &\approx 0.294086\,P_1 - 0.03233\,P_2 + 0.00757\,P_3 \\
    R_3 &\approx 0.414202\,P_1 + 0.249442\,P_2 + 0.22005\,P_3 \\
    R_4 &\approx 0.4697\,P_1 - 0.78769\,P_2 + 0.282537\,P_3 \\
    R_5 &\approx 0.203308\,P_1 + 0.204774\,P_2 - 0.01967\,P_3 \\
    R_6 &\approx 0.385474\,P_1 - 0.06655\,P_2 - 0.30512\,P_3 \\
    R_7 &\approx 0.365547\,P_1 + 0.339575\,P_2 + 0.146143\,P_3 \\
    R_8 &\approx 0.200306\,P_1 + 0.215904\,P_2 + 0.114223\,P_3 \\
    R_9 &\approx 0.220781\,P_1 + 0.263256\,P_2 + 0.078486\,P_3 \\
    R_{10} &\approx 0.213582\,P_1 + 0.194774\,P_2 + 0.093398\,P_3
    \end{aligned}$$

    where $R_i$ represents the T x 1 vector of returns on the ith stock and $P_j$ is the jth principal component.

    During the period in question, we have noted periods of huge market turmoil during which all correlations tended towards one, as is evident from Figure 2. This contrasts with the observations in Figure 1 over a relatively stable time in stock markets. Such periods affect the eigenvectors by altering their values so that the patterns noted become more pronounced as the correlations become stronger; both positive and negative correlations have this effect. Over time it has been seen that bad news has a far stronger effect on equity markets than equivalent positive news. Factors such as the leverage effect and investor fear lead to mass sell-offs and panic, and it is at times like these that we note the strongest correlations in equity markets. From the data in Figure 5, we can conclude that there was above-average volatility over the period in the stocks of Chevron (CVX), Boeing (BA) and Exxon Mobil (XOM), which had PC1 values of 0.414202, 0.385474 and 0.365547 respectively, compared with a mean PC1 value for the stocks of 0.283560.

    Results Observed from Analysis

    The full listing of all factor sensitivities and the percentage of variance explained by each of them is detailed below in Figure 6. As discussed previously, I consider it appropriate to use three principal component factors to describe the variance present, as together they account for 83.57% of the variance.
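    The choice of the number of components can be made mechanical by accumulating the eigenvalue shares. The Python sketch below uses hypothetical eigenvalues, not those in Figure 6, and the 70% and 85% targets simply illustrate the 70% to 90% range discussed earlier; the report's choice of three components also reflected judgement.

```python
def variance_explained(eigenvalues):
    """Share and cumulative share of total variance for eigenvalues sorted largest first."""
    total = sum(eigenvalues)
    shares = [lam / total for lam in eigenvalues]
    running, cumulative = 0.0, []
    for lam in eigenvalues:
        running += lam
        cumulative.append(running / total)
    return shares, cumulative


def components_needed(eigenvalues, target):
    """Smallest number of leading components whose cumulative share reaches `target`."""
    _, cumulative = variance_explained(eigenvalues)
    for k, c in enumerate(cumulative, start=1):
        if c >= target:
            return k
    return len(eigenvalues)


# Hypothetical eigenvalues, sorted largest first (not the values in Figure 6).
eigenvalues = [6.0, 2.0, 1.0, 0.5, 0.5]
shares, cumulative = variance_explained(eigenvalues)
print(cumulative)
print(components_needed(eigenvalues, 0.70), components_needed(eigenvalues, 0.85))
```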

    Figure 6

    Each successive principal component, from one to ten, explains less of the variance, and apart from principal component one there is little pattern observable in the figures. From this we can infer that there are few common trends in the components applicable to each of the stocks. Having selected the first three principal components for further analysis, the individual stocks in question are now regressed on them, as described above in the MatLab work performed. To evaluate Eqtn (8):

    $$\text{est. } V(R) = B'\,\Omega\,B \qquad \text{Eqtn (8)}$$

    I have calculated the estimated factor betas in Figure 7:

    Figure 7

    The transpose of these values is also required as per figure 8:

    Figure 8

    The $\Omega$ element of the equation is also required. As the principal components are orthogonal, it is a diagonal matrix, as per Figure 9:

    Figure 9

    With these values we can evaluate the equation, and by doing so we obtain the systematic covariance of stock returns shown in Figure 10:

    Figure 10

    Comparing the initial covariance matrix (Figure 3) with the PCA-derived matrix, we noted the following differences in values:

    Figure 11

    There is a wide variety of departures from the initial covariance in Figure 3. As covariance measures how much variables change together, we can conclude that when the PCA factors are treated as the sole determinants of variance and all other factors are discarded, the changes in the covariance relationships between the stocks shown in Figure 11 can be attributed to the reduction in dimensionality and the focus of risk on the principal components identified. There are noticeable changes between KO and IBM and between MMM and IBM, which may indicate that part of their covariance was attributable to the discarded principal components.

    Our regressions produced the results shown in Figure 12:

    Figure 12

    Taking the $R^2$ value as a measure of how well a regression line approximates the real data points (the closer it is to 1, the greater the model's ability to capture the trend), we note a range of values from 0.612 for MCD, which offers little trend-forecasting ability, to 0.998 for BA, which provides a clear window into potential movements.
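    For reference, $R^2$ is one minus the ratio of the residual sum of squares to the total sum of squares about the mean. A minimal Python sketch with hypothetical observed and fitted values (returns in per cent, illustration only):

```python
def r_squared(y, fitted):
    """R^2 = 1 - SSR / SST: share of the variation in y about its mean explained by the fit."""
    y_bar = sum(y) / len(y)
    ssr = sum((a - f) ** 2 for a, f in zip(y, fitted))   # residual sum of squares
    sst = sum((a - y_bar) ** 2 for a in y)                # total sum of squares
    return 1.0 - ssr / sst


# Hypothetical observed returns and fitted values from a factor regression.
y = [1.0, 2.0, 3.0, 4.0]
fitted = [1.1, 1.9, 3.2, 3.8]
print(r_squared(y, fitted))
```

    This is the in-sample fit of the regression; a value near 1, as for BA, means the retained components reproduce almost all of that stock's return variation over the sample.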

    As a rule of thumb, a t-value greater than +2 or less than -2 indicates that a coefficient is significantly different from zero at roughly the 5% level. The higher the absolute t-value, the greater the confidence we have in the coefficient as a predictor, while low t-values indicate low reliability of that coefficient's predictive power. This is consistent with the values noted and with the example of MCD and BA mentioned previously, which have t-values of 26.27 and 419.92 respectively. As expected, PC1 has the most predictive power, since it explains the majority of the variation. PC2 and PC3 offer varying levels of predictive power, which do not follow the same pattern as the explanatory power of the eigenvalues in Figure 6.
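    A t-value is the estimated coefficient divided by its standard error. With zero-mean, mutually orthogonal regressors, as principal components are, the standard error of each beta has a simple closed form. The Python sketch below uses hypothetical, exactly orthogonal factor series and flags which coefficients pass the |t| > 2 rule of thumb; it illustrates the calculation only and is not the report's regstats output.

```python
import math


def factor_t_values(r, P):
    """t-values of the factor betas from an OLS of r on zero-mean, mutually orthogonal factors P.

    With such regressors X'X is diagonal, so se(beta_j) = sqrt(s^2 / sum(P_jt^2)),
    where s^2 = SSR / (T - k - 1) is the residual variance.
    """
    T, k = len(r), len(P)
    alpha = sum(r) / T
    betas = [sum(a * b for a, b in zip(r, Pj)) / sum(b * b for b in Pj) for Pj in P]
    resid = [r[t] - alpha - sum(b * Pj[t] for b, Pj in zip(betas, P)) for t in range(T)]
    s2 = sum(e * e for e in resid) / (T - k - 1)
    return [b / math.sqrt(s2 / sum(x * x for x in Pj)) for b, Pj in zip(betas, P)]


# Hypothetical zero-mean, mutually orthogonal factor series (illustration only).
P1 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
P2 = [1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0]
noise = [0.01 * a * b for a, b in zip(P1, P2)]   # orthogonal to the constant, P1 and P2
r = [0.002 + 0.5 * a - 0.005 * b + e for a, b, e in zip(P1, P2, noise)]

t_values = factor_t_values(r, [P1, P2])
print([round(t, 2) for t in t_values])
print([abs(t) > 2 for t in t_values])            # rule-of-thumb significance
```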

    When conducting our analysis, we have performed it on two portfolios, one being equally weighted and the

    second weighted by market capitalization per figure 14. The market capitalization was derived from the last

    day of our sample (6-Oct-2010) based on shares in issue multiplied by share price on that day.
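    The market-capitalisation weights follow directly from shares in issue multiplied by share price. A minimal Python sketch with hypothetical figures for three stocks (the report's weights are in Figure 14):

```python
def market_cap_weights(shares_in_issue, prices):
    """Weights proportional to market capitalisation (shares in issue x share price)."""
    caps = [s * p for s, p in zip(shares_in_issue, prices)]
    total = sum(caps)
    return [c / total for c in caps]


def equal_weights(n):
    """Equal weight 1/n in each of n stocks."""
    return [1.0 / n] * n


# Hypothetical shares in issue (millions) and prices on the final sample day.
shares = [1000.0, 2000.0, 500.0]
prices = [50.0, 25.0, 200.0]
print(market_cap_weights(shares, prices))   # caps of 50,000, 50,000 and 100,000
print(equal_weights(3))
```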

    Figure 13 Figure 14

    Figure 13 provides the breakdown of both portfolios in terms of principal components. The total risk, measured as the sum of the components, is far larger for the market-weighted portfolio (0.5365) than for the equally weighted portfolio (0.3643), making the equally weighted portfolio the more attractive potential investment; this is largely considered best practice over the short term [ ]. The more prominent stocks, such as XOM, carry larger beta coefficients, which are explained to a greater extent by PC2 and PC3. This is consistent with Figure 7, which details the beta values for each principal component.

    Other risk metrics calculated are displayed in figure 15:

    Figure 15

    From these, our indication from the beta values that the equally weighted portfolio may be the better investment opportunity is supported, as variance and volatility are again higher in the market-weighted portfolio; these are a suitable gauge of risk for the portfolios in question. There is minimal difference between the portfolio variances and the systematic variances, and the portfolio and systematic volatilities are identical. As these results have not been computed for the index as a whole, we have no benchmark against which to compare the figures. These figures can be expressed in dollar terms by multiplying the betas and the volatility by the total value of the portfolio. On inspection of the risk results, intuition suggests that these figures are quite low, in particular the specific variance and specific volatility figures. That said, the figures are not implausible.
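    The conversion to dollar terms is a simple scaling, sketched below in Python with a hypothetical $1,000,000 portfolio. It assumes the variance is in the same daily units as the return data, so the result is a daily dollar volatility.

```python
import math


def dollar_risk(portfolio_value, variance, betas):
    """Express a portfolio's volatility and factor betas in currency units."""
    volatility = math.sqrt(variance)                 # volatility = sqrt(variance)
    return portfolio_value * volatility, [portfolio_value * b for b in betas]


# Hypothetical inputs: a $1,000,000 portfolio with variance 0.0004 and two factor betas.
dollar_vol, dollar_betas = dollar_risk(1_000_000.0, 0.0004, [0.9, -0.1])
print(dollar_vol, dollar_betas)
```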

    From the work undertaken we can conclude that only a small number of principal components is needed to explain the variance, with the vast majority (70%) of it in PC1, which is sufficient to offer insight on its own. Three components were chosen to offer extra insight for the purposes of this project. Once PCA has been applied to a covariance matrix, focusing risk on the principal components identified changes the matrix values, depending on the extent to which those components accounted for the variance in the first place. We also noted, on comparing the equally weighted portfolio with the market-weighted portfolio, that there is less risk in the equally weighted portfolio, which would therefore be the preferred investment choice.

    Conclusion

    This report has illustrated the practical grounding of the PCA statistical tool in risk management and portfolio optimisation. The factor model's representation of each return series as a linear function of the principal components offers realistic, understandable insight into patterns in the data while limiting the dimensions of that data. The reduction in risk management workload from dimensionality reduction, and the computational efficiency gained in the daily measurement of portfolio risk by selecting a small number of variables capable of explaining a large portion of risk, are of huge benefit to modern finance.

    Appendices

    Appendix 1

    List of Companies making up top ten stocks in DJIA:

    1) IBM (IBM)
    2) 3M (MMM)
    3) Chevron Corporation (CVX)
    4) Caterpillar (CAT)
    5) McDonald's (MCD)
    6) Boeing (BA)
    7) Exxon Mobil (XOM)
    8) Johnson & Johnson (JNJ)
    9) Procter & Gamble (PG)
    10) Coca-Cola (KO)

    Bibliography

    [1] Soto, Gloria M. Using Principal Component Analysis to Explain Term Structure Movements: Performance and Stability, 2004. Available at SSRN: http://ssrn.com/abstract=985404

    [2] Alexander, C. Practical Financial Econometrics: Market Risk Analysis, Vol. 2. John Wiley and Sons, 2009.

    [3] Blanco, C. Multi-Factor Models for Forward Curve Analysis: An Introduction to Principal Component Analysis. Financial Engineering Associates, 2002.

    [4] Alexander, C. Multi-Factor Models for Forward Curve Analysis: An Introduction to Principal Component Analysis. ISMA Centre, University of Reading, 2001.

    [5] Alexander, C. Market Models: A Guide to Financial Data Analysis. John Wiley and Sons, 2001.

    [6] Alexander, C. Quantitative Methods in Finance. John Wiley and Sons, 2009.

    [7] Meric, I. Co-Movements of European Equity Markets Before and After the 1987 Crash. Multinational Finance Society, n.d.