algo trading hidden markov models on foreign exchange data

Upload: moon

Post on 07-Aug-2018

225 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    1/95

    Master’s Thesis

    Algorithmic TradingHidden Markov Models on Foreign Exchange Data

    Patrik Idvall, Conny Jonsson

    LiTH - MAT - EX - - 08 / 01 - - SE

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    2/95

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    3/95

    Algorithmic TradingHidden Markov Models on Foreign Exchange Data

    Department of Mathematics, Linköpings Universitet

    Patrik Idvall, Conny Jonsson

    LiTH - MAT - EX - - 08 / 01 - - SE

    Master’s Thesis:  30 hp

    Level:   A

    Supervisor:  J. Blomvall,

    Department of Mathematics, Linköpings Universitet

    Examiner:  J. Blomvall,Department of Mathematics, Linköpings Universitet

    Linköping: January 2008

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    4/95

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    5/95

    Matematiska Institutionen

    581 83 LINK ÖPING

    SWEDEN

    January 2008

    x x

    http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-10719

    LiTH - MAT - EX - - 08 / 01 - - SE

    Algorithmic Trading – Hidden Markov Models on Foreign Exchange Data

    Patrik Idvall, Conny Jonsson

    In this master’s thesis, hidden Markov models (HMM) are evaluated as a tool for forecasting movements in a currencycross. With an ever increasing electronic market, making way for more automated trading, or so called algorithmictrading, there is constantly a need for new trading strategies trying to find alpha, the excess return, in the market.HMMs are based on the well-known theories of Markov chains, but where the states are assumed hidden, governingsome observable output. HMMs have mainly been used for speech recognition and communication systems, but havelately also been utilized on financial time series with encouraging results. Both discrete and continuous versions of the model will be tested, as well as single- and multivariate input data.In addition to the basic framework, two extensions are implemented in the belief that they will further improve theprediction capabilities of the HMM. The first is a Gaussian mixture model (GMM), where one for each state assign

    a set of single Gaussians that are weighted together to replicate the density function of the stochastic process. Thisopens up for modeling non-normal distributions, which is often assumed for foreign exchange data. The second is anexponentially weighted expectation maximization (EWEM) algorithm, which takes time attenuation in consideration

    when re-estimating the parameters of the model. This allows for keeping old trends in mind while more recentpatterns at the same time are given more attention.Empirical results shows that the HMM using continuous emission probabilities can, for some model settings,generate acceptable returns with Sharpe ratios well over one, whilst the discrete in general performs poorly. TheGMM therefore seems to be an highly needed complement to the HMM for functionality. The EWEM howeverdoes not improve results as one might have expected. Our general impression is that the predictor using HMMs that

    we have developed and tested is too unstable to be taken in as a trading tool on foreign exchange data, with toomany factors influencing the results. More research and development is called for.

    Algorithmic Trading, Exponentially Weighted Expectation Maximization Algorithm, Foreign

    Exchange, Gaussian Mixture Models, Hidden Markov Models

    Nyckelord

    Keyword

    Sammanfattning

    Abstract

    F örfattare

    Author

    Titel

    Title

    URL f  ör elektronisk version

    Serietitel och serienummer

    Title of series, numbering

    ISSN

    ISRN

    ISBNSpråk

    Language

    Svenska/Swedish

    Engelska/English

    Rapporttyp

    Report category

    Licentiatavhandling

    Examensarbete

    C-uppsats

    D-uppsats

    Övrig rapport

    Avdelning, InstitutionDivision, Department

    DatumDate

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    6/95

    vi

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    7/95

    Abstract

    In this master’s thesis, hidden Markov models (HMM) are evaluated as a tool for fore-

    casting movements in a currency cross. With an ever increasing electronic market,

    making way for more automated trading, or so called algorithmic trading, there is con-

    stantly a need for new trading strategies trying to find alpha, the excess return, in themarket.

    HMMs are based on the well-known theories of Markov chains, but where the states

    are assumed hidden, governingsome observable output. HMMs have mainly been used

    for speech recognition and communication systems, but have lately also been utilized

    on financial time series with encouraging results. Both discrete and continuous versions

    of the model will be tested, as well as single- and multivariate input data.

    In addition to the basic framework, two extensions are implemented in the be-

    lief that they will further improve the prediction capabilities of the HMM. The first

    is a Gaussian mixture model (GMM), where one for each state assign a set of single

    Gaussians that are weighted together to replicate the density function of the stochas-

    tic process. This opens up for modeling non-normal distributions, which is often as-

    sumed for foreign exchange data. The second is an exponentially weighted expectationmaximization (EWEM) algorithm, which takes time attenuation in consideration when

    re-estimating the parameters of the model. This allows for keeping old trends in mind

    while more recent patterns at the same time are given more attention.

    Empirical results shows that the HMM using continuous emission probabilities can,

    for some model settings, generate acceptable returns with Sharpe ratios well over one,

    whilst the discrete in general performs poorly. The GMM therefore seems to be an

    highly needed complement to the HMM for functionality. The EWEM however does

    not improve results as one might have expected. Our general impression is that the

    predictor using HMMs that we have developed and tested is too unstable to be taken

    in as a trading tool on foreign exchange data, with too many factors influencing the

    results. More research and development is called for.

    Keywords:  Algorithmic Trading, Exponentially Weighted Expectation MaximizationAlgorithm, Foreign Exchange, Gaussian Mixture Models, Hidden Markov Mod-

    els

    Idvall, Jonsson, 2008. vii

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    8/95

    viii

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    9/95

    Acknowledgements

    Writing this master’s thesis in cooperation with Nordea Markets has been a truly re-

    warding and exciting experience. During the thesis we have experienced great support

    and interest from many different individuals.

    First of all we would like to thank Per Brugge, Head of Marketing & Global Busi-ness Development at Nordea, who initiated the possibility of this master’s thesis and

    gave us the privilege to carry it out at Nordea Markets in Copenhagen. We thank him

    for his time and effort! Erik Alpkvist at e-Markets, Algorithmic Trading, Nordea Mar-

    kets in Copenhagen who has been our supervisor at Nordea throughout this project; it

    has been a true pleasure to work with him, welcoming us so warmly and for believing

    in what we could achieve, for inspiration and ideas!

    Jörgen Blomvall at Linköpings Universitet, Department of Mathematics has also

    been very supportive during the writing of this thesis. It has been a great opportunity

    to work with him! We would also like to thank our colleagues and opponents Viktor

    Bremer and Anders Samuelsson, for helpful comments during the process of putting

    this master’s thesis together.

    Patrik Idvall & Conny Jonsson

    Copenhagen, January 2008

    Idvall, Jonsson, 2008. ix

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    10/95

    x

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    11/95

    Nomenclature

    Most of the reoccurring symbols and abbreviations are described here.

    SymbolsS  = {s1, s2,...,sN }   a set of  N  hidden states,Q = {q 1, q 2,...,q T }   a state sequence of length T  taking values from S ,O = {o1, o2,...,oT }   a sequence consisting of  T  observations,A = {a11, a12,...,aNN }   the transition probability matrix A,B  =  bi(ot)   a sequence of observation likelihoods,Π = {π1, π2,...,πN }   the initial probability distribution,λ = {A,B, Π}   the complete parameter set of the HMM,αt(i)   the joint probability of  {o1, o2, . . . , ot} and q t   =

    si given λ,β t(i)   the joint probability of  {ot+1, ot+2, . . . , oT }  and

    q t  =  si given λ,γ t   the probability of  q t =  si given O and  λ,µim   the mean for state si and mixture component m,Σim   the covariance matrix for state   si   and mixture

    component m,ρ = {ρ1, ρ2, . . . , ρt}   a vector of real valued weights for the Exponen-

    tially Weighted Expectation Maximization,

    wim   the weights for the   mth mixture component instate si,

    ξ t(i, j)   the joint probability of  q t   =   si   and  q t+1   =   sjgiven O and  λ,

    δ t(i)   the highest joint probability of a state sequenceending in  q t   =   si  and a partial observation se-

    quence ending in ot given λ,ψt j   the state si at time t which gives us δ t( j), used for

    backtracking.

    Idvall, Jonsson, 2008. xi

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    12/95

    xii

    Abbreviations

     AT    Algorithmic Trading

    CAPM    Capital Asset Pricing Model

     EM    Expectation Maximization

     ERM    Exchange Rate Mechanism

     EWEM    Exponentially Weighted Expectation Maximization

     EWMA   Exponentially Weighted Moving Average

    FX    Foreign Exchange

    GMM    Gaussian Mixture Model

     HMM    Hidden Markov Model

     LIBOR   London Interbank Offered Rate

     MC    Monte Carlo

     MDD   Maximum DrawdownOTC    Over the Counter

    PPP   Purchasing Power Parity

    VaR   Value at Risk 

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    13/95

    Contents

    1 Introduction 1

    1.1 The Foreign Exchange Market . . . . . . . . . . . . . . . . . . . . . 1

    1.1.1 Market Structure . . . . . . . . . . . . . . . . . . . . . . . . 2

    1.2 Shifting to Electronic Markets . . . . . . . . . . . . . . . . . . . . . 3

    1.2.1 Changes in the Foreign Exchange Market . . . . . . . . . . . 4

    1.3 Algorithmic Trading . . . . . . . . . . . . . . . . . . . . . . . . . . 4

    1.3.1 Different Levels of Automation . . . . . . . . . . . . . . . . 5

    1.3.2 Market Microstructure . . . . . . . . . . . . . . . . . . . . . 6

    1.3.3 Development of Algorithmic Trading . . . . . . . . . . . . . 7

    1.4 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

    1.4.1 Purpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

    1.4.2 Purpose Decomposition . . . . . . . . . . . . . . . . . . . . 8

    1.4.3 Delimitations . . . . . . . . . . . . . . . . . . . . . . . . . . 9

    1.4.4 Academic Contribution . . . . . . . . . . . . . . . . . . . . . 91.4.5 Disposal . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

    2 Theoretical Framework 11

    2.1 Foreign Exchange Indices . . . . . . . . . . . . . . . . . . . . . . . . 11

    2.1.1 Alphas and Betas . . . . . . . . . . . . . . . . . . . . . . . . 12

    2.2 Hidden Markov Models . . . . . . . . . . . . . . . . . . . . . . . . . 13

    2.2.1 Hidden Markov Models used in Finance . . . . . . . . . . . . 13

    2.2.2 Bayes Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 14

    2.2.3 Markov Chains . . . . . . . . . . . . . . . . . . . . . . . . . 15

    2.2.4 Extending the Markov Chain to a Hidden Markov Model . . . 16

    2.2.5 Three Fundamental Problems . . . . . . . . . . . . . . . . . 17

    2.2.6 Multivariate Data and Continuous Emission Probabilities . . . 26

    2.3 Gaussian Mixture Models . . . . . . . . . . . . . . . . . . . . . . . . 26

    2.3.1 Possibilities to More Advanced Trading Strategies . . . . . . 28

    2.3.2 The Expectation Maximization Algorithm on Gaussian Mixtures 28

    2.4 The Exponentially Weighted Expectation Maximization Algorithm . . 30

    2.4.1 The Expectation Maximization Algorithm Revisited . . . . . 30

    2.4.2 Updating the Algorithm . . . . . . . . . . . . . . . . . . . . 31

    2.4.3 Choosing η   . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

    2.5 Monte Carlo Simulation . . . . . . . . . . . . . . . . . . . . . . . . 34

    2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

    Idvall, Jonsson, 2008. xiii

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    14/95

    xiv Contents

    3 Applying Hidden Markov Models on Foreign Exchange Data 373.1 Used Input Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

    3.2 Number of States and Mixture Components and Time Window Lengths 38

    3.3 Discretizing Continuous Data . . . . . . . . . . . . . . . . . . . . . . 38

    3.4 Initial Parameter Estimation . . . . . . . . . . . . . . . . . . . . . . 39

    3.5 Generating Different Trading Signals . . . . . . . . . . . . . . . . . . 40

    3.5.1 Trading Signal in the Discrete Case . . . . . . . . . . . . . . 40

    3.5.2 Standard Signal in the Continuous Case . . . . . . . . . . . . 41

    3.5.3 Monte Carlo Simulation Signal in the Continuous Case . . . . 42

    3.6 Variable Constraints and Modifications . . . . . . . . . . . . . . . . . 42

    3.7 An Iterative Procedure . . . . . . . . . . . . . . . . . . . . . . . . . 43

    3.8 Evaluating the Model . . . . . . . . . . . . . . . . . . . . . . . . . . 43

    3.8.1 Statistical Testing . . . . . . . . . . . . . . . . . . . . . . . . 43

    3.8.2 Sharpe Ratio . . . . . . . . . . . . . . . . . . . . . . . . . . 44

    3.8.3 Value at Risk . . . . . . . . . . . . . . . . . . . . . . . . . . 453.8.4 Maximum Drawdown . . . . . . . . . . . . . . . . . . . . . 45

    3.8.5 A Comparative Beta Index . . . . . . . . . . . . . . . . . . . 46

    4 Results 494.1 The Discrete Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

    4.1.1 Using Only the Currency Cross as Input Data . . . . . . . . . 50

    4.1.2 Adding Features to the Discrete Model . . . . . . . . . . . . 50

    4.2 The Continuous Model . . . . . . . . . . . . . . . . . . . . . . . . . 54

    4.2.1 Prediction Using a Weighted Mean of Gaussian Mixtures . . . 54

    4.2.2 Using Monte Carlo Simulation to Project the Distribution . . 55

    4.3 Including the Spread . . . . . . . . . . . . . . . . . . . . . . . . . . 60

    4.3.1 Filtering Trades . . . . . . . . . . . . . . . . . . . . . . . . . 62

    4.4 Log-likelihoods, Random Numbers and Convergence . . . . . . . . . 63

    5 Analysis 675.1 Effects of Different Time Windows . . . . . . . . . . . . . . . . . . . 67

    5.2 Hidden States and Gaussian Mixture Components . . . . . . . . . . . 68

    5.3 Using Features as a Support . . . . . . . . . . . . . . . . . . . . . . 68

    5.4 The Use of Different Trading Signals . . . . . . . . . . . . . . . . . . 69

    5.5 Optimal Circumstances . . . . . . . . . . . . . . . . . . . . . . . . . 70

    5.6 State Equivalence . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

    5.7 Risk Assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

    5.8 Log-likelihood Sequence Convergence . . . . . . . . . . . . . . . . . 71

    6 Conclusions 736.1 Too Many Factors Brings Instability . . . . . . . . . . . . . . . . . . 73

    6.2 Further Research Areas . . . . . . . . . . . . . . . . . . . . . . . . . 74

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    15/95

    List of Figures

    1.1 Daily turnover on the global FX market 1989-2007. . . . . . . . . . . 2

    1.2 Schematic picture over the process for trading . . . . . . . . . . . . . 5

    2.1 An example of a simple Markov chain . . . . . . . . . . . . . . . . . 15

    2.2 An example of a hidden Markov model . . . . . . . . . . . . . . . . 18

    2.3 The forward algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 20

    2.4 The backward algorithm . . . . . . . . . . . . . . . . . . . . . . . . 21

    2.5 Determining ξ   . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232.6 An example of a Gaussian mixture . . . . . . . . . . . . . . . . . . . 27

    2.7 A hidden Markov model making use of a Gaussian mixture . . . . . . 29

    2.8 The effect of  η   . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332.9 (a) Monte Carlo simulation using 1000 simulations over 100 time pe-

    riods. (b) Histogram over the final value for the 1000 simulations. . . 34

    3.1 Gaussian mixture together with a threshold d   . . . . . . . . . . . . . 413.2 Maximun drawdown for EURUSD . . . . . . . . . . . . . . . . . . . 46

    3.3 Comparative beta index . . . . . . . . . . . . . . . . . . . . . . . . . 47

    4.1 Development of the EURUSD exchange rate . . . . . . . . . . . . . . 50

    4.2 Discrete model with a 30 days window . . . . . . . . . . . . . . . . . 51

    4.3 Discrete model with a 65 days window . . . . . . . . . . . . . . . . . 51

    4.4 Discrete model with a 100 days window . . . . . . . . . . . . . . . . 52

    4.5 Discrete model with threshold of 0.4 . . . . . . . . . . . . . . . . . . 52

    4.6 Discrete model with six additional features . . . . . . . . . . . . . . . 53

    4.7 Discrete model with three additional features . . . . . . . . . . . . . 53

    4.8 Continuous model, using only the cross, and a 20 days window . . . . 55

    4.9 Continuous model, using only the cross, and a 30 days window . . . . 56

    4.10 Continuous model, using additional features, and a 50 days window . 564.11 Continuous model, using additional features, and a 75 days window . 57

    4.12 Continuous model, using additional features, and a 100 days window . 57

    4.13 Continuous model, using additional features, and a 125 days window . 58

    4.14 Continuous model with a 75 days EWEM . . . . . . . . . . . . . . . 58

    4.15 Continuous model with the K-means clustering . . . . . . . . . . . . 59

    4.16 Continuous model with the MC simulation (50%) trading signal . . . 59

    4.17 Continuous model with the MC simulation (55%) trading signal . . . 60

    4.18 Best discrete model with the spread included . . . . . . . . . . . . . . 61

    4.19 Best continuous model with the spread included . . . . . . . . . . . . 61

    4.20 Continuous model with standard trade filtering . . . . . . . . . . . . . 62

    Idvall, Jonsson, 2008. xv

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    16/95

    xvi List of Figures

    4.21 Continuous model with MC simulation (55%) trade filtering . . . . . 63

    4.22 (a) Log-likelihood sequence for the first 20 days for the discrete model

    (b) Log-likelihood sequence for days 870 to 1070 for the discrete model 64

    4.23 (a) Log-likelihood sequence for days 990 to 1010 for the continuous

    model (b) Log-likelihood sequence for the last 20 days for the contin-

    uous model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    17/95

    Chapter 1

    Introduction

    In this first chapter a description of the background to this master’s thesis will be given.

    It will focus on the foreign exchange market on the basis of its structure, participants

    and recent trends. The objective will be pointed out, considering details such as the

    purpose of the thesis and the delimitations that have been considered throughout the

    investigation. In the end of the objectives the disposal will be gone through, just to

    give the reader a clearer view of the thesis’ different parts and its mutual order.

    1.1 The Foreign Exchange Market

    The Foreign Exchange (FX) market is considered the largest and most liquid 1 of all

    financial markets with a $3.2 trillion daily turnover. Between 2004 and 2007 the dailyturnover has increased with as much as 71 percent. [19]

    The increasing growth in turnover over the last couple of years, seen in figure 1.1

    seems to be led by two related factors. First of all, the presence of trends and higher

    volatility in FX markets between 2001 and 2004, led to an increase of momentum trad-

    ing, where investors took large positions in currencies that followed appreciating trends

    and short positions in decreasing currencies. These trends also induced an increase in

    hedging activity, which further supported trading volumes. [20]

    Second, interest differentials encouraged so called carry trading, i.e. investments in

    high interest rate currencies financed by short positions in low interest rate currencies, if 

    the target currencies, like the Australian dollar, tended to appreciate against the funding

    currencies, like the US dollar. Such strategies fed back into prices and supported the

    persistence of trends in exchange rates. In addition, in the context of a global searchfor yield, so called real money managers2 and leveraged3 investors became increasingly

    interested in foreign exchange as an asset class alternative to equity and fixed income.

    As one can see in figure 1.1, the trend is also consistent from 2004 and forward,

    with more and more money put into the global FX market. The number of participants

    1The degree to which an asset or security can be bought or sold in the market without affecting the asset’s

    price.2Real money managers are market players, such as pension funds, insurance companies and corporate

    treasurers, who invest their own funds. This distinguishes them from leveraged investors, such as for examplehedge funds, that borrow substantial amounts of money.

    3Leverage is the use of various financial instruments or borrowed capital, such as margin, to increase thepotential return of an investment.

    Idvall, Jonsson, 2008. 1

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    18/95

    2 Chapter 1. Introduction

    1989 1992 1995 1998 2001 2004 20070

    500

    1000

    1500

    2000

    2500

    3000

    3500

    Year

       D  a   i   l  y   t  u  r  n  o  v  e  r   i  n  m   i   l   l   i  o  n   U   S   D

     

    Spot transactions

    Outright transactions

    Foreign exchange swaps

    Estimated gaps in reporting

    Figure 1.1: Daily return on the global FX market 1989-2007

    and the share of the participants portfolio towards FX are continuously increasing,

    comparing to other asset classes.

    1.1.1 Market Structure

    The FX market is unlike the stock market an over the counter (OTC) market. Thereis no single physical located place were trades between different players are settled,

    meaning that all participants do not have access to the same price. Instead the markets

    core is built up by a number of different banks. That is why it sometimes is called

    an inter-bank market. The market is opened 24 hours a day and moves according to

    activity in large exporting and importing countries as well as in countries with highly

    developed financial sectors. [19]

    The participants of the FX market can roughly be divided into the following five

    groups, characterized by different levels of access:

    •  Central Banks•  Commercial Banks•  Non-bank Financial Entities•  Commercial Companies•  Retail Traders

    Central banks have a significant influence in FX markets by virtue of their role in

    controlling their countries’ money supply, inflation, and/or interest rates. They may

    also have to satisfy official/unofficial target rates for their currencies. While they may

    have substantial foreign exchange reserves that they can sell in order to support their

    own currency, highly overt intervention, or the stated threat of it, has become less

    common in recent years. While such intervention can indeed have the desired effect,

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    19/95

    1.2. Shifting to Electronic Markets 3

    there have been several high profile instances where it has failed spectacularly, such as

    Sterling’s exit from the Exchange Rate Mechanism (ERM) in 1992.

    Through their responsibility for money supply, central banks obviously have a con-

    siderable influence on both commercial and investment banks. An anti-inflationary

    regime that restricts credit or makes it expensive has an effect upon the nature and level

    of economic activity, e.g. export expansion, that can feed through into changes in FX

    market activity and behaviour.

    At the next level we have commercial banks. This level constitute the inter-bank 

    section of the FX market and consist of participants such as Deutsche Bank, UBS AG,

    City Group and many others including Swedish banks such as Nordea, SEB, Swedbank 

    and Handelsbanken.

    Within the inter-bank market, spreads, which are the difference between the bid

    and ask prices, are sharp and usually close to non-existent. These counterparts act as

    market makers toward customers demanding the ability to trade currencies, meaning

    that they determine the market price.As you descend the levels of access, from commercial banks to retail traders, the

    difference between the bid and ask prices widens, which is a consequence of volume.

    If a trader can guarantee large numbers of transactions for large amounts, they can

    demand a smaller difference between the bid and ask price, which is referred to as a

    better spread. This also implies that the spread is wider for currencies with less frequent

    transactions. The spread has an important role in the FX market, more important than

    in the stock market, because it is equivalent to the transaction cost. [22]

    When speculating in the currency market you speculate in currency pairs, which

    describes the relative price of one currency relative to another. If you believe that

    currency A will strengthen against currency B  you will go long in currency pair A/B.The largest currencies is the global FX market is US Dollar (USD), Euro (EUR) and

    Japanese Yen (JPY). USD stands for as much as 86 percent of all transactions, followedby EUR (37 percent) and JPY (17 percent). 4 [19]

    1.2 Shifting to Electronic Markets

    Since the era of floating exchange rates began in the early 1970s, technical trading

    has become widespread in the stock market as well as the FX markets. The trend in

    the financial markets industry in general is the increased automation of the trading, so

    called algorithmic trading (AT), at electronic exchanges. [2]

    Trading financial instruments has historically required face-to-face contact between

    the market participants. The Nasdaq OTC market was one of the first to switch from

    physical interaction to technological solutions, and today many other market placeshas also replaced the old systems. Some markets, like the New York Stock Exchange

    (NYSE), still uses physical trading but has at the same time opened up some functions

    to electronic trading. It is of course innovations in computing and communications that

    has made this shift possible, with global electronic order routing, broad dissemination

    of quote and trade information, and new types of trading systems. New technology also

    reduces the costs of building new trading systems, hence lowering the entry barriers for

    new entrants. Finally, electronic markets also enables for a faster order routing and data

    transmission to a much larger group of investors than before. The growth in electronic

    4Because two currencies are involved in each transaction, the sum of the percentage shares of individualcurrencies totals 200 percent instead of 100 percent.

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    20/95

    4 Chapter 1. Introduction

    trade execution has been explosive on an international basis, and new stock markets are

    almost exclusively electronic. [11]

    1.2.1 Changes in the Foreign Exchange Market

    The FX market differs somewhat from stock markets. These have traditionally been

    dealer markets that operate over the telephone, and physical locations where trading

    takes place has been non-existing. In such a market, trade transparency is low. But

    for the last years, more and more of the FX markets has moved over to electronic

    trading. Reuters and Electronic Broking Service (EBS) developed two major electronic

    systems for providing quotes, which later on turned into full trading platforms, allowing

    also for trade execution. In 1998, electronic trading accounted for 50 percent of all

    FX trading, and it has continued rising ever since. Most of the inter-dealer trading

    nowadays takes place on electronic markets, while trading between large corporations

    and dealers still remain mostly telephone based. [11] Electronic trading platforms forcompanies have however been developed, like Nordea e-Markets, allowing companies

    to place and execute trades without interaction from a dealer.

    1.3 Algorithmic Trading

    During the last years, the trend towards electronic markets and automated trading has,

    as mentioned in section 1.2, been significant. As a part of this many financial firms has

    inherited trading via so called algorithms5, to standardize and automate their trading

    strategies in some sense. As a central theme in this thesis AT deserves further clari-

    fication and explanation. In this section different views of AT is given, to present a

    definition of the term.

    In general AT can be described as trading, with some elements being executed

    by an algorithm. The participation of algorithms enables automated employment of 

    predefined trading strategies. Trading strategies are automated by defining a sequence

    of instructions executed by a computer, with little or no human intervention. AT is the

    common denominator for different trends in the area of electronic trading that result in

    increased automation in:

    1. Identifying investment opportunities (what to trade).

    2. Executing orders for a variety of asset classes (when, how and where to trade).

    This includes a broad variety of solutions employed by traders in different markets,

    trading different assets. The common denominator for the most of these solutions is

    the process from using data series for pre-trade analysis to final trade execution. Theexecution can be made by the computer itself or via a human trader. In figure 1.2 one

    can see a schematic figure over this process. This figure is not limited to traders using

    AT; rather it is a general description of some of the main elements of trading. The four

    steps presented in figure 1.2 defines the main elements of trading. How many of these

    steps that are performed by a computer is different between different traders and is a

    measure of how automated the process is. The first step includes analysis of market data

    as well as adequate external news. The analysis is often supported by computer tools

    such as spreadsheets or charts. The analysis is a very important step, which will end up

    5A step-by-step problem-solving procedure, especially an established, recursive computational procedurefor solving a problem in a finite number of steps. [21]

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    21/95

    1.3. Algorithmic Trading 5

    Figure 1.2: Schematic picture over the process for trading

    in a trading signal and a trading decision in line with the overlaying trading strategy.

    The last step of the process is the real execution of the trading decision. This can be

    made automatically by a computer or by a human trader. The trade execution contains

    an order, sent to the foreign exchange market and the response as a confirmation from

    the same exchange. [3]

    This simplified description might not hold for any specific trader taking into ac-count the many different kinds of players that exists on a financial market. However,

    it shows that the trading process can be divided into different steps that follow sequen-

    tially. If one are now to suggest how trading is automated, there is little difficulty in

    discuss the different steps being separately programmed into algorithms and executed

    by a computer. Of course this is not a trivial task, especially as the complex consid-

    erations previously made by humans have to be translated into algorithms executed by

    machines. One other obstacle that must be dealt with in order to achieve totally auto-

    mated trading is how to connect the different steps into a functional process and how to

    interface this with the external environment, in other words the exchange and the mar-

    ket data feed. In the next section different levels of AT will be presented. The levels

    are characterized by the number of stages, presented in figure 1.2, that are replaced by

    algorithms, performed by a machine.

    1.3.1 Different Levels of Automation

    A broad variety of definitions of AT is used, dependent on who is asked. The differ-

    ences is often linked to the level of technological skill that the questioned individual

    possess. Based on which steps that are automated and the implied level of human

    intervention in the trading process, different categories can be distinguished. It is im-

    portant to notice that the differences do not only consist of the number of steps auto-

    mated. There can also be differences between sophistication and performance within

    the steps. This will not be taken in consideration here though, focusing on the level of 

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    22/95

    6 Chapter 1. Introduction

    automation.

    Four different ways of defining AT are to be described, influenced by [3]. The first

    category, here named AT 1 presuppose that the first two steps are fully automated. Theysomewhat goes hand in hand because the pre-trade analysis often leads to a trading

    signal in one way or another. This means that the human intervention is limited to the

    two last tasks, namely the trading decision and the execution of the trade.

    The second category, AT 2  is characterized by an automation of the last step in thetrading process, namely the execution. The aim of execution algorithms is often to

    divide large trading volumes into smaller orders and thereby minimizing the adverse

    price impact a large order otherwise might suffer. It should be mentioned that different

    types of execution algorithms are often supplied by third party software, and also as a

    service to the buy-side investor of a brokerage. Using execution algorithms leaves the

    first three steps, analysis, trading signal and trading decision to the human trader.

    If one combines the first two categories a third variant of AT is developed, namely

    AT 3. AT 3 is just leaving the trading decision to the human trader, i.e. letting algorithmstaking care of step 1, 2 and 4.

    Finally, fully automated AT, AT 4, often referred to as black-box-trading, is obtainedif all four steps is replaced by machines performing according to algorithmically set

    decisions. This means that the human intervention is only control and supervision,

    programming and parameterizations. To be able to use systems, such as   AT 3   andAT 4, great skills are required, both when it comes to IT solutions and algorithmicdevelopment.

    Independent on what level of automation that is intended, one important issue is

    to be considered, especially if regarding high-frequency trading, namely the markets

    microstructure. The microstructure contains the markets characteristics when dealing

    with price, information, transaction costs, market design and other necessary features.

    In the next section a brief overview of this topic will be gone through, just to give thereader a feeling of its importance when implementing any level of algorithmic trading.

    1.3.2 Market Microstructure

    Market microstructure is the branch of financial economics that investigates trading and

    the organization of markets. [8] It is of great importance when trading is carried out on

    a daily basis or on an even higher frequency, where micro-based models, in contrary

    to macro-based, can account for a large part of variations in daily prices on financial

    assets. [4] And this is why the theory on market microstructure is also essential when

    understanding AT, taking advantage of for example swift changes in the market. This

    will not be gone through in-depth here, as it lies out of the this master’s thesis’ scope;

    the aim is rather to give a brief overview of the topic. Market microstructure mainly

    deals with four issues, which will be gone through in the following sections [5].

    Price formation and price discovery

    This factor focuses on the process by which the price for an asset is determined, and

    it is based on the demand and supply conditions for a given asset. Investors all have

    different views on the future prices of the asset, which makes them trade it at differ-

    ent prices, and price discovery is simply when these prices match and a trade takes

    place. Different ways of carrying out this match is through auctioning or negotiation.

    Quote-driven markets, as opposed to order-driven markets where price discovery takes

    place as just described, is where investors trade on the quoted prices set by the markets

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    23/95

    1.3. Algorithmic Trading 7

    makers, making the price discovery happen quicker and thus the market more price

    efficient.

    Transaction cost and timing cost

    When an investor trades in the market, he or she faces two different kinds of costs:

    implicit and explicit. The latter are those easily identified, e.g. brokerage fees and/or

    taxes. Implicit however are described as hard to identify and measure. Market impact

    costs relates to the price change due to large trades in a short time and timing costs to

    the price change that can occur between decision and execution. Since the competition

    between brokers has led to significant reduction of the explicit costs, enhancing the

    returns is mainly a question of reducing the implicit. Trading in liquid markets and

    ensuring fast executions are ways of coping with this.

    Market structure and design

    This factor focuses on the relationship between price determination and trading rules.

    These two factors have a large impact on price discovery, liquidity and trading costs,

    and refers to attributes of a market defined in terms of trading rules, which amongst oth-

    ers include degree of continuity, transparency, price discovery, automation, protocols

    and off-markets trading.

    Information and disclosure

    This factor focuses on the market information and the impact of the information on

    the behavior of the market participants. A well-informed trader is more likely to avoid

    the risks related to the trading, than one less informed. Although the theory of marketefficiency states that the market is anonymous and that all participants are equally in-

    formed, this is seldom the case which give some traders an advantage. When talking

    about market information, we here refer to information that has a direct impact on the

    market value of an asset.

    1.3.3 Development of Algorithmic Trading

    As the share of AT increases in a specific market, it provides positive feedback for

    further participants to automate their trading. Since algorithmic solutions benefits from

    faster and more detailed data, the operators in the market have started to offer these

    services on request from algorithmic traders. This new area of business makes the

    existing traders better of if they can deal with the increasing load of information. Theresult is that the non-automated competition will loose out on the algorithms even more

    than before. For this reason it is not bold to predict that as the profitability shifts in favor

    of AT, more trading will also shift in that direction. [3]

    With a higher number of algorithmic traders in the market, there will be an increas-

    ing competition amongst them. This will most certain lead to decreasing margins, tech-

    nical optimization and business development. Business development contains, amongst

    other things, innovation as firms using AT search for new strategies, conceptually dif-

    ferent from existing ones. Active firms on the financial market put a great deal of effort

    into product development with the aim to find algorithms, capturing excess return by

    unveiling the present trends in the FX market.

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    24/95

    8 Chapter 1. Introduction

    1.4 Objectives

    As the title of this master’s thesis might give away, an investigation of the use of hidden

    Markov models (HMM) as an AT tool for investments on the FX market will be carried

    out. HMMs is one among many other frameworks such as artificial neural networks

    and support vector machines that could constitute a base of a successful algorithm.

    The framework chosen to be investigated was given in advance from Algorithmic

    Trading at Nordea Markets in Copenhagen. Because of this initiation, other frame-

    works, as the two listed above, will not be investigated or compared to the HMMs

    throughout this study. Nordea is interested in how HMMs can be used as a base for

    developing an algorithm capturing excess return within the FX market, and from that

    our purpose do emerge.

    1.4.1 Purpose

    To evaluate the use of hidden Markov models as a tool for algorithmic trading on

     foreign exchange data.

    1.4.2 Purpose Decomposition

    The problem addressed in this study is, as we mentioned earlier, based on the insight

    that Nordea Markets is looking to increase its knowledge about tools for AT. This is

    necessary in order to be able to create new trading strategies for FX in the future. So,

    to be able to give a recommendation, the question to be answered is:  should Nordea

    use hidden Markov models as a strategy for algorithmic trading on foreign exchange

    data?To find an answer to this one have to come up with a way to see if the framework 

    of HMMs can be used to generate a rate of return that under risk adjusted manners

    exceeds the overall market return. To do so one have to investigate the framework of 

    HMMs in great detail to see how it could be applied to FX data. One also have to find

    a measure of market return to see if the algorithm is able to create higher return using

    HMMs. Therefore the main tasks is the following:

    1. Put together an index which reflects the market return.

    2. Try to find an algorithm using HMMs that exceeds the return given by the created

    index.

    3. Examine if the algorithm is stable enough to be used as a tool for algorithmic

    trading on foreign exchange data.

    If these three steps are gone through in a stringent manner one can see our purpose

    as fulfilled. The return, addressed in step one and two, is not only the rate of return

    itself. It will be calculated together with the risk of each strategy as the return-to-risk 

    ratio, also called Sharpe ratio described in detail in 3.8.2. The created index will be

    compared with well known market indices to validate its role as a comparable index

    for market return. To evaluate the models stability back-testing will be used. The

    models performance will be simulated using historical data for a fixed time period.

    The chosen data and time period is described in section 3.1.

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    25/95

    1.4. Objectives 9

    1.4.3 Delimitations

    The focus has during the investigation been set to a single currency cross, namely theEURUSD. For the chosen cross we have used one set of features given to us from

    Nordea Quantitative Research as supportive time series. The back testing period for

    the created models, the comparative index as well as the HMM, has been limited to

    the period for which adequate time series has been given, for both the chosen currency

    cross and the given features. Finally the trading strategy will be implemented to an

    extent containing step one and two in figure 1.2. It will not deal with the final trading

    decision or the execution of the trades.

    1.4.4 Academic Contribution

    This thesis is written from a financial engineering point of view, combining areas such

    as computer science, mathematical science and finance. The main academic contribu-tions of the thesis are the following:

    •  The master’s thesis is one of few applications using hidden Markov models ontime dependent sequences of data, such as financial time series.

    •  It is the first publicly available application of hidden Markov models on foreignexchange data.

    1.4.5 Disposal

    The rest of this master’s thesis will be organized as follows. In chapter 2, the theoretical

    framework will be reviewed in detail to give the reader a clear view of the theories used

    when evaluating HMM as a tool for AT. This chapter contains information about differ-ent trading strategies used on FX today as well as the theory of hidden Markov models,

    Gaussian mixture models (GMM), an exponentially weighted expectation maximiza-

    tion (EWEM) algorithm and Monte Carlo (MC) simulation.

    In chapter 3 the developed model is described in detail to clear out how the theories

    has been used in practice to create algorithms based on the framework of HMMs. The

    created market index will also be reviewed, described shortly to give a comparable

    measure of market return.

    Chapter 4 contains the results of the tests made for the developed models. Trajecto-

    ries will be presented and commented using different features and settings to see how

    the parameters affect the overall performance of the model. The models performance

    will be tested and commented using statistical tests and well known measurements.

    The analysis of the results are carried out in chapter 5, on the base of different

    findings made throughout the tests presented in chapter 4. The purpose of the analysisis to clear out the underlying background to the results one can see, both positive and

    negative.

    Finally, chapter 6 concludes our evaluation of the different models and the frame-

    work of HMMs as a tool for AT on FX. Its purpose is to go through the three steps

    addressed in section 1.4.2 to finally answer the purpose of the master’s thesis. This

    chapter will also point out the most adequate development needed to improve the mod-

    els further.

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    26/95

    10 Chapter 1. Introduction

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    27/95

    Chapter 2

    Theoretical Framework

    This chapter will give the reader the theoretical basis needed to understand HMMs,

    i.e. the model to be evaluated in this master’s thesis. Different variants, like the one

    making use of GMMs, and improvements to the HMM, that for example takes time

    attenuation in consideration, will also be presented. The chapter however starts with

    some background theories regarding market indices and FX benchmarks, which will

    constitute the theoretical ground for the comparative index.

    2.1 Foreign Exchange Indices

    Throughout the years many different trading strategies have been developed to cap-

    ture return from the FX market. As one finds in any asset class, the foreign exchange

    world contains a broad variety of distinct styles and trading strategies. For other as-

    set classes it is easy to find consistent benchmarks, such as indices like Standard and

    Poor’s 500 (S&P 500) for equity, Lehmans Global Aggregate Index (LGAI) for bonds

    and Goldman Sachs Commodity Index (GSCI) for commodities. It is harder to find a

    comparable index for currencies.

    When viewed as a set of trading rules, the accepted benchmarks of other asset

    classes indicate a level of subjectivity that would not otherwise be apparent. In fact,

    they really reflect a set of transparent trading rules of a given market. By being widely

    followed, they become benchmarks. By looking at benchmarks from this perspective

    there is no reason why there should not exist an applicable benchmark for currencies.

    [6]

    The basic criteria for establishing a currency benchmark is to find approaches that

    are widely known and followed to capture currency return on the global FX market.In march 2007 Deutsche Bank unveiled their new currency benchmark, The Deutsche

    Bank Currency Return (DBCR) Index. Their index contains a mixture of three strate-

    gies, namely Carry, Momentum and Valuation. These are commonly accepted indices

    that also other large banks around the world make use of.

    Carry  is a strategy in which an investor sells a certain currency with a relatively

    low interest rate and uses the funds to purchase a different currency yielding a higher

    interest rate. A trader using this strategy attempts to capture the difference between the

    rates, which can often be substantial, depending on the amount of leverage the investor

    chooses to use.

    To explain this strategy in more detail an example of a ”JPY carry trade” is here

    Idvall, Jonsson, 2008. 11

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    28/95

    12 Chapter 2. Theoretical Framework  

    presented. Lets say a trader borrows 1 000 000 JPY from a Japanese bank, converts

    the funds into USD and buys a bond for the equivalent amount. Lets also assume that

    the bond pays 5.0 percent and the Japanese interest rate is set to 1.5 percent. The

    trader stands to make a profit of 3.5 percent (5.0 - 1.5 percent), as long as the exchange

    rate between the countries does not change. Many professional traders use this trade

    because the gains can become very large when leverage is taken into consideration. If 

    the trader in the example uses a common leverage factor of 10:1, then he can stand to

    make a profit of 35 percent.

    The big risk in a carry trade is the uncertainty of exchange rates. Using the example

    above, if the USD were to fall in value relative to the JPY, then the trader would run

    the risk of losing money. Also, these transactions are generally done with a lot of 

    leverage, so a small movement in exchange rates can result in big losses unless hedged

    appropriately. Therefore it is important also to consider the expected movements in the

    currency as well as the interest rate for the selected currencies.

    Another commonly used trading strategy is  Momentum, which is based on the ap-pearance of trends in the currency markets. Currencies appear to trend over time, which

    suggests that using past prices may be informative to investing in currencies. This is

    due to the existence of irrational traders, the possibility that prices provide information

    about non-fundamental currency determinants or that prices may adjust slowly to new

    information. To see if a currency has a positive or negative trend one have to calculate a

    moving average for a specific historical time frame. If the currency has a higher return

    during the most recent moving average it said to have a positive trend and vice versa.

    [6]

    The last trading strategy, described by Deutsche Bank, is  Valuation. This strategy

    is purely based on the fundamental price of the currency, calculated using Purchasing

    Power Parity (PPP). A purchasing power parity exchange rate equalizes the purchasing

    power of different currencies in their home countries for a given basket of goods. If for example a basket of goods costs 125 USD in US and a corresponding basket in

    Europe costs 100 EUR, the fair value of the exchange rate would be 1.25 EURUSD

    meaning that people in US and Europe have the same purchasing power. This is why

    it is believed that the currencies in the long run tend to revert towards their fair value

    based on PPP. But in short- to medium-run they might deviate somewhat from this

    equilibrium due to trade, information and other costs. These movements allows for

    profiting by buying undervalued currencies and selling overvalued. [6]

    2.1.1 Alphas and Betas

    The described strategies are often referred to as beta strategies, reflecting market return.

    Beside these there is a lot of other strategies, trying to find the alpha in the market.

    Alpha is a way of describing excess return, captured by a specific fund or individualtrader. Alpha is defined using Capital Asset Pricing Model (CAPM). CAPM for a

    portfolio is

    r p = rf  + β  p(rM  − rf )where rf   is the risk free rate,  β  p   =

      ρpM σpσM σ2M 

    the volatility of the portfolio relative

    some index explaining market return, and  (rM  − rf )   the market risk premia. Alphacan now be described as the excess return, comparing to CAPM, as follows

    α =  r∗ p − r p = r∗ p − (rf  + β  p(rm − rf ))where r∗ p  is the actual return given by the asset and  r p the return given by CAPM.

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    29/95

    2.2. Hidden Markov Models 13

    The above description of alpha is valid for more or less all financial assets. But

    when it comes to FX, it might sometimes be difficult to assign a particular risk free rate

    to the portfolio. One suggestion is simply to use the rates of the country from where

    the portfolio is being managed, but the most reoccurring method is to leave out the risk 

    free rate. This gives

    α =  r∗ p − r p =  r∗ p − β  prmwhere r∗ p  is adjusted for various transaction costs, where the most common is the costrelated to the spread.

    2.2 Hidden Markov Models

    Although initially introduced in the 1960’s, HMM first gained popularity in the late

    1980’s. There are mainly two reasons for this; first the models are very rich in mathe-

    matical structure and hence can form the theoretical basis for many applications. Sec-ond, the models, when applied properly, work very well in practice. [17] The term

    HMM is more familiar in the speech recognition community and communication sys-

    tems, but has during the last years gained acceptance in finance as well as economics

    and management science. [16]

    The theory of HMM deals with two things: estimation and control. The first in-

    clude signal filtering, model parameter identification, state estimation, signal smooth-

    ing, and signal prediction. The latter refers to selecting actions which effect the signal-

    generating system in such a way as to achieve ceratin control objectives. Essentially,

    the goal is to develop optimal estimation algorithms for HMMs to filter out the ran-

    dom noise in the best possible way. The use of HMMs is also motivated by empirical

    studies that favors Markov-switching models when dealing with macroeconomic vari-

    ables. This provides flexibility to financial models and incorporates stochastic volatil-ity in a simple way. Early works, proposing to have an unobserved regime following

    a Markov process, where the shifts in regimes could be compared to business cycles,

    stock prices, foreign exchange, interest rates and option valuation. The motive for a

    regime-switching model is that the market may switch from time to time, between for

    example periods of high and low volatility.

    The major part of this section is mainly gathered from [17] if nothing else is stated.

    This article is often used as a main reference by other authors due to its thorough

    description of HMMs in general.

    2.2.1 Hidden Markov Models used in Finance

    Previous applications in the field of finance where HMMs have been used range all

    the way from pricing of options and variance swaps and valuation of life insurancespolicies to interest rate theory and early warning systems for currency crises. [16] In

    [14] the author uses hidden Markov models when pricing bonds through considering

    a diffusion model for the short rate. The drift and the diffusion parameters are here

    modulated by an underlying hidden Markov process. In this way could the value of the

    short rate successfully be predicted for the next time period.

    HMMs has also, with great success, been used on its own or in combination with

    e.g. GMMs or artificial neural networks for prediction of financial time series, as equity

    indices such as the S&P 500. [18, 9] In these cases the authors has predicted the rate

    of return for the indices during the next time step and thereby been able to create an

    accurate trading signal.

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    30/95

    14 Chapter 2. Theoretical Framework  

    The wide range of applications together with the proven functionality, in both fi-

    nance and other communities such as speak recognition, and the flexible underlying

    mathematical model is clearly appealing.

    2.2.2 Bayes Theorem

    A HMM is a statistical model in which the system being modeled is assumed to be a

    Markov process with unknown parameters, and the challenge is to determine the hidden

    parameters from the observable parameters. The extracted model parameters can then

    be used to perform further analysis, for example for pattern recognition applications.

    A HMM can be considered as the simplest dynamic Bayesian network, which is a

    probabilistic graphical model that represents a set of variables and their probabilistic

    independencies, and where the variables appear in a sequence.

    Bayesian probability is an interpretation of the probability calculus which holds

    that the concept of probability can be defined as the degree to which a person (orcommunity) believes that a proposition is true. Bayesian theory also suggests that

    Bayes theorem can be used as a rule to infer or update the degree of belief in light of 

    new information.

    The probability of an event A conditional on event B is generally different from theprobability of  B  conditional on A. However, there is a definite relationship betweenthe two, and Bayes’ theorem is the statement of that relationship.

    To derive the theorem, the definition of conditional probability used. The probabil-

    ity of event A given B  is

    P (A|B) =   P (A ∩ B)P (B)

      .

    Likewise, the probability of event B, given event A is

    P (B|A) =   P (B ∩A)P (A)

      .

    Rearranging and combining these two equations, one find

    P (A|B)P (B) = P (A ∩ B) = P (B ∩A) = P (B|A)P (A).This lemma is sometimes called the product rule for probabilities. Dividing both sides

    by P (B), given P (B) = 0, Bayes’ theorem is obtained:

    P (A|B) =   P (B|A)P (A)P (B)

    There is also a version of Bayes’ theorem for continuous distributions. It is some-

    what harder to derive, since probability densities, strictly speaking, are not probabili-

    ties, so Bayes’ theorem has to be established by a limit process. Bayes’ theorem forprobability density functions is formally similar to the theorem for probabilities:

    f (x|y) =  f (x, y)f (y)

      =  f (y|x)f (x)

    f (y)

    and there is an analogous statement of the law of total probability:

    f (x|y) =   f (y|x)f (x) ∞−∞

    f (y|x)f (x)dx .

    The notation have here been somewhat abused, using  f  for each one of these terms,although each one is really a different function; the functions are distinguished by the

    names of their arguments.

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    31/95

    2.2. Hidden Markov Models 15

    2.2.3 Markov Chains

    A Markov chain, sometimes also referred to as an observed Markov Model, can be

    seen as a weighted finite-state automaton, which is defined by a set of states and a

    set of transitions between the states based on the observed input. In the case of the

    Markov chain the weights on the arcs going between the different states can be seen as

    probabilities of how likely it is that a particular path is chosen. The probabilities on the

    arcs leaving a node (state) must all sum up to 1. In figure 2.1 there is a simple example

    of how this could work.

    s1: Sunny

    s2: Cloudy

    s3: Rainy

      a   1   2  =

     0.   2

    a13   = 0.4  a   2   1  =

     0.   3

    a   2   3   =  

    0   . 5   

    a31   = 0.25

    a   3   2   =  

    0   . 2   5   

    a11 = 0.4

    a22 = 0.2

    a33 = 0.5

    Figure 2.1: A simple example of a Markov chain explaining the weather.  a ij, found onthe arcs going between the nodes, represents the probability of going from state s i   tostate sj .

    In the figure a simple model of the weather is set up, specified by the three states;

    sunny, cloudy and rainy. Given that the weather is rainy (state 3) on day 1 (t   = 1),what is the probability that the three following days will be sunny? Stated more formal,

    one have an observation sequence O  = {s3, s1, s1, s1} for  t  = 1, 2, 3, 4, and wish todetermine the probability of  O  given the model in figure 2.1. The probability is given

    by:

    P (O|Model) = P (s3, s1, s1, s1|Model) == P (s3) · P (s1|s3) · P (s1|s1) · P (s1|s1) == 1 · 0.25 · 0.4 · 0.4 = 0.04

    For a more formal description, the Markov chain is specified, as mentioned above,

    by:

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    32/95

    16 Chapter 2. Theoretical Framework  

    S  =

     {s1, s2,...,sN 

    }  a set of  N  states,

    A = {a11, a12,...,aNN }   a transition probability matrix A, where each a ijrepresents the probability of moving from state ito state j , with

    N j=1 aij = 1, ∀i,

    Π = {π1, π2,...,πN }   an initial probability distribution, where π i   indi-cates the probability of starting in state  i. Also,N 

    i=1 πi = 1.

    Instead of specifying  Π, one could use a special start node not associated withthe observations, and with outgoing probabilities a start,i   =  πi  as the probabilities of going from the start state to state   i, and  astart,start   =   ai,start   = 0, ∀i. The timeinstants associated with state changes are defined as as  t  = 1, 2,... and the actual stateat time t as q t.

    An important feature of the Markov chain is its assumptions about the probabilities.In a first-order Markov chain, the probability of a state only depends on the previous

    state, that is:

    P (q t|q t−1,...,q 1) = P (q t|q t−1)

    Markov chains, where the probability of moving between any two states are non-

    zero, are called fully-connected or ergodic. But this is not always the case; in for ex-

    ample a left-right (also called Bakis) Markov model there are no transitions going from

    a higher-numbered to a lower-numbered state. The way the trellis is set up depends on

    the given situation.

    The Markov Property Applied in Finance

    Stock prices are often assumed to follow a Markov process. This means that the present

    value is all that is needed for predicting the future, and that the past history and the

    taken path to today’s value is irrelevant. Considering that equity and currency are

    both financial assets, traded under more or less the same conditions, it would not seem

    farfetched assuming that currency prices also follows a Markov process.

    The Markov property of stock prices is consistent with the weak form of market

    efficiency, which states that the present price contains all information contained in

    historical prices. This implies that using technical analysis on historical data would

    not generate an above-average return, and the strongest argument for weak-form mar-

    ket efficiency is the competition on the market. Given the many investors following

    the market development, any opportunities rising would immediately be exploited andeliminated. But the discussion about market efficiency is heavily debated and the view

    presented here is seen somewhat from an academic viewpoint. [12]

    2.2.4 Extending the Markov Chain to a Hidden Markov Model

    A Markov chain is useful when one want to compute the probability of a particular

    sequence of events, all observable in the world. However, the events that are of interest

    might not be directly observable, and this is where HMM comes in handy.

    Extending the definition of Markov chains presented in the previous section gives:

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    33/95

    2.2. Hidden Markov Models 17

    S  =

     {s1, s2,...,sN 

    }  a set of  N  hidden states,

    Q = {q 1, q 2,...,q T }   a state sequence of length T  taking values from S ,O = {o1, o2,...,oT }   an   observation sequence  consisting of   T   obser-

    vations, taking values from the discrete alphabet

    V   = {v1, v2,...,vM },A = {a11, a12,...,aNN }   a  transition probability matrix A, where each a ij

    represents the probability of moving from state s ito state sj , with

    N j=1 aij , ∀i,

    B  =  bi(ot)   a sequence of observation likelihoods, also calledemission probabilities, expressing the probability

    of an observation ot  being generated from a statesi at time t,

    Π =

     {π1, π2,...,πN 

    }  an   initial probability distribution, where π i   indi-

    cates the probability of starting in state  s i. Also,N i=1 πi = 1.

    As before, the time instants associated with state changes are defined as as  t   =1, 2,...  and the actual state at time  t  as  q t. The notation λ  = {A,B, Π} is also intro-duced, which indicates the complete parameter set of the model.

    A first-order HMM makes two assumptions; first, as with the first-order Markov

    chain above, the probability of a state is only dependent on the previous state:

    P (q t|q t−1,...,q 1) = P (q t|q t−1)Second, the probability of an output observation  o t  is only dependent on the state

    that produced the observation, q t, and not on any other observations or states:

    P (ot|q t, q t−1,...,q 1, ot−1,...,o1) = P (ot|q t)To clarify what is meant by all this, a simple example based on one originally given

    by [13] is here presented. Imagine that you are a climatologist in the year 2799 and

    want to study how the weather was in 2007 in a certain region. Unfortunately you do

    not have any records for this particular region and time, but what you do have is the

    diary of a young man for 2007, that tells you how many ice creams he had every day.

    Stated more formal, you have an observation sequence, O = {o 1, . . . , o365}, whereeach observation assumes a value from the discrete alphabet  V    = {1, 2, 3}, i.e. thenumber of ice creams he had every day. Your task is to find the ”correct” hidden state

    sequence, Q  = {q 1, . . . , q  365}, with the possible states sunny (s1) and rainy (s2), thatcorresponds to the given observations. Say for example that you know that he had one

    ice cream at time t − 1, three ice creams at time  t  and two ice creams at time  t  + 1.The most probable hidden state sequence for these three days might for example be

    q t−1   =   s2,  q t   =   s1  and q t+1   =  s2, given the number of eaten ice creams. In otherwords, the most likely weather during these days would be rainy, sunny and rainy. An

    example of how this could look is presented in figure 2.2.

    2.2.5 Three Fundamental Problems

    The structure of the HMM should now be clear, which leads to the question: in what

    way can HMM be helpful? [17] suggests in his paper that HMM should be character-

    ized by three fundamental problems:

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    34/95

    18 Chapter 2. Theoretical Framework  

    s1

    s2

    s1

    s2

    s1

    s2

    ot−1 = 1

    ot−1 = 2

    ot−1 = 3

    ot  = 1

    ot  = 2

    ot  = 3

    ot+1 = 1

    ot+1 = 2

    ot+1 = 3

    bj(ot) = P (ot|q t  =  sj)

    aij(t) = P (q t+1 = sj |q t  = si)

    t

    Figure 2.2: A HMM example, where the most probable state path ( s 2, s1, s2) is out-lined. aij  states the probability of going from state  s i to state sj , and bj(ot) the proba-bility of a specific observation at time t, given state s j .

    Problem 1 - Computing likelihood:  Given the complete parameter set  λ  and an ob-servation sequence O, determine the likelihood P (O|λ).

    Problem 2 - Decoding:  Given the complete parameter set  λ  and an observation se-quence O, determine the best hidden sequence Q.

    Problem 3 - Learning:  Given an observation sequence  O  and the set of states in theHMM, learn the HMM λ.

    In the following three subsections these problems will be gone through thoroughly

    and how they can be solved. This to give the reader a clear view of the underlying cal-

    culation techniques that constitutes the base of the evaluated mathematical framework.

    It should also describe in what way the framework of HMMs can be used for parameter

    estimation in the standard case.

    Computing LikelihoodThis is an evaluation problem, which means that given a model and a sequence of ob-

    servations, what is the probability that the observations was generated by the model.

    This information can be very valuable when choosing between different models want-

    ing to know which one that best matches the observations.

    To find a solution to problem 1, one wish to calculate the probability of a given

    observation sequence, O  = {o1, o2,...,oT }, given the model λ  = {A,B, Π}. In otherwords one want to find P (O|λ). The most intuitive way of doing this is to enumerateevery possible state sequence of length T . One such state sequence is

    Q = {q 1, q 2, . . . , q  T }   (2.1)

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    35/95

    2.2. Hidden Markov Models 19

    where q 1  is the initial state. The probability of observation sequence O  given a statesequence such as 2.1 can be calculated as

    P (O|Q, λ) =T t=1

    P (ot|q t, λ)   (2.2)

    where the different observations are assumed to be independent. The property of inde-

    pendence makes it possible to calculate equation 2.2 as

    P (O|Q, λ) = bq1(o1) · bq2(o2) · . . . · bqT (oT ).   (2.3)

    The probability of such a sequence can be written as

    P (Q|λ) = πq1 · aq1q2aq2q3 · . . . · aqT −1qT  .   (2.4)

    The joint probability of  O  and  Q, the probability that O  an Q  occurs simultaneously,is simply the product of 2.3 and 2.4 as

    P (O, Q|λ) = P (O|Q, λ)P (Q|λ).   (2.5)

    Finally the probability of O given the model λ is calculated by summing the right handside of equation 2.5 over all possible state sequences Q

    P (O|λ) = q1,q2,...,qT  P (O|Q, λ)P (Q|λ) ==

    q1,q2,...,qT πq1bq1(o1)aq1q2bq2(o2) . . . aqT −1qT bqT (oT ).

    (2.6)

    Equation 2.6 says that one at the initial time  t  = 1 are in state q 1  with probabilityπq1 , and generate the observation  o1  with probability bq1(o1). As time ticks from t  tot + 1 (t  = 2)  one transform from state q 1   to q 2   with probability aq1q2 , and generateobservation o2 with probability bq2(o2) and so on until t  =  T .

    This procedure involves a total of  2T N T  calculations, which makes it unfeasible,even for small values of  N   and  T . As an example it takes 2 · 100 · 5 100 ≈   1072calculations for a model with 5 states and 100 observations. Therefor it is needed to find

    a more efficient way of calculating P (O|λ). Such a procedure exists and is called theForward-Backward Procedure.1 For initiation one need to define the forward variable

    as

    αt(i) = P (o1, o2, . . . , ot, q t =  si|λ).In other words, the probability of the partial observation sequence, o 1, o2, . . . , ot  untiltime t  and given state s

    i at time t. One can solve for α

    t(i) inductively as follows:

    1. Initialization:

    α1(i) = πibi(o1),   1 ≤ i ≤ N.   (2.7)

    2. Induction:

    αt+1( j) =N 

    j=1 αt(i)aij

    bj(ot+1),   1 ≤ t ≤ T  − 1,

    1 ≤ j ≤ N.(2.8)

    1The backward part of the calculation is not needed to solve Problem 1. It will be introduced whensolving Problem 3 later on.

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    36/95

    20 Chapter 2. Theoretical Framework  

    3. Termination:

    P (O|λ) =N i=1

    αT (i).   (2.9)

    Step 1 sets the forward probability to the joint probability of state  s j  and initial obser-vation o1. The second step, which is the heart of the forward calculation is illustratedin figure 2.3.

     aNj

    a 1 j  

    sj

    sN 

    s3

    s2

    s1

    αt(i)t

    αt+1( j)

    t + 1

    Figure 2.3: Illustration of the sequence of operations required for the computation of 

    the forward variable αt(i).

    One can see that state  sj  at time  t  + 1 can be reached from N   different states attime t. By summing the product over all possible states  s i, 1 ≤ i ≤ N  at time t  resultsin the probability of  s j  at time  t  + 1  with all previous observations in consideration.

    Once it is calculated for sj , it is easy to see that  αt+1( j) is obtained by accounting forobservation ot+1  in state  sj , in other words by multiplying the summed value by theprobability bj(ot+1). The computation of 2.8 is performed for all states s j , 1 ≤ j ≤ N ,for a given time t and iterated for all t  = 1, 2, . . . , T   − 1. Step 3 then gives P (O|λ) bysumming the terminal forward variables αT (i). This is the case because, by definition

    αT (i) = P (o1, o2, . . . , oT , q T   = si|λ)

    and therefore P (O|λ)   is just the sum of the  αT (i)’s. This method just needs N 2T which is much more efficient than the more traditional method. Instead of  10 72 cal-culations, a total of 2500 is enough, a saving of about 69 orders of magnitude. In the

    next two sections — one can see that this decrease of magnitude is essential because

    P (O

    |λ) serves as the denominator when estimating the central variables when solving

    the last two problems.In a similar manner, one can consider a backward variable β t(i) defined as follows:

    β t(i) = P (ot+1, ot+2, . . . , oT |q t =  si, λ)

    β t(i) is the probability of the partial observation sequence from  t + 1  to the last time,T , given the state si  at time t  and the HMM λ. By using induction,  β t(i) is found asfollows:

    1. Initialization:

    β T (i) = 1,   1 ≤ i ≤ N.

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    37/95

    2.2. Hidden Markov Models 21

    2. Induction:

    β t(i) = N 

    j=1 aijbj(ot+1)β t+1( j), t =  T  − 1, T  − 2, . . . , 1,1 ≤ i ≤ N.

    Step 1 defines β T (i) to be one for all si. Step 2, which is illustrated in figure 2.4, showsthat in order to to have been in state  s i  at time  t, and to account for the observationsequence from time t  + 1 and on, one have to consider all possible states  s j  at timet + 1, accounting for the transition from si to sj  as well as the observation ot+1 in statesj , and then account for the remaining partial observation sequence from state  s j .

    a i N  

     a i 1

    t   t + 1

    β t(i)   β t+1( j)

    si

    sN 

    s3

    s2

    s1

    Figure 2.4: Illustration of the sequence of operations required for the computation of 

    the backward variable β t(i).

    As mentioned before the backward variable is not used to find the probability

    P (O|λ). Later on it will be shown how the backward as well as the forward calcula-

    tion are used extensively to help one solve the second as well as the third fundamental

    problem of HMMs.

    Decoding

    In this the second problem one try to find the ”correct” hidden path, i.e. trying to

    uncover the hidden path. This is often used when one wants to learn about the structure

    of the model or to get optimal state sequences.

    There are several ways of finding the ”optimal” state sequence according to a given

    observation sequence. The difficulty lies in the definition of a optimal state sequence.

    One possible way is to find the states q t which are individually most likely. This criteriamaximizes the total number of correct states. To be able to implement this as a solution

    to the second problem one start by defining the variableγ t(i) = P (q t  =  si|O, λ)   (2.10)

    which gives the probability of being in state s i at time t given the observation sequence,O, and the model,  λ. Equation 2.10 can be expressed simply using the forward andbackward variables, αt(i) and β t(i) as follows:

    γ t(i) =  αt(i)β t(i)

    P (O|λ)   =  αt(i)β t(i)N i=1 αt(i)β t(i)

    .   (2.11)

    It is simple to see that γ t(i) is a true probability measure. This since αt(i) accountsfor the partial observation sequence  o1, o2, . . . , ot   and the state  si   at   t, while  β t(i)

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    38/95

    22 Chapter 2. Theoretical Framework  

    accounts for the remainder of the observation sequence o t+1, ot+2, . . . , oT  given state

    si   at time   t. The normalization factor  P (O|λ) = N i=1 αt(i)β t(i)  makes  γ t(i)   aprobability measure, which means thatN i=1

    γ t(i) = 1.

    One can now find the individually most likely state  q t  at time  t  by using γ t(i)  asfollows:

    q t =  argmax1≤i≤N [γ t(i)],   1 ≤ t ≤ T .   (2.12)Although equation 2.12 maximizes the expected number of correct states there

    could be some problems with the resulting state sequence. For example, when the

    HMM has state transitions which has zero probability the optimal state sequence may,

    in fact, not even be a valid state sequence. This is due to the fact that the solutionof 2.12 simply determines the most likely state at every instant, without regard to the

    probability of occurrence of sequences of states.

    To solve this problem one could modify the optimality criterion. For example by

    solving for the state sequence that maximizes the number of correct pairs of states

    (q t, q t+1) or triples of states (q t, q t+1, q t+2).The most widely used criterion however, is to find the single best state sequence,

    in other words to maximize P (Q|O, λ) which is equivalent to maximizing P (Q, O|λ).To find the optimal state sequence one often uses a method, based on dynamic pro-

    gramming, called the Viterbi Algorithm.

    To find the best state sequence Q   = {q 1, q 2, . . . , q  T } for a given observation se-quence O  = {o1, o2, . . . , oT }, one need to define the quantity

    δ t(i) = maxq1,q2,...,qt−1 P [q 1, q 2, . . . , q  t =  si, o1, o2, . . . , ot|λ]which means the highest probability along a single path, at time  t, which accounts forthe first t observations and ends in state si. By induction one have:

    δ t+1( j) = [maxi

    δ t(i)aij]bj(ot+1).   (2.13)

    To be able to retrieve the state sequence, one need to keep track of the argument

    which maximized 2.13, for each t and j. This is done via the array ψ t( j). The completeprocedure for finding the best state sequence can now be stated as follows:

    1. Initialization:δ 1(i) = πibi(o1),  1

     ≤ i

     ≤ N 

    ψ1(i) = 0.

    2. Recursion:

    δ t( j) = max1≤i≤N [δ t−1(i)aij]bj(ot),   2 ≤ t ≤ T 1 ≤ j ≤ N 

    ψt( j) = argmax1≤i≤N [δ t−1(i)aij],   2 ≤ t ≤ T 1 ≤ j ≤ N 

    (2.14)

    3. Termination:P ∗ = max1≤i≤N [δ T (i)]q ∗T   = argmax1≤i≤N [δ T (i)].

      (2.15)

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    39/95

    2.2. Hidden Markov Models 23

    4. Path (state sequence) backtracking:

    q ∗t   = ψt+1(q ∗t+1), t =  T  − 1, T  − 2, . . . , 1.

    It is important to note that the Viterbi algorithm is similar in implementation to

    the forward calculation in 2.7 - 2.9. The major difference is the maximization in 2.14

    over the previous states which is used in place of the summing procedure in 2.8. It

    should also be clear that a lattice structure efficiently implements the computation of 

    the Viterbi procedure.

    Learning

    The third, last, and at the same time most challenging problem with HMMs is to de-

    termine a method to adjust the model parameters  λ   =

     {A,B, Π

    }  to maximize the

    probability of the observation sequence given the model. There is no known analyticalsolution to this problem, but one can however choose  λ  such that P (O|λ)  is locallymaximized using an iterative process such as the Baum-Welch method, a type of Ex-

    pectation Maximization (EM) algorithm. Options to this is to use a form of Maximum

    A Posteriori (MAP) method, explained in [7], or setting up the problem as an opti-

    mization problem, which can be solved with for example gradient technique, which

    can yield solutions comparable to those of the aforementioned method. But the Baum-

    Welch is however the most commonly used procedure, and will therefore be the one

    utilized in this the first investigation of HMM on FX data.

    To be able to re-estimate the model parameters, using the Baum-Welch method,

    one should start with defining ξ t(i, j), the probability of being in state s i at time t, andstate sj  at time t  + 1, given the model and the observations sequence. In other words

    the variable can be defined as:

    ξ t(i, j) = P (q t  =  si, q t+1 =  sj|O, λ).

    The needed information for the variable ξ t(i, j) is shown in figure 2.5.

    si   sj

    aijbj(ot+1)

    t − 1   t   t + 1   t + 2αt(i)   β t+1( j)

    Figure 2.5: Illustration of the sequence of operations required for the computation of 

    the joint event that the system is in state  si at time t  and state sj  at time t + 1.

    From this figure one should be able to understand that  ξ t(i, j) can be written using

  • 8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

    40/95

    24 Chapter 2. Theoretical Framework  

    the forward and backward variables as follows:

    ξ t(i, j) =   αt(i)aijbj(ot+1)β t+1( j)P (O|λ)   =

    =  αt(i)aijbj(ot+1)β t+1( j)N i=1

    N j=1 αt(i)aijbj(ot+1)β t+1( j)

    which is a probability measure since the numerator is simply   P (q t   =   si, q t+1   =sj , O|λ) and denominator is P (O|λ).

    As described in 2.11, γ t(i) is the probability of being in state  si at time t, given theobservation sequence and the model. Therefore there is a close relationship between

    γ t(i) and ξ t(i, j). One can express γ t(i) as the sum of al