algo trading hidden markov models on foreign exchange data

8/20/2019 Algo Trading Hidden Markov Models on Foreign Exchange Data

1/95

Master’s Thesis

Algorithmic TradingHidden Markov Models on Foreign Exchange Data

Patrik Idvall, Conny Jonsson

LiTH - MAT - EX - - 08 / 01 - - SE


2/95


3/95

Algorithmic TradingHidden Markov Models on Foreign Exchange Data

Department of Mathematics, Linköpings Universitet


LiTH - MAT - EX - - 08 / 01 - - SE

Master’s Thesis: 30 hp

Level: A

Supervisor: J. Blomvall,

Department of Mathematics, Linköpings Universitet

Examiner: J. Blomvall,Department of Mathematics, Linköpings Universitet

Linköping: January 2008


4/95


5/95

Matematiska Institutionen

581 83 LINK ÖPING

SWEDEN

January 2008

x x

http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-10719

LiTH - MAT - EX - - 08 / 01 - - SE

Algorithmic Trading – Hidden Markov Models on Foreign Exchange Data


In this master’s thesis, hidden Markov models (HMM) are evaluated as a tool for forecasting movements in a currencycross. With an ever increasing electronic market, making way for more automated trading, or so called algorithmictrading, there is constantly a need for new trading strategies trying to find alpha, the excess return, in the market.HMMs are based on the well-known theories of Markov chains, but where the states are assumed hidden, governingsome observable output. HMMs have mainly been used for speech recognition and communication systems, but havelately also been utilized on financial time series with encouraging results. Both discrete and continuous versions of the model will be tested, as well as single- and multivariate input data.In addition to the basic framework, two extensions are implemented in the belief that they will further improve theprediction capabilities of the HMM. The first is a Gaussian mixture model (GMM), where one for each state assign

a set of single Gaussians that are weighted together to replicate the density function of the stochastic process. Thisopens up for modeling non-normal distributions, which is often assumed for foreign exchange data. The second is anexponentially weighted expectation maximization (EWEM) algorithm, which takes time attenuation in consideration

when re-estimating the parameters of the model. This allows for keeping old trends in mind while more recentpatterns at the same time are given more attention.Empirical results shows that the HMM using continuous emission probabilities can, for some model settings,generate acceptable returns with Sharpe ratios well over one, whilst the discrete in general performs poorly. TheGMM therefore seems to be an highly needed complement to the HMM for functionality. The EWEM howeverdoes not improve results as one might have expected. Our general impression is that the predictor using HMMs that

we have developed and tested is too unstable to be taken in as a trading tool on foreign exchange data, with toomany factors influencing the results. More research and development is called for.

Algorithmic Trading, Exponentially Weighted Expectation Maximization Algorithm, Foreign

Exchange, Gaussian Mixture Models, Hidden Markov Models

Nyckelord

Keyword

Sammanfattning

Abstract

F örfattare

Author

Titel

Title

URL f ör elektronisk version

Serietitel och serienummer

Title of series, numbering

ISSN

ISRN

ISBNSpråk

Language

Svenska/Swedish

Engelska/English

Rapporttyp

Report category

Licentiatavhandling

Examensarbete

C-uppsats

D-uppsats

Övrig rapport

Avdelning, InstitutionDivision, Department

DatumDate


6/95

vi


7/95

Abstract

In this master’s thesis, hidden Markov models (HMM) are evaluated as a tool for fore-

casting movements in a currency cross. With an ever increasing electronic market,

making way for more automated trading, or so called algorithmic trading, there is con-

stantly a need for new trading strategies trying to find alpha, the excess return, in themarket.

HMMs are based on the well-known theories of Markov chains, but where the states

are assumed hidden, governingsome observable output. HMMs have mainly been used

for speech recognition and communication systems, but have lately also been utilized

on financial time series with encouraging results. Both discrete and continuous versions

of the model will be tested, as well as single- and multivariate input data.

In addition to the basic framework, two extensions are implemented in the be-

lief that they will further improve the prediction capabilities of the HMM. The first

is a Gaussian mixture model (GMM), where one for each state assign a set of single

Gaussians that are weighted together to replicate the density function of the stochas-

tic process. This opens up for modeling non-normal distributions, which is often as-

sumed for foreign exchange data. The second is an exponentially weighted expectationmaximization (EWEM) algorithm, which takes time attenuation in consideration when

re-estimating the parameters of the model. This allows for keeping old trends in mind

while more recent patterns at the same time are given more attention.

Empirical results shows that the HMM using continuous emission probabilities can,

for some model settings, generate acceptable returns with Sharpe ratios well over one,

whilst the discrete in general performs poorly. The GMM therefore seems to be an

highly needed complement to the HMM for functionality. The EWEM however does

not improve results as one might have expected. Our general impression is that the

predictor using HMMs that we have developed and tested is too unstable to be taken

in as a trading tool on foreign exchange data, with too many factors influencing the

results. More research and development is called for.

Keywords: Algorithmic Trading, Exponentially Weighted Expectation MaximizationAlgorithm, Foreign Exchange, Gaussian Mixture Models, Hidden Markov Mod-

els

Idvall, Jonsson, 2008. vii


8/95

viii


9/95

Acknowledgements

Writing this master’s thesis in cooperation with Nordea Markets has been a truly re-

warding and exciting experience. During the thesis we have experienced great support

and interest from many different individuals.

First of all we would like to thank Per Brugge, Head of Marketing & Global Busi-ness Development at Nordea, who initiated the possibility of this master’s thesis and

gave us the privilege to carry it out at Nordea Markets in Copenhagen. We thank him

for his time and effort! Erik Alpkvist at e-Markets, Algorithmic Trading, Nordea Mar-

kets in Copenhagen who has been our supervisor at Nordea throughout this project; it

has been a true pleasure to work with him, welcoming us so warmly and for believing

in what we could achieve, for inspiration and ideas!

Jörgen Blomvall at Linköpings Universitet, Department of Mathematics has also

been very supportive during the writing of this thesis. It has been a great opportunity

to work with him! We would also like to thank our colleagues and opponents Viktor

Bremer and Anders Samuelsson, for helpful comments during the process of putting

this master’s thesis together.

Patrik Idvall & Conny Jonsson

Copenhagen, January 2008

Idvall, Jonsson, 2008. ix


10/95

x


11/95

Nomenclature

Most of the reoccurring symbols and abbreviations are described here.

SymbolsS = {s1, s2,...,sN } a set of N hidden states,Q = {q 1, q 2,...,q T } a state sequence of length T taking values from S ,O = {o1, o2,...,oT } a sequence consisting of T observations,A = {a11, a12,...,aNN } the transition probability matrix A,B = bi(ot) a sequence of observation likelihoods,Π = {π1, π2,...,πN } the initial probability distribution,λ = {A,B, Π} the complete parameter set of the HMM,αt(i) the joint probability of {o1, o2, . . . , ot} and q t =

si given λ,β t(i) the joint probability of {ot+1, ot+2, . . . , oT } and

q t = si given λ,γ t the probability of q t = si given O and λ,µim the mean for state si and mixture component m,Σim the covariance matrix for state si and mixture

component m,ρ = {ρ1, ρ2, . . . , ρt} a vector of real valued weights for the Exponen-

tially Weighted Expectation Maximization,

wim the weights for the mth mixture component instate si,

ξ t(i, j) the joint probability of q t = si and q t+1 = sjgiven O and λ,

δ t(i) the highest joint probability of a state sequenceending in q t = si and a partial observation se-

quence ending in ot given λ,ψt j the state si at time t which gives us δ t( j), used for

backtracking.

Idvall, Jonsson, 2008. xi


12/95

xii

Abbreviations

AT Algorithmic Trading

CAPM Capital Asset Pricing Model

EM Expectation Maximization

ERM Exchange Rate Mechanism

EWEM Exponentially Weighted Expectation Maximization

EWMA Exponentially Weighted Moving Average

FX Foreign Exchange

GMM Gaussian Mixture Model

HMM Hidden Markov Model

LIBOR London Interbank Offered Rate

MC Monte Carlo

MDD Maximum DrawdownOTC Over the Counter

PPP Purchasing Power Parity

VaR Value at Risk


13/95

Contents

1 Introduction 1

1.1 The Foreign Exchange Market . . . . . . . . . . . . . . . . . . . . . 1

1.1.1 Market Structure . . . . . . . . . . . . . . . . . . . . . . . . 2

1.2 Shifting to Electronic Markets . . . . . . . . . . . . . . . . . . . . . 3

1.2.1 Changes in the Foreign Exchange Market . . . . . . . . . . . 4

1.3 Algorithmic Trading . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.3.1 Different Levels of Automation . . . . . . . . . . . . . . . . 5

1.3.2 Market Microstructure . . . . . . . . . . . . . . . . . . . . . 6

1.3.3 Development of Algorithmic Trading . . . . . . . . . . . . . 7

1.4 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.4.1 Purpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.4.2 Purpose Decomposition . . . . . . . . . . . . . . . . . . . . 8

1.4.3 Delimitations . . . . . . . . . . . . . . . . . . . . . . . . . . 9

1.4.4 Academic Contribution . . . . . . . . . . . . . . . . . . . . . 91.4.5 Disposal . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2 Theoretical Framework 11

2.1 Foreign Exchange Indices . . . . . . . . . . . . . . . . . . . . . . . . 11

2.1.1 Alphas and Betas . . . . . . . . . . . . . . . . . . . . . . . . 12

2.2 Hidden Markov Models . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.2.1 Hidden Markov Models used in Finance . . . . . . . . . . . . 13

2.2.2 Bayes Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.2.3 Markov Chains . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.2.4 Extending the Markov Chain to a Hidden Markov Model . . . 16

2.2.5 Three Fundamental Problems . . . . . . . . . . . . . . . . . 17

2.2.6 Multivariate Data and Continuous Emission Probabilities . . . 26

2.3 Gaussian Mixture Models . . . . . . . . . . . . . . . . . . . . . . . . 26

2.3.1 Possibilities to More Advanced Trading Strategies . . . . . . 28

2.3.2 The Expectation Maximization Algorithm on Gaussian Mixtures 28

2.4 The Exponentially Weighted Expectation Maximization Algorithm . . 30

2.4.1 The Expectation Maximization Algorithm Revisited . . . . . 30

2.4.2 Updating the Algorithm . . . . . . . . . . . . . . . . . . . . 31

2.4.3 Choosing η . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

2.5 Monte Carlo Simulation . . . . . . . . . . . . . . . . . . . . . . . . 34

2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

Idvall, Jonsson, 2008. xiii


14/95

xiv Contents

3 Applying Hidden Markov Models on Foreign Exchange Data 373.1 Used Input Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

3.2 Number of States and Mixture Components and Time Window Lengths 38

3.3 Discretizing Continuous Data . . . . . . . . . . . . . . . . . . . . . . 38

3.4 Initial Parameter Estimation . . . . . . . . . . . . . . . . . . . . . . 39

3.5 Generating Different Trading Signals . . . . . . . . . . . . . . . . . . 40

3.5.1 Trading Signal in the Discrete Case . . . . . . . . . . . . . . 40

3.5.2 Standard Signal in the Continuous Case . . . . . . . . . . . . 41

3.5.3 Monte Carlo Simulation Signal in the Continuous Case . . . . 42

3.6 Variable Constraints and Modifications . . . . . . . . . . . . . . . . . 42

3.7 An Iterative Procedure . . . . . . . . . . . . . . . . . . . . . . . . . 43

3.8 Evaluating the Model . . . . . . . . . . . . . . . . . . . . . . . . . . 43

3.8.1 Statistical Testing . . . . . . . . . . . . . . . . . . . . . . . . 43

3.8.2 Sharpe Ratio . . . . . . . . . . . . . . . . . . . . . . . . . . 44

3.8.3 Value at Risk . . . . . . . . . . . . . . . . . . . . . . . . . . 453.8.4 Maximum Drawdown . . . . . . . . . . . . . . . . . . . . . 45

3.8.5 A Comparative Beta Index . . . . . . . . . . . . . . . . . . . 46

4 Results 494.1 The Discrete Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

4.1.1 Using Only the Currency Cross as Input Data . . . . . . . . . 50

4.1.2 Adding Features to the Discrete Model . . . . . . . . . . . . 50

4.2 The Continuous Model . . . . . . . . . . . . . . . . . . . . . . . . . 54

4.2.1 Prediction Using a Weighted Mean of Gaussian Mixtures . . . 54

4.2.2 Using Monte Carlo Simulation to Project the Distribution . . 55

4.3 Including the Spread . . . . . . . . . . . . . . . . . . . . . . . . . . 60

4.3.1 Filtering Trades . . . . . . . . . . . . . . . . . . . . . . . . . 62

4.4 Log-likelihoods, Random Numbers and Convergence . . . . . . . . . 63

5 Analysis 675.1 Effects of Different Time Windows . . . . . . . . . . . . . . . . . . . 67

5.2 Hidden States and Gaussian Mixture Components . . . . . . . . . . . 68

5.3 Using Features as a Support . . . . . . . . . . . . . . . . . . . . . . 68

5.4 The Use of Different Trading Signals . . . . . . . . . . . . . . . . . . 69

5.5 Optimal Circumstances . . . . . . . . . . . . . . . . . . . . . . . . . 70

5.6 State Equivalence . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

5.7 Risk Assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

5.8 Log-likelihood Sequence Convergence . . . . . . . . . . . . . . . . . 71

6 Conclusions 736.1 Too Many Factors Brings Instability . . . . . . . . . . . . . . . . . . 73

6.2 Further Research Areas . . . . . . . . . . . . . . . . . . . . . . . . . 74


15/95

List of Figures

1.1 Daily turnover on the global FX market 1989-2007. . . . . . . . . . . 2

1.2 Schematic picture over the process for trading . . . . . . . . . . . . . 5

2.1 An example of a simple Markov chain . . . . . . . . . . . . . . . . . 15

2.2 An example of a hidden Markov model . . . . . . . . . . . . . . . . 18

2.3 The forward algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 20

2.4 The backward algorithm . . . . . . . . . . . . . . . . . . . . . . . . 21

2.5 Determining ξ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232.6 An example of a Gaussian mixture . . . . . . . . . . . . . . . . . . . 27

2.7 A hidden Markov model making use of a Gaussian mixture . . . . . . 29

2.8 The effect of η . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332.9 (a) Monte Carlo simulation using 1000 simulations over 100 time pe-

riods. (b) Histogram over the final value for the 1000 simulations. . . 34

3.1 Gaussian mixture together with a threshold d . . . . . . . . . . . . . 413.2 Maximun drawdown for EURUSD . . . . . . . . . . . . . . . . . . . 46

3.3 Comparative beta index . . . . . . . . . . . . . . . . . . . . . . . . . 47

4.1 Development of the EURUSD exchange rate . . . . . . . . . . . . . . 50

4.2 Discrete model with a 30 days window . . . . . . . . . . . . . . . . . 51

4.3 Discrete model with a 65 days window . . . . . . . . . . . . . . . . . 51

4.4 Discrete model with a 100 days window . . . . . . . . . . . . . . . . 52

4.5 Discrete model with threshold of 0.4 . . . . . . . . . . . . . . . . . . 52

4.6 Discrete model with six additional features . . . . . . . . . . . . . . . 53

4.7 Discrete model with three additional features . . . . . . . . . . . . . 53

4.8 Continuous model, using only the cross, and a 20 days window . . . . 55

4.9 Continuous model, using only the cross, and a 30 days window . . . . 56

4.10 Continuous model, using additional features, and a 50 days window . 564.11 Continuous model, using additional features, and a 75 days window . 57

4.12 Continuous model, using additional features, and a 100 days window . 57

4.13 Continuous model, using additional features, and a 125 days window . 58

4.14 Continuous model with a 75 days EWEM . . . . . . . . . . . . . . . 58

4.15 Continuous model with the K-means clustering . . . . . . . . . . . . 59

4.16 Continuous model with the MC simulation (50%) trading signal . . . 59

4.17 Continuous model with the MC simulation (55%) trading signal . . . 60

4.18 Best discrete model with the spread included . . . . . . . . . . . . . . 61

4.19 Best continuous model with the spread included . . . . . . . . . . . . 61

4.20 Continuous model with standard trade filtering . . . . . . . . . . . . . 62

Idvall, Jonsson, 2008. xv


16/95

xvi List of Figures

4.21 Continuous model with MC simulation (55%) trade filtering . . . . . 63

4.22 (a) Log-likelihood sequence for the first 20 days for the discrete model

(b) Log-likelihood sequence for days 870 to 1070 for the discrete model 64

4.23 (a) Log-likelihood sequence for days 990 to 1010 for the continuous

model (b) Log-likelihood sequence for the last 20 days for the contin-

uous model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64


17/95

Chapter 1

Introduction

In this first chapter a description of the background to this master’s thesis will be given.

It will focus on the foreign exchange market on the basis of its structure, participants

and recent trends. The objective will be pointed out, considering details such as the

purpose of the thesis and the delimitations that have been considered throughout the

investigation. In the end of the objectives the disposal will be gone through, just to

give the reader a clearer view of the thesis’ different parts and its mutual order.

1.1 The Foreign Exchange Market

The Foreign Exchange (FX) market is considered the largest and most liquid 1 of all

financial markets with a $3.2 trillion daily turnover. Between 2004 and 2007 the dailyturnover has increased with as much as 71 percent. [19]

The increasing growth in turnover over the last couple of years, seen in figure 1.1

seems to be led by two related factors. First of all, the presence of trends and higher

volatility in FX markets between 2001 and 2004, led to an increase of momentum trad-

ing, where investors took large positions in currencies that followed appreciating trends

and short positions in decreasing currencies. These trends also induced an increase in

hedging activity, which further supported trading volumes. [20]

Second, interest differentials encouraged so called carry trading, i.e. investments in

high interest rate currencies financed by short positions in low interest rate currencies, if

the target currencies, like the Australian dollar, tended to appreciate against the funding

currencies, like the US dollar. Such strategies fed back into prices and supported the

persistence of trends in exchange rates. In addition, in the context of a global searchfor yield, so called real money managers2 and leveraged3 investors became increasingly

interested in foreign exchange as an asset class alternative to equity and fixed income.

As one can see in figure 1.1, the trend is also consistent from 2004 and forward,

with more and more money put into the global FX market. The number of participants

1The degree to which an asset or security can be bought or sold in the market without affecting the asset’s

price.2Real money managers are market players, such as pension funds, insurance companies and corporate

treasurers, who invest their own funds. This distinguishes them from leveraged investors, such as for examplehedge funds, that borrow substantial amounts of money.

3Leverage is the use of various financial instruments or borrowed capital, such as margin, to increase thepotential return of an investment.

Idvall, Jonsson, 2008. 1


18/95

2 Chapter 1. Introduction

1989 1992 1995 1998 2001 2004 20070

500

1000

1500

2000

2500

3000

3500

Year

D a i l y t u r n o v e r i n m i l l i o n U S D

Spot transactions

Outright transactions

Foreign exchange swaps

Estimated gaps in reporting

Figure 1.1: Daily return on the global FX market 1989-2007

and the share of the participants portfolio towards FX are continuously increasing,

comparing to other asset classes.

1.1.1 Market Structure

The FX market is unlike the stock market an over the counter (OTC) market. Thereis no single physical located place were trades between different players are settled,

meaning that all participants do not have access to the same price. Instead the markets

core is built up by a number of different banks. That is why it sometimes is called

an inter-bank market. The market is opened 24 hours a day and moves according to

activity in large exporting and importing countries as well as in countries with highly

developed financial sectors. [19]

The participants of the FX market can roughly be divided into the following five

groups, characterized by different levels of access:

• Central Banks• Commercial Banks• Non-bank Financial Entities• Commercial Companies• Retail Traders

Central banks have a significant influence in FX markets by virtue of their role in

controlling their countries’ money supply, inflation, and/or interest rates. They may

also have to satisfy official/unofficial target rates for their currencies. While they may

have substantial foreign exchange reserves that they can sell in order to support their

own currency, highly overt intervention, or the stated threat of it, has become less

common in recent years. While such intervention can indeed have the desired effect,


19/95

1.2. Shifting to Electronic Markets 3

there have been several high profile instances where it has failed spectacularly, such as

Sterling’s exit from the Exchange Rate Mechanism (ERM) in 1992.

Through their responsibility for money supply, central banks obviously have a con-

siderable influence on both commercial and investment banks. An anti-inflationary

regime that restricts credit or makes it expensive has an effect upon the nature and level

of economic activity, e.g. export expansion, that can feed through into changes in FX

market activity and behaviour.

At the next level we have commercial banks. This level constitute the inter-bank

section of the FX market and consist of participants such as Deutsche Bank, UBS AG,

City Group and many others including Swedish banks such as Nordea, SEB, Swedbank

and Handelsbanken.

Within the inter-bank market, spreads, which are the difference between the bid

and ask prices, are sharp and usually close to non-existent. These counterparts act as

market makers toward customers demanding the ability to trade currencies, meaning

that they determine the market price.As you descend the levels of access, from commercial banks to retail traders, the

difference between the bid and ask prices widens, which is a consequence of volume.

If a trader can guarantee large numbers of transactions for large amounts, they can

demand a smaller difference between the bid and ask price, which is referred to as a

better spread. This also implies that the spread is wider for currencies with less frequent

transactions. The spread has an important role in the FX market, more important than

in the stock market, because it is equivalent to the transaction cost. [22]

When speculating in the currency market you speculate in currency pairs, which

describes the relative price of one currency relative to another. If you believe that

currency A will strengthen against currency B you will go long in currency pair A/B.The largest currencies is the global FX market is US Dollar (USD), Euro (EUR) and

Japanese Yen (JPY). USD stands for as much as 86 percent of all transactions, followedby EUR (37 percent) and JPY (17 percent). 4 [19]

1.2 Shifting to Electronic Markets

Since the era of floating exchange rates began in the early 1970s, technical trading

has become widespread in the stock market as well as the FX markets. The trend in

the financial markets industry in general is the increased automation of the trading, so

called algorithmic trading (AT), at electronic exchanges. [2]

Trading financial instruments has historically required face-to-face contact between

the market participants. The Nasdaq OTC market was one of the first to switch from

physical interaction to technological solutions, and today many other market placeshas also replaced the old systems. Some markets, like the New York Stock Exchange

(NYSE), still uses physical trading but has at the same time opened up some functions

to electronic trading. It is of course innovations in computing and communications that

has made this shift possible, with global electronic order routing, broad dissemination

of quote and trade information, and new types of trading systems. New technology also

reduces the costs of building new trading systems, hence lowering the entry barriers for

new entrants. Finally, electronic markets also enables for a faster order routing and data

transmission to a much larger group of investors than before. The growth in electronic

4Because two currencies are involved in each transaction, the sum of the percentage shares of individualcurrencies totals 200 percent instead of 100 percent.


20/95


trade execution has been explosive on an international basis, and new stock markets are

almost exclusively electronic. [11]

1.2.1 Changes in the Foreign Exchange Market

The FX market differs somewhat from stock markets. These have traditionally been

dealer markets that operate over the telephone, and physical locations where trading

takes place has been non-existing. In such a market, trade transparency is low. But

for the last years, more and more of the FX markets has moved over to electronic

trading. Reuters and Electronic Broking Service (EBS) developed two major electronic

systems for providing quotes, which later on turned into full trading platforms, allowing

also for trade execution. In 1998, electronic trading accounted for 50 percent of all

FX trading, and it has continued rising ever since. Most of the inter-dealer trading

nowadays takes place on electronic markets, while trading between large corporations

and dealers still remain mostly telephone based. [11] Electronic trading platforms forcompanies have however been developed, like Nordea e-Markets, allowing companies

to place and execute trades without interaction from a dealer.

1.3 Algorithmic Trading

During the last years, the trend towards electronic markets and automated trading has,

as mentioned in section 1.2, been significant. As a part of this many financial firms has

inherited trading via so called algorithms5, to standardize and automate their trading

strategies in some sense. As a central theme in this thesis AT deserves further clari-

fication and explanation. In this section different views of AT is given, to present a

definition of the term.

In general AT can be described as trading, with some elements being executed

by an algorithm. The participation of algorithms enables automated employment of

predefined trading strategies. Trading strategies are automated by defining a sequence

of instructions executed by a computer, with little or no human intervention. AT is the

common denominator for different trends in the area of electronic trading that result in

increased automation in:

1. Identifying investment opportunities (what to trade).

2. Executing orders for a variety of asset classes (when, how and where to trade).

This includes a broad variety of solutions employed by traders in different markets,

trading different assets. The common denominator for the most of these solutions is

the process from using data series for pre-trade analysis to final trade execution. Theexecution can be made by the computer itself or via a human trader. In figure 1.2 one

can see a schematic figure over this process. This figure is not limited to traders using

AT; rather it is a general description of some of the main elements of trading. The four

steps presented in figure 1.2 defines the main elements of trading. How many of these

steps that are performed by a computer is different between different traders and is a

measure of how automated the process is. The first step includes analysis of market data

as well as adequate external news. The analysis is often supported by computer tools

such as spreadsheets or charts. The analysis is a very important step, which will end up

5A step-by-step problem-solving procedure, especially an established, recursive computational procedurefor solving a problem in a finite number of steps. [21]


21/95

1.3. Algorithmic Trading 5

Figure 1.2: Schematic picture over the process for trading

in a trading signal and a trading decision in line with the overlaying trading strategy.

The last step of the process is the real execution of the trading decision. This can be

made automatically by a computer or by a human trader. The trade execution contains

an order, sent to the foreign exchange market and the response as a confirmation from

the same exchange. [3]

This simplified description might not hold for any specific trader taking into ac-count the many different kinds of players that exists on a financial market. However,

it shows that the trading process can be divided into different steps that follow sequen-

tially. If one are now to suggest how trading is automated, there is little difficulty in

discuss the different steps being separately programmed into algorithms and executed

by a computer. Of course this is not a trivial task, especially as the complex consid-

erations previously made by humans have to be translated into algorithms executed by

machines. One other obstacle that must be dealt with in order to achieve totally auto-

mated trading is how to connect the different steps into a functional process and how to

interface this with the external environment, in other words the exchange and the mar-

ket data feed. In the next section different levels of AT will be presented. The levels

are characterized by the number of stages, presented in figure 1.2, that are replaced by

algorithms, performed by a machine.

1.3.1 Different Levels of Automation

A broad variety of definitions of AT is used, dependent on who is asked. The differ-

ences is often linked to the level of technological skill that the questioned individual

possess. Based on which steps that are automated and the implied level of human

intervention in the trading process, different categories can be distinguished. It is im-

portant to notice that the differences do not only consist of the number of steps auto-

mated. There can also be differences between sophistication and performance within

the steps. This will not be taken in consideration here though, focusing on the level of


22/95


automation.

Four different ways of defining AT are to be described, influenced by [3]. The first

category, here named AT 1 presuppose that the first two steps are fully automated. Theysomewhat goes hand in hand because the pre-trade analysis often leads to a trading

signal in one way or another. This means that the human intervention is limited to the

two last tasks, namely the trading decision and the execution of the trade.

The second category, AT 2 is characterized by an automation of the last step in thetrading process, namely the execution. The aim of execution algorithms is often to

divide large trading volumes into smaller orders and thereby minimizing the adverse

price impact a large order otherwise might suffer. It should be mentioned that different

types of execution algorithms are often supplied by third party software, and also as a

service to the buy-side investor of a brokerage. Using execution algorithms leaves the

first three steps, analysis, trading signal and trading decision to the human trader.

If one combines the first two categories a third variant of AT is developed, namely

AT 3. AT 3 is just leaving the trading decision to the human trader, i.e. letting algorithmstaking care of step 1, 2 and 4.

Finally, fully automated AT, AT 4, often referred to as black-box-trading, is obtainedif all four steps is replaced by machines performing according to algorithmically set

decisions. This means that the human intervention is only control and supervision,

programming and parameterizations. To be able to use systems, such as AT 3 andAT 4, great skills are required, both when it comes to IT solutions and algorithmicdevelopment.

Independent on what level of automation that is intended, one important issue is

to be considered, especially if regarding high-frequency trading, namely the markets

microstructure. The microstructure contains the markets characteristics when dealing

with price, information, transaction costs, market design and other necessary features.

In the next section a brief overview of this topic will be gone through, just to give thereader a feeling of its importance when implementing any level of algorithmic trading.

1.3.2 Market Microstructure

Market microstructure is the branch of financial economics that investigates trading and

the organization of markets. [8] It is of great importance when trading is carried out on

a daily basis or on an even higher frequency, where micro-based models, in contrary

to macro-based, can account for a large part of variations in daily prices on financial

assets. [4] And this is why the theory on market microstructure is also essential when

understanding AT, taking advantage of for example swift changes in the market. This

will not be gone through in-depth here, as it lies out of the this master’s thesis’ scope;

the aim is rather to give a brief overview of the topic. Market microstructure mainly

deals with four issues, which will be gone through in the following sections [5].

Price formation and price discovery

This factor focuses on the process by which the price for an asset is determined, and

it is based on the demand and supply conditions for a given asset. Investors all have

different views on the future prices of the asset, which makes them trade it at differ-

ent prices, and price discovery is simply when these prices match and a trade takes

place. Different ways of carrying out this match is through auctioning or negotiation.

Quote-driven markets, as opposed to order-driven markets where price discovery takes

place as just described, is where investors trade on the quoted prices set by the markets


23/95

1.3. Algorithmic Trading 7

makers, making the price discovery happen quicker and thus the market more price

efficient.

Transaction cost and timing cost

When an investor trades in the market, he or she faces two different kinds of costs:

implicit and explicit. The latter are those easily identified, e.g. brokerage fees and/or

taxes. Implicit however are described as hard to identify and measure. Market impact

costs relates to the price change due to large trades in a short time and timing costs to

the price change that can occur between decision and execution. Since the competition

between brokers has led to significant reduction of the explicit costs, enhancing the

returns is mainly a question of reducing the implicit. Trading in liquid markets and

ensuring fast executions are ways of coping with this.

Market structure and design

This factor focuses on the relationship between price determination and trading rules.

These two factors have a large impact on price discovery, liquidity and trading costs,

and refers to attributes of a market defined in terms of trading rules, which amongst oth-

ers include degree of continuity, transparency, price discovery, automation, protocols

and off-markets trading.

Information and disclosure

This factor focuses on the market information and the impact of the information on

the behavior of the market participants. A well-informed trader is more likely to avoid

the risks related to the trading, than one less informed. Although the theory of marketefficiency states that the market is anonymous and that all participants are equally in-

formed, this is seldom the case which give some traders an advantage. When talking

about market information, we here refer to information that has a direct impact on the

market value of an asset.

1.3.3 Development of Algorithmic Trading

As the share of AT increases in a specific market, it provides positive feedback for

further participants to automate their trading. Since algorithmic solutions benefits from

faster and more detailed data, the operators in the market have started to offer these

services on request from algorithmic traders. This new area of business makes the

existing traders better of if they can deal with the increasing load of information. Theresult is that the non-automated competition will loose out on the algorithms even more

than before. For this reason it is not bold to predict that as the profitability shifts in favor

of AT, more trading will also shift in that direction. [3]

With a higher number of algorithmic traders in the market, there will be an increas-

ing competition amongst them. This will most certain lead to decreasing margins, tech-

nical optimization and business development. Business development contains, amongst

other things, innovation as firms using AT search for new strategies, conceptually dif-

ferent from existing ones. Active firms on the financial market put a great deal of effort

into product development with the aim to find algorithms, capturing excess return by

unveiling the present trends in the FX market.


24/95


1.4 Objectives

As the title of this master’s thesis might give away, an investigation of the use of hidden

Markov models (HMM) as an AT tool for investments on the FX market will be carried

out. HMMs is one among many other frameworks such as artificial neural networks

and support vector machines that could constitute a base of a successful algorithm.

The framework chosen to be investigated was given in advance from Algorithmic

Trading at Nordea Markets in Copenhagen. Because of this initiation, other frame-

works, as the two listed above, will not be investigated or compared to the HMMs

throughout this study. Nordea is interested in how HMMs can be used as a base for

developing an algorithm capturing excess return within the FX market, and from that

our purpose do emerge.

1.4.1 Purpose

To evaluate the use of hidden Markov models as a tool for algorithmic trading on

foreign exchange data.

1.4.2 Purpose Decomposition

The problem addressed in this study is, as we mentioned earlier, based on the insight

that Nordea Markets is looking to increase its knowledge about tools for AT. This is

necessary in order to be able to create new trading strategies for FX in the future. So,

to be able to give a recommendation, the question to be answered is: should Nordea

use hidden Markov models as a strategy for algorithmic trading on foreign exchange

data?To find an answer to this one have to come up with a way to see if the framework

of HMMs can be used to generate a rate of return that under risk adjusted manners

exceeds the overall market return. To do so one have to investigate the framework of

HMMs in great detail to see how it could be applied to FX data. One also have to find

a measure of market return to see if the algorithm is able to create higher return using

HMMs. Therefore the main tasks is the following:

1. Put together an index which reflects the market return.

2. Try to find an algorithm using HMMs that exceeds the return given by the created

index.

3. Examine if the algorithm is stable enough to be used as a tool for algorithmic

trading on foreign exchange data.

If these three steps are gone through in a stringent manner one can see our purpose

as fulfilled. The return, addressed in step one and two, is not only the rate of return

itself. It will be calculated together with the risk of each strategy as the return-to-risk

ratio, also called Sharpe ratio described in detail in 3.8.2. The created index will be

compared with well known market indices to validate its role as a comparable index

for market return. To evaluate the models stability back-testing will be used. The

models performance will be simulated using historical data for a fixed time period.

The chosen data and time period is described in section 3.1.


25/95

1.4. Objectives 9

1.4.3 Delimitations

The focus has during the investigation been set to a single currency cross, namely theEURUSD. For the chosen cross we have used one set of features given to us from

Nordea Quantitative Research as supportive time series. The back testing period for

the created models, the comparative index as well as the HMM, has been limited to

the period for which adequate time series has been given, for both the chosen currency

cross and the given features. Finally the trading strategy will be implemented to an

extent containing step one and two in figure 1.2. It will not deal with the final trading

decision or the execution of the trades.

1.4.4 Academic Contribution

This thesis is written from a financial engineering point of view, combining areas such

as computer science, mathematical science and finance. The main academic contribu-tions of the thesis are the following:

• The master’s thesis is one of few applications using hidden Markov models ontime dependent sequences of data, such as financial time series.

• It is the first publicly available application of hidden Markov models on foreignexchange data.

1.4.5 Disposal

The rest of this master’s thesis will be organized as follows. In chapter 2, the theoretical

framework will be reviewed in detail to give the reader a clear view of the theories used

when evaluating HMM as a tool for AT. This chapter contains information about differ-ent trading strategies used on FX today as well as the theory of hidden Markov models,

Gaussian mixture models (GMM), an exponentially weighted expectation maximiza-

tion (EWEM) algorithm and Monte Carlo (MC) simulation.

In chapter 3 the developed model is described in detail to clear out how the theories

has been used in practice to create algorithms based on the framework of HMMs. The

created market index will also be reviewed, described shortly to give a comparable

measure of market return.

Chapter 4 contains the results of the tests made for the developed models. Trajecto-

ries will be presented and commented using different features and settings to see how

the parameters affect the overall performance of the model. The models performance

will be tested and commented using statistical tests and well known measurements.

The analysis of the results are carried out in chapter 5, on the base of different

findings made throughout the tests presented in chapter 4. The purpose of the analysisis to clear out the underlying background to the results one can see, both positive and

negative.

Finally, chapter 6 concludes our evaluation of the different models and the frame-

work of HMMs as a tool for AT on FX. Its purpose is to go through the three steps

addressed in section 1.4.2 to finally answer the purpose of the master’s thesis. This

chapter will also point out the most adequate development needed to improve the mod-

els further.


26/95



27/95

Chapter 2

Theoretical Framework

This chapter will give the reader the theoretical basis needed to understand HMMs,

i.e. the model to be evaluated in this master’s thesis. Different variants, like the one

making use of GMMs, and improvements to the HMM, that for example takes time

attenuation in consideration, will also be presented. The chapter however starts with

some background theories regarding market indices and FX benchmarks, which will

constitute the theoretical ground for the comparative index.

2.1 Foreign Exchange Indices

Throughout the years many different trading strategies have been developed to cap-

ture return from the FX market. As one finds in any asset class, the foreign exchange

world contains a broad variety of distinct styles and trading strategies. For other as-

set classes it is easy to find consistent benchmarks, such as indices like Standard and

Poor’s 500 (S&P 500) for equity, Lehmans Global Aggregate Index (LGAI) for bonds

and Goldman Sachs Commodity Index (GSCI) for commodities. It is harder to find a

comparable index for currencies.

When viewed as a set of trading rules, the accepted benchmarks of other asset

classes indicate a level of subjectivity that would not otherwise be apparent. In fact,

they really reflect a set of transparent trading rules of a given market. By being widely

followed, they become benchmarks. By looking at benchmarks from this perspective

there is no reason why there should not exist an applicable benchmark for currencies.

[6]

The basic criteria for establishing a currency benchmark is to find approaches that

are widely known and followed to capture currency return on the global FX market.In march 2007 Deutsche Bank unveiled their new currency benchmark, The Deutsche

Bank Currency Return (DBCR) Index. Their index contains a mixture of three strate-

gies, namely Carry, Momentum and Valuation. These are commonly accepted indices

that also other large banks around the world make use of.

Carry is a strategy in which an investor sells a certain currency with a relatively

low interest rate and uses the funds to purchase a different currency yielding a higher

interest rate. A trader using this strategy attempts to capture the difference between the

rates, which can often be substantial, depending on the amount of leverage the investor

chooses to use.

To explain this strategy in more detail an example of a ”JPY carry trade” is here

Idvall, Jonsson, 2008. 11


28/95

12 Chapter 2. Theoretical Framework

presented. Lets say a trader borrows 1 000 000 JPY from a Japanese bank, converts

the funds into USD and buys a bond for the equivalent amount. Lets also assume that

the bond pays 5.0 percent and the Japanese interest rate is set to 1.5 percent. The

trader stands to make a profit of 3.5 percent (5.0 - 1.5 percent), as long as the exchange

rate between the countries does not change. Many professional traders use this trade

because the gains can become very large when leverage is taken into consideration. If

the trader in the example uses a common leverage factor of 10:1, then he can stand to

make a profit of 35 percent.

The big risk in a carry trade is the uncertainty of exchange rates. Using the example

above, if the USD were to fall in value relative to the JPY, then the trader would run

the risk of losing money. Also, these transactions are generally done with a lot of

leverage, so a small movement in exchange rates can result in big losses unless hedged

appropriately. Therefore it is important also to consider the expected movements in the

currency as well as the interest rate for the selected currencies.

Another commonly used trading strategy is Momentum, which is based on the ap-pearance of trends in the currency markets. Currencies appear to trend over time, which

suggests that using past prices may be informative to investing in currencies. This is

due to the existence of irrational traders, the possibility that prices provide information

about non-fundamental currency determinants or that prices may adjust slowly to new

information. To see if a currency has a positive or negative trend one have to calculate a

moving average for a specific historical time frame. If the currency has a higher return

during the most recent moving average it said to have a positive trend and vice versa.

[6]

The last trading strategy, described by Deutsche Bank, is Valuation. This strategy

is purely based on the fundamental price of the currency, calculated using Purchasing

Power Parity (PPP). A purchasing power parity exchange rate equalizes the purchasing

power of different currencies in their home countries for a given basket of goods. If for example a basket of goods costs 125 USD in US and a corresponding basket in

Europe costs 100 EUR, the fair value of the exchange rate would be 1.25 EURUSD

meaning that people in US and Europe have the same purchasing power. This is why

it is believed that the currencies in the long run tend to revert towards their fair value

based on PPP. But in short- to medium-run they might deviate somewhat from this

equilibrium due to trade, information and other costs. These movements allows for

profiting by buying undervalued currencies and selling overvalued. [6]

2.1.1 Alphas and Betas

The described strategies are often referred to as beta strategies, reflecting market return.

Beside these there is a lot of other strategies, trying to find the alpha in the market.

Alpha is a way of describing excess return, captured by a specific fund or individualtrader. Alpha is defined using Capital Asset Pricing Model (CAPM). CAPM for a

portfolio is

r p = rf + β p(rM − rf )where rf is the risk free rate, β p =

ρpM σpσM σ2M

the volatility of the portfolio relative

some index explaining market return, and (rM − rf ) the market risk premia. Alphacan now be described as the excess return, comparing to CAPM, as follows

α = r∗ p − r p = r∗ p − (rf + β p(rm − rf ))where r∗ p is the actual return given by the asset and r p the return given by CAPM.


29/95

2.2. Hidden Markov Models 13

The above description of alpha is valid for more or less all financial assets. But

when it comes to FX, it might sometimes be difficult to assign a particular risk free rate

to the portfolio. One suggestion is simply to use the rates of the country from where

the portfolio is being managed, but the most reoccurring method is to leave out the risk

free rate. This gives

α = r∗ p − r p = r∗ p − β prmwhere r∗ p is adjusted for various transaction costs, where the most common is the costrelated to the spread.

2.2 Hidden Markov Models

Although initially introduced in the 1960’s, HMM first gained popularity in the late

1980’s. There are mainly two reasons for this; first the models are very rich in mathe-

matical structure and hence can form the theoretical basis for many applications. Sec-ond, the models, when applied properly, work very well in practice. [17] The term

HMM is more familiar in the speech recognition community and communication sys-

tems, but has during the last years gained acceptance in finance as well as economics

and management science. [16]

The theory of HMM deals with two things: estimation and control. The first in-

clude signal filtering, model parameter identification, state estimation, signal smooth-

ing, and signal prediction. The latter refers to selecting actions which effect the signal-

generating system in such a way as to achieve ceratin control objectives. Essentially,

the goal is to develop optimal estimation algorithms for HMMs to filter out the ran-

dom noise in the best possible way. The use of HMMs is also motivated by empirical

studies that favors Markov-switching models when dealing with macroeconomic vari-

ables. This provides flexibility to financial models and incorporates stochastic volatil-ity in a simple way. Early works, proposing to have an unobserved regime following

a Markov process, where the shifts in regimes could be compared to business cycles,

stock prices, foreign exchange, interest rates and option valuation. The motive for a

regime-switching model is that the market may switch from time to time, between for

example periods of high and low volatility.

The major part of this section is mainly gathered from [17] if nothing else is stated.

This article is often used as a main reference by other authors due to its thorough

description of HMMs in general.

2.2.1 Hidden Markov Models used in Finance

Previous applications in the field of finance where HMMs have been used range all

the way from pricing of options and variance swaps and valuation of life insurancespolicies to interest rate theory and early warning systems for currency crises. [16] In

[14] the author uses hidden Markov models when pricing bonds through considering

a diffusion model for the short rate. The drift and the diffusion parameters are here

modulated by an underlying hidden Markov process. In this way could the value of the

short rate successfully be predicted for the next time period.

HMMs has also, with great success, been used on its own or in combination with

e.g. GMMs or artificial neural networks for prediction of financial time series, as equity

indices such as the S&P 500. [18, 9] In these cases the authors has predicted the rate

of return for the indices during the next time step and thereby been able to create an

accurate trading signal.


30/95


The wide range of applications together with the proven functionality, in both fi-

nance and other communities such as speak recognition, and the flexible underlying

mathematical model is clearly appealing.

2.2.2 Bayes Theorem

A HMM is a statistical model in which the system being modeled is assumed to be a

Markov process with unknown parameters, and the challenge is to determine the hidden

parameters from the observable parameters. The extracted model parameters can then

be used to perform further analysis, for example for pattern recognition applications.

A HMM can be considered as the simplest dynamic Bayesian network, which is a

probabilistic graphical model that represents a set of variables and their probabilistic

independencies, and where the variables appear in a sequence.

Bayesian probability is an interpretation of the probability calculus which holds

that the concept of probability can be defined as the degree to which a person (orcommunity) believes that a proposition is true. Bayesian theory also suggests that

Bayes theorem can be used as a rule to infer or update the degree of belief in light of

new information.

The probability of an event A conditional on event B is generally different from theprobability of B conditional on A. However, there is a definite relationship betweenthe two, and Bayes’ theorem is the statement of that relationship.

To derive the theorem, the definition of conditional probability used. The probabil-

ity of event A given B is

P (A|B) = P (A ∩ B)P (B)

.

Likewise, the probability of event B, given event A is

P (B|A) = P (B ∩A)P (A)

.

Rearranging and combining these two equations, one find

P (A|B)P (B) = P (A ∩ B) = P (B ∩A) = P (B|A)P (A).This lemma is sometimes called the product rule for probabilities. Dividing both sides

by P (B), given P (B) = 0, Bayes’ theorem is obtained:

P (A|B) = P (B|A)P (A)P (B)

There is also a version of Bayes’ theorem for continuous distributions. It is some-

what harder to derive, since probability densities, strictly speaking, are not probabili-

ties, so Bayes’ theorem has to be established by a limit process. Bayes’ theorem forprobability density functions is formally similar to the theorem for probabilities:

f (x|y) = f (x, y)f (y)

= f (y|x)f (x)

f (y)

and there is an analogous statement of the law of total probability:

f (x|y) = f (y|x)f (x) ∞−∞

f (y|x)f (x)dx .

The notation have here been somewhat abused, using f for each one of these terms,although each one is really a different function; the functions are distinguished by the

names of their arguments.


31/95


2.2.3 Markov Chains

A Markov chain, sometimes also referred to as an observed Markov Model, can be

seen as a weighted finite-state automaton, which is defined by a set of states and a

set of transitions between the states based on the observed input. In the case of the

Markov chain the weights on the arcs going between the different states can be seen as

probabilities of how likely it is that a particular path is chosen. The probabilities on the

arcs leaving a node (state) must all sum up to 1. In figure 2.1 there is a simple example

of how this could work.

s1: Sunny

s2: Cloudy

s3: Rainy

a 1 2 =

0. 2

a13 = 0.4 a 2 1 =

0. 3

a 2 3 =

0 . 5

a31 = 0.25

a 3 2 =

0 . 2 5

a11 = 0.4

a22 = 0.2

a33 = 0.5

Figure 2.1: A simple example of a Markov chain explaining the weather. a ij, found onthe arcs going between the nodes, represents the probability of going from state s i tostate sj .

In the figure a simple model of the weather is set up, specified by the three states;

sunny, cloudy and rainy. Given that the weather is rainy (state 3) on day 1 (t = 1),what is the probability that the three following days will be sunny? Stated more formal,

one have an observation sequence O = {s3, s1, s1, s1} for t = 1, 2, 3, 4, and wish todetermine the probability of O given the model in figure 2.1. The probability is given

by:

P (O|Model) = P (s3, s1, s1, s1|Model) == P (s3) · P (s1|s3) · P (s1|s1) · P (s1|s1) == 1 · 0.25 · 0.4 · 0.4 = 0.04

For a more formal description, the Markov chain is specified, as mentioned above,

by:


32/95


S =

{s1, s2,...,sN

} a set of N states,

A = {a11, a12,...,aNN } a transition probability matrix A, where each a ijrepresents the probability of moving from state ito state j , with

N j=1 aij = 1, ∀i,

Π = {π1, π2,...,πN } an initial probability distribution, where π i indi-cates the probability of starting in state i. Also,N

i=1 πi = 1.

Instead of specifying Π, one could use a special start node not associated withthe observations, and with outgoing probabilities a start,i = πi as the probabilities of going from the start state to state i, and astart,start = ai,start = 0, ∀i. The timeinstants associated with state changes are defined as as t = 1, 2,... and the actual stateat time t as q t.

An important feature of the Markov chain is its assumptions about the probabilities.In a first-order Markov chain, the probability of a state only depends on the previous

state, that is:

P (q t|q t−1,...,q 1) = P (q t|q t−1)

Markov chains, where the probability of moving between any two states are non-

zero, are called fully-connected or ergodic. But this is not always the case; in for ex-

ample a left-right (also called Bakis) Markov model there are no transitions going from

a higher-numbered to a lower-numbered state. The way the trellis is set up depends on

the given situation.

The Markov Property Applied in Finance

Stock prices are often assumed to follow a Markov process. This means that the present

value is all that is needed for predicting the future, and that the past history and the

taken path to today’s value is irrelevant. Considering that equity and currency are

both financial assets, traded under more or less the same conditions, it would not seem

farfetched assuming that currency prices also follows a Markov process.

The Markov property of stock prices is consistent with the weak form of market

efficiency, which states that the present price contains all information contained in

historical prices. This implies that using technical analysis on historical data would

not generate an above-average return, and the strongest argument for weak-form mar-

ket efficiency is the competition on the market. Given the many investors following

the market development, any opportunities rising would immediately be exploited andeliminated. But the discussion about market efficiency is heavily debated and the view

presented here is seen somewhat from an academic viewpoint. [12]

2.2.4 Extending the Markov Chain to a Hidden Markov Model

A Markov chain is useful when one want to compute the probability of a particular

sequence of events, all observable in the world. However, the events that are of interest

might not be directly observable, and this is where HMM comes in handy.

Extending the definition of Markov chains presented in the previous section gives:


33/95


S =

{s1, s2,...,sN

} a set of N hidden states,

Q = {q 1, q 2,...,q T } a state sequence of length T taking values from S ,O = {o1, o2,...,oT } an observation sequence consisting of T obser-

vations, taking values from the discrete alphabet

V = {v1, v2,...,vM },A = {a11, a12,...,aNN } a transition probability matrix A, where each a ij

represents the probability of moving from state s ito state sj , with

N j=1 aij , ∀i,

B = bi(ot) a sequence of observation likelihoods, also calledemission probabilities, expressing the probability

of an observation ot being generated from a statesi at time t,

Π =

{π1, π2,...,πN

} an initial probability distribution, where π i indi-

cates the probability of starting in state s i. Also,N i=1 πi = 1.

As before, the time instants associated with state changes are defined as as t =1, 2,... and the actual state at time t as q t. The notation λ = {A,B, Π} is also intro-duced, which indicates the complete parameter set of the model.

A first-order HMM makes two assumptions; first, as with the first-order Markov

chain above, the probability of a state is only dependent on the previous state:

P (q t|q t−1,...,q 1) = P (q t|q t−1)Second, the probability of an output observation o t is only dependent on the state

that produced the observation, q t, and not on any other observations or states:

P (ot|q t, q t−1,...,q 1, ot−1,...,o1) = P (ot|q t)To clarify what is meant by all this, a simple example based on one originally given

by [13] is here presented. Imagine that you are a climatologist in the year 2799 and

want to study how the weather was in 2007 in a certain region. Unfortunately you do

not have any records for this particular region and time, but what you do have is the

diary of a young man for 2007, that tells you how many ice creams he had every day.

Stated more formal, you have an observation sequence, O = {o 1, . . . , o365}, whereeach observation assumes a value from the discrete alphabet V = {1, 2, 3}, i.e. thenumber of ice creams he had every day. Your task is to find the ”correct” hidden state

sequence, Q = {q 1, . . . , q 365}, with the possible states sunny (s1) and rainy (s2), thatcorresponds to the given observations. Say for example that you know that he had one

ice cream at time t − 1, three ice creams at time t and two ice creams at time t + 1.The most probable hidden state sequence for these three days might for example be

q t−1 = s2, q t = s1 and q t+1 = s2, given the number of eaten ice creams. In otherwords, the most likely weather during these days would be rainy, sunny and rainy. An

example of how this could look is presented in figure 2.2.

2.2.5 Three Fundamental Problems

The structure of the HMM should now be clear, which leads to the question: in what

way can HMM be helpful? [17] suggests in his paper that HMM should be character-

ized by three fundamental problems:


34/95


s1

s2

s1

s2

s1

s2

ot−1 = 1

ot−1 = 2

ot−1 = 3

ot = 1

ot = 2

ot = 3

ot+1 = 1

ot+1 = 2

ot+1 = 3

bj(ot) = P (ot|q t = sj)

aij(t) = P (q t+1 = sj |q t = si)

t

Figure 2.2: A HMM example, where the most probable state path ( s 2, s1, s2) is out-lined. aij states the probability of going from state s i to state sj , and bj(ot) the proba-bility of a specific observation at time t, given state s j .

Problem 1 - Computing likelihood: Given the complete parameter set λ and an ob-servation sequence O, determine the likelihood P (O|λ).

Problem 2 - Decoding: Given the complete parameter set λ and an observation se-quence O, determine the best hidden sequence Q.

Problem 3 - Learning: Given an observation sequence O and the set of states in theHMM, learn the HMM λ.

In the following three subsections these problems will be gone through thoroughly

and how they can be solved. This to give the reader a clear view of the underlying cal-

culation techniques that constitutes the base of the evaluated mathematical framework.

It should also describe in what way the framework of HMMs can be used for parameter

estimation in the standard case.

Computing LikelihoodThis is an evaluation problem, which means that given a model and a sequence of ob-

servations, what is the probability that the observations was generated by the model.

This information can be very valuable when choosing between different models want-

ing to know which one that best matches the observations.

To find a solution to problem 1, one wish to calculate the probability of a given

observation sequence, O = {o1, o2,...,oT }, given the model λ = {A,B, Π}. In otherwords one want to find P (O|λ). The most intuitive way of doing this is to enumerateevery possible state sequence of length T . One such state sequence is

Q = {q 1, q 2, . . . , q T } (2.1)


35/95


where q 1 is the initial state. The probability of observation sequence O given a statesequence such as 2.1 can be calculated as

P (O|Q, λ) =T t=1

P (ot|q t, λ) (2.2)

where the different observations are assumed to be independent. The property of inde-

pendence makes it possible to calculate equation 2.2 as

P (O|Q, λ) = bq1(o1) · bq2(o2) · . . . · bqT (oT ). (2.3)

The probability of such a sequence can be written as

P (Q|λ) = πq1 · aq1q2aq2q3 · . . . · aqT −1qT . (2.4)

The joint probability of O and Q, the probability that O an Q occurs simultaneously,is simply the product of 2.3 and 2.4 as

P (O, Q|λ) = P (O|Q, λ)P (Q|λ). (2.5)

Finally the probability of O given the model λ is calculated by summing the right handside of equation 2.5 over all possible state sequences Q

P (O|λ) = q1,q2,...,qT P (O|Q, λ)P (Q|λ) ==

q1,q2,...,qT πq1bq1(o1)aq1q2bq2(o2) . . . aqT −1qT bqT (oT ).

(2.6)

Equation 2.6 says that one at the initial time t = 1 are in state q 1 with probabilityπq1 , and generate the observation o1 with probability bq1(o1). As time ticks from t tot + 1 (t = 2) one transform from state q 1 to q 2 with probability aq1q2 , and generateobservation o2 with probability bq2(o2) and so on until t = T .

This procedure involves a total of 2T N T calculations, which makes it unfeasible,even for small values of N and T . As an example it takes 2 · 100 · 5 100 ≈ 1072calculations for a model with 5 states and 100 observations. Therefor it is needed to find

a more efficient way of calculating P (O|λ). Such a procedure exists and is called theForward-Backward Procedure.1 For initiation one need to define the forward variable

as

αt(i) = P (o1, o2, . . . , ot, q t = si|λ).In other words, the probability of the partial observation sequence, o 1, o2, . . . , ot untiltime t and given state s

i at time t. One can solve for α

t(i) inductively as follows:

1. Initialization:

α1(i) = πibi(o1), 1 ≤ i ≤ N. (2.7)

2. Induction:

αt+1( j) =N

j=1 αt(i)aij

bj(ot+1), 1 ≤ t ≤ T − 1,

1 ≤ j ≤ N.(2.8)

1The backward part of the calculation is not needed to solve Problem 1. It will be introduced whensolving Problem 3 later on.


36/95


3. Termination:

P (O|λ) =N i=1

αT (i). (2.9)

Step 1 sets the forward probability to the joint probability of state s j and initial obser-vation o1. The second step, which is the heart of the forward calculation is illustratedin figure 2.3.

aNj

a 1 j

sj

sN

s3

s2

s1

αt(i)t

αt+1( j)

t + 1

Figure 2.3: Illustration of the sequence of operations required for the computation of

the forward variable αt(i).

One can see that state sj at time t + 1 can be reached from N different states attime t. By summing the product over all possible states s i, 1 ≤ i ≤ N at time t resultsin the probability of s j at time t + 1 with all previous observations in consideration.

Once it is calculated for sj , it is easy to see that αt+1( j) is obtained by accounting forobservation ot+1 in state sj , in other words by multiplying the summed value by theprobability bj(ot+1). The computation of 2.8 is performed for all states s j , 1 ≤ j ≤ N ,for a given time t and iterated for all t = 1, 2, . . . , T − 1. Step 3 then gives P (O|λ) bysumming the terminal forward variables αT (i). This is the case because, by definition

αT (i) = P (o1, o2, . . . , oT , q T = si|λ)

and therefore P (O|λ) is just the sum of the αT (i)’s. This method just needs N 2T which is much more efficient than the more traditional method. Instead of 10 72 cal-culations, a total of 2500 is enough, a saving of about 69 orders of magnitude. In the

next two sections — one can see that this decrease of magnitude is essential because

P (O

|λ) serves as the denominator when estimating the central variables when solving

the last two problems.In a similar manner, one can consider a backward variable β t(i) defined as follows:

β t(i) = P (ot+1, ot+2, . . . , oT |q t = si, λ)

β t(i) is the probability of the partial observation sequence from t + 1 to the last time,T , given the state si at time t and the HMM λ. By using induction, β t(i) is found asfollows:

1. Initialization:

β T (i) = 1, 1 ≤ i ≤ N.


37/95


2. Induction:

β t(i) = N

j=1 aijbj(ot+1)β t+1( j), t = T − 1, T − 2, . . . , 1,1 ≤ i ≤ N.

Step 1 defines β T (i) to be one for all si. Step 2, which is illustrated in figure 2.4, showsthat in order to to have been in state s i at time t, and to account for the observationsequence from time t + 1 and on, one have to consider all possible states s j at timet + 1, accounting for the transition from si to sj as well as the observation ot+1 in statesj , and then account for the remaining partial observation sequence from state s j .

a i N

a i 1

t t + 1

β t(i) β t+1( j)

si

sN

s3

s2

s1


the backward variable β t(i).

As mentioned before the backward variable is not used to find the probability

P (O|λ). Later on it will be shown how the backward as well as the forward calcula-

tion are used extensively to help one solve the second as well as the third fundamental

problem of HMMs.

Decoding

In this the second problem one try to find the ”correct” hidden path, i.e. trying to

uncover the hidden path. This is often used when one wants to learn about the structure

of the model or to get optimal state sequences.

There are several ways of finding the ”optimal” state sequence according to a given

observation sequence. The difficulty lies in the definition of a optimal state sequence.

One possible way is to find the states q t which are individually most likely. This criteriamaximizes the total number of correct states. To be able to implement this as a solution

to the second problem one start by defining the variableγ t(i) = P (q t = si|O, λ) (2.10)

which gives the probability of being in state s i at time t given the observation sequence,O, and the model, λ. Equation 2.10 can be expressed simply using the forward andbackward variables, αt(i) and β t(i) as follows:

γ t(i) = αt(i)β t(i)

P (O|λ) = αt(i)β t(i)N i=1 αt(i)β t(i)

. (2.11)

It is simple to see that γ t(i) is a true probability measure. This since αt(i) accountsfor the partial observation sequence o1, o2, . . . , ot and the state si at t, while β t(i)


38/95


accounts for the remainder of the observation sequence o t+1, ot+2, . . . , oT given state

si at time t. The normalization factor P (O|λ) = N i=1 αt(i)β t(i) makes γ t(i) aprobability measure, which means thatN i=1

γ t(i) = 1.

One can now find the individually most likely state q t at time t by using γ t(i) asfollows:

q t = argmax1≤i≤N [γ t(i)], 1 ≤ t ≤ T . (2.12)Although equation 2.12 maximizes the expected number of correct states there

could be some problems with the resulting state sequence. For example, when the

HMM has state transitions which has zero probability the optimal state sequence may,

in fact, not even be a valid state sequence. This is due to the fact that the solutionof 2.12 simply determines the most likely state at every instant, without regard to the

probability of occurrence of sequences of states.

To solve this problem one could modify the optimality criterion. For example by

solving for the state sequence that maximizes the number of correct pairs of states

(q t, q t+1) or triples of states (q t, q t+1, q t+2).The most widely used criterion however, is to find the single best state sequence,

in other words to maximize P (Q|O, λ) which is equivalent to maximizing P (Q, O|λ).To find the optimal state sequence one often uses a method, based on dynamic pro-

gramming, called the Viterbi Algorithm.

To find the best state sequence Q = {q 1, q 2, . . . , q T } for a given observation se-quence O = {o1, o2, . . . , oT }, one need to define the quantity

δ t(i) = maxq1,q2,...,qt−1 P [q 1, q 2, . . . , q t = si, o1, o2, . . . , ot|λ]which means the highest probability along a single path, at time t, which accounts forthe first t observations and ends in state si. By induction one have:

δ t+1( j) = [maxi

δ t(i)aij]bj(ot+1). (2.13)

To be able to retrieve the state sequence, one need to keep track of the argument

which maximized 2.13, for each t and j. This is done via the array ψ t( j). The completeprocedure for finding the best state sequence can now be stated as follows:

1. Initialization:δ 1(i) = πibi(o1), 1

≤ i

≤ N

ψ1(i) = 0.

2. Recursion:

δ t( j) = max1≤i≤N [δ t−1(i)aij]bj(ot), 2 ≤ t ≤ T 1 ≤ j ≤ N

ψt( j) = argmax1≤i≤N [δ t−1(i)aij], 2 ≤ t ≤ T 1 ≤ j ≤ N

(2.14)

3. Termination:P ∗ = max1≤i≤N [δ T (i)]q ∗T = argmax1≤i≤N [δ T (i)].

(2.15)


39/95


4. Path (state sequence) backtracking:

q ∗t = ψt+1(q ∗t+1), t = T − 1, T − 2, . . . , 1.

It is important to note that the Viterbi algorithm is similar in implementation to

the forward calculation in 2.7 - 2.9. The major difference is the maximization in 2.14

over the previous states which is used in place of the summing procedure in 2.8. It

should also be clear that a lattice structure efficiently implements the computation of

the Viterbi procedure.

Learning

The third, last, and at the same time most challenging problem with HMMs is to de-

termine a method to adjust the model parameters λ =

{A,B, Π

} to maximize the

probability of the observation sequence given the model. There is no known analyticalsolution to this problem, but one can however choose λ such that P (O|λ) is locallymaximized using an iterative process such as the Baum-Welch method, a type of Ex-

pectation Maximization (EM) algorithm. Options to this is to use a form of Maximum

A Posteriori (MAP) method, explained in [7], or setting up the problem as an opti-

mization problem, which can be solved with for example gradient technique, which

can yield solutions comparable to those of the aforementioned method. But the Baum-

Welch is however the most commonly used procedure, and will therefore be the one

utilized in this the first investigation of HMM on FX data.

To be able to re-estimate the model parameters, using the Baum-Welch method,

one should start with defining ξ t(i, j), the probability of being in state s i at time t, andstate sj at time t + 1, given the model and the observations sequence. In other words

the variable can be defined as:

ξ t(i, j) = P (q t = si, q t+1 = sj|O, λ).

The needed information for the variable ξ t(i, j) is shown in figure 2.5.

si sj

aijbj(ot+1)

t − 1 t t + 1 t + 2αt(i) β t+1( j)


the joint event that the system is in state si at time t and state sj at time t + 1.

From this figure one should be able to understand that ξ t(i, j) can be written using


40/95


the forward and backward variables as follows:

ξ t(i, j) = αt(i)aijbj(ot+1)β t+1( j)P (O|λ) =

= αt(i)aijbj(ot+1)β t+1( j)N i=1

N j=1 αt(i)aijbj(ot+1)β t+1( j)

which is a probability measure since the numerator is simply P (q t = si, q t+1 =sj , O|λ) and denominator is P (O|λ).

As described in 2.11, γ t(i) is the probability of being in state si at time t, given theobservation sequence and the model. Therefore there is a close relationship between

γ t(i) and ξ t(i, j). One can express γ t(i) as the sum of al

algo trading hidden markov models on foreign exchange data

Documents