Financial Time Series (math.chalmers.se)

Financial Time Series

Lecture 7

Regime Switching

• We sometimes talk about bull and bear markets

• By a bull market we mean a market where prices are increasing and volatility is relatively low

• By a bear market we mean a market where prices are decreasing and volatility is relatively high

Regime Switching

• So if we would like to model log-returns under the assumption that the market switches between the bull and bear states, we let

$$ r_t = \begin{cases} \mu_{bull} + \sigma_{bull}\,\varepsilon_t, & s_t = bull \\ \mu_{bear} + \sigma_{bear}\,\varepsilon_t, & s_t = bear \end{cases} $$

• Here $s_t$ denotes the state of the market at time $t$ and $\varepsilon_t$ is i.i.d. $N(0,1)$
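The two-state model above can be simulated in a few lines of Python; the drifts, volatilities and transition probabilities below are illustrative values chosen for the sketch, not estimates from data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (made-up) parameters: bull = state 0, bear = state 1
mu    = np.array([0.0007, -0.0086])   # drifts mu_bull, mu_bear
sigma = np.array([0.008, 0.025])      # volatilities sigma_bull, sigma_bear
P = np.array([[0.99, 0.01],           # P[i, j] = P(s_t = j | s_{t-1} = i)
              [0.11, 0.89]])

T = 1000
s = np.empty(T, dtype=int)
r = np.empty(T)
s[0] = 0
r[0] = mu[s[0]] + sigma[s[0]] * rng.standard_normal()
for t in range(1, T):
    s[t] = rng.choice(2, p=P[s[t-1]])               # Markovian switch
    r[t] = mu[s[t]] + sigma[s[t]] * rng.standard_normal()

# Bear periods should look visibly more volatile than bull periods
print("bull std:", r[s == 0].std(), "bear std:", r[s == 1].std())
```

With the persistent bull state ($p_{00} = 0.99$) the simulated path shows long calm stretches interrupted by short, volatile bear episodes.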

Regime Switching

• Typically we cannot tell which state the market is in, but using filtering techniques and assumptions on the switching mechanism we can estimate the switching probabilities, drifts and volatilities

• We may assume that the switching between the states is Markovian, i.e. the probability of switching from one state to another depends only on the state the market is in at the time of switching, and not on which states the market has been in, or on the returns, at previous time points:

$$ P(s_t \mid s_{t-1}) = P(s_t \mid s_{t-1}, s_{t-2}, \ldots, r_{t-1}, r_{t-2}, \ldots) $$

Regime Switching

• If we denote the market states 0 and 1 we let

$$ p_{00} = P(s_t = 0 \mid s_{t-1} = 0), \qquad p_{01} = P(s_t = 1 \mid s_{t-1} = 0), $$
$$ p_{10} = P(s_t = 0 \mid s_{t-1} = 1), \qquad p_{11} = P(s_t = 1 \mid s_{t-1} = 1) $$

• Clearly $p_{00} = 1 - p_{01}$ and $p_{11} = 1 - p_{10}$

Estimation

• So given a series of observations $r_1, \ldots, r_T$ we want to estimate the parameters $\theta = (p_{00}, p_{11}, \mu_0, \mu_1, \sigma_0, \sigma_1)$

• Since we cannot tell which state we are in at a given time point, it is not obvious how the estimation can be done, but we may formally write the likelihood function

$$ L(\theta) = f(r_1 \mid \theta)\, f(r_2 \mid \theta, r_1) \cdots f(r_T \mid \theta, r_1, \ldots, r_{T-1}) $$

where, conditional on the state $s_t = s$, $f$ is the density of an $N(\mu_s, \sigma_s^2)$ random variable

Estimation

• So the contribution of $r_t$ to the log-likelihood is

$$ \log f(r_t \mid \theta, r_1, \ldots, r_{t-1}) $$

• Using conditional probabilities and the Markov property, we can write (exercise)

$$ f(s_t, s_{t-1}, r_t \mid \theta, r_1, \ldots, r_{t-1}) = f(s_{t-1} \mid \theta, r_1, \ldots, r_{t-1})\, f(s_t \mid s_{t-1}, \theta)\, f(r_t \mid s_t, \theta) $$

Estimation

• Above, $f(s_t \mid s_{t-1}, \theta)$ is the switching probability and

$$ f(r_t \mid s_t, \theta) = \frac{1}{\sqrt{2\pi}\,\sigma_{s_t}} \exp\!\left( -\frac{1}{2} \left( \frac{r_t - \mu_{s_t}}{\sigma_{s_t}} \right)^{\!2} \right) $$

• The function $f(s_{t-1} \mid \theta, r_1, \ldots, r_{t-1})$ is given by

$$ \frac{ f(s_{t-1}, s_{t-2}=0, r_{t-1} \mid \theta, r_1, \ldots, r_{t-2}) + f(s_{t-1}, s_{t-2}=1, r_{t-1} \mid \theta, r_1, \ldots, r_{t-2}) }{ f(r_{t-1} \mid \theta, r_1, \ldots, r_{t-2}) } $$

Estimation

• So we get

$$ f(r_t \mid \theta, r_1, \ldots, r_{t-1}) = \sum_{i=0}^{1} \sum_{j=0}^{1} f(s_t = i, s_{t-1} = j, r_t \mid \theta, r_1, \ldots, r_{t-1}) $$

• To start the recursion we may let

$$ f(s_1 = 0, r_1 \mid \theta) = \frac{1}{2} \cdot \frac{1}{\sqrt{2\pi}\,\sigma_0} \exp\!\left( -\frac{1}{2} \left( \frac{r_1 - \mu_0}{\sigma_0} \right)^{\!2} \right) $$

$$ f(s_1 = 1, r_1 \mid \theta) = \frac{1}{2} \cdot \frac{1}{\sqrt{2\pi}\,\sigma_1} \exp\!\left( -\frac{1}{2} \left( \frac{r_1 - \mu_1}{\sigma_1} \right)^{\!2} \right) $$

i.e. we give each initial state the prior probability $1/2$

Estimation

• This gives

$$ f(r_1 \mid \theta) = f(s_1 = 0, r_1 \mid \theta) + f(s_1 = 1, r_1 \mid \theta) $$

• and

$$ f(s_1 = 0 \mid \theta, r_1) = \frac{f(s_1 = 0, r_1 \mid \theta)}{f(r_1 \mid \theta)}, \qquad f(s_1 = 1 \mid \theta, r_1) = \frac{f(s_1 = 1, r_1 \mid \theta)}{f(r_1 \mid \theta)} $$

Estimation

• Next we get

$$ f(r_2 \mid \theta, r_1) = \sum_{i=0}^{1} \sum_{j=0}^{1} f(s_1 = i, s_2 = j, r_2 \mid \theta, r_1) $$

• where

$$ f(s_1 = i, s_2 = j, r_2 \mid \theta, r_1) = f(s_1 = i \mid \theta, r_1)\, p_{ij}\, \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\!\left( -\frac{1}{2} \left( \frac{r_2 - \mu_j}{\sigma_j} \right)^{\!2} \right) $$

• And so forth…
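The filtering recursion above can be sketched in Python (rather than MATLAB). The function names `norm_pdf` and `log_likelihood` are illustrative, and the equal $1/2$ prior on the initial state follows the initialization on the slides.

```python
import math

def norm_pdf(x, mu, sigma):
    """Density of a N(mu, sigma^2) random variable."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)

def log_likelihood(r, p00, p11, mu, sigma):
    """Log-likelihood of a two-state Markov-switching model, computed by
    summing the joint densities f(s_{t-1}=i, s_t=j, r_t | ...) over states
    and starting from equal initial state probabilities."""
    P = [[p00, 1 - p00], [1 - p11, p11]]   # P[i][j] = P(s_t = j | s_{t-1} = i)
    # f(s_1 = i, r_1 | theta) with prior 1/2 on each state
    joint = [0.5 * norm_pdf(r[0], mu[i], sigma[i]) for i in range(2)]
    f_r = sum(joint)                        # f(r_1 | theta)
    ll = math.log(f_r)
    filt = [joint[i] / f_r for i in range(2)]   # f(s_1 = i | theta, r_1)
    for t in range(1, len(r)):
        joint = [sum(filt[i] * P[i][j] for i in range(2))
                 * norm_pdf(r[t], mu[j], sigma[j]) for j in range(2)]
        f_r = sum(joint)                    # f(r_t | theta, r_1, ..., r_{t-1})
        ll += math.log(f_r)
        filt = [joint[j] / f_r for j in range(2)]
    return ll
```

The negative of this function can then be handed to a numerical optimizer over $\theta$ (in MATLAB `fminsearch`/`fmincon`; in Python e.g. `scipy.optimize.minimize`).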

Estimation

• So, the maximization of the likelihood function cannot be done analytically, but using standard routines like fminsearch or fmincon in MATLAB we can find our parameter estimates

• There is a MATLAB package by Marcelo Perlin called MS_Regress, available online for free, that does the parameter estimation and then some…

For the ^N225

• By smoothed probabilities in the above plot we mean

$$ f(s_t \mid \theta, r_1, \ldots, r_T) $$

• The parameter estimates are $p_{bull,bull} = 0.99$, $p_{bear,bear} = 0.89$, $\mu_{bull} = 0.0007$, $\mu_{bear} = -0.0086$, $\sigma_{bull} = 0.00015$, $\sigma_{bear} = 0.0011$

• As expected, the bear volatility is higher than the bull volatility

Non-parametric models

• What if we do not make any distributional assumptions?

• Assume that $r_t$ and $x_t$ are two time series whose relationship we want to explore

• Maybe it is possible to fit a model

$$ r_t = m(x_t) + a_t $$

where $m$ is some smooth function to be estimated from the data

Non-parametric models

• If we had independent observations $r_1, \ldots, r_T$ for a fixed $x_t = x$, then we could write

$$ \frac{1}{T} \sum_{t=1}^{T} r_t = m(x) + \frac{1}{T} \sum_{t=1}^{T} a_t $$

• For sufficiently large $T$ the mean of the noise will be close to zero (by the law of large numbers), so in this case

$$ \hat{m}(x) = \frac{1}{T} \sum_{t=1}^{T} r_t $$

Non-parametric models

• In financial applications we will typically not have data as above. Rather, we will have pairs of observations

$$ (r_1, x_1), \ldots, (r_T, x_T) $$

• But if the function $m$ is sufficiently smooth, then a value of $r_t$ for which $x_t \approx x$ will still give a good approximation of $m(x)$

• A value of $r_t$ for which $x_t$ is not close to $x$ will give a less accurate approximation of $m(x)$

Non-parametric models

• So instead of a simple average, we use a weighted average

$$ \hat{m}(x) = \frac{1}{T} \sum_{t=1}^{T} w_t(x)\, r_t $$

where the weights $w_t(x)$ are large for those $r_t$ with $x_t$ close to $x$, and small for those $r_t$ with $x_t$ not close to $x$

Non-parametric models

• Above we assume that $\sum_{t=1}^{T} w_t(x) = T$

• One may also treat $\frac{1}{T}$ as part of the weights and work under the assumption $\sum_{t=1}^{T} w_t(x) = 1$

• A construction like the one at hand, where the weights depend on a measure of the distance between $x_t$ and $x$, and the size of a weight depends on that distance, may be referred to as a local weighted average

Kernel regression

• One way of finding appropriate weights is to use kernels $K(x)$, which are typically probability density functions:

$$ K(x) \ge 0, \qquad \int_{-\infty}^{\infty} K(x)\, dx = 1 $$

• For flexibility we will allow scaling of the kernel using a "bandwidth" $h$:

$$ K_h(x) = \frac{1}{h} K\!\left( \frac{x}{h} \right), \qquad \int_{-\infty}^{\infty} K_h(x)\, dx = 1 $$

• The weights may be defined as

$$ w_t(x) = \frac{K_h(x - x_t)}{\sum_{t=1}^{T} K_h(x - x_t)} $$

Nadaraya-Watson

• The N-W kernel estimator (Nadaraya and Watson, 1964) is given by

$$ \hat{m}(x) = \sum_{t=1}^{T} w_t(x)\, r_t = \frac{ \sum_{t=1}^{T} K_h(x - x_t)\, r_t }{ \sum_{t=1}^{T} K_h(x - x_t) } $$

• The kernel is often chosen to be the Gaussian,

$$ K_h(x) = \frac{1}{h\sqrt{2\pi}} \exp\!\left( -\frac{x^2}{2h^2} \right) $$

or the Epanechnikov,

$$ K_h(x) = \frac{3}{4h} \left( 1 - \frac{x^2}{h^2} \right) \mathbf{1}\{|x| \le h\} $$
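The estimator and both kernels can be sketched in Python; the function names are illustrative, not from any particular library.

```python
import math

def gaussian_kernel(u, h):
    """Gaussian kernel K_h(u) with bandwidth h."""
    return math.exp(-u * u / (2 * h * h)) / (h * math.sqrt(2 * math.pi))

def epanechnikov_kernel(u, h):
    """Epanechnikov kernel K_h(u): zero outside |u| <= h."""
    return 0.75 / h * (1 - (u / h) ** 2) if abs(u) <= h else 0.0

def nadaraya_watson(x, xs, rs, h, kernel=gaussian_kernel):
    """N-W estimate of m(x): kernel-weighted average of the r's."""
    weights = [kernel(x - xt, h) for xt in xs]
    s = sum(weights)
    if s == 0:                      # no observations within reach of x
        return float("nan")
    return sum(w * r for w, r in zip(weights, rs)) / s
```

For a compactly supported kernel such as the Epanechnikov, the estimate is undefined (here `nan`) at points $x$ with no observation within distance $h$.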

What does the bandwidth do?

• If we use the Epanechnikov kernel we get

$$ \hat{m}(x) = \frac{ \sum_{t=1}^{T} K_h(x - x_t)\, r_t }{ \sum_{t=1}^{T} K_h(x - x_t) } = \frac{ \sum_{t=1}^{T} \left( 1 - \frac{(x - x_t)^2}{h^2} \right) \mathbf{1}\{|x - x_t| \le h\}\, r_t }{ \sum_{t=1}^{T} \left( 1 - \frac{(x - x_t)^2}{h^2} \right) \mathbf{1}\{|x - x_t| \le h\} } $$

• If $h \to \infty$,

$$ \hat{m}(x) \to \frac{1}{T} \sum_{t=1}^{T} r_t $$

and if $h \to 0$,

$$ \hat{m}(x) \to r_t $$

where $r_t$ is the observation for which $|x - x_t|$ is smallest within the sample

Bandwidth Selection

• Fan and Yao (2003) suggest

$$ h = 1.06\, s\, T^{-1/5} $$

for the Gaussian kernel, and

$$ h = 2.34\, s\, T^{-1/5} $$

for the Epanechnikov kernel, where $s$ is the sample standard deviation of $x_t$, which is assumed stationary
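These rule-of-thumb bandwidths are simple to compute; the Python sketch below uses a hypothetical helper name and the sample standard deviation with the usual $T - 1$ divisor.

```python
import math

def rule_of_thumb_bandwidth(xs, kernel="gaussian"):
    """Fan & Yao (2003) rule-of-thumb bandwidth: c * s * T^(-1/5),
    with c = 1.06 for the Gaussian and c = 2.34 for the Epanechnikov kernel."""
    T = len(xs)
    mean = sum(xs) / T
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (T - 1))  # sample std
    c = 1.06 if kernel == "gaussian" else 2.34
    return c * s * T ** (-0.2)
```

The Epanechnikov constant is larger because that kernel has compact support, so a wider window is needed to use a comparable amount of data.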

Cross Validation

• Let

$$ \hat{m}_{h,j}(x_j) = \frac{1}{T - 1} \sum_{t \ne j} w_t(x_j)\, y_t $$

which is an estimate of $y_j$, where the weights sum to $T - 1$.

• Also let

$$ CV(h) = \frac{1}{T} \sum_{j=1}^{T} \big( y_j - \hat{m}_{h,j}(x_j) \big)^2\, W(x_j) $$

where $W(\cdot)$ is a nonnegative weight function satisfying $\sum_{j=1}^{T} W(x_j) = T$

Cross Validation

• The function $CV(h)$ is called the cross-validation function, since it validates the ability of the smoother $\hat{m}$ to predict $y_t$

• The weight function $W$ may be chosen to downweight certain observations if necessary, but $W(x_j) = 1$ is often sufficient

Cross Validation

• It is an exercise to show that

$$ CV(h) = \frac{1}{T} \sum_{j=1}^{T} \big( y_j - \hat{m}_{h,j}(x_j) \big)^2 = \frac{1}{T} \sum_{j=1}^{T} \frac{ \big( y_j - \hat{m}(x_j) \big)^2 }{ \left( 1 - \frac{K_h(0)}{\sum_{i=1}^{T} K_h(x_j - x_i)} \right)^{\!2} } $$
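A direct leave-one-out implementation of $CV(h)$ (with $W(x_j) = 1$) can be sketched as follows; the data and the grid of candidate bandwidths are hypothetical, and the $O(T^2)$ loop is kept for clarity rather than speed.

```python
import math

def kh_gauss(u, h):
    """Gaussian kernel K_h(u)."""
    return math.exp(-u * u / (2 * h * h)) / (h * math.sqrt(2 * math.pi))

def loo_cv(h, xs, ys):
    """Leave-one-out cross-validation score CV(h) for the N-W smoother,
    using normalized weights and W(x_j) = 1 for all j."""
    T = len(xs)
    total = 0.0
    for j in range(T):
        num = sum(kh_gauss(xs[j] - xs[t], h) * ys[t] for t in range(T) if t != j)
        den = sum(kh_gauss(xs[j] - xs[t], h) for t in range(T) if t != j)
        m_j = num / den                      # leave-one-out estimate of y_j
        total += (ys[j] - m_j) ** 2
    return total / T

# Choose h from a grid (hypothetical data)
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [0.1, 0.6, 0.9, 1.1, 0.8, 0.5, 0.2]
best_h = min([0.2, 0.4, 0.8, 1.6], key=lambda h: loo_cv(h, xs, ys))
```

In practice the shortcut identity on this slide avoids refitting the smoother $T$ times.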

N-W Volatility Estimation

• Assume that $r_t$ is our log-return series, centered at zero, so that

$$ E[r_t^2] = \sigma_t^2 $$

• We also assume that $r_t^2 = \sigma_t^2 + \varepsilon_t$, where $\varepsilon_t$ is white noise

N-W Volatility Estimation

• We may then use the N-W kernel estimator

$$ \hat{\sigma}_t^2 = \frac{ \sum_{i=1}^{t-1} K_h(t - i)\, r_i^2 }{ \sum_{i=1}^{t-1} K_h(t - i) } $$

• Below we use Gaussian kernels for OMXS30 data
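The one-sided volatility estimator above, which smooths past squared returns over time, can be sketched in Python (function name illustrative):

```python
import math

def kh_gauss(u, h):
    """Gaussian kernel K_h(u)."""
    return math.exp(-u * u / (2 * h * h)) / (h * math.sqrt(2 * math.pi))

def nw_volatility(r, h):
    """One-sided N-W volatility estimates: sigma_t^2 is a kernel-weighted
    average of the past squared returns r_1^2, ..., r_{t-1}^2."""
    sig2 = []
    for t in range(1, len(r)):      # estimate at times t = 2, ..., T (0-indexed)
        num = sum(kh_gauss(t - i, h) * r[i] ** 2 for i in range(t))
        den = sum(kh_gauss(t - i, h) for i in range(t))
        sig2.append(num / den)
    return sig2
```

Devolatized returns are then obtained as $r_t / \hat{\sigma}_t$, which is what the Ljung-Box test below is applied to.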

OMXS30 returns (100104-130412)

OMXS30 N-W volatility estimates

• CV gives $h = 23.26$ and the volatility estimates:

OMXS30 N-W volatility devolatized returns

The Ljung-Box null hypothesis of no autocorrelation cannot be rejected at the 5% level ($p = 0.4433$)

Autocorrelation for devolatized returns

Another application

• Remember the ARIMA example for the Swedish GDP

• What if we try to use N-W to model log GDP $p_t$ as

$$ p_t = m(p_{t-1}) + a_t $$

where $a_t$ is white noise

Log GDP

• Using CV we find $h = 0.0367$ to be the optimal bandwidth for the Gaussian kernel, and

Residuals

• The seasonality seen in the ARIMA example is still present, but we know how to deal with that