
Robust Estimation of Shape Constrained State Price Density Surfaces∗

Markus Ludwig§

First Version: August 30, 2012
This Version: March 25, 2013

Abstract

In order to better capture empirical phenomena, research on option price and implied volatility modeling increasingly advocates the use of nonparametric methods over simple functional forms. This, however, comes at a price, since such methods require dense observations to yield sensible results. Calibration is therefore typically performed using aggregate data. Ironically, the use of time-series data in turn limits the accuracy with which current observations can be modeled. We propose a novel approach that enables the use of flexible functional forms using daily data alone. The resulting estimators yield excellent fits and generalize well beyond available data, all the while respecting theory-imposed shape constraints. We demonstrate the numerical stability and the pricing performance of our method by approximating arbitrage-free implied volatility, price and state price density surfaces from S&P 500 options over a period of 12 years.

Keywords: implied volatility, state price density, shape constraints, neural networks

JEL classification: C14, C58, G13

Research on derivatives traditionally revolves around developing and refining methods for pricing and hedging. Recently, however, there has been increased interest in approaches that take market prices as given and focus on the information content priced in by means of supply and demand, cf. Xing et al. (2010), Conrad et al. (2013) and Ross (2013).

A popular method to analyze the sentiment of market participants is to use the seminal option pricing formula proposed by Black and Scholes (1973) and Merton (1973) to map observations from the space of prices to the space of implied volatilities. Implied volatilities correspond to the volatility inputs that equate model and market prices and, contrary to the assumptions underlying the model used to derive them, vary both with respect to strike prices and times to maturity. The well-known smile and term structure patterns highlight that the market does not consider asset prices to be log-normally distributed.

Another approach builds on the fact that, analogous to the distinction between implied and historic volatility, there also exists an implied density that inherently differs from the historic density of the underlying price process.

∗I would like to gratefully acknowledge the valuable comments by Markus Leippold, Francesco Audrino, Stephen Figlewski, Jens Carsten Jackwerth, Manfred Gilli and Robert Huitema.

§Department of Banking and Finance, University of Zurich, Plattenstrasse 14, 8032 Zurich, Switzerland

Breeden and Litzenberger (1978) show that the state price density can be obtained by differentiating the option price function twice with respect to strike. This density reflects the expectations of market participants regarding the evolution of the underlying asset as well as their risk preferences and, unlike implied volatility, is model-free.

Approaches to approximate option price functions can be classified as parametric or nonparametric. Parametric methods are structured techniques and rely on specific assumptions about the process generating the observable data. While stringent assumptions allow for easier calibration and facilitate both extrapolation and the incorporation of shape constraints, they pose the risk of misspecification, i.e., the estimator may fail to capture salient properties of the data. Nonparametric methods, in contrast, are data-driven and do not rely on strong assumptions about the underlying process. Their disadvantage is that they are typically not very effective on small samples and beyond the support of given data.

Recovering well-behaved state price densities from a set of option prices is a non-trivial exercise and poses several challenges: (i) smooth interpolation despite noisy and sparse data, (ii) extension of the density into the tails, i.e., beyond the range of observations, (iii) compliance with shape constraints following from no-arbitrage arguments, and (iv) accurate fits, both in-sample and out-of-sample.

http://ssrn.com/abstract=2138911





These issues are further exacerbated by the fact that the state price density corresponds to the second derivative of the function giving rise to observable prices. The quality of an estimated derivative is typically significantly worse than that of its corresponding primitive, since differentiation amplifies even minor irregularities in the estimated function. This issue is usually referred to as the curse of differentiation, cf. Aït-Sahalia and Lo (1998).

This paper proposes a novel approach that produces robust and adaptive implied volatility, price and state price density surfaces that satisfy theory-imposed shape constraints, such that both strike and calendar arbitrage are ruled out. Potential applications range from the pricing of non-traded or illiquid products to risk management, asset allocation and market timing, see e.g., Rosenberg (1998), Aït-Sahalia and Lo (2000), Kostakis et al. (2011). Option-implied densities also enable the analysis of how market participants respond to new information and how risk perceptions evolve, see e.g., Birru and Figlewski (2012).

Our work contributes to the existing literature along several dimensions. We propose a flexible approach that allows the incorporation of shape constraints in the model selection process of neural network modeling, show that excellent generalization can be achieved without resorting to time-series data, and obtain perfectly smooth surfaces over a fixed domain as opposed to maturity-wise strings.

The stable observational grid provided by robust and adaptive surfaces over a fixed domain in turn enables the analysis of how option-implied measures evolve over time.

The remainder of this paper is organized as follows. Section 1 provides an introduction to state price densities and discusses some key challenges. Section 2 introduces neural networks as a tool for function approximation and presents the main ideas behind our shape constrained networks. Section 3 presents the results of an empirical study based on S&P 500 options data and examines them in the context of existing literature. Section 4 concludes.

1 Option Prices and Implied Densities

Options are derivative instruments whose payoff depends on future states of an underlying asset. Option markets serve to integrate the expectations of market participants, as well as their perceptions regarding risk and ambiguity.

A widely used measure of market sentiment, calculated from observable prices of plain vanilla options, is implied volatility (IV). A related concept is the state price density (SPD), which captures the risk-neutral probabilities that the market assigns to the various possible states of the underlying asset upon expiry of the option.

Ross (1976) and Cox and Ross (1976) demonstrate that in a dynamically complete arbitrage-free market, the price of an option is given by the expected present value of the payoff computed under the SPD. Building on these results, Banz and Miller (1978) and Breeden and Litzenberger (1978) show that an explicit expression for the SPD can be obtained as the second derivative of the option price function with respect to strike prices. Harrison and Kreps (1979) prove the existence of a probability density for the underlying process such that the value of a call option $C(S_t, K, \tau, r, \delta)$ is given by

$$C(\cdot) = e^{-r\tau} \int_{0}^{+\infty} \max(S_T - K, 0)\, q(S_T)\, dS_T, \qquad (1)$$

where $S_t$ is the price of the asset at date $t$, $K$ the strike price, $q$ the state price density, $\tau$ the time to maturity, $T \equiv t + \tau$ the expiration date, $r$ the deterministic risk-free interest rate for that maturity, and $\delta$ the corresponding dividend yield of the asset. Following from (1), the state price density can be expressed as

$$q(S_T)\big|_{S_T = K} = e^{r\tau}\, \frac{\partial^2 C(\cdot)}{\partial K^2}. \qquad (2)$$
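Equation (2) can be illustrated numerically. The following minimal sketch approximates the second strike derivative by central differences on a uniform strike grid. As an assumption for illustration only, call prices are generated by the Black-Scholes formula with hypothetical parameters, so the recovered density should integrate to approximately one and have a mean close to the forward $S e^{(r-\delta)\tau}$. The helper names `bs_call` and `spd_from_calls` are hypothetical.

```python
import math
import numpy as np

def norm_cdf(x):
    """Standard normal CDF, vectorised via math.erf."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(np.asarray(x) / math.sqrt(2.0)))

def bs_call(S, K, tau, r, delta, sigma):
    """Black-Scholes price of a European call with continuous dividend yield delta."""
    d1 = (np.log(S / K) + (r - delta + 0.5 * sigma**2) * tau) / (sigma * np.sqrt(tau))
    d2 = d1 - sigma * np.sqrt(tau)
    return S * np.exp(-delta * tau) * norm_cdf(d1) - K * np.exp(-r * tau) * norm_cdf(d2)

def spd_from_calls(K, C, r, tau):
    """Breeden-Litzenberger, eq. (2): q(K) = exp(r*tau) * d^2C/dK^2.

    Uses central second differences on a uniform grid, so the first and
    last strikes are dropped.
    """
    dK = K[1] - K[0]
    d2C = (C[2:] - 2.0 * C[1:-1] + C[:-2]) / dK**2
    return K[1:-1], np.exp(r * tau) * d2C

# Hypothetical market: spot 100, 3 months, 2% rate, 1% dividend yield, 20% vol.
S, tau, r, delta, sigma = 100.0, 0.25, 0.02, 0.01, 0.2
K = np.linspace(20.0, 250.0, 2301)            # uniform grid with step 0.1
C = bs_call(S, K, tau, r, delta, sigma)
K_mid, q = spd_from_calls(K, C, r, tau)

dK = K[1] - K[0]
total_mass = np.sum(q) * dK                   # should be close to 1
mean_ST = np.sum(K_mid * q) * dK              # should be close to the forward
forward = S * np.exp((r - delta) * tau)
print(round(total_mass, 4), round(mean_ST, 3), round(forward, 3))
```

On noisy market quotes the same finite differences amplify quote errors, which is the curse of differentiation mentioned in the introduction.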

As noted by Cont (1997), the SPD should be viewed as a way of characterizing the prices of options on an asset, as opposed to a mathematical property of the underlying asset's stochastic process.

Related Work

Extracting well-behaved densities from a discrete set of option prices is a non-trivial exercise and poses multiple challenges, as evidenced by the considerable number of alternative approaches put forward in the literature. The field subsumes research on both implied volatility and option price modeling, but has additional and more stringent requirements, such as smoothness of the second derivative, support beyond available data and absence of static arbitrage.

Existing methods can be distinguished by whether they model the SPD directly, or first estimate the implied volatility or price function and then derive the corresponding density via the result of Breeden and Litzenberger (1978).

Within the first category we can discern the following approaches. Expansion methods yield approximations by augmenting known distributions with correction terms, cf. Jarrow and Rudd (1982) and Rubinstein (1998). Mixture methods achieve flexibility by mixing simpler distributions, cf. Bahra (1997) and Giacomini et al. (2008). Generalized distribution methods employ distributions with additional parameters, e.g., for skewness and kurtosis, making them more flexible, cf. Rosenberg (1998) and Lim et al. (2005). Maximum entropy methods optimize both the deviations from given observations and the cross-entropy to a prior distribution, cf. Buchen and Kelly (1996) and Stutzer (1996).



The latter category comprises the following approaches. Kernel smoothing methods approximate functions locally based on neighboring observations, cf. Pritsker (1998) and Aït-Sahalia and Lo (1998). Curve fitting methods fit prices or implied volatilities using flexible functional forms such as polynomials, cf. Shimko (1993) and Malz (1996), splines, cf. Campa et al. (1998) and Bliss and Panigirtzoglou (2002), or neural networks, cf. Herrmann and Narr (1997) and Garcia and Gencay (2000).

Another way to categorize the literature is according to whether the techniques are parametric or nonparametric.

Parametric approaches aim to provide tractable models using only a parsimonious number of parameters. While functional forms facilitate the estimation procedure and allow for easy extrapolation, they typically rely on strong assumptions regarding the data generating process and often lack the flexibility to fit the observable data.

Nonparametric methods, such as kernel smoothing and maximum entropy, are, however, data-driven and thus less restricted. Their main drawback pertains to the risk of overfitting the observable data instead of capturing the salient features of the underlying functional relationship. Overfitting in price or implied volatility space will lead to sharp spikes in the corresponding state price density. This risk is further amplified on small and sparse data sets. For a lucid discussion of small-sample, as opposed to asymptotic, properties in the context of shape constraints, see e.g., Aït-Sahalia and Duarte (2003).

Detailed reviews of the literature can be found in Bliss and Panigirtzoglou (2002), Jackwerth (2004) and Figlewski (2010), who concludes that none of the existing techniques is clearly superior. Expansion methods may give rise to negative tails, mixtures of lognormals tend to be unstable and exhibit tails that are too thin, maximum entropy distributions can be multimodal, while kernel methods suffer from slow convergence, cf. Cont (1997). By contrast, curve fitting methods, especially in implied volatility space, have been shown to yield stable densities that exhibit good fits, cf. Bliss and Panigirtzoglou (2002).

Fitting the implied volatility smile with a quadratic polynomial was originally proposed by Shimko (1993) and has gained wide acceptance among practitioners. In the context of option pricing, this approach is also known as practitioners' Black-Scholes or ad hoc Black-Scholes.

Modeling Challenges

One of the key challenges in modeling option data arises from the highly irregular data design. At any given time, only a limited number of maturities with a discrete set of strikes are available. Options appear in strings that are not evenly distributed and advance along the maturity dimension over time. The data are furthermore noisy, which poses a significant challenge in the context of estimating SPDs via curve fitting, since differentiation exacerbates noise and irregularities, cf. Rebonato (2004).

For small samples, asymptotic properties offer little to no guidance about the actual performance of an estimator, cf. Pritsker (1998). In order to mitigate the problems arising from finite observations, nonparametric methods often resort to aggregating data over time. For example, the kernel smoothing estimator of the call price surface in Aït-Sahalia and Lo (1998) is based on one year of options. Fan and Mancini (2009) aggregate data over one week to fit a two-dimensional local linear estimator. While data aggregation alleviates the problems related to small samples, it opens the door to nonstationarity and regime shift issues, cf. Aït-Sahalia and Duarte (2003). Furthermore, a surface estimator based on time-series data will not result in a true average, but in an amalgam that captures daily fluctuations along its term structure.

Kernel smoothing methods are also highly sensitive to the chosen bandwidth parameter and furthermore exhibit severe biases near the support boundary of observations, as well as in interior regions if data spacing is irregular. While local linear methods are more robust, they tend to be biased in regions of curvature, a phenomenon known as trimming the hills and filling the valleys. Local quadratic methods, in turn, are generally able to yield better fits, but re-introduce erratic behavior near the boundaries, cf. Hastie et al. (2009). This sensitivity to small perturbations in the data also afflicts other flexible techniques based on splines and high-degree polynomials.

Sensible extrapolation beyond available data poses a particular challenge, since the range of observable strikes is typically not sufficient to recover the tails of the density. While numerous authors, such as Aït-Sahalia and Duarte (2003) and Fan and Mancini (2009), neglect to address this issue, others, such as Shimko (1993) and Bliss and Panigirtzoglou (2004), assume implied volatility to remain constant outside the range of observable strikes. This is equivalent to pasting lognormal tails onto the implied density and is questionable not only because asset returns exhibit fat tails, but also because it gives rise to individual strings that imply globally inconsistent shapes. Figlewski (2010) proposes a combination of fourth-degree spline interpolation and tails from generalized extreme value distributions, but only yields two-dimensional estimators.

The vast majority of the literature on SPD extraction is confined to single option series. Due to the special data design of options moving towards expiration, the use of such estimators leads either to maturity jumping or to the construction of a new estimator by interpolating between neighboring strings. While the former precludes an analysis of the SPD at fixed maturities, the latter is prone to calendar arbitrage. As we will show, the surfaces we obtain not only provide a stable observational grid, but are also considerably more robust.


http://vimeo.com/markusludwig/consistency



Another critical aspect is smoothness, both of the original estimator and of the resulting SPD. Rebonato (2004) notes that a smooth price density is important because there is a link between the unconditional (marginal) price densities obtained from the quoted prices today and the conditional densities that will prevail in the future. The smoothness of the estimator also determines the accuracy with which it can fit the data. The classical bias-variance trade-off relates to the problem that, while more complex models provide better fits to available observations, they are prone to generalize poorly to previously unseen data.

A final issue that ties back to all the other challenges is arbitrage. Aït-Sahalia and Duarte (2003) were the first to consider shape constrained SPD estimation. Arbitrage constraints for entire surfaces are discussed in Carr and Madan (2005) and Fengler (2012). Roper (2010) remarks that a call price surface is free of static arbitrage if there can be no arbitrage opportunities trading in the surface. The following conditions must hold to guarantee that a surface is free of static arbitrage.

General price bounds:

$$S e^{-\delta\tau} \ge C \ge \max\left(0,\; S e^{-\delta\tau} - K e^{-r\tau}\right), \qquad (3)$$

Constraints on strike and butterfly spreads:

$$-e^{-r\tau} \le \frac{\partial C}{\partial K} \le 0 \quad \text{and} \quad \frac{\partial^2 C}{\partial K^2} \ge 0. \qquad (4)$$
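Conditions (3) and (4) can be checked on a discrete strike grid. The sketch below is one possible implementation under stated assumptions: a single maturity, strikes in increasing order, and finite differences in place of analytic derivatives, with a tolerance `tol` absorbing rounding. The function name `check_static_arbitrage` and the synthetic Black-Scholes test prices are hypothetical.

```python
import math
import numpy as np

def bs_call(S, K, tau, r, delta, sigma):
    """Black-Scholes call price with continuous dividend yield (vectorised over K)."""
    N = np.vectorize(lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0))))
    d1 = (np.log(S / K) + (r - delta + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * math.exp(-delta * tau) * N(d1) - K * math.exp(-r * tau) * N(d2)

def check_static_arbitrage(S, K, C, tau, r, delta, tol=1e-8):
    """Check conditions (3) and (4) for one maturity on an increasing strike grid K."""
    upper = S * np.exp(-delta * tau)
    lower = np.maximum(0.0, upper - K * np.exp(-r * tau))
    price_bounds = bool(np.all(C <= upper + tol) and np.all(C >= lower - tol))

    dC = np.diff(C) / np.diff(K)                    # slopes between adjacent strikes
    strike_spread = bool(np.all(dC <= tol) and np.all(dC >= -np.exp(-r * tau) - tol))

    butterfly = bool(np.all(np.diff(dC) >= -tol))   # non-decreasing slopes <=> convexity
    return {"price_bounds": price_bounds, "strike_spread": strike_spread,
            "butterfly": butterfly}

K = np.linspace(50.0, 150.0, 101)
C = bs_call(100.0, K, 0.5, 0.02, 0.01, 0.25)
print(check_static_arbitrage(100.0, K, C, 0.5, 0.02, 0.01))

C_bad = C.copy()
C_bad[50] += 0.5        # a price bump at K = 100 breaks convexity (a butterfly arbitrage)
print(check_static_arbitrage(100.0, K, C_bad, 0.5, 0.02, 0.01))
```

Testing convexity through non-decreasing slopes, rather than through second differences divided by the squared step, also works when the strikes are unevenly spaced, as they are in observed option strings.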

To avoid calendar arbitrage, implied total variance must be non-decreasing in maturity at every fixed forward-moneyness $m \equiv K/F_T$. Defining total variance $\nu^2(m,T) \equiv \sigma^2(m,T)\,T$, we have

$$\nu^2(m, T_2) \ge \nu^2(m, T_1) \quad \text{given } T_2 > T_1. \qquad (5)$$
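Condition (5) reduces to a monotonicity check along the maturity axis of a total variance grid. The minimal sketch below assumes implied volatilities are already available on a rectangular grid of maturities (in years) and forward-moneyness levels; the function name and the example term structures are hypothetical. The example also shows that a falling volatility term structure can still be arbitrage-free, because it is total variance, not volatility, that must not decrease.

```python
import numpy as np

def calendar_arbitrage_free(T, sigma, tol=0.0):
    """Check condition (5) on a grid of implied vols at fixed forward-moneyness.

    T     : increasing maturities in years, shape (n_T,)
    sigma : implied volatilities sigma(m, T), shape (n_T, n_m); rows are
            maturities, columns are forward-moneyness levels m = K / F_T.
    Returns True if total variance sigma^2 * T is non-decreasing in T for every m.
    """
    total_var = sigma**2 * T[:, None]
    return bool(np.all(np.diff(total_var, axis=0) >= -tol))

T = np.array([0.1, 0.25, 0.5, 1.0])
m = np.linspace(0.8, 1.2, 5)
vols = np.array([0.25, 0.22, 0.21, 0.20])         # falling volatility term structure
sigma_ok = np.tile(vols[:, None], (1, m.size))     # flat smiles for simplicity
sigma_bad = sigma_ok.copy()
sigma_bad[3, :] = 0.14                             # 0.14^2 * 1.0 < 0.21^2 * 0.5
print(calendar_arbitrage_free(T, sigma_ok), calendar_arbitrage_free(T, sigma_bad))
```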

Arbitrage poses an issue, since (i) the data we observe are noisy and often contain recording errors, (ii) methods that lack smoothness are prone to yield negative regions in the corresponding SPD, (iii) extrapolating the slope at $K_{\min}$ and $K_{\max}$ may lead to negative tails, and (iv) two-dimensional fits might suffer from calendar spread arbitrage between maturities.

Against the backdrop of these challenges, we developed a novel approach, which we present in the following section.

2 Neural Networks

Our approach revolves around the Darwinian principle of random variation and natural selection. In contrast to the goals of classical optimization, we actively encourage a large variety of different solutions, as this allows us to check for properties beyond deviations from given prices.

The method builds on a specific class of neural networks, namely multilayer perceptrons, which perform function approximation via superpositions of sigmoid functions. Given observable data pairs $\{x_1, y_1\}, \dots, \{x_N, y_N\}$ with $x_i \in \mathbb{R}^p$ and $y_i \in \mathbb{R}$, where $N$ denotes the number of observations and $p$ the dimension of the input space, function approximation aims to identify a mapping $f(x)$ via a model such as $y_i = f(x_i) + \varepsilon_i$, where the error $\varepsilon$ is assumed to be iid noise. Neural networks achieve their flexibility through the layered use of primitive functions, each performing a nonlinear transformation of linearly combined inputs. Given a set of parameters $\theta = \{\beta, w\}$ and omitting the intercepts $\beta_0$ and $w_{0j}$, a network with one nonlinear layer can be written as

$$f_\theta(x) = \sum_{j=1}^{M} \beta_j \cdot h\left(w_j^\top x\right) + \varepsilon, \qquad (6)$$

where $M$ denotes the number of nonlinear expansions, $\beta$ corresponds to the coefficients in a linear model, and $h(z) = 1/(1 + e^{-z})$ specifies a family of log-sigmoid basis functions parametrized by $w$. Since the parameters of the basis functions are learned from the data, such a network can be thought of as an adaptive basis function method.
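The network in (6) amounts to a few lines of array code. The sketch below is a plain NumPy forward pass that writes out the intercepts $\beta_0$ and $w_{0j}$ omitted in (6) and leaves out the noise term, which belongs to the data model rather than to the fitted function. The function and argument names are hypothetical.

```python
import numpy as np

def logsig(z):
    """Log-sigmoid basis h(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(X, W, w0, beta, beta0):
    """One-hidden-layer perceptron, eq. (6) with intercepts written out.

    X     : inputs, shape (N, p)
    W     : hidden weight vectors w_j stacked as columns, shape (p, M)
    w0    : hidden intercepts w_{0j}, shape (M,)
    beta  : output coefficients beta_j, shape (M,)
    beta0 : output intercept
    Returns f(x_i) for every row of X, shape (N,).
    """
    hidden = logsig(X @ W + w0)      # (N, M) matrix of basis-function values
    return beta0 + hidden @ beta

rng = np.random.default_rng(0)
N, p, M = 5, 2, 3
X = rng.uniform(-1.0, 1.0, size=(N, p))
W, w0 = rng.normal(size=(p, M)), rng.normal(size=M)
beta, beta0 = rng.normal(size=M), 0.1
out = mlp_forward(X, W, w0, beta, beta0)
print(out.shape)
```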

Multilayer perceptrons have been shown to be universal approximators, i.e., given a sufficient number of hidden nodes $M$, they can approximate any continuous function on a compact input domain to an arbitrary degree of accuracy, cf. Hornik et al. (1989).

The parameters are typically estimated by minimizing the residual sum-of-squares

$$\mathrm{RSS}(\theta) = \sum_{i=1}^{N} \left(y_i - f_\theta(x_i)\right)^2. \qquad (7)$$

Since the basis functions have hidden parameters $w$, the optimization has no closed-form solution and needs to be solved by means of iterative numerical methods, typically gradient descent. Probably the most common algorithm to minimize (7) is Levenberg-Marquardt, which combines gradient descent with a Gauss-Newton algorithm in the vicinity of a minimum. It achieves very fast convergence without computing second-order derivatives, since it approximates the Hessian from first-order (Jacobian) information, cf. Hagan et al. (1996). However, like other gradient-based methods, it cannot guarantee convergence to a global optimum. Furthermore, in neural networks, permutations of the parameter values can yield the same functional input-output mapping, cf. Chen et al. (1993). Due to these symmetries in the loss function, the number of local minima is high. This causes solutions to be very sensitive to the starting values of the optimization.
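To make the Levenberg-Marquardt idea concrete, the following minimal sketch fits the network of (6) by minimizing (7) with the textbook damped Gauss-Newton update $(J^\top J + \mu I)\,\Delta = J^\top (y - f)$, decreasing the damping $\mu$ after a successful step and increasing it after a failed one. This is a generic illustration, not the Bayesian-regularized variant used later in the paper; the helper names, the flat parameter layout and the toy target $\sin(2x)$ are assumptions.

```python
import numpy as np

def unpack(theta, p, M):
    """Split the flat parameter vector into (W, w0, beta, beta0)."""
    W = theta[:p * M].reshape(p, M)
    w0 = theta[p * M:p * M + M]
    beta = theta[p * M + M:p * M + 2 * M]
    return W, w0, beta, theta[-1]

def predict_and_jacobian(theta, X, M):
    """Network output f(x_i) and its Jacobian df/dtheta, shape (N, n_params)."""
    N, p = X.shape
    W, w0, beta, beta0 = unpack(theta, p, M)
    H = 1.0 / (1.0 + np.exp(-(X @ W + w0)))         # (N, M) log-sigmoid activations
    f = beta0 + H @ beta
    dZ = H * (1.0 - H) * beta                         # df / d(pre-activation_j)
    J_W = (X[:, :, None] * dZ[:, None, :]).reshape(N, p * M)
    J = np.hstack([J_W, dZ, H, np.ones((N, 1))])     # same order as theta
    return f, J

def levenberg_marquardt(theta, X, y, M, n_iter=10, mu=1e-3):
    """Plain Levenberg-Marquardt on RSS(theta), eq. (7); no regularisation."""
    f, J = predict_and_jacobian(theta, X, M)
    rss = np.sum((y - f) ** 2)
    for _ in range(n_iter):
        A = J.T @ J + mu * np.eye(theta.size)         # damped Gauss-Newton matrix
        step = np.linalg.solve(A, J.T @ (y - f))
        f_new, J_new = predict_and_jacobian(theta + step, X, M)
        rss_new = np.sum((y - f_new) ** 2)
        if rss_new < rss:      # accept: behave more like Gauss-Newton
            theta, f, J, rss, mu = theta + step, f_new, J_new, rss_new, mu / 10.0
        else:                  # reject: behave more like gradient descent
            mu *= 10.0
    return theta, rss

rng = np.random.default_rng(1)
X = np.linspace(-1.0, 1.0, 40)[:, None]
y = np.sin(2.0 * X[:, 0])
M = 5
theta0 = rng.uniform(-0.5, 0.5, size=X.shape[1] * M + 2 * M + 1)
rss0 = np.sum((y - predict_and_jacobian(theta0, X, M)[0]) ** 2)
theta, rss = levenberg_marquardt(theta0, X, y, M, n_iter=20)
print(rss0, rss)
```

Running the same routine from different `theta0` draws generally ends in different local minima, which is the sensitivity to starting values described above.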



Model Complexity and Generalization

At first glance, the existence of local minima may seem like a serious drawback. However, as for all flexible nonlinear methods, the question of interest is not to what degree of accuracy we can match the training data, but whether the resulting estimator is predictive for novel data.

The practical usefulness of an estimator depends on its ability to generalize beyond observable data. The real challenge is thus to find a model that is flexible enough to capture the relationships implicit in a set of finite and noisy observations without memorizing them. In statistical learning theory, this problem is known as the bias-variance trade-off. Overly simplistic models, while robust to variations in the data, typically exhibit a lack of fit due to high bias. Overly complex models, on the other hand, will start to fit the idiosyncratic noise in the data. This high-variance phenomenon is known as overfitting.

However, the optimal degrees of freedom cannot be determined solely from the error on the training data, since this error generally decays with model complexity. One method to estimate generalization performance is to partition the available data into a training and a validation set. The error on the validation data typically decays with increasing complexity only up to a point, and then rises again.

In neural networks, model complexity can be controlled both through the choice of $M$ and through regularization. Regularization modifies the loss function such that large weights $w$ are penalized. If the weights are close to zero, the sigmoid operates in its roughly linear region, and the network collapses into an approximately linear model.
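The collapse into an approximately linear model can be checked directly. Around $z = 0$ the log-sigmoid satisfies $h(z) \approx 1/2 + z/4$, so replacing $h$ by this expansion yields a network that is exactly affine in $x$. The sketch below, with hypothetical names and randomly drawn weights scaled by a factor `scale`, measures the largest gap between a network and that affine counterpart; the gap shrinks rapidly as the weights approach zero.

```python
import numpy as np

def logsig(z):
    return 1.0 / (1.0 + np.exp(-z))

def max_nonlinearity(scale, n_hidden=20, seed=0):
    """Largest gap between a random network and its linearised (affine) version.

    Hidden weights and intercepts are drawn as scale * N(0, 1); the output
    coefficients are N(0, 1). The same seed gives the same underlying draws,
    so only the weight scale changes between calls.
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(-1.0, 1.0, 201)[:, None]
    W = scale * rng.normal(size=(1, n_hidden))
    w0 = scale * rng.normal(size=n_hidden)
    beta = rng.normal(size=n_hidden)
    z = x @ W + w0
    f = logsig(z) @ beta                  # network output
    f_lin = (0.5 + z / 4.0) @ beta        # h(z) replaced by 1/2 + z/4: affine in x
    return float(np.max(np.abs(f - f_lin)))

for s in (2.0, 0.5, 0.1):
    print(s, max_nonlinearity(s))
```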

A similar effect can be achieved through early stopping. Since the parameters are typically initialized at random starting values near zero, the model starts out nearly linear, cf. Nguyen and Widrow (1990). During training, the weights are updated to introduce nonlinearities where needed. Stopping the optimization after a few iterations keeps them close to their highly regularized initial values.

This in turn raises the issue of how to determine the optimal penalty function, or number of training epochs. Since regularization methods express our belief that the function we are looking for exhibits some kind of smooth behavior, they can be cast in a Bayesian framework. MacKay (1992a,b) presents a method that determines the optimal penalty parameters for neural network training in an automated fashion based on Bayes' theorem. One of the biggest advantages of this approach is that it needs no validation set, i.e., all the available training data can be used for parameter estimation and model comparison.
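A minimal sketch of the Bayesian re-estimation step follows, written in the form popularized for Levenberg-Marquardt training: the objective is $F = \beta E_D + \alpha E_W$ with $E_D$ the sum of squared errors and $E_W$ the sum of squared weights, the effective number of parameters is $\gamma = n - 2\alpha\,\mathrm{tr}(H^{-1})$ with Gauss-Newton Hessian $H = 2\beta J^\top J + 2\alpha I$, and the updates are $\alpha = \gamma / (2E_W)$ and $\beta = (N - \gamma)/(2E_D)$. Note that $\alpha$ and $\beta$ here are regularization hyperparameters, not the coefficients $\beta_j$ of (6). For transparency, the example uses a linear model, whose Jacobian is simply the design matrix, so each inner minimization is a ridge regression; the data and names are hypothetical.

```python
import numpy as np

def bayes_reg_update(J, residuals, theta, alpha, beta):
    """One MacKay-style re-estimation of the regularisation hyperparameters.

    Objective: F = beta * E_D + alpha * E_W, with E_D = sum(residuals^2) and
    E_W = sum(theta^2). J is the Jacobian of the model output w.r.t. theta.
    Returns (alpha_new, beta_new, gamma), gamma = effective number of parameters.
    """
    n_obs, n_par = J.shape
    E_D = np.sum(residuals**2)
    E_W = np.sum(theta**2)
    H = 2.0 * beta * J.T @ J + 2.0 * alpha * np.eye(n_par)   # Gauss-Newton Hessian of F
    gamma = n_par - 2.0 * alpha * np.trace(np.linalg.inv(H))
    return gamma / (2.0 * E_W), (n_obs - gamma) / (2.0 * E_D), gamma

rng = np.random.default_rng(0)
J = rng.normal(size=(50, 4))             # for a linear model, the Jacobian is the design matrix
theta_true = np.array([1.0, -2.0, 0.5, 0.0])
y = J @ theta_true + 0.1 * rng.normal(size=50)

alpha, beta = 0.01, 1.0
for _ in range(20):
    # Exact minimiser of beta * E_D + alpha * E_W for a linear model (ridge regression).
    theta = np.linalg.solve(beta * J.T @ J + alpha * np.eye(4), beta * J.T @ y)
    alpha, beta, gamma = bayes_reg_update(J, y - J @ theta, theta, alpha, beta)

noise_sd = 1.0 / np.sqrt(2.0 * beta)     # implied noise standard deviation
print(round(gamma, 2), round(noise_sd, 3))
```

Because no data are held out, the whole sample informs both the parameters and the penalty, which is the property emphasized in the text.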

Despite their parametric functional form, the effective complexity of neural networks is thus data-driven. They can be viewed as parametric models with a nonparametric interpretation. While they provide the same flexibility as local methods, they have superior small-sample properties and provide infinitely differentiable closed-form solutions.

Random Variation and Model Selection

Given the bias-variance trade-off, selecting the optimal model complexity is an issue of prime importance, be it the degree of a polynomial estimator, the bandwidth parameter of a local method or the number of knot points for splines. Going back to Malliaris and Salchenberger (1993) and Hutchinson et al. (1994), research on modeling option prices with neural networks uses cross-validation, often employing the first half of a year to train several architectures, the third quarter for validation, and the fourth to test the predictive quality of the chosen model, see e.g., Garcia and Gencay (2000) or Dugas et al. (2009).

The number of iterations for parameter estimation is typically determined by the convergence of the objective function, with a training process consisting of hundreds if not thousands of steps. In order to mitigate the influence of local minima, the final estimator is usually an average over several runs initialized at different starting values.

For our approach, we adopt the perspective of heuristic optimization and perform both parameter estimation and model selection by perturbing random solutions and then evaluating their properties. Initializing a neural network with small random weights corresponds to a Monte Carlo search of the parameter space. The local neighborhoods of these initial solutions are then explored by running Levenberg-Marquardt with Bayesian regularization for a few steps, stopping the process long before convergence.

The resulting population of estimators is then checked for the no-arbitrage conditions discussed in the previous section. Due to the considerable amount of extrapolation involved in obtaining surfaces that extend into the tails of the SPD, these checks allow us to eliminate solutions that exhibit inferior generalization performance without resorting to cross-validation. Within the subset of valid networks, solutions can then be chosen based on the RSS.

In order to offset the reduced flexibility of this highly regularized arrangement, we use significantly more basis functions than comparable approaches. We furthermore stochastically perturb $M$, which allows us to incorporate model selection and further increases variability among the solutions. While employing the same building blocks as traditional neural network modeling, we essentially sample smooth manifolds and harness their diversity to ensure that the final estimator conforms to the desired constraints. Our approach is thus more akin to stochastic search than to gradient-based optimization.

The resulting estimators are both highly adaptive and robust. They provide excellent results over a large variety of market conditions and do not depend sensitively on the number of iterations, on $M$, or on the quantity of available training data. The fact that we are able to perform both parameter estimation and model selection using only a sparse cross-section of current observations allows us to obtain better fits than methods relying on aggregate data.



3 Empirical Analysis

This section demonstrates both the pricing performance and the robustness of our shape constrained network (SCN) approach by fitting arbitrage-free implied volatility, price and state price density surfaces over a fixed domain, with $m \in [0.5, 1.5]$ and $\tau \in [20, 365]$ days. We contrast the results with two versions of the widely used ad hoc Black-Scholes model, which, due to its simple functional form, can also be fitted on small samples.

Data

We use daily closing prices of out-of-the-money (OTM) call and put options on the S&P 500 for each Wednesday between January 5, 2000 and December 28, 2011. The choice of only working with OTM options is motivated by the fact that they are more liquid. In case a particular Wednesday was a holiday, we use the preceding trading day. Option data and interest rates were obtained from OptionMetrics. We take the mean of bid and ask prices as option prices and discard observations below $0.50 or outside the moneyness-maturity domain of our surfaces. We also exclude options that violate general price bounds or strike arbitrage constraints.

While the SCN is robust enough to yield arbitrage-free estimators even when trained on contaminated data, the checks provide a principled approach to cleaning numerous recording errors. They also prevent an undue distortion of the error statistics that would occur when measuring the ability of an arbitrage-free model to fit tainted data.

Last but not least, they increase the overall quality of the data, for which we assume put-call parity to hold

$$C + K e^{-r\tau} = P + F e^{-r\tau}, \qquad (8)$$

where $F$ denotes the forward and $P$ is the price of a put option with the same strike price and time to maturity. Following Aït-Sahalia and Lo (1998), we use (8) to derive the implied forward from close to at-the-money (ATM) call and put pairs. Given the implied forward, we can translate OTM puts into in-the-money (ITM) calls and back out the unobservable implied dividend yield via the spot-forward parity

$$F = S e^{(r-\delta)\tau}. \qquad (9)$$
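The two parity relations translate directly into code. Solving (8) for the forward gives $F = K + (C - P)\,e^{r\tau}$, inverting (9) gives $\delta = r - \ln(F/S)/\tau$, and rearranging (8) once more converts a put into the call with the same strike. The sketch below averages the forward over several near-ATM pairs to dampen quote noise; that averaging rule, the function names and the synthetic quotes are assumptions for illustration, not necessarily the paper's exact procedure.

```python
import numpy as np

def implied_forward(K, C, P, r, tau):
    """Implied forward from put-call parity (8): F = K + (C - P) * exp(r * tau).

    K, C, P hold strikes and mid prices of near-ATM call/put pairs; the
    simple mean over pairs is an assumption of this sketch.
    """
    return float(np.mean(np.asarray(K) + (np.asarray(C) - np.asarray(P)) * np.exp(r * tau)))

def implied_dividend_yield(F, S, r, tau):
    """Invert the spot-forward parity (9), F = S * exp((r - delta) * tau), for delta."""
    return r - np.log(F / S) / tau

def put_to_call(P, K, F, r, tau):
    """Translate a put into the call with the same strike via (8)."""
    return P + (F - K) * np.exp(-r * tau)

S, r, delta, tau = 100.0, 0.03, 0.02, 0.5
F = S * np.exp((r - delta) * tau)
K = np.array([98.0, 100.0, 102.0])
C = np.array([5.20, 4.00, 2.90])                 # hypothetical call mid prices
P = C + (K - F) * np.exp(-r * tau)               # puts consistent with (8)

F_hat = implied_forward(K, C, P, r, tau)
delta_hat = implied_dividend_yield(F_hat, S, r, tau)
print(round(F_hat, 4), round(delta_hat, 4))
```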

Table 1 summarizes the resulting data set, which contains a total of 121,510 call options, along the dimensions of moneyness and time to maturity.

Benchmark Models

Going back to Dumas et al. (1998), ad hoc Black-Scholes models have been documented to be a tough benchmark.

The most common specification models implied volatility along moneyness and maturity via quadratic polynomials

$$\sigma = \beta_0 + \beta_1 m + \beta_2 m^2 + \beta_3 \tau + \beta_4 \tau^2 + \beta_5 m\tau. \qquad (10)$$
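Because (10) is linear in its coefficients, it can be estimated by linear least squares. The minimal sketch below assumes unweighted ordinary least squares on implied volatilities (the loss used for the benchmark is not specified here) and checks that noise-free synthetic data generated from known, hypothetical coefficients are recovered. Maturity is measured in years in this example purely for numerical convenience.

```python
import numpy as np

def ahg_design(m, tau):
    """Regressors of the global ad hoc Black-Scholes model, eq. (10)."""
    m, tau = np.asarray(m, dtype=float), np.asarray(tau, dtype=float)
    return np.column_stack([np.ones_like(m), m, m**2, tau, tau**2, m * tau])

def fit_ahg(m, tau, iv):
    """Ordinary least squares estimate of beta_0, ..., beta_5 in eq. (10)."""
    beta, *_ = np.linalg.lstsq(ahg_design(m, tau), iv, rcond=None)
    return beta

def predict_ahg(beta, m, tau):
    return ahg_design(m, tau) @ beta

rng = np.random.default_rng(0)
m = rng.uniform(0.8, 1.2, 200)
tau = rng.uniform(20, 365, 200) / 365.0          # maturity in years in this example
beta_true = np.array([0.9, -1.2, 0.5, -0.05, 0.02, 0.03])
iv = ahg_design(m, tau) @ beta_true
beta_hat = fit_ahg(m, tau, iv)
print(np.round(beta_hat, 4))
```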

Such global ad hoc models (AHG) have been shown to perform better than the stochastic volatility model proposed by Heston (1993), and even than its two-factor extensions, cf. Christoffersen and Jacobs (2004) and Christoffersen et al. (2009). In the context of extracting option-implied densities, the use of quadratic forms to fit implied volatilities along strikes has been proposed by Shimko (1993)

$$\sigma = \beta_0 + \beta_1 m + \beta_2 m^2. \qquad (11)$$

The string-wise specification (AHS) poses a challenging benchmark due to its additional flexibility. We fit (11) to all observable maturities and then linearly interpolate the coefficients to recover the entire implied volatility surface for the out-of-sample test.
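The string-wise benchmark can be sketched in two steps: a quadratic fit of (11) per maturity, followed by linear interpolation of the three coefficients across maturities. In the sketch below, the dictionary input format, the function names and the flat extrapolation of coefficients outside the observed maturities (the default behavior of `np.interp`) are assumptions; the paper does not state how maturities outside the observed range are handled.

```python
import numpy as np

def fit_ahs(strings):
    """Fit eq. (11) separately to each maturity string.

    strings: dict mapping maturity tau -> (m, iv) arrays for that expiry.
    Returns sorted maturities and an (n_tau, 3) array of (beta0, beta1, beta2).
    """
    taus = np.array(sorted(strings))
    coefs = []
    for t in taus:
        m, iv = strings[t]
        # np.polyfit returns the highest power first; reverse to (beta0, beta1, beta2).
        coefs.append(np.polyfit(m, iv, 2)[::-1])
    return taus, np.array(coefs)

def predict_ahs(taus, coefs, m, tau):
    """Linearly interpolate the coefficients in maturity, then evaluate eq. (11).

    Outside [taus[0], taus[-1]], np.interp holds the coefficients flat.
    """
    b = [np.interp(tau, taus, coefs[:, k]) for k in range(3)]
    return b[0] + b[1] * m + b[2] * m**2

m = np.linspace(0.85, 1.15, 13)
strings = {                                  # hypothetical smiles, tau in days
    30.0: (m, 0.60 - 0.70 * m + 0.30 * m**2),
    90.0: (m, 0.50 - 0.55 * m + 0.25 * m**2),
}
taus, coefs = fit_ahs(strings)
print(predict_ahs(taus, coefs, 0.9, 60.0))   # halfway between the two strings
```

Interpolating coefficients rather than implied volatilities keeps each interpolated smile quadratic, but nothing in this construction prevents calendar arbitrage between strings.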

Shape Constrained Networks

After having introduced our approach to neural network training, we now focus on the implementation. We model implied total variance as a function of moneyness and maturity, which allows us to directly check the resulting solutions for calendar arbitrage:

$$\nu = \beta_0 + \sum_{j=1}^{M} \beta_j \cdot h\left(w_{0j} + w_{1j} m + w_{2j} \sqrt{\tau}\right). \qquad (12)$$

During modeling, up to 500 different solutions are created and examined for arbitrage. We check (3) and (4) in their respective spaces, use Black-Scholes to map from implied volatility to price space, and obtain the derivatives via numerical differentiation.¹ The networks vary both with respect to their initial parameter values, which we choose following Nguyen and Widrow (1990), and the number of nonlinear basis expansions: $M = B + \lfloor s \rfloor$, $s \sim \mathcal{N}(0, 2)$.
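The randomization of each candidate network can be sketched as follows. The draw of $M$ follows the formula above, reading the second argument of $\mathcal{N}(0, 2)$ as a standard deviation, which is an assumption (if it denotes a variance, use $\sqrt{2}$ instead), and adding a guard of at least one hidden unit that the text does not mention. The initialization implements the classic Nguyen-Widrow rule as commonly stated for inputs scaled to $[-1, 1]$: each hidden weight vector is rescaled to norm $0.7\,M^{1/p}$ and the intercepts are drawn uniformly within that bound. All function names are hypothetical.

```python
import numpy as np

def sample_num_hidden(rng, B=20, scale=2.0):
    """M = B + floor(s), s ~ N(0, scale); scale read as a std dev (assumption).

    The lower bound of one hidden unit is a safety guard added in this sketch.
    """
    return max(1, B + int(np.floor(rng.normal(0.0, scale))))

def nguyen_widrow_init(rng, n_inputs, M):
    """Nguyen-Widrow hidden-layer initialisation for inputs scaled to [-1, 1].

    Each weight vector w_j is drawn uniformly, then rescaled to norm
    g = 0.7 * M**(1/n_inputs); intercepts w_0j are uniform on [-g, g].
    """
    g = 0.7 * M ** (1.0 / n_inputs)
    W = rng.uniform(-0.5, 0.5, size=(n_inputs, M))
    W *= g / np.linalg.norm(W, axis=0)
    w0 = rng.uniform(-g, g, size=M)
    return W, w0

rng = np.random.default_rng(42)
M = sample_num_hidden(rng)
W, w0 = nguyen_widrow_init(rng, n_inputs=2, M=M)   # inputs: m and sqrt(tau), rescaled
print(M, W.shape, np.round(np.linalg.norm(W, axis=0)[:3], 4))
```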

We set $B = 20$ and train each network for 10 iterations using the method proposed by Foresee and Hagan (1997). This procedure stops either once 25 valid solutions have been obtained or once the maximum of 500 solutions has been reached. The final estimator is then a simple average over the three solutions with the lowest in-sample errors. Since we work with small samples and limit training to a few steps, we can evaluate hundreds of solutions in a matter of seconds.
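The stopping and selection logic described above can be written as a generic generate-and-filter loop. The sketch below abstracts training, validity checks and error into user-supplied callables, so it illustrates the control flow only. The toy stand-ins are hypothetical: "models" are perturbed quadratic coefficient vectors and validity is reduced to convexity, mimicking a butterfly check, instead of actual networks trained with the Foresee-Hagan method.

```python
import numpy as np

def shape_constrained_search(train_candidate, is_valid, rss, rng,
                             max_candidates=500, n_valid=25, n_best=3):
    """Generate-and-filter search loosely following the procedure described above.

    train_candidate(rng) -> model : random initialisation plus a few training steps
    is_valid(model)      -> bool  : shape / no-arbitrage checks, e.g. (3)-(5)
    rss(model)           -> float : in-sample error used for ranking
    Stops once n_valid valid models are found or max_candidates have been tried,
    and returns the n_best valid models with the lowest in-sample error.
    """
    valid, tried = [], 0
    while len(valid) < n_valid and tried < max_candidates:
        model = train_candidate(rng)
        tried += 1
        if is_valid(model):
            valid.append(model)
    valid.sort(key=rss)
    return valid[:n_best], tried

# Toy stand-ins (hypothetical): "models" are quadratic coefficients (c0, c1, c2).
x = np.linspace(-1.0, 1.0, 21)
y = 0.2 + 0.1 * x + 0.3 * x**2

def train_toy(rng):
    """Pretend training: true coefficients plus a random perturbation."""
    return np.array([0.2, 0.1, 0.3]) + rng.normal(0.0, 0.3, size=3)

def is_convex(c):
    """Stand-in for the arbitrage checks: convexity in x, like a butterfly condition."""
    return bool(c[2] >= 0.0)

def toy_rss(c):
    return float(np.sum((y - np.polyval(c[::-1], x)) ** 2))

best, tried = shape_constrained_search(train_toy, is_convex, toy_rss,
                                       np.random.default_rng(7))
# Final estimator: simple average of the selected solutions' predictions.
y_hat = np.mean([np.polyval(c[::-1], x) for c in best], axis=0)
print(len(best), tried)
```

Averaging a few valid solutions preserves any constraint that is closed under averaging, such as convexity in strike, so the averaged estimator inherits the butterfly condition from its members.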

¹ Since our method yields an analytic solution, a direct mapping from implied volatilities to state price densities would also be possible, see e.g., Jackwerth (2000).


http://vimeo.com/markusludwig/training



Table 1. Sample characteristics

| | DITM | ITM | ATM | OTM | DOTM |
|---|---|---|---|---|---|
| K/F | < 0.90 | 0.90–0.98 | 0.99–1.01 | 1.02–1.10 | > 1.10 |
| Short-term options (< 60 days) | | | | | |
| IV (%) | 36.19 | 22.11 | 17.79 | 17.63 | 26.84 |
| Call Price | 208.97 | 75.34 | 25.46 | 8.38 | 3.32 |
| # Observations | 7562 | 8935 | 3847 | 7554 | 1484 |
| Medium-term options (60–180 days) | | | | | |
| IV (%) | 33.34 | 22.72 | 19.62 | 18.48 | 20.88 |
| Call Price | 270.10 | 93.72 | 45.61 | 20.84 | 5.73 |
| # Observations | 17204 | 9739 | 3871 | 9637 | 7131 |
| Long-term options (> 180 days) | | | | | |
| IV (%) | 29.42 | 22.14 | 20.57 | 19.21 | 19.26 |
| Call Price | 315.48 | 124.12 | 79.09 | 47.98 | 13.71 |
| # Observations | 17179 | 6959 | 2382 | 6654 | 11372 |

Notes. For each Wednesday between January 5, 2000 and December 28, 2011, we combine out-of-the-money calls and puts on the S&P 500 index to create a data set of call options with moneyness $m \in [0.5, 1.5]$ and maturity $\tau \in [20, 365]$ days.

Since options close to ATM are more liquid, we weight the training errors with $\omega = \mathcal{N}(m \mid 1, 0.2) + \mathcal{N}(m \mid 1, 0.1)$. In order to provide some guidance at the boundaries of the model domain, we keep the first string with $\tau > 365$ days and repeat the first string with $\tau \ge 20$ days, restricted to $m \in [0.9, 1.1]$, at $\tau = 10$ days. Furthermore, we augment the training data with artificial observations at the average ATM IV for $m \in [1.2, 1.5]$ and $\tau \in [10, 20]$ days. This procedure, which we refer to as anchoring, curbs calendar arbitrage.
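The error weighting can be sketched as a sum of two Gaussian densities centered at the money, applied to squared residuals. Two points are assumptions here: that 0.2 and 0.1 denote standard deviations rather than variances, and that the weights multiply squared errors, giving a weighted variant of (7). The function names are hypothetical.

```python
import numpy as np

def gaussian_pdf(x, mean, sd):
    """Normal density N(x | mean, sd), with sd a standard deviation."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def atm_weights(m):
    """omega(m) = N(m | 1, 0.2) + N(m | 1, 0.1), reading 0.2, 0.1 as std devs (assumption)."""
    m = np.asarray(m, dtype=float)
    return gaussian_pdf(m, 1.0, 0.2) + gaussian_pdf(m, 1.0, 0.1)

def weighted_rss(y, f, w):
    """Weighted residual sum of squares, a weighted variant of eq. (7)."""
    return float(np.sum(np.asarray(w) * (np.asarray(y) - np.asarray(f)) ** 2))

m = np.array([0.6, 0.9, 1.0, 1.1, 1.4])
w = atm_weights(m)
print(np.round(w / w.max(), 3))   # weights relative to the ATM weight
```

The narrow component concentrates weight within roughly ten moneyness points of the money, while the wider one keeps wing observations from being ignored entirely.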

Figure 1. Error weighting and augmentation of training data

Notes. The schematic shows the model domain (black rectangle), Gaussian weighting of errors, string augmentation, and anchoring.

As Figure 1 illustrates, both string augmentation and anchoring take place outside of our model domain. While they are not an integral part of the SCN approach, they increase the likelihood of obtaining valid solutions. This is especially true for anchoring, which both provides guidance in the short-term OTM region and counteracts the notoriously steep IV slopes caused by discrete quotes.
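String augmentation and anchoring amount to appending extra rows to the training set. The sketch below assumes several details that the text leaves open, and all of them should be read as illustrative:

* `m_step` and `tau_step` give the spacing of the anchor grid.
* "Average ATM IV" is taken to be the mean IV over 0.99 ≤ m ≤ 1.01 across the in-domain strings.
* The input is a hypothetical dictionary that maps maturities in days to lists of (m, IV) pairs.

```python
from typing import Dict, List, Tuple

Slice = List[Tuple[float, float]]  # one maturity: (m = K/F, implied volatility)
Row = Tuple[float, int, float]     # (m, tau in days, implied volatility)


def build_training_set(strings: Dict[int, Slice],
                       m_step: float = 0.05, tau_step: int = 5) -> List[Row]:
    """In-domain observations plus string augmentation and anchoring points."""
    in_domain = sorted(t for t in strings if 20 <= t <= 365)
    if not in_domain:
        raise ValueError("no strings inside the model domain")
    rows: List[Row] = [(m, t, v) for t in in_domain for m, v in strings[t]]

    # Long end: keep the first string beyond 365 days.
    beyond = sorted(t for t in strings if t > 365)
    if beyond:
        rows += [(m, beyond[0], v) for m, v in strings[beyond[0]]]

    # Short end: repeat the near-ATM part of the first string with tau >= 20 at tau = 10.
    first = in_domain[0]
    rows += [(m, 10, v) for m, v in strings[first] if 0.9 <= m <= 1.1]

    # Anchoring: artificial points at the average ATM IV on m in [1.2, 1.5], tau in [10, 20].
    atm = [v for t in in_domain for m, v in strings[t] if 0.99 <= m <= 1.01]
    if atm:
        level = sum(atm) / len(atm)
        n_m = round((1.5 - 1.2) / m_step)
        n_tau = (20 - 10) // tau_step
        rows += [(1.2 + i * m_step, 10 + j * tau_step, level)
                 for i in range(n_m + 1) for j in range(n_tau + 1)]
    return rows
```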

The figure also illustrates that the moneyness range of observable strings varies significantly between maturities.

Results

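Table 2 shows the results for both the benchmark models and our SCN. For the out-of-sample test, we compare the implied volatilities observed at t + 7 with σₜ(mₜ₊₇, τₜ₊₇), i.e., with the surface estimated at t, evaluated at the moneyness and maturity of the quotes observed one week later.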

As expected, the global specification AHG exhibits the highest in-sample errors, followed by the more flexible AHS and our shape constrained networks. Interestingly, the ordering changes in the out-of-sample analysis, with the AHS falling behind the AHG, which hints at overfitting. Despite having the most degrees of freedom, our model also exhibits the smallest errors in the out-of-sample test. Over the entire period, we have to initialize 35 networks on average to obtain a set of 25 arbitrage-free solutions.

Table 2. Model comparison

Model                   SCN        AHS       AHG

2000 − 2003
IV Bias                −0.0030     -         -
IV RMSE                 0.1661     0.3420    1.0561
Out-of-Sample IV R²     0.9374     0.7876    0.9043



Notes. In-sample, we report the bias and the root-mean-square error (RMSE); out-of-sample, the coefficient of determination R².
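The three error measures in Table 2 can be computed as below. The sign convention of the bias (fitted minus observed) is an assumption. The out-of-sample helper evaluates the surface estimated at t at the (m, τ) coordinates of the quotes observed at t + 7 and compares it with their implied volatilities, as described in the text. Function names are hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple


def iv_bias(observed: Sequence[float], fitted: Sequence[float]) -> float:
    """Mean error, fitted minus observed (sign convention assumed)."""
    return sum(f - o for o, f in zip(observed, fitted)) / len(observed)


def iv_rmse(observed: Sequence[float], fitted: Sequence[float]) -> float:
    """Root-mean-square error."""
    return math.sqrt(sum((f - o) ** 2 for o, f in zip(observed, fitted)) / len(observed))


def r_squared(observed: Sequence[float], fitted: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def out_of_sample_r2(surface_t: Callable[[float, float], float],
                     quotes_t7: Sequence[Tuple[float, float, float]]) -> float:
    """Compare IVs observed at t + 7 with the surface estimated at t.

    quotes_t7 holds (m, tau, iv) triples observed one week later.
    """
    observed = [iv for _, _, iv in quotes_t7]
    fitted = [surface_t(m, tau) for m, tau, _ in quotes_t7]
    return r_squared(observed, fitted)
```

Because R² is computed against the cross-sectional mean of next week's observations, a surface that only reproduces the overall IV level scores close to zero, as in the constant-surface case.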



Figure 2. Model comparison

[Figure 2 panels, 2000–2012: S&P 500 index level; in-sample IV RMSE; out-of-sample IV R².]

Notes. Index level (light) and evolution of daily errors for the AHG (grey), AHS (dotted), and SCN model (dark). The in-sample plots cover each Wednesday between January 5, 2000 and December 28, 2011; the out-of-sample plots span January 12, 2000 to January 4, 2012.

Figure 2 contrasts the evolution of errors with that of the S&P 500. Our SCN remains almost unaffected by crisis periods, whereas the AHS model generalizes extremely poorly and is highly volatile out-of-sample. Table 3 provides a more detailed look at the behavior of our method. The low ATM errors reflect the effect of error weighting, while the larger short-term DOTM errors stem from the shape constraints that impede calendar arbitrage.

Table 3. SCN performance

                       DITM      ITM           ATM           OTM           DOTM
K/F                    < 0.90    0.90 − 0.98   0.99 − 1.01   1.02 − 1.10   > 1.10

Short-term options (< 60 days)
IV Bias                0.0132    0.0333        −0.0504       −0.0984       −0.2582
IV RMSE                0.3433    0.1741        0.1826        0.2990        0.7046
Out-of-Sample IV R²    0.9294    0.8978        0.9089        0.8755        0.7729

Medium-term options (60 − 180 days)
IV Bias                −0.0022   0.0132        0.0067        0.0019        0.0122
IV RMSE                0.1981    0.0865        0.0794        0.1074        0.1895
Out-of-Sample IV R²    0.9537    0.9241        0.9286        0.9113        0.8708

Long-term options (> 180 days)
IV Bias                0.0010    −0.0003       −0.0052       −0.0005       −0.0007
IV RMSE                0.1049    0.0462        0.0427        0.0520        0.0977
Out-of-Sample IV R²    0.9704    0.9558        0.9530        0.9524        0.9294

Notes. In-sample and out-of-sample errors for the SCN, partitioned along moneyness and maturity (DITM stands for deep-in-the-money).



Figures 3 and 4 illustrate the quality of the SCN surfaces. The implied volatility surface provides an excellent fit to the given data and extrapolates smoothly beyond it.

Figure 3. Implied volatility surface

[Figure 3 axes: τ (days), K/F, IV (%).]

Notes. SCN implied volatility surface for September 5, 2007.

Figure 3 also shows how anchoring effectively modulates the extrapolation of the DOTM wing for short maturities.

Figure 4. State price density surface

[Figure 4 axes: τ (days), K/F, SPD (×10⁻³).]

Notes. SCN state price density surface for September 5, 2007.

Despite the smoothness of the corresponding SPD surface, the SCN is adaptive enough to fit both the convex and the concave IV regions on September 5, 2007. To fully appreciate these results, one has to consider them in the context of the current literature.

Most parametric models, including specialized forms like the one proposed by Gatheral and Jacquier (2012), assume IV to be convex in moneyness and thus cannot achieve a comparable fit. Local polynomial models would not be able to achieve the same smoothness given the limited amount of data. They would have to either use very large bandwidth parameters or work with aggregate data, both of which severely reduce the capacity to accurately reproduce current observations. Another issue with local methods is extrapolation, which is crucial for obtaining the tails of the density.

Figures 5 and 6 show cross-sectional cuts of SCN surfaces.

Figure 5. Implied volatility cross-section

[Figure 5 axes: K/F, IV (%).]

Notes. Implied volatility observations (red) and corresponding fit. September 5, 2007 – 73 days to maturity.

Figure 6. State price density cross-section

[Figure 6 axes: K/F, SPD (×10⁻³).]

Notes. SPD sensitivity bounds for the SCN, obtained by removing one observation at a time. September 5, 2007 – 73 days to maturity.

The methods proposed in Benko et al. (2007), Glaser and Heider (2010), and Fengler and Hin (2011) are limited to the range of observable strikes and typically do not yield true probability densities. Rescaling such a truncated density to unity has questionable implications, since it implicitly assigns zero probability to states outside the observable range, and it affects both the expected value and the higher-order moments. Dennis and Mayhew (2002) show that asymmetries in the domain of integration also distort the implied skewness put forward in Bakshi et al. (2003).
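The effect of rescaling a truncated density can be made concrete with a small numerical illustration. This is our own example, not one of the cited methods. It takes the Black-Scholes (lognormal) density of S_T/F with total volatility 0.2, truncates it to a hypothetical observable range K/F ∈ [0.9, 1.3], rescales it to unit mass, and compares its moments with those of the untruncated density. The volatility and strike range are arbitrary choices.

```python
import math
from typing import Callable, Tuple


def lognormal_spd(x: float, total_vol: float) -> float:
    """Black-Scholes risk-neutral density of X = S_T / F (so that E[X] = 1)."""
    mu = -0.5 * total_vol ** 2
    z = (math.log(x) - mu) / total_vol
    return math.exp(-0.5 * z * z) / (x * total_vol * math.sqrt(2.0 * math.pi))


def moments(density: Callable[[float], float], lo: float, hi: float,
            n: int = 20000) -> Tuple[float, float, float]:
    """Mass on [lo, hi], plus mean and standard deviation after rescaling to unit mass.

    Uses the midpoint rule on n equally spaced cells.
    """
    h = (hi - lo) / n
    xs = [lo + (i + 0.5) * h for i in range(n)]
    weights = [density(x) * h for x in xs]
    mass = sum(weights)
    mean = sum(x * w for x, w in zip(xs, weights)) / mass
    var = sum((x - mean) ** 2 * w for x, w in zip(xs, weights)) / mass
    return mass, mean, math.sqrt(var)
```

The truncated range carries only about 59% of the probability mass. After rescaling, its mean is about 1.06 instead of 1, so the density no longer prices the forward correctly, and its standard deviation shrinks as well.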

Figure 6 highlights another key advantage of our global estimator, namely its robustness to variations in the training data. The second derivatives of the jackknife estimates (orange), i.e., the SPDs they imply, are virtually identical to our solution.
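The sensitivity bounds in Figure 6 follow a leave-one-out (jackknife) scheme, which can be written generically. In this sketch, `fit` is a hypothetical callback that would refit the surface without one observation, and `evaluate` would return the SPD at a given moneyness. The SPD could, for instance, be computed with the central second difference `spd_at` of the refitted normalized call price. The toy check replaces the SCN with a sample mean.

```python
import math
from typing import Callable, List, Sequence, Tuple, TypeVar

Obs = TypeVar("Obs")
Model = TypeVar("Model")


def spd_at(call: Callable[[float], float], m: float, h: float = 1e-3) -> float:
    """Central second difference of a normalized call price c(m) at moneyness m."""
    return (call(m - h) - 2.0 * call(m) + call(m + h)) / (h * h)


def jackknife_bounds(data: Sequence[Obs],
                     fit: Callable[[List[Obs]], Model],
                     evaluate: Callable[[Model, float], float],
                     grid: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Pointwise lower/upper envelopes of evaluate(model, x) over leave-one-out refits."""
    lower = [math.inf] * len(grid)
    upper = [-math.inf] * len(grid)
    for i in range(len(data)):
        model = fit([d for j, d in enumerate(data) if j != i])  # drop observation i
        for k, x in enumerate(grid):
            value = evaluate(model, x)
            lower[k] = min(lower[k], value)
            upper[k] = max(upper[k], value)
    return lower, upper
```

Narrow envelopes indicate that no single quote drives the shape of the estimated density.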


http://vimeo.com/markusludwig/density




Figure 7. SPD evolution

[Figure 7 panels: 2000–2003, 2004–2007, 2008–2011.]

Notes. This figure shows the evolution of surface cross-sections for a fixed maturity of 30 days. Despite the excellent fit to the observable data, our SCN produces well-behaved SPDs with converging tails over a wide range of market conditions.

This result is only attainable because adjacent maturities provide guidance regarding the global shape.

Figure 7 illustrates the evolution of SPD cross-sections over time and shows that our method consistently yields well-behaved implied densities. The figure also hints at the potential to analyze changes in market sentiment on a stable observational grid along moneyness and maturity.

Over the 12 years we consider in this work, the quantity and the quality of the observable data vary considerably. The robustness of the SCN surfaces across this period attests to the capability of our population-based approach. The sizable changes in the market environment also demonstrate that the quality of our results depends sensitively neither on the choice of M nor on the number of training iterations.

4 Conclusion

The need for accurate state price density surfaces has recently been highlighted by Ross (2013). In this paper, we propose a novel neural network-based approach to approximate arbitrage-free implied volatility, price and state price density surfaces from a sparse cross-section of observations. We demonstrate that our method is robust enough to carry out both model selection and parameter estimation using daily data alone, and we obtain excellent in-sample and out-of-sample fits over a period of 12 years. The corresponding state price density surfaces provide a comprehensive snapshot of current market sentiment. Unlike maturity-wise estimators, they enable us to trace the evolution of expectations and risk perceptions along a continuum of future spot trajectories and time horizons.

A natural extension would be to investigate whether the superior quality of our estimators translates into new insights regarding the information content of implied densities. We leave this question for future research.

References

Aït-Sahalia, Y. and Duarte, J. (2003). Nonparametric option pricing under shape restrictions. Journal of Econometrics, 116(1-2):9–47.

Aït-Sahalia, Y. and Lo, A. (1998). Nonparametric estimation of state-price densities implicit in financial asset prices. Journal of Finance, 53(2):499–547.

Aït-Sahalia, Y. and Lo, A. (2000). Nonparametric risk management and implied risk aversion. Journal of Econometrics, 94(1-2):9–51.

Bahra, B. (1997). Implied risk-neutral probability density functions from option prices: Theory and application. Bank of England Working Paper, (66).


http://vimeo.com/markusludwig/evolution






Bakshi, G., Kapadia, N., and Madan, D. (2003). Stock return characteristics, skew laws, and the differential pricing of individual equity options. Review of Financial Studies, 16(1):101–143.

Banz, R. and Miller, M. (1978). Prices for state-contingent claims: Some estimates and applications. Journal of Business, 51(4):653–672.

Benko, M., Fengler, M., Härdle, W., and Kopa, M. (2007). On extracting information implied in options. Computational Statistics, 22(4):543–553.

Birru, J. and Figlewski, S. (2012). Anatomy of a meltdown: The risk neutral density for the S&P 500 in the fall of 2008. Journal of Financial Markets.

Black, F. and Scholes, M. (1973). The pricing of options and corporate liabilities. Journal of Political Economy, 81(3):637–654.

Bliss, R. and Panigirtzoglou, N. (2002). Testing the stability of implied probability density functions. Journal of Banking & Finance, 26(2-3):381–422.

Bliss, R. and Panigirtzoglou, N. (2004). Option-implied risk aversion estimates. Journal of Finance, 59(1):407–446.

Breeden, D. and Litzenberger, R. (1978). Prices of state-contingent claims implicit in option prices. Journal of Business, 51(4):621–651.

Buchen, P. and Kelly, M. (1996). The maximum entropy distribution of an asset inferred from option prices. Journal of Financial and Quantitative Analysis, 31(1):143–159.

Campa, J., Chang, P., and Reider, R. (1998). Implied exchange rate distributions: Evidence from OTC option markets. Journal of International Money and Finance, 17(1):117–160.

Carr, P. and Madan, D. (2005). A note on sufficient conditions for no arbitrage. Finance Research Letters, 2(3):125–130.

Chen, A., Lu, H., and Hecht-Nielsen, R. (1993). On the geometry of feedforward neural network error surfaces. Neural Computation, 5(6):910–927.

Christoffersen, P., Heston, S., and Jacobs, K. (2009). The shape and term structure of the index option smirk: Why multifactor stochastic volatility models work so well. Management Science, 55(12):1914.

Christoffersen, P. and Jacobs, K. (2004). The importance of the loss function in option valuation. Journal of Financial Economics, 72(2):291–318.

Conrad, J., Dittmar, R., and Ghysels, E. (2013). Ex ante skewness and expected stock returns. Journal of Finance, 68(1).

Cont, R. (1997). Beyond implied volatility: Extracting information from options prices. Econophysics.

Cox, J. and Ross, S. (1976). The valuation of options for alternative stochastic processes. Journal of Financial Economics, 3(1-2):145–166.

Dennis, P. and Mayhew, S. (2002). Risk-neutral skewness: Evidence from stock options. Journal of Financial and Quantitative Analysis, 37(3):471–493.

Dugas, C., Bengio, Y., Bélisle, F., Nadeau, C., and Garcia, R. (2009). Incorporating functional knowledge in neural networks. Journal of Machine Learning Research, 10:1239–1262.

Dumas, B., Fleming, J., and Whaley, R. (1998). Implied volatility functions: Empirical tests. Journal of Finance, 53(6):2059–2106.

Fan, J. and Mancini, L. (2009). Option pricing with model-guided nonparametric methods. Journal of the American Statistical Association, 104(488):1351–1372.

Fengler, M. and Hin, L. (2011). Semi-nonparametric estimation of the call price surface under no-arbitrage constraints. St. Gallen Working Paper Series.

Fengler, M. R. (2012). Option data and modeling BSM implied volatility. Handbook of Computational Finance, pages 117–142.

Figlewski, S. (2010). Estimating the implied risk neutral density for the US market portfolio. Volatility and Time Series Econometrics, 1(9):323–354.

Foresee, F. and Hagan, M. (1997). Gauss-Newton approximation to Bayesian learning. In Proceedings of the 1997 International Joint Conference on Neural Networks, volume 3, pages 1930–1935. IEEE.

Garcia, R. and Gençay, R. (2000). Pricing and hedging derivative securities with neural networks and a homogeneity hint. Journal of Econometrics, 94(1-2):93–115.

Gatheral, J. and Jacquier, A. (2012). Arbitrage-free SVI volatility surfaces.

Giacomini, R., Gottschling, A., Haefke, C., and White, H. (2008). Mixtures of t-distributions for finance and forecasting. Journal of Econometrics, 144(1):175–192.

Glaser, J. and Heider, P. (2010). Arbitrage-free approximation of call price surfaces and input data risk. Quantitative Finance, (1):1–13.

Hagan, M., Demuth, H., Beale, M., et al. (1996). Neural Network Design. PWS, Boston, MA.

Harrison, J. and Kreps, D. (1979). Martingales and arbitrage in multiperiod securities markets. Journal of Economic Theory, 20(3):381–408.

Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, New York.

Herrmann, R. and Narr, A. (1997). Neural networks and the valuation of derivatives: Some insights into the implied pricing mechanism of German stock index options. Working Paper, Department of Finance and Banking, University of Karlsruhe, (202).

Heston, S. (1993). A closed-form solution for options with stochastic volatility with applications to bond and currency options. Review of Financial Studies, 6(2):327–343.

Hornik, K., Stinchcombe, M., and White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366.



Hutchinson, J., Lo, A., and Poggio, T. (1994). A nonparametric approach to pricing and hedging derivative securities via learning networks. Journal of Finance, 49(3):851–889.

Jackwerth, J. (2000). Recovering risk aversion from option prices and realized returns. Review of Financial Studies, 13(2):433–451.

Jackwerth, J. (2004). Option-implied risk-neutral distributions and risk aversion. Charlottesville: Research Foundation of AIMR.

Jarrow, R. and Rudd, A. (1982). Approximate option valuation for arbitrary stochastic processes. Journal of Financial Economics, 10(3):347–369.

Kostakis, A., Panigirtzoglou, N., and Skiadopoulos, G. (2011). Market timing with option-implied distributions: A forward-looking approach. Management Science, 57(7):1231–1249.

Lim, G., Martin, G., and Martin, V. (2005). Parametric pricing of higher order moments in S&P 500 options. Journal of Applied Econometrics, 20(3):377–404.

MacKay, D. (1992a). Bayesian interpolation. Neural Computation, 4(3):415–447.

MacKay, D. (1992b). A practical Bayesian framework for backpropagation networks. Neural Computation, 4(3):448–472.

Malliaris, M. and Salchenberger, L. (1993). A neural network model for estimating option prices. Applied Intelligence, 3(3):193–206.

Malz, A. (1996). Using option prices to estimate realignment probabilities in the European Monetary System: The case of sterling-mark. Journal of International Money and Finance, 15(5):717–748.

Merton, R. (1973). Theory of rational option pricing. Bell Journal of Economics and Management Science, 4(1):141–183.

Nguyen, D. and Widrow, B. (1990). Improving the learning speed of 2-layer neural networks by choosing initial values of the adaptive weights. In 1990 IJCNN International Joint Conference on Neural Networks, pages 21–26. IEEE.

Pritsker, M. (1998). Nonparametric density estimation and tests of continuous time interest rate models. Review of Financial Studies, 11(3):449–487.

Rebonato, R. (2004). Volatility and Correlation: The Perfect Hedger and the Fox. Wiley.

Roper, M. (2010). Arbitrage free implied volatility surfaces.

Rosenberg, J. (1998). Pricing multivariate contingent claims using estimated risk-neutral density functions. Journal of International Money and Finance, 17(2):229–247.

Ross, S. (1976). Options and efficiency. The Quarterly Journal of Economics, 90(1):75.

Ross, S. (2013). The recovery theorem. Journal of Finance (forthcoming).

Rubinstein, M. (1998). Edgeworth binomial trees. The Journal of Derivatives, 5(3):20–27.

Shimko, D. (1993). Bounds of probability. Risk, 6(4):33–37.

Stutzer, M. (1996). A simple nonparametric approach to derivative security valuation. Journal of Finance, 51(5):1633–1652.

Xing, Y., Zhang, X., and Zhao, R. (2010). What does the individual option volatility smirk tell us about future equity returns? Journal of Financial and Quantitative Analysis, 45(3):641.

