Tuesday, December 17, 2013

Levenberg Marquardt & Constraints by Domain Transformation

The Fortran minpack library has a good Levenberg-Marquardt minimizer, so good that it has been ported to many programming languages. Unfortunately it does not support constraints, not even simple bounds.

One way to add bounds is to transform the domain via a bijective function. For example, \(a+\frac{b-a}{1+e^{-\alpha t}}\) maps \(]-\infty, +\infty[\) to \(]a,b[\). How should one then choose \(\alpha\)?

A large \(\alpha\) will make tiny changes in \(t\) appear large. A simple rule is to ensure that changes in \(t\) do not create large changes in the original range \(]a,b[\); for example, we can make \(\alpha t \leq 1\), that is \( \alpha t = \frac{t-a}{b-a} \).
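To make this concrete, here is a minimal Python sketch of the transform wrapped around a MINPACK-style Levenberg-Marquardt solver (SciPy's leastsq, which wraps the MINPACK routines). The exponential-decay model, its bounds and the initial guess are made up purely to show the mechanics; they are not the calibration discussed below.

import numpy as np
from scipy.optimize import leastsq  # wraps the MINPACK Levenberg-Marquardt code

def logistic_transform(a, b, scaled=True):
    """Bijection from ]-inf, +inf[ onto ]a, b[.
    scaled=False is the naive alpha=1 case; scaled=True uses the exponent
    (t-a)/(b-a) discussed above, so a unit move in t stays a modest move in ]a, b[."""
    def forward(t):
        e = (t - a) / (b - a) if scaled else t
        return a + (b - a) / (1.0 + np.exp(-e))
    def inverse(x):
        u = np.log((x - a) / (b - x))  # logit of the bounded value
        return u * (b - a) + a if scaled else u
    return forward, inverse

# toy model: recover kappa in ]0, 20[ from observations of exp(-kappa * t)
times = np.linspace(0.1, 2.0, 20)
obs = np.exp(-3.0 * times)            # "market" data generated with kappa = 3
forward, inverse = logistic_transform(0.0, 20.0)

def residuals(free):
    kappa = forward(free[0])          # bounded parameter seen by the model
    return np.exp(-kappa * times) - obs

t0 = [inverse(5.0)]                   # map the initial guess kappa = 5 to the free domain
t_opt, _ = leastsq(residuals, t0)
print("calibrated kappa:", forward(t_opt[0]))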


In practice, for example in the calibration of the Double-Heston model on real data, a naive \( \alpha=1 \) converges to an RMSE of 0.79%, while our choice converges to an RMSE of 0.50%. Both will, however, converge to the same solution if the initial guess is close enough to the true solution. Without any transform, the RMSE is also 0.50%. The difference in error might not seem large, but it results in vastly different calibrated parameters. Introducing the transform can significantly change the calibration result if it is not done carefully.

Another, simpler, way is to just impose a cap/floor on the inputs, thus ensuring that nothing is evaluated where it does not make sense. In practice, however, it will not always converge as well as the unconstrained problem: the gradient is not defined at the boundary. On the same data, the unconstrained Schobel-Zhu calibration converges to an RMSE of 1.08%, while the transform converges to 1.22% and the cap/floor to 1.26%. The Schobel-Zhu example is the more surprising one, since the initial guess, as well as the results, are not so far apart:

Initial volatility (v0) 18.1961174789
Long run volatility (theta) 1
Speed of mean reversion (kappa) 101.2291161766
Vol of vol (sigma) 35.2221829015
Correlation (rho) -73.7995231799
ERROR MEASURE 1.0614889526


Initial volatility (v0) 17.1295934569
Long run volatility (theta) 1
Speed of mean reversion (kappa) 67.9818356373
Vol of vol (sigma) 30.8491256097
Correlation (rho) -74.614636128
ERROR MEASURE 1.2256421987

The initial guess is kappa=61%, theta=11%, sigma=26%, v0=18%, rho=-70%. Only kappa differs between the two results, and the range on kappa is (0, 2000) (it is expressed in %), much larger than the result. In reality, theta is the issue (its range is (0, 1000)): forbidding a negative theta has an impact on how kappa is picked. The only way to be closer

Finally, a third way is to rely on a simple penalty: returning an arbitrarily large number as soon as the parameters leave the bounds. In our examples this was no better than the transform or the cap/floor.
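For completeness, here is a similar sketch of these last two ideas, written as wrappers around a residual function: the cap/floor clamps each parameter into its bounds before evaluating the model, while the penalty returns an arbitrarily large residual vector as soon as a parameter leaves its bounds. The names my_residuals, lower, upper, n_obs and x0 are placeholders, not the actual Schobel-Zhu or Double-Heston calibration.

import numpy as np

def with_cap_floor(residuals, lower, upper):
    # cap/floor: clamp each parameter into its bounds before evaluating the model
    def wrapped(params):
        return residuals(np.clip(params, lower, upper))
    return wrapped

def with_penalty(residuals, lower, upper, n_obs, big=1e6):
    # penalty: constant, arbitrarily large residuals once a parameter is out of bounds
    def wrapped(params):
        params = np.asarray(params)
        if np.any(params < lower) or np.any(params > upper):
            return np.full(n_obs, big)
        return residuals(params)
    return wrapped

# usage: pass the wrapped function to leastsq in place of the raw residuals, e.g.
#   x_opt, _ = leastsq(with_cap_floor(my_residuals, lower, upper), x0)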

Trying out the various approaches, it seemed that allowing meaningless parameters, as long as they work mathematically, produced the best results with Levenberg-Marquardt; in particular, allowing a negative theta in Schobel-Zhu made a difference.


Saturday, December 14, 2013

Arbitrage Free SABR - Another View on Hagan Approach

Several months ago, I took a look at two interesting recent ways to price under SABR with no arbitrage:
  • One way is due to Andreasen and Huge, where they find an equivalent local volatility expansion, and then use a one-step finite difference technique to price.
  • The other way is due to Hagan himself, where he numerically solves an approximate PDE in the probability density, and then prices options by integrating against this density.
It turns out that the two ways are much closer than I first thought. Hagan's PDE in the probability density is actually just the Fokker-Planck (forward) equation.
The \(\alpha D(F)\) term is just the equivalent local volatility. Andreasen and Huge use nearly the same local volatility formula, but without the exponential part (which is often negligible except for long maturities), directly in the Dupire forward PDE.
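As a reminder, in my notation and assuming zero rates (this is my paraphrase, not Hagan's exact formulation), writing \(\sigma(F) = \alpha D(F)\) for the equivalent local volatility, the Fokker-Planck equation for the density \(p(F,T)\) and the Dupire forward PDE for the call price \(C(K,T)\) read:

\[ \frac{\partial p}{\partial T} = \frac{1}{2} \frac{\partial^2}{\partial F^2}\!\left( \sigma(F)^2\, p \right), \qquad \frac{\partial C}{\partial T} = \frac{1}{2}\, \sigma(K)^2\, \frac{\partial^2 C}{\partial K^2}. \]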
A common derivation of the Dupire forward PDE (for example in Gatheral's book) actually uses the Fokker-Planck equation in the probability density integral formula. Out of curiosity, I tried to price directly with the Dupire forward PDE and the Hagan local volatility formula, using just linear boundary conditions. Here are the results on Hagan's own example:



The direct Local Vol approach overlaps the Density approach nearly exactly, except at the high-strike boundary, both in terms of probability density and of implied volatility smile. On the Andreasen and Huge data, it gives the following:


One can see that the one-step method approximation gives overall the same shape of smile, but shifted, while the PDE, whether in local vol or in density form, matches the Hagan formula at the money.

Hagan managed to derive a slightly more precise local volatility by going through the probability density route, and his paper formalizes his model in a clearer way: the probability density accumulates at the boundaries. But in practice, this formalism does not seem to matter. The forward Dupire way is more direct and slightly faster. This latter way also allows the use of alternative boundary conditions, as Andreasen and Huge did.
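As a concrete illustration of this more direct route, here is a minimal sketch of forward-PDE pricing for a generic local volatility function, assuming zero rates, a uniform strike grid, an implicit Euler scheme and linear (zero second derivative in strike) boundary conditions. The CEV-like local volatility at the end is a placeholder for illustration, not Hagan's exact \(\alpha D(F)\).

import numpy as np
from scipy.linalg import solve_banded

def price_forward_pde(F0, maturity, local_vol, K_min, K_max, n_K=200, n_T=50):
    # solve dC/dT = 0.5 * local_vol(K)^2 * d2C/dK2 with implicit Euler;
    # linear boundary conditions => the boundary values stay fixed
    K = np.linspace(K_min, K_max, n_K)
    dK = K[1] - K[0]
    dt = maturity / n_T
    C = np.maximum(F0 - K, 0.0)               # call payoff at T = 0
    a = 0.5 * local_vol(K) ** 2 * dt / dK ** 2

    # banded storage of (I - dt*A), A being the scaled second difference operator
    ab = np.zeros((3, n_K))
    ab[0, 2:] = -a[1:-1]                       # upper diagonal
    ab[1, :] = 1.0
    ab[1, 1:-1] += 2.0 * a[1:-1]               # main diagonal
    ab[2, :-2] = -a[1:-1]                      # lower diagonal
    for _ in range(n_T):
        C = solve_banded((1, 1), ab, C)
    return K, C

# placeholder CEV-like local volatility, for illustration only
alpha, beta = 0.3, 0.7
K, C = price_forward_pde(F0=1.0, maturity=2.0,
                         local_vol=lambda k: alpha * np.maximum(k, 1e-4) ** beta,
                         K_min=1e-4, K_max=4.0)

The density can then be read off by differencing \(C\) twice in strike, and the implied volatility smile by inverting Black's formula at each strike.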

Update March 2014 - I now have a paper around this: "Finite Difference Techniques for Arbitrage Free SABR".
