Recently Eli Dourado introduced the bitcoin volatility index or btcvol, a measure of bitcoin’s volatility. Eli estimates the volatility of the return to holding bitcoin at a given date by computing the standard deviation of log returns from the past 30 days. This is fine so far as it goes – as a quick and dirty measure – but I think we can do better. The ideal best, as Eli admits, would be the implied volatility from the derivatives market. The problem with this measure is that the bitcoin derivatives market is very immature, though bitcoin derivatives do exist. Short of implied volatility, we can still use a formal method to estimate historical volatility and potentially predict future volatility. One fairly straightforward method of doing this is through estimating a stochastic volatility model. Below the jump, I’ll describe the model, show how to quickly fit it in R, and display some nifty graphs as a result of estimating the model. If you’re just looking for the estimated or predicted volatilities from the model, for the R script that I used to actually fit the model, or for information on the stochvol R package I used in that script, scroll to the end of the post.

**Stochastic volatility**

A common phenomenon in economic time series is something called volatility clustering. Consider the price of your favorite stock. Often the price of this stock will hardly move at all for days at a time, then after some major announcement the stock jumps around a lot before settling back in to a more stable pattern. In other words, the size of the price change today tends to predict the size of the price change tomorrow, but the *sign* of today’s change doesn’t necessarily tell you much about the sign of tomorrow’s change. Stochastic volatility models are one of several classes of models which try to capture this sort of behavior.

Suppose we have some time series $\{y_t\}$. Then the stochastic volatility model assumes that each $y_t$ is drawn from a mean zero normal distribution with its own variance. Conditional on these variances, the model assumes that the draws are independent. In other words

$$

y_t|\sigma^2_{1:T} \stackrel{ind}{\sim} N(0,\sigma^2_t).

$$

Next the model assumes some sort of evolution process for $\sigma^2_t$. Since the $\sigma^2_t$’s are variances, this is called a volatility process. Theoretically there’s a ton of stuff you could use for the volatility process, but practically speaking something relatively simple would be nice. The stochastic volatility model uses one of the simplest processes we could choose here – a stationary AR(1) process on the *log* variances. Let $h_t = log(\sigma^2_t)$. Then

$$

h_t|h_{t-1} \sim N(\mu + \phi(h_{t-1} – \mu), \sigma^2_h)

$$

where $\mu$ is the average value of the $h_t$ process, $\sigma^2_h$ is the conditional variance of $h_t$ given $h_{t-1}$, and $\phi$ is the first order autocorrelation of $h_t$ process, i.e. the correlation between $h_t$ and $h_{t-1}$. This leaves us with three unknown parameters, $\mu$, $\phi$, and $\sigma^2_h$, and a latent process $\{h_t\}$, to estimate. In order to initialize the $h_t$’s, the model assumes that $h_0$ is drawn from the stationary distribution of the volatility process, i.e.

$$

h_0 \sim N\left(\mu, \frac{\sigma_h^2}{1-\phi^2}\right).

$$

With everything put together, the model reads

$$

y_t|h_{1:T} \stackrel{ind}{\sim} N\left(0,e^{h_t}\right)\\

h_t|h_{t-1} \sim N\left(\mu + \phi(h_{t-1} – \mu), \sigma^2_h\right)\\

h_0 \sim N\left(\mu, \frac{\sigma_h^2}{1-\phi^2}\right).

$$

Now it may seem like a major restriction that the stochastic volatility model assumes that the data is mean zero – prices aren’t mean zero after all and in fact might have a trend. But typically this model is fit on the demeaned log returns, i.e. let

$$

\tilde{y}_t = log\left(\frac{p_t – p_{t-1}}{p_{t-1}}\right) = log(p_t) – log(p_{t-1})

$$

then

$$

y_t = \tilde{y}_t – \frac{1}{T}\sum_{s=1}^T\tilde{y}_s.

$$

This is usually enough to ensure that the time series is not only trendless but mean zero. If you want to also estimate the mean return, you can simply add that as a parameter in the model – specifically as the mean of the normal distribution from which the $y_t$’s are drawn.

**Fitting the model**

We’ll fit the model using Bayesian methods. There are advantages and disadvantages to this approach, but one major advantage is how easy it is to compute estimates and credence intervals for any quantity we are interested in. In order to do so, we need prior a prior distribution for $(\mu,\phi,\sigma^2_h)$. Choosing good priors is part art and part science, and not necessarily easy. Here, I’ll defer to the experts on this particular model. Gregor Kastner and Sylvia Frühwirth-Schnatter have a great paper (pdf) explaining a computational method for fitting the model, and Kastner has written an R package (stochvol) that implements this method with an excellent tutorial (pdf).

The basic guidance that Kastner gives is that if you want a noninformative prior in the sense that a small change in the prior doesn’t change the results much, then there are good options for $\mu$ and $\sigma_n^2$ but not really for $\phi$. He suggests putting independent priors on each of the parameters, which is common in Bayesian applications. In particular he suggests a normal prior on $\mu$ with a mean of zero and a large ($\geq 100$) variance. For $\sigma_n^2$ Kastner suggests a fun prior – putting a mean zero normal distribution on $\pm \sqrt{\sigma_n^2}$, which is sort of like specifying the prior on the standard deviation but not really since the prior allows the parameter to be negative. This is equivalent to a $Gamma(1/2, 1/(2Q))$ prior directly on $\sigma_n^2$ where $Q$ is the prior variance of $\pm \sqrt{\sigma_h^2}$. This prior is chosen primarily because it makes computation fairly straightforward (see the Kastner & Frühwirth-Schnatter paper linked above).

Edit: Gregor mentions in the comments that the real motivation for the gamma prior is that it avoids some well known problems with the most computationally convenient prior – the inverse gamma prior. This (pdf) paper by Gelman explains the issue pretty well.

In practice, the particular choice of variance for each of these normal priors doesn’t tend to impact inference much so long as their variances are both large enough, but the prior on the autocorrelation parameter $\phi$ is a different story. Since we want the volatility process to be stationary, this forces $-1\lt \phi\lt 1$. A common prior on bounded parameters such as this is a transformed beta distribution. The usual beta distribution is a probability distribution on the unit interval, $(0,1)$. If you want a beta distribution on the interval $(a,b)$ instead, then putting the usual beta distribution on $(y-a)/(b-a)$ will do the trick. So here, $(\phi + 1)/2$ will have a beta prior that depends on two hyperparameters that we must specify, $\alpha$ and $\beta$. It turns out that posterior inference for $\phi$ is highly sensitive to the choice of $\alpha$ and $\beta$, and in some cases the posterior distribution may be identical to the prior. Kastner cites another paper which suggests using $\alpha=20$ and $\beta=1.5$, which implies a prior mean of $0.86$ for $\phi$, a prior standard deviation of $0.11$, and that it’s extremely unlikely that $\phi < 0$. The default in Kastner’s R package sets $\alpha=5$ instead which makes the prior mean of $\phi$ only about $0.54$ with a much larger prior standard deviation of $0.31$. We’ll use these hyperparameters as a default option since a prior expected level of persistence of about $0.5$ seems roughly right to me (in the dark) and I assume that Kastner set it as the default for a reason. A serious analysis should put some careful thought into what these values should be though, and I’m certainly open to arguments. We’ll set prior variance of the mean of the volatility process to 100, and the prior variance of $\pm\sqrt{\sigma_n^2}$ to 10, which are good default values according to Kastner. So to summarize the prior on $(\mu,\sigma_n^2,\phi)$, we’re assuming that they’re independent a priori and that

$$

\mu \sim N(0,100)\\

\pm\sqrt{\sigma_n^2} \sim N(0,10)\\

(\phi + 1)/2 \sim Beta(5, 1.5).

$$

The actual fitting of the model requires an MCMC (Markov chain Monte Carlo) sampler in order to approximately sample from the joint posterior distribution of the parameters $(\mu,\sigma_n^2,\phi)$ and the latent volatility process $\{h_t\}$. Constructing a good MCMC sampler isn’t always easy, but luckily, Kastner’s R package, stochvol, implements the MCMC strategy that he and Frühwirth-Schnatter developed in the paper I linked above, so we don’t have to put any thought into it. If you’re into the nuts and bolts of MCMC, I recommend reading their paper – the strategy is pretty cutting edge and uses several ideas from different strands of the literature in order to solve several very commonly encountered problems when fitting these sorts of models.

**Results**

To fit the model, I used the stochvol R package to obtain 100,000 approximate draws from the posterior distribution after a burn in of 10,000 draws. This only took about a minute on my laptop. Here is an R script that contains all of the commands I used to load the data into R, fit the model, create the plots I’m about to show you, and a few other things.

First, let’s compare Eli’s btcvol, which uses a 30 day window to compute sample standard deviations, to the estimated daily standard deviations from the stochastic volatility model.

The red is btcvol while the black is an estimate of the daily volatility using the stochastic volatility model. Strictly speaking, we’re using the median of the posterior distribution of a given day’s volatility (i.e. standard deviation) as a point estimate of the true volatility that day. The posterior mean is another commonly used point estimate, but I prefer the median because it emphasizes that, short of using a realistic loss/utility function, we care about the whole posterior distribution and not just its mean. The grey lines combine to create a 90% credible interval for the daily volatilities. In other words, the posterior probability that a given day’s volatility is inside the interval created by the gray lines on that day is 0.9. This gives us some idea of how certain we are about the level of volatility on a given date which I think is a significant advantage of using this method to measure volatility.

The most obvious thing about this graph is that btcvol is much more persistent – when volatility is suddenly really high on Tuesday, btcvol assumes that volatility is also high for the next month or so. On the other hand, the estimates from the stochastic volatility model jump fairly freely so that today’s volatility can be drastically different from yesterday’s volatility. This is a direct consequence of how btcvol is calculated. If the log return of bitcoin is constant for a month after a massive spike, btcvol will tell us that volatility was high during that entire period since the spike is still in the 30 day window. The stochastic volatility model, on the other hand, will only tell us that volatility was high on the day of the spike and maybe for a couple of days after that. We can make the model estimates behave more like btcvol by adjusting the prior on $\phi$. Specifically, if we increase $\alpha$ the prior will put more weight on higher values of $\phi$, so the volatility process will consequently be more persistent both in the prior and the posterior. To some extent the data can overcome this prior, but how much weight to place on the data is a tough question, and one we probably want to answer if we want to pick a good prior for $\phi$.

Another advantage of estimating volatility this way is that we can easily predict volatility as well using the posterior distribution of the $h_t$’s and the recursion that gives us $h_{t+1}$’s distribution conditional on $h_t$. The next plot zooms in on the last 2-3 months and adds about a month of predictions.

This zoomed in look gives us a better view of the differences between btcvol and the stochastic volatility estimates. Btcvol smooths out the volatility of Bitcoin’s returns pretty heavily while the stochastic volatility estimates are happy to let each day have drastically different volatility. The new portion of this plot is the dashed lines, which represent the prediction version of their corresponding solid colored line. In other words, the dashed black line is the posterior predicted median for daily volatility while the dashed gray lines create a 90% posterior prediction interval.

If you want the data so you can fit this model yourself or perhaps the estimates or predictions from the model I fit, look no further:

The most recent bitcoin return data and btcvol estimate is here (csv), from Eli.

The volatility estimates along with the data used to fit the model are here (csv).

The volatility predictions from the model are here (csv).

The R script I wrote to fit the model and produce these plots (and do a few other things) is here (R – text file).

The tutorial on using the stochvol package by Gregor Kastner is here (pdf).

The paper where by Gregor Kastner and Sylvia Frühwirth-Schnatter describe the computational method used to fit the model in stochvol is here (pdf).

Hey Matt,

I have to say this post is a great example of of how science should be done nowadays! Inspiring and easy to build upon.

I am a master student, I love Baysian Statistics, it would be great to check out this method and refer to it in my thesis.

Some links and figures above that would be helpful for me seem to be dead, do you think you could fix, or just send a dropbox link?

Cheers

Anton

osika@kth.se

Thanks for letting me know, they should all be working now.

Hi Matt,

thanks a lot for this instructive work, I am really happy to see that stochvol gets used for fun applications! Just two quick comments:

a) There is a typo in the definition of the AR process which appears again in the specification of the SV model.

b) The reason we chose a (restricted) Gamma prior instead of the usual Inverted Gamma (IG) prior is not primarily computational convenience (in fact, IG would be the conjungate one in the centered parametrization). It’s rather that we think the IG prior might not be appropriate in this case, as it can be very informative in the neighbourhood of zero.

Keep up the good work!

/g

Thanks, fixed. I added a note about the gamma prior as well.

I’m getting an error when I tried reproducing your code:

“Error: Unknown parameters: subset”

I tried rerunning it and it worked for me. Is your R install up to date? What about the ggplot2 package?