## Bug alert! rmt function in mnormt R package.

UPDATE: As of version 1.5-5 of mnormt, the bug has been fixed.

Last week I discovered a nasty little bug in the rmt function of the mnormt package. I’ve emailed the package maintainer but so far haven’t received a response. I’m sure they’re very busy, but in the mean time I think it’s fair to warn potential users of the package. It’s also an object lesson in the weirdness of random number generators. But first we’ll go into a little background about what the rmt function intends to do.

## Sampling from a truncated Dirichlet distribution

Sampling from truncated distributions is hard, especially when the distribution in question is multivariate. I came across an interesting paper showing how to use MCMC in order to sample from truncated normal, gamma, and beta distributions – Paul Damien & Stephen G Walker (2001) Sampling Truncated Normal, Beta, and Gamma Densities, Journal of Computational and Statistical Graphics. The basic idea is pretty simple and I’ve applied it to a Dirichlet distribution.

Update: Code added below to simulate from the truncated Dirichlet distribution, along with an example.

## Exactly how volatile is bitcoin?

Recently Eli Dourado introduced the bitcoin volatility index or btcvol, a measure of bitcoin’s volatility. Eli estimates the volatility of the return to holding bitcoin at a given date by computing the standard deviation of log returns from the past 30 days. This is fine so far as it goes – as a quick and dirty measure – but I think we can do better. The ideal best, as Eli admits, would be the implied volatility from the derivatives market. The problem with this measure is that the bitcoin derivatives market is very immature, though bitcoin derivatives do exist. Short of implied volatility, we can still use a formal method to estimate historical volatility and potentially predict future volatility. One fairly straightforward method of doing this is through estimating a stochastic volatility model. Below the jump, I’ll describe the model, show how to quickly fit it in R, and display some nifty graphs as a result of estimating the model. If you’re just looking for the estimated or predicted volatilities from the model, for the R script that I used to actually fit the model, or for information on the stochvol R package I used in that script, scroll to the end of the post.

## Why infinite utility is like the Spanish inquisition

Very often in philosophical discussions involving decision theory or ethics, someone will attempt to show that some action has infinite expected utility and try to draw some conclusions from this proposition. Or at least confuse the hell out of everyone. The most common culprit is (some versions of) Pascal’s wager, but the St. Petersburg paradox is another. What I want to do is convince you to be extremely skeptical of any conclusions that depend on something or another having infinite expected utility. To do this, we’re going to have to learn some of the nitty-gritty math of decision theory – yes, this is partially just an excuse to talk about decision theory. There won’t be much math, but when there is the details will be important. (I wrote that last sentence before I wrote the rest and let’s just say I was a bit optimistic.)

## Generating random variates from a symmetric “signed” square root distribution.

Suppose you have some continuous random variable $X>0$ that is easy to sample from. Suppose you’re interested in sampling from another real valued random variable $Y$ where $X=Y^2$. In other words, $Y=\pm\sqrt{X}$. This is the signed square root of $X$ – signed because $Y$ is real valued and so can take on negative values. If the distribution of $Y$ is symmetric around zero, it turns out all you have to do in order to draw from $Y$ is draw from $X$ then flip a fair coin to choose the sign to give $Y$. This basic problem has come up twice in my research lately, and unsurprisingly once I abstracted away from the details of the distributions I was working with I came up with an elegant little proof.

## I’m Almost Sure it’s Probably Converging… Wait, What?

Every first year graduate student in statistics or economics has to learn the basics of probability and statistics theory. Most of this is fairly understandable – once you grasp the basic concepts behind probability and distribution theory, everything else is applications and details for the most part. There is at least one notable exception to this – the relationship between the various notions of convergence used in probability theory.

Now everyone learns that convergence almost surely implies convergence in probability which, in turn, implies convergence in distribution. Everyone also learns that none of the converse implications hold in general, but I don’t think anyone comes away really grasping the difference between all of these concepts – they learn the “that” but not the “why.” I’m guilty of this too. We memorize which is stronger then go on to use them to talk about central limit theorems, laws of large numbers, and properties of estimators. This is probably fine since these first year courses typically focus on getting comfortable using the main concepts of probability theory rather than on deep understanding – that’s for later courses. But some people can’t wait for those courses, so I’ll try to explain the difference both intuitively and with some math.

## Introduction to Bayesian Statistics Videos

My advisor, Jarad Niemi, is posting lectures on Bayesian statistics on his youtube channel while teaching Stat 544 – the master’s/Ph.D. level introduction to Bayesian statistics at Iowa State University. I’ve taken a look at a few of them and they’re pretty good. Most are short (~10 minute) explanations of a particular topic in order to supplement required readings. Some of them, like the extended Metropolis within Gibbs example, are full lecture length, i.e. about one hour. Since the course is ongoing this semester, you can expect a few more to be posted.

The lectures do assume a rather high level understanding of probability theory. The statistics students in the class have seen at least chapters 1 – 5 of Casella and Berger in some detail. Other students in the class have similar backgrounds, though perhaps not quite as strong. Some knowledge of R would also be useful to understand the more computationally centered lectures. While the videos might not be useful for everyone, they’re probably a great supplement if you’re learning some of this material elsewhere.

## Prior distributions for covariance matrices: the scaled inverse-Wishart prior

At some point in a variety of Bayesian applications, we end up having to specify a prior distribution on a covariance matrix. The canonical example is a hierarchical regression model. Suppose we have a response vector $y$ along with some covariate $x$ where we expect the simple linear regression model to provide a decent fit. In addition, suppose the data is organized into distinct groups such that each group possibly has its own regression line, though we expect the lines to be related. In order to model this situation, we start first with the data level: