*UPDATE: As of version 1.5-5 of mnormt, the bug has been fixed.*

Last week I discovered a nasty little bug in the `rmt` function of the `mnormt` package. I've emailed the package maintainer but so far haven't received a response. I'm sure they're very busy, but in the meantime I think it's fair to warn potential users of the package. It's also an object lesson in the weirdness of random number generators. But first, a little background on what the `rmt` function intends to do.

The `rmt` function generates multivariate Student's T distributed random vectors. There are several ways to reasonably define a multivariate T distribution, but a common one that arises often in Bayesian statistics has the following density for a $p$-dimensional random vector $x$ with $p$-dimensional location vector parameter $\mu$, $p\times p$ scale matrix parameter $\Sigma$, and scalar degrees of freedom parameter $\nu$:

$$ f(x|\mu,\Sigma,\nu) = \frac{\Gamma[(\nu + p)/2]}{\Gamma(\nu/2)\nu^{p/2}\pi^{p/2}}|\Sigma|^{-1/2}\left[1 + \frac{1}{\nu}(x - \mu)'\Sigma^{-1}(x - \mu)\right]^{-(\nu + p)/2}, $$

where $\Gamma(\cdot)$ is the gamma function. In order to generate random vectors from this density, it's useful to have the Cholesky decomposition of $\Sigma$ on hand, i.e. the unique $p\times p$ lower triangular matrix $L$ such that $LL' = \Sigma$. A common mistake when generating multivariate T random vectors is to assume that the location-scale results that hold for multivariate normal distributions also hold for multivariate T distributions. For example, to generate a $N(\mu, \Sigma)$ distributed random vector, we could generate $Z\sim N(0, I_p)$, i.e. a $p$-dimensional vector of iid $N(0,1)$ random variates, then set $X = \mu + LZ$. The analogous result for the $T$ distribution - applying the same transformation to a vector of iid univariate $T_\nu$ variates - does not hold.

However, there’s a nifty result that a multivariate $T$ distribution of the sort I’m talking about can be written as a scale mixture of normals. Specifically if $d \sim \chi^2_\nu$ and $x|d \sim N\left(\mu, \frac{\nu}{d}\Sigma\right)$, then $x\sim T_\nu(\mu, \Sigma)$. Often this result is stated in terms of gamma distributions since $d/\nu$ has a gamma distribution and anyway the $\chi^2$ distribution is a special case of the gamma, but for the purposes of random number generation the $\chi_\nu^2$ distribution is easier and faster to generate from. So the standard method of generating a multivariate $T_\nu(\mu, \Sigma)$ random vector $X$ is as follows:

1. Generate $d \sim \chi^2_\nu$.

2. Generate $z_1, z_2, \dots, z_p \stackrel{iid}{\sim} N(0,1)$ and collect them into the vector $Z = (z_1, z_2, \dots, z_p)$.

3. Return

$$X = \mu + \sqrt{\frac{\nu}{d}}LZ.$$
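The three steps above can be sketched directly in base R. This is a minimal sketch of my own, not the package's code; `mu`, `Sigma`, and `nu` are assumed parameter values:

```r
## Minimal sketch of the sampling steps above (my own code, not mnormt's).
## Draws one T_nu(mu, Sigma) vector via the chi-square scale mixture.
rmvt_sketch <- function(mu, Sigma, nu) {
    p <- length(mu)
    L <- t(chol(Sigma))              # chol() returns the upper factor; transpose to get L
    d <- rchisq(1, df = nu)          # step 1: chi-square draw
    z <- rnorm(p)                    # step 2: iid standard normals
    mu + sqrt(nu / d) * (L %*% z)    # step 3: location-scale transform
}

set.seed(1)
x <- rmvt_sketch(mu = c(0, 0), Sigma = diag(2), nu = 5)
```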

If we look at `mnormt`'s `rmt` function, we'll see that this is exactly the method the function uses (from version 1.5-4 of the package):

```r
rmt <- function (n = 1, mean = rep(0, d), S, df = Inf, sqrt = NULL)
{
    sqrt.S <- if (is.null(sqrt)) chol(S) else sqrt
    d <- if (is.matrix(sqrt.S)) ncol(sqrt.S) else 1
    old.state <- get(".Random.seed", envir = .GlobalEnv)
    x <- if (df == Inf) 1 else rchisq(n, df)/df
    assign(".Random.seed", old.state, envir = .GlobalEnv)
    z <- rmnorm(n, rep(0, d), sqrt = sqrt.S)
    mean <- outer(rep(1, n), as.vector(matrix(mean, d)))
    drop(mean + z/sqrt(x))
}
```

This function accomplishes exactly these steps, along with some others. The first few lines check whether you have supplied a square root matrix for $\Sigma$ (`S` in the function), and if not, construct one for you - this is our $L$, or more precisely $L'$, since `chol` returns the upper triangular factor. Next comes a weird-looking line of code that we'll come back to later: `old.state <- get(".Random.seed", envir = .GlobalEnv)`. Then it finally gets around to our step 1 - drawing the $\chi^2_\nu$ variable, with the ability to handle $\nu = \infty$. Next we get another weird line of code: `assign(".Random.seed", old.state, envir = .GlobalEnv)`. Then `rmt` performs our steps 2 and 3 and returns the output - the `rmnorm` function uses the method I discussed above to generate multivariate normal random vectors. The only other wrinkle is that `rmt` can generate $n$ $T_\nu(\mu,\Sigma)$ distributed random vectors instead of just one.

Unsurprisingly, the weird lines of code are where the bug lives. The first one, `old.state <- get(".Random.seed", envir = .GlobalEnv)`, stores the state of `R`'s internal random number generator. Then, after the $\chi^2_\nu$ variate is drawn, the second one, `assign(".Random.seed", old.state, envir = .GlobalEnv)`, restores the seed to its state just before the $\chi^2_\nu$ draw. Then the multivariate normal draw is made. This is very strange - the $\chi^2_\nu$ and multivariate normal draws need to be independent in order for the algorithm to work, but forcing the random number generator to be in the same state when you draw each one forces them to be correlated in hard-to-predict ways. I mentioned the bug on Twitter last week, and someone suggested that it might be leftover code from a unit test. That sounds plausible to me.

Even worse, when you use `rmt` and compare the results to the density, or to another $T$ generator that you know works, everything seems fine! Below is a short R script that does this using Kolmogorov-Smirnov tests. I generate standard univariate $T_\nu$ random variates with $\nu = 5$ using `rmt` and using a fixed version of `rmt` that simply comments out the two lines that mess with the RNG's seed. There is no apparent difference between the two, or between either and the true CDF of the univariate $T_\nu$ distribution - my p-values ranged from 0.3 to 0.7.

```r
library(mnormt)
library(MCMCpack) ## only used for mcmc() and summary(mcmc())

set.seed(23242)
## Note: In a fresh R session, .Random.seed doesn't exist yet
## and rmt will throw an error while looking for it.

## same as rmt, except does not reset .Random.seed
rmtfixed <- function (n = 1, mean = rep(0, d), S, df = Inf, sqrt = NULL)
{
    sqrt.S <- if (is.null(sqrt)) chol(S) else sqrt
    d <- if (is.matrix(sqrt.S)) ncol(sqrt.S) else 1
    ## old.state <- get(".Random.seed", envir = .GlobalEnv)
    x <- if (df == Inf) 1 else rchisq(n, df)/df
    ## assign(".Random.seed", old.state, envir = .GlobalEnv)
    z <- rmnorm(n, rep(0, d), sqrt = sqrt.S)
    mean <- outer(rep(1, n), as.vector(matrix(mean, d)))
    drop(mean + z/sqrt(x))
}

df <- 5
niter <- 100000
mnormtout <- rmt(niter, 0, 1, df)
fixedout <- rmtfixed(niter, 0, 1, df)

ks.test(fixedout, mnormtout)     ## no apparent difference between the rmt's
ks.test(mnormtout, pt, df = df)  ## both rmt's look good compared to the cdf
ks.test(fixedout, pt, df = df)
```

This might make you think that `rmt` is working fine, but my next bit of code will hopefully convince you that it isn't. The problem occurs in the context of an independent Metropolis-Hastings algorithm. Suppose we want to generate random variates from the density $p(x)$, but doing so is expensive. Suppose further that we have another density $q(x)$ which is fairly close to $p(x)$ but much easier to draw from. Then we can construct a Markov chain that converges in distribution to $p(x)$ as follows. Suppose $x_t$ is our previous draw of $x$ in the chain - we can initialize $x$ at any value we want. Generate a proposal value $x_{prop} \sim q$. Next, compute the acceptance ratio

$$ a = \frac{p(x_{prop})q(x_t)}{p(x_{t})q(x_{prop})} $$

and let $r = \min\{a, 1\}$. Then let $x_{t+1} = x_{prop}$ with probability $r$, and $x_{t+1} = x_t$ with probability $1 - r$. As $t\to \infty$ the probability distribution of $x_t$ converges to the target distribution, $p(x)$, under general conditions. Or at least it should if the random number generator is working correctly.
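The accept/reject step described above can be sketched with a toy example. This is my own illustration, not the post's inverse-gamma example: the target is a standard normal and the proposal a heavier-tailed Cauchy, with the ratio computed on the log scale for numerical stability:

```r
## Independent Metropolis-Hastings sketch: target p = N(0,1), proposal q = Cauchy.
set.seed(1)
n <- 50000
draws <- numeric(n)
x <- 0                                   # arbitrary starting value
for (t in 1:n) {
    xprop <- rcauchy(1)                  # proposal draw from q
    ## log of a = p(xprop) q(x) / (p(x) q(xprop))
    la <- dnorm(xprop, log = TRUE) + dcauchy(x, log = TRUE) -
          dnorm(x, log = TRUE) - dcauchy(xprop, log = TRUE)
    if (log(runif(1)) < la) x <- xprop   # accept with probability min(a, 1)
    draws[t] <- x
}
```

After burn-in, `mean(draws)` and `sd(draws)` should be close to 0 and 1.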

The code in the `R` script at the end of this post constructs an independent Metropolis-Hastings algorithm for the inverse gamma density, $p(x) \propto x^{-a-1}e^{-\frac{b}{x}}$. I don't include the normalizing constant because it isn't necessary for Metropolis algorithms. I'll use a normal approximation to the density of $y = \log x$ as my proposal, i.e. as my $q(y)$. The density of $y$ is $p(y) \propto (e^y)^{-a}e^{-be^{-y}}$. The normal approximation to this density is $N(y^*, s^2)$, where $y^*$ is the mode and

$$
s^2 = \left[-\left.\frac{d^2 \log p(y)}{d y^2}\right|_{y = y^*}\right]^{-1},
$$

the negative sign appearing because the second derivative of $\log p(y)$ is negative at the mode.
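Concretely, $y^*$ and $s^2$ can be found numerically with `optim`, the same approach the full script at the end of this post takes (here with the same parameter values, $a = 2$, $b = 1$):

```r
## log density of y = log(x) for x ~ InverseGamma(a, b), up to a constant
logf <- function(y, a, b) -y*a - b*exp(-y)

a <- 2; b <- 1
## maximize logf to find the mode y*, asking optim for the hessian as well
modeout <- optim(0, logf, method = "Brent", hessian = TRUE,
                 lower = -100, upper = 100,
                 control = list(fnscale = -1), a = a, b = b)
ystar <- modeout$par        # the mode y*; analytically log(b/a)
s2 <- -1/modeout$hessian    # s^2 = [-d^2 log p / dy^2 at y*]^{-1}; analytically 1/a
```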

The code in the script at the end of this post numerically finds $y^*$ and $s^2$. I compare the simulations to simulations from the same algorithm using a fixed version of `rmt`, and to direct simulation from an inverse gamma distribution. The results are below - `mnormt` is the Metropolis algorithm using `rmt` to generate proposals, `fixed` is the same algorithm using the fixed version of `rmt` to generate proposals, and `mc` directly simulates from the inverse gamma distribution. You can see that the fixed version of the Metropolis algorithm and direct simulation yield the same results, but using `rmt` biases the distribution significantly toward zero - the mean and many of the quantiles are attenuated.

```
> summary(mcmc(cbind(mnormt = exp(imh.mnormt.out$draws),
                     fixed = exp(imh.fixed.out$draws), mc = mc)))

Iterations = 1:1e+05
Thinning interval = 1
Number of chains = 1
Sample size per chain = 1e+05

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

         Mean     SD  Naive SE Time-series SE
mnormt 0.6331 0.2937 0.0009289       0.001192
fixed  1.0025 2.3885 0.0075530       0.008671
mc     1.0004 1.9812 0.0062651       0.006265

2. Quantiles for each variable:

         2.5%    25%    50%    75%  97.5%
mnormt 0.1710 0.3646 0.6382 0.8681 1.169
fixed  0.1784 0.3710 0.5958 1.0345 4.128
mc     0.1804 0.3740 0.5977 1.0412 4.054
```

And here is the same problem viewed on the log scale - the results are negatively biased.

```
> summary(mcmc(cbind(mnormt = imh.mnormt.out$draws,
                     fixed = imh.fixed.out$draws, mc = log(mc))))

Iterations = 1:1e+05
Thinning interval = 1
Number of chains = 1
Sample size per chain = 1e+05

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

          Mean     SD Naive SE Time-series SE
mnormt -0.5907 0.5528 0.001748       0.002189
fixed  -0.4238 0.8045 0.002544       0.003202
mc     -0.4197 0.8008 0.002532       0.002532

2. Quantiles for each variable:

         2.5%     25%     50%      75%  97.5%
mnormt -1.766 -1.0088 -0.4490 -0.14147 0.1561
fixed  -1.724 -0.9916 -0.5179  0.03396 1.4178
mc     -1.712 -0.9834 -0.5146  0.04035 1.3996
```

This is a pretty big deal! Several `R` packages depend on `mnormt`, and I'm sure many people reach for the package whenever they need to deal with multivariate T distributions. It appears that `rmt` will work fine for some applications, since the Kolmogorov-Smirnov test didn't detect any issues, but it's hard to predict in advance where problems will arise. Multivariate T distributions are often used as proposals in Metropolis-Hastings algorithms, so even if that's the only place where there's a problem, it could impact a lot of people. Someone could very easily be making decisions based on incorrect results as a consequence of this bug. Luckily the fix is easy - just comment out the two lines of code in `rmt` that mess with the internal state of `R`'s RNG. Hopefully the package gets fixed soon, but in the meantime at least now you know how to deal with the problem.

Code for all of the simulations above:

```r
library(mnormt)
library(MCMCpack) ## only used for mcmc() and summary(mcmc())

set.seed(23242)
## Note: In a fresh R session, .Random.seed doesn't exist yet
## and rmt will throw an error while looking for it.

## same as rmt, except does not reset .Random.seed
rmtfixed <- function (n = 1, mean = rep(0, d), S, df = Inf, sqrt = NULL)
{
    sqrt.S <- if (is.null(sqrt)) chol(S) else sqrt
    d <- if (is.matrix(sqrt.S)) ncol(sqrt.S) else 1
    ## old.state <- get(".Random.seed", envir = .GlobalEnv)
    x <- if (df == Inf) 1 else rchisq(n, df)/df
    ## assign(".Random.seed", old.state, envir = .GlobalEnv)
    z <- rmnorm(n, rep(0, d), sqrt = sqrt.S)
    mean <- outer(rep(1, n), as.vector(matrix(mean, d)))
    drop(mean + z/sqrt(x))
}

df <- 5
niter <- 100000
mnormtout <- rmt(niter, 0, 1, df)
fixedout <- rmtfixed(niter, 0, 1, df)

ks.test(fixedout, mnormtout)     ## no apparent difference between the rmt's
ks.test(mnormtout, pt, df = df)  ## both rmt's look good compared to the cdf
ks.test(fixedout, pt, df = df)

### Metropolis-Hastings example
## log density for x = log(y), y ~ InverseGamma(a, b)
logf <- function(x, a, b){
    -x*a - b*exp(-x)
}

## independent metropolis using mnormt's rmt for proposal
## uses a T_df(mu, sig2) proposal
imh.mnormt <- function(mcmciter, mu, sig2, df, a, b){
    draws <- rep(0, mcmciter)
    accept <- rep(0, mcmciter)
    old <- mu
    lpold <- logf(old, a, b)
    ljold <- dmt(old, mu, sig2, df, TRUE)
    for(i in 1:mcmciter){
        prop <- rmt(1, mu, sig2, df)
        lpprop <- logf(prop, a, b)
        ljprop <- dmt(prop, mu, sig2, df, TRUE)
        la <- lpprop - lpold - ljprop + ljold
        u <- runif(1)
        if(log(u) < la){
            old <- prop
            lpold <- lpprop
            ljold <- ljprop
            accept[i] <- 1
        }
        draws[i] <- old
    }
    list(draws = draws, accept = accept)
}

## independent metropolis using rmtfixed for proposal
## uses a T_df(mu, sig2) proposal
imh.fixed <- function(mcmciter, mu, sig2, df, a, b){
    draws <- rep(0, mcmciter)
    accept <- rep(0, mcmciter)
    old <- mu
    lpold <- logf(old, a, b)
    ljold <- dmt(old, mu, sig2, df, TRUE)
    for(i in 1:mcmciter){
        prop <- rmtfixed(1, mu, sig2, df) ## only line different from imh.mnormt
        lpprop <- logf(prop, a, b)
        ljprop <- dmt(prop, mu, sig2, df, TRUE)
        la <- lpprop - lpold - ljprop + ljold
        u <- runif(1)
        if(log(u) < la){
            old <- prop
            lpold <- lpprop
            ljold <- ljprop
            accept[i] <- 1
        }
        draws[i] <- old
    }
    list(draws = draws, accept = accept)
}

## set inverse gamma parameters
a <- 2
b <- 1

## numerically find mode and hessian of logf
modeout <- optim(0, logf, method = "Brent", hessian = TRUE,
                 lower = -100, upper = 100,
                 control = list(fnscale = -1), a = a, b = b)
mu <- modeout$par
sig2 <- -1/modeout$hessian

mcmciter <- 100000 ## large to make sure differences aren't due to numerical error
df <- 2
imh.mnormt.out <- imh.mnormt(mcmciter, mu, sig2, df, a, b)
imh.fixed.out <- imh.fixed(mcmciter, mu, sig2, df, a, b)
mc <- 1/rgamma(mcmciter, a, b)

## check M-H acceptance rates: plenty high in this case
mean(imh.mnormt.out$accept)
mean(imh.fixed.out$accept)

## mnormt gives the wrong mean & non-central quantiles - especially upper quantiles
summary(mcmc(cbind(mnormt = exp(imh.mnormt.out$draws),
                   fixed = exp(imh.fixed.out$draws), mc = mc)))
## same problem viewed on the log scale - the scale the M-H algorithms worked on
summary(mcmc(cbind(mnormt = imh.mnormt.out$draws,
                   fixed = imh.fixed.out$draws, mc = log(mc))))
```