Bug alert! rmt function in mnormt R package.

UPDATE: As of version 1.5-5 of mnormt, the bug has been fixed.

Last week I discovered a nasty little bug in the rmt function of the mnormt package. I've emailed the package maintainer but so far haven't received a response. I'm sure they're very busy, but in the meantime I think it's fair to warn potential users of the package. It's also an object lesson in the weirdness of random number generators. But first, a little background on what the rmt function is intended to do.

The rmt function generates multivariate Student's T distributed random vectors. There are several reasonable ways to define a multivariate T distribution, but a common one that arises often in Bayesian statistics has the following density for a $p$-dimensional random vector $x$ with $p$-dimensional location vector parameter $\mu$, $p\times p$ scale matrix parameter $\Sigma$, and scalar degrees of freedom parameter $\nu$:
$$ f(x|\mu,\Sigma,\nu) = \frac{\Gamma[(\nu + p)/2]}{\Gamma(\nu/2)\nu^{p/2}\pi^{p/2}}|\Sigma|^{-1/2}\left[1 + \frac{1}{\nu}(x - \mu)'\Sigma^{-1}(x - \mu)\right]^{-(\nu + p)/2}, $$
where $\Gamma(\cdot)$ is the gamma function. In order to generate random vectors from this density, it's useful to have the Cholesky decomposition of $\Sigma$ lying around, i.e. the unique $p\times p$ lower triangular matrix $L$ such that $LL' = \Sigma$. At this point, a common mistake is to assume that the location-scale results that hold for multivariate normal distributions also hold for multivariate T distributions. For example, to generate a $N(\mu, \Sigma)$ distributed random vector, we can generate $Z\sim N(0, I_p)$, i.e. a $p$-dimensional vector of iid $N(0,1)$ random variates, and then set $X = \mu + LZ$. The analogous result for the $T$ distribution does not hold.
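For concreteness, here is a minimal sketch of that normal location-scale construction in base R (the parameter values are made up for illustration):

## Generate one N(mu, Sigma) draw via the Cholesky factor.
## R's chol() returns the upper triangular factor, so transpose to get L.
mu    <- c(1, 2)
Sigma <- matrix(c(2, 0.5, 0.5, 1), 2, 2)
L     <- t(chol(Sigma))   # lower triangular: L %*% t(L) reproduces Sigma
Z     <- rnorm(2)         # iid N(0, 1) variates
X     <- mu + L %*% Z     # X ~ N(mu, Sigma); the same trick fails for the T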

However, there's a nifty result that a multivariate $T$ distribution of the sort I'm talking about can be written as a scale mixture of normals. Specifically, if $d \sim \chi^2_\nu$ and $x|d \sim N\left(\mu, \frac{\nu}{d}\Sigma\right)$, then $x\sim T_\nu(\mu, \Sigma)$. (To verify this, write $Q = (x - \mu)'\Sigma^{-1}(x - \mu)$ and integrate the conditional normal density against the $\chi^2_\nu$ density; the marginal reduces to a gamma integral proportional to $[1 + Q/\nu]^{-(\nu + p)/2}$, which is exactly the kernel of the $T$ density above.) Often this result is stated in terms of gamma distributions, since $d/\nu$ has a gamma distribution and the $\chi^2$ distribution is a special case of the gamma anyway, but for the purposes of random number generation the $\chi^2_\nu$ distribution is easier and faster to generate from. So the standard method of generating a multivariate $T_\nu(\mu, \Sigma)$ random vector $X$ is as follows:

1. Generate $d \sim \chi^2_\nu$.
2. Generate $z_1, z_2, \dots, z_p \stackrel{iid}{\sim} N(0,1)$ and collect them into the vector $Z = (z_1, z_2, \dots, z_p)$.
3. Return
$$X = \mu + \sqrt{\frac{\nu}{d}}LZ.$$
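In R the three steps translate directly into a few lines. Here is a minimal sketch for a single draw (again with made-up parameter values):

## Minimal sketch of steps 1-3 for a single T_nu(mu, Sigma) draw.
mu    <- c(1, 2)
Sigma <- matrix(c(2, 0.5, 0.5, 1), 2, 2)
L     <- t(chol(Sigma))                 # lower triangular factor of Sigma
nu    <- 5
d     <- rchisq(1, df = nu)             # step 1: chi-squared mixing variable
Z     <- rnorm(2)                       # step 2: iid N(0, 1) variates
X     <- mu + sqrt(nu / d) * (L %*% Z)  # step 3: scale and shift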

If we look at mnormt's rmt function, we'll see that this is exactly the method the function uses (from version 1.5-4 of the package):

> rmt
function (n = 1, mean = rep(0, d), S, df = Inf, sqrt = NULL) 
{
    sqrt.S <- if (is.null(sqrt)) 
        chol(S)
    else sqrt
    d <- if (is.matrix(sqrt.S)) 
        ncol(sqrt.S)
    else 1
    old.state <- get(".Random.seed", envir = .GlobalEnv)
    x <- if (df == Inf) 
        1
    else rchisq(n, df)/df
    assign(".Random.seed", old.state, envir = .GlobalEnv)
    z <- rmnorm(n, rep(0, d), sqrt = sqrt.S)
    mean <- outer(rep(1, n), as.vector(matrix(mean, d)))
    drop(mean + z/sqrt(x))
}

This function accomplishes exactly these steps, along with some others. The first few bits of code check whether you have supplied a square root matrix for $\Sigma$ (S in the function), and if not, construct one for you - this is our $L$, or more precisely it's $L'$, since chol returns the upper triangular factor. Next there's a weird-looking line of code that we'll come back to later: old.state <- get(".Random.seed", envir = .GlobalEnv). Then it finally gets around to our step 1 - drawing the $\chi^2_\nu$ variable, with the ability to handle $\nu = \infty$. Next we get another weird line of code: assign(".Random.seed", old.state, envir = .GlobalEnv). Then rmt does our steps 2 and 3 and returns the output - the rmnorm function uses the method I discussed above to generate multivariate normal random vectors. The only other wrinkle is that rmt can generate $n$ $T_\nu(\mu,\Sigma)$ distributed random vectors instead of just one.
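For reference, a typical call looks like this (a made-up example using the arguments from the source above):

library(mnormt)
set.seed(1)
## three draws from a bivariate T_5 with identity scale matrix
rmt(3, mean = c(0, 0), S = diag(2), df = 5)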

Unsurprisingly, the weird lines of code are where the bug lives. The first one, old.state <- get(".Random.seed", envir = .GlobalEnv), stores the state of R's internal random number generator. Then, after the $\chi^2_\nu$ variate is drawn, the second one, assign(".Random.seed", old.state, envir = .GlobalEnv), restores the seed to its state just before the $\chi^2_\nu$ draw. Then the multivariate normal draw is made. This is very strange - the $\chi^2_\nu$ and multivariate normal draws need to be independent in order for the algorithm to work, but forcing the random number generator into the same state when you draw each one forces them to be correlated in hard-to-predict ways. I mentioned the bug on Twitter last week, and someone suggested that it might be leftover code from a unit test. That sounds plausible to me.
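You can see the dependence directly with a small experiment that repeats rmt's seed-resetting trick in isolation (a sketch; with R's default generators both draws are computed from the same underlying uniform stream, so their sample correlation should come out far from zero):

## Rewind the RNG between a chi-squared draw and a normal draw, as rmt does,
## and check whether the two streams of draws look independent.
set.seed(101)
n <- 10000
d <- numeric(n)
z <- numeric(n)
for (i in 1:n) {
  old.state <- get(".Random.seed", envir = .GlobalEnv)
  d[i] <- rchisq(1, df = 5)                               ## mixing draw
  assign(".Random.seed", old.state, envir = .GlobalEnv)   ## rewind the RNG
  z[i] <- rnorm(1)                                        ## "independent" draw
}
cor(d, z)  ## far from zero - the two draws are not independent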

Even worse, when you use rmt and compare the results to the density, or to another $T$ generator that you know works, everything seems fine! Below is a short R script that does this using Kolmogorov-Smirnov tests. I generate standard univariate $T_\nu$ random variates with $\nu = 5$ using rmt and using a fixed version of rmt that simply comments out the two lines that mess with the RNG's seed. There is no apparent difference between the two, or between either and the true CDF of the univariate $T_\nu$ distribution - my p-values ranged from 0.3 to 0.7.

library(mnormt)
library(MCMCpack) ## only used for mcmc() and summary(mcmc())

set.seed(23242)
## Note: In a fresh R session, .Random.seed doesn't exist yet
## and rmt will throw an error while looking for it.

## same as rmt, except does not reset .Random.seed
rmtfixed <- function (n = 1, mean = rep(0, d), S, df = Inf, sqrt = NULL) {
    sqrt.S <- if (is.null(sqrt)) 
                chol(S)
              else sqrt
    d <- if (is.matrix(sqrt.S)) 
           ncol(sqrt.S)
         else 1
    ## old.state <- get(".Random.seed", envir = .GlobalEnv)
    x <- if (df == Inf) 
           1
         else rchisq(n, df)/df
    ## assign(".Random.seed", old.state, envir = .GlobalEnv)
    z <- rmnorm(n, rep(0, d), sqrt = sqrt.S)
    mean <- outer(rep(1, n), as.vector(matrix(mean, d)))
    drop(mean + z/sqrt(x))
}

df <- 5
niter <- 100000
mnormtout <- rmt(niter, 0, 1, df)
fixedout <- rmtfixed(niter, 0, 1, df)
ks.test(fixedout, mnormtout)     ## no apparent difference between the rmt's
ks.test(mnormtout, pt, df = df)  ## both rmt's look good compared to the cdf
ks.test(fixedout, pt, df = df)

This might make you think that rmt is working fine, but my next bit of code will hopefully convince you that it isn't. The problem occurs in the context of an independent Metropolis-Hastings algorithm. Suppose we want to generate random variates from the density $p(x)$, but doing so is expensive. Suppose further that we have another density $q(x)$ which is fairly close to $p(x)$ but much easier to draw from. Then we can construct a Markov chain that converges in distribution to $p(x)$ as follows. Suppose $x_t$ is our previous draw of $x$ in the chain - we can initialize $x$ at any value we want. Then generate a proposal value $x_{prop} \sim q(x)$, independently of the chain's history. Next, compute the acceptance ratio
$$ a = \frac{p(x_{prop})q(x_t)}{p(x_{t})q(x_{prop})} $$
and let $r = \min\{a, 1\}$. Then let $x_{t+1} = x_{prop}$ with probability $r$, and $x_{t+1} = x_t$ with probability $1 - r$. As $t\to \infty$ the probability distribution of $x_t$ converges to the target distribution, $p(x)$, under general conditions. Or at least it should if the random number generator is working correctly.
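A single iteration of this sampler can be written generically as follows. This is just a sketch: log_p, log_q, and rq are hypothetical placeholders for the target log density, the proposal log density, and the proposal sampler, and working on the log scale avoids overflow. The application-specific version appears in the full script at the end of the post.

## One independent Metropolis-Hastings step (sketch).
imh.step <- function(x.t, log_p, log_q, rq) {
  x.prop <- rq()                          ## proposal, independent of x.t
  log.a  <- log_p(x.prop) + log_q(x.t) -  ## log of the acceptance ratio a
            log_p(x.t) - log_q(x.prop)
  if (log(runif(1)) < log.a) x.prop else x.t  ## accept with probability min(a, 1)
}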

The code in the R script at the end of this post constructs an independent Metropolis-Hastings algorithm for the inverse gamma density, $p(x) \propto x^{-a-1}e^{-\frac{b}{x}}$.
I don't include the normalizing constant because it isn't necessary for Metropolis algorithms. I'll use a normal approximation to the density of $y = \log x$ as my proposal, i.e. as my $q(y)$. The density of $y$ is $p(y) \propto (e^y)^{-a}e^{-be^{-y}}$ - the change of variables contributes a Jacobian factor of $e^y$, which cancels one power of $e^{-y}$. The normal approximation to this density is $N(y^*, s^2)$ where $y^*$ is the mode and
$$
s^2 = \left[-\left.\frac{d^2 \log p(y)}{d y^2}\right|_{y = y^*}\right]^{-1}.
$$
The code in the script at the end of this post numerically finds $y^*$ and $s^2$. I compare these simulations to simulations from the same algorithm using a fixed version of rmt, and to direct simulation from an inverse gamma distribution. The results of my simulations are below - mnormt is the Metropolis algorithm using rmt to generate proposals, fixed is the same algorithm using the fixed version of rmt to generate proposals, and mc simulates directly from the inverse gamma distribution. You can see that the fixed version of the Metropolis algorithm and direct simulation from the inverse gamma distribution yield the same results, but using rmt biases the distribution significantly toward zero - the mean and many of the quantiles are attenuated.

> summary(mcmc(cbind(mnormt = exp(imh.mnormt.out$draws), fixed = exp(imh.fixed.out$draws), mc = mc)))

Iterations = 1:1e+05
Thinning interval = 1 
Number of chains = 1 
Sample size per chain = 1e+05 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

         Mean     SD  Naive SE Time-series SE
mnormt 0.6331 0.2937 0.0009289       0.001192
fixed  1.0025 2.3885 0.0075530       0.008671
mc     1.0004 1.9812 0.0062651       0.006265

2. Quantiles for each variable:

         2.5%    25%    50%    75% 97.5%
mnormt 0.1710 0.3646 0.6382 0.8681 1.169
fixed  0.1784 0.3710 0.5958 1.0345 4.128
mc     0.1804 0.3740 0.5977 1.0412 4.054

And here is the same problem viewed on the log scale - here the results are negatively biased.

> summary(mcmc(cbind(mnormt = imh.mnormt.out$draws, fixed = imh.fixed.out$draws, mc = log(mc))))

Iterations = 1:1e+05
Thinning interval = 1 
Number of chains = 1 
Sample size per chain = 1e+05 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

          Mean     SD Naive SE Time-series SE
mnormt -0.5907 0.5528 0.001748       0.002189
fixed  -0.4238 0.8045 0.002544       0.003202
mc     -0.4197 0.8008 0.002532       0.002532

2. Quantiles for each variable:

         2.5%     25%     50%      75%  97.5%
mnormt -1.766 -1.0088 -0.4490 -0.14147 0.1561
fixed  -1.724 -0.9916 -0.5179  0.03396 1.4178
mc     -1.712 -0.9834 -0.5146  0.04035 1.3996

This is a pretty big deal! Several R packages depend on mnormt, and I'm sure many people reach for the package whenever they need to deal with multivariate T distributions. It appears that rmt will work fine for some applications, since the Kolmogorov-Smirnov test didn't detect any issues, but it's hard to predict in advance where problems will arise. Multivariate T distributions are often used for proposals in Metropolis-Hastings algorithms, so even if that's the only place where there's a problem, it could impact a lot of people. Someone could very easily be making decisions based on incorrect results as a consequence of this bug. Luckily the fix is easy - just comment out the two lines of code in rmt that mess with the internal state of R's RNG. Hopefully the package gets fixed soon, but in the meantime at least now you know how to deal with the problem.

Code for all of the simulations above:

library(mnormt)
library(MCMCpack) ## only used for mcmc() and summary(mcmc())

set.seed(23242)
## Note: In a fresh R session, .Random.seed doesn't exist yet
## and rmt will throw an error while looking for it.

## same as rmt, except does not reset .Random.seed
rmtfixed <- function (n = 1, mean = rep(0, d), S, df = Inf, sqrt = NULL) {
    sqrt.S <- if (is.null(sqrt)) 
                chol(S)
              else sqrt
    d <- if (is.matrix(sqrt.S)) 
           ncol(sqrt.S)
         else 1
    ## old.state <- get(".Random.seed", envir = .GlobalEnv)
    x <- if (df == Inf) 
           1
         else rchisq(n, df)/df
    ## assign(".Random.seed", old.state, envir = .GlobalEnv)
    z <- rmnorm(n, rep(0, d), sqrt = sqrt.S)
    mean <- outer(rep(1, n), as.vector(matrix(mean, d)))
    drop(mean + z/sqrt(x))
}

df <- 5
niter <- 100000
mnormtout <- rmt(niter, 0, 1, df)
fixedout <- rmtfixed(niter, 0, 1, df)
ks.test(fixedout, mnormtout)     ## no apparent difference between the rmt's
ks.test(mnormtout, pt, df = df)  ## both rmt's look good compared to the cdf
ks.test(fixedout, pt, df = df)




### Metropolis-Hastings example

## log density, up to an additive constant, of x = log(y) where y ~ InverseGamma(a, b)
logf <- function(x, a, b){
  -x*a - b*exp(-x)
}

## independent metropolis using mnormt's rmt for proposal
## uses a T_df(mu, sig2) proposal
imh.mnormt <- function(mcmciter, mu, sig2, df, a, b){
  draws <- rep(0, mcmciter)
  accept <- rep(0, mcmciter)
  old <- mu
  lpold <- logf(old, a, b)
  ljold <- dmt(old, mu, sig2, df, TRUE)
  for(i in 1:mcmciter){
    prop <- rmt(1, mu, sig2, df)           
    lpprop <- logf(prop, a, b)
    ljprop <- dmt(prop, mu, sig2, df, TRUE)
    la <- lpprop - lpold - ljprop + ljold
    u <- runif(1)
    if(log(u) < la){
      old <- prop
      lpold <- lpprop
      ljold <- ljprop
      accept[i] <- 1
    }
    draws[i] <- old
  }
  list(draws = draws, accept = accept)
}

## independent metropolis using rmtfixed for proposal
## uses a T_df(mu, sig2) proposal
imh.fixed <- function(mcmciter, mu, sig2, df, a, b){
  draws <- rep(0, mcmciter)
  accept <- rep(0, mcmciter)
  old <- mu
  lpold <- logf(old, a, b)
  ljold <- dmt(old, mu, sig2, df, TRUE)
  for(i in 1:mcmciter){
    prop <- rmtfixed(1, mu, sig2, df)   ## only line different from imh.mnormt
    lpprop <- logf(prop, a, b)
    ljprop <- dmt(prop, mu, sig2, df, TRUE)
    la <- lpprop - lpold - ljprop + ljold
    u <- runif(1)
    if(log(u) < la){
      old <- prop
      lpold <- lpprop
      ljold <- ljprop
      accept[i] <- 1
    }
    draws[i] <- old
  }
  list(draws = draws, accept = accept)
}

## set inverse gamma parameters
a <- 2
b <- 1

## numerically find mode and hessian of logf
modeout <- optim(0, logf, method = "Brent", hessian = TRUE, lower = -100, upper = 100,
                 control = list(fnscale = -1), a = a, b = b)
mu <- modeout$par
sig2 <- -1/modeout$hessian


mcmciter <- 100000  ## large to make sure differences aren't due to numerical error
df <- 2
imh.mnormt.out <- imh.mnormt(mcmciter, mu, sig2, df, a, b)
imh.fixed.out <- imh.fixed(mcmciter, mu, sig2, df, a, b)
mc <- 1/rgamma(mcmciter, a, b)

## check M-H acceptance rates: plenty high in this case
mean(imh.mnormt.out$accept)
mean(imh.fixed.out$accept)

## mnormt gives the wrong mean & non-central quantiles - especially upper quantiles
summary(mcmc(cbind(mnormt = exp(imh.mnormt.out$draws), fixed = exp(imh.fixed.out$draws), mc = mc)))

## same problem viewed on the log scale - the scale the M-H algorithms worked on
summary(mcmc(cbind(mnormt = imh.mnormt.out$draws, fixed = imh.fixed.out$draws, mc = log(mc))))
