# Sampling from a truncated Dirichlet distribution

Sampling from truncated distributions is hard, especially when the distribution in question is multivariate. I came across an interesting paper showing how to use MCMC to sample from truncated normal, gamma, and beta distributions: Paul Damien & Stephen G. Walker (2001), "Sampling Truncated Normal, Beta, and Gamma Densities," Journal of Computational and Graphical Statistics. The basic idea is pretty simple, and I've applied it to the Dirichlet distribution.

Update: Code added below to simulate from the truncated Dirichlet distribution, along with an example.

Take the beta distribution first, which is the two-dimensional special case of the Dirichlet. Here's what you do. The density you're after is

$$p(x) \propto x^{\alpha-1}(1-x)^{\beta-1}\mathbf{1}(a\leq x\leq b)$$

for $0\leq a < b \leq 1$ and $\alpha,\beta >0$. One approach, called rejection sampling, is to simulate a bunch of random draws from the unconstrained beta distribution and only keep those that fall between $a$ and $b$. This is often extremely inefficient, however, when the constraints force $x$ to be in an area of the parameter space where the density puts very little mass.
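As a quick illustration of how bad this can get, here is rejection sampling in R. The parameter values are made up for the example; the bounds sit in the upper tail of the beta density, so almost every draw is wasted:

```r
## rejection sampling from a beta(2, 5) truncated to [0.7, 0.9]:
## draw from the unconstrained beta and keep what lands in the interval
set.seed(1)
draws <- rbeta(1e5, 2, 5)
kept <- draws[draws >= 0.7 & draws <= 0.9]
length(kept) / length(draws)  ## acceptance rate, roughly 1% here
```

About 99% of the computation is thrown away, and the situation only gets worse as the bounds move further into the tail.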

Another approach is called CDF inversion. The CDF of this truncated beta distribution is

$$G(x) = \frac{F(x;\alpha,\beta) - F(a;\alpha,\beta)}{F(b;\alpha,\beta) - F(a;\alpha,\beta)}$$

where $F(x;\alpha,\beta)$ is the CDF of the unconstrained beta distribution, which is nontrivial to compute, though many statistical packages have routines for it and its inverse. To use the CDF inversion technique we need only draw a single uniform random variable $u$ on $(0,1)$, then set

$$x = G^{-1}(u) = F^{-1}\left(F(a;\alpha,\beta) + u\left[F(b;\alpha,\beta) - F(a;\alpha,\beta)\right]; \alpha,\beta\right).$$

This technique is only as good as the methods we have for computing $G^{-1}$, and thus $F$ and $F^{-1}$, and for the beta distribution those methods aren't reliable enough to use CDF inversion in all possible cases; you will run into numerical problems.
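In R this looks like the following sketch, using the built-in `pbeta`/`qbeta` with the same made-up parameter values as above:

```r
## exact draws from a beta(2, 5) truncated to [0.7, 0.9] by CDF inversion
set.seed(1)
a <- 0.7; b <- 0.9
Fa <- pbeta(a, 2, 5)
Fb <- pbeta(b, 2, 5)
u <- runif(1000)
x <- qbeta(Fa + u * (Fb - Fa), 2, 5)
range(x)  ## every draw lies inside [0.7, 0.9]
```

Every uniform draw produces a usable sample, in contrast to rejection sampling, but the approach inherits whatever numerical trouble `qbeta` has in the tails.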

Enter Damien and Walker. They use a relatively simple technique that only requires you to add a latent variable and give up exact sampling for an MCMC algorithm that converges in distribution to the truncated beta we are after. The cost of moving from exact sampling to MCMC is small in my case, since I want to draw from these truncated Dirichlets in the context of a larger MCMC algorithm anyway. But back to the beta distribution. We add a latent variable $y$ and define the joint density of $x$ and $y$:

$$p(x,y) \propto x^{\alpha-1}\mathbf{1}\left(0\leq y \leq (1-x)^{\beta-1}, a \leq x \leq b\right)$$

Integrating out $y$ will yield the original density for $x$, $p(x)$. Now we construct the following Gibbs sampler consisting of 2 steps:

1. Draw $y$ from a uniform distribution on the interval $\left(0,(1-x)^{\beta-1}\right)$.

2. Draw $x$ using the CDF inversion technique from

$$x^{\alpha-1}\mathbf{1}\left(a\leq x \leq \min\left(1-y^{\frac{1}{\beta-1}}, b\right)\right)$$

if $\beta > 1$ and

$$x^{\alpha-1}\mathbf{1}\left(\max\left(a, 1 - y^{\frac{1}{\beta-1}}\right)\leq x \leq b\right)$$

if $\beta < 1$.

The CDF inversion technique works well for densities proportional to $x^{\alpha-1}$ because we can write down the CDF in closed form. What if $\beta=1$? Then you could use the CDF inversion directly without introducing the latent variable.
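To spell out that closed form: for a density proportional to $x^{\alpha-1}$ on an interval $[l,u]$ with $\alpha > 0$, the truncated CDF and its inverse are

$$F(x) = \frac{x^\alpha - l^\alpha}{u^\alpha - l^\alpha}, \qquad F^{-1}(v) = \left[l^\alpha + v\,(u^\alpha - l^\alpha)\right]^{1/\alpha},$$

so a single uniform draw $v$ on $(0,1)$ gives an exact draw from the conditional in step 2, with no special functions required.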

If you run this algorithm for many iterations, the marginal distribution of the last draw will converge to the truncated beta distribution we want to draw from. More importantly, the sample mean of any function of $x$ will converge to the expectation of that function of $x$, i.e.

$$\frac{1}{N}\sum_{i=1}^N h(x_i) \to \int_0^1 h(x)\,p(x)\,dx$$

by the law of large numbers.

This is great and all, but what I really need is to sample from a truncated Dirichlet distribution. A $k$-dimensional Dirichlet distribution has the density

$$p(x_1, x_2, \ldots, x_{k-1}) \propto x_1^{\alpha_1-1} x_2^{\alpha_2-1} \cdots x_{k-1}^{\alpha_{k-1}-1}(1-x_1-\cdots-x_{k-1})^{\alpha_k-1}$$

for $x_1, x_2, \ldots, x_{k-1} \geq 0$ with $x_1 + x_2 + \cdots + x_{k-1} \leq 1$, and $\alpha_1, \alpha_2, \ldots, \alpha_k > 0$. I'll use the four-dimensional case just to keep things relatively simple. The truncated version of this distribution I want to sample from is

$$p(x_1,x_2,x_3) \propto x_1^{\alpha_1-1}x_2^{\alpha_2-1}x_3^{\alpha_3-1}(1-x_1-x_2-x_3)^{\alpha_4-1}\mathbf{1}(x\in B)$$

where the set $B$ is a subset of $[0,1]^3$. The easiest way to apply Damien & Walker’s method is to add a latent variable $y$ with the joint distribution

$$p(x,y) \propto x_1^{\alpha_1-1}x_2^{\alpha_2-1}x_3^{\alpha_3-1}\mathbf{1}\left(0 \leq y \leq (1-x_1-x_2-x_3)^{\alpha_4-1},\ x \in B \right).$$

Now the Gibbs sampler is straightforward:

1. Draw $y$ from a uniform distribution on $\left(0,(1-x_1-x_2-x_3)^{\alpha_4-1}\right)$.

2. For $i = 1, 2, 3$, use the CDF inversion technique to draw $x_i$ from the distribution

$$p(x_i|y, x_{-i}) \propto x_i^{\alpha_i - 1}\mathbf{1}\left(a_i(x_{-i}) \leq x_i \leq \min\left( 1 - y^{\frac{1}{\alpha_4-1}} - \sum_{j\neq i} x_j, b_i(x_{-i})\right)\right)$$

if $\alpha_4>1$ and

$$p(x_i|y, x_{-i}) \propto x_i^{\alpha_i - 1}\mathbf{1}\left(\max\left( 1 - y^{\frac{1}{\alpha_4-1}} - \sum_{j\neq i} x_j, a_i(x_{-i})\right) \leq x_i \leq b_i(x_{-i})\right)$$

if $\alpha_4 < 1$; again, use CDF inversion directly if $\alpha_4 = 1$. Here I'm assuming that the set $B$ constrains each $x_i$ to an interval $(a_i(x_{-i}), b_i(x_{-i}))$ conditional on the other $x_j$'s. If that's not the case, it may take a little more work to make sure the CDF inversion technique works correctly. In the simplest case, where $B$ is a hypercube, $a_i(x_{-i})=a_i$ and $b_i(x_{-i})=b_i$.

Then once again the marginal distribution of $(x_1,x_2,x_3)$ will converge to the target truncated Dirichlet and the sample mean of functions of $(x_1,x_2,x_3)$ will converge to their expectations.

The only annoying thing about this algorithm is the number of Gibbs steps: each $x_i$ is drawn in a separate step conditional on all of the others. If we could draw some of the $x_i$'s jointly, the algorithm would converge faster. If the goal is just to sample from a particular truncated Dirichlet, this probably won't be much of a problem, since we can run the algorithm long enough for it not to matter. For large enough $k$, or in the context of a larger MCMC sampler, it might become very slow. Though very slow is perhaps still better than the alternatives.

Update: I've added some R code below for simulating from an arbitrary truncated Dirichlet distribution using the Gibbs samplers constructed in this post. Use `rtruncdirichletiter` within the context of a larger Gibbs sampler and `rtruncdirichlet` to run a Gibbs sampler for the truncated Dirichlet itself.

```r
## produces a random draw from the truncated "exponential" distribution,
## i.e. a density proportional to x^(a-1) on [lb, ub]
## a: exponent parameter, a > 0
## lb: lower bound, < ub
## ub: upper bound, > lb
## uses the inverse CDF method; returns a single real number
rtruncexp <- function(a, lb, ub){
  u <- runif(1, 0, 1)
  out <- (u*(ub^a - lb^a) + lb^a)^(1/a)
  return(out)
}
```
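As a sanity check on `rtruncexp` (redefined here so the snippet runs on its own; the parameter values are arbitrary), the sample mean of many draws should match the analytic mean of a density proportional to $x^{a-1}$ on $[lb, ub]$:

```r
## rtruncexp repeated from above so this check is self-contained
rtruncexp <- function(a, lb, ub){
  u <- runif(1, 0, 1)
  (u * (ub^a - lb^a) + lb^a)^(1/a)
}
set.seed(1)
a <- 3; lb <- 0.2; ub <- 0.8
draws <- replicate(1e5, rtruncexp(a, lb, ub))
## analytic mean: E[x] = a (ub^(a+1) - lb^(a+1)) / ((a+1) (ub^a - lb^a))
m <- a * (ub^(a + 1) - lb^(a + 1)) / ((a + 1) * (ub^a - lb^a))
c(mean(draws), m)  ## the two should agree to about two decimal places
```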

```r
## produces one iteration of an MCMC sampler for a truncated Dirichlet distribution
## xold: k-dimensional vector, last draw of x_1,...,x_k,
##       where x_{k+1} = 1 - x_1 - ... - x_k
## as: (k+1)-dimensional vector of Dirichlet parameters
## ubfun: function of the form ubfun(x, i) for i=1,2,...,k
##        computes the upper bound of x_i given x_{-i}
##        returns a real number between 0 (exclusive) and 1 (inclusive)
## lbfun: function of the form lbfun(x, i) for i=1,2,...,k
##        computes the lower bound of x_i given x_{-i}
##        returns a real number between 0 (inclusive) and 1 (exclusive)
## returns a vector whose first element is the latent variable, y, used in
##   the algorithm and whose last k+1 elements are x_1,...,x_k,x_{k+1}
rtruncdirichletiter <- function(xold, as, lbfun, ubfun){
  k <- length(xold)
  alast <- as[k + 1]
  xnew <- xold
  if(alast > 1){
    ynew <- runif(1, 0, (1 - sum(xold))^(alast - 1))
    for(i in 1:k){
      nb <- 1 - sum(xnew[-i]) - ynew^(1/(alast - 1))
      xnew[i] <- rtruncexp(as[i], lbfun(xnew, i), min(ubfun(xnew, i), nb))
    }
  }
  if(alast < 1){
    ynew <- runif(1, 0, (1 - sum(xold))^(alast - 1))
    for(i in 1:k){
      nb <- 1 - sum(xnew[-i]) - ynew^(1/(alast - 1))
      xnew[i] <- rtruncexp(as[i], max(nb, lbfun(xnew, i)), ubfun(xnew, i))
    }
  }
  if(alast == 1){
    ynew <- 1  ## the latent variable is unnecessary in this case
    for(i in 1:k){
      xnew[i] <- rtruncexp(as[i], lbfun(xnew, i), ubfun(xnew, i))
    }
  }
  return(c(ynew, xnew))
}
```

```r
## produces N iterations of an MCMC sampler for a truncated Dirichlet distribution
## N: number of iterations
## xstart: k-dimensional vector, starting values of x_1,...,x_k,
##         where x_{k+1} = 1 - x_1 - ... - x_k
## as: (k+1)-dimensional vector of Dirichlet parameters
## lbfun, ubfun: bound functions as described for rtruncdirichletiter
## returns a matrix whose first column is the latent variable, y, used in
##   the algorithm and whose last k+1 columns are x_1,...,x_k,x_{k+1}
rtruncdirichlet <- function(N, as, lbfun, ubfun, xstart){
  k <- length(xstart)
  out <- matrix(0, ncol = k + 2, nrow = N)
  colnames(out) <- c("y", paste0("x", 1:(k + 1)))
  xold <- xstart
  for(i in 1:N){
    olds <- rtruncdirichletiter(xold, as, lbfun, ubfun)
    xold <- olds[-1]
    out[i, ] <- c(olds, 1 - sum(xold))
  }
  return(out)
}
```


This should work for any truncated Dirichlet distribution where you can express the set $x_i$ is constrained to as an interval depending on $x_{-i}$, for $i=1,2,\ldots,k$. You supply the functions `lbfun(x, i)` and `ubfun(x, i)`, which give the lower and upper bounds of the $i$th element of $x$ as a function of the full vector $x$. For example, if the constraint is $x_1 < x_2 < \cdots < x_{k+1}$ you can use the functions below. Note that the upper bound for $x_k$ has to be written as $(1 - x_1 - \cdots - x_{k-1})/2$ so that it depends only on $x_{-k}$, since $x_{k+1}$ itself changes with $x_k$:

```r
ubfun <- function(x, i){
  k <- length(x)
  ## for i < k the upper bound of x_i is x_{i+1}; for i = k the constraint
  ## x_k < x_{k+1} = 1 - x_1 - ... - x_k means x_k < (1 - x_1 - ... - x_{k-1})/2
  if(i < k) x[i + 1] else (1 - sum(x[-k]))/2
}

lbfun <- function(x, i){
  ## the lower bound of x_1 is 0; the lower bound of x_i is x_{i-1}
  xs <- c(0, x)
  return(xs[i])
}
```


For example, suppose we have a $\mathcal{D}(10, 15, 28, 10)$ distribution truncated so that $x_1<x_2<x_3<x_4$ (recall that $x_4 = 1 - x_1 - x_2 - x_3$). Then we can start the chain at $x_1=.1$, $x_2=.2$, and $x_3=.3$, a point where the constraint holds since $x_4=.4$, and obtain a Markov chain with 10,000 iterations as follows:

```r
N <- 10000
as <- c(10, 15, 28, 10)
xstart <- c(.1, .2, .3)
out <- rtruncdirichlet(N, as, lbfun, ubfun, xstart)
```


And a quick look at the trace plots to check for convergence:

```r
library(MCMCpack)  ## for the traceplot and mcmc functions
par(mfrow = c(3, 2))
traceplot(mcmc(out))
```


which yields the plots below. At a glance, the chain appears to converge quickly and mix fairly well, at least for this particular distribution.
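Finally, a compact end-to-end check, with the definitions repeated so the snippet runs on its own. It hard-codes the $\alpha_4 > 1$ branch of the sampler, which is the relevant one here since $\alpha_4 = 10$, and the iteration count of 2,000 is arbitrary. Every draw should respect the ordering constraint and stay on the simplex:

```r
## self-contained rerun of the example with constraint checks
rtruncexp <- function(a, lb, ub){
  u <- runif(1)
  (u * (ub^a - lb^a) + lb^a)^(1/a)
}
ubfun <- function(x, i){
  k <- length(x)
  ## x_k < x_{k+1} = 1 - x_1 - ... - x_k  <=>  x_k < (1 - x_1 - ... - x_{k-1})/2
  if(i < k) x[i + 1] else (1 - sum(x[-k]))/2
}
lbfun <- function(x, i) c(0, x)[i]
onestep <- function(xold, as, lbfun, ubfun){  ## the alast > 1 branch only
  k <- length(xold); alast <- as[k + 1]; xnew <- xold
  y <- runif(1, 0, (1 - sum(xold))^(alast - 1))
  for(i in 1:k){
    nb <- 1 - sum(xnew[-i]) - y^(1/(alast - 1))
    xnew[i] <- rtruncexp(as[i], lbfun(xnew, i), min(ubfun(xnew, i), nb))
  }
  xnew
}
set.seed(1)
as <- c(10, 15, 28, 10)
x <- c(.1, .2, .3)                       ## valid start: x4 = .4
xs <- matrix(0, nrow = 2000, ncol = 4)
for(n in 1:2000){
  x <- onestep(x, as, lbfun, ubfun)
  xs[n, ] <- c(x, 1 - sum(x))
}
## every draw satisfies x1 < x2 < x3 < x4
all(xs[, 1] < xs[, 2] & xs[, 2] < xs[, 3] & xs[, 3] < xs[, 4])  ## TRUE
```

If the chain ever left the constraint set, the conditional intervals in later steps could become empty, so this check catches bugs in the bound functions as well as in the sampler itself.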