An Introduction to mvnfast

Introduction

The mvnfast R package provides computationally efficient tools related to the multivariate normal and Student's t distributions. The tools are generally faster than those provided by other packages, thanks to the use of C++ code through the Rcpp/RcppArmadillo packages and parallelization through the OpenMP API. The most important functions are:

- rmvn and rmvt, which simulate multivariate normal and Student's t random vectors;
- dmvn and dmvt, which evaluate the corresponding probability densities;
- maha, which evaluates squared Mahalanobis distances;
- ms, an implementation of the mean-shift mode seeking algorithm.

In the following sections we will benchmark each function against equivalent functions provided by other packages, while in the final section we provide an example application.

Simulating multivariate normal or Student's t random vectors

Simulating multivariate normal random variables is an essential step in many Monte Carlo algorithms (such as MCMC or particle filters), hence this operation has to be as fast as possible. Here we compare the rmvn function with the equivalent functions rmvnorm (from the mvtnorm package) and mvrnorm (from the MASS package). In particular, we simulate \(10^4\) twenty-dimensional random vectors:

library("microbenchmark")
library("mvtnorm")
library("mvnfast")
library("MASS")
# We might also need to turn off BLAS parallelism 
library("RhpcBLASctl")
blas_set_num_threads(1)
N <- 10000
d <- 20

# Creating mean and covariance matrix
mu <- 1:d
tmp <- matrix(rnorm(d^2), d, d)
mcov <- tcrossprod(tmp, tmp)

microbenchmark(rmvn(N, mu, mcov, ncores = 2),
               rmvn(N, mu, mcov),
               rmvnorm(N, mu, mcov),
               mvrnorm(N, mu, mcov))
## Unit: milliseconds
##                           expr       min        lq      mean    median
##  rmvn(N, mu, mcov, ncores = 2)  2.898475  3.028725  5.597719  3.410570
##              rmvn(N, mu, mcov)  4.810192  4.946226  6.342315  5.415242
##           rmvnorm(N, mu, mcov) 15.075697 16.102871 22.289793 16.708962
##           mvrnorm(N, mu, mcov) 14.846937 16.026077 24.173698 16.699599
##         uq      max neval cld
##   3.940261 35.18251   100  a 
##   5.820625 37.71020   100  a 
##  18.064287 51.48729   100   b
##  42.166270 55.16909   100   b

In this example rmvn cuts the computational time, relative to the alternatives, even when a single core is used. This gain is attributable to two factors: the use of C++ code and of efficient numerical algorithms to simulate the random variables. Parallelizing the computation over two cores gives a further appreciable speed-up. To be fair, it is necessary to point out that rmvnorm and mvrnorm perform many more safety checks on the user's input than rmvn. This is true also for the functions described in the next sections.
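The algorithm at the heart of samplers like these is simple: if Z is a matrix of i.i.d. standard normal variables and R is the upper-triangular Cholesky factor of the covariance matrix, then the rows of \(\mu + ZR\) follow the desired multivariate normal distribution. The following base-R sketch illustrates the idea (mvnfast's actual implementation is in C++; the variable names below are chosen just for this example, so as not to clobber the vignette's own N, mu and mcov):

```r
# Base-R sketch of Cholesky-based simulation (not mvnfast's actual C++ code)
set.seed(41)
n0 <- 1e5; d0 <- 3
mu0 <- 1:d0
S0 <- tcrossprod(matrix(rnorm(d0^2), d0, d0))  # positive definite covariance

R0 <- chol(S0)                        # upper triangular, t(R0) %*% R0 == S0
Z0 <- matrix(rnorm(n0 * d0), n0, d0)  # i.i.d. standard normals
Y0 <- sweep(Z0 %*% R0, 2, mu0, "+")   # rows of Y0 are N(mu0, S0) draws

# Sanity checks: sample moments should be close to the targets
max(abs(colMeans(Y0) - mu0))
max(abs(cov(Y0) - S0))
```

The expensive steps are the Cholesky factorisation (done once) and the matrix product, which is where compiled code and parallelism pay off.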

Notice that this function does not use one of the Random Number Generators (RNGs) provided by R, but one of the parallel cryptographic RNGs described in Salmon et al. (2011). It is important to point out that this RNG can safely be used in parallel, with no risk of collisions between parallel sequences of random numbers, as detailed in that reference.

We get similar performance gains when we simulate multivariate Student's t random variables:

# Here we have a conflict between namespaces
microbenchmark(mvnfast::rmvt(N, mu, mcov, df = 3, ncores = 2),
               mvnfast::rmvt(N, mu, mcov, df = 3),
               mvtnorm::rmvt(N, delta = mu, sigma = mcov, df = 3))
## Unit: milliseconds
##                                                expr       min        lq
##      mvnfast::rmvt(N, mu, mcov, df = 3, ncores = 2)  5.889675  6.278573
##                  mvnfast::rmvt(N, mu, mcov, df = 3)  8.009106  8.170333
##  mvtnorm::rmvt(N, delta = mu, sigma = mcov, df = 3) 19.096286 20.456117
##      mean    median        uq       max neval cld
##  12.41912  6.981301  7.482693  89.27455   100  a 
##  12.42678  9.081007  9.623562  85.65057   100  a 
##  35.55394 21.798510 24.253274 101.30815   100   b
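Multivariate Student's t vectors have a handy constructive definition: divide a centred multivariate normal vector by the square root of an independent chi-squared variable over its degrees of freedom, then add the location vector. A base-R sketch of this construction (illustrative only, not mvnfast's internals; fresh variable names are used on purpose):

```r
# Base-R sketch: multivariate t via normal / sqrt(chi-squared) mixing
set.seed(42)
n_t <- 1e5; df_t <- 6
mu_t <- c(1, 2)
S_t <- matrix(c(2, 0.5, 0.5, 1), 2, 2)

Z_t <- matrix(rnorm(n_t * 2), n_t, 2) %*% chol(S_t)  # N(0, S_t) draws
W_t <- rchisq(n_t, df_t)                             # independent chi-squared
Y_t <- sweep(Z_t / sqrt(W_t / df_t), 2, mu_t, "+")   # rows ~ t_df(mu_t, S_t)

# For df > 2, Cov(Y_t) = df / (df - 2) * S_t, so this should be close to S_t:
cov(Y_t) * (df_t - 2) / df_t
```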

When d and N are large, and rmvn or rmvt are called several times with the same arguments, it makes sense to create the matrix that will store the simulated random variables upfront. This can be done as follows:

A <- matrix(nrow = N, ncol = d)
class(A) <- "numeric" # This is important. We need the elements of A to be of class "numeric".  

rmvn(N, mu, mcov, A = A) 

Notice that here rmvn returns NULL, not the simulated random vectors! These can be found in the matrix provided by the user:

A[1:2, 1:5]             
##          [,1]     [,2]     [,3]     [,4]      [,5]
## [1,] 2.621417 1.250653 6.276235 5.654118 13.632100
## [2,] 7.882625 1.431326 1.232163 2.820529  2.803018

Pre-creating the matrix of random variables saves some more time:

microbenchmark(rmvn(N, mu, mcov, ncores = 2, A = A),
               rmvn(N, mu, mcov, ncores = 2), 
               times = 200)
## Unit: milliseconds
##                                  expr      min       lq     mean   median
##  rmvn(N, mu, mcov, ncores = 2, A = A) 2.605242 2.641033 2.765544 2.675178
##         rmvn(N, mu, mcov, ncores = 2) 2.854067 2.916154 4.386438 3.067153
##        uq       max neval cld
##  2.819408  3.667438   200  a 
##  4.004929 71.131911   200   b

When comparing these timings, look at the mean rather than the median: the cost of re-allocating memory shows up only occasionally, so it inflates the mean of the version without a pre-allocated matrix much more than its median.

Evaluating the multivariate normal and Student's t densities

Here we compare the dmvn function, which evaluates the multivariate normal density, with the equivalent function dmvnorm (from the mvtnorm package). In particular, we evaluate the log-density of \(10^4\) twenty-dimensional random vectors:

# Generating random vectors 
N <- 10000
d <- 20
mu <- 1:d
tmp <- matrix(rnorm(d^2), d, d)
mcov <- tcrossprod(tmp, tmp)
X <- rmvn(N, mu, mcov)

microbenchmark(dmvn(X, mu, mcov, ncores = 2, log = T),
               dmvn(X, mu, mcov, log = T),
               dmvnorm(X, mu, mcov, log = T), times = 500)
## Unit: milliseconds
##                                    expr      min       lq     mean
##  dmvn(X, mu, mcov, ncores = 2, log = T) 1.580086 1.659250 2.206805
##              dmvn(X, mu, mcov, log = T) 2.681140 2.875031 3.180422
##           dmvnorm(X, mu, mcov, log = T) 2.326221 2.535553 5.582572
##    median       uq       max neval cld
##  1.802728 1.986381 75.774183   500  a 
##  3.034956 3.340515  6.671343   500  a 
##  3.414073 3.836296 81.165081   500   b

Again, we get some speed-up using C++ code and some more from the parallelization. We get similar results if we use a multivariate Student's t density:

# We have a namespace conflict
microbenchmark(mvnfast::dmvt(X, mu, mcov, df = 4, ncores = 2, log = T),
               mvnfast::dmvt(X, mu, mcov, df = 4, log = T),
               mvtnorm::dmvt(X, delta = mu, sigma = mcov, df = 4, log = T), times = 500)
## Unit: milliseconds
##                                                         expr      min
##      mvnfast::dmvt(X, mu, mcov, df = 4, ncores = 2, log = T) 1.769784
##                  mvnfast::dmvt(X, mu, mcov, df = 4, log = T) 2.899262
##  mvtnorm::dmvt(X, delta = mu, sigma = mcov, df = 4, log = T) 2.565741
##        lq     mean   median       uq       max neval cld
##  1.870723 2.053673 1.990358 2.090293  5.922307   500 a  
##  3.079572 3.309611 3.216120 3.351971  5.936427   500  b 
##  2.763745 6.218099 4.008344 4.294918 83.288077   500   c
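For reference, the log-density evaluated in these benchmarks has the closed form \(-\frac{1}{2}\{d\log(2\pi) + \log|\Sigma| + (x-\mu)^T\Sigma^{-1}(x-\mu)\}\), which is convenient to compute through a Cholesky factor. A small base-R sketch (dmvn_sketch is a made-up name for illustration, not the package's C++ code):

```r
# Base-R sketch of the multivariate normal log-density, via a Cholesky factor
dmvn_sketch <- function(X, mu, sigma) {
  R <- chol(sigma)                      # t(R) %*% R == sigma
  logdet <- 2 * sum(log(diag(R)))       # log |sigma|
  quad <- mahalanobis(X, mu, sigma)     # squared Mahalanobis distances
  -0.5 * (ncol(X) * log(2 * pi) + logdet + quad)
}

set.seed(43)
mu_d <- 1:3
S_d <- tcrossprod(matrix(rnorm(9), 3, 3))
X_d <- matrix(rnorm(30), 10, 3)
dmvn_sketch(X_d, mu_d, S_d)  # agrees with dmvnorm(X_d, mu_d, S_d, log = TRUE)
```

Working with \(2\sum_i \log R_{ii}\) rather than det(sigma) avoids overflow and underflow of the determinant in high dimensions.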

Evaluating the Mahalanobis distance

Finally, we compare the maha function, which evaluates the squared Mahalanobis distance, with the equivalent function mahalanobis (from the stats package). Also in this case we use \(10^4\) twenty-dimensional random vectors:

# Generating random vectors 
N <- 10000
d <- 20
mu <- 1:d
tmp <- matrix(rnorm(d^2), d, d)
mcov <- tcrossprod(tmp, tmp)
X <- rmvn(N, mu, mcov)

microbenchmark(maha(X, mu, mcov, ncores = 2),
               maha(X, mu, mcov),
               mahalanobis(X, mu, mcov))
## Unit: milliseconds
##                           expr      min       lq     mean   median
##  maha(X, mu, mcov, ncores = 2) 1.460333 1.511454 1.658136 1.619147
##              maha(X, mu, mcov) 2.623915 2.746698 2.907350 2.818961
##       mahalanobis(X, mu, mcov) 2.671957 2.970613 5.845064 3.872397
##        uq       max neval cld
##  1.713244  2.369506   100  a 
##  2.952926  3.855351   100  a 
##  4.263406 80.308718   100   b

The acceleration is similar to that obtained in the previous sections.
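Both functions compute the same quantity, \((x-\mu)^T\Sigma^{-1}(x-\mu)\) for each row x of the input matrix. Writing it out in base R makes clear what is being benchmarked (an illustrative sketch, not either package's internals):

```r
# The squared Mahalanobis distance, written out explicitly in base R
set.seed(44)
mu_m <- 1:3
S_m <- tcrossprod(matrix(rnorm(9), 3, 3))
X_m <- matrix(rnorm(15), 5, 3)

Xc_m <- sweep(X_m, 2, mu_m)                  # centre the rows
d2 <- rowSums((Xc_m %*% solve(S_m)) * Xc_m)  # (x - mu)' S^-1 (x - mu), row-wise

max(abs(d2 - mahalanobis(X_m, mu_m, S_m)))   # matches stats::mahalanobis
```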

Example: mean-shift mode seeking algorithm

As an example application of the dmvn function, we implemented the mean-shift mode seeking algorithm. This procedure can be used to find the modes (local maxima) of a kernel density estimate, and hence to set up clustering algorithms. Here we simulate \(10^4\) two-dimensional random vectors from a mixture of two normal distributions:

set.seed(5135)
N <- 10000
d <- 2
mu1 <- c(0, 0); mu2 <- c(2, 3)
Cov1 <- matrix(c(1, 0, 0, 2), 2, 2)
Cov2 <- matrix(c(1, -0.9, -0.9, 1), 2, 2)

bin <- rbinom(N, 1, 0.5)

X <- bin * rmvn(N, mu1, Cov1) + (!bin) * rmvn(N, mu2, Cov2)

Finally, we plot the resulting probability density and, starting from 10 initial points, we use mean-shift to converge to the nearest mode:

# Plotting
np <- 100
xvals <- seq(min(X[ , 1]), max(X[ , 1]), length.out = np)
yvals <- seq(min(X[ , 2]), max(X[ , 2]), length.out = np)
theGrid <- expand.grid(xvals, yvals) 
theGrid <- as.matrix(theGrid)
dens <- dmixn(theGrid, 
              mu = rbind(mu1, mu2), 
              sigma = list(Cov1, Cov2), 
              w = rep(1, 2)/2)
plot(X[ , 1], X[ , 2], pch = '.', lwd = 0.01, col = 3)
contour(x = xvals, y = yvals, z = matrix(dens, np, np),
        levels = c(0.002, 0.01, 0.02, 0.04, 0.08, 0.15 ), add = TRUE, lwd = 2)

# Mean-shift
library(plyr)
inits <- matrix(c(-2, 2, 0, 3, 4, 3, 2, 5, 2, -3, 2, 2, 0, 2, 3, 0, 0, -4, -2, 6), 
                10, 2, byrow = TRUE)
traj <- alply(inits,
              1,
              function(input)
                  ms(X = X, 
                     init = input, 
                     H = 0.05 * cov(X), 
                     ncores = 2, 
                     store = TRUE)$traj
              )

invisible( lapply(traj, 
                  function(input){ 
                    lines(input[ , 1], input[ , 2], col = 2, lwd = 1.5)
                    points(tail(input[ , 1]), tail(input[ , 2]))
           }))

As we can see from the plot, each initial point leads to one of two points that are very close to the true modes. Notice that the bandwidth of the kernel density estimator was chosen by trial and error, and less arbitrary choices are certainly possible in real applications.
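Under the hood, each mean-shift iteration moves the current point to a kernel-weighted average of the data, with Gaussian weights determined by the bandwidth matrix H. The following base-R sketch of the iteration (mean_shift_sketch is a hypothetical helper written for this vignette, not mvnfast's ms) shows the idea:

```r
# Base-R sketch of the mean-shift iteration (not mvnfast::ms itself)
mean_shift_sketch <- function(X, init, H, tol = 1e-6, maxit = 200) {
  x <- init
  for (i in seq_len(maxit)) {
    w <- exp(-0.5 * mahalanobis(X, x, H))  # Gaussian kernel weights
    x_new <- colSums(w * X) / sum(w)       # kernel-weighted average of the data
    if (sqrt(sum((x_new - x)^2)) < tol) break
    x <- x_new
  }
  x_new
}

# On a single Gaussian cloud the iteration should settle near the origin
set.seed(45)
Y_ms <- matrix(rnorm(2000), 1000, 2)
mean_shift_sketch(Y_ms, init = c(2, 2), H = 0.5 * diag(2))
```

The weight computation is exactly a (unnormalised) multivariate normal density evaluation at every data point, which is why dmvn is the natural building block for ms.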

References

Salmon, J. K., Moraes, M. A., Dror, R. O. and Shaw, D. E. (2011). Parallel random numbers: as easy as 1, 2, 3. In Proceedings of the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC '11). ACM.