Fast Pseudo Random Number Generators for R

Ralf Stubner

2018-06-08

The dqrng package provides fast random number generators (RNG) with good statistical properties for usage with R. It combines these RNGs with fast distribution functions to sample from uniform, normal or exponential distributions. Both the RNGs and the distribution functions are distributed as C++ header-only library.

Supported Random Number Generators

Support for the following 64bit RNGs are currently included:

Of these RNGs Xoroshiro128+ is fastest and therefore used in the examples.

Usage from R

Using these RNGs from R is delibratly similar to using R’s build-in RNGs:

Let’s look at the classical example of calculating \(\pi\) via simulation. The basic idea is to generate a large number of random points within the unit square. An approximation for \(\pi\) can then be calculated from the ratio of points within the unit circle to the total number of points. A vectorized implementation in R where we can switch the RNG might look like this:

N <- 1e7
piR <- function(n, rng = runif) {
    x <- rng(n)
    y <- rng(n)
    4 * sum(sqrt(x^2 + y^2) < 1.0) / n
}
set.seed(42)
system.time(cat("pi ~= ", piR(N), "\n"))
#> pi ~=  3.140899
#>    user  system elapsed 
#>   0.773   0.084   0.858

Using dqrng is about three times faster:

library(dqrng)
dqRNGkind("Xoroshiro128+")
dqset.seed(42)
system.time(cat("pi ~= ", piR(N, rng = dqrunif), "\n"))
#> pi ~=  3.14227
#>    user  system elapsed 
#>   1.668   0.085   1.753

Since the calculations add a constant off-set, the speed-up for the RNGs alone has to be even greater:

system.time(runif(N))
#>    user  system elapsed 
#>   0.287   0.012   0.299
system.time(dqrunif(N))
#>    user  system elapsed 
#>   0.893   0.025   0.917

Similar for the exponential distribution:

system.time(rexp(N))
#>    user  system elapsed 
#>   0.503   0.056   0.559
system.time(dqrexp(N))
#>    user  system elapsed 
#>   1.114   0.025   1.139

And for the normal distribution:

system.time(rnorm(N))
#>    user  system elapsed 
#>   0.660   0.013   0.672
system.time(dqrnorm(N))
#>    user  system elapsed 
#>   1.113   0.028   1.141

Usage from C++

The RNGs and distributions functions can also be used from C++ at various levels of abstraction. Technically there are three ways to make use of dqrng at the C++ level:

Here only the first approach is shown.

Using the compiled library functions

The functions available in R directly call corresponding C++ functions. These functions are also available at the C++ level if you include dqrng.h. Revisting the example of approximating \(\pi\) we can use:

// [[Rcpp::depends(dqrng)]]
#include <Rcpp.h>
#include <dqrng.h>

using Rcpp::NumericVector;
using Rcpp::sqrt;
using Rcpp::sum;
using dqrng::dqrunif;

// [[Rcpp::export]]
double piCpp(const int n) {
  dqrng::dqRNGkind("Xoroshiro128+");
  dqrng::dqset_seed(42);
  NumericVector x = dqrunif(n);
  NumericVector y = dqrunif(n);
  NumericVector d = sqrt(x*x + y*y);
  return 4.0 * sum(d < 1.0) / n;
}
/*** R
system.time(cat("pi ~= ", piCpp(1e7), "\n"))
*/

Note that in C++ you have to use dqrng::dqset_seed(), whereas the anlogue function in the R interface is called dqrng::dqset.seed().

Using the header only library

The RNG wrapper and distributions functions can be used from C++ by including dqrng_generator.h and dqrng_distribution.h. For example, you can use the distribution functions from dqrng together with some foreign 64bit RNG. Here a minimal SplitMix generator is used together with dqrng::normal_distribution:

#include <cstdint>
#include <Rcpp.h>
// [[Rcpp::depends(dqrng)]]
#include <dqrng_distribution.h>

class SplitMix : public dqrng::random_64bit_generator {
public:
  SplitMix (result_type seed) : state(seed) {};
  result_type operator() () {
    result_type z = (state += 0x9e3779b97f4a7c15ULL);
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
  }
  void seed(result_type seed) {state = seed;}

private:
  result_type state;
};

// [[Rcpp::export]]
Rcpp::NumericVector splitmix_rnorm(const int n, const double mean = 0.0, const double sd = 1.0) {
  auto rng = dqrng::generator<SplitMix>(42);
  dqrng::normal_distribution dist(mean, sd);
  return dqrng::generate<dqrng::normal_distribution, Rcpp::NumericVector>(n, rng, dist);
}
/*** R
splitmix_rnorm(10)
system.time(splitmix_rnorm(1e7))
*/

Since SplitMix is a very fast RNG, the speed of this function is comparable to dqrnorm. Alternatively, you could combine the included RNGs together with dqrng’s tooling and some other distribution function. For example, this function generates random numbers according to the normal distribution using the standard library from C++11:

// [[Rcpp::plugins(cpp11)]]   
#include <random>
#include <Rcpp.h>
// [[Rcpp::depends(dqrng)]]
#include <dqrng_distribution.h>
#include <xoroshiro.hpp>

// [[Rcpp::export]]
Rcpp::NumericVector std_rnorm(const int n, const double mean = 0.0, const double sd = 1.0) {
  auto rng = dqrng::generator<xoroshiro128plus_engine>(42);
  std::normal_distribution<double> dist(mean, sd);
  return dqrng::generate<std::normal_distribution<double>, Rcpp::NumericVector>(n, rng, dist);
}
/*** R
std_rnorm(10)
system.time(std_rnorm(1e7))
*/

Typically this is not as fast as dqrnorm, but the technique is useful to support distributions not (yet) included in dqrng. Finally you could directly use the base generators, which are provided as header-only libraries, without dqrng’s tooling. For example, the following function uses the 32 bit PCG variant together with Boost’s normal distribution function:

#include <Rcpp.h>
// [[Rcpp::depends(dqrng, BH)]]
#include <pcg_random.hpp>
#include <boost/random/normal_distribution.hpp>

// [[Rcpp::plugins(cpp11)]]   
// [[Rcpp::export]]
Rcpp::NumericVector boost_pcg_rnorm(const int n, const double mean = 0.0, const double sd = 1.0) {
  pcg32 rng(42);
  boost::random::normal_distribution<double> dist(mean, sd);
  Rcpp::NumericVector result(n);
  auto gen = std::bind(dist, rng);
  std::generate(result.begin(), result.end(), gen);
  return result;
}
/*** R
boost_pcg_rnorm(10)
system.time(boost_pcg_rnorm(1e7))
*/

This is quite fast since boost::random::normal_distribution uses the fast Ziggurat algorithm.