The Poisson binomial distribution is becoming increasingly important, especially in statistics, finance, insurance mathematics and quality management. Still, only a few R packages exist for its calculation, namely poibin and poisbinom. Both are based on Hong (2013). The first implements the exact DFT-CF algorithm along with an exact recursive method and Normal and Poisson approximations. The latter only provides a more efficient DFT-CF implementation. Unfortunately, it sometimes yields negative probabilities, especially for large distributions. This numerical issue has not been addressed to date. Biscarri, Zhao & Brunner (2018) developed two more efficient procedures, but at the time of this writing no package implements them: the authors published a reference implementation for R but refrained from releasing it as a package. Besides these disadvantages in computational speed, especially for very large distributions, poibin and poisbinom do not provide headers for their internal C/C++ functions, so they cannot be imported directly by the C or C++ code of other packages that use, for example, Rcpp.
In our own project, we often have to deal with Poisson binomial distributions that include Bernoulli trials with \(p_i \in \{0, 1\}\). Computation can be further optimized by handling these trials before the actual computations. None of the aforementioned packages does that. That is why we decided to develop PoissonBinomial. We needed a package that
While implementing the procedures of Biscarri, Zhao & Brunner (2018), we decided to also include all methods described in Hong (2013), together with three additional binomial approximations.
In this package, the following exact algorithms for computing the Poisson binomial distribution with Bernoulli probabilities \(p_1, \ldots, p_n\) are implemented:
Examples and performance comparisons of these procedures are provided in a separate vignette.
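
As a rough illustration of one exact method, the DFT-CF idea of Hong (2013) can be sketched in a few lines of base R. This is only a sketch under our own assumptions, not the package's internal implementation; the function name dft_cf_sketch is hypothetical. It evaluates the characteristic function on an equally spaced grid and inverts it with a discrete Fourier transform. Round-off in the transform can leave tiny negative values, which this sketch does not correct.

```r
# Hypothetical sketch of the DFT-CF approach in base R (not package code).
dft_cf_sketch <- function(probs) {
  n <- length(probs)
  omega <- 2 * pi / (n + 1)

  # Characteristic function of the sum of Bernoulli trials, evaluated at
  # t = omega * l for l = 0, ..., n
  cf <- vapply(
    0:n,
    function(l) prod(1 - probs + probs * exp(1i * omega * l)),
    complex(1)
  )

  # R's fft() is the unnormalised forward transform, so dividing by (n + 1)
  # recovers P(K = 0), ..., P(K = n)
  Re(fft(cf)) / (n + 1)
}

# With equal probabilities the result should agree with dbinom()
# up to floating-point error
dft_cf_sketch(rep(0.3, 7))
dbinom(0:7, 7, 0.3)
```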
In addition, the following approximation methods are provided:
Again, examples and performance comparisons for these approaches are presented in a separate vignette.
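
To give a feeling for what such an approximation looks like, the following minimal sketch computes a plain Poisson approximation in base R, matching only the mean of the distribution. The function name poisson_approx_sketch is hypothetical, and the package's own approximation routines may differ in detail, for example in how they treat probability mass beyond \(n\).

```r
# Hypothetical sketch of a Poisson approximation (not package code).
poisson_approx_sketch <- function(probs) {
  n <- length(probs)
  lambda <- sum(probs)   # mean of the Poisson binomial distribution
  # Poisson probabilities for 0, ..., n; mass beyond n is simply dropped here
  dpois(0:n, lambda)
}

poisson_approx_sketch(c(0.4, 0.2, 0.8, 0.1))
```

Because the Poisson distribution has unbounded support, the values returned by this sketch do not sum exactly to one.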
Unfortunately, some approximations do not work at all for Bernoulli trials with \(p_i = 1\). This is why handling these values before the actual computation of the distribution is not only a performance tweak, but sometimes even a necessity. It is achieved by some simple preliminary considerations:
These cases are illustrated in the following example:
library(PoissonBinomial)
# Case 1
dpbinom(NULL, rep(0.3, 7))
#> [1] 0.0823543 0.2470629 0.3176523 0.2268945 0.0972405 0.0250047 0.0035721
#> [8] 0.0002187
dbinom(0:7, 7, 0.3)
#> [1] 0.0823543 0.2470629 0.3176523 0.2268945 0.0972405 0.0250047 0.0035721
#> [8] 0.0002187
# equal results
# Case 2
dpbinom(NULL, c(0, 0, 0, 0, 0, 0, 0))
#> [1] 1 0 0 0 0 0 0 0
dpbinom(NULL, c(1, 1, 1, 1, 1, 1, 1))
#> [1] 0 0 0 0 0 0 0 1
dpbinom(NULL, c(0, 0, 0, 0, 1, 1, 1))
#> [1] 0 0 0 1 0 0 0 0
# Case 3
dpbinom(NULL, c(0, 0, 0.4, 0.2, 0.8, 0.1, 1), method = "Convolve")
#> [1] 0.0000 0.0864 0.4344 0.3784 0.0944 0.0064 0.0000 0.0000
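
The third case can be reproduced with a short base R sketch, assuming the preprocessing works roughly as follows: trials with \(p_i = 0\) and \(p_i = 1\) are set aside, the distribution of the remaining trials is computed by direct convolution, and the result is shifted by the number of sure successes and padded with zeros. The function name pb_pmf_sketch is hypothetical; this is not the package's internal code.

```r
# Hypothetical sketch: handle p_i = 0 and p_i = 1 before the actual
# computation (not package code).
pb_pmf_sketch <- function(probs) {
  n0 <- sum(probs == 0)                   # trials that never succeed
  n1 <- sum(probs == 1)                   # trials that always succeed
  rest <- probs[probs > 0 & probs < 1]    # trials that are truly random

  # Direct convolution over the remaining trials only
  pmf <- 1                                # empty sum: P(K = 0) = 1
  for (p in rest) {
    pmf <- c(pmf, 0) * (1 - p) + c(0, pmf) * p
  }

  # Shift by the sure successes, pad with zeros for the sure failures
  c(rep(0, n1), pmf, rep(0, n0))
}

pb_pmf_sketch(c(0, 0, 0.4, 0.2, 0.8, 0.1, 1))
```

In this sketch only four of the seven trials enter the convolution, which is where the performance gain comes from.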
How to import and use the internal C++ functions in Rcpp-based packages is described in a separate vignette.