In this package, the following exact algorithms for computing the Poisson Binomial distribution with Bernoulli probabilities \(p_1, ..., p_n\) are implemented:
The computation of these procedures is optimized and accelerated by some simple preliminary considerations:
These cases are illustrated in the following example:
library(PoissonBinomial)
# Case 1
dpbinom(NULL, rep(0.3, 7))
#> [1] 0.0823543 0.2470629 0.3176523 0.2268945 0.0972405 0.0250047 0.0035721
#> [8] 0.0002187
dbinom(0:7, 7, 0.3)
#> [1] 0.0823543 0.2470629 0.3176523 0.2268945 0.0972405 0.0250047 0.0035721
#> [8] 0.0002187
# equal results
# Case 2
dpbinom(NULL, c(0, 0, 0, 0, 0, 0, 0))
#> [1] 1 0 0 0 0 0 0 0
dpbinom(NULL, c(1, 1, 1, 1, 1, 1, 1))
#> [1] 0 0 0 0 0 0 0 1
dpbinom(NULL, c(0, 0, 0, 0, 1, 1, 1))
#> [1] 0 0 0 1 0 0 0 0
# Case 3
dpbinom(NULL, c(0, 0, 0.4, 0.2, 0.8, 0.1, 1))
#> [1] 0.0000 0.0864 0.4344 0.3784 0.0944 0.0064 0.0000 0.0000
The Direct Convolution (DC) approach is requested with method = "Convolve"
.
set.seed(1)
pp <- runif(10)
wt <- sample(1:10, 10, TRUE)
dpbinom(NULL, pp, wt, "Convolve")
#> [1] 3.574462e-35 1.120280e-32 1.685184e-30 1.620524e-28 1.119523e-26
#> [6] 5.920060e-25 2.493263e-23 8.591850e-22 2.470125e-20 6.011429e-19
#> [11] 1.252345e-17 2.253115e-16 3.525477e-15 4.825171e-14 5.803728e-13
#> [16] 6.158735e-12 5.784692e-11 4.822437e-10 3.576566e-09 2.364563e-08
#> [21] 1.395965e-07 7.370448e-07 3.484836e-06 1.477208e-05 5.619632e-05
#> [26] 1.920240e-04 5.897928e-04 1.629272e-03 4.049768e-03 9.060183e-03
#> [31] 1.824629e-02 3.307754e-02 5.396724e-02 7.921491e-02 1.045505e-01
#> [36] 1.239854e-01 1.319896e-01 1.259938e-01 1.077029e-01 8.232174e-02
#> [41] 5.616422e-02 3.413623e-02 1.844304e-02 8.835890e-03 3.743554e-03
#> [46] 1.398320e-03 4.589049e-04 1.318064e-04 3.298425e-05 7.154649e-06
#> [51] 1.337083e-06 2.137543e-07 2.898296e-08 3.298587e-09 3.110922e-10
#> [56] 2.392070e-11 1.468267e-12 6.991155e-14 2.478218e-15 6.130807e-17
#> [61] 9.411166e-19 6.727527e-21
ppbinom(NULL, pp, wt, "Convolve")
#> [1] 3.574462e-35 1.123854e-32 1.696423e-30 1.637488e-28 1.135898e-26
#> [6] 6.033650e-25 2.553600e-23 8.847210e-22 2.558597e-20 6.267289e-19
#> [11] 1.315018e-17 2.384617e-16 3.763939e-15 5.201565e-14 6.323884e-13
#> [16] 6.791123e-12 6.463805e-11 5.468818e-10 4.123448e-09 2.776908e-08
#> [21] 1.673656e-07 9.044104e-07 4.389247e-06 1.916133e-05 7.535765e-05
#> [26] 2.673817e-04 8.571745e-04 2.486446e-03 6.536215e-03 1.559640e-02
#> [31] 3.384269e-02 6.692022e-02 1.208875e-01 2.001024e-01 3.046529e-01
#> [36] 4.286383e-01 5.606280e-01 6.866217e-01 7.943246e-01 8.766463e-01
#> [41] 9.328105e-01 9.669468e-01 9.853898e-01 9.942257e-01 9.979692e-01
#> [46] 9.993676e-01 9.998265e-01 9.999583e-01 9.999913e-01 9.999984e-01
#> [51] 9.999998e-01 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
#> [56] 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
#> [61] 1.000000e+00 1.000000e+00
The Divide & Conquer FFT Tree Convolution (DC-FFT) approach is requested with method = "DivideFFT"
.
set.seed(1)
pp <- runif(10)
wt <- sample(1:10, 10, TRUE)
dpbinom(NULL, pp, wt, "DivideFFT")
#> [1] 3.574462e-35 1.120280e-32 1.685184e-30 1.620524e-28 1.119523e-26
#> [6] 5.920060e-25 2.493263e-23 8.591850e-22 2.470125e-20 6.011429e-19
#> [11] 1.252345e-17 2.253115e-16 3.525477e-15 4.825171e-14 5.803728e-13
#> [16] 6.158735e-12 5.784692e-11 4.822437e-10 3.576566e-09 2.364563e-08
#> [21] 1.395965e-07 7.370448e-07 3.484836e-06 1.477208e-05 5.619632e-05
#> [26] 1.920240e-04 5.897928e-04 1.629272e-03 4.049768e-03 9.060183e-03
#> [31] 1.824629e-02 3.307754e-02 5.396724e-02 7.921491e-02 1.045505e-01
#> [36] 1.239854e-01 1.319896e-01 1.259938e-01 1.077029e-01 8.232174e-02
#> [41] 5.616422e-02 3.413623e-02 1.844304e-02 8.835890e-03 3.743554e-03
#> [46] 1.398320e-03 4.589049e-04 1.318064e-04 3.298425e-05 7.154649e-06
#> [51] 1.337083e-06 2.137543e-07 2.898296e-08 3.298587e-09 3.110922e-10
#> [56] 2.392070e-11 1.468267e-12 6.991155e-14 2.478218e-15 6.130807e-17
#> [61] 9.411166e-19 6.727527e-21
ppbinom(NULL, pp, wt, "DivideFFT")
#> [1] 3.574462e-35 1.123854e-32 1.696423e-30 1.637488e-28 1.135898e-26
#> [6] 6.033650e-25 2.553600e-23 8.847210e-22 2.558597e-20 6.267289e-19
#> [11] 1.315018e-17 2.384617e-16 3.763939e-15 5.201565e-14 6.323884e-13
#> [16] 6.791123e-12 6.463805e-11 5.468818e-10 4.123448e-09 2.776908e-08
#> [21] 1.673656e-07 9.044104e-07 4.389247e-06 1.916133e-05 7.535765e-05
#> [26] 2.673817e-04 8.571745e-04 2.486446e-03 6.536215e-03 1.559640e-02
#> [31] 3.384269e-02 6.692022e-02 1.208875e-01 2.001024e-01 3.046529e-01
#> [36] 4.286383e-01 5.606280e-01 6.866217e-01 7.943246e-01 8.766463e-01
#> [41] 9.328105e-01 9.669468e-01 9.853898e-01 9.942257e-01 9.979692e-01
#> [46] 9.993676e-01 9.998265e-01 9.999583e-01 9.999913e-01 9.999984e-01
#> [51] 9.999998e-01 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
#> [56] 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
#> [61] 1.000000e+00 1.000000e+00
By design, as proposed by Biscarri, Zhao & Brunner (2018), its results are identical to the DC procedure, if \(n \leq 750\). Thus, differences can be observed for larger \(n > 750\):
set.seed(1)
pp1 <- runif(751)
pp2 <- pp1[1:750]
sum(abs(dpbinom(NULL, pp2, method = "DivideFFT") - dpbinom(NULL, pp2, method = "Convolve")))
#> [1] 0
sum(abs(dpbinom(NULL, pp1, method = "DivideFFT") - dpbinom(NULL, pp1, method = "Convolve")))
#> [1] 5.704337e-16
The reason is that the DC-FFT method splits the input probs
vector into as equally sized parts as possible and computes their distributions separately with the DC approach. The results of the portions are then convoluted by means of the Fast Fourier Transformation. As proposed by Biscarri, Zhao & Brunner (2018), no splitting is done for \(n \leq 750\). In addition, the DC-FFT procedure does not produce probabilities \(\leq 5.55e\text{-}17\), i.e. smaller values are rounded off to 0, if \(n > 750\), whereas the smallest possible result of the DC algorithm is \(\sim 1e\text{-}323\). This is most likely caused by the used FFTW3 library.
The Discrete Fourier Transformation of the Characteristic Function (DFT-CF) approach is requested with method = "Characteristic"
.
set.seed(1)
pp <- runif(10)
wt <- sample(1:10, 10, TRUE)
dpbinom(NULL, pp, wt, "Characteristic")
#> [1] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
#> [6] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
#> [11] 0.000000e+00 2.238353e-16 3.549132e-15 4.829828e-14 5.804377e-13
#> [16] 6.158818e-12 5.784702e-11 4.822438e-10 3.576566e-09 2.364563e-08
#> [21] 1.395965e-07 7.370448e-07 3.484836e-06 1.477208e-05 5.619632e-05
#> [26] 1.920240e-04 5.897928e-04 1.629272e-03 4.049768e-03 9.060183e-03
#> [31] 1.824629e-02 3.307754e-02 5.396724e-02 7.921491e-02 1.045505e-01
#> [36] 1.239854e-01 1.319896e-01 1.259938e-01 1.077029e-01 8.232174e-02
#> [41] 5.616422e-02 3.413623e-02 1.844304e-02 8.835890e-03 3.743554e-03
#> [46] 1.398320e-03 4.589049e-04 1.318064e-04 3.298425e-05 7.154649e-06
#> [51] 1.337083e-06 2.137543e-07 2.898296e-08 3.298587e-09 3.110923e-10
#> [56] 2.392079e-11 1.468354e-12 6.994931e-14 2.513558e-15 5.551115e-17
#> [61] 0.000000e+00 0.000000e+00
ppbinom(NULL, pp, wt, "Characteristic")
#> [1] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
#> [6] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
#> [11] 0.000000e+00 2.238353e-16 3.772968e-15 5.207125e-14 6.325089e-13
#> [16] 6.791327e-12 6.463834e-11 5.468822e-10 4.123448e-09 2.776908e-08
#> [21] 1.673656e-07 9.044104e-07 4.389247e-06 1.916133e-05 7.535765e-05
#> [26] 2.673817e-04 8.571745e-04 2.486446e-03 6.536215e-03 1.559640e-02
#> [31] 3.384269e-02 6.692022e-02 1.208875e-01 2.001024e-01 3.046529e-01
#> [36] 4.286383e-01 5.606280e-01 6.866217e-01 7.943246e-01 8.766463e-01
#> [41] 9.328105e-01 9.669468e-01 9.853898e-01 9.942257e-01 9.979692e-01
#> [46] 9.993676e-01 9.998265e-01 9.999583e-01 9.999913e-01 9.999984e-01
#> [51] 9.999998e-01 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
#> [56] 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
#> [61] 1.000000e+00 1.000000e+00
As can be seen, the DFT-CF procedure does not produce probabilities \(\leq 5.55e\text{-}17\), i.e. smaller values are rounded off to 0, most likely due to the used FFTW3 library.
The Recursive Formula (RF) approach is requested with method = "Recursive"
.
set.seed(1)
pp <- runif(10)
wt <- sample(1:10, 10, TRUE)
dpbinom(NULL, pp, wt, "Recursive")
#> [1] 3.574462e-35 1.120280e-32 1.685184e-30 1.620524e-28 1.119523e-26
#> [6] 5.920060e-25 2.493263e-23 8.591850e-22 2.470125e-20 6.011429e-19
#> [11] 1.252345e-17 2.253115e-16 3.525477e-15 4.825171e-14 5.803728e-13
#> [16] 6.158735e-12 5.784692e-11 4.822437e-10 3.576566e-09 2.364563e-08
#> [21] 1.395965e-07 7.370448e-07 3.484836e-06 1.477208e-05 5.619632e-05
#> [26] 1.920240e-04 5.897928e-04 1.629272e-03 4.049768e-03 9.060183e-03
#> [31] 1.824629e-02 3.307754e-02 5.396724e-02 7.921491e-02 1.045505e-01
#> [36] 1.239854e-01 1.319896e-01 1.259938e-01 1.077029e-01 8.232174e-02
#> [41] 5.616422e-02 3.413623e-02 1.844304e-02 8.835890e-03 3.743554e-03
#> [46] 1.398320e-03 4.589049e-04 1.318064e-04 3.298425e-05 7.154649e-06
#> [51] 1.337083e-06 2.137543e-07 2.898296e-08 3.298587e-09 3.110922e-10
#> [56] 2.392070e-11 1.468267e-12 6.991155e-14 2.478218e-15 6.130807e-17
#> [61] 9.411166e-19 6.727527e-21
ppbinom(NULL, pp, wt, "Recursive")
#> [1] 3.574462e-35 1.123854e-32 1.696423e-30 1.637488e-28 1.135898e-26
#> [6] 6.033650e-25 2.553600e-23 8.847210e-22 2.558597e-20 6.267289e-19
#> [11] 1.315018e-17 2.384617e-16 3.763939e-15 5.201565e-14 6.323884e-13
#> [16] 6.791123e-12 6.463805e-11 5.468818e-10 4.123448e-09 2.776908e-08
#> [21] 1.673656e-07 9.044104e-07 4.389247e-06 1.916133e-05 7.535765e-05
#> [26] 2.673817e-04 8.571745e-04 2.486446e-03 6.536215e-03 1.559640e-02
#> [31] 3.384269e-02 6.692022e-02 1.208875e-01 2.001024e-01 3.046529e-01
#> [36] 4.286383e-01 5.606280e-01 6.866217e-01 7.943246e-01 8.766463e-01
#> [41] 9.328105e-01 9.669468e-01 9.853898e-01 9.942257e-01 9.979692e-01
#> [46] 9.993676e-01 9.998265e-01 9.999583e-01 9.999913e-01 9.999984e-01
#> [51] 9.999998e-01 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
#> [56] 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
#> [61] 1.000000e+00 1.000000e+00
Obviously, the RF procedure does produce probabilities \(\leq 5.55e\text{-}17\), because it does not rely on the FFTW3 library. Furthermore, it yields the same results as the DC method.
To assess the performance of the exact procedures, we use the microbenchmark
package. Each algorithm has to calculate the PMF repeatedly based on random probability vectors. The run times are then summarized in a table that presents, among other statistics, their minima, maxima and means. The following results were recorded on an AMD Ryzen 7 1800X with 32 GiB of RAM and Ubuntu 18.04.3 (running inside a VirtualBox VM; the host system is Windows 10 Education).
library(microbenchmark)
set.seed(1)
f1 <- function() dpbinom(NULL, runif(4000), method = "DivideFFT")
f2 <- function() dpbinom(NULL, runif(4000), method = "Convolve")
f3 <- function() dpbinom(NULL, runif(4000), method = "Characteristic")
f4 <- function() dpbinom(NULL, runif(4000), method = "Recursive")
microbenchmark(f1(), f2(), f3(), f4())
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> f1() 5.478379 5.884806 6.714503 6.16171 6.764316 21.78661 100
#> f2() 16.509786 17.051717 18.341106 17.46519 18.628877 59.29212 100
#> f3() 22.207496 22.957959 23.424588 23.14626 23.367803 41.59981 100
#> f4() 34.791255 36.084817 36.425160 36.34899 36.543581 42.72910 100
Clearly, the DC-FFT procedure is the fastest, followed by the DC and DFT-CF methods, which need roughly 3 times as much time, and the RF approach. DC and DFT-CF procedures exhibit almost equal mean execution speed, with the DC algorithm being slightly faster (and with some advantage in precision, as stated before). The RF approach is the slowest one and its computation takes roughly twice as long as those of the DC procedure.