R package coga: Convolution of Gamma Distributions

Chaoran Hu

2017-06-28

Introduction

The R package coga computes the density and distribution function of a convolution of gamma distributions, that is, the distribution of a sum of independent gamma random variables. The algorithm follows Moschopoulos (1985) [1]. The R code in this vignette also serves as a set of usage examples.

Algorithm

Assume that \(X_1, ..., X_n\) are independent random variables, where \(X_i\) follows a gamma distribution with shape parameter \(\alpha_i\) and scale parameter \(\beta_i\), for \(i = 1, ..., n\). Then the density of \(Y = X_1 + ... + X_n\) can be expressed as:

\[g(y) = C \sum_{k=0}^{\infty} \lambda_k y^{\rho + k - 1} e^{-y/\beta_1} / (\Gamma(\rho + k) \beta_{1}^{\rho + k})\]

Here \(\beta_1 = \min_i \beta_i\), \(\rho = \sum_{i=1}^{n} \alpha_i\), \(C = \prod_{i=1}^{n} (\beta_1 / \beta_i)^{\alpha_i}\), and the coefficients \(\lambda_k\) are defined recursively in Moschopoulos (1985) [1].

And the distribution function \(G(w)=Pr(Y<w)\) is expressed as:

\[G(w) = C \sum_{k=0}^{\infty} \lambda_k \int_{0}^{w} (y^{\rho + k - 1} e^{-y/\beta_1} / (\Gamma(\rho + k) \beta_{1}^{\rho + k})) dy\]

The integral in this formula is a regularized incomplete gamma function: it equals the distribution function of a gamma distribution with shape \(\rho + k\) and scale \(\beta_1\), evaluated at \(w\).

More details about this algorithm can be found in Moschopoulos (1985) [1].
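The series above can be sketched in base R by truncating the sum. The block below is only an illustration, not the code used inside coga. It assumes the recursion from the paper as we read it: \(\lambda_0 = 1\), \(\lambda_{k+1} = \frac{1}{k+1} \sum_{i=1}^{k+1} i \gamma_i \lambda_{k+1-i}\), with \(\gamma_k = \sum_{i=1}^{n} \alpha_i (1 - \beta_1/\beta_i)^k / k\). The function names and the truncation point `n_terms` are hypothetical.

```r
# Illustrative sketch of the truncated series; NOT the implementation in coga.
# shape = alpha_i, scale = beta_i (scale, not rate); n_terms is an assumed cutoff.
coga_series_weights <- function(shape, scale, n_terms = 500) {
  beta1 <- min(scale)
  k <- seq_len(n_terms)
  # gamma_k = sum_i alpha_i * (1 - beta1 / beta_i)^k / k
  gam <- vapply(k, function(j) sum(shape * (1 - beta1 / scale)^j) / j, numeric(1))
  # lambda[m + 1] stores lambda_m, with lambda_0 = 1
  lambda <- numeric(n_terms + 1)
  lambda[1] <- 1
  for (m in k) {
    i <- seq_len(m)
    lambda[m + 1] <- sum(i * gam[i] * lambda[m + 1 - i]) / m
  }
  list(C = prod((beta1 / scale)^shape), rho = sum(shape),
       beta1 = beta1, lambda = lambda)
}

# Each series term is a gamma density with shape rho + k and scale beta1.
dcoga_sketch <- function(y, shape, scale, n_terms = 500) {
  w <- coga_series_weights(shape, scale, n_terms)
  k <- 0:n_terms
  vapply(y, function(yy) {
    w$C * sum(w$lambda * dgamma(yy, shape = w$rho + k, scale = w$beta1))
  }, numeric(1))
}

# The distribution function replaces each gamma density by the gamma CDF.
pcoga_sketch <- function(q, shape, scale, n_terms = 500) {
  w <- coga_series_weights(shape, scale, n_terms)
  k <- 0:n_terms
  vapply(q, function(qq) {
    w$C * sum(w$lambda * pgamma(qq, shape = w$rho + k, scale = w$beta1))
  }, numeric(1))
}

# Rates 2 and 3 correspond to scales 1/2 and 1/3.
dcoga_sketch(c(1, 2, 3), shape = c(3, 4), scale = 1 / c(2, 3))
pcoga_sketch(c(1, 2, 3), shape = c(3, 4), scale = 1 / c(2, 3))
```

When all scales are equal, every \(\gamma_k\) is zero, so only the \(k = 0\) term survives and the sketch reduces to a single gamma density with shape \(\rho\), which is a convenient sanity check.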

Correctness

Assume that we have two independent random variables, \(X_1\) and \(X_2\), where \(X_1\) follows a gamma distribution with shape parameter \(3\) and rate parameter \(2\), and \(X_2\) follows a gamma distribution with shape parameter \(4\) and rate parameter \(3\). We compute the density and distribution function of \(Y = X_1 + X_2\) and compare them with a large simulated sample.

Correctness check for density function:

y <- rcoga(1000000, c(3,4), c(2,3))
grid <- seq(0, 8, length.out=1000)
pdf <- dcoga(grid, shape=c(3, 4), rate=c(2, 3))
 
plot(density(y), col="blue")
lines(grid, pdf, col="red")

Correctness check for distribution function:

y <- rcoga(1000000, c(3,4), c(2,3))
grid <- seq(0, 8, length.out=1000)
cdf <- pcoga(grid, shape=c(3, 4), rate=c(2, 3))

plot(ecdf(y), col="blue")
lines(grid, cdf, col="red")
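The plots above give a visual check. A numeric version of the same comparison can be sketched in base R, without coga, by simulating the sum directly and comparing its empirical distribution function with a numerically integrated convolution. The helper `cdf_conv`, the seed, and the sample size are assumptions made for this illustration.

```r
# Numeric correctness sketch in base R; cdf_conv is a hypothetical helper.
set.seed(1)
shape <- c(3, 4)
rate <- c(2, 3)
n <- 1e5

# Simulate Y = X1 + X2 directly from two independent gamma variables.
y <- rgamma(n, shape[1], rate = rate[1]) + rgamma(n, shape[2], rate = rate[2])

# G(w) = integral over [0, w] of f1(x) * F2(w - x) dx, by numerical quadrature.
cdf_conv <- function(w) {
  vapply(w, function(ww) {
    if (ww <= 0) return(0)
    integrate(function(x) {
      dgamma(x, shape[1], rate = rate[1]) * pgamma(ww - x, shape[2], rate = rate[2])
    }, lower = 0, upper = ww)$value
  }, numeric(1))
}

grid <- seq(0.1, 8, length.out = 50)
# Largest gap between the empirical and the integrated distribution function.
max_gap <- max(abs(ecdf(y)(grid) - cdf_conv(grid)))

# Theoretical moments of the sum: mean = sum(shape / rate), var = sum(shape / rate^2).
c(sample_mean = mean(y), theory_mean = sum(shape / rate),
  sample_var = var(y), theory_var = sum(shape / rate^2))
```

With coga installed, `cdf_conv(grid)` could be replaced by `pcoga(grid, shape=c(3, 4), rate=c(2, 3))` to test the package function itself.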

Speed

The ‘dcoga’ and ‘pcoga’ functions in package ‘coga’ are implemented in C++. The following experiment, run on a computer with an Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz, shows the speed advantage of the C++ code over equivalent pure R code.

grid <- seq(0, 15, length.out=10)

microbenchmark::microbenchmark(
    dcoga(grid, shape=c(3,4,5), rate=c(2,3,4)),
    coga:::dcoga.R(grid, shape=c(3,4,5), rate=c(2,3,4)),
    pcoga(grid, shape=c(3,4,5), rate=c(2,3,4)),
    coga:::pcoga.R(grid, shape=c(3,4,5), rate=c(2,3,4))
)
## Unit: milliseconds
##                                                         expr       min         lq      mean    median        uq       max neval
##           dcoga(grid, shape = c(3, 4, 5), rate = c(2, 3, 4))  1.291005   1.449642  2.072026  1.733290  1.864550  38.17066   100
##  coga:::dcoga.R(grid, shape = c(3, 4, 5), rate = c(2, 3, 4)) 29.972928  32.316995 38.529133 34.938479 35.619007 104.49107   100
##           pcoga(grid, shape = c(3, 4, 5), rate = c(2, 3, 4))  4.075797   4.931446  8.275947  5.248308  5.574377  46.21576   100
##  coga:::pcoga.R(grid, shape = c(3, 4, 5), rate = c(2, 3, 4)) 37.945149  39.865460 50.433972 43.021125 53.660163  85.49018   100

Note: In this example, ‘dcoga.R’ and ‘pcoga.R’ are pure R versions of the density and distribution functions of the convolution of gamma distributions. These two functions are not exported from package ‘coga’, but you can still call them as ‘coga:::dcoga.R’ and ‘coga:::pcoga.R’.

The convolution of two gamma distributions is a special case of the convolution of gamma distributions. The functions ‘dcoga2dim’ and ‘pcoga2dim’ handle this case and are much faster than the general functions ‘dcoga’ and ‘pcoga’.

grid <- seq(0, 15, length.out=100)

microbenchmark::microbenchmark(
    dcoga(grid, shape=c(3,4), rate=c(2,3)),
    dcoga2dim(grid, 3, 4, 2, 3),
    pcoga(grid, shape=c(3,4), rate=c(2,3)),
    pcoga2dim(grid, 3, 4, 2, 3))
## Unit: microseconds
##                                          expr       min        lq         mean     median        uq        max neval
##  dcoga(grid, shape = c(3, 4), rate = c(2, 3)) 16252.613 17328.591  25420.42989 18585.6300 30307.958 108431.324   100
##                   dcoga2dim(grid, 3, 4, 2, 3)    57.900    60.961     68.83258    70.4005    74.751    104.662   100
##  pcoga(grid, shape = c(3, 4), rate = c(2, 3)) 37377.394 39449.186  55977.01005 60603.5225 65257.359  72600.251   100
##                   pcoga2dim(grid, 3, 4, 2, 3)  3815.546  3822.076   3856.40268  3831.1960  3839.320   4843.248   100

Parameters Recycling

Note that the R functions dcoga, pcoga, and rcoga in this package accept shape and rate parameters of different lengths and recycle the shorter one. This means that dcoga(3, c(2,3), c(3,4,5,3,4)) and dcoga(3, c(2,3,2,3,2), c(3,4,5,3,4)) give the same result. If the length of the longer parameter is not a multiple of the length of the shorter one, these three functions issue a warning.
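The recycling rule can be illustrated in base R with `rep_len`. The helper below is a hypothetical sketch of the behaviour described above, not the code used inside coga, and its warning text is made up for the example.

```r
# Hypothetical helper mimicking the recycling rule; NOT taken from coga.
recycle_params <- function(shape, rate) {
  stopifnot(length(shape) > 0, length(rate) > 0)
  n_long <- max(length(shape), length(rate))
  n_short <- min(length(shape), length(rate))
  # Warn when the shorter vector does not fit a whole number of times.
  if (n_long %% n_short != 0) {
    warning("longer parameter length is not a multiple of shorter parameter length")
  }
  # rep_len repeats each vector cyclically up to the common length.
  list(shape = rep_len(shape, n_long), rate = rep_len(rate, n_long))
}

# Lengths 2 and 5: shape is recycled to 2 3 2 3 2, and a warning is raised.
p <- suppressWarnings(recycle_params(c(2, 3), c(3, 4, 5, 3, 4)))
p$shape
p$rate
```

Passing the recycled vectors explicitly, as in the second dcoga call above, avoids the warning and makes the pairing of shapes and rates visible.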

References

[1] Moschopoulos, Peter G. “The distribution of the sum of independent gamma random variables.” Annals of the Institute of Statistical Mathematics 37.1 (1985): 541-544.

[2] Mathai, A. M. “Storage capacity of a dam with gamma type inputs.” Annals of the Institute of Statistical Mathematics 34 (1982): 591-597.