R package coga: The Description of Algorithm and Correctness

Chaoran Hu

2017-05-24

Introduction

This vignette describes the algorithm used in this package and checks its correctness through a simulation experiment. The algorithm comes from Moschopoulos (1985). The vignette also gives notes on computation speed and on how parameters are recycled.

Algorithm

Assume that \(X_1, ..., X_n\) are independent random variables, where \(X_i\) follows a gamma distribution with shape parameter \(\alpha_i\) and scale parameter \(\beta_i\), for \(i = 1, ..., n\). Then the density of \(Y = X_1 + ... + X_n\) can be expressed as:

\[g(y) = C \sum_{k=0}^{\infty} \lambda_k y^{\rho + k - 1} e^{-y/\beta_1} / (\Gamma(\rho + k) \beta_{1}^{\rho + k})\]
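For reference, the symbols in this series are defined as in Moschopoulos (1985), where the coefficients written here as \(\lambda_k\) are denoted \(\delta_k\) and \(\beta_1\) is taken to be the smallest scale parameter, \(\beta_1 = \min_i \beta_i\). Restated in the notation of this vignette:

\[\rho = \sum_{i=1}^{n} \alpha_i, \qquad C = \prod_{i=1}^{n} (\beta_1 / \beta_i)^{\alpha_i}, \qquad \gamma_k = \sum_{i=1}^{n} \alpha_i (1 - \beta_1 / \beta_i)^k / k\]

\[\lambda_0 = 1, \qquad \lambda_{k+1} = \frac{1}{k+1} \sum_{i=1}^{k+1} i \, \gamma_i \, \lambda_{k+1-i}, \quad k = 0, 1, 2, \ldots\]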

The distribution function \(G(w) = Pr(Y < w)\) is expressed as:

\[G(w) = C \sum_{k=0}^{\infty} \lambda_k \int_{0}^{w} (y^{\rho + k - 1} e^{-y/\beta_1} / (\Gamma(\rho + k) \beta_{1}^{\rho + k})) dy\]

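The integral in this formula is a regularized incomplete gamma function: it is the distribution function of a gamma distribution with shape \(\rho + k\) and scale \(\beta_1\), evaluated at \(w\), so it can be computed with the gamma distribution function.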
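The series can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's C++ implementation: the function name `coga_series` and the argument `n_terms` are hypothetical, the infinite sum is cut off after a fixed number of terms, and the coefficient recurrence is the one given in Moschopoulos (1985), restated in the comments. A numerical convolution of the two gamma densities serves as an independent reference value.

```r
# Truncated Moschopoulos series for a sum of independent gamma variables.
# `coga_series` and `n_terms` are hypothetical names, not part of coga.
coga_series <- function(y, shape, scale, n_terms = 200, cdf = FALSE) {
  beta1 <- min(scale)                    # smallest scale parameter
  rho   <- sum(shape)                    # sum of the shape parameters
  C     <- prod((beta1 / scale)^shape)   # leading constant
  k     <- seq_len(n_terms)
  # gamma_k = sum_i alpha_i * (1 - beta1 / beta_i)^k / k
  gam <- vapply(k, function(j) sum(shape * (1 - beta1 / scale)^j) / j,
                numeric(1))
  # lambda_0 = 1; lambda_m = (1 / m) * sum_{i=1}^{m} i * gamma_i * lambda_{m-i}
  lambda <- numeric(n_terms + 1)         # lambda[j + 1] stores lambda_j
  lambda[1] <- 1
  for (m in k) {
    i <- seq_len(m)
    lambda[m + 1] <- sum(i * gam[i] * lambda[m + 1 - i]) / m
  }
  # Each term is a gamma density (or distribution function) with
  # shape rho + k and scale beta1, weighted by lambda_k.
  term <- if (cdf) pgamma else dgamma
  vapply(y, function(v) {
    C * sum(lambda * term(v, shape = rho + 0:n_terms, scale = beta1))
  }, numeric(1))
}

# Shapes (3, 4) and rates (2, 3), that is, scales (1/2, 1/3)
series_pdf <- coga_series(2, shape = c(3, 4), scale = 1 / c(2, 3))

# Independent reference: convolution of the two gamma densities at y = 2
conv_pdf <- integrate(function(x) {
  dgamma(x, shape = 3, rate = 2) * dgamma(2 - x, shape = 4, rate = 3)
}, lower = 0, upper = 2)$value

all.equal(series_pdf, conv_pdf, tolerance = 1e-5)
```

Setting `cdf = TRUE` swaps `dgamma` for `pgamma` in each term, which is exactly the replacement of the integral by the gamma distribution function described above.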

More details about this algorithm can be found in Moschopoulos (1985).

Correctness

Assume that we have two independent random variables, \(X_1\) and \(X_2\), where \(X_1\) follows a gamma distribution with shape parameter \(3\) and rate parameter \(2\), and \(X_2\) follows a gamma distribution with shape parameter \(4\) and rate parameter \(3\). The density and distribution function of \(Y = X_1 + X_2\) are computed below and compared with a simulated sample.

Correctness check for density function:

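library(coga)

# Simulate from the sum and evaluate the computed density on a grid
y <- rcoga(1000000, shape=c(3, 4), rate=c(2, 3))
grid <- seq(0, 8, length.out=1000)
pdf <- dcoga(grid, shape=c(3, 4), rate=c(2, 3))

# Kernel density estimate of the sample (blue) against dcoga (red)
plot(density(y), col="blue")
lines(grid, pdf, col="red")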

Correctness check for distribution function:

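# Simulate from the sum and evaluate the computed distribution function
y <- rcoga(1000000, shape=c(3, 4), rate=c(2, 3))
grid <- seq(0, 8, length.out=1000)
cdf <- pcoga(grid, shape=c(3, 4), rate=c(2, 3))

# Empirical distribution function of the sample (blue) against pcoga (red)
plot(ecdf(y), col="blue")
lines(grid, cdf, col="red")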
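The visual comparison can be complemented by a number. The sketch below measures the largest gap between an empirical distribution function and a reference distribution function that does not depend on the package. The assumptions are mine, not the vignette's: the sample is drawn directly with `rgamma`, the reference `conv_cdf` (a hypothetical helper) is a numerical convolution, the seed and sample size are arbitrary, and the comparison with `pcoga` only runs if coga is installed.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
n <- 100000
# Simulate Y = X1 + X2 directly from its definition
y <- rgamma(n, shape = 3, rate = 2) + rgamma(n, shape = 4, rate = 3)
grid <- seq(0.1, 8, length.out = 200)

# Reference: P(Y <= w) = integral over x of f1(x) * F2(w - x)
conv_cdf <- function(w) {
  integrate(function(x) {
    dgamma(x, shape = 3, rate = 2) * pgamma(w - x, shape = 4, rate = 3)
  }, lower = 0, upper = w)$value
}
ref <- vapply(grid, conv_cdf, numeric(1))

# Largest gap between the empirical and the reference distribution function
max_gap <- max(abs(ecdf(y)(grid) - ref))
max_gap

# If coga is available, compare pcoga with the same reference
if (requireNamespace("coga", quietly = TRUE)) {
  print(max(abs(coga::pcoga(grid, shape = c(3, 4), rate = c(2, 3)) - ref)))
}
```

A gap of the order of \(1/\sqrt{n}\) is what sampling noise alone would produce, so values well above that would point to a problem.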

Speed

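The ‘dcoga’ and ‘pcoga’ functions in ‘coga’ are implemented in C++. The following experiment compares them with the internal R versions ‘coga:::dcoga.R’ and ‘coga:::pcoga.R’ on an Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz computer. In the run shown below, the median time of the C++ version is roughly 29 times lower for the density and roughly 11 times lower for the distribution function.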

grid <- seq(0, 15, length.out=10)

microbenchmark::microbenchmark(
    dcoga(grid, shape=c(3,4,5), rate=c(2,3,4)),
    coga:::dcoga.R(grid, shape=c(3,4,5), rate=c(2,3,4)),
    pcoga(grid, shape=c(3,4,5), rate=c(2,3,4)),
    coga:::pcoga.R(grid, shape=c(3,4,5), rate=c(2,3,4))
)
## Unit: milliseconds
##                                                         expr       min        lq      mean    median        uq       max neval
##           dcoga(grid, shape = c(3, 4, 5), rate = c(2, 3, 4))  1.317128  1.420452  1.922557  1.606208  1.866404  22.05880   100
##  coga:::dcoga.R(grid, shape = c(3, 4, 5), rate = c(2, 3, 4)) 40.042678 43.311773 52.665573 46.301379 62.560726 113.68096   100
##           pcoga(grid, shape = c(3, 4, 5), rate = c(2, 3, 4))  4.266748  4.955067  5.783562  5.157502  5.749830  27.32323   100
##  coga:::pcoga.R(grid, shape = c(3, 4, 5), rate = c(2, 3, 4)) 50.086309 53.192434 65.423030 58.647805 76.481842 118.76037   100

Parameters Recycling

Note that dcoga and pcoga accept shape and rate vectors of different lengths and recycle the shorter one to the length of the longer one. This means that dcoga(3, c(2,3), c(3,4,5,3,4)) and dcoga(3, c(2,3,2,3,2), c(3,4,5,3,4)) give the same result.
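The recycling rule can be reproduced with base R. The sketch below uses `rep_len` to expand the shorter vector by hand; how coga performs the recycling internally is not stated here, so this only mirrors the documented behaviour. The variable names are illustrative, and the comparison of the two dcoga calls runs only if coga is installed.

```r
shape <- c(2, 3)
rate  <- c(3, 4, 5, 3, 4)

# Repeat the shorter vector until it matches the length of the longer one
shape_full <- rep_len(shape, length(rate))
shape_full

# If coga is available, check that both calls agree
if (requireNamespace("coga", quietly = TRUE)) {
  print(all.equal(coga::dcoga(3, shape, rate),
                  coga::dcoga(3, shape_full, rate)))
}
```

Writing the parameters out at full length avoids relying on recycling when the two vectors were meant to have the same length.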

References

[1] Moschopoulos, Peter G. “The distribution of the sum of independent gamma random variables.” Annals of the Institute of Statistical Mathematics 37.1 (1985): 541-544.