The dirichlet() function in the hyper2 package

The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

The dirichlet() function in the hyper2 package

Robin K. S. Hankin

To cite the hyper2 package in publications, please use Hankin (2017). We say that non-negative \(p_1,\ldots, p_n\) satisfying \(\sum p_i=1\) follow a Dirichlet distribution, and write \(\mathcal{D}(p_1,\ldots,p_n)\), if their density function is

\[\begin{equation} \frac{\Gamma(\alpha_1+\cdots +\alpha_n)}{\Gamma(\alpha_1)\cdots\Gamma(\alpha_n)} \prod_{i=1}^n p_i^{\alpha_i-1} \end{equation}\]

for some parameters \(\alpha_i>0\). It is interesting in the current context because it is conjugate to the multinomial distribution; specifically, given a Dirichlet prior \(\mathcal{D}(\alpha_1,\ldots,\alpha_n)\) and likelihood function \(\mathcal{L}(p_1,\ldots,p_n)\propto\prod_{i=1}^n p_i^{\alpha_i}\) then the posterior is \(\mathcal{D}(\alpha_1+a_1,\ldots,\alpha_n+a_n)\).

In the context of the hyper2 package, function dirichlet() presents some peculiarities, discussed here.

A simple example

Consider a three-way trial between players a, b, and c with eponymous Bradley-Terry strengths. Suppose that, out of \(4+5+3=12\) trials, a wins 4, b wins 5, and c wins 3. Then an appropriate likelihood function would be:

dirichlet(c(a=4,b=5,c=3))

## log(a^4 * (a + b + c)^-12 * b^5 * c^3)

Note the (a+b+c)^-11 term, which is not strictly necessary according the Dirichlet density function above which has no such term. However, we see that the returned likelihood function is \({\propto} p_{a\vphantom{abc}}^4p_{b\vphantom{abc}}^5p_{c\vphantom{abc}}^3\) (because \(p_a+p_b+p_c=1\)).

(L1 <- dirichlet(c(a=4,b=5,c=3)))

## log(a^4 * (a + b + c)^-12 * b^5 * c^3)

Now suppose we observe three-way trials between b, c, and d:

(L2 <- dirichlet(c(b=1,c=1,d=8)))

## log( b * (b + c + d)^-10 * c * d^8)

The overall likelihood function would be \(L_1+L_2\):

L1+L2

## log(a^4 * (a + b + c)^-12 * b^6 * (b + c + d)^-10 * c^4 * d^8)

maxp(L1+L2)

##          a          b          c          d 
## 0.09090991 0.10909114 0.07272658 0.72727237

Observe the dominance of competitor d, reasonable on the grounds that d won 8 of the 10 trials against b and c; and a, b, c are more or less evenly matched [chi square test, \(p=0.94\)]. Observe further that we can include four-way observations easily:

(L3 <- dirichlet(c(a=4,b=3,c=2,d=6)))

## log(a^4 * (a + b + c + d)^-15 * b^3 * c^2 * d^6)

L1+L2+L3

## log(a^8 * (a + b + c)^-12 * (a + b + c + d)^-15 * b^9 * (b + c + d)^-10
## * c^6 * d^14)

Package idiom L1+L2+L3 operates as expected because dirichlet() includes the denominator. Sometimes we see situations in which a competitor does not win any trials. Consider the following:

(L4 <- dirichlet(c(a=5,b=3,c=0,d=3)))

## log(a^5 * (a + b + c + d)^-11 * b^3 * d^3)

Above, we see that c won no trials and is not present in the numerator of the expression. However, L4 is informative about competitor c:

maxp(L4)

##            a            b            c            d 
## 4.546526e-01 2.727224e-01 1.020077e-06 2.726240e-01

We see that the maximum likelihood estimate for c is zero (to within numerical tolerance). Further, we may reject the hypothesis that \(p_c=\frac{1}{4}\) [which might be a reasonable consequence of the assumption that all four competitors have the same strength]:

specificp.test(L4,'c',0.25)

## 
##  Constrained support maximization
## 
## data:  H
## null hypothesis: sum p_i=1, c = 0.25
## null estimate:
##         a         b         c         d 
## 0.3122093 0.2162741 0.2500000 0.2215167 
## (argmax, constrained optimization)
## Support for null:  -14.93581 + K
## 
## alternative hypothesis:  sum p_i=1 
## alternative estimate:
##            a            b            c            d 
## 4.546526e-01 2.727224e-01 1.020077e-06 2.726240e-01 
## (argmax, free optimization)
## Support for alternative:  -11.738 + K
## 
## degrees of freedom: 1
## support difference = 3.197811
## p-value: 0.01144022 (two sided)

However, observe that we cannot reject the equality hypothesis, that is, \(p_a=p_b=p_c=p_d=\frac{1}{4}\):

equalp.test(L4)

## 
##  Constrained support maximization
## 
## data:  L4
## null hypothesis: a = b = c = d
## null estimate:
##    a    b    c    d 
## 0.25 0.25 0.25 0.25 
## (argmax, constrained optimization)
## Support for null:  -15.24924 + K
## 
## alternative hypothesis:  sum p_i=1 
## alternative estimate:
##            a            b            c            d 
## 4.546526e-01 2.727224e-01 1.020077e-06 2.726240e-01 
## (argmax, free optimization)
## Support for alternative:  -11.738 + K
## 
## degrees of freedom: 3
## support difference = 3.511242
## p-value: 0.07118459

References

Hankin, R. K. S. 2017. “Partial Rank Data with the hyper2 Package: Likelihood Functions for Generalized Bradley-Terry Models.” The R Journal 9 (2): 429–39.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.