The rat tumor data consists of observations of endometrial stromal polyp incidence in \(k=70\) groups of rats. For each group, \(y_i\) is the number of rats with polyps and \(n_i\) is the total number of rats in that group. Here we describe the analysis of the rat tumor data using Bayes-\({\rm DS}(G,m)\) modeling.
Step 1. We begin by finding the starting parameter values for \(g \sim Beta(\alpha, \beta)\) by MLE:
library(BayesGOF)
set.seed(8697)
data(rat)
###Use MLE to determine starting values
rat.start <- gMLE.bb(rat$y, rat$n)$estimate
We use our starting parameter values to run the main DS.prior function:
rat.ds <- DS.prior(rat, max.m = 6, rat.start, family = "Binomial")
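To make the starting-value step in Step 1 concrete, the following is a minimal base-R sketch of a beta-binomial maximum likelihood fit. It is not the internal code of gMLE.bb; the toy vectors `y` and `n`, the helper `negloglik`, and the result name `bb.start` are hypothetical and chosen only for illustration.

```r
# Hypothetical toy data (NOT the rat data): tumor counts y out of n per group.
y <- c(0, 1, 2, 1, 4, 3, 0, 5, 2, 6)
n <- c(20, 20, 19, 18, 25, 20, 17, 24, 20, 22)

# Negative beta-binomial log-likelihood, parameterized on the log scale
# so that alpha and beta stay positive during the optimization.
negloglik <- function(par, y, n) {
  a <- exp(par[1])
  b <- exp(par[2])
  -sum(lchoose(n, y) + lbeta(y + a, n - y + b) - lbeta(a, b))
}

# Nelder-Mead search (the optim default) starting from alpha = beta = 1.
fit <- optim(c(0, 0), negloglik, y = y, n = n)
bb.start <- setNames(exp(fit$par), c("alpha", "beta"))
bb.start
```

Optimizing on the log scale avoids the need for box constraints; any optimizer that respects positivity would serve equally well here.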
Step 2. We display the U-function to quantify and characterize the uncertainty of the a priori selected \(g\):
plot(rat.ds, plot.type = "Ufunc")
The deviations from the uniform distribution (the red dashed line) indicate that our initial selection for \(g\), \(\text{Beta}(\alpha = 2.3,\beta = 14.1)\), is incompatible with the observed data and requires repair; the data indicate that there are, in fact, two different groups of incidence in the rats.
Step 3a. Extract the parameters for the corrected prior \(\hat{\pi}\):
rat.ds
## $g.par
## alpha beta
## 2.304768 14.079707
##
## $LP.coef
## LP1 LP2 LP3
## 0.0000000 0.0000000 -0.5040361
Therefore, the DS prior \(\hat{\pi}\) given \(g\) is: \[\hat{\pi}(\theta) = g(\theta; \alpha,\beta)\Big[1 - 0.50T_3(\theta;G) \Big]\]
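The corrected prior can be evaluated by hand from the printed output. The sketch below assumes that \(T_3(\theta;G)\) is the degree-3 orthonormal shifted Legendre polynomial evaluated at \(G(\theta)\), the Beta CDF; this form is an assumption made for illustration, and the helper names `leg3` and `ds.prior` are hypothetical rather than part of the package.

```r
# Orthonormal shifted Legendre polynomial of degree 3 on [0, 1].
leg3 <- function(u) sqrt(7) * (20 * u^3 - 30 * u^2 + 12 * u - 1)

alpha <- 2.304768    # $g.par from the output above
beta  <- 14.079707
lp3   <- -0.5040361  # $LP.coef, LP3

# ASSUMED form: pi.hat(theta) = g(theta) * [1 + LP3 * T_3(theta; G)],
# with T_3(theta; G) = leg3(G(theta)) and G the Beta(alpha, beta) CDF.
ds.prior <- function(theta) {
  dbeta(theta, alpha, beta) * (1 + lp3 * leg3(pbeta(theta, alpha, beta)))
}

# leg3 integrates to zero on [0, 1], so the correction factor
# leaves the total mass of g unchanged.
total.mass <- integrate(ds.prior, 0, 1)$value
total.mass
```

Under this assumption the bracketed factor only redistributes the mass of \(g\); it does not change the total.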
Step 3b. We can now plot the estimated DS prior \(\hat{\pi}\) along with the original parametric \(g\):
plot(rat.ds, plot.type = "DSg", main = "DS vs g: Rat")
Step 4. Here we are interested in the overall macro-level inference obtained by combining the \(k=70\) parallel studies. The group-specific modes along with their SEs can be computed as follows:
rat.macro.md <- DS.macro.inf(rat.ds, num.modes = 2 , iters = 25, method = "mode")
rat.macro.md
## 1SD Lower Limit Mode 1SD Upper Limit
## [1,] 0.0161 0.0340 0.0520
## [2,] 0.1442 0.1562 0.1681
plot(rat.macro.md, main = "MacroInference: Rat")
Step 5. Given an additional study \(\theta_{71}\) where \(y_{71} = 4\) and \(n_{71} = 14\), the goal is to estimate the probability of a tumor for this new clinical study. The following code performs the desired microinference (posterior distribution along with its mean and mode):
rat.y71.micro <- DS.micro.inf(rat.ds, y.0 = 4, n.0 = 14)
rat.y71.micro
## Posterior summary for y = 4, n = 14:
## Posterior Mean = 0.1897
## Posterior Mode = 0.1833
## Use plot(x) to generate posterior plot
plot(rat.y71.micro, main = "Rat (4,14)")
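As a point of comparison for the microinference in Step 5, the posterior under the uncorrected parametric \(g\) is available in closed form by Beta-Binomial conjugacy. The sketch below computes that baseline only; it does not reproduce the DS posterior, and the names `a.post`, `b.post`, `peb.mean`, and `peb.mode` are hypothetical.

```r
alpha <- 2.304768    # $g.par reported for the rat data
beta  <- 14.079707
y0 <- 4              # the new study from Step 5
n0 <- 14

# Conjugate update under the uncorrected g:
# theta | y0 ~ Beta(alpha + y0, beta + n0 - y0).
a.post <- alpha + y0
b.post <- beta + n0 - y0

peb.mean <- a.post / (a.post + b.post)
peb.mode <- (a.post - 1) / (a.post + b.post - 2)
round(c(mean = peb.mean, mode = peb.mode), 4)
```

The parametric posterior mean comes out near 0.21, somewhat above the DS posterior mean of 0.1897 reported above, which shows how much the correction to \(g\) shifts the estimate for this study.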
For this example, we will focus on the macroinference for the arsenic data set. The arsenic data set details the measurements of the level of arsenic in oyster tissue from \(k=28\) laboratories.
Step 1. We begin by finding the starting parameter values for \(g \sim Normal(\mu, \tau^2)\) by MLE:
data(arsenic)
arsn.start <- gMLE.nn(arsenic$y, arsenic$se, method = "DL")$estimate
We use our starting parameter values to run the main DS.prior function:
arsn.ds <- DS.prior(arsenic, max.m = 8, arsn.start, family = "Normal")
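The starting values for the Normal family in Step 1 are requested with method = "DL". Assuming this refers to the DerSimonian-Laird moment estimator, a minimal base-R sketch of that estimator looks as follows. The toy data and the function name `dl.estimate` are hypothetical, and this is not the package's internal code.

```r
# Hypothetical toy data (NOT the arsenic data): estimates and standard errors.
y  <- c(9.8, 11.2, 13.1, 13.6, 12.9, 14.0, 10.5)
se <- c(0.6, 0.8, 0.5, 0.7, 0.4, 0.9, 0.6)

dl.estimate <- function(y, se) {
  w     <- 1 / se^2                   # fixed-effect (inverse-variance) weights
  mu.fe <- sum(w * y) / sum(w)        # fixed-effect pooled mean
  Q     <- sum(w * (y - mu.fe)^2)     # Cochran's Q statistic
  k     <- length(y)
  # Moment estimate of the between-study variance, truncated at zero.
  tau2  <- max(0, (Q - (k - 1)) / (sum(w) - sum(w^2) / sum(w)))
  w.re  <- 1 / (se^2 + tau2)          # random-effects weights
  c(mu = sum(w.re * y) / sum(w.re), tau2 = tau2)
}

dl.est <- dl.estimate(y, se)
dl.est
```

The two returned values play the role of \(\mu\) and \(\tau^2\) in \(g \sim Normal(\mu, \tau^2)\).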
Step 2. We display the U-function to quantify and characterize the uncertainty of the a priori selected \(g\):
plot(arsn.ds, plot.type = "Ufunc")
Step 3. We now extract the parameters for the corrected prior \(\hat{\pi}\) and plot it, along with the original \(g\):
arsn.ds
## $g.par
## mu tau^2
## 13.220522 3.407165
##
## $LP.coef
## LP1 LP2 LP3 LP4 LP5 LP6
## 0.0000000 -0.4777655 -0.5091652 0.4401269 0.3457535 -0.3862848
plot(arsn.ds, plot.type = "DSg", main = "DS vs g: Arsenic")
Step 4. We now execute the macroinference to find a global estimate to summarize the \(k = 28\) studies.
arsn.macro <- DS.macro.inf(arsn.ds, num.modes = 2, iters = 25, method = "mode")
arsn.macro
## 1SD Lower Limit Mode 1SD Upper Limit
## [1,] 10.1102 10.776 11.4418
## [2,] 13.0750 13.470 13.8649
Based on our results, we find two significant modes. Therefore, the prior shows structured heterogeneity and requires both modes to describe the distribution and its two groups. We plot the results, including an interval for one standard error for each mode.
plot(arsn.macro, main = "MacroInference: Arsenic Data")
The next example conducts microinference on the child illness data. The data come from a study in which researchers followed \(k=602\) pre-school children in north-east Thailand, recording the number of times (\(y\)) a child became sick during every two-week period for over three years. In particular, we want to compare the posterior distributions for children who became sick 1, 3, 5, and 10 times.
Step 1. We begin by finding the starting parameter values for \(g \sim Gamma(\alpha, \beta)\) by MLE:
data(ChildIll)
child.start <- gMLE.pg(ChildIll)
We use our starting parameter values to run the main DS.prior function for the Poisson family:
child.ds <- DS.prior(ChildIll, max.m = 8, child.start, family = "Poisson")
Step 2. We display the U-function to quantify and characterize the uncertainty of the selected \(g\):
plot(child.ds, plot.type = "Ufunc")
Step 3. We now extract the parameters for the corrected prior \(\hat{\pi}\):
child.ds
## $g.par
## alpha beta
## 1.060878 4.193337
##
## $LP.coef
## LP1 LP2 LP3 LP4 LP5 LP6
## 0.0000000 0.0000000 -0.1259159 0.0000000 0.0000000 -0.2797667
The DS prior \(\hat{\pi}\) given \(g\) is: \[\hat{\pi}(\theta) = g(\theta; \alpha,\beta)\Big[1 - 0.13T_3(\theta;G) - 0.28T_6(\theta;G) \Big].\] We can plot \(\hat{\pi}\), along with \(g\):
plot(child.ds, plot.type = "DSg", main = "DS vs. g: Child Illness Data")
Step 4. The plot shows some very interesting behavior in \(\hat{\pi}\). We want to explore the posterior distributions for \(y = 1,3,5,10\). For those results, we use the microinference functions.
child.micro.1 <- DS.micro.inf(child.ds, y.0 = 1)
child.micro.3 <- DS.micro.inf(child.ds, y.0 = 3)
child.micro.5 <- DS.micro.inf(child.ds, y.0 = 5)
child.micro.10 <- DS.micro.inf(child.ds, y.0 = 10)
By plotting the posterior distributions we see how the distributions change based on the number of times a child is ill. The plots for each of the four microinferences are shown below.
plot(child.micro.1, xlim = c(0,10), main = "y = 1")
plot(child.micro.3, xlim = c(0,10), main = "y = 3")
plot(child.micro.5, xlim = c(0,10), main = "y = 5")
plot(child.micro.10, xlim = c(0,20), main = "y = 10")
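For reference, the posterior under the uncorrected Gamma \(g\) has a closed form by Gamma-Poisson conjugacy, which gives a baseline against which the DS posteriors can be read. The sketch below assumes that \(\beta\) is a scale parameter (this parameterization is an assumption, not confirmed by the output above), and the function name `peb.posterior` is hypothetical.

```r
alpha <- 1.060878    # $g.par reported for the child illness data
beta  <- 4.193337

# ASSUMPTION: beta is a scale parameter, so the prior density is
# proportional to theta^(alpha - 1) * exp(-theta / beta).
# With a Poisson likelihood, the posterior is then
# Gamma(shape = alpha + y0, scale = beta / (beta + 1)).
peb.posterior <- function(y0) {
  shape <- alpha + y0
  scale <- beta / (beta + 1)
  c(y = y0, mean = shape * scale, mode = max(shape - 1, 0) * scale)
}

peb.table <- t(sapply(c(1, 3, 5, 10), peb.posterior))
peb.table
```

If \(\beta\) is instead a rate parameter, the posterior scale becomes \(1/(\beta + 1)\) and the numbers change accordingly.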