The R Inference Functions
Contents
General properties
Samples
Summary
Rank
DIC
General properties
These R functions are for making inferences about parameters of the model or about the fit of the model. The commands are divided into three groups: the first group, 'Samples', concerns an entire set of monitored values for a variable; the next group, 'Summary' and 'Rank', comprises space-saving short-cuts that monitor running statistics; and the final group, 'DIC', concerns evaluation of the Deviance Information Criterion proposed by Spiegelhalter et al. (2002).
Users should ensure their simulation has converged before using functions in the Summary, Rank or DIC groups.
Note that if the MCMC simulation has an adaptive phase it will not be possible to make inference using values sampled before the end of this phase.
Samples...
These functions are for analysing stored samples of variables produced by the MCMC simulation.
The functions
samples.set, samples.clear, samples.stats, samples.history, samples.autoC, samples.density, samples.bgr, samples.correl
act on a variable of interest, which must be given as the node argument of the above samples functions. It can either be the name of a variable in the model or an R object with the same name as a variable in the model. If the variable of interest is an array, slices of the array can be selected using the notation variable[lower0:upper0, lower1:upper1, ...].
A star '*' can be entered as shorthand for all the stored samples. The beg and end arguments can be used to select a slice of monitored values corresponding to iterations beg:end. Likewise the firstChain and lastChain arguments can be used to select a sub-group of chains to calculate statistics for. The thin argument can be used to take only every thin-th value of the stored sample for statistics. If these arguments are left at their default values, the whole sample for all chains will be used in calculating statistics.
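For illustration, the way the beg, end and thin arguments pick out a subset of a stored chain can be sketched as follows (a Python sketch, not the package's internals; select_sample is a hypothetical helper introduced only for this example):

```python
# Illustrative sketch only: how the beg, end and thin arguments select a
# subset of a stored chain for calculating statistics.
def select_sample(chain, beg, end, thin=1):
    """Return every thin-th value of chain between iterations beg and end,
    assuming chain[i] holds iteration i + 1."""
    return chain[beg - 1:end:thin]

iterations = list(range(1, 101))              # pretend iterations 1..100
subset = select_sample(iterations, beg=51, end=100, thin=5)
print(subset)                                 # iterations 51, 56, ..., 96
```

With the defaults (beg at the first stored iteration, end at the last, thin = 1) every stored value is used.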
WinBUGS generally automatically sets up a logical node to measure a quantity known as deviance; this may be accessed, in the same way as any other variable of interest, by giving its name, i.e. "deviance", as the node argument of the samples functions. The definition of deviance is -2 * log(likelihood), where 'likelihood' is defined as p(y | theta): y comprises all stochastic nodes given values (i.e. data), and theta comprises the stochastic parents of y. 'Stochastic parents' are the stochastic nodes upon which the distribution of y depends, when collapsing over all logical relationships.
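As a concrete illustration of this definition (a Python sketch assuming a normal likelihood with mean mu and standard deviation sigma playing the role of theta; this is not BUGS code):

```python
import math

# Toy illustration of deviance = -2 * log(likelihood) for a normal
# likelihood: theta = (mu, sigma) plays the role of the stochastic
# parents of the data y.
def deviance(y, mu, sigma):
    log_lik = sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
                  - (yi - mu) ** 2 / (2 * sigma ** 2) for yi in y)
    return -2.0 * log_lik

print(deviance([0.0], mu=0.0, sigma=1.0))     # log(2 * pi), about 1.8379
```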
samples.set:
The function
samples.set(node)
is used to start recording a chain of values for the variable node.
samples.clear:
The function
samples.clear(node)
removes the stored values of the variable from computer memory.
samples.history:
The function
samples.history(
node, beg = samples.get.beg(), end = samples.get.end(), firstChain = samples.get.firstChain(), lastChain = samples.get.lastChain(),
thin = samples.get.thin()
)
plots out a complete trace for the variable.
The next four functions can only be executed if the MCMC simulation is not in an adaptive phase.
samples.density:
The function
samples.density(
node, beg = samples.get.beg(), end = samples.get.end(), firstChain = samples.get.firstChain(), lastChain = samples.get.lastChain(),
thin = samples.get.thin()
)
plots a smoothed kernel density estimate for the variable if it is continuous or a histogram if it is discrete.
samples.autoC:
The function
samples.autoC(
node, beg = samples.get.beg(), end = samples.get.end(), firstChain = samples.get.firstChain(), lastChain = samples.get.lastChain(),
thin = samples.get.thin()
)
plots the autocorrelation function of the variable.
samples.stats:
The function
samples.stats(
node, beg = samples.get.beg(), end = samples.get.end(), firstChain = samples.get.firstChain(), lastChain = samples.get.lastChain(),
thin = samples.get.thin()
)
produces summary statistics for the variable, pooling over the chains selected. The required percentiles can be selected via the percentile setting. The quantity reported in the MC error column gives an estimate of s / N^(1/2), the Monte Carlo standard error of the mean. The batch means method outlined by Roberts (1996; p. 50) is used to estimate s.
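The batch means idea can be sketched as follows (a Python sketch for illustration; the batch size and the handling of an incomplete final batch are simplifying assumptions, not necessarily what BUGS uses):

```python
import math

# Sketch of the batch-means estimate of the Monte Carlo standard error of
# the mean: split the chain into consecutive batches and use the spread of
# the batch means. Batch size and tail handling are simplified here.
def mc_error(chain, batch_size=50):
    n_batches = len(chain) // batch_size      # drop any incomplete tail batch
    means = [sum(chain[i * batch_size:(i + 1) * batch_size]) / batch_size
             for i in range(n_batches)]
    grand = sum(means) / n_batches
    var_of_means = sum((m - grand) ** 2 for m in means) / (n_batches - 1)
    # Standard error of the overall mean, estimated from the batch means.
    return math.sqrt(var_of_means / n_batches)

chain = [float(i % 2) for i in range(500)]    # alternating 0, 1: mean 0.5
print(mc_error(chain))                        # 0.0: every batch mean is identical
```

Batching absorbs the autocorrelation within each batch, so the batch means behave more like independent draws than the raw samples do.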
samples.bgr:
The function
samples.bgr(
node, beg = samples.get.beg(), end = samples.get.end(), firstChain = samples.get.firstChain(), lastChain = samples.get.lastChain(),
thin = samples.get.thin()
)
calculates the Gelman-Rubin convergence statistic, as modified by Brooks and Gelman (1998). The width of the central 80% interval of the pooled runs is green, the average width of the 80% intervals within the individual runs is blue, and their ratio R (= pooled / within) is red; for plotting purposes the pooled and within interval widths are normalised to have an overall maximum of one. The statistics are calculated in bins of length 50. R would generally be expected to be greater than 1 if the starting values are suitably over-dispersed. Brooks and Gelman (1998) emphasise that one should be concerned both with convergence of R to 1, and with convergence of both the pooled and within interval widths to stability.
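The pooled/within construction of the ratio R can be sketched as follows (a Python sketch with a naive nearest-rank quantile; the binning into runs of length 50 and the plot normalisation are not reproduced here):

```python
# Sketch of the interval-based Brooks-Gelman-Rubin ratio: width of the
# central 80% interval of the pooled chains divided by the average 80%
# interval width within the individual chains.
def quantile(xs, q):
    xs = sorted(xs)
    return xs[min(len(xs) - 1, int(round(q * (len(xs) - 1))))]

def bgr_ratio(chains):
    def width(xs):
        return quantile(xs, 0.9) - quantile(xs, 0.1)
    pooled = width([x for chain in chains for x in chain])
    within = sum(width(chain) for chain in chains) / len(chains)
    return pooled / within

same = [list(range(100)), list(range(100))]
shifted = [list(range(100)), [x + 100 for x in range(100)]]
print(bgr_ratio(same))     # 1.0: the chains agree
print(bgr_ratio(shifted))  # > 1: chains started over-dispersed
```

When the chains have mixed, the pooled interval is no wider than a typical within-chain interval and R falls towards 1.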
The following low level functions can be used to perform calculations on stored samples.
samples.set.beg:
The function
samples.set.beg(beg)
is used to set the first iteration of the stored sample used for calculating statistics to
beg
.
samples.set.end:
The function
samples.set.end(end)
is used to set the last iteration of the stored sample used for calculating statistics to
end.
samples.set.thin:
The function
samples.set.thin(thin)
is used to select every k-th iteration of each chain to contribute to the statistics being calculated, where k is the value of thin. Note the difference between this and the thinning facility of the update function: when thinning via the update function we are
permanently
discarding samples as the MCMC simulation runs, whereas here we have already generated (and stored) a suitable number of (posterior) samples and may wish to discard some of them only temporarily. Thus, setting
k
> 1 here will not have any impact on the storage (memory) requirements; if you wish to reduce the number of samples actually stored (to free-up memory) you should thin via the update function.
samples.set.firstChain:
The function
samples.set.firstChain(firstChain)
is used to set the first chain of the stored sample used for calculating statistics to be
firstChain
.
samples.set.lastChain:
The function
samples.set.lastChain(lastChain)
is used to set the last chain of the stored sample used for calculating statistics to be
lastChain
.
samples.get.beg:
The function
samples.get.beg()
returns the first iteration of the stored sample used for calculating statistics.
samples.get.end:
The function
samples.get.end()
returns the last iteration of the stored sample used for calculating statistics.
samples.get.thin:
The function
samples.get.thin()
returns the thin parameter.
samples.get.firstChain:
The function
samples.get.firstChain()
returns the first chain of the stored sample used for calculating statistics.
samples.get.lastChain:
The function
samples.get.lastChain()
returns the last chain of the stored sample used for calculating statistics.
The next three functions have the implicit arguments beg = samples.get.beg(), end = samples.get.end(), thin = samples.get.thin(), firstChain = samples.get.firstChain(), lastChain = samples.get.lastChain(). They can be used to retrieve stored samples for a set of nodes.
samples.size:
The
samples.size(node)
function returns the size of the stored sample for the scalar quantity node.
samples.sample:
The
samples.sample(node)
function returns an array of stored values for the scalar quantity node.
samples.monitors:
The
samples.monitors(node)
function returns a list of the names, as strings, of scalar quantities that have a monitor set and whose monitors have some stored values between beg and end. The
node
argument can be a vector quantity with sub-ranges given for its indices, or the shorthand '*'.
Summary...
The functions
summary.set, summary.stats, summary.clear
are used to
calculate running means, standard deviations and quantiles. The
functions are less powerful and general than the
samples
functions, but they also require much less storage (an important consideration when many variables and/or long runs are of interest). They take a single argument
node
which can either be the name of a quantity in the model, given as a string, or an R object with the same name as a quantity in the model.
summary.set:
The function
summary.set(node)
creates a monitor that starts recording the running totals for
node
.
summary.stats:
The function
summary.stats(node)
displays the running means, standard deviations, and 2.5%, 50% (median) and 97.5% quantiles for
node
.
Note that these running quantiles are calculated via an approximate algorithm and should therefore be used with caution.
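The space saving comes from keeping only running totals rather than the full sample. The exact part of such a monitor can be sketched as follows (a Python sketch using Welford's online algorithm for the mean and standard deviation; the approximate running-quantile algorithm is not reproduced here):

```python
# Sketch of a summary monitor: running mean and standard deviation kept in
# O(1) storage via Welford's online algorithm, however long the run.
class RunningSummary:
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0          # running sum of squared deviations

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    def sd(self):
        return (self._m2 / (self.n - 1)) ** 0.5 if self.n > 1 else 0.0

s = RunningSummary()
for x in [1.0, 2.0, 3.0, 4.0]:
    s.update(x)
print(s.mean, s.sd())           # 2.5 and sqrt(5/3)
```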
summary.clear:
The function
summary.clear(node)
removes the monitor calculating running totals for
node
.
Rank...
The functions
rank.set, rank.stats, rank.clear
are used to
calculate ranks of vector valued quantities in the model.
They take a single argument
node
which can either be the name of a quantity in the model, given as a string, or an R object with the same name as a quantity in the model.
rank.set:
The function
rank.set(node)
creates a monitor that starts building running histograms to represent the rank of each component of
node
. An amount of storage proportional to the square of the number of components of
node
is allocated. Even when
node
has thousands of components this can require less storage than calculating the ranks explicitly in the model specification and storing their samples, and it is also much quicker.
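The idea behind the running histograms can be sketched as follows (a Python sketch; rank_histograms is a hypothetical helper introduced only for this illustration):

```python
# Sketch of a rank monitor: at each iteration, rank the components of the
# sampled vector and increment one histogram cell per component. Storage is
# k * k counts for a k-component node, independent of run length.
def rank_histograms(samples):
    k = len(samples[0])
    hist = [[0] * k for _ in range(k)]   # hist[i][r]: iterations where component i had rank r
    for vec in samples:
        order = sorted(range(k), key=lambda i: vec[i])
        for r, i in enumerate(order):
            hist[i][r] += 1
    return hist

draws = [[3.0, 1.0, 2.0], [3.0, 1.0, 2.0]]
print(rank_histograms(draws))
# component 1 is always smallest (rank 0); component 0 always largest (rank 2)
```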
rank.stats:
The function
rank.stats(node)
displays summaries of the distribution of the ranks of each component of the variable
node
.
rank.clear:
The function
rank.clear(node)
removes the monitor calculating running histograms for
node
.
DIC...
The
DIC
functions are used to evaluate the
Deviance Information Criterion
(DIC;
Spiegelhalter
et al
., 2002
) and related statistics
-
these can be used to assess model complexity and compare different models. Most of the
examples
packaged with
WinBUGS
contain an example of their usage.
It is important to note that DIC assumes the posterior mean to be a good estimate of the stochastic parameters. If this is not so, say because of extreme skewness or even bimodality, then DIC may not be appropriate. There are also circumstances, such as with mixture models, in which WinBUGS will not permit the calculation of DIC and so the menu option is greyed out. Please see the WinBUGS 1.4 web-page for current restrictions:
http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml
dic.set:
The function
dic.set()
creates monitors that start calculating DIC and related statistics. The user should ensure that convergence has been achieved before calling dic.set(), as all subsequent iterations will be used in the calculation.
dic.clear:
The function
dic.clear()
deletes monitors that have been created for calculating DIC and related statistics.
dic.stats:
The function
dic.stats()
displays the calculated statistics, as described below; please see
Spiegelhalter
et al
. (2002)
for full details; the section
Tricks: Advanced Use of the BUGS Language
also contains some comments on the use of DIC.
Dbar:
this is the posterior mean of the deviance, which is exactly the same as if the node 'deviance' had been monitored (see the Samples section above). This deviance is defined as -2 * log(likelihood), where 'likelihood' is defined as p(y | theta): y comprises all stochastic nodes given values (i.e. data), and theta comprises the stochastic parents of y. 'Stochastic parents' are the stochastic nodes upon which the distribution of y depends, when collapsing over all logical relationships.
Dhat:
this is a point estimate of the deviance (-2 * log(likelihood)) obtained by substituting in the posterior means theta.bar of theta: thus Dhat = -2 * log(p(y | theta.bar)).
pD:
this is 'the effective number of parameters', and is given by
pD = Dbar - Dhat
. Thus pD is the posterior mean of the deviance minus the deviance of the posterior means.
DIC:
this is the 'Deviance Information Criterion', and is given by
DIC = Dbar + pD = Dhat + 2 * pD
. The model with the smallest DIC is estimated to be the model that would best predict a replicate dataset of the same structure as that currently observed.
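The relationships between Dbar, Dhat, pD and DIC can be worked through with a toy model (a Python sketch assuming a normal likelihood with known standard deviation 1, with posterior draws of the mean supplied directly; all names here are illustrative, not part of the package):

```python
import math

# Toy sketch of Dbar, Dhat, pD and DIC for a normal likelihood with sd = 1,
# given posterior draws of the mean mu.
def deviance(y, mu):
    log_lik = sum(-0.5 * math.log(2 * math.pi) - 0.5 * (yi - mu) ** 2
                  for yi in y)
    return -2.0 * log_lik

def dic(y, mu_draws):
    dbar = sum(deviance(y, m) for m in mu_draws) / len(mu_draws)  # posterior mean deviance
    dhat = deviance(y, sum(mu_draws) / len(mu_draws))             # deviance at posterior mean
    pd = dbar - dhat                                              # effective number of parameters
    return dbar, dhat, pd, dbar + pd                              # DIC = Dbar + pD

dbar, dhat, pd, dic_value = dic([0.0], [-1.0, 1.0])
print(pd)       # close to 1.0: Dbar exceeds Dhat by the spread of the draws
```

Averaging the deviance over draws (Dbar) always gives at least the deviance at the averaged draw (Dhat) for a likelihood of this form, so pD is non-negative here and grows with posterior uncertainty.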