Human physical growth is not a smooth progression through time; it is inherently dynamic. Growth occurs in a series of spurts, reflecting increases in growth velocity (Bogin, 2010; Cameron & Bogin, 2012a; Stulp & Barrett, 2016). The adolescent growth spurt (also known as the pubertal growth spurt) is experienced by all individuals and is the most readily recognized aspect of adolescence. A clear understanding of growth dynamics during adolescence is essential when planning treatment to correct skeletal discrepancies such as scoliosis, in which the spine curves sideways (Busscher et al., 2012; Sanders et al., 2007). Similarly, the timing of peak growth velocity is a crucial factor to consider when initiating growth modification procedures to correct skeletal malocclusion, which is characterized by discrepancies in jaw size (Dean et al., 2016; Proffit, 2014a, 2014b).
As the concept of change is central to studying growth, an accurate understanding of growth dynamics and the relationship between different growth phases can only be obtained from longitudinal growth data (Cameron & Bogin, 2012b; Hauspie et al., 2004a). Analyzing longitudinal growth data with an appropriate statistical method allows for the description of growth trajectories (distance and velocity) and the estimation of the timing and intensity of the adolescent growth spurt. Whereas linear growth curve models (GCMs) based on conventional polynomials were extensively used to study human growth in earlier years (McArdle, 2015), nonlinear mixed-effects models are now gaining popularity in modeling longitudinal skeletal growth data, such as height (Hauspie et al., 2004b; McArdle, 2015).
The superimposition by translation and rotation (SITAR) model is a shape-invariant nonlinear mixed-effects growth curve model that fits a population average (i.e., mean) curve to the data and aligns each individual’s growth trajectory to the population average curve using a set of three random effects (Cole et al., 2010): size relative to the mean growth curve (vertical shift), timing of the adolescent growth spurt relative to the average age at peak growth velocity (horizontal shift), and the intensity of the growth spurt compared to the mean growth intensity (horizontal stretch). The concept of the shape-invariant model (SIM) was first described by Lindstrom (1995) and later used by Beath (2007) for modeling infant growth data (birth to 2 years). The current version of the SITAR model that we describe below was developed by Cole et al. (2010). The SITAR model is particularly useful for modeling human physical growth during adolescence. Recent studies have used SITAR to analyze height and weight data (Cole & Mori, 2018; Mansukoski et al., 2019; Nembidzane et al., 2020; Riddell et al., 2017), as well as to study jaw growth during adolescence (Sandhu, 2020). All of these studies estimated the SITAR model within the frequentist framework, as implemented in the R package, ‘sitar’ (T. Cole, 2022).
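For orientation, a frequentist SITAR model can be fitted with the 'sitar' package in a few lines. The sketch below assumes the `heights` example data bundled with 'sitar' (serial height measurements on girls); the choice of `df = 5` spline degrees of freedom is arbitrary and for illustration only.

```r
# Frequentist SITAR fit with the 'sitar' package (for comparison).
# 'heights' is an example dataset shipped with 'sitar'; df sets the
# degrees of freedom of the natural cubic spline for the mean curve.
library(sitar)

fit <- sitar(x = age, y = height, id = id, data = heights, df = 5)

summary(fit)            # fixed effects and random-effect SDs
plot(fit, opt = "dv")   # fitted mean distance and velocity curves
```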
Consider a dataset consisting of \(J\) individuals \((j = 1, \dots, J)\), where individual \(j\) provides \(n_j\) measurements \((i = 1, \dots, n_j)\) of height (\(y_{ij}\)), recorded at age \(x_{ij}\). The two-level SITAR model is then written as:
\[\begin{equation} \label{eq:1} y_{ij}=\alpha_{0}+\alpha_{j}+\sum_{r=1}^{p-1}{\beta_r\mathbf{Spl}\left(\frac{x_{ij}-\bar{x}-\left(\zeta_0+\zeta_j\right)}{e^{-\left(\gamma_0+\gamma_j\right)}}\right)}+e_{ij} \end{equation}\]where \(\mathbf{Spl}(.)\) is the natural cubic spline function that generates the spline design matrix, and \(\beta_1, \dots, \beta_{p-1}\) are the spline regression coefficients for the mean curve, with \(\alpha_0\), \(\zeta_0\), and \(\gamma_0\) representing the population average size, timing, and intensity parameters. By default, the predictor, age (\(x_{ij}\)), is mean-centered by subtracting the mean age \(\bar{x}\), where \(\bar{x} = \frac{1}{N} \sum_{i,j} x_{ij}\) over all \(N\) observations. The individual-specific random effects for size (\(\alpha_j\)), timing (\(\zeta_j\)), and intensity (\(\gamma_j\)) describe how an individual’s growth trajectory differs from the mean growth curve. The residuals \(e_{ij}\) are assumed to be normally distributed with zero mean and residual variance parameter \(\sigma^2\), and are independent of the random effects. The random effects are assumed to be multivariate normally distributed with zero means and an unstructured variance-covariance matrix (i.e., distinct variances and covariances between random effects), as shown below:
\[\begin{equation} \label{eq:2} \begin{pmatrix} \alpha_j \\ \zeta_j \\ \gamma_j \end{pmatrix} \sim \operatorname{MVN}\left(\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix},\ \begin{pmatrix} \sigma_{\alpha_j}^2 & \rho_{\alpha_j\zeta_j} & \rho_{\alpha_j\gamma_j} \\ \rho_{\zeta_j\alpha_j} & \sigma_{\zeta_j}^2 & \rho_{\zeta_j\gamma_j} \\ \rho_{\gamma_j\alpha_j} & \rho_{\gamma_j\zeta_j} & \sigma_{\gamma_j}^2 \end{pmatrix}\right), \quad \text{for individual } j = 1, \ldots, J \end{equation}\]Here we describe the Bayesian model specification for the two-level SITAR model, along with the default priors specified for each parameter. To better convey the data-generating mechanism and to simplify the presentation of the prior specification for each individual parameter, we re-express the model in Equation \(\ref{eq:1}\) as follows:
\[\begin{equation} \label{eq:3} \begin{aligned} \text{y}_{ij} & \sim \operatorname{Normal}(\mu_{ij}, \sigma) \\ \\ \mu_{ij} & = \alpha_0+ \alpha_j+\sum_{r=1}^{p-1}{\beta_r\mathbf{Spl}\left(\frac{x_{ij}-\bar{x}-\left(\zeta_0+\zeta_j\right)}{e^{-\left(\gamma_0+\gamma_j\right)}}\right)} \\ \sigma & = \sigma_\epsilon \\ \\ \begin{bmatrix} \alpha_{j} \\ \zeta_{j} \\ \gamma_{j} \end{bmatrix} & \sim {\operatorname{MVNormal}\begin{pmatrix} \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix},\ \mathbf \Sigma_{ID} \end{pmatrix}} \\ \\ \mathbf {\Sigma_{ID}} & = \mathbf S \mathbf R \mathbf S \\ \\ \mathbf S & = \begin{bmatrix} \sigma_{\alpha_j} & 0 & 0 \\ 0 &\sigma_{\zeta_j} & 0 \\ 0 & 0 & \sigma_{\gamma_j} \end{bmatrix} \\ \\ \mathbf R & = \begin{bmatrix} 1 & \rho_{\alpha_j\zeta_j} & \rho_{\alpha_j\gamma_j} \\\rho_{\zeta_j\alpha_j} & 1 & \rho_{\zeta_j\gamma_j} \\\rho_{\gamma_j\alpha_j} & \rho_{\gamma_j\zeta_j} & 1 \end{bmatrix} \\ \\ \alpha_0 & \sim \operatorname{normal}(\ y_{mean},\ {y_{sd}}) \\ \zeta_0 & \sim \operatorname{normal}(\ 0,\ 1.5) \\ \gamma_0 & \sim \operatorname{normal}(\ 0,\ 0.5) \\ \beta_1, \dots, \beta_{p-1} & \sim \operatorname{normal}(\ 0, \ \mathbf {X_{scaled}}) \\ \sigma_{\alpha_j} & \sim \operatorname{normal_{Half}}(\ {0},\ {y_{sd}}) \\ \sigma_{\zeta_j} & \sim \operatorname{normal_{Half}}(\ {0},\ {1.0}) \\ \sigma_{\gamma_j} & \sim \operatorname{normal_{Half}}(\ {0},\ {0.25}) \\ \sigma_\epsilon & \sim \operatorname{normal_{Half}}(\ {0},\ {y_{sd}}) \\ \mathbf R & \sim \operatorname{LKJ}(1), \end{aligned} \end{equation}\]
The first line in the equation above represents the likelihood, which states that the outcome \(y_{ij}\) is normally distributed with mean \(\mu_{ij}\) and standard deviation \(\sigma\), set equal to the residual standard deviation \(\sigma_\epsilon\). The mean \(\mu_{ij}\) is a function of the growth curve parameters, as described earlier (see Equation \(\ref{eq:1}\)). The unstructured variance-covariance matrix \(\mathbf {\Sigma_{ID}}\) is the same as shown earlier (Equation \(\ref{eq:2}\)), and is constructed using the separation strategy followed in Stan: the variance-covariance matrix \(\mathbf {\Sigma_{ID}}\) is decomposed into a diagonal matrix \(\mathbf S\) (composed of the standard deviations) and a correlation matrix \(\mathbf R\). The residuals are assumed to be independent and normally distributed with a mean of 0 and an \(n_j \times n_j\) identity covariance matrix with a constant variance parameter on the diagonal, i.e., \(\mathbf{I}\sigma_\epsilon^2\), where \(\mathbf I\) is the identity matrix (a diagonal matrix of 1s) and \(\sigma_\epsilon\) is the residual standard deviation. In other words, the residual (i.e., within-individual) variance matrix is composed of a single parameter, \(\sigma_\epsilon^2\), which forms the diagonal of the matrix. The assumption of homoscedasticity (constant variance) of the residuals can be relaxed.
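As a minimal numeric sketch of the separation strategy (with arbitrary illustrative values, not estimates from any dataset), the unstructured covariance matrix can be rebuilt from its standard deviation and correlation components:

```r
# Separation strategy: Sigma_ID = S %*% R %*% S, where S holds the
# random-effect standard deviations and R their correlations.
# All values below are arbitrary, for illustration only.
S <- diag(c(4.0, 0.8, 0.15))                 # SDs: size, timing, intensity
R <- matrix(c( 1.0,  0.5, -0.2,
               0.5,  1.0,  0.1,
              -0.2,  0.1,  1.0),
            nrow = 3, byrow = TRUE)          # correlation matrix
Sigma_ID <- S %*% R %*% S                    # unstructured covariance matrix
Sigma_ID
```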
We follow the recommendations made in the popular packages 'rstanarm' and 'brms' for prior specification. Technically, the priors used in the 'rstanarm' and 'brms' packages are data-dependent, and hence weakly informative. This is because the priors are scaled based on the distribution (i.e., standard deviation) of the outcome and predictor(s). However, the amount of information used is weak and mainly regulatory, helping to stabilize the computation. An important feature of this approach is that the default priors are reasonable for many models. Like 'rstanarm' and 'brms', the 'bsitar' package offers full flexibility in setting a wide range of priors and encourages users to specify priors that reflect their prior knowledge of human growth processes.
Similar to the 'brms' and 'rstanarm' packages, the 'bsitar' package allows the user to control the scale parameter for location-scale based priors, such as the `normal`, `student_t`, and `cauchy` distributions, via the `autoscale` option.
Here again we adopt an amalgamation of the best strategies offered by the 'brms' and 'rstanarm' packages. While 'rstanarm' earlier set `autoscale = TRUE`, which transformed the prior by multiplying the scale parameter by a fixed value of \(2.5\) (the authors have since changed this default to FALSE), the 'brms' package sets the scale factor between \(1.0\) and \(2.5\) depending on the Median Absolute Deviation (MAD) of the outcome: if the MAD is less than \(2.5\), the prior is scaled by a factor of \(2.5\); otherwise the scale factor is \(1.0\) (i.e., no auto-scaling). The 'bsitar' package, on the other hand, offers full flexibility in choosing the scale factor via the built-in `autoscale` option. Setting `autoscale = TRUE` scales the prior by a factor of \(2.5\) (as earlier used in 'rstanarm'). However, as mentioned earlier, the scaling factor can be set to any real number, such as \(1.5\) (e.g., `autoscale = 1.5`). The `autoscale` option is available for all location-scale based distributions, such as `normal`, `student_t`, and `cauchy`. We strongly recommend going through the documentation on priors included in the 'brms' and 'rstanarm' packages.
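A hedged sketch of how these options might be combined in a 'bsitar' call is shown below. The argument names (`a_prior_beta`, `b_prior_beta`, `c_prior_beta`) and the prior syntax follow the conventions described in this section, but they are assumptions here, so consult `?bsitar` for the exact interface; `mydata` is a placeholder for the user's longitudinal data.

```r
# Hedged sketch: overriding default priors in bsitar().
# Argument names and prior syntax are assumptions based on the
# conventions described in this section; see ?bsitar for the
# exact interface. 'mydata' is a placeholder data frame.
library(bsitar)

fit <- bsitar(
  x = age, y = height, id = id,
  data = mydata, df = 4,
  a_prior_beta = normal(ymean, ysd, autoscale = TRUE),  # size, scaled by 2.5
  b_prior_beta = normal(0, 1.5, autoscale = FALSE),     # timing
  c_prior_beta = normal(0, 0.5, autoscale = FALSE)      # intensity
)
```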
Below we describe the default priors used for the regression coefficients, as well as for the standard deviations of the random effects for the `a` (size), `b` (timing), and `c` (intensity) parameters and their correlations. The default distribution for each parameter (regression coefficients, standard deviations of the group-level random effects, and residuals) is the normal distribution.
The population regression parameter `a` is assigned a 'normal' prior centered at \(y_{\text{mean}}\) (i.e., the mean of the outcome), with scale defined as the standard deviation of the outcome, \(y_{\text{sd}}\), multiplied by the default scaling factor \(2.5\), i.e., `normal(ymean, ysd, autoscale = TRUE)`. The prior on the standard deviation of the random effect for the `a` parameter is identical to that for the regression parameter, except that it is centered at 0, i.e., `normal(0, ysd, autoscale = TRUE)`.
For the population regression parameter `b`, the default prior is a 'normal' distribution with mean 0 and a scale of 1.5, i.e., `normal(0, 1.5, autoscale = FALSE)`. Note that the `autoscale` option is set to FALSE. Since the predictor age is typically mean-centered when fitting the SITAR model, this prior implies that 95% of the distribution's mass (assuming it approaches a normal curve) for the timing parameter will cover the range between 11 and 17 years when age is centered at 14 years. Depending on the mean age, and on whether the data come from males or females, the scale can be adjusted accordingly. The prior for the standard deviation of the `b` parameter is also normal, with a mean of 0 and a standard deviation of 1.0 (`normal(0, 1.0, autoscale = FALSE)`), implying that 95% of the distribution's mass for individual variability in the timing parameter will cover 4 years (\(\pm 2.0\)) around the population average parameter `b`.
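These implied ranges are easy to verify directly. The short check below computes the 95% interval implied by the `normal(0, 1.5)` timing prior, assuming (for illustration) that age is mean-centered at 14 years:

```r
# 95% prior mass for timing implied by normal(0, 1.5) on mean-centered
# age, translated back to the age scale (mean age of 14 years assumed).
mean_age <- 14
qnorm(c(0.025, 0.975), mean = mean_age, sd = 1.5)
#> [1] 11.06005 16.93995    # roughly 11 to 17 years
```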
The default prior for the population average intensity parameter `c` is 'normal' with a mean of 0 and a scale of 0.5, i.e., `normal(0, 0.5, autoscale = FALSE)`. Note that the intensity parameter is estimated on the `exp` scale, and is therefore interpreted approximately as a percentage difference in growth velocity. For the standard deviation of the `c` parameter, the assigned prior is `normal(0, 0.25, autoscale = FALSE)`.
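Because the intensity parameters enter the model through `exp(.)`, small values translate approximately into percentage differences, as this short check illustrates:

```r
# Intensity acts multiplicatively through exp(c): small values of c are
# approximately percentage differences in growth velocity.
c_vals <- c(-0.1, 0, 0.1)
round(exp(c_vals), 3)
#> [1] 0.905 1.000 1.105    # about -9.5%, 0%, and +10.5%
```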
The prior for the correlations between the random effect parameters follows the Lewandowski-Kurowicka-Joe (LKJ) distribution. The LKJ prior is specified via a single parameter, `eta`. If `eta = 1` (the default), all correlation matrices are equally likely a priori. If `eta > 1`, extreme correlations become less likely, whereas if `0 < eta < 1`, higher probabilities are assigned to extreme correlations. See the 'brms' documentation for more details.
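For intuition, in the \(2 \times 2\) case the LKJ(`eta`) density of the single correlation \(\rho\) is proportional to \((1 - \rho^2)^{\eta - 1}\), which makes the effect of `eta` easy to inspect numerically:

```r
# LKJ(eta) kernel for a single correlation (2 x 2 case):
# proportional to (1 - rho^2)^(eta - 1).
lkj_kernel <- function(rho, eta) (1 - rho^2)^(eta - 1)
rho <- c(-0.9, -0.5, 0, 0.5, 0.9)
round(rbind(eta_0.5 = lkj_kernel(rho, 0.5),   # favors extreme correlations
            eta_1.0 = lkj_kernel(rho, 1.0),   # flat (the default)
            eta_2.0 = lkj_kernel(rho, 2.0)),  # shrinks toward zero
      3)
```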
The univariate model described earlier can be easily extended to analyze two or more outcomes simultaneously. Consider two outcomes, \(Y1\) and \(Y2\) (e.g., height and weight), measured repeatedly on \(J\) individuals \((j = 1, \dots, J)\), where individual \(j\) provides \(n_j\) measurements \((i = 1, \dots, n_j)\) of \(y_{ij}^\text{Y1}\) (outcome \(Y1\)) and \(y_{ij}^\text{Y2}\) (outcome \(Y2\)), recorded at age \(x_{ij}\). The multivariate model is then written as follows:
\[\begin{equation} \label{eq:4} \begin{aligned} \begin{bmatrix} \text{Y1}_{ij} \\ \text{Y2}_{ij} \end{bmatrix} & \sim \operatorname{MVNormal}\begin{pmatrix} \begin{bmatrix} \mu_{ij}^\text{Y1} \\ \mu_{ij}^\text{Y2} \end{bmatrix}, \mathbf {\Sigma_{Residual}} \end{pmatrix} \\ \\ \mu_{ij}^{\text{Y1}} & = \alpha_0^\text{Y1}+ \alpha_j^\text{Y1}+\sum_{r=1}^{p^\text{Y1}-1}{\beta_r^\text{Y1}\mathbf{Spl}^\text{Y1}\left(\frac{x_{ij}-\bar{x}-\left(\zeta_0^\text{Y1}+\zeta_j^\text{Y1}\right)}{e^{-\left(\gamma_0^\text{Y1}+\gamma_j^\text{Y1}\right)}}\right)} \\ \\ \mu_{ij}^{\text{Y2}} & = \alpha_0^\text{Y2}+ \alpha_j^\text{Y2}+\sum_{r=1}^{p^\text{Y2}-1}{\beta_r^\text{Y2}\mathbf{Spl}^\text{Y2}\left(\frac{x_{ij}-\bar{x}-\left(\zeta_0^\text{Y2}+\zeta_j^\text{Y2}\right)}{e^{-\left(\gamma_0^\text{Y2}+\gamma_j^\text{Y2}\right)}}\right)} \\ \\ \mathbf {\Sigma_{Residual}} & = \mathbf S_W \mathbf R_W \mathbf S_W \\ \\ \mathbf S_W & = \begin{bmatrix} \sigma_{ij}^\text{Y1} & 0 \\ 0 & \sigma_{ij}^\text{Y2} \\ \end{bmatrix} \\ \\ \mathbf R_W & = \begin{bmatrix} 1 & \rho_{\sigma_{ij}^\text{Y1}\sigma_{ij}^\text{Y2}} \\ \rho_{\sigma_{ij}^\text{Y2}\sigma_{ij}^\text{Y1}} & 1 \end{bmatrix} \\ \\ \begin{bmatrix} \alpha_{j}^\text{Y1} \\ \zeta_{j}^\text{Y1} \\ \gamma_{j}^\text{Y1} \\ \alpha_{j}^\text{Y2} \\ \zeta_{j}^\text{Y2} \\ \gamma_{j}^\text{Y2} \end{bmatrix} & \sim {\operatorname{MVNormal}\begin{pmatrix} \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix},\ \mathbf {\Sigma_{ID}} \end{pmatrix}} \\ \\ \\ \mathbf {\Sigma_{ID}} & = \mathbf S_{ID} \mathbf R_{ID} \mathbf S_{ID} \\ \\ \mathbf S_{ID} & = \begin{bmatrix} \sigma_{\alpha_{j}^\text{Y1}} & 0 & 0 & 0 & 0 & 0 \\ 0 & \sigma_{\zeta_{j}^\text{Y1}} & 0 & 0 & 0 & 0 \\ 0 & 0 & \sigma_{\gamma_{j}^\text{Y1}} & 0 & 0 & 0 \\ 0 & 0 & 0 & \sigma_{\alpha_{j}^\text{Y2}} & 0 & 0 \\ 0 & 0 & 0 & 0 & \sigma_{\zeta_{j}^\text{Y2}} & 0 \\ 0 & 0 & 0 & 0 & 0 & \sigma_{\gamma_{j}^\text{Y2}} \\ \end{bmatrix} \\ \\ \mathbf R_{ID} & = \begin{bmatrix} 1 & \rho_{\alpha_{j}^\text{Y1}\zeta_{j}^\text{Y1}} & \rho_{\alpha_{j}^\text{Y1}\gamma_{j}^\text{Y1}} & \rho_{\alpha_{j}^\text{Y1}\alpha_{j}^\text{Y2}} & \rho_{\alpha_{j}^\text{Y1}\zeta_{j}^\text{Y2}} & \rho_{\alpha_{j}^\text{Y1}\gamma_{j}^\text{Y2}} \\ \rho_{\zeta_{j}^\text{Y1}\alpha_{j}^\text{Y1}} & 1 & \rho_{\zeta_{j}^\text{Y1}\gamma_{j}^\text{Y1}} & \rho_{\zeta_{j}^\text{Y1}\alpha_{j}^\text{Y2}} & \rho_{\zeta_{j}^\text{Y1}\zeta_{j}^\text{Y2}} & \rho_{\zeta_{j}^\text{Y1}\gamma_{j}^\text{Y2}} \\ \rho_{\gamma_{j}^\text{Y1}\alpha_{j}^\text{Y1}} & \rho_{\gamma_{j}^\text{Y1}\zeta_{j}^\text{Y1}} & 1 & \rho_{\gamma_{j}^\text{Y1}\alpha_{j}^\text{Y2}} & \rho_{\gamma_{j}^\text{Y1}\zeta_{j}^\text{Y2}} & \rho_{\gamma_{j}^\text{Y1}\gamma_{j}^\text{Y2}} \\ \rho_{\alpha_{j}^\text{Y2}\alpha_{j}^\text{Y1}} & \rho_{\alpha_{j}^\text{Y2}\zeta_{j}^\text{Y1}} & \rho_{\alpha_{j}^\text{Y2}\gamma_{j}^\text{Y1}} & 1 & \rho_{\alpha_{j}^\text{Y2}\zeta_{j}^\text{Y2}} & \rho_{\alpha_{j}^\text{Y2}\gamma_{j}^\text{Y2}} \\ \rho_{\zeta_{j}^\text{Y2}\alpha_{j}^\text{Y1}} & \rho_{\zeta_{j}^\text{Y2}\zeta_{j}^\text{Y1}} & \rho_{\zeta_{j}^\text{Y2}\gamma_{j}^\text{Y1}} & \rho_{\zeta_{j}^\text{Y2}\alpha_{j}^\text{Y2}} & 1 & \rho_{\zeta_{j}^\text{Y2}\gamma_{j}^\text{Y2}} \\ \rho_{\gamma_{j}^\text{Y2}\alpha_{j}^\text{Y1}} & \rho_{\gamma_{j}^\text{Y2}\zeta_{j}^\text{Y1}} & \rho_{\gamma_{j}^\text{Y2}\gamma_{j}^\text{Y1}} & \rho_{\gamma_{j}^\text{Y2}\alpha_{j}^\text{Y2}} & \rho_{\gamma_{j}^\text{Y2}\zeta_{j}^\text{Y2}} & 1 \end{bmatrix} \\ \\ \alpha_0^\text{Y1} & \sim \operatorname{normal}(\ y^\text{Y1}_{mean},\ {y^\text{Y1}_{sd}}) \\ \alpha_0^\text{Y2} & \sim \operatorname{normal}(\ y^\text{Y2}_{mean},\ {y^\text{Y2}_{sd}}) \\ \zeta_0^\text{Y1} & \sim \operatorname{normal}(\ 0, 1.5) \\ \zeta_0^\text{Y2} & \sim \operatorname{normal}(\ 0, 1.5) \\ \gamma_0^\text{Y1} & \sim \operatorname{normal}(\ 0, 0.5) \\ \gamma_0^\text{Y2} & \sim \operatorname{normal}(\ 0, 0.5) \\ \beta_1^\text{Y1}, \dots, \beta_{p^\text{Y1}-1}^\text{Y1} & \sim \operatorname{normal}(\ 0, \ \mathbf {X^\text{Y1}_{scaled}}) \\ \beta_1^\text{Y2}, \dots, \beta_{p^\text{Y2}-1}^\text{Y2} & \sim \operatorname{normal}(\ 0, \ \mathbf {X^\text{Y2}_{scaled}}) \\ \sigma_{\alpha_j^\text{Y1}} & \sim \operatorname{normal_{Half}}(\ {0},\ {y^\text{Y1}_{sd}}) \\ \sigma_{\alpha_j^\text{Y2}} & \sim \operatorname{normal_{Half}}(\ {0},\ {y^\text{Y2}_{sd}}) \\ \sigma_{\zeta_j^\text{Y1}} & \sim \operatorname{normal_{Half}}(\ {0},\ {1.0}) \\ \sigma_{\zeta_j^\text{Y2}} & \sim \operatorname{normal_{Half}}(\ {0},\ {1.0}) \\ \sigma_{\gamma_j^\text{Y1}} & \sim \operatorname{normal_{Half}}(\ {0},\ {0.25}) \\ \sigma_{\gamma_j^\text{Y2}} & \sim \operatorname{normal_{Half}}(\ {0},\ {0.25}) \\ \sigma_{ij}^\text{Y1} & \sim \operatorname{normal_{Half}}(\ {0},\ {y^\text{Y1}_{sd}}) \\ \sigma_{ij}^\text{Y2} & \sim \operatorname{normal_{Half}}(\ {0},\ {y^\text{Y2}_{sd}}) \\ \mathbf R_W & \sim \operatorname{LKJ}(1) \\ \mathbf R_{ID} & \sim \operatorname{LKJ}(1), \end{aligned} \end{equation}\]
where the \(^\text{Y1}\) and \(^\text{Y2}\) superscripts indicate the outcome to which each parameter belongs. This is a straightforward multivariate generalization of the previous model (see Equation \(\ref{eq:3}\)). At the individual level, we have six parameters varying across individuals, resulting in a \(6 \times 6\) \(\mathbf{S}_{ID}\) matrix and a \(6 \times 6\) \(\mathbf{R}_{ID}\) matrix. The within-individual variability is captured by the residual parameters, which include a \(2 \times 2\) \(\mathbf{S}_{W}\) matrix and a \(2 \times 2\) \(\mathbf{R}_{W}\) matrix. The priors described above for the univariate model specification are applied to each outcome. The prior on the residual correlation between outcomes is the LKJ prior, as described earlier for the correlations between group-level random effects.
There are two competing philosophies of model estimation (Bland & Altman, 1998; Schoot et al., 2014): the Bayesian (based on Bayes’ theorem) and the frequentist (e.g., maximum likelihood estimation). While the frequentist approach was predominant in earlier years, the advent of powerful computers has given new impetus to Bayesian analysis (Bland & Altman, 1998; Hamra et al., 2013; Schoot et al., 2014). As a result, Bayesian statistical methods are becoming increasingly popular in applied and fundamental research. The key difference between Bayesian and frequentist statistical inference concerns the nature of the unknown parameters. In the frequentist framework, a parameter of interest is assumed to be unknown but fixed; that is, it is assumed that in the population there is only one true population parameter (for example, one true mean or one true regression coefficient). In the Bayesian view of subjective probability, all unknown parameters are treated as uncertain and should therefore be described by a probability distribution (Schoot et al., 2014). A particularly attractive feature of Bayesian modeling is its ability to handle otherwise complex model specifications, such as hierarchical (i.e., multilevel/mixed-effects) models that involve nested data structures (e.g., repeated height measurements within individuals) (Hamra et al., 2013).
There are three essential components underlying Bayesian statistics (Bayes, 1763; Stigler, 1986): the prior distribution, the likelihood function, and the posterior distribution. The prior distribution refers to all knowledge available before seeing the data, whereas the likelihood function expresses the information in the data given the parameters defined in the model. The third component, the posterior distribution, is obtained by combining the first two components via Bayes’ theorem, and the results are summarized by the so-called posterior inference. The posterior distribution, therefore, reflects one’s updated knowledge, balancing prior knowledge with observed data.
The task of combining these components can lead to a complex model in which the exact distribution of one or more variables is unknown. Estimators that rely on assumptions of normality may perform poorly in such cases. This limitation is often mitigated by estimating Bayesian models using Markov Chain Monte Carlo (MCMC). Unlike deterministic maximum-likelihood algorithms, MCMC is a stochastic procedure that repeatedly generates random samples to characterize the distribution of parameters of interest (Hamra et al., 2013). The popular software platforms for Bayesian estimation include the BUGS family, such as WinBUGS (Lunn et al., 2000), OpenBUGS (Spiegelhalter et al., 2007), and JAGS (Plummer, 2003). More recently, the software Stan has been developed to achieve higher computational and algorithmic efficiency by using the No-U-Turn Sampler (NUTS), an adaptive variant of Hamiltonian Monte Carlo (HMC) (Gelman et al., 2015; Hoffman & Gelman, 2011; Neal, 2011).