Estimation - Note 2

Likelihood based estimation for models on trees

Stefka Asenova

2021-12-20


For the application of these estimators, see the vignette “Code - Note 2”.


Throughout, \(p(u,v)\) denotes the set of edges on the unique path in the tree between the nodes \(u\) and \(v\), and \(W_u \subseteq V\) is a set of nodes containing \(u\). The parameter vector \(\theta = (\theta_e, e\in E)\) enters through the mean vector

\[\begin{equation} \{\mu_{W_u,u}(\theta)\}_v = -\frac{1}{2}\sum_{e \in p(u,v)} \theta_{e}^2, \quad v\in W_u \setminus u, \end{equation}\]

the matrix of pairwise coefficients

\[\begin{equation} \big(\Lambda(\theta)\big)_{ij} = \lambda^2_{ij}(\theta) = \frac{1}{4}\sum_{e \in p(i,j)} \theta_e^2\, , \qquad i,j\in V, \ i \ne j, \end{equation}\]

and the covariance matrix

\[\begin{equation} \label{eq:hrdist} \big(\Sigma_{W_u,u}(\Lambda)\big)_{ij} = 2(\lambda_{iu}^2 + \lambda_{ju}^2 - \lambda^2_{ij}), \qquad i,j\in W_u\setminus u. \end{equation}\]
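
To make the parametrization concrete, the following R sketch (using the igraph package; the toy tree, the parameter values, and all helper objects are illustrative, not part of the package API) computes \(\Lambda(\theta)\), \(\mu_{W_u,u}(\theta)\) and \(\Sigma_{W_u,u}(\Lambda)\) on a path tree with four nodes.

``` r
library(igraph)

# Toy path tree 1 - 2 - 3 - 4; theta[e] is the parameter of the e-th edge
tr    <- graph_from_edgelist(cbind(1:3, 2:4), directed = FALSE)
theta <- c(0.5, 1.0, 1.5)
n     <- vcount(tr)

# lambda2[i, j] = (1/4) * sum of theta_e^2 over the edges on the path p(i, j)
lambda2 <- matrix(0, n, n)
for (i in 1:(n - 1)) {
  for (j in (i + 1):n) {
    ep <- shortest_paths(tr, from = i, to = j, output = "epath")$epath[[1]]
    lambda2[i, j] <- lambda2[j, i] <- sum(theta[as.integer(ep)]^2) / 4
  }
}

# For u = 1 and W_u = V: the mean vector and the covariance matrix
u  <- 1
W  <- setdiff(1:n, u)
mu <- -2 * lambda2[W, u]   # -(1/2) * sum over p(u, v) of theta_e^2 = -2 * lambda2[u, v]
Sigma <- outer(W, W, function(i, j)
  2 * (lambda2[cbind(i, u)] + lambda2[cbind(j, u)] - lambda2[cbind(i, j)]))
```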

Maximum likelihood method - Version 1

The estimator of \((\theta_e, e\in E)\) is obtained in a two-step procedure: in the first step, for every \(u\in U\), the parameters of the multivariate normal distribution of the increments \(\Delta_{uv,i}\) are estimated by maximum likelihood; in the second step \(\theta\) is estimated by least squares from the fitted covariance matrices, as in the moment estimator. A sketch of the two steps is given below.
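
Here is a minimal sketch of the two steps on the path tree \(1-2-3\) with \(u=1\) and \(W_u=\{1,2,3\}\), for which the formulas above give \(\Sigma_{W_u,u}(\theta)\) in closed form; the simulated data and the helper names are illustrative only, the full implementation is in the package (see “Code - Note 2”).

``` r
# On the path 1 - 2 - 3 with u = 1 the formulas above give
# mu(theta)    = -(theta1^2, theta1^2 + theta2^2) / 2,
# Sigma(theta) = [theta1^2, theta1^2; theta1^2, theta1^2 + theta2^2].
mu_theta  <- function(th) -c(th[1]^2, th[1]^2 + th[2]^2) / 2
sig_theta <- function(th) matrix(c(th[1]^2, th[1]^2,
                                   th[1]^2, th[1]^2 + th[2]^2), 2, 2)

set.seed(1)
delta <- MASS::mvrnorm(200, mu = mu_theta(c(0.5, 1)), Sigma = sig_theta(c(0.5, 1)))

# Step 1: Gaussian MLE of the covariance matrix (divisor k, not k - 1)
S_hat <- crossprod(sweep(delta, 2, colMeans(delta))) / nrow(delta)

# Step 2: least squares fit of theta to the estimated covariance matrix
theta_hat <- optim(c(1, 1), function(th) sum((sig_theta(th) - S_hat)^2),
                   method = "L-BFGS-B", lower = c(1e-6, 1e-6))$par
```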

Maximum likelihood method - Version 2

Consider the likelihood function of a random sample \(y_{i}, i=1, \ldots, k\), from a multivariate normal distribution with mean vector \(\mu\) and covariance matrix \(\Sigma\), where each \(y_i\) is of dimension \(d\):

\[\begin{align*} L(\mu,\Sigma;\, y_1,\ldots,y_k) &= \prod_{i=1}^k\phi_d(y_i-\mu;\Sigma) \\&= (2\pi)^{-kd/2}(\det \Sigma^{-1})^{k/2} \exp\Big( -\frac{1}{2}\sum_{i=1}^k(y_i-\mu)^T\Sigma^{-1}(y_i-\mu) \Big)\, . \end{align*}\]
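
In R this log-likelihood can be evaluated directly, for instance with the mvtnorm package (`loglik` below is an illustrative helper, not part of the package API):

``` r
library(mvtnorm)

# Log-likelihood of an i.i.d. sample: sum of log phi_d(y_i - mu; Sigma)
loglik <- function(mu, Sigma, y)      # y: k x d matrix, one observation per row
  sum(dmvnorm(y, mean = mu, sigma = Sigma, log = TRUE))

set.seed(2)
y <- rmvnorm(100, mean = c(0, 0), sigma = diag(2))
loglik(c(0, 0), diag(2), y)
```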

The method of composite likelihoods consists of optimizing a function that collects the likelihood contributions across all the sets \(W_u, u\in U\). So let, for every \(u\in U\), the subset \(W_u\) be given.

Consider the composite likelihood function \[\begin{equation} \begin{split} L\big(\theta; \, & \{\Delta_{uv,i}: v\in W_u\setminus u,\, i\in I_u, u\in U\}\big) \\&= \prod_{u\in U}L\big(\theta_{W_u}; \{\Delta_{uv,i}: v\in W_u\setminus u, i\in I_u\}\big) \\&= \prod_{u\in U}\prod_{i\in I_u} \phi\Big(\{\Delta_{uv,i}: v\in W_u\setminus u\} - \mu_{W_u, u}(\theta); \Sigma_{W_u, u}(\theta) \Big)\, . \end{split} \end{equation}\]

The estimator is given by

\[\begin{equation} \hat{\theta}^{MLE2}_{k,n} = \arg\max_{\theta\in(0,\infty)^{|E|}} L\big(\theta; \{\Delta_{uv,i}: v\in W_u\setminus u, i\in I_u, u\in U\}\big)\, . \end{equation}\]
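
As a minimal sketch (all helpers and data are illustrative, not the package API), the composite log-likelihood on the three-node path tree with \(U=\{1,3\}\) and \(W_1=W_3=\{1,2,3\}\) can be maximized with optim; the toy data are drawn independently across \(u\), in line with the working assumption discussed below.

``` r
library(mvtnorm)

# mu_u(theta) and Sigma_u(theta) follow from the formulas above,
# with components ordered as v = 2, 3 for u = 1 and v = 1, 2 for u = 3.
mu_fun <- list(
  `1` = function(th) -c(th[1]^2, th[1]^2 + th[2]^2) / 2,
  `3` = function(th) -c(th[1]^2 + th[2]^2, th[2]^2) / 2
)
sig_fun <- list(
  `1` = function(th) matrix(c(th[1]^2, th[1]^2, th[1]^2, th[1]^2 + th[2]^2), 2, 2),
  `3` = function(th) matrix(c(th[1]^2 + th[2]^2, th[2]^2, th[2]^2, th[2]^2), 2, 2)
)

# data[[u]] holds the k_u x |W_u \ u| matrix of increments Delta_{uv, i}
neg_cloglik <- function(th, data)
  -sum(vapply(names(data), function(u)
    sum(dmvnorm(data[[u]], mean = mu_fun[[u]](th), sigma = sig_fun[[u]](th), log = TRUE)),
    numeric(1)))

# Simulate toy data at the true value theta = (0.5, 1) and maximize
set.seed(3)
th0  <- c(0.5, 1)
data <- list(`1` = rmvnorm(200, mu_fun[["1"]](th0), sig_fun[["1"]](th0)),
             `3` = rmvnorm(200, mu_fun[["3"]](th0), sig_fun[["3"]](th0)))
optim(c(1, 1), neg_cloglik, data = data,
      method = "L-BFGS-B", lower = c(1e-6, 1e-6))$par
```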

The assumption underlying this definition is that for any \(u,v \in U\) we have \(\Delta_{W_u\setminus u}\perp \Delta_{W_v\setminus v}\), which is clearly not true for overlapping vertex sets \(W_u\) and \(W_v\). However, this assumption simplifies the joint likelihood function, and simulation results show that the estimator is of comparable quality to the moment estimator and to the one based on extremal coefficients.

Covariance selection model

Let \(G(W_u)\) be the subgraph induced on the node set \(W_u\); this graph must be connected. The ML estimator of \(\theta\in (0,\infty)^{|E|}\) is obtained in a two-step procedure similar to the MM estimator, but using the MLE of \(\Sigma_{\cdot\setminus u}(\theta)\) instead.

In the first step a maximum likelihood approach is used to obtain an estimator of \(\Sigma_{\cdot\setminus u}(\theta)\); in the second step a least squares procedure is used to estimate \(\theta\). The first step is an implementation of the Iterative Proportional Scaling (IPS) algorithm from the R package gRim by Højsgaard (2017); a sketch of this step is given below. For a description of IPS we refer to Lauritzen (1996).
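
A minimal sketch of the first step, assuming gRim::ggmfit, which fits a Gaussian covariance selection model by IPS; the data, the variable names and the clique list are illustrative only.

``` r
library(gRim)

# Toy 3-variable example; the cliques of the induced subgraph G(W_u)
# (here the two edges of a path) are passed as the generating class 'glist'.
set.seed(4)
y <- MASS::mvrnorm(200, mu = rep(0, 3),
                   Sigma = matrix(c(1, .5, .25, .5, 1, .5, .25, .5, 1), 3, 3))
colnames(y) <- c("v1", "v2", "v3")

S   <- cov(y)
fit <- ggmfit(S, n.obs = nrow(y), glist = list(c("v1", "v2"), c("v2", "v3")))

# fit$K is the fitted concentration matrix; solve(fit$K) is the MLE of Sigma
# under the graph, which then replaces the sample covariance matrix in the
# least squares step for theta.
solve(fit$K)
```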

References

Højsgaard, Søren. 2017. gRim: Graphical Interaction Models. https://CRAN.R-project.org/package=gRim.

Lauritzen, Steffen. 1996. Graphical Models. Oxford University Press.