Code - Note 2

Likelihood-based estimation for models on trees

Stefka Asenova

2021-12-20

library(gremes)

Load the data.

data("SeineData", package = "gremes")

head(Seine)
#>       Paris      Meaux      Melun   Nemours     Sens
#> 1  2.124338  -1.736640  9.0907845 0.5187825 14.26062
#> 2  9.646991   2.632994  8.7440743 0.6394548 15.47040
#> 3 19.172176  21.584171  9.1692941 2.1770178 15.26537
#> 4  5.357301 -14.759895 -0.4941262 1.5233623 13.47500
#> 5  6.044298 -20.772417  1.0182723 2.1690976 13.39804
#> 6  3.805912 -40.779572 -7.3982814 1.2666607 14.60144

Generate the graph and name the nodes. Naming the nodes is crucial: the name of every observed node must match the name of its column in the dataset.

seg<- graph(c(1,2,
              2,3,
              2,4,
              4,5,
              5,6,
              5,7), directed = FALSE)
name_stat<- c("Paris", "2", "Meaux", "Melun", "5", "Nemours", "Sens")
seg<- set.vertex.attribute(seg, "name", V(seg), name_stat)
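The naming requirement can be checked before building the tree object. The following is a minimal sketch in base R, not part of gremes: `data_cols` is a stand-in for `colnames(Seine)` (copied from the `head(Seine)` output above), and `latent_nodes` and `unmatched_cols` are illustrative names. It assumes that a node whose name has no matching column is one for which no realizations are observed.

```r
# Node names as assigned to the seven-node tree above.
name_stat <- c("Paris", "2", "Meaux", "Melun", "5", "Nemours", "Sens")

# Stand-in for colnames(Seine); only the column names matter for this check.
data_cols <- c("Paris", "Meaux", "Melun", "Nemours", "Sens")

# Nodes without a matching column: no data are observed on them.
latent_nodes <- setdiff(name_stat, data_cols)

# Columns without a matching node would point to a naming mistake.
unmatched_cols <- setdiff(data_cols, name_stat)

print(latent_nodes)
print(unmatched_cols)
```

If `unmatched_cols` is not empty, a column name and a node name probably differ in spelling.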

Extract the nodes for which we do not observe realizations.

tobj<- Tree(seg, Seine)
#> From validate.Network: Edges have been assigned names
#> From validate.Network: There are nodes with latent variables
#> From validate.Network: Edges have been assigned names
#> From validate.Network: There are nodes with latent variables
Uc<- getNoDataNodes(tobj)

Create the subsets.

subs<- Neighborhood()
subs<- subset(subs, 3, seg, Uc) # neighborhood of order three

# check whether the identifiability criterion is satisfied for every subgraph induced by a subset
is_identifiable(subs, tobj) 
#> The nodes with latent variables {  5  } in set {  Paris 2 Meaux Melun 5  } have degree less than three.
#>                   The subgraph contains edge parameters that are non-identifiable.
#> 
#> The nodes with latent variables {  5  } in set {  Meaux 2 Paris Melun 5  } have degree less than three.
#>                   The subgraph contains edge parameters that are non-identifiable.
#> 
#> The nodes with latent variables {  2  } in set {  Nemours 5 Melun Sens 2  } have degree less than three.
#>                   The subgraph contains edge parameters that are non-identifiable.
#> 
#> The nodes with latent variables {  2  } in set {  Sens 5 Melun Nemours 2  } have degree less than three.
#>                   The subgraph contains edge parameters that are non-identifiable.
#> 

# change the order of the neighborhood and verify the identifiability again
subs<- subset(subs, 2, seg, Uc) # neighborhood of order two
is_identifiable(subs, tobj)

Subsets are created on the principle of neighborhood of order two for every observed variable.
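To see why order three fails and order two succeeds, the neighborhoods and the degree condition can be reproduced by hand. The sketch below uses base R only and is not the gremes implementation: the helper names `neighborhood` and `low_degree_latent` are hypothetical, and the criterion (every latent node must have degree at least three inside the induced subgraph) is inferred from the messages printed by `is_identifiable` above.

```r
# Edge list of the seven-node tree, written with the node names used above.
edges <- matrix(c("Paris", "2",
                  "2", "Meaux",
                  "2", "Melun",
                  "Melun", "5",
                  "5", "Nemours",
                  "5", "Sens"), ncol = 2, byrow = TRUE)
nodes  <- unique(as.vector(t(edges)))
latent <- c("2", "5")

# Symmetric adjacency matrix of the undirected tree.
adj <- matrix(0, length(nodes), length(nodes), dimnames = list(nodes, nodes))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1

# Nodes within `order` steps of `root`, found by repeated one-step expansion.
neighborhood <- function(root, order) {
  reached <- root
  for (step in seq_len(order)) {
    touched <- colnames(adj)[colSums(adj[reached, , drop = FALSE]) > 0]
    reached <- union(reached, touched)
  }
  reached
}

# Latent nodes whose degree inside the induced subgraph is below three.
low_degree_latent <- function(subset_nodes) {
  deg <- rowSums(adj[subset_nodes, subset_nodes, drop = FALSE])
  lat <- intersect(subset_nodes, latent)
  lat[deg[lat] < 3]
}

# One subset per observed node, for orders three and two.
observed <- setdiff(nodes, latent)
n_flagged <- c()
for (ord in c(3, 2)) {
  flagged <- lapply(observed, function(r) low_degree_latent(neighborhood(r, ord)))
  n_flagged[as.character(ord)] <- sum(lengths(flagged) > 0)
}
print(n_flagged)
```

Under these assumptions, the order-three subsets rooted at Paris, Meaux, Nemours and Sens each contain a latent node of degree one, which agrees with the four messages above, while no order-two subset is flagged.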

Estimate MLE Version 1

mle1<- MLE1(seg)
#> From HRMnetwork: Edges have been assigned names
mle1<- estimate(mle1, Seine, subs, k_ratio=0.2)
#> From validate.Network: There are nodes with latent variables
#> From validate.Network: There are nodes with latent variables
#> From setRootDepSet.RootDepSet: The order of the subset must correspond to the
#>             order of its corresponding root
#> From validate.Network: There are nodes with latent variables
#> From validate.Network: There are nodes with latent variables
#> From setRootDepSet.RootDepSet: The order of the subset must correspond to the
#>             order of its corresponding root
#> From validate.Network: There are nodes with latent variables
#> From validate.Network: There are nodes with latent variables
#> From setRootDepSet.RootDepSet: The order of the subset must correspond to the
#>             order of its corresponding root
#> From validate.Network: There are nodes with latent variables
#> From validate.Network: There are nodes with latent variables
#> From setRootDepSet.RootDepSet: The order of the subset must correspond to the
#>             order of its corresponding root
#> From validate.Network: There are nodes with latent variables
#> From validate.Network: There are nodes with latent variables
#> From setRootDepSet.RootDepSet: The order of the subset must correspond to the
#>             order of its corresponding root
#> From validate.Network: There are nodes with latent variables
#> From validate.Network: There are nodes with latent variables
#> From setRootDepSet.RootDepSet: The order of the subset must correspond to the
#>             order of its corresponding root
#> From setRootDepSet.RootDepSet: The order of the subset must correspond to the
#>             order of its corresponding root
#> From setRootDepSet.RootDepSet: The order of the subset must correspond to the
#>             order of its corresponding root
#> From setRootDepSet.RootDepSet: The order of the subset must correspond to the
#>             order of its corresponding root
#> From setRootDepSet.RootDepSet: The order of the subset must correspond to the
#>             order of its corresponding root
#> From setParams.HRMtree: Names have been attributed to the vector 'value' in the order corresponding to the order of the edges: The fist element has the name of the first edge, the second element the name of the second edge, etc.
#> From setParams.HRMtree: The parameters have been attached to the edges according to their names

These messages are informational, not errors: as long as they do not stop the estimation, no action is needed.

Estimate MLE Version 2

mle2<- MLE2(seg)
#> From HRMnetwork: Edges have been assigned names
mle2<- estimate(mle2, Seine, subs, k_ratio=0.2)
#> From validate.Network: There are nodes with latent variables
#> From validate.Network: There are nodes with latent variables

The estimates from the two versions of the likelihood-based estimator are similar; the largest difference is on edge e1 (about 0.31 versus 0.41).

mle1$depParams
#>        e1        e2        e3        e4        e5        e6 
#> 0.3111864 1.0747109 0.6152536 0.6901670 1.0305526 0.6546275
mle2$depParams
#>        e1        e2        e3        e4        e5        e6 
#> 0.4097665 1.0571181 0.6065951 0.6859331 1.0104937 0.6784847
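The size of the disagreement can be quantified edge by edge. The following sketch copies the printed values into plain vectors, so it runs without gremes; the names `mle1_est`, `mle2_est`, `abs_diff` and `rel_diff` are illustrative only.

```r
# Estimates copied from the printed output of mle1$depParams and mle2$depParams.
mle1_est <- c(e1 = 0.3111864, e2 = 1.0747109, e3 = 0.6152536,
              e4 = 0.6901670, e5 = 1.0305526, e6 = 0.6546275)
mle2_est <- c(e1 = 0.4097665, e2 = 1.0571181, e3 = 0.6065951,
              e4 = 0.6859331, e5 = 1.0104937, e6 = 0.6784847)

# Absolute difference, and difference relative to the MLE1 estimate, per edge.
abs_diff <- abs(mle2_est - mle1_est)
rel_diff <- abs_diff / mle1_est

print(round(cbind(MLE1 = mle1_est, MLE2 = mle2_est, abs_diff, rel_diff), 3))

# Edge on which the two versions disagree the most.
largest <- names(which.max(abs_diff))
print(largest)
```

In a live session, the same comparison applies directly to `mle1$depParams` and `mle2$depParams`.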

The covariance selection method

The covariance selection method, as implemented in gRim (Højsgaard 2017), is not applicable when there are latent variables. To illustrate it on the Seine dataset, we create a graph in which every node is observed.

seg_short<- graph(c(1,2,
              2,3,
              2,4,
              2,5), directed = FALSE)
name_stat<- c("Paris", "Melun", "Meaux", "Nemours", "Sens")
seg_short<- set.vertex.attribute(seg_short, "name", V(seg_short), name_stat)

Extract the nodes for which we do not observe realizations. For this graph there are none.

tobj<- Tree(seg_short, Seine)
#> From validate.Network: Edges have been assigned names
#> From validate.Network: No latent variables
#> From validate.Network: Edges have been assigned names
#> From validate.Network: No latent variables
Uc<- getNoDataNodes(tobj)

Create the subsets.

subs_short<- Neighborhood()
subs_short<- subset(subs_short, 2, seg_short, Uc) # neighborhood of order two

Estimate using the covariance selection method.

mle<- MLE(seg_short)
#> From HRMnetwork: Edges have been assigned names
mle<- estimate(mle, Seine, subs_short,  k_ratio=0.2)
#> From validate.Network: No latent variables
#> From validate.Network: No latent variables
#> From setRootDepSet.RootDepSet: The order of the subset must correspond to the
#>             order of its corresponding root
#> From validate.Network: There are nodes with latent variables
#> From validate.Network: There are nodes with latent variables
#> Calling 'ggmfit' from package 'gRim'
#> From setRootDepSet.RootDepSet: The order of the subset must correspond to the
#>             order of its corresponding root
#> From validate.Network: There are nodes with latent variables
#> From validate.Network: There are nodes with latent variables
#> Calling 'ggmfit' from package 'gRim'
#> From setRootDepSet.RootDepSet: The order of the subset must correspond to the
#>             order of its corresponding root
#> From validate.Network: There are nodes with latent variables
#> From validate.Network: There are nodes with latent variables
#> Calling 'ggmfit' from package 'gRim'
#> From setRootDepSet.RootDepSet: The order of the subset must correspond to the
#>             order of its corresponding root
#> From validate.Network: There are nodes with latent variables
#> From validate.Network: There are nodes with latent variables
#> Calling 'ggmfit' from package 'gRim'
#> From setRootDepSet.RootDepSet: The order of the subset must correspond to the
#>             order of its corresponding root
#> From validate.Network: There are nodes with latent variables
#> From validate.Network: There are nodes with latent variables
#> Calling 'ggmfit' from package 'gRim'
#> From setRootDepSet.RootDepSet: The order of the subset must correspond to the
#>             order of its corresponding root
#> From setRootDepSet.RootDepSet: The order of the subset must correspond to the
#>             order of its corresponding root
#> From setRootDepSet.RootDepSet: The order of the subset must correspond to the
#>             order of its corresponding root
#> From setRootDepSet.RootDepSet: The order of the subset must correspond to the
#>             order of its corresponding root
#> From setRootDepSet.RootDepSet: The order of the subset must correspond to the
#>             order of its corresponding root
#> From setParams.HRMtree: Names have been attributed to the vector 'value' in the order corresponding to the order of the edges: The fist element has the name of the first edge, the second element the name of the second edge, etc.
#> From setParams.HRMtree: The parameters have been attached to the edges according to their names

As before, these messages are informational and do not indicate errors.

The estimates returned here are the squared parameters, so take the square root to obtain the parameters themselves.

sqrt(mle$depParams)
#>        e1        e2        e3        e4 
#> 0.5010519 0.9133062 1.0333349 0.6907978

References

Højsgaard, Søren. 2017. gRim: Graphical Interaction Models. https://CRAN.R-project.org/package=gRim.