The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

How to use the estimation procedures

Data simulation

DAG = create_empty_DAG(7)
DAG = bnlearn::set.arc(DAG, 'U1', 'U2')
DAG = bnlearn::set.arc(DAG, 'U1', 'U3')
DAG = bnlearn::set.arc(DAG, 'U1', 'U4')
DAG = bnlearn::set.arc(DAG, 'U2', 'U4')
DAG = bnlearn::set.arc(DAG, 'U3', 'U4')
DAG = bnlearn::set.arc(DAG, 'U4', 'U6')
DAG = bnlearn::set.arc(DAG, 'U5', 'U6')
DAG = bnlearn::set.arc(DAG, 'U4', 'U7')
DAG = bnlearn::set.arc(DAG, 'U6', 'U7')

DAG |> bnlearn::as.igraph() |>
  igraph::plot.igraph(size = 20, label.cex = 2
                      # , layout = igraph::layout_as_tree
  )

order_hash = r2r::hashmap()
order_hash[['U4']] = c("U2", "U1", "U3")
order_hash[['U6']] = c("U4", "U5")
order_hash[['U7']] = c("U4", "U6")
complete_and_check_orders(DAG, order_hash)

fam = matrix(c(0, 1, 1, 1, 0, 0, 0,
               0, 0, 0, 1, 0, 0, 0,
               0, 0, 0, 1, 0, 0, 0,
               0, 0, 0, 0, 0, 1, 1,
               0, 0, 0, 0, 0, 1, 0,
               0, 0, 0, 0, 0, 0, 1,
               0, 0, 0, 0, 0, 0, 0), 
             byrow = TRUE, ncol = 7)
tau = 0.2 * fam

my_PCBN = new_PCBN(
  DAG, order_hash,
  copula_mat = list(tau = tau, fam = fam))

N = 100
mydata = PCBN_sim(my_PCBN, N = N)

Estimation

Copula of U1 and U2

We prepare the environment for storing the results.

e = default_envir()

We start by fitting the copula of $(U_1, U_2)$.

BiCopCondFit(data = mydata, DAG = DAG, v = "U1", w = "U2",
             cond_set = c(), familyset = 1, order_hash = order_hash, e = e,
             method = "mle")
#> Estimating the copula of  U1  and  U2
#> Bivariate copula: Gaussian (par = 0.27, tau = 0.17)

This is stored in e$copula_hash. To access it, we can do:

copula_key = e$keychain[[list(margins = c("U1", "U2"), cond = character(0))]]

e$copula_hash[[copula_key]]
#> Bivariate copula: Gaussian (par = 0.27, tau = 0.17)

There is a level of indirection, because the copula_key actually stores the whole computation tree. Therefore, if the statistician decides to use a different PCBN for the same structure, some estimated parts can be reused if the computation path is the same.

Let’s see what is actually the copula_key.

print(data.tree::FromListSimple(copula_key))
#>   levelName
#> 1    U1, U2
#> 2     ¦--U1
#> 3     °--U2

Copulas related to U3 and U4

In the same way as before, we fit the copula of $(U_1, U_3)$.

BiCopCondFit(data = mydata, DAG = DAG, v = "U1", w = "U3",
             cond_set = c(), familyset = 1, order_hash = order_hash,
             e = e, method = "mle")
#> Estimating the copula of  U1  and  U3
#> Bivariate copula: Gaussian (par = 0.22, tau = 0.14)

Remember that the ordered parents of $U_4$ are $U_2$, $U_1$ and $U_3$. Therefore, we fit first the copula of $(U_2, U_4)$.

BiCopCondFit(data = mydata, DAG = DAG, v = "U2", w = "U4",
             cond_set = c(), familyset = 1, order_hash = order_hash,
             e = e, method = "mle")
#> Estimating the copula of  U2  and  U4
#> Bivariate copula: Gaussian (par = 0.18, tau = 0.12)

Then, we fit the copula of $(U_1, U_4) \, | \, U_2$.

BiCopCondFit(data = mydata, DAG = DAG, v = "U1", w = "U4",
             cond_set = c("U2"), familyset = 1, order_hash = order_hash,
             e = e, method = "mle")
#> Estimating the cond cdf of  U1  given  U2 
#> Estimating the cond cdf of  U4  given  U2 
#> Estimating the copula of  U1  and  U4  given  U2
#> Bivariate copula: Gaussian (par = 0.32, tau = 0.21)

Finally, we fit the copula of $(U_3, U_4) \, | \, U_1, U_2$.

BiCopCondFit(data = mydata, DAG = DAG, v = "U3", w = "U4",
             cond_set = c("U1", "U2"), familyset = 1, order_hash = order_hash,
             e = e, method = "mle")
#> Estimating the cond cdf of  U3  given  U1 
#> Estimating the cond cdf of  U4  given  U1 U2 
#> Estimating the copula of  U3  and  U4  given  U1 U2
#> Bivariate copula: Gaussian (par = 0.25, tau = 0.16)

Corresponding computation trees of the copulas

The computation trees of these copulas can be found as before.

e$keychain[[list(margins = c("U2", "U4"), cond = character(0))]] |>
  data.tree::FromListSimple() |>
  print()
#>   levelName
#> 1    U2, U4
#> 2     ¦--U2
#> 3     °--U4

e$keychain[[list(margins = c("U1", "U4"), cond = c("U2"))]] |>
  data.tree::FromListSimple() |>
  print()
#>        levelName
#> 1 U1, U4 | U2   
#> 2  ¦--U1 | U2   
#> 3  ¦   °--U1, U2
#> 4  ¦       ¦--U1
#> 5  ¦       °--U2
#> 6  °--U4 | U2   
#> 7      °--U2, U4
#> 8          ¦--U2
#> 9          °--U4

e$keychain[[list(margins = c("U3", "U4"), cond = c("U1", "U2"))]] |>
  data.tree::FromListSimple() |>
  print()
#>                 levelName
#> 1  U3, U4 | U1, U2       
#> 2   ¦--U3 | U1           
#> 3   ¦   °--U1, U3        
#> 4   ¦       ¦--U1        
#> 5   ¦       °--U3        
#> 6   °--U4 | U1, U2       
#> 7       °--U1, U4 | U2   
#> 8           ¦--U1 | U2   
#> 9           ¦   °--U1, U2
#> 10          ¦       ¦--U1
#> 11          ¦       °--U2
#> 12          °--U4 | U2   
#> 13              °--U2, U4
#> 14                  ¦--U2
#> 15                  °--U4

Corresponding computation trees of the margins

In the same way, we obtain the computation tree of the margin $U_4 \, | \, U_1, U_2$ by:

e$keychain[[list(margin = c("U4"), cond = c("U1", "U2"))]] |>
  data.tree::FromListSimple() |>
  print()
#>             levelName
#> 1  U4 | U1, U2       
#> 2   °--U1, U4 | U2   
#> 3       ¦--U1 | U2   
#> 4       ¦   °--U1, U2
#> 5       ¦       ¦--U1
#> 6       ¦       °--U2
#> 7       °--U4 | U2   
#> 8           °--U2, U4
#> 9               ¦--U2
#> 10              °--U4

You can remark that this is a sub-tree of the computation tree of the copula of $(U_3, U_4) \, | \, U_1, U_2$. This is coherent, because we needed the margin $U_4 \, | \, U_1, U_2$ to estimate this copula. Nevertheless, note that

e$keychain[[list(margin = c("U3"), cond = c("U1", "U2"))]]
#> NULL

This is because the margin $U_3 \, | \, U_1, U_2$ is actually the same as $U_3 \, | \, U_1$ by conditional independence. This can be seen using this function:

remove_CondInd(DAG = DAG, node = "U3", cond_set = c("U1", "U2"))
#> [1] "U1"

Conditional marginal pseudo-observations

The conditional marginal pseudo-observations can be found in e$margin_hash. For example, the conditional pseudo-observations $\hat U_{i, \, 3|1}$, $i=1, \dots, n$, can be obtained by:

e$margin_hash[[ e$keychain[[list(margin = c("U3"), cond = c("U1"))]] ]]  |>
  head()
#> [1] 0.7016929 0.3503202 0.2715592 0.9891115 0.1999063 0.1960950

These conditional margins are internally computed by the function ComputeCondMargin.

Fitting the rest of the DAG

Copulas related to the node U6

BiCopCondFit(data = mydata, DAG = DAG, v = "U4", w = "U6",
             cond_set = c(), familyset = 1, order_hash = order_hash,
             e = e, method = "mle")
#> Estimating the copula of  U4  and  U6
#> Bivariate copula: Gaussian (par = 0.23, tau = 0.15)

BiCopCondFit(data = mydata, DAG = DAG, v = "U5", w = "U6",
             cond_set = c("U4"), familyset = 1, order_hash = order_hash,
             e = e, method = "mle")
#> Estimating the cond cdf of  U6  given  U4 
#> Estimating the copula of  U5  and  U6  given  U4
#> Bivariate copula: Gaussian (par = 0.43, tau = 0.28)

Copulas related to the node U7

BiCopCondFit(data = mydata, DAG = DAG, v = "U4", w = "U7",
             cond_set = c(), familyset = 1, order_hash = order_hash,
             e = e, method = "mle")
#> Estimating the copula of  U4  and  U7
#> Bivariate copula: Gaussian (par = 0.3, tau = 0.19)

BiCopCondFit(data = mydata, DAG = DAG, v = "U6", w = "U7",
             cond_set = c("U4"), familyset = 1, order_hash = order_hash,
             e = e, method = "mle")
#> Estimating the cond cdf of  U6  given  U4 
#> Estimating the cond cdf of  U7  given  U4 
#> Estimating the copula of  U6  and  U7  given  U4
#> Bivariate copula: Gaussian (par = 0.34, tau = 0.22)

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.