Example: Inferred binary causal graph from simulation


title: "BiCausality: Binary Causality Inference Framework"
author: "C. Amornbunchornvej"
date: "2023-11-28"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{BiCausality_demo}
  %\VignetteEngine{knitr::knitr}
  \usepackage[utf8]{inputenc}

First, we generate a simulated dataset to use as input.

seedN<-2022

n<-200 # 200 individuals
d<-10 # 10 variables
mat<-matrix(nrow=n,ncol=d) # the input of framework

#Simulate binary data from binomial distribution where the probability of value being 1 is 0.5.
for(i in seq(n))
{
  set.seed(seedN+i)
  mat[i,] <- rbinom(n=d, size=1, prob=0.5)
}

mat[,1] <- mat[,2] | mat[,3] # variable 1 is caused by variables 2 and 3
mat[,4] <- mat[,2] | mat[,5] # variable 4 is caused by variables 2 and 5
mat[,6] <- mat[,1] | mat[,4] # variable 6 is caused by variables 1 and 4
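
As a quick sanity check on the simulation above, the sketch below rebuilds the same matrix and compares the proportion of ones in the derived columns with the values implied by the OR rules. The names p_theory and p_empirical are introduced here for illustration only; the theoretical values assume the parent variables are independent Bernoulli(0.5) draws, as in the simulation.

```r
# Rebuild the simulated matrix exactly as in the step above.
seedN <- 2022
n <- 200
d <- 10
mat <- matrix(nrow = n, ncol = d)
for (i in seq(n)) {
  set.seed(seedN + i)
  mat[i, ] <- rbinom(n = d, size = 1, prob = 0.5)
}
mat[, 1] <- mat[, 2] | mat[, 3]
mat[, 4] <- mat[, 2] | mat[, 5]
mat[, 6] <- mat[, 1] | mat[, 4]

# Theoretical P(X = 1) implied by the OR rules with independent
# Bernoulli(0.5) parents: an OR of k parents is 0 only if all k are 0.
p_theory <- c(X1 = 1 - 0.5^2, X4 = 1 - 0.5^2, X6 = 1 - 0.5^3)

# Empirical proportions of ones in the derived columns, for comparison.
p_empirical <- colMeans(mat[, c(1, 4, 6)])
names(p_empirical) <- names(p_theory)

print(rbind(theory = p_theory, empirical = p_empirical))

# Column 6 reduces to the OR of the three root variables 2, 3, and 5.
stopifnot(all(mat[, 6] == (mat[, 2] | mat[, 3] | mat[, 5])))
```

Columns 7 to 10 are left untouched, so they are independent noise variables with no causal relation to the others.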

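We then run the main inference function, which infers for each pair of variables X and Y whether X causes Y. The arguments CausalThs and IndpThs are the thresholds used by the causal-direction and dependency tests explained later in this example, and nboot is the number of bootstrap replicates.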

# Run the function
library(BiCausality)
resC<-BiCausality::CausalGraphInferMainFunc(mat = mat,CausalThs=0.1, nboot =50, IndpThs=0.05)

## Inferring dependent graph

## Removing confounder(s)

## Inferring causal graph

The adjacency matrix of the inferred directed causal graph is shown below:

resC$CausalGRes$Ehat

##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]    0    0    0    0    0    1    0    0    0     0
##  [2,]    1    0    0    1    0    0    0    0    0     0
##  [3,]    1    0    0    0    0    0    0    0    0     0
##  [4,]    0    0    0    0    0    1    0    0    0     0
##  [5,]    0    0    0    1    0    0    0    0    0     0
##  [6,]    0    0    0    0    0    0    0    0    0     0
##  [7,]    0    0    0    0    0    0    0    0    0     0
##  [8,]    0    0    0    0    0    0    0    0    0     0
##  [9,]    0    0    0    0    0    0    0    0    0     0
## [10,]    0    0    0    0    0    0    0    0    0     0

A nonzero value in element Ehat[i,j] means that i causes j. For example, Ehat[2,1] = 1 means that node 2 causes node 1, which is correct because node 1 was generated from nodes 2 and 3 and therefore has both as causal nodes.
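
To compare the whole inferred graph with the simulation, one option is to encode the ground-truth edges and check them against the printed matrix. The sketch below is illustrative only: E_true, E_hat, true_edges, and printed_edges are names introduced here, and E_hat is rebuilt by hand from the printed output above so that the block runs without the package. In a live session, resC$CausalGRes$Ehat would take its place.

```r
d <- 10

# Ground-truth edges (cause, effect) implied by the OR rules in the simulation.
true_edges <- rbind(c(2, 1), c(3, 1), c(2, 4), c(5, 4), c(1, 6), c(4, 6))
E_true <- matrix(0, nrow = d, ncol = d)
E_true[true_edges] <- 1  # two-column matrix indexing sets E_true[cause, effect]

# Edges read off the printed adjacency matrix (row = cause, column = effect).
printed_edges <- rbind(c(1, 6), c(2, 1), c(2, 4), c(3, 1), c(4, 6), c(5, 4))
E_hat <- matrix(0, nrow = d, ncol = d)
E_hat[printed_edges] <- 1

# Precision: share of inferred edges that are true.
# Recall: share of true edges that were inferred.
true_pos <- sum(E_hat == 1 & E_true == 1)
precision <- true_pos / sum(E_hat)
recall <- true_pos / sum(E_true)
cat("precision:", precision, "recall:", recall, "\n")
```

For the matrix printed above, both values are 1: all six simulated causal edges are recovered and no extra edge is reported.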

The directed causal graph can also be plotted using the code below.

library(igraph)

## 
## Attaching package: 'igraph'

## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum

## The following object is masked from 'package:base':
## 
##     union

net <- graph_from_adjacency_matrix(resC$CausalGRes$Ehat ,weighted = NULL)
plot(net, edge.arrow.size = 0.3, vertex.size =20 , vertex.color = '#D4C8E9',layout=layout_with_kk)

[Figure: plot of the inferred directed causal graph]

To see further information about the causal relation from variable 2 to variable 1, we can use the command below.

Note that the odd difference between X and Y, denoted oddDiff(X,Y), is defined as P(X = 1, Y = 1) P(X = 0, Y = 0) − P(X = 0, Y = 1) P(X = 1, Y = 0). If X and Y tend to take the same value, then oddDiff(X,Y) is positive, reaching its maximum of 0.25 when X always equals Y and both values are equally likely. If X tends to be the inverse of Y, then oddDiff(X,Y) is negative, with a minimum of -0.25. If X and Y have no association, then oddDiff(X,Y) is close to zero.

resC$CausalGRes$causalInfo[['2,1']]

## $CDirConfValInv
##  2.5% 97.5% 
##     1     1 
## 
## $CDirConfInv
##      2.5%     97.5% 
## 0.3152526 0.4386415 
## 
## $CDirmean
## [1] 0.371347
## 
## $testRes2
## 
## 	Wilcoxon signed rank test with continuity correction
## 
## data:  abs(bCausalDirDist)
## V = 1275, p-value = 3.893e-10
## alternative hypothesis: true location is greater than 0.1
## 
## 
## $testRes1
## 
## 	Wilcoxon signed rank test with continuity correction
## 
## data:  abs(bSignDist)
## V = 1275, p-value = 3.889e-10
## alternative hypothesis: true location is greater than 0.05
## 
## 
## $sign
## [1] 1
## 
## $SignConfInv
##      2.5%     97.5% 
## 0.0865425 0.1282719 
## 
## $Signmean
## [1] 0.1090915
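
The odd difference from the note above can be computed directly from two binary vectors. The sketch below uses a hypothetical helper, odd_diff, that is not part of BiCausality. It implements the signed form of the definition (without an absolute value), on the assumption that the sign is meaningful because the dependency test works on |oddDiff(X,Y)|.

```r
# Hypothetical helper (not a BiCausality function): signed odd difference
# P(X=1,Y=1) P(X=0,Y=0) - P(X=0,Y=1) P(X=1,Y=0), estimated by proportions.
odd_diff <- function(x, y) {
  stopifnot(length(x) == length(y))
  p11 <- mean(x == 1 & y == 1)
  p00 <- mean(x == 0 & y == 0)
  p01 <- mean(x == 0 & y == 1)
  p10 <- mean(x == 1 & y == 0)
  p11 * p00 - p01 * p10
}

x2 <- c(0, 0, 1, 1)
x3 <- c(0, 1, 0, 1)
x1 <- as.integer(x2 | x3)  # x1 is the OR of x2 and x3, as in the simulation

odd_diff(x2, x2)      # identical variables: 0.25
odd_diff(x2, 1 - x2)  # inverse variables: -0.25
odd_diff(x2, x3)      # balanced, unassociated variables: 0
odd_diff(x2, x1)      # parent and its OR child: 0.125
```

In this toy table, the value 0.125 for a parent and its OR child is close to the bootstrap mean ($Signmean) of about 0.109 reported for the pair '2,1' above.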

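Each element of the result is explained below. The numbers in this annotated listing appear to come from a separate bootstrap run, so they differ slightly from the output above.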

#This value represents the 95% confidence interval (2.5th and 97.5th percentiles) of P(Y=1|X=1).
$CDirConfValInv
 2.5% 97.5% 
    1     1 
#This value represents the 95% confidence interval (2.5th and 97.5th percentiles) of |P(Y=1|X=1) - P(X=1|Y=1)|.
$CDirConfInv
     2.5%     97.5% 
0.3217322 0.4534494 

#This value represents the mean of |P(Y=1|X=1) - P(X=1|Y=1)|.
$CDirmean
[1] 0.3787904

#This test has the null hypothesis that |P(Y=1|X=1) - P(X=1|Y=1)| is less than
#or equal to the value of the parameter "CausalThs", and the alternative
#hypothesis that |P(Y=1|X=1) - P(X=1|Y=1)| is greater than "CausalThs".
$testRes2

	Wilcoxon signed rank test with continuity correction

data:  abs(bCausalDirDist)
V = 1275, p-value = 3.893e-10
alternative hypothesis: true location is greater than 0.1


#This test has the null hypothesis that |oddDiff(X,Y)| is less than
#or equal to the value of the parameter "IndpThs", and the alternative
#hypothesis that |oddDiff(X,Y)| is greater than "IndpThs".
$testRes1

	Wilcoxon signed rank test with continuity correction

data:  abs(bSignDist)
V = 1275, p-value = 3.894e-10
alternative hypothesis: true location is greater than 0.05

#If the test above rejects the null hypothesis at the significance level
#alpha (default alpha=0.05), then "sign" is 1; otherwise, it is zero.
$sign
[1] 1

#This value represents the 95% confidence interval (2.5th and 97.5th percentiles) of oddDiff(X,Y).
$SignConfInv
      2.5%      97.5% 
0.08670325 0.13693900 

#This value represents the mean of oddDiff(X,Y).
$Signmean
[1] 0.1082242
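
The causal-direction quantities explained above (a percentile interval, a mean, and a one-sided Wilcoxon test against CausalThs) can be approximated with base R. The sketch below is not the package's implementation: cond_gap, boot_gap, and causal_ths are hypothetical names, the data are a fresh small simulation with an arbitrary seed, and the resampling scheme (rows drawn with replacement) is an assumption.

```r
set.seed(1)  # arbitrary seed for this illustration only
n <- 200
x2 <- rbinom(n, size = 1, prob = 0.5)
x3 <- rbinom(n, size = 1, prob = 0.5)
x1 <- as.integer(x2 | x3)  # x1 is caused by x2 and x3

# Hypothetical helper: |P(Y=1|X=1) - P(X=1|Y=1)| estimated from binary vectors.
cond_gap <- function(x, y) {
  p_y_given_x <- mean(y[x == 1])
  p_x_given_y <- mean(x[y == 1])
  abs(p_y_given_x - p_x_given_y)
}

nboot <- 50        # number of bootstrap replicates, as in the example call
causal_ths <- 0.1  # plays the role of the CausalThs argument

# Resample rows with replacement and recompute the statistic each time.
boot_gap <- replicate(nboot, {
  idx <- sample.int(n, replace = TRUE)
  cond_gap(x2[idx], x1[idx])
})

# Percentile interval and mean, analogous to $CDirConfInv and $CDirmean.
ci <- quantile(boot_gap, probs = c(0.025, 0.975))
print(ci)
print(mean(boot_gap))

# One-sided test of whether the location exceeds the threshold,
# analogous to $testRes2.
test <- wilcox.test(boot_gap, mu = causal_ths, alternative = "greater")
print(test$p.value)
```

Here X stands for x2 and Y for x1. Because x1 is 1 whenever x2 is 1, P(Y=1|X=1) equals 1 while P(X=1|Y=1) is smaller, and this asymmetry is what the direction test measures.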
