| Title: | Tools for Causal Discovery on Observational Data |
| Version: | 0.9.5 |
| Description: | Various tools for inferring causal models from observational data. The package includes an implementation of the temporal Peter-Clark (TPC) algorithm. Petersen, Osler and Ekstrøm (2021) <doi:10.1093/aje/kwab087>. It also includes general tools for evaluating differences in adjacency matrices, which can be used for evaluating performance of causal discovery procedures. |
| Depends: | R (≥ 3.5.0) |
| License: | GPL-2 |
| Encoding: | UTF-8 |
| LazyData: | true |
| URL: | https://github.com/disco-coders/causalDisco |
| BugReports: | https://github.com/disco-coders/causalDisco/issues |
| RoxygenNote: | 7.3.3 |
| Imports: | pcalg, igraph, RColorBrewer, gtools, clipr, methods, scales, Rgraphviz, graph, magick, rmarkdown |
| Collate: | 'amat.R' 'causalDisco-package.R' 'compare.R' 'confusion.R' 'corTest.R' 'edges.R' 'evaluate.R' 'tpc.R' 'fci.R' 'gausCorScore.R' 'maketikz.R' 'metrics.R' 'misc.R' 'nDAGs.R' 'pc.R' 'plot.R' 'plotTempoMech.R' 'probmat2amat.R' 'regTest.R' 'simDAG.R' 'simGausFromDAG.R' 'tamat.R' 'tfci.R' 'tges.R' 'tplot.R' |
| NeedsCompilation: | no |
| Packaged: | 2026-01-20 08:04:11 UTC; hwm271 |
| Author: | Anne Helby Petersen [aut], Tobias Ellegaard Larsen [ctb], Claus Thorn Ekstrøm [ctb], Bjarke Hautop Kristensen [ctb, cre] |
| Maintainer: | Bjarke Hautop Kristensen <bjarke.kristensen@sund.ku.dk> |
| Repository: | CRAN |
| Date/Publication: | 2026-01-20 09:10:02 UTC |
F1 score
Description
Computes F1 score from a confusion matrix, see confusion.
The F1 score is defined as 2 * TP/(2 * TP + FP + FN), where TP are true positives,
FP are false positives, and FN are false negatives. If TP + FP + FN = 0, 1 is returned.
Usage
F1(confusion)
Arguments
confusion |
Confusion matrix as obtained from confusion |
Value
A numeric in [0,1].
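A minimal illustrative sketch, computing the F1 score from the adjacency confusion matrix of two small arbitrary adjacency matrices:
x1 <- matrix(c(0, 0, 0, 0,
               1, 0, 1, 0,
               1, 0, 0, 0,
               0, 0, 1, 0), 4, 4, byrow = TRUE)
x2 <- matrix(c(0, 0, 1, 0,
               1, 0, 0, 0,
               0, 0, 0, 0,
               1, 0, 1, 0), 4, 4, byrow = TRUE)
# adjacency confusion matrix, then F1 score
F1(confusion(x2, x1))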
False Discovery Rate
Description
Computes false discovery rate from a confusion matrix, see confusion. False discovery rate is defined as FP/(FP + TP), where FP are false positives and TP are true positives. If FP + TP = 0, 0 is returned.
Usage
FDR(confusion)
Arguments
confusion |
Confusion matrix as obtained from confusion |
Value
A numeric in [0,1].
False Omission Rate
Description
Computes false omission rate from a confusion matrix, see confusion. False omission rate is defined as FN/(FN + TN), where FN are false negatives and TN are true negatives. If FN + TN = 0, 0 is returned.
Usage
FOR(confusion)
Arguments
confusion |
Confusion matrix as obtained from confusion |
Value
A numeric in [0,1].
G1 score
Description
Computes G1 score from a confusion matrix, see confusion. G1 score is F1 score with
reversed roles of 0/1 classifications, see Petersen et al. 2022.
The G1 score is defined as 2 * TN/(2 * TN + FN + FP), where TN are true negatives,
FP are false positives, and FN are false negatives. If TN + FN + FP = 0, 1 is returned.
Usage
G1(confusion)
Arguments
confusion |
Confusion matrix as obtained from confusion |
Value
A numeric in [0,1].
References
Petersen, Anne Helby, et al. "Causal discovery for observational sciences using supervised machine learning." arXiv preprint arXiv:2202.12813 (2022).
Negative predictive value
Description
Computes negative predictive value from a confusion matrix, see confusion. Negative predictive value is defined as TN/(TN + FN), where TN are true negatives and FN are false negatives. If TN + FN = 0, 0 is returned.
Usage
NPV(confusion)
Arguments
confusion |
Confusion matrix as obtained from confusion |
Value
A numeric in [0,1].
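A minimal illustrative sketch, computing several of the confusion-based metrics above from the same adjacency confusion matrix of two small arbitrary adjacency matrices:
x1 <- matrix(c(0, 0, 0, 0,
               1, 0, 1, 0,
               1, 0, 0, 0,
               0, 0, 1, 0), 4, 4, byrow = TRUE)
x2 <- matrix(c(0, 0, 1, 0,
               1, 0, 0, 0,
               0, 0, 0, 0,
               1, 0, 1, 0), 4, 4, byrow = TRUE)
cf <- confusion(x2, x1)
FDR(cf)
FOR(cf)
NPV(cf)
G1(cf)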
Temporal Bayesian Dirichlet equivalent uniform (Score criterion)
Description
A reference class for categorical observational data Scoring with Tiered Background Knowledge. This class represents a score for causal discovery using tiered background knowledge from observational categorical
data; it is used in the causal discovery function tges.
Arguments
data |
A numeric matrix with |
order |
A vector specifying the order of each variable. Can be either a vector of integers or a vector of prefixes. If integers, the ith entry gives the order of the ith variable in the dataset; the values must start at 1 and increase in increments of 1. If prefixes, they must be given in temporal order. |
iss |
Imaginary Sample Size (ISS), also referred to as Equivalent Sample Size (ESS), determines how much weight is assigned to the prior in terms of the size of an imaginary sample supporting it. Increasing the ISS will increase the density of the estimated graph. |
Details
The class implements a score which assigns minus \infty to all edges contradicting
the ordering (edges going from a later tier to an earlier one). If an edge does not
contradict the ordering, the score is equal to that of the standard BDeu.
Extends
Class pcalg::Score directly.
All reference classes extend and inherit methods from envRefClass.
Constructor
new("TemporalBdeu",
data = matrix(1, 1, 1),
order = rep(1,ncol(data)),
iss = 1
...)
Alternative implementation of TemporalBDeu score
We provide here a faster alternative to the implemented version of TemporalBDeu. However, this version relies on a non-exported function from bnlearn. We provide the code for it here:
setRefClass("TemporalBDeu",
contains = "DataScore",
fields = list(
.order = "vector",
.iss = "numeric"),
methods = list(
initialize = function(data = matrix(1, 1, 1),
nodes = colnames(data),
iss = 1,
order = rep(1,ncol(data)),
...) {
.order <<- order
.iss <<- iss
callSuper(data = data,
nodes = nodes,
iss = iss,
...)},
local.score = function(vertex, parents,...) {
## Check validity of arguments
validate.vertex(vertex)
validate.parents(parents)
order <- .order
iss <- .iss
if (order[vertex] >= max(c(order[parents],-Inf))){
#Checks if the tier of parents are before or same as node
# Create local dataset
D <- pp.dat$data
pa_nam <- colnames(D)[parents]
ve_nam <- colnames(D)[vertex]
res_nam <- colnames(D)[-c(parents,vertex)]
# Create local bn object
if (length(parents) > 0){
mod_string <- paste(c("[",
paste(c(pa_nam,res_nam),collapse = "]["),
"][",
ve_nam,
"|",
paste(pa_nam,collapse = ":"),
"]"
),collapse = "")
} else {
mod_string <- paste(c("[",
paste(c(pa_nam,res_nam),collapse = "]["),
"][",
ve_nam,
"]"
),collapse = "")
}
bn_ob <- as.bn(mod_string)
BdeuScore <- bnlearn:::per.node.score(network = bn_ob,
data = D,
score = "bde",
targets = ve_nam,
extra.args = list(iss = iss,
prior = "uniform"))
return(BdeuScore)
}
else { skip <- -Inf
return(skip)}#set score to minus infinity if vertex earlier than parents
}),
inheritPackage = TRUE
)
Author(s)
Tobias Ellegaard Larsen
See Also
Examples
# For reproducibility
set.seed(123)
# Number of samples
n <- 1000
# Define probabilities for A
p_A <- c(0.4, 0.35, 0.25) # Probabilities for A = {1, 2, 3}
# Simulate A from a categorical distribution
A <- sample(1:3, n, replace = TRUE, prob = p_A)
# Define conditional probabilities for B given A
p_B_given_A <- list(
c(0.7, 0.3), # P(B | A=1)
c(0.4, 0.6), # P(B | A=2)
c(0.2, 0.8) # P(B | A=3)
)
# Sample B based on A
B <- sapply(A, function(a) sample(1:2, 1, prob = p_B_given_A[[a]]))
# Define conditional probabilities for C given A and B
p_C_given_A_B <- list(
"1_1" = c(0.6, 0.4), # P(C | A=1, B=1)
"1_2" = c(0.3, 0.7), # P(C | A=1, B=2)
"2_1" = c(0.5, 0.5), # P(C | A=2, B=1)
"2_2" = c(0.2, 0.8), # P(C | A=2, B=2)
"3_1" = c(0.7, 0.3), # P(C | A=3, B=1)
"3_2" = c(0.4, 0.6) # P(C | A=3, B=2)
)
# Sample C based on A and B
C <- mapply(function(a, b) sample(1:2, 1, prob = p_C_given_A_B[[paste(a, b, sep = "_")]]), A, B)
# Create dataset
simdata <- data.frame(as.factor(A), as.factor(B), as.factor(C))
# Define order in prefix way
colnames(simdata) <- c("child_A","child_B","adult_C")
prefix_order <- c("child", "adult")
# Define TemporalBDeu score
t_score <- new("TemporalBDeu", order = prefix_order
, data = simdata)
# Run tges
tges_pre <- tges(t_score)
# Plot MPDAG
plot(tges_pre)
# Define order in integer way
colnames(simdata) <- c("A","B","C")
integer_order <- c(1,1,2)
# Define TemporalBDeu score
t_score <- new("TemporalBDeu", order = integer_order
, data = simdata)
# Run tges
tges_integer <- tges(t_score)
# Plot MPDAG
plot(tges_integer)
Temporal Bayesian Information Criterion (Score criterion)
Description
A Reference Class for Gaussian Observational Data Scoring with Tiered Background Knowledge. This class represents a score for causal discovery using tiered background knowledge from observational Gaussian
data; it is used in the causal discovery function tges.
Arguments
data |
A numeric matrix with |
order |
A vector specifying the order of each variable. Can be either a vector of integers or a vector of prefixes. If integers, the ith entry gives the order of the ith variable in the dataset; the values must start at 1 and increase in increments of 1. If prefixes, they must be given in temporal order. |
lambda |
Penalization constant (see details). |
intercept |
Logical; indicates whether an intercept is allowed in the linear structural equations (i.e., whether a nonzero mean is allowed). |
Details
The class implements a score which assigns minus \infty to all edges contradicting
the ordering (edges going from a later tier to an earlier one). If an edge does not
contradict the ordering, the score is equal to that of pcalg::GaussL0penObsScore:
The class implements an \ell_0-penalized Gaussian maximum
likelihood estimator. The penalization is a constant (specified by
the argument lambda in the constructor) times the number of
parameters of the DAG model. By default, the constant \lambda is
chosen as \log(n)/2, which corresponds to the BIC score.
Extends
Class pcalg::GaussL0penObsScore directly.
All reference classes extend and inherit methods from envRefClass.
Constructor
new("TemporalBIC",
data = matrix(1, 1, 1),
order = rep(1,ncol(data)),
lambda = 0.5 * log(nrow(data)),
intercept = TRUE,
...)
Author(s)
Tobias Ellegaard Larsen
See Also
Examples
#Simulate Gaussian data
set.seed(123)
n <- 500
child_x <- rnorm(n)
child_y <- 0.5*child_x + rnorm(n)
child_z <- 2*child_x + child_y + rnorm(n)
adult_x <- child_x + rnorm(n)
adult_z <- child_z + rnorm(n)
adult_w <- 2*adult_z + rnorm(n)
adult_y <- 2*child_x + adult_w + rnorm(n)
simdata <- data.frame(child_x, child_y, child_z,
adult_x, adult_z, adult_w,
adult_y)
# Define order in prefix way
prefix_order <- c("child", "adult")
# Define TBIC score
t_score <- new("TemporalBIC", order = prefix_order
, data = simdata)
# Run tges
tges_pre <- tges(t_score)
# Plot MPDAG
plot(tges_pre)
# Define order in integer way
integer_order <- c(1,1,1,2,2,2,2)
# Define TBIC score
t_score <- new("TemporalBIC", order = integer_order
, data = simdata)
# Run tges
tges_int <- tges(t_score)
# Plot MPDAG
plot(tges_int)
Compute confusion matrix for comparing two adjacency matrices
Description
Two adjacency matrices are compared either in terms of adjacencies
(type = "adj") or orientations (type = "dir").
Usage
adj_confusion(est_amat, true_amat)
Arguments
est_amat |
The estimated adjacency matrix, or |
true_amat |
The true adjacency matrix, or |
Details
Adjacency comparison: The confusion matrix is a cross-tabulation
of adjacencies. Hence, a true positive means that the two inputs agree on
the presence of an adjacency. A true negative means that the two inputs agree
on no adjacency. A false positive means that est_amat places an adjacency
where there should be none. A false negative means that est_amat does
not place an adjacency where there should have been one.
Orientation comparison: The orientation confusion matrix is conditional on agreement on adjacency. This means that only adjacencies that are shared in both input matrices are considered, and agreement wrt. orientation is then computed only among these edges that occur in both matrices. A true positive is a correctly placed arrowhead (1), a false positive marks placement of arrowhead (1) where there should have been a tail (0), a false negative marks placement of tail (0) where there should have been an arrowhead (1), and a true negative marks correct placement of a tail (0).
Value
A list with entries $tp (number of true positives), $tn (number of true negatives),
$fp (number of false positives), and $fn (number of false negatives).
Examples
#############################################################################
# Compare two adjacency matrices ############################################
#############################################################################
x1 <- matrix(c(0, 0, 0, 0,
1, 0, 1, 0,
1, 0, 0, 0,
0, 0, 1, 0), 4, 4, byrow = TRUE)
x2 <- matrix(c(0, 0, 1, 0,
1, 0, 0, 0,
0, 0, 0, 0,
1, 0, 1, 0), 4, 4, byrow = TRUE)
# confusion matrix for adjacencies
confusion(x2, x1)
# confusion matrix for conditional orientations
confusion(x2, x1, type = "dir")
#############################################################################
# Compare estimated cpdag with true adjacency matrix ########################
#############################################################################
# simulate DAG adjacency matrix and Gaussian data
set.seed(123)
x3 <- matrix(c(0, 0, 0, 0,
1, 0, 1, 0,
0, 0, 0, 0,
0, 0, 1, 0), 4, 4, byrow = TRUE)
ex_data <- simGausFromDAG(x3, n = 50)
pcres <- pc(ex_data, sparsity = 0.1, test = corTest)
# compare adjacencies with true amat (x3)
confusion(pcres, x3)
# compare conditional orientations with true amat
confusion(pcres, x3, type = "dir")
Extract adjacency matrix from tpdag, cpdag, tpag or pag object
Description
If the input is a tpdag or cpdag, the resulting adjacency matrix A is a "from-to" matrix encoded as follows:
- A(i,j) = 1 and A(j,i) = 0 means there is an edge j -> i.
- A(j,i) = 1 and A(i,j) = 0 means there is an edge i -> j.
- A(i,j) = 1 and A(j,i) = 1 means there is an undirected edge between i and j, i - j.
- A(i,j) = 0 and A(j,i) = 0 means there is no edge between i and j.
Usage
amat(x)
Arguments
x |
|
Details
If the input is a tpag or pag, there are four possible entry values: 0 (no edge), 1 (circle), 2 (arrowhead), 3 (tail). It is still encoded as a "from-to" adjacency matrix, which means that e.g. A(i,j) = 1 places a circle at the i end of the edge between i and j. For example, if A(i,j) = 1 and A(j,i) = 2, we have that i o-> j. Similarly, A(i,j) = 2 and A(j,i) = 3 mean that i <- j.
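A minimal illustrative sketch, extracting the adjacency matrix from a TPC fit on the bundled tpcExample data:
data(tpcExample)
tpdag1 <- tpc(tpcExample, order = c("child", "youth", "oldage"),
              sparsity = 0.01, test = corTest)
# "from-to" adjacency matrix of the estimated TPDAG
amat(tpdag1)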
Convert adjacency matrix to graphNEL object
Description
Convert adjacency matrix to graphNEL object
Usage
as.graphNEL(amat)
Arguments
amat |
An adjacency matrix |
Value
A graphNEL object, see graphNEL-class.
Compute average degree for adjacency matrix
Description
Computes the average degree, i.e. the number of edges divided by the number of nodes.
Usage
average_degree(amat)
Arguments
amat |
An adjacency matrix |
Value
A numeric.
Compare two tpdag or tskeleton objects
Description
Compare edges in two tpdag objects or two tskeleton objects. Note that they should be based on the same variables. Only edge absence/presence is compared, not edge orientation.
Usage
compare(x, y = NULL)
Arguments
x |
First object |
y |
Second object (optional) |
Value
A list with entries: $nedges1 (the number of
edges in the first object), $nedges2 (the number of edges
in the second object), $psi1 (the test significance level
of the first object), $psi2 (the test significance level of
the second object), $nadded (the number of additional edges in
object 2, relative to object 1), and $nremoved (the number of
absent edges in object 2, relative to object 1).
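A minimal illustrative sketch, comparing two TPC fits of the bundled tpcExample data obtained at different sparsity levels:
data(tpcExample)
fit1 <- tpc(tpcExample, order = c("child", "youth", "oldage"),
            sparsity = 0.01, test = corTest)
fit2 <- tpc(tpcExample, order = c("child", "youth", "oldage"),
            sparsity = 0.1, test = corTest)
# compare edge presence/absence in the two fits
compare(fit1, fit2)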
Compute confusion matrix for comparing two adjacency matrices
Description
Two adjacency matrices are compared either in terms of adjacencies
(type = "adj") or orientations (type = "dir").
Usage
confusion(est_amat, true_amat, type = "adj")
Arguments
est_amat |
The estimated adjacency matrix, or |
true_amat |
The true adjacency matrix, or |
type |
String indicating whether the confusion matrix should be computed for adjacencies
( |
Details
Adjacency comparison: The confusion matrix is a cross-tabulation
of adjacencies. Hence, a true positive means that the two inputs agree on
the presence of an adjacency. A true negative means that the two inputs agree
on no adjacency. A false positive means that est_amat places an adjacency
where there should be none. A false negative means that est_amat does
not place an adjacency where there should have been one.
Orientation comparison: The orientation confusion matrix is conditional on agreement on adjacency. This means that only adjacencies that are shared in both input matrices are considered, and agreement wrt. orientation is then computed only among these edges that occur in both matrices. A true positive is a correctly placed arrowhead (1), a false positive marks placement of arrowhead (1) where there should have been a tail (0), a false negative marks placement of tail (0) where there should have been an arrowhead (1), and a true negative marks correct placement of a tail (0).
Value
A list with entries $tp (number of true positives), $tn (number of true negatives),
$fp (number of false positives), and $fn (number of false negatives).
Examples
#############################################################################
# Compare two adjacency matrices ############################################
#############################################################################
x1 <- matrix(c(0, 0, 0, 0,
1, 0, 1, 0,
1, 0, 0, 0,
0, 0, 1, 0), 4, 4, byrow = TRUE)
x2 <- matrix(c(0, 0, 1, 0,
1, 0, 0, 0,
0, 0, 0, 0,
1, 0, 1, 0), 4, 4, byrow = TRUE)
# confusion matrix for adjacencies
confusion(x2, x1)
# confusion matrix for conditional orientations
confusion(x2, x1, type = "dir")
#############################################################################
# Compare estimated cpdag with true adjacency matrix ########################
#############################################################################
# simulate DAG adjacency matrix and Gaussian data
set.seed(123)
x3 <- matrix(c(0, 0, 0, 0,
1, 0, 1, 0,
0, 0, 0, 0,
0, 0, 1, 0), 4, 4, byrow = TRUE)
ex_data <- simGausFromDAG(x3, n = 50)
pcres <- pc(ex_data, sparsity = 0.1, test = corTest)
# compare adjacencies with true amat (x3)
confusion(pcres, x3)
# compare conditional orientations with true amat
confusion(pcres, x3, type = "dir")
Test for vanishing partial correlations
Description
This function simply calls the gaussCItest
function from the pcalg package.
Usage
corTest(x, y, S, suffStat)
Arguments
x |
Index of x variable |
y |
Index of y variable |
S |
Index of S variable(s), possibly NULL |
suffStat |
Sufficient statistic; list with data, binary variables and order. |
Value
A numeric, which is the p-value of the test.
Compute confusion matrix for comparing two adjacency matrices
Description
Two adjacency matrices are compared either in terms of adjacencies
(type = "adj") or orientations (type = "dir").
Usage
dir_confusion(est_amat, true_amat)
Arguments
est_amat |
The estimated adjacency matrix, or |
true_amat |
The true adjacency matrix, or |
Details
Adjacency comparison: The confusion matrix is a cross-tabulation
of adjacencies. Hence, a true positive means that the two inputs agree on
the presence of an adjacency. A true negative means that the two inputs agree
on no adjacency. A false positive means that est_amat places an adjacency
where there should be none. A false negative means that est_amat does
not place an adjacency where there should have been one.
Orientation comparison: The orientation confusion matrix is conditional on agreement on adjacency. This means that only adjacencies that are shared in both input matrices are considered, and agreement wrt. orientation is then computed only among these edges that occur in both matrices. A true positive is a correctly placed arrowhead (1), a false positive marks placement of arrowhead (1) where there should have been a tail (0), a false negative marks placement of tail (0) where there should have been an arrowhead (1), and a true negative marks correct placement of a tail (0).
Value
A list with entries $tp (number of true positives), $tn (number of true negatives),
$fp (number of false positives), and $fn (number of false negatives).
Examples
#############################################################################
# Compare two adjacency matrices ############################################
#############################################################################
x1 <- matrix(c(0, 0, 0, 0,
1, 0, 1, 0,
1, 0, 0, 0,
0, 0, 1, 0), 4, 4, byrow = TRUE)
x2 <- matrix(c(0, 0, 1, 0,
1, 0, 0, 0,
0, 0, 0, 0,
1, 0, 1, 0), 4, 4, byrow = TRUE)
# confusion matrix for adjacencies
confusion(x2, x1)
# confusion matrix for conditional orientations
confusion(x2, x1, type = "dir")
#############################################################################
# Compare estimated cpdag with true adjacency matrix ########################
#############################################################################
# simulate DAG adjacency matrix and Gaussian data
set.seed(123)
x3 <- matrix(c(0, 0, 0, 0,
1, 0, 1, 0,
0, 0, 0, 0,
0, 0, 1, 0), 4, 4, byrow = TRUE)
ex_data <- simGausFromDAG(x3, n = 50)
pcres <- pc(ex_data, sparsity = 0.1, test = corTest)
# compare adjacencies with true amat (x3)
confusion(pcres, x3)
# compare conditional orientations with true amat
confusion(pcres, x3, type = "dir")
Compute confusion matrix for comparing two adjacency matrices
Description
Two adjacency matrices are compared either in terms of adjacencies
(type = "adj") or orientations (type = "dir").
Usage
dir_confusion_original(est_amat, true_amat)
Arguments
est_amat |
The estimated adjacency matrix, or |
true_amat |
The true adjacency matrix, or |
Details
This is an old version of the function, included for possible backwards compatibility. Edges are scored as follows: A correctly unoriented edge counts as a true negative (TN). An undirected edge that should have been directed counts as a false negative (FN). A directed edge that should have been undirected counts as a false positive (FP). A directed edge oriented in the correct direction counts as a true positive (TP). A directed edge oriented in the incorrect direction counts as both a false positive (FP) and a false negative (FN).
Value
A list with entries $tp (number of true positives), $tn (number of true negatives),
$fp (number of false positives), and $fn (number of false negatives).
Examples
#############################################################################
# Compare two adjacency matrices ############################################
#############################################################################
x1 <- matrix(c(0, 0, 0, 0,
1, 0, 1, 0,
1, 0, 0, 0,
0, 0, 1, 0), 4, 4, byrow = TRUE)
x2 <- matrix(c(0, 0, 1, 0,
1, 0, 0, 0,
0, 0, 0, 0,
1, 0, 1, 0), 4, 4, byrow = TRUE)
# confusion matrix for adjacencies
confusion(x2, x1)
# confusion matrix for conditional orientations
confusion(x2, x1, type = "dir")
#############################################################################
# Compare estimated cpdag with true adjacency matrix ########################
#############################################################################
# simulate DAG adjacency matrix and Gaussian data
set.seed(123)
x3 <- matrix(c(0, 0, 0, 0,
1, 0, 1, 0,
0, 0, 0, 0,
0, 0, 1, 0), 4, 4, byrow = TRUE)
ex_data <- simGausFromDAG(x3, n = 50)
pcres <- pc(ex_data, sparsity = 0.1, test = corTest)
# compare adjacencies with true amat (x3)
confusion(pcres, x3)
# compare conditional orientations with true amat
confusion(pcres, x3, type = "dir")
List of edges in adjacency matrix
Description
Produces a list of edges from an adjacency matrix.
Usage
edges(amat)
Arguments
amat |
An adjacency matrix. |
Value
A list consisting of two lists: One for oriented edges ($dir),
and one for unoriented edges ($undir).
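A minimal illustrative sketch, listing the edges of a simulated DAG adjacency matrix:
set.seed(123)
d <- simDAG(5)
# oriented ($dir) and unoriented ($undir) edges
edges(d)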
Convert essential graph to adjacency matrix
Description
Extracts the adjacency matrix from an EssGraph-class object. This object is returned
by score-based causal discovery algorithms in the pcalg package.
Usage
essgraph2amat(essgraph, p = length(essgraph$field(".nodes")))
Arguments
essgraph |
An |
p |
The number of nodes in the graph |
Value
An adjacency matrix (square matrix with 0/1 entries).
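A minimal illustrative sketch, assuming pcalg::ges() returns a list with an $essgraph entry (an EssGraph object):
set.seed(123)
x1 <- rnorm(100)
x2 <- rnorm(100)
x3 <- x1 + x2 + rnorm(100)
score <- gausCorScore(cor(data.frame(x1, x2, x3)), n = 100)
gesfit <- pcalg::ges(score)
# extract a 0/1 adjacency matrix from the EssGraph object
essgraph2amat(gesfit$essgraph)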
Evaluate adjacency matrix estimation
Description
Applies several different metrics to evaluate difference between estimated and true adjacency matrices. Intended to be used to evaluate performance of causal discovery algorithms.
Usage
evaluate(est, true, metrics, ...)
Arguments
est |
Estimated adjacency matrix/matrices. |
true |
True adjacency matrix/matrices. |
metrics |
List of metrics, see details. |
... |
Further arguments that depend on input type. Currently only |
Details
Two options for input are available: Either est and true
can be two adjacency matrices, or they can be two arrays of adjacency matrices.
The arrays should have shape n * p * p where n is the number of matrices,
and p is the number of nodes/variables.
The metrics should be given as a list with slots $adj, $dir and
$other. Metrics under $adj are applied to the adjacency confusion
matrix, while metrics under $dir are applied to the conditional orientation
confusion matrix (see confusion). Metrics under $other are applied
without computing confusion matrices first.
Available metrics to be used with confusion matrices are precision, recall, specificity, FOR, FDR, NPV, F1 and G1. The user can supply custom metrics as well: They need to have the confusion matrix as their first argument and should return a numeric.
The available metric to be used as "other" is shd. The user
can supply custom metrics as well: They need to have arguments est_amat and true_amat,
where the former is the estimated adjacency matrix and the latter is the true adjacency matrix. The
metrics should return a numeric.
Value
A data.frame with one column for each computed metric and one row per evaluated
matrix pair. Adjacency metrics are prefixed with "adj_", orientation metrics are prefixed
with "dir_", other metrics do not get a prefix. If the first argument is a matrix, list.out = TRUE
can be used to change the return object to a list instead. This list will contain three lists, where
adjacency, orientation and other metrics are reported, respectively.
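A minimal illustrative sketch of a call, assuming the confusion-based metrics may be referred to by name as character strings in each slot (custom metrics would instead be supplied as functions with the signatures described above):
x1 <- matrix(c(0, 0, 0, 0,
               1, 0, 1, 0,
               1, 0, 0, 0,
               0, 0, 1, 0), 4, 4, byrow = TRUE)
x2 <- matrix(c(0, 0, 1, 0,
               1, 0, 0, 0,
               0, 0, 0, 0,
               1, 0, 1, 0), 4, 4, byrow = TRUE)
# adjacency and orientation metrics plus SHD; metric names given as
# character strings (assumed interface)
evaluate(x2, x1,
         metrics = list(adj = c("precision", "recall"),
                        dir = c("precision", "recall"),
                        other = c("shd")))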
Evaluate adjacency matrix estimation
Description
Applies several different metrics to evaluate difference between estimated and true adjacency matrices. Intended to be used to evaluate performance of causal discovery algorithms.
Usage
## S3 method for class 'array'
evaluate(est, true, metrics, ...)
Arguments
est |
Estimated adjacency matrix/matrices. |
true |
True adjacency matrix/matrices. |
metrics |
List of metrics, see details. |
... |
Further arguments that depend on input type. Currently only |
Details
Two options for input are available: Either est and true
can be two adjacency matrices, or they can be two arrays of adjacency matrices.
The arrays should have shape n * p * p where n is the number of matrices,
and p is the number of nodes/variables.
The metrics should be given as a list with slots $adj, $dir and
$other. Metrics under $adj are applied to the adjacency confusion
matrix, while metrics under $dir are applied to the conditional orientation
confusion matrix (see confusion). Metrics under $other are applied
without computing confusion matrices first.
Available metrics to be used with confusion matrices are precision, recall, specificity, FOR, FDR, NPV, F1 and G1. The user can supply custom metrics as well: They need to have the confusion matrix as their first argument and should return a numeric.
The available metric to be used as "other" is shd. The user
can supply custom metrics as well: They need to have arguments est_amat and true_amat,
where the former is the estimated adjacency matrix and the latter is the true adjacency matrix. The
metrics should return a numeric.
Value
A data.frame with one column for each computed metric and one row per evaluated
matrix pair. Adjacency metrics are prefixed with "adj_", orientation metrics are prefixed
with "dir_", other metrics do not get a prefix. If the first argument is a matrix, list.out = TRUE
can be used to change the return object to a list instead. This list will contain three lists, where
adjacency, orientation and other metrics are reported, respectively.
Evaluate adjacency matrix estimation
Description
Applies several different metrics to evaluate difference between estimated and true adjacency matrices. Intended to be used to evaluate performance of causal discovery algorithms.
Usage
## S3 method for class 'matrix'
evaluate(est, true, metrics, list.out = FALSE, ...)
Arguments
est |
Estimated adjacency matrix/matrices. |
true |
True adjacency matrix/matrices. |
metrics |
List of metrics, see details. |
list.out |
If |
... |
Further arguments that depend on input type. Currently only |
Details
Two options for input are available: Either est and true
can be two adjacency matrices, or they can be two arrays of adjacency matrices.
The arrays should have shape n * p * p where n is the number of matrices,
and p is the number of nodes/variables.
The metrics should be given as a list with slots $adj, $dir and
$other. Metrics under $adj are applied to the adjacency confusion
matrix, while metrics under $dir are applied to the conditional orientation
confusion matrix (see confusion). Metrics under $other are applied
without computing confusion matrices first.
Available metrics to be used with confusion matrices are precision, recall, specificity, FOR, FDR, NPV, F1 and G1. The user can supply custom metrics as well: They need to have the confusion matrix as their first argument and should return a numeric.
The available metric to be used as "other" is shd. The user
can supply custom metrics as well: They need to have arguments est_amat and true_amat,
where the former is the estimated adjacency matrix and the latter is the true adjacency matrix. The
metrics should return a numeric.
Value
A data.frame with one column for each computed metric and one row per evaluated
matrix pair. Adjacency metrics are prefixed with "adj_", orientation metrics are prefixed
with "dir_", other metrics do not get a prefix. If the first argument is a matrix, list.out = TRUE
can be used to change the return object to a list instead. This list will contain three lists, where
adjacency, orientation and other metrics are reported, respectively.
Evaluate adjacency matrix estimation
Description
Applies several different metrics to evaluate difference between estimated and true adjacency matrices. Intended to be used to evaluate performance of causal discovery algorithms.
Usage
## S3 method for class 'tamat'
evaluate(est, true, metrics, ...)
Arguments
est |
Estimated adjacency matrix/matrices. |
true |
True adjacency matrix/matrices. |
metrics |
List of metrics, see details. |
... |
Further arguments that depend on input type. Currently only |
Details
Two options for input are available: Either est and true
can be two adjacency matrices, or they can be two arrays of adjacency matrices.
The arrays should have shape n * p * p where n is the number of matrices,
and p is the number of nodes/variables.
The metrics should be given as a list with slots $adj, $dir and
$other. Metrics under $adj are applied to the adjacency confusion
matrix, while metrics under $dir are applied to the conditional orientation
confusion matrix (see confusion). Metrics under $other are applied
without computing confusion matrices first.
Available metrics to be used with confusion matrices are precision, recall, specificity, FOR, FDR, NPV, F1 and G1. The user can supply custom metrics as well: They need to have the confusion matrix as their first argument and should return a numeric.
The available metric to be used as "other" is shd. The user
can supply custom metrics as well: They need to have arguments est_amat and true_amat,
where the former is the estimated adjacency matrix and the latter is the true adjacency matrix. The
metrics should return a numeric.
Value
A data.frame with one column for each computed metric and one row per evaluated
matrix pair. Adjacency metrics are prefixed with "adj_", orientation metrics are prefixed
with "dir_", other metrics do not get a prefix. If the first argument is a matrix, list.out = TRUE
can be used to change the return object to a list instead. This list will contain three lists, where
adjacency, orientation and other metrics are reported, respectively.
Perform causal discovery using the FCI algorithm
Description
This is a wrapper function for the fci function as
implemented in the pcalg package. All computations are carried out by the
pcalg package. The default output object however matches that of tfci, see
this function for details about how the adjacency matrix is encoded.
Usage
fci(
data = NULL,
sparsity = 10^(-1),
test = regTest,
suffStat = NULL,
method = "stable.fast",
methodNA = "none",
methodOri = "conservative",
output = "pag",
varnames = NULL,
...
)
Arguments
data |
A data.frame with data. All variables should be assigned to exactly one period by prefixing them with the period name (see example below). |
sparsity |
The sparsity level to be used for independence testing (i.e. significance level threshold to use for each test). |
test |
A procedure for testing conditional independence.
The default, |
suffStat |
Sufficient statistic. If this argument is supplied, the sufficient statistic is not computed from the inputted data. The format and contents of the sufficient statistic depends on which test is being used. |
method |
Which method to use for skeleton construction, must be
|
methodNA |
Method for handling missing information ( |
methodOri |
Method for handling conflicting separating sets when orienting
edges, must be one of |
output |
One of |
varnames |
A character vector of variable names. It only needs to be supplied
if the |
... |
Further optional arguments which are passed to
|
Examples
# simulate linear Gaussian data w unobserved variable L1
n <- 100
L1 <- rnorm(n)
X1 <- rnorm(n)
X2 <- L1 + X1 + rnorm(n)
X3 <- X1 + rnorm(n)
X4 <- X3 + L1 + rnorm(n)
d <- data.frame(p1_X1 = X1,
p1_X2 = X2,
p2_X3 = X3,
p2_X4 = X4)
# use FCI algorithm to recover PAG
fci(d, test = corTest)
Gaussian L0 score computed on correlation matrix
Description
The score is intended to be used with score-based causal discovery algorithms from the pcalg package. It is identical to the pcalg::GaussL0penObsScore, except that it takes in a correlation matrix instead of the full data set.
Usage
gausCorScore(cormat, n, p = NULL, lambda = NULL, ...)
Arguments
cormat |
A correlation matrix. Needs to be symmetric. |
n |
The number of observations in the dataset that the correlation matrix was computed from. |
p |
The number of variables. This is inferred from the cormat if not supplied. |
lambda |
Penalty to use for the score. If |
... |
Other arguments passed along to pcalg::GaussL0penObsScore. |
Value
A Score object (S4), see pcalg::Score.
Examples
# Simulate data and compute correlation matrix
x1 <- rnorm(100)
x2 <- rnorm(100)
x3 <- x1 + x2 + rnorm(100)
d <- data.frame(x1, x2, x3)
cmat <- cor(d)
# Use gausCorScore with pcalg::ges()
pcalg::ges(gausCorScore(cmat, n = 100))
Get variables with a specific prefix (character method)
Description
Get variables with a specific prefix (character method)
Usage
## S3 method for class 'character'
getvar(x, prefix, sep = "_")
Arguments
x |
Character vector |
prefix |
Prefix to match |
sep |
Separator, default "_" |
Value
Subset of x that matches the prefix
Get variables with a specific prefix (data.frame method)
Description
Get variables with a specific prefix (data.frame method)
Usage
## S3 method for class 'data.frame'
getvar(x, prefix, sep = "_")
Arguments
x |
A data.frame |
prefix |
Prefix to match |
sep |
Separator, default "_" |
Value
Names of columns in x that match the prefix
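A minimal illustrative sketch, assuming the generic getvar() dispatches on the class of its first argument:
# character method: returns the elements of x with the given prefix
getvar(c("child_x", "child_y", "adult_z"), prefix = "child")
# data.frame method: returns the names of columns with the given prefix
d <- data.frame(child_x = 1:3, child_y = 4:6, adult_z = 7:9)
getvar(d, prefix = "adult")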
Convert graphNEL object to adjacency matrix
Description
Convert graphNEL object to adjacency matrix
Usage
graph2amat(graph, toFrom = TRUE, type = "pdag")
Arguments
graph |
A graphNEL object. |
toFrom |
Logical indicating whether the resulting adjacency matrix is "to-from" (default), or "from-to", see details. |
type |
The type of adjacency matrix, must be one of |
Details
A "to-from" pdag adjacency matrix is encoded as follows: A(i,j) = 1 and A(j,i) = 0
means there is an edge i -> j. A(j,i) = 1 and A(i,j) = 0 means there is an edge j -> i.
A(i,j) = 1 and A(j,i) = 1 means there is an undirected edge between i and j, i - j.
A(i,j) = 0 and A(j,i) = 0 means there is no edge between i and j.
A "from-to" adjacency matrix is the transpose of a "to-from" adjacency matrix.
A "from-to" pdag adjacency matrix is hence encoded as follows: A(i,j) = 1 and A(j,i) = 0
means there is an edge j -> i. A(j,i) = 1 and A(i,j) = 0 means there is an edge i -> j.
A(i,j) = 1 and A(j,i) = 1 means there is an undirected edge between i and j, i - j.
A(i,j) = 0 and A(j,i) = 0 means there is no edge between i and j.
See amat for details about how an ag adjacency matrix is encoded.
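A minimal illustrative sketch, converting a simulated DAG adjacency matrix to a graphNEL object and back (using the default settings, i.e. a "to-from" pdag-type matrix):
set.seed(123)
d <- simDAG(4)
g <- as.graphNEL(d)
graph2amat(g)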
Check for CPDAG
Description
Check for CPDAG
Usage
is_cpdag(amat)
Arguments
amat |
An adjacency matrix |
Details
Check: Is adjacency matrix proper CPDAG? See isValidGraph for
definition.
Value
A logical.
Check for PDAG
Description
Check for PDAG
Usage
is_pdag(amat)
Arguments
amat |
An adjacency matrix |
Details
Check: Is adjacency matrix proper PDAG? See isValidGraph for
definition.
Value
A logical.
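A minimal illustrative sketch, checking a simulated DAG adjacency matrix (a DAG is in particular a valid PDAG):
set.seed(123)
d <- simDAG(4)
is_pdag(d)
is_cpdag(d)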
Generate Latex tikz code for plotting a temporal DAG, PDAG or PAG.
Description
Generate Latex tikz code for plotting a temporal DAG, PDAG or PAG.
Usage
maketikz(
model,
xjit = 2,
yjit = 2,
markperiods = TRUE,
xpgap = 4,
annotateEdges = NULL,
addAxis = TRUE,
varLabels = NULL,
periodLabels = NULL,
annotationLabels = NULL,
clipboard = TRUE,
rawout = FALSE,
colorAnnotate = NULL,
bendedges = FALSE
)
Arguments
model |
|
xjit |
How much nodes within a period should be jittered horizontally. |
yjit |
Vertical distance between nodes within a period. |
markperiods |
If |
xpgap |
Horizontal gap between different periods. |
annotateEdges |
If |
addAxis |
If |
varLabels |
Optional labels for nodes (variables). Should be given as a named list, where
the name is the variable name, and the entry is the label, e.g. |
periodLabels |
Optional labels for periods. Should be given as a named list, where
the name is the period name (as stored in the |
annotationLabels |
Optional labels for edge annotations. Only used if |
clipboard |
If |
rawout |
If |
colorAnnotate |
Named list of colors to use to mark edge annotations instead of labels. This
overrules |
bendedges |
If |
Details
Note that it is necessary to read in relevant tikz libraries in the
Latex preamble. The relevant lines of code are (depending a bit on parameter settings):
\usepackage{tikz}
\usetikzlibrary{arrows.meta,arrows,shapes,decorations,automata,backgrounds,petri}
\usepackage{pgfplots}
Value
Silently returns a character vector with lines of tikz code. The function
furthermore has a side-effect. If clipboard = TRUE, the side-effect is that the tikz
code is also copied to the clipboard. If clipboard = FALSE, the tikz code is instead printed
in the console.
Examples
# Make tikz figure code from tpdag, print code to screen
data(tpcExample)
tpdag1 <- tpc(tpcExample, order = c("child", "youth", "oldage"), sparsity = 0.01,
test = corTest)
maketikz(tpdag1, clipboard = FALSE)
# Make tikz figure code from tamat, copy code to clipboard
thisdag <- simDAG(5)
rownames(thisdag) <- colnames(thisdag) <- c("child_x", "child_y",
"child_z", "adult_x",
"adult_y")
thistamat <- tamat(thisdag, order = c("child", "adult"))
## Not run:
maketikz(thistamat)
## End(Not run)
Compute maximal number of edges for graph
Description
Computes the number of edges a graph with p nodes will have if it is
fully connected.
Usage
maxnedges(p)
Arguments
p |
The number of nodes in the graph |
Value
A numeric.
Number of different DAGs
Description
Computes the number of different possible DAGs that can be constructed over a given number of nodes.
Usage
nDAGs(p)
Arguments
p |
The number of nodes. |
Value
A numeric.
Number of edges in adjacency matrix
Description
Counts the number of edges in an adjacency matrix.
Usage
nedges(amat)
Arguments
amat |
An adjacency matrix |
Value
A numeric (non-negative integer).
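A minimal illustrative sketch, combining the counting utilities above on a simulated 5-node DAG:
set.seed(123)
d <- simDAG(5)
maxnedges(5)       # edges in a fully connected graph on 5 nodes
nDAGs(5)           # number of distinct DAGs over 5 nodes
nedges(d)          # number of edges in the simulated DAG
average_degree(d)  # number of edges divided by number of nodes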
Perform causal discovery using the PC algorithm
Description
This is a wrapper function for the pc function as
implemented in the pcalg package. All computations are carried out by the
pcalg package.
Usage
pc(
data = NULL,
sparsity = 10^(-1),
test = regTest,
suffStat = NULL,
method = "stable.fast",
methodNA = "none",
methodOri = "conservative",
output = "cpdag",
varnames = NULL,
conservative = TRUE,
...
)
Arguments
data |
A data.frame with data. All variables should be assigned to exactly one period by prefixing them with the period name (see example below). |
sparsity |
The sparsity level to be used for independence testing (i.e. significance level threshold to use for each test). |
test |
A procedure for testing conditional independence.
The default, |
suffStat |
Sufficient statistic. If this argument is supplied, the sufficient statistic is not computed from the inputted data. The format and contents of the sufficient statistic depends on which test is being used. |
method |
Which method to use for skeleton construction, must be
|
methodNA |
Method for handling missing information ( |
methodOri |
Method for handling conflicting separating sets when orienting
edges, must be one of |
output |
One of |
varnames |
A character vector of variable names. It only needs to be supplied
if the |
conservative |
Logical, if |
... |
Further optional arguments which are passed to
|
Details
Note that all independence test procedures implemented
in the pcalg package may be used, see pc.
The methods for handling missing information require that the data,
rather than the suffStat argument is used for inputting data; the latter
assumes no missing information and hence always sets methodNA = "none".
If the test is corTest, test-wise deletion is performed when computing the
sufficient statistic (correlation matrix) (so for each pair of variables, only
complete cases are used). If the test is regTest, test-wise deletion
is performed for each conditional independence test instead.
Value
A tpdag or tskeleton object. Both return types are
S3 objects, i.e., lists with entries: $amat (the estimated adjacency
matrix), $order (character vector with the order, as inputted to
this function), $psi (the significance level used for testing), and
$ntests (the number of tests conducted).
Examples
# PC on included example data, use sparsity psi = 0.01, default test (regression-based
#information loss):
data(tpcExample)
pc(tpcExample, sparsity = 0.01)
# PC on included example data, use sparsity psi = 0.01, use test for vanishing partial
# correlations:
data(tpcExample)
pc(tpcExample, sparsity = 0.01, test = corTest)
Plot partial ancestral graph (PAG)
Description
Plot partial ancestral graph (PAG)
Usage
## S3 method for class 'pag'
plot(x, ...)
Arguments
x |
pag object
to be plotted (as outputted from |
... |
Currently not in use. |
Value
No return value, the function is called for its side-effects (plotting).
Author(s)
This code is a modification of the fciAlgo plotting method implemented in the pcalg package.
Examples
# simulate linear Gaussian data w unobserved variable L1
n <- 100
L1 <- rnorm(n)
X1 <- rnorm(n)
X2 <- L1 + X1 + rnorm(n)
X3 <- X1 + rnorm(n)
X4 <- X3 + L1 + rnorm(n)
d <- data.frame(p1_X1 = X1,
p1_X2 = X2,
p2_X3 = X3,
p2_X4 = X4)
# use FCI algorithm to recover PAG
res <- fci(d, test = corTest)
# plot
plot(res)
Plot adjacency matrix with order information
Description
Plot adjacency matrix with order information
Usage
## S3 method for class 'tamat'
plot(x, ...)
Arguments
x |
tamat (temporal adjacency matrix) object to be plotted
(as outputted from |
... |
Further plotting arguments passed along to |
Value
No return value, the function is called for its side-effects (plotting).
Plot temporal partial ancestral graph (TPAG)
Description
Plot temporal partial ancestral graph (TPAG)
Usage
## S3 method for class 'tpag'
plot(x, ...)
Arguments
x |
tpag object
to be plotted (as outputted from |
... |
Currently not in use. |
Value
No return value, the function is called for its side-effects (plotting).
Author(s)
This code is a modification of the fciAlgo plotting method implemented in the pcalg package.
Examples
# simulate linear Gaussian data w unobserved variable L1
n <- 100
L1 <- rnorm(n)
X1 <- rnorm(n)
X2 <- L1 + X1 + rnorm(n)
X3 <- X1 + rnorm(n)
X4 <- X3 + L1 + rnorm(n)
d <- data.frame(p1_X1 = X1,
p1_X2 = X2,
p2_X3 = X3,
p2_X4 = X4)
# use FCI algorithm to recover PAG
res <- tfci(d, order = c("p1", "p2"), test = corTest)
# plot
plot(res)
Plot temporal partially directed acyclic graph (TPDAG)
Description
Plot temporal partially directed acyclic graph (TPDAG)
Usage
## S3 method for class 'tpdag'
plot(x, ...)
Arguments
x |
tpdag (temporal partially directed acyclic graph) object
to be plotted (as outputted from |
... |
Further plotting arguments passed along to |
Value
No return value, the function is called for its side-effects (plotting).
Plot temporal skeleton
Description
Plot temporal skeleton
Usage
## S3 method for class 'tskeleton'
plot(x, ...)
Arguments
x |
tskeleton (temporal skeleton) object to be plotted
(as outputted from |
... |
Further plotting arguments passed along to |
Value
No return value, the function is called for its side-effects (plotting).
Plot temporal data generating mechanism
Description
Plots tpdag, tskeleton and tamat objects.
Usage
plotTempoMech(
x,
addTimeAxis = TRUE,
addPsi = TRUE,
varLabels = NULL,
periodLabels = NULL,
colors = NULL,
...
)
Arguments
x |
The tpdag/tskeleton or tamat to plot. |
addTimeAxis |
Logical indicating whether a time axis should be added to the plot. |
addPsi |
Logical indicating whether the sparsity level should be added to the plot. |
varLabels |
A named list of variable labels. |
periodLabels |
A character vector with labels for periods. |
colors |
A character vector with colors to use for marking periods. Should have at least as many elements as the numbers of periods. |
... |
Additional arguments passed to |
Value
No return value, the function is called for its side-effects (plotting).
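A minimal illustrative sketch, assuming the bundled tpcExample data and a TPC fit as input:
data(tpcExample)
fit <- tpc(tpcExample, order = c("child", "youth", "oldage"),
           sparsity = 0.01, test = corTest)
plotTempoMech(fit)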
Precision
Description
Computes precision (aka positive predictive value) from a confusion matrix, see confusion. Precision is defined as TP/(TP + FP), where TP are true positives and FP are false positives. If TP + FP = 0, 0 is returned.
Usage
precision(confusion)
Arguments
confusion |
Confusion matrix as obtained from confusion |
Value
A numeric in [0,1].
Convert a matrix of probabilities into an adjacency matrix
Description
Convert a matrix of probabilities into an adjacency matrix
Usage
probmat2amat(
probmat,
threshold,
method = "cutoff",
keep_vnames = TRUE,
graph_criterion = "pdag",
deletesym = FALSE
)
Arguments
probmat |
Square matrix of probabilities. |
threshold |
Value between 0 and 1. Any probabilities lower than this value will be set to 0 (no arrowhead). |
method |
Either |
keep_vnames |
If |
graph_criterion |
Which criterion to check if the output graph fulfills for the bpco
method. Should be one of |
deletesym |
If |
Details
Two methods for converting the probability matrix into an adjacency
matrix are implemented. First, the cutoff-method (method = "cutoff") simply
uses a threshold value and sets all values below that to zero in the outputted
adjacency matrix. No checks are performed to ensure that the resulting
matrix is a proper dag/pdag/cpdag adjacency matrix. Second, the backwards
PC orientation method (method = "bpco") first uses a cutoff, and then
sets further elements to zero until the resulting matrix can be converted into
a proper adjacency matrix (using the graph criterion specified in the
graph_criterion argument) by applying the PC algorithm orientation rules.
See Petersen et al. 2022 for further details.
Value
A square adjacency matrix.
References
Petersen, Anne Helby, et al. "Causal discovery for observational sciences using supervised machine learning." arXiv preprint arXiv:2202.12813 (2022).
Examples
#Make random probability matrix that can be
#converted into an adjacency matrix
pmat <- matrix(runif(25, 0, 1), 5, 5)
diag(pmat) <- 0
#Convert to adjacency matrix using cutoff-method (threshold = 0.5)
probmat2amat(pmat, threshold = 0.5)
#Convert to adjacency matrix using BPCO-method (threshold = 0.5)
probmat2amat(pmat, threshold = 0.5, method = "bpco")
Recall
Description
Computes recall from a confusion matrix, see confusion. Recall is defined as TP/(TP + FN), where TP are true positives and FN are false negatives. If TP + FN = 0, 0 is returned.
Usage
recall(confusion)
Arguments
confusion |
Confusion matrix as obtained from confusion |
Value
A numeric in [0,1].
Regression-based information loss test
Description
We test whether x and y are associated, given
S using a generalized linear model.
Usage
regTest(x, y, S, suffStat)
Arguments
x |
Index of x variable |
y |
Index of y variable |
S |
Index of S variable(s), possibly NULL |
suffStat |
Sufficient statistic; list with data, binary variables and order. |
Details
All included variables should be either numeric or binary. If
y is binary, a logistic regression model is fitted. If y is numeric,
a linear regression model is fitted. x and S are included as
explanatory variables. Any numeric variables among x and S are
modeled with spline expansions (natural splines, 3 df). This model is tested
against a model where x (including a possible spline expansion) has
been left out, using a likelihood ratio test.
The model is fitted in both directions (interchanging the roles
of x and y). The final p-value is the maximum of the two
obtained p-values.
Value
A numeric, which is the p-value of the test.
Structural Hamming distance between adjacency matrices
Description
Computes the structural Hamming distance between two adjacency matrices. This implementation
is a modification of the shd function from the pcalg package, but here we
avoid working on the heavy graphNEL objects for representing graphs that are used in the
pcalg package.
Usage
shd(est_amat, true_amat)
Arguments
est_amat |
Estimated adjacency matrix |
true_amat |
True adjacency matrix |
Details
Note that the function is symmetric in the two inputted adjacency matrices.
Value
A numeric (a non-negative integer).
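A minimal illustrative sketch, comparing two independently simulated DAG adjacency matrices:
set.seed(123)
d1 <- simDAG(5)
d2 <- simDAG(5)
shd(d1, d2)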
Simulate a random DAG
Description
Simulates a random directed acyclic graph adjacency (DAG) matrix with the provided edge sparsity. The edge sparsity is the percentage of edges that are absent, relative to a fully connected DAG.
Usage
simDAG(p, sparsity = NULL, sparsityLim = c(0, 0.8), permute = TRUE)
Arguments
p |
The number of nodes. |
sparsity |
If |
sparsityLim |
A vector of two numerics, both must be in [0,1]. |
permute |
If |
Value
An adjacency matrix.
Examples
# Simulate a DAG adjacency matrix with 5 nodes
simDAG(5)
Simulate Gaussian data according to DAG
Description
Simulates a jointly Gaussian dataset given a DAG adjacency matrix ("from-to" encoding, see amat for details). The data is simulated using linear structural equations and the parameters (residual standard deviations and regression coefficients) are sampled from chosen intervals.
Usage
simGausFromDAG(
amat,
n,
regparLim = c(0.5, 2),
resSDLim = c(0.1, 1),
pnegRegpar = 0.4,
standardize = FALSE
)
Arguments
amat |
An adjacency matrix. |
n |
The number of observations that should be simulated. |
regparLim |
The interval from which regression parameters are sampled. |
resSDLim |
The interval from which residual standard deviations are sampled. |
pnegRegpar |
The probability of sampling a negative regression parameter. |
standardize |
If |
Details
A variable X_{i} is simulated as
X_{i} := \sum_{Z \in pa(X_{i})} \beta_{Z} * Z + e_{i}
where pa(X_{i}) are the parents of X_{i} in the DAG.
The residual, e_{i}, is drawn from a normal distribution.
Value
A data.frame of identically distributed simulated observations.
Examples
# Simulate DAG adjacency matrix with 6 nodes
ex_dag <- simDAG(6)
# Simulate Gaussian data (100 iid observations)
ex_data <- simGausFromDAG(ex_dag, n = 100)
Specificity
Description
Computes specificity from a confusion matrix, see confusion. Specificity is defined as TN/(TN + FP), where TN are true negatives and FP are false positives. If TN + FP = 0, 0 is returned.
Usage
specificity(confusion)
Arguments
confusion |
Confusion matrix as obtained from confusion |
Value
A numeric in [0,1].
Make a temporal adjacency matrix
Description
Make a temporal adjacency matrix
Usage
tamat(amat, order, type = NULL)
Arguments
amat |
Adjacency matrix. Row names and column names should be identical and be the names of the variables/nodes. Variable names should be prefixed with their period, e.g. "child_x" for variable "x" at period "child" |
order |
A character vector with the periods in their order. |
type |
The type of adjacency matrix, must be one of |
Value
A tamat object, which is a matrix with an "order"
attribute (a character vector listing the temporal order of the variables
in the adjacency matrix).
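A minimal illustrative sketch, mirroring the maketikz() example above:
set.seed(123)
thisdag <- simDAG(5)
rownames(thisdag) <- colnames(thisdag) <- c("child_x", "child_y",
                                            "child_z", "adult_x",
                                            "adult_y")
thistamat <- tamat(thisdag, order = c("child", "adult"))
thistamat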
Perform causal discovery using the temporal FCI algorithm (TFCI)
Description
Use a modification of the FCI algorithm that makes use of background knowledge in the format of a partial ordering. This may for instance come about when variables can be assigned to distinct tiers or periods (i.e., a temporal ordering).
Usage
tfci(
data = NULL,
order,
sparsity = 10^(-1),
test = regTest,
suffStat = NULL,
method = "stable.fast",
methodNA = "none",
methodOri = "conservative",
varnames = NULL,
...
)
Arguments
data |
A data.frame with data. All variables should be assigned to exactly one period by prefixing them with the period name (see example below). |
order |
A character vector with period-prefixes in their temporal order (see example below). |
sparsity |
The sparsity level to be used for independence testing (i.e. significance level threshold to use for each test). |
test |
A procedure for testing conditional independence.
The default, |
suffStat |
Sufficient statistic. If this argument is supplied, the sufficient statistic is not computed from the inputted data. The format and contents of the sufficient statistic depends on which test is being used. |
method |
Which method to use for skeleton construction, must be
|
methodNA |
Method for handling missing information ( |
methodOri |
Method for handling conflicting separating sets when orienting
edges, must be one of |
varnames |
A character vector of variable names. It only needs to be supplied
if the |
... |
Further optional arguments which are passed to
|
Details
The temporal/tiered background information enters several places in the TFCI algorithm: 1) In the skeleton construction phase, when looking for separating sets Z between two variables X and Y, Z is not allowed to contain variables that are strictly after both X and Y in the temporal order. 2) This also applies to the subsequent phase where the algorithm searches for possible D-SEP sets. 3) Prior to other orientation steps, any cross-tier edges get an arrowhead placed at their latest node.
After this, the usual FCI orientation rules are applied, see udag2pag for details.
Value
The default output is a tpag object. This is an
S3 object, i.e., a list, with entries: $tamat (the estimated adjacency
matrix), $order (character vector with the order, as inputted to
this function), $psi (the significance level used for testing), and
$ntests (the number of tests conducted).
The adjacency matrix A has four possible entry values: 0 (no edge), 1 (circle),
2 (arrowhead), 3 (tail). It is encoded as a "from-to" adjacency matrix, which means
that e.g. A(i,j) = 1 places a circle at the i end of the edge between i and j. For example, if
A(i,j) = 1 and A(j,i) = 2, we have that i o-> j. Similarly, A(i,j) = 2 and A(j,i) = 3
mean that i <- j. Note that this is a transposed version of the adjacency
matrix outputted for fciAlgo objects from the pcalg package, which
is "to-from". But it is consistent with the "from-to" adjacency matrices used
for pcAlgo objects from the same package.
Author(s)
Anne Helby Petersen, Qixiang Chen, and Daniel Malinsky.
Examples
# simulate linear Gaussian data w unobserved variable L1
set.seed(123)
n <- 100
L1 <- rnorm(n)
X1 <- rnorm(n)
X2 <- L1 + X1 + rnorm(n)
X3 <- X1 + rnorm(n)
X4 <- X3 + L1 + rnorm(n)
d <- data.frame(p1_X1 = X1,
p1_X2 = X2,
p2_X3 = X3,
p2_X4 = X4)
# use tfci algorithm to recover tpag (conservative edge orientation)
tfci(d, test = corTest, order = c("p1", "p2"))
# use tfci with standard (non-conservative) method for edge orientation
tfci(d, test = corTest, order = c("p1", "p2"), methodOri = "standard")
Estimate the restricted Markov equivalence class using Temporal Greedy Equivalence Search
Description
Perform causal discovery using the temporal greedy equivalence search algorithm.
Usage
tges(score, verbose = FALSE)
Arguments
score |
Tiered scoring object to be used. At the moment, the only supported scores are TemporalBIC and TemporalBDeu. |
verbose |
indicates whether debug output should be printed. |
Value
tges returns a tamat object, which is a matrix with an
"order" attribute (a character vector listing the temporal order of the variables in the adjacency matrix).
Author(s)
Tobias Ellegaard Larsen
See Also
Examples
#Simulate Gaussian data
set.seed(123)
n <- 500
child_x <- rnorm(n)
child_y <- 0.5*child_x + rnorm(n)
child_z <- 2*child_x + child_y + rnorm(n)
adult_x <- child_x + rnorm(n)
adult_z <- child_z + rnorm(n)
adult_w <- 2*adult_z + rnorm(n)
adult_y <- 2*child_x + adult_w + rnorm(n)
simdata <- data.frame(child_x, child_y, child_z,
adult_x, adult_z, adult_w,
adult_y)
# Define order in prefix way
prefix_order <- c("child", "adult")
# Define TBIC score
t_score <- new("TemporalBIC", order = prefix_order
, data = simdata)
# Run tges
tges_pre <- tges(t_score)
# Plot MPDAG
plot(tges_pre)
# Define order in integer way
integer_order <- c(1,1,1,2,2,2,2)
# Define TBIC score
t_score <- new("TemporalBIC", order = integer_order
, data = simdata)
# Run tges
tges_int <- tges(t_score)
# Plot MPDAG
plot(tges_int)
Perform causal discovery using the temporal PC algorithm (TPC)
Description
Perform causal discovery using the temporal PC algorithm (TPC)
Usage
tpc(
data = NULL,
order,
sparsity = 10^(-1),
test = regTest,
suffStat = NULL,
method = "stable.fast",
methodNA = "none",
methodOri = "conservative",
output = "tpdag",
varnames = NULL,
...
)
Arguments
data |
A data.frame with data. All variables should be assigned to exactly one period by prefixing them with the period name (see example below). |
order |
A character vector with period-prefixes in their temporal order (see example below). |
sparsity |
The sparsity level to be used for independence testing (i.e. significance level threshold to use for each test). |
test |
A procedure for testing conditional independence.
The default, |
suffStat |
Sufficient statistic. If this argument is supplied, the sufficient statistic is not computed from the inputted data. The format and contents of the sufficient statistic depends on which test is being used. |
method |
Which method to use for skeleton construction, must be
|
methodNA |
Method for handling missing information ( |
methodOri |
Method for handling conflicting separating sets when orienting edges. Currently only the conservative method is available. |
output |
One of |
varnames |
A character vector of variable names. It only needs to be supplied
if the |
... |
Further optional arguments which are passed to
|
Details
Note that all independence test procedures implemented
in the pcalg package may be used, see pc.
The methods for handling missing information require that the data,
rather than the suffStat argument is used for inputting data; the latter
assumes no missing information and hence always sets methodNA = "none".
If the test is corTest, test-wise deletion is performed when computing the
sufficient statistic (correlation matrix) (so for each pair of variables, only
complete cases are used). If the test is regTest, test-wise deletion
is performed for each conditional independence test instead.
Value
A tpdag or tskeleton object. Both return types are
S3 objects, i.e., lists with entries: $tamat (the estimated adjacency
matrix), $order (character vector with the order, as inputted to
this function), $psi (the significance level used for testing), and
$ntests (the number of tests conducted).
Examples
#TPC on included example data, use sparsity psi = 0.01, default test (regression-based
#information loss):
data(tpcExample)
tpc(tpcExample, order = c("child", "youth", "oldage"), sparsity = 0.01)
#TPC on included example data, use sparsity psi = 0.01, use test for vanishing partial
# correlations:
data(tpcExample)
tpc(tpcExample, order = c("child", "youth", "oldage"), sparsity = 0.01,
test = corTest)
#TPC on another simulated data set
#Simulate data
set.seed(123)
n <- 500
child_x <- rnorm(n)^2
child_y <- 0.5*child_x + rnorm(n)
child_z <- sample(c(0,1), n, replace = TRUE,
prob = c(0.3, 0.7))
adult_x <- child_x + rnorm(n)
adult_z <- as.numeric(child_z + rnorm(n) > 0)
adult_w <- 2*adult_z + rnorm(n)
adult_y <- 2*sqrt(child_x) + adult_w^2 + rnorm(n)
simdata <- data.frame(child_x, child_y, child_z,
adult_x, adult_z, adult_w,
adult_y)
#Define order
simorder <- c("child", "adult")
#Perform TPC with sparsity psi = 0.001
results <- tpc(simdata, order = simorder, sparsity = 10^(-3))
Simulated data example
Description
A small simulated data example intended to showcase the TPC algorithm. Note that the variable name prefixes define which period they are related to ("child", "youth" or "oldage").
Usage
tpcExample
Format
A data.frame with 200 rows and 6 variables.
- child_x1: Structural equation: X1 := \epsilon_1 with \epsilon_1 ~ Unif{0,1}
- child_x2: Structural equation: X2 := 2 * X1 + \epsilon_2 with \epsilon_2 ~ N(0,1)
- youth_x3: Structural equation: X3 := \epsilon_3 with \epsilon_3 ~ Unif{0,1}
- youth_x4: Structural equation: X4 := X2 + \epsilon_4 with \epsilon_4 ~ N(0,1)
- oldage_x5: Structural equation: X5 := X3^2 + X3 - 3 * X2 + \epsilon_5 with \epsilon_5 ~ N(0,1)
- oldage_x6: Structural equation: X6 := X4^3 + X4^2 + 2 * X5 + \epsilon_6 with \epsilon_6 ~ N(0,1)
References
Petersen, AH; Osler, M and Ekstrøm, CT (2021): Data-Driven Model Building for Life-Course Epidemiology, American Journal of Epidemiology.
Examples
data(tpcExample)
Plot temporal graph via Latex
Description
Generates a plot of a tamat, tpdag or tpag object by use of Latex tikz. Note that a working Latex installation is required. Note also that this function is slower than typical R graphics options and may take some time to terminate.
Usage
tplot(
x,
filename = "causaldisco_tplot_temp",
keepfiles = FALSE,
bendedges = TRUE,
...
)
Arguments
x |
A tamat, tpdag, or tpag object as obtained from |
filename |
Name of files that will be used internally during the function's
runtime. This filename will be appended with both .rmd and .pdf.
Note that unless |
keepfiles |
If |
bendedges |
If |
... |
Additional argument passed to |
Details
The function renders Latex code using rmarkdown, which relies on a working installation of Latex. Afterwards, the resulting pdf graphic is loaded into R and displayed in a browser. If working in Rstudio it may be opened in the built-in viewer, depending on Rstudio global settings.
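A minimal illustrative sketch (requires a working Latex installation, hence not run):
data(tpcExample)
fit <- tpc(tpcExample, order = c("child", "youth", "oldage"),
           sparsity = 0.01, test = corTest)
## Not run:
tplot(fit)
## End(Not run)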