| Type: | Package | 
| Title: | Nonlinear Conditional Independence Tests | 
| Version: | 0.1.5 | 
| Date: | 2019-11-11 | 
| Author: | Christina Heinze-Deml <heinzedeml@stat.math.ethz.ch>, Jonas Peters <jonas.peters@math.ku.dk>, Asbjoern Marco Sinius Munk <fgp998@alumni.ku.dk> | 
| Depends: | R (≥ 3.1.0) | 
| Maintainer: | Christina Heinze-Deml <heinzedeml@stat.math.ethz.ch> | 
| Description: | Code for a variety of nonlinear conditional independence tests: Kernel conditional independence test (Zhang et al., UAI 2011, <doi:10.48550/arXiv.1202.3775>), Residual Prediction test (based on Shah and Buehlmann, <doi:10.48550/arXiv.1511.03334>), Invariant environment prediction, Invariant target prediction, Invariant residual distribution test, Invariant conditional quantile prediction (all from Heinze-Deml et al., <doi:10.48550/arXiv.1706.08576>). | 
| License: | GPL-2 | GPL-3 [expanded from: GPL] | 
| LazyData: | TRUE | 
| Imports: | methods, randomForest, quantregForest, lawstat, RPtests, caTools, mgcv, MASS, kernlab, pracma, mize | 
| URL: | https://github.com/christinaheinze/nonlinearICP-and-CondIndTests | 
| BugReports: | https://github.com/christinaheinze/nonlinearICP-and-CondIndTests/issues | 
| RoxygenNote: | 6.1.1 | 
| Suggests: | testthat | 
| NeedsCompilation: | no | 
| Packaged: | 2019-11-11 16:14:39 UTC; heinzec | 
| Repository: | CRAN | 
| Date/Publication: | 2019-11-12 06:50:21 UTC | 
Wrapper function for conditional independence tests.
Description
Tests the null hypothesis that Y and E are independent given X.
Usage
CondIndTest(Y, E, X, method = "KCI", alpha = 0.05,
  parsMethod = list(), verbose = FALSE)
Arguments
| Y | An n-dimensional vector or a matrix or dataframe with n rows and p columns. | 
| E | An n-dimensional vector or a matrix or dataframe with n rows and p columns. | 
| X | An n-dimensional vector or a matrix or dataframe with n rows and p columns. | 
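| method | The conditional independence test to use; can be one of "KCI", "InvariantConditionalQuantilePrediction", "InvariantEnvironmentPrediction", "InvariantResidualDistributionTest", "InvariantTargetPrediction" or "ResidualPredictionTest". Defaults to "KCI". | 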
| alpha | Significance level. Defaults to 0.05. | 
| parsMethod | Named list to pass options to method. | 
| verbose | If TRUE, intermediate output is provided. Defaults to FALSE. | 
Value
A list with the p-value of the test (pvalue) and possibly additional
entries, depending on the output of the chosen conditional independence test in method.
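The way the wrapper forwards its arguments can be pictured with a small base-R sketch. It is not the package source. It assumes that the wrapper looks up the test function by name and appends the entries of parsMethod to the common arguments. It also uses a hypothetical toy test, `toyPartialCorTest` (a linear partial-correlation test), in place of the package's tests. The names `dispatchCondIndTest` and `corMethod` are likewise illustrative only.

```r
# Hypothetical sketch of a CondIndTest-style wrapper (not the package source).
dispatchCondIndTest <- function(Y, E, X, method, alpha = 0.05,
                                parsMethod = list(), verbose = FALSE) {
  testFun <- get(method, mode = "function")   # look up the test by its name
  args <- c(list(Y = Y, E = E, X = X, alpha = alpha, verbose = verbose),
            parsMethod)                       # extra options are forwarded
  do.call(testFun, args)
}

# Hypothetical toy test: partial correlation of Y and E after regressing both on X.
toyPartialCorTest <- function(Y, E, X, alpha = 0.05, verbose = FALSE,
                              corMethod = "pearson") {
  rY <- residuals(lm(Y ~ X))
  rE <- residuals(lm(E ~ X))
  list(pvalue = cor.test(rY, rE, method = corMethod)$p.value)
}

set.seed(1)
n <- 200
X <- rnorm(n)
Y <- X + rnorm(n)
E <- Y + rnorm(n, sd = 0.5)   # E depends on Y beyond X
res <- dispatchCondIndTest(Y, E, X, method = "toyPartialCorTest",
                           parsMethod = list(corMethod = "spearman"))
res$pvalue
```

Here parsMethod sets the toy test's own corMethod option. The wrapper itself only needs to know the method name.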
References
Please cite C. Heinze-Deml, J. Peters and N. Meinshausen: "Invariant Causal Prediction for Nonlinear Models", arXiv:1706.08576 and the corresponding reference for the conditional independence test.
Examples
# Example 1
set.seed(1)
n <- 100
Z <- rnorm(n)
X <- 4 + 2 * Z + rnorm(n)
Y <- 3 * X^2 + Z + rnorm(n)
test1 <- CondIndTest(X,Y,Z, method = "KCI")
cat("These data come from a distribution, for which X and Y are NOT
cond. ind. given Z.")
cat(paste("The p-value of the test is: ", test1$pvalue))
# Example 2
set.seed(1)
Z <- rnorm(n)
X <- 4 + 2 * Z + rnorm(n)
Y <- 3 + Z + rnorm(n)
test2 <- CondIndTest(X,Y,Z, method = "KCI")
cat("The data come from a distribution, for which X and Y are cond.
ind. given Z.")
cat(paste("The p-value of the test is: ", test2$pvalue))
Invariant conditional quantile prediction.
Description
Tests the null hypothesis that Y and E are independent given X.
Usage
InvariantConditionalQuantilePrediction(Y, E, X, alpha = 0.05,
  verbose = FALSE, test = fishersTestExceedance,
  mtry = sqrt(NCOL(X)), ntree = 100, nodesize = 5, maxnodes = NULL,
  quantiles = c(0.1, 0.5, 0.9), returnModel = FALSE)
Arguments
| Y | An n-dimensional vector. | 
| E | An n-dimensional vector. If test = fishersTestExceedance, E needs to be a factor. | 
| X | A matrix or dataframe with n rows and p columns. | 
| alpha | Significance level. Defaults to 0.05. | 
| verbose | If TRUE, intermediate output is provided. Defaults to FALSE. | 
| test | Unconditional independence test that tests whether exceedance is
independent of E. Defaults to fishersTestExceedance. | 
| mtry | Random forest parameter: Number of variables randomly sampled as
candidates at each split. Defaults to sqrt(NCOL(X)). | 
| ntree | Random forest parameter: Number of trees to grow. Defaults to 100. | 
| nodesize | Random forest parameter: Minimum size of terminal nodes. Defaults to 5. | 
| maxnodes | Random forest parameter: Maximum number of terminal nodes trees in the forest can have. Defaults to NULL. | 
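| quantiles | Quantiles for which to test independence between exceedance and E.
Defaults to c(0.1, 0.5, 0.9). | 
| returnModel | If TRUE, the fitted quantile regression forest model is returned. Defaults to FALSE. | 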
Value
A list with the following entries:
- pvalue: The p-value for the null hypothesis that Y and E are independent given X.
- model: The fitted quantile regression forest model if returnModel = TRUE.
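The first step of this test estimates conditional quantiles of Y given X and records whether each observation exceeds them. In the package, a quantile regression forest performs the estimation. The sketch below swaps in a much cruder estimator so that it runs with base R alone. `binnedConditionalQuantiles` is a hypothetical helper for a single covariate. It splits X into equal-frequency bins and uses the empirical quantiles of Y within each bin, in sample rather than out of bag. It returns an n-row matrix with one column per quantile, the same shape that the `predicted` argument of `fishersTestExceedance` describes.

```r
# Hypothetical stand-in for the quantile regression forest step (base R only).
binnedConditionalQuantiles <- function(Y, X, quantiles = c(0.1, 0.5, 0.9),
                                       nBins = 10) {
  breaks <- quantile(X, probs = seq(0, 1, length.out = nBins + 1))
  bin <- cut(X, breaks = unique(breaks), include.lowest = TRUE)
  predicted <- matrix(NA_real_, nrow = length(Y), ncol = length(quantiles))
  for (b in levels(bin)) {
    idx <- which(bin == b)
    # every observation in a bin receives that bin's empirical quantiles of Y
    predicted[idx, ] <- matrix(quantile(Y[idx], probs = quantiles),
                               nrow = length(idx), ncol = length(quantiles),
                               byrow = TRUE)
  }
  colnames(predicted) <- paste0("q", quantiles)
  predicted
}
```

Under the null hypothesis, the exceedance indicators `Y > predicted[, k]` should carry no information about E. Testing exactly that is the job of the `test` argument.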
Examples
# Example 1
n <- 1000
E <- rbinom(n, size = 1, prob = 0.2)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * (X)^2 + rnorm(n)
InvariantConditionalQuantilePrediction(Y, as.factor(E), X)
# Example 2
E <- rbinom(n, size = 1, prob = 0.2)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * E + rnorm(n)
InvariantConditionalQuantilePrediction(Y, as.factor(E), X)
Invariant environment prediction.
Description
Tests the null hypothesis that Y and E are independent given X.
Usage
InvariantEnvironmentPrediction(Y, E, X, alpha = 0.05, verbose = FALSE,
  trainTestSplitFunc = caTools::sample.split,
  argsTrainTestSplitFunc = list(Y = E, SplitRatio = 0.8),
  test = propTestTargetE, mtry = sqrt(NCOL(X)), ntree = 100,
  nodesize = 5, maxnodes = NULL, permute = TRUE,
  returnModel = FALSE)
Arguments
| Y | An n-dimensional vector. | 
| E | An n-dimensional vector. If test = propTestTargetE, E needs to be a factor. | 
| X | A matrix or dataframe with n rows and p columns. | 
| alpha | Significance level. Defaults to 0.05. | 
| verbose | If TRUE, intermediate output is provided. Defaults to FALSE. | 
| trainTestSplitFunc | Function to split the sample. Defaults to stratified sampling
using caTools::sample.split. | 
| argsTrainTestSplitFunc | Arguments for sampling splitting function. | 
| test | Unconditional independence test that tests whether the out-of-sample
prediction accuracy is the same when using X only vs. X and Y as predictors for E.
Defaults to propTestTargetE. | 
| mtry | Random forest parameter: Number of variables randomly sampled as
candidates at each split. Defaults to sqrt(NCOL(X)). | 
| ntree | Random forest parameter: Number of trees to grow. Defaults to 100. | 
| nodesize | Random forest parameter: Minimum size of terminal nodes. Defaults to 5. | 
| maxnodes | Random forest parameter: Maximum number of terminal nodes trees in the forest can have.
Defaults to NULL. | 
| permute | Random forest parameter: If  | 
| returnModel | If TRUE, the fitted models are returned. Defaults to FALSE. | 
Value
A list with the following entries:
- pvalue: The p-value for the null hypothesis that Y and E are independent given X.
- model: The fitted models if returnModel = TRUE.
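The comparison behind this test can be sketched in three steps. First, split the data. Second, predict E out of sample twice, once from X alone and once from X and Y. Third, hand both sets of predictions to `test`, which is `propTestTargetE` by default. The sketch makes several simplifying substitutions. Logistic regression (`glm` with a binomial family) replaces the random forest. A plain random split replaces stratified sampling. E is assumed to be a two-level factor and X a single covariate. `iepPredictions` is a hypothetical helper name.

```r
# Hypothetical sketch: out-of-sample predictions of E with and without Y.
iepPredictions <- function(Y, E, X, trainFrac = 0.8) {
  d <- data.frame(E = E, X = X, Y = Y)
  train <- sample(nrow(d), floor(trainFrac * nrow(d)))
  test <- d[-train, ]
  fitX  <- glm(E ~ X,     family = binomial, data = d[train, ])
  fitXY <- glm(E ~ X + Y, family = binomial, data = d[train, ])
  # classify as the second factor level when its fitted probability exceeds 1/2
  toClass <- function(p) factor(ifelse(p > 0.5, levels(E)[2], levels(E)[1]),
                                levels = levels(E))
  list(E = test$E,
       predictedOnlyX = toClass(predict(fitX,  newdata = test, type = "response")),
       predictedXY    = toClass(predict(fitXY, newdata = test, type = "response")))
}
```

If Y carries information about E beyond X, the misclassification rate of `predictedXY` should be lower than that of `predictedOnlyX`. The null hypothesis predicts no such gain.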
Examples
# Example 1
n <- 1000
E <- rbinom(n, size = 1, prob = 0.2)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * (X)^2 + rnorm(n)
InvariantEnvironmentPrediction(Y, as.factor(E), X)
# Example 2
E <- rbinom(n, size = 1, prob = 0.2)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * E + rnorm(n)
InvariantEnvironmentPrediction(Y, as.factor(E), X)
# Example 3
E <- rnorm(n)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * (X)^2 + rnorm(n)
InvariantEnvironmentPrediction(Y, E, X, test = wilcoxTestTargetY)
InvariantEnvironmentPrediction(Y, X, E, test = wilcoxTestTargetY)
Invariant residual distribution test.
Description
Tests the null hypothesis that Y and E are independent given X.
Usage
InvariantResidualDistributionTest(Y, E, X, alpha = 0.05,
  verbose = FALSE, fitWithGam = TRUE,
  test = leveneAndWilcoxResidualDistributions, colNameNoSmooth = NULL,
  mtry = sqrt(NCOL(X)), ntree = 100, nodesize = 5, maxnodes = NULL,
  returnModel = FALSE)
Arguments
| Y | An n-dimensional vector. | 
| E | An n-dimensional vector. E needs to be a factor. | 
| X | A matrix or dataframe with n rows and p columns. | 
| alpha | Significance level. Defaults to 0.05. | 
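| verbose | If TRUE, intermediate output is provided. Defaults to FALSE. | 
| fitWithGam | If TRUE, a GAM is used for prediction; otherwise a random forest is used. Defaults to TRUE. | 
| test | Unconditional independence test that tests whether the residual distribution is
invariant across different levels of E. Defaults to leveneAndWilcoxResidualDistributions. | 
| colNameNoSmooth | GAM parameter: Names of variables that should enter the model linearly.
Defaults to NULL. | 
| mtry | Random forest parameter: Number of variables randomly sampled as
candidates at each split. Defaults to sqrt(NCOL(X)). | 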
| ntree | Random forest parameter: Number of trees to grow. Defaults to 100. | 
| nodesize | Random forest parameter: Minimum size of terminal nodes. Defaults to 5. | 
| maxnodes | Random forest parameter: Maximum number of terminal nodes trees in the forest can have.
Defaults to NULL. | 
| returnModel | If TRUE, the fitted model is returned. Defaults to FALSE. | 
Value
A list with the following entries:
- pvalue: The p-value for the null hypothesis that Y and E are independent given X.
- model: The fitted model if returnModel = TRUE.
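The residual step of this test can be sketched as follows. Regress Y on X while ignoring E, then split the residuals by the levels of E. The chosen `test` then compares those groups. In the sketch, a smoothing spline (`smooth.spline`, one covariate only) stands in for the GAM or random forest. `residualsByEnvironment` is a hypothetical helper.

```r
# Hypothetical sketch of the residual step (not the package code).
residualsByEnvironment <- function(Y, E, X) {
  fit <- smooth.spline(X, Y)        # E is deliberately not used in the fit
  res <- Y - predict(fit, X)$y      # in-sample residuals
  split(res, E)                     # one residual vector per level of E
}

set.seed(4)
n <- 1000
E <- factor(rbinom(n, size = 1, prob = 0.2))
X <- rnorm(n)
Y <- X^2 + 3 * (E == "1") + rnorm(n)  # E shifts Y even given X
groups <- residualsByEnvironment(Y, E, X)
sapply(groups, mean)  # residual means differ clearly between the two levels
```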
Examples
# Example 1
n <- 1000
E <- rbinom(n, size = 1, prob = 0.2)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * (X)^2 + rnorm(n)
InvariantResidualDistributionTest(Y, as.factor(E), X)
InvariantResidualDistributionTest(Y, as.factor(E), X, test = ksResidualDistributions)
# Example 2
E <- rbinom(n, size = 1, prob = 0.2)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * E + rnorm(n)
InvariantResidualDistributionTest(Y, as.factor(E), X)
InvariantResidualDistributionTest(Y, as.factor(E), X, test = ksResidualDistributions)
Invariant target prediction.
Description
Tests the null hypothesis that Y and E are independent given X.
Usage
InvariantTargetPrediction(Y, E, X, alpha = 0.05, verbose = FALSE,
  fitWithGam = TRUE, trainTestSplitFunc = caTools::sample.split,
  argsTrainTestSplitFunc = NULL, test = fTestTargetY,
  colNameNoSmooth = NULL, mtry = sqrt(NCOL(X)), ntree = 100,
  nodesize = 5, maxnodes = NULL, permute = TRUE,
  returnModel = FALSE)
Arguments
| Y | An n-dimensional vector. | 
| E | An n-dimensional vector or an nxq dimensional matrix or dataframe. | 
| X | A matrix or dataframe with n rows and p columns. | 
| alpha | Significance level. Defaults to 0.05. | 
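| verbose | If TRUE, intermediate output is provided. Defaults to FALSE. | 
| fitWithGam | If TRUE, a GAM is used for prediction; otherwise a random forest is used. Defaults to TRUE. | 
| trainTestSplitFunc | Function to split the sample. Defaults to stratified sampling
using caTools::sample.split. | 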
| argsTrainTestSplitFunc | Arguments for sampling splitting function. | 
| test | Unconditional independence test that tests whether the out-of-sample
prediction accuracy is the same when using X only vs. X and E as predictors for Y.
Defaults to fTestTargetY. | 
| colNameNoSmooth | GAM parameter: Names of variables that should enter the model linearly.
Defaults to NULL. | 
| mtry | Random forest parameter: Number of variables randomly sampled as
candidates at each split. Defaults to sqrt(NCOL(X)). | 
| ntree | Random forest parameter: Number of trees to grow. Defaults to 100. | 
| nodesize | Random forest parameter: Minimum size of terminal nodes. Defaults to 5. | 
| maxnodes | Random forest parameter: Maximum number of terminal nodes trees in the forest can have. Defaults to NULL. | 
| permute | Random forest parameter: If  | 
| returnModel | If TRUE, the fitted models are returned. Defaults to FALSE. | 
Value
A list with the following entries:
- pvalue: The p-value for the null hypothesis that Y and E are independent given X.
- model: The fitted models if returnModel = TRUE.
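This test rests on a comparison of out-of-sample predictions. Split the sample, predict Y out of sample from X alone and from X and E, and pass both prediction vectors to `test`, which is `fTestTargetY` by default. In this sketch, ordinary least squares on a cubic polynomial in X (`lm` with `poly`) stands in for the GAM or random forest. The split is purely random, and the sketch assumes a single covariate X. `itpPredictions` is a hypothetical helper.

```r
# Hypothetical sketch: out-of-sample predictions of Y with and without E.
itpPredictions <- function(Y, E, X, trainFrac = 0.8) {
  d <- data.frame(Y = Y, E = E, X = X)
  train <- sample(nrow(d), floor(trainFrac * nrow(d)))
  test <- d[-train, ]
  fitX  <- lm(Y ~ poly(X, 3),     data = d[train, ])
  fitXE <- lm(Y ~ poly(X, 3) + E, data = d[train, ])
  list(Y = test$Y,
       predictedOnlyX = predict(fitX,  newdata = test),
       predictedXE    = predict(fitXE, newdata = test))
}
```

Under the null hypothesis, adding E should not reduce the out-of-sample error. A clear reduction is evidence against conditional independence.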
Examples
# Example 1
n <- 1000
E <- rbinom(n, size = 1, prob = 0.2)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * (X)^2 + rnorm(n)
InvariantTargetPrediction(Y, as.factor(E), X)
InvariantTargetPrediction(Y, as.factor(E), X, test = wilcoxTestTargetY)
# Example 2
E <- rbinom(n, size = 1, prob = 0.2)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * E + rnorm(n)
InvariantTargetPrediction(Y, as.factor(E), X)
InvariantTargetPrediction(Y, as.factor(E), X, test = wilcoxTestTargetY)
# Example 3
E <- rnorm(n)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * (X)^2 + rnorm(n)
InvariantTargetPrediction(Y, E, X)
InvariantTargetPrediction(Y, X, E)
InvariantTargetPrediction(Y, E, X, test = wilcoxTestTargetY)
InvariantTargetPrediction(Y, X, E, test = wilcoxTestTargetY)
Kernel conditional independence test.
Description
Tests the null hypothesis that Y and E are independent given X. Under the null hypothesis, the test statistic is distributed as an infinite weighted sum of chi-squared random variables. This distribution can be approximated either by a Gamma distribution or by a Monte Carlo approach. Optionally (GP = TRUE), the hyperparameters are chosen by Gaussian process regression.
Usage
KCI(Y, E, X, width = 0, alpha = 0.05, unbiased = FALSE,
  gammaApprox = TRUE, GP = TRUE, nRepBs = 5000, lambda = 0.001,
  thresh = 1e-05, numEig = NROW(Y), verbose = FALSE)
Arguments
| Y | A vector of length n or a matrix or dataframe with n rows and p columns. | 
| E | A vector of length n or a matrix or dataframe with n rows and p columns. | 
| X | A matrix or dataframe with n rows and p columns. | 
| width | Kernel width; if it is set to zero, the width is chosen automatically (default: 0). | 
| alpha | Significance level (default: 0.05). | 
| unbiased | A boolean variable that indicates whether a bias correction should be applied (default: FALSE). | 
| gammaApprox | A boolean variable that indicates whether the null distribution is approximated by a Gamma distribution. If it is FALSE, a Monte Carlo approach is used (default: TRUE). | 
| GP | A boolean variable that indicates whether Gaussian process regression is used to choose the hyperparameters (default: TRUE). | 
| nRepBs | Number of draws for the Monte Carlo approach (default: 5000). | 
| lambda | Regularization parameter (default: 1e-03). | 
| thresh | Threshold for eigenvalues. Whenever eigenvalues are computed, they are set to zero if they are smaller than thresh times the maximum eigenvalue (default: 1e-05). | 
| numEig | Number of eigenvalues computed (only relevant for computing the distribution under the hypothesis of conditional independence) (default: length(Y)). | 
| verbose | If TRUE, intermediate output is provided (default: FALSE). | 
Value
A list with the following entries:
- testStatistic: The statistic $\operatorname{Tr}(K_{\ddot{Y} \mid X} K_{E \mid X})$.
- criticalValue: The critical value of the test at significance level alpha; obtained by a Monte Carlo approach if gammaApprox = FALSE, otherwise by the Gamma approximation.
- pvalue: The p-value for the null hypothesis that Y and E are independent given X. It is obtained by a Monte Carlo approach if gammaApprox = FALSE, otherwise by the Gamma approximation.
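Both approximations of the null distribution mentioned in the description apply to a finite weighted sum of chi-squared variables, T = sum(lambda_i * z_i^2), with independent standard normal z_i. In KCI, the weights are derived from eigenvalues of the centred kernel matrices, which is where thresh and numEig enter. The sketch below is a generic illustration rather than the package code. It matches a Gamma distribution to the mean sum(lambda_i) and variance 2 * sum(lambda_i^2) of T. Alternatively, it simulates T directly, which is the role that nRepBs plays. Both function names are hypothetical.

```r
# p-value of T = sum(lambda * z^2) via a moment-matched Gamma distribution
gammaApproxPvalue <- function(stat, lambda) {
  m <- sum(lambda)             # E[T]
  v <- 2 * sum(lambda^2)       # Var[T]
  pgamma(stat, shape = m^2 / v, scale = v / m, lower.tail = FALSE)
}

# p-value of the same statistic by Monte Carlo simulation of T
monteCarloPvalue <- function(stat, lambda, nRep = 5000) {
  draws <- replicate(nRep, sum(lambda * rnorm(length(lambda))^2))
  mean(draws >= stat)
}
```

With equal weights, T is exactly a scaled chi-squared variable, so the Gamma match is exact. With unequal weights, it is only an approximation, which is the trade-off that gammaApprox controls.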
Examples
# Example 1
n <- 100
E <- rnorm(n)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * (X)^2 + rnorm(n)
KCI(Y, E, X)
KCI(Y, X, E)
Residual prediction test.
Description
Tests the null hypothesis that Y and E are independent given X.
Usage
ResidualPredictionTest(Y, E, X, alpha = 0.05, verbose = FALSE,
  degree = 4, basis = c("nystrom", "nystrom_poly", "fourier",
  "polynomial", "provided")[1], resid_type = "OLS", XBasis = NULL,
  noiseMat = NULL, getnoiseFct = function(n, ...) { rnorm(n) },
  argsGetNoiseFct = NULL, nSim = 100, funcOfRes = function(x) { abs(x) },
  useX = TRUE, returnXBasis = FALSE,
  nSub = ceiling(NROW(X)/4), ntree = 100, nodesize = 5,
  maxnodes = NULL)
Arguments
| Y | An n-dimensional vector. | 
| E | An n-dimensional vector or an nxq dimensional matrix or dataframe. | 
| X | A matrix or dataframe with n rows and p columns. | 
| alpha | Significance level. Defaults to 0.05. | 
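| verbose | If TRUE, intermediate output is provided. Defaults to FALSE. | 
| degree | Degree of polynomial to use if basis = "polynomial" or basis = "nystrom_poly". Defaults to 4. | 
| basis | Can be one of "nystrom", "nystrom_poly", "fourier", "polynomial" or "provided". Defaults to "nystrom". | 
| resid_type | Can be "Lasso" or "OLS". Defaults to "OLS". | 
| XBasis | Basis if basis = "provided". Defaults to NULL. | 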
| noiseMat | Matrix with simulated noise. Defaults to NULL in which case the simulation is performed inside the function. | 
| getnoiseFct | Function to use to generate the noise matrix. Defaults to function(n, ...) { rnorm(n) }. | 
| argsGetNoiseFct | Arguments for getnoiseFct. Defaults to NULL. | 
| nSim | Number of simulations to use. Defaults to 100. | 
| funcOfRes | Function of residuals to use in addition to predicting the
conditional mean. Defaults to function(x) { abs(x) }. | 
| useX | Set to  | 
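| returnXBasis | Set to TRUE if the basis expansion should be returned. Defaults to FALSE. | 
| nSub | Number of random features to use if basis is one of "nystrom", "nystrom_poly" or "fourier". Defaults to ceiling(NROW(X)/4). | 
| ntree | Random forest parameter: Number of trees to grow. Defaults to 100. | 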
| nodesize | Random forest parameter: Minimum size of terminal nodes. Defaults to 5. | 
| maxnodes | Random forest parameter: Maximum number of terminal nodes trees in the forest can have. Defaults to NULL. | 
Value
A list with the following entries:
- pvalue: The p-value for the null hypothesis that Y and E are independent given X.
- XBasis: Basis expansion if returnXBasis was set to TRUE.
- fctBasisExpansion: Function used to create the basis expansion if basis is not "provided".
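The residual prediction idea of Shah and Buehlmann can be illustrated with a minimal sketch under simplifying assumptions. The basis is only an intercept plus the raw columns of X, in place of the Nystrom, Fourier or polynomial expansions. The R-squared of a linear regression on E stands in for the random forest. In a Gaussian linear model, the scaled OLS residuals do not depend on the unknown coefficients or noise level. Their null distribution can therefore be simulated by projecting fresh standard normal noise, which is the role of nSim and getnoiseFct. `rptSketch` is a hypothetical name.

```r
# Hypothetical sketch of a residual prediction test (not the package code).
rptSketch <- function(Y, E, X, nSim = 100, funcOfRes = abs) {
  B <- cbind(1, X)                   # placeholder basis for X
  qrB <- qr(B)
  scaledResid <- function(v) {       # OLS residuals on B, scaled to unit norm
    r <- qr.resid(qrB, v)
    r / sqrt(sum(r^2))
  }
  statistic <- function(r) {         # how much of r (or funcOfRes(r)) E explains
    max(summary(lm(r ~ E))$r.squared,
        summary(lm(funcOfRes(r) ~ E))$r.squared)
  }
  observed <- statistic(scaledResid(Y))
  simulated <- replicate(nSim, statistic(scaledResid(rnorm(length(Y)))))
  list(pvalue = (1 + sum(simulated >= observed)) / (1 + nSim))
}
```

The "+ 1" in numerator and denominator keeps the Monte Carlo p-value valid and never exactly zero. With nSim = 100, the smallest attainable value is 1/101.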
Examples
# Example 1
n <- 100
E <- rbinom(n, size = 1, prob = 0.2)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * (X)^2 + rnorm(n)
ResidualPredictionTest(Y, as.factor(E), X)
# Example 2
E <- rbinom(n, size = 1, prob = 0.2)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * E + rnorm(n)
ResidualPredictionTest(Y, as.factor(E), X)
# not run:
# # Example 3
# E <- rnorm(n)
# X <- 4 + 2 * E + rnorm(n)
# Y <- 3 * (X)^2 + rnorm(n)
# ResidualPredictionTest(Y, E, X)
# ResidualPredictionTest(Y, X, E)
F-test for a nested model comparison.
Description
Used as a subroutine in InvariantTargetPrediction to test
whether out-of-sample prediction performance is better when using X and E as predictors for Y,
compared to using X only.
Usage
fTestTargetY(Y, predictedOnlyX, predictedXE, verbose, ...)
Arguments
| Y | An n-dimensional vector. | 
| predictedOnlyX | Predictions for Y based on predictors in X only. | 
| predictedXE | Predictions for Y based on predictors in X and E. | 
| verbose | Set to TRUE for verbose output. | 
| ... | The dimensions of X (df) and E (dimE) need to be passed via the ... argument to allow for a coherent interface of fTestTargetY and wilcoxTestTargetY. | 
Value
A list with the p-value for the test.
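The nested-model comparison can be written as the classical F statistic computed from the two residual sums of squares. This sketch is not the package code and rests on two assumptions. It treats `df` as the number of parameters of the X-only model and `dimE` as the number of additional parameters contributed by E. It also applies the statistic to whatever predictions it is given. With in-sample fitted values of two nested linear models, it reproduces `anova()`. `fTestSketch` is a hypothetical name.

```r
# Hypothetical sketch of a nested-model F-test on two sets of predictions.
fTestSketch <- function(Y, predictedOnlyX, predictedXE, df, dimE) {
  n <- length(Y)
  rssX  <- sum((Y - predictedOnlyX)^2)   # residual sum of squares, X only
  rssXE <- sum((Y - predictedXE)^2)      # residual sum of squares, X and E
  dfDenom <- n - df - dimE
  fStat <- ((rssX - rssXE) / dimE) / (rssXE / dfDenom)
  list(pvalue = pf(fStat, dimE, dfDenom, lower.tail = FALSE))
}
```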
Fisher's exact test of whether the exceedance of the conditional quantiles is independent of the categorical variable E.
Description
Used as a subroutine in InvariantConditionalQuantilePrediction
to test whether the exceedance of the conditional quantiles
is independent of the categorical variable E.
Usage
fishersTestExceedance(Y, predicted, E, verbose)
Arguments
| Y | An n-dimensional vector. | 
| predicted | A matrix with n rows. The columns contain predictions for different conditional quantiles of Y|X. | 
| E | An n-dimensional vector.  | 
| verbose | Set to TRUE for verbose output. | 
Value
A list with the p-value for the test.
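One way to carry out this subroutine works column by column. For each column of `predicted`, cross-tabulate the indicator Y > predicted quantile against E and apply `fisher.test` to the table. Then combine the per-quantile p-values. The Bonferroni combination and the name `fisherExceedanceSketch` are assumptions of this sketch.

```r
# Hypothetical sketch: Fisher's exact test per quantile, Bonferroni-combined.
fisherExceedanceSketch <- function(Y, predicted, E) {
  predicted <- as.matrix(predicted)
  pvals <- apply(predicted, 2, function(q) {
    exceed <- factor(Y > q, levels = c(FALSE, TRUE))  # exceedance indicator
    fisher.test(table(exceed, E))$p.value
  })
  list(pvalue = min(1, length(pvals) * min(pvals)), pvalues = pvals)
}
```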
Kolmogorov-Smirnov test to compare residual distributions
Description
Used as a subroutine in InvariantResidualDistributionTest
to test whether the residual distribution remains invariant across different levels
of E.
Usage
ksResidualDistributions(Y, predicted, E, verbose)
Arguments
| Y | An n-dimensional vector. | 
| predicted | An n-dimensional vector of predictions for Y. | 
| E | An n-dimensional vector.  | 
| verbose | Set to TRUE for verbose output. | 
Value
A list with the p-value for the test.
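For two levels of E, this amounts to a two-sample Kolmogorov-Smirnov test on the residuals Y - predicted. The description above does not say how more than two levels are handled. This sketch assumes pairwise tests with a Bonferroni correction, and `ksResidualSketch` is a hypothetical name.

```r
# Hypothetical sketch: pairwise two-sample KS tests on residuals, Bonferroni-combined.
ksResidualSketch <- function(Y, predicted, E) {
  res <- Y - predicted
  groups <- split(res, E, drop = TRUE)  # residuals per observed level of E
  pairs <- combn(length(groups), 2)     # all pairs of levels, one per column
  pvals <- apply(pairs, 2, function(ij)
    ks.test(groups[[ij[1]]], groups[[ij[2]]])$p.value)
  list(pvalue = min(1, length(pvals) * min(pvals)))
}
```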
Levene and Wilcoxon tests to compare first and second moments of residual distributions
Description
Used as a subroutine in InvariantResidualDistributionTest
to test whether the residual distribution remains invariant across different levels
of E.
Usage
leveneAndWilcoxResidualDistributions(Y, predicted, E, verbose)
Arguments
| Y | An n-dimensional vector. | 
| predicted | An n-dimensional vector of predictions for Y. | 
| E | An n-dimensional vector.  | 
| verbose | Set to TRUE for verbose output. | 
Value
A list with the p-value for the test.
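The same idea can be sketched with base R alone, using two tests on the residuals. For the second moment, it uses a Levene-type test with median centring (the Brown-Forsythe variant): a one-way ANOVA on the absolute deviations of the residuals from their group medians. For location, it uses a rank test, Wilcoxon for two levels and Kruskal-Wallis for more. The two p-values are combined by Bonferroni. The package imports lawstat for its Levene test. Whether it uses median centring, and how it combines the two p-values, are assumptions here, as is the name `leveneWilcoxSketch`.

```r
# Hypothetical sketch: spread test plus location test on residuals.
leveneWilcoxSketch <- function(Y, predicted, E) {
  res <- Y - predicted
  E <- droplevels(as.factor(E))
  # Brown-Forsythe / Levene (median-centred): ANOVA on |res - group median|
  absDev <- abs(res - ave(res, E, FUN = median))
  pScale <- anova(lm(absDev ~ E))[["Pr(>F)"]][1]
  # location: Wilcoxon rank-sum for two levels, Kruskal-Wallis otherwise
  pLocation <- if (nlevels(E) == 2) {
    wilcox.test(res ~ E)$p.value
  } else {
    kruskal.test(res, E)$p.value
  }
  list(pvalue = min(1, 2 * min(pScale, pLocation)))
}
```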
Proportion test to compare two misclassification rates.
Description
Used as a subroutine in InvariantEnvironmentPrediction to test
whether out-of-sample performance is better when using X and Y as predictors for E,
compared to using X only.
Usage
propTestTargetE(E, predictedOnlyX, predictedXY, verbose)
Arguments
| E | An n-dimensional vector. | 
| predictedOnlyX | Predictions for E based on predictors in X only. | 
| predictedXY | Predictions for E based on predictors in X and Y. | 
| verbose | Set to TRUE for verbose output. | 
Value
A list with the p-value for the test.
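Comparing the two misclassification rates amounts to a two-sample test of proportions. Here is a minimal sketch with `prop.test`. It assumes a one-sided alternative, namely that adding Y lowers the error rate. It also assumes that both prediction vectors refer to the same held-out observations. The name `propTestSketch` is hypothetical.

```r
# Hypothetical sketch: one-sided comparison of two misclassification rates.
propTestSketch <- function(E, predictedOnlyX, predictedXY) {
  nTest <- length(E)
  errors <- c(sum(predictedOnlyX != E), sum(predictedXY != E))
  # H1: the error rate with X only is larger than with X and Y
  list(pvalue = prop.test(errors, c(nTest, nTest),
                          alternative = "greater")$p.value)
}
```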
Wilcoxon test to compare two mean squared error rates.
Description
Used as a subroutine in InvariantTargetPrediction to test
whether out-of-sample performance is better when using X and E as predictors for Y,
compared to using X only.
Usage
wilcoxTestTargetY(Y, predictedOnlyX, predictedXE, verbose, ...)
Arguments
| Y | An n-dimensional vector. | 
| predictedOnlyX | Predictions for Y based on predictors in X only. | 
| predictedXE | Predictions for Y based on predictors in X and E. | 
| verbose | Set to TRUE for verbose output. | 
| ... | Argument to allow for a coherent interface of fTestTargetY and wilcoxTestTargetY. | 
Value
A list with the p-value for the test.
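The two sets of out-of-sample predictions can also be compared without distributional assumptions. One way is a one-sided paired Wilcoxon signed-rank test on the squared errors of the two models, evaluated on the same observations. Pairing, the one-sided alternative and the name `wilcoxSketch` are assumptions of this sketch.

```r
# Hypothetical sketch: paired one-sided Wilcoxon test on squared prediction errors.
wilcoxSketch <- function(Y, predictedOnlyX, predictedXE) {
  sqErrX  <- (Y - predictedOnlyX)^2
  sqErrXE <- (Y - predictedXE)^2
  # H1: squared errors of the X-only model tend to be larger
  list(pvalue = wilcox.test(sqErrX, sqErrXE, paired = TRUE,
                            alternative = "greater")$p.value)
}
```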