CRE

Installation

Installing the latest developing version.

library(devtools)
install_github("NSAPH-Software/CRE", ref="develop")
library("CRE")

Installing from CRAN.

install.packages("CRE")

Usage

Input parameters:

y The observed response (outcome) vector.
z The treatment (exposure, policy) vector.
X The covariate matrix.
method_params Parameters for estimating the individual treatment effect, including:
Parameters for Discovery
- ratio_dis The ratio of data delegated to the discovery sub-sample.
- ite_method_dis The method to estimate the discovery sub-sample in estimating individual treatment effect (ITE).
- include_ps_dis Whether or not to include propensity score estimate as a covariate in discovery ITE estimation, considered only for BART, or CF.
- ps_method_dis The estimation model for the propensity score on the discovery sub-sample.
- or_method_dis The estimation model for the outcome regressions estimate_ite_aipw on the discovery sub-sample.
Parameters for Inference
- ite_method_inf The method to estimate the inference sample ITE.
- include_ps_inf Whether or not to include propensity score estimate as a covariate in inference ITE estimation, considered only for BART, or CF.
- ps_method_inf The estimation model for the propensity score on the inference subsample.
- or_method_inf The estimation model for the outcome regressions in estimate_ite_aipw on the inference subsample.
Other Parameters - include_offset Whether or not to include an offset when estimating the ITE, for Poisson only.
- offset_name The name of the offset, if it is to be included.
- cate_method The method to estimate the conditional average treatment effect (CATE) values.
- cate_SL_library The library used if cate_method is set to DRLearner.
- filter_cate Whether or not to filter rules with p-value <= 0.05.
hyper_params The list of parameters required to tune the functions, including:
- effect_modifiers Effect Modifiers for Rules Generation.
- ntrees_rf The number of decision trees for randomForest.
- ntrees_gbm The number of decision trees for gradient boosting.
- node_size The minimum size of the trees’ terminal nodes.
- max_nodes The maximum number of terminal nodes trees in the forest can have.
- max_depth The number of top levels from each tree considered to extract conditions.
- replace Boolean variable for replacement in bootstrapping.
- max_decay Decay Threshold for pruning the rules.
- type_decay Decay Type for pruning the rules (1: relative error; 2: error).
- t_anom The threshold to define too generic or too specific (anomalous) rules.
- t_corr The threshold to define correlated rules.
- q Number of (unique) selected rules per subsample in stability selection.
- stability_selection Whether or not using stability selection for selecting the causal rules.
- pfer_val The Per-Family Error Rate, the expected number of false discoveries.

A note on the parameters

(1) Options for the ITE estimation are as follows:

(2) The include_ps_dis and include_ps_inf arguments will only be considered if the ITE method selected is bart, cf.

The CRE package can generate synthetic data that can be used to test different features of the package. At the current implementation, the code can generate data with continuous or binary outcomes.

  set.seed(9687)
  dataset_cont <- generate_cre_dataset(n = 300, rho = 0, n_rules = 2, p = 10,
                                       effect_size = 2, binary = FALSE)
  y <- dataset_cont[["y"]]
  z <- dataset_cont[["z"]]
  X <- as.data.frame(dataset_cont[["X"]])
  X_names <- names(as.data.frame(X))

  method_params = list(ratio_dis = 0.25,
                       ite_method_dis="bart",
                       ps_method_dis = "SL.xgboost",
                       oreg_method_dis = "SL.xgboost",
                       include_ps_dis = TRUE,
                       ite_method_inf = "bart",
                       ps_method_inf = "SL.xgboost",
                       oreg_method_inf = "SL.xgboost",
                       include_ps_inf = TRUE,
                       include_offset = FALSE,
                       cate_method = "DRLearner",
                       cate_SL_library = "SL.xgboost",
                       filter_cate = FALSE,
                       offset_name = NA,
                       random_state = 3591)

 hyper_params = list(ntrees_rf = 100,
                     ntrees_gbm = 50,
                     node_size = 20,
                     max_nodes = 5,
                     max_depth = 15,
                     max_decay = 0.025,
                     type_decay = 2,
                     t_anom = 0.025,
                     t_corr = 1,
                     replace = FALSE,
                     q = 0.8,
                     stability_selection = TRUE,
                     pfer_val = 0.1)

cre_results <- cre(y, z, X, method_params, hyper_params)