
Quick Start with CRAM

Introduction

The Cram package provides a unified framework for:

  • cram_policy(): binary policy learning and evaluation,
  • cram_ml(): machine learning model training and evaluation, and
  • cram_bandit(): on-policy statistical evaluation for contextual bandits.

This vignette walks through these three core modules.


Cram User file

For reproducible use cases, see the example script provided in the Cram GitHub repository:

View user_cram.R on GitHub


1. cram_policy() — Binary Policy Learning & Evaluation

Generate Simulated Data

library(cramR)       # cram_policy(), cram_ml(), cram_bandit()
library(data.table)  # data.table()

generate_data <- function(n) {
  # Covariates: one binary, one discrete (1-5), one continuous
  X <- data.table(
    binary = rbinom(n, 1, 0.5),
    discrete = sample(1:5, n, replace = TRUE),
    continuous = rnorm(n)
  )
  # Random treatment assignment with probability 0.5
  D <- rbinom(n, 1, 0.5)
  # Heterogeneous treatment effect driven by the binary and discrete covariates
  treatment_effect <- ifelse(X$binary == 1 & X$discrete <= 2, 1,
                       ifelse(X$binary == 0 & X$discrete >= 4, -1, 0.1))
  # Outcome: treated units receive the treatment effect plus noise
  Y <- D * (treatment_effect + rnorm(n)) + (1 - D) * rnorm(n)
  list(X = X, D = D, Y = Y)
}

set.seed(123)
data <- generate_data(1000)
X <- data$X; D <- data$D; Y <- data$Y

Run cram_policy() with causal forest

res <- cram_policy(
  X, D, Y,
  batch = 20,                                  # number of batches for the cram procedure
  model_type = "causal_forest",                # built-in learner based on grf
  learner_type = NULL,                         # not used with model_type = "causal_forest"
  baseline_policy = as.list(rep(0, nrow(X))),  # baseline policy: treat no one
  alpha = 0.05                                 # 95% confidence intervals
)
print(res)
#> $raw_results
#>                        Metric   Value
#> 1              Delta Estimate 0.23208
#> 2        Delta Standard Error 0.05862
#> 3              Delta CI Lower 0.11718
#> 4              Delta CI Upper 0.34697
#> 5       Policy Value Estimate 0.21751
#> 6 Policy Value Standard Error 0.05237
#> 7       Policy Value CI Lower 0.11486
#> 8       Policy Value CI Upper 0.32016
#> 9          Proportion Treated 0.60500
#> 
#> $interactive_table
#> 
#> $final_policy_model
#> GRF forest object of type causal_forest 
#> Number of trees: 100 
#> Number of training samples: 1000 
#> Variable importance: 
#>     1     2     3 
#> 0.437 0.350 0.213
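
The returned grf forest can be applied directly to covariates. As a minimal sketch (not part of the vignette output), predict() yields estimated CATEs, and treating whenever the estimate is positive recovers a binary policy:

# Sketch: apply the learned causal forest to covariates.
tau_hat <- predict(res$final_policy_model, newdata = as.matrix(X))$predictions
d_hat   <- as.numeric(tau_hat > 0)  # treat when the estimated effect is positive
mean(d_hat)                         # compare with "Proportion Treated" above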

Case of categorical target Y

Use caret and choose a classification method that outputs probabilities, i.e. set classProbs = TRUE in trainControl. For example, with a random forest classifier:

library(caret)  # trainControl(), train()

model_params <- list(
  formula = Y ~ .,
  caret_params = list(
    method = "rf",
    trControl = trainControl(method = "none", classProbs = TRUE)
  )
)

Also note that all data inputs need to be numeric; hence, a categorical Y should contain numeric values representing the class of each observation. There is no need to use the factor type with cram_policy().
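
For instance, if the raw outcome is a factor, a minimal recoding (illustrative only; the names here are hypothetical) looks like:

# Hypothetical example: recode a factor outcome into numeric class codes
Y_raw <- factor(c("no", "yes", "yes", "no"))
Y_num <- as.numeric(Y_raw) - 1  # "no" -> 0, "yes" -> 1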

Custom Models with cram_policy()

Set model_type to NULL and supply your own custom_fit and custom_predict functions.

library(glmnet)  # cv.glmnet()

# X-learner-style construction with ridge regressions: fit an outcome model
# on each arm, impute individual treatment effects, then fit a final ridge
# model on the imputed effects.
custom_fit <- function(X, Y, D, n_folds = 5) {
  treated <- which(D == 1); control <- which(D == 0)
  # Outcome models for the treated (m1) and control (m0) arms
  m1 <- cv.glmnet(as.matrix(X[treated, ]), Y[treated], alpha = 0, nfolds = n_folds)
  m0 <- cv.glmnet(as.matrix(X[control, ]), Y[control], alpha = 0, nfolds = n_folds)
  # Imputed effects: tau1 on the control units, tau0 on the treated units
  tau1 <- predict(m1, as.matrix(X[control, ]), s = "lambda.min") - Y[control]
  tau0 <- Y[treated] - predict(m0, as.matrix(X[treated, ]), s = "lambda.min")
  tau <- c(tau0, tau1); X_all <- rbind(X[treated, ], X[control, ])
  # Final model regresses imputed effects on covariates
  final_model <- cv.glmnet(as.matrix(X_all), tau, alpha = 0)
  final_model
}

# Treat (1) whenever the predicted treatment effect is positive
custom_predict <- function(model, X, D) {
  as.numeric(predict(model, as.matrix(X), s = "lambda.min") > 0)
}

res <- cram_policy(
  X, D, Y,
  batch = 20,
  model_type = NULL,
  custom_fit = custom_fit,
  custom_predict = custom_predict
)
print(res)
#> $raw_results
#>                        Metric   Value
#> 1              Delta Estimate 0.22542
#> 2        Delta Standard Error 0.06004
#> 3              Delta CI Lower 0.10774
#> 4              Delta CI Upper 0.34310
#> 5       Policy Value Estimate 0.21085
#> 6 Policy Value Standard Error 0.04280
#> 7       Policy Value CI Lower 0.12696
#> 8       Policy Value CI Upper 0.29475
#> 9          Proportion Treated 0.54500
#> 
#> $interactive_table
#> 
#> $final_policy_model
#> 
#> Call:  cv.glmnet(x = as.matrix(X_all), y = tau, alpha = 0) 
#> 
#> Measure: Mean-Squared Error 
#> 
#>     Lambda Index Measure      SE Nonzero
#> min 0.0395   100  0.9194 0.02589       3
#> 1se 0.4872    73  0.9423 0.03002       3
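
As a quick illustration (not shown in the vignette output), the returned cv.glmnet model can be fed back through custom_predict to recover the learned treatment rule:

# Illustrative only: apply the learned rule to the training covariates.
d_hat <- custom_predict(res$final_policy_model, X, D)
mean(d_hat)  # compare with the "Proportion Treated" reported above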

2. cram_ml() — ML Learning & Evaluation

Regression with cram_ml()

Specify a formula and caret_params conforming to the popular caret::train() interface, and set an individual-level loss under loss_name.

set.seed(42)
data_df <- data.frame(
  x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100), Y = rnorm(100)
)

caret_params <- list(
  method = "lm",
  trControl = trainControl(method = "none")
)

res <- cram_ml(
  data = data_df,
  formula = Y ~ .,
  batch = 5,
  loss_name = "se",
  caret_params = caret_params
)
print(res)
#> $raw_results
#>                         Metric    Value
#> 1       Expected Loss Estimate  0.86429
#> 2 Expected Loss Standard Error  0.73665
#> 3       Expected Loss CI Lower -0.57952
#> 4       Expected Loss CI Upper  2.30809
#> 
#> $interactive_table
#> 
#> $final_ml_model
#> Linear Regression 
#> 
#> 100 samples
#>   3 predictor
#> 
#> No pre-processing
#> Resampling: None
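
For intuition, loss_name = "se" is the per-observation squared error. A naive in-sample version with the returned caret model is sketched below; note that the cram estimate above comes from the batched cram procedure, not this plain average:

# Sketch only: naive in-sample squared-error loss of the final model.
pred <- predict(res$final_ml_model, newdata = data_df)
mean((pred - data_df$Y)^2)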

Classification with cram_ml()

All data inputs need to be numeric; hence, a categorical Y should contain numeric values representing the class of each observation. There is no need to use the factor type with cram_ml().

Case 1: Predicting Class labels

In this case, the model outputs hard predictions (class labels, e.g. 0, 1, 2), and the metric used is classification accuracy, i.e. the proportion of correctly predicted labels.

  • Use loss_name = "accuracy"
  • Set classProbs = FALSE in trainControl
  • Set classify = TRUE in cram_ml()

set.seed(42)

# Generate binary classification dataset
X_data <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
Y_data <- rbinom(nrow(X_data), 1, 0.5)
data_df <- data.frame(X_data, Y = Y_data)

# Define caret parameters: predict labels (default behavior)
caret_params_rf <- list(
  method = "rf",
  trControl = trainControl(method = "none")
)

# Run CRAM ML with accuracy as loss
result <- cram_ml(
  data = data_df,
  formula = Y ~ .,
  batch = 5,
  loss_name = "accuracy",
  caret_params = caret_params_rf,
  classify = TRUE
)

print(result)
#> $raw_results
#>                         Metric    Value
#> 1       Expected Loss Estimate  0.48750
#> 2 Expected Loss Standard Error  0.43071
#> 3       Expected Loss CI Lower -0.35668
#> 4       Expected Loss CI Upper  1.33168
#> 
#> $interactive_table
#> 
#> $final_ml_model
#> Random Forest 
#> 
#> 100 samples
#>   3 predictor
#>   2 classes: 'class0', 'class1' 
#> 
#> No pre-processing
#> Resampling: None

Case 2: Predicting Class Probabilities

In this setup, the model outputs class probabilities, and the loss is evaluated using logarithmic loss (log loss), a standard metric for probabilistic classification.

  • Use loss_name = "logloss"
  • Set classProbs = TRUE in trainControl
  • Set classify = TRUE in cram_ml()

set.seed(42)

# Generate binary classification dataset
X_data <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
Y_data <- rbinom(nrow(X_data), 1, 0.5)
data_df <- data.frame(X_data, Y = Y_data)

# Define caret parameters for probability output
caret_params_rf_probs <- list(
  method = "rf",
  trControl = trainControl(method = "none", classProbs = TRUE)
)

# Run CRAM ML with logloss as the evaluation loss
result <- cram_ml(
  data = data_df,
  formula = Y ~ .,
  batch = 5,
  loss_name = "logloss",
  caret_params = caret_params_rf_probs,
  classify = TRUE
)

print(result)
#> $raw_results
#>                         Metric    Value
#> 1       Expected Loss Estimate  0.93225
#> 2 Expected Loss Standard Error  0.48118
#> 3       Expected Loss CI Lower -0.01085
#> 4       Expected Loss CI Upper  1.87534
#> 
#> $interactive_table
#> 
#> $final_ml_model
#> Random Forest 
#> 
#> 100 samples
#>   3 predictor
#>   2 classes: 'class0', 'class1' 
#> 
#> No pre-processing
#> Resampling: None
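
For reference, the per-observation log loss follows the standard definition; the helper below is a sketch, not a function exported by the package:

# Standard per-observation log loss, where p is the predicted probability
# of the observed class; clipping avoids log(0).
logloss <- function(p, eps = 1e-15) -log(pmax(pmin(p, 1 - eps), eps))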

In addition to using built-in learners via caret, cram_ml() also supports fully custom model workflows. You can specify your own:

  • Model fitting function (custom_fit)
  • Prediction function (custom_predict)
  • Loss function (custom_loss)

See the vignette “Cram ML” for more details.
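
As an illustration only, such a workflow might look like the sketch below; the signatures assumed here (fit on a data.frame, predict, per-observation loss) are an assumption, so consult that vignette for the exact contract:

# Hypothetical sketch of a custom cram_ml() workflow; signatures are assumed.
custom_fit     <- function(data) lm(Y ~ ., data = data)
custom_predict <- function(model, data) predict(model, newdata = data)
custom_loss    <- function(predictions, data) (predictions - data$Y)^2

res_custom <- cram_ml(
  data = data_df,
  batch = 5,
  custom_fit = custom_fit,
  custom_predict = custom_predict,
  custom_loss = custom_loss
)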


3. cram_bandit() — Contextual Bandits for On-policy Statistical Evaluation

Specify:

  • pi: an array of dimension (T, T, K), where pi[j, t, ] is the probability vector over the K arms for context j under the policy at update t,
  • arm: the vector of arms actually chosen at each of the T steps, and
  • reward: the vector of observed rewards.

set.seed(42)
T <- 100; K <- 4  # T timesteps, K arms

# Simulated policy probabilities, normalized so each pi[j, t, ] sums to 1
pi <- array(runif(T * T * K, 0.1, 1), dim = c(T, T, K))
for (t in 1:T) for (j in 1:T) pi[j, t, ] <- pi[j, t, ] / sum(pi[j, t, ])

arm <- sample(1:K, T, replace = TRUE)  # arms actually pulled
reward <- rnorm(T, 1, 0.5)             # observed rewards

res <- cram_bandit(pi, arm, reward, batch = 1, alpha = 0.05)
print(res)
#> $raw_results
#>                        Metric   Value
#> 1       Policy Value Estimate 0.67621
#> 2 Policy Value Standard Error 0.04394
#> 3       Policy Value CI Lower 0.59008
#> 4       Policy Value CI Upper 0.76234
#> 
#> $interactive_table

Summary

  • cram_policy(): learn and evaluate a binary treatment policy; reports the delta versus a baseline policy and the policy value, each with standard errors and confidence intervals.
  • cram_ml(): learn an ML model and evaluate its expected loss under a user-chosen loss such as squared error, accuracy, or log loss.
  • cram_bandit(): on-policy statistical evaluation of contextual bandit algorithms; reports the policy value estimate with a confidence interval.
