Prediction using BCF

The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

In this vignette, we show how to use the bcf package to fit a model and use the fitted object to predict estimates for new data.

We use the same data generating process as in the “Simple Example” vignette:

set.seed(1)

## Training data
p <- 3 # two control variables and one effect moderator
n <- 1000
n_burn <- 2000
n_sim <- 1500

x <- matrix(rnorm(n*(p-1)), nrow=n)
x <- cbind(x, x[,2] + rnorm(n))
weights <- abs(rnorm(n))

# create targeted selection, whereby a practice's likelihood of joining the intervention (pi) 
# is related to their expected outcome (mu)
mu <- -1*(x[,1]>(x[,2])) + 1*(x[,1]<(x[,2])) - 0.1

# generate treatment variable
pi <- pnorm(mu)
z <- rbinom(n,1,pi)

# tau is the true treatment effect. It varies across practices as a function of
# X3 and X2
tau <- 1/(1 + exp(-x[,3])) + x[,2]/10

# generate the response using q, tau and z
y_noiseless <- mu + tau*z

# set the noise level relative to the expected mean function of Y
sigma <- diff(range(mu + tau*pi))/8

# draw the response variable with additive error
y <- y_noiseless + sigma*rnorm(n)/sqrt(weights)

# Fitting the model
bcf_out <- bcf(y               = y,
               z               = z,
               x_control       = x,
               x_moderate      = x,
               pihat           = pi,
               nburn           = n_burn,
               nsim            = n_sim,
               w               = weights,
               n_chains        = 2,
               update_interval = 1)

Predicting using BCF

We make predictions at 10 new observations, including some extreme values:

set.seed(1)
n_test = 10
x_test <- matrix(rnorm(n_test*(p-1), 0, 2), nrow=n_test) # sd of 2 makes x_test more dispersed than x
x_test <- cbind(x_test, x_test[,2] + rnorm(n_test))
mu_pred  <- -1*(x_test[,1]>(x_test[,2])) + 1*(x_test[,1]<(x_test[,2])) - 0.1
pi_pred <- pnorm(mu_pred)
z_pred  <- rbinom(n_test,1, pi_pred)

We now predict \(y\) and estimate treatment effects \(\tau\) for these new observations based on our fitted model.

pred_out = predict(object=bcf_out,
                   x_predict_control=x_test,
                   x_predict_moderate=x_test,
                   pi_pred=pi_pred,
                   z_pred=z_pred,
                   n_cores = 1,
                   save_tree_directory = '.')
#> Initializing BCF Prediction
#> Starting Prediction 
#> Starting to Predict Chain  1 
#> Loading...
#> ntrees 50
#> p 3
#> ndraws 1500
#> done
#> Loading...
#> ntrees 200
#> p 4
#> ndraws 1500
#> done
#> Starting to Predict Chain  2 
#> Loading...
#> ntrees 50
#> p 3
#> ndraws 1500
#> done
#> Loading...
#> ntrees 200
#> p 4
#> ndraws 1500
#> done

Comparison

Let’s compare the results for our training and testing data. We will show the estimated treatment effects for training and test observations as a function of \(x_3\), which is an effect modifier.

tau_ests_preds <- data.frame(x     = c(x[,3], x_test[,3]),
                             Mean  = c(colMeans(bcf_out$tau), 
                                       colMeans(pred_out$tau)),
                             Low95 = c(apply(bcf_out$tau, 2, quantile, 0.025), 
                                       apply(pred_out$tau, 2, quantile, 0.025)),
                             Up95  = c(apply(bcf_out$tau, 2, quantile, 0.975), 
                                       apply(pred_out$tau, 2, quantile, 0.975)),
                             group = factor(c(rep("training", n), rep("testing", n_test))),
                             agroup = c(rep(0.2, n), rep(1, n_test)))
ggplot(tau_ests_preds, aes(x, Mean, color = group)) +
  geom_pointrange(aes(ymin = Low95, ymax = Up95), alpha = tau_ests_preds$agroup) +
  xlab(TeX("$x_3$")) +
  ylab(TeX("$\\hat{\\tau}$"))

Note that the credible intervals get wider as the \(x_3\) values get closer to the end of the range, as we would hope.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.