library(recipes)
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
##
## Attaching package: 'recipes'
## The following object is masked from 'package:stats':
##
## step
library(workflows)
library(bayesian)
## Loading required package: brms
## Loading required package: Rcpp
## Loading 'brms' package (version 2.15.5). Useful instructions
## can be found by typing help('brms'). A more detailed introduction
## to the package is available through vignette('brms_overview').
##
## Attaching package: 'brms'
## The following object is masked from 'package:stats':
##
## ar
## Loading required package: parsnip
As a simple example, we will model the seizure counts in epileptic patients to investigate whether the treatment (represented by variable Trt) can reduce the seizure counts and whether the effect of the treatment varies with the baseline number of seizures a person had before treatment (variable Base) and with the age of the person (variable Age). As we have multiple observations per person, a group-level intercept is incorporated to account for the resulting dependency in the data. In a first step, we use the recipes package to prepare (a recipe for) the epilepsy data. This data set is shipped with the brms package, which is automatically loaded by bayesian.
epi_recipe <- epilepsy %>%
  recipe() %>%
  update_role(count, new_role = "outcome") %>%
  update_role(Trt, Age, Base, patient, new_role = "predictor") %>%
  add_role(patient, new_role = "group") %>%
  step_normalize(Age, Base)
print(epi_recipe)
## Data Recipe
##
## Inputs:
##
## role #variables
## group 1
## outcome 1
## predictor 4
##
## 4 variables with undeclared roles
##
## Operations:
##
## Centering and scaling for Age, Base
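The "Centering and scaling" operation listed in the recipe can be illustrated in base R. The following is only a minimal sketch of the transformation on a made-up numeric vector (the values in `age` are hypothetical and not taken from the epilepsy data); it does not call recipes itself.

```r
# Hypothetical values standing in for a numeric predictor such as Age.
age <- c(22, 30, 25, 36, 28)

# Center on the mean and scale by the standard deviation.
age_normalized <- (age - mean(age)) / sd(age)

# The normalized values have mean 0 and standard deviation 1
# (up to floating point error).
mean(age_normalized)
sd(age_normalized)
```

Putting predictors on a common scale in this way tends to make sampling easier and makes the regression coefficients comparable in units of one standard deviation.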
Above, we not only defined the roles of the relevant variables but also normalized the Age and Base predictors to facilitate model fitting later on. In the next step, we use bayesian to set up a basic model structure.
epi_model <- bayesian(family = poisson()) %>%
  set_engine("brms") %>%
  set_mode("regression")
print(epi_model)
## Bayesian Model Specification (regression)
##
## Main Arguments:
## family = poisson()
##
## Computational engine: brms
The bayesian function is the package's main function for initializing a Bayesian model. Much of the model information can be set directly within this function or changed later on via the update method. For example, if we had not specified the family initially, or had set it to something else that we now wanted to change, we could write:
epi_model <- epi_model %>%
  update(family = poisson())
Next, we define a workflow via the workflows package by combining the data processing recipe defined above, the model, and the actual model formula to be passed to the brms engine.
epi_workflow <- workflow() %>%
  add_recipe(epi_recipe) %>%
  add_model(
    spec = epi_model,
    formula = count ~ Trt + Base + Age + (1 | patient)
  )
print(epi_workflow)
## ══ Workflow ════════════════════════════════════════════════════════════════════
## Preprocessor: Recipe
## Model: bayesian()
##
## ── Preprocessor ────────────────────────────────────────────────────────────────
## 1 Recipe Step
##
## • step_normalize()
##
## ── Model ───────────────────────────────────────────────────────────────────────
## Bayesian Model Specification (regression)
##
## Main Arguments:
## family = poisson()
##
## Computational engine: brms
We are now ready to fit the model by calling the fit method with the data set we want to train the model on.
epi_workflow_fit <- epi_workflow %>%
  fit(data = epilepsy)
## Compiling Stan program...
## Start sampling
print(epi_workflow_fit)
## ══ Workflow [trained] ══════════════════════════════════════════════════════════
## Preprocessor: Recipe
## Model: bayesian()
##
## ── Preprocessor ────────────────────────────────────────────────────────────────
## 1 Recipe Step
##
## • step_normalize()
##
## ── Model ───────────────────────────────────────────────────────────────────────
## Family: poisson
## Links: mu = log
## Formula: count ~ Trt + Base + Age + (1 | patient)
## Data: .x3 (Number of observations: 236)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Group-Level Effects:
## ~patient (Number of levels: 59)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.58 0.07 0.46 0.73 1.01 761 1342
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 1.77 0.12 1.53 1.99 1.01 781 1307
## Trt1 -0.26 0.17 -0.57 0.08 1.00 824 1241
## Base 0.73 0.08 0.58 0.88 1.01 729 1569
## Age 0.09 0.09 -0.10 0.26 1.01 562 1232
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
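Because the model uses a log link (Links: mu = log), the population-level estimates are on the log scale of the expected seizure count. As a rough reading aid, the sketch below exponentiates the Trt1 summary copied by hand from the printed output to obtain a multiplicative rate ratio. The object names are illustrative, and the calculation uses only the rounded summary values, not the posterior draws.

```r
# Posterior summary for Trt1, copied from the printed output (log scale).
trt_log <- c(estimate = -0.26, lower = -0.57, upper = 0.08)

# Exponentiate to move from the log scale to a multiplicative rate ratio.
trt_ratio <- exp(trt_log)

# Rounded to two decimals: estimate 0.77, lower 0.57, upper 1.08.
round(trt_ratio, 2)
```

Read this way, the treatment is associated with roughly 23% fewer seizures at the point estimate, but the 95% interval on the ratio scale includes 1, so the data are also compatible with no reduction. Note that exponentiating the posterior mean is only an approximation of the posterior mean of the ratio, whereas the interval bounds carry over exactly because they are quantiles.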
We can now use the workflow, which includes the fitted model, to make predictions for new data conveniently, for example, without having to worry about the data preprocessing.
newdata <- epilepsy[1:5, ]
epi_workflow_fit %>% predict(new_data = newdata, type = "conf_int")
## # A tibble: 5 x 2
## .pred_lower .pred_upper
## <dbl> <dbl>
## 1 0 8
## 2 0 8
## 3 0 7
## 4 0 8
## 5 6.98 23
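The convenience comes from the trained recipe being re-applied to the new data before prediction: for a normalization step, the means and standard deviations estimated on the training data are reused rather than recomputed from the new rows. The base R sketch below mimics that idea by hand with made-up numbers (all values and object names are hypothetical); it is not how the workflow is implemented internally.

```r
# Hypothetical training values for a predictor such as Base.
base_train <- c(11, 8, 6, 52, 12, 10)

# Statistics are estimated once, on the training data only.
train_mean <- mean(base_train)
train_sd <- sd(base_train)

# Hypothetical new observations are transformed with the stored
# training statistics, not with their own mean and standard deviation.
base_new <- c(9, 40)
base_new_normalized <- (base_new - train_mean) / train_sd
base_new_normalized
```

Reusing the training statistics keeps new observations on the same scale the model was fitted on, which is the bookkeeping the fitted workflow takes care of when predict is called.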