policy_data

library(polle)

This vignette is a guide to policy_data(). As the name suggests, the function creates a policy_data object with a specific data structure making it easy to use in combination with policy_def(), policy_learn(), and policy_eval(). The vignette is also a guide to some of the associated S3 functions which transform or access parts of the data, see ?policy_data and methods(class="policy_data").

We start with a simple single-stage example, then consider a fixed two-stage example with varying action sets and data in wide format, and finally look at an example with a stochastic number of stages and data in long format.

Single-stage: wide data

Consider a simple single-stage problem with covariates/state variables \((Z, L, B)\), binary action variable \(A\), and utility outcome \(U\). We use sim_single_stage() to simulate data:

(d <- sim_single_stage(n = 5e2, seed=1)) |> head()
#>            Z          L B A          U
#> 1  1.2879704 -1.4795962 0 1 -0.9337648
#> 2  1.6184181  1.2966436 0 1  6.7506026
#> 3  1.2710352 -1.0431352 0 1 -0.3377580
#> 4 -0.2157605  0.1198224 1 0  1.4993427
#> 5 -1.0671588 -1.3663727 0 1 -9.1718727
#> 6 -1.4469746 -0.4018530 0 0 -2.6692961

We tell policy_data() which variables define the action, the state covariates, and the utility:

pd <- policy_data(d, action="A", covariates=list("Z", "B", "L"), utility="U")
pd
#> Policy data with n = 500 observations and maximal K = 1 stages.
#> 
#>      action
#> stage   0   1   n
#>     1 278 222 500
#> 
#> Baseline covariates:
#> State covariates: Z, B, L
#> Average utility: -0.98
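The printed summary can be cross-checked against the raw data. The following is a minimal sketch using only base R on the simulated data frame; the object names action_counts and avg_utility are introduced here for illustration. The results should agree with the action counts and the average utility shown above.

```r
library(polle)

# Same simulated data as above.
d <- sim_single_stage(n = 5e2, seed = 1)

# Number of subjects per observed action; compare with the "action" table.
action_counts <- table(d$A)
action_counts

# Mean of the utility column, rounded to two decimals for comparison
# with the printed "Average utility".
avg_utility <- round(mean(d$U), 2)
avg_utility
```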

In the single-stage case the history \(H\) is just \((B, Z, L)\). We access the history and actions using get_history():

get_history(pd)$H |> head()
#> Key: <id, stage>
#>       id stage          Z     B          L
#>    <int> <int>      <num> <num>      <num>
#> 1:     1     1  1.2879704     0 -1.4795962
#> 2:     2     1  1.6184181     0  1.2966436
#> 3:     3     1  1.2710352     0 -1.0431352
#> 4:     4     1 -0.2157605     1  0.1198224
#> 5:     5     1 -1.0671588     0 -1.3663727
#> 6:     6     1 -1.4469746     0 -0.4018530
get_history(pd)$A |> head()
#> Key: <id, stage>
#>       id stage      A
#>    <int> <int> <char>
#> 1:     1     1      1
#> 2:     2     1      1
#> 3:     3     1      1
#> 4:     4     1      0
#> 5:     5     1      1
#> 6:     6     1      0

Similarly, we access the utility outcomes \(U\):

get_utility(pd) |> head()
#> Key: <id>
#>       id          U
#>    <int>      <num>
#> 1:     1 -0.9337648
#> 2:     2  6.7506026
#> 3:     3 -0.3377580
#> 4:     4  1.4993427
#> 5:     5 -9.1718727
#> 6:     6 -2.6692961

Two-stage: wide data

Consider a two-stage problem with observations \(O = (B, BB, L_{1}, C_{1}, U_{1}, A_{1}, L_{2}, C_{2}, U_{2}, A_{2}, U_{3})\). Following the general notation introduced in Section 3.1 of (Nordland and Holst 2023), \((B, BB)\) are the baseline covariates, \(S_{k} = (L_{k}, C_{k})\) are the state covariates at stage \(k\), \(A_{k}\) is the action at stage \(k\), and \(U_{k}\) is the reward at stage \(k\). The utility is the sum of the rewards \(U = U_{1} + U_{2} + U_{3}\).

We use sim_two_stage_multi_actions() to simulate data:

d <- sim_two_stage_multi_actions(n=2e3, seed = 1)
colnames(d)
#>  [1] "B"   "BB"  "L_1" "C_1" "A_1" "L_2" "C_2" "A_2" "L_3" "U_1" "U_2" "U_3"

Note that the data is in wide format. The data is transformed using policy_data() with instructions on which variables define the actions, baseline covariates, state covariates, and the rewards:

pd <- policy_data(d,
                  action = c("A_1", "A_2"),
                  baseline = c("B", "BB"),
                  covariates = list(L = c("L_1", "L_2"),
                                    C = c("C_1", "C_2")),
                  utility = c("U_1", "U_2", "U_3"))
pd
#> Policy data with n = 2000 observations and maximal K = 2 stages.
#> 
#>      action
#> stage default   no  yes    n
#>     1       0 1017  983 2000
#>     2     769  826  405 2000
#> 
#> Baseline covariates: B, BB
#> State covariates: L, C
#> Average utility: 0.39

The length of the character vector action determines the number of stages K (in this case 2). If the number of stages is 2 or more, the covariates argument must be a named list. Each element must be a character vector with length equal to the number of stages. If a covariate is not available at a given stage we insert an NA value, e.g., L = c(NA, "L_2").

Finally, the utility argument must be a single character string (the utility is observed after stage K) or a character vector of length K+1 with the names of the rewards.
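As a sketch of the NA convention described above, the call below treats L as unavailable at stage 1 by passing L = c(NA, "L_2"). The object name pd_na is introduced here for illustration, and the resulting summary is not shown because it has not been verified here.

```r
library(polle)

# Same simulated two-stage data as above.
d <- sim_two_stage_multi_actions(n = 2e3, seed = 1)

# Each covariate vector still has length K = 2; NA marks the stage
# at which the covariate is treated as unavailable.
pd_na <- policy_data(d,
                     action = c("A_1", "A_2"),
                     baseline = c("B", "BB"),
                     covariates = list(L = c(NA, "L_2"),
                                       C = c("C_1", "C_2")),
                     utility = c("U_1", "U_2", "U_3"))
pd_na
```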

In this example, the observed action sets vary for each stage. get_action_set() returns the global action set and get_stage_action_sets() returns the action set for each stage:

get_action_set(pd)
#> [1] "default" "no"      "yes"
get_stage_action_sets(pd)
#> $stage_1
#> [1] "no"  "yes"
#> 
#> $stage_2
#> [1] "default" "no"      "yes"
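The same sets can be recovered from the wide data directly, which may help clarify what the two accessors report. This is a base R sketch, not the package's implementation; stage_sets and global_set are names introduced here.

```r
library(polle)

# Same simulated two-stage data as above.
d <- sim_two_stage_multi_actions(n = 2e3, seed = 1)

# Observed actions per stage, taken straight from the wide columns.
stage_sets <- list(stage_1 = sort(unique(as.character(d$A_1))),
                   stage_2 = sort(unique(as.character(d$A_2))))
stage_sets

# The global action set is the union of the stage action sets.
global_set <- sort(unique(unlist(stage_sets, use.names = FALSE)))
global_set
```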

The full histories \(H_1 = (B, BB, L_{1}, C_{1})\) and \(H_2=(B, BB, L_{1}, C_{1}, A_{1}, L_{2}, C_{2})\) are available using get_history() and full_history = TRUE:

get_history(pd, stage = 1, full_history = TRUE)$H |> head()
#> Key: <id, stage>
#>       id stage        L_1        C_1          B     BB
#>    <int> <num>      <num>      <num>      <num> <char>
#> 1:     1     1  0.9696772  1.7112790 -0.6264538 group2
#> 2:     2     1 -2.1994065 -2.6431237  0.1836433 group1
#> 3:     3     1  1.9480938  2.0619342 -0.8356286 group2
#> 4:     4     1  0.1798532  1.0066957  1.5952808 group2
#> 5:     5     1  0.4150568  0.1538534  0.3295078 group2
#> 6:     6     1  0.6468405 -0.0982121 -0.8204684 group3
get_history(pd, stage = 2, full_history = TRUE)$H |> head()
#> Key: <id, stage>
#>       id stage    A_1        L_1        L_2        C_1        C_2          B
#>    <int> <num> <char>      <num>      <num>      <num>      <num>      <num>
#> 1:     1     2    yes  0.9696772 -0.7393434  1.7112790  2.4243702 -0.6264538
#> 2:     2     2     no -2.1994065  0.4828756 -2.6431237 -2.6647281  0.1836433
#> 3:     3     2     no  1.9480938  0.4803055  2.0619342  2.4747615 -0.8356286
#> 4:     4     2    yes  0.1798532 -0.3574497  1.0066957  2.0571959  1.5952808
#> 5:     5     2     no  0.4150568  2.0473541  0.1538534 -0.9649004  0.3295078
#> 6:     6     2    yes  0.6468405 -2.3701135 -0.0982121  1.0989523 -0.8204684
#>        BB
#>    <char>
#> 1: group2
#> 2: group1
#> 3: group2
#> 4: group2
#> 5: group2
#> 6: group3

Similarly, we access the associated actions at each stage via list element A:

get_history(pd, stage = 1, full_history = TRUE)$A |> head()
#> Key: <id, stage>
#>       id stage    A_1
#>    <int> <num> <char>
#> 1:     1     1    yes
#> 2:     2     1     no
#> 3:     3     1     no
#> 4:     4     1    yes
#> 5:     5     1     no
#> 6:     6     1    yes
get_history(pd, stage = 2, full_history = TRUE)$A |> head()
#> Key: <id, stage>
#>       id stage     A_2
#>    <int> <num>  <char>
#> 1:     1     2      no
#> 2:     2     2      no
#> 3:     3     2 default
#> 4:     4     2     yes
#> 5:     5     2     yes
#> 6:     6     2      no

Alternatively, the state/Markov type history and actions are available using full_history = FALSE:

get_history(pd, full_history = FALSE)$H |> head()
#> Key: <id, stage>
#>       id stage          L         C          B     BB
#>    <int> <int>      <num>     <num>      <num> <char>
#> 1:     1     1  0.9696772  1.711279 -0.6264538 group2
#> 2:     1     2 -0.7393434  2.424370 -0.6264538 group2
#> 3:     2     1 -2.1994065 -2.643124  0.1836433 group1
#> 4:     2     2  0.4828756 -2.664728  0.1836433 group1
#> 5:     3     1  1.9480938  2.061934 -0.8356286 group2
#> 6:     3     2  0.4803055  2.474761 -0.8356286 group2
get_history(pd, full_history = FALSE)$A |> head()
#> Key: <id, stage>
#>       id stage       A
#>    <int> <int>  <char>
#> 1:     1     1     yes
#> 2:     1     2      no
#> 3:     2     1      no
#> 4:     2     2      no
#> 5:     3     1      no
#> 6:     3     2 default

Note that policy_data() overrides the action variable names to A_1, A_2, … in the full history case and A in the state/Markov history case.

As in the single-stage case we access the utility, i.e. the sum of the rewards, using get_utility():

get_utility(pd) |> head()
#> Key: <id>
#>       id         U
#>    <int>     <num>
#> 1:     1  1.110369
#> 2:     2 -1.788041
#> 3:     3  2.836251
#> 4:     4  3.173743
#> 5:     5  1.891312
#> 6:     6 -1.120837
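Since the utility is defined as the sum of the rewards, it can also be computed by hand from the wide data. The sketch below compares the two; it assumes that the ids in the policy_data object follow the row order of the input data, and u_manual is a name introduced here.

```r
library(polle)

# Same data and policy_data object as above.
d <- sim_two_stage_multi_actions(n = 2e3, seed = 1)
pd <- policy_data(d,
                  action = c("A_1", "A_2"),
                  baseline = c("B", "BB"),
                  covariates = list(L = c("L_1", "L_2"),
                                    C = c("C_1", "C_2")),
                  utility = c("U_1", "U_2", "U_3"))

# Sum of the three rewards for each row of the wide data.
u_manual <- d$U_1 + d$U_2 + d$U_3

# Compare with the utility stored in the policy_data object
# (assumes ids follow the input row order).
all.equal(get_utility(pd)$U, u_manual)
```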

Multi-stage: long data

In this example we illustrate how polle handles decision processes with a stochastic number of stages, see Section 3.5 in (Nordland and Holst 2023). The data is simulated using sim_multi_stage(). Detailed information on the simulation is available in ?sim_multi_stage. We simulate data from 2000 iid subjects:

d <- sim_multi_stage(2e3, seed = 1)

As described, the stage data is in long format:

d$stage_data[, -(9:10)] |> head()
#>       id stage event        t      A          X     X_lead         U
#>    <num> <num> <num>    <num> <char>      <num>      <num>     <num>
#> 1:     1     1     0 0.000000      1  1.3297993  0.0000000 0.0000000
#> 2:     1     2     0 1.686561      1 -0.7926711  1.3297993 0.3567621
#> 3:     1     3     0 3.071768      0  3.5246509 -0.7926711 2.1778778
#> 4:     1     4     1 3.071768   <NA>         NA         NA 0.0000000
#> 5:     2     1     0 0.000000      1  0.7635935  0.0000000 0.0000000
#> 6:     2     2     0 1.297336      1 -0.5441694  0.7635935 0.5337427

The id variable identifies which rows belong to each subject. The baseline data uses the same id variable:

d$baseline_data |> head()
#>       id     B
#>    <num> <int>
#> 1:     1     0
#> 2:     2     0
#> 3:     3     1
#> 4:     4     1
#> 5:     5     1
#> 6:     6     0

The data is transformed using policy_data() with type = "long". The names of the id, stage, event, action, and utility variables must be specified. The event variable, inspired by the event variable in survival::Surv(), is 0 whenever an action occurs and 1 for a terminal event.

pd <- policy_data(data = d$stage_data,
                  baseline_data = d$baseline_data,
                  type = "long",
                  id = "id",
                  stage = "stage",
                  event = "event",
                  action = "A",
                  utility = "U")
pd
#> Policy data with n = 2000 observations and maximal K = 4 stages.
#> 
#>      action
#> stage    0    1    n
#>     1  113 1887 2000
#>     2  844 1039 1883
#>     3  956   74 1030
#>     4   72    0   72
#> 
#> Baseline covariates: B
#> State covariates: t, X, X_lead
#> Average utility: 2.46
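To make the long format and the event coding concrete, here is a small hand-built example in base R. The data values are hypothetical and only mimic the structure of the simulated stage data; the checks are a sketch of the conventions described above, not the validation performed by policy_data().

```r
# Hypothetical toy data: two subjects, one row per stage, sorted by id and stage.
stage_data <- data.frame(
  id    = c(1, 1, 1, 2, 2),
  stage = c(1, 2, 3, 1, 2),
  event = c(0, 0, 1, 0, 1),          # 0 = action occurs, 1 = terminal event
  A     = c("1", "0", NA, "1", NA),  # no action at the terminal event
  X     = c(0.5, -1.2, NA, 0.3, NA),
  U     = c(0, 0.4, 0, 0, 1.1)
)

# Every subject ends with exactly one terminal event.
last_event <- tapply(stage_data$event, stage_data$id, function(e) e[length(e)])
n_terminal <- tapply(stage_data$event == 1, stage_data$id, sum)
stopifnot(all(last_event == 1), all(n_terminal == 1))

# Actions are observed on decision rows and missing on terminal rows.
stopifnot(identical(is.na(stage_data$A), stage_data$event == 1))

# Number of decision stages per subject (stochastic across subjects).
n_stages <- tapply(stage_data$event == 0, stage_data$id, sum)
n_stages

# Utility per subject as the sum of the rewards.
utility <- tapply(stage_data$U, stage_data$id, sum)
utility
```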

In some cases we are only interested in analyzing a subset of the decision stages. partial() trims the maximum number of decision stages:

pd3 <- partial(pd, K = 3)
pd3
#> Policy data with n = 2000 observations and maximal K = 3 stages.
#> 
#>      action
#> stage    0    1    n
#>     1  113 1887 2000
#>     2  844 1039 1883
#>     3  956   74 1030
#> 
#> Baseline covariates: B
#> State covariates: t, X, X_lead
#> Average utility: 2.46
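The two summaries report the same average utility, which suggests that trimming the stages does not discard rewards. A quick way to inspect this is to compare the utilities of the full and the trimmed object; u_full and u_part are names introduced here for illustration.

```r
library(polle)

# Same data and policy_data objects as above.
d <- sim_multi_stage(2e3, seed = 1)
pd <- policy_data(data = d$stage_data,
                  baseline_data = d$baseline_data,
                  type = "long",
                  id = "id",
                  stage = "stage",
                  event = "event",
                  action = "A",
                  utility = "U")
pd3 <- partial(pd, K = 3)

# Utilities before and after trimming the number of stages.
u_full <- get_utility(pd)
u_part <- get_utility(pd3)

# Compare the average utilities of the two objects.
c(full = mean(u_full$U), partial = mean(u_part$U))
```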

SessionInfo

sessionInfo()
#> R version 4.3.2 (2023-10-31)
#> Platform: aarch64-apple-darwin22.6.0 (64-bit)
#> Running under: macOS Sonoma 14.4.1
#> 
#> Matrix products: default
#> BLAS:   /Users/oano/.asdf/installs/R/4.3.2/lib/R/lib/libRblas.dylib 
#> LAPACK: /Users/oano/.asdf/installs/R/4.3.2/lib/R/lib/libRlapack.dylib;  LAPACK version 3.11.0
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: Europe/Copenhagen
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] splines   stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#> [1] polle_1.4           SuperLearner_2.0-29 gam_1.22-3         
#> [4] foreach_1.5.2       nnls_1.5           
#> 
#> loaded via a namespace (and not attached):
#>  [1] progressr_0.14.0    cli_3.6.2           knitr_1.45         
#>  [4] rlang_1.1.3         xfun_0.41           jsonlite_1.8.8     
#>  [7] data.table_1.15.4   listenv_0.9.1       future.apply_1.11.2
#> [10] lava_1.8.0          htmltools_0.5.7     sass_0.4.7         
#> [13] rmarkdown_2.25      grid_4.3.2          evaluate_0.23      
#> [16] jquerylib_0.1.4     fastmap_1.1.1       yaml_2.3.7         
#> [19] compiler_4.3.2      codetools_0.2-19    future_1.33.2      
#> [22] lattice_0.21-9      digest_0.6.35       R6_2.5.1           
#> [25] parallelly_1.37.1   parallel_4.3.2      Matrix_1.6-1.1     
#> [28] bslib_0.5.1         tools_4.3.2         iterators_1.0.14   
#> [31] globals_0.16.3      survival_3.5-7      cachem_1.0.8

References

Nordland, Andreas, and Klaus K. Holst. 2023. “Policy Learning with the Polle Package.” https://doi.org/10.48550/arXiv.2212.02335.
