| Type: | Package | 
| Title: | Approximate POMDP Planning Software | 
| Version: | 0.6.16 | 
| Description: | A toolkit for Partially Observable Markov Decision Processes (POMDP). Provides bindings to C++ libraries implementing the SARSOP algorithm (Successive Approximations of the Reachable Space under Optimal Policies), described in Kurniawati et al. (2008), <doi:10.15607/RSS.2008.IV.009>. This package also provides a high-level interface for generating, solving, and simulating POMDP problems and their solutions. | 
| License: | GPL-2 | 
| URL: | https://github.com/boettiger-lab/sarsop | 
| BugReports: | https://github.com/boettiger-lab/sarsop/issues | 
| RoxygenNote: | 7.1.1 | 
| Imports: | xml2, parallel, processx, digest, Matrix | 
| Suggests: | testthat, roxygen2, knitr, covr, spelling | 
| LinkingTo: | BH | 
| Encoding: | UTF-8 | 
| Language: | en-US | 
| SystemRequirements: | mallinfo, hence Linux, MacOS or Windows | 
| NeedsCompilation: | yes | 
| Packaged: | 2025-04-15 17:11:33 UTC; jovyan | 
| Author: | Carl Boettiger | 
| Maintainer: | Carl Boettiger <cboettig@gmail.com> | 
| Repository: | CRAN | 
| Date/Publication: | 2025-04-16 04:50:08 UTC | 
alphas_from_log
Description
Read alpha vectors from a log file.
Usage
alphas_from_log(meta, log_dir = ".")
Arguments
| meta | a data frame containing the log metadata for each set of alpha vectors desired; see meta_from_log() | 
| log_dir | path to log directory | 
Value
a list with a matrix of alpha vectors for each
entry in the provided metadata (as returned by sarsop).
Examples
 # takes > 5s
source(system.file("examples/fisheries-ex.R", package = "sarsop"))
log = tempfile()
alpha <- sarsop(transition, observation, reward, discount, precision = 10,
                log_dir = log)
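## Sketch of a follow-up (not part of the original example): look up the logged
## run and read its alpha vectors back; assumes "discount" is among the columns
## recorded in the log's meta.csv.
# meta <- meta_from_log(data.frame(discount = discount), log_dir = log)
# alphas <- alphas_from_log(meta, log_dir = log)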
test the APPL binaries
Description
Asserts that the C++ binaries for appl have been compiled successfully
Usage
assert_has_appl()
Value
Will return TRUE if binaries are installed and can be located and executed, and FALSE otherwise.
Examples
assert_has_appl()
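## In practice assert_has_appl() is used as a guard around solver calls, as in
## the other examples in this manual, e.g.:
if (assert_has_appl()) {
  ## safe to call pomdpsol() / sarsop() here
}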
compute_policy
Description
Derive the corresponding policy function from the alpha vectors
Usage
compute_policy(
  alpha,
  transition,
  observation,
  reward,
  state_prior = rep(1, dim(observation)[[1]])/dim(observation)[[1]],
  a_0 = 1
)
Arguments
| alpha | the matrix of alpha vectors returned by sarsop() | 
| transition | Transition matrix, dimension n_s x n_s x n_a | 
| observation | Observation matrix, dimension n_s x n_z x n_a | 
| reward | reward matrix, dimension n_s x n_a | 
| state_prior | initial belief state, optional, defaults to uniform over states | 
| a_0 | previous action. The belief in a state depends not only on the current observation, but also on the prior belief over states and the action previously taken. | 
Value
a data frame providing the optimal policy (choice of action) and corresponding value of the action for each possible belief state
Examples
m <- fisheries_matrices()
 ## Takes > 5s
if(assert_has_appl()){
alpha <- sarsop(m$transition, m$observation, m$reward, 0.95, precision = 10)
compute_policy(alpha, m$transition, m$observation, m$reward)
}
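## Follow-up sketch (assumes the guarded block above has been run): store and
## inspect the returned policy data frame, one row per belief state giving the
## optimal action and its value.
# policy <- compute_policy(alpha, m$transition, m$observation, m$reward)
# head(policy)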
f from log
Description
Read transition function from log
Usage
f_from_log(meta)
Arguments
| meta | a data frame containing the log metadata for each set of alpha vectors desired; see meta_from_log() | 
Details
Note that this function is unique to the fisheries example problem and assumes that the sarsop call was run with logging that specifies a column "model" containing either the string "ricker" (corresponding to a Ricker-type growth function) or "allen" (corresponding to an Allen-type growth function).
Value
the growth function associated with the model indicated.
Examples
 # takes > 5s
source(system.file("examples/fisheries-ex.R", package = "sarsop"))
log = tempfile()
alpha <- sarsop(transition, observation, reward, discount, precision = 10,
                log_dir = log)
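## Hypothetical continuation (f_from_log() requires that a "model" column was
## recorded via log_data, as described in Details; "ricker" is illustrative):
# meta <- meta_from_log(data.frame(model = "ricker"), log_dir = log)
# f <- f_from_log(meta)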
fisheries_matrices
Description
Initialize the transition, observation, and reward matrices given a transition function, reward function, and state space
Usage
fisheries_matrices(
  states = 0:20,
  actions = states,
  observed_states = states,
  reward_fn = function(x, a) pmin(x, a),
  f = ricker(1, 15),
  sigma_g = 0.1,
  sigma_m = 0.1,
  noise = c("rescaled-lognormal", "lognormal", "uniform", "normal")
)
Arguments
| states | sequence of possible states | 
| actions | sequence of possible actions | 
| observed_states | sequence of possible observations | 
| reward_fn | function of x and a that gives the reward for taking action a when the state is x | 
| f | transition function of state x and action a. | 
| sigma_g | half-width of the uniform growth (process) shock, or equivalent variance for the log-normal | 
| sigma_m | half-width of the uniform measurement (observation) shock, or equivalent variance for the log-normal | 
| noise | distribution for the noise: one of "rescaled-lognormal", "lognormal", "uniform", or "normal" | 
Details
Assumes log-normally distributed observation and process errors by default (see the noise argument for alternatives).
Value
list of transition matrix, observation matrix, and reward matrix
Examples
m <- fisheries_matrices()
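## The returned list follows the dimension conventions used throughout the
## package: n_s x n_s x n_a, n_s x n_z x n_a, and n_s x n_a respectively.
dim(m$transition)
dim(m$observation)
dim(m$reward)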
hindcast_pomdp
Description
Compare historical actions to what the POMDP recommendation would have been.
Usage
hindcast_pomdp(
  transition,
  observation,
  reward,
  discount,
  obs,
  action,
  state_prior = rep(1, dim(observation)[[1]])/dim(observation)[[1]],
  alpha = NULL,
  ...
)
Arguments
| transition | Transition matrix, dimension n_s x n_s x n_a | 
| observation | Observation matrix, dimension n_s x n_z x n_a | 
| reward | reward matrix, dimension n_s x n_a | 
| discount | the discount factor | 
| obs | a given sequence of observations | 
| action | the corresponding sequence of actions | 
| state_prior | initial belief state, optional, defaults to uniform over states | 
| alpha | the matrix of alpha vectors returned by sarsop() | 
| ... | additional arguments to sarsop() | 
Value
a list, containing: a data frame with columns for time, obs, action, and optimal action, and an array containing the posterior belief distribution at each time t
Examples
m <- fisheries_matrices()
 ## Takes > 5s
if(assert_has_appl()){
alpha <- sarsop(m$transition, m$observation, m$reward, 0.95, precision = 10)
sim <- hindcast_pomdp(m$transition, m$observation, m$reward, 0.95,
                     obs = rnorm(21, 15, .1), action = rep(1, 21),
                     alpha = alpha)
}
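## Follow-up sketch (assumes the guarded block above has been run): the result
## is a list whose first element is the data frame of historical vs. optimal
## actions and whose second element is the posterior belief array.
# head(sim[[1]])
# dim(sim[[2]])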
meta from log
Description
load metadata from a log file
Usage
meta_from_log(
  parameters,
  log_dir = ".",
  metafile = paste0(log_dir, "/meta.csv")
)
Arguments
| parameters | a data.frame with the desired parameter values as given in metafile | 
| log_dir | path to log directory | 
| metafile | path to metafile index, assumed to be meta.csv in log_dir | 
Value
a data.frame with the rows of the matching metadata.
Examples
 # takes > 5s
source(system.file("examples/fisheries-ex.R", package = "sarsop"))
log = tempfile()
alpha <- sarsop(transition, observation, reward, discount, precision = 10,
                log_dir = log)
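## Sketch of the lookup itself (not part of the original example): match logged
## runs by parameter value; "discount" is assumed to be a column of meta.csv.
# meta <- meta_from_log(data.frame(discount = discount), log_dir = log)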
model from log
Description
Read model details from log file
Usage
models_from_log(meta, reward_fn = function(x, h) pmin(x, h))
Arguments
| meta | a data frame containing the log metadata for each set of alpha vectors desired; see meta_from_log() | 
| reward_fn | a function f(x,a) giving the reward for taking action a given a system in state x. | 
Details
Assumes the transition function can be determined by the f_from_log function, which is specific to the fisheries example.
Value
a list with an element for each row in the requested metadata data frame; each element is itself a list of the three matrices (transition, observation, and reward) defining the POMDP problem.
Examples
 # takes > 5s
source(system.file("examples/fisheries-ex.R", package = "sarsop"))
log = tempfile()
alpha <- sarsop(transition, observation, reward, discount, precision = 10,
                log_dir = log)
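## Hypothetical continuation (requires a "model" column recorded via log_data,
## as described in Details; "ricker" is illustrative):
# meta <- meta_from_log(data.frame(model = "ricker"), log_dir = log)
# models <- models_from_log(meta)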
APPL wrappers
Description
Wrappers for the APPL executables. The pomdpsol function solves a model
file and returns the path to the output policy file.
Usage
pomdpsol(
  model,
  output = tempfile(),
  precision = 0.001,
  timeout = NULL,
  fast = FALSE,
  randomization = FALSE,
  memory = NULL,
  improvementConstant = NULL,
  timeInterval = NULL,
  stdout = tempfile(),
  stderr = tempfile(),
  spinner = TRUE
)
polgraph(
  model,
  policy,
  output = tempfile(),
  max_depth = 3,
  max_branches = 10,
  min_prob = 0.001,
  stdout = "",
  spinner = TRUE
)
pomdpsim(
  model,
  policy,
  output = tempfile(),
  steps = 100,
  simulations = 3,
  stdout = "",
  spinner = TRUE
)
pomdpeval(
  model,
  policy,
  output = tempfile(),
  steps = 100,
  simulations = 3,
  stdout = "",
  spinner = TRUE
)
pomdpconvert(model, stdout = "", spinner = TRUE)
Arguments
| model | file/path to the pomdp model file | 
| output | file/path of the output policy file. This is also returned by the function. | 
| precision | the target precision in solution quality; the run ends when the target precision is reached. Defaults to 1e-3. | 
| timeout | the time limit in seconds. If running time exceeds the specified value, pomdpsol writes out a policy and terminates. No time limit by default. | 
| fast | logical, default FALSE. use fast (but very picky) alternate parser for .pomdp files. | 
| randomization | logical, default FALSE. Turn on randomization for the sampling algorithm. | 
| memory | Use memoryLimit as the memory limit in MB. No memory limit by default. If memory usage exceeds the specified value, pomdpsol writes out a policy and terminates. Set the value to be less than physical memory to avoid swapping. | 
| improvementConstant | Use improvementConstant as the trial improvement factor in the sampling algorithm. At the default of 0.5, a trial terminates at a belief when the gap between its upper and lower bound is 0.5 of the current precision at the initial belief. | 
| timeInterval | Use timeInterval as the time interval between two consecutive write-out of policy files. If this is not specified, pomdpsol only writes out a policy file upon termination. | 
| stdout | a filename where pomdp run statistics will be stored | 
| stderr | currently ignored. | 
| spinner | should we show a spinner while sarsop is running? | 
| policy | file/path to the policy file | 
| max_depth | the maximum horizon of the generated policy graph | 
| max_branches | maximum number of branches to show in the policy graph | 
| min_prob | the minimum probability threshold for a branch to be shown in the policy graph | 
| steps | number of steps for each simulation run | 
| simulations | the number of simulation runs | 
Examples
if(assert_has_appl()){
  model <- system.file("models", "example.pomdp", package = "sarsop")
  policy <- tempfile(fileext = ".policyx")
  pomdpsol(model, output = policy, timeout = 1)
# Other tools
  evaluation <- pomdpeval(model, policy, stdout = FALSE)
  graph <- polgraph(model, policy, stdout = FALSE)
  simulations <- pomdpsim(model, policy, stdout = FALSE)
}
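## Follow-up sketch (assumes the guarded block above has been run): the policy
## file written by pomdpsol() can be read back into R with read_policyx(),
## documented below.
# policy_solution <- read_policyx(policy)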
read_policyx
Description
read a .policyx file created by SARSOP and return alpha vectors and associated actions.
Usage
read_policyx(file = "output.policyx")
Arguments
| file | name of the policyx file to be read. | 
Value
a list: the first element, "vectors", is an n_states x n_vectors array of alpha vectors; the second element, "action", is a numeric vector of length n_vectors whose i'th element indicates the action corresponding to the i'th alpha vector (column) in the vectors array.
Examples
f <- system.file("extdata", "out.policy", package="sarsop", mustWork = TRUE)
policy <- read_policyx(f)
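## The two documented elements can be inspected directly:
dim(policy$vectors)   # n_states x n_vectors array of alpha vectors
policy$action         # action associated with each alpha vector (column)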
sarsop
Description
sarsop wraps the tasks of writing the pomdpx file defining the problem, running the pomdpsol (SARSOP) algorithm in C++, and then reading the resulting policy file back into R. The returned alpha vectors and alpha_action information are then transformed into a more generic, user-friendly representation: a matrix whose columns correspond to actions and rows to states. This function can thus be used at the heart of most POMDP applications.
Usage
sarsop(
  transition,
  observation,
  reward,
  discount,
  state_prior = rep(1, dim(observation)[[1]])/dim(observation)[[1]],
  verbose = TRUE,
  log_dir = tempdir(),
  log_data = NULL,
  cache = TRUE,
  ...
)
Arguments
| transition | Transition matrix, dimension n_s x n_s x n_a | 
| observation | Observation matrix, dimension n_s x n_z x n_a | 
| reward | reward matrix, dimension n_s x n_a | 
| discount | the discount factor | 
| state_prior | initial belief state, optional, defaults to uniform over states | 
| verbose | logical, should the function include a message with pomdp diagnostics (timings, final precision, end condition) | 
| log_dir | pomdpx and policyx files will be saved here, along with a metadata file | 
| log_data | a data.frame of additional columns to include in the log, such as model parameters. A unique id value for each run can be provided as one of the columns, otherwise, a globally unique id will be generated. | 
| cache | should results from the log directory be cached? Default TRUE. Identical functional calls will quickly return previously cached alpha vectors from file rather than re-running. | 
| ... | additional arguments to pomdpsol() | 
Value
a matrix of alpha vectors. The column index indicates the action associated with each alpha vector (1:n_actions); rows indicate the system state, x. Actions for which no alpha vector was found are included as all -Inf, since such actions are never optimal regardless of belief and thus have no corresponding alpha vectors in the alpha_action list.
Examples
 ## Takes > 5s
## Use example code to generate matrices for pomdp problem:
source(system.file("examples/fisheries-ex.R", package = "sarsop"))
alpha <- sarsop(transition, observation, reward, discount, precision = 10)
compute_policy(alpha, transition, observation, reward)
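## Sketch of a logged run (the column names in log_data are illustrative); with
## cache = TRUE, an identical later call returns the cached alpha vectors.
# log <- tempfile()
# alpha <- sarsop(transition, observation, reward, discount, precision = 10,
#                 log_dir = log, log_data = data.frame(model = "ricker", r = 1, K = 15))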
simulate a POMDP
Description
Simulate a POMDP given the appropriate matrices.
Usage
sim_pomdp(
  transition,
  observation,
  reward,
  discount,
  state_prior = rep(1, dim(observation)[[1]])/dim(observation)[[1]],
  x0,
  a0 = 1,
  Tmax = 20,
  policy = NULL,
  alpha = NULL,
  reps = 1,
  ...
)
Arguments
| transition | Transition matrix, dimension n_s x n_s x n_a | 
| observation | Observation matrix, dimension n_s x n_z x n_a | 
| reward | reward matrix, dimension n_s x n_a | 
| discount | the discount factor | 
| state_prior | initial belief state, optional, defaults to uniform over states | 
| x0 | initial state | 
| a0 | initial action (default is action 1, e.g. can be arbitrary if the observation process is independent of the action taken) | 
| Tmax | duration of simulation | 
| policy | Simulate using a pre-computed policy (e.g. MDP policy) instead of POMDP | 
| alpha | the matrix of alpha vectors returned by sarsop() | 
| reps | number of replicate simulations to compute | 
| ... | additional arguments to mclapply | 
Details
The simulation assumes the following order of updating: for a system in state[t] at time t, an observation obs[t] is made, and then action[t] is chosen based on that observation and the given policy, returning the (discounted) reward[t].
Value
a data frame with columns for time, state, obs, action, and (discounted) value.
Examples
m <- fisheries_matrices()
discount <- 0.95
 ## Takes > 5s
if(assert_has_appl()){
alpha <- sarsop(m$transition, m$observation, m$reward, discount, precision = 10)
sim <- sim_pomdp(m$transition, m$observation, m$reward, discount,
                 x0 = 5, Tmax = 20, alpha = alpha)
}
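## Sketch (assumes the guarded block above has been run): replicate simulations
## can be requested with `reps`; additional arguments such as mc.cores are
## passed on to mclapply via `...`.
# sims <- sim_pomdp(m$transition, m$observation, m$reward, discount,
#                   x0 = 5, Tmax = 20, alpha = alpha, reps = 10)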
write pomdpx files
Description
A POMDPX file specifies a POMDP problem in terms of the transition, observation, and reward matrices, the discount factor, and the initial belief.
Usage
write_pomdpx(
  P,
  O,
  R,
  gamma,
  b = rep(1/dim(O)[1], dim(O)[1]),
  file = "input.pomdpx",
  digits = 12,
  digits2 = 12,
  format = "f"
)
Arguments
| P | transition matrix | 
| O | observation matrix | 
| R | reward matrix | 
| gamma | discount factor | 
| b | initial belief | 
| file | pomdpx file to create | 
| digits | precision to which values are rounded before normalizing (the sarsop parser appears unable to handle higher precision) | 
| digits2 | precision used when writing values to the file, since normalizing requires additional precision | 
| format | floating point format, because the sarsop parser does not appear to accept scientific notation | 
Examples
m <- fisheries_matrices()
f <- tempfile()
write_pomdpx(m$transition, m$observation, m$reward, 0.95,
             file = f)
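## Follow-up sketch (mirrors what sarsop() does internally): write to a file
## with a .pomdpx extension, solve it with pomdpsol(), and read the policy back.
if(assert_has_appl()){
  infile <- tempfile(fileext = ".pomdpx")
  write_pomdpx(m$transition, m$observation, m$reward, 0.95, file = infile)
  policy_file <- tempfile(fileext = ".policyx")
  pomdpsol(infile, output = policy_file, precision = 10)
  policy <- read_policyx(policy_file)
}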