
Single-arm trials

library(goldilocks)

The other vignettes describe two-arm randomised designs. Single-arm trials – in which every subject receives the experimental therapy and the comparator is an external benchmark – are common in early-phase oncology, rare-disease, and proof-of-concept studies. This vignette shows how to set up a Goldilocks single-arm design with survival_adapt().

Two practical constraints on single-arm designs in this package:

The decision rule

In a single-arm trial there is no concurrent control, so the “treatment effect” is replaced by the cumulative event probability on the treatment arm itself:

\(\text{effect} \;=\; p_{\text{treatment}} \;=\; \Pr(\text{event by end\_of\_study} \mid \text{data}).\)

The argument h0 acts as the benchmark on this scale: a target failure probability drawn from external evidence such as a published rate, a registry, or a historical cohort (equivalently, \(1 - h_0\) is the benchmark survival probability). With alternative = "less" and a posterior threshold prob_ha, the trial declares success when

\[\Pr(p_{\text{treatment}} < h_0 \mid \text{data}) \;>\; \texttt{prob\_ha},\]

i.e. when the posterior assigns enough mass to “the experimental therapy has a lower failure rate than the benchmark”. Choosing alternative = "greater" reverses the direction; alternative = "two.sided" is not allowed for method = "bayes".
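
As a rough illustration of this success rule, the sketch below evaluates the posterior probability in closed form for a single constant hazard with a Gamma prior. It is a base-R sketch under stated assumptions, not the package's internal code: the event count n_events and the total follow-up exposure are made-up numbers, and survival_adapt() works from posterior draws (N_mcmc) rather than this closed form.

```r
# Sketch only: conjugate Gamma-exponential update for one constant hazard.
end_of_study <- 24
h0           <- 0.30          # benchmark failure probability
prob_ha      <- 0.95          # posterior threshold for success
prior        <- c(0.1, 0.1)   # Gamma(shape, rate) prior on the hazard

n_events <- 14                # hypothetical number of observed events
exposure <- 1600              # hypothetical total follow-up (person-months)

# Posterior for the hazard: Gamma(shape + events, rate + exposure)
post_shape <- prior[1] + n_events
post_rate  <- prior[2] + exposure

# p_treatment = 1 - exp(-hazard * end_of_study) is increasing in the hazard,
# so {p_treatment < h0} is the same event as {hazard < -log(1 - h0) / end_of_study}
hazard_h0 <- -log(1 - h0) / end_of_study

post_prob_ha <- pgamma(hazard_h0, shape = post_shape, rate = post_rate)
success      <- post_prob_ha > prob_ha
```

With alternative = "greater" the same calculation would use the upper tail, 1 - post_prob_ha.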

The same posterior is used at each interim look to compute the predictive probability of eventual success, which drives the futility (Fn) and expected-success (Sn) stopping rules. Predictive probabilities are obtained by imputing remaining follow-up from the posterior predictive distribution of the (piecewise-)exponential model and re-evaluating the success criterion on each completed dataset.
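
The imputation step can be sketched in base R for the constant-hazard case. This is a simplified illustration, not the package's implementation: the interim data (followup, event) are invented, the helper post_prob() is hypothetical, and only subjects already enrolled are completed (subjects not yet enrolled are ignored).

```r
set.seed(1)
end_of_study <- 24
h0           <- 0.30
prob_ha      <- 0.95
prior        <- c(0.1, 0.1)

# Hypothetical interim data for 50 enrolled subjects
followup <- seq(2, 20, length.out = 50)   # months observed so far
event    <- rep(c(1, rep(0, 9)), 5)       # 1 = event already observed (5 events)

# Posterior probability that the event probability is below h0
# (conjugate Gamma-exponential model, single constant hazard)
post_prob <- function(events, exposure) {
  pgamma(-log(1 - h0) / end_of_study,
         shape = prior[1] + events, rate = prior[2] + exposure)
}

N_impute <- 200
wins     <- logical(N_impute)
at_risk  <- event == 0
for (i in seq_len(N_impute)) {
  # Draw a hazard from the current posterior
  lambda <- rgamma(1, shape = prior[1] + sum(event),
                   rate = prior[2] + sum(followup))
  # Residual time to event for subjects still at risk (memoryless property)
  resid     <- rexp(sum(at_risk), rate = lambda)
  remaining <- end_of_study - followup[at_risk]
  # Complete the dataset: events after end_of_study are censored there
  final_time           <- followup
  final_time[at_risk]  <- followup[at_risk] + pmin(resid, remaining)
  final_event          <- event
  final_event[at_risk] <- as.integer(resid <= remaining)
  # Re-evaluate the success criterion on the completed dataset
  wins[i] <- post_prob(sum(final_event), sum(final_time)) > prob_ha
}

# Predictive probability of success: share of completed datasets that succeed
ppp_success <- mean(wins)
```

The resulting ppp_success is what would be compared against Fn and Sn at the interim look.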

Setting up the design

Suppose the existing standard of care has a 30% event probability by 24 months, and we are testing a new agent that we hope will reduce this to 20%. We plan a single interim look once 50 of the 80 planned subjects have been enrolled:

end_of_study <- 24
benchmark <- 0.30                       # external standard-of-care failure rate
target    <- 0.20                       # rate we hope the new therapy achieves

# Convert the target failure rate into a constant hazard (so we can simulate)
ht <- prop_to_haz(probs = target, endtime = end_of_study)
ht
#> [1] 0.009297648
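
The printed value can be checked by hand. The sketch below assumes that, for a single probability and end time, prop_to_haz() applies the usual constant-hazard relation p = 1 - exp(-h * t); that assumption reproduces the output above. The names ht_manual and p_back are illustrative.

```r
# Assumes a constant hazard over [0, end_of_study]
target       <- 0.20
end_of_study <- 24

# Invert p = 1 - exp(-h * t) for the hazard h
ht_manual <- -log(1 - target) / end_of_study

# Converting back recovers the 24-month event probability
p_back <- 1 - exp(-ht_manual * end_of_study)
```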

Now we run survival_adapt():

out <- survival_adapt(
  hazard_treatment = ht,
  hazard_control   = NULL,              # single-arm
  cutpoints        = 0,
  N_total          = 80,
  lambda           = 5,                 # enrolments per month (constant)
  lambda_time      = 0,
  interim_look     = 50,
  end_of_study     = end_of_study,
  prior            = c(0.1, 0.1),       # Gamma(0.1, 0.1) on the hazard
  block            = 2,                 # default; inert in single-arm mode
  rand_ratio       = c(1, 1),           # default; inert in single-arm mode
  prop_loss        = 0.05,
  alternative      = "less",
  h0               = benchmark,         # benchmark failure probability
  Fn               = 0.05,
  Sn               = 0.95,
  prob_ha          = 0.95,
  N_impute         = 50,
  N_mcmc           = 2000,
  method           = "bayes")

out
#>   prob_threshold margin alternative N_treatment N_control N_enrolled N_max
#> 1           0.95    0.3        less          80         0         80    80
#>   post_prob_ha est_final ppp_success stop_futility stop_expected_success
#> 1       0.9985 0.1689997        0.86             0                     0

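A few points to highlight in the output: N_control is 0 and N_treatment equals N_enrolled (80), so every subject is on the treatment arm; margin echoes the benchmark h0 = 0.3; est_final (about 0.17) is on the event-probability scale and lies below the benchmark; post_prob_ha = 0.9985 exceeds prob_ha = 0.95, so this simulated trial meets the success criterion; and stop_futility and stop_expected_success are both 0, so the interim look did not stop the trial early.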

Why block and rand_ratio still appear

survival_adapt() shares its trial-data simulator with the two-arm case. In single-arm mode the simulator skips randomization() entirely and assigns every subject to the treatment arm; block and rand_ratio are therefore inert and can be left at their defaults. The minimum-interim_look rule (interim_look >= max(block)) only applies to two-arm designs, so a single-arm trial can use any interim_look strictly less than N_total.

Operating characteristics

A single trial does not tell you whether the design is well-calibrated. To estimate power and type I error, we run the design under each scenario using sim_trials(). The chunks below are not run when knitting (each takes a few minutes) but illustrate the workflow:

# Power: simulate under the alternative (true rate = 0.20)
out_power <- sim_trials(
  N_trials         = 1000,
  hazard_treatment = ht,
  hazard_control   = NULL,
  cutpoints        = 0,
  N_total          = 80,
  lambda           = 5,
  lambda_time      = 0,
  interim_look     = 50,
  end_of_study     = end_of_study,
  prior            = c(0.1, 0.1),
  block            = 2,
  rand_ratio       = c(1, 1),
  prop_loss        = 0.05,
  alternative      = "less",
  h0               = benchmark,
  Fn               = 0.05,
  Sn               = 0.95,
  prob_ha          = 0.95,
  N_impute         = 50,
  N_mcmc           = 2000,
  method           = "bayes")

# Type I error: simulate under the null (true rate = benchmark = 0.30)
ht_null <- prop_to_haz(probs = benchmark, endtime = end_of_study)
out_t1error <- sim_trials(
  N_trials         = 1000,
  hazard_treatment = ht_null,
  hazard_control   = NULL,
  cutpoints        = 0,
  N_total          = 80,
  lambda           = 5,
  lambda_time      = 0,
  interim_look     = 50,
  end_of_study     = end_of_study,
  prior            = c(0.1, 0.1),
  block            = 2,
  rand_ratio       = c(1, 1),
  prop_loss        = 0.05,
  alternative      = "less",
  h0               = benchmark,
  Fn               = 0.05,
  Sn               = 0.95,
  prob_ha          = 0.95,
  N_impute         = 50,
  N_mcmc           = 2000,
  method           = "bayes")

summarise_sims(list(out_power$sims, out_t1error$sims))

Calibration proceeds the same way as for two-arm designs: if the type I error under the null (where the true rate equals the benchmark) is above the desired level, raise prob_ha; if power is too low, increase N_total or relax the Fn/Sn thresholds.
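
When judging whether an estimated type I error or power is "above" or "too low", it helps to keep the Monte Carlo error of the simulation in mind. The sketch below uses the binomial standard error of a proportion estimated from N_trials simulated trials; the rates 0.05 and 0.80 are placeholder values rather than results from this design, and mc_se() is a hypothetical helper.

```r
N_trials <- 1000

# Binomial standard error of a proportion estimated from n simulated trials
mc_se <- function(p_hat, n) sqrt(p_hat * (1 - p_hat) / n)

se_t1  <- mc_se(0.05, N_trials)   # if the estimated type I error is near 0.05
se_pow <- mc_se(0.80, N_trials)   # if the estimated power is near 0.80
```

Differences smaller than roughly two such standard errors are hard to distinguish from simulation noise; increasing N_trials shrinks them at rate 1 / sqrt(N_trials).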

A practical caveat on benchmarks

The validity of a single-arm Goldilocks trial rests entirely on the benchmark h0 being a fair representation of the population the trial is enrolling. Drift in standard of care, differences in patient mix, and unmeasured confounding all bias the comparison in a way that randomisation would otherwise neutralise. A Bayesian framework can incorporate uncertainty about the benchmark itself – e.g. by replacing a fixed h0 with a prior distribution informed by historical data – but this is outside the scope of the simple h0 scalar that survival_adapt() exposes, and would require a custom analysis. When in doubt, simulating the design under several plausible values of the true rate (including ones near the benchmark) is a useful way to characterise its sensitivity.
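
One way to organise such a sensitivity check is sketched below. The grid true_rates and the wrapper run_scenario() are illustrative choices; the hazards are computed with the constant-hazard relation, which matches the prop_to_haz() output shown earlier for a rate of 0.20. The sim_trials() arguments are copied from the calls above, and the wrapper is defined but not run here because each scenario takes a few minutes.

```r
end_of_study <- 24
benchmark    <- 0.30

# Plausible true 24-month event probabilities, including values near the benchmark
true_rates <- c(0.15, 0.20, 0.25, 0.28, 0.30)

# Constant hazards implied by each rate: h = -log(1 - p) / t
hazards <- -log(1 - true_rates) / end_of_study

# Hypothetical wrapper: one sim_trials() run per scenario, same design throughout
run_scenario <- function(hazard, N_trials = 1000) {
  goldilocks::sim_trials(
    N_trials         = N_trials,
    hazard_treatment = hazard,
    hazard_control   = NULL,
    cutpoints        = 0,
    N_total          = 80,
    lambda           = 5,
    lambda_time      = 0,
    interim_look     = 50,
    end_of_study     = end_of_study,
    prior            = c(0.1, 0.1),
    block            = 2,
    rand_ratio       = c(1, 1),
    prop_loss        = 0.05,
    alternative      = "less",
    h0               = benchmark,
    Fn               = 0.05,
    Sn               = 0.95,
    prob_ha          = 0.95,
    N_impute         = 50,
    N_mcmc           = 2000,
    method           = "bayes")
}

# Not run (slow):
# results <- lapply(hazards, run_scenario)
```

Plotting the proportion of successful trials against true_rates then shows how quickly the success probability falls as the true rate approaches the benchmark.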

