---
title: "Single-arm trials"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Single-arm trials}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
set.seed(3081)
```

```{r setup, message=FALSE}
library(goldilocks)
```

The other vignettes describe two-arm randomised designs. Single-arm trials -- in which every subject receives the experimental therapy and the comparator is an external benchmark -- are common in early-phase oncology, rare-disease, and proof-of-concept studies. This vignette shows how to set up a Goldilocks single-arm design with `survival_adapt()`.

Two practical points about single-arm designs in this package:

- A single-arm trial is signalled by setting `hazard_control = NULL`.
- Only `method = "bayes"` is supported for single-arm trials. The frequentist tests (`logrank`, `cox`, `chisq`) require two arms and will raise an error if used in this mode.

## The decision rule

In a single-arm trial there is no concurrent control, so the "treatment effect" is replaced by the cumulative event probability on the treatment arm itself:

$$\text{effect} \;=\; p_{\text{treatment}} \;=\; \Pr(\text{event by } \texttt{end\_of\_study} \mid \text{data}).$$

The argument `h0` plays the role of a benchmark on this scale: a reference failure probability drawn from external evidence such as a published rate, registry, or historical cohort (equivalently, $1 - h_0$ is the benchmark survival probability). With `alternative = "less"` and a posterior threshold `prob_ha`, the trial declares success when

$$\Pr(p_{\text{treatment}} < h_0 \mid \text{data}) \;>\; \texttt{prob\_ha},$$

i.e. when the posterior assigns enough mass to "the experimental therapy has a lower failure rate than the benchmark". Choosing `alternative = "greater"` reverses the direction; `alternative = "two.sided"` is not allowed for `method = "bayes"`.
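
For intuition, the criterion can be reproduced by hand in the constant-hazard case. The sketch below assumes an exponential likelihood with a conjugate Gamma prior on the hazard, so that the posterior is again a Gamma distribution and the probability is available in closed form. The event count and exposure are made-up illustrative values, and this is not the package's internal code, which may differ in detail.

```{r posterior-sketch}
# Hypothetical interim data (illustrative values, not package output)
n_events    <- 12      # events observed so far
exposure    <- 1500    # total follow-up, in person-months
prior_shape <- 0.1     # Gamma(0.1, 0.1) prior on the hazard
prior_rate  <- 0.1
h0_sketch   <- 0.30    # benchmark failure probability
t_end       <- 24      # end of study, in months

# Conjugate update for a constant hazard
post_shape <- prior_shape + n_events
post_rate  <- prior_rate + exposure

# p = 1 - exp(-hazard * t_end) is increasing in the hazard, so
# p < h0  <=>  hazard < -log(1 - h0) / t_end
hazard_cut     <- -log(1 - h0_sketch) / t_end
post_prob_less <- pgamma(hazard_cut, shape = post_shape, rate = post_rate)

post_prob_less           # Pr(p_treatment < h0 | data)
post_prob_less > 0.95    # success check for alternative = "less", prob_ha = 0.95
```

Working on the hazard scale is a convenience: because the map from hazard to cumulative failure probability is monotone, the event "failure probability below benchmark" can be restated as "hazard below a cut-off".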

The same posterior is used at each interim look to compute the predictive probability of eventual success, which drives the futility (`Fn`) and expected-success (`Sn`) stopping rules. Predictive probabilities are obtained by imputing remaining follow-up from the posterior predictive distribution of the (piecewise-)exponential model and re-evaluating the success criterion on each completed dataset.
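
The imputation idea can be sketched in base R for a constant hazard. Everything here is an assumption made for illustration: the interim snapshot is fabricated, the helper names `post_prob()` and `impute_once()` are hypothetical, and the sketch ignores loss to follow-up, piecewise hazards, and subjects not yet enrolled, all of which the package's own machinery would need to handle.

```{r ppos-sketch}
set.seed(1)

# Fabricated interim snapshot: 50 subjects with partial follow-up (months)
t_end    <- 24
n_now    <- 50
h_fake   <- -log(1 - 0.20) / t_end        # used only to generate example data
followup <- runif(n_now, 2, t_end)        # time on study at the interim look
event_t  <- rexp(n_now, h_fake)
obs_time <- pmin(event_t, followup)
event    <- as.integer(event_t <= followup)

prior      <- c(0.1, 0.1)
h0         <- 0.30
prob_ha    <- 0.95
hazard_cut <- -log(1 - h0) / t_end

# Posterior probability that p_treatment < h0 for a given data set
post_prob <- function(time, event) {
  pgamma(hazard_cut,
         shape = prior[1] + sum(event),
         rate  = prior[2] + sum(time))
}

# One imputation: draw a hazard from the current posterior, complete the
# follow-up of subjects still at risk, and re-check the success criterion
impute_once <- function() {
  h    <- rgamma(1, shape = prior[1] + sum(event), rate = prior[2] + sum(obs_time))
  open <- event == 0 & obs_time < t_end
  extra <- rexp(sum(open), h)             # memoryless: residual time ~ Exp(h)
  new_t <- obs_time
  new_e <- event
  new_t[open] <- pmin(obs_time[open] + extra, t_end)
  new_e[open] <- as.integer(obs_time[open] + extra < t_end)
  post_prob(new_t, new_e) > prob_ha
}

# Share of completed data sets that meet the success criterion
ppos <- mean(replicate(50, impute_once()))
ppos
```

A proportion of this kind is what the `Sn` and `Fn` thresholds are compared against.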

## Setting up the design

Suppose the existing standard of care has a 30% event probability by 24 months, and we are testing a new agent that we hope will reduce this to 20%. We plan a single interim look once 50 of the maximum 80 subjects have been enrolled:

```{r design}
end_of_study <- 24
benchmark <- 0.30                       # external standard-of-care failure rate
target    <- 0.20                       # rate we hope the new therapy achieves

# Convert the target failure rate into a constant hazard (so we can simulate)
ht <- prop_to_haz(probs = target, endtime = end_of_study)
ht
```
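
For a constant hazard the conversion follows from the exponential survival function: $p = 1 - \exp(-h\,T)$, so $h = -\log(1 - p)/T$. The base-R check below sketches that relationship; it is not the body of `prop_to_haz()`, and the variable names are illustrative.

```{r hazard-check}
p_target <- 0.20
t_end    <- 24

# Constant monthly hazard implied by a 20% failure probability at 24 months
h_const <- -log(1 - p_target) / t_end
h_const

# Round trip: cumulative failure probability at t_end under h_const
pexp(t_end, rate = h_const)

# Median event time under this hazard, in months
log(2) / h_const
```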

Now we run `survival_adapt()`:

```{r run, cache=TRUE}
out <- survival_adapt(
  hazard_treatment = ht,
  hazard_control   = NULL,              # single-arm
  cutpoints        = 0,
  N_total          = 80,
  lambda           = 5,                 # enrolments per month (constant)
  lambda_time      = 0,
  interim_look     = 50,
  end_of_study     = end_of_study,
  prior            = c(0.1, 0.1),       # Gamma(0.1, 0.1) on the hazard
  block            = 2,                 # default; inert in single-arm mode
  rand_ratio       = c(1, 1),           # default; inert in single-arm mode
  prop_loss        = 0.05,
  alternative      = "less",
  h0               = benchmark,         # benchmark failure probability
  Fn               = 0.05,
  Sn               = 0.95,
  prob_ha          = 0.95,
  N_impute         = 50,
  N_mcmc           = 2000,
  method           = "bayes")

out
```

A few points to highlight in the output:

- `N_control = 0`: no concurrent control was simulated.
- `margin = 0.30`: this is the value of `h0` that the trial is testing against. Note that it is on the cumulative-failure scale, not the survival scale.
- `est_final` is the posterior mean of $p_{\text{treatment}}$ at `end_of_study`, *not* a treatment effect relative to control.
- `post_prob_ha` is the posterior probability that $p_{\text{treatment}} < h_0$.

## Why `block` and `rand_ratio` still appear

`survival_adapt()` shares its trial-data simulator with the two-arm case. In single-arm mode the simulator skips `randomization()` entirely and assigns every subject to the treatment arm; `block` and `rand_ratio` are therefore inert and can be left at their defaults. The minimum-`interim_look` rule (`interim_look >= max(block)`) only applies to two-arm designs, so a single-arm trial can use any `interim_look` strictly less than `N_total`.

## Operating characteristics

A single trial does not tell you whether the design is well-calibrated. To estimate power and type I error, we run the design under each scenario using `sim_trials()`. The chunks below are not run when knitting (each takes a few minutes) but illustrate the workflow:

```{r oc, eval=FALSE}
# Power: simulate under the alternative (true rate = 0.20)
out_power <- sim_trials(
  N_trials         = 1000,
  hazard_treatment = ht,
  hazard_control   = NULL,
  cutpoints        = 0,
  N_total          = 80,
  lambda           = 5,
  lambda_time      = 0,
  interim_look     = 50,
  end_of_study     = end_of_study,
  prior            = c(0.1, 0.1),
  block            = 2,
  rand_ratio       = c(1, 1),
  prop_loss        = 0.05,
  alternative      = "less",
  h0               = benchmark,
  Fn               = 0.05,
  Sn               = 0.95,
  prob_ha          = 0.95,
  N_impute         = 50,
  N_mcmc           = 2000,
  method           = "bayes")

# Type I error: simulate under the null (true rate = benchmark = 0.30)
ht_null <- prop_to_haz(probs = benchmark, endtime = end_of_study)
out_t1error <- sim_trials(
  N_trials         = 1000,
  hazard_treatment = ht_null,
  hazard_control   = NULL,
  cutpoints        = 0,
  N_total          = 80,
  lambda           = 5,
  lambda_time      = 0,
  interim_look     = 50,
  end_of_study     = end_of_study,
  prior            = c(0.1, 0.1),
  block            = 2,
  rand_ratio       = c(1, 1),
  prop_loss        = 0.05,
  alternative      = "less",
  h0               = benchmark,
  Fn               = 0.05,
  Sn               = 0.95,
  prob_ha          = 0.95,
  N_impute         = 50,
  N_mcmc           = 2000,
  method           = "bayes")

summarise_sims(list(out_power$sims, out_t1error$sims))
```

Calibration proceeds the same way as for two-arm designs: if the type I error under the null (where the true rate equals the benchmark) is above the desired level, raise `prob_ha`; if power is too low, increase `N_total` or relax the `Fn`/`Sn` thresholds.
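
Before committing to the full simulation, a stripped-down version can give a rough sense of where the operating characteristics sit. The sketch below is an assumption-laden approximation and not a substitute for `sim_trials()`: it has no interim looks, no loss to follow-up, no staggered enrolment (every subject is followed for the full 24 months unless an event occurs), and it uses a closed-form Gamma posterior for a constant hazard. The helper `sim_fixed()` is hypothetical.

```{r oc-sketch}
set.seed(2)

# Fixed-sample single-arm trial, repeated n_trials times; returns the
# proportion of trials meeting Pr(p_treatment < h0 | data) > prob_ha
sim_fixed <- function(true_rate, n_trials = 2000, n = 80, t_end = 24,
                      h0 = 0.30, prob_ha = 0.95, prior = c(0.1, 0.1)) {
  h_true     <- -log(1 - true_rate) / t_end
  hazard_cut <- -log(1 - h0) / t_end
  success <- replicate(n_trials, {
    t_event <- rexp(n, h_true)
    time    <- pmin(t_event, t_end)       # administrative censoring at t_end
    event   <- t_event <= t_end
    pgamma(hazard_cut,
           shape = prior[1] + sum(event),
           rate  = prior[2] + sum(time)) > prob_ha
  })
  mean(success)
}

oc_sketch <- c(power = sim_fixed(0.20), type1 = sim_fixed(0.30))
oc_sketch
```

Because the adaptive rules can stop a trial early in either direction, the figures from the full design will generally differ from this fixed-sample approximation; treat it as a sanity check on the order of magnitude only.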

## A practical caveat on benchmarks

The validity of a single-arm Goldilocks trial rests entirely on the benchmark `h0` being a fair representation of the population the trial is enrolling. Drift in standard of care, differences in patient mix, and unmeasured confounding all bias the comparison in a way that randomisation would otherwise neutralise. A Bayesian framework can incorporate uncertainty about the benchmark itself -- e.g. by replacing a fixed `h0` with a prior distribution informed by historical data -- but this is outside the scope of the simple `h0` scalar that `survival_adapt()` exposes, and would require a custom analysis. When in doubt, simulating the design under several plausible values of the true rate (including ones near the benchmark) is a useful way to characterise its sensitivity.
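
A complementary check holds the data fixed and varies the benchmark, which shows how far the conclusion depends on the exact value chosen for `h0`. The sketch below again assumes a constant hazard with a conjugate Gamma posterior; the final event count and exposure are made-up illustrative values rather than output from `survival_adapt()`.

```{r benchmark-sensitivity}
# Hypothetical final data (illustrative values, not package output)
n_events <- 15
exposure <- 1700       # person-months
t_end    <- 24
prior    <- c(0.1, 0.1)

# Candidate benchmarks around the nominal 0.30
h0_grid    <- c(0.22, 0.25, 0.28, 0.30, 0.32, 0.35)
hazard_cut <- -log(1 - h0_grid) / t_end

# Pr(p_treatment < h0 | data) for each candidate benchmark
post_prob_grid <- pgamma(hazard_cut,
                         shape = prior[1] + n_events,
                         rate  = prior[2] + exposure)

data.frame(h0        = h0_grid,
           post_prob = round(post_prob_grid, 3),
           success   = post_prob_grid > 0.95)
```

If the success flag flips within the range of benchmarks that the external evidence could plausibly support, the trial's conclusion is fragile to benchmark choice.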

## See also

- The "Example: Two-armed RCT" vignette covers the corresponding two-arm randomised design with a log-rank decision rule.
- The "Bayesian decisions with piecewise-exponential hazards" vignette covers the same decision rule used here, but in a two-arm setting and with non-constant hazards. The piecewise machinery applies directly to single-arm trials too (just keep `hazard_control = NULL` and pass a per-interval `hazard_treatment` vector).
- `?survival_adapt` documents all arguments.
