Introduction

The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Introduction

Outline

This package implements the anytime-valid sequential p-value estimation procedure as described in Stoepker and Castro (2024). References to equations and sections in this vignette pertain to this paper.

The package is centered around the function avseqmc(). The function can be used both to start the sequential procedure, as well as to sharpen an earlier estimate.

This vignette is structured in two parts. We start by showcasing basic usage in Part I, and then detail some advanced usage of the package in Part II.

Part I: Basic Usage

Starting Estimation

To start the sequential estimation procedure, the first two arguments of avseqmc() are most important:

a function sample_G which, when called, returns a binary sample (or a batch of samples — see Part II) from the Monte-Carlo distribution $G^*(\mathbf{X})$ as described in Equation (5) in the main paper (Stoepker and Castro (2024)).
a number epsilon corresponding to the desired value of the risk of overestimated significance as described in Definition 5 in the main paper.

A self-contained minimal example usage (with default settings) is given below.

library(avseqmc)
G1 <- function(){runif(1) < 0.01} # A mock MC function to demonstrate functionality
R1 <- avseqmc(sample_G = G1, epsilon = 0.001)
print(R1)
#> p-value estimate:            0.0499 (rounded to 4 decimals) 
#> Resampling risk:             0.0010 (rounded to 4 decimals) 
#> Total number of samples:     357

We have not specified a stopping time, prompting avseqmc() to use its default stopping criterion: “when inference can be drawn at 5% significance” as described in Section 4.1 and in Equation (11) in the main paper. As a result, the sequential estimation has terminated automatically after 357 samples and provided an (intermediate) p-value estimate of $\tilde p_\tau(\mathbf{x}) =$ 0.0499. The output is collected in an object avseqmc_progress that has a custom plotting function visualizing the sequential trajectory:

plot(R1)

To avoid potentially infinite runtimes, avseqmc() also stops after a pre-specified number of samples which defaults at 1000:

G2 <- function(){runif(1) < 0.05}
R2 <- avseqmc(sample_G = G2, epsilon = 0.001)
print(R2)
#> p-value estimate:            0.0975 (rounded to 4 decimals) 
#> Resampling risk:             0.0010 (rounded to 4 decimals) 
#> Total number of samples:     1000

The maximum number of samples can be adjusted via the max_samples argument. A minimum number of samples can also be selected via min_samples if preferred.

Resuming Earlier Estimation

An important advantage of the proposed anytime-valid sequential methodology is the option to resume earlier estimation while retaining important validity guarantees. To facilitate this, the function avseqmc() returns an object of class avseqmc_progress, which in turn can be used as an argument to function avseqmc() to resume sampling based on that earlier progress. In case this avseqmc_progress object is available within your own R script resuming is straightforward:

G3 <- function(){runif(1) < 0.03}
R3a <- avseqmc(sample_G = G3, epsilon = 0.001)
print(R3a)
#> p-value estimate:            0.0569 (rounded to 4 decimals) 
#> Resampling risk:             0.0010 (rounded to 4 decimals) 
#> Total number of samples:     1000

R3b <- avseqmc(R3a)
print(R3b)
#> p-value estimate:            0.0536 (rounded to 4 decimals) 
#> Resampling risk:             0.0010 (rounded to 4 decimals) 
#> Total number of samples:     2000

The case where the avseqmc_progress object is not directly available (for example, because the earlier sequential estimate is computed by a different analyst) is discussed in Part II.

Using Built-in Stopping Times

The function avseqmc() continues sampling until the stopping criterion specified via stopcrit is reached (or the maximum max_samples is reached). There are two built-in stopping criteria (which are also described in Section 4.1 in the main paper). To use these built-in criteria, stopcrit should be specified as a list with the following format: stopcrit = list("type"=...,"param"=...), where the values of the elipses depend on the criterion chosen:

For the stopping time “when inference can be drawn at level $\alpha$” the stopcrit argument should be specified as list("type"="futility","param"=...) where "param" corresponds to the significance level $\alpha$:

G4 <- function(){runif(1) < 0.04}
R4 <- avseqmc(sample_G = G4, 
              epsilon = 0.001, 
              stopcrit = list("type" = "futility", 
                              param = 0.1)) # Stop when inference can be drawn at 10% significance
print(R4)
#> p-value estimate:            0.1000 (rounded to 4 decimals) 
#> Resampling risk:             0.0010 (rounded to 4 decimals) 
#> Total number of samples:     690

For the stopping time “when estimates are no longer decreasing at a sufficient rate” the stopcrit argument should be specified as list("type"="convergence","param"=...) where "param" corresponds to a vector of two numbers: the first should corresponds to the value $\gamma$, the second to an (integer) value of $n_0$ used in the convergence criterion:

G5 <- function(){runif(1) < 0.04}
R5 <- avseqmc(sample_G = G5, 
              epsilon = 0.001, 
              stopcrit = list("type" = "convergence", 
                              param = c(1e-5, 100)))
print(R5)
#> p-value estimate:            0.0898 (rounded to 4 decimals) 
#> Resampling risk:             0.0010 (rounded to 4 decimals) 
#> Total number of samples:     674

The actual value of $n_0$ used may be slightly different when drawing Monte-Carlo samples in batches: this is discussed in Part II.

Custom stopping times can also be used, which is exemplified in Part II.

Computing Lower Confidence Bounds

It may be informative to also obtain lower bounds of the confidence sequence on which the sequential estimation is based. You can request to compute lower bounds based on the construction by Robbins (1970) by setting compute_lower=TRUE in the function avseqmc(). The lower bounds are computed by default when the futility stopping criterion is used (i.e. stopcrit = list("type"="futility","param"=...)) since the stopping criterion is dependent on this value (serving as $\tilde L_n(\mathbf{X})$ in Equation (11) in the main paper).

The lower bounds are available in the output object avseqmc_progress, specifically in the $Ltilde element of the returned list.

Inspecting Output

As stated, the function avseqmc() returns an object of class avseqmc_progress which is a list with the following elements:

$epsilon: risk of overestimated significance used in the sequential estimation;
$sample_G: function that samples from the Monte-Carlo distribution $G^*(\mathbf{X})$;
$ptilde: sequence of sequential $p$-value estimates. The final value in this sequence is the most recent estimate of the $p$-value;
$Ltilde: sequence of lower bounds of the confidence sequence based on the construction by Robbins (1970). Contains NA values if these were not computed by default through stopcrit = list("type"="futility","param"=...) or requested using compute_lower=TRUE;
$n: total number of samples drawn from the MC sampler;
$S: total number of ones observed from the MC sampler;
$B: sequence of number of ones observed at each sampling timepoint (which can be greater than 1 if sample_G samples in batches: see Part II);
$Bn: sequence of number of samples drawn from MC sampler at each timepoint (which can be greater than 1 if sample_G samples in batches: see Part II).

As shown earlier, the object can be used as an argument to avseqmc() to resume sampling, and the object has a custom printing and plotting function.

Part II: Advanced Usage

Sampling Batches

Sequentially drawing single samples from the Monte-Carlo distribution $G^*(\mathbf{X})$ may induce computational overhead that can be partly alleviated through sampling in batches. The function avseqmc() automatically allows for the specified sample_G object to return a batch of samples in the form of a vector (of arbitrary length) of zeroes and ones:

G6 <- function(){runif(15) < 0.04}
R6 <- avseqmc(sample_G = G6, epsilon = 0.001)
print(R6)
#> p-value estimate:            0.0644 (rounded to 4 decimals) 
#> Resampling risk:             0.0010 (rounded to 4 decimals) 
#> Total number of samples:     1005

The function sequentially samples a new batch until the total amount of samples drawn within the run exceeds the max_samples argument. As is the case in the above example, if max_samples is not a multiple of the batch size, this may cause the final amount of samples to exceed the max_samples argument (by at most the batch size minus one).

For the built-in stopping criterion “convergence”, the actual value of $n_0$ used in the criterion may slightly differ from the user-specified value when sampling in batches. Specifically, when sampling in batches of size $M$, if $n_0$ is not a multiple of the batch size, then the actual value of $n_0$ used in the stopping criterion is $n_0$ is $\lceil M/n_0 \rceil n_0$.

Resuming Based on Earlier Printed Results

In case you would like to resume an earlier estimation for which the sequential estimate $\tilde p_\tau(\mathbf{x})$ and the risk of overestimated significance $\varepsilon$ is known, but the avseqmc_progress object is not available, you can build the object yourself via the function init_avseqmc_progress():

G7 <- function(){runif(1) < 0.04}
R7a <- init_avseqmc_progress(sample_G = G7, 
                             epsilon = 0.001, 
                             ptilde = 0.0819, 
                             n = 1000)
#> Warning in init_avseqmc_progress(sample_G = G7, epsilon = 0.001, ptilde =
#> 0.0819, : Initializing avseqmc_progress with a previous value of ptilde,
#> without the previously observed number of ones S from the MC function sample_G.
#> The value of S is computed from ptilde, but may be inaccurate if ptilde is a
#> rounded value.

R7b <- avseqmc(R7a)

The number of previously observed zeroes and ones from $G^*(\mathbf{x})$ can be computed based on the values of $\varepsilon$, $n$, and $\tilde p_n(\mathbf{x})$ under the assumption that $\tilde p_n(\mathbf{x})$ is constructed using the confidence sequence of Definition 6 (as is done in the function avseqmc()). However, note that if $\tilde p_n(\mathbf{x})$ is input as rounded value, this may induce errors in the inversion, and a warning message is output to notify the user of this. If available, it is safer to input the total number of ones observed from sample_G directly:

G7 <- function(){runif(1) < 0.04}
R7a <- init_avseqmc_progress(sample_G = G7, 
                             epsilon = 0.001, 
                             ptilde = 0.0813, 
                             n = 1000,
                             S = 44)
#> Warning in init_avseqmc_progress(sample_G = G7, epsilon = 0.001, ptilde =
#> 0.0813, : Initializing avseqmc_progress with a previous value of n, S and
#> ptilde. Note that ptilde is not recomputed and is assumed correct.

A warning message indicates that the value of ptilde is not verified based on the value of n and S.

Custom Stopping Times

You can also use a fully custom stopping time by supplying stopcrit with a function. This function should take a single argument with class avseqmc_progress and output either FALSE to continue sampling, or TRUE to stop sampling. For example, to stop sampling when, in the last 50 batches, at most a single 1 is observed, we may specify a custom stopping rule as:

G8 <- function(){runif(1) < 0.04}
customstop <- function(R){
  if(length(R$ptilde) > 50){
    if(sum(R$B[(length(R$B)-49):length(R$B)]) <= 1 ){
      return(TRUE)
    }
  }
  return(FALSE)
}
R8 <- avseqmc(sample_G = G8, epsilon = 0.001, stopcrit = customstop)
#> A custom stopping criterion is used with compute_lower=FALSE. Note that the lower bound of the confidence sequence R$Ltilde is only available for the evaluation of the stopping criterion if requested via compute_lower=TRUE.

print(R8)
#> p-value estimate:            0.2352 (rounded to 4 decimals) 
#> Resampling risk:             0.0010 (rounded to 4 decimals) 
#> Total number of samples:     51

When you run this code, a message notifies you that if your custom stopping criterion is based on the lower bound of the confidence sequence R$Ltilde, then this should be requested via setting the argument compute_lower=TRUE. In the example above, the lower bound is not required to evaluate the stopping criterion, so we leave it as the default compute_lower=FALSE.

Stoepker, I. V., and R. M. Castro. 2024. Inference with Sequential Monte-Carlo Computation of p-Values: Fast and Valid Approaches. https://doi.org/10.48550/arXiv.2409.18908.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.