The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
This package implements the anytime-valid sequential p-value estimation procedure as described in Stoepker and Castro (2024). References to equations and sections in this vignette pertain to this paper.
The package is centered around the function avseqmc()
.
The function can be used both to start the sequential procedure, as well
as to sharpen an earlier estimate.
This vignette is structured in two parts. We start by showcasing basic usage in Part I, and then detail some advanced usage of the package in Part II.
To start the sequential estimation procedure, the first two arguments
of avseqmc()
are most important:
sample_G
which, when called, returns a
binary sample (or a batch of samples — see Part
II) from the Monte-Carlo distribution \(G^*(\mathbf{X})\) as described in Equation
(5) in the main paper (Stoepker and Castro (2024)).epsilon
corresponding to the desired value of
the risk of overestimated significance as described in Definition 5 in
the main paper.A self-contained minimal example usage (with default settings) is given below.
library(avseqmc)
G1 <- function(){runif(1) < 0.01} # A mock MC function to demonstrate functionality
R1 <- avseqmc(sample_G = G1, epsilon = 0.001)
print(R1)
#> p-value estimate: 0.0499 (rounded to 4 decimals)
#> Resampling risk: 0.0010 (rounded to 4 decimals)
#> Total number of samples: 357
We have not specified a stopping time, prompting
avseqmc()
to use its default stopping criterion: “when
inference can be drawn at 5% significance” as described in Section 4.1
and in Equation (11) in the main paper. As a result, the sequential
estimation has terminated automatically after 357 samples and provided
an (intermediate) p-value estimate of \(\tilde
p_\tau(\mathbf{x}) =\) 0.0499. The output is collected in an
object avseqmc_progress
that has a custom plotting function
visualizing the sequential trajectory:
To avoid potentially infinite runtimes, avseqmc()
also
stops after a pre-specified number of samples which defaults at
1000:
G2 <- function(){runif(1) < 0.05}
R2 <- avseqmc(sample_G = G2, epsilon = 0.001)
print(R2)
#> p-value estimate: 0.0975 (rounded to 4 decimals)
#> Resampling risk: 0.0010 (rounded to 4 decimals)
#> Total number of samples: 1000
The maximum number of samples can be adjusted via the
max_samples
argument. A minimum number of samples can also
be selected via min_samples
if preferred.
An important advantage of the proposed anytime-valid sequential
methodology is the option to resume earlier estimation while retaining
important validity guarantees. To facilitate this, the function
avseqmc()
returns an object of class
avseqmc_progress
, which in turn can be used as an
argument to function avseqmc()
to resume sampling
based on that earlier progress. In case this
avseqmc_progress
object is available within your own R
script resuming is straightforward:
G3 <- function(){runif(1) < 0.03}
R3a <- avseqmc(sample_G = G3, epsilon = 0.001)
print(R3a)
#> p-value estimate: 0.0569 (rounded to 4 decimals)
#> Resampling risk: 0.0010 (rounded to 4 decimals)
#> Total number of samples: 1000
R3b <- avseqmc(R3a)
print(R3b)
#> p-value estimate: 0.0536 (rounded to 4 decimals)
#> Resampling risk: 0.0010 (rounded to 4 decimals)
#> Total number of samples: 2000
The case where the avseqmc_progress
object is not
directly available (for example, because the earlier sequential estimate
is computed by a different analyst) is discussed in Part II.
The function avseqmc()
continues sampling until the
stopping criterion specified via stopcrit
is reached (or
the maximum max_samples
is reached). There are two built-in
stopping criteria (which are also described in Section 4.1 in the main
paper). To use these built-in criteria, stopcrit
should be
specified as a list with the following format:
stopcrit = list("type"=...,"param"=...)
, where the values
of the elipses depend on the criterion chosen:
stopcrit
argument should be specified as
list("type"="futility","param"=...)
where
"param"
corresponds to the significance level \(\alpha\):G4 <- function(){runif(1) < 0.04}
R4 <- avseqmc(sample_G = G4,
epsilon = 0.001,
stopcrit = list("type" = "futility",
param = 0.1)) # Stop when inference can be drawn at 10% significance
print(R4)
#> p-value estimate: 0.1000 (rounded to 4 decimals)
#> Resampling risk: 0.0010 (rounded to 4 decimals)
#> Total number of samples: 690
stopcrit
argument should be specified
as list("type"="convergence","param"=...)
where
"param"
corresponds to a vector of two numbers: the first
should corresponds to the value \(\gamma\), the second to an (integer) value
of \(n_0\) used in the convergence
criterion:G5 <- function(){runif(1) < 0.04}
R5 <- avseqmc(sample_G = G5,
epsilon = 0.001,
stopcrit = list("type" = "convergence",
param = c(1e-5, 100)))
print(R5)
#> p-value estimate: 0.0898 (rounded to 4 decimals)
#> Resampling risk: 0.0010 (rounded to 4 decimals)
#> Total number of samples: 674
The actual value of \(n_0\) used may be slightly different when drawing Monte-Carlo samples in batches: this is discussed in Part II.
Custom stopping times can also be used, which is exemplified in Part II.
It may be informative to also obtain lower bounds of the confidence
sequence on which the sequential estimation is based. You can request to
compute lower bounds based on the construction by Robbins (1970) by
setting compute_lower=TRUE
in the function
avseqmc()
. The lower bounds are computed by default when
the futility stopping criterion is used
(i.e. stopcrit = list("type"="futility","param"=...)
) since
the stopping criterion is dependent on this value (serving as \(\tilde L_n(\mathbf{X})\) in Equation (11)
in the main paper).
The lower bounds are available in the output object
avseqmc_progress
, specifically in the $Ltilde
element of the returned list.
As stated, the function avseqmc()
returns an object of
class avseqmc_progress
which is a list with the following
elements:
$epsilon
: risk of overestimated significance used in
the sequential estimation;$sample_G
: function that samples from the Monte-Carlo
distribution \(G^*(\mathbf{X})\);$ptilde
: sequence of sequential \(p\)-value estimates. The final value in
this sequence is the most recent estimate of the \(p\)-value;$Ltilde
: sequence of lower bounds of the confidence
sequence based on the construction by Robbins (1970). Contains NA values
if these were not computed by default through
stopcrit = list("type"="futility","param"=...)
or requested
using compute_lower=TRUE
;$n
: total number of samples drawn from the MC
sampler;$S
: total number of ones observed from the MC
sampler;$B
: sequence of number of ones observed at each
sampling timepoint (which can be greater than 1 if sample_G
samples in batches: see Part II);$Bn
: sequence of number of samples drawn from MC
sampler at each timepoint (which can be greater than 1 if
sample_G
samples in batches: see Part
II).As shown earlier, the object can be used as an argument to
avseqmc()
to resume sampling, and the object has a custom
printing and plotting function.
Sequentially drawing single samples from the Monte-Carlo
distribution \(G^*(\mathbf{X})\) may
induce computational overhead that can be partly alleviated through
sampling in batches. The function avseqmc()
automatically allows for the specified sample_G
object to
return a batch of samples in the form of a vector (of arbitrary length)
of zeroes and ones:
G6 <- function(){runif(15) < 0.04}
R6 <- avseqmc(sample_G = G6, epsilon = 0.001)
print(R6)
#> p-value estimate: 0.0644 (rounded to 4 decimals)
#> Resampling risk: 0.0010 (rounded to 4 decimals)
#> Total number of samples: 1005
The function sequentially samples a new batch until the total amount
of samples drawn within the run exceeds the max_samples
argument. As is the case in the above example, if
max_samples
is not a multiple of the batch size, this may
cause the final amount of samples to exceed the max_samples
argument (by at most the batch size minus one).
For the built-in stopping criterion “convergence”, the actual value of \(n_0\) used in the criterion may slightly differ from the user-specified value when sampling in batches. Specifically, when sampling in batches of size \(M\), if \(n_0\) is not a multiple of the batch size, then the actual value of \(n_0\) used in the stopping criterion is \(n_0\) is \(\lceil M/n_0 \rceil n_0\).
In case you would like to resume an earlier estimation for which the
sequential estimate \(\tilde
p_\tau(\mathbf{x})\) and the risk of overestimated significance
\(\varepsilon\) is known, but the
avseqmc_progress
object is not available, you can build the
object yourself via the function
init_avseqmc_progress()
:
G7 <- function(){runif(1) < 0.04}
R7a <- init_avseqmc_progress(sample_G = G7,
epsilon = 0.001,
ptilde = 0.0819,
n = 1000)
#> Warning in init_avseqmc_progress(sample_G = G7, epsilon = 0.001, ptilde =
#> 0.0819, : Initializing avseqmc_progress with a previous value of ptilde,
#> without the previously observed number of ones S from the MC function sample_G.
#> The value of S is computed from ptilde, but may be inaccurate if ptilde is a
#> rounded value.
The number of previously observed zeroes and ones from \(G^*(\mathbf{x})\) can be computed based on
the values of \(\varepsilon\), \(n\), and \(\tilde
p_n(\mathbf{x})\) under the assumption that \(\tilde p_n(\mathbf{x})\) is constructed
using the confidence sequence of Definition 6 (as is done in the
function avseqmc()
). However, note that if \(\tilde p_n(\mathbf{x})\) is input as
rounded value, this may induce errors in the inversion, and a warning
message is output to notify the user of this. If available, it is safer
to input the total number of ones observed from sample_G
directly:
G7 <- function(){runif(1) < 0.04}
R7a <- init_avseqmc_progress(sample_G = G7,
epsilon = 0.001,
ptilde = 0.0813,
n = 1000,
S = 44)
#> Warning in init_avseqmc_progress(sample_G = G7, epsilon = 0.001, ptilde =
#> 0.0813, : Initializing avseqmc_progress with a previous value of n, S and
#> ptilde. Note that ptilde is not recomputed and is assumed correct.
A warning message indicates that the value of ptilde is not verified
based on the value of n
and S
.
You can also use a fully custom stopping time by supplying
stopcrit
with a function. This function should take a
single argument with class avseqmc_progress
and output
either FALSE
to continue sampling, or TRUE
to
stop sampling. For example, to stop sampling when, in the last 50
batches, at most a single 1 is observed, we may specify a custom
stopping rule as:
G8 <- function(){runif(1) < 0.04}
customstop <- function(R){
if(length(R$ptilde) > 50){
if(sum(R$B[(length(R$B)-49):length(R$B)]) <= 1 ){
return(TRUE)
}
}
return(FALSE)
}
R8 <- avseqmc(sample_G = G8, epsilon = 0.001, stopcrit = customstop)
#> A custom stopping criterion is used with compute_lower=FALSE. Note that the lower bound of the confidence sequence R$Ltilde is only available for the evaluation of the stopping criterion if requested via compute_lower=TRUE.
print(R8)
#> p-value estimate: 0.2352 (rounded to 4 decimals)
#> Resampling risk: 0.0010 (rounded to 4 decimals)
#> Total number of samples: 51
When you run this code, a message notifies you that if your custom
stopping criterion is based on the lower bound of the confidence
sequence R$Ltilde, then this should be requested via setting the
argument compute_lower=TRUE
. In the example above, the
lower bound is not required to evaluate the stopping criterion, so we
leave it as the default compute_lower=FALSE
.
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.