The hypothesize package provides a consistent API for hypothesis testing in R. It is designed around three principles from Structure and Interpretation of Computer Programs (SICP):

- Data abstraction: a small accessor interface hides each test's internal representation.
- The closure property: combining hypothesis tests yields another hypothesis test.
- Higher-order functions: functions that take tests and return transformed tests.
This vignette introduces the package through examples, building from simple primitives to more sophisticated compositions.
Every hypothesis test in this package implements the same interface:
- pval(x) — extract the p-value
- test_stat(x) — extract the test statistic
- dof(x) — extract degrees of freedom
- is_significant_at(x, alpha) — check significance at level α

This abstraction means you can work with any test the same way, regardless of its internal implementation.
The z-test is the most basic hypothesis test. It tests whether a population mean equals a hypothesized value when the population standard deviation is known:
# A manufacturer claims widgets weigh 100g on average
# We test 30 widgets (population SD is known to be 5g from historical data)
set.seed(42)
weights <- rnorm(30, mean = 102, sd = 5)
# Test H0: mu = 100 vs H1: mu != 100
z <- z_test(weights, mu0 = 100, sigma = 5)
z
#> Hypothesis test ( z_test )
#> -----------------------------
#> Test statistic: 2.566556
#> P-value: 0.01027141
#> Degrees of freedom: Inf
#> Significant at 5% level: TRUE

We can extract components:
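# Each accessor returns the corresponding field from the summary above
pval(z)
#> [1] 0.01027141
test_stat(z)
#> [1] 2.566556
dof(z)
#> [1] Inf
# Sanity check: the statistic equals the textbook z formula
# (assuming z_test() uses the standard construction)
(mean(weights) - 100) / (5 / sqrt(30))
#> [1] 2.566556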
The Wald test generalizes the z-test to any asymptotically normal estimator. If you have an estimate and its standard error, you can test whether the true parameter equals a hypothesized value:
# Suppose we fit a regression and got coefficient = 2.3 with SE = 0.8
# Test H0: beta = 0 (no effect) vs H1: beta != 0
w <- wald_test(estimate = 2.3, se = 0.8, null_value = 0)
w
#> Hypothesis test ( wald_test )
#> -----------------------------
#> Test statistic: 8.265625
#> P-value: 0.004040275
#> Degrees of freedom: 1
#> Significant at 5% level: TRUE
is_significant_at(w, 0.05)
#> [1] TRUE

The Wald statistic is z², which follows a chi-squared distribution with 1 df:
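# Manual check: z = (estimate - null_value) / se, and the Wald statistic is z^2
z_stat <- (2.3 - 0) / 0.8
z_stat^2
#> [1] 8.265625
# The p-value is the chi-squared(1) upper tail, matching the output above
pchisq(z_stat^2, df = 1, lower.tail = FALSE)
#> [1] 0.004040275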
The likelihood ratio test (LRT) compares nested models: under the null hypothesis, twice the difference in log-likelihoods follows a chi-squared distribution with degrees of freedom equal to the number of extra parameters:
# Suppose we fit two models:
# Null model (simpler): log-likelihood = -250
# Alt model (complex): log-likelihood = -240
# The alt model has 3 extra parameters
test <- lrt(null_loglik = -250, alt_loglik = -240, dof = 3)
test
#> Hypothesis test ( likelihood_ratio_test )
#> -----------------------------
#> Test statistic: 20
#> P-value: 0.0001697424
#> Degrees of freedom: 3
#> Significant at 5% level: TRUE
# Is the more complex model justified?
is_significant_at(test, 0.05)
#> [1] TRUE
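The statistic and p-value are easy to verify by hand (assuming the standard likelihood-ratio formulation):

2 * (-240 - (-250))  # twice the log-likelihood gain
#> [1] 20
pchisq(20, df = 3, lower.tail = FALSE)
#> [1] 0.0001697424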
Hypothesis tests and confidence intervals are two sides of the same coin. A 95% confidence interval contains exactly those parameter values that would not be rejected at the 5% level.

For tests that store their estimates, we can extract confidence intervals:
w <- wald_test(estimate = 5.2, se = 1.1)
confint(w) # 95% CI
#> lower upper
#> 3.04404 7.35596
confint(w, level = 0.99) # 99% CI
#> lower upper
#> 2.366588 8.033412

Notice the duality: the CI contains 0 if and only if the test is not significant:
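Both directions are easy to check (w2 below is a new illustrative test, not one defined earlier):

is_significant_at(w, 0.05)  # the 95% CI above excludes 0, so the test rejects
#> [1] TRUE
# A weaker estimate: the CI covers 0 and the test does not reject
w2 <- wald_test(estimate = 1.2, se = 0.8)
confint(w2)
#> lower upper
#> -0.3679712 2.7679712
is_significant_at(w2, 0.05)
#> [1] FALSE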
A key principle from SICP is the closure property: when you combine things, you get back the same kind of thing. In our case, combining hypothesis tests yields a hypothesis test.
Fisher’s method combines p-values from independent tests. Under the joint null, minus twice the sum of the log p-values follows a chi-squared distribution with 2k degrees of freedom, where k is the number of tests:
# Three independent studies test the same hypothesis
# None is individually significant, but together...
p1 <- 0.08
p2 <- 0.12
p3 <- 0.06
combined <- fisher_combine(p1, p2, p3)
combined
#> Hypothesis test ( fisher_combined_test )
#> -----------------------------
#> Test statistic: 14.91881
#> P-value: 0.02089771
#> Degrees of freedom: 6
#> Significant at 5% level: TRUE
is_significant_at(combined, 0.05)
#> [1] TRUE
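The combined statistic is straightforward to verify by hand (standard Fisher formulation, using base R):

-2 * (log(0.08) + log(0.12) + log(0.06))
#> [1] 14.91881
pchisq(14.91881, df = 6, lower.tail = FALSE)
#> [1] 0.02089771

You can also combine actual test objects: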
t1 <- wald_test(estimate = 1.2, se = 0.8)
t2 <- wald_test(estimate = 0.9, se = 0.6)
t3 <- wald_test(estimate = 1.5, se = 1.0)
fisher_combine(t1, t2, t3)
#> Hypothesis test ( fisher_combined_test )
#> -----------------------------
#> Test statistic: 12.07678
#> P-value: 0.06027762
#> Degrees of freedom: 6
#> Significant at 5% level: FALSE

Because the result is itself a hypothesis_test, you can use all the same methods on it:
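# For example (combined2 is a new name; it re-creates the combination above)
combined2 <- fisher_combine(t1, t2, t3)
pval(combined2)
#> [1] 0.06027762
test_stat(combined2)
#> [1] 12.07678
# Not significant at the 5% level, but it is at the 10% level
is_significant_at(combined2, 0.10)
#> [1] TRUE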
When performing multiple tests, we need to adjust p-values to control
error rates. The adjust_pval() function transforms tests by
adjusting their p-values:
# Perform 5 tests
tests <- list(
  wald_test(estimate = 2.5, se = 1.0),
  wald_test(estimate = 1.8, se = 0.9),
  wald_test(estimate = 1.2, se = 0.7),
  wald_test(estimate = 0.9, se = 0.8),
  wald_test(estimate = 2.1, se = 1.1)
)
# Original p-values
sapply(tests, pval)
#> [1] 0.01241933 0.04550026 0.08647627 0.26058903 0.05625037
# Bonferroni-adjusted p-values
adjusted <- adjust_pval(tests, method = "bonferroni")
sapply(adjusted, pval)
#> [1] 0.06209665 0.22750132 0.43238133 1.00000000 0.28125183
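Bonferroni multiplies each p-value by the number of tests and caps the result at 1; assuming that is the adjustment applied here, we can reproduce it directly in base R:

pmin(1, sapply(tests, pval) * length(tests))
#> [1] 0.06209665 0.22750132 0.43238133 1.00000000 0.28125183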
# Less conservative: Benjamini-Hochberg (FDR control)
adjusted_bh <- adjust_pval(tests, method = "BH")
sapply(adjusted_bh, pval)
#> [1] 0.06209665 0.09375061 0.10809533 0.26058903 0.09375061
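The BH values likewise agree with base R's p.adjust(), shown here as an external check:

p.adjust(sapply(tests, pval), method = "BH")
#> [1] 0.06209665 0.09375061 0.10809533 0.26058903 0.09375061

The adjusted tests retain all original properties: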
adj <- adjusted[[1]]
test_stat(adj) # Same as original
#> [1] 6.25
adj$original_pval # Can access unadjusted p-value
#> [1] 0.01241933
adj$adjustment_method # Method used
#> [1] "bonferroni"Because adjusted tests are still hypothesis tests, they compose naturally:
# First adjust, then combine
adjusted <- adjust_pval(tests[1:3], method = "bonferroni")
fisher_combine(adjusted[[1]], adjusted[[2]], adjusted[[3]])
#> Hypothesis test ( fisher_combined_test )
#> -----------------------------
#> Test statistic: 13.26117
#> P-value: 0.03907096
#> Degrees of freedom: 6
#> Significant at 5% level: TRUE

The hypothesis_test() constructor makes it easy to add new test types:
# Example: Create a simple chi-squared goodness-of-fit test wrapper
chisq_gof <- function(observed, expected) {
  stat <- sum((observed - expected)^2 / expected)
  df <- length(observed) - 1
  p.value <- pchisq(stat, df = df, lower.tail = FALSE)
  hypothesis_test(
    stat = stat,
    p.value = p.value,
    dof = df,
    superclasses = "chisq_gof_test",
    observed = observed,
    expected = expected
  )
}
# Use it
result <- chisq_gof(
  observed = c(45, 35, 20),
  expected = c(40, 40, 20)
)
result
#> Hypothesis test ( chisq_gof_test )
#> -----------------------------
#> Test statistic: 1.25
#> P-value: 0.5352614
#> Degrees of freedom: 2
#> Significant at 5% level: FALSE
# It works with all the standard functions
pval(result)
#> [1] 0.5352614
is_significant_at(result, 0.05)
#> [1] FALSE

The hypothesize package demonstrates that a small set of
well-designed primitives and combinators can provide a powerful,
extensible framework for hypothesis testing:
| Function | Purpose | SICP Principle |
|---|---|---|
| hypothesis_test() | Base constructor | Data abstraction |
| z_test(), wald_test(), lrt() | Primitive tests | Building blocks |
| pval(), test_stat(), dof() | Accessors | Abstraction barrier |
| fisher_combine() | Combine tests | Closure property |
| adjust_pval() | Transform tests | Higher-order functions |
| confint() | Extract CI | Duality |
This design makes hypothesis testing composable, extensible, and easy to reason about.