
blockstrap

Sample complete groups (“blocks”) from a grouped data frame. This package implements a simple block-bootstrap-style sampler: instead of sampling individual rows, you sample entire groups, preserving the intra-group structure.

Installation

You can install the release version from CRAN:

install.packages("blockstrap")

or the development version from GitHub:

# install.packages("remotes")
remotes::install_github("numbats/blockstrap")

Motivation

When observations belonging to the same experimental unit are spread across multiple rows (e.g. multiple measurements per subject, dose combinations, time series segments), ordinary row-wise sampling breaks these units apart. A block sampler keeps units intact by sampling at the group level.
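To make the problem concrete, here is a small base R sketch (not part of blockstrap) that draws 20 individual rows from the built-in ToothGrowth data and tabulates which supplement-dose units they came from. The seed and the sample size of 20 are arbitrary choices for illustration.

```r
# Row-wise sampling with base R only: draw 20 of the 60 rows at random.
set.seed(1)
row_sample <- ToothGrowth[sample(nrow(ToothGrowth), 20), ]

# Each supp-dose unit has 10 rows in the full data. Counts strictly
# between 0 and 10 mean a unit was only partially sampled.
unit_counts <- table(row_sample$supp, row_sample$dose)
unit_counts
```

Row-wise sampling will almost always leave several units partially represented, which is the situation a block sampler avoids.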

Core function

slice_block() works on a grouped data frame. If you call it on an ungrouped data frame, it throws a helpful error.

Key arguments:

n: number of groups (blocks) to sample.
replace: whether to sample with replacement. Required when n exceeds the number of groups.
weight_by: optional unquoted expression, evaluated per group, that weights the sampling probabilities.
...: additional arguments passed to base sample().
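The idea behind these arguments can be sketched in base R. The helper below, sample_blocks(), is hypothetical and is not blockstrap's implementation; it only illustrates sampling at the group level by splitting the data into blocks, drawing block positions with sample(), and stacking the chosen blocks.

```r
# Hypothetical helper for illustration only; not part of blockstrap.
sample_blocks <- function(data, group_cols, n, replace = FALSE) {
  # One block id per row, built from the grouping columns.
  block_id <- interaction(data[group_cols], drop = TRUE)
  blocks <- split(data, block_id)

  # Draw block positions rather than row positions.
  picked <- sample(length(blocks), size = n, replace = replace)

  # Stack the chosen blocks; every row of a chosen block is kept.
  out <- do.call(rbind, unname(blocks[picked]))
  rownames(out) <- NULL
  out
}

set.seed(1)
two_blocks <- sample_blocks(ToothGrowth, c("supp", "dose"), n = 2)
nrow(two_blocks)                             # 20: two blocks of 10 rows each
nrow(unique(two_blocks[c("supp", "dose")]))  # 2 distinct blocks
```

Because the draw happens over block positions, the result always consists of whole groups, whichever groups happen to be selected.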

Basic example

We use the built-in ToothGrowth dataset and treat each supplement-dose combination as a block.

library(dplyr)
library(blockstrap)

set.seed(1)
ToothGrowth |>
  group_by(supp, dose) |>
  slice_block(n = 2)

## # A tibble: 20 × 3
## # Groups:   supp, dose [2]
##      len supp   dose
##    <dbl> <fct> <dbl>
##  1  15.2 OJ      0.5
##  2  21.5 OJ      0.5
##  3  17.6 OJ      0.5
##  4   9.7 OJ      0.5
##  5  14.5 OJ      0.5
##  6  10   OJ      0.5
##  7   8.2 OJ      0.5
##  8   9.4 OJ      0.5
##  9  16.5 OJ      0.5
## 10   9.7 OJ      0.5
## 11   4.2 VC      0.5
## 12  11.5 VC      0.5
## 13   7.3 VC      0.5
## 14   5.8 VC      0.5
## 15   6.4 VC      0.5
## 16  10   VC      0.5
## 17  11.2 VC      0.5
## 18  11.2 VC      0.5
## 19   5.2 VC      0.5
## 20   7   VC      0.5

Sampling with replacement

To sample more groups than exist, or to allow repeats, set replace = TRUE:

ToothGrowth |>
  group_by(supp, dose) |>
  slice_block(n = 10, replace = TRUE) |>
  count(supp, dose)

## # A tibble: 5 × 3
## # Groups:   supp, dose [5]
##   supp   dose     n
##   <fct> <dbl> <int>
## 1 OJ      0.5    20
## 2 OJ      1      20
## 3 OJ      2      30
## 4 VC      1      20
## 5 VC      2      10

A block that is drawn more than once appears that many times in the result, so its row count is a multiple of the block size (10 rows per block here).
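The need for replace = TRUE follows from how base sample() behaves. The sketch below uses base R only and assumes, as the ... argument suggests, that slice_block() ultimately relies on sample() to pick blocks; the internals may differ.

```r
# Number of supp-dose blocks in ToothGrowth.
n_blocks <- nrow(unique(ToothGrowth[c("supp", "dose")]))

# Asking for more draws than blocks without replacement is an error in sample().
too_many <- tryCatch(
  sample(n_blocks, 10),
  error = function(e) conditionMessage(e)
)
too_many

# With replacement the draw succeeds, and some blocks repeat.
set.seed(1)
draws <- sample(n_blocks, 10, replace = TRUE)

# Each draw contributes a full block of 10 rows.
rows_per_block <- table(draws) * 10
rows_per_block
sum(rows_per_block)  # 100 rows in total
```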

Weighted sampling

Weight blocks by a statistic, e.g. mean tooth length, to favor groups with a larger mean response:

set.seed(42)
weighted <- ToothGrowth |>
  group_by(supp, dose) |>
  slice_block(n = 3, weight_by = mean(len))

You can check the effect of the weights by repeating the draw and tallying how often each block is selected:

set.seed(99)
rep_draws <- replicate(500, {
  ToothGrowth |>
    group_by(supp, dose) |>
    slice_block(n = 3, weight_by = mean(len)) |>
    distinct(supp, dose)
}, simplify = FALSE)

freqs <- bind_rows(rep_draws) |> count(supp, dose, name = "times") |> arrange(desc(times))
freqs

## # A tibble: 6 × 3
## # Groups:   supp, dose [6]
##   supp   dose times
##   <fct> <dbl> <int>
## 1 OJ      2     334
## 2 VC      2     326
## 3 OJ      1     292
## 4 VC      1     224
## 5 OJ      0.5   205
## 6 VC      0.5   119
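To see why the tallies above come out in this order, it can help to look at the per-block weights themselves. The base R sketch below computes the mean of len for each block and normalizes the means to sum to one. Treating these as relative sampling weights is an assumption about how weight_by is used; because three blocks are drawn without replacement, the observed selection frequencies are not expected to be exactly proportional to them.

```r
# Mean tooth length per supp-dose block (the statistic used for weight_by).
block_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)

# Assumed relative weights: each block's mean divided by the sum of all means.
block_means$rel_weight <- block_means$len / sum(block_means$len)

# Sort so that the most heavily weighted blocks come first.
block_means <- block_means[order(-block_means$rel_weight), ]
block_means
```

Blocks with higher mean len get larger weights, which matches the direction of the bias in the tallies.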
