| Type: | Package |
| Title: | Block Designs for Observational Studies |
| Version: | 1.0.0 |
| Description: | Creates block designs of fixed size J with at least one treated and control unit per block. Blocks larger than pairs better distinguish effects caused by a treatment from unmeasured confounding in assignment of individuals to treatment. Somewhat counterintuitively, blocks larger than pairs can use more units while attaining better covariate balance and block homogeneity. A forthcoming manuscript by Brumberg and Rosenbaum details the design. |
| License: | GPL-2 |
| Encoding: | UTF-8 |
| Imports: | iTOS, lpSolve, stats |
| Suggests: | DOS2, sensitivity2x2xk, sensitivitymv, weightedRank, xtable, testthat (≥ 3.0.0) |
| Config/testthat/edition: | 3 |
| Depends: | R (≥ 3.5.0) |
| NeedsCompilation: | no |
| RoxygenNote: | 7.3.2 |
| LazyData: | true |
| Packaged: | 2026-04-04 19:15:23 UTC; katherine |
| Author: | Katherine Brumberg |
| Maintainer: | Katherine Brumberg <kbrum@umich.edu> |
| Repository: | CRAN |
| Date/Publication: | 2026-04-10 09:50:02 UTC |
Evidence of Fecal-Oral Transmission of Helicobacter Pylori
Description
Motivated by the study by Bui et al. (2016), these data from NHANES 1999-2000 concern evidence about the possible fecal-oral transmission of Helicobacter pylori.
Usage
data(Hpylori)
Format
A data frame with observations (age >= 3, complete cases on key variables) on the following 11 variables.
- SEQN
NHANES id number
- female
1 if female, 0 if male
- age
Age in years
- education
Education level. Ordered factor with levels "<9" < "9-11" < "HS/GED" < "SomeCol" < "College" < "Age<20"
- income
Family income relative to poverty. Ordered factor with levels "<2", ">=2", "Missing"
- black
1 if black, 0 otherwise
- hispanic
1 if hispanic, 0 otherwise
- born
Country of birth. Ordered factor with levels "US" < "Mexico" < "Other"
- peopleroom1
1 if people per room > 1, 0 otherwise
- hepaA
Hepatitis A antibody, 1 if positive, 0 if negative
- helioBP
Helicobacter pylori.
Details
Does oral consumption of fecal matter – perhaps because someone prepared food without washing their hands – cause infection with Helicobacter pylori, a bacterium that infects the stomach and may cause peptic ulcers or gastric cancer? This question is difficult to study because there is no record of incidents in which small amounts of fecal matter were ingested. Hepatitis A virus is known to be transmitted mostly by the fecal-oral route. Following prior studies, Bui et al. (2016) used antibodies for hepatitis A as an indicator of a higher level of ingestion of fecal matter, and examined its relationship with Helicobacter pylori, adjusting for possible confounders such as age, country of birth, or a crowded home.
Source
NHANES, US National Health and Nutrition Examination Survey, 1999-2000. https://wwwn.cdc.gov/nchs/nhanes/
References
Bui, D., Brown, H. E., Harris, R. B. and Oren, E. (2016) Serologic evidence for fecal–oral transmission of Helicobacter pylori. The American Journal of Tropical Medicine and Hygiene, 94(1), 82–88. doi:10.4269/ajtmh.15-0297 https://pmc.ncbi.nlm.nih.gov/articles/PMC4710451/
Examples
data(Hpylori)
boxplot(Hpylori$helioBP ~ Hpylori$hepaA)
Add additional units to a seed match
Description
Seed units from the treatment groups are inferred from sdm: the first group is
z == 1 when seed_tc is TRUE (treated-to-control seed)
and z == 0 when FALSE (control-to-treated seed); the second group is the
remaining rows in sdm (same convention as blockMatch).
Usage
addMatch(id1, id2, sdm, dat, cost, J, seed_tc, solver = "rlemon")
Arguments
id1 |
All IDs in group 1 (treated if |
id2 |
All IDs in group 2 (control if |
sdm |
Seed match |
dat |
Augmented |
cost |
Cost matrix (rows = treated, columns = control). Row and column
names must be numeric unit ids (see |
J |
Block size (integer |
seed_tc |
|
solver |
Either |
Value
A data frame of the matched sample with columns mset (matched set ID),
type (factor: "seed", "add", or "single"), plus all columns from dat.
Rows are ordered by mset, type, and treatment status.
Assess covariate balance and homogeneity in matched sample
Description
Computes balance diagnostics for a specified covariate in the output of
blockMatch. Compares treated vs control means before and after
matching, standardized differences, and within-block homogeneity.
Usage
balEq(vname, o, detail = FALSE)
Arguments
vname |
Character string naming the variable to assess (must be a column
in both |
o |
A list containing |
detail |
Logical. If |
Value
If detail = FALSE, a 1-row matrix with columns:
- T-before, C-before
Mean for treated and control before matching
- T-after, C-after
Equally weighted averages of within-block treated or control means after matching
- dif.before, dif.after
Raw difference (T mean - C mean) before and after
- sdif.before, sdif.after
Standardized difference of means before and after; for comparability, both use the pooled standard deviation of vname in the full sample before matching, where the pooling equally weights the treated and control groups
- med, q9
Median and 90th percentile of within-block means of pairwise absolute differences
- pct0
Percent of blocks with within-block mean pairwise difference of 0
If detail = TRUE, a list with balance (that matrix), y,
z, and d.
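The quantities listed above can be illustrated on toy data with base R. This is only a sketch, not the code used by balEq: the vectors x, z and mset are made-up values, and the pooled standard deviation is assumed here to be the square root of the average of the treated and control variances, which is one common reading of "equally weights the treated and control groups".

```r
# Toy data (hypothetical): covariate x, treatment indicator z, block id mset
x    <- c(1, 3, 5, 7, 2, 2, 2, 2)
z    <- c(1, 0, 0, 0, 1, 1, 0, 0)
mset <- c(1, 1, 1, 1, 2, 2, 2, 2)

# Assumed pooling rule: equal weight on the treated and control variances
pooled_sd <- sqrt((var(x[z == 1]) + var(x[z == 0])) / 2)
sdif <- (mean(x[z == 1]) - mean(x[z == 0])) / pooled_sd

# Equally weighted averages of the within-block treated and control means
T_after <- mean(tapply(x[z == 1], mset[z == 1], mean))
C_after <- mean(tapply(x[z == 0], mset[z == 0], mean))

# Within-block mean of pairwise absolute differences, then summaries over blocks
pair_mean <- tapply(x, mset, function(v) mean(as.vector(dist(v))))
med  <- median(pair_mean)
q9   <- unname(quantile(pair_mean, 0.9))
pct0 <- 100 * mean(pair_mean == 0)
```

In this toy sample, block 2 is perfectly homogeneous in x, so half of the blocks contribute to pct0.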
Examples
data(Hpylori)
df <- Hpylori[sample(1:nrow(Hpylori), 1000), ]
pr <- glm(hepaA ~ age + female, data = df, family = binomial)$fitted
cochran <- cumsum(c(0, .07, .18, .25, .25, .18, .07))
df$prc <- as.integer(cut(pr, stats::quantile(pr, cochran), include.lowest = TRUE))
df$z <- df$hepaA
bd <- basicDistance(df, near = df$female)
out <- blockMatch(df, cost = bd$cost, J = 4, ratio = 4)
balEq("age", out)
Compute distance matrix for matching
Description
Compute distance matrix for matching
Usage
basicDistance(
dat,
xm = NULL,
near = NULL,
xinteger = NULL,
prc.penalty = 1000,
near.penalty = 100,
integer.penalty = 20,
compute_distance = TRUE
)
Arguments
dat |
A data frame with |
xm |
A numeric matrix or data frame with |
near |
A numeric vector of length |
xinteger |
A numeric vector of length |
prc.penalty |
A single finite positive number: penalty for propensity
score stratum ( |
near.penalty |
Nonnegative penalties for |
integer.penalty |
Nonnegative penalties for |
compute_distance |
If |
Details
This function borrows much of its functionality from the package 'iTOS'.
Documentation for 'iTOS' functions addNearExact, addinteger, addMahal
could prove helpful.
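As a rough illustration of how penalties shape a cost matrix, the base R sketch below adds a penalty whenever a treated-control pair disagrees on a near-exact variable or on the propensity score stratum. It is not the implementation in basicDistance or 'iTOS': the data frame toy is made up, and the base distance is assumed to be a plain absolute difference in age purely for simplicity. The penalty values mirror the defaults near.penalty = 100 and prc.penalty = 1000 shown in Usage.

```r
# Hypothetical toy data: id, treatment z, one covariate, a near-exact
# variable (female), and a propensity score stratum (prc)
toy <- data.frame(
  id     = 1:5,
  z      = c(1, 1, 0, 0, 0),
  age    = c(30, 50, 31, 49, 40),
  female = c(1, 0, 1, 0, 0),
  prc    = c(1, 2, 1, 2, 2)
)
tr <- toy[toy$z == 1, ]
co <- toy[toy$z == 0, ]

# Assumed base distance: absolute difference in age (rows = treated, cols = control)
cost <- abs(outer(tr$age, co$age, "-"))

# Add a penalty for each pair that disagrees on female or on prc
cost <- cost + 100 * outer(tr$female, co$female, "!=")
cost <- cost + 1000 * outer(tr$prc, co$prc, "!=")

# Label rows and columns with unit ids
dimnames(cost) <- list(as.character(tr$id), as.character(co$id))
cost
```

Pairs that agree on both female and prc keep their small base distance, so an optimal match will prefer them unless no such pair is available.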
Value
A list with components:
dat |
The input data frame with column |
cost |
The cost/distance matrix for matching (rows = treated, cols = control),
or |
Examples
data(Hpylori)
df <- Hpylori[sample(1:nrow(Hpylori), 1000), ]
pr <- glm(hepaA ~ age + female, data = df, family = binomial)$fitted
cochran <- cumsum(c(0, .07, .18, .25, .25, .18, .07))
df$prc <- as.integer(cut(pr, stats::quantile(pr, cochran), include.lowest = TRUE))
df$z <- df$hepaA
bd <- basicDistance(df, near = df$female)
Block matching within propensity score strata
Description
Creates blocks of fixed size J with at least one control and one treated. Within each stratum, the function chooses a matching strategy based on the treated-to-control ratio: direct matching when one group dominates, or a two-stage seed-and-add approach when groups are more balanced.
Usage
blockMatch(dat, cost, J = 4, ratio = 4, solver = "rlemon", rseed = 12345)
Arguments
dat |
A data frame with |
cost |
Distance matrix: one row per treated unit and one column
per control, with |
J |
Target number of individuals per matched block. Each block has at least one control and at least one treated. |
ratio |
Minimum matching ratio, greater than or equal to |
solver |
Either |
rseed |
Single finite number.
Fix |
Value
A list with components:
m |
A data frame of the matched sample, with columns |
all |
The full |
References
Cochran, W. G. (1968). The effectiveness of adjustment by subclassification in removing bias in observational studies. Biometrics, 24(2), 295–313.
Examples
data(Hpylori)
df <- Hpylori[sample(1:nrow(Hpylori), 1000), ]
pr <- glm(hepaA ~ age + female, data = df, family = binomial)$fitted
cochran <- cumsum(c(0, .07, .18, .25, .25, .18, .07))
df$prc <- as.integer(cut(pr, stats::quantile(pr, cochran), include.lowest = TRUE))
df$z <- df$hepaA
bd <- basicDistance(df, near = df$female)
out <- blockMatch(df, cost = bd$cost, J = 4, ratio = 4)
table(out$all$matched, out$all$hepaA)
Maximum number of blocks of size J from treated and control counts
Description
Solves an integer program when there are nt treated and nc
control units. The smaller group is exhausted (all of those units are
placed in blocks). Subject to that, the integer program
maximizes the number of units used from the larger group.
Usage
blockSizes(nt, nc, J)
Arguments
nt |
Number of treated units. |
nc |
Number of control units. |
J |
Block size (number of units per matched block). |
Details
This function reproduces some calculations in Section 4 of the forthcoming paper “Constructing Observational Block Designs When the Propensity Score Exhibits Limited Overlap” by Brumberg and Rosenbaum.
If either nt or nc is 0, or if
nt + nc < J, a warning is issued and the function returns a degenerate
result with zero blocks and zero counts.
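The counting problem described above can be sketched by simple enumeration rather than an integer program. The helper countBlocksSketch below is hypothetical and is not part of the package. It assumes every block has exactly J units with at least one unit from each group, and its return value is not meant to match the structure returned by blockSizes.

```r
# Hypothetical helper for illustration; not the package's blockSizes()
countBlocksSketch <- function(nt, nc, J) {
  stopifnot(nt >= 0, nc >= 0, J >= 2)
  small <- min(nt, nc)
  large <- max(nt, nc)
  result <- list(blocks = 0L, small_used = 0L, large_used = 0L)
  # Degenerate inputs: no blocks can be formed
  if (small == 0 || nt + nc < J) return(result)
  for (B in seq_len(small)) {        # each block needs a unit from the smaller group
    large_used <- B * J - small      # slots left after using all of the smaller group
    # Each block also needs a unit from the larger group, and supply is limited
    if (large_used >= B && large_used <= large) {
      # large_used grows with B, so the last feasible B uses the most units
      result <- list(blocks = B, small_used = small, large_used = large_used)
    }
  }
  result
}

countBlocksSketch(nt = 2, nc = 10, J = 5)
countBlocksSketch(nt = 6, nc = 6, J = 5)
```

Under these assumptions, nt = 6, nc = 6, J = 5 yields 2 blocks that use all 6 units from one group and 4 from the other.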
Value
A list with components:
detail |
Named vector with |
counts |
Named integer vector of length |
Examples
blockSizes(nt = 2, nc = 10, J = 5)
blockSizes(nt = 10, nc = 2, J = 5)
blockSizes(nt = 6, nc = 6, J = 5)
Seed optimal matching
Description
Subsets dat and cost to the given treated and control ids and
calls iTOS::makematch. Columns z and id are required on
dat because the row subset is passed through to the matcher.
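The subsetting step can be pictured with base R indexing, assuming the cost matrix carries unit ids as row and column names. This is a sketch on made-up objects (toy_dat, toy_cost), not the body of seedMatch, and it stops before any call to the matcher.

```r
# Hypothetical inputs: ids 11-12 are treated, ids 21-23 are controls
toy_dat <- data.frame(id = c(11, 12, 21, 22, 23), z = c(1, 1, 0, 0, 0))
toy_cost <- matrix(
  1:6, nrow = 2,
  dimnames = list(as.character(c(11, 12)), as.character(c(21, 22, 23)))
)

id1 <- 12          # treated ids to keep
id2 <- c(21, 23)   # control ids to keep

# Keep only the requested rows of the data and the matching block of the cost matrix;
# drop = FALSE keeps a matrix even when a single row or column remains
sub_dat  <- toy_dat[toy_dat$id %in% c(id1, id2), ]
sub_cost <- toy_cost[as.character(id1), as.character(id2), drop = FALSE]
```

Indexing by character id rather than by position keeps the rows and columns of the cost matrix aligned with the units that remain in the data.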
Usage
seedMatch(id1, id2, dat, cost, msetAdd, ncontrols = 1, solver = "rlemon")
Arguments
id1 |
Treated unit ids (subset of |
id2 |
Control unit ids (subset of |
dat |
Data frame with columns |
cost |
Cost matrix (rows = treated, cols = control). |
msetAdd |
Finite scalar added to matched set ids in the output. |
ncontrols |
Number of controls per treated (default 1). |
solver |
Either |
Value
Value from iTOS::makematch (typically a data.frame of
matched units including column mset), with mset coerced to
numeric and shifted by msetAdd.