This vignette is a faithful reproduction of Example 1 from Schroeders and Gnambs (2025), “Sample Size Planning for Item Response Models: A Tutorial for the Quantitative Researcher” and its companion R code at https://ulrich-schroeders.github.io/IRT-sample-size/example_1.html. The goal is to let irtsim users compare its Monte Carlo output directly against the published reference.
The paper’s Example 1 asks a classic question: how many examinees are needed to recover item difficulty parameters in a linked, two-form achievement test fit with a Rasch (1PL) model?
| Decision | Paper value | irtsim mapping |
|---|---|---|
| Estimation model | 1PL (Rasch) | `estimation_model = "1PL"` |
| Number of items | 30 | `n_items = 30` |
| Item discriminations (generation) | `rnorm(30, 1, 0.1)` | `item_params$a` |
| Item difficulties | `seq(-2, 2, length.out = 30)` | `item_params$b` |
| Two forms, common block items 13–18 | 2 × 30 linking matrix | `missing = "linking"` |
| Monte Carlo iterations | 438 | `iterations = 438` |
| Sample sizes | `seq(100, 600, 50)` | `sample_sizes` |
| Performance criterion | MSE, threshold 0.05 | `summary(res)$item_summary$mse` |
Note on the data-generating model: the paper uses a near-constant
discrimination (mean 1, sd 0.1) and fits a Rasch model. irtsim’s 1PL
generation fixes a = 1 exactly, so to match the paper we
generate under a 2PL with a ~ rnorm(30, 1, 0.1) and set
estimation_model = "1PL". The estimation model is the one
the paper targets; the generation model is a faithful implementation of
the paper’s a draws.
The code below mirrors the paper. It is shown for reference; the
actual simulation is precomputed and cached in
inst/extdata/vignette_ex1_paper.rds to keep vignette build
time low.
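A minimal sketch of that caching pattern (the `.rds` path comes from the text above; the load-or-fall-back logic is illustrative, not irtsim API):

```r
# Load the precomputed result if it ships with the installed package;
# otherwise the simulation code below can be run directly.
rds_path <- system.file("extdata", "vignette_ex1_paper.rds", package = "irtsim")
use_cache <- nzchar(rds_path)   # system.file() returns "" when the file is absent
if (use_cache) res <- readRDS(rds_path)
```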
```r
library(irtsim)
set.seed(2024)

n_items <- 30L

# Item parameters exactly as in the paper
a_vals <- rnorm(n_items, mean = 1, sd = 0.1)
b_vals <- seq(-2, 2, length.out = n_items)

# Linking matrix: form 1 = odd items + common block 13-18,
# form 2 = even items + common block 13-18
linking_matrix <- matrix(0L, nrow = 2L, ncol = n_items)
linking_matrix[1L, sort(unique(c(seq(1L, n_items, 2L), 13:18)))] <- 1L
linking_matrix[2L, sort(unique(c(seq(2L, n_items, 2L), 13:18)))] <- 1L
```
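As an aside, this linking design can be sanity-checked in base R: each form should administer 18 of the 30 items (its 15 own items plus the three parity-opposite items of the common block), and the two forms should overlap exactly on items 13–18. The matrix is rebuilt here so the snippet is self-contained:

```r
# Sanity check of the linking design (base R only): each form administers
# 18 of the 30 items, and the forms overlap exactly on the common block 13-18.
n_items <- 30L
linking_matrix <- matrix(0L, nrow = 2L, ncol = n_items)
linking_matrix[1L, sort(unique(c(seq(1L, n_items, 2L), 13:18)))] <- 1L
linking_matrix[2L, sort(unique(c(seq(2L, n_items, 2L), 13:18)))] <- 1L

stopifnot(
  sum(linking_matrix[1L, ]) == 18L,
  sum(linking_matrix[2L, ]) == 18L,
  identical(which(linking_matrix[1L, ] == 1L & linking_matrix[2L, ] == 1L), 13:18)
)
```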
```r
design <- irt_design(
  model = "2PL",
  n_items = n_items,
  item_params = list(a = a_vals, b = b_vals),
  theta_dist = "normal"
)

study <- irt_study(
  design,
  sample_sizes = seq(100L, 600L, by = 50L),
  missing = "linking",
  test_design = list(linking_matrix = linking_matrix),
  estimation_model = "1PL"
)

res <- irt_simulate(
  study,
  iterations = 438L,
  seed = 2024L,
  parallel = TRUE
)
```

We summarize the recovered item-difficulty MSE and pair it with its Monte Carlo standard error (MCSE), following Morris et al. (2019).
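For intuition, the MSE of a difficulty estimate and its MCSE can be computed from replicate estimates as below. This is a standalone sketch with simulated stand-in estimates, not irtsim's internals:

```r
# Standalone sketch (not irtsim internals): MSE of a difficulty estimate
# and its Monte Carlo standard error (Morris et al., 2019) from R replicates.
set.seed(1)
b_true <- -2
b_hat  <- b_true + rnorm(438, sd = 0.5)  # stand-in for per-iteration estimates
sq_err <- (b_hat - b_true)^2

mse      <- mean(sq_err)                       # Monte Carlo estimate of MSE
mcse_mse <- sd(sq_err) / sqrt(length(sq_err))  # its Monte Carlo standard error
round(c(mse = mse, mcse_mse = mcse_mse), 4)
```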
```r
s <- summary(res, criterion = c("mse", "mcse_mse"), param = "b")
head(s$item_summary, 10)
#> sample_size item param true_value mse mcse_mse n_converged
#> 1 100 1 b -2.0000000 0.2503685 0.018046678 438
#> 2 100 2 b -1.8620690 0.2138392 0.017983836 438
#> 3 100 3 b -1.7241379 0.1597980 0.011247343 438
#> 4 100 4 b -1.5862069 0.1667535 0.012496766 438
#> 5 100 5 b -1.4482759 0.1862439 0.018703552 438
#> 6 100 6 b -1.3103448 0.1711703 0.012482780 438
#> 7 100 7 b -1.1724138 0.1426810 0.011621785 438
#> 8 100 8 b -1.0344828 0.1378920 0.008631983 438
#> 9 100 9 b -0.8965517 0.1351424 0.009190359 438
#> 10 100 10 b -0.7586207 0.1482162 0.010252653 438
```

The paper plots the MSE trajectory for two representative items: item 1 (difficulty ≈ −2, extreme) and item 15 (difficulty ≈ 0, central). We do the same.
```r
library(ggplot2)  # needed for the plot below

item_df <- s$item_summary
focal <- subset(item_df, item %in% c(1L, 15L))
focal$item_label <- factor(
  focal$item,
  levels = c(1L, 15L),
  labels = c("Item 1 (b \u2248 -2)", "Item 15 (b \u2248 0)")
)

ggplot(focal, aes(x = sample_size, y = mse, colour = item_label)) +
  geom_hline(yintercept = 0.05, linetype = "dashed", colour = "grey40") +
  geom_line(linewidth = 0.8) +
  geom_point(size = 2) +
  geom_errorbar(
    aes(
      ymin = pmax(mse - 1.96 * mcse_mse, 0),
      ymax = mse + 1.96 * mcse_mse
    ),
    width = 15
  ) +
  scale_x_continuous(breaks = seq(100, 600, 100)) +
  labs(
    title = "Example 1: MSE of b-parameter vs. sample size",
    subtitle = "Dashed line = paper's 0.05 MSE threshold",
    x = "Sample size (N)",
    y = "MSE(b)",
    colour = NULL
  ) +
  theme_minimal(base_size = 12)
```

Using irtsim's built-in recommended_n() helper we can
extract the smallest N that meets the paper’s MSE ≤ 0.05 threshold for
each item. The paper reports sample-size requirements in the same
sense.
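The underlying lookup is simply "the first sample size whose MSE falls at or below the threshold". A standalone sketch of that logic (smallest_n() and the MSE trajectory are illustrative inventions, not part of irtsim):

```r
# Hypothetical sketch of the "smallest qualifying N" lookup;
# smallest_n() is illustrative only, not an irtsim function.
smallest_n <- function(sample_sizes, mse, threshold = 0.05) {
  ok <- which(mse <= threshold)
  if (length(ok) == 0L) NA_integer_ else sample_sizes[min(ok)]
}

ns <- seq(100L, 600L, by = 50L)                      # the 11 simulated sample sizes
mse_item <- c(0.25, 0.12, 0.08, 0.05, 0.04, 0.04,
              0.03, 0.03, 0.02, 0.02, 0.02)          # made-up MSE trajectory
smallest_n(ns, mse_item)   # first N with MSE <= 0.05 is 250
```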
```r
sim_summary <- summary(res, criterion = "mse", param = "b")
rec <- recommended_n(sim_summary, criterion = "mse", threshold = 0.05, param = "b")
head(rec, 10)
#> item param recommended_n criterion threshold
#> 1 1 b NA mse 0.05
#> 2 2 b 450 mse 0.05
#> 3 3 b 300 mse 0.05
#> 4 4 b 350 mse 0.05
#> 5 5 b 450 mse 0.05
#> 6 6 b 400 mse 0.05
#> 7 7 b 300 mse 0.05
#> 8 8 b 250 mse 0.05
#> 9 9 b 300 mse 0.05
#> 10 10 b 300 mse 0.05
```

These recommendations should reproduce the paper's reported sample-size requirements within Monte Carlo noise.
An NA in the recommended_n column means the criterion was never met at any simulated sample size; this is informative, not an error.

Expected small numerical differences:
- Form assignment. The paper assigns examinees to the two forms at random (sample(c(1, 2), n, replace = TRUE)), whereas irtsim's apply_missing_structured() uses deterministic round-robin assignment. At N ≥ 100 the induced difference in per-form sample sizes is small (< 1 examinee gap) and does not materially shift the MSE trajectories.
- Random-number streams. irtsim runs with future.seed = TRUE, which uses L'Ecuyer-CMRG substreams.
The paper uses the session default Mersenne-Twister. Both are valid
Monte Carlo streams; the specific numbers differ. Only trajectory shape
is expected to match, not bit-for-bit numerical equality.
- Iteration count. The iteration count of 438 was derived with irt_iterations(); users can recompute it if they want a different MC precision.

References

Burton, A., Altman, D. G., Royston, P., & Holder, R. L. (2006). The design of simulation studies in medical statistics. Statistics in Medicine, 25, 4279–4292. https://doi.org/10.1002/sim.2673
Morris, T. P., White, I. R., & Crowther, M. J. (2019). Using simulation studies to evaluate statistical methods. Statistics in Medicine, 38, 2074–2102. https://doi.org/10.1002/sim.8086
Schroeders, U., & Gnambs, T. (2025). Sample size planning for item response models: A tutorial for the quantitative researcher. Companion code: https://ulrich-schroeders.github.io/IRT-sample-size/.