
Paper Example 1: Linked-Test Design with 1PL Estimation

Purpose

This vignette is a faithful reproduction of Example 1 from Schroeders and Gnambs (2025), “Sample Size Planning for Item Response Models: A Tutorial for the Quantitative Researcher” and its companion R code at https://ulrich-schroeders.github.io/IRT-sample-size/example_1.html. The goal is to let irtsim users compare its Monte Carlo output directly against the published reference.

The paper’s Example 1 asks a classic question: how many examinees are needed to recover item difficulty parameters in a linked, two-form achievement test fit with a Rasch (1PL) model?
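For orientation, the Rasch (1PL) model gives the probability of a correct response as a logistic function of the difference between the examinee's ability theta and the item's difficulty b. A minimal base-R sketch (the function name `rasch_prob` is ours for illustration, not part of irtsim):

```r
# Rasch (1PL) model: P(X = 1 | theta, b) = exp(theta - b) / (1 + exp(theta - b))
rasch_prob <- function(theta, b) plogis(theta - b)

# An examinee of average ability has a 50% chance on an item of difficulty 0
rasch_prob(theta = 0, b = 0)
#> [1] 0.5
```

Because the 1PL model fixes all discriminations at 1, only the difficulties b are free item parameters, which is why MSE(b) is the recovery criterion throughout this example.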

Design, from the paper

| Decision | Paper value | irtsim mapping |
|---|---|---|
| Estimation model | 1PL (Rasch) | estimation_model = "1PL" |
| Number of items | 30 | n_items = 30 |
| Item discriminations (generation) | rnorm(30, 1, 0.1) | item_params$a |
| Item difficulties | seq(-2, 2, length.out = 30) | item_params$b |
| Two forms, common block items 13–18 | 2 × 30 linking matrix | missing = "linking" |
| Monte Carlo iterations | 438 | iterations = 438 |
| Sample sizes | seq(100, 600, 50) | sample_sizes |
| Performance criterion | MSE, threshold 0.05 | summary(res)$item_summary$mse |

Note on the data-generating model: the paper uses a near-constant discrimination (mean 1, sd 0.1) and fits a Rasch model. irtsim’s 1PL generation fixes a = 1 exactly, so to match the paper we generate under a 2PL with a ~ rnorm(30, 1, 0.1) and set estimation_model = "1PL". The estimation model is the one the paper targets; the generation model is a faithful implementation of the paper’s a draws.

Reproducing the study

The code below mirrors the paper. It is shown for reference; the actual simulation is precomputed and cached in inst/extdata/vignette_ex1_paper.rds to keep vignette build time low.

library(irtsim)

set.seed(2024)
n_items <- 30L

# Item parameters exactly as in the paper
a_vals <- rnorm(n_items, mean = 1, sd = 0.1)
b_vals <- seq(-2, 2, length.out = n_items)

# Linking matrix: form 1 = odd items + common block 13-18,
#                 form 2 = even items + common block 13-18
linking_matrix <- matrix(0L, nrow = 2L, ncol = n_items)
linking_matrix[1L, sort(unique(c(seq(1L, n_items, 2L), 13:18)))] <- 1L
linking_matrix[2L, sort(unique(c(seq(2L, n_items, 2L), 13:18)))] <- 1L

design <- irt_design(
  model       = "2PL",
  n_items     = n_items,
  item_params = list(a = a_vals, b = b_vals),
  theta_dist  = "normal"
)

study <- irt_study(
  design,
  sample_sizes     = seq(100L, 600L, by = 50L),
  missing          = "linking",
  test_design      = list(linking_matrix = linking_matrix),
  estimation_model = "1PL"
)

res <- irt_simulate(
  study,
  iterations = 438L,
  seed       = 2024L,
  parallel   = TRUE
)

Summary: MSE by sample size

We summarize the recovered item-difficulty MSE and pair it with its Monte Carlo standard error (MCSE), following Morris et al. (2019).
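Concretely, with R Monte Carlo replications the MSE for an item is the mean of the per-replication squared errors, and its MCSE is the standard deviation of those squared errors divided by sqrt(R) (Morris et al., 2019). A standalone sketch of that calculation (our own helper, not irtsim internals):

```r
# MSE and its Monte Carlo standard error for one item's difficulty estimates.
# b_hat: vector of estimates across replications; b_true: generating value.
mse_with_mcse <- function(b_hat, b_true) {
  sq_err <- (b_hat - b_true)^2            # per-replication squared errors
  R <- length(sq_err)
  c(mse      = mean(sq_err),              # Monte Carlo estimate of the MSE
    mcse_mse = stats::sd(sq_err) / sqrt(R))  # its Monte Carlo SE
}

# Toy replicate estimates for an item with b = -2
set.seed(1)
b_hat <- rnorm(438, mean = -2, sd = 0.5)
mse_with_mcse(b_hat, b_true = -2)
```

The MCSE shrinks at rate 1/sqrt(R), which is why the paper's choice of 438 iterations pins down the precision of each MSE value in the table below.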

s <- summary(res, criterion = c("mse", "mcse_mse"), param = "b")
head(s$item_summary, 10)
#>    sample_size item param true_value       mse    mcse_mse n_converged
#> 1          100    1     b -2.0000000 0.2503685 0.018046678         438
#> 2          100    2     b -1.8620690 0.2138392 0.017983836         438
#> 3          100    3     b -1.7241379 0.1597980 0.011247343         438
#> 4          100    4     b -1.5862069 0.1667535 0.012496766         438
#> 5          100    5     b -1.4482759 0.1862439 0.018703552         438
#> 6          100    6     b -1.3103448 0.1711703 0.012482780         438
#> 7          100    7     b -1.1724138 0.1426810 0.011621785         438
#> 8          100    8     b -1.0344828 0.1378920 0.008631983         438
#> 9          100    9     b -0.8965517 0.1351424 0.009190359         438
#> 10         100   10     b -0.7586207 0.1482162 0.010252653         438

The paper plots the MSE trajectory for two representative items: item 1 (difficulty ≈ −2, extreme) and item 15 (difficulty ≈ 0, central). We do the same.

library(ggplot2)

item_df <- s$item_summary
focal <- subset(item_df, item %in% c(1L, 15L))
focal$item_label <- factor(
  focal$item,
  levels = c(1L, 15L),
  labels = c("Item 1 (b \u2248 -2)", "Item 15 (b \u2248 0)")
)

ggplot(focal, aes(x = sample_size, y = mse, colour = item_label)) +
  geom_hline(yintercept = 0.05, linetype = "dashed", colour = "grey40") +
  geom_line(linewidth = 0.8) +
  geom_point(size = 2) +
  geom_errorbar(
    aes(
      ymin = pmax(mse - 1.96 * mcse_mse, 0),
      ymax = mse + 1.96 * mcse_mse
    ),
    width = 15
  ) +
  scale_x_continuous(breaks = seq(100, 600, 100)) +
  labs(
    title    = "Example 1: MSE of b-parameter vs. sample size",
    subtitle = "Dashed line = paper's 0.05 MSE threshold",
    x        = "Sample size (N)",
    y        = "MSE(b)",
    colour   = NULL
  ) +
  theme_minimal(base_size = 12)

Comparison notes — paper vs. irtsim

Should reproduce (within MC noise): the shape of the MSE-by-sample-size trajectories, and the sample size at which each item's MSE first drops below the paper's 0.05 threshold.

Expected small numerical differences:

  1. Form assignment. The paper assigns each examinee to a form at random (sample(c(1, 2), n, replace = TRUE)), so the per-form sample sizes vary stochastically. irtsim’s apply_missing_structured() uses deterministic round-robin assignment, which keeps the per-form sample sizes within one examinee of each other. At N ≥ 100 this difference does not materially shift the MSE trajectories.
  2. RNG dispatch. irtsim runs in parallel mode with future.seed = TRUE, which uses L’Ecuyer-CMRG substreams. The paper uses the session default Mersenne-Twister. Both are valid Monte Carlo streams; the specific numbers differ. Only trajectory shape is expected to match, not bit-for-bit numerical equality.
  3. Iterations. We use the paper’s 438 iterations as-is. That value reflects a Burton et al. (2006) precision target that irtsim exposes via [irt_iterations()]; users can recompute it if they want a different MC precision.
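The contrast between the two assignment schemes in point 1 is easy to see directly. In this sketch, `round_robin` is our own stand-in for the deterministic scheme, not the internal apply_missing_structured():

```r
n <- 100L

# Paper: random per-examinee form assignment (form sizes vary binomially)
set.seed(42)
random_forms <- sample(c(1L, 2L), n, replace = TRUE)

# irtsim-style deterministic round-robin: forms alternate 1, 2, 1, 2, ...
round_robin <- rep_len(c(1L, 2L), n)

table(round_robin)   # exact 50/50 split (within one examinee for odd n)
table(random_forms)  # split varies from seed to seed
```

For even n the round-robin split is exactly balanced; for odd n the two forms differ by one examinee, which is the "< 1 examinee gap" referred to above.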

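For readers who want a different Monte Carlo precision, Burton et al. (2006) give the number of replications B needed to estimate a quantity with standard deviation sigma to within delta at confidence level 1 − alpha: B = ((z_{1−alpha/2} · sigma) / delta)². A hedged sketch of that calculation (the function name and the example sigma/delta values are ours; [irt_iterations()] is the package's own implementation):

```r
# Burton et al. (2006) replication count: B = ((z * sigma) / delta)^2,
# where z is the (1 - alpha/2) standard normal quantile.
burton_iterations <- function(sigma, delta, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  ceiling((z * sigma / delta)^2)
}

# e.g. estimating a mean with SD 0.5 to within +/- 0.05 at 95% confidence
burton_iterations(sigma = 0.5, delta = 0.05)
#> [1] 385
```

The paper's 438 corresponds to its own choice of sigma and delta; plugging different targets into the same formula (or into [irt_iterations()]) yields the matching iteration count.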
References

Burton, A., Altman, D. G., Royston, P., & Holder, R. L. (2006). The design of simulation studies in medical statistics. Statistics in Medicine, 25, 4279–4292. https://doi.org/10.1002/sim.2673

Morris, T. P., White, I. R., & Crowther, M. J. (2019). Using simulation studies to evaluate statistical methods. Statistics in Medicine, 38, 2074–2102. https://doi.org/10.1002/sim.8086

Schroeders, U., & Gnambs, T. (2025). Sample size planning for item response models: A tutorial for the quantitative researcher. Companion code: https://ulrich-schroeders.github.io/IRT-sample-size/.
