README

mixedsubjectsirt

mixedsubjectsirt is a package for augmenting human pilot data with LLM-generated item responses in psychometric calibration studies. It does this by implementing the Mixed-Subject Design¹ ² for latent variable measurement models. The package ports the Prediction-Powered Inference (PPI)³ and PPI++⁴ paradigms to EM-based estimation procedures, which lack the clear independent and dependent variables that PPI-based workflows usually assume. The goal is item-parameter estimates that retain the human-data target while using synthetic responses only when they appear informative. This works because the estimator is anchored to the human responses, and the LLM contribution is down-weighted when it does not help.

The strength of this method is that it tunes the contribution of the LLM-generated responses based on how informative they are. This is done through a procedure called power tuning (derived from PPI++), with one key deviation: instead of selecting a tuning parameter to minimize the standard errors of the estimated model parameters, we minimize ability risk, a quantification of the expected measurement error in downstream ability estimation, integrated over the assumed ability distribution. This allows the method to target the parts of the scale where reductions in item parameter uncertainty are most valuable, increasing operational measurement precision. This approach also guards against poor-quality synthetic data: ability-risk tuning can shrink the tuning parameter λ toward zero when synthetic responses do not improve downstream scoring precision, so estimation leans on the human responses where the LLM is uninformative. Conversely, whenever users can produce better-quality predictions (through auxiliary data, better prompting, stronger models, or other new and unforeseen advances in LLMs or response prediction), the utility of this method increases in kind.
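The shrinkage behavior described above can be illustrated in the simplest PPI++ setting, estimating a mean. The sketch below is not the package's ability-risk criterion and does not use its API: it applies the variance-minimizing λ that PPI++ gives for a mean, and every name in it (`ppi_mean`, `tune_lambda_mean`, `noisy_copy`, the simulated data) is hypothetical. Pooling the human-side and synthetic predictions to estimate their variance is also an assumption made here for simplicity.

```r
# Illustration only: PPI++ power tuning for a simple mean.
# This is NOT the ability-risk tuning implemented in mixedsubjectsirt.
set.seed(1)

# Human-only mean plus a lambda-weighted correction from the predictions.
ppi_mean <- function(y_human, f_human, f_synth, lambda) {
  mean(y_human) + lambda * (mean(f_synth) - mean(f_human))
}

# Variance-minimizing lambda for the mean, clipped to [0, 1].
tune_lambda_mean <- function(y_human, f_human, f_synth) {
  n <- length(y_human)
  N <- length(f_synth)
  lambda <- cov(y_human, f_human) / ((1 + n / N) * var(c(f_human, f_synth)))
  min(max(lambda, 0), 1)
}

n <- 200   # human respondents
N <- 5000  # synthetic respondents
y_human  <- rbinom(n, 1, 0.6)
y_latent <- rbinom(N, 1, 0.6)  # unobserved responses behind the synthetic pool

# Informative predictions: agree with the true response 90% of the time.
noisy_copy <- function(y, agree = 0.9) {
  ifelse(runif(length(y)) < agree, y, 1 - y)
}
f_good_human <- noisy_copy(y_human)
f_good_synth <- noisy_copy(y_latent)

# Uninformative predictions: unrelated to the true response.
f_bad_human <- rbinom(n, 1, 0.5)
f_bad_synth <- rbinom(N, 1, 0.5)

lambda_good <- tune_lambda_mean(y_human, f_good_human, f_good_synth)
lambda_bad  <- tune_lambda_mean(y_human, f_bad_human, f_bad_synth)

est_good <- ppi_mean(y_human, f_good_human, f_good_synth, lambda_good)
est_bad  <- ppi_mean(y_human, f_bad_human, f_bad_synth, lambda_bad)
```

In this toy setting the tuned λ should be well above zero for the informative predictions and close to zero for the unrelated ones, in which case the estimate falls back on the human-only mean. The package replaces the variance criterion used here with ability risk.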

Implemented here are methods for standard dichotomous 2PL and 1PL IRT models. Several estimation options are available; the recommended approach is Marginal Maximum Likelihood-based EM, cross-fit to split samples.⁵ Other options include approximations based on quadrature-based expected count regressions and iterated expected counts. This package is under active development, with experimental features such as per-item power tuning available for users to try.
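For readers less familiar with the models named above, here is a minimal base-R sketch of the dichotomous 2PL and 1PL response functions, along with the kind of quadrature over an ability distribution that marginal likelihood methods rely on. The function names are hypothetical and are not part of this package. The difficulty parametrization `a * (theta - b)`, the standard normal ability distribution, and the 41-point grid are all assumptions; the package may use a different parametrization or quadrature scheme.

```r
# Illustration only: these helpers are not functions from mixedsubjectsirt.

# 2PL item response function: P(X = 1 | theta) = logistic(a * (theta - b)).
irf_2pl <- function(theta, a, b) plogis(a * (theta - b))

# 1PL as the special case with a common (here unit) discrimination.
irf_1pl <- function(theta, b, a = 1) irf_2pl(theta, a = a, b = b)

# Fisher information of a 2PL item at ability theta.
info_2pl <- function(theta, a, b) {
  p <- irf_2pl(theta, a, b)
  a^2 * p * (1 - p)
}

# Simple rectangular quadrature over an assumed N(0, 1) ability distribution.
nodes   <- seq(-4, 4, length.out = 41)
weights <- dnorm(nodes)
weights <- weights / sum(weights)  # normalize so the weights sum to 1

# Marginal probability of a correct response, integrating ability out.
marginal_p <- function(a, b) sum(weights * irf_2pl(nodes, a, b))

# Expected item information over the assumed ability distribution.
expected_info <- function(a, b) sum(weights * info_2pl(nodes, a, b))

p_center <- irf_2pl(0, a = 1.5, b = 0)
p_marg   <- marginal_p(a = 1.5, b = 0)
```

Averaging a quantity such as item information over the ability grid is the same kind of integration the text refers to when it describes ability risk as integrated over the assumed ability distribution, although the package's actual risk definition is not reproduced here.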

What should I use?

Goal	Recommended function
Complete calibration workflow, including cross-fit λ tuning	`tune_lambda_ability_risk_crossfit()`
Complete workflow without cross-fit λ tuning	`tune_lambda_ability_risk()`
Fitting models with user-specified λ value	`fit_mixed_subjects_mml()`
Experimental item-specific λ tuning	`tune_lambda_ability_risk_item()`

Installation

devtools::install_github('klintkanopka/mixedsubjectsirt')
