GSbench

Benchmark genomic-selection models — classic and machine-learning — from SNP marker data, through one interface, with breeding-relevant cross-validation and honest accuracy reporting.

The problem GSbench addresses: people increasingly apply glmnet, ranger, or xgboost to marker matrices, but hand-roll the cross-validation (often incorrectly) and compare models on unequal footing. GSbench fits the standard baselines (GBLUP, ridge marker effects) and the ML methods behind a single gs_fit()/predict() API, runs them through the same CV, and reports predictive ability computed on held-out data only. It can also combine the models in a stacked ensemble.

Installation

# install.packages("remotes")
remotes::install_github("mqfarooqi1/GSbench")

Only graphics, stats and withr are required. The ML backends — glmnet, ranger, xgboost — are optional (Suggests); install whichever you want to use.
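To see which optional backends are already installed before fitting anything, a small base-R check along these lines may help. It is only a sketch: it uses requireNamespace() from base R and the three package names listed above, and it does not call any GSbench function.

```r
# Optional ML backends named in this README (Suggests, not hard dependencies).
backends <- c("glmnet", "ranger", "xgboost")

# requireNamespace() reports TRUE/FALSE without attaching the package.
installed <- vapply(backends, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Install only what is missing (left commented out so the check has no side effects).
missing_pkgs <- backends[!installed]
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```

Within a GSbench session, available_models() reports which models are usable.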

Quick start

library(GSbench)

sim <- simulate_population(n = 300, m = 2000, h2 = 0.5, seed = 1)

# one model
fit <- gs_fit(sim$pheno, sim$geno, model = "gblup")
gebv <- predict(fit, sim$geno)

# compare every available model (incl. the stacked ensemble) under one CV
bench <- gs_benchmark(sim$pheno, sim$geno, k = 5, seed = 1)
bench
plot(bench)
         model  mean    sd n_folds
   elastic_net 0.367 0.187       5
         gblup 0.334 0.189       5
      ensemble 0.328 0.165       5
 random_forest 0.269 0.185       5
       xgboost 0.185 0.318       5
  (accuracy = predictive ability, cor(pred, observed) on held-out data)
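
The mean and sd columns above are per-fold summaries of cor(pred, observed). The following base-R sketch shows how such a row could be computed for a single model under random k-fold CV. It is an illustration, not GSbench's internal code: the toy data, the helper names ridge_fit() and ridge_predict(), and the penalty lambda = m are all assumptions made for the example.

```r
set.seed(1)

# Toy data: n individuals, m markers coded 0/1/2 (hypothetical, not GSbench output).
n <- 120; m <- 300
geno <- matrix(rbinom(n * m, size = 2, prob = 0.3), nrow = n)
g <- as.vector(geno %*% rnorm(m, sd = 0.1))   # true genetic values
pheno <- g + rnorm(n, sd = sd(g))             # noise variance ~ genetic variance

# Ridge marker effects via the dual form: u = X' (XX' + lambda I)^-1 (y - mean(y)).
ridge_fit <- function(y, X, lambda) {
  centers <- colMeans(X)
  Xc <- sweep(X, 2, centers)
  alpha <- solve(tcrossprod(Xc) + diag(lambda, nrow(Xc)), y - mean(y))
  list(mu = mean(y), centers = centers, u = crossprod(Xc, alpha))
}
ridge_predict <- function(fit, X) {
  as.vector(fit$mu + sweep(X, 2, fit$centers) %*% fit$u)
}

# Random k-fold assignment: every individual is held out exactly once.
k <- 5
fold <- sample(rep(seq_len(k), length.out = n))

# Fit on the training folds only, then correlate predictions with held-out phenotypes.
acc <- vapply(seq_len(k), function(f) {
  test <- fold == f
  fit <- ridge_fit(pheno[!test], geno[!test, , drop = FALSE], lambda = m)
  cor(ridge_predict(fit, geno[test, , drop = FALSE]), pheno[test])
}, numeric(1))

print(data.frame(model = "ridge", mean = round(mean(acc), 3),
                 sd = round(sd(acc), 3), n_folds = k))
```

Centering and the penalised solve happen inside each training fold, so nothing about the held-out individuals reaches the fitted model.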

What’s in it

Core (base R, no compiled code, no heavy deps):

Function                          Purpose
simulate_population()             Reproducible SNP + phenotype simulator with known h²
qc_markers(), impute_markers()    Call-rate / MAF / monomorphic filtering, mean imputation
Gmatrix()                         VanRaden additive genomic relationship matrix
gblup()                           GBLUP by REML, validated to match rrBLUP::mixed.solve to 6×10⁻⁵
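
For readers who want to see what the marker-level steps in this table amount to, here is a minimal base-R sketch of call-rate/MAF filtering, mean imputation, and a VanRaden-style relationship matrix. It does not call qc_markers(), impute_markers(), or Gmatrix(). The thresholds (call rate 0.9, MAF 0.05) and the choice of VanRaden's first method, G = ZZ' / (2 Σ p(1 − p)), are assumptions for illustration and may differ from the package defaults.

```r
set.seed(1)

# Toy genotypes: n individuals x m markers coded 0/1/2, with ~2% missing calls.
n <- 50; m <- 200
p_true <- runif(m, 0.02, 0.5)
geno <- sapply(p_true, function(pj) rbinom(n, size = 2, prob = pj))
geno[sample(length(geno), round(0.02 * length(geno)))] <- NA

# QC: keep markers with enough calls and enough minor-allele frequency.
# A MAF threshold above zero also removes monomorphic markers.
call_rate <- colMeans(!is.na(geno))
freq <- colMeans(geno, na.rm = TRUE) / 2
maf <- pmin(freq, 1 - freq)
keep <- call_rate >= 0.9 & maf >= 0.05      # assumed thresholds
geno_qc <- geno[, keep, drop = FALSE]

# Mean imputation: replace each missing call with its marker's observed mean.
col_mean <- colMeans(geno_qc, na.rm = TRUE)
miss <- which(is.na(geno_qc), arr.ind = TRUE)
geno_qc[miss] <- col_mean[miss[, "col"]]

# VanRaden (method 1): centre each marker by 2p, scale by 2 * sum(p * (1 - p)).
p_hat <- colMeans(geno_qc) / 2
Z <- sweep(geno_qc, 2, 2 * p_hat)
G <- tcrossprod(Z) / (2 * sum(p_hat * (1 - p_hat)))

cat("markers kept:", sum(keep), "of", m, "\n")
cat("mean diagonal of G:", round(mean(diag(G)), 3), "\n")
```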

Modelling & evaluation:

Function                   Purpose
gs_fit() / predict()       Unified interface: "gblup", "elastic_net", "random_forest", "xgboost", "ensemble"
gs_cv()                    Cross-validation: random k-fold (CV1) or leave-one-group-out (family/environment)
gs_ensemble()              Stacked super-learner that combines base models with non-negative CV-learned weights
gs_benchmark() + plot()    Run all available models through one CV and compare
available_models()         Which models are usable in your session
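
The two less familiar ideas in this table, leave-one-group-out CV and stacking with non-negative weights, can be sketched in base R as follows. This is a hedged illustration rather than the code behind gs_cv() or gs_ensemble(): the toy families, the helper ridge_pred(), the two ridge penalties standing in for base models, and the use of box-constrained optim() as a simple non-negative least-squares solver are all assumptions.

```r
set.seed(1)

# Toy data: 4 families of 30 individuals, 200 markers coded 0/1/2 (all hypothetical).
n <- 120; m <- 200
family <- rep(paste0("fam", 1:4), each = 30)
geno <- matrix(rbinom(n * m, size = 2, prob = 0.3), nrow = n)
g <- as.vector(geno %*% rnorm(m, sd = 0.1))
pheno <- g + rnorm(n, sd = sd(g))

# Stand-in base learner: ridge marker effects (dual form), returning test predictions.
ridge_pred <- function(y, X, Xnew, lambda) {
  centers <- colMeans(X)
  Xc <- sweep(X, 2, centers)
  alpha <- solve(tcrossprod(Xc) + diag(lambda, nrow(Xc)), y - mean(y))
  as.vector(mean(y) + sweep(Xnew, 2, centers) %*% crossprod(Xc, alpha))
}

# Leave-one-group-out: each family is predicted by models that never saw it.
lambdas <- c(weak = 10, strong = 1000)
oof <- matrix(NA_real_, n, length(lambdas), dimnames = list(NULL, names(lambdas)))
for (grp in unique(family)) {
  test <- family == grp
  for (j in seq_along(lambdas)) {
    oof[test, j] <- ridge_pred(pheno[!test], geno[!test, , drop = FALSE],
                               geno[test, , drop = FALSE], lambdas[[j]])
  }
}

# Stacking: weights >= 0 that minimise squared error of the blended
# out-of-fold predictions. Centering removes the intercept from the problem.
yc <- pheno - mean(pheno)
Pc <- sweep(oof, 2, colMeans(oof))
loss <- function(w) sum((yc - Pc %*% w)^2)
w <- optim(rep(0.5, ncol(Pc)), loss, method = "L-BFGS-B", lower = 0)$par
names(w) <- colnames(oof)
print(round(w, 3))
```

Learning the weights from out-of-fold predictions, not in-sample fits, keeps the ensemble from rewarding whichever base model overfits the training data most.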

Why the methods are trustworthy

Honest limitations


Muhammad Farooqi · https://github.com/mqfarooqi1
