The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
GSbench fits and compares genomic-selection models — GBLUP, ridge marker effects, and machine-learning methods — through one interface, using breeding-relevant cross-validation. This vignette walks through a full workflow.
For a reproducible example we simulate a population: 300 lines, 2000 SNPs, 50 QTL, narrow-sense heritability 0.5.
Filter low-call-rate, low-MAF and monomorphic markers, and impute the rest.
gblup() estimates variance components by REML. (Its
GEBVs match rrBLUP::mixed.solve to within 6e-5
— see the package tests.)
fit <- gblup(sim$pheno, geno)
fit
#> <gblup>
#> Vu = 50.27, Ve = 37.31, h2 = 0.574
#> 300 individuals; intercept = -0.4076; marker effects available
cor(fit$gebv, sim$bv) # accuracy against the true breeding values
#> [1] 0.6941135Because geno was supplied, the fit also carries ridge
marker effects, so it can predict new genotypes:
gs_fit() exposes the same API for every model family.
Which are available depends on which optional packages you have
installed:
gs_cv() runs breeding-relevant cross-validation. Random
k-fold predicts untested lines (CV1); leave_group_out holds
out whole families or environments.
cv <- gs_cv(sim$pheno, geno, models = "gblup", k = 5, reps = 1, seed = 1)
cv
#> <gs_cv: kfold>
#> 5-fold x 1 rep(s)
#> model mean sd n_folds
#> gblup 0.159 0.076 5
#> (accuracy = predictive ability, cor(pred, observed) on held-out data)
# leave-one-family-out, with a toy family structure
fam <- rep(1:6, length.out = nrow(geno))
gs_cv(sim$pheno, geno, models = "gblup",
scheme = "leave_group_out", groups = fam)
#> <gs_cv: leave_group_out>
#> model mean sd n_folds
#> gblup 0.196 0.166 6
#> (accuracy = predictive ability, cor(pred, observed) on held-out data)gs_ensemble() combines base models into a super-learner:
each model’s out-of-fold predictions are used to learn non-negative
weights (summing to one), and the final predictor is that weighted
blend.
With several base models installed, the weights show how much each contributes.
gs_benchmark() runs every available model — including
the ensemble — under one cross-validation and reports predictive ability
so you can compare them fairly.
bench <- gs_benchmark(sim$pheno, geno, models = c("gblup", "ensemble"),
k = 5, seed = 1)
bench$accuracy
#> model mean sd n_folds
#> 1 ensemble 0.2855766 0.03898727 5
#> 2 gblup 0.1590768 0.07638873 5For an additive trait like this one, GBLUP and elastic net are typically strongest; tree methods trail; the ensemble lands near the top without you having to pick the winner in advance.
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.