The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
GAreg implements genetic algorithm (GA) optimization for three core statistical workflows where discrete model structure must be learned from data:
Best subset variable selection – choose a parsimonious set of predictors under information or validation criteria.
Changepoint detection – estimate the number and locations of structural breaks in regression or time series.
Optimal spline knot selection – place interior knots for splines or piecewise polynomials to balance fit and smoothness.
The package provides a consistent GA interface, unified S4 results (gareg), and penalty-aware objectives. It supports both varying‑knot and fixed‑knot modes, minimum spacing constraints, and unbalanced designs.
We use the simulation function below for subset selection
illustration. Here, n is the number of observations and
p is the number of predictors. For the covairates,
s0 represent number of truly active predictors, valued
range from 0 to p.
magnitudes_range specifies the range of significantly
expressed coefficients that corresponding to the truly active
predictors. If rho is specified with some values, the
autoregressive structure is introduced into the error terms. If
rho=NULL, we will have independent and identically
distributed (IID) errors. We can also specify sigma for the
error standard deviation.
sim_subset_data <- function(n = 60, p = 50, s0 = 25, sigma = 1.5,
magnitudes_range = c(0.5, 2),
rho = NULL,
seed = NULL) {
stopifnot(n > 0, p > 0, s0 >= 0, s0 <= p, sigma >= 0)
if (!is.null(seed)) set.seed(seed)
X <- matrix(rnorm(n * p), n, p)
# Active set and coefficients
true_idx <- if (s0 > 0) sort(sample.int(p, s0)) else integer(0)
signs <- if (s0 > 0) sample(c(-1, 1), s0, replace = TRUE) else numeric(0)
magnitudes <- if (s0 > 0) runif(s0, magnitudes_range[1], magnitudes_range[2]) else numeric(0)
beta_true <- numeric(p)
if (s0 > 0) beta_true[true_idx] <- magnitudes * signs
if (is.null(rho)) {
e <- rnorm(n, sd = sigma)
} else {
sd_innov <- sigma * sqrt(1 - rho^2)
burn_in <- 100
z <- rnorm(n + burn_in, sd = sd_innov)
e_full <- numeric(n + burn_in)
for (t in 2:(n + burn_in)) e_full[t] <- rho * e_full[t - 1] + z[t]
e <- e_full[(burn_in + 1):(burn_in + n)]
}
y <- as.numeric(X %*% beta_true + e)
DF <- data.frame(y = y, as.data.frame(X))
colnames(DF)[-1] <- paste0("X", seq_len(p))
list(
X = X,
y = y,
beta_true = beta_true,
true_idx = true_idx,
DF = DF,
rho = if (is.null(rho)) NULL else rho,
args = list(
n = n, p = p, s0 = s0, sigma = sigma,
magnitudes_range = magnitudes_range,
rho = rho, seed = seed
)
)
}sim <- sim_subset_data(n = 100, p = 50, s0 = 25, sigma = 1.5, rho = NULL, seed = 123)
y <- sim$y
X <- sim$X
ga <- gareg_subset(
y = y, X = X, gaMethod = "GA", monitor = FALSE,
gacontrol = list(
popSize = 120,
maxiter = 20000,
run = 4000,
pmutation = 0.02
)
)
summary(ga)
#> ##########################################################
#> # Best Subset Variable Selection via GA #
#> ##########################################################
#> Settings:
#> Population size = 120
#> Number of generations = NA
#> GA convergence = FALSE
#> Crossover probability = 0.8
#> Mutation probability = 0.02
#> Changepoint probability = NA
#> Parallel Usage = FALSE
#> Suggestions:
#>
#> ##### Best #####
#> Fitness = -166.3586
#> k = 27
#> Subset Id = 3 9 10 12 13 16 18 19 20 21 25 26 29 30 31 34 35 36 38 39 40 42 43 45 46 48 50
#> Best subset = , , , , , , , , , , , , , , , , , , , , , , , , , ,
res <- FDRCalc(truelabel = sim$true_idx, predlabel = ga@bestsol, N = 50)
# FALSE Discover Rate
res$fdr
#> [1] 0.07407407
# TRUE Positive Rate
res$tpr
#> [1] 1The multiple changepoint detection can be conducted through
changepointGA package (Li and Lu, 2024). The BIC penalized
function is provided below for IID data. The related math details can be
found in (Li et al., 2026).
BIC.cpt <- function(chromosome, Xt) {
m <- chromosome[1]
tau <- chromosome[2:(2 + m - 1)]
N <- length(Xt)
if (m == 0) {
mu.hat <- mean(Xt)
sigma.hatsq <- sum((Xt - mu.hat)^2) / N
BIC.obj <- 0.5 * N * log(sigma.hatsq) + 2 * log(N)
} else {
tau.ext <- c(1, tau, N + 1)
seg.len <- diff(tau.ext)
ff <- rep(0:m, times = seg.len)
Xseg <- split(Xt, ff)
mu.seg <- unlist(lapply(Xseg, mean), use.names = F)
mu.hat <- rep(mu.seg, seg.len)
sigma.hatsq <- sum((Xt - mu.hat)^2) / N
BIC.obj <- 0.5 * N * log(sigma.hatsq) + (2 * m + 2) * log(N)
}
return(BIC.obj)
}
# IID data
set.seed(1234)
n <- 200
et <- rnorm(n)
Xt <- et + rep(c(0, 2, 0, 2), each = n / 4)
library(changepointGA)
GA.res <- cptga(
ObjFunc = BIC.cpt, N = n, popSize = 500,
pcrossover = 0.95, pmutation = 0.3, pchangepoint = 10 / n,
Xt = Xt
)
summary(GA.res)
#> ###############################################
#> # Changepoint Detection via GA #
#> ###############################################
#> Settings:
#> Population size = 500
#> Number of generations = 11961
#> Crossover probability = 0.95
#> Mutation probability = 0.3
#> Changepoint probability = 0.05
#> Task mode = cp
#> Parallel Usage = FALSE
#>
#> ##### GA results #####
#> Optimal Fitness value = 41.3175
#> Optimal Solution:
#> Number of Changepoints = 3
#> Changepoints Locations = 52 101 149This section will illustrate optimal spline knot placement on the
classic motocycle impact dataset from Package (Venables & Ripley,
2002). contains 133 observations from a simulated motorcycle crash test,
recording head acceleration (in g) of a helmeted test subject over time
(milliseconds). The series is non-linear with sharp transients and
heteroskedastic noise, which makes it a canonical benchmark for
smoothing and spline-based regression. We use function choose interior
spline knots subject to a minimum separation in indices to avoid
clustering. Here the truncated-power piecewise polynomials
with polynomial degree of 3 is the used default.
g1 <- gareg_knots(
y = mcycle$accel, x = mcycle$times,
minDist = 5,
gaMethod = "cptga",
cptgactrl = cptgaControl(popSize = 200, pcrossover = 0.9, pmutation = 0.3),
ic_method = "BIC"
)
summary(g1)
#> ##########################################################
#> # Varying Knots Detection via changepointGA #
#> ##########################################################
#> Settings:
#> Population size = 200
#> Number of generations = 5000
#> GA convergence = 0
#> Crossover probability = 0.9
#> Mutation probability = 0.3
#> Changepoint probability = 0.01
#> Parallel Usage = FALSE
#>
#> ##### Best #####
#> Fitness = 866.6072
#> m = 4
#> knots = 16 48 54 65
# knots location
g1@bestsol
#> [1] 16 48 54 65g2 <- gareg_knots(
y = mcycle$accel, x = mcycle$times,
minDist = 5,
gaMethod = "cptgaisl",
cptgactrl = cptgaControl(
numIslands = 5, popSize = 200, maxMig = 250,
pcrossover = 0.9, pmutation = 0.3
),
ic_method = "BIC"
)
summary(g2)
#> ##########################################################
#> # Varying Knots Detection via changepointGA #
#> ##########################################################
#> Settings:
#> Population size = 200
#> Number of generations = 5650
#> GA convergence = 0
#> Crossover probability = 0.9
#> Mutation probability = 0.3
#> Changepoint probability = 0.01
#> Parallel Usage = FALSE
#>
#> ##### Best #####
#> Fitness = 859.4364
#> m = 4
#> knots = 23 39 57 71g3 <- gareg_knots(
y = mcycle$accel, x = mcycle$times,
fixedknots = 3,
minDist = 5,
gaMethod = "cptga",
cptgactrl = cptgaControl(popSize = 200, pcrossover = 0.9, pmutation = 0.3),
ic_method = "BIC"
)
summary(g3)
#> ##########################################################
#> # Fixed Knots Detection via changepointGA #
#> ##########################################################
#> Settings:
#> Population size = 200
#> Number of generations = 5757
#> GA convergence = 0
#> Crossover probability = 0.9
#> Mutation probability = 0.3
#> Changepoint probability = 0.01
#> Parallel Usage = FALSE
#>
#> ##### Best #####
#> Fitness = 868.9659
#> m = 3
#> knots = 21 42 48g4 <- gareg_knots(
y = mcycle$accel, x = mcycle$times,
fixedknots = 4,
minDist = 5,
gaMethod = "cptgaisl",
cptgactrl = cptgaControl(
numIslands = 5, popSize = 200, maxMig = 250,
pcrossover = 0.9, pmutation = 0.3
),
ic_method = "BIC"
)
summary(g4)
#> ##########################################################
#> # Fixed Knots Detection via changepointGA #
#> ##########################################################
#> Settings:
#> Population size = 200
#> Number of generations = 5850
#> GA convergence = 0
#> Crossover probability = 0.9
#> Mutation probability = 0.3
#> Changepoint probability = 0.01
#> Parallel Usage = FALSE
#>
#> ##### Best #####
#> Fitness = 858.0017
#> m = 4
#> knots = 21 39 61 67y <- mcycle$accel
x <- mcycle$times
x_unique <- unique(x)
tBIC.vary.ga <- g1@bestsol
tBIC.vary.gaisl <- g2@bestsol
tBIC.fix.3.ga <- g3@bestsol
tBIC.fix.4.gaisl <- g4@bestsol
bsfit.vary.ga <- lm(y ~ bs(x, knots = x_unique[g1@bestsol], Boundary.knots = range(x)))
bsfit.vary.gaisl <- lm(y ~ bs(x, knots = x_unique[g2@bestsol], Boundary.knots = range(x)))
bsfit.fix.3.ga <- lm(y ~ bs(x, knots = x_unique[g3@bestsol], Boundary.knots = range(x)))
bsfit.fix.4.gaisl <- lm(y ~ bs(x, knots = x_unique[g4@bestsol], Boundary.knots = range(x)))
plot(x, y, xlab = "Time (ms)", ylab = "Acceleration (g)")
ht <- seq(min(x), max(x), length.out = 200)
lines(ht, predict(bsfit.vary.ga, data.frame(x = ht)), col = "blue", lty = 5, lwd = 2)
lines(ht, predict(bsfit.vary.gaisl, data.frame(x = ht)), col = "orange", lty = 4, lwd = 2)
lines(ht, predict(bsfit.fix.3.ga, data.frame(x = ht)), col = "purple", lty = 3, lwd = 2)
lines(ht, predict(bsfit.fix.4.gaisl, data.frame(x = ht)), col = "#D55E00", lty = 2, lwd = 2)
legend("bottomright",
legend = c(
"Varying knots GA",
"Varying knots island model GA",
"Fixed 3 knots GA",
"Fixed 4 knots island model GA"
),
lty = 5:2, lwd = 2,
col = c("blue", "orange", "purple", "#D55E00"), bty = "n"
)This section illustrates how to build spline design matrices via for three common options:
We’ll use the motorcycle acceleration data , create interior knots at
quantiles of , and compare how different spline types/degrees behave.
Here, we only illustrate through
Varying number and locations of knots set-up (Let GA choose
both how many knots and where they
go).
g_pp3 <- gareg_knots(
y = y, x = x,
minDist = 5,
gaMethod = "cptga",
ObjFunc = NULL, # use default varyknotsIC
type = "ppolys",
degree = 3, # degree-3 piecewise polynomial
intercept = TRUE,
ic_method = "BIC"
)
summary(g_pp3)
#> ##########################################################
#> # Varying Knots Detection via changepointGA #
#> ##########################################################
#> Settings:
#> Population size = 200
#> Number of generations = 7233
#> GA convergence = 0
#> Crossover probability = 0.95
#> Mutation probability = 0.3
#> Changepoint probability = 0.01
#> Parallel Usage = FALSE
#>
#> ##### Best #####
#> Fitness = 862.0585
#> m = 4
#> knots = 21 41 53 75g_ns <- gareg_knots(
y = y, x = x,
minDist = 5,
gaMethod = "cptga",
type = "ns", # natural cubic (degree ignored)
degree = 3, # ignored for "ns"
intercept = TRUE,
ic_method = "BIC"
)
summary(g_ns)
#> ##########################################################
#> # Varying Knots Detection via changepointGA #
#> ##########################################################
#> Settings:
#> Population size = 200
#> Number of generations = 7741
#> GA convergence = 0
#> Crossover probability = 0.95
#> Mutation probability = 0.3
#> Changepoint probability = 0.01
#> Parallel Usage = FALSE
#>
#> ##### Best #####
#> Fitness = 857.3008
#> m = 4
#> knots = 24 42 55 76g_bs1 <- gareg_knots(
y = y, x = x,
minDist = 5,
gaMethod = "cptga",
type = "bs",
degree = 1, # linear B-splines
intercept = TRUE,
ic_method = "BIC"
)
summary(g_bs1)
#> ##########################################################
#> # Varying Knots Detection via changepointGA #
#> ##########################################################
#> Settings:
#> Population size = 200
#> Number of generations = 8044
#> GA convergence = 0
#> Crossover probability = 0.95
#> Mutation probability = 0.3
#> Changepoint probability = 0.01
#> Parallel Usage = FALSE
#>
#> ##### Best #####
#> Fitness = 852.9165
#> m = 4
#> knots = 19 31 43 59sessionInfo()
#> R version 4.5.0 (2025-04-11)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS Sequoia 15.7.3
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
#>
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: America/Chicago
#> tzcode source: internal
#>
#> attached base packages:
#> [1] splines stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] MASS_7.3-65 changepointGA_0.1.3 GAReg_0.1.0
#>
#> loaded via a namespace (and not attached):
#> [1] crayon_1.5.3 doParallel_1.0.17 cli_3.6.5 knitr_1.51
#> [5] rlang_1.1.7 xfun_0.56 jsonlite_2.0.0 htmltools_0.5.9
#> [9] sass_0.4.10 rmarkdown_2.30 evaluate_1.0.5 jquerylib_0.1.4
#> [13] fastmap_1.2.0 yaml_2.3.12 foreach_1.5.2 lifecycle_1.0.5
#> [17] compiler_4.5.0 codetools_0.2-20 Rcpp_1.1.1 rstudioapi_0.17.1
#> [21] digest_0.6.39 R6_2.6.1 parallel_4.5.0 bslib_0.10.0
#> [25] tools_4.5.0 iterators_1.0.14 GA_3.2.4 cachem_1.1.0Li, M., & Lu, Q. (2025). changepointGA: An R package for Fast Changepoint Detection via Genetic Algorithm. arXiv preprint arXiv:2410.15571.
Mo Li, QiQi Lu, Robert Lund, & Xueheng Shi. (2026). Genetic Algorithms in Regression. In preparation.
Venables, W. N. & Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth Edition. Springer, New York. ISBN 0-387-95457-0.
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.