The `compositional.mle` package provides composable optimization strategies for maximum likelihood estimation (MLE). Following SICP principles, it offers:

- Solver factories: `gradient_ascent()`, `newton_raphson()`, `bfgs()`, `nelder_mead()`, and more
- Composition operators: `%>>%` (sequential), `%|%` (race), `with_restarts()`

``` r
# Generate sample data
set.seed(123)
data <- rnorm(100, mean = 5, sd = 2)
# Define the problem (separate from solver strategy)
problem <- mle_problem(
loglike = function(theta) {
if (theta[2] <= 0) return(-Inf)
sum(dnorm(data, theta[1], theta[2], log = TRUE))
},
score = function(theta) {
mu <- theta[1]; sigma <- theta[2]; n <- length(data)
c(sum(data - mu) / sigma^2,
-n / sigma + sum((data - mu)^2) / sigma^3)
},
constraint = mle_constraint(
support = function(theta) theta[2] > 0,
project = function(theta) c(theta[1], max(theta[2], 1e-6))
),
theta_names = c("mu", "sigma")
)
# Solve with gradient ascent
result <- gradient_ascent()(problem, theta0 = c(0, 1))
cat("Estimated mean:", result$theta.hat[1], "(true: 5)\n")
#> Estimated mean: 5.180812 (true: 5)
cat("Estimated sd:", result$theta.hat[2], "(true: 2)\n")
#> Estimated sd: 1.816481 (true: 2)
```

The key design principle is separating what you're estimating from how you estimate it:

``` r
# The problem encapsulates the statistical model
print(problem)
#> MLE Problem
#> Parameters: mu, sigma
#> Score: analytic
#> Fisher: numerical
#> Constraints: yes
# Solvers are independent strategies
solver1 <- gradient_ascent(max_iter = 100)
solver2 <- newton_raphson(max_iter = 50)
solver3 <- bfgs()
# Same problem, different solvers
result1 <- solver1(problem, c(0, 1))
result2 <- solver2(problem, c(0, 1))
result3 <- solver3(problem, c(0, 1))
cat("Gradient ascent:", result1$theta.hat, "\n")
#> Gradient ascent: 5.180812 1.816481
cat("Newton-Raphson:", result2$theta.hat, "\n")
#> Newton-Raphson: 0 1
cat("BFGS:", result3$theta.hat, "\n")
#> BFGS: 100.7711 567.4039
```

Chain solvers with `%>>%` for coarse-to-fine optimization:

``` r
# Coarse grid -> gradient ascent -> Newton-Raphson polish
strategy <- grid_search(lower = c(-10, 0.5), upper = c(10, 5), n = 5) %>>%
gradient_ascent(max_iter = 30) %>>%
newton_raphson(max_iter = 10)
result <- strategy(problem, theta0 = c(0, 1))
cat("Result:", result$theta.hat, "\n")
#> Result: 5.180812 1.816481
cat("Converged:", result$converged, "\n")
#> Converged: FALSE
```

Race multiple methods with `%|%` and keep the best result.
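A minimal sketch, reusing the `problem` from above and assuming `%|%` keeps whichever fit attains the higher log-likelihood (output omitted):

``` r
# Race a quasi-Newton solver against a derivative-free one from the
# same starting point; the better-scoring fit wins
strategy <- bfgs() %|% nelder_mead()
result <- strategy(problem, theta0 = c(0, 1))
result$theta.hat
```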
Escape local optima with multiple starting points:

``` r
strategy <- with_restarts(
gradient_ascent(),
n = 10,
sampler = uniform_sampler(c(-10, 0.5), c(10, 5))
)
result <- strategy(problem, theta0 = c(0, 1))
cat("Best restart:", result$best_restart, "of", result$n_restarts, "\n")
#> Best restart: 1 of 10
cat("Result:", result$theta.hat, "\n")
#> Result: 5.180812 1.816481
```

Only refine if the first solver didn't converge.
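A sketch of conditional refinement; the exact signature of `unless_converged()` is assumed here (a wrapper whose solver runs only when the incoming result has `converged == FALSE`):

``` r
# Cheap first pass; run an expensive polish only if it did not converge
strategy <- gradient_ascent(max_iter = 25) %>>% unless_converged(newton_raphson())
result <- strategy(problem, theta0 = c(0, 1))
result$converged
```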
| Factory | Method | Requires |
|---|---|---|
| `gradient_ascent()` | Steepest ascent with line search | `score` (or numerical) |
| `newton_raphson()` | Second-order Newton | `score`, `fisher` (or numerical) |
| `bfgs()` | Quasi-Newton BFGS | `score` (or numerical) |
| `lbfgsb()` | L-BFGS-B with box constraints | `score` (or numerical) |
| `nelder_mead()` | Simplex (derivative-free) | - |
| `grid_search()` | Exhaustive grid | - |
| `random_search()` | Random sampling | - |
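For example, a problem specified with nothing but a log-likelihood can still be handed to any factory above: `nelder_mead()` needs no derivatives, while the gradient-based solvers fall back to numerical ones. A brief sketch (output omitted):

``` r
# Same normal model as before, but with only a log-likelihood supplied
problem_ll_only <- mle_problem(
  loglike = function(theta) {
    if (theta[2] <= 0) return(-Inf)
    sum(dnorm(data, theta[1], theta[2], log = TRUE))
  },
  theta_names = c("mu", "sigma")
)
fit <- nelder_mead()(problem_ll_only, theta0 = c(0, 1))
fit$theta.hat
```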
Define domain constraints with support checking and projection:

``` r
# Positive parameters
pos_constraint <- mle_constraint(
support = function(theta) all(theta > 0),
project = function(theta) pmax(theta, 1e-8)
)
# Box constraints [0, 10]
box_constraint <- mle_constraint(
support = function(theta) all(theta >= 0 & theta <= 10),
project = function(theta) pmax(0, pmin(10, theta))
)
# Use in problem definition
problem_constrained <- mle_problem(
loglike = function(theta) -sum((theta - 5)^2),
constraint = pos_constraint
)
```

For large datasets, subsample observations.
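The idea, written out by hand as a sketch below, is to evaluate the log-likelihood on a random subsample and rescale it to the full sample size; the package's `with_subsampling()` transformer is the place to look for the built-in version:

``` r
# Hand-rolled subsampled log-likelihood: each call draws 1,000 of the
# 100,000 observations and rescales the sum to the full sample size
big_data <- rnorm(1e5, mean = 5, sd = 2)
loglike_sub <- function(theta) {
  if (theta[2] <= 0) return(-Inf)
  idx <- sample.int(length(big_data), 1000)
  (length(big_data) / 1000) *
    sum(dnorm(big_data[idx], theta[1], theta[2], log = TRUE))
}
```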
Add penalty terms for regularization:

``` r
loglike <- function(theta) -sum(theta^2)
# L1 (Lasso), L2 (Ridge), Elastic Net
loglike_l1 <- with_penalty(loglike, penalty_l1(), lambda = 0.1)
loglike_l2 <- with_penalty(loglike, penalty_l2(), lambda = 0.1)
loglike_enet <- with_penalty(loglike, penalty_elastic_net(alpha = 0.5), lambda = 0.1)
theta <- c(1, 2, 3)
cat("Original:", loglike(theta), "\n")
#> Original: -14
cat("With L1:", loglike_l1(theta), "\n")
#> With L1: -14.6
cat("With L2:", loglike_l2(theta), "\n")
#> With L2: -15.4
```

Track the optimization path for diagnostics:

``` r
trace_config <- mle_trace(values = TRUE, path = TRUE, gradients = TRUE)
result <- gradient_ascent(max_iter = 20)(
problem,
theta0 = c(0, 1),
trace = trace_config
)
if (!is.null(result$trace_data)) {
cat("Iterations:", result$trace_data$total_iterations, "\n")
cat("Final log-likelihood:", tail(result$trace_data$values, 1), "\n")
}
#> Iterations: 20
#> Final log-likelihood: -201.5839
```
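Because `values = TRUE` records the log-likelihood at every iteration, the path can be inspected directly; a one-line sketch (assuming `trace_data$values` is a plain numeric vector, as the `tail()` call above suggests):

``` r
# Log-likelihood per iteration from the traced run above
plot(result$trace_data$values, type = "l",
     xlab = "Iteration", ylab = "Log-likelihood")
```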
Problem Specification:

- `mle_problem()` - Define the estimation problem
- `mle_constraint()` - Domain constraints

Solver Factories:

- `gradient_ascent()`, `newton_raphson()`, `bfgs()`, `lbfgsb()`, `nelder_mead()`
- `grid_search()`, `random_search()`

Composition:

- `%>>%` - Sequential chaining
- `%|%` - Parallel racing
- `with_restarts()` - Multiple starting points
- `unless_converged()` - Conditional refinement
- `compose()` - Compose multiple solvers

Samplers:

- `uniform_sampler()`, `normal_sampler()`

Transformers:

- `with_subsampling()`, `with_penalty()`
- `penalty_l1()`, `penalty_l2()`, `penalty_elastic_net()`
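Putting several of the pieces together, one possible end-to-end strategy, sketched only from constructs already shown above (output omitted):

``` r
# L2-penalized normal likelihood, fit by restarted BFGS runs that are
# each polished with a few Newton-Raphson steps
problem_pen <- mle_problem(
  loglike = with_penalty(
    function(theta) {
      if (theta[2] <= 0) return(-Inf)
      sum(dnorm(data, theta[1], theta[2], log = TRUE))
    },
    penalty_l2(), lambda = 0.01
  ),
  theta_names = c("mu", "sigma")
)
strategy <- with_restarts(
  bfgs() %>>% newton_raphson(max_iter = 5),
  n = 5,
  sampler = uniform_sampler(c(-10, 0.5), c(10, 5))
)
result <- strategy(problem_pen, theta0 = c(0, 1))
result$theta.hat
```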