
Getting Started with compositional.mle

Alexander Towell

2026-02-05

Introduction

The compositional.mle package provides composable optimization strategies for maximum likelihood estimation (MLE). Following the compositional design principles of SICP (Structure and Interpretation of Computer Programs), it offers:

  1. Primitive solvers - gradient_ascent(), newton_raphson(), bfgs(), nelder_mead(), etc.
  2. Composition operators - %>>% (sequential), %|% (race), with_restarts()
  3. Closure property - Combining solvers yields a solver (see the sketch below)
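
Because composition preserves the solver interface, a composite strategy can itself be combined further. The following is a minimal sketch, using the operators introduced later in this vignette and a problem object like the one built in the Quick Start below:

# A composed strategy is itself a solver, so it can be composed again
coarse  <- grid_search(lower = c(-10, 0.5), upper = c(10, 5), n = 5)
fine    <- gradient_ascent(max_iter = 50)
refined <- coarse %>>% fine             # sequential chain: still a solver
robust  <- refined %|% nelder_mead()    # race the chain against Nelder-Mead: still a solver
result  <- robust(problem, theta0 = c(0, 1))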

Installation

devtools::install_github("queelius/compositional.mle")
library(compositional.mle)

Quick Start: Normal Distribution MLE

# Generate sample data
set.seed(123)
data <- rnorm(100, mean = 5, sd = 2)

# Define the problem (separate from solver strategy)
problem <- mle_problem(
  loglike = function(theta) {
    if (theta[2] <= 0) return(-Inf)
    sum(dnorm(data, theta[1], theta[2], log = TRUE))
  },
  score = function(theta) {
    mu <- theta[1]; sigma <- theta[2]; n <- length(data)
    c(sum(data - mu) / sigma^2,
      -n / sigma + sum((data - mu)^2) / sigma^3)
  },
  constraint = mle_constraint(
    support = function(theta) theta[2] > 0,
    project = function(theta) c(theta[1], max(theta[2], 1e-6))
  ),
  theta_names = c("mu", "sigma")
)

# Solve with gradient ascent
result <- gradient_ascent()(problem, theta0 = c(0, 1))

cat("Estimated mean:", result$theta.hat[1], "(true: 5)\n")
#> Estimated mean: 5.180812 (true: 5)
cat("Estimated sd:", result$theta.hat[2], "(true: 2)\n")
#> Estimated sd: 1.816481 (true: 2)

The Problem-Solver Separation

The key design principle is separating what you’re estimating from how you estimate it:

# The problem encapsulates the statistical model
print(problem)
#> MLE Problem
#>   Parameters: mu, sigma 
#>   Score: analytic 
#>   Fisher: numerical 
#>   Constraints: yes

# Solvers are independent strategies
solver1 <- gradient_ascent(max_iter = 100)
solver2 <- newton_raphson(max_iter = 50)
solver3 <- bfgs()

# Same problem, different solvers
result1 <- solver1(problem, c(0, 1))
result2 <- solver2(problem, c(0, 1))
result3 <- solver3(problem, c(0, 1))

cat("Gradient ascent:", result1$theta.hat, "\n")
#> Gradient ascent: 5.180812 1.816481
cat("Newton-Raphson:", result2$theta.hat, "\n")
#> Newton-Raphson: 0 1
cat("BFGS:", result3$theta.hat, "\n")
#> BFGS: 100.7711 567.4039

Note that the same starting point leads to very different results: gradient ascent recovers the Quick Start estimates, Newton-Raphson fails to move from the starting point, and BFGS diverges. Handling this kind of solver fragility is exactly what the composition operators below are for.

Composing Solvers

Sequential Chaining (%>>%)

Chain solvers for coarse-to-fine optimization:

# Grid search finds a good region, then gradient ascent refines
strategy <- grid_search(lower = c(-10, 0.5), upper = c(10, 5), n = 5) %>>%
  gradient_ascent(max_iter = 50)

result <- strategy(problem, theta0 = c(0, 1))
cat("Result:", result$theta.hat, "\n")
#> Result: 5.180812 1.816481

Three-Stage Refinement

# Coarse grid -> gradient ascent -> Newton-Raphson polish
strategy <- grid_search(lower = c(-10, 0.5), upper = c(10, 5), n = 5) %>>%
  gradient_ascent(max_iter = 30) %>>%
  newton_raphson(max_iter = 10)

result <- strategy(problem, theta0 = c(0, 1))
cat("Result:", result$theta.hat, "\n")
#> Result: 5.180812 1.816481
cat("Converged:", result$converged, "\n")
#> Converged: FALSE

The final stage reports converged = FALSE here, most likely because the 10-iteration Newton-Raphson budget ran out before its convergence criterion was met, even though the estimate matches the earlier runs; the conditional refinement pattern shown later addresses this.

Parallel Racing (%|%)

Race multiple methods and keep the best result:

# Try gradient-based and derivative-free methods
strategy <- gradient_ascent() %|% bfgs() %|% nelder_mead()

result <- strategy(problem, theta0 = c(0, 1))
cat("Winner:", result$solver, "\n")
#> Winner: gradient_ascent
cat("Result:", result$theta.hat, "\n")
#> Result: 5.180812 1.816481

Random Restarts

Escape local optima with multiple starting points:

strategy <- with_restarts(
  gradient_ascent(),
  n = 10,
  sampler = uniform_sampler(c(-10, 0.5), c(10, 5))
)

result <- strategy(problem, theta0 = c(0, 1))
cat("Best restart:", result$best_restart, "of", result$n_restarts, "\n")
#> Best restart: 1 of 10
cat("Result:", result$theta.hat, "\n")
#> Result: 5.180812 1.816481

Conditional Refinement

Only refine if the first solver didn’t converge:

strategy <- unless_converged(
  gradient_ascent(max_iter = 10),   # Quick attempt
  newton_raphson(max_iter = 50)     # Refine if needed
)

result <- strategy(problem, theta0 = c(0, 1))
cat("Result:", result$theta.hat, "\n")
#> Result: 5.180812 1.816481

Available Solvers

Factory            Method                            Requires
-----------------  --------------------------------  ----------------------------
gradient_ascent()  Steepest ascent with line search  score (or numerical)
newton_raphson()   Second-order Newton               score, fisher (or numerical)
bfgs()             Quasi-Newton BFGS                 score (or numerical)
lbfgsb()           L-BFGS-B with box constraints     score (or numerical)
nelder_mead()      Simplex (derivative-free)         -
grid_search()      Exhaustive grid                   -
random_search()    Random sampling                   -
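
Solvers marked "(or numerical)" fall back to a numerical approximation when the analytic score or Fisher information is not supplied, while derivative-free solvers need only the log-likelihood. As a sketch, reusing the data from the Quick Start, Nelder-Mead can be run on a problem that defines nothing but loglike:

# A loglike-only problem; derivative-free and grid solvers need nothing more
problem_nm <- mle_problem(
  loglike = function(theta) {
    if (theta[2] <= 0) return(-Inf)
    sum(dnorm(data, theta[1], theta[2], log = TRUE))
  },
  theta_names = c("mu", "sigma")
)
result_nm <- nelder_mead()(problem_nm, theta0 = c(0, 1))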

Constraints

Define domain constraints with support checking and projection:

# Positive parameters
pos_constraint <- mle_constraint(
  support = function(theta) all(theta > 0),
  project = function(theta) pmax(theta, 1e-8)
)

# Box constraints [0, 10]
box_constraint <- mle_constraint(
  support = function(theta) all(theta >= 0 & theta <= 10),
  project = function(theta) pmax(0, pmin(10, theta))
)

# Use in problem definition
problem_constrained <- mle_problem(
  loglike = function(theta) -sum((theta - 5)^2),
  constraint = pos_constraint
)
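
As a sketch, the constrained toy problem above can be handed to any solver in the usual way; the constraint's projection is intended to keep iterates inside the positive orthant (the solver choice and starting point here are illustrative):

# Hedged sketch: solve the constrained toy problem (optimum at theta = (5, 5))
result_pos <- nelder_mead()(problem_constrained, theta0 = c(1, 1))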

Function Transformers

Stochastic Gradient (Mini-batching)

For large datasets, subsample observations:

# Example large dataset (illustrative)
large_data <- rnorm(1e5, mean = 5, sd = 2)

# Original log-likelihood uses all the data
loglike_full <- function(theta, obs = large_data) {
  sum(dnorm(obs, theta[1], theta[2], log = TRUE))
}

# Stochastic version evaluates a random subset of 100 observations per call
loglike_sgd <- with_subsampling(loglike_full, data = large_data, subsample_size = 100)
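
A hedged sketch of how the subsampled objective might be used: assuming with_subsampling() returns an ordinary function of theta, it can stand in for the full log-likelihood in a problem definition.

# Sketch: plug the stochastic log-likelihood into a problem definition
problem_sgd <- mle_problem(
  loglike = loglike_sgd,
  constraint = mle_constraint(
    support = function(theta) theta[2] > 0,
    project = function(theta) c(theta[1], max(theta[2], 1e-6))
  ),
  theta_names = c("mu", "sigma")
)
result_sgd <- gradient_ascent(max_iter = 100)(problem_sgd, theta0 = c(0, 1))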

Regularization

Add penalty terms for regularization:

loglike <- function(theta) -sum(theta^2)

# L1 (Lasso), L2 (Ridge), Elastic Net
loglike_l1 <- with_penalty(loglike, penalty_l1(), lambda = 0.1)
loglike_l2 <- with_penalty(loglike, penalty_l2(), lambda = 0.1)
loglike_enet <- with_penalty(loglike, penalty_elastic_net(alpha = 0.5), lambda = 0.1)

theta <- c(1, 2, 3)
cat("Original:", loglike(theta), "\n")
#> Original: -14
cat("With L1:", loglike_l1(theta), "\n")
#> With L1: -14.6
cat("With L2:", loglike_l2(theta), "\n")
#> With L2: -15.4
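
Since the penalized objectives are ordinary functions of theta, they can be used directly as the loglike of a problem; a hedged sketch:

# Hedged sketch: maximize the L2-penalized objective with a derivative-free solver
problem_ridge <- mle_problem(loglike = loglike_l2)
result_ridge <- nelder_mead()(problem_ridge, theta0 = c(1, 2, 3))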

Tracing Optimization

Track the optimization path for diagnostics:

trace_config <- mle_trace(values = TRUE, path = TRUE, gradients = TRUE)

result <- gradient_ascent(max_iter = 20)(
  problem,
  theta0 = c(0, 1),
  trace = trace_config
)

if (!is.null(result$trace_data)) {
  cat("Iterations:", result$trace_data$total_iterations, "\n")
  cat("Final log-likelihood:", tail(result$trace_data$values, 1), "\n")
}
#> Iterations: 20 
#> Final log-likelihood: -201.5839

API Summary

Problem Specification:
  - mle_problem() - Define the estimation problem
  - mle_constraint() - Domain constraints

Solver Factories:
  - gradient_ascent(), newton_raphson(), bfgs(), lbfgsb(), nelder_mead()
  - grid_search(), random_search()

Composition:
  - %>>% - Sequential chaining
  - %|% - Parallel racing
  - with_restarts() - Multiple starting points
  - unless_converged() - Conditional refinement
  - compose() - Compose multiple solvers

Samplers:
  - uniform_sampler(), normal_sampler()

Transformers:
  - with_subsampling(), with_penalty()
  - penalty_l1(), penalty_l2(), penalty_elastic_net()
