
Getting Started with compositional.mle

Alexander Towell

2026-02-05

Introduction

The compositional.mle package provides composable optimization strategies for maximum likelihood estimation (MLE). Following the compositional design principles of SICP (Structure and Interpretation of Computer Programs), it offers:

  1. Primitive solvers - gradient_ascent(), newton_raphson(), bfgs(), nelder_mead(), etc.
  2. Composition operators - %>>% (sequential), %|% (race), with_restarts()
  3. Closure property - Combining solvers yields a solver (see the sketch below)
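
Because composition preserves the solver interface, a composite strategy can itself be combined further. The following is a minimal sketch, using the operators introduced later in this vignette and a problem object like the one built in the Quick Start below:

# A composed strategy is itself a solver, so it can be composed again
coarse  <- grid_search(lower = c(-10, 0.5), upper = c(10, 5), n = 5)
fine    <- gradient_ascent(max_iter = 50)
refined <- coarse %>>% fine             # sequential chain: still a solver
robust  <- refined %|% nelder_mead()    # race the chain against Nelder-Mead: still a solver
result  <- robust(problem, theta0 = c(0, 1))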

Installation

devtools::install_github("queelius/compositional.mle")
library(compositional.mle)

Quick Start: Normal Distribution MLE

# Generate sample data
set.seed(123)
data <- rnorm(100, mean = 5, sd = 2)

# Define the problem (separate from solver strategy)
problem <- mle_problem(
  loglike = function(theta) {
    if (theta[2] <= 0) return(-Inf)
    sum(dnorm(data, theta[1], theta[2], log = TRUE))
  },
  score = function(theta) {
    mu <- theta[1]; sigma <- theta[2]; n <- length(data)
    c(sum(data - mu) / sigma^2,
      -n / sigma + sum((data - mu)^2) / sigma^3)
  },
  constraint = mle_constraint(
    support = function(theta) theta[2] > 0,
    project = function(theta) c(theta[1], max(theta[2], 1e-6))
  ),
  theta_names = c("mu", "sigma")
)

# Solve with gradient ascent
result <- gradient_ascent()(problem, theta0 = c(0, 1))

cat("Estimated mean:", result$theta.hat[1], "(true: 5)\n")
#> Estimated mean: 5.180812 (true: 5)
cat("Estimated sd:", result$theta.hat[2], "(true: 2)\n")
#> Estimated sd: 1.816481 (true: 2)

The Problem-Solver Separation

The key design principle is separating what you’re estimating from how you estimate it:

# The problem encapsulates the statistical model
print(problem)
#> MLE Problem
#>   Parameters: mu, sigma 
#>   Score: analytic 
#>   Fisher: numerical 
#>   Constraints: yes

# Solvers are independent strategies
solver1 <- gradient_ascent(max_iter = 100)
solver2 <- newton_raphson(max_iter = 50)
solver3 <- bfgs()

# Same problem, different solvers
result1 <- solver1(problem, c(0, 1))
result2 <- solver2(problem, c(0, 1))
result3 <- solver3(problem, c(0, 1))

cat("Gradient ascent:", result1$theta.hat, "\n")
#> Gradient ascent: 5.180812 1.816481
cat("Newton-Raphson:", result2$theta.hat, "\n")
#> Newton-Raphson: 0 1
cat("BFGS:", result3$theta.hat, "\n")
#> BFGS: 100.7711 567.4039

Note that the same starting point leads to very different results: gradient ascent recovers the Quick Start estimates, Newton-Raphson fails to move from the starting point, and BFGS diverges. Handling this kind of solver fragility is exactly what the composition operators below are for.

Composing Solvers

Sequential Chaining (%>>%)

Chain solvers for coarse-to-fine optimization:

# Grid search finds a good region, then gradient ascent refines
strategy <- grid_search(lower = c(-10, 0.5), upper = c(10, 5), n = 5) %>>%
  gradient_ascent(max_iter = 50)

result <- strategy(problem, theta0 = c(0, 1))
cat("Result:", result$theta.hat, "\n")
#> Result: 5.180812 1.816481

Three-Stage Refinement

# Coarse grid -> gradient ascent -> Newton-Raphson polish
strategy <- grid_search(lower = c(-10, 0.5), upper = c(10, 5), n = 5) %>>%
  gradient_ascent(max_iter = 30) %>>%
  newton_raphson(max_iter = 10)

result <- strategy(problem, theta0 = c(0, 1))
cat("Result:", result$theta.hat, "\n")
#> Result: 5.180812 1.816481
cat("Converged:", result$converged, "\n")
#> Converged: FALSE

The final stage reports converged = FALSE here, most likely because the 10-iteration Newton-Raphson budget ran out before its convergence criterion was met, even though the estimate matches the earlier runs; the conditional refinement pattern shown later addresses this.

Parallel Racing (%|%)

Race multiple methods and keep the best result:

# Try gradient-based and derivative-free methods
strategy <- gradient_ascent() %|% bfgs() %|% nelder_mead()

result <- strategy(problem, theta0 = c(0, 1))
cat("Winner:", result$solver, "\n")
#> Winner: gradient_ascent
cat("Result:", result$theta.hat, "\n")
#> Result: 5.180812 1.816481

Random Restarts

Escape local optima with multiple starting points:

strategy <- with_restarts(
  gradient_ascent(),
  n = 10,
  sampler = uniform_sampler(c(-10, 0.5), c(10, 5))
)

result <- strategy(problem, theta0 = c(0, 1))
cat("Best restart:", result$best_restart, "of", result$n_restarts, "\n")
#> Best restart: 1 of 10
cat("Result:", result$theta.hat, "\n")
#> Result: 5.180812 1.816481

Conditional Refinement

Only refine if the first solver didn’t converge:

strategy <- unless_converged(
  gradient_ascent(max_iter = 10),   # Quick attempt
  newton_raphson(max_iter = 50)     # Refine if needed
)

result <- strategy(problem, theta0 = c(0, 1))
cat("Result:", result$theta.hat, "\n")
#> Result: 5.180812 1.816481

Available Solvers

Factory            Method                            Requires
-----------------  --------------------------------  ----------------------------
gradient_ascent()  Steepest ascent with line search  score (or numerical)
newton_raphson()   Second-order Newton               score, fisher (or numerical)
bfgs()             Quasi-Newton BFGS                 score (or numerical)
lbfgsb()           L-BFGS-B with box constraints     score (or numerical)
nelder_mead()      Simplex (derivative-free)         -
grid_search()      Exhaustive grid                   -
random_search()    Random sampling                   -
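
Solvers marked "(or numerical)" fall back to a numerical approximation when the analytic score or Fisher information is not supplied, while derivative-free solvers need only the log-likelihood. As a sketch, reusing the data from the Quick Start, Nelder-Mead can be run on a problem that defines nothing but loglike:

# A loglike-only problem; derivative-free and grid solvers need nothing more
problem_nm <- mle_problem(
  loglike = function(theta) {
    if (theta[2] <= 0) return(-Inf)
    sum(dnorm(data, theta[1], theta[2], log = TRUE))
  },
  theta_names = c("mu", "sigma")
)
result_nm <- nelder_mead()(problem_nm, theta0 = c(0, 1))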

Constraints

Define domain constraints with support checking and projection:

# Positive parameters
pos_constraint <- mle_constraint(
  support = function(theta) all(theta > 0),
  project = function(theta) pmax(theta, 1e-8)
)

# Box constraints [0, 10]
box_constraint <- mle_constraint(
  support = function(theta) all(theta >= 0 & theta <= 10),
  project = function(theta) pmax(0, pmin(10, theta))
)

# Use in problem definition
problem_constrained <- mle_problem(
  loglike = function(theta) -sum((theta - 5)^2),
  constraint = pos_constraint
)
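
As a sketch, the constrained toy problem above can be handed to any solver in the usual way; the constraint's projection is intended to keep iterates inside the positive orthant (the solver choice and starting point here are illustrative):

# Hedged sketch: solve the constrained toy problem (optimum at theta = (5, 5))
result_pos <- nelder_mead()(problem_constrained, theta0 = c(1, 1))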

Function Transformers

Stochastic Gradient (Mini-batching)

For large datasets, subsample observations:

# Example large dataset (illustrative)
large_data <- rnorm(1e5, mean = 5, sd = 2)

# Original log-likelihood uses all the data
loglike_full <- function(theta, obs = large_data) {
  sum(dnorm(obs, theta[1], theta[2], log = TRUE))
}

# Stochastic version evaluates a random subset of 100 observations per call
loglike_sgd <- with_subsampling(loglike_full, data = large_data, subsample_size = 100)
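
A hedged sketch of how the subsampled objective might be used: assuming with_subsampling() returns an ordinary function of theta, it can stand in for the full log-likelihood in a problem definition.

# Sketch: plug the stochastic log-likelihood into a problem definition
problem_sgd <- mle_problem(
  loglike = loglike_sgd,
  constraint = mle_constraint(
    support = function(theta) theta[2] > 0,
    project = function(theta) c(theta[1], max(theta[2], 1e-6))
  ),
  theta_names = c("mu", "sigma")
)
result_sgd <- gradient_ascent(max_iter = 100)(problem_sgd, theta0 = c(0, 1))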

Regularization

Add penalty terms for regularization:

loglike <- function(theta) -sum(theta^2)

# L1 (Lasso), L2 (Ridge), Elastic Net
loglike_l1 <- with_penalty(loglike, penalty_l1(), lambda = 0.1)
loglike_l2 <- with_penalty(loglike, penalty_l2(), lambda = 0.1)
loglike_enet <- with_penalty(loglike, penalty_elastic_net(alpha = 0.5), lambda = 0.1)

theta <- c(1, 2, 3)
cat("Original:", loglike(theta), "\n")
#> Original: -14
cat("With L1:", loglike_l1(theta), "\n")
#> With L1: -14.6
cat("With L2:", loglike_l2(theta), "\n")
#> With L2: -15.4
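
Since the penalized objectives are ordinary functions of theta, they can be used directly as the loglike of a problem; a hedged sketch:

# Hedged sketch: maximize the L2-penalized objective with a derivative-free solver
problem_ridge <- mle_problem(loglike = loglike_l2)
result_ridge <- nelder_mead()(problem_ridge, theta0 = c(1, 2, 3))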

Tracing Optimization

Track the optimization path for diagnostics:

trace_config <- mle_trace(values = TRUE, path = TRUE, gradients = TRUE)

result <- gradient_ascent(max_iter = 20)(
  problem,
  theta0 = c(0, 1),
  trace = trace_config
)

if (!is.null(result$trace_data)) {
  cat("Iterations:", result$trace_data$total_iterations, "\n")
  cat("Final log-likelihood:", tail(result$trace_data$values, 1), "\n")
}
#> Iterations: 20 
#> Final log-likelihood: -201.5839

API Summary

Problem Specification:
  - mle_problem() - Define the estimation problem
  - mle_constraint() - Domain constraints

Solver Factories:
  - gradient_ascent(), newton_raphson(), bfgs(), lbfgsb(), nelder_mead()
  - grid_search(), random_search()

Composition:
  - %>>% - Sequential chaining
  - %|% - Parallel racing
  - with_restarts() - Multiple starting points
  - unless_converged() - Conditional refinement
  - compose() - Compose multiple solvers

Samplers:
  - uniform_sampler(), normal_sampler()

Transformers:
  - with_subsampling(), with_penalty()
  - penalty_l1(), penalty_l2(), penalty_elastic_net()
