| Type: | Package |
| Title: | Adaptive Regularization using Cubics for Optimization |
| Version: | 0.3.0 |
| Description: | Implements cubic regularization methods (ARC) for local optimization problems common in statistics and applied research. Provides robust handling of ill-conditioned, nonconvex, and indefinite Hessian problems with automatic saddle point escape. Supports box constraints; linear equality constraints are planned for a future release. |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| URL: | https://github.com/marcus-waldman/arcopt |
| BugReports: | https://github.com/marcus-waldman/arcopt/issues |
| Depends: | R (≥ 4.1.0) |
| Imports: | Rcpp (≥ 1.0.0), utils |
| LinkingTo: | Rcpp |
| Suggests: | testthat (≥ 3.0.0), knitr, rmarkdown, covr, marqLevAlg, trust |
| VignetteBuilder: | knitr |
| RoxygenNote: | 7.3.3 |
| NeedsCompilation: | yes |
| Packaged: | 2026-04-28 16:36:29 UTC; marcu |
| Author: | Marcus Waldman [aut, cre] |
| Maintainer: | Marcus Waldman <marcus.waldman@cuanschutz.edu> |
| Repository: | CRAN |
| Date/Publication: | 2026-04-29 08:10:10 UTC |
arcopt: Adaptive Regularization using Cubics for Optimization
Description
Implements cubic regularization methods (ARC) for local optimization problems common in statistics and applied research. Provides robust handling of ill-conditioned, nonconvex, and indefinite Hessian problems with automatic saddle point escape. Supports box constraints; linear equality constraints are planned for a future release.
Author(s)
Maintainer: Marcus Waldman marcus.waldman@cuanschutz.edu
See Also
Useful links:
Report bugs at https://github.com/marcus-waldman/arcopt/issues
Apply Box Constraints to Step
Description
Truncates a proposed step to ensure the new iterate stays within box constraints, then projects for numerical safety.
Usage
apply_box_constraints(x_current, s, lower, upper)
Arguments
x_current |
Current iterate |
s |
Proposed unconstrained step |
lower |
Lower bounds (use -Inf for unbounded) |
upper |
Upper bounds (use Inf for unbounded) |
Details
This implements Algorithm 7a from the design specification. The approach:
1. Compute the maximum feasible step length alpha_max.
2. Truncate the step: s_bounded = alpha_max * s.
3. Project x_k + s_bounded onto the box for numerical safety.
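A minimal base-R sketch of these three steps (a hypothetical re-implementation named `apply_box_sketch`; the packaged `apply_box_constraints()` may treat degenerate cases differently):

```r
apply_box_sketch <- function(x_current, s, lower, upper) {
  # Largest alpha in [0, 1] keeping x_current + alpha * s inside the box
  alpha <- 1
  for (i in seq_along(s)) {
    if (s[i] > 0 && is.finite(upper[i])) {
      alpha <- min(alpha, (upper[i] - x_current[i]) / s[i])
    } else if (s[i] < 0 && is.finite(lower[i])) {
      alpha <- min(alpha, (lower[i] - x_current[i]) / s[i])
    }
  }
  alpha <- max(alpha, 0)
  s_bounded <- alpha * s
  # Final clamp guards against floating-point overshoot
  x_new <- pmin(pmax(x_current + s_bounded, lower), upper)
  list(s_bounded = s_bounded, x_new = x_new)
}
```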
Value
List with components:
s_bounded |
Truncated step that respects bounds |
x_new |
New iterate projected to feasible region |
Adaptive Regularization using Cubics Optimizer
Description
Minimizes a nonlinear objective function using Adaptive Regularization with Cubics (ARC). Designed for robust optimization of ill-conditioned, nonconvex, and indefinite Hessian problems common in statistical applications.
Usage
arcopt(
x0,
fn,
gr,
hess = NULL,
...,
lower = rep(-Inf, length(x0)),
upper = rep(Inf, length(x0)),
control = list()
)
Arguments
x0 |
Numeric vector of initial parameter values (length Q). |
fn |
Function that computes the objective function value. Should take a numeric vector of length Q and return a scalar. |
gr |
Function that computes the gradient. Should take a numeric vector of length Q and return a numeric vector of length Q. Required. |
hess |
Function that computes the Hessian matrix. Should take a
numeric vector of length Q and return a Q-by-Q symmetric matrix.
Required unless control$use_qn = TRUE (see arcopt_advanced_controls). |
... |
Additional arguments passed to fn, gr, and hess. |
lower |
Numeric vector of lower bounds (length Q). Use -Inf for unbounded components. |
upper |
Numeric vector of upper bounds (length Q). Use Inf for unbounded components. |
control |
A named list of control parameters. The user-facing tolerances and switches are documented below; advanced regularization tuning, the trust-region fallback, and the quasi-Newton polish mode live on a separate help page (see arcopt_advanced_controls). |
Details
The ARC algorithm iteratively minimizes a cubic regularization model:
m_k(s) = f_k + g_k^\top s + \frac{1}{2} s^\top H_k s +
\frac{\sigma_k}{3} \|s\|^3
where \sigma_k is adapted from observed model accuracy. arcopt
may transparently fall back to a trust-region subproblem in flat-ridge
regimes and (optionally, opt-in) to a line-search BFGS polish in the
quadratic attraction basin. The transitions are observable via
result$diagnostics; the algorithmic details and tunable thresholds
are documented under arcopt_advanced_controls.
Hessian Requirement
arcopt is Hessian-centric: an analytic hess function is strongly
recommended. If the analytic form is unavailable, set
control$use_qn = TRUE to obtain Hessian-free quasi-Newton updates
(see the advanced-controls page).
Value
A list with components:
- par: Optimal parameter vector.
- value: Objective value at par.
- gradient: Gradient at par.
- hessian: Hessian at par (or the final BFGS approximation if the run ended in qn_polish mode).
- sigma: Final cubic regularization parameter.
- converged: Logical; whether convergence criteria were met.
- iterations: Number of iterations performed.
- evaluations: Named list of fn, gr, and hess evaluation counts.
- message: Convergence reason.
- trace: Per-iteration trace data (depth controlled by control$trace); NULL when trace = 0.
- diagnostics: Sublist of internal mode-dispatch diagnostics: solver_mode_final, ridge_switches, radius_final, qn_polish_switches, qn_polish_reverts, and hess_evals_at_polish_switch. See arcopt_advanced_controls for the meaning of each field. Most users do not need to inspect this; it is preserved for diagnostic and benchmarking use.
See Also
arcopt_advanced_controls for advanced tuning of
the cubic regularization, trust-region fallback, and quasi-Newton
polish mode.
Examples
# Rosenbrock function
rosenbrock <- function(x) (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2
rosenbrock_gr <- function(x) {
c(-2 * (1 - x[1]) - 400 * x[1] * (x[2] - x[1]^2),
200 * (x[2] - x[1]^2))
}
rosenbrock_hess <- function(x) {
matrix(c(
1200 * x[1]^2 - 400 * x[2] + 2, -400 * x[1],
-400 * x[1], 200
), 2, 2)
}
result <- arcopt(
x0 = c(-1.2, 1),
fn = rosenbrock,
gr = rosenbrock_gr,
hess = rosenbrock_hess
)
print(result$par) # Should be near c(1, 1)
print(result$value) # Should be near 0
Advanced Control Parameters for arcopt
Description
The main arcopt help page documents only the user-facing
tolerances and switches (maxit, gtol_abs, ftol_abs, xtol_abs,
trace, verbose, use_qn). This page documents every other entry
that the control list of arcopt() (and the routed-to
arcopt:::arcopt_qn() variant) recognizes, organized by the
subsystem each parameter governs.
Cubic regularization
These parameters control the adaptive cubic model
m_k(s) = f_k + g_k^\top s + \tfrac{1}{2} s^\top H_k s +
\tfrac{\sigma_k}{3} \|s\|^3 and the sigma adaptation rule (Algorithm
2a/2b of design/pseudocode.qmd).
- sigma0: Initial regularization parameter (default 1.0).
- sigma_min: Floor on sigma_k (default 1e-6). Prevents the cubic term from vanishing entirely on flat regions.
- sigma_max: Ceiling on sigma_k (default 1e12). Triggers emergency-stop behavior when reached.
- eta1: Step-acceptance threshold; steps with rho_k >= eta1 are accepted (default 0.1).
- eta2: Very-successful threshold; rho_k >= eta2 triggers sigma shrinkage (default 0.9).
- gamma1: Multiplicative shrink factor on a very-successful step (default 0.5).
- gamma2: Multiplicative grow factor on an unsuccessful step (default 2.0).
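To illustrate how eta1, eta2, gamma1, and gamma2 interact, here is a sketch of a rho-driven sigma update under the defaults listed above (a paraphrase, not the package's exact Algorithm 2a/2b):

```r
update_sigma <- function(sigma, rho, eta1 = 0.1, eta2 = 0.9,
                         gamma1 = 0.5, gamma2 = 2.0,
                         sigma_min = 1e-6, sigma_max = 1e12) {
  if (rho >= eta2) {
    sigma <- max(gamma1 * sigma, sigma_min)  # very successful: relax regularization
  } else if (rho < eta1) {
    sigma <- min(gamma2 * sigma, sigma_max)  # rejected step: regularize harder
  }                                          # eta1 <= rho < eta2: keep sigma
  sigma
}
```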
Trust-region fallback (cubic to TR on flat-ridge detection)
Cubic regularization can stagnate in "flat-ridge" regimes – iterations
with the regularization floor pinned, model predictions matching the
objective (\rho \approx 1), gradient stalling, and a Hessian
that is positive-definite but nearly singular. This is outside the
local-error-bound condition of Yue, Zhou & So (2018) under which
cubic regularization is guaranteed to converge quadratically at
degenerate minimizers. arcopt detects the regime and switches once
from the cubic subproblem to a trust-region subproblem.
- tr_fallback_enabled: One-way cubic-to-TR switch (default TRUE).
- tr_fallback_window: Sliding-window length for the detector (default 10).
- tr_fallback_tol_ridge: lambda_min(H) threshold defining a "near-singular PD" Hessian (default 1e-3).
- tr_fallback_rho_tol: Tolerance on |rho - 1| for the "near-perfect model" signal (default 0.1).
- tr_fallback_grad_decrease_max: Ratio of latest to oldest ||g||_inf above which the gradient counts as stagnant (default 0.9).
- tr_fallback_g_inf_floor: Absolute lower bound on ||g||_inf below which the switch will not fire; keeps the hybrid from triggering at true local minima (default 1e-6).
- tr_r0: Initial trust-region radius at the switch (default 1.0).
- tr_rmax: Maximum trust-region radius (default 100).
- tr_eta1: TR step-acceptance threshold (default 0.25).
- tr_eta2: TR expansion threshold (default 0.75).
- tr_gamma_shrink: Radius shrink factor on a poor step (default 0.25).
- tr_gamma_grow: Radius grow factor on a very-good boundary step (default 2.0).
Quasi-Newton polish (cubic to BFGS line-search)
Once the iterate enters the quadratic attraction basin of a strict
local minimum, the cubic regularization penalty has decayed to its
floor and contributes negligible damping, but arcopt still evaluates
hess() every iteration. For expensive Hessians (analytic AD via
Stan, finite differences) this dominates wall-clock time. Polish
mode replaces the cubic subproblem with a Wolfe line search along the
BFGS-approximated Newton direction, skipping further hess() calls
until convergence or until the BFGS approximation drifts.
Off by default in v0.2.0 because the existing manuscript and benchmark problems converge before the five-signal healthy-basin detector accumulates a full window. Enable it explicitly for long-running smooth problems with expensive Hessians.
- qn_polish_enabled: Enable the cubic-to-qn_polish bidirectional switch (default FALSE).
- qn_polish_window: Sliding-window length for the healthy-basin detector (default 5).
- qn_polish_rho: Minimum rho_k required throughout the window (default 0.9).
- qn_polish_lambda_min: Minimum lambda_min(H_k) required throughout the window (default 1e-3).
- qn_polish_g_decay: Maximum ratio of consecutive ||g||_inf values; e.g. 0.5 requires 2x-per-iteration contraction (default 0.5).
- qn_polish_g_inf_floor: Absolute lower bound on ||g||_inf at window start; prevents firing at convergence (default 1e-8).
- qn_polish_c1, qn_polish_c2: Wolfe line-search constants (defaults 1e-4 and 0.9).
- qn_polish_alpha_max: Initial step length tried by the line search (default 1.0).
- qn_polish_max_ls_iter: Maximum line-search evaluations per iteration (default 20).
- qn_polish_max_fail: Consecutive line-search failures that trigger a revert to cubic mode (default 3).
- qn_polish_reenter_delay: Cubic iterations required after a revert before qn_polish may re-fire (default 5).
- qn_polish_curv_eps: Curvature threshold for skipping BFGS updates to preserve positive definiteness (default 1e-10).
Quasi-Newton variant (use_qn = TRUE)
When control$use_qn = TRUE, arcopt() routes to an internal
quasi-Newton variant that approximates H_k via SR1/BFGS updates and
does not require a hess function. The following parameters apply
only when use_qn = TRUE.
- qn_method: One of "hybrid" (default), "sr1", "bfgs". "hybrid" uses state-aware routing between SR1-first and BFGS-first orderings based on the current B's eigenstructure and recent rho values.
- bfgs_tol: Curvature tolerance for the BFGS update (default 1e-10).
- sr1_skip_tol: SR1 skip-test tolerance (default 1e-8).
- sr1_restart_threshold: Consecutive SR1 skips before restart (default 5).
- qn_route_demote_rho: rho below this counts as a "bad" step in the routing FSM (default 0.25).
- qn_route_promote_rho: rho above this counts as a "good" step (default 0.5).
- qn_route_demote_k: Consecutive bad steps in "pd" routing mode that demote back to "indefinite" (default 2).
- qn_route_promote_k: Consecutive good PD steps in "indefinite" mode that promote to "pd" (default 3).
- qn_fd_refresh_k: Consecutive bad-rho iterations in "indefinite" mode that trigger an FD-Hessian refresh of B (default 3).
- qn_stuck_refresh_k: Iterations stuck in "indefinite" mode without promotion that force a refresh (default 100).
- use_accel_qn: EXPERIMENTAL. Enables Nesterov acceleration in the QN path. May improve convergence on strongly convex problems but can hurt nonconvex ones (default FALSE).
Diagnostics in result$diagnostics
Mode-dispatch diagnostics are nested under result$diagnostics so the
primary return list stays compact.
- solver_mode_final: "cubic", "tr", or "qn_polish"; which subproblem solver was active at termination.
- ridge_switches: Integer count of cubic-to-TR transitions (0 or 1 in v1; the switch is one-way).
- radius_final: Final trust-region radius (NA if the solver never switched to TR mode).
- qn_polish_switches: Integer count of cubic-to-qn_polish transitions (bidirectional; may be > 1).
- qn_polish_reverts: Integer count of qn_polish-to-cubic reversions.
- hess_evals_at_polish_switch: evaluations$hess at the first polish switch; compare against the final evaluations$hess to quantify Hessian-evaluation savings.
QN-variant runs add qn_updates, qn_skips, qn_restarts, and
qn_fd_refreshes to the same sublist.
See Also
arcopt for the user-facing entry point.
Quasi-Newton ARC Optimizer
Description
Minimizes a nonlinear objective function using Adaptive Regularization with Cubics and quasi-Newton Hessian approximations. Does not require an exact Hessian function.
Usage
arcopt_qn(
x0,
fn,
gr,
hess = NULL,
...,
lower = rep(-Inf, length(x0)),
upper = rep(Inf, length(x0)),
control = list()
)
Arguments
x0 |
Numeric vector of initial parameter values (length n). |
fn |
Function that computes the objective function value. |
gr |
Function that computes the gradient. |
hess |
Optional Hessian function for hybrid mode. If provided, initializes B_0 = H(x_0) and can refresh when approximation degrades. |
... |
Additional arguments passed to fn, gr, and hess. |
lower |
Numeric vector of lower bounds (length n). Default: all -Inf. |
upper |
Numeric vector of upper bounds (length n). Default: all Inf. |
control |
List of control parameters (see Details). |
Details
The QN-ARC algorithm maintains a quasi-Newton Hessian approximation B_k and solves the cubic regularization subproblem:
m_k(s) = f_k + g_k^T s + 1/2 s^T B_k s + sigma_k/3 ||s||^3
Initialization of B_0
When hess is not supplied, arcopt_qn() seeds the initial Hessian
approximation with a one-time finite-difference Hessian computed from
the supplied gradient at x0 (cost: 2 * length(x0) gradient
evaluations once at startup). This provides negative-curvature
information that the pure-identity initialization used by classical
quasi-Newton methods would miss on saddle-prone problems. To opt out
and recover the identity-initialization convention, pass
hess = function(x) diag(length(x)) explicitly.
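The one-time finite-difference seeding can be sketched as follows (a hypothetical helper; arcopt_qn()'s internal step-size choice may differ):

```r
# Central-difference Hessian from the gradient: 2 gradient calls per column,
# i.e. 2 * length(x) calls total, done once at startup.
fd_hessian_from_grad <- function(gr, x, h = 1e-5) {
  q <- length(x)
  H <- matrix(0, q, q)
  for (i in seq_len(q)) {
    e <- numeric(q)
    e[i] <- h
    H[, i] <- (gr(x + e) - gr(x - e)) / (2 * h)
  }
  (H + t(H)) / 2  # symmetrize to remove finite-difference asymmetry
}
```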
Control Parameters
In addition to standard arcopt controls:
- qn_method: Update method (default: "hybrid"):
  - "hybrid": State-aware routing between SR1-first and BFGS-first orderings based on the current B_k's eigenstructure and recent rho_k values (see below). Includes automatic FD refresh of B_k when SR1 is stuck. Recommended for saddle-prone problems.
  - "sr1": Symmetric Rank-1 (allows indefinite Hessians).
  - "bfgs": Standard BFGS (maintains positive definiteness).
- bfgs_tol: Curvature tolerance for BFGS in hybrid mode (default: 1e-10).
- sr1_skip_tol: SR1 skip-test tolerance (default: 1e-8).
- sr1_restart_threshold: Consecutive skips before restart (default: 5).
- use_accel_qn: EXPERIMENTAL. Enables Nesterov acceleration (default: FALSE). May improve convergence on strongly convex problems but can hurt performance on nonconvex problems. Use with caution.
Hybrid Routing Parameters
The "hybrid" method maintains an internal mode flag that starts in
"indefinite" (SR1-first priority) and promotes to "pd" (BFGS-first)
once the B_k approximation has been reliably positive definite and
cubic-model predictions have tracked the true objective:
- qn_route_demote_rho (default 0.25): rho_k below this counts as a "bad" step.
- qn_route_promote_rho (default 0.5): rho_k above this counts as a "good" step.
- qn_route_demote_k (default 2): consecutive bad steps in "pd" mode demote back to "indefinite".
- qn_route_promote_k (default 3): consecutive good PD steps in "indefinite" mode promote to "pd".
- qn_fd_refresh_k (default 3): while in "indefinite" mode, this many consecutive bad-rho steps rebuild B_k from a fresh FD Hessian.
- qn_stuck_refresh_k (default 100): while in "indefinite" mode, this many iterations without promotion also triggers an FD refresh (safety net for secondary-saddle stalls).
Trust-Region Fallback
All tr_fallback_* and tr_* control parameters from
arcopt are respected. In QN mode the flat-ridge detector
uses lambda_min(B_k) (the QN approximation's smallest eigenvalue)
as the ridge signal, consistent with the quantity the cubic subproblem
acts on. The ridge detector and the QN FD-refresh triggers are
mutually exclusive: the FD refresh fires only in "indefinite"
routing mode, while the ridge detector requires lambda_min(B_k) > 0,
so both cannot fire on the same iteration. If the one-way switch to
trust-region mode occurs, subsequent iterations no longer update the
indefinite-mode counters (since they are gated on being in cubic
mode).
Value
Same structure as arcopt. The diagnostics sublist gains four QN-specific counters in addition to the standard mode-dispatch fields:
- qn_updates: number of successful QN updates.
- qn_skips: number of skipped updates (curvature condition failed).
- qn_restarts: number of approximation restarts.
- qn_fd_refreshes: number of FD Hessian refreshes performed (hybrid mode only; 0 otherwise).
qn_polish_switches, qn_polish_reverts, and
hess_evals_at_polish_switch are always 0L/NA for QN runs
(the polish mode is exact-Hessian only).
Check Convergence Criteria (Stan-style)
Description
Checks multiple convergence criteria following Stan's L-BFGS implementation. Termination occurs when any criterion is satisfied, preventing premature termination on ill-scaled problems while ensuring robust detection.
Usage
check_convergence(
g_current,
f_current,
f_previous,
x_current,
x_previous,
iter,
tol_grad = 1e-08,
tol_rel_grad = 1e-06,
tol_obj = 1e-12,
tol_rel_obj = 1e-08,
tol_param = 1e-08,
max_iter = 1000
)
Arguments
g_current |
Current gradient vector (length Q) |
f_current |
Current objective function value (scalar) |
f_previous |
Previous objective function value (scalar), NA for first iteration |
x_current |
Current parameter vector (length Q) |
x_previous |
Previous parameter vector (length Q), NULL for first iteration |
iter |
Current iteration number (0-indexed) |
tol_grad |
Absolute gradient tolerance |
tol_rel_grad |
Relative gradient tolerance |
tol_obj |
Absolute objective change tolerance |
tol_rel_obj |
Relative objective change tolerance |
tol_param |
Parameter change tolerance (infinity norm) |
max_iter |
Maximum iterations allowed |
Details
The function checks 6 convergence criteria in order:
1. max_iter: iteration limit reached.
2. gradient_abs: max(abs(g)) < tol_grad.
3. gradient_rel: max(abs(g)) < tol_rel_grad * max(1, abs(f)).
4. objective_abs: abs(f_current - f_previous) < tol_obj.
5. objective_rel: abs(f_current - f_previous) < tol_rel_obj * max(1, abs(f_current)).
6. parameter: max(abs(x_current - x_previous)) < tol_param.
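The six checks above can be sketched as a single dispatch function (a simplification; the packaged check_convergence() also guards the first-iteration NA/NULL inputs and returns a two-component list):

```r
# Returns the name of the first satisfied criterion, or "" if none fire.
conv_reason <- function(g, f, f_prev, x, x_prev, iter,
                        tol_grad = 1e-8, tol_rel_grad = 1e-6,
                        tol_obj = 1e-12, tol_rel_obj = 1e-8,
                        tol_param = 1e-8, max_iter = 1000) {
  if (iter >= max_iter) return("max_iter")
  if (max(abs(g)) < tol_grad) return("gradient_abs")
  if (max(abs(g)) < tol_rel_grad * max(1, abs(f))) return("gradient_rel")
  if (abs(f - f_prev) < tol_obj) return("objective_abs")
  if (abs(f - f_prev) < tol_rel_obj * max(1, abs(f))) return("objective_rel")
  if (max(abs(x - x_prev)) < tol_param) return("parameter")
  ""  # not converged
}
```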
Value
List with two components:
converged |
Logical; TRUE if any criterion satisfied |
reason |
Character; description of which criterion triggered, or "" if not converged |
Protect Against NaN/Inf in Function Evaluations
Description
Detects NaN or Inf values in function and gradient evaluations and attempts recovery by increasing regularization to produce smaller steps.
Usage
check_finite(f_value, g_value)
Arguments
f_value |
Function value to check |
g_value |
Gradient vector to check |
Details
This is a simple check function. The recovery logic (re-solving with larger sigma) should be implemented in the main optimization loop.
Checks that:
- the function value is finite (not NaN or Inf), and
- all gradient components are finite.
Value
Logical; TRUE if values are finite and valid, FALSE if NaN/Inf detected
Check Flat-Ridge Trigger
Description
Evaluates the four-signal rule against the current sliding window.
Usage
check_flat_ridge_trigger(
state,
sigma_min,
tol_ridge = 0.001,
rho_tol = 0.1,
grad_decrease_max = 0.9,
g_inf_floor = 1e-06
)
Arguments
state |
List returned by init_flat_ridge_state and updated each iteration. |
sigma_min |
Minimum regularization floor used by the orchestrator. |
tol_ridge |
Upper bound on lambda_min(H) defining a near-singular Hessian. |
rho_tol |
Maximum |rho - 1| for the near-perfect-model signal. |
grad_decrease_max |
Ratio of latest to oldest ||g||_inf above which the gradient counts as stagnant. |
g_inf_floor |
Absolute lower bound on ||g||_inf below which the trigger will not fire. |
Value
TRUE if the trust-region fallback should be activated,
FALSE otherwise. Returns FALSE whenever the window is
not yet full or any diagnostic is non-finite.
Check Healthy-Basin Trigger
Description
Evaluates the five-signal rule against the current sliding window.
Usage
check_healthy_basin_trigger(
state,
rho_polish = 0.9,
lambda_min_polish = 0.001,
g_decay_polish = 0.5,
g_inf_floor_polish = 1e-08
)
Arguments
state |
List returned by init_healthy_basin_state and updated each iteration. |
rho_polish |
Minimum acceptable rho_k throughout the window. |
lambda_min_polish |
Minimum acceptable lambda_min(H_k) throughout the window. |
g_decay_polish |
Maximum acceptable ratio of consecutive ||g||_inf values. |
g_inf_floor_polish |
Absolute lower bound on ||g||_inf at window start. |
Value
TRUE if the qn_polish transition should be activated,
FALSE otherwise. Returns FALSE whenever the window is
not yet full or any diagnostic is non-finite.
Detect Stagnation and Recommend Action
Description
Monitors optimization progress and detects stagnation based on small consecutive steps or function changes. Returns recommended action: continue, refresh Hessian, or stop.
Usage
detect_stagnation(
step_norms,
f_values,
tol_step = 1e-12,
tol_f = 1e-14,
max_stagnant = 5,
is_qn = FALSE,
already_refreshed = FALSE
)
Arguments
step_norms |
Vector of recent step norms (most recent last) |
f_values |
Vector of recent function values (most recent last) |
tol_step |
Step size tolerance for stagnation (default: 1e-12) |
tol_f |
Function change tolerance for stagnation (default: 1e-14) |
max_stagnant |
Maximum consecutive stagnant iterations before action (default: 5) |
is_qn |
Logical; TRUE if using quasi-Newton (enables Hessian refresh option) |
already_refreshed |
Logical; TRUE if Hessian was already refreshed once |
Details
Stagnation is detected when either:
- step norms are consecutively below tol_step for max_stagnant iterations, or
- function changes are consecutively below tol_f for max_stagnant iterations.
When stagnation is detected:
- if using quasi-Newton and the Hessian has not already been refreshed, return "refresh_hessian";
- otherwise, return "stop" (declare convergence failure).
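A sketch of this decision logic under the stated defaults (hypothetical helper name; the packaged detect_stagnation() may count stagnant iterations differently):

```r
stagnation_action <- function(step_norms, f_values, tol_step = 1e-12,
                              tol_f = 1e-14, max_stagnant = 5,
                              is_qn = FALSE, already_refreshed = FALSE) {
  n <- max_stagnant
  recent_steps <- tail(step_norms, n)                 # last n step norms
  recent_df    <- abs(diff(tail(f_values, n + 1)))    # last n objective changes
  stagnant <- (length(recent_steps) >= n && all(recent_steps < tol_step)) ||
              (length(recent_df)    >= n && all(recent_df    < tol_f))
  if (!stagnant) return("continue")
  if (is_qn && !already_refreshed) "refresh_hessian" else "stop"
}
```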
Value
Character; one of "continue", "refresh_hessian", or "stop"
Flat-Ridge Detector for Cubic-to-Trust-Region Fallback
Description
Detects when cubic regularization has entered a flat-ridge stagnation regime: gradient stalls in a near-singular positive-definite region where the cubic penalty has no bite. Triggers the trust-region fallback in the main arcopt iteration when fired.
Details
The detector maintains a sliding window of runtime diagnostics and fires when all four of the following signals hold throughout the window:
- Sigma pinned at floor: sigma_k <= 10 * sigma_min for every iteration in the window.
- Near-perfect model: |rho_k - 1| < rho_tol for every iteration in the window (model reduction matches actual reduction).
- Gradient stagnant: the ratio of the most recent ||g||_inf to the oldest in the window exceeds grad_decrease_max (less than a 10 percent decrease over the window by default).
- Hessian ill-conditioned: the smallest Hessian eigenvalue at the most recent iteration satisfies lambda_min < tol_ridge. This includes both the classical "flat-ridge" regime (small positive lambda_min) and the "stuck-at-indefinite-saddle" regime (any negative lambda_min), because cubic regularization loses its grip in both.
An absolute floor g_inf_floor is also enforced: the trigger will
not fire if ||g||_inf is already below the typical convergence
tolerance, preventing spurious triggers at true local minima.
The detector is an empirical proxy for local-error-bound (EB) violation (Yue, Zhou, So 2018). Under EB, cubic regularization attains Q-quadratic convergence even at degenerate minima; the detector fires precisely in the regime where EB is practically vacuous and no theoretical convergence guarantee applies.
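The four-signal window test can be sketched over plain vectors of per-iteration diagnostics (the vector-based interface here is hypothetical; the package stores these in a detector state list):

```r
# Each argument except the tolerances is a vector spanning the full window,
# oldest first. Returns TRUE when all four signals (plus the floor guard) hold.
flat_ridge_fires <- function(sigma, rho, g_inf, lambda_min, sigma_min,
                             tol_ridge = 1e-3, rho_tol = 0.1,
                             grad_decrease_max = 0.9, g_inf_floor = 1e-6) {
  all(sigma <= 10 * sigma_min) &&                          # sigma pinned at floor
    all(abs(rho - 1) < rho_tol) &&                         # near-perfect model
    tail(g_inf, 1) / head(g_inf, 1) > grad_decrease_max && # gradient stagnant
    tail(lambda_min, 1) < tol_ridge &&                     # near-singular or indefinite
    tail(g_inf, 1) > g_inf_floor                           # not already converged
}
```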
Healthy-Basin Detector for Cubic-to-QN-Polish Transition
Description
Detects when the iterate has entered the quadratic attraction basin of a strict local minimum, at which point arcopt can switch from cubic regularization to line-search BFGS (qn_polish mode) and skip further Hessian evaluations for the remainder of the run.
Details
The detector maintains a sliding window of runtime diagnostics and fires when all five of the following signals hold throughout the window:
- Newton step accepted: step_type == "newton" at every iteration in the window. Indicates the cubic subproblem has not been invoked (Cholesky succeeded and the Newton step passed the ratio test).
- High ratio: rho_k >= rho_polish (default 0.9) throughout the window. The quadratic model is accurately predicting actual reduction.
- Hessian well-conditioned PD: lambda_min(H_k) >= lambda_min_polish (default 1e-3) throughout the window. Not just "Cholesky succeeded" but strictly bounded away from zero.
- Superlinear gradient decay: ||g_k||_inf / ||g_{k-1}||_inf <= g_decay_polish (default 0.5) for every consecutive pair in the window. This is the strongest signal of the quadratic attraction basin.
- Above convergence floor: ||g_0||_inf > g_inf_floor_polish at window start (default 1e-8). Prevents firing at convergence where Newton-accept and gradient-decay are trivially satisfied.
Contrast with the flat-ridge detector (flat_ridge): that detector fires on stagnation (gradient not decreasing); this one fires on healthy convergence (gradient decreasing superlinearly). The two are expected to be mutually exclusive in practice.
Initialize Flat-Ridge Detector State
Description
Initialize Flat-Ridge Detector State
Usage
init_flat_ridge_state(window = 10)
Arguments
window |
Integer sliding-window length (default 10) |
Value
A list with empty diagnostic vectors and the window size.
See Also
update_flat_ridge_state,
check_flat_ridge_trigger
Initialize Healthy-Basin Detector State
Description
Initialize Healthy-Basin Detector State
Usage
init_healthy_basin_state(window = 5)
Arguments
window |
Integer sliding-window length (default 5; shorter than the flat-ridge detector's 10 because the basin signal is stronger and revert-on-false-positive is cheap). |
Value
A list with empty diagnostic vectors and the window size.
See Also
update_healthy_basin_state,
check_healthy_basin_trigger
Project Point to Box Constraints
Description
Projects a point to the feasible region defined by box constraints.
Usage
project_to_box(x, lower, upper)
Arguments
x |
Point to project |
lower |
Lower bounds |
upper |
Upper bounds |
Details
Component-wise projection: each component is clamped to its bounds
Value
Projected point within bounds
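In base R the component-wise clamp is a one-liner (shown as a sketch named project_to_box_sketch; the packaged version may add input validation):

```r
# Clamp each component of x to [lower, upper]
project_to_box_sketch <- function(x, lower, upper) pmin(pmax(x, lower), upper)
```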
Solve Cubic Subproblem via Eigendecomposition (Algorithm 5a)
Description
Solves min_s m(s) = g^T s + 1/2 s^T H s + sigma/3 ||s||^3 using full eigendecomposition with explicit hard-case detection.
Usage
solve_cubic_eigen(
g,
H,
sigma,
max_lambda_iter = 50,
lambda_tol = 1e-10,
hard_case_tol = 1e-08
)
Arguments
g |
Gradient vector (length n) |
H |
Hessian matrix (n x n symmetric, may be indefinite) |
sigma |
Regularization parameter (positive scalar) |
max_lambda_iter |
Maximum secular equation iterations (default: 50) |
lambda_tol |
Secular equation convergence tolerance (default: 1e-10) |
hard_case_tol |
Hard case detection threshold (default: 1e-8) |
Details
Uses eigen() for O(n^3) eigendecomposition. Hard case occurs when gradient is nearly orthogonal to smallest eigenvector (|g_1| < tol_hard * ||g||). Recommended for n <= 500 due to cubic computational cost.
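For the easy case only, the reduction to a scalar equation can be sketched with a generic root finder standing in for the secular Newton iteration the solver actually uses (hard-case and zero-gradient branches omitted; helper name hypothetical):

```r
# Easy-case cubic subproblem: find lambda > max(0, -lambda_min(H)) with
# lambda = sigma * ||s(lambda)||, where s(lambda) = -(H + lambda I)^{-1} g
# expressed in the eigenbasis of H.
solve_cubic_easy <- function(g, H, sigma) {
  eig <- eigen(H, symmetric = TRUE)
  d  <- eig$values
  gt <- drop(crossprod(eig$vectors, g))          # gradient in eigenbasis
  norm_s <- function(lam) sqrt(sum((gt / (d + lam))^2))
  lo <- max(0, -min(d)) + 1e-12                  # lambda must exceed -lambda_min
  f <- function(lam) lam - sigma * norm_s(lam)   # monotone in lambda
  lam <- uniroot(f, lower = lo, upper = lo + 1e6, tol = 1e-10)$root
  s <- -drop(eig$vectors %*% (gt / (d + lam)))
  list(s = s, lambda = lam)
}
```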
Value
List with:
s |
Solution vector |
lambda |
Lagrange multiplier |
pred_reduction |
Predicted reduction: -g^T s - 1/2 s^T H s - sigma/3 ||s||^3 |
converged |
TRUE if secular equation converged |
case_type |
"easy", "hard", or "zero_gradient" |
eigenvalues |
Eigenvalues of H. |
Solve Cubic Subproblem with Automatic Solver Selection
Description
Unified dispatcher for cubic subproblem solvers with automatic solver selection based on problem size and Hessian representation.
Usage
solve_cubic_subproblem_dispatch(g, H = NULL, sigma, solver = "auto", ...)
Arguments
g |
Gradient vector (length n) |
H |
Hessian matrix (n x n symmetric, may be indefinite) |
sigma |
Regularization parameter (positive scalar) |
solver |
Solver to use: "auto" (default) or "eigen". Auto-selection uses eigendecomposition (Algorithm 5a). |
... |
Additional arguments passed to specific solvers |
Details
Uses eigendecomposition-based solver (Algorithm 5a) which provides robust handling of indefinite Hessians and hard cases.
For large-scale matrix-free optimization (n > 500), see design/scalable-arcs.qmd for deferred large-scale solver implementations.
Value
List with:
s |
Solution vector |
lambda |
Lagrange multiplier |
pred_reduction |
Predicted reduction |
converged |
TRUE if solver converged |
solver_used |
Name of solver that was used |
Solve Trust-Region Subproblem via Eigendecomposition (Algorithm 5c)
Description
Solves min_s m(s) = g^T s + 1/2 s^T H s subject to ||s|| <= radius using full eigendecomposition with explicit hard-case detection.
Usage
solve_tr_eigen(
g,
H,
radius,
max_lambda_iter = 50,
lambda_tol = 1e-10,
hard_case_tol = 1e-08
)
Arguments
g |
Gradient vector (length n) |
H |
Hessian matrix (n x n symmetric, may be indefinite) |
radius |
Trust-region radius (positive scalar) |
max_lambda_iter |
Maximum secular equation iterations (default: 50) |
lambda_tol |
Secular equation convergence tolerance (default: 1e-10) |
hard_case_tol |
Hard case detection threshold (default: 1e-8) |
Details
This is the trust-region counterpart to solve_cubic_eigen. The only
structural difference is the constraint: the cubic solver enforces
lambda = sigma * ||s||, while the trust-region solver enforces
||s|| <= radius. Both reduce to (H + lambda*I) s = -g with
lambda >= max(0, -lambda_min(H)).
Three cases are handled:
- Interior: H is positive definite and the unconstrained Newton step -H^{-1} g lies inside the trust region. Returned with lambda = 0 and on_boundary = FALSE.
- Easy boundary: a unique lambda > max(0, -lambda_min) places ||s(lambda)|| = radius. Found by Newton iteration on the secular equation.
- Hard case: the Hessian is indefinite and the gradient is nearly orthogonal to the smallest-eigenvalue eigenvector. Resolved per More-Sorensen by setting lambda = -lambda_min(H) and adding a component along the smallest eigenvector to reach the boundary.
Recommended for n <= 500 due to O(n^3) eigendecomposition cost.
Value
List with:
s |
Solution vector |
lambda |
Lagrange multiplier (0 if interior Newton step) |
pred_reduction |
Predicted reduction: -g^T s - 1/2 s^T H s |
converged |
TRUE if solver returned a valid step |
case_type |
"interior", "easy", "hard", or "zero_gradient" |
on_boundary |
TRUE if ||s|| == radius (active constraint) |
Try Newton Step
Description
Attempts to compute and validate a Newton step when the Hessian is positive definite. Uses Cholesky factorization to check positive definiteness and solve the Newton system.
Usage
try_newton_step(g, H)
Arguments
g |
Gradient vector (length n) |
H |
Hessian matrix (n x n symmetric matrix) |
Details
The algorithm:
1. Attempts Cholesky factorization of H (fails if H is indefinite).
2. If successful, solves H * s = -g for the Newton step.
3. Computes the predicted reduction from the quadratic model.
No pre-filtering based on sigma is required - the ratio test in the main
ARC loop naturally determines whether Newton steps are appropriate. See
the design notes in literature/consensus_reviews/ for the rationale
behind not imposing an explicit sigma threshold on the Newton step.
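The Cholesky-gated logic can be sketched directly (a hypothetical re-implementation; the packaged try_newton_step() may differ in failure handling):

```r
try_newton_sketch <- function(g, H) {
  R <- tryCatch(chol(H), error = function(e) NULL)  # fails if H is not PD
  if (is.null(R)) {
    return(list(success = FALSE, s = NULL, pred_reduction = NULL))
  }
  # H = R'R, so solve R'y = -g then R s = y
  s <- backsolve(R, forwardsolve(t(R), -g))
  pred <- -sum(g * s) - 0.5 * sum(s * (H %*% s))    # quadratic-model reduction
  list(success = TRUE, s = s, pred_reduction = pred)
}
```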
Value
List with components:
success |
Logical; TRUE if Newton step computed successfully |
s |
Newton step vector if successful, NULL otherwise |
pred_reduction |
Predicted reduction from quadratic model if successful, NULL otherwise |
BFGS (Broyden-Fletcher-Goldfarb-Shanno) Hessian Update
Description
Updates the Hessian approximation using the BFGS formula. BFGS maintains positive definiteness when the curvature condition y'*s > 0 is satisfied.
Usage
update_bfgs(b, s, y)
Arguments
b |
Current Hessian approximation (n x n symmetric positive definite matrix). |
s |
Step vector: s = x_new - x_old. |
y |
Gradient difference: y = g_new - g_old. |
Details
The BFGS update formula is:
B_{k+1} = B_k - \frac{B_k s s^T B_k}{s^T B_k s} + \frac{y y^T}{y^T s}
The update is skipped when the curvature condition y^T s > 0 is
violated. This can happen on nonconvex problems. When skipped, the previous
approximation is retained.
Note: BFGS maintains positive definiteness, so the approximation cannot capture the indefinite curvature that ARC is designed to exploit. Consider SR1 for nonconvex problems.
Value
A list with components:
b |
Updated Hessian approximation (n x n matrix). |
skipped |
Logical indicating if update was skipped due to curvature condition violation. |
References
Algorithm 4a in design/pseudocode.qmd
Nocedal, J., & Wright, S. J. (2006). Numerical Optimization (2nd ed.). Springer. Section 6.1.
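The update formula above can be sketched as follows (an illustrative reimplementation, not the package source; the skip test here is the plain condition y's > 0 rather than any package-internal tolerance):

```r
# Sketch: BFGS update with curvature-condition skip (illustrative).
update_bfgs <- function(b, s, y) {
  sy <- sum(s * y)
  if (sy <= 0) {
    return(list(b = b, skipped = TRUE))   # curvature condition y's > 0 violated
  }
  bs <- drop(b %*% s)
  b_new <- b - tcrossprod(bs) / sum(s * bs) + tcrossprod(y) / sy
  list(b = b_new, skipped = FALSE)
}
```

The updated matrix satisfies the secant condition B_{k+1} s = y, which is a convenient correctness check.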
Powell-Damped BFGS Update
Description
Applies Powell damping to ensure the curvature condition is satisfied, then performs a BFGS update. This guarantees an update even when the standard curvature condition y's > 0 fails.
Usage
update_bfgs_powell(b, s, y, damping_threshold = 0.2)
Arguments
b |
Current Hessian approximation (n x n symmetric matrix). |
s |
Step vector: s = x_new - x_old. |
y |
Gradient difference: y = g_new - g_old. |
damping_threshold |
Threshold for damping (default: 0.2). Damping applied if y's < damping_threshold * s'Bs. |
Details
Powell damping modifies y to y_damped = theta*y + (1-theta)*Bs such that s'*y_damped >= damping_threshold * s'*Bs, ensuring positive curvature.
The formula for theta when y's < damping_threshold * s'Bs is: theta = (1 - damping_threshold) * s'Bs / (s'Bs - y's)
This ensures s'*y_damped = damping_threshold * s'Bs exactly.
Value
A list with components:
b |
Updated Hessian approximation (n x n matrix). |
theta |
Damping parameter used (1.0 = no damping). |
y_damped |
The damped y vector used in update. |
skipped |
Logical indicating if update was skipped (only if s'Bs is too small to apply damping). |
References
Powell, M. J. D. (1978). A fast algorithm for nonlinearly constrained optimization calculations. Numerical Analysis, 144-157.
Algorithm 4a-hybrid in design/pseudocode.qmd
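The damping step can be sketched as follows (powell_damp is a hypothetical helper, not the package source; it computes theta and y_damped, after which the standard BFGS formula is applied with y_damped in place of y):

```r
# Sketch: Powell damping (illustrative). Blends y toward B s until the damped
# curvature s' y_damped meets damping_threshold * s' B s.
powell_damp <- function(b, s, y, damping_threshold = 0.2) {
  bs <- drop(b %*% s)
  sbs <- sum(s * bs)
  sy <- sum(s * y)
  theta <- if (sy < damping_threshold * sbs) {
    (1 - damping_threshold) * sbs / (sbs - sy)  # makes s'y_damped = threshold * s'Bs
  } else {
    1                                           # curvature already adequate
  }
  y_damped <- theta * y + (1 - theta) * bs
  list(theta = theta, y_damped = y_damped)
}
```

Substituting theta into s'y_damped = theta*s'y + (1 - theta)*s'Bs gives exactly damping_threshold * s'Bs, confirming the identity stated above.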
Update Flat-Ridge Detector State with One Iteration's Diagnostics
Description
Pushes a new record onto the sliding window and trims the oldest entry once the window is full.
Usage
update_flat_ridge_state(state, sigma, rho, g_inf, lambda_min)
Arguments
state |
Flat-ridge detector state list, as returned by its initializer. |
sigma |
Current regularization parameter. |
rho |
Current step-acceptance ratio. |
g_inf |
Current gradient infinity norm ||g||_inf. |
lambda_min |
Current smallest Hessian eigenvalue (may be negative for indefinite Hessians; pass NA if unavailable). |
Value
The updated state list.
Update Healthy-Basin Detector State with One Iteration's Diagnostics
Description
Pushes a new record onto the sliding window and trims the oldest entry once the window is full.
Usage
update_healthy_basin_state(state, used_newton, rho, lambda_min, g_inf)
Arguments
state |
Healthy-basin detector state list, as returned by its initializer. |
used_newton |
Logical; TRUE if the accepted step was a Newton step. |
rho |
Current step-acceptance ratio (NA if step was rejected). |
lambda_min |
Current smallest Hessian eigenvalue. |
g_inf |
Current gradient infinity norm ||g||_inf. |
Value
The updated state list.
Hybrid BFGS/SR1 Update with Automatic Routing
Description
Attempts BFGS first, falls back to SR1, then to Powell-damped BFGS. Provides robust quasi-Newton updates for both convex and nonconvex regions.
Usage
update_hybrid(
b,
s,
y,
bfgs_tol = 1e-10,
sr1_skip_tol = 1e-08,
skip_count = 0L,
restart_threshold = 5L
)
Arguments
b |
Current Hessian approximation (n x n symmetric matrix). |
s |
Step vector: s = x_new - x_old. |
y |
Gradient difference: y = g_new - g_old. |
bfgs_tol |
Tolerance for BFGS curvature condition (default: 1e-10). |
sr1_skip_tol |
Tolerance for SR1 skip test (default: 1e-8). |
skip_count |
Current count of consecutive skipped updates. |
restart_threshold |
Number of consecutive skips before restart (default: 5). |
Details
Routing priority:
BFGS if y's > bfgs_tol (stable, positive definite)
SR1 if |r's| >= sr1_skip_tol * ||r|| * ||s|| (allows indefiniteness)
Powell-damped BFGS otherwise (guaranteed update)
The hybrid approach is ideal for problems that transition between convex and nonconvex regions.
Value
A list with components:
b |
Updated Hessian approximation (n x n matrix). |
update_type |
Character: "bfgs", "sr1", "powell", or "skipped". |
skipped |
Logical indicating if no update was applied. |
skip_count |
Updated skip counter (reset on successful update). |
restarted |
Logical indicating if b was reset to scaled identity. |
theta |
Powell damping parameter (only if update_type = "powell"). |
References
Algorithm 4a-hybrid in design/pseudocode.qmd
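The routing priority can be sketched as a small dispatcher (route_update is a hypothetical helper, not the package source; it returns only the chosen branch, with the update formulas as documented in the BFGS and SR1 sections):

```r
# Sketch: routing priority for the hybrid update (illustrative).
route_update <- function(b, s, y, bfgs_tol = 1e-10, sr1_skip_tol = 1e-8) {
  sy <- sum(s * y)
  if (sy > bfgs_tol) return("bfgs")            # stable, positive definite update
  r <- y - drop(b %*% s)
  if (abs(sum(r * s)) >= sr1_skip_tol * sqrt(sum(r^2)) * sqrt(sum(s^2))) {
    return("sr1")                              # indefiniteness allowed
  }
  "powell"                                     # guaranteed damped update
}
```

A step with positive curvature routes to BFGS; negative curvature with a usable SR1 denominator routes to SR1; only when both tests fail does Powell damping take over.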
Hybrid Update with SR1-First Routing
Description
Reverse priority of update_hybrid: tries SR1 first (to preserve indefinite curvature information), then BFGS, then Powell-damped BFGS. Used by arcopt_qn when the current Hessian approximation is indefinite or when recent BFGS predictions have been inaccurate.
Usage
update_hybrid_sr1_first(
b,
s,
y,
bfgs_tol = 1e-10,
sr1_skip_tol = 1e-08,
skip_count = 0L,
restart_threshold = 5L
)
Arguments
b |
Current Hessian approximation (n x n symmetric matrix). |
s |
Step vector: s = x_new - x_old. |
y |
Gradient difference: y = g_new - g_old. |
bfgs_tol |
Tolerance for BFGS curvature condition (default: 1e-10). |
sr1_skip_tol |
Tolerance for SR1 skip test (default: 1e-8). |
skip_count |
Current count of consecutive skipped updates. |
restart_threshold |
Number of consecutive skips before restart (default: 5). |
Value
Same structure as update_hybrid.
Update Regularization Parameter (Classical CGT)
Description
Adaptively adjusts the regularization parameter sigma based on how well the cubic model predicted the actual function decrease. Follows the classical Cartis-Gould-Toint (2011) update strategy.
Usage
update_sigma_cgt(
sigma_current,
rho,
eta1 = 0.1,
eta2 = 0.9,
gamma1 = 0.5,
gamma2 = 2,
sigma_min = 1e-16,
sigma_max = 1e+16
)
Arguments
sigma_current |
Current regularization parameter (positive scalar) |
rho |
Ratio of actual to predicted reduction (scalar) |
eta1 |
Acceptance threshold (default: 0.1). Steps with rho >= eta1 are accepted. |
eta2 |
Very successful threshold (default: 0.9). Steps with rho >= eta2 trigger sigma decrease. |
gamma1 |
Sigma decrease factor for very successful steps (default: 0.5) |
gamma2 |
Sigma increase factor for unsuccessful steps (default: 2.0) |
sigma_min |
Minimum allowed sigma (default: 1e-16) |
sigma_max |
Maximum allowed sigma (default: 1e16) |
Details
The update strategy is:
- Very successful (rho >= eta2): Decrease sigma by factor gamma1 (trust model more)
- Moderately successful (eta1 <= rho < eta2): Keep sigma unchanged
- Unsuccessful (rho < eta1): Increase sigma by factor gamma2 (trust model less)
The ratio rho measures model quality: rho = (f(x) - f(x+s)) / pred_reduction. Values near 1 indicate excellent model accuracy; values near 0 or negative indicate poor model fit.
Value
Updated sigma value (scalar)
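The three cases translate directly into code; the sketch below is an illustrative reimplementation of the documented rules, not the package source:

```r
# Sketch: classical CGT sigma update (illustrative).
update_sigma_cgt <- function(sigma, rho, eta1 = 0.1, eta2 = 0.9,
                             gamma1 = 0.5, gamma2 = 2,
                             sigma_min = 1e-16, sigma_max = 1e16) {
  if (rho >= eta2) {
    max(sigma_min, gamma1 * sigma)      # very successful: trust model more
  } else if (rho >= eta1) {
    sigma                               # moderately successful: keep sigma
  } else {
    min(sigma_max, gamma2 * sigma)      # unsuccessful: trust model less
  }
}
```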
Update Regularization Parameter (Interpolation-Enhanced CGT)
Description
Adaptively adjusts sigma using interpolation to estimate local Lipschitz constant from model-function discrepancy. May provide faster adaptation on large-scale problems with varying curvature (Gould, Porcelli, Toint, 2012).
Usage
update_sigma_interpolation(
sigma_current,
rho,
s,
f_trial,
m_trial,
eta1 = 0.1,
eta2 = 0.9,
gamma1 = 0.5,
gamma2 = 2,
sigma_min = 1e-16,
sigma_max = 1e+16
)
Arguments
sigma_current |
Current regularization parameter (positive scalar) |
rho |
Ratio of actual to predicted reduction (scalar) |
s |
Current step vector |
f_trial |
Function value at trial point f(x + s) |
m_trial |
Model value at step m(s) |
eta1 |
Acceptance threshold (default: 0.1) |
eta2 |
Very successful threshold (default: 0.9) |
gamma1 |
Sigma decrease factor for very successful steps (default: 0.5) |
gamma2 |
Sigma increase factor for unsuccessful steps (default: 2.0) |
sigma_min |
Minimum allowed sigma (default: 1e-16) |
sigma_max |
Maximum allowed sigma (default: 1e16) |
Details
Estimates local Lipschitz constant from discrepancy between function and model: L_hat = 2 * |f(x+s) - m(s)| / ||s||^3
The update strategy incorporates this estimate:
- Very successful (rho >= eta2): sigma = max(L_hat/2, sigma/gamma1, sigma_min)
- Moderately successful (eta1 <= rho < eta2): Keep sigma unchanged
- Unsuccessful (rho < eta1): sigma = min(max(L_hat/2, gamma2*sigma), sigma_max)
This may provide better adaptation than Algorithm 2a (classical CGT) when curvature varies significantly across the domain.
Value
Updated sigma value (scalar)
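The sketch below mirrors the documented formulas verbatim (an illustrative reimplementation with a hypothetical name, not the package source):

```r
# Sketch: interpolation-enhanced sigma update (illustrative). Estimates a
# local Lipschitz constant from the model/function discrepancy.
update_sigma_interp <- function(sigma, rho, s, f_trial, m_trial,
                                eta1 = 0.1, eta2 = 0.9,
                                gamma1 = 0.5, gamma2 = 2,
                                sigma_min = 1e-16, sigma_max = 1e16) {
  L_hat <- 2 * abs(f_trial - m_trial) / sum(s^2)^1.5  # 2|f(x+s) - m(s)| / ||s||^3
  if (rho >= eta2) {
    max(L_hat / 2, sigma / gamma1, sigma_min)         # very successful
  } else if (rho >= eta1) {
    sigma                                             # moderately successful
  } else {
    min(max(L_hat / 2, gamma2 * sigma), sigma_max)    # unsuccessful
  }
}
```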
SR1 (Symmetric Rank-1) Hessian Update
Description
Updates the Hessian approximation using the SR1 formula. Unlike BFGS, SR1 can represent indefinite matrices, making it ideal for ARC which naturally handles negative curvature.
Usage
update_sr1(b, s, y, skip_tol = 1e-08, skip_count = 0L, restart_threshold = 5L)
Arguments
b |
Current Hessian approximation (n x n symmetric matrix). |
s |
Step vector: s = x_new - x_old. |
y |
Gradient difference: y = g_new - g_old. |
skip_tol |
Tolerance for skip test (default: 1e-8). Update is skipped if |r'*s| < skip_tol * ||r|| * ||s||. |
skip_count |
Current count of consecutive skipped updates. |
restart_threshold |
Number of consecutive skips before restart (default: 5). |
Details
The SR1 update formula is:
B_{k+1} = B_k + \frac{r r^T}{r^T s}
where r = y - B_k s.
The update is skipped when the denominator is too small relative to the norms of r and s, which prevents numerical instability. After too many consecutive skips, the approximation is reset using Barzilai-Borwein scaling.
Value
A list with components:
b |
Updated Hessian approximation (n x n matrix). |
skipped |
Logical indicating if update was skipped. |
skip_count |
Updated skip counter (reset to 0 if update applied or restart triggered). |
restarted |
Logical indicating if b was reset to scaled identity. |
References
Algorithm 4a in design/pseudocode.qmd
Conn, A. R., Gould, N. I., & Toint, P. L. (1991). Convergence of quasi-Newton matrices generated by the symmetric rank one update. Mathematical Programming, 50(1-3), 177-195.
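The update and skip test can be sketched as follows (an illustrative reimplementation without the restart logic, not the package source):

```r
# Sketch: SR1 update with the documented skip test (illustrative).
update_sr1 <- function(b, s, y, skip_tol = 1e-8) {
  r <- y - drop(b %*% s)
  rs <- sum(r * s)
  if (abs(rs) < skip_tol * sqrt(sum(r^2)) * sqrt(sum(s^2))) {
    return(list(b = b, skipped = TRUE))   # denominator too small: skip update
  }
  list(b = b + tcrossprod(r) / rs, skipped = FALSE)
}
```

As with BFGS, an applied SR1 update satisfies the secant condition B_{k+1} s = y, but B_{k+1} may be indefinite.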
Validate and Project Initial Point
Description
Validates bounds and projects initial point to feasible region if needed.
Usage
validate_and_project_initial(x0, lower, upper)
Arguments
x0 |
User-provided initial point |
lower |
Lower bounds |
upper |
Upper bounds |
Details
This implements Algorithm 7b from the design specification:
Validate bounds (lower <= upper, correct dimensions)
Check if x0 is feasible
Project to box if infeasible (with warning)
Value
Feasible initial point
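The three steps translate to a short sketch (an illustrative reimplementation; the real implementation may differ in its validation details and message wording):

```r
# Sketch: validate bounds, then clamp x0 into the box, warning on projection.
validate_and_project_initial <- function(x0, lower, upper) {
  stopifnot(length(lower) == length(x0), length(upper) == length(x0))
  if (any(lower > upper)) stop("each lower bound must not exceed its upper bound")
  x <- pmin(pmax(x0, lower), upper)      # project onto the box
  if (any(x != x0)) warning("initial point projected to feasible region")
  x
}
```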
Wolfe Line Search
Description
Performs a line search along a descent direction d from point x,
finding a step length alpha that satisfies the strong Wolfe conditions.
Implements the bracketing + zoom scheme from Nocedal & Wright (2006),
Algorithms 3.5 (bracketing) and 3.6 (zoom).
Usage
wolfe_line_search(
fn,
gr,
x,
d,
f_x,
g_x,
c1 = 1e-04,
c2 = 0.9,
alpha_max = 1,
max_iter = 20,
...
)
Arguments
fn |
Objective function. Called as fn(x, ...). |
gr |
Gradient function. Called as gr(x, ...). |
x |
Numeric vector; current iterate. |
d |
Numeric vector; search direction (must be a descent direction, i.e., g_x^T d < 0). |
f_x |
Current objective value fn(x). |
g_x |
Current gradient gr(x). |
c1 |
Armijo (sufficient decrease) constant; default 1e-4. |
c2 |
Curvature constant (strong Wolfe); default 0.9. |
alpha_max |
Maximum step length; default 1. |
max_iter |
Maximum total evaluations (bracketing + zoom); default 20. |
... |
Extra arguments forwarded to fn and gr. |
Details
Strong Wolfe conditions:
Armijo (sufficient decrease): f(x + \alpha d) \le f(x) + c_1 \alpha g_x^\top d
Curvature: |g(x + \alpha d)^\top d| \le c_2 |g_x^\top d|
The algorithm first brackets an interval containing a Wolfe point, then bisects within the bracket to locate one. Bisection is used for simplicity; quadratic or cubic interpolation as in Nocedal & Wright Sec. 3.5 can be added as a future refinement.
Returns success = FALSE when no strong Wolfe alpha is found within
max_iter evaluations. The caller can still consult alpha (and the
associated trial point) to decide whether to take the best-known step
or reject the iteration.
Value
A list with components:
success |
TRUE if strong Wolfe was satisfied; FALSE otherwise. |
alpha |
Final step length (best available even on failure). |
x_new |
Trial point x + alpha * d. |
f_new |
Objective value fn(x_new). |
g_new |
Gradient gr(x_new). |
evals_f |
Number of objective evaluations performed. |
evals_g |
Number of gradient evaluations performed. |
reason |
String describing success or failure mode. |
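The two conditions can be checked at a single trial step as follows (satisfies_strong_wolfe is an illustrative helper, not part of the package; wolfe_line_search searches for an alpha that makes such a check pass):

```r
# Sketch: test the strong Wolfe conditions at step length alpha (illustrative).
satisfies_strong_wolfe <- function(fn, gr, x, d, f_x, g_x, alpha,
                                   c1 = 1e-4, c2 = 0.9) {
  gd <- sum(g_x * d)                     # directional derivative at x (negative)
  x_new <- x + alpha * d
  armijo <- fn(x_new) <= f_x + c1 * alpha * gd          # sufficient decrease
  curvature <- abs(sum(gr(x_new) * d)) <= c2 * abs(gd)  # strong curvature
  armijo && curvature
}
```

On the quadratic f(x) = ||x||^2 / 2 with a steepest-descent direction, the exact minimizer alpha = 1 satisfies both conditions, while a tiny alpha passes Armijo but fails the curvature test.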