| Type: | Package |
| Title: | Adaptive Regularization using Cubics for Optimization |
| Version: | 0.3.0 |
| Description: | Implements cubic regularization methods (ARC) for local optimization problems common in statistics and applied research. Provides robust handling of ill-conditioned, nonconvex, and indefinite Hessian problems with automatic saddle point escape. Supports box constraints; linear equality constraints are planned for a future release. |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| URL: | https://github.com/marcus-waldman/arcopt |
| BugReports: | https://github.com/marcus-waldman/arcopt/issues |
| Depends: | R (≥ 4.1.0) |
| Imports: | Rcpp (≥ 1.0.0), utils |
| LinkingTo: | Rcpp |
| Suggests: | testthat (≥ 3.0.0), knitr, rmarkdown, covr, marqLevAlg, trust |
| VignetteBuilder: | knitr |
| RoxygenNote: | 7.3.3 |
| NeedsCompilation: | yes |
| Packaged: | 2026-04-28 16:36:29 UTC; marcu |
| Author: | Marcus Waldman [aut, cre] |
| Maintainer: | Marcus Waldman <marcus.waldman@cuanschutz.edu> |
| Repository: | CRAN |
| Date/Publication: | 2026-04-29 08:10:10 UTC |
arcopt: Adaptive Regularization using Cubics for Optimization
Description
Implements cubic regularization methods (ARC) for local optimization problems common in statistics and applied research. Provides robust handling of ill-conditioned, nonconvex, and indefinite Hessian problems with automatic saddle point escape. Supports box constraints; linear equality constraints are planned for a future release.
Author(s)
Maintainer: Marcus Waldman marcus.waldman@cuanschutz.edu
See Also
Useful links:
Report bugs at https://github.com/marcus-waldman/arcopt/issues
Apply Box Constraints to Step
Description
Truncates a proposed step to ensure the new iterate stays within box constraints, then projects for numerical safety.
Usage
apply_box_constraints(x_current, s, lower, upper)
Arguments
x_current |
Current iterate |
s |
Proposed unconstrained step |
lower |
Lower bounds (use -Inf for unbounded) |
upper |
Upper bounds (use Inf for unbounded) |
Details
This implements Algorithm 7a from the design specification. The approach:
1. Compute the maximum feasible step length alpha_max.
2. Truncate the step: s_bounded = alpha_max * s.
3. Project x_k + s_bounded onto the box for numerical safety.
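A minimal base-R sketch of these three steps (a hypothetical re-implementation named `apply_box_sketch`; the packaged `apply_box_constraints()` may treat degenerate cases differently):

```r
apply_box_sketch <- function(x_current, s, lower, upper) {
  # Largest alpha in [0, 1] keeping x_current + alpha * s inside the box
  alpha <- 1
  for (i in seq_along(s)) {
    if (s[i] > 0 && is.finite(upper[i])) {
      alpha <- min(alpha, (upper[i] - x_current[i]) / s[i])
    } else if (s[i] < 0 && is.finite(lower[i])) {
      alpha <- min(alpha, (lower[i] - x_current[i]) / s[i])
    }
  }
  alpha <- max(alpha, 0)
  s_bounded <- alpha * s
  # Final clamp guards against floating-point overshoot
  x_new <- pmin(pmax(x_current + s_bounded, lower), upper)
  list(s_bounded = s_bounded, x_new = x_new)
}
```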
Value
List with components:
s_bounded |
Truncated step that respects bounds |
x_new |
New iterate projected to feasible region |
Adaptive Regularization using Cubics Optimizer
Description
Minimizes a nonlinear objective function using Adaptive Regularization with Cubics (ARC). Designed for robust optimization of ill-conditioned, nonconvex, and indefinite Hessian problems common in statistical applications.
Usage
arcopt(
x0,
fn,
gr,
hess = NULL,
...,
lower = rep(-Inf, length(x0)),
upper = rep(Inf, length(x0)),
control = list()
)
Arguments
x0 |
Numeric vector of initial parameter values (length Q). |
fn |
Function that computes the objective function value. Should take a numeric vector of length Q and return a scalar. |
gr |
Function that computes the gradient. Should take a numeric vector of length Q and return a numeric vector of length Q. Required. |
hess |
Function that computes the Hessian matrix. Should take a
numeric vector of length Q and return a Q-by-Q symmetric matrix.
Required unless control$use_qn = TRUE (see arcopt_advanced_controls). |
... |
Additional arguments passed to fn, gr, and hess. |
lower |
Numeric vector of lower bounds (length Q). Use -Inf for unbounded components. |
upper |
Numeric vector of upper bounds (length Q). Use Inf for unbounded components. |
control |
A named list of control parameters. The user-facing tolerances and switches are documented below; advanced regularization tuning, the trust-region fallback, and the quasi-Newton polish mode live on a separate help page (see arcopt_advanced_controls). |
Details
The ARC algorithm iteratively minimizes a cubic regularization model:
m_k(s) = f_k + g_k^\top s + \frac{1}{2} s^\top H_k s +
\frac{\sigma_k}{3} \|s\|^3
where \sigma_k is adapted from observed model accuracy. arcopt
may transparently fall back to a trust-region subproblem in flat-ridge
regimes and (optionally, opt-in) to a line-search BFGS polish in the
quadratic attraction basin. The transitions are observable via
result$diagnostics; the algorithmic details and tunable thresholds
are documented under arcopt_advanced_controls.
Hessian Requirement
arcopt is Hessian-centric: an analytic hess function is strongly
recommended. If the analytic form is unavailable, set
control$use_qn = TRUE to obtain Hessian-free quasi-Newton updates
(see the advanced-controls page).
Value
A list with components:
- par: Optimal parameter vector.
- value: Objective value at par.
- gradient: Gradient at par.
- hessian: Hessian at par (or the final BFGS approximation if the run ended in qn_polish mode).
- sigma: Final cubic regularization parameter.
- converged: Logical; whether convergence criteria were met.
- iterations: Number of iterations performed.
- evaluations: Named list of fn, gr, and hess evaluation counts.
- message: Convergence reason.
- trace: Per-iteration trace data (depth controlled by control$trace); NULL when trace = 0.
- diagnostics: Sublist of internal mode-dispatch diagnostics: solver_mode_final, ridge_switches, radius_final, qn_polish_switches, qn_polish_reverts, and hess_evals_at_polish_switch. See arcopt_advanced_controls for the meaning of each field. Most users do not need to inspect this; it is preserved for diagnostic and benchmarking use.
See Also
arcopt_advanced_controls for advanced tuning of
the cubic regularization, trust-region fallback, and quasi-Newton
polish mode.
Examples
# Rosenbrock function
rosenbrock <- function(x) (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2
rosenbrock_gr <- function(x) {
c(-2 * (1 - x[1]) - 400 * x[1] * (x[2] - x[1]^2),
200 * (x[2] - x[1]^2))
}
rosenbrock_hess <- function(x) {
matrix(c(
1200 * x[1]^2 - 400 * x[2] + 2, -400 * x[1],
-400 * x[1], 200
), 2, 2)
}
result <- arcopt(
x0 = c(-1.2, 1),
fn = rosenbrock,
gr = rosenbrock_gr,
hess = rosenbrock_hess
)
print(result$par) # Should be near c(1, 1)
print(result$value) # Should be near 0
Advanced Control Parameters for arcopt
Description
The main arcopt help page documents only the user-facing
tolerances and switches (maxit, gtol_abs, ftol_abs, xtol_abs,
trace, verbose, use_qn). This page documents every other entry
that the control list of arcopt() (and the routed-to
arcopt:::arcopt_qn() variant) recognizes, organized by the
subsystem each parameter governs.
Cubic regularization
These parameters control the adaptive cubic model
m_k(s) = f_k + g_k^\top s + \tfrac{1}{2} s^\top H_k s +
\tfrac{\sigma_k}{3} \|s\|^3 and the sigma adaptation rule (Algorithm
2a/2b of design/pseudocode.qmd).
- sigma0: Initial regularization parameter (default 1.0).
- sigma_min: Floor on sigma_k (default 1e-6). Prevents the cubic term from vanishing entirely on flat regions.
- sigma_max: Ceiling on sigma_k (default 1e12). Triggers emergency-stop behavior when reached.
- eta1: Step-acceptance threshold; steps with rho_k >= eta1 are accepted (default 0.1).
- eta2: Very-successful threshold; rho_k >= eta2 triggers sigma shrinkage (default 0.9).
- gamma1: Multiplicative shrink factor on a very-successful step (default 0.5).
- gamma2: Multiplicative grow factor on an unsuccessful step (default 2.0).
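To illustrate how eta1, eta2, gamma1, and gamma2 interact, here is a sketch of a rho-driven sigma update under the defaults listed above (a paraphrase, not the package's exact Algorithm 2a/2b):

```r
update_sigma <- function(sigma, rho, eta1 = 0.1, eta2 = 0.9,
                         gamma1 = 0.5, gamma2 = 2.0,
                         sigma_min = 1e-6, sigma_max = 1e12) {
  if (rho >= eta2) {
    sigma <- max(gamma1 * sigma, sigma_min)  # very successful: relax regularization
  } else if (rho < eta1) {
    sigma <- min(gamma2 * sigma, sigma_max)  # rejected step: regularize harder
  }                                          # eta1 <= rho < eta2: keep sigma
  sigma
}
```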
Trust-region fallback (cubic to TR on flat-ridge detection)
Cubic regularization can stagnate in "flat-ridge" regimes – iterations
with the regularization floor pinned, model predictions matching the
objective (\rho \approx 1), gradient stalling, and a Hessian
that is positive-definite but nearly singular. This is outside the
local-error-bound condition of Yue, Zhou & So (2018) under which
cubic regularization is guaranteed to converge quadratically at
degenerate minimizers. arcopt detects the regime and switches once
from the cubic subproblem to a trust-region subproblem.
- tr_fallback_enabled: One-way cubic-to-TR switch (default TRUE).
- tr_fallback_window: Sliding-window length for the detector (default 10).
- tr_fallback_tol_ridge: lambda_min(H) threshold defining a "near-singular PD" Hessian (default 1e-3).
- tr_fallback_rho_tol: Tolerance on |rho - 1| for the "near-perfect model" signal (default 0.1).
- tr_fallback_grad_decrease_max: Ratio of latest to oldest ||g||_inf above which the gradient counts as stagnant (default 0.9).
- tr_fallback_g_inf_floor: Absolute lower bound on ||g||_inf below which the switch will not fire; keeps the hybrid from triggering at true local minima (default 1e-6).
- tr_r0: Initial trust-region radius at the switch (default 1.0).
- tr_rmax: Maximum trust-region radius (default 100).
- tr_eta1: TR step-acceptance threshold (default 0.25).
- tr_eta2: TR expansion threshold (default 0.75).
- tr_gamma_shrink: Radius shrink factor on a poor step (default 0.25).
- tr_gamma_grow: Radius grow factor on a very-good boundary step (default 2.0).
Quasi-Newton polish (cubic to BFGS line-search)
Once the iterate enters the quadratic attraction basin of a strict
local minimum, the cubic regularization penalty has decayed to its
floor and contributes negligible damping, but arcopt still evaluates
hess() every iteration. For expensive Hessians (analytic AD via
Stan, finite differences) this dominates wall-clock time. Polish
mode replaces the cubic subproblem with a Wolfe line search along the
BFGS-approximated Newton direction, skipping further hess() calls
until convergence or until the BFGS approximation drifts.
Off by default in v0.2.0 because the existing manuscript and benchmark problems converge before the five-signal healthy-basin detector accumulates a full window. Enable it explicitly for long-running smooth problems with expensive Hessians.
- qn_polish_enabled: Enable the cubic-to-qn_polish bidirectional switch (default FALSE).
- qn_polish_window: Sliding-window length for the healthy-basin detector (default 5).
- qn_polish_rho: Minimum rho_k required throughout the window (default 0.9).
- qn_polish_lambda_min: Minimum lambda_min(H_k) required throughout the window (default 1e-3).
- qn_polish_g_decay: Maximum ratio of consecutive ||g||_inf values; e.g. 0.5 requires 2x-per-iteration contraction (default 0.5).
- qn_polish_g_inf_floor: Absolute lower bound on ||g||_inf at window start; prevents firing at convergence (default 1e-8).
- qn_polish_c1, qn_polish_c2: Wolfe line-search constants (defaults 1e-4 and 0.9).
- qn_polish_alpha_max: Initial step length tried by the line search (default 1.0).
- qn_polish_max_ls_iter: Maximum line-search evaluations per iteration (default 20).
- qn_polish_max_fail: Consecutive line-search failures that trigger a revert to cubic mode (default 3).
- qn_polish_reenter_delay: Cubic iterations required after a revert before qn_polish may re-fire (default 5).
- qn_polish_curv_eps: Curvature threshold for skipping BFGS updates to preserve positive definiteness (default 1e-10).
Quasi-Newton variant (use_qn = TRUE)
When control$use_qn = TRUE, arcopt() routes to an internal
quasi-Newton variant that approximates H_k via SR1/BFGS updates and
does not require a hess function. The following parameters apply
only when use_qn = TRUE.
- qn_method: One of "hybrid" (default), "sr1", "bfgs". "hybrid" uses state-aware routing between SR1-first and BFGS-first orderings based on the current B's eigenstructure and recent rho values.
- bfgs_tol: Curvature tolerance for the BFGS update (default 1e-10).
- sr1_skip_tol: SR1 skip-test tolerance (default 1e-8).
- sr1_restart_threshold: Consecutive SR1 skips before restart (default 5).
- qn_route_demote_rho: rho below this counts as a "bad" step in the routing FSM (default 0.25).
- qn_route_promote_rho: rho above this counts as a "good" step (default 0.5).
- qn_route_demote_k: Consecutive bad steps in "pd" routing mode that demote back to "indefinite" (default 2).
- qn_route_promote_k: Consecutive good PD steps in "indefinite" mode that promote to "pd" (default 3).
- qn_fd_refresh_k: Consecutive bad-rho iterations in "indefinite" mode that trigger an FD-Hessian refresh of B (default 3).
- qn_stuck_refresh_k: Iterations stuck in "indefinite" mode without promotion that force a refresh (default 100).
- use_accel_qn: EXPERIMENTAL. Enables Nesterov acceleration in the QN path. May improve convergence on strongly convex problems but can hurt nonconvex ones (default FALSE).
Diagnostics in result$diagnostics
Mode-dispatch diagnostics are nested under result$diagnostics so the
primary return list stays compact.
- solver_mode_final: "cubic", "tr", or "qn_polish"; which subproblem solver was active at termination.
- ridge_switches: Integer count of cubic-to-TR transitions (0 or 1 in v1; the switch is one-way).
- radius_final: Final trust-region radius (NA if the solver never switched to TR mode).
- qn_polish_switches: Integer count of cubic-to-qn_polish transitions (bidirectional; may be > 1).
- qn_polish_reverts: Integer count of qn_polish-to-cubic reversions.
- hess_evals_at_polish_switch: evaluations$hess at the first polish switch; compare against the final evaluations$hess to quantify Hessian-evaluation savings.
QN-variant runs add qn_updates, qn_skips, qn_restarts, and
qn_fd_refreshes to the same sublist.
See Also
arcopt for the user-facing entry point.
Quasi-Newton ARC Optimizer
Description
Minimizes a nonlinear objective function using Adaptive Regularization with Cubics and quasi-Newton Hessian approximations. Does not require an exact Hessian function.
Usage
arcopt_qn(
x0,
fn,
gr,
hess = NULL,
...,
lower = rep(-Inf, length(x0)),
upper = rep(Inf, length(x0)),
control = list()
)
Arguments
x0 |
Numeric vector of initial parameter values (length n). |
fn |
Function that computes the objective function value. |
gr |
Function that computes the gradient. |
hess |
Optional Hessian function for hybrid mode. If provided, initializes B_0 = H(x_0) and can refresh when approximation degrades. |
... |
Additional arguments passed to fn, gr, and hess. |
lower |
Numeric vector of lower bounds (length n). Default: all -Inf. |
upper |
Numeric vector of upper bounds (length n). Default: all Inf. |
control |
List of control parameters (see Details). |
Details
The QN-ARC algorithm maintains a quasi-Newton Hessian approximation B_k and solves the cubic regularization subproblem:
m_k(s) = f_k + g_k^T s + 1/2 s^T B_k s + sigma_k/3 ||s||^3
Initialization of B_0
When hess is not supplied, arcopt_qn() seeds the initial Hessian
approximation with a one-time finite-difference Hessian computed from
the supplied gradient at x0 (cost: 2 * length(x0) gradient
evaluations once at startup). This provides negative-curvature
information that the pure-identity initialization used by classical
quasi-Newton methods would miss on saddle-prone problems. To opt out
and recover the identity-initialization convention, pass
hess = function(x) diag(length(x)) explicitly.
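The one-time finite-difference seeding can be sketched as follows (a hypothetical helper; arcopt_qn()'s internal step-size choice may differ):

```r
# Central-difference Hessian from the gradient: 2 gradient calls per column,
# i.e. 2 * length(x) calls total, done once at startup.
fd_hessian_from_grad <- function(gr, x, h = 1e-5) {
  q <- length(x)
  H <- matrix(0, q, q)
  for (i in seq_len(q)) {
    e <- numeric(q)
    e[i] <- h
    H[, i] <- (gr(x + e) - gr(x - e)) / (2 * h)
  }
  (H + t(H)) / 2  # symmetrize to remove finite-difference asymmetry
}
```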
Control Parameters
In addition to standard arcopt controls:
- qn_method: Update method (default: "hybrid"):
  - "hybrid": State-aware routing between SR1-first and BFGS-first orderings based on the current B_k's eigenstructure and recent rho_k values (see below). Includes automatic FD refresh of B_k when SR1 is stuck. Recommended for saddle-prone problems.
  - "sr1": Symmetric Rank-1 (allows indefinite Hessians).
  - "bfgs": Standard BFGS (maintains positive definiteness).
- bfgs_tol: Curvature tolerance for BFGS in hybrid mode (default: 1e-10).
- sr1_skip_tol: SR1 skip-test tolerance (default: 1e-8).
- sr1_restart_threshold: Consecutive skips before restart (default: 5).
- use_accel_qn: EXPERIMENTAL. Enables Nesterov acceleration (default: FALSE). May improve convergence on strongly convex problems but can hurt performance on nonconvex problems. Use with caution.
Hybrid Routing Parameters
The "hybrid" method maintains an internal mode flag that starts in
"indefinite" (SR1-first priority) and promotes to "pd" (BFGS-first)
once the B_k approximation has been reliably positive definite and
cubic-model predictions have tracked the true objective:
- qn_route_demote_rho (default 0.25): rho_k below this counts as a "bad" step.
- qn_route_promote_rho (default 0.5): rho_k above this counts as a "good" step.
- qn_route_demote_k (default 2): consecutive bad steps in "pd" mode demote back to "indefinite".
- qn_route_promote_k (default 3): consecutive good PD steps in "indefinite" mode promote to "pd".
- qn_fd_refresh_k (default 3): while in "indefinite" mode, this many consecutive bad-rho steps rebuild B_k from a fresh FD Hessian.
- qn_stuck_refresh_k (default 100): while in "indefinite" mode, this many iterations without promotion also triggers an FD refresh (safety net for secondary-saddle stalls).
Trust-Region Fallback
All tr_fallback_* and tr_* control parameters from
arcopt are respected. In QN mode the flat-ridge detector
uses lambda_min(B_k) (the QN approximation's smallest eigenvalue)
as the ridge signal, consistent with the quantity the cubic subproblem
acts on. The ridge detector and the QN FD-refresh triggers are
mutually exclusive: the FD refresh fires only in "indefinite"
routing mode, while the ridge detector requires lambda_min(B_k) > 0,
so both cannot fire on the same iteration. If the one-way switch to
trust-region mode occurs, subsequent iterations no longer update the
indefinite-mode counters (since they are gated on being in cubic
mode).
Value
Same structure as arcopt. The diagnostics sublist gains four QN-specific counters in addition to the standard mode-dispatch fields:
- qn_updates: number of successful QN updates.
- qn_skips: number of skipped updates (curvature condition failed).
- qn_restarts: number of approximation restarts.
- qn_fd_refreshes: number of FD Hessian refreshes performed (hybrid mode only; 0 otherwise).
qn_polish_switches, qn_polish_reverts, and
hess_evals_at_polish_switch are always 0L/NA for QN runs
(the polish mode is exact-Hessian only).
Check Convergence Criteria (Stan-style)
Description
Checks multiple convergence criteria following Stan's L-BFGS implementation. Termination occurs when any criterion is satisfied, preventing premature termination on ill-scaled problems while ensuring robust detection.
Usage
check_convergence(
g_current,
f_current,
f_previous,
x_current,
x_previous,
iter,
tol_grad = 1e-08,
tol_rel_grad = 1e-06,
tol_obj = 1e-12,
tol_rel_obj = 1e-08,
tol_param = 1e-08,
max_iter = 1000
)
Arguments
g_current |
Current gradient vector (length Q) |
f_current |
Current objective function value (scalar) |
f_previous |
Previous objective function value (scalar), NA for first iteration |
x_current |
Current parameter vector (length Q) |
x_previous |
Previous parameter vector (length Q), NULL for first iteration |
iter |
Current iteration number (0-indexed) |
tol_grad |
Absolute gradient tolerance |
tol_rel_grad |
Relative gradient tolerance |
tol_obj |
Absolute objective change tolerance |
tol_rel_obj |
Relative objective change tolerance |
tol_param |
Parameter change tolerance (infinity norm) |
max_iter |
Maximum iterations allowed |
Details
The function checks 6 convergence criteria in order:
1. max_iter: iteration limit reached.
2. gradient_abs: max(abs(g)) < tol_grad.
3. gradient_rel: max(abs(g)) < tol_rel_grad * max(1, abs(f)).
4. objective_abs: abs(f_current - f_previous) < tol_obj.
5. objective_rel: abs(f_current - f_previous) < tol_rel_obj * max(1, abs(f_current)).
6. parameter: max(abs(x_current - x_previous)) < tol_param.
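The six checks above can be sketched as a single dispatch function (a simplification; the packaged check_convergence() also guards the first-iteration NA/NULL inputs and returns a two-component list):

```r
# Returns the name of the first satisfied criterion, or "" if none fire.
conv_reason <- function(g, f, f_prev, x, x_prev, iter,
                        tol_grad = 1e-8, tol_rel_grad = 1e-6,
                        tol_obj = 1e-12, tol_rel_obj = 1e-8,
                        tol_param = 1e-8, max_iter = 1000) {
  if (iter >= max_iter) return("max_iter")
  if (max(abs(g)) < tol_grad) return("gradient_abs")
  if (max(abs(g)) < tol_rel_grad * max(1, abs(f))) return("gradient_rel")
  if (abs(f - f_prev) < tol_obj) return("objective_abs")
  if (abs(f - f_prev) < tol_rel_obj * max(1, abs(f))) return("objective_rel")
  if (max(abs(x - x_prev)) < tol_param) return("parameter")
  ""  # not converged
}
```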
Value
List with two components:
converged |
Logical; TRUE if any criterion satisfied |
reason |
Character; description of which criterion triggered, or "" if not converged |
Protect Against NaN/Inf in Function Evaluations
Description
Detects NaN or Inf values in function and gradient evaluations and attempts recovery by increasing regularization to produce smaller steps.
Usage
check_finite(f_value, g_value)
Arguments
f_value |
Function value to check |
g_value |
Gradient vector to check |
Details
This is a simple check function. The recovery logic (re-solving with larger sigma) should be implemented in the main optimization loop.
Checks that:
- the function value is finite (not NaN or Inf), and
- all gradient components are finite.
Value
Logical; TRUE if values are finite and valid, FALSE if NaN/Inf detected
Check Flat-Ridge Trigger
Description
Evaluates the four-signal rule against the current sliding window.
Usage
check_flat_ridge_trigger(
state,
sigma_min,
tol_ridge = 0.001,
rho_tol = 0.1,
grad_decrease_max = 0.9,
g_inf_floor = 1e-06
)
Arguments
state |
List returned by init_flat_ridge_state and updated each iteration. |
sigma_min |
Minimum regularization floor used by the orchestrator. |
tol_ridge |
Upper bound on lambda_min(H) defining a near-singular Hessian. |
rho_tol |
Maximum |rho - 1| for the near-perfect-model signal. |
grad_decrease_max |
Ratio of latest to oldest ||g||_inf above which the gradient counts as stagnant. |
g_inf_floor |
Absolute lower bound on ||g||_inf below which the trigger will not fire. |
Value
TRUE if the trust-region fallback should be activated,
FALSE otherwise. Returns FALSE whenever the window is
not yet full or any diagnostic is non-finite.
Check Healthy-Basin Trigger
Description
Evaluates the five-signal rule against the current sliding window.
Usage
check_healthy_basin_trigger(
state,
rho_polish = 0.9,
lambda_min_polish = 0.001,
g_decay_polish = 0.5,
g_inf_floor_polish = 1e-08
)
Arguments
state |
List returned by init_healthy_basin_state and updated each iteration. |
rho_polish |
Minimum acceptable rho_k throughout the window. |
lambda_min_polish |
Minimum acceptable lambda_min(H_k) throughout the window. |
g_decay_polish |
Maximum acceptable ratio of consecutive ||g||_inf values. |
g_inf_floor_polish |
Absolute lower bound on ||g||_inf at window start. |
Value
TRUE if the qn_polish transition should be activated,
FALSE otherwise. Returns FALSE whenever the window is
not yet full or any diagnostic is non-finite.
Detect Stagnation and Recommend Action
Description
Monitors optimization progress and detects stagnation based on small consecutive steps or function changes. Returns recommended action: continue, refresh Hessian, or stop.
Usage
detect_stagnation(
step_norms,
f_values,
tol_step = 1e-12,
tol_f = 1e-14,
max_stagnant = 5,
is_qn = FALSE,
already_refreshed = FALSE
)
Arguments
step_norms |
Vector of recent step norms (most recent last) |
f_values |
Vector of recent function values (most recent last) |
tol_step |
Step size tolerance for stagnation (default: 1e-12) |
tol_f |
Function change tolerance for stagnation (default: 1e-14) |
max_stagnant |
Maximum consecutive stagnant iterations before action (default: 5) |
is_qn |
Logical; TRUE if using quasi-Newton (enables Hessian refresh option) |
already_refreshed |
Logical; TRUE if Hessian was already refreshed once |
Details
Stagnation is detected when either:
- step norms are consecutively below tol_step for max_stagnant iterations, or
- function changes are consecutively below tol_f for max_stagnant iterations.
When stagnation is detected:
- if using quasi-Newton and the Hessian has not already been refreshed, return "refresh_hessian";
- otherwise, return "stop" (declare convergence failure).
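A sketch of this decision logic under the stated defaults (hypothetical helper name; the packaged detect_stagnation() may count stagnant iterations differently):

```r
stagnation_action <- function(step_norms, f_values, tol_step = 1e-12,
                              tol_f = 1e-14, max_stagnant = 5,
                              is_qn = FALSE, already_refreshed = FALSE) {
  n <- max_stagnant
  recent_steps <- tail(step_norms, n)                 # last n step norms
  recent_df    <- abs(diff(tail(f_values, n + 1)))    # last n objective changes
  stagnant <- (length(recent_steps) >= n && all(recent_steps < tol_step)) ||
              (length(recent_df)    >= n && all(recent_df    < tol_f))
  if (!stagnant) return("continue")
  if (is_qn && !already_refreshed) "refresh_hessian" else "stop"
}
```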
Value
Character; one of "continue", "refresh_hessian", or "stop"
Flat-Ridge Detector for Cubic-to-Trust-Region Fallback
Description
Detects when cubic regularization has entered a flat-ridge stagnation regime: gradient stalls in a near-singular positive-definite region where the cubic penalty has no bite. Triggers the trust-region fallback in the main arcopt iteration when fired.
Details
The detector maintains a sliding window of runtime diagnostics and fires when all four of the following signals hold throughout the window:
- Sigma pinned at floor: sigma_k <= 10 * sigma_min for every iteration in the window.
- Near-perfect model: |rho_k - 1| < rho_tol for every iteration in the window (model reduction matches actual reduction).
- Gradient stagnant: the ratio of the most recent ||g||_inf to the oldest in the window exceeds grad_decrease_max (less than a 10 percent decrease over the window by default).
- Hessian ill-conditioned: the smallest Hessian eigenvalue at the most recent iteration satisfies lambda_min < tol_ridge. This includes both the classical "flat-ridge" regime (small positive lambda_min) and the "stuck-at-indefinite-saddle" regime (any negative lambda_min), because cubic regularization loses its grip in both.
An absolute floor g_inf_floor is also enforced: the trigger will
not fire if ||g||_inf is already below the typical convergence
tolerance, preventing spurious triggers at true local minima.
The detector is an empirical proxy for local-error-bound (EB) violation (Yue, Zhou, So 2018). Under EB, cubic regularization attains Q-quadratic convergence even at degenerate minima; the detector fires precisely in the regime where EB is practically vacuous and no theoretical convergence guarantee applies.
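The four-signal window test can be sketched over plain vectors of per-iteration diagnostics (the vector-based interface here is hypothetical; the package stores these in a detector state list):

```r
# Each argument except the tolerances is a vector spanning the full window,
# oldest first. Returns TRUE when all four signals (plus the floor guard) hold.
flat_ridge_fires <- function(sigma, rho, g_inf, lambda_min, sigma_min,
                             tol_ridge = 1e-3, rho_tol = 0.1,
                             grad_decrease_max = 0.9, g_inf_floor = 1e-6) {
  all(sigma <= 10 * sigma_min) &&                          # sigma pinned at floor
    all(abs(rho - 1) < rho_tol) &&                         # near-perfect model
    tail(g_inf, 1) / head(g_inf, 1) > grad_decrease_max && # gradient stagnant
    tail(lambda_min, 1) < tol_ridge &&                     # near-singular or indefinite
    tail(g_inf, 1) > g_inf_floor                           # not already converged
}
```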
Healthy-Basin Detector for Cubic-to-QN-Polish Transition
Description
Detects when the iterate has entered the quadratic attraction basin of a strict local minimum, at which point arcopt can switch from cubic regularization to line-search BFGS (qn_polish mode) and skip further Hessian evaluations for the remainder of the run.
Details
The detector maintains a sliding window of runtime diagnostics and fires when all five of the following signals hold throughout the window:
- Newton step accepted: step_type == "newton" at every iteration in the window. Indicates the cubic subproblem has not been invoked (Cholesky succeeded and the Newton step passed the ratio test).
- High ratio: rho_k >= rho_polish (default 0.9) throughout the window. The quadratic model is accurately predicting actual reduction.
- Hessian well-conditioned PD: lambda_min(H_k) >= lambda_min_polish (default 1e-3) throughout the window. Not just "Cholesky succeeded" but strictly bounded away from zero.
- Superlinear gradient decay: ||g_k||_inf / ||g_{k-1}||_inf <= g_decay_polish (default 0.5) for every consecutive pair in the window. This is the strongest signal of the quadratic attraction basin.
- Above convergence floor: ||g_0||_inf > g_inf_floor_polish at window start (default 1e-8). Prevents firing at convergence where Newton-accept and gradient-decay are trivially satisfied.
Contrast with the flat-ridge detector (flat_ridge): that detector fires on stagnation (gradient not decreasing); this one fires on healthy convergence (gradient decreasing superlinearly). The two are expected to be mutually exclusive in practice.
Initialize Flat-Ridge Detector State
Description
Initialize Flat-Ridge Detector State
Usage
init_flat_ridge_state(window = 10)
Arguments
window |
Integer sliding-window length (default 10) |
Value
A list with empty diagnostic vectors and the window size.
See Also
update_flat_ridge_state,
check_flat_ridge_trigger
Initialize Healthy-Basin Detector State
Description
Initialize Healthy-Basin Detector State
Usage
init_healthy_basin_state(window = 5)
Arguments
window |
Integer sliding-window length (default 5; shorter than the flat-ridge detector's 10 because the basin signal is stronger and revert-on-false-positive is cheap). |
Value
A list with empty diagnostic vectors and the window size.
See Also
update_healthy_basin_state,
check_healthy_basin_trigger
Project Point to Box Constraints
Description
Projects a point to the feasible region defined by box constraints.
Usage
project_to_box(x, lower, upper)
Arguments
x |
Point to project |
lower |
Lower bounds |
upper |
Upper bounds |
Details
Component-wise projection: each component is clamped to its bounds
Value
Projected point within bounds
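In base R the component-wise clamp is a one-liner (shown as a sketch named project_to_box_sketch; the packaged version may add input validation):

```r
# Clamp each component of x to [lower, upper]
project_to_box_sketch <- function(x, lower, upper) pmin(pmax(x, lower), upper)
```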
Solve Cubic Subproblem via Eigendecomposition (Algorithm 5a)
Description
Solves min_s m(s) = g^T s + 1/2 s^T H s + sigma/3 ||s||^3 using full eigendecomposition with explicit hard-case detection.
Usage
solve_cubic_eigen(
g,
H,
sigma,
max_lambda_iter = 50,
lambda_tol = 1e-10,
hard_case_tol = 1e-08
)
Arguments
g |
Gradient vector (length n) |
H |
Hessian matrix (n x n symmetric, may be indefinite) |
sigma |
Regularization parameter (positive scalar) |
max_lambda_iter |
Maximum secular equation iterations (default: 50) |
lambda_tol |
Secular equation convergence tolerance (default: 1e-10) |
hard_case_tol |
Hard case detection threshold (default: 1e-8) |
Details
Uses eigen() for O(n^3) eigendecomposition. Hard case occurs when gradient is nearly orthogonal to smallest eigenvector (|g_1| < tol_hard * ||g||). Recommended for n <= 500 due to cubic computational cost.
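For the easy case only, the reduction to a scalar equation can be sketched with a generic root finder standing in for the secular Newton iteration the solver actually uses (hard-case and zero-gradient branches omitted; helper name hypothetical):

```r
# Easy-case cubic subproblem: find lambda > max(0, -lambda_min(H)) with
# lambda = sigma * ||s(lambda)||, where s(lambda) = -(H + lambda I)^{-1} g
# expressed in the eigenbasis of H.
solve_cubic_easy <- function(g, H, sigma) {
  eig <- eigen(H, symmetric = TRUE)
  d  <- eig$values
  gt <- drop(crossprod(eig$vectors, g))          # gradient in eigenbasis
  norm_s <- function(lam) sqrt(sum((gt / (d + lam))^2))
  lo <- max(0, -min(d)) + 1e-12                  # lambda must exceed -lambda_min
  f <- function(lam) lam - sigma * norm_s(lam)   # monotone in lambda
  lam <- uniroot(f, lower = lo, upper = lo + 1e6, tol = 1e-10)$root
  s <- -drop(eig$vectors %*% (gt / (d + lam)))
  list(s = s, lambda = lam)
}
```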
Value
List with:
s |
Solution vector |
lambda |
Lagrange multiplier |
pred_reduction |
Predicted reduction: -g^T s - 1/2 s^T H s - sigma/3 ||s||^3 |
converged |
TRUE if secular equation converged |
case_type |
"easy", "hard", or "zero_gradient" |
eigenvalues |
Eigenvalues of H. |
Solve Cubic Subproblem with Automatic Solver Selection
Description
Unified dispatcher for cubic subproblem solvers with automatic solver selection based on problem size and Hessian representation.
Usage
solve_cubic_subproblem_dispatch(g, H = NULL, sigma, solver = "auto", ...)
Arguments
g |
Gradient vector (length n) |
H |
Hessian matrix (n x n symmetric, may be indefinite) |
sigma |
Regularization parameter (positive scalar) |
solver |
Solver to use: "auto" (default) or "eigen". Auto-selection uses eigendecomposition (Algorithm 5a). |
... |
Additional arguments passed to specific solvers |
Details
Uses eigendecomposition-based solver (Algorithm 5a) which provides robust handling of indefinite Hessians and hard cases.
For large-scale matrix-free optimization (n > 500), see design/scalable-arcs.qmd for deferred large-scale solver implementations.
Value
List with:
s |
Solution vector |
lambda |
Lagrange multiplier |
pred_reduction |
Predicted reduction |
converged |
TRUE if solver converged |
solver_used |
Name of solver that was used |
Solve Trust-Region Subproblem via Eigendecomposition (Algorithm 5c)
Description
Solves min_s m(s) = g^T s + 1/2 s^T H s subject to ||s|| <= radius using full eigendecomposition with explicit hard-case detection.
Usage
solve_tr_eigen(
g,
H,
radius,
max_lambda_iter = 50,
lambda_tol = 1e-10,
hard_case_tol = 1e-08
)
Arguments
g |
Gradient vector (length n) |
H |
Hessian matrix (n x n symmetric, may be indefinite) |
radius |
Trust-region radius (positive scalar) |
max_lambda_iter |
Maximum secular equation iterations (default: 50) |
lambda_tol |
Secular equation convergence tolerance (default: 1e-10) |
hard_case_tol |
Hard case detection threshold (default: 1e-8) |
Details
This is the trust-region counterpart to solve_cubic_eigen. The only
structural difference is the constraint: the cubic solver enforces
lambda = sigma * ||s||, while the trust-region solver enforces
||s|| <= radius. Both reduce to (H + lambda*I) s = -g with
lambda >= max(0, -lambda_min(H)).
Three cases are handled:
- Interior: H is positive definite and the unconstrained Newton step -H^{-1} g lies inside the trust region. Returned with lambda = 0 and on_boundary = FALSE.
- Easy boundary: a unique lambda > max(0, -lambda_min) places ||s(lambda)|| = radius. Found by Newton iteration on the secular equation.
- Hard case: the Hessian is indefinite and the gradient is nearly orthogonal to the smallest-eigenvalue eigenvector. Resolved per More-Sorensen by setting lambda = -lambda_min(H) and adding a component along the smallest eigenvector to reach the boundary.
Recommended for n <= 500 due to O(n^3) eigendecomposition cost.
Value
List with:
s |
Solution vector |
lambda |
Lagrange multiplier (0 if interior Newton step) |
pred_reduction |
Predicted reduction: -g^T s - 1/2 s^T H s |
converged |
TRUE if solver returned a valid step |
case_type |
"interior", "easy", "hard", or "zero_gradient" |
on_boundary |
TRUE if ||s|| == radius (active constraint) |
Try Newton Step
Description
Attempts to compute and validate a Newton step when the Hessian is positive definite. Uses Cholesky factorization to check positive definiteness and solve the Newton system.
Usage
try_newton_step(g, H)
Arguments
g |
Gradient vector (length n) |
H |
Hessian matrix (n x n symmetric matrix) |
Details
The algorithm:
1. Attempts Cholesky factorization of H (fails if H is indefinite).
2. If successful, solves H * s = -g for the Newton step.
3. Computes the predicted reduction from the quadratic model.
No pre-filtering based on sigma is required - the ratio test in the main
ARC loop naturally determines whether Newton steps are appropriate. See
the design notes in literature/consensus_reviews/ for the rationale
behind not imposing an explicit sigma threshold on the Newton step.
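The Cholesky-gated logic can be sketched directly (a hypothetical re-implementation; the packaged try_newton_step() may differ in failure handling):

```r
try_newton_sketch <- function(g, H) {
  R <- tryCatch(chol(H), error = function(e) NULL)  # fails if H is not PD
  if (is.null(R)) {
    return(list(success = FALSE, s = NULL, pred_reduction = NULL))
  }
  # H = R'R, so solve R'y = -g then R s = y
  s <- backsolve(R, forwardsolve(t(R), -g))
  pred <- -sum(g * s) - 0.5 * sum(s * (H %*% s))    # quadratic-model reduction
  list(success = TRUE, s = s, pred_reduction = pred)
}
```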
Value
List with components:
success |
Logical; TRUE if Newton step computed successfully |
s |
Newton step vector if successful, NULL otherwise |
pred_reduction |
Predicted reduction from quadratic model if successful, NULL otherwise |
BFGS (Broyden-Fletcher-Goldfarb-Shanno) Hessian Update
Description
Updates the Hessian approximation using the BFGS formula. BFGS maintains positive definiteness when the curvature condition y'*s > 0 is satisfied.
Usage
update_bfgs(b, s, y)
Arguments
b |
Current Hessian approximation (n x n symmetric positive definite matrix). |
s |
Step vector: s = x_new - x_old. |
y |
Gradient difference: y = g_new - g_old. |
Details
The BFGS update formula is:
B_{k+1} = B_k - \frac{B_k s s^T B_k}{s^T B_k s} + \frac{y y^T}{y^T s}
The update is skipped when the curvature condition y^T s > 0 is
violated. This can happen on nonconvex problems. When skipped, the previous
approximation is retained.
Note: BFGS maintains positive definiteness, so the approximation cannot capture the indefinite curvature that ARC is designed to exploit. Consider SR1 for nonconvex problems.
Value
A list with components:
b |
Updated Hessian approximation (n x n matrix). |
skipped |
Logical indicating if update was skipped due to curvature condition violation. |
References
Algorithm 4a in design/pseudocode.qmd
Nocedal, J., & Wright, S. J. (2006). Numerical Optimization (2nd ed.). Springer. Section 6.1.
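The update formula above can be sketched as follows (an illustrative reimplementation, not the package source; the skip test here is the plain condition y's > 0 rather than any package-internal tolerance):

```r
# Sketch: BFGS update with curvature-condition skip (illustrative).
update_bfgs <- function(b, s, y) {
  sy <- sum(s * y)
  if (sy <= 0) {
    return(list(b = b, skipped = TRUE))   # curvature condition y's > 0 violated
  }
  bs <- drop(b %*% s)
  b_new <- b - tcrossprod(bs) / sum(s * bs) + tcrossprod(y) / sy
  list(b = b_new, skipped = FALSE)
}
```

The updated matrix satisfies the secant condition B_{k+1} s = y, which is a convenient correctness check.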
Powell-Damped BFGS Update
Description
Applies Powell damping to ensure the curvature condition is satisfied, then performs a BFGS update. This guarantees an update even when the standard curvature condition y's > 0 fails.
Usage
update_bfgs_powell(b, s, y, damping_threshold = 0.2)
Arguments
b |
Current Hessian approximation (n x n symmetric matrix). |
s |
Step vector: s = x_new - x_old. |
y |
Gradient difference: y = g_new - g_old. |
damping_threshold |
Threshold for damping (default: 0.2). Damping applied if y's < damping_threshold * s'Bs. |
Details
Powell damping modifies y to y_damped = theta*y + (1-theta)*Bs such that s'*y_damped >= damping_threshold * s'*Bs, ensuring positive curvature.
The formula for theta when y's < damping_threshold * s'Bs is: theta = (1 - damping_threshold) * s'Bs / (s'Bs - y's)
This ensures s'*y_damped = damping_threshold * s'Bs exactly.
Value
A list with components:
b |
Updated Hessian approximation (n x n matrix). |
theta |
Damping parameter used (1.0 = no damping). |
y_damped |
The damped y vector used in update. |
skipped |
Logical indicating if update was skipped (only if s'Bs is too small to apply damping). |
References
Powell, M. J. D. (1978). A fast algorithm for nonlinearly constrained optimization calculations. Numerical Analysis, 144-157.
Algorithm 4a-hybrid in design/pseudocode.qmd
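The damping step can be sketched as follows (powell_damp is a hypothetical helper, not the package source; it computes theta and y_damped, after which the standard BFGS formula is applied with y_damped in place of y):

```r
# Sketch: Powell damping (illustrative). Blends y toward B s until the damped
# curvature s' y_damped meets damping_threshold * s' B s.
powell_damp <- function(b, s, y, damping_threshold = 0.2) {
  bs <- drop(b %*% s)
  sbs <- sum(s * bs)
  sy <- sum(s * y)
  theta <- if (sy < damping_threshold * sbs) {
    (1 - damping_threshold) * sbs / (sbs - sy)  # makes s'y_damped = threshold * s'Bs
  } else {
    1                                           # curvature already adequate
  }
  y_damped <- theta * y + (1 - theta) * bs
  list(theta = theta, y_damped = y_damped)
}
```

Substituting theta into s'y_damped = theta*s'y + (1 - theta)*s'Bs gives exactly damping_threshold * s'Bs, confirming the identity stated above.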
Update Flat-Ridge Detector State with One Iteration's Diagnostics
Description
Pushes a new record onto the sliding window and trims the oldest entry once the window is full.
Usage
update_flat_ridge_state(state, sigma, rho, g_inf, lambda_min)
Arguments
state |
Flat-ridge detector state list, as returned by its initializer. |
sigma |
Current regularization parameter. |
rho |
Current step-acceptance ratio. |
g_inf |
Current gradient infinity norm ||g||_inf. |
lambda_min |
Current smallest Hessian eigenvalue (may be negative for indefinite Hessians; pass NA if unavailable). |
Value
The updated state list.
Update Healthy-Basin Detector State with One Iteration's Diagnostics
Description
Pushes a new record onto the sliding window and trims the oldest entry once the window is full.
Usage
update_healthy_basin_state(state, used_newton, rho, lambda_min, g_inf)
Arguments
state |
Healthy-basin detector state list, as returned by its initializer. |
used_newton |
Logical; TRUE if the accepted step was a Newton step. |
rho |
Current step-acceptance ratio (NA if step was rejected). |
lambda_min |
Current smallest Hessian eigenvalue. |
g_inf |
Current gradient infinity norm ||g||_inf. |
Value
The updated state list.
Hybrid BFGS/SR1 Update with Automatic Routing
Description
Attempts BFGS first, falls back to SR1, then to Powell-damped BFGS. Provides robust quasi-Newton updates for both convex and nonconvex regions.
Usage
update_hybrid(
b,
s,
y,
bfgs_tol = 1e-10,
sr1_skip_tol = 1e-08,
skip_count = 0L,
restart_threshold = 5L
)
Arguments
b |
Current Hessian approximation (n x n symmetric matrix). |
s |
Step vector: s = x_new - x_old. |
y |
Gradient difference: y = g_new - g_old. |
bfgs_tol |
Tolerance for BFGS curvature condition (default: 1e-10). |
sr1_skip_tol |
Tolerance for SR1 skip test (default: 1e-8). |
skip_count |
Current count of consecutive skipped updates. |
restart_threshold |
Number of consecutive skips before restart (default: 5). |
Details
Routing priority:
BFGS if y's > bfgs_tol (stable, positive definite)
SR1 if |r's| >= sr1_skip_tol * ||r|| * ||s|| (allows indefiniteness)
Powell-damped BFGS otherwise (guaranteed update)
The hybrid approach is ideal for problems that transition between convex and nonconvex regions.
Value
A list with components:
b |
Updated Hessian approximation (n x n matrix). |
update_type |
Character: "bfgs", "sr1", "powell", or "skipped". |
skipped |
Logical indicating if no update was applied. |
skip_count |
Updated skip counter (reset on successful update). |
restarted |
Logical indicating if b was reset to scaled identity. |
theta |
Powell damping parameter (only if update_type = "powell"). |
References
Algorithm 4a-hybrid in design/pseudocode.qmd
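The routing priority can be sketched as a small dispatcher (route_update is a hypothetical helper, not the package source; it returns only the chosen branch, with the update formulas as documented in the BFGS and SR1 sections):

```r
# Sketch: routing priority for the hybrid update (illustrative).
route_update <- function(b, s, y, bfgs_tol = 1e-10, sr1_skip_tol = 1e-8) {
  sy <- sum(s * y)
  if (sy > bfgs_tol) return("bfgs")            # stable, positive definite update
  r <- y - drop(b %*% s)
  if (abs(sum(r * s)) >= sr1_skip_tol * sqrt(sum(r^2)) * sqrt(sum(s^2))) {
    return("sr1")                              # indefiniteness allowed
  }
  "powell"                                     # guaranteed damped update
}
```

A step with positive curvature routes to BFGS; negative curvature with a usable SR1 denominator routes to SR1; only when both tests fail does Powell damping take over.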
Hybrid Update with SR1-First Routing
Description
Reverse priority of update_hybrid: tries SR1 first (to preserve indefinite curvature information), then BFGS, then Powell-damped BFGS. Used by arcopt_qn when the current Hessian approximation is indefinite or when recent BFGS predictions have been inaccurate.
Usage
update_hybrid_sr1_first(
b,
s,
y,
bfgs_tol = 1e-10,
sr1_skip_tol = 1e-08,
skip_count = 0L,
restart_threshold = 5L
)
Arguments
b |
Current Hessian approximation (n x n symmetric matrix). |
s |
Step vector: s = x_new - x_old. |
y |
Gradient difference: y = g_new - g_old. |
bfgs_tol |
Tolerance for BFGS curvature condition (default: 1e-10). |
sr1_skip_tol |
Tolerance for SR1 skip test (default: 1e-8). |
skip_count |
Current count of consecutive skipped updates. |
restart_threshold |
Number of consecutive skips before restart (default: 5). |
Value
Same structure as update_hybrid.
Update Regularization Parameter (Classical CGT)
Description
Adaptively adjusts the regularization parameter sigma based on how well the cubic model predicted the actual function decrease. Follows the classical Cartis-Gould-Toint (2011) update strategy.
Usage
update_sigma_cgt(
sigma_current,
rho,
eta1 = 0.1,
eta2 = 0.9,
gamma1 = 0.5,
gamma2 = 2,
sigma_min = 1e-16,
sigma_max = 1e+16
)
Arguments
sigma_current |
Current regularization parameter (positive scalar) |
rho |
Ratio of actual to predicted reduction (scalar) |
eta1 |
Acceptance threshold (default: 0.1). Steps with rho >= eta1 are accepted. |
eta2 |
Very successful threshold (default: 0.9). Steps with rho >= eta2 trigger sigma decrease. |
gamma1 |
Sigma decrease factor for very successful steps (default: 0.5) |
gamma2 |
Sigma increase factor for unsuccessful steps (default: 2.0) |
sigma_min |
Minimum allowed sigma (default: 1e-16) |
sigma_max |
Maximum allowed sigma (default: 1e16) |
Details
The update strategy is:
- Very successful (rho >= eta2): Decrease sigma by factor gamma1 (trust model more)
- Moderately successful (eta1 <= rho < eta2): Keep sigma unchanged
- Unsuccessful (rho < eta1): Increase sigma by factor gamma2 (trust model less)
The ratio rho measures model quality: rho = (f(x) - f(x+s)) / pred_reduction. Values near 1 indicate excellent model accuracy; values near 0 or negative indicate poor model fit.
Value
Updated sigma value (scalar)
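The three cases translate directly into code; the sketch below is an illustrative reimplementation of the documented rules, not the package source:

```r
# Sketch: classical CGT sigma update (illustrative).
update_sigma_cgt <- function(sigma, rho, eta1 = 0.1, eta2 = 0.9,
                             gamma1 = 0.5, gamma2 = 2,
                             sigma_min = 1e-16, sigma_max = 1e16) {
  if (rho >= eta2) {
    max(sigma_min, gamma1 * sigma)      # very successful: trust model more
  } else if (rho >= eta1) {
    sigma                               # moderately successful: keep sigma
  } else {
    min(sigma_max, gamma2 * sigma)      # unsuccessful: trust model less
  }
}
```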
Update Regularization Parameter (Interpolation-Enhanced CGT)
Description
Adaptively adjusts sigma using interpolation to estimate local Lipschitz constant from model-function discrepancy. May provide faster adaptation on large-scale problems with varying curvature (Gould, Porcelli, Toint, 2012).
Usage
update_sigma_interpolation(
sigma_current,
rho,
s,
f_trial,
m_trial,
eta1 = 0.1,
eta2 = 0.9,
gamma1 = 0.5,
gamma2 = 2,
sigma_min = 1e-16,
sigma_max = 1e+16
)
Arguments
sigma_current |
Current regularization parameter (positive scalar) |
rho |
Ratio of actual to predicted reduction (scalar) |
s |
Current step vector |
f_trial |
Function value at trial point f(x + s) |
m_trial |
Model value at step m(s) |
eta1 |
Acceptance threshold (default: 0.1) |
eta2 |
Very successful threshold (default: 0.9) |
gamma1 |
Sigma decrease factor for very successful steps (default: 0.5) |
gamma2 |
Sigma increase factor for unsuccessful steps (default: 2.0) |
sigma_min |
Minimum allowed sigma (default: 1e-16) |
sigma_max |
Maximum allowed sigma (default: 1e16) |
Details
Estimates local Lipschitz constant from discrepancy between function and model: L_hat = 2 * |f(x+s) - m(s)| / ||s||^3
The update strategy incorporates this estimate:
- Very successful (rho >= eta2): sigma = max(L_hat/2, sigma/gamma1, sigma_min)
- Moderately successful (eta1 <= rho < eta2): Keep sigma unchanged
- Unsuccessful (rho < eta1): sigma = min(max(L_hat/2, gamma2*sigma), sigma_max)
This may provide better adaptation than Algorithm 2a (classical CGT) when curvature varies significantly across the domain.
Value
Updated sigma value (scalar)
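The sketch below mirrors the documented formulas verbatim (an illustrative reimplementation with a hypothetical name, not the package source):

```r
# Sketch: interpolation-enhanced sigma update (illustrative). Estimates a
# local Lipschitz constant from the model/function discrepancy.
update_sigma_interp <- function(sigma, rho, s, f_trial, m_trial,
                                eta1 = 0.1, eta2 = 0.9,
                                gamma1 = 0.5, gamma2 = 2,
                                sigma_min = 1e-16, sigma_max = 1e16) {
  L_hat <- 2 * abs(f_trial - m_trial) / sum(s^2)^1.5  # 2|f(x+s) - m(s)| / ||s||^3
  if (rho >= eta2) {
    max(L_hat / 2, sigma / gamma1, sigma_min)         # very successful
  } else if (rho >= eta1) {
    sigma                                             # moderately successful
  } else {
    min(max(L_hat / 2, gamma2 * sigma), sigma_max)    # unsuccessful
  }
}
```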
SR1 (Symmetric Rank-1) Hessian Update
Description
Updates the Hessian approximation using the SR1 formula. Unlike BFGS, SR1 can represent indefinite matrices, making it ideal for ARC which naturally handles negative curvature.
Usage
update_sr1(b, s, y, skip_tol = 1e-08, skip_count = 0L, restart_threshold = 5L)
Arguments
b |
Current Hessian approximation (n x n symmetric matrix). |
s |
Step vector: s = x_new - x_old. |
y |
Gradient difference: y = g_new - g_old. |
skip_tol |
Tolerance for skip test (default: 1e-8). Update is skipped if |r'*s| < skip_tol * ||r|| * ||s||. |
skip_count |
Current count of consecutive skipped updates. |
restart_threshold |
Number of consecutive skips before restart (default: 5). |
Details
The SR1 update formula is:
B_{k+1} = B_k + \frac{r r^T}{r^T s}
where r = y - B_k s.
The update is skipped when the denominator is too small relative to the norms of r and s, which prevents numerical instability. After too many consecutive skips, the approximation is reset using Barzilai-Borwein scaling.
Value
A list with components:
b |
Updated Hessian approximation (n x n matrix). |
skipped |
Logical indicating if update was skipped. |
skip_count |
Updated skip counter (reset to 0 if update applied or restart triggered). |
restarted |
Logical indicating if b was reset to scaled identity. |
References
Algorithm 4a in design/pseudocode.qmd
Conn, A. R., Gould, N. I., & Toint, P. L. (1991). Convergence of quasi-Newton matrices generated by the symmetric rank one update. Mathematical Programming, 50(1-3), 177-195.
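The update and skip test can be sketched as follows (an illustrative reimplementation without the restart logic, not the package source):

```r
# Sketch: SR1 update with the documented skip test (illustrative).
update_sr1 <- function(b, s, y, skip_tol = 1e-8) {
  r <- y - drop(b %*% s)
  rs <- sum(r * s)
  if (abs(rs) < skip_tol * sqrt(sum(r^2)) * sqrt(sum(s^2))) {
    return(list(b = b, skipped = TRUE))   # denominator too small: skip update
  }
  list(b = b + tcrossprod(r) / rs, skipped = FALSE)
}
```

As with BFGS, an applied SR1 update satisfies the secant condition B_{k+1} s = y, but B_{k+1} may be indefinite.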
Validate and Project Initial Point
Description
Validates bounds and projects initial point to feasible region if needed.
Usage
validate_and_project_initial(x0, lower, upper)
Arguments
x0 |
User-provided initial point |
lower |
Lower bounds |
upper |
Upper bounds |
Details
This implements Algorithm 7b from the design specification:
Validate bounds (lower <= upper, correct dimensions)
Check if x0 is feasible
Project to box if infeasible (with warning)
Value
Feasible initial point
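The three steps translate to a short sketch (an illustrative reimplementation; the real implementation may differ in its validation details and message wording):

```r
# Sketch: validate bounds, then clamp x0 into the box, warning on projection.
validate_and_project_initial <- function(x0, lower, upper) {
  stopifnot(length(lower) == length(x0), length(upper) == length(x0))
  if (any(lower > upper)) stop("each lower bound must not exceed its upper bound")
  x <- pmin(pmax(x0, lower), upper)      # project onto the box
  if (any(x != x0)) warning("initial point projected to feasible region")
  x
}
```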
Wolfe Line Search
Description
Performs a line search along a descent direction d from point x,
finding a step length alpha that satisfies the strong Wolfe conditions.
Implements the bracketing + zoom scheme from Nocedal & Wright (2006),
Algorithms 3.5 (bracketing) and 3.6 (zoom).
Usage
wolfe_line_search(
fn,
gr,
x,
d,
f_x,
g_x,
c1 = 1e-04,
c2 = 0.9,
alpha_max = 1,
max_iter = 20,
...
)
Arguments
fn |
Objective function. Called as fn(x, ...). |
gr |
Gradient function. Called as gr(x, ...). |
x |
Numeric vector; current iterate. |
d |
Numeric vector; search direction (must be a descent direction, i.e., g_x^T d < 0). |
f_x |
Current objective value fn(x). |
g_x |
Current gradient gr(x). |
c1 |
Armijo (sufficient decrease) constant; default 1e-4. |
c2 |
Curvature constant (strong Wolfe); default 0.9. |
alpha_max |
Maximum step length; default 1. |
max_iter |
Maximum total evaluations (bracketing + zoom); default 20. |
... |
Extra arguments forwarded to fn and gr. |
Details
Strong Wolfe conditions:
Armijo (sufficient decrease): f(x + \alpha d) \le f(x) + c_1 \alpha g_x^\top d
Curvature: |g(x + \alpha d)^\top d| \le c_2 |g_x^\top d|
The algorithm first brackets an interval containing a Wolfe point, then bisects within the bracket to locate one. Bisection is used for simplicity; quadratic or cubic interpolation as in Nocedal & Wright Sec. 3.5 can be added as a future refinement.
Returns success = FALSE when no strong Wolfe alpha is found within
max_iter evaluations. The caller can still consult alpha (and the
associated trial point) to decide whether to take the best-known step
or reject the iteration.
Value
A list with components:
success |
TRUE if strong Wolfe was satisfied; FALSE otherwise. |
alpha |
Final step length (best available even on failure). |
x_new |
Trial point x + alpha * d. |
f_new |
Objective value fn(x_new). |
g_new |
Gradient gr(x_new). |
evals_f |
Number of objective evaluations performed. |
evals_g |
Number of gradient evaluations performed. |
reason |
String describing success or failure mode. |
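The two conditions can be checked at a single trial step as follows (satisfies_strong_wolfe is an illustrative helper, not part of the package; wolfe_line_search searches for an alpha that makes such a check pass):

```r
# Sketch: test the strong Wolfe conditions at step length alpha (illustrative).
satisfies_strong_wolfe <- function(fn, gr, x, d, f_x, g_x, alpha,
                                   c1 = 1e-4, c2 = 0.9) {
  gd <- sum(g_x * d)                     # directional derivative at x (negative)
  x_new <- x + alpha * d
  armijo <- fn(x_new) <= f_x + c1 * alpha * gd          # sufficient decrease
  curvature <- abs(sum(gr(x_new) * d)) <= c2 * abs(gd)  # strong curvature
  armijo && curvature
}
```

On the quadratic f(x) = ||x||^2 / 2 with a steepest-descent direction, the exact minimizer alpha = 1 satisfies both conditions, while a tiny alpha passes Armijo but fails the curvature test.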