library(nabla)
#>
#> Attaching package: 'nabla'
#> The following objects are masked from 'package:stats':
#>
#> D, deriv

D operator

The D operator composes to arbitrary order.
D(f, x) gives the first derivative;
D(f, x, order = 2) gives the second;
D(f, x, order = 3) gives the third-order tensor; and so
on:
f <- function(x) x[1]^3
D(f, 2) # f'(2) = 3*4 = 12
#> [1] 12
D(f, 2, order = 2) # f''(2) = 6*2 = 12
#> [,1]
#> [1,] 12
D(f, 2, order = 3) # f'''(2) = 6
#> , , 1
#>
#> [,1]
#> [1,] 6
For multi-parameter functions, each application of D appends one
dimension to the output. A scalar function \(f: \mathbb{R}^n \to \mathbb{R}\)
produces a gradient vector (\(n\)), then a Hessian matrix (\(n \times n\)),
then a third-order tensor (\(n \times n \times n\)):
# Gradient of a 2-parameter function
f <- function(x) x[1]^2 * x[2]
D(f, c(3, 4)) # gradient: c(24, 9)
#> [1] 24 9
# Hessian via D^2
D(f, c(3, 4), order = 2)
#> [,1] [,2]
#> [1,] 8 6
#> [2,] 6 0
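# One more application of D (a sketch): order = 3 returns a 2 x 2 x 2
# array whose (i, j, k) entry is d^3 f / dx_i dx_j dx_k. For this f the
# only nonzero entries are the permutations of (1, 1, 2), each equal to 2.
D(f, c(3, 4), order = 3)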
# Composition works identically
Df <- D(f)
DDf <- D(Df)
DDf(c(3, 4))
#> [,1] [,2]
#> [1,] 8 6
#> [2,] 6 0
For vector-valued functions, D produces the
Jacobian:
g <- function(x) list(x[1] * x[2], x[1]^2 + x[2])
D(g, c(3, 4)) # 2x2 Jacobian: [[4, 3], [6, 1]]
#> [,1] [,2]
#> [1,] 4 3
#> [2,] 6 1
The convenience functions gradient(),
hessian(), and jacobian() are thin wrappers
around D:
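For example (a sketch: the wrapper signatures are assumed here to mirror D(f, x), with f and g as defined above):
gradient(f, c(3, 4)) # same result as D(f, c(3, 4)): c(24, 9)
hessian(f, c(3, 4)) # same result as D(f, c(3, 4), order = 2)
jacobian(g, c(3, 4)) # same result as D(g, c(3, 4)): the 2x2 Jacobian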
The curvature of a curve \(y = f(x)\) is:
\[\kappa = \frac{|f''(x)|}{(1 + f'(x)^2)^{3/2}}\]
With D(), we can compute this directly:
curvature <- function(f, x) {
  d1 <- D(f, x)
  d2 <- D(f, x, order = 2)
  abs(d2) / (1 + d1^2)^(3/2)
}
# Curvature of sin(x) at various points
# Wrap sin for D()'s x[1] convention
sin_f <- function(x) sin(x[1])
xs <- seq(0, 2 * pi, length.out = 7)
kappas <- sapply(xs, function(x) curvature(sin_f, x))
data.frame(x = round(xs, 3), curvature = round(kappas, 6))
#> x curvature
#> 1 0.000 0.000000
#> 2 1.047 0.619677
#> 3 2.094 0.619677
#> 4 3.142 0.000000
#> 5 4.189 0.619677
#> 6 5.236 0.619677
#> 7 6.283 0.000000
The table reveals the geometry of \(\sin(x)\): curvature is zero at
inflection points (integer multiples of \(\pi\), where \(\sin''(x) = 0\))
and peaks at \(\pi/2\) and \(3\pi/2\), where \(\sin\) reaches its extrema.
The maximum curvature of 1.0 reflects the unit amplitude: for \(A\sin(x)\)
the maximum curvature would be \(A\).
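A quick check of the amplitude claim, reusing curvature() from above (sin2_f is a name introduced here for illustration):
sin2_f <- function(x) 2 * sin(x[1])
curvature(sin2_f, pi / 2) # expect 2: |f''| = 2 and f' = 0 at the extremum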
Curvature peaks where the second derivative is largest in magnitude and the first derivative is small. For \(\sin(x)\) this occurs at \(\pi/2\) and \(3\pi/2\), where \(|\sin''(x)| = 1\) and \(\sin'(x) = 0\):
xs_curve <- seq(0, 2 * pi, length.out = 200)
sin_vals <- sin(xs_curve)
kappa_vals <- sapply(xs_curve, function(x) curvature(sin_f, x))
oldpar <- par(mar = c(4, 4.5, 2, 4.5))
plot(xs_curve, sin_vals, type = "l", col = "steelblue", lwd = 2,
xlab = "x", ylab = "sin(x)",
main = "sin(x) and its curvature")
par(new = TRUE)
plot(xs_curve, kappa_vals, type = "l", col = "firebrick", lwd = 2, lty = 2,
axes = FALSE, xlab = "", ylab = "")
axis(4, col.axis = "firebrick")
mtext(expression(kappa(x)), side = 4, line = 2.5, col = "firebrick")
abline(v = c(pi / 2, 3 * pi / 2), col = "grey60", lty = 3)
legend("bottom",
legend = c("sin(x)", expression(kappa(x))),
col = c("steelblue", "firebrick"), lty = c(1, 2), lwd = 2,
bty = "n", horiz = TRUE)The curvature reaches its maximum of 1.0 at \(x = \pi/2\) (where \(\sin'(x) = 0\) and \(|\sin''(x)| = 1\)) and returns to zero at integer multiples of \(\pi\) (where \(\sin''(x) = 0\)).
A second-order Taylor approximation of \(f\) around \(x_0\):
\[f(x) \approx f(x_0) + f'(x_0)(x - x_0) + \frac{1}{2} f''(x_0)(x - x_0)^2\]
D() computes this directly:
taylor2 <- function(f, x0, x) {
  f0 <- f(x0)
  f1 <- D(f, x0)
  f2 <- D(f, x0, order = 2)
  f0 + f1 * (x - x0) + 0.5 * f2 * (x - x0)^2
}
# Approximate exp(x) near x = 0
f_exp <- function(x) exp(x[1])
xs <- c(-0.1, -0.01, 0, 0.01, 0.1)
data.frame(
x = xs,
exact = exp(xs),
taylor2 = sapply(xs, function(x) taylor2(f_exp, 0, x)),
error = exp(xs) - sapply(xs, function(x) taylor2(f_exp, 0, x))
)
#> x exact taylor2 error
#> 1 -0.10 0.9048374 0.90500 -1.625820e-04
#> 2 -0.01 0.9900498 0.99005 -1.662508e-07
#> 3 0.00 1.0000000 1.00000 0.000000e+00
#> 4 0.01 1.0100502 1.01005 1.670842e-07
#> 5 0.10 1.1051709 1.10500 1.709181e-04
Near the expansion point, the approximation is accurate. The error grows
as \(O(|x - x_0|^3)\) because the Taylor remainder is
\(\frac{f'''(\xi)}{6}(x - x_0)^3\). For \(\exp(x)\) around \(x_0 = 0\),
\(f'''(\xi) = \exp(\xi) \approx 1\), so the error at \(x = 0.1\) is
approximately \(0.1^3/6 \approx 1.7 \times 10^{-4}\), and at \(x = 0.01\)
it drops to \(\approx 1.7 \times 10^{-7}\): a factor-of-1000 reduction for
a factor-of-10 decrease in distance, confirming cubic convergence.
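This ratio can be read directly off the error column above:
# Errors from the table: x = 0.1 vs x = 0.01
1.709181e-04 / 1.670842e-07 # ~1023, close to the predicted factor of 10^3
Plotting the exact function against the approximation shows the same local behavior: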
xs_plot <- seq(-2, 3, length.out = 300)
exact_vals <- exp(xs_plot)
taylor_vals <- sapply(xs_plot, function(x) taylor2(f_exp, 0, x))
oldpar <- par(mar = c(4, 4, 2, 1))
plot(xs_plot, exact_vals, type = "l", col = "steelblue", lwd = 2,
xlab = "x", ylab = "y",
main = expression("exp(x) vs 2nd-order Taylor at " * x[0] == 0),
ylim = c(-1, 12))
lines(xs_plot, taylor_vals, col = "firebrick", lwd = 2, lty = 2)
abline(v = 0, col = "grey60", lty = 3)
points(0, 1, pch = 19, col = "grey40", cex = 1.2)
legend("topleft",
legend = c("exp(x)", expression("Taylor " * T[2](x))),
col = c("steelblue", "firebrick"), lty = c(1, 2), lwd = 2,
bty = "n")The Taylor polynomial matches exp(x) closely near \(x_0 = 0\) but diverges at larger distances,
illustrating the local nature of polynomial approximation.
hessian() computes the exact second-derivative matrix at
any point. For maximum likelihood estimation, this gives the
observed information matrix \(J(\hat\theta) = -H(\hat\theta)\), whose
inverse is the asymptotic variance-covariance matrix. Exact Hessians
yield exact standard errors and confidence intervals — the primary
reason to use AD in statistical inference.
Consider a Poisson log-likelihood for \(\lambda\):
set.seed(123)
data_pois <- rpois(50, lambda = 3)
n <- length(data_pois)
sum_x <- sum(data_pois)
sum_lfact <- sum(lfactorial(data_pois))
# Poisson log-likelihood in lambda, with the data terms precomputed above
ll_poisson <- function(theta) {
  lambda <- theta[1]
  sum_x * log(lambda) - n * lambda - sum_lfact
}
lambda0 <- 2.5
Primary approach: Using hessian():
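A sketch of the call that produces hess_helper, assuming hessian() takes the log-likelihood and the evaluation point:
hess_helper <- hessian(ll_poisson, lambda0)
hess_helper # -sum_x / lambda0^2 = -24.8, a 1x1 matrix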
The observed information is simply -hess_helper, and the
standard error is 1 / sqrt(-hess_helper):
se <- 1 / sqrt(-hess_helper[1, 1])
cat("SE(lambda) at lambda =", lambda0, ":", se, "\n")
#> SE(lambda) at lambda = 2.5 : 0.2008048
Verification: Manual construction of a second-order
dual number — the same arithmetic that hessian() performs
internally:
# Build a dual_variable_n wrapped in a dual_vector
manual_theta <- dual_vector(list(dual_variable_n(lambda0, 2)))
result_manual <- ll_poisson(manual_theta)
manual_hess <- deriv(deriv(result_manual))
manual_hess
#> [1] -24.8
Both approaches yield the same result:
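For instance, comparing the manual result against the matrix computed by hessian():
manual_hess - hess_helper[1, 1] # ~0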
Under the hood, D(f, x, order = 2) uses nested duals: a dual number whose
components are themselves dual numbers. The structure
dual(dual(x, 1), dual(1, 0)) simultaneously tracks the value \(f(x)\), the
first derivative \(f'(x)\), and the second derivative \(f''(x)\).
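As an illustrative sketch, the same structure can be built from the two-argument dual() constructor named above (assuming it is exported), mirroring the manual verification earlier:
x_nested <- dual(dual(2, 1), dual(1, 0)) # represents x = 2 to second order
res <- x_nested^3
deriv(deriv(res)) # f''(2) = 12, matching D(f, 2, order = 2)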
Nested duals can be constructed directly with
dual_variable_n():
x <- dual_variable_n(2, order = 2)
# Evaluate x^3
result <- x^3
# Extract all three quantities
deriv_n(result, 0) # f(2) = 8
#> [1] 8
deriv_n(result, 1) # f'(2) = 3*4 = 12
#> [1] 12
deriv_n(result, 2) # f''(2) = 6*2 = 12
#> [1] 12
For constants in higher-order computations, use
dual_constant_n():
k <- dual_constant_n(5, order = 2)
deriv_n(k, 0) # 5
#> [1] 5
deriv_n(k, 1) # 0
#> [1] 0
deriv_n(k, 2) # 0
#> [1] 0
The differentiate_n() helper wraps the construction and
extraction:
# Differentiate sin(x) at x = pi/4
result <- differentiate_n(sin, pi / 4, order = 2)
result$value # sin(pi/4)
#> [1] 0.7071068
result$d1 # cos(pi/4) = f'
#> [1] 0.7071068
result$d2 # -sin(pi/4) = f''
#> [1] -0.7071068
Verify against known values:
result$value - sin(pi / 4) # ~0
#> [1] 0
result$d1 - cos(pi / 4) # ~0
#> [1] 0
result$d2 - (-sin(pi / 4)) # ~0
#> [1] 0
More complex functions work too:
f <- function(x) x * exp(-x^2)
d2 <- differentiate_n(f, 1, order = 2)
# Analytical: f'(x) = exp(-x^2)(1 - 2x^2)
# f''(x) = exp(-x^2)(-6x + 4x^3)
analytical_f1 <- exp(-1) * (1 - 2)
analytical_f2 <- exp(-1) * (-6 + 4)
d2$d1 - analytical_f1 # ~0
#> [1] 0
d2$d2 - analytical_f2 # ~0
#> [1] 0
These low-level functions are useful for understanding how
nabla works internally, but for most applications,
D(), gradient(), and hessian()
are the recommended interface.
The D operator provides exact higher-order derivatives through a single
composable interface: no symbolic differentiation, no finite-difference
grid, and no loss of precision.
- D(f, x, order = 2) gives the exact second derivative needed for
curvature analysis, Taylor expansion, and Newton-Raphson optimization.
- hessian() directly yields the observed information matrix. Its inverse
gives the asymptotic variance-covariance matrix, the foundation of
confidence intervals and hypothesis tests in likelihood-based inference.
- D(f, x, order = 3) yields third-order tensors for asymptotic skewness
analysis (see vignette("mle-skewness")); higher orders work the same way.
- D constructs nested dual numbers at each order level. The same
arithmetic that propagates first derivatives recursively propagates
second, third, and higher derivatives.