
Higher-Order Derivatives

library(nabla)
#> 
#> Attaching package: 'nabla'
#> The following objects are masked from 'package:stats':
#> 
#>     D, deriv

The D operator

The D operator composes to arbitrary order. D(f, x) gives the first derivative; D(f, x, order = 2) gives the second; D(f, x, order = 3) gives the third-order tensor; and so on:

f <- function(x) x[1]^3

D(f, 2)                # f'(2) = 3*4 = 12
#> [1] 12
D(f, 2, order = 2)     # f''(2) = 6*2 = 12
#>      [,1]
#> [1,]   12
D(f, 2, order = 3)     # f'''(2) = 6
#> , , 1
#> 
#>      [,1]
#> [1,]    6

For multi-parameter functions, each application of D appends one dimension to the output. A scalar function \(f: \mathbb{R}^n \to \mathbb{R}\) produces a gradient vector (\(n\)), then a Hessian matrix (\(n \times n\)), then a third-order tensor (\(n \times n \times n\)):

# Gradient of a 2-parameter function
f <- function(x) x[1]^2 * x[2]
D(f, c(3, 4))  # gradient: c(24, 9)
#> [1] 24  9

# Hessian via D^2
D(f, c(3, 4), order = 2)
#>      [,1] [,2]
#> [1,]    8    6
#> [2,]    6    0

# Composition works identically
Df <- D(f)
DDf <- D(Df)
DDf(c(3, 4))
#>      [,1] [,2]
#> [1,]    8    6
#> [2,]    6    0
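
The composition extends to third order as well, yielding the \(n \times n \times n\) array described above. A minimal sketch (output omitted) for the same two-parameter function, whose only nonzero third partial derivatives are \(\partial^3 f / \partial x_1^2 \, \partial x_2 = 2\):

# Third-order tensor for f(x) = x[1]^2 * x[2]: a 2 x 2 x 2 array with
# entries 2 wherever two indices refer to x[1] and one to x[2], 0 elsewhere
D(f, c(3, 4), order = 3)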

For vector-valued functions, D produces the Jacobian:

g <- function(x) list(x[1] * x[2], x[1]^2 + x[2])
D(g, c(3, 4))  # 2x2 Jacobian: [[4, 3], [6, 1]]
#>      [,1] [,2]
#> [1,]    4    3
#> [2,]    6    1

The convenience functions gradient(), hessian(), and jacobian() are thin wrappers around D:

gradient(f, c(3, 4))           # == D(f, c(3, 4))
#> [1] 24  9
hessian(f, c(3, 4))            # == D(f, c(3, 4), order = 2)
#>      [,1] [,2]
#> [1,]    8    6
#> [2,]    6    0
jacobian(g, c(3, 4))           # == D(g, c(3, 4))
#>      [,1] [,2]
#> [1,]    4    3
#> [2,]    6    1

Curvature analysis

The curvature of a curve \(y = f(x)\) is:

\[\kappa = \frac{|f''(x)|}{(1 + f'(x)^2)^{3/2}}\]

With D(), we can compute this directly:

curvature <- function(f, x) {
  d1 <- D(f, x)
  d2 <- D(f, x, order = 2)
  abs(d2) / (1 + d1^2)^(3/2)
}

# Curvature of sin(x) at various points
# Wrap sin for D()'s x[1] convention
sin_f <- function(x) sin(x[1])
xs <- seq(0, 2 * pi, length.out = 7)
kappas <- sapply(xs, function(x) curvature(sin_f, x))
data.frame(x = round(xs, 3), curvature = round(kappas, 6))
#>       x curvature
#> 1 0.000  0.000000
#> 2 1.047  0.619677
#> 3 2.094  0.619677
#> 4 3.142  0.000000
#> 5 4.189  0.619677
#> 6 5.236  0.619677
#> 7 6.283  0.000000

The table traces the geometry of \(\sin(x)\): curvature vanishes at the inflection points (integer multiples of \(\pi\), where \(\sin''(x) = 0\)) and peaks at \(\pi/2\) and \(3\pi/2\), where \(\sin\) reaches its extrema (the sampled points happen to skip these peaks). The maximum curvature of 1.0 reflects the unit amplitude; for \(A\sin(x)\) the maximum curvature would be \(A\), as checked below.

At \(x = \pi/2\), \(\sin\) has maximum curvature because \(|\sin''(\pi/2)| = 1\) and \(\sin'(\pi/2) = 0\):

curvature(sin_f, pi / 2)  # should be 1.0
#>      [,1]
#> [1,]    1
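
The amplitude claim can be checked the same way. A minimal sketch reusing curvature(), with an arbitrary amplitude (A = 2 is an illustrative choice, not taken from the text above):

# Curvature of A * sin(x) at x = pi/2 should equal A
A <- 2
scaled_sin <- function(x) A * sin(x[1])
curvature(scaled_sin, pi / 2)   # expected: 2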

Visualizing curvature along sin(x)

Curvature peaks where the second derivative is largest in magnitude and the first derivative is small. For \(\sin(x)\), this occurs at \(\pi/2\) and \(3\pi/2\):

xs_curve <- seq(0, 2 * pi, length.out = 200)
sin_vals <- sin(xs_curve)
kappa_vals <- sapply(xs_curve, function(x) curvature(sin_f, x))

oldpar <- par(mar = c(4, 4.5, 2, 4.5))
plot(xs_curve, sin_vals, type = "l", col = "steelblue", lwd = 2,
     xlab = "x", ylab = "sin(x)",
     main = "sin(x) and its curvature")
par(new = TRUE)
plot(xs_curve, kappa_vals, type = "l", col = "firebrick", lwd = 2, lty = 2,
     axes = FALSE, xlab = "", ylab = "")
axis(4, col.axis = "firebrick")
mtext(expression(kappa(x)), side = 4, line = 2.5, col = "firebrick")
abline(v = c(pi / 2, 3 * pi / 2), col = "grey60", lty = 3)
legend("bottom",
       legend = c("sin(x)", expression(kappa(x))),
       col = c("steelblue", "firebrick"), lty = c(1, 2), lwd = 2,
       bty = "n", horiz = TRUE)

par(oldpar)

The curvature reaches its maximum of 1.0 at \(x = \pi/2\) (where \(\sin'(x) = 0\) and \(|\sin''(x)| = 1\)) and returns to zero at integer multiples of \(\pi\) (where \(\sin''(x) = 0\)).

Taylor expansion

A second-order Taylor approximation of \(f\) around \(x_0\):

\[f(x) \approx f(x_0) + f'(x_0)(x - x_0) + \frac{1}{2} f''(x_0)(x - x_0)^2\]

The required derivatives come straight from D():

taylor2 <- function(f, x0, x) {
  f0 <- f(x0)
  f1 <- D(f, x0)
  f2 <- D(f, x0, order = 2)
  f0 + f1 * (x - x0) + 0.5 * f2 * (x - x0)^2
}

# Approximate exp(x) near x = 0
f_exp <- function(x) exp(x[1])
xs <- c(-0.1, -0.01, 0, 0.01, 0.1)
data.frame(
  x = xs,
  exact = exp(xs),
  taylor2 = sapply(xs, function(x) taylor2(f_exp, 0, x)),
  error = exp(xs) - sapply(xs, function(x) taylor2(f_exp, 0, x))
)
#>       x     exact taylor2         error
#> 1 -0.10 0.9048374 0.90500 -1.625820e-04
#> 2 -0.01 0.9900498 0.99005 -1.662508e-07
#> 3  0.00 1.0000000 1.00000  0.000000e+00
#> 4  0.01 1.0100502 1.01005  1.670842e-07
#> 5  0.10 1.1051709 1.10500  1.709181e-04

Near the expansion point, the approximation is accurate. The error grows as \(O(|x - x_0|^3)\) because the Taylor remainder is \(\frac{f'''(\xi)}{6}(x - x_0)^3\). For \(\exp(x)\) around \(x_0 = 0\), \(f'''(\xi) = \exp(\xi) \approx 1\), so the error at \(x = 0.1\) is approximately \(0.1^3/6 \approx 1.7 \times 10^{-4}\), and at \(x = 0.01\) it drops to \(\approx 1.7 \times 10^{-7}\): a factor-of-1000 reduction for a factor-of-10 decrease in distance, confirming cubic convergence. The plot below compares \(\exp(x)\) with its second-order Taylor polynomial over a wider range:

xs_plot <- seq(-2, 3, length.out = 300)
exact_vals <- exp(xs_plot)
taylor_vals <- sapply(xs_plot, function(x) taylor2(f_exp, 0, x))

oldpar <- par(mar = c(4, 4, 2, 1))
plot(xs_plot, exact_vals, type = "l", col = "steelblue", lwd = 2,
     xlab = "x", ylab = "y",
     main = expression("exp(x) vs 2nd-order Taylor at " * x[0] == 0),
     ylim = c(-1, 12))
lines(xs_plot, taylor_vals, col = "firebrick", lwd = 2, lty = 2)
abline(v = 0, col = "grey60", lty = 3)
points(0, 1, pch = 19, col = "grey40", cex = 1.2)
legend("topleft",
       legend = c("exp(x)", expression("Taylor " * T[2](x))),
       col = c("steelblue", "firebrick"), lty = c(1, 2), lwd = 2,
       bty = "n")

par(oldpar)

The Taylor polynomial matches exp(x) closely near \(x_0 = 0\) but diverges at larger distances, illustrating the local nature of polynomial approximation.
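
As a quick numerical check of the cubic error scaling discussed above, a sketch reusing taylor2() and f_exp (the helper err() is introduced here only for illustration):

# Ratio of Taylor errors at x = 0.1 and x = 0.01; cubic scaling predicts ~10^3
err <- function(x) exp(x) - taylor2(f_exp, 0, x)
err(0.1) / err(0.01)   # roughly 1000 (the table above gives about 1023)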

Connection to MLE: exact Hessians for standard errors

hessian() computes the exact second-derivative matrix at any point. For maximum likelihood estimation, this gives the observed information matrix \(J(\hat\theta) = -H(\hat\theta)\), whose inverse is the asymptotic variance-covariance matrix. Exact Hessians yield exact standard errors and confidence intervals — the primary reason to use AD in statistical inference.

Consider a Poisson log-likelihood for \(\lambda\):

set.seed(123)
data_pois <- rpois(50, lambda = 3)
n <- length(data_pois)
sum_x <- sum(data_pois)
sum_lfact <- sum(lfactorial(data_pois))

ll_poisson <- function(theta) {
  lambda <- theta[1]
  sum_x * log(lambda) - n * lambda - sum_lfact
}

lambda0 <- 2.5

Primary approach: Using hessian():

hess_helper <- hessian(ll_poisson, lambda0)
hess_helper
#>       [,1]
#> [1,] -24.8

The observed information is simply -hess_helper, and the standard error is 1 / sqrt(-hess_helper):

se <- 1 / sqrt(-hess_helper[1, 1])
cat("SE(lambda) at lambda =", lambda0, ":", se, "\n")
#> SE(lambda) at lambda = 2.5 : 0.2008048

Verification: Manual construction of a second-order dual number — the same arithmetic that hessian() performs internally:

# Build a dual_variable_n wrapped in a dual_vector
manual_theta <- dual_vector(list(dual_variable_n(lambda0, 2)))
result_manual <- ll_poisson(manual_theta)
manual_hess <- deriv(deriv(result_manual))
manual_hess
#> [1] -24.8

Both approaches yield the same result:

hess_helper[1, 1] - manual_hess  # ~0
#> [1] 0
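
The same computation is usually carried out at the maximum likelihood estimate itself. For this Poisson model \(\hat\lambda = \bar{x}\), and the analytic standard error \(\sqrt{\hat\lambda / n}\) is available for comparison; a brief sketch:

# Observed information and SE at the MLE lambda_hat = mean(data_pois);
# analytically, SE(lambda_hat) = sqrt(lambda_hat / n) for this model
lambda_hat <- mean(data_pois)
se_hat <- 1 / sqrt(-hessian(ll_poisson, lambda_hat)[1, 1])
se_hat - sqrt(lambda_hat / n)   # ~0

For multi-parameter likelihoods the same idea applies in matrix form: the asymptotic variance-covariance matrix is solve(-hessian(ll, theta_hat)).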

How it works: nested duals

Under the hood, D(f, x, order = 2) uses nested duals: a dual number whose components are themselves dual numbers. The structure dual(dual(x, 1), dual(1, 0)) simultaneously tracks the function value, the first derivative, and the second derivative in a single evaluation.
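
To see the propagation concretely, write the inner dual with \(\varepsilon_1\) and the outer dual with \(\varepsilon_2\) (this \(\varepsilon\)-notation is introduced here only for illustration; \(\varepsilon_1^2 = \varepsilon_2^2 = 0\)). Evaluating \(x^3\) at \(x = 2\):

\[\bigl((2 + \varepsilon_1) + (1 + 0\,\varepsilon_1)\,\varepsilon_2\bigr)^3 = (8 + 12\,\varepsilon_1) + (12 + 12\,\varepsilon_1)\,\varepsilon_2\]

The constant term 8 is \(f(2)\), the \(\varepsilon_1\) (and \(\varepsilon_2\)) coefficient 12 is \(f'(2)\), and the \(\varepsilon_1 \varepsilon_2\) coefficient 12 is \(f''(2) = 6 \cdot 2\); these are exactly the three quantities extracted below.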

Nested duals can be constructed directly with dual_variable_n():

x <- dual_variable_n(2, order = 2)

# Evaluate x^3
result <- x^3

# Extract all three quantities
deriv_n(result, 0)  # f(2) = 8
#> [1] 8
deriv_n(result, 1)  # f'(2) = 3*4 = 12
#> [1] 12
deriv_n(result, 2)  # f''(2) = 6*2 = 12
#> [1] 12

For constants in higher-order computations, use dual_constant_n():

k <- dual_constant_n(5, order = 2)
deriv_n(k, 0)  # 5
#> [1] 5
deriv_n(k, 1)  # 0
#> [1] 0
deriv_n(k, 2)  # 0
#> [1] 0
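
A short sketch mixing the two (reusing x and k from above, and assuming the arithmetic operators combine variable and constant duals, as the examples here imply): the constant contributes no derivatives of its own, so for g(x) = 5 * x^2 we expect g'(2) = 20 and g''(2) = 10:

g <- k * x^2       # 5 * x^2, with x tracked as a variable and 5 as a constant
deriv_n(g, 1)      # expected: 20
deriv_n(g, 2)      # expected: 10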

The differentiate_n() helper wraps the construction and extraction:

# Differentiate sin(x) at x = pi/4
result <- differentiate_n(sin, pi / 4, order = 2)
result$value   # sin(pi/4)
#> [1] 0.7071068
result$d1      # cos(pi/4) = f'
#> [1] 0.7071068
result$d2      # -sin(pi/4) = f''
#> [1] -0.7071068

Verify against known values:

result$value - sin(pi / 4)     # ~0
#> [1] 0
result$d1 - cos(pi / 4)        # ~0
#> [1] 0
result$d2 - (-sin(pi / 4))     # ~0
#> [1] 0

More complex functions work too:

f <- function(x) x * exp(-x^2)
d2 <- differentiate_n(f, 1, order = 2)

# Analytical: f'(x) = exp(-x^2)(1 - 2x^2)
# f''(x) = exp(-x^2)(-6x + 4x^3)
analytical_f1 <- exp(-1) * (1 - 2)
analytical_f2 <- exp(-1) * (-6 + 4)
d2$d1 - analytical_f1   # ~0
#> [1] 0
d2$d2 - analytical_f2   # ~0
#> [1] 0

These low-level functions are useful for understanding how nabla works internally, but for most applications, D(), gradient(), and hessian() are the recommended interface.

Summary

The D operator provides exact higher-order derivatives through a single composable interface: no symbolic differentiation, no finite-difference grid, and no loss of precision. gradient(), hessian(), and jacobian() wrap the same machinery for the common first- and second-order cases, while dual_variable_n(), dual_constant_n(), and differentiate_n() expose the underlying nested-dual construction directly.
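
As a small illustration of the precision claim, a sketch comparing D() with a central finite difference for \(\sin'(1) = \cos(1)\) (the step size h = 1e-6 is an arbitrary illustrative choice):

h <- 1e-6
fd <- (sin(1 + h) - sin(1 - h)) / (2 * h)   # finite-difference estimate
ad <- D(sin_f, 1)                           # dual-number derivative
fd - cos(1)   # small truncation/round-off error
ad - cos(1)   # should be 0: the dual arithmetic applies cos() directly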
