This vignette explains nls14, an R package intended to bring the R function nls() up to date and to extend the symbolic and automatic derivative tools in R.

A particular goal in nls14 is to use analytic or automatic derivatives wherever possible, since nls() uses a rather weak forward derivative approximation. A second objective is to use a Marquardt stabilization of the Gauss-Newton equations to avoid the commonly encountered “singular gradient” failure of nls(), which refers to a loss of rank of the Jacobian at the parameters used for evaluation. The particular stabilization also incorporates a very simple trick to avoid very small diagonal elements of the Jacobian inner product, though in the present implementation this is accomplished indirectly. See the section Implementation of optimization methods below.

A large part of the work of this package was carried out by Duncan Murdoch. Without his input, much of the capability of the package would not exist.

The package and this vignette are a work in progress, and assistance and examples are welcome. Note that there is similar work in the package Deriv (Andrew Clausen and Serguei Sokol (2015), Symbolic Differentiation, version 2.0, http://CRAN.R-project.org/package=Deriv).

Summary of capabilities

Functions in nls14

coef.nls14

Extracts and displays the coefficients of a model estimated by nlxb() or nlfb() and stored in an nls14 structured object.

Deriv, fnDeriv, newDeriv

Functions Deriv and fnDeriv are designed as replacements for the stats package functions D and deriv respectively, though the argument lists do not match exactly.

findSubexprs

This function finds common subexpressions in an expression vector so that duplicate computation can be avoided.

model2rjfun, model2ssgrfun, modelexpr

These functions create functions to evaluate residuals or sums of squares at particular parameter locations.

model2rjfunx, model2ssgrfunx

These functions create functions to evaluate residuals or sums of squares at particular parameter locations. The attempt is to have dot-args (…) available for passing extra data. This is experimental.

modeldoc, print.modeldoc

Outputs a description of the model, and of the data used at the time of its creation, to the console and optionally to a file. The purpose of this function is to provide a record of the details underlying the function.

nlfb

Given a nonlinear model expressed as a function to compute the residuals (and optionally one to compute the Jacobian) and a start vector where the parameters are named, attempts to find the minimum of the residual sum of squares using the Nash variant (Nash, 1979) of the Marquardt algorithm, where the linear sub-problem is solved by a qr method.

nls14fb

As for nlfb: given a nonlinear model expressed as functions to compute the residuals and (optionally) the Jacobian, and a start vector where the parameters are named, attempts to find the minimum of the residual sum of squares using the Nash variant (Nash, 1979) of the Marquardt algorithm, where the linear sub-problem is solved by a qr method.

nls14xb

Given a nonlinear model expressed as an expression of the form lhs ~ formula_for_rhs and a start vector where parameters used in the model formula are named, attempts to find the minimum of the residual sum of squares using the Nash variant (Nash, 1979) of the Marquardt algorithm, where the linear sub-problem is solved by a qr method.

print.nls14

Print summary output (but involving some serious computations!) of an object of class nls14 from nls14xb or nls14fb of package nls14.

resgr

For a nonlinear model originally expressed as an expression of the form lhs ~ formula_for_rhs, assume we have a resfn and jacfn that compute the residuals and the Jacobian at a set of parameters. This routine computes the gradient, that is, t(Jacobian) %*% residuals.

resss

For a nonlinear model originally expressed as an expression of the form lhs ~ formula_for_rhs, assume we have a resfn and jacfn that compute the residuals and the Jacobian at a set of parameters. This routine computes the residual sum of squares, that is, sum(residuals^2).
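To make these two quantities concrete, here is a minimal base-R sketch (not the package code) computing both from a resfn and jacfn pair as described above:

grad_and_ss <- function(prm, resfn, jacfn) {
  r <- resfn(prm)  # residual vector at the parameters
  J <- jacfn(prm)  # Jacobian matrix at the parameters
  list(gradient = as.vector(crossprod(J, r)),  # t(J) %*% r (a factor of 2 appears if sum(r^2) is differentiated formally)
       ss = sum(r^2))                          # the residual sum of squares
}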

newSimplification, Simplify, sysSimplifications, isFALSE, isZERO, isONE, isMINUSONE

Functions to simplify expressions. Simplify simplifies expressions according to rules specified by newSimplification and stored in sysSimplifications.

summary.nls14

Provide a summary output (but involving some serious computations!) of an object of class nls14 from nls14xb or nls14fb from package nls14.

wrap14nls

Given a nonlinear model expressed as an expression of the form

lhs ~ formula_for_rhs

and a start vector where parameters used in the model formula are named, attempts to find the minimum of the residual sum of squares using the Nash variant (Nash, 1979) of the Marquardt algorithm, where the linear sub-problem is solved by a qr method.

Missing capabilities

Multi-line functions

Multi-line functions present a challenge. This is in part because the chain rule may have to be applied backwards (last line first), but also because there may be structures that are not always differentiable, such as if statements.
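For example, differentiating the two-statement function below requires applying the chain rule from the last line backwards through the intermediate value (a sketch of the difficulty, not something the current tools handle):

f2 <- function(x) {
  u <- x^2   # intermediate quantity from the first line
  sin(u)     # last line: df2/dx = cos(u) * du/dx = cos(x^2) * 2 * x
}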

Vector parameters

We would like to be able to find the Jacobian or gradient of functions that have as their parameters a vector, e.g., prm. At the time of writing (January 2015), we cannot specify such a vector within nls14. The following script shows things that work or fail.

# tryhobbsderiv.R

hobbs.res<-function(x){ # Hobbs weeds problem -- residual
  # This variant uses looping
#  if(length(x) != 3) stop("hobbs.res -- parameter vector n!=3")
  y<-c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558, 50.156, 62.948,
       75.995, 91.972)
  t<-1:12
#  if(abs(12*x[3])>50) {
#    res<-rep(Inf,12)
#  } else {
    res<-x[1]/(1+x[2]*exp(-x[3]*t)) - y
#  }
}
# test it
start1 <- c(200, 50, .3)
cat("residuals at start1:")
## residuals at start1:
r1 <- hobbs.res(start1)
print(r1)
##  [1] -0.05050236 -0.20779506 -0.26086802 -0.41247546 -0.61690702
##  [6] -1.60525418 -3.36423901 -2.43016453 -4.28734016 -5.63079076
## [11] -5.67542775 -7.44779537
require(nls14)
## Loading required package: nls14
Jr1a <- try(fnDeriv(r1, "x"))


y<-c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558, 50.156, 62.948,
     75.995, 91.972)
t <- 1:12
hobbs1 <- function(x){ res<-x[1]/(1+x[2]*exp(-x[3]*t)) - y }
hobbs1m <- function(x){ res<-x001/(1+x002*exp(-x003*t)) - y }
Jr11m <- try(fnDeriv(hobbs1m, c("x001", "x002", "x003")))
hobbs1me <- function(x){ expression(x001/(1+x002*exp(-x003*t)) - y) }
Jr11me <- try(fnDeriv(hobbs1me, c("x001", "x002", "x003")))
Jr11ex <- try(fnDeriv(expression(x001/(1+x002*exp(-x003*t)) - y)
                  , c("x001", "x002", "x003")))
Jr11ex
## {
##     .expr1 <- -x003
##     .expr2 <- .expr1 * t
##     .expr3 <- exp(.expr2)
##     .expr4 <- x002 * .expr3
##     .expr5 <- 1 + .expr4
##     .expr6 <- .expr5^2
##     .value <- x001/(.expr5) - y
##     .grad <- array(0, c(length(.value), 2L), list(NULL, c("x001", 
##     "x002", "x003")))
##     .grad[, "x001"] <- 1/.expr5
##     .grad[, "x002"] <- -(x001 * .expr3/.expr6)
##     .grad[, "x003"] <- -(x001 * (x002 * (.expr3 * -t))/.expr6)
##     attr(.value, "gradient") <- .grad
##     .value
## }
x001 <- start1[1]
x002 <- start1[2]
x003 <- start1[3]
print(hobbs1m(start1)) # start1 actually ignored
##  [1] -0.05050236 -0.20779506 -0.26086802 -0.41247546 -0.61690702
##  [6] -1.60525418 -3.36423901 -2.43016453 -4.28734016 -5.63079076
## [11] -5.67542775 -7.44779537
print(eval(hobbs1me(start1))) # start1 actually ignored
##  [1] -0.05050236 -0.20779506 -0.26086802 -0.41247546 -0.61690702
##  [6] -1.60525418 -3.36423901 -2.43016453 -4.28734016 -5.63079076
## [11] -5.67542775 -7.44779537
print(try(eval(Jr11ex)))
## [1] "Error in array(0, c(length(.value), 2L), list(NULL, c(\"x001\", \"x002\",  : \n  length of 'dimnames' [2] not equal to array extent\n"
## attr(,"class")
## [1] "try-error"
## attr(,"condition")
## <simpleError in array(0, c(length(.value), 2L), list(NULL, c("x001", "x002", "x003"))): length of 'dimnames' [2] not equal to array extent>
resx <- expression(x001/(1+x002*exp(-x003*t)) - y)
res1 <- Deriv(resx, "x001", do_substitute=FALSE)
res1
## expression(1/(1 + x002 * exp(-x003 * t)))
col1 <- eval(res1)
res2 <- Deriv(resx, "x002", do_substitute=FALSE)
res2
## expression(-(x001 * exp(-x003 * t)/(1 + x002 * exp(-x003 * t))^2))
col2 <- eval(res2)
res3 <- Deriv(resx, "x003", do_substitute=FALSE)
res3
## expression(-(x001 * (x002 * (exp(-x003 * t) * -t))/(1 + x002 * 
##     exp(-x003 * t))^2))
col3 <- eval(res3)
hobJac <- cbind(col1, col2, col3)
print(hobJac)
##             col1       col2       col3
##  [1,] 0.02628749 -0.1023858   5.119291
##  [2,] 0.03516102 -0.1356989  13.569891
##  [3,] 0.04688566 -0.1787496  26.812437
##  [4,] 0.06226762 -0.2335615  46.712293
##  [5,] 0.08226046 -0.3019747  75.493681
##  [6,] 0.10793373 -0.3851362 115.540847
##  [7,] 0.14039380 -0.4827335 168.956738
##  [8,] 0.18063918 -0.5920347 236.813864
##  [9,] 0.22934330 -0.7069798 318.140911
## [10,] 0.28658605 -0.8178179 408.908969
## [11,] 0.35159786 -0.9119072 501.548971
## [12,] 0.42262102 -0.9760500 585.629985
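As a cross-check, a numerical Jacobian should reproduce these columns. Here is a sketch using the numDeriv package (an assumption: numDeriv is installed; it is not part of nls14):

Jnum <- numDeriv::jacobian(hobbs1, start1)
print(max(abs(Jnum - hobJac)))  # should be near zero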

Jacobians

Jacobians, the matrices of partial derivatives of the residuals with respect to the parameters, are functions of a vector input, and so run into the vector-parameter limitation just described.

Implementation of optimization methods

Nonlinear least squares

Nonlinear least squares methods are mostly founded on one or another variant of the Gauss-Newton algorithm. The function we wish to minimize is the sum of squares of the (nonlinear) residuals r(x), where there are m observations (elements of r) and n parameters x. Hence the function is

f(x) = sum(r(k)^2)

Newton’s method starts with an original set of parameters x[0]. At a given iteration, which could be the first, the parameters are updated by

x[k+1]  =  x[k]  -   H^(-1) g

where H is the Hessian and g is the gradient at x[k]. We can rewrite this as the solution, at each iteration, of

H delta = -g

with

x[k+1]  =  x[k] + delta
     

For the particular sum of squares, the gradient is

g(x) = 2 * J' r

and

H(x) = 2 ( J' J  +  sum(r * Z))

where J is the Jacobian (first derivatives of r w.r.t. x) and Z is the tensor of second derivatives of r w.r.t. x. Note that J’ is the transpose of J.

The primary simplification of the Gauss-Newton method is to assume that the second term above is negligible. As there is a common factor of 2 on each side of the Newton iteration after the simplification of the Hessian, the Gauss-Newton iteration equation is

J' J  delta = - J' r
       

This iteration frequently fails. The approximation of the Hessian by the Jacobian inner-product is one reason, but there is also the possibility that the sum of squares function is not “quadratic” enough that the unit step reduces it. Hartley introduced a line search along delta, while Marquardt suggested replacing J’ J with (J’ J + lambda * D) where D is a diagonal matrix intended to partially approximate the omitted portion of the Hessian.

Marquardt suggested D = I (a unit matrix) or D = (diagonal part of J’ J). With the former choice, when lambda is large enough the iteration is essentially

delta = - g / lambda

that is, a version of the steepest descents algorithm. Using the diagonal of J’ J, we have a scaled version of this (see http://en.wikipedia.org/wiki/Levenberg%E2%80%93Marquardt_algorithm).

Nash (1977) ??ref?? found that on low precision machines, it was common for diagonal elements of J’ J to underflow. A very small modification avoids this: solve

(J' J + lambda * (D + phi * I)) * delta = - g

where phi is a small number (phi = 1 seems to work quite well ??check??).

In Nash (1979), the iteration equation was solved as stated. However, this involves forming the matrix of sums of squares and cross-products J’ J, a process that loses some numerical precision. A better way to solve the linear equations is to apply the QR decomposition to the matrix J itself. However, we still need to incorporate the lambda * I or lambda * D adjustments. This is done by adding rows to J that are the square roots of the “pieces”: one row for each diagonal element of I and each diagonal element of D.

In each iteration, we reduce the lambda parameter before solution. If the resulting sum of squares is not reduced, lambda is increased; otherwise we move on to the next iteration.
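Here is a minimal sketch (not the package implementation) of one such stabilized step: it solves (J' J + lambda * (D + phi * I)) delta = - J' r by a QR factorization of J augmented with the square roots of the “pieces”:

marq_step <- function(J, r, lambda, phi = 1) {
  n <- ncol(J)
  dd <- sqrt(lambda * (diag(crossprod(J)) + phi))  # square roots of the lambda * (D + phi * I) diagonal
  Jaug <- rbind(J, diag(dd, nrow = n))             # one appended row per parameter
  raug <- c(r, rep(0, n))
  qr.solve(Jaug, -raug)                            # delta from the augmented least-squares problem
}

Minimizing ||Jaug delta + raug||^2 reproduces the stabilized normal equations without ever forming J’ J explicitly.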

Providing extra data to expressions

Almost all statistical functions have exogenous data that is needed to compute residuals or likelihoods and is not dependent on the model parameters. (This section starts from Notes140806.)

model2rjfun does NOT have … args.

Should it? Consider, for example, a problem where we are fitting a set of time series, one for each plant or animal, with some sort of starting parameter for each that is NOT estimated (e.g., pH of soil, some index of health).

The difficulty in such a problem is that the residuals then form a matrix, and nlfb rather than nlxb is the better approach. However, fitting a single series would still need this data, and the example nls14tryextraparms.txt shows that the extra parameter (ms in this case) needs to be in the user’s globalenv.

rm(list=ls())
require(nls14)
# want to have data AND extra parameters (NOT to be estimated)
traceval  <-  TRUE  # traceval set TRUE to debug or give full history
# Data for Hobbs problem
ydat  <-  c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 
          38.558, 50.156, 62.948, 75.995, 91.972) # for testing
tdat  <-  seq_along(ydat) # for testing
# A simple starting vector -- must have named parameters for nls14xb, nls, wrap14nls.
start1  <-  c(b1=1, b2=1, b3=1)
startf1  <-  c(b1=1, b2=1, b3=.1)
eunsc  <-   y ~ b1/(1+b2*exp(-b3*tt))
cat("LOCAL DATA IN DATA FRAMES\n")
## LOCAL DATA IN DATA FRAMES
weeddata1  <-  data.frame(y=ydat, tt=tdat)
cat("weeddata contains the original data\n")
## weeddata contains the original data
ms <- 2 # define the external parameter here
cat("wdata scales y by ms =",ms,"\n")
## wdata scales y by ms = 2
wdata <- data.frame(y=ydat/ms, tt=tdat)
wdata
##          y tt
## 1   2.6540  1
## 2   3.6200  2
## 3   4.8190  3
## 4   6.4330  4
## 5   8.5345  5
## 6  11.5960  6
## 7  15.7215  7
## 8  19.2790  8
## 9  25.0780  9
## 10 31.4740 10
## 11 37.9975 11
## 12 45.9860 12
cat("estimate the UNSCALED model with SCALED data\n")
## estimate the UNSCALED model with SCALED data
anls14xbs  <-  try(nls14xb(eunsc, start=start1, trace=traceval, data=wdata))
## formula: y ~ b1/(1 + b2 * exp(-b3 * tt))
## lower:[1] -Inf -Inf -Inf
## upper:[1] Inf Inf Inf
## $watch
## [1] FALSE
## 
## $phi
## [1] 1
## 
## $lamda
## [1] 1e-04
## 
## $offset
## [1] 100
## 
## $laminc
## [1] 10
## 
## $lamdec
## [1] 4
## 
## $femax
## [1] 10000
## 
## $jemax
## [1] 5000
## 
## $rofftest
## [1] TRUE
## 
## $smallsstest
## [1] TRUE
## 
## vn:[1] "y"  "b1" "b2" "b3" "tt"
## datvar:[1] "y"  "tt"
## Data variable  y : [1]  2.6540  3.6200  4.8190  6.4330  8.5345 11.5960 15.7215 19.2790
##  [9] 25.0780 31.4740 37.9975 45.9860
## Data variable  tt : [1]  1  2  3  4  5  6  7  8  9 10 11 12
## Finished masks check
## model2rjfun: modelformula = y ~ b1/(1 + b2 * exp(-b3 * tt))
## [1] "formula"
## trjfn:
## function (prm) 
## {
##     if (is.null(names(prm))) 
##         names(prm) <- names(pvec)
##     localdata <- list2env(as.list(prm), parent = data)
##     eval(residexpr, envir = localdata)
## }
## <environment: 0x150b258>
## Starting pnum=b1 b2 b3 
##  1  1  1 
## Start:lamda: 1e-04  SS= 5676.925  at  b1 = 1  b2 = 1  b3 = 1  1 / 0
## lamda: 0.001  SS= 6088.946  at  b1 = 25.44327  b2 = -119.8584  b3 = -185.954  2 / 1
## lamda: 0.01  SS= 6088.946  at  b1 = 25.09223  b2 = -94.84981  b3 = -166.6058  3 / 1
## lamda: 0.1  SS= 6088.946  at  b1 = 23.49226  b2 = -12.1639  b3 = -96.94824  4 / 1
## lamda: 1  SS= 6088.946  at  b1 = 19.10491  b2 = 15.51374  b3 = -31.7677  5 / 1
## lamda: 10  SS= 6077.975  at  b1 = 9.663924  b2 = 2.341445  b3 = -1.074466  6 / 1
## <<lamda: 4  SS= 5096.507  at  b1 = 2.508867  b2 = 0.9465915  b3 = 1.140924  7 / 1
## <<lamda: 1.6  SS= 4086.732  at  b1 = 5.539492  b2 = 1.127669  b3 = 0.9866082  8 / 2
## lamda: 16  SS= 5979.157  at  b1 = 10.70911  b2 = 2.436093  b3 = -0.3579844  9 / 3
## <<lamda: 6.4  SS= 3871.108  at  b1 = 6.275743  b2 = 1.194974  b3 = 0.9504393  10 / 3
## <<lamda: 2.56  SS= 3427.063  at  b1 = 7.913975  b2 = 1.462101  b3 = 0.7695198  11 / 4
## <<lamda: 1.024  SS= 2714.043  at  b1 = 11.24279  b2 = 2.292268  b3 = 0.4124356  12 / 5
## <<lamda: 0.4096  SS= 1624.322  at  b1 = 17.58076  b2 = 3.978355  b3 = 0.5133315  13 / 6
## <<lamda: 0.16384  SS= 1595.968  at  b1 = 25.77752  b2 = 7.844312  b3 = 0.2536126  14 / 7
## <<lamda: 0.065536  SS= 389.9944  at  b1 = 36.5797  b2 = 11.35334  b3 = 0.4014417  15 / 8
## <<lamda: 0.0262144  SS= 227.356  at  b1 = 47.30675  b2 = 19.47462  b3 = 0.3253081  16 / 9
## <<lamda: 0.01048576  SS= 30.76802  at  b1 = 61.38048  b2 = 30.70084  b3 = 0.3549442  17 / 10
## <<lamda: 0.004194304  SS= 7.535561  at  b1 = 71.37593  b2 = 39.43433  b3 = 0.3435968  18 / 11
## <<lamda: 0.001677722  SS= 2.40805  at  b1 = 80.80069  b2 = 44.72922  b3 = 0.3344339  19 / 12
## <<lamda: 0.0006710886  SS= 1.188571  at  b1 = 88.88428  b2 = 47.00714  b3 = 0.3237746  20 / 13
## <<lamda: 0.0002684355  SS= 0.7263783  at  b1 = 94.69062  b2 = 48.29672  b3 = 0.3169928  21 / 14
## <<lamda: 0.0001073742  SS= 0.6497016  at  b1 = 97.37788  b2 = 48.92225  b3 = 0.3142756  22 / 15
## <<lamda: 4.294967e-05  SS= 0.6468358  at  b1 = 98.02182  b2 = 49.07495  b3 = 0.3136437  23 / 16
## <<lamda: 1.717987e-05  SS= 0.6468194  at  b1 = 98.08991  b2 = 49.09088  b3 = 0.3135732  24 / 17
## <<lamda: 6.871948e-06  SS= 0.6468193  at  b1 = 98.09306  b2 = 49.09162  b3 = 0.3135698  25 / 18
print(anls14xbs)
## nls14 class object: x 
## residual sumsquares =  0.64682  on  12 observations
##     after  18    Jacobian and  25 function evaluations
##   name            coeff          SE       tstat      pval      gradient    JSingval   
## b1               98.0931         5.653      17.35  3.164e-08  -1.337e-07       505.4  
## b2               49.0916         1.688      29.08  3.281e-10   3.017e-08      0.2344  
## b3               0.31357      0.006863      45.69  5.768e-12  -1.398e-05     0.04632
escal <-  y ~ ms*b1/(1+b2*exp(-b3*tt))
cat("estimate the SCALED model with scaling provided in the call (ms=0.5)\n")
## estimate the SCALED model with scaling provided in the call (ms=0.5)
anls14xbh  <-  try(nls14xb(escal, start=start1, trace=traceval, data=weeddata1, ms=0.5))
## formula: y ~ ms * b1/(1 + b2 * exp(-b3 * tt))
## lower:[1] -Inf -Inf -Inf
## upper:[1] Inf Inf Inf
## $watch
## [1] FALSE
## 
## $phi
## [1] 1
## 
## $lamda
## [1] 1e-04
## 
## $offset
## [1] 100
## 
## $laminc
## [1] 10
## 
## $lamdec
## [1] 4
## 
## $femax
## [1] 10000
## 
## $jemax
## [1] 5000
## 
## $rofftest
## [1] TRUE
## 
## $smallsstest
## [1] TRUE
## 
## vn:[1] "y"  "ms" "b1" "b2" "b3" "tt"
## datvar:[1] "y"  "ms" "tt"
## Data variable  y : [1]  5.308  7.240  9.638 12.866 17.069 23.192 31.443 38.558 50.156 62.948
## [11] 75.995 91.972
## Data variable  ms :[1] 2
## Data variable  tt : [1]  1  2  3  4  5  6  7  8  9 10 11 12
## Finished masks check
## model2rjfun: modelformula = y ~ ms * b1/(1 + b2 * exp(-b3 * tt))
## [1] "formula"
## trjfn:
## function (prm) 
## {
##     if (is.null(names(prm))) 
##         names(prm) <- names(pvec)
##     localdata <- list2env(as.list(prm), parent = data)
##     eval(residexpr, envir = localdata)
## }
## <environment: 0x2e30208>
## Starting pnum=b1 b2 b3 
##  1  1  1
print(anls14xbh)
## [1] "Error in resfn(pnum, ...) : unused argument (ms = 0.5)\n"
## attr(,"class")
## [1] "try-error"
## attr(,"condition")
## <simpleError in resfn(pnum, ...): unused argument (ms = 0.5)>
cat("\n scaling is now using the globally defined value of ms=",ms,"\n")
## 
##  scaling is now using the globally defined value of ms= 2
anls14xb1a  <-  try(nls14xb(escal, start=start1, trace=traceval, data=weeddata1))
## formula: y ~ ms * b1/(1 + b2 * exp(-b3 * tt))
## lower:[1] -Inf -Inf -Inf
## upper:[1] Inf Inf Inf
## $watch
## [1] FALSE
## 
## $phi
## [1] 1
## 
## $lamda
## [1] 1e-04
## 
## $offset
## [1] 100
## 
## $laminc
## [1] 10
## 
## $lamdec
## [1] 4
## 
## $femax
## [1] 10000
## 
## $jemax
## [1] 5000
## 
## $rofftest
## [1] TRUE
## 
## $smallsstest
## [1] TRUE
## 
## vn:[1] "y"  "ms" "b1" "b2" "b3" "tt"
## datvar:[1] "y"  "ms" "tt"
## Data variable  y : [1]  5.308  7.240  9.638 12.866 17.069 23.192 31.443 38.558 50.156 62.948
## [11] 75.995 91.972
## Data variable  ms :[1] 2
## Data variable  tt : [1]  1  2  3  4  5  6  7  8  9 10 11 12
## Finished masks check
## model2rjfun: modelformula = y ~ ms * b1/(1 + b2 * exp(-b3 * tt))
## [1] "formula"
## trjfn:
## function (prm) 
## {
##     if (is.null(names(prm))) 
##         names(prm) <- names(pvec)
##     localdata <- list2env(as.list(prm), parent = data)
##     eval(residexpr, envir = localdata)
## }
## <environment: 0x231a3d8>
## Starting pnum=b1 b2 b3 
##  1  1  1 
## Start:lamda: 1e-04  SS= 22708  at  b1 = 1  b2 = 1  b3 = 1  1 / 0
## lamda: 0.001  SS= 24356  at  b1 = 25.472  b2 = -122.14  b3 = -187.69  2 / 1
## lamda: 0.01  SS= 24356  at  b1 = 25.327  b2 = -113.2  b3 = -180.68  3 / 1
## lamda: 0.1  SS= 24356  at  b1 = 24.283  b2 = -57.588  b3 = -135.68  4 / 1
## lamda: 1  SS= 24356  at  b1 = 20.254  b2 = 14.461  b3 = -54.127  5 / 1
## lamda: 10  SS= 24355  at  b1 = 10.077  b2 = 4.8657  b3 = -4.5699  6 / 1
## <<lamda: 4  SS= 20252  at  b1 = 2.5977  b2 = 0.83113  b3 = 1.4117  7 / 1
## <<lamda: 1.6  SS= 16143  at  b1 = 5.7092  b2 = 1.3529  b3 = 0.86529  8 / 2
## lamda: 16  SS= 22536  at  b1 = 11.269  b2 = 3.0589  b3 = -0.12239  9 / 3
## <<lamda: 6.4  SS= 15214  at  b1 = 6.5093  b2 = 1.428  b3 = 0.86138  10 / 3
## <<lamda: 2.56  SS= 13336  at  b1 = 8.2714  b2 = 1.7659  b3 = 0.74306  11 / 4
## <<lamda: 1.024  SS= 10290  at  b1 = 11.801  b2 = 2.7987  b3 = 0.46561  12 / 5
## <<lamda: 0.4096  SS= 5999.3  at  b1 = 18.477  b2 = 4.9891  b3 = 0.46298  13 / 6
## <<lamda: 0.16384  SS= 3360.8  at  b1 = 27.609  b2 = 9.7076  b3 = 0.35379  14 / 7
## <<lamda: 0.065536  SS= 872.75  at  b1 = 40.302  b2 = 16.934  b3 = 0.38655  15 / 8
## <<lamda: 0.026214  SS= 308.99  at  b1 = 51.624  b2 = 26.628  b3 = 0.36281  16 / 9
## <<lamda: 0.010486  SS= 64.882  at  b1 = 63.651  b2 = 36.557  b3 = 0.35778  17 / 10
## <<lamda: 0.0041943  SS= 19.992  at  b1 = 73.76  b2 = 43.123  b3 = 0.3463  18 / 11
## <<lamda: 0.0016777  SS= 8.8066  at  b1 = 83.053  b2 = 46.013  b3 = 0.33222  19 / 12
## <<lamda: 0.00067109  SS= 4.1734  at  b1 = 91.086  b2 = 47.556  b3 = 0.32103  20 / 13
## <<lamda: 0.00026844  SS= 2.7218  at  b1 = 96.072  b2 = 48.622  b3 = 0.31554  21 / 14
## <<lamda: 0.00010737  SS= 2.5891  at  b1 = 97.8  b2 = 49.023  b3 = 0.31386  22 / 15
## <<lamda: 4.295e-05  SS= 2.5873  at  b1 = 98.074  b2 = 49.087  b3 = 0.31359  23 / 16
## <<lamda: 1.718e-05  SS= 2.5873  at  b1 = 98.093  b2 = 49.091  b3 = 0.31357  24 / 17
print(anls14xb1a)
## nls14 class object: x 
## residual sumsquares =  2.5873  on  12 observations
##     after  17    Jacobian and  24 function evaluations
##   name            coeff          SE       tstat      pval      gradient    JSingval   
## b1               98.0925         5.651      17.36  3.153e-08  -9.046e-06        1011  
## b2               49.0915         1.688      29.09   3.27e-10   6.839e-06      0.4687  
## b3               0.31357      0.006863      45.69  5.769e-12   -0.003104     0.09268
ms <- 1
cat("\n scaling is now using the globally re-defined value of ms=",ms,"\n")
## 
##  scaling is now using the globally re-defined value of ms= 1
anls14xb1b  <-  try(nls14xb(escal, start=start1, trace=traceval, data=weeddata1))
## formula: y ~ ms * b1/(1 + b2 * exp(-b3 * tt))
## lower:[1] -Inf -Inf -Inf
## upper:[1] Inf Inf Inf
## $watch
## [1] FALSE
## 
## $phi
## [1] 1
## 
## $lamda
## [1] 1e-04
## 
## $offset
## [1] 100
## 
## $laminc
## [1] 10
## 
## $lamdec
## [1] 4
## 
## $femax
## [1] 10000
## 
## $jemax
## [1] 5000
## 
## $rofftest
## [1] TRUE
## 
## $smallsstest
## [1] TRUE
## 
## vn:[1] "y"  "ms" "b1" "b2" "b3" "tt"
## datvar:[1] "y"  "ms" "tt"
## Data variable  y : [1]  5.308  7.240  9.638 12.866 17.069 23.192 31.443 38.558 50.156 62.948
## [11] 75.995 91.972
## Data variable  ms :[1] 1
## Data variable  tt : [1]  1  2  3  4  5  6  7  8  9 10 11 12
## Finished masks check
## model2rjfun: modelformula = y ~ ms * b1/(1 + b2 * exp(-b3 * tt))
## [1] "formula"
## trjfn:
## function (prm) 
## {
##     if (is.null(names(prm))) 
##         names(prm) <- names(pvec)
##     localdata <- list2env(as.list(prm), parent = data)
##     eval(residexpr, envir = localdata)
## }
## <environment: 0x2d36710>
## Starting pnum=b1 b2 b3 
##  1  1  1 
## Start:lamda: 1e-04  SS= 23521  at  b1 = 1  b2 = 1  b3 = 1  1 / 0
## lamda: 0.001  SS= 24356  at  b1 = 50.886  b2 = -240.72  b3 = -372.91  2 / 1
## lamda: 0.01  SS= 24356  at  b1 = 50.183  b2 = -190.69  b3 = -334.2  3 / 1
## lamda: 0.1  SS= 24356  at  b1 = 46.97  b2 = -25.309  b3 = -194.81  4 / 1
## lamda: 1  SS= 24356  at  b1 = 38.096  b2 = 29.924  b3 = -64.262  5 / 1
## lamda: 10  SS= 24353  at  b1 = 18.798  b2 = 3.551  b3 = -2.9011  6 / 1
## <<lamda: 4  SS= 21065  at  b1 = 4.1015  b2 = 0.86688  b3 = 1.3297  7 / 1
## <<lamda: 1.6  SS= 16852  at  b1 = 10.248  b2 = 1.2302  b3 = 1.0044  8 / 2
## lamda: 16  SS= 24078  at  b1 = 20.944  b2 = 2.9473  b3 = -0.40959  9 / 3
## <<lamda: 6.4  SS= 15934  at  b1 = 11.771  b2 = 1.3084  b3 = 0.98087  10 / 3
## <<lamda: 2.56  SS= 14050  at  b1 = 15.152  b2 = 1.6468  b3 = 0.80462  11 / 4
## <<lamda: 1.024  SS= 10985  at  b1 = 22.005  b2 = 2.6632  b3 = 0.45111  12 / 5
## <<lamda: 0.4096  SS= 6427.1  at  b1 = 35.128  b2 = 4.7135  b3 = 0.50974  13 / 6
## <<lamda: 0.16384  SS= 4725  at  b1 = 52.285  b2 = 9.2948  b3 = 0.31348  14 / 7
## <<lamda: 0.065536  SS= 1132.2  at  b1 = 76.446  b2 = 15.142  b3 = 0.41095  15 / 8
## <<lamda: 0.026214  SS= 518.76  at  b1 = 96.924  b2 = 24.422  b3 = 0.35816  16 / 9
## <<lamda: 0.010486  SS= 79.464  at  b1 = 122.18  b2 = 35.262  b3 = 0.36484  17 / 10
## <<lamda: 0.0041943  SS= 26.387  at  b1 = 141.55  b2 = 42.387  b3 = 0.35215  18 / 11
## <<lamda: 0.0016777  SS= 11.339  at  b1 = 160.02  b2 = 45.519  b3 = 0.33738  19 / 12
## <<lamda: 0.00067109  SS= 5.2767  at  b1 = 176.92  b2 = 47.037  b3 = 0.32443  20 / 13
## <<lamda: 0.00026844  SS= 2.9656  at  b1 = 189.12  b2 = 48.279  b3 = 0.31712  21 / 14
## <<lamda: 0.00010737  SS= 2.6003  at  b1 = 194.72  b2 = 48.92  b3 = 0.31429  22 / 15
## <<lamda: 4.295e-05  SS= 2.5873  at  b1 = 196.04  b2 = 49.075  b3 = 0.31364  23 / 16
## <<lamda: 1.718e-05  SS= 2.5873  at  b1 = 196.18  b2 = 49.091  b3 = 0.31357  24 / 17
## <<lamda: 6.8719e-06  SS= 2.5873  at  b1 = 196.19  b2 = 49.092  b3 = 0.31357  25 / 18
print(anls14xb1b)
## nls14 class object: x 
## residual sumsquares =  2.5873  on  12 observations
##     after  18    Jacobian and  25 function evaluations
##   name            coeff          SE       tstat      pval      gradient    JSingval   
## b1               196.186         11.31      17.35  3.164e-08  -2.663e-07        1011  
## b2               49.0916         1.688      29.08  3.281e-10    1.59e-07      0.4605  
## b3               0.31357      0.006863      45.69  5.768e-12  -5.531e-05     0.04715

Derivatives table

Note that this derivatives table is presented by example. We would like to have as many examples of different ways of using the tools as possible, in order to provide a good understanding of the tools and their appropriate usage. In particular, we want to know of errors. We compare the derivative tools Deriv() and fnDeriv() with the stats package tools D(), deriv() and deriv3().

See specific notes either in comments or at the end.

library(nls14)

# Various derivatives 

new <- fnDeriv(quote(1 + x + y), c("x", "y"))
old <- deriv(quote(1 + x + y), c("x", "y"))
print(new)
## {
##     .value <- 1 + x + y
##     .grad <- array(0, c(length(.value), 2L), list(NULL, c("x", 
##     "y")))
##     .grad[, "x"] <- 1
##     .grad[, "y"] <- 1
##     attr(.value, "gradient") <- .grad
##     .value
## }
# Following generates a very long line on output of knitr (for markdown)
class(new)
## [1] "{"
str(new)
##  language {  .value <- 1 + x + y; .grad <- array(0, c(length(.value), 2L), list(NULL, c("x", "y"; ))); .grad[, "x"] <- 1; .grad[, "y"] <- 1; attr(.value, "gradient") <- .grad; .value }
as.expression(new)
## expression({
##     .value <- 1 + x + y
##     .grad <- array(0, c(length(.value), 2L), list(NULL, c("x", 
##     "y")))
##     .grad[, "x"] <- 1
##     .grad[, "y"] <- 1
##     attr(.value, "gradient") <- .grad
##     .value
## })
print(old)
## expression({
##     .value <- 1 + x + y
##     .grad <- array(0, c(length(.value), 2L), list(NULL, c("x", 
##         "y")))
##     .grad[, "x"] <- 1
##     .grad[, "y"] <- 1
##     attr(.value, "gradient") <- .grad
##     .value
## })
class(old)
## [1] "expression"
str(old)
##   expression({     .value <- 1 + x + y     .grad <- array(0, c(length(.value), 2L), list(NULL, c("x", "y")))     .grad[, "x"] <- 1     .grad[, "y"] <- 1     attr(.value, "gradient") <- .grad     .value })

Unfortunately, the inputs and outputs are not always easily transformed so that the symbolic derivatives can be found. (?? Need to codify this and provide filters so we can get things to work nicely.)

As an example, how could we take object new and embed it in a function we can then use in R? We can certainly copy and paste the output into a function template, as follows,

fnfromnew <- function(x,y){
    .value <- 1 + x + y
    .grad <- array(0, c(length(.value), 2L), list(NULL, c("x", 
    "y")))
    .grad[, "x"] <- 1
    .grad[, "y"] <- 1
    attr(.value, "gradient") <- .grad
    .value
}

print(fnfromnew(3,5))
## [1] 9
## attr(,"gradient")
##      x y
## [1,] 1 1

However, we would ideally like to be able to automate this, so as to generate functions and gradients for nonlinear least squares and optimization calculations. The same criticism applies to the object old.
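One possible route (a sketch, not current package functionality): deriv() in the stats package can return a function directly via its function.arg argument, and for a language object like new, as.function() can assemble a function from an argument list plus the generated body.

oldfn <- deriv(~ 1 + x + y, c("x", "y"), function.arg = c("x", "y"))  # stats route
newfn <- as.function(c(alist(x = , y = ), list(new)))  # formals x, y with body `new`
print(newfn(3, 5))  # should match fnfromnew(3, 5) above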

Another issue:

If we have x and y set such that the function is not admissible, then both our old and new functions give a gradient that is seemingly reasonable. While the gradient of this simple function could be considered to be defined for ANY values of x and y, I (JN) am sure most users would wish for a warning at the very least in such cases.

x <- NA
y <- Inf
print(eval(new))
## [1] NA
## attr(,"gradient")
##      x y
## [1,] 1 1
print(eval(old))
## [1] NA
## attr(,"gradient")
##      x y
## [1,] 1 1
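A simple guard (a sketch of what such a warning might look like, not package functionality) could wrap the evaluation and complain about non-finite results:

evalChecked <- function(dexpr, env = parent.frame()) {
  val <- eval(dexpr, env)
  if (any(!is.finite(val))) warning("function value is not finite at these inputs")
  val
}
print(evalChecked(new))  # warns here, since x is NA and y is Inf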

SafeD

We could define a way to avoid the issue of character vs. expression (and possibly other classes) as follows:

safeD <- function(obj, var) {
   # safeguarded D() function for symbolic derivs
   if (! is.character(var) ) stop("The variable var MUST be character type")
   if (is.character(obj) ) {
       eobj <- parse(text=obj)
       result <- D(eobj, var)
   } else {
       result <- D(obj, var)
   }
   result  # return the derivative explicitly
}

lxy2 <- expression(log(x+y^2))
clxy2 <- "log(x+y^2)"
print(D(clxy2, "y"))
## [1] NA
print(D(lxy2, "y"))
## 2 * y/(x + y^2)
print(safeD(clxy2, "y"))
## 2 * y/(x + y^2)
print(safeD(lxy2, "y"))
## 2 * y/(x + y^2)

Derivatives table - 2

## Try different ways to supply the log function
aDeriv <- Deriv(log(x), "x")
class(aDeriv)
## [1] "call"
aDeriv
## 1/x
aderiv <- deriv( ~ log(x), "x")
class(aderiv)
## [1] "expression"
aderiv
## expression({
##     .value <- log(x)
##     .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
##     .grad[, "x"] <- 1/x
##     attr(.value, "gradient") <- .grad
##     .value
## })
aD <- D(expression(log(x)), "x")
class(aD)
## [1] "call"
aD
## 1/x
cat("but \n")
## but
D( "~ log(x)", "x") # fails -- gives NA rather than expected answer due to quotes
## [1] NA
try(D( ~ log(x), "x"))
interm <- ~ log(x)
interm
## ~log(x)
class(interm)
## [1] "formula"
interme <- as.expression(interm)
class(interme)
## [1] "expression"
try(D(interme, "x"))
try(deriv(interme, "x"))
try(deriv(interm, "x"))
## expression({
##     .value <- log(x)
##     .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
##     .grad[, "x"] <- 1/x
##     attr(.value, "gradient") <- .grad
##     .value
## })
Deriv(log(x, base=3), "x" ) # OK
## 1/(x * 1.09861228866811)
try(D(expression(log(x, base=3)), "x" )) # fails - only single-argument calls supported
try(deriv(~ log(x, base=3), "x" )) # fails - only single-argument calls supported
try(deriv(expression(log(x, base=3)), "x" )) # fails - only single-argument calls supported
try(deriv3(expression(log(x, base=3)), "x" )) # fails - only single-argument calls supported
fnDeriv(quote(log(x, base=3)), "x" )
## {
##     .value <- log(x, base = 3)
##     .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
##     .grad[, "x"] <- 1/(x * 1.09861228866811)
##     attr(.value, "gradient") <- .grad
##     .value
## }
Deriv(exp(x), "x")
## exp(x)
D(expression(exp(x)), "x") # OK
## exp(x)
deriv(~exp(x), "x") # OK, but much more complicated
## expression({
##     .expr1 <- exp(x)
##     .value <- .expr1
##     .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
##     .grad[, "x"] <- .expr1
##     attr(.value, "gradient") <- .grad
##     .value
## })
fnDeriv(quote(exp(x)), "x")
## {
##     .expr1 <- exp(x)
##     .value <- .expr1
##     .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
##     .grad[, "x"] <- .expr1
##     attr(.value, "gradient") <- .grad
##     .value
## }
Deriv(sin(x), "x")
## cos(x)
D(expression(sin(x)), "x")
## cos(x)
deriv(~sin(x), "x")
## expression({
##     .value <- sin(x)
##     .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
##     .grad[, "x"] <- cos(x)
##     attr(.value, "gradient") <- .grad
##     .value
## })
fnDeriv(quote(sin(x)), "x")
## {
##     .value <- sin(x)
##     .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
##     .grad[, "x"] <- cos(x)
##     attr(.value, "gradient") <- .grad
##     .value
## }
Deriv(cos(x), "x")
## -sin(x)
D(expression(cos(x)), "x")
## -sin(x)
deriv(~ cos(x), "x")
## expression({
##     .value <- cos(x)
##     .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
##     .grad[, "x"] <- -sin(x)
##     attr(.value, "gradient") <- .grad
##     .value
## })
fnDeriv(quote(cos(x)), "x")
## {
##     .value <- cos(x)
##     .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
##     .grad[, "x"] <- -sin(x)
##     attr(.value, "gradient") <- .grad
##     .value
## }
Deriv(tan(x), "x")
## 1/cos(x)^2
D(expression(tan(x)), "x")
## 1/cos(x)^2
deriv(~ tan(x), "x")
## expression({
##     .value <- tan(x)
##     .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
##     .grad[, "x"] <- 1/cos(x)^2
##     attr(.value, "gradient") <- .grad
##     .value
## })
fnDeriv(quote(tan(x)), "x")
## {
##     .value <- tan(x)
##     .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
##     .grad[, "x"] <- 1/cos(x)^2
##     attr(.value, "gradient") <- .grad
##     .value
## }
Deriv(sinh(x), "x")
## cosh(x)
D(expression(sinh(x)), "x")
## cosh(x)
deriv(~sinh(x), "x")
## expression({
##     .value <- sinh(x)
##     .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
##     .grad[, "x"] <- cosh(x)
##     attr(.value, "gradient") <- .grad
##     .value
## })
fnDeriv(quote(sinh(x)), "x")
## {
##     .value <- sinh(x)
##     .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
##     .grad[, "x"] <- cosh(x)
##     attr(.value, "gradient") <- .grad
##     .value
## }
Deriv(cosh(x), "x")
## sinh(x)
D(expression(cosh(x)), "x")
## sinh(x)
deriv(~cosh(x), "x")
## expression({
##     .value <- cosh(x)
##     .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
##     .grad[, "x"] <- sinh(x)
##     attr(.value, "gradient") <- .grad
##     .value
## })
fnDeriv(quote(cosh(x)), "x")
## {
##     .value <- cosh(x)
##     .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
##     .grad[, "x"] <- sinh(x)
##     attr(.value, "gradient") <- .grad
##     .value
## }
Deriv(sqrt(x), "x")
## 0.5/sqrt(x)
D(expression(sqrt(x)), "x")
## 0.5 * x^-0.5
deriv(~sqrt(x), "x")
## expression({
##     .value <- sqrt(x)
##     .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
##     .grad[, "x"] <- 0.5 * x^-0.5
##     attr(.value, "gradient") <- .grad
##     .value
## })
fnDeriv(quote(sqrt(x)), "x")
## {
##     .expr1 <- sqrt(x)
##     .value <- .expr1
##     .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
##     .grad[, "x"] <- 0.5/.expr1
##     attr(.value, "gradient") <- .grad
##     .value
## }
Deriv(pnorm(q), "q")
## dnorm(q)
D(expression(pnorm(q)), "q")
## dnorm(q)
deriv(~pnorm(q), "q")
## expression({
##     .value <- pnorm(q)
##     .grad <- array(0, c(length(.value), 1L), list(NULL, c("q")))
##     .grad[, "q"] <- dnorm(q)
##     attr(.value, "gradient") <- .grad
##     .value
## })
fnDeriv(quote(pnorm(q)), "q")
## {
##     .value <- pnorm(q)
##     .grad <- array(0, c(length(.value), 2L), list(NULL, "q"))
##     .grad[, "q"] <- dnorm(q)
##     attr(.value, "gradient") <- .grad
##     .value
## }
Deriv(dnorm(x, mean), "mean")
## dnorm(x - mean) * (x - mean)
D(expression(dnorm(x, mean)), "mean")
## [1] 0
deriv(~dnorm(x, mean), "mean")
## expression({
##     .value <- dnorm(x, mean)
##     .grad <- array(0, c(length(.value), 1L), list(NULL, c("mean")))
##     .grad[, "mean"] <- 0
##     attr(.value, "gradient") <- .grad
##     .value
## })
fnDeriv(quote(dnorm(x, mean)), "mean")
## {
##     .expr1 <- x - mean
##     .value <- dnorm(x, mean)
##     .grad <- array(0, c(length(.value), 2L), list(NULL, "mean"))
##     .grad[, "mean"] <- dnorm(.expr1) * .expr1
##     attr(.value, "gradient") <- .grad
##     .value
## }
Deriv(asin(x), "x")
## 1/sqrt(1 + x^2)
D(expression(asin(x)), "x")
## 1/sqrt(1 - x^2)
deriv(~asin(x), "x")
## expression({
##     .value <- asin(x)
##     .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
##     .grad[, "x"] <- 1/sqrt(1 - x^2)
##     attr(.value, "gradient") <- .grad
##     .value
## })
fnDeriv(quote(asin(x)), "x")
## {
##     .value <- asin(x)
##     .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
##     .grad[, "x"] <- 1/sqrt(1 + x^2)
##     attr(.value, "gradient") <- .grad
##     .value
## }
Deriv(acos(x), "x")
## -1/sqrt(1 + x^2)
D(expression(acos(x)), "x")
## -(1/sqrt(1 - x^2))
deriv(~acos(x), "x")
## expression({
##     .value <- acos(x)
##     .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
##     .grad[, "x"] <- -(1/sqrt(1 - x^2))
##     attr(.value, "gradient") <- .grad
##     .value
## })
fnDeriv(quote(acos(x)), "x")
## {
##     .value <- acos(x)
##     .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
##     .grad[, "x"] <- -1/sqrt(1 + x^2)
##     attr(.value, "gradient") <- .grad
##     .value
## }
Deriv(atan(x), "x")
## 1/(1 + x^2)
D(expression(atan(x)), "x")
## 1/(1 + x^2)
deriv(~atan(x), "x")
## expression({
##     .value <- atan(x)
##     .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
##     .grad[, "x"] <- 1/(1 + x^2)
##     attr(.value, "gradient") <- .grad
##     .value
## })
fnDeriv(quote(atan(x)), "x")
## {
##     .value <- atan(x)
##     .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
##     .grad[, "x"] <- 1/(1 + x^2)
##     attr(.value, "gradient") <- .grad
##     .value
## }
Deriv(gamma(x), "x")
## gamma(x) * digamma(x)
D(expression(gamma(x)), "x")
## gamma(x) * digamma(x)
deriv(~gamma(x), "x")
## expression({
##     .expr1 <- gamma(x)
##     .value <- .expr1
##     .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
##     .grad[, "x"] <- .expr1 * digamma(x)
##     attr(.value, "gradient") <- .grad
##     .value
## })
fnDeriv(quote(gamma(x)), "x")
## {
##     .expr1 <- gamma(x)
##     .value <- .expr1
##     .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
##     .grad[, "x"] <- .expr1 * digamma(x)
##     attr(.value, "gradient") <- .grad
##     .value
## }
Deriv(lgamma(x), "x")
## digamma(x)
D(expression(lgamma(x)), "x")
## digamma(x)
deriv(~lgamma(x), "x")
## expression({
##     .value <- lgamma(x)
##     .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
##     .grad[, "x"] <- digamma(x)
##     attr(.value, "gradient") <- .grad
##     .value
## })
fnDeriv(quote(lgamma(x)), "x")
## {
##     .value <- lgamma(x)
##     .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
##     .grad[, "x"] <- digamma(x)
##     attr(.value, "gradient") <- .grad
##     .value
## }
Deriv(digamma(x), "x")
## trigamma(x)
D(expression(digamma(x)), "x")
## trigamma(x)
deriv(~digamma(x), "x")
## expression({
##     .value <- digamma(x)
##     .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
##     .grad[, "x"] <- trigamma(x)
##     attr(.value, "gradient") <- .grad
##     .value
## })
fnDeriv(quote(digamma(x)), "x")
## {
##     .value <- digamma(x)
##     .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
##     .grad[, "x"] <- trigamma(x)
##     attr(.value, "gradient") <- .grad
##     .value
## }
Deriv(trigamma(x), "x")
## psigamma(x, 2L)
D(expression(trigamma(x)), "x")
## psigamma(x, 2L)
deriv(~trigamma(x), "x")
## expression({
##     .value <- trigamma(x)
##     .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
##     .grad[, "x"] <- psigamma(x, 2L)
##     attr(.value, "gradient") <- .grad
##     .value
## })
fnDeriv(quote(trigamma(x)), "x")
## {
##     .value <- trigamma(x)
##     .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
##     .grad[, "x"] <- psigamma(x, 2L)
##     attr(.value, "gradient") <- .grad
##     .value
## }
Deriv(psigamma(x, deriv = 5), "x")
## psigamma(x, 6)
D(expression(psigamma(x, deriv = 5)), "x")
## psigamma(x, 6L)
deriv(~psigamma(x, deriv = 5), "x")
## expression({
##     .value <- psigamma(x, deriv = 5)
##     .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
##     .grad[, "x"] <- psigamma(x, 6L)
##     attr(.value, "gradient") <- .grad
##     .value
## })
fnDeriv(quote(psigamma(x, deriv = 5)), "x")
## {
##     .value <- psigamma(x, deriv = 5)
##     .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
##     .grad[, "x"] <- psigamma(x, 6)
##     attr(.value, "gradient") <- .grad
##     .value
## }
Deriv(x*y, "x")
## y
D(expression(x*y), "x")
## y
deriv(~x*y, "x")
## expression({
##     .value <- x * y
##     .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
##     .grad[, "x"] <- y
##     attr(.value, "gradient") <- .grad
##     .value
## })
fnDeriv(quote(x*y), "x")
## {
##     .value <- x * y
##     .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
##     .grad[, "x"] <- y
##     attr(.value, "gradient") <- .grad
##     .value
## }
Deriv(x/y, "x")
## 1/y
D(expression(x/y), "x")
## 1/y
deriv(~x/y, "x")
## expression({
##     .value <- x/y
##     .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
##     .grad[, "x"] <- 1/y
##     attr(.value, "gradient") <- .grad
##     .value
## })
fnDeriv(quote(x/y), "x")
## {
##     .value <- x/y
##     .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
##     .grad[, "x"] <- 1/y
##     attr(.value, "gradient") <- .grad
##     .value
## }
Deriv(x^y, "x")
## y * x^(y - 1)
D(expression(x^y), "x")
## x^(y - 1) * y
deriv(~x^y, "x")
## expression({
##     .value <- x^y
##     .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
##     .grad[, "x"] <- x^(y - 1) * y
##     attr(.value, "gradient") <- .grad
##     .value
## })
fnDeriv(quote(x^y), "x")
## {
##     .value <- x^y
##     .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
##     .grad[, "x"] <- y * x^(y - 1)
##     attr(.value, "gradient") <- .grad
##     .value
## }
Deriv((x), "x")
## [1] 1
D(expression((x)), "x")
## [1] 1
deriv(~(x), "x")
## expression({
##     .value <- (x)
##     .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
##     .grad[, "x"] <- 1
##     attr(.value, "gradient") <- .grad
##     .value
## })
fnDeriv(quote((x)), "x")
## {
##     .value <- (x)
##     .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
##     .grad[, "x"] <- 1
##     attr(.value, "gradient") <- .grad
##     .value
## }
Deriv(+x, "x")
## [1] 1
D(expression(+x), "x")
## [1] 1
deriv(~ +x, "x")
## expression({
##     .value <- +x
##     .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
##     .grad[, "x"] <- 1
##     attr(.value, "gradient") <- .grad
##     .value
## })
fnDeriv(quote(+x), "x")
## {
##     .value <- +x
##     .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
##     .grad[, "x"] <- 1
##     attr(.value, "gradient") <- .grad
##     .value
## }
Deriv(-x, "x")
## [1] -1
D(expression(- x), "x")
## -1
deriv(~ -x, "x")
## expression({
##     .value <- -x
##     .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
##     .grad[, "x"] <- -1
##     attr(.value, "gradient") <- .grad
##     .value
## })
fnDeriv(quote(-x), "x")
## {
##     .value <- -x
##     .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
##     .grad[, "x"] <- -1
##     attr(.value, "gradient") <- .grad
##     .value
## }
Deriv(abs(x), "x")
## sign(x)
try(D(expression(abs(x)), "x")) # 'abs' not in derivatives table
try(deriv(~ abs(x), "x"))
fnDeriv(quote(abs(x)), "x")
## {
##     .value <- abs(x)
##     .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
##     .grad[, "x"] <- sign(x)
##     attr(.value, "gradient") <- .grad
##     .value
## }
Deriv(sign(x), "x")
## [1] 0
try(D(expression(sign(x)), "x")) # 'sign' not in derivatives table
try(deriv(~ sign(x), "x"))
fnDeriv(quote(sign(x)), "x")
## {
##     .value <- sign(x)
##     .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
##     .grad[, "x"] <- 0
##     attr(.value, "gradient") <- .grad
##     .value
## }

Notes:

The stats package tools do not include abs(x) or sign(x) in their derivatives table, while Deriv() and fnDeriv() handle them (giving sign(x) and 0 respectively).

D() and deriv() support only single-argument calls to log(), so log(x, base = 3) fails there, though Deriv() and fnDeriv() handle it.

The asin(x) and acos(x) results disagree: Deriv() and fnDeriv() give 1/sqrt(1 + x^2) and -1/sqrt(1 + x^2), where D() and deriv() give the standard 1/sqrt(1 - x^2) and -(1/sqrt(1 - x^2)). The (1 + x^2) forms appear to be in error.

For dnorm(x, mean), D() and deriv() return a derivative of 0 with respect to mean, while Deriv() and fnDeriv() give dnorm(x - mean) * (x - mean), which matches the calculus.

Simplifying algebraic expressions

nls14 also includes some tools for simplification of algebraic expressions. Such simplification does not appear to be available / exposed in the stats package derivative tools.

# Various simplifications

Simplify(quote(+(a+b)))
## a + b
Simplify(quote(-5))
## [1] -5
Simplify(quote(--(a+b)))
## a + b
Simplify(quote(exp(log(a+b))))
## a + b
Simplify(quote(exp(1)))
## [1] 2.7183
Simplify(quote(log(exp(a+b))))
## a + b
Simplify(quote(log(1)))
## [1] 0
Simplify(quote(!TRUE))
## [1] FALSE
Simplify(quote(!FALSE))
## [1] TRUE
Simplify(quote((a+b)))
## a + b
Simplify(quote(a + b + 0))
## a + b
Simplify(quote(0 + a + b))
## a + b
Simplify(quote((a+b) + (a+b)))
## 2 * (a + b)
Simplify(quote(1 + 4))
## [1] 5
Simplify(quote(a + b - 0))
## a + b
Simplify(quote(0 - a - b))
## -a - b
Simplify(quote((a+b) - (a+b)))
## [1] 0
Simplify(quote(5 - 3))
## [1] 2
Simplify(quote(0*(a+b)))
## [1] 0
Simplify(quote((a+b)*0))
## [1] 0
Simplify(quote(1L * (a+b)))
## a + b
Simplify(quote((a+b) * 1))
## a + b
Simplify(quote((-1)*(a+b)))
## -(a + b)
Simplify(quote((a+b)*(-1)))
## -(a + b)
Simplify(quote(2*5))
## [1] 10
Simplify(quote((a+b) / 1))
## a + b
Simplify(quote((a+b) / (-1)))
## -(a + b)
Simplify(quote(0/(a+b)))
## [1] 0
Simplify(quote(1/3))
## [1] 0.33333
Simplify(quote((a+b) ^ 1))
## a + b
Simplify(quote(2^10))
## [1] 1024
Simplify(quote(log(exp(a), 3)))
## a/1.09861228866811
Simplify(quote(FALSE && b))
## [1] FALSE
Simplify(quote(a && TRUE))
## a
Simplify(quote(TRUE && b))
## b
Simplify(quote(a || TRUE))
## [1] TRUE
Simplify(quote(FALSE || b))
## b
Simplify(quote(a || FALSE))
## a
Simplify(quote(if (TRUE) a+b))
## a + b
Simplify(quote(if (FALSE) a+b))
## NULL
Simplify(quote(if (TRUE) a+b else a*b))
## a + b
Simplify(quote(if (FALSE) a+b else a*b))
## a * b
Simplify(quote(if (cond) a+b else a+b))
## a + b
# This one was wrong...
Simplify(quote(--(a+b)))
## a + b

Comparison with other approaches

There is at least one other symbolic package for R. Here we look at Ryacas. The following example was provided by Gabor Grothendieck.

library(nls14)
dnls14 <- Deriv(sin(x+y), "x")
print(dnls14)
## cos(x + y)
class(dnls14)
## [1] "call"
library(Ryacas)
## 
## Attaching package: 'Ryacas'
## 
## The following object is masked from 'package:nls14':
## 
##     Simplify
x <- Sym("x")
y <- Sym("y")
dryacas <- deriv(sin(x+y), x)
print(dryacas)
## [1] "Starting Yacas!"
## Cos(x+y);
class(dryacas)
## [1] "Sym"       "character"

Examples of use

Nonlinear least squares with expressions

We use the Hobbs Weeds problem (Nash, 1979 and Nash, 2014). Note that nls() fails from start1.

require(nls14)
traceval <- FALSE
# Data for Hobbs problem
ydat  <-  c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 
          38.558, 50.156, 62.948, 75.995, 91.972) # for testing
tdat  <-  seq_along(ydat) # for testing

# A simple starting vector -- must have named parameters for nls14xb, nls, wrap14nls.
start1  <-  c(b1=1, b2=1, b3=1)

eunsc  <-   y ~ b1/(1+b2*exp(-b3*tt))

cat("LOCAL DATA IN DATA FRAMES\n")
## LOCAL DATA IN DATA FRAMES
weeddata1  <-  data.frame(y=ydat, tt=tdat)
weeddata2  <-  data.frame(y=1.5*ydat, tt=tdat)

anls14xb1  <-  try(nls14xb(eunsc, start=start1, trace=TRUE, data=weeddata1,
                     control=list(watch=FALSE)))
## formula: y ~ b1/(1 + b2 * exp(-b3 * tt))
## lower:[1] -Inf -Inf -Inf
## upper:[1] Inf Inf Inf
## $watch
## [1] FALSE
## 
## $phi
## [1] 1
## 
## $lamda
## [1] 1e-04
## 
## $offset
## [1] 100
## 
## $laminc
## [1] 10
## 
## $lamdec
## [1] 4
## 
## $femax
## [1] 10000
## 
## $jemax
## [1] 5000
## 
## $rofftest
## [1] TRUE
## 
## $smallsstest
## [1] TRUE
## 
## vn:[1] "y"  "b1" "b2" "b3" "tt"
## datvar:[1] "y"  "tt"
## Data variable  y : [1]  5.308  7.240  9.638 12.866 17.069 23.192 31.443 38.558 50.156 62.948
## [11] 75.995 91.972
## Data variable  tt : [1]  1  2  3  4  5  6  7  8  9 10 11 12
## Finished masks check
## model2rjfun: modelformula = y ~ b1/(1 + b2 * exp(-b3 * tt))
## [1] "formula"
## trjfn:
## function (prm) 
## {
##     if (is.null(names(prm))) 
##         names(prm) <- names(pvec)
##     localdata <- list2env(as.list(prm), parent = data)
##     eval(residexpr, envir = localdata)
## }
## <environment: 0x20de638>
## Starting pnum=b1 b2 b3 
##  1  1  1 
## Start:lamda: 1e-04  SS= 23521  at  b1 = 1  b2 = 1  b3 = 1  1 / 0
## lamda: 0.001  SS= 24356  at  b1 = 50.886  b2 = -240.72  b3 = -372.91  2 / 1
## lamda: 0.01  SS= 24356  at  b1 = 50.183  b2 = -190.69  b3 = -334.2  3 / 1
## lamda: 0.1  SS= 24356  at  b1 = 46.97  b2 = -25.309  b3 = -194.81  4 / 1
## lamda: 1  SS= 24356  at  b1 = 38.096  b2 = 29.924  b3 = -64.262  5 / 1
## lamda: 10  SS= 24353  at  b1 = 18.798  b2 = 3.551  b3 = -2.9011  6 / 1
## <<lamda: 4  SS= 21065  at  b1 = 4.1015  b2 = 0.86688  b3 = 1.3297  7 / 1
## <<lamda: 1.6  SS= 16852  at  b1 = 10.248  b2 = 1.2302  b3 = 1.0044  8 / 2
## lamda: 16  SS= 24078  at  b1 = 20.944  b2 = 2.9473  b3 = -0.40959  9 / 3
## <<lamda: 6.4  SS= 15934  at  b1 = 11.771  b2 = 1.3084  b3 = 0.98087  10 / 3
## <<lamda: 2.56  SS= 14050  at  b1 = 15.152  b2 = 1.6468  b3 = 0.80462  11 / 4
## <<lamda: 1.024  SS= 10985  at  b1 = 22.005  b2 = 2.6632  b3 = 0.45111  12 / 5
## <<lamda: 0.4096  SS= 6427.1  at  b1 = 35.128  b2 = 4.7135  b3 = 0.50974  13 / 6
## <<lamda: 0.16384  SS= 4725  at  b1 = 52.285  b2 = 9.2948  b3 = 0.31348  14 / 7
## <<lamda: 0.065536  SS= 1132.2  at  b1 = 76.446  b2 = 15.142  b3 = 0.41095  15 / 8
## <<lamda: 0.026214  SS= 518.76  at  b1 = 96.924  b2 = 24.422  b3 = 0.35816  16 / 9
## <<lamda: 0.010486  SS= 79.464  at  b1 = 122.18  b2 = 35.262  b3 = 0.36484  17 / 10
## <<lamda: 0.0041943  SS= 26.387  at  b1 = 141.55  b2 = 42.387  b3 = 0.35215  18 / 11
## <<lamda: 0.0016777  SS= 11.339  at  b1 = 160.02  b2 = 45.519  b3 = 0.33738  19 / 12
## <<lamda: 0.00067109  SS= 5.2767  at  b1 = 176.92  b2 = 47.037  b3 = 0.32443  20 / 13
## <<lamda: 0.00026844  SS= 2.9656  at  b1 = 189.12  b2 = 48.279  b3 = 0.31712  21 / 14
## <<lamda: 0.00010737  SS= 2.6003  at  b1 = 194.72  b2 = 48.92  b3 = 0.31429  22 / 15
## <<lamda: 4.295e-05  SS= 2.5873  at  b1 = 196.04  b2 = 49.075  b3 = 0.31364  23 / 16
## <<lamda: 1.718e-05  SS= 2.5873  at  b1 = 196.18  b2 = 49.091  b3 = 0.31357  24 / 17
## <<lamda: 6.8719e-06  SS= 2.5873  at  b1 = 196.19  b2 = 49.092  b3 = 0.31357  25 / 18
print(anls14xb1)
## nls14 class object: x 
## residual sumsquares =  2.5873  on  12 observations
##     after  18    Jacobian and  25 function evaluations
##   name            coeff          SE       tstat      pval      gradient    JSingval   
## b1               196.186         11.31      17.35  3.164e-08  -2.663e-07        1011  
## b2               49.0916         1.688      29.08  3.281e-10    1.59e-07      0.4605  
## b3               0.31357      0.006863      45.69  5.768e-12  -5.531e-05     0.04715
anlsb1 <-  try(nls(eunsc, start=start1, trace=TRUE, data=weeddata1))
## 23521 :  1 1 1
print(anlsb1)
## [1] "Error in nls(eunsc, start = start1, trace = TRUE, data = weeddata1) : \n  singular gradient\n"
## attr(,"class")
## [1] "try-error"
## attr(,"condition")
## <simpleError in nls(eunsc, start = start1, trace = TRUE, data = weeddata1): singular gradient>

A different start causes nls14xb to return a large sum of squares. Note that nls() again fails.

startf1  <-  c(b1=1, b2=1, b3=.1)
anlsf1 <-  try(nls(eunsc, start=startf1, trace=TRUE, data=weeddata1))
## 23756 :  1.0 1.0 0.1
print(anlsf1)
## [1] "Error in nls(eunsc, start = startf1, trace = TRUE, data = weeddata1) : \n  singular gradient\n"
## attr(,"class")
## [1] "try-error"
## attr(,"condition")
## <simpleError in nls(eunsc, start = startf1, trace = TRUE, data = weeddata1): singular gradient>
anls14xf1  <-  try(nls14xb(eunsc, start=startf1, trace=TRUE, data=weeddata1,
                     control=list(watch=FALSE)))
## formula: y ~ b1/(1 + b2 * exp(-b3 * tt))
## lower:[1] -Inf -Inf -Inf
## upper:[1] Inf Inf Inf
## $watch
## [1] FALSE
## 
## $phi
## [1] 1
## 
## $lamda
## [1] 1e-04
## 
## $offset
## [1] 100
## 
## $laminc
## [1] 10
## 
## $lamdec
## [1] 4
## 
## $femax
## [1] 10000
## 
## $jemax
## [1] 5000
## 
## $rofftest
## [1] TRUE
## 
## $smallsstest
## [1] TRUE
## 
## vn:[1] "y"  "b1" "b2" "b3" "tt"
## datvar:[1] "y"  "tt"
## Data variable  y : [1]  5.308  7.240  9.638 12.866 17.069 23.192 31.443 38.558 50.156 62.948
## [11] 75.995 91.972
## Data variable  tt : [1]  1  2  3  4  5  6  7  8  9 10 11 12
## Finished masks check
## model2rjfun: modelformula = y ~ b1/(1 + b2 * exp(-b3 * tt))
## [1] "formula"
## trjfn:
## function (prm) 
## {
##     if (is.null(names(prm))) 
##         names(prm) <- names(pvec)
##     localdata <- list2env(as.list(prm), parent = data)
##     eval(residexpr, envir = localdata)
## }
## <environment: 0x32405f0>
## Starting pnum= b1  b2  b3 
## 1.0 1.0 0.1 
## Start:lamda: 1e-04  SS= 23756  at  b1 = 1  b2 = 1  b3 = 0.1  1 / 0
## lamda: 0.001  SS= 24356  at  b1 = 416.72  b2 = 821.43  b3 = -40.851  2 / 1
## lamda: 0.01  SS= 196562  at  b1 = 162.93  b2 = 368.16  b3 = 7.5932  3 / 1
## <<lamda: 0.004  SS= 10597  at  b1 = 24.764  b2 = 108.11  b3 = 31.926  4 / 1
## <<lamda: 0.0016  SS= 9205.5  at  b1 = 35.486  b2 = 108.11  b3 = 31.926  5 / 2
## <<lamda: 0.00064  SS= 9205.4  at  b1 = 35.532  b2 = 108.11  b3 = 31.926  6 / 3
## <<lamda: 0.000256  SS= 9205.4  at  b1 = 35.532  b2 = 108.11  b3 = 31.926  7 / 4
print(anls14xf1)
## nls14 class object: x 
## residual sumsquares =  9205.4  on  12 observations
##     after  4    Jacobian and  7 function evaluations
##   name            coeff          SE       tstat      pval      gradient    JSingval   
## b1               35.5321            NA         NA         NA  -6.684e-07       3.464  
## b2               108.109            NA         NA         NA  -1.464e-11   5.015e-11  
## b3               31.9262            NA         NA         NA   1.583e-09   6.427e-27
# anls14xb2  <-  try(nls14xb(eunsc, start=start1, trace=FALSE, data=weeddata2))
# print(anls14xb2)

We can quickly discover the difficulty here by computing the Jacobian at this “solution” and checking its singular values.

cf1 <- coef(anls14xf1)
print(cf1)
##      b1      b2      b3 
##  35.532 108.109  31.926 
## attr(,"pkgname")
## [1] "nls14"
jf1 <- anls14xf1$jacobian
svals <- svd(jf1)$d
print(svals)
## [1] 3.4641e+00 5.0146e-11 6.4267e-27

Here we see that the Jacobian has numerical rank 1, even though there are 3 coefficients. (With b3 near 32, exp(-b3*tt) is essentially zero for every tt, so the model is locally flat in b2 and b3 and only the b1 column of the Jacobian survives.) It is therefore not surprising that our nonlinear least squares program has concluded we are unable to make further progress.

Nonlinear least squares with functions

We can run the same example as above using R functions rather than expressions, but now we need a function to compute the Jacobian (returned in the “gradient” attribute) as well as one to compute the residuals. nls14 has tools to create these functions from expressions, as we shall see here. First we again set up the data and load the package.

require(nls14)
traceval <- FALSE
# Data for Hobbs problem
ydat  <-  c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 
          38.558, 50.156, 62.948, 75.995, 91.972) # for testing
tdat  <-  seq_along(ydat) # for testing

# A simple starting vector -- must have named parameters for nls14xb, nls, wrap14nls.
start1  <-  c(b1=1, b2=1, b3=1)

eunsc  <-   y ~ b1/(1+b2*exp(-b3*tt))

cat("LOCAL DATA IN DATA FRAMES\n")
## LOCAL DATA IN DATA FRAMES
weeddata1  <-  data.frame(y=ydat, tt=tdat)
weedrj <- model2rjfun(modelformula=eunsc, pvec=start1, data=weeddata1)
## model2rjfun: modelformula = y ~ b1/(1 + b2 * exp(-b3 * tt))
## [1] "formula"
weedrj
## function (prm) 
## {
##     if (is.null(names(prm))) 
##         names(prm) <- names(pvec)
##     localdata <- list2env(as.list(prm), parent = data)
##     eval(residexpr, envir = localdata)
## }
## <environment: 0x2a44e98>
modeldoc(weedrj) # Note how useful this is to report status
## FUNCTION weedrj 
## Formula: y ~ b1/(1 + b2 * exp(-b3 * tt))
## Code:        expression({
##     .expr3 <- exp(-b3 * tt)
##     .expr5 <- 1 + b2 * .expr3
##     .expr10 <- .expr5^2
##     .value <- b1/.expr5 - y
##     .grad <- array(0, c(length(.value), 3L), list(NULL, c("b1", 
##         "b2", "b3")))
##     .grad[, "b1"] <- 1/.expr5
##     .grad[, "b2"] <- -(b1 * .expr3/.expr10)
##     .grad[, "b3"] <- b1 * (b2 * (.expr3 * tt))/.expr10
##     attr(.value, "gradient") <- .grad
##     .value
## })
## Parameters:   b1, b2, b3 
## Data:         y, tt 
## 
## VALUES
## Observations:     12 
## Parameters:
## b1 b2 b3 
##  1  1  1 
## Data (length 12):
##         y tt
## 1   5.308  1
## 2   7.240  2
## 3   9.638  3
## 4  12.866  4
## 5  17.069  5
## 6  23.192  6
## 7  31.443  7
## 8  38.558  8
## 9  50.156  9
## 10 62.948 10
## 11 75.995 11
## 12 91.972 12
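Before using weedrj() in an optimization, it can be evaluated directly at the starting parameters. As the code listing above shows, it returns the residual vector with the Jacobian attached as the "gradient" attribute. A small check (output not shown):

r0 <- weedrj(start1)           # residual vector at the starting parameters
head(as.vector(r0))            # first few residuals, attribute stripped
dim(attr(r0, "gradient"))      # the 12 x 3 Jacobian attached by weedrj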

TODO: check that modelexpr() works with an ssgrfun.

TODO: test model2rjfun against model2rjfunx.

Why is resss difficult to use in optimization?

y <- c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558,
50.156, 62.948, 75.995, 91.972)
tt <- seq_along(y) # for testing
mydata <- data.frame(y = y, tt = tt)
f <- y ~ 100*b1/(1 + 10*b2 * exp(-0.1 * b3 * tt))
p <- c(b1 = 1, b2 = 1, b3 = 1)
rjfn <- model2rjfun(f, p, data = mydata)
## model2rjfun: modelformula = y ~ 100 * b1/(1 + 10 * b2 * exp(-0.1 * b3 * tt))
## [1] "formula"
rjfn(p)
##  [1]   4.64386   3.64458   2.25518   0.11562  -2.91533  -7.77921 -14.68094
##  [8] -20.35397 -30.41538 -41.57497 -52.89343 -67.04642
## attr(,"gradient")
##            b1       b2       b3
##  [1,]  9.9519  -8.9615  0.89615
##  [2,] 10.8846  -9.6998  1.93997
##  [3,] 11.8932 -10.4787  3.14361
##  [4,] 12.9816 -11.2964  4.51856
##  [5,] 14.1537 -12.1504  6.07520
##  [6,] 15.4128 -13.0373  7.82235
##  [7,] 16.7621 -13.9524  9.76668
##  [8,] 18.2040 -14.8902 11.91213
##  [9,] 19.7406 -15.8437 14.25933
## [10,] 21.3730 -16.8050 16.80496
## [11,] 23.1016 -17.7647 19.54122
## [12,] 24.9256 -18.7127 22.45528
myfn <- function(p, resfn=rjfn){
  resss(p, resfn) # return the residual sum of squares at p
}

p <- c(b1 = 2, b2=2, b3=1)

a1 <- optim(p, myfn, control=list(trace=0))
a1
## $par
##     b1     b2     b3 
## 1.9618 4.9092 3.1358 
## 
## $value
## [1] 2.5873
## 
## $counts
## function gradient 
##      200       NA 
## 
## $convergence
## [1] 0
## 
## $message
## NULL

We can also supply the function definition directly in the call to optim().

a2 <- optim(p, function(p,resfn=rjfn){resss(p,resfn)}, control=list(trace=0))
a2
## $par
##     b1     b2     b3 
## 1.9618 4.9092 3.1358 
## 
## $value
## [1] 2.5873
## 
## $counts
## function gradient 
##      200       NA 
## 
## $convergence
## [1] 0
## 
## $message
## NULL
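Because rjfn() attaches the Jacobian to the residuals, we can also give optim() an analytic gradient of the sum of squares. The wrapper mygr below is a minimal sketch of our own, not an nls14 function; it uses the identity that the gradient of t(r) %*% r is 2 * t(J) %*% r.

mygr <- function(p, resfn = rjfn) {
  r <- resfn(p)                        # residuals, with "gradient" attribute
  J <- attr(r, "gradient")             # Jacobian at p
  as.vector(2 * crossprod(J, as.vector(r)))  # 2 * t(J) %*% r
}
a3 <- optim(p, myfn, gr = mygr, method = "BFGS", control = list(trace = 0))

Note that Nelder-Mead (the optim() default used above) ignores gr, so a gradient-using method such as BFGS is needed to benefit from this.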

TODO: a more extensive discussion of Simplify is needed.

Using derivative functions to generate gradient functions

One of the common needs in optimization computations is the availability of accurate gradients of the objective function. While differentiation by hand is relatively straightforward, it is tedious and error-prone.

As of 2015-01-24, I have not determined how to automate the use of the output of the derivatives generated by nls14 to create a working gradient function. However, the following write-up shows how such a function can be generated in a semi-automatic way.

We use an example that appeared on the R-help mailing list on Jan 14, 2015. Responses by Ravi Varadhan and others, along with some modifications I made, gave the following negative log-likelihood function to be minimized.

y <- c(5, 11, 21, 31, 46, 75, 98, 122, 145, 165, 195, 224, 245, 293, 321,
       330, 350, 420) # data set

Nweibull2 <- function(x, prm){
  la <- prm[1]
  al <- prm[2]
  be <- prm[3]
  val2 <- la*be*(x/al)^(be-1)* exp( (x/al)^be+la*al*(1-exp((x/al)^be) ) )
  val2
}
LL2J <- function(par, y) {
  R <- Nweibull2(y, par)
  -sum(log(R))
}

We want the gradient of LL2J() with respect to par, and first compute the derivatives of Nweibull2() with respect to the parameters prm.

We start with the central expression in Nweibull2() and compute its partial derivatives. The expression is:

  la*be*(x/al)^(be-1)* exp( (x/al)^be+la*al*(1-exp((x/al)^be) ) )
# Put in the main expression for the Nweibull pdf.
require(nls14)

## we generate the three gradient components
g1n <- Deriv(la*be*(x/al)^(be-1)* exp( (x/al)^be+la*al*(1-exp((x/al)^be) ) ), "la")
g1n
## la * be * (x/al)^(be - 1) * (exp((x/al)^be + la * al * (1 - exp((x/al)^be))) * 
##     (al * (1 - exp((x/al)^be)))) + be * (x/al)^(be - 1) * exp((x/al)^be + 
##     la * al * (1 - exp((x/al)^be)))
g2n <- Deriv(la*be*(x/al)^(be-1)* exp( (x/al)^be+la*al*(1-exp((x/al)^be) ) ), "al")
g2n
## la * be * (x/al)^(be - 1) * (exp((x/al)^be + la * al * (1 - exp((x/al)^be))) * 
##     (be * (x/al)^(be - 1) * -(x/al^2) + (la * al * -(exp((x/al)^be) * 
##         (be * (x/al)^(be - 1) * -(x/al^2))) + la * (1 - exp((x/al)^be))))) + 
##     la * be * ((be - 1) * (x/al)^(be - 1 - 1) * -(x/al^2)) * 
##         exp((x/al)^be + la * al * (1 - exp((x/al)^be)))
g3n <- Deriv(la*be*(x/al)^(be-1)* exp( (x/al)^be+la*al*(1-exp((x/al)^be) ) ), "be")
g3n
## la * be * (x/al)^(be - 1) * (exp((x/al)^be + la * al * (1 - exp((x/al)^be))) * 
##     ((x/al)^be * log(x/al) + la * al * -(exp((x/al)^be) * ((x/al)^be * 
##         log(x/al))))) + (la * be * ((x/al)^(be - 1) * log(x/al)) + 
##     la * (x/al)^(be - 1)) * exp((x/al)^be + la * al * (1 - exp((x/al)^be)))

By copying and pasting the output above into a function structure, we get Nwei2g() below.

Nwei2g <- function(x, prm){
  la <- prm[1]
  al <- prm[2]
  be<- prm[3]
g1v <- la * be * (x/al)^(be - 1) * (exp((x/al)^be + la * al * (1 - exp((x/al)^be))) * 
    (al * (1 - exp((x/al)^be)))) + be * (x/al)^(be - 1) * exp((x/al)^be + 
    la * al * (1 - exp((x/al)^be)))

g2v <- la * be * (x/al)^(be - 1) * (exp((x/al)^be + la * al * (1 - exp((x/al)^be))) * 
    (be * (x/al)^(be - 1) * -(x/al^2) + (la * al * -(exp((x/al)^be) * 
        (be * (x/al)^(be - 1) * -(x/al^2))) + la * (1 - exp((x/al)^be))))) + 
    la * be * ((be - 1) * (x/al)^(be - 1 - 1) * -(x/al^2)) * 
        exp((x/al)^be + la * al * (1 - exp((x/al)^be)))

g3v <- la * be * (x/al)^(be - 1) * (exp((x/al)^be + la * al * (1 - exp((x/al)^be))) * 
    ((x/al)^be * log(x/al) + la * al * -(exp((x/al)^be) * ((x/al)^be * 
        log(x/al))))) + (la * be * ((x/al)^(be - 1) * log(x/al)) + 
    la * (x/al)^(be - 1)) * exp((x/al)^be + la * al * (1 - exp((x/al)^be)))
gg <- matrix(data=c(g1v, g2v, g3v), ncol=3)
gg # return the n x 3 matrix of partial derivatives
}

We can check this gradient function using the grad() function from package numDeriv.

start1 <- c(lambda=.01,alpha=340,beta=.8)
start2 <- c(lambda=.01,alpha=340,beta=.7)

require(numDeriv)
## Loading required package: numDeriv
ganwei <- Nwei2g(y, prm=start1)

Nw <- function(x, y) {
   Nweibull2(y, x)
} # swap the arguments: grad() differentiates with respect to its first argument

gnnwei <- matrix(NA, nrow=length(y), ncol=3)
for (i in seq_along(y)){
   gnrow <- grad(Nw, x=start1, y=y[i])
   gnnwei[i,] <- gnrow
}
gnnwei
##             [,1]        [,2]        [,3]
##  [1,]  1.5080091  7.5763e-06 -4.4573e-02
##  [2,]  1.0470119  4.3477e-06 -2.1663e-02
##  [3,]  0.6473921  1.6572e-06 -7.3707e-03
##  [4,]  0.4021585  1.7784e-07 -9.4940e-04
##  [5,]  0.1635446 -1.0061e-06  3.5893e-03
##  [6,] -0.0815624 -1.6713e-06  6.0571e-03
##  [7,] -0.1689654 -1.5386e-06  5.8709e-03
##  [8,] -0.2048835 -1.1819e-06  5.0149e-03
##  [9,] -0.2077443 -7.9701e-07  4.0222e-03
## [10,] -0.1955391 -4.9171e-07  3.1859e-03
## [11,] -0.1644138 -1.3523e-07  2.1110e-03
## [12,] -0.1297028  8.2595e-08  1.3249e-03
## [13,] -0.1055059  1.7196e-07  9.0488e-04
## [14,] -0.0598567  2.2613e-07  3.1939e-04
## [15,] -0.0406518  2.0242e-07  1.4872e-04
## [16,] -0.0355949  1.9079e-07  1.1187e-04
## [17,] -0.0261120  1.6182e-07  5.3031e-05
## [18,] -0.0075317  6.8245e-08 -1.1995e-05
ganwei
##             [,1]        [,2]        [,3]
##  [1,]  1.5080091  7.5763e-06 -4.4573e-02
##  [2,]  1.0470119  4.3477e-06 -2.1663e-02
##  [3,]  0.6473921  1.6572e-06 -7.3707e-03
##  [4,]  0.4021585  1.7784e-07 -9.4940e-04
##  [5,]  0.1635446 -1.0061e-06  3.5893e-03
##  [6,] -0.0815624 -1.6713e-06  6.0571e-03
##  [7,] -0.1689654 -1.5386e-06  5.8709e-03
##  [8,] -0.2048835 -1.1819e-06  5.0149e-03
##  [9,] -0.2077443 -7.9701e-07  4.0222e-03
## [10,] -0.1955391 -4.9171e-07  3.1859e-03
## [11,] -0.1644138 -1.3523e-07  2.1110e-03
## [12,] -0.1297028  8.2595e-08  1.3249e-03
## [13,] -0.1055059  1.7196e-07  9.0488e-04
## [14,] -0.0598567  2.2613e-07  3.1939e-04
## [15,] -0.0406518  2.0242e-07  1.4872e-04
## [16,] -0.0355949  1.9079e-07  1.1187e-04
## [17,] -0.0261120  1.6182e-07  5.3031e-05
## [18,] -0.0075317  6.8245e-08 -1.1995e-05
cat("max(abs(gnnwei - ganwei))= ",   max(abs(gnnwei - ganwei)),"\n")
## max(abs(gnnwei - ganwei))=  2.8041e-11

Now we can build the gradient of the objective function. This requires an application of the chain rule to the sum of the logs of the elements of R. Since the derivative of log(R) with respect to R is simply 1/R, the gradient of -sum(log(R)) is -t(1/R) %*% gNN, where gNN holds the partial derivatives of the elements of R. However, I have not found how to automate this.

## and now we can build the gradient of LL2J
LL2Jg <- function(prm, y) {
    R <- Nweibull2(y, prm)
    gNN <- Nwei2g(y, prm)
#    print(str(gNN))
    gL <- - as.vector( t(1/R) %*% gNN)
    gL # return the gradient vector
}
# test
gaLL2J <- LL2Jg(start1, y)
gaLL2J
## [1] 3365.78244   -0.01441  -18.63351
gnLL2J <- grad(LL2J, start1, y=y)
gnLL2J
## [1] 3365.78244   -0.01441  -18.63351
cat("max(abs(gaLL2J-gnLL2J))= ", max(abs(gaLL2J-gnLL2J)), "\n" )
## max(abs(gaLL2J-gnLL2J))=  1.8254e-08
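With the analytic gradient agreeing with the numerical one, it can be supplied to a gradient-based minimizer. A minimal sketch (our own illustration, not from the R-help thread; convergence from this start is not guaranteed, and output is not shown):

# optim() forwards y through ... to both the objective and the gradient
awfit <- optim(start1, LL2J, gr = LL2Jg, y = y, method = "BFGS")
awfit$par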

Issues of programming on the language

One of the key tasks with tools for derivatives is taking an object in one form or another (that is, of one R class or another) and using it as input to a symbolic function. The object may, of course, be the output of another such function, and this is one of the reasons we need to perform such transformations.

We also note that the different tools for symbolic derivatives use slightly different inputs. For example, for the derivative of log(x), we have

require(nls14)
dlogx <- nls14::Deriv(log(x), "x")
str(dlogx)
##  language 1/x
print(dlogx)
## 1/x

Unfortunately, there are complications when we already have an expression object, and we must then specify that the substitute() function is NOT to be applied. Here we show how to do this with an inline expression and with an explicitly created expression object.

dlogxs <- nls14::Deriv(expression(log(x)), "x", do_substitute=FALSE)
str(dlogxs)
##   expression(1/x)
print(dlogxs)
## expression(1/x)
cat(as.character(dlogxs), "\n")
## 1/x
fne <- expression(log(x))
dlogxe <- Deriv(fne, "x", do_substitute=FALSE)
str(dlogxe)
##   expression(1/x)
print(dlogxe)
## expression(1/x)
# base R
dblogx <- D(expression(log(x)), "x")
str(dblogx)
##  language 1/x
print(dblogx)
## 1/x
require(Deriv)
## Loading required package: Deriv
## 
## Attaching package: 'Deriv'
## 
## The following object is masked from 'package:Ryacas':
## 
##     Simplify
## 
## The following objects are masked from 'package:nls14':
## 
##     Deriv, Simplify
ddlogx <- Deriv::Deriv(expression(log(x)), "x")
str(ddlogx)
##   expression(1/x)
print(ddlogx)
## expression(1/x)
cat(as.character(ddlogx), "\n")
## 1/x
ddlogxf <- ~ ddlogx
str(ddlogxf)
## Class 'formula' length 2 ~ddlogx
##   ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
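Moving results between these classes uses ordinary base R subsetting, not anything specific to the derivative packages: the first element of an expression is the underlying call, and the right-hand side of a one-sided formula is its second element.

e <- expression(1/x)
str(e[[1]])   # language 1/x : the bare call stored in the expression
f <- ~ 1/x
str(f[[2]])   # language 1/x : the right-hand side of the formula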

Indexed parameters or variables

Erin Hodgess on R-help in January 2015 raised the issue of taking the derivative of an expression that contains an indexed variable. We show the example and its resolution, then give an explanation.

zzz <- expression(y[3]*r1 + r2)
try(deriv(zzz,c("r1","r2")))
require(nls14)
try(nls14::Deriv(zzz, c("r1","r2")))
## Warning in if (as.character(expr) == name) return(1) else return(0): the
## condition has length > 1 and only the first element will be used
## [1] 0
try(fnDeriv(zzz, c("r1","r2")))
newDeriv(`[`(x,y), stop("no derivative when indexing"))
try(nls14::Deriv(zzz, c("r1","r2")))
## Warning in if (as.character(expr) == name) return(1) else return(0): the
## condition has length > 1 and only the first element will be used
## [1] 0
try(nls14::fnDeriv(zzz, c("r1","r2")))
## {
##     .expr1 <- y[3]
##     .expr2 <- stop("no derivative when indexing")
##     .expr3 <- .expr2 * r1
##     .value <- .expr1 * r1 + r2
##     .grad <- array(0, c(length(.value), 2L), list(NULL, c("r1", 
##     "r2")))
##     .grad[, "r1"] <- .expr1 + .expr3
##     .grad[, "r2"] <- .expr3 + 1
##     attr(.value, "gradient") <- .grad
##     .value
## }

Richard Heiberger pointed out that internally, R stores

y[3]

as

"["(y,3)

that is, as a function. Duncan Murdoch pointed out the availability of nls14 and the use of newDeriv() to redefine the “[” function for the purposes of derivatives.
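We can confirm the functional form directly in base R:

as.list(quote(y[3]))
# element 1 is the function `[`, element 2 is the symbol y, element 3 is 3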

This is not an ideal resolution, especially as we would like to be able to get the gradients of functions with respect to vectors of parameters (noted also by Serguei Sokol in the manual for package Deriv). The following examples illustrate this.

try(nls14::Deriv(zzz, "y[3]"))
## [1] 0
try(nls14::Deriv(y3*r1+r2,"y3"))
## r1
try(nls14::Deriv(y[3]*r1+r2,"y[3]"))
## stop("no derivative when indexing") * r1