A Quick Introduction to the psqn Package

The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

\[\renewcommand\vec{\boldsymbol} \def\bigO#1{\mathcal{O}(#1)} \def\Cond#1#2{\left(#1\,\middle|\, #2\right)} \def\mat#1{\boldsymbol{#1}} \def\der{{\mathop{}\!\mathrm{d}}} \def\argmax{\text{arg}\,\text{max}} \def\Prob{\text{P}} \def\diag{\text{diag}} \def\argmin{\text{arg}\,\text{min}} \def\Expe{\text{E}}\]

This is a quick introduction to the psqn package. A more detailed description can be found in the psqn vignette (call vignette("psqn", package = "psqn")). The main function in the package is the psqn function. The psqn function minimizes functions which can be written like:

\[\vec x_{\mathcal I_i} = (\vec e_{j_{i1}}^\top, \dots ,\vec e_{j_{im_i}}^\top)\vec x, \qquad \mathcal I_i = (j_{i1}, \dots, \mathcal j_{im_i}) \subseteq \{1, \dots, l\},\]

and \(\vec e_k\) is the \(k\)’th column of the \(l\) dimensional identity matrix. We call the \(f_i\)s element functions and assume that each of them only depend on a small number of variables. Furthermore, we assume that each index set \(\mathcal I_i\) is of the form:

\[\begin{align*} \mathcal I_i &= \{1,\dots, p\} \cup \mathcal J_i \\ \mathcal J_i \cap \mathcal J_j &= \emptyset \qquad j\neq i \\ \mathcal J_i \cap \{1,\dots, p\} &= \emptyset \qquad \forall i = 1,\dots, n \end{align*}.\]

That is, each index set contains \(p\) global parameters and \(q_i = \lvert\mathcal J_i\rvert\) private parameters which are particular for each element function, \(f_i\). For implementation reason, we let:

\[\begin{align*} \overleftarrow q_i &= \begin{cases} p & i = 0 \\ p + \sum_{k = 1}^i q_k & i > 0 \end{cases} \\ \mathcal J_i &= \{1 + \overleftarrow q_{i - 1}, \dots , q_i + \overleftarrow q_{i - 1}\} \end{align*}\]

such that the element functions’ private parameters lies in consecutive parts of \(\vec x\). There is also a less restricted optimizer called optimizer_generic were the parameters can overlap in an arbitrary way. The R interface for this function is implemented in the psqn_generic function. See vignette("psqn", package = "psqn") for further details on both the psqn and the psqn_generic function.

The R Interface

As a simple example, we consider the element functions:

\[ f_i((\vec\beta^\top, \vec u^\top_i)^\top) = -\vec y_i(\mat X_i\vec\beta + \mat Z_i\vec u_i) + \sum_{k = 1}^{t_i} \log(1 + \exp(\vec x_{ik}^\top\vec\beta + \vec z_{ik}^\top\vec u_i)) + \frac 12 \vec u^\top_i\mat\Sigma^{-1} \vec u_i. \]

\(\vec\beta\) is the \(p\) dimensional global parameter and \(\vec u_i\) is the \(q_i = q\) dimensional private parameters for the \(i\)th element function. \(\vec y_i\in \{0, 1\}^{t_i}\), \(\mat X_i\in\mathbb R^{t_i\times p}\), and \(\mat Z_i\in\mathbb R^{t_i\times q}\) are particular to each element function. We simulate some data below to use:

# assign global parameters, number of private parameters, etc.
q <- 4 # number of private parameters per cluster
p <- 5 # number of global parameters
beta <- sqrt((1:p) / sum(1:p))
Sigma <- diag(q)

# simulate a data set
n_ele_func <- 1000L # number of element functions
set.seed(80919915)

sim_dat <- replicate(n_ele_func, {
  t_i <- sample.int(40L, 1L) + 2L
  X <- matrix(runif(p * t_i, -sqrt(6 / 2), sqrt(6 / 2)), 
              p)
  u <- drop(rnorm(q) %*% chol(Sigma))
  Z <- matrix(runif(q * t_i, -sqrt(6 / 2 / q), sqrt(6 / 2 / q)), 
              q)
  eta <- drop(beta %*% X + u %*% Z)
  y <- as.numeric((1 + exp(-eta))^(-1) > runif(t_i))
  
  list(X = X, Z = Z, y = y, u = u, Sigma_inv = solve(Sigma))
}, simplify = FALSE)

# data for the first element function
sim_dat[[1L]]
#> $X
#>        [,1]   [,2]   [,3]   [,4]   [,5]    [,6]   [,7]   [,8]   [,9]  [,10]
#> [1,]  0.147 -1.246  0.840 -1.083  0.701 -1.6532  0.549  1.522 -0.449  0.254
#> [2,] -0.241  1.391 -0.873 -0.877  1.005  1.5401 -0.207 -0.999  1.249  1.379
#> [3,] -1.713  0.698  1.550 -1.335  0.687  0.0999  0.688  0.493 -0.992  0.780
#> [4,]  1.116  1.687  0.557 -1.380  0.294  1.2391 -1.331 -0.459  0.262 -1.351
#> [5,] -0.391  1.310 -1.477 -0.836 -1.542  1.3278 -0.788 -0.675 -1.184  0.208
#>        [,11]  [,12]   [,13]  [,14]  [,15]  [,16]   [,17]  [,18]    [,19]  [,20]
#> [1,]  0.3633 -0.578  1.5378  1.488  0.653  1.707 -1.4558 -1.396  0.58917  1.473
#> [2,] -0.4122 -0.616 -1.2711  0.256 -1.494  0.615 -0.4410  0.114 -0.56704 -0.261
#> [3,]  0.0818 -0.272 -1.4706  1.060 -0.959 -1.141  0.0916 -0.928  1.68352 -0.155
#> [4,] -1.4245  1.716 -0.9433  0.428  1.670 -0.254 -0.1064 -0.245  0.00692  0.161
#> [5,]  1.6009  1.628  0.0971 -0.818  0.402 -0.497 -1.3034  0.636  0.72653 -0.425
#>       [,21]   [,22]   [,23]  [,24]  [,25]   [,26]  [,27]    [,28]  [,29]  [,30]
#> [1,] -1.086 -0.8711  1.2213  0.698  0.721  0.9319 -0.326 -0.00238 -1.164  0.203
#> [2,]  0.397  1.1903 -0.3113 -0.837  1.501 -0.0304  1.509 -0.17466  0.547 -0.667
#> [3,]  0.440  0.0235 -0.7929  0.305 -0.809  0.0949 -0.946 -0.44998 -0.761 -0.724
#> [4,]  0.222  1.2529 -0.0905 -0.879 -0.274  1.0152  0.492 -1.48076 -0.213  1.332
#> [5,]  0.872 -1.2783  1.0110 -1.225  0.904  1.0819 -1.243  0.34144  0.919  0.404
#>       [,31]  [,32]   [,33]  [,34]  [,35]  [,36]  [,37]  [,38]
#> [1,] -0.333 -0.842 -0.3760  1.529  0.439 -1.227 -0.235 -0.562
#> [2,]  0.649  1.103 -1.1518 -0.277 -1.369 -0.951  1.702  1.685
#> [3,]  1.370 -1.343  1.5827  0.355  0.457 -1.509 -1.427  0.779
#> [4,]  0.179  1.544 -0.0281 -0.199 -0.923 -0.524  0.406  0.515
#> [5,]  0.138 -0.470  1.4224  0.271 -0.424  1.090  0.290  1.585
#> 
#> $Z
#>        [,1]   [,2]    [,3]   [,4]    [,5]   [,6]    [,7]     [,8]    [,9]
#> [1,] -0.178  0.838 -0.6811 -0.752 -0.0552 -0.076 -0.0308 -0.19838  0.7404
#> [2,]  0.843  0.372 -0.0235 -0.652 -0.1140 -0.531  0.7292  0.74028  0.0757
#> [3,]  0.432  0.231  0.8555  0.624  0.2864  0.583  0.5691  0.00112 -0.6749
#> [4,] -0.288 -0.297  0.6028  0.536 -0.8658  0.142 -0.0209  0.53635  0.8298
#>       [,10]  [,11]  [,12]  [,13]   [,14]   [,15]  [,16]   [,17]  [,18]  [,19]
#> [1,] 0.0646 -0.657 -0.566  0.806 -0.1562  0.0875 -0.039 -0.7200  0.835  0.220
#> [2,] 0.6481 -0.232  0.215 -0.425  0.0812 -0.3133  0.378 -0.0546 -0.553 -0.365
#> [3,] 0.1781 -0.331  0.691  0.683 -0.8539 -0.8270 -0.257 -0.4874 -0.753  0.747
#> [4,] 0.7046  0.643  0.141 -0.308 -0.7163  0.7274 -0.711 -0.0990 -0.400  0.642
#>       [,20]  [,21]   [,22]   [,23]   [,24]  [,25]  [,26]  [,27]  [,28]   [,29]
#> [1,] -0.315 -0.395 -0.4401 -0.1950  0.0695 -0.402 -0.297 -0.862 -0.448  0.8003
#> [2,] -0.053 -0.498 -0.6587  0.0463  0.4262 -0.440 -0.551  0.852  0.764  0.6065
#> [3,] -0.323  0.307  0.3877  0.5228 -0.5408 -0.532 -0.758  0.346 -0.847 -0.6973
#> [4,] -0.779 -0.779  0.0879  0.8470 -0.1618 -0.320 -0.858  0.527 -0.784  0.0282
#>        [,30]    [,31]  [,32]  [,33]  [,34]   [,35]  [,36]  [,37]  [,38]
#> [1,] -0.4176 -0.60760  0.502 -0.816 -0.568 -0.3325 -0.085 -0.656 -0.828
#> [2,] -0.6609  0.00662  0.296  0.480 -0.538 -0.3001  0.750  0.375 -0.358
#> [3,]  0.1750  0.25447  0.124 -0.367  0.129 -0.0108  0.711  0.751 -0.159
#> [4,]  0.0145  0.08370 -0.802  0.500  0.263 -0.0894 -0.637  0.416 -0.677
#> 
#> $y
#>  [1] 1 1 0 0 1 0 0 0 0 1 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 0 0 0 1 1 1 0 0 0 1 1 1
#> 
#> $u
#> [1] -0.170  0.427  0.918 -2.080
#> 
#> $Sigma_inv
#>      [,1] [,2] [,3] [,4]
#> [1,]    1    0    0    0
#> [2,]    0    1    0    0
#> [3,]    0    0    1    0
#> [4,]    0    0    0    1

We work with \(\mat X_i^\top\) and \(\mat Z_i^\top\) for computational reasons. The function we need to pass to psqn needs to take three arguments:

An index of the element function.
A vector with \(\vec x_{\mathcal I_i}\). It will have length zero if the backend requests an integer vector with \(p\) and \(q_i\).
A logical variable which is TRUE if the function should return an attribute with the gradient with respect to \(\vec x_{\mathcal I_i}\).

The function should return the element function value (potentially with the gradient as an attribute) or \(p\) and \(q_i\). Thus, an example in our case will be:

r_func <- function(i, par, comp_grad){
  dat <- sim_dat[[i]]
  X <- dat$X
  Z <- dat$Z
  
  if(length(par) < 1)
    # requested the dimension of the parameter
    return(c(global_dim = NROW(dat$X), private_dim = NROW(dat$Z)))
  
  y <- dat$y
  Sigma_inv <- dat$Sigma_inv
  
  beta <- par[1:p]
  u_i <- par[1:q + p]
  eta <- drop(beta %*% X + u_i %*% Z)
  exp_eta <- exp(eta)
  
  # compute the element function
  out <- -sum(y * eta) + sum(log(1 + exp_eta)) + 
    sum(u_i * (Sigma_inv %*% u_i)) / 2
  
  if(comp_grad){
    # we also need to compute the gradient
    d_eta <- -y + exp_eta / (1 + exp_eta)
    grad <- c(X %*% d_eta, 
              Z %*% d_eta + dat$Sigma_inv %*% u_i)
    attr(out, "grad") <- grad
  }
  
  out
}

Then we can optimize the function as follows:

library(psqn)
start_val <- numeric(p + n_ele_func * q) # the starting value
opt_res <- psqn(par = start_val, fn = r_func, n_ele_func = n_ele_func)

# check the minimum
opt_res$value
#> [1] 11969

# check the estimated global parameters
head(opt_res$par, length(beta))
#> [1] 0.254 0.356 0.410 0.520 0.564

# should be close to 
beta
#> [1] 0.258 0.365 0.447 0.516 0.577

The R Interface for optimizer_generic

We can also use the psqn_generic function although it will be slower because of some additional computational overhead because the function is more general. The function we need to pass to psqn_generic needs to take three arguments:

An index of the element function.
A vector with \(\vec x_{\mathcal I_i}\). This time, we make no assumptions about the index sets, the \(\mathcal I_i\)s. Thus, the argument will have length zero if the backend requests an integer vector with \(\mathcal I_i\).
A logical variable which is TRUE if the function should return an attribute with the gradient with respect to \(\vec x_{\mathcal I_i}\).

We assign the function we need to pass to psqn_generic for the example in this vignette:

r_func_generic <- function(i, par, comp_grad){
  dat <- sim_dat[[i]]
  X <- dat$X
  Z <- dat$Z
  
  if(length(par) < 1)
    # return the index set. This is one-based like in R
    return(c(1:NROW(dat$X), 
             seq_len(NROW(dat$Z)) + NROW(dat$X) + (i - 1L) * NROW(dat$Z)))
  
  y <- dat$y
  Sigma_inv <- dat$Sigma_inv
  
  beta <- par[1:p]
  u_i <- par[1:q + p]
  eta <- drop(beta %*% X + u_i %*% Z)
  exp_eta <- exp(eta)
  
  # compute the element function
  out <- -sum(y * eta) + sum(log(1 + exp_eta)) + 
    sum(u_i * (Sigma_inv %*% u_i)) / 2
  
  if(comp_grad){
    # we also need to compute the gradient
    d_eta <- -y + exp_eta / (1 + exp_eta)
    grad <- c(X %*% d_eta, 
              Z %*% d_eta + dat$Sigma_inv %*% u_i)
    attr(out, "grad") <- grad
  }
  
  out
}

Then we can optimize the function as follows:

opt_res_generic <- psqn_generic(
  par = start_val, fn = r_func_generic, n_ele_func = n_ele_func)

# we get the same
all.equal(opt_res_generic$value, opt_res$value)
#> [1] TRUE
all.equal(opt_res_generic$par  , opt_res$par)
#> [1] TRUE

# the generic version is slower
bench::mark(
  psqn = psqn(par = start_val, fn = r_func, n_ele_func = n_ele_func), 
  psqn_generic = psqn_generic(
  par = start_val, fn = r_func_generic, n_ele_func = n_ele_func), 
  min_iterations = 5)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 2 × 6
#>   expression        min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>   <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 psqn            287ms    290ms      3.42    30.9MB     13.7
#> 2 psqn_generic    296ms    301ms      3.32    30.9MB     13.3

Ending Remarks

The package can also be used as a header-only library in C++. This can yield a very large reduction in computation time and be easy to implement with the Rcpp package. Two examples are shown in the psqn vignette (see vignette("psqn", package = "psqn")).

There is also a BFGS implementation in the package. This can be used in R with the psqn_bfgs function. The BFGS implementation can also be used in C++ using the psqn-bfgs.h file.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.