The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
\[\renewcommand\vec{\boldsymbol} \def\bigO#1{\mathcal{O}(#1)} \def\Cond#1#2{\left(#1\,\middle|\, #2\right)} \def\mat#1{\boldsymbol{#1}} \def\der{{\mathop{}\!\mathrm{d}}} \def\argmax{\text{arg}\,\text{max}} \def\Prob{\text{P}} \def\diag{\text{diag}} \def\argmin{\text{arg}\,\text{min}} \def\Expe{\text{E}}\]
This is a quick introduction to the psqn package. A more detailed description can be found in the psqn vignette (call vignette("psqn", package = "psqn")
). The main function in the package is the psqn
function. The psqn
function minimizes functions which can be written like:
\[f(\vec x) = \sum_{i = 1}^n f_i(\vec x_{\mathcal I_i})\]
where \(\vec x\in \mathbb R^l\),
\[\vec x_{\mathcal I_i} = (\vec e_{j_{i1}}^\top, \dots ,\vec e_{j_{im_i}}^\top)\vec x, \qquad \mathcal I_i = (j_{i1}, \dots, \mathcal j_{im_i}) \subseteq \{1, \dots, l\},\]
and \(\vec e_k\) is the \(k\)’th column of the \(l\) dimensional identity matrix. We call the \(f_i\)s element functions and assume that each of them only depend on a small number of variables. Furthermore, we assume that each index set \(\mathcal I_i\) is of the form:
\[\begin{align*} \mathcal I_i &= \{1,\dots, p\} \cup \mathcal J_i \\ \mathcal J_i \cap \mathcal J_j &= \emptyset \qquad j\neq i \\ \mathcal J_i \cap \{1,\dots, p\} &= \emptyset \qquad \forall i = 1,\dots, n \end{align*}.\]
That is, each index set contains \(p\) global parameters and \(q_i = \lvert\mathcal J_i\rvert\) private parameters which are particular for each element function, \(f_i\). For implementation reason, we let:
\[\begin{align*} \overleftarrow q_i &= \begin{cases} p & i = 0 \\ p + \sum_{k = 1}^i q_k & i > 0 \end{cases} \\ \mathcal J_i &= \{1 + \overleftarrow q_{i - 1}, \dots , q_i + \overleftarrow q_{i - 1}\} \end{align*}\]
such that the element functions’ private parameters lies in consecutive parts of \(\vec x\). There is also a less restricted optimizer called optimizer_generic
were the parameters can overlap in an arbitrary way. The R interface for this function is implemented in the psqn_generic
function. See vignette("psqn", package = "psqn")
for further details on both the psqn
and the psqn_generic
function.
As a simple example, we consider the element functions:
\[ f_i((\vec\beta^\top, \vec u^\top_i)^\top) = -\vec y_i(\mat X_i\vec\beta + \mat Z_i\vec u_i) + \sum_{k = 1}^{t_i} \log(1 + \exp(\vec x_{ik}^\top\vec\beta + \vec z_{ik}^\top\vec u_i)) + \frac 12 \vec u^\top_i\mat\Sigma^{-1} \vec u_i. \]
\(\vec\beta\) is the \(p\) dimensional global parameter and \(\vec u_i\) is the \(q_i = q\) dimensional private parameters for the \(i\)th element function. \(\vec y_i\in \{0, 1\}^{t_i}\), \(\mat X_i\in\mathbb R^{t_i\times p}\), and \(\mat Z_i\in\mathbb R^{t_i\times q}\) are particular to each element function. We simulate some data below to use:
# assign global parameters, number of private parameters, etc.
q <- 4 # number of private parameters per cluster
p <- 5 # number of global parameters
beta <- sqrt((1:p) / sum(1:p))
Sigma <- diag(q)
# simulate a data set
n_ele_func <- 1000L # number of element functions
set.seed(80919915)
sim_dat <- replicate(n_ele_func, {
t_i <- sample.int(40L, 1L) + 2L
X <- matrix(runif(p * t_i, -sqrt(6 / 2), sqrt(6 / 2)),
p)
u <- drop(rnorm(q) %*% chol(Sigma))
Z <- matrix(runif(q * t_i, -sqrt(6 / 2 / q), sqrt(6 / 2 / q)),
q)
eta <- drop(beta %*% X + u %*% Z)
y <- as.numeric((1 + exp(-eta))^(-1) > runif(t_i))
list(X = X, Z = Z, y = y, u = u, Sigma_inv = solve(Sigma))
}, simplify = FALSE)
# data for the first element function
sim_dat[[1L]]
#> $X
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#> [1,] 0.147 -1.246 0.840 -1.083 0.701 -1.6532 0.549 1.522 -0.449 0.254
#> [2,] -0.241 1.391 -0.873 -0.877 1.005 1.5401 -0.207 -0.999 1.249 1.379
#> [3,] -1.713 0.698 1.550 -1.335 0.687 0.0999 0.688 0.493 -0.992 0.780
#> [4,] 1.116 1.687 0.557 -1.380 0.294 1.2391 -1.331 -0.459 0.262 -1.351
#> [5,] -0.391 1.310 -1.477 -0.836 -1.542 1.3278 -0.788 -0.675 -1.184 0.208
#> [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
#> [1,] 0.3633 -0.578 1.5378 1.488 0.653 1.707 -1.4558 -1.396 0.58917 1.473
#> [2,] -0.4122 -0.616 -1.2711 0.256 -1.494 0.615 -0.4410 0.114 -0.56704 -0.261
#> [3,] 0.0818 -0.272 -1.4706 1.060 -0.959 -1.141 0.0916 -0.928 1.68352 -0.155
#> [4,] -1.4245 1.716 -0.9433 0.428 1.670 -0.254 -0.1064 -0.245 0.00692 0.161
#> [5,] 1.6009 1.628 0.0971 -0.818 0.402 -0.497 -1.3034 0.636 0.72653 -0.425
#> [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30]
#> [1,] -1.086 -0.8711 1.2213 0.698 0.721 0.9319 -0.326 -0.00238 -1.164 0.203
#> [2,] 0.397 1.1903 -0.3113 -0.837 1.501 -0.0304 1.509 -0.17466 0.547 -0.667
#> [3,] 0.440 0.0235 -0.7929 0.305 -0.809 0.0949 -0.946 -0.44998 -0.761 -0.724
#> [4,] 0.222 1.2529 -0.0905 -0.879 -0.274 1.0152 0.492 -1.48076 -0.213 1.332
#> [5,] 0.872 -1.2783 1.0110 -1.225 0.904 1.0819 -1.243 0.34144 0.919 0.404
#> [,31] [,32] [,33] [,34] [,35] [,36] [,37] [,38]
#> [1,] -0.333 -0.842 -0.3760 1.529 0.439 -1.227 -0.235 -0.562
#> [2,] 0.649 1.103 -1.1518 -0.277 -1.369 -0.951 1.702 1.685
#> [3,] 1.370 -1.343 1.5827 0.355 0.457 -1.509 -1.427 0.779
#> [4,] 0.179 1.544 -0.0281 -0.199 -0.923 -0.524 0.406 0.515
#> [5,] 0.138 -0.470 1.4224 0.271 -0.424 1.090 0.290 1.585
#>
#> $Z
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
#> [1,] -0.178 0.838 -0.6811 -0.752 -0.0552 -0.076 -0.0308 -0.19838 0.7404
#> [2,] 0.843 0.372 -0.0235 -0.652 -0.1140 -0.531 0.7292 0.74028 0.0757
#> [3,] 0.432 0.231 0.8555 0.624 0.2864 0.583 0.5691 0.00112 -0.6749
#> [4,] -0.288 -0.297 0.6028 0.536 -0.8658 0.142 -0.0209 0.53635 0.8298
#> [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19]
#> [1,] 0.0646 -0.657 -0.566 0.806 -0.1562 0.0875 -0.039 -0.7200 0.835 0.220
#> [2,] 0.6481 -0.232 0.215 -0.425 0.0812 -0.3133 0.378 -0.0546 -0.553 -0.365
#> [3,] 0.1781 -0.331 0.691 0.683 -0.8539 -0.8270 -0.257 -0.4874 -0.753 0.747
#> [4,] 0.7046 0.643 0.141 -0.308 -0.7163 0.7274 -0.711 -0.0990 -0.400 0.642
#> [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29]
#> [1,] -0.315 -0.395 -0.4401 -0.1950 0.0695 -0.402 -0.297 -0.862 -0.448 0.8003
#> [2,] -0.053 -0.498 -0.6587 0.0463 0.4262 -0.440 -0.551 0.852 0.764 0.6065
#> [3,] -0.323 0.307 0.3877 0.5228 -0.5408 -0.532 -0.758 0.346 -0.847 -0.6973
#> [4,] -0.779 -0.779 0.0879 0.8470 -0.1618 -0.320 -0.858 0.527 -0.784 0.0282
#> [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37] [,38]
#> [1,] -0.4176 -0.60760 0.502 -0.816 -0.568 -0.3325 -0.085 -0.656 -0.828
#> [2,] -0.6609 0.00662 0.296 0.480 -0.538 -0.3001 0.750 0.375 -0.358
#> [3,] 0.1750 0.25447 0.124 -0.367 0.129 -0.0108 0.711 0.751 -0.159
#> [4,] 0.0145 0.08370 -0.802 0.500 0.263 -0.0894 -0.637 0.416 -0.677
#>
#> $y
#> [1] 1 1 0 0 1 0 0 0 0 1 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 0 0 0 1 1 1 0 0 0 1 1 1
#>
#> $u
#> [1] -0.170 0.427 0.918 -2.080
#>
#> $Sigma_inv
#> [,1] [,2] [,3] [,4]
#> [1,] 1 0 0 0
#> [2,] 0 1 0 0
#> [3,] 0 0 1 0
#> [4,] 0 0 0 1
We work with \(\mat X_i^\top\) and \(\mat Z_i^\top\) for computational reasons. The function we need to pass to psqn
needs to take three arguments:
TRUE
if the function should return an attribute with the gradient with respect to \(\vec x_{\mathcal I_i}\).The function should return the element function value (potentially with the gradient as an attribute) or \(p\) and \(q_i\). Thus, an example in our case will be:
r_func <- function(i, par, comp_grad){
dat <- sim_dat[[i]]
X <- dat$X
Z <- dat$Z
if(length(par) < 1)
# requested the dimension of the parameter
return(c(global_dim = NROW(dat$X), private_dim = NROW(dat$Z)))
y <- dat$y
Sigma_inv <- dat$Sigma_inv
beta <- par[1:p]
u_i <- par[1:q + p]
eta <- drop(beta %*% X + u_i %*% Z)
exp_eta <- exp(eta)
# compute the element function
out <- -sum(y * eta) + sum(log(1 + exp_eta)) +
sum(u_i * (Sigma_inv %*% u_i)) / 2
if(comp_grad){
# we also need to compute the gradient
d_eta <- -y + exp_eta / (1 + exp_eta)
grad <- c(X %*% d_eta,
Z %*% d_eta + dat$Sigma_inv %*% u_i)
attr(out, "grad") <- grad
}
out
}
Then we can optimize the function as follows:
library(psqn)
start_val <- numeric(p + n_ele_func * q) # the starting value
opt_res <- psqn(par = start_val, fn = r_func, n_ele_func = n_ele_func)
# check the minimum
opt_res$value
#> [1] 11969
# check the estimated global parameters
head(opt_res$par, length(beta))
#> [1] 0.254 0.356 0.410 0.520 0.564
# should be close to
beta
#> [1] 0.258 0.365 0.447 0.516 0.577
We can also use the psqn_generic
function although it will be slower because of some additional computational overhead because the function is more general. The function we need to pass to psqn_generic
needs to take three arguments:
TRUE
if the function should return an attribute with the gradient with respect to \(\vec x_{\mathcal I_i}\).We assign the function we need to pass to psqn_generic
for the example in this vignette:
r_func_generic <- function(i, par, comp_grad){
dat <- sim_dat[[i]]
X <- dat$X
Z <- dat$Z
if(length(par) < 1)
# return the index set. This is one-based like in R
return(c(1:NROW(dat$X),
seq_len(NROW(dat$Z)) + NROW(dat$X) + (i - 1L) * NROW(dat$Z)))
y <- dat$y
Sigma_inv <- dat$Sigma_inv
beta <- par[1:p]
u_i <- par[1:q + p]
eta <- drop(beta %*% X + u_i %*% Z)
exp_eta <- exp(eta)
# compute the element function
out <- -sum(y * eta) + sum(log(1 + exp_eta)) +
sum(u_i * (Sigma_inv %*% u_i)) / 2
if(comp_grad){
# we also need to compute the gradient
d_eta <- -y + exp_eta / (1 + exp_eta)
grad <- c(X %*% d_eta,
Z %*% d_eta + dat$Sigma_inv %*% u_i)
attr(out, "grad") <- grad
}
out
}
Then we can optimize the function as follows:
opt_res_generic <- psqn_generic(
par = start_val, fn = r_func_generic, n_ele_func = n_ele_func)
# we get the same
all.equal(opt_res_generic$value, opt_res$value)
#> [1] TRUE
all.equal(opt_res_generic$par , opt_res$par)
#> [1] TRUE
# the generic version is slower
bench::mark(
psqn = psqn(par = start_val, fn = r_func, n_ele_func = n_ele_func),
psqn_generic = psqn_generic(
par = start_val, fn = r_func_generic, n_ele_func = n_ele_func),
min_iterations = 5)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 psqn 287ms 290ms 3.42 30.9MB 13.7
#> 2 psqn_generic 296ms 301ms 3.32 30.9MB 13.3
The package can also be used as a header-only library in C++. This can yield a very large reduction in computation time and be easy to implement with the Rcpp package. Two examples are shown in the psqn vignette (see vignette("psqn", package = "psqn")
).
There is also a BFGS implementation in the package. This can be used in R with the psqn_bfgs
function. The BFGS implementation can also be used in C++ using the psqn-bfgs.h file.
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.