The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
The glmbayes package constructs envelopes for
accept–reject sampling by evaluating the negative log posterior (f2) and
its gradient (f3) at each point of a tangency grid. The
grid size grows with dimension (e.g., \(3^\ell\) points for \(\ell\) coefficients), and each grid point
can be evaluated independently. These evaluations are
embarrassingly parallel and are a natural target for
GPU acceleration.
The optional OpenCL implementation accelerates these
evaluations by running them on a GPU (or OpenCL-capable CPU). OpenCL is
vendor-neutral and works across NVIDIA, AMD, and Intel hardware. When
OpenCL is available and use_opencl = TRUE, the package uses
GPU-accelerated evaluation inside EnvelopeEval, which is
called by EnvelopeBuild and
EnvelopeDispersionBuild. If OpenCL is not available or
use_opencl = FALSE, the CPU path
(f2_f3_non_opencl) is used instead. The package runs
correctly in either case.
This chapter describes the OpenCL implementation: the two-layer design (kernel wrapper and kernel runner), how the OpenCL program is assembled from ported nmath/rmath functions, and how the pilot and safeguards integrate with the envelope build. For installation and availability checks, see Chapter 12.
The OpenCL path consists of two layers:
| Layer | Function | Location | Role |
|---|---|---|---|
| Wrapper | f2_f3_opencl |
kernel_wrappers.cpp |
Flatten R inputs, select kernel, load sources, call runner, reshape outputs |
| Runner | f2_f3_kernel_runner |
kernel_runners.cpp |
Platform/device setup, buffer creation, kernel launch, readback |
EnvelopeEval (EnvelopeEval.cpp)
|-- if use_opencl && pilot/check passes:
| f2_f3_opencl() -> f2_f3_kernel_runner()
\-- else:
f2_f3_non_opencl() -> CPU famfuncs (f2_f3_binomial_logit, etc.)
EnvelopeEval receives the grid G4
(coefficients at each tangency point), the design matrix, prior
parameters, and family/link. It chooses the GPU or CPU path based on
use_opencl and the outcome of the pilot (when applicable).
The GPU path calls f2_f3_opencl, which assembles the OpenCL
program and invokes f2_f3_kernel_runner; the CPU path calls
f2_f3_non_opencl, which dispatches to the appropriate C++
family function.
OpenCL kernels are not stored as single monolithic files. Instead, the package builds the program by concatenating several source components in a fixed order. This mirrors a C/C++ build where headers and libraries are included before the main source.
The full program string is:
all_src = OPENCL.cl
+ rmath (load_kernel_library("rmath"))
+ dpq (load_kernel_library("dpq"))
+ nmath (load_kernel_library("nmath"))
+ kernel file (load_kernel_source, e.g. "src/f2_f3_binomial_logit.cl")
cl_khr_fp64, cl_khr_printf), IEEE constants
(ML_NAN, ML_POSINF), feature detection for
expm1/log1p, and utility macros.R_D__0, R_DT_val, etc.) used in
give_log/lower_tail logic.f2_f3_binomial_logit.cl) that computes f2 and f3 for each
grid point.load_kernel_source(relative_path) –
Loads a single .cl file from inst/cl/ via
system.file("cl", relative_path, package = "glmbayes").load_kernel_library(subdir) – Loads
all .cl files in a subdirectory (e.g.,
"nmath", "rmath", "dpq"), parses
@provides and @depends annotations, performs a
dependency-aware topological sort, and concatenates the
files so that dependents appear after their dependencies.The @provides and @depends tags allow
modular, maintainable kernel code: each file declares what it provides
and what it needs. See ?load_kernel_library for
details.
The likelihood and gradient computations require statistical functions that match R’s behavior. Because OpenCL C does not include these, the package ports a core set from R’s nmath and rmath libraries to OpenCL C.
| Module | Provides | Purpose |
|---|---|---|
| nmath.cl | ML_, ME_, ISNAN, R_FINITE, ML_ERROR, etc. | Constants, error handling, validation macros |
| log1p.cl | Rlog1p | Log(1+x) for numerical stability |
| expm1.cl | expm1 | exp(x)-1 for numerical stability |
| bd0.cl | bd0, ebd0, etc. | Poisson/binomial deviance terms |
| stirlerr.cl | stirlerr | Stirling error for factorial terms |
| lgamma.cl | lgammafn | Log-gamma |
| gamma.cl | gammafn | Gamma function |
| dbinom.cl | dbinom, dbinom_raw | Binomial density |
| dpois.cl | dpois, dpois_raw | Poisson density |
| dgamma.cl | dgamma | Gamma density |
| dnorm.cl | dnorm4 | Normal density |
| pnorm.cl | pnorm5, pnorm_both | Normal CDF (probit) |
| pgamma.cl | pgamma | Gamma CDF (inverse link) |
| d1mach.cl | d1mach | Machine constants |
| chebyshev.cl | chebyshev_eval | Polynomial evaluation |
Rmath.cl provides additional constants (M_E, M_PI,
M_LN2, etc.) and distribution functions consistent with R’s
Rmath library.
dpq.cl and dpq_prelude.cl provide macros
for density and CDF handling with give_log and
lower_tail, matching R’s DPQ (density, probability,
quantile) conventions.
These ports ensure that the OpenCL kernels produce results numerically consistent with the CPU path and with R itself.
Each supported family/link has a dedicated kernel file that implements the f2/f3 logic for that model.
| Family | Link | Kernel file |
|---|---|---|
| binomial | logit | f2_f3_binomial_logit.cl |
| binomial | probit | f2_f3_binomial_probit.cl |
| binomial | cloglog | f2_f3_binomial_cloglog.cl |
| poisson | log | f2_f3_poisson.cl |
| Gamma | inverse | f2_f3_gamma.cl |
| gaussian | identity | f2_f3_gaussian.cl |
Each kernel follows the same pattern:
int j = get_global_id(0); one work item per grid point
\(j\).qf[j].dbinom_raw, dpois_raw, lgamma,
pnorm5, etc.) as needed.qf[j] = negative log
posterior; grad[k*m1 + j] = gradient (column-major
layout).The kernels use a fixed MAX_L2 for local arrays (e.g.,
64); this limits the number of coefficients when using OpenCL. See the
source for current limits.
The wrapper f2_f3_opencl in
kernel_wrappers.cpp performs:
(family, link)
to kernel_name and kernel_file (e.g., binomial
+ logit → f2_f3_binomial_logit,
src/f2_f3_binomial_logit.cl).all_src.f2_f3_kernel_runner(all_src, kernel_name, l1, l2, m1, ...)
with flattened inputs and output buffers.qf_flat into
an R vector; wrap grad_flat as an Armadillo matrix (m1 ×
l2) for return.The returned list has components qf (negative log
posterior per grid point) and grad (gradient matrix).
The runner in kernel_runners.cpp handles the low-level
OpenCL API:
clGetPlatformIDs, clGetDeviceIDs
(CL_DEVICE_TYPE_DEFAULT).clCreateContext,
clCreateCommandQueueWithProperties.clCreateProgramWithSource
with the concatenated string, clBuildProgram.clCreateKernel with the
kernel name.clEnqueueNDRangeKernel with
global = m1 (one work item per grid point).clEnqueueReadBuffer for qf
and grad.The runner is GLM-specific and lives in the
glmbayes::opencl namespace; the general OpenCL utilities
(kernel loading, device enumeration) are in openclPort.
For large grids (e.g., \(m_1 >
50{,}000\)), EnvelopeEval runs
f2_f3_opencl_pilot before the full evaluation to estimate
runtime.
f2_f3_opencl on a small
grid slice.refined_est_total_sec = per_grid_sec × m1.refined_est_total_sec > 300:
"Do you want to continue? [y/N]: ".This mirrors the parallel sampling pilots in Chapter A09. The envelope build phase cannot be interrupted once the full OpenCL run starts, so the pilot gives users a chance to decline long runs.
The pilot runs when m1_total > 50000 and
use_opencl is true. For smaller grids, the full evaluation
runs without a pilot.
OpenCL support is optional. The package compiles and
runs without it; in that case, use_opencl is ignored and
the CPU path is always used.
To enable OpenCL:
Use has_opencl() to check availability and
diagnose_glmbayes() for a full diagnostic report. See
Chapter 12 for platform-specific installation instructions (Windows,
Linux, macOS) and AMD ROCm notes.
has_opencl, diagnose_glmbayes?EnvelopeBuild,
?EnvelopeEval – use_opencl argument?load_kernel_library,
?load_kernel_source – Program
assembly?glmb,
?rglmb – use_opencl argumentThese binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.