The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
An OpenCL kernel is the entry point that executes on
the GPU. It is declared with the __kernel qualifier and is
compiled at runtime (by the OpenCL driver) from a source string
assembled by the host program. Each invocation of the kernel processes
one work-item — typically one element of an output array — in parallel
with all other work-items.
In nmathopencl, kernels live in
inst/cl/src/. Each file defines exactly one
__kernel void function. The file name follows the pattern
<function>_kernel.cl; for example,
dnorm_kernel.cl implements the vectorized density
computation for the normal distribution.
// dnorm_kernel.cl
// Vectorized wrapper kernel for the public Mathlib dnorm interface.
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
__kernel void dnorm_kernel(
__global const double* x,
const double mu,
const double sigma,
const int give_log,
__global double* out,
const int n
) {
int i = get_global_id(0);
if (i >= n) return;
out[i] = dnorm4(x[i], mu, sigma, give_log);
}Key elements:
| Element | Purpose |
|---|---|
#pragma OPENCL EXTENSION cl_khr_fp64 : enable |
Activates double-precision arithmetic (required for all
nmathopencl kernels) |
__kernel void |
Marks this function as a GPU entry point |
__global const double* x |
Input buffer in device (global) memory |
const double mu |
Scalar parameter passed by value |
__global double* out |
Output buffer in device (global) memory |
const int n |
Work-item count guard |
get_global_id(0) |
Returns this work-item’s index in the first dimension |
dnorm4(...) |
Calls the nmath library function, defined in the concatenated library source |
The if (i >= n) return; guard handles the case where
the global work size is rounded up to a multiple of the local work-group
size, leaving excess work-items that must not write out-of-bounds.
All kernels in inst/cl/src/ follow a consistent argument
layout designed to work with
openclPort::opencl_dbl_scalar_kernel_runner:
kernel(double arg0, double arg1, ..., double argN,
__global double* out, int n_out)
Scalar double parameters come first (in any order needed
by the underlying nmath function), followed by the output buffer and its
length. Input vectors are also represented as
__global const double* and precede the scalar arguments
when present.
*_ex_kernel.cl variantsSome distributions expose a secondary interface. The
bessel_*_ex_kernel.cl files, for example, provide a version
that returns both the function value and a sign/exponent component
simultaneously. These exist to mirror the extended interfaces in nmath
(bessel_i vs bessel_i with expo
argument) and are distinguished by the _ex_ infix.
The ~130 kernel files in inst/cl/src/ cover every
function exposed by the nmathopencl R API:
dbeta_kernel.cl, dbinom_kernel.cl,
dbinom_raw_kernel.cl, dcauchy_kernel.cl,
dchisq_kernel.cl, dexp_kernel.cl,
df_kernel.cl, dgamma_kernel.cl,
dgeom_kernel.cl, dhyper_kernel.cl,
dlnorm_kernel.cl, dlogis_kernel.cl,
dnbeta_kernel.cl, dnbinom_kernel.cl,
dnbinom_mu_kernel.cl, dnchisq_kernel.cl,
dnf_kernel.cl, dnorm_kernel.cl,
dnt_kernel.cl, dpois_kernel.cl,
dpois_raw_kernel.cl, dsignrank_kernel.cl,
dt_kernel.cl, dunif_kernel.cl,
dweibull_kernel.cl, dwilcox_kernel.cl
pbeta_kernel.cl, pbinom_kernel.cl,
pcauchy_kernel.cl, pchisq_kernel.cl,
pexp_kernel.cl, pf_kernel.cl,
pgamma_kernel.cl, pgeom_kernel.cl,
phyper_kernel.cl, plnorm_kernel.cl,
plogis_kernel.cl, pnbeta_kernel.cl,
pnbinom_kernel.cl, pnbinom_mu_kernel.cl,
pnchisq_kernel.cl, pnf_kernel.cl,
pnorm_kernel.cl, pnt_kernel.cl,
ppois_kernel.cl, psignrank_kernel.cl,
pt_kernel.cl, ptukey_kernel.cl,
punif_kernel.cl, pweibull_kernel.cl,
pwilcox_kernel.cl
qbeta_kernel.cl, qbinom_kernel.cl,
qcauchy_kernel.cl, qchisq_kernel.cl,
qexp_kernel.cl, qf_kernel.cl,
qgamma_kernel.cl, qgeom_kernel.cl,
qhyper_kernel.cl, qlnorm_kernel.cl,
qlogis_kernel.cl, qnbeta_kernel.cl,
qnbinom_kernel.cl, qnbinom_mu_kernel.cl,
qnchisq_kernel.cl, qnf_kernel.cl,
qnorm_kernel.cl, qnt_kernel.cl,
qpois_kernel.cl, qsignrank_kernel.cl,
qt_kernel.cl, qtukey_kernel.cl,
qunif_kernel.cl, qweibull_kernel.cl,
qwilcox_kernel.cl
rbeta_kernel.cl, rbinom_kernel.cl,
rcauchy_kernel.cl, rchisq_kernel.cl,
rexp_kernel.cl, rf_kernel.cl,
rgamma_kernel.cl, rgeom_kernel.cl,
rhyper_kernel.cl, rlnorm_kernel.cl,
rlogis_kernel.cl, rmultinom_kernel.cl,
rnbinom_kernel.cl, rnbinom_mu_kernel.cl,
rnchisq_kernel.cl, rnorm_kernel.cl,
rpois_kernel.cl, rsignrank_kernel.cl,
rt_kernel.cl, runif_kernel.cl,
rweibull_kernel.cl, rwilcox_kernel.cl
beta_special_kernel.cl,
choose_special_kernel.cl,
lbeta_special_kernel.cl,
lchoose_special_kernel.cl, gammafn_kernel.cl,
lgammafn_kernel.cl, lgammafn_sign_kernel.cl,
lgamma1p_kernel.cl, digamma_kernel.cl,
trigamma_kernel.cl, psigamma_kernel.cl,
pentagamma_kernel.cl, tetragamma_kernel.cl,
dpsifn_kernel.cl
fmax2_kernel.cl, fmin2_kernel.cl,
fprec_kernel.cl, fround_kernel.cl,
fsign_kernel.cl, ftrunc_kernel.cl,
imax2_kernel.cl, imin2_kernel.cl,
sign_kernel.cl, r_pow_kernel.cl,
r_pow_di_kernel.cl, pow1p_kernel.cl,
log1pmx_kernel.cl, log1pexp_kernel.cl,
log1mexp_kernel.cl, logspace_add_kernel.cl,
logspace_sub_kernel.cl,
logspace_sum_kernel.cl
norm_rand_kernel.cl, unif_rand_kernel.cl,
exp_rand_kernel.cl, r_unif_index_kernel.cl
r_check_user_interrupt_kernel.cl,
r_check_stack_kernel.cl
bessel_i_kernel.cl, bessel_i_ex_kernel.cl,
bessel_j_kernel.cl, bessel_j_ex_kernel.cl,
bessel_k_kernel.cl, bessel_k_ex_kernel.cl,
bessel_y_kernel.cl, bessel_y_ex_kernel.cl
Kernels are never compiled in isolation. The host-side runner assembles a complete OpenCL program string by concatenating:
opencltools::load_kernel_library("nmath", package = "nmathopencl"))opencltools::load_kernel_source("src/<name>_kernel.cl", package = "nmathopencl"))This concatenated string is then passed to
clBuildProgram. Because
opencltools::load_kernel_library() performs topological
sorting, the library functions are always defined before the kernel
refers to them. Chapter 08 covers this assembly process in detail.
To write a kernel that calls an nmathopencl nmath
function:
dgamma).@provides tag in the corresponding
.cl file to confirm the exact C signature.__kernel void function following the standard
layout.@all_depends metadata as a guide (see Chapter 08).Chapter 10 works through this process end-to-end for the
glmbayes GLM log-posterior kernels.
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.