The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Chapter 07: Kernels — Writing and Using OpenCL Kernel Files

Kjell Nygren

2026-06-11

What is a kernel?

An OpenCL kernel is the entry point that executes on the GPU. It is declared with the __kernel qualifier and is compiled at runtime (by the OpenCL driver) from a source string assembled by the host program. Each invocation of the kernel processes one work-item — typically one element of an output array — in parallel with all other work-items.

In nmathopencl, kernels live in inst/cl/src/. Each file defines exactly one __kernel void function. The file name follows the pattern <function>_kernel.cl; for example, dnorm_kernel.cl implements the vectorized density computation for the normal distribution.

Anatomy of a simple kernel

// dnorm_kernel.cl
// Vectorized wrapper kernel for the public Mathlib dnorm interface.

#pragma OPENCL EXTENSION cl_khr_fp64 : enable

__kernel void dnorm_kernel(
    __global const double* x,
    const double mu,
    const double sigma,
    const int give_log,
    __global double* out,
    const int n
) {
    int i = get_global_id(0);
    if (i >= n) return;

    out[i] = dnorm4(x[i], mu, sigma, give_log);
}

Key elements:

Element Purpose
#pragma OPENCL EXTENSION cl_khr_fp64 : enable Activates double-precision arithmetic (required for all nmathopencl kernels)
__kernel void Marks this function as a GPU entry point
__global const double* x Input buffer in device (global) memory
const double mu Scalar parameter passed by value
__global double* out Output buffer in device (global) memory
const int n Work-item count guard
get_global_id(0) Returns this work-item’s index in the first dimension
dnorm4(...) Calls the nmath library function, defined in the concatenated library source

The if (i >= n) return; guard handles the case where the global work size is rounded up to a multiple of the local work-group size, leaving excess work-items that must not write out-of-bounds.

Standard argument layout

All kernels in inst/cl/src/ follow a consistent argument layout designed to work with openclPort::opencl_dbl_scalar_kernel_runner:

kernel(double arg0, double arg1, ..., double argN,
       __global double* out, int n_out)

Scalar double parameters come first (in any order needed by the underlying nmath function), followed by the output buffer and its length. Input vectors are also represented as __global const double* and precede the scalar arguments when present.

The *_ex_kernel.cl variants

Some distributions expose a secondary interface. The bessel_*_ex_kernel.cl files, for example, provide a version that returns both the function value and a sign/exponent component simultaneously. These exist to mirror the extended interfaces in nmath (bessel_i vs bessel_i with expo argument) and are distinguished by the _ex_ infix.

Kernel file index

The ~130 kernel files in inst/cl/src/ cover every function exposed by the nmathopencl R API:

Density kernels

dbeta_kernel.cl, dbinom_kernel.cl, dbinom_raw_kernel.cl, dcauchy_kernel.cl, dchisq_kernel.cl, dexp_kernel.cl, df_kernel.cl, dgamma_kernel.cl, dgeom_kernel.cl, dhyper_kernel.cl, dlnorm_kernel.cl, dlogis_kernel.cl, dnbeta_kernel.cl, dnbinom_kernel.cl, dnbinom_mu_kernel.cl, dnchisq_kernel.cl, dnf_kernel.cl, dnorm_kernel.cl, dnt_kernel.cl, dpois_kernel.cl, dpois_raw_kernel.cl, dsignrank_kernel.cl, dt_kernel.cl, dunif_kernel.cl, dweibull_kernel.cl, dwilcox_kernel.cl

CDF kernels

pbeta_kernel.cl, pbinom_kernel.cl, pcauchy_kernel.cl, pchisq_kernel.cl, pexp_kernel.cl, pf_kernel.cl, pgamma_kernel.cl, pgeom_kernel.cl, phyper_kernel.cl, plnorm_kernel.cl, plogis_kernel.cl, pnbeta_kernel.cl, pnbinom_kernel.cl, pnbinom_mu_kernel.cl, pnchisq_kernel.cl, pnf_kernel.cl, pnorm_kernel.cl, pnt_kernel.cl, ppois_kernel.cl, psignrank_kernel.cl, pt_kernel.cl, ptukey_kernel.cl, punif_kernel.cl, pweibull_kernel.cl, pwilcox_kernel.cl

Quantile kernels

qbeta_kernel.cl, qbinom_kernel.cl, qcauchy_kernel.cl, qchisq_kernel.cl, qexp_kernel.cl, qf_kernel.cl, qgamma_kernel.cl, qgeom_kernel.cl, qhyper_kernel.cl, qlnorm_kernel.cl, qlogis_kernel.cl, qnbeta_kernel.cl, qnbinom_kernel.cl, qnbinom_mu_kernel.cl, qnchisq_kernel.cl, qnf_kernel.cl, qnorm_kernel.cl, qnt_kernel.cl, qpois_kernel.cl, qsignrank_kernel.cl, qt_kernel.cl, qtukey_kernel.cl, qunif_kernel.cl, qweibull_kernel.cl, qwilcox_kernel.cl

Random-variate kernels

rbeta_kernel.cl, rbinom_kernel.cl, rcauchy_kernel.cl, rchisq_kernel.cl, rexp_kernel.cl, rf_kernel.cl, rgamma_kernel.cl, rgeom_kernel.cl, rhyper_kernel.cl, rlnorm_kernel.cl, rlogis_kernel.cl, rmultinom_kernel.cl, rnbinom_kernel.cl, rnbinom_mu_kernel.cl, rnchisq_kernel.cl, rnorm_kernel.cl, rpois_kernel.cl, rsignrank_kernel.cl, rt_kernel.cl, runif_kernel.cl, rweibull_kernel.cl, rwilcox_kernel.cl

Special function kernels

beta_special_kernel.cl, choose_special_kernel.cl, lbeta_special_kernel.cl, lchoose_special_kernel.cl, gammafn_kernel.cl, lgammafn_kernel.cl, lgammafn_sign_kernel.cl, lgamma1p_kernel.cl, digamma_kernel.cl, trigamma_kernel.cl, psigamma_kernel.cl, pentagamma_kernel.cl, tetragamma_kernel.cl, dpsifn_kernel.cl

Math support kernels

fmax2_kernel.cl, fmin2_kernel.cl, fprec_kernel.cl, fround_kernel.cl, fsign_kernel.cl, ftrunc_kernel.cl, imax2_kernel.cl, imin2_kernel.cl, sign_kernel.cl, r_pow_kernel.cl, r_pow_di_kernel.cl, pow1p_kernel.cl, log1pmx_kernel.cl, log1pexp_kernel.cl, log1mexp_kernel.cl, logspace_add_kernel.cl, logspace_sub_kernel.cl, logspace_sum_kernel.cl

RNG core kernels

norm_rand_kernel.cl, unif_rand_kernel.cl, exp_rand_kernel.cl, r_unif_index_kernel.cl

Utility kernels

r_check_user_interrupt_kernel.cl, r_check_stack_kernel.cl

Bessel kernels (standard and extended)

bessel_i_kernel.cl, bessel_i_ex_kernel.cl, bessel_j_kernel.cl, bessel_j_ex_kernel.cl, bessel_k_kernel.cl, bessel_k_ex_kernel.cl, bessel_y_kernel.cl, bessel_y_ex_kernel.cl

Program assembly

Kernels are never compiled in isolation. The host-side runner assembles a complete OpenCL program string by concatenating:

  1. The nmath library source (via opencltools::load_kernel_library("nmath", package = "nmathopencl"))
  2. The kernel file (via opencltools::load_kernel_source("src/<name>_kernel.cl", package = "nmathopencl"))

This concatenated string is then passed to clBuildProgram. Because opencltools::load_kernel_library() performs topological sorting, the library functions are always defined before the kernel refers to them. Chapter 08 covers this assembly process in detail.

Writing a custom kernel

To write a kernel that calls an nmathopencl nmath function:

  1. Identify the nmath function you need (e.g., dgamma).
  2. Check its @provides tag in the corresponding .cl file to confirm the exact C signature.
  3. Write a __kernel void function following the standard layout.
  4. Load the nmath library subset your function needs using the @all_depends metadata as a guide (see Chapter 08).
  5. Append your kernel source after the library source in the assembled program string.

Chapter 10 works through this process end-to-end for the glmbayes GLM log-posterior kernels.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.