The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Chapter 09: Generic OpenCL Kernel Runners (openclPort layer)

Kjell Nygren

2026-06-11

The openclPort namespace

The openclPort namespace (declared in openclPort.h and implemented in opencl_kernel_runners.cpp) provides the generic OpenCL infrastructure that underlies all of nmathopencl’s GPU execution. It is designed to be completely independent of any particular mathematical library: it knows only about OpenCL C types and standard C++ types, not about nmath, distributions, or R data structures.

This namespace is currently shipped inside nmathopencl to make the package self-contained. It will migrate to a standalone openclPort CRAN package in a future release (see Chapter 00 for the release roadmap). The design described here will remain stable during that migration; downstream packages using these utilities via #include "openclPort.h" will require only a change in their LinkingTo field.

opencl_dbl_scalar_kernel_runner

This is the single generic runner that powers every kernel in nmathopencl.

Declaration

// openclPort.h (within namespace openclPort, guarded by #ifdef USE_OPENCL)
void opencl_dbl_scalar_kernel_runner(
    const std::string&         kernel_source,
    const char*                kernel_name,
    const std::vector<double>& dargs,
    int                        n_out,
    std::vector<double>&       out_flat
);

Arguments

Parameter Type Description
kernel_source const std::string& Complete, concatenated OpenCL program source (shims + library + kernel). Must be syntactically valid OpenCL C.
kernel_name const char* Name of the __kernel void function to execute, e.g., "dnorm_kernel".
dargs const std::vector<double>& Scalar double arguments to pass to the kernel, in the order the kernel declares them.
n_out int Number of double values to read back from the output buffer. Also becomes the global work size.
out_flat std::vector<double>& Output buffer; resized to n_out on entry and filled with the kernel’s results.

What it does

The runner performs the complete OpenCL lifecycle for a single kernel invocation:

  1. Platform and device selection — queries clGetPlatformIDs and clGetDeviceIDs to select the first available GPU device.
  2. Context and command queue — creates a cl_context and a cl_command_queue for the selected device.
  3. Program build — calls clCreateProgramWithSource followed by clBuildProgram. On failure, it retrieves the build log via CL_PROGRAM_BUILD_LOG and throws a std::runtime_error containing the compiler output.
  4. Kernel creation — calls clCreateKernel with the supplied kernel_name.
  5. Output buffer — allocates a CL_MEM_WRITE_ONLY device buffer of size n_out * sizeof(double).
  6. Argument setting — sets each element of dargs as a scalar kernel argument (clSetKernelArg) in index order, then sets the output buffer as the next argument, and finally n_out as the last integer argument.
  7. Execution — enqueues the kernel with a 1-D global work size of n_out via clEnqueueNDRangeKernel.
  8. Result retrieval — blocks on clFinish then reads the output buffer back to host memory via clEnqueueReadBuffer.
  9. Resource cleanup — releases all OpenCL objects in reverse order of creation.

Error checking is performed after every OpenCL API call. Any non-CL_SUCCESS status causes the runner to clean up all already-created resources and throw a std::runtime_error with a message that includes the status code, its symbolic name, and the platform/device information.

Argument layout contract

The runner assumes the kernel is declared with the following argument signature:

__kernel void my_kernel(
    double arg0, double arg1, ..., double argN,
    __global double* out,
    int n_out
)

The dargs vector maps element-by-element to arg0 through argN. The runner always appends the output buffer and n_out as the last two arguments automatically. Kernels in inst/cl/src/ that accept vector inputs (e.g., dnorm_kernel) do not fit this layout and use a separate, more specialized runner in kernel_runners.cpp; the double-scalar runner handles the scalar-parameter kernels that produce replicated output.

Example: calling the runner from a kernel wrapper

// Inside a kernel_wrappers.cpp function (nmathopencl namespace):
namespace nmathopencl {

Rcpp::NumericVector dnorm_opencl_impl(
    int n, double mu, double sigma, bool give_log, bool verbose)
{
    std::string src = openclPort::load_kernel_library("nmath") + "\n"
                    + openclPort::load_kernel_source("src/dnorm_scalar_kernel.cl");

    std::vector<double> out(n);
    openclPort::opencl_dbl_scalar_kernel_runner(
        src,
        "dnorm_scalar_kernel",
        { mu, sigma, static_cast<double>(give_log) },  // dargs
        n,                                              // n_out
        out
    );
    return Rcpp::NumericVector(out.begin(), out.end());
}

} // namespace nmathopencl

Error-handling utilities

Five inline functions in openclPort.h provide consistent error reporting across all runners. Because they are inline, they are available to any translation unit that includes openclPort.h; downstream packages that LinkingTo: nmathopencl get them for free.

opencl_status_name

inline const char* opencl_status_name(cl_int status);

Returns the symbolic name of an OpenCL error code as a C string:

CL_SUCCESS, CL_DEVICE_NOT_FOUND, CL_BUILD_PROGRAM_FAILURE, ...

Returns "UNKNOWN_OR_VENDOR_SPECIFIC" for codes not in the table. Covers all error codes defined by the OpenCL 3.0 specification that are commonly encountered in practice.

opencl_status_hint

inline const char* opencl_status_hint(cl_int status);

Returns a plain-English diagnostic hint for the most actionable error codes:

Status Hint
CL_OUT_OF_RESOURCES Device/runtime resource limit exceeded (watchdog, register pressure, …)
CL_OUT_OF_HOST_MEMORY Host memory allocation failed
CL_MEM_OBJECT_ALLOCATION_FAILURE Device memory allocation failed
CL_BUILD_PROGRAM_FAILURE Kernel compilation failed; inspect build log
CL_INVALID_CONTEXT OpenCL context is invalid
CL_DEVICE_NOT_AVAILABLE Device present but temporarily unavailable

Returns "No additional hint available." for all other codes.

opencl_read_platform_info_str / opencl_read_device_info_str

inline std::string opencl_read_platform_info_str(
    cl_platform_id platform, cl_platform_info param);

inline std::string opencl_read_device_info_str(
    cl_device_id device, cl_device_info param);

Query string-valued platform or device info (e.g., CL_PLATFORM_NAME, CL_DEVICE_NAME, CL_DRIVER_VERSION) safely: they handle null handles, query failures, and trailing null terminators, returning "unknown" on any error.

Typical usage:

std::string name = openclPort::opencl_read_device_info_str(
    device, CL_DEVICE_NAME);
std::string vendor = openclPort::opencl_read_platform_info_str(
    platform, CL_PLATFORM_VENDOR);

opencl_make_context_error

inline std::runtime_error opencl_make_context_error(
    cl_int status, cl_platform_id platform, cl_device_id device);

Constructs a std::runtime_error with a message of the form:

OpenCL error at clCreateContext (status=-6, name=CL_OUT_OF_HOST_MEMORY).
platform_name=NVIDIA CUDA, platform_vendor=NVIDIA Corporation,
device_name=NVIDIA GeForce RTX 4090, driver_version=560.94.
This may indicate a transient driver/runtime context failure.

Used inside opencl_dbl_scalar_kernel_runner and f2_f3_kernel_runner immediately after a clCreateContext failure.

Other utilities in openclPort

Rcpp -> std::vector conversion

std::vector<double> openclPort::flattenMatrix(const Rcpp::NumericMatrix& mat);
std::vector<double> openclPort::copyVector(const Rcpp::NumericVector& vec);

flattenMatrix converts a column-major R matrix to a row-major (or simply contiguous) std::vector<double> suitable for passing as a device buffer. copyVector does the same for a numeric vector. Both are implemented in OpenCL_helper.cpp.

Device probing (C++ / openclPort)

At the R prompt, host GPU inventory uses opencltools::gpu_names() (see Chapters 00–01). Inside the package DLL, openclPort exposes related C++ helpers for kernel runners:

Rcpp::CharacterVector openclPort::gpu_names();
int                   openclPort::detect_num_gpus_internal();
bool                  openclPort::has_opencl();
int                   openclPort::get_opencl_core_count();

OpenCL build configuration

struct openclPort::OpenCLConfig {
    bool have_expm1;
    bool have_log1p;
    std::string buildOptions;
};

openclPort::OpenCLConfig openclPort::configureOpenCL(
    cl_context context, cl_device_id device);

configureOpenCL probes the device to determine which standard math functions are available as OpenCL built-ins (specifically expm1 and log1p). It returns a set of buildOptions (e.g., -DHAVE_EXPM1 -DHAVE_LOG1P) that are passed to clBuildProgram so that the nmath .cl files can conditionally use the built-in or the ported version.

Using openclPort from a downstream package

A package that LinkingTo: nmathopencl can include openclPort.h directly:

// In a downstream package's source file:
#include "openclPort.h"

void my_kernel_runner(const std::string& program_source, int n) {
    std::vector<double> out;
    openclPort::opencl_dbl_scalar_kernel_runner(
        program_source,
        "my_kernel",
        { 1.0, 2.5 },   // scalar args
        n,
        out
    );
}

No additional CMake or autoconf work is needed beyond the LinkingTo field, because openclPort.h is installed to inst/include/ and the exported symbols are visible in the shared library.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.