Chapter 05 described the internal structure of a kernel wrapper: how inputs are converted, how the runner is dispatched, and how results are converted back. This chapter takes a step back and looks at how kernel wrappers fit into the broader codebase of a package.
Two questions arise immediately:
What happens when OpenCL is not available? Every kernel wrapper must have a CPU path. A wrapper that simply returns zeros is safe but unhelpful; most real wrappers need to fall back to a correct CPU computation.
How is the wrapper exposed? Some kernel wrappers have a direct interface into R (callable from R code). Others are purely internal C++ components, called by other C++ functions that hold the R-facing API. The choice depends on whether the computation has a natural direct R use.
nmathopencl contains examples of both patterns. The
distribution-function wrappers (dnorm_opencl,
pnorm_opencl, etc.) are exported R functions with rich CPU
fallback logic. The GLM gradient wrapper (f2_f3_opencl) is
a purely internal C++ component, called by a C++ dispatcher that also
has a separate CPU implementation (f2_f3_non_opencl). Both
patterns are explored in detail below.
In Pattern 1, the kernel wrapper (or a thin R function that calls it) is exported and callable directly from R. The CPU fallback is the equivalent computation using standard R or C functions; in nmathopencl's case, these are the stats:: distribution functions.
```text
R caller
   │
   ▼
R wrapper function (exported, input validation, recycling)
   │  if inputs are non-finite, sd == 0, etc. → fallback_full()
   │
   ▼
.opencl_try_or_fallback()
   │  if !nmathopencl_has_opencl()   → fallback_expr()  (CPU path)
   │  if OpenCL call succeeds        → return GPU result
   │  if OpenCL call fails
   │     and fallback = TRUE         → fallback_expr()  (CPU path)
   │     and fallback = FALSE        → propagate error
   ▼
C++ kernel wrapper (internal, not exported)
   │  #ifdef USE_OPENCL + nmathopencl_has_opencl() guard
   │  type conversion + program assembly + runner dispatch
   ▼
GPU result
```
The fallback can be triggered at two separate levels:

- R level (before the C++ call): when input validation detects a condition the GPU path cannot handle (e.g. sd == 0 or non-finite values), fallback_full() calls stats::dnorm(x, mean, sd, log = log) directly.
- Runtime level (around the C++ call): .opencl_try_or_fallback() checks nmathopencl_has_opencl() before attempting the GPU path. If OpenCL is not available, it calls fallback_expr() without ever touching the C++ kernel wrapper. If the GPU call throws an error and fallback = TRUE, it catches the error and calls fallback_expr(); with fallback = FALSE, the error is propagated to the caller.
In Pattern 2, the kernel wrapper has no direct R interface. It is called from within a C++ dispatcher function alongside a CPU counterpart. The R interface belongs to a higher-level function, and the C++ dispatcher selects between the two implementations based on a use_opencl flag passed in by the caller.
```text
R caller
   │
   ▼
Exported R function (e.g. Ex_EnvelopeEval)
   │  validates inputs, passes use_opencl flag
   ▼
.EnvelopeEval_cpp()   (internal R → C++ bridge, [[Rcpp::export]])
   ▼
EnvelopeEval_cpp()    (C++ dispatcher)
   │  if use_opencl && nmathopencl_has_opencl()
   │     → f2_f3_opencl()        (OpenCL kernel wrapper)
   │  else
   │     → f2_f3_non_opencl()    (pure C++ CPU implementation)
   ▼
Result (qf, grad) returned regardless of path taken
```
The two implementations — f2_f3_opencl and
f2_f3_non_opencl — share the same function signature and
return the same data structure. The caller cannot tell from the return
value which path was taken.
dnorm_opencl

dnorm_opencl in R/normal_opencl.R is the user-facing function. It mirrors the interface of stats::dnorm and adds opencl_parallel, fallback, and verbose arguments.
```r
# R/normal_opencl.R (simplified)
#' @export
dnorm_opencl <- function(x, mean = 0, sd = 1, log = FALSE,
                         opencl_parallel = NA, fallback = FALSE,
                         verbose = FALSE) {
  # ── Input validation ──────────────────────────────────────────────────────
  # These checks mirror stats::dnorm behavior.
  if (!is.numeric(x))    stop("`x` must be numeric.")
  if (!is.numeric(mean)) stop("`mean` must be numeric.")
  if (!is.numeric(sd))   stop("`sd` must be numeric.")
  if (length(x) == 0L) return(numeric(0))

  # ── Recycling (like stats::dnorm) ─────────────────────────────────────────
  len  <- max(length(x), length(mean), length(sd))
  xv   <- rep_len(as.double(x), len)
  mv   <- rep_len(as.double(mean), len)
  sv   <- rep_len(as.double(sd), len)
  logv <- rep_len(log, len)

  # ── R-level fallback function ─────────────────────────────────────────────
  # Called when inputs contain conditions the GPU path cannot handle, and by
  # .opencl_try_or_fallback() when OpenCL is unavailable (regardless of
  # `fallback`) or when the GPU call fails and fallback = TRUE.
  fallback_full <- function() {
    stats::dnorm(x, mean = mean, sd = sd, log = log)
  }

  # ── R-level conditions that force the CPU path ────────────────────────────
  if (any(!is.finite(xv) | !is.finite(mv) | !is.finite(sv))) {
    return(fallback_full())  # stats::dnorm handles NaN, Inf, NA
  }
  if (any(sv < 0)) {
    # Stricter than stats::dnorm, which returns NaN with a warning here.
    stop("`sd` must be non-negative.", call. = FALSE)
  }
  if (any(sv == 0)) {
    return(fallback_full())  # degenerate case; stats::dnorm handles it
  }

  # ── Dispatch: try GPU, fall back to CPU on failure if fallback = TRUE ─────
  log_int <- as.integer(logv)
  opc     <- .encode_opencl_parallel(opencl_parallel)
  .opencl_try_or_fallback(
    opencl_expr   = function() .dnorm_opencl(xv, mv, sv, log_int, opc, verbose),
    fallback_expr = fallback_full,
    fallback      = fallback,
    verbose       = verbose,
    fn_name       = "dnorm_opencl"
  )
}
```

.dnorm_opencl (dot-prefixed) is the internal Rcpp-exported symbol for the C++ kernel wrapper. It is not part of the public API; it exists only to make the C++ function callable from R.
.opencl_try_or_fallback helper

This helper encapsulates the runtime dispatch logic that every Pattern 1 wrapper shares:

```r
# R/opencl_linkage_utils.R
.opencl_try_or_fallback <- function(opencl_expr, fallback_expr,
                                    fallback, verbose, fn_name) {
  if (!nmathopencl_has_opencl()) {
    # OpenCL not available in this build or session: go straight to CPU.
    if (verbose)
      message(sprintf("[%s] OpenCL unavailable; using CPU fallback.", fn_name))
    return(fallback_expr())
  }
  # OpenCL available: try the GPU path.
  out <- tryCatch(opencl_expr(), error = function(e) e)
  if (inherits(out, "error")) {
    if (fallback) {
      # GPU call failed and the caller requested a fallback.
      if (verbose) {
        message(sprintf("[%s] OpenCL call failed; using CPU fallback.", fn_name))
        message(out$message)
      }
      return(fallback_expr())
    }
    stop(out$message, call. = FALSE)  # no fallback requested: propagate error
  }
  out  # GPU call succeeded
}
```

The design makes the fallback behavior explicit and controllable:

- fallback = FALSE (default): if the GPU call fails, the error propagates to the caller. The caller sees an actual error rather than silently receiving CPU results.
- fallback = TRUE: if the GPU call fails, the CPU path is used transparently. This is useful in batch workflows where any result is better than an error.

In both cases, when OpenCL is unavailable altogether, the CPU path is used without consulting fallback.

The C++ kernel wrapper nmathopencl::dnorm_opencl is exported to R as .dnorm_opencl via // [[Rcpp::export(name = ".dnorm_opencl")]]. It is the minimal C++ entry point: it converts inputs, runs the GPU path if available, and returns zeros if not.
```cpp
// src/kernel_wrappers.cpp (within nmathopencl namespace)
// [[Rcpp::export(name = ".dnorm_opencl")]]
Rcpp::NumericVector dnorm_opencl(
    const Rcpp::NumericVector& x,
    const Rcpp::NumericVector& mean,
    const Rcpp::NumericVector& sd,
    const Rcpp::IntegerVector& give_log,
    int opencl_parallel_code,
    bool verbose
) {
  const int len = x.size();
  Rcpp::NumericVector out(len);  // zero-initialized
#ifdef USE_OPENCL
  // Runtime guard: no usable OpenCL, or nothing to compute.
  if (!nmathopencl_has_opencl() || len == 0) return out;
  try {
    // Assemble and run the dnorm kernel over len elements, filling `out`.
    d_givelog_ndrange_kernel_fill(
        "src/dnorm_kernel.cl", "dnorm_kernel",
        len, {&x, &mean, &sd}, give_log, out, verbose);
  } catch (const std::exception& e) {
    if (verbose) Rcpp::Rcout << e.what() << "\n";
    throw;  // rethrow so .opencl_try_or_fallback() decides what happens next
  }
#endif
  return out;  // still all zeros if the GPU path did not run
}
```

Note that the C++ wrapper itself returns zeros when !nmathopencl_has_opencl(), and always does so in a build without USE_OPENCL. It does not call stats::dnorm. The R wrapper is responsible for the fallback to stats::dnorm; the C++ wrapper simply reports "no GPU result" via zeros. In practice these zeros never reach the user, because .opencl_try_or_fallback() does not call the C++ wrapper when OpenCL is unavailable. This keeps the C++ layer free of any R evaluation machinery.
f2_f3_opencl

Ex_EnvelopeEval (in R/ex_glmbayes.R) is the user-facing function. It accepts a use_opencl flag and delegates entirely to the C++ dispatcher:
```r
# R/ex_glmbayes.R
#' @export
Ex_EnvelopeEval <- function(G4, y, x, mu, P, alpha, wt,
                            family, link,
                            use_opencl = FALSE,
                            verbose = FALSE) {
  # Input validation (matrix/vector type checks) ...
  .EnvelopeEval_cpp(G4, y, x, mu, P, alpha, wt,
                    family, link, use_opencl, verbose)
}
```

There is no R-level fallback function here. The fallback is handled entirely inside the C++ dispatcher.

EnvelopeEval_cpp (inside src/) receives use_opencl and decides which C++ implementation to call:
```cpp
// src/ (conceptual structure; details in actual source)
Rcpp::List EnvelopeEval_cpp(
    Rcpp::NumericMatrix G4, Rcpp::NumericVector y,
    Rcpp::NumericMatrix x, Rcpp::NumericMatrix mu,
    Rcpp::NumericMatrix P, Rcpp::NumericVector alpha,
    Rcpp::NumericVector wt,
    std::string family, std::string link,
    bool use_opencl, bool verbose
) {
  // Prepare shared inputs common to both paths (including b, used below) ...
  if (use_opencl && nmathopencl_has_opencl()) {
    // GPU path: call the OpenCL kernel wrapper
    return ex_glmbayes::opencl::f2_f3_opencl(
        family, link, b, y, x, mu, P, alpha, wt, verbose);
  } else {
    // CPU path: call the pure C++ implementation
    return ex_glmbayes::f2_f3_non_opencl(
        family, link, b, y, x, mu, P, alpha, wt);
  }
}
```

Both f2_f3_opencl and f2_f3_non_opencl return an Rcpp::List with identical structure: list(qf = numeric(m1), grad = matrix(0, m1, l2)), that is, a length-m1 vector qf and an m1 × l2 matrix grad. The dispatcher's caller cannot tell from the return value which path was used.
For Pattern 1 (distribution functions), the CPU fallback is an
existing well-tested function from stats::. No separate CPU
implementation is needed.
For the GLM gradient computation, no equivalent off-the-shelf CPU function exists. f2_f3_non_opencl is a pure C++ implementation of the same mathematical computation, written without any OpenCL dependency. It compiles on every platform and produces results that agree with the GPU path up to double-precision rounding; they are not guaranteed to be bit-for-bit identical, since the two paths may evaluate floating-point operations in a different order.
Having both implementations under explicit control also makes it
possible to benchmark them directly: use_opencl = FALSE
forces the CPU path even on a GPU-equipped machine.
The choice between Pattern 1 and Pattern 2 comes down to whether there is a natural existing CPU computation to fall back to.
| Criterion | Pattern 1 (R interface + R fallback) | Pattern 2 (C++ dispatch + CPU implementation) |
|---|---|---|
| Existing CPU function available? | Yes (stats::, base::, etc.) | No; need to write the CPU implementation |
| Does the computation have a direct R use? | Yes (called directly from R) | Often not (called from a C++ simulation loop) |
| Where does fallback live? | R level (fallback_full()) + runtime (nmathopencl_has_opencl()) | C++ level (use_opencl && nmathopencl_has_opencl()) |
| Caller can request optional fallback? | Yes (fallback = TRUE/FALSE argument) | Caller controls via use_opencl flag |
| Wrapper directly R-callable? | Yes (exported via [[Rcpp::export]]) | Not necessarily; may be purely internal C++ |
Both patterns guarantee that the package compiles and runs correctly on any machine. The GPU path is always optional; the CPU path always produces a valid (if unaccelerated) result.
nmathopencl uses a consistent naming scheme to make the
role of each function clear:
| Name | Type | Role |
|---|---|---|
| dnorm_opencl | Exported R function | User-facing API; validates inputs; manages fallback |
| .dnorm_opencl | Internal R → C++ bridge | Rcpp export; positional R → C++ call only |
| nmathopencl::dnorm_opencl | C++ kernel wrapper | #ifdef guard; type conversion; runner dispatch |
| nmathopencl::dnorm_kernel_runner | C++ kernel runner | Full OpenCL lifecycle; #ifdef USE_OPENCL only |
| Ex_EnvelopeEval | Exported R function | User-facing API; passes use_opencl flag |
| .EnvelopeEval_cpp | Internal R → C++ bridge | Positional R → C++ call only |
| f2_f3_opencl | C++ kernel wrapper | OpenCL path; used inside dispatcher |
| f2_f3_non_opencl | C++ CPU implementation | CPU path; used inside same dispatcher |
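The leading dot on internal R functions signals that they are not part of the public API. Because they are also not exported, users do not see them after library(nmathopencl) or in autocompletion; they remain reachable only through the package namespace (e.g. nmathopencl:::.dnorm_opencl).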
For your own package, a consistent analogous scheme might be:
```text
myfunc_opencl()          # exported R function (if direct R use)
.myfunc_opencl()         # internal R → C++ bridge
mypkg::myfunc_opencl()   # C++ kernel wrapper (in namespace)
mypkg::myfunc_runner()   # C++ kernel runner (in namespace, #ifdef only)
mypkg::myfunc_cpu()      # C++ CPU fallback (if Pattern 2)
```
Every kernel wrapper needs a CPU path. The two patterns differ in where that path lives and who controls the dispatch:
Pattern 1 puts the fallback logic in R, using
the existing stats:: ecosystem. It is the right choice when
the computation mirrors an existing R function and has direct R
users.
Pattern 2 puts the fallback logic in C++, alongside a dedicated CPU implementation. It is the right choice when the computation is novel, when it is called from a C++ simulation loop rather than directly from R, or when benchmarking between the two paths is important.
In both patterns the OpenCL infrastructure — the runner and the kernel — is identical. What differs is only how the wrapper is wired into the rest of the package.
Chapter 12 describes the nmathopencl R API in full,
showing how the distribution-function wrappers are documented and
organized. Chapter 10 works through the ex_glmbayes pattern
end-to-end.