Automatic Protection

Motivation and design

Heavily inspired by cpp11’s double-linked protection pool, this vignette explores cppally’s automatic protection design and benchmarks its overhead against cpp11. Like cpp11, cppally automatically protects all cppally types, so users of the API do not need to manage protection of each R object themselves.

cppally uses a double-linked protection pool design like cpp11, but instead of a single pairlist we implement a double-linked chain of VECSXP (vector list) chunks with a reference counting system. This gives much lower insert and release overhead, and makes copying practically free thanks to the reference counting design. Because we use vector list chunks, adding nodes is cheap (via SET_VECTOR_ELT()). When an r_sexp is copied, we increment the reference count associated with its node, and the node is only released once the reference count reaches 0, meaning all r_sexp objects pointing to that SEXP have been destroyed.

The vector list chunks start at size 2^10 and double every time a chunk fills, up to a maximum size of 2^14. These sizes strike a reasonable balance but are somewhat arbitrary, so there is likely room for improvement.
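The design above can be sketched in plain C++. This is a hypothetical, simplified illustration, not cppally’s actual source: the real pool stores SEXPs inside VECSXP chunks via the R API, whereas here `void*` stands in so the sketch compiles standalone.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Sketch of a double-linked chain of fixed-size chunks, one reference count
// per slot. Released slots go on a free list and are reused before growing.
struct pool {
    struct chunk {
        std::vector<void*> slots;     // stand-in for a VECSXP chunk
        std::vector<int>   refcounts; // one count per slot
        chunk* prev = nullptr;
        std::unique_ptr<chunk> next;  // double-linked chain of chunks
        explicit chunk(std::size_t n) : slots(n, nullptr), refcounts(n, 0) {}
    };

    struct node { chunk* c; std::size_t i; }; // handle held by each r_sexp

    std::unique_ptr<chunk> head;
    std::vector<node> free_list;
    std::size_t chunk_size = 1u << 10; // doubles toward 1 << 14, as in the text

    node insert(void* obj) {
        if (free_list.empty()) grow();
        node n = free_list.back();
        free_list.pop_back();
        n.c->slots[n.i] = obj;        // in cppally: SET_VECTOR_ELT()
        n.c->refcounts[n.i] = 1;
        return n;
    }
    void copy(node n) { ++n.c->refcounts[n.i]; } // copying is one increment
    void release(node n) {
        if (--n.c->refcounts[n.i] == 0) {        // last owner destroyed
            n.c->slots[n.i] = nullptr;
            free_list.push_back(n);              // slot becomes reusable
        }
    }

private:
    void grow() {
        auto c = std::make_unique<chunk>(chunk_size);
        for (std::size_t i = 0; i < chunk_size; ++i)
            free_list.push_back({c.get(), i});
        c->next = std::move(head);               // prepend to the chain
        if (c->next) c->next->prev = c.get();
        head = std::move(c);
        if (chunk_size < (1u << 14)) chunk_size *= 2; // doubling policy
    }
};
```

Insert and release never traverse the chain: both are O(1) via the free list, and a copy touches only one integer, which is the property the benchmarks below measure.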

Protection benchmark: cpp11 vs cppally

All functions below were compiled as C++20 with GCC 14.2.0 at -O2.

First, let’s load cppally

library(cppally)

Insert/release benchmark


#include <cpp11.hpp>
#include <chrono>

[[cppally::linking_to("cpp11")]]

using namespace cpp11;

[[cppally::register]]
double bench_protect_insert_release_cpp11(int n) {
  SEXP dummy = Rf_ScalarInteger(42);
  R_PreserveObject(dummy);
  auto start = std::chrono::high_resolution_clock::now();
  for (int i = 0; i < n; ++i) {
    sexp x(dummy); // insert into pool
  }           // destructor → release from pool
  auto end = std::chrono::high_resolution_clock::now();
  R_ReleaseObject(dummy);
  double ns = std::chrono::duration<double, std::nano>(end - start).count();
  return ns / n; // nanoseconds per insert+release cycle
}

#include <cppally.hpp>
using namespace cppally;
#include <chrono>

[[cppally::register]]
double bench_protect_insert_release_cppally(int n) {
  SEXP dummy = Rf_ScalarInteger(42);
  R_PreserveObject(dummy);
  auto start = std::chrono::high_resolution_clock::now();
  for (int i = 0; i < n; ++i) {
    r_sexp x(dummy); // insert into pool
  }           // destructor → release from pool
  auto end = std::chrono::high_resolution_clock::now();
  R_ReleaseObject(dummy);
  double ns = std::chrono::duration<double, std::nano>(end - start).count();
  return ns / n; // nanoseconds per insert+release cycle
}

Results in nanoseconds per insert & release

insert_release_cpp11 <- replicate(10^4, bench_protect_insert_release_cpp11(10^4)) 
mean(insert_release_cpp11)
#> [1] 42.41752
insert_release_cppally <- replicate(10^4, bench_protect_insert_release_cppally(10^4))
mean(insert_release_cppally)
#> [1] 11.40666

On my machine, cpp11 performs an insert & release every ~42 nanoseconds. cppally performs significantly better, with ~11 nanoseconds per insert & release.

Copy benchmark



#include <cpp11.hpp>
#include <chrono>

[[cppally::linking_to("cpp11")]]

using namespace cpp11;

[[cppally::register]]
double bench_protect_copy_cpp11(int n) {
  SEXP dummy = Rf_ScalarInteger(42);
  sexp dummy2 = sexp(dummy);

  auto start = std::chrono::high_resolution_clock::now();
  for (int i = 0; i < n; ++i) {
    sexp x = dummy2; // Copy
  }
  auto end = std::chrono::high_resolution_clock::now();

  double ns = std::chrono::duration<double, std::nano>(end - start).count();
  return ns / n;  // nanoseconds per copy
}

#include <cppally.hpp>
using namespace cppally;
#include <chrono>

[[cppally::register]]
double bench_protect_copy_cppally(int n) {
  SEXP dummy = Rf_ScalarInteger(42);
  r_sexp dummy2 = r_sexp(dummy);
  auto start = std::chrono::high_resolution_clock::now();
  for (int i = 0; i < n; ++i) {
    r_sexp x = dummy2; // Copy
  }
  auto end = std::chrono::high_resolution_clock::now();
  double ns = std::chrono::duration<double, std::nano>(end - start).count();
  return ns / n; // nanoseconds per copy
}

Results in nanoseconds per copy

copy_sexp_cpp11 <- replicate(10^4, bench_protect_copy_cpp11(10^4))
mean(copy_sexp_cpp11)
#> [1] 38.32048
copy_sexp_cppally <- replicate(10^4, bench_protect_copy_cppally(10^4))
mean(copy_sexp_cppally)
#> [1] 0.303891

In these benchmark results we can see a drastic difference, with cpp11 at ~38 ns/copy and cppally at ~0.3 ns/copy.
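The copy numbers follow from the design: a reference-counted copy is just a counter increment, the same mechanism std::shared_ptr uses. The following is an analogy only, shared_ptr is not cppally’s implementation, but it demonstrates the same increment-on-copy, decrement-on-destruction behaviour:

```cpp
#include <array>
#include <cassert>
#include <memory>

// Copying a ref-counted handle bumps a shared counter; destroying a copy
// decrements it; the object outlives every copy but the last.
std::array<long, 3> demo_copy_counts() {
    auto owner = std::make_shared<int>(42);
    long before = owner.use_count();   // 1: a single owner
    long during;
    {
        auto copy = owner;             // copy = one counter increment
        during = owner.use_count();    // 2: two owners share one count
    }                                  // copy destroyed = one decrement
    long after = owner.use_count();    // back to 1; object still alive
    return {before, during, after};
}
```

No pool insertion or release happens on copy, which is why a copy can cost a fraction of a nanosecond while an insert/release cycle costs an order of magnitude more.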

Impact of protection overhead: a real example

Let’s look at a real example of how much of an impact protection overhead can have on performance.

Example: Counting the number of NA values in a character vector


// Pure R C API NA count - as fast as it can reasonably get
[[cppally::register]] // Registered via cppally for convenience
int C_na_count(SEXP x){
  R_xlen_t n = Rf_xlength(x);
  int na_count = 0;
  const SEXP *p_x = STRING_PTR_RO(x);
  for (R_xlen_t i = 0; i < n; ++i){
    SEXP str = p_x[i]; // No protection, so no extra overhead
    na_count += str == NA_STRING;
  }
  return na_count;
}

#include <cpp11.hpp>
[[cppally::linking_to("cpp11")]]

[[cppally::register]]
int cpp11_na_count(SEXP x){
  using namespace cpp11;
  strings x_(x);
  R_xlen_t n = x_.size();

  int na_count = 0;

  for (R_xlen_t i = 0; i < n; ++i){
    r_string str = x_[i]; // r_string protects the underlying CHARSXP
    na_count += cpp11::is_na(str);
  }
  return na_count;
}

[[cppally::register]]
int cppally_na_count(r_vec<r_str> x){
  r_size_t n = x.length();
  int na_count = 0;
  for (r_size_t i = 0; i < n; ++i){
    r_str str = x.get(i); // r_str protects the underlying CHARSXP
    na_count += is_na(str);
  }
  return na_count;
}

set.seed(42)
x <- sample(letters, 10^5, TRUE)
x[sample.int(length(x), 10^3)] <- NA

R C API results - Extremely fast, under 30 microseconds

library(bench)
mark(C_na_count(x))
#> # A tibble: 1 × 6
#>   expression         min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>    <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 C_na_count(x)   28.4µs   33.6µs    26417.        0B        0

cpp11 results - ~5 milliseconds

mark(cpp11_na_count(x))
#> # A tibble: 1 × 6
#>   expression             min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>        <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 cpp11_na_count(x)   5.53ms   6.62ms      145.        0B     43.5

cppally results - ~850 microseconds

mark(cppally_na_count(x))
#> # A tibble: 1 × 6
#>   expression               min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>          <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 cppally_na_count(x)    849µs    1.1ms      862.        0B        0

Counting values is a simple operation, and because of that simplicity the protection overhead dominates the benchmark. Removing the protection overhead and using the R C API directly gives a baseline of under 30 microseconds; think of that as the best possible result for this operation. The same operation with cpp11 takes around 5.5 milliseconds, roughly 180x slower than the R C API. cppally_na_count() does considerably better at around 850 microseconds, though that is still roughly 30x slower than the R C API. This is a large improvement, but it shows that for certain operations one may want to consider other approaches where performance is critical.
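The pattern behind these numbers can be seen without R at all. Below is an illustrative plain-C++ contrast between a direct read loop and one that pays per-element acquire/release bookkeeping the way an owning wrapper (such as cpp11’s r_string or cppally’s r_str) does for each element it protects. The wrapper here is hypothetical; both loops compute the same count, the second just does more work per element.

```cpp
#include <string>
#include <vector>

static const std::string NA_TOKEN = "NA";

// Direct read: no ownership taken per element.
int count_raw(const std::vector<std::string>& x) {
    int n = 0;
    for (const auto& s : x) n += (s == NA_TOKEN);
    return n;
}

// Stand-in for a protecting wrapper: acquires on construction,
// releases on destruction, once per element.
struct owning_view {
    const std::string* p;
    int* refcount;
    owning_view(const std::string& s, int& rc) : p(&s), refcount(&rc) { ++rc; }
    ~owning_view() { --*refcount; }   // "release" when the wrapper dies
};

int count_wrapped(const std::vector<std::string>& x) {
    int rc = 0, n = 0;
    for (const auto& s : x) {
        owning_view v(s, rc);         // per-element protect/release cycle
        n += (*v.p == NA_TOKEN);
    }
    return n;
}
```

The result is identical either way; the difference is purely the per-element bookkeeping, which is exactly the cost the next section shows how to avoid.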

cppally views: A solution to the protection overhead problem

As we saw in the previous section, performance-critical functions can be slowed down by cppally’s protection overhead. Fortunately, cppally offers tools to avoid this when it becomes a real issue.

r_str_view

In the previous section, we extracted a temporary r_str, checked whether it was NA, and incremented the NA count if so. We never used the r_str element beyond reading its value, which means we could have improved performance by using r_str_view, a non-owning version of r_str.


[[cppally::register]]
int cppally_fast_na_count(r_vec<r_str_view> x){
 r_size_t n = x.length();
 int na_count = 0;
 for (r_size_t i = 0; i < n; ++i){
  r_str_view str = x.get(i); // `r_str_view` does NOT re-protect the underlying CHARSXP
  na_count += is_na(str);
 }
 return na_count;
}
mark(cppally_fast_na_count(x))
#> # A tibble: 1 × 6
#>   expression                    min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>               <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 cppally_fast_na_count(x)   77.7µs   92.2µs     9468.        0B        0

Looking at the benchmark results, we have effectively eliminated the protection overhead and the median execution time is much closer to that of the R C API.

view()

We could also have used view(), a non-owning counterpart of get() that reads an element without taking ownership of it.


[[cppally::register]]
int cppally_fast_na_count_v2(r_vec<r_str> x){
 r_size_t n = x.length();
 int na_count = 0;
 for (r_size_t i = 0; i < n; ++i){
  // view() is safe in a short-lived read-only context
  na_count += is_na(x.view(i));
 }
 return na_count;
}
mark(cppally_fast_na_count_v2(x))
#> # A tibble: 1 × 6
#>   expression                       min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 cppally_fast_na_count_v2(x)   78.1µs     90µs     9726.        0B        0

The results are similar to that of cppally_fast_na_count().

Using views safely

Views eliminate the small overhead of automatically protecting objects that wrap a SEXP, but they must be used carefully. As demonstrated above, r_str_view is one such class: a lightweight wrapper around a SEXP that never protects the underlying SEXP. A view must not outlive the object it points to.

DO:


void good(r_str x){
  r_str_view str = x;
  if (str.cpp_str() == "true"){
    print("true");
  } else {
    print("false");
  }
}

The above example is safe because str is not returned by good() and does not outlive x.

DON’T:


r_str_view bad(){
  r_str new_str("I will be destroyed at the end of `bad()`");
  r_str_view bad_str = new_str; // A view of new_str
  return bad_str; // Points to underlying CHARSXP but nothing protecting it
}

The above is the classic string-view mistake: letting the view outlive its owner. Here, bad_str is returned at the end of bad(), at which point new_str goes out of scope and is destroyed. Nothing then protects the underlying CHARSXP that new_str was protecting, and once R garbage-collects that CHARSXP, bad_str becomes a dangling pointer, leading to use-after-free bugs (undefined behaviour, crashes, corrupted data, etc.).
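The rule is the same one that governs std::string_view in standard C++, which r_str_view mirrors: keep the view’s lifetime strictly inside the owner’s. A minimal standard-C++ illustration of the safe pattern:

```cpp
#include <string>
#include <string_view>

// Safe: the view parameter lives only for the duration of the call, so it
// can never outlive the caller's owning string.
bool is_true_token(std::string_view v) { return v == "true"; }

// DON'T (the standard-C++ analogue of bad() above): returning a view into a
// local string dangles once the string is destroyed.
// std::string_view dangling() {
//     std::string s = "true";
//     return s; // undefined behaviour when the caller reads the view
// }
```

Passing views down the call stack is safe; passing them up (returning or storing them past the owner) is where the danger lies.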

To conclude, views are a useful tool for performance-critical functions, with the downside that one must pay closer attention to view lifetimes to ensure they do not outlive the SEXP they point to.
