This vignette compares two approaches to parallel processing in R: mirai and bakerrr. Both packages enable parallel execution of computationally intensive tasks, but with different design philosophies and usage patterns.
We’ll use a bootstrap calculation function that simulates long-running computations:
library(mirai)
library(bakerrr)
#>
#> Attaching package: 'bakerrr'
#> The following object is masked from 'package:mirai':
#>
#> status
#> The following object is masked from 'package:base':
#>
#> summary
long_stat_calc <- function(x, n_boot, sleep_time) {
  # x: numeric vector
  # n_boot: number of bootstraps
  # sleep_time: pause after each bootstrap (sec)
  if (!is.numeric(x)) stop("Input x must be numeric.")
  if (length(x) < 2) stop("Input x must have at least 2 values.")
  start_time <- Sys.time()
  boot_means <- numeric(n_boot)
  for (i in seq_len(n_boot)) {
    boot_means[i] <- mean(sample(x, replace = TRUE))
    if (sleep_time > 0) Sys.sleep(sleep_time)
  }
  end_time <- Sys.time()
  result <- list(
    boot_mean = mean(boot_means),
    boot_sd = sd(boot_means),
    elapsed = difftime(end_time, start_time, units = "secs")
  )
  class(result) <- "long_stat_calc"
  result
}
# Print method for easy reporting
print.long_stat_calc <- function(x, ...) {
  cat("Bootstrap Mean:", x$boot_mean, "\n")
  cat("Bootstrap SD: ", x$boot_sd, "\n")
  cat("Elapsed Time: ", x$elapsed, "seconds\n")
  # print methods conventionally return their input invisibly
  invisible(x)
}
# Arguments for 10 parallel jobs: each element is one job's argument list
args_list <- replicate(
  10,
  list(rnorm(100), n_boot = 3000, sleep_time = 0.002),
  simplify = FALSE
)
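Each element of args_list holds the complete argument list for one job. As a rough mental model (an assumption about how jobs are dispatched, not a description of bakerrr's internals), running the jobs serially amounts to one do.call() per element. The sketch below uses a hypothetical, fast stand-in, quick_stat(), that takes the same arguments as long_stat_calc() but never sleeps:

```r
# quick_stat() is a hypothetical stand-in for long_stat_calc():
# same arguments, but sleep_time is accepted and ignored (no sleeping)
quick_stat <- function(x, n_boot, sleep_time = 0) {
  boot_means <- vapply(
    seq_len(n_boot),
    function(i) mean(sample(x, replace = TRUE)),
    numeric(1)
  )
  list(boot_mean = mean(boot_means), boot_sd = sd(boot_means))
}

set.seed(1)
# Three jobs, each described by one argument list shaped like the elements of args_list
jobs <- replicate(3, list(rnorm(100), n_boot = 50, sleep_time = 0), simplify = FALSE)

# Serial equivalent: call the function once per argument list
serial_results <- lapply(jobs, function(args) do.call(quick_stat, args))
str(serial_results[[1]])
```

A parallel back end distributes these same per-job calls across worker processes instead of running them one after another.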
mirai provides a lightweight, async-focused approach:
# Clean slate
mirai::daemons(0)
set.seed(10)
mirai_timing <- system.time({
  mirai::daemons(6) # Start 6 daemon processes
  res <- mirai::mirai_map(
    .x = replicate(10, rnorm(100), simplify = FALSE), # one input vector per job
    .f = long_stat_calc,
    .args = list(n_boot = 3000, sleep_time = 0.002)
  )
  # Wait for all tasks while showing a progress bar
  res[.progress]
  # Collect the (already resolved) results; .flat flattens the per-job lists
  mirai_results <- res[.flat]
})
#> ■■■■ 10% | ETA: 1m
#> ■■■■■■■■■■■■■■■■■■■■■■ 70% | ETA: 6s
#> ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 100% | ETA: 0s
print(mirai_timing)
#> user system elapsed
#> 0.055 0.011 13.397
mirai::daemons(0) # Clean up
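res[.flat] merges the per-job lists into one list, so the individual long_stat_calc objects are lost. To get one structured result per job from mirai, collecting with res[] (no .flat) should keep them. A minimal sketch, assuming two local daemons and a hypothetical fast stand-in, classed_stat(), in place of long_stat_calc():

```r
library(mirai)

# Hypothetical fast stand-in that returns a classed list, like long_stat_calc()
classed_stat <- function(x, n_boot) {
  boot_means <- replicate(n_boot, mean(sample(x, replace = TRUE)))
  result <- list(boot_mean = mean(boot_means), boot_sd = sd(boot_means))
  class(result) <- "classed_stat"
  result
}

daemons(2) # start two local daemons
m <- mirai_map(
  .x = replicate(3, rnorm(100), simplify = FALSE),
  .f = classed_stat,
  .args = list(n_boot = 50)
)
structured <- m[] # collect without .flat: one classed_stat object per input vector
daemons(0) # shut the daemons down

length(structured)
class(structured[[1]])
```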
bakerrr offers an object-oriented approach with built-in job management:
bakerrr_timing <- system.time({
  baker <- bakerrr::bakerrr(
    long_stat_calc,
    args_list = args_list,
    n_daemons = 6
    # Optional: bg_args = list(stdout = "out.log", stderr = "error.log") # nolint
  ) |>
    bakerrr::run_jobs(wait_for_results = TRUE)
  bakerrr_results <- baker@results
})
print(bakerrr_timing)
#> user system elapsed
#> 14.359 2.686 13.837
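Both approaches finish in similar wall-clock time here: about 13.4 s for mirai and 13.8 s for bakerrr. The benchmark is dominated by Sys.sleep() rather than by computation, so it shows how well each package overlaps the jobs rather than raw CPU-bound throughput, and actual timings will vary with hardware, the number of daemons, and the workload. Compare the elapsed column: user and system only include CPU time from the main R session and from child processes it has already waited for, so they are not measured the same way for both packages.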
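The observed timings are close to a back-of-the-envelope estimate. Assuming each daemon runs one job at a time and ignoring start-up and communication overhead, each job sleeps for n_boot * sleep_time = 6 s, and 10 jobs on 6 daemons need ceiling(10 / 6) = 2 waves, giving a floor of about 12 s in parallel versus 60 s serially:

```r
n_jobs     <- 10
n_daemons  <- 6
n_boot     <- 3000
sleep_time <- 0.002

per_job  <- n_boot * sleep_time          # seconds of Sys.sleep() per job
waves    <- ceiling(n_jobs / n_daemons)  # rounds needed when each daemon runs one job at a time
parallel <- waves * per_job              # lower bound on wall-clock time with 6 daemons
serial   <- n_jobs * per_job             # lower bound when running the jobs one after another

c(per_job = per_job, waves = waves, parallel = parallel, serial = serial)
```

The roughly 1.4 to 1.8 s measured above the 12 s floor, and the 6.45 s reported for a single job, likely reflect overhead this estimate ignores, such as starting the daemons and Sys.sleep() running slightly longer than requested.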
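# The result structures differ: res[.flat] flattened the per-job lists, so
# mirai_results[[1]] is only the first job's boot_mean, while bakerrr keeps
# one long_stat_calc object per job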
str(mirai_results[[1]])
#> num -0.0305
str(bakerrr_results[[1]])
#> List of 3
#> $ boot_mean: num 0.0546
#> $ boot_sd : num 0.102
#> $ elapsed : 'difftime' num 6.452388048172
#> ..- attr(*, "units")= chr "secs"
#> - attr(*, "class")= chr "long_stat_calc"
# Print first result from each method
print(mirai_results[[1]])
#> [1] -0.03050123
print(bakerrr_results[[1]])
#> Bootstrap Mean: 0.05458999
#> Bootstrap SD: 0.1015962
#> Elapsed Time: 6.452388 seconds
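The difference in the printed results comes from res[.flat]. Its effect on this output is consistent with base R's unlist(recursive = FALSE), which concatenates the per-job lists into one long list. The sketch below uses that base function as an illustrative stand-in (not mirai's actual implementation), with two made-up job results:

```r
# Two job results shaped like long_stat_calc() output (illustrative values, not from the run above)
job_results <- list(
  list(boot_mean = 0.1, boot_sd = 0.01),
  list(boot_mean = 0.2, boot_sd = 0.02)
)

# Concatenate the per-job lists one level deep, as a flattening step would
flat <- unlist(job_results, recursive = FALSE)

length(flat) # 4 elements: the grouping by job is gone
names(flat)  # "boot_mean" "boot_sd" "boot_mean" "boot_sd"
flat[[1]]    # 0.1, only the first job's boot_mean
```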
Both mirai and bakerrr provide effective parallel processing in R, and the choice depends on your requirements. For production workflows that need per-job argument lists, robust error handling, and logging (for example, redirecting worker output through bg_args), bakerrr may be preferable. For performance-critical applications where minimal overhead matters, mirai may be the better choice.
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Red Hat Enterprise Linux 8.10 (Ootpa)
#>
#> Matrix products: default
#> BLAS/LAPACK: /usr/lib64/libopenblasp-r0.3.15.so; LAPACK version 3.9.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] bakerrr_0.1.0 mirai_2.5.0
#>
#> loaded via a namespace (and not attached):
#> [1] crayon_1.5.3 vctrs_0.6.5 cli_3.6.5 knitr_1.50
#> [5] rlang_1.1.6 xfun_0.53 processx_3.8.6 purrr_1.1.0
#> [9] S7_0.2.0 jsonlite_2.0.0 carrier_0.3.0.4 glue_1.8.0
#> [13] nanonext_1.7.0 htmltools_0.5.8.1 ps_1.9.1 sass_0.4.10
#> [17] rmarkdown_2.29 evaluate_1.0.5 jquerylib_0.1.4 fastmap_1.2.0
#> [21] yaml_2.3.10 lifecycle_1.0.4 config_0.3.2 compiler_4.4.2
#> [25] fs_1.6.6 rstudioapi_0.17.1 digest_0.6.37 R6_2.6.1
#> [29] parallel_4.4.2 magrittr_2.0.4 callr_3.7.6 bslib_0.9.0
#> [33] tools_4.4.2 cachem_1.1.0