taskqueue is an R package for asynchronous parallel
computing based on a PostgreSQL database. It is designed to dynamically
allocate tasks to workers, efficiently utilizing all available computing
resources until all tasks are completed.
This package is suitable for embarrassingly parallel problems, that is, parallel computing without any communication among parallel tasks.
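To illustrate why dynamic allocation helps when task run times differ, here is a toy simulation in base R. It is only a sketch of the general idea and not taskqueue's implementation: the task durations, the number of workers, and all variable names are made up for illustration, and no database is involved.

```r
# Toy model: whichever worker becomes free first takes the next idle task.
durations <- c(6, 5, 4, 1, 1, 1, 1, 1)   # hypothetical run times per task
n_workers <- 2

busy_until <- numeric(n_workers)          # time at which each worker is free
assigned   <- integer(length(durations))  # worker that ran each task

for (task in seq_along(durations)) {
  w <- which.min(busy_until)              # first worker to become free
  assigned[task] <- w
  busy_until[w] <- busy_until[w] + durations[task]
}
dynamic_makespan <- max(busy_until)

# For comparison: a fixed split, first half to worker 1, second half to worker 2
static_makespan <- max(sum(durations[1:4]), sum(durations[5:8]))

cat("dynamic:", dynamic_makespan, " static:", static_makespan, "\n")
```

In this toy case the fixed split leaves one worker idle while the other works through the long tasks, whereas pulling tasks on demand keeps both workers busy until the end.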
Before using taskqueue, ensure you have:
Install the development version from GitHub:
A computing resource is a facility/computer that can run multiple jobs/workers.
resource_add(
  name = "hpc",
  type = "slurm",
  host = "hpc.example.com",
  nodename = "hpc",
  workers = 500,
  log_folder = "/home/user/log_folder/"
)
Parameters:
- name: Resource name
- type: Resource type (currently only slurm is supported)
- host: Network name to access the resource
- nodename: Obtained by Sys.info() on the resource
- workers: Maximum number of available cores
- log_folder: Folder to store log files (important for troubleshooting)
Note: log_folder is split by project. It’s better to use a high-speed hard drive due to frequent I/O operations.
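The nodename value can be looked up on the resource itself. A minimal sketch using only base R: Sys.info() returns a named character vector, and its "nodename" element holds the machine's node name. Run it on the computing resource (for example, the HPC login node), not on your local machine.

```r
# Run this on the computing resource, not locally.
# Sys.info() returns a named character vector describing the system.
node <- Sys.info()[["nodename"]]
print(node)
```

The printed string is the value to pass as nodename when registering the resource.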
taskqueue manages tasks by project. Each project has its
own resources, working directory, runtime requirements, and
configurations.
Create a function that:
- Takes the task id as the first argument
After developing and testing your function, save it to a file (e.g.,
rcode.R) and deploy it to your HPC resource:
# Reset task status (if needed)
project_reset("test_project")
# Start the project
project_start("test_project")
# Schedule tasks on slurm resource
worker_slurm("test_project", "hpc", "rcode.R", modules = "sqlite/3.43.1")
# Check task status
task_status("test_project")
# Stop the project when done
project_stop("test_project")
Each task has one of four statuses:
- idle: Task is not running
- working: Task is currently running on a worker
- failed: Task failed for some reason (check the log folder for troubleshooting)
- finished: Task completed without errors
Reset all tasks in a project:
Reset only failed or working tasks:
Here’s a complete example of using taskqueue:
library(taskqueue)
# 1. Initialize database (first time only)
db_init()
# 2. Add resource
resource_add(
  name = "my_hpc",
  type = "slurm",
  host = "hpc.university.edu",
  nodename = "hpc",
  workers = 200,
  log_folder = "/home/user/taskqueue_logs/"
)
# 3. Create project
project_add("simulation_study", memory = 16)
# 4. Assign resource to project
project_resource_add(project = "simulation_study", resource = "my_hpc")
# 5. Add tasks
task_add("simulation_study", num = 1000, clean = TRUE)
# 6. Create worker function (save as worker_script.R)
library(taskqueue)
run_simulation <- function(task_id) {
  out_file <- sprintf("results/sim_%04d.Rds", task_id)
  if (file.exists(out_file)) return()
  # Run your simulation
  result <- your_simulation_function(task_id)
  # Save results
  saveRDS(result, out_file)
}
worker("simulation_study", run_simulation)
# 7. Deploy to HPC
project_start("simulation_study")
worker_slurm("simulation_study", "my_hpc", "worker_script.R")
# 8. Monitor progress
task_status("simulation_study")
# 9. Stop when complete
project_stop("simulation_study")
- Test your function with worker() before deploying to HPC
- Run task_status() regularly to monitor progress
- Use project_reset() to restart failed tasks
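The skip-if-output-exists pattern used by run_simulation in the example can be tried locally without a database or a cluster. The sketch below uses only base R and calls no taskqueue functions; toy_simulation, run_task, and the temporary output folder are hypothetical stand-ins for your own code.

```r
# Hypothetical stand-in for a real simulation; not part of taskqueue.
toy_simulation <- function(task_id) {
  set.seed(task_id)
  mean(rnorm(100))
}

# Write results to a fresh temporary folder so the test leaves no traces.
out_dir <- tempfile("results_")
dir.create(out_dir)

run_task <- function(task_id) {
  out_file <- file.path(out_dir, sprintf("sim_%04d.Rds", task_id))
  # Skip tasks whose output already exists, so reruns are cheap and safe.
  if (file.exists(out_file)) return("skipped")
  saveRDS(toy_simulation(task_id), out_file)
  "done"
}

first_pass  <- vapply(1:3, run_task, character(1))  # computes and saves
second_pass <- vapply(1:3, run_task, character(1))  # finds existing files
print(first_pass)
print(second_pass)
```

Because a rerun skips finished work, resetting and restarting tasks does not repeat simulations whose results are already on disk.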