This vignette is a guide to debugging and testing drake projects. Please also see the “caution” vignette, which addresses drake's known edge cases, pitfalls, and weaknesses that may or may not be fixed in future releases. For the most up-to-date information on unhandled edge cases, please visit the issue tracker, where you can also submit your own bug reports. Be sure to search the closed issues too, especially if you are not using the most up-to-date development version.
Most of drake's functions rely on a central config list. An understanding of config will help you grasp the internals. make() and drake_config() both return the config list. Unlike make(), drake_config()'s return value is visible, and its only purpose is to construct your config.
load_basic_example() # Get the code with drake_example("basic").
config <- drake_config(my_plan)
sort(names(config))
## [1] "args" "cache" "cache_log_file"
## [4] "cache_path" "caching" "command"
## [7] "cpu" "elapsed" "envir"
## [10] "evaluator" "fetch_cache" "graph"
## [13] "hook" "imports_only" "jobs"
## [16] "keep_going" "lazy_load" "log_progress"
## [19] "long_hash_algo" "parallelism" "plan"
## [22] "prepend" "prework" "recipe_command"
## [25] "retries" "seed" "session"
## [28] "session_info" "short_hash_algo" "skip_imports"
## [31] "skip_safety_checks" "targets" "timeout"
## [34] "trigger" "verbose"
The fields of config are mostly arguments to make() and are documented there. The rest of the fields are as follows.
graph: An igraph object with the directed acyclic graph (DAG) of the workflow.
inventory: A running list of the cached objects in each storr namespace. Maintaining this list helps avoid repeated calls to config$cache$list(), which improves speed.
long_hash_algo: Name of the long hash algorithm used throughout make(). Used to generate hash keys that will not become the names of files. See the storage vignette for details.
seed: The random number generator seed taken from the user's R session. Each target is built reproducibly using a deterministic function of this seed, and the build does not change the seed outside the scope of the target's command.
short_hash_algo: Name of the short hash algorithm used throughout make(). Used to generate hash keys that could become names of files. See the storage vignette for details.
Early in make(), the config list is stored in the cache. You can retrieve it with
read_drake_config()
and you can access parts of it with some companion functions.
read_drake_graph()
read_drake_plan()
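The seed field described above can be illustrated with a small base R sketch. It assumes a hypothetical helper with_local_seed() that runs code under a given seed and then restores the session's random number state, and a hypothetical target_seed() that derives a per-target seed from a root seed. Neither is drake's actual implementation; they only show the idea of reproducible builds that leave the outer seed alone.

```r
# Hypothetical: derive a per-target seed deterministically from a root seed.
target_seed <- function(root_seed, target) {
  root_seed + sum(utf8ToInt(target))
}

# Hypothetical: evaluate `code` under `seed`, then restore the RNG state.
with_local_seed <- function(seed, code) {
  had_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  if (had_seed) {
    old_seed <- get(".Random.seed", envir = globalenv())
  }
  on.exit({
    if (had_seed) {
      assign(".Random.seed", old_seed, envir = globalenv())
    } else if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
      rm(".Random.seed", envir = globalenv())
    }
  })
  set.seed(seed)
  force(code) # `code` is lazy, so it is evaluated only after set.seed().
}

set.seed(2018)
before <- .Random.seed
draw1 <- with_local_seed(target_seed(42, "small"), rnorm(3))
draw2 <- with_local_seed(target_seed(42, "small"), rnorm(3))

# Same target and root seed give the same draws,
# and the session's own RNG state is unchanged.
stopifnot(identical(draw1, draw2), identical(before, .Random.seed))
```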
The workflow plan data frame is your responsibility, and it takes effort and care. Fortunately, functions in drake can help. You can check the plan for formatting issues, missing input files, etc. with the check_plan() function.
load_basic_example() # Get the code with drake_example("basic").
my_plan
## # A tibble: 15 x 2
## target command
## <chr> <chr>
## 1 "" "knit(knitr_in(\"report.Rmd\"), file_out(\"repo…
## 2 small simulate(48)
## 3 large simulate(64)
## 4 regression1_small reg1(small)
## 5 regression1_large reg1(large)
## 6 regression2_small reg2(small)
## 7 regression2_large reg2(large)
## 8 summ_regression1_small suppressWarnings(summary(regression1_small$resi…
## 9 summ_regression1_large suppressWarnings(summary(regression1_large$resi…
## 10 summ_regression2_small suppressWarnings(summary(regression2_small$resi…
## 11 summ_regression2_large suppressWarnings(summary(regression2_large$resi…
## 12 coef_regression1_small suppressWarnings(summary(regression1_small))$co…
## 13 coef_regression1_large suppressWarnings(summary(regression1_large))$co…
## 14 coef_regression2_small suppressWarnings(summary(regression2_small))$co…
## 15 coef_regression2_large suppressWarnings(summary(regression2_large))$co…
check_plan(my_plan) # No issues.
After quality-checking your plan, you should check that you understand how the steps of your workflow are interconnected. The web of dependencies affects which targets are built and which ones are skipped during make().
# Hover, click, drag, zoom, and pan. See args 'from' and 'to'.
config <- drake_config(my_plan)
vis_drake_graph(config, width = "100%", height = "500px")
See the rendered graph vignette to learn more about how graphing can help (for example, how to visualize small subgraphs). If you want to take control of your own visNetwork graph, use the dataframes_graph() function to get data frames of nodes, edges, and legend nodes.
Programmatically, several functions can help you check immediate dependencies.
deps(reg2)
## [1] "lm"
# knitr_in() makes sure your target depends on `report.Rmd`
# and any dependencies loaded with loadd() and readd()
# in the report's active code chunks.
deps(my_plan$command[1])
## [1] "\"report.Rmd\"" "\"report.md\""
## [3] "coef_regression2_small" "knit"
## [5] "large" "small"
deps(my_plan$command[nrow(my_plan)])
## [1] "regression2_large" "summary" "suppressWarnings"
Drake takes special precautions so that a target/import does not depend on itself. For example, deps(f) might return "f" if f() is a recursive function, but make() just ignores this conflict and runs as expected. In other words, make() automatically removes all self-referential loops in the dependency network.
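As a rough illustration of removing self-referential loops, the sketch below drops self-edges from a plain edge list in base R. The edges data frame and the drop_self_loops() helper are hypothetical. This is not how drake stores or prunes its igraph object.

```r
# Hypothetical edge list: each row says `from` is a dependency of `to`.
edges <- data.frame(
  from = c("f", "g", "f"),
  to   = c("f", "f", "my_target"),
  stringsAsFactors = FALSE
)

# Drop every edge whose endpoints coincide (a self-referential loop).
drop_self_loops <- function(edges) {
  edges[edges$from != edges$to, , drop = FALSE]
}

clean_edges <- drop_self_loops(edges)
clean_edges
```

The recursive function f still appears in the network as a dependency of my_target, but the edge from f to itself is gone.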
List all the reproducibly-tracked objects and files, including imports and targets.
tracked(my_plan, targets = "small")
## [1] "nrow" "sample.int" "data.frame" "mtcars" "random_rows"
## [6] "small" "simulate"
tracked(my_plan)
## [1] "nrow" "sample.int"
## [3] "data.frame" "mtcars"
## [5] "random_rows" "\"report.Rmd\""
## [7] "lm" "coef_regression2_small"
## [9] "knit" "large"
## [11] "small" "simulate"
## [13] "reg1" "reg2"
## [15] "regression1_small" "summary"
## [17] "suppressWarnings" "regression1_large"
## [19] "regression2_small" "regression2_large"
## [21] "\"report.md\"" "summ_regression1_small"
## [23] "summ_regression1_large" "summ_regression2_small"
## [25] "summ_regression2_large" "coef_regression1_small"
## [27] "coef_regression1_large" "coef_regression2_large"
missed() reports import dependencies missing from your environment.
config <- drake_config(my_plan, verbose = FALSE)
missed(config) # Nothing is missing right now.
## character(0)
outdated() reports any targets that are outdated, plus any downstream targets that depend on them.
outdated(config)
## [1] "\"report.md\"" "coef_regression1_large"
## [3] "coef_regression1_small" "coef_regression2_large"
## [5] "coef_regression2_small" "large"
## [7] "regression1_large" "regression1_small"
## [9] "regression2_large" "regression2_small"
## [11] "small" "summ_regression1_large"
## [13] "summ_regression1_small" "summ_regression2_large"
## [15] "summ_regression2_small"
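The idea of reporting changed targets together with everything downstream can be sketched in base R as a simple graph traversal. The edges data frame and the with_downstream() helper below are hypothetical and cover only a slice of the basic example. drake itself works on the igraph object in config$graph.

```r
# Hypothetical dependency edges: `from` must be built before `to`.
edges <- data.frame(
  from = c("small", "small", "regression1_small", "regression2_small"),
  to   = c("regression1_small", "regression2_small",
           "summ_regression1_small", "summ_regression2_small"),
  stringsAsFactors = FALSE
)

# Collect the changed targets plus everything reachable from them.
with_downstream <- function(changed, edges) {
  found <- changed
  frontier <- changed
  while (length(frontier) > 0) {
    children <- unique(edges$to[edges$from %in% frontier])
    frontier <- setdiff(children, found)
    found <- c(found, frontier)
  }
  sort(found)
}

# Only regression2_small changed: its summary is also flagged.
with_downstream("regression2_small", edges)

# small changed: every target in this slice is flagged.
with_downstream("small", edges)
```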
To find out why a target is out of date, you can load the storr-based cache and compare the appropriate hash keys to the output of dependency_profile(). To use dependency_profile(), be sure to supply the master configuration list as the config argument. The same is true for drake_meta(), which is an alternative.
load_basic_example() # Get the code with drake_example("basic").
config <- make(my_plan, verbose = FALSE)
# Change a dependency.
reg2 <- function(d) {
d$x3 <- d$x ^ 3
lm(y ~ x3, data = d)
}
outdated(config)
## [1] "\"report.md\"" "coef_regression2_large"
## [3] "coef_regression2_small" "regression2_large"
## [5] "regression2_small" "summ_regression2_large"
## [7] "summ_regression2_small"
dependency_profile(target = "regression2_small", config = config)
## $cached_command
## [1] "{\n reg2(small) \n}"
##
## $current_command
## [1] "{\n reg2(small) \n}"
##
## $cached_file_modification_time
## NULL
##
## $cached_dependency_hash
## [1] "bff91683ab896912a57d3010b489e68a50294f7c46dc5c8bc80797a3a616194b"
##
## $current_dependency_hash
## [1] "8685dbd7c688d9ceca90b9c7cdde2e151e39a4882af35b18c9697a04c24e9d63"
##
## $hashes_of_dependencies
## reg2 small
## "d47109544c89ca7a" "40fb781de184c741"
drake_meta(target = "regression2_small", config = config)
## $target
## [1] "regression2_small"
##
## $imported
## [1] FALSE
##
## $foreign
## [1] TRUE
##
## $missing
## [1] FALSE
##
## $seed
## [1] 1034257256
##
## $command
## [1] "{\n reg2(small) \n}"
##
## $depends
## [1] "8685dbd7c688d9ceca90b9c7cdde2e151e39a4882af35b18c9697a04c24e9d63"
##
## $file
## [1] NA
config$cache$get_hash(key = "small", namespace = "kernels") # same
## [1] "40fb781de184c741"
config$cache$get_hash(key = "small") # same
## [1] "40fb781de184c741"
config$cache$get_hash(key = "reg2", namespace = "kernels") # same
## [1] "d47109544c89ca7a"
config$cache$get_hash(key = "reg2") # different
## [1] "89c33700774643ff"
In drake, the “kernel” of a target or import is the piece of the output that is reproducibly tracked. For ordinary R objects, the kernel is just the object itself. For custom external files, it is a separate hash. But for functions, the kernel is the deparsed body of the function, together with the dependency hash if the function is imported (see drake:::store_function()).
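To see why the deparsed body is a reasonable thing to track for functions, here is a minimal base R sketch. The body_text() helper and the three example functions are hypothetical, and the sketch shows only base R's deparse() behavior, not drake:::store_function() itself.

```r
# Two definitions that differ only in comments and whitespace.
reg_a <- function(d) {
  lm(y ~ x, data = d)
}
reg_b <- function(d) {
  # Fit a simple linear model.
  lm(y ~ x,   data = d)
}

# A definition whose code really changed.
reg_c <- function(d) {
  lm(y ~ x + I(x^2), data = d)
}

# Hypothetical helper: the deparsed body as a single string.
body_text <- function(f) {
  paste(deparse(body(f)), collapse = "\n")
}

same_after_cosmetic_edit <- identical(body_text(reg_a), body_text(reg_b))
same_after_code_edit <- identical(body_text(reg_a), body_text(reg_c))

# Cosmetic edits leave the deparsed body alone; real edits change it.
stopifnot(same_after_cosmetic_edit, !same_after_code_edit)
```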
The internal functions drake:::meta() and drake:::meta_list() compute the metadata on each target that drake uses to decide which targets to build and which to skip (via drake:::should_build_target()). Then, after the target/import is processed, drake:::finish_meta() updates the metadata (except for the $missing element) before it is cached. See diagnose() to read available metadata, along with any errors, warnings, and messages generated during the build.
str(diagnose(small))
## List of 11
## $ target : chr "small"
## $ imported : logi FALSE
## $ foreign : logi TRUE
## $ missing : logi TRUE
## $ seed : num 1.95e+09
## $ command : chr "{\n simulate(48) \n}"
## $ depends : chr "5aa9da170b33b159cfd382e15229d7efe6d6a5777d1c69e71b0c6a1188ee5116"
## $ file : chr NA
## $ start :Class 'proc_time' Named num [1:5] 13.905 0.22 14.745 0.024 0.018
## .. ..- attr(*, "names")= chr [1:5] "user.self" "sys.self" "elapsed" "user.child" ...
## $ time_command:'data.frame': 1 obs. of 5 variables:
## ..$ item : chr "small"
## ..$ type : chr "target"
## ..$ elapsed: num 0
## ..$ user : num 0.001
## ..$ system : num 0
## $ time_build :'data.frame': 1 obs. of 5 variables:
## ..$ item : chr "small"
## ..$ type : chr "target"
## ..$ elapsed: num 0.003
## ..$ user : num 0.003
## ..$ system : num 0
str(diagnose("\"report.md\""))
## List of 12
## $ target : chr "\"report.md\""
## $ imported : logi FALSE
## $ foreign : logi TRUE
## $ missing : logi TRUE
## $ seed : num 1.85e+09
## $ command : chr "{\n knit(knitr_in(\"report.Rmd\"), file_out(\"report.md\"), quiet = TRUE) \n}"
## $ depends : chr "5f72f8f2a06b7a515e17a3ea3f57b1dd3522d7143e0cbd0650a3c0985683f81d"
## $ file : chr "ed35f108e2ccd75a904273f1e8559d5a0acb9c2700531276a7acdcfba09decc6"
## $ start :Class 'proc_time' Named num [1:5] 14.001 0.224 14.845 0.024 0.018
## .. ..- attr(*, "names")= chr [1:5] "user.self" "sys.self" "elapsed" "user.child" ...
## $ time_command:'data.frame': 1 obs. of 5 variables:
## ..$ item : chr "\"report.md\""
## ..$ type : chr "target"
## ..$ elapsed: num 0.032
## ..$ user : num 0.031
## ..$ system : num 0
## $ mtime : POSIXct[1:1], format: "2018-04-09 23:50:07"
## $ time_build :'data.frame': 1 obs. of 5 variables:
## ..$ item : chr "\"report.md\""
## ..$ type : chr "target"
## ..$ elapsed: num 0.034
## ..$ user : num 0.035
## ..$ system : num 0
If your target's last build succeeded, then diagnose(your_target) has the most current information from that build. But if your target failed, then only diagnose(your_target)$error, diagnose(your_target)$warnings, and diagnose(your_target)$messages correspond to the failure, and all the other metadata correspond to the last build that completed without an error.
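For intuition about how an error, warnings, and messages can all be recorded from a single failed command, here is a base R sketch built on tryCatch() and withCallingHandlers(). The capture_conditions() helper is hypothetical. It is not drake's internal code, only one common way to collect this kind of diagnostic information.

```r
# Hypothetical helper: run `code` and record every condition it signals.
capture_conditions <- function(code) {
  warnings <- character(0)
  messages <- character(0)
  error <- NULL
  value <- withCallingHandlers(
    # Errors stop the command, so catch them and keep the condition object.
    tryCatch(code, error = function(e) {
      error <<- e
      NULL
    }),
    # Warnings and messages do not stop the command: record and continue.
    warning = function(w) {
      warnings <<- c(warnings, conditionMessage(w))
      invokeRestart("muffleWarning")
    },
    message = function(m) {
      messages <<- c(messages, conditionMessage(m))
      invokeRestart("muffleMessage")
    }
  )
  list(value = value, error = error, warnings = warnings, messages = messages)
}

result <- capture_conditions({
  message("starting")
  warning("x is suspicious")
  stop("`x` cannot be negative.")
})

conditionMessage(result$error)
result$warnings
result$messages
```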
To track dependencies and make decisions about what needs building, make() stores the fingerprint, or hash, of each target. Hashing is great for detecting the right changes in targets, but if all you want to do is test and debug a workflow, the full rigor can be time-consuming.
Fortunately, you can change the triggers that tell drake when to (re)build each target. Below, drake disregards outdatedness and just builds the targets that are missing.
clean(verbose = FALSE) # Start from scratch
config <- make(my_plan, trigger = "missing")
## Unloading targets from environment:
## coef_regression2_small
## large
## small
## target large: trigger "missing"
## target small: trigger "missing"
## target regression1_large: trigger "missing"
## target regression1_small: trigger "missing"
## target regression2_large: trigger "missing"
## target regression2_small: trigger "missing"
## target coef_regression1_large: trigger "missing"
## target coef_regression1_small: trigger "missing"
## target coef_regression2_large: trigger "missing"
## target coef_regression2_small: trigger "missing"
## target summ_regression1_large: trigger "missing"
## target summ_regression1_small: trigger "missing"
## target summ_regression2_large: trigger "missing"
## target summ_regression2_small: trigger "missing"
## target file "report.md": trigger "missing"
## Used non-default triggers. Some targets may not be up to date.
You can choose from any of the following triggers for all targets or for each target individually.
always: Always build the target regardless of the circumstance, even if the target is already up to date.
any: Apply all the triggers below (default). In other words, trigger a build if the command trigger, depends trigger, file trigger, or missing trigger is activated.
command: Build if the workflow plan command changed since the last make() or the target is missing.
depends: Build if any of the target's dependencies changed since the last make() or if the target is missing.
file: Build if the target is an output file and the file is either missing or corrupted. Also build if the file's hash is missing from the cache.
missing: Build if and only if the target is missing.
To select triggers for individual targets, create an optional trigger column in the workflow plan data frame. Entries in this column override the trigger argument to make().
my_plan$trigger <- "command"
my_plan$trigger[1] <- "file"
my_plan
## # A tibble: 15 x 3
## target command trigger
## <chr> <chr> <chr>
## 1 "" "knit(knitr_in(\"report.Rmd\"), file_ou… file
## 2 small simulate(48) command
## 3 large simulate(64) command
## 4 regression1_small reg1(small) command
## 5 regression1_large reg1(large) command
## 6 regression2_small reg2(small) command
## 7 regression2_large reg2(large) command
## 8 summ_regression1_small suppressWarnings(summary(regression1_sm… command
## 9 summ_regression1_large suppressWarnings(summary(regression1_la… command
## 10 summ_regression2_small suppressWarnings(summary(regression2_sm… command
## 11 summ_regression2_large suppressWarnings(summary(regression2_la… command
## 12 coef_regression1_small suppressWarnings(summary(regression1_sm… command
## 13 coef_regression1_large suppressWarnings(summary(regression1_la… command
## 14 coef_regression2_small suppressWarnings(summary(regression2_sm… command
## 15 coef_regression2_large suppressWarnings(summary(regression2_la… command
# Change an imported dependency:
reg2
## function(d) {
## d$x3 <- d$x ^ 3
## lm(y ~ x3, data = d)
## }
reg2 <- function(d) {
d$x3 <- d$x ^ 3
lm(y ~ x3, data = d)
}
make(my_plan, trigger = "any") # Nothing changes!
## Unloading targets from environment:
## coef_regression2_small
## large
## small
## Used non-default triggers. Some targets may not be up to date.
The outdated() function responds to triggers. For example, even if outdated(my_plan) shows all targets up to date, outdated(my_plan, trigger = "always") will claim that all the targets are outdated.
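The trigger rules listed earlier can be summarized as a small decision function. The should_build() function below and its logical arguments are hypothetical, and the sketch only restates the written rules in base R. drake's real decision is made internally (see drake:::should_build_target()).

```r
# Hypothetical restatement of the trigger rules.
# file_changed stands for: the output file is missing, corrupted,
# or has no hash in the cache.
should_build <- function(trigger, is_missing, command_changed,
                         depends_changed, file_changed) {
  switch(
    trigger,
    always  = TRUE,
    any     = is_missing || command_changed || depends_changed || file_changed,
    command = is_missing || command_changed,
    depends = is_missing || depends_changed,
    file    = file_changed,
    missing = is_missing,
    stop("unknown trigger: ", trigger)
  )
}

# Situation from the example above: only an imported dependency changed.
triggers <- c("always", "any", "command", "depends", "file", "missing")
decisions <- vapply(
  triggers,
  should_build,
  logical(1),
  is_missing = FALSE,
  command_changed = FALSE,
  depends_changed = TRUE,
  file_changed = FALSE
)
decisions
```

Under these assumptions, the command trigger does not fire when only a dependency such as reg2() changes, which matches the “Nothing changes!” result in the example.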
Similar to triggers, you can also skip the processing of imported objects and files. However, you should only use this for testing purposes. If some of your imports are not already cached and up to date, any built targets will be out of sync. In other words, outdated() is more likely to be wrong, and your project may no longer be reproducible.
clean(verbose = FALSE)
my_plan$trigger <- NULL
make(my_plan, skip_imports = TRUE)
## target large
## target small
## target regression1_large
## target regression1_small
## target regression2_large
## target regression2_small
## target coef_regression1_large
## target coef_regression1_small
## target coef_regression2_large
## target coef_regression2_small
## target summ_regression1_large
## target summ_regression1_small
## target summ_regression2_large
## target summ_regression2_small
## target file "report.md"
## Skipped the imports. If some imports are not already cached, targets could be out of date.
See the timeout, cpu, elapsed, and retries arguments to make().
clean(verbose = FALSE)
f <- function(...){
Sys.sleep(1)
}
debug_plan <- drake_plan(x = 1, y = f(x))
debug_plan
## # A tibble: 2 x 2
## target command
## <chr> <chr>
## 1 x 1
## 2 y f(x)
withr::with_message_sink(
stdout(),
make(debug_plan, timeout = 1e-3, retries = 2)
)
## Unloading targets from environment:
## x
## target x
## target y
## retry y: 1 of 2
## retry y: 2 of 2
## [2018-04-09 23:50:14] TimeoutException: reached CPU time limit [cpu=0.001s,
## elapsed=0.001s]
## Warning: No message sink to remove.
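The retry behavior in the log above can be imitated with a short loop in base R. The run_with_retries() helper and the flaky() command are hypothetical. The sketch covers only retries, not time limits, and it is not drake's implementation.

```r
# Hypothetical helper: call `command()` and retry on error up to `retries` times.
run_with_retries <- function(command, retries = 0) {
  attempt <- 0
  repeat {
    result <- tryCatch(command(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    if (attempt >= retries) {
      stop(result) # Out of retries: re-signal the last error.
    }
    attempt <- attempt + 1
    message("retry: ", attempt, " of ", retries)
  }
}

# Hypothetical command that fails twice and then succeeds.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) {
    stop("transient failure")
  }
  "done"
}

value <- run_with_retries(flaky, retries = 2)
value
```

With retries = 2 the command gets three attempts in total, so the third call succeeds. With retries = 1 the same command would fail with the original error.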
To tailor these settings to each individual target, create new timeout, cpu, elapsed, or retries columns in your workflow plan. These columns override the analogous arguments to make().
clean(verbose = FALSE)
debug_plan$timeout <- c(1e-3, 2e-3)
debug_plan$retries <- 1:2
debug_plan
## # A tibble: 2 x 4
## target command timeout retries
## <chr> <chr> <dbl> <int>
## 1 x 1 0.00100 1
## 2 y f(x) 0.00200 2
withr::with_message_sink(
new = stdout(),
make(debug_plan, timeout = Inf, retries = 0)
)
## Unloading targets from environment:
## x
## target x
## target y
## fail y
## Error: Target `y`` failed. Call `diagnose(y)` for details. Error message:
## reached elapsed time limit
## Warning: No message sink to remove.
Drake records diagnostic metadata on all your targets, including the latest errors, warnings, messages, and other bits of context.
diagnose(verbose = FALSE) # Targets with available metadata.
## [1] "Sys.sleep" "f" "x" "y"
f <- function(x){
if (x < 0){
stop("`x` cannot be negative.")
}
x
}
bad_plan <- drake_plan(
a = 12,
b = -a,
my_target = f(b)
)
bad_plan
## # A tibble: 3 x 2
## target command
## <chr> <chr>
## 1 a 12
## 2 b -a
## 3 my_target f(b)
withr::with_message_sink(
new = stdout(),
make(bad_plan)
)
## target a
## target b
## target my_target
## fail my_target
## Error: Target `my_target`` failed. Call `diagnose(my_target)` for details. Error message:
## `x` cannot be negative.
## Warning: No message sink to remove.
failed(verbose = FALSE) # from the last make() only
## [1] "my_target" "y"
# See also warnings and messages.
error <- diagnose(my_target, verbose = FALSE)$error
error$message
## [1] "`x` cannot be negative."
error$call
## f(b)
error$calls # View the traceback.
## [[1]]
## local({
## f(b)
## })
##
## [[2]]
## eval.parent(substitute(eval(quote(expr), envir)))
##
## [[3]]
## eval(expr, p)
##
## [[4]]
## eval(expr, p)
##
## [[5]]
## eval(quote({
## f(b)
## }), new.env())
##
## [[6]]
## eval(quote({
## f(b)
## }), new.env())
##
## [[7]]
## f(b)
##
## [[8]]
## stop("`x` cannot be negative.")
To figure out what went wrong, you could try to build the failed target interactively. To do that, simply call drake_build(). This function first calls loadd(deps = TRUE) to load any missing dependencies (see the replace argument) and then builds your target.
# Pretend we just opened a new R session.
library(drake)
# Unloads target `b`.
config <- drake_config(plan = bad_plan)
## Unloading targets from environment:
## b
# my_target depends on b.
"b" %in% ls()
## [1] FALSE
# Try to build my_target until the error is fixed.
# Skip all that pesky work checking dependencies.
drake_build(my_target, config = config)
## target my_target
## fail my_target
## Error: Target `my_target`` failed. Call `diagnose(my_target)` for details. Error message:
## `x` cannot be negative.
# The target failed, but the dependency was loaded.
"b" %in% ls()
## [1] TRUE
# What was `b` again?
b
## [1] -12
# How was `b` used?
diagnose(my_target)$message
## NULL
diagnose(my_target)$call
## NULL
f
## function(x){
## if (x < 0){
## stop("`x` cannot be negative.")
## }
## x
## }
# Aha! The error was in f(). Let's fix it and try again.
f <- function(x){
x <- abs(x)
if (x < 0){
stop("`x` cannot be negative.")
}
x
}
# Now it works!
# Since you called make() previously, `config` is read from the cache
# if you do not supply it.
drake_build(my_target)
## target my_target
readd(my_target)
## [1] 12
Running commands in your R console is not always exactly like running them with make(). That's because make() uses tidy evaluation as implemented in the rlang package.
# This workflow plan uses rlang's quasiquotation operator `!!`.
my_plan <- drake_plan(list = c(
little_b = "\"b\"",
letter = "!!little_b"
))
my_plan
## # A tibble: 2 x 2
## target command
## <chr> <chr>
## 1 little_b "\"b\""
## 2 letter !!little_b
make(my_plan)
## Unloading targets from environment:
## little_b
## letter
## target little_b
## target letter
readd(letter)
## [1] "b"
After your project is at least somewhat built, you can inspect and read your results from the cache.
make(my_plan, verbose = FALSE)
# drake_session(verbose = FALSE) # Prints the sessionInfo() of the last make(). # nolint
cached(verbose = FALSE)
## [1] "Sys.sleep" "a" "b" "f" "letter" "little_b"
## [7] "my_target" "stop" "x"
built(verbose = FALSE)
## [1] "a" "b" "letter" "little_b" "my_target" "x"
imported(verbose = FALSE)
## [1] "Sys.sleep" "f" "stop"
loadd(little_b, verbose = FALSE)
little_b
## [1] "b"
readd(letter, verbose = FALSE)
## [1] "b"
drake::progress() # Build progress of targets. The verbose argument is not accepted here.
in_progress(verbose = FALSE) # Unfinished targets
## character(0)
There are functions to help you locate the project's cache.
# find_project() # nolint
# find_cache() # nolint
For more information on the cache, see the storage vignette.
The load_basic_example() function loads the basic example from drake_example("basic") right into your workspace. The workflow plan data frame, workspace, and import files are set up for you. Only make(my_plan) is left to you.
Drake has many more built-in examples. To see your choices, use
drake_examples()
## [1] "Docker-psock" "Makefile-cluster" "basic"
## [4] "gsp" "packages" "sge"
## [7] "slurm" "torque"
To write the files for an example, use drake_example().
drake_example("basic")
drake_example("slurm")