Thanks to Jasper Clarkberg, drake records how long it takes to build each target. For large projects that take hours or days to run, this feature becomes important for planning and execution.

library(drake)
load_basic_example() # Get the code with drake_example("basic").
## Unloading targets from environment:
##   large
##   small
##   coef_regression2_small
make(my_plan, jobs = 2, verbose = FALSE) # See also max_useful_jobs(my_plan).

build_times(digits = 8) # From the cache.
## # A tibble: 28 x 5
##    item                   type   elapsed        user           system     
##  * <chr>                  <chr>  <S4: Duration> <S4: Duration> <S4: Durat>
##  1 "\"report.Rmd\""       import 0.002s         0s             0.002s     
##  2 "\"report.md\""        target 0.042s         0.038s         0.004s     
##  3 coef_regression1_large target 0.007s         0.002s         0.005s     
##  4 coef_regression1_small target 0.006s         0.007s         0s         
##  5 coef_regression2_large target 0.005s         0.004s         0s         
##  6 coef_regression2_small target 0.004s         0.003s         0.001s     
##  7 data.frame             import 0.041s         0.037s         0.004s     
##  8 knit                   import 0.034s         0.027s         0.007s     
##  9 large                  target 0.008s         0.008s         0s         
## 10 lm                     import 0.021s         0.009s         0.011s     
## # ... with 18 more rows

# `dplyr`-style `tidyselect` commands
build_times(starts_with("coef"), digits = 8)
## # A tibble: 4 x 5
##   item                   type   elapsed        user           system      
## * <chr>                  <chr>  <S4: Duration> <S4: Duration> <S4: Durati>
## 1 coef_regression1_large target 0.007s         0.002s         0.005s      
## 2 coef_regression1_small target 0.006s         0.007s         0s          
## 3 coef_regression2_large target 0.005s         0.004s         0s          
## 4 coef_regression2_small target 0.004s         0.003s         0.001s

build_times(digits = 8, targets_only = TRUE)
## # A tibble: 15 x 5
##    item                   type   elapsed        user           system     
##  * <chr>                  <chr>  <S4: Duration> <S4: Duration> <S4: Durat>
##  1 "\"report.md\""        target 0.042s         0.038s         0.004s     
##  2 coef_regression1_large target 0.007s         0.002s         0.005s     
##  3 coef_regression1_small target 0.006s         0.007s         0s         
##  4 coef_regression2_large target 0.005s         0.004s         0s         
##  5 coef_regression2_small target 0.004s         0.003s         0.001s     
##  6 large                  target 0.008s         0.008s         0s         
##  7 regression1_large      target 0.009s         0.01s          0s         
##  8 regression1_small      target 0.008s         0.006s         0.003s     
##  9 regression2_large      target 0.006s         0.001s         0.004s     
## 10 regression2_small      target 0.005s         0.001s         0.004s     
## 11 small                  target 0.007s         0.006s         0.001s     
## 12 summ_regression1_large target 0.005s         0s             0.004s     
## 13 summ_regression1_small target 0.005s         0.003s         0.002s     
## 14 summ_regression2_large target 0.005s         0.004s         0s         
## 15 summ_regression2_small target 0.005s         0s             0.004s

For drake version 4.1.0 and earlier, build_times() just measures the elapsed runtime of each command in my_plan$command. For later versions, the build times also account for all the internal operations in drake:::build(), such as storage and hashing.
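As a rough illustration of what the elapsed, user, and system columns mean, the base R sketch below times a single command with system.time(). The function slow_command() is a hypothetical stand-in for one entry of my_plan$command. The sketch times only the command itself, so it does not include the storage and hashing overhead that later versions of drake also count.

```r
# `slow_command` is a hypothetical stand-in for one command in a plan.
slow_command <- function() {
  Sys.sleep(0.05)          # Idle waiting adds to elapsed time, not CPU time.
  sum(sqrt(seq_len(1e5)))  # Computation adds to user (CPU) time.
}

# system.time() wraps the expression like a stopwatch.
timing <- system.time(result <- slow_command())

# Keep the three fields that correspond to the columns of build_times().
unclass(timing)[c("elapsed", "user.self", "sys.self")]
```

Elapsed time is wall-clock time, so it is the relevant quantity when planning how long a make() will take.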

Predicting runtime

Drake uses these times to predict the runtime of the next make(). At this moment, everything is up to date in the current example, so the next make() should be fast. Here, we set targets_only = TRUE so that the prediction counts only the targets and excludes the imports.

config <- drake_config(my_plan, verbose = FALSE)
predict_runtime(
  config,
  digits = 8,
  targets_only = TRUE
)
## [1] "0s"

But you can also predict the elapsed time of a full run from scratch (either after clean() or with make(..., trigger = "always")).

predict_runtime(
  config,
  from_scratch = TRUE,
  digits = 8,
  targets_only = TRUE
)
## [1] "0.127s"

Suppose we change a dependency to make some targets out of date. Now, even though from_scratch is FALSE, the next make() should take some time.

reg2 <- function(d){
  d$x3 <- d$x ^ 3
  lm(y ~ x3, data = d)
}

predict_runtime(
  config,
  digits = 8,
  targets_only = TRUE
)
## [1] "0.072s"
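The 0.072s figure can be checked by hand. The sketch below uses only base R and the elapsed times printed by build_times() earlier. It assumes that the outdated targets are exactly those downstream of reg2(): the six targets with "regression2" in their names, plus the report (named report.md here without its literal quotes). That set is inferred from the target names, not read from drake.

```r
# Elapsed build times in seconds, copied from the build_times() output.
elapsed <- c(
  large = 0.008, small = 0.007,
  regression1_large = 0.009, regression1_small = 0.008,
  regression2_large = 0.006, regression2_small = 0.005,
  coef_regression1_large = 0.007, coef_regression1_small = 0.006,
  coef_regression2_large = 0.005, coef_regression2_small = 0.004,
  summ_regression1_large = 0.005, summ_regression1_small = 0.005,
  summ_regression2_large = 0.005, summ_regression2_small = 0.005,
  report.md = 0.042
)

# Assumption: these are the targets invalidated by the new reg2().
outdated_targets <- c(
  grep("regression2", names(elapsed), value = TRUE),
  "report.md"
)

# With a single job, the prediction is the sum of the outdated build times.
predicted <- sum(elapsed[outdated_targets])
round(predicted, 3)
```

Under these assumptions the sum comes to 0.072 seconds, which agrees with the prediction above.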

We can also factor in parallelism using the future_jobs argument, which plays the role of the jobs argument in a hypothetical next make().

predict_runtime(
  config,
  future_jobs = 1,
  from_scratch = TRUE,
  digits = 8,
  targets_only = TRUE
)
## [1] "0.127s"

predict_runtime(
  config,
  future_jobs = 2,
  from_scratch = TRUE,
  digits = 8,
  targets_only = TRUE
)
## [1] "0.09s"

predict_runtime(
  config,
  future_jobs = 4,
  from_scratch = TRUE,
  digits = 8,
  targets_only = TRUE
)
## [1] "0.072s"

Rate-limiting targets

To predict the next runtime with multiple parallel jobs, drake makes some assumptions.

  1. The outdated targets are spread out evenly over the available jobs.
  2. One job gets all the slowest targets (pessimistic scenario).

Then, drake simply takes the targets from the slowest job in each parallelizable stage and sums the corresponding elapsed build times. A parallelizable stage is usually a column in the dependency graph, but if there are up-to-date targets in a column, drake skips ahead to try to fit as many targets as possible in a stage.

# Hover, click, drag, zoom, and pan.
vis_drake_graph(my_plan, width = "100%", height = "500px")
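The two assumptions and the summation step can be sketched in a few lines of base R. This is an approximation for illustration, not drake's internal code: pessimistic_runtime() is a hypothetical helper, the elapsed times and stage numbers are copied from this example's output, and every target is treated as outdated (the from_scratch = TRUE case). The report target is named report.md here without its literal quotes.

```r
# Elapsed times (seconds) and stage numbers copied from this example's output.
targets <- data.frame(
  item = c(
    "large", "small",
    "regression1_large", "regression1_small",
    "regression2_large", "regression2_small",
    "coef_regression1_large", "coef_regression1_small",
    "coef_regression2_large", "coef_regression2_small",
    "summ_regression1_large", "summ_regression1_small",
    "summ_regression2_large", "summ_regression2_small",
    "report.md"
  ),
  elapsed = c(
    0.008, 0.007,
    0.009, 0.008, 0.006, 0.005,
    0.007, 0.006, 0.005, 0.004, 0.005, 0.005, 0.005, 0.005,
    0.042
  ),
  stage = c(4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 7),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of drake. In each stage, the unluckiest job
# receives ceiling(n / jobs) targets, and they are the slowest ones.
pessimistic_runtime <- function(targets, jobs) {
  per_stage <- vapply(
    split(targets$elapsed, targets$stage),
    function(times) {
      n_on_slowest_job <- ceiling(length(times) / jobs)
      sum(sort(times, decreasing = TRUE)[seq_len(n_on_slowest_job)])
    },
    numeric(1)
  )
  sum(per_stage)
}

# Predicted seconds for 1, 2, and 4 jobs.
sapply(c(1, 2, 4), function(jobs) round(pessimistic_runtime(targets, jobs), 3))
```

With these inputs the sketch gives 0.127, 0.09, and 0.072 seconds for 1, 2, and 4 jobs, which agrees with the predict_runtime() results for this example.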

You can explore the rate-limiting targets

rate_limiting_times(
  config,
  from_scratch = TRUE,
  digits = 8,
  targets_only = TRUE
)
## # A tibble: 15 x 6
##    item                   type   elapsed    user  system stage
##  * <chr>                  <chr>    <dbl>   <dbl>   <dbl> <dbl>
##  1 large                  target 0.00800 0.00800 0.         4.
##  2 small                  target 0.00700 0.00600 0.00100    4.
##  3 regression1_large      target 0.00900 0.0100  0.         5.
##  4 regression1_small      target 0.00800 0.00600 0.00300    5.
##  5 regression2_large      target 0.00600 0.00100 0.00400    5.
##  6 regression2_small      target 0.00500 0.00100 0.00400    5.
##  7 coef_regression1_large target 0.00700 0.00200 0.00500    6.
##  8 coef_regression1_small target 0.00600 0.00700 0.         6.
##  9 coef_regression2_large target 0.00500 0.00400 0.         6.
## 10 summ_regression1_large target 0.00500 0.      0.00400    6.
## 11 summ_regression2_large target 0.00500 0.00400 0.         6.
## 12 summ_regression1_small target 0.00500 0.00300 0.00200    6.
## 13 summ_regression2_small target 0.00500 0.      0.00400    6.
## 14 coef_regression2_small target 0.00400 0.00300 0.00100    6.
## 15 "\"report.md\""        target 0.0420  0.0380  0.00400    7.

rate_limiting_times(
  config,
  future_jobs = 2,
  from_scratch = TRUE,
  digits = 8,
  targets_only = TRUE
)
## # A tibble: 8 x 6
##   item                   type   elapsed    user  system stage
## * <chr>                  <chr>    <dbl>   <dbl>   <dbl> <dbl>
## 1 large                  target 0.00800 0.00800 0.         4.
## 2 regression1_large      target 0.00900 0.0100  0.         5.
## 3 regression1_small      target 0.00800 0.00600 0.00300    5.
## 4 coef_regression1_large target 0.00700 0.00200 0.00500    6.
## 5 coef_regression1_small target 0.00600 0.00700 0.         6.
## 6 coef_regression2_large target 0.00500 0.00400 0.         6.
## 7 summ_regression1_large target 0.00500 0.      0.00400    6.
## 8 "\"report.md\""        target 0.0420  0.0380  0.00400    7.

rate_limiting_times(
  config,
  future_jobs = 4,
  from_scratch = TRUE,
  digits = 8,
  targets_only = TRUE
)
## # A tibble: 5 x 6
##   item                   type   elapsed    user  system stage
## * <chr>                  <chr>    <dbl>   <dbl>   <dbl> <dbl>
## 1 large                  target 0.00800 0.00800 0.         4.
## 2 regression1_large      target 0.00900 0.0100  0.         5.
## 3 coef_regression1_large target 0.00700 0.00200 0.00500    6.
## 4 coef_regression1_small target 0.00600 0.00700 0.         6.
## 5 "\"report.md\""        target 0.0420  0.0380  0.00400    7.

and the parallelizable stages in general.

parallel_stages(config, from_scratch = TRUE)
##                      item imported  file stage
## 1            "report.Rmd"     TRUE  TRUE     1
## 2              data.frame     TRUE FALSE     1
## 3                    knit     TRUE FALSE     1
## 4                      lm     TRUE FALSE     1
## 5                  mtcars     TRUE FALSE     1
## 6                    nrow     TRUE FALSE     1
## 7              sample.int     TRUE FALSE     1
## 8                 summary     TRUE FALSE     1
## 9        suppressWarnings     TRUE FALSE     1
## 10            random_rows     TRUE FALSE     2
## 11                   reg1     TRUE FALSE     2
## 12                   reg2     TRUE FALSE     2
## 13               simulate     TRUE FALSE     3
## 14                  large    FALSE FALSE     4
## 15                  small    FALSE FALSE     4
## 16      regression1_large    FALSE FALSE     5
## 17      regression1_small    FALSE FALSE     5
## 18      regression2_large    FALSE FALSE     5
## 19      regression2_small    FALSE FALSE     5
## 20 coef_regression1_large    FALSE FALSE     6
## 21 coef_regression1_small    FALSE FALSE     6
## 22 coef_regression2_large    FALSE FALSE     6
## 23 coef_regression2_small    FALSE FALSE     6
## 24 summ_regression1_large    FALSE FALSE     6
## 25 summ_regression1_small    FALSE FALSE     6
## 26 summ_regression2_large    FALSE FALSE     6
## 27 summ_regression2_small    FALSE FALSE     6
## 28            "report.md"    FALSE  TRUE     7
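To make the idea of a stage concrete, here is a minimal base R sketch that layers a small dependency graph into stages. It is not drake's implementation: assign_stages() is a hypothetical helper, the edges in deps are a simplified guess at part of the basic example's graph, and the sketch ignores the way drake skips over up-to-date targets. Because the graph is smaller, the stage numbers differ from those in the parallel_stages() output.

```r
# Hypothetical dependency list: each name maps to the items it depends on.
deps <- list(
  simulate = character(0),
  reg1 = character(0),
  large = "simulate",
  small = "simulate",
  regression1_large = c("reg1", "large"),
  regression1_small = c("reg1", "small"),
  coef_regression1_small = "regression1_small"
)

# Hypothetical helper, not part of drake. An item's stage is one more than
# the latest stage among its dependencies, or 1 if it has no dependencies.
assign_stages <- function(deps) {
  stage <- setNames(rep(NA_integer_, length(deps)), names(deps))
  while (anyNA(stage)) {
    progressed <- FALSE
    for (item in names(deps)[is.na(stage)]) {
      parents <- stage[deps[[item]]]
      if (!anyNA(parents)) {
        stage[item] <- if (length(parents) == 0) 1L else max(parents) + 1L
        progressed <- TRUE
      }
    }
    if (!progressed) stop("dependency cycle or missing item")
  }
  stage
}

stages <- assign_stages(deps)

# Items that share a stage have no dependencies on each other,
# so they could be built in parallel.
split(names(stages), stages)
```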

A word of caution

Drake only accounts for the targets with logged build times. If some targets have not been timed, drake throws a warning and lists the names of the untimed targets.