Time logging

William Michael Landau

2017-09-29

Build times

Thanks to Jasper Clarkberg, drake records how long it takes to build each target.

library(drake)
load_basic_example()
make(my_plan, jobs = 2, verbose = FALSE) # See also max_useful_jobs(my_plan).
build_times(digits = 8) # From the cache.
##                      item   type elapsed   user system
## 1            'report.Rmd' import  0.004s     0s     0s
## 2             'report.md' target   0.02s 0.016s 0.004s
## 3                       c import  0.003s     0s     0s
## 4                    coef import  0.003s 0.004s     0s
## 5  coef_regression1_large target  0.002s     0s     0s
## 6  coef_regression1_small target  0.002s     0s     0s
## 7  coef_regression2_large target  0.002s     0s     0s
## 8  coef_regression2_small target  0.002s     0s     0s
## 9              data.frame import  0.007s 0.008s     0s
## 10                   knit import  0.005s 0.008s     0s
## 11                  large target  0.003s     0s     0s
## 12                     lm import  0.003s 0.004s     0s
## 13                my_knit import  0.004s 0.004s     0s
## 14                   reg1 import  0.004s     0s 0.004s
## 15                   reg2 import  0.003s 0.004s     0s
## 16      regression1_large target  0.005s 0.004s     0s
## 17      regression1_small target  0.005s     0s 0.004s
## 18      regression2_large target  0.004s     0s 0.004s
## 19      regression2_small target  0.004s 0.004s     0s
## 20    report_dependencies target  0.002s     0s     0s
## 21                  rpois import  0.003s     0s     0s
## 22               simulate import  0.002s     0s     0s
## 23                  small target  0.003s 0.004s     0s
## 24           stats::rnorm import  0.003s     0s 0.004s
## 25 summ_regression1_large target  0.004s 0.004s     0s
## 26 summ_regression1_small target  0.004s     0s 0.004s
## 27 summ_regression2_large target  0.002s 0.004s     0s
## 28 summ_regression2_small target  0.003s 0.004s     0s
## 29                summary import  0.005s 0.004s     0s
## 30       suppressWarnings import  0.006s 0.004s     0s
build_times(digits = 8, targets_only = TRUE)
##                      item   type elapsed   user system
## 2             'report.md' target   0.02s 0.016s 0.004s
## 5  coef_regression1_large target  0.002s     0s     0s
## 6  coef_regression1_small target  0.002s     0s     0s
## 7  coef_regression2_large target  0.002s     0s     0s
## 8  coef_regression2_small target  0.002s     0s     0s
## 11                  large target  0.003s     0s     0s
## 16      regression1_large target  0.005s 0.004s     0s
## 17      regression1_small target  0.005s     0s 0.004s
## 18      regression2_large target  0.004s     0s 0.004s
## 19      regression2_small target  0.004s 0.004s     0s
## 20    report_dependencies target  0.002s     0s     0s
## 23                  small target  0.003s 0.004s     0s
## 25 summ_regression1_large target  0.004s 0.004s     0s
## 26 summ_regression1_small target  0.004s     0s 0.004s
## 27 summ_regression2_large target  0.002s 0.004s     0s
## 28 summ_regression2_small target  0.003s 0.004s     0s

For drake version 4.1.0 and earlier, build_times() just measures the elapsed runtime of each command in my_plan$command. For later versions, the build times also account for all the internal operations in drake:::build(), such as storage and hashing.
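As a rough illustration of what the elapsed, user, and system columns mean, here is a minimal base R sketch that times a single expression with system.time(). It assumes (this is not confirmed by the text above) that drake's three columns correspond to the usual proc.time() components. The timed expression is an arbitrary stand-in for a command in my_plan$command.

```r
# Time an arbitrary expression with base R. The expression is a
# made-up stand-in for one command from my_plan$command.
timing <- system.time({
  x <- rnorm(1e5)
  fit <- lm(x ~ seq_along(x))
})

# elapsed: wall-clock time; user.self and sys.self: CPU time spent in
# user code and in system calls on behalf of this R process.
timing[c("elapsed", "user.self", "sys.self")]
```

The values vary from run to run and from machine to machine, which is why very short builds like the ones above often show 0s of user or system time.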

Predicting runtime

Drake uses these times to predict the runtime of the next make(). At the moment, everything is up to date, so a hypothetical next make() should be fast. Here, we factor in only the build times of the targets and exclude the imports by setting targets_only = TRUE.

predict_runtime(
  my_plan,
  digits = 8,
  verbose = FALSE,
  targets_only = TRUE
)
## [1] "0s"

You can also predict the elapsed time of a full run from scratch (as if after clean()).

predict_runtime(
  my_plan,
  from_scratch = TRUE,
  digits = 8,
  verbose = FALSE,
  targets_only = TRUE
)
## [1] "0.067s"

Let’s change a dependency to make some targets out of date. Now, even though from_scratch is FALSE, the next make() should take some time.

reg2 <- function(d){
  d$x3 <- d$x ^ 3
  lm(y ~ x3, data = d)
}
predict_runtime(
  my_plan,
  digits = 8,
  verbose = FALSE,
  targets_only = TRUE
)
## [1] "0.039s"

We can also factor in parallelism using the future_jobs argument.

predict_runtime(
  my_plan,
  future_jobs = 1,
  from_scratch = TRUE,
  digits = 8,
  verbose = FALSE,
  targets_only = TRUE
)
## [1] "0.067s"
predict_runtime(
  my_plan,
  future_jobs = 2,
  from_scratch = TRUE,
  digits = 8,
  verbose = FALSE,
  targets_only = TRUE
)
## [1] "0.048s"
predict_runtime(
  my_plan,
  future_jobs = 4,
  from_scratch = TRUE,
  digits = 8,
  verbose = FALSE,
  targets_only = TRUE
)
## [1] "0.038s"

Rate-limiting targets

To predict the next runtime with multiple parallel jobs, drake makes some assumptions.

  1. The outdated targets are spread out evenly over the available jobs. (All targets are used if from_scratch is TRUE.)
  2. One job gets all the slowest targets (pessimistic scenario).

Then, drake simply takes the targets from the slowest job in each parallelizable stage and sums the corresponding elapsed build times. A parallelizable stage is a column in the workflow graph.

# Hover, click, drag, zoom, and pan.
plot_graph(my_plan, width = "100%", height = "500px")

You can explore the rate-limiting targets.

rate_limiting_times(
  my_plan,
  from_scratch = TRUE,
  digits = 8,
  verbose = FALSE,
  targets_only = TRUE
)
##                      item   type elapsed   user system stage
## 1                   small target  0.003s 0.004s     0s     1
## 2                   large target  0.003s     0s     0s     1
## 3       regression1_small target  0.005s     0s 0.004s     2
## 4       regression1_large target  0.005s 0.004s     0s     2
## 5       regression2_small target  0.004s 0.004s     0s     2
## 6       regression2_large target  0.004s     0s 0.004s     2
## 7  summ_regression1_small target  0.004s     0s 0.004s     3
## 8  summ_regression1_large target  0.004s 0.004s     0s     3
## 9  summ_regression2_small target  0.003s 0.004s     0s     3
## 10 summ_regression2_large target  0.002s 0.004s     0s     3
## 11 coef_regression1_small target  0.002s     0s     0s     3
## 12 coef_regression1_large target  0.002s     0s     0s     3
## 13 coef_regression2_small target  0.002s     0s     0s     3
## 14 coef_regression2_large target  0.002s     0s     0s     3
## 15    report_dependencies target  0.002s     0s     0s     4
## 16            'report.md' target   0.02s 0.016s 0.004s     5
rate_limiting_times(
  my_plan,
  future_jobs = 2,
  from_scratch = TRUE,
  digits = 8,
  verbose = FALSE,
  targets_only = TRUE
)
##                     item   type elapsed   user system stage
## 1                  small target  0.003s 0.004s     0s     1
## 2      regression1_small target  0.005s     0s 0.004s     2
## 3      regression1_large target  0.005s 0.004s     0s     2
## 4 summ_regression1_small target  0.004s     0s 0.004s     3
## 5 summ_regression1_large target  0.004s 0.004s     0s     3
## 6 summ_regression2_small target  0.003s 0.004s     0s     3
## 7 summ_regression2_large target  0.002s 0.004s     0s     3
## 8    report_dependencies target  0.002s     0s     0s     4
## 9            'report.md' target   0.02s 0.016s 0.004s     5
rate_limiting_times(
  my_plan,
  future_jobs = 4,
  from_scratch = TRUE,
  digits = 8,
  verbose = FALSE,
  targets_only = TRUE
)
##                     item   type elapsed   user system stage
## 1                  small target  0.003s 0.004s     0s     1
## 2      regression1_small target  0.005s     0s 0.004s     2
## 3 summ_regression1_small target  0.004s     0s 0.004s     3
## 4 summ_regression1_large target  0.004s 0.004s     0s     3
## 5    report_dependencies target  0.002s     0s     0s     4
## 6            'report.md' target   0.02s 0.016s 0.004s     5
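The two assumptions can be checked by hand against the output above. The following base R sketch is one reading of them, not drake's internal code: within each stage, it keeps the ceiling(n / jobs) slowest targets and sums their elapsed times. The helper name sketch_runtime() is hypothetical, and the elapsed times and stage numbers are copied from the first rate_limiting_times() table.

```r
# Elapsed build times (in seconds) and stages, copied from the
# from-scratch rate_limiting_times() output with one job.
times <- data.frame(
  elapsed = c(
    0.003, 0.003,                # stage 1: small, large
    0.005, 0.005, 0.004, 0.004,  # stage 2: regressions
    0.004, 0.004, 0.003, 0.002,  # stage 3: summaries
    0.002, 0.002, 0.002, 0.002,  # stage 3: coefficients
    0.002,                       # stage 4: report_dependencies
    0.02                         # stage 5: 'report.md'
  ),
  stage = c(1, 1, rep(2, 4), rep(3, 8), 4, 5)
)

# Hypothetical helper, not part of drake.
sketch_runtime <- function(times, jobs) {
  per_stage <- tapply(times$elapsed, times$stage, function(x) {
    # Targets are spread evenly, so the busiest job holds this many.
    n_slowest <- ceiling(length(x) / jobs)
    # Pessimistic scenario: that job gets the slowest targets.
    sum(sort(x, decreasing = TRUE)[seq_len(n_slowest)])
  })
  sum(per_stage)
}

predictions <- sapply(c(1, 2, 4), function(jobs) sketch_runtime(times, jobs))
predictions
```

Under this reading, the three values agree with the 0.067s, 0.048s, and 0.038s predictions reported earlier for 1, 2, and 4 jobs. The targets kept in each stage also match the rows listed by rate_limiting_times().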

A word of caution

Drake accounts only for the targets with logged build times. If some targets have not been timed, drake throws a warning and lists the names of the untimed targets.