Thanks to Jasper Clarkberg, drake records how long it takes to build each target. For large projects that take hours or days to run, this feature becomes important for planning and execution.
library(drake)
load_basic_example() # Get the code with drake_example("basic").
## Unloading targets from environment:
## large
## small
## coef_regression2_small
make(my_plan, jobs = 2, verbose = FALSE) # See also max_useful_jobs(my_plan).
build_times(digits = 8) # From the cache.
## # A tibble: 28 x 5
## item type elapsed user system
## * <chr> <chr> <S4: Duration> <S4: Duration> <S4: Durat>
## 1 "\"report.Rmd\"" import 0.002s 0s 0.002s
## 2 "\"report.md\"" target 0.042s 0.038s 0.004s
## 3 coef_regression1_large target 0.007s 0.002s 0.005s
## 4 coef_regression1_small target 0.006s 0.007s 0s
## 5 coef_regression2_large target 0.005s 0.004s 0s
## 6 coef_regression2_small target 0.004s 0.003s 0.001s
## 7 data.frame import 0.041s 0.037s 0.004s
## 8 knit import 0.034s 0.027s 0.007s
## 9 large target 0.008s 0.008s 0s
## 10 lm import 0.021s 0.009s 0.011s
## # ... with 18 more rows
# `dplyr`-style `tidyselect` commands
build_times(starts_with("coef"), digits = 8)
## # A tibble: 4 x 5
## item type elapsed user system
## * <chr> <chr> <S4: Duration> <S4: Duration> <S4: Durati>
## 1 coef_regression1_large target 0.007s 0.002s 0.005s
## 2 coef_regression1_small target 0.006s 0.007s 0s
## 3 coef_regression2_large target 0.005s 0.004s 0s
## 4 coef_regression2_small target 0.004s 0.003s 0.001s
build_times(digits = 8, targets_only = TRUE)
## # A tibble: 15 x 5
## item type elapsed user system
## * <chr> <chr> <S4: Duration> <S4: Duration> <S4: Durat>
## 1 "\"report.md\"" target 0.042s 0.038s 0.004s
## 2 coef_regression1_large target 0.007s 0.002s 0.005s
## 3 coef_regression1_small target 0.006s 0.007s 0s
## 4 coef_regression2_large target 0.005s 0.004s 0s
## 5 coef_regression2_small target 0.004s 0.003s 0.001s
## 6 large target 0.008s 0.008s 0s
## 7 regression1_large target 0.009s 0.01s 0s
## 8 regression1_small target 0.008s 0.006s 0.003s
## 9 regression2_large target 0.006s 0.001s 0.004s
## 10 regression2_small target 0.005s 0.001s 0.004s
## 11 small target 0.007s 0.006s 0.001s
## 12 summ_regression1_large target 0.005s 0s 0.004s
## 13 summ_regression1_small target 0.005s 0.003s 0.002s
## 14 summ_regression2_large target 0.005s 0.004s 0s
## 15 summ_regression2_small target 0.005s 0s 0.004s
For drake version 4.1.0 and earlier, build_times() just measures the elapsed runtime of each command in my_plan$command. For later versions, the build times also account for all the internal operations in drake:::build(), such as storage and hashing.
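As a quick check (a sketch, not part of the original example code), you can total those cached times yourself: the elapsed column holds lubridate Duration objects, which coerce to seconds with as.numeric().
bt <- build_times(targets_only = TRUE) # targets only, no imports
sum(as.numeric(bt$elapsed))            # total elapsed seconds across targets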
Drake uses these times to predict the runtime of the next make(). At the moment, everything is up to date in the current example, so the next make() should be fast. Here, we factor in only the times of the targets, excluding the imports (targets_only = TRUE).
config <- drake_config(my_plan, verbose = FALSE)
predict_runtime(
config,
digits = 8,
targets_only = TRUE
)
## [1] "0s"
But you can also predict the elapsed time of a full run from scratch (either after clean() or with make(..., trigger = "always")).
predict_runtime(
config,
from_scratch = TRUE,
digits = 8,
targets_only = TRUE
)
## [1] "0.127s"
Suppose we change a dependency to make some targets out of date. Now, even though from_scratch is FALSE, the next make() should take some time.
reg2 <- function(d) {
  d$x3 <- d$x ^ 3
  lm(y ~ x3, data = d)
}
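As an optional sanity check (assuming config still points at the current session environment), outdated() lists the targets invalidated by the new reg2().
# Which targets does drake now consider out of date?
outdated(config)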
predict_runtime(
config,
digits = 8,
targets_only = TRUE
)
## [1] "0.072s"
We can also factor in parallelism using the future_jobs argument, which is just jobs for a hypothetical next make().
predict_runtime(
config,
future_jobs = 1,
from_scratch = TRUE,
digits = 8,
targets_only = TRUE
)
## [1] "0.127s"
predict_runtime(
config,
future_jobs = 2,
from_scratch = TRUE,
digits = 8,
targets_only = TRUE
)
## [1] "0.09s"
predict_runtime(
config,
future_jobs = 4,
from_scratch = TRUE,
digits = 8,
targets_only = TRUE
)
## [1] "0.072s"
To predict the next runtime with multiple parallel jobs, drake makes some simplifying assumptions. It takes the targets from the slowest job in each parallelizable stage and sums the corresponding elapsed build times. A parallelizable stage is usually a column in the dependency graph, but if there are up-to-date targets in a column, drake skips ahead to try to fit as many targets as possible in a stage.
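To make the stage-and-slowest-job arithmetic concrete, here is a rough standalone sketch (not drake's internal code): it greedily spreads each stage's targets over a hypothetical number of jobs and sums the slowest job per stage.
# Rough illustration only; drake's real prediction uses its own
# scheduler, cache, and dependency graph.
stage_sum <- function(elapsed, stage, jobs = 1) {
  per_stage <- tapply(elapsed, stage, function(times) {
    load <- numeric(jobs)                 # running total per job
    for (t in sort(times, decreasing = TRUE)) {
      j <- which.min(load)                # assign to the least-loaded job
      load[j] <- load[j] + t
    }
    max(load)                             # the slowest job limits the stage
  })
  sum(per_stage)
}
# Made-up elapsed times (seconds) and stage labels:
stage_sum(
  elapsed = c(0.008, 0.007, 0.009, 0.008, 0.006, 0.005),
  stage   = c(4, 4, 5, 5, 5, 5),
  jobs    = 2
)
In practice, rely on predict_runtime() and rate_limiting_times(); the sketch only illustrates the arithmetic behind them.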
# Hover, click, drag, zoom, and pan.
vis_drake_graph(my_plan, width = "100%", height = "500px")
You can explore the rate-limiting targets
rate_limiting_times(
config,
from_scratch = TRUE,
digits = 8,
targets_only = TRUE
)
## # A tibble: 15 x 6
## item type elapsed user system stage
## * <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 large target 0.00800 0.00800 0. 4.
## 2 small target 0.00700 0.00600 0.00100 4.
## 3 regression1_large target 0.00900 0.0100 0. 5.
## 4 regression1_small target 0.00800 0.00600 0.00300 5.
## 5 regression2_large target 0.00600 0.00100 0.00400 5.
## 6 regression2_small target 0.00500 0.00100 0.00400 5.
## 7 coef_regression1_large target 0.00700 0.00200 0.00500 6.
## 8 coef_regression1_small target 0.00600 0.00700 0. 6.
## 9 coef_regression2_large target 0.00500 0.00400 0. 6.
## 10 summ_regression1_large target 0.00500 0. 0.00400 6.
## 11 summ_regression2_large target 0.00500 0.00400 0. 6.
## 12 summ_regression1_small target 0.00500 0.00300 0.00200 6.
## 13 summ_regression2_small target 0.00500 0. 0.00400 6.
## 14 coef_regression2_small target 0.00400 0.00300 0.00100 6.
## 15 "\"report.md\"" target 0.0420 0.0380 0.00400 7.
rate_limiting_times(
config,
future_jobs = 2,
from_scratch = TRUE,
digits = 8,
targets_only = TRUE
)
## # A tibble: 8 x 6
## item type elapsed user system stage
## * <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 large target 0.00800 0.00800 0. 4.
## 2 regression1_large target 0.00900 0.0100 0. 5.
## 3 regression1_small target 0.00800 0.00600 0.00300 5.
## 4 coef_regression1_large target 0.00700 0.00200 0.00500 6.
## 5 coef_regression1_small target 0.00600 0.00700 0. 6.
## 6 coef_regression2_large target 0.00500 0.00400 0. 6.
## 7 summ_regression1_large target 0.00500 0. 0.00400 6.
## 8 "\"report.md\"" target 0.0420 0.0380 0.00400 7.
rate_limiting_times(
config,
future_jobs = 4,
from_scratch = TRUE,
digits = 8,
targets_only = TRUE
)
## # A tibble: 5 x 6
## item type elapsed user system stage
## * <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 large target 0.00800 0.00800 0. 4.
## 2 regression1_large target 0.00900 0.0100 0. 5.
## 3 coef_regression1_large target 0.00700 0.00200 0.00500 6.
## 4 coef_regression1_small target 0.00600 0.00700 0. 6.
## 5 "\"report.md\"" target 0.0420 0.0380 0.00400 7.
and the parallelizable stages in general.
parallel_stages(config, from_scratch = TRUE)
## item imported file stage
## 1 "report.Rmd" TRUE TRUE 1
## 2 data.frame TRUE FALSE 1
## 3 knit TRUE FALSE 1
## 4 lm TRUE FALSE 1
## 5 mtcars TRUE FALSE 1
## 6 nrow TRUE FALSE 1
## 7 sample.int TRUE FALSE 1
## 8 summary TRUE FALSE 1
## 9 suppressWarnings TRUE FALSE 1
## 10 random_rows TRUE FALSE 2
## 11 reg1 TRUE FALSE 2
## 12 reg2 TRUE FALSE 2
## 13 simulate TRUE FALSE 3
## 14 large FALSE FALSE 4
## 15 small FALSE FALSE 4
## 16 regression1_large FALSE FALSE 5
## 17 regression1_small FALSE FALSE 5
## 18 regression2_large FALSE FALSE 5
## 19 regression2_small FALSE FALSE 5
## 20 coef_regression1_large FALSE FALSE 6
## 21 coef_regression1_small FALSE FALSE 6
## 22 coef_regression2_large FALSE FALSE 6
## 23 coef_regression2_small FALSE FALSE 6
## 24 summ_regression1_large FALSE FALSE 6
## 25 summ_regression1_small FALSE FALSE 6
## 26 summ_regression2_large FALSE FALSE 6
## 27 summ_regression2_small FALSE FALSE 6
## 28 "report.md" FALSE TRUE 7
Drake only accounts for the targets with logged build times. If some targets have not been timed, drake throws a warning and lists the names of the untimed targets.
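One way to see which targets still lack timing data (a sketch that relies only on the plan's target column and the item column shown above):
timed   <- build_times(targets_only = TRUE)$item
untimed <- setdiff(my_plan$target, timed)
untimed # targets that have never been built and timed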