Drake has powerful visuals to help you plan your project. You can generate an interactive workflow network with either drake_graph() or vis_drake_graph(). Then click, drag, hover, zoom, and pan. Use either the mouse or the green buttons near the bottom.
Initially, your entire project is out of date.
library(drake)
load_basic_example() # Get the code with drake_example("basic").
config <- drake_config(my_plan)
vis_drake_graph(config) # Same as drake_graph()
After make(), the whole project is all caught up.
config <- make(my_plan, jobs = 4, verbose = FALSE)
vis_drake_graph(config)
But when you change a dependency, some targets are out of date until the next make(my_plan).
reg2 <- function(d){
  d$x3 <- d$x ^ 3
  lm(y ~ x3, data = d)
}
vis_drake_graph(config)
Graphs can grow enormous for serious projects, so there are multiple ways to focus on a manageable subgraph. The most brute-force way is to just pick a manual subset of nodes. However, with the subset argument, vis_drake_graph() may drop intermediate nodes and edges.
vis_drake_graph(
  config,
  subset = c("regression2_small", file_store("report.md"))
)
The rest of the subgraph functionality preserves connectedness. Use targets_only to ignore the imports.
vis_drake_graph(config, targets_only = TRUE)
Similarly, you can just show downstream nodes.
vis_drake_graph(config, from = c("regression2_small", "regression2_large"))
Or upstream ones.
vis_drake_graph(config, from = "small", mode = "in")
In fact, let us just take a small neighborhood around a target in both directions.
vis_drake_graph(config, from = "small", mode = "all", order = 1)
The report.md node is drawn in somewhat, but it is still the farthest to the right in order to communicate drake's parallel computing strategy.
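The from, mode, and order arguments used in the calls above read like a neighborhood search on a directed graph. The following is a minimal base-R sketch of that idea, not drake's actual implementation: the edge list, the node names summ1 and summ2, and the helper toy_neighborhood() are all hypothetical and exist only for illustration.

```r
# Hypothetical edge list for a toy workflow (from -> to).
# This is NOT drake's internal graph; it only illustrates the idea.
edges <- data.frame(
  from = c("small", "small", "regression1_small", "regression2_small"),
  to = c("regression1_small", "regression2_small", "summ1", "summ2"),
  stringsAsFactors = FALSE
)

# Collect the nodes within `order` steps of `from`.
# mode = "out" follows edges downstream, "in" goes upstream, "all" does both.
toy_neighborhood <- function(edges, from, mode = "out", order = Inf) {
  visited <- from
  frontier <- from
  steps <- 0
  while (length(frontier) > 0 && steps < order) {
    nxt <- character(0)
    if (mode %in% c("out", "all")) {
      nxt <- c(nxt, edges$to[edges$from %in% frontier])
    }
    if (mode %in% c("in", "all")) {
      nxt <- c(nxt, edges$from[edges$to %in% frontier])
    }
    # Keep only nodes not seen before, then advance one step.
    frontier <- setdiff(unique(nxt), visited)
    visited <- c(visited, frontier)
    steps <- steps + 1
  }
  visited
}

downstream <- toy_neighborhood(edges, from = "small", mode = "out")
upstream <- toy_neighborhood(edges, from = "summ1", mode = "in")
nearby <- toy_neighborhood(edges, from = "regression1_small", mode = "all", order = 1)
```

In this toy graph, downstream contains every node reachable from small, upstream contains the ancestors of summ1, and nearby contains only the immediate neighbors of regression1_small in both directions.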
Drake shows its parallel computing strategy plainly in the graph.
The nodes in each column above are conditionally independent given the dependencies to the left. So in general, the targets and imports are processed column by column from left to right, and everything within a column is executed in parallel. When some targets are already up to date, drake searches ahead in the graph to maximize the number of outdated targets in each parallelizable stage.
To show the parallelizable stages of the next make() programmatically, use the parallel_stages() function. All the targets/imports in a stage are processed in parallel before moving on to the next stage.
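To make the column-by-column idea concrete, here is a small base-R sketch that groups nodes into stages from a dependency list. It is an illustration under stated assumptions, not drake's algorithm: the dependency list and the helper toy_stages() are hypothetical, and the sketch ignores drake's look-ahead past targets that are already up to date.

```r
# Hypothetical dependency list (not drake's internal representation):
# each name maps to the nodes it depends on.
deps <- list(
  small = character(0),
  large = character(0),
  regression1_small = "small",
  regression2_small = "small",
  regression1_large = "large",
  summ_regression1_small = "regression1_small",
  report = c("small", "summ_regression1_small")
)

# Group nodes into stages: a node joins the earliest stage in which
# all of its dependencies have already been processed.
toy_stages <- function(deps) {
  done <- character(0)
  stages <- list()
  remaining <- names(deps)
  while (length(remaining) > 0) {
    is_ready <- vapply(
      remaining,
      function(node) all(deps[[node]] %in% done),
      logical(1)
    )
    ready <- remaining[is_ready]
    if (length(ready) == 0) {
      stop("Circular dependency detected.")
    }
    stages[[length(stages) + 1]] <- ready
    done <- c(done, ready)
    remaining <- setdiff(remaining, ready)
  }
  stages
}

stages <- toy_stages(deps)
for (i in seq_along(stages)) {
  cat("Stage", i, ":", paste(stages[[i]], collapse = ", "), "\n")
}
```

Everything within one stage could run in parallel, because no node in a stage depends on another node in the same stage. For a real project, parallel_stages() is the function to call; see its help file for the exact arguments.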
To remove superfluous information from the legend, set the full_legend argument to FALSE.
vis_drake_graph(config, full_legend = FALSE)
To remove the legend altogether, set the ncol_legend argument to 0.
vis_drake_graph(config, ncol_legend = 0)
We have only scratched the surface of vis_drake_graph(). The help files (?vis_drake_graph) document much more functionality. In particular, the dataframes_graph() and render_drake_graph() functions let you customize your own visNetwork graph.