The basic workflow in
vignette("delarr-getting-started", package = "delarr") is
enough when you only need a lazy pipeline and a final
collect(). This vignette is for the next step:
understanding chunk plans, running several summaries in one pass,
streaming to backends, and checking whether an optional parallel path
behaves the way you expect.
All examples use one small dense matrix and validate the key claims in code.
explain() shows the effective output shape, the chunk
axis, the chosen chunk size, and the recorded operations after
optimization.
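To make the idea of a chunk plan concrete, here is a minimal base R sketch of the index ranges a chunked pass would visit along one axis. The helper `chunk_ranges()` and the 10 x 8 matrix are hypothetical and used only for illustration; this is not how `explain()` is implemented and it does not reproduce its output.

```r
# Hypothetical helper (not part of delarr): list the index ranges that a
# chunked pass over `n` items would visit for a given chunk size.
chunk_ranges <- function(n, chunk_size) {
  starts <- seq.int(1L, n, by = chunk_size)
  ends <- pmin(starts + chunk_size - 1L, n)
  data.frame(chunk = seq_along(starts), start = starts, end = ends)
}

# Illustrative matrix; the dimensions are an assumption for this sketch.
set.seed(1)
mat <- matrix(rnorm(80), nrow = 10, ncol = 8,
              dimnames = list(paste0("sample_", 1:10), NULL))

# Plan for chunking along rows, three rows at a time.
plan <- chunk_ranges(nrow(mat), chunk_size = 3L)
plan
```

The last chunk is shorter than the others whenever the axis length is not a multiple of the chunk size, which is worth keeping in mind when reading a plan.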
How does delarr choose a chunk size?
If you do not want to hard-code chunk_size, you can pass
a memory budget with target_bytes.
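As a rough illustration of how a byte budget can translate into a chunk size, the sketch below fits as many rows as the budget allows. The helper `rows_for_budget()` and its 8-bytes-per-cell assumption are hypothetical; the rule delarr actually applies to `target_bytes` may differ.

```r
# Hypothetical heuristic, not delarr's actual rule: fit as many rows as the
# byte budget allows, assuming dense double storage (8 bytes per cell) and
# chunking along rows. The result is clamped to the range [1, n_row].
rows_for_budget <- function(n_row, n_col, target_bytes, bytes_per_cell = 8) {
  per_row <- n_col * bytes_per_cell
  max(1L, min(n_row, as.integer(target_bytes %/% per_row)))
}

# A 10 x 8 matrix of doubles needs 64 bytes per row.
budget_small <- rows_for_budget(n_row = 10L, n_col = 8L, target_bytes = 256)
budget_tiny  <- rows_for_budget(n_row = 10L, n_col = 8L, target_bytes = 10)
budget_large <- rows_for_budget(n_row = 10L, n_col = 8L, target_bytes = 1e6)
c(small = budget_small, tiny = budget_tiny, large = budget_large)
```

Clamping at one row keeps a very small budget from producing an empty chunk, and clamping at `n_row` keeps a generous budget from exceeding the array.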
d_reduce_many() runs several built-in reducers together
and returns a matrix when the outputs have a common length.
row_summary <- d_reduce_many(
delarr(mat),
fns = list(sum = sum, mean = mean, max = max),
dim = "rows",
chunk_size = 3L
)
row_summary[1:4, , drop = FALSE]
#> sum mean max
#> sample_1 -0.8688508 -0.1086063 2.2616090
#> sample_2 3.5083317 0.4385415 2.3396931
#> sample_3 -2.9752109 -0.3719014 1.1194619
#> sample_4 -3.9809604 -0.4976200 0.5064512
block_apply() is useful when you want chunk-local
summaries or diagnostics without materializing the whole array.
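The idea of chunk-local summaries can be sketched in base R without any delarr calls. The matrix, the chunk size of three rows, and the chosen diagnostics are assumptions for illustration; the signature of `block_apply()` is not shown here.

```r
# Illustrative matrix; dimensions are an assumption for this sketch.
set.seed(1)
mat <- matrix(rnorm(80), nrow = 10, ncol = 8)

# Split row indices into chunks of at most three rows.
chunk_id <- ceiling(seq_len(nrow(mat)) / 3)
row_chunks <- split(seq_len(nrow(mat)), chunk_id)

# One diagnostic row per chunk; only that chunk is subset at a time.
chunk_stats <- do.call(rbind, lapply(row_chunks, function(idx) {
  block <- mat[idx, , drop = FALSE]
  data.frame(first_row = min(idx), n_rows = length(idx),
             block_sum = sum(block), block_max = max(block))
}))

# Chunk-local results should combine to the whole-array answer.
stopifnot(isTRUE(all.equal(sum(chunk_stats$block_sum), sum(mat))))
stopifnot(max(chunk_stats$block_max) == max(mat))
chunk_stats
```

Checking that the per-chunk values recombine to the full-array result is a cheap way to confirm that no rows were dropped or visited twice.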
d_matmul() returns another delarr, so you
can materialize only the block you need from a larger product.
rhs <- matrix(rnorm(30), nrow = 6, ncol = 5)
product_block <- d_matmul(delarr(mat[, 1:6, drop = FALSE]), delarr(rhs))[1:4, 1:3] |>
collect(chunk_size = 2L)
product_block
#> [,1] [,2] [,3]
#> sample_1 0.6587435 0.7804025 -0.4433443
#> sample_2 -0.1209847 -1.3106836 -3.0230087
#> sample_3 -0.9759765 -0.2282075 -3.7345642
#> sample_4 -3.0678323 2.4169777 1.4554712
The writer interface is useful when the result is still large enough
that you do not want to hold it in memory. The HDF5 backend is optional;
the chunks below run only when the hdf5r package is
installed.
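The streaming pattern itself does not depend on HDF5. As a stand-in, the sketch below appends column chunks to a plain binary file with base R and then reads the file back to validate it. The flat file is an assumption for illustration and is neither delarr's writer interface nor an hdf5r call.

```r
# Illustrative matrix; dimensions are an assumption for this sketch.
set.seed(1)
mat <- matrix(rnorm(80), nrow = 10, ncol = 8)

path <- tempfile(fileext = ".bin")
con <- file(path, open = "wb")

# R stores matrices in column-major order, so consecutive column chunks are
# contiguous and can simply be appended one after another.
for (start in seq(1, ncol(mat), by = 3)) {
  cols <- start:min(start + 2, ncol(mat))
  writeBin(as.vector(mat[, cols, drop = FALSE]), con)
}
close(con)

# Read everything back only to validate the round trip; a real consumer
# would typically read in chunks as well.
restored <- matrix(readBin(path, what = "double", n = length(mat)),
                   nrow = nrow(mat))
stopifnot(identical(restored, mat))
unlink(path)
```

Only one chunk of at most three columns is converted and written at a time, which is the property that matters when the full result is too large to hold comfortably in memory.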
profile_collect() repeats collect() and
records elapsed time plus the size of the realized output.
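The same kind of measurement can be approximated in base R. The helper `profile_runs()` below is hypothetical and is not `profile_collect()`; it only illustrates repeating a computation while recording elapsed time and the size of the realized result.

```r
# Hypothetical helper (not delarr's profile_collect()): run `f` several times
# and record the elapsed seconds and the size in bytes of each result.
profile_runs <- function(f, times = 3L) {
  do.call(rbind, lapply(seq_len(times), function(i) {
    elapsed <- system.time(out <- f())[["elapsed"]]
    data.frame(run = i, elapsed = elapsed,
               bytes = as.numeric(object.size(out)))
  }))
}

# Illustrative workload: a small dense cross-product.
set.seed(1)
mat <- matrix(rnorm(80), nrow = 10, ncol = 8)
prof <- profile_runs(function() mat %*% t(mat), times = 3L)
prof
```

Timings on a matrix this small are dominated by noise, so repeated runs are mainly useful for spotting a slow first run or a result that is larger than expected.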
Return to
vignette("delarr-getting-started", package = "delarr") for
the core lazy workflow, then use explain(),
block_apply(), d_reduce_many(), and
collect_shard() as you tune real pipelines for storage
layout, chunking, and execution strategy.