Introduction to orderly

This vignette provides a how-to style introduction to orderly, an overview of key ingredients to writing orderly reports, and a summary of key features and ideas. It may be useful to look at vignette("orderly") for a more roundabout discussion of what orderly is trying to achieve, or vignette("migrating") if you are familiar with version 1 of orderly as this explains concepts in terms of differences from the previous version.

You might also prefer the orderly tutorial, which works through similar material in slide form, or watch a short talk that describes the ideas in the package and how it differs from other approaches to reproducibility and workflows.

library(orderly)

Installation

If you don’t already have orderly installed, you can install it from CRAN with

install.packages("orderly")

or a potentially more recent version from our R-universe:

install.packages(
  "orderly",
  repos = c("https://mrc-ide.r-universe.dev", "https://cloud.r-project.org"))

Creating an empty orderly repository

The first step is to initialise an empty orderly repository. An orderly repository is a directory with the file orderly_config.json within it, and since version 2 also a directory .outpack/. Files within the .outpack/ directory should never be directly modified by users and this directory should be excluded from version control (see orderly_gitignore_update).

Create an orderly repository by calling orderly_init():

path <- tempfile() # we'll use a temporary directory here - see note below
orderly_init(path)
## ✔ Created orderly root at '/tmp/RtmpBVLQ2N/file24a4c42d1ea664'

which creates a few files:

## .
## ├── .outpack
## │   ├── config.json
## │   ├── location
## │   └── metadata
## └── orderly_config.json

This step should be performed on a completely empty directory, otherwise an error will be thrown. Later, you will re-initialise an orderly repository when cloning to a new machine, such as when working with others; this is discussed in vignette("collaboration").

The orderly_config.json file contains very little by default:

{"minimum_orderly_version": "2.0.0"}

For this vignette, the created orderly root is in R’s per-session temporary directory, which will be deleted once R exits. If you want to use a directory that will persist across restarting R (which you would certainly want when using orderly on a real project!) you should replace this with a path within your home directory, or other location that you control.

For the rest of the vignette we will evaluate commands from within this directory, by changing the directory to the path we’ve created:

setwd(path)

Creating your first orderly report

An orderly report is a directory src/<name> containing an orderly file <name>.R. That file may have special commands in it, but for now we’ll create one that is as simple as possible; we’ll read some data, add a derived column and save the result to disk. This seems silly, but imagine this standing in for a more substantial data-processing step.

Our directory structure (ignoring the hidden .outpack directory) looks like:

## .
## ├── orderly_config.json
## └── src
##     └── incoming_data
##         ├── data.csv
##         └── incoming_data.R

and src/incoming_data/incoming_data.R contains:

d <- read.csv("data.csv")
d$z <- resid(lm(y ~ x, d))
saveRDS(d, "data.rds")
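If you are following along, one way to create these two files from R is sketched below. The data values are invented for illustration (the contents of data.csv are not shown here), and a temporary directory stands in for the orderly root.

```r
# Stand-in for the orderly root; in practice this is the directory you
# passed to orderly_init()
root <- tempfile()
report_dir <- file.path(root, "src", "incoming_data")
dir.create(report_dir, recursive = TRUE)

# Made-up input data with the two columns the script expects (x and y)
set.seed(1)
x <- 1:10
write.csv(data.frame(x = x, y = x + rnorm(10)),
          file.path(report_dir, "data.csv"), row.names = FALSE)

# The orderly file shares its name with the directory: <name>.R
writeLines(c(
  'd <- read.csv("data.csv")',
  'd$z <- resid(lm(y ~ x, d))',
  'saveRDS(d, "data.rds")'),
  file.path(report_dir, "incoming_data.R"))

list.files(report_dir)
```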

To run the report and create a new “packet”, use orderly_run():

id <- orderly_run("incoming_data")
## ℹ Starting packet 'incoming_data' `20251008-130624-5cfde767` at 2025-10-08 14:06:24.368635
## > d <- read.csv("data.csv")
## > d$z <- resid(lm(y ~ x, d))
## > saveRDS(d, "data.rds")
## ✔ Finished running 'incoming_data.R'
## ℹ Finished 20251008-130624-5cfde767 at 2025-10-08 14:06:24.389258 (0.02062249 secs)
id
## [1] "20251008-130624-5cfde767"

The id that is created is a new identifier for the packet that will be both unique among all packets (within reason) and chronologically sortable. A packet that has an id that sorts after another packet’s id was started after that packet.
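Because ids sort chronologically, ordinary string sorting is enough to put packets in the order they were started. The sketch below uses ids of the kind shown in this vignette. Reading the leading YYYYMMDD-HHMMSS portion as a UTC timestamp is an assumption based on how these ids look, not a documented interface.

```r
ids <- c("20251008-130624-9b911b40",
         "20251008-130624-5cfde767",
         "20251008-130624-8203ffd8")

# Plain string sorting puts the earliest-started packet first
sorted <- sort(ids, method = "radix")
sorted

# Assumption: the first 15 characters encode the start time in UTC
started <- as.POSIXct(substr(sorted, 1, 15),
                      format = "%Y%m%d-%H%M%S", tz = "UTC")
format(started[[1]], "%Y-%m-%d %H:%M:%S")
```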

Having run the report, our directory structure looks like:

## .
## ├── archive
## │   └── incoming_data
## │       └── 20251008-130624-5cfde767
## │           ├── data.csv
## │           ├── data.rds
## │           └── incoming_data.R
## ├── draft
## │   └── incoming_data
## ├── orderly_config.json
## └── src
##     └── incoming_data
##         ├── data.csv
##         └── incoming_data.R

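A few things have changed here: there is now a directory archive/incoming_data/20251008-130624-5cfde767, named after the packet id, which contains the file created by the report (data.rds) alongside copies of its inputs (data.csv and incoming_data.R), and there is an empty directory draft/incoming_data.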

In addition, quite a few files have changed within the .outpack directory, but these are not covered here.

That’s it! Notice that the initial script is just a plain R script, and you can develop it interactively from within the src/incoming_data directory. Note however, that any paths referred to within will be relative to src/incoming_data and not the orderly repository root. This is important as all reports only see the world relative to the directory containing their <name>.R file (here, incoming_data.R).

Once created, you can then refer to this report by id and pull its files wherever you need them, both in the context of another orderly report or just to copy to your desktop to email someone. For example, to copy the file data.rds that we created to some location outside of orderly’s control you could do

dest <- tempfile()
fs::dir_create(dest)
orderly_copy_files(id, files = c("final.rds" = "data.rds"),
                   dest = dest)

which copies data.rds to some new temporary directory dest with name final.rds.

Depending on packets from another report

Creating a new dataset is mostly useful if someone else can use it. To do this we introduce the first of the special orderly commands that you can use from an orderly file: orderly_dependency().

The src/ directory now looks like:

## src
## ├── analysis
## │   └── analysis.R
## └── incoming_data
##     ├── data.csv
##     └── incoming_data.R

and src/analysis/analysis.R contains:

orderly_dependency("incoming_data", "latest()",
                   c("incoming.rds" = "data.rds"))
d <- readRDS("incoming.rds")
png("analysis.png")
plot(y ~ x, d)
dev.off()

Here, we’ve used orderly_dependency() to pull in the file data.rds from the most recent (latest()) incoming_data packet, copying it into this report under the filename incoming.rds. We’ve then used that file as normal to make a plot, which we’ve saved as analysis.png.

We can run this just as before, using orderly_run():

id <- orderly_run("analysis")
## ℹ Starting packet 'analysis' `20251008-130624-8203ffd8` at 2025-10-08 14:06:24.512875
## > orderly_dependency("incoming_data", "latest()",
## +                    c("incoming.rds" = "data.rds"))
## ℹ Depending on incoming_data @ `20251008-130624-5cfde767` (via latest(name == "incoming_data"))
## > d <- readRDS("incoming.rds")
## > png("analysis.png")
## > plot(y ~ x, d)
## > dev.off()
## png 
##   2
## ✔ Finished running 'analysis.R'
## ℹ Finished 20251008-130624-8203ffd8 at 2025-10-08 14:06:24.552172 (0.0392971 secs)

See how (from the logs) orderly has found the data packet that we created before and arranged to copy the files from one place to another on demand. When it does this it also records metadata about this relationship, which we can query later.
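To see what was recorded, you can read the packet metadata back. The sketch below rebuilds a small two-report repository in a temporary directory so that it runs on its own. The input data are invented, and the downstream report saves a summary rather than a plot to keep it short. It assumes that the list returned by orderly_metadata() has a depends element holding the id of each upstream packet; treat that field name as our understanding of the metadata format rather than a guarantee.

```r
library(orderly)

root <- tempfile()
orderly_init(root)

# Upstream report: made-up data, same script as above
dir.create(file.path(root, "src", "incoming_data"), recursive = TRUE)
write.csv(data.frame(x = 1:10, y = c(2, 1, 4, 3, 6, 5, 8, 7, 10, 9)),
          file.path(root, "src", "incoming_data", "data.csv"),
          row.names = FALSE)
writeLines(c(
  'd <- read.csv("data.csv")',
  'd$z <- resid(lm(y ~ x, d))',
  'saveRDS(d, "data.rds")'),
  file.path(root, "src", "incoming_data", "incoming_data.R"))

# Downstream report: depends on the latest incoming_data packet
dir.create(file.path(root, "src", "analysis"))
writeLines(c(
  'orderly_dependency("incoming_data", "latest()",',
  '                   c("incoming.rds" = "data.rds"))',
  'd <- readRDS("incoming.rds")',
  'saveRDS(summary(d$z), "summary.rds")'),
  file.path(root, "src", "analysis", "analysis.R"))

id_data <- orderly_run("incoming_data", root = root, echo = FALSE)
id_analysis <- orderly_run("analysis", root = root, echo = FALSE)

# Assumed field: depends$packet holds the id of the packet that was used
meta <- orderly_metadata(id_analysis, root = root)
meta$depends$packet
```

The id printed at the end should match id_data, the packet that latest() resolved to when the analysis report ran.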

For more information on dependencies, see vignette("dependencies").

Available in-report orderly commands

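The function orderly_dependency() is designed to operate while the packet runs. It is one of several such in-report functions; the others used in this vignette are orderly_resource(), orderly_shared_resource(), orderly_artefact(), orderly_parameters() and orderly_strict_mode(). These functions all act by adding metadata to the final packet, and perhaps by copying files into the directory.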

In addition, there is also a function orderly_run_info() that can be used while running a report that returns information about the currently running report (its id, resolved dependencies etc).

Let’s add some additional annotations to the previous reports:

orderly_strict_mode()
orderly_resource("data.csv")
orderly_artefact(description = "Processed data", "data.rds")

d <- read.csv("data.csv")
d$z <- resid(lm(y ~ x, d))
saveRDS(d, "data.rds")

Here, we’ve added a block of special orderly commands; these could go anywhere, for example above the files that they refer to. If strict mode is enabled (see below) then orderly_resource() calls must go before the files are used, as the files will only be made available at that point.

id <- orderly_run("incoming_data")
## ℹ Starting packet 'incoming_data' `20251008-130624-9b911b40` at 2025-10-08 14:06:24.611568
## > orderly_strict_mode()
## > orderly_resource("data.csv")
## > orderly_artefact(description = "Processed data", "data.rds")
## > d <- read.csv("data.csv")
## > d$z <- resid(lm(y ~ x, d))
## > saveRDS(d, "data.rds")
## ✔ Finished running 'incoming_data.R'
## ℹ Finished 20251008-130624-9b911b40 at 2025-10-08 14:06:24.635057 (0.02348876 secs)

This has no impact on the data that is produced, but provides an easy way to attach extra metadata to the produced packet, and allows us to start building guarantees about what each part of the graph will produce.

Parameterised reports

Much of the flexibility that comes from the orderly graph comes from using parameterised reports; these are reports that take a set of parameters and then change behaviour based on these parameters. Downstream reports can depend on a parameterised report and filter based on suitable parameters.

For example, consider a simple report where we generate samples based on some parameter:

pars <- orderly_parameters(n_samples = 10)
x <- seq_len(pars$n_samples)
d <- data.frame(x = x, y = x + rnorm(pars$n_samples))
saveRDS(d, "data.rds")

This creates a report that has a single parameter n_samples with a default value of 10. We could have used

pars <- orderly_parameters(n_samples = NULL)

to define a parameter with no default, or defined multiple parameters with

pars <- orderly_parameters(n_samples = 10, distribution = "normal")

You can do anything in your report that switches on the value of a parameter, such as reading different inputs or running a different analysis.
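As a rough illustration of switching on a parameter, the sketch below chooses a noise generator from a distribution value. Here pars is a plain list standing in for the result of orderly_parameters() so that the code runs outside a report, and the "uniform" option is invented for the example.

```r
# Stand-in for:
#   pars <- orderly_parameters(n_samples = 10, distribution = "normal")
pars <- list(n_samples = 10, distribution = "normal")

# Choose the noise generator based on the parameter value; the final
# unnamed argument is the fallback for unrecognised values
noise <- switch(
  pars$distribution,
  normal = rnorm(pars$n_samples),
  uniform = runif(pars$n_samples, min = -1, max = 1),
  stop("Unknown distribution: ", pars$distribution))

x <- seq_len(pars$n_samples)
d <- data.frame(x = x, y = x + noise)
nrow(d)
```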

However, you should see parameters as relatively heavyweight things and try to have a consistent set over all packets created from a report. In this report we use it to control the size of the generated data set.

id <- orderly_run("random", list(n_samples = 15))
## ℹ Starting packet 'random' `20251008-130624-b1ac8503` at 2025-10-08 14:06:24.69884
## ℹ Parameters:
## • n_samples: 15
## > pars <- orderly_parameters(n_samples = 10)
## > x <- seq_len(pars$n_samples)
## > d <- data.frame(x = x, y = x + rnorm(pars$n_samples))
## > saveRDS(d, "data.rds")
## ✔ Finished running 'random.R'
## ℹ Finished 20251008-130624-b1ac8503 at 2025-10-08 14:06:24.72406 (0.02522016 secs)

Our resulting file has 15 rows, as the parameter we passed in affected the report:

orderly_copy_files(id, files = c("random.rds" = "data.rds"),
                   dest = dest)
readRDS(file.path(dest, "random.rds"))
##     x         y
## 1   1  1.351350
## 2   2  1.974858
## 3   3  2.377816
## 4   4  4.797487
## 5   5  4.193369
## 6   6  5.936639
## 7   7  5.651874
## 8   8  9.871785
## 9   9  9.018771
## 10 10  9.230953
## 11 11 11.070891
## 12 12 11.771472
## 13 13 12.495760
## 14 14 14.496390
## 15 15 14.900772

You can use these parameters in orderly’s search functions. For example we can find the most recent version of a packet by running:

orderly_search('latest(name == "random")')
## [1] "20251008-130624-b1ac8503"

But we can also pass in parameter queries here:

orderly_search('latest(name == "random" && parameter:n_samples > 10)')
## [1] "20251008-130624-b1ac8503"

These can be used within orderly_dependency() (the name == "random" part is implied by the first name argument), for example

orderly_dependency("random", "latest(parameter:n_samples > 10)",
                   c("random.rds" = "data.rds"))

In this case if the report that you are querying from also has parameters you can use these within the query, using the this prefix. So suppose our downstream report simply uses n for the number of samples we might write:

orderly_dependency("random", "latest(parameter:n_samples == this:n)",
                   c("random.rds" = "data.rds"))

to depend on the most recent packet called random where it has a parameter n_samples which has the same value as the current report’s parameter n.

See vignette("query") for much more detail on this.

Shared resources

Sometimes it is useful to share data between different reports, for example some common source utilities that don’t warrant their own package, or some common data.

To do this, create a directory shared at the orderly root and put in it any files or directories you might want to share.

Suppose our shared directory contains a file data.csv:

## .
## ├── archive
## │   ├── analysis
## │   │   └── 20251008-130624-8203ffd8
## │   │       ├── analysis.R
## │   │       ├── analysis.png
## │   │       └── incoming.rds
## │   ├── incoming_data
## │   │   ├── 20251008-130624-5cfde767
## │   │   │   ├── data.csv
## │   │   │   ├── data.rds
## │   │   │   └── incoming_data.R
## │   │   └── 20251008-130624-9b911b40
## │   │       ├── data.csv
## │   │       ├── data.rds
## │   │       └── incoming_data.R
## │   └── random
## │       └── 20251008-130624-b1ac8503
## │           ├── data.rds
## │           └── random.R
## ├── draft
## │   ├── analysis
## │   ├── incoming_data
## │   └── random
## ├── orderly_config.json
## ├── shared
## │   └── data.csv
## └── src
##     ├── analysis
##     │   └── analysis.R
##     ├── incoming_data
##     │   ├── data.csv
##     │   └── incoming_data.R
##     └── random
##         └── random.R

We can then write an orderly report use_shared that uses this shared file, with its use_shared.R containing:

orderly_shared_resource("data.csv")
orderly_artefact(description = "analysis", "analysis.png")

d <- read.csv("data.csv")
png("analysis.png")
plot(y ~ x, d)
dev.off()

We can run this:

id <- orderly_run("use_shared")
## ℹ Starting packet 'use_shared' `20251008-130624-da30ffb4` at 2025-10-08 14:06:24.857326
## > orderly_shared_resource("data.csv")
## > orderly_artefact(description = "analysis", "analysis.png")
## > d <- read.csv("data.csv")
## > png("analysis.png")
## > plot(y ~ x, d)
## > dev.off()
## png 
##   2
## ✔ Finished running 'use_shared.R'
## ℹ Finished 20251008-130624-da30ffb4 at 2025-10-08 14:06:24.892219 (0.03489327 secs)

In the resulting archive, the file that was used from the shared directory is present:

## archive/use_shared
## └── 20251008-130624-da30ffb4
##     ├── analysis.png
##     ├── data.csv
##     └── use_shared.R

This is a general property of orderly: it tries to save all the inputs alongside the final results of the analysis, so that later on you can check to see what went into an analysis and what might have changed between versions.
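For instance, comparing file hashes between two packet directories shows which inputs or outputs differ. The sketch below uses base R's tools::md5sum() on two made-up directories that stand in for archive/<name>/<id>. The function compare_packet_files() is a hypothetical helper written for this example, not an orderly function.

```r
# Hypothetical helper: report, per file name, whether two directories hold
# identical content (NA if the file is missing from one of them)
compare_packet_files <- function(dir_a, dir_b) {
  files <- sort(union(list.files(dir_a), list.files(dir_b)))
  hash_a <- unname(tools::md5sum(file.path(dir_a, files)))
  hash_b <- unname(tools::md5sum(file.path(dir_b, files)))
  data.frame(file = files, same = hash_a == hash_b)
}

# Two made-up directories standing in for two versions of a packet
old <- tempfile()
new <- tempfile()
dir.create(old)
dir.create(new)
writeLines("x,y\n1,2", file.path(old, "data.csv"))
writeLines("x,y\n1,3", file.path(new, "data.csv"))
writeLines("plot(1)", file.path(old, "report.R"))
writeLines("plot(1)", file.path(new, "report.R"))

res <- compare_packet_files(old, new)
res
```

In this made-up case data.csv differs between the two directories while report.R does not, which points at the input data rather than the code as the source of any change in results.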

The boundaries between source code under version control, shared resources and dependencies are blurry, and we expect teams will find ways of working that suit them; one group’s solution may not please another.

Strict mode

The previous version of orderly (orderly1; see vignette("migrating")) was very fussy about all input being strictly declared before a report could be run, so that it was clear what was really required in order to run something. From version 2 this is relaxed by default, but you can opt into most of the old behaviours and checks by adding

orderly_strict_mode()

anywhere within your orderly file (conventionally at the top). We may make this more granular in future, but by adding this, only files declared with orderly_resource() are made available to the running report, as noted above.

Using strict mode also helps orderly clean up the src/<reportname> directory more effectively after interactive development (see next section).

Interactive development

Set your working directory to src/<reportname> and any orderly script should be fully executable (e.g., source with RStudio’s Source button, or R’s source() function). Dependencies will be copied over as needed.

After doing this, you will have a mix of files within your source directory. We recommend a per-source-directory .gitignore which will keep these files out of version control (see below).

For example, if we have interactively run our incoming_data/incoming_data.R script, we will have left behind generated files. We can report on this with orderly_cleanup_status():

orderly_cleanup_status("incoming_data")
## ✖ incoming_data is not clean:
## ℹ 1 file can be deleted by running 'orderly_cleanup("incoming_data")':
##   • data.rds

If you have files here that are unknown to orderly it will tell you about them and prompt you to tell it about them explicitly.

You can clean up generated files by running (as suggested in the message):

orderly_cleanup("incoming_data")
## ℹ Deleting 1 file from 'incoming_data':
## • data.rds

There is a dry_run = TRUE argument you can pass if you want to see what would be deleted without using the status function.

You can also keep these files out of git by using the orderly_gitignore_update() function:

orderly_gitignore_update("incoming_data")
## ✔ Wrote 'src/incoming_data/.gitignore'

This creates (or updates) a .gitignore file within the report so that generated files will not be included by git. If you have already accidentally committed them then the gitignore has no real effect and you should do some git surgery; see the git manuals or this handy, if profane, guide.

Deleting things from the archive

If you delete packets from your archive/ directory then this puts orderly into an inconsistent state with its metadata store. Sometimes this does not matter (e.g., if you delete old copies that would never be candidates for inclusion with orderly_dependency() you will never notice). However, if you delete the most recent copy of a packet and then try and depend on it, you will get an error.

At the moment, we have two copies of the incoming_data task:

orderly_metadata_extract(
  name = "incoming_data",
  extract = c(time = "time.start"))
##                         id                time
## 1 20251008-130624-5cfde767 2025-10-08 13:06:24
## 2 20251008-130624-9b911b40 2025-10-08 13:06:24

When we run the analysis task, it will pull in the most recent version (20251008-130624-9b911b40). However, if you had deleted this manually (e.g., to save space or accidentally) or corrupted it (e.g., by opening some output in Excel and letting it save changes) it will not be able to be included, and running analysis will fail:

orderly_run("analysis")
## ℹ Starting packet 'analysis' `20251008-130625-160c27ed` at 2025-10-08 14:06:25.090856
## > orderly_dependency("incoming_data", "latest()",
## +                    c("incoming.rds" = "data.rds"))
## ✖ Error running 'analysis.R'
## ℹ Finished 20251008-130625-160c27ed at 2025-10-08 14:06:25.154635 (0.06377912 secs)
## Error in `orderly_run()`:
## ! Failed to run report
## Caused by error in `orderly_copy_files()`:
## ! Unable to copy files, due to deleted packet 20251008-130624-9b911b40
## ℹ Consider 'orderly_validate_archive("20251008-130624-9b911b40", action =
##   "orphan")' to remove this packet from consideration
## Caused by error:
## ! File not found in archive
## ✖ data.rds

The error here tries to be fairly informative, telling us that we failed because when copying files from 20251008-130624-9b911b40 we found that the packet was corrupt, because the file data.rds was not found in the archive. It also suggests a fix; we can tell orderly that 20251008-130624-9b911b40 is “orphaned” and should not be considered for inclusion when we look for dependencies.

We can carry out the suggestion and just validate this packet by running

orderly_validate_archive("20251008-130624-9b911b40", action = "orphan")

or we can validate all the packets we have:

orderly_validate_archive(action = "orphan")
## ✔ 20251008-130624-5cfde767 (incoming_data) is valid
## ✔ 20251008-130624-8203ffd8 (analysis) is valid
## ✖ 20251008-130624-9b911b40 (incoming_data) is invalid due to its files
## ✔ 20251008-130624-b1ac8503 (random) is valid
## ✔ 20251008-130624-da30ffb4 (use_shared) is valid

If we had the option core.require_complete_tree enabled, then this process would also look for any packets that used our now-deleted packet and orphan those too, as we no longer have a complete tree that includes them.

If you want to remove references to the orphaned packets, you can use orderly_prune_orphans() to remove them entirely:

orderly_prune_orphans()
## ℹ Pruning 1 orphan packet

Interaction with version control

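Some guidelines:

Make sure to exclude some files from git by listing them in .gitignore: the .outpack/ directory (nothing in it is suitable for version control), and the archive/ and draft/ directories, which hold generated packets rather than source.

You absolutely should version control some files: the src/ directory, which is the main source of your analyses, and orderly_config.json. Files in the shared/ directory are usually worth version controlling too.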

Your source repository will end up on multiple people’s machines, each of which may be configured differently. The configuration options set via orderly_config_set are designed to be (potentially) different for different users, so this configuration must not be version controlled. It also means that reports/packets can’t directly refer to values set here. This includes the directory used to save archived packets (if enabled) and the names of locations (equivalent to git remotes).

You may find it useful to include scripts that help users set up common locations, but like with git, different users may use different names for the same remote (e.g., one user may have a location called data while for another it is called data-incoming, depending on their perspective about the use of the location).

orderly will always try and save information about the current state of the git source repository alongside the packet metadata. This includes the current branch, commit (sha) and remote url. This is to try and create links between the final version of the packet and the upstream source repository.

Interaction with the outpack store

As alluded to above, the .outpack directory contains lots of information about packets that have been run, but is typically “out of bounds” for normal use. This is effectively the “database” of information about packets that have been run. Understanding how this directory is structured is not required for using orderly, but is included here for the avoidance of mystery!

After all the work above, our directory structure looks like:

## .outpack
## ├── config.json
## ├── index
## │   └── outpack.rds
## ├── location
## │   ├── local
## │   │   ├── 20251008-130624-5cfde767
## │   │   ├── 20251008-130624-8203ffd8
## │   │   ├── 20251008-130624-b1ac8503
## │   │   └── 20251008-130624-da30ffb4
## │   └── orphan
## └── metadata
##     ├── 20251008-130624-5cfde767
##     ├── 20251008-130624-8203ffd8
##     ├── 20251008-130624-b1ac8503
##     └── 20251008-130624-da30ffb4

As can perhaps be inferred from the filenames, the files .outpack/metadata/<packet-id> are the metadata for each packet as it has been run. The files .outpack/location/<location-id>/<packet-id> hold information about when the packet was first known about by a location (here the location is the special “local” location).

The default orderly configuration is to store the final files in a directory called archive/, but alternatively (or additionally) you can use a content-addressable file store. With this enabled, the .outpack directory looks like:

## .outpack
## ├── config.json
## ├── files
## │   └── sha256
## │       ├── 1a
## │       │   └── 8a343aa8454a87be752c55b1dfbd1eda5eed99d9234822890edd880470156d
## │       ├── 3b
## │       │   └── d7a6623cc8235b320e93ac289b31fe1bb1c215e706c54b10a5600ac08d224f
## │       ├── 5f
## │       │   └── 96f49230c2791c05706f24cb2335cd0fad5d3625dc6bca124c44a51857f3f8
## │       ├── 9c
## │       │   └── dbb62b1ba4cecc1753fcd16d30d00496f255c44e80f4bb0ce5021b1b736d4a
## │       ├── a5
## │       │   └── 7c910d317aa24ec82edfafb665c0a314fc95ba1dc55c05437e22d23a98e2d6
## │       ├── aa
## │       │   └── 994dbde68580e1df76dbcc9e32157902c498fe9582e4784b40a437b9cb0cdd
## │       ├── b0
## │       │   └── bbd0c75a47435b74298ecb3ebdb3ceb77b00a373063092b1d4f716daff7477
## │       ├── b3
## │       │   └── 69412c2748c9c7762534c66ac8edb904cca5cc33126f72222d9a16e7a6b985
## │       ├── d7
## │       │   └── c0ce651d6be6529448f4d348a113b77676b6cc26b9778c3251ec48dfe06012
## │       └── e6
## │           └── 5922c667b2a1dd34180d1085bb562c61223fc0748a83859cb61dfeb4890854
## ├── index
## │   └── outpack.rds
## ├── location
## │   ├── local
## │   │   ├── 20251008-130624-5cfde767
## │   │   ├── 20251008-130624-8203ffd8
## │   │   ├── 20251008-130624-b1ac8503
## │   │   └── 20251008-130624-da30ffb4
## │   └── orphan
## └── metadata
##     ├── 20251008-130624-5cfde767
##     ├── 20251008-130624-8203ffd8
##     ├── 20251008-130624-b1ac8503
##     └── 20251008-130624-da30ffb4

The files under .outpack/files/ should never be modified or deleted. This approach to storage naturally deduplicates the file archive, so that a large file used in many places is only ever stored once.
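The deduplication follows from naming each stored file after a hash of its contents. The sketch below is a toy illustration of that idea and not orderly's implementation: store_file() is a hypothetical helper, and it uses MD5 from base R's tools package as a stand-in for the SHA-256 hashes shown in the listing above.

```r
# Toy content-addressable store (hypothetical, not orderly's code): a file
# is saved under <store>/<first two hash characters>/<remaining characters>
store_file <- function(store, path) {
  hash <- unname(tools::md5sum(path))
  dest <- file.path(store, substr(hash, 1, 2), substr(hash, 3, nchar(hash)))
  if (!file.exists(dest)) {
    dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
    file.copy(path, dest)
  }
  hash
}

store <- tempfile()
a <- tempfile()
b <- tempfile()
writeLines("same content", a)
writeLines("same content", b)

hash_a <- store_file(store, a)
hash_b <- store_file(store, b)

# Both inputs map to the same hash, so only one copy is stored
stored <- list.files(store, recursive = TRUE)
length(stored)
```

Because the storage path is derived from the content alone, a second file with identical content maps to a path that already exists and is not copied again.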

Relationship between orderly and outpack

The orderly package is built on a metadata and file storage system called outpack; we have implemented support for working with these metadata archives in other languages (see outpack_server for our server implementation in Rust and pyorderly in Python). The metadata is discussed in more detail in vignette("metadata") and we will document the general ideas more fully at mrc-ide/outpack.
