Getting Started with SCIproj

What is a research compendium?

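A research compendium is a self-contained collection of data, code, and documentation that accompanies a research project. Structuring the project as an R package lets it reuse standard package conventions and tooling, such as dependency declarations in DESCRIPTION, function documentation with roxygen2, and automated checks with R CMD check.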

SCIproj automates the creation of such a compendium, adding opinionated defaults for reproducible workflows (targets), dependency snapshots (renv), and FAIR-compliant metadata (CITATION.cff).

Getting started

Install SCIproj from GitHub:

# install.packages("remotes")
remotes::install_github("saskiaotto/SCIproj")

Create a new project with a single call:

library(SCIproj)
create_proj("~/projects/my_analysis")

This creates a fully scaffolded research compendium with renv and targets enabled by default.

Customizing the call

create_proj("~/projects/baltic_cod",
  add_license = "MIT",
  license_holder = "Jane Doe",
  orcid = "0000-0001-2345-6789",
  use_docker = TRUE,
  use_git = TRUE
)

Directory names with underscores or hyphens are fine: the R package name written to DESCRIPTION is sanitized automatically (e.g., baltic_cod becomes baltic.cod).
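
To make the renaming concrete, here is a minimal sketch of how a directory name could be turned into a valid R package name (letters, digits, and dots only; starting with a letter; not ending in a dot). The helper sanitize_pkg_name() is hypothetical and is not SCIproj's actual implementation, so its results for names other than baltic_cod are only what this sketch produces.

```r
# Hypothetical helper (not part of SCIproj): derive a valid package name
# from a directory name.
sanitize_pkg_name <- function(dir_name) {
  # Replace every run of characters that are not letters, digits, or dots.
  name <- gsub("[^A-Za-z0-9.]+", ".", dir_name)
  # Package names must start with a letter and must not end with a dot.
  gsub("^[^A-Za-z]+|\\.+$", "", name)
}

sanitize_pkg_name("baltic_cod")   # "baltic.cod"
sanitize_pkg_name("my-analysis")  # "my.analysis"
```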

Project structure

After creation, the project directory looks like this:

your-project/
├── DESCRIPTION             # Project metadata, dependencies, and author info (with ORCID).
├── README.Rmd              # Top-level project description.
├── your-project.Rproj      # RStudio project file.
├── CITATION.cff            # Machine-readable citation metadata for FAIR compliance.
├── CONTRIBUTING.md         # Contribution guidelines.
├── LICENSE.md              # Full license text (here: MIT).
├── NAMESPACE               # Auto-generated by roxygen2 (do not edit by hand).
│
├── data-raw/               # Raw data files and pre-processing scripts.
│   ├── clean_data.R        # Script template for data cleaning.
│   ├── DATA_SOURCES.md     # Data provenance: source, license, DOI, download date.
│   └── ...
│
├── data/                   # Cleaned datasets stored as .rda files.
│
├── R/                      # Custom R functions and dataset documentation.
│   ├── function_ex.R       # Template for custom functions.
│   ├── data.R              # Template for dataset documentation.
│   └── ...
│
├── analyses/               # R scripts or R Markdown/Quarto documents for analyses.
│   ├── figures/            # Generated plots.
│   └── ...
│
├── docs/                   # Publication-ready documents (article, report, presentation).
├── trash/                  # Temporary files that can be safely deleted.
│
├── _targets.R              # Pipeline definition for reproducible workflow.
├── renv/                   # renv library and settings.
├── renv.lock               # Lockfile for reproducible package versions.
└── Dockerfile              # Container definition for full reproducibility.

Directory / File    Purpose
R/                  Reusable R functions (documented with roxygen2)
data/               Cleaned, analysis-ready datasets (.rda format)
data-raw/           Raw data files and the script that cleans them
analyses/           Analysis scripts, R Markdown reports, figures
docs/               Manuscripts, presentations, supplementary material
trash/              Temporary files not under version control
_targets.R          Pipeline definition for targets
CITATION.cff        Machine-readable citation metadata
CONTRIBUTING.md     Guidelines for collaborators

FAIR compliance

SCIproj encourages FAIR (Findable, Accessible, Interoperable, Reusable) research practices through several built-in features:

CITATION.cff

A Citation File Format file is created automatically. It includes the project title, author name, version, release date, and optionally a license and ORCID iD. Services like GitHub and Zenodo can parse this file to generate proper citations.

create_proj("my_project",
  license_holder = "Jane Doe",
  orcid = "0000-0001-2345-6789",
  add_license = "MIT"
)
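
For orientation, the sketch below assembles the kind of content such a file holds, using field names from the Citation File Format 1.2.0 schema. The helper cff_lines() and the version number are hypothetical, and the exact layout SCIproj writes may differ.

```r
# Hypothetical helper (not part of SCIproj): build a minimal CITATION.cff
# as a character vector, one element per line.
cff_lines <- function(title, given, family, version,
                      date_released = format(Sys.Date()),
                      orcid = NULL, license = NULL) {
  lines <- c(
    "cff-version: 1.2.0",
    'message: "If you use this compendium, please cite it as below."',
    sprintf('title: "%s"', title),
    "authors:",
    sprintf('  - given-names: "%s"', given),
    sprintf('    family-names: "%s"', family)
  )
  # CFF stores the ORCID iD as a full URL.
  if (!is.null(orcid)) {
    lines <- c(lines, sprintf('    orcid: "https://orcid.org/%s"', orcid))
  }
  lines <- c(
    lines,
    sprintf('version: "%s"', version),
    sprintf("date-released: %s", date_released)
  )
  # The license field is optional.
  if (!is.null(license)) {
    lines <- c(lines, sprintf("license: %s", license))
  }
  lines
}

cff <- cff_lines("my_project", given = "Jane", family = "Doe",
                 version = "0.0.1",
                 orcid = "0000-0001-2345-6789", license = "MIT")
writeLines(cff)
```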

DATA_SOURCES.md

When data_raw = TRUE (the default), a DATA_SOURCES.md template is placed in data-raw/. Use it to document the provenance of every dataset: source, URL, DOI, license, download date, and file names.

ORCID

Pass your ORCID iD via the orcid parameter to embed it in CITATION.cff, making your authorship unambiguously machine-readable.

Workflow with targets

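By default (use_targets = TRUE), SCIproj adds a _targets.R pipeline template. The targets package tracks the dependencies between pipeline steps and, on each run, rebuilds only the steps whose code or upstream data have changed.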

A typical workflow:

# 1. Define targets in _targets.R
# 2. Inspect the pipeline
targets::tar_manifest()
targets::tar_visnetwork()
# 3. Run the pipeline
targets::tar_make()
# 4. Read a result
targets::tar_read(my_result)

Edit _targets.R to define your data-loading, analysis, and reporting steps. Each step is a target that depends on upstream targets and R functions in R/.
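
As an illustration, a small _targets.R could look like the sketch below. The file data-raw/catch.csv, the helpers read_catch() and summarise_catch(), and the target names are all hypothetical; the template that SCIproj generates may be organized differently.

```r
# _targets.R (sketch)
library(targets)

# In a compendium these helpers would live in R/; they are defined inline
# here only to keep the sketch self-contained. All names are hypothetical.
read_catch <- function(path) read.csv(path)
summarise_catch <- function(dat) {
  aggregate(catch ~ year, data = dat, FUN = sum)
}

# The script must end with a list of targets.
list(
  # Track the raw file itself, so edits to it invalidate downstream targets.
  tar_target(raw_file, "data-raw/catch.csv", format = "file"),
  # Load the data; depends on raw_file and read_catch().
  tar_target(catch, read_catch(raw_file)),
  # Summarise; depends on catch and summarise_catch().
  tar_target(catch_by_year, summarise_catch(catch))
)
```

With this definition, targets::tar_make() would build the three targets in dependency order, and targets::tar_read(catch_by_year) would return the summary table.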

Dependency management with renv

By default (use_renv = TRUE), SCIproj initializes renv with the "explicit" snapshot type. This means renv discovers dependencies from DESCRIPTION rather than scanning all R files, which is the recommended approach for package-based compendia.
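
To see which packages a DESCRIPTION file declares, the field can be parsed with base R. This is only a rough sketch of the idea: declared_deps() is a hypothetical helper, the DESCRIPTION content is illustrative, and renv's own dependency discovery is more thorough.

```r
# Write a small DESCRIPTION to a temporary file (illustrative content only).
desc_file <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: baltic.cod",
  "Version: 0.0.1",
  "Imports:",
  "    dplyr (>= 1.1.0),",
  "    ggplot2,",
  "    targets"
), desc_file)

# Hypothetical helper: list the package names declared in one field.
declared_deps <- function(path, field = "Imports") {
  value <- read.dcf(path, fields = field)[1, 1]
  if (is.na(value)) return(character(0))
  deps <- strsplit(value, ",", fixed = TRUE)[[1]]
  deps <- sub("\\(.*\\)", "", deps)  # drop version constraints
  deps <- trimws(deps)               # drop spaces and line breaks
  deps[nzchar(deps)]
}

declared_deps(desc_file)
```

A package that is used in a script but missing from DESCRIPTION would therefore not be recorded in renv.lock under the "explicit" snapshot type.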

Key commands:

renv::status()     # check if lockfile is in sync
renv::snapshot()   # update the lockfile after adding packages
renv::restore()    # reinstall packages from the lockfile

The renv.lock file should be committed to version control so collaborators can reproduce your exact package versions.

Optional features

Docker

Set use_docker = TRUE to add a Dockerfile and .dockerignore. The Dockerfile provides a template for building a container that reproduces your computational environment, independent of the host system.

GitHub and CI

Set create_github_repo = TRUE to create a GitHub repository (requires a configured GITHUB_PAT). Add ci = "gh-actions" to include a GitHub Actions workflow for automated R CMD check on push.

create_proj("my_project",
  use_git = TRUE,
  create_github_repo = TRUE,
  ci = "gh-actions"
)
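
Before requesting a GitHub repository, it can help to confirm that a token is visible to the R session. The check below is a small sketch; github_pat_available() is a hypothetical helper, and it only inspects the GITHUB_PAT environment variable, not any other credential store.

```r
# Hypothetical helper: TRUE if a non-empty token string is available.
github_pat_available <- function(pat = Sys.getenv("GITHUB_PAT")) {
  nzchar(pat)
}

if (!github_pat_available()) {
  message("GITHUB_PAT is not set; create_github_repo = TRUE may fail.")
}
```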

Licenses

Choose from "MIT", "GPL", "AGPL", "LGPL", "Apache", "CCBY", or "CC0" via the add_license parameter. The selected license is applied to DESCRIPTION and recorded in CITATION.cff.

testthat

Set testthat = TRUE to add testing infrastructure (tests/testthat.R and tests/testthat/). Writing tests for your analysis functions helps catch regressions early.
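
A test file under tests/testthat/ might look like the following sketch. The function rescale01() is a hypothetical analysis helper, defined inline here only so the example is self-contained; in a compendium it would live in R/.

```r
library(testthat)

# Hypothetical analysis function: rescale a numeric vector to [0, 1].
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

test_that("rescale01 maps values onto [0, 1]", {
  out <- rescale01(c(2, 4, 6))
  expect_equal(out, c(0, 0.5, 1))
  expect_true(all(out >= 0 & out <= 1))
})
```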

Makefile

Set makefile = TRUE to add a makefile.R script as an alternative to targets for orchestrating your workflow.

Typical development cycle

  1. Create the project:

    SCIproj::create_proj("~/projects/my_study", add_license = "MIT",
      license_holder = "Your Name")

  2. Open the .Rproj file in RStudio.

  3. Add raw data to data-raw/ and document it in DATA_SOURCES.md.

  4. Write cleaning code in data-raw/clean_data.R; save cleaned data to data/ with usethis::use_data().

  5. Write analysis functions in R/ and document them with roxygen2.

  6. Define the pipeline in _targets.R to connect data, functions, and reports.

  7. Run targets::tar_make() to execute the pipeline.

  8. Write reports in analyses/ using R Markdown or Quarto, reading results with targets::tar_read().

  9. Snapshot dependencies with renv::snapshot() before sharing.

  10. Push to GitHub and let CI run R CMD check automatically.
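
Step 4 of the cycle above can be sketched as a short cleaning script. The toy data frame stands in for a real raw file, and the object name catch is hypothetical; the contents of the clean_data.R template that SCIproj provides may differ.

```r
# data-raw/clean_data.R (sketch; toy values for illustration only)
raw <- data.frame(
  year    = c(2001, 2002, NA, 2004),
  catch_t = c("10.5", "n/a", "7.2", "8.0"),
  stringsAsFactors = FALSE
)

# Convert the measurement column to numeric; non-numeric entries become NA.
raw$catch_t <- suppressWarnings(as.numeric(raw$catch_t))

# Keep only complete records and reset the row names.
catch <- raw[complete.cases(raw), ]
rownames(catch) <- NULL

# Inside the compendium, store the cleaned object as data/catch.rda:
# usethis::use_data(catch, overwrite = TRUE)
```

Keeping the cleaning steps in a script rather than editing raw files by hand means the path from data-raw/ to data/ can be rerun and reviewed.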
