artoo is a lightweight, lossless, CDISC-native reader and writer for clinical-trial datasets. It moves data between SAS XPORT (XPT), CDISC Dataset-JSON v1.1, NDJSON, Apache Parquet, and RDS through one canonical metadata model, so converting between any two is lossless by construction — not by best effort.
Install the released version from CRAN:
install.packages("artoo")

Or the development version from GitHub:
# install.packages("pak")
pak::pak("vthanik/artoo")
# or
remotes::install_github("vthanik/artoo")

A spec describes the dataset; apply_spec() conforms a raw frame to it; the writers carry every piece of metadata to disk, all in one pipeable chain:
library(artoo)
# Coerce, order, sort, stamp metadata, then write. The writers return their
# input invisibly, so one conformed frame fans out to every deliverable.
path <- tempfile(fileext = ".xpt")
adsl <- cdisc_adsl |>
  apply_spec(adam_spec, "ADSL") |>
  write_xpt(path)
#> 6 variables the spec declares are absent from the data (not added): `TRTDURD`,
#> `DISONDT`, `EOSSTT`, `DCSREAS`, `EOSDISP`, and `MMS1TSBL`.
#> ℹ See `conformance(x)` for the findings.
# Read it back — labels, formats, types, and record count intact.
get_meta(read_xpt(path))@dataset$records
#> [1] 60

columns() is the quick look a SAS programmer expects from PROC CONTENTS, on a conformed frame or straight off a file:
columns(adsl)
#> <artoo_columns> ADSL -- 48 variables, 60 obs
#>  # Variable Type Len Format Label                                    Key
#>  1 STUDYID  Char  12        Study Identifier                         1
#>  2 USUBJID  Char  11        Unique Subject Identifier                2
#>  3 SUBJID   Char   4        Subject Identifier for the Study
#>  4 SITEID   Char   3        Study Site Identifier
#>  5 SITEGR1  Char   3        Pooled Site Group 1
#>  6 ARM      Char  20        Description of Planned Arm
#>  7 TRT01P   Char  20        Planned Treatment for Period 01
#>  8 TRT01PN  Num             Planned Treatment for Period 01 (N)
#>  9 TRT01A   Char  20        Actual Treatment for Period 01
#> 10 TRT01AN  Num             Actual Treatment for Period 01 (N)
#> 11 TRTSDT   Num      DATE9. Date of First Exposure to Treatment
#> 12 TRTEDT   Num      DATE9. Date of Last Exposure to Treatment
#> 13 AVGDD    Num      5.1    Avg Daily Dose (as planned)
#> 14 CUMDOSE  Num      8.1    Cumulative Dose (as planned)
#> 15 AGE      Num             Age
#> 16 AGEGR1   Char   5        Pooled Age Group 1
#> 17 AGEGR1N  Num             Pooled Age Group 1 (N)
#> 18 AGEU     Char   5        Age Units
#> 19 RACE     Char  32        Race
#> 20 RACEN    Num             Race (N)
#> 21 SEX      Char   1        Sex
#> 22 ETHNIC   Char  22        Ethnicity
#> 23 SAFFL    Char   1        Safety Population Flag
#> 24 ITTFL    Char   1        Intent-To-Treat Population Flag
#> 25 EFFFL    Char   1        Efficacy Population Flag
#> 26 COMP8FL  Char   1        Completers of Week 8 Population Flag
#> 27 COMP16FL Char   1        Completers of Week 16 Population Flag
#> 28 COMP24FL Char   1        Completers of Week 24 Population Flag
#> 29 DISCONFL Char   1        Subject Discontinued Study Flag
#> 30 DSRAEFL  Char   1        Subject Discontinued due to AE Flag
#> 31 DTHFL    Char   1        Subject Death Flag
#> 32 BMIBL    Num      5.1    Baseline BMI (kg/m^2)
#> 33 BMIBLGR1 Char   6        Pooled Baseline BMI Group 1
#> 34 HEIGHTBL Num      6.1    Baseline Height (cm)
#> 35 WEIGHTBL Num      6.1    Baseline Weight (kg)
#> 36 EDUCLVL  Num             Years of Education
#> 37 DURDIS   Num      6.1    Duration of Disease (Months)
#> 38 DURDSGR1 Char   4        Pooled Disease Duration Group 1
#> 39 VISIT1DT Num      DATE9. Date of Visit 1
#> 40 RFSTDTC  Char  10        Subject Reference Start Date/Time
#> 41 RFENDTC  Char  10        Subject Reference End Date/Time
#> 42 VISNUMEN Num             End of Trt Visit (Vis 12 or Early Term.)
#> 43 RFENDT   Num             Date of Discontinuation/Completion
#> 44 TRTDUR   Num
#> 45 DISONSDT Num      DATE9.
#> 46 DCDECOD  Char  27
#> 47 DCREASCD Char  18
#> 48 MMSETOT  Num

--DTC text, and codelists follow the Dataset-JSON v1.1 vocabulary; specs read from Define-XML, Pinnacle 21 workbooks, or native JSON.

artoo is the carrier between the formats a clinical-trial dataset travels in: the XPORT a regulator expects, the Dataset-JSON modern CDISC exchange uses, the Parquet an analytics stack reads, and an R-native checkpoint. Reach for it whenever a dataset must change formats without losing the metadata that makes it submission-ready (labels, types, lengths, display formats, codelists, and keys) and you want that guarantee enforced rather than hoped for. It is a focused reader/writer, not a validation suite or a table renderer.

| Format | Reader | Writer | Use |
|---|---|---|---|
| SAS XPORT (XPT) | read_xpt() | write_xpt() | FDA / PMDA submission |
| CDISC Dataset-JSON | read_json() | write_json() | Modern CDISC interchange |
| NDJSON | read_ndjson() | write_ndjson() | Streaming Dataset-JSON |
| Apache Parquet | read_parquet() | write_parquet() | Analytics, columnar store |
| RDS | read_rds() | write_rds() | Fast R-native storage |

The generic read_dataset() /
write_dataset() dispatch on the file extension; every
reader supports partial reads via col_select and
n_max.
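
To make the idea of extension-based dispatch with partial reads concrete, here is a minimal base-R sketch. It is not artoo's implementation: `read_any()` and the `readers` registry are hypothetical names invented for illustration, and only `col_select` and `n_max` are borrowed from the text above. The sketch assumes column names are passed as a character vector.

```r
# Illustration only: `readers` and `read_any()` are hypothetical, not artoo API.
# Registry mapping a lower-case file extension to a reader function.
readers <- list(
  rds = readRDS,
  csv = function(path) utils::read.csv(path, stringsAsFactors = FALSE)
)

read_any <- function(path, col_select = NULL, n_max = Inf) {
  # Dispatch on the file extension.
  ext <- tolower(tools::file_ext(path))
  if (!ext %in% names(readers)) {
    stop("No reader registered for extension '", ext, "'", call. = FALSE)
  }
  out <- readers[[ext]](path)

  # Naive partial read: subset after loading. A production reader would
  # more likely skip unwanted columns and rows while parsing.
  if (!is.null(col_select)) out <- out[, col_select, drop = FALSE]
  if (is.finite(n_max)) out <- utils::head(out, n_max)
  out
}

# Tiny demo frame written to a temporary RDS file.
demo <- data.frame(
  USUBJID = c("01-001", "01-002", "01-003"),
  AGE = c(63, 71, 58)
)
rds_path <- tempfile(fileext = ".rds")
saveRDS(demo, rds_path)

partial <- read_any(rds_path, col_select = "USUBJID", n_max = 2)
```

The single entry point keeps calling code independent of the storage format, which appears to be the motivation behind read_dataset() and write_dataset().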
Partial ISO 8601 dates are first-class: a character
--DTC column typed date writes to XPT as ISO
text — "1951-12" survives byte for byte — while
targetDataType = "integer" drives the ADaM numeric-date
convention. SAS TIME values arrive as hms
(seconds since midnight), and >24h, negative, and
fractional times round-trip every format.
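
The following base-R sketch shows why these two conventions matter. It does not use artoo or hms: `is_partial_iso_date()` and `format_elapsed()` are hypothetical helpers written for this illustration. The first part shows that a `Date` cannot hold a partial date, so text is the only lossless carrier; the second shows that a plain count of seconds has no 24-hour ceiling and can be negative or fractional.

```r
# Illustration only: both helpers below are hypothetical, not artoo API.

# 1. A partial ISO 8601 date has no day, so it cannot become a Date.
partial <- "1951-12"
parsed <- as.Date(partial, format = "%Y-%m-%d")  # NA: the day is missing
date_is_lost <- is.na(parsed)

# Accept YYYY, YYYY-MM, or YYYY-MM-DD (shape check only, no calendar check).
is_partial_iso_date <- function(x) {
  grepl("^[0-9]{4}(-[0-9]{2}(-[0-9]{2})?)?$", x)
}
shape_ok <- is_partial_iso_date(c("1951", "1951-12", "1951-12-07", "12/1951"))

# 2. Times stored as seconds since midnight are just numbers, so values
#    beyond 24h, below zero, or with fractions are all representable.
format_elapsed <- function(secs) {
  sign <- ifelse(secs < 0, "-", "")
  s <- abs(secs)
  hours <- as.integer(s %/% 3600)
  minutes <- as.integer((s %% 3600) %/% 60)
  seconds <- s %% 60
  sprintf("%s%02d:%02d:%06.3f", sign, hours, minutes, seconds)
}
shown <- format_elapsed(c(90000, -1800, 45.5))
```

Here `90000` seconds is 25 hours, a value a clock-time type would reject but a seconds count carries unchanged.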
apply_spec() and every conformance finding.

MIT © Vignesh Thanikachalam