The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
Initial CRAN release. artoo is a lightweight, lossless, CDISC-native
reader and writer for clinical-trial datasets, built around one
canonical metadata model (artoo_meta) so that conversion
between any two supported formats is lossless by construction. Pure R
and lightweight, with no external SAS or Java runtime.
Reads and writes SAS XPORT (v5 and v8), CDISC Dataset-JSON v1.1,
NDJSON, Apache Parquet, and RDS. Every codec carries the full
artoo_meta — labels, CDISC data types, lengths, SAS display
formats, controlled-terminology references, and sort keys — so
any-to-any conversion preserves the complete metadata. For Parquet the
metadata rides as a metadata_json sidecar; a file written
by another tool with no sidecar degrades gracefully to a bare frame
rather than an error.
Generic read_dataset() /
write_dataset() dispatch on the file extension, with
read_xpt() / write_xpt() and the matching
read_json(), read_ndjson(),
read_parquet(), and read_rds() pairs as direct
entry points. Cross-cutting encoding, checks,
and created arguments flow through
....
Partial reads (col_select, n_max) on
every reader; gzip-transparent JSON and NDJSON; multi-member SAS XPORT
libraries via xpt_members() plus
read_xpt(member = ).
Numeric fidelity is exact end to end: a decimal
value is exchanged as a string at IEEE round-trip precision,
integer values beyond R’s 32-bit range stay numeric rather
than overflowing, and NaN / infinite values are rejected as
invalid CDISC numerics. Rows are sorted in C-locale (byte) order, so a
written file is deterministic across locales and matches SAS
PROC SORT for ASCII keys.
Encodings follow the IANA and SAS standards: the readers and
writers accept a charset name in either the SAS or R spelling (see
artoo_encodings()), character columns are transcoded to
UTF-8 and NFC-normalized on read, and the
on_invalid = c("error", "replace", "ignore") policy governs
invalid bytes uniformly across every writer.
artoo_spec() builds the canonical metadata model
from a Pinnacle 21 Excel workbook, a Define-XML 2.0 / 2.1 file, or a
native artoo JSON spec. read_spec() /
write_spec() dispatch on the file extension:
.xlsx writes a Pinnacle 21 workbook (Define-XML to P21 is
one composition), and the native JSON form is the lossless interchange
that round-trips a spec identically.
The spec is single-standard by construction:
@standard is resolved once from the explicit argument or
the source, and study-level fields are canonicalized to the CDISC ODM
vocabulary. Accessors include spec_standard(),
spec_variables(), spec_codelists(),
spec_methods(), and spec_comments().
set_type() returns a spec with one or more variables
retyped through the CDISC vocabulary; repair_spec() retypes
every variable a check_spec() run flags as fractional or
out-of-range under an integer data type, so a frame the
original spec would refuse coerces after one call.
apply_spec(x, spec, dataset, conformance = , na_position = )
coerces each column to its CDISC data type, orders the columns and sorts
the rows by the spec’s keys, and stamps the artoo_meta.
extra = c("keep", "drop") controls whether undeclared
columns survive; on_coercion_loss = c("error", "keep")
governs a coercion that would lose data. The pipeline never silently
fabricates or drops a column: an undeclared column is reported and kept,
a declared-but-absent column is reported and left absent.
check_spec() validates a data frame against its spec
across conformance dimensions toggled by artoo_checks();
check_study() runs it over a whole study and returns one
stacked findings frame; conformance() reads the findings
back off a stamped frame. validate_spec() checks a spec for
internal consistency against a bundled rule catalog, with no external
dependency.
decode_column() translates coded values to or from
their codelist decodes; sync_meta() reconciles a stamped
frame’s metadata after manual edits.
members() is the format-neutral inventory of the
dataset(s) a path holds, one row per dataset, dispatched through the
codec registry. columns() is the SAS PROC CONTENTS /
Universal Viewer variable pane over a stamped frame, a plain data frame,
or a file path. get_meta() / set_meta() read
and attach the artoo_meta.artoo_<severity>_<kind>,
artoo_<severity>, artoo_condition — so a
handler can catch a specific kind, a whole severity, or every artoo
condition. The data-protection conditions attach their evidence as data
(cnd$variables, cnd$findings) for programmatic
inspection.adam_spec (ADaMIG 1.1) and
sdtm_spec (SDTMIG 3.1.2), built reproducibly from the
official CDISC Define-XML 2.1 release examples and shipped also as
Pinnacle 21 workbooks under inst/extdata/. Demo datasets
come from the PHUSE Test Data Factory; the constructor tables
cdisc_adam_datasets / cdisc_adam_variables,
cdisc_sdtm_datasets / cdisc_sdtm_variables,
and the shared cdisc_codelists build a spec by hand. Every
bundled dataset conforms to its bundled spec, gated at build and test
time.vignette("artoo") plus task-oriented
web articles (specifications; conform and validate; formats and lossless
conversion; recipes), and a pkgdown reference site.These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.