vetr - Trust, but Verify

Brodie Gaslam

Trust, but Verify

Easily

When you write functions that operate on S3 or unclassed objects you can either trust that your inputs will be structured as expected, or tediously check that they are.

vetr takes the tedium out of structure verification, so that you can trust, but verify. It lets you express structural requirements declaratively with templates, and it auto-generates human-friendly error messages as needed.

Quickly

vetr is written in C to minimize overhead from parameter checks in your functions. It has no dependencies.

Declarative Checks with Templates

Templates

Declare a template that an object should conform to, and let vetr take care of the rest:

library(vetr)
tpl <- numeric(1L)
vet(tpl, 1:3)
## [1] "`1:3` should be length 1 (is 3)"
vet(tpl, "hello")
## [1] "`\"hello\"` should be type \"numeric\" (is \"character\")"
vet(tpl, 42)
## [1] TRUE

Zero length templates match any length:

tpl <- integer()
vet(tpl, 1L:3L)
## [1] TRUE
vet(tpl, 1L)
## [1] TRUE

For convenience, short (length <= 100) integer-like numerics are considered integers:

tpl <- integer(1L)
vet(tpl, 1)       # this is a numeric, not an integer
## [1] TRUE
vet(tpl, 1.0001)
## [1] "`1.0001` should be type \"integer-like\" (is \"double\")"
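The "integer-like" rule can be approximated in base R. The sketch below is an illustration only: is_int_like is a hypothetical helper, not a vetr function, and its treatment of NA and infinite values is an assumption that may differ from what vetr actually does.

```r
# Hypothetical helper, not part of vetr: a rough base R approximation of
# "integer-like" as described above (short doubles holding whole numbers).
is_int_like <- function(x, max.len = 100L) {
  if (is.integer(x)) return(TRUE)
  # Assumption: NA values make the check fail; vetr may behave differently.
  is.double(x) && length(x) <= max.len && isTRUE(all(x == trunc(x)))
}

is_int_like(1)        # whole number stored as a double
is_int_like(1.0001)   # not a whole number
is_int_like(as.numeric(1:101))   # whole numbers, but longer than 100
```

The first call should return TRUE and the other two FALSE, mirroring the vet results shown above for the scalar cases.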

vetr can compare recursive objects such as lists, or data.frames:

tpl.iris <- iris[0, ]      # 0 row DF matches any number of rows in object
iris.fake <- iris
levels(iris.fake$Species)[3] <- "sibirica"   # tweak levels

vet(tpl.iris, iris)
## [1] TRUE
vet(tpl.iris, iris.fake)
## [1] "`levels(iris.fake$Species)[3]` should be \"virginica\" (is \"sibirica\")"

From our declared template iris[0, ], vetr infers all the required checks. In this case, vet(iris[0, ], iris.fake, stop=TRUE) is equivalent to:

stopifnot_iris <- function(x) {
  stopifnot(
    is.list(x), inherits(x, "data.frame"),
    length(x) == 5, is.integer(attr(x, 'row.names')),
    identical(
      names(x),
      c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species")
    ),
    all(vapply(x[1:4], is.numeric, logical(1L))),
    typeof(x$Species) == "integer", is.factor(x$Species),
    identical(levels(x$Species), c("setosa", "versicolor", "virginica"))
  )
}
stopifnot_iris(iris.fake)
## Error: identical(levels(x$Species), c("setosa", "versicolor", "virginica")) is not TRUE

vetr saved us the typing, as well as the time and thought needed to work out what needs to be compared.

You could just as easily have created templates for nested lists, or data frames in lists. Templates are compared to objects with the alike function. For a thorough description of templates and how they work see the alike vignette. For template examples see example(alike).
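As a minimal sketch of a nested list template, consider the following. The object layout (id, tags, data) is invented for illustration; only vet and the template rules described above are taken from this document.

```r
library(vetr)

# Template: a list holding a scalar integer, a character vector of any
# length, and a nested list with two numeric vectors of any length.
tpl.nested <- list(
  id = integer(1L),
  tags = character(),
  data = list(x = numeric(), y = numeric())
)
obj <- list(
  id = 1L,
  tags = c("a", "b"),
  data = list(x = c(0.5, 1, 1.5), y = c(0.5, 1.5))
)
ok <- vet(tpl.nested, obj)        # expected to be TRUE

obj.bad <- obj
obj.bad$data$y <- "oops"          # wrong type deep inside the object
bad <- vet(tpl.nested, obj.bad)   # expected to be an error message string
```

Here ok should be TRUE and bad a character vector describing the mismatch inside the nested element.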

Auto-Generated Error Messages

Let’s revisit the error message:

vet(tpl.iris, iris.fake)
## [1] "`levels(iris.fake$Species)[3]` should be \"virginica\" (is \"sibirica\")"

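It tells us:

  • Where the failure is: levels(iris.fake$Species)[3]
  • What was expected: "virginica"
  • What was found instead: "sibirica"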

vetr does what it can to reduce the time from error to resolution. The location of failure is generated such that you can easily copy it in part or full to the R prompt for further examination.

Vetting Expressions

Introduction

You can combine templates with && / ||:

vet(numeric(1L) || NULL, NULL)
## [1] TRUE
vet(numeric(1L) || NULL, 42)
## [1] TRUE
vet(numeric(1L) || NULL, "foo")
## [1] "`\"foo\"` should be \"NULL\", or type \"numeric\" (is \"character\")"

Templates only check structure. When you need to check values use . to refer to the object:

vet(numeric(1L) && . > 0, -42)  # strictly positive scalar numeric
## [1] "`-42 > 0` is not TRUE (FALSE)"
vet(numeric(1L) && . > 0, 42)
## [1] TRUE

You can compose vetting expressions as language objects and combine them:

scalar.num.pos <- quote(numeric(1L) && . > 0)
foo.or.bar <- quote(character(1L) && . %in% c('foo', 'bar'))
vet.exp <- quote(scalar.num.pos || foo.or.bar)

vet(vet.exp, 42)
## [1] TRUE
vet(vet.exp, "foo")
## [1] TRUE
vet(vet.exp, "baz")
## [1] "At least one of these should pass:"                         
## [2] "  - `\"baz\"` should be type \"numeric\" (is \"character\")"
## [3] "  - `\"baz\" %in% c(\"foo\", \"bar\")` is not TRUE (FALSE)"

There are a number of predefined vetting tokens you can use in your vetting expressions:

vet(NUM.POS, -runif(5))    # positive numeric
## [1] "`-runif(5)` should contain only positive values, but has negatives"
vet(LGL.1, NA)             # TRUE or FALSE
## [1] "`NA` should not contain NAs, but does"

See ?vet_token for a full listing, and for instructions on how to define your own tokens with custom error messages.

Vetting expressions are designed to be intuitive to use, but their implementation is complex. We recommend you look at example(vet) for usage ideas, or at the “Non Standard Evaluation” section of the vignette for the gory details.

Non Standard Evaluation

Vetting Expressions are Language Objects

vet captures the first argument unevaluated. For example in:

vet(. > 0, 1:3)

. > 0 is captured, processed, and evaluated in a special manner. This is a common pattern in R (e.g. in with and subset) called Non Standard Evaluation (NSE). One additional wrinkle with vet is that symbols in the captured expression are recursively substituted:

a <- quote(integer() && . > 0)
b <- quote(logical(1L) && !is.na(.))
c <- quote(a || b)

vet(c, 1:3)

The above is thus equivalent to:

vet((integer() && . > 0) || (logical(1L) && !is.na(.)), 1:3)

The recursive substitution removes the typical limitation on “programming” with NSE, although there are a few things to know:

  • Symbols in vetting expressions that evaluate to language objects (calls or symbols) in the parent frame are substituted with the corresponding language object.
  • The result of this substitution is implicitly wrapped in parentheses to avoid operator precedence problems.
  • The function part of a call is never substituted (e.g. the fun in fun(a, b)); this extends to operators.
  • . is never substituted, though you can work around that by escaping it with an additional . (i.e. ..).
  • You must take particular care when constructing vetting expressions for language objects.

To illustrate the last point, suppose we want to check that an object is a call of the form x + y. We could use:

vet(quote(x + y), my.call)       # notice `quote`

Or:

tpl.call <- quote(quote(x + y))  # notice `quote(quote(...))`
vet(tpl.call, my.call)

Additionally, you will need to ensure that x and y themselves do not evaluate to language objects in the parent frame.
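The substitution rules listed above can be approximated in base R. The following is a rough sketch under stated assumptions, not vetr's implementation: sub_rec is a hypothetical helper, it ignores the .. escape, it does not handle empty arguments such as x[, 1], and it will recurse forever on a symbol that evaluates to itself.

```r
# Hypothetical helper, not part of vetr: recursively replace symbols that
# evaluate to language objects in `env`, wrapping each result in parentheses.
sub_rec <- function(expr, env = parent.frame()) {
  if (is.symbol(expr)) {
    nm <- as.character(expr)
    # `.` is never substituted
    if (nm != "." && exists(nm, envir = env, inherits = TRUE)) {
      val <- get(nm, envir = env)
      if (is.language(val)) return(call("(", sub_rec(val, env)))
    }
    expr
  } else if (is.call(expr)) {
    # The function part of the call is left alone; only arguments recurse
    args <- lapply(as.list(expr)[-1L], sub_rec, env = env)
    as.call(c(list(expr[[1L]]), args))
  } else {
    expr
  }
}

a <- quote(integer() && . > 0)
b <- quote(logical(1L) && !is.na(.))
c <- quote(a || b)

res <- sub_rec(quote(c))
res
## ((integer() && . > 0) || (logical(1L) && !is.na(.)))
```

Note the extra outer parentheses: in this sketch every substituted symbol is wrapped, including the top-level one.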

Parsing and Evaluation Rules

Once a vetting expression has been recursively substituted, it is parsed into tokens. Tokens are the parts of the vetting expression bounded by the && and || operators and optionally enclosed in parentheses. For example, there are three tokens in the following vetting expression:

logical(1) || (numeric(1) && (. > 0 & . < 1))

They are logical(1), numeric(1), and . > 0 & . < 1. The last token is just one token not because of the parentheses around it but because it is a call to & as opposed to &&. Here we use the parentheses to remove parsing ambiguity caused by & and && having the same operator precedence.

After the tokens have been identified they are classified as custom tokens or template tokens. Custom tokens are those that contain the . symbol. Every other token is considered a template token.
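The tokenization and classification steps can be sketched in base R. find_tokens and uses_dot are hypothetical helpers written for this illustration, not vetr internals, and they ignore the special cases covered later (the .. escape, .(), and I()).

```r
# Hypothetical helpers, not part of vetr.
# Split an expression into tokens at `&&`, `||` and `(`; any other call
# (including `&`) ends the search and is returned whole.
find_tokens <- function(expr) {
  if (is.call(expr) && is.symbol(expr[[1L]])) {
    fun <- as.character(expr[[1L]])
    if (fun %in% c("&&", "||")) {
      return(c(find_tokens(expr[[2L]]), find_tokens(expr[[3L]])))
    }
    if (fun == "(") return(find_tokens(expr[[2L]]))
  }
  list(expr)
}
# A token is "custom" if it mentions the `.` symbol, otherwise "template"
uses_dot <- function(expr) "." %in% all.names(expr)

tokens <- find_tokens(quote(logical(1) || (numeric(1) && (. > 0 & . < 1))))
token.txt <- vapply(tokens, function(x) paste(deparse(x), collapse = ""), "")
is.custom <- vapply(tokens, uses_dot, logical(1L))

data.frame(token = token.txt, custom = is.custom)
```

This should list the three tokens named above, with only . > 0 & . < 1 flagged as a custom token.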

Custom tokens are further processed by substituting any . with the value of the object being vetted. These tokens are then evaluated and if those evaluations produce TRUE or logical vectors containing only TRUE, the tokens pass, otherwise they fail. With:

vet(. > 0, 1:3)
## [1] TRUE

. > 0 becomes 1:3 > 0, which evaluates to c(TRUE, TRUE, TRUE) and the token passes.

Template tokens, i.e. tokens without a . symbol, are evaluated and the resulting R object is sent along with the object to vet to alike for structural comparison. If alike returns TRUE then the token passes, otherwise it fails.

Finally, the result of evaluating each token is plugged back into the original expression. So¹:

vet(logical(1) || (numeric(1) && (. > 0 & . < 1)), 42)
# becomes:
alike(logical(1L), 42) || (alike(numeric(1L), 42) && all(42 > 0 & 42 < 1))
# becomes:
FALSE || (TRUE && FALSE)
# becomes:
FALSE

And the vetting fails:

vet(logical(1) || (numeric(1) && (. > 0 & . < 1)), 42)
## [1] "At least one of these should pass:"                 
## [2] "  - `42` should be type \"logical\" (is \"double\")"
## [3] "  - `42 > 0 & 42 < 1` is not TRUE (FALSE)"
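As the footnote to this example points out, alike returns a character vector rather than FALSE on failure. A version of the expanded expression that can actually be run therefore wraps each alike call in isTRUE. This is a sketch of the logic only; it does not reproduce how vet assembles its error message.

```r
library(vetr)

x <- 42
# Template tokens go through alike(); the custom token is evaluated
# directly and reduced with all()
res <- isTRUE(alike(logical(1L), x)) ||
  (isTRUE(alike(numeric(1L), x)) && all(x > 0 & x < 1))
res
## [1] FALSE
```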

Special Cases

If you need to reference a literal dot (.) in a token, you can escape it by adding another dot, so that . becomes .. (two dots). If you want to reference ... (three dots) you’ll need to use .... (four dots). If you have a custom token that does not reference the vetting object (i.e. does not use .) you can mark it as a custom token by wrapping it in .(). If you want to use a literal .() you can use ..().

If you need && or || to be interpreted literally you can wrap the call in I to tell vet to treat the entire call as a single token:

I(length(a) == length(b) && . %in% 0:1)

vet will stop searching for tokens at the first call to a function other than (, &&, and ||. The use of I here is just an example of this behavior, and is convenient since I does not change the meaning of the vetting token. An implication of this is that you should not nest template tokens inside functions, as vet will not identify them as template tokens and you may get unexpected results. For example:

I(logical(1L) && my_special_fun(.))

will always fail because logical(1L) is part of a custom token and is evaluated as FALSE rather than used as a template token for a scalar logical.

In Functions

The vetr function streamlines parameter checks in functions. It behaves just like vet, except that you need only specify the vetting expressions. The objects to vet are captured from the function environment:

fun <- function(x, y, z) {
  vetr(
    matrix(numeric(), ncol=3),
    logical(1L),
    character(1L) && . %in% c("foo", "bar")
  )
  TRUE  # do work...
}
fun(matrix(1:12, 3), TRUE, "baz")
## Error in fun(x = matrix(1:12, 3), y = TRUE, z = "baz"): For argument `x`, `matrix(1:12, 3)` should have 3 columns (has 4)
fun(matrix(1:12, 4), TRUE, "baz")
## Error in fun(x = matrix(1:12, 4), y = TRUE, z = "baz"): For argument `z`, `"baz" %in% c("foo", "bar")` is not TRUE (FALSE)
fun(matrix(1:12, 4), TRUE, "foo")
## [1] TRUE

The arguments to vetr are matched to the arguments of the enclosing function in the same way as with match.call. For example, if we wished to vet just the third argument:

fun <- function(x, y, z) {
  vetr(z=character(1L) && . %in% c("foo", "bar"))
  TRUE  # do work...
}
fun(matrix(1:12, 3), TRUE, "baz")
## Error in fun(x = matrix(1:12, 3), y = TRUE, z = "baz"): For argument `z`, `"baz" %in% c("foo", "bar")` is not TRUE (FALSE)
fun(matrix(1:12, 4), TRUE, "bar")
## [1] TRUE

Vetting expressions work the same way with vetr as they do with vet.
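Since a failed vetr check is signaled as a regular R error (as the output above shows), it can be handled with tryCatch. The sketch below uses a made-up function, rescale, together with the predefined tokens NUM.POS and LGL.1 mentioned earlier. The exact wording of the error message is not shown because it is not guaranteed here.

```r
library(vetr)

# `rescale` is a hypothetical function used only for illustration
rescale <- function(x, weights, verbose) {
  # Named vetting expressions are matched to the arguments by name
  vetr(x = numeric(), weights = NUM.POS, verbose = LGL.1)
  x * weights / sum(weights)
}

good <- rescale(c(1, 2, 3), c(0.2, 0.3, 0.5), FALSE)   # passes vetting

# A negative weight should fail the NUM.POS check; capture the message
res <- tryCatch(
  rescale(c(1, 2, 3), c(0.2, -0.3, 0.5), FALSE),
  error = function(e) conditionMessage(e)
)
res
```

Going by the messages shown above, res should be a character string that names the weights argument.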

Performance Considerations

Benchmarks

vetr is written primarily in C to minimize the performance impact of adding validation checks to your functions. It should be faster than stopifnot for all but the most trivial checks. The vetr function carries some additional overhead from matching arguments, but the same should still hold for it. Here we run the checks on the valid iris object we used to illustrate declarative checks:

vetr_iris <- function(x) vetr(tpl.iris)

bench_mark(times=1e4,
  vet(tpl.iris, iris),
  vetr_iris(iris),
  stopifnot_iris(iris)   # defined in "Templates" section
)
## Mean eval time from 10000 iterations, in microseconds:
##   vet(tpl.iris, iris)   ~  10.0
##   vetr_iris(iris)       ~  19.6
##   stopifnot_iris(iris)  ~  97.3

Performance is optimized for the success case. Failure cases should still perform reasonably well, but will be slower than most success cases.

Templates and Performance

Complex templates will be slower to evaluate than simple ones, particularly for lists with lots of nested elements. Note however that the cost of the vetting expression is a function of the complexity of the template, not that of the value being vetted.

We recommend that you predefine templates in your package and not in the validation expression since some seemingly innocuous template creation expressions carry substantial overhead:

bench_mark(data.frame(a=numeric()))
## Mean eval time from 1000 iterations, in microseconds:
##   data.frame(a = numeric())  ~  143

In this case the data.frame call alone takes over 100 microseconds. In your package code you could use:

df.tpl <- data.frame(a=numeric())

my_fun <- function(x) {
  vetr(x=df.tpl)
  TRUE    # do work
}

This way the template is created once on package load and re-used each time your function is called.
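To get a feel for the difference on your own machine without relying on bench_mark, a rough timing can be done with base R. time_us is a hypothetical helper written for this sketch; system.time has coarse resolution, so the numbers are only indicative and will vary between systems.

```r
# Hypothetical helper, not part of vetr: mean time per evaluation in
# microseconds, measured with base R's system.time()
time_us <- function(expr, times = 1000L) {
  expr <- substitute(expr)
  env <- parent.frame()
  elapsed <- system.time(
    for (i in seq_len(times)) eval(expr, env)
  )[["elapsed"]]
  elapsed / times * 1e6
}

df.tpl <- data.frame(a = numeric())   # built once, e.g. at package load

t.inline <- time_us(data.frame(a = numeric()))   # template rebuilt each time
t.reuse <- time_us(df.tpl)                       # just a variable lookup
c(inline = t.inline, reuse = t.reuse)
```

The inline figure is the cost you would pay on every function call if the template were created inside the vetting expression.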

Alternatives

stopifnot

stopifnot is a fine solution in many cases, but the error messages that result from it can be cryptic, and it can take a fair bit of effort to set up comprehensive tests for complex objects. Additionally, it is slower except perhaps for the simplest of checks.

S4 Classes

The natural solution to enforcing structural requirements in R objects is to use S4 classes. Unfortunately S3 objects are so prevalent in R that a “backwards compatible” mechanism for enforcing structural requirements is warranted.

Valaddin

valaddin is a great package by Eugene Ha. Both valaddin and vetr are primarily intended for function input vetting.

valaddin works by modifying function objects, which makes it easy to add validation to existing functions in the workspace or functions from packages you do not control. The interface is very flexible and allows you to specify pretty much any vetting requirement you wish.

vetr requires you to modify the source of a function or to explicitly wrap a function in another. vetr does have some advantages though:

valaddin definitely has its own advantages though, so you should review its vignette to decide what works best for you.

Other Third Party Packages


¹ We take some liberties in this example for clarity. For instance, alike returns a character vector on failure, not FALSE, so really what vet is doing is isTRUE(alike(...)).