In the ideal case, all it takes to start using futures in R is to replace select standard assignments (<-
) in your R code with future assignments (%<-%
) and make sure the right-hand side (RHS) expressions are within curly brackets ({ ... }
). Also, if you assign these to lists (e.g. in a for loop), you need to use a list environment (listenv
) instead of a plain list.
However, there are few cases where you have to take extra precautions. These are often related to how global variables are falsely identified in non-standard evaluation (NSE), e.g. subset(data, x < 3)
. Global variables need to be identified when futures are created, but this is a particularly hard task when non-standard evaluation is involved.
If you identify other use cases, please consider reporting them so they can be documented here and possibly even be fixed.
Consider the following use of subset()
:
> data <- data.frame(x=1:5, y=1:5)
> v <- subset(data, x < 3)$y
> v
[1] 1 2
From a static code inspection point of view, the expression x < 3
asks for variable x
to be compared to 3, and there is nothing specifying that x
is part of data
and not the global environment. That x
is indeed part of the data
object can only safely be inferred at runtime when subset()
is called. This is not a problem in the above snippet, but when using futures all global/unknown variables need to be captured when the future is created (it is too late to do it when the future is resolved), e.g.
> library(future)
> data <- data.frame(x=1:5, y=1:5)
> v %<-% subset(data, x < 3)$y
Error in globalsOf(expr, envir = envir, tweak = tweakExpression, dotdotdot = "return", :
Identified a global by static code inspection, but failed to locate the corresponding
object in the relevant environments: 'x'
Above, code inspection of the future expression subset(data, x < 3)$y
incorrectly identifies x
as a global variables that needs to be captured (“frozen”) for the (lazy) future. Since no such variable x
exists, we get an error.
A clearer and backward-compatible solution to this problem is to explicitly specify the context of x
, i.e.
> data <- data.frame(x=1:5, y=1:5)
> v %<-% subset(data, data$x < 3)$y
> v
[1] 1 2
An alternative is to use a dummy variable. In contrast to the code-inspection algorithm used to identify globals, we know from reading the documentation that subset()
will look for x
in the data
object, not in the parent environments. Armed with this knowledge, we can trick the future package (more precisely the globals package) to pick up a dummy variable x
instead, e.g.
> data <- data.frame(x=1:5, y=1:5)
> x <- NULL ## To please future et al.
> v %<-% subset(data, x < 3)$y
> v
[1] 1 2
Another common use case for non-standard evaluation is when creating ggplot2 figures. For instance, in
> library(ggplot2)
> p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
> p
fields mpg
and wt
of the mtcars
data object are plotted against each other. That mpg
and wt
are fields of mtcars
cannot be inferred from code inspection alone, but you need to know that that is how ggplot2 works. Analogously to the above subset()
example, this explains why we get the following error:
> library(future)
> library(ggplot2)
> p %<-% { ggplot(mtcars, aes(wt, mpg)) + geom_point() }
Error in globalsOf(expr, envir = envir, tweak = tweakExpression, dotdotdot = "return", :
Identified a global by static code inspection, but failed to locate the corresponding
object in the relevant environments: 'wt'
A few comments are needed here. First of all, because %<-%
has higher precedence than +
, we need to place all of the ggplot2 expression within curly brackets, otherwise, we get an error. Second, the reason for only wt
being listed as a missing global variable and not mpg
is because the latter is (incorrectly) located to be ggplot2::mpg
.
One workaround is to make use of the *_string()
functions of ggplot2, e.g.
> p %<-% { ggplot(mtcars, aes_string('wt', 'mpg')) + geom_point() }
> p
Another one, is to explicitly specify mtcars$wt
and mtcars$mpg
, which may become a bit tedious.
A third alternative is to make use of dummy variables wt
and mpg
, i.e.
> p %<-% {
+ wt <- mpg <- NULL
+ ggplot(mtcars, aes(wt, mpg)) + geom_point()
+ }
> p
By the way, since all futures are evaluated in a local environment, the dummy variables are not assigned to the calling environment.
When a global variable is a vector, a matrix, a list, a data frame, an environment, or any other type of object that can be assigned via subsetting, the global package fails to identify it as a global variable if its first occurrence in the future expression is as part of a subsetting assignment. For example,
> library(future)
> x <- matrix(1:12, nrow=3, ncol=4)
> y %<-% {
+ x[1,1] <- 3
+ 42
+ }
> rm(x)
> y
Error in x[1, 1] <- 3 : object 'x' not found
Another example is
> library(future)
> x <- list(a=1, b=2)
> y %<-% {
+ x$c <- 3
+ 42
+ }
> rm(x)
> y
Error in x$c <- 3 : object 'x' not found
A workaround is to explicitly tell the future package about the global variable by simply listing it at the beginning of the expression, e.g.
> library(future)
> x <- list(a=1, b=2)
> y %<-% {
+ x ## Force 'x' to be global
+ x$c <- 3
+ 42
+ }
> rm(x)
> y
[1] 42
When calling a function using do.call()
make sure to specify the function as the object itself and not by name. This will help identify the function as a global object in the future expression. For instance, use
do.call(file_ext, list("foo.txt"))
instead of
do.call("file_ext", list("foo.txt"))
so that file_ext()
is properly located and exported. Although you may not notice a difference when evaluating futures in the same R session, it may become a problem if you use a character string instead of a function object when futures are evaluated in external R sessions, such as on a cluster.
It may also become a problem with lazy futures if the intended function is redefined after the future is resolved. For example,
> library("future")
> library("listenv")
> library("tools")
> plan(lazy)
> pathnames <- c("foo.txt", "bar.png", "yoo.md")
> res <- listenv()
> for (ii in seq_along(pathnames)) {
+ res[[ii]] %<-% do.call("file_ext", list(pathnames[ii]))
+ }
> file_ext <- function(...) "haha!"
> unlist(res)
[1] "haha!" "haha!" "haha!"
It is not possible for a future to resolve another one unless it was created by the future trying to resolve it. For instance, the following gives an error:
> library("future")
> plan(multiprocess)
> f1 <- future({ Sys.getpid() })
> f2 <- future({ value(f1) })
> v1 <- value(f1)
[1] 7464
> v2 <- value(f2)
Error: Invalid usage of futures: A future whose value has not yet been collected
can only be queried by the R process (cdd013cb-e045-f4a5-3977-9f064c31f188; pid
1276 on MyMachine) that created it, not by any other R processes (5579f789-e7b6
-bace-c50d-6c7a23ddb5a3; pid 2352 on MyMachine): {; Sys.getpid(); }
This is because the main R process creates two futures, but then the second future tries to retrieve the value of the first one. This is an invalid request because the second future has no channel to communicate with the first future; it is only the process that created a future who can communicate with it(*).
Note that it is only unresolved futures that cannot be queried this way. Thus, the solution to the above problem is to make sure all futures are resolved before they are passed to other futures, e.g.
> f1 <- future({ Sys.getpid() })
> v1 <- value(f1)
> v1
[1] 7464
> f2 <- future({ value(f1) })
> v2 <- value(f2)
> v2
[1] 7464
This works because the value has already been collected and stored inside future f1
before future f2
is created. Since the value is already stored internally, value(f1)
is readily available everywhere. Of course, instead of using value(f1)
for the second future, it would be more readable and cleaner to simply use v1
.
The above is typically not a problem when future assignments are used. For example:
> v1 %<-% { Sys.getpid() })
> v2 %<-% { v1 }
> v1
[1] 2352
> v2
[1] 2352
The reason that this approach works out of the box is because in the second future assignment v1
is identified as a global variable, which is retrieved. Up to this point, v1
is a promise (“delayed assignment” in R), but when it is retrieved as a global variable its value is resolved and v1
becomes a regular variable.
However, there are cases where future assignments can be passed via global variables without being resolved. This can happen if the future assignment is done to an element of an environment (including list environments). For instance,
> library("listenv")
> x <- listenv()
> x$a %<-% { Sys.getpid() }
> x$b %<-% { x$a }
> x$a
[1] 2352
> x$b
Error: Invalid usage of futures: A future whose value has not yet been collected
can only be queried by the R process (cdd013cb-e045-f4a5-3977-9f064c31f188; pid
1276 on localhost) that created it, not by any other R processes (2ce86ccd-5854
-7a05-1373-e1b20022e4d8; pid 7464 on localhost): {; Sys.getpid(); }
As previously, this can be avoided by making sure x$a
is resolved first, which can be one in various ways, e.g. dummy <- x$a
, resolve(x$a)
and force(x$a)
.
Footnote: (*) Although certain types of futures, such as eager and lazy ones, could be passed on to other futures and be resolved there because they share the same evaluation process, by definition of the Future API it is invalid to do so regardless of future type. This conservative approach is taken in order to make future expressions behave consistently regardless of the type of future used.
The future assignment operator %<-%
is a binary infix operator, which means it has higher precedence than most other binary operators but also higher than some of the unary operators in R. For instance, this explains why we get the following error:
> x %<-% 2 * runif(1)
Error in x %<-% 2 * runif(1) : non-numeric argument to binary operator
What effectively is happening here is that because of the higher priority of %<-%
, we first create a future x %<-% 2
and then we try to multiply it (not its value) with the value of runif(1)
- which makes no sense. In order to properly assign the future variable, we need to put the future expression within curly brackets;
> x %<-% { 2 * runif(1) }
> x
[1] 1.030209
Parentheses will also do. For details on precedence on operators in R, see Section 'Infix and prefix operators' in the 'R Language Definition' document.
Copyright Henrik Bengtsson, 2015-2016