With the `downsize` package, you can toggle the test and production versions of your workflow with the flip of a `TRUE`/`FALSE` global option. This is helpful when your workflow takes a long time to run, you want to test it quickly, and unit testing is too reductionist to cover everything.
Say you want to analyze a large dataset.
```r
big_data <- data.frame(x = rnorm(1e4), y = rnorm(1e4))
```
But for the sake of time, you want to test and debug your code on a smaller dataset. In your code, select your dataset with a call to `downsize()`.

```r
my_data <- downsize(big_data) # downsize(big = big_data)
```
Above, `my_data` becomes `big_data` if `getOption("downsize")` is `FALSE` or `NULL` (the default). If `getOption("downsize")` is `TRUE`, `my_data` becomes `head(big_data)`. You can toggle the global option `downsize` with calls to `scale_up()` and `scale_down()`, and you can override the option with `downsize(..., downsize = L)`, where `L` is `TRUE` or `FALSE`. Check whether the workflow is scaled up or down with the `scaling()` function.
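For example, here is a minimal sketch of the per-call override, assuming the `head()` default described above:

```r
library(downsize)
scale_down()                     # sets the downsize option to TRUE
downsize(1:10)                   # scaled down: head(1:10), i.e. 1 2 3 4 5 6
downsize(1:10, downsize = FALSE) # per-call override: returns all of 1:10
```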
Here is an example script in test mode.
```r
library(downsize)
scale_down() # enter test mode: the downsize option becomes TRUE
scaling() # shows whether the workflow is scaled up or down
big_data <- data.frame(x = rnorm(1e4), y = rnorm(1e4)) # always large
my_data <- downsize(big_data) # either large or small
nrow(my_data) # responds to scale_down() and scale_up()
# ...more code, time-consuming if my_data is large...
```
To scale up the workflow to production mode, replace `scale_down()` with `scale_up()` and leave everything else exactly the same.
```r
library(downsize)
scale_up() # enter production mode: the downsize option becomes FALSE
scaling() # shows whether the workflow is scaled up or down
big_data <- data.frame(x = rnorm(1e4), y = rnorm(1e4)) # always large
my_data <- downsize(big_data) # either large or small
nrow(my_data) # responds to scale_down() and scale_up()
# ...more code, time-consuming if my_data is large...
```
Thus, tedium and human error are avoided, and the test is a close approximation to the original task at hand.
You can provide a replacement for `big_data` using the `small` argument of `downsize()`.
```r
library(downsize)
big_data <- data.frame(x = rnorm(1e4), y = rnorm(1e4))
small_data <- data.frame(x = runif(16), y = runif(16))
scale_down()
scaling() # getOption("downsize") is TRUE
## [1] "scaled down"
my_data <- downsize(big_data, small_data) # downsize(big = big_data, small = small_data)
identical(my_data, small_data)
## [1] TRUE
```
If you set `small` yourself, be sure that subsequent code can accept both `small` and `big`. For example, if `small` is a data frame and `big` is a matrix, your code may work fine in test mode and break in production mode. In addition, `downsize()` will warn you if `small` is identical to, or bigger in memory than, `big` (disable this warning with `downsize(..., warn = FALSE)`). To be safer, use the subsetting capabilities of the `downsize()` function.
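For instance, a brief sketch of the warning behavior (the exact warning message is not shown here):

```r
library(downsize)
scale_down()
# small is bigger in memory than big, so downsize() should warn.
downsize(big = 1:2, small = 1:1000)
# The same call with warn = FALSE suppresses the warning.
downsize(big = 1:2, small = 1:1000, warn = FALSE)
```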
The command `my_data <- downsize(big = big_data)` is equivalent to `my_data <- downsize(big = big_data, nrow = 6)`. There are multiple ways to subset the `big` argument of `downsize()` when it is time to scale down. As in the following examples, be sure that `small` is set to `NULL` (the default).
```r
scale_down()
downsize(1:10, length = 2)
## [1] 1 2
m <- matrix(1:36, ncol = 6)
downsize(m, ncol = 2)
##      [,1] [,2]
## [1,]    1    7
## [2,]    2    8
## [3,]    3    9
## [4,]    4   10
## [5,]    5   11
## [6,]    6   12
downsize(m, nrow = 2)
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    7   13   19   25   31
## [2,]    2    8   14   20   26   32
downsize(m, dim = c(2, 2))
##      [,1] [,2]
## [1,]    1    7
## [2,]    2    8
downsize(data.frame(x = 1:10, y = 1:10), nrow = 5)
##   x y
## 1 1 1
## 2 2 2
## 3 3 3
## 4 4 4
## 5 5 5
x <- array(0, dim = c(10, 100, 2, 300, 12))
dim(x)
## [1]  10 100   2 300  12
my_array <- downsize(x, dim = rep(3, 5))
dim(my_array)
## [1] 3 3 2 3 3
my_array <- downsize(x, dim = c(1, 4))
dim(my_array)
## [1]   1   4   2 300  12
my_array <- downsize(x, ncol = 1)
dim(my_array)
## [1]  10   1   2 300  12
```
Set `random` to `TRUE` to take a random subset of your data rather than just the first few rows or columns.
```r
set.seed(6)
downsize(m, ncol = 2, random = TRUE)
##      [,1] [,2]
## [1,]   19   25
## [2,]   20   26
## [3,]   21   27
## [4,]   22   28
## [5,]   23   29
## [6,]   24   30
```
You can interchange entire blocks of code based on the scaling of the workflow.
```r
scale_down()
downsize(big = {a = 1; a + 10}, small = {a = 1; a + 1})
## [1] 2
scale_up()
downsize(big = {a = 1; a + 10}, small = {a = 1; a + 1})
## [1] 11
```
Variables set in code blocks are available after calls to `downsize()`.
```r
scale_down()
tmp <- downsize(
  big = {
    x = "long code"
    y = 1000
  },
  small = {
    x = "short code"
    y = 3.14
  })
x == "short code" & y == 3.14
## [1] TRUE
scale_up()
tmp <- downsize(
  big = {
    x = "long code"
    y = 1000
  },
  small = {
    x = "short code"
    y = 3.14
  })
x == "long code" & y == 1000
## [1] TRUE
```
The `downsize()` function checks the values of `big` and `small`, so any code blocks passed as arguments must return non-`NULL` values. To avoid warnings, the return value of `small` should be different from, and no larger in memory than, that of `big` (or just call `downsize(..., warn = FALSE)`). If subsetter arguments such as `dim` and `length` are used, they apply to the return value of `big` rather than to any variables defined in the code block.
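For instance, a small sketch of that last point, assuming the same `length` behavior as in the subsetting examples above:

```r
scale_down()
# length = 3 applies to the value the big block returns,
# not to the variables assigned inside the block.
downsize(big = {z <- 1:100; z}, length = 3)
# should print 1 2 3, by analogy with downsize(1:10, length = 2) above
```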
If you run into trouble, please refer to TROUBLESHOOTING.md on the GitHub page.