As part of a reproducible workflow, caching of function calls, code chunks, and other elements of a project is a critical component. The objective of a reproducible workflow is likely that an entire workflow from raw data to publication, decision support, report writing, presentation building, etc., could be built and be reproducible anywhere, on any computer, operating system, with any starting conditions, on demand. The reproducible::Cache function is built to work with any R function.
Cache uses 2 key archivist functions, saveToLocalRepo and loadFromLocalRepo, but does not use archivist::cache. Similar to archivist::cache, there is some reliance on digest::digest to determine whether the arguments are identical in subsequent iterations; however, it also uses fastdigest::fastdigest to make it substantially faster in many cases. It also handles many cases in which standard caching with digest::digest does not work reliably between systems. For these, the function .robustDigest is introduced to make caching transferable between systems. This is relevant for file paths, environments, parallel clusters, functions (which are contained within an environment), and many others (e.g., see ?.robustDigest for methods). Cache also adds important elements like automated tagging and the option to retrieve disk-cached values via stashed objects in memory using memoise::memoise. This means that running Cache 1, 2, and 3 times on the same function will get progressively faster. This can be extremely useful for web apps built with, say, shiny.
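The layering described above (hash the call, look in memory, then on disk, then compute) can be illustrated with a minimal base R sketch. This is not how reproducible::Cache is implemented: simpleCache, hashCall, and memoryLayer are hypothetical names, the hash is simply an md5 of the serialized function name and arguments computed with tools::md5sum, and none of the cross-system robustness of .robustDigest is attempted.

```r
# Hypothetical sketch of a hash-keyed cache with a memory layer over a disk layer.
memoryLayer <- new.env()

hashCall <- function(funName, args) {
  # Serialize the function name and arguments, then md5 the resulting bytes
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeBin(serialize(list(funName, args), connection = NULL), tmp)
  unname(tools::md5sum(tmp))
}

simpleCache <- function(FUN, ..., cacheDir) {
  key <- hashCall(deparse(substitute(FUN)), list(...))
  file <- file.path(cacheDir, paste0(key, ".rds"))
  if (exists(key, envir = memoryLayer, inherits = FALSE)) {
    return(get(key, envir = memoryLayer))   # fastest: already in memory
  }
  if (file.exists(file)) {
    out <- readRDS(file)                    # slower: read the saved result from disk
  } else {
    out <- FUN(...)                         # slowest: compute, then save to disk
    saveRDS(out, file)
  }
  assign(key, out, envir = memoryLayer)     # stash in memory for the next call
  out
}

cacheDir <- file.path(tempdir(), "simpleCacheSketch")
dir.create(cacheDir, showWarnings = FALSE, recursive = TRUE)
first  <- simpleCache(rnorm, 5, 10, cacheDir = cacheDir)  # computed and saved
second <- simpleCache(rnorm, 5, 10, cacheDir = cacheDir)  # returned from memory
rm(list = ls(memoryLayer), envir = memoryLayer)           # empty the memory layer
third  <- simpleCache(rnorm, 5, 10, cacheDir = cacheDir)  # returned from disk
```

All three results are the same random draw, because only the first call actually runs rnorm.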
Any function can be cached using: Cache(FUN = functionName, ...).
This will be a slight change to a function call, such as: projectRaster(raster, crs = crs(newRaster)) to Cache(projectRaster, raster, crs = crs(newRaster)).
This is particularly useful for expensive operations.
library(raster)
## Loading required package: sp
library(reproducible)
tmpDir <- file.path(tempdir(), "reproducible_examples", "Cache")
checkPath(tmpDir, create = TRUE)
## [1] "/tmp/RtmprXj7l8/reproducible_examples/Cache"
ras <- raster(extent(0,1000,0,1000), vals = 1:1e6, res = 1)
crs(ras) <- "+proj=lcc +lat_1=48 +lat_2=33 +lon_0=-100 +ellps=WGS84"
newCRS <- "+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
# No Cache
system.time(map1 <- projectRaster(ras, crs = newCRS))
## user system elapsed
## 3.216 0.228 3.446
# With Cache -- a little slower the first time because saving to disk
system.time(map1 <- Cache(projectRaster, ras, crs = newCRS, cacheRepo = tmpDir,
                          notOlderThan = Sys.time()))
## user system elapsed
## 2.921 0.220 3.239
# vastly faster the second time
system.time(map2 <- Cache(projectRaster, ras, crs = newCRS, cacheRepo = tmpDir))
## loading cached result from previous projectRaster call, adding to memoised copy
## user system elapsed
## 0.167 0.007 0.174
# even faster the third time
system.time(map3 <- Cache(projectRaster, ras, crs = newCRS, cacheRepo = tmpDir))
## loading memoised result from previous projectRaster call.
## user system elapsed
## 0.051 0.003 0.055
all.equal(map1, map2) # TRUE
## [1] TRUE
all.equal(map1, map3) # TRUE
## [1] TRUE
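The first Cache call above passes notOlderThan = Sys.time(), which forces a fresh computation because any existing entry was necessarily created before that moment. A minimal sketch of that freshness test follows; isUsable is a hypothetical helper, and treating an entry created exactly at the cutoff as usable is an assumption.

```r
# Hypothetical helper: a cached entry is usable only if it is at least as new
# as notOlderThan (or if no age limit was given at all).
isUsable <- function(createdAt, notOlderThan = NULL) {
  is.null(notOlderThan) || createdAt >= notOlderThan
}

createdAt <- as.POSIXct("2018-06-15 10:23:32", tz = "UTC")
noLimit  <- isUsable(createdAt)                                  # TRUE
tooOld   <- isUsable(createdAt, notOlderThan = createdAt + 1)    # FALSE: recompute
recent   <- isUsable(createdAt, notOlderThan = createdAt - 3600) # TRUE
```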
library(raster)
# magrittr, if loaded, gives an error below
try(detach("package:magrittr", unload = TRUE), silent = TRUE)
try(clearCache(tmpDir), silent = TRUE) # just to make sure it is clear
ranNumsA <- Cache(rnorm, 10, 16, cacheRepo = tmpDir)
# All same
ranNumsB <- Cache(rnorm, 10, 16, cacheRepo = tmpDir) # recovers cached copy
## loading cached result from previous rnorm call, adding to memoised copy
ranNumsC <- rnorm(10, 16) %>% Cache(cacheRepo = tmpDir) # recovers cached copy
## loading memoised result from previous 'rnorm' pipe sequence call.
ranNumsD <- Cache(quote(rnorm(n = 10, 16)), cacheRepo = tmpDir) # recovers cached copy
## loading memoised result from previous rnorm call.
# Any minor change makes it different
ranNumsE <- rnorm(10, 6) %>% Cache(cacheRepo = tmpDir) # different
ranNumsA <- Cache(rnorm, 4, cacheRepo = tmpDir, userTags = "objectName:a")
ranNumsB <- Cache(runif, 4, cacheRepo = tmpDir, userTags = "objectName:b")
# access it again, from Cache
ranNumsA <- Cache(rnorm, 4, cacheRepo = tmpDir, userTags = "objectName:a")
## loading cached result from previous rnorm call, adding to memoised copy
wholeCache <- showCache(tmpDir)
## Cache size:
## Total (including Rasters): 12.3 Kb
## Selected objects (not including Rasters): 337 bytes
# keep only items accessed "recently" (i.e., only objectName:a)
onlyRecentlyAccessed <- showCache(tmpDir, userTags = max(wholeCache[tagKey == "accessed"]$tagValue))
## Cache size:
## Total (including Rasters): 12.3 Kb
## Selected objects (not including Rasters): 337 bytes
# inverse join with 2 data.tables ... using: a[!b]
# i.e., return all of wholeCache that was not recently accessed
toRemove <- unique(wholeCache[!onlyRecentlyAccessed], by = "artifact")$artifact
clearCache(tmpDir, toRemove) # remove ones not recently accessed
## Cache size:
## Total (including Rasters): 12.3 Kb
## Selected objects (not including Rasters): 337 bytes
showCache(tmpDir) # still has more recently accessed
## Cache size:
## Total (including Rasters): 12 Kb
## Selected objects (not including Rasters): 0 bytes
## Empty data.table (0 rows) of 3 cols: md5hash,name,createdDate
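The inverse join used above, wholeCache[!onlyRecentlyAccessed], keeps the rows of the first table that have no match in the second. The same idea can be sketched in base R at the level of artifact identifiers; the data below are made up, and matching on the artifact column alone is an assumption about how the two tables are keyed.

```r
# Made-up stand-ins for the tables returned by showCache()
wholeCache <- data.frame(
  artifact = c("x1", "x1", "x2", "x2"),
  tagKey   = c("function", "accessed", "function", "accessed"),
  stringsAsFactors = FALSE
)
onlyRecentlyAccessed <- wholeCache[wholeCache$artifact == "x1", ]

# Anti-join on artifact: everything in wholeCache that is not in the recent set
toRemove <- setdiff(unique(wholeCache$artifact),
                    unique(onlyRecentlyAccessed$artifact))
```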
clearCache(tmpDir)
ranNumsA <- Cache(rnorm, 4, cacheRepo = tmpDir, userTags = "objectName:a")
ranNumsB <- Cache(runif, 4, cacheRepo = tmpDir, userTags = "objectName:b")
# keep only those cached items from the last 24 hours
oneDay <- 60 * 60 * 24
keepCache(tmpDir, after = Sys.time() - oneDay)
## Cache size:
## Total (including Rasters): 12.3 Kb
## Selected objects (not including Rasters): 337 bytes
## Cache size:
## Total (including Rasters): 12.3 Kb
## Selected objects (not including Rasters): 337 bytes
## artifact tagKey
## 1: 29ed7601511ded2524bd4033d3c376b7 format
## 2: 29ed7601511ded2524bd4033d3c376b7 name
## 3: 29ed7601511ded2524bd4033d3c376b7 class
## 4: 29ed7601511ded2524bd4033d3c376b7 date
## 5: 29ed7601511ded2524bd4033d3c376b7 cacheId
## 6: 29ed7601511ded2524bd4033d3c376b7 objectName
## 7: 29ed7601511ded2524bd4033d3c376b7 function
## 8: 29ed7601511ded2524bd4033d3c376b7 object.size
## 9: 29ed7601511ded2524bd4033d3c376b7 accessed
## 10: 29ed7601511ded2524bd4033d3c376b7 otherFunctions
## 11: 29ed7601511ded2524bd4033d3c376b7 otherFunctions
## 12: 29ed7601511ded2524bd4033d3c376b7 otherFunctions
## 13: 29ed7601511ded2524bd4033d3c376b7 otherFunctions
## 14: 29ed7601511ded2524bd4033d3c376b7 otherFunctions
## 15: 29ed7601511ded2524bd4033d3c376b7 preDigest
## 16: 29ed7601511ded2524bd4033d3c376b7 preDigest
## 17: 9f857dee1e70c48ab0bc195ab49203dd format
## 18: 9f857dee1e70c48ab0bc195ab49203dd name
## 19: 9f857dee1e70c48ab0bc195ab49203dd class
## 20: 9f857dee1e70c48ab0bc195ab49203dd date
## 21: 9f857dee1e70c48ab0bc195ab49203dd cacheId
## 22: 9f857dee1e70c48ab0bc195ab49203dd objectName
## 23: 9f857dee1e70c48ab0bc195ab49203dd function
## 24: 9f857dee1e70c48ab0bc195ab49203dd object.size
## 25: 9f857dee1e70c48ab0bc195ab49203dd accessed
## 26: 9f857dee1e70c48ab0bc195ab49203dd otherFunctions
## 27: 9f857dee1e70c48ab0bc195ab49203dd otherFunctions
## 28: 9f857dee1e70c48ab0bc195ab49203dd otherFunctions
## 29: 9f857dee1e70c48ab0bc195ab49203dd otherFunctions
## 30: 9f857dee1e70c48ab0bc195ab49203dd otherFunctions
## 31: 9f857dee1e70c48ab0bc195ab49203dd preDigest
## 32: 9f857dee1e70c48ab0bc195ab49203dd preDigest
## artifact tagKey
## tagValue createdDate
## 1: rda 2018-06-15 10:23:32
## 2: Cache 2018-06-15 10:23:32
## 3: numeric 2018-06-15 10:23:32
## 4: 2018-06-15 10:23:32 2018-06-15 10:23:32
## 5: e37bb635c97bc2eeecab63816b881bbc 2018-06-15 10:23:32
## 6: b 2018-06-15 10:23:32
## 7: runif 2018-06-15 10:23:32
## 8: 688 2018-06-15 10:23:32
## 9: 2018-06-15 10:23:32 2018-06-15 10:23:32
## 10: process_file 2018-06-15 10:23:32
## 11: process_group 2018-06-15 10:23:32
## 12: process_group.block 2018-06-15 10:23:32
## 13: call_block 2018-06-15 10:23:32
## 14: block_exec 2018-06-15 10:23:32
## 15: n:969a49ec15bcd4323ff31538af321264 2018-06-15 10:23:32
## 16: .FUN:d2631d24c3b38b89c7bdd4ab7faaaac3 2018-06-15 10:23:32
## 17: rda 2018-06-15 10:23:32
## 18: Cache 2018-06-15 10:23:32
## 19: numeric 2018-06-15 10:23:32
## 20: 2018-06-15 10:23:32 2018-06-15 10:23:32
## 21: 85874f26b2e0c1ef689a7d379d275ebf 2018-06-15 10:23:32
## 22: a 2018-06-15 10:23:32
## 23: rnorm 2018-06-15 10:23:32
## 24: 688 2018-06-15 10:23:32
## 25: 2018-06-15 10:23:32 2018-06-15 10:23:32
## 26: process_file 2018-06-15 10:23:32
## 27: process_group 2018-06-15 10:23:32
## 28: process_group.block 2018-06-15 10:23:32
## 29: call_block 2018-06-15 10:23:32
## 30: block_exec 2018-06-15 10:23:32
## 31: n:969a49ec15bcd4323ff31538af321264 2018-06-15 10:23:32
## 32: .FUN:7e9a928f110f80b3612e71883a6ec1f4 2018-06-15 10:23:32
## tagValue createdDate
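The time-based selection in keepCache(tmpDir, after = Sys.time() - oneDay) amounts to comparing each entry's timestamp with a cutoff. A minimal base R sketch with made-up entries follows; using the createdDate column for the comparison is an assumption.

```r
oneDay <- 60 * 60 * 24
now <- as.POSIXct("2018-06-15 10:23:32", tz = "UTC")

# Made-up cache entries: one created 3 days ago, one created a minute ago
entries <- data.frame(
  artifact = c("stale", "fresh"),
  createdDate = now - c(3 * oneDay, 60),
  stringsAsFactors = FALSE
)

# Keep only entries created within the last 24 hours
kept <- entries$artifact[entries$createdDate > now - oneDay]
```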
# Keep all Cache items created with an rnorm() call
keepCache(tmpDir, userTags = "rnorm")
## Cache size:
## Total (including Rasters): 12.3 Kb
## Selected objects (not including Rasters): 337 bytes
## Cache size:
## Total (including Rasters): 12.3 Kb
## Selected objects (not including Rasters): 171 bytes
## Cache size:
## Total (including Rasters): 12.3 Kb
## Selected objects (not including Rasters): 166 bytes
## artifact tagKey
## 1: 9f857dee1e70c48ab0bc195ab49203dd format
## 2: 9f857dee1e70c48ab0bc195ab49203dd name
## 3: 9f857dee1e70c48ab0bc195ab49203dd class
## 4: 9f857dee1e70c48ab0bc195ab49203dd date
## 5: 9f857dee1e70c48ab0bc195ab49203dd cacheId
## 6: 9f857dee1e70c48ab0bc195ab49203dd objectName
## 7: 9f857dee1e70c48ab0bc195ab49203dd function
## 8: 9f857dee1e70c48ab0bc195ab49203dd object.size
## 9: 9f857dee1e70c48ab0bc195ab49203dd accessed
## 10: 9f857dee1e70c48ab0bc195ab49203dd otherFunctions
## 11: 9f857dee1e70c48ab0bc195ab49203dd otherFunctions
## 12: 9f857dee1e70c48ab0bc195ab49203dd otherFunctions
## 13: 9f857dee1e70c48ab0bc195ab49203dd otherFunctions
## 14: 9f857dee1e70c48ab0bc195ab49203dd otherFunctions
## 15: 9f857dee1e70c48ab0bc195ab49203dd preDigest
## 16: 9f857dee1e70c48ab0bc195ab49203dd preDigest
## tagValue createdDate
## 1: rda 2018-06-15 10:23:32
## 2: Cache 2018-06-15 10:23:32
## 3: numeric 2018-06-15 10:23:32
## 4: 2018-06-15 10:23:32 2018-06-15 10:23:32
## 5: 85874f26b2e0c1ef689a7d379d275ebf 2018-06-15 10:23:32
## 6: a 2018-06-15 10:23:32
## 7: rnorm 2018-06-15 10:23:32
## 8: 688 2018-06-15 10:23:32
## 9: 2018-06-15 10:23:32 2018-06-15 10:23:32
## 10: process_file 2018-06-15 10:23:32
## 11: process_group 2018-06-15 10:23:32
## 12: process_group.block 2018-06-15 10:23:32
## 13: call_block 2018-06-15 10:23:32
## 14: block_exec 2018-06-15 10:23:32
## 15: n:969a49ec15bcd4323ff31538af321264 2018-06-15 10:23:32
## 16: .FUN:7e9a928f110f80b3612e71883a6ec1f4 2018-06-15 10:23:32
# Remove all Cache items that happened within a rnorm() call
clearCache(tmpDir, userTags = "rnorm")
## Cache size:
## Total (including Rasters): 12.2 Kb
## Selected objects (not including Rasters): 171 bytes
showCache(tmpDir) ## empty
## Cache size:
## Total (including Rasters): 12 Kb
## Selected objects (not including Rasters): 0 bytes
## Empty data.table (0 rows) of 3 cols: md5hash,name,createdDate
# default userTags is "and" matching; for "or" matching use |
ranNumsA <- Cache(runif, 4, cacheRepo = tmpDir, userTags = "objectName:a")
ranNumsB <- Cache(rnorm, 4, cacheRepo = tmpDir, userTags = "objectName:b")
# show all objects (runif and rnorm in this case)
showCache(tmpDir)
## Cache size:
## Total (including Rasters): 12.3 Kb
## Selected objects (not including Rasters): 338 bytes
## artifact tagKey
## 1: 35858207db8172f61dd3740656f776c1 format
## 2: 35858207db8172f61dd3740656f776c1 name
## 3: 35858207db8172f61dd3740656f776c1 class
## 4: 35858207db8172f61dd3740656f776c1 date
## 5: 35858207db8172f61dd3740656f776c1 cacheId
## 6: 35858207db8172f61dd3740656f776c1 objectName
## 7: 35858207db8172f61dd3740656f776c1 function
## 8: 35858207db8172f61dd3740656f776c1 object.size
## 9: 35858207db8172f61dd3740656f776c1 accessed
## 10: 35858207db8172f61dd3740656f776c1 otherFunctions
## 11: 35858207db8172f61dd3740656f776c1 otherFunctions
## 12: 35858207db8172f61dd3740656f776c1 otherFunctions
## 13: 35858207db8172f61dd3740656f776c1 otherFunctions
## 14: 35858207db8172f61dd3740656f776c1 otherFunctions
## 15: 35858207db8172f61dd3740656f776c1 preDigest
## 16: 35858207db8172f61dd3740656f776c1 preDigest
## 17: dd46a68e7d2a13babc75346de9b9db9e format
## 18: dd46a68e7d2a13babc75346de9b9db9e name
## 19: dd46a68e7d2a13babc75346de9b9db9e class
## 20: dd46a68e7d2a13babc75346de9b9db9e date
## 21: dd46a68e7d2a13babc75346de9b9db9e cacheId
## 22: dd46a68e7d2a13babc75346de9b9db9e objectName
## 23: dd46a68e7d2a13babc75346de9b9db9e function
## 24: dd46a68e7d2a13babc75346de9b9db9e object.size
## 25: dd46a68e7d2a13babc75346de9b9db9e accessed
## 26: dd46a68e7d2a13babc75346de9b9db9e otherFunctions
## 27: dd46a68e7d2a13babc75346de9b9db9e otherFunctions
## 28: dd46a68e7d2a13babc75346de9b9db9e otherFunctions
## 29: dd46a68e7d2a13babc75346de9b9db9e otherFunctions
## 30: dd46a68e7d2a13babc75346de9b9db9e otherFunctions
## 31: dd46a68e7d2a13babc75346de9b9db9e preDigest
## 32: dd46a68e7d2a13babc75346de9b9db9e preDigest
## artifact tagKey
## tagValue createdDate
## 1: rda 2018-06-15 10:23:32
## 2: Cache 2018-06-15 10:23:32
## 3: numeric 2018-06-15 10:23:32
## 4: 2018-06-15 10:23:32 2018-06-15 10:23:32
## 5: e37bb635c97bc2eeecab63816b881bbc 2018-06-15 10:23:32
## 6: a 2018-06-15 10:23:32
## 7: runif 2018-06-15 10:23:32
## 8: 688 2018-06-15 10:23:32
## 9: 2018-06-15 10:23:32 2018-06-15 10:23:32
## 10: process_file 2018-06-15 10:23:32
## 11: process_group 2018-06-15 10:23:32
## 12: process_group.block 2018-06-15 10:23:32
## 13: call_block 2018-06-15 10:23:32
## 14: block_exec 2018-06-15 10:23:32
## 15: n:969a49ec15bcd4323ff31538af321264 2018-06-15 10:23:32
## 16: .FUN:d2631d24c3b38b89c7bdd4ab7faaaac3 2018-06-15 10:23:32
## 17: rda 2018-06-15 10:23:32
## 18: Cache 2018-06-15 10:23:32
## 19: numeric 2018-06-15 10:23:32
## 20: 2018-06-15 10:23:32 2018-06-15 10:23:32
## 21: 85874f26b2e0c1ef689a7d379d275ebf 2018-06-15 10:23:32
## 22: b 2018-06-15 10:23:32
## 23: rnorm 2018-06-15 10:23:32
## 24: 688 2018-06-15 10:23:32
## 25: 2018-06-15 10:23:32 2018-06-15 10:23:32
## 26: process_file 2018-06-15 10:23:32
## 27: process_group 2018-06-15 10:23:32
## 28: process_group.block 2018-06-15 10:23:32
## 29: call_block 2018-06-15 10:23:32
## 30: block_exec 2018-06-15 10:23:32
## 31: n:969a49ec15bcd4323ff31538af321264 2018-06-15 10:23:32
## 32: .FUN:7e9a928f110f80b3612e71883a6ec1f4 2018-06-15 10:23:33
## tagValue createdDate
# show objects that are both runif and rnorm
# (i.e., none in this case, because objects are either one or the other, not both)
showCache(tmpDir, userTags = c("runif", "rnorm")) ## empty
## Cache size:
## Total (including Rasters): 12.3 Kb
## Selected objects (not including Rasters): 338 bytes
## Empty data.table (0 rows) of 4 cols: artifact,tagKey,tagValue,createdDate
# show objects that are either runif or rnorm ("or" search)
showCache(tmpDir, userTags = "runif|rnorm")
## Cache size:
## Total (including Rasters): 12.3 Kb
## Selected objects (not including Rasters): 338 bytes
## artifact tagKey
## 1: 35858207db8172f61dd3740656f776c1 format
## 2: 35858207db8172f61dd3740656f776c1 name
## 3: 35858207db8172f61dd3740656f776c1 class
## 4: 35858207db8172f61dd3740656f776c1 date
## 5: 35858207db8172f61dd3740656f776c1 cacheId
## 6: 35858207db8172f61dd3740656f776c1 objectName
## 7: 35858207db8172f61dd3740656f776c1 function
## 8: 35858207db8172f61dd3740656f776c1 object.size
## 9: 35858207db8172f61dd3740656f776c1 accessed
## 10: 35858207db8172f61dd3740656f776c1 otherFunctions
## 11: 35858207db8172f61dd3740656f776c1 otherFunctions
## 12: 35858207db8172f61dd3740656f776c1 otherFunctions
## 13: 35858207db8172f61dd3740656f776c1 otherFunctions
## 14: 35858207db8172f61dd3740656f776c1 otherFunctions
## 15: 35858207db8172f61dd3740656f776c1 preDigest
## 16: 35858207db8172f61dd3740656f776c1 preDigest
## 17: dd46a68e7d2a13babc75346de9b9db9e format
## 18: dd46a68e7d2a13babc75346de9b9db9e name
## 19: dd46a68e7d2a13babc75346de9b9db9e class
## 20: dd46a68e7d2a13babc75346de9b9db9e date
## 21: dd46a68e7d2a13babc75346de9b9db9e cacheId
## 22: dd46a68e7d2a13babc75346de9b9db9e objectName
## 23: dd46a68e7d2a13babc75346de9b9db9e function
## 24: dd46a68e7d2a13babc75346de9b9db9e object.size
## 25: dd46a68e7d2a13babc75346de9b9db9e accessed
## 26: dd46a68e7d2a13babc75346de9b9db9e otherFunctions
## 27: dd46a68e7d2a13babc75346de9b9db9e otherFunctions
## 28: dd46a68e7d2a13babc75346de9b9db9e otherFunctions
## 29: dd46a68e7d2a13babc75346de9b9db9e otherFunctions
## 30: dd46a68e7d2a13babc75346de9b9db9e otherFunctions
## 31: dd46a68e7d2a13babc75346de9b9db9e preDigest
## 32: dd46a68e7d2a13babc75346de9b9db9e preDigest
## artifact tagKey
## tagValue createdDate
## 1: rda 2018-06-15 10:23:32
## 2: Cache 2018-06-15 10:23:32
## 3: numeric 2018-06-15 10:23:32
## 4: 2018-06-15 10:23:32 2018-06-15 10:23:32
## 5: e37bb635c97bc2eeecab63816b881bbc 2018-06-15 10:23:32
## 6: a 2018-06-15 10:23:32
## 7: runif 2018-06-15 10:23:32
## 8: 688 2018-06-15 10:23:32
## 9: 2018-06-15 10:23:32 2018-06-15 10:23:32
## 10: process_file 2018-06-15 10:23:32
## 11: process_group 2018-06-15 10:23:32
## 12: process_group.block 2018-06-15 10:23:32
## 13: call_block 2018-06-15 10:23:32
## 14: block_exec 2018-06-15 10:23:32
## 15: n:969a49ec15bcd4323ff31538af321264 2018-06-15 10:23:32
## 16: .FUN:d2631d24c3b38b89c7bdd4ab7faaaac3 2018-06-15 10:23:32
## 17: rda 2018-06-15 10:23:32
## 18: Cache 2018-06-15 10:23:32
## 19: numeric 2018-06-15 10:23:32
## 20: 2018-06-15 10:23:32 2018-06-15 10:23:32
## 21: 85874f26b2e0c1ef689a7d379d275ebf 2018-06-15 10:23:32
## 22: b 2018-06-15 10:23:32
## 23: rnorm 2018-06-15 10:23:32
## 24: 688 2018-06-15 10:23:32
## 25: 2018-06-15 10:23:32 2018-06-15 10:23:32
## 26: process_file 2018-06-15 10:23:32
## 27: process_group 2018-06-15 10:23:32
## 28: process_group.block 2018-06-15 10:23:32
## 29: call_block 2018-06-15 10:23:32
## 30: block_exec 2018-06-15 10:23:32
## 31: n:969a49ec15bcd4323ff31538af321264 2018-06-15 10:23:32
## 32: .FUN:7e9a928f110f80b3612e71883a6ec1f4 2018-06-15 10:23:33
## tagValue createdDate
# keep only objects that are either runif or rnorm ("or" search)
keepCache(tmpDir, userTags = "runif|rnorm")
## Cache size:
## Total (including Rasters): 12.3 Kb
## Selected objects (not including Rasters): 338 bytes
## Cache size:
## Total (including Rasters): 12.3 Kb
## Selected objects (not including Rasters): 338 bytes
## artifact tagKey
## 1: 35858207db8172f61dd3740656f776c1 format
## 2: 35858207db8172f61dd3740656f776c1 name
## 3: 35858207db8172f61dd3740656f776c1 class
## 4: 35858207db8172f61dd3740656f776c1 date
## 5: 35858207db8172f61dd3740656f776c1 cacheId
## 6: 35858207db8172f61dd3740656f776c1 objectName
## 7: 35858207db8172f61dd3740656f776c1 function
## 8: 35858207db8172f61dd3740656f776c1 object.size
## 9: 35858207db8172f61dd3740656f776c1 accessed
## 10: 35858207db8172f61dd3740656f776c1 otherFunctions
## 11: 35858207db8172f61dd3740656f776c1 otherFunctions
## 12: 35858207db8172f61dd3740656f776c1 otherFunctions
## 13: 35858207db8172f61dd3740656f776c1 otherFunctions
## 14: 35858207db8172f61dd3740656f776c1 otherFunctions
## 15: 35858207db8172f61dd3740656f776c1 preDigest
## 16: 35858207db8172f61dd3740656f776c1 preDigest
## 17: dd46a68e7d2a13babc75346de9b9db9e format
## 18: dd46a68e7d2a13babc75346de9b9db9e name
## 19: dd46a68e7d2a13babc75346de9b9db9e class
## 20: dd46a68e7d2a13babc75346de9b9db9e date
## 21: dd46a68e7d2a13babc75346de9b9db9e cacheId
## 22: dd46a68e7d2a13babc75346de9b9db9e objectName
## 23: dd46a68e7d2a13babc75346de9b9db9e function
## 24: dd46a68e7d2a13babc75346de9b9db9e object.size
## 25: dd46a68e7d2a13babc75346de9b9db9e accessed
## 26: dd46a68e7d2a13babc75346de9b9db9e otherFunctions
## 27: dd46a68e7d2a13babc75346de9b9db9e otherFunctions
## 28: dd46a68e7d2a13babc75346de9b9db9e otherFunctions
## 29: dd46a68e7d2a13babc75346de9b9db9e otherFunctions
## 30: dd46a68e7d2a13babc75346de9b9db9e otherFunctions
## 31: dd46a68e7d2a13babc75346de9b9db9e preDigest
## 32: dd46a68e7d2a13babc75346de9b9db9e preDigest
## artifact tagKey
## tagValue createdDate
## 1: rda 2018-06-15 10:23:32
## 2: Cache 2018-06-15 10:23:32
## 3: numeric 2018-06-15 10:23:32
## 4: 2018-06-15 10:23:32 2018-06-15 10:23:32
## 5: e37bb635c97bc2eeecab63816b881bbc 2018-06-15 10:23:32
## 6: a 2018-06-15 10:23:32
## 7: runif 2018-06-15 10:23:32
## 8: 688 2018-06-15 10:23:32
## 9: 2018-06-15 10:23:32 2018-06-15 10:23:32
## 10: process_file 2018-06-15 10:23:32
## 11: process_group 2018-06-15 10:23:32
## 12: process_group.block 2018-06-15 10:23:32
## 13: call_block 2018-06-15 10:23:32
## 14: block_exec 2018-06-15 10:23:32
## 15: n:969a49ec15bcd4323ff31538af321264 2018-06-15 10:23:32
## 16: .FUN:d2631d24c3b38b89c7bdd4ab7faaaac3 2018-06-15 10:23:32
## 17: rda 2018-06-15 10:23:32
## 18: Cache 2018-06-15 10:23:32
## 19: numeric 2018-06-15 10:23:32
## 20: 2018-06-15 10:23:32 2018-06-15 10:23:32
## 21: 85874f26b2e0c1ef689a7d379d275ebf 2018-06-15 10:23:32
## 22: b 2018-06-15 10:23:32
## 23: rnorm 2018-06-15 10:23:32
## 24: 688 2018-06-15 10:23:32
## 25: 2018-06-15 10:23:32 2018-06-15 10:23:32
## 26: process_file 2018-06-15 10:23:32
## 27: process_group 2018-06-15 10:23:32
## 28: process_group.block 2018-06-15 10:23:32
## 29: call_block 2018-06-15 10:23:32
## 30: block_exec 2018-06-15 10:23:32
## 31: n:969a49ec15bcd4323ff31538af321264 2018-06-15 10:23:32
## 32: .FUN:7e9a928f110f80b3612e71883a6ec1f4 2018-06-15 10:23:33
## tagValue createdDate
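The difference between the default "and" matching and the "or" matching obtained with | can be sketched in base R. The tag table below is made up, matchTags is a hypothetical helper, and matching each pattern as a regular expression against tagValue is an assumption about how userTags is interpreted.

```r
# Made-up tag table with one runif artifact and one rnorm artifact
tags <- data.frame(
  artifact = c("a1", "a1", "a2", "a2"),
  tagKey   = c("function", "objectName", "function", "objectName"),
  tagValue = c("runif", "a", "rnorm", "b"),
  stringsAsFactors = FALSE
)

matchTags <- function(tags, userTags) {
  # For each pattern, the artifacts with at least one matching tagValue
  hits <- lapply(userTags, function(p) {
    unique(tags$artifact[grepl(p, tags$tagValue)])
  })
  # "and" matching: an artifact must be matched by every pattern
  Reduce(intersect, hits)
}

andMatch <- matchTags(tags, c("runif", "rnorm"))  # no artifact is both
orMatch  <- matchTags(tags, "runif|rnorm")        # either one qualifies
```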
clearCache(tmpDir)
ras <- raster(extent(0, 5, 0, 5), res = 1,
              vals = sample(1:5, replace = TRUE, size = 25),
              crs = "+proj=lcc +lat_1=48 +lat_2=33 +lon_0=-100 +ellps=WGS84")
# A slow operation, like a GIS operation
notCached <- suppressWarnings(
  # projectRaster generates warnings when run non-interactively
  projectRaster(ras, crs = crs(ras), res = 5)
)
cached <- suppressWarnings(
  # projectRaster generates warnings when run non-interactively
  # passing a quoted call, e.g. Cache(quote(projectRaster(...))), also works
  Cache(projectRaster, ras, crs = crs(ras), res = 5, cacheRepo = tmpDir)
)
# second time is much faster
reRun <- suppressWarnings(
  # projectRaster generates warnings when run non-interactively
  Cache(projectRaster, ras, crs = crs(ras), res = 5, cacheRepo = tmpDir)
)
## loading cached result from previous projectRaster call, adding to memoised copy
# recovered cached version is same as non-cached version
all.equal(notCached, reRun) ## TRUE
## [1] TRUE
Nested caching is when caching of a function occurs inside an outer function that is itself cached. This is a critical element of working within a reproducible workflow. It is not enough during development to cache flat code chunks, as there will be many levels of “slow” functions. Ideally, at all points in a development cycle, it should be possible to get to any line of code starting from the very initial steps, running through everything up to that point, in less than 1 second. If the workflow can be kept this fast, it can be re-run from the start at any point, which helps ensure that it still works.
##########################
## Nested Caching
# Make 2 functions
inner <- function(mean) {
  d <- 1
  Cache(rnorm, n = 3, mean = mean)
}
outer <- function(n) {
  Cache(inner, 0.1, cacheRepo = tmpdir2)
}
# make 2 different cache paths
tmpdir1 <- file.path(tempdir(), "first")
tmpdir2 <- file.path(tempdir(), "second")
# Run the Cache ... notOlderThan propagates to all 3 Cache calls,
# but cacheRepo is tmpdir1 in top level Cache and all nested
# Cache calls, unless individually overridden ... here inner
# uses tmpdir2 repository
Cache(outer, n = 2, cacheRepo = tmpdir1, notOlderThan = Sys.time())
## [1] 1.215708 1.381725 1.017966
## attr(,"tags")
## [1] "cacheId:e09bf93970f9e94d5c639cfa8ca722f0"
## attr(,"newCache")
## [1] TRUE
## attr(,"call")
## [1] ""
showCache(tmpdir1) # 2 function calls
## Cache size:
## Total (including Rasters): 12.3 Kb
## Selected objects (not including Rasters): 327 bytes
## artifact tagKey
## 1: 52c58d46b069902fae2967a18b97c193 format
## 2: 52c58d46b069902fae2967a18b97c193 name
## 3: 52c58d46b069902fae2967a18b97c193 class
## 4: 52c58d46b069902fae2967a18b97c193 date
## 5: 52c58d46b069902fae2967a18b97c193 cacheId
## 6: 52c58d46b069902fae2967a18b97c193 function
## 7: 52c58d46b069902fae2967a18b97c193 object.size
## 8: 52c58d46b069902fae2967a18b97c193 accessed
## 9: 52c58d46b069902fae2967a18b97c193 otherFunctions
## 10: 52c58d46b069902fae2967a18b97c193 otherFunctions
## 11: 52c58d46b069902fae2967a18b97c193 otherFunctions
## 12: 52c58d46b069902fae2967a18b97c193 otherFunctions
## 13: 52c58d46b069902fae2967a18b97c193 otherFunctions
## 14: 52c58d46b069902fae2967a18b97c193 preDigest
## 15: 52c58d46b069902fae2967a18b97c193 preDigest
## 16: 82e9927aa53be5aef0783d17151b2fbc format
## 17: 82e9927aa53be5aef0783d17151b2fbc name
## 18: 82e9927aa53be5aef0783d17151b2fbc class
## 19: 82e9927aa53be5aef0783d17151b2fbc date
## 20: 82e9927aa53be5aef0783d17151b2fbc cacheId
## 21: 82e9927aa53be5aef0783d17151b2fbc function
## 22: 82e9927aa53be5aef0783d17151b2fbc object.size
## 23: 82e9927aa53be5aef0783d17151b2fbc accessed
## 24: 82e9927aa53be5aef0783d17151b2fbc otherFunctions
## 25: 82e9927aa53be5aef0783d17151b2fbc otherFunctions
## 26: 82e9927aa53be5aef0783d17151b2fbc otherFunctions
## 27: 82e9927aa53be5aef0783d17151b2fbc otherFunctions
## 28: 82e9927aa53be5aef0783d17151b2fbc otherFunctions
## 29: 82e9927aa53be5aef0783d17151b2fbc otherFunctions
## 30: 82e9927aa53be5aef0783d17151b2fbc preDigest
## 31: 82e9927aa53be5aef0783d17151b2fbc preDigest
## 32: 82e9927aa53be5aef0783d17151b2fbc preDigest
## artifact tagKey
## tagValue createdDate
## 1: rda 2018-06-15 10:23:33
## 2: Cache 2018-06-15 10:23:33
## 3: numeric 2018-06-15 10:23:33
## 4: 2018-06-15 10:23:33 2018-06-15 10:23:33
## 5: e09bf93970f9e94d5c639cfa8ca722f0 2018-06-15 10:23:33
## 6: outer 2018-06-15 10:23:33
## 7: 688 2018-06-15 10:23:33
## 8: 2018-06-15 10:23:33 2018-06-15 10:23:33
## 9: process_file 2018-06-15 10:23:33
## 10: process_group 2018-06-15 10:23:33
## 11: process_group.block 2018-06-15 10:23:33
## 12: call_block 2018-06-15 10:23:33
## 13: block_exec 2018-06-15 10:23:33
## 14: n:8128a6180a705341ab7c05cfa945edfb 2018-06-15 10:23:33
## 15: .FUN:b5f6bcbdd9f23e39c2c5d4020e73a6ff 2018-06-15 10:23:33
## 16: rda 2018-06-15 10:23:33
## 17: Cache 2018-06-15 10:23:33
## 18: numeric 2018-06-15 10:23:33
## 19: 2018-06-15 10:23:33 2018-06-15 10:23:33
## 20: cec73d63ad3864af8bcd7efc5fae864d 2018-06-15 10:23:33
## 21: rnorm 2018-06-15 10:23:33
## 22: 688 2018-06-15 10:23:33
## 23: 2018-06-15 10:23:33 2018-06-15 10:23:33
## 24: process_file 2018-06-15 10:23:33
## 25: process_group 2018-06-15 10:23:33
## 26: process_group.block 2018-06-15 10:23:33
## 27: call_block 2018-06-15 10:23:33
## 28: block_exec 2018-06-15 10:23:33
## 29: do.call 2018-06-15 10:23:33
## 30: n:4ae3e6b6364de42fdc243469d73448cc 2018-06-15 10:23:33
## 31: mean:c28b87a0be6a99966bdaa5e556974b43 2018-06-15 10:23:33
## 32: .FUN:7e9a928f110f80b3612e71883a6ec1f4 2018-06-15 10:23:33
## tagValue createdDate
showCache(tmpdir2) # 1 function call
## Cache size:
## Total (including Rasters): 12.2 Kb
## Selected objects (not including Rasters): 164 bytes
## artifact tagKey
## 1: 052fbbc504c95d009eaacfbb84dca968 format
## 2: 052fbbc504c95d009eaacfbb84dca968 name
## 3: 052fbbc504c95d009eaacfbb84dca968 class
## 4: 052fbbc504c95d009eaacfbb84dca968 date
## 5: 052fbbc504c95d009eaacfbb84dca968 cacheId
## 6: 052fbbc504c95d009eaacfbb84dca968 function
## 7: 052fbbc504c95d009eaacfbb84dca968 object.size
## 8: 052fbbc504c95d009eaacfbb84dca968 accessed
## 9: 052fbbc504c95d009eaacfbb84dca968 otherFunctions
## 10: 052fbbc504c95d009eaacfbb84dca968 otherFunctions
## 11: 052fbbc504c95d009eaacfbb84dca968 otherFunctions
## 12: 052fbbc504c95d009eaacfbb84dca968 otherFunctions
## 13: 052fbbc504c95d009eaacfbb84dca968 otherFunctions
## 14: 052fbbc504c95d009eaacfbb84dca968 otherFunctions
## 15: 052fbbc504c95d009eaacfbb84dca968 preDigest
## 16: 052fbbc504c95d009eaacfbb84dca968 preDigest
## tagValue createdDate
## 1: rda 2018-06-15 10:23:33
## 2: Cache 2018-06-15 10:23:33
## 3: numeric 2018-06-15 10:23:33
## 4: 2018-06-15 10:23:33 2018-06-15 10:23:33
## 5: 19b808ac6871e0184e63c421a116cb61 2018-06-15 10:23:33
## 6: inner 2018-06-15 10:23:33
## 7: 688 2018-06-15 10:23:33
## 8: 2018-06-15 10:23:33 2018-06-15 10:23:33
## 9: process_file 2018-06-15 10:23:33
## 10: process_group 2018-06-15 10:23:33
## 11: process_group.block 2018-06-15 10:23:33
## 12: call_block 2018-06-15 10:23:33
## 13: block_exec 2018-06-15 10:23:33
## 14: do.call 2018-06-15 10:23:33
## 15: mean:c28b87a0be6a99966bdaa5e556974b43 2018-06-15 10:23:33
## 16: .FUN:56a1302d7ef43383766d7af6ca072c4e 2018-06-15 10:23:33
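The output above shows outer and rnorm stored in tmpdir1 and only inner stored in tmpdir2. In other words, an explicit cacheRepo applies to that one call, while calls without one fall back to the top-level repository. The following base R sketch reproduces that pattern under stated assumptions: nestedCache and cacheStore are hypothetical, the repositories are just key prefixes in an environment, and passing the top-level repository through an option is an illustrative mechanism, not a description of how reproducible does it.

```r
# Hypothetical sketch: repositories are modelled as key prefixes in one environment.
cacheStore <- new.env()

nestedCache <- function(FUN, ..., repo = NULL) {
  topRepo <- getOption("sketch.topRepo")
  if (is.null(topRepo)) {
    # Outermost call: publish its repo for nested calls and clean up on exit
    if (is.null(repo)) stop("the outermost call must supply a repo")
    old <- options(sketch.topRepo = repo)
    on.exit(options(old))
  }
  # An explicit repo applies to this call only; otherwise use the top-level one
  if (is.null(repo)) repo <- topRepo
  key <- paste(repo, deparse(substitute(FUN)),
               paste(deparse(list(...)), collapse = ""), sep = "::")
  if (!is.null(cacheStore[[key]])) return(cacheStore[[key]])  # cache hit
  out <- FUN(...)
  cacheStore[[key]] <- out
  out
}

inner <- function(x) nestedCache(sqrt, x)                     # no repo: uses "first"
outer <- function(n) nestedCache(inner, 16, repo = "second")  # overrides for itself
res <- nestedCache(outer, 2, repo = "first")
storedKeys <- ls(cacheStore)
```

After the call, storedKeys holds two entries prefixed with first (outer and sqrt) and one prefixed with second (inner), mirroring the two showCache listings.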
# userTags get appended
# the outer tag propagates to all items; the innermost call additionally gets its own inner tag
clearCache(tmpdir1)
outerTag <- "outerTag"
innerTag <- "innerTag"
inner <- function(mean) {
  d <- 1
  Cache(rnorm, n = 3, mean = mean, notOlderThan = Sys.time() - 1e5, userTags = innerTag)
}
outer <- function(n) {
  Cache(inner, 0.1)
}
aa <- Cache(outer, n = 2, cacheRepo = tmpdir1, userTags = outerTag)
showCache(tmpdir1) # rnorm function has outerTag and innerTag, inner and outer only have outerTag
## Cache size:
## Total (including Rasters): 20.5 Kb
## Selected objects (not including Rasters): 491 bytes
## artifact tagKey
## 1: 26e98ec0cb27c56caf44b823919b2ee1 format
## 2: 26e98ec0cb27c56caf44b823919b2ee1 name
## 3: 26e98ec0cb27c56caf44b823919b2ee1 class
## 4: 26e98ec0cb27c56caf44b823919b2ee1 date
## 5: 26e98ec0cb27c56caf44b823919b2ee1 cacheId
## 6: 26e98ec0cb27c56caf44b823919b2ee1 outerTag
## 7: 26e98ec0cb27c56caf44b823919b2ee1 function
## 8: 26e98ec0cb27c56caf44b823919b2ee1 object.size
## 9: 26e98ec0cb27c56caf44b823919b2ee1 accessed
## 10: 26e98ec0cb27c56caf44b823919b2ee1 otherFunctions
## 11: 26e98ec0cb27c56caf44b823919b2ee1 otherFunctions
## 12: 26e98ec0cb27c56caf44b823919b2ee1 otherFunctions
## 13: 26e98ec0cb27c56caf44b823919b2ee1 otherFunctions
## 14: 26e98ec0cb27c56caf44b823919b2ee1 otherFunctions
## 15: 26e98ec0cb27c56caf44b823919b2ee1 preDigest
## 16: 26e98ec0cb27c56caf44b823919b2ee1 preDigest
## 17: ba930d7f0b75e548e8e3dca57b07105e format
## 18: ba930d7f0b75e548e8e3dca57b07105e name
## 19: ba930d7f0b75e548e8e3dca57b07105e class
## 20: ba930d7f0b75e548e8e3dca57b07105e date
## 21: ba930d7f0b75e548e8e3dca57b07105e cacheId
## 22: ba930d7f0b75e548e8e3dca57b07105e innerTag
## 23: ba930d7f0b75e548e8e3dca57b07105e outerTag
## 24: ba930d7f0b75e548e8e3dca57b07105e function
## 25: ba930d7f0b75e548e8e3dca57b07105e object.size
## 26: ba930d7f0b75e548e8e3dca57b07105e accessed
## 27: ba930d7f0b75e548e8e3dca57b07105e otherFunctions
## 28: ba930d7f0b75e548e8e3dca57b07105e otherFunctions
## 29: ba930d7f0b75e548e8e3dca57b07105e otherFunctions
## 30: ba930d7f0b75e548e8e3dca57b07105e otherFunctions
## 31: ba930d7f0b75e548e8e3dca57b07105e otherFunctions
## 32: ba930d7f0b75e548e8e3dca57b07105e otherFunctions
## 33: ba930d7f0b75e548e8e3dca57b07105e preDigest
## 34: ba930d7f0b75e548e8e3dca57b07105e preDigest
## 35: ba930d7f0b75e548e8e3dca57b07105e preDigest
## 36: e88b80c44d0c6120929c6a83c344d76a format
## 37: e88b80c44d0c6120929c6a83c344d76a name
## 38: e88b80c44d0c6120929c6a83c344d76a class
## 39: e88b80c44d0c6120929c6a83c344d76a date
## 40: e88b80c44d0c6120929c6a83c344d76a cacheId
## 41: e88b80c44d0c6120929c6a83c344d76a outerTag
## 42: e88b80c44d0c6120929c6a83c344d76a function
## 43: e88b80c44d0c6120929c6a83c344d76a object.size
## 44: e88b80c44d0c6120929c6a83c344d76a accessed
## 45: e88b80c44d0c6120929c6a83c344d76a otherFunctions
## 46: e88b80c44d0c6120929c6a83c344d76a otherFunctions
## 47: e88b80c44d0c6120929c6a83c344d76a otherFunctions
## 48: e88b80c44d0c6120929c6a83c344d76a otherFunctions
## 49: e88b80c44d0c6120929c6a83c344d76a otherFunctions
## 50: e88b80c44d0c6120929c6a83c344d76a otherFunctions
## 51: e88b80c44d0c6120929c6a83c344d76a preDigest
## 52: e88b80c44d0c6120929c6a83c344d76a preDigest
## artifact tagKey
## tagValue createdDate
## 1: rda 2018-06-15 10:23:33
## 2: Cache 2018-06-15 10:23:33
## 3: numeric 2018-06-15 10:23:33
## 4: 2018-06-15 10:23:33 2018-06-15 10:23:33
## 5: 44f57deb36c53cd9c395e04c51fea77a 2018-06-15 10:23:33
## 6: outerTag 2018-06-15 10:23:33
## 7: outer 2018-06-15 10:23:33
## 8: 688 2018-06-15 10:23:33
## 9: 2018-06-15 10:23:33 2018-06-15 10:23:33
## 10: process_file 2018-06-15 10:23:33
## 11: process_group 2018-06-15 10:23:34
## 12: process_group.block 2018-06-15 10:23:34
## 13: call_block 2018-06-15 10:23:34
## 14: block_exec 2018-06-15 10:23:34
## 15: n:8128a6180a705341ab7c05cfa945edfb 2018-06-15 10:23:34
## 16: .FUN:62302feda89e19149a56ca40fde725e1 2018-06-15 10:23:34
## 17: rda 2018-06-15 10:23:33
## 18: Cache 2018-06-15 10:23:33
## 19: numeric 2018-06-15 10:23:33
## 20: 2018-06-15 10:23:33 2018-06-15 10:23:33
## 21: cec73d63ad3864af8bcd7efc5fae864d 2018-06-15 10:23:33
## 22: innerTag 2018-06-15 10:23:33
## 23: outerTag 2018-06-15 10:23:33
## 24: rnorm 2018-06-15 10:23:33
## 25: 688 2018-06-15 10:23:33
## 26: 2018-06-15 10:23:33 2018-06-15 10:23:33
## 27: process_file 2018-06-15 10:23:33
## 28: process_group 2018-06-15 10:23:33
## 29: process_group.block 2018-06-15 10:23:33
## 30: call_block 2018-06-15 10:23:33
## 31: block_exec 2018-06-15 10:23:33
## 32: do.call 2018-06-15 10:23:33
## 33: n:4ae3e6b6364de42fdc243469d73448cc 2018-06-15 10:23:33
## 34: mean:c28b87a0be6a99966bdaa5e556974b43 2018-06-15 10:23:33
## 35: .FUN:7e9a928f110f80b3612e71883a6ec1f4 2018-06-15 10:23:33
## 36: rda 2018-06-15 10:23:33
## 37: Cache 2018-06-15 10:23:33
## 38: numeric 2018-06-15 10:23:33
## 39: 2018-06-15 10:23:33 2018-06-15 10:23:33
## 40: 994d1330fbd961f795ab0dc508271963 2018-06-15 10:23:33
## 41: outerTag 2018-06-15 10:23:33
## 42: inner 2018-06-15 10:23:33
## 43: 688 2018-06-15 10:23:33
## 44: 2018-06-15 10:23:33 2018-06-15 10:23:33
## 45: process_file 2018-06-15 10:23:33
## 46: process_group 2018-06-15 10:23:33
## 47: process_group.block 2018-06-15 10:23:33
## 48: call_block 2018-06-15 10:23:33
## 49: block_exec 2018-06-15 10:23:33
## 50: do.call 2018-06-15 10:23:33
## 51: mean:c28b87a0be6a99966bdaa5e556974b43 2018-06-15 10:23:33
## 52: .FUN:b910401646b09073940de757678db03d 2018-06-15 10:23:33
## tagValue createdDate
Sometimes it is not desirable to rerun a function even though its code has changed, because the change is irrelevant to the analysis (e.g., changing the messages sent to a user). The cacheId argument is for this. Once a piece of code has been run, its cacheId can be extracted manually (it is reported at the end of a Cache call) and placed in the code, passed in as, say, cacheId = "ad184ce64541972b50afd8e7b75f821b".
### cacheId
set.seed(1)
Cache(rnorm, 1, cacheRepo = tmpdir1)
## [1] -0.6264538
## attr(,"tags")
## [1] "cacheId:23dc247384c1b270f0d36de4bca1b138"
## attr(,"newCache")
## [1] TRUE
## attr(,"call")
## [1] ""
# the cacheId is reported in the "tags" attribute of the output; a previously recorded id can be passed in by hand
Cache(rnorm, 1, cacheRepo = tmpdir1, cacheId = "ad184ce64541972b50afd8e7b75f821b") # look up by cacheId rather than by the calculated hash
## cacheId is not same as calculated hash. Manually searching for cacheId:ad184ce64541972b50afd8e7b75f821b
## [1] 0.1836433
## attr(,"tags")
## [1] "cacheId:ad184ce64541972b50afd8e7b75f821b"
## attr(,"newCache")
## [1] TRUE
## attr(,"call")
## [1] ""
# override even with different inputs:
Cache(rnorm, 2, cacheRepo = tmpdir1, cacheId = "ad184ce64541972b50afd8e7b75f821b")
## cacheId is not same as calculated hash. Manually searching for cacheId:ad184ce64541972b50afd8e7b75f821b
## loading cached result from previous rnorm call, adding to memoised copy
## [1] 0.1836433
## attr(,"tags")
## [1] "cacheId:ad184ce64541972b50afd8e7b75f821b"
## attr(,"newCache")
## [1] FALSE
## attr(,"call")
## [1] ""
## cleanup
unlink(c("filename.rda", "filename1.rda"))
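Rather than copying the cacheId from the console by hand, it can also be pulled out of the returned object. The following is a minimal sketch, assuming that the result carries a "tags" attribute of the form "cacheId:<hash>" as in the output above; the directory name idDir is hypothetical, and the cacheRepo argument name follows the version of reproducible used in this document (newer releases may name it differently).

```r
library(reproducible)

# Hypothetical scratch directory used only for this sketch
idDir <- file.path(tempdir(), "reproducible_examples", "cacheId_sketch")
checkPath(idDir, create = TRUE)

# First call: runs rnorm and stores the result in the cache
first <- Cache(rnorm, 1, cacheRepo = idDir)

# The "tags" attribute holds a string of the form "cacheId:<hash>"
tags <- attr(first, "tags")
id <- sub("^cacheId:", "", grep("^cacheId:", tags, value = TRUE)[1])

# Passing the extracted id back retrieves the stored result
second <- Cache(rnorm, 1, cacheRepo = idDir, cacheId = id)
```

Because the id is read from the object itself, it always corresponds to an entry that exists in the cache directory being used.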
Since the cache is simply an archivist repository, all archivist functions will work as is. In addition, there are several helpers in the reproducible package, including showCache, keepCache and clearCache that may be useful. Also, one can access cached items manually (rather than simply rerunning the same Cache function again).
if (requireNamespace("archivist")) {
  # get the RasterLayer that was produced with the projectRaster function:
  mapHash <- unique(showCache(tmpDir, userTags = "projectRaster")$artifact)
  # load the first matching artifact directly from the archivist repository
  map <- archivist::loadFromLocalRepo(md5hash = mapHash[1], repoDir = tmpDir, value = TRUE)
  plot(map)
}
## Cache size:
## Total (including Rasters): 12.8 Kb
## Selected objects (not including Rasters): 808 bytes
## cleanup
unlink(dirname(tmpDir), recursive = TRUE)
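The helpers mentioned above (showCache, keepCache and clearCache) can be combined with userTags to manage a cache selectively. The sketch below is illustrative only: the directory helperDir and the tag names "keepMe" and "dropMe" are made up for the example, and the cacheRepo argument name follows the version of reproducible used in this document.

```r
library(reproducible)

# Hypothetical scratch directory used only for this sketch
helperDir <- file.path(tempdir(), "reproducible_examples", "helpers_sketch")
checkPath(helperDir, create = TRUE)

# Cache two results under different (made-up) user tags
a <- Cache(rnorm, 3, cacheRepo = helperDir, userTags = "keepMe")
b <- Cache(runif, 3, cacheRepo = helperDir, userTags = "dropMe")

# Inspect everything, or only the entries carrying a given tag
everything <- showCache(helperDir)
kept <- showCache(helperDir, userTags = "keepMe")

# Keep only the entries tagged "keepMe"; all other entries are removed.
# clearCache(helperDir, userTags = "dropMe") would remove by tag instead.
keepCache(helperDir, userTags = "keepMe")

remaining <- showCache(helperDir, userTags = "dropMe")
```

After the keepCache call, the entry tagged "dropMe" is no longer in the cache, so remaining has no rows.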
In general, we feel that liberal use of Cache makes for a reusable and reproducible work flow. shiny apps can also take advantage of Cache. Indeed, much of the difficulty in managing data sets and saving them for future use can be handled by caching.
Cache(<functionName>, <other arguments>)
This allows fine-scale control of individual function calls.
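As a small illustration of this pattern, the sketch below wraps a deliberately slow function in Cache and times two identical calls. The function slowMean and the directory workDir are hypothetical stand-ins for an expensive step in a real work flow, and the cacheRepo argument name follows the version of reproducible used in this document.

```r
library(reproducible)

# Hypothetical scratch directory used only for this sketch
workDir <- file.path(tempdir(), "reproducible_examples", "workflow_sketch")
checkPath(workDir, create = TRUE)

# Hypothetical stand-in for an expensive operation
slowMean <- function(x) {
  Sys.sleep(1) # simulate a long computation
  mean(x)
}

x <- 1:10

# First call evaluates slowMean and stores the result
t1 <- system.time(res1 <- Cache(slowMean, x, cacheRepo = workDir))

# Second call with identical inputs is served from the cache
t2 <- system.time(res2 <- Cache(slowMean, x, cacheRepo = workDir))

print(rbind(first = t1, second = t2)[, "elapsed"])
```

The second call should typically return much faster than the first, since the one-second sleep is skipped; exact timings will vary by system.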