The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
The functions and methods of igapfill
provide an
intuitive, user-friendly, console-based interface that allows us to fill
missing values of time series of satellite images (TSSI); the main
engine for filling these missing values is the spatio-temporal approach
implemented in the R
package gapfill
.
Throughout this note, we will employ the Garnica dataset, also
included in igapfill
, to showcase the use of this
package.
This dataset is comprised of two .tif
files which store
a TSSI recorded over the National Park Cerro de Garnica located
in Michoacan, Mexico. The file
garnica_250m_16_days_NDVI.tif
contains 572 layers of the
Normalized Differenced Vegetation Index (NDVI). The accompanying file
garnica_250m_16_days_pixel_reliability.tif
contains
information about the quality of each pixel of the 572 NDVI layers. Both
datasets are spatial subsets extracted from larger images belonging to
the MOD13Q1 NASA’s product. Due to the temporal resolution of this
product -16 days- the 572 layers correspond to a time series of images
starting on the 49-th Julian date of 2000 and ending on the 353-rd
Julian date of 2024.
We can access to this dataset as follows:
ndvi_path <- system.file("extdata", "garnica_250m_16_days_NDVI.tif",
package = "igapfill")
reliability_path <- system.file("extdata",
"garnica_250m_16_days_pixel_reliability.tif",
package = "igapfill")
spatNDVI <- rast(ndvi_path)
spatRELIABILITY <- rast(reliability_path)
Although by definition the NDVI takes values on the interval
(-1,1)
, NASA uses 1e4
as an scale factor and
distributes the NDVI MOD13Q1 product through files that when read in the
R
environment, the stored information has the
INT4S
datatype. Thus spatNDVI
has values
ranging from -1e4
to 1e4
.
How much of the information contained in the Garnica NDVI TSSI can be used with high reliability in a scientific study?
mvSieve
counts the amount of missing values in a TSSI
and hence it gives us a fair estimate of the overall quality of this
type of datasets. Now, prior to use mvSieve
, let’s see how
to use the pixel realiability product to yield a highly reliable NDVI
product.
For scientific purposes, it is mandatory to filter out those pixels
with low reliability from a TSSSI dataset. Quality layers, such as those
provided in spatRELIABILITY
, allow us to pursuit this
task.
Specifically, for the Garnica dataset, the quality flags are 0 (Good
data), 1 (Marginal data), 2 (Snow/clouds) and 3 (No data). It is known
that a reliable scientific dataset based on TSSI can be built upon
pixels with Good and Marginal data quality flags. In the Appendix, there
is code to generate the product
garnica_250m_16_days_NDVI_QA
that results from filtering
out ( masking ) those NDVI pixels whose reliability is either
equal to Snow/clouds or No data.1
The function mvSieve()
can now be applied to the
NDVI_QA
product. Observe that mvSieve
’s
argument dirs
is equal to a character vector pointing to
the sub-directories containing the TSSI files. The remaining arguments
are straightforward: filesPerDir=23
,
startPeriod=2001
, endPeriod=2024
and
colNames
is a character vector defined below:2
garnicaNDVIQAdir <- paste0( getwd(), "/garnica/250m_16_days_NDVI_QA" )
dir.create( path = garnicaNDVIQAdir, recursive = TRUE )
DIRS <- list.dirs(path = garnicaNDVIQAdir)[-1]
COLNAMES <- paste0( "DoY-", seq(1,365,by=16) )
missingValueSieve <- mvSieve(dirs=DIRS[-1],
filesPerDir = 23,
startPeriod = 2001,
endPeriod = 2024,
colNames = COLNAMES)
missingValueSieve
can be construed as a missing values
matrix and thanks to the R
package heatmaply
we can display this kind of matrices in a graphical form. Below is the
missing value matrix of the Garnica dataset as a heatmap.
This heatmap allows us to spot a recurrent feature in NDVI TS, i.e. that during the period of vegetation growth, this kind of datasets usually exhibit a fair amount of information loss. In particular, for the Garnica dataset, there is a non-negligible amount of missing values in the interval between the 145-th day of the year (DoY) and the 289-th. That is, for this example, between May and October -arguably the season of vegetation growth- there is an important information loss.
The spatio-temporal approach of gapfill
is applied to
blocks of \(d \times
d\) images. It is clear that in order to reduce significantly the
amount of missing values in the entire dataset, gapfill
has
to be applied sequentially on a series of blocks whose dimensions can be
of order \(d=2,3,4,\ldots\).
Based on a heatmap such as the one above, it may be not too difficult
to select a block of \(2\times 2\),
images from which to start reducing the amount of overall missing values
in the dataset. For instance, we can define block0
as the
\(2\times 2\) set comprised of the
images acquired in the 113-th and 129-th DoY of 2011 and 2012. This
block-selection task can become automatic due to the
minmaxBlock()
function, though.
For those studies in which the task of establishing a block of images
with the smallest (or largest) amount of missing values requires
something more than a quick visual inspection we have developed
minmaxBlock()
. This function has three arguments,
sieve
, type
and rank
. In order to
set the first argument, we can use a missing values matrix, such as
missingValueSieve
. The argument type
defines
whether you seek for a block with minimal or a maximal amount of missing
values. The last argument is a numeric
controlling the size
of the search grid on which the block with the minimal or maximal amount
of missing values will be found. In what follows we provide details for
the case type="min"
, the other case is analogous.
minmaxBlock
searches for the minimal \(2\times 2\), non-zero, sub-matrix of a
general missing values matrix with entries denoted as \(m_{i,j}\). For any given \(2\times 2\) block we have defined
blockMissingness
as \(\log(
\texttt{cumsum} (m_{i,j}) )\) for \(1\leq i,j\leq 2\). The minimal block is
defined as that block with the minimum block missingness. We preferred
the cumsum
function rather than others, such as
cumprod
or det
, because a general missing
values matrix could have a large amount of zeros.
The search for the minimal block runs over the values of a non-zero
numeric vector of length equal to the value of the rank
argument; for simplicity we call this vector as the rank
vector. The entries of this vector are non-zero values of the
missing values matrix provided by the sieve
argument.
Consider a given entry of the rank vector and notice that this number
can occupy any of the four entries of a \(2
\times 2\) sub-matrix of the original missing values matrix
sieve
. Correspondingly, based on this value, four different
\(2 \times 2\) matrices are defined and
the block missingness of each of these matrices is calculated. This
process is applied to each value of the rank vector. Having calculated
all the possible blocks, and their corresponding missingness, we define
the minimal block of the sieve
as the \(2\times 2\) block with the smallest
missingness.
Below it is established that for the Garnica’s missing values matrix, the \(2\times 2\) minimal block is comprised by the images acquired in the 113-rd and 129-th DoY of 2011 and 2012.
SIEVE <- (missingValueSieve/max(missingValueSieve)) * 100
minmaxBlock(sieve=SIEVE, rank=10)
#> $rows
#> [1] 11 12
#>
#> $cols
#> [1] 8 9
#>
#> $block
#> DoY-113 DoY-129
#> 2011 0 0.0000000
#> 2012 0 0.6956522
#>
#> $blockMissingness
#> [1] -0.3629055
In the next section we show how to fill the missing values of this block.
Now that we have found a block of images to apply the spatio-temporal
approach introduced in the R
package gapfill
let’s make a sub-directory ( block0 ) in which we make safe
copies of the images to be filled.
DIRS <- list.dirs(path = garnicaNDVIQAdir)[-1]
files2011 <- list.files(path = DIRS[12],
pattern = ".tif$",
full.names = TRUE)
files2012 <- list.files(path = DIRS[13],
pattern = ".tif$",
full.names = TRUE)
block0DIR <- paste0( getwd(), "/garnica/block0" )
if( !dir.exists(block0DIR) ){
dir.create( block0DIR )
}
file.copy(from=files2011[8:9], to=block0DIR)
#> [1] TRUE TRUE
file.copy(from=files2012[8:9], to=block0DIR)
#> [1] TRUE TRUE
create_dirs()
Speaking of safety, we have found extremely useful to create a set of
sub-directories in which we can store all the auxiliary files that are
generated through the process of setting arguments for applying
gapfill
functions; we encourage to the users to avoid
saving auxiliary files in hard-to-remember locations of their systems.
With create_dirs()
, as shown below, we can create this
directory tree.
The directory tree with block0 as its root looks as follows
dimsReport()
In order to avoid memory overflow and/or to reduce execution time, it
is recommended to know the spatial dimensions of the images to be
processed with gapfill
; obviously, having a fair
understanding of your systems characteristics is highly recommended too.
In addition to information about the spatial dimensions (number of rows
and columns) of the images, dimsReport()
returns all the
divisors of these dimensions. A direct application of this function
returns something as shown below:
sort_split()
Depending on system characteristics, the execution time of the
gapfill
’s functions tends to increase in direct
relationship with the images dimension. We have found that an effective
way of speeding up the gap-filling process consists of splitting the
original images into smaller spatio-temporal chunks.
The sort_split()
function takes care of the splitting
process through a console-based application. That is, the user is asked
to pass a series of arguments. By accepting the default arguments of
sort_split()
, we ensure that the general workflow of
igapfill()
is followed but the user has the flexibility of
changing this workflow. When its arguments nrow_split
and
ncol_split
are not provided, sort_split()
integrates dimsReport()
in its execution as it can be seen
below:
As seen above, the last message of sort_split()
returns
the location where the recently generated chunks were saved. Once we
have these chunks, we will continue our workflow by applying the
gapfill
’s functions to each chunk.
applyGapfill()
Now that we have our original images splitted in smaller chunks, we
can explode parallel computing and apply Gapfill()
-the
main function of the R
package gapfill()
- and
reduce the amount of missing values in our images.3
Since we have created a directory to host our original images,
moreover, we have a directory tree to store all the auxiliary files of
this process, now we can simply take advantage of this structure. That
is, by setting a vector with absolute directory names we can set the
inputDir
, outputDir
and
progressDir
arguments easily. The arguments
lat
and lon
are vectors containing numeric
values associated to the latitude and longitude coordinates of each
pixel within the images to fill; the functions get_LAT()
and get_LON()
can be used to calculate appropriate values
for these arguments.4 The arguments days
and
years
are intended to register the days of the year and the
years, respectively, involved in the spatio-temporal chunk defined by
the images.5 Notice that these four parameters
contribute to define a 4-dimensional array and, in effect,
Gapfill()
is applied to this 4-dimensional structure. More
details about this 4-dimensional set are given in the documentation of
the functions get_3Darray()
and
get_4Darray()
.
In spite of having splitted the original images, a large execution
time could still be an issue without parallel computing. With the
argument numCores
we indicate how many processing cores are
willing to employ in this gap-filling process. Our argument
clipRange
is equivalent to gapfill
’s argument
clip
; this vector defines an interval for valid predicted
(filling) values.
As it is well-known, the NDVI takes values on the interval \((-1,1)\), this interval is the default
value for the clipRange
argument. In the code below, the
value for clipRange
reflects the fact that the Garnica
dataset contains NDVI values recorded in INT4S
datatype,
that is, these NDVI images take values on the interval
(-1e4, 1e4)
. Accordingly, the scale
argument
is set equal to 1, reflecting that we are using the NDVI images without
scaling them; when the default value for clipRange
is used
then scale
must be set accordingly.
allDIRS <- list.dirs(path = block0DIR, full.names = TRUE)#[-1]
block0FILES <- list.files(path = block0DIR, pattern = ".tif$",
full.names = TRUE)
LAT <- get_LAT(stack = stack(block0FILES))
LON <- get_LON(stack = stack(block0FILES))
applyGapfill(inputDir = allDIRS[7],
outputDir = allDIRS[5],
progressDir = allDIRS[6],
lat = LAT,
lon = LON,
days = c(113,129),
years = 2011:2012,
numCores = 10,
scale = 1e0,
clipRange = c(-1e4,1e4))
The output of applyGapfill()
is a set of
.RData
files store at the path given by the argument
outputDir
. Each file contains a 4-dimensional
array
object. That is, for all practical purposes, at this
stage the process has produced a series of numeric matrices. Thus,
rasterizing and mosaicking these matrices is in
order.
In order to obtain the gap-filled images, we must rasterize
the matrices contained in each array and then the resulting rasters must
be glued together. It is straightforward to obtain values for the
arguments inputDirImages
, inputDirRData
,
inputDirMaster
, outputDir
and
progressReportDir
due to the auxiliary objects defined
above. Below we use scaleFactor = 1e0
and
dataType = "INT4S"
reflecting that our original images
contain NDVI values recorded in INT4S
datatype. With the
arguments utilized above in sort_split()
we ended up with 5
splits, so below we use numCores = 3
; in general we
recommend that the value for numCores
does not exceed the
number of splits.
parallel_mosaic(inputDirImages = block0DIR,
inputDirRData = allDIRS[5],
inputDirMaster = allDIRS[4],
outputDir = allDIRS[3],
progressReportDir = allDIRS[6],
scaleFactor = 1e0,
dataType = "INT4S",
numCores = 3) # numCores must not exceed number of splits
The output of parallel_mosaic()
is a set of GeoTiff
files stored at the path indicated by the argument
outputDir
. In order to verify that the images associated
with these files have a reduced amount of missing values we can employ
the code below.
First, we read the original images and plot them.
initFILES <- list.files(path = block0DIR,
pattern = ".tif$",
full.names = TRUE)
gapfilledFILES <- list.files(path = allDIRS[3],
pattern = ".tif$",
full.names = TRUE)
initIMAGES <- rast(initFILES)
gapfilledIMAGES <- rast(gapfilledFILES)
par(mfrow=c(2,2))
plot(initIMAGES, cex.main=0.75,
main=sapply(c("Day113, 2011", "Day129, 2011", "Day113, 2012", "Day129, 2012"),
function(s) paste("Original:", s)))
Then, we compared them with their corresponding igapfilled-versions.
plot(gapfilledIMAGES, cex.main=0.75,
main=sapply(c("Day113, 2011", "Day129, 2011", "Day113, 2012", "Day129, 2012"),
function(s) paste("Gapfill:", s)))
Most of the steps presented above lead to igapfill()
, a
console-based wrap-up of create_dirs()
,
dimsReport()
, sort_split()
,
applyGapfill()
and parallel_mosaic()
. All that
is left to the users is the selection of the images that are to be
gap-filled, but as shown above this can be easily done with a
combination of mvSieve()
, minmaxBlock()
and
R
base code.
Although in our application we applied igapfill
’s
functions to try and fill most of the missing values in the entire
Garnica dataset, we conclude this vignette by showing
igapfill()
’s output when it is applied to a block of \(3 \times 3\) matrices that combined had a
blockMissingness of 2.835768
. This is
block14 in our application, that is, the following results
correspond to the 15-th application of igapfill()
’s
functions to distinct blocks of Garnica’s images.6
First, igapfill()
takes care of setting up the directory
tree and splitting the original images:
Then, the gap-filling process starts and some parameters are requested. Once this process has finished, another set of parameters are requested prior to embark ourselves into rasterizing and mosaicking the final output:
Finally, as done above, we read and plot the original block14´s images:
And then, we can plot the final result of applying
igapfill()
to block14’s images.
mvSieve
section
# --- Auxiliary code used in mvSieve section ---
garnicaNDVIQAdir <- paste0( getwd(), "/garnica/250m_16_days_NDVI_QA" )
dir.create( path = garnicaNDVIQAdir, recursive = TRUE )
YEARS <- 2000:2024
ndviNAMES <- names(spatNDVI)
reliabilityNAMES <- names(spatRELIABILITY)
ndviQANAMES <- sapply(ndviNAMES,
function(s) paste0( strsplit(s, ".tif")[[1]][1], "_QA.tif" ))
for(i in 1:20){ #### Year 2000 has 20 images only
if(!dir.exists( paste0( garnicaNDVIQAdir, "/", YEARS[1] ) )){
dir.create( paste0( garnicaNDVIQAdir, "/", YEARS[1] ) )
}
TEMPndvi <- subset(spatNDVI,i)
TEMPreliability <- subset(spatRELIABILITY,i)
TEMPndvi[ TEMPreliability >= 2] <- NA
writeRaster(TEMPndvi,
filename = paste0( garnicaNDVIQAdir, "/", YEARS[1], "/",
ndviQANAMES[i] ),
datatype = "INT2S", overwrite = TRUE)
}
for (i in 2:length(YEARS)) {
if(!dir.exists( paste0( garnicaNDVIQAdir, "/", YEARS[i] ) )){
dir.create( paste0( garnicaNDVIQAdir, "/", YEARS[i] ) )
}
for(j in 1:23){
count <- 20 + ((i-2)*23) + j
TEMPndvi <- subset(spatNDVI, count)
TEMPreliability <- subset(spatRELIABILITY,count)
TEMPndvi[ TEMPreliability >= 2 ] <- NA
writeRaster(TEMPndvi,
filename = paste0( garnicaNDVIQAdir, "/", YEARS[i], "/",
ndviQANAMES[count] ),
datatype = "INT2S", overwrite = TRUE)
}
}
# --- Auxiliary code used in mSieve section ---
out <- heatmaply((missingValueSieve/max(missingValueSieve)) * 100,
limits = c(0,100),
colors = cool_warm,
dendrogram = "none",
xlab = "", ylab = "",
width = 900,
height = 650,
main = "% of missing values in NDVI TS of Cerro de Garnica (2001-2024)",
scale = "none",
draw_cellnote=TRUE,
cellnote_textposition = "middle center",
cellnote_size = 6,
margins = c(60, 100, 40, 20),
titleX = FALSE,
hide_colorbar = TRUE,
labRow = rownames(missingValueSieve),
labCol = colnames(missingValueSieve),
heatmap_layers = theme(axis.line = element_blank()))
html_out <- as.tags(out)
scrollable_out <- tags$div(
class = "scrollable",
style = "width: 100%;",
html_out
)
scrollable_out
We have saved the layers of this product in sub-directories organized by years, the first being 2000 and the last one being 2024.↩︎
Although the TSSI starts in 2000, there are only 20
images for that year. Starting in 2001, every year has 23 NDVI images.
Hence in our example startPeriod=2001
.↩︎
In general, depending on the amount of missing information in the chosen block the reduction can be total or marginal.↩︎
The length of these vectors must coincide with the
number of rows and columns of the spatRaster
generated by
the images to fill.↩︎
The length of these vectors must coincide with the number of images to process by year and with the number of years to process.↩︎
Recall that we started with block0.↩︎
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.