Sensitivity analysis and calibration

Miquel De Caceres

2021-01-08

About this vignette

The present document shows how to conduct sensitivity analyses and calibration exercises on the simulation models included in package medfate. The document is written assuming that the user is familiar with the basic water balance model (i.e. function spwb()). The aims of the exercises presented here are:

  1. To determine which spwb() model parameters are most influential in determining stand transpiration and plant drought stress.
  2. To determine which model parameters are most influential in determining model fit to soil water content dynamics.
  3. To reduce the uncertainty in parameters determining fine root distribution, given an observed data set of soil water content dynamics.

Preparing model inputs

As an example, we will use the same data sets that are provided to illustrate the simulation functions in medfate. The example forest consists of two tree species (Pinus halepensis/T1_54 and Quercus ilex/T2_68) and one shrub species (Quercus coccifera/S1_65, or Kermes oak).

We begin by loading the package and the example forest data:

library(medfate)
## Loading required package: sp
data(exampleforestMED)
exampleforestMED
## $ID
## [1] "1"
## 
## $patchsize
## [1] 10000
## 
## $treeData
##   Species   N   DBH Height Z50  Z95
## 1      54 168 37.55    800 750 3000
## 2      68 384 14.60    660 750 3000
## 
## $shrubData
##   Species Cover Height Z50  Z95
## 1      65  3.75     80 300 1500
## 
## $herbCover
## [1] 10
## 
## $herbHeight
## [1] 20
## 
## attr(,"class")
## [1] "forest" "list"

We also load the example weather data set and the default species parametrization:

data(examplemeteo)
data(SpParamsMED)

We then initialize a soil with four layers (default values of texture, bulk density and rock content) and the species input parameters for simulation function spwb():

examplesoil1 = soil(defaultSoilParams(4))
x1 = forest2spwbInput(exampleforestMED,examplesoil1, SpParamsMED, control = defaultControl())

Although it is not necessary, we make an initial call to the model (spwb()) with the default parameter settings:

S1<-spwb(x1, examplesoil1, examplemeteo, latitude = 41.82592, elevation = 100)
## Initial soil water content (mm): 291.257
## Initial snowpack content (mm): 0
## Performing daily simulations .....................................done.
## Final soil water content (mm): 271.212
## Final snowpack content (mm): 0
## Change in soil water content (mm): -20.0449
## Soil water balance result (mm): -20.0449
## Change in snowpack water content (mm): 0
## Snowpack water balance result (mm): 0
## Water balance components:
##   Precipitation (mm) 513
##   Rain (mm) 462 Snow (mm) 51
##   Interception (mm) 86 Net rainfall (mm) 376
##   Infiltration (mm) 418 Runoff (mm) 10 Deep drainage (mm) 100
##   Soil evaporation (mm) 45 Transpiration (mm) 293

Function spwb() will be called implicitly multiple times in the sensitivity and calibration analyses illustrated below.

Sensitivity analysis

Introduction and input factors

Model sensitivity analyses are used to investigate how variation in the output of a numerical model can be attributed to variations of its input factors. Input factors are elements that can be changed before model execution and may affect its output. They can be model parameters, initial values of state variables, boundary conditions or the input forcing data (Pianosi et al. 2016).

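According to Saltelli et al. (2016), there are three main purposes of sensitivity analyses:

  1. Ranking (or factor prioritization): generating a ranking of the input factors according to their relative contribution to output variability.
  2. Screening (or factor fixing): identifying the input factors, if any, that have a negligible influence on output variability.
  3. Mapping: determining the region of the input variability space that produces significant output values.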

Here we will be mostly interested in ranking parameters according to different objectives. We will take as input factors three plant traits (leaf area index, fine root distribution and the water potential corresponding to a reduction in plant conductance) in each of the three plant cohorts (species), so that nine model parameters will be studied. The following shows the initial values of those parameters:

x1$above$LAI_live
## [1] 0.8167012 0.7977952 0.0653033
x1$below$Z50
## [1] 750 750 300
x1$paramsTransp$Psi_Extract
## [1] -2 -3 -4

In the following code we define a vector of parameter names (using the naming rules of function modifyInputParams()), as well as the input variability space, defined by the minimum and maximum parameter values:

#Parameter names of interest
parNames = c("T1_54/LAI_live", "T2_68/LAI_live", "S1_65/LAI_live",
             "T1_54/Z50", "T2_68/Z50", "S1_65/Z50",
             "T1_54/Psi_Extract", "T2_68/Psi_Extract", "S1_65/Psi_Extract")
#Parameter minimum and maximum values
parMin = c(0.1,0.1,0.1,
           100,100,100,
           -7,-7,-7)
parMax = c(2,2,2,
           1000,1000,1000,
           -1,-1,-1)

Model output functions

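In sensitivity analyses, model output is summarized into a single variable whose variation is to be analyzed. Pianosi et al. (2016) distinguish two types of model output functions:

  1. Prediction functions, whose output is a summary of model predictions (e.g. a simulated flux aggregated over the simulated period).
  2. Performance functions, whose output quantifies the agreement between model predictions and observations (e.g. a goodness-of-fit statistic).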

Here we will use examples of both kinds. First, we define a function that, given a simulation result, calculates total transpiration (mm) over the simulated period (one year):

sf_transp<-function(x) {sum(x$WaterBalance$Transpiration, na.rm=TRUE)}
sf_transp(S1)
## [1] 292.5647

Another prediction function can focus on plant drought stress. We define a function that, given a simulation result, calculates the average drought stress of plants (measured using the water stress index) over the simulated period:

sf_stress<-function(x) {
  lai <- x$spwbInput$above$LAI_live
  lai_p <- lai/sum(lai)
  stress <- spwb_stress(x, index = "WSI", draw = FALSE)
  mean(sweep(stress, 2, lai_p, "*"), na.rm = TRUE)
}
sf_stress(S1)
## [1] 18.32941
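The aggregation step in sf_stress() can be checked on a small made-up matrix. The sketch below uses hypothetical daily WSI values for four days and three cohorts (not medfate output), together with the LAI values shown earlier. It shows that sweep() multiplies each cohort column by its LAI proportion. It also shows that taking mean() over the whole weighted matrix equals the mean of the daily LAI-weighted averages divided by the number of cohorts.

```r
# Hypothetical daily water stress index (WSI) values: 4 days (rows) x 3 cohorts (columns)
stress <- matrix(c(10, 20, 30, 40,
                   5, 15, 25, 35,
                   0, 10, 20, 30), ncol = 3)
lai <- c(0.8167012, 0.7977952, 0.0653033)   # LAI_live values of the example cohorts
lai_p <- lai / sum(lai)                      # LAI proportions (they sum to 1)

weighted <- sweep(stress, 2, lai_p, "*")     # scale each cohort column by its LAI proportion
daily_avg <- rowSums(weighted)               # LAI-weighted average stress for each day

mean(weighted)                               # the quantity computed by sf_stress()
mean(daily_avg) / ncol(stress)               # same value
```

The two summaries differ only by a constant factor (the number of cohorts). For that reason, the rankings of input factors obtained with either summary would be the same.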

Sensitivity analysis requires model output functions whose arguments are the input factors to be studied:

\[\begin{equation} y = g(\mathbf{x}) = g(x_1, x_2, \dots, x_n) \end{equation}\]

where \(y\) is the output, \(g\) is the output function and \(\mathbf{x} = \{x_1, x_2, \dots, x_n\}\) is the vector of parameter input factors. Functions sf_transp() and sf_stress() take simulation results as input, not values of input factors. Instead, we need functions that take trait values as input, run the soil plant water balance model and return the desired prediction or performance statistic. Such functions can be generated using the function factory optimization_function(). The following code defines one such function, focusing on total transpiration:

of_transp<-optimization_function(parNames = parNames,
                                 x = x1, soil = examplesoil1,
                                 meteo = examplemeteo, 
                                 latitude = 41.82592, elevation = 100,
                                 summary_function = sf_transp)

Note that we provided all the data needed for simulations as input to optimization_function(), as well as the names of the parameters to study and the function sf_transp. The resulting object of_transp is a function itself, which we can call with parameter values (or sets of parameter values) as input:

of_transp(parMin)
## [1] 55.93924
of_transp(parMax)
## [1] 359.395

It is important to understand the steps that are done when we call of_transp():

  1. The function of_transp() calls spwb() using all the parameters specified in its construction (i.e. in the call to the function factory), except for the input factors indicated in parNames, which are specified as input at the time of calling of_transp().
  2. The result of the soil plant water balance simulation is then passed to function sf_transp(), and the output of this last function is returned as the output of of_transp().
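The two steps above describe a closure. The factory stores the simulation inputs and returns a function of the input factors only. The following base R sketch mimics that pattern with a hypothetical toy model. toy_model() and make_output_function() are illustrative names, not medfate functions. optimization_function() itself runs spwb() and may handle its inputs differently.

```r
# Hypothetical stand-in for a simulation model: daily "transpiration" = lai * k * forcing
toy_model <- function(params, forcing) {
  params[["lai"]] * params[["k"]] * forcing
}

# Hypothetical function factory: fixes everything except the parameters named in parNames
make_output_function <- function(parNames, params, forcing, summary_function) {
  force(parNames); force(params); force(forcing); force(summary_function)
  function(values) {
    # Accept a single parameter set (vector) or several sets (one per matrix row)
    if (is.null(dim(values))) values <- matrix(values, nrow = 1)
    apply(values, 1, function(v) {
      p <- params
      p[parNames] <- v                          # overwrite only the input factors
      summary_function(toy_model(p, forcing))   # step 1: simulate; step 2: summarize
    })
  }
}

forcing <- c(2, 3, 4)   # hypothetical daily forcing values
of_toy <- make_output_function("lai", params = c(lai = 1, k = 0.5),
                               forcing = forcing, summary_function = sum)
of_toy(2)                           # 9
of_toy(matrix(c(1, 2), ncol = 1))   # 4.5 9
```

Accepting a matrix with one parameter set per row is convenient because sensitivity and calibration tools typically evaluate many parameter sets at once.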

We can build a similar model output function, in this case focusing on plant stress (note that the only difference in the call to the factory is in the specification of sf_stress as summary function, instead of sf_transp).

of_stress<-optimization_function(parNames = parNames,
                                 x = x1, soil = examplesoil1,
                                 meteo = examplemeteo, 
                                 latitude = 41.82592, elevation = 100,
                                 summary_function = sf_stress)
of_stress(parMin)
## [1] 0.7265393
of_stress(parMax)
## [1] 108.5529

As mentioned above, another kind of output function can be the evaluation of model performance. Here we will assume that performance in predicting soil water content is desired, and use a data set of ‘observed’ values (actually simulated values with Gaussian error) as reference:

data(exampleobs)
head(exampleobs)
##                  SWC      ETR   E_T1_54    E_T2_68 FMC_T1_54 FMC_T2_68
## 2001-01-01 0.3000484 2.213630 0.1405273 0.17762272  114.5560  80.86158
## 2001-01-02 0.3021858 2.557506 0.3274197 0.31564033  114.5929  81.10532
## 2001-01-03 0.3018332 1.028869 0.2427104 0.17259014  114.8712  80.78809
## 2001-01-04 0.3008440 1.865832 0.1386069 0.09992405  114.4544  81.17683
## 2001-01-05 0.3014843 1.922079 0.4472215 0.22262168  114.8581  81.11235
## 2001-01-06 0.3016900 2.426368 0.1628993 0.26955241  114.5257  80.63737

where soil water content dynamics are given in column SWC. The model fit to observed data can be measured using the Nash-Sutcliffe efficiency coefficient (NSE), which we calculate for the initial run using function evaluation_metric():

evaluation_metric(S1, measuredData = exampleobs, type = "SWC", 
                  metric = "NSE")
## [1] 0.9324636
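For reference, the Nash-Sutcliffe efficiency compares the squared errors of the simulation with the variance of the observations: \(NSE = 1 - \sum_t (o_t - s_t)^2 / \sum_t (o_t - \bar{o})^2\). A value of 1 indicates a perfect fit, and 0 a fit no better than the observed mean. The minimal base R sketch below computes it for two vectors. The helper nse() is illustrative; evaluation_metric() may differ in how it matches dates and handles missing values.

```r
# Nash-Sutcliffe efficiency for paired observed/simulated vectors (illustrative helper)
nse <- function(obs, sim) {
  ok <- !is.na(obs) & !is.na(sim)   # use only days with both values
  obs <- obs[ok]
  sim <- sim[ok]
  1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)
}

obs <- c(0.30, 0.28, 0.25, 0.27)      # hypothetical soil water content values
nse(obs, obs)                         # 1: perfect fit
nse(obs, rep(mean(obs), 4))           # 0: no better than the observed mean
nse(obs, c(0.29, 0.28, 0.26, 0.27))   # about 0.846
```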

A call to evaluation_metric() provides the coefficient given a model simulation result, but is not a model output function as we defined above. Analogously to the measures of total transpiration and average plant stress, we can use a function factory to define a model output function that takes input factors as inputs, runs the model and performs the evaluation:

of_eval<-optimization_evaluation_function(parNames = parNames,
                x = x1, soil = examplesoil1,
                meteo = examplemeteo, latitude = 41.82592, elevation = 100,
                measuredData = exampleobs, type = "SWC", 
                metric = "NSE")

Function of_eval() stores internally both the data needed for conducting simulations and the data needed for evaluating simulation results, so that we only need to provide values for the input factors:

of_eval(parMin)
## [1] 0.9128893
of_eval(parMax)
## [1] 0.8316012

Global sensitivity analyses

Sensitivity analyses are referred to as local or global depending on whether the variation of input factors is studied with respect to some initial parameter set (local) or the whole space of input factors is taken into account (global). Here we will conduct global sensitivity analyses using package sensitivity (Iooss et al. 2020):

library(sensitivity)
## Registered S3 method overwritten by 'sensitivity':
##   method    from 
##   print.src dplyr

This package provides a suite of approaches to global sensitivity analysis. Among them, we will follow the Elementary Effect Test implemented in function morris(). We call this function to analyze sensitivity of total transpiration simulated by spwb() to input factors (500 runs are done, so be patient):

sa_transp <- morris(of_transp, parNames, r = 50, 
             design = list(type = "oat", levels = 10, grid.jump = 3), 
             binf = parMin, bsup = parMax, scale=TRUE, verbose=FALSE)

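Apart from the sampling design used to explore the input factor space, the call to morris() includes the model output function (in our case of_transp), the parameter names and the parameter value boundaries (i.e. parMin and parMax). With the one-at-a-time ("oat") design, each of the r = 50 trajectories requires one run for its starting point plus one run per input factor, i.e. 50 × (9 + 1) = 500 model runs.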

print(sa_transp)
## 
## Call:
## morris(model = of_transp, factors = parNames, r = 50, design = list(type = "oat",     levels = 10, grid.jump = 3), binf = parMin, bsup = parMax,     scale = TRUE, verbose = FALSE)
## 
## Model runs: 500 
##                           mu    mu.star     sigma
## T1_54/LAI_live     88.560833  97.432304 96.641969
## T2_68/LAI_live    115.428602 115.738104 95.599569
## S1_65/LAI_live    119.938300 119.938300 90.217958
## T1_54/Z50         -24.852219  33.057904 35.644799
## T2_68/Z50         -26.408773  33.898221 32.100491
## S1_65/Z50         -15.704972  17.950179 18.110924
## T1_54/Psi_Extract  -7.245209   7.245209  7.503915
## T2_68/Psi_Extract -10.524463  10.524463 11.323048
## S1_65/Psi_Extract  -5.432094   5.432094  5.443987

mu is the mean of the (signed) elementary effects of each \(i\)-th factor. mu.star is the mean of their absolute values and can be used to rank all the input factors. sigma, the standard deviation of the elementary effects, informs about nonlinear effects and the degree of interaction of the \(i\)-th factor with others. According to the result of this sensitivity analysis, leaf area index (LAI_live) parameters are the most relevant in determining total transpiration. They are much more relevant than fine root distribution (Z50) and the water potentials corresponding to whole-plant conductance reduction (i.e. Psi_Extract).

plot(sa_transp, xlim=c(0,200))
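To make the meaning of mu, mu.star and sigma concrete, the base R sketch below computes elementary effects for a toy function of three factors on the unit hypercube. It is a simplified, illustrative version of the one-at-a-time design, using only upward steps of size grid.jump / (levels - 1) in a random factor order. It is not the implementation of morris() in package sensitivity. As in the design above, each trajectory costs k + 1 evaluations of the function.

```r
# Simplified Morris elementary effects (illustrative, upward steps only)
morris_sketch <- function(f, k, r = 20, levels = 10, grid.jump = 3) {
  delta <- grid.jump / (levels - 1)               # step size on the unit interval
  grid <- seq(0, 1, length.out = levels)
  start_grid <- grid[grid <= 1 - delta + 1e-12]   # starting levels that leave room for +delta
  ee <- matrix(NA_real_, nrow = r, ncol = k)      # elementary effects: trajectories x factors
  for (j in seq_len(r)) {
    x <- sample(start_grid, k, replace = TRUE)
    y <- f(x)
    for (i in sample(k)) {                        # change one factor at a time, random order
      x_new <- x
      x_new[i] <- x[i] + delta
      y_new <- f(x_new)
      ee[j, i] <- (y_new - y) / delta
      x <- x_new
      y <- y_new
    }
  }
  data.frame(mu = colMeans(ee),                   # mean of signed effects
             mu.star = colMeans(abs(ee)),         # mean of absolute effects (ranking)
             sigma = apply(ee, 2, sd))            # spread: nonlinearity / interactions
}

# Toy model: linear in x1, nonlinear in x2, no effect of x3
toy <- function(x) 5 * x[1] + x[2]^2 + 0 * x[3]
set.seed(1)
res <- morris_sketch(toy, k = 3)
res
```

The linear factor x1 gets mu.star equal to 5 and a sigma of essentially zero. The effect of x2 depends on where it is evaluated, so its sigma is positive. x3 has no effect at all.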

We can run the same sensitivity analysis to identify the input factors most relevant for predicted plant drought stress (i.e. using of_stress as model output function):

sa_stress <- morris(of_stress, parNames, r = 50, 
             design = list(type = "oat", levels = 10, grid.jump = 3), 
             binf = parMin, bsup = parMax, scale=TRUE, verbose=FALSE)
print(sa_stress)
## 
## Call:
## morris(model = of_stress, factors = parNames, r = 50, design = list(type = "oat",     levels = 10, grid.jump = 3), binf = parMin, bsup = parMax,     scale = TRUE, verbose = FALSE)
## 
## Model runs: 500 
##                           mu   mu.star     sigma
## T1_54/LAI_live     187.85982 188.19217 109.29336
## T2_68/LAI_live     184.08751 184.08751 119.81780
## S1_65/LAI_live     157.01856 157.01856  97.04261
## T1_54/Z50           42.12187  43.49016  35.80496
## T2_68/Z50           59.57615  61.71122  61.79803
## S1_65/Z50           42.03816  44.59840  53.03594
## T1_54/Psi_Extract -114.42951 114.42951  84.05694
## T2_68/Psi_Extract  -82.73608  82.73608  65.15790
## S1_65/Psi_Extract  -61.00125  61.00125  58.56132

Again, LAI parameters are the most relevant, but they are closely followed by the water potentials corresponding to whole-plant conductance reduction (i.e. Psi_Extract). These appear more relevant than the parameters of fine root distribution (Z50).

plot(sa_stress, xlim=c(0,300))

Finally, we can study the contribution of input factors to model performance in terms of soil water content dynamics (i.e. using of_eval as model output function):

sa_eval <- morris(of_eval, parNames, r = 50, 
             design = list(type = "oat", levels = 10, grid.jump = 3), 
             binf = parMin, bsup = parMax, scale=TRUE, verbose=FALSE)
print(sa_eval)
## 
## Call:
## morris(model = of_eval, factors = parNames, r = 50, design = list(type = "oat",     levels = 10, grid.jump = 3), binf = parMin, bsup = parMax,     scale = TRUE, verbose = FALSE)
## 
## Model runs: 500 
##                           mu   mu.star     sigma
## T1_54/LAI_live    -56.971813 57.002351 69.319800
## T2_68/LAI_live    -36.656677 37.390494 52.514755
## S1_65/LAI_live    -24.121369 24.129905 28.433808
## T1_54/Z50          70.768482 70.798509 81.853942
## T2_68/Z50          60.500445 60.500445 71.047382
## S1_65/Z50          35.728383 35.739419 53.786266
## T1_54/Psi_Extract   4.315353  4.315353  9.302024
## T2_68/Psi_Extract   3.744051  3.744051  7.349683
## S1_65/Psi_Extract   1.425251  1.425251  2.988838

In contrast to the previous cases, the contribution of LAI parameters is similar to that of the fine root distribution parameters (Z50), and both are more relevant than the water potentials corresponding to whole-plant conductance reduction (i.e. Psi_Extract).

plot(sa_eval, xlim=c(0,100))

Calibration

By model calibration we mean here the process of finding suitable parameter values (or suitable parameter distributions) given a set of observations. Hence, the idea is to optimize the correspondence between model predictions and observations by changing model parameter values.

Defining parameter space and objective function

To simplify our analysis and avoid problems of parameter identifiability, we focus here on the calibration of parameter Z50 of fine root distribution. Below we redefine vectors parNames, parMin, and parMax; and we specify a vector of initial values.

#Parameter names of interest
parNames = c("T1_54/Z50", "T2_68/Z50", "S1_65/Z50")
#Parameter minimum and maximum values
parMin = c(100,100,100)
parMax = c(1000,1000,1000)
parIni = x1$below$Z50

In order to run calibration analyses we need to define an objective function. Many evaluation metrics could be used, but it is common practice to use likelihood functions. We can use the function factory optimization_evaluation_function() and the ‘observed’ data for this purpose. In this case, however, we specify a log-likelihood with Gaussian error as the evaluation metric for of_eval().

of_eval<-optimization_evaluation_function(parNames = parNames,
                x = x1, soil = examplesoil1,
                meteo = examplemeteo, latitude = 41.82592, elevation = 100,
                measuredData = exampleobs, type = "SWC", 
                metric = "loglikelihood")
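A Gaussian log-likelihood sums the log-density of each observation given the corresponding simulated value. The base R sketch below shows the calculation. The helper gaussian_loglik() is illustrative. In particular, the error standard deviation (either supplied, or estimated from the residuals) is an assumption that may not match how medfate's "loglikelihood" metric defines it.

```r
# Gaussian log-likelihood of observations given simulated values (illustrative helper)
gaussian_loglik <- function(obs, sim, sigma = NULL) {
  ok <- !is.na(obs) & !is.na(sim)
  res <- obs[ok] - sim[ok]
  if (is.null(sigma)) sigma <- sd(res)   # assumption: error sd estimated from residuals
  sum(dnorm(obs[ok], mean = sim[ok], sd = sigma, log = TRUE))
}

obs <- c(0.30, 0.28, 0.25)   # hypothetical observed soil water content
sim <- c(0.29, 0.28, 0.26)   # hypothetical simulated values
gaussian_loglik(obs, sim, sigma = 0.01)
```

For a fixed error standard deviation, larger values indicate a better fit. This is what allows the likelihood to be combined with a prior in the Bayesian analysis.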

Bayesian calibration

As an example of a more sophisticated model calibration, we will conduct a Bayesian calibration analysis using package BayesianTools (Hartig et al. 2019):

library(BayesianTools)

In a Bayesian analysis one evaluates how the uncertainty in model parameters changes (and is hopefully reduced) after observing some data, because observed values do not have the same likelihood under all regions of the parameter space. For a Bayesian analysis we need to specify a (log-)likelihood function and the prior distribution (i.e. the initial uncertainty) of the input factors. The central object in the BayesianTools package is the BayesianSetup. This class, created by calls to createBayesianSetup(), contains information about the model to be fit (likelihood) and the priors for model parameters. In the absence of previous data, we specify a uniform distribution between the minimum and maximum values. In the BayesianTools package, this can be done using function createUniformPrior():

prior <- createUniformPrior(parMin, parMax, parIni)
mcmc_setup <- createBayesianSetup(likelihood = of_eval, 
                                  prior = prior, 
                                  names = parNames)
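Up to a normalizing constant, the log-posterior that the sampler explores is the log-prior plus the log-likelihood. With a uniform prior, the log-prior is constant inside the bounds and \(-\infty\) outside them, so proposals outside [parMin, parMax] are always rejected. The base R sketch below shows that combination. log_uniform_prior(), log_posterior() and toy_loglik() are illustrative names, not BayesianTools functions.

```r
# Log-density of independent uniform priors on [lower, upper]
log_uniform_prior <- function(theta, lower, upper) {
  if (any(theta < lower | theta > upper)) return(-Inf)
  -sum(log(upper - lower))
}

# Unnormalized log-posterior: log-prior + log-likelihood
log_posterior <- function(theta, loglik, lower, upper) {
  lp <- log_uniform_prior(theta, lower, upper)
  if (!is.finite(lp)) return(-Inf)   # skip the (expensive) model run outside the prior
  lp + loglik(theta)
}

lower <- c(100, 100, 100)
upper <- c(1000, 1000, 1000)
toy_loglik <- function(theta) -sum((theta - 700)^2) / 1e4   # hypothetical stand-in
log_posterior(c(700, 700, 700), toy_loglik, lower, upper)   # equals -3 * log(900)
log_posterior(c(50, 700, 700), toy_loglik, lower, upper)    # -Inf (outside the prior)
```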

Function createBayesianSetup() automatically creates the posterior and various convenience functions for the Markov Chain Monte Carlo (MCMC) samplers. The runMCMC() function is the main wrapper for all other implemented MCMC functions. Here we call it with the DEzs sampler and the settings iterations = 1000 and nrChains = 9. The summary below reports nine chains of 1000 iterations each.

mcmc_out <- runMCMC(
  bayesianSetup = mcmc_setup, 
  sampler = "DEzs",
  settings = list(iterations = 1000, nrChains = 9))

By default runMCMC() uses parallel computation, but the calibration process is nevertheless rather slow. The runtime reported in the summary below is about 1620 seconds, i.e. roughly 27 minutes.
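The samplers wrapped by runMCMC() are more elaborate than a basic Metropolis algorithm. DEzs, for instance, is a differential evolution sampler that uses information from several chains to build proposals. The accept/reject logic is nevertheless similar in spirit. The base R sketch below is a plain random-walk Metropolis sampler for a hypothetical one-parameter target. It is shown only to illustrate that logic and is not what runMCMC() runs. metropolis_sketch() and log_post() are illustrative names.

```r
# Plain random-walk Metropolis sampler (illustration only; not the DEzs sampler)
metropolis_sketch <- function(log_post, start, n_iter = 2000, step = 0.5) {
  k <- length(start)
  chain <- matrix(NA_real_, nrow = n_iter, ncol = k)
  current <- start
  current_lp <- log_post(current)
  accepted <- 0
  for (it in seq_len(n_iter)) {
    proposal <- current + rnorm(k, sd = step)         # symmetric Gaussian proposal
    proposal_lp <- log_post(proposal)
    if (log(runif(1)) < proposal_lp - current_lp) {   # accept with prob. min(1, ratio)
      current <- proposal
      current_lp <- proposal_lp
      accepted <- accepted + 1
    }
    chain[it, ] <- current
  }
  list(chain = chain, acceptance = accepted / n_iter)
}

log_post <- function(theta) {
  if (theta < 0 || theta > 10) return(-Inf)   # uniform prior on [0, 10]
  dnorm(theta, mean = 4, sd = 1, log = TRUE)   # hypothetical log-likelihood
}
set.seed(123)
out <- metropolis_sketch(log_post, start = 5, n_iter = 5000)
mean(out$chain[-(1:500), 1])   # close to 4 after discarding burn-in
out$acceptance                 # counterpart of 1 - rejection rate in summary(mcmc_out)
```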

A summary function is provided to inspect convergence results and correlation between parameters:

summary(mcmc_out)
## # # # # # # # # # # # # # # # # # # # # # # # # # 
## ## MCMC chain summary ## 
## # # # # # # # # # # # # # # # # # # # # # # # # # 
##  
## # MCMC sampler:  DEzs 
## # Nr. Chains:  9 
## # Iterations per chain:  1000 
## # Rejection rate:  0.815 
## # Effective sample size:  381 
## # Runtime:  1621.505  sec. 
##  
## # Parameters
##             psf     MAP    2.5%  median   97.5%
## T1_54/Z50 1.043 627.839 580.046 741.939 989.060
## T2_68/Z50 1.035 889.719 569.645 734.803 981.448
## S1_65/Z50 1.030 480.428 121.669 555.615 965.331
## 
## ## DIC:  -2633.085 
## ## Convergence 
##  Gelman Rubin multivariate psrf:  1.049 
##  
## ## Correlations 
##           T1_54/Z50 T2_68/Z50 S1_65/Z50
## T1_54/Z50     1.000    -0.884    -0.097
## T2_68/Z50    -0.884     1.000    -0.066
## S1_65/Z50    -0.097    -0.066     1.000

According to the Gelman-Rubin diagnostic, convergence can be accepted because the multivariate potential scale reduction factor (1.049) is ≤ 1.1. We can plot the Markov chains and the posterior density distributions of the parameters that they generate using:

plot(mcmc_out)
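The potential scale reduction factor compares the variance between chains with the variance within chains. Values close to 1 indicate that the chains have mixed. The base R sketch below computes the basic univariate version for one parameter. gelman_rubin_psrf() is an illustrative helper. The value reported by summary() is a multivariate version and may include corrections omitted here.

```r
# Basic univariate potential scale reduction factor (no degrees-of-freedom correction)
gelman_rubin_psrf <- function(chains) {
  # chains: matrix with iterations in rows and one column per chain
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))     # mean within-chain variance
  B <- n * var(colMeans(chains))       # between-chain variance
  var_hat <- (n - 1) / n * W + B / n   # pooled estimate of the posterior variance
  sqrt(var_hat / W)
}

set.seed(42)
mixed <- matrix(rnorm(4000), ncol = 4)         # 4 chains sampling the same distribution
stuck <- sweep(mixed, 2, c(0, 1, 2, 3), "+")   # 4 chains centred on different values
gelman_rubin_psrf(mixed)   # close to 1
gelman_rubin_psrf(stuck)   # well above the 1.1 threshold
```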

We can also plot the marginal prior and posterior density distributions for each parameter. In this case, we see a similar posterior Z50 distribution for the two trees, which is more informative than the prior distribution. In contrast, the posterior distribution of Z50 for Kermes oak remains as uncertain as the prior one. This happens because the LAI value of Kermes oak is low, so it has little influence on soil water dynamics regardless of its root distribution.

marginalPlot(mcmc_out, prior = TRUE)

Plots can also be produced to display the correlation between parameter values.

correlationPlot(mcmc_out)

Here one can observe the strong negative correlation (-0.884 in the summary above) between the Z50 values of the two tree cohorts. Since their LAI values are similar, a similar effect on soil water depletion can be obtained, to some extent, by exchanging their fine root distributions.

Posterior model prediction distributions can be obtained by taking samples from the Markov chains and using them to perform simulations (here we use a sample size of 99, although a larger value is preferable).

s = getSample(mcmc_out, numSamples = 99)
head(s)
##      T1_54/Z50 T2_68/Z50 S1_65/Z50
## [1,]  957.5108  342.2688  475.9529
## [2,]  803.5055  250.0038  610.7945
## [3,]  979.6384  379.2092  951.8966
## [4,]  634.6976  910.1615  698.5915
## [5,]  161.8533  641.4018  286.4261
## [6,]  641.1469  758.0742  376.1948

For this purpose, medfate includes function multiple_runs(), which allows running a simulation model with a matrix of parameter values. For example, the following code runs spwb() with each combination of fine root distribution parameters (i.e. each row) in s.

MS = multiple_runs(s, x = x1, soil = examplesoil1, meteo = examplemeteo,
                   latitude = 41.82592, elevation = 100, verbose = FALSE)

Function multiple_runs() determines the model to be called by inspecting the class of x (here x1 is a spwbInput object). Once the simulations have been conducted, we can inspect the posterior distribution of several prediction variables, for example total transpiration:

plot(density(unlist(lapply(MS, sf_transp))), main = "Posterior transpiration", 
     xlab = "Total transpiration (mm)")

or average plant drought stress:

plot(density(unlist(lapply(MS, sf_stress))), 
     xlab = "Average plant stress", main="Posterior stress")

Finally, we can use object prior to generate another sample from the prior parameter distribution and perform simulations with it:

s_prior = prior$sampler(99)
colnames(s_prior)<- parNames
MS_prior = multiple_runs(s_prior, x = x1, soil = examplesoil1, meteo = examplemeteo,
                         latitude = 41.82592, elevation = 100, verbose = FALSE)
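For a uniform prior, drawing a prior sample amounts to drawing each parameter independently between its minimum and maximum. The base R sketch below is an illustrative equivalent of that step. sample_uniform_prior() is a hypothetical helper, not part of BayesianTools. It returns a matrix with one parameter set per row and named columns, the same shape as s_prior above.

```r
# Draw n parameter sets from independent uniform priors (illustrative helper)
sample_uniform_prior <- function(n, lower, upper, names = NULL) {
  draws <- vapply(seq_along(lower),
                  function(i) runif(n, min = lower[i], max = upper[i]),
                  numeric(n))
  draws <- matrix(draws, nrow = n)   # keep a matrix even when n == 1
  colnames(draws) <- names
  draws
}

set.seed(7)
s_toy <- sample_uniform_prior(99, lower = c(100, 100, 100), upper = c(1000, 1000, 1000),
                              names = c("T1_54/Z50", "T2_68/Z50", "S1_65/Z50"))
dim(s_toy)   # 99 3
head(s_toy, 3)
```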

and compare the prior prediction uncertainty with the posterior prediction uncertainty for the same output variables:

plot(density(unlist(lapply(MS_prior, sf_transp))), main = "Transpiration", 
     xlab = "Total transpiration (mm)",
     xlim = c(280,295), ylim = c(0,1.2))
lines(density(unlist(lapply(MS, sf_transp))), col = "red")
legend("topleft", legend = c("Prior", "Posterior"), 
       col = c("black", "red"), lty=1, bty="n")

plot(density(unlist(lapply(MS_prior, sf_stress))), main = "Plant stress", 
     xlab = "Average plant stress",
     xlim = c(0,30), ylim = c(0,0.3))
lines(density(unlist(lapply(MS, sf_stress))), col = "red")
legend("topleft", legend = c("Prior", "Posterior"), col = c("black", "red"), lty=1, bty="n")

References