Uncertainty and Sensitivity Analysis

Overview

This app allows exploration of the concept of uncertainty and sensitivity analysis. For this purpose, we use the SIR model with demographics (also used in the stochastic SIR app and model exploration app).

The Model

Model Overview

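The model used here is the SIR model with births and deaths. It is also used and described in the stochastic SIR app. This model tracks susceptible, infected/infectious, and recovered hosts. The following compartments are included:

S: susceptible hosts
I: infected/infectious hosts
R: recovered hosts

The included processes/mechanisms are the following:

Susceptible hosts become infected through contact with infected hosts, at rate b.
Infected hosts recover, at rate g.
New susceptible hosts enter the system through births, at rate m.
Hosts in all compartments die naturally, at rate n.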

Model Implementation

The flow diagram for the model implemented in this app is:

Flow diagram for this model.

The deterministic model, implemented as a set of differential equations, is given by:

\[\dot S = m - bSI - nS\] \[\dot I = bSI - gI - nI\] \[\dot R = gI - nR\]

This is almost the same model as the basic SIR model from the introductory app, with the only difference that this model also allows natural births and deaths.
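The equations above can be turned into a small simulation. The following is a minimal sketch in plain Python, not the code the app uses: the function names, the fixed-step Runge-Kutta integrator, the step size, and all parameter values are assumptions chosen for illustration.

```python
def sir_rhs(state, b, g, m, n):
    """Right-hand side of the SIR model with births (m) and natural deaths (n)."""
    S, I, R = state
    dS = m - b * S * I - n * S
    dI = b * S * I - g * I - n * I
    dR = g * I - n * R
    return (dS, dI, dR)


def rk4_step(state, dt, b, g, m, n):
    """Advance the state by one classical Runge-Kutta step of size dt."""
    def shifted(base, slope, factor):
        return tuple(x + factor * s for x, s in zip(base, slope))

    k1 = sir_rhs(state, b, g, m, n)
    k2 = sir_rhs(shifted(state, k1, dt / 2), b, g, m, n)
    k3 = sir_rhs(shifted(state, k2, dt / 2), b, g, m, n)
    k4 = sir_rhs(shifted(state, k3, dt), b, g, m, n)
    return tuple(
        x + dt / 6 * (s1 + 2 * s2 + 2 * s3 + s4)
        for x, s1, s2, s3, s4 in zip(state, k1, k2, k3, k4)
    )


def simulate_sir(S0, I0, R0, b, g, m, n, tmax=100.0, dt=0.05):
    """Return (times, states); states is a list of (S, I, R) tuples."""
    n_steps = int(round(tmax / dt))
    state = (float(S0), float(I0), float(R0))
    times, states = [0.0], [state]
    for step in range(1, n_steps + 1):
        state = rk4_step(state, dt, b, g, m, n)
        times.append(step * dt)
        states.append(state)
    return times, states


# Illustrative values only: m / n = 1000 matches the initial population size,
# so births and deaths balance and the total population stays constant.
times, states = simulate_sir(S0=999, I0=1, R0=0, b=0.002, g=1.0, m=20.0, n=0.02)
S_end, I_end, R_end = states[-1]
print(f"t = {times[-1]:.0f}: S = {S_end:.1f}, I = {I_end:.1f}, R = {R_end:.1f}")
```

A fixed step size keeps the sketch dependency-free. For parameter combinations with very fast dynamics, a smaller step or an adaptive ODE solver would be the safer choice.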

Uncertainty and Sensitivity analysis

Often, for a given system we want to model, we only have rough estimates for the model parameters and starting values. Instead of specifying fixed values (which results in a single time-series), we can specify parameter ranges, choose sets of parameter values from these ranges, and run the model for multiple sets of parameters.

The simplest way of specifying parameter ranges is to set an upper and lower bound (based on what we know about the biology of the system) and randomly choose any value within those bounds. We can almost always set bounds even if we know very little about a system. Assume we want to model the duration of the infectious period for some disease in humans. We might not know much, but we can still be fairly confident that it’s longer than, say, 1 hour and less than 100 years. That’s of course a wide range, and we should and usually can narrow ranges further, based on biological knowledge of a given system.

If we are fairly certain that values are close to some quantity, instead of specifying a uniform distribution, we can choose one that is more peaked around the most likely value. Normal distributions are not ideal since they allow negative values, which doesn’t make sense for our parameters. The gamma distribution is a better idea, since it leads to only positive values.

To run the model for this app, we need to specify values for the initial conditions and model parameters. Initial conditions and all parameters are sampled uniformly between the specified upper and lower bounds, apart from the recovery rate, which is drawn from a gamma distribution with user-specified mean and variance. For this teaching app, there is no biological reason for making this parameter different. I just picked one parameter and decided to make it non-uniformly distributed to show you different ways one can implement distributions from which to draw parameter samples.
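To make the gamma option concrete: a gamma distribution with shape k and scale θ has mean kθ and variance kθ², so a given mean and variance translate to k = mean²/variance and θ = variance/mean. The sketch below shows that conversion with Python's standard `random` module. The function names and the numbers are illustrative assumptions, not the app's implementation.

```python
import random


def gamma_shape_scale(mean, variance):
    """Convert a mean and variance into gamma shape and scale parameters."""
    shape = mean ** 2 / variance
    scale = variance / mean
    return shape, scale


def sample_recovery_rate(mean, variance, rng):
    """Draw one strictly positive value from the matching gamma distribution."""
    shape, scale = gamma_shape_scale(mean, variance)
    # random.gammavariate takes (shape, scale), in that order.
    return rng.gammavariate(shape, scale)


rng = random.Random(123)  # fixed seed so the draws are reproducible
draws = [sample_recovery_rate(2.0, 0.5, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
print(f"smallest draw: {min(draws):.3f}, sample mean: {sample_mean:.3f}")
```

The draws cluster around the requested mean and never fall below zero, which is the property that makes the gamma distribution a better fit than a normal distribution for rate parameters.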

The way the samples are drawn could be done completely randomly, but that would lead to inefficient sampling. A smarter method exists, known as Latin Hypercube sampling (LHS). It essentially ensures that we sample the full range of possible parameter combinations in an efficient manner. For more technical details, see e.g. (Saltelli et al. 2004). For this app, we use LHS.
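The idea behind LHS can be sketched in a few lines. For each parameter, the range is split into as many equal-width strata as there are samples, one value is drawn inside each stratum, and the order is shuffled independently per parameter. The code below is a simplified illustration for uniformly distributed parameters only. The function name, the parameter names, and the bounds are assumptions, and the app itself may rely on a dedicated library instead.

```python
import random


def latin_hypercube(n_samples, bounds, rng):
    """Draw n_samples parameter sets; bounds maps name -> (low, high)."""
    columns = {}
    for name, (low, high) in bounds.items():
        # One point per stratum [k/n, (k+1)/n) on the unit interval.
        points = [(k + rng.random()) / n_samples for k in range(n_samples)]
        # Shuffle so that strata are paired at random across parameters.
        rng.shuffle(points)
        columns[name] = [low + p * (high - low) for p in points]
    return [
        {name: columns[name][i] for name in bounds}
        for i in range(n_samples)
    ]


rng = random.Random(42)
bounds = {"b": (0.001, 0.003), "n": (0.01, 0.03)}  # hypothetical ranges
samples = latin_hypercube(5, bounds, rng)
for sample in samples:
    print(sample)
```

With 5 samples, each fifth of every parameter range contains exactly one value, which purely random sampling does not guarantee. A non-uniform parameter, such as the gamma-distributed recovery rate, would typically be handled by passing the stratified unit-interval points through the inverse cumulative distribution function, which is left out of this sketch.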

Once we specify the ranges for each parameter, the sampling method, and the number of samples, the simulation draws that many samples, runs the model for each sample, and records outcomes of interest. While the underlying simulation returns a time-series for each sample, we are usually not interested in the full time-series. Instead, we are interested in some summary quantity. For instance in this model, we might be interested in the maximum and final number of infected and final number of susceptible. This app records and reports those 3 quantities as Ipeak, Ifinal and Sfinal.
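Reducing a time-series to the three summary quantities is straightforward. The following sketch assumes the simulation output is available as two lists of equal length, one for S and one for I. The function name and the toy numbers are made up for illustration.

```python
def summarize_run(S_series, I_series):
    """Reduce one simulated time-series to the three recorded outcomes."""
    return {
        "Ipeak": max(I_series),    # maximum number of infected
        "Ifinal": I_series[-1],    # infected at the end of the simulation
        "Sfinal": S_series[-1],    # susceptible at the end of the simulation
    }


# Toy time-series standing in for one simulation run.
S_series = [999.0, 800.0, 450.0, 500.0, 510.0]
I_series = [1.0, 120.0, 160.0, 30.0, 25.0]
outcomes = summarize_run(S_series, I_series)
print(outcomes)
```

Running this once per parameter sample gives one row of outcomes per sample, which is the table that the uncertainty and sensitivity analyses work on.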

Results from such simulations for multiple samples can be analyzed in different ways. The most basic one, called uncertainty analysis, only asks what level of uncertainty we have in our outcomes of interest, given the amount of uncertainty in our model parameter values. This can be graphically represented with a boxplot, which is one of the plot options for this app.
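The numbers behind a boxplot are the minimum, the quartiles, and the maximum of an outcome across all samples. As a rough sketch, they can be computed with Python's `statistics` module. The function name and the Ipeak values below are invented for illustration, and plotting tools may use slightly different quartile conventions.

```python
import statistics


def five_number_summary(values):
    """Return the quantities a boxplot displays for one outcome."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


# Hypothetical Ipeak values, one per parameter sample.
ipeak_samples = [10.0, 20.0, 30.0, 40.0, 50.0]
summary = five_number_summary(ipeak_samples)
print(summary)
```

A wide spread between the quartiles signals that the uncertainty in the parameters translates into large uncertainty in that outcome.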

In a next step, we can ask ‘how sensitive are the outcome(s) of interest to variation in specific parameters?’ That part is the sensitivity analysis. When you run the simulations, you essentially do both uncertainty and sensitivity analysis at the same time. It’s just a question of how you further process the results. We can graphically inspect the relation between an outcome and some parameter with scatterplots. If the scatterplot shows a monotone increasing or decreasing trend (or no trend at all) between parameter and outcome, we can also summarize the finding using a correlation coefficient. For this type of analysis, the Spearman rank correlation coefficient is useful, and it is what the app reports below the figures.
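The Spearman rank correlation is the ordinary (Pearson) correlation computed on the ranks of the values rather than on the values themselves, which is why it captures any monotone relation and not only linear ones. The sketch below implements it with the standard library only. The function names and the example data are assumptions for illustration.

```python
def average_ranks(values):
    """Rank values from 1 upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (c - mean_y) for a, c in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((c - mean_y) ** 2 for c in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# A nonlinear but strictly increasing relation gives a coefficient of 1.
param = [0.1, 0.2, 0.3, 0.4, 0.5]
outcome_up = [p ** 3 for p in param]
outcome_down = [1.0 / p for p in param]
print(spearman(param, outcome_up), spearman(param, outcome_down))
```

A value near 1 or -1 indicates a strong increasing or decreasing relation, and a value near 0 indicates no monotone relation.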

A note on randomness in computer simulations

This simulation (as well as some of the others) involves sampling. This leads to some level of randomness. In science, we want to be as reproducible as possible. Fortunately, random numbers on a computer are not completely random, but can be reproduced. In practice, this is done by specifying a random number seed, in essence a starting position for the algorithm to produce pseudo-random numbers. As long as the seed is the same, the code should produce the same pseudo-random numbers each time, thus ensuring reproducibility.
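The effect of a seed is easy to demonstrate. In this small sketch, which uses Python's standard `random` module with a made-up helper function and arbitrary seed values, the same seed reproduces the same draws while a different seed produces different ones.

```python
import random


def draw_parameters(seed, n=3):
    """Draw n pseudo-random values between 0 and 1 using the given seed."""
    rng = random.Random(seed)
    return [rng.uniform(0.0, 1.0) for _ in range(n)]


same_a = draw_parameters(100)
same_b = draw_parameters(100)
different = draw_parameters(101)

print(same_a == same_b)     # identical seed, identical values
print(same_a == different)  # different seed, different values
```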

What to do

First, familiarize yourself with the setup of the app. It looks different from most others. Parameters are not set to specific values. Instead, most parameters have a lower and upper bound. For each simulation that is run, random values for each parameter are chosen uniformly between those bounds. The parameter g does not have a uniform distribution but a gamma distribution. You can specify its mean and variance to determine the distribution from which values are sampled.

For the purpose of uncertainty and sensitivity analysis, starting values for variables can be treated like parameters. For this app, you can vary the starting values for susceptibles and infected. The initial number of recovered is fixed at 0.

The default outcome plots are boxplots, which show the distribution of the 3 outcomes of interest for the different parameter samples. You can set the number of samples you want to run. Samples are constructed using the Latin hypercube method to efficiently span the space of possible parameter values. In general, more samples are better, but of course they take longer to run.

Task 1

Since the creation of parameter samples involves some element of uncertainty, we need to make use of random numbers. We still want results to be reproducible. That’s where the random number seed comes in. As long as the seed is the same, the code should produce the same pseudo-random numbers each time, thus ensuring reproducibility. Let’s explore this.

Note that each sample means one simulation of the underlying dynamical model, so as the number of samples increases, things slow down. Also note the ‘system might not have reached steady state’ message. If steady state has not been reached for too many of the samples, the results for Sfinal and Ifinal do not reflect steady-state values. Increasing the simulation time can help the system reach a steady state (if there is one). For some parameter combinations, that can take very long.

Task 2

Task 3

Task 4

Task 5

The above approach of exploring the impact of a parameter on results by varying bounds is tedious. Also, we often have bounds that are specified by biology and are not subject to change. It would still be useful to know how a given parameter impacts the results. This is where sensitivity analysis comes in. We run the same simulations, but now instead of plotting outcomes as a boxplot, we produce scatterplots for outcomes as a function of each varied parameter.

Task 6

Since our model is rather simple, we can actually determine relations between parameters and some of the outcomes analytically. Specifically, it is possible to compute the steady state values for S and I. If you don’t know what steady states are and how to compute them, go through the “patterns of ID” and/or “model exploration” apps, where this is explained.
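As a sketch of that calculation, using the model equations given earlier: at steady state all time derivatives are zero. Setting the equation for I to zero and assuming I is not zero gives the steady-state value of S, and substituting that into the equation for S gives the steady-state value of I. The star notation is introduced here for illustration.

```latex
\[0 = bS^*I^* - gI^* - nI^* \quad\Rightarrow\quad S^* = \frac{g+n}{b}\]
\[0 = m - bS^*I^* - nS^* \quad\Rightarrow\quad I^* = \frac{m - nS^*}{bS^*} = \frac{m}{g+n} - \frac{n}{b}\]
```

This steady state with infection present only makes biological sense if the expression for I is positive, that is, if bm > n(g+n). Otherwise the infection dies out and S approaches m/n.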

Task 7

Further Information

References

Hoare, Alexander, David G. Regan, and David P. Wilson. 2008. “Sampling and Sensitivity Analyses Tools (SaSAT) for Computational Modelling.” Theor. Biol. Med. Model. 5: 4. https://doi.org/10.1186/1742-4682-5-4.

Marino, Simeone, Ian B. Hogue, Christian J. Ray, and Denise E. Kirschner. 2008. “A Methodology for Performing Global Uncertainty and Sensitivity Analysis in Systems Biology.” J. Theor. Biol. 254 (1): 178–96. https://doi.org/10.1016/j.jtbi.2008.04.011.

Saltelli, A., Stefano Tarantola, Francesca Campolongo, and Marco Ratto. 2004. Sensitivity Analysis in Practice: A Guide to Assessing Scientific Models. 1st ed. Wiley.