This tutorial documents how to solve determinstic compartmental models in R
using the deSolve
package, which includes a solver for differential equations. The tutorial assumes a basic knowledge of both R
functionality and epidemic models.
To get started, load the deSolve
library in R
. If it is not installed, use the install.packages
function to obtain the latest version.
library(deSolve)
This package is loaded automatically when you load the EpiModel
package.
To start, we model a very basic SI model in a closed population (no births or deaths) to show the basic functionality of solving and analyzing deterministic models. Running ordinary differential equation (ODE) models in R requires three broad steps as follows.
Each mathematical model solved using deSolve
must have its own R
function. The function structure should be the essentially the same for each model:
prev <- I/N
).dS
below) but must be consistent with the output.deSolve
outputs to a data.frame
for your further analysis and visualization. This should include, at the least, all the differential equations that you have defined in the function. This is structured in a list, with the equation names listed first in a combined vector. Always list the equation objects first, in this combined vector, and in the same order they are written above. After the equation output, you can also output dynamic calculations from the model run. These should be named in the manner below to ensure that the output has the appropriate column names.SI <- function(t, t0, parms) {
with(as.list(c(t0, parms)), {
## Dynamic Calculations
N <- S+I
lambda <- rho*c*I/N
## Differential Equations
dS <- -lambda*S
dI <- lambda*S
## Output
list(c(dS, dI),
N = N,
lambda = lambda)
})
}
Similar to Stella, it is necessary to define the fixed parameters, the initial compartment sizes, and the run specifications related to time.
vector
like below, with names matching exactly to their use in the model function. vector
. These compartments must be in exacty the same order as the differential equations that solve their derivatives are ordered in the function output. If you are experiencing unexpected model results, this is the first thing to check.dt <- 1:25
).params <- c(c = 4,
rho = 0.2)
t0 <- c(S = 999,
I = 1)
dt <- seq(from = 0,
to = 25,
by = 0.25)
Up to here, we have only specified all the elements of the model, but not yet solved it. That requires a call to the ode
function in deSolve
. This function has arguments requiring inputs of the four objects that we have just created: the initial state sizes, the time vector, the model function, and the fixed parameters. We also specify the integration method for solving the system: there are several possibilities, but we prefer the Runge-Kutta 4 method, which is also the default in Madonna. Euler's method, which tends not to perform as well for rapidly changing epidemics but which is default in Stella, can be specified with method = 'euler'
.
df <- data.frame(
ode(y = t0, times = dt, func = SI, parms = params, method = 'rk4')
)
The ode
function is output to its own native object class, but we prefer to convert this to a data.frame
for further plotting and analysis.
The model output is saved to the df
object, which we can now view and analyze. As described in the R Tutorial, there are several ways to extract output from a data frame.
head(df, 20)
time S I N lambda
1 0.00 999.0 1.000 1000 0.0008000
2 0.25 998.8 1.221 1000 0.0009769
3 0.50 998.5 1.491 1000 0.0011929
4 0.75 998.2 1.821 1000 0.0014565
5 1.00 997.8 2.223 1000 0.0017782
6 1.25 997.3 2.714 1000 0.0021709
7 1.50 996.7 3.312 1000 0.0026499
8 1.75 996.0 4.043 1000 0.0032342
9 2.00 995.1 4.933 1000 0.0039468
10 2.25 994.0 6.019 1000 0.0048153
11 2.50 992.7 7.342 1000 0.0058736
12 2.75 991.0 8.953 1000 0.0071624
13 3.00 989.1 10.913 1000 0.0087308
14 3.25 986.7 13.298 1000 0.0106381
15 3.50 983.8 16.194 1000 0.0129552
16 3.75 980.3 19.709 1000 0.0157670
17 4.00 976.0 23.968 1000 0.0191741
18 4.25 970.9 29.120 1000 0.0232957
19 4.50 964.7 35.339 1000 0.0282711
20 4.75 957.2 42.828 1000 0.0342623
df$I[1:100]
[1] 1.000 1.221 1.491 1.821 2.223 2.714 3.312 4.043
[9] 4.933 6.019 7.342 8.953 10.913 13.298 16.194 19.709
[17] 23.968 29.120 35.339 42.828 51.819 62.573 75.383 90.561
[25] 108.437 129.340 153.578 181.413 213.022 248.466 287.653 330.304
[33] 375.942 423.894 473.323 523.281 572.777 620.857 666.676 709.548
[41] 748.982 784.686 816.556 844.643 869.118 890.239 908.311 923.662
[49] 936.623 947.508 956.610 964.194 970.493 975.711 980.026 983.587
[57] 986.522 988.938 990.925 992.558 993.899 994.999 995.902 996.642
[65] 997.249 997.747 998.154 998.488 998.762 998.986 999.170 999.320
[73] 999.443 999.544 999.627 999.694 999.750 999.795 999.832 999.863
[81] 999.888 999.908 999.925 999.938 999.949 999.959 999.966 999.972
[89] 999.977 999.981 999.985 999.988 999.990 999.992 999.993 999.994
[97] 999.995 999.996 999.997 999.997
To plot the output, we use the plot
function to set up the plot, and graph the susceptible line. Next we need to also add the infected line, which requires a separate call to the lines
function.
par(mar=c(3.2,3.2,2,1), mgp=c(2,1,0))
plot(x=df$time, y=df$S, type='l', col='steelblue', lwd=3, xlab='Time', ylab='Prevalence')
lines(x=df$time, y=df$I, col='firebrick', lwd=3)
legend(x=22, y=500, legend=c('Sus', 'Inf'), lty=1, lwd=3, col=c('steelblue', 'firebrick'), cex=0.9, bty='n')
The next model is a reproduction of the variable mixing model in which we used the Q statistic to vary the propensity for mixing between high and low activity (in terms of their partner change rates) persons. For a sensitivity analysis, we vary the Q statistic from fully dissasortative mixing to partially dissasortative mixing to proportional (random) mixing to partially assortative mixing to fully assortative mixing.
The model function is specified in the same steps as the basic SI model, but with many more calculations needed for the flexible assortivity. Note that this is an SIS model in a closed population, so the equations for each group are balanced. Also note that we calculated the total disease prevalence and have included it as model output. Finally, as before, the order of the differential equation formulas and their corresponding output objects matches.
Qmod <- function(t, t0, parms) {
with(as.list(c(t0, parms)), {
## Dynamic Calculations ##
# Popsize
N.high <- S.high + I.high
N.low <- S.low + I.low
N <- N.high + N.low
prev <- (I.high+I.low)/N
# Contact rates
c.high <- (c.mean*N - c.low*N.low)/N.high
# mixing matrix calculations based on Q
g.hh <- ((c.high*N.high) + (Q*c.low*N.low)) / ((c.high*N.high) + (c.low*N.low))
g.lh <- 1 - g.hh
g.hl <- (1 - g.hh) * ((c.high*N.high) / (c.low*N.low))
g.ll <- 1 - g.hl
# prob that p is infected
p.high <- (g.hh*I.high/N.high)+(g.lh*I.low/N.low)
p.low <- (g.ll*I.low/N.low)+(g.hl*I.high/N.high)
# lambda
lambda.high <- rho * c.high * p.high
lambda.low <- rho * c.low * p.low
## Differential Equations ##
dS.high <- -lambda.high*S.high + nu*I.high
dI.high <- lambda.high*S.high - nu*I.high
dS.low <- -lambda.low*S.low + nu*I.low
dI.low <- lambda.low*S.low - nu*I.low
## Output ##
list(c(dS.high, dI.high,
dS.low, dI.low),
N = N,
prev = prev)
})
}
We want to vary the Q statistic from its minimum of -0.45 (calculation to derive that minimum not shown) to its maximum of 1. First we create a list of all the other model parameters that will not vary. Then we create five sets of parameters, adding a different value of Q to end.
params <- c(
c.mean = 2,
c.low = 1.4,
rho = 0.75,
nu = 6
)
p1 <- c(params, Q = -0.45)
p2 <- c(params, Q = -0.33)
p3 <- c(params, Q = 0)
p4 <- c(params, Q = 0.5)
p5 <- c(params, Q = 1)
The initial state sizes are a function of the total population and the proportion in each group, which we calculate ahead of time. The compartments are entered into the t0
vector in exactly the same order as their corresponding differential equations are listed. Finally, we specify our time step vector representing 25 years in approximate weekly intervals.
N.tot <- 20000000
prop.high <- 0.02
prop.low <- 0.98
t0 <- c(
S.high = N.tot*prop.high - 1,
I.high = 1,
S.low = N.tot*prop.low - 1,
I.low = 1
)
dt <- seq(1, 25, 0.02)
For a sensitivity analysis, one must run one model for each set of parameter values. Since we have five Q statistics, from which we create 5 parameter sets, we run the model 5 times, only varying the parms
input. Each model is saved out to a different object.
df1 <- data.frame(ode(y=t0, times=dt, func=Qmod, parms=p1, method='rk4'))
df2 <- data.frame(ode(y=t0, times=dt, func=Qmod, parms=p2, method='rk4'))
df3 <- data.frame(ode(y=t0, times=dt, func=Qmod, parms=p3, method='rk4'))
df4 <- data.frame(ode(y=t0, times=dt, func=Qmod, parms=p4, method='rk4'))
df5 <- data.frame(ode(y=t0, times=dt, func=Qmod, parms=p5, method='rk4'))
Another method to run sensitivity models that combines these steps into one easy process is to write your own function. As inputs to the function are the same inputs to the ode
function, but this allows for easy specification of the varying Q value.
Qsens <- function(params, Q, t0, Qmod) {
temp.params <- c(params, Q = Q)
df <- data.frame(ode(y=t0, times=dt, func=Qmod, parms=temp.params, method='rk4'))
return(df)
}
df1 <- Qsens(params, Q = -0.45, t0, Qmod)
df2 <- Qsens(params, Q = -0.33, t0, Qmod)
df3 <- Qsens(params, Q = 0, t0, Qmod)
df4 <- Qsens(params, Q = 0.5, t0, Qmod)
df5 <- Qsens(params, Q = 1, t0, Qmod)
Each time the function is called, it creates a temporary parameter set by combining params with our specific Q value, then runs the ode based on that parameter set, returning the results in the desired data frame class object. Writing our own function to automate the process is that easy.
The RColorBrewer
library includes helpful functions to access visually appealing color palettes that are commonly used in geographic mapping, but which make for nice line plots too. Here we load the library, specify a palette of 5 colors, and then set some plot margin options.
library(RColorBrewer)
pal <- brewer.pal(5, 'Set1')
par(mar=c(3.2,3.2,2,1), mgp=c(2,1,0))
Finally, we plot each line separately, using a different color from the palette (which is just a vector so elements may be accessed with standard indexing).
plot(df1$time, df1$prev, type='l', ylim=c(0,0.04), col=pal[1], lwd=3, xlab='Time', ylab='Prevalence')
lines(df2$time, df2$prev, col=pal[2], lwd=3)
lines(df3$time, df3$prev, col=pal[3], lwd=3)
lines(df4$time, df4$prev, col=pal[4], lwd=3)
lines(df5$time, df5$prev, col=pal[5], lwd=3)
legend('topright', legend=paste('Q =', c(-0.45, -0.33, 0, 0.5, 1)), lty=1, lwd=3, col=pal, cex=0.9, bg='white')
This next example considers the effect of different starting times for an intervention (or vaccine administration, etc.) that has the effect of reducing the \( \rho \) parameter (transmission probability per contact). This is an important scientific question because the population-level effectiveness of an intervention with a fixed individual-level efficacy dramatically depends on the stage of the epidemic when the intervention is introduced. The other factors that are quite important are the disease type (SI versus SIS) and the efficacy level of the intervention.
Below we show how to model this sort of intervention in an SI and SIS disease epidemic, with varying start times. We use a wrapper function, time.sens
, to facilitate easy running of various intervention efficacies and start times in the model. You can easily modify the baseline parameters (c
, rho
, and nu
), change the intervention start times in the start.times
vector, and also change the efficacy with the rel.haz
measure.
## Set a basic SI model
SI <- function(t, t0, parms) {
with(as.list(c(t0, parms)), {
## Dynamic Calculations
# Population size
N <- S+I
# Intervention start time: if time > start,
# then multiply lambda by relative hazard
if (t < start) {
lambda <- rho*c*I/N
} else {
lambda <- (rho*c*I/N) * rel.haz
}
## Differential Equations
dS <- -lambda*S
dI <- lambda*S
## Output
list(c(dS, dI),
N = N,
lambda = lambda)
})
}
## Try it with an SIS model
SIS <- function(t, t0, parms) {
with(as.list(c(t0, parms)), {
## Dynamic Calculations
# Population size
N <- S+I
# Intervention start time: if time > start,
# then multiply lambda by relative hazard
if (t < start) {
lambda <- rho*c*I/N
} else {
lambda <- (rho*c*I/N) * rel.haz
}
## Differential Equations
dS <- -lambda*S + nu*I
dI <- lambda*S - nu*I
## Output
list(c(dS, dI),
N = N,
lambda = lambda)
})
}
## This function wraps calcuations over a vector of start.times
time.sens <- function(mod, params, t0, dt, start.times, rel.haz) {
# Create an empty matrix for storing results
out <- matrix(rep(NA, max(dt)*length(start.times)), ncol=length(start.times))
# The base parameters are those fixed for all start times (which varies)
base.params <- c(params, rel.haz=rel.haz)
# Loop over all start times
for (i in 1:length(start.times)) {
temp.params <- c(base.params, start = start.times[i])
df <- data.frame(ode(y = t0, times = dt, func = mod, parms = temp.params, method = 'rk4'))
out[,i] <- df$I
}
# Set up a plot for each of the start times
pal <- rainbow(length(start.times))
plot(out[,1], type='n', col=pal[1], lwd=3, ylim=c(0,max(out)),
xlab='Time', ylab='Number Infected')
for (i in 1:ncol(out)) {
lines(x=dt, y=out[,i], col=pal[i], lwd=3)
}
legend('bottomright', paste('Start=', start.times, sep=''),
lty=1, col=pal, lwd=4, cex=0.8)
# Return the model output to a matrix
return(out)
}
## SI model tests
params <- c(c = 0.5,
rho = 0.1)
t0 <- c(S = 999,
I = 1)
dt <- 1:1000
start.times <- seq(50, 200, 50)
# Baseline model (intervention has no effect)
out <- time.sens(mod=SI, params, t0, dt, start.times, rel.haz=1)
# Counterfactual models (intervention reduces risk by 50%)
out <- time.sens(mod=SI, params, t0, dt, start.times, rel.haz=0.5)
## SIS model tests
params <- c(c = 0.5,
rho = 0.2,
nu = 0.05)
t0 <- c(S = 999,
I = 1)
dt <- 1:1000
start.times <- seq(50, 200, 50)
# Baseline model (intervention has no effect)
out <- time.sens(mod=SIS, params, t0, dt, start.times, rel.haz=1)
# Counterfacutal model (intervention reduces risk by 25%)
out <- time.sens(mod=SIS, params, t0, dt, start.times, rel.haz=0.75)