Extending DCMs Beyond EpiModel


This tutorial documents how to solve deterministic compartmental models in R using the deSolve package, which provides solvers for systems of differential equations. The tutorial assumes a basic knowledge of both R functionality and epidemic models.

Library

To get started, load the deSolve library in R. If it is not installed, use the install.packages function to obtain the latest version.
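For example, a minimal sketch of that one-time installation step, which checks for the package before installing:

if (!requireNamespace('deSolve', quietly = TRUE)) {
  install.packages('deSolve')
}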

library(deSolve)

This package is loaded automatically when you load the EpiModel package.

Basic SI Model

To start, we solve a very basic SI model in a closed population (no births or deaths) to show the basic functionality of solving and analyzing deterministic models. Running ordinary differential equation (ODE) models in R requires three broad steps, as follows.

Model Function

Each mathematical model solved using deSolve must have its own R function. The function structure should be essentially the same for each model:

  1. It must include the overall function structure, including the two lines at the top and the two lines at the bottom. These lines should not be changed; they pass the initial conditions and fixed parameters into the function to be evaluated over time.
  2. Next, include all dynamic calculations, including varying parameters like lambda (the force of infection), which changes over time as the number infected changes. Here, we include the calculation of N only as an example, because we also want it in the model output; it is not actually dynamic in this model since the population is closed. Dynamic calculations also include composite calculations (similar to converters in Stella) that you would like to output with the model (e.g., prev <- I/N).
  3. The differential equations should be specified next, very similarly to how they are written in Stella or Madonna software. The names of the differential equations are arbitrary (e.g., dS below) but must be consistent with the output.
  4. The output is what deSolve writes to a data.frame for your further analysis and visualization. It should include, at a minimum, all the differential equations that you have defined in the function. The output is structured as a list, with the equation objects combined into a single vector listed first, in exactly the same order in which the equations are written above. After the equation output, you can also output dynamic calculations from the model run; these should be named in the manner below to ensure that the output has the appropriate column names.
SI <- function(t, t0, parms) {
  with(as.list(c(t0, parms)), {    

    ## Dynamic Calculations
    N <- S+I
    lambda <- rho*c*I/N

    ## Differential Equations
    dS <- -lambda*S
    dI <- lambda*S 

    ## Output
    list(c(dS, dI), 
         N = N, 
         lambda = lambda)
  })
}
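In equation form, this function encodes \( dS/dt = -\lambda S \) and \( dI/dt = \lambda S \), where the force of infection is \( \lambda = \rho c I / N \).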

Parameters, Initial States, and Time

Similar to Stella, it is necessary to define the fixed parameters, the initial compartment sizes, and the run specifications related to time.

  1. Parameters should go in a vector like the one below, with names that exactly match their use in the model function.
  2. The initial conditions should also go in a vector. These compartments must be listed in exactly the same order as their corresponding differential equations appear in the function output. If you are experiencing unexpected model results, this is the first thing to check.
  3. Time should be specified as a vector of numbers with a starting time, stopping time, and interval. Below we specify time from \( t_0 \) to \( t_{25} \) in increments of quarter time units. If full time units are sufficient, the sequential integer notation may be used (e.g., dt <- 0:25).
params <- c(c = 4,
            rho = 0.2)
t0 <- c(S = 999, 
        I = 1)
dt <- seq(from = 0, 
          to = 25, 
          by = 0.25)

Running the Model

To this point, we have only specified the elements of the model; we have not yet solved it. That requires a call to the ode function in deSolve. This function has arguments requiring the four objects that we have just created: the initial state sizes, the time vector, the model function, and the fixed parameters. We also specify the integration method for solving the system: there are several possibilities, but we prefer the Runge-Kutta 4 method, which is also the default in Madonna. Euler's method, which tends not to perform as well for rapidly changing epidemics but which is the default in Stella, can be specified with method = 'euler'.

df <- data.frame(
  ode(y = t0, times = dt, func = SI, parms = params, method = 'rk4')
)

The ode function returns an object of its own native class, but we prefer to convert it to a data.frame for further plotting and analysis.
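If you prefer to skip the conversion, the object returned by ode also has print and plot methods of its own; a minimal sketch (mod is an assumed name, not used elsewhere in this tutorial):

mod <- ode(y = t0, times = dt, func = SI, parms = params, method = 'rk4')
class(mod)   # the native deSolve object (a matrix with extra attributes)
plot(mod)    # quick diagnostic plots, one panel per output column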

Extracting and Plotting Output

The model output is saved to the df object, which we can now view and analyze. As described in the R Tutorial, there are several ways to extract output from a data frame.

head(df, 20)
   time     S      I    N    lambda
1  0.00 999.0  1.000 1000 0.0008000
2  0.25 998.8  1.221 1000 0.0009769
3  0.50 998.5  1.491 1000 0.0011929
4  0.75 998.2  1.821 1000 0.0014565
5  1.00 997.8  2.223 1000 0.0017782
6  1.25 997.3  2.714 1000 0.0021709
7  1.50 996.7  3.312 1000 0.0026499
8  1.75 996.0  4.043 1000 0.0032342
9  2.00 995.1  4.933 1000 0.0039468
10 2.25 994.0  6.019 1000 0.0048153
11 2.50 992.7  7.342 1000 0.0058736
12 2.75 991.0  8.953 1000 0.0071624
13 3.00 989.1 10.913 1000 0.0087308
14 3.25 986.7 13.298 1000 0.0106381
15 3.50 983.8 16.194 1000 0.0129552
16 3.75 980.3 19.709 1000 0.0157670
17 4.00 976.0 23.968 1000 0.0191741
18 4.25 970.9 29.120 1000 0.0232957
19 4.50 964.7 35.339 1000 0.0282711
20 4.75 957.2 42.828 1000 0.0342623
df$I[1:100]
  [1]   1.000   1.221   1.491   1.821   2.223   2.714   3.312   4.043
  [9]   4.933   6.019   7.342   8.953  10.913  13.298  16.194  19.709
 [17]  23.968  29.120  35.339  42.828  51.819  62.573  75.383  90.561
 [25] 108.437 129.340 153.578 181.413 213.022 248.466 287.653 330.304
 [33] 375.942 423.894 473.323 523.281 572.777 620.857 666.676 709.548
 [41] 748.982 784.686 816.556 844.643 869.118 890.239 908.311 923.662
 [49] 936.623 947.508 956.610 964.194 970.493 975.711 980.026 983.587
 [57] 986.522 988.938 990.925 992.558 993.899 994.999 995.902 996.642
 [65] 997.249 997.747 998.154 998.488 998.762 998.986 999.170 999.320
 [73] 999.443 999.544 999.627 999.694 999.750 999.795 999.832 999.863
 [81] 999.888 999.908 999.925 999.938 999.949 999.959 999.966 999.972
 [89] 999.977 999.981 999.985 999.988 999.990 999.992 999.993 999.994
 [97] 999.995 999.996 999.997 999.997
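
As one further example, standard indexing combined with a logical test finds the approximate time at which infections first exceed half the population; a minimal sketch (the threshold of 500 is half of N = 1000):

# First time point at which the number infected exceeds half the population
df$time[min(which(df$I > 500))]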

To plot the output, we use the plot function to set up the plot and draw the susceptible line. We then add the infected line, which requires a separate call to the lines function.

par(mar=c(3.2,3.2,2,1), mgp=c(2,1,0))
plot(x=df$time, y=df$S, type='l', col='steelblue', lwd=3, xlab='Time', ylab='Prevalence')
lines(x=df$time, y=df$I, col='firebrick', lwd=3)
legend(x=22, y=500, legend=c('Sus', 'Inf'), lty=1, lwd=3, col=c('steelblue', 'firebrick'), cex=0.9, bty='n')

[Figure: susceptible and infected prevalence over time for the basic SI model]
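
An alternative that draws both compartments in a single call is base R's matplot; a minimal sketch reproducing the figure above:

matplot(x = df$time, y = df[, c('S', 'I')], type = 'l', lty = 1, lwd = 3,
        col = c('steelblue', 'firebrick'), xlab = 'Time', ylab = 'Prevalence')
legend('right', legend = c('Sus', 'Inf'), lty = 1, lwd = 3,
       col = c('steelblue', 'firebrick'), cex = 0.9, bty = 'n')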

Two-Group Variable Mixing Model with Sensitivity Analyses

The next model is a reproduction of the variable mixing model in which we used the Q statistic to vary the propensity for mixing between high- and low-activity persons (defined by their partner change rates). For a sensitivity analysis, we vary the Q statistic from fully disassortative mixing, to partially disassortative mixing, to proportional (random) mixing, to partially assortative mixing, to fully assortative mixing.

Model Function

The model function is specified in the same steps as the basic SI model, but with many more calculations needed for the flexible assortativity. Note that this is an SIS model in a closed population, so the equations for each group are balanced. Also note that we calculate the total disease prevalence and include it as model output. Finally, as before, the order of the differential equation formulas matches the order of their corresponding output objects.

Qmod <- function(t, t0, parms) {
  with(as.list(c(t0, parms)), {    

    ## Dynamic Calculations ##

    # Popsize
    N.high <- S.high + I.high
    N.low <- S.low + I.low
    N <- N.high + N.low
    prev <- (I.high+I.low)/N

    # Contact rates
    c.high <- (c.mean*N - c.low*N.low)/N.high

    # mixing matrix calculations based on Q
    g.hh <- ((c.high*N.high) + (Q*c.low*N.low)) / ((c.high*N.high) + (c.low*N.low))
    g.lh <- 1 - g.hh 
    g.hl <- (1 - g.hh) * ((c.high*N.high) / (c.low*N.low))
    g.ll <- 1 - g.hl

    # prob that p is infected
    p.high <- (g.hh*I.high/N.high)+(g.lh*I.low/N.low)
    p.low <- (g.ll*I.low/N.low)+(g.hl*I.high/N.high)

    # lambda
    lambda.high <- rho * c.high * p.high
    lambda.low <- rho * c.low * p.low


    ## Differential Equations ##
    dS.high <- -lambda.high*S.high + nu*I.high
    dI.high <- lambda.high*S.high - nu*I.high

    dS.low <- -lambda.low*S.low + nu*I.low
    dI.low <- lambda.low*S.low - nu*I.low


    ## Output ##
    list(c(dS.high, dI.high,
           dS.low, dI.low), 
           N = N,
           prev = prev)
  })
}
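In the dynamic calculations above, the high-group contact rate is back-calculated so that the overall mean contact rate stays fixed at c.mean, i.e. \( c_{high} = (c_{mean} N - c_{low} N_{low}) / N_{high} \), and the mixing matrix entries (g.hh, g.lh, g.hl, g.ll) are derived from the chosen Q value, with each group's contacts constrained to sum to one.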

Sensitivity Parameters for Q

We want to vary the Q statistic from its minimum of -0.45 (calculation to derive that minimum not shown) to its maximum of 1. First we create a vector of all the other model parameters that will not vary. Then we create five parameter sets, appending a different value of Q to the end of each.

params <- c(
  c.mean = 2,
  c.low = 1.4,
  rho = 0.75,
  nu = 6
)
p1 <- c(params, Q = -0.45)
p2 <- c(params, Q = -0.33)
p3 <- c(params, Q = 0)
p4 <- c(params, Q = 0.5)
p5 <- c(params, Q = 1)
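
If many more values of Q were of interest, the parameter sets could also be built programmatically rather than by hand; a minimal sketch (Q.vals and param.sets are assumed names, not used elsewhere in this tutorial):

Q.vals <- c(-0.45, -0.33, 0, 0.5, 1)
param.sets <- lapply(Q.vals, function(q) c(params, Q = q))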

Initial States and Time

The initial state sizes are a function of the total population and the proportion in each group, which we calculate ahead of time. The compartments are entered into the t0 vector in exactly the same order as their corresponding differential equations are listed. Finally, we specify our time step vector representing 25 years in approximate weekly intervals.

N.tot <- 20000000
prop.high <- 0.02
prop.low <- 0.98

t0 <- c(
  S.high = N.tot*prop.high - 1,
  I.high = 1,
  S.low = N.tot*prop.low - 1,
  I.low = 1
)
dt <- seq(1, 25, 0.02)

Running Sensitivity Models

For a sensitivity analysis, one must run the model once for each set of parameter values. Since we have five values of Q, and therefore five parameter sets, we run the model five times, varying only the parms input. Each run is saved to a different object.

df1 <- data.frame(ode(y=t0, times=dt, func=Qmod, parms=p1, method='rk4'))
df2 <- data.frame(ode(y=t0, times=dt, func=Qmod, parms=p2, method='rk4'))
df3 <- data.frame(ode(y=t0, times=dt, func=Qmod, parms=p3, method='rk4'))
df4 <- data.frame(ode(y=t0, times=dt, func=Qmod, parms=p4, method='rk4'))
df5 <- data.frame(ode(y=t0, times=dt, func=Qmod, parms=p5, method='rk4'))

Alternative Method

Another way to run sensitivity models, which combines these steps into a single process, is to write your own function. The inputs to this function are largely the same as the inputs to the ode function, but the wrapper allows easy specification of the varying Q value.

Qsens <- function(params, Q, t0, Qmod) {
  # Combine the fixed parameters with the current Q value
  temp.params <- c(params, Q = Q)
  # Note that dt, the time vector, is taken from the global environment
  df <- data.frame(ode(y=t0, times=dt, func=Qmod, parms=temp.params, method='rk4'))
  return(df)
}
df1 <- Qsens(params, Q = -0.45, t0, Qmod)
df2 <- Qsens(params, Q = -0.33, t0, Qmod)
df3 <- Qsens(params, Q = 0, t0, Qmod)
df4 <- Qsens(params, Q = 0.5, t0, Qmod)
df5 <- Qsens(params, Q = 1, t0, Qmod)

Each time the function is called, it creates a temporary parameter set by combining params with the specified Q value, runs ode with that parameter set, and returns the results as a data frame. Writing a small function like this is all it takes to automate the process.
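
Combined with lapply, the wrapper can run all five scenarios in one call; a minimal sketch, reusing the hypothetical Q.vals vector sketched earlier:

df.list <- lapply(Q.vals, function(q) Qsens(params, Q = q, t0, Qmod))

The result is a list of data frames, one per Q value, which can be indexed as df.list[[1]], df.list[[2]], and so on.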

Plotting Results

The RColorBrewer library includes helpful functions to access visually appealing color palettes that are commonly used in geographic mapping, but which make for nice line plots too. Here we load the library, specify a palette of 5 colors, and then set some plot margin options.

library(RColorBrewer)
pal <- brewer.pal(5, 'Set1')
par(mar=c(3.2,3.2,2,1), mgp=c(2,1,0))

Finally, we plot each line separately, using a different color from the palette (which is just a vector so elements may be accessed with standard indexing).

plot(df1$time, df1$prev, type='l', ylim=c(0,0.04), col=pal[1], lwd=3, xlab='Time', ylab='Prevalence')
lines(df2$time, df2$prev, col=pal[2], lwd=3)
lines(df3$time, df3$prev, col=pal[3], lwd=3)
lines(df4$time, df4$prev, col=pal[4], lwd=3)
lines(df5$time, df5$prev, col=pal[5], lwd=3)
legend('topright', legend=paste('Q =', c(-0.45, -0.33, 0, 0.5, 1)), lty=1, lwd=3, col=pal, cex=0.9, bg='white')

[Figure: disease prevalence over time for the five values of Q]

Intervention Start Times

This next example considers the effect of different starting times for an intervention (or vaccine administration, etc.) that reduces the \( \rho \) parameter (the transmission probability per contact). This is an important scientific question because the population-level effectiveness of an intervention with a fixed individual-level efficacy depends dramatically on the stage of the epidemic at which the intervention is introduced. The disease type (SI versus SIS) and the efficacy level of the intervention are also quite important.

Below we show how to model this sort of intervention in an SI and an SIS epidemic, with varying start times. We use a wrapper function, time.sens, to make it easy to run the model across different intervention efficacies and start times. You can easily modify the baseline parameters (c, rho, and nu), change the intervention start times in the start.times vector, and change the efficacy with the rel.haz measure.

## Set a basic SI model
SI <- function(t, t0, parms) {
  with(as.list(c(t0, parms)), {    

    ## Dynamic Calculations
    # Population size
    N <- S+I

    # Intervention start time: if time > start, 
    #   then multiply lambda by relative hazard
    if (t < start) {
      lambda <- rho*c*I/N
    } else {
      lambda <- (rho*c*I/N) * rel.haz
    }

    ## Differential Equations
    dS <- -lambda*S
    dI <- lambda*S 

    ## Output
    list(c(dS, dI), 
         N = N, 
         lambda = lambda)
  })
}

## Try it with an SIS model
SIS <- function(t, t0, parms) {
  with(as.list(c(t0, parms)), {    

    ## Dynamic Calculations

    # Population size
    N <- S+I

    # Intervention start time: if time > start, 
    #   then multiply lambda by relative hazard
    if (t < start) {
      lambda <- rho*c*I/N
    } else {
      lambda <- (rho*c*I/N) * rel.haz
    }

    ## Differential Equations
    dS <- -lambda*S + nu*I
    dI <- lambda*S - nu*I

    ## Output
    list(c(dS, dI), 
         N = N, 
         lambda = lambda)
  })
}
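
## Note (added sketch): because the solver calls the model function with a
## single, scalar value of t at each evaluation, the if/else block above could
## equivalently be collapsed to one line inside SI or SIS, e.g.:
##   lambda <- (rho*c*I/N) * ifelse(t < start, 1, rel.haz)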

## This function wraps calculations over a vector of start.times
time.sens <- function(mod, params, t0, dt, start.times, rel.haz) {

  # Create an empty matrix for storing results
  out <- matrix(NA, nrow = length(dt), ncol = length(start.times))

  # The base parameters are fixed across all runs; only the start time varies
  base.params <- c(params, rel.haz=rel.haz)

  # Loop over all start times
  for (i in 1:length(start.times)) {
    temp.params <- c(base.params, start = start.times[i])
    df <- data.frame(ode(y = t0, times = dt, func = mod, parms = temp.params, method = 'rk4'))
    out[,i] <- df$I
  }

  # Set up one plot, then draw a line for each start time
  pal <- rainbow(length(start.times))
  plot(x=dt, y=out[,1], type='n', ylim=c(0, max(out)),
       xlab='Time', ylab='Number Infected')
  for (i in 1:ncol(out)) {
    lines(x=dt, y=out[,i], col=pal[i], lwd=3)
  }
  legend('bottomright', paste('Start=', start.times, sep=''), 
         lty=1, col=pal, lwd=4, cex=0.8)

  # Return the model output as a matrix
  return(out)
}

## SI model tests
params <- c(c = 0.5,
            rho = 0.1)
t0 <- c(S = 999, 
        I = 1)
dt <- 1:1000
start.times <- seq(50, 200, 50)

# Baseline model (intervention has no effect)
out <- time.sens(mod=SI, params, t0, dt, start.times, rel.haz=1)

[Figure: number infected over time by intervention start time, baseline SI model (rel.haz = 1)]


# Counterfactual model (intervention reduces risk by 50%)
out <- time.sens(mod=SI, params, t0, dt, start.times, rel.haz=0.5)

[Figure: number infected over time by intervention start time, SI model with 50% risk reduction]


## SIS model tests
params <- c(c = 0.5,
            rho = 0.2,
            nu = 0.05)
t0 <- c(S = 999, 
        I = 1)
dt <- 1:1000
start.times <- seq(50, 200, 50)

# Baseline model (intervention has no effect)
out <- time.sens(mod=SIS, params, t0, dt, start.times, rel.haz=1)

[Figure: number infected over time by intervention start time, baseline SIS model (rel.haz = 1)]


# Counterfactual model (intervention reduces risk by 25%)
out <- time.sens(mod=SIS, params, t0, dt, start.times, rel.haz=0.75)

[Figure: number infected over time by intervention start time, SIS model with 25% risk reduction]
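
Because time.sens returns the full matrix of infected counts (one column per start time), end-of-run burden under each scenario can be compared directly; a minimal sketch using the last model run above:

# Final number infected under each intervention start time (one column per run)
colnames(out) <- paste0('start', start.times)
round(out[nrow(out), ], 1)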