This app illustrates how to fit a mechanistic dynamical model to data and how to use simulated data to evaluate if it is possible to fit a specific model.
For this app, viral load data from patients infected with influenza is being fit. The data is average log viral titer on days 1-8 post infection. The data comes from (Hayden et al. 1996), specifically the ‘no treatment’ group shown in Figure 2 of this paper.
Another source of ‘data’ is artificial data that we produce by running our simulation.
The underlying model that is being fit to the data is the basic virus model used in the app of this name. See that app for a description of the model.
This app fits the log viral titer data to the virus kinetics produced by the model simulation. The fit is evaluated by computing the sum of squared residuals (SSR) between data and model for all data points, i.e. \[ SSR= \sum_t (Vm_t - Vd_t)^2 \] where \(Vm_t\) is the virus load (in log units) predicted by the model simulation at days \(t=1..8\) and \(Vd_t\) is the data, reported in those units at those time points. The underlying code varies model parameters to bring the predicted viral load from the model as close as possible to the data, i.e. to minimize the SSR. The app reports the final SSR for the fit.
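As a minimal sketch (with made-up numbers, not the actual Hayden et al. data), the SSR computation looks like this in R:

```r
# hypothetical log10 virus titers at days 1-8 (not the real data)
Vd <- c(2.9, 5.0, 6.3, 5.6, 4.5, 4.1, 2.5, 1.8)   # 'data'
Vm <- c(3.1, 5.2, 6.0, 5.5, 4.8, 3.9, 2.7, 1.5)   # model predictions at the same days

SSR <- sum((Vm - Vd)^2)   # sum of squared residuals
SSR
```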
In general, with enough data, one could fit/estimate every parameter in the model as well as the initial conditions. However, with just the virus load data available, the data are not rich enough to allow estimation of all model parameters (even for a model as simple as this). The app is therefore implemented by assuming that most model parameters are known and fixed, and only three can be estimated: the rate of virus production, p; the rate of infection of cells, b; and the rate of virus death/removal, dV. The app also allows you to keep some of those parameters fixed; we’ll explore this in the tasks.
The model is assumed to run in units of days.
Generally, with increasing iterations, the fits get better. A fitting step or iteration is essentially a ‘try’ of the underlying code to find the best possible model. Increasing the number of tries usually improves the fit. In practice, one should not specify a fixed number of iterations; that is just done here so things run reasonably fast. Instead, one should let the solver run as long as it takes until it can’t find a way to further improve the fit (i.e., further reduce the SSR). The technical expression for this is that the solver has converged to the solution. This can be done with the solver used here (the nloptr R package), but it would take too long, so we implement a “hard stop” after the specified number of iterations.
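To illustrate what such a hard stop looks like, here is a rough sketch using nloptr with a toy objective function (this is not the code the app actually runs; the stand-in model, parameter bounds, and values are made up):

```r
library(nloptr)

# toy objective: SSR between a stand-in 'model' and hypothetical data
objfun <- function(par, t, Vd) {
  Vm <- par[1] + par[2] * t - par[3] * t^2   # placeholder for the real model prediction
  sum((Vm - Vd)^2)
}

t  <- 1:8
Vd <- c(2.9, 5.0, 6.3, 5.6, 4.5, 4.1, 2.5, 1.8)   # hypothetical log10 titers

fit <- nloptr(
  x0     = c(1, 2, 0.2),                          # starting values
  eval_f = objfun,
  lb     = c(-10, -10, -10),
  ub     = c( 10,  10,  10),
  opts   = list(algorithm = "NLOPT_LN_NELDERMEAD",
                maxeval   = 100,                  # the 'hard stop' on iterations
                xtol_rel  = 1e-8),                # convergence tolerance, if reached sooner
  t = t, Vd = Vd
)
fit$solution    # best parameter values found within the iteration budget
fit$objective   # corresponding SSR
```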
Ideally, with enough iterations, all solvers should reach the best fit with the lowest possible SSR. In practice, that does not always happen; often it depends on the starting conditions. Let’s explore this idea that starting values matter.
Optimizers can ‘get stuck’: even when running for a long time, they might not find the best fit. What can happen is that the solver has found a local optimum. It found a good fit, and now as it varies parameters, each new fit is worse, so the solver “thinks” it found the best fit, even though there are better ones further away in parameter space. Many solvers - even so-called ‘global’ solvers - can get stuck. Unfortunately, we never know if the solution is real or if the solver is stuck in a local optimum. One way to figure this out is to try different solvers and different starting conditions, and let each one run for a long time. If all return the same answer, no matter what type of solver you use and where you start, it’s quite likely (though not guaranteed) that we found the overall best fit (lowest SSR).
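A rough way to probe this, sketched below with a toy objective function (not the app's actual fitting code), is to restart an optimizer from several random starting values and compare the final SSR values:

```r
# toy objective with more than one local minimum (purely illustrative)
objfun <- function(par) (par[1]^2 - 4)^2 + (par[2] - 1)^2 + sin(5 * par[1])

set.seed(123)
starts <- replicate(10, runif(2, -5, 5), simplify = FALSE)   # 10 random starting points
fits   <- lapply(starts, function(s) optim(s, objfun))

round(sapply(fits, function(f) f$value), 3)   # if these differ, some runs got stuck
```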
While the unit conversion factor, g, shows up in most apps, it is arguably not that important as long as we explore our model without trying to fit it to data. Here, for fitting purposes, it does matter. The experimental units are TCID50/mL, so in our model the virus load needs to have the same units. To make all units work, g needs to have those units, i.e. it converts from infectious virions at the site of infection to the experimental units. Unfortunately, how one relates to the other is not quite clear; see e.g. (Handel, Longini, and Antia 2007) for a discussion. If you plan to fit models to data you collected, you need to pay attention to units and make sure what you simulate and the data you have are in agreement.
One major consideration when fitting these kinds of mechanistic models to data is the balance between data availability and model complexity. The more, and ‘richer’, data one has available, the more parameters one can estimate and therefore the more detailed a model can be. If one tries to ‘ask too much’ of the data, it leads to the problem of overfitting - trying to estimate more parameters than can be robustly estimated for a given dataset. One way to safeguard against overfitting is to probe whether the model can, in principle, recover estimates in a scenario where the parameter values are known. To do so, we can use our model with specific parameter values and simulate data. We can then fit the model to this simulated data. If everything works, we expect that - ideally independent of the starting values for our solver - we end up with estimated best-fit parameter values that agree with the ones we used to simulate the artificial data. We’ll try this now with the app.
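Below is a rough sketch of producing such artificial data with the deSolve package. The model is a simplified stand-in for the basic virus model (uninfected cells U, infected cells I, virus V); the exact equations and parameter values here are illustrative, not the app defaults.

```r
library(deSolve)

# simplified stand-in for the basic virus model
# b = rate of infection of cells, p = rate of virus production, dV = virus removal,
# dI = infected cell death (all values below are illustrative only)
virusmodel <- function(t, y, parms) {
  with(as.list(c(y, parms)), {
    dUdt <- -b * U * V
    dIdt <-  b * U * V - dI * I
    dVdt <-  p * I - dV * V
    list(c(dUdt, dIdt, dVdt))
  })
}

truepars <- c(b = 1e-5, dI = 1, p = 10, dV = 4)   # 'known' parameter values
y0  <- c(U = 1e7, I = 0, V = 1)
out <- ode(y = y0, times = 0:8, func = virusmodel, parms = truepars)

# artificial 'data': log10 virus load at days 1-8 from this known-parameter run
simdata <- data.frame(day = out[-1, "time"], logV = log10(out[-1, "V"]))
```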
Let’s see if the fitting routine can recover parameters from a simulation if we start with different initial guesses.
If you ran things long enough in the previous task you should have obtained best fit values that were the same as the ones you used to produce the simulated data, and the SSR should have been close to 0. That indicates that you can estimate these parameters with that kind of data. Once you’ve done this test, you can be somewhat confident that fitting your model to the real data will allow you to get robust parameter estimates.
Note that since you now changed your data after you simulated it, you don’t expect the parameter values used for the simulation and those you obtain from your best fit to be the same. However, if the noise is not too large, you expect them to be similar.
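A minimal sketch of adding noise to the simulated log titers, continuing the hypothetical simdata example above (the noise level is arbitrary):

```r
set.seed(42)
noise_sd <- 0.3   # arbitrary standard deviation on the log10 scale
simdata$logV_noisy <- simdata$logV + rnorm(nrow(simdata), mean = 0, sd = noise_sd)
```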
For this app, the underlying function that runs the simulation and fitting is called simulate_basicmodel_fit. You can call it directly, without going through the shiny app. Use the help() command for more information on how to use the function directly. If you go that route, you need to use the results returned from this function and produce useful output (such as a plot) yourself. You can also work with and modify the underlying simulator functions (simulatorfunctions); of course, to modify these functions, you’ll need to do some coding. For further details, read the package vignette by typing vignette('DSAIRM') into the R console.
A good resource for fitting these kinds of models in R is (Bolker 2008). Note, though, that the focus is on ecological data and ODE-type models are not/barely discussed. For fitting stochastic models, check out the pomp package in R. (If you don’t know what stochastic models are, check out the stochastic apps in DSAIRM.)
Bolker, Benjamin M. 2008. Ecological Models and Data in R. Princeton University Press.
Handel, Andreas, Ira M Longini Jr, and Rustom Antia. 2007. “Neuraminidase Inhibitor Resistance in Influenza: Assessing the Danger of Its Generation and Spread.” PLoS Comput Biol 3 (12): e240. https://doi.org/10.1371/journal.pcbi.0030240.
Hayden, F G, J J Treanor, R F Betts, M Lobo, J D Esinhart, and E K Hussey. 1996. “Safety and Efficacy of the Neuraminidase Inhibitor GG167 in Experimental Human Influenza.” JAMA 275 (4): 295–99.
Hilborn, Ray, and Marc Mangel. 1997. The Ecological Detective: Confronting Models with Data. Monographs in Population Biology 28. Princeton, N.J.: Princeton University Press.
Miao, Hongyu, Xiaohua Xia, Alan S. Perelson, and Hulin Wu. 2011. “On Identifiability of Nonlinear ODE Models and Applications in Viral Dynamics.” SIAM Review 53 (1): 3. https://doi.org/10.1137/090757009.