The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Parametric models

library(serosv)
library(dplyr)
library(magrittr)

Frequentist methods

Polynomial models

Seroprevalence is modeled using the following serocatalytic model

\[ \pi(a) = 1 - \text{exp}({-\Sigma_{i=1}^k \beta_i a^i}) \]

Where:

Which implies the force of infection is \(\lambda(a) = \Sigma_{i=1}^k \beta_i i a^{i-1}\)

This generalization encompasses several classical serocatalytic model including

Refer to Chapter 6.1.1 of the book by Hens et al. (2012) for a more detailed explanation of the methods.

Fitting data

We will use the Parvo B19 data from Finland 1997–1998 for this example.

data <- parvob19_fi_1997_1998[order(parvob19_fi_1997_1998$age), ]

To fit a polynomial model, use the polynomial_model() function.

# Fit a Muench model
muench <- polynomial_model(data, k = 1, status_col="seropositive")
summary(muench$info)
#> 
#> Call:
#> glm(formula = Age(k), family = binomial(link = link), data = df)
#> 
#> Coefficients:
#>      Estimate Std. Error z value Pr(>|z|)    
#> age -0.029088   0.001375  -21.15   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance:    Inf  on 1117  degrees of freedom
#> Residual deviance: 1366.9  on 1116  degrees of freedom
#> AIC: 1368.9
#> 
#> Number of Fisher Scoring iterations: 6
plot(muench) 

The users can also choose to provide a range of values for \(k\) in which case the package will try to find the best \(k\) parameter determined by Loglikelihood Ratio test (LRT)

# Provide a range of values for k
best_param <- polynomial_model(data, k = 1:5, status_col = "seropositive")
plot(best_param)

# View the best model here which suggests k = 4 is the best parameter value
best_param
#> Polynomial model
#> 
#> Input type:  linelisting 
#> Degree (k):  4 
#> 
#> Call:  glm(formula = Age(k), family = binomial(link = link), data = df)
#> 
#> Coefficients:
#>        age    I(age^2)    I(age^3)    I(age^4)  
#> -3.381e-02  -9.950e-04   5.551e-05  -5.737e-07  
#> 
#> Degrees of Freedom: 1117 Total (i.e. Null);  1113 Residual
#> Null Deviance:       Inf 
#> Residual Deviance: 1336  AIC: 1344

Fractional polynomial model

Proposed model

Fractional polynomial model generalize conventional polynomial class of functions. In the context of binary responses, a fractional polynomial of degree \(m\) for the linear predictor is defined as followed

\[ \eta_m(a, \beta, p_1, p_2, ...,p_m) = \Sigma^m_{i=0} \beta_i H_i(a) \]

Where \(m\) is an integer, \(p_1 \le p_2 \le... \le p_m\) is a sequence of powers, and \(H_i(a)\) is a transformation given by

\[ H_i = \begin{cases} a^{p_i} & \text{ if } p_i \neq p_{i-1}, \\ H_{i-1}(a) \times log(a) & \text{ if } p_i = p_{i-1}, \end{cases} \]

Refer to Chapter 6.2 of the book by Hens et al. (2012) for a more detailed explanation of the methods.

Fitting data

Use fp_model() to fit a fractional polynomial model.

The parameter p specifies the powers of each polynomial term (length of p is thus the model’s degree)

hav <- hav_be_1993_1994
model <- fp_model(hav, p=c(1, 1.5), link="cloglog")
model
#> Fractional polynomial model 
#> 
#> Input type:  aggregated 
#> Powers:  1, 1.5 
#> 
#> Call:  glm(formula = as.formula(formulate(p)), family = binomial(link = link))
#> 
#> Coefficients:
#> (Intercept)     I(age^1)   I(age^1.5)  
#>    -4.56685      0.22340     -0.01775  
#> 
#> Degrees of Freedom: 85 Total (i.e. Null);  83 Residual
#> Null Deviance:       1320 
#> Residual Deviance: 85.81     AIC: 365.4
plot(model)

The users can also tell the package to perform parameter selection by providing p as a named list with 2 elements:

model <- fp_model(hav, 
                  p=list(
                    p_range=seq(-2,3,0.1),
                    degree=2
                  ), 
                  monotonic=FALSE,
                  link="cloglog")
plot(model)

# the best set of powers for this dataset is 1.5 and 1.6
model
#> Fractional polynomial model 
#> 
#> Input type:  aggregated 
#> Powers:  1.5, 1.6 
#> 
#> Call:  glm(formula = as.formula(formulate(curr_p)), family = binomial(link = link))
#> 
#> Coefficients:
#> (Intercept)   I(age^1.5)   I(age^1.6)  
#>    -3.61083      0.12443     -0.07656  
#> 
#> Degrees of Freedom: 85 Total (i.e. Null);  83 Residual
#> Null Deviance:       1320 
#> Residual Deviance: 81.6  AIC: 361.2

To restrict the parameters search such that the predictions are monotonic (thus ensuring the FOI to be \(\lambda \geq 0\)) set monotonic=TRUE

# ---- Best model with the monotonic constraint -----
model <- fp_model(hav, 
                  p=list(
                    p_range=seq(-2,3,0.1),
                    degree=2
                  ), 
                  monotonic=TRUE,
                  link="cloglog")
plot(model)

# the best set of powers with the monotonic constraint is 0.5 and 1.1
model
#> Fractional polynomial model 
#> 
#> Input type:  aggregated 
#> Powers:  0.5, 1.1 
#> 
#> Call:  glm(formula = as.formula(formulate(curr_p)), family = binomial(link = link))
#> 
#> Coefficients:
#> (Intercept)   I(age^0.5)   I(age^1.1)  
#>    -7.64170      1.67492     -0.05304  
#> 
#> Degrees of Freedom: 85 Total (i.e. Null);  83 Residual
#> Null Deviance:       1320 
#> Residual Deviance: 106   AIC: 385.5

Nonlinear models

Farrington model

Proposed model

For Farrington’s model, the force of infection was defined non-negative for all a \(\lambda(a) \geq 0\) and increases to a peak in a linear fashion followed by an exponential decrease

\[ \lambda(a) = (\alpha a - \gamma)e^{-\beta a} + \gamma \]

Where \(\gamma\) is called the long term residual for FOI, as \(a \rightarrow \infty\) , \(\lambda (a) \rightarrow \gamma\)

Integrating \(\lambda(a)\) would results in the following non-linear model for prevalence

\[ \pi (a) = 1 - e^{-\int_0^a \lambda(s) ds} \\ = 1 - exp\{ \frac{\alpha}{\beta}ae^{-\beta a} + \frac{1}{\beta}(\frac{\alpha}{\beta} - \gamma)(e^{-\beta a} - 1) -\gamma a \} \]

Refer to Chapter 6.1.2 of the book by Hens et al. (2012) for a more detailed explanation of the methods.

Fitting data

Use farrington_model() to fit a Farrington’s model.

farrington_md <- farrington_model(
   rubella_uk_1986_1987,
   start=list(alpha=0.07,beta=0.1,gamma=0.03)
   )
farrington_md
#> Farrington model 
#> 
#> Input type:  aggregated 
#> 
#> Call:
#> mle(minuslogl = farrington, start = start, fixed = fixed)
#> 
#> Coefficients:
#>      alpha       beta      gamma 
#> 0.07034904 0.20243950 0.03665599
plot(farrington_md)

Weibull model

Proposed model

For a Weibull model, the prevalence is given by

\[ \pi (d) = 1 - e^{ - \beta_0 d ^ {\beta_1}} \]

Where \(d\) is exposure time (difference between age of injection and age at test)

The model was reformulated as a GLM model with log - log link and linear predictor using log(d)

\[\eta(d) = log(\beta_0) + \beta_1 log(d)\]

Thus implies that the force of infection is a monotone function of the exposure time as followed

\[ \lambda(d) = \beta_0 \beta_1 d^{\beta_1 - 1} \]

Refer to Chapter 6.1.2 of the book by Hens et al. (2012) for a more detailed explanation of the methods.

Fitting data

Use weibull_model() to fit a Weibull model.

hcv <- hcv_be_2006[order(hcv_be_2006$dur), ]

wb_md <- hcv %>% weibull_model(t_lab = "dur", status_col="seropositive")
wb_md
#> Weibull model 
#> 
#> Input type:  linelisting 
#> b0=-0.276, b1=0.3807 
#> 
#> Call:  glm(formula = spos ~ log(t), family = binomial(link = "cloglog"))
#> 
#> Coefficients:
#> (Intercept)       log(t)  
#>     -0.2760       0.3807  
#> 
#> Degrees of Freedom: 420 Total (i.e. Null);  419 Residual
#> Null Deviance:       452.1 
#> Residual Deviance: 419.4     AIC: 423.4
plot(wb_md) 

Bayesian methods

Proposed approach

Consider a model for prevalence that has a parametric form \(\pi(a_i, \alpha)\) where \(\alpha\) is a parameter vector

One can constraint the parameter space of the prior distribution \(P(\alpha)\) in order to achieve the desired monotonicity of the posterior distribution \(P(\pi_1, \pi_2, ..., \pi_m|y,n)\)

Where:

Farrington

Proposed model

The model for prevalence is as followed

\[ \pi (a) = 1 - exp\{ \frac{\alpha_1}{\alpha_2}ae^{-\alpha_2 a} + \frac{1}{\alpha_2}(\frac{\alpha_1}{\alpha_2} - \alpha_3)(e^{-\alpha_2 a} - 1) -\alpha_3 a \} \]

For likelihood model, independent binomial distribution are assumed for the number of infected individuals at age \(a_i\)

\[ y_i \sim Bin(n_i, \pi_i), \text{ for } i = 1,2,3,...m \]

The constraint on the parameter space can be incorporated by assuming truncated normal distribution for the components of \(\alpha\), \(\alpha = (\alpha_1, \alpha_2, \alpha_3)\) in \(\pi_i = \pi(a_i,\alpha)\)

\[ \alpha_j \sim \text{truncated } \mathcal{N}(\mu_j, \tau_j), \text{ } j = 1,2,3 \]

The joint posterior distribution for \(\alpha\) can be derived by combining the likelihood and prior as followed

\[ P(\alpha|y) \propto \prod^m_{i=1} \text{Bin}(y_i|n_i, \pi(a_i, \alpha)) \prod^3_{i=1}-\frac{1}{\tau_j}\text{exp}(\frac{1}{2\tau^2_j} (\alpha_j - \mu_j)^2) \]

The full conditional distribution of \(\alpha_i\) is thus \[ P(\alpha_i|\alpha_j,\alpha_k, k, j \neq i) \propto -\frac{1}{\tau_i}\text{exp}(\frac{1}{2\tau^2_i} (\alpha_i - \mu_i)^2) \prod^m_{i=1} \text{Bin}(y_i|n_i, \pi(a_i, \alpha)) \]

Refer to Chapter 10.3.1 of the book by Hens et al. (2012) for a more detailed explanation of the method.

Fitting data

To fit Farrington model, use hierarchical_bayesian_model() and define type = "far2" or type = "far3" where

df <- mumps_uk_1986_1987
model <- hierarchical_bayesian_model(df, type="far3")
#> 
#> SAMPLING FOR MODEL 'fra_3' NOW (CHAIN 1).
#> Chain 1: Rejecting initial value:
#> Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
#> Chain 1:   Stan can't start sampling from this initial value.
#> Chain 1: Rejecting initial value:
#> Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
#> Chain 1:   Stan can't start sampling from this initial value.
#> Chain 1: Rejecting initial value:
#> Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
#> Chain 1:   Stan can't start sampling from this initial value.
#> Chain 1: Rejecting initial value:
#> Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
#> Chain 1:   Stan can't start sampling from this initial value.
#> Chain 1: Rejecting initial value:
#> Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
#> Chain 1:   Stan can't start sampling from this initial value.
#> Chain 1: 
#> Chain 1: Gradient evaluation took 4.4e-05 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.44 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1: 
#> Chain 1: 
#> Chain 1: Iteration:    1 / 5000 [  0%]  (Warmup)
#> Chain 1: Iteration:  500 / 5000 [ 10%]  (Warmup)
#> Chain 1: Iteration: 1000 / 5000 [ 20%]  (Warmup)
#> Chain 1: Iteration: 1500 / 5000 [ 30%]  (Warmup)
#> Chain 1: Iteration: 1501 / 5000 [ 30%]  (Sampling)
#> Chain 1: Iteration: 2000 / 5000 [ 40%]  (Sampling)
#> Chain 1: Iteration: 2500 / 5000 [ 50%]  (Sampling)
#> Chain 1: Iteration: 3000 / 5000 [ 60%]  (Sampling)
#> Chain 1: Iteration: 3500 / 5000 [ 70%]  (Sampling)
#> Chain 1: Iteration: 4000 / 5000 [ 80%]  (Sampling)
#> Chain 1: Iteration: 4500 / 5000 [ 90%]  (Sampling)
#> Chain 1: Iteration: 5000 / 5000 [100%]  (Sampling)
#> Chain 1: 
#> Chain 1:  Elapsed Time: 1.438 seconds (Warm-up)
#> Chain 1:                2.796 seconds (Sampling)
#> Chain 1:                4.234 seconds (Total)
#> Chain 1:
model
#> Hierarchical Bayesian model 
#> 
#> Input type:  aggregated 
#> Model:  Farrington model with 3 parameters 
#> 
#> Fitted parameters:
#>  alpha1 = 0.1398 (95% CrI [0.129, 0.1506], sd = 0.005772)
#>  alpha2 = 0.1992 (95% CrI [0.1854, 0.2178], sd = 0.008347)
#>  alpha3 = 0.009197 (95% CrI [0.000544, 0.02878], sd = 0.007448)
plot(model)

Log-logistic

Proposed approach

The model for seroprevalence is as followed

\[ \pi(a) = \frac{\beta a^\alpha}{1 + \beta a^\alpha}, \text{ } \alpha, \beta > 0 \]

The likelihood is specified to be the same as Farrington model (\(y_i \sim Bin(n_i, \pi_i)\)) with

\[ \text{logit}(\pi(a)) = \alpha_2 + \alpha_1\log(a) \]

The prior model of \(\alpha_1\) is specified as \(\alpha_1 \sim \text{truncated } \mathcal{N}(\mu_1, \tau_1)\) with flat hyperprior as in Farrington model

\(\beta\) is constrained to be positive by specifying \(\alpha_2 \sim \mathcal{N}(\mu_2, \tau_2)\)

The full conditional distribution of \(\alpha_1\) is thus

\[ P(\alpha_1|\alpha_2) \propto -\frac{1}{\tau_1} \text{exp} (\frac{1}{2 \tau_1^2} (\alpha_1 - \mu_1)^2) \prod_{i=1}^m \text{Bin}(y_i|n_i,\pi(a_i, \alpha_1, \alpha_2) ) \]

And \(\alpha_2\) can be derived in the same way

Refer to Chapter 10.3.3 of the book by Hens et al. (2012) for a more detailed explanation of the method.

Fitting data

To fit Log-logistic model, use hierarchical_bayesian_model() and define type = "log_logistic"

df <- rubella_uk_1986_1987
model <- hierarchical_bayesian_model(df, type="log_logistic")
#> 
#> SAMPLING FOR MODEL 'log_logistic' NOW (CHAIN 1).
#> Chain 1: 
#> Chain 1: Gradient evaluation took 2e-05 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.2 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1: 
#> Chain 1: 
#> Chain 1: Iteration:    1 / 5000 [  0%]  (Warmup)
#> Chain 1: Iteration:  500 / 5000 [ 10%]  (Warmup)
#> Chain 1: Iteration: 1000 / 5000 [ 20%]  (Warmup)
#> Chain 1: Iteration: 1500 / 5000 [ 30%]  (Warmup)
#> Chain 1: Iteration: 1501 / 5000 [ 30%]  (Sampling)
#> Chain 1: Iteration: 2000 / 5000 [ 40%]  (Sampling)
#> Chain 1: Iteration: 2500 / 5000 [ 50%]  (Sampling)
#> Chain 1: Iteration: 3000 / 5000 [ 60%]  (Sampling)
#> Chain 1: Iteration: 3500 / 5000 [ 70%]  (Sampling)
#> Chain 1: Iteration: 4000 / 5000 [ 80%]  (Sampling)
#> Chain 1: Iteration: 4500 / 5000 [ 90%]  (Sampling)
#> Chain 1: Iteration: 5000 / 5000 [100%]  (Sampling)
#> Chain 1: 
#> Chain 1:  Elapsed Time: 0.348 seconds (Warm-up)
#> Chain 1:                0.15 seconds (Sampling)
#> Chain 1:                0.498 seconds (Total)
#> Chain 1:
model
#> Hierarchical Bayesian model 
#> 
#> Input type:  aggregated 
#> Model:  Log-logistic model 
#> 
#> Fitted parameters:
#>  alpha1 = 1.651 (95% CrI [1.544, 1.765], sd = 0.05827)
#>  alpha2 = -2.979 (95% CrI [-3.264, -2.726], sd = 0.1356)
plot(model)

Grenfell, B. T., and R. M. Anderson. 1985. “The Estimation of Age-Related Rates of Infection from Case Notifications and Serological Data.” The Journal of Hygiene 95 (2): 419–36. https://doi.org/10.1017/s0022172400062859.
Hens, Niel, Ziv Shkedy, Marc Aerts, Christel Faes, Pierre Van Damme, and Philippe Beutels. 2012. Modeling Infectious Disease Parameters Based on Serological and Social Contact Data: A Modern Statistical Perspective. Statistics for Biology and Health. Springer New York. https://doi.org/10.1007/978-1-4614-4072-7.
Muench, Hugo. 1934. “Derivation of Rates from Summation Data by the Catalytic Curve.” Journal of the American Statistical Association 29 (185): 25–38. https://doi.org/10.1080/01621459.1934.10502684.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.