Modeling eDNA qPCR Data with artemis

library(artemis)

Introduction

A primary purpose of the artemis package is to facilitate modeling of qPCR data from eDNA samples. It does this via two functions: eDNA_lm() for fixed effects models and eDNA_lmer() for mixed effects models.

The underlying Stan models are compiled when the package is installed, so they do not need to be recompiled afterwards. The models' Stan code is available in the artemis source code.

Model Inputs

Both modeling functions require the following inputs:

  1. A vector of numeric Cq values (one for each qPCR replicate). Cq values corresponding to non-detections for your assay should be recorded as the threshold value (the default is 40.0 cycles).

  2. The intercept value \(\alpha\) and the slope value \(\beta\) from a standard curve equation associated with the qPCR analysis. This is used to convert the observed Cq values to the corresponding log concentration of eDNA.
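As a rough illustration of the conversion in item 2, the sketch below inverts a standard curve to recover log concentration from Cq. It assumes the curve has the natural-log form Cq = alpha + beta * ln(concentration); the helper name cq_to_ln_conc() is hypothetical and not part of artemis, and the intercept and slope are the values that appear in the package's example data.

```r
# Assumed standard-curve form (natural log):
#   Cq = alpha + beta * ln(concentration)
std_curve_alpha <- 21.168  # intercept (illustrative value)
std_curve_beta  <- -1.529  # slope (illustrative value)

# Hypothetical helper (not an artemis function): invert the curve
# to get ln(concentration) from an observed Cq value.
cq_to_ln_conc <- function(cq, alpha, beta) {
  (cq - alpha) / beta
}

cq <- c(40.00, 38.13, 37.38)  # example Cq values; 40 is the non-detection threshold
ln_conc <- cq_to_ln_conc(cq, std_curve_alpha, std_curve_beta)
ln_conc
```

Because the slope is negative, lower Cq values map to higher log concentrations, and the threshold value of 40 maps to the lowest concentration the curve can represent.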

An example of qPCR data in the correct format for modeling with artemis can be viewed by calling eDNA_data, a data.frame of Cq values from live car experiments conducted with Delta Smelt in California's Sacramento-San Joaquin Delta:

head(eDNA_data)
#>         Date FilterID TechRep    Cq Distance_m Volume_mL Biomass_N
#> 1 2017-08-02  cvp-1-1       1 40.00         50        50       100
#> 2 2017-08-02  cvp-1-1       2 38.13         50        50       100
#> 3 2017-08-02  cvp-1-1       3 37.38         50        50       100
#> 4 2017-08-02 cvp-1-10       1 36.24         40       200       100
#> 5 2017-08-02 cvp-1-10       2 40.00         40       200       100
#> 6 2017-08-02 cvp-1-10       3 40.00         40       200       100
#>   StdCrvAlpha_lnForm StdCrvBeta_lnForm
#> 1             21.168            -1.529
#> 2             21.168            -1.529
#> 3             21.168            -1.529
#> 4             21.168            -1.529
#> 5             21.168            -1.529
#> 6             21.168            -1.529
str(eDNA_data)
#> 'data.frame':    180 obs. of  9 variables:
#>  $ Date              : Date, format: "2017-08-02" "2017-08-02" ...
#>  $ FilterID          : chr  "cvp-1-1" "cvp-1-1" "cvp-1-1" "cvp-1-10" ...
#>  $ TechRep           : num  1 2 3 1 2 3 1 2 3 1 ...
#>  $ Cq                : num  40 38.1 37.4 36.2 40 ...
#>  $ Distance_m        : num  50 50 50 40 40 40 40 40 40 40 ...
#>  $ Volume_mL         : num  50 50 50 200 200 200 200 200 200 200 ...
#>  $ Biomass_N         : num  100 100 100 100 100 100 100 100 100 100 ...
#>  $ StdCrvAlpha_lnForm: num  21.2 21.2 21.2 21.2 21.2 ...
#>  $ StdCrvBeta_lnForm : num  -1.53 -1.53 -1.53 -1.53 -1.53 ...

Note that eDNA_data contains no missing (NA) values. Stan models do not accept NA values, so any rows containing NAs are dropped when the model matrix is constructed during data preparation.
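Since rows with NAs are dropped, it can help to check how many rows would be lost before fitting. The following is a minimal base R sketch on a made-up data frame (the values and the object names toy, model_vars, and toy_clean are hypothetical); it does not reproduce how artemis itself prepares the data.

```r
# Toy data frame (made-up values) with one missing predictor value
toy <- data.frame(
  Cq         = c(40.00, 38.13, 37.38, 36.24),
  Distance_m = c(50, 50, NA, 40),
  Volume_mL  = c(50, 50, 50, 200)
)

# Columns that the model formula would use
model_vars <- c("Cq", "Distance_m", "Volume_mL")

# Count NAs per column, then flag the rows that are complete
na_counts <- colSums(is.na(toy[model_vars]))
keep      <- complete.cases(toy[model_vars])

toy_clean <- toy[keep, ]   # rows that would survive NA removal
n_dropped <- sum(!keep)    # rows that would be dropped

na_counts
n_dropped
```

Running a check like this first makes the amount of data loss explicit, rather than discovering it only after the model has been fit.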


Fixed effects models with eDNA_lm()

To fit a fixed effects model to the sample eDNA_data where Distance_m is the only predictor, we give the function a model formula and the input data listed above:


model_fit = eDNA_lm(Cq ~ Distance_m, 
                    data = eDNA_data,
                    std_curve_alpha = 21.2, std_curve_beta = -1.5)


Like lm() in base R, the model functions automatically add an intercept term. You can explicitly omit the intercept if you have a good reason for doing so. To control the MCMC algorithm, add control arguments to the end of the eDNA_lm*() call; they are passed on to the $sample() method of the cmdstanr model. Note: these arguments follow the naming conventions of cmdstan, which differ from those used by rstan. They are described in the cmdstanr documentation for the $sample() method.

For example,

model_fit = eDNA_lm(Cq ~ Distance_m, 
                    data = eDNA_data,
                    std_curve_alpha = 21.2, std_curve_beta = -1.5,
                    seed = 1234, 
                    chains = 1) # we don't recommend sampling just 1 chain; the default is 4
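The intercept behaviour mentioned above can be illustrated with base R's model.matrix(). This sketch assumes the artemis model functions follow R's standard formula conventions when building the design matrix; the toy data values are made up.

```r
# Toy data (made-up values) with the same column names as eDNA_data
toy <- data.frame(
  Cq         = c(40.00, 38.13, 36.24),
  Distance_m = c(50, 50, 40)
)

# Default: an "(Intercept)" column is added automatically
X_with <- model.matrix(Cq ~ Distance_m, data = toy)

# Adding "- 1" (or "+ 0") to the formula omits the intercept
X_without <- model.matrix(Cq ~ Distance_m - 1, data = toy)

colnames(X_with)
colnames(X_without)
```

Under that assumption, the same "- 1" formula passed to eDNA_lm() would fit a model without an intercept term.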


Mixed effects models with eDNA_lmer()


To fit a model with one or more random effect(s), use the eDNA_lmer() function. Random effects are specified using the same syntax as the lme4 package, e.g. (1|random effect):

d = eDNA_data # create a copy to modify
set.seed(1234) # make the randomly assigned years reproducible
d$Year = factor(sample(2018:2020, size = nrow(d), replace = TRUE)) # add an artificial grouping variable for illustration

model_fit2 = eDNA_lmer(Cq ~ Distance_m + Volume_mL + (1|Year),
                       data = d,
                       std_curve_alpha = 21.2, std_curve_beta = -1.5,
                       seed = 1234) 

Summarizing and plotting model output

As with the simulation objects, the model results can be summarized or plotted with default methods using summary() and plot(), or converted to a data.frame object for further manipulation.

summary(model_fit)

plot(model_fit, pars = c("intercept", "betas"))


Matching lme4 convention, random effects are not included in the default summary() output. You can view or plot the random effects estimates by subsetting the stanfit slot of the model object with @ and specifying the rand_betas parameters with the pars argument:

rstan::summary(model_fit2@stanfit, pars = "rand_betas", probs = c(0.50, 0.025, 0.975))$summary
plot(model_fit2, pars = "rand_betas")

Similar to lme4, there is a ranef method for eDNA_model objects:

ranef(model_fit2)


Further notes on modeling

Because the models implemented in artemis are Bayesian, you will get the most out of their results when you can work with and summarize posterior probabilities. Two helpful resources for this are the Stan User’s Guide and the stanfit objects vignette from the rstan package.
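As a small illustration of summarizing posterior probabilities, the sketch below works on simulated draws that stand in for the posterior of a single coefficient. The draws are generated with rnorm() and are not artemis output; with a real fit, you would extract the corresponding draws from the model object instead.

```r
set.seed(1234)

# Stand-in for 4000 posterior draws of a Distance_m coefficient
# (simulated for illustration only, NOT results from an artemis model)
draws <- rnorm(4000, mean = -0.05, sd = 0.03)

post_median <- median(draws)                            # point summary
cred_int    <- quantile(draws, probs = c(0.025, 0.975)) # 95% credible interval
p_negative  <- mean(draws < 0)                          # posterior P(coefficient < 0)

post_median
cred_int
p_negative
```

Quantities such as p_negative are direct probability statements about a parameter, which is the kind of summary a Bayesian fit supports and a point estimate alone does not.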