Stable Isotope Mixing Models in R with simmr

Introduction

simmr is a package designed to solve mixing equations for stable isotopic data within a Bayesian framework. This guide is designed to get researchers up and running with the package as quickly as possible. No expertise is required in the use of R.

simmr is designed as an upgrade to the SIAR package and contains many of the same features. This new version contains a slightly more sophisticated mixing model, a simpler user interface, and more advanced plotting features. The key differences between SIAR and simmr are:

We assume that you have a sound working knowledge of stable isotopic mixing models, and the assumptions and potential pitfalls associated with these models. A list of required reading is presented in the Appendix at the end of this guide. We strongly recommend that all users read these papers.

We assume that if you have got this far you have installed R. We also recommend installing RStudio, as it provides a convenient interface for using R and simmr. The instructions below all assume you are using RStudio.

Installation of the simmr package

The simmr package uses the JAGS (Just Another Gibbs Sampler) program to run the stable isotope mixing model. Before you install simmr, visit the JAGS website and download and install JAGS for your operating system.

Next, start RStudio and find the window with the command prompt (the symbol >). Type

install.packages('simmr')

It may ask you to pick your nearest CRAN mirror (the nearest site which hosts R packages). You will then see some activity on the screen as the simmr package and the other packages it uses are downloaded. The output should end with a line similar to:

package 'simmr' successfully unpacked and MD5 sums checked

You then need to load the package. Type

library(simmr)

This will load the simmr package and all the associated packages. You’ll need to type the library(simmr) command every time you start R. If you haven't installed JAGS properly you will be informed at this point.

Running simmr

Before getting started there are a couple of points to consider.

Working with scripts

The best way to use the simmr package is by creating scripts. A script can be created in RStudio by clicking File > New File > R Script. This opens a text window in which commands can be typed in order and saved. A command can be sent to the command prompt (which RStudio calls the Console) by highlighting it and clicking Run (or going to Code > Run Lines). There are also keyboard shortcuts to speed up the process. We strongly recommend you learn to run R via scripts.

Data structure

Are you interested in determining the source proportions based on single or multiple data points? In most analyses you will require some sort of grouping variable. The different groups could represent for example:

Structure of the input files

The general structure for running simmr is as follows:

1. Call simmr_load on the data to get it into the right format
2. Plot the data in isotope space ('iso-space') using plot
3. Run the mixing model with simmr_mcmc
4. Explore the results with plot and summary

Step 1: Getting the data into simmr

simmr requires at minimum 3 input objects: the consumers or mixtures, the source means, and the source standard deviations. Optionally, you can also add correction data (also called trophic enrichment factors, TEFs), again represented as means and standard deviations, and concentration dependence values. The easiest way to get these objects into simmr is to enter the data in Excel or a similar spreadsheet and then copy the values across into R, as in the following example:

# Consumer (mixture) data: 10 observations x 2 isotopes
mix = matrix(c(-10.13, -10.72, -11.39, -11.18, -10.81, -10.7, -10.54, 
-10.48, -9.93, -9.37, 11.59, 11.01, 10.59, 10.97, 11.52, 11.89, 
11.73, 10.89, 11.05, 12.3), ncol=2, nrow=10)
colnames(mix) = c('d13C','d15N')
# Source names, means and standard deviations: 4 sources x 2 isotopes
s_names=c('Source A','Source B','Source C','Source D')
s_means = matrix(c(-14, -15.1, -11.03, -14.44, 3.06, 7.05, 13.72, 5.96), ncol=2, nrow=4)
s_sds = matrix(c(0.48, 0.38, 0.48, 0.43, 0.46, 0.39, 0.42, 0.48), ncol=2, nrow=4)
# Optional: correction (TEF) means and standard deviations, same shape as s_means
c_means = matrix(c(2.63, 1.59, 3.41, 3.04, 3.28, 2.34, 2.14, 2.36), ncol=2, nrow=4)
c_sds = matrix(c(0.41, 0.44, 0.34, 0.46, 0.46, 0.48, 0.46, 0.66), ncol=2, nrow=4)
# Optional: elemental concentrations, same shape as s_means
conc = matrix(c(0.02, 0.1, 0.12, 0.04, 0.02, 0.1, 0.09, 0.05), ncol=2, nrow=4)

The mix object above contains the stable isotopic data for the consumers, with one column per isotope: here the first column holds the d13C values and the second the d15N values. Any number of isotopes and observations can be used, but whatever the size, the object needs to be a matrix. It is recommended, but not necessary, to give the columns names identifying the isotopes they correspond to.

The source names are provided in the s_names object, and the source means and standard deviations in s_means and s_sds. These latter objects must also be matrices, where the number of rows is the number of sources, and the number of columns the number of isotopes.

The correction data are stored in c_means and c_sds; these should again be matrices of the same dimensions as s_means and s_sds. Finally, the concentration dependencies (i.e. the elemental concentration values) are included as conc, again with one row per source and one column per isotope.
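Because these matrices must agree in shape, it can help to check them before loading. The helper below, check_simmr_inputs, is hypothetical (it is not part of simmr) and uses only base R. It is a sketch that assumes the shape rules described above: one row per source, one column per isotope, and any optional matrix matching s_means.

```r
# Hypothetical helper (not part of simmr): check the shapes of the simmr
# inputs before calling simmr_load(). Stops with an error on any mismatch.
check_simmr_inputs = function(mix, s_names, s_means, s_sds,
                              c_means = NULL, c_sds = NULL, conc = NULL) {
  stopifnot(is.matrix(mix), is.matrix(s_means), is.matrix(s_sds))
  n_iso = ncol(mix)          # one column per isotope
  n_src = length(s_names)    # one row per source
  stopifnot(nrow(s_means) == n_src, ncol(s_means) == n_iso,
            identical(dim(s_sds), dim(s_means)))
  # Optional matrices must have the same dimensions as s_means when supplied
  for (m in list(c_means, c_sds, conc)) {
    if (!is.null(m)) stopifnot(is.matrix(m), identical(dim(m), dim(s_means)))
  }
  invisible(TRUE)
}
```

With the example objects above, check_simmr_inputs(mix, s_names, s_means, s_sds, c_means, c_sds, conc) should pass silently; a mismatch stops with a message naming the condition that failed.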

To load the data into simmr, use:

simmr_in = simmr_load(mixtures=mix,
                     source_names=s_names,
                     source_means=s_means,
                     source_sds=s_sds,
                     correction_means=c_means,
                     correction_sds=c_sds,
                     concentration_means = conc)

Note that the correction_means, correction_sds, and concentration_means are optional.
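This guide does not spell out how the corrections and concentrations enter the model. As a rough sketch, the standard concentration-weighted linear mixing equation gives the expected consumer value for isotope j as sum_k p_k q_kj (s_kj + c_kj) / sum_k p_k q_kj, where p_k is the proportion of source k, s_kj and c_kj are the source and correction means, and q_kj the concentration. We assume, but the guide does not state, that simmr's mean structure takes this form (see the Environmetrics paper in the Appendix for the actual model). The defaults in the hypothetical function mix_mean (no correction, equal concentrations) are our assumption about what omitting the optional arguments means.

```r
# Expected consumer isotope values under a concentration-weighted linear
# mixing model (assumed form; not taken from the simmr source code).
#   p       : length-K vector of dietary proportions summing to 1
#   s_means : K x J matrix of source means (sources x isotopes)
#   c_means : K x J matrix of correction means (default: no correction)
#   conc    : K x J matrix of concentrations (default: all equal)
mix_mean = function(p, s_means,
                    c_means = matrix(0, nrow(s_means), ncol(s_means)),
                    conc = matrix(1, nrow(s_means), ncol(s_means))) {
  w = p * conc                                    # weight of source k for isotope j
  colSums(w * (s_means + c_means)) / colSums(w)   # weighted average per isotope
}
```

For example, mix_mean(rep(0.25, 4), s_means, c_means, conc) gives the expected d13C and d15N values for a consumer eating equal proportions of the four example sources. With equal concentrations the formula reduces to an ordinary proportion-weighted average of the corrected source means.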

Step 2: Plotting the data in iso-space

We can now plot the raw isotopic data with:

plot(simmr_in)

This will produce a biplot with the isotope that is in the first column on the x-axis, and the isotope in the second column on the y-axis. You can make the plot slightly nicer with some extra arguments:

plot(simmr_in,xlab=expression(paste(delta^13, "C (\u2030)",sep="")), 
     ylab=expression(paste(delta^15, "N (\u2030)",sep="")), 
     title='Isospace plot of example data')

See the help file help(plot.simmr_input) for more options on the plotting commands, including the ability to plot different tracers/isotopes when there are more than 2 isotopes.

If all the mixtures lie inside the mixing polygon defined by the sources, then the data are acceptable for running simmr. See Phillips et al. (2014), Canadian Journal of Zoology, listed in the Appendix, for more details on when data are suitable for running through a mixing model.
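For two isotopes the polygon check can also be done numerically. The sketch below uses a hypothetical helper, inside_hull, which takes the convex hull of the source points with base R's chull and then applies a standard cross-product test: a consumer is inside if it lies on the same side of every hull edge. It assumes the polygon is formed by the source means shifted by the correction means, and it ignores source and correction uncertainty entirely, so it is a rough screen rather than a replacement for the plot or the guidance in Phillips et al.

```r
# Hypothetical helper (not part of simmr): TRUE for each row of `pts`
# (consumers, 2 columns) lying inside or on the convex hull of `verts`.
inside_hull = function(pts, verts) {
  h = verts[grDevices::chull(verts), , drop = FALSE]  # hull vertices in order
  n = nrow(h)
  apply(pts, 1, function(p) {
    # Signed cross product of each edge (a -> b) with the vector a -> p
    cp = sapply(seq_len(n), function(i) {
      a = h[i, ]
      b = h[if (i == n) 1 else i + 1, ]
      (b[1] - a[1]) * (p[2] - a[2]) - (b[2] - a[2]) * (p[1] - a[1])
    })
    # Inside a convex polygon: all cross products share one sign
    all(cp >= 0) || all(cp <= 0)
  })
}
```

For the example data, inside_hull(mix, s_means + c_means) should return TRUE for all ten consumers.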

Step 3: Running simmr

The next step is to actually run the model. This is achieved with the command:

simmr_out = simmr_mcmc(simmr_in)

This command takes the object simmr_in we created earlier and uses it as input for the model. It tells simmr to store the output from the model run in an object called simmr_out.

The model will take anywhere from a few seconds to a few minutes to run, depending on the number of sources, isotopes and observations, and the speed of the computer you are using. The progress of the model, as a percentage complete, is displayed in the console.

Markov chain Monte Carlo (MCMC) works by repeatedly guessing values of the dietary proportions and favouring those that fit the data well. The initial guesses are usually poor and are discarded as part of an initial phase known as the burn-in. Subsequent iterations are then stored and used to form the posterior distribution: the best estimates of the dietary proportions given the data and the model. Because it can take many thousands of iterations to move away from the initial guesses, convergence diagnostics should be checked to confirm that the model has run properly. In simmr this is done with:

summary(simmr_out,type='diagnostics')
## Gelman diagnostics - these values should all be close to 1.
## If not, try a longer run of simmr_mcmc.
##          Point est. Upper C.I.
## Source A          1       1.01
## Source B          1       1.00
## Source C          1       1.00
## Source D          1       1.00
## sd_d13C           1       1.00
## sd_d15N           1       1.00

If the model run has converged properly, the values should be close to 1. If any are above 1.1, we recommend a longer run; see help(simmr_mcmc) for how to do this. The values in this example suggest the model has converged well.
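simmr hands the sampling to JAGS, so you never write a sampler yourself. Purely to illustrate the burn-in and convergence ideas above, here is a minimal sketch of a different and much simpler sampler: random-walk Metropolis for a hypothetical one-isotope, two-source model with a single unknown proportion p, run as four chains from deliberately spread-out starting values. The source values, simulated data, step size and chain lengths are all invented for illustration. The gelman_rubin function computes the basic potential scale reduction factor; coda's gelman.diag, which produces output in the 'Point est.' / 'Upper C.I.' format shown above, applies an extra correction for sampling variability, so its numbers will differ slightly.

```r
# Toy model (NOT simmr's model): y_i ~ Normal(p*s1 + (1-p)*s2, sigma),
# with a Uniform(0, 1) prior on the single proportion p.
set.seed(42)
s1 = -14; s2 = -10; sigma = 0.5
y = rnorm(50, mean = 0.3 * s1 + 0.7 * s2, sd = sigma)   # simulated, true p = 0.3

log_post = function(p) {
  if (p <= 0 || p >= 1) return(-Inf)                    # outside the prior
  sum(dnorm(y, mean = p * s1 + (1 - p) * s2, sd = sigma, log = TRUE))
}

# Random-walk Metropolis: propose a nearby guess and accept it with
# probability min(1, posterior ratio); otherwise repeat the current value.
run_chain = function(start, n_iter = 3000, step = 0.05) {
  p = numeric(n_iter)
  p[1] = start
  for (t in 2:n_iter) {
    prop = p[t - 1] + rnorm(1, 0, step)
    accept = log(runif(1)) < log_post(prop) - log_post(p[t - 1])
    p[t] = if (accept) prop else p[t - 1]
  }
  p
}

# Basic Gelman-Rubin statistic for an n x m matrix (one column per chain)
gelman_rubin = function(draws) {
  n = nrow(draws)
  W = mean(apply(draws, 2, var))      # mean within-chain variance
  B = n * var(colMeans(draws))        # between-chain variance
  sqrt(((n - 1) / n * W + B / n) / W)
}

starts = c(0.05, 0.35, 0.65, 0.95)    # deliberately spread-out first guesses
chains = sapply(starts, run_chain)    # 3000 x 4 matrix of draws
kept = chains[-(1:1000), ]            # discard the burn-in
gelman_rubin(kept)                    # close to 1 when the chains agree
mean(kept)                            # posterior mean of p
```

Computing gelman_rubin on chains that settle in different places (for instance, two chains centred near 0 and two near 3) gives values well above 1.1, which is the situation in which the guide recommends a longer run.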

Step 4: Exploring the results

simmr produces both textual and graphical summaries of the model run. Starting with the textual summaries, we can get tables of the means and credible intervals (the Bayesian equivalent of a confidence interval) with:

summary(simmr_out,type='statistics')
##           mean    sd
## Source A 0.220 0.131
## Source B 0.285 0.079
## Source C 0.275 0.032
## Source D 0.219 0.121
## sd_d13C  0.550 0.235
## sd_d15N  0.426 0.227
summary(simmr_out,type='quantiles')
##           2.5%   25%   50%   75% 97.5%
## Source A 0.030 0.111 0.203 0.314 0.501
## Source B 0.145 0.230 0.281 0.335 0.457
## Source C 0.217 0.254 0.275 0.295 0.341
## Source D 0.030 0.121 0.208 0.302 0.471
## sd_d13C  0.192 0.396 0.515 0.665 1.114
## sd_d15N  0.046 0.272 0.403 0.551 0.945

These results suggest that the dietary proportions for this model are quite uncertain. However, the credible interval for source C is the narrowest, running from approximately 22% to 34% of the diet. The isospace plot shows why: this source is the most clearly separated from the others.
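To make the comparison of interval widths explicit, the short calculation below subtracts the 2.5% quantile from the 97.5% quantile for each source, using values copied from the quantile table above. With a matrix of raw posterior draws, the same quantiles could be computed with apply(draws, 2, quantile, probs = c(0.025, 0.975)).

```r
# 2.5% and 97.5% posterior quantiles copied from the summary table above
lower = c("Source A" = 0.030, "Source B" = 0.145, "Source C" = 0.217, "Source D" = 0.030)
upper = c("Source A" = 0.501, "Source B" = 0.457, "Source C" = 0.341, "Source D" = 0.471)

width = upper - lower      # width of each 95% credible interval
round(width, 3)            # A 0.471, B 0.312, C 0.124, D 0.441
names(which.min(width))    # "Source C"
```

The interval for source C is less than half as wide as that for any other source.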

simmr can also produce histograms, boxplots, density plots, and matrix plots of the output. Starting with the density plot:

plot(simmr_out,type='density')

We can see that sources A and D are poorly constrained in comparison to sources B and especially C. Again this is unsurprising, since the isospace plot shows B and C to be the two most clearly separated sources.

The most useful output plot is the matrix plot:

plot(simmr_out,type='matrix')

This shows the source histograms on the diagonal, contour plots of the relationship between each pair of sources above the diagonal, and the correlation between each pair of sources below the diagonal. Large negative correlations indicate that the model cannot discern between two sources, often because they lie close together in iso-space. Large positive correlations are also possible when the mixture data lie in a polygon made up of multiple competing sources. Here the largest negative correlation is between sources A and D, because they lie closest together in iso-space.
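Part of the negative correlation is built in: each posterior draw of the proportions sums to one, so a larger share for one source must be offset by smaller shares for others. The sketch below shows this with stand-in draws from a flat Dirichlet distribution (generated by normalising independent gamma variables). These draws contain no information from any data and are not simmr output; even so, every pairwise correlation is negative (for four sources the theoretical value is -1/3), so a modest negative correlation on its own says little.

```r
# Stand-in "posterior" (NOT simmr output): 4000 draws of four proportions
# from a flat Dirichlet, made by normalising independent gamma variables.
set.seed(1)
g = matrix(rgamma(4000 * 4, shape = 1), ncol = 4)
draws = g / rowSums(g)                  # each row now sums to 1
colnames(draws) = paste("Source", LETTERS[1:4])

round(cor(draws), 2)                    # off-diagonal values all near -1/3
```

With real output, cor(simmr_out$output[[1]][, 1:4]) would give the corresponding matrix for the first chain, assuming (as the summary tables suggest) that the first four columns hold the source proportions.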

Advanced use of simmr

Whilst the above gives an introduction to the basic functions of simmr, the package is open source and all of its code is open to editing. The two objects created in this guide, simmr_in and simmr_out, are R lists. They can be explored with e.g.

str(simmr_in)

which will show their contents. The simmr_out object in particular gives full access to all of the posterior dietary proportion samples. For example, we can calculate the mean of the first dietary proportion from the first MCMC chain:

mean(simmr_out$output[[1]][,1])
## [1] 0.2160584

Or we can find the probability that the posterior dietary proportion for source 1 is bigger than for source 2:

mean(simmr_out$output[[1]][,1]>simmr_out$output[[1]][,2])
## [1] 0.338
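The same idea extends to every pair of sources at once. The hypothetical helper pairwise_prob below (not part of simmr) fills a matrix whose [i, j] entry is the proportion of draws in which source i exceeds source j. It is demonstrated on stand-in Dirichlet draws, not simmr output.

```r
# Hypothetical helper: entry [i, j] estimates the posterior probability that
# proportion i is larger than proportion j, using paired draws (rows).
pairwise_prob = function(draws) {
  k = ncol(draws)
  out = matrix(NA_real_, k, k, dimnames = list(colnames(draws), colnames(draws)))
  for (i in seq_len(k)) {
    for (j in seq_len(k)) {
      if (i != j) out[i, j] = mean(draws[, i] > draws[, j])
    }
  }
  out
}

# Stand-in draws (NOT simmr output): flat Dirichlet via normalised gammas
set.seed(2)
g = matrix(rgamma(4000 * 4, shape = 1), ncol = 4)
draws = g / rowSums(g)
colnames(draws) = paste("Source", LETTERS[1:4])
round(pairwise_prob(draws), 2)
```

Applied to simmr_out$output[[1]][, 1:4] (again assuming the first four columns are the sources), its [1, 2] entry reproduces the single comparison computed above, and the entries [i, j] and [j, i] sum to one.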

With more detailed R knowledge, it is possible to create scripts which run multiple data sets and compare dietary proportions across groups. See the help file help(simmr_mcmc) for some other more complex examples.
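A minimal sketch of the first step, assuming a hypothetical grouping vector site with one label per consumer: split the rows of the mixture matrix by group, then fit each group in turn. The data and group labels are invented for illustration, and the model-fitting call is left commented out because it needs simmr and JAGS; it reuses only the simmr_load arguments shown earlier.

```r
# Hypothetical example: six consumers sampled at two sites
mix_all = matrix(c(-10.1, -10.7, -11.4, -11.2, -10.8, -10.7,
                    11.6,  11.0,  10.6,  11.0,  11.5,  11.9), ncol = 2)
colnames(mix_all) = c("d13C", "d15N")
site = c("North", "North", "North", "South", "South", "South")  # hypothetical groups

# One mixture matrix per group; drop = FALSE keeps single-row groups as matrices
mix_by_site = lapply(split(seq_len(nrow(mix_all)), site),
                     function(rows) mix_all[rows, , drop = FALSE])
sapply(mix_by_site, nrow)

# Each group could then be fitted separately (requires simmr and JAGS), e.g.:
# out_by_site = lapply(mix_by_site, function(m)
#   simmr_mcmc(simmr_load(mixtures = m, source_names = s_names,
#                         source_means = s_means, source_sds = s_sds)))
```

If the groups are fitted in separate, independent runs, the probability that a source contributes more in one group than in another can be estimated as in the comparison above, e.g. mean(draws_north[, 1] > draws_south[, 1]), where draws_north and draws_south are hypothetical matrices of posterior draws from the two runs.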

Appendix - suggested reading

For the maths on the original SIAR model: Andrew C. Parnell, Richard Inger, Stuart Bearhop, and Andrew L. Jackson. Source partitioning using stable isotopes: coping with too much variation. PLoS ONE, 5(3):5, 2010.

For the maths behind the more advanced JAGS models: Andrew C. Parnell, Donald L. Phillips, Stuart Bearhop, Brice X. Semmens, Eric J. Ward, Jonathan W. Moore, Andrew L. Jackson, Jonathan Grey, David J. Kelly, and Richard Inger. Bayesian stable isotope mixing models. Environmetrics, 24(6):387–399, 2013.

For some good advice about mixing models: Donald L. Phillips, Richard Inger, Stuart Bearhop, Andrew L. Jackson, Jonathan W. Moore, Andrew C. Parnell, Brice X. Semmens, and Eric J. Ward. Best practices for use of stable isotope mixing models in food-web studies. Canadian Journal of Zoology, 92(10):823–835, 2014.