StochKit2R User Guide

Kevin R. Sanft January 2015

This document provides usage instructions for StochKit2R.

Introduction

StochKit2R is an R version of StochKit2.0, a software package for discrete stochastic simulation of biochemical systems using Gillespie’s Stochastic Simulation Algorithm (SSA) and tau-leaping [1-3]. It is designed to support much of the functionality of StochKit2.0. One limitation of StochKit2R is that it does not accept input model files that contain non-mass-action (customized) propensity functions or events (this is due to the platform-specific compilation that is required; this functionality might be added to future versions of StochKit2R).

StochKit2R was written by and is maintained by Kevin R. Sanft (kevin@kevinsanft.com). StochKit2.0 was written by Kevin R. Sanft, Sheng Wu, Min Roh, Jin Fu, Rone Kwei Lim, and others from Professor Linda Petzold's research group at UC Santa Barbara.

Citing StochKit2R

If you use StochKit2R for simulations or analysis used in a publication or presentation, please cite the software. Until a StochKit2R paper is published, please cite the StochKit2 paper:

Kevin R. Sanft, Sheng Wu, Min Roh, Jin Fu, Rone Kwei Lim, and Linda R. Petzold. (2011). StochKit2: software for discrete stochastic simulation of biochemical systems with events. Bioinformatics. 27(17), pp. 2457-2458.

You can also get this information by using the R citation function:

citation(package="StochKit2R")

Acknowledgments

This work was supported by NSF grant DMS-1045015 (through St. Olaf College’s Center for Interdisciplinary Research). I would like to thank Thomas Scott for his work on an early version of this package. I also thank Linda Petzold and her research group from the University of California, Santa Barbara.

Installation

As of this writing, I anticipate StochKit2R will soon be available on CRAN, the Comprehensive R Archive Network (http://cran.r-project.org/). As soon as StochKit2R is available on CRAN, you should be able to install it from the R console:

install.packages("StochKit2R")

Then load and attach the package with:

library(StochKit2R)

Advanced Mac OS X users may want to view the additional instructions below on installation for high performance.

Installing from source (Mac OS X, Linux/Unix) and Windows binaries

Note: Installing from source will not install the dependencies. Therefore, you must install the “ggplot2”, “reshape”, “Rcpp”, “BH”, and “XML” packages (via install.packages) prior to installing StochKit2R.

Mac OS X and Linux/Unix

To install from source on Mac OS X, you must install XCode and the Command Line Tools before proceeding. Advanced Mac OS X users may want to view the additional instructions below on installation for high performance.

You can install from the source file StochKit2R_<version>.tar.gz, available at http://www.kevinsanft.com/StochKit2R, on Linux and Mac OS X by running the following at a terminal (not the R console) from within the directory of the .tar.gz file: R CMD INSTALL StochKit2R_<version>.tar.gz

Or, you can run the following command in the R console:

install.packages("<path to>/StochKit2R_<version>.tar.gz",repos=NULL,type="source")

Where you replace <path to> with the path to the .tar.gz file and <version> is the version number in the .tar.gz file name.

Advanced Mac OS X installation for high performance

As of this writing, the default compiler on recent versions of OS X does not support OpenMP. (The gcc/g++ that comes with XCode's Command Line Tools is a wrapper for clang/clang++, not GNU GCC.) StochKit2R uses OpenMP to run simulations in parallel. Without OpenMP, StochKit2R uses only one processor. Advanced users can follow these instructions to enable parallel ensemble simulations on Mac OS X.

These instructions use GNU GCC. I tested it with GNU GCC 4.9. It may be possible to do this with the experimental version of Clang here: http://clang-omp.github.io/, but I have not tested it.

  1. Install GNU GCC. (XCode's gcc and g++ are symbolic links to clang and clang++, respectively.) This can probably be done through homebrew, macports, or from source.
  2. Download StochKit2R_<version>.tar.gz from http://www.kevinsanft.com/StochKit2R
  3. Unpack the StochKit2R .tar.gz file. In the file StochKit2R/src/Makevars, replace the string “$(SHLIB_OPENMP_CXXFLAGS)” with “–fopenmp” (two occurrences).
  4. Create or edit your ~/.R/Makevars file to contain the line:
    CXX=<path to g++>
    If an existing CXX line exists, replace it with the above line, where you should replace <path to g++> with the full path to the GNU g++. For example, on my system, it would be:
    CXX=/usr/local/bin/g++
  5. In R, set the working directory to the unpacked StochKit2R directory with the setwd function.
  6. In R, run devtools::load_all(recompile=T, export_all=F). You will need to install the devtools package first, if you haven’t already.

If everything worked correctly, you may see some warning messages but you should see no errors. You should also be able to scroll up to view the compilation commands to confirm that g++ and –fopenmp were used in the compile commands.

Windows

Windows binary packages are available. On Windows, download StochKit2R_<version>.zip from http://www.kevinsanft.com/StochKit2R.

To install the package from .zip file:

Running a simulation

StochKit2R contains three primary simulation functions: ssa, ssaSingle, and tauLeaping.

ssa and tauLeaping run ensembles (multiple simulation trajectories) and output data at regularly spaced intervals. They have nearly identical function calls. Consult the help pages for information. Here are the arguments:

An example of running the ssa function:

# use the dimer_decay.xml model that is included with StochKit2R
# to make things easier to read, store the path in a variable
model <- system.file("dimer_decay.xml",package="StochKit2R")
# now, run a simulation:
ssa(model,"ex_out",10,100,20)

The above function will run 100 simulations to time 10 using 20 output intervals. Output will be stored in a new directory named ex_out which will be created in the current working directory. The above example uses the dimer_decay.xml file included with the StochKit2R distribution. This file is also available for download at http://www.kevinsanft.com/StochKit2R/dimer_decay.xml.

In the more typical case, the model file will be stored elsewhere in your file system. For example:

ssa("~/Desktop/dimer_decay.xml","~/Desktop/dimer_decay_output",10,100,20)

The file dimer_decay.xml must exist in the ~/Desktop directory. (This corresponds to a machine-specific full path such as “/Users/kevinsanft/Desktop/dimer_decay.xml”, which might be required on some systems. On Windows, “~” usually corresponds to the user's Documents folder, so ~/Desktop would not be a valid path.) Output will be stored in a new directory named dimer_decay_output which will be created in the Desktop directory. If dimer_decay_output already exists, the ssa function will exit with an error saying that the directory already exists. To overwrite an existing directory, use force=TRUE:

ssa("~/Desktop/dimer_decay.xml","~/Desktop/dimer_decay_output",10,100,20,force=T)

By default, the simulation will create an output directory named “stats” within the outputDir. The stats directory will contain two files: means.txt and variances.txt, containing the means and variances of the species at the output intervals. If keepTrajectories or keepHistograms is TRUE, a directory named trajectories or histograms, respectively, will be created within the output directory.

ssaSingle runs a single trajectory and outputs a row of data for every reaction event that occurs to a file. The function takes as arguments: a path to a StochKit2 .xml model, a path to an output file name, a start time (you will usually use 0 here), and an end time. Here is the example from the documentation, using the dimer_decay.xml model that is included with the StochKit2R package:

ssaSingle(system.file("dimer_decay.xml",package="StochKit2R"),"single_output.txt",0,10)

The above command will run one realization from t=0 to t=10 and write the output to single_output.txt. Note: the file will contain about 25,000 lines, one line for each reaction event (plus one line for the initial population and one line for the end time).

Plotting

StochKit2R includes four plotting functions for visualizing simulation output data from ssa and tauLeaping (these are not designed to work with ssaSingle data).

Consult the R function documentation (e.g. ?plotStats) for details. Here an example of the plotStats function, which plots the means and +/- one standard deviation curves.

library(StochKit2R)
#example using included dimer_decay.xml file
model <- system.file("dimer_decay.xml",package="StochKit2R")
#output written to ex_out directory (created in current working directory)
ssa(model,"ex_out",10,100,20,force=TRUE)
#plot the data for species 2 and 3 (all of them in the dimer decay model)
plotStats("ex_out/stats",c(2,3))

plotStats figure available at: http://kevinsanft.com/StochKit2R/plotStats_example.png

This example has relatively small variances, so the +/- standard deviation curves are very close to the mean curves.

Model definition file format

StochKit2R simulations use models in the StochKit2 xml model definition format. The StochKit2 model definition format uses tags to specify the description of a biochemical model in an xml format.

One sample model file, dimer_decay.xml, is included with the StochKit2R package. You can access it with the R command: system.file(“dimer_decay.xml”,package=“StochKit2R”)

To copy this file to a location of your choice, you can use R’s file.copy function. For simplicity, first copy the dimer_decay.xml path into a variable, then call file.copy:

model <- system.file("dimer_decay.xml",package="StochKit2R")
file.copy(model,"~/Desktop/dimer_decay.xml")

Again, “~/Desktop/dimer_decay.xml” must be a valid path; replace the destination with the appropriate path on your system. (On Windows, the dimer_decay.xml file may be difficult to read and edit due in some viewers and editors due to the way Windows handles newlines in text files.)

You can view dimer_decay.xml in a text editor. Here, I will view it in R using the XML package:

model <- system.file("dimer_decay.xml",package="StochKit2R")
XML::xmlParse(model)
## <?xml version="1.0"?>
## <!-- Stochkit 2.0 text input format 
## 
## reference:
## M. Rathinam, L.R. Petzold, Y. Cao, and D.T. Gillespie. J Chem Phys 119, 12784 (2003).
## -->
## <Model>
##   <Description>Dimer Decay</Description>
##   <NumberOfReactions>4</NumberOfReactions>
##   <NumberOfSpecies>3</NumberOfSpecies>
##   <ParametersList>
##     <Parameter>
##       <Id>c1</Id>
##       <Expression>1.0</Expression>
##     </Parameter>
##     <Parameter>
##       <Id>c2</Id>
##       <Expression>0.002</Expression>
##     </Parameter>
##     <Parameter>
##       <Id>c3</Id>
##       <Expression>0.5</Expression>
##     </Parameter>
##     <Parameter>
##       <Id>c4</Id>
##       <Expression>0.04</Expression>
##     </Parameter>
##   </ParametersList>
##   <ReactionsList>
##     <Reaction>
##       <Id>R1</Id>
##       <Description> S1 -&gt; null </Description>
##       <Type>mass-action</Type>
##       <Rate>c1</Rate>
##       <Reactants>
##         <SpeciesReference id="S1" stoichiometry="1"/>
##       </Reactants>
##       <Products>
##        </Products>
##       <!-- no product -->
##     </Reaction>
##     <Reaction>
##       <Id>R2</Id>
##       <Description> 2 * S1 -&gt; S2 </Description>
##       <Type>mass-action</Type>
##       <Rate>c2</Rate>
##       <Reactants>
##         <SpeciesReference id="S1" stoichiometry="2"/>
##       </Reactants>
##       <Products>
##         <SpeciesReference id="S2" stoichiometry="1"/>
##       </Products>
##     </Reaction>
##     <Reaction>
##       <Id>R3</Id>
##       <Description> S2 -&gt; 2 * S1 </Description>
##       <Type>mass-action</Type>
##       <Rate>c3</Rate>
##       <Reactants>
##         <SpeciesReference id="S2" stoichiometry="1"/>
##       </Reactants>
##       <Products>
##         <SpeciesReference id="S1" stoichiometry="2"/>
##       </Products>
##     </Reaction>
##     <Reaction>
##       <Id>R4</Id>
##       <Description> S2 -&gt; S3 </Description>
##       <Type>mass-action</Type>
##       <Rate>c4</Rate>
##       <Reactants>
##         <SpeciesReference id="S2" stoichiometry="1"/>
##       </Reactants>
##       <Products>
##         <SpeciesReference id="S3" stoichiometry="1"/>
##       </Products>
##     </Reaction>
##   </ReactionsList>
##   <SpeciesList>
##     <Species>
##       <Id>S1</Id>
##       <Description>Species #1</Description>
##       <InitialPopulation>10000</InitialPopulation>
##     </Species>
##     <Species>
##       <Id>S2</Id>
##       <Description>Species #2</Description>
##       <InitialPopulation>0</InitialPopulation>
##     </Species>
##     <Species>
##       <Id>S3</Id>
##       <Description>Species #3</Description>
##       <InitialPopulation>0</InitialPopulation>
##     </Species>
##   </SpeciesList>
## </Model>
## 

The file might be intimidating at first, but it is quite simple! (Ignore the “##” preceeding the lines as this is generated by R's XML package.)

The first tag, “<Model>” indicates the beginning of the model definition. The closing tag </Model> at the end indicates the end of the model definition. Most xml tags must be matched by a closing tag. The <NumberOfReactions> and <NumberOfSpecies> tags are required and specify the number of reactions and species, respectively, in the model. The “ReactionsList” tag encloses one or more (in this case only one) “<Reaction>” elements. A Reaction must contain an “<Id>” and a “<Type>”. In most cases, a Reaction contains a “<Reactants>” tag and a “<Products>” tag to specify the reactant and product species, respectively. In StochKit2R, the Reaction Type must be “mass-action”, and the “<Rate>” tag is required. The Rate is the stochastic reaction rate that is multiplied by the Reactants’ populations to calculate the propensity [1].

The above model definition file describes a model with 3 species and 4 reactions. Reaction R1 has one SpeciesReference tag within the Reactants and no Products. (Note the SpeciesReference does not use a closing tag, but instead ends the opening tag with “/>”) Since the stoichiometry of the Reactant is 1 and there are no products, Reaction R1 describes species S1 decaying into “nothing”.

NOTE: The rate constants in StochKit2 models are interpreted as stochastic rate constants (see Gillespie 1977). For enzymatic reactions where a species is both a reactant and a product, the species should appear in the Reactants and Products to ensure the mass‐action propensity is computed correctly.

The best way to create your own model is to copy and rename an existing model file, then open it in a text editor and modify it to suit your needs. Save it with a .xml extension.

The StochKit2 model definition supports (global) parameters as seen in the model above. Instead of putting a numeric value as the Rate, a parameter is used. Parameters are defined within the ParametersList tag.

NOTE: parameters cannot be functions of the species' populations.

Bugs

Please report bugs to kevin@kevinsanft.com. Please include a description of the error message, your R version, operating system, StochKit2R version, model file, and steps to recreate the error.

Note: StochKit2R currently does not support customized (non-mass-action) propensity functions or Events, so please do not report this as a bug.

License

StochKit2R is released under the GPL-3 (GPLv3) open source license (http://www.gnu.org/licenses/gpl-3.0.html).

Copyright

StochKit2R © 2015, Kevin R. Sanft and Linda R. Petzold.

Additional resources

You may also want to consult the StochKit2.0 (i.e. not StochKit2R) manual. StochKit2 can be downloaded from SourceForge.com. StochKit2.0 has some additional features, including being able to use non-mass-action propensity functions and plotting functions for Matlab.

References

  1. Gillespie, D.T. J. Phys. Chem. 81, 25, 2340 (1977).
  2. Gillespie, D.T. J. Chem. Phys. 115, 1716 (2001).
  3. Cao, Y., Gillespie, D.T., and Petzold, L.R. J. Chem. Phys. 124, 044109 (2006).