# install.packages("RcppTskit")
test <- require(RcppTskit)
#> Loading required package: RcppTskit
if (!test) {
message("RcppTskit not available; skipping vignette execution.")
knitr::opts_chunk$set(eval = FALSE)
}The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
2026-02-22
This vignette introduces working with tree sequences in R using the RcppTskit package. RcppTskit provides R access to the tskit C application programming interface (API) (Jeffery et al. 2026) (https://tskit.dev/tskit/docs/stable/c-api.html). If you are new to tree sequences and the broader concept of ancestral recombination graphs (ARGs), see Brandt et al. (2024), Lewanski, Grundler, and Bradburd (2024), Nielsen, Vaughn, and Deng (2024), and Wong et al. (2024). Before showing how to use RcppTskit, we summarise the now extensive tree sequence ecosystem, because this has shaped the aim and design of RcppTskit. We then highlight the aims of RcppTskit, describe the implemented data and class model, and show four typical use cases.
As summarised below, Python is the most widely used environment for working with tree sequences. Using the R package reticulate (Ushey, Allaire, and Tang 2025) (https://rstudio.github.io/reticulate/), most R users can and should leverage the large ecosystem of Python packages, in particular the popular tskit Python API (Jeffery et al. 2026) (https://tskit.dev/tskit/docs/stable/python-api.html). With this in mind, RcppTskit is primarily geared towards providing R access to the tskit C API (Jeffery et al. 2026), for cases where the reticulate option is not optimal; for example, high-performance or low-level work with tree sequences. As a result, RcppTskit currently provides a limited set of R functions because the Python API (and reticulate) already covers most needs. As the name suggests, RcppTskit leverages the R package Rcpp (Eddelbuettel et al. 2026) (https://www.rcpp.org), which significantly lowers the barrier to using C++ in R. However, we still need to write C++ wrappers and expose them to R, so we recommend using reticulate first. The implemented R functions in RcppTskit closely mimic tskit Python functions to streamline the use of both the R and Python APIs.
The tree sequence ecosystem is rapidly evolving. The website https://tskit.dev/software/ lists tools that closely interoperate with tskit, while Jeffery et al. (2026) lists additional tools that depend on tskit functionality. Consequently, there are now many tools for the generation and analysis of tree sequences. Below is a quick summary of some of the tools relevant to RcppTskit as of January 2026.
tskit (https://tskit.dev/tskit/docs/, https://github.com/tskit-dev/tskit) is the core toolkit for working with tree sequences. It has an efficient C API and user-friendly Python API. The Python API is a popular entry point for most users and extends the C API in some aspects (for example, metadata encoding/decoding). There is also a Rust API that wraps the C API.
msprime (https://tskit.dev/msprime/docs/, https://github.com/tskit-dev/msprime) generates tree sequences with backward-in-time simulation. It has a Python API and command line interface.
SLiM (https://messerlab.org/slim/, https://github.com/MesserLab/SLiM) generates tree sequences with forward-in-time simulation. It is written in C++ (with embedded tskit C library) and has both a command line and a graphical user interface. Its tree sequence recording is described in detail at https://github.com/MesserLab/SLiM/blob/master/treerec/implementation.md.
pyslim (https://tskit.dev/pyslim/docs/, https://github.com/tskit-dev/pyslim) provides a Python API for reading and modifying SLiM tree sequences, or adapting tree sequences from other programs (e.g., msprime) for use in SLiM.
fwdpy11 (https://molpopgen.github.io/fwdpy11/, https://github.com/molpopgen/fwdpy11) generates tree sequences with forward-in-time simulation. Its Python API is built on a C++ API (fwdpp).
stdpopsim (https://popsim-consortium.github.io/stdpopsim-docs/, https://github.com/popsim-consortium/stdpopsim) is a standard library of population genetic models used in simulations with msprime and SLiM. It has a Python API and command line interface.
slendr (https://bodkan.net/slendr/, https://github.com/bodkan/slendr) is an R package for describing population genetic models, simulating them with either msprime or SLiM, and analysing the resulting tree sequences using tskit.
slimr (https://rdinnager.github.io/slimr/, https://github.com/rdinnager/slimr) provides an R API for specifying and running SLiM scripts and analysing results in R. It runs SLiM via the R package processx.
The above tools enable work with tree sequences and/or generate them via simulation. There is a growing list of tools that estimate ARGs from observed genomic data and can export them in the tree sequence file format. Notable examples include: tsinfer (https://tskit.dev/tsinfer/docs/, https://github.com/tskit-dev/tsinfer), Relate (https://myersgroup.github.io/relate/, https://github.com/MyersGroup/relate), SINGER (https://github.com/popgenmethods/SINGER), ARGNeedle (https://palamaralab.github.io/software/argneedle/, https://github.com/PalamaraLab/arg-needle-lib), and Threads (https://palamaralab.github.io/software/threads/, https://github.com/palamaraLab/threads).
As described above, the tree sequence ecosystem is extensive. Python is the most widely used platform to interact with tree sequences, with comprehensive packages for simulation and analysis.
There is interest in working with tree sequences in R. Because we can call Python from within R using the reticulate R package, there is no pressing need for a dedicated R API for work with tree sequences. See https://tskit.dev/tutorials/tskitr.html for an example of this approach. This keeps the community focused on the Python collection of packages. While there are differences between Python and R, many R users should be able to follow the extensive Python API documentation, examples, and tutorials listed above, especially those at https://tskit.dev/tutorials/.
To provide an idiomatic R interface to some population genetic simulation steps and operations with tree sequences, slendr implements bespoke functions and wrappers to interact with msprime, SLiM, and tskit. It uses reticulate to interact with the Python APIs of these packages, which further lowers barriers for R users to work with tree sequences.
One downside of using reticulate is the overhead of calling Python functions. This overhead is minimal for most analyses because a user calls a few Python functions, which do all the work (including loops) on the Python side, which often call the tskit C API. However, the overhead can be limiting for repeated calls between R and Python, such as calling Python functions from within an R loop, say to record a tree sequence in a multi-generation simulation with many individuals.
RcppTskitGiven the current tree sequence ecosystem, the aims of the RcppTskit package are to provide an easy-to-install R package that supports users in four typical cases. The authors are open to expanding this scope of RcppTskit depending on user demand and engagement. The four typical cases are:
Load a tree sequence1 into R and summarise it,
Pass a tree sequence2 between R and reticulate or standard Python,
Call the tskit C API from C++ in an R session or script, and
Call the tskit C API from C++ in another R package.
Examples for all of these cases are provided below after we describe the implemented data and class model.
RcppTskit represents a tree sequence as a lightweight R6 object of class TreeSequence. The R6 class was chosen in part so that TreeSequence method calls in R resemble the tskit Python API, particularly when compared to reticulate Python. TreeSequence wraps an external pointer (externalptr) to the tskit C object structure tsk_treeseq_t. Most methods (for example, ts$num_individuals() and ts$dump()) call the tskit C API via Rcpp, so the calls are fast. The underlying pointer is exposed as TreeSequence$pointer for developers and advanced users who work with C++. In C++, the pointer has type RcppTskit_treeseq_xptr, and the tree sequence memory is released by the Rcpp::XPtr finaliser when the pointer is garbage-collected in R.
RcppTskit also provides a lightweight TableCollection R6 class, which wraps an an external pointer to the tskit C object structure tsk_table_collection_t. In C++, the pointer has type RcppTskit_table_collection_xptr with the same memory management as RcppTskit_treeseq_xptr. While tsk_treeseq_t is an immutable object, tsk_table_collection_t is a mutable object, which can be edited. No R functions for editing are implemented to date, so all editing should happen in C++ or Python.
First install RcppTskit from CRAN and load it.
# install.packages("RcppTskit")
test <- require(RcppTskit)
#> Loading required package: RcppTskit
if (!test) {
message("RcppTskit not available; skipping vignette execution.")
knitr::opts_chunk$set(eval = FALSE)
}# Load a tree sequence
ts_file <- system.file("examples/test.trees", package = "RcppTskit")
ts <- ts_load(ts_file)
methods::is(ts)
#> [1] "TreeSequence"
# Print the summary of the tree sequence
ts$print()
# ts # the same as above
ts$num_individuals()
#> [1] 8
# Access the table collection
tc <- ts$dump_tables()
tc$print()
# Convert the table collection to tree sequence
ts2 <- tc$tree_sequence()
# Explore the help pages
help(package = "RcppTskit")# Tree sequence in R
ts_file <- system.file("examples/test.trees", package = "RcppTskit")
ts <- ts_load(ts_file)
# If you now want to use the tskit Python API via reticulate, use
tskit <- get_tskit_py()
#> Python module tskit is not available. Attempting to install it ...
if (check_tskit_py(tskit)) {
ts_py <- ts$r_to_py()
# ... continue in reticulate Python ...
ts_py$num_individuals # 8
ts2_py = ts_py$simplify(samples = c(0L, 1L, 2L, 3L))
ts2_py$num_individuals # 2
ts2_py$num_nodes # 8
ts2_py$tables$nodes$time # 0.0 ... 5.0093910
# ... and to bring it back to R, use ...
ts2 <- ts_py_to_r(ts2_py)
ts2$num_individuals() # 2
}
#> [1] 2
# If you prefer standard (non-reticulate) Python, use
ts_file <- tempfile()
print(ts_file)
#> [1] "/var/folders/f_/58r_f9r97t3gmvlmt3_l834m0000gq/T//RtmpJazHRM/file11cbc17a99ebd"
ts$dump(file = ts_file)
# ... continue in standard Python ...
# import tskit
# ts = tskit.load("insert_ts_file_path_here")
# ts.num_individuals # 8
# ts2 = ts.simplify(samples = [0, 1, 2, 3])
# ts2.num_individuals # 2
# ts2.dump("insert_ts_file_path_here")
# ... and to bring it back to R, use ...
ts2 <- ts_load(ts_file)
ts$num_individuals() # 2 (if you have run the above Python code)
#> [1] 8
file.remove(ts_file)
#> [1] TRUE
# You can work similarly with table collection between R & Python
tc <- ts$dump_tables()
if (check_tskit_py(tskit)) {
tc_py <- tc$r_to_py()
# ... continue in reticulate Python ...
tmp <- tc_py$simplify(samples = c(0L, 1L, 2L, 3L))
tmp
tc_py$individuals$num_rows # 2
tc_py$nodes$num_rows # 8
tc_py$nodes$time # 0.0 ... 5.0093910
# ... and to bring it back to R, use ...
tc2 <- tc_py_to_r(tc_py)
tc2$print()
}tskit C API from C++ in an R session or script# Write a C++ function as a multi-line character string
codeString <- '
#include <RcppTskit.hpp>
int ts_num_individuals(SEXP ts) {
RcppTskit_treeseq_xptr ts_xptr(ts);
return (int) tsk_treeseq_get_num_individuals(ts_xptr);
}'
# Compile the C++ function
ts_num_individuals2 <- Rcpp::cppFunction(
code = codeString,
depends = "RcppTskit",
plugins = "RcppTskit"
)
# We must specify both the `depends` and `plugins` arguments!
# Load a tree sequence
ts_file <- system.file("examples/test.trees", package = "RcppTskit")
ts <- ts_load(ts_file)
# Apply the compiled function
# (on the pointer)
ts_num_individuals2(ts$pointer)
#> [1] 8
# An identical RcppTskit implementation
# (available as the method of the TreeSequence class)
ts$num_individuals()
#> [1] 8tskit C API from C++ in another R packageTo call the tskit C API in your own R package via Rcpp you can leverage RcppTskit, which simplifies installation and provides the linking flags you need. To do this, follow the steps below and check how these are implemented in the demo R package RcppTskitTestLinkingTo at https://github.com/HighlanderLab/RcppTskitTestLinking.
Open the DESCRIPTION file and add RcppTskit to the Imports: and LinkingTo: fields, and add Rcpp to the LinkingTo: field.
Create R/YourPackage-package.R and add at minimum: #' @import RcppTskit in one line and "_PACKAGE" in another line, assuming you use devtools to manage your package NAMESPACE imports.
Add #include <RcppTskit.hpp> as needed to your C++ header files in src directory.
Add // [[Rcpp::depends(RcppTskit)]] to your C++ files in src directory.
Add // [[Rcpp::plugins(RcppTskit)]] to your C++ files in src directory.
Call the RcppTskit C++ API and the tskit C API as needed in src directory.
Configure your package build to link against the RcppTskit library with the following steps:
Add src/Makevars.in and src/Makevars.win.in files with the PKG_LIBS = @RCPPTSKIT_LIB@ line, in addition to any other flags.
Add tools/configure.R file, which will replace @RCPPTSKIT_LIB@ in src/Makevars.in and src/Makevars.win.in with the installed RcppTskit library file (including appropriate flags), and generate src/Makevars and src/Makevars.win.
Add configure and configure.win scripts (and make them executable) to call tools/configure.R.
Add cleanup and cleanup.win scripts (and make them executable) to remove src/Makevars and src/Makevars.win as well as other build artefacts.
devtools::build(), devtools::check(), and devtools::install() or their R CMD equivalents.Here is code you can run to check out RcppTskitTestLinkingTo:
# Install and load the demo package
remotes::install_github("HighlanderLab/RcppTskitTestLinking")
library(RcppTskitTestLinking)
# Check the demo function
print(ts_num_individuals_ptr2)
# Example tree sequence
ts_file <- system.file("examples", "test.trees", package = "RcppTskit")
ts <- ts_load(ts_file)
# Function from RcppTskit (working with TreeSequence R6 class)
ts$num_individuals()
# Function from RcppTskitTestLinking (working with externalptr)
ts_num_individuals_ptr2(ts$pointer)RcppTskit provides R access to the tskit C API with a simple installation and a lightweight interface. It provides a limited number of R functions because most users can and should use reticulate to call the tskit Python API from R. The implemented R functions closely mimic tskit Python functions to streamline the use of both the R and Python APIs. When this option is not optimal, developers and advanced users can call the tskit C API via Rcpp.
sessionInfo()
#> R version 4.5.0 (2025-04-11)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS Sequoia 15.7.4
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
#>
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: Europe/Oslo
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] RcppTskit_0.2.0
#>
#> loaded via a namespace (and not attached):
#> [1] digest_0.6.37 R6_2.6.1 fastmap_1.2.0 xfun_0.53
#> [5] Matrix_1.7-3 lattice_0.22-6 reticulate_1.44.1 glue_1.8.0
#> [9] knitr_1.50 htmltools_0.5.9 png_0.1-8 rmarkdown_2.30
#> [13] lifecycle_1.0.4 cli_3.6.5 vctrs_0.6.5 grid_4.5.0
#> [17] withr_3.0.2 compiler_4.5.0 rprojroot_2.1.1 here_1.0.2
#> [21] tools_4.5.0 pillar_1.11.1 evaluate_1.0.5 Rcpp_1.1.1
#> [25] yaml_2.3.10 rlang_1.1.7 jsonlite_2.0.0These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.